Returning to Turing Test Methodology

I’m a little disappointed in the commentariat for not bitching me out when I showed you the winners and losers in the Christian and Atheist rounds of the Turing Test without ponying up sample size numbers. This year, each entry had its own survey link, so the number of respondents was not constant across a round. A fairly predictable trend emerged.

Note: This isn’t the total number of respondents; it’s the number (N) of self-declared atheists judging the first round and self-declared Christians judging the Christian round. Looking at how well people do at spotting fakes in the group they don’t belong to is interesting, but I exclude those people from the figures that identify the winners.

Unsurprisingly, there’s a big drop-off as people got tired of reading and judging entries. The difference in participation between the Christian and Atheist rounds is mainly a function of how many bloggers in each category I got to promote the test to their readers. I’ll probably do this again next year, and I’m wondering if readers have any advice on tweaking the methodology to avoid this precipitous drop in response rates.

Making people answer all the questions at once is a pretty big commitment, so it depresses response rates across the board. I got lucky last year, in that Andrew Sullivan linked the test, so my click-throughs were high enough that I could bleed off a lot of people and still have good numbers. I can’t count on huge links like that when I design my methodology. The big upside of the all-in-one survey is that I can compute each participant’s accuracy and then maybe compare whether converts and deconverts did better than people who have stuck to only one philosophy.
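A minimal sketch of that accuracy calculation in Python, assuming each participant's guesses arrive as a mapping from entry id to "sincere" (True) or "fake" (False), along with a self-reported group label. The entry ids, the group names, and the answer key below are all made up for illustration; they are not the real survey data.

```python
from collections import defaultdict

# Hypothetical answer key: entry id -> True if the author sincerely holds the view.
ANSWER_KEY = {"A1": True, "A2": False, "A3": True}


def accuracy(guesses, key=ANSWER_KEY):
    """Fraction of judged entries a participant called correctly.

    guesses maps entry id -> True ("sincere") or False ("fake").
    Entries the participant skipped are simply left out of the count.
    Returns None if the participant judged nothing.
    """
    judged = [entry for entry in guesses if entry in key]
    if not judged:
        return None
    correct = sum(guesses[entry] == key[entry] for entry in judged)
    return correct / len(judged)


def mean_accuracy_by_group(participants, key=ANSWER_KEY):
    """Average accuracy per self-reported group (e.g. 'convert' vs 'lifelong')."""
    scores = defaultdict(list)
    for person in participants:
        acc = accuracy(person["guesses"], key)
        if acc is not None:
            scores[person["group"]].append(acc)
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


participants = [
    {"group": "convert", "guesses": {"A1": True, "A2": False, "A3": True}},
    {"group": "lifelong", "guesses": {"A1": True, "A2": True}},
]
print(mean_accuracy_by_group(participants))
```

Scoring only the entries a person actually judged keeps the drop-off from being counted as wrong answers, though it does mean accuracies are based on different numbers of entries per person.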

Ideally, I’d be able to randomize the order in which people read and judged the entries, but that seems pretty impractical. And that set-up would make it awfully hard to read a couple, vote, and then take a break before you got burned out. Anything that would let you store answers as you go and then return to them later would be beyond my webdesign/database skills at present. (For last year’s all-in-one survey, I recommended people keep paper and a pen by their computer.)
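For what it's worth, here is a rough sketch of the randomizing part in Python, assuming each reader has some stable identifier and that the set of entries they have already judged is stored somewhere between visits. Seeding the shuffle with the identifier gives the same reader the same order every time, so they can stop and pick up where they left off. The function names and ids are illustrative, not part of any survey tool.

```python
import random


def entry_order(entry_ids, participant_id):
    """Return a shuffled order of entries that is stable for one participant.

    The participant id seeds the random generator, so repeat visits
    reproduce the same order without storing it anywhere.
    """
    rng = random.Random(participant_id)
    order = sorted(entry_ids)  # sort first so input order does not matter
    rng.shuffle(order)
    return order


def next_entry(entry_ids, participant_id, completed):
    """Return the next entry this participant has not judged, or None if done."""
    for entry in entry_order(entry_ids, participant_id):
        if entry not in completed:
            return entry
    return None


entries = ["A1", "A2", "A3", "A4", "A5"]
order = entry_order(entries, "reader-42")
print(order)
print(next_entry(entries, "reader-42", completed=set(order[:2])))
```

This only covers the ordering logic; serving the pages and saving the completed set still needs a real web form behind it.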

Another way to track responses across separate surveys would be to get people to generate userids that wouldn’t give too much data away. (Think last four digits of phone number followed by MMDD birthday.) Something easy for you to remember and enter on each individual survey, not too likely to be duplicated, and not that dangerous to share with me.
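A minimal sketch of that userid scheme in Python, plus a way to stitch the separate surveys back together by userid. The sample phone number and the data layout are made up. The collision estimate assumes ids are spread evenly over all 10,000 x 366 combinations, which real phone numbers and birthdays won't quite be, so treat it as a rough lower bound on how often two readers would clash.

```python
import re


def make_userid(phone, birth_month, birth_day):
    """Last four digits of the phone number followed by the MMDD birthday."""
    digits = re.sub(r"\D", "", phone)  # drop spaces, dashes, parentheses
    if len(digits) < 4:
        raise ValueError("phone number needs at least four digits")
    if not (1 <= birth_month <= 12 and 1 <= birth_day <= 31):
        raise ValueError("birthday month/day out of range")
    return f"{digits[-4:]}{birth_month:02d}{birth_day:02d}"


def merge_surveys(surveys):
    """Combine per-entry surveys into one record per userid.

    surveys maps entry id -> list of (userid, guess) rows.
    Returns userid -> {entry id: guess}.
    """
    merged = {}
    for entry_id, rows in surveys.items():
        for userid, guess in rows:
            merged.setdefault(userid, {})[entry_id] = guess
    return merged


def collision_probability(n_people, id_space=10_000 * 366):
    """Chance that at least two of n_people share an id (uniform assumption)."""
    p_all_unique = 1.0
    for i in range(n_people):
        p_all_unique *= 1 - i / id_space
    return 1 - p_all_unique


uid = make_userid("(555) 867-5309", 3, 7)
surveys = {"A1": [(uid, True)], "A2": [(uid, False), ("12341225", True)]}
print(uid, merge_surveys(surveys))
```

Once responses are merged this way, each person's record looks like an all-in-one survey response, so the per-participant comparisons become possible again.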

I’d be interested in your thoughts on the tradeoffs of these approaches and advice on tweaks.  Mind you, I’d really love for next year to be the round we try out chat logs instead of essays, so a more radical overhaul may be necessary.

What Are Your Thoughts?
  • Pasha

    So the user experience of answering the questions was pretty poor.

    The entry and the voting for it should be on one webpage. Your previous answers should be shown: (1a 2c, etc). Switching between entries should be easy in a tab format, so that you don’t have to click back and try to remember whether you have voted or not. You should only enter your info once.

    The reading was not the tough part, the clicking back and forth was.

    • leahlibresco

      I agree that would be awesome, but it’s a bit beyond my coding. We’ll see where I am next year.

      • Ignatius Theophorus

        I thought that I had offered this elsewhere, but if you want help programming, I am more than willing. I can be reached @sonoftheophorus and

    • Loud

      Divide the articles into groups of ten with five real and five contrived, email each judge (have them register, or something) one group for each subject (each group being vetted by at least three judges), and have them vote on which they think to be true.

  • Slan21

    It’d be cool to have a webpage that displays the entries sequentially with the voting at the end, jumping to a random other one not yet done after each one is completed.
    To be able to begin and return to it later, you could store the data with HTTP cookies.
    One year should be fair enough to learn those skills 😉 or else you can ask a fellow reader to help you.

    I haven’t had time to take the test this year, but the previous one wasn’t very user-friendly from what I remember.

  • Ted Seeber

    Two suggestions actually, both from my 16 years’ experience designing GUIs:
    1. Use Survey Monkey to put the category of *common* questions on a different page so they don’t have to be answered many times.
    2. Cut your survey into smaller chunks, and post say half the questions one day, and half the other.
    3. OR, alternatively, randomize the stories people get.

  • deiseach

    Offer prizes? A trail of breadcrumbs?

    Yeah, it’s hard to see what you can do because you’re limited to asking for respondents from the pool of people who read your blog (and in turn urge other people to come along and take the survey).

    You could maybe either cut down the numbers of participants (instead of thirteen each, maybe only six?) so that people might be more inclined to stick with the full course, or cut down the number of questions, again so that people don’t get overwhelmed with making a large number of decisions.

    • deiseach

      Or maybe divide up the questions and answers differently, so instead of each person giving two answers (e.g. true atheist/Christian and fake atheist/Christian answer), pick at random (maybe pull names out of a hat) so that you have six real atheists giving the atheist answers and six Christians giving the fake answers in the atheist round, then a different six atheists giving the fake Christian answers in that round, if you see what I mean?

  • Jubal DiGriz

    A simple way to increase readability for samples is to decrease the maximum word limit. While that would also decrease the interesting content in the sample, it would make it more likely that people will read more samples. A fairly small word limit (such as 600 words) might even have the effect of diversifying the content between samples and exaggerating the individual writing styles, which would further enhance overall readability.