Play Along with Rationality Camp at Home!

While you’re all missing me, there are three games you might want to try that I’ve been playing at rationality camp.

The first two games are related.  The first one is called the Calibration Game and the second is called the Updating Game.  (Note: both those links start downloads of zip files).  Both are trivia games, but, although you’re trying to get questions right, the focus is less on how knowledgeable you are and more on how good you are at gauging your own uncertainty.

The Calibration Game asks you a question with two possible answers and then asks how sure you are that you’re right.  You gain or lose points depending on whether you’re right or wrong, and the change in your points is proportional to how certain you rated yourself.  While you’re playing, you can see a graphical representation of your own accuracy about yourself.

If I were well calibrated, you would expect that when I’m 80% confident, I’d be right about 80% of the time.  The bars should all hit a y=x line.  That’s obviously still not the case (though, I will say in my own defense, my 70 and 80% bars are especially terrible because I’m very rarely 70-80 percent confident for any of these prompts, so any errors throw me way off).
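If you want to check your own calibration by hand, here is a minimal sketch of the idea in Python. It is not the game's actual code: the function name `calibration_table` and the sample answers are made up for illustration. It groups answers by the confidence you stated and reports how often you were actually right in each group.

```python
from collections import defaultdict


def calibration_table(answers):
    """Group (stated_confidence, was_correct) pairs by stated confidence.

    Returns {confidence: (hit_rate, count)}. A well-calibrated player
    has hit_rate close to confidence in every group.
    """
    buckets = defaultdict(list)
    for confidence, was_correct in answers:
        buckets[confidence].append(1 if was_correct else 0)
    return {
        confidence: (sum(hits) / len(hits), len(hits))
        for confidence, hits in sorted(buckets.items())
    }


# Hypothetical answers, invented for this example.
sample = [
    (0.6, True), (0.6, False),
    (0.8, True), (0.8, True), (0.8, False), (0.8, True),
    (0.99, True), (0.99, True),
]

table = calibration_table(sample)
for confidence, (hit_rate, count) in table.items():
    print(f"stated {confidence:.0%}: right {hit_rate:.0%} of the time ({count} answers)")
```

The count matters as much as the hit rate: with only a handful of answers in a group, a single miss moves that bar a long way, which is the same small-sample problem I was complaining about with my 70 and 80% bars.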

The Updating Game also tries to make you aware of how badly calibrated you are, but it uses a different approach.  Instead of picking from two answers, you are posed an open-ended question (e.g. What year did The Muppet Show premiere?) and you give an upper and lower bound for a 95% confidence interval.  The game keeps track of how often the right answer falls into your specified range (it should be 95% of the time!).
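The bookkeeping for the interval version is even simpler. This is a rough sketch under my own assumptions, not how the Updating Game itself computes its score; `interval_coverage` and the numbers below are hypothetical. It just counts how often the true answer landed inside the range you gave.

```python
def interval_coverage(guesses):
    """Return the fraction of (lower, upper, truth) triples where
    lower <= truth <= upper. For 95% intervals this should be near 0.95.
    """
    guesses = list(guesses)
    hits = sum(1 for lower, upper, truth in guesses if lower <= truth <= upper)
    return hits / len(guesses)


# Made-up intervals and answers, purely for illustration.
sample = [
    (1970, 1980, 1975),  # truth inside the interval
    (100, 200, 250),     # truth above the upper bound
    (5, 15, 5),          # truth exactly on the lower bound still counts
    (0, 1000, 999),      # a wide interval that contains the truth
]

coverage = interval_coverage(sample)
print(f"{coverage:.0%} of answers fell inside the stated intervals")
```

If your coverage comes out well below 95%, your intervals are too narrow (overconfident); if it is 100% over many questions, they are probably wider than they need to be.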

The last game is the one I invented for LessWrong’s Be Specific prompt.  I called it the Monday-Tuesday game.

On Monday, your proposition is true. On Tuesday, your proposition is false. Tell me a story about each of the days so I can see how they are different. Don’t just list the differences (because you’re already not doing that well). Start with “I wake up” so you start concrete and move on in that vein, naming the parts of your day that are identical as well as those that are different.

In CFAR’s implementation we paired off, picked two prompts from a hat, and tried to differentiate between two worlds, one in which a proposition was true and one in which it was false.  The prompts varied a lot in how specific/quantifiable/etc they were, and my partner and I both pulled this one for our exercise:

On Monday, suffering builds character. On Tuesday, it doesn’t. Describe how Monday and Tuesday are different.

My partner suggested picking out two thousand people and randomizing half of them to torture horribly.  Then he planned to give every subject one of the kinds of honesty tests you might find discussed in Bruce Schneier’s book Liars and Outliers.  Everyone takes an exam and half of each group gets to grade their own papers.  You compare the average score of the self-graders to the average score of the non-self-graders in each group to see how much they lied about their scores.  In one world he’d expect that the people who were tortured were more honest than the controls, and in the other, he’d expect they did not differ significantly or that the torture victims were less honest.

I said that, in the world where suffering built character, I’d expect the people I most admired and wanted to model myself after to have suffered more than the general population.  I thought about saying that, in the pro-suffering world, people would use a lot of positive language to talk about suffering (more specifically, that it would sound similar to the way non-athletes talk about exercise), but I wasn’t confident enough that this wouldn’t happen in the Tuesday world, too (because people have Panglossian tendencies) to use it as a criterion.

We ended up with pretty different approaches, since my partner was trying to construct the best experiment he could, practicality and ethics be damned, while I was trying to think, in the course of a normal day, what could I notice that would vary across worlds.  I think both approaches can be helpful, especially since we don’t usually carry out experiments for all our beliefs, even if they’d help us be more accurate.

Do folks want to play along with that prompt in the comments?  I guess you can also propose other True-on-Monday/False-on-Tuesday ideas and suggest ways that the two worlds differ.  The funniest example from class was for “The Sun goes around the Earth on Tuesday” and the student said that, if all laws of physics weren’t believed to be different, he’d expect he’d find out that scientists were all in a conspiracy that he could infiltrate to perpetuate a big lie about how physics works.

About Leah Libresco

Leah Anthony Libresco graduated from Yale in 2011. She works as a statistician for a school in Washington D.C. by day, and by night writes for Patheos about theology, philosophy, and math at www.patheos.com/blogs/unequallyyoked. She was received into the Catholic Church in November 2012.

  • http://www.mccaughan.org.uk/g/ g

    I’m not convinced by your proposed way of operationalizing “suffering builds character”, because (1) character (or things correlated with it) might lead to suffering — consider the notion of martyrdom — and (2) you might think you’re admiring character when you’re actually also impressed by suffering. Oh, and (3) if those people are allowed to include ones you’ve merely heard of, the famousness of a person might be affected by what they’ve suffered.

    • B. R. Lind

      POETRY NINJA!

      We shouldn’t worship suffering: the world’s
      a spinning rack where suffering indicates
      all goes well we’re alive and not curled
      up in the black hushhush death dictates
      as its first condition: no screaming there

      We crown ourselves with thorns of past
      transgressions Sharp spears of deed spare
      no rib of pain: around the cross crashed
      common lightning usual blood Who earns
      our reverence should break both cross and crutch
      in the face of suffering: while the rack turns

      and tightens they’ll smile at the sense of touch
      Suffering’s too common to be worth
      anything joy too rare to be priced
      The saints we search for will embrace the earth:
      what wild-eyed murderer suffers less than Christ?

      by Peter Meinke
      Source: http://www.poets.org/viewmedia.php/prmMID/19098

      Sorry if this is off-topic. I just love this poem, and your comment (“you might think you’re admiring character when you’re actually also impressed by suffering”) reminded me of it. And hey, it alludes to the Passion, so it’s relevant to the blog, right?

  • Matthew

    Those games are really addicting. I’ll have a go at your prompt. I’m going to assume that you mean suffering broadly and not narrowly (i.e., not only suffering in pursuit of a worthy cause). On Monday, crime is low in economically depressed areas; on Tuesday it’s high (or at least not low). I was tempted to predict that social mobility would be greater in a world where suffering builds character, but I’m not all that sure character correlates with worldly success. But it surely correlates with a low propensity to commit crime. The poor suffer more. If suffering, broadly, built character, the poor would be less likely to commit crimes.

    • evetushnet

      Doesn’t this assume that the category “things which are illegal” maps VERY well onto the category “things which are really wrong to do”? If you’re in a society which has poor and rich in the first place, I am guessing you’re also in a society in which the rich can both a) influence which activities are considered criminal and b) hide their crimes better.

      • deiseach

        Agreed. Over here, we have to pay for television licences. Long story short, it’s a way for the national public television and radio service to be financed (though unlike the BBC in Britain, they are also permitted to run ads for the revenue) and it’s a hangover from the old days when anyone in possession of “any apparatus for wireless telegraphy” had to have a licence for same (yes, from 1926 up until 1972, if you had a radio in your house, you needed a licence).

        There have been decades worth of ads trying to tar people who don’t have a licence as “spongers” and criminals who are somehow defrauding the rest of us. Most people pay for their licence because otherwise they’ll end up in court and be fined, not because they think “Oh, no! What an awful crime!”

        Yes, they have inspectors going round to houses to check if people have licences. In our neighbouring island, they have detector vans (public opinion is divided as to whether these actually work as they are alleged to do – that is, they can detect the signal of a tv set and so know you have a television set in your house – or are just part of an elaborate PR campaign to psychologically harass people into buying a licence by fear of “Big Brother is out on the streets and he knows if you’ve got a tv”).

        Me personally? Yes, I pay, but if I saw in the local paper that a neighbour was up in court for not paying, my reaction would not be “Justice never sleeps and crime has been rightfully punished!” It might be illegal but I’m not convinced it’s immoral (unlike, say, members of the boards of directors of banks which were bailed out with taxpayers’ money going to court to force payment of the bonuses and ‘golden handshakes’ written into their contracts).

  • jenesaispas

    Glad you’re having a good time.
    I tried the calibration game but it wasn’t really working out very well for me because I’m not American. Oh well…:)

    • deiseach

      That’s how I soothed my vanity – if the majority of the questions hadn’t been on American topics (how the heck do I know which Vice President came first or who won the Superbowl?), I would have scored much, much higher :-)

      • deiseach

        Sample of questions I could have scored really highly on for the Calibration Game:

        Who won the Munster Hurling Final in 2011?
        How do you bucket feed a sucking calf?
        Hay or straw? (Identify picture)
        Oats, wheat, or barley? (Picture question)
        Head of Irish state? (Trick question – we all know it’s Angela Merkel!)

  • http://last-conformer.net/ Gilbert

    I think they implemented your exercise badly. They are making you think about specific predictions, but in this case it’s actually the hypothesis itself that needs to get a lot more specific.

    Some type and intensity of suffering builds some aspects of character. And I think it’s more likely to show up by the test you’re thinking about.

    But basically you and your partner are both testing different hypotheses (yours likely true, his almost certainly false) both of which sound like reasonable interpretations of “the” hypothesis you are supposed to examine. And that in an exercise on getting specific.

  • http://thelostcoin.org Marc

    Updating Game does not open in Mac OS.

  • http://last-conformer.net/ Gilbert

    Your Bayes score is NaN.

    You are underconfident. You should make your intervals Infinity times more narrow. If you did that your Bayes score would be NAN points higher. (Assigning NaN times as much probability to the data.)

    So if I should make my intervals Infinity times more narrow that must mean I always hit the target exactly. Sorry folks, the singularity is over and I won.

    (But more fundamentally, who came up with the $%&§$Q! idea to assume all my estimated probability distributions would be normal and base the scores on that? This is bunk even if one buys the technical explanation of technical explanation stuff.)

    • http://moralmindfield.wordpress.com Brian Green

      Now that you’ve won the singularity can you share it with the rest of us? I want a piece of it to stick on my wall.

  • PJ Jedlovec

    I think it would be interesting to see data on how different people do on the Calibration game to see if certain types of people tend to overestimate or underestimate their certainty. Nonetheless, I think it’s a great game to play to keep ourselves honest about how much we really know or don’t know!

  • http://moralmindfield.wordpress.com Brian Green

    Thanks for sharing the games, Leah. The calibration game is a lot of fun, but I cannot get the updating game to give me results that I can understand. Something about infinitely wider intervals. I suppose that would tend to work…

