Earlier today, in my final Sondheim post, I was writing in praise of the grotesque. This post is on the same topic, but I’m code-switching a bit and approaching it in a more analytic, rather than aesthetic, framework. Over at LessWrong, there’s a great illustration of how positive bias works:
I am teaching a class, and I write upon the blackboard three numbers: 2-4-6. “I am thinking of a rule,” I say, “which governs sequences of three numbers. The sequence 2-4-6, as it so happens, obeys this rule. Each of you will find, on your desk, a pile of index cards. Write down a sequence of three numbers on a card, and I’ll mark it “Yes” for fits the rule, or “No” for not fitting the rule. Then you can write down another set of three numbers and ask whether it fits again, and so on. When you’re confident that you know the rule, write down the rule on a card. You can test as many triplets as you like.”
Here’s the record of one student’s guesses:
4, 6, 2 No
4, 6, 8 Yes
10, 12, 14 Yes
At this point the student wrote down his guess at the rule. What do you think the rule is? Would you have wanted to test another triplet, and if so, what would it be? Take a moment to think before continuing.
Think of your answer before you pop over and read the whole thing. I’ll wait.
— — —
It’s important to practice thinking of negative results that your model predicts (and then coming up with a way to test them). I think there’s another interesting class of results to try to check: what’s the negative result that comes closest to passing your model’s test?
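To make that concrete, here’s a minimal Python sketch of the triplet game, assuming the actual rule from Eliezer’s post (any three strictly ascending numbers) and, as a purely illustrative assumption, a student hypothesis that the numbers count up by two:

```python
# Hidden rule from Eliezer's 2-4-6 post: three strictly ascending numbers.
def hidden_rule(triplet):
    a, b, c = triplet
    return a < b < c

# Illustrative (assumed) student hypothesis: each number is two more than the last.
def student_hypothesis(triplet):
    a, b, c = triplet
    return b == a + 2 and c == b + 2

# Positive-bias testing: only try triplets the hypothesis predicts are "Yes".
positive_tests = [(4, 6, 8), (10, 12, 14), (1, 3, 5)]
all_agree = all(hidden_rule(t) == student_hypothesis(t) for t in positive_tests)
print(all_agree)  # prints: True -- every test confirms the wrong hypothesis

# Testing a predicted *negative*: the hypothesis says "No", but the rule says "Yes".
near_miss = (2, 4, 5)
print(student_hypothesis(near_miss), hidden_rule(near_miss))  # prints: False True
```

Only the predicted-negative test exposes the mismatch; the positive tests alone would have confirmed the wrong rule forever.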
It’s a little less useful in the example given in Eliezer’s post, but once you’re getting into fuzzier philosophical arguments, I think it could be helpful. What might pass a good-enough diagnostic, but doesn’t actually belong in the category? If I’m trying to explain the characteristics of a good relationship, what kind of relationship would take the most work to differentiate from one that met the criteria?
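Even in the triplet game, you can operationalize “the negative result that comes closest to passing” in a crude way: enumerate failing triplets and keep the ones that are non-decreasing, i.e. that fail only because the rule demands a strict increase. (This closeness criterion is an assumption for illustration, not part of the original post.)

```python
from itertools import product

# Hidden rule from Eliezer's post: three strictly ascending numbers.
def hidden_rule(triplet):
    a, b, c = triplet
    return a < b < c

# All small triplets that fail the rule...
failures = [t for t in product(range(1, 6), repeat=3) if not hidden_rule(t)]

# ...and, among them, the "near misses": non-decreasing triplets that
# fail only by an equality, like (2, 4, 4).
near_misses = [t for t in failures if t[0] <= t[1] <= t[2]]

print((2, 4, 4) in near_misses)  # prints: True
print((3, 5, 2) in near_misses)  # prints: False -- a clear miss, not a near one
```

A near miss like 2-4-4 is the kind of test case that forces you to articulate exactly which feature (here, strictness) bars it from membership.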
It can be interesting when a small change pushes a hypothetical clearly into one category or the other, but I’m most energized when I’m pushing a thought-experiment around what I guess is the uncanny valley of concepts. This is what I’m talking about when I say the love story in Passion is grotesque; it’s very close to the thing we’re looking for, but the divergences are jarring.
Once you’re playing around with tension and paradox, you may realize the fault lies not in your test cases but in your partitioning of conceptspace. If the grotesque example really does belong to your rule, maybe you’ve been ignoring some of the dangers of the ideal you were interested in. If, instead, you can figure out what feature bars it from membership in the set, you can look back at some ideas you’ve previously coded as positive results and make sure they didn’t sneak this quality through in some milder form.
And, for an added benefit: this skill also ends up being a drill in modeling other minds. If you can figure out where an ambiguity or obscurity in your model might lead someone into an honest mistake, maybe that practice will help you defuse your combat reflexes in an argument when you’re about to leap to accusing your interlocutor of willfully misinterpreting or ignoring something they should know.