You’ve just been to the doctor, and she has some bad news. There’s a deadly new disease sweeping the population – one which strikes 1 out of 100 people and invariably kills everyone who catches it. Medical science has developed a test that is 95% accurate during the incubation period: that is, when given to someone who has the disease, it correctly returns a positive result 95% of the time, and when given to someone who does not have the disease, it correctly returns a negative result 95% of the time. Your test results have just come back positive. Are you doomed?
Before I say anything more, a simple exercise: Based on the facts I just gave, estimate the probability that you actually have the disease. A detailed calculation isn’t necessary – just write down what you think the general neighborhood of the number is.
So, a 95% accurate test has indicated that you have this 1-in-100, invariably fatal disease. Should you feel terrified? Should you feel despair? Should you call your lawyer and start making out your will?
Nope. In fact, you should be optimistic. The probability that you actually have the disease is only about 16%.
Did you write down 95%, or something in that range? If so, you’re probably not alone. That’s the common answer. But without an education in statistics, human beings are not very good at calculating conditional probabilities off the cuff. To see where your unexpected reprieve came from, we have to consider a famous statistical principle called Bayes’ rule.
Named for its discoverer, the 18th-century mathematician Thomas Bayes (also a Christian minister, ironically), Bayes’ rule is a theorem for calculating conditional probabilities. In other words, if you have two possible events, A and B, and you know the overall probability of each as well as the probability of B given A, Bayes’ rule is the formula for determining the probability of A given B.
This may sound abstract, so let’s frame it in terms of concrete events rather than variables. Let’s say that A is the event of catching the deadly disease I mentioned earlier. Its probability is 1 out of 100, or in other words, 1%. We can write that as 0.01 for convenience.
Let’s also say that B is the event that your test results come back positive. Its probability is a little more complicated to calculate, but we’ll come back to it. We know the probability of B given A – given that you actually have the disease, the probability of a positive test is 95%, or 0.95. What you really want to know is the probability of A given B – the probability that you actually have the disease, given a positive test result.
Here’s what Bayes’ rule says (read “|” as “given”):
P(A|B) = ( P(B|A) * P(A) ) / P(B)
Let’s fill this out with some numbers. P(A), the probability of having the disease, is 0.01. P(B|A), the probability of a positive test result given that you have the disease, is 0.95. That gives us:
P(A|B) = ( 0.95 * 0.01 ) / P(B)
What we need to know is P(B), the overall probability of a positive test result. To figure this out, let’s break it down into cases.
There are two ways to get a positive test result: you can have the disease and correctly test positive, or you can be free of the disease and test positive anyway. P(B) is the sum of the probabilities of those two cases.
1% of people have the disease, and 95% of those will test positive. That gives us:
0.01 * 0.95 = 0.0095
99% of people do not have the disease, and 5% of those will test positive. That gives us:
0.99 * 0.05 = 0.0495
Adding up these terms, we get an overall P(B) of:
0.0095 + 0.0495 = 0.059, or 5.9%
Now, put that term into our equation:
P(A|B) = ( 0.95 * 0.01 ) / P(B)
P(A|B) = ( 0.95 * 0.01 ) / 0.059
P(A|B) = ( 0.0095 ) / 0.059
P(A|B) = 0.161, or about 16%
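The same calculation can be written as a few lines of Python, which makes it easy to try other numbers. This is only a sketch: the function name posterior and its parameter names are my own labels, not standard terminology from this post. Here "sensitivity" stands for P(B|A), and "specificity" stands for the chance of a correct negative result in someone without the disease.

```python
def posterior(prior, sensitivity, specificity):
    """Return P(A|B): the probability of disease given a positive test."""
    # P(B|A) * P(A): people who have the disease and test positive
    true_positive = sensitivity * prior
    # People who do not have the disease but test positive anyway
    false_positive = (1 - specificity) * (1 - prior)
    # P(B): the overall probability of a positive test
    p_positive = true_positive + false_positive
    return true_positive / p_positive


p = posterior(prior=0.01, sensitivity=0.95, specificity=0.95)
print(f"{p:.3f}")  # 0.161
```

Changing the prior shows how much the rarity of the disease matters: if half the population had the disease, the same test would give a posterior of 95%.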
It seems like mathematical sleight of hand, but it’s not. A more intuitive way to explain this result is this: the test is highly accurate, but the disease is rare. Therefore, the vast majority of people who are tested won’t actually have it – and the number of false positives from that group, though small compared to the size of that group, is larger than the relatively small number of people who actually have the disease and correctly test positive.
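To make that intuition concrete, here is a small sketch that counts actual people instead of multiplying probabilities. The group size of 10,000 is an assumption chosen only because it gives round numbers. Any group size produces the same ratio.

```python
population = 10_000            # assumed group size, chosen for round numbers
sick = population // 100       # 1 in 100 has the disease: 100 people
healthy = population - sick    # the remaining 9,900 people

true_positives = sick * 95 // 100      # 95% of the sick test positive: 95
false_positives = healthy * 5 // 100   # 5% of the healthy test positive: 495

# Of everyone holding a positive result, what share is actually sick?
share_sick = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, f"{share_sick:.3f}")  # 95 495 0.161
```

Out of 590 positive results, only 95 belong to people who have the disease, which is the same 16% as before.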
Bayesian reasoning turns up in critical thinking contexts as well. One of the best examples is in criminal trials, where both sides often make claims about the odds of a particular outcome occurring given the defendant’s guilt or innocence. For example, let’s assume that DNA is found at a crime scene, the police cross-check it against a database, and a match is found. The odds that the crime scene DNA would match a randomly selected person’s DNA are 1 in 10 million. The police therefore arrest the person whose DNA matches and haul him into court. But the odds of the suspect’s innocence are not 1 in 10 million, even though those are the odds of a chance match. In fact, in a country the size of America, with about 300 million people, we would expect about 30 people in the populace to match. Thus, absent any other evidence, the chances that the specific person being accused is actually the guilty party are only 1 in 30, or about 3%! (Mistaking the odds of a chance match for the odds of innocence is sometimes called the prosecutor’s fallacy.)
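The arithmetic behind the DNA example is short enough to check directly. This sketch rests on two simplifying assumptions that should be stated loudly: exactly one of the matching people is the guilty party, and there is no other evidence to tell the matching people apart.

```python
population = 300_000_000   # rough population figure used in the example
match_odds = 10_000_000    # a random person matches 1 time in 10 million

# Expected number of people in the country whose DNA would match
expected_matches = population // match_odds

# Assumption: one matching person is guilty and nothing else
# distinguishes the accused from the other matching people.
p_guilty = 1 / expected_matches
print(expected_matches, f"{p_guilty:.1%}")  # 30 3.3%
```

Any additional evidence, such as a motive or a known presence near the scene, would shrink the pool of plausible suspects and raise this figure considerably.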
The lesson to be learned here is that we should never let single cases or anecdotes guide our decisions. Given a sufficiently large pool of chance events, even very unlikely things are bound to happen on occasion. Bayes’ rule gives us the tools to see those occurrences in context. On the other hand, people who pay attention only to unique and striking events, while disregarding the background they come from, are almost certain to reach incorrect conclusions.