I have been a Bayesian since I first learned Bayes’ theorem when I was a teaching assistant for a Statistics 101 course at UC Berkeley around 25 years ago.

It’s one of those neat ideas that one should know to be able to reason competently about the world we live in. Thanks to the Indian education system, I had not been exposed to any hint of that fascinating rule in my 18 years of attending school in India. Anyway, better late than never.

The late great John McCarthy insisted that “he who refuses to do arithmetic is doomed to talk nonsense.” I think he who is ignorant of Bayes’ rule cannot avoid talking nonsense about probabilistic events. Consider the question posed in the image above.

The hypothetical diagnostic test at hand is not perfect but it appears to be fairly accurate since the error rate is only 5% false positives. (There is no mention of false negatives in the formulation above. So let’s assume that it has zero false negatives.) The question then is “what is the chance that a person who tested positive actually has the disease?”

We are not told if the person has symptoms of the disease or not. We are told that the disease is quite uncommon — only 1 out of 1000 people in the population actually has the disease.

Given that the person tested positive, is it 95% probable that he has the disease? Many would say yes. Bayes would slap them silly and say no. The correct answer is around 2%. Even if the test comes out positive, the actual chance that the person has the disease is very small. If you are the betting kind, bet that he’s disease free.

Bayes’ rule appears a little intimidating when written out as a formula with probabilities. So I will not go into it. Look up the wiki page if you wish. Here I will give the logic of the result using simple arithmetic.

Imagine that the population is only 1000, and the test is administered to everyone. The test will come back positive for about 50 healthy people, since the test has a 5% false positive rate. But we know that only 1 out of those 1000 people actually has the disease, and with zero false negatives that person tests positive too. That means of the roughly 51 people who tested positive, only 1 has the disease and the other 50 or so don’t. Therefore there is only around a 2% chance that a person who tested positive actually has the disease.
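The same head count can be sketched in a few lines of Python. This is only an illustration: the function name `count_positives` and its parameters are made up here, and it assumes, as above, zero false negatives. It applies the 5% false positive rate to the 999 healthy people only, so the counts are not whole numbers.

```python
def count_positives(population, prevalence, false_positive_rate):
    """Expected numbers of true and false positives.

    Assumption: zero false negatives, so every sick person tests positive.
    """
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick  # no false negatives assumed
    false_positives = healthy * false_positive_rate
    return true_positives, false_positives


true_pos, false_pos = count_positives(1000, 1 / 1000, 0.05)
chance = true_pos / (true_pos + false_pos)

print(f"true positives:  {true_pos:.2f}")
print(f"false positives: {false_pos:.2f}")
print(f"chance of disease given a positive test: {chance:.1%}")
```

The last line divides the people who are actually sick by everyone who tested positive, which is all that Bayes’ rule amounts to in this example.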

Without the test, we’d estimate the chances of a person having the disease to be 0.1 percent (1 in 1000). Doing the test bumps up that probability 20 times, to 2 percent. That is a huge increase but still not all that compelling because of the low prevalence of the disease in the population, combined with a pretty flawed test. A positive test in this case is only an indicator that more investigation is warranted.

It’s worth noting that a test with 5% false positives would be quite good to have if the actual prevalence of the disease were very high, say, 60 percent. I leave it to you to do the arithmetic.
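If you would rather let a computer do that arithmetic, here is a minimal sketch. The function name `posterior` and the list of prevalences are my own choices for illustration, and the false negative rate defaults to zero, as assumed earlier.

```python
def posterior(prevalence, false_positive_rate, false_negative_rate=0.0):
    """P(disease | positive test), computed with Bayes' rule."""
    sensitivity = 1.0 - false_negative_rate
    positive_and_sick = prevalence * sensitivity
    positive_and_healthy = (1.0 - prevalence) * false_positive_rate
    return positive_and_sick / (positive_and_sick + positive_and_healthy)


# Same 5% false positive rate, different prevalences (illustrative values).
for prevalence in (0.001, 0.01, 0.1, 0.6):
    chance = posterior(prevalence, 0.05)
    print(f"prevalence {prevalence:.1%}: P(disease | positive) = {chance:.1%}")
```

The test itself never changes in this loop; only the prior does, and the answer moves from about 2 percent to well above 90 percent.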

***

If you wish to learn Bayes’ theorem, Steven Pinker has a lecture on it (the source of the question I discussed) in his “Rationality” series of lectures that I mentioned in a previous post. Check it out for a link. I highly recommend the entire course. You may become more rational, and who knows, you may become a Bayesian.

Forget about understanding Bayes’ theorem; most people cannot figure out the probability of repeated random events that are independent. For example, if you have got ten heads in successive coin tosses, they think the chance of getting a tail on the next toss is higher, but it is still 0.5, as each toss is independent. On a serious note, the same mistake prompts many people to keep trying for a son after getting many daughters in succession.
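A quick simulation makes the independence point concrete. This is a sketch under stated assumptions: the coin is simulated as perfectly fair, and the function name, the number of tosses, and the seed are arbitrary choices made for illustration.

```python
import random


def tails_after_heads_streak(num_tosses, streak_length, seed=0):
    """Count tosses that follow a run of heads, and how many of them are tails."""
    rng = random.Random(seed)
    streak = 0     # current run of consecutive heads
    followups = 0  # tosses made right after at least `streak_length` heads
    tails = 0      # how many of those follow-up tosses came up tails
    for _ in range(num_tosses):
        is_heads = rng.random() < 0.5  # fair coin by assumption
        if streak >= streak_length:
            followups += 1
            if not is_heads:
                tails += 1
        streak = streak + 1 if is_heads else 0
    return tails, followups


tails, followups = tails_after_heads_streak(1_000_000, 10)
print(f"tosses made after ten or more heads in a row: {followups}")
print(f"fraction of those that were tails: {tails / followups:.3f}")
```

The fraction should land close to 0.5, give or take sampling noise, because the simulated coin has no memory of the streak.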


I guess you haven’t heard about Nassim Taleb and Fat Tony. In the real world, if you get ten heads in a row, it is very likely that the coin is loaded. Only in textbooks can one simply assume that the coin is fair and each toss is completely independent.
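This is itself a Bayesian argument, and it can be sketched the same way as the disease test. Everything numeric here is an assumption made for illustration: the loaded coin is taken to be two-headed, and the prior chance of meeting one is set to either 1 in 100 or 1 in a million. The function name `posterior_loaded` is also made up.

```python
def posterior_loaded(prior_loaded, p_heads_loaded, num_heads):
    """P(coin is loaded | num_heads heads in a row), by Bayes' rule."""
    likelihood_loaded = p_heads_loaded ** num_heads
    likelihood_fair = 0.5 ** num_heads
    loaded_and_streak = prior_loaded * likelihood_loaded
    fair_and_streak = (1.0 - prior_loaded) * likelihood_fair
    return loaded_and_streak / (loaded_and_streak + fair_and_streak)


# Hypothetical priors: how common loaded coins are before any toss is seen.
for prior in (0.01, 0.000001):
    chance = posterior_loaded(prior, 1.0, 10)
    print(f"prior {prior:.6f}: P(loaded | ten heads) = {chance:.4f}")
```

With a 1 in 100 prior, ten heads make a loaded coin the likely explanation; with a 1 in a million prior, a fair coin is still the better bet. As with the disease, the answer depends on the prior.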


I have spent 5 minutes with this post now. I still think that since the person has been tested and found positive, his probability of actually having the disease is 95%. The fact that disease is prevalent in 1 among 1000 has nothing to do with the final answer.

I guess Mr Bayes has already slapped me N times by now. I will keep munching on the problem though, till I see how the answer can be 2% as suggested by Atanu.


20 minutes now, and I am still at 95%. Just to ensure I got the question properly, the probability is being asked AFTER we have found out that a person has tested positive. Right?

Assume the test is 100% accurate (no false positive, no false negative). In that case, if a person has a positive result, it means he has 100% probability (i.e. sure) of having the disease. The fact that the disease prevalence is 1 in 100 or 1 in 1000 or 1 in 10-crore is irrelevant.

So why does the prevalence become a factor for a 95% accurate test?
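One way to look at the question is to compute both cases side by side. The sketch below is an illustration, not a definitive treatment: the function name `chance_if_positive` is made up, false negatives are assumed to be zero, and the three prevalences are the ones mentioned above (1 in 100, 1 in 1000, 1 in 10 crore).

```python
def chance_if_positive(prevalence, false_positive_rate):
    """P(disease | positive test), assuming zero false negatives."""
    positive_and_sick = prevalence
    positive_and_healthy = (1.0 - prevalence) * false_positive_rate
    return positive_and_sick / (positive_and_sick + positive_and_healthy)


prevalences = (1 / 100, 1 / 1000, 1 / 100_000_000)

for false_positive_rate in (0.0, 0.05):
    print(f"false positive rate {false_positive_rate:.0%}:")
    for prevalence in prevalences:
        chance = chance_if_positive(prevalence, false_positive_rate)
        print(f"  prevalence {prevalence:.8f}: {chance:.6f}")
```

With a perfect test the healthy group contributes no positives at all, so its size drops out and the answer is 1 whatever the prevalence. Once even 5% of healthy people test positive, the size of the healthy group decides how many false positives are mixed in with the real ones.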


Simply because with a rare disease the group of people who do not have the disease is so large that even a small rate of false positives makes the test statistics too vague. The basic advance Bayes’ theorem brought to the probabilistic approach is taking prior information into account (here, the prior probability given by the prevalence of the disease) to arrive at a more sensible number for the posterior probability.


I can work out how we can get the 1.96%. Let me call that workout#2. Workout#1 is, of course, the quick reasoning which leads me to 95% answer.

Workout#2:

Since I do not like dealing with fractional human beings, I will assume 1 in 1000 to mean 100 people in 100000 (1 lakh) are really positive.

That means 100000 – 100 = 99900 are healthy.

But due to the faulty test kit, 99900*5/100 = 4995 will be marked false positive.

So the total number of marked-positive people = 4995 (false positive) + 100 (real positive).

Probability of a marked-positive to be real-positive = 100/(100 + 4995) = 1.96%.

But honestly, I do not understand the underlying reason why workout#1 or workout#2 is correct? Which one is incorrect and why?

This is not the first time Atanu has done it. He did it to me with ‘I see blue-eyed people’ puzzle as well.

What a smashing blog, this deeshaa is.


It’s about prior probability (prior knowledge or heuristic) and posterior probability (after observation or test).

With a large sample size and a low prior probability, Bayes gives a more sensible outcome. Resources which could have been applied to true positives may be wasted on large numbers of false positives.
