I have been a Bayesian since I first learned Bayes’ theorem when I was a teaching assistant for a Statistics 101 course at UC Berkeley around 25 years ago.

It’s one of those neat ideas that one should know to be able to reason competently about the world we live in. Thanks to the Indian education system, I had not been exposed to any hint of that fascinating rule in my 18 years of attending school in India. Anyway, better late than never.

The late great John McCarthy insisted that “he who refuses to do arithmetic is doomed to talk nonsense.” I think he who is ignorant of Bayes’ rule cannot avoid talking nonsense about probabilistic events. Consider the question posed in the image above.

The hypothetical diagnostic test at hand is not perfect but it appears to be fairly accurate, since its false positive rate is only 5%. (There is no mention of false negatives in the formulation above. So let’s assume that it has zero false negatives.) The question then is “what is the chance that a person who tested positive actually has the disease?”

We are not told if the person has symptoms of the disease or not. We are told that the disease is quite uncommon — only 1 out of 1000 people in the population actually has the disease.

Given that the person tested positive, is it 95% probable that he has the disease? Many would say yes. Bayes would slap them silly and say no. The correct answer is around 2%. Even if the test comes out positive, the actual chance that the person has the disease is very small. If you are the betting kind, bet that he’s disease free.

Bayes’ rule appears a little intimidating when written out as a formula with probabilities. So I will not go into it. Look up the wiki page if you wish. Here I will give the logic of the result using simple arithmetic.

Imagine that the population is only 1000, and the test is administered to everyone. The test will come back positive for about 50 of the 999 healthy people, since it has a 5% false positive rate. But we know that only 1 out of those 1000 people actually has the disease, and that person also tests positive (we assumed no false negatives). That means of the roughly 51 people who tested positive, only 1 has the disease and the other 50 or so don’t. Therefore there is only around a 2% chance that a person who tested positive actually has the disease.
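The same head count can be checked with a few lines of Python. This is only a sketch under the assumptions stated in this post: a prevalence of 1 in 1000, a 5% false positive rate, and zero false negatives. The function name `chance_of_disease_given_positive` and its parameters are made up for illustration.

```python
from fractions import Fraction

def chance_of_disease_given_positive(population, prevalence, false_positive_rate):
    """Share of positive tests that are real, assuming zero false negatives."""
    sick = population * prevalence                    # assumed: every sick person tests positive
    healthy = population - sick
    false_positives = healthy * false_positive_rate   # healthy people wrongly flagged
    return sick / (sick + false_positives)

p = chance_of_disease_given_positive(1000, Fraction(1, 1000), Fraction(5, 100))
print(f"{float(p):.2%}")  # prints 1.96%
```

Using exact fractions keeps the 49.95 expected false positives unrounded, which is why the script reports 1.96% rather than a flat 2%.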

Without the test, we’d estimate the chances of a person having the disease to be 0.1 percent. Doing the test bumps up that probability about 20 times, to 2 percent. That is a huge increase but still not all that compelling because of the low prevalence of the disease in the population, combined with a pretty flawed test. A positive test in this case is only an indicator that more investigation is warranted.

It’s worth noting that a test with a 5% false positive rate would be quite good to have if the actual prevalence of the disease is very high, say, 60 percent. I leave it to you to do the arithmetic.
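If you would like to check your arithmetic against a script, here is a minimal sketch. It again assumes zero false negatives, as earlier in this post; `posterior` and its `sensitivity` parameter are hypothetical names chosen for illustration.

```python
def posterior(prevalence, false_positive_rate, sensitivity=1.0):
    """P(disease | positive test) by Bayes' rule.

    sensitivity=1.0 encodes the assumption of zero false negatives.
    """
    true_pos = prevalence * sensitivity                 # sick and flagged
    false_pos = (1 - prevalence) * false_positive_rate  # healthy but flagged
    return true_pos / (true_pos + false_pos)

for prevalence in (0.001, 0.6):
    print(f"prevalence {prevalence:.1%}: posterior {posterior(prevalence, 0.05):.1%}")
# prevalence 0.1%: posterior 2.0%
# prevalence 60.0%: posterior 96.8%
```

The test is the same in both rows; only the prior changes, and that alone moves the answer from about 2% to about 97%.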

***

If you wish to learn Bayes’ theorem, Steven Pinker has a lecture on it (the source of the question I discussed) in his “Rationality” series of lectures that I mentioned in a previous post. Check it out for a link. I highly recommend the entire course. You may become more rational, and who knows, you may become a Bayesian.

SRINIVAS K, Wednesday May 6, 2020 / 9:28 pm

Forget about understanding Bayes’ theorem; most people cannot figure out the probability of repeated random events that are independent. For example, if you have got ten heads in a row in a coin toss, they think the chance of getting a tail on the next toss is higher, but it is still 0.5, as each toss is independent. On a serious note, the same mistake prompts many people to try for a son after getting many daughters in succession.

Engr. Ravi, Sunday May 10, 2020 / 9:52 am

I guess you haven’t heard about Nassim Taleb and Fat Tony. In the real world if you get ten heads in coin toss successively it is very likely that the coin is loaded. Only in textbooks, one can simply assume that the coin is fair and each toss is completely independent.

baransam1, Wednesday May 6, 2020 / 11:34 pm

I have spent 5 minutes with this post now. I still think that since the person has been tested and found positive, his probability of actually having the disease is 95%. The fact that disease is prevalent in 1 among 1000 has nothing to do with the final answer.

I guess Mr Bayes has already slapped me N-times by now. I will keep munching on the problem though, till I see how can the answer be 2% as suggested by Atanu.

baransam1, Thursday May 7, 2020 / 12:25 am

20 minutes now, and I am still at 95%. Just to ensure I got the question properly, the probability is being asked AFTER we have found out that a person has tested positive. Right?

Assume the test is 100% accurate (no false positive, no false negative). In that case, if a person has a positive result, it means he has 100% probability (i.e. sure) of having the disease. The fact that the disease prevalence is 1 in 100 or 1 in 1000 or 1 in 10-crore is irrelevant.

So why does the prevalence become a factor for a 95% accurate test?

Alok Jain, Thursday May 7, 2020 / 5:45 am

Simply because, for a rare disease, the group of people who do not have the disease is so large that even a small rate of false positives renders the test statistics too vague. The basic advance Bayes’ theorem brought to the probabilistic approach is taking prior information into account (here, the prior probability given by the prevalence of the disease) to arrive at more sensible posterior probabilities.

baransam1, Thursday May 7, 2020 / 6:32 am

I can work out how we can get the 1.96%. Let me call that workout#2. Workout#1 is, of course, the quick reasoning which leads me to 95% answer.

Workout#2:

Since I do not like dealing with fractional human beings, I will assume 1 in 1000 to mean 100 people in 100000 (1 lakh) are really positive.

That means 100000 – 100 = 99900 are healthy.

But due to faulty test-kit, 99900*5/100 = 4995 will be marked false positive.

So the total number of marked-positive people = 4995 (false positive) + 100 (real positive).

Probability of a marked-positive to be real-positive = 100/(100 + 4995) = 1.96%.

But honestly, I do not understand the underlying reason why workout#1 or workout#2 is correct? Which one is incorrect and why?

This is not the first time Atanu has done it. He did it to me with ‘I see blue-eyed people’ puzzle as well.

What a smashing blog, this deeshaa is.

Alok Jain, Thursday May 7, 2020 / 6:41 am

It’s about prior probability (prior knowledge or heuristic) and posterior probability (after observation or test).

In the case of a large sample size and a low prior probability, Bayes gives a more sensible outcome. Resources which could have been applied to true positives may be wasted on large numbers of false positives.

Ajit R. Jadhav, Thursday May 7, 2020 / 12:58 pm

The terms “prior” and “posterior” are prevalent in the Bayesian circles, but their usual usage can be a bit misleading.

The prior [and in fact both the marginals] is [are] evaluated in a broader context, viz., that of the universal set (applicable in a situation). The posterior is evaluated in the context of a selected prior.

What matters is such logical relationships between sets, i.e., their respective referents or scopes. The chronological order doesn’t matter.

Simple examples to illustrate the point can be easily constructed (and often happen in real life research). For instance, think of (i) measuring the “rate” (i.e. the probability w.r.t. the universal set) first, i.e., at a time before ever (ii) measuring or separately, and only afterwards in time (iii) arriving at a conclusion of what and are really like, which, in the Bayesian terminology, happen to be your “prior”s.

Best,

–Ajit

Ajit R. Jadhav, Thursday May 7, 2020 / 4:29 am

Am I a Bayesian? No, I am not. I am neither a Bayesian nor a frequentist, though I do have a definite sense of my grounds. I think my position can best be described as being an abstract frequentist. (Explaining it well will take much more than a comment.)

After my data science studies, I find that Bayes’ is one of those theorems that have been too grossly over-rated. (Other examples: Heisenberg’s principle in QM; cycles-based analyses in economics / stock market predictions…)

Classification problems should be the home-ground for using Bayes’ theorem. Yet, during my recent work on the MNIST problem (as almost every time before), I found that not bothering with Bayes’ theorem itself was actually being more helpful. The theorem (its formal statement) is actually a hindrance to developing a good understanding of the situation. Even in situations where the theorem is at all applicable, practically speaking.

With that said, of course, you still must understand that layer of concepts which lies underneath Bayes’ theorem. This part is absolutely necessary.

This underlying layer involves Venn’s diagrams, and understanding the fulcrum on which Bayes’ theorem rests, namely, the fact that the very definition of conditional probability changes the sample space being used. … Understand this part well, and you never have to bother with the higher-level theorem itself.

The best explanation of Bayes’ theorem, in my opinion, is that by Oscar Bonilla; see here: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/.

… His explanation is great precisely because he focuses on the underlying layer. […Too bad such resources weren’t available when I was in various graduate schools.]

If you will permit me one further point: Bonilla’s explanation subtly uses a certain viewpoint which can only be described as being a frequentist approach. (An abstract version of the frequentist approach, really speaking.)

But then, anything to do with frequencies (including even abstract views of the frequentist arguments) is an anathema to the true Bayesian. And with that kind of a bias, they can only end up being very poor teachers. They do. Typically, the Bayesians are the worst teachers of Bayes’ theorem, in my observation.

Typically, academics love deductions, and so, typically, they are Bayesians. Typically, therefore, professors are the worst teachers of Bayes’ theorem.

And as I said, practically speaking, you don’t need the theorem itself in most any practical situation. You need only Venn’s diagrams, set operations, and a careful attention to which set is being used as the sample space to define which probability. That’s all. Do this part well, and you will likely never go wrong.

Best,

–Ajit
