Exercise Solution 4.6

With hypothesis testing, we would like to ask the question “how probable is it that H0 is true given the data {x[1], x[2], … , x[m]} we obtained?” Instead, we ask “how probable would it have been for us to obtain the data {x[1], x[2], … , x[m]}, assuming H0 is true?” These are very different questions. The answer to the first may depend on a variety of factors that may or may not be known. The second can be answered based only on the probability distribution of X, which will typically be fully specified given an assumption that H0 is true.

Consider an example. A coin is randomly drawn from a bag that is known to contain nine fair coins and one non-fair coin, for which the probability of obtaining heads on a single toss is 0.52. The coin is tossed 100 times, and the proportion of heads is noted. The null hypothesis H0 is that it was the non-fair coin we drew from the bag and tossed.

The question “what is the probability that the coin was non-fair given the proportion of heads we obtained?” can be answered only by considering both the proportion of heads obtained and the fact that the coin was drawn from a bag that contained nine fair coins and one non-fair coin. The latter factor will dominate the answer, which will be close to 0.10 for any proportion of heads that is reasonably likely to occur in 100 tosses. This is because 100 tosses is too small a sample to distinguish reliably between heads probabilities of 0.50 and 0.52. Only if the coin were tossed many thousands of times would the proportion of heads obtained contribute significantly to the answer to the first question.
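The effect of the bag's composition can be checked numerically. The following is a minimal sketch, assuming Bayes' theorem is applied with a prior probability of 0.10 for the non-fair coin and binomial likelihoods for the number of heads. The function names and the head counts tried are illustrative and are not part of the exercise.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_non_fair(k, n=100, prior=0.10, p_non_fair=0.52, p_fair=0.50):
    """Probability the coin is the non-fair one, given k heads in n tosses.

    Assumes a prior of 0.10 (one non-fair coin among ten) and applies
    Bayes' theorem with binomial likelihoods for both kinds of coin.
    """
    weight_non_fair = prior * binom_pmf(k, n, p_non_fair)
    weight_fair = (1 - prior) * binom_pmf(k, n, p_fair)
    return weight_non_fair / (weight_non_fair + weight_fair)


# Illustrative head counts near the middle of the range.
for heads in (45, 50, 52, 55):
    print(heads, round(posterior_non_fair(heads), 3))
```

For head counts in this range, the result stays within a few hundredths of the prior of 0.10, which reflects how little 100 tosses can say about a difference between 0.50 and 0.52.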

On the other hand, the question “what is the probability of obtaining the proportion of heads that we did, assuming we were tossing the non-fair coin?” can be answered directly from the B(100, 0.52) distribution. The fact that the coin was drawn from a bag known to contain nine fair coins and one non-fair coin is irrelevant because the question assumes that the non-fair coin was drawn.
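As a brief illustration, probabilities under the B(100, 0.52) distribution can be computed without any reference to the bag. This is a sketch only: the observed count of 58 heads is a hypothetical value chosen for the example, and the function names are illustrative.

```python
from math import comb


def binom_pmf(k, n=100, p=0.52):
    """Probability of exactly k heads under the B(n, p) distribution."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n=100, p=0.52):
    """Probability of k or more heads under the B(n, p) distribution."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


observed_heads = 58  # hypothetical observation, not taken from the exercise
print("P(exactly", observed_heads, "heads) =", binom_pmf(observed_heads))
print("P(at least", observed_heads, "heads) =", prob_at_least(observed_heads))
```

Neither function takes the numbers of fair and non-fair coins as an input, since the calculation assumes from the start that the non-fair coin is the one being tossed.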