Some supporters of the null hypothesis significance testing procedure recognize that the logic on which it depends is invalid because it yields only the probability of the data given the null hypothesis, not the probability of the null hypothesis given the data (e.g., J. Krueger, 2001). However, these supporters argue that the procedure is good enough because they believe that the probability of the data given the null hypothesis correlates with the probability of the null hypothesis given the data. The present authors' main goal was to test the size of the alleged correlation; to date, no other researchers have done so. The present findings indicate that the correlation is unimpressive and fails to provide a compelling justification for computing p values. Furthermore, as the significance rule becomes more stringent (e.g., .01, .001), the correlation decreases.

Keywords: Bayes, p-rep, p value

<aside> 💡 THERE HAS BEEN A GREAT DEAL OF CONTROVERSY about the null hypothesis significance testing procedure (NHSTP) involving a large number of supporters (e.g., Abelson, 1997; Chow, 1998; Hagen, 1997; Mulaik, Raju, & Harshman, 1997) and a large number of detractors (e.g., Bakan, 1966; Cohen, 1994; Rozeboom, 1960; Schmidt, 1996; Schmidt & Hunter, 1997). Stated briefly, NHSTP requires that the researcher propose a null hypothesis and an alternative hypothesis, collect data, and use the data to compute the probability of obtaining a finding as extreme or more extreme than the one actually obtained, given that the null hypothesis is true. If this probability is low (e.g., p < .05), then the researcher rejects the null hypothesis in favor of the alternative hypothesis. Otherwise, the null hypothesis is not rejected.

</aside>

Good definition


<aside> 💡 Possibly, the most compelling argument against NHSTP is that it is logically invalid. Stated simply, the fact that a rare finding, given the null hypothesis, has been obtained does not justify the conclusion that the null hypothesis is likely to be false.

</aside>


This has been pointed out numerous times (for a review, see Nickerson, 2000), and Trafimow (2003) even provided a quantitative demonstration of the invalidity.

For those unfamiliar with the details of the controversy concerning NHSTP, it may be useful to consider an example.

Suppose that a researcher randomly assigns participants to receive money or not to receive money and then measures how much the participants like the experiment. The experimental hypothesis is that participants will like the experiment better if they get money than if they do not. The null hypothesis is that the money has no effect (or more technically, that the two groups are drawn from the same population).

The researcher performs a statistical significance test and finds that the p value is .05, which meets the usual criterion for statistical significance. Can the researcher then conclude that the probability of the null hypothesis being true, given this finding, is .05 (or some other very small number)?
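The quantity the significance test reports can be made concrete with a short sketch. The ratings below are hypothetical numbers invented purely for illustration (they do not come from the article); a permutation test then estimates exactly the probability defined above: the chance of a group difference at least as extreme as the observed one, given that both groups are drawn from the same population.

```python
import random
from statistics import mean

# Hypothetical liking ratings (1-10 scale), assumed for illustration only.
money = [8, 7, 9, 6, 8, 7, 9, 8]
no_money = [6, 5, 7, 6, 5, 7, 6, 5]

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Estimate p(F|H0): the probability, under the null hypothesis that
    both groups come from one population, of a mean difference at least
    as extreme as the observed one."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = a + b
    count = 0
    for _ in range(n_perm):
        # Under the null, group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        perm_diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(perm_diff) >= abs(observed):
            count += 1
    return count / n_perm

p = permutation_p_value(money, no_money)
print(f"p(F|H0) is approximately {p:.4f}")
```

Note that every step of this computation assumes the null hypothesis is true; nothing in it yields the probability that the null hypothesis is true.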

It would be nice if this were so because researchers would then have a good reason to reject the null hypothesis.

<aside> 💡 However, this conclusion is not justified. To see why, it is useful to consider exactly what p is: the probability of the finding (or a more extreme finding) given that the null hypothesis is true. Or, in terms of the present example, it is the probability of obtaining the finding given that the samples of participants who were or were not given money are from the same population.

</aside>

Good definition of p-value


<aside> 💡 Contrary to common belief, p is not the probability that the samples are from the same population given the finding that was obtained. Or, to state the problem another way, the probability of the finding given the null hypothesis, p or p(F|H0), is not the same thing as the probability of the null hypothesis given the finding, p(H0|F).

</aside>


<aside> 💡 Consequently, a low value for p, which is the same thing as p(F|H0), does not allow the researcher to validly conclude that p(H0|F) also has a low value, and therefore the rejection of the null hypothesis cannot be justified on this basis.

</aside>

BKTK: You cannot infer that a low p-value tells you the probability of the null hypothesis is low given the finding of the study.
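A small simulation makes the gap between p(F|H0) and p(H0|F) vivid. All numbers here (the share of true nulls, the effect size when the null is false) are assumptions chosen for illustration, not values from the article. Each simulated study produces a z statistic; we then ask how often the null was in fact true among the studies that reached p ≤ .05.

```python
import math
import random

def two_sided_p(z):
    """p(F|H0): probability of a test statistic at least as extreme as z,
    given the null hypothesis that z ~ Normal(0, 1)."""
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

rng = random.Random(42)
n_studies = 50_000
prior_h0 = 0.9        # assumed: 90% of tested null hypotheses are true
noncentrality = 2.5   # assumed mean of z when the null is false

n_sig = 0
n_sig_null_true = 0
for _ in range(n_studies):
    h0_true = rng.random() < prior_h0
    z = rng.gauss(0.0 if h0_true else noncentrality, 1.0)
    if two_sided_p(z) <= 0.05:
        n_sig += 1
        n_sig_null_true += h0_true

print(f"Among 'significant' results, the null was true "
      f"{n_sig_null_true / n_sig:.1%} of the time")
```

Under these assumed inputs, roughly four in ten significant results come from true nulls, even though every one of them had p ≤ .05, which is exactly the sense in which a low p(F|H0) does not guarantee a low p(H0|F).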


<aside> 💡 The only way to traverse the distance from the former value (which can be obtained from an experiment) to the latter (which cannot be obtained from an experiment) is by using Bayes’s theorem.

</aside>



The details of that theorem are not essential at present, except that its use requires the researcher to have a value for the prior probability of the null hypothesis, a value that is generally unknowable (for a review, see Trafimow, 2006). Many of the researchers who support NHSTP fully understand that it is logically invalid for the reason previously presented but suggest that the logical invalidity need not cause the researcher to completely abandon NHSTP. For example, Krueger (2001) pointed out correctly that NHSTP results in p values, or p(F|H0), that do not allow researchers to validly draw conclusions about p(H0|F), which is what researchers really need to know to reject the null hypothesis. But Krueger also suggested, correctly, that these two values are correlated. Because p(F|H0) is correlated with p(H0|F), and because the data necessary to actually obtain p(H0|F) are generally unobtainable, it may be reasonable for researchers to settle for p(F|H0).
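The role of the prior can be shown with a few lines of arithmetic. All three inputs below are assumed, illustrative numbers (not values from the article): the p value itself, a likelihood for the finding when the null is false, and a prior for the null. Bayes' theorem then converts p(F|H0) into p(H0|F).

```python
# Illustrative, assumed inputs -- not values from the article.
p_F_given_H0 = 0.05   # the p value the significance test reports
p_F_given_H1 = 0.50   # assumed probability of the finding if H0 is false
prior_H0 = 0.80       # assumed prior probability of the null hypothesis

# Bayes' theorem:
# p(H0|F) = p(F|H0) p(H0) / [p(F|H0) p(H0) + p(F|H1) p(H1)]
posterior_H0 = (p_F_given_H0 * prior_H0) / (
    p_F_given_H0 * prior_H0 + p_F_given_H1 * (1.0 - prior_H0)
)
print(f"p(F|H0) = {p_F_given_H0:.2f}, but p(H0|F) = {posterior_H0:.2f}")
```

With these assumed inputs the posterior works out to about .29, nearly six times the p value of .05, and it would change again under a different prior, which is precisely why the unknowable prior blocks any direct reading of p as p(H0|F).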

Let us reiterate the correlation argument in the specific context of the example.