When R. A. Fisher put down the phrase “p=0.05, or 1 in 20” in his famous book, Statistical Methods for Research Workers, very likely with a cup of tea beside his elbow, he might not have realized that this p value would become so magical that it is now as influential as the constant e in mathematics and engineering.
However, Fisher himself didn’t stick to that number. To him, a p value was a rough measure of the evidence against a hypothesis; for example, the usual p value measures the evidence against the null hypothesis. He was very liberal in interpreting p values: sometimes he treated p=0.08 as significant and sometimes he did not. Pearson and Neyman, on the other hand, thought that we should use a fixed cutpoint for statistical tests, probably for the sake of simplicity.
Fisher’s practice has long been viewed as unprincipled. People just love standards, cutpoints, and simplicity. We always say “a conservative estimate is ……” and sometimes we set the cutpoint at p=0.01 to be more conservative.
This notion, however, has been challenged recently. People have gradually realized that a fixed cutpoint cannot tell the whole story, let alone the true story. Take the relationship between a low-fat diet and invasive breast cancer in the Women’s Health Initiative study. The paper reported a p value of 0.09 and concluded that the result was not significant, which the media translated into “no relationship, nothing” etc. Skeptical researchers pointed out that p=0.09 was not that bad, and that if the participants were followed for a longer period, the p value might well fall below 0.05, as suggested by the narrowing confidence interval for the hazard ratio in the plot.
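The skeptics’ argument can be sketched numerically. The snippet below is a rough illustration, not a reanalysis of the trial: it recovers an approximate two-sided p value from a hazard ratio and its 95% confidence interval, using a normal approximation on the log scale, and then asks what happens if the estimate stays put while more events accumulate. The hazard ratio of 0.90, the interval from 0.80 to 1.02, and the doubling of events are hypothetical values chosen for illustration, and the helper name `p_from_hazard_ratio` is made up here. It also assumes that the standard error of the log hazard ratio shrinks with the square root of the number of events.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_from_hazard_ratio(hr, ci_low, ci_high, event_multiplier=1.0):
    """Approximate two-sided p value from a hazard ratio and its 95% CI.

    event_multiplier > 1 mimics longer follow-up: the point estimate is
    held fixed (an assumption) while the standard error shrinks.
    """
    z_crit = STD_NORMAL.inv_cdf(0.975)
    # Width of the CI on the log scale gives the standard error.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    se_scaled = se / math.sqrt(event_multiplier)
    z = math.log(hr) / se_scaled
    return 2 * STD_NORMAL.cdf(-abs(z))

# Hypothetical inputs, not taken from the paper.
p_now = p_from_hazard_ratio(0.90, 0.80, 1.02)
p_longer = p_from_hazard_ratio(0.90, 0.80, 1.02, event_multiplier=2.0)
print(f"p with current follow-up: {p_now:.3f}")
print(f"p with twice the events:  {p_longer:.3f}")
```

With these made-up inputs the first p value is about 0.09, and doubling the events pushes it below 0.05, but only under the assumption that the hazard ratio itself does not move.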

Nevertheless, people always jump to a conclusion whenever they see a p value greater than 0.05. They write things like “the insignificant effect of A suggests that there is no evidence supporting the hypothesis that…..” Even if we accept the p=0.05 rule of thumb, the above quote is not true. The p=0.05 cutpoint only means that, when the null hypothesis is true, rejecting it in favor of the alternative hypothesis will be an error at a rate of 1 in 20. Thus, when p is greater than 0.05, the most we can say is that there is not enough evidence to lead us to accept the alternative hypothesis. But “not enough evidence” doesn’t mean “no evidence”. This is exactly what Douglas G Altman and J Martin Bland from England wrote in 1995 in a BMJ statistics note called “Absence of evidence is not evidence of absence.”
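The gap between “not enough evidence” and “no evidence” can be made concrete with a power calculation. The following is a minimal sketch, assuming a two-sided z test that compares two group means with known, equal variance. The effect size of 0.3 standard deviations and the group sizes of 30 and 300 are hypothetical, and the function name `power_two_sided_z` is introduced here only for illustration.

```python
from statistics import NormalDist

def power_two_sided_z(effect_size, n_per_group, alpha=0.05):
    """Probability that a two-sided, two-sample z test rejects the null.

    effect_size is the true mean difference in standard deviation units.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic under the assumed true effect.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Chance of landing beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

false_alarm_rate = power_two_sided_z(0.0, 30)     # no real effect
power_small_study = power_two_sided_z(0.3, 30)    # real effect, small study
power_large_study = power_two_sided_z(0.3, 300)   # same effect, larger study

print(f"rejection rate with no effect:  {false_alarm_rate:.3f}")
print(f"power with 30 per group:        {power_small_study:.3f}")
print(f"power with 300 per group:       {power_large_study:.3f}")
```

Under these assumptions the small study misses a real effect most of the time, so a p value above 0.05 from such a study says little about whether the effect exists.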
Given the above philosophical discussion, we should be cautious about negative findings. However, refuting a negative finding efficiently and correctly demands more thought and a thorough understanding of study design. Here is an example.
Last week the New England Journal of Medicine published a study that examined which factors affected whether people received recommended health care. The Rand researchers concluded that only income mattered. Poor people received much less recommended health care than rich people, while blacks, Hispanics, and other racial groups had similar rates of health care. The latter finding stirred up a hot debate among researchers working on race issues. They accused the Rand researchers and the media of over-interpreting the results (in fact, the Rand researchers were also puzzled by the findings about race). People pointed out that this study was a complex survey, that many participants refused to give their written permission for extracting their medical charts, and that minority groups were severely underrepresented.
In general I agree with those accusations. With only a handful of minority participants in the study, and more than half of the participants excluded from the final analysis, it is almost certain that the study could not reach any firm conclusion about race.
I have further concerns about the way they did the analyses. They put income, race, and other variables in the same model (mutually adjusted). I think this is inappropriate. On the causal pathway, minority groups are more likely to be poor and to live in poor areas. It is true that poverty will cause poor quality of care, but if race is adjusted for income, what is the meaning of “race” here? Does race adjusted for income measure something like “discrimination perceived by minority people”?
Therefore, income-adjusted race does not answer the question about “being black”; instead, it refers to a subset of this broad “race” concept. In my opinion, income should not be in the model when race is in it, or we can use a more sophisticated method like path analysis or simultaneous equations, in which several related regressions are estimated together.
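A small simulation may help show why the adjusted and unadjusted coefficients answer different questions. The sketch below uses entirely hypothetical data, not the Rand data: group membership lowers income, income raises a quality-of-care score, and there is no direct path from group to care. All coefficients, the sample size, the seed, and the helper names are assumptions made for illustration, and the regressions are fitted by ordinary least squares written out by hand.

```python
import random

def centered_sum(x, y):
    """Return the sum of (x - mean_x) * (y - mean_y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

def slope_one_predictor(x, y):
    """Least-squares slope of y on a single predictor x."""
    return centered_sum(x, y) / centered_sum(x, x)

def slopes_two_predictors(x1, x2, y):
    """Least-squares slopes of y on x1 and x2, mutually adjusted."""
    s11, s22 = centered_sum(x1, x1), centered_sum(x2, x2)
    s12 = centered_sum(x1, x2)
    s1y, s2y = centered_sum(x1, y), centered_sum(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

random.seed(2006)
n = 5000
# Hypothetical causal chain: group -> income -> care, no direct path.
minority = [1 if random.random() < 0.5 else 0 for _ in range(n)]
income = [50 - 20 * g + random.gauss(0, 10) for g in minority]
care = [40 + 0.5 * inc + random.gauss(0, 5) for inc in income]

total_effect = slope_one_predictor(minority, care)
direct_effect, income_effect = slopes_two_predictors(minority, income, care)

print(f"group coefficient, unadjusted:          {total_effect:.2f}")
print(f"group coefficient, adjusted for income: {direct_effect:.2f}")
print(f"income coefficient:                     {income_effect:.2f}")
```

In this setup the unadjusted group coefficient is close to -10, the full gap in care, while the income-adjusted coefficient is close to zero. A reader who sees only the adjusted model would conclude that group membership does not matter, even though the whole gap runs through income.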
A positive finding is not necessarily true, as illustrated by the contradictory conclusions about hormone therapy and breast cancer between the Women’s Health Study and other observational studies. However, a negative finding is not necessarily true either. There are too many dark secrets behind the scenes.
Agree with you on this. It can take a very long time to prove everything.
Comment by Web Marketing Mentor — November 18, 2006 @ 1:30 am