Friday, June 14, 2019

Why Many Scientific Articles are Wrong

Suppose you are a professional academic who wants to publish a journal article in order to improve your chances of getting a job offer, getting tenure, or getting a raise. One way to do so is to produce and write up research that supports a novel theory. One problem is that, if the theory is true, it is quite likely that someone else in your field, over the past century or so, has already discovered it and published it, making your result not novel, hence likely to be rejected by the journal you submit it to.

If, on the other hand, your theory is false, the odds are much better that nobody else will have come up with it, found evidence to support it, and published. So if you can produce what looks like good evidence for a false theory, the odds that it will be novel, hence publishable, hence a contribution to your career, are much higher than for a true theory.

How do you produce evidence good enough to publish for a result that is not true?

One solution is a specification search, aka p-hacking. Your theory is that eating onions reduces the risk of Alzheimer's disease. To test it, you find a sample of old people who have been tested for symptoms of cognitive decline and survey them on their dietary habits. As a first crude test, you run a regression with degree of cognitive decline as the dependent variable and estimated previous onion consumption as the independent variable.

Unfortunately, that doesn't work—there is no significant correlation between the two. You rerun the regression, this time doing it separately for men and women. Then separately by race. Then by race and gender. Then limited to people over 80. Then to people over 90. Then making your independent variable not estimated onion consumption but whether they report eating onions frequently, occasionally, or not at all. Then do that version for all your racial and gender categories. Then ...

When you are done, you have run a hundred different regressions, testing different variants of the theory that onions are protective against Alzheimer's. You are gratified to discover that three of them are significant at the .05 level, with the right sign. You pick the best one and publish it: "Cognitive Effect of Self-Reported Onion Consumption on Elderly Afro-American Women."
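A short simulation makes it easy to see how a specification search manufactures significance. The sketch below is mine, not from any real study: it generates data in which onions have no effect at all, stands in for the hundred subgroup and coding choices with a single random label, then counts how many of the hundred regressions come out significant anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A null world: cognitive decline is pure noise, unrelated to onions.
n = 5000
onions = rng.normal(size=n)             # estimated onion consumption
decline = rng.normal(size=n)            # measured cognitive decline
variant = rng.integers(0, 100, size=n)  # stand-in for 100 subgroup/coding choices

hits = 0
for v in range(100):
    subset = variant == v
    result = stats.linregress(onions[subset], decline[subset])
    if result.pvalue < 0.05 and result.slope < 0:  # "onions protect"
        hits += 1

print(f"{hits} of 100 null regressions look protective at p < .05")
# Two-sided tests at .05 give about five false positives per hundred;
# requiring the protective sign as well cuts that roughly in half.
```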

The fact that a regression result is significant at the .05 level means that, if your theory is not true, the probability of getting evidence in its favor at least that good by pure chance is only .05. It follows that, if your theory is false, a hundred separate experiments can be expected to produce about five that support it at the .05 level.
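For readers who want the arithmetic: under the idealized assumption that the hundred tests are independent (real specification searches only approximate this, since the subgroups overlap), the count of spuriously significant results is binomially distributed. A quick check with scipy:

```python
from scipy.stats import binom

tests, alpha = 100, 0.05
print("expected false positives:", tests * alpha)           # 5.0
print("P(at least one):", 1 - binom.cdf(0, tests, alpha))   # about 0.994
print("P(three or more):", 1 - binom.cdf(2, tests, alpha))  # about 0.88
```

So three supportive regressions out of a hundred, as in the onion story above, is close to the expected outcome even when the theory is false.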

In this version of the story, the researcher is deliberately trying multiple experiments and reporting only the results that support his theory. The same effect could occur via multiple experiments by multiple researchers. If a hundred different researchers produce one experiment each, all testing false theories, about five will find evidence for their theories at the .05 level.

And get published.