Following a discussion on Gelman's blog, I was playing around with simulating scientists looking for significant effects. Suppose each of 1000 scientists runs 200 experiments in their lifetime, and suppose that 20% of the experiments are such that the null is true. Assume a low-power experiment (standard in psycholinguistics; eyetracking studies even in journals like JML can easily have something like 20 subjects). E.g., with a sample size of 1000, delta of 2, and sd of 50, we have power around 15%. We will add the stringent condition that the scientist has to get one replication of a significant effect before they publish it.
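
As a rough check on the power figure above, R's built-in `power.t.test` can be used. This is only a sketch: it assumes a two-sided, two-sample t-test at the 0.05 level with 1000 observations per group, which is one reading of "sample size of 1000". Under those assumptions the result should come out close to the 15% quoted.

```r
# Power of a two-sided, two-sample t-test.
# Assumptions of this check: two-sample design, n = 1000 per group, alpha = 0.05.
pw <- power.t.test(n = 1000, delta = 2, sd = 50, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
pw$power
```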

What proportion of scientists will publish at least one false positive in their lifetime? That was the question. Here's my simulation. You can increase effect_size from 2 to 10 to see what happens in high-power situations.
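
A minimal sketch of such a simulation in R might look like the following. It is an illustration under stated assumptions, not necessarily the code originally used: it assumes two-sample t-tests with equal group sizes, that each experiment is independently a true-null experiment with probability 0.2, and that a result is published only when an experiment and its immediate repetition both give p < 0.05. The function names (`is_significant`, `publishes_false_positive`, `simulate_scientists`) and their arguments are made up for this example.

```r
# Run one two-sample t-test on simulated data; TRUE if p < alpha.
is_significant <- function(n, delta, sd, alpha = 0.05) {
  x <- rnorm(n, mean = 0, sd = sd)
  y <- rnorm(n, mean = delta, sd = sd)
  t.test(x, y)$p.value < alpha
}

# Simulate one scientist's career; TRUE if they publish at least one false positive.
publishes_false_positive <- function(n_exp, p_null, n, effect_size, sd, alpha) {
  for (i in seq_len(n_exp)) {
    null_true <- runif(1) < p_null
    delta <- if (null_true) 0 else effect_size
    # Publish only if the experiment and its immediate repetition are both significant.
    published <- is_significant(n, delta, sd, alpha) &&
      is_significant(n, delta, sd, alpha)
    if (published && null_true) return(TRUE)
  }
  FALSE
}

# Proportion of scientists who publish at least one false positive.
simulate_scientists <- function(n_scientists = 1000, n_exp = 200, p_null = 0.2,
                                n = 1000, effect_size = 2, sd = 50, alpha = 0.05) {
  results <- vapply(
    seq_len(n_scientists),
    function(i) publishes_false_positive(n_exp, p_null, n, effect_size, sd, alpha),
    logical(1)
  )
  mean(results)
}

# Small, fast run; the defaults (1000 scientists, 200 experiments, n = 1000) are slow.
set.seed(1)
simulate_scientists(n_scientists = 50, n_exp = 50, n = 30)
```

In this sketch `effect_size` only enters experiments where the effect is real, so changing it alters how many true effects get published but not the proportion of scientists with a published false positive.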

Comments and/or corrections are welcome.

## 4 comments:

I am wondering if there is a simpler calculation. A false positive means you have claimed a difference when the null hypothesis is true. So, power and effect size are irrelevant to the calculation (because they depend on the alternative being true). Therefore, the relevant calculation depends primarily on the alpha criterion.

```r
nScientists = 1000
alpha = 0.05  # false positive rate of a two-tailed t-test at the 0.05 level
nExp = 200
P_Null = 0.2

# First: the probability of no false positives for one scientist
nNull = nExp * P_Null
P_NoFalsePositives_Onescientist = (1 - alpha)^nNull

# Second: the probability of no false positives across N scientists
P_NoFalsePositives = P_NoFalsePositives_Onescientist^nScientists

(P_AtleastOneFalsePositive = 1 - P_NoFalsePositives)
```

`P_AtleastOneFalsePositive` is the probability that at least one of the 1000 scientists has a false positive in their lifetime; given your values, it should be ~1. The proportion of scientists who will have at least one false positive is `1 - P_NoFalsePositives_Onescientist`, which is about 0.87 with these values.

Have I misunderstood the question?

I just realized that I didn't take into consideration the following condition "We will add the stringent condition that the scientist has to get one replication of a significant effect before they publish it."

One last comment. I tried to imagine the analytic calculation that includes that last condition "We will add the stringent condition that the scientist has to get one replication of a significant effect before they publish it."

Again, I might be missing a point, but it seems to me that the constraint is rather vague. What counts as a replication? For example, if a scientist can try 198 times and then, on the 199th retry, gets a positive finding similar to the 1st attempt, does that count as a replication? If so, then the constraint is really not stringent at all. It seems to me that more constraints are needed to implement what you have in mind. Perhaps the replication counts only if it is in the immediately following experiment (or some such window)?

Thanks for these comments. My stringent condition (sorry for not being clear) is that if an experimenter gets a p-value < 0.05, they must repeat the experiment AND get a p-value < 0.05 once again (successively). This is very hard to achieve in psycholinguistics in real life! That's why I called it stringent.
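
Under that reading (a result is published only if the original experiment and one immediate repetition both give p < 0.05), the analytic calculation from the earlier comment can be extended as a rough sketch. It assumes the two p-values are independent under the null, so a true-null experiment yields a published false positive with probability `alpha^2`, and that exactly 20% of the 200 experiments are true nulls. The variable names are illustrative.

```r
alpha <- 0.05
n_exp <- 200
p_null <- 0.2
n_null <- n_exp * p_null  # number of true-null experiments per scientist

# Probability that a single true-null experiment is significant twice in a row
p_fp_published <- alpha^2

# Probability that one scientist publishes at least one false positive
p_at_least_one <- 1 - (1 - p_fp_published)^n_null
p_at_least_one
```

With these values, roughly 9.5% of scientists would be expected to publish at least one false positive.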
