What proportion of scientists will publish at least one false positive in their lifetime? That was the question. Here's my simulation. You can increase effect_size from 2 to 10 to see what happens in high-power situations.
## Our simulated scientist will declare
## significance only if he/she gets
## 2 replications with p<0.05:
stringent <- TRUE
## Set the above to FALSE if you want to
## have the scientist publish a single
## expt. as soon as it's significant.
## num of scientists to simulate:
nscientists <- 1000
## sample size in each expt:
n <- 1000
## number of expts run by each scientist:
nexp <- 200
## prob of sampling from a population
## where the null is true:
prob <- 0.20
## true mean:
## we use something low to reflect realistic
## studies done with low power:
effect_size <- 2
## standard deviation:
s <- 50
## the one-sample test simulated below has
## roughly 24% power at this effect size:
power.t.test(n=n, delta=effect_size, sd=s, type="one.sample")
## store proportion of false positives
## and true positives
## in one lifetime of 200 experiments:
prop_tps <- prop_fps <- rep(NA, nscientists)
## store count of false positives per scientist:
scientist <- rep(NA, nscientists)
do_stringent_test <- function(test1, n, mu=0, s){
  ## if the first result is not significant,
  ## return it as-is:
  if(test1 > 0.05){
    return(test1)
  } else {
    ## if result significant, run a second,
    ## independent replication:
    x <- rnorm(n, mean=mu, sd=s)
    test2 <- t.test(x)$p.value
    ## if both results significant:
    if(test1 <= 0.05 && test2 <= 0.05){
      ## just choose one of the two p's:
      return(test1)
    } else { ## declare n.s.:
      return(0.07)
    }
  }
}
## simulate 1000 scientists, each with
## a lifetime plan of doing 200 experiments,
## with a 20% chance of sampling from a
## distribution where null is true:
for(k in 1:nscientists){
  ## vectors for holding results (p-values);
  ## initialize with NA so unused slots are
  ## dropped later (initializing with 0 would
  ## spuriously count as a p-value of 0):
  tests_null_true <- rep(NA, nexp)
  tests_null_false <- rep(NA, nexp)
  for(i in 1:nexp){
    ## does the sample come from a
    ## distribution with null true?
    null_true <- rbinom(n=1, size=1, prob=prob)
    if(null_true == 1){
      x <- rnorm(n, mean=0, sd=s)
      test1 <- t.test(x)$p.value
      if(stringent == FALSE){
        ## do test and store result:
        tests_null_true[i] <- test1
      } else { ## if stringent==TRUE
        result <- do_stringent_test(test1, n, 0, s)
        tests_null_true[i] <- result
      }
    } else {
      ## sample from distribution where
      ## null is false:
      x <- rnorm(n, mean=effect_size, sd=s)
      test1 <- t.test(x)$p.value
      if(stringent == FALSE){
        tests_null_false[i] <- test1
      } else { ## if stringent==TRUE
        result <- do_stringent_test(test1, n, effect_size, s)
        tests_null_false[i] <- result
      }
    }
  }
  ## Compute proportion of false positives
  ## for each scientist:
  ## remove NAs:
  tests_null_true <- tests_null_true[!is.na(tests_null_true)]
  tests_null_false <- tests_null_false[!is.na(tests_null_false)]
  ## Proportion of all null-true tests that ended up with a Type I error:
  #mean(tests_null_true<0.05)
  ## number of false positives:
  num_false_pos <- sum(tests_null_true < 0.05)
  ## track current scientist's
  ## false positives:
  scientist[k] <- num_false_pos
  ## proportion of all null-false tests
  ## that correctly detected effect
  ## ("observed" power):
  #mean(tests_null_false<0.05)
  ## number of "true positives":
  num_true_pos <- sum(tests_null_false < 0.05)
  ## prop. of false positive results
  ## published in lifetime for scientist k:
  prop_fps[k] <- num_false_pos/(num_false_pos+num_true_pos)
  ## prop. of true positives:
  prop_tps[k] <- num_true_pos/(num_false_pos+num_true_pos)
}
hist(prop_fps, freq=FALSE,
     main="Distribution of proportion of false positives")
hist(prop_tps, freq=FALSE,
     main="Distribution of proportion of true positives")
summary(prop_fps)
summary(prop_tps)
## distribution of false-positive counts:
hist(scientist)
## proportion of scientists with one or more FP:
scientist <- scientist[which(scientist > 0)]
length(scientist)/nscientists
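As a rough analytic cross-check of the simulation (my addition, not part of the original script, and assuming independent experiments): in the non-stringent case each experiment yields a published false positive with probability 0.2 × 0.05 = 0.01, and in the stringent case with probability 0.2 × 0.05² = 0.0005, so the chance of at least one false positive in a 200-experiment lifetime can be approximated directly:

```r
## Approximate probability that one scientist publishes
## at least one false positive in a lifetime of nexp
## independent experiments:
p_null <- 0.20   # prob. that the null is true
alpha  <- 0.05
nexp   <- 200
## non-stringent rule: a single significant result is published:
p_fp_loose <- p_null * alpha
p_atleast1_loose <- 1 - (1 - p_fp_loose)^nexp    # about 0.87
## stringent rule: two consecutive significant results needed:
p_fp_strict <- p_null * alpha^2
p_atleast1_strict <- 1 - (1 - p_fp_strict)^nexp  # about 0.10
```

These back-of-the-envelope numbers should roughly match the simulated proportion of scientists with `scientist > 0` under each setting of `stringent`.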
Comments and/or corrections are welcome.
4 comments:
I am wondering if there is a simpler calculation. A false positive means you have claimed a difference when the null hypothesis is true. So, power and effect size are irrelevant to the calculation (because they depend on the alternative being true). Therefore, the relevant calculation depends primarily on the alpha criterion.
nScientists = 1000
alpha = 0.025 #two-tailed t-test
nExp=200
P_Null=0.2
#First - What's the no false positive rate for one scientist
nNull=nExp*P_Null
P_NoFalsePositives_Onescientist=(1-alpha)^nNull
#Second - What's the no false positive rate for N scientists
P_NoFalsePositives = P_NoFalsePositives_Onescientist^nScientists
(P_AtleastOneFalsePositive = 1 - P_NoFalsePositives)
"P_AtleastOneFalsePositive" will also be the proportion of scientists who will have at least one false positive in their lifetime. Given your values, it should be ~1.
Have I misunderstood the question?
I just realized that I didn't take into consideration the following condition "We will add the stringent condition that the scientist has to get one replication of a significant effect before they publish it."
One last comment. I tried to imagine the analytic calculation that includes that last condition "We will add the stringent condition that the scientist has to get one replication of a significant effect before they publish it."
Again, I might be missing a point, but it seems to me that the constraint is rather vague. What counts as a replication? For example, if a scientist can try 198 times and then gets a similar positive finding as on the 1st attempt on the 199th retry, does that count as a replication? If so, then the constraint is not really stringent at all. It seems to me that more constraints are needed to implement what you have in mind. Perhaps the replication counts only if it occurs in the immediately following experiment (or within some such window)?
Thanks for these comments. My stringent condition (sorry for not being clear) is that if an experimenter gets a p-value < 0.05, they must repeat the experiment AND get a p-value < 0.05 once again (on the immediately following attempt). This is very hard to achieve in psycholinguistics, in real life! That's why I called it stringent.
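A quick back-of-the-envelope illustration of why this rule is stringent (my own addition; the 24% power figure is an assumed illustrative value, not from the post): under the null, two consecutive results with p < 0.05 occur with probability 0.05² = 0.0025, a 20-fold cut in the per-experiment Type I rate, but with low power even a true effect rarely survives the rule:

```r
## Probability of two consecutive significant results,
## assuming independent experiments:
alpha <- 0.05
pow   <- 0.24   # illustrative low power (assumed)
p_null_twice <- alpha^2   # under the null: 0.0025
p_true_twice <- pow^2     # under a true effect: 0.0576
```

So at 24% power, a real effect produces two successive significant results only about 6% of the time, which matches the point that successive replication is very hard in low-power fields.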