## Friday, June 01, 2018

I welcome comments and criticism on the following paper:

Title: The statistical significance filter leads to overoptimistic expectations of replicability
Authors: Vasishth, Mertzen, Jäger, Gelman

Abstract: It is well-known in statistics (e.g., Gelman & Carlin, 2014) that treating a result as publishable just because the p-value is less than 0.05 leads to overop- timistic expectations of replicability. These overoptimistic expectations arise due to Type M(agnitude) error: when underpowered studies yield significant results, effect size estimates are guaranteed to be exaggerated and noisy. These effects get published, leading to an overconfident belief in replicability. We demonstrate the adverse consequences of this statistical significance filter by conducting seven direct replication attempts (268 participants in total) of a recent paper (Levy & Keller, 2013). We show that the published claims are so noisy that even non-significant results are fully compatible with them. We also demonstrate the contrast between such small-sample studies and a larger-sample study; the latter generally yields a less noisy estimate but also a smaller effect magnitude, which looks less compelling but is more realistic. We reiterate several suggestions from the methodology literature for improving best practices.

## Friday, April 13, 2018

### The relationship between paired t-tests and linear mixed models in a 2x2 repeated measures design

The relationship between paired t-tests and linear mixed models in a 2x2 repeated measures design

# Introduction

[Thanks to Reinhold Kliegl and Daniel Schad for useful discussions.]

First, I summarize some known relationships in statistics. We will be using these below.

First result

The variance of the sum of two random variables is:

$$$Var(X_1+X_2)=Var(X_1) + Var(X_2) + 2\times Cov(X_1, X2)$$$

Second result

The variance of the difference of two random variables is:

$$$Var(X_1-X_2)=Var(X_1) + Var(X_2) - 2\times Cov(X_1, X2)$$$

Third result

The covariance of the sum of two random variables with some coefficients a,b,c,d is:

$$$Cov(aX_1 + bX_2, cX_3 + dX_4) = acCov(X_1,X_3) + bcCov(X_2,X_3)+adCov(X_1,X_4)+bdCov(X_2,X_4)$$$

For a proof, see Rice, p 138.

# 2x2 repeated measures design

Assume that we have four random variables $X_1,\dots,X_4$, each has standard deviation $\sigma$ and pairwise correlations are $\rho$.

Suppose have the following contrast coding:

cntrsts<-matrix(c(1,1,-1,-1, ## main effect
1,-1,1,-1, ## main effect
1,-1,-1,1), ## interaction
ncol=4,byrow = TRUE)
rownames(cntrsts)<-c("ME1","ME2","Int")
cntrsts
##     [,1] [,2] [,3] [,4]
## ME1    1    1   -1   -1
## ME2    1   -1    1   -1
## Int    1   -1   -1    1

Notice that a main effect like the first one above:

cntrsts[1,]
## [1]  1  1 -1 -1

implies the comparison $(X_1 + X_2) - (X_3 + X_4)$. If the contrast matrix was scaled to 1/2, we would have had:

cntrsts/2
##     [,1] [,2] [,3] [,4]
## ME1  0.5  0.5 -0.5 -0.5
## ME2  0.5 -0.5  0.5 -0.5
## Int  0.5 -0.5 -0.5  0.5

and the comparison would be: $(X_1 + X_2)/2 - (X_3 + X_4)/2$.

The second main effect:

cntrsts[2,]
## [1]  1 -1  1 -1

implies the comparison: $(X_1 + X_3) - (X_2 + X_4)$.

Similarly, the interaction

cntrsts[3,]
## [1]  1 -1 -1  1

implies a comparison like $(X_1 - X_2)- (X_3 - X_4)$. Opening out the brackets and rearranging, this is: $(X_1 + X_4)-(X_2 + X_3)$. Thus, rearranging terms, we can write the interaction as a difference of sums, just like a main effect. So, we can see an interaction as a difference of differences or a difference of sums. I believe Jake Westfall has made a similar point somewhere.

Next, we look at the relationship between the main effects and interactions, done as paired t-tests, in comparison to a linear mixed model with the above contrast coding.

## Main effect 1

To carry out the statistical test for the first main effect, we need to know the variance of $(X_1+X_2) - (X_3 + X_4)$.

$Var(X_1+X_2) = Var(X_3+X_4) = \sigma^2 + \sigma^2 + 2 \rho \sigma^2 = 2\sigma^2 (1+\rho)$

Let $Y_1 = X_1+X_2, Y_1 = X_3+X_4$.

So, $Var(Y_1-Y_2) = 2\sigma^2 (1+\rho) + 2\sigma^2 (1+\rho) - 2 Cov(Y_1,Y_2)$

We need to find $Cov(Y_1,Y_2)$. Now, $Cov(Y_1, Y_2) = Cov(X_1+X_2, X_3+X_4)$. Using the third result above:

$$$Cov(aX_1 + bX_2, cX_3 + dX_4) = acCov(X_1,X_3) + bcCov(X_2,X_3)+adCov(X_1,X_4)+bdCov(X_2,X_4)$$$

We set a=b=c=d=1. This gives us:

$Cov(X_1 + X_2, X_3 + X_4) = Cov(X_1,X_3) + Cov(X_2,X_3)+Cov(X_1,X_4)+Cov(X_2,X_4)$

The covariance of any two RVs is $\rho \sigma^2$. So:

$Cov(X_1 + X_2, X_3 + X_4) = 4 \rho \sigma^2$

It follows that:

$Var(Y_1-Y_2) = 2\sigma^2 (1+\rho) + 2\sigma^2 (1+\rho) - 2 \times 4 \rho \sigma^2$

This gives us:

$Var(Y_1-Y_2) = 4\sigma^2 (1+\rho) - 8 \rho \sigma^2 = 4\sigma^2 (1+\rho - 2 \rho) = 4\sigma^2 (1-\rho)$

So, the statistic for a main effect is

$t_{n-1}\sim \frac{Y_1 - Y_2}{2\sigma \sqrt{(1-\rho)/n}}$

## Main effect 2

This is the same as above, with different indices for the conditions, so I skip this case.

## Interaction

For an interaction, we want a difference of differences. The indices of the random variable $X$ don’t really matter: any difference of differences can be seen as an interaction.

We want to know the variance of $(X_1-X_2) - (X_3 - X_4)$.

First we find out the variance of

$Var(X_1-X_2) = \sigma^2 + \sigma^2 - 2 \rho \sigma^2 = 2\sigma^2 (1-\rho)$

Similarly, $Var(X_3-X_4) = 2\sigma^2 (1-\rho)$

Let $Y_1 = X_1 - X_2, Y_2 = X_4 - X_3$.

So we need to find

$Var(Y_1-Y_2) = Var(Y_1)+Var(Y_2) - 2 Cov(Y_1, Y_2) = 2 \sigma^2 (1-\rho) - 2Cov(Y_1, Y_2)$

Using the above result about covariances (restated here in terms of the interaction):

$Cov(aX_1 + bX_2, cX_3 + dX_4) = acCov(X_1,X_3) + bcCov(X_2,X_3)+adCov(X_1,X_4)+bdCov(X_2,X_4)$

Here, a=c=1, and b=d=-1.

$Cov(Y_1, Y_2) = Cov(X_1 - X_2, X_3 - X_4) = Cov(X_1,X_3) - Cov(X_2,X_3) - Cov(X_1,X_4) + Cov(X_2,X_4) = \rho \sigma^2-\rho \sigma^2-\rho \sigma^2+\rho \sigma^2 = 0$

This means that

$Var(Y_1-Y_2) = 2 \sigma^2 (1-\rho) + 2 \sigma^2 (1-\rho) - 2Cov(Y_1, Y_2) = 4 \sigma^2 (1-\rho) - 0 = 4 \sigma^2 (1-\rho)$

This is the same variance as the main effect variance above.

The statistic for the interaction is the same as for the main effect:

$t_{n-1}=\frac{Y_1 - Y_2}{2\sigma \sqrt{(1-\rho)/n}}$

Thus, in the repeated measures 2x2 factorial design with equal variances in each condition and identical pairwise correlations, all that differs in the statistic for the interaction vs the main effect is that the $Y_1 - Y_2$ is a difference of differences in the interaction. In the main effect, $Y_1 - Y_2$ is the difference of sums. But as I showed above, a difference of differences can be re-cast as a differences of sums, so there is no difference between what we call a main effect and an interaction, as far as the implied contrasts are concerned.

Thus, if the four condition means are:

mu <- c(.3, .2, .6, .1)

The difference of means (the numerator in the t-test) in the main effect is:

mu[1]+mu[2]-(mu[3]+mu[4])
## [1] -0.2

In the interaction, the numerator in the t-test is:

mu[1]-mu[2]-(mu[3]-mu[4])
## [1] -0.4

So, obviously, the t-statistic in main effects will depend on the numerator. But it won’t depend on the denominator under the assumptions here about equal variance and identical pairwise correlations.

## Confirming the formulas derived above using simulated data (balanced data)

Here, I will compare the main effects and interactions computed using the t-test and the closed-form formulas above. I consider only balanced data here.

library(MASS)
samplesize<-12
mu <- c(.3, .2, .6, .1)
Sigma <- matrix(1, nrow=4, ncol=4) + diag(4)
Sigma<-Sigma/2
Sigma
##      [,1] [,2] [,3] [,4]
## [1,]  1.0  0.5  0.5  0.5
## [2,]  0.5  1.0  0.5  0.5
## [3,]  0.5  0.5  1.0  0.5
## [4,]  0.5  0.5  0.5  1.0
x <- mvrnorm(n=samplesize, mu=mu, Sigma=Sigma, empirical=TRUE)
##             [,1]       [,2]        [,3]       [,4]
## [1,]  0.66310780  0.9374838  0.75646691  1.4784750
## [2,]  1.79375299  1.2948389  2.38988148  2.2699178
## [3,]  0.38013167 -0.8722642  0.33391841 -0.5987544
## [4,] -0.02921166  0.1167126 -0.67462429  0.6929755
## [5,]  0.83084376 -0.8339335 -0.07134439 -0.7898326
## [6,] -1.94323512 -0.7229310 -0.11062696 -0.5152638
rho<-cor(x[,1],x[,2])
stddev<-sd(x[,1])
n<-length(x[,1])

## main effect: (X1+X2)-(X3+X4)
t.test((x[,1]+x[,2]),(x[,3]+x[,4]),
paired=TRUE)
##
##  Paired t-test
##
## data:  (x[, 1] + x[, 2]) and (x[, 3] + x[, 4])
## t = -0.4899, df = 11, p-value = 0.6338
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.0985484  0.6985484
## sample estimates:
## mean of the differences
##                    -0.2

Next, we compare this to the closed-form solution derived above:

## using closed form solution:
me<-(mean(x[,1])+mean(x[,2]))-(mean(x[,3])+mean(x[,4]))

me/(2*stddev*sqrt((1-rho)/n))
## [1] -0.4898979

The two are identical.

Next, the interaction:

## interaction effect:
y1<-(x[,1]-x[,2])
y2<-(x[,3]-x[,4])

t.test(y1,y2,paired=TRUE)
##
##  Paired t-test
##
## data:  y1 and y2
## t = -0.9798, df = 11, p-value = 0.3482
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.2985484  0.4985484
## sample estimates:
## mean of the differences
##                    -0.4

Compare this with the derived result above:

# by hand using closed form solution:
(mean(y1)-mean(y2))/(2*stddev*sqrt((1-rho)/n))
## [1] -0.9797959

## Using linear mixed models

So, the paired t-tests will deliver exactly the same t-scores as the varying intercepts linear mixed model.

We verify this next with our simulated data:

##             [,1]       [,2]        [,3]       [,4]
## [1,]  0.66310780  0.9374838  0.75646691  1.4784750
## [2,]  1.79375299  1.2948389  2.38988148  2.2699178
## [3,]  0.38013167 -0.8722642  0.33391841 -0.5987544
## [4,] -0.02921166  0.1167126 -0.67462429  0.6929755
## [5,]  0.83084376 -0.8339335 -0.07134439 -0.7898326
## [6,] -1.94323512 -0.7229310 -0.11062696 -0.5152638
dv<-c(x[,1],x[,2],x[,3],x[,4])
cond<-factor(rep(letters[1:4],each=samplesize))
contrasts(cond)<-t(cntrsts)
contrasts(cond)
##   ME1 ME2 Int
## a   1   1   1
## b   1  -1  -1
## c  -1   1  -1
## d  -1  -1   1
subj<-rep(1:samplesize,4)
dat<-data.frame(subj,cond,dv)
library(lme4)
me1<-ifelse(dat$cond%in%c("a","b"),1,-1) me2<-ifelse(dat$cond%in%c("a","c"),1,-1)
int<-me1*me2
summary(lmer(dv~me1+me2+int+(1|subj),dat))
## Linear mixed model fit by REML ['lmerMod']
## Formula: dv ~ me1 + me2 + int + (1 | subj)
##    Data: dat
##
## REML criterion at convergence: 127.6
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -1.90187 -0.60083  0.04319  0.64841  1.78553
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subj     (Intercept) 0.5      0.7071
##  Residual             0.5      0.7071
## Number of obs: 48, groups:  subj, 12
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   0.3000     0.2282   1.314
## me1          -0.0500     0.1021  -0.490
## me2           0.1500     0.1021   1.470
## int          -0.1000     0.1021  -0.980
##
## Correlation of Fixed Effects:
##     (Intr) me1   me2
## me1 0.000
## me2 0.000  0.000
## int 0.000  0.000 0.000

We now compute the main effects and interactions using the above contrast coding, repeated below:

contrasts(cond)
##   ME1 ME2 Int
## a   1   1   1
## b   1  -1  -1
## c  -1   1  -1
## d  -1  -1   1

The main effect I computed matches the main effect in lme4:

round(t.test((x[,1]+x[,2]),(x[,3]+x[,4]),paired=TRUE)$statistic,2) ## t ## -0.49 The main effect II also matches lmer output: round(t.test((x[,1]+x[,3]),(x[,2]+x[,4]),paired=TRUE)$statistic,2)
##    t
## 1.47

The interaction can be written in two ways, as I discussed above: $(X_1 - X_2) - (X_3 - X_4) = (X_1 + X_4) - (X_2 + X_3)$.

round(t.test((x[,1]+x[,4]),(x[,2]+x[,3]),paired=TRUE)$statistic,2) ## t ## -0.98 Note that the paired t-test corresponding to the linear mixed model can also be written as a difference of differences: round(t.test((x[,1]-x[,2]),(x[,3]-x[,4]),paired=TRUE)$statistic,2)
##     t
## -0.98

## Conclusion

The ANOVA contrast coding in the linear mixed model can be decomposed into a series of paired t-tests, all of which have the same standard error in the denominator of the t-test. Only the magnitudes of the main effects and interactions determine the power you will need.