Search

Thursday, August 16, 2018

How to capitalize on a priori contrasts in linear (mixed) models: A tutorial

We wrote a short tutorial on contast coding, covering the common contrast coding scenarios, among them: treatment, helmert, anova, sum, and sliding (successive differences) contrasts.  The target audience is psychologists and linguists, but really it is for anyone doing planned experiments.
 
The paper has not been submitted anywhere yet. We are keen to get user feedback before we do that. Comments and criticism very welcome. Please post comments on this blog, or email me.
 
Abstract:

Factorial experiments in research on memory, language, and in other areas are often analyzed using analysis of variance (ANOVA). However, for experimental factors with more than two levels, the ANOVA omnibus F-test is not informative about the source of a main effect or interaction. This is unfortunate as researchers typically have specific hypotheses about which condition means differ from each other. A priori contrasts (i.e., comparisons planned before the sample means are known) between specific conditions or combinations of conditions are the appropriate way to represent such hypotheses in the statistical model. Many researchers have pointed out that contrasts should be "tested instead of, rather than as a supplement to, the ordinary `omnibus' F test" (Hayes, 1973, p. 601). In this tutorial, we explain the mathematics underlying different kinds of contrasts (i.e., treatment, sum, repeated, Helmert, and polynomial contrasts), discuss their properties, and demonstrate how they are applied in the R System for Statistical Computing (R Core Team, 2018). In this context, we explain the generalized inverse which is needed to compute the weight coefficients for contrasts that test hypotheses that are not covered by the default set of contrasts. A detailed understanding of contrast coding is crucial for successful and correct specification in linear models (including linear mixed models). Contrasts defined a priori yield far more precise confirmatory tests of experimental hypotheses than standard omnibus F-test.


 Full paper: https://arxiv.org/abs/1807.10451

Thursday, July 26, 2018

Stan Pharmacometrics conference in Paris July 24 2018

I just got back from attending this amazing conference in Paris:

http://www.go-isop.org/stan-for-pharmacometrics---paris-france

A few people were disturbed/surprised by the fact that I am linguist ("what are you doing at an pharmacometrics conference?"). I hasten to point out that two of the core developers of Stan are linguists too (Bob Carpenter and Mitzi Morris). People seem to think that all linguists do is correct other people's comma placements. However, despite my being a total outsider to the conference, the organizers were amazingly welcoming, and even allowed me to join in the speaker's dinner, and treated me like a regular guest.

Here is a quick summary of what I learnt:

1. Gelman's talk: The only thing I remember from his talk was the statement that when economists fit multiple regression models and find that one predictor's formerly significant effect was wiped out by adding another predictor, they think that the new predictor explains the old predictor. Which is pretty funny. Another funny thing was that he had absolutely no slides, and was drawing figures in the air, and apologizing for the low resolution of the figures.

 2. Bob Carpenter gave an inspiring talk on the exciting stuff that's coming in Stan:

- Higher Speeds (Stan 2.10 will be 80 times faster with a 100 cores)

- Stan book

- New functionality (e.g., tuples, multivariate normal RNG)

- Gaussian process models will soon become tractable

- Blockless Stan is coming! This will make Stan code look more like JAGS (which is great). Stan will forever remain backward compatible so old code will not break.

My conclusion was that in the next few years, things will improve a lot in terms of speed and in terms of what one can do.

3. Torsten and Stan:

- Torsten seems to be a bunch of functions to do PK/PD modeling with Stan.

- Bill Gillespie on Torsten and Stan: https://www.metrumrg.com/wp-content/uploads/2018/05/BayesianPmetricsMBSW2018.pdf

- Free courses on Stan and PK/PK modeling: https://www.metrumrg.com/courses/

4. Mitzi Morris gave a great talk on disease mapping (accident mapping in NYC) using conditional autoregressive models (CAR). The idea is simple but great: one can model the correlations between neighboring boroughs. A straightforward application is in EEG, modeling data from all electrodes simultaneously, and modeling the decreasing correlation between neighbors. This is low-hanging fruit, esp. with Stan 2.18 coming.

5. From Bob I learnt that one should never provide free consultation (I am doing that these days), because people don't value your time then. If you charge them by the hour, this sharpens their focus. But I feel guilty charging people for my time, especially in medicine, where I provide free consulting: I'm a civil servant and already get paid by the state, and I get total freedom to do whatever I like. So it seems only fair that I serve the state in some useful way (other than studying processing differences in subject vs object relative clauses, that is).

For psycholinguists, there is a lot of stuff in pharmacometrics that will be important for EEG and visual world data: Gaussian process models, PK/PD modeling, spatial+temporal modeling of a signal like EEG. These tools exist today but we are not using them. And Stan makes a lot of this possible now or very soon now.

Summary: I'm impressed.

Friday, June 01, 2018

Soliciting comments on paper

I welcome comments and criticism on the following paper:

Title: The statistical significance filter leads to overoptimistic expectations of replicability
Authors: Vasishth, Mertzen, Jäger, Gelman


Abstract: It is well-known in statistics (e.g., Gelman & Carlin, 2014) that treating a result as publishable just because the p-value is less than 0.05 leads to overop- timistic expectations of replicability. These overoptimistic expectations arise due to Type M(agnitude) error: when underpowered studies yield significant results, effect size estimates are guaranteed to be exaggerated and noisy. These effects get published, leading to an overconfident belief in replicability. We demonstrate the adverse consequences of this statistical significance filter by conducting seven direct replication attempts (268 participants in total) of a recent paper (Levy & Keller, 2013). We show that the published claims are so noisy that even non-significant results are fully compatible with them. We also demonstrate the contrast between such small-sample studies and a larger-sample study; the latter generally yields a less noisy estimate but also a smaller effect magnitude, which looks less compelling but is more realistic. We reiterate several suggestions from the methodology literature for improving best practices.

You can download the pdf from here: https://osf.io/eyphj/

Friday, April 13, 2018

The relationship between paired t-tests and linear mixed models in a 2x2 repeated measures design

The relationship between paired t-tests and linear mixed models in a 2x2 repeated measures design

Introduction

[Thanks to Reinhold Kliegl and Daniel Schad for useful discussions.]

First, I summarize some known relationships in statistics. We will be using these below.

First result

The variance of the sum of two random variables is:

\[\begin{equation} Var(X_1+X_2)=Var(X_1) + Var(X_2) + 2\times Cov(X_1, X2) \end{equation}\]

Second result

The variance of the difference of two random variables is:

\[\begin{equation} Var(X_1-X_2)=Var(X_1) + Var(X_2) - 2\times Cov(X_1, X2) \end{equation}\]

Third result

The covariance of the sum of two random variables with some coefficients a,b,c,d is:

\[\begin{equation} Cov(aX_1 + bX_2, cX_3 + dX_4) = acCov(X_1,X_3) + bcCov(X_2,X_3)+adCov(X_1,X_4)+bdCov(X_2,X_4) \end{equation}\]

For a proof, see Rice, p 138.

2x2 repeated measures design

Assume that we have four random variables \(X_1,\dots,X_4\), each has standard deviation \(\sigma\) and pairwise correlations are \(\rho\).

Suppose have the following contrast coding:

cntrsts<-matrix(c(1,1,-1,-1, ## main effect
         1,-1,1,-1, ## main effect
         1,-1,-1,1), ## interaction
         ncol=4,byrow = TRUE)
rownames(cntrsts)<-c("ME1","ME2","Int")
cntrsts
##     [,1] [,2] [,3] [,4]
## ME1    1    1   -1   -1
## ME2    1   -1    1   -1
## Int    1   -1   -1    1

Notice that a main effect like the first one above:

cntrsts[1,]
## [1]  1  1 -1 -1

implies the comparison \((X_1 + X_2) - (X_3 + X_4)\). If the contrast matrix was scaled to 1/2, we would have had:

cntrsts/2
##     [,1] [,2] [,3] [,4]
## ME1  0.5  0.5 -0.5 -0.5
## ME2  0.5 -0.5  0.5 -0.5
## Int  0.5 -0.5 -0.5  0.5

and the comparison would be: \((X_1 + X_2)/2 - (X_3 + X_4)/2\).

The second main effect:

cntrsts[2,]
## [1]  1 -1  1 -1

implies the comparison: \((X_1 + X_3) - (X_2 + X_4)\).

Similarly, the interaction

cntrsts[3,]
## [1]  1 -1 -1  1

implies a comparison like \((X_1 - X_2)- (X_3 - X_4)\). Opening out the brackets and rearranging, this is: \((X_1 + X_4)-(X_2 + X_3)\). Thus, rearranging terms, we can write the interaction as a difference of sums, just like a main effect. So, we can see an interaction as a difference of differences or a difference of sums. I believe Jake Westfall has made a similar point somewhere.

Next, we look at the relationship between the main effects and interactions, done as paired t-tests, in comparison to a linear mixed model with the above contrast coding.

Main effect 1

To carry out the statistical test for the first main effect, we need to know the variance of \((X_1+X_2) - (X_3 + X_4)\).

\(Var(X_1+X_2) = Var(X_3+X_4) = \sigma^2 + \sigma^2 + 2 \rho \sigma^2 = 2\sigma^2 (1+\rho)\)

Let \(Y_1 = X_1+X_2, Y_1 = X_3+X_4\).

So, \(Var(Y_1-Y_2) = 2\sigma^2 (1+\rho) + 2\sigma^2 (1+\rho) - 2 Cov(Y_1,Y_2)\)

We need to find \(Cov(Y_1,Y_2)\). Now, \(Cov(Y_1, Y_2) = Cov(X_1+X_2, X_3+X_4)\). Using the third result above:

\[\begin{equation} Cov(aX_1 + bX_2, cX_3 + dX_4) = acCov(X_1,X_3) + bcCov(X_2,X_3)+adCov(X_1,X_4)+bdCov(X_2,X_4) \end{equation}\]

We set a=b=c=d=1. This gives us:

\(Cov(X_1 + X_2, X_3 + X_4) = Cov(X_1,X_3) + Cov(X_2,X_3)+Cov(X_1,X_4)+Cov(X_2,X_4)\)

The covariance of any two RVs is \(\rho \sigma^2\). So:

\(Cov(X_1 + X_2, X_3 + X_4) = 4 \rho \sigma^2\)

It follows that:

\(Var(Y_1-Y_2) = 2\sigma^2 (1+\rho) + 2\sigma^2 (1+\rho) - 2 \times 4 \rho \sigma^2\)

This gives us:

\(Var(Y_1-Y_2) = 4\sigma^2 (1+\rho) - 8 \rho \sigma^2 = 4\sigma^2 (1+\rho - 2 \rho) = 4\sigma^2 (1-\rho)\)

So, the statistic for a main effect is

\(t_{n-1}\sim \frac{Y_1 - Y_2}{2\sigma \sqrt{(1-\rho)/n}}\)

Main effect 2

This is the same as above, with different indices for the conditions, so I skip this case.

Interaction

For an interaction, we want a difference of differences. The indices of the random variable \(X\) don’t really matter: any difference of differences can be seen as an interaction.

We want to know the variance of \((X_1-X_2) - (X_3 - X_4)\).

First we find out the variance of

\(Var(X_1-X_2) = \sigma^2 + \sigma^2 - 2 \rho \sigma^2 = 2\sigma^2 (1-\rho)\)

Similarly, \(Var(X_3-X_4) = 2\sigma^2 (1-\rho)\)

Let \(Y_1 = X_1 - X_2, Y_2 = X_4 - X_3\).

So we need to find

\(Var(Y_1-Y_2) = Var(Y_1)+Var(Y_2) - 2 Cov(Y_1, Y_2) = 2 \sigma^2 (1-\rho) - 2Cov(Y_1, Y_2)\)

Using the above result about covariances (restated here in terms of the interaction):

\(Cov(aX_1 + bX_2, cX_3 + dX_4) = acCov(X_1,X_3) + bcCov(X_2,X_3)+adCov(X_1,X_4)+bdCov(X_2,X_4)\)

Here, a=c=1, and b=d=-1.

\(Cov(Y_1, Y_2) = Cov(X_1 - X_2, X_3 - X_4) = Cov(X_1,X_3) - Cov(X_2,X_3) - Cov(X_1,X_4) + Cov(X_2,X_4) = \rho \sigma^2-\rho \sigma^2-\rho \sigma^2+\rho \sigma^2 = 0\)

This means that

\(Var(Y_1-Y_2) = 2 \sigma^2 (1-\rho) + 2 \sigma^2 (1-\rho) - 2Cov(Y_1, Y_2) = 4 \sigma^2 (1-\rho) - 0 = 4 \sigma^2 (1-\rho)\)

This is the same variance as the main effect variance above.

The statistic for the interaction is the same as for the main effect:

\(t_{n-1}=\frac{Y_1 - Y_2}{2\sigma \sqrt{(1-\rho)/n}}\)

Thus, in the repeated measures 2x2 factorial design with equal variances in each condition and identical pairwise correlations, all that differs in the statistic for the interaction vs the main effect is that the \(Y_1 - Y_2\) is a difference of differences in the interaction. In the main effect, \(Y_1 - Y_2\) is the difference of sums. But as I showed above, a difference of differences can be re-cast as a differences of sums, so there is no difference between what we call a main effect and an interaction, as far as the implied contrasts are concerned.

Thus, if the four condition means are:

mu <- c(.3, .2, .6, .1)

The difference of means (the numerator in the t-test) in the main effect is:

mu[1]+mu[2]-(mu[3]+mu[4])
## [1] -0.2

In the interaction, the numerator in the t-test is:

mu[1]-mu[2]-(mu[3]-mu[4])
## [1] -0.4

So, obviously, the t-statistic in main effects will depend on the numerator. But it won’t depend on the denominator under the assumptions here about equal variance and identical pairwise correlations.

Confirming the formulas derived above using simulated data (balanced data)

Here, I will compare the main effects and interactions computed using the t-test and the closed-form formulas above. I consider only balanced data here.

library(MASS)
samplesize<-12
mu <- c(.3, .2, .6, .1)
Sigma <- matrix(1, nrow=4, ncol=4) + diag(4)
Sigma<-Sigma/2
Sigma
##      [,1] [,2] [,3] [,4]
## [1,]  1.0  0.5  0.5  0.5
## [2,]  0.5  1.0  0.5  0.5
## [3,]  0.5  0.5  1.0  0.5
## [4,]  0.5  0.5  0.5  1.0
x <- mvrnorm(n=samplesize, mu=mu, Sigma=Sigma, empirical=TRUE)
head(x)
##             [,1]       [,2]        [,3]       [,4]
## [1,]  0.66310780  0.9374838  0.75646691  1.4784750
## [2,]  1.79375299  1.2948389  2.38988148  2.2699178
## [3,]  0.38013167 -0.8722642  0.33391841 -0.5987544
## [4,] -0.02921166  0.1167126 -0.67462429  0.6929755
## [5,]  0.83084376 -0.8339335 -0.07134439 -0.7898326
## [6,] -1.94323512 -0.7229310 -0.11062696 -0.5152638
rho<-cor(x[,1],x[,2])
stddev<-sd(x[,1])
n<-length(x[,1])

## main effect: (X1+X2)-(X3+X4)
t.test((x[,1]+x[,2]),(x[,3]+x[,4]),
       paired=TRUE)
## 
##  Paired t-test
## 
## data:  (x[, 1] + x[, 2]) and (x[, 3] + x[, 4])
## t = -0.4899, df = 11, p-value = 0.6338
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.0985484  0.6985484
## sample estimates:
## mean of the differences 
##                    -0.2

Next, we compare this to the closed-form solution derived above:

## using closed form solution:
me<-(mean(x[,1])+mean(x[,2]))-(mean(x[,3])+mean(x[,4]))

me/(2*stddev*sqrt((1-rho)/n))
## [1] -0.4898979

The two are identical.

Next, the interaction:

## interaction effect:
y1<-(x[,1]-x[,2])
y2<-(x[,3]-x[,4])

t.test(y1,y2,paired=TRUE)
## 
##  Paired t-test
## 
## data:  y1 and y2
## t = -0.9798, df = 11, p-value = 0.3482
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.2985484  0.4985484
## sample estimates:
## mean of the differences 
##                    -0.4

Compare this with the derived result above:

# by hand using closed form solution:
(mean(y1)-mean(y2))/(2*stddev*sqrt((1-rho)/n))
## [1] -0.9797959

Using linear mixed models

I have shown in another blog post that the paired t-test is exactly equivalent to the varying intercepts linear mixed model.

So, the paired t-tests will deliver exactly the same t-scores as the varying intercepts linear mixed model.

We verify this next with our simulated data:

head(x)
##             [,1]       [,2]        [,3]       [,4]
## [1,]  0.66310780  0.9374838  0.75646691  1.4784750
## [2,]  1.79375299  1.2948389  2.38988148  2.2699178
## [3,]  0.38013167 -0.8722642  0.33391841 -0.5987544
## [4,] -0.02921166  0.1167126 -0.67462429  0.6929755
## [5,]  0.83084376 -0.8339335 -0.07134439 -0.7898326
## [6,] -1.94323512 -0.7229310 -0.11062696 -0.5152638
dv<-c(x[,1],x[,2],x[,3],x[,4])
cond<-factor(rep(letters[1:4],each=samplesize))
contrasts(cond)<-t(cntrsts)
contrasts(cond)
##   ME1 ME2 Int
## a   1   1   1
## b   1  -1  -1
## c  -1   1  -1
## d  -1  -1   1
subj<-rep(1:samplesize,4)
dat<-data.frame(subj,cond,dv)
library(lme4)
## Loading required package: Matrix
me1<-ifelse(dat$cond%in%c("a","b"),1,-1)
me2<-ifelse(dat$cond%in%c("a","c"),1,-1)
int<-me1*me2
summary(lmer(dv~me1+me2+int+(1|subj),dat))
## Linear mixed model fit by REML ['lmerMod']
## Formula: dv ~ me1 + me2 + int + (1 | subj)
##    Data: dat
## 
## REML criterion at convergence: 127.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.90187 -0.60083  0.04319  0.64841  1.78553 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subj     (Intercept) 0.5      0.7071  
##  Residual             0.5      0.7071  
## Number of obs: 48, groups:  subj, 12
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   0.3000     0.2282   1.314
## me1          -0.0500     0.1021  -0.490
## me2           0.1500     0.1021   1.470
## int          -0.1000     0.1021  -0.980
## 
## Correlation of Fixed Effects:
##     (Intr) me1   me2  
## me1 0.000             
## me2 0.000  0.000      
## int 0.000  0.000 0.000

We now compute the main effects and interactions using the above contrast coding, repeated below:

contrasts(cond)
##   ME1 ME2 Int
## a   1   1   1
## b   1  -1  -1
## c  -1   1  -1
## d  -1  -1   1

The main effect I computed matches the main effect in lme4:

round(t.test((x[,1]+x[,2]),(x[,3]+x[,4]),paired=TRUE)$statistic,2)
##     t 
## -0.49

The main effect II also matches lmer output:

round(t.test((x[,1]+x[,3]),(x[,2]+x[,4]),paired=TRUE)$statistic,2)
##    t 
## 1.47

The interaction can be written in two ways, as I discussed above: \((X_1 - X_2) - (X_3 - X_4) = (X_1 + X_4) - (X_2 + X_3)\).

round(t.test((x[,1]+x[,4]),(x[,2]+x[,3]),paired=TRUE)$statistic,2)
##     t 
## -0.98

Note that the paired t-test corresponding to the linear mixed model can also be written as a difference of differences:

round(t.test((x[,1]-x[,2]),(x[,3]-x[,4]),paired=TRUE)$statistic,2)
##     t 
## -0.98

Conclusion

The ANOVA contrast coding in the linear mixed model can be decomposed into a series of paired t-tests, all of which have the same standard error in the denominator of the t-test. Only the magnitudes of the main effects and interactions determine the power you will need.