## Saturday, October 05, 2019

### Estimating the carbon cost of psycholinguistics conferences


Note: If I have made some calculation error, please point it out and I will fix it.

# Suggestion

At the University of Potsdam we are discussing how to reduce our carbon footprint in science-related work. One thought I had was that we could reduce our carbon footprint in psycholinguistics significantly (I bet nobody will believe I just used that word!) if we held the CUNY and AMLaP conferences in alternate years, as opposed to holding both every year.

# Objections to alternating years for CUNY and AMLaP, and responses

I posed this question to my community, and got some interesting responses:

## Response 1: More people would start traveling to each conference, neutralizing the gains

“…it’s not clear whether this would actually reduce the net amount of travel. In particular, if there will now be just one language conference (well, a language conference with a focus on experimental and computational approaches to language), with an alternating location (US vs. Europe), more US-based researchers may come to AMLaP than before, and more Europe-based researchers may come to CUNY than before. It seems important to assess this in figuring out whether this would help.”
Response: This is a reasonable point, but it presupposes that people will disregard the environmental cost of flying and look out only for their self-interest. I think this is unlikely; most people seem to be aware of how urgent this problem is. Most students at Potsdam seem to be very concerned and want to know how carbon emissions can be reduced (also by them); I assume this is the same in the US. Moreover, when regional conferences (e.g., the LSA in the US, and the many European conferences held each year) are substituted for an international one, the data do not seem to support such a compensatory shift (see this paper).

## Response 2: Cohesiveness of the community would be damaged, and inequality would increase

“This was also my thought when I read this idea. Also, it would likely impact access and participation, especially for students. Only the best funded labs will be able to send people to conferences overseas, which means that many people will not participate every year, and the cohesiveness of our field will suffer.”
Response: This is also a reasonable point. But it is already the case that only the best funded labs are able to send people to conferences, and nobody does anything about that. And even if there is some additional cost of this nature, there is no free lunch: the idea that one can contribute to reducing environmental damage without giving up a single thing is unrealistic. The question is whether the cost is worth it. Not having a world to travel in at all might be a far higher cost than these other costs.

# Some back-of-the-envelope calculations

One paper states: “On a per capita basis, CO2 emissions for the ESA meetings ranged from 0.46-0.66 metric tons. The estimated per capita AAG carbon footprint, 0.58 metric tons of carbon dioxide, fell within this range of values.” (p. 67, Ponette-González et al., 2011).
Using the previous years’ AMLaP attendance counts, we have the following numbers of attendees:
dat<-read.table("amlapdat.txt",header=TRUE)

dat
##   year total

## 1 2015   194
## 2 2016   305
## 3 2017   300
## 4 2018   298
Taking the above estimates of .46-.66 metric tons per person on average, the minimum and maximum emissions in metric tons per conference range from:
dat$mincost<-dat$total*.46

summary(dat$mincost)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    89.2   125.1   137.5   126.2   138.6   140.3

dat$maxcost<-dat$total*.66

summary(dat$maxcost)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 

##     128     180     197     181     199     201
The full range looks like this:
dat

##   year total mincost maxcost

## 1 2015   194   89.24  128.04
## 2 2016   305  140.30  201.30
## 3 2017   300  138.00  198.00
## 4 2018   298  137.08  196.68
So each AMLaP conference generates about 90 to 201 metric tons of carbon emissions. CUNY attendance may be comparable, perhaps a bit larger, so probably at the upper end of this range. Simply multiplying by two, our annual carbon emissions from the two conferences would be estimated as:
## minimum (metric tons)

90*2
## [1] 180

## maximum (metric tons)

201*2
## [1] 402

The paper says: “Depending on the model, we estimated an average 18-59% reduction in carbon emissions for multiple regional compared with national meetings.” p 67.
For the minimum annual estimate (180 metric tons), an 18%-59% reduction would bring emissions down to a value in the range:
180-.18*180 ## down from 180 

## [1] 147.6

180-.59*180 ## down from 180

## [1] 73.8

For the largest single conference that we have data on (201 metric tons), an 18%-59% reduction would bring emissions down to a value in the range:
201-.18*201 ## down from 201 

## [1] 164.82

201-.59*201 ## down from 201

## [1] 82.41

What does a maximum reduction of 119 metric tons (201-82) mean? As a baseline, consider that India's per-capita CO2 emissions for an entire year were about 1.7 metric tons (2014; you can google this).
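As a quick back-of-the-envelope check in R (using the 59% maximum reduction and the 201-ton figure from above, and the 1.7-ton India per-capita value):

```r
## maximum achievable reduction for the largest conference (201 metric tons):
reduction <- 0.59 * 201  # approximately 119 metric tons
## express the saving in person-years of India's per-capita emissions
## (1.7 metric tons of CO2 per person per year, 2014):
reduction / 1.7          # roughly 70 person-years
```

In other words, the maximum saving from one conference corresponds to roughly 70 people in India living for an entire year.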

# Conclusion

There could be a significant reduction in carbon emissions. There will be a cost of course, but it may not be environmental. (There could be unintended environmental costs: for example, people might produce more babies as a result of not going to conferences, publish more papers, or have more time to train more psycholinguists per year.)
In particular, hoping that we can go on with business as usual is guaranteed to lead to a net loss.

# Future directions

• The above analysis is probably very coarse-grained. One could do a more principled analysis of emission costs by using data from CUNY 2020 and AMLaP 2020. Since Brian Dillon and I are organizing these two conferences, we could coordinate our analyses. Incidentally, at conferences in general, about 10% of the attendees account for 50% of the emissions; one could take such individual-level costs into account in a more nuanced manner.

• One could simply implement the change from 2021 onwards and track carbon emissions pre- and post-2021 to see whether the fear that emissions will go up rather than down is realized. I volunteer to carry out that analysis. If emissions go up, the change would obviously be a bad idea and should be scrapped. Based on the papers I have read, I would pre-register my prediction that this will not happen. Obviously, though, it would be too late to do anything by the time enough data come in.

• Comments on this post are welcome, and suggestions for improvement, or corrections, are also most welcome!

## Sunday, September 23, 2018

### Recreating Michael Betancourt's Bayesian modeling course from his online materials

Several people wanted to have the slides from Betancourt's lectures at SMLP2018. It is possible to recreate most of the course from his writings:

1. Intro to probability:
https://betanalpha.github.io/assets/case_studies/probability_theory.html

2. Workflow:
https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

3. Diagnosis:
https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html

4. Validating inference: https://arxiv.org/abs/1804.06788

5. Calibrating inference: https://arxiv.org/abs/1803.08393

## Thursday, August 16, 2018

### How to capitalize on a priori contrasts in linear (mixed) models: A tutorial

We wrote a short tutorial on contrast coding, covering the common contrast coding scenarios, among them: treatment, Helmert, ANOVA, sum, and sliding (successive differences) contrasts. The target audience is psychologists and linguists, but really it is for anyone doing planned experiments.

The paper has not been submitted anywhere yet. We are keen to get user feedback before we do that. Comments and criticism very welcome. Please post comments on this blog, or email me.

Abstract:

Factorial experiments in research on memory, language, and in other areas are often analyzed using analysis of variance (ANOVA). However, for experimental factors with more than two levels, the ANOVA omnibus F-test is not informative about the source of a main effect or interaction. This is unfortunate as researchers typically have specific hypotheses about which condition means differ from each other. A priori contrasts (i.e., comparisons planned before the sample means are known) between specific conditions or combinations of conditions are the appropriate way to represent such hypotheses in the statistical model. Many researchers have pointed out that contrasts should be "tested instead of, rather than as a supplement to, the ordinary 'omnibus' F test" (Hays, 1973, p. 601). In this tutorial, we explain the mathematics underlying different kinds of contrasts (i.e., treatment, sum, repeated, Helmert, and polynomial contrasts), discuss their properties, and demonstrate how they are applied in the R System for Statistical Computing (R Core Team, 2018). In this context, we explain the generalized inverse which is needed to compute the weight coefficients for contrasts that test hypotheses that are not covered by the default set of contrasts. A detailed understanding of contrast coding is crucial for successful and correct specification in linear models (including linear mixed models). Contrasts defined a priori yield far more precise confirmatory tests of experimental hypotheses than the standard omnibus F-test.

Full paper: https://arxiv.org/abs/1807.10451
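As a minimal sketch of the generalized-inverse approach mentioned in the abstract (this toy example and the names `Hc`, `Xc` are mine, not taken from the paper): each row of a hypothesis matrix contains the weights defining one comparison between condition means, and the generalized inverse of that matrix yields the contrast matrix to use in the model.

```r
library(MASS)  # for ginv() and fractions()

## three condition means mu1, mu2, mu3; hypotheses:
##   row 1 (int): the intercept estimates the grand mean
##   row 2 (c1):  mu2 - mu1
##   row 3 (c2):  mu3 - mu2
Hc <- rbind(int = c(1/3, 1/3, 1/3),
            c1  = c(  -1,   1,   0),
            c2  = c(   0,  -1,   1))

## the contrast matrix is the generalized inverse of the hypothesis matrix:
Xc <- ginv(Hc)
fractions(Xc)
## columns 2 and 3 reproduce the sliding (successive differences)
## contrasts, MASS::contr.sdif(3)
```

Checking the result against `MASS::contr.sdif(3)` is a good sanity check when constructing custom contrasts this way.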

## Thursday, July 26, 2018

### Stan Pharmacometrics conference in Paris July 24 2018

I just got back from attending this amazing conference in Paris:

http://www.go-isop.org/stan-for-pharmacometrics---paris-france

A few people were disturbed/surprised by the fact that I am a linguist ("what are you doing at a pharmacometrics conference?"). I hasten to point out that two of the core developers of Stan are linguists too (Bob Carpenter and Mitzi Morris). People seem to think that all linguists do is correct other people's comma placement. However, despite my being a total outsider to the conference, the organizers were amazingly welcoming; they even allowed me to join in the speakers' dinner, and treated me like a regular guest.

Here is a quick summary of what I learnt:

1. Gelman's talk: The only thing I remember from his talk was the statement that when economists fit multiple regression models and find that one predictor's formerly significant effect was wiped out by adding another predictor, they think that the new predictor explains the old predictor. Which is pretty funny. Another funny thing was that he had absolutely no slides, and was drawing figures in the air, and apologizing for the low resolution of the figures.

2. Bob Carpenter gave an inspiring talk on the exciting stuff that's coming in Stan:

- Higher speeds (Stan 2.10 will be 80 times faster with 100 cores)

- Stan book

- New functionality (e.g., tuples, multivariate normal RNG)

- Gaussian process models will soon become tractable

- Blockless Stan is coming! This will make Stan code look more like JAGS (which is great). Stan will forever remain backward compatible so old code will not break.

My conclusion was that in the next few years, things will improve a lot in terms of speed and in terms of what one can do.

3. Torsten and Stan:

- Torsten seems to be a bunch of functions to do PK/PD modeling with Stan.

- Bill Gillespie on Torsten and Stan: https://www.metrumrg.com/wp-content/uploads/2018/05/BayesianPmetricsMBSW2018.pdf

- Free courses on Stan and PK/PD modeling: https://www.metrumrg.com/courses/

4. Mitzi Morris gave a great talk on disease mapping (accident mapping in NYC) using conditional autoregressive models (CAR). The idea is simple but great: one can model the correlations between neighboring boroughs. A straightforward application is in EEG, modeling data from all electrodes simultaneously, and modeling the decreasing correlation between neighbors. This is low-hanging fruit, esp. with Stan 2.18 coming.

5. From Bob I learnt that one should never provide free consultation (I am doing that these days), because people don't value your time then. If you charge them by the hour, this sharpens their focus. But I feel guilty charging people for my time, especially in medicine, where I provide free consulting: I'm a civil servant and already get paid by the state, and I get total freedom to do whatever I like. So it seems only fair that I serve the state in some useful way (other than studying processing differences in subject vs object relative clauses, that is).

For psycholinguists, there is a lot of stuff in pharmacometrics that will be important for EEG and visual world data: Gaussian process models, PK/PD modeling, spatial+temporal modeling of a signal like EEG. These tools exist today but we are not using them. And Stan makes a lot of this possible now or very soon now.

Summary: I'm impressed.

## Friday, June 01, 2018

I welcome comments and criticism on the following paper:

Title: The statistical significance filter leads to overoptimistic expectations of replicability
Authors: Vasishth, Mertzen, Jäger, Gelman

Abstract: It is well-known in statistics (e.g., Gelman & Carlin, 2014) that treating a result as publishable just because the p-value is less than 0.05 leads to overoptimistic expectations of replicability. These overoptimistic expectations arise due to Type M(agnitude) error: when underpowered studies yield significant results, effect size estimates are guaranteed to be exaggerated and noisy. These effects get published, leading to an overconfident belief in replicability. We demonstrate the adverse consequences of this statistical significance filter by conducting seven direct replication attempts (268 participants in total) of a recent paper (Levy & Keller, 2013). We show that the published claims are so noisy that even non-significant results are fully compatible with them. We also demonstrate the contrast between such small-sample studies and a larger-sample study; the latter generally yields a less noisy estimate but also a smaller effect magnitude, which looks less compelling but is more realistic. We reiterate several suggestions from the methodology literature for improving best practices.

## Friday, April 13, 2018

### The relationship between paired t-tests and linear mixed models in a 2x2 repeated measures design

# Introduction

[Thanks to Reinhold Kliegl and Daniel Schad for useful discussions.]

First, I summarize some known relationships in statistics. We will be using these below.

## First result

The variance of the sum of two random variables is:

$$Var(X_1+X_2)=Var(X_1) + Var(X_2) + 2\times Cov(X_1, X_2)$$

## Second result

The variance of the difference of two random variables is:

$$Var(X_1-X_2)=Var(X_1) + Var(X_2) - 2\times Cov(X_1, X_2)$$

## Third result

The covariance of the sum of two random variables with some coefficients a,b,c,d is:

$$Cov(aX_1 + bX_2, cX_3 + dX_4) = ac\, Cov(X_1,X_3) + bc\, Cov(X_2,X_3) + ad\, Cov(X_1,X_4) + bd\, Cov(X_2,X_4)$$

For a proof, see Rice, p 138.
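The third result is easy to check numerically; here is a quick sketch in R (the coefficients and the correlation value are arbitrary choices; `cc` and `dd` stand in for c and d so as not to mask R's `c()` function):

```r
library(MASS)  # for mvrnorm()
set.seed(1)
## four correlated normal variables with sd 1 and pairwise correlation 0.5:
Sigma <- matrix(0.5, nrow = 4, ncol = 4)
diag(Sigma) <- 1
x <- mvrnorm(n = 100, mu = rep(0, 4), Sigma = Sigma)
## arbitrary coefficients a, b, c, d:
a <- 2; b <- -1; cc <- 3; dd <- 4
## left-hand side: Cov(a X1 + b X2, c X3 + d X4)
lhs <- cov(a * x[, 1] + b * x[, 2], cc * x[, 3] + dd * x[, 4])
## right-hand side, expanded term by term:
rhs <- a * cc * cov(x[, 1], x[, 3]) + b * cc * cov(x[, 2], x[, 3]) +
       a * dd * cov(x[, 1], x[, 4]) + b * dd * cov(x[, 2], x[, 4])
all.equal(lhs, rhs)  ## TRUE: the identity holds exactly for sample covariances
```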

# 2x2 repeated measures design

Assume that we have four random variables $X_1,\dots,X_4$, each with standard deviation $\sigma$ and with pairwise correlation $\rho$.

Suppose we have the following contrast coding:

cntrsts<-matrix(c(1,1,-1,-1, ## main effect
1,-1,1,-1, ## main effect
1,-1,-1,1), ## interaction
ncol=4,byrow = TRUE)
rownames(cntrsts)<-c("ME1","ME2","Int")
cntrsts
##     [,1] [,2] [,3] [,4]
## ME1    1    1   -1   -1
## ME2    1   -1    1   -1
## Int    1   -1   -1    1

Notice that a main effect like the first one above:

cntrsts[1,]
## [1]  1  1 -1 -1

implies the comparison $(X_1 + X_2) - (X_3 + X_4)$. If the contrast matrix were scaled by 1/2, we would have:

cntrsts/2
##     [,1] [,2] [,3] [,4]
## ME1  0.5  0.5 -0.5 -0.5
## ME2  0.5 -0.5  0.5 -0.5
## Int  0.5 -0.5 -0.5  0.5

and the comparison would be: $(X_1 + X_2)/2 - (X_3 + X_4)/2$.

The second main effect:

cntrsts[2,]
## [1]  1 -1  1 -1

implies the comparison: $(X_1 + X_3) - (X_2 + X_4)$.

Similarly, the interaction

cntrsts[3,]
## [1]  1 -1 -1  1

implies a comparison like $(X_1 - X_2)- (X_3 - X_4)$. Expanding the brackets and rearranging, this is: $(X_1 + X_4)-(X_2 + X_3)$. Thus, we can write the interaction as a difference of sums, just like a main effect: an interaction can be seen either as a difference of differences or as a difference of sums. I believe Jake Westfall has made a similar point somewhere.
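A quick numerical confirmation of this rearrangement (the means are the same ones used in the simulation below):

```r
mu <- c(.3, .2, .6, .1)                  # four condition means
dd <- (mu[1] - mu[2]) - (mu[3] - mu[4])  # difference of differences
ds <- (mu[1] + mu[4]) - (mu[2] + mu[3])  # difference of sums
c(dd, ds)                                # both equal -0.4
```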

Next, we look at the relationship between the main effects and interactions, done as paired t-tests, in comparison to a linear mixed model with the above contrast coding.

## Main effect 1

To carry out the statistical test for the first main effect, we need to know the variance of $(X_1+X_2) - (X_3 + X_4)$.

$Var(X_1+X_2) = Var(X_3+X_4) = \sigma^2 + \sigma^2 + 2 \rho \sigma^2 = 2\sigma^2 (1+\rho)$

Let $Y_1 = X_1+X_2, Y_2 = X_3+X_4$.

So, $Var(Y_1-Y_2) = 2\sigma^2 (1+\rho) + 2\sigma^2 (1+\rho) - 2 Cov(Y_1,Y_2)$

We need to find $Cov(Y_1,Y_2)$. Now, $Cov(Y_1, Y_2) = Cov(X_1+X_2, X_3+X_4)$. Using the third result above:

$$Cov(aX_1 + bX_2, cX_3 + dX_4) = ac\, Cov(X_1,X_3) + bc\, Cov(X_2,X_3) + ad\, Cov(X_1,X_4) + bd\, Cov(X_2,X_4)$$

We set a=b=c=d=1. This gives us:

$Cov(X_1 + X_2, X_3 + X_4) = Cov(X_1,X_3) + Cov(X_2,X_3)+Cov(X_1,X_4)+Cov(X_2,X_4)$

The covariance of any two RVs is $\rho \sigma^2$. So:

$Cov(X_1 + X_2, X_3 + X_4) = 4 \rho \sigma^2$

It follows that:

$Var(Y_1-Y_2) = 2\sigma^2 (1+\rho) + 2\sigma^2 (1+\rho) - 2 \times 4 \rho \sigma^2$

This gives us:

$Var(Y_1-Y_2) = 4\sigma^2 (1+\rho) - 8 \rho \sigma^2 = 4\sigma^2 (1+\rho - 2 \rho) = 4\sigma^2 (1-\rho)$

So, the statistic for a main effect is

$t_{n-1}=\frac{Y_1 - Y_2}{2\sigma \sqrt{(1-\rho)/n}}$

## Main effect 2

This is the same as above, with different indices for the conditions, so I skip this case.

## Interaction

For an interaction, we want a difference of differences. The indices of the random variable $X$ don’t really matter: any difference of differences can be seen as an interaction.

We want to know the variance of $(X_1-X_2) - (X_3 - X_4)$.

First, we find the variance of $X_1 - X_2$:

$Var(X_1-X_2) = \sigma^2 + \sigma^2 - 2 \rho \sigma^2 = 2\sigma^2 (1-\rho)$

Similarly, $Var(X_3-X_4) = 2\sigma^2 (1-\rho)$

Let $Y_1 = X_1 - X_2, Y_2 = X_3 - X_4$.

So we need to find

$Var(Y_1-Y_2) = Var(Y_1)+Var(Y_2) - 2 Cov(Y_1, Y_2) = 4 \sigma^2 (1-\rho) - 2Cov(Y_1, Y_2)$

Using the above result about covariances (restated here in terms of the interaction):

$Cov(aX_1 + bX_2, cX_3 + dX_4) = acCov(X_1,X_3) + bcCov(X_2,X_3)+adCov(X_1,X_4)+bdCov(X_2,X_4)$

Here, a=c=1, and b=d=-1.

$Cov(Y_1, Y_2) = Cov(X_1 - X_2, X_3 - X_4) = Cov(X_1,X_3) - Cov(X_2,X_3) - Cov(X_1,X_4) + Cov(X_2,X_4) = \rho \sigma^2-\rho \sigma^2-\rho \sigma^2+\rho \sigma^2 = 0$

This means that

$Var(Y_1-Y_2) = 2 \sigma^2 (1-\rho) + 2 \sigma^2 (1-\rho) - 2Cov(Y_1, Y_2) = 4 \sigma^2 (1-\rho) - 0 = 4 \sigma^2 (1-\rho)$

This is the same variance as the main effect variance above.

The statistic for the interaction is the same as for the main effect:

$t_{n-1}=\frac{Y_1 - Y_2}{2\sigma \sqrt{(1-\rho)/n}}$

Thus, in the repeated measures 2x2 factorial design with equal variances in each condition and identical pairwise correlations, all that differs between the statistic for the interaction and the statistic for a main effect is the numerator: in the interaction, $Y_1 - Y_2$ is a difference of differences; in a main effect, it is a difference of sums. But as I showed above, a difference of differences can be re-cast as a difference of sums, so there is no difference between what we call a main effect and an interaction, as far as the implied contrasts are concerned.

Thus, if the four condition means are:

mu <- c(.3, .2, .6, .1)

The difference of means (the numerator in the t-test) in the main effect is:

mu[1]+mu[2]-(mu[3]+mu[4])
## [1] -0.2

In the interaction, the numerator in the t-test is:

mu[1]-mu[2]-(mu[3]-mu[4])
## [1] -0.4

So, obviously, the t-statistics for the main effects and the interaction will differ through the numerator. But they won't differ through the denominator, under the present assumptions of equal variances and identical pairwise correlations.

## Confirming the formulas derived above using simulated data (balanced data)

Here, I will compare the main effects and interactions computed using the t-test and the closed-form formulas above. I consider only balanced data here.

library(MASS)
samplesize<-12
mu <- c(.3, .2, .6, .1)
Sigma <- matrix(1, nrow=4, ncol=4) + diag(4)
Sigma<-Sigma/2
Sigma
##      [,1] [,2] [,3] [,4]
## [1,]  1.0  0.5  0.5  0.5
## [2,]  0.5  1.0  0.5  0.5
## [3,]  0.5  0.5  1.0  0.5
## [4,]  0.5  0.5  0.5  1.0
x <- mvrnorm(n=samplesize, mu=mu, Sigma=Sigma, empirical=TRUE)
head(x)
##             [,1]       [,2]        [,3]       [,4]
## [1,]  0.66310780  0.9374838  0.75646691  1.4784750
## [2,]  1.79375299  1.2948389  2.38988148  2.2699178
## [3,]  0.38013167 -0.8722642  0.33391841 -0.5987544
## [4,] -0.02921166  0.1167126 -0.67462429  0.6929755
## [5,]  0.83084376 -0.8339335 -0.07134439 -0.7898326
## [6,] -1.94323512 -0.7229310 -0.11062696 -0.5152638
rho<-cor(x[,1],x[,2])
stddev<-sd(x[,1])
n<-length(x[,1])

## main effect: (X1+X2)-(X3+X4)
t.test((x[,1]+x[,2]),(x[,3]+x[,4]),
paired=TRUE)
##
##  Paired t-test
##
## data:  (x[, 1] + x[, 2]) and (x[, 3] + x[, 4])
## t = -0.4899, df = 11, p-value = 0.6338
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.0985484  0.6985484
## sample estimates:
## mean of the differences
##                    -0.2

Next, we compare this to the closed-form solution derived above:

## using closed form solution:
me<-(mean(x[,1])+mean(x[,2]))-(mean(x[,3])+mean(x[,4]))

me/(2*stddev*sqrt((1-rho)/n))
## [1] -0.4898979

The two are identical.

Next, the interaction:

## interaction effect:
y1<-(x[,1]-x[,2])
y2<-(x[,3]-x[,4])

t.test(y1,y2,paired=TRUE)
##
##  Paired t-test
##
## data:  y1 and y2
## t = -0.9798, df = 11, p-value = 0.3482
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.2985484  0.4985484
## sample estimates:
## mean of the differences
##                    -0.4

Compare this with the derived result above:

# by hand using closed form solution:
(mean(y1)-mean(y2))/(2*stddev*sqrt((1-rho)/n))
## [1] -0.9797959

## Using linear mixed models

The paired t-tests above deliver exactly the same t-scores as a varying intercepts linear mixed model.

We verify this next with our simulated data:

head(x)
##             [,1]       [,2]        [,3]       [,4]
## [1,]  0.66310780  0.9374838  0.75646691  1.4784750
## [2,]  1.79375299  1.2948389  2.38988148  2.2699178
## [3,]  0.38013167 -0.8722642  0.33391841 -0.5987544
## [4,] -0.02921166  0.1167126 -0.67462429  0.6929755
## [5,]  0.83084376 -0.8339335 -0.07134439 -0.7898326
## [6,] -1.94323512 -0.7229310 -0.11062696 -0.5152638
dv<-c(x[,1],x[,2],x[,3],x[,4])
cond<-factor(rep(letters[1:4],each=samplesize))
contrasts(cond)<-t(cntrsts)
contrasts(cond)
##   ME1 ME2 Int
## a   1   1   1
## b   1  -1  -1
## c  -1   1  -1
## d  -1  -1   1
subj<-rep(1:samplesize,4)
dat<-data.frame(subj,cond,dv)
library(lme4)
## Loading required package: Matrix
me1<-ifelse(dat$cond%in%c("a","b"),1,-1)
me2<-ifelse(dat$cond%in%c("a","c"),1,-1)
int<-me1*me2
summary(lmer(dv~me1+me2+int+(1|subj),dat))
## Linear mixed model fit by REML ['lmerMod']
## Formula: dv ~ me1 + me2 + int + (1 | subj)
##    Data: dat
##
## REML criterion at convergence: 127.6
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -1.90187 -0.60083  0.04319  0.64841  1.78553
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subj     (Intercept) 0.5      0.7071
##  Residual             0.5      0.7071
## Number of obs: 48, groups:  subj, 12
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   0.3000     0.2282   1.314
## me1          -0.0500     0.1021  -0.490
## me2           0.1500     0.1021   1.470
## int          -0.1000     0.1021  -0.980
##
## Correlation of Fixed Effects:
##     (Intr) me1   me2
## me1 0.000
## me2 0.000  0.000
## int 0.000  0.000 0.000

We now compute the main effects and interactions using the above contrast coding, repeated below:

contrasts(cond)
##   ME1 ME2 Int
## a   1   1   1
## b   1  -1  -1
## c  -1   1  -1
## d  -1  -1   1

The main effect I computed matches the main effect in lme4:

round(t.test((x[,1]+x[,2]),(x[,3]+x[,4]),paired=TRUE)$statistic,2)
##     t 
## -0.49

The main effect II also matches the lmer output:

round(t.test((x[,1]+x[,3]),(x[,2]+x[,4]),paired=TRUE)$statistic,2)
##    t
## 1.47

The interaction can be written in two ways, as I discussed above: $(X_1 - X_2) - (X_3 - X_4) = (X_1 + X_4) - (X_2 + X_3)$.

round(t.test((x[,1]+x[,4]),(x[,2]+x[,3]),paired=TRUE)$statistic,2)
##     t 
## -0.98

Note that the paired t-test corresponding to the linear mixed model can also be written as a difference of differences:

round(t.test((x[,1]-x[,2]),(x[,3]-x[,4]),paired=TRUE)$statistic,2)
##     t 
## -0.98

## Conclusion

The ANOVA contrast coding in the linear mixed model can be decomposed into a series of paired t-tests, all of which have the same standard error in the denominator of the t-statistic. Only the magnitudes of the main effects and the interaction differ, and it is these magnitudes that determine the power you will need.
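To illustrate the last point, here is a minimal sketch using `power.t.test` (the values sd = 1 for the difference score and n = 12 are arbitrary assumptions; the deltas 0.2 and 0.4 are the main-effect and interaction numerators from the example means above):

```r
## paired t-tests on n subjects are one-sample t-tests on difference scores;
## assume (hypothetically) sd = 1 for the difference score and n = 12:
p_me  <- power.t.test(n = 12, delta = 0.2, sd = 1,
                      type = "one.sample")$power  # main effect numerator 0.2
p_int <- power.t.test(n = 12, delta = 0.4, sd = 1,
                      type = "one.sample")$power  # interaction numerator 0.4
p_me < p_int  # TRUE: the larger effect has higher power at the same n
```

With the same standard error, only the effect magnitude moves the power up or down.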