Search

Friday, April 07, 2017

A comment on the DeLong et al 2005 nine-labs replication failure

Nieuwland et al replication attempts of DeLong at al. 2005

Note: Revised 8 April 2017 following comments from Nieuwland and others.

Recently, Nieuwland et al did an interesting series of replication attempts of the DeLong et al 2005 Nature Neuroscience paper. Nieuwland et al report a failure to replicate the original effect. This is just a quick note to suggest that it may be difficult to claim that this is a replication failure. I focus only on the article data.

First we load the Nieuwland et al data (available from https://osf.io/eyzaq).

articles <-read.delim("public_article_data.txt", quote="")
#head(articles)

Then make sure columns are correct data types, and get lab names, convert cloze to z-scores as Nieuwland et al did:

articles$item <-as.factor(articles$item)
articles$cloze <- as.numeric(as.character(articles$cloze))

labnames<-unique(articles$lab)

# Z-transform cloze and prestimulus interval
articles$zcloze <- scale(articles$cloze, center = TRUE, scale = TRUE)
articles$zprestim <- scale(articles$prestim, center = TRUE, scale = TRUE)

Then use lmer to compute estimates from each lab separately. I also fit ``maximal’’ models in Stan (HT: Barr et al 2013 ;) but nothing much changes so I use these varying slopes models without correlations:

library(lme4)
## Loading required package: Matrix
# models for article fit for each lab separately:
res<-matrix(rep(NA,9*2),ncol=2)
for(i in 1:9){
model <- lmer(base100 ~ zcloze + ( 1| subject) +
                (zcloze|| item), 
              data = subset(articles,lab==labnames[i]),
                control=lmerControl(
                  optCtrl=list(maxfun=1e5)))
res[i,]<-summary(model)$coefficients[2,1:2]
}

Next, we fit a random-effects meta-analysis using the nine-labs replications.

Random effects meta-analysis:

The model specification was as follows.

Assume that:

  1. \(y_i\) is the observed effect in microvolts in the \(i\)-th study with \(i=1\dots n\);
  2. \(\theta\) is the true (unknown) effect, to be estimated by the model;
  3. \(\sigma_{i}^2\) is the true variance of the sampling distribution; each \(\sigma_i\) is estimated from the standard error available from the study \(i\);
  4. The variance parameter \(\tau^2\) represents between-study variance.

We can construct a random-effects meta-analysis model as follows:

\[\begin{equation} \begin{split} y_i \mid \theta_i,\sigma_i^2 \sim & N(\theta_i, \sigma_i^2) \quad i=1,\dots, n\\ \theta_i\mid \theta,\tau^2 \sim & N(\theta, \tau^2), \\ \theta \sim & N(0,10^2),\\ \tau \sim & N(0,10^2)T(0,) \hbox{(truncated normal)} \\ \end{split} \end{equation}\]

The priors shown above are just examples. Other priors can be used and are probably more suitable in this case. I use Gelman et al 2014’s non-centralized parameterization in any case in the Stan code below because of the sparse data we have here.

## Stan meta-analysis:
library(rstan)
## Loading required package: ggplot2
## Loading required package: StanHeaders
## rstan (Version 2.14.1, packaged: 2017-01-20 19:24:23 UTC, GitRev: 565115df4625)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## rstan_options(auto_write = TRUE)
## options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

dat<-list(y=res[,1],n=length(res[,1]),s=res[,2])

## is this the DeLong et al mean and se? Could add it to meta-analysis if original data ever becomes available.
#dat$y<-c(dat$y,0.4507929)
#dat$s<-c(dat$s,.5)
#dat$n<-dat$n+1

fit <- stan(file='rema2.stan', data=dat,
            iter=2000, chains=4, seed=987654321,
            control = list(adapt_delta = 0.99))

paramnames<-c("mu","tau","theta")
print(fit,pars=paramnames,prob=c(0.025,0.975))
## Inference for Stan model: rema2.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##          mean se_mean   sd  2.5% 97.5% n_eff Rhat
## mu       0.11       0 0.07 -0.04  0.25  2153    1
## tau      0.11       0 0.08  0.00  0.31  1496    1
## theta[1] 0.11       0 0.10 -0.09  0.31  4000    1
## theta[2] 0.13       0 0.10 -0.08  0.36  4000    1
## theta[3] 0.10       0 0.11 -0.13  0.33  4000    1
## theta[4] 0.10       0 0.11 -0.13  0.30  4000    1
## theta[5] 0.03       0 0.11 -0.23  0.22  4000    1
## theta[6] 0.06       0 0.13 -0.25  0.27  4000    1
## theta[7] 0.14       0 0.11 -0.05  0.39  4000    1
## theta[8] 0.18       0 0.11  0.00  0.42  4000    1
## theta[9] 0.15       0 0.11 -0.04  0.40  4000    1
## 
## Samples were drawn using NUTS(diag_e) at Sat Apr  8 19:51:28 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
params<-extract(fit,pars=paramnames)

Results

The interesting thing here is the posterior distribution of \(\mu\), the effect of cloze probability on the N400.

## posterior probability that mu is > 0:
mean(params$mu>0)
## [1] 0.9345
mean(params$mu)
## [1] 0.1084961
## lower and upper bounds of credible intervals:
lower<-quantile(params$mu,prob=0.025)
upper<-quantile(params$mu,prob=0.975)
SE<-(upper-lower)/4 ## approximate, assuming symmetry which seems reasonable here
hist(params$mu,main="Posterior distribution of effect \n Article",xlab="mu",freq=F)
abline(v=0)
arrows(x0=lower,y0=1,x1=upper,y1=1,angle=90,code=3)

Thus, even in the Nieuwland et al nine-labs data, the N400 amplitude decreases (becomes more positive) with increasing cloze probability. The posterior probability that the coefficient for cloze is greater than 0 is 94%. From this, one cannot really say that the Nieuwland et al studies are a failure to replicate the DeLong study.

The next step would be to compare the original DeLong et al data using the same analysis done above. But I was unable to get the original data up until now, so I’m posting my comment on my blog.

Prestimulus interval

Mante Nieuwland pointed out to me that the effect can be seen already in the prestimulus interval. Here is the evidence for that.

# models for article fit for each lab separately:
res<-matrix(rep(NA,9*2),ncol=2)
for(i in 1:9){
model <- lmer(prestim ~ zcloze + ( 1| subject) +
                (zcloze|| item), 
              data = subset(articles,lab==labnames[i]),
                control=lmerControl(
                  optCtrl=list(maxfun=1e5)))
res[i,]<-summary(model)$coefficients[2,1:2]
}

dat<-list(y=res[,1],n=length(res[,1]),s=res[,2])

## is this the DeLong et al mean and se? Could add it to meta-analysis if original data ever becomes available.
#dat$y<-c(dat$y,0.4507929)
#dat$s<-c(dat$s,.5)
#dat$n<-dat$n+1

fit <- stan(file='rema2.stan', data=dat,
            iter=2000, chains=4, seed=987654321,
            control = list(adapt_delta = 0.99))

paramnames<-c("mu","tau","theta")
print(fit,pars=paramnames,prob=c(0.025,0.975))
## Inference for Stan model: rema2.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##           mean se_mean   sd  2.5% 97.5% n_eff Rhat
## mu        0.07       0 0.07 -0.06  0.21  1497    1
## tau       0.12       0 0.08  0.01  0.30   897    1
## theta[1]  0.18       0 0.11  0.00  0.41  2632    1
## theta[2] -0.01       0 0.11 -0.26  0.17  2042    1
## theta[3]  0.11       0 0.11 -0.10  0.36  4000    1
## theta[4]  0.13       0 0.10 -0.04  0.34  4000    1
## theta[5]  0.03       0 0.09 -0.16  0.20  4000    1
## theta[6]  0.01       0 0.11 -0.23  0.20  4000    1
## theta[7]  0.03       0 0.10 -0.19  0.22  4000    1
## theta[8]  0.09       0 0.09 -0.08  0.28  4000    1
## theta[9]  0.11       0 0.09 -0.06  0.29  4000    1
## 
## Samples were drawn using NUTS(diag_e) at Sat Apr  8 19:51:33 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
params<-extract(fit,pars=paramnames)

## posterior probability that mu is > 0:
mean(params$mu>0)
## [1] 0.88175
mean(params$mu)
## [1] 0.07359659
## lower and upper bounds of credible intervals:
lower<-quantile(params$mu,prob=0.025)
upper<-quantile(params$mu,prob=0.975)
SE<-(upper-lower)/4 ## approximate, assuming symmetry which seems reasonable here
hist(params$mu,main="Posterior distribution of effect \n Prestimulus interval",xlab="mu",freq=F)
abline(v=0)
arrows(x0=lower,y0=1,x1=upper,y1=1,angle=90,code=3)

Retrospective power calculation

We now compute retrospective power and Type S/M errors:

## Gelman and Carlin function:
retrodesign <- function(A, s, alpha=.05, df=Inf, n.sims=10000){
  z <- qt(1-alpha/2, df)
  p.hi <- 1 - pt(z-A/s, df)
  p.lo <- pt(-z-A/s, df)
  power <- p.hi + p.lo
  typeS <- p.lo/power
  estimate <- A + s*rt(n.sims,df)
  significant <- abs(estimate) > s*z
  exaggeration <- mean(abs(estimate)[significant])/A
  return(list(power=power, typeS=typeS, exaggeration=exaggeration))
}

retrodesign(.11, SE)
## $power
##     97.5% 
## 0.3761173 
## 
## $typeS
##        97.5% 
## 0.0004168559 
## 
## $exaggeration
## [1] 1.616742

The bad news here is that if the meta-analysis posterior mean for the article is the true effect, then power is about 34%.

If we want about 80% power, we need to halve that SE:

retrodesign(.11, 0.072/2)
## $power
## [1] 0.8633715
## 
## $typeS
## [1] 3.063012e-07
## 
## $exaggeration
## [1] 1.079705

I think that implies that you need four times as many subjects than you had.

Acknowledgements

Thanks to Mante Nieuwland and Stephen Politzer-Ahles for patiently explaining a lot of details to me, and for sharing their data and code.

Appendix: Stan code

Here is the meta-analysis Stan code with the Gelman-style reparameterization. y_tilde is not used above, but yields posterior predictive values, and could be used profitably in future replication attempts.

data {
    int<lower=0> n; //number of studies
    real y[n]; // estimated effect
    real<lower=0> s[n]; // SEs of effect
}
parameters{
    real mu; //population mean
    real<lower=0> tau;  // between study variability    
    vector[n] eta; //study level errors
} 
transformed parameters {
    vector[n] theta;   // study effects
    theta = mu + tau*eta;
}
model {
eta ~ normal(0,1);
y ~ normal(theta,s);
}
generated quantities{
vector[n] y_tilde;
for(i in 1:n){
  y_tilde[i] = normal_rng(theta[i],s[i]); 
  }
}

Monday, March 27, 2017

Fitting Bayesian Linear Mixed Models for continuous and binary data using Stan: A quick tutorial

I want to give a quick tutorial on fitting Linear Mixed Models (hierarchical models) with a full variance-covariance matrix for random effects (what Barr et al 2013 call a maximal model) using Stan.

For a longer version of this tutorial, see: Sorensen, Hohenstein, Vasishth, 2016.

Prerequisites: You need to have R and preferably RStudio installed; RStudio is optional. You need to have rstan installed. See here. I am also assuming you have fit lmer models like these before:
lmer(log(rt) ~ 1+RCType+dist+int+(1+RCType+dist+int|subj) + (1+RCType+dist+int|item), dat)
If you don't know what the above code means, first read chapter 4 of my lecture notes.

The code and data format needed to fit LMMs in Stan

The data

I assume you have a 2x2 repeated measures design with some continuous measure like reading time (rt) data and want to do a main effects and interaction contrast coding. Let's say your main effects are RCType and dist, and the interaction is coded as int. All these contrast codings are $\pm 1$. If you don't know what contrast coding is, see these notes and read section 4.3 (although it's best to read the whole chapter). I am using an excerpt of an example data-set from Husain et al. 2014.
"subj" "item" "rt""RCType" "dist" "int"
1       14    438  -1        -1      1
1       16    531   1        -1     -1
1       15    422   1         1      1
1       18   1000  -1        -1      1 
...
Assume that these data are stored in R as a data-frame with name rDat.

The Stan code

Copy the following Stan code into a text file and save it as the file matrixModel.stan. For continuous data like reading times or EEG, you never need to touch this file again. You will only ever specify the design matrix X and the structure of the data. The rest is all taken care of.
data {
  int N;               //no trials
  int P;               //no fixefs
  int J;               //no subjects
  int n_u;             //no subj ranefs
  int K;               //no items
  int n_w;             //no item ranefs
  int subj[N]; //subject indicator
  int item[N]; //item indicator
  row_vector[P] X[N];           //fixef design matrix
  row_vector[n_u] Z_u[N];       //subj ranef design matrix
  row_vector[n_w] Z_w[N];       //item ranef design matrix
  vector[N] rt;                 //reading time
}

parameters {
  vector[P] beta;               //fixef coefs
  cholesky_factor_corr[n_u] L_u;  //cholesky factor of subj ranef corr matrix
  cholesky_factor_corr[n_w] L_w;  //cholesky factor of item ranef corr matrix
  vector[n_u] sigma_u; //subj ranef std
  vector[n_w] sigma_w; //item ranef std
  real sigma_e;        //residual std
  vector[n_u] z_u[J];           //spherical subj ranef
  vector[n_w] z_w[K];           //spherical item ranef
}

transformed parameters {
  vector[n_u] u[J];             //subj ranefs
  vector[n_w] w[K];             //item ranefs
  {
    matrix[n_u,n_u] Sigma_u;    //subj ranef cov matrix
    matrix[n_w,n_w] Sigma_w;    //item ranef cov matrix
    Sigma_u = diag_pre_multiply(sigma_u,L_u);
    Sigma_w = diag_pre_multiply(sigma_w,L_w);
    for(j in 1:J)
      u[j] = Sigma_u * z_u[j];
    for(k in 1:K)
      w[k] = Sigma_w * z_w[k];
  }
}

model {
  //priors
  beta ~ cauchy(0,2.5);
  sigma_e ~ cauchy(0,2.5);
  sigma_u ~ cauchy(0,2.5);
  sigma_w ~ cauchy(0,2.5);
  L_u ~ lkj_corr_cholesky(2.0);
  L_w ~ lkj_corr_cholesky(2.0);
  for (j in 1:J)
    z_u[j] ~ normal(0,1);
  for (k in 1:K)
    z_w[k] ~ normal(0,1);
  //likelihood
  for (i in 1:N)
    rt[i] ~ lognormal(X[i] * beta + Z_u[i] * u[subj[i]] + Z_w[i] * w[item[i]], sigma_e);
}

Define the design matrix

Since we want to test the main effects coded as the columns RCType, dist, and int, our design matrix will look like this:
# Make design matrix
X <- unname(model.matrix(~ 1 + RCType + dist + int, rDat))
attr(X, "assign") <- NULL

Prepare data for Stan

Stan expects the data in a list form, not as a data frame (unlike lmer). So we set it up as follows:
# Make Stan data
stanDat <- list(N = nrow(X),
P = ncol(X),
n_u = ncol(X),
n_w = ncol(X),
X = X,
Z_u = X,
Z_w = X,
J = nlevels(rDat$subj),
K = nlevels(rDat$item),
rt = rDat$rt,
subj = as.integer(rDat$subj),
item = as.integer(rDat$item))

Load library rstan and fit Stan model

library(rstan) 
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# Fit the model
matrixFit <- stan(file = "matrixModel.stan", data = stanDat,
iter = 2000, chains = 4)

Examine posteriors

print(matrixFit)
This print output is overly verbose. I wrote a simple function to get the essential information quickly.
stan_results<-function(m,params=paramnames){
  m_extr<-extract(m,pars=paramnames)
  par_names<-names(m_extr)
  means<-lapply(m_extr,mean)
  quantiles<-lapply(m_extr,
                    function(x)quantile(x,probs=c(0.025,0.975)))
  means<-data.frame(means)
  quants<-data.frame(quantiles)
  summry<-t(rbind(means,quants))
  colnames(summry)<-c("mean","lower","upper")
  summry
}
For example, if I want to see only the posteriors of the four beta parameters, I can write:
stan_results(matrixFit, params=c("beta[1]","beta[2]","beta[3]","beta[4]"))
For more details, such as interpreting the results and computing things like Bayes Factors, see Nicenboim and Vasishth 2016.

FAQ: What if I don't want to fit a lognormal?

In the Stan code above, I assume a lognormal function for the reading times:
 rt[i] ~ lognormal(X[i] * beta + Z_u[i] * u[subj[i]] + Z_w[i] * w[item[i]], sigma_e);
If this upsets you deeply and you want to use a normal distribution (and in fact, for EEG data this makes sense), go right ahead and change the lognormal to normal:
 rt[i] ~ normal(X[i] * beta + Z_u[i] * u[subj[i]] + Z_w[i] * w[item[i]], sigma_e);

FAQ: What if I my dependent measure is binary (0,1) responses?

Use this Stan code instead of the one shown above. Here, I assume that you have a column called response in the data, which has 0,1 values. These are the trial level binary responses.
data {
  int N;               //no trials
  int P;               //no fixefs
  int J;               //no subjects
  int n_u;             //no subj ranefs
  int K;               //no items
  int n_w;             //no item ranefs
  int subj[N]; //subject indicator
  int item[N]; //item indicator
  row_vector[P] X[N];           //fixef design matrix
  row_vector[n_u] Z_u[N];       //subj ranef design matrix
  row_vector[n_w] Z_w[N];       //item ranef design matrix
  int response[N];                 //response
}

parameters {
  vector[P] beta;               //fixef coefs
  cholesky_factor_corr[n_u] L_u;  //cholesky factor of subj ranef corr matrix
  cholesky_factor_corr[n_w] L_w;  //cholesky factor of item ranef corr matrix
  vector[n_u] sigma_u; //subj ranef std
  vector[n_w] sigma_w; //item ranef std
  vector[n_u] z_u[J];           //spherical subj ranef
  vector[n_w] z_w[K];           //spherical item ranef
}

transformed parameters {
  vector[n_u] u[J];             //subj ranefs
  vector[n_w] w[K];             //item ranefs
  {
    matrix[n_u,n_u] Sigma_u;    //subj ranef cov matrix
    matrix[n_w,n_w] Sigma_w;    //item ranef cov matrix
    Sigma_u = diag_pre_multiply(sigma_u,L_u);
    Sigma_w = diag_pre_multiply(sigma_w,L_w);
    for(j in 1:J)
      u[j] = Sigma_u * z_u[j];
    for(k in 1:K)
      w[k] = Sigma_w * z_w[k];
  }
}

model {
  //priors
  beta ~ cauchy(0,2.5);
  sigma_u ~ cauchy(0,2.5);
  sigma_w ~ cauchy(0,2.5);
  L_u ~ lkj_corr_cholesky(2.0);
  L_w ~ lkj_corr_cholesky(2.0);
  for (j in 1:J)
    z_u[j] ~ normal(0,1);
  for (k in 1:K)
    z_w[k] ~ normal(0,1);
  //likelihood
  for (i in 1:N)
    response[i] ~ bernoulli_logit(X[i] * beta + Z_u[i] * u[subj[i]] + Z_w[i] * w[item[i]]);
}

For reproducible example code

See here.