Search

Monday, March 27, 2017

Fitting Bayesian Linear Mixed Models for continuous and binary data using Stan: A quick tutorial

I want to give a quick tutorial on fitting Linear Mixed Models (hierarchical models) with a full variance-covariance matrix for random effects (what Barr et al 2013 call a maximal model) using Stan.

For a longer version of this tutorial, see: Sorensen, Hohenstein, Vasishth, 2016.

Prerequisites: You need to have R and preferably RStudio installed; RStudio is optional. You need to have rstan installed. See here. I am also assuming you have fit lmer models like these before:
lmer(log(rt) ~ 1+RCType+dist+int+(1+RCType+dist+int|subj) + (1+RCType+dist+int|item), dat)
If you don't know what the above code means, first read chapter 4 of my lecture notes.

The code and data format needed to fit LMMs in Stan

The data

I assume you have a 2x2 repeated measures design with some continuous measure like reading time (rt) data and want to do a main effects and interaction contrast coding. Let's say your main effects are RCType and dist, and the interaction is coded as int. All these contrast codings are $\pm 1$. If you don't know what contrast coding is, see these notes and read section 4.3 (although it's best to read the whole chapter). I am using an excerpt of an example data-set from Husain et al. 2014.
"subj" "item" "rt""RCType" "dist" "int"
1       14    438  -1        -1      1
1       16    531   1        -1     -1
1       15    422   1         1      1
1       18   1000  -1        -1      1 
...
Assume that these data are stored in R as a data-frame with name rDat.

The Stan code

Copy the following Stan code into a text file and save it as the file matrixModel.stan. For continuous data like reading times or EEG, you never need to touch this file again. You will only ever specify the design matrix X and the structure of the data. The rest is all taken care of.
data {
  int N;               //no trials
  int P;               //no fixefs
  int J;               //no subjects
  int n_u;             //no subj ranefs
  int K;               //no items
  int n_w;             //no item ranefs
  int subj[N]; //subject indicator
  int item[N]; //item indicator
  row_vector[P] X[N];           //fixef design matrix
  row_vector[n_u] Z_u[N];       //subj ranef design matrix
  row_vector[n_w] Z_w[N];       //item ranef design matrix
  vector[N] rt;                 //reading time
}

parameters {
  vector[P] beta;               //fixef coefs
  cholesky_factor_corr[n_u] L_u;  //cholesky factor of subj ranef corr matrix
  cholesky_factor_corr[n_w] L_w;  //cholesky factor of item ranef corr matrix
  vector[n_u] sigma_u; //subj ranef std
  vector[n_w] sigma_w; //item ranef std
  real sigma_e;        //residual std
  vector[n_u] z_u[J];           //spherical subj ranef
  vector[n_w] z_w[K];           //spherical item ranef
}

transformed parameters {
  vector[n_u] u[J];             //subj ranefs
  vector[n_w] w[K];             //item ranefs
  {
    matrix[n_u,n_u] Sigma_u;    //subj ranef cov matrix
    matrix[n_w,n_w] Sigma_w;    //item ranef cov matrix
    Sigma_u = diag_pre_multiply(sigma_u,L_u);
    Sigma_w = diag_pre_multiply(sigma_w,L_w);
    for(j in 1:J)
      u[j] = Sigma_u * z_u[j];
    for(k in 1:K)
      w[k] = Sigma_w * z_w[k];
  }
}

model {
  //priors
  beta ~ cauchy(0,2.5);
  sigma_e ~ cauchy(0,2.5);
  sigma_u ~ cauchy(0,2.5);
  sigma_w ~ cauchy(0,2.5);
  L_u ~ lkj_corr_cholesky(2.0);
  L_w ~ lkj_corr_cholesky(2.0);
  for (j in 1:J)
    z_u[j] ~ normal(0,1);
  for (k in 1:K)
    z_w[k] ~ normal(0,1);
  //likelihood
  for (i in 1:N)
    rt[i] ~ lognormal(X[i] * beta + Z_u[i] * u[subj[i]] + Z_w[i] * w[item[i]], sigma_e);
}

Define the design matrix

Since we want to test the main effects coded as the columns RCType, dist, and int, our design matrix will look like this:
# Make design matrix
X <- unname(model.matrix(~ 1 + RCType + dist + int, rDat))
attr(X, "assign") <- NULL

Prepare data for Stan

Stan expects the data in a list form, not as a data frame (unlike lmer). So we set it up as follows:
# Make Stan data
stanDat <- list(N = nrow(X),
P = ncol(X),
n_u = ncol(X),
n_w = ncol(X),
X = X,
Z_u = X,
Z_w = X,
J = nlevels(rDat$subj),
K = nlevels(rDat$item),
rt = rDat$rt,
subj = as.integer(rDat$subj),
item = as.integer(rDat$item))

Load library rstan and fit Stan model

library(rstan) 
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# Fit the model
matrixFit <- stan(file = "matrixModel.stan", data = stanDat,
iter = 2000, chains = 4)

Examine posteriors

print(matrixFit)
This print output is overly verbose. I wrote a simple function to get the essential information quickly.
stan_results<-function(m,params=paramnames){
  m_extr<-extract(m,pars=paramnames)
  par_names<-names(m_extr)
  means<-lapply(m_extr,mean)
  quantiles<-lapply(m_extr,
                    function(x)quantile(x,probs=c(0.025,0.975)))
  means<-data.frame(means)
  quants<-data.frame(quantiles)
  summry<-t(rbind(means,quants))
  colnames(summry)<-c("mean","lower","upper")
  summry
}
For example, if I want to see only the posteriors of the four beta parameters, I can write:
stan_results(matrixFit, params=c("beta[1]","beta[2]","beta[3]","beta[4]"))
For more details, such as interpreting the results and computing things like Bayes Factors, see Nicenboim and Vasishth 2016.

FAQ: What if I don't want to fit a lognormal?

In the Stan code above, I assume a lognormal function for the reading times:
 rt[i] ~ lognormal(X[i] * beta + Z_u[i] * u[subj[i]] + Z_w[i] * w[item[i]], sigma_e);
If this upsets you deeply and you want to use a normal distribution (and in fact, for EEG data this makes sense), go right ahead and change the lognormal to normal:
 rt[i] ~ normal(X[i] * beta + Z_u[i] * u[subj[i]] + Z_w[i] * w[item[i]], sigma_e);

FAQ: What if I my dependent measure is binary (0,1) responses?

Use this Stan code instead of the one shown above. Here, I assume that you have a column called response in the data, which has 0,1 values. These are the trial level binary responses.
data {
  int N;               //no trials
  int P;               //no fixefs
  int J;               //no subjects
  int n_u;             //no subj ranefs
  int K;               //no items
  int n_w;             //no item ranefs
  int subj[N]; //subject indicator
  int item[N]; //item indicator
  row_vector[P] X[N];           //fixef design matrix
  row_vector[n_u] Z_u[N];       //subj ranef design matrix
  row_vector[n_w] Z_w[N];       //item ranef design matrix
  int response[N];                 //response
}

parameters {
  vector[P] beta;               //fixef coefs
  cholesky_factor_corr[n_u] L_u;  //cholesky factor of subj ranef corr matrix
  cholesky_factor_corr[n_w] L_w;  //cholesky factor of item ranef corr matrix
  vector[n_u] sigma_u; //subj ranef std
  vector[n_w] sigma_w; //item ranef std
  vector[n_u] z_u[J];           //spherical subj ranef
  vector[n_w] z_w[K];           //spherical item ranef
}

transformed parameters {
  vector[n_u] u[J];             //subj ranefs
  vector[n_w] w[K];             //item ranefs
  {
    matrix[n_u,n_u] Sigma_u;    //subj ranef cov matrix
    matrix[n_w,n_w] Sigma_w;    //item ranef cov matrix
    Sigma_u = diag_pre_multiply(sigma_u,L_u);
    Sigma_w = diag_pre_multiply(sigma_w,L_w);
    for(j in 1:J)
      u[j] = Sigma_u * z_u[j];
    for(k in 1:K)
      w[k] = Sigma_w * z_w[k];
  }
}

model {
  //priors
  beta ~ cauchy(0,2.5);
  sigma_u ~ cauchy(0,2.5);
  sigma_w ~ cauchy(0,2.5);
  L_u ~ lkj_corr_cholesky(2.0);
  L_w ~ lkj_corr_cholesky(2.0);
  for (j in 1:J)
    z_u[j] ~ normal(0,1);
  for (k in 1:K)
    z_w[k] ~ normal(0,1);
  //likelihood
  for (i in 1:N)
    response[i] ~ bernoulli_logit(X[i] * beta + Z_u[i] * u[subj[i]] + Z_w[i] * w[item[i]]);
}

For reproducible example code

See here.

9 comments:

Unknown said...

I don't understand why you use the same variable simultaneously in both the fixed and the random effects.

Shravan Vasishth said...

Do you mean these lines when specifying the data as a list?

...
X = X,
Z_u = X,
Z_w = X,
...


Here, I am assuming that we want to fit a full variance covariance matrix for the "random effects", i.e., all factors that appear in the fixed effects specification in the design matrix X appear in the random effects design matrices. This may not necessarily be what one needs in specific situations. In those cases you should define the random effects design matrices differently, the way you need them to be.

skan said...

Hello.

I mean
lmer(log(rt) ~ 1+RCType+dist+int+(1+RCType+dist+int|subj) + (1+RCType+dist+int|item), dat)

As you can see "1+RCType+dist+int" appear as fixed factors and also as a random factor depending on subject and also as a random factor depending on the item.

Maybe it's right but I have never seen it before.
Usually you see some variables as fixed and some as random but not both altogether.

Shravan Vasishth said...

Hi Skan, you should perhaps read Barr et al 2013: http://www.sciencedirect.com/science/article/pii/S0749596X12001180


I'm assuming the reader of this blog knows that paper.

Unknown said...

Dear Dr. Shravan,

Thank you for sharing such a useful guide. I have a few questions about your stan/R code (I am still new to Bayesian modelling so please ignore them if you think they do not make sense.

1. This is not related to bayesian modelling, but I always see you use sum coding rather than deviation coding while other psycholinguists often use deviation coding. As long as I know, they do not differ that much in terms of the produced result, but is there any reason why you persist with sum coding?

2. Does your stan code include interactions between each random effect and fixed effect (e.g., lmer(log(rt) ~ 1+RCType*dist*int+(1+RCType*dist*int|subj) + (1+RCType*dist*int|item), dat))?

3. I have read some of your papers about how to fit bayesian liner mixed effect models but could not find what to do with 2-way or 3 way interactions. As your paper suggested, I plan to buy a textbook, Kruschke's book, to learn more about theoretical backgrounds behind bayesian statistics, but would also like to know which book/article I should read to learn practical uses of bayesian modelling (or maybe his/her book includes this issue?).

Thank you.

Sincerely yours

Unknown said...

this is just a report, I tried your code in this blog with your data in your github repository (copied and pasted) but I just received an error message and stan did not start sampling...

Shravan Vasishth said...

藤田宏樹 , I will post on contrast coding at some point. Regarding your error message, you'd have to give more information.

Anonymous said...

Hi Dr. Shravan,

I have been following your posts and twitter feed about low powered studies and I can't seem to come to a conclusion. I work with people with aphasia and recruiting is extremely difficult. I will be lucky to get 20 over two years so how does one deal with that issue. I haven't found any extensive treatment of dealing with difficult to recruit subjects and statisticians just give different answers. I have been learning Bayesian methods but I think the problem persists. What sort of guidance would you offer that our results are replicable and how can one justify low sample size to both grant committees and journals.

Thank you.

Shravan Vasishth said...

Anonymous, with human subjects' behavioral data, replicability is a pipe dream, except in some special cases where the effects are strong (e.g., Stroop effects). The best you can do is quantify your uncertainty about the parameter estimates, and eventually, once enough studies are out there, do some evidence synthesis (meta-analysis). See my recent paper, where I discuss this point: https://psyarxiv.com/zcf8s/