
Tuesday, December 17, 2013

lmer vs Stan for a somewhat involved dataset.

Here is a comparison of lmer and Stan output on a mildly complicated dataset from a psychology experiment (Kliegl et al. 2011). The data are here: https://www.dropbox.com/s/pwuz1g7rtwy17p1/KWDYZ_test.rda.

The data and paper are available from: http://openscience.uni-leipzig.de/index.php/mr2

I should say that datasets from psychology and psycholinguistics can be much more complicated than this one, so this was only a modest test of Stan.

The basic result is that I was able to recover in Stan the fixed-effects estimates that were primarily of interest, and they are comparable to the lmer output. The standard deviations of the variance components come out pretty much the same in Stan as in lmer. The correlations estimated in Stan are much smaller than in lmer, but this is expected: the Bayesian models seem to be more conservative when it comes to estimating correlations between random effects.

Traceplots are here: https://www.dropbox.com/s/91xhk7ywpvh9q24/traceplotkliegl2011.pdf

They look generally fine to me.

One very important fact about lmer vs Stan is that lmer took 23 seconds to return an answer, whereas Stan took 18,814 seconds (about 5.2 hours) to run 2 chains of 500 iterations each.

One caveat is that I still have to figure out how to tune Stan so that we get the best possible performance out of it.
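One easy wall-clock saving (it does not make the sampler itself any faster) is to run the chains in parallel. Below is a minimal sketch of the approach commonly used with rstan, reusing the already-compiled model via the fit argument; fit2 and dat2 refer to the objects in the Details section below.

library(parallel)
library(rstan)

## run the two chains on separate cores and recombine them;
## fit2 is the compiled stanfit object and dat2 the data list shown under Details
sflist <- mclapply(1:2, function(i)
            stan(fit = fit2, data = dat2, chains = 1, chain_id = i,
                 iter = 500, seed = 12345),
          mc.cores = 2)
fit2par <- sflist2stanfit(sflist)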

Details:
key:
beta[1] = Intercept
beta[2] = c1
beta[3] = c2
beta[4] = c3
1. Fixed effects, Stan vs lmer:
Stan:
Predictors:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1] 381.1 3.5 7.0 367.5 376.2 381.1 385.9 394.1 4 1.4
beta[2] 31.6 0.8 3.5 23.9 29.7 32.0 34.1 37.9 17 1.2
beta[3] 14.0 0.1 2.2 9.9 12.5 13.8 15.4 18.3 292 1.0
beta[4] 2.9 0.1 2.0 -1.3 1.5 3.0 4.2 7.0 251 1.0
lmer:
Estimate Std. Error t value
beta[1] 389.73 7.15 54.5
beta[2] 33.78 3.31 10.2
beta[3] 13.98 2.32 6.0
beta[4] 2.75 2.22 1.2
2. Random effects, Stan vs lmer:
Stan: Variance Std.Dev. Corr
# id (Intercept) 3152.9 56.151
# c1 531.6 23.056 0.5054
# c2 89.3 9.4499 -0.01583 0.11337
# c3 77.5 8.8034 -0.094069 -0.4715 0.028849
# Residual 4886 69.9
lmer: Variance Std.Dev. Corr
# id (Intercept) 3098.3 55.66
# c1 550.8 23.47 0.603
# c2 121.0 11.00 -0.129 -0.014
# c3 93.2 9.65 -0.247 -0.846 0.376
# Residual 4877.1 69.84
## calculating the random-effect correlations from the Stan output:
r12: 654.3 / (56.151 * 23.056) = 0.5054
r13: -8.4 / (56.151 * 9.4499) = -0.01583
r14: -46.5 / (56.151 * 8.8034) = -0.094069
r23: 24.7 / (23.056 * 9.4499) = 0.11337
r24: -95.7 / (23.056 * 8.8034) = -0.4715
r34: 2.4 / (9.4499 * 8.8034) = 0.028849
## original Stan output (posterior summaries for Sigma; columns as in the fixed-effects table above):
Sigma[1,1] 3152.9 32.4 589.5 2197.9 2729.1 3090.3 3522.1 4432.8 331 1.0
Sigma[1,2] 654.3 8.5 180.7 367.8 522.5 628.3 767.5 1022.8 455 1.0
Sigma[1,3] -8.4 6.4 91.8 -203.6 -58.8 -4.5 49.7 161.7 207 1.0
Sigma[1,4] -46.5 7.4 91.1 -235.8 -101.9 -41.4 11.0 118.9 151 1.0
Sigma[2,1] 654.3 8.5 180.7 367.8 522.5 628.3 767.5 1022.8 455 1.0
Sigma[2,2] 531.6 6.4 109.0 357.4 455.4 517.5 599.2 780.8 289 1.0
Sigma[2,3] 24.7 2.9 37.3 -50.4 0.6 26.0 49.8 94.7 162 1.0
Sigma[2,4] -95.7 4.2 43.6 -186.6 -121.0 -91.4 -68.0 -16.6 109 1.0
Sigma[3,1] -8.4 6.4 91.8 -203.6 -58.8 -4.5 49.7 161.7 207 1.0
Sigma[3,2] 24.7 2.9 37.3 -50.4 0.6 26.0 49.8 94.7 162 1.0
Sigma[3,3] 89.3 11.6 53.1 11.5 50.0 85.7 120.6 207.4 21 1.1
Sigma[3,4] 2.4 2.8 21.2 -31.6 -11.0 -0.6 13.4 49.1 56 1.0
Sigma[4,1] -46.5 7.4 91.1 -235.8 -101.9 -41.4 11.0 118.9 151 1.0
Sigma[4,2] -95.7 4.2 43.6 -186.6 -121.0 -91.4 -68.0 -16.6 109 1.0
Sigma[4,3] 2.4 2.8 21.2 -31.6 -11.0 -0.6 13.4 49.1 56 1.0
Sigma[4,4] 77.5 9.7 44.5 11.2 45.5 68.6 102.4 175.3 21 1.1
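The same correlations can be read off in one step with cov2cor(); a minimal check, plugging in the posterior means of Sigma shown above:

## correlation matrix implied by the posterior-mean Sigma above
Sigma.hat <- matrix(c(3152.9,  654.3,   -8.4,  -46.5,
                       654.3,  531.6,   24.7,  -95.7,
                        -8.4,   24.7,   89.3,    2.4,
                       -46.5,  -95.7,    2.4,   77.5),
                    nrow = 4, byrow = TRUE)
round(cov2cor(Sigma.hat), 3)   # off-diagonals reproduce r12, ..., r34 above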
#LMER fit: time taken: 23 seconds
# Groups Name Variance Std.Dev. Corr
# id (Intercept) 3098.3 55.66
# c1 550.8 23.47 0.603
# c2 121.0 11.00 -0.129 -0.014
# c3 93.2 9.65 -0.247 -0.846 0.376
# Residual 4877.1 69.84
#Number of obs: 28710, groups: id, 61
#
#Fixed effects:
# Estimate Std. Error t value
#(Intercept) 389.73 7.15 54.5
#c1 33.78 3.31 10.2
#c2 13.98 2.32 6.0
#c3 2.75 2.22 1.2
##Stan: time taken: 18,814 seconds (5.2261 hours)
> fit2 <- stan(model_code = klieglvaryingintslopes_codefast,
+ data = dat2,
+ iter = 500, chains = 2)
TRANSLATING MODEL 'klieglvaryingintslopes_codefast' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'klieglvaryingintslopes_codefast' NOW.
SAMPLING FOR MODEL 'klieglvaryingintslopes_codefast' NOW (CHAIN 1).
Iteration: 500 / 500 [100%] (Sampling)
Elapsed Time: 8255.26 seconds (Warm-up)
1650.07 seconds (Sampling)
9905.33 seconds (Total)
SAMPLING FOR MODEL 'klieglvaryingintslopes_codefast' NOW (CHAIN 2).
Iteration: 500 / 500 [100%] (Sampling)
Elapsed Time: 8062.87 seconds (Warm-up)
845.716 seconds (Sampling)
8908.59 seconds (Total)
> print(fit2)
Inference for Stan model: klieglvaryingintslopes_codefast.
2 chains, each with iter=500; warmup=250; thin=1;
post-warmup draws per chain=250, total post-warmup draws=500.
The model code:
klieglvaryingintslopes_codefast <- 'data {
  int<lower=1> N;
  real rt[N];                    // outcome
  real c1[N];                    // predictor
  real c2[N];                    // predictor
  real c3[N];                    // predictor
  int<lower=1> I;                // number of subjects
  int<lower=1, upper=I> id[N];   // subject id
  vector[4] mu_prior;            // vector of zeros passed in from R
}
parameters {
  vector[4] beta;                // intercept and slope
  vector[4] u[I];                // random intercept and slopes
  real<lower=0> sigma_e;         // residual sd
  vector<lower=0>[4] sigma_u;    // subj sd
  corr_matrix[4] Omega;          // correlation matrix for random intercepts and slopes
}
transformed parameters {
  matrix[4,4] D;
  D <- diag_matrix(sigma_u);
}
model {
  matrix[4,4] L;
  matrix[4,4] DL;
  real mu[N];                    // mu for likelihood
  // priors:
  beta ~ normal(0,50);
  sigma_e ~ cauchy(0,2);
  sigma_u ~ cauchy(0,2);
  Omega ~ lkj_corr(4.0);
  L <- cholesky_decompose(Omega);
  DL <- D * L;
  for (i in 1:I)                 // loop for subj random effects
    u[i] ~ multi_normal_cholesky(mu_prior, DL);
  for (n in 1:N) {
    mu[n] <- beta[1] + beta[2]*c1[n] + beta[3]*c2[n] + beta[4]*c3[n]
             + u[id[n], 1] + u[id[n], 2]*c1[n] + u[id[n], 3]*c2[n] + u[id[n], 4]*c3[n];
  }
  rt ~ normal(mu,sigma_e);       // likelihood
}
generated quantities {
  cov_matrix[4] Sigma;
  Sigma <- D * Omega * D;
}
'
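For completeness, here is a minimal sketch of how the data list dat2 passed to stan() above could be put together in R; the data-frame name d and its column names (rt, c1, c2, c3, id) are my assumptions and may not match the names in the .rda file exactly:

library(rstan)

load("KWDYZ_test.rda")    # assumed to provide a data frame, called d here

dat2 <- list(N        = nrow(d),
             rt       = d$rt,
             c1       = d$c1,
             c2       = d$c2,
             c3       = d$c3,
             I        = length(unique(d$id)),
             id       = as.integer(factor(d$id)),  # subject ids recoded as 1,...,I
             mu_prior = rep(0, 4))                 # zero mean vector for the random effects

fit2 <- stan(model_code = klieglvaryingintslopes_codefast,
             data = dat2, iter = 500, chains = 2)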


Monday, December 16, 2013

The most common linear mixed models in psycholinguistics, using JAGS and Stan

As part of my course in Bayesian data analysis, I have put up some common linear mixed models that we fit in psycholinguistics, written in JAGS and Stan. Comments and suggestions for improvement are most welcome.

Code: http://www.ling.uni-potsdam.de/~vasishth/lmmexamplecode.txt
Data: http://www.ling.uni-potsdam.de/~vasishth/data/gibsonwu2012data.txt

Tuesday, October 08, 2013

New course on bayesian data analysis for psycholinguistics

I decided to teach a basic course on Bayesian data analysis with a focus on psycholinguistics. Here is the course website (below). How could this possibly be a bad idea!

http://www.ling.uni-potsdam.de/~vasishth/advanceddataanalysis.html

Friday, March 15, 2013

How are the random effects (BLUPs) "predicted" in linear mixed models?




In linear mixed models, we fit models like these (the Laird-Ware formulation; see Pinheiro and Bates 2000, for example):

\begin{equation} Y = X\beta + Zu + \epsilon \end{equation}


Let u\sim N(0,\sigma_u^2), independent of \epsilon\sim N(0,\sigma^2).

Given Y, the "minimum mean square error predictor" of u is the conditional expectation:

\begin{equation} \hat{u} = E(u\mid Y) \end{equation}


We can find E(u\mid Y) as follows. We write the joint distribution of Y and u as:

\begin{equation} \begin{pmatrix} Y \\ u \end{pmatrix} \sim N\left( \begin{pmatrix} X\beta\\ 0 \end{pmatrix}, \begin{pmatrix} V_Y & C_{Y,u}\\ C_{u,Y} & V_u \\ \end{pmatrix} \right) \end{equation}


V_Y, C_{Y,u}, C_{u,Y}, V_u are the various variance-covariance matrices.
It is a standard fact about the multivariate normal (the conditional distribution of one block given the other) that

\begin{equation} u\mid Y \sim N\left(C_{u,Y}V_Y^{-1}(Y-X\beta),\; V_u - C_{u,Y} V_Y^{-1} C_{Y,u}\right) \end{equation}


This apparently allows you to derive the BLUPs:

\begin{equation} \hat{u} = C_{u,Y}V_Y^{-1}(Y-X\beta) \end{equation}


Substituting \hat{\beta} for \beta, we get:

\begin{equation} BLUP(u) = \hat{u}(\hat{\beta}) = C_{u,Y}V_Y^{-1}(Y-X\hat{\beta}) \end{equation}


Here is a working example:




> # Calculate the predicted random effects by hand for the ergoStool data
> (fm1<-lmer(effort~Type-1 + (1|Subject),ergoStool))
Linear mixed model fit by REML
Formula: effort ~ Type - 1 + (1 | Subject)
Data: ergoStool
AIC BIC logLik deviance REMLdev
133 143 -60.6 122 121
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 1.78 1.33
Residual 1.21 1.10
Number of obs: 36, groups: Subject, 9
Fixed effects:
Estimate Std. Error t value
TypeT1 8.556 0.576 14.8
TypeT2 12.444 0.576 21.6
TypeT3 10.778 0.576 18.7
TypeT4 9.222 0.576 16.0
Correlation of Fixed Effects:
TypeT1 TypeT2 TypeT3
TypeT2 0.595
TypeT3 0.595 0.595
TypeT4 0.595 0.595 0.595
>
> ## Here are the BLUPs we will estimate by hand:
> ranef(fm1)
$Subject
(Intercept)
1 1.7088e+00
2 1.7088e+00
3 4.2720e-01
4 -8.5439e-01
5 -1.4952e+00
6 -1.3546e-14
7 4.2720e-01
8 -1.7088e+00
9 -2.1360e-01
>
> ## this gives us all the variance components:
> VarCorr(fm1)
$Subject
(Intercept)
(Intercept) 1.7755
attr(,"stddev")
(Intercept)
1.3325
attr(,"correlation")
(Intercept)
(Intercept) 1
attr(,"sc")
[1] 1.1003
>
> # First, calculate the predicted random effect for subject 1:
>
> ## The variance for the random effect subject is the term C_{u,Y}:
> covar.u.y<-VarCorr(fm1)$Subject[1]
>
>
> # Estimated covariance between u_1 and Y_1
> ## make up a var-covar matrix from this:
> (cov.u.Y<-matrix(covar.u.y,1,4))
[,1] [,2] [,3] [,4]
[1,] 1.7755 1.7755 1.7755 1.7755
>
>
> # Estimated variance matrix for Y_1
> (V.Y<-matrix(1.7755,4,4)+diag(1.2106,4,4))
[,1] [,2] [,3] [,4]
[1,] 2.9861 1.7755 1.7755 1.7755
[2,] 1.7755 2.9861 1.7755 1.7755
[3,] 1.7755 1.7755 2.9861 1.7755
[4,] 1.7755 1.7755 1.7755 2.9861
>
> # Extract observations for subject 1
> (Y<-matrix(ergoStool$effort[1:4],4,1))
[,1]
[1,] 12
[2,] 15
[3,] 12
[4,] 10
>
> # Estimated fixed effects
> (beta.hat<-matrix(fixef(fm1),4,1))
[,1]
[1,] 8.5556
[2,] 12.4444
[3,] 10.7778
[4,] 9.2222
>
> # Predicted random effect
> cov.u.Y %*% solve(V.Y)%*%(Y-beta.hat)
[,1]
[1,] 1.7087
>
> # Compare with ranef command
> ranef(fm1)$Subject[1,1]
[1] 1.7088
>
> # Calculate predicted random effects for all subjects
> t(cov.u.Y %*% solve(V.Y)%*%(matrix(ergoStool$effort,4,9)-matrix(fixef(fm1),4,9)))
[,1]
[1,] 1.7087e+00
[2,] 1.7087e+00
[3,] 4.2717e-01
[4,] -8.5435e-01
[5,] -1.4951e+00
[6,] -1.3906e-14
[7,] 4.2717e-01
[8,] -1.7087e+00
[9,] -2.1359e-01
> ranef(fm1)
$Subject
(Intercept)
1 1.7088e+00
2 1.7088e+00
3 4.2720e-01
4 -8.5439e-01
5 -1.4952e+00
6 -1.3546e-14
7 4.2720e-01
8 -1.7088e+00
9 -2.1360e-01
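For a balanced design with only a random intercept, the same numbers can also be obtained from the familiar shrinkage form of the BLUP, \hat{u}_i = n\sigma_u^2/(n\sigma_u^2 + \sigma^2) \times \bar{r}_i, where \bar{r}_i is subject i's mean residual after subtracting the fixed effects. A minimal sketch, reusing fm1 and ergoStool from the transcript above and the variance estimates it printed:

## shrinkage form of the BLUP for a balanced random-intercept model,
## using fm1 and ergoStool from above
sigma.u2 <- 1.7755                      # between-subject variance (from VarCorr above)
sigma.e2 <- 1.2106                      # residual variance
n <- 4                                  # observations per subject
shrink <- n * sigma.u2 / (n * sigma.u2 + sigma.e2)

## per-subject mean residual after subtracting the fixed effects:
fe <- fixef(fm1)                        # TypeT1, ..., TypeT4, in level order
res <- ergoStool$effort - fe[as.integer(ergoStool$Type)]
resid.means <- tapply(res, ergoStool$Subject, mean)

round(shrink * resid.means, 4)          # matches ranef(fm1)$Subject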



Correlations of fixed effects in linear mixed models

Ever wondered what those correlations are in a linear mixed model? For example:

> (lm.full<-lmer(wear~material-1+(1|Subject), data = BHHshoes))
Linear mixed model fit by REML
Formula: wear ~ material - 1 + (1 | Subject)
Data: BHHshoes
AIC BIC logLik deviance REMLdev
62.9 66.9 -27.5 53.8 54.9
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 6.1009 2.470
Residual 0.0749 0.274
Number of obs: 20, groups: Subject, 10
Fixed effects:
Estimate Std. Error t value
materialA 10.630 0.786 13.5
materialB 11.040 0.786 14.1
Correlation of Fixed Effects:
matrlA
materialB 0.988

The estimated correlation between \hat{\beta}_1 and \hat{\beta}_2 is 0.988.  Note that

\hat{\beta}_1 = (Y_{1,1} + Y_{2,1} + \dots + Y_{10,1})/10=10.630

and 

\hat{\beta}_2 = (Y_{1,2} + Y_{2,2} + \dots + Y_{10,2})/10 = 11.040

From this we can recover the correlation 0.988 as follows:

> b1.vals<-subset(BHHshoes,material=="A")$wear
> b2.vals<-subset(BHHshoes,material=="B")$wear
>
> vcovmatrix<-var(cbind(b1.vals,b2.vals))
>
> covar<-vcovmatrix[1,2]
> sds<-sqrt(diag(vcovmatrix))
> covar/(sds[1]*sds[2])
b1.vals
0.98823

By comparison, in the linear model version of the above:

> summary(lm<-lm(wear~material-1,BHHshoes))
> X<-model.matrix(lm)
> 2.49^2*solve(t(X)%*%X)  ## 2.49 is the residual standard error from summary(lm)
materialA materialB
materialA 0.62001 0.00000
materialB 0.00000 0.62001

because Var(\hat{\beta}) = \hat{\sigma}^2 (X^T X)^{-1}, and the two material indicator columns of X are orthogonal, so the off-diagonal entries are zero: once the shared subject random effect is dropped, the two estimates are uncorrelated.
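As a cross-check, the 0.988 can also be read directly off the fitted mixed model, since vcov() returns the estimated variance-covariance matrix of the fixed effects:

## correlation of the fixed-effect estimates, straight from the lmer fit
cov2cor(as.matrix(vcov(lm.full)))   # off-diagonal entry reproduces the 0.988 above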

Wednesday, January 23, 2013

Linear models summary sheet

As part of my long slog towards statistical understanding, I started making notes on the very specific topic of linear models. The details are tricky and hard to keep in mind, and it is difficult to go back and forth between books and notes to try to review them. So I tried to summarize the basic ideas into a few pages (the summary sheet is not yet complete).

It's not quite a cheat sheet, so I call it a summary sheet.

Here is the current version:

https://github.com/vasishth/StatisticsNotes

Needless to say (although I feel compelled to say it), the document is highly derivative of the lecture notes I've been reading. Corrections and comments and/or suggestions for improvement are most welcome.