Tuesday, December 17, 2013
Here is a comparison of lmer and Stan output on a mildly complicated dataset from a psychology experiment (Kliegl et al. 2011). The data are here: https://www.dropbox.com/s/pwuz1g7rtwy17p1/KWDYZ_test.rda.
The data and paper are available from: http://openscience.uni-leipzig.de/index.php/mr2
I should say that datasets from psychology and psycholinguistics can be much more complicated than this one, so this was only a modest test of Stan.
The basic result is that in Stan I was able to recover the fixed-effects estimates that were primarily of interest, matching the lmer output. The standard deviations of the variance components also come out pretty much the same in Stan as in lmer. The correlations between random effects estimated in Stan are much smaller than lmer's, but this is normal: the Bayesian models seem to be more conservative when estimating these correlations.
Traceplots are here: https://www.dropbox.com/s/91xhk7ywpvh9q24/traceplotkliegl2011.pdf
They look generally fine to me.
One very important practical difference between lmer and Stan: lmer took 23 seconds to return an answer, while Stan took 18,814 seconds (about 5 hours) to run 2 chains of 500 iterations each.
One caveat is that I still have to figure out how to speed up Stan so that we get the best performance possible out of it.
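As a first step, here is a minimal sketch of the kind of speed-up worth trying with the current rstan interface: cache the compiled model so it is not recompiled on every run, and run the chains in parallel. The file name kwdyz_model.stan and the data list stan_dat are placeholders, not the actual model and data used above.

library(rstan)
rstan_options(auto_write = TRUE)              # cache the compiled model on disk
options(mc.cores = parallel::detectCores())   # run one chain per core
## fit <- stan(file = "kwdyz_model.stan", data = stan_dat,
##             chains = 2, iter = 500)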
Monday, December 16, 2013
The most common linear mixed models in psycholinguistics, using JAGS and Stan
As part of my course in Bayesian data analysis, I have put up some common linear mixed models that we fit in psycholinguistics, written in JAGS and Stan. Comments and suggestions for improvement are most welcome.
Code: http://www.ling.uni-potsdam.de/~vasishth/lmmexamplecode.txt
Data: http://www.ling.uni-potsdam.de/~vasishth/data/gibsonwu2012data.txt
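To give a taste of what is at the link (this is not the linked code itself, just a minimal sketch of the simplest model in the family): a varying-intercepts model, written in current Stan syntax and fit from R via rstan. The data names N, J, subj, x, and y are placeholders.

library(rstan)

lmm_code <- "
data {
  int<lower=1> N;                       // number of observations
  int<lower=1> J;                       // number of subjects
  array[N] int<lower=1, upper=J> subj;  // subject index per observation
  vector[N] x;                          // predictor (e.g., sum-coded condition)
  vector[N] y;                          // response (e.g., log reading time)
}
parameters {
  real beta0;                 // intercept
  real beta1;                 // slope
  vector[J] u;                // by-subject intercept adjustments
  real<lower=0> sigma_u;      // between-subject sd
  real<lower=0> sigma_e;      // residual sd
}
model {
  u ~ normal(0, sigma_u);
  y ~ normal(beta0 + beta1 * x + u[subj], sigma_e);
}
"
## fit <- stan(model_code = lmm_code,
##             data = list(N = N, J = J, subj = subj, x = x, y = y))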
Tuesday, October 08, 2013
New course on Bayesian data analysis for psycholinguistics
I decided to teach a basic course on Bayesian data analysis with a focus on psycholinguistics. The course website is below. How could this possibly be a bad idea!
http://www.ling.uni-potsdam.de/~vasishth/advanceddataanalysis.html
Friday, March 15, 2013
How are the random effects (BLUPs) `predicted' in linear mixed models?
In linear mixed models, we fit models like the following (the Laird-Ware formulation; see, e.g., Pinheiro and Bates 2000):
\begin{equation}
Y = X\beta + Zu + \epsilon
\end{equation}
Let $u\sim N(0,\sigma_u^2)$, independent of $\epsilon\sim N(0,\sigma^2)$.
Given $Y$, the ``minimum mean square error predictor'' of $u$ is the conditional expectation:
\begin{equation}
\hat{u} = E(u\mid Y)
\end{equation}
We can find $E(u\mid Y)$ as follows. We write the joint distribution of $Y$ and $u$ as:
\begin{equation}
\begin{pmatrix}
Y \\
u
\end{pmatrix}
\sim
N\left(
\begin{pmatrix}
X\beta\\
0
\end{pmatrix},
\begin{pmatrix}
V_Y & C_{Y,u}\\
C_{u,Y} & V_u \\
\end{pmatrix}
\right)
\end{equation}
Here $V_Y = Var(Y)$, $V_u = Var(u)$, and $C_{u,Y} = C_{Y,u}^T = Cov(u, Y)$ are the various variance-covariance matrices.
It is a standard fact about the multivariate normal distribution (the conditional distribution of one jointly normal vector given another) that
\begin{equation}
u\mid Y \sim N\left( C_{u,Y}V_Y^{-1}(Y-X\beta),\; V_u - C_{u,Y} V_Y^{-1} C_{Y,u} \right)
\end{equation}
Taking the conditional expectation gives the predictor:
\begin{equation}
\hat{u} = C_{u,Y}V_Y^{-1}(Y-X\beta)
\end{equation}
Substituting $\hat{\beta}$ for $\beta$, we get:
\begin{equation}
BLUP(u) = \hat{u}(\hat{\beta}) = C_{u,Y}V_Y^{-1}(Y-X\hat{\beta})
\end{equation}
Here is a working example:
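As a minimal sketch (using the sleepstudy data that ships with lme4 as a stand-in dataset, with a single variance component $\sigma_u^2$ for by-subject intercepts), we can compute the BLUPs by hand from the formula above and compare them with lme4's ranef():

library(lme4)

## Varying-intercepts model for the sleepstudy data.
m <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)

## Pieces of the Laird-Ware formulation.
X      <- getME(m, "X")                   # fixed-effects design matrix
Z      <- as.matrix(getME(m, "Z"))        # random-effects design matrix
beta   <- fixef(m)                        # hat(beta)
sig2_e <- sigma(m)^2                      # hat(sigma)^2
sig2_u <- as.numeric(VarCorr(m)$Subject)  # hat(sigma_u)^2
y      <- sleepstudy$Reaction

## V_Y = Var(Y) and C_{u,Y} = Cov(u, Y) for this model.
V_Y  <- sig2_u * Z %*% t(Z) + sig2_e * diag(nrow(X))
C_uY <- sig2_u * t(Z)

## hat(u) = C_{u,Y} V_Y^{-1} (Y - X hat(beta))
u_hat <- C_uY %*% solve(V_Y, y - X %*% beta)

## Compare with lme4's conditional modes.
cbind(by_hand = u_hat[1:5], lme4 = ranef(m)$Subject[1:5, 1])

The two columns agree up to numerical error; the conditional mean and the conditional mode coincide here because everything is Gaussian.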
Correlations of fixed effects in linear mixed models
Ever wondered what those correlations of fixed effects are in a linear mixed model? As an example, suppose ten subjects each provide one response $Y_{i,j}$ in each of two conditions $j=1,2$, and we fit a cell-means model with a fixed effect $\beta_j$ per condition and by-subject random intercepts. The estimated correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$ comes out to $0.988$. Note that
$\hat{\beta}_1 = (Y_{1,1} + Y_{2,1} + \dots + Y_{10,1})/10=10.360$
and
$\hat{\beta}_2 = (Y_{1,2} + Y_{2,2} + \dots + Y_{10,2})/10 = 11.040$
From this we can recover the correlation $0.988$ as follows (assuming the varying-intercepts model $Y_{i,j} = \beta_j + u_i + \epsilon_{i,j}$). Both condition means contain the same subject effects $u_i$, so $Cov(\hat{\beta}_1, \hat{\beta}_2) = Var(\bar{u}) = \hat{\sigma}_u^2/10$, while $Var(\hat{\beta}_j) = (\hat{\sigma}_u^2 + \hat{\sigma}^2)/10$. The correlation is therefore $\hat{\sigma}_u^2/(\hat{\sigma}_u^2 + \hat{\sigma}^2)$, which equals $0.988$ for the estimated variance components.
By comparison, in the linear model version of the above (the same cell-means fit with lm, ignoring the subject grouping), the estimated correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$ is exactly $0$: $Var(\hat{\beta}) = \hat{\sigma}^2 (X^T X)^{-1}$, and the two columns of the cell-means design matrix are orthogonal, so the off-diagonal entry is zero.
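To see both facts side by side, here is a small simulation (hypothetical data, not the original): ten subjects, two conditions, cell-means coding, and large between-subject variability. The mixed-model correlation comes out close to 1, while the lm correlation is exactly 0.

library(lme4)

set.seed(1)
n_subj <- 10
subj <- factor(rep(1:n_subj, each = 2))
cond <- factor(rep(c("c1", "c2"), n_subj))
u    <- rnorm(n_subj, sd = 3)              # subject effects, large sigma_u
y    <- ifelse(cond == "c1", 10, 11) + u[subj] + rnorm(2 * n_subj, sd = 0.3)

## Mixed model: both condition means contain the same subject effects,
## so the fixed-effects estimates are highly correlated.
m1 <- lmer(y ~ -1 + cond + (1 | subj))
cov2cor(as.matrix(vcov(m1)))

## Plain linear model: the cell-means columns of X are orthogonal,
## so the estimated correlation is exactly 0.
m0 <- lm(y ~ -1 + cond)
cov2cor(vcov(m0))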
Wednesday, January 23, 2013
Linear models summary sheet
As part of my long slog towards statistical understanding, I started making notes on the very specific topic of linear models. The details are tricky and hard to keep in mind, and going back and forth between books and notes to review them is tedious. So I have tried to summarize the basic ideas in a few pages (the summary sheet is not yet complete).
It's not quite a cheat sheet, so I call it a summary sheet.
Here is the current version:
https://github.com/vasishth/StatisticsNotes
Needless to say (although I feel compelled to say it), the document is highly derivative of the lecture notes I have been reading. Corrections, comments, and/or suggestions for improvement are most welcome.