Tuesday, December 17, 2013

lmer vs Stan for a somewhat involved dataset.

Here is a comparison of lmer vs Stan output on a mildly complicated dataset from a psychology experiment (Kliegl et al. 2011). The data are here: https://www.dropbox.com/s/pwuz1g7rtwy17p1/KWDYZ_test.rda.

The data and paper are available from: http://openscience.uni-leipzig.de/index.php/mr2

I should say that datasets from psychology and psycholinguistics can be much more complicated than this. So this was only a modest test of Stan.

The basic result is that, compared to the lmer output, I was able to recover in Stan the parameter estimates (fixed effects) that were primarily of interest. The standard deviations of the variance components all come out pretty much the same in Stan and lmer. The correlations estimated in Stan are much smaller than those from lmer, but this is normal: Bayesian models seem to be more conservative when it comes to estimating correlations between random effects.

Traceplots are here: https://www.dropbox.com/s/91xhk7ywpvh9q24/traceplotkliegl2011.pdf

They look generally fine to me.

One very important fact about lmer vs Stan is that lmer took 23 seconds to return an answer, but Stan took 18,814 seconds (about 5 hours), running 500 iterations and 2 chains.
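
As a quick sanity check on those two timings, here is a small Python sketch that converts the Stan run time to hours and works out the slowdown factor relative to lmer. The variable and function names are made up for illustration and are not part of the original analysis; only the two timings come from the runs reported above.

```python
# Timings quoted in the post, in seconds.
lmer_seconds = 23
stan_seconds = 18_814


def slowdown(slow: float, fast: float) -> float:
    """Return how many times longer `slow` is than `fast`."""
    return slow / fast


# Convert the Stan run time to hours and compute the slowdown factor.
stan_hours = stan_seconds / 3600
ratio = slowdown(stan_seconds, lmer_seconds)

print(f"Stan run: {stan_hours:.1f} hours")
print(f"Stan / lmer: about {ratio:.0f}x")
```

So for this dataset the Stan run took a little over five hours, roughly 800 times as long as lmer.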

One caveat is that I still need to figure out how to speed up Stan so that we get the best possible performance out of it.


Bradley Spahn said...

Hi Shravan,

The efficiency of your Stan code aside, I think we shouldn't be surprised that a Stan model took longer than lmer. Any of the generic Bayesian samplers (BUGS, JAGS, Stan, etc.) will get trounced by well-written model-specific code. While the generic packages try to optimize as best they can, they can't compare to well-written code. I think you'd find similar results if you compared any of the MCMCpack results to a Stan model as well.

Titus said...

My limited experience is consistent with your numbers, Shravan: the time it took Stan to converge was about 1000 times longer than what lmer needed for the analogous model. If this can't be improved substantially, Stan might be pretty much useless for many practical applications in our research field. Most of my colleagues are running lmer models that take hours or days to converge. Even a factor-of-10 slowdown is very difficult to swallow in these situations.

BTW, when I tried to access the traceplot I got a 404.

Shravan Vasishth said...

With Stan 2.5, things have become much, much faster. For psycholinguistics at least, speed is no longer a reason to not use it.