Thursday, August 16, 2018
We wrote a short tutorial on contrast coding, covering the common contrast coding scenarios, among them treatment, Helmert, ANOVA, sum, and sliding (successive differences) contrasts. The target audience is psychologists and linguists, but it is really for anyone running planned experiments.
The paper has not been submitted anywhere yet; we are keen to get user feedback before we do. Comments and criticism are very welcome: please post them on this blog, or email me.
Abstract:
Factorial experiments in research on memory, language, and in other areas are
often analyzed using analysis of variance (ANOVA). However, for experimental
factors with more than two levels, the ANOVA omnibus F-test is not informative
about the source of a main effect or interaction. This is unfortunate as
researchers typically have specific hypotheses about which condition means
differ from each other. A priori contrasts (i.e., comparisons planned before
the sample means are known) between specific conditions or combinations of
conditions are the appropriate way to represent such hypotheses in the
statistical model. Many researchers have pointed out that contrasts should be
"tested instead of, rather than as a supplement to, the ordinary `omnibus' F
test" (Hayes, 1973, p. 601). In this tutorial, we explain the mathematics
underlying different kinds of contrasts (i.e., treatment, sum, repeated,
Helmert, and polynomial contrasts), discuss their properties, and demonstrate
how they are applied in the R System for Statistical Computing (R Core Team,
2018). In this context, we explain the generalized inverse which is needed to
compute the weight coefficients for contrasts that test hypotheses that are not
covered by the default set of contrasts. A detailed understanding of contrast
coding is crucial for successful and correct specification in linear models
(including linear mixed models). Contrasts defined a priori yield far more
precise confirmatory tests of experimental hypotheses than standard omnibus
F-tests.
Full paper: https://arxiv.org/abs/1807.10451
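To give a flavor of the generalized-inverse approach mentioned in the abstract, here is a minimal R sketch (my own toy example with simulated data; the variable names are mine, not the paper's). It derives the contrast matrix for sliding (successive-differences) contrasts on a three-level factor from the hypothesis matrix:

library(MASS)  # for ginv() and fractions()

## Hypothesis matrix: each row states one planned comparison as
## weights on the condition means. H1: mu2 - mu1; H2: mu3 - mu2.
hypotheses <- rbind(H1 = c(-1, 1, 0),
                    H2 = c( 0, -1, 1))

## The contrast matrix is the generalized inverse of the hypothesis matrix.
Xc <- ginv(hypotheses)
fractions(Xc)  # display the weight coefficients as fractions

## Attach the contrasts to a factor and fit a linear model on toy data:
set.seed(1)
dat <- data.frame(cond = factor(rep(c("a", "b", "c"), each = 10)))
dat$y <- rnorm(30, mean = rep(c(10, 20, 40), each = 10), sd = 5)
contrasts(dat$cond) <- Xc
summary(lm(y ~ cond, data = dat))  # slopes estimate mu_b - mu_a and mu_c - mu_b

The two regression coefficients then directly test the planned comparisons, which is exactly the alternative to the omnibus F-test that the abstract describes.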
Thursday, July 26, 2018
Stan Pharmacometrics conference in Paris July 24 2018
I just got back from attending this amazing conference in Paris:
http://www.go-isop.org/stan-for-pharmacometrics---paris-france
A few people were disturbed/surprised by the fact that I am a linguist ("what are you doing at a pharmacometrics conference?"). I hasten to point out that two of the core developers of Stan are linguists too (Bob Carpenter and Mitzi Morris). People seem to think that all linguists do is correct other people's comma placement. However, despite my being a total outsider to the conference, the organizers were amazingly welcoming: they even allowed me to join the speakers' dinner, and treated me like a regular guest.
Here is a quick summary of what I learnt:
1. Gelman's talk: The only thing I remember from his talk was the statement that when economists fit multiple regression models and find that one predictor's formerly significant effect is wiped out by adding another predictor, they think that the new predictor explains the old one. Which is pretty funny. Another funny thing was that he had no slides at all, and was drawing figures in the air, apologizing for their low resolution.
2. Bob Carpenter gave an inspiring talk on the exciting stuff that's coming in Stan:
- Higher speeds (Stan 2.18 will be up to 80 times faster with 100 cores)
- Stan book
- New functionality (e.g., tuples, multivariate normal RNG)
- Gaussian process models will soon become tractable
- Blockless Stan is coming! This will make Stan code look more like JAGS (which is great). Stan will forever remain backward compatible, so old code will not break.
My conclusion was that in the next few years, things will improve a lot in terms of speed and in terms of what one can do.
3. Torsten and Stan:
- Torsten is a collection of functions for pharmacokinetic/pharmacodynamic (PK/PD) modeling with Stan.
- Bill Gillespie on Torsten and Stan: https://www.metrumrg.com/wp-content/uploads/2018/05/BayesianPmetricsMBSW2018.pdf
- Free courses on Stan and PK/PD modeling: https://www.metrumrg.com/courses/
4. Mitzi Morris gave a great talk on disease mapping (accident mapping in NYC) using conditional autoregressive (CAR) models. The idea is simple but great: one can model the correlations between neighboring boroughs. A straightforward application is in EEG: one could model data from all electrodes simultaneously, with correlations decreasing between more distant neighbors (see the sketch below). This is low-hanging fruit, especially with Stan 2.18 coming.
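To make the CAR idea concrete, here is a rough R sketch (my own illustration, not from the talk): the effects of neighboring units are tied together through a precision matrix built from an adjacency matrix, and the implied correlations decay with distance.

## Eight EEG electrodes in a chain; neighbors are adjacent electrodes.
n <- 8
W <- matrix(0, n, n)  # adjacency matrix
for (i in 1:(n - 1)) { W[i, i + 1] <- 1; W[i + 1, i] <- 1 }
D <- diag(rowSums(W))       # number of neighbors per electrode
alpha <- 0.9                # spatial dependence (|alpha| < 1 for a proper CAR)
tau <- 1                    # precision scale
Q <- tau * (D - alpha * W)  # CAR precision matrix
Sigma <- solve(Q)           # implied covariance of the electrode effects
round(cov2cor(Sigma), 2)    # correlations fall off with distance along the chain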
5. From Bob I learnt that one should never provide free consultation (which I am doing these days), because people then don't value your time; charging by the hour sharpens their focus. But I feel guilty charging people for my time, especially in medicine, where I consult for free: I'm a civil servant and already get paid by the state, and I get total freedom to do whatever I like. So it seems only fair that I serve the state in some useful way (other than studying processing differences in subject vs. object relative clauses, that is).
For psycholinguists, there is a lot in pharmacometrics that will be important for EEG and visual-world data: Gaussian process models, PK/PD modeling, and spatial and temporal modeling of a signal like EEG. These tools exist today, but we are not using them; Stan makes much of this possible now, or will very soon.
Summary: I'm impressed.
Friday, June 01, 2018
Soliciting comments on paper
I welcome comments and criticism on the following paper:
Title: The statistical significance filter leads to overoptimistic expectations of replicability
Authors: Vasishth, Mertzen, Jäger, Gelman
Abstract: It is well-known in statistics (e.g., Gelman & Carlin, 2014) that treating a result as publishable just because the p-value is less than 0.05 leads to overoptimistic expectations of replicability. These overoptimistic expectations arise due to Type M(agnitude) error: when underpowered studies yield significant results, effect size estimates are guaranteed to be exaggerated and noisy. These effects get published, leading to an overconfident belief in replicability. We demonstrate the adverse consequences of this statistical significance filter by conducting seven direct replication attempts (268 participants in total) of a recent paper (Levy & Keller, 2013). We show that the published claims are so noisy that even non-significant results are fully compatible with them. We also demonstrate the contrast between such small-sample studies and a larger-sample study; the latter generally yields a less noisy estimate but also a smaller effect magnitude, which looks less compelling but is more realistic. We reiterate several suggestions from the methodology literature for improving best practices.
You can download the pdf from here: https://osf.io/eyphj/
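To see the statistical significance filter in action, here is a quick R simulation (my own sketch with made-up numbers, not the paper's data): when power is low, the studies that happen to cross p < 0.05 necessarily overestimate the true effect.

set.seed(123)
true_effect <- 10  # assumed true effect
sd_err <- 100      # residual standard deviation; power is low at this n
n <- 30            # participants per simulated study
nsim <- 10000      # number of simulated studies

sims <- replicate(nsim, {
  y <- rnorm(n, mean = true_effect, sd = sd_err)
  c(est = mean(y), sig = t.test(y)$p.value < 0.05)
})
sig <- sims["sig", ] == 1
mean(sig)                     # observed power: low
mean(sims["est", ])           # all studies: mean estimate near the true effect
mean(abs(sims["est", sig]))   # significant studies only: exaggerated (Type M error)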