Thursday, February 11, 2021

Talk in Tuebingen: Individual differences in cue-weighting in sentence comprehension: An evaluation using Approximate Bayesian Computation

When: Feb 22 2021
Where: Universität Tübingen, Seminar für Sprachwissenschaft
How: Zoom

[This is part of the PhD work of Himanshu Yadav, and the project is led by him. Co-authors: Dario Paape, Garrett Smith, and Brian Dillon.]

Cue-based retrieval theories of sentence processing assume that syntactic dependencies are resolved through a content-addressable search process. An important recent claim is that in certain dependency types, the retrieval cues are weighted such that one cue dominates. This cue-weighting proposal aims to explain the observed average behavior. We show that there is systematic individual-level variation in cue weighting. Using the Lewis and Vasishth cue-based retrieval model, we estimated individual-level parameters for processing speed and cue weighting using data from 13 published reading studies; hierarchical Approximate Bayesian Computation (ABC) with Gibbs sampling was used to estimate the parameters. The modeling reveals a nuanced picture about cue-weighting: we find support for the idea that some participants weight cues, but not all do; and only fast readers tend to have the predicted cue weighting, suggesting that reading proficiency might be associated with cue weighting. A broader achievement of the work is to demonstrate how individual differences can be investigated in computational models of sentence processing using hierarchical ABC.
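For readers who have not seen ABC before, here is a minimal sketch of its simplest form, rejection ABC, in Python. It is not the hierarchical ABC with Gibbs sampling used in the project. The toy generative model (`simulate`), the uniform prior, the summary statistic, and the tolerance are all assumptions made up for illustration. The idea is to draw a parameter from the prior, simulate data from the model, and keep the draw only if the simulated summary lands close to the observed one.

```python
import random
import statistics

def simulate(mu, n, rng):
    """Hypothetical generative model: n reading-time-like draws (ms) around mu."""
    return [rng.gauss(mu, 50.0) for _ in range(n)]

def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated mean is within `tolerance` of the observed mean."""
    obs_summary = statistics.mean(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(200.0, 600.0)  # assumed prior on the parameter
        sim_summary = statistics.mean(simulate(mu, len(observed), rng))
        if abs(sim_summary - obs_summary) < tolerance:
            accepted.append(mu)
    return accepted

rng = random.Random(1)                # fixed seed for reproducibility
observed = simulate(400.0, 50, rng)   # stand-in for real data
posterior = abc_rejection(observed, n_draws=5000, tolerance=5.0, rng=rng)
print(len(posterior), round(statistics.mean(posterior), 1))
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is what makes this family of methods attractive for simulation-based models.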

Tuesday, February 02, 2021

Bayesian statistics: A tutorial taught at Experimental Methods for Language Acquisition research (EMLAR XVII 2021)

Bayesian statistics

Taught by Shravan Vasishth

When: Sometime between 13 and 15 April 2021
Where:

Bayesian methods are increasingly becoming mainstream in psychology and psycholinguistics. However, finding an entry point into using these methods is often difficult for researchers. In this tutorial, I will provide an informal introduction to the fundamental ideas behind Bayesian statistics, using examples illustrating applications to psycholinguistics. I will also illustrate some of the advantages of the Bayesian approach over the standardly used frequentist paradigms: uncertainty quantification, robust estimates, the ability to incorporate expert and/or prior knowledge into the data analysis, and the ability to flexibly define the generative process and thereby to directly address the actual research question (as opposed to a straw-man null hypothesis). Suggestions for further readings will be provided.

References

Bruno Nicenboim, Daniel Schad, and Shravan Vasishth. Introduction to Bayesian Data Analysis for Cognitive Science. 2021. Under contract with Chapman and Hall/CRC Statistics in the Social and Behavioral Sciences Series.

Daniel J. Schad, Michael Betancourt, and Shravan Vasishth. Towards a principled Bayesian workflow: A tutorial for cognitive science. Psychological Methods, 2020. In Press.

Shravan Vasishth, Daniela Mertzen, Lena A. Jäger, and Andrew Gelman. The statistical significance filter leads to overoptimistic expectations of replicability. Journal of Memory and Language, 103:151-175, 2018.

Shravan Vasishth, Bruno Nicenboim, Mary E. Beckman, Fangfang Li, and Eun Jong Kong. Bayesian data analysis in the phonetic sciences: A tutorial introduction. Journal of Phonetics, 71:141-161, 2018.

Bruno Nicenboim and Shravan Vasishth. Statistical methods for linguistic research: Foundational Ideas - Part II. Language and Linguistics Compass, 10:591-613, 2016.

Saturday, January 16, 2021

Applications are open for the fifth summer school in statistical methods for linguistics and psychology (SMLP)

The annual summer school, now in its fifth edition, will take place 6-10 Sept 2021, and will be conducted virtually over Zoom. The summer school is free and is funded by the DFG through SFB 1287.
Instructors: Doug Bates, Reinhold Kliegl, Phillip Alday, Bruno Nicenboim, Daniel Schad, Anna Laurinavichyute, Paula Lisson, Audrey Buerki, Shravan Vasishth.
There will be four streams running in parallel: introductory and advanced courses on frequentist and Bayesian statistics. Details, including how to apply, are here.

Saturday, January 02, 2021

Should statistical data analysis in psychology be like defecating?

There was an interesting thread on Twitter about linear mixed models (LMMs) that someone made me aware of recently. (I stopped following Twitter because of its general inanity, but this thread is worth commenting on.) I am trying to recreate the gist of the complaints from memory. My list is an amalgamation of comments from different people; I think that the thread started here:

To summarize the complaints:

- LMMs take too long to fit (cf. repeated measures ANOVA). This slows down student output.

- Too much time is spent on thinking about what the right analysis is.

- The interpretation of LMMs can change dramatically depending on which model you fit.

- Reviewers will always object to whatever analysis one does and demand a different one. Often, the choice of analysis doesn't matter as regards interpretation.

- The lme4 package exhibits all kinds of weird and unstable behavior. Should we trust its output?

- The focus has shifted away from substantive theoretical issues within psych* to statistical methods, but psych* people cannot be statisticians and can never know enough. This led to the colorful comment that doing statistics should be like taking a crap---it shouldn't become the center of your entire existence.

Indeed, a mathematical psychologist I know, someone who knows what they're doing, once told me that if you cannot answer your question with a paired t-test, you are asking the wrong question. In fact, if I go back to the data sets that I published between 2002 and 2020, almost all of them can be reasonably analyzed using a series of paired t-tests.

There is a presupposition that lies behind the above complaints: the purpose of data analysis is to find out whether an effect is significant or not. Once one understands that that's not the primary purpose of a statistical analysis, things start to make more sense. The problem is that it's just very hard to comprehend this point; this is because the idea of null hypothesis significance testing is very deeply entrenched in our minds. Walking away from it feels impossible. 

Here are some thoughts about the above objections. 

1.  If you want the simplicity of paired t-tests and repeated measures ANOVA, absolutely go for it. But release your data and code, and be open to others analyzing your data differently.  I think it's perfectly fine to spend your entire life doing just paired t-tests and publishing the resulting t and p-values.  Of course,  you are still fitting linear mixed models,  but heavily simplified ones. Sometimes it won't matter whether you fit a complicated model or a simple one, but sometimes it will. It has happened to me that a paired t-test was exactly the wrong thing to do, and I spent a lot of time trying to model the data differently. Should one care about these edge cases? I think this is a subjective decision that each one of us has to make individually. Here is another example of a simple two-condition study where a complicated model that took forever to fit gave new insight into the underlying process generating the data. The problem here comes down to the goal of a statistical analysis. If we accept the premise that statistical significance is the goal, then we should just go ahead and fit that paired t-test. If, instead, the goal is to model the generative process, then you will start losing time. What position you take really depends on what you want to achieve.

2. There is no one right analysis, and reviewers will always object to whatever analysis you present. The reason that reviewers propose alternative analyses has nothing to do with the inherent flexibility of statistical methods. It has to do with academics being contrarians. I notice this in my own behavior: if my student does X, I want them to do Y!=X. If they do Y, I want them to do X!=Y. I suspect that academics are a self-selected lot, and one thing they are good at is objecting to whatever someone else says or does. So, the fact that reviewers keep asking for different analyses is just the price one has to pay for dealing with academics; it's not an inherent problem with statistics per se. Notice that reviewers also object to the logic of a paper, and to the writing. We are so used to dealing with those things that we don't realize it's the same type of reaction we are seeing to the statistical analyses.

3. If you want speed and still want to fit linear mixed models, use the right tools. There are plenty of ways to fit linear mixed models fast: rstanarm, LMMs in Julia, etc. E.g., Doug Bates, Phillip Alday, and Reinhold Kliegl taught a one-week course on fitting LMMs super fast in Julia: see here.

4. The interpretation of linear mixed models depends on model specification.  This surprises many people, but the surprise is due to the fact that people have a very incomplete understanding of what they are doing. If you cannot be bothered to study linear mixed modeling theory (understandable, life is short), stick to paired t-tests.

5. lme4's unstable and weird behavior is problematic, but this is not enough reason to abandon linear mixed models. The weird messages and the inconsistencies of lme4 are really frustrating; one has to admit that. Perhaps this is the price one has to pay for free software (although, having used non-free software like Word, SPSS, Excel, I'm not so sure there is any advantage). But the fact is that LMMs give you the power to incorporate variance components in a sensible way, and lme4 does the job, if you know what you are doing. Like any other instrument one thinks about using as a professional, if you can't be bothered to learn to use it, then just use some simpler method you do know how to use. E.g., I can't use fMRI; I don't have access to the equipment. I'm forced to work with simpler methods, and I have to live with that. If you want more control over your hierarchical models than lme4 provides, learn Stan. E.g., see our chapter on hierarchical models here.
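To make the remark in point 1 concrete (that a paired t-test is a heavily simplified linear mixed model), here is a small sketch in Python using only the standard library. The reading times and the per-participant offsets are invented for illustration. The paired t-test is just a one-sample t-test on the by-participant differences, so anything that shifts a participant's overall speed cancels out; that is the job a by-participant varying intercept does in an LMM fit to the same aggregated data.

```python
import math
import statistics

def paired_t(cond_a, cond_b):
    """Paired t statistic: a one-sample t-test on per-participant differences."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Hypothetical by-participant mean reading times (ms) in two conditions.
cond_a = [420.0, 455.0, 390.0, 510.0, 475.0, 430.0]
cond_b = [400.0, 450.0, 370.0, 480.0, 470.0, 405.0]
t_value, df = paired_t(cond_a, cond_b)

# Make some participants slower overall: the same offset is added to both
# of a participant's conditions, so the differences (and the t-value) are unchanged.
offsets = [0.0, 150.0, -80.0, 300.0, 40.0, -120.0]
shifted_a = [a + o for a, o in zip(cond_a, offsets)]
shifted_b = [b + o for b, o in zip(cond_b, offsets)]
t_shifted, _ = paired_t(shifted_a, shifted_b)

print(f"t({df}) = {t_value:.2f}; with participant offsets: {t_shifted:.2f}")
```

What the paired t-test cannot do is model anything beyond that intercept shift, such as participants differing in the size of the effect, or items varying; that is where the fuller models start to matter.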

Personally, I think that it is possible to learn enough statistics to be able to use linear mixed models competently; one doesn't need to become a statistician. The curriculum I think one needs in psych and related areas is encapsulated in our summer school on statistical methods, which we run annually at Potsdam. It's a time commitment, but it's worth  it.  I have seen many people go from zero knowledge to fitting sophisticated hierarchical models, so I know that people can learn all this without it taking over their entire life. 

Probably the biggest problem behind all these complaints is the misunderstanding surrounding null hypothesis significance testing. Unfortunately, p-values will rarely tell you anything useful, significant or not, unless you are willing to put in serious time and effort (the very thing people want to avoid doing). So it is really not going to matter much whether you compute them using paired t-tests or linear mixed models.

Thursday, December 17, 2020

New paper: The effect of decay and lexical uncertainty on processing long-distance dependencies in reading

The effect of decay and lexical uncertainty on processing long-distance dependencies in reading

Kate Stone, Titus von der Malsburg, Shravan Vasishth

Download here:


 To make sense of a sentence, a reader must keep track of dependent relationships between words, such as between a verb and its particle (e.g. turn the music down). In languages such as German, verb-particle dependencies often span long distances, with the particle only appearing at the end of the clause. This means that it may be necessary to process a large amount of intervening sentence material before the full verb of the sentence is known. To facilitate processing, previous studies have shown that readers can preactivate the lexical information of neighbouring upcoming words, but less is known about whether such preactivation can be sustained over longer distances. We asked the question, do readers preactivate lexical information about long-distance verb particles? In one self-paced reading and one eye tracking experiment, we delayed the appearance of an obligatory verb particle that varied only in the predictability of its lexical identity. We additionally manipulated the length of the delay in order to test two contrasting accounts of dependency processing: that increased distance between dependent elements may sharpen expectation of the distant word and facilitate its processing (an antilocality effect), or that it may slow processing via temporal activation decay (a locality effect). We isolated decay by delaying the particle with a neutral noun modifier containing no information about the identity of the upcoming particle, and no known sources of interference or working memory load. Under the assumption that readers would preactivate the lexical representations of plausible verb particles, we hypothesised that a smaller number of plausible particles would lead to stronger preactivation of each particle, and thus higher predictability of the target. This in turn should have made predictable target particles more resistant to the effects of decay than less predictable target particles. 
The eye tracking experiment provided evidence that higher predictability did facilitate reading times, but found evidence against any effect of decay or its interaction with predictability. The self-paced reading study provided evidence against any effect of predictability or temporal decay, or their interaction. In sum, we provide evidence from eye movements that readers preactivate long-distance lexical content and that adding neutral sentence information does not induce detectable decay of this activation. The findings are consistent with accounts suggesting that delaying dependency resolution may only affect processing if the intervening information either confirms expectations or adds to working memory load, and that temporal activation decay alone may not be a major predictor of processing time.

Saturday, December 12, 2020

New paper: A Principled Approach to Feature Selection in Models of Sentence Processing

A Principled Approach to Feature Selection in Models of Sentence Processing

Garrett Smith and Shravan Vasishth

Paper downloadable from:


Among theories of human language comprehension, cue-based memory retrieval has proven to be a useful framework for understanding when and how processing difficulty arises in the resolution of long-distance dependencies. Most previous work in this area has assumed that very general retrieval cues like [+subject] or [+singular] do the work of identifying (and sometimes misidentifying) a retrieval target in order to establish a dependency between words. However, recent work suggests that general, handpicked retrieval cues like these may not be enough to explain illusions of plausibility (Cunnings & Sturt, 2018), which can arise in sentences like The letter next to the porcelain plate shattered. Capturing such retrieval interference effects requires lexically specific features and retrieval cues, but handpicking the features is hard to do in a principled way and greatly increases modeler degrees of freedom. To remedy this, we use well-established word embedding methods for creating distributed lexical feature representations that encode information relevant for retrieval using distributed retrieval cue vectors. We show that the similarity between the feature and cue vectors (a measure of plausibility) predicts total reading times in Cunnings and Sturt’s eye-tracking data. The features can easily be plugged into existing parsing models (including cue-based retrieval and self-organized parsing), putting very different models on more equal footing and facilitating future quantitative comparisons.
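As a rough illustration of what "similarity between the feature and cue vectors" can look like, here is a minimal Python sketch. Cosine similarity is an assumption here (the abstract does not say which similarity measure the paper uses), and the three-dimensional vectors are made up; real word embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, invented for illustration only.
cue_shattered = [0.9, 0.1, 0.8]   # retrieval cue set by the verb "shattered"
feat_plate = [0.8, 0.2, 0.7]      # features of the distractor "plate"
feat_letter = [0.1, 0.9, 0.2]     # features of the target "letter"

sim_plate = cosine_similarity(cue_shattered, feat_plate)
sim_letter = cosine_similarity(cue_shattered, feat_letter)
print(f"plate: {sim_plate:.2f}, letter: {sim_letter:.2f}")
```

In this toy setup the distractor matches the verb's cue better than the grammatical subject does, which is the kind of configuration in which an illusion of plausibility would be expected.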