Search

Friday, March 26, 2021

Freshly minted professor from our lab: Prof. Dr. Titus von der Malsburg


 One of my first PhD students, Titus von der Malsburg, has just been sworn in as a Professor of Psycholinguistics and Cognitive Modeling (tenure track assistant professor) at the Institute of LinguisticsUniversity of Stuttgart in Germany. Stuttgart is one of the most exciting places to be in Germany for computationally oriented scientists.  

Titus is the eighth professor coming out of my lab.  He does very exciting work in psycholinguistics; check out his work here.

Wednesday, March 17, 2021

New paper: Workflow Techniques for the Robust Use of Bayes Factors

 

Workflow Techniques for the Robust Use of Bayes Factors

Download from: https://arxiv.org/abs/2103.08744

Inferences about hypotheses are ubiquitous in the cognitive sciences. Bayes factors provide one general way to compare different hypotheses by their compatibility with the observed data. Those quantifications can then also be used to choose between hypotheses. While Bayes factors provide an immediate approach to hypothesis testing, they are highly sensitive to details of the data/model assumptions. Moreover it's not clear how straightforwardly this approach can be implemented in practice, and in particular how sensitive it is to the details of the computational implementation. Here, we investigate these questions for Bayes factor analyses in the cognitive sciences. We explain the statistics underlying Bayes factors as a tool for Bayesian inferences and discuss that utility functions are needed for principled decisions on hypotheses. Next, we study how Bayes factors misbehave under different conditions. This includes a study of errors in the estimation of Bayes factors. Importantly, it is unknown whether Bayes factor estimates based on bridge sampling are unbiased for complex analyses. We are the first to use simulation-based calibration as a tool to test the accuracy of Bayes factor estimates. Moreover, we study how stable Bayes factors are against different MCMC draws. We moreover study how Bayes factors depend on variation in the data. We also look at variability of decisions based on Bayes factors and how to optimize decisions using a utility function. We outline a Bayes factor workflow that researchers can use to study whether Bayes factors are robust for their individual analysis, and we illustrate this workflow using an example from the cognitive sciences. We hope that this study will provide a workflow to test the strengths and limitations of Bayes factors as a way to quantify evidence in support of scientific hypotheses. Reproducible code is available from this https URL.   


Also see this interesting  twitter thread on this paper by Michael Betancourt:


  

Monday, March 15, 2021

New paper: Is reanalysis selective when regressions are consciously controlled?

A new paper by Dr. Dario Paape; download from herehttps://psyarxiv.com/gnehs 

Abstract

The selective reanalysis hypothesis of Frazier and Rayner (1982) states that readers direct their eyes towards critical words in the sentence when faced with garden-path structures (e.g., Since Jay always jogs a mile seems like a short distance to him). Given the mixed evidence for this proposal in the literature, we investigated the possibility that selective reanalysis is tied to conscious awareness of the garden-path effect. To this end, we adapted the well-known self-paced reading paradigm to allow for regressive as well as progressive key presses. Assuming that regressions in such a paradigm are consciously controlled, we found no evidence for selective reanalysis, but rather for occasional extensive, heterogeneous rereading of garden-path sentences. We discuss the implications of our findings for the selective reanalysis hypothesis, the role of awareness in sentence processing, as well as the usefulness of the bidirectional self-paced reading method for sentence processing research.

Tuesday, March 09, 2021

Talk at Stanford (April 20 2021) Dependency completion in sentence processing: Some recent computational and empirical investigations

Title: Dependency completion in sentence processing: Some recent computational and empirical investigations 
When: April 20, 2021, 9PM German time
Where: zoom.

 Shravan Vasishth (vasishth.github.io) 

Abstract:
 Dependency completion processes in sentence processing have been intensively studied in psycholinguistics (e.g., Gibson 2000). I will discuss some recent work (e.g., Yadav et al. 2021) on computational models of dependency completion as they relate to a class of effects, so-called interference effects (Jäger et al., 2017). Using antecedent-reflexive and subject-verb number dependencies as a case study (Jäger et al., 2020), I will discuss the evidence base for some of the competing theoretical claims relating to these phenomena.  A common thread running through the talk will be that the well-known replication and statistical crisis in psychology and other areas (Nosek et al., 2015, Gelman and Carlin, 2014) is also unfolding in psycholinguistics and needs to be taken seriously (e.g., Vasishth, et al., 2018).

References 

Andrew Gelman and John Carlin (2014). Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science, 9(6), 641-651.

Edward Gibson, (2000). The dependency locality theory: A distance-based theory of linguistic complexity. Image, Language, Brain, 2000, 95-126. 

Lena A. Jäger, Felix Engelmann, and Shravan Vasishth, (2017). Similarity-based interference in sentence comprehension: Literature review and Bayesian meta-analysis. Journal of Memory and Language, 94:316-339. 

Lena A. Jäger, Daniela Mertzen, Julie A. Van Dyke, and Shravan Vasishth, (2020). Interference patterns in subject-verb agreement and reflexives revisited: A large-sample study. Journal of Memory and Language, 111. 

Brian A. Nosek, & Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science349(6251), aac4716-aac4716.

Shravan Vasishth, Daniela Mertzen, Lena A. Jäger, and Andrew Gelman, (2018). The statistical significance filter leads to overoptimistic expectations of replicability. Journal of Memory and Language, 103:151-175. 

Shravan Vasishth and Felix Engelmann, (2021). Sentence comprehension as a cognitive process: A computational approach. Cambridge University Press. In Press.

Himanshu Yadav, Garrett Smith, and Shravan Vasishth, (2021). Feature encoding modulates cue-based retrieval: Modeling interference effects in both grammatical and ungrammatical sentences. Submitted.

Wednesday, March 03, 2021

Talk at Hong Kong Virtual Psycholinguistics Forum (VPF, 心理语言学线上论坛)

I'll be giving at talk at the Chinese University of Hong Kong.
When: 10 March 2021
When: 10AM Berlin time
Where: Zoom:
https://cuhk.zoom.us/j/779556638
https://cuhk.zoom.cn/j/779556638 (mainland China)
Title: Case and Agreement Attraction in Armenian: Experimental and Computational Investigations
Abstract: https://osf.io/3wn79/

Thursday, February 11, 2021

Talk in Tuebingen: Individual differences in cue-weighting in sentence comprehension: An evaluation using Approximate Bayesian Computation

When: Feb 22 2021
Where: Universität Tübingen, Seminar für Sprachwissenschaft
How: Zoom

[This is part of the PhD work of Himanshu Yadav, and the project is led by him. Co-authors: Dario Paape, Garrett Smith, and Brian Dillon.]

Abstract
Cue-based retrieval theories of sentence processing assume that syntactic dependencies are resolved through a content-addressable search process. An important recent claim is that in certain dependency types, the retrieval cues are weighted such that one cue dominates. This cue-weighting proposal aims to explain the observed average behavior. We show that there is systematic individual-level variation in cue weighting. Using the Lewis and Vasishth cue-based retrieval model, we estimated individual-level parameters for processing speed and cue weighting using data from 13 published reading studies; hierarchical Approximate Bayesian Computation (ABC) with Gibbs sampling was used to estimate the parameters. The modeling reveals a nuanced picture about cue-weighting: we find support for the idea that some participants weight cues, but not all do; and only fast readers tend to have the predicted cue weighting, suggesting that reading proficiency might be associated with cue weighting. A broader achievement of the work is to demonstrate how individual differences can be investigated in computational models of sentence processing using hierarchical ABC.

Tuesday, February 02, 2021

Bayesian statistics: A tutorial taught at Experimental Methods for Language Acquisition research (EMLAR XVII 2021)

Bayesian statistics Taught by Shravan Vasishth (vasishth.github.io) When: Sometime between 13 and 15 April 2021 Where: https://emlar.wp.hum.uu.nl/tutorial/bayesian-statistics/ Bayesian methods are increasingly becoming mainstream in psychology and psycholinguistics. However, finding an entry point into using these methods is often difficult for researchers. In this tutorial, I will provide an informal introduction to the fundamental ideas behind Bayesian statistics, using examples illustrating applications to psycholinguistics. I will also illustrate some of the advantages of the Bayesian approach over the standardly used frequentist paradigms: uncertainty quantification, robust estimates, the ability to incorporate expert and/or prior knowledge into the data analysis, and the ability to flexibly define the generative process and thereby to directly address the actual research question (as opposed to a straw-man null hypothesis). Suggestions for further readings will be provided. References Bruno Nicenboim, Daniel Schad, and Shravan Vasishth. Introduction to Bayesian Data Analysis for Cognitive Science. 2021. Under contract with Chapman and Hall/CRC Statistics in the Social and Behavioral Sciences Series. https://vasishth.github.io/bayescogsci/ Daniel J. Schad, Michael Betancourt, and Shravan Vasishth. Towards a principled Bayesian workflow: A tutorial for cognitive science. Psychological Methods, 2020. In Press. https://osf.io/b2vx9/ Shravan Vasishth, Daniela Mertzen, Lena A. Jäger, and Andrew Gelman. The statistical significance filter leads to overoptimistic expectations of replicability. Journal of Memory and Language, 103:151-175, 2018. https://www.sciencedirect.com/science/article/pii/S0749596X18300640?via%3Dihub Shravan Vasishth, Bruno Nicenboim, Mary E. Beckman, Fangfang Li, and Eun Jong Kong. Bayesian data analysis in the phonetic sciences: A tutorial introduction. Journal of Phonetics, 71:141-161, 2018. https://osf.io/g4zpv/ Bruno Nicenboim and Shravan Vasishth. Statistical methods for linguistic research: Foundational Ideas - Part II. Language and Linguistics Compass, 10:591-613, 2016. https://onlinelibrary.wiley.com/doi/abs/10.1111/lnc3.12207

Saturday, January 16, 2021

Applications are open for the fifth summer school in statistical methods for linguistics and psychology (SMLP)

The annual summer school, now in its fifth edition, will happen 6-10 Sept 2021, and will be conducted virtually over zoom. The summer school is free and is funded by the DFG through SFB 1287.
Instructors: Doug Bates, Reinhold Kliegl, Phillip Alday, Bruno Nicenboim, Daniel Schad, Anna Laurinavichyute, Paula Lisson, Audrey Buerki, Shravan Vasishth.
There will be four streams running in parallel: introductory and advances courses on frequentist and Bayesian statistics. Details, including how to apply, are here.

Saturday, January 02, 2021

Should statistical data analysis in psychology be like defecating?

 There was an interesting thread on twitter about linear mixed models (LMMs) that someone made me aware of recently. (I stopped following twitter because of its general inanity, but this thread is worth commenting on.) The gist of the complaints (trying to recreate this list from memory) were. My list is an amalgamation of comments from different people; I think that the thread started here:


To summarize the complaints:

-  LMMs take too long to fit (cf. repeated measures ANOVA). This slows down student output.

- Too much time is spent on thinking about what the right analysis is.

- The interpretation of LMMs can change dramatically depending on which model you fit.

- Reviewers will always object to whatever analysis one does and demand a different one. Often which  analysis one does doesn't matter as regards interpretation.

- The lme4 package exhibits all kinds of weird and unstable behavior. Should we trust its output?

- The focus has shifted away from substantive theoretical issues within psych* to statistical methods, but psych* people cannot be statisticians and can never know enough. This led to the colorful comment that doing statistics should be like taking a crap---it shouldn't become the center of your entire existence.

Indeed, a mathematical psychologist I know, someone who knows what they're doing, once told me that if  you cannot answer your question with a paired t-test, you are asking the wrong question. In fact, if I go back to my existing data-sets that I have published between 2002 and 2020, almost all of them can be reasonably analyzed using a series of paired t-tests. 

There is a presupposition that lies behind the above complaints: the purpose of data analysis is to find out whether an effect is significant or not. Once one understands that that's not the primary purpose of a statistical analysis, things start to make more sense. The problem is that it's just very hard to comprehend this point; this is because the idea of null hypothesis significance testing is very deeply entrenched in our minds. Walking away from it feels impossible. 

Here are some thoughts about the above objections. 

1.  If you want the simplicity of paired t-tests and repeated measures ANOVA, absolutely go for it. But release your data and code, and be open to others analyzing your data differently.  I think it's perfectly fine to spend your entire life doing just paired t-tests and publishing the resulting t and p-values.  Of course,  you are still fitting linear mixed models,  but heavily simplified ones. Sometimes it won't matter whether you fit a complicated model or a simple one, but sometimes it will. It has happened to me that a paired t-test was exactly the wrong thing to do, and I spent a lot of time trying to model the data differently. Should one care about these edge cases? I think this is a subjective decision that each one of us has to make individually. Here is another example of a simple two-condition study where a complicated model that took forever to fit gave new insight into the underlying process generating the data. The problem here comes down to the goal of a statistical analysis. If we accept the premise that statistical significance is the goal, then we should just go ahead and fit that paired t-test. If, instead, the goal is to model the generative process, then you will start losing time. What position you take really depends on what you want to achieve.

2. There is no one right analysis, and reviewers will always object to whatever analysis you present.  The reason that reviewers propose alternative analyses has nothing to do with the inherent flexibility of statistical methods. It has to do with academics being contrarians. I notice this in my own behavior: if my student does X, I want them to do Y!=X. If they do Y, I want them to do X!=Y. I suspect that academics are a self-selected lot, and one thing they are good at is objecting to whatever someone else says or does. So, the fact that reviewers keep asking for different analyses is just the price one has to pay for dealing with academics, it's not an inherent problem with  statistics per se. Notice that reviewers also object to the logic of a paper, and to the writing.  We are so used to dealing with  those things that we don't realize it's the same type of reaction we are seeing to the statistical analyses.

3.  If you want speed and still want to fit linear mixed models, use the right tools. There are plenty of ways to  fit linear mixed models fast. rstanarm, LMMs  in Julia, etc. E.g., Doug Bates, Phillip Alday, and Reinhold Kliegl taught a  one-week course on fitting LMMs super fast in Julia: see here.

4. The interpretation of linear mixed models depends on model specification.  This surprises many people, but the surprise is due to the fact that people have a very incomplete understanding of what they are doing. If you cannot be bothered to study linear mixed modeling theory (understandable, life is short), stick to paired t-tests.

5. lme4's unstable and weird behavior is problematic, but this is not enough reason to abandon linear mixed models.  The weirdness of messages, and the inconsistencies of lme4 are really frustrating, one has to admit that. Perhaps this is the price one has to pay for free software (although, having used non-free software like Word, SPSS, Excel, I'm not so sure there is any advantage). But the fact is that LMMs give you the power to incorporate variance components in a sensible way, and lme4 does the job, if you know what you are doing. Like any other instrument one thinks about using as a professional, if you  can't be bothered to learn to use  it, then just use some simpler method you do know how to use. E.g., I can't use fMRI; I don't have access to the equipment. I'm forced to work with simpler methods, and I have to live with that. If you want more control over your hierarchical models than lme4 provides, learn Stan. E.g., see our chapter on hierarchical models here.

Personally, I think that it is possible to learn enough statistics to be able to use linear mixed models competently; one doesn't need to become a statistician. The curriculum I think one needs in psych and related areas is encapsulated in our summer school on statistical methods, which we run annually at Potsdam. It's a time commitment, but it's worth  it.  I have seen many people go from zero knowledge to fitting sophisticated hierarchical models, so I know that people can learn all this without it taking over their entire life. 

Probably the biggest problem behind all these complaints is the misunderstanding surrounding null hypothesis significance testing. Unfortunately,p-values will rarely tell you anything useful, significant or not, unless you are willing to put in serious time and effort (the very thing people want to avoid doing). So it really not going to matter much whether you compute them using paired t-tests or linear mixed models.




Thursday, December 17, 2020

New paper: The effect of decay and lexical uncertainty on processing long-distance dependencies in reading

The effect of decay and lexical uncertainty on processing long-distance dependencies in reading

Kate Stone, Titus von der Malsburg, Shravan Vasishth

Download here: https://peerj.com/articles/10438/

Abstract:

 To make sense of a sentence, a reader must keep track of dependent relationships between words, such as between a verb and its particle (e.g. turn the music down). In languages such as German, verb-particle dependencies often span long distances, with the particle only appearing at the end of the clause. This means that it may be necessary to process a large amount of intervening sentence material before the full verb of the sentence is known. To facilitate processing, previous studies have shown that readers can preactivate the lexical information of neighbouring upcoming words, but less is known about whether such preactivation can be sustained over longer distances. We asked the question, do readers preactivate lexical information about long-distance verb particles? In one self-paced reading and one eye tracking experiment, we delayed the appearance of an obligatory verb particle that varied only in the predictability of its lexical identity. We additionally manipulated the length of the delay in order to test two contrasting accounts of dependency processing: that increased distance between dependent elements may sharpen expectation of the distant word and facilitate its processing (an antilocality effect), or that it may slow processing via temporal activation decay (a locality effect). We isolated decay by delaying the particle with a neutral noun modifier containing no information about the identity of the upcoming particle, and no known sources of interference or working memory load. Under the assumption that readers would preactivate the lexical representations of plausible verb particles, we hypothesised that a smaller number of plausible particles would lead to stronger preactivation of each particle, and thus higher predictability of the target. This in turn should have made predictable target particles more resistant to the effects of decay than less predictable target particles. The eye tracking experiment provided evidence that higher predictability did facilitate reading times, but found evidence against any effect of decay or its interaction with predictability. The self-paced reading study provided evidence against any effect of predictability or temporal decay, or their interaction. In sum, we provide evidence from eye movements that readers preactivate long-distance lexical content and that adding neutral sentence information does not induce detectable decay of this activation. The findings are consistent with accounts suggesting that delaying dependency resolution may only affect processing if the intervening information either confirms expectations or adds to working memory load, and that temporal activation decay alone may not be a major predictor of processing time.

Saturday, December 12, 2020

New paper: A Principled Approach to Feature Selection in Models of Sentence Processing

 A Principled Approach to Feature Selection in Models of Sentence Processing

Garrett Smith and Shravan Vasishth

Paper downloadable from: https://onlinelibrary.wiley.com/doi/epdf/10.1111/cogs.12918

Abstract

Among theories of human language comprehension, cue-based memory retrieval has proven to be a useful framework for understanding when and how processing difficulty arises in the resolution of long-distance dependencies. Most previous work in this area has assumed that very general retrieval cues like [+subject] or [+singular] do the work of identifying (and sometimes misidentifying) a retrieval target in order to establish a dependency between words. However, recent work suggests that general, handpicked retrieval cues like these may not be enough to explain illusions of plausibility (Cunnings & Sturt, 2018), which can arise in sentences like The letter next to the porcelain plate shattered. Capturing such retrieval interference effects requires lexically specific features and retrieval cues, but handpicking the features is hard to do in a principled way and greatly increases modeler degrees of freedom. To remedy this, we use well-established word embedding methods for creating distributed lexical feature representations that encode information relevant for retrieval using distributed retrieval cue vectors. We show that the similarity between the feature and cue vectors (a measure of plausibility) predicts total reading times in Cunnings and Sturt’s eye-tracking data. The features can easily be plugged into existing parsing models (including cue-based retrieval and self-organized parsing), putting very different models on more equal footing and facilitating future quantitative comparisons.

Tuesday, November 24, 2020

How to become a professor in Germany---a unique tutorial

How to become a Professor in Germany (Online-Seminar)

For sign-up details, see here: https://www.dhvseminare.de/index.php?module=010700&event=187&catalog_id=3&category_id=15&language_id=

Live-Online-Seminar

This seminar addresses young scientists who are considering a career as a professor at a German university or are already in the middle of an application process for a professorship in Germany. It will give the participants an overview of the career paths to a professorship, covering the legal requirements, the appointment procedure and the legal status of a professor. 

The seminar also addresses how to approach the search for relevant job advertisements, how to prepare the written application documents and how to make a good impression during the further steps in the selection process.
In the second part of the seminar, the participants will receive an overview of the next steps for the successful candidates. This includes the appointment negotiations with German universities, the legal framework and the strategic preparation for those negotiations.

 
Speaker:

RA Dr. Vanessa Adam, Justitiarin für Hochschul- und Arbeitsrecht im Deutschen Hochschulverband

RA (syn.) Katharina Lemke, Justitiarin für Hochschul- und Arbeitsrecht im Deutschen Hochschulverband


Schedule:

09:00-10:00 Career Paths to a Professorship (Fr. Lemke)

   10:00-10:15 Break

10:15-11:45 Application for a Professorship (Fr. Dr. Adam)

   11:45-12:15 Break

12:15-13:15 Negotiations with the University (Legal Framework) (Fr. Lemke)

  13:15-13:30 Break

13:30-14:30 Negotiations with the University (Strategy) (Fr. Dr. Adam)


Included in the price:

Seminar documents in electronic form (via download).


Thursday, November 12, 2020

New paper: A computational evaluation of two models of retrieval processes in sentence processing in aphasia

Here is another important paper from my lab, led by my PhD student Paula Lissón, with a long list of co-authors.

This paper, which also depends heavily on the amazing capabilities of Stan, investigates the quantitative predictions of two competing models of retrieval processes, the cue-based retrieval model of Lewis and Vasishth, and the direct-access model of McElree.  We have done such an investigation before, in a very exciting paper by Bruno Nicenboim, using self-paced reading data from a German number interference experiment

What is interesting about this new paper by Paula is that the data come from individuals with aphasia and control participants. Such data is extremely difficult to collect, and as a result many papers report experimental results from a handful of people with aphasia, sometimes as few as 7 people. This paper has much more data, thanks to the hard work of David Caplan

The big achievements of this paper are that it provides a principled approach  to comparing the two competing models' predictions, and it derives testable predictions (which we are about to evaluate with new data from German individuals with aphasia---watch this space). As is always the case in psycholinguistics, even with this relatively large data-set, there just isn't enough data to draw unequivocal inferences. Our policy in my lab is to be upfront about the ambiguities inherent in the inferences. This kind of ambiguous conclusion tends to upset reviewers, because they expect (rather, demand) big-news results. But big news is, more often than not, just illusions of certainty, noise that looks like a signal (see some of my recent papers in the Journal of Memory and Language). We could easily have over-dramatized the paper and dressed it up to say way more than is warranted by the analyses.  Our goal here was to tell the story with all its uncertainties laid bare. The more papers one can put out there that make more measured claims, with all the limitations laid out openly, the easier it will be for reviewers (and editors!) to learn to accept that one can learn something important from a modeling exercise without necessarily obtaining a decisive result.

Download the paper from here: https://psyarxiv.com/r7dn5

A computational evaluation of two models of retrieval processes in sentence processing in aphasia

Abstract:

Can sentence comprehension impairments in aphasia be explained by difficulties arising from dependency completion processes in parsing? Two distinct models of dependency completion difficulty are investigated, the Lewis and Vasishth (2005) activation-based model, and the direct-access model (McElree, 2000). These models’ predictive performance is compared using data from individuals with aphasia (IWAs) and control participants. The data are from a self-paced listening task involving subject and object relative clauses. The relative predictive performance of the models is evaluated using k-fold cross validation. For both IWAs and controls, the activation model furnishes a somewhat better quantitative fit to the data than the direct-access model. Model comparison using Bayes factors shows that, assuming an activation-based model, intermittent deficiencies may be the best explanation for the cause of impairments in IWAs. This is the first computational evaluation of different models of dependency completion using data from impaired and unimpaired individuals. This evaluation develops a systematic approach that can be used to quantitatively compare the predictions of competing models of language processing.

Wednesday, November 11, 2020

New paper: Modeling misretrieval and feature substitution in agreement attraction: A computational evaluation

This is an important new paper from our lab, led by Dario Paape, and with Serine Avetisyan, Sol Lago, and myself as co-authors. 

One thing that this paper accomplishes is that it showcases the incredible expressive power of Stan, a probabilistic programming language developed by Andrew Gelman and colleagues at Columbia for Bayesian modeling. Stan allows us to implement relatively complex process models of sentence processing and test their performance against data. Paape et al show how we can quantitatively evaluate the predictions of different competing models.  There are plenty of papers out there that test different theories of encoding interference. What's revolutionary about this approach is that one is forced to make a commitment about one's theories; no more vague hand gestures. The limitations of what one can learn from data and from the models is always going to be an issue---one never has enough data, even when people think they do.  But in our paper we are completely upfront about the limitations; and all code and data are available at https://osf.io/ykjg7/ for the reader to look at, investigate, and build upon on their own.

Download the paper from here: https://psyarxiv.com/957e3/

Modeling misretrieval and feature substitution in agreement attraction: A computational evaluation

Abstract

 We present a self-paced reading study investigating attraction effects on number agreement in Eastern Armenian. Both word-by-word reading times and open-ended responses to sentence-final comprehension questions were collected, allowing us to relate reading times and sentence interpretations on a trial-by-trial basis. Results indicate that readers sometimes misinterpret the number feature of the subject in agreement attraction configurations, which is in line with agreement attraction being due to memory encoding errors. Our data also show that readers sometimes misassign the thematic roles of the critical verb. While such a tendency is principally in line with agreement attraction being due to incorrect memory retrievals, the specific pattern observed in our data is not predicted by existing models. We implement four computational models of agreement attraction in a Bayesian framework, finding that our data are better accounted for by an encoding-based model of agreement attraction, rather than a retrieval-based model. A novel contribution of our computational modeling is the finding that the best predictive fit to our data comes from a model that allows number features from the verb to overwrite number features on noun phrases during encoding.

Tuesday, November 10, 2020

Is it possible to write an honest psycholinguistics paper?

I'm teaching a new course this semester: Case Studies in Statistical and Computational Modeling. The idea is to revisit published papers and the associated data and code from the paper, and p-hack the paper creatively to get whatever result you like. Yesterday  I demonstrated that we could conclude whatever we liked from a recent paper that we had published; all conclusions (effect present, effect absent) were valid under different assumptions! The broader goal is to demonstrate how researcher degrees of freedom play out in real life.

Then someone asked me this question in the class:

Is it possible to write an honest psycholinguistics paper? 

The short answer is: yes, but you have to accept that some editors will reject your paper. If you can live with that, it's possible to be completely honest. 

Usually, the  only way to get a paper into a major journal is to make totally overblown claims that are completely unsupported or only very weakly supported by the data. If your p-value is 0.06 but  you want to claim it is significant, you have several options: mess around with the data till you push it below 0.05. Or claim "marginal significance". Or you can bury that result and keep redoing the experiment till it works. Or run the experiment till you get significance. There are plenty of tricks out there.

 If you got super-duper low p-values, you are on a good path to a top publication; in fact, if you have any  significant p-values (relevant to the question or not) you are on a good path to publication, because reviewers are impressed with p<0.05 somewhere, anywhere, in a table. That's why you will see huge tables in psychology articles, with tons and tons of p-values; the sheer force of low p-values spread out   over a gigantic table can convince the  reviewer to accept the paper, even though  only a single cell among dozens or hundreds in that table is actually testing the hypothesis. You can rely on the fact that nobody will think to ask whether power was low (the answer is usually yes), and how many comparisons were done.

Here are some examples of successes and failures, i.e., situations where we honestly reported what we found and were either summarily rejected or (perhaps surprisingly) accepted.

For example, in the following paper, 

Shravan Vasishth, Daniela Mertzen, Lena A. Jäger, and Andrew Gelman. The statistical significance filter leads to overoptimistic expectations of replicabilityJournal of Memory and Language, 103:151-175, 2018.

I wrote the following conclusion:

"In conclusion, in this 100-participant study we dont see any grounds for claiming an interaction between Load and Distance. The most that we can conclude is that the data are consistent with memory-based accounts such as the Dependency Locality Theory (Gibson, 2000), which predict increased processing difficulty when subject-verb distance is increased. However, this Distance effect yields estimates that are also consistent with our posited null region; so the evidence for the Distance effect cannot be considered convincing." 

Normally, such a tentative statement would lead to a rejection. E.g., here  is a statement  in another paper that led to a desk rejection (same editor) in the same journal where the above paper was published:

"In sum, taken together, Experiment 1 and 2 furnish some weak evidence for an interference effect, and only at the embedded auxiliary verb."

We published the above (rejected) paper in Cognitive Science instead.

In another example, both the key effects discussed in this paper would   have technically been  non-significant had we done a frequentist analysis.  The fact that we interpreted the Bayesian credible intervals with reference to a model's quantitative predictions doesn't change that detail. However, the paper was accepted:

Lena A. Jäger, Daniela Mertzen, Julie A. Van Dyke, and Shravan Vasishth. Interference patterns in subject-verb agreement and reflexives revisited: A large-sample studyJournal of Memory and Language, 111, 2020.

In the above paper, we were pretty clear about the fact that we didn't manage to achieve high enough power even in our large-sample study: Table A1 shows that for the critical effect we were studying, we probably had power between 25 and 69 percent, which is not dramatically high.

There are many other such examples from my lab, of papers accepted despite tentative claims, and papers rejected because of tentative claims. In spite of the  rejections, my plan is to continue telling the story like it is, with a limitations section. My hope is that editors will eventually understand the following point:

Almost no paper in psycholinguistics is going to give you a decisive result (it doesn't matter what the p-values are). So, rejecting a paper on the grounds that it isn't reporting a conclusive result is based on a misunderstanding about what we learnt from that paper. We almost never have conclusive results, even when  we claim we do. Once people realize that, they will become more comfortable accepting more realistic conclusions from data. 

Wednesday, September 16, 2020

Zoom link for my talk: Twenty years of retrieval models

Here is the zoom registration link to my talk at UMass on Sept 25, 21:30 CEST (15:30 UMass time).
Title: Twenty years of retrieval models
Abstract:
After Newell wrote his 1973 article, "You can't play twenty questions with nature and win", several important cognitive architectures emerged for modeling human cognitive processes across a wide range of phenomena. One of these, ACT-R, has played an important role in the study of memory processes in sentence processing. In this talk, I will talk about some important lessons I have learnt over the last 20 years while trying to evaluate ACT-R based computational models of sentence comprehension. In this connection, I will present some new results from a recent set of sentence processing studies on Eastern Armenian.
Reference: Shravan Vasishth and Felix Engelmann. Sentence comprehension as a cognitive process: A computational approach. 2021. Cambridge University Press. https://vasishth.github.io/RetrievalModels/ Zoom registration link:
You are invited to a Zoom webinar. When: Sep 25, 2020 09:30 PM Amsterdam, Berlin, Rome, Stockholm, Vienna Topic: UMass talk Vasishth
Register in advance for this webinar: https://zoom.us/webinar/register/WN_89F7BObjSwmxnK6DRC9fuQ
After registering, you will receive a confirmation email containing information about joining the webinar.

Tuesday, September 15, 2020

Twenty years of retrieval models: A talk at UMass Linguistics (25 Sept 2020)

I'll be giving a talk at UMass' Linguistics department on 25 September, 2020, over zoom naturally. Talk title and abstract below:
Twenty years of retrieval models
Shravan Vasishth (vasishth.github.io)
After Newell wrote his 1973 article, "You can't play twenty questions with nature and win", several important cognitive architectures emerged for modeling human cognitive processes across a wide range of phenomena. One of these, ACT-R, has played an important role in the study of memory processes in sentence processing. In this talk, I will talk about some important lessons I have learnt over the last 20 years while trying to evaluate ACT-R based computational models of sentence comprehension. In this connection, I will present some new results from a recent set of sentence processing studies on Eastern Armenian.
Reference: Shravan Vasishth and Felix Engelmann. Sentence comprehension as a cognitive process: A computational approach. 2021. Cambridge University Press. https://vasishth.github.io/RetrievalModels/

Monday, September 07, 2020

Registration open for two statistics-related webinars: SMLP Wed 9 Sept, and Fri 11 Sept 2020

As part of the summer school in Statistical Methods for Linguistics and Psychology, we have organized two webinars that anyone can attend. However, registration is required. Details below

Keynote speakers

  • Wed 9 Sept, 5-6PM:Christina Bergmann (Title: The "new" science: transparent, cumulative, and collaborative)
    Register for webinar: here
    Abstract: Transparency, cumulative thinking, and a collaborative mindset are key ingredients for a more robust foundation for experimental studies and theorizing. Empirical sciences have long faced criticism for some of the statistical tools they use and the overall approach to experimentation; a debate that has in the last decade gained momentum in the context of the "replicability crisis." Culprits were quickly identified: False incentives led to "questionable research practices" such as HARKing and p-hacking and single, "exciting" results are over-emphasized. Many solutions are gaining importance, from open data, code, and materials - rewarded with badges - over preregistration to a shift away from focusing on p values. There are a host of options to choose from; but how can we pick the right existing and emerging tools and techniques to improve transparency, aggregate evidence, and work together? I will discuss answers fitting my own work spanning empirical (including large-scale), computational, and meta-scientific studies, with a focus on strategies to see each study for what it is: A single brushstroke of a larger picture.
  • Fri 11 Sept, 5-6PM: Jeff Rouder Title: Robust cognitive modeling
    Register for webinar: here
    Abstract: In the past decade, there has been increased emphasis on the replicability and robustness of effects in psychological science. And more recently, the emphasis has been extended to cognitive process modeling of behavioral data under the rubric of “robust models." Making analyses open and replicable is fairly straightforward; more difficult is understanding what robust models are and how to specify and analyze them. Of particular concern is whether subjectivity is part of robust modeling, and if so, what can be done to guard against undue influence of subjective elements. Indeed, it seems the concept of "researchers' degrees of freedom" plays writ large in modeling. I take the challenge of subjectivity in robust modeling head on. I discuss what modeling does in science, how to specify models that capture theoretical positions, how to add value in analysis, and how to understand the role of subjective specification in drawing substantive inferences. I will extend the notion of robustness to mixed designs and hierarchical models as these are common in real-world experimental settings.

Jeff Rouder's keynote address at AMLaP 2020: Qualitative vs. Quantitative Individual Differences: Implications for Cognitive Control

 For various reasons, Jeff Rouder could not present his keynote address live. 

Here it is as a recording

Qualitative vs. Quantitative Individual Differences: Implications for Cognitive Control

Jeff Rouder (University of Missouri) rouderj@missouri.edu

Consider a task with a well-established effect such as the Stroop effect. In such tasks, there is often a canonical direction of the effect—responses to congruent items are faster than incongruent ones. And with this direction, there are three qualitatively different regions of performance: (a) a canonical effect, (b) no effect, or (c) an opposite or negative effect (for Stroop, responses to incongruent stimuli are faster than responses to congruent ones). Individual differences can be qualitative in that different people may truly occupy different regions; that is, some may have canonical effects while others may have the opposite effect. Or, alternatively, it may only be quantitative in that all people are truly in one region (all people have a true canonical effect). Which of these descriptions holds has two critical implications. The first is theoretical: Those tasks that admit qualitative differences may be more complex and subject to multiple processing pathways or strategies. Those tasks that do not admit qualitative differences may be explained more universally. The second is practical: it may be very difficult to document individual differences in a task or correlate individual differences across task if these tasks do not admit qualitative individual differences. In this talk, I develop trial-level hierarchical models of quantitative and qualitative individual differences and apply these models to cognitive control tasks. Not only is there no evidence for qualitative individual differences, the quantitative individual differences are so small that there is little hope of localizing correlations in true performance among these tasks.