Shravan Vasishth's Slog (Statistics blog): reproducibility

Thursday, June 02, 2022

New paper in Journal of Memory and Language: Share the code, not just the data

Here is an important paper for the field of psycholinguistics that just came out in JML. It is led by Dr. Anna Laurinavichyute and was commissioned by the editor of JML (Prof. Kathy Rastle).

Download here: https://doi.org/10.1016/j.jml.2022.104332

Friday, May 27, 2022

Summer School “Methods in Language Sciences” (16-20 August 2022, Ghent, Belgium): Registrations open

I was asked to advertise this summer school (I will be teaching a 2.5 day course on linear mixed modeling, and will give a keynote lecture on the use of Bayesian methods in linguistics/psychology). The text below is from the organizers.

Summer School “Methods in Language Sciences” 2022:

Registrations are open

Top quality research requires outstanding methodological skills. That is why the Department of Linguistics and the Department of Translation, Interpreting and Communication of Ghent University will jointly organize the (second edition of the) Summer School “Methods in Language Sciences” on 16-20 August 2022.

This Summer School is targeted at both junior and senior researchers and offers nine multi-day modules on various topics, ranging from quantitative to qualitative methods and covering introductory and advanced statistical analysis, Natural Language Processing (NLP), eye-tracking, survey design, ethnographic methods, as well as specific tools such as PRAAT and ELAN. In 2022 we have a new module on Linear Mixed Models. All lecturers are internationally recognized experts with a strong research and teaching background.

Because the modules will partly be held in parallel sessions, participants have to choose one or two modules to follow (see the Programme for details). No prerequisite knowledge or experience is required, except for Modules 2 and 9, which deal with advanced statistical data analysis.

We are proud to welcome two keynote speakers at this year’s summer school: Shravan Vasishth and Crispin Thurlow, who both also act as lecturers.

This is your opportunity to take your methodological skills for research in (applied) linguistics, translation or interpreting studies to the next level. We are looking forward to meeting you in Ghent!

Saturday, April 16, 2022

Ever wondered how the probability of the null hypothesis being true changes given a significant result?

TRIGGER WARNING: These simulations might fundamentally shake your belief system. USE WITH CARE.

In a recently accepted paper in the open access journal Quantitative Methods for Psychology that Daniel Schad led, we discuss how, using Bayes' rule, one can explore the change in the probability of a null hypothesis being true (call it theta) when you get a significant effect. The paper, which was inspired by a short comment in McElreath's book (first edition), shows that theta does not necessarily change much even if you get a significant result. The probability theta can change dramatically under certain conditions, but those conditions are either so stringent or so trivial that they render many of the significance-based conclusions in psychology and psycholinguistics questionable at the very least.
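As a rough illustration of the Bayes' rule calculation described above, here is a minimal Python sketch. It treats the prior probability of the null, the Type I error rate, and the power as single fixed numbers, which is a simplifying assumption and not the simulation setup used in the paper. The function name and parameter names are hypothetical placeholders.

```python
def prob_null_given_significant(prior_null, alpha=0.05, power=0.8):
    """Posterior probability that the null is true after a significant result.

    Assumptions (illustrative only, not taken from the paper):
      prior_null : prior probability that the null hypothesis is true (theta)
      alpha      : probability of a significant result when the null is true
      power      : probability of a significant result when the null is false
    """
    p_sig_and_null = alpha * prior_null          # P(significant, null true)
    p_sig_and_alt = power * (1.0 - prior_null)   # P(significant, null false)
    return p_sig_and_null / (p_sig_and_null + p_sig_and_alt)


# How far does a significant result move theta at different power levels?
for power in (0.1, 0.5, 0.8):
    posterior = prob_null_given_significant(prior_null=0.5, alpha=0.05, power=power)
    print(f"power={power:.2f}  P(null | significant)={posterior:.3f}")
```

Under these assumptions, the posterior decreases as power increases, so the lower the power, the less a significant result moves theta away from its prior value.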

You can do your own simulations, under assumptions that you consider more appropriate for your own research problem, using this shiny app (below), or play with the source code: here.

Thursday, March 31, 2022

New(ish) paper: Share the code, not just the data: A case study of the reproducibility of JML articles published under the open data policy

Here's an important new paper led by Dr. Anna Laurinavichyute on the reproducibility of published analyses. This paper was commissioned by the editor in chief of the Journal of Memory and Language, Kathy Rastle.

Title: Share the code, not just the data: A case study of the reproducibility of JML articles published under the open data policy

Abstract:

In 2019 the Journal of Memory and Language instituted an open data and code policy; this policy requires that, as a rule, code and data be released at the latest upon publication. How effective is this policy? We compared 59 papers published before, and 59 papers published after, the policy took effect. After the policy was in place, the rate of data sharing increased by more than 50%. We further looked at whether papers published under the open data policy were reproducible, in the sense that the published results should be possible to regenerate given the data, and given the code, when code was provided. For 8 out of the 59 papers, data sets were inaccessible. The reproducibility rate ranged from 34% to 56%, depending on the reproducibility criteria. The strongest predictor of whether an attempt to reproduce would be successful is the presence of the analysis code: it increases the probability of reproducing reported results by almost 40%. We propose two simple steps that can increase the reproducibility of published papers: share the analysis code, and attempt to reproduce one’s own analysis using only the shared materials.

PDF: here.

Wednesday, March 23, 2022

New paper in Computational Brain and Behavior: Sample size determination in Bayesian Linear Mixed Models

We've just had a paper accepted in Computational Brain and Behavior, an open access journal of the Society for Mathematical Psychology.

Even though I am not a psychologist, I feel an increasing affinity to this field compared to psycholinguistics proper. I will be submitting more of my papers to this journal and other open access journals (Glossa Psycholx, Open Mind in particular) in the future.

Some things I liked about this journal:

- A fast and well-informed, intelligent, useful set of reviews. The reviewers actually understand what they are talking about! It's refreshing to find people out there who speak my language (and I don't mean English or Hindi). Also, the reviewers signed their reviews. This doesn't usually happen.

- Free availability of the paper after publication; I didn't have to do anything to make this happen. By contrast, I don't even have copies of my own articles published in APA journals. The same goes for Elsevier journals like the Journal of Memory and Language. Either I shell out $$$ to make the paper open access, or I learn to live with the arXiv version of my paper.

- The proofing was *excellent*. By contrast, the Journal of Memory and Language adds approximately 500 mistakes into my papers every time they publish one (then we have to correct them, if we catch them at all). E.g., in this paper we had to issue a correction about a German example; this error was added by the proofer! Another surprising example of JML actually destroying our paper's formatting is this one; here, the arXiv version has better formatting than the published paper, which cost several thousand Euros!

- LaTeX is encouraged. By contrast, APA journals demand that papers be submitted in W**d.

Here is the paper itself: here, we present an approach, adapted from the work of two statisticians (Wang and Gelfand), for determining approximate sample size needed for drawing meaningful inferences using Bayes factors in hierarchical models (aka linear mixed models). The example comes from a psycholinguistic study but the method is general. Code and data are of course available online.

The pdf: https://link.springer.com/article/10.1007/s42113-021-00125-y

Tuesday, December 14, 2021

New paper in Computational Brain and Behavior: Sample size determination for Bayesian hierarchical models commonly used in psycholinguistics

We have just had a paper accepted in the journal Computational Brain and Behavior. This is part of a special issue that responds to the following paper on linear mixed models:
van Doorn, J., Aust, F., Haaf, J.M. et al. Bayes Factors for Mixed Models. Computational Brain and Behavior (2021). https://doi.org/10.1007/s42113-021-00113-2
There are quite a few papers in that special issue, all worth reading, but I especially liked the contribution by Singmann et al: Statistics in the Service of Science: Don't let the Tail Wag the Dog (https://psyarxiv.com/kxhfu/). They make some very good points in reaction to van Doorn et al's paper.

Our paper: Shravan Vasishth, Himanshu Yadav, Daniel J. Schad, and Bruno Nicenboim. Sample size determination for Bayesian hierarchical models commonly used in psycholinguistics. Computational Brain and Behavior, 2021.
Abstract: We discuss an important issue that is not directly related to the main theses of the van Doorn et al. (2021) paper, but which frequently comes up when using Bayesian linear mixed models: how to determine sample size in advance of running a study when planning a Bayes factor analysis. We adapt a simulation-based method proposed by Wang and Gelfand (2002) for a Bayes-factor based design analysis, and demonstrate how relatively complex hierarchical models can be used to determine approximate sample sizes for planning experiments.
Code and data: https://osf.io/hjgrm/
pdf: here
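
To give a feel for what a simulation-based, Bayes-factor based design analysis involves, here is a minimal Python sketch. It is not the method from the paper: instead of a hierarchical model it uses a much simpler stand-in, a single normal mean with known standard deviation, for which the Bayes factor has a closed form. The fixed true effect, the prior scale `tau`, the Bayes factor threshold of 10, and all function names are illustrative assumptions.

```python
import math
import random


def log_bf10(ybar, n, sigma, tau):
    """Log Bayes factor for H1: mu ~ Normal(0, tau) against H0: mu = 0.

    ybar is the mean of n observations with known standard deviation sigma.
    Under H0, ybar ~ Normal(0, se); under H1, ybar ~ Normal(0, sqrt(tau^2 + se^2)).
    """
    se2 = sigma ** 2 / n
    var1 = tau ** 2 + se2
    return 0.5 * math.log(se2 / var1) - 0.5 * ybar ** 2 * (1.0 / var1 - 1.0 / se2)


def prop_decisive(n, true_mu, sigma, tau, threshold=10.0, n_sims=2000, seed=1):
    """Proportion of simulated studies of size n in which BF10 exceeds threshold."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    log_threshold = math.log(threshold)
    hits = 0
    for _ in range(n_sims):
        # The sample mean is sufficient in this model, so simulate it directly.
        ybar = rng.gauss(true_mu, se)
        if log_bf10(ybar, n, sigma, tau) > log_threshold:
            hits += 1
    return hits / n_sims


# Scan candidate sample sizes under an assumed true effect of 0.3 sd units.
for n in (20, 50, 100, 200):
    p = prop_decisive(n, true_mu=0.3, sigma=1.0, tau=0.5)
    print(f"n={n:4d}  proportion with BF10 > 10: {p:.2f}")
```

The general idea is to pick the smallest sample size at which the proportion of decisive simulated studies is acceptably high. In a real design analysis the closed-form Bayes factor would be replaced by one computed from the fitted hierarchical model.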

Friday, November 12, 2021

Book: Sentence comprehension as a cognitive process: A computational approach (Vasishth and Engelmann)

My book with Felix Engelmann has just been published. It puts together in one place 20 years of research on retrieval models, carried out by my students, colleagues, and myself.

Thursday, September 30, 2021

New paper on the reproducibility of JML articles (2019-21) after the open data policy was introduced

New paper by Anna Laurinavichyute and me:

The (ir)reproducibility of published analyses: A case study of 57 JML articles published between 2019 and 2021

Download from: https://psyarxiv.com/hf297/
