Shravan Vasishth's Slog (Statistics blog)

Friday, May 27, 2022

Summer School “Methods in Language Sciences” (16-20 August 2022, Ghent, Belgium): Registrations open

I was asked to advertise this summer school (I will be teaching a 2.5 day course on linear mixed modeling, and will give a keynote lecture on the use of Bayesian methods in linguistics/psychology). The text below is from the organizers.

Summer School “Methods in Language Sciences” 2022: Registrations are open

Top quality research requires outstanding methodological skills. That is why the Department of Linguistics and the Department of Translation, Interpreting and Communication of Ghent University will jointly organize the (second edition of the) Summer School “Methods in Language Sciences” on 16-20 August 2022.

This Summer School is targeted at both junior and senior researchers and offers nine multi-day modules on various topics, ranging from quantitative to qualitative methods and covering introductory and advanced statistical analysis, Natural Language Processing (NLP), eye-tracking, survey design, ethnographic methods, as well as specific tools such as PRAAT and ELAN. In 2022 we have a new module on Linear Mixed Models. All lecturers are internationally recognized experts with a strong research and teaching background.

Because the modules will partly be held in parallel sessions, participants have to choose one or two modules to follow (see the Programme for details). No prerequisite knowledge or experience is required, except for Modules 2 and 9, which deal with advanced statistical data analysis.

We are proud to welcome two keynote speakers at this year’s summer school: Shravan Vasishth and Crispin Thurlow, who both also act as lecturers.

This is your opportunity to take your methodological skills for research in (applied) linguistics, translation or interpreting studies to the next level. We are looking forward to meeting you in Ghent!

Saturday, April 16, 2022

Ever wondered how the probability of the null hypothesis being true changes given a significant result?

TRIGGER WARNING: These simulations might fundamentally shake your belief system. USE WITH CARE.

In a paper led by Daniel Schad, recently accepted in the open access journal Quantitative Methods for Psychology, we discuss how one can use Bayes' rule to explore the change in the probability of a null hypothesis being true (call it theta) when you get a significant effect. The paper, which was inspired by a short comment in McElreath's book (first edition), shows that theta does not necessarily change much even if you get a significant result. The probability theta can change dramatically under certain conditions, but those conditions are either so stringent or so trivial that many of the significance-based conclusions in psychology and psycholinguistics become questionable at the very least.

You can do your own simulations, under assumptions that you consider more appropriate for your own research problem, using this shiny app (below), or play with the source code: here.
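For a quick sense of the calculation before opening the app, here is a minimal Python sketch of the textbook Bayes' rule computation. It is not the code from the paper or the app: the function name `prob_null_given_significant` and the default values for `alpha` (Type I error rate) and `power` are assumptions chosen for illustration, and it treats the test as a simple point-null vs. point-alternative comparison.

```python
def prob_null_given_significant(theta, alpha=0.05, power=0.8):
    """Posterior probability that the null is true after a significant result.

    theta: prior probability that the null hypothesis is true
    alpha: probability of a significant result when the null is true
    power: probability of a significant result when the null is false
    (All three names and defaults are illustrative assumptions.)
    """
    if not (0 <= theta <= 1 and 0 < alpha <= 1 and 0 < power <= 1):
        raise ValueError("theta must be in [0, 1]; alpha and power in (0, 1]")
    # Bayes' rule: P(H0 | sig) = P(sig | H0) P(H0) / P(sig)
    significant_and_null = alpha * theta
    significant_and_alt = power * (1 - theta)
    return significant_and_null / (significant_and_null + significant_and_alt)


if __name__ == "__main__":
    # (prior theta, power) combinations, all with alpha = 0.05
    for theta, power in [(0.5, 0.8), (0.9, 0.2), (0.5, 0.05)]:
        post = prob_null_given_significant(theta, power=power)
        print(f"theta={theta}, power={power}: posterior={post:.4f}")
```

Under these assumptions, a prior theta of 0.5 with power 0.8 drops to about 0.06 after a significant result, but a prior of 0.9 with power 0.2 only drops to about 0.69, and when power equals alpha the posterior equals the prior.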

Thursday, March 31, 2022

New(ish) paper: Share the code, not just the data: A case study of the reproducibility of JML articles published under the open data policy

Here's an important new paper led by Dr. Anna Laurinavichyute on the reproducibility of published analyses. This paper was commissioned by the editor in chief of the Journal of Memory and Language, Kathy Rastle.

Title: Share the code, not just the data: A case study of the reproducibility of JML articles published under the open data policy

Abstract:

In 2019 the Journal of Memory and Language instituted an open data and code policy; this policy requires that, as a rule, code and data be released at the latest upon publication. How effective is this policy? We compared 59 papers published before, and 59 papers published after, the policy took effect. After the policy was in place, the rate of data sharing increased by more than 50%. We further looked at whether papers published under the open data policy were reproducible, in the sense that the published results should be possible to regenerate given the data, and given the code, when code was provided. For 8 out of the 59 papers, data sets were inaccessible. The reproducibility rate ranged from 34% to 56%, depending on the reproducibility criteria. The strongest predictor of whether an attempt to reproduce would be successful is the presence of the analysis code: it increases the probability of reproducing reported results by almost 40%. We propose two simple steps that can increase the reproducibility of published papers: share the analysis code, and attempt to reproduce one’s own analysis using only the shared materials.

PDF: here.

Wednesday, March 23, 2022

Short course and keynote on statistical methods at Ghent Summer School on Methods in Language Sciences

I will be teaching an in-person course on linear mixed modeling at the summer school in Ghent (details below) in August 2022.

The summer school home page: https://www.mils.ugent.be/

1. 2.5 day course: Introduction to linear mixed modelling for linguists

When and where: August 18, 19, 20, 2022 in Ghent.

Prerequisites and target audience

The target audience is graduate students in linguistics.

I assume familiarity with graphical descriptive summaries of data of the type encountered in linguistics; the most important theoretical distributions (normal, t, binomial, chi-squared); description of univariate and bivariate data (mean, variance, standard deviation, correlation, cross-tabulations); graphical presentation of univariate and bivariate/multivariate data (bar chart, histogram, boxplot, qq-plot, etc.); point estimators and confidence intervals for population averages with normal data or large samples; null hypothesis significance testing; and the t-test, chi-squared test, and simple linear regression.

A basic knowledge of R is assumed.

Curriculum:

I will cover some important ideas relating to linear mixed models and how they can be used in linguistics research. I will loosely follow my textbook draft: https://vasishth.github.io/Freq_CogSci/

Topics to be covered:

- Linear mixed models: basic theory and applications

- Contrast coding

- Generalized Linear Mixed Models (binomial link)

- Using simulation for power analysis and for understanding one’s model

2. Keynote lecture

Using Bayesian Data Analysis in Language Research

Shravan Vasishth

Bayesian methods are becoming a standard part of the toolkit for
psycholinguists, linguists, and psychologists. This transition has
been sped up by the arrival of easy-to-use software like brms, a
front-end for the probabilistic programming language Stan. In this
talk, I will show how Bayesian analyses differ from frequentist
analogues, focusing on the linear mixed model. I will illustrate the
main advantages of Bayes: a direct, nuanced, and conservative answer
to the research question at hand, flexible model specification, the
ability to incorporate prior knowledge in the model, and a focus on
uncertainty quantification.

References
Daniel J. Schad, Bruno Nicenboim, Paul-Christian Bürkner, Michael
Betancourt, and Shravan Vasishth. Workflow Techniques for the Robust
Use of Bayes Factors. Psychological Methods, 2022.
https://doi.apa.org/doiLanding?doi=10.1037%2Fmet0000472

Shravan Vasishth and Andrew Gelman. How to embrace variation and
accept uncertainty in linguistic and psycholinguistic data analysis.
Linguistics, 59:1311-1342, 2021.
https://www.degruyter.com/document/doi/10.1515/ling-2019-0051/html

Shravan Vasishth. Some right ways to analyze (psycho)linguistic data.
Submitted, 2022.
https://osf.io/5wzyg/

New paper: Some right ways to analyze (psycho)linguistic data

New paper (under review):

Title: Some right ways to analyze (psycho)linguistic data

Abstract:

Much has been written on the abuse and misuse of statistical methods, including p-values, statistical significance, etc. I present some of the best practices in statistics using a running example data analysis. Focusing primarily on frequentist and Bayesian linear mixed models, I illustrate some defensible ways in which statistical inference—specifically, hypothesis testing using Bayes factors vs. estimation or uncertainty quantification—can be carried out. The key is to not overstate the evidence and to not expect too much from statistics. Along the way, I demonstrate some powerful ideas, the most important ones being using simulation to understand the design properties of one’s experiment before running it, visualizing data before carrying out a formal analysis, and simulating data from the fitted model to understand the model’s behavior.

PDF: https://psyarxiv.com/y54va/
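As a rough illustration of the idea in the abstract of using simulation to understand the design properties of an experiment before running it, here is a minimal Python sketch that estimates power by repeatedly generating fake data. It is not taken from the paper, which works with linear mixed models: the design here is a deliberately simplified one-sample comparison, the function name `simulate_power` and all parameter values are assumptions, and the critical value comes from a normal approximation rather than the t distribution.

```python
import random
import statistics
from statistics import NormalDist


def simulate_power(effect, sd, n, alpha=0.05, nsim=2000, seed=1):
    """Estimate the proportion of significant results for a one-sample design.

    effect: assumed true mean difference (hypothetical value)
    sd:     assumed standard deviation of the differences
    n:      number of observations per simulated experiment
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    # Two-sided critical value from the normal approximation (an assumption;
    # for small n a t-distribution critical value would be more accurate).
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(nsim):
        sample = [rng.gauss(effect, sd) for _ in range(n)]
        se = statistics.stdev(sample) / n ** 0.5
        if abs(statistics.mean(sample) / se) > crit:
            significant += 1
    return significant / nsim


if __name__ == "__main__":
    for effect in (0.0, 0.2, 0.5):
        print(effect, simulate_power(effect, sd=1.0, n=50))
```

With `effect=0.0` the returned proportion estimates the Type I error rate, so it doubles as a sanity check on the simulation itself. Larger assumed effects should give higher estimated power.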
