
Showing posts with label Bayesian data analysis. Show all posts

Tuesday, March 21, 2023

Himanshu Yadav, PhD

Today Himanshu defended his dissertation. His dissertation consists of three published papers.

1. Himanshu Yadav, Garrett Smith, Sebastian Reich, and Shravan Vasishth. Number feature distortion modulates cue-based retrieval in reading. Journal of Memory and Language, 129, 2023.

2. Himanshu Yadav, Dario Paape, Garrett Smith, Brian W. Dillon, and Shravan Vasishth. Individual differences in cue weighting in sentence comprehension: An evaluation using Approximate Bayesian Computation. Open Mind, 2022.

3. Himanshu Yadav, Garrett Smith, Daniela Mertzen, Ralf Engbert, and Shravan Vasishth Proceedings of the Annual Meeting of the Cognitive Science Society, 44, 2022.

Congratulations to Himanshu for his truly outstanding work!



Tuesday, March 07, 2023

Job opening: Postdoc position, starting 1 Oct 2023 (Vasishth lab, University of Potsdam, Germany)

I am looking for a postdoc working in sentence processing (psycholinguistics); the position is at the TV-L 13 salary level. This is a teaching + research position in my lab (vasishth.github.io) at the University of Potsdam, Germany. The planned start date is 1st October 2023, and the initial appointment (following a six-month probationary period) is three years; this can be extended following a positive evaluation.

Principal tasks:

- Teaching two 90-minute classes to undergraduates and graduate students every semester. We teach courses on frequentist and Bayesian statistics, the foundations of mathematics for non-STEM students entering an MSc program in the linguistics department, psycholinguistics (reviews of current research), and introductions to psycholinguistics and to experimental methodology.

- Carrying out and publishing research on sentence processing (computational modeling and/or experimental work, e.g., eye-tracking, ERP, self-paced reading). For examples of our research, see: https://vasishth.github.io/publications.html.

- Participation in lab discussions and research collaborations.

Qualifications that you should have:

- A PhD in linguistics, psychology, or some related discipline. In exceptional circumstances, I will consider a prospective PhD student (with a full postdoc salary) who is willing to teach as well as do a PhD with me.

- Published scientific work.

- A background in sentence comprehension research (modeling or experimental or both).

- A solid quantitative background (basic fluency in mathematics and statistical computing at the level needed for statistical modeling and data analysis in psycholinguistics).

An ability to teach in German is desirable but not necessary. A high level of English fluency is expected, especially in writing.

The University and the linguistics department:

The University of Potsdam's Linguistics department is located in Golm, which is a suburb of the city Potsdam, and which can be reached within 40 minutes or so from Berlin through a direct train connection. The linguistics department has a broad focus on almost all areas relating to linguistics (syntax, morphology, semantics, phonetics/phonology, language acquisition, sentence comprehension and production, computational linguistics). The research is highly interdisciplinary, involving collaborations with psychology and mathematics, among other areas. We are a well-funded lab, with projects in a collaborative research grant (SFB 1287) on variability, as well as through individual grants.

The research focus of my lab:

Our lab currently consists of six postdocs (see here), and three guest professors who work closely with lab members. We work mostly on models of sentence comprehension, both developing implemented computational models and doing experimental work to evaluate these models.

Historically, our postdocs have been very successful in getting professorships: Lena Jäger, Sol Lago, Titus von der Malsburg, Daniel Schad, João Veríssimo, Samar Husain, Bruno Nicenboim. One of our graduates, Felix Engelmann, has his own start-up in Berlin.

For representative recent work from our lab, see:

Himanshu Yadav, Garrett Smith, Sebastian Reich, and Shravan Vasishth. Number feature distortion modulates cue-based retrieval in reading. Journal of Memory and Language, 129, 2023.

Shravan Vasishth and Felix Engelmann. Sentence Comprehension as a Cognitive Process: A Computational Approach. Cambridge University Press, Cambridge, UK, 2022.

Dario Paape and Shravan Vasishth. Estimating the true cost of garden-pathing: A computational model of latent cognitive processes. Cognitive Science, 46:e13186, 2022.

Daniel J. Schad, Bruno Nicenboim, Paul-Christian Bürkner, Michael Betancourt, and Shravan Vasishth. Workflow Techniques for the Robust Use of Bayes Factors. Psychological Methods, 2022.

Bruno Nicenboim, Shravan Vasishth, and Frank Rösler. Are words pre-activated probabilistically during sentence comprehension? Evidence from new data and a Bayesian random-effects meta-analysis using publicly available data. Neuropsychologia, 142, 2020.

How to apply:

To apply, please send me an email (vasishth@uni-potsdam.de) with subject line "Postdoc position 2023", attaching a CV, a one-page statement of interest (research and teaching), copies of any publications (including the dissertation), and names of two or three referees that I can contact. The application period remains open until filled, but I hope to make a decision by end-July 2023 at the latest.


Monday, January 30, 2023

Introduction to Bayesian Data Analysis: Video lectures now available on YouTube

These recordings are part of a set of videos that are available from the free four-week online course Introduction to Bayesian Data Analysis, taught over the openhpi.de portal.

Tuesday, October 04, 2022

Applications open: The Seventh Summer School on Statistical Methods for Linguistics and Psychology, 11-15 September 2023

Applications are open (till 1st April 2023) for the seventh summer school on statistical methods for linguistics and psychology, to be held in Potsdam, Germany.

Summer school website: https://vasishth.github.io/smlp2023/

Some of the highlights

1. Four parallel courses on frequentist and Bayesian methods (introductory/intermediate and advanced)

2. A special short course on Bayesian meta-analysis by Dr. Robert Grant of bayescamp.

3. You can also do this free, completely online four-week course on Introduction to Bayesian Data Analysis (starts Jan 2023): https://open.hpi.de/courses/bayesian-statistics2023 






 

Friday, May 27, 2022

Summer School “Methods in Language Sciences” (16-20 August 2022, Ghent, Belgium): Registrations open

I was asked to advertise this summer school (I will be teaching a 2.5 day course on linear mixed modeling, and will give a keynote lecture on the use of Bayesian methods in linguistics/psychology). The text below is from the organizers.

Summer School “Methods in Language Sciences” 2022:
Registrations are open

Top quality research requires outstanding methodological skills. That is why the Department
of Linguistics and the Department of Translation, Interpreting and Communication of Ghent
University will jointly organize the (second edition of the) Summer School “Methods in
Language Sciences” on 16-20 August 2022.

This Summer School is targeted at both junior and senior researchers and offers nine
multi-day modules on various topics, ranging from quantitative to qualitative methods and
covering introductory and advanced statistical analysis, Natural Language Processing
(NLP), eye-tracking, survey design, ethnographic methods, as well as specific tools such
as PRAAT and ELAN. In 2022 we have a new module on Linear Mixed Models. All lecturers
are internationally recognized experts with a strong research and teaching background.

Because the modules will partly be held in parallel sessions, participants have to choose one
or two modules to follow (see the  Programme  for details). No prerequisite knowledge or
experience is required, except for Modules 2 and 9, which deal with advanced statistical data
analysis.

We are proud to welcome two keynote speakers at this year’s summer school: Shravan
Vasishth and Crispin Thurlow, who both also act as lecturers.

This is your opportunity to take your methodological skills for research in (applied)
linguistics, translation or interpreting studies to the next level. We are looking forward to
meeting you in Ghent!

Saturday, April 16, 2022

Ever wondered how the probability of the null hypothesis being true changes given a significant result?

TRIGGER WARNING: These simulations might fundamentally shake your belief system. USE WITH CARE.

In a paper led by Daniel Schad, recently accepted in the open access journal Quantitative Methods for Psychology, we discuss how, using Bayes' rule, one can explore the change in the probability of a null hypothesis being true (call it theta) when you get a significant effect. The paper, which was inspired by a short comment in McElreath's book (first edition), shows that theta does not necessarily change much even if you get a significant result. The probability theta can change dramatically under certain conditions, but those conditions are either so stringent or so trivial that they render many of the significance-based conclusions in psychology and psycholinguistics questionable at the very least.

You can do your own simulations, under assumptions that you consider more appropriate for your own research problem, using this shiny app (below), or play with the source code: here.
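To make the Bayes' rule calculation concrete, here is a minimal sketch in R. It is a simplified illustration, not the code from the paper or the shiny app: the function name post_null_given_sig and the default values for the Type I error rate (alpha) and power are assumptions chosen for the example.

```r
# Posterior probability that the null is true, given a significant result.
# theta_prior: prior probability that the null hypothesis is true (assumed)
# alpha:       P(significant | null true), the Type I error rate
# power:       P(significant | null false)
post_null_given_sig <- function(theta_prior, alpha = 0.05, power = 0.8) {
  sig_and_null <- alpha * theta_prior
  sig_and_alt <- power * (1 - theta_prior)
  sig_and_null / (sig_and_null + sig_and_alt)
}

# High power: a significant result shifts theta a lot
post_null_given_sig(theta_prior = 0.5, power = 0.8)

# Low power: the shift is much smaller
post_null_given_sig(theta_prior = 0.5, power = 0.1)

# When power equals alpha, a significant result carries no information
post_null_given_sig(theta_prior = 0.5, power = 0.05)
```

Under these assumptions, the posterior depends strongly on power and on the prior, so varying both is a quick way to see when a significant result is informative.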


  

Wednesday, March 23, 2022

Short course and keynote on statistical methods at Ghent Summer School on Methods in Language Sciences


I will be teaching an in-person course on linear mixed modeling at the summer school at Ghent (details below) in August 2022.

The summer school home page: https://www.mils.ugent.be/


1. 2.5 day course: Introduction to linear mixed modelling for linguists

When and where: August 18, 19, 20, 2022 in Ghent.

 Prerequisites and target audience

The target audience is graduate students in linguistics.

I assume familiarity with graphical descriptive summaries of data of the type encountered in linguistics; the most important theoretical distributions (normal, t, binomial, chi-squared); description of univariate and bivariate data (mean, variance, standard deviation, correlation, cross-tabulations); graphical presentation of univariate and bivariate/multivariate data (bar chart, histogram, boxplot, qq-plot, etc.); point estimators and confidence intervals for population averages with normal data or large samples; null hypothesis significance testing; t-test, Chi-square test, simple linear regression.

A basic knowledge of R is assumed.

Curriculum:

I will cover some important ideas relating to linear mixed models and how they can be used in linguistics research. I will loosely follow my textbook draft: https://vasishth.github.io/Freq_CogSci/

Topics to be covered: 

- Linear mixed models: basic theory and applications

- Contrast coding

- Generalized Linear Mixed Models (binomial link)

- Using simulation for power analysis and for understanding one’s model
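
As an illustration of the last topic, here is a minimal sketch of a simulation-based power analysis in base R. It is not taken from the course materials: all numerical values (baseline of 400 ms, effect of 20 ms, standard deviations, number of subjects) are hypothetical, and a paired t-test stands in for a mixed model so that the example needs no extra packages.

```r
set.seed(1)

# Estimate power by repeatedly simulating a two-condition repeated-measures
# experiment and counting how often the test comes out significant.
sim_power <- function(n_subj = 30, effect = 20, sd_subj = 50, sd_resid = 40,
                      n_sim = 1000, alpha = 0.05) {
  pvals <- replicate(n_sim, {
    # by-subject adjustments to the intercept, shared across conditions
    u <- rnorm(n_subj, mean = 0, sd = sd_subj)
    cond_a <- 400 + u + rnorm(n_subj, mean = 0, sd = sd_resid)
    cond_b <- 400 + effect + u + rnorm(n_subj, mean = 0, sd = sd_resid)
    t.test(cond_b, cond_a, paired = TRUE)$p.value
  })
  # proportion of simulated experiments with p < alpha
  mean(pvals < alpha)
}

power_est <- sim_power()               # estimated power for a 20 ms effect
type1_est <- sim_power(effect = 0)     # should be close to alpha
power_est
type1_est
```

The same loop carries over to a mixed model: replace the data-generating lines with the assumed model and the t-test with the model fit.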


2. Keynote lecture

 Using Bayesian Data Analysis in Language Research

Shravan Vasishth

Bayesian methods are becoming a standard part of the toolkit for
psycholinguists, linguists, and psychologists. This transition has
been sped up by the arrival of easy-to-use software like brms, a
front-end for the probabilistic programming language Stan. In this
talk, I will show how Bayesian analyses differ from frequentist
analogues, focusing on the linear mixed model. I will illustrate the
main advantages of Bayes: a direct, nuanced, and conservative answer
to the research question at hand, flexible model specification, the
ability to incorporate prior knowledge in the model, and a focus on
uncertainty quantification.

References
Daniel J. Schad, Bruno Nicenboim, Paul-Christian Bürkner, Michael
Betancourt, and Shravan Vasishth. Workflow Techniques for the Robust
Use of Bayes Factors. Psychological Methods, 2022.
https://doi.apa.org/doiLanding?doi=10.1037%2Fmet0000472

Shravan Vasishth and Andrew Gelman. How to embrace variation and
accept uncertainty in linguistic and psycholinguistic data analysis.
Linguistics, 59:1311--1342, 2021.
https://www.degruyter.com/document/doi/10.1515/ling-2019-0051/html

Shravan Vasishth. Some right ways to analyze (psycho)linguistic data.
Submitted, 2022.
https://osf.io/5wzyg/

New paper: Some right ways to analyze (psycho)linguistic data

 New paper (under review): 

Title: Some right ways to analyze (psycho)linguistic data

Abstract:

Much has been written on the abuse and misuse of statistical methods, including p-values, statistical significance, etc. I present some of the best practices in statistics using a running example data analysis. Focusing primarily on frequentist and Bayesian linear mixed models, I illustrate some defensible ways in which statistical inference—specifically, hypothesis testing using Bayes factors vs. estimation or uncertainty quantification—can be carried out. The key is to not overstate the evidence and to not expect too much from statistics. Along the way, I demonstrate some powerful ideas, the most important ones being using simulation to understand the design properties of one’s experiment before running it, visualizing data before carrying out a formal analysis, and simulating data from the fitted model to understand the model’s behavior.

PDF: https://psyarxiv.com/y54va/


Summer School on Statistical Methods for Linguistics and Psychology, Sept. 12-16, 2022 (applications close April 1)

The Sixth Summer School on Statistical Methods for Linguistics and Psychology will be held in Potsdam, Germany, September 12-16, 2022.  Like the previous editions of the summer school, this edition will have two frequentist and two Bayesian streams. Currently, this summer school is being planned as an in-person event.

The application form closes April 1, 2022. We will announce the decisions on or around April 15, 2022.

Course fee: There is no fee because the summer school is funded by the Collaborative Research Center (Sonderforschungsbereich 1287). However, we will charge 40 Euros to cover costs for coffee and snacks during the breaks and social hours. Participants will also have to pay for their own accommodation.

For details, see: https://vasishth.github.io/smlp2022/

 Curriculum:

1. Introduction to Bayesian data analysis (maximum 30 participants). Taught by Shravan Vasishth, assisted by Anna Laurinavichyute and Paula Lissón

This course is an introduction to Bayesian modeling, oriented towards linguists and psychologists. Topics to be covered: Introduction to Bayesian data analysis, Linear Modeling, Hierarchical Models. We will cover these topics within the context of an applied Bayesian workflow that includes exploratory data analysis, model fitting, and model checking using simulation. Participants are expected to be familiar with R, and must have some experience in data analysis, particularly with the R library lme4.
Course Materials. Previous year's course web page: all materials (videos etc.) from the previous year are available here.
Textbook: here. We will work through the first six chapters.

2. Advanced Bayesian data analysis (maximum 30 participants). Taught by Bruno Nicenboim, assisted by Himanshu Yadav

This course assumes that participants already have some experience in Bayesian modeling using brms and want to transition to Stan to learn more advanced methods and start building simple computational cognitive models. Participants should have worked through or be familiar with the material in the first five chapters of our book draft: Introduction to Bayesian Data Analysis for Cognitive Science. In this course, we will cover Parts III to V of our book draft: model comparison using Bayes factors and k-fold cross validation, introductory and relatively advanced models with Stan, and simple computational cognitive models.
Course Materials. Textbook here. We will start from Part III of the book (Advanced models with Stan). Participants are expected to be familiar with the first five chapters.

3. Foundational methods in frequentist statistics (maximum 30 participants). Taught by Audrey Buerki, Daniel Schad, and João Veríssimo.

Participants will be expected to have used linear mixed models before, to the level of the textbook by Winter (2019, Statistics for Linguists), and want to acquire a deeper knowledge of frequentist foundations, and understand the linear mixed modeling framework more deeply. Participants are also expected to have fit multiple regressions. We will cover model selection, contrast coding, with a heavy emphasis on simulations to compute power and to understand what the model implies. We will work on (at least some of) the participants' own datasets. This course is not appropriate for researchers new to R or to frequentist statistics.
Course Materials. Textbook draft here.

4.  Advanced methods in frequentist statistics with Julia (maximum 30 participants). Taught by Reinhold Kliegl, Phillip Alday, Julius Krumbiegel, and Doug Bates.
Applicants must have experience with linear mixed models and be interested in learning how to carry out such analyses with the Julia-based MixedModels.jl package (i.e., the analogue of the R-based lme4 package). MixedModels.jl has some significant advantages. Some of them are: (a) a new and more efficient computational implementation, (b) speed, needed for, e.g., complex designs and power simulations, (c) more flexibility for selection of parsimonious mixed models, and (d) more flexibility in taking into account autocorrelations or other dependencies typical of EEG- and fMRI-based time series (under development). We do not expect profound knowledge of Julia from participants; the necessary subset of knowledge will be taught on the first day of the course. We do expect a readiness to install Julia and the confidence that with some basic instruction participants will be able to adapt prepared Julia scripts for their own data or to adapt some of their own lme4 commands to the equivalent MixedModels.jl commands. The course will be taught in a hybrid IDE. There is already the option to execute R chunks from within Julia, meaning one needs Julia primarily for execution of MixedModels.jl commands as a replacement for lme4. There is also an option to call MixedModels.jl from within R and process the resulting object like an lme4 object. Thus, much of the pre- and postprocessing (e.g., data simulation for complex experimental designs; visualization of partial-effect interactions or shrinkage effects) can be carried out in R.
Course Materials. Github repo: here.




New paper in Computational Brain and Behavior: Sample size determination in Bayesian Linear Mixed Models

We've just had a paper accepted in Computational Brain and Behavior, an open access journal of the Society for Mathematical Psychology.

Even though I am not a psychologist, I feel an increasing affinity to this field compared to psycholinguistics proper. I will be submitting more of my papers to this journal and other open access journals (Glossa Psycholx, Open Mind in particular) in the future. 

Some things I liked about this journal:

- A fast and well-informed, intelligent, useful set of reviews. The reviewers actually understand what they are talking about! It's refreshing to find people out there who speak my language (and I don't mean English or Hindi). Also, the reviewers signed their reviews. This doesn't usually happen.

- Free availability of the paper after publication; I didn't have to do anything to make this happen. By contrast, I don't even have copies of my own articles published in APA journals. The same goes for Elsevier journals like the Journal of Memory and Language. Either I shell out $$$ to make the paper open access, or I learn to live with the arXiv version of my paper. 

- The proofing was *excellent*. By contrast, the Journal of Memory and Language adds approximately 500 mistakes to my papers every time they publish one (then we have to correct them, if we catch them at all). E.g., in this paper we had to issue a correction about a German example; this error was added by the proofer! Another surprising example of JML actually destroying our paper's formatting is this one; here, the arXiv version has better formatting than the published paper, which cost several thousand Euros!

- LaTeX is encouraged. By contrast, APA journals demand that papers be submitted in W**d. 

Here is the paper itself: here, we present an approach, adapted from the work of two statisticians (Wang and Gelfand), for determining approximate sample size needed for drawing meaningful inferences using Bayes factors in hierarchical models (aka linear mixed models). The example comes from a psycholinguistic study but the method is general. Code and data are of course available online.

The pdf: https://link.springer.com/article/10.1007/s42113-021-00125-y


 


Thursday, February 03, 2022

EMLAR 2022 tutorial on Bayesian methods

 At EMLAR 2022 I will teach two sessions that will introduce Bayesian methods. Here is the abstract for the two sessions:

EMLAR 2022: An introduction to Bayesian data analysis

Taught by Shravan Vasishth (vasishth.github.io)


Session 1. Tuesday 19 April 2022, 1-3PM (Zoom link will be provided)

Modern probabilistic programming languages like Stan (mc-stan.org) have made Bayesian methods increasingly accessible to researchers in linguistics and psychology. However, finding an entry point into these methods is often difficult for researchers. In this tutorial, I will provide an informal introduction to the fundamental ideas behind Bayesian statistics, using examples that illustrate applications to psycholinguistics. I will also discuss some of the advantages of the Bayesian approach over the standardly used frequentist paradigms: uncertainty quantification, robust estimates through regularization, the ability to incorporate expert and/or prior knowledge into the data analysis, and the ability to flexibly define the generative process and thereby to directly address the actual research question (as opposed to a straw-man null hypothesis).

Suggestions for further reading will be provided. In this tutorial, I presuppose that the audience is familiar with linear mixed models (as used in R with the package lme4).


Session 2. Thursday 21 April 2022, 9:30-11:30 (Zoom link will be provided)

This session presupposes that the participant has attended Session 1. I will show some case studies using brms and Stan code that will demonstrate the major applications of Bayesian methods in psycholinguistics. I will reference/use some of the material described in this online textbook (in progress):

https://vasishth.github.io/bayescogsci/book/

Sunday, December 19, 2021

Generating data from a uniform distribution using R, without using R's runif function


One can easily generate data from a uniform(0,1) using the runif function in R:

runif(10)
##  [1] 0.25873184 0.06723362 0.07725857 0.65281945 0.43817895 0.35372059
##  [7] 0.14399150 0.16840633 0.24538047 0.95230596

But what if one doesn’t have this function and one needs to generate samples from a uniform(0,1)? For example, rejection sampling requires draws from a uniform(0,1).

Here is one way to generate uniform data.

Generating samples from a uniform(0,1)

Samples from a uniform can be generated using the linear congruential generator algorithm (https://en.wikipedia.org/wiki/Linear_congruential_generator).

Here is the code in R.

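# Linear congruential generator: x_new = (mult * x_old + 1) mod mod
pseudo_unif<-function(mult=16807,
                      mod=(2^31)-1,
                      seed=123456789,
                      size=100000){
  U<-rep(NA,size)
  x<-seed
  # seq_len() also handles size=1 correctly (2:size would run backwards)
  for(i in seq_len(size)){
    # update the state, then scale it to the interval [0,1)
    x<-(x*mult+1)%%mod
    U[i]<-x/mod
  }
  return(U)
}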

u<-pseudo_unif()
hist(u,freq=FALSE)

To generate data on any range, going from low to high:

gen_unif<-function(low=0,high=100,seed=987654321,
                   size=10000){
  low + (high-low)*pseudo_unif(seed=seed,size=size)
}

hist(gen_unif(),freq=FALSE)

The above code is based on: https://towardsdatascience.com/how-to-generate-random-variables-from-scratch-no-library-used-4b71eb3c8dc7
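
Since rejection sampling was the motivating use case, here is a minimal sketch of how hand-rolled uniform draws could feed a rejection sampler. This is an illustration under stated assumptions, not part of the original post or its source: the target density Beta(2,2), the function names lcg_unif and rbeta22_reject, and the choice to use consecutive generator values as (proposal, acceptance) pairs are all hypothetical. The generator is repeated in compact form so that the block runs on its own.

```r
# Compact copy of the linear congruential generator described above
lcg_unif <- function(size, seed = 123456789, mult = 16807, mod = 2^31 - 1) {
  u <- numeric(size)
  x <- seed
  for (i in seq_len(size)) {
    x <- (x * mult + 1) %% mod
    u[i] <- x / mod
  }
  u
}

# Rejection sampler for Beta(2,2), whose density is f(x) = 6x(1-x) on (0,1).
# Proposals come from uniform(0,1); M bounds f from above.
rbeta22_reject <- function(n_prop = 20000, seed = 123456789) {
  u <- lcg_unif(2 * n_prop, seed = seed)
  proposal <- u[seq(1, 2 * n_prop, by = 2)]  # odd positions: candidate values
  accept_u <- u[seq(2, 2 * n_prop, by = 2)]  # even positions: acceptance draws
  M <- 1.5                                   # maximum of f, attained at x = 0.5
  f <- 6 * proposal * (1 - proposal)
  # keep a proposal when the scaled acceptance draw falls under the density
  proposal[accept_u * M <= f]
}

draws <- rbeta22_reject()
acc_rate <- length(draws) / 20000  # theoretical acceptance rate is 1/M = 2/3
acc_rate
mean(draws)                        # theoretical mean of Beta(2,2) is 0.5
hist(draws, freq = FALSE)
```

For serious work one would use R's built-in generators; the point here is only that uniform(0,1) draws are the sole ingredient the sampler needs.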