Sunday, December 19, 2021

Generating data from a uniform distribution using R, without using R's runif function

One can easily generate data from a uniform(0,1) using the runif function in R:

runif(10)
##  [1] 0.25873184 0.06723362 0.07725857 0.65281945 0.43817895 0.35372059
##  [7] 0.14399150 0.16840633 0.24538047 0.95230596

But what if one doesn’t have this function and needs to generate samples from a uniform(0,1)? In rejection sampling, for example, one needs access to uniform(0,1) draws.

Here is one way to generate uniform data.

Generating samples from a uniform(0,1)

Samples from a uniform can be generated using the linear congruential generator algorithm (https://en.wikipedia.org/wiki/Linear_congruential_generator).

Here is the code in R.

pseudo_unif <- function(mult = 16807,
                        mod = (2^31) - 1,
                        seed = 123456789,
                        size = 100000) {
  U <- rep(NA, size)
  ## linear congruential update: x_{n+1} = (mult * x_n + 1) mod m
  x <- (seed * mult + 1) %% mod
  U[1] <- x / mod
  for (i in 2:size) {
    x <- (x * mult + 1) %% mod
    U[i] <- x / mod
  }
  return(U)
}

u<-pseudo_unif()
hist(u,freq=FALSE)

For generating data over any range from low to high:

gen_unif <- function(low = 0, high = 100, seed = 987654321,
                     size = 10000) {
  ## rescale uniform(0,1) draws to the interval [low, high]
  low + (high - low) * pseudo_unif(seed = seed, size = size)
}

hist(gen_unif(),freq=FALSE)

The above code is based on: https://towardsdatascience.com/how-to-generate-random-variables-from-scratch-no-library-used-4b71eb3c8dc7
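As a sketch of the rejection-sampling use case mentioned earlier, here is one way to draw from the Beta(2,2) density f(x) = 6x(1-x) on [0,1], using only pseudo_unif as the source of uniforms (the generator is repeated so the snippet runs standalone). The choice of target density and the bound M = 1.5 (the maximum of f, attained at x = 1/2) are my own illustration, not from the linked post:

```r
## Rejection sampling from Beta(2, 2), f(x) = 6 * x * (1 - x),
## using the linear congruential generator as the only uniform source.
pseudo_unif <- function(mult = 16807, mod = (2^31) - 1,
                        seed = 123456789, size = 100000) {
  U <- rep(NA, size)
  x <- (seed * mult + 1) %% mod
  U[1] <- x / mod
  for (i in 2:size) {
    x <- (x * mult + 1) %% mod
    U[i] <- x / mod
  }
  U
}
f <- function(x) 6 * x * (1 - x)  # Beta(2, 2) density on [0, 1]
M <- 1.5                          # envelope constant: f(1/2) = 1.5
x_cand <- pseudo_unif(seed = 123456789)  # candidates from the uniform proposal
u_acc  <- pseudo_unif(seed = 987654321)  # independent acceptance draws
## accept x_cand with probability f(x_cand) / M
samples <- x_cand[u_acc <= f(x_cand) / M]
hist(samples, freq = FALSE)
curve(f, add = TRUE)
```

The acceptance rate should be close to 1/M = 2/3, and the accepted draws should have mean near 0.5, the mean of a Beta(2,2).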

Tuesday, December 14, 2021

New paper: Syntactic and semantic interference in sentence comprehension: Support from English and German eye-tracking data

This paper is part of a larger project that has been running for 4-5 years, on the predictions of cue-based retrieval theories. This paper revisits the design of Van Dyke (2007), using eye-tracking (the data are from comparable designs in English and German). The reading time patterns are consistent with syntactic interference at the moment of retrieval in both languages. Semantic interference shows interesting differences between English and German: in English, semantic interference seems to happen simultaneously with syntactic interference, but in German, semantic interference is delayed (it appears in the post-critical region). The morphosyntactic properties of German could be driving the lag in semantic interference. We also discuss the data in the context of the quantitative predictions of the Lewis & Vasishth cue-based retrieval model.

One striking fact about psycholinguistics in general and interference effects in particular is that most of the data tend to come from English. Very few people work on non-English languages. I bet there are a lot of surprises in store for us once we step out of the narrow confines of English. I bet that most theories of sentence processing are overfitted to English and will not scale. And if you submit a paper to a journal using data from a non-English language, there will always be a reviewer or editor who asks you to explain why you chose language X!=English, and not English. Nobody ever questions you if you study English. A bizarre world.

Title: Syntactic and semantic interference in sentence comprehension: Support from English and German eye-tracking data

Abstract

A long-standing debate in the sentence processing literature concerns the time course of syntactic and semantic information in online sentence comprehension. The default assumption in cue-based models of parsing is that syntactic and semantic retrieval cues simultaneously guide dependency resolution. When retrieval cues match multiple items in memory, this leads to similarity-based interference. Both semantic and syntactic interference have been shown to occur in English. However, the relative timing of syntactic vs. semantic interference remains unclear. In this first-ever cross-linguistic investigation of the time course of syntactic vs. semantic interference, the data from two eye-tracking reading experiments (English and German) suggest that the two types of interference can in principle arise simultaneously during retrieval. However, the data also indicate that semantic cues may be evaluated with a small timing lag in German compared to English. This suggests that there may be cross-linguistic variation in how syntactic and semantic cues are used to resolve linguistic dependencies in real-time.

Download pdf from here: https://psyarxiv.com/ua9yv


New paper in Computational Brain and Behavior: Sample size determination for Bayesian hierarchical models commonly used in psycholinguistics

We have just had a paper accepted in the journal Computational Brain and Behavior. This is part of a special issue that responds to the following paper on linear mixed models:
van Doorn, J., Aust, F., Haaf, J.M. et al. Bayes Factors for Mixed Models. Computational Brain and Behavior (2021). https://doi.org/10.1007/s42113-021-00113-2
There are quite a few papers in that special issue, all worth reading, but I especially liked the contribution by Singmann et al.: Statistics in the Service of Science: Don't let the Tail Wag the Dog (https://psyarxiv.com/kxhfu/). They make some very good points in reaction to van Doorn et al.'s paper.


Our paper: Shravan Vasishth, Himanshu Yadav, Daniel J. Schad, and Bruno Nicenboim. Sample size determination for Bayesian hierarchical models commonly used in psycholinguistics. Computational Brain and Behavior, 2021.
Abstract: We discuss an important issue that is not directly related to the main theses of the van Doorn et al. (2021) paper, but which frequently comes up when using Bayesian linear mixed models: how to determine sample size in advance of running a study when planning a Bayes factor analysis. We adapt a simulation-based method proposed by Wang and Gelfand (2002) for a Bayes-factor based design analysis, and demonstrate how relatively complex hierarchical models can be used to determine approximate sample sizes for planning experiments.
Code and data: https://osf.io/hjgrm/
pdf: here