
Monday, February 09, 2015

Another comment on Hornstein's comments on Hagoort

On his blog, Norbert Hornstein had the following exchange. The original Hagoort post is here.

##############
NH: "If Sprouse and Almeida are right (which I assure you they are; read the papers) then there is nothing wrong with the data that GGers use."

SV: One should never be 100% sure of anything. There is always uncertainty, and we should openly discuss the range of possibilities whenever we present a conclusion, not just argue for one position. That has been a problem in psychology, with overly strong conclusions, and it is a problem in linguistics too, experimentally driven or not. But this is especially relevant for statistical inference. We can never be sure of anything.

NH: But I think that I disagree with your second point about being sure. One way of taking your point is that one should always be ready to admit that one is wrong. As a theoretical option, this is correct. BUT, I doubt very much anyone actually works in this way. Do you really leave open the option that, for example, thinking takes place in the kidneys and not the brain? Is it a live option for you that you see through the ears and hear through the eyes? Is it a live option for you that gravitational attraction is stronger than electromagnetic forces over distances of 2 inches? We may be wrong about everything we have learned, but this is a theoretical, not what in the 17th century was called a moral, possibility. Moreover, there is a real downside to keeping too open a mind, which is what genuflecting to this theoretical option can engender. I find refuting flat earthers and climate science denialists a waste of intellectual time and effort. Is it logically possible that they are right? Sure. Is it morally possible? No. Need we open our minds to their possibilities? No. Should we? No. Same IMO with what GGers have found out about language. There are many details I am willing to discuss, but I believe that it is time to stop acting as if the last 60 years of results might one day go up in smoke. That's not being open minded, or if this is what being open minded requires, then so much the worse for being open minded.

Let me say this another way: there are lots of things I expect to change over the course of the next 25 years of work in linguistics. However, there are many findings that I believe are settled effects. We will not wake up tomorrow and discover that reflexives resist binding or that all unbounded dependencies are created equal. These are now established facts, though there may be some discussion of the limits of their relevance. But they won't all go away. Yet this is precisely what Hagoort thinks we should allow for, and, on one reading, what you are suggesting as well. Maybe we are completely wrong! Nope, we aren't. Being open minded to this kind of global skepticism about the state of play is both wrong and debilitating.

Last point: you are of course aware that your last sentence is a kind of paradox. Is the only thing we can be sure of that we can never be sure of anything? Hmm. As you know better than I do, this is NOT what actually happens in statistical practice. There are all sorts of things that are held to be impossible. In any given model the hypothesis space defines the limits of the probable. What's outside has 0 probability. The real fight, always, is over what is possible and what is not. Only then does probability mean anything.

 ###############

 Since Norbert's blog doesn't allow comments beyond a particular length, I post my response here:

Norbert, I agree that my statement, taken literally, is obviously absurd. When I said that we can't be sure of anything, I didn't mean that we can't be sure that we don't think with our kidneys, etc. I fully agree (and I would have to be really, really stupid not to agree! ;) that there are many things we can easily rule out as impossible; no experiments needed there (also not in syntactic investigations). I was talking specifically about results from rating studies. Take Sprouse et al.'s work, which is excellent in my opinion. More work like that should be done, and I'm fully for it, whatever the outcome. My comment was directed at your statement that we can be sure of Sprouse et al.'s results. I agree that syntacticians have a finely honed ability to sift through data by just using intuition, so I find the Sprouse et al. conclusions plausible.

My skepticism is of the following nature: it's entirely possible that the things syntacticians have studied so far were, relatively speaking, low-hanging fruit. The Sprouse et al. results may be convincing for the items studied so far, but they may have limited validity for future work, where judgements could be a lot more variable and unstable. Or they may not replicate (replication is the acid test).

Take some of the work on negative polarity: we might find that naive speakers' judgements diverge from those of expert NPI researchers (and NPI judgements get pretty unstable---Van der Wouden once told me that we shouldn't even consult "ordinary" speakers of a language for NPI, since they won't have reliable judgements; one has to consult a syntactician). Once, when I was a grad student at Ohio State, a visiting NPI specialist presented his expert judgements as the basis for his theory; it was easy to find counterexamples in corpora.
Or, if we move to a language like Hindi, which has inherently unstable and variable judgements, the judgements of linguists and those of a sample from the population of native speakers may differ quite a bit. For example, I was really surprised by the key example in Mahajan's dissertation; it is very hard to "get" the judgement that Mahajan got. Initially I thought I just didn't get it because I wasn't a syntactically refined enough individual, but that was not the case. Similarly, we have done several rating studies on word order variation in Hindi, with completely unclear and unstable results. But syntacticians working on Hindi are pretty sure about what's OK and what's not OK in these cases (just take monoclausal word order with and without negation; here's a syntactician holding forth on this topic: http://www.ling.uni-potsdam.de/~vasishth/pdfs/VasishthRLC04.pdf). The situation is much less clear than this guy suggests in the paper, if you do a rating study.

What I was commenting on was the certainty expressed in the statement "If Sprouse and Almeida are right (which I assure you they are; read the papers)". Neither you nor I can know whether they are right. They have some evidence for their position, which may or may not replicate or generalize when we go beyond the languages and phenomena covered there.

PS You said that "One way of taking your point is that one should always be ready to admit that one is wrong. As a theoretical option, this is correct. BUT, I doubt very much anyone actually works in this way." I know at least one person. Take a look at some of my papers:

http://www.ling.uni-potsdam.de/~vasishth/pdfs/FrankTrompenaarsVasishthCogSci.pdf

http://www.ling.uni-potsdam.de/~jaeger/publications/JaegerChenLiLinVasishth2015.pdf

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0100986

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0077006

We have more stuff in the works in which we try to break our own favorite story. Ted Gibson has also published against his favored positions. I think more people need to push against their own positions, but hardly anyone does. I am highly suspicious of people who *only* find (or only publish) results favoring their own position.

Thursday, February 05, 2015

Quantitative methods in linguistics: The danger ahead

Peter Hagoort has written a nice piece on his take on the future of linguistics:

http://www.mpi.nl/departments/neurobiology-of-language/news/linguistics-quo-vadis-an-outsider-perspective

He's very gentle on linguists in this piece. One of his suggestions is to do proper experimental research instead of relying on intuition. Indeed, the field of linguistics is already moving in that direction. I want to point out a potentially dangerous consequence of the move towards quantitative methods in linguistics.

My expectation is that with the arrival of more and more quantitative work in linguistics, we are going to see (actually, we are already there) a different kind of degradation in the quality of work done. This degradation will be different from the kind linguistics has already experienced thanks to the tyranny of intuition in theory-building.

Here are some things that I have personally seen linguists do (and psycholinguists do this too, even though they should know better!):

1. Run an experiment until you hit significance. ("Is the result non-significant? Just run more subjects; it's going in the right direction.")
2. Alternatively, if you are looking to prove the null hypothesis, stop early or just run a low power study, where the probability of finding an effect is nice and low.
3. Run dozens (in ERP, even more than dozens) of tests and declare significance at 0.05, without any correction for multiple comparisons.
4. Vary the region of interest post-hoc to get significance.
5. Never check model assumptions.
6. Never replicate results.
7. Don't release data and code with your publication.
8. Remove data as needed to get below the 0.05 threshold.
9. Only look for evidence in favor of your theory; never publish against your own theoretical position.
10. Argue from a null result that there is no effect.
11. Reverse-engineer your predictions post-hoc after the results show something unexpected.
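
A minimal simulation makes vivid why practice 1 in the list above is so dangerous. This sketch is mine, not from the post; for simplicity it uses an idealized z-test with known sd = 1 and the usual 1.96 cutoff, peeking after every batch of 10 subjects up to 50, with the null hypothesis true throughout:

```python
# Hypothetical simulation: "run subjects until you hit significance".
# The null is true in every simulated experiment, so an honest test
# should reject about 5% of the time; optional stopping rejects far
# more often because each peek is another chance at a false positive.
import random
import math

random.seed(1)

def sequential_false_positive(n_sims=2000, batch=10, max_n=50, crit=1.96):
    hits = 0
    for _ in range(n_sims):
        data = []
        for _ in range(max_n // batch):
            # add another batch of subjects; true effect is exactly zero
            data.extend(random.gauss(0, 1) for _ in range(batch))
            n = len(data)
            z = (sum(data) / n) * math.sqrt(n)  # z-test, known sd = 1
            if abs(z) > crit:  # "significant" -- stop and write it up
                hits += 1
                break
    return hits / n_sims

print(sequential_false_positive())  # well above the nominal 0.05
```

With five looks at the data, the realized Type I error roughly doubles or triples relative to the nominal 5%, even though every single test "looks" legitimate.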

I could go on. The central problem is that doing experiments requires a strong grounding in statistical theory. But linguists (and psycholinguists) are pretty cavalier about acquiring the relevant background: have button, will click. No linguist would think of running his sentences through some software to print out his formal analyses; you need to have expert knowledge to do linguistics. But the same linguist will happily gather rating data and run some scripts or press some buttons to get an illusion of quantitative rigor. I wonder why people think that statistical analysis is exempt from the deep background so necessary for doing linguistics.  Many people tell me that they don't have the time to study statistics. But the statistics is the science. If you're not willing to put in the time, don't use statistics!

I suppose I should be giving specific examples here; but that would just insult a bunch of people and would distract us from the main point, which is that the move to doing quantitative work in linguistics has a good chance of backfiring and leading to a false sense of security that we've found something "real" about language.

I can offer one real example of a person I don't mind insulting: myself. I have made many, possibly all, of the mistakes I list above. I started out with formal syntax and semantics, and transitioned to doing experiments in 2000. Everything I knew about statistical analysis I learnt from a four-week course I did at Ohio State.  I discovered R by googling for alternatives to SPSS and Excel, which had by then given me RSI. I had the opportunity to go over to the Statistics department to take courses there, but I missed that chance because I didn't understand how deep my ignorance was.  The only reason I didn't make a complete fool of myself in my PhD was that I had the good sense to go to the Statistical Consulting section of OSU's Stats department, where they introduced me to linear mixed models ("why are you fitting repeated measures ANOVAs? Use nlme.").  It was after I did a one-year course in Sheffield's Statistics department that I finally started to see what I had missed (I reviewed this course here).
 
For linguistics, becoming a quantitative discipline is not going to give us the payoff that people expect, unless we systematically work at making a formal statistical education a core part of the curriculum. Currently, what's happening is that we have advanced fast in using experimental methods, but have made little progress in developing a solid understanding of statistical inference.

Obviously, not everyone who uses experimental methods in linguistics falls into this category. But the problems are serious, in both linguistics and psycholinguistics, and it's better to recognize this now rather than let thousands of badly done experiments and analyses lead us down some other garden-path.


Friday, January 02, 2015

A weird and unintended consequence of Barr et al's Keep It Maximal paper

Barr et al's well-intentioned paper is starting to lead to some seriously weird behavior in psycholinguistics! As a reviewer, I'm seeing submissions where people take the following approach:

1. Try to fit a "maximal" linear mixed model.  If you get a convergence failure (this happens a lot since we routinely run low power studies!), move to step 2.

[Aside:
By the way, the word maximal is ambiguous here, because you can have a "maximal" model with no correlation parameters estimated, or have one with correlations estimated. For a 2x2 design, the difference would look like:

correlations estimated: (1+factor1+factor2+interaction|subject) etc.

no correlations estimated: (factor1+factor2+interaction || subject) etc.

Both options can be considered maximal.]

2. Fit a repeated measures ANOVA. This means that you average over items to get F1 scores in the by-subject ANOVA. But this is cheating and amounts to p-value hacking: aggregating over items for each subject in each condition effectively sets the between-item variance to 0. That is the whole reason why linear mixed models are so important; they take both between-item and between-subject variance into account simultaneously. People mistakenly think that the linear mixed model and the rmANOVA are exactly identical. If your experimental design calls for crossed varying intercepts and varying slopes (and in psycholinguistics it always does), an rmANOVA is not identical to the LMM, for the reason I give above. In the old days we used to compute minF'. In 2014, I mean, 2015, it makes no sense to do that if you have a tool like lmer.
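
The aggregation problem can be demonstrated with a small simulation (my own illustrative sketch, not from the post; all numbers are made up). Each of 20 items carries its own random intercept and is assigned to one condition for all subjects, as in a typical between-item manipulation. Averaging over items per subject hides that shared item variability, so the by-subject paired t-test (the F1 analysis) mistakes it for a condition effect even though the true effect is zero:

```python
# Hypothetical simulation: the "language-as-fixed-effect" problem.
# True condition effect = 0, but items have random intercepts that are
# shared across subjects; the F1-only analysis is anti-conservative.
import random
import math

random.seed(1)

def f1_false_positive(n_sims=2000, n_subj=20, n_items=10,
                      item_sd=1.0, resid_sd=1.0, crit=2.093):
    # crit = two-tailed .05 critical t for df = n_subj - 1 = 19
    hits = 0
    for _ in range(n_sims):
        # item intercepts, fixed across subjects within a simulated study
        items_a = [random.gauss(0, item_sd) for _ in range(n_items)]
        items_b = [random.gauss(0, item_sd) for _ in range(n_items)]
        diffs = []
        for _ in range(n_subj):
            # aggregate over items per subject per condition (F1 scores)
            mean_a = sum(i + random.gauss(0, resid_sd) for i in items_a) / n_items
            mean_b = sum(i + random.gauss(0, resid_sd) for i in items_b) / n_items
            diffs.append(mean_a - mean_b)
        m = sum(diffs) / n_subj
        sd = math.sqrt(sum((d - m) ** 2 for d in diffs) / (n_subj - 1))
        t = m / (sd / math.sqrt(n_subj))  # by-subject paired t-test
        if abs(t) > crit:
            hits += 1
    return hits / n_sims

print(f1_false_positive())  # far above the nominal 0.05
```

A crossed linear mixed model with by-item intercepts would correctly attribute that variance to items; the by-subject ANOVA cannot, which is the point of the paragraph above.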

As always, I'm happy to get comments on this.