## Saturday, April 04, 2020

### Developing the right mindset for learning statistics: Some suggestions


# Introduction

Over the last few decades, statistics has become a central part of the linguist’s toolkit. In psychology, there is a long tradition of using statistical methods for data analysis, but linguists and other cognitive scientists are relative newcomers to this area, and the formal statistics coursework provided in graduate programs is still quite sketchy. For example, as a grad student at Ohio State, in 1999 or 2000 or so, I did a four-week intensive course on statistics, after which I could do t-tests and ANOVAs on my data using JMP. Even in psychology departments, the amount of exposure students get to statistics varies a lot.

As part of Potsdam’s graduate linguistics/cognitive science/cognitive systems programs, we teach a sequence of five courses involving data analysis and statistics:

• (Winter) Statistical data analysis 1
• (Winter) Bayesian statistical inference 1
• (Winter) Case studies in psycholinguistics
• (Summer) Statistical data analysis 2
• (Winter) Bayesian statistical inference 2

In addition, we teach (in winter) a Foundations of Mathematics course that covers undergraduate calculus, probability theory, and linear algebra. This course is designed for people who plan to take the machine learning classes in computer science, as part of the MSc in Cognitive Systems.

Students sometimes struggle with these courses. This is because there is a non-obvious art to taking them. This short note spells out some important aspects of this art.

Statistics is inherently difficult, but in my experience anyone can learn this way of approaching its study. Keep in mind that when learning something new, one might not understand everything, but that's OK. The whole world is built on partial understanding (I myself have only a very incomplete picture of statistics, and it's likely to stay that way). Someone once told me that the key difference between a mathematician and a "normal" person is that the mathematician will keep reading or listening even when they are not following the details of the presentation. One can learn to become comfortable with partial understanding, safe in the knowledge that one can come back to the open questions later.

Below, I am shamelessly going to borrow from this (to my mind) classic book:

Burger, E. B., & Starbird, M. (2012). The 5 elements of effective thinking. Princeton University Press.

I strongly advise you to read the Burger and Starbird book; it’s short and very practically oriented. I re-read it once a year on average just to remind myself of the main ideas.

My comments below are oriented towards learning statistics as my colleagues and I teach it at Potsdam, so my examples are drawn from the material I teach. The examples are really the only thing I add beyond what's in the Burger and Starbird book.

# Developing the right mindset: A checklist

## Understand the “easy” stuff deeply

Ask yourself: when starting the study of statistics, what is the basic knowledge I will need (I review all these topics in my introductory classes)? You will not be in a position to answer this question when you start your studies, but after completing one or two courses you should revisit this question.

• The basic elements of probability theory (sum rule, product rule, conditional probability, law of total probability)
• Basic high-school algebra (e.g., given $y = \frac{x}{1-x}$, solve for $x$)
• How to deal with exponents: $x^2 \times x^3 = ?$ Is it $x^5$ or $x^6$? We learnt this in school but we forgot it because we didn’t use it for many years. But now we need this knowledge!
• What is a log? What is $\log(1)$? What is $\log(0)$? How can one find out if one has forgotten?
• What is a probability distribution? This requires some careful navigation. The key concepts here are the probability mass function (discrete case), the probability density function (continuous case), and the cumulative distribution function. For bivariate/multivariate distributions, conditional, marginal, and joint distributions must be well understood intuitively. The key here is to develop graphical intuition, using simulation. I teach this approach in my courses. Statisticians use calculus when discussing the properties of probability distributions. However, we can do all this graphically and lose no information. In practice, we rarely or never need to do any analytical work involving mathematical derivations; the software does all the work. However, it is important to understand the details intuitively, and here figures help a lot. A basic rule of thumb is: whenever trying to understand something, try to visualize it graphically. Even something mundane like repeated coin tosses can be graphically visualized, and then everything becomes clear.
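One way to find out whether you have forgotten one of these basics is to check it numerically on the computer. In my courses we use R; the sketch below uses Python's standard library instead, and the particular numbers in it (1.7, 0.1, and so on) are arbitrary choices for a spot check, not anything special.

```python
import math

# Exponents: multiplying powers of the same base adds the exponents.
x = 1.7  # any positive number works for a spot check
print(math.isclose(x**2 * x**3, x**5))  # True
print(math.isclose(x**2 * x**3, x**6))  # False

# Logs: log(1) is 0 in any base, because b**0 == 1.
print(math.log(1))

# log(0) is undefined: log(x) heads towards minus infinity as x shrinks to 0.
try:
    math.log(0)
except ValueError as err:
    # Python raises an error here; R's log(0) returns -Inf instead.
    print("log(0):", err)
print(math.log(1e-300))  # already a large negative number

# Algebra: y = x / (1 - x) rearranges to x = y / (1 + y).
def y_from_x(x):
    return x / (1 - x)

def x_from_y(y):
    return y / (1 + y)

# Round trip: if the algebra is right, we get back the number we started with.
for p in (0.1, 0.5, 0.9):
    print(p, math.isclose(x_from_y(y_from_x(p)), p))
```

A numerical check is not a proof, but it is a quick way to catch a misremembered rule such as $x^2 \times x^3 = x^6$.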

Going back repeatedly to these foundational ideas as one advances through the courses is very important. The goal should be to internalize them deeply, through graphical intuition.
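As an illustration of learning distributions through simulation, here is a minimal sketch of the coin-toss example: repeat a 10-toss experiment many times, tabulate the number of heads, and compare the simulated frequencies with the exact binomial probability mass function. It is written in Python (standard library only) rather than the R we use in class; the fair coin, the 10 tosses, and the 100,000 repetitions are assumptions chosen for illustration. The "plot" is a crude text histogram so that the code runs anywhere.

```python
import math
import random

N_TOSSES = 10          # tosses per experiment
P_HEADS = 0.5          # a fair coin (assumption)
N_EXPERIMENTS = 100_000

rng = random.Random(2020)

# Repeat the 10-toss experiment many times; record the number of heads each time.
counts = [0] * (N_TOSSES + 1)
for _ in range(N_EXPERIMENTS):
    heads = sum(rng.random() < P_HEADS for _ in range(N_TOSSES))
    counts[heads] += 1

# Simulated probability mass function: relative frequency of each outcome.
empirical_pmf = [c / N_EXPERIMENTS for c in counts]

# Exact binomial probability mass function, for comparison.
exact_pmf = [
    math.comb(N_TOSSES, k) * P_HEADS**k * (1 - P_HEADS) ** (N_TOSSES - k)
    for k in range(N_TOSSES + 1)
]

# Text histogram, with the running total giving the cumulative distribution function.
cumulative = 0.0
for k in range(N_TOSSES + 1):
    cumulative += empirical_pmf[k]
    bar = "#" * round(200 * empirical_pmf[k])
    print(f"{k:2d} heads  sim={empirical_pmf[k]:.4f}  "
          f"exact={exact_pmf[k]:.4f}  cdf={cumulative:.3f}  {bar}")
```

Changing `P_HEADS` or `N_TOSSES` and rerunning is a cheap way to see how the shape of the distribution responds.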

## Mistakes are your friend and teacher

Throughout our school years, we are encouraged to deliver the right answers, and penalized for delivering wrong ones. This style of schooling misses the point that mistakes can teach us more than our correct answers, if we compare the expected answer with ours and try to work out what we got wrong and why. In machine learning this is called error-driven learning, and it works with humans too. Don't be afraid to make mistakes, but try to make only new mistakes, and keep learning from them.

Students generally assume that I will judge them if they get something wrong. This is a false impression. As I say above, you can learn more from a mistake than from a correct answer. My own grades in my statistics studies were not stellar; you can see them all online:

https://vasishth-statistics.blogspot.com/2015/02/getting-statistics-education-review-of.html

Despite my mediocre grades, I still learnt a lot. Similarly, in graduate school, at Ohio State, my grades were just OK to so-so, nothing to write home about. In computer science (Ohio State), my grades were usually in the range of B+. I rarely got an A-. I still learnt important and useful stuff.

## How to develop curiosity: Solve the same problem more than one way, and generate your own questions

The Burger and Starbird book encourages the reader to become curious about a problem. Here, I suggest a very concrete strategy for doing this, for example when working on homework assignments.

• First, create some mental space and time. Don't try to squeeze the homework assignment into the last two hours before the submission deadline. Create a clear day ahead of you to explore a problem. I know that courses are designed these days to require at most 2-3 hours of work per week at home. This is an unfortunate productionalization of education that is now hurting the education system in Europe. If you need to stick to that tight schedule, do what you can in the limited time, but even then it is good not to leave the work to the last hours before submission. If you can create more time, use it to explore in the following way.
• Second, assuming you have some extra time, try to solve the given problem using different approaches. E.g., if the assignment asks you to use a lognormal likelihood in a linear mixed model, ask yourself if there is some way to solve the problem with a normal likelihood. If the problem asks you to work with brms, try to also solve the problem using Stan or even rstanarm, even if the assignment doesn't ask you to do this. You are doing this for yourself, not for submitting the assignment. Even if the assignment doesn't ask you to change the priors in a model, fool around with them to see what happens to the posteriors. If there is an LKJ(2) prior on a correlation parameter in the linear mixed model, find out what happens if you use LKJ(0.5) or LKJ(10). Etc.
• Ask yourself what-if questions. Suppose you are learning about power analysis using simulation, a topic I cover in all my advanced classes, Bayesian or frequentist. This topic is ripe for exploration. For a given significance level, power depends essentially on three variables: effect size, sample size, and standard deviation. That is a fertile playground! I have spent so much time playing with power analyses that I can give ballpark estimates for my research problems quite accurately, without any simulation (of course, I always check my answers using simulation!). There are actually several different ways to compute power; you can use power.t.test, you can do it using simulation, etc. This topic is perfect for developing a sense of curiosity, but you can do this for really any topic.
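To give a flavour of the power playground, here is a minimal sketch that computes power two ways, with a closed-form formula and by simulation, and checks that they agree. It is written in Python rather than R, and it simplifies the setting to a two-sided one-sample z-test with the standard deviation treated as known, so the formula is exact; R's power.t.test uses the t distribution and will give slightly different numbers for small samples. The function names and the 20 ms effect with a 100 ms standard deviation are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def power_analytic(effect, sd, n, alpha=0.05):
    """Closed-form power of a two-sided one-sample z-test (SD treated as known)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / (sd / math.sqrt(n))  # expected z-statistic if the effect is real
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

def power_simulated(effect, sd, n, alpha=0.05, n_sims=5000, seed=1):
    """Power estimated as the proportion of simulated experiments that reject H0."""
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sd / math.sqrt(n)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(effect, sd) for _ in range(n)]
        z = (sum(sample) / n) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims

# Hypothetical playground: a 20 ms effect with SD 100 ms, varying the sample size.
for n in (20, 40, 80, 160):
    print(n, round(power_analytic(20, 100, n), 3), power_simulated(20, 100, n))
```

From here the what-if questions are easy to ask: halve the standard deviation, double the effect, or set the effect to zero and watch power collapse to the significance level.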

## Keep careful notes

Statistics is not to be trifled with. I don’t expect anyone to memorize any formulas, but the logic of the analytical steps can get confusing. Keep good records of your learning. As an example, here is my entire record of four years of formal statistics study at the University of Sheffield (I did an MSc online, part time). These are cheat sheets I prepared while studying:

https://github.com/vasishth/MScStatisticsNotes

These notes are way more mathematical than anything I will teach at Potsdam. However, the principle is: organize your understanding of the material yourself. Don’t just let the teacher organize it for you (the teacher does do that, through slides and lecture notes!). We only understand things if we can actively produce and reorganize them ourselves.

## Have a real problem you want to solve, and start simple

Usually, you will learn the most when you are desperate to get the answer to a data analysis problem. You will be working in a very small world of your own; you know your problem, and you are motivated to solve it. This is very different from homework assignments given out of the blue by the teacher. For this reason, especially in statistics courses, it is useful to come to the course with a specific problem you want to solve. As the course unfolds, apply the methods you learn to your problem. For example, suppose your supervisor has already told you that you need to fit a generalized linear mixed model with a logit link function to the data. Where to start?

Suppose you are taking a frequentist course and know that at the end of the course you need to be able to complete the data analysis your supervisor asked you to deal with. You can start by simplifying the problem radically and working with what you already know. Could you run a t-test instead? It doesn't matter that someone told you that that's the wrong test; we are playing here. Could you just fit a simple linear model? (Again wrong, but this is exploration.) Just these two exercises will leave you with a lot of interesting insights to explore. Once you learn about linear mixed models, you can start exploring whether you can fit the model with the standard lmer function and what it would tell you. Once you reach that point, you are close to the analysis you were told to do. Even if I don't teach it in class, you can get the rest of the way with one final trick, which I discuss next.
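One of the first insights from running both a t-test and a simple linear model on the same data is that, for two groups, they give the same t-statistic: regressing the response on a 0/1 predictor yields a slope equal to the difference in group means, with exactly the pooled-variance t-statistic. Here is a minimal sketch of that check in Python (standard library only) rather than R's t.test and lm; note that the equivalence is with the equal-variance t-test, whereas R's t.test defaults to the Welch version. The simulated reading times and the helper names `pooled_t` and `slope_t` are hypothetical.

```python
import math
import random
from statistics import mean, variance

def pooled_t(group0, group1):
    """Two-sample t-statistic assuming equal variances (pooled SD)."""
    n0, n1 = len(group0), len(group1)
    pooled_var = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    return (mean(group1) - mean(group0)) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

def slope_t(x, y):
    """t-statistic of the slope in the simple linear regression y = a + b * x."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b / se_b

rng = random.Random(42)
group0 = [rng.gauss(400, 50) for _ in range(20)]  # hypothetical condition A reading times (ms)
group1 = [rng.gauss(430, 50) for _ in range(20)]  # hypothetical condition B reading times (ms)

x = [0] * len(group0) + [1] * len(group1)  # 0/1 predictor coding the condition
y = group0 + group1

print(pooled_t(group0, group1), slope_t(x, y))  # the two numbers coincide
```

Seeing the two numbers coincide is a good prompt to ask why, and the answer leads straight to the idea that the t-test is a special case of the linear model.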

## “Let me google that for you”: Learn to find information

Any time someone asks you a question you consider easily answered by googling, and you feel like being mean, you can use this website to deliver a sarcastic response: https://lmgtfy.com/. You simply type in the question, and then send the link to the person asking the question. When they click on it, the question is typed into the google search window, and you are invited to click on the search button. It’s a pretty passive aggressive thing to do, and I advise you to never use this approach. :)

But despite the nasty aspect of the LMGTFY website, it does illustrate an important point: these days you can find a lot of information online. Here are some ways that I use the internet:

• When I get an error message in RStudio I don’t understand (this happens pretty much daily), I just copy it and paste it into google’s search engine. Almost always, someone has had that same problem before and posted a solution. You have to be patient sometimes and look at a lot of the search engine results; but eventually you will find the answer. One gets better at this with experience. Sometimes one can’t solve the problem (e.g., I have a minor ongoing problem with Cairo fonts); it’s OK to give up and move on when it isn’t critical to the work one is doing.
• For Bayesian data analysis, there are online forums where one can ask questions, e.g., discourse.mc-stan.org for Stan. For frequentist questions, there are R mailing lists (exercise: google them!).
• Stack Exchange. I have gotten authoritative answers from distinguished scientists about math problems that I don't have the technical knowledge to solve. Often, someone else has asked a similar question already, so it can happen that one doesn't even need to ask.
• Blogs: I use Feedly to follow R-bloggers and other blogs like Andrew Gelman’s. Over time I have learnt a lot from reading blog posts.

Obviously, googling is not a fail-safe strategy. Sometimes you will get incorrect information. What I generally do is try to cross-check any technical claims from other sources like textbooks.

A common complaint in my statistics courses is that I don’t teach enough R. That’s because one can never teach enough R. One has to keep looking stuff up as needed; this is the skill that I am suggesting that you acquire.

## Look for connections between ideas

Often, statistics is taught like a random catalogue of tests: t-test, ANOVA, linear mixed model, Fisher exact test, etc., etc. Interestingly, however, many of these seemingly disparate ideas have deep connections. The t-statistic and the F-statistic are connected; the paired t-test and the linear mixed model are connected. Figuring out these relationships analytically is not difficult, but one needs some background to work it out. For example, see

https://vasishth-statistics.blogspot.com/2018/04/a-little-known-fact-paired-t-test-is.html

Even if one doesn't know enough to carry out this analytical derivation, one can play with data to get a feel for the connection. The way I first got a hint about the connection between the t-test and the linear mixed model (worked out analytically in the post linked above) was by simulating data, analyzing it two different ways (t-test vs. linear mixed model), and getting exactly the same statistics. It was only much later that I saw how to work this out analytically. The point is that simulation will get you very far in such investigations. You may not be able to prove stuff mathematically (I usually can't), but you can still gain insight.
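The t-test versus linear mixed model comparison needs a mixed-model library, but the simpler connection between the t-statistic and the F-statistic can be explored in the same playful way. With two groups, a one-way ANOVA's F-statistic equals the square of the equal-variance two-sample t-statistic. The sketch below checks this on a handful of simulated datasets; it uses Python's standard library rather than R, and the group sizes, means, and SDs are arbitrary assumptions.

```python
import math
import random
from statistics import mean, variance

def pooled_t(group0, group1):
    """Two-sample t-statistic assuming equal variances (pooled SD)."""
    n0, n1 = len(group0), len(group1)
    pooled_var = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    return (mean(group1) - mean(group0)) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

def anova_f(groups):
    """One-way ANOVA F-statistic: between-group over within-group mean squares."""
    all_obs = [v for g in groups for v in g]
    grand_mean = mean(all_obs)
    k, n = len(groups), len(all_obs)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Play with several simulated datasets (unequal group sizes on purpose).
for seed in range(5):
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(12)]
    b = [rng.gauss(0.4, 1.0) for _ in range(17)]
    t = pooled_t(a, b)
    print(f"seed={seed}  t^2={t**2:.6f}  F={anova_f([a, b]):.6f}")
```

Once the pattern shows up in every run, it becomes much easier to follow the algebra that explains it.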

## Getting further in your study of statistics

It is possible to take the Potsdam courses and do solid statistical analyses. However, if you get curious about the underlying mathematics, or want to read more advanced textbooks, or want to get into the machine learning field, we teach a Foundations of Mathematics course that graduate students can take. Historically, people have benefitted from taking this course even if they had no previous math exposure in university. That said, the course is definitely optional and most people can skip it; it's available for anyone interested in going deeper.