Interpreting regression interaction terms
Last post, after talking about the results from the "choking foul shooter" study, I mentioned that there was one additional assumption I had to make. That assumption was that, in the regression, the coefficients for "last 15 seconds" and "down 1-4 points" were close to zero.
The easiest way to explain that is to go through what an interaction term means in a regression. (Warning: This is boring statistics stuff, no sports content until the end.)
Suppose I want to figure out if stimulants help a student do better on an exam. So I run a regression to predict the exam score. I use a bunch of variables, like age, time studying, performance on other exams, grades on assignments, number of classes missed, and so on, but I also include a dummy variable for whether the student had (both) coffee and Red Bull before the exam.
After the exam, I run the regression, and I find the coefficient for "both coffee and Red Bull" is -3, and statistically significant. I conclude that if I were a student, I might consider not taking both coffee and Red Bull.
Fair enough, so far.
But, now, suppose I do the same experiment again, but, this time, I add a couple of new dummy variables -- whether or not the student had coffee (with or without Red Bull), and whether or not the student had Red Bull (with or without coffee). I don't remove the original "had both" variable -- that stays in.
I run the regression again, and, again, the coefficient for "both coffee and Red Bull" comes out to -3 -- exactly the same as last time. What am I able to conclude this time about the desirability of drinking both coffee and Red Bull?
The answer: almost nothing. That coefficient, *on its own*, does not give much useful information at all about how performance is affected by the coffee/Red Bull combination.
Let me explain why.
(First, a quick note on terminology. In a regression, the "both coffee and Red Bull" variable would be referred to as "the interaction of coffee and Red Bull". That would be written as "coffee x Red Bull," or a suitable abbreviation. In fact, I'm going to start referring to "coffee" as C, "Red Bull" as R, and "coffee x Red Bull" as CxR. The "x" is a multiplication sign -- it's there because you can get the value of the interaction variable by multiplying together the dummy values for C and R. That is, if either C or R is zero, CxR equals zero; if both C and R are 1, then CxR equals one. That's exactly what we want.)
In a regression result, the simplest way to interpret the coefficient of a dummy variable is, "what happens when you change the value from 0 to 1 and leave all the other variables the same." In the first regression, that works fine. But in the second regression, it can't work. Because if you change CxR and leave everything else constant, your data and regression become inconsistent. You wind up with CxR being 1 (meaning both coffee and Red Bull), but you'll have either C=0 (no coffee) or R=0 (no Red Bull). Those three variables are tied together, so you can't just change CxR and leave the other two constant.
Put another way, there are four possible combinations for C, R, and CxR:
C = 0, R = 0, CxR = 0
C = 1, R = 0, CxR = 0
C = 0, R = 1, CxR = 0
C = 1, R = 1, CxR = 1
You can't change CxR from 0 to 1, and still have a combination that's on the list. So the "change CxR but leave all other variables the same" strategy no longer works. If you change CxR from 0 to 1, you'll have to change at least one of the other variables, too.
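Here's a small Python sketch of that point, in case it helps to see it mechanically. It builds the four valid rows by multiplying the dummies, then flips CxR by itself in each row and checks whether anything valid comes out. The variable names are just mine, for illustration.

```python
from itertools import product

# Every valid (C, R, CxR) row: CxR is always the product of the two dummies.
valid = {(c, r, c * r) for c, r in product((0, 1), repeat=2)}

# Flip CxR alone in each valid row, leaving C and R untouched.
flipped_alone = {(c, r, 1 - cxr) for c, r, cxr in valid}

# Which of those flipped rows are still on the list of valid rows?
still_valid = flipped_alone & valid

print(sorted(valid))        # the four combinations listed above
print(sorted(still_valid))  # empty: no row survives a CxR-only change
```

Nothing survives, which is the whole problem with reading the CxR coefficient as "the effect of changing CxR."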
Which ones should you change? It depends what question you're trying to answer. For example, suppose you do the regression and you get these coefficients:
C = -5
R = -10
CxR = -3
If you're trying to ask, "what's the effect of taking coffee alone versus nothing at all," it's like asking, "what is the effect of changing (C=0, R=0, CxR=0) to (C=1, R=0, CxR = 0)?" The answer is -5.
If you're trying to ask, "what's the effect of taking both coffee and Red Bull versus nothing at all?", it's like asking, "what's the effect of changing (C=0, R=0, CxR=0) to (C=1, R=1, CxR=1)?" The answer is -18, the sum of all three coefficients (-5 - 10 - 3).
And so on. But none of those kinds of questions lead to the answer of -3 points, because none of these questions can be answered by changing CxR alone.
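To make the arithmetic concrete, here's a minimal sketch that computes the effect of moving between any two of the valid combinations, using the example coefficients above. The function names (`predicted_shift`, `effect`) are made up for this illustration, and the sketch ignores all the other variables in the regression, since they're held constant anyway.

```python
# Example coefficients from the text above.
COEF = {"C": -5.0, "R": -10.0, "CxR": -3.0}

def predicted_shift(c, r, coef=COEF):
    """Predicted score change relative to no coffee and no Red Bull."""
    return coef["C"] * c + coef["R"] * r + coef["CxR"] * (c * r)

def effect(start, end, coef=COEF):
    """Effect of moving from one (C, R) combination to another."""
    return predicted_shift(*end, coef) - predicted_shift(*start, coef)

states = [(0, 0), (1, 0), (0, 1), (1, 1)]
for start in states:
    for end in states:
        if start != end:
            print(start, "->", end, effect(start, end))

# Collect every possible comparison to confirm none of them equals -3.
all_effects = {effect(s, e) for s in states for e in states if s != e}
```

For instance, `effect((0, 0), (1, 0))` is -5 and `effect((0, 0), (1, 1))` is -18. Adding Red Bull when you've already had coffee, `effect((1, 0), (1, 1))`, is -13: the Red Bull coefficient plus the interaction.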
So what does the -3 represent? The non-linearity of the coffee and Red Bull variables. Or, put another way, the "increasing or diminishing returns" to combining coffee and Red Bull. Or, put a third way, the effects of the *interaction* of coffee and Red Bull, independent of their individual effects. Or, put a fourth way, the amount of effects *duplicated* from both coffee and Red Bull, that you can't count twice even if you take both drinks.
The -3 is NOT any indication of whether it's a good thing to take coffee and Red Bull together. Even though the coefficient of the interaction is negative, coffee and Red Bull together might be a positive thing. Suppose the regression coefficients had looked like this:
C = +10
R = +20
CxR = -3
The CxR coefficient is still -3, but now look what happens:
Take coffee, score 10 points higher
Take Red Bull, score 20 points higher
Take both coffee and Red Bull, score 27 points higher!
In this case, you're still going to want to take both coffee and Red Bull. What the -3 is telling you is, there are diminishing returns to taking both. You might think that, since coffee improves you by 10, and Red Bull improves you by 20, that, if you take both, you'll improve by 30. That's not right. There are diminishing returns of -3, so, if you take both, you'll only improve by 27.
Of course, if the coefficients of C and R are both zero, then the CxR coefficient is indeed the entire effect. So if coffee does nothing, and Red Bull does nothing, but, when you take them together, you lose 3 points ... in that case, the CxR coefficient actually IS the effect of taking both C and R.
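Another way to see the "diminishing returns" reading is to work backwards from group averages. The sketch below assumes a stripped-down regression with only an intercept, C, R, and CxR (none of the other variables), in which case the coefficients just reproduce the four group means. The baseline average of 60 is invented for illustration; the other three means are chosen to match the +10 / +20 / -3 example.

```python
# Hypothetical average exam scores by group; the baseline of 60 is invented.
cell_means = {
    (0, 0): 60.0,  # neither drink
    (1, 0): 70.0,  # coffee only
    (0, 1): 80.0,  # Red Bull only
    (1, 1): 87.0,  # both
}

def saturated_coefficients(means):
    """Coefficients of score ~ intercept + C + R + CxR that reproduce the four means."""
    intercept = means[(0, 0)]
    c = means[(1, 0)] - intercept
    r = means[(0, 1)] - intercept
    # The interaction is whatever is left over after adding the two solo effects.
    cxr = means[(1, 1)] - means[(1, 0)] - means[(0, 1)] + intercept
    return {"intercept": intercept, "C": c, "R": r, "CxR": cxr}

coef = saturated_coefficients(cell_means)
both_vs_nothing = coef["C"] + coef["R"] + coef["CxR"]

print(coef)
print(both_vs_nothing)
```

The interaction comes out to -3 even though the "both" group scores 27 points higher than the "neither" group. The -3 is only the gap between the 27 you get and the 30 you'd expect from adding the two solo effects.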
This is fairly standard stuff, I would think ... I looked for an explanation on the web, so I wouldn't have to type all this, but I couldn't find one.
Anyway, going back to the choking study ... there, I looked at a variable called "Last15 x Down1_4", which was the interaction of shots that happen in the last 15 seconds (a dummy variable called "Last15") with the shooting team down by 1 to 4 points (dummy variable "Down1_4").
It turned out that the coefficient for that was -0.058 (as compared to 11-point+ blowouts). I implied that meant that shooters were 5.8 percentage points worse in those clutch situations than in blowouts.
But that wasn't right, because "Down1_4" and "Last15" were also in the regression. It's like the "Coffee / Red Bull / Both" case. If I want to compare the effects of shooting in the last 15 seconds down by 1-4 points, against shooting where *neither* of those is true, I have to add up all three coefficients:
Down1_4 = A
Last15 = B
Last15 x Down1_4 = -0.058
To get the true clutch effect, I have to compute A + B - 0.058. It could turn out that A and B are huge: maybe they're +7 percentage points each! In that case, the effect would be 0.07 + 0.07 - 0.058, which would be +0.082 -- which would mean shooters were GREAT in the clutch.
The study doesn't give us A and B. However, the authors do tell us (and author Dan Stone reiterated in the comments to the previous post) that almost all the omitted coefficients are less than 0.01.
Still, suppose A and B are both as high as +0.01. That means that A + B - 0.058 would be -0.038, which would be a smaller choke effect than I thought. Or, suppose they were both as low as -0.01. In that case, players would be even chokier -- at -0.078.
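Here's the same sensitivity check as a short sketch. The only number taken from the study is the -0.058 interaction coefficient; the values tried for A and B are my hypotheticals from above, not anything the study reports.

```python
# Last15 x Down1_4 coefficient quoted above.
INTERACTION = -0.058

def clutch_effect(a, b, interaction=INTERACTION):
    """Last 15 seconds and down 1-4, versus neither: sum all three coefficients."""
    return a + b + interaction

# Hypothetical values for the unreported Down1_4 (A) and Last15 (B) coefficients.
for value in (-0.01, 0.0, 0.01, 0.07):
    print(f"A = B = {value:+.2f} -> clutch effect {clutch_effect(value, value):+.3f}")
```

With A and B both at zero, the result is just the interaction coefficient.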
That's why I added a note to the end of my post, saying I had to make one additional assumption. That assumption is that A and B were both close to zero. If they're exactly zero, the -0.058 stands.