Low statistical significance doesn't necessarily mean no effect
The "Wages of Wins" blog is written mostly by David Berri, but, as it turns out, co-author Stacey Brook also blogs. Recently, Brook had a post on the relationship between salary and wins.
He says there is none. Seriously. Not that the relationship is weak, not that money doesn't help much. Brook seems to honestly believe that salary doesn't buy wins at all. Read the full post to see if I'm interpreting him correctly, but here's a quote:
"So not only the proportion of variance that is common between the two tiny, but here I am able to show that the correlation coefficient between the two populations (NBA payroll and NBA performance) for the 2008-2009 season is statistically zero."
I have several problems with this analysis. The first one is not unique to Brook, and it drives me nuts. It's the idea that if you do a regression, and the significance level is less than 95%, it's OK to claim that there is no relationship between the variables.
That's not always right. It's often right; I suppose you could even say it's *usually* right. But this is one of those exceptions where it's not right at all.
Let's suppose that somehow you get it into your head that rubbing chocolate on your legs can help cure cancer. So you set up a double-blind experiment, where one set of patients gets the chocolate rub, and the other set gets a rub with fake chocolate. It turns out that the first group actually improves more than the second group -- by a small amount, maybe 1%. But the result is not statistically significant. Maybe, instead of the 95% you were looking for, you only have 80% significance.
In this case, I agree with Brook -- it would be wrong to argue that the 1% improvement you saw was real. It's probably just random chance, and you'd be justified in saying that there's no reason to believe that a chocolate rub has any therapeutic value at all.
But, now, let's turn to salary and wins. Suppose you study actual NBA payrolls and records, and you find a similar small effect: every $1 million gives you 0.1 extra wins. Again, suppose that's significant at only the 80% level.
In this case, can you draw the same conclusion, that money has no effect on wins at all? No, you can't. In this case, it's likely that the effect is real, despite the low significance level.
Why the difference? Because in the first case, there was absolutely no reason to believe that chocolate can have any effect on cancer. There's no previous scientific evidence for it, and there isn't a plausible mechanism for how the effect might work.
Suppose that, going into the study, you (generously) thought there couldn't be more than a one-in-a-million chance that chocolate helps treat cancer. Now imagine a million different universes where you run the experiment. In one of them, the effect is real. In about 200,000 others (20 percent), you'll hit 80% significance just by chance. So the chance that the chocolate actually works in this universe is roughly 1 in 200,001. That's still no reason to believe.
But the salary case is very different. There's no basis to believe that chocolate can cure cancer, but there's very good reason to believe that spending money buys better players and leads to more wins. In fact, every serious basketball fan in the world (except maybe Stacey Brook) believes that you can buy wins. When the Celtics pay Kevin Garnett some $25 million, does anyone really believe that the signing won't help the team? That if the Celtics instead paid $500,000 for some mediocre guy, they'd be doing just as well?
In the salary case, when you run regressions and get only 80% significance, the calculation works out differently. Suppose that, going into the study, you figured there was a 99% chance that money helps buy performance (which, again, is conservative). Then, in a million different universes, you'd get 2,000 where the 80% significance came up just by chance, and 990,000 where the effect is real. The chance, then, that salary actually does buy wins in this particular universe is 99.8% (990,000 divided by 992,000). The effect that Brook found is probably a real one.
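If you want to check the arithmetic, here's a small Python sketch of the "million universes" calculation for both cases. This is just my illustration of the reasoning above, not anything from Brook's post; like the back-of-envelope numbers in the text, it assumes a real effect always shows up in the study (100% power), and treats "80% significance" as a 20% false-positive rate under the null.

```python
from fractions import Fraction

def posterior_real(prior_real, false_positive_rate):
    """Chance the effect is real, given that the study found it.

    Mirrors the 'million universes' arithmetic: among universes where
    the study would show the result, what fraction have a real effect?
    (As in the text's rough numbers, a real effect is assumed to
    always show up, i.e. power is taken to be 100%.)
    """
    true_hits = prior_real                               # real effect, detected
    false_hits = (1 - prior_real) * false_positive_rate  # fluke at 80% significance
    return true_hits / (true_hits + false_hits)

# Chocolate rub: prior of 1 in a million, 20% false-positive rate
choc = posterior_real(Fraction(1, 10**6), Fraction(1, 5))
print(f"chocolate: about 1 in {round(1 / choc):,}")  # about 1 in 200,001

# Salary: prior of 99%, same 20% false-positive rate
salary = posterior_real(Fraction(99, 100), Fraction(1, 5))
print(f"salary: {float(salary):.1%}")                # 99.8%
```

Same significance level, wildly different conclusions; the only thing that changed is the prior.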
(The above argument can be put into more formal mathematics using Bayesian probability, but I won't bother -- first, because it makes more sense to explain it in plain English, and, second, because I don't remember all the terminology and notation from the one Bayesian course I took in 1996.)
Here's another way to look at it, if you don't like the "multiple universes" approach.
There are two possible reasons you might get a non-significant correlation between two variables:
1. There really is no relationship between the variables; or
2. There *is* a relationship, but you haven't looked at enough data to get a high enough significance level.
Almost any relationship, no matter how strong, will give you low significance if your sample size is too small. If you look at one random Ted Williams game, and one random Mario Mendoza game, what kind of significance level will you get? Pretty low. Even if Ted goes 2-for-5, and Mario goes 1-for-5 -- both of which are more extreme than their career averages -- you won't find the difference to be significant at the 95% level. One game is just not enough.
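You can verify that one-game claim directly. Here's a one-sided Fisher's exact test, computed by hand from the hypergeometric distribution (the 2-for-5 and 1-for-5 lines are just the hypothetical single games from the paragraph above):

```python
from math import comb

def fisher_one_sided(a_hits, a_ab, b_hits, b_ab):
    """One-sided Fisher's exact test: probability that player A gets
    a_hits or more, given the combined totals, if both players were
    actually equally good."""
    total_hits = a_hits + b_hits
    total_ab = a_ab + b_ab
    p = 0
    for k in range(a_hits, min(total_hits, a_ab) + 1):
        p += comb(a_ab, k) * comb(b_ab, total_hits - k)
    return p / comb(total_ab, total_hits)

# Ted Williams 2-for-5 vs. Mario Mendoza 1-for-5, one game each
p_value = fisher_one_sided(2, 5, 1, 5)
print(p_value)  # 0.5 -- nowhere near 95% significance
```

A p-value of 0.5 from a single game, even for players a full 200 points of batting average apart: the sample is simply too small for significance to mean anything.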
That doesn't mean this particular experiment is useless. You can still show the effect that you found, and invite further investigation. In this case, the difference between Williams and Mendoza is huge in the baseball sense -- .400 vs. .200. As a general rule, when you find an effect that's significant in the real-life sense, but not in the statistical sense, that's an indication that you might need more data. If the observed effect does have real-life importance, you are NOT entitled to conclude that there is no relationship between the variables. You are only entitled to conclude that you need more data.
And, in my opinion, you MUST show the size of the effect you found, not just the significance level. Brook doesn't do that in his blog post. He gives us significance levels, and r, and r-squared, but the purpose of the study was to estimate the relationship between payroll and wins. Is it $5 million per win? $10 million per win? $15 million per win? Because, regardless of the significance level, the slope of the best-fit line is still the best estimate of that relationship. And I suspect that the results are reasonable, very close to what other analysts have estimated as the rate at which you can buy wins.
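To see how a modest correlation can coexist with a perfectly sensible slope, here's a deterministic toy example. The six payroll/wins pairs are invented for illustration (they are NOT Brook's data): the correlation comes out unimpressive, and with only six teams it wouldn't reach 95% significance, but the best-fit slope still yields a reasonable dollars-per-win figure.

```python
# Six made-up (payroll in $ millions, wins) pairs -- my illustration only.
teams = [(60, 38), (70, 45), (80, 40), (90, 47), (100, 42), (110, 49)]

n = len(teams)
mean_x = sum(x for x, _ in teams) / n   # 85.0
mean_y = sum(y for _, y in teams) / n   # 43.5

# Sums of squares and cross-products for the least-squares fit
sxy = sum((x - mean_x) * (y - mean_y) for x, y in teams)
sxx = sum((x - mean_x) ** 2 for x, _ in teams)
syy = sum((y - mean_y) ** 2 for _, y in teams)

slope = sxy / sxx              # wins per $ million of payroll
r = sxy / (sxx * syy) ** 0.5   # correlation coefficient

print(f"r = {r:.2f}, r-squared = {r * r:.2f}")       # r = 0.67, r-squared = 0.45
print(f"slope = {slope:.3f} wins per $ million")     # 0.151
print(f"i.e. about ${1 / slope:.1f} million per win")  # about $6.6 million per win
```

Reporting only "not significant" would throw away that $6.6-million-per-win estimate, which is the number the study was supposed to produce in the first place.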
I suspect that if we were able to look more closely at Brook's study, we'd find that:
-- he got an estimate of wins per dollar that's close to conventional wisdom;
-- but he didn't have enough data to get statistical significance;
-- so he claims that the proper estimate of wins per dollar is zero.
That ain't right.
P.S. Probably more on this topic in the next post -- for a preview, this is why I think Brook got such low correlation.
UPDATE: Actually, I think Brook got a low correlation because the data was flawed. Details in my next post here.