Wednesday, May 06, 2009

Low statistical significance doesn't necessarily mean no effect

The "Wages of Wins" blog is written mostly by David Berri, but, as it turns out, co-author Stacey Brook also blogs. Recently, Brook had a post on the relationship between salary and wins.

He says there is none. Seriously. Not that the relationship is weak, not that money doesn't help much. Brook seems to honestly believe that salary doesn't buy wins at all. Read the full post to see if I'm interpreting him correctly, but here's a quote:

"So not only the proportion of variance that is common between the two tiny, but here I am able to show that the correlation coefficient between the two populations (NBA payroll and NBA performance) for the 2008-2009 season is statistically zero."

I have several problems with this analysis. The first one is not unique to Brook, and it drives me nuts. It's the idea that if you do a regression, and the significance level is less than 95%, it's OK to claim that there is no relationship between the variables.

That's not always right. It's often right; I suppose you could even say it's *usually* right. But this is one of those exceptions where it's not right at all.

Let's suppose that somehow you get it into your head that rubbing chocolate on your legs can help cure cancer. So you set up a double-blind experiment, where one set of patients gets the chocolate rub, and the other set gets a rub with fake chocolate. It turns out that the first group actually improves more than the second group -- by a small amount, maybe 1%. But the result is not statistically significant. Maybe, instead of the 95% you were looking for, you only have 80% significance.

In this case, I agree with Brook -- it would be wrong to argue that the 1% improvement you saw was real. It's probably just random chance, and you'd be justified in saying that there's no reason to believe that a chocolate rub has any therapeutic value at all.

But, now, let's turn to salary and wins. Suppose you study actual NBA payrolls and records, and you find a similar small effect: every $1 million gives you 0.1 extra wins. Again, suppose that's significant at only the 80% level.

In this case, can you draw the same conclusion, that money has no effect on wins at all? No, you can't. In this case, it's likely that the effect is real, despite the low significance level.

Why the difference? Because in the first case, there was absolutely no reason to believe that chocolate can have any effect on cancer. There's no previous scientific evidence for it, and there isn't a plausible mechanism for how the effect might work.

Suppose that, going into the study, you (generously) thought there couldn't be more than a one in a million chance that chocolate helps treat cancer. So imagine a million different universes where you run the experiment. One time, you'll get a real effect. In the other universes, where chocolate does nothing, you'll still get 80% significance 20% of the time just by chance -- about 200,000 times. So the chance that the chocolate actually works in this universe is roughly 1 in 200,001. That's still no reason to believe.

But the salary case is very different. There's no basis to believe that chocolate can cure cancer, but there's very good reason to believe that spending money buys better players and leads to more wins. In fact, every serious basketball fan in the world (except maybe Stacey Brook) believes that you can buy wins. When the Celtics pay Kevin Garnett some $25 million, does anyone really believe that the signing won't help the team? That if the Celtics instead paid $500,000 for some mediocre guy, they'd be doing just as well?

In the salary case, when you run regressions and get only 80% significance, the calculation works out differently. Suppose that going into the study, you figured there was a 99% chance that money helped buy performance (which is again conservative). Then, in a million different universes, there are 10,000 where money has no effect, and in 20% of those -- 2,000 universes -- the 80% significance comes up just by chance; and you'd get 990,000 universes where the effect is real. The chance, then, that salary actually does buy wins in this particular universe is 99.8% (990,000 divided by 992,000). The effect that Brook found is probably a real one.
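
For anyone who wants to check the arithmetic in the two cases above, here is a minimal Python sketch of the same universe-counting. The function name `posterior_real` and its parameters are made up for this illustration. It also spells out an assumption the counts above leave implicit: that every universe with a real effect produces a result like the one observed (`p_result_if_real=1.0`).

```python
def posterior_real(prior_real, p_result_if_real=1.0, p_result_if_no_effect=0.20):
    """Chance the effect is real, given a result at the 80% significance level.

    prior_real            -- belief, before the study, that the effect exists
    p_result_if_real      -- ASSUMPTION: a real effect always shows up this strongly
    p_result_if_no_effect -- 80% significance arises by chance 20% of the time
    """
    real = prior_real * p_result_if_real
    chance = (1.0 - prior_real) * p_result_if_no_effect
    return real / (real + chance)


chocolate = posterior_real(1e-6)   # one-in-a-million prior
salary = posterior_real(0.99)      # 99% prior

print(f"chocolate rub works: about 1 in {1 / chocolate:,.0f}")
print(f"salary buys wins:    {salary:.1%}")   # 99.8%, i.e. 990,000 / 992,000
```

Lowering `p_result_if_real` below 1.0 would pull both answers down somewhat, but the gap between the two cases comes almost entirely from the prior.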

(The above argument can be put into more formal mathematics using Bayesian probability, but I won't bother -- first, because it makes more sense to explain it in plain English, and, second, because I don't remember all the terminology and notation from the one Bayesian course I took in 1996.)


Here's another way to look at it, if you don't like the "multiple universes" approach.

There are two possible reasons you might get a non-significant correlation between two variables:

1. There really is no relationship between the variables; or

2. There *is* a relationship, but you haven't looked at enough data to get a high enough significance level.

Almost any relationship, no matter how strong, will give you low significance if your sample size is too small. If you look at one random Ted Williams game, and one random Mario Mendoza game, what kind of significance level will you get? Pretty low. Even if Ted goes 2-for-5, and Mario goes 1-for-5 -- both of which are more extreme than their career averages -- you won't find the difference to be significant at the 95% level. One game is just not enough.

That doesn't mean this particular experiment is useless. You can still show the effect that you found, and invite further investigation. In this case, the difference between Williams and Mendoza is huge in the baseball sense -- .400 vs. .200. As a general rule, when you find an effect that's significant in the real-life sense, but not in the statistical sense, that's an indication that you might need more data. If the observed effect does have real-life importance, you are NOT entitled to conclude that there is no relationship between the variables. You are only entitled to conclude that you need more data.
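
The one-game Williams/Mendoza comparison can be checked with Fisher's exact test, which is one reasonable choice for a small hits-versus-outs table (the post doesn't name a specific test, so that choice is an assumption). The sketch below uses only the standard library; `fisher_exact` is a helper written for this illustration, and the 100-at-bat line is a hypothetical "what if the same .400 vs. .200 rates held up" case, not real data.

```python
from math import comb


def fisher_exact(hits_a, ab_a, hits_b, ab_b):
    """Return (one_sided_p, two_sided_p) for batter A vs. batter B."""
    total_hits = hits_a + hits_b
    denom = comb(ab_a + ab_b, total_hits)

    def prob(k):
        # Chance that A gets exactly k of the combined hits if both
        # batters are really equally good (the null hypothesis).
        return comb(ab_a, k) * comb(ab_b, total_hits - k) / denom

    lo = max(0, total_hits - ab_b)
    hi = min(ab_a, total_hits)
    observed = prob(hits_a)
    # One-sided: A does at least this well.
    one_sided = sum(prob(k) for k in range(hits_a, hi + 1))
    # Two-sided: every outcome no more likely than the observed one.
    two_sided = sum(prob(k) for k in range(lo, hi + 1)
                    if prob(k) <= observed * (1 + 1e-9))
    return one_sided, two_sided


one_game = fisher_exact(2, 5, 1, 5)          # 2-for-5 vs. 1-for-5
hypothetical = fisher_exact(40, 100, 20, 100)  # same rates, 100 at-bats each

print("one game:     one-sided p = %.2f, two-sided p = %.2f" % one_game)
print("100 at-bats:  one-sided p = %.4f, two-sided p = %.4f" % hypothetical)
```

For the single game, the p-values come out to 0.50 and 1.00: nowhere near significance, even though the observed gap is enormous in baseball terms. With the same rates over 100 at-bats each, the two-sided p-value drops below 0.05.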

And, in my opinion, you MUST show the size of the effect you found, not just the significance level. Brook doesn't do that in his blog post. He gives us significance levels, and r, and r-squared, but the purpose of the study was to estimate the relationship between payroll and wins. Is it $5 million per win? $10 million per win? $15 million per win? Because, regardless of the significance level, the slope of the best-fit line is still the best estimate of that relationship. And I suspect that the results are reasonable, very close to what other analysts have estimated as the rate at which you can buy wins.

I suspect that if we were able to look more closely at Brook's study, we'd find that:

-- he got an estimate of wins per dollar that's close to conventional wisdom;
-- but he didn't have enough data to get statistical significance;
-- so he claims that the proper estimate of wins per dollar is zero.

That ain't right.
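
How often would a real relationship fail to show up as significant in one season of data? Here is a rough simulation sketch. Everything in it is an assumption for illustration: the true correlation of 0.3 is picked out of the air (it is not Brook's figure or anyone's estimate), payroll and wins are drawn as standardized normal variables rather than real dollars and games, and the sample is the 30 teams of a single NBA season.

```python
import math
import random

N_TEAMS = 30     # one NBA season
T_CRIT = 2.048   # two-sided 5% critical t value for 28 degrees of freedom
# Smallest |r| that counts as significant at the 95% level with 30 teams.
R_CRIT = T_CRIT / math.sqrt(T_CRIT ** 2 + (N_TEAMS - 2))


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_significant(true_r, seasons=2000, seed=1):
    """Fraction of simulated seasons whose correlation reaches 95% significance."""
    rng = random.Random(seed)
    noise = math.sqrt(1 - true_r ** 2)
    hits = 0
    for _ in range(seasons):
        payroll = [rng.gauss(0, 1) for _ in range(N_TEAMS)]
        # Wins depend on payroll with correlation true_r, plus random noise.
        wins = [true_r * p + noise * rng.gauss(0, 1) for p in payroll]
        if abs(pearson_r(payroll, wins)) > R_CRIT:
            hits += 1
    return hits / seasons


print(f"|r| needed for significance: {R_CRIT:.2f}")
print(f"no real effect (r = 0):      {share_significant(0.0):.1%} significant")
print(f"real effect (assumed r=0.3): {share_significant(0.3):.1%} significant")
```

Under that assumed correlation, well under half of the simulated seasons reach 95% significance, so a "not significant" result from 30 data points says very little about whether the relationship is zero.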


P.S. Probably more on this topic in the next post -- for a preview, this is why I think Brook got such low correlation.

UPDATE: Actually, I think Brook got a low correlation because the data was flawed. Details in my next post here.



At Wednesday, May 06, 2009 11:06:00 PM, Anonymous Ryan J. Parker said...

Aside from the point you made in the older post you linked to, is it safe to say you're making a Bayesian argument here?

In other words: are you saying that we'd set higher prior probabilities to there being a relationship between payroll and performance compared to the leg chocolate and cancer?

It's no secret I struggle with statistical significance and the correct statements / usage of the relationships measured, so I want to make sure I understand your point correctly. I look forward to the next post on the subject.

At Wednesday, May 06, 2009 11:15:00 PM, Blogger Phil Birnbaum said...

I'm not an expert Bayesian, but from my understanding, yes, I'm making a Bayesian argument, and, yes, I'm saying you'd set a higher prior probability on the salary/wins than on the chocolate/cancer.

My second attempt at an explanation says that if your results are significant in the real-world sense, but not statistically, it means you need more data before declaring the effect to be zero. That one is not a Bayesian argument. Actually, now that I think about it, it may not be a statistical argument at all -- it may be a semantic argument that you can only say "no relationship" when it means "only a relationship so small that it doesn't mean anything in real life."

At Thursday, May 07, 2009 8:42:00 AM, Blogger Unknown said...

(Not the first Ryan who left a comment)

Your second argument is one about the "power" of a statistical test and the probability of committing a Type II Error (failing to reject the null hypothesis when it is actually false).

If there was more information available about the original study, you could, without too much difficulty, conduct a simulation analysis to determine the probability that Brook's test would be vulnerable to a Type II error.

As for the first point about accepting the null hypothesis, nearly no one will credibly say that you can do this. Everything (in frequentist/non-Bayesian statistics) needs to be phrased in the language of "reject or failure to reject" the null, where "failure to reject" does not equal accept the null hypothesis.

At Thursday, May 07, 2009 11:49:00 AM, Anonymous Anonymous said...

I'm not sure why Brook thinks that you can tell anything about the co-relationship from just 30 datapoints. If he expanded his dataset to include more than one season, he would get his statistical significance. (I'm looking at the 4 seasons between 2004-08 and getting a significant correlation.)

At Thursday, May 07, 2009 8:09:00 PM, Anonymous Tom G said...

Without looking into any numbers, I could accept the idea that NBA salary does not affect team performance.

I am reminded of the idea that there is no relationship between bodyweight and performance among NFL linemen. That is, a 280 pound NFL lineman is just as likely to make the Pro Bowl as a 330 pound player. It does not mean bodyweight is unimportant, simply that the people who reach the professional level are rarely limited by their size more than by other factors.

With a salary cap, the vast majority of teams are paying market value for five starting players.

Not saying it is true or not, simply that I would be skeptical of anyone who comes to any conclusion without quality analysis.

At Thursday, May 07, 2009 10:41:00 PM, Anonymous Guy said...

Good catch on the USA Today data. I would add that in looking at the R^2 for salary/wins, we also need to take account of random variation in team wins. Some of the variation in wins -- I'd guess about 20% in basketball -- can't be explained by anything. That is, even if the correlation between payroll and basketball talent were 1 -- meaning GMs were perfect appraisers of talent, and no one ever got injured -- you'd still only get an R^2 of maybe .8 in one year's data. So when you find an R^2 of .25, you've really explained about a third of what's explainable (.25 out of .8). Perhaps Brook and Berri would still say that isn't a lot, but I can't imagine many fans would agree.

At Thursday, May 07, 2009 11:50:00 PM, Blogger Phil Birnbaum said...

Right. The total variance that you're trying to explain includes random chance. That's partly why r^2 doesn't tell you much ... unless you know how much random chance there is, how do you know if an r^2 of .2 is big or small? If you can predict a batter's one-day performance with an r^2 of 20%, you're a miracle worker, because there isn't 20% left over after you take out all the randomness you find in one game's totals.

