Payroll and wins and correlation
Yesterday, Stacey Brook posted about payroll and wins. Brook ran a regression and found that, as of June 3, about a third of the way through the 2010 baseball season, the correlation between team salaries and team wins was .224. He writes,
" ...that the two variables move together just over 22%; or there is about 78% not moving together. That does not seem to me a great deal of support for the hypothesis that as team's spend more on payroll, it results in higher team winning percent (or better quality teams)."
What he's saying, paraphrased, is: ".22 is a low number. Therefore, the relationship between payroll and wins is low. QED."
But that's simplistic and wrong. You know what's an even lower number? .00142. Much lower, right? More than 100 times lower. Really small number. Turns out that's Yao Ming's height, in miles. Boy, Yao must be a pretty short little man!
On the other hand, here's a big number: 2,290,000,000. That's Yao's height in nanometers. Huge number! Yao must be really tall!
Well, which is it? Is Yao short or tall?
Obviously, the number alone isn't enough -- you need the units. Brook is simply saying "0.22 is low" without figuring the units, and that's where the problem is. (If anyone reading this thinks otherwise, I invite you to offer me 0.22 Megadollars for my car.)
What are the units of the correlation coefficient of .22? Well, Brook is right when he says it measures how the two variables move together. It means that for every 1 SD that salary moves, you'll see winning percentage moving by 0.22 SDs. Just like "one knot" is a rate of distance divided by time, the correlation coefficient 0.22 is a rate denominated in SDs of winning percentage per SD of money. So we should be able to convert it to wins per dollar, the same way 5,280 feet converts to one mile. How many wins per dollar are there in one "correlation coefficient"?
One SD of salary this year is about $38 million over a full season, or about $12 million over the 55 games played so far. One SD of winning percentage so far this season is .095, which works out to about 5 wins over those 55 games.
So one correlation coefficient = (5 wins /12 million dollars) = .42 wins / million dollars = 4.2 * 10^-7 win/dollar.
So 0.22 correlation coefficients equals 0.22 times that, which is 9.2 * 10^-8 win/dollar.
THAT is the number that Brook should be checking to see if it's big or small. Which is it? Well, if you take the inverse, it's about $11 million per win. $11 million is the number Brook should be looking at, not .22.
At the margin, an extra $11 million in spending buys you an extra win. That's the number that the regression is telling us. That's exactly what the 0.22 means, when you figure out what units it's denominated in.
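The conversion is just arithmetic, and can be sketched in a few lines. The SD figures are the ones quoted above; the formula is the standard one for turning a correlation into a slope in real units, slope = r × (SD of wins) / (SD of salary).

```python
# Converting a correlation coefficient into wins per dollar, using the
# SD figures from the post: $12 million of payroll and 5 wins, each
# one SD over the 55 games played so far.
r = 0.22                    # correlation between payroll and winning percentage
sd_salary = 12e6            # one SD of payroll over 55 games, in dollars
sd_wins = 5                 # one SD of wins over 55 games

wins_per_dollar = r * sd_wins / sd_salary   # standardized slope, in real units
dollars_per_win = 1 / wins_per_dollar

print(f"{wins_per_dollar:.1e} wins per dollar")          # 9.2e-08
print(f"${dollars_per_win / 1e6:.0f} million per win")   # $11 million
```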
Over at "The Book" blog, Tom Tango criticizes Brook on the same grounds: that the correlation coefficient can be made as high or as low as you like just by using a larger or smaller sample. Again, it's a matter of units. If you use (say) only ten games, you get a very small correlation number, but a large unit of variance. If you use many seasons' worth of games, you get a higher correlation coefficient, but a small unit of variance. It's .001 kilometers vs. .999 meters. The numbers are extreme, but the units make up for it, and the end result is almost exactly the same.
It makes sense that the result should be the same -- after all, if one thing causes another thing at a certain rate, it should cause it at a certain rate no matter the size of the sample. If smoking causes cancer, smoking causes cancer. If payroll causes wins over 10 seasons, then payroll causes wins over 55 games. It doesn't matter that the correlation coefficient over 10 seasons is big, and the correlation coefficient over 55 games is small. They are not comparable without computing the units. Once you put in the units, they'll tell you exactly the same story, subject, of course, to the fact that the 55-game sample will have more random variation.
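Tango's point can be checked with a toy simulation. Everything here is made up for illustration: a league of 300 hypothetical teams (inflated from 30 to keep sampling noise down), payrolls drawn at random, and an assumed true effect of $11 million per win per 162-game season. The same underlying effect produces a small correlation over 55 games and a large one over ten seasons, yet the regression slope, once converted back to dollars per win, comes out about the same either way.

```python
# Toy simulation: same effect size, different sample lengths.
# All parameters are assumptions, not estimates from real data.
import numpy as np

rng = np.random.default_rng(42)
teams = 300                                # a toy league, to keep sampling noise down
payroll = rng.uniform(40e6, 160e6, teams)  # hypothetical season payrolls
wins_per_dollar = 1 / 11e6                 # assumed true effect, per 162-game season

results = {}
for games in (55, 1620):                   # ~a third of a season vs. ten seasons
    skill = (payroll - payroll.mean()) * wins_per_dollar * (games / 162)
    luck = rng.normal(0, 0.5 * np.sqrt(games), teams)  # binomial-style noise, grows with games
    wins = games / 2 + skill + luck

    r = np.corrcoef(payroll, wins)[0, 1]
    slope = np.polyfit(payroll, wins, 1)[0]            # wins per dollar, over `games` games
    dollars_per_win = (games / 162) / slope            # converted back to a per-season rate
    results[games] = (r, dollars_per_win)
    print(f"{games:4d} games: r = {r:.2f}, roughly ${dollars_per_win / 1e6:.0f} million per win")
```

The correlation is much bigger over the long sample, but the dollars-per-win figure lands near $11 million both times -- the units make up for the difference in the raw coefficient.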
Another problem is that Brook dismisses the idea that money has been buying wins because the results of his regression are statistically insignificant:
"In other words in statistical terms payroll has zero effect on winning percentage at this point in the season."
That's just not right, for two reasons.
First: Suppose I claim that I can have an ability called "sensory perception," which other people call "eyesight". You toss a coin, and I will be able to tell you whether it landed heads or tails -- just by looking at it! You don't believe me. So you toss a coin. I look at it and tell you, "heads." You do it again, and I look and say "tails". Then you do it a third time, and after looking I say "tails" again.
I've called it correctly three times in a row. You run a statistical test on it, and find that the chance of me getting three in a row is 0.125 -- a lot higher than the threshold of 0.05 that you need for statistical significance.
And so you say, "in statistical terms you looking at the coin has no effect on whether you are able to guess it right."
Well, that's not a fair argument that I'm wrong about having eyesight. Because, after all, I did exactly what I said I could do. If that's not enough evidence for you, that's a fault of your own experiment. You could have made me call ten tosses, or 20 tosses, or 100 tosses, and then you certainly would have had enough evidence! The fact that three tosses isn't enough to convince you is an issue with your experiment, not with real life.
It's true that your weak experiment doesn't show statistical significance for my ability to call coins. But it doesn't show statistical significance against my hypothesis of *always* being able to call coins. So the results are as consistent with my hypothesis as they are with your hypothesis -- even more so, in fact. So why are you rejecting my hypothesis but not yours?
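The arithmetic behind the coin example is worth making explicit: a perfect record over three tosses can never reach significance, but a slightly longer experiment could. A quick sketch:

```python
# Probability of calling every toss correctly by pure luck (the null
# hypothesis that "looking" has no effect). Three in a row gives
# p = 0.125, above the usual 0.05 cutoff -- so even a perfect record
# over three tosses can't reach significance. Five or more can.
for n in (3, 5, 10):
    p_value = 0.5 ** n
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{n:2d} straight correct calls: p = {p_value:.4g} ({verdict})")
```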
If the results of your experiment are consistent with both your hypothesis (money doesn't buy wins) and your critics' hypothesis (money *does* buy wins, at the rate of several million dollars each), you haven't proven anything.
Second, Brook contradicts himself. First, he claims that "in statistical terms payroll has zero effect on winning percentage." But then, he says the opposite:
"Over time (in other words adding more seasons) we do find a statistically significant relationship ... "
So what's the point of trumpeting the new experiment? It doesn't contradict what we already know -- it actually confirms it. If a big experiment finds a statistically significant relationship between salary and wins, and a small experiment finds approximately the same relationship, but without enough data to be statistically significantly different from zero ... then why would you argue that the small one contradicts the big one? It doesn't -- it's exactly what you would expect as confirmation!
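One way to see why the small sample confirms rather than contradicts: its confidence interval is simply wide. Here's a sketch with hypothetical numbers -- the point estimate is the 9.2 × 10^-8 wins per dollar computed above, but the standard error and the long-run figure are assumptions for illustration, not anything from Brook's regression.

```python
# All numbers illustrative: the small-sample slope, with an assumed
# standard error. The 95% interval covers zero AND the long-run
# estimate, so the small study rejects neither hypothesis.
estimate = 9.2e-8        # wins per dollar, from the 55-game regression
se = 6.0e-8              # assumed standard error for a 30-team sample
lo, hi = estimate - 1.96 * se, estimate + 1.96 * se

long_run = 9.1e-8        # assumed slope from many full seasons
print(lo < 0 < hi)           # True: "zero effect" can't be rejected...
print(lo < long_run < hi)    # True: ...but neither can the long-run effect
```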
In fairness, I think what Brook is doing is again just looking at a single number and giving a gut reaction. For this regression, he looks at the significance level, sees it's not significant, and realizes that, if not for the other study, he would be allowed to conclude that the relationship between salary and wins was zero. If he can "almost" conclude that the relationship is zero, then at the very least it must be small, right?
Well, no, not right. There's a big difference between the size of the effect, and the size of the evidence. Suppose I claim there's an elephant in the room, and show you a picture, but you choose to dismiss my claim on grounds of insufficient evidence. That doesn't give you the right to conclude, "therefore, if there IS an elephant in the room, he's probably a very small one."