Friday, June 04, 2010

Payroll and wins and correlation

Yesterday, Stacey Brook posted about payroll and wins. Brook ran a regression and found that, as of June 3, about a third of the way through the 2010 baseball season, the correlation between team salaries and team wins was .224. He writes,

" ...that the two variables move together just over 22%; or there is about 78% not moving together. That does not seem to me a great deal of support for the hypothesis that as team's spend more on payroll, it results in higher team winning percent (or better quality teams)."

What he's saying, paraphrased, is: ".22 is a low number. Therefore, the relationship between payroll and wins is low. QED."

But that's simplistic and wrong. You know what's an even lower number? .00142. Much lower, right? More than 100 times lower. Really small number. Turns out that's Yao Ming's height, in miles. Boy, Yao must be a pretty short little man!

On the other hand, here's a big number: 2,290,000,000. That's Yao's height in nanometers. Huge number! Yao must be really tall!

Well, which is it? Is Yao short or tall?

Obviously, the number alone isn't enough -- you need the units. Brook is simply saying "0.22 is low" without figuring the units, and that's where the problem is. (If anyone reading this thinks otherwise, I invite you to offer me 0.22 Megadollars for my car.)

What are the units of the correlation coefficient of .22? Well, Brook is right when he says it measures how the two variables move together. It means that for every 1 SD that salary moves, you'll see winning percentage moving by 0.22 SDs. Just like "one knot" is a rate of distance divided by time, the correlation coefficient 0.22 is a measure of the SD of winning percentage divided by the SD of money. So we should be able to convert that to wins per dollar. There are 5280 feet in a mile. How many wins per dollar are there in one "correlation coefficient"?

One SD of salary this year is about $38 million, or about $12 million in the 55 games so far. One SD of winning percentage so far this season is .095, or about 5 wins in the 55 games so far.

So one correlation coefficient = (5 wins /12 million dollars) = .42 wins / million dollars = 4.2 * 10^-7 win/dollar.

So 0.22 correlation coefficients equals 0.22 times that, which is 9.2 * 10^-8 win/dollar.

THAT is the number that Brook should be checking to see if it's big or small. Which is it? Well, if you take the inverse, it's about $11 million per win. $11 million is the number Brook should be looking at, not .22.

At the margin, an extra $11 million in spending buys you an extra win. That's the number that the regression is telling us. That's exactly what the 0.22 means, when you figure out what units it's denominated in.
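The conversion is short enough to do in a few lines of code. This is a sketch using the rough figures from this post (an SD of about $12 million of payroll and about 5 wins over 55 games, both approximations, not exact league numbers):

```python
# Convert a correlation coefficient into dollars-per-win, using the
# post's rough figures: SD of 55-game payroll ~= $12 million, SD of
# 55-game wins ~= 5. Both SDs are approximations.

r = 0.224          # Brook's salary-vs-wins correlation
sd_salary = 12e6   # SD of payroll over 55 games, in dollars
sd_wins = 5.0      # SD of wins over 55 games

# The regression slope is r * (SD of y) / (SD of x):
wins_per_dollar = r * sd_wins / sd_salary
dollars_per_win = 1 / wins_per_dollar

print(wins_per_dollar)  # ~9.3e-08 wins per dollar
print(dollars_per_win)  # ~$10.7 million per win
```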


Over at "The Book" blog, Tom Tango criticizes Brook on the same grounds: that the correlation coefficient can be made as high or as low as you like just by using a larger or smaller sample. Again, it's a matter of units. If you use (say) only ten games, you get a very small correlation number, but a large unit of variance. If you use many seasons' worth of games, you get a higher correlation coefficient, but a small unit of variance. It's .001 kilometers vs. .999 meters. The numbers are extreme, but the units make up for it, and the end result is almost exactly the same.

It makes sense that the result should be the same -- after all, if one thing causes another thing at a certain rate, it should cause it at a certain rate no matter the size of the sample. If smoking causes cancer, smoking causes cancer. If payroll causes wins over 10 seasons, then payroll causes wins over 55 games. It doesn't matter that the correlation coefficient over 10 seasons is big, and the correlation coefficient over 55 games is small. They are not comparable without computing the units. Once you put in the units, they'll tell you exactly the same story, subject, of course, to the fact that the 55-game sample will have more random variation.
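Here's a toy simulation of that point, with all numbers invented for illustration: teams earn wins at a true marginal rate of one win per $11 million of pro-rated payroll, plus coin-flip luck. Over 10 games the correlation comes out small; over 162 it comes out much larger; but the implied dollars-per-win barely moves.

```python
# Toy simulation: the correlation coefficient depends on sample size,
# but the slope (dollars per win) does not. All numbers are invented.

import random
random.seed(1)

TRUE_DOLLARS_PER_WIN = 11e6

def simulate(n_games, n_teams=30):
    """One simulated league: (pro-rated payrolls, wins in n_games)."""
    payrolls = [random.gauss(90e6, 38e6) for _ in range(n_teams)]
    wins = []
    for p in payrolls:
        # Per-game win probability, nudged up or down by payroll.
        prob = 0.5 + (p - 90e6) / TRUE_DOLLARS_PER_WIN / 162
        wins.append(sum(random.random() < prob for _ in range(n_games)))
    prorated = [p * n_games / 162 for p in payrolls]
    return prorated, wins

def corr_and_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5, sxy / sxx  # (r, wins per dollar)

results = {}
for n_games in (10, 162):
    sims = [corr_and_slope(*simulate(n_games)) for _ in range(500)]
    avg_r = sum(r for r, _ in sims) / len(sims)
    avg_slope = sum(s for _, s in sims) / len(sims)
    results[n_games] = (avg_r, avg_slope)
    # games, average r, implied $millions per win:
    print(n_games, round(avg_r, 2), round(1 / avg_slope / 1e6, 1))
```

The small-sample r and the full-season r look wildly different, but once you put the units back in, both samples point at the same $11 million per win.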


Another problem is that Brook dismisses the idea that money has been buying wins because the results of his regression are statistically insignificant:

"In other words in statistical terms payroll has zero effect on winning percentage at this point in the season."

That's just not right, for two reasons.

First: Suppose I claim that I can have an ability called "sensory perception," which other people call "eyesight". You toss a coin, and I will be able to tell you whether it landed heads or tails -- just by looking at it! You don't believe me. So you toss a coin. I look at it and tell you, "heads." You do it again, and I look and say "tails". Then you do it a third time, and after looking I say "tails" again.

I've called it correctly three times in a row. You run a statistical test on it, and find that the chance of me getting three in a row is 0.125 -- a lot higher than the threshold of 0.05 that you need for statistical significance.

And so you say, "in statistical terms you looking at the coin has no effect on whether you are able to guess it right."

Well, that's not a fair argument that I'm wrong about having eyesight. Because, after all, I did exactly what I said I could do. If that's not enough evidence for you, that's a fault of your own experiment. You could have made me call ten tosses, or 20 tosses, or 100 tosses, and then you certainly would have had enough evidence! The fact that three tosses isn't enough to convince you is an issue with your experiment, not with real life.

It's true that your weak experiment doesn't show statistical significance for my ability to call coins. But it doesn't show statistical significance against my hypothesis of *always* being able to call coins. So the results are as consistent with my hypothesis as they are with your hypothesis -- even more so, in fact. So why are you rejecting my hypothesis but not yours?
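The arithmetic behind the coin example is just powers of one-half; a couple of lines show how quickly the experiment gains power as you add tosses:

```python
# Under the null hypothesis that I'm guessing blindly (p = 0.5 per
# toss), the chance of calling every toss correctly, for a few
# experiment sizes.

def p_all_correct(n_tosses):
    return 0.5 ** n_tosses

print(p_all_correct(3))   # 0.125 -- not significant at the 5% level
print(p_all_correct(5))   # 0.03125 -- now significant
print(p_all_correct(10))  # ~0.001 -- overwhelming
```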

If the results of your experiment are consistent with both your hypothesis (money doesn't buy wins) and your critics' hypothesis (money *does* buy wins, at the rate of several million dollars each), you haven't proven anything.

Second, Brook contradicts himself. First, he claims that "in statistical terms payroll has zero effect on winning percentage." But then, he claims that's false:

"Over time (in other words adding more seasons) we do find a statistically significant relationship ... "

So what's the point of trumpeting the new experiment? It doesn't contradict what we already know -- it actually confirms it. If a big experiment finds a statistically significant relationship between salary and wins, and a small experiment finds approximately the same relationship, but without enough data to be statistically significantly different from zero ... then why would you argue that the small one contradicts the big one? It doesn't -- it's exactly what you would expect as confirmation!

In fairness, I think what Brook is doing is again just looking at a single number and giving a gut reaction. For this regression, he looks at the significance level, sees it's not significant, and realizes that, if not for the other study, he would be allowed to conclude that the relationship between salary and wins was zero. If he can "almost" conclude that the relationship is zero, then at the very least it must be small, right?

Well, no, not right. There's a big difference between the size of the effect, and the size of the evidence. Suppose I claim there's an elephant in the room, and show you a picture, but you choose to dismiss my claim on grounds of insufficient evidence. That doesn't give you the right to conclude, "therefore, if there IS an elephant in the room, he's probably a very small one."



At Friday, June 04, 2010 11:24:00 AM, Blogger Phil Birnbaum said...

I should also mention that you don't even need to do that units/SD thing if you actually have the regression equation.

For Brook's data, the regression equation explicitly tells you a dollar buys 1/11.5 millionth of a win.

At Friday, June 04, 2010 11:55:00 AM, Anonymous Anonymous said...

clearly stated, Phil; nice use of analogies.
Tom H

At Friday, June 04, 2010 8:35:00 PM, Blogger Vic Ferrari said...

Thanks Phil, I've learned a tonne about frequentist statistics and regression from reading this blog.

And correct me if I'm wrong, but if we were to guesstimate the distribution of non-payroll factors on results:

The non-payroll variance, in terms of a percentage of the wins distribution, is 1-.22² = .95

The Wins so far are distributed with a SD of about 5, you say, so a variance of about 25. .95 x 25 = 23.75 ... that's the non-payroll wins variance at 55 games, no?

And the binomial chance variation of the wins distribution, at 55 games, is roughly 55 x .5 x .5 = 13.75.

So the non-chance/non-payroll variation of wins is 10 so far.

Thus 10/25 = 40% of the results. And 60% is attributable to a team having more money to spend. Is that right?

If I haven't made a mistake with the arithmetic, then even if the number of games played (I'd call that frame size, perhaps incorrectly) were augmented by a million with the influx of data from parallel universes ... Pearson's r² would be expected to settle in at around .6? i.e. the r would be about .77?

I know I've made a raft of assumptions ... payroll very obviously isn't distributed normally, and wins is too small a sample to call ... looks too flat to be normal though. Still, ballparkish, it's probably in that general range.

That's a hell of a difference in management between MLB teams, at least on the surface. Granted sked difficulty, injuries and other things I'm ignorant of are surely factors in that 40% as well. Still, it's a whack.

Do I have that right, Phil?

At Friday, June 04, 2010 11:25:00 PM, Blogger Phil Birnbaum said...

Hmmm ... the way I have it is same as you:

Total variance = 25

Payroll variance = .05 * 25 = 1.25
Non-payroll variance = 23.75

Non-Payroll variance = Binomial Variance + Other variance

Binomial variance = 13.75
Other variance = 10

But now, I get, eliminating binomial variance, payroll variance is 11.1% of the remaining non-luck total (1.25 / 11.25).

Put another way, in terms of percentage of variance explained:

Payroll 1.25/25 = .05
Luck (in theory, binomial) 13.75/25 = .55
Other 10/25 = .40

Over 162 games, you get roughly 4 parts luck variance to 9 parts other variance. So over a third of that, you get 12 parts binomial luck variance to 9 parts non-luck variance. We're seeing 11 to 9 instead of 12 to 9. Hey, not bad.
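For anyone who wants to redo this decomposition, here it is as a few lines of code, using the thread's round numbers (r = 0.224 between payroll and wins, an SD of wins of about 5 over 55 games):

```python
# Variance decomposition of 55-game wins, using the thread's numbers.

r = 0.224
n_games = 55
total_var = 5.0 ** 2                  # ~25 wins-squared

payroll_var = r ** 2 * total_var      # variance "explained" by payroll
luck_var = n_games * 0.5 * 0.5        # binomial variance at p = 0.5
other_var = total_var - payroll_var - luck_var

for name, v in [("payroll", payroll_var), ("luck", luck_var),
                ("other", other_var)]:
    print(name, round(v, 2), round(v / total_var, 2))

# Payroll's share of the non-luck variance (1.25 / 11.25):
print(round(payroll_var / (total_var - luck_var), 2))  # 0.11
```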

At Saturday, June 05, 2010 1:42:00 PM, Blogger Vic Ferrari said...

Phil said:

Put another way, in terms of percentage of variance explained:

Payroll 1.25/25 = .05
Luck (in theory, binomial) 13.75/25 = .55
Other 10/25 = .40

That's the clearest way of expressing it, I think.

I was making the bold assumption that the Payroll element would consume the chance variation element, given enough time.

I think we can agree that the binomial chance variation in game outcomes is reasonable. And separating out this variance by subtraction is wholly reasonable, and independent of the non-luck portion (for lack of a better term).

So we're agreeing on a rough guess of the non-luck variance, after 55 games, being:
11.1% payroll
88.9% non-payroll.

From here it's a bear. The former is clearly right skewed and the latter left skewed.

And the latter is composed of a whack of things, some of which are surely correlated with the former.

I'm sure everyone would agree that difficulty of sked is a factor in non-payroll variance so far. We'll call that element I. And teams that have had rougher schedules so far, say more road games and better opponents, most of them should have easier schedules going forward. NL teams will probably see a bit of a bump on the whole, and AL teams the opposite, this as more interleague play is completed, but on the whole this should shrink. Agree?

And I don't know if element I is related to team salary or not. Do teams with higher payrolls tend to play in stronger divisions? The opposite?

If element II of the non-payroll distribution is injuries to good players ... that effect on wins should shrink with time as well. At least I would think so. I mean teams who have had perfect health so far have nowhere to go but down, and teams that have lost man games to their best players are more likely to improve. That's a guess, I'll happily be proven wrong.

And if Element III is 'good young players on cheap contracts' ... that will persist, and will surely take a bite out of the luck distribution as it shrinks. Even more so if a guy assumes that cheap teams now were likely cheap teams in recent years, so they probably drafted high and have better young guys, on the whole. I presume there is a negative relationship between payroll and element III.

Then again, maybe that's offset by the shrinkage expected from I and II. I don't know.

And on and on, I suppose.

At Saturday, June 05, 2010 5:25:00 PM, Blogger Vic Ferrari said...


In an effort to reduce the impact of variation due to chance in the games, as well as difficult stretches of schedule, I wrote a brutishly simple sim using 2009 results data from and salary data from

It's here:

What it does is take five random home games from the season for each team, and five random road games. Then it sums up the total wins.

Then it checks the correlation with salary. Pearson's r.

Repeat that 10,000 times and take the average r, as well as the 500th lowest r and 500th highest r, and you get something like:

10, 0.20, -0.09, 0.45

Meaning that choosing 5 road and 5 home games from the season for each team at random ... r=.20 would be the best guess at the correlation of wins to salary. And that we would expect it to be -.09 or lower about 5% of the time, and .45 or higher about 5% of the time.

The same is repeated for 20 games, 30 games, etc. It may take a while to run, btw.
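Since the data links didn't survive, here's a sketch of that sim in Python on made-up data: payrolls drawn from a normal distribution, a fake "2009" season tilted by payroll at an assumed $11 million per win, and the 5-home/5-road draw simplified to 10 random games per team.

```python
# A sketch of the sim described above, on made-up data (the real 2009
# results and salary sources are missing). Each team gets a fake
# 162-game season tilted by payroll; we then draw 10 random games per
# team, compute Pearson's r of wins vs. salary, and repeat 10,000
# times to get a mean and a rough 90% interval.

import random
random.seed(7)

N_TEAMS, SEASON_GAMES, DRAW = 30, 162, 10

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Fake "2009" season: per-game results (True = win), tilted by payroll
# at an assumed marginal rate of one win per $11 million.
salaries = [random.gauss(90e6, 38e6) for _ in range(N_TEAMS)]
seasons = [[random.random() < 0.5 + (s - 90e6) / 11e6 / 162
            for _ in range(SEASON_GAMES)] for s in salaries]

rs = sorted(
    pearson_r([sum(random.sample(g, DRAW)) for g in seasons], salaries)
    for _ in range(10000))

print(round(sum(rs) / len(rs), 2))            # mean r over the draws
print(round(rs[500], 2), round(rs[9500], 2))  # ~5th and ~95th percentiles
```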

At Saturday, June 05, 2010 6:10:00 PM, Blogger Phil Birnbaum said...

Hi, Vic,

Interesting. It means the number Brook found, 0.224, is significant at around the 5% level (using your 50-game line as the basis, and raising the r's a bit to account for the fact that we've had 55 games, not 50). So, this year, wins have been more expensive than usual.

That confirms the 2010 regression result ... $11.5 million per win is a lot higher than normal.

We'll see if it evens out over the rest of the season.

At Saturday, June 05, 2010 6:11:00 PM, Blogger Phil Birnbaum said...

Actually, a little less significant than 5%. It would be 5% if there had been 60 games instead of 55.

At Sunday, June 06, 2010 5:52:00 PM, Blogger Martin Monkman said...

I've re-run and confirmed the regression, and written a longer analysis that I hope shines some light on the results.

You can find it at my blog:

At Tuesday, June 08, 2010 12:28:00 AM, Blogger Vic Ferrari said...


I googled A.C. Thomas to see if he had written anything about hockey lately. I'm a fan of the Edmonton Oilers NHL team; we have a top prospect named Riley Nash who plays for Cornell, and recently Thomas had written an article based on real-time data from Harvard games (same league). Beyond the math, it was highly suggestive of a well-coached league. As opposed to Canadian Major Junior hockey, which is largely a gong show in this regard.

Anyhow, he'd written nothing more on hockey, but I noticed this blog entry which gives props to your inspired post on regressions and linear weights from last fall. That's high praise from a clever dude.

Perhaps you were already aware. I thought I'd mention it anyways.

At Wednesday, July 21, 2010 8:05:00 PM, Anonymous Alex said...

Sorry for the late comment; just came across the post. I have two questions: First, if salary is related to wins, then my favorite team should be able to give everyone a raise and watch the wins come rolling in, right? Second, since this is obviously false, can't we agree that salary is only related to wins to the extent that GMs/owners can identify better players and pay them more, and there is no relationship between salary and wins once you account for player quality?

At Wednesday, July 21, 2010 9:17:00 PM, Blogger Phil Birnbaum said...

Hi, Alex,

Yes, agreed. Money buys you wins only insofar as you use it to buy better players.

