## Sunday, September 19, 2010

### Does MLB payroll matter less than it used to?

In MLB this year, team payroll barely matters. Money is less important in 2010 than in any season since 1994.

That's according to an article a few days ago in the Wall Street Journal. Unfortunately, I don't think that's right ... or at least, I can't reproduce the result.

The article says,

"According to estimated payroll figures released throughout the season, the correlation between a team's player payroll and its winning percentage is 0.14, a number that makes the relationship almost statistically irrelevant. That figure is 67 percent below last year's mark and is easily the lowest since the strike."

However, when I run the same correlation, I get .36. Where does the .14 come from? My best guess is that it's the r-squared, since .36 squared equals about .13.
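The arithmetic behind that guess is easy to check, using the figures above:

```python
# Checking the r vs. r-squared guess, using the numbers from the text.
r = 0.36                      # correlation of 2010 payroll with winning percentage
r_squared = round(r ** 2, 2)
print(r_squared)              # 0.13 -- close to the WSJ's reported 0.14
```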

It's also possible that the author of the article used different salary data than I did -- mine is USA Today data from the beginning of the season. But could a difference in data sources really turn a .36 into a .14? I doubt it, especially considering that USA Today's numbers are very similar to Baseball Reference's.

The 2010 numbers are roughly in line with what I get for 2008 and 2009:

2010: correlation of .36
2009: correlation of .48
2008: correlation of .29

It seems to me that 2010 is fairly normal. BTW, because the 2010 season isn't over yet, it's actually a bit lower now than it will likely wind up -- but only by a point or two.

The actual measures of payroll vs. wins, obtained from the regressions, are also similar:

2010: \$ 8.9 MM in payroll associated with one win
2009: \$ 6.2 MM in payroll associated with one win
2008: \$12.6 MM in payroll associated with one win

These differences might look large, but they're really not, because of the wide confidence intervals around the estimates. For instance, the 2009 estimate has a 95% confidence interval of anywhere between \$3.6 MM and \$20.4 MM per win. And the 2008 estimate isn't even statistically significantly different from zero, with a p-value of .085.
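For anyone who wants to reproduce this kind of estimate, here's a minimal sketch of the regression slope and its confidence interval. The payroll and win figures below are made up for illustration -- the real calculation uses all 30 teams -- and the t critical value should be chosen to match your sample size.

```python
# Sketch: regress wins on payroll, get the slope and a ~95% confidence interval.
from math import sqrt

def ols_slope_ci(x, y, t_crit=2.05):
    """Slope of y on x with an approximate 95% CI.

    t_crit ~ 2.05 is roughly right for 28 degrees of freedom (30 teams);
    pass the correct critical value for your own sample size.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # residual standard error -> standard error of the slope
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (n - 2)) / sqrt(sxx)
    return slope, slope - t_crit * se, slope + t_crit * se

# Hypothetical payrolls ($MM) and win totals, for illustration only.
payroll = [40, 55, 60, 70, 75, 80, 90, 95, 100, 110, 120, 140, 160, 200]
wins = [72, 70, 78, 75, 82, 79, 85, 80, 83, 88, 84, 90, 87, 95]
slope, lo, hi = ols_slope_ci(payroll, wins)
print(f"each extra ${1 / slope:.1f}MM of payroll ~ one extra win")
```

The "dollars per win" figures in the post are just the reciprocal of the regression slope (wins per dollar), which is why the print statement inverts it.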

---

Also, these results don't mean that a free-agent win actually costs this much ... other studies have shown the going rate is about \$4.5 million per win. These numbers are higher because they look at team payrolls overall, and there are ways to get wins other than signing free agents. Therefore, the connection between salary and wins is looser overall than it would be if every player were a free agent.

For instance, suppose team A has \$50 MM worth of arbs and slaves (arbitration-eligible and pre-arbitration players) good enough for 80 wins, while team B has \$45 MM worth of arbs and slaves good for only 75 wins.

Team A buys 5 free-agent wins for \$25 MM, bringing it to 85 wins. Team B buys 15 free-agent wins for \$75 MM, bringing it to 90 wins. Overall, team B spent \$45 MM more than team A, but has only 5 more wins to show for it. The regression shows that wins are associated with \$9 MM in spending, when in reality free-agent wins cost only \$5 MM each.
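The arithmetic of that two-team example, worked through:

```python
# The two-team example from the text: the overall dollars-per-win figure comes
# out at $9MM even though each free-agent win cost only $5MM.
team_a_payroll = 50 + 25   # $50MM of arbs/slaves + $25MM of free agents
team_a_wins = 80 + 5
team_b_payroll = 45 + 75   # $45MM of arbs/slaves + $75MM of free agents
team_b_wins = 75 + 15

extra_dollars = team_b_payroll - team_a_payroll   # 45 ($MM)
extra_wins = team_b_wins - team_a_wins            # 5
print(extra_dollars / extra_wins)  # 9.0 -- $9MM per marginal win overall
print(25 / 5, 75 / 15)             # 5.0 5.0 -- but each FA win cost $5MM
```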

There are probably other scenarios that would give you a similar result.

---

The article says that back in 1998, the correlation between payroll and wins was a huge .71. Yup ... I ran the same calculation and got .76 (the difference is probably because I used a different data source than the WSJ). The 1998 list is scary. Of the top 15 teams by payroll, only two finished below .500. And of the bottom 15 teams, only one finished above .500. In 1998, payroll was pretty close to destiny.

But that season may have been an anomaly. The WSJ article has a little graph of the trend, and 1998 was chosen for a mention because it's the highest point on the curve. Still, there's an obvious decline in correlation that takes place around 1999-2000. What might have caused that?

As I've mentioned before, a change in correlation doesn't necessarily mean that there's a real change in the relationship between the variables. If teams suddenly decide to all spend similar amounts, that will cause an apparent drop in correlation even if money is just as important as ever. (For instance, if you drop the seven highest-spending teams from the 2010 regression, as well as the seven lowest-spending teams, the correlation drops from .36 all the way down to .10, but the "dollars per win" value stays roughly the same, moving only from \$9MM to \$12MM.)
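The compression effect is easy to demonstrate with simulated data. In the sketch below (100 simulated team-seasons per "league", not real payrolls), the true payroll-wins relationship is identical in both leagues -- one win per \$9 MM, with the same amount of luck -- but the league with compressed payrolls shows a much weaker correlation:

```python
# Simulated illustration: same true dollars-per-win relationship, same noise,
# but a compressed payroll spread produces a much lower measured correlation.
import random

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
wide = [random.uniform(40, 200) for _ in range(100)]    # big payroll spread ($MM)
narrow = [random.uniform(80, 120) for _ in range(100)]  # compressed payrolls ($MM)
# identical true relationship in both leagues: one win per $9MM, noise SD of 6 wins
wins_wide = [70 + p / 9 + random.gauss(0, 6) for p in wide]
wins_narrow = [70 + p / 9 + random.gauss(0, 6) for p in narrow]
print(round(correlation(wide, wins_wide), 2),
      round(correlation(narrow, wins_narrow), 2))
```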

But there hasn't really been any payroll compression. In 1998, the SD of payroll was 43 percent of the mean. In 2008, 2009, and 2010, it was 44 percent, 38 percent, and 42 percent, respectively.
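That "SD as a percent of the mean" figure is just the coefficient of variation; here's a one-function sketch (the payroll list is a made-up placeholder, not the real league data):

```python
# Coefficient of variation: SD of payroll as a percentage of the mean payroll.
from statistics import mean, pstdev

def cv_pct(payrolls):
    return 100 * pstdev(payrolls) / mean(payrolls)

print(round(cv_pct([50, 100, 150]), 1))  # placeholder payrolls, in $MM
```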

So what else could it be?

Was there a change in the labor agreement around then that somehow created more slaves and arbs? That would do it: the easier it is for poorer teams to keep cheap players, the easier it is for them to compete with a low payroll.

Or, are slaves and arbs better players now than they were then? Joey Votto will earn only \$500,000 this year ... if there are more Vottos than there used to be, scattered around the league, that would weaken the link between payroll and success.

Or, maybe with the crackdown on PEDs, older players are retiring earlier, and so slaves and arbs are getting more playing time? I like this theory, but it doesn't really explain the low correlations during the steroid years of 2000-2003.

Any other ideas?