## Friday, November 03, 2006

In “The Wages of Wins,” the authors determined that the correlation between payroll and wins was about 0.4. I interpreted that to mean that for every dollar a team spends on free agent signings, 40% of that money can be expected to translate into wins. But I was wrong – 40% is way too low.

Suppose 30 boys each have some marbles. They all have different numbers of marbles, collected over a childhood – some they got for their birthday, some were hand-me-downs from their siblings, some they found in the schoolyard. Some of the kids have only 10 marbles, but some have as many as 40.

Now, each boy is given \$2, and let loose in the toy store, where he can buy toy store marbles at 10 cents each. Some won’t buy any toy store marbles, while the ones who really like marbles might blow the entire \$2 and buy twenty of them.

Now suppose some economists run a regression to predict the number of marbles each child has, based on how much he spent at the toy store. There is some correlation, because the kids who bought marbles will, obviously, tend to have more of them after the purchase. But the relationship isn’t perfect, because the original distribution of marbles was pretty much random. So they might get a correlation coefficient of, say, 0.4.

They conclude that such a small correlation means that “money can’t buy marbles.”

But that’s wrong, of course. Money can, with certainty, buy marbles, at the rate of 10 cents each. While the correlation between the money you spend and the marbles you *have* is 0.4, the correlation between the money you spend and the marbles you *buy* is 1.0.

Because some of the marbles arrived from sources other than money, they act as random noise in the regression. They make it appear, at first, that money only buys marbles at a 40% rate, when, really, the rate is 100%.
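To make the marble story concrete, here is a small Python simulation. The ranges (10 to 40 legacy marbles per boy, up to \$2 spent on 10-cent marbles) come from the story. The uniform random draws, the seed, and the helper names are assumptions made for illustration, so the "owned" correlation will not necessarily come out at exactly 0.4.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_marbles(n_boys=30, seed=1):
    """Return (corr(money, marbles bought), corr(money, marbles owned))."""
    rng = random.Random(seed)
    # Legacy marbles (birthdays, hand-me-downs, schoolyard): 10 to 40 each.
    legacy = [rng.randint(10, 40) for _ in range(n_boys)]
    # Money spent, in cents: 0 to 200 in 10-cent steps (0 to 20 marbles).
    spent = [rng.randint(0, 20) * 10 for _ in range(n_boys)]
    # At 10 cents per marble, purchases are exactly proportional to spending.
    bought = [cents // 10 for cents in spent]
    owned = [old + new for old, new in zip(legacy, bought)]
    return pearson(spent, bought), pearson(spent, owned)


r_bought, r_owned = simulate_marbles()
print(f"money vs. marbles bought: {r_bought:.2f}")
print(f"money vs. marbles owned:  {r_owned:.2f}")
```

Whatever the seed, the "bought" correlation is 1.0 (up to floating-point rounding), while the legacy marbles pull the "owned" correlation well below 1.0.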

You probably see where this is going.

The boys are baseball teams. The marbles are wins. The toy store marbles are wins from signing free agents. And the legacy marbles the boys had are players not yet eligible for free agency.

The new argument goes something like this:

Team payroll is highly correlated with player ability only in the case of free agents. For young players, and non-free-agent players, salary is based more on years of experience than on performance – and besides, those salaries are very small compared to the amount of money spent on free agents. For instance, Albert Pujols made only \$700,000 in 2003 not because that’s all he was worth, but because he was only in his second major-league season.

It turns out that the correlation between team payroll and wins is 0.4. But the correlation between non-free-agent payroll and wins is probably close to zero. Therefore, to make the overall correlation between payroll and wins rise to 0.4, the correlation between free-agent payroll and wins must be significantly higher than 0.4. That is:

| Payroll component | Correlation with wins |
|---|---|
| Non-free-agent | 0 |
| Free-agent | x |
| Overall “average” | 0.4 |

The free-agent correlation x must be way higher than 0.4 for this to work. (A naive estimate, treating 0.4 as the simple average of 0 and x, would put x at 0.8, which is probably too high. But it’s got to be higher than 0.4.)
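One way to see why x has to exceed 0.4 is the standard formula for the correlation of a sum with a third variable. The notation is introduced here for illustration: $N$ is non-free-agent payroll, $F$ is free-agent payroll, $T = N + F$ is total payroll, and $W$ is wins. Because covariance is additive, $\operatorname{Cov}(T, W) = \operatorname{Cov}(N, W) + \operatorname{Cov}(F, W)$, which gives

$$
\rho_{T,W} = \frac{\sigma_N\,\rho_{N,W} + \sigma_F\,\rho_{F,W}}{\sigma_T},
\qquad
\sigma_T^2 = \sigma_N^2 + \sigma_F^2 + 2\,\rho_{N,F}\,\sigma_N\,\sigma_F .
$$

Setting $\rho_{N,W} = 0$ and solving for the free-agent correlation:

$$
\rho_{F,W} = \rho_{T,W}\,\frac{\sigma_T}{\sigma_F} \;\ge\; \rho_{T,W} = 0.4
\quad \text{whenever } \rho_{N,F} \ge 0 .
$$

The overall figure is a weighted sum divided by $\sigma_T$, not a simple average, so the naive 0.8 doesn't follow automatically. For instance, if the two payroll components had equal spreads and were uncorrelated with each other (an assumption, not a measured fact), then $\sigma_T = \sqrt{2}\,\sigma_F$ and $\rho_{F,W} = 0.4\sqrt{2} \approx 0.57$.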

So, of course money can buy wins. Not with a correlation of 1.0 like for marbles – marbles, of course, don’t get injured or have off years. But, yes, if you spend a bunch of money on free agents, you’re going to improve your team substantially, more substantially than the 0.4 of the simple regression suggests.


At Friday, November 03, 2006 10:30:00 PM,  Phil Birnbaum said...

Thanks to Taylor Brinkman, who gave me the idea for this in his hockey article when he wrote,

"Given the degree of physicality permitted in hockey ... players have reached or even passed their prime performance years once they reach [free agent] age. As a result, building a team through high-priced free agent acquisitions is a less viable strategy than ... in ... baseball"

It occurred to me that if free agents are less important because of quality, they're also less important because of numbers.

Of note, though, is that the 0.4 correlation is about the same in baseball as it is in hockey.

At Saturday, November 04, 2006 4:56:00 PM,  Anonymous said...

I like this idea. I found that sometimes there is a high correlation between where a team ranks in salaries and where they rank in winning percentage. It is at Salaries and Wins.

At Monday, November 06, 2006 8:02:00 PM,  Ted said...

My reaction to this is that I've always considered, and been trained to consider, 0.4 as being a pretty significant (in the economic sense, not the statistical sense) correlation. Phil's exposition does add to just why one should look at a 0.4 correlation as being strong evidence, though.

At Friday, January 26, 2007 5:10:00 PM,  Anonymous said...

Actually, it's possible that both the non-free-agent (NFA) correlation and the free-agent (FA) correlation are 0, but the overall "average" (OA) correlation is 0.4 (or 0.99 or ...). As an oversimplified example, suppose there's no correlation between non-free-agent payroll and wins: everybody is at \$500,000 per player.
The teams that don't get free agents are all at .400. The teams that do get free agents (at about \$5,000,000 per player) all end up at .600. So there's no correlation between NFA payroll and wins, there's no correlation between FA payroll and wins, but there is a near-1.0 correlation between OA payroll and wins (make the standard deviation of FA payroll small, and the OA-vs.-wins correlation gets closer to 1). Correlations don't combine linearly.

At Friday, January 26, 2007 5:55:00 PM,  Phil Birnbaum said...

John, in your example, there would be a high correlation between FA and wins: the teams with 0 FA are at .400, and the teams with high FA are at .600.

Agreed that there's no correlation between FA and wins *among only those teams who sign FA*, but that's not what's being measured.
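Phil's point can be checked numerically. The sketch below uses a hypothetical six-team league built to match the oversimplified example: three teams with no free agents at .400, and three teams spending roughly \$5M on free agents at about .600. The exact figures are made up so that the correlation among the signing teams comes out to zero.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical league: the first three teams sign no free agents and play
# .400; the last three spend about $5M on free agents and play about .600.
fa_payroll = [0.0, 0.0, 0.0, 4.9, 5.0, 5.1]  # $ millions
win_pct = [0.400, 0.400, 0.400, 0.600, 0.610, 0.600]

r_all = pearson(fa_payroll, win_pct)              # every team in the league
r_signers = pearson(fa_payroll[3:], win_pct[3:])  # only the teams that signed FAs
print(f"all teams:       {r_all:.3f}")
print(f"FA signers only: {r_signers:.3f}")
```

Across all six teams the FA-payroll correlation is close to 1. Restricted to the signing teams, it is essentially zero.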

At Saturday, January 27, 2007 12:11:00 AM,  Anonymous said...

My fault; I should have known better than to oversimplify. Consider the following three teams:

| Winning pct. | Non-free-agent (NFA) payroll | Free-agent (FA) payroll | Total payroll |
|---|---|---|---|
| .400 | \$10M | \$45M | \$55M |
| .500 | \$3M | \$52.05M | \$55.05M |
| .600 | \$10M | \$45.1M | \$55.1M |

The non-free-agent correlation is 0.00.
The free-agent correlation is +0.01.
The overall payroll correlation is +1.00.
If I add more teams, I can make them combine even more weirdly.
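A quick way to double-check those three correlations is to compute them directly. This Python sketch uses only the payroll and winning-percentage figures from the three-team example. The `pearson` helper is written out by hand, not taken from any particular statistics library.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# The three teams from the example ($ millions).
win_pct = [0.400, 0.500, 0.600]
nfa = [10.0, 3.0, 10.0]
fa = [45.0, 52.05, 45.1]
total = [n + f for n, f in zip(nfa, fa)]  # 55.0, 55.05, 55.1

results = {
    label: pearson(payroll, win_pct)
    for label, payroll in [("NFA", nfa), ("FA", fa), ("Total", total)]
}
for label, r in results.items():
    print(f"{label:5s} {r:+.2f}")
```

Rounded to two places, the results reproduce 0.00, +0.01 and +1.00. The components offset each other: the total payroll rises in lockstep with winning percentage even though neither part does.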

At Saturday, January 27, 2007 12:36:00 AM,  Phil Birnbaum said...

Yup, good example, thanks. Could we argue, though, that this state of affairs is unlikely to arise?

I mean "unlikely" in an informal sense, but you could formalize it, I guess: suppose you took the universe of distributions with reasonable constraints on salary amounts that (a) led to a zero correlation among NFAs, and (b) led to an 0.4 correlation overall. Then, find the probability that the correlation among FAs is greater than 0.4. I bet that probability would be large.
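The proposed formalization can be sketched as a Monte Carlo experiment. Everything about the "universe of distributions" here is an assumption chosen for illustration: 30 teams, standardized wins, NFA payroll generated independently of wins, FA payroll with a random built-in correlation to wins, and a random ratio between the NFA and FA payroll spreads. A simulated league is kept only if it satisfies both constraints: an NFA correlation within 0.1 of zero, and an overall correlation within 0.05 of 0.4. The code then reports the share of kept leagues whose FA correlation exceeds 0.4. Payrolls are modeled as deviations from the mean. That does no harm, because correlation ignores shifts and positive rescaling.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def estimate_prob_fa_above(threshold=0.4, n_teams=30, trials=10_000, seed=7):
    """Monte Carlo sketch under hypothetical modeling assumptions.

    Returns (number of leagues kept, share of kept leagues whose
    FA-payroll/wins correlation exceeds `threshold`).
    """
    rng = random.Random(seed)
    kept = above = 0
    for _ in range(trials):
        rho = rng.uniform(0.0, 1.0)    # assumed population corr(FA, wins)
        ratio = rng.uniform(0.5, 2.0)  # assumed sd(NFA) / sd(FA)
        wins = [rng.gauss(0.0, 1.0) for _ in range(n_teams)]
        # NFA payroll: independent of wins by construction.
        nfa = [ratio * rng.gauss(0.0, 1.0) for _ in range(n_teams)]
        # FA payroll: unit variance, population correlation rho with wins.
        fa = [rho * w + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for w in wins]
        total = [n + f for n, f in zip(nfa, fa)]
        if abs(pearson(nfa, wins)) > 0.1:
            continue  # constraint (a): NFA correlation near zero
        if abs(pearson(total, wins) - threshold) > 0.05:
            continue  # constraint (b): overall correlation near 0.4
        kept += 1
        if pearson(fa, wins) > threshold:
            above += 1
    return kept, (above / kept if kept else float("nan"))


kept, prob = estimate_prob_fa_above()
print(f"leagues kept: {kept}, P(FA correlation > 0.4) = {prob:.2f}")
```

Under these particular assumptions, the estimated probability comes out well above one half. The exact number depends entirely on the assumed ranges, and different constraints could give a different answer.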

At Saturday, January 27, 2007 12:24:00 PM,  Anonymous said...

As a first approximation, I treated salary for players with fewer than 7 years (or who did not play) as the NFA group, and salary for players with more than 6 years as the FA group.
Using the 22 seasons in the Lahman database,
the average correlation with winning percentage was +0.14 (st. dev. .19) for NFA payroll, +0.28 (st. dev. .19) for FA payroll, and +0.35 (st. dev. .19) for total payroll.
Over those 22 seasons, the highest of the three correlations came from
NFA 6 times, FA 5 times, and Total 11 times.
There were 6 seasons in which the NFA correlation with W-L record was negative; in 5 of them, the FA correlation was higher than the total. The six (NFA, FA, Total) triples were:
(-.05, .26, .22) (-.14, .00, -.06) (-.08, .26, .21)
(-.10, .36, .33) (-.09, .34, .33) (-.04, .41, .44)
This isn't exactly a free-agent/non-free-agent split, but it is a reasonable set of constraints. It backs up your assertion that the FA correlation will likely be larger than the total correlation, but it argues against your assertion that the difference should be large. Also, there seemed to be a small positive correlation between wins and young players' salaries. Using the Retrosheet transactions database would probably allow a better study. In summary, statistics can be non-intuitive, and it's not unusual for the correlation between wins and the sum to be larger than the individual correlations.
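The split described above can be sketched for a single season as follows. The input layout is hypothetical: (team, salary, years) tuples, not the actual Lahman schema. How "years" is counted is left to the caller. A player who did not play can be entered with years=0 so that he lands in the NFA group. To follow the approach above, you would run it once per season and then average the results.

```python
import statistics
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def split_payroll_correlations(players, win_pct, fa_years=7):
    """Correlate NFA, FA and total team payroll with winning percentage.

    players  -- (team, salary, years) tuples for one season (hypothetical layout).
    win_pct  -- dict of team -> winning percentage for the same season.
    fa_years -- players with fewer than this many years count as NFA.
    """
    nfa = defaultdict(float)
    fa = defaultdict(float)
    for team, salary, years in players:
        if years < fa_years:
            nfa[team] += salary
        else:
            fa[team] += salary
    teams = sorted(win_pct)
    wins = [win_pct[t] for t in teams]
    nfa_pay = [nfa[t] for t in teams]
    fa_pay = [fa[t] for t in teams]
    total_pay = [n + f for n, f in zip(nfa_pay, fa_pay)]
    return {
        "NFA": pearson(nfa_pay, wins),
        "FA": pearson(fa_pay, wins),
        "Total": pearson(total_pay, wins),
    }


# Toy input: the three-team example from earlier in the thread, with each
# team's NFA and FA payroll lumped into a single player ($ millions).
players = [
    ("A", 10.0, 2), ("A", 45.0, 9),
    ("B", 3.0, 2), ("B", 52.05, 9),
    ("C", 10.0, 2), ("C", 45.1, 9),
]
example = split_payroll_correlations(players, {"A": 0.400, "B": 0.500, "C": 0.600})
print({k: round(v, 2) for k, v in example.items()})
```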

At Saturday, January 27, 2007 1:36:00 PM,  Phil Birnbaum said...

Hi, John,

Excellent, thanks for all the effort.

It seems like the overall correlation is (very, very roughly) the sum of the FA and NFA correlations, rather than the average (which I assumed).

Which puts a whole new slant on my assertion that money really does buy wins ... your analysis suggests that money buys wins LESS than you'd assume from the raw numbers, where I assumed more. That's assuming "money buying wins" refers to free agents only.

However, it's not that much less -- overall, .28 for FA instead of .35 for total payroll. So, in the baseball case, where "The Wages of Wins" found a correlation of .4, we might guess (roughly) that the free-agent correlation is maybe .35.

Does that sound right to you?

At Saturday, January 27, 2007 2:20:00 PM,  Anonymous said...

Roughly. We really ought to come up with a better way to define the FA/NFA split and run the numbers again. Then we'd have a bit more than our educated guess.

At Saturday, January 27, 2007 2:29:00 PM,  Phil Birnbaum said...

Another thing too is that the correlation between salary and wins is not necessarily the best way to answer the question of whether "money can buy wins".

What most people mean by money buying wins is spending on free agents at the margin. Suppose every team spends roughly the same amount on free agents of roughly the same quality, but some players outperform others purely by luck. The correlation between payroll and wins will be close to zero -- but any team that *doesn't* spend the money on free agents will lose a lot of games.

In that case, (a) marginal money definitely buys marginal wins, but (b) total payroll is not strongly correlated to wins.

I much prefer old-fashioned methods. If we observe that player A costs \$10 million, and creates two wins above replacement, we can conclude that wins cost about \$5MM each.
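The "old-fashioned" calculation is just salary divided by wins above replacement. Here is a trivial Python sketch; the function name is hypothetical.

```python
def cost_per_win(salary, wins_above_replacement):
    """Dollars paid per win above replacement (a simple ratio)."""
    if wins_above_replacement <= 0:
        raise ValueError("player must be above replacement level")
    return salary / wins_above_replacement


print(cost_per_win(10_000_000, 2))  # 5000000.0
```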

To me, the approach "The Wages of Wins" took is far less reliable than using sabermetric measures. But the line of investigation here is arguing about how to interpret the results of the TWOW study, not about how appropriate or meaningful the study is.