Sabermetric Research: r-squared abuse

Tuesday, October 30, 2007

r-squared abuse

I bet you think a Ferrari is expensive. You're wrong. I did a statistical study, and I found out that 99.999% of Bill Gates' wealth is explained by factors other than the cost of a Ferrari.

That leaves only 0.001%, not even a thousandth of one percent. That's a very small number. Clearly, buying a Ferrari doesn't affect your wealth!

Not falling for it? You shouldn’t. And neither should you fall for this Andrew Zimbalist quote (as discussed in a recent Tangotiger blog post):

"If you do a statistical analysis [of] the relationship between team payroll and team win percentage, you find that 75 to 80 percent of win percentage is determined by things other than payroll," says Andrew Zimbalist, a noted sports economist ...

This figure seems to be quoted by economists everywhere, in the service of somehow proving that "you can't buy wins" by spending money on free agents. But there are a few problems. First, it's misinterpreted. Second, it doesn't measure anything real. And, third, as Tango points out, you can make the number come out to be wildly different, depending on your sample size.

------

(For purposes of this discussion, I'm going to turn Zimbalist's statement around. If 75% is *not* determined by payroll, it implies that 25% *is* determined by payroll. I'm going to use that 25% here, since that's how the result is usually presented.)

First, and least important, Zimbalist misstates what the number means. It's actually the r-squared of a regression between season wins and season payroll (for the record, using 2007 data for all 30 teams, I got .2456 – almost exactly 25%). That r-squared , technically speaking, is the percentage of *the total variance* that can be explained by taking payroll into account. Zimbalist simply says it's "win percentage," instead of *the variance* of win percentage. But the way he phrases it, readers would infer that, if you look at the Red Sox at 96-66, payroll somehow accounts for 25% of that.

What could that mean? Does it mean that payroll was responsible for exactly 24 wins and 16.5 losses? Does it mean that without paying their players, the Red Sox would have been .444 instead of .593? Of course not. Phrased the way Zimbalist did, the statement simply makes no sense.

But that's nitpicking. Zimbalist really didn't mean it that way.

What he *did* mean is that payroll accounts for 25% of the total variance of wins among the 30 MLB teams. But there's no plain English interpretation of what that number means. The only way to explain it is mathematically. Here's the explanation:

----

Look at all 30 teams. Guess at how each one was supposed to do in 2007. With no additional information, you have to project them all at 81-81.

Now, take the difference between the actual number of wins and 81. Square that number. (Why square it? Because that's just the way variance works). So, for instance, for the Red Sox, you figure that 96 minus 81 is 15. Fifteen squared is ~~256~~ 225.

Repeat this for the other 29 teams – Arizona is 81, Milwaukee is 4, and so on. Add all those numbers up. I did, and I got 2,488. That's the total variance for the league.

Now, get a sheet of graph paper. Put payroll on the X axis, and wins on the Y axis. Place 30 points on the graph, one for each team. Now, figure out how to draw a straight line that comes as close as possible to the 30 points. By "as close as possible," we mean the line – there's only one – that minimizes the sum of the squares of each of the 30 vertical distances from the line to each point. (Fortunately, there's an algorithm to figure this out for you automatically, so you don’t have to test an infinity of possible lines and square an infinity of vertical distances.)

That line is now your new guess at how each team would do, adjusted for payroll. For instance, the Red Sox, with a payroll of $143 million, would come out to 89-73. Arizona, with a payroll of $52 million, comes out at 77-85.

Now, repeat all the squaring, this time using the projections. The Red Sox won 96 games, not 89, so the difference is 7. Square that to get 49. The Diamondbacks won 90, not 77, so the difference is 13. Square that to get 169. Repeat for the other 28 teams. I think you should get something around 1,877.

Originally, we had 2,488. After the payroll line, we have 1,877. The reduction is around 25%.

Therefore, payroll explains 25% of the variance in winning percentage.

----

This entire process, taken together, is linear regression. And it's all correct – mathematically, at least. But the statement, "payroll explains 25% of the variance in winning percentage," has meaning only within the definitions of linear regression. It means very little in terms of baseball. Indeed, the 25% is not a statement about baseball at all, or even a statement about payroll, or about wins. It is a statement about what happens when you start accounting for *squares of differences in wins*. I think of it, not completely correctly, as a statement about "square wins." And who cares about square wins?

Most statements that involve a number like 25% have a coherent meaning – when you hear those statements, you can use them to draw conclusions. For instance, suppose I tell you "toilet paper is 25% off today." From that, you can calculate:

-- the same amount of money that bought 3 rolls yesterday will buy 4 rolls today
-- if it cost $10.00 per jumbo pack yesterday, it's $7.50 today
-- when the sale is over, the price will increase by 33%
-- if I use eight rolls a week, which normally costs $4.00, and I buy an 8-week supply today, I will save $8.00, which works out to 12.5 cents per roll.

That's how you know you got useful information – when you can make predictions and calculations based on the figure.

But suppose I now tell you, "payroll explains 25% of the variance of winning percentage." What can you tell me? Nothing! Even if you're very familiar with regression analysis, I challenge you to write an English sentence about wins and payroll – one that doesn't include any stats words like "variance" – that uses the number 25%, but refers to wins and payroll. I think it can't be done, at least not without taking the square root of 25%.

You can't even tell me, from this analysis, if payroll is an important factor in wins or not. If we can't make any statement about what the 25% means, how do we know whether it's important or not? Our intuition suggests that payroll must be not very important, because the percentage is "only" 25. But that's not true, as Tango said. I'll get back to that in a bit.

-----

The thing is that the regression analysis DID tell us lots of useful information about payroll and wins. The 25% figure is actually one of the least important things we get out of it, and it's strange that economists would emphasize it so much.

The most important thing we get is the regression equation itself. That actually answers our most urgent question about payroll – how many extra wins do high-spending teams get? The answer is in this equation (which I've rounded off to keep things simple):

Wins = 70 + (payroll / $7.4 million)

This gives us an exact answer:

-- in 2007, on average, every $7.4 million teams spent on salaries added an extra win above 70-92.

Did payroll buy wins? Yes, it did – at $7.4 million per win. Can't get much more straightforward than that. If you take that number, then figure out how many wins the free-spending teams bought ... well, work it out. The Yankees spent $190 million, which should have made them 96-66. The Red Sox, as we said, should have been 89-73. The Devil Rays, at only $24 million, should have been 73-89.

And that's the answer, based on the regression, on how payroll bought wins in 2007. Based on that, you may think that means payroll is important. Or you may think payroll is unimportant. But your answer should be based on $7.4 million, not "25%" of some mathematical construction.

(By the way, I think $7.4 million is unusually high – there were few extreme records in 2007, which made wins look more expensive last year than in 2006. But that's not important right now.)

------

Which brings us back to Tango's assertion, that the 25% can be almost anything depending on sample size.

Why is that true? Because the 25% means 25% of *total variance*. And total variance depends on the sample size.

Suppose there are only two things that influence the number of wins – payroll, and luck. Since payroll is 25% of the total, luck must be the other 75%. (This is one of the reasons statisticians like to use r-squared even though it's not that valuable – it's additive, and all the explanations will add up to 100%. That's very convenient.)

Let's suppose that over a single season, payroll accounts for 100 "units" of variance, and luck accounts for 300 "units":

100 units – payroll
300 units – luck

Payroll is 25% of the total.

But now, instead of basing this on 30 teams over one season, what if we based it on the same 30 teams, but on their average payroll and wins over two consecutive seasons?

Over two seasons, payroll should cause variance in winning percentage exactly the same way as over one season. But luck will have *less* variance. The more games you have, the more chance luck will even out. Mathematically, twice the sample size means half the variance of the mean. And so, if you take two seasons, you get

100 units – payroll
150 units – luck

And now payroll is 40% of the variance, not 25%.

If you go three seasons, you get 100 units payroll, 100 units luck. And now payroll is 50%. Go ten seasons, and payroll is 77% of the variance. The more seasons, the higher the r and the r-squared. That's why Tango says

"If [games played] approach infinity, r approaches 1. If GP approached 0, r approaches 0. You see, the correlation tells you NOTHING, absolutely NOTHING, unless you know the sample size."

(Update clarification: this happens ONLY if payroll and luck are the only factors, and payroll reflects actual talent. In real life, there are many other factors, and it's impossible to assess talent perfectly -- so while the r-squared between payroll and talent will increase with sample size, it won't come anywhere near 1.)

And it goes the other way, too – the higher the variance due to luck, the lower the percentage that's payroll. Suppose instead of winning percentage over a *season*, you use winning percentage over a *game*. I did that – I ran a regression using the 4,860 games of the 2007 season, where each winning percentage was (obviously) zero or 1.000.

Now the r-squared was .003. Instead of payroll explaining 25% of total variance, it explained only 0.3%!

Payroll explains 25% of winning percentage variance over a season
Payroll explains 0.3% of winning percentage variance over a game.

But the importance of payroll to winning has not changed. Payroll has the same effect on winning a single game as it does on a series of 162 games. But, on a single-game basis, the variance due to luck increases by a huge factor, and that dwarfs the variance due to payroll.

It's like the Ferrari. It explains only 0.001 percent of Bill Gates' net worth. But it explains 1% of a more typical CEO's net worth, 100% of a middle class family's net worth, and 1,000,000% of a beggar's net worth. The important thing is not what percentage of anything the Ferrari explains, but how much the damn thing costs in the first place!

And for payroll, what's important is not what percentage of wins are explained, but how much the win costs.

As Tango notes, the r-squared figure all depends on the denominator. Just like the Ferrari, you can wind up with a big number, or a small number. Both are correct, and neither is correct. Just seizing on the 25% figure, and noting that it's a small number ... that's not helpful.

A pinch of salt has more than 1,000,000,000,000,000,000,000,000 molecules, but weighs less than 0.000001 tons. And payroll explains 25% of season wins, and 0.3% of single game wins. As Tango notes, it's all in the unit of measure. The r-squared, without its dimension, is just a number. And it's almost meaningless.

The r-squared is meaningless, but the regression equation is not. Indeed, the equation does NOT depend on the sample size at all! When I ran the single game regression, with the 4,860 datapoints, I got an r-squared of 0.03, but this equation:

Wins per game = 0.43 + (payroll / $1.2 billion)

Multiplied by 162, that gives

Wins per season = 70 + (payroll / $7.4 million)

Which is EXACTLY THE SAME regression equation as when I did the season.

------

And so:

-- the r-squared changes drastically depending on the total variance, which depends on the sample size, but the regression equation does not;

-- the r-squared doesn't answer any real baseball question, but the regression equation gives you an answer to the exact question you're asking.

That is: if you do it right, and use the regression equation, you get the same answer regardless of what sample you use. If you blindly quote the r-squared, you get wildly different answers depending on what sample you use.

So why are all these economists just spouting r-squared? Do they really not get it? Or is it me?

Labels: baseball, payroll, statistics

23 Comments:

At Tuesday, October 30, 2007 8:27:00 PM, Brian Burke said...: Here is how I try to get the same point across:

If r-squared = 0.25, then r = 0.50. This means that for every standard deviation above the mean in relative payroll, a team can expect to have 0.5 standard deviations more wins than average.

This is, for lack of a better word, HUGE.

In effect, payroll accounts for half the difference in winning percentage among teams.

If you follow through on the regression and calculate how many wins a team should expect to see based on its payroll alone, you'll see that the poorer teams are at a very severe disadvantage.

For example, the Yankees' payroll is 3.5 SD above average this year. The SD of wins in MLB is about 9.3 wins. So the Yankees should expect to be 3.5 * 0.5 * 9.3 = 16.2 games above .500. That's 98 wins!
At Tuesday, October 30, 2007 8:36:00 PM, Phil Birnbaum said...: Agreed. I said something similar here and here, I think.

I agree that the r is more informative than the r-squared. But if there's a specific question you're trying to answer, why not just use the regression equation? At $X million a win, or whatever it comes out to, you can get the same 98 wins without having to explain the function of r.
At Tuesday, October 30, 2007 9:52:00 PM, Anonymous said...: Phil: I wonder if there's a way to use the regression coefficient to quantify the impact of payroll, rather than just treating it as a subjective judgement. You find that teams spent $7.4M per win this year. We could also estimate the average cost of wins, in terms of total wins above replacement divided by total MLB payroll. Let's say that teams spend $3.7M per win on average. In that scenario, teams are spending at 50% efficiency -- buying half as many wins with their dollars as they would with perfect knowledge of player value.

It's hard to know if 50% is high or low, without comparing this over time, and to other sports. But it might be a useful metric to develop.
At Tuesday, October 30, 2007 9:59:00 PM, Brian Burke said...: But this all assumes the relationship between payroll and wins is linear. Going back to my plebe year Econ 101 I vaguely recall something called diminishing returns. I imagine an extra $30 million spent by the Royals would have a bigger impact on wins than an extra $30 mil spent by the Red Sox.
At Tuesday, October 30, 2007 10:14:00 PM, Phil Birnbaum said...: Guy: Hmmm, let me think about that. It seems to me that if wins cost, on average, $7.4 MM, then sometimes they cost more, and sometimes they cost less, but, on average, they have to cost $7.4, and not $3.7.

And arbs and slaves complicate things. So I'd have to think about that.

Besides, I've always thought the best way to figure out how much wins cost is just to look at free agent players, figure out their WAR, and see how much they got paid. Much more precise than doing regressions and trying to deal with non-free-agents and such.

Economists won't do that. Why not? Perhaps they can't publish articles using sabermetric measures of value? Perhaps the math is too simple? I don't know the answer to that.
At Tuesday, October 30, 2007 10:15:00 PM, Phil Birnbaum said...: Brian: yes, it assumes the relationship is linear. It obviously isn't, but my point was about interpreting the results more than it was about actually figuring out the cost of a win.

My preference is still to just figure out value above replacement and divide by salary.
At Tuesday, October 30, 2007 10:57:00 PM, Pizza Cutter said...: When I explain r-squared, I usually talk about a "recipe." 25% of the recipe for winning percentage is payroll.

What's been missing from this whole discussion is the idea of measurement reliability. Is 162 games a reliable sample with which to assess a team's true talent level? Probably not. If seasons were (Avagadro's number) games long, then we would have a really good idea of how good a team is. It's key because I think people are misunderstanding two different concepts.

The correlation between a variable and _itself_ measured twice will approach 1.0. The reason is that if you let a team play a billion games, you'd know exactly how good they are. If you let them play a billion more, you'd know exactly how good they are and it would be the same number. Everything correlates with itself at 1.0 It's just a matter of how many observations you need to get a reliable measurement of whatever you're interested in.

However, if you are correlating two _different_ variables, such as payroll and wins, there is some "true" value of that correlation. For example, if you let a million (rally) monkeys run a million baseball teams, we'd eventually know exactly how well payroll and winning % correlate. The drive toward the actual value over millions of observations is called the Central Limit Theorem.
At Wednesday, October 31, 2007 12:00:00 AM, Anonymous said...: Phil: I don't think it's inevitable that the regression will yield the same $/win relationship as you'll get by dividing total payroll by wins above replacement. You could have gotten a result like W = 50 + Payroll/3, which would indicate very efficient spending. Or, suppose the top 15 payroll teams won 82 games on average this year, and bottom 15 won 80. Then your regression might show the cost of a win as $12M. In that scenario, teams would be spending their money very inefficiently.

Now, such an extreme scenario presumably would not be sustainable: if teams found they essentially couldn't predict player performance, they'd just take the cheapest players and salaries would plunge. But surely there is a range of plausible efficiencies -- corresponding to varying win-payroll correlation levels -- short of perfect talent evaluation and performance projection.
At Wednesday, October 31, 2007 12:37:00 AM, Phil Birnbaum said...: Guy: I don't think you'd get the same result as the regression simply because the regression includes arbs and slaves, while the VORP is free agents only. That's why, in my opinion, the VORP method is better -- the question, "how much does a win cost," makes sense only from the standpoint of signing free agents.

Suppose you look at all the full-time free agents signed last year. You tried to figure out what their expected VORP would be (by using Marcel, or PECOTA, or whatever), and figured out $ per win. In general, you might find it worked out to $4 million to $6 million, or something. Maybe some players were $3MM, and some were $5MM, and some were $7MM, but a typical figure was around $5MM.

Wouldn't that be a reliable way to conclude that wins cost $5 million each? A lot better than running a regression and dealing with arbs and slaves and binomial deviations between expected and actual wins.
At Wednesday, October 31, 2007 6:37:00 AM, Anonymous said...: Phil: I wasn't excluding non-FAs in calculating total wins over replacement. But let's say we did and found that teams had an average of 10 FA wins at a cost of $7M per win (making those #s up). That still leaves the question of how well teams are spending their FA budgets. If r for FA$ and FA-VORP was .01, then your regression will show that each FA-win costs something like $50M. If the r is .9, then the regression coefficient will be much closer to $7M per win (the highest possible efficiency). What I'm suggesting is that the ratio of the actual cost and theoretical cost of buying wins may be a way to measure how well teams are spending their money.

Think of it this way: suppose that A-Rod signs for $1.2M next year while David Eckstein signs a $30M contract. Wouldn't that change the coefficient, making wins more expensive? (Or if not, then of what use is the regression?)
At Wednesday, October 31, 2007 9:30:00 AM, Phil Birnbaum said...: Guy: OK, I see what you're saying. If you plot FA$ vs. FA VORP and do a regression, the r will tell you how accurately teams are evaluating players when spending their money.

For instance, if Eckstein makes more than A-Rod, the r will be negative, showing that teams are dumb.

That makes sense. Is that what you meant?
At Wednesday, October 31, 2007 9:51:00 AM, Anonymous said...: Yes. I supposed it's really the same thing as calculating r for FA-$ and FA-VORP. But I think it's a bit more intuitive for most people to say that teams are spending $X to buy a win when they could be spending $X/2 (or, "teams are buying half as many wins as they would with perfect knowledge").

I suppose you could also do an evaluation that extracts random variation, such as regressing FA's Marcel projection on their salary. That would probably be a still better tool for evaluating teams' decision-making.
At Wednesday, October 31, 2007 9:58:00 AM, Tangotiger said...: Pizza, even that's not enough. You still haven't explained how much of the recipe is due entirely to the fact that MLB has only 162 games (i.e., luck).

Stating that thigns other than payroll accounts for 75% doesn't tell us anything, since all of that 75% could have been luck. You must explain how much of the variance is due solely to luck.

I'd rather you tell me what percentage of the recipe payroll is, relative to the ingredients that are available.

Randonly setting your oven at 250F or 600F will severely affect the relationship between how good the food is and the ingredients you put in!
At Wednesday, October 31, 2007 10:00:00 AM, Phil Birnbaum said...: The thing is, because of binomial variation, you can NEVER have perfect knowledge. If Eckstein is supposed to hit .260, and you sign him and just by bad luck he hits .240 (which is only 10 hits off), it would be wrong to say you bought fewer wins with your $ than you could have.

It's like saying that with perfect knowledge, you could win every sports bet and bankrupt the casino.

If wins are worth $5MM each, then some Ecksteins will (in retrospect) cost $15MM, and some will have cost $1MM, just by binomial luck. When you sign a player, you're not buying wins, you're buying a *distribution* of wins.

Over many players, the luck should even out, and the regression equation will give you the correct value. But the r (or r-squared) will represent several things, of which two are: team judgement and luck.

You could probably do a simulation. Take every full-time player signed as a free-agent last year. Make a guess as to their real talent level (it doesn't matter if you're off a bit). Assume that teams paid EXACTLY what that talent level was worth, say $5MM per win. If you regress pay on talent, you'll get an r of 1.0 and an equation that says $5 million.

Now, simulate their seasons. Rerun the test, but regress their salary on their simulated season instead of their talent. Now, r will no longer be 1.0. The difference is caused by luck, a factor that is completely unpredictable.
At Wednesday, October 31, 2007 10:32:00 AM, Anonymous said...: Phil: Agreed, you can never get an r of 1, even if teams are pricing players perfectly. But it is also true that the actual r can be (and certainly is) lower than the highest-possible r that binomial variation permits. If we took Marcels for FAs, adjusted them for position, and added MGL's true talent UZR estimate, I'm sure we would still find that the correlation with salary is not 1. My guess is it's probably pretty high (.7 or .8?), but surely not perfect.
At Wednesday, October 31, 2007 10:38:00 AM, Phil Birnbaum said...: Guy: agreed. The difference would be partially caused by luck in ability, relative to aging. Guys who don't improve between 24-5 and 26, or players who collapse between 33 and 34 instead of getting slightly worse. That kind of thing.

It's like looking at the 10-year returns on stocks. Some stocks do really well, some don't. But the average should be roughly what stock buyers are expecting, and can be used as a proxy for the rate of return investors are willing to accept.

I guess what I'm saying is: what would the value of r actually tell you about how well teams evaluate talent? Given a certain figure -- say, r=0.65 -- how would you be able to reach any conclusion about teams' ability to evaluate talent?

To me, you'd need a few anchor numbers: what's r if you include only binomial luck? What's r if you include binomial luck *and* aging luck? What's r if teams go by what the player did last season? And so on.
At Wednesday, October 31, 2007 11:26:00 AM, Don Coffin said...: Here's another, and maybe useful, way to look at it. Suppose that, in a simple regression of winning percentage on payroll, the R2 is 0.25. But surely that regression is underspecified. So we add in other variables, that need to be essentially uncorrelated with payroll (why? because if they are correlated with payroll, we get multicolinearity--a confounding of the effects of the individual variables). And let's say that the "best" regression--theoretically well-grounded, and so on--has an R2 of 0.50. Then what we can say is that differents in payroll account for half of the differences in winning percentage that we can account for. Or suppose the best we can do is a regression that explains 25% of the variance inw inning percentage--and that all comes from variation in payroll. Then, variation in payroll accounts for 100% of what we can explain. The rest may be random, or "luck," or something we can't observe or can't measure.
At Wednesday, October 31, 2007 11:29:00 AM, Phil Birnbaum said...: Doc: agreed! But someone might argue that even if 50% goes unexplained, that doesn't mean it's not there -- we just haven't figured it out yet.

But you're right, it is indeed useful to estimate that payroll is (say) 50% of the factors *that a team can control*. That's something that Zimbalist and Berri and all the rest don't consider. But it is indeed important to the practical question of how to build a winning team.
At Wednesday, October 31, 2007 11:37:00 AM, Anonymous said...: "But someone might argue that even if 50% goes unexplained, that doesn't mean it's not there -- we just haven't figured it out yet."

But we know how much can't be explained. In the case of team win% over one year, it's .039^2/SD-win%^2, or about 30% these days. So if payroll explains 25%, then you've explained 36% of what's explainable.
At Wednesday, October 31, 2007 11:43:00 AM, Phil Birnbaum said...: Guy, absolutely! Teams will be interested in what they can control, when they're looking for a set of options.

BTW, I did some math on that question here.

One nitpick ... I would argue that you mean "can't be controlled," not "can't be explained." Luck does count as an explanation, and a strong one. It's just not controllable.
At Wednesday, October 31, 2007 11:57:00 AM, Tangotiger said...: Right, once you account for the luck due to the number of trials, that will explain 30% of the observed variance.

The remaining 70% of the variance is entirely and completely due to the variance of the talent. (i.e., wins = talent + luck)

The question is how efficient is payroll tied to that 70%. The wider the disparity in how teams dabble in free agency, the less payroll will explain talent. And the better the drafting (slave wages), the less payroll will explain talent.

The correlation is not payroll to wins, but payroll to talent.
At Wednesday, October 31, 2007 12:31:00 PM, Phil Birnbaum said...: >The correlation is not payroll to wins, but payroll to talent.

Well, they're separate questions, and both important. The first asks how well payroll predicts final standings. The second asks how well teams do with their spending.

All depends what you want to know.
At Wednesday, October 31, 2007 2:47:00 PM, Tangotiger said...: Well, it seems that the largest single determinent to the standings is luck, as that explains 30% of the variance (when you play 162 games).

But, I would think that that's not the question people are really asking.

Sabermetric Research

Tuesday, October 30, 2007

r-squared abuse

23 Comments:

About Me

Previous Posts