"The Wages of Wins" on r and r-squared
In "The Wages of Wins," the authors regressed wins against salary, found an r-squared of .18, and concluded that, because .18 is low, there is a very weak relationship between payroll and performance, and therefore "teams can’t buy the fan’s love."
I have posted a few times (like here) disagreeing, and arguing that the “r” gives you more useful information than the r-squared. In that regression, the r is about .42, which is high enough to be significant in a baseball sense.
Author Stacey L. Brook disagrees. In a post on the authors’ blog today, he writes,
"Recently, some individuals who claim to have knowledge about statistics have questioned [our] conclusion. Specifically … these individuals have suggested that using the correlation coefficient – otherwise known as r – is a more “real-life” statistic to use in looking at how payroll and wins are related in Major League Baseball. As you can guess, we disagree."
(By the way, I'm not sure Dr. Brook is addressing his post to my argument specifically. For all I know, it's someone else entirely. There’s no link in his post, and his use of the plural, "some individuals," suggests it's more than just one person.)
Why do Dr. Brook and his colleagues disagree? They say that using r "exaggerates the relationship" between the two variables. They quote a professor who agrees. They say that r-squared says that "18% of the variance" in performance is explained by salary, and the percentage of variance is the appropriate measurement to consider.
The last claim sounds reasonable, until you realize that "variance" is not being used in the normal English sense of the word. It's a technical, statistical term meaning the square of the standard deviation.
Variances are unintuitive. If the standard deviation of weight is 30 pounds, the variance of weight is 900 square pounds. If the standard deviation of professors' salaries is $10,000, the variance of professors' salaries is 100 million square dollars. And if the standard deviation of team wins is 11.6 wins, the variance of team wins is about 135 square wins.
The 18% makes sense only in the context of the squares of what you're actually trying to measure. If salary explains 42% of performance, then salary explains 18% of performance squared. But we sabermetricians don't care about performance squared; we care about performance. And that's why the .42 is more meaningful than the .18.
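For anyone who wants to check the arithmetic, here is a minimal Python sketch. The variable names are mine, the .18 and 11.6 come from the text above, and taking the positive square root assumes the payroll/wins relationship is positive.

```python
import math

r_squared = 0.18   # r-squared reported for the wins-on-salary regression
sd_wins = 11.6     # standard deviation of team wins, in wins

# r is the square root of r-squared (positive root assumed, since
# more payroll goes with more wins).
r = math.sqrt(r_squared)

# The variance is the SD squared, so its units are "square wins".
var_wins = sd_wins ** 2

print(f"r = {r:.2f}")
print(f"variance of wins = {var_wins:.1f} square wins")
```

The point of the sketch is only that r lives in the same units as the thing being measured, while r-squared and the variance live in squared units.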
In the early 20th century, economist Alfred Marshall famously explained how to study economics:
"(1) Use mathematics as shorthand language, rather than as an engine of inquiry. (2) Keep to them till you have done. (3) Translate into English. (4) Then illustrate by examples that are important in real life. (5) Burn the mathematics."
If we concentrate on the r = 0.42, we can follow Marshall's advice. Translating into English, and using an example that's important in real life, we can say,
"If a team spends one extra standard deviation (in 2006, about $25 million) in salaries, it should expect to improve its performance by 0.42 standard deviations of wins (about 4.5 wins)."
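That translation is just the regression slope in disguise: in a simple linear regression, slope = r × (SD of wins) / (SD of salary). A rough sketch, with a hypothetical function name and default values lifted from this post (the 11.6-win SD is the figure quoted earlier and may not be the exact 2006 value, so the result only roughly matches the win figure in the quote):

```python
def expected_win_gain(extra_salary, r=0.42, sd_salary=25e6, sd_wins=11.6):
    """Expected change in wins for a given change in payroll.

    Uses the simple-regression identity slope = r * sd_wins / sd_salary.
    All default values are assumptions taken from the post.
    """
    slope = r * sd_wins / sd_salary   # wins per dollar of payroll
    return slope * extra_salary

# One extra standard deviation of payroll (about $25 million):
print(round(expected_win_gain(25e6), 1))

# Half a standard deviation of payroll:
print(round(expected_win_gain(12.5e6), 1))
```

Note that r-squared never appears: the "wins per dollar" answer depends on r.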
If you want to burn even more of the mathematics, and you make a few additional assumptions (for instance, that wins and salary are both normally distributed), then I think you can even say,
"If a team becomes the Nth highest-spending team in the league, it will, on average, be 42% as many wins above or below .500 as the Nth winningest team in the league."
That last sentence follows Marshall's prescription; it has no math, it's significant in real life, it's in English, and it's understandable to any GM, whether he knows statistics or not.
And it's based on the r.
Labels: statistics
13 Comments:
Give her a break, she's an economist and they know nothing about math. [I've taken enough economics courses from economics PhDs to know this.]
I remember a prof I had a few semesters back who made it very clear that r-squared (and likewise r) is problem dependent: in social science research an r-squared of 10% might be amazing, but it would be totally unacceptable in a biological science project or in precision instrument testing.
Another thing you need to look at is what % of "real variability" it explains. For example, I estimate 162 games to have a random std-dev of 6.3 games, which suggests actual non-random variability of 9.7 games (of course a regression can "accidentally" correlate to randomness, so it's not perfect), but all of a sudden the 6.3 games/9.7 looks pretty darn good!
Out of interest, I figured I could do the same with the NFL (r2 = 5% => r = 0.22). If Actual = 3.34 and Random = 2, then 2.67 needs explanation. With r = 0.22 we have explained about 0.7 games (0.22 x 3.34), so we have 28% of the unexpected standard deviation explained (much less than baseball's 65%).
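The "random" and "needs explanation" figures in the two paragraphs above can be reproduced with a short Python sketch. It assumes every game is an independent coin flip (p = 0.5) and that luck variance and talent variance simply add; the function names are illustrative only.

```python
import math

def luck_sd(games, p=0.5):
    """SD of wins from chance alone over a season (binomial)."""
    return math.sqrt(games * p * (1 - p))

def non_random_sd(observed_sd, games):
    """SD left over after removing luck: sqrt(observed^2 - luck^2)."""
    return math.sqrt(observed_sd ** 2 - luck_sd(games) ** 2)

# Baseball: 162 games, observed SD of 11.6 wins
print(round(luck_sd(162), 2), round(non_random_sd(11.6, 162), 2))

# NFL: 16 games, observed SD of 3.34 wins
print(round(luck_sd(16), 2), round(non_random_sd(3.34, 16), 2))
```

Under those assumptions the baseball numbers come out at roughly 6.4 and 9.7 wins, and the NFL numbers at 2.0 and about 2.67, in line with the figures quoted.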
A lot of people forget that sports are random events; all things being equal, there will be natural variability. If this natural variability is large then any r-squared will be small (no matter how good the model is)... And binomial events have a lot of error. For example, if you want to be 95% confident a team's score won't differ by more than 5% of their true skill, you would need to play 400 games per season. (Baseball is 8%, Football is 25%)
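A rough check of those numbers, assuming a normal approximation to the binomial, a true .500 team, and reading "5%" as .050 of winning percentage (all of which are my assumptions, as are the function names):

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the normal distribution

def margin_of_error(games, p=0.5):
    """95% margin of error on winning percentage after `games` games."""
    return Z_95 * math.sqrt(p * (1 - p) / games)

def games_needed(margin, p=0.5):
    """Smallest number of games giving a 95% margin of error <= `margin`."""
    return math.ceil((Z_95 / margin) ** 2 * p * (1 - p))

print(games_needed(0.05))                 # games for a 5% margin
print(round(margin_of_error(162), 3))     # baseball season
print(round(margin_of_error(16), 3))      # football season
```

This gives 385 games for a 5% margin (the comment's 400 is a round figure), and margins of about 8% for 162 games and about 25% for 16 games.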
It would be interesting to look further at how well the above calculation explains how efficient the market is at determining talent (Baseball = Good, Football = worse), so long as draft picking is relatively similar.
In a regression the most important thing is the coefficients, not the r-squared; whether a hypothesis test passes or fails should play a larger role than the r-squared.
I am Dan Rosenbaum - an economics professor who has taught econometrics at the undergraduate, MA, and Ph.D. level. I also am an NBA statistical consultant for the Cleveland Cavaliers.
For the most part, I agree with Phil's point here. R-squared measures the fraction of the sum of squared deviations of the dependent variable minus its mean that is explained by the explanatory variable(s).
If we want to know how much "variation" is explained by a given variable (or variables), then R-squared is the appropriate measure. But in this circumstance it is not clear that that is the appropriate question.
We just want to know whether the effect of team payroll on wins is big or small. And for that question it is a lot more intuitive to discuss whether spending an extra $25 million (one standard deviation in team salary) and producing 5 extra wins (0.43 standard deviations in wins) is a big or small effect.
I agree that this is a more appropriate question to ask than how much variation that team payroll explains. This is an appropriate way to ask whether this effect is economically significant. And we can use the regression results to examine whether it is statistically significant.
(I tried to post this at the Wages of Wins site, but they are selective on what posts they allow to be on their site.)
Oh, and related to an earlier post in this series, it is very questionable whether logging salary is appropriate in regressions in sports economics.
Logging salary comes from thousands of labor economics studies that rightfully have recognized that wages (and the underlying productivity that produces the wages) are distributed log normally. The right tail of the distribution is much thicker than the left tail. And it is reasonable to assume that increasing, for example, education by one year is likely to increase wages (and productivity) by a certain percentage rather than a particular dollars per hour amount.
But logging salary is much more questionable in professional sports salary equations. It basically assumes, for example, that a given increase in OPS of 0.100 should increase salary by ten times as much for a $10 million player as it does for a $1 million player.
Given that the 0.100 increase in OPS likely increases team wins the same for both players (assuming that their plate appearances are about equal), then it would be irrational for owners to increase the salary of the $10 million player 10 times as much as the $1 million player for the same increase in OPS.
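To make the "ten times as much" point concrete, here is a small Python sketch of what a log-salary equation of the form ln(salary) = a + b × OPS implies. The coefficient b = 2.0 is purely hypothetical, chosen for illustration and not taken from any study, and the function name is my own.

```python
import math

def raise_under_log_model(base_salary, ops_gain, b):
    """Dollar raise implied by ln(salary) = a + b * OPS.

    A gain in OPS multiplies salary by exp(b * ops_gain), so the
    dollar raise is proportional to the starting salary.
    """
    return base_salary * (math.exp(b * ops_gain) - 1)

b = 2.0  # hypothetical coefficient, for illustration only

small = raise_under_log_model(1_000_000, 0.100, b)    # $1 million player
large = raise_under_log_model(10_000_000, 0.100, b)   # $10 million player

print(round(small), round(large), round(large / small, 1))
```

Whatever value of b is used, the ratio of the two raises is 10, because the log specification works in percentages rather than dollars.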
Logging salary is appropriate in the labor economics field where it originated, but in sports economics it is very unclear that it is appropriate.
Dan,
What you say makes sense, that wages in general should be log-normal. Good to know that. Thanks!
Aside from the merits of r vs r2, I think the Wages of Wins authors miss two important elements of baseball "success." One is the aspect of succeeding over a period of several years -- the relationship btwn payroll and wins becomes stronger if you look at several years of data. Second is that all wins are not equally important: the real issue is reaching the post-season, and there again the link to payroll is very strong.
The authors specifically say they are responding to this assertion by Bob Costas: “The fact is, the single biggest indicator of a team’s opportunity for success from one year to the next is whether the team has a payroll among the top few teams in the league. Period.” Yet they seem to miss his point, since r2 of winning percentage tells us little about a team's ability to have sustained success. I made this post over at the Wages of Wins site:
So let’s look at the Costas statement. The top 6 teams in payroll account for 11 postseason appearances over the past 3 seasons, getting into the playoffs about 2 years out of three on average. The bottom 6 payroll teams account for zero playoff appearances in that period (and the bottom 11 payroll teams have just one playoff spot over 3 seasons). If we define “success from one year to the next” as making the postseason with some frequency, I’d say Costas’ statement stands up extremely well.
All wins are not created equal. We don’t really care if our team wins 69 games or 79 games. What fans care about is whether their team can make a run at the playoffs on a somewhat regular basis, and get there a non-trivial portion of the time. Clearly, a high payroll greatly increases a team’s chances of doing that. Neither r nor r2 really get at the issue very well (although payroll and playoff appearances have a not too shabby r of about .64).
I suppose you could argue that avoiding a humiliating sub-70 win season is also a kind of success. But in fact there is also a very strong correlation between having an extremely low payroll and that kind of sustained futility.
Can mid-range payroll teams succeed? Sure. But there appears to be a threshold of around $60M below which success is nearly impossible. And a payroll of about $90M or more gives a team a far-above average chance of success. If you define “success” correctly, then the notion that payroll is not an important determinant of success in baseball is simply not plausible.
Guy,
Agreed, not plausible at all. The claim that signing a talented but expensive free agent is irrelevant to winning is so extraordinary, and so contrary to common sense, that you'd swear it couldn't be so.
And when you look at the evidence, it turns out that, indeed, it isn't so. And it's not like the evidence is subtle, that the link between payroll and wins is hiding in the shadows somewhere. Almost any way you look at the question, any piece of evidence, will show it's true, including the "r-squared = .18" finding.
To me, it seems like a no-brainer, and I really don't understand where the TWOW guys are coming from on this.
Hi, Beamer,
Do you mean "at the *very least* it isn't unimportant?" Because I would argue that a correlation of 0.43 is quite important, especially in light of the noise caused by non-free-agents (as I argued here).
And as far as the argument about whether or not salary is the "most important" factor:
Suppose you're consulted for advice from a GM who wants to make his team a few wins better. Can you think of any more important strategy than "buy some good free agents?"
That is, if money isn't the most important factor, what is? And what are the other candidates?
Drafting skill? Player evaluation? Trading for young players? Concentrating on OPS?
To be honest, as much as I like to think sabermetrics is important, there's no way it can find a way to improve your team as easily as just signing a couple of superstars.
Can anyone make an argument, even a half-assed argument, that what Bob Costas says is wrong, that some specific other thing than payroll is the most important factor?
To answer my own last question, one suggestion might be: luck. It could be that luck is more important than payroll when it comes to winning pennants.
The only way payroll could not be one of the most important factors, I think, is if the vast majority of players were non-arb eligible (like, if it took 6 yrs for arb, and 10 for FA). Then, good drafting, farm system, etc. would matter more. But that isn't the case.
Otherwise, what the authors have to be arguing is either that 1) teams have almost no ability to identify talent at all, or 2) teams -- at least some teams -- are better at evaluating the potential of HS and college players than they are at evaluating the talent of veteran FAs. Both are absurd claims.
"To me, it seems like a no-brainer, and I really don't understand where the TWOW guys are coming from on this."
This is a good example of a conclusion so stupid that only smart people could arrive at it. By that I mean that simple observation of team success, together with rudimentary knowledge of payroll size, would lead any fan to understand that $$ = wins. Only by getting deep inside a complex statistical analysis -- and having excessive faith in its ability to answer the question -- could anyone accept this kind of plainly wrong conclusion.
With the caveat being that I have not yet read the book nor any of the reviews, since a team's wp in any given year or years can be described as a binomial with an N of 162 (for one year), there is obviously a large random variance. THAT is one reason why they got an r^2 of .18. That in and of itself does not tell us how "important" payroll is in terms of team talent, which is the interesting question, not how well payroll explains team wp.
What if payroll described team talent perfectly (r=1) but we only looked at 5 game increments? The r and r^2 would be close to zero I assume.
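That thought experiment can be simulated. The sketch below is only illustrative: it assumes team talent (true winning percentage) is normally distributed around .500 with an SD of .060, that payroll is literally equal to talent, and that games are independent coin flips weighted by talent. All names and parameter values are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_r(games, teams=1000, talent_sd=0.06, seed=1):
    """Correlation between payroll and winning pct over `games` games,
    when payroll describes talent perfectly."""
    rng = random.Random(seed)
    # True talent, clamped so it stays a sensible probability.
    talents = [min(max(rng.gauss(0.5, talent_sd), 0.05), 0.95)
               for _ in range(teams)]
    payrolls = talents  # payroll correlates perfectly (r = 1) with talent
    win_pcts = [sum(rng.random() < p for _ in range(games)) / games
                for p in talents]
    return pearson_r(payrolls, win_pcts)

for games in (5, 162, 810):
    print(games, round(simulate_r(games), 2))
```

Under these assumptions the correlation over 5-game stretches should come out low (theory suggests somewhere around 0.25) though not quite zero, and it climbs steadily as the number of games increases, even though payroll and talent are perfectly correlated throughout.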
What is interesting and important is how payroll describes team talent, NOT team wp. The authors should have explained that. Did they?
Why did they even choose one year wp to be the dependent variable? Why not 2 years or 5 years? If they did (choose more years) the number (r or ^2) would have been higher. So that tells us that the correlation is meaningless unless we qualify it by giving the random variance associated with the sample size chosen for the dependent variable OR there is some magic significance to one year team records.
I guess there is some significance to those one year records in that from a league standpoint it tells us how much "false" parity to expect. IOW, regardless of how much payroll correlates with team talent, it gives us an idea as to how much "extra" variance to expect in one season.
I also think that the debate over r or r^2 is silly. The debate should be what does the .18 or .42 mean in terms of significance and magnitude and at the very least the authors should have explained (and maybe they did - I don't know) this:
This is the variance expected by chance given a sample of 162 games for one of the variables. Given that, this is what we expect of the "r" (or r^2) if team payroll correlated perfectly with team TALENT. Since we came up with an r or r^2 of this, we conclude that team payroll explains this percent of team talent.
IOW, the interesting thing is to correlate team payroll with estimated team talent and NOT team wp, since there is already lots of random variance (lowering the r) in team wp, everyone knows that, and using one year is somewhat of an arbitrary number for the sample size of the dependent variable.
What Tango likes to do when you have these "funny" correlations (regressions), where one of the variables is a binomial (or close to it) of sample size N and the resultant "r" is critically a function of N, is to first give the maximum possible "r" that would result if the correlation between one of the variables and the individual population means of the other variable (not the sample means based on sample size N) were perfect. If you do that, it puts the sample "r" (in this case, .42) in perspective.
For example, someone above suggested that the maximum possible r^2 is around .36, if payroll were perfectly correlated to team talent. I'll take their word for it. That means that an r^2 of .18 is pretty significant IMO even though by itself it is quite low, since it "explains" half of everything that's left over after you remove the variance by chance.
If I am running a team, I certainly want to know what correlates best (and by how much) with my team's talent and NOT their record in one year, as that is going to be misleading as I have explained above.
Hi, MGL,
I got .69 as the maximum r-squared, instead of .36, here ...
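For reference, a figure in that neighborhood falls out of a simple calculation, sketched here in Python. It assumes that luck is binomial with p = 0.5, that the observed SD of wins is the 11.6 used in the post, and that payroll can at best explain all of the non-luck variance; the function name is hypothetical.

```python
def max_r_squared(observed_sd, games, p=0.5):
    """Largest r-squared possible if payroll explained ALL non-luck variance."""
    luck_var = games * p * (1 - p)       # variance of wins due to chance
    return 1 - luck_var / observed_sd ** 2

ceiling = max_r_squared(11.6, 162)
print(round(ceiling, 2))          # ceiling on r-squared
print(round(0.18 / ceiling, 2))   # share of the explainable variance explained
```

With these inputs the ceiling comes out near .70, close to the .69 above, and an r-squared of .18 is then about a quarter of what could possibly be explained.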