Disputing Hakes/Sauer, part I
The renowned 2003 book, "Moneyball," famously suggested that walks were undervalued by baseball executives. Jahn K. Hakes and Raymond D. Sauer wrote a paper studying the issue, in which they concluded that teams immediately adapted to the new information. They claim that, as early as the very next season, teams had adjusted their salary decisions to eliminate the market inefficiency and pay the right price for walks.
Here's the paper (.pdf): "The Moneyball Anomaly and Payroll Efficiency: A Further Investigation."
Hakes and Sauer’s claim seems to have been widely accepted as conventional wisdom, as far as I can tell. A quick Google search shows many uncritical references.
Here's Business Week from 2011. Here's Tyler Cowen and Kevin Grier from the same year. This is J.C. Bradbury from one of his books, and later on the Freakonomics blog. Here's David Berri from 2006 and Berri and Schmidt in their second book (on the authors' earlier, similar paper). Here’s Berri talking about the new paper, just a couple of months ago. Here's more and more and more and more.
I reviewed the study back in 2007, but, on re-reading, I think my criticisms were somewhat vague. So, I thought I’d revisit the subject, and do a bit more work. What I think I found is strong evidence that what the authors found *has nothing to do with Moneyball or salary.* There is no evidence there was an inefficiency, and no evidence that teams changed their behavior.
Read on and see if you agree with me. I’ll start with the intuitive arguments and work up to the hard numbers.
First, the results of the study. Hakes and Sauer ran a regression to predict (the logarithm of) a player’s salary, based on three statistics they call "eye", "bat," and "power". "Eye" is walks per PA. "Bat" is batting average. "Power" is bases per hit.
They predict this year’s salary based on last year’s eye/bat/power, on the reasonable expectation that a player’s pay is largely determined by his recent performance. They included a variable for plate appearances, and dummy variables for year, position, and contracting status (free agent/arbitration/neither).
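For concreteness, the specification can be sketched in a few lines. This is a toy version on fabricated data with made-up "true" coefficients (I don't have the authors' dataset), and it omits the year/position/status dummies:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Fabricated player-seasons (the real study used players with 130+ PA).
eye = rng.uniform(0.04, 0.16, n)      # walks per PA
bat = rng.uniform(0.220, 0.320, n)    # batting average
power = rng.uniform(1.2, 2.0, n)      # bases per hit
pa = rng.uniform(130, 700, n)         # plate appearances

# Invent log salaries so the "true" coefficients are known: 2.0 on eye.
log_salary = (13.0 + 2.0 * eye + 4.0 * bat + 0.7 * power
              + 0.002 * pa + rng.normal(0, 0.1, n))

# OLS of log(salary) on the three skill measures plus PA.
X = np.column_stack([np.ones(n), eye, bat, power, pa])
coef, *_ = np.linalg.lstsq(X, log_salary, rcond=None)
# coef[1] is the estimated "eye" coefficient, comparable in scale to
# the yearly coefficients in the table that follows.
```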
Here are the coefficients the authors found:
year eye bat power
1986 0.69 2.26 0.22
1987 1.27 3.87 0.46
1988 0.20 2.76 0.37
1989 1.15 4.04 0.50
1990 1.48 1.75 0.63
1991 1.13 1.20 0.52
1992 0.40 2.76 0.57
1993 0.71 4.42 0.65
1994 0.36 4.78 0.86
1995 2.86 5.33 0.76
1996 0.78 1.85 0.73
1997 1.84 5.80 0.52
1998 2.21 4.23 0.74
1999 2.77 3.81 0.77
2000 2.72 5.30 0.73
2001 0.53 5.28 0.84
2002 1.52 3.64 0.68
2003 2.12 3.07 0.57
2004 5.26 4.14 0.78
2005 4.19 5.38 0.86
2006 2.14 4.66 0.58
Moneyball was published in 2003 … the very next season, the coefficient of "eye" -- walks -- jumped by a very large amount! Hakes and Sauer claim this shows how teams quickly competed away the inefficiency by which players were undercompensated for their walks.
Those 2004/2005 numbers are indeed very high, compared to the other seasons. The next highest "eye" from 1986 on was only 2.86. It does seem, intuitively, that 2004 and 2005 could be teams adjusting their payroll evaluations.
But it’s not, I will argue.
First: it’s too high a jump to happen over one season. At the beginning of the 2004 season, most players will have already been signed to multi-year contracts, with their salaries already determined. You’d think any change in the market would have to show itself more gradually, as contracts expire over the following years and players renegotiate in the newer circumstances.
Using Retrosheet transactions data, I found all players who were signed as free agents from October 1, 2003 to April 1, 2004. Those players wound up accumulating 40,840 plate appearances in the 2004 season. There were 188,539 PA overall, so those new signings represented around 22 percent.
The Retrosheet data doesn’t include players who re-signed with their old team. It also doesn’t include players who signed non-free-agent contracts (arbitration-eligible players and those not yet eligible for arbitration). Also, what’s important for the regression isn’t necessarily plate appearances, but player count, since Hakes and Sauer weighted every player equally (as long as they had at least 130 PA in 2003).
So, from 22 percent, let’s raise that to, say, 50 percent of eligible players whose salary was determined after Moneyball.
That means the jump in the coefficient, from 2.12 to 5.26, was driven by only half the players. For those players, then, the implied coefficient must be well above 5.26: if the overall coefficient jumped about three points on the strength of half the sample, the real jump for the affected players must have been about six points, putting their coefficient up around 8.4.
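As a sanity check on that arithmetic (the 2.12 and 5.26 figures are from the table above; the 50 percent share is my assumption):

```python
# Share of 2004 PA taken by players signed as free agents after Moneyball.
new_pa, total_pa = 40840, 188539
share_new = new_pa / total_pa              # about 0.22

# If only half the players' salaries were set post-Moneyball, the
# observed 2003 -> 2004 jump implies a much bigger jump for that half.
coef_2003, coef_2004 = 2.12, 5.26
affected = 0.50                            # assumed share, up from 0.22
overall_jump = coef_2004 - coef_2003       # about 3.1
implied_jump = overall_jump / affected     # about 6.3 for affected players
implied_coef = coef_2003 + implied_jump    # roughly 8.4
```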
Basically, Hakes and Sauer are claiming that teams recalibrated their assessment of walks from 2 points to 8 points. That is -- the salary value of walks *quadrupled* because of Moneyball.
That doesn’t make sense, does it? Nobody ever suggested that teams were undervaluing walks by a factor of four. I don’t know if Hakes and Sauer would even suggest that. That’s way too big. It suggests an undervaluing of a free-agent walk by more than $100,000 (in today’s dollars).
For full-time players, the SD of walks is around 18 per 500 AB. That means your typical player would have had to have been misallocated -- too high or too low -- by $1.8 million. That seems way too high, doesn’t it? Can you really go back to 2003, adjust each free agent by $1.8 million per 18 walks above or below average, and think you have something more efficient than before?
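The dollar arithmetic behind that, spelled out:

```python
# An undervaluation of roughly $100K per walk, times one standard
# deviation of walks (about 18 per 500 AB), gives the typical
# misallocation per player implied by the study's numbers.
undervalue_per_walk = 100_000
sd_walks = 18
misallocation = undervalue_per_walk * sd_walks   # $1,800,000
```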
Also: even if a factor of four happened to be reasonable, you’d expect the observed coefficient to keep rising, as more contracts came up for renewal. Instead, though, we see a drop from 2004 to 2005, and, in 2006, it drops all the way back to the previous value! Even if you think the effect is real, that doesn’t suggest a market inefficiency -- it suggests, maybe, a fad, or a bubble. (Which doesn't make sense either, that "Moneyball" was capable of causing a bubble that inflated the value of a walk by 300 percent.)
In my opinion, the magnitude, timing, and pattern of the difference should be enough to make anyone skeptical. You can’t say, "well, yeah, the difference is too big, but at least that shows that teams *did* pay more, at least for one year." Well, you can, but I don’t think that’s a good argument. When you have that implausible a result, it’s more likely something else is going on.
Suppose I ask a co-worker what kind of car he has, and he says, "well, I have three Bugattis, eight Ferraris, and a space shuttle." You don’t leave his office saying, "well, obviously his estimate is too high, but he must at least have a couple of BMWs!" (Even if it later turns out that he *does* have two BMWs.)
Second: the model is wrong.
We know, from existing research, that salary appears to be linear in terms of wins above replacement, which means it’s linear in terms of runs, which means it’s linear in terms of walks. That is: one extra walk is worth the same number of dollars to a free agent, regardless of whether he’s a superstar or just an average player.
The rule of thumb is somewhere around $5 million per win, or $500K per run. That means a walk, which is worth about a third of a run, should be worth maybe around $150,000. (Turning an out into a walk is more, maybe around $250,000.)
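Spelled out, the chain of rough conversions is (the ten-runs-per-win step is the standard sabermetric approximation implied by the two rules of thumb):

```python
# Dollars per win -> dollars per run -> dollars per walk.
dollars_per_win = 5_000_000
runs_per_win = 10                                     # standard approximation
dollars_per_run = dollars_per_win / runs_per_win      # $500K
walk_run_value = 0.33                                 # a walk is about a third of a run
dollars_per_walk = dollars_per_run * walk_run_value   # about $165K
```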
But the Hakes/Sauer study didn’t treat events as linear in salary. They treated them as linear in the *logarithm* of salary. In effect, instead of saying a walk is worth an additional $150K, they said a walk is worth (say) an additional 0.5 percent of whatever the salary already is.
That won’t work. It will try to fit the data on the assumption that, at the margin, a $10 million player’s walk is *ten times as valuable* as a $1 million player’s walk.
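A quick illustration of that implication. In a log model, a unit change in "eye" multiplies salary by a fixed factor, so the marginal dollar value of a walk scales with the salary itself (the coefficient and PA here are illustrative, not from the study):

```python
b = 2.0      # illustrative "eye" coefficient from a log-salary regression
pa = 500     # plate appearances

def marginal_walk_value(salary):
    # One extra walk raises "eye" by 1/pa; in a log model, the dollar
    # effect is approximately salary * b / pa.
    return salary * b / pa

ratio = marginal_walk_value(10_000_000) / marginal_walk_value(1_000_000)
# ratio is exactly 10: the $10M player's walk is priced at ten times
# the $1M player's, which is not how a linear market would price it.
```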
The other coefficients in the regression will artificially adjust for that. For instance, maybe plate appearances takes up the slack … if doubling plate appearances *should* mean 5x the salary, the regression can decide, maybe, to make it only 2x the salary. That way, the good player’s walk may be counted at ten times as much as it should be, but his plate appearances will be counted at only 40 percent as much as they should.
There are other factors that work in one direction or another. For instance, a utility player’s walks actually *should* be worth less, since, with fewer plate appearances, differences between players are more likely to be random luck. Also, the authors used walk *percentage*, and with fewer plate appearances, it takes fewer walks to move that percentage. So, that will also work to absorb some of the "10 times" difference.
But there’s no guarantee all that stuff evens out … in fact, it would be an incredible coincidence if it did.
So that means that the coefficient of walks now means something other than what you think it means. And, so, when you have the coefficient of a walk jumping between seasons … you can’t be sure it’s really measuring the actual salary assigned to the walk. It could be just a difference in the distribution of plate appearances, or one of a thousand other things.
Again, I would argue that this flaw -- on its own -- is enough to have us reject the conclusions of the study. When you fit a non-linear model to a linear relationship -- or vice versa -- all bets are off. The results can be very unreliable. I bet I could create an artificial example where walks would appear to be worth almost any reasonable-sounding value you could name.
These two objections are nice in theory, but I bet they won’t convince many people who already believe the study’s conclusions are correct. My arguments sound too conjectural, too nitpicky. There, you have a real study with hard numbers and confidence intervals, and, here, you just have a bunch of words about why it shouldn't work.
So, next post, I’ll get into the numbers. Instead of arguing about why my coworker's sarcasm shouldn't be used as evidence, I'll try to actually show you his driveway.
UPDATE: Here's Part II.