Sunday, October 29, 2006

How fast did the market learn from "Moneyball"?

In “Moneyball,” Michael Lewis told the story of how the Oakland A’s were able to succeed on a small budget by realizing that undervalued talent could be picked up cheap. Specifically, GM Billy Beane realized that the market was undervaluing on-base percentage, and acquired hitters who took lots of walks at salaries that undervalued their ability to contribute to victory.

But once Moneyball was released in 2003, every GM in baseball learned Beane’s "secret." Furthermore, a couple of his staff members left that year for GM jobs with other teams. In theory, this should have increased competition for high-OBP players, and raised their salaries to the point where the market inefficiency disappeared.

Did that really happen? In “An Economic Evaluation of the Moneyball Hypothesis,” economists Jahn K. Hakes and Raymond D. Sauer say that yes, it did.

Hakes and Sauer ran regressions for each year from 2000 to 2004, trying to predict (the logarithm of) players’ salary from several other variables: on-base percentage, slugging percentage, whether they were free agents, and so on. They found that in each year from 2000 to 2003, slugging percentage contributed more to salary than on-base percentage. But, in 2004, the first year after Lewis’s book, the coefficients were reversed – on-base percentage was now more highly valued than slugging percentage.

"This diffusion of statistical knowledge," they write, " … was apparently sufficient to correct the mispricing of skill."

But I’m not really sure about that.

The main reason is that, taking the data at face value, the 2004 numbers show not the market correcting, but the market overcorrecting.

For 2004, the study shows 100 points of on-base percentage worth 44% more salary, and 100 points of slugging worth only 24% more salary.

At first, that looks like confirmation: a ratio of 44:24 (1.8) is almost exactly the “correct” ratio of the two values (for instance, see the last two studies here). But the problem is the “inertia” effects that Hakes and Sauer mention: the market can’t react to a player until his long-term contract is up.

Suppose only half the players in 2004 were signed that year. The other half would have been signed at something around the old, 2002, ratio, which was a 14% increase for OBP, but a 23% increase for SLG.

So half the players are 14:23, and the average of both halves is 44:24. That means the other half must be about 74:25. That is, the players signed in 2004 would have had their on-base valued at three times their slugging, when the correct ratio is only around two. Not only did GMs learn from “Moneyball,” they overlearned. The market is just as inefficient, but in the other direction!
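The arithmetic behind that 74:25 figure can be checked with a short sketch. It assumes, as the paragraph above does, that the published 2004 estimate is a simple weighted average of old-contract and new-contract players; the function name and the `new_share` parameter are illustrative and do not come from the study.

```python
def implied_new_contract_value(blended, old, new_share):
    """Solve blended = (1 - new_share) * old + new_share * new for `new`."""
    if not 0 < new_share <= 1:
        raise ValueError("new_share must be in (0, 1]")
    return (blended - (1 - new_share) * old) / new_share

# Percent salary increase per 100 points, as quoted in the post.
OBP_2004, SLG_2004 = 44.0, 24.0   # blended 2004 estimates
OBP_OLD, SLG_OLD = 14.0, 23.0     # estimates under the old ratio

# Assumption from the post: half the players signed new deals in 2004.
obp_new = implied_new_contract_value(OBP_2004, OBP_OLD, new_share=0.5)
slg_new = implied_new_contract_value(SLG_2004, SLG_OLD, new_share=0.5)
ratio = obp_new / slg_new

print(f"new contracts: OBP {obp_new:.0f}%, SLG {slg_new:.0f}%, ratio {ratio:.2f}")
```

With half the players on new contracts this reproduces the 74:25 split. Passing a smaller `new_share` pushes the implied ratio higher.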

And only free-agent salaries are set in a competitive market. The rest are set arbitrarily by the teams, or by arbitrators. If those salaries didn’t adjust instantly, then the market ratio for 2004 is even higher than 3:1. In the extreme case, if you assume that only free-agent salaries were affected by the new knowledge, and only half of players were free agents, the ratio would be higher, perhaps something like 6:1.

Could this have happened, that teams suddenly way overcompensated for their past errors? I suppose so. But the confidence intervals for the estimates are so wide that the difference between the 14% OBP increase for 2003 and the 44% increase for 2004 isn’t even statistically significant – it’s only about one standard deviation. So we could just be looking at random chance.

Also, the regression included free agents, arbitration-eligible players, and young players whose salaries are pretty much set by management. This forces the regression to make the implicit assumption that all three groups will be rewarded in the same ratios. For instance, if free agents who slug 100 points higher increase their salaries by 24%, this assumes that even rookies with no negotiating power will get that same 24%. As commenter Guy wrote here, this isn’t necessarily true, and a case could be made that these younger players should have been left out entirely.

Suppose that teams knew the importance of OBP all along, but players didn’t. Then it would make sense that it would appear to be “undervalued” among players whose salaries aren’t set by the market. Teams are paying players the minimum they would accept, and if those high-OBP players don’t know to ask what they’re worth, they would appear to be underpaid. That could conceivably account for the study’s entire effect, since the study combines market-determined salaries equally with non-market salaries.

So the bottom line is this: if you consider the 2004 data at face value, the conclusion is that GMs overcompensated for the findings in “Moneyball” and paid far more than intrinsic value for OBP. But because of the way the study is structured, there is reason to be skeptical of this result.

In any case, the study does seem to suggest that something about salaries in 2004 was very different from the four preceding years. What is that something? I don’t think there’s enough evidence there yet to embrace the authors’ conclusions … but if you restricted the study to newly-signed free agents, and added 2005 and 2006, I’d sure be interested in seeing what happens.


At Sunday, October 29, 2006 11:49:00 PM, Blogger Phil Birnbaum said...

And one other strange thing I noticed. If you look at each of the five years in the study, there is an inverse relationship between the ratio of OBP to SLG and the r-squared of the regression for that year.

Specifically, the more the regression finds OBP to be important, the worse the model works overall.

In 2004, the only year in which OBP was found to be more important than SLG, the overall r-squared of the model was .635, the lowest of the five years. In 2001, the only year in which OBP had a negative relationship to salary, the r-squared was .728 – the highest of the five years.

If you rank the five years in order of r-squared from highest to lowest, the ranking is exactly the same as if you ranked them in order of the importance of SLG, from highest to lowest. The chance of this happening randomly is only 1 in 120. If you regress SLG/OBP on the observed r-squared, leaving out the negative outlier, the r-squared is 0.895. It's only four data points, but it's still almost significant at the 5% level.
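Both figures in that paragraph can be checked by hand. The sketch below is a rough check under stated assumptions: it treats the 1-in-120 figure as the chance of one specific ordering among 5! equally likely orderings, and it tests the quoted r-squared with the usual t-test for a correlation coefficient on n = 4 points. The variable names are illustrative.

```python
import math

# Five seasons can be ordered 5! ways, so one specific ordering of the
# r-squared ranks matching the SLG-importance ranks has probability 1/120.
orderings = math.factorial(5)
p_exact_match = 1 / orderings

# Significance of a correlation from n = 4 points (2 degrees of freedom).
r_squared = 0.895   # figure quoted in the comment
n = 4
r = math.sqrt(r_squared)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# For 2 degrees of freedom, the Student t CDF has the closed form
# F(t) = 1/2 + t / (2 * sqrt(2 + t^2)), so the two-tailed p-value is:
p_two_tailed = 1 - t_stat / math.sqrt(2 + t_stat ** 2)

print(f"1 in {orderings}; t = {t_stat:.2f}, two-tailed p = {p_two_tailed:.3f}")
```

Under these assumptions the two-tailed p-value lands just above 0.05, which matches "almost significant at the 5% level."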

I have no idea if that means anything, or what it could mean, but I thought I’d at least mention it.

At Monday, October 30, 2006 8:55:00 PM, Anonymous Anonymous said...

Aside from the issues you raise, there's the problem of relying on a single year of performance data. Because the variance in true talent for SLG is much greater than for OBP, a single year of SLG data tells us more about a player's true talent. It's not unusual for a true .270 hitter to hit .300 (with corresponding OBP), but a true .430 SLG hitter isn't going to just slug .550 one year (Brady Anderson excepted). So salary SHOULD correlate better with last year's SLG, even if SLG and OBP were correctly valued.

To do this correctly, you really need to look at a player's career pre-contract performance data, or at least the 3 years prior to signing the current contract. That's what they're being paid for. Then maybe we'd learn something.

I also wonder about including PA as an independent variable. PA is directly a function of player performance -- it is in no way independent. Since high-OBP players will tend to have the highest number of PAs, this may be contributing to the model's understating the financial reward to OBP.

At Monday, October 30, 2006 10:42:00 PM, Blogger Phil Birnbaum said...

Hey, Guy,

I thought exactly the same thing about OBP and SLG ... but I ran a test, and the year-to-year correlation for OBP was almost exactly the same as for SLG. I was pretty surprised, actually.

Agreed that looking at more than one year would be closer to what management is actually looking at, and give more reliable results.

As for PA, it makes sense that players with higher PA should make more than those with lower PA -- because those are probably better players (despite last year's OBP and SLG). This might also have something to do with the three-years-vs-one-year issue -- anyone who OBPs .375 but doesn't get many AB is probably a one-year wonder.

Also, the model assumes that if hitting for an extra .050 is worth (say) a 25% raise if you're full time, it's also worth an extra 25% raise if you're part-time. That's probably not true.

At Monday, October 30, 2006 10:45:00 PM, Blogger Phil Birnbaum said...

Also, did you notice that the "infielder dummy" variable (SS, 2B, 3B) has an effect that mostly goes the "wrong" way? You'd expect SS and 2Bs to be paid *more* for equivalent offense, but, IIRC, in four of the five years they earn *less*.

At Tuesday, October 31, 2006 2:36:00 AM, Anonymous Anonymous said...

The right way to control for playing time would be to regress each component to the mean, and then re-calculate OBP and SLG.

At Tuesday, October 31, 2006 7:03:00 AM, Anonymous Anonymous said...

David: could you explain that a little more?

I'm also surprised that y-t-y r for OBP and SLG is the same for hitters. Both BB and HR have high r's, but ISO makes up a larger proportion of SLG. So I guess that isn't a problem for the model.

At Tuesday, October 31, 2006 10:39:00 AM, Anonymous Anonymous said...

Thinking some more about SLG and OBP: The SD for SLG is larger than for OBP. So even if the y-t-y correlation for the 2 stats is similar, I think we should expect the salary coefficient for SLG to be higher. That is, a point of SLG (in a single year) means more as a measure of true talent than a point of OBP. Or am I thinking about that incorrectly?

At Tuesday, October 31, 2006 11:13:00 AM, Blogger Phil Birnbaum said...

I think the r was about .7 for each stat ... assuming equal SDs between the two years, each additional point last year is 0.7 points this year, whether it's a point of SLG or OBP.
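A minimal sketch of that point, assuming a simple linear year-to-year prediction: the slope is r times the ratio of the two years' standard deviations, so with equal SDs it is just r, whichever stat is used. The league means below are made-up illustrative values, and the function name is hypothetical.

```python
def expected_next_year(last_year, mean, r, sd_last=1.0, sd_next=1.0):
    """Linear prediction of next year's rate from last year's rate.

    The slope is r * sd_next / sd_last; with equal SDs it reduces to r.
    """
    slope = r * sd_next / sd_last
    return mean + slope * (last_year - mean)

R = 0.7  # approximate year-to-year correlation quoted above

# Illustrative league means (assumptions, not from the study).
obp_next = expected_next_year(0.380, mean=0.330, r=R)  # 50 points above mean
slg_next = expected_next_year(0.470, mean=0.420, r=R)  # 50 points above mean

print(f"OBP: {obp_next:.3f}, SLG: {slg_next:.3f}")
```

Both hypothetical players start 50 points above their league mean and are projected 35 points above it, regardless of how spread out each stat is across the league.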

Maybe the thing is that SLG still has a lot of OBP in it (via batting average)? That would explain why both r's are so similar. What's left after batting average? Power and walks. And intuitively, both are pretty consistent from year to year.

At Tuesday, October 31, 2006 11:24:00 AM, Anonymous Anonymous said...

David: could you explain that a little more?


Well, players with few PAs will all end up bunched around the mean because their numbers will be more heavily regressed. The more PAs a player has, the further away from the mean he will stay, and the more impact he will have on the regression coefficients.

Plus, regression to the mean gives us a better estimate of a player's true talent than just his numbers.
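One common way to do this kind of regression is to add a fixed number of league-average plate appearances to each player's line. The sketch below uses that approach as an assumption; the comment does not say which method was intended, and the league mean and the constant `K` are invented for illustration.

```python
def regress_to_mean(observed, pa, league_mean, k):
    """Shrink an observed rate toward league_mean.

    k is the number of league-average PAs added to the player's record
    (a hypothetical tuning constant, not taken from the study).
    """
    return (observed * pa + league_mean * k) / (pa + k)

LEAGUE_OBP = 0.330   # illustrative value
K = 300              # illustrative shrinkage constant

# Same observed OBP, different playing time.
part_timer = regress_to_mean(0.375, pa=200, league_mean=LEAGUE_OBP, k=K)
regular = regress_to_mean(0.375, pa=600, league_mean=LEAGUE_OBP, k=K)

print(f"200 PA: {part_timer:.3f}, 600 PA: {regular:.3f}")
```

The part-timer ends up closer to the mean than the regular, which is the bunching effect described above. Because the mean is a parameter, a different target can be passed for different groups of players.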

At Tuesday, October 31, 2006 11:41:00 AM, Blogger Phil Birnbaum said...


Yeah, that would work ... good idea.

You'd probably still want to include a variable for playing time, because a (after regressing) .300 hitter who plays part time must be somewhat less valuable than a (after regressing) .300 hitter who plays full time.

Some of the reasons for being part-time would be in the study (such as position), some we've already talked about (one-year wonders), and some we haven't considered yet (defensive ability).

The problem also is injuries ... how do you tell if the .300 hitter who had only 200 AB is a good player who was injured, or just a part-time platoon player?

I'm rambling ... I guess the bottom line is that I agree that regressing to the mean would be a good thing.

Or, hey, why not just use "Baseball Prospectus" predictions of how the player would do? That's probably most in line with what GMs are thinking.

At Tuesday, October 31, 2006 11:42:00 AM, Anonymous Anonymous said...

My point was that the SD isn't the same for the two stats. Just eyeballing the 2006 data, it looks like it's about 30 points for OBP and more than 60 points for SLG. Looking at players with 400+ PAs, 64% have an OBP within 30 points of the mean, while 60% have SLG within 60 points of mean. (The distribution isn't symmetrical around league average -- there are more players above the mean than below, because the players with <400 PA are worse).

* * *

David: I see where you're going, but isn't there a problem with regressing the low-PA players to the mean, since we know that as a group they are in fact below-average players (based on their prior MLB and minor-league performance)? So regressing them will significantly overstate their true talent.

Adjusting for PA is definitely a tricky issue. But I'm not convinced that just including it as an indep variable in the regression is the right solution.

* *

I must be getting old: I can barely read these word verifications!

At Tuesday, October 31, 2006 7:56:00 PM, Anonymous Anonymous said...

I made two mistakes. One, Guy is correct, the players should be regressed to the mean based on the number of PAs they had. So guys with fewer PAs would be regressed to a lower mean.

The second is that there is a really simple way to take care of the plate appearances issue: replacement level. Let's say a replacement-level player will bat .300/.400 (OBP/SLG). Well then, we can measure each player's "bases above replacement" and "extra-bases above replacement."

BAR = (OBP - .300)*PA
XBAR = (SLG - .400)*AB

Players with more playing time will get a chance to accumulate higher numbers, effectively taking care of the PA issue. Again, you would want to regress each component that goes into OBP and SLG to the appropriate mean first.
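The two formulas translate directly into code. This is only a sketch of the definitions as written, using the .300/.400 replacement level from the comment; the two players are hypothetical, and the regression step is left out.

```python
REPLACEMENT_OBP = 0.300  # replacement level proposed in the comment
REPLACEMENT_SLG = 0.400

def bases_above_replacement(obp, pa):
    """BAR = (OBP - .300) * PA"""
    return (obp - REPLACEMENT_OBP) * pa

def extra_bases_above_replacement(slg, ab):
    """XBAR = (SLG - .400) * AB"""
    return (slg - REPLACEMENT_SLG) * ab

# Two hypothetical hitters with identical rates but different playing time.
full_time_bar = bases_above_replacement(0.360, pa=600)
full_time_xbar = extra_bases_above_replacement(0.450, ab=540)
part_time_bar = bases_above_replacement(0.360, pa=200)
part_time_xbar = extra_bases_above_replacement(0.450, ab=180)

print(f"full time: BAR {full_time_bar:.1f}, XBAR {full_time_xbar:.1f}")
print(f"part time: BAR {part_time_bar:.1f}, XBAR {part_time_xbar:.1f}")
```

The full-time player accumulates three times the totals of the part-timer at the same rates, so playing time is built into the measure instead of entering as a separate variable.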

At Tuesday, October 31, 2006 7:56:00 PM, Anonymous Anonymous said...

That was me, BTW.

At Tuesday, October 31, 2006 9:03:00 PM, Blogger Phil Birnbaum said...


Hey, that would work. Of course, the Moneyball hypothesis was based on rates, and this is based on gross production. Don't know if that would be an issue or not, but, then again, including PA as a separate variable is even more problematic than that, anyway.

At Wednesday, November 01, 2006 12:55:00 AM, Anonymous Anonymous said...

Perhaps it's not a proper distinction, and maybe I'm missing the point, but if the question is (for example) how the market values OBP, is trying to find "true talent OBP" by regressing to a mean a correct step? I suppose it would be proper if we had reason to think that a similar process was actually followed by most teams. But the theory of market inefficiency is suggesting that most teams are not so sophisticated in their evaluations.

At Wednesday, November 01, 2006 10:36:00 AM, Anonymous Anonymous said...

In some ways, the problem here is the choice of methodology. If we want to find out whether OBP is/was undervalued, there are simpler (and perhaps better) ways than regression to get the answer. For example, sort players by their OBP:SLG ratio, and look at the top and bottom quartiles. For each group, calculate the group's actual offensive value (using RC, BsR, whatever your preference), and compare to average salary. If the hi-OBP/lo-SLG group is underpaid relative to its offensive production, you've got something. If not, you don't. With a decent number of players in each cell, you'd have a good shot at finding statistically meaningful results if OBP were in fact undervalued. (And, as we've all said, using multi-year or career performance data would also be better than relying on a single season.)
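A rough sketch of that quartile comparison follows. Everything in it is an assumption made for illustration: the field names, the choice of salary per run created as the yardstick, and above all the eight-player toy data set, which is invented and says nothing about real salaries.

```python
def pay_per_run_by_quartile(players):
    """Return (top, bottom): salary per run created for the highest and
    lowest quartiles of players ranked by OBP:SLG ratio."""
    ranked = sorted(players, key=lambda p: p["obp"] / p["slg"])
    q = max(1, len(ranked) // 4)

    def pay_per_run(group):
        total_salary = sum(p["salary"] for p in group)
        total_runs = sum(p["runs_created"] for p in group)
        return total_salary / total_runs

    return pay_per_run(ranked[-q:]), pay_per_run(ranked[:q])

# Invented toy data: salary in millions, runs_created from any estimator.
toy_players = [
    {"obp": 0.380, "slg": 0.400, "runs_created": 80, "salary": 2.0},
    {"obp": 0.370, "slg": 0.410, "runs_created": 78, "salary": 2.5},
    {"obp": 0.350, "slg": 0.450, "runs_created": 85, "salary": 4.0},
    {"obp": 0.340, "slg": 0.460, "runs_created": 84, "salary": 4.5},
    {"obp": 0.330, "slg": 0.480, "runs_created": 86, "salary": 5.0},
    {"obp": 0.325, "slg": 0.490, "runs_created": 88, "salary": 5.5},
    {"obp": 0.310, "slg": 0.520, "runs_created": 82, "salary": 6.0},
    {"obp": 0.300, "slg": 0.530, "runs_created": 80, "salary": 6.5},
]

high_obp_pay, high_slg_pay = pay_per_run_by_quartile(toy_players)
print(f"hi-OBP/lo-SLG: {high_obp_pay:.4f} per run; lo-OBP/hi-SLG: {high_slg_pay:.4f} per run")
```

If the high-OBP group's pay per run came out well below the other group's on real data, that would point toward OBP being undervalued. The toy numbers are rigged to show that pattern and prove nothing by themselves.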

Perhaps this is an unfair generalization, but it does seem to me that when economists write about baseball, there is a tendency to employ multivariate regression, even in cases where it may not be needed, or may not even be the best tool for the job.

At Wednesday, November 01, 2006 10:48:00 AM, Blogger Phil Birnbaum said...

Beamer wrote here:

"proper understanding of analytical approaches such as Markov, BaseRuns, basic stats (std dev) are actually a lot more useful than regression."

I agree. And I wonder if the reliance on regression is to satisfy peer reviewers.

Or perhaps not. One poster at BTF said that the "Solving DIPS" paper, which is one of my favorites, was not very impressive because it used only elementary statistical methods.

Another theory is that actually coming up with your own method is "harder," in the sense that there's no way to figure out the advantages and disadvantages of the method. Plus, you have to come up with the method yourself, and, if you want significance testing, you've got to figure out how to do that.

So, four theories:

1. peer review requires regression
2. cultural bias against "easy" math methods
3. regression is formulaic and thus easier
4. regression allows formal significance testing, and thus "rigor"

In any case, Guy, I'd go with your method over regression too.

At Wednesday, November 01, 2006 10:52:00 AM, Blogger Phil Birnbaum said...

Another approach would be to come up with an excellent predictor of offensive performance (use Baseball Prospectus' PECOTA, for instance). Then make your best assumption about what salary should be (linear on VORP?) and, as Guy suggests, break players into categories and check how the actual salaries compare.

You run into a problem with multi-year contracts, though, since those are based on projections of more than one year.

