Wednesday, October 10, 2007

An updated "Moneyball Effect" study

According to "Moneyball," walks were underrated by major league baseball teams. The Oakland A's recognized this, and were able to sign productive players cheaply by looking for undervalued hitters with high OBPs. This (among other strategies) allowed them to make the playoffs several years on a small-budget payroll.

If this is correct, then, once "Moneyball" was published and the A's thinking was made public, the OBP effect should have disappeared. Teams should have started fully valuing a player's walks, and the salaries of players excelling in that skill should have taken a jump.

About a year ago, I reviewed a study that claimed to find such a sudden salary increase. I wasn't convinced. Now, the same authors have updated their study, with better stats and more seasons' worth of data. Again, they claim to find a large effect. And, again, I am not convinced.

The authors, Jahn K. Hakes and Raymond D. Sauer, took the years 1986-2006 and divided them into four time periods. They regressed salary against OBP and SLG. They found that the value of a point of SLG didn't change much over that time period, but the value of a point of OBP did. It increased gradually across the first three periods, but starting in 2004, after the release of "Moneyball," it took a huge jump.

They repeated the study using a measure of bases on balls, instead of OBP (the latter includes hits, which might have confounded the results). Again, they found a huge jump in remuneration for walks starting in 2004.

The numbers are striking, but I'm not sure they mean what the authors think they do. There are several reasons for this. In comments to a post at "The Sports Economist," Guy points out some of them. (The points are Guy's, but the commentary here is mine.)

First, the study grouped together all players, regardless of whether salary was determined by free agency, arbitration, or neither (players with little major league experience have their salaries set by the team; I'll call those "slaves"). In the regression the authors used, the hidden assumption is that for all three types, player salaries increase the same way. That is, if an additional 20 walks over 500 PA will increase a free agent's salary by 10%, it will also increase an arbitration award by 10%, and a team will even offer a slave 10% more.

That's not necessarily true. Suppose that free agent salaries rise because 20% of teams read "Moneyball." That's probably enough that almost 100% of high-OBP players have their salaries bid up. But if the same 20% of arbitrators read "Moneyball," what happens? Only 20% of salaries will increase. And, actually, it'll be less than that, because most of the teams won't be emphasizing walk rate at the hearing.

I'm sure you could come up with scenarios where the changes in compensation are due to changes in patterns between the three groups, rather than walks. For instance, suppose that slave salaries are increasing faster than free-agent salaries, and slave OBPs are increasing faster than free-agent OBPs. That could account for the observed effect. I'm not saying this is true, because I have no idea. But there are lots of hypothetical scenarios that could also account for what the study found.

Second, the authors used a very low cutoff for inclusion: only 130 plate appearances. They do include a multiplier for plate appearances, so that each PA is worth x% more dollars. However, as Guy points out, the study assumes that the performance of a part-time player is as accurate an indication of his talent as it would be for a full-time player. This can cause problems. Anything can happen in 150 PA; someone who OBPs .400 in that stretch is probably a mediocre player having a lucky year, not an unheralded star.
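To put a rough number on "anything can happen in 150 PA," here is a minimal sketch that treats each plate appearance as an independent coin flip and computes the binomial standard deviation of observed OBP. The .330 true-talent figure and the 600 PA comparison are my own illustrative assumptions, not numbers from the study, and the independence assumption is a simplification.

```python
import math


def obp_standard_error(true_obp: float, plate_appearances: int) -> float:
    """Binomial standard deviation of observed OBP over a number of PA.

    Assumes every PA is an independent trial with the same on-base chance.
    """
    return math.sqrt(true_obp * (1.0 - true_obp) / plate_appearances)


# Assumed true talent (illustrative only, not taken from the study).
TRUE_OBP = 0.330

for pa in (150, 600):
    se = obp_standard_error(TRUE_OBP, pa)
    # How many standard deviations above true talent is a .400 showing?
    z = (0.400 - TRUE_OBP) / se
    print(f"{pa} PA: SD = {se:.3f}, .400 is {z:.1f} SDs above true talent")
```

Under those assumptions, a .330 hitter posting .400 over 150 PA is less than two standard deviations of luck, while the same gap over 600 PA is well over three.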

Also, salaries probably don't correlate all that well to plate appearances. Someone with 200 PA might be a regular who got injured, or he could be a pinch-hitter who had to play regularly for a month because someone else got injured. So the assumption that salary is proportional to PA adds a lot of noise to the data.

In addition, suppose that salaries for star players are increasing faster than for part-time players. That would make sense; there is a much larger pool of mediocre players than regulars, and the competition among the ordinary players keeps their cost down. When the Rangers decided to spend $252 million, they used it to buy Alex Rodriguez, not to give ten bench players $25 million each.

If that's the case, the observed effect could be an increasing difference in walks between regulars and bench players, rather than an increase in the market value of walks.

Suppose in 2002, full-time players OBP .400 and part-time players are also at .400. In 2004, full-timers are still at .400, but part-timers drop to .300. If that happened, that would certainly account for the observed jump in OBP value. The regression would notice that suddenly the spread between the .400 guys and the .300 guys was on the rise, and would attribute that to their walks instead of their full-time status.
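The scenario above can be sketched with a toy regression. Everything here is made up for illustration: six hypothetical players, salaries in $ millions that depend only on full-time status, and a one-variable regression of salary on OBP (much simpler than the study's model, which also includes a PA term). I've added a little spread around .400 and .300 so the slope is defined.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (single predictor)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical salaries in $ millions: three regulars, then three part-timers.
# Pay depends ONLY on full-time status, never on OBP.
salaries = [5.0, 5.0, 5.0, 1.0, 1.0, 1.0]

# 2002: both groups centered on .400. 2004: part-timers drop to about .300.
obp_2002 = [0.380, 0.400, 0.420, 0.380, 0.400, 0.420]
obp_2004 = [0.380, 0.400, 0.420, 0.280, 0.300, 0.320]

slope_2002 = ols_slope(obp_2002, salaries)
slope_2004 = ols_slope(obp_2004, salaries)

print(f"2002: ${slope_2002:.1f} million per 1.000 of OBP")
print(f"2004: ${slope_2004:.1f} million per 1.000 of OBP")
```

In this toy data the 2002 slope is essentially zero and the 2004 slope is about 36 (roughly $36,000 per point of OBP), even though nobody is being paid a cent for getting on base. The "value of OBP" appears only because OBP has become a proxy for full-time status.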

Did this actually happen? I don't have data for 2004 handy, but, in 2003, the full-time guys (500+ AB) outwalked the replacement guys (160-499 AB) by .103 to .093 (BB per AB). That's a difference of .010. I ran the same calculation for a few other years. This is the complete list of the years I checked; each figure averages all players equally, regardless of AB, and includes pitchers:

2003: .103 - .093 = .010
2002: .105 - .098 = .007
2001: .107 - .091 = .016
1997: .107 - .102 = .005
1992: .101 - .096 = .005
1987: .103 - .102 = .001
1982: .099 - .091 = .008
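For anyone who wants to repeat the calculation for other seasons, here is a minimal sketch of how each row above is computed: an unweighted average of BB per AB within each AB bucket, then the difference. The sample rows are invented for illustration and are not real player lines; real input would come from a season's batting records.

```python
from statistics import mean


def walk_rate_gap(players):
    """Return (regulars' rate, part-timers' rate, gap) in BB per AB.

    players: iterable of (at_bats, walks) pairs for one season.
    Each player counts equally within his bucket, regardless of AB.
    """
    regulars = [bb / ab for ab, bb in players if ab >= 500]
    part_timers = [bb / ab for ab, bb in players if 160 <= ab <= 499]
    regular_rate = mean(regulars)
    part_time_rate = mean(part_timers)
    return regular_rate, part_time_rate, regular_rate - part_time_rate


# Invented (AB, BB) rows, for illustration only.
toy_season = [(600, 66), (550, 55), (300, 27), (200, 20), (100, 5)]

regular_rate, part_time_rate, gap = walk_rate_gap(toy_season)
print(f"{regular_rate:.3f} - {part_time_rate:.3f} = {gap:.3f}")
```

Players under 160 AB (the last toy row) are dropped entirely, matching the buckets described above.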

So there is some evidence of a recent increase in the amount by which regulars outwalk non-regulars, which corresponds to the gradual increase in OBP value the authors found. I don't have data for the jump years 2004-2006, but maybe I'll visit Sean Lahman's site and download some.

Thirdly – and this is now my point, not Guy's – there is a lag between a player's performance and his salary being set. Most free agents would have negotiated their 2004 salary well before "Moneyball" was released. If you take that into account, the huge effect the authors found must be even huger in real life, having been created by only a fraction of the players!

This should actually make the authors' conclusions stronger, not weaker, except that if you find the size of the jump implausible, it's even more so when you take this effect into account.


Now, I'm not saying that there isn't a Moneyball Effect, just that this study doesn't measure it very well. How *can* you measure it? Here's a method. It's not perfect, but it'll probably give you a reasonably reliable answer.

First, find a suitable estimator of a player's expected performance in 2007. Bill James used to use a weighted (3-2-1) average of the player's last three years, which seems reasonable to me. You could regress that to the mean a bit, if you like. Or, you could use published predictions, like PECOTA or Marcel.
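As a rough sketch of the 3-2-1 idea, the function below weights the most recent season 3, the one before 2, and the one before that 1, with an optional pull toward a league mean. The function name, the `regress` fraction, and the sample OBPs are all hypothetical choices of mine. It also weights rate stats directly rather than by playing time, which is a simplification.

```python
def weighted_projection(last_three, weights=(3, 2, 1),
                        league_mean=None, regress=0.0):
    """Weighted average of three seasons, most recent season first.

    If league_mean is given, the estimate is pulled toward it by the
    fraction `regress` (0.0 = no regression, 1.0 = pure league mean).
    """
    total = sum(w * x for w, x in zip(weights, last_three))
    estimate = total / sum(weights)
    if league_mean is not None:
        estimate = (1.0 - regress) * estimate + regress * league_mean
    return estimate


# Hypothetical OBPs for the last three seasons, most recent first.
recent_obps = [0.380, 0.350, 0.320]

plain = weighted_projection(recent_obps)
regressed = weighted_projection(recent_obps, league_mean=0.330, regress=0.2)
print(f"3-2-1 average: {plain:.3f}; regressed 20% to .330: {regressed:.3f}")
```

For these made-up inputs the plain 3-2-1 average works out to .360, and regressing 20% of the way toward .330 gives .354.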

Now, take that estimate and figure the player's expected value to his team in 2007. Use any reasonable method: linear weights, VORP, extrapolated runs, whatever. Let's assume you use VORP.

Take all full-time players ("full-time" based on expectations for 2007, to control for selection bias) who signed a free-agent contract during the off-season. Run a regression to predict salary from expected VORP. Include variables to adjust for age, position, and so on, until you're happy.

Now, run the same regression, *but add a variable for BB rate*. That coefficient will give you the amount by which the market over- or undervalues walks. That is, suppose the regression says that salary is $2 million per win, less $10,000 per walk. That tells you that if you have two otherwise identical players, each of whom creates five wins above replacement, but one draws 20 more walks than the other, the one with more walks will earn $200,000 less. That would mean that walks are undervalued: you get less money if you have fewer hits and more walks, even if the walks exactly compensate for the hits.
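Here is a minimal sketch of that second regression, under heavy assumptions: five invented players, salaries generated exactly from the hypothetical coefficients above ($2 million per win, less $10,000 per walk), no noise, no intercept, and no age or position controls. It uses walk counts rather than a walk rate, to match the $10,000-per-walk example. A real version would need all of those pieces; this only shows that the walk coefficient is what measures the over- or undervaluation.

```python
def fit_two_predictors(wins, walks, salaries):
    """Least-squares fit of salary = a * wins + b * walks (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s_ww = sum(w * w for w in wins)
    s_bb = sum(b * b for b in walks)
    s_wb = sum(w * b for w, b in zip(wins, walks))
    s_wy = sum(w * y for w, y in zip(wins, salaries))
    s_by = sum(b * y for b, y in zip(walks, salaries))
    det = s_ww * s_bb - s_wb * s_wb
    per_win = (s_wy * s_bb - s_wb * s_by) / det
    per_walk = (s_ww * s_by - s_wb * s_wy) / det
    return per_win, per_walk


# Invented players: expected wins above replacement and expected walks.
wins = [5, 5, 3, 2, 6]
walks = [60, 80, 40, 70, 50]

# Salaries built from the hypothetical market in the text (an assumption).
salaries = [2_000_000 * w - 10_000 * b for w, b in zip(wins, walks)]

per_win, per_walk = fit_two_predictors(wins, walks, salaries)
salary_gap = per_walk * 20  # same wins, 20 more walks

print(f"per win: {per_win:,.0f}; per walk: {per_walk:,.0f}")
print(f"salary change for 20 extra walks at equal wins: {salary_gap:,.0f}")
```

Because the toy salaries contain no noise, the fit recovers the coefficients it was built from, and the first two players (five wins each, 60 versus 80 walks) differ by the $200,000 from the example. With real contracts the walk coefficient would come with a standard error, which is where the small number of free agents per year starts to matter.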

Repeat this for all years from 1986 to 2006 (adjusting salaries for inflation). If the Moneyball Effect is real, you should find that the coefficient for walks is negative up to 2003, then rises to zero after 2004.

A fun side-effect is that you can include all kinds of variables, not just walks – batting average, home runs, and so forth – to see which skills are more or less valued through the years. For instance, you could check for a "Bill James" effect, to see if the perceived value of batting average drops through the 80s and 90s. You could include RBIs, to see if the market pays more for cleanup men than leadoff men. And so on.



At Wednesday, October 10, 2007 1:11:00 PM, Blogger Phil Birnbaum said...

I should say that there might not be enough free agents each year to give meaningful results ... you'd probably at least have to group the years into clusters.

But including part-time players is probably not a good idea. Including non-free-agents would ruin everything. And including free-agents who signed more than a year ago is (a) double-counting; (b) adding noise because the salary is not determined based on the right expectation; and (c) distorting the effect because multi-year contracts don't increase salaries by exactly the MLB-wide rate of inflation.

