Did the baseball salary market anticipate DIPS?
According to the famous Voros McCracken DIPS hypothesis, pitchers have little control over what happens once the ball is hit off them. As long as they stay in the park, balls hit off bad pitchers are no more likely to drop for hits than balls hit off good pitchers. No matter who the pitcher is, his batting average on balls in play (BABIP) should be roughly the same.
If that’s true, then teams shouldn’t be evaluating pitchers based on their BABIP, since it’s not evidence of skill. And, therefore, they shouldn’t be paying players based on BABIP.
J.C. Bradbury’s study, “Does the Baseball Market Properly Value Pitchers?” sets out to check whether that is actually the case.
The paper starts off by verifying whether the DIPS hypothesis holds. That part of the paper is technical and hard to summarize concisely, so I’ll skip the details and run it down in one paragraph. What Bradbury does is, first, he shows that if you choose the right combination of variables to predict this year’s ERA, adding last year’s ERA and BABIP doesn’t help with the prediction. Then, he runs a second regression to see what other statistics correlate with BABIP – and the answer, it turns out, is strikeouts and home runs.
One important point Bradbury makes is that even if last year’s BABIP doesn’t help other stats to predict this year’s BABIP, that doesn’t by itself imply that pitchers have no control over BABIP. It could be that pitchers do have BABIP skill, but that skill correlates perfectly with the other stats he considered. He finds a high correlation with strikeouts, but not a perfect one.
In summary, Bradbury writes,
“… it appears that pitchers do have some minor control over hits on balls in play; but, this influence is small … this skill just happens to be captured in strikeouts.”
Having concluded that BABIP isn’t much of a skill, Bradbury now checks whether players are nonetheless being paid based on it. He regresses (the logarithm of) salary on various measures of the pitcher’s performance in his previous year – strikeouts, walks, home runs, hit batters, and BABIP. He runs a separate regression for each season from 1985 to 2004. The results:
-- strikeouts are a significant predictor of salary 16 out of 20 seasons;
-- walks are significant 11 out of 20 seasons;
-- home runs allowed are significant 8 out of 20 seasons;
-- HBP are significant 0 out of 20 seasons;
-- BABIP is significant 4 out of 20 seasons.
Bradbury concludes that because BABIP comes out significant in so many fewer seasons than some of the other factors, GMs were less likely to base decisions on it. In effect, they (or rather, the market) knew about DIPS before even Voros did:
“… the market seems to have solved the problem of estimating pitcher MRPs [marginal revenue products – the benefit of signing a pitcher versus not signing him] well before McCracken published his findings in 2001.”
This is not as farfetched as it sounds – economists are fond of finding ways in which markets are capable of appearing to “figure out” things that no individual knows. For instance, even though every individual has a different idea of what a stock may be worth, the market “figures it out” so well that it’s very, very difficult for any individual to outinvest any other. (And here is another possible example of the market “knowing” something about sports that most of its participants do not.)
So it’s certainly possible. But I think the evidence points the other way.
Bradbury’s conclusion, that GMs aren’t paying for BABIP, is based only on the number of years that came out statistically significant. The other seasons, the ones that do not reach significance, are treated as if they showed no effect at all that year. But, taken all together, they show obvious significance.
A look at the study’s Table 6 shows that from 1985-1999, the direction of the relationship between BABIP and salary is almost exactly as you’d expect – all negative but one. (Negative, because higher BABIP means lower salary.) Plus, the positive one is only 0.02, and only two of the other t-statistics are smaller than 0.5 in absolute value. Clearly, this is a very strong negative relationship between BABIP and salary.
Here, I’ll list those 15 scores so you can see for yourself. Remember, if there were no effect, they should be centered around zero:
Only the three highest of those are individually significant, but taken as a whole, there’s no doubt. The chance of getting fourteen or more negative t-statistics out of fifteen is 1 in 2048. (Of course, that’s not a proper test of significance because I chose the criterion after I saw the data. But still…)
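The 1-in-2048 figure is easy to check. Here's a minimal sketch of the calculation, assuming that under "no effect" each year's t-statistic is equally likely to be positive or negative, independently of the other years. The function name is my own, just for illustration:

```python
from fractions import Fraction
from math import comb


def sign_test_tail(n: int, k: int) -> Fraction:
    """Chance that at least k of n signs are negative, if each sign is a fair coin flip."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return Fraction(favourable, 2 ** n)


# Fourteen or more negative t-statistics out of fifteen seasons
p = sign_test_tail(15, 14)
print(p)  # 1/2048
```

That's 16 favourable outcomes (15 ways to have exactly one positive year, plus 1 way to have none) out of 32,768 possible sign patterns.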
If you combined all fifteen years into one regression (you’d have to adjust for salary inflation, of course), you’d wind up with a massive t-statistic. It’s the data being broken up into small samples that hides the substantial significance.
If you look at an entire season, Rod Carew would easily come out as a statistically significantly better hitter than Mario Mendoza. But if you carved their season into individual weeks, not that many of those weeks would show a statistically significant difference.
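To put rough numbers on that, here's a sketch using a standard two-proportion z-test. The batting averages and at-bat counts are round numbers I made up for illustration, not either player's actual stats, and the calculation assumes each hitter bats exactly his average in every sample:

```python
from math import sqrt


def two_proportion_z(p1: float, p2: float, n: int) -> float:
    """z-score for the gap between two rates, each observed over n trials (pooled SE)."""
    pooled = (p1 + p2) / 2
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return (p1 - p2) / se


GOOD_HITTER_AVG = 0.330  # hypothetical stand-in for a Carew-type hitter
WEAK_HITTER_AVG = 0.200  # hypothetical stand-in for a Mendoza-type hitter

season_z = two_proportion_z(GOOD_HITTER_AVG, WEAK_HITTER_AVG, 600)  # full season of at-bats
week_z = two_proportion_z(GOOD_HITTER_AVG, WEAK_HITTER_AVG, 25)     # one week of at-bats

print(round(season_z, 2), round(week_z, 2))
```

Under those assumptions the full-season z-score is a little over 5, but a typical week comes in around 1, well short of the usual 1.96 cutoff. The gap between the hitters is the same in both cases; only the sample size changed.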
In terms of actual salary impact, the numbers show a great deal of baseball significance. Take 1990, which is pretty close to average (z=-1.04). If your BABIP in 1989 was .320, you would earn about 5.4% less money in 1990 than if your BABIP was only .300. (Assuming I’ve done the math right.) That’s a reasonable difference, 5% of salary for only a moderate increase in BABIP.
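For anyone who wants to see how a log-salary regression turns into a percentage, here's a minimal sketch. The coefficient in it is not taken from the paper's table; I backed it out from the 5.4% figure above, assuming the exact exponential conversion, so treat it as illustrative only:

```python
from math import exp, log


def pct_salary_change(coef: float, babip_diff: float) -> float:
    """Fractional salary change implied by a log-salary coefficient and a BABIP gap."""
    return exp(coef * babip_diff) - 1


# Hypothetical coefficient, chosen so that a .020 BABIP gap gives a 5.4% pay cut
IMPLIED_COEF = log(1 - 0.054) / 0.020
print(round(IMPLIED_COEF, 2))  # -2.78

# The same coefficient applied to a smaller gap, .310 versus .300
print(pct_salary_change(IMPLIED_COEF, 0.010))
```

Because salary is logged, the effect compounds rather than adds, so halving the BABIP gap gives slightly more than half the percentage drop.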
Furthermore, the real-life effect is almost certainly higher than that. Players don’t sign new contracts every year, so there’s a time lag between performance and salary. Suppose, on average, half of pitchers sign a new contract in any given year. The 5% difference overall must then be a 10% difference on the pitchers who actually sign (to counterbalance the 0% effect on pitchers whose salary didn’t change).
And still further, pitchers aren’t evaluated on just their most recent season. Suppose that GMs intuitively give the most recent season only 50% weight in their forecast of what the pitcher will do for them. Again, that makes the 5% difference really a 10% difference in the GMs’ evaluations.
Combine the two adjustments, and now you’re talking real money.
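Here's the back-of-the-envelope arithmetic as a sketch. Both 50% figures are the guesses from the paragraphs above, not estimates from the paper, and the function name is just for illustration:

```python
def implied_effect(observed: float, share_resigning: float, weight_recent: float) -> float:
    """Scale an observed salary effect up for contract lag and multi-year evaluation."""
    # Only some pitchers sign in a given year, so the effect is concentrated on them
    effect_on_signers = observed / share_resigning
    # The latest season gets only part of the weight in a GM's forecast
    return effect_on_signers / weight_recent


combined = implied_effect(observed=0.05, share_resigning=0.5, weight_recent=0.5)
print(f"{combined:.0%}")  # 20%
```

So under those two assumptions, the 5% that shows up in the regression corresponds to something like a 20% difference in how a GM values a .320 BABIP pitcher versus a .300 one.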
So, in terms of both statistical significance and baseball significance, it seems pretty solid to me that the market for pitchers does treat BABIP as a meaningful indicator of pitcher skill.
But there’s still something interesting in the data. From 2000 to 2004, the years McCracken’s original DIPS study was in the public domain, the numbers become less consistent:
Two of the five years have the correlation between BABIP and salary going the wrong way. One of the others is close to zero, and the numbers seem to be jumping up and down a bit more. All this might be because the number of pitchers in the sample is a bit lower – but it also might be that GMs are catching on.
It’s weak, but it’s something. This study provides the first hint I’ve seen that baseball’s labor market might actually have learned something about DIPS. It’ll be very interesting to extend the table ten years from now and see if that’s really true.