Are umpires biased in favor of star pitchers?
Are MLB umpires are biased in favor of All-Star pitchers? An academic study, released last spring, says they are. Authored by business professors Braden King and Jerry Kim, it's called "Seeing Stars: Matthew Effects and Status Bias in Major League Baseball Umpiring."
"What Umpires Get Wrong" is the title of an Op-Ed piece in the New York Times where the authors summarize their study. Umps, they write, favor "higher status" pitchers when making ball/strike calls:
"Umpires tend to make errors in ways that favor players who have established themselves at the top of the game's status hierarchy."
But there's nothing special about umpires, the authors say. In deferring to pitchers with high status, umps are just exhibiting an inherent unconscious bias that affects everyone:
" ... our findings are also suggestive of the way that people in any sort of evaluative role — not just umpires — are unconsciously biased by simple 'status characteristics.' Even constant monitoring and incentives can fail to train such biases out of us."
Well ... as sympathetic as I am to the authors' argument about status bias in regular life, I have to disagree that the study supports their conclusion in any meaningful way.
The authors looked at PITCHf/x data for the 2008 and 2009 seasons, and found all instances where the umpire miscalled a ball or strike, based on the true, measured x/y coordinates of the pitch. After a large multiple regression, they found that umpire errors tend to be more favorable for "high status" pitchers -- defined as those with more All-Star appearances, and those who give up fewer walks per game.
For instance, in one of their regressions, the odds of a favorable miscall -- the umpire calling a strike on a pitch that was actually out of the strike zone -- increased by 0.047 for every previous All-Star appearance by the pitcher. (It was a logit regression, but for low-probability events like these, the number itself is a close approximation of the geometric difference. So you can think of 0.047 as a 5 percent increase.)
The pitcher's odds also increased 1.4 percent for each year of service, and another 2.5 percent for each percentage point improvement in career BB/PA.
For unfavorable miscalls -- balls called on pitches that should have been strikes -- the effects were smaller, but still in favor of the better pitchers.
I have some issues with the regression, but will get to those in a future post. For now ... well, it seems to me that even if you accept that these results are correct, couldn't there be other, much more plausible explanations than status bias?
1. Maybe umpires significantly base their decisions on how well the pitcher hits the target the catcher sets up. Good pitchers come close to the target, and the umpire thinks, "good control" and calls it a strike. Bad pitchers vary, and the catcher moves the glove, and the umpire thinks, "not what was intended," and calls it a ball.
The authors talk about this, but they consider it an attribute of catcher skill, or "pitch framing," which they adjust for in their regression. I always thought of pitch framing as the catcher's ability to make it appear that he's not moving the glove as much as he actually is. That's separate from the pitcher's ability to hit the target.
2. Every umpire has a different strike zone. If a particular ump is calling a strike on a low pitch that day, a control pitcher is more able to exploit that opportunity by hitting the spot. That shows up as an umpire error in the control pitcher's favor, but it's actually just a change in the definition of the strike zone, applied equally to both pitchers.
3. The study controlled for the pitch's distance from the strike zone, but there's more to pitching than location. Better pitchers probably have better movement on their pitches, making them more deceptive. Those might deceive the umpire as well as the batter.
Perhaps umpires give deceptive pitches the benefit of the doubt -- when the pitch has unusual movement, and it's close, they tend to call it a strike, either way. That would explain why the good pitchers get favorable miscalls. It's not their status, or anything about their identity -- just the trajectory of the balls they throw.
4. And what I think is the most important possibility: the umpires are Bayesian, trying to maximize their accuracy.
Start with this. Suppose that umpires are completely unbiased based on status -- in fact, they don't even know who the pitcher is. In that case, would an All-Star have the same chance of a favorable or unfavorable call as a bad pitcher? Would the data show them as equal?
I don't think so.
There are times when an umpire isn't really sure about whether a pitch is a ball or a strike, but has to make a quick judgment anyway. It's a given that "high-status" control pitchers throw more strikes overall; that's probably also true in those "umpire not sure" situations.
Let's suppose a borderline pitch is a strike 60% of the time when it's from an All-Star, but only 40% of the time when it's from a mediocre pitcher.
If the umpire is completely unbiased, what should he do? Maybe call it a strike 50% of the time, since that's the overall rate.
But then: the good pitcher will get only five strike calls when he deserves six, and the bad pitcher will get five strike calls when he only deserves four. The good pitcher suffers, and the bad pitcher benefits.
So, unbiased umpires benefit mediocre pitchers. Even if umpires were completely free of bias, the authors' methodology would nonetheless conclude that umpires are unfairly favoring low-status pitchers!
Of course, that's not what's happening, since in real life, it's the better pitchers who seem to be benefiting. (But, actually, that does lead to a fifth (perhaps implausible) possibility for what the authors observed: umpires are unbiased, but the *worse* pitchers throw more deceptive pitches for strikes.)
So, there's something else happening. And, it might just be the umpires trying to improve their accuracy.
Our hypothetical unbiased umpire will have miscalled 5 out of 10 pitches for each player. To reduce his miscall rate, he might change his strategy to a Bayesian one.
Since he understands that the star pitcher has a 60% true strike rate in these difficult cases, he might call *all* strikes in those situations. And, since he knows the bad pitcher's strike rate is only 40%, he might call *all balls* on those pitches.
That is: the umpire chooses the call most likely to be correct. 60% beats 40%.
With that strategy, the umpire's overall accuracy rate improves to 60%. Even if he has no desire, conscious or unconscious, to favor the ace for the specific reason of "high status", it looks like he does -- but that's just a side-effect of a deliberate attempt to increase overall accuracy.
In other words: it could be that umpires *consciously* take the history of the pitcher into account, because they believe it's more important to minimize the number of wrong calls than to spread them evenly among different skills of pitcher.
That could just as plausibly be what the authors are observing.
How can the ump improve his accuracy without winding up advantaging or disadvantaging any particular "status" of pitcher? By calling strikes in exactly the proportion he expects from each. For the good pitcher, he calls strikes 60% of the time when he's in doubt. For the bad pitcher, he calls 40% strikes.
That strategy increases his accuracy rate only marginally -- from 50 percent to 52 percent (60% squared plus 40% squared). But, now, at least, neither pitcher can claim he's being hurt by umpire bias.
But, even though the result is equitable, it's only because the umpire DOES have a "status bias." He's treating the two pitchers differently, on the basis of their historical performance. But King and Kim's study won't be able to tell there's a bias, because neither pitcher is hurt. The bias is at exactly the right level.
Is that what we should want umpires to do, bias just enough to balance the advantage with the disadvantage? That's a moral question, rather than an empirical one.
Which are the most ethical instructions to give to the umpires?
Make what you think is the correct call, on a "more likely than not" basis, *without* taking the pitcher's identity into account.
Advantages: No "status bias." Every pitcher is treated the same.
Disadvantages: The good pitchers wind up being disadvantaged, and the bad pitchers advantaged. Also, overall accuracy suffers.
Make what you think is the correct call, on a "more likely than not" basis, but *do* take the pitcher's identity into account.
Advantages: Maximizes overall accuracy.
Disadvantages: The bad pitchers wind up being disadvantaged, and the good pitchers advantaged.
Make what you think is the most likely correct call, but adjust only slightly for the pitcher's identity, just enough that, overall, no type of pitcher is either advantaged or disadvantaged.
Advantages: No pitcher has an inherent advantage just because he's better or worse.
Disadvantages: Hard for an umpire to calibrate his brain to get it just right. Also, overall accuracy not as good as it could be. And, how do you explain this strategy to umpires and players and fans?
Which of the three is the right answer, morally? I don't know. Actually, I don't think there necessarily is one -- I think any of the three is fair, if understood by all parties, and applied consistently. Your opinion may vary, and I may be wrong. But, that's a side issue.
Getting back to the study: the fact that umpires make more favorable mistakes for good pitchers than bad pitchers is not, by any means, evidence that they are unconsciously biased against pitchers based on "status." It could just as easily be one of several other, more plausible reasons.
So that's why I don't accept the study's conclusions.
There's also another reason -- the regression itself. I'll talk about that next post.
(Hat tip: Charlie Pavitt)