Monday, August 30, 2021

Are umpires racially biased? A 2021 study (Part I)

Are MLB umpires racially biased? There's a new study that claims they are. The author, who wrote it as an undergrad thesis, mentioned it on Twitter, and when I checked a week or so later, there were lots of articles and links to it. (Here, for instance, is a Baseball Prospectus post reporting on it. And here's a Yahoo! report.)

The study tried to figure out whether umpires make more bad calls against batters* of a race other than theirs (where there is no "umpire-batter match," or "UBM," as the literature calls it). It ran regressions on called pitches from 2008 to 2020, to figure out how best to predict the probability of the home-plate umpire calling a pitch incorrectly (based on MLB "Gameday" pitch location). The author controlled for many different factors, and found a statistically significant coefficient for UBM, concluding that the pitcher gains an advantage when the umpire is of the same race. It also argues that white umpires in particular "could be the driving force behind discrimination in MLB."

I don't think any of that is right. I think the results point to something different, and benign. 


Imagine a baseball league where some teams are made up of dentists, and the others of jockeys. The league didn't hire any umpires, so the players take turns, and promise to call pitches fairly.

They play a bunch of games, and it turns out that the umpires call more strikes against the dentists than against the jockeys. Nobody is surprised -- jockeys are short, and thus have small strike zones.

Sure enough, if you look only at the jockey umpires, the data show that they call far fewer strikes against batters of their own group than against batters of the other group. Their "UBM" coefficient is high and statistically significant.

Does that mean the jockey umps are "racist" against dentists? No, of course not. It's just that the dentists have bigger strike zones. 

It's the same, but in reverse, for the dentist umpires. They call more strikes against their fellow dentists -- again, not because of pro-jockey "reverse racism," but because of the different strike zones.

Later, teams of NBA players enter the league. These guys are tall, with huge strike zones, so they get a lot of called strikes, even from their own umpires.

Let's put some numbers on this: we'll say there are 10 teams of dentists, 1 team of jockeys, and 2 teams of NBA players. The jockeys are -10 in called strikes compared to average, and the NBA players are +10. That leaves the dentists at -1 (in order for the average to be zero).

Here's a chart that shows every umpire is completely fair and unbiased. 

Umpire:            Jockey     NBA   Dentist
Jockey batter         -10     -10       -10
NBA batter            +10     +10       +10
Dentist batter         -1      -1        -1

The "UBM" cells, where the umpire matches the batter, are the ones on the diagonal. If you look only at those cells, and don't think too much about what's going on, you could think the umpires are horribly biased. The Jockey batters get 10 fewer strikes than average from Jockey umpires! That's awful!

But then when you look closer, you see the horizontal row is *all* -10. That means all the umpires called the jockeys the same way (-10), so it's probably something about the jockey batters that made that happen. In this case, it's that they're short.
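A minimal Python sketch of the arithmetic behind that chart. The variable names (`teams`, `offsets`, `fair_table`) are made up for illustration, and it assumes, as the example does, that the average is weighted by number of teams and that every umpire calls a given batter group the same way.

```python
from fractions import Fraction

# Number of teams in each group, as given in the text.
teams = {"Jockey": 1, "NBA": 2, "Dentist": 10}

# Called strikes relative to average; the dentists' value is solved for below.
offsets = {"Jockey": Fraction(-10), "NBA": Fraction(10)}

# Choose the dentist offset so the team-weighted average is zero.
known_total = sum(teams[g] * offsets[g] for g in offsets)
offsets["Dentist"] = -known_total / teams["Dentist"]

# A fair umpire calls each batter group the same way no matter who is umpiring,
# so every column of the chart is a copy of the batter offsets.
groups = ["Jockey", "NBA", "Dentist"]
fair_table = {batter: {ump: offsets[batter] for ump in groups} for batter in groups}

for batter in groups:
    row = "  ".join(f"{int(fair_table[batter][ump]):+4d}" for ump in groups)
    print(f"{batter + ' batter':<16}{row}")
```

Each printed row is constant across umpires, which is the signature of a batter effect rather than an umpire effect.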

I think this is what's going on in the actual study. But it's harder to see, because the chart isn't set up with the raw numbers. The author ran different regressions for the three different umpire races, and set a different set of batters as the zero-level for each. Since they're calibrated to a different standard of player, the results make the umpires look very different.

If I had done here what the author did there, the chart above would have looked like this:

Umpire:            Jockey     NBA   Dentist
Jockey batter           0     -20        -9
NBA batter            +20       0       +11
Dentist batter         +9     -11         0

If you just look at this chart without knowing you can't compare the columns to each other (because each is based on a different zero baseline), it's easy to think there's evidence of bias. You'd look at the chart and say, "Hey, it looks like Jockey umpires are racist against NBA batters and dentists. Also, dentist umpires are racist against NBA players but favor Jockeys somewhat. But, look! NBA umpires actually *favor* other races! That's probably because NBA umpires are new to the league, and are going out of their way to appear unbiased."
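The shift from the fair chart to the zero-diagonal chart can be sketched in a few lines of Python. This illustrates the re-baselining idea on the toy numbers only; the function name `rebaseline_on_diagonal` is made up, and it is not the study's actual procedure.

```python
groups = ["Jockey", "NBA", "Dentist"]

# Fair chart, indexed fair[batter][umpire]: every umpire calls a batter group identically.
offsets = {"Jockey": -10, "NBA": 10, "Dentist": -1}
fair = {b: {u: offsets[b] for u in groups} for b in groups}

def rebaseline_on_diagonal(table, groups):
    """Shift each umpire column so the umpire's own group reads zero."""
    return {
        b: {u: table[b][u] - table[u][u] for u in groups}
        for b in groups
    }

shifted = rebaseline_on_diagonal(fair, groups)
for b in groups:
    print(f"{b + ' batter':<16}" + "".join(f"{shifted[b][u]:+6d}" for u in groups))
```

Within any single column, the gaps between rows are the same as in the fair chart. Only the comparison across columns changes, because each column has had a different constant subtracted.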

That's a near-perfect analogue to the actual study.  This is the top half of Table 8, which measures "over-recognition" of pitchers, meaning balls incorrectly called as strikes (hurting the batter). I've multiplied everything by 1000, so the numbers are "wrong strike calls per 1000 called pitches outside the zone".

Umpire:            Black   Hispanic   White
Black batter        ---      -5.3     -0.3
Hispanic batter    +7.8       ---     +5.9
White batter       +5.6      -4.4      ---

It's very similar to my fake table above, where the dentists and Jockeys look biased, but the NBA players look "reverse biased."

The study notes the chart and says,

"For White umpires, the results suggest that for pitches outside the zone, Hispanic batters ... face umpire discrimination. [But Hispanic umpires have a] "reverse-bias effect ... [which] holds for both Black and White batters... Lastly, the bias against non-Black batters by Black umpires is relatively consistent for both Hispanic and White batters."

And it rationalizes the apparent "reverse racism" from Hispanic umpires this way:

"This is perhaps attributable to the recent increase in MLB umpires from Hispanic countries, who could potentially fear the consequences of appearing biased towards Hispanic players."

But ... no. The apparent pattern is almost entirely an artifact of setting a different zero level for each umpire/batter race -- in other words, of arbitrarily setting the diagonal to zero. That only works if the groups of batters are exactly the same. They're not. Just as Jockey batters have different characteristics than NBA player batters, it's likely that Hispanic batters don't have exactly the same characteristics as White and Black batters.

The author decided that White, Black, and Hispanic batters all should get exactly the same results from an unbiased umpire. If that assumption is false, the effect disappears. 

Instead, the study could have made a more conservative assumption: that unbiased umpires of any race should call *White* batters the same. (Or Black batters, or Hispanic batters. But White batters have the largest sample size, giving the best signal-to-noise ratio.)

That is, use a baseline where the bottom row is zero, rather than one where the diagonal is zero. To do that, take the original, set the bottom cells to zero, but keep the differences between any two rows in the same column:

Umpire:            Black   Hispanic   White
Black batter       -5.6      -0.9     -0.3
Hispanic batter    +2.2      +4.4     +5.9
White batter        ---       ---      ---

Does this look like evidence of umpire bias? I don't think so. For any given race of batter, all three groups of umpires call about the same number of bad strikes. In fact, all three groups of umpires even rank the batter groups in the same *order*: Hispanic the most, White second, and Black third. (The raw odds of that happening by chance are 1 in 36.)
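Here is a minimal Python sketch that redoes the White-batter baseline from the Table 8 numbers quoted above, checks the ordering within each umpire column, and counts the 1-in-36 figure. The "---" cells are entered as 0.0, and the odds calculation assumes each umpire group's ordering is an independent, uniformly random pick among the six possible orderings. The function and variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

races = ["Black", "Hispanic", "White"]

# Top half of Table 8 as quoted above (x1000), indexed table8[batter][umpire].
# The "---" diagonal cells are each regression's zero level, entered here as 0.0.
table8 = {
    "Black":    {"Black": 0.0, "Hispanic": -5.3, "White": -0.3},
    "Hispanic": {"Black": 7.8, "Hispanic": 0.0,  "White": 5.9},
    "White":    {"Black": 5.6, "Hispanic": -4.4, "White": 0.0},
}

def rebaseline_on_row(table, base):
    """Shift each umpire column so the `base` batter row reads zero."""
    return {
        b: {u: round(table[b][u] - table[base][u], 1) for u in races}
        for b in races
    }

white_zero = rebaseline_on_row(table8, "White")

# Rank batter groups within each umpire column, most wrong strikes first.
orders = {
    u: tuple(sorted(races, key=lambda b: white_zero[b][u], reverse=True))
    for u in races
}

# Chance that three independent, uniformly random orderings all agree.
all_orders = list(permutations(races))
triples = list(product(all_orders, repeat=3))
agree = sum(1 for t in triples if t[0] == t[1] == t[2])
p_same = Fraction(agree, len(triples))

print(orders)
print(p_same)
```

The independence assumption is generous, since the three columns share the same batters, so the 1-in-36 figure is best read as a rough benchmark rather than a formal test.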

The only anomaly is that maybe it looks like there's some evidence that Black umpires benefit Black batters by about 5 pitches per 1,000, but even that difference is not statistically significant. 

In other words: the entire effect in the study disappears when you remove the hidden assumption that Hispanic batters respond to pitches exactly the same way as White or Black batters. And the pattern of "discrimination" is *exactly* what you'd expect if the Hispanic batters respond to pitches in ways that result in more errors -- that is, it explains the anomaly that Hispanic umpires tend to look "reverse racist."

Also, I think the entire effect would disappear if the author had expanded his regression to include dummy variables for the race of the batter.  
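To see why batter dummies could matter, go back to the toy league, where every umpire is fair by construction. The sketch below illustrates the idea and is not the study's regression. It assumes umpiring duty is shared in proportion to the number of teams. It first compares the average of the matched cells to the average of the unmatched cells, which is what a lone UBM dummy would pick up, and then repeats the comparison after subtracting each batter group's own average, which is roughly what adding batter dummies does.

```python
from fractions import Fraction

teams = {"Jockey": 1, "NBA": 2, "Dentist": 10}
offsets = {"Jockey": Fraction(-10), "NBA": Fraction(10), "Dentist": Fraction(-1)}
groups = list(teams)

# Assumption: batters and umpires both appear in proportion to team counts,
# so the (batter, umpire) cell gets weight teams[batter] * teams[umpire].
# Umpires are fair, so each cell's value depends on the batter group only.
cells = [
    (b, u, teams[b] * teams[u], offsets[b])
    for b in groups for u in groups
]

def weighted_mean(rows):
    total_weight = sum(w for _, _, w, _ in rows)
    return sum(w * v for _, _, w, v in rows) / total_weight

def ubm_gap(rows):
    """Matched-cell average minus unmatched-cell average."""
    matched = [r for r in rows if r[0] == r[1]]
    unmatched = [r for r in rows if r[0] != r[1]]
    return weighted_mean(matched) - weighted_mean(unmatched)

naive_gap = ubm_gap(cells)

# "Batter dummies": subtract each batter group's own weighted average first.
batter_mean = {b: weighted_mean([r for r in cells if r[0] == b]) for b in groups}
adjusted = [(b, u, w, v - batter_mean[b]) for b, u, w, v in cells]
controlled_gap = ubm_gap(adjusted)

print(float(naive_gap), float(controlled_gap))
```

Under these assumptions the naive gap comes out to about -1.76 called strikes even though no umpire is biased, and it drops to exactly zero once each batter group is measured against its own average.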


If, like me, you find it perfectly plausible that Hispanic batters respond to pitches in ways that generate more umpire errors, you can skip this section. If not, I will try to convince you.

First, keep in mind that it's a very, very small difference we're talking about: maybe 4 pitches per 1,000, or 0.4 percent. Compare that to some of the other, much larger effects the study found:

 +8.9%   3-0 count on the batter
 -0.9%   two outs
 +2.8%   visiting team batting
 -3.3%   right-handed batter
 +0.5%   right-handed pitcher
+19.7%   bases loaded (!!!)
 +1.4%   pitcher 2 WAR vs. 0 WAR
 +0.9%   pitcher has two extra all-star appearances
 +4.0%   2019 vs. 2008
 +0.4%   batter is Hispanic

I wouldn't have expected most of those other effects to exist, but they do. And they're so large that they make this one, at only +0.4%, look unremarkable. 

Also: with so many large effects found in the study, there are probably other factors the author didn't consider that are just as large. Just to make something up ... since handedness of pitcher and batter are so important, suppose that platoon advantage (the interaction between pitcher and batter hand, which the study didn't include) is worth, say, 5%. And suppose Hispanic batters have the platoon advantage, say, 8 percentage points less often than White batters. That would give you a 0.4% effect right there.
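The back-of-the-envelope arithmetic, as a tiny Python sketch. Both inputs are the made-up figures from the paragraph above, not estimates, and the sketch reads the 8% as 8 percentage points of plate appearances.

```python
# Both inputs are the made-up figures from the text, not estimates.
platoon_effect = 0.05  # assumed effect of the platoon advantage on the wrong-call rate
platoon_gap = 0.08     # assumed shortfall in how often Hispanic batters have the advantage

# Effect per affected plate appearance times the share of plate appearances affected.
implied_difference = platoon_effect * platoon_gap
print(f"{implied_difference:.1%}")
```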

I don't have data specifically for Hispanic batters, but I do have data for country of birth. Not all non-USA players are Hispanic, but probably a large subset are, so I split them up that way. Here are batting-handedness stats for players from 1969 to 2016:

Born in USA:       61.7% RHB
Born outside USA:  67.1% RHB

That's a difference of 5.4 percentage points in handedness, or roughly 9% in relative terms. I don't know how that translates into platoon advantage, but it's got to be the same order of magnitude as what we'd need for 0.4%.

Here's another theory. They used to say, about prospects from the Dominican Republic, that they deliberately become free swingers because "you can't walk off the island."  

Suppose that, knowing a certain player is a free swinger, the pitcher aims a bit farther outside the strike zone than usual, figuring the batter is likely to swing anyway. If the catcher sets a target outside, and the pitcher hits it perfectly, the umpire may be more likely to miscall it as a strike (at least according to many broadcasters I've heard).

Couldn't that explain why Hispanic players get very slightly more erroneous strike calls? 

In support of that hypothesis, here are K/BB ratios for that same set of batters (total K divided by total BB):

Born in USA:       1.82 K per BB
Born outside USA:  2.05 K per BB 

Again, that seems around the correct order of magnitude.

I'm not saying these are the right explanations -- they might be right, or they might not. The "right answer" is probably several factors, perhaps going different directions, but adding up to 0.4%. 

But the point is: there do seem to be significant differences in hitting styles between Hispanic and non-Hispanic batters, certainly significant enough that a 0.4% difference in bad calls is quite plausible. Attributing the entire 0.4% to racist umpires (and assuming that all races of umpires would have to discriminate against Hispanics!) doesn't have any justification whatsoever -- at least not without additional evidence.


Here's a TLDR summary, with a completely different analogy this time:

Eddie Gaedel's father calls fewer strikes on Eddie Gaedel than Aaron Judge's father calls on Aaron Judge. So Gaedel Sr. must be biased! 


There's another part of the study -- actually, the main part -- that throws everything into one big regression and still comes out with a significant "UBM" effect, which again it believes is racial bias. I think that conclusion is also wrong, for reasons that aren't quite the same. 

That's Part II, which is now here.


(*The author found a similar result for pitchers, who gained an advantage in more called strikes when they were the same race as the umpire, and a similar result for called balls as well as called strikes. In this post, I'll just talk about the batting side and the called strikes, but the issues are the same for all four combinations of batter/pitcher ball/strike.)



At Saturday, September 04, 2021 10:37:00 AM, Blogger Guy said...

Another theory on the higher rate of wrong calls for Hispanic batters: The main systemic factor determining a high error rate is count. We know that umps call a far larger zone in hitters' counts than pitchers' counts. My guess is that Hispanic batters have a higher swing rate, i.e. they take fewer pitches, and that likely means the pitches they *do* take are disproportionately in hitters' counts (while swinging at the vast majority of 2-strike pitches). If so, that will naturally result in a higher rate of wrongly-called strikes as a proportion of called pitches (though perhaps a smaller absolute number).

At Saturday, September 04, 2021 12:18:00 PM, Blogger Phil Birnbaum said...

Hi, Guy,

The regression controlled for balls and strikes (separate dummy variables for balls and strikes), so if that's part of it, it could be from the interaction, where "3-0" is more than the sum of the 3 balls coefficient and the 0 strikes coefficient.

