I'm scheduled to talk about umpires and racial bias in a couple of weeks at JSM in Miami. I was hoping not to have to repeat the same old things I've been talking about for the last few years, so I decided to see if there's anything new I could find. And I think I've got something, maybe. Well, I thought I had something, and it's interesting, but I now think it might be a false alarm with respect to umpires and race.

First, a quick review (and I promise it'll be quick). The Hamermesh study of racial bias (.pdf) rests on a pattern that looks like this:

Pitcher ------ Whte  Hspn  Blac
-------------------------------
White Umpire-- 31.88 31.27 31.27
Hspnc Umpire-- 31.41 32.47 28.29
Black Umpire-- 31.22 31.21 32.52
-------------------------------
All Umpires -- 31.83 31.30 31.32

The numbers are the percentage of called pitches (not swung at) that the umpire called a strike; columns are the race of the pitcher, rows the race of the umpire. This chart is based only on low-attendance games (fewer than 30,000 fans), where the study's authors found the strongest effect. It's my attempt to reproduce their results from Retrosheet data (the authors didn't publish an equivalent chart).

If you look at the chart, you will see that, for any race of pitcher or umpire, the highest called-strike percentage comes exactly when the race of the umpire matches the race of the pitcher. The original study ran a big regression and found that this result is indeed statistically significant. They concluded that umpires are biased in favor of pitchers of their own race (or biased against pitchers of a different race).
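The "biggest number lands on the matching cell" pattern is easy to check mechanically. Here's a minimal Python sketch, with the chart typed in by hand; the names are mine, and it checks only the ordering, not statistical significance:

```python
# Called-strike percentages from the chart above (games with fewer than 30,000 fans).
# Rows: umpire race; columns: pitcher race, both in the order White, Hispanic, Black.
CALLED_STRIKE_PCT = [
    [31.88, 31.27, 31.27],  # white umpires
    [31.41, 32.47, 28.29],  # hispanic umpires
    [31.22, 31.21, 32.52],  # black umpires
]


def own_race_is_max(table):
    """For each row and each column, report whether the largest entry is the
    diagonal cell (umpire race == pitcher race)."""
    n = len(table)
    rows_ok = [max(range(n), key=lambda j: table[i][j]) == i for i in range(n)]
    cols_ok = [max(range(n), key=lambda i: table[i][j]) == j for j in range(n)]
    return rows_ok, cols_ok


rows_ok, cols_ok = own_race_is_max(CALLED_STRIKE_PCT)
print("rows:", rows_ok)     # rows: [True, True, True]
print("columns:", cols_ok)  # columns: [True, True, True]
```

The same function works on any of the 3x3 charts in this post.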

That's all I'm going to say about that; if you want to see my arguments, you can go here. Now, I'm going to take a different route.

Let's start by ignoring umpires for now, and just looking at the pitchers. The bottom row of the chart shows the overall called strike percentage of the pitchers. Let me repeat them here for clarity:

31.83 percent strikes -- white pitchers

31.30 percent strikes -- hispanic pitchers

31.32 percent strikes -- black pitchers

It looks like there are real differences between the pitchers. Now, it's *possible* that the entire effect is actually caused by biased umpires, but nobody really believes that, including the authors of the original study. Different pitchers have different attributes, and it's probably just that the white pitchers are such that they happen to throw more called strikes than the minority pitchers.

Moreover, it would appear that the white pitchers happen to be *better* than the minority pitchers, since their strike percentage is higher. In fact, I think I've said a few times in the past that the white pitchers were more successful.

I was wrong. Actually, it's the minority pitchers who performed better, *despite* the fact that their called pitches were less likely to be strikes.

Here are the opposition batting records for each of the three groups of pitchers, normalized to 600 PA:

Pitchers .  AB   H 2B 3B HR BB   K   AVG RC27
White .... 543 147 30  3 17 51  99 0.271 5.02
Hispanic . 541 141 28  3 17 53 108 0.261 4.71
Black .... 546 145 28  3 14 48 106 0.266 4.57

The white pitchers performed the worst, striking out fewer batters and allowing more hits and runs. The last column of the batting record is "runs created per 27 outs."

What's going on? How is it that the minority pitchers did so much better despite having fewer called strikes? My first reaction was this: perhaps the relationship between called strikes and performance is *negative*. That is, maybe having lots of called strikes means you're throwing lots of pitches right down the middle of the plate, and you're getting hammered. Logically possible, right?

But it doesn't seem to be true. I ran a regression of Component ERA vs. called strike percentage for starting pitchers with 100 IP or more, and the relationship goes the way you'd think: the more called strikes, the lower the ERA and the more successful the pitcher. In fact, it's a pretty strong relationship: every 1 percentage point of called strike percentage (for example, from 31.83 percent to 32.83 percent) lowers ERA by 0.11. That's almost exactly what you'd expect, knowing that the difference between a ball and a strike is approximately .14 runs.
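A back-of-the-envelope check of the ball/strike arithmetic, as a Python sketch. It is not the regression itself, and the inputs are assumptions I'm supplying: roughly 145 pitches per nine innings (an assumed figure), about 45 percent of them swung at (the rate in the data here), and .14 runs per ball-to-strike swap. The function name is hypothetical.

```python
def era_change_per_point(pitches_per_9=145.0, swing_pct=45.0,
                         run_value=0.14, delta_points=1.0):
    """Rough runs saved per nine innings when called-strike percentage rises
    by `delta_points` percentage points.

    Assumptions (illustrative, not from the original analysis):
      pitches_per_9 -- total pitches thrown per nine innings
      swing_pct     -- share of pitches swung at, in percent
      run_value     -- runs saved by turning a ball into a strike
    """
    called_per_9 = pitches_per_9 * (1 - swing_pct / 100)  # pitches taken
    flipped = called_per_9 * delta_points / 100            # balls turned into strikes
    return flipped * run_value


print(round(era_change_per_point(), 3))                  # 0.112
print(round(era_change_per_point(delta_points=0.1), 4))  # 0.0112
```

Under these assumptions, one full percentage point of called strikes flips about 0.8 pitches per nine innings from ball to strike, worth about 0.11 runs; a tenth of a point is worth only about 0.01.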

So how is it that those pitchers bucked the relationship, and had a better performance despite fewer called strikes?

I think I was able to find the answer: they compensated by getting more pitches swung at. As it turns out, swings help too: an increase of one percentage point in pitches swung at lowers ERA by 0.13 points.

Here are the numbers for pitches swung at:

44.99 percent pitches swung at -- White

45.52 percent pitches swung at -- Hispanic

46.84 percent pitches swung at -- Black

These are large differences, at least as big as the differences in called strike percentage.

(By the way, keep in mind that the denominators of the two measures are different. Pitches swung at is (swung at and missed + foul balls + put in play) divided by total pitches. Called strike percentage is (called strikes) / (called strikes + balls).)
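Because the two rates use different denominators, they're easy to mix up. Here's a minimal Python sketch of both definitions, using hypothetical field names (not Retrosheet's pitch codes) and ignoring edge cases like hit batsmen and pitchouts:

```python
from dataclasses import dataclass


@dataclass
class PitchCounts:
    """Pitch-outcome tallies for one pitcher or group (illustrative fields)."""
    called_strikes: int
    balls: int
    swinging_misses: int
    fouls: int
    in_play: int

    @property
    def total(self) -> int:
        return (self.called_strikes + self.balls + self.swinging_misses
                + self.fouls + self.in_play)

    def swing_pct(self) -> float:
        # Denominator: ALL pitches.
        swings = self.swinging_misses + self.fouls + self.in_play
        return 100 * swings / self.total

    def called_strike_pct(self) -> float:
        # Denominator: only the pitches NOT swung at.
        taken = self.called_strikes + self.balls
        return 100 * self.called_strikes / taken


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
p = PitchCounts(called_strikes=32, balls=68, swinging_misses=25, fouls=40, in_play=35)
print(p.swing_pct(), p.called_strike_pct())  # 50.0 32.0
```

Note that a pitch swung at drops out of the called-strike denominator entirely, so the two percentages can move independently.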

Here's the same 3x3 chart as earlier, but this time using swing percentage:

Pitcher ------ Whte  Hspn  Blac
-------------------------------
White Umpire-- 45.01 45.51 46.92
Hspnc Umpire-- 44.63 45.61 43.59
Black Umpire-- 44.77 46.05 46.82
-------------------------------
All Umpires -- 44.99 45.52 46.84

Just like in the original chart, reading across each umpire's row, the numbers are highest when the umpire's race matches the pitcher's race (with the exception of white umpires, who saw the most swings against black pitchers).

Now, I suppose you could argue that these differences, also, could be attributed to umpire bias. It's possible that, knowing that more umpires are biased against them, minority pitchers have to throw down the middle to compensate. That results in batters swinging the bat more.

The problem with that theory is that the minority pitchers *improved* under this (alleged) injustice. If it's really racist bias, shouldn't they have gotten worse? Because, if the racism actually made them compensate in such a way that they got better, why wouldn't they compensate all the time, not just for umpires of the opposite race?

If you want to hold on to the hypothesis that it's umpire bias, you have to assume that the bias backfired, and that the pitchers, in their ignorance, didn't realize that there was a way to pitch better than they were already pitching. That seems farfetched.

So, the minority pitchers have a *lower* percentage of called strikes, but a *higher* percentage of pitches swung at. When I saw that, I thought it might be normal: the more batters swing, the fewer strikes remain to be called by the umpire. But, again, that turns out not to be the case. There's a clear positive relationship between called strike percentage and swing percentage, with a correlation coefficient of .23 (this is for 1,350 starting pitcher seasons of 100+ IP, 2000-2009).

Why, then, are the black and hispanic pitchers bucking the trend? The only thing I can think of is that even though the correlation between called strikes and swinging strikes is positive, maybe there are certain types of pitchers who go the opposite way. For instance, maybe there are three types of pitchers:

1. Pitchers who throw right down the middle. They get a lot of swings, and, when the batter doesn't swing, it's very likely to be a strike.

2. Pitchers with poor control. They don't get a lot of swings, and, when the batter doesn't swing, it's likely to be a ball.

3. Pitchers who normally throw right down the middle, but like to waste pitches frequently (or throw a certain type of pitch that sometimes goes awry). They get a lot of swings, but, when the batter doesn't swing, it's one of those waste pitches and likely to be a ball.

Types 1 and 2 would show a positive correlation between swings and called strikes. Type 3 would show a negative correlation. If there are a lot more types 1 and 2 than type 3, the overall correlation would be positive.

So, maybe black and hispanic pitchers are more likely to be Type 3. Any other explanations?

---------

BTW, my first reaction was that this all had to do with the count. In "Scorecasting," the authors found that umpires were reluctant to call a third strike or a fourth ball on a close pitch. That would explain the observations perfectly, like this: the minority pitchers get more strikeouts, so they throw more two-strike pitches. Therefore, they get more batters swinging on those pitches, and also fewer called strikes on those pitches. That's enough to give us the results we saw.

Alas, the beautiful theory doesn't hold up. I reran the tables, but looking only at 0-0 pitches. Again, (a) the minority pitchers had more swings, and (b) on the remaining pitches, the minority pitchers got fewer called strikes. Numbers available on request.
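Restricting to first pitches, as in that rerun, just means filtering on the count before computing the two rates. A minimal Python sketch; the `Pitch` record layout and its one-letter outcome codes are hypothetical, not Retrosheet's format:

```python
from typing import NamedTuple


class Pitch(NamedTuple):
    """One pitch (hypothetical layout)."""
    balls: int    # balls in the count before this pitch
    strikes: int  # strikes in the count before this pitch
    outcome: str  # 'C' called strike, 'B' ball, 'S' swing (miss, foul, or in play)


def rates(pitches, count=None):
    """Swing % (of all pitches) and called-strike % (of pitches taken),
    optionally restricted to a single (balls, strikes) count.
    Assumes at least one pitch was taken."""
    if count is not None:
        pitches = [p for p in pitches if (p.balls, p.strikes) == count]
    swings = sum(p.outcome == "S" for p in pitches)
    called = sum(p.outcome == "C" for p in pitches)
    balls = sum(p.outcome == "B" for p in pitches)
    return 100 * swings / len(pitches), 100 * called / (called + balls)


pitches = [
    Pitch(0, 0, "C"), Pitch(0, 0, "B"), Pitch(0, 0, "S"), Pitch(0, 0, "S"),
    Pitch(1, 2, "B"), Pitch(0, 2, "S"),
]
print(rates(pitches, count=(0, 0)))  # (50.0, 50.0)
```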

So what is it that the minority pitchers have in common that gives them this unusual combination of low called strikes and high swinging strikes? I don't know, but I bet someone reading this can tell me.

For the ten black pitchers in the study, I looked at their tendencies from 2000 to 2009 (even though the study was only 2004 to 2006). The difference between their swinging strike percentage and their called strike percentage was 16.04, well above the average of 13.60. What is it about them, as a group, that would explain that?

Arthur Rhodes

CC Sabathia

Darren Oliver

Dontrelle Willis

Edwin Jackson

Ian Snell

Jerome Williams

LaTroy Hawkins

Ray King

Tom Gordon

I'd give you the hispanic pitchers -- I think there are about 30 of them -- but I don't have a list handy.

In any case, and getting back to the issue of umpire bias ...

This is where the false alarm comes in. When I saw that a higher called strike percentage means different things for different pitchers, I thought we might have an explanation: rather than the umpires calling more unmerited strikes, maybe it was just those pitchers pursuing a different strategy. Maybe they were occasionally deciding to pitch how the average white pitcher does -- whatever that is -- and getting more called strikes, but without a change in performance.

Alas, that's not true. *Between* races of pitchers, increased called strike percentage didn't mean better performance. But *within* races of pitchers, it did.

Here's the original 3x3 chart, but with RC27 instead of called strike percentage:

Pitcher ------ Whte Hspn Blac
------------------------------
White Umpire-- 4.97 4.77 4.49
Hspnc Umpire-- 5.15 4.59 5.88
Black Umpire-- 5.47 4.20 5.39
------------------------------
All Umpires -- 5.02 4.71 4.57

With the exception of the bottom-right cell and the bottom-center cell, the RC27 figures match the order of the called strike figures (see the very first chart of this post). It does seem like, as a characteristic of their style, black and hispanic pitchers successfully sacrifice called strikes in exchange for swinging strikes ... but when they *do* get those called strikes from certain umpires, they do even better.
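The ordering comparison can be made explicit: within each pitcher column, take every pair of umpire groups and ask whether the one with more called strikes also allowed fewer runs created. A short Python sketch, with both charts typed in by hand (the names are mine):

```python
from itertools import combinations

UMPS = ["White", "Hspnc", "Black"]
PITCHERS = ["Whte", "Hspn", "Blac"]
# Rows: umpire race; columns: pitcher race (low-attendance games).
CALLED = [[31.88, 31.27, 31.27],
          [31.41, 32.47, 28.29],
          [31.22, 31.21, 32.52]]
RC27 = [[4.97, 4.77, 4.49],
        [5.15, 4.59, 5.88],
        [5.47, 4.20, 5.39]]


def discordant_pairs(called, rc27):
    """Within each pitcher column, list umpire pairs where MORE called strikes
    did NOT go with FEWER runs created (lower RC27 = better pitching)."""
    out = []
    for j, pitcher in enumerate(PITCHERS):
        for a, b in combinations(range(len(UMPS)), 2):
            more_strikes = called[a][j] > called[b][j]
            fewer_runs = rc27[a][j] < rc27[b][j]
            if more_strikes != fewer_runs:
                out.append((pitcher, UMPS[a], UMPS[b]))
    return out


print(discordant_pairs(CALLED, RC27))
# [('Hspn', 'White', 'Black'), ('Hspn', 'Hspnc', 'Black'), ('Blac', 'White', 'Black')]
```

Every discordant pair involves the black-umpire row in the hispanic and black pitcher columns, that is, the bottom-center and bottom-right cells.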

So, pitchers *do* seem to benefit from extra called strikes, once you control for who the pitcher is. So we still have the same problem we had at the beginning.

That problem, still, appears to be that when the pitcher was hispanic, hispanic umpires called around 40 too many strikes out of 2,864 called pitches.

40 pitches doesn't seem like a lot over three years ... but it's only over the equivalent of about 30 or 40 team-games (1,349 PA). I don't really see an argument for how those 40 pitches could have been miscalled. It can't be anything the original study controlled for ... like home/road, starter/reliever, score, identity of the pitcher, etc. It would have to be an interaction of some of those things. Like, for instance, pitcher A throws a lot of inside sliders, and umpire B likes to call those strikes, and B happened to randomly umpire a lot of A's games.

But I don't see how the numbers work out. It's still 40 pitches in 30 games. With three hispanic umpires and 30 hispanic pitchers, that's 90 possible combinations. Some are more likely than others -- we're only looking at pitchers in front of 30,000 or fewer fans, which concentrates them a bit among certain teams -- but still, 90 combinations over 30 games makes it unlikely that one or two pairs would dominate to the tune of 40 pitches.
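For a rough yardstick of how big an excess like that is relative to chance, for this one cell alone, here's a Python sketch. The baseline rate is an assumption (hispanic pitchers' 31.30 percent across all umpires), and the standard deviation treats every called pitch as an independent coin flip.

```python
import math


def excess_called_strikes(n_called, observed_pct, baseline_pct):
    """Extra called strikes versus a baseline rate, and the binomial SD of the
    strike count if every called pitch were an independent coin flip."""
    p = baseline_pct / 100
    excess = n_called * (observed_pct - baseline_pct) / 100
    sd = math.sqrt(n_called * p * (1 - p))
    return excess, sd


# Hispanic pitchers with hispanic umpires: 2,864 called pitches at 32.47 percent.
# Baseline (an assumption): 31.30 percent, hispanic pitchers across all umpires.
excess, sd = excess_called_strikes(2864, 32.47, 31.30)
print(f"excess = {excess:.1f} pitches, SD = {sd:.1f}, ratio = {excess / sd:.2f}")
# excess = 33.5 pitches, SD = 24.8, ratio = 1.35
```

Against that particular baseline the excess is closer to 34 than 40; the figure shifts with whichever baseline you choose. Even 40 pitches would be only about 1.6 SDs for this cell by itself, and the independence assumption is exactly what the UPDATE at the end of this post questions.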

So, I thought I had an explanation ... but, after all this, I don't think I do. I still suspect that the result is just random, and not racial bias or any other explanation, but ... that's just my opinion.

Still, I need to think some more. Now that we know that more called strikes does not *always* lead to improved performance, and that it depends on the pitcher ... can you see any arguments that I'm missing, for what else might be happening?

UPDATE: OK, one more theory I thought of. Suppose pitcher style varies from game to game. Take, for instance, a hispanic pitcher. Some games, he pitches one way, and gets few called strikes and lots of swinging strikes. Other games, and independently of the umpire, he consciously decides to pitch differently, and he gets more called strikes and fewer swinging strikes.

In that case, pitches are no longer independent -- it's *games* that are independent. That means that you have to use a different statistical technique, like cluster sampling. The bottom line, there, is that the SD goes way up. The results stay the same, but the confidence interval widens and the statistical significance disappears.

So, if there's evidence that pitchers' expected percentages change on a game-by-game basis (that is, the *expectations* have to change due to pitcher behavior, not just the outcome of the game fluctuating because of random variation), that probably negates the statistical significance, which is the only reason to suspect umpire bias.
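To see how much game-level clustering can widen the error bars, here's a Python sketch using Kish's approximate design effect, 1 + (m - 1) * icc, where m is pitches per game and icc is the correlation among pitches in the same game. That's a standard textbook approximation, not necessarily what a full cluster-robust regression would give, and the inputs are illustrative: about 80 called pitches per game (2,864 spread over roughly 35 games) and a few guessed icc values, since the real one is unknown.

```python
import math


def design_effect(pitches_per_game, icc):
    """Kish's approximate variance inflation for cluster sampling."""
    return 1 + (pitches_per_game - 1) * icc


def clustered_se(p, n_pitches, pitches_per_game, icc):
    """Standard error of a called-strike rate, inflated for game clustering."""
    naive = math.sqrt(p * (1 - p) / n_pitches)  # treats every pitch as independent
    return naive * math.sqrt(design_effect(pitches_per_game, icc))


# Illustrative inputs: 2,864 called pitches at about 31.3 percent strikes.
for icc in (0.0, 0.01, 0.05):
    se_points = 100 * clustered_se(0.313, 2864, pitches_per_game=80, icc=icc)
    print(f"icc={icc:.2f}  SE = {se_points:.2f} percentage points")
```

Even a small intra-game correlation of 0.01 widens the standard error by about a third; at 0.05 it more than doubles.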

Hey Phil,

I too will be at the meetings in Miami. Unfortunately, I'll be heading out on Tuesday (thanks to limited funds for a grad student who has already traveled to two other conferences this summer). Any place specific you'll be attending on that Monday or Tuesday?

Hey, Millsy,

I'll let you know closer to the date ... gotta check out the online program and figure out when to fit in some vacation time in between convention days. Keep in touch, we should definitely meet up!

I guess I am not clear on something. When you talk about the 2nd 3x3 chart, which shows that, for example, Hispanics get the highest % of swings when the umpire is Hispanic, that seems to indicate that a lot of pitches are called strikes when the race of the umpire and pitcher match. So a batter might say to himself that he needs to swing more because the umpire might favor the pitcher.

Cy,

Yes, that's right. Yes, when the pitcher was hispanic, batters swung the most when the umpire was also hispanic. And, yes, that might be because the batter "knows" the umpire might favor the pitcher.

The main part of my post, though, was about the fact that batters ALWAYS swing more against hispanic pitchers, regardless of the race of the umpire. That doesn't necessarily help us with the race bias issue, but it's interesting and tells us that swing percentage is just as important to success as called strike percentage.

Thanks

I don't know if there's any statistical truth to this, but I can't help but think when I read about Hispanic pitchers getting an advantage that two of the most extreme guys for pitching on the edges and getting the umpire to extend the size of the plate are Livan Hernandez and Mariano Rivera. Of course, Tom Glavine did that, too, and he's not Hispanic, so maybe it's not the Hispanic equivalent of "you can't walk your way off the island." I have not systematically looked at race with regard to that strategy, so I don't know. (I don't think I even have racial data unless it's in Retrosheet data and I don't know it.)

But in any case, I've wondered how many pitcher-umpire matchups there are for the minority umpires and whether the performance of a few specific pitchers such as Hernandez and Rivera would be enough to bias the results the way we see.

Mike,

In my data, the minority same race match-ups for pitchers and umpires are few and far between. For example, from 2007 through 2010 I find 1,953 called pitches for Black-Black matches and 15,532 for Hispanic-Hispanic. Meanwhile, White-White matches are 848,472.

That's 3.5 years of data--and if we want to control for the batter being non-matched in these situations then the sample sizes shrink considerably as well (probably much more for those of Hispanic descent).

When modeling an umpire strike zone, the method I use (which I think is the best, of course ;-)) requires a good 3,000 pitches at the minimum to get anything worthwhile. Even that is pushing it.

When you're talking about 4 different umpires (and probably 2 or 3 pitchers making up nearly 50% of the 1,953, each with different pitches, handedness and approaches, think Edwin Jackson vs. CC Sabathia) any model is going to have issues.


