Minority pitchers succeed with fewer called strikes
I'm scheduled to talk about umpires and racial bias in a couple of weeks at JSM in Miami. I was hoping not to have to repeat the same old things I've been talking about for the last few years, so I decided to see if there's anything new I could find. And I think I've got something, maybe. Well, I thought I had something, and it's interesting, but I now think it might be a false alarm with respect to umpires and race.
First, a quick review (and I promise it'll be quick). The Hamermesh study of racial bias (.pdf) was based on a chart that looked like this:
Pitcher ------ White Hspnc Black
White Umpire-- 31.88 31.27 31.27
Hspnc Umpire-- 31.41 32.47 28.29
Black Umpire-- 31.22 31.21 32.52
All Umpires -- 31.83 31.30 31.32
The numbers are the percentage of called pitches (pitches not swung at) that the umpire called a strike. This chart is based only on low-attendance games (fewer than 30,000 fans), where the study's authors found the strongest effect. It's my attempt to reproduce their results from Retrosheet data (the authors didn't provide a chart equivalent to this one).
If you look at the chart, you will see that, for any race of pitcher or umpire, the called strike percentage is highest exactly when the race of the umpire matches the race of the pitcher. The original study did a big regression and found that this result is indeed statistically significant. They concluded that umpires are biased in favor of pitchers of their own race (or biased against pitchers of a different race).
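That reading of the chart is easy to check mechanically. Here is a minimal Python sketch that uses only the nine cells from the chart above; the variable and function names are made up for illustration, and it says nothing about statistical significance, which is what the study's regression was for.

```python
# Called strike percentages from the chart above.
# Outer key = umpire race, inner key = pitcher race.
CALLED_STRIKE_PCT = {
    "White": {"White": 31.88, "Hspnc": 31.27, "Black": 31.27},
    "Hspnc": {"White": 31.41, "Hspnc": 32.47, "Black": 28.29},
    "Black": {"White": 31.22, "Hspnc": 31.21, "Black": 32.52},
}


def own_race_cell_is_highest(chart):
    """True if every same-race cell is the maximum of its row and its column."""
    races = list(chart)
    for race in races:
        own = chart[race][race]
        row = [chart[race][p] for p in races]  # same umpire, every pitcher race
        col = [chart[u][race] for u in races]  # same pitcher, every umpire race
        if own < max(row) or own < max(col):
            return False
    return True


print(own_race_cell_is_highest(CALLED_STRIKE_PCT))  # True for this chart
```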
That's all I'm going to say about that; if you want to see my arguments, you can go here. Now, I'm going to take a different route.
Let's start by ignoring umpires for now, and just looking at the pitchers. The bottom row of the chart shows the overall called strike percentage of the pitchers. Let me repeat them here for clarity:
31.83 percent strikes -- white pitchers
31.30 percent strikes -- hispanic pitchers
31.32 percent strikes -- black pitchers
It looks like there are real differences between the pitchers. Now, it's *possible* that the entire effect is actually caused by biased umpires, but nobody really believes that, including the authors of the original study. Different pitchers have different attributes, and it's probably just that the white pitchers are such that they happen to throw more called strikes than the minority pitchers.
Moreover, it would appear that the white pitchers happen to be *better* than the minority pitchers, since their strike percentage is higher. In fact, I think I may have said this a few times in the past, that the white pitchers were more successful.
I was wrong. Actually, it's the minority pitchers who performed better, *despite* the fact that their called pitches were less likely to be strikes.
Here are the opposition batting records for each of the three groups of pitchers, normalized to 600 PA:
Group ....  AB   H 2B 3B HR BB  SO   AVG RC27
White .... 543 147 30  3 17 51  99 0.271 5.02
Hispanic . 541 141 28  3 17 53 108 0.261 4.71
Black .... 546 145 28  3 14 48 106 0.266 4.57
The white pitchers performed the worst, striking out fewer batters and allowing more hits and runs. The last column of the batting record is "runs created per 27 outs."
What's going on? How is it that the minority pitchers did so much better despite having fewer called strikes? My first reaction was this: perhaps the relationship between called strikes and performance is *negative*. That is, maybe having lots of called strikes means you're throwing lots of pitches right down the middle of the plate, and you're getting hammered. Logically possible, right?
But it doesn't seem to be true. I ran a regression of Component ERA vs. Called Strike Percentage for starting pitchers with 100 IP or more, and the relationship goes the way you'd think: the higher the called strikes, the lower the ERA and the more successful the pitcher. In fact, it's a pretty strong relationship: every 0.1 percentage point in called strike percentage (example: from 31.83 percent to 31.93 percent) lowers ERA by 0.11. That's almost exactly what you'd expect knowing that the difference between a ball and a strike is approximately .14 runs.
So how is it that those pitchers bucked the relationship, and had a better performance despite fewer called strikes?
I think I was able to find the answer: they compensated by having more pitches swung at. As it turns out, the benefit of extra pitches swung at is also positive: an increase of 0.1 percentage points lowers ERA by 0.13 points.
Here are the numbers for pitches swung at:
44.99 percent pitches swung at -- White
45.52 percent pitches swung at -- Hispanic
46.84 percent pitches swung at -- Black
These are large differences, more than comparable to the differences in called strike percentage.
(By the way, keep in mind that the denominators of the two measures are different. Pitches swung at is (swung at and missed + foul balls + put in play) divided by total pitches. Called strike percentage is (called strikes) / (called strikes + balls).)
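To make the two denominators concrete, here is a small Python sketch of both measures as defined above. The function names and the pitch counts in the example are hypothetical, chosen only to show the arithmetic; they are not from the study.

```python
def swing_pct(swinging_misses, fouls, in_play, total_pitches):
    """Pitches swung at, as a percentage of ALL pitches."""
    return 100.0 * (swinging_misses + fouls + in_play) / total_pitches


def called_strike_pct(called_strikes, balls):
    """Called strikes, as a percentage of CALLED pitches only (no swings)."""
    return 100.0 * called_strikes / (called_strikes + balls)


# Hypothetical pitcher: 1,000 pitches, 450 swung at, 550 taken.
print(round(swing_pct(100, 170, 180, 1000), 2))   # 45.0
print(round(called_strike_pct(175, 375), 2))      # 31.82
```

In this made-up example the 175 called strikes are 17.5 percent of all pitches but 31.82 percent of called pitches, which is why the two percentages can't be compared on the same scale.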
Here's the same 3x3 chart as earlier, but this time using swinging percentage:
Pitcher ------ White Hspnc Black
White Umpire-- 45.01 45.51 46.92
Hspnc Umpire-- 44.63 45.61 43.59
Black Umpire-- 44.77 46.05 46.82
All Umpires -- 44.99 45.52 46.84
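Somewhat like in the original chart, the numbers tend to be higher when the umpire's race matches the pitcher's race, though with two exceptions: black pitchers got slightly more swings with white umpires (46.92) than with black umpires (46.82), and hispanic pitchers got more swings with black umpires (46.05) than with hispanic umpires (45.61).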
Now, I suppose you could argue that these differences, also, could be attributed to umpire bias. It's possible that, knowing that more umpires are biased against them, minority pitchers have to throw down the middle to compensate. That results in batters swinging the bat more.
The problem with that theory is that the minority pitchers *improved* under this (alleged) injustice. If it's really racist bias, shouldn't they have gotten worse? Because, if the racism actually made them compensate in such a way that they got better, why wouldn't they compensate all the time, not just for umpires of a different race?
If you want to hold on to the hypothesis that it's umpire bias, you have to assume that the bias backfired, and that the pitchers, in their ignorance, didn't realize that there was a way to pitch better than they were already pitching. That seems farfetched.
So, the minority pitchers have a *lower* percentage of called strikes, but a *higher* percentage of swinging strikes. When I saw that, I thought it might be normal: the more batters swing, the fewer strikes remain to be called by the umpire. But, again, that turns out not to be the case. There's a positive relationship between called strike percentage and swinging strike percentage, with a correlation coefficient of .23 (this is for 1,350 starting pitcher seasons of 100+ IP, 2000-2009).
Why, then, are the black and hispanic pitchers bucking the trend? The only thing I can think of is that even though the correlation between called strikes and swinging strikes is positive, maybe there are certain types of pitchers who go the opposite way. For instance, maybe there are three types of pitchers:
1. Pitchers who throw right down the middle. They get a lot of swings, and, when the batter doesn't swing, it's very likely to be a strike.
2. Pitchers with poor control. They don't get a lot of swings, and, when the batter doesn't swing, it's likely to be a ball.
3. Pitchers who normally throw right down the middle, but like to waste pitches frequently (or throw a certain type of pitch that sometimes goes awry). They get a lot of swings, but, when the batter doesn't swing, it's one of those waste pitches and likely to be a ball.
Types 1 and 2 would show a positive correlation between swings and called strikes. Type 3 would show a negative correlation. If there are a lot more types 1 and 2 than type 3, the overall correlation would be positive.
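The three-type idea can be illustrated with a toy simulation. This is only a sketch in Python: the type centers, the spread, and the mix of types are all made-up numbers (they are not estimated from any data and make no attempt to reproduce the .23 figure). It just shows that a population dominated by types 1 and 2 keeps a positive overall correlation even with some type 3 pitchers mixed in.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# (mean swing %, mean called strike %) for each type: illustrative values only.
TYPE_CENTERS = {
    1: (47.0, 33.0),  # down the middle: many swings, many called strikes
    2: (43.0, 30.0),  # poor control: few swings, few called strikes
    3: (47.0, 30.0),  # waste pitches: many swings, few called strikes
}


def simulate(counts, spread=0.5, seed=0):
    """Draw one (swing %, called strike %) point per simulated pitcher."""
    rng = random.Random(seed)
    swings, called = [], []
    for pitcher_type, n in counts.items():
        mean_swing, mean_called = TYPE_CENTERS[pitcher_type]
        for _ in range(n):
            swings.append(rng.gauss(mean_swing, spread))
            called.append(rng.gauss(mean_called, spread))
    return swings, called


r_types_1_2 = pearson_r(*simulate({1: 450, 2: 450}))
r_all_types = pearson_r(*simulate({1: 450, 2: 450, 3: 100}))
print(r_types_1_2, r_all_types)  # both positive; the second is smaller
```

Adding type 3 pitchers pulls the correlation down without flipping its sign, so a positive league-wide correlation doesn't rule out a subgroup that trades called strikes for swings.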
So, maybe black and hispanic pitchers are more likely to be Type 3. Any other explanations?
BTW, my first reaction was that this all had to do with count. In "Scorecasting," the authors found that umpires were reluctant to call a third strike or a fourth ball on a close pitch. That would explain the observations perfectly, like this: The minority pitchers get more strikeouts. So they get more two-strike pitches. Therefore, they get more batters swinging on those pitches, and also fewer called strikes on those pitches. That's enough to give us the results we saw.
Alas, the beautiful theory doesn't hold up. I reran the tables, but looking only at 0-0 pitches. Again, (a) the minority pitchers had more swings, and (b) on the remaining pitches, the minority pitchers got fewer called strikes. Numbers available on request.
So what is it that the minority pitchers have in common that gives them this unusual combination of low called strikes and high swinging strikes? I don't know, but I bet someone reading this can tell me.
For the ten black pitchers in the study, I looked at their tendencies from 2000 to 2009 (even though the study was only 2004 to 2006). The difference between their swinging strike percentage and their called strike percentage was 16.04, well above the average of 13.60. What is it about them, as a group, that would explain that?
I'd give you the hispanic pitchers -- I think there are about 30 of them -- but I don't have a list handy.
In any case, and getting back to the issue of umpire bias ...
This is where the false alarm comes in. When I saw that a higher called strike percentage means different things for different pitchers, I thought we might have an explanation: rather than the umpires calling more unmerited strikes, maybe it was just those pitchers pursuing a different strategy. Maybe they were occasionally deciding to pitch how the average white pitcher does -- whatever that is -- and getting more called strikes, but without a change in performance.
Alas, that's not true. *Between* races of pitchers, increased called strike percentage didn't mean better performance. But *within* races of pitchers, it did.
Here's the original 3x3 chart, but with RC27 instead of called strike percentage:
Pitcher ------ Whte Hspn Blac
White Umpire-- 4.97 4.77 4.49
Hspnc Umpire-- 5.15 4.59 5.88
Black Umpire-- 5.47 4.20 5.39
All Umpires -- 5.02 4.71 4.57
With the exception of the bottom-right cell and the bottom-center cell, the RC27 figures match the order of the called strike figures (see the very first chart of this post). It does seem like, as a characteristic of their style, black and hispanic pitchers successfully sacrifice called strikes in exchange for swinging strikes ... but when they *do* get those called strikes from certain umpires, they do even better.
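One way to check where the two charts disagree is to compare umpire rows pairwise within each pitcher column: a pair "matches" if the umpire with more called strikes also shows the lower RC27. The Python sketch below uses only the cells from the two charts; the names are made up for illustration, and the pairwise test is just one reasonable reading of "match the order."

```python
from itertools import combinations

UMPIRES = ["White", "Hspnc", "Black"]

# Outer key = umpire race, inner key = pitcher race (values from the charts).
CALLED_STRIKE_PCT = {
    "White": {"White": 31.88, "Hspnc": 31.27, "Black": 31.27},
    "Hspnc": {"White": 31.41, "Hspnc": 32.47, "Black": 28.29},
    "Black": {"White": 31.22, "Hspnc": 31.21, "Black": 32.52},
}
RC27 = {
    "White": {"White": 4.97, "Hspnc": 4.77, "Black": 4.49},
    "Hspnc": {"White": 5.15, "Hspnc": 4.59, "Black": 5.88},
    "Black": {"White": 5.47, "Hspnc": 4.20, "Black": 5.39},
}


def discordant_pairs(pitcher):
    """Umpire pairs where more called strikes went with a HIGHER RC27."""
    pairs = []
    for u1, u2 in combinations(UMPIRES, 2):
        d_strikes = CALLED_STRIKE_PCT[u1][pitcher] - CALLED_STRIKE_PCT[u2][pitcher]
        d_runs = RC27[u1][pitcher] - RC27[u2][pitcher]
        if d_strikes * d_runs > 0:  # same sign: strikes and runs moved together
            pairs.append((u1, u2))
    return pairs


for pitcher in UMPIRES:
    print(pitcher, discordant_pairs(pitcher))
# White []
# Hspnc [('White', 'Black'), ('Hspnc', 'Black')]
# Black [('White', 'Black')]
```

Every mismatched pair involves the black-umpire row, in the hispanic-pitcher and black-pitcher columns, which lines up with the two exception cells named above.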
So, pitchers *do* seem to benefit from extra called strikes, once you control for who the pitcher is. So we still have the same problem we had at the beginning.
That problem, still, appears to be that when the pitcher was hispanic, hispanic umpires called around 40 too many strikes out of 2,864 called pitches.
40 pitches doesn't seem like a lot over three years ... but it's only over the equivalent of about 30 or 40 team-games (1,349 PA). I don't really see an argument for how those 40 pitches could have been miscalled. It can't be anything the original study controlled for ... like home/road, starter/reliever, score, identity of the pitcher, etc. It would have to be an interaction of some of those things. Like, for instance, pitcher A throws a lot of inside sliders, and umpire B likes to call those strikes, and B happened to randomly umpire a lot of A's games.
But I don't see how the numbers work out. It's still 40 pitches in 30 games. With three hispanic umpires and 30 hispanic pitchers, that's 90 possible combinations. Some are more likely than others -- we're only looking at pitchers in front of 30,000 or fewer fans, which concentrates them a bit among certain teams -- but still, 90 combinations over 30 games makes it unlikely that one or two pairs would dominate to the tune of 40 pitches.
So, I thought I had an explanation ... but, after all this, I don't think I do. I still suspect that the result is just random, and not racial bias or any other explanation, but ... that's just my opinion.
Still, I need to think some more. Now that we know that more called strikes does not *always* lead to improved performance, and that it depends on the pitcher ... can you see any arguments that I'm missing, for what else might be happening?
UPDATE: OK, one more theory I thought of. Suppose pitcher style varies from game to game. Take, for instance, a hispanic pitcher. Some games, he pitches one way, and gets few called strikes and lots of swinging strikes. Other games, and independently of the umpire, he consciously decides to pitch differently, and he gets more called strikes and fewer swinging strikes.
In that case, pitches are no longer independent -- it's *games* that are independent. That means that you have to use a different statistical technique, like cluster sampling. The bottom line, there, is that the SD of the estimate goes way up. The point estimates stay the same, but the confidence interval widens and the statistical significance disappears.
So, if there's evidence that pitchers' expected percentages change on a game-by-game basis (that is, the *expectations* have to change due to pitcher behavior, not just the outcome of the game fluctuating because of random variation), that probably negates the statistical significance, which is the only reason to suspect umpire bias.
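Here is a rough Python simulation of that last point. Everything in it is an assumption made for illustration: 80 called pitches per game, a mean strike probability of 0.31, and a game-to-game SD of 0.06 in the pitcher's true strike probability. The "cluster" standard error is the simplest textbook version (SD of the per-game rates divided by the square root of the number of games, for equal-sized games), not whatever adjustment a full reanalysis of the study would use.

```python
import random
from statistics import stdev


def simulate_games(n_games, pitches_per_game, mean_p, game_sd, seed=0):
    """Called strikes per game when the true strike rate varies by game."""
    rng = random.Random(seed)
    strikes = []
    for _ in range(n_games):
        # This game's true called-strike probability, clamped to [0, 1].
        p = min(1.0, max(0.0, rng.gauss(mean_p, game_sd)))
        strikes.append(sum(rng.random() < p for _ in range(pitches_per_game)))
    return strikes


def naive_se(strikes, pitches_per_game):
    """Binomial standard error, treating every pitch as independent."""
    n = len(strikes) * pitches_per_game
    p_hat = sum(strikes) / n
    return (p_hat * (1 - p_hat) / n) ** 0.5


def cluster_se(strikes, pitches_per_game):
    """Standard error treating each game as the independent unit."""
    rates = [s / pitches_per_game for s in strikes]
    return stdev(rates) / len(rates) ** 0.5


games = simulate_games(n_games=2000, pitches_per_game=80, mean_p=0.31, game_sd=0.06)
print(naive_se(games, 80), cluster_se(games, 80))  # cluster SE is the larger one
```

With `game_sd=0` the two standard errors come out about the same; the gap opens up only when the pitcher's expected percentage really does change from game to game, which is exactly the condition described above.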