Sabermetric Research: The Hamermesh umpire/race study revisited -- part IV

This discussion of the Hamermesh study is continued from the previous post. I'm going to go through this part pretty quickly, since it's probably getting boring.

-----

The previous posts have brought us to the finding with the most statistical significance: the attendance study.

The authors divided their three-seasons' worth of games into high attendance vs. low attendance, with a cutoff point of 70% of capacity. Their idea is that the more people are attending the game, the worse the consequences for the umpire if he gets the call wrong: attendance "prox[ies] the scrutiny of umpires and thus the price of discrimination."

And the authors' expectations are confirmed. It turns out that when attendance is high, what discrimination there is is opposite to what you'd expect – the racial preference goes *against* the umpire's own race. But when attendance is low, there is a large, and statistically significant, same-race preference. Expressed in "UPM," the results are

-0.28%: fewer same-race strikes with high attendance (0.8 SD)
+0.84%: more same-race strikes with low attendance (2.7 SD)
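
As a rough sanity check on those two figures, here is a minimal Python sketch that turns the "SD" values into two-sided p-values and estimates how far apart the two attendance groups are. It assumes a normal approximation and, more loudly, that the two estimates are independent; neither assumption comes from the study, and the standard errors are simply backed out from the numbers quoted above.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z-score under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# Figures quoted above: UPM estimate (in percent) and its size in SDs.
high_est, high_z = -0.28, 0.8   # high attendance
low_est, low_z = 0.84, 2.7      # low attendance

# Back out the implied standard errors (estimate divided by its z-score).
high_se = abs(high_est) / high_z
low_se = abs(low_est) / low_z

# ASSUMPTION (not from the study): the two estimates are independent,
# so the variance of their difference is the sum of their variances.
diff = low_est - high_est
diff_se = math.sqrt(high_se ** 2 + low_se ** 2)
diff_z = diff / diff_se

print(f"high attendance: p = {two_sided_p(high_z):.3f}")
print(f"low attendance:  p = {two_sided_p(low_z):.4f}")
print(f"difference: {diff:.2f} points, z = {diff_z:.2f}, p = {two_sided_p(diff_z):.3f}")
```

If the clustering concerns raised in this post are right, the true standard errors are larger than the implied ones, and these p-values would be too small.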

The same qualifications apply as in the previous cases ... plus a couple of additional things:

-- In the QuesTec case, all home games of the QuesTec teams were omitted from the regression. Here, some home games will be included while some won't. Suppose that (say) only one home game in Boston is included, but all their road games. Might that skew the estimate of the "home" parameter in the regression, if the one Boston game was somehow an outlier? That adds a little more of a clustering effect, which would cause the standard error to be underestimated even a little bit more.

-- According to a recent study by David Gassko, there are park effects for strikeouts and walks – and those effects appear to be real (even after regressing to the mean to eliminate luck). If that's the case, that would cause even more clustering – it becomes more likely that the small number of pitches in the B/B cell, just by luck, featured games in high K/BB ratio parks. Again, that would cause the regression to overestimate significance levels.
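
To give a feel for how much clustering can matter, here is a small Python sketch using the standard Kish design-effect approximation, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the within-cluster correlation. The cluster size and correlation below are made-up illustrative values, not estimates from the study's data, and the function names are my own.

```python
import math

def design_effect(cluster_size, intraclass_corr):
    """Kish approximation: variance inflation from equal-sized clusters."""
    return 1 + (cluster_size - 1) * intraclass_corr

def adjusted_z(naive_z, cluster_size, intraclass_corr):
    """Shrink a z-score computed as if every pitch were independent."""
    return naive_z / math.sqrt(design_effect(cluster_size, intraclass_corr))

# HYPOTHETICAL inputs: about 75 called pitches per pitcher-game cluster,
# and a small within-game correlation of 0.01. Neither is from the study.
m, rho = 75, 0.01

deff = design_effect(m, rho)
print(f"design effect: {deff:.2f}")
print(f"standard error inflated by a factor of {math.sqrt(deff):.2f}")
print(f"a naive 2.7 SD result becomes {adjusted_z(2.7, m, rho):.2f} SD")
```

Even a small within-cluster correlation takes a noticeable bite out of the significance level when the clusters are large, which is why the clustering question matters here.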

But I think there's really something there. I think this is real evidence that differences in attendance lead to umpires calling pitches differently with respect to race. As I wrote before, I think the next step would be to look at the small number of Hispanic/Hispanic and black/black games, to see if there's anything unusual, and then to look at each of the individual umpires.

It would be premature to conclude, as the authors do, that they have found conclusive evidence of equal, subconscious bias among all umpires. The results are just as consistent with conscious bias, unequal bias, and bias among a small minority of umpires.

-----

There's a second way that the study's authors try to show that racial bias decreases when scrutiny increases. They classify pitches with two strikes or three balls as "terminal" – meaning they have the potential to cause a strikeout or walk and end the plate appearance. They point out that such terminal pitches are more highly scrutinized, and umpires would be motivated to put their biases on hold when the situation is so important that they might get in trouble.

And indeed, they find some differences:

+0.46% UPM when not in terminal count;
-0.28% UPM when in terminal count.

They also show that the effect is larger in the early innings (when, presumably, the pitch isn't important and the price of discrimination is low), and smaller in the late innings (when scrutiny is high).

In general, the individual UPMs are not statistically significant (the +0.46 above is 1.9 SD), but the difference between terminal and non-terminal count *is* significant.

However: I'm not sure if the authors controlled for the actual count; in Table 5, they said they did, but in Table 7, which gives the same result, they say they didn't. And if they didn't, the results aren't meaningful.

Why? Because white pitchers throw more strikes than minority pitchers. So, when it comes to a terminal count, white pitchers will be more heavily weighted among the 2-strike counts, and minority pitchers will be more heavily weighted among the 3-ball counts.

That means that white pitchers will be in terminal-count situations where they're more likely to waste a pitch and throw a ball, and minority pitchers where they're more likely to assume the batter is taking and throw a called strike.

Also, since the data show that white umpires call more strikes than minority umpires, the white/white cell will have the largest decrease in strikes on terminal counts. And the minority/minority cells will have the largest increase in strikes on terminal counts. How will this affect the UPM coefficient? I'm not sure, but if you compare white umpires calling extra 0-2 pitches on white pitchers to minority umpires calling extra 3-0 pitches on minority pitchers, I don't think the regression will tell you anything of value.
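
The composition problem can be shown with a toy example. In this Python sketch, every number is invented for illustration: the called-strike rate depends only on the count, never on anyone's race, but the two pitcher groups reach 2-strike and 3-ball counts in different proportions. Counts that are both at once, like 3-2, are ignored to keep it simple.

```python
# HYPOTHETICAL called-strike rates on taken pitches. They are identical
# for every pitcher and umpire, so this toy model contains no bias at all.
STRIKE_RATE = {"two_strikes": 0.25, "three_balls": 0.45}

# HYPOTHETICAL share of each group's terminal-count pitches by count type.
# White pitchers are weighted toward 2-strike counts, as argued above.
COUNT_MIX = {
    "white": {"two_strikes": 0.70, "three_balls": 0.30},
    "minority": {"two_strikes": 0.55, "three_balls": 0.45},
}

def pooled_strike_rate(mix):
    """Called-strike rate when all terminal counts are lumped together."""
    return sum(share * STRIKE_RATE[count] for count, share in mix.items())

pooled = {group: pooled_strike_rate(mix) for group, mix in COUNT_MIX.items()}
gap = pooled["minority"] - pooled["white"]

for group, rate in pooled.items():
    print(f"{group:>8}: pooled terminal-count strike rate = {rate:.3f}")
print(f"gap with zero bias in the model: {gap * 100:.1f} percentage points")
```

The pooled rates differ by group even though nobody in the model is treated differently. This only shows the pitcher side of the problem; it doesn't say which way the UPM coefficient would move, which is exactly the part I'm not sure about.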

If the authors *did* control for count, the results are reasonable.

-----

In Table 7, the authors do a master regression, including all three factors: QuesTec, attendance, and terminal count. When all three factors are low scrutiny, the sum of the UPM factors is 0.0107. When all three factors are high-scrutiny, the UPM total is *negative* 0.0120:

+1.07 percentage points: non-QuesTec, low attendance, non-terminal count
-1.20 percentage points: QuesTec, high attendance, terminal count

This seems to be pretty good evidence that there's a difference between the two cases. It does, however, bring up an obvious question: if umpires have a tendency to be biased in favor of their own race in general, why are they suddenly biased *against* their own race in high-scrutiny situations? Shouldn't they, at best, become neutral? The authors argue,

"One might speculate that umpires feel that they are favoring matched pitchers [at other times] and that they sub-consciously overcompensate in instances when they know they are under scrutiny."

I don't know anything about the psychology of racism, so I'll defer to Hamermesh's expertise that this is a plausible explanation.

-----

In cases of more scrutiny, umpires have negative racial bias. In cases of less scrutiny, they have positive racial bias. Is it possible the two cancel out?

The bias does seem to be larger on the positive side, but the negative cases might have higher leverage in terms of winning the pennant. "Terminal counts" are the most important to winning games, and high attendance games are probably more important towards winning the pennant. Interpreting the results the way the study does, you might conclude that, yes, umpires have bias in low-scrutiny situations, but that in subconsciously undoing that bias in high-scrutiny situations, they even everything out.

Of course, they don't even everything out evenly: the Yankees and Red Sox will have negatively-biased umpires most of the time, and the Marlins will have positively-biased umpires most of the time. So the point is somewhat irrelevant.

Labels: baseball, Hamermesh update, race


Tuesday, April 15, 2008
