Friday, July 22, 2011

Umpires' racial bias disappears for other years of data

The Hamermesh (et al) study on umpire racial bias looked at data from 2004 to 2006. When I tried reproducing their numbers, I got this chart (repeated here for the nth time):

Pitcher ------ White Hspnc Black
White Umpire-- 31.88 31.27 31.27
Hspnc Umpire-- 31.41 32.47 28.29
Black Umpire-- 31.22 31.21 32.52
All Umpires -- 31.83 31.30 31.32

There's some evidence of bias there. Specifically, the two same-race minority cells, hispanic umpires calling hispanic pitchers (32.47) and black umpires calling black pitchers (32.52), seem a lot higher than they "should" be compared to their row and column.

I decided to try looking at other years: specifically, 2002, 2003, 2007, and 2008 combined. The only problem is that I didn't have a list of minority umpires and black pitchers for those years, so I had to use the same list as in the 2004-06 sample. That means some minority pitchers and umpires who don't appear on that list may have been left out of their proper group and misclassified as "white". Still, there shouldn't be many of those, so the effect on the numbers should be small.

(This problem doesn't exist for hispanic pitchers, because I used country of birth for those.)

So, here's the same chart as above for those other years:

Pitcher ------ White Hspnc Black
White Umpire-- 31.47 30.97 31.22
Hspnc Umpire-- 31.19 30.77 34.65
Black Umpire-- 30.90 30.07 32.55
All Umpires -- 31.83 31.30 31.32

The "umpires seem to favor pitchers of their own race" effect seems almost completely gone here. For instance, compare hispanic to white pitchers. With white umpires, the hispanic pitchers got 0.50 percentage points fewer strikes (30.97 vs. 31.47). With hispanic umpires, the hispanic pitchers got 0.42 percentage points fewer strikes (30.77 vs. 31.19). There's barely any difference.

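Comparing umpires ... for white pitchers, white umpires called 0.28 percentage points more strikes than hispanic umpires did (31.47 vs. 31.19). For hispanic pitchers, white umpires called 0.20 percentage points more strikes than hispanic umpires did (30.97 vs. 30.77). Again, barely any difference.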

The effect in the original sample was driven by the middle cell (hispanic/hispanic), which was more than a full percentage point higher than it was "supposed" to be. This doesn't happen in the new sample, where the middle cell seems to be within about 0.08 of where it should be.
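Here's a rough sketch of that "where it should be" arithmetic in Python. The baseline is my assumption about what "supposed to be" means: take the umpire's strike rate for white pitchers, then add the gap that white umpires show between that pitcher group and white pitchers. The function and variable names are made up for illustration, and the numbers are just the two charts typed in by hand.

```python
# Strike percentages copied from the two charts.
# Keys are umpire groups; each list is [White, Hspnc, Black] pitchers.
RACES = ["White", "Hspnc", "Black"]

ORIGINAL = {  # 2004-2006
    "White": [31.88, 31.27, 31.27],
    "Hspnc": [31.41, 32.47, 28.29],
    "Black": [31.22, 31.21, 32.52],
}
OTHER_YEARS = {  # 2002, 2003, 2007, 2008
    "White": [31.47, 30.97, 31.22],
    "Hspnc": [31.19, 30.77, 34.65],
    "Black": [30.90, 30.07, 32.55],
}


def excess(table, umpire, pitcher):
    """How far a cell sits above an additive, no-bias baseline.

    Assumed baseline: this umpire group's rate for white pitchers, shifted
    by the white-umpire gap between `pitcher` and white pitchers.
    """
    p = RACES.index(pitcher)
    expected = table[umpire][0] + table["White"][p] - table["White"][0]
    return round(table[umpire][p] - expected, 2)


print("2004-06 middle cell excess:", excess(ORIGINAL, "Hspnc", "Hspnc"))
print("Other years middle cell excess:", excess(OTHER_YEARS, "Hspnc", "Hspnc"))
```

Under that assumed baseline, the middle cell comes out about 1.67 points high in 2004-06, and about 0.08 points high in the other years.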

The SD of that new middle cell (hispanic/hispanic) is 0.73 percent. The SD of the bottom middle cell (which appears to be very low) is 0.50 percent, so even that one isn't significant. And the two bottom cells in the right-hand column have very small sample sizes, so those can probably be ignored.
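To put those SDs in context, here's a quick sketch that divides each cell's deviation by its SD. It's a simplification on my part: the "expected" value is an assumed additive baseline (the umpire's rate for white pitchers plus the white-umpire gap between hispanic and white pitchers), and it ignores the noise in the baseline cells themselves, so treat the results as ballpark figures only. The names are hypothetical.

```python
# Cells from the 2002/2003/2007/2008 chart, plus the SDs quoted in the post.
WHITE_UMP = {"White": 31.47, "Hspnc": 30.97}

# label -> (umpire's rate for white pitchers, rate for hispanic pitchers, SD)
CELLS = {
    "Hspnc umpire / Hspnc pitcher": (31.19, 30.77, 0.73),
    "Black umpire / Hspnc pitcher": (30.90, 30.07, 0.50),
}


def z_score(ump_white, ump_hspnc, sd):
    """Deviation from the assumed additive baseline, in units of the cell's SD."""
    expected = ump_white + WHITE_UMP["Hspnc"] - WHITE_UMP["White"]
    return (ump_hspnc - expected) / sd


for label, (w, h, sd) in CELLS.items():
    print(f"{label}: z = {z_score(w, h, sd):+.2f}")
```

Both cells land well inside two SDs of the baseline (roughly +0.11 and -0.66), which fits with neither one being significant.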

Verdict: although 2004-2006 does show some evidence of bias, there is no such effect for 2002, 2003, 2007, and 2008 combined.
