Thursday, July 28, 2011

More fastballs = fewer called strikes

A couple of weeks ago, I noticed that, from 2004 to 2006, even though hispanic and black pitchers received a lower percentage of called strikes than white pitchers (called strikes as a percentage of called pitches), they were able to post above-average numbers.

The reason, it turned out, was that despite not getting as many called strikes, they got a lot more *swinging* strikes, and that more than compensated.

I wondered why that would happen, what was so special about those pitchers. Then, commenter GuyM e-mailed me a suggestion: it looked like the ten pitchers I highlighted were all fastball pitchers.

I went over to Fangraphs and looked them up ... and Guy was right. With the exception of Ray King, the other nine pitchers threw fastballs at or above the MLB-average rate.

So, I did a more formal test. For 2004, 2005, and 2006 (separately), I split the league into the usual nine pitcher/umpire combinations (white/hispanic/black), and figured out the average fastball percentage (FB%) for each group that year. (I didn't have breakdowns on a per-pitch basis, so I used the player's overall season rate for each cell.)
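Here's a minimal sketch of how a cell average like that could be built from season-level rates. The pitch-count weighting, the function name `cell_fastball_pct`, and the sample rows are all assumptions for illustration; the post only says that each pitcher's overall season FB% was used.

```python
from collections import defaultdict

def cell_fastball_pct(rows):
    """Average season FB% per (umpire group, pitcher group) cell.

    rows: iterable of (umpire_group, pitcher_group, called_pitches, season_fb_pct).
    Each pitcher's season rate is weighted by his called pitches in that cell
    (an assumed weighting, not confirmed by the post).
    """
    weighted_sum = defaultdict(float)
    pitch_total = defaultdict(float)
    for ump, pit, pitches, fb_pct in rows:
        weighted_sum[(ump, pit)] += pitches * fb_pct
        pitch_total[(ump, pit)] += pitches
    return {cell: weighted_sum[cell] / pitch_total[cell] for cell in pitch_total}

# Made-up rows, not real data
rows = [
    ("H", "H", 60, 70.0),
    ("H", "H", 40, 55.0),
    ("W", "H", 100, 60.0),
]
cells = cell_fastball_pct(rows)
print(cells[("H", "H")])  # 64.0
```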

Here's 2005:

Pitcher ------ White Hspnc Black
White Umpire-- 62.01 61.87 67.86
Hspnc Umpire-- 61.74 64.91 70.89
Black Umpire-- 62.20 60.57 66.78

There's a big bump in the H/H row and column -- a lot more fastballs than you would expect. It would be hard to argue that that's racial bias, since the pitch chosen is a deliberate decision from the pitcher and catcher.

It just seems like, in 2005, the H/H pitchers happened to throw a lot of fastballs.

The situation was reversed in 2006:

Pitcher ------ White Hspnc Black
White Umpire-- 61.09 60.50 62.73
Hspnc Umpire-- 61.93 58.70 58.31
Black Umpire-- 60.80 61.72 61.53

Suddenly, the H/H group is throwing many FEWER fastballs. Actually, it looks like fastballs were down across the board in 2006 -- I bet that was a change in how the stringers recorded pitches, rather than an actual change in what pitchers threw. In any case, even after adjusting for that, the H/H group is low.

So what's going on? Well, it's probably just different pitchers who make up that cell. It's somewhere around 1,000 pitches each year, which means the equivalent of maybe 20 hispanic pitchers starting against hispanic umpires. Just by chance, the 20 pitchers in 2005 were fastball pitchers, and the 20 pitchers in 2006 weren't.

Finally, here's 2004, just for completeness. It doesn't really show anything interesting.

Pitcher ------ White Hspnc Black
White Umpire-- 61.81 61.86 66.52
Hspnc Umpire-- 61.66 61.88 64.75
Black Umpire-- 62.66 65.54 66.40

So, as I was saying ... we want to try to figure out if more fastballs lead to more called strikes. To figure that out, I ran a regression to predict fastball percentage based on strike percentage, using all 27 cases in the above three tables. Since the overall FB% seems to vary from year to year, I added two dummy variables for the individual seasons.

The result: an r-squared of 0.4, and statistical significance. More important, the results of the regression equation: a relationship where, for every 1 percentage point more called strikes you get, you're likely to have thrown 1.67 percentage points fewer fastballs.

When I took out the bottom two cells in each of the "Black" columns (in which the sample sizes are very small, around 100 and 300 pitches each respectively), the result was even more significant (r-squared 0.53), and the relationship changed from 1.67 to 1.1.
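For anyone who wants to try a regression of that shape, here's a rough sketch using NumPy least squares. The post doesn't say what software was used, so the function name `fit_fb_on_cs` and the synthetic numbers below are assumptions; only the structure (FB% predicted from CS% plus season dummies) comes from the description above.

```python
import numpy as np

def fit_fb_on_cs(cs_pct, fb_pct, season):
    """Regress fastball % on called-strike % with one dummy per extra season."""
    cs = np.asarray(cs_pct, dtype=float)
    fb = np.asarray(fb_pct, dtype=float)
    season = np.asarray(season)
    seasons = sorted(set(season.tolist()))
    # Intercept, CS%, then a 0/1 dummy for every season except the first
    columns = [np.ones_like(cs), cs]
    for s in seasons[1:]:
        columns.append((season == s).astype(float))
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, fb, rcond=None)
    fitted = X @ coef
    ss_res = float(np.sum((fb - fitted) ** 2))
    ss_tot = float(np.sum((fb - fb.mean()) ** 2))
    return {"slope": float(coef[1]), "r_squared": 1.0 - ss_res / ss_tot}

# Synthetic check (not real data): FB% = 100 - 1.2 * CS% + a season offset
cs = [30.0, 31.0, 32.0, 30.5, 31.5, 32.5, 29.0, 31.0, 33.0]
season = [2004] * 3 + [2005] * 3 + [2006] * 3
offset = {2004: 0.0, 2005: 1.0, 2006: -1.0}
fb = [100.0 - 1.2 * c + offset[s] for c, s in zip(cs, season)]
result = fit_fb_on_cs(cs, fb, season)
print(round(result["slope"], 2))
```

The slope is in "percentage points of FB% per percentage point of CS%", the same units as the 1.67 and 1.1 figures above.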

So, we have a pretty good indication that more fastballs cause fewer called strikes. Technically, I shouldn't assume causation -- the data leave open the possibility that fewer called strikes cause fastballs, or that some third variable causes both lots of fastballs and fewer called strikes. But neither of those seems very plausible.


Here's a more intuitive way to see the relationship. Here's 2005 again, for fastballs:

Pitcher ------ White Hspnc Black
White Umpire-- 62.01 61.87 67.86
Hspnc Umpire-- 61.74 64.91 70.89
Black Umpire-- 62.20 60.57 66.78

And here's 2005 for called strikes:

Pitcher ------ White Hspnc Black
White Umpire-- 32.15 31.20 31.74
Hspnc Umpire-- 31.55 31.04 24.19
Black Umpire-- 31.39 31.53 30.88

If you compare the charts, you can see for yourself that the high FB% cells generally seem to be paired with low CS%.
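The same comparison can be done numerically. This is only a sketch: it treats the nine 2005 cells as equally weighted data points, which ignores the very different sample sizes, and it uses NumPy as an assumed tool.

```python
import numpy as np

# 2005 cells, read row by row from the two charts above
fb_2005 = [62.01, 61.87, 67.86, 61.74, 64.91, 70.89, 62.20, 60.57, 66.78]
cs_2005 = [32.15, 31.20, 31.74, 31.55, 31.04, 24.19, 31.39, 31.53, 30.88]

r_all = float(np.corrcoef(fb_2005, cs_2005)[0, 1])

# Same thing without the hispanic umpire / black pitcher cell (index 5),
# which is one of the small-sample cells
fb_rest = np.delete(fb_2005, 5)
cs_rest = np.delete(cs_2005, 5)
r_rest = float(np.corrcoef(fb_rest, cs_rest)[0, 1])

print(round(r_all, 2), round(r_rest, 2))
```

Both correlations come out negative, but the second is much weaker, so a good part of the visual impression in 2005 depends on that one small cell.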


Another important thing is that, now, we can't assume that when a pitcher gets few called strikes, his performance suffers. In fact, if the reason for fewer called strikes is more fastballs, it could be the other way around.

For instance, in the center cell in 2005, where the hispanic pitchers got only 31.04 percent called strikes, they gave up a very good 3.76 RC27 (like a 3.50 ERA). But in 2006, when they got 34.16% called strikes (which is very high), the batters facing them had an RC27 of 5.52. The more called strikes, the worse the performance. Very much opposite to the way you'd think.

That's when we look mostly *between* pitchers -- pitcher A, with more called strikes, is likely to be worse than pitcher B, with fewer called strikes. We don't know the relationship within the *same* pitcher. If pitcher A gets more called strikes in one start than another, is he likely to be worse in that start? We don't know.

So, when the Hamermesh study asserts that the H/H group benefits from the umpires having called more strikes in their favor, that's not necessarily true. It might be, but it also might not be. It's certainly true if the cause IS umpire bias, because that just changes the identical pitch from a ball to a strike. But if the cause is pitch selection, the relationship could be the exact opposite.


Now, in my own little study, which was an attempt to reproduce the results of the original Hamermesh study, I did indeed find that the CS% in the "hispanic/hispanic" cell was very high. Now, we have an explanation other than umpire bias -- pitching style. It could just be that the overall H/H cell had fewer fastball pitchers than expected, and that caused the results.

But, while that would explain *my* results, it won't explain the original Hamermesh results. That's because the Hamermesh study controlled for the identity of the pitcher. So, if the center cell did indeed feature a lot of finesse pitchers, their study would have adjusted for that, even though mine didn't.

Still, we have a possible *weaker* explanation. Suppose that pitchers vary their fastball tendencies from year to year. One season, they might throw 65% fastballs, but, when they're a year or two older, their slider improves, and now they only throw 55% fastballs. The Hamermesh study adjusted for the identity of the player, but not for the individual player/season. So, if hispanic pitcher X threw 55% fastballs in the season where he faced the hispanic umpire, but 60% fastballs in the season where he faced the white umpire, that would bias the results and make it look like the umpire was biased.

Or, even more granular: if pitchers change their repertoire *from game to game*, that would also do it. For instance, suppose hispanic pitcher Y finds out his curve ball isn't working well one game, and relies more on his fastball. If that happened more in games where the umpire was white, then, again, that would make the hispanic umpire look biased in his favor.

It's important to keep in mind that this is a valid criticism only if pitch selection differences are clustered over games or seasons. If a pitcher randomly decides to throw a fastball this pitch, but a breaking ball next pitch, that's included in the significance levels of the original study. It's only when the fastballs are *clustered* within umpires, rather than random over pitches, that that's something that affects the significance levels.


So where does this leave us? Well, we haven't really found any smoking gun evidence that explains what the Hamermesh study found, since that study did control for who the pitcher is (which means they effectively controlled for fastball percentage). However, we *do* have a potential explanation, which is non-random pitch selection.

Normally, I hate when a study is criticized on the grounds of "you didn't control for X". That's a lazy argument, and it's an argument that can be leveled at any study, because, no matter how thorough, there's always *something* that hasn't been controlled for. Also, there's often no reason to believe X is important to control for. And, even if it is, there's no reason to believe that it's non-randomly distributed among the other variables.

In order to be taken seriously when you say "you didn't control for X," you need to come up with (a) an argument that X is actually an important factor, important enough to change the results, and (b) that there is reason to believe X is distributed non-randomly.

That's what I'm trying to do here. First, (a) I think I have proven that pitch type does seriously and significantly affect called strike percentage. Second (b), it's plausible that pitch type may vary *by the conscious choice of the pitcher* over seasons, and perhaps even games.

If I knew for sure that (b) happened -- if we had data that showed that it was common that, for some games a pitcher chooses to throw 70% fastballs, and some games he chooses to throw only 50% fastballs -- that would be enough to prove that the Hamermesh study's confidence intervals were overstated. Since we don't, it's just a possibility.

We don't know *for sure* that pitch types tend to cluster together. But it's a reasonable thing to look at in a future study. Based on the little I've looked at it so far, I suspect that it's a small but important factor.


P.S. Thanks to GuyM for his e-mail discussion, and to Fangraphs' David Appelman for assistance in getting the FB% data I needed.


Friday, July 22, 2011

Umpires' racial bias disappears for other years of data

The Hamermesh (et al) study on umpire racial bias looked at data from 2004 to 2006. When I tried reproducing their numbers, I got this chart (repeated here for the nth time):

Pitcher ------ White Hspnc Black
White Umpire-- 31.88 31.27 31.27
Hspnc Umpire-- 31.41 32.47 28.29
Black Umpire-- 31.22 31.21 32.52
All Umpires -- 31.83 31.30 31.32

There's some evidence of bias there; specifically, the hispanic/hispanic entry (32.47) and the black/black entry (32.52), which represent hispanic umpires calling hispanic pitchers, and black umpires calling black pitchers, seem a lot higher than they "should" be compared to their row and column.

I decided to try looking at other years: specifically, 2002, 2003, 2007, and 2008 combined. The only problem with that is that I didn't have a list of minority umpires and black pitchers for those years, so I had to use the same list as in the 2004-06 sample. That means some minority pitchers and umpires may have been excluded from their proper group, and misclassified as "white". Still, there shouldn't be too many of those, and their numbers would be small.

(This problem doesn't exist for hispanic pitchers, because I used country of birth for those.)

So, here's the same chart as above for those other years:

Pitcher ------ White Hspnc Black
White Umpire-- 31.47 30.97 31.22
Hspnc Umpire-- 31.19 30.77 34.65
Black Umpire-- 30.90 30.07 32.55
All Umpires -- 31.83 31.30 31.32

The "umpires seem to favor pitchers of their own race" effect seems almost completely gone here. For instance, compare hispanic to white pitchers. Against white umpires, the hispanic pitchers got 0.50 percent fewer strikes. Against hispanic umpires, the hispanic pitchers got 0.42 percent fewer strikes. There's barely any difference.

Comparing umpires ... white umpires called 0.28 percent more strikes than hispanic umpires did for white pitchers, and 0.20 percent more for hispanic pitchers. Again, barely any difference.

The effect in the original sample was driven by the middle cell (hispanic/hispanic), which was more than a full percentage point higher than it was "supposed" to be. This doesn't happen in the new sample, where the middle cell seems to be within about 0.08 of where it should be.
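The arithmetic behind those comparisons is a simple difference of differences. Here's a small sketch; the function name `same_race_gap` and the dictionary layout are just for illustration, and the numbers are the white and hispanic cells from the two charts above.

```python
def same_race_gap(table, ump="H", pit="H", base="W"):
    """table maps (umpire group, pitcher group) -> called strike %.

    Returns the white-minus-minority pitcher gap under the baseline umpire,
    the same gap under the minority umpire, and the difference between them.
    """
    gap_base_ump = table[(base, base)] - table[(base, pit)]
    gap_same_ump = table[(ump, base)] - table[(ump, pit)]
    return gap_base_ump, gap_same_ump, gap_base_ump - gap_same_ump

original = {("W", "W"): 31.88, ("W", "H"): 31.27, ("H", "W"): 31.41, ("H", "H"): 32.47}
other_years = {("W", "W"): 31.47, ("W", "H"): 30.97, ("H", "W"): 31.19, ("H", "H"): 30.77}

orig_gaps = same_race_gap(original)
new_gaps = same_race_gap(other_years)
print([round(x, 2) for x in orig_gaps])  # [0.61, -1.06, 1.67]
print([round(x, 2) for x in new_gaps])   # [0.5, 0.42, 0.08]
```

The third number is the one to watch: about 1.67 points in 2004-06, against about 0.08 in the other years.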

The SD of that new middle cell (hispanic/hispanic) is 0.73 percent. The SD of the bottom middle cell (which appears to be very low) is 0.50 percent, so even that one isn't significant. And the two bottom cells in the right-hand column have very small sample sizes, so those can probably be ignored.

Verdict: although 2004-2006 does show some evidence of bias, there is no such effect for 2002-3-7-8.


Saturday, July 16, 2011

Minority pitchers succeed with fewer called strikes

I'm scheduled to talk about umpires and racial bias in a couple of weeks at JSM in Miami. I was hoping not to have to repeat the same old things I've been talking about for the last few years, so I decided to see if there's anything new I could find. And I think I've got something, maybe. Well, I thought I had something, and it's interesting, but I now think it might be a false alarm with respect to umpires and race.

First, a quick review (and I promise it'll be quick). The Hamermesh study of racial bias (.pdf) was based on a chart that looked like this:

Pitcher ------ White Hspnc Black
White Umpire-- 31.88 31.27 31.27
Hspnc Umpire-- 31.41 32.47 28.29
Black Umpire-- 31.22 31.21 32.52
All Umpires -- 31.83 31.30 31.32

The numbers are the percentage of called pitches (not swung at) that the umpire called a strike. This chart is based only on low-attendance games (less than 30,000 fans), where the study's authors found the strongest effect. It's my attempt to reproduce their results from Retrosheet data (the authors didn't provide the equivalent chart to what I have here).

If you look at the chart, you will see that, for any race of pitcher or umpire, the largest percentage of pitches are strikes exactly when the race of the umpire matches the race of the pitcher. The original study did a big regression and found that this result is indeed statistically significant. They concluded that umpires are biased in favor of pitchers of their own race (or biased against pitchers of a different race).

That's all I'm going to say about that; if you want to see my arguments, you can go here. Now, I'm going to take a different route.


Let's start by ignoring umpires for now, and just looking at the pitchers. The bottom row of the chart shows the overall called strike percentage of the pitchers. Let me repeat them here for clarity:

31.83 percent strikes -- white pitchers
31.30 percent strikes -- hispanic pitchers
31.32 percent strikes -- black pitchers

It looks like there are real differences between the pitchers. Now, it's *possible* that the entire effect is actually caused by biased umpires, but nobody really believes that, including the authors of the original study. Different pitchers have different attributes, and it's probably just that the white pitchers are such that they happen to throw more called strikes than the minority pitchers.

Moreover, it would appear that the white pitchers happen to be *better* than the minority pitchers, since their strike percentage is higher. In fact, I think I may have said this a few times in the past, that the white pitchers were more successful.

I was wrong. Actually, it's the minority pitchers who performed better, *despite* the fact that their called pitches were less likely to be strikes.

Here are the opposition batting records for each of the three groups of pitchers, normalized to 600 PA:

Pitchers . AB H 2B 3B HR BB SO AVG RC27
White .... 543 147 30 3 17 51 99 0.271 5.02
Hispanic . 541 141 28 3 17 53 108 0.261 4.71
Black .... 546 145 28 3 14 48 106 0.266 4.57

The white pitchers performed the worst, striking out fewer batters and allowing more hits and runs. The last column of the batting record is "runs created per 27 outs."

What's going on? How is it that the minority pitchers did so much better despite having fewer called strikes? My first reaction was this: perhaps the relationship between called strikes and performance is *negative*. That is, maybe having lots of called strikes means you're throwing lots of pitches right down the middle of the plate, and you're getting hammered. Logically possible, right?

But it doesn't seem to be true. I ran a regression of Component ERA vs. Called Strike Percentage for starting pitchers with 100 IP or more, and the relationship goes the way you'd think: the higher the called strikes, the lower the ERA and the more successful the pitcher. In fact, it's a pretty strong relationship: every 0.1 percentage point in called strike percentage (example: from 31.83 percent to 31.93 percent) lowers ERA by 0.11. That's almost exactly what you'd expect knowing that the difference between a ball and a strike is approximately .14 runs.

So how is it that those pitchers bucked the relationship, and had a better performance despite fewer called strikes?

I think I was able to find the answer: they compensated by having more pitches swung at. As it turns out, the benefit of an extra percentage point in pitches swung at is also positive: an increase of 0.1 percent lowers ERA by 0.13 points.

Here are the numbers for pitches swung at:

44.99 percent pitches swung at -- White
45.52 percent pitches swung at -- Hispanic
46.84 percent pitches swung at -- Black

These are large differences, more than comparable to the differences in called strike percentage.

(By the way, keep in mind that the denominators of the two measures are different. Pitches swung at is (swung at and missed + foul balls + put in play) divided by total pitches. Called strike percentage is (called strikes) / (called strikes + balls).)
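To make the two denominators concrete, here's a short sketch of both measures. The function names and the pitch counts in the example are made up for illustration.

```python
def swing_pct(swinging_misses, fouls, in_play, total_pitches):
    """Pitches swung at, as a percentage of ALL pitches."""
    return 100.0 * (swinging_misses + fouls + in_play) / total_pitches

def called_strike_pct(called_strikes, balls):
    """Called strikes, as a percentage of CALLED pitches only."""
    return 100.0 * called_strikes / (called_strikes + balls)

# Made-up line: 100 pitches, 45 swung at, 55 taken
swings = swing_pct(swinging_misses=10, fouls=17, in_play=18, total_pitches=100)
called = called_strike_pct(called_strikes=18, balls=37)
print(round(swings, 2), round(called, 2))  # 45.0 32.73
```

Because the second measure only looks at pitches that weren't swung at, the two can move independently of each other.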

Here's the same 3x3 chart as earlier, but this time using swinging percentage:

Pitcher ------ White Hspnc Black
White Umpire-- 45.01 45.51 46.92
Hspnc Umpire-- 44.63 45.61 43.59
Black Umpire-- 44.77 46.05 46.82
All Umpires -- 44.99 45.52 46.84

Just like in the original chart, the numbers are higher when the umpire's race matches the pitcher's race (with the exception of black pitchers facing white umpires).

Now, I suppose you could argue that these differences, also, could be attributed to umpire bias. It's possible that, knowing that more umpires are biased against them, minority pitchers have to throw down the middle to compensate. That results in batters swinging the bat more.

The problem with that theory is that the minority pitchers *improved* under this (alleged) injustice. If it's really racist bias, shouldn't they have gotten worse? Because, if the racism actually made them compensate in such a way that they got better, why wouldn't they compensate all the time, not just for umpires of the opposite race?

If you want to hold on to the hypothesis that it's umpire bias, you have to assume that the bias backfired, and that the pitchers, in their ignorance, didn't realize that there was a way to pitch better than they were already pitching. That seems farfetched.


So, the minority pitchers have a *lower* percentage of called strikes, but a *higher* percentage of swinging strikes. When I saw that, I thought it might be normal: the more batters swing, the fewer strikes remain to be called by the umpire. But, again, that turns out not to be the case. There's a strong positive relationship between called strike percentage and swinging strike percentage, with a correlation coefficient of .23 (this is for 1,350 starting pitcher seasons of 100+ IP, 2000-2009).

Why, then, are the black and hispanic pitchers bucking the trend? The only thing I can think of is that even though the correlation between called strikes and swinging strikes is positive, maybe there are certain types of pitchers who go the opposite way. For instance, maybe there are three types of pitchers:

1. Pitchers who throw right down the middle. They get a lot of swings, and, when the batter doesn't swing, it's very likely to be a strike.

2. Pitchers with poor control. They don't get a lot of swings, and, when the batter doesn't swing, it's likely to be a ball.

3. Pitchers who normally throw right down the middle, but like to waste pitches frequently (or throw a certain type of pitch that sometimes goes awry). They get a lot of swings, but, when the batter doesn't swing, it's one of those waste pitches and likely to be a ball.

Types 1 and 2 would show a positive correlation between swings and called strikes. Type 3 would show a negative correlation. If there are a lot more types 1 and 2 than type 3, the overall correlation would be positive.
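Here's a toy version of that argument. Every number is invented purely to show the mechanics: each type gets a single (swing %, called strike %) pair, and Type 3 is made rarer than the other two.

```python
import numpy as np

# Invented (swing %, called strike %) per pitcher type, for illustration only
profile = {1: (48.0, 34.0), 2: (42.0, 29.0), 3: (48.0, 29.0)}
how_many = {1: 10, 2: 10, 3: 3}

swing, called = [], []
for pitcher_type, n in how_many.items():
    s, c = profile[pitcher_type]
    swing.extend([s] * n)
    called.extend([c] * n)

r_all = float(np.corrcoef(swing, called)[0, 1])
print(round(r_all, 2))

# Type 3 sits above the league swing average but below the called-strike average
print(profile[3][0] > np.mean(swing), profile[3][1] < np.mean(called))
```

The league-wide correlation stays positive even though the Type 3 group, taken on its own, combines high swings with low called strikes.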

So, maybe black and hispanic pitchers are more likely to be Type 3. Any other explanations?


BTW, my first reaction was that this all had to do with count. In "Scorecasting," the authors found that umpires were reluctant to call a third strike or a fourth ball on a close pitch. That would explain the observations perfectly, like this: The minority pitchers get more strikeouts. So they get more two-strike pitches. Therefore, they get more batters swinging on those pitches, and also fewer called strikes on those pitches. That's enough to give us the results we saw.

Alas, the beautiful theory doesn't hold up. I reran the tables, but looking only at 0-0 pitches. Again, (a) the minority pitchers had more swings, and (b) on the remaining pitches, the minority pitchers got fewer called strikes. Numbers available on request.


So what is it that the minority pitchers have in common that gives them this unusual combination of low called strikes and high swinging strikes? I don't know, but I bet someone reading this can tell me.

For the ten black pitchers in the study, I looked at their tendencies from 2000 to 2009 (even though the study was only 2004 to 2006). The difference between their swinging strike percentage and their called strike percentage was 16.04, well above the average of 13.60. What is it about them, as a group, that would explain that?

Arthur Rhodes
CC Sabathia
Darren Oliver
Dontrelle Willis
Edwin Jackson
Ian Snell
Jerome Williams
LaTroy Hawkins
Ray King
Tom Gordon

I'd give you the hispanic pitchers -- I think there's about 30 of them -- but I don't have a list handy.


In any case, and getting back to the issue of umpire bias ...

This is where the false alarm comes in. When I saw that a higher called strike percentage means different things for different pitchers, I thought we might have an explanation: rather than the umpires calling more unmerited strikes, maybe it was just those pitchers pursuing a different strategy. Maybe they were occasionally deciding to pitch how the average white pitcher does -- whatever that is -- and getting more called strikes, but without a change in performance.

Alas, that's not true. *Between* races of pitchers, increased called strike percentage didn't mean better performance. But *within* races of pitchers, it did.

Here's the original 3x3 chart, but with RC27 instead of called strike percentage:

Pitcher ------ Whte Hspn Blac
White Umpire-- 4.97 4.77 4.49
Hspnc Umpire-- 5.15 4.59 5.88
Black Umpire-- 5.47 4.20 5.39
All Umpires -- 5.02 4.71 4.57

With the exception of the bottom-right cell and the bottom-center cell, the RC27 figures match the order of the called strike figures (see the very first chart of this post). It does seem like, as a characteristic of their style, black and hispanic pitchers successfully sacrifice called strikes in exchange for swinging strikes ... but when they *do* get those called strikes from certain umpires, they do even better.

So, pitchers *do* seem to benefit from extra called strikes, once you control for who the pitcher is. So we still have the same problem we had at the beginning.


That problem, still, appears to be that when the pitcher was hispanic, hispanic umpires called around 40 too many strikes out of 2,864 called pitches.
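For a rough sense of scale, here's what 40 pitches out of 2,864 looks like next to the plain binomial SD of that cell. This is a back-of-the-envelope sketch only: it assumes every called pitch is an independent coin flip at the observed 32.47 percent rate and ignores everything the original study controlled for.

```python
import math

called_pitches = 2864
observed_rate = 0.3247   # hispanic/hispanic cell from the first chart
excess_strikes = 40

# The excess, expressed in percentage points of called strike %
excess_points = 100.0 * excess_strikes / called_pitches

# Binomial SD of the cell percentage, assuming independent pitches
sd_points = 100.0 * math.sqrt(observed_rate * (1 - observed_rate) / called_pitches)

print(round(excess_points, 2), round(sd_points, 2))  # 1.4 0.87
```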

40 pitches doesn't seem like a lot over three years ... but it's only over the equivalent of about 30 or 40 team-games (1,349 PA). I don't really see an argument for how those 40 pitches could have been miscalled. It can't be anything the original study controlled for ... like home/road, starter/reliever, score, identity of the pitcher, etc. It would have to be an interaction of some of those things. Like, for instance, pitcher A throws a lot of inside sliders, and umpire B likes to call those strikes, and B happened to randomly umpire a lot of A's games.

But I don't see how the numbers work out. It's still 40 pitches in 30 games. With three hispanic umpires and 30 hispanic pitchers, that's 90 possible combinations. Some are more likely than others -- we're only looking at pitchers in front of 30,000 or fewer fans, which concentrates them a bit among certain teams -- but still, 90 combinations over 30 games makes it unlikely that one or two pairs would dominate to the tune of 40 pitches.

So, I thought I had an explanation ... but, after all this, I don't think I do. I still suspect that the result is just random, and not racial bias or any other explanation, but ... that's just my opinion.

Still, I need to think some more. Now that we know that more called strikes does not *always* lead to improved performance, and that it depends on the pitcher ... can you see any arguments that I'm missing, for what else might be happening?


UPDATE: OK, one more theory I thought of. Suppose pitcher style varies from game to game. Take, for instance, a hispanic pitcher. Some games, he pitches one way, and gets few called strikes and lots of swinging strikes. Other games, and independently of the umpire, he consciously decides to pitch differently, and he gets more called strikes and fewer swinging strikes.

In that case, pitches are no longer independent -- it's *games* that are independent. That means that you have to use a different statistical technique, like cluster sampling. The bottom line, there, is that the SD goes way up. The results stay the same, but the confidence interval widens and the statistical significance disappears.
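A quick simulation shows the effect. All the parameters here are assumptions picked for illustration (40 games, 70 called pitches per game, a 31 percent baseline, and a game-to-game SD of 5 points in the pitcher's true rate); none of them are estimates from real data.

```python
import numpy as np

def sd_of_cell_pct(n_games, pitches_per_game, mean_rate, game_sd,
                   n_sims=4000, seed=0):
    """Simulated SD (in percentage points) of a cell's called strike %."""
    rng = np.random.default_rng(seed)
    # Each game gets its own true strike rate; game_sd=0 means no clustering
    rates = rng.normal(mean_rate, game_sd, size=(n_sims, n_games))
    rates = np.clip(rates, 0.0, 1.0)
    strikes = rng.binomial(pitches_per_game, rates)
    cell_pct = 100.0 * strikes.sum(axis=1) / (n_games * pitches_per_game)
    return float(cell_pct.std())

independent = sd_of_cell_pct(40, 70, 0.31, game_sd=0.0)
clustered = sd_of_cell_pct(40, 70, 0.31, game_sd=0.05)
print(round(independent, 2), round(clustered, 2))
```

The expected called strike percentage is the same in both runs; only the spread changes, which is the point about the confidence interval widening.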

So, if there's evidence that pitchers' expected percentages change on a game-by-game basis (that is, the *expectations* have to change due to pitcher behavior, not just the outcome of the game fluctuating because of random variation), that probably negates the statistical significance, which is the only reason to suspect umpire bias.


Friday, July 08, 2011

Presentation on home-field advantage

I've posted the slides for my SABR presentation on home-field advantage (.ppt).

Nothing new here ... everything in the slides I've posted about previously.


Sunday, July 03, 2011

Home field advantage on pitch calls, by count

Did a bit of last minute research before putting together my presentation on home field advantage (HFA) for the SABR convention.

In "Scorecasting," Toby Moskowitz and Jon Wertheim wrote that HFA on ball-strike calls varies with the importance of the situation. They said that in clutch plate appearances, HFA is very high -- but, when it doesn't matter much, HFA actually goes the *other* way, and visiting pitchers actually get the benefit of more called strikes than home batters. They concluded that biased umpires are favoring the home team, but trying to compensate the visiting team by calling more strikes for them when it's not as important.

A few months ago, MGL did a study, and found some confirmation for Scorecasting's results. He did find that HFA went up with leverage. However, he didn't find any situations in which the visiting team actually had an advantage -- just situations in which the home team had less of an advantage than usual.

So I tried the same thing today (but not as rigorously). I used 2000-2009 Retrosheet data, and I got similar results.

Overall, not looking at leverage yet, the home pitchers got 0.6 percentage points more strikes than the visiting pitchers. (Specifically, the visiting team had 31.2% of their called pitches ruled strikes, but the home team had 31.8% of theirs ruled strikes.)

In certain higher-leverage situations (for which I used 8th inning or later, score tied), the difference was higher -- 1.2 percentage points. In another higher-leverage situation (9th inning or later, tying run at the plate), the difference was also higher -- 1.0 percentage points.

But in lower-leverage situations (one team leads by 5 runs or more), the difference was only 0.3 points.


0.3 -- one team leading by 5+ runs
0.6 -- all situations
1.0 -- ninth inning+, tying run at bat
1.2 -- eighth inning+, score tied

Another thing I did is, for all these situations, I computed the HFA in terms of the outcomes of the plate appearances. Here are the home team advantages by wOBA points:

.0010 -- one team leading by 5+ runs
.0013 -- all situations
.0018 -- ninth inning+, tying run at bat
.0025 -- eighth inning+, score tied

As expected, an excellent correlation between HFA on ball/strike, and HFA on eventual outcome.

But here's something interesting: the home/road difference on what percentage of pitches were swung at (including foul balls and balls in play):

0.48 -- one team leading by 5+ runs
0.54 -- all situations
0.88 -- ninth inning+, tying run at bat
0.59 -- eighth inning+, score tied

For instance, overall, home teams swung at 44.9 percent of pitches, but road teams swung at 45.4 percent of pitches.

So, not only did home teams have fewer strikes *called* against them (first table), but they also had fewer *swings* (third table). That suggests that visiting teams actually throw fewer strikes than home teams, since this result holds even on pitches where the umpire has no say.

But, you could argue otherwise. It's possible that the swing difference is because the home batters know they're going to get more marginal calls in their favor, so they don't swing on iffy pitches in order to work a walk. It's also possible that batters are worse on the road, and they can't tell a good pitch from a bad pitch quite as well as they can at home.

So I'm not sure if we can draw any conclusions from this, but I thought it was worth mentioning.


Anyway, that does confirm the "Scorecasting" basic findings. But it occurs to me that what might be causing this is just the different ball/strike counts.

As I said above, when an umpire called a home pitch, it was a strike 31.8 percent of the time. But, I checked, and when an umpire called a home pitch *on an 0-0 count*, it was a strike 43.0 percent of the time. That's a big difference. Maybe it extends to home/road differences too?

It does. The HFA was also much bigger on 0-0: instead of 0.6 percentage points, it was 0.9 percentage points.

So maybe HFA is lower in low-leverage situations just because when a game is a blowout, teams pitch differently and you get a different frequency of the different counts. So, what I did was break down all pitches by count, and by leverage group. The leverage groups were:

High ..... 8th inning or later, 0-1 run difference
Low ...... One team leading by 6+ runs
Average .. All other plate appearances.

Here are the results, in percentage points of HFA (called strikes as a percentage of all called pitches). Standard errors are in parentheses.

-------------------------------- Leverage -----------------
----------------- Average -------- High ----------- Low ---
0-0 count ..... 1.03 (0.09) ... 1.02 (0.20) ... 0.66 (0.28)
0-1 count ..... 0.63 (0.14) ... 0.77 (0.31) ... 0.79 (0.43)
0-2 count ..... 0.51 (0.20) ... 0.62 (0.45) .. -0.13 (0.64)
1-0 count ..... 0.92 (0.14) ... 1.54 (0.31) ... 0.38 (0.44)
1-1 count ..... 0.65 (0.15) ... 0.35 (0.34) .. -0.22 (0.48)
1-2 count ..... 0.83 (0.17) ... 0.72 (0.37) ... 0.56 (0.53)
2-0 count ..... 0.98 (0.23) ... 1.11 (0.52) ... 1.18 (0.75)
2-1 count ..... 0.80 (0.20) ... 0.75 (0.46) ... 0.22 (0.65)
2-2 count ..... 0.56 (0.18) ... 0.44 (0.40) ... 1.05 (0.58)
3-0 count ..... 1.07 (0.37) ... 2.35 (0.86) ... 1.15 (1.20)
3-1 count ..... 0.90 (0.30) ... 1.31 (0.69) ... 0.64 (0.97)
3-2 count ..... 0.56 (0.22) ... 1.25 (0.50) ... 1.77 (0.72)

Is there evidence here that HFA depends on leverage? If you compare the average leverage to the high leverage, you get that the high-leverage situations have a higher HFA in 7 out of the 12 cases -- barely more than half. Comparing average to low, you get more HFA for the average situations again 7 out of 12 times. And, comparing high to low, the "high" only win 8 out of 12.
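Those head-to-head counts are easy to check straight from the table. A minimal sketch, with the values copied from the chart above:

```python
# (count, average, high, low) HFA in percentage points, from the table above
hfa = [
    ("0-0", 1.03, 1.02, 0.66),
    ("0-1", 0.63, 0.77, 0.79),
    ("0-2", 0.51, 0.62, -0.13),
    ("1-0", 0.92, 1.54, 0.38),
    ("1-1", 0.65, 0.35, -0.22),
    ("1-2", 0.83, 0.72, 0.56),
    ("2-0", 0.98, 1.11, 1.18),
    ("2-1", 0.80, 0.75, 0.22),
    ("2-2", 0.56, 0.44, 1.05),
    ("3-0", 1.07, 2.35, 1.15),
    ("3-1", 0.90, 1.31, 0.64),
    ("3-2", 0.56, 1.25, 1.77),
]

high_beats_avg = sum(1 for _, avg, high, low in hfa if high > avg)
avg_beats_low = sum(1 for _, avg, high, low in hfa if avg > low)
high_beats_low = sum(1 for _, avg, high, low in hfa if high > low)
print(high_beats_avg, avg_beats_low, high_beats_low)  # 7 7 8
```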

Doesn't seem like much. But I think I've just diced up the data so finely that you can't see the real pattern any more. It looks like all three of the lowest differences do appear in the low-leverage column (although that could be partly because the SDs are high there, so you expect more extreme values than in the other columns).

Here's the weighted average of each of the three columns, with all three columns using the same weights: each count is weighted by the smaller of its frequencies (home, road) in column 1:

0.82 (~ 0.04 SD) overall
0.92 (~ 0.10 SD) high leverage
0.58 (~ 0.14 SD) low leverage
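The pitch frequencies used as weights aren't listed in the post, so the sketch below substitutes a stand-in: 1/SE² from the "Average" column, on the assumption that those standard errors shrink roughly with the square root of the number of pitches. With that assumed weighting, the three averages land close to the figures above, but the result should be read as an approximation and not a reproduction.

```python
import math

# (count, avg, avg SE, high, high SE, low, low SE), from the count table above
rows = [
    ("0-0", 1.03, 0.09, 1.02, 0.20, 0.66, 0.28),
    ("0-1", 0.63, 0.14, 0.77, 0.31, 0.79, 0.43),
    ("0-2", 0.51, 0.20, 0.62, 0.45, -0.13, 0.64),
    ("1-0", 0.92, 0.14, 1.54, 0.31, 0.38, 0.44),
    ("1-1", 0.65, 0.15, 0.35, 0.34, -0.22, 0.48),
    ("1-2", 0.83, 0.17, 0.72, 0.37, 0.56, 0.53),
    ("2-0", 0.98, 0.23, 1.11, 0.52, 1.18, 0.75),
    ("2-1", 0.80, 0.20, 0.75, 0.46, 0.22, 0.65),
    ("2-2", 0.56, 0.18, 0.44, 0.40, 1.05, 0.58),
    ("3-0", 1.07, 0.37, 2.35, 0.86, 1.15, 1.20),
    ("3-1", 0.90, 0.30, 1.31, 0.69, 0.64, 0.97),
    ("3-2", 0.56, 0.22, 1.25, 0.50, 1.77, 0.72),
]

# Assumed stand-in for the column-1 pitch frequencies
weights = [1.0 / row[2] ** 2 for row in rows]
total_weight = sum(weights)

def weighted_column(value_index, se_index):
    """Weighted mean of one column and the SE of that weighted mean."""
    mean = sum(w * row[value_index] for w, row in zip(weights, rows)) / total_weight
    se = math.sqrt(sum((w * row[se_index]) ** 2 for w, row in zip(weights, rows))) / total_weight
    return mean, se

avg_mean, avg_se = weighted_column(1, 2)
high_mean, high_se = weighted_column(3, 4)
low_mean, low_se = weighted_column(5, 6)
for label, mean, se in [("average", avg_mean, avg_se),
                        ("high", high_mean, high_se),
                        ("low", low_mean, low_se)]:
    print(f"{label}: {mean:.2f} (SE about {se:.2f})")
```

Using one set of weights for all three columns is what removes the count effect: each column is averaged as if it had the same mix of ball-strike counts.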

So there is something there, although smaller than it looked before adjusting for count. But the differences are not statistically significant, although the low-leverage one is close.

Conclusion: from 2000 to 2009, home teams were somewhat more likely to get a strike call in higher-leverage situations than in lower-leverage situations. This is significant only at approximately p=0.1.


Friday, July 01, 2011

Umpires and racial bias (encore presentation)

A few people have written me about this Freakonomics post, from Daniel Hamermesh, the co-author of a just-published study that found racial bias among MLB umpires.

That study is an update of the paper that came out a few years back. The basic data is the same, but the authors have updated the discussion and added a section with PitchFX data.

I haven't gone through the new stuff in detail -- I'll have to do that after the SABR convention, when I have more time. However, the basic data appear to be the same as in the previous version. So the comments I made about the paper back then still stand:

1. The authors of the paper explicitly assume, in their model, that every umpire has the same racial bias in favor of his own race. I think that assumption is unwarranted. I think it's much more likely that umpires have individually different levels of bias, just like everyone in the real world.

2. If you relax the "every umpire is the same" restriction, it's possible that there is much less racial bias than the authors found. For example, if you remove one single umpire from the study, the one whose calls favored his own race the most, the results are no longer statistically significant.

3. Therefore, while I agree that the authors found the results to be statistically significantly different from "no bias at all," I believe the conclusion -- that racial bias is pervasive among umpires -- is not supported by the data.

If you want to see my reasoning in detail, go to my website and look for the first entry under "Research". There's a PowerPoint presentation, an article (.PDF), and a series of nine blog posts. I recommend the article first, and the ninth blog post.
