Thursday, July 28, 2011

More fastballs = fewer called strikes

A couple of weeks ago, I noticed that, from 2004 to 2006, even though hispanic and black pitchers received a lower percentage of called strikes than white pitchers (called strikes as a percentage of called pitches), they were able to post above-average numbers.

The reason, it turned out, was that despite not getting as many called strikes, they got a lot more *swinging* strikes, and that more than compensated.

I wondered why that would happen, what was so special about those pitchers. Then, commenter GuyM e-mailed me a suggestion: it looked like the ten pitchers I highlighted were all fastball pitchers.

I went over to Fangraphs and looked them up ... and Guy was right. With the exception of Ray King, the other nine pitchers threw fastballs at or above the MLB-average rate.

So, I did a more formal test. For 2004, 2005, and 2006 (separately), I split the league into the usual nine pitcher/umpire combinations (white/hispanic/black), and figured out the average fastball percentage (FB%) for each group that year. (I didn't have breakdowns on a per-pitch basis, so I used the player's overall season rate for each cell.)

Here's 2005:

Pitcher ------ White Hspnc Black
White Umpire-- 62.01 61.87 67.86
Hspnc Umpire-- 61.74 64.91 70.89
Black Umpire-- 62.20 60.57 66.78

There's a big bump in the H/H row and column -- a lot more fastballs than you would expect. It would be hard to argue that that's racial bias, since the choice of pitch is a deliberate decision by the pitcher and catcher.

It just seems like, in 2005, the H/H pitchers happened to throw a lot of fastballs.

The situation was reversed in 2006:

Pitcher ------ White Hspnc Black
White Umpire-- 61.09 60.50 62.73
Hspnc Umpire-- 61.93 58.70 58.31
Black Umpire-- 60.80 61.72 61.53

Suddenly, the H/H group is throwing many FEWER fastballs. Actually, it looks like fastballs were down across the board in 2006 -- I bet that was a change in how the stringers recorded pitches, rather than an actual change in what pitchers threw. In any case, even after adjusting for that, the H/H group is low.

So what's going on? Well, it's probably just different pitchers who make up that cell. It's somewhere around 1,000 pitches each year, which means the equivalent of maybe 20 hispanic pitchers starting against hispanic umpires. Just by chance, the 20 pitchers in 2005 were fastball pitchers, and the 20 pitchers in 2006 weren't.

Finally, here's 2004, just for completeness. It doesn't really show anything interesting.

Pitcher ------ White Hspnc Black
White Umpire-- 61.81 61.86 66.52
Hspnc Umpire-- 61.66 61.88 64.75
Black Umpire-- 62.66 65.54 66.40

So, as I was saying ... we want to figure out whether more fastballs go along with more called strikes, or fewer. To find out, I ran a regression to predict fastball percentage from called-strike percentage, using all 27 cells in the three tables above. Since the overall FB% seems to vary from year to year, I added two dummy variables for the individual seasons.

The result: an r-squared of 0.4, and statistical significance. More important is the regression equation itself: for every 1 percentage point more called strikes you get, you're likely to have thrown 1.67 percentage points fewer fastballs.

When I took out the bottom two cells in each of the "Black" columns (in which the sample sizes are very small, around 100 and 300 pitches respectively), the result was even more significant (r-squared 0.53), and the coefficient dropped from 1.67 to 1.1 percentage points of fastballs per point of called strikes.
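For anyone who wants to try this kind of regression themselves, here is a minimal sketch of the idea, not the actual calculation used for this post. It assumes that season dummies can be handled by subtracting each season's average from both variables before fitting a single slope, which gives the same slope as including the dummies. The function name `slope_with_season_effects` is made up for illustration, and the numbers fed to it are placeholders built with a known slope of -1.5, not real data.

```python
from collections import defaultdict

def slope_with_season_effects(cs, fb, season):
    """Slope of FB% on CS%, letting each season have its own intercept.

    Subtracting season means from both variables is equivalent to
    adding season dummy variables to the regression.
    """
    # Accumulate per-season totals: [sum of cs, sum of fb, count]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for c, f, s in zip(cs, fb, season):
        totals[s][0] += c
        totals[s][1] += f
        totals[s][2] += 1

    sxy = 0.0  # sum of (cs deviation) * (fb deviation)
    sxx = 0.0  # sum of (cs deviation) squared
    for c, f, s in zip(cs, fb, season):
        c_dev = c - totals[s][0] / totals[s][2]
        f_dev = f - totals[s][1] / totals[s][2]
        sxy += c_dev * f_dev
        sxx += c_dev * c_dev
    return sxy / sxx

# PLACEHOLDER data (not from any real season): three cells per season,
# each season with its own FB% level, and a built-in slope of -1.5.
cs = [30.0, 31.0, 32.0] * 3
season = [2004] * 3 + [2005] * 3 + [2006] * 3
level = {2004: 110.0, 2005: 111.0, 2006: 108.0}
fb = [level[s] - 1.5 * c for c, s in zip(cs, season)]

slope = slope_with_season_effects(cs, fb, season)
print(slope)
```

With real data, you would pass in the 27 called-strike percentages, the 27 fastball percentages, and the season for each cell. This sketch gives every cell equal weight, so small cells count as much as large ones.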

So, we have a pretty good indication that more fastballs cause fewer called strikes. Technically, I shouldn't assume causation -- the data leave open the possibility that fewer called strikes cause fastballs, or that some third variable causes both lots of fastballs and fewer called strikes. But neither of those seems very plausible.


Here's a more intuitive way to see the relationship. This is 2005 again, for fastballs:

Pitcher ------ White Hspnc Black
White Umpire-- 62.01 61.87 67.86
Hspnc Umpire-- 61.74 64.91 70.89
Black Umpire-- 62.20 60.57 66.78

And here's 2005 for called strikes:

Pitcher ------ White Hspnc Black
White Umpire-- 32.15 31.20 31.74
Hspnc Umpire-- 31.55 31.04 24.19
Black Umpire-- 31.39 31.53 30.88

If you compare the charts, you can see for yourself that the high FB% cells generally seem to be paired with low CS%.
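If you'd rather not eyeball it, the comparison can be put into numbers. What follows is only a rough sketch using the nine 2005 cells from the two charts above; it is not the 27-cell regression described earlier. It treats every cell equally, so the tiny Hspnc-umpire/Black-pitcher cell (70.89 FB%, 24.19 CS%) has a lot of influence on the answer.

```python
from math import sqrt

# 2005 cells, read row by row (White, Hspnc, Black umpire;
# within each row: White, Hspnc, Black pitcher)
fb_2005 = [62.01, 61.87, 67.86,
           61.74, 64.91, 70.89,
           62.20, 60.57, 66.78]
cs_2005 = [32.15, 31.20, 31.74,
           31.55, 31.04, 24.19,
           31.39, 31.53, 30.88]

def deviations(values):
    """Each value minus the mean of the list."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def pearson_r(x, y):
    """Unweighted Pearson correlation between two equal-length lists."""
    dx, dy = deviations(x), deviations(y)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def ols_slope(x, y):
    """Slope from regressing y on x (one predictor, with intercept)."""
    dx, dy = deviations(x), deviations(y)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

r = pearson_r(fb_2005, cs_2005)
# Same direction as the regression in this post: FB% predicted from CS%
fb_per_cs_point = ols_slope(cs_2005, fb_2005)

print("correlation:", round(r, 2))
print("FB% points per CS% point:", round(fb_per_cs_point, 2))
```

Both numbers come out negative for 2005, which is what the charts suggest. A version weighted by pitches per cell would need the sample sizes, which aren't shown here.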


Another important thing is that, now, we can't assume that when a pitcher gets few called strikes, his performance suffers. In fact, if the reason for fewer called strikes is more fastballs, it could be the other way around.

For instance, in the center cell in 2005, where the hispanic pitchers got only 31.04 percent called strikes, they gave up a very good 3.76 RC27 (like a 3.50 ERA). But in 2006, when they got 34.16% called strikes (which is very high), the batters facing them had an RC27 of 5.52. The more called strikes, the worse the performance. Very much opposite to the way you'd think.

That's when we look mostly *between* pitchers -- pitcher A, with more called strikes, is likely to be worse than pitcher B, with fewer called strikes. We don't know the relationship within the *same* pitcher. If pitcher A gets more called strikes in one start than another, is he likely to be worse in that start? We don't know.

So, when the Hamermesh study asserts that the H/H group benefits from the umpires having called more strikes in their favor, that's not necessarily true. It might be, but it also might not be. It's certainly true if the cause IS umpire bias, because that just changes the identical pitch from a ball to a strike. But if the cause is pitch selection, the relationship could be the exact opposite.


Now, in my own little study, which was an attempt to reproduce the results of the original Hamermesh study, I did indeed find that the CS% in the "hispanic/hispanic" cell was very high. Now, we have an explanation other than umpire bias -- pitching style. It could just be that the overall H/H cell had fewer fastball pitchers than expected, and that caused the results.

But, while that would explain *my* results, it won't explain the original Hamermesh results. That's because the Hamermesh study controlled for the identity of the pitcher. So, if the center cell did indeed feature a lot of finesse pitchers, their study would have adjusted for that, even though mine didn't.

Still, we have a possible *weaker* explanation. Suppose that pitchers vary their fastball tendencies from year to year. One season, they might throw 65% fastballs, but, when they're a year or two older, their slider improves, and now they only throw 55% fastballs. The Hamermesh study adjusted for the identity of the player, but not for the individual player/season. So, if hispanic pitcher X threw 55% fastballs in the season where he faced the hispanic umpire, but 60% fastballs in the season where he faced the white umpire, that would bias the results and make it look like the umpire was biased.

Or, even more granular: if pitchers change their repertoire *from game to game*, that would also do it. For instance, suppose hispanic pitcher Y finds out his curve ball isn't working well one game, and relies more on his fastball. If that happened more in games where the umpire was white, then, again, that would make the hispanic umpire look biased in his favor.

It's important to keep in mind that this is a valid criticism only if pitch selection differences are clustered over games or seasons. If a pitcher randomly decides to throw a fastball this pitch, but a breaking ball next pitch, that randomness is already reflected in the significance levels of the original study. It's only when the fastballs are *clustered* within umpires, rather than random over pitches, that the significance levels are affected.
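To make the clustering point concrete, here is a small back-of-the-envelope sketch. It compares the variance of one game's called-strike rate in two worlds: one where every game is 60% fastballs with the pitch type chosen at random each pitch, and one where half the games are 50% fastballs and half are 70%. Everything numeric in it is an assumption for illustration only: 50 called pitches per game, and called-strike chances of 29% on fastballs and 35% on other pitches. None of those figures come from the data in this post.

```python
def strike_prob(fb_share, p_fb, p_other):
    """Chance a called pitch is a strike, given the game's fastball share."""
    return fb_share * p_fb + (1 - fb_share) * p_other

def game_rate_variance(fb_shares, n, p_fb, p_other):
    """Variance of one game's called-strike rate over n called pitches,
    when the game's fastball share is equally likely to be any value
    in fb_shares (total variance = within-game + between-game)."""
    probs = [strike_prob(f, p_fb, p_other) for f in fb_shares]
    mean_p = sum(probs) / len(probs)
    # Average binomial variance of the rate within a game
    within = sum(p * (1 - p) for p in probs) / len(probs) / n
    # Extra variance because games differ in their true strike chance
    between = sum((p - mean_p) ** 2 for p in probs) / len(probs)
    return within + between

# HYPOTHETICAL inputs, chosen only to illustrate the mechanism
N_CALLED = 50
P_FB, P_OTHER = 0.29, 0.35

independent = game_rate_variance([0.6], N_CALLED, P_FB, P_OTHER)
clustered = game_rate_variance([0.5, 0.7], N_CALLED, P_FB, P_OTHER)
inflation = clustered / independent

print("variance, random pitch by pitch:", independent)
print("variance, clustered by game:   ", clustered)
print("inflation factor:", inflation)
```

The average fastball share is 60% in both worlds, so the average called-strike rate is the same; only the spread changes. How much the spread grows depends entirely on the inputs, so the inflation factor here says nothing about the real size of the effect. It only shows the direction: a test that assumes independent pitches will understate the true variance whenever pitch selection clusters by game.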


So where does this leave us? Well, we haven't really found any smoking gun evidence that explains what the Hamermesh study found, since that study did control for who the pitcher is (which means they effectively controlled for fastball percentage). However, we *do* have a potential explanation, which is non-random pitch selection.

Normally, I hate when a study is criticized on the grounds of "you didn't control for X". That's a lazy argument, and it's an argument that can be leveled at any study, because, no matter how thorough, there's always *something* that hasn't been controlled for. Also, there's often no reason to believe X is important to control for. And, even if it is, there's no reason to believe that it's non-randomly distributed among the other variables.

In order to be taken seriously when you say "you didn't control for X," you need to come up with (a) an argument that X is actually an important factor, important enough to change the results, and (b) a reason to believe X is distributed non-randomly.

That's what I'm trying to do here. First, (a) I think I have proven that pitch type does seriously and significantly affect called strike percentage. Second (b), it's plausible that pitch type may vary *by the conscious choice of the pitcher* over seasons, and perhaps even games.

If I knew for sure that (b) happened -- if we had data that showed that it was common that, for some games a pitcher chooses to throw 70% fastballs, and some games he chooses to throw only 50% fastballs -- that would be enough to prove that the Hamermesh study's confidence intervals were overstated. Since we don't, it's just a possibility.

We don't know *for sure* that pitch types tend to cluster together. But it's a reasonable thing to look at in a future study. Based on the little I've looked at it so far, I suspect that it's a small but important factor.


P.S. Thanks to GuyM for his e-mail discussion, and to Fangraphs' David Appelman for assistance in getting the FB% data I needed.



At Friday, July 29, 2011 10:03:00 AM, Blogger BMMillsy said...

"So, we have a pretty good indication that more fastballs cause fewer called strikes."

I actually find that it's the velocity that drives this. While the coefficient on the fastball indicator in my models is very positive (when compared to off-speed), the velocity coefficient is negative.

These two even out when the fastball is about 10 mph harder than the other pitches thrown, and if the difference is larger, then the probability of a fastball strike is lower, given the location and count and other things.

In other words, all else equal, a 90 mph fastball will be called a strike at the same rate as an 80 mph change-up (or something around there). A faster FB would be less likely to be called a strike.

Cutters, however, tend to be called strikes at a lower rate than other fastballs (the coefficient is about half so that an 84 or 85 mph cutter would be called at the same rate as an 80 mph change-up).

I'm still not sure that this isn't just picking up some excluded variable, though.

At Friday, July 29, 2011 10:08:00 AM, Blogger Phil Birnbaum said...

Makes sense ... based on the rule of thumb that the harder you throw, the less control you have.

Thanks! So now we have a second factor driving CS%: pitch speed, in addition to pitch type.

At Friday, July 29, 2011 10:23:00 AM, Blogger BMMillsy said...

"based on the rule of thumb the harder you throw, the less control you have."

Likely true; however, the effects I describe above are actually controlling for pitch location and a host of other factors.

So it seems that the velocity gets in the way of the umpire being able to actually judge where the pitch is. But so does pitch movement.

At Friday, July 29, 2011 10:34:00 AM, Blogger Phil Birnbaum said...

Ah, interesting! What percentage of (say) fastballs does the umpire get wrong, typically?

At Friday, July 29, 2011 10:45:00 AM, Blogger BMMillsy said...

Might take a while to get the "wrong" calls tabulated.

However, this post did lead me to revisit velocity by race. Here is what I have:

All Pitches:
Black: 88.73 mph
Hispanic: 87.59 mph
White: 86.60 mph

Fastballs (FA, FF, FT, FC, SI):
Black: 92.25
Hispanic: 91.51
White: 90.49

At Friday, July 29, 2011 10:54:00 AM, Blogger BMMillsy said...

Ah, never mind about taking a while. I have the Sensitivity/Specificity by pitch for all umps. Forgot about that.

Keep in mind I define the zone as the rulebook zone (width of the plate) and Mike Fast's suggestions for top and bottom of the zone. "Correctness" would probably increase with a wider zone.

For All Pitches inside the zone, 86.68% are called Strikes correctly. For All Pitches outside the zone, 85.35% are correctly called Balls.

For FA classification:
85.86% vs. 82.73%

For FF classification:
87.45% vs. 82.86%

For FC classification:
87.51% vs. 85.05%

For FT classification:
87.53% vs. 84.27%

For SI classification:
87.70% vs. 83.87%

HOWEVER, it is important to keep in mind that these are not conditional on count. Things change fairly dramatically depending on who the count favors (as J-Doug and others have shown).

At Friday, July 29, 2011 11:24:00 AM, Blogger Mike Fast said...

I definitely agree with Millsy's last paragraph. I'm not sure how we can do any umpire studies until we agree on what the strike zone is or ought to be. And that is much harder than you might imagine!

Or at least studies should define very carefully what strike zone they are using and be very explicit about the fact that umpires are being judged relative to that artificially-defined zone rather than relative to any impact on the game in real life.

