Sunday, September 27, 2009

A game theory study on pitch selection

Commenter "Eddy" was kind enough to send me this link to what looks like a press release on a new study by Kenneth Kovash and Steve Levitt (of Freakonomics fame). The link is to a summary only; to get the actual study, I had to pay $5.

The study is in two parts: one baseball, and one football. I'll talk about the baseball results here; I'll send the football portion of the study to Brian Burke (of "Advanced NFL Stats") in case he wants to review it himself. I hope that's allowed under fair use and I don't have to pay another $5.

In the baseball half, the authors claim that pitchers throw too many fastballs. They would do better -- much better, in fact -- if they threw other kinds of pitches more often.

How can you tell, using game theory, whether fastballs are being overused? Simple: you just check the outcomes. If opposition hitters bat for an OPS of .850 when you throw a first-pitch fastball, but they OPS (can you use OPS as a verb?) .800 when you don't, then obviously you should cut back on the fastballs. At first glance, it might look like you should get rid of them entirely, because, that way, you could shave .050 off the opposition's OPS. But it's not that simple: as soon as the opposition realizes that you're not throwing fastballs, they'll be able to predict your pitches more accurately, and they'll wind up OPSing higher than .800 -- probably even higher than the original .850. Game theory can't tell you the right proportion, at least not without having to make assumptions that would probably be wrong. But it *can* tell you that you should adjust your strategy until the OPS-after-fastball is exactly equal to the OPS-after-non-fastball.
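To make the equalization idea concrete, here is a minimal sketch in Python. The four OPS values are invented for illustration (they are exactly the kind of assumptions that would probably be wrong), and `solve_2x2` is a hypothetical helper, not anything from the paper. It treats a single pitch as a 2x2 zero-sum game in which the batter guesses fastball or non-fastball, and solves for the mix at which neither side can gain by changing.

```python
def solve_2x2(a, b, c, d):
    """Mixed-strategy solution of a 2x2 zero-sum game with no pure-strategy solution.

    All four inputs are ASSUMED batter OPS values:
      a: fastball thrown, batter guessed fastball
      b: fastball thrown, batter guessed non-fastball
      c: non-fastball thrown, batter guessed fastball
      d: non-fastball thrown, batter guessed non-fastball
    """
    denom = a - b - c + d
    p_fastball = (d - c) / denom  # pitcher's fastball rate that leaves the batter indifferent
    q_guess_fb = (d - b) / denom  # batter's guess-fastball rate that leaves the pitcher indifferent
    ops_after_fb = q_guess_fb * a + (1 - q_guess_fb) * b
    ops_after_nf = q_guess_fb * c + (1 - q_guess_fb) * d
    return p_fastball, q_guess_fb, ops_after_fb, ops_after_nf


# Hypothetical inputs: guessing right helps the batter, guessing wrong hurts.
p, q, ops_fb, ops_nf = solve_2x2(a=1.000, b=0.700, c=0.650, d=0.900)
print(f"fastball rate {p:.3f}, guess-fastball rate {q:.3f}")
print(f"OPS after fastball {ops_fb:.3f}, after non-fastball {ops_nf:.3f}")
```

With these made-up inputs the two OPS figures come out equal (about .809), which is the condition described above. Different inputs would give a different fastball rate, which is why the proportion itself can't be pinned down without trusting the assumptions.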

If that's what the Kovash/Levitt study did, it would be great. But it didn't. Instead, it did something that doesn't make sense, and makes almost all its conclusions invalid.

What did it do? It considered outcomes only for pitches that ended the "at bat". (The authors say "at bat", but I think they mean "plate appearance". I'll also use "at bat" to mean "plate appearance" for consistency with the paper.)

That's a huge selective sampling issue. It means that when a pitch on a 3-0 count is a ball, you count it; when it's put in play, you count it; but when it's a strike, you don't include it. That doesn't work. I can make up some data to show you why. Suppose:

-- Fastballs are 50% put in play, for an OPS of 1.000
-- Fastballs are 50% strikes, for an OPS of .800 after the 3-1 count.

-- Non-fastballs are 25% put in play, for an OPS of .900
-- Non-fastballs are 25% strikes, for an OPS of .800 after the 3-1 count
-- Non-fastballs are 50% balls, for an OPS of 1.000.

That summarizes to:

0.900 OPS for fastball
0.925 OPS for non-fastball

Clearly, you should throw a fastball, right?

But if you consider only the last pitch of the at-bat, you have to ignore those 3-1 counts. Then you get:

1.000 OPS for fastball
0.967 OPS for non-fastball

And it looks like you should throw *fewer* fastballs, not more. And that's wrong.
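The made-up numbers above can be checked with a few lines of Python. This is only a sketch of the weighted-average arithmetic: the dictionary layout and the `weighted_ops` helper are hypothetical names, and it treats OPS as if it averaged linearly, which is a simplification.

```python
# (share of pitches, OPS that eventually results) for each outcome of a 3-0 pitch.
# These are the invented figures from the example, not real data.
fastball = {"in_play": (0.50, 1.000), "strike": (0.50, 0.800)}
non_fastball = {
    "in_play": (0.25, 0.900),
    "strike": (0.25, 0.800),
    "ball": (0.50, 1.000),
}


def weighted_ops(outcomes, terminal_only=False):
    """Share-weighted OPS; with terminal_only=True, strikes (3-1 counts) are dropped."""
    kept = {
        name: value
        for name, value in outcomes.items()
        if not (terminal_only and name == "strike")
    }
    total_share = sum(share for share, _ in kept.values())
    return sum(share * ops for share, ops in kept.values()) / total_share


for label, outcomes in (("fastball", fastball), ("non-fastball", non_fastball)):
    print(
        f"{label}: all pitches {weighted_ops(outcomes):.3f}, "
        f"last pitch only {weighted_ops(outcomes, terminal_only=True):.3f}"
    )
```

Counting every pitch gives .900 for fastballs and .925 for non-fastballs. Dropping the strikes gives 1.000 for fastballs and roughly .967 for non-fastballs under these weights, so the ranking flips.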

This kind of thing is exactly what Kovash and Levitt have done. They think they've shown that the fastball is a worse pitch than the non-fastball. But what they've *really* shown is that the fastball is a worse pitch than the non-fastball only if you ignore the fact that, when the pitch doesn't end the at-bat, the fastball is more likely to move the count in the pitcher's favor.

So I don't think their main regression result, the one in Table 4, holds water, and I don't think there's a way for the reader to work around it. If the authors just reran that regression, but considered the outcome even if it wasn't the last pitch of the at-bat, that would fix the problem. I'm not sure why they chose not to do that.
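One way to picture "considering the outcome even if it wasn't the last pitch" is to credit the eventual result of the plate appearance to every pitch thrown in it. The sketch below does that in Python on four invented plate appearances. The record layout and `ops_by_pitch_type` are hypothetical, and this is a simple tabulation rather than the authors' regression, which has many more controls.

```python
from collections import defaultdict

# Each plate appearance: (pitch types in order, (reached_base, is_at_bat, total_bases)).
# All four records are invented for illustration.
plate_appearances = [
    ("FN", (0, 1, 0)),    # fastball, then a non-fastball grounded into an out
    ("FFN", (1, 1, 2)),   # double on the third pitch
    ("NNNN", (1, 0, 0)),  # four-pitch walk
    ("F", (1, 1, 1)),     # first-pitch single
]


def ops_by_pitch_type(pas, terminal_only):
    """OPS credited to each pitch type, using all pitches or only the last one."""
    # Per pitch type: [times on base, pitches credited, at-bats, total bases]
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for pitches, (on_base, at_bat, bases) in pas:
        credited = pitches[-1:] if terminal_only else pitches
        for pitch in credited:
            row = totals[pitch]
            row[0] += on_base
            row[1] += 1
            row[2] += at_bat
            row[3] += bases
    return {
        pitch: row[0] / row[1] + (row[3] / row[2] if row[2] else 0.0)
        for pitch, row in totals.items()
    }


print("last pitch only:", ops_by_pitch_type(plate_appearances, terminal_only=True))
print("every pitch:    ", ops_by_pitch_type(plate_appearances, terminal_only=False))
```

Note that crediting every pitch gives long plate appearances more weight, since a four-pitch walk is counted four times. Whether that weighting is appropriate is a design choice this sketch doesn't settle.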


Still, there are some other aspects of the study that are interesting.

In Table 2, the authors show results for every count separately. On 3-2, every pitch is the last pitch of the AB (except for foul balls, which the authors actually included in the study, but which don't affect the results). Therefore, the change in count isn't a consideration, and we can take the results at close to face value.

So what happens? There is indeed a big difference between fastballs and non-fastballs:

.769 OPS after a 3-2 fastball
.651 OPS after a 3-2 non-fastball.

This would certainly lead to a conclusion that pitchers are throwing too many 3-2 fastballs, and the results stunned me: I didn't expect this big a difference. But then it occurred to me: most of the OPS on 3-2 is walks. And walks are undervalued in OPS. If a 3-2 fastball results in more balls in play, but the 3-2 curveball (or whatever) results in more walks, the actual run values might be more even. That is: pitchers know that walks are "worse" than OPS says they are, so they're willing to tolerate a higher OPS for fastballs if it contains fewer walks. That seems quite reasonable.

Suppose walks form half of OBP for fastballs, but 60% of OBP for curveballs. That's a difference of .100 in OPS due to walks. If you assume that should "really" be .140, that closes the gap from 120 points down to 80.

That adjustment is still not enough to explain the entire gap between fastballs and non-fastballs, but it's certainly part of it. In studies like this, where you're looking for very small discrepancies, and you have non-traditional proportions of offensive events, you need to use something more accurate than OPS.


But here's something that makes me worry, and I wonder if there's a problem with the authors' database. Here are the overall OPS values for ABs ending on that pitch, from the authors' Table 1:

.753 fastball
.620 non-fastball

Do you see the problem? This data puts the average OPS at .709 (fastballs being twice as likely as non-fastballs). But the overall major-league OPS for the years of the study (2002-2006) was around .750. Why the discrepancy? The authors do say they left out about 6% of pitches, mostly "unknown", but with a few knuckleballs and screwballs. But there's no way 6% of the data could bring a .750 OPS down to .709. So I'm thinking something's wrong here.
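The .709 figure is just a weighted average, sketched here in Python under the assumption stated above that fastballs are twice as common as non-fastballs. The 2:1 weights are an approximation, and averaging OPS linearly is itself a simplification.

```python
# OPS for at-bats ending on each pitch type, from the authors' Table 1
ops = {"fastball": 0.753, "non_fastball": 0.620}

# Assumed relative frequencies: fastballs roughly twice as common
weights = {"fastball": 2, "non_fastball": 1}

overall = sum(ops[kind] * weights[kind] for kind in ops) / sum(weights.values())
print(f"{overall:.3f}")  # prints 0.709
```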

There's no such problem with Table 2, which is broken down by count instead of pitch type. That table does average out to about .750.

UPDATE: in the comments, Guy reports that if you calculate SLG with a denominator of PA instead of AB, the numbers appear to work out OK. So the authors probably just miscalculated.
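To see how much the wrong denominator can matter, here is a small sketch in Python. The stat line is invented, the function names are hypothetical, and plate appearances are simplified to at-bats plus walks (no hit-by-pitch or sacrifice flies), so treat the output as an illustration of the size of the effect rather than a reconstruction of the authors' numbers.

```python
def obp(hits, walks, at_bats):
    # Simplified on-base percentage: ignores hit-by-pitch and sacrifice flies
    return (hits + walks) / (at_bats + walks)


def slg(total_bases, denominator):
    # Slugging is total bases per at-bat; passing PA here reproduces the suspected mistake
    return total_bases / denominator


# Invented stat line, chosen only to land near a typical league OPS
at_bats, walks = 500, 50
hits, doubles, triples, homers = 135, 27, 3, 15
plate_appearances = at_bats + walks
total_bases = hits + doubles + 2 * triples + 3 * homers

correct_ops = obp(hits, walks, at_bats) + slg(total_bases, at_bats)
suspect_ops = obp(hits, walks, at_bats) + slg(total_bases, plate_appearances)
print(f"SLG over AB: OPS {correct_ops:.3f}")
print(f"SLG over PA: OPS {suspect_ops:.3f}")
```

For this hypothetical line the gap is about .039 (roughly .762 versus .724), which is the same order of magnitude as the .750 versus .709 discrepancy.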


Finally, the authors argue that pitchers aren't randomizing enough. According to game theory, there should be no correlation between your choice of this pitch, and your choice of the next pitch. If you have a correlation, because you're choosing not to randomize properly, the opposition can pick up on that, guess pitches with more confidence, and take advantage.

Kovash and Levitt found that pitchers have negative correlation: after a fastball, they're more likely to throw a non-fastball, and vice-versa. They conclude that teams are not playing the optimal strategy, and it's costing them runs.
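A rough way to look for that kind of dependence is to compare how often a fastball follows a fastball with how often it follows a non-fastball. The Python sketch below does this for one invented pitch sequence. `next_pitch_rates` is a hypothetical helper, and it ignores count, batter, and pitcher, so it is far cruder than what the paper does.

```python
from collections import Counter


def next_pitch_rates(sequence):
    """Return (P(F | previous pitch F), P(F | previous pitch N)) for a string of 'F'/'N'."""
    pairs = Counter(zip(sequence, sequence[1:]))
    after_f = pairs[("F", "F")] + pairs[("F", "N")]
    after_n = pairs[("N", "F")] + pairs[("N", "N")]
    return pairs[("F", "F")] / after_f, pairs[("N", "F")] / after_n


# Invented sequence: F = fastball, N = non-fastball
sequence = "FNFFNFNNFNFF"
rate_after_f, rate_after_n = next_pitch_rates(sequence)
print(f"P(fastball | previous fastball)     = {rate_after_f:.3f}")
print(f"P(fastball | previous non-fastball) = {rate_after_n:.3f}")
```

If pitch choice were independent of the previous pitch, the two rates would be about equal over a large sample. Here the fastball rate is much lower after a fastball, which is the negative correlation pattern, although a dozen pitches prove nothing.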

However: couldn't there be another factor making it beneficial to do that? It's conventional wisdom that, after seeing a fastball, it's harder to hit a breaking pitch, because your brain is still "tuned" to the trajectory of the fastball. If that's true -- and I think every pitcher and broadcaster would think it is, to some extent -- that would easily explain how the negative correlation observed in the study could actually be the optimal strategy. But the authors don't mention it at all.


So I don't think we learn much from this paper, but there's a tidbit I found interesting. Apparently Kovash and Levitt have access to MLB bigwigs, and did a little survey:

"Executives of Major League Baseball teams with whom we spoke estimated that there would be a .150 gap in OPS between a batter who knew for certain a fastball was coming versus the same batter who mistakenly thought that there was a 100 percent chance the next pitch would *not* be a fastball, but in fact was surprised and faced a fastball."

That's kind of interesting. I have no idea how accurate the estimate is ... anybody seen any other research on this topic?



At Sunday, September 27, 2009 11:33:00 AM, Anonymous Guy said...

A textbook example of why academics need to consult with subject matter experts. I'm sure these guys put a lot of work into this paper. But if they had shown it to ANYONE with a saber background, they would have immediately been told that you can't use OPS at the count, only through the count, for this kind of analysis. Not because we're any smarter, just because when you work with count data a bit you quickly discover the selective sampling problem.

To the extent economists aren't willing to consult with amateurs, I'd give a second piece of advice: if your research uncovers very large efficiency failures -- like, a pitcher can increase his income by about $6 million a year by throwing fewer fastballs -- then you should reexamine your assumptions and methods. Because there's virtually no chance you are right. Almost every study claiming to find such large scale inefficiencies has been debunked. The only exception I can think of is the Romer 4th-down football study (and it won't surprise me if that's rebutted some day).

I agree they seem to have a serious data problem as well. Their full count data suggest an overall OPS of about .730. But the OPS on a full count is actually around .850. So they are doing something wrong in calculating OPS.

At Sunday, September 27, 2009 11:42:00 AM, Blogger Phil Birnbaum said...

Good point about the 3-2 numbers ... their overall OPS is about right in the chart that number came from, which suggests that maybe some of their other numbers are too high to compensate.

Actually, now that I think about it, OPS isn't linear, so averaging the numbers doesn't really work. That makes it harder to tell what's really happening.

But, as you say, *something* must be wrong.

At Monday, September 28, 2009 12:20:00 PM, Anonymous Anonymous said...

I don't think you can simply correlate one pitch with the next and conclude that pitchers are incorrectly randomizing their pitches without considering the count.

Since a fastball is more likely to be thrown for a strike, the count after a fastball is more likely to be a pitcher's count which makes it more likely that they are going to throw an off-speed pitch and vice versa.

That being said, I would not at all be surprised if pitchers did in fact "change" their pitches too much. We had that discussion a few months ago on The Book blog about a commentator who said something like, after two inside fastballs in a row, "I'd be shocked if he threw that same pitch again," and I commented that he has to throw that same pitch again some significant percentage of the time in order to keep the batter from guessing the next pitch with a high degree of confidence. And what the commentator said tends to be how pitchers and pitching coaches think.

Another interesting thing is that you might think that there would be a positive correlation among consecutive pitches because of batter strengths and weaknesses in general. For example, if a batter was a good fastball hitter, you might think that he gets more off-speed pitches which would lead to a positive correlation overall among consecutive pitches.


At Monday, September 28, 2009 12:36:00 PM, Blogger Phil Birnbaum said...


For the correlation part of the study, the authors did correct for count, batter, and pitcher. They actually even considered all pitches thrown to that point, so, effectively, they noticed that after "FFN", the same pitcher was more likely to throw a "F" to the same batter than after "FNF".

Sorry, I should have mentioned that in the post.

At Monday, September 28, 2009 4:37:00 PM, Anonymous Rodney Fort said...

Hi Phil.

Thanks for the post. When my colleague bounced the ideas in this paper off a friend in the Econ Dept, the friend's response was: "How do we know some of the fastballs weren't just hanging curves?" I would amend this observation to include sliders that didn't.

In addition to a tricky selection issue, it seems measurement error can account for the difference?

But you're the expert on different measurement devices so I defer to you on this one (wouldn't want to irritate Guy).

At Monday, September 28, 2009 5:22:00 PM, Anonymous Guy said...

Hi Rodney:
A good classification system is based on pitch velocity, not just horizontal and vertical movement (break), so I would expect that breaking balls that failed to break as much as expected would still be classified as non-FBs (because of their low velocity). But a poor classification method could have this problem.

At Monday, September 28, 2009 7:16:00 PM, Blogger Cyril Morong said...

Charlie Brown had this figured out a long time ago. From the old Peanuts comic strip, he is standing on the mound trying to decide what pitch to throw:

Frame 1: "This guy will never be expecting a fastball..."

Frame 2: "With the bases loaded he'll be expecting a curve. But he also knows I know what he's expecting..."

Frame 3: "So if he's expecting me to pitch what I know he knows I know he knows he's expecting..."

Frame 4: "Where was I?"

At Monday, September 28, 2009 9:04:00 PM, Anonymous Rodney Fort said...

Hi Guy.

I see the point. But could you share whether or not they used the good classification system you suggest?

How did they actually generate the data?

At Monday, September 28, 2009 9:47:00 PM, Anonymous Anonymous said...


"For the correlation part of the study, the authors did correct for count, batter, and pitcher. They actually even considered all pitches thrown to that point, so, effectively, they noticed that after "FFN", the same pitcher was more likely to throw a "F" to the same batter than after "FNF".

Sorry, I should have mentioned that in the post."

O.K. Again, I am not surprised at what they found. It is exceedingly difficult for a human being to randomize their behavior. One of the mistakes you will always find is in fact a correlation between consecutive behaviors.

In attempting to be random, human beings want to "mix up" their behaviors and inadvertently tend to follow an X with a Y and vice versa even if they are attempting to behave randomly and independently.

For example, if for whatever reason a pitcher throws 3 or 4 straight fastballs (assuming that he is not almost exclusively a fastball pitcher), he is going to think, "I can't throw another fastball."

In fact, I would be willing to bet that not only is there a negative correlation, as the authors found, but that the magnitude of that negative correlation increases with "runs" of fastballs or non-fastballs.

I would also be willing to bet that after a fastball or a run of fastballs that a pitcher is more likely to throw a non-fastball than he is to throw a fastball after a run of non-fastballs (after accounting for his overall percentage of fastballs and non-fastballs of course).

IOW, let's say that a pitcher throws 50% fastballs and 50% non-fastballs. If a pitcher throws F/F, he is more likely to follow that with a NF than he is to follow NF/NF with a F.

That is a guess on my part from watching and playing baseball for many years.


At Monday, September 28, 2009 9:48:00 PM, Anonymous Anonymous said...

This sentence:

"One of the mistakes you will always find is in fact a correlation between consecutive behaviors."

should be "negative correlation."


At Monday, September 28, 2009 11:38:00 PM, Blogger Pizza Cutter said...

First off, I haven't read the study, so I'm possibly about to say something really stupid.

If I'm reading what you're saying correctly, you're arguing for a Nash equilibrium strategy for solving the problem. Seeing that they used OPS as their outcome metric, there's all sorts of methodological issues that come from that. However, with a better variable (or suite of variables), that could work rather nicely.

Also, on the issue of pitch sequencing, wasn't there something done on the subject (I want to say by Josh Kalk, but I could be wrong) about 6 months ago at THT?

At Monday, September 28, 2009 11:43:00 PM, Blogger Phil Birnbaum said...

Pizza: yes, I think they mean a Nash equilibrium. According to Wikipedia, Von Neumann gets credit for equilibrium theory for zero-sum games, and Nash for non-zero-sum games. Since this is a zero-sum game, the authors talk about Von Neumann's Minimax theory.

The problem isn't so much their use of OPS (although that's part of it); it's also their flawed methodology of considering only AB-ending pitches, and the fact that their data don't seem quite right.

At Tuesday, September 29, 2009 12:18:00 AM, Anonymous Anonymous said...

I haven't gotten around to reading it, but does this article suffer from the same issues as Kovash and Levitt?


At Tuesday, September 29, 2009 12:20:00 AM, Blogger Phil Birnbaum said...


I think that one's OK, but isn't all that powerful, which means there might be some pitchers strategizing suboptimally that the test used wouldn't find.

I read that a couple of months ago, haven't got around to posting about it yet.

At Tuesday, September 29, 2009 12:59:00 AM, Anonymous Guy said...

Phil: I think the authors may have calculated OPS incorrectly. When I use PA rather than AB as the denominator for SLG, and use the B-Ref league splits by count, I get an "OPS" value for each count that's pretty close to what the authors report. My guess is they mistakenly used PAs as the denominator for both OBP and SLG.

Rodney: I wasn't being coy, I hadn't read the paper. But now I've looked at it, and they rely on coding by BIS for pitch type. They say that the coding on FB vs. non-FB matches up extremely well (94%) with that done by a BIS competitor (STATS), which suggests different observers see the same pitch type in most cases. While that doesn't preclude a general bias, for example a tendency to label pitches that are put in play as fastballs, it does give me more confidence in the coding. Assuming the BIS scorers are relying in part on radar gun velocity readings, I'm inclined to think the data is pretty good in distinguishing FBs from other pitches. But others who post here probably know much more about the quality of the BIS data than I do.

That said, the newer pitch/fx data which allows objective classification based on pitch movement and speed would definitely be a superior data source.

At Tuesday, September 29, 2009 1:01:00 AM, Blogger Phil Birnbaum said...

Guy: good catch on the miscalculation! I'll add that to the post.

At Tuesday, September 29, 2009 1:07:00 PM, Anonymous Guy said...

It's worth noting that their miscalculation of OPS greatly undervalues walks (even more than regular OPS), which likely results in non-FBs looking more effective than they actually are. Presumably, non-FBs result in relatively more BBs than FBs, but better outcomes (for the pitcher) when a BB is not the outcome.


On the pitch randomization issue, there could be a good reason for negative correlation even after controlling for count. It may be that changing speeds on successive pitches gives an advantage to the pitcher, i.e. a changeup or curve may be more effective immediately after a fastball (and vice-versa). If that were true, then some negative correlation would be correct for the pitcher, even at the cost of allowing the hitter to make a better prediction of pitch type.

At Tuesday, September 29, 2009 3:58:00 PM, Anonymous Guy said...

Tom Tango, on his blog, supports the conclusion that bias in the pitch classifications is unlikely:

"I would say it would be improbable to mix a fastball with any other kind of pitch. Let’s take Felix Hernandez, he of the occasional 91mph changeup (!). In his career, his average fastball speed is 7 to 10 mph faster than his average slider and 5-10 mph faster than his average changeup.

That’s on average. Let’s say that he’s at 94mph for his fastball, 89 for his changeup and 87 for his slider. In a game, his fastball will range 91-97, and his changeup would range 87-91, and his slider would be 85-89. As you can see, we are really talking about the fringes here of a ball that one might call a fastball, that another would call a changeup. And that’s based strictly looking at the speed of the pitch. Once you add movement, it should be a lock.

Unless Felix starts throwing his fastball intentionally with less speed and more movement (and thereby creating even more overlap), I would suspect that you can get at least 95% of his fastballs labelled as fastball strictly by using pitch speed as the sole determinant. So, I agree, bias would not be found in the FB v non-FB."

At Tuesday, January 18, 2011 2:13:00 PM, Anonymous Jarrod said...

I am fairly certain this conversation is dead but I am currently doing related research and would like to see if anyone is still even thinking about this paper.

A lot of people have complained about the fact that the analysis is strictly limited to terminal pitches, or pitches that end the at bat. As both a baseball enthusiast and developing economist I agree that this is a major limiting factor in the validity of the results, but I have yet to come up with or hear anyone else come up with a better idea.

The problem here is that if you calculate the OPS for ALL pitches, then every single pitch in a given at bat will yield the same OPS.

For example, consider the following at bat: a first pitch fastball followed by a second pitch slider that results in a ground out. In this case if we want to create an OPS variable like Levitt & Kovash did, we will have to attribute an OPS of .000 to BOTH the first pitch fastball and the second pitch slider. The problem here is that although the fastball was instrumental in achieving the groundout, it seems odd to tack an OPS of .000 onto it, just as it seems odd to tack an OPS of .000 onto the slider.

I can think of no way to get around only using the terminal pitches. The author of this article seemed to have an idea for a regression that would take into consideration all pitches, not just terminals. Seeing as I am in a position to run such a regression, I would love to hear any suggestions or ideas about how to get around this issue.

