Do pitchers perform worse after a high-pitch start?
Last week, J.C. Bradbury and Sean Forman released a study to check whether throwing a lot of pitches affects a pitcher's next start. The paper, along with a series of PowerPoint slides, can be found at JC's blog, here.
There were several things that the study checked, but I'm going to concentrate on one part of it, which is fairly representative of the whole.
The authors tried to predict a starting pitcher's ERA in the following game, based on how many pitches he threw this game, and a bunch of other variables. Specifically:
-- number of pitches
-- number of days rest
-- the pitcher's ERA in this same season
-- the pitcher's age
It turned out that, controlling for the other three factors, every additional pitch thrown this game led to a .007 increase in ERA the next game.
But I think there's a problem.
The authors included season ERA in their list of controls. That's because they needed a way to control for the quality of the pitcher. Otherwise, they'd probably find that throwing a lot of pitches today means you'll pitch well next time -- since the pitcher who throws 130 pitches today is more likely to be Nolan Ryan than Josh Towers.
So, effectively, they're comparing every pitcher to himself that season.
But if you compare a pitcher to himself that season, then it's guaranteed that an above-average game (for that pitcher) will be more likely to be followed by a below-average game (for that pitcher). After all, the entire set of games has to add up to "average" for that pitcher.
This is easiest to see if you consider the case where the pitcher only starts two games. If the first game is below his average, the second game absolutely must be above his average. And if the first game is above his average, the second game must be below.
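The same zero-sum constraint can be written out for a pitcher with $n$ starts. As a simplification (an assumption of this sketch, not part of the study's model), treat each game's RA $x_j$ equally and let $\bar{x}$ be their mean:

$$
\sum_{j=1}^{n} (x_j - \bar{x}) = 0
\quad\Longrightarrow\quad
\frac{1}{n-1} \sum_{j \neq i} (x_j - \bar{x}) = -\,\frac{x_i - \bar{x}}{n-1}
$$

With $n = 2$, the other game's deviation is exactly the negative of the first game's. For innings-weighted season RA, the same identity holds once each term is multiplied by that game's innings.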
The same thing holds for pitchers with more than two starts. Suppose a starter throws 150 innings, and gives up 75 runs, for an ERA of 4.50. And suppose that, today, he throws a 125-pitch complete game shutout.
For all games other than this one, his record will be 141 innings and 75 earned runs, for a 4.79 ERA. So, in his next start, you'd expect him, in retrospect, to be significantly worse than his season average of 4.50. That difference isn't caused by the 125 pitches. It's just the logical consequence that if this game was above the season average, the other games combined must be below the season average.
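The leave-one-out arithmetic is simple enough to check directly. Here's a minimal sketch, with illustrative function names of my own, that reproduces the 4.50 and 4.79 figures:

```python
def era(innings: float, runs: int) -> float:
    """Runs per nine innings."""
    return 9 * runs / innings


def era_without_game(season_ip, season_runs, game_ip, game_runs):
    """Season ERA with one game's innings and runs removed (leave-one-out)."""
    return era(season_ip - game_ip, season_runs - game_runs)


season = era(150, 75)                     # full season: 150 IP, 75 R
others = era_without_game(150, 75, 9, 0)  # drop the 9-inning shutout
print(f"{season:.2f} {others:.2f}")       # 4.50 4.79
```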
Now, high pitch counts are associated with above-average games, and low pitch counts are associated with bad starts. So, since a player should be below average after a good start, and a high pitch start was probably a very good start, then it follows that a player should be below his average after a high pitch start. Similarly, he should be above his average after a low-pitch start. That's just an artifact of the way the study was designed, and has nothing to do with the player's arm being tired or not.
How big is the effect over the entire study? I checked. For every starter from 2000 to 2009 starting on fewer than 15 days' rest, I computed how much his season ERA would have gone up or down had that start been eliminated completely. Then I grouped the starts by number of pitches. The results:
(Note: even though I'm talking about ERA, I included unearned runs too. I really should say "RA", but I'll occasionally keep on saying "ERA" anyway just to keep the discussion easier to follow. Just remember: JC/Sean's data is really ERA, and mine is really RA.)
To read one line off the chart: if you randomly found a game in which a starter threw only 50 pitches, and eliminated that game from his record, his season ERA would drop by half a run, 0.50. That's because a 50-pitch start is probably a bad outing, so eliminating it is a big improvement.
That's pretty big. A pitcher with an ERA of 4.00 *including* that bad outing might be 3.50 in all other games. And so, if he actually pitches to an ERA of around 3.50 in his next start, that would be just as expected by the logic of the calculations.
It's also interesting to note that the effect is very steep up to about 90 pitches, and then it levels off. That's probably because, after 90, any additional pitches are more a consequence of the pitcher's perceived ability to handle the workload, and less a consequence of the number of runs he's giving up on that particular day.
Finally, if you take the "if this game were omitted" ERA difference for every game, and regress it against the number of pitches, what do you get? You'll find that every extra pitch corresponds to a .006 increase in ERA next game, very close to the .007 that JC and Sean found in their study.
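Here's a sketch of that calculation for a single pitcher-season, using made-up starts (not real data) stored as hypothetical `(pitches, innings, runs)` tuples. For each start, it computes how much the season RA would change if that start were removed, then fits a straight-line slope of that change against pitches. In the actual check, the changes from every qualifying start would be pooled before fitting.

```python
def ra(ip, runs):
    """Runs allowed per nine innings."""
    return 9 * runs / ip


def loo_changes(season):
    """For each start: season RA without that start, minus full-season RA.

    Negative means removing the start lowers RA (it was a bad start);
    positive means removing it raises RA (it was a good start).
    """
    tot_ip = sum(ip for _, ip, _ in season)
    tot_r = sum(r for _, _, r in season)
    full = ra(tot_ip, tot_r)
    return [ra(tot_ip - ip, tot_r - r) - full for _, ip, r in season]


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (single predictor, with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical season: short, bad starts early; a long shutout last.
season = [(50, 3, 6), (95, 6, 3), (110, 7, 2), (125, 9, 0)]
changes = loo_changes(season)
slope = ols_slope([p for p, _, _ in season], changes)
print([round(c, 2) for c in changes], round(slope, 4))
```

A positive slope is exactly the pattern the study picked up: high-pitch starts tend to be good ones, so the remaining games are worse than the season figure.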
So, that's an argument that suggests the result might be just due to the methodology, and not to arm fatigue at all. To be more certain, I decided to try to reproduce the result. I ran a regression to predict next game's ERA from this game's pitches, and the pitcher's season ERA (the same variables JC and Sean used, but without age and year, which weren't found to be significant). I used roughly the same database they did -- 1988 to 2009.
My result: every extra pitch was worth .005 of ERA next game. That's a bit smaller than the .007 the authors found (more so when you consider that theirs really is ERA, and mine includes unearned runs), but still consistent. (I should mention that the original study didn't do a straight-line linear regression like I did -- the authors investigated transformations that might have wound up with a curved line as best fit. However, their graph shows a line that's almost straight -- I had to hold a ruler to it to notice a slight curve -- so it seems to me that the results are indeed similar.)
Then, I ran the same regression, but, this time, to remove the flaw, I used the pitcher's ERA for that season but adjusted *to not include that particular game*. So, for instance, in the 50-pitch example above, I used 3.50 instead of 4.00.
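The two regressions differ only in the control variable. Here's a minimal sketch of that setup in plain Python, assuming each pitcher-season is stored as a chronological list of hypothetical `(pitches, innings, runs)` tuples; `design_rows` and `ols` are illustrative names, not the authors' code. Rows from all pitcher-seasons would be pooled before fitting, and, as in my reproduction, only pitches and the season RA control are used.

```python
def ra(ip, runs):
    """Runs allowed per nine innings."""
    return 9 * runs / ip


def design_rows(season, leave_one_out):
    """Build (pitches, control_ra) rows and next-start RA targets.

    season: chronological list of (pitches, innings, runs) for one pitcher-season.
    leave_one_out=False uses full-season RA as the control (the flawed version);
    leave_one_out=True drops the current start from the control.
    The final start has no next start, so it contributes no row.
    """
    tot_ip = sum(ip for _, ip, _ in season)
    tot_r = sum(r for _, _, r in season)
    rows, ys = [], []
    for (pitches, ip, r), (_, next_ip, next_r) in zip(season, season[1:]):
        if leave_one_out:
            control = ra(tot_ip - ip, tot_r - r)
        else:
            control = ra(tot_ip, tot_r)
        rows.append((pitches, control))
        ys.append(ra(next_ip, next_r))
    return rows, ys


def ols(rows, ys):
    """Least-squares fit y = b0 + b1*x1 + ... via the normal equations.

    Returns [b0, b1, ...]. Solves (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for a handful of predictors.
    """
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, ys))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]
```

Fitting `ols` on the pooled rows once with `leave_one_out=False` and once with `leave_one_out=True` gives the two pitch coefficients to compare. The sketch reports no standard errors or p-values, so significance testing would need a proper statistics package.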
Now, the results went the other way! In this regression, every additional pitch this game led to a .003 *decrease* in runs allowed next game. Moreover, the result was only marginally significant (p=.07).
So, there appears to be a much weaker relationship between pitch count and future performance when you choose a better version of ERA, one that's independent of the other variables in the regression.
However, there's still some bias there, and there's one more correction we can make. Let me explain.
In 2002, Mike Mussina allowed 103 runs in 215.2 innings of work, for an RA of 4.30.
Suppose you took one of Mussina's 2002 starts, at random. On average, what should his RA that game be?
The answer is NOT 4.30. It's much higher. It's 4.89. That is, if you take Mussina's RA for every one of his 33 starts, and you average all those numbers out, you get 4.89.
Why? Because the ERA calculation, the 4.30, is when you weight all Mussina's innings equally. But, when we wonder about his average ERA in a game, we're wanting to treat all *games* equally, not innings. The July 31 game, where he pitched only 3 innings and had an RA of 21.00, gets the same weight in the per-game-average as his 9-inning shutout of August 28, with an RA of 0.00.
In season ERA, the August 28 shutout gets three times the weight of the July 31 game, because it covered three times as many innings. But when we ask about ERA in a given game, we're ignoring innings and just looking at games. So the shutout gets only equal weight to the July 31 game, not three times.
Since pitchers tend to pitch more innings in games where they pitch better, season ERA gives greater weight to those games. And that's why overall ERA is lower than the average of the individual games' ERAs.
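To see the weighting difference with numbers, here's a small sketch using a hypothetical two-start "season" (a 9-inning shutout and a 3-inning, 3-run outing; made-up figures, not Mussina's). It also includes a helper for box-score innings notation, where "215.2" means 215 and two-thirds innings, which reproduces Mussina's 4.30 from above. The function names are mine, for illustration only.

```python
def innings(notation: str) -> float:
    """Convert box-score innings like '215.2' (215 and 2/3) to a float."""
    whole, _, outs = notation.partition(".")
    return int(whole) + int(outs or 0) / 3


def season_ra(games):
    """Innings-weighted: total runs per nine total innings."""
    return 9 * sum(r for _, r in games) / sum(ip for ip, _ in games)


def mean_game_ra(games):
    """Each game's RA weighted equally.

    Assumes every game has ip > 0; a start with no outs recorded
    has no defined single-game RA.
    """
    return sum(9 * r / ip for ip, r in games) / len(games)


games = [(9, 0), (3, 3)]   # hypothetical (innings, runs) per start
print(season_ra(games))    # 2.25
print(mean_game_ra(games)) # 4.5
print(round(9 * 103 / innings("215.2"), 2))  # 4.3
```

The shutout pulls the innings-weighted figure down because it counts three times as heavily; the per-game average treats both starts the same.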
The point is: The study is trying to predict ERA for the next game. The best estimate for ERA next game is *not* the ERA for the season. That's because, as we just saw, the overall season ERA is too low to be a proper estimate of a single game's ERA. Rather, the best estimate of a game's ERA is the overall average of the individual game ERAs.
So, in the regression, instead of using plain ERA as one of the independent (predictor) variables, why not use the player's average game ERA that season? That would be more consistent with what we're trying to predict. In our Mussina example, instead of using 4.30, we'll use 4.89.
With the exception, of course, that we'll subtract out the current game from the average game ERA. So, if we're working on predicting the game after Mussina's shutout, we'll use the average game ERA from Mussina's other 32 starts, not including the shutout. Instead of 4.89, that works out to 5.04.
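The leave-one-out version of the per-game average only needs the sum of the game RAs. A minimal sketch (function names are mine, for illustration); plugging in the rounded 4.89 and the shutout's 0.00 still gives about 5.04:

```python
def mean_excluding(mean_all: float, n: int, excluded: float) -> float:
    """Average of n values with one value removed, given the full average."""
    return (n * mean_all - excluded) / (n - 1)


def loo_means(game_ras):
    """For each game, the average game RA of all the *other* games."""
    total, n = sum(game_ras), len(game_ras)
    return [(total - x) / (n - 1) for x in game_ras]


print(round(mean_excluding(4.89, 33, 0.0), 2))  # 5.04
print(loo_means([0.0, 9.0, 3.0]))               # [6.0, 1.5, 4.5]
```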
So I ran the regression again, this time trying to predict the next game's RA based on:
-- pitches thrown this game
-- pitcher's average game ERA this season for all games excluding this one.
When I did that, what happened?
The effect of pitches thrown disappeared, almost entirely. It went down to -.0004 in ERA, and wasn't even close to significant (p=.79). Basically, the number of pitches thrown had no effect at all on the next start.
So I think what JC and Sean found is not at all related to arm fatigue. It's just a consequence of the fact that their model retroactively required all the starts to add up to zero, relative to that pitcher's season average. And so, when one start is positive, the other starts simply have to work out to be negative, to cancel out. That makes it look like a good start causes a bad start, which makes it look like a high-pitch start causes a bad start.
But that's not true. And, as it turns out, when we correct for the zero-sum situation, the entire effect disappears. And so it doesn't look to me like pitches thrown has any connection to subsequent performance.
UPDATE: I took JC/Sean's regression and added one additional predictor variable -- ERA in the first game, the game corresponding to the number of pitches.
Once you control for ERA in that game, the number of pitches becomes completely non-significant (p=.94), and its effect on ERA is pretty much zero (-0.00014).
That is: if you give up the same number of runs in two complete games, but one game takes you 90 pitches, and the other takes you 130 pitches ... well, there's effectively no difference in how well you'll pitch the following game.
That is strongly supportive of the theory that number of pitches is significant in the study's regression only because it acts as a proxy for runs allowed.