Do pitchers throw no worse in their bad starts than in their good starts?
Suppose your starting pitcher does great against the first nine batters he faces, getting them all out. You'd think he's got good stuff tonight, right? And that he'd do better than normal the rest of the game.
It turns out that's not the case. In a study I did nine years ago (.pdf, page 13), I found that when a pitcher had a no-hitter going through three, four, or five innings, his performance in the remainder of the game was almost exactly what you'd expect out of a pitcher of his general quality. The great early performance didn't signal a better performance later on.
The authors of "The Book" did a more comprehensive study (page 189), and came to the same conclusion. If a pitcher is excellent early, that doesn't give you any extra information on how he'll perform later.
What if the pitcher is really crappy his first couple of innings? My study showed that, again, there was no effect. After giving up three runs in the first inning, starters reverted to almost exactly their expected form in the remainder of the game. In this case, though, "The Book" found a different, and more intuitive, result: pitchers were actually a little worse than usual after getting hammered. You would have expected them to give up a wOBA of .360, but they got hit for .378. That's statistically significant, but still fairly small.
Anyway, the point is: there seems to be a whole lot of luck involved in pitching. The pitchers who got hammered gave up a wOBA of .701 to their first nine batters, but only .378 afterwards. Now, I suppose you could argue that they were legitimately unskilled at first, so bad that the entire lineup hit better than Barry Bonds or Babe Ruth at their peak. But it's much more likely that they just had bad luck, that they were pretty much the same pitcher before and after, and just happened to throw pitches a bit worse than usual at first. Or, that the hitters were lucky enough to get good wood on the ball during that inning.
If you think about it, that's roughly how we think of hitters. When a career .250 hitter goes 3-for-4 one day, we praise him for what he did, and might even name him the MVP of the game. But, still, we don't assume that he somehow played better than usual. We don't say it in so many words, but we understand that every player has good games and bad games, lucky at-bats and unlucky at-bats, and 3-for-4 is not that unusually lucky. He was the same hitter as always, but happened to get better results that day.
For pitchers, on the other hand, we're a little bit less understanding that way. When a pitcher gets hammered, we usually talk about how he gave up bad pitches, or how he didn't have his control, or some such. It's rare that a pitcher will give up five runs in the first inning without everyone trying to figure out what's wrong.
But if you think about it for a bit, you'll understand there must be a lot of luck. If you've played simulation games, like APBA or Strat-O-Matic or Diamond Mind, you've seen that the same pitcher can appear to pitch awesome, or get hammered, just because of a few lucky or unlucky rolls of the dice. I think I found once that an average team has a standard deviation of 3 runs scored per game due to luck. If you assume that runs scored is normally distributed (which it's not, but that doesn't affect the argument much), you'll see that a decent starter with an ERA of 4.00 would be expected to give up 10 runs in a complete game, one time a year, just by luck alone. (Of course, he'll likely be relieved long before he gets to 10 runs, but the point remains.)
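Here's a rough sketch of that back-of-the-envelope calculation, for anyone who wants to check it. It uses the numbers above (a mean of 4 runs and a luck SD of 3 runs per complete game) and the same admittedly wrong normal assumption. The 33 starts per season is my own assumption for illustration, not something from the studies.

```python
import math

def normal_tail(threshold, mean, sd):
    """Probability that a normal variable is at or above `threshold`."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

MEAN_RUNS = 4.0          # runs per complete game for a 4.00 ERA starter
SD_RUNS = 3.0            # luck-only standard deviation, from the text
STARTS_PER_SEASON = 33   # assumption: a full season for a healthy starter

# Chance of a 10-run complete game, which is two SDs above the mean
p_ten_plus = normal_tail(10.0, MEAN_RUNS, SD_RUNS)

# Expected number of such games over a season
expected_per_season = p_ten_plus * STARTS_PER_SEASON

print(f"P(10+ runs in a start): {p_ten_plus:.4f}")
print(f"Expected 10-run games per season: {expected_per_season:.2f}")
```

That works out to a little over 2% per start, or roughly three-quarters of a 10-run game per season, which is in the neighborhood of "one time a year."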
Which brings me to the point of this post, which is an awesome pitching analysis by Nick Steiner, over at The Hardball Times. Steiner looked up PITCHf/x data for A.J. Burnett, during his good starts, and during his bad starts, and found ... pretty much *no difference*. If you haven't seen the article yet, you should go there now, scroll about halfway down, and look at the two scatterplots. They look almost the same to me. Also, the pitch selection bar graphs, at the bottom of the piece ... they look pretty much the same, too.
That's a bit different from what we were talking about before, where pitchers "got better" after a bad start. There, it was still possible that they threw worse pitches even though their "stuff" was just fine. But here, we're finding that not only was their skill level apparently not changed, but the *actual pitches* were the same.
That is, there are two different kinds of luck. First, there's the possibility that, even though your skill is fine, your pitches are a bit unlucky and don't quite work -- they don't hit the corners, or the specific curve ball hangs a little more than normal. Second, there's the possibility that your pitches are just as good as any other day -- but you're unlucky enough that the batters just happened to hammer them.
It's the second kind of luck that we're talking about here. Of course, even with Steiner's data, we're not 100% sure it's luck. There are other things it could be, as some of the commenters have pointed out:
-- Mistake pitches. Maybe the difference between a good and bad outing is just the number of mistakes. From the charts, it would be easy to miss a few hanging curves, or fastballs down the middle.
-- Combinations. Maybe it's not just what kind of pitch, how fast it is, and how it breaks: maybe it's the timing of certain pitches relative to others. If it's hard to hit a curve ball after a fastball, and the pitcher doesn't choose that combination often enough, the batters will have better results.
-- Slight differences. Maybe a small difference in location makes a big difference in results. It could be that there *are* differences in those two scatterplots, but we can't pick them up with the naked eye.
-- Umpires. Part of the effect could be a different strike zone between the "good" days and the "bad" days -- the same pitch that was called a strike five days ago is called a ball today.
-- Differences that PITCHf/x doesn't tell you about. Maybe certain pitches are deceiving in ways that type, velocity, and spin don't capture: two pitches that look identical on paper might look very different to the batter.
All those things are possible, but they just don't seem very likely to me. The most plausible one, to me, is differences that aren't captured in the data. I'm not well-enough informed to know if that could be happening.
My feeling is that a lot of the luck comes from the batter side. The pitcher has time to plan and decide; the batter has very little time to react. If the batter has to "guess" what kind of pitch is coming, and roughly where it's going to cross the plate, that's inherently a random process. If the hitter is "waiting on a fastball," and he gets one, things are going to work out well for him. If it's a curve ball, not so much.
Doesn't it seem reasonable that a certain pitch might be a "good" pitch only in the sense of probability? Maybe a certain pitch is a strike 50% of the time, a ball 20% of the time, an out 20% of the time, and a hit 10% of the time. But instead of ten pitches going 5/2/2/1, one day they might go 5/2/1/2 -- only because the batter happened to guess right one extra time, and got good wood on the ball. That one extra hit is worth an average of more than three-quarters of a run. Depending on circumstances, it could be several runs. And not because of anything the pitcher did differently, but just because the batter decided to wait on a curve ball instead of a slider.
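To get a feel for how often that happens by chance alone, here's a small sketch. It takes the made-up 10% hit rate from the example above, assumes each pitch is independent of the others (my simplification), and asks how often ten identical pitches produce two or more hits instead of the "expected" one.

```python
import math

def prob_at_least(k_min, n, p):
    """Binomial probability of at least `k_min` successes in `n` independent trials."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )

N_PITCHES = 10   # ten pitches of identical quality, as in the example
P_HIT = 0.10     # illustrative per-pitch hit probability from the example

# Chance of two or more hits, even though every pitch was equally "good"
p_two_or_more = prob_at_least(2, N_PITCHES, P_HIT)

print(f"P(2+ hits in {N_PITCHES} pitches): {p_two_or_more:.3f}")
```

Under those assumptions, the pitcher gets the "unlucky" 5/2/1/2-or-worse kind of result about a quarter of the time, with no change at all in what he threw.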
Anyway, it shouldn't be too hard to check: find a bunch of pitchers of roughly the same ability. Figure out the variation of their results. Then, run a simulation, and check the variation of *those* results. If they're the same, you've just shown that pitchers perform like their own APBA cards, and that it's likely that almost all of what you see is randomness. If not, then the gap between the two (actually, the square root of the difference between the two variances) is something other than luck.
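The leftover part of that check is just a subtraction of variances. Here's a minimal sketch, assuming that luck and whatever else is going on are independent, so their variances add. The function name and the 3.2 figure are hypothetical, purely for illustration; only the 3.0 luck SD comes from earlier in this post.

```python
import math

def non_luck_sd(observed_sd, simulated_sd):
    """SD left over once the simulated (pure-luck) variance is removed.

    Assumes luck and non-luck effects are independent, so that
    observed variance = luck variance + non-luck variance.
    Returns 0.0 if the simulation already explains all the variation.
    """
    excess_variance = observed_sd**2 - simulated_sd**2
    return math.sqrt(max(excess_variance, 0.0))

observed_sd = 3.2    # hypothetical SD of real pitchers' runs per game
simulated_sd = 3.0   # SD from a dice-roll simulation (luck only)

print(f"Non-luck SD: {non_luck_sd(observed_sd, simulated_sd):.2f} runs per game")
```

Note that the leftover isn't the simple difference of the two SDs. In this made-up case the SDs differ by only 0.2, but the non-luck SD comes out to about 1.1 runs per game.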
What might that be? 20 years ago, we probably thought it was pitchers having better stuff some days, and not other days. But now, we know that's a fairly small effect, except (as "The Book" found) for inexperienced pitchers.
Up until a few days ago, we thought a lot of it might be pitchers having their stuff, but just getting unlucky and happening to throw bad pitches that day. But now, we seem to have evidence that that's not a big factor either. But even though it's not a *big* factor, it must have *some* effect. That's because we know that some pitches are easier to hit than others; there have been PITCHf/x studies that have shown, as expected, that pitches down the middle get hammered, and that certain levels of movement are easier or harder to hit than others. So it can't be that the pitches actually thrown make *no* difference.
But it does appear that the difference is small, at least compared to luck -- because, when we compare good games to bad games, the difference in the scatterplot of pitches is too small to notice!
What does that mean, in practical terms? It means that you shouldn't necessarily take out your starter just because he gives up a lot of runs, because he's likely just his usual self. That might be hard for some managers, when their ace gives up 7 runs in the first inning.
But managers already know that, perhaps. On "The Book" blog, Tangotiger found that A.J. Burnett threw almost as many pitches in his bad starts as in his good starts (101 vs. 105).
So what else do we learn? Well, from the experience of DIPS, it could turn out that Steiner has found a way to eliminate a lot of noise from a pitcher's record. If and when we can associate a firm run value with a specific pitch, based on type, speed, location, and spin, we might be able to spot those pitchers who were unlucky: who threw good pitches, but were hit hard anyway.
That might be a ways off: it might be that there are things about the individual pitcher that go beyond those measurements that PITCHf/x makes, and it could be that the "mistake" pitches get lost in the scatterplot. But, as a starting point, I think teams would have at least a bit of an edge applying Steiner's conclusions anyway.