Friday, March 24, 2017

Career run support for starting pitchers

For the little study I did last post, I used Retrosheet data to compile run support stats for every starting pitcher in recent history (specifically, pitchers whose starts all came in 1950 or later).

Comparing every pitcher to his teammates, and totalling up everything for a career ... the biggest "hard luck" starter, in terms of total runs, is Greg Maddux. In Maddux's 740 starts, his offense scored 238 fewer runs than it did for his teammates in those same seasons. That's a shortfall of 0.32 runs per game.
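Here's a rough sketch of how that teammate comparison could be computed. The post doesn't show the Retrosheet processing, so the input layout (one tuple per team game, with hypothetical fields for season, team, starter, and runs scored by that team) and the function name are assumptions for illustration, not the code actually used.

```python
from collections import defaultdict

def career_run_support(games):
    """games: iterable of (season, team, starter, team_runs) tuples (assumed layout).

    Returns {starter: (runs_vs_teammates, starts)} summed over a career.
    """
    # Runs and games for each team-season, and for each starter within it.
    team_tot = defaultdict(lambda: [0, 0])
    starter_tot = defaultdict(lambda: [0, 0])
    for season, team, starter, runs in games:
        team_tot[(season, team)][0] += runs
        team_tot[(season, team)][1] += 1
        starter_tot[(season, team, starter)][0] += runs
        starter_tot[(season, team, starter)][1] += 1

    career = defaultdict(lambda: [0.0, 0])
    for (season, team, starter), (runs, starts) in starter_tot.items():
        t_runs, t_games = team_tot[(season, team)]
        mate_games = t_games - starts
        if mate_games == 0:
            continue  # no teammates to compare against that season
        # Average support the same offense gave everyone else that season.
        mate_avg = (t_runs - runs) / mate_games
        career[starter][0] += runs - mate_avg * starts
        career[starter][1] += starts
    return {p: (diff, gs) for p, (diff, gs) in career.items()}

# Toy data: starter "A" gets 2 and 4 runs, teammate "B" gets 6 and 8.
toy = [(2016, "BAL", "A", 2), (2016, "BAL", "A", 4),
       (2016, "BAL", "B", 6), (2016, "BAL", "B", 8)]
result = career_run_support(toy)
```

Dividing the first number by the second gives the R/GS column in the tables.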

Here's the top six:

Runs   GS   R/GS  
--------------------------------
-238  740  -0.32  Greg Maddux
-199  773  -0.26  Nolan Ryan
-192  707  -0.27  Roger Clemens
-168  430  -0.39  A.J. Burnett
-167  690  -0.24  Gaylord Perry
-164  393  -0.42  Steve Rogers

Three of the top five are in the Hall of Fame. You might expect that to be the case, since, to accumulate a big deficiency in run support, you have to pitch a lot of games ... and guys who pitch a lot of games tend to be good. But, on the flip side, the "good luck" starters, whose teams scored more for them than for their teammates, aren't nearly as good:

Runs   GS   R/GS  
--------------------------------
+238  364  +0.65  Vern Law
+188  458  +0.41  Mike Torrez
+170  254  +0.67  Bryn Smith
+151  297  +0.51  Ramon Martinez
+147  355  +0.41  Mike Krukow
+143  682  +0.21  Tom Glavine

One explanation for the difference is that, to have a long career despite bad run support, you have to be a REALLY good pitcher. To have the same length career, with good run support, you can just be PRETTY good.

But, that assumes that teams pay a lot of attention to W-L record, which would be the biggest statistical reflection of run support. And, we're only talking about a difference of around half a run per game. 

Another possibility: pitchers who are the ace of the staff usually start on opening day, where they face the other team's ace. So, that game, against a star pitcher, they get below-average support. Maybe, because of the way rotations work, they face better pitchers more often, and that's what accounts for the difference. Did Bill James study this once?

In any event, just taking the opening day game ... if those games are one run below average for the team, and Nolan Ryan got 20 of those starts, there's 20 of his 199 runs right there.

--------

UPDATE: see the comments for suggestions from Tango and GuyM.  The biggest one: GuyM points out that good pitchers lead to more leads, which means fewer bottom-of-the-ninth runs when they pitch at home.  Back of the envelope estimate: suppose a great pitcher means the team goes 24-8 in his starts, instead of 16-16.  That's 8 extra wins, which is 4 extra wins at home, which is 2 runs over a season, which is 30 runs over 15 good seasons like that.
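The back-of-the-envelope estimate above, written out as a small sketch. The half-run value for a skipped bottom of the ninth isn't stated directly; it's what the "4 extra wins at home, which is 2 runs" step implies. The function and parameter names are made up for illustration.

```python
def ninth_inning_shortfall(extra_wins_per_season, seasons,
                           home_share=0.5, runs_per_half_inning=0.5):
    """Runs of 'missing' support from bottom-of-the-ninths that never get played.

    Assumes half the extra wins come at home, and that each extra home win
    skips one half-inning worth about 0.5 runs (implied by the estimate).
    """
    extra_home_wins = extra_wins_per_season * home_share
    runs_per_season = extra_home_wins * runs_per_half_inning
    return runs_per_season * seasons

# 24-8 instead of 16-16 is 8 extra wins; 15 seasons like that.
career_runs = ninth_inning_shortfall(24 - 16, 15)  # 30.0
```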
--------

Here are the career highs and lows on a per-game basis, minimum 100 starts:

Runs   GS   R/GS  
--------------------------------
- 85  106  -0.80  Ryan Franklin
- 94  134  -0.70  Shawn Chacon
-135  203  -0.66  Ron Kline
- 72  116  -0.62  Shelby Miller
-154  249  -0.62  Denny Lemaster
- 68  115  -0.59  Trevor Wilson

Runs   GS   R/GS  
--------------------------------
+127  164  +0.77  Bill Krueger
+ 82  108  +0.76  Rob Bell
+ 89  118  +0.76  Jeff Ballard
+ 81  110  +0.73  Mike Minor
+170  254  +0.67  Bryn Smith
+106  161  +0.66  Jake Arrieta
+238  364  +0.65  Vern Law

These look fairly random to me.

-------

Here's what happens if we go down to a minimum of 10 starts:

Runs   GS   R/GS  
---------------------------------
- 29   12  -2.40  Angel Moreno
- 30   13  -2.29  Jim Converse
- 23   11  -2.25  Mike Walker
- 20   11  -1.86  Tony Mounce
- 25   14  -1.81  John Gabler

Runs   GS   R/GS  
---------------------------------
+ 32   11  +2.91  J.D. Durbin
+ 43   17  +2.56  John Strohmayer
+ 58   25  +2.30  Colin Rea
+ 61   28  +2.16  Bob Wickman
+ 23   11  +2.33  John Rauch

-------

It seems weird that, for instance, Bob Wickman would get such good run support in as many as 28 starts, his team scoring more than two extra runs a game for him. But, with 2,169 pitchers in the list, you're going to get these kinds of things happening just randomly.

The SD of team runs in a game is around 3. Over 36 starts, the SD of average support is 3 divided by the square root of 36, which works out to 0.5. Over Wickman's 28 starts, it's 0.57. So, Wickman was about 3.8 SDs from zero.
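A quick sketch of that arithmetic, using the round figure of 3 runs for the single-game SD. The function name is just for illustration.

```python
import math

def naive_z(extra_runs_per_start, starts, sd_per_game=3.0):
    """Z-score of a pitcher's average support, treating the baseline as exact."""
    sd_of_mean = sd_per_game / math.sqrt(starts)
    return extra_runs_per_start / sd_of_mean, sd_of_mean

# Wickman: +2.16 runs per start over 28 starts.
z, sd = naive_z(2.16, 28)
```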

But that's not quite right ... the support his teammates got is a random variable, too. Accounting for that, I get that Wickman was 3.7 SDs from zero. Not that big a deal, but still worth correcting for.
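The exact correction isn't spelled out here, so what follows is one plausible way to do it, not necessarily the method used: treat the pitcher's average and his teammates' average as two independent sample means, and divide the difference by the SD of that difference. The teammate start count of 450 is a made-up illustrative number, not Wickman's real figure.

```python
import math

def two_sample_z(extra_runs_per_start, starts, teammate_starts, sd_per_game=3.0):
    """Z-score of (pitcher's mean support - teammates' mean support).

    Assumes independent games with the same per-game SD for both groups.
    """
    sd_of_diff = sd_per_game * math.sqrt(1 / starts + 1 / teammate_starts)
    return extra_runs_per_start / sd_of_diff

# Hypothetical: 28 starts against 450 teammate starts over the same seasons.
z_adj = two_sample_z(2.16, 28, 450)
```

The more teammate games there are relative to the pitcher's own starts, the less the adjustment matters, which is why the change is small.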

I'll call that "3.7" figure the "Z-score."  Here are the top and bottom career Z-scores, minimum 72 starts:


    Z   GS   R/GS  
--------------------------------
-3.06   72  -1.16  Kevin Gausman
-2.94  203  -0.66  Ron Kline
-2.89  249  -0.62  Denny Lemaster
-2.57  134  -0.70  Shawn Chacon
-2.57  740  -0.32  Greg Maddux

    Z   GS   R/GS  
--------------------------------
+3.79  364  +0.65  Vern Law
+3.24  254  +0.67  Bryn Smith
+3.16  164  +0.77  Bill Krueger
+3.12   93  +1.02  Roy Smith
+2.73  247  +0.56  Tony Cloninger

The SD of the overall Z-scores is 1.045, pretty close to the 1.000 we'd expect if everything were just random. But, that still leaves enough room that something else could be going on.
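To put a rough size on that "room": if the random part and any real part are independent, their variances add, so the non-random SD is the square root of the difference of squares. That independence is an assumption on my part, and the function name is only for illustration.

```python
import math

def excess_sd(observed_sd, random_sd=1.0):
    """SD left over after removing pure randomness, assuming variances add."""
    excess_var = observed_sd ** 2 - random_sd ** 2
    return math.sqrt(max(excess_var, 0.0))

# Observed SD of Z-scores was 1.045; pure luck would give 1.000.
non_random_sd = excess_sd(1.045)
```

Under that assumption, the leftover works out to roughly 0.3 in Z-score units.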

-------

I chose a cutoff of 72 starts to include Kevin Gausman, who is still active. Last year, the Orioles starter went 9-12 despite an ERA of only 3.61. 

Not only does Gausman have the lowest Z-score of pitchers with at least 72 starts, he also has the lowest Z-score of pitchers with as few as 10 starts!

Of the forty-two starters more extreme than Gausman's support shortfall of 1.16 runs per game, none of them have more than 41 starts. 

Gausman is a historical outlier, in terms of poor run support -- the hardluckest starting pitcher ever.

------

I've posted the full spreadsheet at my website, here.


UPDATE, 3/31: New spreadsheet (Excel format), updated to account for innings of run support, to correct for any bottom-of-the-ninth issues (as per GuyM's suggestion).  Actually, both methods are in separate tabs.



Thursday, March 02, 2017

How much to park-adjust a performance depends on the performance itself

In 2016, the Detroit Tigers' fielders were below average -- by about 50 runs, according to Baseball Reference. Still, Justin Verlander had an excellent season, going 16-9 with a 3.04 ERA. Should we rate Verlander's season even higher than his stat line, since he had to overcome his team's poor fielding behind him?

Not necessarily, I have argued. A team's defense is better some games than others (in results, not necessarily in talent). The fact that Verlander had a good season suggests that his starts probably got the benefit of the better games. 

I used this analogy:

In 2015, Mark Buehrle and R.A. Dickey had very similar seasons for the Blue Jays. They had comparable workloads and ERAs (3.91 for Dickey, 3.81 for Buehrle). 

But in terms of W-L records ... Buehrle was 15-8, while Dickey went 11-11.

How could Dickey win only 11 games with an ERA below four? One conclusion is that he must have pitched worse when it mattered most. Because, it would be hard to argue that it was run support. In 2015, the Blue Jays were by far the best-hitting team in baseball, scoring 5.5 runs per game. 

Except that ... it WAS run support. 

It turns out that Dickey got only 4.6 runs of support in his starts, almost a full run less than the Jays' 5.5-run average. Buehrle, on the other hand, got 6.9 runs from the offense, a benefit of a full 1.4 runs per game.

------

Just for fun, I decided to run a little study to see how big the effect actually is, for pitcher run support.

I found all starters from 1950 to 2015, who:

-- played for teams with below-league-average runs scored;

-- had at least 15 starts and 15 decisions, pitching no more than 5 games in relief; and

-- had a W-L record at least 10 games above .500 (e.g. 16-6).

There were 102 qualifying pitchers, mostly from the modern era. Their average record was 20-8 (19.8-7.7). 

They played in leagues where an average 4.41 runs were scored per game, but their below-average teams scored only 4.22. 

A first instinct might be to say, "hey, these pitchers should have had a W-L record even better than they did, because their teams gave them worse run support than the league average, by 0.19 runs per start!"

But, I'm arguing, you can't say that. Run support varies from game to game. Since we're doing selective sampling, concentrating on pitchers with excellent W-L records, we're more likely to have selected pitchers who got better run support than the rest of their team.

And the results show that. 

As mentioned, the pitchers' teams scored only 4.22 runs per game that season, compared with the league average 4.41. But, in the specific games those pitchers started, their teams gave them 4.54 runs of support. 

That's not just more than the team normally scored -- it's actually even more than the league average.

4.22 team
4.41 league
4.54 these pitchers

That's a pretty large effect. The size is due in part to the fact that we took pitchers with exceptionally good records.

Suppose a pitcher goes 22-8. Because run support varies, it could be that:

-- he pitched to (say) a 20-10 level, but got better run support;
-- he pitched to (say) a 24-6 level, but got worse run support.

But it's much less common to pitch at a 24-6 level than it is to pitch at a 20-10 level. So, the 22-8 guy was much more likely to be a 20-10 guy who got good run support than a 24-6 guy who got poor run support.

The same is true for lesser pitchers, to a lesser extent. Pitching at a 14-10 level is rarer than pitching at a 12-12 level, but not by nearly as wide a margin. So, the effect should be there, for those pitchers, too, but it should be smaller.
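Here's a toy version of that argument. It assumes true pitching levels are normally distributed around .500 with an SD of 3 wins (a number picked purely for illustration), and that good and bad run-support luck of the same size are equally likely, so the odds come down to how common each true level is.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def odds_good_luck(level_lo, level_hi, mean_level, level_sd=3.0):
    """Odds that an observed record midway between two true levels came from
    the lower level plus good luck, rather than the higher level plus bad luck.

    Assumes symmetric luck, so only the prior on true levels matters.
    """
    return (normal_pdf(level_lo, mean_level, level_sd)
            / normal_pdf(level_hi, mean_level, level_sd))

# 22-8 observed: true 20-win level vs. true 24-win level (15 wins is .500).
strong = odds_good_luck(20, 24, mean_level=15)
# A hypothetical 13-11 observed: true 12-win level vs. true 14-win level (12 is .500).
mild = odds_good_luck(12, 14, mean_level=12)
```

With these made-up inputs the odds favor "good luck" in both cases, but far more heavily for the extreme record than for the ordinary one.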

I reran the study, but this time, pitchers were included if they were even one game over .500. That increased the sample size to 1,024 pitcher-seasons. The average pitcher in the sample was 14-10 (14.4-9.7).

Here are the run support numbers:

4.15 team
4.40 league
4.32 these pitchers

This time, the effect wasn't so big that the pitchers actually got more support than the league average. But it did move them two-thirds of the way there. 
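The "two-thirds" figure is just the fraction of the team-to-league gap that the pitchers' actual support covered. A small sketch, with a made-up function name, applied to both versions of the study:

```python
def share_of_gap_closed(team_avg, league_avg, pitcher_support):
    """Fraction of the gap from team average up to league average that the
    selected pitchers' run support covered (above 1.0 means past league average)."""
    return (pitcher_support - team_avg) / (league_avg - team_avg)

first_study = share_of_gap_closed(4.22, 4.41, 4.54)   # 10+ games over .500
second_study = share_of_gap_closed(4.15, 4.40, 4.32)  # any winning record
```

The first study comes out well above 1.0, since those pitchers got more than the league average; the second comes out at about 0.68.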

And, of course, not *every* pitcher in the study got better run support than his teammates. That figure was only 62.1 percent. The point is, we should expect it to be more than half.

-------

Suppose a player has an exceptionally good result -- an extremely good W-L record, or a lot of home runs, or a high batting average, or whatever. 

Then, in any way that it's possible for him to have been lucky or unlucky -- that is, influenced by external factors that you might want to correct for -- he's more likely to have been lucky than unlucky.

If a player hits 40 home runs in an extreme pitcher's park, he probably wasn't hurt by the park as much as other players. If a player steals 80 bases and is only caught 6 times, he probably faced weaker-throwing catchers than the league average. If a shortstop rates very high for his fielding runs one year, he was probably lucky in that he got easier balls to field than normal (relative to the standards of the metric you're using).

"Probably" doesn't mean "always," of course. It just means more than a 50 percent chance. It could be anywhere from 50.0001 percent to 99.9999 percent. (As I mentioned, it was 62.1 percent for the run support study.)

The actual probability, and the size of the effect, depends on a lot of things. It depends on how you define "extreme" performances. It depends on the variances of the performances and the factor you're correcting for. It depends on how many external factors actually affect the extreme performance you're seeing.

So: for any given case, is the effect big, or is it small? You have to think about it and make an argument. Here's an argument you could make for run support, without actually having to do the study.

In most seasons, the SD of a single team's runs per game is about 3. That means that in a season of 36 starts, the SD of average run support is 0.5 runs (which is 3 divided by the square root of 36). 

In the 2015 AL, the SD of season runs scored between teams was only 0.4 runs per game.

0.5 runs of variation between pitchers on a team
0.4 runs of variation between teams

That means that, for a given starting pitcher's W-L record, randomness in what games he starts matters *more* than his team's overall level of run support. 

That's why we should expect the effect to be large.
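One way to make that comparison concrete is to look at variances rather than SDs. This sketch assumes the game-to-game randomness and the between-team differences are independent, so their variances add; that assumption, and the function name, are mine.

```python
import math

def within_team_share(sd_per_game=3.0, starts=36, sd_between_teams=0.4):
    """Share of the variance in a starter's season run support that comes from
    which games he happened to start, rather than which team he pitches for."""
    sd_within = sd_per_game / math.sqrt(starts)
    var_within = sd_within ** 2
    var_between = sd_between_teams ** 2
    return sd_within, var_within / (var_within + var_between)

sd_within, share = within_team_share()
```

With the 0.5 and 0.4 figures above, about 61 percent of the variance comes from the luck of which games the pitcher started.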

There are other sources of luck that might affect a pitcher's W-L record. Home/road starts, for instance. If you find a pitcher with a good record, there's better than a 50-50 shot that he started more games at home than on the road. But, the amount of overall randomness in that stat is so small -- especially since there's usually a regular rotation -- that the expectation is probably closer to, say, 50.1 percent, than to the 62.1 percent that we found for run support.

But, in theory, the effect must exist, at some magnitude. Whether it's big enough that you have to worry about, is something that you have to figure out.

I've always wanted to try to study this for park effects. I've always suspected that when a player hits 40 home runs in a pitcher's park, and he gets adjusted up to 47 or something ... that that's way too high. But I haven't figured out how to figure it out.






