How much to park-adjust a performance depends on the performance itself
In 2016, the Detroit Tigers' fielders were below average -- by about 50 runs, according to Baseball Reference. Still, Justin Verlander had an excellent season, going 16-9 with a 3.04 ERA. Should we rate Verlander's season even higher than his stat line, since he had to overcome his team's poor fielding behind him?
Not necessarily, I have argued. A team's defense is better some games than others (in results, not necessarily in talent). The fact that Verlander had a good season suggests that his starts probably got the benefit of the better games.
I used this analogy:
In 2015, Mark Buehrle and R.A. Dickey had very similar seasons for the Blue Jays. They had comparable workloads and ERAs (3.91 for Dickey, 3.81 for Buehrle).
But in terms of W-L records ... Buehrle was 15-8, while Dickey went 11-11.
How could Dickey win only 11 games with an ERA below four? One conclusion is that he must have pitched worse when it mattered most. After all, it would be hard to argue that it was run support: in 2015, the Blue Jays were by far the best-hitting team in baseball, scoring 5.5 runs per game.
Except that ... it WAS run support.
It turns out that Dickey got only 4.6 runs of support in his starts, almost a full run less than the Jays' 5.5-run average. Buehrle, on the other hand, got 6.9 runs from the offense, a benefit of a full 1.4 runs per game.
Just for fun, I decided to run a little study to see how big the run-support effect actually is.
I found all starters from 1950 to 2015 who:
-- played for teams with below-league-average runs scored;
-- had at least 15 starts and 15 decisions, pitching no more than 5 games in relief; and
-- had a W-L record at least 10 games above .500 (e.g. 16-6).
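The selection criteria can be sketched as a simple filter. This is not the study's actual code; the record layout and field names are hypothetical, just enough to make the three conditions concrete.

```python
# A sketch of the study's selection criteria. The PitcherSeason
# record and its field names are hypothetical, not the study's.
from dataclasses import dataclass

@dataclass
class PitcherSeason:
    year: int
    starts: int
    relief_games: int
    wins: int
    losses: int
    team_runs_per_game: float
    league_runs_per_game: float

def qualifies(p: PitcherSeason) -> bool:
    below_avg_team = p.team_runs_per_game < p.league_runs_per_game
    enough_work = (p.starts >= 15
                   and (p.wins + p.losses) >= 15
                   and p.relief_games <= 5)
    great_record = (p.wins - p.losses) >= 10   # e.g. 16-6
    return (1950 <= p.year <= 2015
            and below_avg_team and enough_work and great_record)

# A 16-6 starter on a below-average offense qualifies ...
example = PitcherSeason(1968, 30, 0, 16, 6, 3.9, 4.1)
print(qualifies(example))
```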
There were 102 qualifying pitchers, mostly from the modern era. Their average record was 20-8 (19.8-7.7).
They played in leagues where an average 4.41 runs were scored per game, but their below-average teams scored only 4.22.
A first instinct might be to say, "hey, these pitchers should have had a W-L record even better than they did, because their teams gave them worse run support than the league average, by 0.19 runs per start!"
But, I'm arguing, you can't say that. Run support varies from game to game. Since we're doing selective sampling, concentrating on pitchers with excellent W-L records, we're more likely to have selected pitchers who got better run support than the rest of their team.
And the results show that.
As mentioned, the pitchers' teams scored only 4.22 runs per game that season, compared with the league average 4.41. But, in the specific games those pitchers started, their teams gave them 4.54 runs of support.
That's not just more than the team normally scored -- it's actually even more than the league average.
4.22 -- their teams, over the full season
4.41 -- league average
4.54 -- these pitchers, in their starts
That's a pretty large effect. The size is due in part to the fact that we took pitchers with exceptionally good records.
Suppose a pitcher goes 22-8. Because run support varies, it could be that:
-- he pitched to (say) a 20-10 level, but got better run support;
-- he pitched to (say) a 24-6 level, but got worse run support.
But it's much less common to pitch at a 24-6 level than it is to pitch at a 20-10 level. So, the 22-8 guy was much more likely to be a 20-10 guy who got good run support than a 24-6 guy who got poor run support.
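That rarity argument can be put in rough numbers with a toy calculation. Assume, purely for illustration, that true pitcher winning-percentage talent is normally distributed around .500 with an SD of .075 (both numbers are my assumptions, not measured values). Then compare how common a 20-10-level pitcher (.667) is versus a 24-6-level pitcher (.800):

```python
# Toy check of the "22-8 guy" argument. The Normal(0.500, 0.075)
# talent distribution is an assumption for illustration only.
from math import exp, pi, sqrt

def normal_pdf(x, mu, sd):
    return exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

MU, SD = 0.500, 0.075   # assumed talent distribution

ratio = normal_pdf(20 / 30, MU, SD) / normal_pdf(24 / 30, MU, SD)
print(f"20-10-level talent is about {ratio:.0f}x as common as 24-6-level talent")
```

Under these assumptions the 20-10 level is more common by a couple of orders of magnitude. So if good and bad run-support luck are roughly equally likely, an observed 22-8 record is overwhelmingly more likely to be a lucky 20-10 pitcher than an unlucky 24-6 one.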
The same is true for lesser pitchers, to a lesser extent. Pitching at a (say) 14-10 level isn't that much rarer than pitching at a 12-12 level. So the effect should be there for those pitchers too, but it should be smaller.
I reran the study, but this time, pitchers were included if they were even one game over .500. That increased the sample size to 1,024 pitcher-seasons. The average pitcher in the sample went 14-10 (14.4 and 9.7).
Here are the run support numbers:
4.32 -- these pitchers, in their starts
This time, the effect wasn't so big that the pitchers actually got more support than the league average. But it did move them two-thirds of the way there.
And, of course, not *every* pitcher in the study got better run support than his teammates. That figure was only 62.1 percent. The point is, we should expect it to be more than half.
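The whole selection effect can be reproduced in a small Monte Carlo sketch. Every simulated pitcher below has identical talent, and run support is pure luck; all the distributional choices (normal single-game scoring with an SD of 3, 32 starts, a win whenever support beats runs allowed) are my assumptions, not the study's method. Even so, the pitchers selected for records 10 games over .500 show above-average support, and well over half of them beat the overall average.

```python
# Monte Carlo sketch of the selection effect: all pitchers are
# equally good, so any support difference among the selected
# group is purely an artifact of selecting on W-L record.
# The distributions here are illustrative assumptions.
import random
import statistics

random.seed(1)

TEAM_SUPPORT = 4.22   # team's season scoring average, runs per game
SD_PER_GAME = 3.0     # rough SD of a team's single-game runs
STARTS = 32

def simulate_season():
    """One pitcher-season: (wins, losses, average run support)."""
    support = [max(0.0, random.gauss(TEAM_SUPPORT, SD_PER_GAME))
               for _ in range(STARTS)]
    # Runs allowed drawn identically: every pitcher equally talented.
    allowed = [max(0.0, random.gauss(TEAM_SUPPORT, SD_PER_GAME))
               for _ in range(STARTS)]
    wins = sum(s > a for s, a in zip(support, allowed))
    return wins, STARTS - wins, statistics.mean(support)

seasons = [simulate_season() for _ in range(20000)]
selected = [avg for w, l, avg in seasons if w - l >= 10]

overall = statistics.mean(a for _, _, a in seasons)
frac_above = sum(a > overall for a in selected) / len(selected)
print(f"all pitchers:  {overall:.2f} runs of support")
print(f"10+ over .500: {statistics.mean(selected):.2f} runs of support")
print(f"share of selected pitchers above the overall average: {frac_above:.0%}")
```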
Suppose a player has an exceptionally good result -- an extremely good W-L record, or a lot of home runs, or a high batting average, or whatever.
Then, in any way that it's possible for him to have been lucky or unlucky -- that is, influenced by external factors that you might want to correct for -- he's more likely to have been lucky than unlucky.
If a player hits 40 home runs in an extreme pitcher's park, he probably wasn't hurt by the park as much as other players. If a player steals 80 bases and is only caught 6 times, he probably faced weaker-throwing catchers than the league average. If a shortstop rates very high for his fielding runs one year, he was probably lucky in that he got easier balls to field than normal (relative to the standards of the metric you're using).
"Probably" doesn't mean "always," of course. It just means more than a 50 percent chance. It could be anywhere from 50.0001 percent to 99.9999 percent. (As I mentioned, it was 62.1 percent for the run support study.)
The actual probability, and the size of the effect, depends on a lot of things. It depends on how you define "extreme" performances. It depends on the variances of the performances and the factor you're correcting for. It depends on how many external factors actually affect the extreme performance you're seeing.
So: for any given case, is the effect big, or is it small? You have to think about it and make an argument. Here's an argument you could make for run support, without actually having to do the study.
In most seasons, the SD of a single team's runs per game is about 3. That means that in a season of 36 starts, the SD of average run support is 0.5 runs (which is 3 divided by the square root of 36).
In the 2015 AL, the SD of season runs scored between teams was only 0.4 runs per game.
0.5 -- runs of variation in average run support between pitchers on the same team
0.4 -- runs of variation in runs scored between teams
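The arithmetic behind that comparison is just the standard error of a 36-start sample:

```python
# Back-of-envelope check: the SD of a pitcher's average run support
# over a season, versus the between-team SD quoted in the text.
from math import sqrt

sd_single_game = 3.0    # typical SD of a team's runs in one game
starts = 36
sd_pitcher_support = sd_single_game / sqrt(starts)   # SD across pitchers
sd_between_teams = 0.4  # 2015 AL, from the text

print(sd_pitcher_support)   # 0.5
print(sd_pitcher_support > sd_between_teams)   # True
```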
That means that, for a given starting pitcher's W-L record, randomness in which games he happens to start matters *more* than his team's overall level of run support.
That's why we should expect the effect to be large.
There are other sources of luck that might affect a pitcher's W-L record. Home/road starts, for instance. If you find a pitcher with a good record, there's better than a 50-50 shot that he started more games at home than on the road. But, the amount of overall randomness in that stat is so small -- especially since there's usually a regular rotation -- that the expectation is probably closer to, say, 50.1 percent, than to the 62.1 percent that we found for run support.
But, in theory, the effect must exist, at some magnitude. Whether it's big enough to worry about is something you have to figure out.
I've always wanted to try to study this for park effects. I've always suspected that when a player hits 40 home runs in a pitcher's park, and he gets adjusted up to 47 or something ... that that's way too high. But I haven't figured out how to figure it out.