Monday, January 11, 2021

Splitting defensive credit between pitchers and fielders (Part II)

(Part 1 is here.  This is Part 2.  If you want to skip the math and just want the formula, it's at the bottom of this post.)


When evaluating a pitcher, you want to account for how good his fielders were. The "traditional" way of doing that is, you scale the team fielding to the pitcher. Suppose a pitcher was 20 runs better than normal, and his team's fielding was -5 runs for the season. If the pitcher pitched 10 percent of the team's innings, you might figure the fielding cost him 0.5 runs, and adjust him from +20 to +20.5.
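The arithmetic of that traditional adjustment can be sketched in a few lines. This is only an illustration of the example numbers above; the function name and parameter names are mine, not anything from an established library.

```python
def traditional_adjustment(pitcher_rating, team_fielding, innings_share):
    """Scale the team's season fielding total to one pitcher's share of innings."""
    fielding_behind_pitcher = team_fielding * innings_share
    # Poor fielding (a negative number) is credited back to the pitcher.
    return pitcher_rating - fielding_behind_pitcher


adjusted = traditional_adjustment(pitcher_rating=20, team_fielding=-5, innings_share=0.10)
print(adjusted)  # 20.5
```

The key assumption baked into this version is that every pitcher on the staff gets the same fielding per inning, which is exactly what the rest of this post argues against.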

I have argued that this isn't right. Fielding performance varies from game to game, just like run support does. Pitchers with better ball-in-play numbers probably got better fielding during their starts than pitchers with worse ball-in-play numbers.

By analogy to run support: in 1972, Steve Carlton famously went 27-10 on a Phillies team that was 32-87 without him. Imagine how good he must have been to go 27-10 for a team that scored only 3.22 runs per game!

Except ... in the games Carlton started, the Phillies actually scored 3.76 runs per game. In games he didn't start, the Phillies scored only 3.03 runs per game. 

The fielding version of Steve Carlton might be Aaron Nola in 2018. A couple of years ago, Tom Tango pointed out the problem using Nola as an example, so I'll follow his lead.

Nola went 17-6 for the Phillies with a 2.37 ERA, and gave up a batting average on balls in play (BAbip) of only .254, against a league average of .295 -- that, despite an estimate that his fielders were 0.60 runs per game worse than average. If you subtract 0.60 from Nola's stat line, you wind up with Nola's pitching equivalent to an ERA in the 1s. As a result, Baseball-Reference winds up assigning Nola a WAR of 10.2, tied with Mike Trout for best in MLB that year.

But ... could Nola really have been hurt that much by his fielders? A BAbip of .254 is already exceptionally low. An estimate of -0.60 runs per game implies his BAbip with average fielders would have been .220, which is almost unheard of.

(In fairness: the Phillies' 0.60 DRS fielding estimate, which comes from Baseball Info Solutions, is much, much worse than estimates from other sources -- three times the UZR estimate, for instance. I suspect there's some kind of scaling bug in recent BIS ratings, because, roughly, if you divide DRS by 3, you get more realistic numbers, and standard deviations that now match the other measures. But I'll save that for a future post.)

So Nola was almost certainly hurt less by his fielders than his teammates were, the same way Steve Carlton was hurt less by his hitters than his teammates were. But, how much less? 

Phrasing the question another way: Nola's BAbip (I will leave out the word "against") was .254, on a team that was .306, in a league that was .295. What's the best estimate of how his fielders did?

I think we can figure that out, extending the results in my previous post.


First, let's adjust for park. In the five years prior to 2018, BAbip in Phillies games (both teams combined) was .0127 ("12.7 points") lower, and so better for pitchers, at Citizens Bank Park than in Phillies road games. Since only half of Phillies games were at home, that's 6.3 points of park factor. Since there's a lot of luck involved, I regressed 60 percent to the mean of zero (with a limit of 5 points of regression, to avoid ruining outliers like Coors Field), leaving the Phillies with 2.5 points of park factor.
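Here is a rough sketch of that park calculation. The function and parameter names are my own, and I'm assuming a sign convention where a positive number means the park favors pitchers; the 60 percent and 5-point figures are the ones described above.

```python
import math


def regressed_park_factor(home_minus_road, regress_frac=0.60, max_regression=5.0):
    """Park factor, in points of BAbip, from a home-vs-road gap for both teams combined."""
    raw = home_minus_road / 2.0  # only half of a team's games are at home
    # Regress toward zero, but never by more than max_regression points.
    pull = min(abs(raw) * regress_frac, max_regression)
    return math.copysign(abs(raw) - pull, raw)


phillies_park = regressed_park_factor(12.7)
print(round(phillies_park, 1))  # 2.5
```

The cap only matters for extreme parks: a 40-point gap halves to 20, and 60 percent regression would remove 12 points, so the limit kicks in and only 5 are removed.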

Now, look at how the Phillies did with all the other pitchers. For non-Nolas, the team BAbip was .3141, against a league average of .2954. That's 18.7 points worse than the league, despite a park that was helping by 2.5 points, so the Phillies were 21 points worse than average.

How much of those 21 points came from below-average fielding talent? To figure that out, here's the SD breakdown from the previous post, but adjusted. I've bumped luck upwards for the lower number of PA, dropped park down to 1.5 since we have an actual estimate, and increased the SD of pitching because the Phillies had more high-inning guys than average:

6.1 points fielding talent
3.9 points fielding luck
5.6 points pitching talent
6.8 points pitching luck
1.5 points park
11.5 points total

Of the Phillies' 21 points in BAbip, what percentage is fielding talent? The answer: (6.1/11.5)^2, or 28 percent. That's 5.9 points.
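For anyone who wants to check the sum-of-squares arithmetic, here is a minimal sketch using the rounded SDs from the list above. The variable names are mine.

```python
import math

# SDs in points of BAbip, copied from the team-without-Nola breakdown above.
team_sds = {
    "fielding talent": 6.1,
    "fielding luck": 3.9,
    "pitching talent": 5.6,
    "pitching luck": 6.8,
    "park": 1.5,
}

# Independent sources of variation add as variances, not as SDs.
total_variance = sum(sd ** 2 for sd in team_sds.values())
total_sd = math.sqrt(total_variance)

# Share of the observed gap attributed to fielding talent.
talent_share = team_sds["fielding talent"] ** 2 / total_variance

observed_gap = 21.0  # points worse than average, after the park adjustment
fielding_talent_estimate = talent_share * observed_gap

print(round(total_sd, 1))                  # 11.5
print(round(talent_share, 2))              # 0.28
print(round(fielding_talent_estimate, 1))  # 5.9
```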

So, we assume that the Phillies' fielding talent was 5.9 points of BAbip worse than average. With that number in hand, we'll leave the Phillies without Nola and move on to Nola himself.


On the raw numbers, Nola was 41 points better than the league average. But, we estimated, his fielding was about 6 points worse, while his park helped him by 2.5 points, so he was really 44.5 points better.

For an individual pitcher with 700 BIP, here's the breakdown of SDs, again from the previous post:

 6.1  fielding talent
 7.6  fielding luck
 7.6  pitching talent
15.5  pitching luck
 3.5  park
20.2  total

We have to adjust all of these for Nola.

First, fielding talent goes down to 5.2. Why? Because we estimated it from other data, and so we have less variance than if we just took the all-time average. (A simulation suggests multiplying the 6.1 by the ratio (SD without fielding talent)/(SD with fielding talent) from the "team without Nola" case. That ratio is 9.75/11.5, or about 0.85.)
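That shrinkage step can be sketched like this, assuming the "SD without fielding talent" is what's left of the team total after removing the fielding-talent variance. The function name is hypothetical.

```python
import math


def shrunk_talent_sd(prior_talent_sd, total_sd):
    """SD of fielding talent remaining after estimating it from the rest of the team."""
    # Remove the talent variance from the total to get the SD of everything else.
    sd_without_talent = math.sqrt(total_sd ** 2 - prior_talent_sd ** 2)
    return prior_talent_sd * sd_without_talent / total_sd


print(round(shrunk_talent_sd(6.1, 11.5), 1))  # 5.2
```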

Fielding luck and pitching luck increase because Nola had only 519 BIP, not 700.

Finally, park goes to 1.5 for the same reason as before. 

 5.2 fielding talent
10.0 fielding luck  
 7.6 pitching talent
17.3 pitching luck
 1.5 park
22.1 total

Convert to percentages:

 5.5% fielding talent
20.4% fielding luck
11.8% pitching talent
61.3% pitching luck
 0.5% park
100% total

Multiply by Nola's 44.5 points:

 2.5 fielding talent 
 9.1 fielding luck
 5.3 pitching talent
27.3 pitching luck
 0.2 park
44.5 total

Now we add in our previous estimates of fielding talent and park, to get back to Nola's raw total of 41 points:

-3.4 fielding talent [2.5-5.9]
 9.1 fielding luck
 5.3 pitching talent
27.3 pitching luck
 2.7 park            [0.2+2.5]
41   total

Consolidate fielding and pitching:

 5.6 fielding
32.6 pitching 
 2.7 park  
41   total
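The steps from the adjusted SDs down to the consolidated numbers can be sketched as follows. This uses the rounded SDs from the lists above, so the results land within about a tenth of a point of the figures in the post rather than matching exactly. Variable names are mine.

```python
# SDs for Nola (points of BAbip), copied from the adjusted breakdown above.
nola_sds = {
    "fielding talent": 5.2,
    "fielding luck": 10.0,
    "pitching talent": 7.6,
    "pitching luck": 17.3,
    "park": 1.5,
}
adjusted_gap = 44.5           # Nola vs. league after removing prior fielding talent and park
prior_fielding_talent = -5.9  # team fielding talent estimated from the other pitchers
prior_park = 2.5              # regressed park factor

total_variance = sum(sd ** 2 for sd in nola_sds.values())

# Each component's share of the gap is its share of the total variance.
points = {name: adjusted_gap * sd ** 2 / total_variance for name, sd in nola_sds.items()}

# Add the earlier estimates back in to return to the raw scale.
points["fielding talent"] += prior_fielding_talent
points["park"] += prior_park

fielding = points["fielding talent"] + points["fielding luck"]
pitching = points["pitching talent"] + points["pitching luck"]
park = points["park"]
print(round(fielding, 1), round(pitching, 1), round(park, 1))
```

With these rounded inputs the sketch gives roughly 5.7 for fielding, 32.7 for pitching, and 2.7 for park, against the 5.6, 32.6, and 2.7 listed above.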

Conclusion: The best estimate is that Nola's fielders actually *helped him* by 5.6 points of BAbip. That's about 3 extra outs in his 519 BIP. At 0.8 runs per out, that's 2.4 runs, in 212.1 IP, for about 0.24 WAR or 10 points of ERA.
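Here's a small sketch of the conversion from points of BAbip to outs, runs, ERA, and WAR. The 0.8 runs per out is from the text; the 10 runs per win is my assumption (it's the common rule of thumb, and it's what turns 2.4 runs into 0.24 WAR). The helper for innings notation is hypothetical.

```python
def innings_from_baseball_notation(ip):
    """Convert baseball notation like 212.1 (212 and 1/3 innings) to a true decimal."""
    whole = int(ip)
    thirds = round((ip - whole) * 10)
    return whole + thirds / 3.0


points_of_babip = 5.6
balls_in_play = 519
runs_per_out = 0.8    # value used in the post
runs_per_win = 10.0   # assumption: rule of thumb implied by 2.4 runs -> 0.24 WAR

extra_outs = points_of_babip / 1000.0 * balls_in_play
runs_saved = round(extra_outs) * runs_per_out  # the post rounds to 3 outs first
innings = innings_from_baseball_notation(212.1)
era_effect = runs_saved * 9.0 / innings
war_effect = runs_saved / runs_per_win

print(round(extra_outs, 1))  # 2.9
print(round(era_effect, 2))  # 0.1
print(round(war_effect, 2))  # 0.24
```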

Baseball-Reference had his fielders costing him 60 points of ERA; we have them helping him by 10. Our estimate brings his WAR down from 10.2 to 9.1, or something like that. (Again, in fairness, most of that difference is the weirdly-high DRS estimate of 0.60. If DRS had him at a more reasonable .20, we'd have adjusted him from 9.4 to 9.1, or something.)


Our estimate of +3 outs is ... just an estimate. It would be nice if we had real data instead. We wouldn't have to do all this fancy stuff if we had a reliable zone-based estimate specifically for Nola.

Actually, we do! Since 2017, Statcast has been analyzing batted balls and tabulating "outs above average" (OAA) for every pitcher. For Nola, in 2018, they have +2. Tom Tango told me Statcast doesn't have data for all games, so I should multiply the OAA estimate by 1.25. 

That brings Statcast to +2.5. We estimated +3. Not bad!

But Nola is just one case. And we might be biased in the case of Nola. This method is based on a pitcher of average talent. Nola is well above average, so it's likely some of the difference we attributed to fielding is really due to Nola's own BAbip pitching tendencies. Maybe instead of +3, his fielders were really +1 or something.

So I figured I'd better test other players too.

I found all pitchers from 2017 to 2019 who had Statcast estimates, with at least 300 BIP for a single team. There were a few players whose names didn't quite match my Lahman database, so I just let those go instead of fixing them. That left 342 pitcher-seasons. I assume almost all of them were starters.

For each pitcher, I ran the same calculation as for Nola. For comparison, I also did the "traditional" estimate where I gave the pitcher the same fielding as the rest of the team. Here are the correlations to the "gold standard" OAA:

r=0.37 this method
r=0.23 traditional

Here are the approximate root-mean-square errors (lower is better):

11.3 points of BAbip this method
13.7 points of BAbip traditional
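For reference, the two comparison measures can be computed along these lines. This is a generic sketch, not the code behind the numbers above, and it assumes both the estimate and the OAA figure have already been put on the same scale (points of BAbip). The toy lists are placeholders made up purely to show the calls; they are not the 342 pitcher-seasons.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(estimates, targets):
    """Root-mean-square error of the estimates against the targets."""
    squared_errors = [(e - t) ** 2 for e, t in zip(estimates, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Placeholder values only, NOT real pitcher data.
toy_estimate = [5.0, -3.0, 1.0, 8.0]
toy_oaa_points = [4.0, -1.0, -2.0, 6.0]
print(pearson_r(toy_estimate, toy_oaa_points))
print(rmse(toy_estimate, toy_oaa_points))
```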

This method is meant to be especially relevant for a pitcher like Nola, whose own BAbip is very different from his team's. Here are the root-mean-squared errors for pitchers who, like Nola, had a BAbip at least 10 plays better than their team's:

 9.3 points this method
11.9 points traditional 

And for pitchers at least 10 plays worse:

 9.3 points this method
10.9 points traditional


Now, the best part: there's an easy formula to get our estimates, so we don't have to use the messy sums-of-squares stuff we've been doing so far. 

We found that the original estimate for team fielding talent was 28% of observed-BAbip-without-pitcher. And then, our estimate for additional fielding behind that pitcher was 26% of the difference between that pitcher and the team. In other words, if the team's non-Nola BAbip (relative to the league) is T, and Nola's is P,

Fielders = .28T + .26(P-.28T)

The coefficients vary by numbers of BIPs. But the .28 is pretty close for most teams. And, the .26 is pretty close for most single-season pitchers: luck is 25% fielding, and talent is about 30% fielding, so no matter your proportion of randomness-to-skill, you'll still wind up between 25% and 30%.

Expanding that out (.28T - .26 × .28T = .21T) gives an easier version of the fielding adjustment, which I'll print bigger.


Suppose you have an average pitcher, and you want to know how much his fielders helped or hurt him in a given season. You can use this estimate:

F = .21T + .26P 


T is his team's BAbip relative to league for the other pitchers on the team, and

P is the pitcher's BAbip relative to league, and 

F is the estimated BAbip performance of the fielders, relative to league, when that pitcher was on the mound.
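As a quick check, here are both versions of the formula as code, run on Nola. The function names are mine. I'm assuming the inputs are park-adjusted, as in the worked example, and that negative means fewer hits on balls in play than the league. The P of -38.5 is my reading of "41 points better, with 2.5 points of park help".

```python
def fielders_two_step(T, P, team_coef=0.28, pitcher_coef=0.26):
    """Team fielding talent first, then a share of the pitcher's remaining gap."""
    team_talent = team_coef * T
    return team_talent + pitcher_coef * (P - team_talent)


def fielders_shortcut(T, P):
    """Expanded version: F = .21T + .26P."""
    return 0.21 * T + 0.26 * P


# Nola, 2018, in points of BAbip relative to league (assumed park-adjusted).
T = 21.0   # rest of the Phillies staff: 21 points worse than average
P = -38.5  # Nola: 41 points better than league, less 2.5 points of park

print(round(fielders_two_step(T, P), 1))  # -5.7
print(round(fielders_shortcut(T, P), 1))  # -5.6
```

Both come out at around 5.6 points of BAbip in Nola's favor, in line with the long-form calculation. The two differ by a few hundredths only because .21 is a rounded coefficient.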


Next: Part III, splitting team OAA among pitchers.



At Tuesday, January 12, 2021 10:03:00 AM, Anonymous Guy said...

Really nice work, Phil. One small quibble regarding Parks: BABIP is about .006 lower at home than on the road for all teams (a component of HFA). Shouldn't you compare a team's home/road split to that league average?

At Tuesday, January 12, 2021 2:13:00 PM, Blogger Phil Birnbaum said...

Oops! The home/road numbers were for both teams combined, the same way we do park factor for runs. Will update the post. Thanks for the catch, Guy!

