Saturday, October 31, 2020

Calculating park factors from batting lines instead of runs

I missed a post Tango wrote back in 2019 about park factors. In the comments, he said,

"That’s one place where we failed with our park factors, using actual runs instead of "component" runs. They should be based on Linear Weights or RC or wOBA, something like that.

"Using actual runs means introducing unnecessary random variation in the mix."

Yup. One of those bits of brilliance that's obvious in retrospect.

The idea is, there's a certain amount of luck involved in turning batting events into runs, which depends on the sequence -- in other words, "clutch hitting," which is thought to be mostly random. If teams wind up scoring, say, 20 runs above average in a certain park, it could be that the park lends itself to higher offense. But, it could also be that the park is neutral, and those 20 runs just came from better clutch hitting.

So if we calculate park factors from raw batting lines instead of actual runs, we eliminate that sequencing luck and should get better estimates. We can still convert to expected runs afterwards.

Let's do it. I'll start by using runs, as usual. Then I'll do it with wOBA, and we'll compare.

-------

I used team-seasons from 2000-2019, excluding Coors Field (because it's such an extreme outlier). I included only parks that were used in at least 16 of the 20 seasons.

To get the observed park effects, I just took home scoring (both teams combined) and subtracted road scoring (both teams combined). 

For those 444 datapoints, I got

SD(observed) = 81.6 runs

To estimate luck, I used the rule of thumb that SD(runs) for a single team's games is about 3. (Tango uses the square root of total runs for both teams, but I didn't bother.)  

If SD(1 game) = 3, then SD(81 games) = 27. But we want both teams combined, so multiply by root 2. Then, we want (home - road), so multiply by root 2 again. That gives us 54.

SD(luck) = 54 runs
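Here's that scaling chain as a small Python sketch. It assumes, as the rule of thumb does, that single-game scoring has an SD of 3 runs per team and that games are independent, so SDs grow with the square root of the number of games. The function name is made up for illustration.

```python
import math


def luck_sd_home_minus_road(sd_per_team_game: float = 3.0, games: int = 81) -> float:
    """SD, from luck alone, of (home runs - road runs) with both teams combined.

    Assumes games are independent, so variances add and SDs scale with sqrt(n).
    """
    one_team = sd_per_team_game * math.sqrt(games)  # 3 * 9 = 27 runs per 81 games
    both_teams = one_team * math.sqrt(2)            # home team plus visitors
    return both_teams * math.sqrt(2)                # difference of two independent totals


print(round(luck_sd_home_minus_road(), 1))  # 54.0
```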

Since var(observed) = var(luck) + var(non-luck), we get*

SD(non-luck) = 61.2 runs

*"var" is variance, the square of SD. I'm using it instead of "SD^2" because it makes it much easier to read.
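Removing luck is just subtracting variances and taking a square root. A minimal sketch, with a hypothetical helper named `remove_component`, assuming luck and non-luck are independent (which is what lets their variances add):

```python
import math


def remove_component(sd_total: float, sd_component: float) -> float:
    """SD left after removing an independent component: sqrt(total^2 - component^2)."""
    if sd_component > sd_total:
        raise ValueError("component SD cannot exceed total SD")
    return math.sqrt(sd_total ** 2 - sd_component ** 2)


sd_non_luck = remove_component(81.6, 54.0)  # observed SD, luck SD
print(round(sd_non_luck, 1))  # 61.2
```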

Now, what's this thing I called "non-luck"? It's a combination of the differences between parks, and season-to-season differences within the same park -- weather, how well the players are suited to the park, the parks used by other teams in the division (because of the unbalanced schedule), the parks used by interleague opponents, the somewhat-random distribution of opposing pitchers ... stuff like that.

var(non-luck) = var(between parks) + var(within park)

To estimate SD(within park), I just looked at the observed SDs of the same park across the 16-20 seasons in the dataset. There were 23 parks in the sample, and I took the root-mean-square of those 23 individual SDs. I got

SD(different seasons of park) = 64.1
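A sketch of the root-mean-square step. RMS is used rather than a plain average because it's the variances that average, not the SDs. The toy data below is invented for illustration, and the per-park SD uses Python's `statistics.stdev` (the n - 1 sample SD), which is an assumption; the post doesn't say which version was used.

```python
import math
import statistics


def rms(values: list[float]) -> float:
    """Root-mean-square: average the squares, then take the square root."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def seasons_of_park_sd(diffs_by_park: dict[str, list[float]]) -> float:
    """RMS of each park's season-to-season SD of (home - road) runs.

    Uses the sample SD (n - 1 denominator) for each park; that choice is an assumption.
    """
    per_park = [statistics.stdev(diffs) for diffs in diffs_by_park.values()]
    return rms(per_park)


# Hypothetical toy data, not the real 23-park sample.
toy = {"Park A": [0.0, 10.0], "Park B": [0.0, 20.0]}
print(round(seasons_of_park_sd(toy), 2))  # 11.18
```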

But ... that 64.1 includes luck, and we want only the non-luck portion. So let's remove luck:

var(diff. seas. of park) = var(luck) + var(within park)
64.1 squared = 54 squared + var(within park)
SD(within park) = 34.5 runs

And now we can estimate SD(between parks):

var(non-luck) = var(between parks) + var(within park)
61.2 squared = var(between parks) + 34.5 squared
SD(between parks) = 50.5 runs
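The two subtraction steps above, chained in a short sketch (the helper name is hypothetical). Working from the unrounded within-park SD gives the same 50.5 as the hand calculation:

```python
import math


def remove_component(sd_total: float, sd_component: float) -> float:
    """SD left after removing an independent component (variances subtract)."""
    return math.sqrt(sd_total ** 2 - sd_component ** 2)


sd_luck = 54.0
sd_non_luck = 61.2         # observed 81.6 with luck removed
sd_seasons_of_park = 64.1  # RMS of each park's season-to-season SD

sd_within = remove_component(sd_seasons_of_park, sd_luck)  # true season-to-season part
sd_between = remove_component(sd_non_luck, sd_within)      # true park-to-park part
print(round(sd_within, 1), round(sd_between, 1))  # 34.5 50.5
```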

Summarizing:

81.6  runs total
---------------------------------
54    luck
50.5  between parks
34.5  within park between seasons

Between-park variance is only 38 percent of the total variance (50.5 squared divided by 81.6 squared). That means that only 38 percent of the observed park effect reflects the park's real, long-term effect, and you have to regress to the mean by 62 percent to get an unbiased estimate.

That's a lot. And it's one reason that most sites publish park factors based on more than one season, to give luck a chance to even out.
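A minimal sketch of what "regress 62 percent" means in practice. The regression fraction is one minus the between-park share of total variance, and the estimate keeps the remaining share of the observed deviation from the mean. The +100-run park-season and the league mean of zero are hypothetical inputs, not from the data:

```python
def regression_fraction(sd_between: float, sd_total: float) -> float:
    """Share of an observed park effect to regress away: 1 - var(between) / var(total)."""
    return 1 - (sd_between / sd_total) ** 2


def regressed_estimate(observed: float, sd_between: float, sd_total: float,
                       mean: float = 0.0) -> float:
    """Shrink an observed park effect toward the league mean."""
    keep = 1 - regression_fraction(sd_between, sd_total)
    return mean + keep * (observed - mean)


print(round(regression_fraction(50.5, 81.6), 2))     # 0.62
print(round(regressed_estimate(100.0, 50.5, 81.6)))  # 38
```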

-------

Now, let's try Tango's suggestion to use wOBA instead, and see how much luck that squeezes out.

For the same individual parks, I calculated every year's observed park difference the same way as for runs -- home wOBA minus road wOBA, both teams combined.

For the sample, SD(observed) was 0.01524, against an average wOBA of .3248. That's a ratio of 4.7%. I ran a regression and found that runs per PA increase about 1.8 times as fast as wOBA in percentage terms (perhaps runs are proportional to wOBA to the 1.77th power, or something like that), so 4.7% in wOBA is 8.45% in runs.
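The conversion from wOBA spread to run spread, as a quick sketch. It treats the 1.8 as an elasticity: if runs per PA were proportional to wOBA raised to some power k, a small percentage change in wOBA would produce roughly k times that percentage change in runs per PA. The variable names are just for illustration:

```python
sd_woba = 0.01524   # observed SD of (home - road) wOBA
mean_woba = 0.3248  # average wOBA in the sample
elasticity = 1.8    # % change in runs/PA per 1% change in wOBA (from the post's regression)

pct_woba = sd_woba / mean_woba    # about 4.7%
pct_runs = pct_woba * elasticity  # about 8.45%
print(f"{pct_woba:.2%} {pct_runs:.2%}")  # 4.69% 8.45%
```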

In the full sample, there were .118875 runs per PA, and an average 6207 PA for each home park-season. That's about 738 runs. Taking 8.45 percent of that works out to an SD of 67.3 runs.

SD(observed) = 67.3 runs

The luck SD of wOBA for a single PA is .532 (as calculated from an APBA card with an average batting line). I'll spare you the repeated percentage calculations, but for 6207 PA,

SD(luck) = 41.9 runs

As before, var(observed) = var(luck) + var(non-luck), so

SD(non-luck) = 52.7 runs

Looking at the RMS of the between-season SDs of the 23 parks in the sample,

SD(different seasons of park) = 51.2 runs

Eliminating luck to get true season-to-season differences:

var(diff. seas. of park) = var(luck) + var(within park)
51.2 squared = 41.9 squared + var(within park)
SD(within park) = 29.4 runs

And, finally,

var(non-luck) = var(between parks) + var(within park)
52.7 squared = var(between parks) + 29.4 squared
SD(between parks) = 43.7 runs

The summary:

67.3  runs total
---------------------------------
41.9  luck
43.7  between park
29.4  within park between seasons

Here the "between park" variance is 42 percent of the total, up from 38 percent when we used runs. So we have, in fact, gotten more accurate estimates.
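Since the wOBA column follows exactly the same chain as the runs column, it can be wrapped in one hypothetical `decompose` function. Fed the wOBA inputs (already converted to runs), this sketch reproduces the numbers above and the 42 percent share:

```python
import math


def remove_component(sd_total: float, sd_component: float) -> float:
    """SD left after removing an independent component (variances subtract)."""
    return math.sqrt(sd_total ** 2 - sd_component ** 2)


def decompose(sd_observed: float, sd_luck: float, sd_seasons_of_park: float):
    """Split an observed park-effect SD into non-luck, within-park, and between-park SDs."""
    sd_non_luck = remove_component(sd_observed, sd_luck)
    sd_within = remove_component(sd_seasons_of_park, sd_luck)
    sd_between = remove_component(sd_non_luck, sd_within)
    return sd_non_luck, sd_within, sd_between


non_luck, within, between = decompose(67.3, 41.9, 51.2)  # wOBA column, in runs
print(round(non_luck, 1), round(within, 1), round(between, 1))  # 52.7 29.4 43.7
print(round((between / 67.3) ** 2, 2))                          # 0.42
```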

------

But wait! The two methods really should give us the same estimate of the SD of the "between" and "within" park factors, since they're trying to measure the same thing. But they don't:

runs  wOBA
-----------------------------------------
81.6  67.3   runs total
-----------------------------------------
54    41.9   luck
50.5  43.7   between park
34.5  29.4   within park between seasons

(The "luck" SD is supposed to be different, since that was the whole purpose of using wOBA, to eliminate some of the random noise.)

I think the difference is due to the fact that the wOBA variances were all based on averages per PA, while the runs variances were based on averages per game (roughly, per 27 outs).

On average, the more runs you score, the more PA you'll have. So switching the denominator to PA shrinks the high-scoring park-seasons relative to the low-scoring ones, which compresses the differences and reduces the SD.

Although the differences in PA look small, they actually indicate large differences in scoring. That's because, per season, every park gets roughly the same number of outs, which means roughly the same number of PA that end in outs. So any "extra PA" are mostly baserunners, which are very valuable in terms of runs.

If you switch from "observed runs per game" to "observed runs per 6207 PA," the observed SD drops from 81.6 to 72.7 runs.  That's an 11 percent drop. When I did the same for wOBA, the observed SD dropped by 13 percent. So, let's estimate that the difference between "per game" and "per PA" is 12 percent, and reduce everything in the runs column by 12 percent:

runs  wOBA
--------------------------------------------
71.8  67.3   runs total
--------------------------------------------
47.5  41.9   luck
44.4  43.7   between park
30.4  29.4   within park between seasons
--------------------------------------------
62%   58%    regression to long-term mean 
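The 12 percent adjustment, sketched. One thing it makes visible: because every component is scaled by the same factor, the regression fraction doesn't move, so the 62 percent comes through unchanged. The 0.12 is the post's rough estimate, not a derived constant:

```python
runs_column = {"total": 81.6, "luck": 54.0, "between park": 50.5, "within park": 34.5}
scale = 1 - 0.12  # the post's estimate of the per-game vs. per-PA difference

scaled = {name: sd * scale for name, sd in runs_column.items()}
for name, sd in scaled.items():
    print(f"{sd:5.1f}  {name}")

# Every component shrinks by the same factor, so this ratio is unchanged (about 62%).
regress = 1 - (scaled["between park"] / scaled["total"]) ** 2
print(f"{regress:.0%}  regression to long-term mean")
```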

I'm not 100% sure this is legitimate, but it's probably pretty close. To make the comparison cleaner, I also want to use the same values for "between park" and "within park" in both columns, since we expect the two methods to produce the same estimates, and we expect that any difference is random (from things like the wOBA-to-runs conversion, how PA vary between games, or the fact that the wOBA calculation omits factors like baserunning).

So after my manual adjustment, we have:

runs  wOBA
--------------------------------------------
71.4  67.8   runs total
--------------------------------------------
47.5  41.9   luck
44    44     between park
30    30     within park between seasons
--------------------------------------------
62%   58%    regression to long-term mean 
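Recomputing the top and bottom lines from the adjusted components, assuming, as throughout, that the three components are independent so their variances add. The `combine` helper is a hypothetical name:

```python
import math


def combine(*sds: float) -> float:
    """Total SD of independent components: square root of the summed variances."""
    return math.sqrt(sum(sd ** 2 for sd in sds))


BETWEEN, WITHIN = 44.0, 30.0  # manually equalized across both methods
for name, luck in [("runs", 47.5), ("wOBA", 41.9)]:
    total = combine(luck, BETWEEN, WITHIN)
    regress = 1 - (BETWEEN / total) ** 2
    print(f"{name}: total {total:.1f}, regress {regress:.0%}")
# runs: total 71.4, regress 62%
# wOBA: total 67.8, regress 58%
```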

-------

That's still a fair bit you have to regress either way -- more than half -- but that would be reduced if you used more than one season in your sample. If we go to the average of four seasons, "luck" and "within park" both get cut in half (the square root of 1/4). 

I'll divide both of those by 2 and recalculate the top and bottom lines:

runs  wOBA
--------------------------------------------
52.3  51.0   runs total
--------------------------------------------
24    21     luck
44    44     between park
15    15     within park between seasons
--------------------------------------------
29%   26%    regression to long-term mean 

So if we use a four-year park average, we should only have to regress 29 percent (for runs) or 26 percent (for wOBA). 
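A sketch of the multi-season calculation, assuming (as above) that luck and within-park variation are independent from season to season while the between-park effect stays fixed. Because it divides 47.5 by 2 without rounding to 24 first, it prints 52.2 rather than 52.3 for the runs total; the regression percentages match. Passing `seasons=1` gives back the single-season figures.

```python
import math


def multi_season(sd_luck: float, sd_between: float, sd_within: float, seasons: int):
    """Total SD and regression fraction for a park factor averaged over several seasons.

    Assumes luck and within-park variation are independent from year to year, so
    their SDs shrink by sqrt(seasons); the between-park SD does not shrink.
    """
    shrink = math.sqrt(seasons)
    luck, within = sd_luck / shrink, sd_within / shrink
    total = math.sqrt(luck ** 2 + sd_between ** 2 + within ** 2)
    return total, 1 - (sd_between / total) ** 2


for name, luck in [("runs", 47.5), ("wOBA", 41.9)]:
    total, regress = multi_season(luck, 44.0, 30.0, seasons=4)
    print(f"{name}: total {total:.1f}, regress {regress:.0%}")
# runs: total 52.2, regress 29%
# wOBA: total 51.0, regress 26%
```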

-------

Thanks to Tango for the wOBA data making this possible, and for other observations I'm saving for a future post.

My three previous posts on park factors are here:  one two three


