Sabermetric Research: Home field advantage is naturally higher in a hitter's park

The Rockies have always had a huge home-field advantage (HFA) at Coors. From 1993 to 2021, Colorado played .545 at home, but only .395 on the road. That's the equivalent of the difference between going 89-73 and 64-98.

Why such a big difference? I have some ideas I'm working on, but the most obvious one -- although it's not that big, as we will see -- is that higher scoring naturally, mathematically, leads to a bigger HFA.

When teams play better at home than on the road -- for whatever reason -- the manifestation of "better" is in physical performance, not winning percentage as such. The translation from performance to winning percentage depends on the characteristics of the game.

In MLB, historically, the home team plays around .540. But if the commissioner decreed that now games were going to be 36 innings long instead of 9, the home advantage would roughly double, with the home team now winning at a .580 pace.

(Why? With the game four times as long, the SD of the score difference by luck would double. But the home team's run advantage would quadruple. So the run differential by talent would double compared to luck. Since the normal distribution is almost linear at such small differences (roughly, from 0.1 SD to 0.2 SD), HFA would approximately double.)
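Here is a quick numerical check of that parenthetical, as a sketch under the same normal approximation. The .540 baseline is from the text; expressing the home edge as a z-score and the helper name `home_win_pct_after_lengthening` are my own assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def home_win_pct_after_lengthening(base_pct: float, length_factor: float) -> float:
    """Home winning percentage if games became `length_factor` times as long.

    Assumes the score difference is roughly normal: the home team's expected
    run edge grows linearly with game length, while the SD due to luck grows
    with its square root, so the edge in SD units grows by sqrt(length_factor).
    """
    z = STD_NORMAL.inv_cdf(base_pct)  # current edge, in SDs of luck
    return STD_NORMAL.cdf(z * sqrt(length_factor))


# 9 innings -> 36 innings is a factor of 4, so the edge in SD units doubles.
print(round(home_win_pct_after_lengthening(0.540, 4), 3))
```

Under these assumptions the result lands very close to the .580 pace quoted above.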

But it's not *always* that a higher score number increases HFA. If it was decided that all runs now count as 2 points, like in basketball, scoring would double, but, obviously, HFA would stay the same.

Roughly speaking, increased scoring increases the home advantage only if it also increases the "signal to noise ratio" of performance to luck. Increasing the length of the game does that; doubling all the scores does not.

In 2000, Coors Field increased scoring by about 40%. If that forty percent was obtained by increasing games from 9 innings to 13 innings, HFA would be around 20% higher. If the forty percent was obtained by making every run count as 1.4 runs, HFA would be 0% higher. In reality, the increase could be anywhere between 0% and 20%, or beyond.
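The two extremes can be sketched with the same normal approximation. This is only an illustration: the .540 baseline is the historical figure from earlier, and `hfa_increase` with its signal/noise parameters is a hypothetical helper, not something taken from any tool mentioned here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def hfa_increase(base_pct: float, signal_factor: float, noise_factor: float) -> float:
    """Fractional change in HFA (win pct above .500) when the home team's
    expected run edge is multiplied by `signal_factor` and the SD of luck by
    `noise_factor`. Assumes a normally distributed score difference."""
    z = STD_NORMAL.inv_cdf(base_pct)
    new_pct = STD_NORMAL.cdf(z * signal_factor / noise_factor)
    return (new_pct - 0.5) / (base_pct - 0.5) - 1.0


# Longer games: 13 innings instead of 9. The edge scales with length,
# the luck SD with the square root of length.
longer = hfa_increase(0.540, 13 / 9, sqrt(13 / 9))

# Every run counts as 1.4 runs: edge and luck SD both scale by 1.4.
rescaled = hfa_increase(0.540, 1.4, 1.4)

print(f"longer games: {longer:.1%}, rescaled runs: {abs(rescaled):.1%}")
```

The first case comes out at roughly a 20% increase and the second at essentially zero, which brackets the range described above.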

We probably have the tools available to get a pretty good estimate of the true increase.

------

Let's start with the overall average HFA. My subscription to Baseball Reference allowed me to obtain home and road batting records, all teams combined, for the 1980-2022 seasons:

           AB       H      2B     3B     HR      BB      SO
-----------------------------------------------------------
home  3209469  846723  161290  19928  95790  321178  612545
road  3363640  859813  163954  17203  96043  308047  668363

What's the run differential between those two batting lines? We can look at actual runs, or even the difference in run statistics like Runs Created or Extrapolated Runs. But, for better accuracy, I used Tom Tango's on-line Markov Calculator (the version modified by Bill Skelton). It turns out the home batting line leads to 4.79 runs per nine innings, and the road batting line works out to 4.36 R/9.

           AB       H      2B     3B     HR      BB      SO   R/9
-----------------------------------------------------------------
home  3209469  846723  161290  19928  95790  321178  612545  4.79
road  3363640  859813  163954  17203  96043  308047  668363  4.36
-----------------------------------------------------------------
difference                                                   0.43

That's a difference of 0.43 runs per game. Using the rule of thumb that 10 runs equals one win, a rough estimate is that the home team should have a win advantage of 0.043 wins per game, for a winning percentage of .543.

That's a pretty good estimate -- home teams actually went .539 in that span (51832-44409). But, we'll actually need to be more accurate than that, because the "10 runs per win" figure will change significantly for higher-scoring environments such as Coors.
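The rule-of-thumb conversion is simple enough to sketch in a few lines. The numbers are the ones quoted above; the function name `home_win_pct` and the assumption that wins are linear in runs are mine, for illustration only.

```python
def home_win_pct(run_edge_per_game: float, runs_per_win: float = 10.0) -> float:
    """Convert a per-game run advantage into a winning percentage, assuming
    wins are linear in runs at the given runs-per-win rate."""
    return 0.5 + run_edge_per_game / runs_per_win


# Figures from the text: Markov R/9 for the home and road batting lines,
# and the actual home record for 1980-2022.
predicted = home_win_pct(4.79 - 4.36)
actual = 51832 / (51832 + 44409)

print(f"predicted {predicted:.3f}, actual {actual:.3f}")
```

Passing a different `runs_per_win` is how the same conversion would be reused for a higher-scoring environment.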

So let's calculate an estimate of the actual runs per win for this scoring environment.

The Tango/Skelton Markov calculator includes a feature where, given the batting line, it will show the probability of a team scoring any particular number of runs in a nine-inning game. Here's part of that output:

          home    road
----------------------
2 runs:  .1201   .1342
3 runs:  .1315   .1404
4 runs:  .1282   .1309

From this table, which actually extends from 0 to 30+ runs, we can calculate how many runs it would take for the road team to turn a loss into a win.

Case 1: If the road team is tied after 9 innings, it has about a 50% chance of winning. With one additional run, it turns that into 100%. So an additional run in a tie game is worth half a win.

How often is the game tied? Well, the chance of a 2-2 tie is .1201*.1342, or about 1.6%. The chance of a 3-3 tie is .1315*.1404, or 1.8%. Adding up the 2-2 and the 3-3 and the 0-0 and the 1-1 and the 4-4 and the 5-5, and so on all the way down the line, the overall chance is 9.7%.

Case 2: If the road team is down a run after 9 innings, it loses, which is a 0% chance of winning. With one additional run, it's tied, and turns that into a 50% chance. So, an additional run there is also worth half a win.

How often is the road team down a run? Well, the chance of a 3-2 result is .1315*.1342, or about 1.8%. The chance of 4-3 is .1282*.1404, another 1.8%. And so on.

The total: a 9.54% chance the road team winds up losing by one run.

What's the chance that the additional run will give the *home* team the extra half win? We can repeat the calculation, but instead of 3-2, we'll calculate 2-3. Instead of 4-3, we'll calculate 3-4. And so on.

The total: only 8.54%. It makes sense that it's smaller, because the better team is less likely to be behind by a run than ahead by a run.

We'll average the home and road numbers to get 9.04%.

So, we have:

 9.7% chance of a tie
 9.0% chance of being behind by one run
----------------------------------------------
18.7% chance that a run will create half a win

Converting that 18.7% chance to R/W:

0.187 half-wins per run
= 5.35 runs per half-win
= 10.7 runs per win

So, we'll use 10.7 runs per win for our calculation.
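The tie and one-run bookkeeping above can be sketched as follows. This is a minimal illustration, not the calculator's own code: it assumes you already have each side's full runs-per-game distribution as a list (index k = probability of scoring exactly k runs) and that the two teams' totals are independent. Both function names are hypothetical.

```python
from typing import Sequence


def one_run_swing_chances(home: Sequence[float], road: Sequence[float]):
    """Given P(team scores exactly k runs) for k = 0, 1, 2, ... for each side,
    return (P(tie), P(road trails by one), P(home trails by one)),
    treating the two teams' run totals as independent."""
    n = min(len(home), len(road))
    tie = sum(home[k] * road[k] for k in range(n))
    road_down_one = sum(home[k + 1] * road[k] for k in range(n - 1))
    home_down_one = sum(home[k] * road[k + 1] for k in range(n - 1))
    return tie, road_down_one, home_down_one


def runs_per_win(tie: float, road_down_one: float, home_down_one: float) -> float:
    """An extra run is worth half a win in a tie or when trailing by one,
    so runs per win = 2 / P(a run creates half a win)."""
    half_win_chance = tie + (road_down_one + home_down_one) / 2
    return 2 / half_win_chance


# The totals quoted in the text (9.7%, 9.54%, 8.54%) give about 10.7.
print(round(runs_per_win(0.097, 0.0954, 0.0854), 1))
```

Feeding the full 0-to-30+ distributions into `one_run_swing_chances` would be the way to reproduce the three totals themselves; only three rows of that table are shown here, so the sketch starts from the totals.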

(Why, by the way, do we get 10.7 runs per win instead of the rule of thumb that it should be 10.0 flat? I think it's because the Markov simulation always plays the bottom of the ninth, even when the home team is already up. It therefore includes a bunch of meaningless runs that don't occur in reality. When some of the run currency is randomly useless, it pushes the price of a win higher.

We'd expect that roughly 1/18 of all runs scored are in the bottom of the ninth with the home team having already won. If we discount those by multiplying 10.7 by 17/18, we get ... 10.1 runs per win. Bingo.)

We saw earlier that the home team had an advantage of 0.43 runs per game. Dividing that by 10.1 runs per win gives us

Predicted: HFA of .042 wins per game (.542)
Actual:    HFA of .039 wins per game (.539)

We're off a bit. The difference is about 2 SD. My guess is that the Markov calculation, which is necessarily simplified, is very slightly off, and we only notice because of the huge sample size of almost 100,000 actual games.

-------

OK, now let's do the same thing, but this time for Coors Field only.

I could do the same thing I did for MLB as a whole: split the combined Coors batting line into home and road, and calculate those individually. The problem with that is ... well, if I do that, I'll be getting the Rockies' actual HFA at Coors, which is huge, because it includes all kinds of factors that we're not concerned with, like altitude acclimatization, tailoring of personnel to field, etc.

So, I'm going to try to convert the Coors line into an approximation of what the split would look like if it were similar to MLB as a whole.

Here's that 1980-2022 MLB split from above, except I've added the percentage difference between home and road (on a per-AB basis) below:

           AB       H      2B     3B     HR      BB      SO
-----------------------------------------------------------
home  3209469  846723  161290  19928  95790  321178  612545
road  3363640  859813  163954  17203  96043  308047  668363
-----------------------------------------------------------
diff            +3.2%   +3.5% +21.4%  +4.5%   +9.3%   -3.9%
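For anyone who wants to recompute the "diff" row, here is a small sketch of the per-AB comparison. The counts are copied from the table; the function name `per_ab_pct_diff` is just a label for this illustration. Computed this way from the counts shown, the doubles column appears to come out nearer +3.1% than +3.5%, while the other columns match to the rounding shown.

```python
# Combined 1980-2022 batting lines from the table above.
HOME = {"AB": 3209469, "H": 846723, "2B": 161290, "3B": 19928,
        "HR": 95790, "BB": 321178, "SO": 612545}
ROAD = {"AB": 3363640, "H": 859813, "2B": 163954, "3B": 17203,
        "HR": 96043, "BB": 308047, "SO": 668363}


def per_ab_pct_diff(home: dict, road: dict) -> dict:
    """Percentage by which the home per-AB rate exceeds the road per-AB rate."""
    diffs = {}
    for stat in home:
        if stat == "AB":
            continue
        home_rate = home[stat] / home["AB"]
        road_rate = road[stat] / road["AB"]
        diffs[stat] = 100 * (home_rate / road_rate - 1)
    return diffs


for stat, pct in per_ab_pct_diff(HOME, ROAD).items():
    print(f"{stat:>2} {pct:+.1f}%")
```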

I'll try to create something similar for 2000 Coors. The overall batting line, for both teams, looked like this:

           AB      H     2B     3B     HR     BB     SO    R/9
--------------------------------------------------------------
Coors    5843   1860    359     56    245    633    933   7.43

Here's my arbitrary split, into Rockies vs. road team, chosen to keep roughly the same percentage differences as in MLB overall, while also keeping the R/9 roughly 7.43. Here's what I came up with:

           AB      H     2B     3B     HR     BB     SO
-------------------------------------------------------
home     5843   1884    362     66    249    672    936
road     5843   1826    350     54    238    615    974
-------------------------------------------------------
diff           +3.2%  +3.4% +22.2%  +4.6%  +9.3%  -3.9%

I ran those through Tango's calculator to get runs per 9 innings:

           AB      H     2B     3B     HR     BB     SO     R/9
---------------------------------------------------------------
home     5843   1884    362     66    249    672    936   7.783
road     5843   1826    350     54    238    615    974   7.071
---------------------------------------------------------------
avg                                                       7.427
---------------------------------------------------------------
diff                                                      +.712

Next, I ran the runs-per-game distribution calculation to get a runs-per-win estimate. (I won't go through the details here, but it's the same thing as before: calculate the probability of a tie, then a one-run home win, then a one-run road win, etc.)

The result: 14.37 runs per win.

As expected, that's significantly higher than the 10.7 we calculated for MLB overall. (Adjusting 14.37 for the superfluous bottom-of-the-ninth gives about 13.6, so, if you prefer, you can compare 13.6 Coors to 10.1 overall.)
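The bottom-of-the-ninth adjustment for both environments is just a 17/18 scaling, sketched here. The 1/18 share is the rough assumption made earlier in the post, and the function name is hypothetical.

```python
def discount_unplayed_bottom_ninth(runs_per_win: float) -> float:
    """Scale a Markov-based runs-per-win figure by 17/18, on the rough
    assumption that 1/18 of simulated runs come in a bottom of the ninth
    that would not actually be played."""
    return runs_per_win * 17 / 18


for label, rpw in [("MLB 1980-2022", 10.7), ("Coors 2000", 14.37)]:
    print(f"{label}: {rpw} -> {discount_unplayed_bottom_ninth(rpw):.1f}")
```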

The difference of .712 runs per game, divided by 14.37 runs per win, gives an HFA of

0.0495 wins per game

Which translates to a home winning percentage of .5495.

Comparing the two results:

.542 home field winning percentage normal
.549 home field winning percentage Coors
-----------------------------------------
.007 difference

The difference of .007 is worth only about half a win per home season. Sure, half a win is half a win, but I'm a little disappointed that's all we wind up with after all this work.

It's certainly not as much of an effect as I thought there would be before I started. Even if you deducted this inherent .007, it would barely make a dent in the Rockies' 150 percentage point difference between Coors and road. The Rockies would still be in first place on the FanGraphs chart by a sizeable margin -- 42 points instead of 49.

Looked at another way, an additional .007 would move an average team from the middle of the 29-year standings, to about halfway to the top. So maybe it's not that small after all.

Still, our conclusion has to be that the Rockies' huge HFA over the years is maybe 10 percent a mathematical inevitability of all those extra runs, and 90 percent other causes.

8 comments:

JGF, Wednesday, November 16, 2022 11:21:00 PM
Shortcut way of getting to the same place:

MLB total runs per game = 4.79 + 4.36 = 9.15
Coors 2000 total RPG = 2 * 7.43 = 14.86

Coors/MLB = 1.624

Use that scale factor to get
Coors home RPG = 1.624*4.79 = 7.78
Coors road RPG = 1.624*4.36 = 7.08

Then use Pythagenpat with x = RPG^0.287, W/L = (R.home/R.away)^x

MLB: x = 9.15^0.287 = 1.89
Coors: x = 14.86^0.287 = 2.17

For MLB, Wpct = 0.544, so HFA = 0.044
For Coors, Wpct = 0.551, so HFA = 0.051, or +0.007 due to the higher run environment.
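The shortcut in this comment can be sketched in a few lines. This follows the formula exactly as the comment states it (x = RPG^0.287, W/L = (R.home/R.away)^x); the function name and the way the Coors scale factor is computed are illustrative assumptions.

```python
def pythagenpat_home_win_pct(home_rpg: float, road_rpg: float,
                             exponent_power: float = 0.287) -> float:
    """Home winning percentage from Pythagenpat as described in the comment:
    x = (total runs per game) ** 0.287 and W/L = (home runs / road runs) ** x."""
    x = (home_rpg + road_rpg) ** exponent_power
    win_loss_ratio = (home_rpg / road_rpg) ** x
    return win_loss_ratio / (1 + win_loss_ratio)


mlb = pythagenpat_home_win_pct(4.79, 4.36)

# Scale both sides by the Coors/MLB run ratio so the total is 2 * 7.43.
scale = (2 * 7.43) / (4.79 + 4.36)
coors = pythagenpat_home_win_pct(scale * 4.79, scale * 4.36)

print(f"MLB {mlb:.3f}, Coors {coors:.3f}, difference {coors - mlb:+.3f}")
```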
JGF, Wednesday, November 16, 2022 11:22:00 PM
Should have added... I agree with the overall conclusion, of course (that Coors HFA is ~10% higher run environment and ~90% other).
JGF, Thursday, November 17, 2022 10:34:00 AM
FWIW, when I ran the Coors 2000 numbers through Tango's Markov simulator (I took the defaults on the base advancement probabilities), I got 8.60 R/G.
Phil Birnbaum, Thursday, November 17, 2022 12:55:00 PM
JGF,

I ran the numbers again and got 7.43 again. Also, I double-checked that I used the correct Coors numbers by going to "Ballparks" here: https://www.baseball-reference.com/leagues/split.cgi?t=b&lg=MLB&year=2000#site

Phil Birnbaum, Thursday, November 17, 2022 1:01:00 PM
JGF,

Your shortcut is nice. Two reasons I did it the long way:

1. I haven't worked with PythagenPat much and wasn't 100% sure that it was sufficiently accurate. This is not to say that it isn't accurate, just that I didn't know whether it is or not.

2. It wasn't entirely obvious to me that the home/road difference in runs would be proportional between Coors and non-Coors. I figured it was safer to use the difference for each hitting category separately. I am actually a bit surprised that it works out so well ... but maybe I shouldn't be.

Phil
Dan, Thursday, November 17, 2022 1:49:00 PM
I'd expect the main reason for Colorado's big HFA to be that a more distinctive stadium gives a team more incentive to specialize their roster to fit their home stadium. A more distinctive stadium increases HFA twice over: the roster will be more specialized, and each bit of roster specialization produces a larger advantage.

So if there's a stadium where it's easy to hit home runs to right field, then a team which plays half its games there can specialize in signing batters who hit a lot of fly balls to right field, and pitchers who don't give up many fly balls to right field. That helps them in their home stadium, which opens up a wide HFA gap. And the easier it is to hit right field home runs in their stadium, 1) the more it will help them and 2) the farther they'll go in building an unusual roster.

Or, to make an example starker & sillier, if one stadium had a special rule that runs count double if they're scored by a player whose last name contains the letter Z, the team that plays there could fill its roster with Z names and win a lot of home games. Big HFA. Whereas, if runs scored by Z names provided a smaller bonus (say, as a tiebreaker in lieu of extra innings), then having a lineup with lots of Z names would be a smaller advantage and the team would be more willing to sign non-Z players, which both mean less HFA.

Coors is the most distinctive stadium, so by this theory I'd expect it to have the biggest HFA. The challenge for testing this is to find the data on how distinctive the Rockies roster has been, and how much that helps at home.
Phil Birnbaum, Thursday, November 17, 2022 1:54:00 PM
Dan,

That's exactly the post I was planning to write, but I was going to use Scrabble as my example instead of your letter Z. :)

Phil
JGF, Thursday, November 17, 2022 5:12:00 PM
I ran the numbers again and got 7.43 again. Also, I double-checked that I used the correct Coors numbers by going to "Ballparks" here: https://www.baseball-reference.com/leagues/split.cgi?t=b&lg=MLB&year=2000#site

My bad. I had the right numbers (I double checked those against yours), but I must have made a fat finger error entering into the form. That's what I get for not double-checking *that* result (indeed I got 7.43 this time).


Wednesday, November 16, 2022
