Friday, February 28, 2020

Park adjusting a player's batting line has to depend on who the player is

Suppose home runs are scarce in the Astrodome, so that only half as many home runs are hit there as in any other park. One year, Astros outfielder Joe Slugger hits 15 HR at the Astrodome. How do you convert that to be park-neutral?

It seems like you should adjust it to 30. Take the 15 HR, double it, and there you go. 

But I don't think that works. I think if you do it that way, you overestimate what Joe would have done in a normal park. I think you need to adjust Joe to something substantially less than 30.

------

One reason is that Joe might not necessarily be hurt by the park as much as other players. Maybe the park hurts weaker hitters more, the kind who hit mostly 310-foot home runs. Maybe Joe is the kind who hits most of his fly balls 430 feet, so when the indoor dead air shortens them by 15 feet, they still have enough carry to make it over the fence.

It's almost certain that some players should have different park factors than others. Many parks are asymmetrical, so lefties and righties will hit to different outfield depths. Some parks may have more impact on players who hit more line drive HRs, and less impact on those who hit towering fly balls. And so on.

I suspect that's actually a big issue, but I'm going to assume it away for now. I'll continue as if every player is affected by the park the same way, and I'll assume that Joe hit exactly 15 HR at home and 30 HR on the road, exactly in line with expectations.

Also, to keep things simple, two more assumptions. First I'll assume that the park factor is caused by distance to the outfield fence -- that the Astrodome is, say, 10 percent deeper than the average park. Second, I'll assume that in the alternative universe where Joe played in a different home park, he would have hit every ball with exactly the same trajectory and distance that he did in the Astrodome.

My argument is that with these assumptions, the Astros overall would have hit twice as many HR at home as they actually did. But Joe Slugger would have hit *fewer* than twice as many.

------

Let's start by defining two classes of deep fly balls:

A: fly balls deep enough to be HR in any park, including the Astrodome; 
B: fly balls deep enough to be HR in any park *except* the Astrodome.

We know that, overall, class A is exactly equal in size to class B, since (A+B) is exactly twice A.

That's why, when we saw 15 HR in class A, we immediately assumed that implies 15 HR in class B. And so we assumed that Joe would have hit an extra 15 HR in any other park.

That seems like it should work, but it doesn't. Here's a series of analogies that shows why.

1. You have a pair of fair dice. You expect them to come up snake eyes (1-1) exactly as often as box cars (6-6). You roll the dice 360 times, and find that 1-1 came up 15 times. 

Since 6-6 comes up as often as 1-1, should you estimate that 6-6 also came up 15 times? You should not. Since the dice are fair, you expect 6-6 to have come up 1 time in 36, or 10 times.* The fact that 1-1 got lucky, and came up more often, doesn't mean that 6-6 must have come up more often.

(*Actually, you should expect that 6-6 came up only 9.86 times. The 15 rolls that came up 1-1 can't also be 6-6, which leaves 345 rolls, 5 fewer than you'd expect, each with a 1-in-35 chance of 6-6. But never mind for now.)

2. You have a pair of fair dice, and an APBA card. On that card, 1-1 is a home run, and 6-6 represents a home run anywhere except the Astrodome.

You roll the dice 360 times, and 1-1 comes up 15 times. Do you also expect that 6-6 came up 15 times? Same answer: you expect it came up only 10 times. The fact that 1-1 got lucky doesn't mean that 6-6 must also have gotten lucky.

3. You have a simulation game, with some number of fair dice, and a card for Joe Slugger. You know the probability of Joe hitting an Astrodome HR is equal to the probability of Joe hitting an "anywhere but Astrodome" HR.  But that probability -- Joe's talent level -- isn't necessarily 1 in 36.

You play a season's worth of Joe's home games, and he hits 15 HR. Can you assume that he also hit 15 "anywhere but Astrodome" HR?

Well, in one special case, you can. If the 15 HR was Joe's actual expectation, based on his talent -- that is, his card -- then, yes, you can assume 15 near-HR. 

But, in all other cases, you can't. If Joe's 15 HR total was lucky, based on his talent, you should assume fewer than 15 near-HR. And if the 15 HR was unlucky, you should assume more than 15 near-HR.
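The dice arithmetic from the first analogy can be checked exactly. Here's a minimal sketch in Python, assuming the rolls are independent; the function name is made up for illustration and isn't from any library.

```python
from fractions import Fraction

def expected_boxcars(rolls, snake_eyes_seen):
    """Expected number of 6-6 results, given how many 1-1 results were observed.

    Every roll that was NOT 1-1 is one of the other 35 outcomes, all equally
    likely, so 6-6 has a 1-in-35 chance on each of those rolls.
    """
    remaining_rolls = rolls - snake_eyes_seen
    return Fraction(remaining_rolls, 35)

# 1-1 came up exactly as often as expected: 6-6 is expected 10 times.
print(float(expected_boxcars(360, 10)))

# 1-1 got lucky and came up 15 times: 6-6 is expected about 9.86 times, not 15.
print(float(expected_boxcars(360, 15)))
```

The estimate for 6-6 barely moves when 1-1 gets lucky, and it moves slightly down, not up.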

------

So I think you can't park adjust players via the standard method of multiplying their performance by their park factor. The park adjustment has to be based on their *expected* performance, not their observed performance.

Suppose Joe Slugger, at the beginning of the season, was projected by the Marcel system to hit 10 HR at home. That means that he was expected to hit 10 HR at the Astrodome, and 10 "almost HR" at the Astrodome.

Instead, he winds up hitting 15 HR there. But we still estimate that he hit only 10 "almost HR". So, instead of bumping his 15 HR total to 30, we bump it only to 25.
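The two ways of adjusting can be put side by side. This is only a sketch: the function names and the `park_ratio` parameter (neutral-park HR divided by this-park HR, so 2.0 for the Astrodome in this example) are illustrative assumptions, and the talent-based version is meant for the pitchers' park case, where `park_ratio` is above 1.

```python
def neutral_hr_standard(observed_hr, park_ratio):
    """Standard method: scale the observed total by the park ratio."""
    return observed_hr * park_ratio

def neutral_hr_talent_based(observed_hr, expected_hr, park_ratio):
    """Keep the observed HR, and add the near-HR implied by the EXPECTED HR.

    Each expected HR in this park comes with (park_ratio - 1) expected
    "almost HR" that would have gone out in a neutral park.
    """
    expected_near_hr = expected_hr * (park_ratio - 1)
    return observed_hr + expected_near_hr

# Joe: projected for 10 HR at the Astrodome, actually hit 15.
print(neutral_hr_standard(15, 2.0))          # 30.0
print(neutral_hr_talent_based(15, 10, 2.0))  # 25.0
```

When the observed total matches the projection, the two functions agree. They only diverge when the player was lucky or unlucky.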

-------

I was surprised by this, that there's no way to convert the Astrodome to a normal park that doesn't require you to estimate the player's talent. 

But here's what surprised me even more, when I worked it out: you only need to know the player's talent when you're adjusting from a pitchers' park. When you adjust from a hitters' park, one formula works for everyone!

Let's take it the other way, and suppose that Fenway affords twice as many home runs as any other park. And, suppose Joe Slugger, now with the Red Sox, hits 40 at Fenway and 20 on the road.

How many would he have hit if none of his games were at Fenway?

Well, on average, half of his 40 HR would have been HR on the road. So, that's 20. End of calculation. 

It doesn't matter who the batter is, or what his talent is -- as long as we stick to the assumption that every player's expectation is twice as many HR at Fenway, the expectation is that half his Fenway HR would also have been HR on the road.

(In reality, it might have been more, or it might have been less, since the breakdown of the 40 HR is random, like 40 coin tosses. But the point is, it doesn't depend on the player.)
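To put a number on "more or less," here's a small sketch that treats each of the 40 Fenway HR as an independent coin toss, per the assumption above. The function name is hypothetical.

```python
from math import comb, sqrt

def any_park_hr_distribution(fenway_hr, p_any_park=0.5):
    """Probability of each possible count of "any park" HR among the Fenway HR."""
    n = fenway_hr
    return [comb(n, k) * p_any_park**k * (1 - p_any_park)**(n - k)
            for k in range(n + 1)]

dist = any_park_hr_distribution(40)
mean = sum(k * p for k, p in enumerate(dist))
sd = sqrt(sum((k - mean) ** 2 * p for k, p in enumerate(dist)))

print(round(mean, 2), round(sd, 2))  # 20.0 3.16
```

Nothing about the batter appears anywhere in the calculation. Only his Fenway total and the park's 50/50 split go in.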

-------

If you're not convinced, here's a coin toss analogy that might make it clearer.

We ask MLB players to do a coin toss experiment. We give them a fair coin. We tell them, take your day of birth, multiply it by 10, toss the coin that many times, and count the heads. Then, toss the coin that many times again, but this time, count the number of tails.

For the Fenway analogy: heads are "Fenway only" HR. Tails are "any park" HR.

We ask each player to come back and tell us H+T, the total number of Fenway HR. We then try to estimate the heads, the number of "Fenway only" HR.

That's easy: we just assume half the number. Mathematically, the expectation for any player, no matter who, is that H will be half of (H+T). That's because no matter how lucky or unlucky he was, there's no reason to expect he was luckier in H than T, or vice-versa.

Now, for the Astrodome analogy. Heads are "Any park including Astrodome" HR. Tails are "other park only" HR.

We ask each player to come back and tell us only the number of heads, which is the Astrodome HR total. We'll try to estimate tails, the non-Astrodome HR total.

Rob Picciolo comes back and says he got 15 heads. Naively, we might estimate that he also tossed 15 tails, since the probabilities are equal. But that would be wrong. If we checked Baseball Reference, we would see that Picciolo was born on the 4th of the month, not the 3rd, which means he actually had 40 tosses, not 30, and was unlucky in heads.

In his 40 tosses for tails, there's no reason to expect he'd have been similarly unlucky, so we estimate that Picciolo tossed 20 tails, not 15.

On the other hand, Barry Bonds comes back and says he got 130 heads. On average, players who toss 130 heads would also have averaged about 130 tails. But Barry Bonds was born on the 24th of July. We should estimate that he tossed only 120 tails, not 130.

For Fenway, when we know the total number of heads and tails, the player's birthday doesn't factor into our estimate of tails. For the Astrodome, when we know only the total number of heads, the player's birthday *does* factor into our estimate of tails.
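If the logic still feels slippery, the experiment is easy to simulate. This sketch assumes a fair coin and independent tosses, as in the setup above; the function names, the seed, and the number of trials are arbitrary choices for illustration.

```python
import random

def toss_experiment(day_of_birth, rng):
    """Return (heads counted in the first batch, tails counted in the second)."""
    n = day_of_birth * 10
    heads = bin(rng.getrandbits(n)).count("1")  # each 1 bit is a head
    tails = bin(rng.getrandbits(n)).count("1")  # each 1 bit is a tail
    return heads, tails

def simulate(day_of_birth, trials=50_000, seed=2020):
    """Run the experiment for many simulated players sharing one birthday."""
    rng = random.Random(seed)
    return [toss_experiment(day_of_birth, rng) for _ in range(trials)]

def mean_tails_given_heads(results, heads_reported):
    """Astrodome case: the player tells us only his heads count."""
    tails = [t for h, t in results if h == heads_reported]
    return sum(tails) / len(tails)

def mean_heads_given_total(results, total_reported):
    """Fenway case: the player tells us heads plus tails."""
    heads = [h for h, t in results if h + t == total_reported]
    return sum(heads) / len(heads)

for day in (3, 4):
    results = simulate(day)
    print("born on day", day,
          "| avg tails when heads = 15:", round(mean_tails_given_heads(results, 15), 1),
          "| avg heads when total = 35:", round(mean_heads_given_total(results, 35), 1))
```

The Astrodome column should land near 15 for players born on the 3rd and near 20 for players born on the 4th, even though both groups reported 15 heads. The Fenway column should land near 17.5 for both birthdays.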

-------

So, when Joe Slugger plays 81 games at the Astrodome and tosses 15 home run "heads," we can't just expect him to have also tossed 15 long fly ball "tails". We have to look up his home run talent "date of birth". If he was only born on the 2nd of the month, so that we'd have only expected him to hit 10 HR "heads" and 10 near-HR "tails" in the first place, then we estimate he'd have hit only 10 extra HR in a neutral park, not 15, for a total of 25 rather than 30.

If we don't do that -- if we don't look at his "date of birth" talent and just double his actual Astrodome HR -- our estimates will be too high for players who were lucky, and too low for players who were unlucky. 

Obviously, players who were lucky will have higher totals. That means that if we park-adjust the numbers for the Astros every year, the players who have the best seasons will tend to be the ones we overadjust the most. In other words, when a player was both good and lucky, we're going to make his good seasons look great, his great seasons look spectacular, and his spectacular seasons look like insane outliers. When a player is bad and unlucky, his bad seasons will look even worse.

But if we park-adjust the Red Sox every year ... there's no such effect, and everything should work reasonably well.

My gut still doesn't want to believe it, but my brain thinks it's correct. 

Well, my gut *didn't* want to believe it, when I wrote that sentence originally. Now, I realize that the effect is pretty small. When a player gets lucky by, say, 20 runs, with a season park factor of 95 ... well, that's only 1 run total. My gut is more comfortable with a 1-run effect.

But, suppose you're adjusting a Met superstar, trying to figure out what he'd hit in Colorado. Runs are about 60 percent more abundant in Coors Field than Citi Field, which means that, since only half a team's games are at home, the park factor is around 30 percent higher. If the player was 20 runs lucky in that particular season, you'd wind up overestimating him by 6 runs, which is now worth worrying about.
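The size of the effect is just the luck multiplied by the size of the park adjustment. A rough sketch, with hypothetical function names, and assuming half of a team's games are played at home:

```python
def season_park_gap(home_park_gap):
    """Full-season park adjustment, assuming half of all games are at home."""
    return home_park_gap / 2

def overadjustment(runs_lucky, park_gap):
    """Runs of luck that get scaled by the park adjustment when they shouldn't be."""
    return runs_lucky * park_gap

# Park factor of 95 (a 5 percent adjustment): about 1 run.
print(overadjustment(20, 0.05))

# Citi Field to Coors (60 percent more runs at home, so about 30 percent): about 6 runs.
print(overadjustment(20, season_park_gap(0.60)))
```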

-------

(Note: After writing this, but before final edit, I discovered that Tom Tango made a similar argument years ago. His analysis dealt with the specific case where the player's observed performance matches his expectation, and for that instance I have reinvented his wheel, 15 years later.)
