Monday, December 03, 2018

Answer to: a flawed argument that marginal offense and defense have equal value

The puzzle from last post was this:  What's wrong with this argument, that a run scored has to be worth exactly the same as a run prevented?

Imagine giving a team an extra run of offense over a season.  You pick a random game, and add on a run, and see if that changes the result.  Maybe it turns an extra-inning loss into a nine-inning win, or turns a one-run loss into an extra-inning game.  Or, maybe it just turns an 8-3 blowout into a 9-3 blowout.

But, it will always be the same as giving them an extra run of defense, right?  Because, it doesn't matter if you turn a 5-4 loss into a 5-5 tie, or into a 4-4 tie.  And it doesn't matter if you turn an 8-3 blowout into a 9-3 blowout, or into a 8-2 blowout.  

Any time one more run scored will change the result of a game, one less run allowed will change it in exactly the same way!  So, how can the value of the run scored possibly be different from the value of the run allowed?

The answer is hinted at by a comment from Matthew Hunt:


"Is it the zero lower bound for runs? You can always increase the number of offensive runs, but you can't hold an opponent to -1 runs."

It's not specifically the zero lower bound -- the argument is wrong even if shutouts are rare -- but it does have to do with the issue of runs prevented.

-------

(Note: for this post, I'm going to treat runs as if they have a Poisson distribution, to make the argument smoother. In reality, runs in baseball come in bunches, and aren't Poisson at all. If that bothers you, just transfer the argument to hockey or soccer, where goals are much closer to Poisson.)

------

The answer, I think, is this:  If you want to properly remove a single opponent's run from a season, you don't do it by choosing a random game. You have do it by choosing a random *run*.

When you *add* runs, it's OK to do it by choosing a game first, because all games have roughly equal opportunities to score more runs. But when you *remove* runs, you have to remove a run that's already there ... and you have to weight them all equally when deciding which one to remove.

If you don't weight them the runs equally ... well, suppose you have game A with ten runs, and game B with two runs. If you choose a random game first, each B run has five times the chance of being chosen as each of the A runs. 

Here's another way of looking at it. Suppose you randomly allocate 700 runs among 162 games, and then you realize you made a mistake, you only meant to allocate 699 runs. You'd look up the 700th run you added, and reverse it. 

But, that 700th run is more likely to come from a high-scoring game than a low-scoring game. Why? Because, before you added the last run, the game you were about to add it to was as average as the 161 other games. But after you add the run, that game must now be expected to be one run more than average. (Actually, 699/700 more, but close enough).

So, if you removed a 700th run by choosing a random game first, you'd be choosing it from an expected average game, not an expected above-average-game. And so your distribution will be more bunched up than it should be, and it would no longer be the same as the distribution would be if you just stopped at 699 runs.

And, of course, you might randomly choose a shutout, which brings that game's runs to -1, proving more obviously that your distribution is wrong.

You don't actually have to reverse the 700th run ... there's nothing special about that one compared to the other 699. You can pick the first run, or the 167th run, or a random run. But you have to choose a particular run without regard to the game it's in, or any other context.

------

Why does a random run have a different value from a run from a random game? 

Because the probabilities change. 

For one thing, you're now much less likely to choose a game where you only allowed one run. You probably won those games anyway, so those runs are less valuable than average. Since you choose less valuable runs less often than before, the value of the run goes up.

But, for another thing, you're now much more likely to choose a game where you gave up a lot of runs. You probably lost those games anyway, so the saved run again probably wouldn't help; you'd just lose 8-3 instead of 9-3. Since you're more likely to choose these less-valuable runs than before, the value of the run goes down.

So some runs where the value is low, you're more likely to choose. Others, you're less likely to choose. Which effect dominates? I don't think we can decide easily from this line of thinking alone. We'd have to do some number crunching.

If we did, we'd find out (as the other argument proved) that "choose a run instead of a game" makes runs prevented more valuable when you already score more than you allow, but less valuable when you allow more than you score. 

But, I don't see a way to prove that from this argument. If you do, let me know!

-------

Finally, let me make one part of the argument clearer. Specifically, why is it OK to pick a random game when adding a run *scored*, but not when subtracting a run *allowed*? Shouldn't it be symmetrical?

Actually, it *is* symmetrical.

When you add a run, you're taking a non-run and changing it to a run. Well, there are so many occurrences of non-runs that they're roughly equal in every game. If you think about changing an out to a run, every game has roughly 27 outs, so every game is already equal.

If you think about hockey ... say, every 15-second interval has a chance of a goal. That's 240 segments per game. In a two-goal game, there are 238 non-goal segments that can be converted into a goal. In a 10-goal game, there are only 230 segments. But 230 is so closer to 238 that you can treat them as equal.*

(* In a true Poisson distribution, they're exactly equal, because you model the game as an infinite number of intervals. Infinity minus 2 is equal to infinity minus 10.)

When you subtract a run ... the process is symmetrical, but the numbers are different. A two-goal game has only two chances to convert a goal to a non-goal, while a ten-goal game has ten -- five times as many. Instead of a 230:238 ratio, you have a 2:10 ratio. The 2 and 10 aren't close enough to treat as equal.

In theory, the two cases are symmetrical in the sense that both are wrong. But, in practice, choosing goals scored by game is wrong but close enough to treat as right. Choosing goals allowed by game is NOT close enough to treat as right.

The fact that goals are rare compared to non-goals is what makes the difference. That difference is why the statistics textbooks say that Poisson is used for the distribution of "rare events."  

Goals are rare events. Non-goals are not.



Labels: ,