Another case where log5 works perfectly
Here's another case where log5 works perfectly: sudden death foul shooting.
Two players each take a foul shot. The player who makes his shot wins, if the other misses. If both players sink the shot, or both players miss, they each take another one. This continues until there's a winner.
Suppose player A is an 80% shooter, and player B is a 40% shooter. The chance that A makes and B misses is .800 multiplied by (1 - .400), which works out to .480. The chance that B makes and A misses is (1 - .800) multiplied by .400, which is .080.
Rounds with no winner are simply replayed, so only the decisive outcomes matter: A beats B 480 times for every 80 times that A loses to B. That makes A's odds of winning 6:1.
And that's exactly what log5 predicts. A's make ratio is 4:1, and B's make ratio is 2:3. Divide 4/1 by 2/3 and you get 6/1, which is the right answer.
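If you want to check that arithmetic yourself, here's a minimal Python sketch. The function names (sudden_death_odds, log5_odds) are just labels made up for this illustration, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

def sudden_death_odds(p_a, p_b):
    """Odds (wins per loss) for A in one-shot sudden death.

    Tied rounds are replayed, so only the two decisive outcomes count.
    """
    a_wins = p_a * (1 - p_b)      # A makes, B misses
    b_wins = (1 - p_a) * p_b      # A misses, B makes
    return a_wins / b_wins

def log5_odds(p_a, p_b):
    """log5 in odds form: A's odds ratio divided by B's odds ratio."""
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

a = Fraction(4, 5)   # 80% shooter
b = Fraction(2, 5)   # 40% shooter

exact = sudden_death_odds(a, b)
estimate = log5_odds(a, b)
print(exact, estimate)   # 6 6
```

Both numbers come out to 6, which is the 6:1 from the text.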
Actually, that's a bit of a cheat. The log5 formula isn't based on chances of making a foul shot; it's based on the chance of beating a .500 player.
We can fix that problem. Consider player C, who happens to hit exactly 50% of foul shots. I won't do the calculation again, but you can easily figure out that player A beats player C 80% of the time, but player B beats player C only 40% of the time.
So, player A is an .800 player against a .500 player, and player B is a .400 player against a .500 player.
We're not quite done yet. Player C was defined as one who hits 50% of foul shots, not one who wins 50% of games. They're not the same thing. We can fix that by just assuming the league average player is both a .500 player and a 50% shooter. That seems arbitrary, but I'm just trying to come up with an example of when log5 works, so arbitrary is fine.
But, actually, we don't need that assumption.
I've been saying all along that for talent, you need to use the expected odds ratio against a .500 team. But you don't need to be that specific. You can use the expected odds ratio against *any* other team (as long as it's the same other team for both sides).
So, it doesn't matter if player C is a .500 player, or a .600 player, or a .979 player. If you know A beats him X% of the time, and B beats him Y% of the time, you can just use X and Y in the log5 formula and it'll still work.
(Why does that work? Because the odds ratio against any given player is always a fixed multiple of the odds ratio against any other given player. So it doesn't matter whether you calculate A's odds ratio over B as a/b, or xa/xb -- it comes out the same regardless of x.)
That means that the log5 calculation using .800 (A's record against C) and .400 (B's record against C) is valid. The log5 formula works perfectly for sudden-death foul shooting.
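Here's a rough sketch of that "any common opponent" claim for the one-shot game. One assumption to flag: the sketch describes player C by his shooting percentage (50%, 60%, and 97.9%, picked arbitrarily), not by his winning percentage, and the helper names are made up for the illustration.

```python
from fractions import Fraction

def win_prob(p, q):
    """Chance a p-shooter beats a q-shooter in one-shot sudden death."""
    win = p * (1 - q)        # p-shooter makes, q-shooter misses
    lose = (1 - p) * q       # p-shooter misses, q-shooter makes
    return win / (win + lose)

def log5_odds(x, y):
    """log5 in odds form, fed with records against a common opponent."""
    return (x / (1 - x)) / (y / (1 - y))

a = Fraction(4, 5)   # player A, 80% shooter
b = Fraction(2, 5)   # player B, 40% shooter

estimates = []
for c in (Fraction(1, 2), Fraction(3, 5), Fraction(979, 1000)):
    x = win_prob(a, c)   # A's record against this version of C
    y = win_prob(b, c)   # B's record against this version of C
    estimates.append(log5_odds(x, y))

print(estimates)
```

Every entry in the list should equal 6, no matter which version of C is used.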
What if we change the game a little, by extending it to two tries instead of one? This time, each player takes two shots, and whoever makes more wins the game. (Again, if it's a tie, repeat the game with two more shots each.)
If I've done my calculations right ... log5 does NOT work for this new game.
Assuming shots are independent, these are the probabilities of the three players hitting 0, 1, and 2 shots:
           0-for-2   1-for-2   2-for-2
A (.800)     .04       .32       .64
B (.400)     .36       .48       .16
C (.500)     .25       .50       .25
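Those numbers are just the binomial probabilities for two independent shots. A small sketch that rebuilds the table (the function name two_shot_distribution is a made-up label for this illustration):

```python
from fractions import Fraction

def two_shot_distribution(p):
    """Return (P(0 makes), P(1 make), P(2 makes)) for two independent shots."""
    miss = 1 - p
    return (miss * miss, 2 * p * miss, p * p)

shooters = {"A": Fraction(4, 5), "B": Fraction(2, 5), "C": Fraction(1, 2)}
table = {name: two_shot_distribution(p) for name, p in shooters.items()}

for name, row in table.items():
    # Convert to floats only for display; the fractions themselves are exact.
    print(name, [float(x) for x in row])
```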
From that, we can calculate the exact probability that A beats B on the first two shots. It works out to 65.28%:
A wins 2-0    .64 * .36 = .2304
A wins 2-1    .64 * .48 = .3072
A wins 1-0    .32 * .36 = .1152
The chance that B beats A works out to 7.68 percent:
B wins 2-0    .16 * .04 = .0064
B wins 2-1    .16 * .32 = .0512
B wins 1-0    .48 * .04 = .0192
For every 6,528 games that A wins, B wins only 768 games. That's an odds ratio of exactly 8.5:1.
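The same calculation as a short sketch, assuming independent shots as above. The helper names are invented for the illustration; the sum runs over every pair of outcomes where one player makes strictly more shots than the other.

```python
from fractions import Fraction

def two_shot_distribution(p):
    """Return (P(0 makes), P(1 make), P(2 makes)) for two independent shots."""
    miss = 1 - p
    return (miss * miss, 2 * p * miss, p * p)

def round_win_prob(p, q):
    """Chance the p-shooter makes strictly more of two shots than the q-shooter."""
    mine = two_shot_distribution(p)
    theirs = two_shot_distribution(q)
    return sum(mine[i] * theirs[j] for i in range(3) for j in range(i))

a = Fraction(4, 5)   # player A, 80% shooter
b = Fraction(2, 5)   # player B, 40% shooter

a_wins = round_win_prob(a, b)
b_wins = round_win_prob(b, a)
odds = a_wins / b_wins   # tied rounds are replayed, so they drop out

print(float(a_wins), float(b_wins), odds)   # 0.6528 0.0768 17/2
```

The 17/2 is the 8.5:1 from the text.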
Now, bring C into the picture. I won't repeat all the calculations, but instead of eight-and-a-half, log5 gives an estimate of eight-and-three-elevenths:
Odds ratio of A over C: 56:11
Odds ratio of B over C: 24:39
Odds ratio of A over B: 2184:264 = 8.2727:1
So, in this case, log5 fails:
log5 estimate of A over B: 8.2727:1
Correct odds of A over B: 8.5000:1
The log5 estimate works out to a winning percentage of .892. The correct win probability is .895. It's close, but it's still wrong.
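To see the gap directly, here's a sketch that computes both the log5 estimate (routed through player C) and the exact answer for the two-shot game. It assumes independent shots, and the function names are labels made up for this illustration.

```python
from fractions import Fraction

def two_shot_odds(p, q):
    """Exact odds (wins per loss) for a p-shooter over a q-shooter, two-shot game."""
    def dist(r):
        # (P(0 makes), P(1 make), P(2 makes)) for two independent shots
        return ((1 - r) ** 2, 2 * r * (1 - r), r ** 2)

    def win(x, y):
        # Probability that the x player makes strictly more shots than the y player
        return sum(x[i] * y[j] for i in range(3) for j in range(i))

    dp, dq = dist(p), dist(q)
    return win(dp, dq) / win(dq, dp)

def odds_to_pct(odds):
    """Convert odds (wins per loss) to a winning percentage."""
    return odds / (1 + odds)

a, b, c = Fraction(4, 5), Fraction(2, 5), Fraction(1, 2)

a_over_c = two_shot_odds(a, c)          # 56:11
b_over_c = two_shot_odds(b, c)          # 24:39
log5_estimate = a_over_c / b_over_c     # log5 via the common opponent C
exact = two_shot_odds(a, b)             # direct calculation

print(log5_estimate, exact)             # 91/11 17/2
print(round(float(odds_to_pct(log5_estimate)), 3),
      round(float(odds_to_pct(exact)), 3))   # 0.892 0.895
```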
So, why doesn't log5 work here, and in Tango's case?
Because: it's a known result (hat tip: Ted Turocy) that for log5 to work, scores have to follow a certain, specific distribution. In most sports, they don't. How well log5 works depends on how well the real-life distribution of scores follows the assumed, theoretical distribution.
I'll get to that (finally) next post.
(Previous log5 posts: One, and Two)
(Updated 9/15 to remove incorrect reference to "height baseball.")