Sabermetric Research: When log5 does and doesn't work

Team A, whose talent is good for an overall winning percentage of .600, plays against a weaker Team B, whose talent is good for .450. What's the probability that Team A wins?

In the 1980s, Bill James created the "log5" method to answer that question. The formula is

P = (A - AB)/(A+B-2AB)

... where A is Team A's talent level, expressed as a winning percentage (in this case, .600), and B is Team B's talent level (.450).

Plug in the numbers, and you get that team A has a .647 chance of winning against team B.
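To check the arithmetic, here is a minimal Python sketch of the formula. The function name `log5` is just a label chosen for illustration, and both inputs are talent levels expressed as winning percentages.

```python
def log5(a: float, b: float) -> float:
    """Chance that a team with talent a beats a team with talent b (both as winning percentages)."""
    return (a - a * b) / (a + b - 2 * a * b)


# Team A (.600 talent) against Team B (.450 talent)
print(f"{log5(0.600, 0.450):.3f}")  # prints 0.647
```

Plugging in .500 for B returns A unchanged, which is the sense in which A is "a .600 team against average teams."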

That makes sense: A is .600 against average teams. Since opponent B is worse than average, A should be better than .600.

Team B is .050 worse than average, so you'd kind of expect A to "inherit" those 50 points, to bring it to .650. And it does, almost. The final number is .647 instead of .650. The difference is because of diminishing returns -- those ".050 lost wins" are what B loses to *average* teams because it's bad. Because A is better than average, it would have got some of those .050 wins anyway because it's good, so B can't "lose them again" no matter how bad it is.
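The diminishing-returns effect shows up if you compare log5 with a naive rule in which A simply "inherits" every point that B sits below .500. This Python sketch uses a hypothetical helper, `naive_inherit`, for that naive rule; the opponent values are arbitrary illustrations.

```python
def log5(a: float, b: float) -> float:
    return (a - a * b) / (a + b - 2 * a * b)


def naive_inherit(a: float, b: float) -> float:
    """A 'inherits' every point B sits below .500 (no diminishing returns)."""
    return a + (0.500 - b)


a = 0.600
for b in (0.450, 0.400, 0.300, 0.200):
    gap = naive_inherit(a, b) - log5(a, b)
    print(f"B = {b:.3f}: naive {naive_inherit(a, b):.3f}, log5 {log5(a, b):.3f}, gap {gap:.3f}")
```

The gap is only about .003 for a .450 opponent, but it grows as B gets worse, because more and more of B's "lost wins" are games A would have won anyway.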

In baseball, the log5 formula has been shown empirically to work very well.

------

There was some discussion of log5 lately on Tango's site (unrelated to this post, but very worthwhile), and that got me thinking. Specifically, it got me thinking: log5 CANNOT be right. It can be *almost* right, but it can never be *exactly* right.

In the baseball context, it can be very, very close, indistinguishable from perfect. But in other sports, or other contexts, it could be way wrong.

Here's one example where it doesn't work at all.

Suppose that, instead of actually playing baseball games, teams just compared their players' average height, and the taller team won. And suppose there are 11 teams in the league, and there's a balanced 100-game season.

What happens? Well, the tallest team beats everyone, and goes 100-0. The second-tallest team beats everyone except the tallest, and winds up 90-10. The third-tallest goes 80-20. And so on, all the way down to 0-100.

Now: when a .600 team plays a .400 team, what happens? The log5 formula says it should win 69.2 percent of those games. But, of course, that's not right -- it will win 100 percent of those games, because it's always taller.
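The height league is easy to simulate. In this Python sketch the heights are hypothetical (team i simply has height i), and each pair of teams meets 10 times, which gives the balanced 100-game season. The .600 team beats the .400 team every time, while log5 predicts about .692.

```python
def log5(a: float, b: float) -> float:
    return (a - a * b) / (a + b - 2 * a * b)


N_TEAMS = 11
GAMES_PER_OPPONENT = 10          # 10 opponents x 10 games = balanced 100-game season
heights = list(range(N_TEAMS))   # hypothetical: team i has height i, so a higher i is taller


def season_record(team: int) -> float:
    """Winning percentage when the taller team always wins."""
    wins = sum(
        GAMES_PER_OPPONENT
        for other in range(N_TEAMS)
        if other != team and heights[team] > heights[other]
    )
    return wins / (GAMES_PER_OPPONENT * (N_TEAMS - 1))


records = [season_record(t) for t in range(N_TEAMS)]  # .000, .100, ..., 1.000
strong, weak = 6, 4                                    # the .600 and .400 teams
actual = 1.0 if heights[strong] > heights[weak] else 0.0
predicted = log5(records[strong], records[weak])
print(f"{records[strong]:.3f} vs {records[weak]:.3f}: actual {actual:.3f}, log5 {predicted:.3f}")
# prints 0.600 vs 0.400: actual 1.000, log5 0.692
```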

For height, the log5 method fails utterly.

------

What's the difference between real baseball and "height baseball" that makes log5 work in one case but not the other?

I'm not 100% sure of this, but I think it's due to a hidden, unspoken assumption in the log5 method.

When we say, "Team A is a .600 talent," what does that mean? It could mean either of two things:

-- A1. Team A is expected to beat 60 percent of the opponents it plays.

-- A2. If Team A plays an average team, it is expected to win 60 percent of the time.

Those are not the same! And, for the log5 method to work, assumption A1 is irrelevant. It's assumption A2 that, crucially, must be true.

In both real baseball and "height baseball," A1 is true. But that doesn't matter. What matters is A2.

In real baseball, A2 is close enough. So log5 works.

In "height baseball," A2 is absolutely false. If Team A (.600) plays an average team (.500), it will win 100 percent of the time, not 60 percent! And that's why log5 doesn't work there.
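The same league makes the A1/A2 split concrete. In this Python sketch, team i is taller than exactly i of its ten opponents, so its record (A1) is i/10, while its chance of beating the .500 team (A2) is either 0 or 1. Since log5 with B = .500 just returns A, it effectively assumes A2, and that assumption fails for every team except the .500 team itself and the two extremes. The helper names are hypothetical.

```python
def log5(a: float, b: float) -> float:
    return (a - a * b) / (a + b - 2 * a * b)


N_TEAMS = 11
AVERAGE = N_TEAMS // 2  # team 5 is taller than exactly half the league: the .500 team


def record(team: int) -> float:
    """A1: share of its opponents the team beats (team i is taller than i of the other 10)."""
    return team / (N_TEAMS - 1)


def vs_average(team: int) -> float:
    """A2: chance of beating the .500 team, when the taller team always wins."""
    return 1.0 if team > AVERAGE else 0.0


for team in range(N_TEAMS):
    if team != AVERAGE:
        r = record(team)
        print(f"record {r:.3f}: log5 vs .500 team {log5(r, 0.5):.3f}, actual {vs_average(team):.3f}")
```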

-------

What it's really coming down to is our old friend, the question of talent vs. luck. In real baseball, for a single game, luck dwarfs talent. In "height baseball," there's no luck at all -- the winner is just the team with the most talent (height).

Here are two possible reasons a sports team (call it Team C) might have a .600 record:

-- B1: Team C is more talented than exactly 60 percent of its opponents

-- B2: Team C is more talented than average, by some unknown amount (which varies by sport) that leads to it winning exactly 60 percent of its games.

Again, these are not the same. And, in real life, all sports (except "height baseball") are some combination of the two.

B1 refers completely to talent, but B2 refers mostly to luck. The more luck there is, in relation to talent, the better log5 works.

Baseball has a pretty high ratio of luck to talent -- on any given day, the worst team in baseball can beat the best team in baseball, and nobody bats an eye. But in the NBA, there's much less randomness -- if Philadelphia beats Golden State, it's a shocking upset.

So, my prediction is: the less that luck is a factor in an outcome, the more log5 will underestimate the better team's chance of winning.

Specifically, I would predict: log5 should work better for MLB games than for NBA games.

--------

Maybe someone wants to do some heavy thinking and figure out how to move this forward mathematically. For now, here's how I started thinking about it.

In MLB, the SD of team talent seems to be about 9 games per season. That's 90 runs. Actually, it's less, because you have to regress to the mean. Let's call it 81 runs, or half a run per game. (I'm too lazy to actually calculate it.) Combining the team and opponent, multiply by the square root of two, to give an SD of around 0.7 runs.

The SD of luck, in a single game, is much higher. I think that if you took a single team's runs scored in each of its 162 games and computed the SD, you'd get around 3. The SD of runs allowed is also around 3, so the SD of the difference would be around 4.2.

SD(MLB talent) = 0.7 runs
SD(MLB luck) = 4.2 runs
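The MLB arithmetic can be laid out step by step. In this minimal Python sketch, the 10-runs-per-win conversion is the rule of thumb implied by "9 games ... That's 90 runs," the 81-run regressed figure is the rough guess from the text, and combining two teams (or runs scored and runs allowed) with the square root of two assumes the two pieces are independent.

```python
import math

GAMES = 162
RUNS_PER_WIN = 10  # rule of thumb implied by "9 games per season. That's 90 runs"

raw_talent_sd_runs = 9 * RUNS_PER_WIN               # 90 runs per season, before regression
talent_sd_runs = 81                                  # the text's rough regressed value
talent_sd_one_team = talent_sd_runs / GAMES          # 0.5 runs per game
talent_sd_matchup = talent_sd_one_team * math.sqrt(2)  # team and opponent combined

luck_sd_one_side = 3.0                               # runs scored (or allowed) in one game
luck_sd_matchup = math.hypot(luck_sd_one_side, luck_sd_one_side)  # SD of the run differential

print(f"season talent SD: {raw_talent_sd_runs} runs raw, {talent_sd_runs} regressed")
print(f"SD(MLB talent) = {talent_sd_matchup:.1f} runs, SD(MLB luck) = {luck_sd_matchup:.1f} runs")
print(f"luck/talent ratio = {luck_sd_matchup / talent_sd_matchup:.1f}")
```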

Now, let's do the NBA. From basketball-reference.com, the SD of the SRS rating seems to be just under 5 points. That's based on outcomes, so it's too high to be an estimate of talent, and we need to regress to the mean. Let's arbitrarily reduce it to 4 points. Combining the two teams (multiplying by the square root of two again), we're up to about 5.7 points.

What about the SD of luck? This site shows that, against the spread, the SD of score differential is around 11 points. So we have

SD(NBA talent) = 5.7 points
SD(NBA luck) = 11.0 points
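Here are the same steps for the NBA, again as a rough Python sketch using the text's own inputs. Four points times the square root of two works out to about 5.66, which still leaves luck at roughly twice talent. The square-root-of-two step again assumes the two teams' talents are independent.

```python
import math

srs_sd_one_team = 4.0                               # "just under 5", arbitrarily regressed to 4
talent_sd_matchup = srs_sd_one_team * math.sqrt(2)  # combine the two teams
luck_sd_matchup = 11.0                              # SD of score differential against the spread

ratio = luck_sd_matchup / talent_sd_matchup
print(f"SD(NBA talent) = {talent_sd_matchup:.1f} points, SD(NBA luck) = {luck_sd_matchup:.1f} points")
print(f"luck/talent ratio = {ratio:.2f}")
```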

In an MLB game, the SD of luck is about 6 times the SD of talent. In an NBA game, it's only about 2 times.

But, how you apply that to fix log5, I haven't figured out yet.

What I *do* think I know is that the MLB ratio of 6:1 is large enough that you don't notice that log5 is off. (I know that from studies that have tested it and found it works almost perfectly.) But I don't actually know whether the NBA ratio of 2:1 is also large enough. My gut says it's not -- I suspect that, for the NBA, in extreme cases, log5 will overestimate the underdog enough so that you'll notice.
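One possible way to put rough numbers on this, offered only as a sketch: assume that a team's per-game talent and single-game luck are both normally distributed (an assumption, not something established above). A record is then the team's average result against random opponents, so it mixes luck with the spread of opponents' talent, while a head-to-head game between two fixed teams involves luck only. The function names are hypothetical, and the inputs are the rough per-team talent SDs (0.5 runs, 4 points) and per-game luck SDs (4.2 runs, 11 points) from above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def log5(a: float, b: float) -> float:
    return (a - a * b) / (a + b - 2 * a * b)


def normal_model_prob(rec_a: float, rec_b: float, sd_talent: float, sd_luck: float) -> float:
    """Hypothetical model: P(A beats B) if talent and single-game luck are both normal.

    sd_talent: SD of one team's talent, in runs or points per game.
    sd_luck:   SD of the single-game margin caused by luck.
    """
    # A record is the average result against random opponents, whose talent also
    # varies, so the margin behind it has SD sqrt(sd_luck^2 + sd_talent^2).
    spread = (sd_luck ** 2 + sd_talent ** 2) ** 0.5
    talent_a = STD_NORMAL.inv_cdf(rec_a) * spread
    talent_b = STD_NORMAL.inv_cdf(rec_b) * spread
    # Head to head, both talents are fixed, and only luck is left.
    return STD_NORMAL.cdf((talent_a - talent_b) / sd_luck)


# (record A, record B, per-team talent SD, per-game luck SD), using the estimates in the text
cases = {
    "MLB .600 vs .400": (0.600, 0.400, 0.5, 4.2),
    "NBA .750 vs .250": (0.750, 0.250, 4.0, 11.0),
}
for label, (a, b, sd_t, sd_l) in cases.items():
    print(f"{label}: log5 {log5(a, b):.3f}, normal model {normal_model_prob(a, b, sd_t, sd_l):.3f}")
```

Under this model, the favorite's chance comes out above log5 in both cases: by a few thousandths for the MLB matchup, and by a couple of percentage points for the NBA one. log5 amounts to combining odds (a logistic curve), while this sketch uses a normal curve, so the two would not agree exactly even with no talent spread. Treat the output as a rough indication of direction and size, not as a correction to log5.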

-------

Anyway, let me summarize what I speculate is true:

1. The log5 formula never works perfectly. Only as the luck/talent ratio goes to infinity will log5 be theoretically perfect. (But, then, the predictions will always be .500 anyway.) In all other cases, log5 will underestimate, to some extent, how much the better team will dominate.

2. For practical purposes, log5 works well when luck is large compared to talent. The 6:1 ratio for a given MLB game seems to be large enough for log5 to give good results.

3. When comparing sports, the more likely it is that the more-talented team beats the less-talented team, the worse log5 will perform. In other words: the bigger the Vegas odds on underdogs, the worse log5 will perform for that sport.

4. You can also estimate how well log5 will perform with a simple test. Take a team near the extremes of the performance scale (say, a .600/.400 team in MLB, or a .750/.250 team in the NBA), and see how it performed specifically against only those teams with talent close to .500.

If a .750 team has a .750 record against teams known to be average, log5 will work great. But if it plays .770 or .800 or .900 ball against teams known to be average, log5 will not work well.
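The check in item 4 can be sketched as a small Python function. The game format (opponent rating, win flag), the function name, and the .025 band that counts as "close to .500" are all hypothetical choices; in practice, the opponent ratings should be talent estimates rather than raw records. As a usage example, the height-baseball league from earlier serves as input: its .600 team goes 1.000 against the .500 team.

```python
from typing import Iterable, List, Tuple

Game = Tuple[float, bool]  # (opponent's estimated talent as a winning pct, did our team win?)


def record_vs_average(games: Iterable[Game], band: float = 0.025) -> Tuple[float, float]:
    """Return (overall winning pct, winning pct against opponents within `band` of .500)."""
    games = list(games)
    overall = sum(won for _, won in games) / len(games)
    near_average: List[bool] = [won for opp, won in games if abs(opp - 0.500) <= band]
    if not near_average:
        raise ValueError("no games against opponents close to .500")
    return overall, sum(near_average) / len(near_average)


# Usage: the .600 team (team 6) in the 11-team height league, 10 games per opponent.
games = [(j / 10, 6 > j) for j in range(11) if j != 6 for _ in range(10)]
print(record_vs_average(games))  # prints (0.6, 1.0)
```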

-------

All this has been mostly just thinking out loud. I could easily be wrong.

Labels: baseball, basketball, log5, luck, statistics, talent

9 Comments:

At Thursday, January 07, 2016 3:49:00 PM, Zach said...: Good stuff Phil.
At Thursday, January 07, 2016 5:54:00 PM, Phil Birnbaum said...: Thank you!
At Saturday, February 20, 2016 5:58:00 AM, Grandpa S said...: This is thinking out loud too...

When we say “If Team A plays an average team, it is expected to win 60 percent of the time,” what do we mean by “average team”? The obvious answer is that an average team is a .500 team. But if Team A is a .600 team, then its average opponent is something less than a .500 team because Team A has the scheduling benefit of never having to play itself. So Team A gets to a .600 record by playing a slightly less than .500 opponent schedule. So then if Team A played all of its games against .500 teams we would expect that it would win slightly less than 60% of its games. But then if it did, is it still a .600 team? Or is it a .590~ (talent) team with a .600 record? It has always seemed to me that when we talk about log5, a team with a .600 record playing a team with a .400 record is a different thing than a team with .600 talent playing a team with .400 talent, and that we should somehow account for this when evaluating the method. Perhaps this is why (or part of the reason why) "log5 will underestimate, to some extent, how much the better team will dominate." Because a team’s record slightly overstates its talent as compared to an average (.500) team.
At Saturday, February 20, 2016 10:17:00 PM, Marjorie R. Bennett said...: The Soccer carries a very substantial rate involving good luck for you to ability -- in just about any granted morning, your worst type of crew throughout soccer could overcome the top crew throughout soccer, along with no-one bats a close look.
At Saturday, April 30, 2016 4:16:00 PM, Unknown said...: Scott Segrin has it exactly right. Consider a 2 team league, one with a record of 0.600, the other with a record of 0.400. Obviously the 0.600 team is going to beat the 0.400 team 60% of the time.

The log5 method works for an infinite number of teams, but you've got to adjust for the number of teams on the schedule.
With N teams in a league, the team you are considering will win at (log5 comp ratio)^((N-1)/N), so the exponent is 9/10 with 10 teams.

Thus, in the 2-team league example, the .600 team will win at (36/16)^(1/2), or a ratio of 6 to 4. That's assuming everyone plays everyone else the same number of games. With different numbers of games played due to divisions, it will be more complicated.
At Monday, June 13, 2016 4:36:00 PM, Unknown said...: The logistic equation for chess can work for any sport. Consider a 4-team league.
Team A 7-3
Team B 6-4
Team C 4-6
Team D 3-7

The logistic formula reads: Probability of the worse team winning = 1/(1 + 10^x),
where x is the difference in power ratings.

Team A's opponents win 30% of the time, and 30% = 1/(1 + 10^0.368), so Team A has a rating of 0.368.

Likewise, Team D has a rating of -0.368. Team D should beat Team A
1/(1 + 10^0.736) = 0.155 of the time.

For two evenly matched teams, the "weaker" team wins 1/(1 + 10^0) = 1/(1 + 1) = 0.5 of the time.
At Tuesday, June 14, 2016 10:54:00 AM, Unknown said...: I see I made the same mistake I pointed out the log5 method made. I've got to regress that 0.368 toward the mean by multiplying it by (n-1)/n, where n is the number of teams in the league. Rather than 0.368, the figure should have been 3/4(0.368) = 0.276 with a 4-team league. That means Team D beats Team A 0.219 of the time.
At Monday, June 29, 2020 4:51:00 AM, sk said...: You are confusing many things here. Take the example with the height of players: you are creating a game where statistics and probability are irrelevant. If you already know the taller team will win every game against a shorter team, why are you trying to predict the outcome? Log5 is for when winning and losing is part of a probability distribution that measures past performance. There is no statistical guarantee that a team will win or lose, except for the case when a 1.000 team plays a .000 team, and Log5 works here. It will never predict that an outcome is guaranteed, just that the probability of a correct estimation of the outcome is high. Like a political poll: 95% of the time, the poll is accurate within the margin of error; the other 5% of the time, the poll is wrong.
At Thursday, July 23, 2020 12:20:00 AM, Unknown said...: I like the height baseball analogy because it does show the point you’re trying to make, but I don’t think it works in the way you’re saying it does.

We could make it so A2 is only partially false and go from there, so that a 1-inch difference makes the taller team win 99% of the time instead of 100%. In that scenario, if A1 is true that's fine, but then A2 would mean that the percentages to put into the log5 formula would be 0.99 (in place of A) and 0.5 (in place of B, to represent the "average" team). And a team 2 inches taller than average (and thus one inch taller than the team with a 0.99 rating) would have a 0.9999 rating. This means it could beat the team with the 0.99 rating 99% of the time, and beat the average team 99.99% of the time, according to the formula.
So my point in all of this is that the ratings wouldn’t be 0.600 and 0.700 for the taller-than-average teams; they would be 0.99 and 0.9999. The 0.99 team would beat 60% of its opponents (aka your A1), but not your A2.

However in baseball and basketball, the points and runs are not the same; a 5-run win in baseball is much more impressive than a 5-point win in basketball for example. I think what you need to do is look at their standard deviation overall and then convert that into a regular winning percentage. So a team that scores 1 standard deviation higher than average tends to win about 84.1% of the time. If you want to separate the talent and luck with each team that’s fine, I just don’t think it has anything to do with the log5 formula.
All that really matters is how often each team wins, and I understand and agree with your height baseball analogy, but I think it’s better with my interpretation above.

Thursday, January 07, 2016
