How often does the best team win the pennant?
There are 30 teams in Major League Baseball. What is the chance that the best team will wind up winning the most regular season games?
According to this article in Science News, the answer is 30%. That figure comes from a new study by (among other, unnamed authors) Eli Ben-Naim, a New Mexico physicist.
I don't know how Ben-Naim came up with that result, because I couldn't find the study referred to in the article. There were references listed, but they're not much help. One is a study I reviewed a couple of weeks ago, which used an oversimplified model of team talent. It didn't deal with the probability that the best team wins.
In the other Ben-Naim study referred to in the article, the author again uses his oversimplified model (in which every game is won by the underdog with fixed probability, regardless of the actual relative talent of the two teams). He argues that if you want to ensure that the best team wins the most games, you need to play a huge number of games (on the order of N cubed, where N is the number of teams). However, he finds, you can reduce the number of games considerably if you play a series of round-robins, where each round-robin eliminates a certain percentage of the teams remaining.
It's an interesting study, using only analytical math (no empirical observations or simulations). But it doesn't address the question of how often the best team wins, at least as far as I could tell.
So where does the 30% come from? Either Ben-Naim did a one-off calculation for the reporter, or there's a forthcoming study.
However, there are at least two existing sabermetric studies addressing this question, studies that Ben-Naim does not reference in his two articles.
In the "1989 Baseball Abstract" (by Brock Hanke and Rob Wood, self-published), there's a guest article by Bill James entitled "How Often Does the Best Team Win the Pennant?" James simulated 2,000 seasons and found that the best team in baseball won its division 72% of the time. (That's back in olden times when there were only four divisions.)
As a rough approximation, if the best team wins its division 72% of the time, it should finish ahead of *all four* divisions' worth of teams with probability .72 to the power of 4 (treating each division as a separate, independent hurdle). That's 27%, impressively consistent with Ben-Naim's 30% figure.
And in a BTN article in May, 2000 (see page 15), Rob Wood found that, in a 12-team league with a .060 standard deviation of talent, the best team wins the pennant 52% of the time. If you assume that the best team would also have won the *other* league 52% of the time, the chance is 27% (.52 squared) that the best team wins the most games in both leagues. Again very consistent.
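For anyone who wants to check the arithmetic, here is a minimal sketch of the two back-of-the-envelope estimates. The variable names are mine, and the independence assumption (each division or league is a separate hurdle) is the same rough approximation used in the text, not something either study claims.

```python
# Rough independence approximation: the best team has to "win" every
# division (James) or every league (Wood) to have the most wins overall.
james_division = 0.72   # best team wins its division, four divisions
wood_league = 0.52      # best team wins a 12-team league, two leagues

james_overall = james_division ** 4
wood_overall = wood_league ** 2

print(f"James-based estimate: {james_overall:.3f}")
print(f"Wood-based estimate:  {wood_overall:.3f}")
```

Both come out at about 27%, as stated above.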
Ben-Naim is also quoted as saying that if you want to raise the probability from 30% up to 90%, it takes a huge number of games -- 15,000 per team.
To check that, I ran a simulation. I assigned random talents to the 30 teams, from a normal distribution with mean .500 and standard deviation .060 (about 9.7 games over 162). I then compared the top two teams and figured the binomial probability that the top team would beat the second after 15,000 games. I repeated this for 10,000 seasons.
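A simulation along those lines might look something like the sketch below. This is not the original code. The function names, the seed, and the use of a normal approximation in place of an exact binomial calculation are all my assumptions, chosen to keep the example short and dependent only on the standard library.

```python
import random
from statistics import NormalDist

def prob_best_beats_second(p1, p2, games):
    """Approximate chance that a team of talent p1 finishes ahead of a team
    of talent p2 over `games` independent games each.
    Assumption: normal approximation to the binomial, not an exact calculation."""
    var = (p1 * (1 - p1) + p2 * (1 - p2)) / games  # variance of the gap in winning pct
    return NormalDist().cdf((p1 - p2) / var ** 0.5)

def simulate(seasons=10_000, teams=30, games=15_000, sd=0.060, seed=1):
    """Average, over many random leagues, of the chance that the most
    talented team beats the second most talented team."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(seasons):
        # Draw team talents, then keep only the top two, as described in the text.
        talents = sorted((rng.gauss(0.5, sd) for _ in range(teams)), reverse=True)
        total += prob_best_beats_second(talents[0], talents[1], games)
    return total / seasons

if __name__ == "__main__":
    print(f"15,000 games: {simulate():.3f}")
    print(f"   162 games: {simulate(games=162):.3f}")
```

Changing `games` or `sd` makes it easy to see how sensitive the answer is to each assumption.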
(Whether or not the best team wins depends, mostly, on the difference in talent between the best two teams. That's because 15,000 games is enough time for the SD of the luck separating them to shrink to about 1 win per 162 games. If the best team is, say, a 97-win talent while the second-best team is only a 94-win talent, the 94 would have to make up a gap of more than 3 SDs to beat the 97, and the chance of that is effectively zero.
However, if the best team is only fractionally better than the second-best – say, 96.2 to 96.1 – that's only one-tenth of an SD, and the best team has almost a 50% chance of losing.
So, most of the time, the best team wins easily. But a small fraction of the time, the runner-up in talent is good enough to give the best team a run for its money.)
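To make the "about 1 win per 162 games" figure concrete, here is a small sketch of the calculation. It assumes each team's luck variance is that of a .500 team (0.25 per game, which is close enough for any realistic contender) and uses a normal approximation; the names `sd_diff_wins` and `upset_chance` are mine.

```python
from math import sqrt
from statistics import NormalDist

GAMES = 15_000
SEASON = 162

# Luck SD of one team's winning percentage over GAMES games,
# assuming roughly .500 talent (variance 0.25 per game).
sd_pct_one_team = sqrt(0.25 / GAMES)

# Luck SD of the *difference* between two independent teams,
# rescaled to wins per 162 games. This is a little under 1 win.
sd_diff_wins = sqrt(2) * sd_pct_one_team * SEASON

def upset_chance(best_wins, second_wins):
    """Chance the second-best team finishes ahead, with talents given
    as expected wins per 162 games."""
    z = (best_wins - second_wins) / sd_diff_wins
    return NormalDist().cdf(-z)

print(f"luck SD of the gap: {sd_diff_wins:.2f} wins per 162")
print(f"97 vs 94:     {upset_chance(97, 94):.4f}")
print(f"96.2 vs 96.1: {upset_chance(96.2, 96.1):.3f}")
```

The first matchup comes out well under 1%, and the second only a few points short of a coin flip, which is the pattern described above.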
As it turned out, over 15,000 games, the best team won 93% of the time, again consistent with Ben-Naim's claims. You can't get closer than "consistent" because my simulation was oversimplified. Specifically:
-- I didn't play teams against each other; rather, I played 30 independent binomial schedules. This reduces luck (since "upsets" affect only one team instead of two), and would tend to inflate my percentage.
-- I considered only the top two teams in talent. It's possible that the number 3 team might also be close to the others, and manage to beat them out. Ignoring that third team would also tend to inflate my percentage.
-- I used a standard deviation of .060. This is the actual SD of team talent. However, the distribution is tighter on the top end of teams than the bottom (as Bill James pointed out in the Blue Jays comment of the 1984 Abstract). Perhaps I should have used .070 for the bottom teams, and .050 for the top teams, or something. Again, my decision not to do that inflated the observed percentage. (Quick check: dropping the SD to .050 does reduce the frequency, but only to 92%.)
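The second caveat is easy to relax in a sketch. The version below gives every one of the 30 teams its own simulated record and asks whether the most talented team ends up with the most wins. It still uses independent schedules (so the first caveat remains), and it draws each team's win total from a normal approximation to the binomial. The function name and defaults are hypothetical, and this is not the simulation reported above.

```python
import random
from math import sqrt

def share_best_team_finishes_first(seasons=2_000, teams=30, games=15_000,
                                   sd=0.060, seed=1):
    """Fraction of simulated seasons in which the most talented team
    also has the most wins, with all teams in the running."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(seasons):
        talents = [rng.gauss(0.5, sd) for _ in range(teams)]
        # Assumption: independent schedules, wins drawn from a normal
        # approximation to Binomial(games, p).
        wins = [rng.gauss(games * p, sqrt(games * p * (1 - p))) for p in talents]
        best_talent = max(range(teams), key=talents.__getitem__)
        most_wins = max(range(teams), key=wins.__getitem__)
        hits += best_talent == most_wins
    return hits / seasons

if __name__ == "__main__":
    print(f"15,000 games: {share_best_team_finishes_first():.3f}")
    print(f"   162 games: {share_best_team_finishes_first(games=162):.3f}")
```

Setting `games=162` gives a rough counterpart to the regular-season question at the top of the post, under the same simplifying assumptions.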
So, overall, Ben-Naim's numbers look reasonable. I look forward to seeing his methods when his paper becomes available.
(Hat Tip: Tangotiger)
UPDATE: The 30% was indeed a one-off calculation, so there is no forthcoming paper. Ben-Naim kindly visited to let us know. See the first comment.