Sabermetric Research: An oversimplified competitive balance study

A study in last November's JQAS, called "Parity and Predictability of Competitions," tries to figure out which sports give the underdog the best chance of winning a given game.

The authors, Eli Ben-Naim et al., start with a simplified model. Given any two teams, the worse team has a probability "q" of beating the better team. The variable q is, by definition, no greater than 1/2, but it doesn't vary with the skills of the teams. So a .200 team has the same probability of defeating a .770 team as a .480 team has of beating a .520 team.
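To see what a fixed q implies for the standings, here is a minimal Python sketch. It assumes a strict ranking of teams and a balanced schedule in which every team plays every other team equally often. Both are simplifications made for illustration, not details taken from the paper, and the function name `expected_win_pcts` is hypothetical.

```python
def expected_win_pcts(n_teams, q):
    """Expected winning percentage of each team, listed best to worst.

    Assumptions (not from the paper): teams are strictly ranked, the
    schedule is balanced, and the worse team wins any game with
    probability q regardless of how far apart the two teams are.
    """
    pcts = []
    for rank in range(n_teams):        # rank 0 is the best team
        better = rank                  # opponents ranked above this team
        worse = n_teams - 1 - rank     # opponents ranked below this team
        pcts.append((worse * (1 - q) + better * q) / (n_teams - 1))
    return pcts


# A 30-team league with the q the study reports for MLB
mlb_like = expected_win_pcts(30, 0.413)
print(mlb_like[0], mlb_like[-1])
```

Under these assumptions the best team's expected winning percentage is 1 - q and the worst team's is q, so a smaller q spreads the expected standings further apart.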

Given that model, the authors then find the value of q that makes the model come closest to the historical distribution of winning percentages in each sport. The results are:

.459 FA soccer
.413 MLB baseball
.383 NHL hockey
.316 NBA basketball
.309 NFL football

Their conclusion: soccer is the most competitive sport, while NFL football is the least competitive.

To check these predictions against the empirical results, the study looks at actual historical game results to see if the underdog winning percentages match the ones above. But which team is the underdog? Instead of choosing the underdog based on the full season's eventual record, the authors choose it based on the season's record to date. That means that on the third day of the season, if the Yankees are 1-1 but the Devil Rays are 2-0, the Rays are the favorite. Using that process, the authors find that the model underestimates the chance of the underdog winning. For MLB, to take one example, the actual upset frequency is .443, significantly higher than the .413 predicted.
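The record-to-date bookkeeping described above can be sketched roughly as follows. The game-tuple format, the function name, and the choice to skip games where either team has no record yet or where the two records are tied are all assumptions made here; the post does not say how the paper handles those cases.

```python
from collections import defaultdict


def upset_frequency(games):
    """Share of games won by the team with the worse record to date.

    games: iterable of (team_a, team_b, winner) in chronological order.
    Games are skipped (an assumption, not the paper's stated rule) when
    either team has not played yet or the two records are tied.
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    upsets = 0
    counted = 0
    for team_a, team_b, winner in games:
        if played[team_a] and played[team_b]:
            pct_a = wins[team_a] / played[team_a]
            pct_b = wins[team_b] / played[team_b]
            if pct_a != pct_b:
                underdog = team_a if pct_a < pct_b else team_b
                counted += 1
                if winner == underdog:
                    upsets += 1
        # Update the records only after the game has been classified
        wins[winner] += 1
        played[team_a] += 1
        played[team_b] += 1
    return upsets / counted if counted else float("nan")


# Tiny made-up schedule, for illustration only
sample = [
    ("NYY", "BOS", "NYY"),
    ("TB", "BAL", "TB"),
    ("NYY", "BOS", "BOS"),
    ("TB", "BAL", "TB"),
    ("NYY", "TB", "NYY"),   # 1-1 Yankees beat the 2-0 Rays: an upset
]
print(upset_frequency(sample))
```

In the made-up schedule, the last game is the post's own example: the 1-1 Yankees count as the underdog against the 2-0 Rays, even though nobody would really pick it that way.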

The authors note this systematic bias, but they don't explain it. They do mention, in the same paragraph, that there have been changes in "q" over the eras, but it's unclear whether they think that explains the bias or not.

My feeling is that the bias is caused by the oversimplified model. First, q is not the same for all games. Second, the study's method of choosing the underdog in early-season games adds randomness and causes the "actual" number to be too high. Finally, home field advantage has a large effect on which team is actually the underdog, and that, too, would make the "actual" numbers turn out too high.
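As a rough illustration of the second point, suppose the model's q were exactly right but the record-to-date rule picked the wrong team as the underdog in some fraction m of games. That setup, the symbol m, and both function names are assumptions made for this sketch, not anything the paper estimates.

```python
def observed_upset_rate(q, m):
    """Upset rate you would measure if a fraction m of games had the
    underdog and favorite labels swapped (toy assumption)."""
    return (1 - m) * q + m * (1 - q)


def implied_mislabel_rate(q, observed):
    """Solve observed = (1 - m) * q + m * (1 - q) for m (requires q != 0.5)."""
    return (observed - q) / (1 - 2 * q)


# MLB figures quoted in the post: .413 predicted, .443 observed
m = implied_mislabel_rate(0.413, 0.443)
print(round(m, 3))
```

With the MLB figures, this toy calculation says that mislabeling roughly one game in six would be enough on its own to turn .413 into .443. Varying q and home field advantage could account for part of the gap instead, so the number should not be read as an estimate.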

The authors briefly address all three issues. They say that if they look at only games in the second half of the season, "remarkably," this changes the upset frequency by less than .007. (I agree that this is remarkable; I would have expected a much bigger difference.) Also, when they ignore all games where the teams' winning percentages are less than .050 apart, the upset frequency changes by "less than .005." Again, I would have expected a larger difference. Finally, they mention home field advantage but make no predictions about its effects.

In a subsequent section of the paper, the authors run the same calculations on teams' all-time records, calculating q so that the theoretical all-time winning percentages match those actually observed. I'm not sure this is a good idea at all. Team X might have a higher 50-year winning percentage than team Y, but that doesn't tell you which was the underdog when X and Y played each other in May of 1953. But the authors nonetheless conclude that

"The fact that similar trends for the upset frequency emerge from game records as do from all-time team records indicate that the relative strengths of clubs have not changed considerably over the past century."

That doesn't make much sense to me.

In any case, there are better methods for figuring out the relative single-game competitiveness levels of different sports. One much easier way, which I suggested in my review of "The Wages of Wins," is to simply look at home field advantage. Start with the assumption that HFA is, in some physical sense, the same for every sport. (As Tango once suggested in a comment somewhere that I can't find, maybe every athlete performs 1% better at home, and the rules of the sport turn that 1% into an increased chance of victory.) If you look at a very large number of games, team skill evens out, so that on average the home and visiting teams differ only by the home field advantage. In that case, the more likely the home team is to win, the more likely the *better* team is to win ("better" by an average of the HFA). This method avoids all the problems of this study: failing to account for HFA, failing to account for different team skills, and failing to accurately note which team is actually better.

Indeed, the home field advantage in the NBA is much higher than in MLB. I'd bet that if you ranked all the sports, the results would look roughly the same as the chart above.

Another method is to figure out the numbers directly. Start with Tango's method to figure out the league's talent distribution. For instance, in MLB, the SD of actual team wins is about 11.6 games. The theoretical SD due to luck is 6.3 games. Therefore, the SD of team talent is 9.7 games (the square root of the difference between 11.6 squared and 6.3 squared).
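The arithmetic in that paragraph can be sketched in a few lines of Python. The sketch assumes the luck SD is the binomial SD for a .500 team and that observed variance is the sum of talent variance and luck variance, which is my reading of the method as described here; the function names are hypothetical.

```python
import math


def luck_sd_games(n_games, p=0.5):
    """Binomial SD of wins for a team whose chance of winning every game is p."""
    return math.sqrt(n_games * p * (1 - p))


def talent_sd(observed_sd, luck_sd):
    """SD of talent, assuming observed variance = talent variance + luck variance."""
    return math.sqrt(observed_sd ** 2 - luck_sd ** 2)


mlb_luck = luck_sd_games(162)          # close to the post's 6.3 games
mlb_talent = talent_sd(11.6, 6.3)      # using the post's own inputs
print(mlb_luck, mlb_talent)
```

The binomial figure comes out a shade under 6.4 games, and the talent SD from the post's inputs is a little over 9.7 games.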

That means that the SD of the talent difference between two randomly selected teams is about 14 games (9.7 times the square root of 2). Over 162 games, that's about .085 in winning percentage. So the underdog team should win about 42% of games (.500 minus .085 is .415). (You can use log5 to be a bit more accurate if you like, but I think it's still about 42%.)
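Here is a minimal sketch of that last step, with and without log5. Treating one SD of the talent difference as the "typical" gap follows the paragraph above; centering the two teams symmetrically around .500 for the log5 version is an assumption added here, and the function name is hypothetical.

```python
import math


def underdog_win_prob(talent_sd_games, n_games=162, use_log5=False):
    """Underdog's single-game win probability for a typical talent gap.

    The gap is taken to be one SD of the difference between two randomly
    chosen teams, converted to winning percentage.
    """
    gap = talent_sd_games * math.sqrt(2) / n_games
    if not use_log5:
        return 0.5 - gap
    # Assumption: the two teams sit symmetrically around .500
    fav = 0.5 + gap / 2
    dog = 0.5 - gap / 2
    return dog * (1 - fav) / (dog * (1 - fav) + fav * (1 - dog))


simple = underdog_win_prob(9.7)
with_log5 = underdog_win_prob(9.7, use_log5=True)
print(simple, with_log5)
```

For MLB's 9.7 games, both versions land within about a thousandth of .415, which is why the log5 refinement barely matters here. The same function can be run for other leagues by passing their talent SD and season length.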

Again, if you repeat this for all five sports, I'd be willing to bet you'd get results similar to the chart found in this JQAS study: similar, but without the flaws resulting from the oversimplified method.

Labels: competitive balance, distribution of talent

6 Comments:

At Tuesday, September 25, 2007 11:10:00 AM, Anonymous said...: I really like the way Tango presented his data. Here are the season lengths that would be equivalent in determining the talent level of teams:

MLB: 162
NHL: 84
NBA: 33
NFL: 28

Those line up pretty well with the other results. It's interesting that compared with MLB and the NHL, the NBA only needs to play 40% of its current season and the NFL should play 75% more games. The NBA champion, outside of injuries and one-off matchup challenges, should pretty much always be the best team.
At Tuesday, September 25, 2007 11:13:00 AM, Phil Birnbaum said...: I always thought the NBA should just reduce its games from 48 minutes to 24 minutes, and play only doubleheaders (so the fans get their money's worth).

But cutting the season by 3/5 works OK for me too ... I'm not a big basketball fan. :)
At Tuesday, September 25, 2007 1:28:00 PM, Tangotiger said...: http://www.insidethebook.com/ee/index.php/site/comments/true_talent_levels_for_sports_leagues/

Team talent of MLB is 1 SD = .060. As Phil points out, taking two teams means the difference in talent level will have 1 SD = sqrt(.060^2+.060^2) = .060 * sqrt(2) = .085. And a gap of .085 in talent level means that the underdog's chance of winning (q) will be .415 per game.

Continuing with this process:
NHL 1 SD = .083, q=.383
NBA 1 SD = .134, q=.310
NFL 1 SD = .143, q=.298

Phil showed the numbers as:
.413 MLB baseball
.383 NHL hockey
.316 NBA basketball
.309 NFL football

That's what I call a bullseye. Occam's razor.
At Tuesday, September 25, 2007 1:29:00 PM, Tangotiger said...: The method
At Tuesday, September 25, 2007 6:11:00 PM, Unknown said...: Also, the talent level for soccer varies a lot. If my memory serves me, MLS is very competitive; European soccer isn't, in any way, shape or form.
At Wednesday, January 26, 2011 11:04:00 AM, Phil Birnbaum said...: I know it's three years later, but, Tango, nice job.

Tuesday, September 25, 2007
