Thursday, April 21, 2016

Noll-Scully doesn't measure anything real

The most-used measure of competitive balance in sports is the "Noll-Scully" measure. To calculate it, you figure the standard deviation (SD) of the winning percentage of all the teams in the league. Then, you divide by what the SD would be if all teams were of equal talent, and the results were all due to luck.

The bigger the number, the less parity in the league.

For a typical, recent baseball season, you'll find the SD of team winning percentage is around .068 (that's 11 wins out of 162 games). By the binomial approximation to normal, the SD due to luck is .039 (6.4 out of 162). So, the Noll-Scully measure works out to .068/.039, which is around 1.74.

In other words: the spread of team winning percentage in baseball is 1.74 times as high as if every team were equal in talent.
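As a rough sketch of the arithmetic above, here is the ratio computed in Python. The function names are my own, chosen for illustration, and the luck SD uses the fair-coin formula sqrt(0.25 / games) rather than the rounded .039.

```python
import math

def luck_sd(games):
    """SD of winning percentage if every game were a fair coin flip."""
    return math.sqrt(0.25 / games)

def noll_scully(observed_sd, games):
    """Observed SD of winning percentage, as a multiple of the luck SD."""
    return observed_sd / luck_sd(games)

mlb_luck = luck_sd(162)             # about .039
mlb_ratio = noll_scully(0.068, 162)
print(f"luck SD = {mlb_luck:.3f}, Noll-Scully = {mlb_ratio:.2f}")
```

With the unrounded luck SD the ratio comes out near 1.73 rather than 1.74; the small gap is just rounding.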


Both "The Wages of Wins" (TWOW) and a paper on competitive balance I read recently (which I hope to post about soon) independently use Noll-Scully to compare different sports. And it's not just them; a lot of academic research on the subject does the same.

The Wages of Wins (page 70 of the second edition) runs this chart:

2.84 NBA
1.81 AL (MLB)
1.67 NL (MLB)
1.71 NHL
1.48 NFL

The authors follow up by speculating on why the NBA's figure is so high, why the league is so unbalanced. They discuss their "short supply of tall people" hypothesis, as well as other issues.

But one thing they don't talk about is the length of the season. In fact, their book (and almost every other academic paper I've seen on the subject) claims that Noll-Scully controls for season length. 

Their logic goes something like this: (a) The Noll-Scully measure is actually a multiple of the theoretical SD of luck. (b) That theoretical SD *does* depend on season length. (c) Therefore, you're comparing the league to what it would be with the same season length, which means you're controlling for it.

But ... that's not right. Yes, dividing by the theoretical SD *does* control for season length, but not completely.


Let's go back to the MLB case. We had

.068 observed SD
.039 theoretical luck SD
1.74 Noll-Scully ratio

Using the fact that SDs follow a pythagorean relationship, it follows that

observed SD squared = theoretical luck SD squared + talent SD squared


.068 squared = .039 squared + (talent SD) squared

Solving, we find that the SD of talent = .056. Let's write that this way:

.039 theoretical luck SD
.056 talent SD
.068 observed SD
1.74 Noll-Scully (.068 divided by .039)
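The same solve, as a minimal Python sketch. The function name is hypothetical; the inputs are the rounded figures from the table above.

```python
import math

def talent_sd(observed_sd, luck_sd):
    """Back out the talent SD, assuming observed^2 = luck^2 + talent^2."""
    talent_var = observed_sd ** 2 - luck_sd ** 2
    if talent_var < 0:
        # Can happen in short or unusually lucky samples; treat as "no detectable talent spread".
        return 0.0
    return math.sqrt(talent_var)

mlb_talent = talent_sd(0.068, 0.039)
print(f"talent SD = {mlb_talent:.3f}")
```

The guard for a negative difference is my own addition; the post's numbers never hit it.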

Now, a hypothetical. Suppose MLB had decided to play a season four times as long: 648 games instead of 162. If that happened, the theoretical luck SD would be cut in half (we'd divide by the square root of 4). So, the luck SD would be .020.

The talent SD would remain constant at .056. The new observed SD would be the square root of (.020 squared plus .056 squared), which works out to .059:

.020 theoretical luck SD
.056 talent SD
.059 observed SD
2.95 Noll-Scully (.059 divided by .020)

Under this scenario, the Noll-Scully increases from 1.74 to 2.95. But nothing has changed about the game of baseball, or the short supply of home run hitters, or the relative stinginess of owners, or the populations of the cities where the teams play. All that changed was the season length.


My only point here, for now, is that Noll-Scully does NOT properly control for season length. Any discussion of why one sport has a higher Noll-Scully than another *must* include a consideration of the length of the season. Generally, the longer the season, the higher the Noll-Scully. (Try a Noll-Scully calculation early in the season, like today, and you'll get a very low number. That's because after only 15 games, luck is huge, so talent is small compared to luck.)
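To see how strongly the ratio depends on schedule length, here is a small sketch that holds the talent SD fixed at .056 and varies only the number of games. The function names are made up for illustration. Since observed squared = luck squared + talent squared, the ratio reduces to sqrt(1 + 4 * games * talent squared), which can only grow as games are added.

```python
import math

TALENT_SD = 0.056  # talent SD from the MLB example in the post

def luck_sd(games):
    """SD of winning percentage if every game were a fair coin flip."""
    return math.sqrt(0.25 / games)

def noll_scully_for_length(games, talent_sd=TALENT_SD):
    """Noll-Scully ratio implied by a fixed talent SD and a given season length."""
    luck = luck_sd(games)
    observed = math.sqrt(luck ** 2 + talent_sd ** 2)
    return observed / luck

ratios = {games: noll_scully_for_length(games) for games in (15, 162, 648)}
for games, ratio in ratios.items():
    print(f"{games:4d} games: Noll-Scully = {ratio:.2f}")
```

This gives roughly 1.09 after 15 games, 1.74 after 162, and 3.02 after 648. The 648-game figure differs a bit from the 2.95 above only because that calculation rounded the luck SD to .020 first.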

It's not like there's no alternative. We just showed one! Instead of Noll-Scully, why not just calculate the "talent SD" as above? That estimate *doesn't* depend on season length, and it still measures what academic authors are looking for.

Tango did this in a famous post in 2006. He got

.060 MLB
.058 NHL
.134 NBA

If you repeat Tango's logic for different season lengths, you should get roughly the same numbers. The results will differ a little because of random variation ... but they should average somewhere close to those figures.
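One way to check that claim is a quick simulation. This is only a sketch under assumptions of my own, none of which come from Tango's post: 30 teams, talent drawn from a normal distribution with SD .056, every game an independent coin flip at the team's talent level (teams don't actually play each other), and 200 simulated seasons averaged together.

```python
import math
import random
import statistics

def estimated_talent_sd(games, true_talent_sd=0.056, teams=30, seasons=200, seed=0):
    """Simulate leagues, then recover talent SD as sqrt(observed var - luck var)."""
    rng = random.Random(seed)
    luck_var = 0.25 / games
    observed_vars = []
    for _ in range(seasons):
        win_pcts = []
        for _ in range(teams):
            # Assumed talent distribution: normal around .500, clamped to [0, 1].
            p = min(1.0, max(0.0, rng.gauss(0.5, true_talent_sd)))
            wins = sum(rng.random() < p for _ in range(games))
            win_pcts.append(wins / games)
        observed_vars.append(statistics.variance(win_pcts))
    talent_var = statistics.mean(observed_vars) - luck_var
    return math.sqrt(max(0.0, talent_var))

estimates = {games: estimated_talent_sd(games) for games in (82, 162, 648)}
for games, sd in estimates.items():
    print(f"{games:4d} games: estimated talent SD = {sd:.3f}")
```

The estimates should all land near .056 whatever the season length, even though the Noll-Scully ratio for those same simulated leagues would differ a lot.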


Now, you could argue ... well, sometimes you *do* want to control for season length. Perhaps one of the reasons the best teams dominate the standings is because NBA management wanted it that way ... so they chose a longer, 82-game season, in order to create some space between the Warriors and the other teams. Furthermore, maybe the NFL deliberately chose 16 games partly to give the weaker teams a chance.

Sure, that's fine. But you don't want to use Noll-Scully there either, because Noll-Scully still *partially* adjusts for season length, by using "luck multiple" as its unit. Either you want to consider season length, or you don't, right? Why would you only *partially* want to adjust for season length? And why that particular part?

If you want to consider season length, just use the actual SD of the standings. If you don't, then use the estimated SD of talent, from the pythagorean calculation. 

Either way, Noll-Scully doesn't measure anything anybody really wants.



At Thursday, April 21, 2016 5:32:00 PM, Anonymous Sobchak said...

Noll-Scully does directly map to some more easily interpretable competitive balance metrics (that take season length into account) like Tango's "regress-halfway-pct-season" (the percent of the way into the season you have to get where the standings are 50/50 talent/luck) and dcj's "better-team-better-record-pct" (the percent of the time the more talented of two teams finishes with the better record) - see

At Thursday, April 21, 2016 5:41:00 PM, Blogger Phil Birnbaum said...

Cool stuff! I will try to digest it and update the post. Thanks!

At Thursday, April 21, 2016 5:48:00 PM, Anonymous Anonymous said...

Say for example that the MLB season consisted of a single game, so after this game, you have 15 teams with 1-0 records and 15 teams with 0-1 records. Thus, your observed SD is 0.5. However, your theoretical luck SD is also 0.5 (sqrt(0.5 * 0.5) divided by 1 game, as per Tango's post). From the pythagorean formula, don't you then get talent SD equalling 0?

At Thursday, April 21, 2016 6:05:00 PM, Blogger Phil Birnbaum said...

The theoretical luck SD is by binomial approximation to normal, which applies only to larger numbers of games. One game isn't enough for the formula to be accurate enough, which is why it doesn't work for a single game. It's probably close enough for 15 games, though still not exact.

At Tuesday, April 26, 2016 1:10:00 PM, Anonymous Anonymous said...

Can you clarify what you mean by "theoretical luck SD is by binomial approximation to normal"? Binomial approximation to normal is used for e.g. estimating the probability of e.g. winning exactly X games out of 162 or at least X games out of 162, but is not needed to calc SDs.

My understanding is that what you call "theoretical luck SD" is the same as the SD of the percentage of heads you get by flipping a fair coin n times, so is exactly equal to sqrt(0.5 * 0.5 / n) - no approximation needed.

But the larger issue I want to ask about is that I don't understand why talent SD doesn't depend on the number of games played, which is implicitly assumed in your post. The fact that observed SD > theoretical luck SD after 162 games shows that talent SD after 162 games is nonzero. However, the example I mentioned highlights that observed SD = theoretical luck SD after 1 game, so at this point talent SD is 0 - i.e., talent SD must be increasing with the number of games played.

At Tuesday, April 26, 2016 1:32:00 PM, Anonymous Anonymous said...

I should add, though - if you do believe that talent SD increases with the number of games played, the main hypothesis of your post is still true, but even more so - i.e., Noll-Scully increases even more with the length of the season than if you assumed talent SD didn't depend on the number of games played. Still, that would make the disparity between NBA and MLB even more extreme after controlling for season length, considering there are nearly twice as many games in the MLB season.

At Tuesday, April 26, 2016 4:54:00 PM, Blogger Phil Birnbaum said...


Yes, "theoretical luck SD" is the same as the SD of the percentage of heads you get by flipping a fair coin n times. However, that is not EXACTLY equal to sqrt(0.5 * 0.5 / n). It is APPROXIMATELY equal to that. It only becomes "exactly" equal to that as n goes to infinity.

After one game, it's not even approximately equal to that, which is why the logic fails.

The SD of talent is a constant, independent of games played. If you have a biased coin that is expected to land heads 55% of the time, its "talent" is 55% regardless of whether you flip it 16 times or 162 times.

At Wednesday, April 27, 2016 1:54:00 PM, Anonymous Anonymous said...

With your example, I agree that talent is 55% regardless of how many games are played. But by the same token, luck is 50% regardless of how many games are played. I'm raising a question not about the constancy of expected values, but rather the constancy of standard deviations.

I agree that the standard deviation of luck goes down with the number of games played. But I don't understand why it's acceptable to assume that the standard deviation of talent is constant regardless of the number of games played. For example, you can have standard deviation of talent being non-constant, but as long as it decreases faster than the standard deviation of luck, you will still have talent dominate luck in the long run.

At Wednesday, April 27, 2016 2:32:00 PM, Anonymous Anonymous said...

To make this more concrete - take the 2015 MLB season as an example.

After 81 games, the observed SD of winning percentage is 0.0639, and the theoretical luck SD is sqrt(0.25 / 81) = 0.0556. Thus, the talent SD is approximately sqrt(0.0639^2 - 0.0556^2) = 0.0315.

After 162 games, the observed SD of winning percentage is 0.0645, and the theoretical luck SD is sqrt(0.26/162) = 0.0392. Thus, the talent SD is approximately sqrt(0.0645^2 - 0.0392^2) = 0.0512.

So empirically, talent SD is NOT constant, and in fact increases with the number of games played.

At Wednesday, April 27, 2016 2:36:00 PM, Anonymous Anonymous said...

Sorry, in the 162-game example, I meant to write "theoretical luck SD is sqrt(0.25 / 162)".

At Thursday, April 28, 2016 1:58:00 PM, Blogger Zach said...

Wouldn't talent observed be different than theoretical talent based on injuries (and returning from injuries)?

At Friday, April 29, 2016 12:06:00 PM, Anonymous Anonymous said...

talent is always theoretical; injuries or the lack thereof cause changes to theoretical talent.

At Friday, May 13, 2016 3:17:00 PM, Blogger Zach said...

For any point in time, but historically certain players played. They had a certain amount of talent. Team talent should always have some level of bench/org talent factored in, right?

