Thursday, April 21, 2016

Noll-Scully doesn't measure anything real

The most-used measure of competitive balance in sports is the "Noll-Scully" measure. To calculate it, you figure the standard deviation (SD) of the winning percentage of all the teams in the league. Then, you divide by what the SD would be if all teams were of equal talent, and the results were all due to luck.

The bigger the number, the less parity in the league.

For a typical, recent baseball season, you'll find the SD of team winning percentage is around .068 (that's 11 wins out of 162 games). By the binomial approximation to normal, the SD due to luck is .039 (6.4 out of 162). So, the Noll-Scully measure works out to .068/.039, which is around 1.74.

In other words: the spread of team winning percentage in baseball is 1.74 times as high as if every team were equal in talent.
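
Here's a minimal sketch of that arithmetic in Python, for anyone who wants to check it. The function names are my own (they're not from any library), and the luck SD assumes every game is a 50/50 coin flip.

```python
import math

def luck_sd(games, p=0.5):
    """SD of winning percentage if every game were a coin flip (binomial)."""
    return math.sqrt(p * (1 - p) / games)

def noll_scully(observed_sd, games):
    """Observed SD of winning percentage divided by the all-luck SD."""
    return observed_sd / luck_sd(games)

mlb_luck = luck_sd(162)               # about .039
mlb_luck_wins = mlb_luck * 162        # about 6.4 wins
mlb_ratio = noll_scully(0.068, 162)   # observed SD of .068 is taken from the text

print(round(mlb_luck, 3), round(mlb_luck_wins, 1), round(mlb_ratio, 2))
```

If you carry the unrounded luck SD (.0393) through, the ratio comes out closer to 1.73; the 1.74 above is what you get dividing by the rounded .039. Either way, it's the same ballpark.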

------

Both "The Wages of Wins" (TWOW) and a paper on competitive balance I read recently (which I hope to post about soon) use Noll-Scully to compare different sports. They're not alone: a lot of academic research on the subject does the same.

The Wages of Wins (page 70 of the second edition) runs this chart:

2.84 NBA
1.81 AL (MLB)
1.67 NL (MLB)
1.71 NHL
1.48 NFL

The authors follow up by speculating on why the NBA's figure is so high, why the league is so unbalanced. They discuss their "short supply of tall people" hypothesis, as well as other issues.

But one thing they don't talk about is the length of the season. In fact, their book (and almost every other academic paper I've seen on the subject) claims that Noll-Scully controls for season length. 

Their logic goes something like this: (a) The Noll-Scully measure is actually a multiple of the theoretical SD of luck. (b) That theoretical SD *does* depend on season length. (c) Therefore, you're comparing the league to what it would be with the same season length, which means you're controlling for it.

But ... that's not right. Yes, dividing by the theoretical SD *does* control for season length, but not completely.

------

Let's go back to the MLB case. We had

.068 observed SD
.039 theoretical luck SD
-------------------------
1.74 Noll-Scully ratio

Using the fact that the SDs of independent effects (here, luck and talent) follow a pythagorean relationship, it follows that

observed SD squared = theoretical luck SD squared + talent SD squared

So

.068 squared = .039 luck squared + talent squared

Solving, we find that the SD of talent = .056. Let's write that this way:

.039 theoretical luck SD
.056 talent SD
------------------------
.068 observed SD
---------------------------------------
1.74 Noll-Scully (.068 divided by .039)
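
As a quick check of that breakdown, here's a small Python sketch. The helper name is hypothetical, and it assumes luck and talent are independent, so that their variances add.

```python
import math

def talent_sd(observed_sd, luck_sd):
    """Back out the talent SD: observed^2 = luck^2 + talent^2."""
    return math.sqrt(observed_sd ** 2 - luck_sd ** 2)

# Rounded figures from the text: .068 observed, .039 luck
mlb_talent = talent_sd(0.068, 0.039)
print(round(mlb_talent, 3))
```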

Now, a hypothetical. Suppose MLB had decided to play a season four times as long: 648 games instead of 162. If that happened, the theoretical luck SD would be cut in half (we'd divide by the square root of 4). So, the luck SD would be about .020.

The talent SD would remain constant at .056. The new observed SD would be the square root of (.020 squared plus .056 squared), which works out to .059:

.020 theoretical luck SD
.056 talent SD
-------------------------
.059 observed SD
---------------------------------------
2.95 Noll-Scully (.059 divided by .020)

Under this scenario, the Noll-Scully increases from 1.74 to 2.95. But nothing has changed about the game of baseball, or the short supply of home run hitters, or the relative stinginess of owners, or the populations of the cities where the teams play. All that changed was the season length.
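
The hypothetical can be sketched in a few lines of Python. The function name and its arguments are mine, purely for illustration; it assumes coin-flip luck and a talent SD that doesn't change when the season gets longer.

```python
import math

def stretch_season(observed_sd, games, factor):
    """Return (new observed SD, new Noll-Scully) if the season were
    `factor` times as long, holding the talent SD fixed."""
    luck = math.sqrt(0.25 / games)                    # binomial luck SD
    talent = math.sqrt(observed_sd ** 2 - luck ** 2)  # pythagorean step
    new_luck = luck / math.sqrt(factor)               # luck shrinks with sqrt(length)
    new_observed = math.sqrt(new_luck ** 2 + talent ** 2)
    return new_observed, new_observed / new_luck

new_observed, new_ratio = stretch_season(0.068, 162, 4)
print(round(new_observed, 3), round(new_ratio, 2))
```

Because this version doesn't round the intermediate numbers, the ratio comes out nearer 3.0 than 2.95. The point is the same: the ratio jumps even though nothing about the teams changed.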

--------

My only point here, for now, is that Noll-Scully does NOT properly control for season length. Any discussion of why one sport has a higher Noll-Scully than another *must* include a consideration of the length of the season. Generally, the longer the season, the higher the Noll-Scully. (Try a Noll-Scully calculation early in the season, like today, and you'll get a very low number. That's because after only 15 games, luck is huge, so talent is small compared to luck.)
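
To see how the ratio moves with season length alone, here's a rough Python sketch. It holds the talent SD at the .056 MLB figure from above and assumes coin-flip luck; with those assumptions the ratio simplifies to the square root of (1 + 4 x games x talent SD squared). The function name is hypothetical.

```python
import math

TALENT_SD = 0.056  # MLB talent SD estimated above; assumed fixed

def noll_scully_at(games, talent_sd=TALENT_SD):
    """Noll-Scully implied by a fixed talent SD and a given season length.

    luck^2 = 0.25 / games, so observed / luck = sqrt(1 + talent^2 / luck^2)
                                              = sqrt(1 + 4 * games * talent^2).
    """
    return math.sqrt(1 + 4 * games * talent_sd ** 2)

lengths = (15, 162, 648)
ratios = [noll_scully_at(g) for g in lengths]
for g, r in zip(lengths, ratios):
    print(g, round(r, 2))
```

Same teams, same talent, and the number climbs steadily as you add games.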

It's not like there's no alternative. We just showed one! Instead of Noll-Scully, why not just calculate the "talent SD" as above? That estimate does *not* depend on season length, and it's still a measure of what academic authors are looking for.

Tango did this in a famous post in 2006. He got

.060 MLB
.058 NHL
.134 NBA

If you repeat Tango's logic for different season lengths, you'll get the same numbers. Well, you'll get different results because of random variation ... but they should average somewhere close to those figures.
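
Here's a toy simulation of that claim, as a sketch only. It is not Tango's code or data. It assumes 2,000 hypothetical teams with a true talent SD of .056, and it has each team win each game independently with probability equal to its talent (so it ignores opponents and schedules). It then backs out the talent SD with the same pythagorean subtraction used above.

```python
import math
import random
import statistics

def estimate_talent_sd(games, teams=2000, true_talent_sd=0.056, seed=0):
    """Simulate one season of `games` games and estimate the talent SD."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    win_pcts = []
    for _ in range(teams):
        talent = rng.gauss(0.5, true_talent_sd)  # team's true winning pct
        wins = sum(rng.random() < talent for _ in range(games))
        win_pcts.append(wins / games)
    observed_var = statistics.pvariance(win_pcts)
    luck_var = 0.25 / games  # coin-flip approximation
    # Guard against a negative difference caused by random variation
    return math.sqrt(max(observed_var - luck_var, 0.0))

estimates = {games: estimate_talent_sd(games) for games in (40, 162, 648)}
for games, est in estimates.items():
    print(games, round(est, 3))
```

The estimates bounce around a bit, more so for the short season, but they should all land in the neighborhood of the .056 that was built in, whatever the season length.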

---------

Now, you could argue ... well, sometimes you *do* want to control for season length. Perhaps one of the reasons the best teams dominate the standings is that NBA management wanted it that way ... so they chose a longer, 82-game season, in order to create some space between the Warriors and the other teams. Furthermore, maybe the NFL deliberately chose 16 games partly to give the weaker teams a chance.

Sure, that's fine. But you don't want to use Noll-Scully there either, because Noll-Scully still *partially* adjusts for season length, by using "luck multiple" as its unit. Either you want to consider season length, or you don't, right? Why would you only *partially* want to adjust for season length? And why that particular part?

If you want to consider season length, just use the actual SD of the standings. If you don't, then use the estimated SD of talent, from the pythagorean calculation. 

Either way, Noll-Scully doesn't measure anything anybody really wants.




