Luck vs. talent in the NHL standings
How much of the NHL standings is luck?
A few years ago, Tom Tango found that 36 NHL games was the point where luck and talent were equally important. But that was based on older data, so I thought I'd revisit the question. I looked at the years 2006-07 to 2011-12 -- a total of 150 team-seasons.
Following the usual method for breaking down performance into talent and luck, we start by figuring out the expected standard deviation due to randomness alone. That's a little harder than usual, because the NHL doesn't just have wins and losses; it also has pity points for overtime or shootout losses.
For every 82 team-games, there were 9.68 overtime [or shootout] wins, and 9.68 overtime losses ("regulation ties"). That means the average team has 41 games where it earns two points, 9.68 games where it earns one point, and 31.32 games where it gets nothing.
The variance of single-game points is E(points squared) - [E(points)] squared. For the average team, that's 2.1180 minus 1.1180 squared, which works out to .8680. Multiplying by 82 for a season, and then taking the square root, we find that the season SD of luck is 8.44 points.
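Here's a minimal sketch of that arithmetic in Python, for anyone who wants to check it. The function and variable names are my own, and it assumes every team is the league-average team described above.

```python
import math

GAMES = 82
WINS = 41.0        # two-point games for the average team (from the text)
OT_LOSSES = 9.68   # one-point games for the average team (from the text)

def single_game_variance(wins, ot_losses, games):
    """Variance of points earned in one game: E(points^2) - E(points)^2."""
    p_win = wins / games
    p_otl = ot_losses / games
    mean = 2 * p_win + 1 * p_otl        # E(points)
    mean_sq = 4 * p_win + 1 * p_otl     # E(points squared)
    return mean_sq - mean ** 2

var_game = single_game_variance(WINS, OT_LOSSES, GAMES)

# Games are treated as independent, so the variances add over a season.
sd_luck_season = math.sqrt(var_game * GAMES)

print(f"single-game variance: {var_game:.4f}")
print(f"season SD of luck:    {sd_luck_season:.2f}")
```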
In the five seasons in question, the observed SD of team points was 12.3. Since
Var (observed) = Var (talent) + Var (luck),
Var(talent) = 12.3 squared minus 8.44 squared
= 80.06
= 8.95 points, squared.
So, over an 82-game NHL season, talent is only barely more important than luck -- an SD of 8.95, vs. 8.44.
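The subtraction is easy to reproduce. This is just a sketch of the decomposition above, with variable names I made up; it assumes talent and luck are independent, which is what lets the variances add.

```python
import math

sd_observed = 12.3   # observed SD of team points (from the text)
sd_luck = 8.44       # season SD of luck (from the text)

# Var(observed) = Var(talent) + Var(luck), so solve for Var(talent).
var_talent = sd_observed ** 2 - sd_luck ** 2
sd_talent = math.sqrt(var_talent)

print(f"Var(talent): {var_talent:.2f}")
print(f"SD(talent):  {sd_talent:.2f}")
```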
Where do talent and luck converge? At 73 games. At that point in the season, SD of talent is 8.0 points (per 73 games), and the SD of luck is also 8.0 points.
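Here's roughly how the 73 falls out. The sketch assumes the usual scaling: a team's talent edge is the same every game, so the SD of talent grows in proportion to games played, while the SD of luck grows only with the square root of games. Setting the two equal and solving gives the crossover point. The names are mine.

```python
import math

SEASON = 82
sd_talent_82 = 8.95   # SD of talent over 82 games (from the text)
sd_luck_82 = 8.44     # SD of luck over 82 games (from the text)

def sd_talent(n):
    """SD of talent, in points, after n games (scales linearly)."""
    return sd_talent_82 * n / SEASON

def sd_luck(n):
    """SD of luck, in points, after n games (scales with the square root)."""
    return sd_luck_82 * math.sqrt(n / SEASON)

# sd_talent(n) == sd_luck(n)  =>  n = SEASON * (sd_luck_82 / sd_talent_82) ** 2
crossover = SEASON * (sd_luck_82 / sd_talent_82) ** 2

print(f"crossover at about {crossover:.1f} games")
print(f"at 73 games: talent {sd_talent(73):.2f}, luck {sd_luck(73):.2f}")
```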
Why did I get 73 games, while Tango got 36? Well, Tango's dataset had an observed SD of .2 points per game. My dataset has .15. That probably accounts for most of the effect.
Why the difference? I'm not sure, but I have a couple of ideas.
First, NHL rules have changed since Tango's study. Back then, some games ended in ties, where each team got one point. That doesn't happen any more -- now, one of the two teams wins the shootout and gets that extra point. Perhaps more bonus points tend to reduce the advantage of being talented.
Second, teams may be playing for regulation ties more than they used to. And, third, there might be more overall parity in the league, for all I know.
After I did these calculations, I wasn't sure that they really captured everything. I won't tell you what else I thought might be going on, because ... well, I wrote a simulation to check, and it turned out I was wrong. The simulation matched the theory.
I'll tell you about the simulation anyway, since it's already done.
For each of the 150 teams in the sample, I took their regulation goals scored and goals against, and regressed them 30 percent to the mean (so a team that was 30 goals above average became 21 goals above average). Then, I simulated all five NHL seasons, 100 times each, using the actual game schedule.
For each game, I calculated the two teams' expected goals scored by combining their (regressed) figures. So if Boston's offense was 10 percent above average, and their opponent's defense was 10 percent below average, I'd have the Bruins pegged to score 21 percent more goals than the mean.
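In code, those two steps might look something like this. It's a sketch, not my actual simulation code: the function names are invented, and the league average of 2.7 goals per team-game is just an illustrative number.

```python
def regress_to_mean(value, mean, fraction):
    """Pull a value toward the mean by the given fraction (0.30 = 30 percent)."""
    return mean + (1 - fraction) * (value - mean)

def expected_goals(offense_ratio, opp_defense_ratio, league_avg):
    """Combine the two teams' ratios multiplicatively.

    offense_ratio: the team's (regressed) scoring relative to average, e.g. 1.10
    opp_defense_ratio: the opponent's (regressed) goals allowed relative to average
    """
    return league_avg * offense_ratio * opp_defense_ratio

# A team 30 goals above average, regressed 30 percent, ends up 21 goals above.
print(round(regress_to_mean(30.0, 0.0, 0.30), 6))

# Offense 10 percent above average against a defense 10 percent below average.
LEAGUE_AVG = 2.7  # hypothetical goals per team-game, for illustration only
print(round(expected_goals(1.10, 1.10, LEAGUE_AVG) / LEAGUE_AVG, 2))
```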
Then I went to the random number generator. For each team, I randomly assigned a score, based on a Poisson distribution for its expected number of goals scored. If the two teams wound up equal, I simulated OT. Half the time, I simulated the game being won in OT, with the winner more likely to be the team with more expected goals. The other half, the simulated game was won in a shootout, each team with a 50/50 chance.
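A single simulated game, then, could be sketched like this. Two things here are assumptions of the sketch rather than anything described above: the Poisson draw uses Knuth's multiplication method (to keep it in the standard library), and the overtime winner is picked in proportion to each team's expected goals. The function names are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Draw from a Poisson distribution (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_game(exp_a, exp_b, rng):
    """Return (points_a, points_b) for one game, given expected goals."""
    goals_a = poisson(exp_a, rng)
    goals_b = poisson(exp_b, rng)
    if goals_a != goals_b:
        return (2, 0) if goals_a > goals_b else (0, 2)
    # Tied after regulation: half the time overtime, half the time a shootout.
    if rng.random() < 0.5:
        p_a = exp_a / (exp_a + exp_b)   # assumed OT weighting, not from the text
    else:
        p_a = 0.5                        # shootout is a coin flip
    # The loser of an OT or shootout game still gets one point.
    return (2, 1) if rng.random() < p_a else (1, 2)

rng = random.Random(2012)
print(simulate_game(2.9, 2.5, rng))
```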
I ran that simulation, and it didn't work.
The standings were too homogeneous. I had put in too much regression to the mean. After some experimenting, I found that 22 percent worked best.
Also, the simulation had too few standings points. That's because real life has more regulation ties than Poisson predicts (as we already knew). So, I converted 20 percent of one-goal games to regulation ties (by randomly adding or subtracting one goal).
To make the scores a bit more realistic, I added an empty-net goal to 20 percent of one-goal games. To balance, I subtracted one goal from the winning team's score in half the lopsided games (4+ goal differential). These two changes didn't affect who won: only by how much.
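Here's one way those score adjustments could be written. It's only a sketch with a made-up function name, and it assumes something I didn't spell out: that the 20 percent of one-goal games turned into ties and the 20 percent given an empty-net goal are separate, non-overlapping groups.

```python
import random

def adjust_score(winner_goals, loser_goals, rng):
    """Return an adjusted (winner_goals, loser_goals) regulation score."""
    margin = winner_goals - loser_goals
    if margin == 1:
        u = rng.random()
        if u < 0.20:
            # Convert to a regulation tie by adding or subtracting one goal.
            if rng.random() < 0.5:
                return winner_goals - 1, loser_goals
            return winner_goals, loser_goals + 1
        if u < 0.40:
            # Empty-net goal: the margin grows from one to two.
            return winner_goals + 1, loser_goals
    elif margin >= 4 and rng.random() < 0.5:
        # Trim one goal from the winner in half the lopsided games.
        return winner_goals - 1, loser_goals
    return winner_goals, loser_goals

rng = random.Random(7)
print([adjust_score(3, 2, rng) for _ in range(5)])
```

Only the tie conversion changes the outcome, since a tied game goes on to overtime or a shootout. The other two adjustments leave the winner alone.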
After these changes, the simulation pretty much matched reality. Specifically, for one arbitrarily chosen run of my 100-fold simulation:
-- In real life, the average was 221.0 goals per team. In the simulation, it was 220.6.
-- In real life, the SDs of goals and goals allowed were 22.7 and 24.0, respectively. In the simulation, 23.9 and 24.5.
-- In real life, the average team scored 91.68 points (which means 9.68 overtime losses per season). In the simulation, it was 91.71 (9.71).
-- In real life, the SD of team points -- which is the most important thing for analyzing season luck -- was 12.30. In the simulation, it was 12.33.
So, in the simulation, how much of the standings turned out to be luck, and how much was talent? It turned out that the r-squared between talent (goal differential) and standings points was .53. That means there were .53 units of talent squared per .47 units of luck squared, a ratio of 1.13.
In real life, we found 8.95-squared units of talent squared per 8.44-squared units of luck squared. That ratio was 1.12.
Pretty good match.
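If you want to check the two ratios yourself, here's a quick sketch. The variable names are mine; the real-life side is computed straight from the 12.3 and 8.44 figures.

```python
r2_sim = 0.53        # simulation: r-squared of talent vs. standings points
sd_observed = 12.3   # real life: observed SD of team points
sd_luck = 8.44       # real life: SD of luck

# Simulation: share of variance from talent, divided by the share from luck.
ratio_sim = r2_sim / (1 - r2_sim)

# Real life: Var(talent) / Var(luck), with Var(talent) found by subtraction.
var_luck = sd_luck ** 2
var_talent = sd_observed ** 2 - var_luck
ratio_real = var_talent / var_luck

# The implied real-life r-squared of talent vs. results.
r2_real = var_talent / sd_observed ** 2

print(f"simulation ratio: {ratio_sim:.2f}")
print(f"real-life ratio:  {ratio_real:.2f}")
print(f"real-life r-squared: {r2_real:.2f}")
```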
To recap, here's what I learned from all this:
1. For an 82-game NHL season like the last five, the SD of luck is 8.44 standings points.
2. The overall SD, which you can easily calculate from the official standings, is 12.3 points.
3. Therefore, the SD of team talent is 8.95 points.
4. That means the r-squared of talent vs. results is around .53.
5. From that, it follows that it takes 73 games until talent is as important as luck in predicting the standings.
6. Or, put another way: over an entire season, talent is more important than luck, but not by much.
And, if you trust the simulation is close to reality, we can add:
7. To estimate team talent, you can perhaps take its season goal differential and regress 22 percent to the mean.
In future posts, I'll use the simulation to do funner stuff, like figure out the probability that the number one seed is actually better than the number eight seed, and things like that.