### Luck vs. talent in the NHL standings

How much of the NHL standings is luck?

A few years ago, Tom Tango found that 36 NHL games was the point where luck and talent were equally important. But that was based on older data, so I thought I'd revisit the question. I looked at the years 2006-07 to 2011-12 -- a total of 150 team-seasons.

Following the usual method for splitting performance into talent and luck, we start by figuring out the expected standard deviation due to randomness alone. That's a little harder than usual, because the NHL doesn't just have wins and losses; it also has pity points for overtime or shootout losses.

For every 82 team-games, there were 9.68 overtime (or shootout) wins and 9.68 overtime losses ("regulation ties"). That means the average team has 41 games where they earn two points, 9.68 games where they earn one point, and 31.32 games where they get nothing.

The variance of single-game points is E(points squared) - [E(points)] squared. That's .8680 for the average team. Multiply by 82 for a season, take the square root, and we find that the season SD of luck is 8.44 points.
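Here's a minimal Python sketch of that calculation, using only the figures quoted above (41 wins, 9.68 overtime losses, 82 games). The function names are illustrative, not from any library.

```python
import math

GAMES = 82
WINS = 41.0        # every game has exactly one two-point winner
OT_LOSSES = 9.68   # one-point games for the average team, from the post

def single_game_variance(wins=WINS, ot_losses=OT_LOSSES, games=GAMES):
    """Variance of the points an average team earns in one game."""
    p_win = wins / games
    p_otl = ot_losses / games
    mean_points = 2 * p_win + 1 * p_otl   # E(points)
    mean_square = 4 * p_win + 1 * p_otl   # E(points squared)
    return mean_square - mean_points ** 2

def season_luck_sd(n_games=GAMES):
    """SD of total points over n_games independent, identical games."""
    return math.sqrt(n_games * single_game_variance())

print(round(single_game_variance(), 4))
print(round(season_luck_sd(), 2))
```

The key assumption is that games are independent, which is why the per-game variance is multiplied by 82 rather than by 82 squared.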

In the five seasons in question, the observed SD of team points was 12.3. Since

Var (observed) = Var (talent) + Var (luck),

we get

Var(talent) = 12.3 squared minus 8.44 squared

= 8.95 squared (about 80.1 points squared).

So, over an 82-game NHL season, talent is only barely more important than luck -- an SD of 8.95, vs. 8.44.

Where do talent and luck converge? At 73 games. At that point in the season, SD of talent is 8.0 points (per 73 games), and the SD of luck is also 8.0 points.
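The decomposition and the break-even point can be sketched in a few lines of Python. This assumes, as the 8.0-point figures above imply, that the SD of talent grows in proportion to games played while the SD of luck grows with the square root of games played. All names here are illustrative.

```python
import math

SEASON_GAMES = 82
SD_OBSERVED = 12.3   # observed SD of team points, from the post
SD_LUCK = 8.44       # binomial-style luck SD, from the post

def talent_sd(sd_observed=SD_OBSERVED, sd_luck=SD_LUCK):
    """Var(talent) = Var(observed) - Var(luck); return its square root."""
    return math.sqrt(sd_observed ** 2 - sd_luck ** 2)

def sds_after(n_games, sd_talent, sd_luck=SD_LUCK, season=SEASON_GAMES):
    """(talent SD, luck SD) in points after n_games.

    Assumption: talent SD scales with n, luck SD with sqrt(n).
    """
    frac = n_games / season
    return sd_talent * frac, sd_luck * math.sqrt(frac)

def break_even_games(sd_talent, sd_luck=SD_LUCK, season=SEASON_GAMES):
    """Games at which the two SDs are equal: n = season * (luck/talent)^2."""
    return season * (sd_luck / sd_talent) ** 2

t = talent_sd()
print(round(t, 2))
print(round(break_even_games(t)))
print(tuple(round(x, 1) for x in sds_after(73, t)))
```

Setting the two scaled SDs equal and solving for n gives the closed form used in `break_even_games`.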

------

Why did I get 73 games, while Tango got 36? Well, Tango's dataset had an observed SD of .2 points per game. My dataset has .15. That probably accounts for most of the effect.

Why the difference? I'm not sure, but I have a couple of ideas.

First, NHL rules have changed since Tango's study. Back then, some games ended in ties, where each team got one point. That doesn't happen any more -- now, one of the two teams wins the shootout and gets that extra point. Perhaps more bonus points tend to reduce the advantage of being talented.

Second, teams may be playing for regulation ties more than they used to. And, third, there might be more overall parity in the league, for all I know.

------

After I did these calculations, I wasn't sure that they really captured everything. I won't tell you what else I thought might be going on, because ... well, I wrote a simulation to check, and it turned out I was wrong. The simulation matched the theory.

I'll tell you about the simulation anyway, since it's already done.

For each of the 150 teams in the sample, I took their regulation goals scored and goals against, and regressed them 30 percent to the mean (so a team that was 30 goals above average became 21 goals above average). Then, I simulated all five NHL seasons, 100 times each, using the actual game schedule.

For each game, I calculated the two teams' expected goals scored by combining their (regressed) figures. So if Boston's offense was 10 percent above average, and their opponent's defense was 10 percent below average, I'd have the Bruins pegged to score 21 percent more goals than the mean.

Then I went to the random number generator. For each team, I randomly assigned a score, based on a Poisson distribution for its expected number of goals scored. If the two teams wound up equal, I simulated OT. Half the time, I simulated the game being won in OT, the winner more likely to be the team with more goals. The other half, the simulated game was won in a shootout, each team with a 50/50 chance.
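For readers who want to try something similar, here is a rough Python sketch of one simulated game. It is not the code used for this post. The multiplicative way of combining offense and defense follows the Boston example above; the overtime weighting (by each team's share of expected goals) is an assumption, since the post only says the better-scoring team is more likely to win. The league average uses the 221.0 goals per team quoted later in the post, treated here as a stand-in for regulation scoring.

```python
import math
import random

LEAGUE_AVG_GOALS = 221.0 / 82   # approximate goals per team per game

def regress_to_mean(goals_vs_average, fraction):
    """Shrink a goals-above-average figure toward zero by `fraction`."""
    return goals_vs_average * (1 - fraction)

def expected_goals(league_avg, offense_factor, opp_defense_factor):
    """Multiplicative combination: 1.10 offense vs 1.10 defense -> 1.21x."""
    return league_avg * offense_factor * opp_defense_factor

def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_game(rng, exp_a, exp_b):
    """Return (points_a, points_b) for one game."""
    goals_a, goals_b = poisson(rng, exp_a), poisson(rng, exp_b)
    if goals_a != goals_b:
        return (2, 0) if goals_a > goals_b else (0, 2)
    if rng.random() < 0.5:
        # Overtime. Assumption: win chance proportional to expected goals.
        p_a = exp_a / (exp_a + exp_b)
    else:
        # Shootout: a coin flip, as described in the post.
        p_a = 0.5
    return (2, 1) if rng.random() < p_a else (1, 2)

rng = random.Random(2013)
bruins = expected_goals(LEAGUE_AVG_GOALS, 1.10, 1.10)
print(round(regress_to_mean(30, 0.30), 1))
print(simulate_game(rng, bruins, LEAGUE_AVG_GOALS))
```

A full season would loop `simulate_game` over the schedule and add up each team's points.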

I ran that simulation, and it didn't work.

The standings were too homogeneous. I had put in too much regression to the mean. After some experimenting, I found that 22 percent worked best.

Also, the simulation had too few standings points. That's because real life has more regulation ties than Poisson predicts (as we already knew). So, I converted 20 percent of one-goal games to regulation ties (by randomly adding or subtracting one goal).

To make the scores a bit more realistic, I added an empty-net goal to 20 percent of one-goal games. To balance, I subtracted one goal from the winning team in half the lopsided games (4+ goal differential). These two changes didn't affect who won: only by how much.

After these changes, the simulation pretty much matched reality. Specifically, for one arbitrarily chosen run of my 100-fold simulation:

-- In real life, the average was 221.0 goals per team. In the simulation, it was 220.6.

-- In real life, the SDs of goals and goals allowed were 22.7 and 24.0, respectively. In the simulation, 23.9 and 24.5.

-- In real life, the average team scored 91.68 points (which means 9.68 overtime losses per season). In the simulation, it was 91.71 (9.71).

-- In real life, the SD of team points -- which is the most important thing for analyzing season luck -- was 12.30. In the simulation, it was 12.33.

------

So, in the simulation, how much of the standings turned out to be luck, and how much was talent? It turned out that the r-squared between talent (goal differential) and standings points was .53. That means there were .53 units of talent squared per .47 units of luck squared, a ratio of 1.13.

In real life, we found 8.95-squared units of talent squared per 8.44-squared units of luck squared. That ratio was 1.12.

Pretty good match.
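The two ratios being compared can be checked with a short Python sketch. The function names are made up for illustration; the inputs are the figures quoted above.

```python
def ratio_from_r_squared(r_squared):
    """Talent variance per unit of luck variance, given r-squared."""
    return r_squared / (1 - r_squared)

def ratio_from_sds(sd_talent, sd_luck):
    """Same ratio, starting from the two standard deviations."""
    return sd_talent ** 2 / sd_luck ** 2

print(round(ratio_from_r_squared(0.53), 2))   # simulation
print(round(ratio_from_sds(8.95, 8.44), 2))   # real-life decomposition
```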

------

To recap, here's what I learned from all this:

1. For an 82-game NHL season like the last five, the SD of luck is 8.44 standings points.

2. The overall SD, which you can easily calculate from the official standings, is 12.3 points.

3. Therefore, the SD of team talent is 8.95 points.

4. That means the r-squared of talent vs. results is around .53.

5. From that, it follows that it takes 73 games until talent is as important as luck in predicting the standings.

6. Or, put another way: over an entire season, talent is more important than luck, but not by much.

And, if you trust the simulation is close to reality, we can add:

7. To estimate team talent, you can perhaps take its season goal differential and regress 22 percent to the mean.

------

In future posts, I'll use the simulation to do funner stuff, like figure out the probability that the number one seed is actually better than the number eight seed, and things like that.

Labels: distribution of talent, hockey, luck, NHL

## 23 Comments:

Phil,

I'm glad you took some time with this post. This is an area I just looked into, as it has big implications this season (shortened to 48 games instead of 82).

Here are the 2 articles I worked on.

http://nhlnumbers.com/2013/1/10/studying-luck-other-factors-in-pdo

http://www.fearthefin.com/2013/1/13/3864496/nhl-schedule-2013-playoffs-48-games-luck-pdo-fenwick

In the second, I ran a sim based on the percentage of wins, ties, and losses to simulate a season. Sounds like it was a little less rigorous than your sims, but I came out to virtually the same numbers. I think my var(talent) dropped slightly: when I ran the 82-game numbers (seasons 2004-2005 through 2011-2012), I got an SD(obs) of 11.65, instead of 12.3.

Will take a look, thanks! 11.65 ain't too bad, you can still draw useful conclusions ...

Variables with larger values tend to have larger variances, so some of the difference between your numbers and Tango's could just be the additional points from regulation ties.

Can you explain in more detail how you arrived at the luck SD?

For the luck SD ... you've got a 41/82 chance of 2 points, and 9.68/82 chance of 1 point, and a 31.32/82 chance of 0 points.

The average points is 1.118. The average points squared (4 instead of 2) is 2.118.

The formula for variance is the average square, minus the square of the average. That's 2.118 minus 1.118 squared. That's 0.868. That's the variance of points for one game.

For 82 games, multiply that by 82. Then take the square root to get the SD.

Assuming X is a random variable for the number of points scored in a game, and you're interested in the number of points scored over an 82-game season, or 82X, shouldn't you be multiplying by 82^2 since Var(82X) = 82^2 * Var(X), or am I missing something about how you're expressing variance? Also, why wouldn't you just do 48, instead of 82?

Anonymous,

If you're adding 82 independent variables, then you multiply the variance by 82.

But, if you want the variance of 82 times a *single* variable, that's when you multiply by 82 squared.

The first way is the way the NHL works. The second way would be if you only played one game, the winner got 164 points, and the "overtime loser" got 82 points.
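The distinction can be verified exactly with a small Python sketch. It builds the distribution of 82 independent games by repeated convolution and compares it with a single game's outcome multiplied by 82. The probabilities are the ones from the post; the helper names are illustrative.

```python
POINTS = {2: 41 / 82, 1: 9.68 / 82, 0: 31.32 / 82}

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum(p * (x - mean) ** 2 for x, p in dist.items())

def sum_of_independent(dist, n):
    """Distribution of the sum of n independent draws from dist."""
    total = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for s, ps in total.items():
            for x, px in dist.items():
                nxt[s + x] = nxt.get(s + x, 0.0) + ps * px
        total = nxt
    return total

def scaled(dist, n):
    """Distribution of n times a single draw from dist."""
    return {n * x: p for x, p in dist.items()}

one_game = variance(POINTS)
print(round(variance(sum_of_independent(POINTS, 82)) / one_game, 3))  # ratio to one game
print(round(variance(scaled(POINTS, 82)) / one_game, 3))              # ratio to one game
```

The first ratio comes out at 82 and the second at 82 squared, matching the two cases described above.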

"The average points squared (4 instead of 2) is 2.118"

How did you arrive at 2.118 for this part of the formula? I.e., which numbers are you squaring and then averaging?

Hockeyfan,

You square the points. If you do that, you get 4 (2 squared) for a win, 1 (1 squared) for an OT loss, and 0 (zero squared) for a regular loss.

So you've got a 41/82 chance of 4 square points, a 9.68/82 chance of 1 square point, and a 31.32/82 chance of 0 square points. The average is 2.118 square points.

Ah, I think I understand. So essentially you are calculating the empirical variance of single-game points and multiplying by 82?

Aren't you making an assumption that the variance in single-game points is completely due to luck?

Oh, never mind, the whole point is to make that assumption.

You are saying... If every team were completely equal, and they played a "season" of 82 games against each other, then the variance in results would be due to chance alone.

So let's make that assumption about last season. Looking at single-game results, the average points was 1.12 with a variance of .86. Therefore, for 82 games each team would average 92 points with a variance of 70.78.

But even if this were the case, in a given season the spread of points between teams wouldn't be exactly 70.78 every time, right? Wouldn't there be variance in the variance?

(Does that make sense?)

Right: if every team were equal against every other team, we'd expect a season variance of 70.78 points (or, 71.23, which I found, maybe because one of us rounded).

If you actually ran such random seasons, then, yes, you'd get a spread of observed variances that would be centered somewhere around 71.23.
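A quick way to see that spread is to simulate it. The sketch below is a simplification, not the schedule-based simulation from the post: it assumes each of 30 identical teams draws its 82 game results independently, which ignores the fact that two teams share every game. The seed and season count are arbitrary.

```python
import random
import statistics

POINTS = [2, 1, 0]
WEIGHTS = [41, 9.68, 31.32]   # games per 82 for an average team, from the post

def season_points(rng, games=82):
    """Total points for one team in an all-luck season."""
    return sum(rng.choices(POINTS, weights=WEIGHTS, k=games))

def league_variance(rng, teams=30):
    """Sample variance of points across one simulated league.

    Assumption: teams are simulated independently of one another.
    """
    return statistics.variance([season_points(rng) for _ in range(teams)])

def simulate(seasons=500, seed=1):
    rng = random.Random(seed)
    return [league_variance(rng) for _ in range(seasons)]

variances = simulate()
print(round(statistics.mean(variances), 1))    # centre of the observed variances
print(round(statistics.stdev(variances), 1))   # how much they vary season to season
```

The average of the simulated variances should sit close to the theoretical 71 or so, while individual seasons scatter quite a bit around it.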

Can you explain how you got the fractions? I.e., 41 of 82 games end with the team getting two points. My understanding is that the 9.68 was determined from the data set, but how did you come to the 41/82?

Hi, Tim,

Every game, exactly one of the two teams leaves with two points. So, the average team will do that half the time, which is 41 out of 82.

First time reader of your blog. Fantastic post. Thanks!

I don't understand why you wouldn't regress 36% to the mean for goal differential.

I took the same number of seasons, and looked at this the same way, except I used single-game goal differential instead of points. Here's how that was distributed per 82 games on average:

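| margin | per82 |
|---|---|
| -9 | 0.02 |
| -8 | 0.08 |
| -7 | 0.18 |
| -6 | 0.54 |
| -5 | 1.18 |
| -4 | 2.81 |
| -3 | 7.99 |
| -2 | 9.12 |
| -1 | 19.09 |
| 1 | 19.09 |
| 2 | 9.12 |
| 3 | 7.99 |
| 4 | 2.81 |
| 5 | 1.18 |
| 6 | 0.54 |
| 7 | 0.18 |
| 8 | 0.08 |
| 9 | 0.02 |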

The average goal differential is obviously going to be 0.0, but the variance is 5.766 -- squaring each single-game goal differential and taking the weighted avg by how often each type of goal diff happens for the avg team. 5.766 - 0.0 = 5.766, the variance due to luck.

The stdev of luck in 82 games, then, is sqrt(82*5.766), or 21.7. The observed stdev of team goal differentials per 82 games was 36.4, so the true talent stdev of goal differential is 29.2, or sqrt(36.4^2 - 21.7^2).

This implies that 64.3% of the observed variance in goal differential is due to skill, and 35.7% is due to luck -- not 22%.

Did I do something wrong there? I think I replicated the same methodology you used for points exactly.
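The arithmetic in this comment can be reproduced with a short Python sketch. It uses the rounded per-82 figures from the margin table, so the per-game variance comes out near 5.78 rather than the 5.766 quoted, presumably because the commenter worked from unrounded counts. The names are illustrative.

```python
import math

# Positive half of the symmetric margin table: margin -> games per 82
HALF_TABLE = {1: 19.09, 2: 9.12, 3: 7.99, 4: 2.81, 5: 1.18,
              6: 0.54, 7: 0.18, 8: 0.08, 9: 0.02}
OBSERVED_SD = 36.4   # observed SD of season goal differential, from the comment

def margin_variance(half_table):
    """Per-game variance of goal margin (the mean is zero by symmetry)."""
    games = 2 * sum(half_table.values())
    squares = 2 * sum(m * m * g for m, g in half_table.items())
    return squares / games

def decompose(per_game_var, observed_sd=OBSERVED_SD, games=82):
    """Return (luck SD, talent SD, talent share of observed variance)."""
    luck_var = games * per_game_var
    talent_var = observed_sd ** 2 - luck_var
    return math.sqrt(luck_var), math.sqrt(talent_var), talent_var / observed_sd ** 2

var = margin_variance(HALF_TABLE)
luck_sd, talent_sd, share = decompose(var)
print(round(var, 2), round(luck_sd, 1), round(talent_sd, 1), round(share, 3))
```

Note that this treats the whole per-game spread as luck, which is exactly the assumption questioned in the reply that follows.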

Anonymous,

I think the difference might be that you took the distribution of goal differential for *all* teams. That gives you a wider variance than the theoretical one for a single team.

That is: the good teams are clustered high. The bad teams are clustered low. The middle teams are clustered in the middle. So, the variance of all of them combined is bigger than the variance of any individual team.

Put another way, you've got part of a talent distribution mixed in with the luck distribution.

I was going to suggest you do one team-season at a time, and average them ... but, again, your distribution will be wide because you'd be mixing up games against good teams and games against bad teams.

What you want for luck is: if all teams were the same, what would the goal differential distribution look like? I think you have to do that theoretically instead of looking at actual games.

Thanks, Phil. Perhaps another solution would be to take teams with pythagorean records close to .500 and look at their distribution of goals against other teams with pyth records close to .500? Since the theoretical "all-luck" league is composed of nothing but morally .500 teams, would that approximate the effect better?

Yup, that would probably be better!

Well, this is strange.

So I took the idea of using just ~ .500 teams, grabbing every team that was within +/- 0.15 Hockey-Reference Simple Rating of 0.0. Here were the distributions of the single-game score margins:

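| margin | per82 |
|---|---|
| 7 | 0.08 |
| 6 | 0.91 |
| 5 | 1.32 |
| 4 | 3.71 |
| 3 | 7.09 |
| 2 | 9.40 |
| 1 | 18.48 |
| -1 | 18.48 |
| -2 | 9.40 |
| -3 | 7.09 |
| -4 | 3.71 |
| -5 | 1.32 |
| -6 | 0.91 |
| -7 | 0.08 |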

Going through the same exercise as before, the population variance of game scoring margin was 6.074, which was even larger than the variance when I looked at all teams!

Because of this, the per-82 stdev of goal differential talent was implied to be 28.7, and the proportion of the variance due to true talent was 62.4% -- again, even lower than when I looked at every team.

I have no idea what that means, but I thought I'd share. I guess you're probably right that the true amount of regression can only be derived via simulation.

Hmmm ... the variance actually increased? I wouldn't have expected that!

I used only regulation time goals, but subtracting a few singles won't change your variance much.

But you may be onto something. I assumed that goals had a Poisson distribution. They probably don't ... empty net goals would increase the variance beyond Poisson, but there might be other things.

If it really were Poisson, the variance of goal differential between two equal teams would be 2 times the average goals scored, which is ... 5.5ish. You got 6.1. Could be empty net goals, or other things ... not unreasonable.
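The "2 times the average" rule comes from the fact that the variance of the difference of two independent Poisson variables is the sum of their means. Here is a small Python check that computes it directly from the probability mass function. The 221 goals per 82 games is the figure quoted in the post; the truncation at 40 goals is an arbitrary cutoff that leaves a negligible tail.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def margin_variance(lam_a, lam_b, max_goals=40):
    """Variance of (goals_a - goals_b) for independent Poisson scoring."""
    mean = 0.0
    mean_square = 0.0
    for a in range(max_goals + 1):
        pa = poisson_pmf(a, lam_a)
        for b in range(max_goals + 1):
            p = pa * poisson_pmf(b, lam_b)
            mean += p * (a - b)
            mean_square += p * (a - b) ** 2
    return mean_square - mean ** 2

goals_per_game = 221.0 / 82
print(round(margin_variance(goals_per_game, goals_per_game), 2))
```

With that scoring rate the result is about 5.4, in line with the "5.5ish" estimate.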

Hang on, let me do empty net. Suppose 6 games per team have an empty net goal? That increases a 1-goal game to a 2-goal game ... times 6, is 18 square goals. 18 divided by 82 is .22, which brings the 6.07 down to 5.85. Did I do that right?

Now, subtract out 15 overtime goals, which is 15 square goals, and you're down to 5.67.

Hey, that worked pretty well ...
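That back-of-the-envelope adjustment looks like this in Python. The 6 empty-net games and 15 overtime goals per team are the guesses made in the comment, not measured values, and the function names are illustrative.

```python
def remove_empty_net(variance, en_games=6, games=82):
    """Turn en_games two-goal margins back into one-goal margins.

    Each one removes 2**2 - 1**2 = 3 square goals from the season total.
    """
    return variance - en_games * (2 ** 2 - 1 ** 2) / games

def remove_overtime(variance, ot_goals=15, games=82):
    """Turn ot_goals one-goal margins back into ties (1 square goal each)."""
    return variance - ot_goals * 1 / games

after_en = remove_empty_net(6.07)
after_ot = remove_overtime(after_en)
print(round(after_en, 2), round(after_ot, 2))
```

The result lands a little above the pure-Poisson figure, so under these guesses empty-net and overtime goals account for most of the gap.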

In any case, you may be right: you have to regress more to the mean when you consider OT and EN goals. That would make sense.

Actually, do you know how many EN goals there are per season? That would help figure it out.
