True talent levels for NHL team shooting percentage
How much of the difference in team shooting percentage (SH%) is luck, and how much is talent? That seems like it should be pretty easy to figure out, using the usual Palmer/Tango method.
Let's start with the binomial randomness inherent in shooting.
In 5-on-5 tied situations in 2013-14 (the dataset I'm using for this entire post), the average team took 721 shots. At an 8 percent SH%, one SD of binomial luck is
The square root of (0.08 * 0.92 / 721)
... which is almost exactly one percentage point.
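If you want to verify that arithmetic, here's the calculation in a few lines of Python, using the post's figures (721 shots, 8 percent true SH%):

```python
from math import sqrt

# One SD of binomial luck on team SH%, assuming every shot is an
# independent trial with the same 8% chance of going in.
p, shots = 0.08, 721
luck_sd = sqrt(p * (1 - p) / shots)

print(round(luck_sd * 100, 2))  # in percentage points -> 1.01
```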
That's a lot. It would move an average team about 10 positions up or down in the standings -- say, from 7.50 (16th) to 8.50 (4th).
If you want to compare that to Corsi ... for CF% (defined as the percentage of shots at goal which go to the offense), the SD due to binomial luck is also (coincidentally) about one percentage point. That would take a 50.0% team to 51.0%, which is only maybe three places in the standings.
That's one reason that SH% isn't as reliable an indicator as Corsi: a run of luck can make you look like the best or worst team in the league in that category, instead of just moving you a few spots.
If we just go to the data and observe the SD of actual team SH%, it also comes out to about one percentage point.
SD(talent)^2 = SD(observed)^2 - SD(luck)^2
SD(talent)^2 = 1.0^2 - 1.0^2
Which equals zero. And so it appears there's no variance in talent at all -- that SH% is, indeed, completely random!
But ... not necessarily. For two reasons.
First, SD(observed) is itself random, based on what happened in the 2013-14 season. We got a value of around 1.00, but it could be that the "true" value, the average we'd get if we re-ran the season an infinite number of times, is different.
How much different could it be? I wrote a simulation to check. I ran 5,000 seasons of 30 teams, each with 700 shots and a shooting percentage of 8.00 percent.
As expected, the average of those 5,000 SDs was around 1.00. But the 5,000 values varied with an SD of 0.133 percentage points. (Yes, that's the SD of a set of 5,000 SDs.) So the standard 95% confidence interval gives us a range of about (0.74, 1.26).
That doesn't look like it would make a whole lot of difference in our talent estimate ... but it does.
At the top end of the confidence interval, an observed SD of 1.26, we'd get
SD(talent)^2 = 1.26^2 - 1.00^2
= 0.77^2
That would put the SD of talent at 0.77 percentage points, instead of zero. That's a huge difference numerically, and a huge difference in how we think of SH% talent. Without the confidence interval, it looks like SH% talent doesn't exist at all. With the confidence interval, not only does it appear to exist, but we see it could be substantial.
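Here's a sketch of that simulation, if you want to run it yourself: 5,000 seasons of 30 identical teams, 700 shots each at a true 8 percent, so any spread in team SH% is pure binomial luck. (The seed is arbitrary; exact output will vary slightly.)

```python
import numpy as np

# 5,000 luck-only seasons: 30 identical teams, 700 shots, true SH% = 8%.
rng = np.random.default_rng(0)
goals = rng.binomial(n=700, p=0.08, size=(5000, 30))
sh_pct = goals / 700 * 100                 # team SH%, in percentage points

# One observed SD of team SH% per simulated season.
season_sds = sh_pct.std(axis=1, ddof=1)

print(season_sds.mean())   # close to 1.0 -- the luck-only SD
print(season_sds.std())    # about 0.13 -- how much that SD bounces around
```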
Why is the range so wide? It's because the observed spread isn't much different from the binomial luck. In this case, they're identical, at 1.00 each. In other situations or other sports, they're farther apart. In MLB team wins, the SD of actual wins is almost double the theoretical SD from luck. In the NHL, it's about one-and-a-half times as big. In the NBA ... not sure; it's probably triple, or more.
If you have a sport where the range of talent is bigger than the range of luck, your SD will be at least 1.4 times as big as you'd see otherwise -- and 1.4 times is a significant enough signal to not be buried in the noise. But if the range of talent is only, say, 40% as large as the range of luck, your expected SD will be only 1.077 times as big -- that is, only eight percent larger. And that's easy to miss in all the random noise.
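Since independent variances add, the observed SD works out to the luck SD times the square root of (1 + r^2), where r is the ratio of talent SD to luck SD. A quick check of the two cases above:

```python
from math import sqrt

# Inflation of the observed SD over the luck SD, for a given
# talent-to-luck ratio r: SD(observed) = SD(luck) * sqrt(1 + r^2).
factors = {r: sqrt(1 + r * r) for r in (1.0, 0.4)}

print(factors[1.0])  # about 1.414 -- talent as big as luck
print(factors[0.4])  # about 1.077 -- talent only 40% of luck
```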
Can we narrow down the estimate with more seasons of data?
For 2011-12, SD(observed) was 0.966, which actually gives an imaginary number for SD(talent) -- the square root of a negative estimate of var(talent). In other words, the teams were closer than we'd expect them to be even if they were all identical!
For 2010-11, SD(observed) was 0.88, which is even worse. In 2009-10, it was 1.105. Well, that works: it suggests SD(talent) = 0.47 percentage points. For 2008-09, it's back to imaginary numbers, with SD(observed) = 0.93. (Actually, even 2013-14 gave a negative estimate ... I've been saying SD(luck) and SD(observed) were both 1.00, but they were really 1.01 and 0.99, respectively.)
Out of five seasons, we get four impossible situations, where the teams are closer together than we'd expect even if they were identical!
That might be random. It might be something wrong with our assumption that talent and luck are independent. Or, it might be that there's something else wrong.
I think it's that "something else": we're not using a good enough assumption about shot types.
Our binomial luck calculation assumed that all the shots were the same, that every shot had an identical 8% chance of becoming a goal. If you use a more realistic assumption, the effects of luck come out lower.
The typical team in the dataset scored about 56 goals. If that's 700 shots at 8 percent, the luck SD is 1 percent, as we found. But suppose those 56 goals come from a combination of high-probability shots and low-probability shots, like this:
5 goals = 5 shots at 100%
15 goals = 30 shots at 50%
30 goals = 300 shots at 10%
6 goals = 365 shots at 1.64%
56 goals = 700 shots at 8%
If you do it that way, the luck SD drops from 1.0% to 0.91%.
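The arithmetic: for a mix of shot types, the variance of goals is the sum of n * p * (1 - p) over the types. A sketch using the illustrative pattern above:

```python
from math import sqrt

# Luck SD when the 700 shots are a mix of qualities instead of all 8%.
# (shots, probability) pairs from the post's illustrative pattern.
shot_mix = [(5, 1.00), (30, 0.50), (300, 0.10), (365, 6 / 365)]

var_goals = sum(n * p * (1 - p) for n, p in shot_mix)
luck_sd = sqrt(var_goals) / 700 * 100      # in points of SH%
print(round(luck_sd, 2))                   # about 0.91

# Implied talent SD, given an observed SD of 1.00:
talent_sd = sqrt(1.00 ** 2 - luck_sd ** 2)
print(round(talent_sd, 2))                 # about 0.42
```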
And that makes a big difference. 1.00 squared minus 0.91 squared is around 0.4 squared. Which means: if that pattern of shots is correct, then the SD of team SH% talent is 0.4 points.
That's pretty meaningful, about five places in the standings.
I'm not saying that shot pattern is accurate... it's a drastic oversimplification. But "all shots the same" is also an oversimplification, and the one that gives you the most luck. Any other pattern will have less randomness.
What is actually the right pattern? I have no idea. But if you find one that splits the difference, where the luck SD drops only to 0.95% or something ... you'll still get SD(talent) around 0.31 percentage points, which is still meaningfully different from zero.
(UPDATE: Tango did something similar to this for baseball defense, to avoid a too-high estimate for variation in how teams convert balls-in-play to outs. He describes it here.)
What's right? Zero? 0.31? 0.77? We could use some other kinds of evidence. Here's some other data that could help, from the hockey research community.
These two studies, which I pointed to in an earlier post, found year-to-year SH% correlations in the neighborhood of 0.15. Since the observed SD is about 1.0, that would put the talent SD in the range of 0.15 points. That seems reasonable, and consistent with the confidence intervals we just saw and the guesses we just made.
Var(talent) for Corsi doesn't have these problems, so it's easy to figure. If you assume a game's number of shots is constant, and binomial luck applies to whether those shots are for or against -- not a perfect model, but close enough -- the estimate of SD(talent) is around 4 percentage points.
Converting that to goals:
-- one talent SD in SH% = 1 goal
-- one talent SD in CF% = 10 goals
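Here's a rough back-of-envelope version of that conversion, under assumptions consistent with the post: about 721 shots each way, an 8 percent conversion rate on shots, and talent SDs of 0.15 points (SH%, per the correlation studies) and 4 points (CF%). These are illustrative figures, not precise ones:

```python
# Converting talent SDs to goals, under the post's rough assumptions.
shots_each_way = 721

# SH%: 0.15 extra points of talent on your own 721 shots.
sh_goals = 0.0015 * shots_each_way
print(round(sh_goals, 1))    # about 1 goal

# CF%: going from 50% to 54% of 1,442 total shots shifts ~58 shots
# from the opponent's column to yours -- a swing at both ends.
total_shots = 2 * shots_each_way
shifted = 0.04 * total_shots
cf_goals = 2 * shifted * 0.08
print(round(cf_goals, 1))    # roughly 9-10 goals
```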
So, Corsi is 10 times as useful to know as SH%! Well, that might be a bit misleading: CF% is based on both offense and defense, while SH% is offense only. So the intuitive take on the ratio is probably only 5 times.
Still: Corsi talent dwarfs SH% talent when it comes to predicting future performance, by a weighting of 5 to 1. No wonder Corsi is so much more predictive!
Either way, it doesn't mean that SH% is meaningless. This analysis suggests that teams who have a very high SH% are demonstrating a couple of 5-on-5 tied goals worth of talent. (And, of course, a proportionate number of other goals in other situations.)
And, if I'm not mistaken ... again coincidentally, one point of CF% is worth the same, in terms of what it tells you about a team's talent, as one point of SH%. (Of course, a point of SH% is much harder to achieve -- only a few teams are as much as one point of SH% above or below average, while almost every team is more than a point above or below 50.0% in CF%.)
So, instead of using Corsi alone ... just add CF% and SH%! That only works in 5-on-5 tied situations -- otherwise, it's ruined by score effects. But I wouldn't put too much trust in any shot study that doesn't adjust for score effects, anyway.
I started thinking about this after the shortened 2012-13 season, when the Toronto Maple Leafs had an absurdly high SH% in 5-on-5 tied situations (10.82, best in the league), but an absurdly low CF% (43.8%, second worst to Edmonton).
My argument is: if you're trying to project the Leafs' scoring talent, you can't just use the Corsi and ignore the SH%. If the Leafs are 2 points above average in SH%, that tells you as much about their talent as two Corsi points. Instead of projecting the Leafs to score like a 43.8% Corsi team, you have to project them to score like, maybe, a 45.8% team. Which means that instead of second worst, maybe they're only fifth or sixth worst.
That's almost exactly what I estimated a year ago, based on a completely different method and set of assumptions. Neither analysis is perfect, and there's still lots of randomness in the data and uncertainty in the assumptions ... but, still, it's nice to see the results kind of confirmed.