True talent levels for NHL team shooting percentage, part II
(Part I is here)
Team shooting percentage is considered an unreliable indicator of talent, because its season-to-season correlation is low.
Here are those correlations from the past few seasons, for 5-on-5 tied situations.
-0.19 2014-15 vs. 2013-14
+0.30 2013-14 vs. 2012-13
+0.33 2012-13 vs. 2011-12
+0.03 2011-12 vs. 2010-11
-0.10 2010-11 vs. 2009-10
-0.27 2009-10 vs. 2008-09
+0.04 2008-09 vs. 2007-08
The simple average of all those numbers is +0.02, which, of course, is almost indistinguishable from zero. Even if you remove the first pair -- the 2014-15 stats are based on a small, season-to-date sample size -- it's only an average of +0.055.
(A better way to average them might be to square them (keeping the sign), average the squares, and then take the square root. That gives +0.11 and +0.14, respectively. But I'll just use simple averages in this post, even though they're probably not right, because I'm not looking for exact answers.)
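For anyone who wants to check the arithmetic, here is a minimal Python sketch of both averaging methods, using the seven one-season-apart correlations listed above. The function names are just illustrative, and "signed root mean square" is my reading of the parenthetical: square each r but keep its sign, average, then take the square root.

```python
import math

# One-season-apart correlations from the list above, newest pair first.
one_apart = [-0.19, 0.30, 0.33, 0.03, -0.10, -0.27, 0.04]

def simple_mean(rs):
    """Plain arithmetic average of the correlations."""
    return sum(rs) / len(rs)

def signed_rms(rs):
    """Square each r but keep its sign, average, then take a signed square root."""
    mean_sq = sum(math.copysign(r * r, r) for r in rs) / len(rs)
    return math.copysign(math.sqrt(abs(mean_sq)), mean_sq)

# Second case drops the first pair, which involves the partial 2014-15 season.
for label, rs in [("all seven pairs", one_apart), ("without 2014-15", one_apart[1:])]:
    print(f"{label}: mean {simple_mean(rs):+.3f}, signed RMS {signed_rms(rs):+.2f}")
```

The signed version weights the larger correlations more heavily, which is why it comes out higher than the simple average in both cases.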
That does indeed suggest that SH% isn't that reliable -- after all, there were more negative seasons than strong positive ones.
But: what if we expand our sample size, by looking at the correlation between pairs that are TWO seasons apart? Different story, now:
+0.35 2014-15 vs. 2012-13
+0.12 2013-14 vs. 2011-12
+0.27 2012-13 vs. 2010-11
+0.12 2011-12 vs. 2009-10
+0.41 2010-11 vs. 2008-09
-0.03 2009-10 vs. 2007-08
These six pairs average +0.21, which ain't bad.
Part of the reason that the two-year correlations are high might be that team talent didn't change all that much in the seasons of the study. I checked the correlation between overall team talent, as measured by hockey-reference.com's "SRS" rating. For 2008-09 vs. 2013-14, the correlation was +0.50.
And that's for FIVE seasons apart. So far, we've only looked at two seasons apart.
I chose 2008-09 because you'll notice the correlations that include 2007-08 are nothing special. That, I think, is because team talent changed significantly between 07-08 and 08-09. If I rerun the SRS correlation for 2007-08 vs. 2013-14 -- that is, going back only one additional year -- it drops from +0.50 to only +0.25.
On that basis, I'm arbitrarily deciding to drop 2007-08 from the rest of this post, since the SH% discussion is based on an assumption that team talent stays roughly level.
But even if team talent changed little since 2008-09, it still changed *some*. So, wouldn't you still expect the two-year correlations to be lower than the one-year correlations? There's still twice the change in talent, albeit twice a *small* change.
You can look at it a different way -- if A isn't strongly related to B, and B isn't strongly related to C, then how can A be strongly related to C?
Well, I think it's the other way around. It's not just that A *can* be strongly related to C. It's that, if there's really a signal within the noise, you should *expect* A to be strongly related to C.
Consider 2009-10. In that year, every team had a certain SH% talent. Because of randomness, the set of 30 observed team SH% numbers varied from the true talent. The same would be true, of course, for the two surrounding seasons, 2008-09, and 2010-11.
But both those surrounding seasons had a substantial negative correlation with the middle season. That suggests that for each of those surrounding seasons, their luck varied from the middle season in the "opposite" way. Otherwise, the correlation wouldn't be negative.
For instance, maybe in the middle season, the Original Six teams were lucky, and the other 24 teams were unlucky. The two negative correlations with the surrounding seasons suggest that in each of those seasons, maybe it was the other way around, that the Original Six were unlucky, and the rest lucky.
Since the surrounding seasons both had opposite luck to the middle season, they're likely to have had similar luck to each other.
In this case, they did. The A-to-B correlation is -0.10. The B-to-C correlation is -0.27. But the A-to-C correlation is +0.41. Positive, and quite large.
-0.10 2010-11 (A) vs. 2009-10 (B)
-0.27 2009-10 (B) vs. 2008-09 (C)
+0.41 2010-11 (A) vs. 2008-09 (C)
This should be true even if SH% is all random -- that is, even if all teams have the same talent. The logic still holds: if A correlates to B the same way C correlates to B ... that means A and C are likely to be somewhat similar.
I ran a series of three-season simulations, where all 30 teams were equal in talent. When both A and C had a similar, significant correlation to B (same sign, both above +/- 0.20), their correlation with each other averaged +0.06.
In our case, we didn't get +0.06. We got something much bigger: +0.41. That's because the underlying real-life talent correlation isn't actually zero, as it was in the simulation. A couple of studies suggested it was around +0.15.
So, the A-B was actually -0.25 "correlation points", relative to the trend: -0.10 relative to zero, plus -0.15 below typical. (I'm sure that isn't the way to do it statistically -- it's not perfectly additive like that -- but I'm just illustrating the point.) Similarly, the B-C was actually -0.42 points.
Those are much larger effects when you correct them that way, so they should produce a stronger result. When I limited the simulation sample so both A-B and B-C had to be bigger than +/- 0.25, the average A-C correlation almost tripled, to +0.16.
Add that +0.16 to the underlying +0.15, and you get +0.31. Still not the +0.41 from real life, but close enough, considering the assumptions I made and shortcuts I took.
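The equal-talent simulation described above can be sketched roughly like this. It is not the original simulation code: it assumes each team's season is just an independent standard-normal noise draw (no shot counts, no binomial model), and the function names, trial count and seed are all hypothetical choices of mine. The exact averages will depend on those details, so they need not match the +0.06 and +0.16 quoted above. The point is only that the filtered A-C average comes out positive even when there's no talent at all.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ac(threshold, trials=20000, teams=30, seed=1):
    """Average A-C correlation over trials where A-B and B-C have the same
    sign and both exceed `threshold` in absolute value.
    Returns (average, number of qualifying trials)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        # Assumption: equal talent for every team, so each season is pure noise.
        a, b, c = ([rng.gauss(0.0, 1.0) for _ in range(teams)] for _ in range(3))
        r_ab, r_bc = pearson(a, b), pearson(b, c)
        if r_ab * r_bc > 0 and min(abs(r_ab), abs(r_bc)) > threshold:
            kept.append(pearson(a, c))
    return sum(kept) / len(kept), len(kept)

for cutoff in (0.20, 0.25):
    avg, n = average_ac(cutoff)
    print(f"cutoff {cutoff:.2f}: {n} qualifying trials, average A-C r = {avg:+.3f}")
```

Under this pure-noise setup, the A-C correlation you'd expect is roughly the product of the A-B and B-C correlations, which is why tightening the cutoff should push the average up.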
Since we have seven seasons with stable team talent, we don't have to stop at two-season gaps ... we can go all the way to six-season gaps, and pair every season with every other season. Here are the results:
        14-15  13-14  12-13  11-12  10-11  09-10  08-09
14-15     --   -0.19  +0.35  +0.20  +0.15  +0.46  -0.07
13-14   -0.19    --   +0.30  +0.12  +0.27  -0.07  +0.42
12-13   +0.35  +0.30    --   +0.33  +0.27  +0.24  +0.26
11-12   +0.20  +0.12  +0.33    --   +0.03  +0.12  -0.08
10-11   +0.15  +0.27  +0.27  +0.03    --   -0.10  +0.41
09-10   +0.46  -0.07  +0.24  +0.12  -0.10    --   -0.27
08-09   -0.07  +0.42  +0.26  -0.08  +0.41  -0.27    --
The average of all these numbers is ... +0.15, which is exactly what the other studies averaged out to. That's coincidence ... they used a different set of pairs, they didn't limit the sample to tie scores, and 14-15 hadn't existed yet. (Besides, I think if you did the math, you'd find you wanted the root of the average r-squared, which would be significantly higher than +0.15.)
Going back to the A-B-C thing ... you'll find it still holds. If you look for cases where A-B and B-C are both significantly below the 0.15 average, A-C will be high. (Look in the same row or column for two low numbers.)
For instance, in the 14-15 row, 13-14 and 08-09 are both negative. Look for the intersection of 13-14 and 08-09. As predicted, the correlation there is very high -- +0.42.
By similar logic, if you find cases where A-B and B-C go in different directions -- one much higher than 0.15, the other much lower -- then, A-C should be low.
For instance, in the second row, 09-10 is -0.07, but 08-09 is +0.42. The prediction is that the intersection of 09-10 and 08-09 should be low -- and it is, -0.27.
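The table average and the A-B-C lookups above can be checked with a short Python sketch. The correlations are copied from the table. The helper names, and the choice of "negative" as the cutoff for "significantly below the 0.15 average", are my own assumptions for illustration.

```python
import math
from itertools import combinations

seasons = ["14-15", "13-14", "12-13", "11-12", "10-11", "09-10", "08-09"]

# Upper triangle of the table, row by row (each row starts one column right of the diagonal).
upper = [
    [-0.19, 0.35, 0.20, 0.15, 0.46, -0.07],
    [0.30, 0.12, 0.27, -0.07, 0.42],
    [0.33, 0.27, 0.24, 0.26],
    [0.03, 0.12, -0.08],
    [-0.10, 0.41],
    [-0.27],
]

r = {}
for i, row in enumerate(upper):
    for offset, value in enumerate(row, start=1):
        r[frozenset((seasons[i], seasons[i + offset]))] = value

def corr(a, b):
    """Look up the correlation between two seasons, in either order."""
    return r[frozenset((a, b))]

values = list(r.values())
mean_r = sum(values) / len(values)
root_mean_r2 = math.sqrt(sum(v * v for v in values) / len(values))

def ac_when_both_low(cutoff=0.0):
    """For every middle season B, collect r(A, C) for each pair A, C whose
    correlations with B are both below `cutoff` (assumed cutoff: zero)."""
    found = []
    for b in seasons:
        low = [s for s in seasons if s != b and corr(s, b) < cutoff]
        found.extend(corr(a, c) for a, c in combinations(low, 2))
    return found

both_low = ac_when_both_low()
print(f"mean r over {len(values)} pairs: {mean_r:+.2f}")
print(f"root of the average r-squared: {root_mean_r2:.2f}")
print(f"average A-C r when A-B and B-C are both negative: {sum(both_low) / len(both_low):+.2f}")
```

Each unordered pair of seasons is counted once, so the mean is over 21 correlations rather than the 42 cells in the table. That gives the same answer, since the table is symmetric.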
Look at 2012-13. It has a strong positive correlation with every other season in the sample. Because of that, I originally guessed that 2012-13 is the most "normal" of all the seasons, the one where teams most played to their overall talent. In other words, I guessed that 2012-13 was the one with the least luck.
But, when I calculated the SDs of the 30 teams for each season ... 2012-13 was the *highest*, not the lowest. By far! And that's even adjusting for the short season. In fact, all the full seasons had a team SD of 1.00 percentage points or lower -- except that one, which was at the adjusted equivalent of 1.23.
What's going on?
Well, I think it's this: in 2012-13, instead of luck mixing up the differences in team talent, it exaggerated them. In other words: that year, the good teams got lucky, and the bad teams got unlucky. In 2012-13, the millionaires won most of the lotteries.
That kept the *order* of the teams the same -- which means that 2012-13 wound up the most exaggeratedly representative of teams' true talent.
Whether that's right or not, it seems that two things should be true:
-- With all the high correlations, 2012-13 should be a good indicator of actual talent over the seven-year span; and
-- Since we found that talent was stable, we should get good results if we add up all six years for each team, as if it was one season with six times as many games.*
*Actually, about five times, since there are two short seasons in the sample -- 2012-13, and 2014-15, which is less than half over as I write this.
Well, I checked, and ... both guesses were correct.
I checked the correlation between 2012-13 vs. the sum of the other five seasons (not including the current 2014-15). It was roughly +0.54. That's really big. But, there's actually no value in that ... it was cherry-picked in retrospect. Still, it's just something I found interesting, that for a statistic that is said to have so little signal, a shortened season can still have a +0.54 correlation with the average of five other years!
As for the six-season averages ... those DO have value. Last post, when we tried to get an estimate of the SD of team talent in SH% ... we got imaginary numbers! Now, we can get a better answer. Here's the Palmer/Tango method for the 30 teams' six-year totals:
SD(observed) = 0.543 percentage points
SD(luck) = 0.463 percentage points
SD(talent) = 0.284 percentage points
That 0.28 percentage points has to be an underestimate. As explained in the previous post, the "all shots are the same" binomial luck estimate is necessarily too high. If we drop it by 9 percent, as we did earlier, we get
SD(observed) = 0.543 percentage points
SD(luck) = 0.421 percentage points
SD(talent) = 0.343 percentage points
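Both sets of numbers come from the same subtraction, which can be sketched in a few lines of Python. This assumes the usual form of the method, var(observed) = var(talent) + var(luck), and takes the two SD inputs straight from the figures above. The function name is just illustrative.

```python
import math

def talent_sd(sd_observed, sd_luck):
    """Solve var(observed) = var(talent) + var(luck) for SD(talent)."""
    return math.sqrt(sd_observed ** 2 - sd_luck ** 2)

sd_observed = 0.543        # percentage points, six-year totals
sd_luck_binomial = 0.463   # "all shots are the same" binomial estimate
sd_luck_adjusted = sd_luck_binomial * (1 - 0.09)  # dropped by 9 percent

print(f"binomial luck: SD(talent) = {talent_sd(sd_observed, sd_luck_binomial):.3f}")
print(f"adjusted luck: SD(luck) = {sd_luck_adjusted:.3f}, "
      f"SD(talent) = {talent_sd(sd_observed, sd_luck_adjusted):.3f}")
```

If SD(luck) were larger than SD(observed), the square root would be of a negative number, and `math.sqrt` would raise a `ValueError`.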
We also need to bump it for the fact that this is the talent distribution for a six-season span -- which is necessarily tighter than a one-season distribution (since teams tend to regress to the mean over time, even slightly). But I don't know how much to bump, so I'll just leave it where it is.
That 0.34 points is almost exactly what we got last post. Which makes sense -- all we did was multiply our sample size by five.
The real difference, though, is the credibility of the estimate. Last post, it was completely dependent on our guess that the binomial SD(luck) was 9 percent too high. The difference between guessing and not guessing was huge -- 0.34 points, versus zero points. In effect, without guessing, we couldn't prove there was any talent at all!
But now, we do have evidence of talent ... and guessing adds only around 0.06 points. If you refuse to allow a guess of how shots vary in quality ... well, you still have evidence, without guessing at all, that teams must vary in talent with an SD of at least 0.284 percentage points.