Thursday, December 11, 2014

The best NHL players create higher quality shots

A couple of months ago, I pointed to data showing that team shot quality is a real characteristic of a team, and not just the random noise the hockey analytics consensus believes it to be.

That had to do with opposition shot distance. In 2013-14, the Wild allowed only 32 percent of opposing shots from in close, as compared to the Islanders, who allowed 61 percent. Those differences are far too large to be explained by luck.

Here's one more argument that -- it seems to me -- is almost undeniable evidence that SH% must be a real team skill.

-----

It's conventional wisdom that some players shoot better than others, right? In a 1991 skills competition, Ray Bourque shot out four targets in four shots. The Hull brothers (and son) were well-known for their ability to shoot. Right now, Alexander Ovechkin is considered the best pure scorer in the league.*

(*Update: yeah, that's not quite right, as a reader points out on Twitter.)

In 1990-91, Brett Hull scored 86 goals with a 22.1 percent SH%. Nobody would argue that was just luck, right? You probably do have to regress that to the mean -- his career average was only 15.7 percent -- but you have to recognize that Brett Hull was a much better shooter than average.

Well, if that's true for players, it's true for teams, right? Teams are just collections of players. The only way out is to take the position that Hull just cherry-picked his team's easiest shots, and he was really just stealing shooting percentage from his teammates. 

That's logically possible. In fact, I think it's actually true in another context, NBA players' rebounds. It doesn't seem likely for hockey, but, still, I figured, I have to check.

-----

I went to hockey-reference.com and found the top 10 players in "goals created" in 2008-09. (I limited the list to one player per team.)  

For each of those players, I checked his team's shooting percentage with and without him on the ice, in even-strength situations in road games, the following season. (Thanks to "Super Shot Search," as usual, for the data.)  

As expected, their teams generally shot better with them than without them:

-----------------------
With  Without  Player
-----------------------
10.1   5.8   Ovechkin
 5.5   9.3   Malkin
 7.1   5.3   Parise
 6.0   8.2   Carter 
10.3   9.1   Kovalchuk
 8.9   5.7   Datsyuk
10.8   7.5   Iginla
10.7   7.2   Nash
 9.8   7.5   Staal
 9.0   9.1   Getzlaf
-----------------------
 9.4   7.5   Average
-----------------------

Seven of the ten players improved their team's SH% (Getzlaf's was essentially flat). Weighting all players equally, the average increase came out to +1.9 percentage points, which is substantial. 

It would be hard to argue, I think, that this could be anything other than player influence. 

It looks way too big to be random. It can't be score effects, because these guys probably play roughly the same number of minutes per game regardless of the score. It can't be bias on the part of home-team shot counters, because these are road games only. 

And, it can't be players stealing from teammates, because the numbers are for all teammates on the ice at the same time. You can't steal quality shots from players on the bench.

------

I should mention that there was also an effect for defense, but it was so small you might as well call it zero. The opposition had a shooting percentage of 7.7 percent with one of those ten players on the ice, and 7.8 percent without. 

That kind of makes sense -- the players in that list are known for their scoring, not their defense. I wonder if we'd find a real effect if we chose the players by some defensive measure instead? Maybe Selke Trophy voting?

Also ... what's with Malkin? On offense, the Penguins shot 3.8 percentage points worse on his shifts. On defense, the Penguins' opponents shot 3.8 percentage points better. Part of the issue is that his "off" shifts are Sidney Crosby's "on" shifts. But even his raw numbers are unusually low/high.

------

Speaking of Crosby ... if you don't believe that the good players have consistently high shot quality, Crosby's record should help convince you. Every year since 2008-09, the Penguins have had higher quality shots with Crosby on the ice than without:

------------------------------
With  Without  Season
------------------------------
11.7   9.4   2008-09 Crosby
10.1   7.1   2009-10 
 9.6   6.9   2010-11
13.9   7.3   2011-12
13.2   8.8   2012-13
10.4   6.5   2013-14
 7.1   7.0   2014-15 (to 12/9)
-------------------------------
10.9   7.6   Average
-------------------------------

Crosby's shifts show an average increase of 3.3 percentage points -- even with the first third of the current season included at full weight. 

You could argue that's just random, but it's a tough sell.

------

Now, for team SH%, you could still make an argument that goes something like this:


"Of course, superstars like Sidney Crosby create better quality shots. Everyone always acknowledged that, and this blog post is attacking a straw man. The real point is ... there aren't that many Sidney Crosbys, and, averaged over a team's full roster, their effects are diluted to the point where team differences are too small to matter."

But are they really too small to matter?  How much do the Crosby numbers affect the Penguins' totals?

Suppose we regress Crosby's +3.3 to the mean a bit, and say that the effect is really more like 2.0 points. In 2013-14, about 38 percent of the Penguins' 865 (road, even-strength) shots came with Crosby on the ice. That means that the Crosby shifts raised the team's overall road SH% by about 0.76 percentage points. 

That's not diluted at all. Looking at the overall team 5-on-5 road shooting percentages, 0.76 points would move an average team up or down about 8 positions in the rankings.
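If you want to check that arithmetic, here's a minimal sketch in Python (the 2.0-point effect and the 38 percent shot share are just the estimates from above):

```python
# Back-of-the-envelope: how much do Crosby's shifts lift the Penguins' team SH%?
crosby_effect = 2.0        # regressed estimate of Crosby's on-ice SH% boost, in points
crosby_shot_share = 0.38   # share of road even-strength shots with Crosby on the ice

team_lift = crosby_effect * crosby_shot_share
print(f"Team SH% lift from Crosby's shifts: +{team_lift:.2f} points")   # +0.76
```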

----

Based on all this, I think it would be very, very difficult to continue arguing that team shooting percentage is just random.

Admittedly, that still doesn't mean it's important. Because, even if it's not just random, how is it that all these hockey sabermetric studies have found shooting percentage so ineffective in projecting future performance? 

The simple answer, I think, is: weak signal, lots of noise. 

I have some ideas about the details, and will try to get them straight in my mind for a future post. Those details, I think, might help explain the Leafs -- which is one of the issues that got this debate started in the first place.




Tuesday, December 09, 2014

Corsi vs. Tango

Tom Tango is questioning the conventional sabermetric wisdom on Corsi. In a few recent posts, he presents evidence that Corsi can be improved upon as a predictor of future NHL team performance. Specifically: goals are important, too.

That has always seemed reasonable to me. In fact, it seems so reasonable that I wonder why it's disputed. But it is. 

Goals are just shots multiplied by shooting percentage (SH%). The consensus among the hockey research community is that their studies show SH% is not a skill that carries over from year to year. And, therefore, goals can't matter once you have shots.

I've been disputing that for a while now -- at least seven posts' worth (here's the first). But I've been doing it by argument. Tango jumps right to the equations. He split seasons in half randomly, and ran a regression to try to predict one half from the other half. Goals proved to be very significant. In fact, when you try to predict, you have to weight goals *four times as heavily* as Corsis. (Five times as heavily as unsuccessful Corsis.)

In a tongue-in-cheek jab at statistics named after their inventors, he called that new statistic the "Tango." 

Despite Tango's regression results, the hockey analysts who commented still disagree. I'm still surprised by that ... the hockey sabermetrics community is full of pretty smart guys, very good at what they do, and a lot of them have been hired by NHL teams. There have been times when I've wondered if I'm missing something ... I mean, when it's one dabbler against 20 analysts who do this every day, it's probably the 20 who are right. Well, now it's two against the world instead of one ... and the second is Tango, which makes me a little more confident. 

Also ... Tango jumps right to the actual question, and proves goals significantly improve the prediction. That's hard to argue with; at least, it's harder to argue with than what I'm doing, which is attacking the assumption that shooting percentage is all random. 

You can see one response here, and Tango's reply here.  

Tango got his data from war-on-ice.com, who agreed to make it available to all (thank you!!!). I was planning to do some work with the data myself, but ... I guess Tango and I think about things from different angles, because, the more I thought about it, the more I kept coming back to my "usual," less direct arguments. So, there'll be another post coming soon. I'll play with the data when my thoughts wind up somewhere that requires it.

For this post, a few of my observations from Tango's posts and the discussion that followed.

-----

In one of his posts, Tango wrote,


"One of the first things that (many) people did was to run a correlation of Corsi v Tango, come up with an r=.98 or some number close to 1 and then conclude: “see, it adds almost nothing”. If only that were true. "

Tango is absolutely right (you should read the whole thing). It's just another case of jumping to conclusions from a high or low correlation coefficient.

Sabermetrics is pretty good at figuring out good and bad. It has to be -- I mean, even fans and sportswriters are pretty good at it, and the whole point of sabermetrics is to do better. We're already in "high correlation" territory, able to separate the good teams and players from the bad teams and players pretty easily. 

Find a 10-year-old kid who's a serious sports fan -- any sport -- and get him to rank the players from best to worst -- whether by formula, or by raw statistics. Then, find the best sabermetric statistic you can, and rank the players that way.

I bet the correlation would be over 0.9. Just by gut.

We're already well into the .9s, when it comes to understanding hockey. Any improvements are going to be marginal, at least if you measure them by correlation. And so, it follows that *of course* Tango and Corsi are going to correlate highly. 

Also, as Tom again points out, if Corsi already has a high correlation with something, Tango can appear, at first glance, to increase it only slightly. If you start with, say, 0.92, and Tango improves it to 0.93 ... well, that doesn't look like much, intuitively. But it IS much. Look at it another way -- still intuitively -- it was 8% uncorrelated before, and now it's only 7% uncorrelated. You've improved your predictive ability by about 12%!
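Here's that intuition as a quick sketch (hypothetical correlations, not Tango's actual numbers):

```python
# Going from r = 0.92 to r = 0.93 looks tiny -- until you look at the
# *uncorrelated* share. (This is the loose, intuitive version of the argument.)
before, after = 0.92, 0.93
unexplained_before = 1 - before   # 0.08 "uncorrelated"
unexplained_after = 1 - after     # 0.07 "uncorrelated"
improvement = (unexplained_before - unexplained_after) / unexplained_before
print(f"Uncorrelated share shrinks by {improvement:.1%}")   # 12.5%
```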

The point is, you have to think about what the numbers actually mean, instead of just thinking, "93 isn't much bigger than 92, so who cares?"

Tom illustrates the point by noting that, even though Tango and Corsi appear to be highly correlated to each other, Tango improves a related correlation from .44 to .50. There must be some significant differences there.

-----

There's a more important argument, though, than "don't underestimate how much better .93 is than .92." And that argument is: *it's not about the correlation*. 

Yes, it's nice to have a higher and higher r-squared, and be able to reduce your error more and more. But it's not really error reduction we're after. It's *knowledge*. 

It's quite possible that a model that gives you a low correlation actually gives you MORE of an understanding of hockey than a model that gives you a high correlation. Here's an easy example: the correlation between points in the standings, and whether or not you make the playoffs, is very high, close to 1.0. The correlation between your Corsi and whether or not you make the playoffs is lower, maybe 0.7 or something (depending on how you do it -- which is another reason not to rely on the correlation alone). 

Which tells you more about hockey that you didn't already know? Obviously, it's the Corsi one. Everyone already knows that points determine whether you qualify for the playoffs. When you find out that shots seem to be important, that's new knowledge -- the knowledge that outshooting your opponents means something. (Of course, *what* it means is something you have to figure out yourself -- correlation doesn't tell you what type of relationship you've found.)

And that's what's going on here. Corsi has a high correlation with future winning, but Corsi *and goals* has an even higher correlation (to a significant extent). What does that tell us? 

Goals matter, not just shots. 

That's an important finding!  You can't dismiss it just because the predictions don't improve that much. If you do, you're missing the point completely. 

You wouldn't do that in other aspects of life, would you? Those faulty airbags in the news recently, the ones that kill people with shrapnel ... those constitute a small, small fraction of collision deaths. If you looked only at predicting fatalities, knowing about those airbags is a rounding error. 

But the point is not just to predict fatality rates!  Well, not for most of us. If you're an insurance company, then, sure, maybe it doesn't make that much difference to you, a couple of cents on each policy you write. But that doesn't mean the information isn't important. It's just important for different questions. Like, how can we reduce fatalities? We can reduce fatalities by replacing the defective air bags!

Also, the information means it is false to state that faulty airbags don't matter. You can still argue that they don't matter MUCH, relative to the overall total of collision deaths; that might be true. But for that, you can't argue from correlation coefficients. You have to argue from ... well, you can use the regression equation. You can say, "only 1 person in a million dies from the airbag, but 1000 in a million die from other causes."

In this case, Tango found that a goal matters four times as much as a shot. If, roughly speaking, there are 12 shots for every goal, then every 12 shots, you get 12 "points" of predictive value from the shots, and 4 "points" of predictive value from the goals. 

The ratio isn't 1000:1 like for the airbags. It's 3:1. How can you dismiss that? Not only is it important knowledge about hockey, but the magnitude of the effect is really going to affect your predictions.
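Here's that back-of-the-envelope weighting in code (the 12-shots-per-goal rate is a rough assumption):

```python
# Relative predictive "points" from shots vs. goals, per 12 shots taken,
# using Tango's finding that a goal counts four times as much as a shot.
goal_weight = 4       # predictive weight of one goal, relative to one shot
shots_per_goal = 12   # assumed rough league rate

shot_points = shots_per_goal * 1   # 12 points from the shots themselves
goal_points = 1 * goal_weight      # 4 points from the one goal they produce
print(f"shots:goals = {shot_points}:{goal_points}")   # 12:4, i.e. 3:1
```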

-----

Why does the conventional wisdom dispute the relevance of goals? Because the consensus is that shooting percentage is random -- just like clutch hitting in baseball is random.

Why do they think that? Because the year-to-year team correlation for shooting percentage is very low.

I think it's the "low number means no effect" fallacy. Here are two studies I Googled. This one found an r-squared of .015, and here's another at .03. 

If you take the square root of those r-squareds, to give you correlations, you get 0.12 and 0.17, respectively.

Those aren't that small. Especially if you take into account how much randomness there is in shooting percentage. A signal of 0.17 in all that noise might be pretty significant.
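For the record, here's the conversion:

```python
import math

# Year-to-year r-squared values from the two studies cited above,
# converted back to correlation coefficients.
for r_squared in (0.015, 0.03):
    print(f"r-squared {r_squared} -> r = {math.sqrt(r_squared):.2f}")   # 0.12, 0.17
```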

-----

It's well-known that both Corsi and shooting percentage change with the score of the game. When you're up by one goal, your SH% goes up by almost a whole percentage point -- and your Corsi goes down by four points. When you're up two or more, the differences are even bigger. 

That's probably because when teams are ahead, they play more defensively. Their opponents, who are trailing, play more aggressively -- they press in the offensive zone more, and get more shots. 

So, teams in the lead see their shots go down. But their expected SH% goes up, because they get a lot of good scoring chances when the opposition takes more chances -- more breakaways, odd-man rushes, and so on.

It occurred to me that these score effects could, in theory, explain Tango's finding. 

Suppose Team A has 30 Corsis and 5 goals in a game, and Team B has 30 Corsis and no goals. 

Even if shooting percentage is indeed completely random, team A is probably better than team B. Why? Because, with 5 goals, team A probably had a lead most of the game. If it managed 30 Corsis despite leading -- when teams normally shoot less -- it must be a much better Corsi team than its raw total suggests. So, when it's behind, it'll probably kick the other team's butt.

I don't think Tango's finding is *all* score effects. And, even if it were, all that would mean is that if you didn't explicitly control for score, "Tango" would be a more accurate statistic than Corsi. And most of the hockey studies I've seen *don't* control for score.

----

Here's one empirical result that might help -- or maybe won't help at all, I'm actually not sure. (Tell me what you think.)

My hypothesis has been that some teams have better shooting percentages, along with lower Corsis, because they choose to set up for higher quality shots. Instead of taking a 5% shot from the point, they take a 50/50 chance on setting up a 10% shot from closer in. 

As I have written, I think the evidence shows some support for that hypothesis. In 5-on-5 tied situations, there's a negative correlation between Corsi rate and SH%. In 2013-14 (raw stats here), it was -0.16. In the shortened 2012-13 season, it was -0.04. In 2011-12, -0.14. 

Translating the -0.14 into hockey: for every additional goal due to shot quality, teams lowered their Corsi by 2.1 shots.

That's a tradeoff of around 2 Corsis per goal. Tango found 4 Corsis per goal. Does that mean that two of Tango's four Corsis come from shot quality, and the other two come from score effects?

Not sure. There's probably too much randomness in there anyway to be confident, and I'm not completely sure that they're measuring the same thing. But, there it is, and I'll think about it some more.

-----

UPDATE, 30 minutes later: Colby Cosh pointed out, on Twitter, that Tango's regression used Corsi and goal *differential* -- offense minus defense.  That means the "goals against" factor is partially a function of save percentage, which partially reflects goalie talent, which, of course, carries over into the future. So, goalie talent absolutely has to be part of the explanation of the "goals" term.


So now we have two factors: score effects and goalie effects. Could they fully explain the predictive value of goals, without resorting to shot quality? I'll have to think about the actual numbers, and whether the magnitudes could be high enough to cover the full "4 shots" coefficient.









Tuesday, November 04, 2014

Corsi, shot quality, and the Toronto Maple Leafs, part VII

In previous posts, I've argued that when it comes to shots, NHL teams might differ in how they choose to trade quantity for quality. That might partly explain why the Toronto Maple Leafs, for the past few seasons now, have had ugly-looking shot stats, but with an above-average shooting percentage.

Skeptics argue that team shooting percentage (SH%) doesn't seem to have predictive value from season to season, which suggests it's luck rather than skill or strategy. But, at the same time, Corsi for teams seems to have a negative correlation to SH%, which is one piece of evidence that shot quality strategy might be a real issue.

Anyway, read the previous six posts for that argument. This is just an anecdote.

It comes from a piece by James Mirtle, the Maple Leafs beat writer for the Globe and Mail. Mirtle notes that the Toronto coaching staff has directed Morgan Rielly to increase his shot attempts:


[In the October 28 game vs. Buffalo,] Rielly rang up two assists – including a beauty cross-crease pass on James van Riemsdyk’s goal – and was all over the puck generally, generating nine shot attempts.

That propensity to shoot has been Rielly’s biggest shift from a year ago. The coaches want him putting more pucks on the net, and he has responded in dramatic fashion, with 2.8 shots a game compared to 1.3 in his rookie year despite similar ice time.

Even more impressively, Rielly leads all NHL defencemen in generating shot attempts, with 21.6 per 60 minutes at even strength, meaning he’s getting a look at the net roughly every 2.5 minutes he’s on the ice.

He’s winding up more frequently than not only every Leafs defenceman but every Leaf, including shot demon Phil Kessel, something that’s helping drive Toronto to respectable totals on the shot clock most nights.

Entering [the October 31] game against the injury-plagued Blue Jackets in Columbus, the Leafs have been outshot, but only by one: 281-280.

"I told myself this year that I would shoot more," Rielly said. 

Well, isn't that exactly the kind of thing Corsi skeptics should be looking for? It's evidence that coaching decisions can affect shot quantity and quality -- in other words, Corsi and SH%.

It's a small sample size -- the Leafs had played only nine games when Mirtle's piece came out -- but let's see what happens if we take Rielly's numbers at face value and make a few estimates.

Assume Rielly gets 20 minutes of ice time per game. If 80 percent of that is at even strength, it's 16 minutes at 5-on-5. Let's call it 15 to make the calculations easier.

Since he's generating 21.6 even-strength Corsis per 60 minutes, that's 5.4 even-strength Corsis per 15-minute game.*  

*I'm assuming that the "shot attempts" in the article refers to Corsi. If it refers to Fenwick, the effect is even larger than what I'm about to calculate, because the denominators are smaller (since Fenwick leaves out blocked shots).

Rielly's shots roughly doubled since last year, so let's assume his Corsis doubled too. That means his increase from last year must be about 2.7 Corsis per game. 

Last year, those extra 2.7 Rielly shot attempts would have been passes or stickhandles. Assuming half those attempts would have eventually resulted in shots by other players, the increase due to Rielly's shooting is down to 1.3. 

How significant is 1.3 Corsis per game?  In 2013-14, the Leafs were out-Corsied 4,342 to 3,259 at even strength, giving them a league-worst 42.9 Corsi percentage. If you add in 107 Corsis to the "for" side (1.3 times 82 games), it's now 4,342 to 3,366. That would bump Toronto to 43.7 percent. Now, only second worst.
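Here's the whole chain of estimates in one place (every input is one of the rough assumptions above, so treat the output as a sketch, not a measurement):

```python
# Rough estimate of how Rielly's extra shooting moves the Leafs' team Corsi%.
per_game = 21.6 * 15 / 60   # 5.4 attempts per game, at 21.6 per 60 over 15 ES minutes
increase = per_game / 2     # attempts assumed doubled, so the increase is 2.7
net_increase = 1.3          # half of that, assuming half would have become teammates' shots

extra = round(net_increase * 82)   # ~107 extra Corsis over a full season
cf, ca = 3259, 4342                # Leafs' 2013-14 even-strength Corsi for/against
print(f"Corsi%: {cf / (cf + ca):.1%} -> {(cf + extra) / (cf + extra + ca):.1%}")
# Corsi%: 42.9% -> 43.7%
```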

It's not huge, but it's something that would indeed show up in the stats. And, according to Mirtle, it's something that's due to a deliberate coaching decision. 

How big would the effect be if the coaches decided everyone should shoot more, instead of just one defenseman whose minutes comprise only about six percent of total player ice-time?

-------

Also, you would think those extra shots would have to result in a reduction in shooting percentage, right?  Last year, when Rielly wasn't shooting as much, it was probably because he thought he could set up a better quality shot some other way. And, I would assume, Rielly's shots are taken farther from the net than average, since defensemen usually play the point. 

You could come up with a scenario where shot quality wouldn't drop ... maybe shots from the point lead to a lot of juicy rebounds, so long shots lead to a certain number of extra dangerous shots. Sure, that's possible. But I doubt that that effect, or any other, would make up the quality difference completely. If there were *never* a tradeoff between quantity and quality, every team would be shooting all the time. So, there must be some level of "dangerousness" above which a point shot is a good idea, and below which a pass is better. For shot quality to stay the same when Rielly shoots more, all his new shots would have to come in situations where not only was shooting the best move, but the shot was SO dangerous from the point that it was higher quality than even the best alternative from closer in.

That's unlikely to be happening if Rielly now leads the league in shot attempts by defensemen. There aren't that many ultra-super-dangerous shot opportunities, never mind ultra-super-dangerous shot opportunities that Rielly wouldn't have taken advantage of last year.

-----

As I write this, it's only eleven Leaf games into the season, which is a very small sample size.  But I checked anyway.  (Here's a YTD link that may be outdated if you're reading this later than today.)

In those 11 games, the Leafs have an above-average Corsi at 50.9% in 5-on-5 tied situations. But they've scored only 5 goals in 121 shots. That's a shooting percentage of 4.13%, dead last in the league.  

There's not enough data for that to really be meaningful, but it's interesting nonetheless.




(There are seven parts. Part VI was previous. This is Part VII.)


Tuesday, October 14, 2014

Corsi, shot quality, and the Toronto Maple Leafs, part VI

A year ago, I wrote about how I wasn't completely sold on Corsi and Fenwick as unbiased indicators of future NHL success. In a series of five posts (one two three four five), I argued that it did appear that "shot quality" issues could be a big factor -- if not for all teams, then maybe at least for some of them, like, perhaps, the Toronto Maple Leafs.

I haven't kept up with hockey sabermetrics as much as I should have, but, as far as I know, the issue of how much shot quality impacts Corsi remains unresolved.

In that light, and in hopes that I haven't rediscovered the wheel, here's some more evidence I came across that seems to suggest shot quality might be a bigger issue than even I had suspected.

It's from a post at Hockey-Graphs, where Garret Hohl looked at some shot quality statistics for every NHL team, for approximately the first 30 road games of last season (2013-14). 

His data came from Greg Sinclair's "Super Shot Search," which plots every shot on goal on the ice surface. Sinclair's site allows you to restrict your search to what he calls "scoring chances," which are shots taken from closer in. Specifically, a "scoring chance" is defined as a shot on goal taken from within the pentagon formed by the midpoint of the goal line, the faceoff dots, and the tops of the two circles. 

Hohl calculated, for every team, what percentage of opposing shots were close-in shots. (He limited the count to 5-on-5 situations in road games, in order to reduce power-play and home-scorer biases.)  Data in hand, he then ran a regression to see how well a team's "regular" Fenwick corresponded to its "scoring chances only" Fenwick. His chart shows what appears to be a strong relationship, with a correlation of 0.83. 

However: the biggest outlier was ... Toronto. 

Just as in the previous two seasons, the Leafs continued to outperform their Fenwick in 2013-14. What Hohl has done is produce data showing that the effect resulted, at least in part, from their opposition taking lower-quality shots. 

----

Anyway, the Leafs are really just a side point. What struck me as much more important are some of the other implications of the data Hohl unearthed. Specifically, how teams varied so much in those opponent scoring chances. The differences were much, much larger than I expected.

I'll steal Hohl's chart:

[Chart from Hohl's post: each team's percentage of opposition shots that were scoring chances.]
The Minnesota Wild defense was the best at limiting their opponents to weaker shots: only 32.3 percent of their shots allowed were from in close (206 of 637). The New York Islanders were the worst, at 61.4 percent (475 of 773). 

Shot for shot, the Islanders gave up twice as many close-in chances as the Wild. 

Could this be luck?  No way. The average number of shots in Hohl's table is around 750. If the average scoring-chance ratio is 44 percent, the SD from binomial luck should be around 1.8 percentage points. That would put the Islanders around 9 SD from the mean, and the Wild 7 SD from the mean. 

The observed SD in the chart is 5.6 percentage points. Since luck and real differences are independent, their variances add, which means the breakdown is:

1.8 SD of theoretical luck
5.3 SD of real differences
--------------------------
5.6 SD as observed
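Here's that calculation, assuming luck and real differences are independent, so their variances add:

```python
import math

shots = 750   # approximate average shots against per team in Hohl's table
p = 0.44      # average scoring-chance ratio

sd_luck = 100 * math.sqrt(p * (1 - p) / shots)   # binomial SD, in percentage points
sd_observed = 5.6                                # observed SD across teams

# Independent components add in quadrature: observed^2 = luck^2 + real^2.
sd_real = math.sqrt(sd_observed**2 - sd_luck**2)
print(f"luck SD = {sd_luck:.1f}, real SD = {sd_real:.1f}")   # ~1.8 and ~5.3
```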

Now, the "real" differences might be score effects: shooting percentages rise when a team is ahead, presumably because they take more chances and give up more odd-man rushes, and such. Those effects are large enough that they screw up a lot of analyses, and I wish more of those little studies you find on the web would limit themselves to 5-on-5 tied to avoid those biases.

But, in this case, the differences are too big to just be caused by score effects.

In 5-on-5 situations from 2007-2013, the league shooting percentage was 7.52 percent when teams were tied, but 9.19 percent for teams ahead by 2 goals or more. As big a difference as that is, the Islanders can't have been behind by 2+ goals often enough for it to make such a huge difference in scoring chances.

From my calculations, the difference between the Islanders and Wild is something that would happen naturally only if the Islanders were *always* down 2+ goals, and the Wild were *always* up 2+ goals.** But that obviously isn't the case. In fact, the Islanders were down 2+ goals only about 10 percent more often than the Wild last year, and up 2+ goals only 21 percent less often. The two differences sum to about eight periods out of a full 5-on-5 road season.

(** How did I figure that?  Suppose the shooting percentage on close shots is 13%, and 4% on far shots. At 45 percent close and 55 percent far, you get a shooting percentage of 8.1% percent. At 65 percent close, and 35 percent far, shooting percentage rises to 9.8%. That's a little bigger than the difference between up 2+ and tied.

So, it seems like, when you're up 2+ goals, 60 to 65 percent of your shots are scoring chances, compared to 35 to 40 percent when you're down 2+ goals.)
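In code, the footnote's arithmetic (the 13 and 4 percent close/far shooting percentages are my assumptions, as stated):

```python
# Blended shooting percentage for a given close-shot mix.
def blended_sh_pct(close_share, close_pct=13.0, far_pct=4.0):
    """Weighted-average SH% given the share of shots taken from in close."""
    return close_share * close_pct + (1 - close_share) * far_pct

print(blended_sh_pct(0.45))   # 8.05 -- about 8.1%
print(blended_sh_pct(0.65))   # 9.85 -- about 9.8%
```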

------

As for the Leafs: they were fourth-best in the league in the percentage of shots allowed that were scoring chances, at 38.2 percent. That's despite -- or because of? -- allowing the most shots, by far, of any team in the sample, at 926. (The second highest was Washington, at 843.)

It seems to me like this is significant evidence that teams vary in the quality of shots they allow -- in a huge way. The score effects can't be THAT large.

The only possibility that I can think of is biased scorers. But Hohl confirms that each team had an assortment of opposition home team scorers and rinks, so that shouldn't be happening.

-----

Here's some additional evidence that the scoring chance data is meaningful. 

I ran a correlation between team scoring chance percentage and goalie save percentage. If scoring chance percentage didn't matter, the correlation would be low. If it did matter, it would be high. (For save percentage, I used 5-on-5, tie score, both home and road.)

The correlation turned out to be ... -0.44. That's pretty high. (Especially considering that the scoring chance percentage was based on only 30 road games per team.)  

The SD of save percentage was 0.96 percentage points. The SD of scoring chance percentage (after 3/4 of the season) was 5.6 points. 

That means for every excess percentage point of scoring chance percentage, you have to adjust save percentage by 0.075 percentage points. 

The Los Angeles Kings' opponents took close-in shots at a rate a bit more than 3 percentage points below average. That had the effect of inflating their goalies' save percentage by about 0.25 points. So, we can estimate that their "true talent" was closer to 93.45 than 93.7. 

If you like, think of it as two or three points of PDO: the Kings move from 1000 to 997.5 on this adjustment. 

For Toronto, it's five points: they drop from 1019 to 1014. 

The Rangers, for one more example, went the other way -- they gave their opponents 8 percentage points more close-in shots than average. Adjusting for that would boost their adjusted save percentage from 91.6 to 92.2, and their PDO from 974 to 980.
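The 0.075 figure is just the regression slope implied by the correlation and the two SDs; here's a quick check, with the Rangers example:

```python
r = -0.44      # correlation: opponents' scoring-chance% vs. save%
sd_sv = 0.96   # SD of save percentage, in points
sd_sc = 5.6    # SD of scoring-chance percentage, in points

slope = r * sd_sv / sd_sc
print(f"slope = {slope:.3f}")   # -0.075 points of SV% per point of chance%

# Rangers: 8 points more close-in shots than average costs about 0.6 of SV%.
print(f"Rangers adjustment: +{8 * abs(slope):.1f} SV% points (91.6 -> 92.2)")
```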

-----

OK, one more bit of evidence, this time subjective.

Recently, a survey from nhl.com ranked the best goalies in the league, from 1 to 14, with 15-18 mentioned in the footnotes. (I'm leaving out John Gibson, who only played one regular-season game, and I'm considering goalies not mentioned to have a ranking of 19.)

I checked the correlation between team goalie ranking and save percentage. It was -0.45. Again, that's pretty strong, considering how subjective the rankings are. 

Of course, some goalies were probably ranked high *because* of their save percentage. So cause and effect are partly mixed up here (but I think that will actually strengthen this argument).

For the next step, I adjusted each goalie's save percentages to give credit for the quality of the shots their team faced. That is, I raised or lowered their SV% for the shot quality percentages listed in Hohl's post, at the rate of 0.075 points we discovered earlier. 

What happened?  The correlation between ranking and SV% got *stronger* -- moving from -0.45 to -0.50. 

It looks like the voters "saw through" the illusion in save percentage caused by differing shot quality. Well, that might be giving them too much credit: they might have ignored save percentage entirely, and just concentrated on what they saw with their eyes. Actually, I'm probably giving them too *little* credit: they're no doubt basing their evaluations on a full career, not just one season, and maybe team shot quality evens out somewhat in the long run.

Either way, when the voters differed from SV%, it was in the direction of the goalies who faced tougher tasks.  I think that's reasonable evidence that differences in shot quality are real. 

Oh, and one more thing: the highest correlation seems to occur almost exactly at the theoretical adjustment the regression picked out, 0.075. When I drop the adjustment in half (to 0.0375), the correlation drops a bit (-0.48, I think). When I double the adjustment to 0.15, the correlation drops to -0.44. 

Now, that *has* to be coincidence; the voters can't be that well calibrated, can they? And ranking numbers of 1 to 19 are kind of arbitrary.

Still, it does work out nicely, that the voters do seem to agree with the regression.

------

I think all this casts serious doubt on the idea that PDO (the sum of team shooting percentage and save percentage) is essentially random. The Islanders had a league-worst PDO of 982, but that's probably because their opponents took 61.4% of their shots from close-in, compared to the Islanders' own 42.8%. In other words, if you calculate a "shot quality PDO", the Islanders come in at 814. (That's calculated as 428 + (1000-614).)

The Leafs had the league's fourth best PDO, at 1019. But their shots were much higher quality than their opponents', 47.2% to 38.2%. So their "shot quality PDO" was 1090. 

For all 30 teams, the correlation between PDO and "shot quality PDO" was 0.43 -- significantly high. The coefficient works out to approximately a 1:10 ratio. The Islanders' -186 point "shot quality PDO" difference translates to around -19 points of PDO. The Leafs' +90 works out to about +9.
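Spelled out in code (the function name is mine; the construction follows the text):

```python
# "Shot quality PDO": own close-shot rate, in PDO-style tenths of a percent,
# plus 1000 minus the opponents' close-shot rate.
def shot_quality_pdo(own_close_pct, opp_close_pct):
    return round(own_close_pct * 10 + (1000 - opp_close_pct * 10))

isles = shot_quality_pdo(42.8, 61.4)   # 428 + 386 = 814
leafs = shot_quality_pdo(47.2, 38.2)   # 472 + 618 = 1090

# The coefficient is roughly 1:10, so divide the difference from 1000 by 10:
print((isles - 1000) / 10)   # -18.6 -> about -19 points of PDO
print((leafs - 1000) / 10)   # +9.0
```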

I'll show data and work out more details in a future post (probably next week, I'm out of town for a few days starting tomorrow). 

(One thing that's interesting, that I want to look into, is that the SD of team quality shot percentage *for* is only about half of the SD of quality shot percentage *against* (2.7 versus 5.6). Does that mean that defenses vary more than offenses? Hmmm...)

------

So I think all of this constitutes strong evidence that teams differ non-randomly in the quality of shots they allow. That doesn't invalidate the hypothesis that Corsi is still a better predictor of future success than goals scored. But it *does* suggest that you can likely improve Corsi by adjusting it for shot quality. And it *does* suggest that PDO isn't random after all.

In other words: Corsi might be misleading for teams with extreme shot quality differences.

A baseball analogy: using Corsi to evaluate NHL teams is like using on-base percentage to evaluate MLB teams. Some baseball teams will do much better or worse than their "OBP Corsi" for non-random reasons -- specifically, if they have high "hit quality" by hitting lots of home runs, or low "hit quality" by building their "OBP Corsi" on "lower quality" walks.

In 2014, the Orioles were fifth-worst in the American League with an OBP of only .311. But they were above average in runs scored. Why?  Mostly because they hit more home runs than any other team, by a wide margin.

Might the Toronto Maple Leafs be the Baltimore Orioles of the NHL?



(There are seven parts. Part V was previous. This is Part VI. Part VII is next.)

