Tuesday, October 14, 2014

Corsi, shot quality, and the Toronto Maple Leafs, part VI

A year ago, I wrote about how I wasn't completely sold on Corsi and Fenwick as unbiased indicators of future NHL success. In a series of five posts (one two three four five), I argued that it did appear that "shot quality" issues could be a big factor -- if not for all teams, then maybe at least for some of them, like, perhaps, the Toronto Maple Leafs.

I haven't kept up with hockey sabermetrics as much as I should have, but, as far as I know, the issue of how much shot quality impacts Corsi remains unresolved.

In that light, and in hopes that I haven't rediscovered the wheel, here's some more evidence I came across that seems to suggest shot quality might be a bigger issue than even I had suspected.

It's from a post at Hockey-Graphs, where Garret Hohl looked at some shot quality statistics for every NHL team, for approximately the first 30 road games of last season (2013-14). 

His data came from Greg Sinclair's "Super Shot Search," which plots every shot on goal on a diagram of the ice surface. Sinclair's site allows you to restrict your search to what he calls "scoring chances," which are shots taken from closer in. Specifically, a "scoring chance" is defined as a shot on goal taken from within the pentagon formed by the midpoint of the goal line, the faceoff dots, and the tops of the two circles. 

Hohl calculated, for every team, what percentage of opposing shots were close-in shots. (He limited the count to 5-on-5 situations in road games, in order to reduce power-play and home-scorer biases.)  Data in hand, he then ran a regression to see how well a team's "regular" Fenwick corresponded to its "scoring chances only" Fenwick. His chart shows what appears to be a strong relationship, with a correlation of 0.83. 

However: the biggest outlier was ... Toronto. 

Just as in the previous two seasons, the Leafs continued to outperform their Fenwick in 2013-14. What Hohl has done is to produce some data that shows that the effect resulted, at least in part, from their opposition taking lower quality shots. 


Anyway, the Leafs are really just a side point. What struck me as much more important are some of the other implications of the data Hohl unearthed. Specifically, how teams varied so much in those opponent scoring chances. The differences were much, much larger than I expected.

I'll steal Hohl's chart:

The Minnesota Wild defense was the best at limiting their opponents to weaker shots: only 32.3 percent of their shots allowed were from in close (206 of 637). The New York Islanders were the worst, at 61.4 percent (475 of 773). 

Shot for shot, the Islanders gave up nearly twice as many close-in chances as the Wild. 

Could this be luck?  No way. The average number of shots in Hohl's table is around 750. If the average scoring-chance ratio is 44 percent, the SD from binomial luck should be around 1.8 percentage points. That would put the Islanders around 9 SD from the mean, and the Wild 7 SD from the mean. 
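If you want to check that arithmetic, here's a minimal sketch. It assumes every team had exactly 750 shots and a true scoring-chance rate of 44 percent, which are the round numbers quoted above and not the team-by-team figures from Hohl's table. The function name is just mine for illustration.

```python
import math

def binomial_sd(p, n):
    """SD of an observed proportion over n independent shots with true rate p."""
    return math.sqrt(p * (1 - p) / n)

# Round figures from the post: ~750 shots per team, ~44% league-average rate.
sd = binomial_sd(0.44, 750)  # as a fraction; multiply by 100 for percentage points

# Distance from the league mean, in luck SDs, using the rounded 1.8-point SD.
islanders_z = (61.4 - 44.0) / 1.8
wild_z = (32.3 - 44.0) / 1.8

print(round(sd * 100, 1), round(islanders_z, 1), round(wild_z, 1))
```

Under those assumptions the Islanders land between 9 and 10 SD above the mean, and the Wild between 6 and 7 SD below it.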

The observed SD in the chart is 5.6 percentage points. That means the breakdown is:

1.8 SD of theoretical luck
5.3 SD of real differences
5.6 SD as observed
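The three numbers don't add up directly because it's variances that add, not SDs. Here's a small sketch of the subtraction, assuming luck and real team differences are independent of each other:

```python
import math

observed_sd = 5.6  # SD across teams in Hohl's table, in percentage points
luck_sd = 1.8      # binomial SD from the calculation above

# With independent sources, observed variance = luck variance + real variance,
# so the "real" SD is what is left after subtracting in squares.
real_sd = math.sqrt(observed_sd**2 - luck_sd**2)

print(round(real_sd, 1))
```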

Now, the "real" differences might be score effects: shooting percentages rise when a team is ahead, presumably because they take more chances and give up more odd-man rushes, and such. Those effects are large enough that they screw up a lot of analyses, and I wish more of those little studies you find on the web would limit themselves to 5-on-5 tied to avoid those biases.

But, in this case, the differences are too big to just be caused by score effects.

In 5-on-5 situations from 2007-2013, the league shooting percentage was 7.52 percent when teams were tied, but 9.19 percent for teams ahead by 2 goals or more. As big a difference as that is, the Islanders can't have been behind 2+ goals often enough for it to make such a huge difference in scoring chances.

From my calculations, the difference between the Islanders and Wild is something that would happen naturally only if the Islanders were *always* down 2+ goals, and the Wild were *always* up 2+ goals.** But that obviously isn't the case. In fact, the Islanders were down 2+ goals only about 10 percent more often than the Wild last year, and up 2+ goals only 21 percent less often. The total of the two differences is about eight periods total out of a full 5-on-5 road season.

(** How did I figure that?  Suppose the shooting percentage on close shots is 13%, and 4% on far shots. At 45 percent close and 55 percent far, you get a shooting percentage of 8.1%. At 65 percent close, and 35 percent far, shooting percentage rises to 9.8%. That's a little bigger than the difference between up 2+ and tied.

So, it seems like, when you're up 2+ goals, 60 to 65 percent of your shots are scoring chances, compared to 35 to 40 percent when you're down 2+ goals.)
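The footnote's arithmetic is just a weighted average, which can be sketched like this. The 13 percent and 4 percent rates are the supposed values from the footnote, not measured ones, and the function name is mine.

```python
def blended_sh_pct(close_share, close_pct=13.0, far_pct=4.0):
    """Overall shooting % when close_share of shots come from close in.

    The default 13% and 4% are the footnote's assumed rates, not measured ones.
    """
    return close_share * close_pct + (1 - close_share) * far_pct

base = blended_sh_pct(0.45)  # 45% close, 55% far
high = blended_sh_pct(0.65)  # 65% close, 35% far

model_gap = high - base          # gap implied by the shot mix alone
score_effect_gap = 9.19 - 7.52   # league gap, up 2+ versus tied (2007-2013)

print(round(base, 2), round(high, 2), round(model_gap, 2), round(score_effect_gap, 2))
```

A 20-point swing in the share of close shots produces a gap slightly larger than the observed up-2+ versus tied difference, which is the comparison the footnote is making.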


As for the Leafs: they were fourth-best in the league in percentage of shots that were scoring chances, at 38.2%. That's despite -- or because of? -- allowing the most shots, by far, of any team in the sample, at 926. (The second highest was Washington, at 843.)

It seems to me like this is significant evidence that teams vary in the quality of shots they allow -- in a huge way. The score effects can't be THAT large.

The only possibility that I can think of is biased scorers. But Hohl confirms that each team had an assortment of opposition home team scorers and rinks, so that shouldn't be happening.


Here's some additional evidence that the scoring chance data is meaningful. 

I ran a correlation between team scoring chance percentage and goalie save percentage. If scoring chance percentage didn't matter, the correlation would be low. If it did matter, it would be high. (For save percentage, I used 5-on-5, tie score, both home and road.)

The correlation turned out to be ... -0.44. That's pretty high. (Especially considering that the scoring chance percentage was based on only 30 road games per team.)  

The SD of save percentage was 0.96 percentage points. The SD of scoring chance percentage (after 3/4 of the season) was 5.6 points. 

That means for every excess percentage point of scoring chance percentage, you have to adjust save percentage by 0.075 percentage points. 
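For anyone wondering where the 0.075 comes from: in a one-variable regression, the slope is the correlation times the ratio of the two SDs. A quick sketch with the numbers above (the function name is only for illustration):

```python
def regression_slope(r, sd_y, sd_x):
    """Slope of y on x in a simple linear regression: r * SD(y) / SD(x)."""
    return r * sd_y / sd_x

# r = -0.44, SD of save % = 0.96 points, SD of scoring chance % = 5.6 points
slope = regression_slope(-0.44, 0.96, 5.6)

print(round(slope, 3))
```

The sign is negative because more close-in shots against go with a lower save percentage. The size, about 0.075, is the adjustment per percentage point.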

The Los Angeles Kings gave up a bit more than 3 percentage points weaker shots than normal. That had the effect of inflating their goalies' save percentage by about 0.25 points. So, we can estimate that their "true talent" was closer to 93.45 than 93.7. 

If you like, think of it as two or three points of PDO: the Kings move from 1000 to 997.5 on this adjustment. 

For Toronto, it's five points: they drop from 1019 to 1014. 

The Rangers, for one more example, went the other way -- they gave their opponents 8 percentage points more close-in shots than average. Adjusting for that would boost their adjusted save percentage from 91.6 to 92.2, and their PDO from 974 to 980.
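Here's a rough sketch of that adjustment, assuming the 0.075 rate and treating one point of save percentage as ten points of PDO. The function and its argument names are hypothetical. The Kings' input is rounded to exactly 3 points, where the post says "a bit more than 3," so their result comes out slightly different from the figures above.

```python
ADJ_PER_POINT = 0.075  # SV% points per point of scoring chance % (from the regression)

def adjust_for_shot_quality(sv_pct, pdo, chance_pct_vs_avg):
    """Return (adjusted SV%, adjusted PDO).

    chance_pct_vs_avg is the team's opponent scoring chance % minus the
    league average, in points; positive means tougher shots faced.
    """
    delta = ADJ_PER_POINT * chance_pct_vs_avg  # change in SV% points
    return sv_pct + delta, pdo + 10 * delta    # 1 SV% point = 10 PDO points

# Rangers: about 8 points more close-in shots against than average.
rangers_sv, rangers_pdo = adjust_for_shot_quality(91.6, 974, 8.0)

# Kings: about 3 points fewer (rounded; the post says "a bit more than 3").
kings_sv, kings_pdo = adjust_for_shot_quality(93.7, 1000, -3.0)

print(round(rangers_sv, 2), round(rangers_pdo, 1))
print(round(kings_sv, 2), round(kings_pdo, 1))
```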


OK, one more bit of evidence, this time subjective.

Recently, a survey from nhl.com ranked the best goalies in the league, from 1 to 14, with 15-18 mentioned in the footnotes. (I'm leaving out John Gibson, who only played one regular-season game, and I'm considering goalies not mentioned to have a ranking of 19.)

I checked the correlation between team goalie ranking and save percentage. It was -0.45. Again, that's pretty strong, considering how subjective the rankings are. 

Of course, some goalies were probably ranked high *because* of their save percentage. So cause and effect are partly mixed up here (but I think that will actually strengthen this argument).

For the next step, I adjusted each goalie's save percentages to give credit for the quality of the shots their team faced. That is, I raised or lowered their SV% for the shot quality percentages listed in Hohl's post, at the rate of 0.075 points we discovered earlier. 

What happened?  The correlation between ranking and SV% got *stronger* -- moving from -0.45 to -0.50. 

It looks like the voters "saw through" the illusion in save percentage caused by differing shot quality. Well, that might be giving them too much credit: they might have ignored save percentage entirely, and just concentrated on what they saw with their eyes. Actually, I'm probably giving them too *little* credit: they're no doubt basing their evaluations on a full career, not just one season, and maybe team shot quality evens out somewhat in the long run.

Either way, when the voters differed from SV%, it was in the direction of the goalies who faced tougher tasks.  I think that's reasonable evidence that differences in shot quality are real. 

Oh, and one more thing: the highest correlation seems to occur almost exactly at the theoretical adjustment the regression picked out, 0.075. When I drop the adjustment in half (to 0.0375), the correlation drops a bit (-0.48, I think). When I double the adjustment to 0.15, the correlation drops to -0.44. 

Now, that *has* to be coincidence; the voters can't be that well calibrated, can they? And ranking numbers of 1 to 19 are kind of arbitrary.

Still, it does work out nicely, that the voters do seem to agree with the regression.


I think all this casts serious doubt on the idea that PDO (the sum of team shooting percentage and save percentage) is essentially random. The Islanders had a league-worst PDO of 982, but that's probably because their opponents took 61.4% of their shots from close-in, compared to the Islanders' own 42.8%. In other words, if you calculate a "shot quality PDO", the Islanders come in at 814. (That's calculated as 428 + (1000-614).)

The Leafs had the league's fourth best PDO, at 1019. But their shots were much higher quality than their opponents', 47.2% to 38.2%. So their "shot quality PDO" was 1090. 

For all 30 teams, the correlation between PDO and "shot quality PDO" was 0.43 -- significantly high. The coefficient works out to approximately a 1:10 ratio. The Islanders' -186 point "shot quality PDO" difference translates to around -19 points of PDO. The Leafs' +90 works out to about +9.
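The "shot quality PDO" calculation is simple enough to sketch. The 0.1 factor below is the approximate 1:10 coefficient mentioned above, not an exact regression output, and the function names are mine.

```python
def shot_quality_pdo(chance_pct_for, chance_pct_against):
    """PDO-style index: own scoring chance % plus (100% minus opponents'),
    both scaled by 10 so that 1000 is break-even."""
    return round(10 * chance_pct_for + (1000 - 10 * chance_pct_against))

def implied_pdo_points(sq_pdo, ratio=0.1):
    """Approximate PDO effect, using the rough 1:10 ratio from the text."""
    return ratio * (sq_pdo - 1000)

islanders = shot_quality_pdo(42.8, 61.4)  # 428 + (1000 - 614)
leafs = shot_quality_pdo(47.2, 38.2)      # 472 + (1000 - 382)

print(islanders, round(implied_pdo_points(islanders), 1))
print(leafs, round(implied_pdo_points(leafs), 1))
```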

I'll show data and work out more details in a future post (probably next week, I'm out of town for a few days starting tomorrow). 

(One thing that's interesting, that I want to look into, is that the SD of team quality shot percentage *for* is only about half of the SD of quality shot percentage *against* (2.7 versus 5.6). Does that mean that defenses vary more than offenses? Hmmm...)


So I think all of this comprises strong evidence that teams differ non-randomly in the quality of shots they allow. That doesn't invalidate the hypothesis that Corsi is still a better predictor of future success than goals scored. But it *does* suggest that you can likely improve Corsi by adjusting it for shot quality. And it *does* suggest that PDO isn't random after all.

In other words: Corsi might be misleading for teams with extreme shot quality differences.

A baseball analogy: using Corsi to evaluate NHL teams is like using on-base percentage to evaluate MLB teams. Some baseball teams will do much better or worse than their "OBP Corsi", for non-random reasons -- specifically, if they have high "hit quality" by hitting lots of home runs, or low "hit quality" by building their "OBP Corsi" on "lower quality" walks.

In 2014, the Orioles were fifth-worst in the American League with an OBP of only .311. But they were above average in runs scored. Why?  Mostly because they hit more home runs than any other team, by a wide margin.

Might the Toronto Maple Leafs be the Baltimore Orioles of the NHL?

(There are seven parts. Part V was previous. This is Part VI. Part VII is next.)



At Tuesday, October 14, 2014 1:41:00 PM, Blogger Phil Birnbaum said...

1. As always, I may have screwed up any manner of logic or calculations here. Let me know if I've done something wrong.

2. As before, I am NOT saying the Leafs are a good team, just that they might be a better team than Corsi suggests.

3. And I'm saying they MIGHT be. The evidence is suggestive, but not strong enough to say anything definite. Well, I think there's enough evidence to say that Corsi alone isn't enough. I'd say if you had to choose "the Leafs' discrepancies from Corsi are luck" or "the Leafs' discrepancies from Corsi are something real in addition to luck," the evidence favors the latter. But, it's still an open question, as far as I'm concerned.

At Tuesday, October 14, 2014 2:03:00 PM, Blogger Phil Birnbaum said...

Also, here's a post that seems to verify that the Leafs' defense gives up lower-value shots. It concludes that they give up MORE shots because of it, but that's exactly what I think is happening.

The blogger does believe that shot quality matters and the Leafs are doing something different, but concentrates on showing that it doesn't help the Leafs win. Which is not something I disagree with. I think the Leafs are allowing more and weaker shots, but I have no idea if the tradeoff is worth it.

Regardless, "more and weaker" is almost the exact definition of "outperforms Corsi."

At Tuesday, October 14, 2014 11:33:00 PM, Blogger dtm said...

Great post.

"Does that mean that defenses vary more than offenses? "

I wish I knew the source but I am pretty sure it has been shown that GA varies much more season to season than GF on the player level

At Thursday, December 11, 2014 3:49:00 PM, Blogger SkinnyFish said...

I'm the guy who wrote the article on PPP that you reference.

The post says that while the Leafs % of shots close in is lower than most of the league, their raw #s, the more important information there, are not. They're terrible.

This post is fraught with constantly changing between raw #s and %s and not understanding the important differences between them.

At Thursday, December 11, 2014 4:10:00 PM, Blogger Phil Birnbaum said...

Right! The Leafs gave up a lot of shots (Corsis), but relatively weak ones. I'd argue that you need BOTH pieces of information to adequately evaluate them.

Your conclusion supports that! It says the Leafs' system ADDED weak shots. That makes the opponents' Corsi look better, while also making their SH% look worse -- which is exactly my point.

