## Thursday, November 14, 2013

### Corsi, shot quality, and the Toronto Maple Leafs, part V

In the past four posts, I speculated that NHL teams may vary in shooting percentage partly because they take different quality shots.  I also speculated that, maybe, their shots vary in quality just randomly.

But I hadn't checked whether the numbers would work out consistent with plausible limits on how teams might vary.  So, to check for reasonableness, I created a simulation, and played around with the specifics until I got something that looked reasonably like the 2012-13 season.

The simulation worked like this.  Teams vary in "possession" talent, because some teams are simply better than others.  Their talent is the number of times they get the puck into the opponent's zone for a decent chance to shoot.

Once they get into the zone, they all behave identically.  Sometimes, they shoot right away.  Most of the time, though, they move the puck around trying to get a higher-quality shot.

When they enter the zone, they have a 3.2 percent shooting percentage if they shoot right away.  But, they first decide if they want to shoot or pass.  Randomly, 17 percent of the time, they shoot.  The other 83 percent, they pass.  If they pass, it's successful 70 percent of the time; the other 30 percent, they lose the possession.  If the pass does work, the shot quality improves by the inverse of 70 percent; that is, it goes from 3.2 percent to 4.57 percent (since 4.57 percent is 3.2 divided by .7).

Actually, I set it up so that you keep deciding whether to shoot or pass -- shooting 17 percent of the time, and passing 83 percent.  You could wind up passing 2, 3, 4 or more times before shooting.  I limited it to 7 passes, then you always shoot.  (After 7 successful passes, your shooting percentage is 38.9%.)
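For readers who want to try this themselves, here is a minimal Python sketch of a single possession under the rules just described. It is an illustration only, not the original simulation code; the function name, parameter names, and seed are made up for the example, while the 17%, 70%, 3.2% and 7-pass values come from the text above.

```python
import random

def simulate_possession(rng, shoot_prob=0.17, pass_success=0.70,
                        base_quality=0.032, max_passes=7):
    """Play out one zone entry. Returns (took_shot, scored)."""
    quality = base_quality
    for _ in range(max_passes):
        if rng.random() < shoot_prob:
            break                    # the team decides to shoot now
        if rng.random() >= pass_success:
            return False, False      # pass failed, possession lost
        quality /= pass_success      # completed pass improves the chance
    # Either the team chose to shoot, or it used up all its passes
    # and must shoot.
    return True, rng.random() < quality

rng = random.Random(2013)            # arbitrary seed, for repeatability
n = 100_000
results = [simulate_possession(rng) for _ in range(n)]
shots = sum(1 for took_shot, _ in results if took_shot)
goals = sum(1 for _, scored in results if scored)
print(f"shots per possession: {shots / n:.3f}")
print(f"goals per possession: {goals / n:.4f}")
```

With enough possessions, the goals-per-possession figure should settle near 3.2 percent, whatever the mix of early and late shots turns out to be.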

No matter how many times you try to pass before shooting, your expectation of scoring is the same: 3.2 percent.  If you shoot every time, you score 32 goals per 1000 possessions.  If you pass and then shoot, you wind up with only 700 shots in those same 1000 possessions, but you still score 32 goals on those shots (since your shooting percentage has improved to 4.57%).   And so on.
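The "same expectation" arithmetic is easy to verify. This short sketch assumes a team that always makes exactly the same number of pass attempts before shooting, which is a simplification of the random shoot-or-pass rule; the function name is hypothetical.

```python
BASE_QUALITY = 0.032   # scoring chance on an immediate shot (from the post)
PASS_SUCCESS = 0.70    # chance a pass connects (from the post)

def per_1000_possessions(passes):
    """Shots, shot quality, and expected goals per 1000 possessions
    for a team that always attempts exactly `passes` passes first."""
    shots = 1000 * PASS_SUCCESS ** passes
    quality = BASE_QUALITY / PASS_SUCCESS ** passes
    return shots, quality, shots * quality

for passes in range(8):
    shots, quality, goals = per_1000_possessions(passes)
    print(f"{passes} passes: {shots:6.1f} shots at {quality:6.2%} -> {goals:.1f} goals")
```

Every row comes out to 32 goals, because the factor of 0.7 lost in shots is exactly the factor gained in shot quality.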

It turns out that, under this model, teams will shoot around 419 times out of 1000.  The rest of the times, they'll lose possession while moving the puck around.
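The 419 figure can also be worked out directly, without simulating. A possession ends in a shot if the team chooses to shoot after 0, 1, ..., 6 completed passes, or if it completes all 7 passes and is forced to shoot. A sketch, with a hypothetical function name:

```python
def shot_probability(shoot_prob=0.17, pass_success=0.70, max_passes=7):
    """Chance that a possession ends in a shot rather than a turnover."""
    # Probability of choosing to pass AND completing the pass.
    keep_going = (1 - shoot_prob) * pass_success
    # Chose to shoot after k completed passes, k = 0 .. max_passes - 1.
    chose_to_shoot = sum(shoot_prob * keep_going ** k for k in range(max_passes))
    # Completed every allowed pass, so the shot is automatic.
    forced_shot = keep_going ** max_passes
    return chose_to_shoot + forced_shot

print(f"{shot_probability():.3f}")   # about 0.419
```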

-------

Now, clearly, there WILL be a negative correlation, here, between shots taken and shooting percentage.  Because, no matter how many or how few shots you take, your expectation is still 32 goals.

The inverse relationship is true even though the model chose *randomly* whether to pass or shoot.  If the random numbers come up so that you shoot early, you'll have more shots of lower quality.  If the random numbers come up so that you shoot later, you'll have fewer shots of higher quality.

-------

OK, now, the first result.  I created 2,000 teams, and ran the simulation.  I expected to see a negative correlation between shots and shooting percentage.  But I didn't.  The correlations were always close to zero, and I didn't see any real effect at all.  I think that's because:

1.  Over 1,000 possessions, shot quality evens out enough that there's not much difference between teams.  400 shots with a quality of 8% isn't really that much different than 375 shots with a quality of 8.53%, and those are two teams about one standard deviation apart.

2.  My simulation included quality differences between teams.  On average, they had 1000 possessions, but with a standard deviation of 47, which means the better teams will have significantly more shots than the worse teams.  The "more shots means a better team" effect is much larger than the "more shots means worse shots" effect.

3.  SH% depends on whether a goal goes in, which is just random.  If you have, say, 400 shots with an average quality of 8%, the standard deviation of goals is 5.4, which works out to 1.36 percentage points of shooting percentage.  That's pretty big, compared to the random differences in shot quality that we're looking for.
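The numbers in points 1 and 3 can be checked with the usual binomial formula. This sketch assumes every one of the 400 shots has the same 8% chance of going in, which is a simplification (in the model, shots vary in quality).

```python
import math

shots = 400
quality = 0.08   # assumed identical for every shot

# Binomial standard deviation of goals, then expressed as SH%.
sd_goals = math.sqrt(shots * quality * (1 - quality))
sd_sh_pct = sd_goals / shots

# Point 1: the shot-quality gap between a 400-shot team and a
# 375-shot team, both expecting 32 goals.
quality_gap = 32 / 375 - 32 / 400

print(f"SD of goals: {sd_goals:.2f}")              # about 5.43
print(f"SD of SH%:   {sd_sh_pct:.2%}")             # about 1.36%
print(f"Quality gap: {quality_gap:.2%}")           # about 0.53%
```

The luck in whether shots go in is more than twice the size of the shot-quality gap between the two teams, which is why the quality signal gets swamped.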

So, under this model, I have to admit that real-life differences in shooting percentage aren't due to just random differences in shot quality between teams that would otherwise be the same.

That's not to say that there isn't another model in which this would work -- one involving more breakaways, say.  But, I doubt even a more realistic model would show enough random variation to explain the 2012-13 Toronto Maple Leafs.

--------

Which means: I'm forced to stick with the idea that there are differences between teams.  So, here's what I did to create those differences.

As I said, the model assumed a 17% chance of shooting, versus an 83% chance of attempting a pass.  I made those percentages random.  The average was still 17%, but with a standard deviation of about 1.8 percentage points.  So, around 1 team in 6 would shoot 18.8% of the time (or more), and 1 team in 6 would shoot 15.2% of the time (or less).  That is: some teams like to shoot more, and some teams like to shoot less.
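The "1 team in 6" figure is what you get if the shooting propensity is normally distributed. That distribution is an assumption of this sketch; the text above only gives the mean and standard deviation.

```python
from statistics import NormalDist

# Assumed: team shooting propensity ~ Normal(17%, 1.8 points).
propensity = NormalDist(mu=0.17, sigma=0.018)

share_above = 1 - propensity.cdf(0.188)   # teams shooting 18.8% or more
share_below = propensity.cdf(0.152)       # teams shooting 15.2% or less

print(f"{share_above:.3f} {share_below:.3f}")   # each about 0.159, roughly 1 in 6
```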

Those differences seem small, but they made the effect really come through.  With that one change, I got a negative correlation of about -.21.
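Here is a rough sketch of a league-level version of the model, for anyone who wants to experiment. It is not the original simulation code. It assumes possessions and shooting propensity are both normally distributed, samples each possession's outcome directly from its probabilities instead of stepping through passes one at a time, and uses made-up function names and an arbitrary seed. The correlation it prints should be negative, but it will depend on the seed and on those assumptions, so don't expect it to land exactly on -.21.

```python
import math
import random

BASE_QUALITY = 0.032   # scoring chance on an immediate shot
PASS_SUCCESS = 0.70    # chance a pass connects
MAX_PASSES = 7         # after this many completed passes, always shoot

def outcome_weights(shoot_prob):
    """P(shot after k passes) for k = 0..MAX_PASSES, then P(turnover)."""
    keep_going = (1 - shoot_prob) * PASS_SUCCESS
    weights = [shoot_prob * keep_going ** k for k in range(MAX_PASSES)]
    weights.append(keep_going ** MAX_PASSES)   # forced shot
    weights.append(1 - sum(weights))           # possession lost
    return weights

def simulate_team(rng, possessions, shoot_prob):
    """Return (shots, goals) for one team-season."""
    outcomes = rng.choices(range(MAX_PASSES + 2),
                           weights=outcome_weights(shoot_prob), k=possessions)
    shots = goals = 0
    for k in outcomes:
        if k > MAX_PASSES:
            continue                           # turnover, no shot
        shots += 1
        if rng.random() < BASE_QUALITY / PASS_SUCCESS ** k:
            goals += 1
    return shots, goals

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_league(n_teams=2000, seed=0, shoot_sd=0.018, poss_sd=47):
    """Correlation between team shots and team SH% across a league."""
    rng = random.Random(seed)
    shots_list, sh_pct_list = [], []
    for _ in range(n_teams):
        possessions = max(1, round(rng.gauss(1000, poss_sd)))
        shoot_prob = min(0.99, max(0.01, rng.gauss(0.17, shoot_sd)))
        shots, goals = simulate_team(rng, possessions, shoot_prob)
        if shots:
            shots_list.append(shots)
            sh_pct_list.append(goals / shots)
    return pearson(shots_list, sh_pct_list)

if __name__ == "__main__":
    print(f"identical teams:      {simulate_league(shoot_sd=0.0):+.3f}")
    print(f"varying shoot habits: {simulate_league():+.3f}")
```

Setting `shoot_sd=0.0` gives the earlier version of the model, where every team shoots 17% of the time, so the two printed lines can be compared directly.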

That's smaller than the actual real-life Corsi vs. SH% correlation of -0.24.  But, in the simulation, I'm using shots instead of Corsi.  Last year, the shot/SH% correlation was -0.17, and the year before, it was -0.21.  So, close enough.

(One thing to keep in mind, though, is that the real-life numbers were based on shots for AND shots against.  I'm only using shots for.  So, when we talk about the effect, you should mentally split it between offense and defense.  For instance, instead of thinking about a team that shoots 19 percent of the time instead of 17 percent -- a 2 percentage point difference -- maybe think of it as one point on offense, and one point on defense.)

-------

So, anyway, here are the results of the "shots vs. SH%" regression, as compared to the actual 2012-13 NHL season (5-on-5 tied):

| | Simulation | Real life |
|---|---|---|
| Average team SH% | 7.7% | 7.7% |
| SD of team SH% | 1.42% | 1.60% |
| Average team shots | 420 | 405 |
| SD of team shots | 62 | 58 (SF-SA) |
| Correlation | -0.21 | -0.17 |
| Coefficient | 9.0 | 9.4 |

(Note: some of the "real life" numbers are approximate, but close enough for these purposes.)

It's actually not quite as good a fit as it looks ... in real life, the SD of SH% is significantly higher than in the simulation.  If I had matched them, the correlation would have been much more extreme than -0.21.  I'm not sure how to explain that; it could be random, or it could be something I'm not seeing.

-------

The coefficient of around 9 means that if you take a team like the Leafs that's 3 percentage points "too high" in shooting percentage, you'd expect it to be 27 "too low" in shots.  If you reduce the Leafs' shot difference by 27, it moves them from 45.7% to 47.6% of shots taken.  (That is, 47.6% of all shots taken by the Leafs, and 52.4% by their opponents.)

If Corsis are roughly twice as frequent as shots ... that bumps the difference to 54.  Reducing the Leafs' Corsi difference by 54 moves them from 43.8% to 45.7%.

Either way -- shots or Corsi -- that's a jump of 4 or 5 teams, which seems reasonable.

But is the *model* reasonable?  Is it plausible that teams vary in their willingness to take shots, and/or their ability to affect their opponents' propensity to shoot?  Because, that's what this is about.  If you accept that teams can differ that much in how they shoot, then you have an explanation for at least part of the Leafs' bad Corsi.

I know the model isn't very realistic, but I'm trying to get a feel quantitatively, rather than qualitatively.  If you built your own model, and included breakaways and counterattacks and defensive zone alignment and all kinds of other things, would differences in team strategy be roughly equivalent to what I've done here?

It seems not too unreasonable to me, but, to be honest, I have little expertise in hockey strategy.  As usual, I'll wait to see what you readers say.

(There are seven parts.  Part IV was previous.  This is Part V.  Part VI is next.)


At Thursday, November 14, 2013 4:48:00 PM,  taylorjohnwright said...

Nice work Phil. I've enjoyed this series on the Leafs.

It might be worth having a look at the New Jersey Devils as well. They are essentially the antithesis of the Leafs: strong possession numbers but bad shooting percentage. The same idea of unsustainable play was applied and the assumption was that the Devils would do much better this season since it was unlikely they would be that unlucky again. But the Devils haven't done better and deserve to be every bit the "case study" for hockey analytics as the Leafs.

At Thursday, November 14, 2013 8:51:00 PM,  Anonymous said...

Phil -

Have you seen the stuff on time of possession showing that the Leafs shots correlate really tight to their time spent in the offensive zone?

If what you're suggesting is what's going on, I'd think the Leafs would have way more offensive zone time. Instead, their OZ time seems tightly connected to their Corsi%.

At Saturday, November 16, 2013 12:01:00 AM,  Phil Birnbaum said...

Hi, Tyler,

You'd expect OZ time to be correlated to Corsi for an individual team regardless, wouldn't you? Whether a team shoots X% of the time, or Y% of the time, it still correlates with possessions.

But: does OZ time correlate with "shots per minute of OZ time" for different teams? That would tell us whether some teams take longer to shoot than others.