Thursday, May 03, 2018

NHL referees balance penalty calls between teams


Michael Lopez found, in a tweet, that the next penalty in an NHL game is significantly less likely to go to the team that's had more penalties so far in the game.

That was new to me. A few years ago, I found that the next penalty is less likely to go to the team that took the (one) most recent penalty, but I hadn't realized that quantity matters, too.

(My previous research can be found here: part one, two, three.)

So, I dug out my old hockey database to see if I could extend Michael's results. All the findings here are based on the same data as my earlier study: regular-season NHL games from 1953-54 to 1984-85, as provided by the Hockey Summary Project as of the end of 2011.

-------

Quickly revisiting the old finding: referees do appear to call "make-up" penalties. The team that got the benefit of the most recent power play is almost 50 percent more likely to have the next call go against them. That team got the next penalty 59.7% of the time, versus only 40.3% for the previously penalized team.

39599/98167 .403 -- team last penalized
58568/98167 .597 -- other team
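The "almost 50 percent" figure is simply the ratio of the two counts above. A quick check in Python, using only the numbers in the table:

```python
# Counts from the table above: which team drew the next penalty
last_penalized = 39599   # team that took the most recent penalty
other_team = 58568       # team that got the most recent power play
total = last_penalized + other_team

share_other = other_team / total        # share of next calls against the other team
ratio = other_team / last_penalized     # how much more likely the other team is

print(f"other team share: {share_other:.3f}")
print(f"relative likelihood: {ratio:.2f}")
```

The ratio comes out to about 1.48, which is where "almost 50 percent more likely" comes from.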

Now, let's look at total numbers of penalties instead. I've split the data into home and road teams, because road teams do get more penalties -- 52 percent vs. 48 percent overall.  (That difference is mitigated by the fact that referees balance out the calls. The first penalty of the game goes to the road team 54 percent of the time. The drop from 54 percent for the first call, down to 52 percent overall, is due to the referees balancing out the next call or calls.)

So far, nothing exciting. But here's something: it turns out that the *second* call of the game is much more likely than average to be a make-up call:

.703 -- visiting penalty after home penalty
.297 -- home penalty after home penalty

.653 -- home penalty after visiting penalty
.347 -- visiting penalty after visiting penalty

Those numbers are huge. Overall, there are more than twice as many "make-up" calls as "same-team" calls.
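The "more than twice" comparison has to weight the two orders by how often each one happens. Here's a rough check, assuming the 54 percent share of first penalties going to the road team (quoted earlier) is the right mixing weight:

```python
# Assumption: share of first penalties that go to the road team (54 percent)
p_first_visitor = 0.54
p_first_home = 1 - p_first_visitor

# Second-penalty make-up rates from the table above
makeup_after_home = 0.703     # visitors penalized after a home penalty
makeup_after_visitor = 0.653  # home team penalized after a visiting penalty

# Overall make-up share is the weighted mix of the two orders
makeup = p_first_home * makeup_after_home + p_first_visitor * makeup_after_visitor
same_team = 1 - makeup
print(f"make-up share: {makeup:.3f}, make-up to same-team ratio: {makeup / same_team:.2f}")
```

That works out to roughly 68 percent make-up calls against 32 percent same-team calls, a bit over 2 to 1.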

In this case, quantity and recency are the same thing. Let's move on to the third penalty of the game, where they can be different.  From now on, I'll show the results in chart form:

.705 0-2 
.462 1-1
.243 2-0

Here's how to read the chart: when the home team has gone "0-2" in penalties (that is, both previous penalties went to the visiting team), it gets 70.5% of third penalties. When the previous two penalties were split, the home team gets 46.2%, close to its overall average. When the home team got both previous penalties, though, it gets the third only 24.3% of the time (in other words, the visiting team gets 75.7%).
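Charts like this come from a simple tally. Here's a minimal sketch, assuming each game is stored as a string of 'H'/'V' penalties in order (a hypothetical format, not necessarily how the Hockey Summary Project data is laid out); the function name `tabulate` is made up for illustration:

```python
from collections import defaultdict

def tabulate(games, n):
    """For the n-th penalty of each game, tally how often it went to the home
    team, keyed by the home-visitor record of the previous n-1 penalties.

    games: iterable of strings such as "VHHV", one character per penalty in
    order ('H' = home team penalized, 'V' = visiting team penalized).
    """
    counts = defaultdict(lambda: [0, 0])  # (home, visitor) record -> [home calls, total]
    for seq in games:
        if len(seq) < n:
            continue  # this game never reached an n-th penalty
        prior = seq[: n - 1]
        record = (prior.count("H"), prior.count("V"))
        counts[record][1] += 1
        if seq[n - 1] == "H":
            counts[record][0] += 1
    # Convert to (home share, sample size), ordered 0-x first, like the charts
    return {
        f"{h}-{v}": (home / total, total)
        for (h, v), (home, total) in sorted(counts.items())
    }

# Toy input, for illustration only
print(tabulate(["VVH", "HVV", "VHH", "HH"], 3))  # {'0-2': (1.0, 1), '1-1': (0.5, 2)}
```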

Here's the fourth penalty. I've added sample sizes, in parentheses.

.701 0-3 (755)
.559 1-2 (6951)
.373 2-1 (5845)
.261 3-0 (468)

It's a very smooth progression, from .701 down to .261, exactly what you would expect given that make-up calls are so common. 

Here's the fifth penalty:

.677 0-4 ( 195)
.619 1-3 (3244)
.465 2-2 (6950)
.351 3-1 (2306)
.316 4-0 ( 117)

That's the chart that corresponds to Michael Lopez's tweet, and if you scroll back up you'll see that these numbers are pretty close to his.

Sixth penalty:

.667 0-5 (  48)
.637 1-4 (1182)
.520 2-3 (4930)
.413 3-2 (4134)
.323 4-1 ( 773)
.226 5-0 (  31)

Again, the percentages drop every step ("monotonically," as they say in math).
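To check that property mechanically rather than by eye, a small helper (hypothetical, just for illustration) can test whether each home share is strictly lower than the one before it, here run on the sixth-penalty chart:

```python
# Sixth-penalty chart from above: (home-visitor record, home share)
sixth = [("0-5", .667), ("1-4", .637), ("2-3", .520),
         ("3-2", .413), ("4-1", .323), ("5-0", .226)]

def is_monotonic_decreasing(rows):
    """True if each home share is strictly lower than the one before it."""
    shares = [share for _, share in rows]
    return all(a > b for a, b in zip(shares, shares[1:]))

print(is_monotonic_decreasing(sixth))  # True
```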

Seventh penalty:

.692 0-6 (  13)
.585 1-5 ( 369)
.577 2-4 (2528)
.489 3-3 (4140)
.399 4-2 (1798)
.379 5-1 ( 219)
.200 6-0 (  13)

Eighth penalty:

.667 0-7 (   3)
.607 1-6 ( 122)
.588 2-5 ( 969)
.527 3-4 (2721)
.422 4-3 (2414)
.374 5-2 ( 652)
.412 6-1 (  68)
.000 7-0 (   1)

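Still an almost perfect pattern: the only exception is 6-1, which rises to .412 on just 68 games (the 0-7 and 7-0 rows have almost no data). The pattern breaks up a little more for the ninth penalty, but that's probably just small sample size.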

.000 0-8 (   1)
.553 1-7 (  38)
.586 2-6 ( 348)
.566 3-5 (1358)
.484 4-4 (2063)
.392 5-3 (1037)
.340 6-2 ( 191)
.333 7-1 (  21)
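To get a feel for how much noise the small rows carry, the binomial standard error of an observed proportion p over n games is sqrt(p(1-p)/n). A quick sketch applied to a few rows from the eighth- and ninth-penalty charts (this treats each game as an independent trial, which is an assumption):

```python
import math

def std_error(p, n):
    """Binomial standard error of an observed proportion p from n trials."""
    return math.sqrt(p * (1 - p) / n)

# (label, home share, sample size) from the eighth- and ninth-penalty charts
rows = [("8th, 6-1", .412, 68), ("8th, 5-2", .374, 652),
        ("9th, 1-7", .553, 38), ("9th, 2-6", .586, 348)]
for label, p, n in rows:
    print(f"{label}: {p:.3f} +/- {std_error(p, n):.3f}")
```

The 6-1 row comes out around plus or minus .06, larger than its .038 gap over the 5-2 row, and the 1-7 row is around plus or minus .08, so both blips are well within what chance alone could produce.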

(This is getting boring, so here's a technical note to break the monotony. I included all penalties, including misconducts. I omitted all cases where both teams took a penalty at the same time, even if one team took more penalties than the other. In fact, I treated those as if they never happened, so they don't break the string. This may cause the results to be incorrect in some cases: for instance, maybe Boston takes a minor, then there's a fight and Montreal gets a major and a minor while Boston gets only a major. Then, Montreal takes a minor. In that case, the study will treat the Montreal minor as a make-up call, when it's really not. I think this happens infrequently enough that the results are still valid.)
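Here's a sketch of that filtering rule, assuming a hypothetical event list of (game time in seconds, team) pairs per game. Any moment when both teams were penalized is dropped entirely; penalties to a single team at the same moment are kept one by one, which is my assumption, since the note doesn't say how those were counted:

```python
from itertools import groupby

def penalty_sequence(events):
    """Build an 'H'/'V' penalty string for one game, skipping simultaneous calls.

    events: list of (game_time_seconds, team) tuples sorted by time, where
    team is 'H' or 'V'. Any time at which both teams were penalized is
    dropped entirely, as if it never happened, so it doesn't break the string.
    Assumption: penalties to just one team at the same time are all kept.
    """
    seq = []
    for _, group in groupby(events, key=lambda e: e[0]):
        teams = [team for _, team in group]
        if len(set(teams)) > 1:
            continue  # both teams penalized at once: ignore the whole group
        seq.extend(teams)
    return "".join(seq)

# The Boston/Montreal scenario, with Boston as the (hypothetical) home team
events = [(310, "H"), (720, "V"), (720, "V"), (720, "H"), (930, "V")]
print(penalty_sequence(events))  # HV
```

The output "HV" is exactly the misleading pattern described in the note: the Montreal minor looks like a make-up call.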

I'll give two more cases. Here's the twelfth penalty:

.692 2-9 ( 13)
.623 3-8 ( 61)
.532 4-7 (250)
.506 5-6 (478)
.488 6-5 (459)
.449 7-4 (198)
.457 8-3 ( 35)
.200 9-2 (  5)

Almost perfect. But ... the pattern does seem to break down later on, at the 14th to 16th penalties (I stopped at 16), probably due to sample size issues. Here's the fourteenth, which I think is the most random-looking of the bunch. You could almost argue that it goes the "wrong way":

.000  2-11 (  1)
.375  3-10 (  8)
.333  4- 9 ( 27)
.516  5- 8 ( 95)
.438  6- 7 (169)
.480  7- 6 (148)
.465  8- 5 ( 71)
.577  9- 4 ( 26)
.600 10- 3 (  5)

Still, I don't think this threatens the overall conclusion that quantity is a factor in make-up calls.

------

OK, so now we know that quantity matters. But couldn't that mean that recency doesn't matter? We did find that the team with the most recent penalty was less likely to get the next one -- but that might just be because that team is also more likely to have a higher quantity at that point. After all, when a team takes three of the first four penalties, there's a 75 percent chance* it also took the most recent one. 

(* It's actually not 75 percent, because make-up calls make the sequence non-random. But the point remains.)
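Under the simplifying assumption that every ordering is equally likely (which, as the footnote says, isn't really true), the 75 percent comes from counting the possible orders:

```python
from itertools import product

# All orderings of four penalties in which team A took exactly three
seqs = [s for s in product("AB", repeat=4) if s.count("A") == 3]

# Fraction of those orderings in which A also took the most recent penalty
share_last = sum(s[-1] == "A" for s in seqs) / len(seqs)
print(len(seqs), share_last)  # 4 0.75
```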

So, maybe the recency effect is just an illusion caused by the quantity effect. Or vice versa.

So, here's what I did: I broke down every row in every table by which team got the most recent call. It turns out: recency does matter.
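That breakdown just adds the most recent penalty to the tally key. A minimal sketch, again assuming games stored as 'H'/'V' strings in order (a hypothetical format) and a made-up function name:

```python
from collections import defaultdict

def tabulate_by_last(games, n):
    """Tally the n-th penalty keyed by the home-visitor record of the first
    n-1 penalties AND which team took the (n-1)-th one.

    games: strings of 'H'/'V' in penalty order (hypothetical format).
    Returns {(record, last_team): (home share, sample size)}.
    """
    if n < 2:
        raise ValueError("need at least one earlier penalty")
    counts = defaultdict(lambda: [0, 0])  # (record, last) -> [home calls, total]
    for seq in games:
        if len(seq) < n:
            continue  # game never reached an n-th penalty
        prior = seq[: n - 1]
        key = (f"{prior.count('H')}-{prior.count('V')}", prior[-1])
        counts[key][1] += 1
        counts[key][0] += int(seq[n - 1] == "H")
    return {key: (home / total, total) for key, (home, total) in counts.items()}

# Toy input, for illustration only: splits the 1-3 row by most recent call
print(tabulate_by_last(["VVVHH", "HVVVV", "VVVHV"], 5))
```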

Let's take that 3-for-4 example I just used:

.619 home team overall     (3244)
---------------------------------
.508 after VVVH            ( 486)
.639 after other sequences (2758)

From this, it looks like both effects are at work. When the home team is "up 3-1" in penalty advantage, it gets only 51 percent of the fifth penalties if its own penalty was the last of the four. That's still more than the 46.1 percent it gets to start the game, or the 46.5 percent it would get if it had been 2-2 instead of 3-1.
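Since the two breakdown rows split the same 3,244 games, the overall share should be their sample-weighted average. That also gives a consistency check against the 1-3 row of the fifth-penalty chart (.619, 3244 games):

```python
# Rows from the 1-3 breakdown above: (home share, sample size)
after_vvvh = (0.508, 486)
after_other = (0.639, 2758)

n_total = after_vvvh[1] + after_other[1]
# Weighted average of the two rows recovers the overall home share
overall = (after_vvvh[0] * after_vvvh[1] + after_other[0] * after_other[1]) / n_total
print(n_total, round(overall, 3))  # 3244 0.619
```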

This seems to be true for most of the breakdowns, maybe even all the ones with large enough sample sizes. I'll just arbitrarily pick one to show you ... the ninth penalty, home team 5-3.

.392 home team overall     (1037)
---------------------------------
.362 when most recent was H (743)
.469 when most recent was V (294)

Even better, here's the entire chart for the eighth penalty: overall, when the last penalty went to the home team ("last H"), and when the last penalty went to the visiting team ("last V").

overall   last H    last V    H-V
----------------------------------
 .607      .750      .596      1-6 
 .588      .477      .609      2-5 
 .527      .446      .584      3-4 
 .422      .372      .518      4-3 
 .374      .357      .466      5-2 
 .412      .406      .500      6-1 

Clearly, both recency and quantity matter. Holding one constant, the other still follows the "make-up penalty" pattern. 

Can we figure out *how much* is recency and *how much* is quantity? It's probably pretty easy to get a rough estimate with a regression. I'm about to leave for the weekend, but I'll look at that next week. Or you can download the results (spreadsheet here) and do it yourself.
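For anyone who wants to try, here's one way to set up that regression, as a sketch rather than the analysis itself: a linear probability model with a penalty-differential term (quantity) and a most-recent-penalty indicator (recency), fit by weighted least squares on cell-level data. The cell layout, the function name, and the choice of a linear (rather than logistic) model are all assumptions for illustration:

```python
def fit_wls(cells):
    """Weighted least squares of home share on penalty differential and recency.

    cells: list of (diff, last_home, share, n), where diff = home penalties
    minus visiting penalties so far, last_home = 1 if the home team took the
    most recent penalty (else 0), share = fraction of next penalties that went
    to the home team, and n = games in the cell. Weighting cell means by n
    gives the same coefficients as a linear probability model fit to the
    individual games. Returns (intercept, b_diff, b_last).
    """
    # Accumulate the 3x3 normal equations X'WX b = X'Wy
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for diff, last_home, share, n in cells:
        x = (1.0, diff, last_home)
        for i in range(3):
            rhs[i] += n * x[i] * share
            for j in range(3):
                A[i][j] += n * x[i] * x[j]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    b = [0.0] * 3
    for i in reversed(range(3)):
        b[i] = (rhs[i] - sum(A[i][j] * b[j] for j in range(i + 1, 3))) / A[i][i]
    return tuple(b)
```

You'd feed it one row per (record, last-penalty) cell from the spreadsheet; the two slopes then give rough, comparable estimates of the quantity and recency effects. A logistic model would be the more standard choice for a yes/no outcome, but the linear version is easier to read.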



