Tuesday, January 03, 2012

Do NHL referees call "make up" penalties? Part II

Last post, I found that referees are likely to "even up" their penalty calls: they're around 50% more likely to give the other team the next power play than to give the same team two power plays in a row.

I wasn't convinced this is because of referee bias, or what Tango calls the "compassionate referee."

Tango suggested this experiment: check to see if a power play goal was scored on the first penalty. If the referee is indeed "compassionate" towards the other team, he should be more compassionate if the penalty actually cost them a goal, less so if there was no goal, and even less so if the penalized team *benefited* from the penalty by scoring shorthanded.

So I checked. I looked at all cases where there was a power play goal (PPG) on a first penalty, and then no more scoring until the next penalty was called. Indeed, that does appear to make the ref more compassionate.

After a penalty resulting in a PPG, the next penalty was of the "even-up" variety 65.9% of the time. That's higher than the overall rate of 59.7%. Repeating that in a better font:

65.9% after a PPG
59.7% overall rate

And the corresponding effect appears, in the opposite direction, for shorthanded goals (SHG):

52.5% after an SHG
59.7% overall rate

It's a large effect, and exactly in the direction Tango predicted.
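
For anyone who wants to check the arithmetic, here's a minimal sketch that turns the three quoted rates into two things: the gap from the overall rate in percentage points, and how many times more likely an even-up call is than a same-team call. The function names are my own, made up for illustration; the only inputs are the percentages quoted above.

```python
# Even-up rates quoted in the post: percent of "next" penalties
# that went to the other team.
OVERALL = 59.7
AFTER_PPG = 65.9
AFTER_SHG = 52.5


def even_up_ratio(pct):
    """How many times more likely an even-up call is than a same-team call."""
    return pct / (100.0 - pct)


def gap_vs_overall(pct):
    """Difference from the overall rate, in percentage points."""
    return round(pct - OVERALL, 1)


for label, pct in [("overall", OVERALL),
                   ("after PPG", AFTER_PPG),
                   ("after SHG", AFTER_SHG)]:
    print(f"{label:10s} ratio={even_up_ratio(pct):.2f} "
          f"gap={gap_vs_overall(pct):+.1f}")
```

The overall ratio comes out a little under 1.5, which is where the "around 50% more likely" figure comes from.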


But wait! It might not be referee bias at all, because it turns out that teams with a lead take significantly more penalties than teams who are behind. For instance, when a penalty is called while you have a one-goal lead, there's a 55.2% chance the penalty goes against you (and so a 44.8% chance the penalty goes against the other team). Full chart:

55.2% of penalties to team leading by 1
58.2% of penalties to team leading by 2
59.0% of penalties to team leading by 3
59.4% of penalties to team leading by 4
59.7% of penalties to team leading by 5
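
A chart like this could be tabulated roughly as follows. This is only a sketch under my own assumptions: the input format (one pair of scores per penalty, penalized team first) and the function name are hypothetical, and the sample data is made up, not taken from the real dataset.

```python
from collections import defaultdict


def leader_penalty_share(penalties):
    """Share of penalties that went to the leading team, by size of lead.

    penalties: iterable of (penalized_score, opponent_score) at the time
    of each call. Penalties called with the game tied are skipped, since
    there is no leader.
    """
    to_leader = defaultdict(int)
    total = defaultdict(int)
    for penalized_score, opponent_score in penalties:
        lead = abs(penalized_score - opponent_score)
        if lead == 0:
            continue
        total[lead] += 1
        if penalized_score > opponent_score:
            to_leader[lead] += 1
    return {lead: to_leader[lead] / total[lead] for lead in sorted(total)}


# Made-up toy data, NOT the post's dataset.
sample = [(1, 0), (1, 0), (0, 1), (2, 0), (0, 2), (3, 1), (1, 1)]
print(leader_penalty_share(sample))
```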

So, the score effect could explain what we're seeing. After a power play goal, the team has a bigger lead (or smaller deficit) than before. That would make it likely to take more penalties in future, even if the referee wasn't compassionate at all.

(Of course, the score effect might itself be due to referee "compassion," but that's a whole other argument.)

Specifically: a power play goal makes the team 6 percentage points more likely to take the next penalty. But scoring ANY tiebreaking goal in the first period makes a team 5 percentage points more likely to take the next penalty. So how can we be sure there's a separate power-play effect, or how big it is?


What might also complicate things is there's a "time of game" effect:

42,721 PPs came in the first period.
38,060 PPs came in the second period.
26,705 PPs came in the third period.

There are fewer penalties in the third period than in the first. Is that a separate period effect? It might be.
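
To put the period counts on a common scale, here's a quick sketch that converts them to shares of all power plays. It takes the second-period figure to be 38,060, which is an assumption about the intended number; the variable names are mine.

```python
# Power plays by period, as quoted above.
# The second-period count is read as 38,060 (an assumption).
pp_by_period = {1: 42721, 2: 38060, 3: 26705}

total = sum(pp_by_period.values())
shares = {period: count / total for period, count in pp_by_period.items()}

for period, share in shares.items():
    print(f"period {period}: {share:.1%} of power plays")

# Third-period power plays relative to first-period power plays.
third_vs_first = pp_by_period[3] / pp_by_period[1]
print(f"third period has {1 - third_vs_first:.0%} fewer than the first")
```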

Here's the score effect chart, again, but this time only for first-period penalties. The effect is more extreme than for the entire game:

55.6% of penalties to team leading by 1
60.1% of penalties to team leading by 2
61.0% of penalties to team leading by 3
66.5% of penalties to team leading by 4
58.7% of penalties to team leading by 5 (only 46 datapoints)


It almost looks like we need a regression to sort all this out. But, wait! One more try before we turn to the dark side. Let's engineer a comparison where score and period won't screw things up.

I took every situation where:

1. It was the first period.

2. The game was tied at the time of the first penalty, and exactly one additional goal was scored before the second penalty.

3. The one extra goal was scored by the team that had the power play on the first penalty.

Then, I divided those situations into two groups.

The "Highest compassion" group is where the team scored the goal *on the power play*, presumably making the referee feel extra bad that he caused the goal. The "Average compassion" group is where the team scored the goal *after* the power play, and the referee's call wasn't the cause.
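
The selection and grouping rules above can be sketched like this. Everything about the record layout is hypothetical (the `Situation` fields and function names are invented for illustration), and this is not the code used for the study; it only shows one way the three conditions and the two groups might be expressed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Situation:
    """One pair of consecutive penalties (hypothetical record layout)."""
    period: int                     # period of the first penalty
    tied_at_first_penalty: bool     # game was tied when the first call came
    goals_between: int              # goals scored between the two penalties
    scorer_had_power_play: bool     # goal by the team given the first PP
    goal_on_power_play: bool        # goal came during that power play
    second_penalty_evened_up: bool  # second penalty went to the other team


def compassion_group(s: Situation) -> Optional[str]:
    """Return 'highest', 'average', or None if the situation is excluded."""
    if s.period != 1 or not s.tied_at_first_penalty or s.goals_between != 1:
        return None
    if not s.scorer_had_power_play:
        return None
    return "highest" if s.goal_on_power_play else "average"


def even_up_rates(situations):
    """Fraction of second penalties that evened things up, per group."""
    counts = {}
    for s in situations:
        group = compassion_group(s)
        if group is None:
            continue
        hits, n = counts.get(group, (0, 0))
        counts[group] = (hits + int(s.second_penalty_evened_up), n + 1)
    return {group: hits / n for group, (hits, n) in counts.items()}
```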

What percentage of the second penalties went to the other team?

Highest compassion: 71.6% (2163 datapoints)
Average compassion: 69.7% (1051 datapoints).

There's a small effect there, in the expected direction, of 1.9 percentage points. (That's less than 1 SD, so not statistically significant.)

Here's the same result, but the other way, where it's the originally-penalized team that scored before the next penalty. When that goal was scored shorthanded, we can call that "Lowest compassion". When it wasn't, it's again "Average compassion."

Again, what percentage of the time did the second penalty even things out?

Lowest compassion: 62.5% (253 datapoints)
Average compassion: 58.2% (1006 datapoints).

This time the effect goes the "wrong" way, but there's too little data to draw any conclusions.

Doing the same thing for the second period instead of the first, we find a larger difference, but still not statistically significant (1.4 SD):

Highest compassion: 70.4% (568 datapoints)
Average compassion: 65.7% (271 datapoints).

And the shorthanded case, which really has too small a sample to take seriously:

Lowest compassion: 51.8% (83 datapoints)
Average compassion: 48.9% (268 datapoints).
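
Here's a rough sketch of how the second-period gaps can be expressed in SDs. The post doesn't say exactly how the SD was computed, so the unpooled standard error of a difference between two proportions used here is an assumption, and the function name is made up.

```python
from math import sqrt


def sd_gap(p1, n1, p2, n2):
    """Difference between two proportions, in units of its standard error.

    Uses the unpooled standard error (an assumption; a pooled version
    would give a very similar answer at these sample sizes).
    """
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# Second-period figures quoted above.
power_play = sd_gap(0.704, 568, 0.657, 271)   # highest vs. average
shorthanded = sd_gap(0.518, 83, 0.489, 268)   # lowest vs. average

print(f"power play case:  {power_play:.1f} SD")
print(f"shorthanded case: {shorthanded:.1f} SD")
```

With these inputs the power play case lands at roughly 1.4 SD, and the shorthanded case at about half an SD, which is why that one is hard to take seriously.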


So, in summary: yes, there appears to be weak evidence for a small "compassion effect."

In the previous post, I considered three hypotheses:

1. Referee bias
2. Penalized teams play more carefully after the penalty
3. Power play teams play more aggressively after the penalty

Here's a fourth one, a variation of one suggested by commenter Wexler in the previous post:

4. Referees like to let the players play, and dislike calling penalties. But, sometimes they have to assert themselves to make sure the game doesn't get out of hand. Sometimes they're a bit too late, and they have to call a penalty on something that wasn't a penalty two minutes ago. This sends a message to the players, "OK, enough."

That might be necessary, but is obviously unfair to the penalized team. And, so, the referees know they have to call a "make up" penalty on those particular calls. Both teams understand what's happening, and won't object to either that call or the subsequent call.

I don't know if #4 is plausible or not ... but one of my co-workers is a soccer referee, and it's consistent with what he says about having to keep the game under control before it's too late.

As usual, I await comments from readers who know more about this stuff than I do.


UPDATE: Part 3 is here.

At Tuesday, January 03, 2012 6:21:00 PM, Anonymous matskralc said...

I officiate middle and high school football, and would agree that #4 is a plausible explanation. A lot of football penalties are procedural (false starts, illegal formations) and so we kind of have to call them (although we'll often give as much leeway as we can). We really don't like to be involved in the game beyond tweeting our whistles and spotting the ball, though. We'd rather let the kids play and will get involved when things are egregious or if things start getting out of hand.

At Wednesday, January 04, 2012 3:03:00 PM, Anonymous J.-P. Martel said...

Another theory could be that referees have instructions from the NHL to "even things up" to keep the game interesting for the fans and the viewers.

That would explain how come the 1975-1977 Philadelphia Flyers weren't spending most of their games short-handed.

In the same vein, referees would also have instructions to penalize the team that is trailing in the score.

In this latter case, there's also the fact that blow-outs can easily lead to situations that get out of hand, so referees may call penalties on the leading team so that the trailing team still thinks it has a chance to come back, rather than resort to fighting to "prepare" the next game between the two teams.

Actually, you may want to check penalties in the second half of the third period when the teams' next game is (or may be, depending on outcome) against each other (particularly in the playoffs), as opposed to when it's not.

Former referee Ron Fournier once stated that he'd been warned by the NHL that he was giving too many penalties, as his average was about 2 minutes per game higher than the average of the other referees. Now, the NHL would probably never admit to asking referees to "stick to the average", so it would not be surprising if they also had "secret" guidelines regarding when and who to penalize.

It would also be interesting to see the average number of penalties handed out per game for each referee, to see if there were harsher referees or, as Fournier's statement suggested, everyone was "convinced" to conform to the "norm".

At Wednesday, January 04, 2012 3:10:00 PM, Blogger Phil Birnbaum said...


I did look quickly at penalties per game, and they varied quite widely between referees. I was surprised that the number varied so much, but the percent of "make-up" penalties varied so little.

Are you suggesting that when the next game is between the same teams, there will be more penalties in the third period, or less?

At Wednesday, January 04, 2012 4:47:00 PM, Anonymous J.-P. Martel said...

I'm suggesting there will be more penalties in the third period when the two teams are playing their next game against one another.

1. The referee(s) may want to make sure that the losing team does not start "sending messages" for the next game.

2. The losing team may still start "sending messages" for the next game, in which case the referee(s) has/have to start calling penalties.

Either way, this would mostly happen when one team wins by a big margin (the bigger the margin, the more penalties).

On your study of referees, it would be interesting to see if, as a referee's career progresses, the number of penalties he calls gets closer to the average. It would also be interesting to see if the careers of those way above (or below) average seemed to suffer from being that far from the average (e.g. officiating fewer games or not being called for the playoffs).

At Wednesday, January 04, 2012 7:51:00 PM, Blogger Phil Birnbaum said...

OK, makes sense. I'll try to run that 3rd period test over the next couple of days.

Thanks, J-P!

At Thursday, January 05, 2012 4:40:00 PM, Blogger tim said...

i don't think you're necessarily looking at the correct referee bias. they don't have compassion per se, but they do like to keep the games close. it's like mario kart catchup code, where the player in first faces a headwind so the rest of the field can catch up easier. this makes the referees look balanced and gives the perception of greater league parity.

At Friday, January 06, 2012 10:18:00 AM, Blogger godhammel said...

I have been a ref for the past 5 years now, and while I agree that referees do not like calling penalties, I, for one, have never called something as a 'make-up' call. But if I am giving teams leeway (which I think most refs do on borderline calls) I will try to give them a warning that they are getting close to a call rather than calling a penalty on a team and making up for the call later. Much like the referee in 24/7 who warned talbot he was going to get a penalty for doing several borderline things instead of one egregious penalty.

That being said, I think that the numbers on my penalties would be around 60% as well. But it's more of the fact that I am giving both teams the same leeway, both teams the same warning, and then both teams ignore the warning. Just one team does the borderline action again first. So I end up calling the first penalty and then the second looks like a ‘make-up’ call. And what I call sometimes wasn’t a penalty two minutes ago, but since I am taking the leeway I was giving to both teams back, both teams get called for a penalty.

That’s just some insight into my own thought process. I don’t know what goes on at the professional level or what other refs do (everybody has a different way of doing it). I will tell you this though, myself and all of the referees I have worked with absolutely hate looking back on a penalty and thinking you got the call wrong. We don’t want to influence the game at all if we could—we are just trying to keep both sides playing fair. That alone keeps me from calling ‘make-up’ calls because I don’t want to call anything that isn’t normally a penalty and end up having that call influence the outcome of the game.

At Friday, January 06, 2012 1:53:00 PM, Anonymous Gadi said...

Great stuff, Phil, thanks for posting this analysis. Godhammel, I think a lot of good refs call "makeup" penalties, but a part of this is the subconscious. The outstanding book "Scorecasting" talked about the inherent bias refs often have towards home teams, because of the pressures on them to give the home team a break (that's super-paraphrasing their argument). The numbers back them up. Look at the Winter Classic and the comments by Torts after the game. I wouldn't have been surprised if somewhere in their subconscious, the refs had the thought that the home team, in this huge game, was down by one and that played a role in the penalties that were called or not called. We're all human, so it's hard to be completely non-biased.


