Do NHL referees call "make up" penalties? Part II
Last post, I found that referees are likely to "even up" their penalty calls: they're around 50% more likely to give the other team the next power play than to give the same team two power plays in a row.
I wasn't convinced this is because of referee bias, or what Tango calls the "compassionate referee."
Tango suggested this experiment: check to see if a power play goal was scored on the first penalty. If the referee is indeed "compassionate" towards the other team, he should be more compassionate if the penalty actually cost them a goal, less so if there was no goal, and even less so if the penalized team *benefited* from the penalty by scoring shorthanded.
So I checked. I looked at all cases where there was a power play goal (PPG) on a first penalty, and then no more scoring until the next penalty was called. Indeed, that does appear to make the ref more compassionate.
After a penalty resulting in a PPG, the next penalty was of the "even-up" variety 65.9% of the time. That's higher than the overall rate of 59.7%. Repeating that in a better font:
65.9% after a PPG
59.7% overall rate
And the effect reverses, as predicted, for shorthanded goals (SHG), where the even-up rate drops below the overall rate:
52.5% after an SHG
59.7% overall rate
It's a large effect, and exactly in the direction Tango predicted.
But wait! It might not be referee bias at all. It turns out that teams with a lead take significantly more penalties than teams that are behind. For instance, when a penalty is called while you have a one-goal lead, there's a 55.2% chance the penalty goes against you (and so a 44.8% chance the penalty goes against the other team). Full chart:
55.2% of penalties to team leading by 1
58.2% of penalties to team leading by 2
59.0% of penalties to team leading by 3
59.4% of penalties to team leading by 4
59.7% of penalties to team leading by 5
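A chart like this one could be tabulated along the following lines. This is only a minimal sketch: the record layout (one tuple per penalty, holding the penalized team's goals and the other team's goals at the time of the call) and the function name are assumptions for illustration, and the sample records are made up, not real NHL data.

```python
from collections import defaultdict

def penalty_share_by_lead(penalties):
    """Share of penalties called on the leading team, keyed by lead size.

    `penalties` is an iterable of (penalized_team_goals, other_team_goals)
    tuples giving the score when each penalty was called. This layout is
    a hypothetical one chosen for the sketch.
    """
    against_leader = defaultdict(int)
    total = defaultdict(int)
    for penalized_goals, other_goals in penalties:
        lead = abs(penalized_goals - other_goals)
        if lead == 0:
            continue  # a tied game has no leading team
        total[lead] += 1
        if penalized_goals > other_goals:
            against_leader[lead] += 1  # the call went against the leader
    return {lead: against_leader[lead] / total[lead] for lead in sorted(total)}

# Made-up toy records, purely to show the shape of the input
sample = [(1, 0), (1, 0), (0, 1), (2, 0), (0, 2), (2, 0), (1, 1)]
shares = penalty_share_by_lead(sample)
for lead, share in shares.items():
    print(f"{share:.1%} of penalties to team leading by {lead}")
```

Restricting the input to first-period penalties before calling the function would give the period-specific version of the same table.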
So, the score effect could explain what we're seeing. After a power play goal, the scoring team has a bigger lead (or smaller deficit) than before. That would make it likely to take more penalties in future, even if the referee wasn't compassionate at all.
(Of course, the score effect might itself be due to referee "compassion," but that's a whole other argument.)
Specifically: a power play goal makes the team 6 percentage points more likely to take the next penalty. But scoring ANY tiebreaking goal in the first period makes a team 5 percentage points more likely to take the next penalty. So how can we be sure there's a separate power-play effect, or how big it is?
What might also complicate things is there's a "time of game" effect:
42,721 PPs came in the first period.
38,060 PPs came in the second period.
26,705 PPs came in the third period.
There are fewer penalties in the third period than in the first. Is that a separate period effect? It might be.
Here's the score effect chart, again, but this time only for first-period penalties. The effect is more extreme than for the entire game:
55.6% of penalties to team leading by 1
60.1% of penalties to team leading by 2
61.0% of penalties to team leading by 3
66.5% of penalties to team leading by 4
58.7% of penalties to team leading by 5 (only 46 datapoints)
It almost looks like we need a regression to sort all this out. But, wait! One more try before we turn to the dark side. Let's engineer a comparison where score and period won't screw things up.
I took every situation where:
1. It was the first period.
2. The game was tied at the time of the first penalty, and exactly one additional goal was scored before the second penalty.
3. The one extra goal was scored by the team that had the power play on the first penalty.
Then, I divided those situations into two groups.
The "Highest compassion" group is where the team scored the goal *on the power play*, presumably making the referee feel extra bad that he caused the goal. The "Average compassion" group is where the team scored the goal *after* the power play, and the referee's call wasn't the cause.
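For anyone who wants to try the same split on their own data, the three filters and the two groups could be sketched roughly as below. The `PenaltyPair` record and all of its field names are hypothetical, invented for this illustration. They are not the layout of my actual data, and the toy records are made up.

```python
from dataclasses import dataclass

@dataclass
class PenaltyPair:
    """Two consecutive penalties in one game (hypothetical layout)."""
    period: int
    tied_at_first_penalty: bool
    goals_between: int              # goals scored between the two penalties
    scorer_had_power_play: bool     # the goal came from the team given the first power play
    goal_on_power_play: bool        # the goal was scored during that power play
    second_penalty_evened_up: bool  # second penalty went against the other team

def even_up_rates(pairs, period=1):
    """Return {group: (even-up rate, sample size)} for the two groups."""
    counts = {"highest": [0, 0], "average": [0, 0]}  # [even-ups, total]
    for p in pairs:
        if p.period != period or not p.tied_at_first_penalty:
            continue  # conditions 1 and 2: right period, tied game
        if p.goals_between != 1 or not p.scorer_had_power_play:
            continue  # conditions 2 and 3: exactly one goal, by the power-play team
        group = "highest" if p.goal_on_power_play else "average"
        counts[group][1] += 1
        counts[group][0] += int(p.second_penalty_evened_up)
    return {g: (e / n if n else None, n) for g, (e, n) in counts.items()}

# Made-up toy records, purely illustrative
pairs = [
    PenaltyPair(1, True, 1, True, True, True),
    PenaltyPair(1, True, 1, True, True, False),
    PenaltyPair(1, True, 1, True, False, True),
    PenaltyPair(2, True, 1, True, True, True),   # dropped: wrong period
    PenaltyPair(1, False, 1, True, True, True),  # dropped: not tied
    PenaltyPair(1, True, 2, True, True, True),   # dropped: two goals scored
]
rates = even_up_rates(pairs)
```

Because both groups share the same period, the same starting score, and the same one-goal change, any difference between their even-up rates can't come from the score or period effects.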
What percentage of the second penalties went to the other team?
Highest compassion: 71.6% (2163 datapoints)
Average compassion: 69.7% (1051 datapoints).
There's a small effect there, in the expected direction, of 1.9 percentage points. (That's less than 1 SD, so not statistically significant.)
Here's the same result, but the other way, where it's the originally-penalized team that scored before the next penalty. When that goal was scored shorthanded, we can call that "Lowest compassion". When it wasn't, it's again "Average compassion."
Again, what percentage of the time did the second penalty even things out?
Lowest compassion: 62.5% (253 datapoints)
Average compassion: 58.2% (1006 datapoints).
This time the effect goes the "wrong" way, but there's too little data to draw any conclusions.
Doing the same thing for the second period instead of the first, we find a larger difference, but still not statistically significant (1.4 SD):
Highest compassion: 70.4% (568 datapoints)
Average compassion: 65.7% (271 datapoints).
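As a rough check on that "1.4 SD" figure, here is a minimal sketch that treats the two groups as independent samples and divides the difference in rates by its standard error. The unpooled formula and the function name are assumptions for illustration, so the exact method behind the number quoted above may differ.

```python
from math import sqrt

def difference_in_sds(p1, n1, p2, n2):
    """Difference between two proportions, in standard-error units.

    Assumes independent samples and the unpooled standard error
    sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2).
    """
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Second-period numbers from the list above
z_second_period = difference_in_sds(0.704, 568, 0.657, 271)
print(round(z_second_period, 1))  # roughly 1.4
```

With groups this small, a gap of four or five percentage points is still within the range that chance alone could produce.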
And the shorthanded case, which really has too small a sample to take seriously:
Lowest compassion: 51.8% (83 datapoints)
Average compassion: 48.9% (268 datapoints).
So, in summary: yes, there appears to be weak evidence for a small "compassion effect."
In the previous post, I considered three hypotheses:
1. Referee bias
2. Penalized teams play more carefully after the penalty
3. Power play teams play more aggressively after the penalty
Here's a fourth one, a variation of one suggested by commenter Wexler in the previous post:
4. Referees like to let the players play, and dislike calling penalties. But, sometimes they have to assert themselves to make sure the game doesn't get out of hand. Sometimes they're a bit too late, and they have to call a penalty on something that wasn't a penalty two minutes ago. This sends a message to the players, "OK, enough."
That might be necessary, but is obviously unfair to the penalized team. And, so, the referees know they have to call a "make up" penalty on those particular calls. Both teams understand what's happening, and won't object to either that call or the subsequent call.
I don't know if #4 is plausible or not ... but one of my co-workers is a soccer referee, and it's consistent with what he says about having to keep the game under control before it's too late.
As usual, I await comments from readers who know more about this stuff than I do.
UPDATE: Part 3 is here.