Saturday, December 31, 2011

Do NHL referees call "make up" penalties?

Among NHL fans, there's a perception that referees like to call "make up" penalties. If a ref has just called a minor penalty on one team, it's very likely that the next penalty will go to the other team.

I was skeptical, until I downloaded a bunch of data from The Hockey Summary Project ... they're like Retrosheet for hockey. (Their website is here, and if you want data downloads, you can join their group by going here.)

I looked at all penalties from 1953-54 to 1984-85 (for which the HSP data is almost complete). I eliminated all cases where both teams got penalties at the same time. Then, I checked what was left, to see if the team that got the current penalty was less likely to get the next one.

Absolutely, very much so. There's a 60% chance the next penalty will go to the other team -- 59.7%, to be more exact. (But, since I'm not sure that database is complete, and I forgot to remove misconducts, and I didn't consider situations where both teams got a penalty but one team got an extra one, I'm happier to drop the decimal and just go with 60%.)
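For anyone who wants to try the same count, here is a minimal sketch of the method as I've described it. The record layout (game id, seconds elapsed, team) is a hypothetical one chosen for illustration, not the actual HSP file format, and pairing penalties only within the same game is also an assumption.

```python
from collections import defaultdict

def other_team_rate(penalties):
    """Count how often the next penalty goes to the other team.

    `penalties` is an iterable of (game_id, seconds_elapsed, team) tuples.
    This layout is hypothetical; the real HSP data may look different.
    Returns (switched, total) over consecutive penalty pairs within a game.
    """
    # game -> time stamp -> set of teams penalized at that time stamp
    games = defaultdict(lambda: defaultdict(set))
    for game_id, seconds, team in penalties:
        games[game_id][seconds].add(team)

    switched = total = 0
    for stamps in games.values():
        # Drop time stamps where both teams were penalized at once.
        # Several penalties to one team at the same instant count once.
        sequence = [
            next(iter(teams))
            for _, teams in sorted(stamps.items())
            if len(teams) == 1
        ]
        for previous, current in zip(sequence, sequence[1:]):
            total += 1
            if current != previous:
                switched += 1
    return switched, total


sample = [
    ("g1", 100, "MTL"), ("g1", 300, "TOR"),
    ("g1", 500, "TOR"), ("g1", 500, "MTL"),  # simultaneous, dropped
    ("g1", 700, "TOR"),
    ("g2", 50, "BOS"), ("g2", 90, "BOS"),
]
switched, total = other_team_rate(sample)
print(f"{100 * switched / total:.1f}% ({switched}/{total})")
```

Like my own run, this sketch does nothing special about misconducts or about cases where both teams were penalized but one got an extra penalty.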

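The effect is reasonably consistent over time, although it was a little stronger back in the six-team era. Here's a too-long chart. Each row gives the season, the percentage of next penalties that went to the other team, and the raw counts (other-team penalties / total).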

1953-54: 62.7 888/1416
1954-55: 61.3 857/1397
1955-56: 60.6 912/1506
1956-57: 61.2 833/1360
1957-58: 58.8 793/1348
1958-59: 61.4 801/1305
1959-60: 62.3 723/1160
1960-61: 61.7 740/1199
1961-62: 62.0 821/1324
1962-63: 62.0 797/1286
1963-64: 61.8 826/1337
1964-65: 61.7 841/1362
1965-66: 58.6 820/1399
1966-67: 58.6 710/1212
1967-68: 60.0 1515/2527
1968-69: 59.5 1666/2800
1969-70: 60.5 1793/2962
1970-71: 60.4 1944/3220
1971-72: 57.8 1918/3317
1972-73: 60.6 2200/3633
1973-74: 58.8 2135/3628
1974-75: 56.7 2873/5069
1975-76: 56.3 2890/5130
1976-77: 57.2 2371/4144
1977-78: 59.1 2316/3916
1978-79: 59.5 2337/3925
1979-80: 59.3 3021/5091
1980-81: 60.1 3780/6293
1981-82: 60.7 3613/5957
1982-83: 60.3 3479/5774
1983-84: 60.2 3788/6296
1984-85: 60.6 3542/5847
Overall: 59.7 58543/98140

Even though the effect is real, we can't say for sure that it's referee bias. It could just be that, after a penalty, the penalized team plays more cautiously, trying to avoid a second penalty. Or, it could be that the team that just had the power play decides to play more aggressively.

(As an aside: why did penalties drop so much between 1975-76 and 1976-77? At first I thought it might be bad data, but then I checked power-play opportunities on Hockey Reference, and it checked out.)

Here's what I think is some relevant evidence. I broke down the stats by referee (minimum 300 datapoints). The database only has the referee named for about a quarter of the total games (mostly older ones), but I figure it's probably good enough to at least look at.

The first column is the main number, the percentage of penalties called against the team who drew the last one.

Pctg Z-sc Size Ref
---- ---- ---- ----------------------------
59.3 00.0 0509 Andy Van Hellemond
60.4 +0.4 0846 Art Skov
59.3 -1.2 1128 Ashley
60.0 00.0 0460 Bill Friday
57.0 -1.1 0537 Bob Myers
58.9 -0.3 0878 Bruce Hood
57.1 -0.9 0580 Bryan Lewis
59.3 -1.6 1453 Buffey
64.6 +1.6 0933 Chadwick
59.4 -0.1 0567 Dave Newell
60.4 -0.3 0356 Farelli
60.1 -0.3 0511 Friday
59.9 -0.1 0696 John Ashley
61.3 +0.8 0359 Lloyd Gilmour
61.3 -0.2 0789 Macarthur
56.7 -1.4 0319 Mehlenbecher
63.0 +0.5 0327 Olinski
60.4 -0.8 1906 Powers
55.2 -2.2 0698 Ron Wicks
63.7 +1.7 1095 Skov
57.7 -3.1 2197 Storey
63.1 +2.7 4717 Udvari
59.9 +0.4 0709 Wally Harris

The least "biased" referee is 55%, and the most "biased" is 64%. If you think it's only referee bias that keeps the numbers from being 50%, you'd have to think that EVERY referee is biased almost exactly the same way. It's hard for me to accept that none of the referees noticed the bias and saw fit to try to eliminate it.

The second column of the table is the Z-score, the number of standard deviations the referee is from expected (which is normalized to the seasons he officiated). Normally, you concentrate on those with at least plus or minus 2 SD. That gives you Red Storey and Ron Wicks (less biased than most) and Frank Udvari (more biased than most).
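Here is a rough sketch of how a Z-score like that can be computed, assuming a plain binomial standard deviation. The expected percentage for each referee depends on the seasons he worked, and those per-referee baselines aren't listed here, so the 61.2 used for Udvari below is a hypothetical value picked only to illustrate the arithmetic.

```python
import math

def z_score(observed_pct, expected_pct, n):
    """Standard deviations between observed and expected percentage.

    Assumes each of the n penalties is an independent binomial trial
    with success probability expected_pct / 100.
    """
    p_obs = observed_pct / 100.0
    p_exp = expected_pct / 100.0
    sd = math.sqrt(p_exp * (1.0 - p_exp) / n)
    return (p_obs - p_exp) / sd


# Udvari: 63.1% over 4717 penalties. The 61.2% baseline is an assumption.
udvari_z = z_score(63.1, 61.2, 4717)
print(round(udvari_z, 1))
```

With that assumed baseline, the result lands close to the +2.7 shown in the table. The large sample is what makes a two-point gap so significant.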

The standard deviation of the Z-scores was 1.29. If every referee were the same, and differences were only random, it would be 1.00. This suggests that there are real differences between referees. Specifically, the SD of referee tendencies (or "talent", you might say) is 0.8 (since 1 squared plus 0.8 squared equals 1.29 squared).

In English, you can perhaps interpret that as saying that the differences in the table are about half real and half random, with a little more random than real (since 1.00 is a little higher than 0.8).
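The arithmetic behind those two paragraphs can be sketched like this. The Z-scores are the rounded values from the table, so their SD comes out a shade different from the 1.29 I got from the unrounded figures. The variance-share line at the end is an extra way of looking at it, not something I calculated above.

```python
import math
import statistics

# Z-scores copied from the referee table (rounded to one decimal).
z_scores = [0.0, 0.4, -1.2, 0.0, -1.1, -0.3, -0.9, -1.6, 1.6, -0.1, -0.3, -0.3,
            -0.1, 0.8, -0.2, -1.4, 0.5, -0.8, -2.2, 1.7, -3.1, 2.7, 0.4]

# Sample SD of the rounded table values (close to the reported 1.29).
observed_sd = statistics.stdev(z_scores)

# Observed variance = random variance + "talent" variance.
# Random variance is 1.0 because Z-scores are in SD units.
reported_sd = 1.29
talent_sd = math.sqrt(reported_sd ** 2 - 1.0 ** 2)

# Share of the observed variance attributable to real differences.
real_share = talent_sd ** 2 / reported_sd ** 2

print(f"SD of table Z-scores: {observed_sd:.2f}")
print(f"talent SD:            {talent_sd:.2f}")
print(f"real share of variance: {real_share:.0%}")
```

Measured by variance instead of SD, the split tilts a bit further toward random, which fits the "a little more random than real" reading.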

The observed range is 55 to 64. Regressing to the mean, the actual range of referee tendencies is probably 57 to 62, or something like that.

So if you think it's referee bias, you have to explain why all the referees seem to be biased within such a tight range, especially, when, presumably, they are all working hard to be as unbiased as possible.


Here's another interesting breakdown, by time since the previous penalty:

0:01 to 1:00: 69.1 (7000)
1:01 to 2:00: 64.7 (9444)
2:01 to 3:00: 68.5 (11778)
3:01 to 4:00: 64.2 (10574)
4:01 to 5:00: 61.2 (8831)
5:01 to 6:00: 59.7 (7470)
6:01 to 7:00: 58.9 (6328)
7:01 to 8:00: 58.3 (5333)
8:01 to 9:00: 56.6 (4399)
9:01 to 10:00: 55.5 (3719)
10:01 to 11:00: 56.7 (3000)
11:01 to 12:00: 55.5 (2591)
12:01 to 13:00: 53.8 (2193)
13:01 to 14:00: 55.2 (1837)
14:01 to 15:00: 53.3 (1565)
15:01 to 16:00: 53.1 (1376)
16:01 to 17:00: 53.1 (1135)
17:01 to 18:00: 52.4 (1019)
18:01 to 19:00: 51.9 (807)
19:01 to 20:00: 53.6 (757)
20:01 to 99:99: 51.8 (3883)

The longer the interval since the previous penalty, the less likely the next penalty will go to the other team. That's consistent with many theories. The "referees are biased" theory would say that referees "forget" to even things up as the game goes on. The "other team wants revenge and plays aggressively" theory would say that if they don't get revenge early, they don't need it as much later. And the "penalized team takes fewer chances" theory would say that as time goes on, the players "forget" that they have to be more careful.

So, the data doesn't help us choose, but it's interesting nonetheless.

By the way, the 1:01 to 2:00 group is an exception to the pattern, but that's probably due to power plays, since the first penalty is probably still in effect. Actually, I'd have expected that part to go the other way, with the first two minutes being *more* than 50 percent, on the logic that the shorthanded team playing in the defensive zone is more likely to be forced to take a penalty. But, that doesn't happen.

And here's an interesting breakdown of the first half of the first group:

81.9% within 5 seconds
78.1% between 6 and 10 seconds
76.0% between 11 and 15 seconds
73.8% between 16 and 20 seconds
69.3% between 21 and 25 seconds
67.0% between 26 and 30 seconds.


Finally, one more question: after one team gets, say, four straight penalties, what happens then? Is there an even stronger bias for the other team to take the next penalty?


57.1 after exactly 1 in a row (64858 datapoints)
64.0 after exactly 2 in a row (23850)
66.6 after exactly 3 in a row (7042)
66.0 after exactly 4 in a row (1781)
63.8 after exactly 5 in a row (442)
60.6 after exactly 6 in a row (127)
67.5 after 7 or more in a row (40)
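A sketch of how that streak breakdown can be tallied for one game's penalty sequence follows. The input format (an ordered list of team labels with simultaneous penalties already removed) and the function name are assumptions for illustration.

```python
from collections import Counter

def streak_breakdown(sequence, cap=7):
    """Tally next-penalty outcomes by current streak length.

    `sequence` is an ordered list of team labels, one per penalty.
    Streaks of `cap` or longer share one bucket ("7 or more").
    Returns {streak_length: (switched, total)}.
    """
    totals = Counter()
    switched = Counter()
    previous = None
    run = 0  # consecutive penalties to `previous` so far
    for team in sequence:
        if previous is not None:
            bucket = min(run, cap)
            totals[bucket] += 1
            if team != previous:
                switched[bucket] += 1
        run = run + 1 if team == previous else 1
        previous = team
    return {k: (switched[k], totals[k]) for k in sorted(totals)}


game = ["A", "A", "B", "A", "A", "A", "B"]
for length, (sw, tot) in streak_breakdown(game).items():
    print(f"after {length} in a row: {sw}/{tot} went to the other team")
```

Summing these per-game tallies over every game would give a table like the one above.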


So: what's going on? Any ideas?

UPDATE: Part 2 is here. Part 3 is here.



At Saturday, December 31, 2011 4:47:00 PM, Anonymous Anonymous said...

Pardon me if I'm being too linear here, but doesn't this make perfect sense, since a penalized team has fewer players to commit more penalties?

At Saturday, December 31, 2011 4:48:00 PM, Blogger Phil Birnbaum said...

Good point, never thought of that!

Still, that's true in the first two minutes, but not afterwards.

At Saturday, December 31, 2011 6:47:00 PM, Anonymous matskralc said...

(As an aside: why did penalties drop so much between 1975-76 and 1976-77? At first I thought it might be bad data, but then I checked power-play opportunities on Hockey Reference, and it checked out.)

The only rule change I can find for the 76-77 season is that fighting was amended to give an extra major and game misconduct to any player who clearly instigated a fight.

At Sunday, January 01, 2012 12:26:00 AM, Anonymous Wexler said...

1)"Among NHL fans, there's a perception that referees like to call "make up" penalties. If a ref has just called a minor penalty on one team, it's very likely that the next penalty will go to the other team."

Phil, as a lifelong hockey fan, this is not quite a good definition of what a make-up call is. I'd defer if other observers disagreed with me here, but my understanding of what is termed a make-up call is a penalty called on one team at a lower standard in order to even out a marginal or mistaken call made on the opposing team. i.e., A make-up call is explicitly defined as a referee bias issue, not just the fact that penalties tend to alternate btw teams as you show in the data.

It's the same phenomenon as in baseball. An ump isn't much more likely to call a strike for Gio Gonzalez just because he just called a ball. But sometimes you'll see an ump clearly get fooled by some wicked movement and, in the moment, miss that a curveball dropped into the strike zone. If they then realize they might have missed the call (or if they are influenced by players or fans complaining), they will have a stronger tendency to call the next borderline pitch a strike.

Because so many difficult snap judgments are required of umps/refs, they are bound to make some mistakes which they realize after making the call. Or, as a slight alteration, if they make a few close calls in one direction consecutively, they intuit that there's a greater chance that they made a mistake somewhere along the way, so they lower their threshold for making a call in the opposite way to make up for their mistake. I think this is somewhat tolerated in baseball, and is to a greater extent in hockey where a single penalty call has a much larger effect on the game than a single ball/strike call.

I strongly suspect that the effect you show in the data is indeed a make-up call effect- ie, a ref bias effect.

2)"If you think it's referee bias, you have to explain why all the referees seem to be biased within such a tight range, especially, when, presumably, they are all working hard to be as unbiased as possible."

I don't think they are working hard to be as unbiased as possible in each call, though they do seek to be or appear to be unbiased over the course of an entire game. I think they both consciously & subconsciously lower their standards for calling a penalty on one team after they have called a marginal penalty on the opposing team. I think that this practice is accepted and encouraged by the hockey world (coaches, players, owners, fans, & league personnel) to just about exactly the extent that it occurs (in the global sense, if not in a given instance).

If there were less make-up calls, coaches and fans et al would be upset because they would too often find games where a team won due to refs calling penalties that everyone (refs included) recognized afterwards were marginal to mistaken. People know that these calls will happen due to humans' natural limitations, so they accept referees occasionally restoring balance when they realize they've accidentally tilted the ice/field towards one team. Specifically, the hockey world allows refs to make calls that hew to the vaguely drawn rule book even if they don't hew to the standard the ref has set forth in the game to that point.

If make-up calls happened at a higher rate, it would be perceived as refs completely abandoning integrity as they'd be calling penalties on plays that were clearly not violations of the rule book.

At Sunday, January 01, 2012 12:27:00 AM, Anonymous Wexler said...

3)"I'd have expected that part to go the other way, with the first two minutes being *more* than 50 percent, on the logic that the shorthanded team playing in the defensive zone is more likely to be forced to take a penalty"

This is evidence for the referee bias story, no? It seems highly unlikely that a team that has a man advantage and that for the most part is not having to defend in its own end would be called for the penalty 69% of the time one is called- which is the case in the first min of their pp. It makes sense under the ref bias scheme that refs would be most likely to try and make their make-up calls right after the mistaken call so as to negate the advantage they conferred with the pp- IOW, to end the pp before a goal is unfairly tallied. If they cannot reasonably call a penalty on the pp team in the 1st minute, we'd expect the % to go down in the second minute of the pp (as you show that it does), because there is less of a chance of the pp team scoring now so the motivation to call a penalty is lessened. However, we'd still expect the observed % to jump after the pp has ended- even though the motivation to make a make-up call is lessened- since the refs will have many more opportunities to do so now that the pp team has to defend.

This strikes me as entirely consistent with the ref bias story, and inconsistent with the other two proposed explanations. There's just no way a power play team is actually committing more penalties than a short handed team.

At Sunday, January 01, 2012 1:09:00 AM, Anonymous Wexler said...

Here's two suggestions for testing the theory that the effect is caused primarily by referee bias:

1)Minors vs Double Minors & Majors

Break down the %'s into calls made after minors vs after double minors and majors (excluding fighting majors which are coincidental).

Presumably, refs are going to be more sure of themselves when calling double minors than when calling minors. It's more clear cut when a guy's stick comes up to hit a player in the face & make him bleed or when a guy boards someone from behind leaving the player prone on the ice than when a guy gets his stick in a position to hook or trip a player.

If most of the effect that you show is ref bias, then we should see the %'s (of alternating penalties) be lower after double minors and majors than minors, since refs will be less likely to feel they need to make a make-up call.

One problem with this is that refs still might have some subconscious compulsion to even up the minutes a bit since they imposed such a large penalty on one team. But I doubt that would be terribly large and I suspect we will see slightly lower %'s.

2)Home vs Road Team Make Up Calls

Find the % of the time that visiting teams are whistled for the next minor after the home team has been whistled for a minor. Then compare that to the rate that visiting teams are whistled for minors after they received the last minor, and to the % of the time visiting teams are hit with a minor overall.

We'd expect visiting teams to be whistled more than 50% in all cases (partly because home teams play better and partly because the refs are surrounded by 18,000 screaming partisans holding their 4th bottle of Labatt Blue in their hands). But if the visiting team is whistled most often in the case that a home team has recently been awarded a penalty, that would be consistent with the idea that the home crowd bias is amplified- IOW, that the refs are feeling extra pressure to even up the calls.

Actually, if we accept that there is a home crowd bias at all, then this gives us a prior. We should be more willing to accept the theory that penalties alternate because referees are subject to these kinds of biases.

At Sunday, January 01, 2012 11:48:00 AM, Blogger Phil Birnbaum said...

Thanks for the comments, Wexler!

1. Right, "make-up call" is probably not the best term for "follow-up call that happens to be made to the other team." Maybe I should have used some other term.

2. That's a plausible theory. If I understand you right, you're saying that sometimes refs make a call that's marginal, and so they try to make an equally-marginal (but not *more* marginal) call on the other team when they get a chance.

I like it! But, is the ratio reasonable? For every 100 follow up calls, 20 extra ones go to the other team (60 vs. 40). That means that there are 20 "extra" calls to the other team. That means that 20 out of 100 original calls would have to be so marginal that the ref chooses to even them up.

It would actually have to be MORE than that. Because, what if either team takes a "real" penalty before the ref has a chance to even it out? That wouldn't show up in the "next penalty" stat, but it would still lead to an "other team" penalty later, since the ref still feels he has to call it. So it's more than 20%.

And, wouldn't we see referees varying much more in how many marginal penalties they feel they've called? They seem a little too homogeneous for it to be a subjective decision on their part.

But, maybe not.

3. Yes, I agree that the PP numbers are consistent with the ref bias theory.

Another possibility: maybe refs, in general, give more leeway to the team killing a penalty? That is, there are different unwritten standards for the PP team vs. the PK team.

At Sunday, January 01, 2012 2:53:00 PM, Blogger Phil Birnbaum said...

Let me correct my logic, I think.

Suppose there are 80 original penalty calls. Because the referee thinks some of them were marginal, he makes an extra marginal call for the other team. So, there are 80 subsequent calls, 40 for each team ... but then we know there are 20 more for the other team, because the ratio is 60:40.

So, 25% of all original calls must be marginal, if the ref is making extra calls: 20 for every 80.
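That arithmetic generalizes to any observed split. Here is a small sketch, assuming (as in the example above) that without make-up calls the follow-ups would divide evenly, and that every extra call goes to the other team. The function name is just for illustration.

```python
def extra_call_fraction(other_team_share):
    """Fraction of original calls that would need an evening-up call.

    With N baseline follow-ups split N/2 each and E extra calls to the
    other team, the other-team share is p = (N/2 + E) / (N + E).
    Solving for E/N gives (p - 0.5) / (1 - p).
    """
    return (other_team_share - 0.5) / (1.0 - other_team_share)


# The 60:40 split from the post: 20 extra calls for every 80 originals.
print(f"{extra_call_fraction(0.60):.0%}")
```

At an even 50:50 split the formula gives zero, as it should.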

