Sabermetric Research: NHL: Does the two-referee system deter on-ice evildoers?

If criminals are rational, they should commit fewer crimes when the chances of getting caught increase. This is the “economic model of crime,” and economists are fond of looking for data showing that increased enforcement, by raising the fear of getting caught, leads to fewer offenses.

In this 2002 paper, “Testing the Economic Model of Crime: The National Hockey League’s Two-Referee Experiment,” economist Steven Levitt tries to test this theory using NHL hockey data.

For most of NHL history, there was only one referee on the ice. Recently, the league switched to a two-referee system, in order to be able to spot and call more penalties.

Should we expect more penalties, or fewer, in games with the additional referee? Economic theory says that we can’t know just from theory. Either result is possible.

On the one hand, the extra referee means that more offenses will be spotted and called. That means more penalties. On the other hand, the fact that offenses are more likely to be called means that players are less inclined to commit them. That means fewer penalties.

In theory, there’s no way to figure out which effect is larger. Penalties could go up, or they could go down. You have to look at real-life data to find out.

As it happens, there’s an ideal body of data to look at for this question. In the first half of the 1998-99 season, to allow the NHL to evaluate the new system, about one-third of games were played with two referees, and the other two-thirds were played using the old system of only one referee. This unexpectedly provided Levitt with a natural control on which to base his study.

In that sample, an average of 10.33 minor penalties per game were called in the 510 games with one referee, and 10.9 per game in the 270 games with two referees. That’s a smaller difference than I expected – only a 5.5% increase – and it’s only marginally statistically significant (z = 1.9). It appears that, roughly, the two effects cancel each other out: the gain in infractions called is about the same size as the drop in infractions committed as players choose to play nicer. The change in enforcement roughly equals the change in deterrence.

What Levitt wants to do, though, is to break out the two effects separately, to see if the economic theory of crime indeed holds – that is, whether players are indeed responding to the increased enforcement by committing fewer offenses. For that, he wants to isolate the absolute change in number of offenses. And based on only this data, there’s no way to tell.

It could be that players aren’t changing their behavior at all, but the second ref is seeing more of their offenses, causing those extra 5.5% of penalties. It could be that players are committing only half as many fouls as before, but the second ref more than doubles the chance of a foul being spotted. Or it could be any other combination of X% fewer fouls committed and Y% more enforcement, where the combination of X and Y multiplies out to a 5.5% increase in penalties called.
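To make that ambiguity concrete, here is a minimal sketch. It assumes, as the paragraph above does, that penalties called equal fouls committed times the chance of a foul being called, so the observed ratio is the product of the two changes. The function name and the list of scenarios are my own illustration, not anything from Levitt's paper.

```python
# Observed ratio of penalties per game, two referees vs. one (figures quoted above).
OBSERVED_RATIO = 10.9 / 10.33  # roughly 1.055, i.e. a 5.5% increase


def implied_detection_ratio(offense_ratio, observed_ratio=OBSERVED_RATIO):
    """Given a hypothetical ratio of fouls committed (two refs vs. one),
    return the detection-probability ratio needed to match the observed
    penalty ratio, assuming penalties = fouls committed * P(called)."""
    return observed_ratio / offense_ratio


# Hypothetical scenarios: every one of them reproduces the same observed data.
for offense_ratio in (1.0, 0.9, 0.75, 0.5):
    detection = implied_detection_ratio(offense_ratio)
    print(f"fouls x{offense_ratio:.2f}  ->  detection x{detection:.2f}")
```

Every row is consistent with the same 5.5% increase, which is why the penalty counts alone can't separate deterrence from enforcement.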

Separating the two factors out is a hard problem. Levitt thinks about it a bit, and then comes up with a set of assumptions that make it possible. As it will turn out, one of the assumptions is questionable, and so the idea doesn’t quite work – but it’s an intriguing attempt nonetheless.

What Levitt does is assume that all penalties are committed defensively. That is, players will take penalties only in an attempt to prevent the opposition from getting an immediate scoring chance. If players are rational, they will take a penalty only when the expected cost of the offense is less than the benefit of curtailing the scoring chance.

For instance, suppose a single referee calls a penalty only 50% of the time. A penalty has a “linear weight” [my term] of 0.17 goals, Levitt finds. So a hook or a slash costs 0 goals if not called, and 0.17 goals if called, for an expected cost of 0.085 goals. A rational player will therefore commit the hook if and only if it reduces the immediate probability of a goal by more than 0.085.

However, suppose the additional referee increases the chance of getting caught from 50% to 75%. Now, the expected cost of a hook rises to 0.1275 goals. So the defense will hook only when doing so reduces the probability of a goal by 12.75 percentage points or more. With the second referee, then, all those hooks that previously prevented between 8.5 and 12.75 percentage points of scoring chance will no longer happen. That’s the amount of deterrence we want to measure.
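The arithmetic in this example can be sketched in a few lines. The 0.17-goal cost is the figure quoted above; the 50% and 75% call rates are the post's illustrative numbers, not estimates from the paper, and the function names are mine.

```python
PENALTY_COST_GOALS = 0.17  # cost of a called penalty, per the figure quoted above


def expected_foul_cost(p_called, penalty_cost=PENALTY_COST_GOALS):
    """Expected cost in goals of committing a foul that is called with
    probability p_called (an uncalled foul is assumed to cost nothing)."""
    return p_called * penalty_cost


def is_foul_worthwhile(goal_prob_reduction, p_called):
    """A rational defender fouls only if the drop in the opponent's scoring
    probability exceeds the expected cost of the foul."""
    return goal_prob_reduction > expected_foul_cost(p_called)


one_ref = expected_foul_cost(0.50)   # illustrative call rate with one referee
two_refs = expected_foul_cost(0.75)  # illustrative call rate with two referees
print(f"threshold, one referee:  {one_ref:.4f}")
print(f"threshold, two referees: {two_refs:.4f}")

# A hook that prevents a 10% scoring chance pays off with one referee only.
print(is_foul_worthwhile(0.10, 0.50), is_foul_worthwhile(0.10, 0.75))
```

The fouls that get deterred are exactly those whose benefit falls between the two thresholds.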

If that deterrence is actually happening, what evidence would it leave behind?

Answer: more even-strength goals.

Previously, the defense had lots of profitable opportunities to foul the opposition for free, and prevent them from scoring. Now, because of increased enforcement, some of those goal-prevention opportunities are no longer profitable. And so, the defense has to allow the offense more scoring chances, and the opposition scores more often.

(To see that more vividly, consider the extreme case. If penalties are never called, opponents are manhandled off the puck whenever they cross the blue line, and no goals are ever scored. If penalties are always called, opponents are left alone and have a chance to score. So increased enforcement means more goals.)

So what do the data show? Only a 5% increase in even-strength goals. That means, under Levitt’s assumptions, that not many penalties are becoming unprofitable. Furthermore, after doing a bit of algebra, Levitt finds that the numbers imply that the second referee actually leads to an increase in offenses committed! It’s only a 1.7% increase, but it’s still the opposite of what you’d expect – more enforcement should lead to less crime, not more.

But Levitt’s equations also allow him to estimate the change in probability of getting caught. And it’s small, which explains why fouls are committed as often as ever.

That is, under Levitt’s assumptions, the data show that because the second referee didn’t help much in spotting offenses, players didn’t have any reason to stop hooking and slashing. He writes,

“while the result … might superficially argue against the deterrence hypothesis … the true explanation … seems to be that there was no discernible change in the probability of detection.”

Intuitively, that doesn’t make sense; adding a second referee should substantially increase the chances of catching offenders. For instance, suppose both referees watch the action the same way, and each independently has an 80% chance of catching an offender. In that case, one referee will catch 80%, and the second will catch another 16 percentage points (80% of the 20% the first guy didn’t see), for 96% in total. That’s a 20% relative increase, which is substantial. And when you consider that the second referee is positioned specifically to try to spot what the first referee cannot, the real-life number should be even higher than 20%.
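Here is that back-of-the-envelope calculation as a short sketch. The 80% figure and the independence of the two referees are the illustrative assumptions from the paragraph above, not measured values, and the function name is mine.

```python
def combined_detection(p_single, n_refs=2):
    """Chance that at least one of n_refs referees spots a foul, assuming
    each spots it independently with probability p_single."""
    return 1 - (1 - p_single) ** n_refs


p_one = 0.80                       # illustrative single-referee detection rate
p_two = combined_detection(p_one)  # at least one of two referees sees it
relative_increase = p_two / p_one - 1

print(f"one referee:  {p_one:.2f}")
print(f"two referees: {p_two:.2f}")
print(f"relative increase: {relative_increase:.0%}")
```

Under these assumptions detection rises from 80% to 96%, a far larger change than the near-zero one that Levitt's equations imply.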

The problem, of course, is Levitt’s assumption that all penalties are defensive. I’m not an expert, but from watching hockey I know that’s not even close to being the case – penalties are common in the offensive zone. (If they weren’t, few teams would be penalized while on the power play, but, of course, many are.) Further, many penalties are taken for purposes of intimidation or retaliation, which can happen anywhere.

The change in even-strength scoring, and thus the calculation of deterrence, is very sensitive to what percentage of penalties are defensive, what percentage are offensive, and what percentage are neither. Any combination where defensive equals offensive – say, 40% defensive, 40% offensive, 20% neither -- will make the numbers come out roughly equivalent to Levitt’s.

That is, there are two competing explanations for the observations:

(a) Penalties are fairly evenly divided between offensive and defensive; or
(b) Penalties are all defensive, but the second referee does not lead to an increase in enforcement.

It seems obvious that (a) is a much more reasonable explanation than Levitt’s (b).

So if there’s one criticism I have of this paper, it’s that I wish Levitt had made that point clearer: the results are unreliable because of his assumption. He does imply that in a footnote (page 8, footnote 12), but I’d have liked to see it stated more explicitly. As the paper stands, the reader who examines only the abstract and conclusions will be led to believe that the “second referee doesn’t make a difference” conclusion is strongly shown. And I don’t think it is.

In any case, this is a fun study. We learn a bit about hockey. We find out that adding a referee increases penalties by about 5% but has little effect on scoring. And Levitt also shows us that fighting penalties decreased by 14%. He conjectures that’s because fights arise from escalations of uncalled fouls, and the second referee reduces those uncalled fouls. That makes sense to me.

But mostly, I was intrigued by Levitt’s idea for solving the problem of separating out the effects of deterrence. It’s a creative “Eureka!” kind of discovery -- that if you increase enforcement on the defense, it leads to less obstruction, better scoring chances, and therefore a measurable change in even-strength goals. That insight alone makes the paper worth a look.

Friday, October 13, 2006