Sabermetric Research: Do hockey fights lift a team's performance?

It's been said that when an NHL team needs a lift, a fight can jolt it out of its complacency and make it better. And, just a few days ago, the media cited a study by researcher Terry Appleby, of powerscouthockey.com, showing that momentum (in terms of shots on goal) usually increases for at least one team after a fight.

But, if *either* team can benefit from a fight, what's the point? You want to know if *your* team can benefit from a fight, at least more than the other team does.

The problem is: how can you know that? A fight involves both teams, so if it helps one, it hurts the other by the same amount. If you look at both teams, you'll always find the total effect to be zero.

So, the "fighting helps a team" theory has to say *which* team is helped. The most logical interpretation would be that the fight helps the team that instigated it.

If you're going to study that, you need to know which team is the instigating team. That's tough to figure out from historical data. But, one shortcut would be to assume that the team that generally gets involved in more fights is the team that's more likely to have instigated. The 1974-75 Philadelphia Flyers took 76 fighting penalties (actually, 76 offsetting majors, which I used as a proxy for fights). That same season, the expansion Kansas City Scouts took only 19. It seems fair to assume that if a fight broke out at a Flyers/Scouts game, it was the Flyers who were likely responsible.

On that assumption, I decided to check.

Using data from the Hockey Summary Project, I looked at fights between 1967-68 and 1984-85, and checked to see how the more-likely-to-fight team did in the remainder of the game. Then, I found a control game to match it with. The result was two large groups of games, which could then be compared.

I'll give you an example of how the controls were found.

On February 16, 1969, the Bruins played the Black Hawks at Chicago Stadium. Just as the first period ended, with the score 2-0 Chicago, the Bruins' Don Awrey got into a fight against Stan Mikita of the Hawks.

I looked for a game to serve as the control for that Boston/Chicago game. What I wanted was:

1. A game in the same season, the season before, or the season after;
2. ... where the home team had the same size lead at that same time of the game;
3. ... and where the two teams were of roughly similar relative quality.

#1 and #2 were non-negotiable (except that all differences of 4 or more goals were considered the same). But, for #3, the quality only had to be close, within two goals (which I'll explain in a minute).

I started pulling random games until I found one that matched all three requirements. In this particular case, the control wound up being the Bruins vs. Rangers game of February 23, 1969.

That game qualifies under the rules because

1. It was played in 1968-69, the same season as the original;
2. That game had the home team also leading by two goals at 20:00 of the first period, and
3. The two sets of teams are of similar relative quality.

Now, let me explain #3.

In 1968-69, the Bruins were +82 in goal differential (303 goals for, 221 against). The Black Hawks were +34 (280-246). So, for the original game, the home team was 48 goals worse than the visiting team.

Since the control game was the same year, the Bruins were still +82. The Rangers were +35 (231-196). So, in the control game, the home team was 47 goals worse than the visiting team.

Since "47 goals worse" is within two goals of "48 goals worse," that's close enough for the Bruins/Rangers game to serve as a control. If it hadn't been within two goals -- which is most of the time -- that game wouldn't have qualified under #3, and I would have tried another random game. (If there were absolutely no games that qualified under #3, I would have taken the one where the team quality was closest in goals. If none of the random games had qualified under #1 and #2, I would have thrown the original game out of the study -- but that never happened.)
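The matching rules can be sketched in a few lines of Python. This is only an illustration, not the code used for the study: the `GameState` record, the `cap_lead` and `find_control` names, and the assumption that each candidate's lead has already been measured at the same time of the game as the fight are all hypothetical. It also picks at random from the qualifying games rather than drawing random games one at a time, which should amount to the same thing.

```python
import random
from dataclasses import dataclass


@dataclass
class GameState:
    season: int       # starting year of the season, e.g. 1968 for 1968-69
    home_lead: int    # home goals minus visitor goals at the matching time
    quality_gap: int  # home season goal differential minus the visitor's


def cap_lead(lead, cap=4):
    """Treat every lead of `cap` or more goals as the same size."""
    return max(-cap, min(cap, lead))


def find_control(fight, candidates, tolerance=2, rng=random):
    """Return a control game for `fight`, or None if rules 1 and 2 fail."""
    # Rules 1 and 2 are non-negotiable: adjacent season, same (capped) lead.
    eligible = [
        g for g in candidates
        if abs(g.season - fight.season) <= 1
        and cap_lead(g.home_lead) == cap_lead(fight.home_lead)
    ]
    if not eligible:
        return None  # the fight would be thrown out of the study

    # Rule 3: prefer games whose quality gap is within `tolerance` goals.
    close = [
        g for g in eligible
        if abs(g.quality_gap - fight.quality_gap) <= tolerance
    ]
    if close:
        return rng.choice(close)

    # Fallback: the eligible game with the closest quality gap.
    return min(eligible, key=lambda g: abs(g.quality_gap - fight.quality_gap))


# The example from the text: home team 48 goals worse vs. 47 goals worse.
fight = GameState(season=1968, home_lead=2, quality_gap=34 - 82)
rangers_game = GameState(season=1968, home_lead=2, quality_gap=35 - 82)
print(find_control(fight, [rangers_game]))
```

With only one candidate, the Bruins/Rangers game is returned because it passes all three rules.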

OK, so now we have our real game, and our control game.

Which team in our real game are we going to expect to have gotten the "lift" from the fight? In 1968-69, the Bruins had 41 fights, but the Black Hawks had only 20. So, the assumption is that the fight was more the work of the Bruins, and they should be the ones expected to benefit.

How did the Bruins do in the rest of the game relative to the Black Hawks?

Well, the final score was 5-1 Hawks. Since it was 2-0 at the time of the fight, that means the Black Hawks outscored the Bruins 3-1 in the remainder of the game. In other words, a "minus 2" goal differential for the visiting Bruins. (I excluded any goals in the last three minutes of the third period, to make sure empty-net goals didn't screw things up.)
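Here is a small Python sketch of that rest-of-game calculation, including the three-minute cutoff. The function name, the `(seconds, team)` goal format, and the goal times themselves are invented for illustration; only the 2-0 score at the fight and the 5-1 final come from the game described above.

```python
def rest_of_game_diff(goals, fight_time, team, cutoff=57 * 60):
    """Goal differential for `team` between the fight and the cutoff.

    goals: iterable of (seconds_elapsed, scoring_team) pairs.
    Goals at or after `cutoff` (the last three minutes of the third
    period) are ignored, as a guard against empty-net goals.
    """
    diff = 0
    for seconds, scorer in goals:
        if fight_time < seconds < cutoff:
            diff += 1 if scorer == team else -1
    return diff


# Hypothetical goal times: two Chicago goals before the fight,
# then three for Chicago and one for Boston afterwards.
goals = [
    (300, "CHI"), (900, "CHI"),
    (1500, "CHI"), (2100, "BOS"), (2700, "CHI"), (3000, "CHI"),
]
fight_time = 20 * 60  # the fight came as the first period ended

print(rest_of_game_diff(goals, fight_time, "BOS"))  # -2
```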

What about the control game? That game actually wound up 9-0 Rangers, which means 7-0 Rangers from the end of the first period (the point that matches the time of the fight) to the end of the game. Since the "real" game was relative to the Bruins, the visiting team, we also want to express the control game from the standpoint of the visiting team. So that's "minus 7".

So, our score so far is:

Actual games: -2.0 goal differential for the fighting team

Control group: -7.0 goal differential for the control team

So far, it looks like fighting helps, by five goals a fight!

Of course, that's only one game. I repeated this process for every fight from 1967-68 to 1984-85. Actually, not *every* fight. First, I included only fights where one team appeared to be significantly more aggressive than the other (specifically, where the two teams were 10 or more fighting penalties apart for the season). Second, I included only first- or second-period fights, to increase the amount of time for the "lift" effect to make itself felt.
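As a rough Python sketch, those two filters might look like the function below. The function name and arguments are hypothetical, and it assumes each team's season total of fighting majors is already known.

```python
from typing import Optional


def pick_fighting_team(team_a: str, majors_a: int,
                       team_b: str, majors_b: int,
                       period: int, min_gap: int = 10) -> Optional[str]:
    """Return the presumed instigator, or None if the fight is excluded."""
    # Only first- and second-period fights are kept.
    if period not in (1, 2):
        return None
    # The two teams must be at least `min_gap` fighting majors apart.
    if abs(majors_a - majors_b) < min_gap:
        return None
    return team_a if majors_a > majors_b else team_b


# 1968-69 Bruins (41 fights) against the Black Hawks (20), first period.
print(pick_fighting_team("BOS", 41, "CHI", 20, period=1))  # BOS
```

Raising `min_gap` gives a stricter version of the same filter.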

Even with those restrictions, there were 2,834 fights total. The results:

Fighting teams ... -0.04 goals
Control group .... -0.02 goals

The team with more fights was 0.04 goals worse than the other team over the remainder of the game. It "should have" been 0.02 goals worse. (Both numbers are negative probably because the teams that got in more fights were slightly worse teams overall than their opponents.)

So, there seems to be a small, negative effect: a team loses one additional goal for every 50 fights. But, that difference isn't even close to statistically significant. It's less than one SD from zero. (The two individual SDs are about 0.04 each, so the SD of the difference is around 0.06.)
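The arithmetic behind "less than one SD" can be checked with a couple of lines of Python. The function names are made up for this sketch, and it assumes the two groups are independent, so that their variances add.

```python
import math


def sd_of_difference(se_a, se_b):
    """SD of the difference of two independent estimates."""
    return math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)


def sds_from_zero(mean_a, se_a, mean_b, se_b):
    """How many SDs the difference (a minus b) is from zero."""
    return (mean_a - mean_b) / sd_of_difference(se_a, se_b)


# Fighting teams -0.04, control group -0.02, each with an SD of about 0.04.
print(round(sd_of_difference(0.04, 0.04), 2))            # 0.06
print(round(sds_from_zero(-0.04, 0.04, -0.02, 0.04), 2))
```

The second number comes out well under one SD in absolute value, which is why the difference can't be distinguished from zero.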

Conclusion: it doesn't appear that fighting helps a team.

-----

Maybe a difference of 10 fights a year isn't enough to separate the two teams? I redid the study, but required the teams to be 20 fighting penalties apart. That reduced the sample size to 1,581 in each group. The results were about the same (the +/- in parentheses is the standard error):

Fighting teams .... 0.00 goals (+/- 0.05)
Control group .... -0.03 goals (+/- 0.06)

-----

Looking at the entire database, I found that the average fight starts with a goal differential of 1.617. The average goal differential in all other games, weighted by the times of fights, is 1.421 goals. So, it seems like fights start when the game is a little more lopsided than usual.

So, maybe it's the team that's *trailing* that starts the fight, in an effort to wake itself up. Maybe we should look at trailing teams, not goonier teams.

I tried that. I threw away all situations where the score was tied when the fight happened, and looked at all the rest. The results:

2,941 datapoints
----------------
Trailing teams ... -0.20 goals (+/- 0.04)
Control group .... -0.19 goals (+/- 0.04)

Again, no real difference.

-----

Trying again, but looking only at fights where one team was trailing by at least three goals:

591 datapoints
--------------
Trailing teams ... -0.25 goals (+/- 0.08)
Control group .... -0.29 goals (+/- 0.08)

Nothing there, either.

-----

Is it possible that the benefit accrues only to GOOD teams trailing by three goals? Those are the teams playing the worst relative to their abilities, so the "wake up" effect should be strongest. Here are teams trailing by 3 goals that were at least +30 in goal differential for the season:

122 datapoints
--------------
Trailing teams ... +0.14 goals (+/- 0.17)
Control group .... +0.17 goals (+/- 0.16)

Nope. What if we look at good teams trailing by *any* number of goals?

841 datapoints
--------------
Trailing teams ... +0.40 goals (+/- 0.07)
Control group .... +0.26 goals (+/- 0.07)

Aha! This time, there's a small "lift" effect, at about 1.4 SD. But, why would there be an effect for teams trailing by 1 goal, but not for teams trailing by 3 goals?

I got curious and ran the same study again, and this time the random control group came in at +0.33, bringing the difference down to 1.0 SD. (Of course, it's not appropriate to dismiss the first result just because the second one came out less extreme.)

-----

At this point, you might reasonably argue that the rules "team with more majors that year" and "team trailing in the game" are not precise enough in selecting teams that started the fight. So, this time, I assumed the fight was started by the *player* with the most majors that season, rather than the *team* with the most majors that season. So when the goon of a pacifist team starts a fight with a pacifist of a goon team, you go with the goon player on the pacifist team. The results:

4,185 datapoints
----------------
Goonier Player ... +0.01 goals (+/- 0.03)
Control Group .... -0.02 goals (+/- 0.03)

Again, less than 1 SD difference. There's not much difference between this "goon player" breakdown and the previous "goon team" breakdown, probably because most of the goonier players also played on goonier teams. But it was worth a try.

-----

Finally, one last try. For this run, I combined all three criteria. To be included in the study:

(a) one team had to have at least 20 more majors for the season than its opponent;
(b) that same team's fighter had to have more majors that year than his opponent; and
(c) that same team had to be trailing in the game.

This *has* to work, right? I mean, that pushes all the right buttons: a truculent team, with a fighter selected for that purpose, behind in the game and likely to be needing a lift. If *those* teams don't benefit from the fight, then who would?

I expected the same non-result, but, this time, we get the biggest effect so far:

364 datapoints
--------------
Teams qualifying ... -0.18 goals (+/- 0.10)
Control group ...... -0.38 goals (+/- 0.11)

There's a difference of 0.20 goals -- about a fifth of a goal per fight! Taken at face value, that means that when a team like that starts a fight, it benefits by even more than a power play (which has a 15 to 20 percent success rate).

That difference is still only about 1.4 SD from zero. Still, I hate to just dismiss it. I've always thought that if you get a result that's significant in the real-world (hockey) sense, but it's not statistically significant, that's a problem with your study -- it's just that you haven't used enough data to be able to prove anything. We should still be open to the possibility that the effect might be real.

I ran it a few more times, to check if maybe the control group was just a random outlier. The extra results:

Control group: -0.26 goals
Control group: -0.21 goals
Control group: -0.27 goals
Control group: -0.38 goals (again)
Control group: -0.38 goals (again)
Control group: -0.29 goals

So, the original run was a little extreme, but not much.

There are, however, some mitigating factors. First, the control group numbers aren't all independent, since there's a limited number of control games to choose randomly from. Second, we obviously can't do extra runs to reduce random chance in the *real* games, but it's still possible those teams scored more goals for random reasons having nothing to do with any lift they got from the fight. Third, the SDs of both groups are a bit understated: I calculated them based on the assumption that games are independent, but they're not -- a real game appears in the study multiple times, once for each fight, and a control game could get randomly selected more than once, too.

If you average the seven control groups in the seven repetitions of the study, you get -0.31 goals. That's 0.13 goals worse than the actual games. Taking into account the fact that we ran the control group seven times, the 0.13 difference is now around 1 SD.
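For anyone who wants to check the averaging, here's a short Python sketch using the seven control results listed above. It makes a simplifying assumption that the text has already flagged as not quite true: it treats the seven control runs as independent, so the standard error of their average shrinks by the square root of seven. That makes the SD figure it produces somewhat optimistic.

```python
import math
from statistics import mean

# The original control run plus the six extra runs.
control_runs = [-0.38, -0.26, -0.21, -0.27, -0.38, -0.38, -0.29]

actual = -0.18      # qualifying teams, rest-of-game goal differential
actual_se = 0.10    # standard error for the qualifying teams
control_se = 0.11   # standard error for a single control run

avg_control = mean(control_runs)
lift = actual - avg_control

# Assumption: the runs are independent, so averaging them divides
# the standard error by sqrt(number of runs).
avg_control_se = control_se / math.sqrt(len(control_runs))
sd_diff = math.hypot(actual_se, avg_control_se)

print(round(avg_control, 2))  # -0.31
print(round(lift, 2))         # 0.13
print(round(lift / sd_diff, 1))
```

Because the runs draw on a limited pool of control games, the true standard error of the average is larger than this, which pulls the last figure down toward the "around 1 SD" quoted above.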

Oh, and this is as good a time as any to emphasize that I could also have screwed up somewhere ... I've already had to rerun everything once when I found a misplaced parenthesis in my code.

-----

So, I guess, our overall conclusion from this study isn't completely certain. We wind up with a summary like:

1. The effect doesn't seem to exist for run-of-the-mill fights.
2. When a goon fighter on a goon team fights when his team is down, it seems to benefit that team by 1/8 of a goal, or a bit less than a normal power play.
3. But, that effect isn't statistically significant, so we have some doubts that it's real.
4. And, with only 364 such datapoints qualifying out of around 5,000, only a small percentage of fights match the criterion for that kind of boost.

If you had to reduce that to one line, it might be:

At best, there might be a small effect in certain specific circumstances ... but much, much less than sportscasters make it out to be.

UPDATE: Part 2 is here.

Labels: fighting, goons, hockey, NHL

Tuesday, January 10, 2012
