Are soccer referees biased for the home team?
A little while ago, one of the economics blogs I read (I forget which one) posted a link to a recent (2007) home field advantage paper. The paper is "Referee bias contributes to home advantage in English premiership football," by Ryan H. Boyko, Adam R. Boyko, and Mark G. Boyko. It's a free download.
It's actually pretty clever what they did. What they figured is this: if home field advantage (HFA) is caused by referee bias, it stands to reason that different referees would have different levels of such bias. So they checked the individual HFAs of different referees, to see if the distribution matched what would be expected if they were the same, and any differences were random error.
At least I *think* that's how they did it. They used an "ordinal multinomial regression model," and, unfortunately, I can't explain that because I don't know how it works. The results look pretty much like a normal regression. They tried to predict goal differential for every game. To do that, they used crowd size and percentage of available seats filled. They had dummy variables for year. Most importantly, they also included expected goals for and against for the home and visiting teams, where "expected" means average for the games of that season, not including that game (but not adjusting for the fact that both averages will be one game biased for home/road). And, of course, they used the identity of the referee for the game.
From all that, they got that referees were collectively statistically significant, at p=0.0263. But that was the result of a Chi-squared test on the entire group of 50 referees, so there's no coefficient we can look at. So, we know the referees have statistically significant differences in HFA, but we can't figure out *by how much*.
It turns out, however, that the significance goes away if you omit one outlier referee from the study. That referee's HFA is a huge 1.2 goals; the mean was 0.412, and no other referee was higher than 0.7. When the authors exclude that one referee from the study, the p-value jumps to .18, and the statistical significance disappears.
The authors provide a chart (Figure 1) with the HFAs of all fifty referees. From that chart, you can't really tell if the referees are all the same or not -- you need the significance test. To the naked eye, the differences look pretty consistent with what you'd expect if the differences are just random (except, of course, for the one outlier).
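One quick way to get a feel for what "just random" would look like: simulate 50 identical referees, all with the same true HFA, and see how much their observed HFAs spread from luck alone. This is a minimal sketch with made-up numbers -- the games per referee and the per-game SD of goal differential are my guesses, not figures from the paper:

```python
import random
import statistics

random.seed(0)

TRUE_HFA = 0.412    # league-wide home advantage, goals per game
GAME_SD = 1.8       # assumed SD of a single game's goal differential
N_REFS = 50
GAMES_PER_REF = 60  # assumed; the paper's referees vary

# Each simulated referee has the SAME true HFA; any spread in their
# observed HFAs is pure sampling noise.
ref_hfas = []
for _ in range(N_REFS):
    games = [random.gauss(TRUE_HFA, GAME_SD) for _ in range(GAMES_PER_REF)]
    ref_hfas.append(statistics.mean(games))

print(f"min {min(ref_hfas):.2f}, max {max(ref_hfas):.2f}, "
      f"SD {statistics.stdev(ref_hfas):.2f}")
```

Under these assumptions, luck alone gives each referee's observed HFA a standard error of about 0.23 goals, so a spread from roughly zero to 0.8 or so would be unremarkable even if every referee were identical -- which is why the eyeball test can't settle it and you need the significance test.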
Only two out of fifty referees have negative HFAs (that is, they refereed games where the visiting team outscored the home team, on average). However, it does appear that the referees with lower HFAs are a little farther from the average than the referees with higher HFAs, for what that's worth.
So the question remains: *how much* is the difference in referees, as compared to HFA? We don't know. It would have been nice if the authors had given us some variances: how much would the variance be if there were no bias? Then we could subtract the theoretical from the observed, and conclude something like, "the variance of HFA bias amongst referees is X".
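The subtraction I'm suggesting is simple arithmetic once you have the variances. Here's a sketch; the per-game SD, games per referee, and the observed variance are all hypothetical placeholders, since the paper doesn't report them:

```python
import math

GAME_SD = 1.8        # assumed SD of one game's goal differential
GAMES_PER_REF = 60   # assumed games per referee

# Variance of a referee's observed HFA if all referees were identical:
# pure sampling noise from averaging GAMES_PER_REF games.
var_luck = GAME_SD ** 2 / GAMES_PER_REF

# Suppose the observed variance of the 50 referee HFAs came out to this
# (hypothetical -- the paper doesn't give it):
var_observed = 0.08

# The leftover is the variance of true referee-to-referee differences.
var_bias = max(var_observed - var_luck, 0.0)
print(f"luck SD {math.sqrt(var_luck):.3f}, bias SD {math.sqrt(var_bias):.3f}")
```

With these made-up numbers, you'd conclude the true referee-to-referee spread has an SD of about 0.16 goals -- which is exactly the kind of "how much" answer the paper never gives.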
But, as I said before, the authors were so concerned about showing there IS bias that they didn't calculate HOW MUCH bias they actually observed.
One thing to keep in mind is that while it's possible to estimate the *differences* among referees, there's no way to know the actual level of bias. It could be that all referees are biased, but a bunch just happen to be a little less biased. Or it could be that almost all referees are unbiased, a select few are biased, and it's those biased referees that are causing the statistical significance.
It's like, suppose one interviewer wants to hire all three of the black candidates interviewed for a position, and another wants to hire none of them. You can tell one or both of them is biased. But is it that one interviewer doesn't like blacks? Is it that the other interviewer is practicing affirmative action? Or is it a combination of both? You can't tell unless you know enough about the candidates to figure out which of them "should" have been hired by an unbiased interviewer.
Same thing here. We need to know what the HFA "really" would be if all referees were unbiased. But that, we don't know, and there's no real way to know from this study.
The authors of the paper acknowledge that, but nonetheless argue for the position that HFA is all refereeing, and that most referees are biased:
"Certainly, the [many referees biased option] seems more reasonable, especially given the floor near gD = 0 (no home advantage)."
I don't really understand that, and I don't really agree with it ... I think the more parsimonious assumption is bias among the smallest number of referees consistent with the data. Actually, I think the most plausible reading is to note the huge outlier, and the lack of significance in the distribution of the others, and reach the tentative conclusion that (a) there's no evidence of bias in general, but (b) we should really look closely at the outlier referee, to see if we can figure out what's going on there.
Finally, even if there IS a difference in referees, it might not be bias in favor of the home team. It could just be style of refereeing. According to Table 1 of the paper, home teams score a lot more goals on penalty kicks than visiting teams do. Overall, the difference was 0.044 goals per game.
Suppose that certain referees are just less likely to call penalties -- say, 1/3 less likely. That would reduce the HFA on penalties from 0.044 goals, down to 0.03 goals -- a difference of 0.015 goals.
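In code, the arithmetic looks like this (the one-third lenient-referee rate is my hypothetical, not a figure from the paper):

```python
# Penalty-goal HFA from Table 1, and the effect of a hypothetical referee
# who calls one-third fewer penalties for everyone, home and road alike
penalty_hfa = 0.044             # home minus road penalty goals per game
lenient_ref_hfa = penalty_hfa * (2 / 3)
print(f"lenient referee's penalty HFA: {lenient_ref_hfa:.3f}")  # about 0.029
print(f"difference vs. average:        {penalty_hfa - lenient_ref_hfa:.3f}")
```

The point is that the lenient referee shows a lower HFA on penalties, about 0.015 goals lower, without favoring either team at all -- the difference between referees is real, but it isn't bias.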
It's not much, but add in differences in yellow cards, red cards, free kicks, and so on, and see what you get. It could turn out that a significant part of variability in referee HFA could be referee characteristics that have nothing to do with the home team at all.
Hang on -- maybe we *can* get at least an upper limit for how much HFA the referees could cause. In Table 1 of the paper, the authors give home and road stats for yellow cards, red cards, and penalty kick goals.
-- Yellow cards: road teams get 0.45 more per game than home teams.
-- Red cards: road teams get 0.038 more per game than home teams.
-- Penalty goals: home teams get 0.044 more per game than road teams.
A red card sends the player off for the rest of the game (and the team plays a man short). I remember reading somewhere what that's worth, but I don't remember where I saw it. Let's say it's an entire goal.
A yellow card is a warning. It doesn't cost the team anything (other than a free kick), but, since a second yellow card leads to a red card, the player affected might play with a bit more caution. It looks like there are about 20 yellow cards for one red card. So, let's suppose the player with the yellow card would get the red card one time in 10 if he didn't adjust his play. That means the first yellow card gives him a 10% chance of costing his team a goal. If he "spends" the entire 0.1 goals on more cautious play, we could say a yellow card is worth 0.1 goals.
A penalty goal, obviously, is a goal. I think I read somewhere that there's very little HFA on penalty kicks, so we can assume that the difference is the number of penalty kicks awarded.
So, let's add this up:
0.45 yellow cards times 0.1 goals equals 0.045 goals;
0.038 red cards times 1 goal equals 0.038 goals;
0.044 penalty successes times 1 goal equals 0.044 goals.
The total: 0.127 goals.
What else could the referees be doing to influence the outcome? Well, there's free kicks. And there's extra time -- some studies have suggested that the refs allow more injury time when the home team needs it. But those seem like they'd be much smaller factors than the ones above. Let's bump up the total from 0.127 to 0.15.
Also, it could be that visiting teams have to play more cautiously because of referee bias, and those numbers are artificially low because they don't include the effects of that. We included the effects of caution in the yellow card calculation, but not in the others. I don't know how to estimate it; it could be anything, really, from 0% to 1000%. However, if it were seriously high, someone would have noticed how much more aggressively teams play at home than on the road. And it would mostly be a conscious decision, so players would talk about it all the time -- how they have to play so much more timidly on the road to avoid provoking the referee.
Since that doesn't happen much (or does it?), it seems reasonable to assume that there's not much of that going on. Still, I'm going to set it high, and assume the effect of unpenalized cautious play is 50 percent of the total. That unrealistic assumption brings us up to about 0.22 goals.
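The whole chain of arithmetic fits in a few lines. The category gaps are from Table 1 of the paper; the goal values per card, the bump to 0.15, and the 50% caution adjustment are the rough guesses from the text, not measured quantities:

```python
# Upper bound on referee-caused HFA: assume ALL of the home/road gap in
# each category is referee bias against the visiting team
yellow_gap, yellow_value = 0.45, 0.1     # cards/game, goals/card (guess)
red_gap, red_value = 0.038, 1.0          # cards/game, goals/card (guess)
penalty_gap, penalty_value = 0.044, 1.0  # penalty goals/game

base = (yellow_gap * yellow_value
        + red_gap * red_value
        + penalty_gap * penalty_value)   # 0.127 goals

padded = 0.15                 # bump for free kicks, injury time, etc.
with_caution = padded * 1.5   # add 50% for unpenalized cautious play

print(f"base {base:.3f}, with caution {with_caution:.3f}")
print(f"share of observed HFA: {with_caution / 0.412:.0%}")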
We're still at just a little over half of observed HFA -- about 0.22 goals out of the observed 0.412.
And, to get to half, we had to make some seriously unrealistic assumptions -- that ALL of the difference in yellow cards, red cards, and penalty kicks was due to referee bias against the visiting team, and that players are compensating with another 50 percent on top of that.
So, Table 1 of the paper is the strongest evidence I've seen that referees can't be causing much of HFA. And no regression is required -- it's just simple arithmetic!