May, 2012, "By the Numbers" now available
Recently, I found out about "Econ Journal Watch," a journal and website that debunks bad papers. According to its website, EJW
"watches the journals for inappropriate assumptions, weak chains of argument, phony claims of relevance, and omissions of pertinent truths."
"Although our results indicate that player race has no statistically significant effect on baseball card prices, we are mindful of Ziliak and McCloskey (2004, 334) who note that 'statistical significance, to put it shortly, is neither necessary nor sufficient for a finding to be economically important.' The estimated coefficient on the Black dummy variable indicates that the price of a black player’s rookie card, all else fixed, is 9.3% lower than that of an otherwise identical white player."
It occurred to me that it might be possible to predict, or explain, the difference in home field advantage between different sports, based on their rules and outcomes. In this post, I'll just talk about the "absolute" home field advantage, in terms of goals or points. (The translation to winning percentage is easy after that, but I'll save that for a future post.)
Take a look, and let me know what you think.
Here's the home field advantage (HFA) for three different sports, in terms of goals or points.
0.453 - Premier League Soccer (2010-11)
4.000 - NBA (estimate)
0.783 - NHL (1980-81 to 1984-85)
As expected, the HFA is highest for basketball, where the most points are scored, and lowest for soccer, where the fewest "points" are scored. Still, they're very different in terms of rates. Here are the percentages by which the home team outscores the visiting team:
38% - Premier League (+0.453 home goals per 1.1737 road goals)
4% - NBA (roughly, +4 per 100)
27% - NHL (+.783 per 2.937)
These HFAs are all over the place. In basketball, the home team only outscores the visiting team by 4 percent. But, in the NHL, it jumps to 27%, and in soccer, it's almost 40%!
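Here's a quick sketch of the arithmetic behind those percentages. The numbers are the ones quoted above; the dictionary layout and the function name `relative_hfa` are just my own illustration, and the NBA line uses the rough "+4 per 100" estimate.

```python
# Absolute HFA (goals or points per game) and the visiting team's average
# score, as quoted in the post. The NBA figures are rough estimates.
leagues = {
    "Premier League": (0.453, 1.1737),
    "NBA": (4.0, 100.0),
    "NHL": (0.783, 2.937),
}


def relative_hfa(home_edge, road_score):
    """Fraction by which the home team outscores the visiting team."""
    return home_edge / road_score


for name, (edge, road) in leagues.items():
    print(f"{name}: {relative_hfa(edge, road):.1%}")
```

Dividing by the road team's score is the convention the list uses ("+0.453 home goals per 1.1737 road goals"); dividing by a league-average score instead would shrink each number a little.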
Why the big differences?
It has nothing to do with the length of the game. If the home team scores 4% more points over 48 minutes, you can also expect it to score 4% more points over a minute, a quarter, or a season. (It's the percentage of how many more *wins* the home team gets that depends on game length, but, again, that's not what we're discussing in this post.)
So, what is it then? There are probably many contributing factors, but I think the biggest has to do with the structure of the individual games. That's because it's easy to get a higher or lower HFA just by changing the rules.
Start by looking at the NBA, where the HFA is about 4 points a game. Let's change the way basketball works, to move that difference away from 4 points.
In fact, let's do that while keeping many aspects of the game constant. We'll stay with a game where each team gets 100 possessions, has to throw a ball through a hoop on a basketball court, and has an average score of 100. We'll just change the "internal" rules.
Suppose we change the game to consist of only foul shooting. Each possession, the team gets two foul shots. If it sinks them both, it gets two points. Otherwise, it gets zero. We can assume the average player shoots 71%, so that the probability of two straight makes is almost exactly 50%. (If 71% seems a bit low, just imagine that we make the hoop a bit smaller at the same time we change the rules.)
We have empirical data that lets us figure out what HFA would be, thanks to King Yao, who compiled home and road free-throw percentages for a few recent NBA seasons. The numbers were:
75.95% home team
75.72% visiting team
For the chance of making two straight shots, then, we can just square those numbers:
57.68% home team
57.34% visiting team
The difference is 0.34%. Over 100 possessions, that's .34 extra scores, or around 0.7 extra points. That's much smaller than the 4 point HFA in "real" basketball. We've reduced HFA by 80 percent just by changing the rules!
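If you want to check the foul-shooting arithmetic, here's a minimal sketch. It assumes exactly 100 possessions per team and two points per successful pair, as in the made-up game; the constant and function names are mine.

```python
# Home and road free-throw percentages quoted in the post (from King Yao).
HOME_FT = 0.7595
ROAD_FT = 0.7572

POSSESSIONS = 100      # possessions per team in the hypothetical game
POINTS_PER_SCORE = 2   # points for sinking both foul shots


def two_straight(ft_pct):
    """Chance of making two independent foul shots in a row."""
    return ft_pct ** 2


def expected_points(ft_pct, possessions=POSSESSIONS):
    """Average points per game in the foul-shooting-only game."""
    return possessions * POINTS_PER_SCORE * two_straight(ft_pct)


hfa_points = expected_points(HOME_FT) - expected_points(ROAD_FT)
print(f"home: {two_straight(HOME_FT):.2%}, road: {two_straight(ROAD_FT):.2%}")
print(f"HFA in points per game: {hfa_points:.2f}")
```

The squaring step treats the two shots as independent, which is the same assumption the post makes.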
How can we construct a game where HFA is higher? Again, that's easy -- try "double or nothing" basketball.
In that game, when you score a field goal, you don't get the two points yet. Instead, you immediately get a second possession, and you have to score on that one too. If you do get two in a row, you get 4 points. If you don't get the second one too, you get zero.
A game consists of 100 possessions for each team (so each team will get somewhere between 100 and 200 attempts to make a field goal).
In this game, the HFA will be roughly double. How do we know? Well, the real life HFA is 4 points, and each team gets roughly 100 attempts. So, we can guess that, in normal basketball, the home team might score on around 52% of attempts, while the visiting team will score only on 50%.
But, now, each team has to make two in a row. The home team will do that around 27% of the time (52% squared), while the visiting team will be at 25% (50% squared). That's still two extra scores per game, but now each score is worth four points. So, instead of winning 104-100, the home team will win 108-100.
The change in the rules has increased the HFA from 4 points to 8 points -- from 4% to 8%.
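The same kind of check works for "double or nothing." This sketch assumes the guessed 52% and 50% success rates from above, and that attempts are independent; the function names are hypothetical.

```python
def normal_points(p, possessions=100, points=2):
    """Average points in regular basketball: one make per score."""
    return possessions * points * p


def double_or_nothing_points(p, possessions=100, points=4):
    """Average points when a score requires two makes in a row."""
    return possessions * points * p ** 2


HOME_P, ROAD_P = 0.52, 0.50  # the post's rough guesses, not measured values

normal_hfa = normal_points(HOME_P) - normal_points(ROAD_P)
double_hfa = double_or_nothing_points(HOME_P) - double_or_nothing_points(ROAD_P)

print(f"regular game HFA: {normal_hfa:.2f} points")
print(f"double-or-nothing HFA: {double_hfa:.2f} points")
```

The unrounded double-or-nothing figure comes out slightly above 8, because 52% squared is 27.04%, not exactly 27%.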
So: three different hoops games, three different HFAs. That shows that you have to examine the rules in order to understand where HFA is coming from. It can't be just crowd influence, or referee bias, or familiarity with the home court, or anything like that. Those are things that could *cause home field advantage to exist*. But they aren't things that could, on their own, cause the level of home field advantage to *vary between sports*. For that, you need to examine the rules.
So, is there a factor that explains how the HFAs change for the three games?
It seems to me that the answer is: the level of "compounding" of events, the number and difficulty of the things that all have to go right for you to score.
For the "two in a row" game, you need to score twice in a row, not just once. If you have a 4% advantage on each one, you'll have an 8% advantage on two of them compounded. (1.04 squared is about 1.08.)
For the foul shooting game: In "real" basketball, there's more than just shooting. To score a field goal, you have to do a whole bunch of things right. For instance, you have to (a) pass the ball around accurately; (b) deke out a defender enough to get a good shot; (c) have the other members of your team distract the other defenders so they can't block; and then (d), take an accurate shot.
That's four things that might all have to go right. From the discussion above, we know the HFA for two consecutive foul shots is 0.34%. Suppose each of those four things, from (a) to (d), has that same 0.34% advantage. Then the home team gets an advantage of roughly four times 0.34%, or roughly 1.4 percent.
That hypothetical only gets us to 1.4 percent, not to 4 percent. That suggests that more than four compoundings are necessary -- maybe as many as 10. That's certainly reasonable. Foul shooting seems to be something that's simpler than most other basketball skills. We assumed, for convenience, that "taking an accurate shot" was exactly as complex as foul shooting ... but it might be twice as complex. You have to shoot accurately, but first you also have to judge the shot. If the other three things are also twice as complex, which doesn't seem implausible, then we have eight compoundings.
Anyway, the point is not to get this particular example to work out perfectly, but to show that it's at least a decent approximation.
The "compounding" explanation also seems to work if you compare hockey to basketball.
In basketball, the net is unguarded. In hockey, there's a goalie trying to stop the puck. The goalie has his own HFA, while the hoop presumably does not. So, that's at least one extra compounding in favor of hockey.
In basketball, it's difficult to force a turnover; it's a fairly rare event. In hockey, it's easy, since physical play is allowed -- you just run into the puck carrier, if you can, and dislodge him.
So, in hockey, part of the goal scoring process is avoiding checks from the opposition. If four players touch the puck before a shot, and each one has the same chance of losing the puck as the entire team has in a basketball possession ... then you have three extra compoundings, since there's an additional HFA for each of the four players.
In soccer, it's even more extreme: it isn't unusual to take 10 or 15 passes before you get a decent chance at a shot. So, ten things have to go right. If each pass in soccer has the same chance of being intercepted as a pass in basketball, but soccer requires three times as many passes before a goal ... well, now you have three more compoundings.
This is all theoretical, of course, but we can check the numbers to see if they're reasonable.
In soccer, the absolute HFA was 38 percent more goals. In basketball, it was only 4 percent. To get from 4 percent to 38 percent, you need about eight times as many compoundings (since 1.04 to the eighth power equals approximately 1.38).
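To put a number on "about eight," you can solve for the exponent directly. This is only a sketch: it assumes every compounding multiplies the home edge by the same 1.04 factor, and the function name is mine.

```python
import math


def compoundings_needed(target_ratio, base_ratio=1.04):
    """Number n such that base_ratio ** n equals target_ratio."""
    return math.log(target_ratio) / math.log(base_ratio)


soccer = compoundings_needed(1.38)  # soccer's 38 percent from the post
print(f"soccer vs. NBA: {soccer:.1f} compoundings")
print(f"check: 1.04 ** 8 = {1.04 ** 8:.3f}")
```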
Is eight compoundings reasonable? I think it is, because we can get a similar result another way.
In the NBA, about 50 percent of possessions result in a score. In soccer, it's probably, what, around 2 percent?
If soccer had two compoundings to every one basketball compounding, the scoring rate would still be 25 percent (you'd have to do something with a 50 percent success rate, twice). If it had three compoundings, it would be 12.5 percent. Four compoundings, about 6 percent. Five compoundings, 3 percent. Six compoundings, 1.6 percent, and we're there.
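The halving sequence is easy to reproduce, and you can also solve for the number of compoundings directly. This sketch leans on my 2 percent guess for soccer, so treat the result as a ballpark; the function name is hypothetical.

```python
import math


def compoundings_for_probability(p, base_p=0.5):
    """Number n such that base_p ** n equals the scoring probability p."""
    return math.log(p) / math.log(base_p)


# Scoring rate after n back-to-back tasks that each succeed half the time.
for n in range(1, 7):
    print(f"{n} compoundings: {0.5 ** n:.1%}")

soccer_n = compoundings_for_probability(0.02)  # 2% is a guess, not data
print(f"2 percent scoring rate: {soccer_n:.2f} compoundings")
```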
So, a naive estimate is that to score a goal in soccer, you have to be skilled enough to do what it takes to score a goal in basketball, six consecutive times.
We were expecting 8 compoundings from the "compare HFA" argument. The "probability of scoring" method suggests 6 compoundings.
Not bad! Those two estimates, 8 and 6, are pretty close. Why aren't they closer?
Well, it could be that some of my estimates were off, like the one where I guessed that 2 percent of soccer possessions score.
Or, it could be that there's a large difference in competitive balance between the two leagues. The more lopsided the talent, the lower the HFA (when the better team is so good that it always wins, the HFA is obviously zero).
But, most importantly, it could be that there are factors outside of "compoundings". For instance, referee bias, which need not be anywhere near the same order of magnitude as player HFA. Actually, that could very well be it: in soccer, the referee can have a very large impact on the game. In the NBA, a blown call is worth a couple of points out of 100. But, in soccer, a blown penalty call could be one goal out of two.
Anyway, if you buy the idea that these two estimates of compounding should be the same, that suggests a way to get a rough estimate of what HFA should be in other sports, at least for sports that are similar to basketball/hockey/soccer, in the respects we used in our arguments. Specifically:
-- you can divide the game into possessions
-- each team gets roughly an equal number of possessions
-- you can only score once per possession
-- you can estimate a probability of an average team scoring on each possession
-- what keeps you from scoring at will is a defense that's similar to defenses in basketball/hockey/soccer and also subject to HFA (which excludes, say, foul shooting or skills competitions)
-- referee bias is roughly the same order of magnitude as for basketball/hockey/soccer.
To get an estimate, you start with a known sport and a known HFA, and you adjust it by the differences. Let's use the NBA as our reference point. It has a 50 percent chance of scoring on each possession (0.5), and an absolute HFA of 4% (1.04).
We now adjust for the number of compoundings based on the difference in probability of scoring on a possession. That leads to this formula:
Let p be the probability of scoring on a single possession in your particular sport. Then:
HFA = 1.04 ^ [log(p)/log(0.5)]
(Checking that it works for the "double or nothing" basketball variation: p=0.25 gives HFA=1.08, which is 8% more points, which is correct.)
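If you'd rather not do the logs by hand, here's the formula as a small function. The NBA reference values (0.5 and 1.04) are the ones used above; the function and parameter names are my own, and none of the adjustments for competitive balance or referee bias are included.

```python
import math


def estimated_hfa_ratio(p, ref_p=0.5, ref_ratio=1.04):
    """Rough home/road scoring ratio for a sport with scoring probability p.

    ref_p and ref_ratio are the NBA reference point: a 50 percent chance
    of scoring per possession, and the home team outscoring the road team
    by 4 percent.
    """
    compoundings = math.log(p) / math.log(ref_p)
    return ref_ratio ** compoundings


# "Double or nothing" basketball: p = 0.25 should give about 1.08.
print(f"{estimated_hfa_ratio(0.25):.4f}")
```

Any log base works, since only the ratio of the two logs matters.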
That's your rough estimate of HFA. I emphasize it's *rough*. You'd still need to adjust it (slightly) up if your league has more competitive balance than the NBA, or down if it has less. And you'd have to adjust it up (perhaps substantially) if you think the effect of referee bias on HFA is higher, or down if you think referee bias is lower.
And, of course, there might be other factors I haven't thought of.
Does someone want to try this for other leagues or other sports, and see how close it comes? I'd be curious to see the NLL, which has, maybe, 12 goals per team per game. The problem is estimating possessions, which is hard for lacrosse but easier for, say, the WNBA.
The more games in a season, the more likely the best teams will rise to the top, and the worse teams will fall to the bottom. That's just common sense, and the law of large numbers.
Similarly, the more innings in a game, the more likely the best team will win. If you put the whole season into a single 1,458-inning game, there's no doubt that (for instance) the Yankees would beat the Twins.
Home field advantage (HFA) is one of those things that makes teams better. And so, the longer the game, the more likely the home team's advantage will show up in the results. HFA for a 3-inning game would be smaller than HFA for a 9-inning game.
So, it's easy to compare HFA within one given sport. But how do you compare two? According to "Scorecasting," from 1989 to 1999, in the NBA, home teams went .605. In the NHL, they went .557. Why the difference?
In the past, I've used an argument that I now somewhat regret. It went something like this: "We know that a longer game means a higher HFA. Therefore, basketball must be "longer" than hockey in some sense. Perhaps there are more confrontations between players, or something, which allows the HFA to expose itself more easily."
But now, I think, that line of thinking is too vague. It's almost a circular argument. "Why is the NBA higher?" "Because the game is longer." "What do you mean by longer?" "I don't know exactly, but it's the attribute of NBA games that makes home field advantage bigger."
It's like, suppose we don't know what causes lung cancer, except smoking. And then we find a country that has a high rate of lung cancer, even though they don't smoke much. Do we say, "that country must be somehow 'cigarettier'?" That would be silly.
And, in any case, we can do better. There are identifiable reasons why the NBA home record is higher than the NHL home record. They don't solve the problem entirely, but at least they're concrete factors.
I'm going to start by calculating the theoretical HFA for the National Hockey League, step by step.
From 1980-81 to 1984-85, the home team outscored the visiting team by .70619 goals per game.
The home team scored an average of 4.264 goals per game. Since it's typically assumed that goals have a Poisson distribution, the SD of goals per game is the square root of that, or 2.065. (It's a property of the Poisson distribution that the SD is the square root of the mean.)
The visiting team scored an average of 3.557 goals, for an SD of 1.886.
So, the SD of (home team - visiting team) is 2.797.
Therefore, if the two teams were exactly equal, the goal differential would be a normal curve with mean 0, and SD 2.797.
But the HFA makes those two teams unequal, by .70619 goals. Divide that by 2.797 and we see that they're unequal by 0.252 of an SD. Therefore, the home team wins if the random outcome is greater than -0.252 SDs.
Going to a normal distribution table, that probability is about 0.600.
In those actual NHL games, the home team actually had a winning percentage of .592. Not bad!
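The steps above can be sketched in a few lines. This assumes independent Poisson scoring for each team and a normal approximation for the goal difference, and it ignores ties. It takes the home edge to be the gap between the two scoring averages. The function names are mine.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def home_win_probability(home_mean, road_mean):
    """Theoretical home winning percentage under the assumptions above."""
    # Poisson variance equals the mean, and variances of independent
    # scores add, so SD(home - road) = sqrt(home_mean + road_mean).
    sd_diff = math.sqrt(home_mean + road_mean)
    edge = home_mean - road_mean
    return normal_cdf(edge / sd_diff)


HOME_GOALS, ROAD_GOALS = 4.264, 3.557  # NHL, 1980-81 to 1984-85

sd_diff = math.sqrt(HOME_GOALS + ROAD_GOALS)
p_home = home_win_probability(HOME_GOALS, ROAD_GOALS)
print(f"SD of goal difference: {sd_diff:.3f}")
print(f"theoretical home winning percentage: {p_home:.3f}")
```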
Why is our theoretical estimate too high? Well, one reason is that our calculation assumed two equally talented teams. But in real life, there are always differences in talent, sometimes large ones. And HFA decreases as the talent gets more uneven. (If an .000 team plays a 1.000 team, the HFA is obviously zero.)
So, that's one reason our estimate is too high. It's probably not all of it.
Now, let's go back to game length. We all agree that if we increased the length of an NHL game, say from 60 minutes to 120, the HFA would increase.
Suppose the league does that. But, at the same time, it decides to also reduce the number of goals scored. Now, every time a goal is scored in the six-period game, the referee flips a coin. If it's heads, the goal stands. If it's tails, the goal doesn't count.
That means the average game score is the same. The distribution of goal differential is the same. The only thing that's different is the length of the game -- the number of confrontations between players, and the length of time one team has to show it's superior to the other team.
So, we should expect the HFA to go up, right?
It doesn't. It stays the same. (Actually, it goes down a bit, but never mind.)
In the old NHL, visiting-team goals had a Poisson distribution with mean 3.557. And in the new NHL, visiting-team goals *also* have a Poisson distribution with mean 3.557. (The same goes for home goals.) The distributions are identical, because Poisson applies (as an approximation) to any rare events. Whether it's over 60 minutes or 120 minutes, 3.557 goals qualifies as rare.
So if we repeat the calculation for HFA, every step is exactly the same as before! And so we get the same answer.
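You can check the coin-flip claim numerically. This sketch assumes the raw goals in the double-length game are Poisson with twice the mean, and that each goal independently stands with probability one-half. Under those assumptions, the goals that count come out Poisson with the original mean. The function names are mine.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def thinned_pmf(k, raw_mean, keep=0.5, max_n=80):
    """Probability that exactly k goals stand after the coin flips.

    Sums over the possible raw goal counts n (truncated at max_n, which
    is far out in the tail for hockey-sized means).
    """
    total = 0.0
    for n in range(k, max_n + 1):
        stand = math.comb(n, k) * keep ** k * (1 - keep) ** (n - k)
        total += poisson_pmf(n, raw_mean) * stand
    return total


MEAN = 3.557  # goals per game quoted in the post
for k in range(6):
    print(k, f"{poisson_pmf(k, MEAN):.4f}", f"{thinned_pmf(k, 2 * MEAN):.4f}")
```

The two columns should agree to many decimal places, which is the "every step is exactly the same" point in miniature.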
So if "length of the game," in terms of confrontations or action, doesn't matter, what *does* matter?
Goals. The more goals scored, the higher the Poisson mean, and so the higher the SD of game results. That means more randomness, a wider spread. If there's a wider spread, that means the HFA of 0.706 goals is smaller relative to luck. And so, it has less opportunity to express itself, and we get a lower HFA.
Just to give you an example: suppose the average goals per team increases to 6. That means the SD of a game difference is 3.46. The difference of 0.706 goals is now only .204 of an SD, which gives you 58.1 percent of a normal curve. So the HFA drops to .581.
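Here's a sketch of that example, which also lets you try other scoring levels. It assumes both teams have the same Poisson scoring average, that the 0.706-goal edge stays fixed no matter how much scoring there is, and that the normal approximation is good enough. The function names are mine.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hfa_win_pct(goal_edge, goals_per_team):
    """Home winning percentage for a fixed goal edge at a scoring level."""
    # Each team's variance equals its mean (Poisson), and the two add.
    sd_diff = math.sqrt(2 * goals_per_team)
    return normal_cdf(goal_edge / sd_diff)


EDGE = 0.706  # home edge in goals, from the post

for goals in (3, 4, 6, 10):
    print(f"{goals} goals per team: {hfa_win_pct(EDGE, goals):.3f}")
```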
Goal difference is part of the reason that my sample showed an HFA of .592, but "Scorecasting" showed an actual HFA of only .557. The Scorecasting study used the ten seasons ending 2009, when scoring was historically low. I used 1980 to 1984, when scoring was historically high.
So, have we found an answer? Can we say that one reason why basketball HFA is higher than hockey HFA, is that basketball has so much more scoring? Well, yes and no. Yes, scoring is part of it, but, no, we can't use this particular argument, because basketball is not Poisson.
Indeed, non-Poisson-ness is one of the factors boosting the NBA home field advantage. As it turns out, the farther the distribution gets from ideal Poisson, the lower the random variance. And lower randomness boosts HFA, by providing less noise to drown out the HFA's signal.
If the NBA switched to Poisson, by making the game 20 times longer, and making baskets 20 times harder to achieve, HFA would go down, even though scoring would stay the same.
Well, not necessarily. It depends whether teams change the way they play under the new system. The home advantage in the NBA is, what, 3 or 4 points a game? If the hoop became 20 times harder to hit, that 3 or 4 points might change to something else entirely, and we'd have to recalculate.
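To get a feel for how much non-Poisson-ness could matter, here's a toy comparison. Everything in it is an assumption for illustration: each team gets exactly 100 independent possessions, scores two points on half of them, and the home edge is held at 4 points in both versions (which, as noted, might not hold in real life). The "Poisson" version keeps the same 50-basket average but lets baskets be rare events. The names are mine.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


POSSESSIONS = 100
P_SCORE = 0.5
POINTS = 2
HFA_POINTS = 4.0

# Binomial: a fixed number of possessions, each a 50/50 proposition.
binomial_sd = POINTS * math.sqrt(POSSESSIONS * P_SCORE * (1 - P_SCORE))
# Poisson: same average number of baskets, but variance equals the mean.
poisson_sd = POINTS * math.sqrt(POSSESSIONS * P_SCORE)


def win_pct(edge, team_sd):
    """Home winning percentage when both teams have the same SD."""
    return normal_cdf(edge / (team_sd * math.sqrt(2.0)))


print(f"binomial: SD {binomial_sd:.1f}, home wins {win_pct(HFA_POINTS, binomial_sd):.3f}")
print(f"Poisson:  SD {poisson_sd:.1f}, home wins {win_pct(HFA_POINTS, poisson_sd):.3f}")
```

The smaller binomial SD is the "lower random variance" at work: the same 4 points is a bigger fraction of the noise, so the home team wins more often.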
So we have two factors affecting HFA so far, all else being equal:
1. Non-Poisson-ness increases HFA.
2. More scoring decreases HFA.
You can probably think of more factors. I've got a couple I'll save for a future post.
Both teams get 100 possessions. The team leading at that time gets to pad its score by taking extra possessions until it has the same number of misses as the trailing team.
UPDATE, 4:45pm Sunday: Oops! I had to add another condition to make the puzzle work. See below.
(Note: This time, the puzzle has some sports content.)
The rules in the Oversimplified Basketball Association (OBA) are as follows: Each team takes turns getting a possession of the ball. If they score a field goal, they get two points. There are no three-point shots, rebounds, or fouls. If there's a turnover or a missed shot, the referee blows the whistle and the other team inbounds the ball to begin their possession.
So, all possessions are independent and identical. The game lasts exactly 100 possessions for each team. (Except that if the score is tied, each team takes one extra possession to try to break the tie. If the score is still tied after that, each team gets one more, and so on.)
The average field goal percentage in the OBA is .500, so the average team scores 100 points. But some teams have a field goal talent of, say, .550, and score an average 110 points a game, while others have a talent of, say, .400, and score only 80 points a game.
One year, the owners decide the fans like blowouts more than close games. They suggest changing the rules slightly.
Under the new proposal, if a team makes a field goal, it gets to keep the ball for another possession. Only when they miss does the ball get turned over to the other team. The game ends when both teams have an equal number of misses, and they both have passed 100 possessions. (If the game is tied at that point, the game continues, until each team has missed once more. If it's still tied, repeat.)
It might help to think of the new rules by a baseball analogy. The game begins. The first team gets the top of an "inning", consisting of as many baskets as it can make until it makes an "out" by missing. Then the second team gets the bottom of the inning.
Innings repeat until, at the end of a given inning, both teams have had 100 possessions ("at bats"). Then the game is over, and the most baskets ("runs") wins. If there's a tie, play an extra "inning" until the tie is broken.
The owners argue to the fans that this should work. They reason,
"Suppose team A has a .750 field goal percentage, and team B is .500. Before the rule change, team A would go 3-for-4, and team B would go 2-for-4. So, team A would outscore team B by a 3:2 ratio.
"But with the new rules, team A scores 3 field goals for every miss. Team B scores only 1 field goal for every miss. So, now, team A should outscore team B by 3:1, not 3:2.
"So, instead of winning by an average 150-100, team A should now win by an average 300-100."