Does a cricketer's career really depend on luck?
A couple of days ago, the "Freakonomics" blog, and several others, quoted a study, based on cricket, that purported to show that random luck in your first job can have a big impact on your career. The paper is called "What Can International Cricket Teach Us About the Role of Luck in Labor Markets?" and it's by Shekhar Aiyar and Rodney Ramcharan.
It's probably true that luck plays a big part in your work life, but I don't think the study actually shows that.
Here's what the authors did (and apologies in advance if I get some of the cricket terminology wrong). They estimated the home field advantage in a player's cricket batting average (runs per dismissal), and found that the average batter hits for about 25% more runs at home than on the road. If that's the case, decision-makers should take it into account when evaluating players. Obviously, a batter who hits for 30 runs on the road likely has more ability than a batter who hits for 30 runs at home.
Now, consider a batter who's playing in an elite "test" cricket match for the first time. Some portion of batters, after their debut match, will be dropped from the team -- presumably those who didn't bat well. (It turns out that about 25% of first-time batters are immediately dropped.)
Obviously, the worse a player bats, the more likely he'll be dropped. But managers should take into account whether it's a home match or a road match. All things being equal, a batter who hits for X runs at home should be more likely to be dropped than a player who hits for X runs on the road.
The authors ran a regression to predict the probability of being dropped, based on the player's runs in that match, a dummy for whether it was a home or road match, and an interaction term (the dummy times the number of runs). It turned out that neither the dummy nor the interaction term was statistically significant.
Therefore, the authors concluded, managers neglect to take home field advantage (HFA) into account when evaluating players -- they just look at runs. Therefore, a player's career is strongly affected by random chance -- the luck of whether his first match happened to be at home (where he gets a better chance of making the team) or on the road (where he gets a worse chance of making the team).
Except that ... the regression does NOT show that managers ignore HFA -- not at all. The regression equation the authors found (Table 9, column 2) was
Chance of being dropped = -0.0043 * (runs) + 0.000356 * (runs, if at home) + 0.0527 (if at home)
Now, suppose a batter hits for 10 fewer runs than average. If he does that on the road, his additional chance of being dropped is
0.0043 * 10 = 4.3%.
But suppose he does that at home. His additional chance of being dropped is:
0.0043 * 10 + 0.000356 * 10 + 0.0527 = 9.9%.
Doesn't that seem like a reasonable adjustment for HFA? I think it does. I'm not sure what an average batter hits for ... say, 35 runs? That means if the player hits for 25 runs at home, he'll be cut 10% of the time. If he hits for 25 runs on the road, he'll be cut 4% of the time. What's wrong with that?
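(If you want to check that arithmetic yourself, here's a quick Python sketch. The three coefficient magnitudes are the ones quoted from Table 9 above, and they're combined exactly the way the calculation above combines them -- nothing else from the paper is assumed.)

    # Check of the drop-probability arithmetic above, using the quoted coefficients.
    B_RUNS, B_INTER, B_HOME = 0.0043, 0.000356, 0.0527  # magnitudes from the equation above

    shortfall = 10  # runs below the average batter

    road = B_RUNS * shortfall
    home = B_RUNS * shortfall + B_INTER * shortfall + B_HOME

    print(f"extra chance of being dropped, road: {road:.1%}")  # 4.3%
    print(f"extra chance of being dropped, home: {home:.1%}")  # 9.9%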
What the authors would say is wrong with that is that the last two coefficients didn't reach statistical significance, so we have to drop them. To which I say: nonsense! Those coefficients look almost exactly the way you'd expect them to look if managers aren't dumb. If they're not significant, it's because there isn't enough data!
Looking at it a different way: the authors chose the null hypothesis that managers' adjustment for HFA is zero. They then fail to reject that hypothesis.
But what if they had chosen the opposite null hypothesis -- that managers' HFA *irrationality* is zero? That is, what if the null hypothesis were that managers fully understand what HFA means and adjust their expectations accordingly? The authors would then have included a "managers are dumb" dummy variable. The fitted equation would still have come out at 4% for a road player and 10% for a home player -- and the "managers are dumb" variable would have turned out not to be statistically significant.
Two different and contradictory null hypotheses, neither of which would be rejected by the data. The authors chose to test one, but not the other. Basically, the test the authors chose is not powerful enough to distinguish the two hypotheses (manager dumb, manager not dumb) with statistical significance.
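(Here's one way to see the power problem concretely -- a little simulation sketch, not anything from the paper. Take the fitted equation above as the literal truth about how selectors behave, simulate a few hundred debuts, and re-run the same regression on the fake data. The sample size, intercept, and run distribution below are guesses on my part; the only question is how often the home terms would show up as "significant" even though they're really in there.)

    # A rough power check: treat the fitted equation quoted above as the TRUE
    # decision rule, simulate debut matches, re-run the authors' regression on
    # the simulated data, and count how often the home-related terms reach
    # p < 0.05. The sample size, intercept, and runs distribution are my
    # assumptions, not numbers from the paper.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    N_DEBUTS, N_SIMS = 400, 500
    BASE = 0.40                                           # chosen so roughly a quarter get dropped
    B_RUNS, B_INTER, B_HOME = -0.0043, 0.000356, 0.0527   # coefficients quoted above

    hits = {"home": 0, "runs:home": 0}
    for _ in range(N_SIMS):
        home = rng.integers(0, 2, N_DEBUTS)                        # about half the debuts at home
        runs = rng.gamma(2.0, 14.0, N_DEBUTS) * (1 + 0.25 * home)  # assumed scores, ~25% home boost
        p_drop = np.clip(BASE + B_RUNS * runs + B_INTER * runs * home + B_HOME * home, 0, 1)
        data = pd.DataFrame({"dropped": (rng.random(N_DEBUTS) < p_drop).astype(float),
                             "runs": runs, "home": home})
        fit = smf.ols("dropped ~ runs + home + runs:home", data=data).fit()
        for term in hits:
            hits[term] += fit.pvalues[term] < 0.05

    for term, count in hits.items():
        print(f"{term}: significant at 5% in {count / N_SIMS:.0%} of simulated samples")

The exact percentages don't matter; what matters is that a test like this can easily fail to reject "no adjustment for HFA" even when the adjustment is built right into the process that generated the data.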
But if you look at the actual equation -- which says that home players are about twice as likely to be dropped as road players for the same level of underperformance -- it certainly looks like "not dumb" is a lot more likely than "dumb".
It's like this: suppose I want to sell lots of lottery tickets. So I claim that your chances of winning the Lotto 6/49 jackpot are 1 in 1000. Mathematicians and experts all say that I'm wrong, that the odds are really 1 in 13,983,816. But I don't think that's right, and I have a study to back me up!
I randomly sampled 500 ticket buyers, and, it turns out, none of them won the jackpot. But I ran an analysis on that dataset anyway. And you know what? I found that if the odds truly were 1 in 1000, the chance of nobody winning the jackpot would be about 60%. That's nowhere near significant -- not even close to the 5% level required to reject the null hypothesis that the odds are 1 in 1000!
What's the flaw in my logic? Well, technically, there isn't one: it's actually true that the data don't permit me to reject the 1 in 1000 hypothesis. But the data also don't permit me to reject the null hypothesis that the chances are 1 in 10,000, or 1 in 100,000, or 1 in 1,000,000, or 1 in 10,000,000, or 1 in 13,983,816, or 1 in 1,000,000,000,000,000,000! Why should I specifically focus on 1 in 1000? Only because I want that one to be true? That's not right.
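(That arithmetic is easy to verify -- here's the whole thing in a few lines of Python, just the chance of zero winners among 500 independent tickets under each hypothesized odds.)

    # Chance of zero jackpot winners among 500 independent tickets, for various odds.
    n_tickets = 500
    for odds in (1_000, 10_000, 100_000, 1_000_000, 10_000_000, 13_983_816):
        p_no_winner = (1 - 1 / odds) ** n_tickets
        print(f"1 in {odds:>10,}: chance of zero winners = {p_no_winner:.2f}")

Every one of those probabilities is far above 5%, so none of the hypotheses gets rejected -- including the true one.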
What I should have done, and what the authors should have done, is calculate a confidence interval. My confidence interval would run from about 1 in 168 all the way out to 1 in infinity, and the reader would see that, even though 1 in 1000 is inside the interval, so is the much more plausible 1 in 13,983,816.
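(The interval comes from solving for the largest win probability that still leaves at least a 5% chance of seeing zero winners in 500 tickets:)

    # One-sided 95% bound on the win probability p, given 0 winners in 500 tickets:
    # the largest p satisfying (1 - p)^500 >= 0.05.
    n_tickets = 500
    p_upper = 1 - 0.05 ** (1 / n_tickets)
    print(f"p <= {p_upper:.6f}, i.e. odds of roughly 1 in {1 / p_upper:.0f} or longer")

The shortest whole-number odds inside that interval are 1 in 168, which is where the figure above comes from.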
If the authors of this study had done that, they would have noticed that their confidence interval, which included "managers ignore home field advantage completely", also included "managers are perfectly rational." Not only is the "rational" hypothesis more plausible than the "dumb" hypothesis -- it sure does seem to fit the authors' data a lot better.