Are economists bad at statistics?
(Warning: this is a boring "how to interpret a regression" post, not much sports.)
Felix Salmon comments on a paper that presented academic economists with the results of a hypothetical regression, and asked them several questions about the results. It turned out that most of them got it right when they looked at a scatter plot of the raw data. But when they were given traditional regression results, as produced by statistical software, they blew it.
Specifically, for three of the four questions, a majority of the econometricians got the answer wrong when given only the regression results.
I'll translate one of the questions into baseball (using unrealistic numbers that I made up).
A regression finds that each point of OPS (that is, .001) is worth $20,000 in salary. The regression found an r-squared of 0.5. In the data, salary had an SD of $4 million, and OPS had an SD of about .141, which is what keeps those numbers consistent with an r-squared of 0.5.
1. What is the minimum OPS for which a player has a 95% chance of earning more than $10 million?
2. What minimum OPS would give a player a 95% chance of earning more money than a player with an OPS of .600?
3. Given that the confidence interval for salary-per-point-of-OPS is ($19,500, $20,500), if a player has an OPS of .800, what is the chance he will earn more than $15.6 million (which is $19,500 multiplied by 800)?
4. If a player has an OPS of .800, what is the chance he will earn more than $16 million (the point estimate)?
You should be able to figure all these out exactly, except #3 (which you can still estimate).
Here are my answers (there's a quick code check of the arithmetic after the last one):
1. The SD of salary was $4 million, and the r-squared was 0.5, so the regression leaves half the salary variance unexplained. Half the variance means the SD shrinks by a factor of the square root of 2, so the SD of unexplained salary is around $2.8 million.
You need the expected salary to be about 2 SDs above $10 million for a 95% chance of clearing it. That's about $15.7 million, which translates into an OPS of about .783.
(Actually, I think you need only 1.65 SDs, because it's one-tailed, but never mind.)
2. The SD of unexplained salary is $2.8 million, so the SD of the difference between two players' unexplained salaries is $4 million (the $2.8 million times the square root of 2). The .600 guy is expected to make $12 million. For a 95% chance of beating him, we add 2 SDs, giving $20 million. That works out to an OPS of 1.000.
3. The confidence interval is a bit of a red herring. My first reaction was to estimate the chance that the *coefficient* is really bigger than $19,500, which is .975. But that's not what's being asked. What's being asked is the chance a *player* earns more than $15.6 million.
With the point estimate of $20,000 per point, an .800 player is expected to earn $16 million, and the chance of him beating that figure would be exactly 50 percent. But we're being asked the chance he beats $15.6 million, which is a slightly lower bar, so the chance is a bit higher -- call it somewhere in the mid-50s (the exact number doesn't matter much).
4. Since the point estimate is unbiased, the chance of beating it is exactly 50 percent.
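If you'd rather check the arithmetic than take my word for it, here's a quick Python sketch using the made-up numbers from the question. Nothing in it comes from real data, and the 1.645 line is just the one-tailed version from the parenthetical above.

```python
from math import sqrt
from statistics import NormalDist

# Made-up numbers from the example above -- not real baseball data.
sd_salary = 4_000_000     # SD of salary
r_squared = 0.5           # r-squared of the regression
per_point = 20_000        # dollars per point (.001) of OPS

# SD of the salary the regression leaves unexplained: about $2.83 million.
sd_resid = sd_salary * sqrt(1 - r_squared)

# Question 1: minimum OPS for a 95% chance of earning more than $10 million,
# using 2 SDs (and 1.645 SDs for the one-tailed version).
for z in (2.0, 1.645):
    ops = (10_000_000 + z * sd_resid) / per_point / 1000
    print(f"Q1 (z = {z}): OPS of about {ops:.3f}")

# Question 2: minimum OPS for a 95% chance of out-earning a .600 player.
# The SD of the difference of two unexplained salaries is sqrt(2) times bigger.
sd_diff = sd_resid * sqrt(2)                          # about $4 million
ops2 = (600 * per_point + 2 * sd_diff) / per_point / 1000
print(f"Q2: OPS of about {ops2:.3f}")

# Questions 3 and 4: chance an .800 player earns more than $15.6 million,
# and more than the $16 million point estimate. (This ignores the tiny
# uncertainty in the coefficient itself, which barely moves the answer.)
salary_800 = NormalDist(mu=800 * per_point, sigma=sd_resid)
print(f"Q3: {1 - salary_800.cdf(15_600_000):.2f}")
print(f"Q4: {1 - salary_800.cdf(16_000_000):.2f}")
```

That spits out an OPS of about .783 (or .733 one-tailed) for question 1, 1.000 for question 2, about 0.56 for question 3, and 0.50 for question 4.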
These questions are much easier if you look at the scatter plot. I've stolen it from the original post:
[scatter plot from the original post: Y plotted against X]
The equivalent of question 1 was: what value of X do you need for a 95% chance the value of Y is greater than zero? That's really easy from the plot -- it looks like it's somewhere between 40 and 50.
Most of the economists got that.
But most got the first three questions wrong when they had the numbers. In their defense, though, those aren't really the kind of questions normally answered in academic papers. Normally, questions involving "95%" refer to the coefficient estimates, not the individual datapoints.
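To see why that distinction matters, take the quiz's own numbers. The $19,500-to-$20,500 interval pins down the *coefficient* very tightly, but an individual player still carries the full $2.8 million of unexplained salary around his prediction. Here's a rough sketch of the two intervals, assuming that confidence interval is roughly plus-or-minus two standard errors (an SE of about $250 per point):

```python
from math import sqrt

# Rough comparison of the two "95%" statements, using the made-up numbers.
se_coef = 250                           # assumed SE per point, read off the $19,500-$20,500 interval
sd_resid = 4_000_000 * sqrt(1 - 0.5)    # unexplained salary SD, about $2.83 million
ops_points = 800                        # a player with an OPS of .800
center = ops_points * 20_000            # $16 million point estimate

# 95% interval for his *expected* salary (coefficient uncertainty only)
half_ci = 2 * ops_points * se_coef
print(f"Expected salary: ${center:,} +/- ${half_ci:,}")

# 95% interval for his *actual* salary (coefficient uncertainty plus the unexplained spread)
half_pred = 2 * sqrt((ops_points * se_coef) ** 2 + sd_resid ** 2)
print(f"Actual salary:   ${center:,} +/- ${half_pred:,.0f}")
```

The expected salary is nailed down to within about $400,000; the actual salary is only nailed down to within about $5.7 million. The incorrect answers, in effect, used the first interval where the second one applies.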
So, I'm not convinced that, in every case, the results show a real flaw in their education; rather, I think some of the economists answered a different question, by force of routine.
My guess is that if you explained to some of the participants why their "red herring" answer was wrong, they'd say, "oh, right," and most of them would come up with the right answer.
But I might just be making excuses, because I fell for the red herring trap myself, at first.
I agree with Salmon that more of the economists should have been able to answer the questions. But I'm not sure about his conclusion:
" ... I see a paper demonstrating a statistically significant correlation between one variable and another, and I generally assume that if the experiment were repeated, we'd see the same thing again. But that's not actually true.
And so it's easy to see, I think, how economists become convinced of things the rest of us aren't sure of at all -- and how the economists often end up being wrong, while the rest of us were right to be dubious.
... A lot of papers are written; a few of them have interesting findings. Those are the papers which tend to get publicity. But there's a very good chance they don't actually show what the headlines say that they show."
Actually, I don't disagree with these statements: I *agree* with them, very much so. But I disagree that it has much to do with the economists being wrong about this quiz. Yes, it's true that the incorrect answers tended to discount the amount of randomness in a single observation, assuming that individual datapoints were clustered much closer to their estimate than they really are. But, strictly speaking, that has nothing to do with whether the experiment is repeatable, or whether the effect is real.
It's like, a study finds that smokers have a 20 percent chance of getting cancer, plus or minus 2 percent. And the incorrect economists say, "Joe Smith is a smoker. There's a 95 percent chance that between 16 and 24 percent of Joe's body will get sick."
The economists have missed the point, sure. But that doesn't affect how real the link is between cancer and smoking.
Hat tip: anonymous commenter in the previous post.