At the SABR Analytics Conference last month, a group of academics, led by Patrick Kilgo and Hillary Superak, presented some comments on the differences between academic sabermetric studies and "amateur" studies. The abstract and audio of their presentation are here (scroll down to "Friday"). They have also kindly allowed me to post their slides, in .pdf format, here.
I'm not going to comment on the presentation much right now ... I'm just going to go off on one of the differences they spoke about, from page 11 of their slides:
-- Classical sabermetrics often uses all of the data -- a census.
-- [Academic sabermetrics] is built for drawing inferences on populations, based on the assumption of a random sample.
That difference hadn't occurred to me before. But, yeah, they're right. You don't often see an academic paper that doesn't include some kind of formal statistical test.
That's true even when better methods are available. I've written about this before, about how academics like to derive linear weights by regression, when, as it turns out, you can get much more accurate results from a method that uses only logic and simple arithmetic.
So, why do they do this? The reason, I think, is that academics are operating under the wrong incentives.
If you're an academic, you need to get published in a recognized academic journal. Usually, that's the way to keep your job, and get promoted, and eventually get tenure. With few exceptions, nobody cares how brilliant your blog is, or how much you know about baseball in your head. It's your list of publications that's important.
So, you need to do your study in such a way that it can get published.
In a perfect world, if your paper is correct, whether you get published would depend only on the value of what you discover. But, ha! That's not going to happen. For one thing, when you write about baseball, nobody in academia knows the value of what you've discovered. Sabermetrics is not an academic discipline. No college has a sabermetrics department, or a sabermetrics professor, or even a minor in sabermetrics. Academia, really, has no idea of the state of the science.
So, what do they judge your paper on? Well, there are unwritten criteria. But one thing I'm pretty sure about is that your methodology must use college-level math and statistics. The more advanced, the better. Regression is OK. Logit regression is even better. Corrections for heteroskedasticity are good, as are methods to make standard errors more robust.
This is sometimes defended under the rubric of "rigor". But, often, the simpler methods are just as "rigorous" -- in the normal English sense of being thorough -- as the more complicated methods. Indeed, I'd argue that computing linear weights by regression is *less* rigorous than doing it by arithmetic. The regression is much less granular. It uses innings or games as its unit of data, instead of PA. Deliberately choosing to ignore at least 3/4 of the available information hardly qualifies as "rigor", no matter how advanced the math.
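For readers who haven't seen an arithmetic approach, here is a minimal sketch of one common version, the change-in-run-expectancy method. Which arithmetic method the post has in mind isn't stated, so treat this choice as an assumption. It first averages how many runs score from each base-out state through the end of the inning, then credits each event with the run expectancy after the play, minus the run expectancy before it, plus any runs that scored on the play. Every plate appearance is a data point, which is exactly the granularity an inning-level or game-level regression throws away. The play log, and the names `INNINGS`, `run_expectancy` and `linear_weights`, are invented for illustration; a real calculation would use a full season of play-by-play data.

```python
from collections import defaultdict

# Hypothetical play-by-play log, invented for illustration. Each inning is a
# list of plate appearances: (event, base_out_state_before, runs_scored_on_play).
# A base-out state is (runners, outs); "1-3" would mean runners on first and third.
INNINGS = [
    [("out", ("---", 0), 0), ("single", ("---", 1), 0), ("hr", ("1--", 1), 2),
     ("out", ("---", 1), 0), ("out", ("---", 2), 0)],
    [("walk", ("---", 0), 0), ("single", ("1--", 0), 0), ("out", ("12-", 0), 0),
     ("double", ("12-", 1), 1), ("out", ("-23", 1), 1), ("out", ("--3", 2), 0)],
    [("out", ("---", 0), 0), ("out", ("---", 1), 0), ("single", ("---", 2), 0),
     ("out", ("1--", 2), 0)],
]


def run_expectancy(innings):
    """Average runs scored from each base-out state through the end of the inning."""
    totals, counts = defaultdict(float), defaultdict(int)
    for inning in innings:
        runs = [r for _, _, r in inning]
        for i, (_, state, _) in enumerate(inning):
            totals[state] += sum(runs[i:])  # runs from this PA to the end of the inning
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


def linear_weights(innings):
    """Average run value per event: RE(after) - RE(before) + runs on the play."""
    re = run_expectancy(innings)
    totals, counts = defaultdict(float), defaultdict(int)
    for inning in innings:
        for i, (event, state, runs) in enumerate(inning):
            # The state after this PA is the next PA's starting state;
            # the end of the inning is worth zero further runs.
            re_after = re[inning[i + 1][1]] if i + 1 < len(inning) else 0.0
            totals[event] += re_after - re[state] + runs
            counts[event] += 1
    return {e: totals[e] / counts[e] for e in totals}


for event, value in sorted(linear_weights(INNINGS).items()):
    print(f"{event:>6}: {value:+.3f}")
```

With only three made-up innings the resulting values are meaningless; the point is that the whole calculation is counting, averaging and subtraction.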
Academics say they want "rigor," but what they really mean is "advanced methodology".
A few months ago, I attended a sabermetrics presentation by an academic author. He had a fairly straightforward method, and joked that he had to call his model "parsimonious," because if he used the word "simple," they'd be reluctant to publish it. We all laughed, but later on he told me he was serious. (And I believe him.)
If you want to know how many cars are in the parking lot today, April 10, you can do a census -- just count them. You'll get the right answer, exactly. But you can't get published. That's not Ph.D. level scholarship. Any eight-year-old can count cars and get the right answer.
So you have to do something more complicated. You start by counting the number of parking spots. Then, you take a random sample, and see if there's a car parked in it. That gives you a sample mean, and you can calculate the variance binomially, and get a confidence interval.
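To make the contrast concrete, here is a minimal sketch of both approaches, assuming a hypothetical lot of 500 spots with 312 cars. The census just counts. The sampling version scales up the sample proportion and attaches a plain binomial standard error (no finite-population correction) with a normal-approximation 95% interval. The function names and the lot itself are made up for illustration.

```python
import math
import random


def census(lot):
    """Exact answer: count every occupied spot (1 = car, 0 = empty)."""
    return sum(lot)


def sample_estimate(lot, n, z=1.96, seed=None):
    """Estimate the number of cars from a simple random sample of n spots.

    The sample proportion of occupied spots is scaled up to the whole lot,
    with a plain binomial standard error and a normal-approximation interval.
    """
    rng = random.Random(seed)
    total_spots = len(lot)
    sample = rng.sample(lot, n)
    p_hat = sum(sample) / n
    se = total_spots * math.sqrt(p_hat * (1 - p_hat) / n)
    estimate = total_spots * p_hat
    return estimate, (estimate - z * se, estimate + z * se)


# Hypothetical lot: 500 spots, 312 of them occupied.
lot = [1] * 312 + [0] * 188
exact = census(lot)
est, (lo, hi) = sample_estimate(lot, n=50, seed=1)
print(exact, round(est), (round(lo), round(hi)))
```

The census always returns 312. The sample returns an estimate and an interval that usually, but not always, contains 312, after more work.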
But again, that's just too simple: a t-test based on the binomial. You still won't get published. So, maybe you do this: you hang out in the parking lot for a few weeks, and take a detailed survey of parking patterns. (Actually, you get one of your grad students to do it.) Then, you run regressions based on all kinds of factors. What kind of sales were the stores having? What was the time of day? What was the price of gas? What day of the week was it? How close was it to a major holiday? How long did it take to find a parking spot?
So, now you're talking! You do a big regression on all this stuff, and you come up with a bunch of coefficients. That also gives you a chance to do those extra fancy regressiony tests. Then, finally, you plug in today's values (April 10) for all the independent variables, and, voila! You have an estimate and a standard error.
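The over-engineered version can be sketched too. Below is a minimal ordinary-least-squares fit in plain Python on an invented survey. The covariates (lunch hour, weekend, gas price), the "true" coefficients used to simulate the car counts, and today's covariate values are all hypothetical. It produces what the paragraph promises, a point estimate for today with a standard error, plus a t-statistic on the lunch-hour coefficient, where the census gave an exact count with no standard error at all.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row with the matching row of the identity matrix.
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols(X, y):
    """Fit y = X b + e; return coefficients, (X'X)^-1 and the residual variance."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)
    return beta, xtx_inv, s2


def predict(beta, xtx_inv, s2, x0):
    """Fitted mean at covariates x0 and its standard error, sqrt(s2 * x0' (X'X)^-1 x0)."""
    k = len(x0)
    yhat = sum(b * x for b, x in zip(beta, x0))
    var = s2 * sum(x0[i] * xtx_inv[i][j] * x0[j] for i in range(k) for j in range(k))
    return yhat, math.sqrt(var)


# Invented survey: columns are intercept, lunch_hour, weekend, gas_price.
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    lunch, weekend = rng.choice([0, 1]), rng.choice([0, 1])
    gas = rng.uniform(3.0, 4.5)
    X.append([1.0, lunch, weekend, gas])
    # Simulated car counts from made-up "true" coefficients plus noise.
    y.append(250 + 40 * lunch + 60 * weekend - 20 * gas + rng.gauss(0, 15))

beta, xtx_inv, s2 = ols(X, y)
t_lunch = beta[1] / math.sqrt(s2 * xtx_inv[1][1])  # t-statistic on the lunch coefficient
# Hypothetical covariates for "today": lunch hour, weekday, gas at 3.80.
estimate, se = predict(beta, xtx_inv, s2, [1.0, 1, 0, 3.8])
print(f"estimate {estimate:.1f} cars, standard error {se:.1f}, lunch t = {t_lunch:.1f}")
```

In practice you'd reach for a statistics library such as statsmodels rather than hand-rolled matrix code; the hand-rolled version just keeps the sketch dependency-free.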
Plus, this gives you a chance to discuss all the coefficients in your model. You may notice that the coefficient for "hour 6", which is 12pm to 1pm, is positive and significant at p=.002. You hypothesize that's because people like to shop at lunch time. You cite government statistics, and other sociological studies, that have also found support for the "meridiem emptor" hypothesis. See, that's evidence that your model is good!
And, everyone's happy. Sure, you did a lot more work than you had to, just to get a less precise estimate of the answer. But, at least, what you did was scholarly, and therefore publishable!
It seems to me that in academia, it isn't that important to get the right answer, at least in a field of knowledge that's not studied academically, like baseball. All journals seem to care about is that your methodology isn't too elementary, that you followed all the rules, and that your tone is suitably scholarly.
"Real" fields, like chemistry, are different. There, you have to get the right answer, and make the right assumptions, or your fellow Ph.D. chemists will correct you in a hurry, and you'll lose face. But, in sabermetrics, academics seem to care very little if their conclusions or assumptions about baseball are right or wrong. They care only that the regression appears to find something interesting. If it does, and their method is correct, they're happy. They did their job.
Sure, it could turn out that their conclusion is just an artifact of something about baseball that they didn't realize. But so what? They got published. Also, who can say they're wrong? Just low-status sabermetricians working out of their parents' basement. But the numbers in an academic paper, on the other hand ... those are rigorous!
And if the paper shows something that's absurd, so much the better. Because, nobody can credibly claim to know it's absurd -- it's what the numbers show, and it's been peer reviewed! Even better if the claim is not so implausible that it can't be rationalized. In that case, the author can claim to have scientifically overturned the amateurs' conventional wisdom!
The academic definition of "rigor" is very selective. You have to be rigorous about using a precise methodology, but you don't have to be rigorous about whether your assumptions lead to the right answer.
Just a few days ago, after I finished my first draft of this post, I picked up an article from an academic journal that deals with baseball player salaries. It's full of regressions, and attention to methodological detail. At one point, the authors say, "... because [a certain] variable is potentially endogenous in the salary equation, we conduct the Hausman (1978) specification test ..."
I looked up the Hausman specification test. It seems like a perfectly fine test, and it's great that they used it. When you're looking for a small effect, every little improvement helps. Using that test definitely contributed to the paper's rigor, and I'm sure the journal editors were pleased.
But, after all that effort, how did their study choose to measure player productivity? By slugging percentage.
Sometimes, academia seems like a doctor so obsessed with perfecting his surgical techniques that he doesn't even care that he's removing the wrong organ.