Another academic champion of peer review
Here's an interesting guest posting by Steve Walters at the "Wages of Wins" blog.
Walters warns us against putting too much faith in any individual sabermetric study,
"that we need to be careful before we conclude that some “study” by anyone actually “proves” something."
Which is actually excellent advice – studies are often flawed, or incomplete, and there's often statistical error. But I find Walters' arguments a little odd.
He starts out by mentioning Bill James' famous study (in his 1982 Abstract) on player aging. James analyzed all players born in the 30s, and found that they tended to peak, both individually and as a group, at age 27.
Ha! responds Walters. We shouldn't have believed James. Because, 20 years later, Professor Jim Albert did another study, and found that, indeed, while players born in the 30s did peak at 27, players born both before and after that decade peaked later. In fact, players born in the 1960s appear to have peaked at almost 30!
"James’s findings were… well, flukey. ... Why do I bring this up? Emphatically not to suggest that profs always know more than best-selling writers like Bill James. The point is that it’s actually damned hard to figure out what’s really, really true by sifting through numbers. Sometimes profs do it better than intelligent laymen, and sometimes the reverse is true."
(Ironically, a careful reading of both studies shows that they're not really contradictory. "Guy" points out in the comments to Walters' post that the James study was based on 502 hitters. The Albert study was based on only 61 hitters, those with at least 5,000 plate appearances. As Guy writes, "it’s possible — indeed, likely — that a sample of players with longer than average careers will have peaked at a later than average age."
Also, Guy notes that the two studies used different measures -- one used bulk value, and one used hitting rates. So, again, the two studies aren't constrained to give the same results.
So perhaps Walters jumps to conclusions when he argues that the "27" hypothesis is convincingly disproven. But I will proceed as if it is.)
Perhaps I'm hypersensitive on the subject of academic vs. non-academic researchers, but geez ... "best-selling writer?" "Intelligent layman?"
When it comes to sabermetrics, calling Bill James a "best-selling writer" is kind of missing the point, like calling Abraham Lincoln a "calligrapher" because he wrote the Gettysburg Address in longhand.
And calling Bill James an "intelligent layman" because he doesn't have a Ph.D. is like calling Adam Smith a layman because he never took an econometrics course. It's like calling Shakespeare a layman because he never studied King Lear at the doctoral level. It's like calling Isaac Newton a layman because he never took a high-school calculus course.
As for the broader point, this particular case is not a question about who does it "better." Bill James found, correctly, that one group of players peaked at 27. Jim Albert found, correctly, that players born in other decades peaked at other ages. This isn't a case of "better," or "worse". It's just the scientific method. One researcher extends the work of another, and sometimes finds slightly different results. That's how science proceeds.
From that standpoint, Walters is correct; it's always "damned hard to figure out what's really, really true." That's the case whether it's numbers, or whether it's medicine, or whether it's physics. A researcher publishes a result, and, if the evidence is convincing, it's accepted as true – but only until other evidence comes along. And when that happens, the original researcher is not at fault, nor has he done anything "worse" than the guy who proves his theory wrong.
But perhaps Walters just picked a bad example, or found Bill James to be an irresistibly juicy target. Because he immediately starts talking about errors in logic or methodology, rather than just insufficient evidence:
"[Sometimes] a researcher’s methodology may unintentionally twist things in a particular way. Or a boatload of statistical subtleties may confound things."
That's absolutely true. There are lots of studies, both academic and "amateur," that have flaws – huge, obvious flaws. (Some of them I've reviewed on this blog.) The Bill James study he quotes, though, isn't one of them. But they do exist, and in fairly large numbers.
So when should we trust, and when shouldn’t we? I'd argue that it's just common sense. Don't rely on anything just because it says so in a single paper. Assume that the more a result is cited and used, the more likely that it's been replicated, or found to be sound. If you see two conflicting results, try following the implications of the results and see which ones make sense. And if you're still not confident, read the paper itself and see if the methodology holds up.
Walters has a different solution: rely mostly on academic peer review.
Does academic peer review work in sabermetrics? I say no – I think academic peer review has largely failed to separate the good work from the flawed.
And peer review certainly wouldn't have worked in the Bill James case. How would the peer reviewers have noticed a flaw? And what would the referees have said?
"You know, Mr. James, that's a good piece of work. But it's possible that aging patterns for players born in the 30s may not be the same as for other decades. So we have to reject your paper."
Or maybe, "we weren't sure if the results are generalizable, so we pulled out our Sporting News guides, and spent three weeks repeating your study for all other decades. And it turns out they're different. So we're rejecting your paper."
Or, "Even though it's only 1982, how do we know that players born in the 1960s, who aren't even in the major leagues yet, won't peak closer to 30? So I'm afraid we have to reject your paper."
None of those seem very realistic ... I'm not sure how Mr. Walters thinks any economist, in 1982, would have spotted the results as "flukey." Or on what other criterion they would have rejected it. In every respect, the study is truly outstanding.
Even if you accept that academic peer review works for sabermetrics – which I think it doesn't – Walters admits that it takes "excruciating months" to go through the process "... and a few jealous, picky, anonymous rivals get to dissect our work."
And it's a basic theorem of economics that when you tax something, you get less of it. Forcing researchers to endure "excruciating months" of "dissection by jealous rivals" is a pretty hefty tax. And, fortunately, sabermetric research is something anyone can do. Retrosheet provides free, high-quality data to everyone. You don't need to dissect rats in a dedicated laboratory, or have access to expensive particle accelerators, in order to discover sabermetric knowledge.
As a result of these two factors, a low-tax alternative jurisdiction has sprung up. The non-academic sabermetric community, fathered by Bill James in the 80s, has flourished – and almost all our wealth of knowledge in the field today has come from those "amateurs." This is where the knowledge now resides. And it is my belief that of the true "peers" of the best sabermetric researchers today, at least ninety percent of them work outside of academia. There are several websites where studies can get instant evaluations from some of the best sabermetric researchers anywhere. Academic peer review simply cannot compete, not just in turnaround time, but also in quality.
One more excerpt:
"When you’re consuming statanalysis, ask yourself whether the author is an expert or pseudo-expert—and even then whether other experts have had a crack at debunking the work. (E.g., it’s notable that [The Wages of Wins] is from a renowned university press, and that much of the research on which it’s based was initially published in refereed journals.)"
It's ironic -- but on this last quote, I agree with Walters completely.