Another academic champion of peer review
Here's an interesting guest posting by Steve Walters at the "Wages of Wins" blog.
Walters warns us against putting too much faith in any individual sabermetric study, writing "that we need to be careful before we conclude that some “study” by anyone actually “proves” something."
Which is actually excellent advice – studies are often flawed, or incomplete, and there's often statistical error. But I find Walters' arguments a little odd.
He starts out by mentioning Bill James' famous study (in his 1982 Abstract) on player aging. James analyzed all players born in the 30s, and found that they tended to peak, both individually and as a group, at age 27.
Ha! responds Walters. We shouldn't have believed James. Because, 20 years later, Professor Jim Albert did another study, and found that, indeed, while players born in the 30s did peak at 27, players born both before and after peaked later. In fact, players born in the 1960s appear to have peaked at almost 30!
Walters writes, "James’s findings were… well, flukey. ... Why do I bring this up? Emphatically not to suggest that profs always know more than best-selling writers like Bill James. The point is that it’s actually damned hard to figure out what’s really, really true by sifting through numbers. Sometimes profs do it better than intelligent laymen, and sometimes the reverse is true."
---
(Ironically, a careful reading of both studies shows that they're not really contradictory. "Guy" points out in the comments to Walters' post that the James study was based on 502 hitters. The Albert study was based on only 61 hitters, those with at least 5,000 plate appearances. As Guy writes, "it’s possible — indeed, likely — that a sample of players with longer than average careers will have peaked at a later than average age."
Also, Guy notes that the two studies used different measures -- one used bulk value, and one used hitting rates. So, again, the two studies aren't constrained to give the same results.
So perhaps Walters jumps to conclusions when he argues that the "27" hypothesis is convincingly disproven. But I will proceed as if it is.)
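As an aside, Guy's selection-bias point is easy to demonstrate with a toy simulation. The sketch below is purely illustrative: it assumes, for the sake of argument, that players who peak later also tend to have longer careers, and all the numbers are made up rather than taken from the James or Albert data.

```python
import random

# Toy illustration of Guy's selection-bias point -- all numbers invented.
# Assumption (for illustration only): players who peak later also tend
# to accumulate more career plate appearances.
random.seed(0)

players = []
for _ in range(10_000):
    peak_age = random.gauss(27, 2)                      # hypothetical "true" peak age
    career_pa = random.gauss(3000, 1500) + 400 * (peak_age - 27)
    players.append((peak_age, career_pa))

def mean_peak(group):
    return sum(age for age, _ in group) / len(group)

long_career = [p for p in players if p[1] >= 5000]      # an Albert-style 5,000 PA cutoff

print(f"all players:             mean peak age {mean_peak(players):.1f}")
print(f"players with 5,000+ PA:  mean peak age {mean_peak(long_career):.1f}")
```

Filtering on career length shifts the observed average peak age upward even though nothing about the underlying aging pattern changed, which is exactly why the two studies can both be right.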
---
Perhaps I'm hypersensitive on the subject of academic vs. non-academic researchers, but geez ... "best-selling writer?" "Intelligent layman?"
When it comes to sabermetrics, calling Bill James a "best-selling writer" is kind of missing the point, like calling Abraham Lincoln a "calligrapher" because he wrote the Gettysburg Address in longhand.
And calling Bill James an "intelligent layman" because he doesn't have a Ph.D. is like calling Adam Smith a layman because he never took an econometrics course. It's like calling Shakespeare a layman because he never studied King Lear at the doctoral level. It's like calling Isaac Newton a layman because he never took a high-school calculus course.
As for the broader point, this particular case is not a question about who does it "better." Bill James found, correctly, that one group of players peaked at 27. Jim Albert found, correctly, that players born in other decades peaked at other ages. This isn't a case of "better," or "worse". It's just the scientific method. One researcher extends the work of another, and sometimes finds slightly different results. That's how science proceeds.
From that standpoint, Walters is correct; it's always "damned hard to figure out what's really, really true." That's the case whether it's numbers, or whether it's medicine, or whether it's physics. A researcher publishes a result, and, if the evidence is convincing, it's accepted as true – but only until other evidence comes along. And when that happens, the original researcher is not at fault, nor has he done anything "worse" than the guy who proves his theory wrong.
But perhaps Walters just picked a bad example, or found Bill James to be an irresistibly juicy target. Because he immediately starts talking about errors in logic or methodology, rather than just insufficient evidence: "[Sometimes] a researcher’s methodology may unintentionally twist things in a particular way. Or a boatload of statistical subtleties may confound things."
That's absolutely true. There are lots of studies, both academic and "amateur," that have flaws – huge, obvious flaws. (Some of them I've reviewed on this blog.) The Bill James study he quotes, though, isn't one of them. But they do exist, and in fairly large numbers.
So when should we trust, and when shouldn’t we? I'd argue that it's just common sense. Don't rely on anything just because it says so in a single paper. Assume that the more a result is cited and used, the more likely that it's been replicated, or found to be sound. If you see two conflicting results, try following the implications of the results and see which ones make sense. And if you're still not confident, read the paper itself and see if the methodology holds up.
Walters has a different solution: rely mostly on academic peer review.
Does academic peer review work in sabermetrics? I say no – I think academic peer review has largely failed to separate the good work from the flawed.
And peer review certainly wouldn't have worked in the Bill James case. How would the peer reviewers have noticed a flaw? And what would the referees have said?
"You know, Mr. James, that's a good piece of work. But it's possible that aging patterns for players born in the 30s may not be the same as for other decades. So we have to reject your paper."
Or maybe, "we weren't sure if the results are generalizable, so we pulled out our Sporting News guides, and spent three weeks repeating your study for all other decades. And it turns out they're different. So we're rejecting your paper."
Or, "Even though it's only 1982, how do we know that players born in the 1960s, who aren't even in the major leagues yet, may peak closer to 30? So I'm afraid we have to reject your paper."
None of those seem very realistic ... I'm not sure how Mr. Walters thinks any economist, in 1982, would have spotted the results as "flukey." Or on what other criterion they would have rejected it. In every respect, the study is truly outstanding.
Even if you accept that academic peer review works for sabermetrics – which I think it doesn't – Walters admits that it takes "excruciating months" to go through the process, during which "a few jealous, picky, anonymous rivals get to dissect our work."
And it's a basic theorem of economics that when you tax something, you get less of it. Forcing researchers to endure "excruciating months" of "dissection by jealous rivals" is a pretty hefty tax. And, fortunately, sabermetric research is something anyone can do. Retrosheet provides free, high-quality data to everyone. You don't need to dissect rats in a dedicated laboratory, or have access to expensive particle accelerators, in order to discover sabermetric knowledge.
As a result of these two factors, a low-tax alternative jurisdiction has sprung up. The non-academic sabermetric community, fathered by Bill James in the 80s, has flourished – and almost all our wealth of knowledge in the field today has come from those "amateurs." This is where the knowledge now resides. And it is my belief that at least ninety percent of the true "peers" of the best sabermetric researchers today work outside of academia. There are several websites where studies can get instant evaluations from some of the best sabermetric researchers anywhere. Academic peer review simply cannot compete, not just in turnaround time, but also in quality.
One more excerpt: "When you’re consuming statanalysis, ask yourself whether the author is an expert or pseudo-expert—and even then whether other experts have had a crack at debunking the work. (E.g., it’s notable that [The Wages of Wins] is from a renowned university press, and that much of the research on which it’s based was initially published in refereed journals.)"
It's ironic -- but on this last quote, I agree with Walters completely.
13 Comments:
Phil, I don't know if you've ever been through a full academic peer review, but for those who haven't, here's what really happens, at least in psychology, from someone who's had his work reviewed and been a reviewer for others.
Papers are anonymized so that if I'm reviewing a paper, I have no idea who wrote it. Reviewers are picked for their "expertise" in the field and their willingness to actually perform the review. Sometimes the latter outweighs the former. For example, I was asked to review an article on a specific kind of child abuse. The editor asked me to do this because I am one of the few people who have actually written on the subject. But the article was more about government policies in the UK on this type of abuse. I'm not a policy expert and I don't live in the UK. So, was I the best reviewer? I don't know who the other reviewers were (a good journal generally has 2-3 reviewers per paper), but I hope that they were more expert in that area than I.
When I submit a paper, I get anonymous feedback on all sorts of issues, from punctuation to methodology to statistical techniques to my conclusions. The reviewers are not there to re-run my analyses for me, just to make sure that my methods are sound. That's the gate to getting published.
There's a second "review" though that happens in academia. If your methods are clearly flawed and your data don't make sense, no one will bother to cite your data or talk about it. In other words, you can say you published, but no one cares.
Yes, some stuff slips through the cracks. The problem with some of the Sabermetric stuff that's come out in "real" journals is that the reviewers probably weren't Sabermetricians or baseball fans at all, but were just other Ph.Ds in some field. Academic review is not a cure-all, but it has its advantages. In psychology, we do a lot of study on different types of therapies for mental illness and people make treatment recommendations based on these studies. Think I'm going to listen to what someone on a random blog somewhere says? No, he needs to show me the data, explain his methods, and make his case to a panel of (hopefully) experts.
Yes, it takes months, sometimes years, to get something published (I just had something accepted after 2+ years of waiting!).
When I publish something on Statistically Speaking, people can easily dismiss that. No one's checking my work, after all, and really, other than my good word, what's to say that I didn't make it all up? But peer review means that I'm going to have to make my case to people with specific training in this area, who can see through some of the tricks that people can pull to try to weasel what they "think" into what "really is."
But, let's say that we had a Journal of Sabermetrics (ironically, I'd say BTN is the closest thing we have to that...). When I write a paper, it gets sent off to 2-3 of the "professors" of Sabermetrics. (An argument can be made here about who makes sense to be the reviewers... for example, actual Ph.D.s in Econ or Bio Stats? Other accomplished Sabermetricians?) Putting that aside, if I get published in this Journal of Sabermetrics, it comes with a certain amount of "you can trust this."
The downside is it means that not everyone with a blog who can parse Retrosheet files can be a Sabermetrician. One of the charms of the field is that anyone can do it.
I see the question like this: peer review is the key to academic respectability. If we want to be seen as respectable scientists, then the current system based on blogs and bulletin boards has to go, and the field will need to get a lot more elitist. Considering that this is a hobby for most people (we are, after all, analyzing a game), we may not care about being seen as actual scientists. But it means we have to put up with folks who say things like "Divide wins by ERA and multiply by the number of people in Sheboygan!"
My $.02, from someone who works inside the academic realm by day and publishes hack-ish studies on a random blog at night.
Phil makes a good case against the snobbery among some academics for discrediting, or more commonly ignoring, work not published in top-line peer-refereed journals. It's a sentiment I share, but probably for slightly different reasons. One's peers in science may just as easily be blinkered in what they consider 'correct' science and may refuse to look at any idea that does not conform to whatever passes for the current 'consensus' or paradigm. I get a taste of this in observing today the hostility among some scientists against anybody who is 'guilty' of 'climate change denial', which I regard as a most unscientific stance.
However, the post, despite its excellent approach, illustrates why 'peer review' has a useful role to play, in that it refers to Adam Smith, somewhat ambiguously, in what can be taken as an inappropriate example of the point being made (the example of Shakespeare in this context is spot on!):
“… Calling Bill James an "intelligent layman" because he doesn't have a Ph.D. is like calling Adam Smith a layman because he never took an econometrics course.”
It may not be appreciated by numerate readers of Wealth Of Nations today, but Adam Smith was an accomplished mathematician by 18th-century standards. He took a great deal of scholarly interest in mathematics as a student and later as an academic at Glasgow University.
Professor Robert Simson (1687-1768), a leading mathematician at Glasgow and a specialist in geometry, encouraged Smith's extra-curricular studies, and his fellow student Matthew Stewart (1718-1787), later professor of mathematics at Edinburgh University (1747-85), remarked to Dugald Stewart (his son and Smith's first biographer) on Smith's mathematical abilities in solving a 'geometrical problem of considerable difficulty' set as an exercise by Dr Simson (Stewart, D. 1793, Account of the Life and Writings of Adam Smith, LL.D). Smith was also a friend and 'intimate' correspondent of Jean le Rond d'Alembert (1717-83) (John Rae, Life of Adam Smith, 1895, p. 11), known for the 'd'Alembert principle' of motion, among many others.
Though Smith did not take courses in econometrics (and none of his contemporaries did either), he was not innumerate by any standards. And he held an LL.D. too. I think another comparison is warranted.
That is where peer review of an article provides a useful service – it might catch a 'slip' we leave in error before the profession reads it and it destroys the credibility of the good points we wish to make. In my experience, hostile critics pick on the most inconsequential of slips to discredit that which they cannot abide or people they don't like.
Thanks, Pizza.
Just to be clear, I do approve of peer review in principle. As you point out, stuff slips through the cracks because the reviewers aren't always the best at spotting problems in the sabermetric logic.
My argument is (a) academic peer review of sabermetrics doesn't seem to work all that well yet, and (b) unlike most other fields, it is possible for "amateurs" to produce world-class work. Therefore (c) please stop arguing that because your work passed academic peer review, it's so much more valuable than non-academic work.
Gavin -- I chose econometrics for the Adam Smith example because I assume the field didn't exist at the time and was built (partially) to better continue on his work. I wasn't trying to imply that Smith didn't have a mathematical background, and I think the fact that he did makes the point stronger. Or maybe I'm not understanding you fully.
I agree with you that peer review provides a useful service. It's just not a very reliable signal of quality -- at least not in sabermetrics, and not right now.
When it comes to sabermetrics, I don't think that the peer review thing matters all that much. In the first place, it should be remembered that baseball is a pastime. The consequences of mistakes in studies of baseball performance aren't the same as in many other fields, like medicine, for instance. Peer review becomes important in those fields, not so much in baseball.
Secondly, if the results of a sabermetric study are made available on certain sites on the web, like BTF, any flaws in it will soon be ferreted out by the accomplished sabermetricians who frequent those sites. I have seen this occur on many occasions.
In addition, there is a political angle to all this. There are now economists who hold university positions as sports economists and the like. They are judged on the basis of their publications. Hence, it is in their interest to limit the competition. One way to do that is to label those who are not members of the academic fraternity as "amateurs" and "laymen", and to create a situation where only those who have been academically trained have access to publication in the accepted journals. I don't mean to suggest that this is a conspiratorial process or anything like that. It is merely the result of academics pursuing their own best interests.
The main point here is one of sabermetrics being peer reviewed by academics. Just because it works in many fields, doesn't mean it will work in sabermetrics.
Just to use my co-author (Andy) as an example: he, I think, writes in tons of scientific journals. He is also, I know, a world-class sabermetrician. MGL doesn't write in any journals, and he's also a world-class sabermetrician. There's barely a blip of difference between them. The number 1 requirement is love and understanding of baseball. This overrides absolutely everything else.
If you talk about getting a strikeout as opposed to a flyout with a runner on 3B and less than 2 outs, a guy who "gets" baseball knows what that means. An academic who has a passing understanding of baseball will run a regression. PizzaCutter did some great research on pickoffs. There's just no way that an academic with a passing understanding of baseball would have been able to provide the insight we did.
When PizzaCutter says:
"If we want to be seen as respectable scientists, then the current system based on blogs and bulletin boards has to go, and the field will need to get a lot more elitist. "
And I say the opposite! If an academic wants to be seen as a respectable sabermetrician, then the system they're used to, of being peer reviewed by whoever is available, has to go. You need to be dissected by the best, or at least have a chance to be dissected by the best. There was a journal a year or two ago where some top academics looked at some baseball issues. And who did they have to rebut them? Bill James. And in print for all of us to see. I don't remember the journal or the issues, but I seem to remember James adding real insight to most of those articles.
PizzaCutter is like Andy... a guy who loves baseball first and foremost, who also happens to write technical papers for a living. Him, I'll listen to. These other guys who see the mountain of baseball data collected, and simply see baseball as a means to an end, a way to flex their numbers muscles, them, I'd be skeptical of. *They* need to impress me, not the other way around.
The Bill James critiques Tom refers to were in "Contingencies," a magazine for actuaries. Dan Fox linked to everything here.
I might also add that the peer review process is changing in some scientific fields. Grigori Perelman, who won the Fields Medal, the equivalent of the Nobel Prize for mathematics, published his work online. I don't believe that he ever published it in a journal. Many physicists and mathematicians now publish their papers online prior to journal publication in order to get additional feedback.
ChuckO: Is that what a "working paper" is -- a prepublication version that's made available for comments? If not, what IS a working paper?
Seems then that the question isn't peer review. We all poke holes in each other's work, so I guess the question is whether we're going to have recognized "experts" in the field. Then the question is what would constitute the proper credentials for being a "real" reviewer. Obviously, there are some people whose feedback I take more seriously than others. Perhaps what we need is a Ph.D. program in Sabermetrics (I'll sign up for that!) But that creates snobbery. In fact, that's one of the reasons I'm not sticking around academia after I'm done with my program.
Academics often don't have the actual love for or understanding of baseball. The average bored security guard with a love of baseball doesn't often have training in research methodology or stats.
Here are the stakes as I see them: Most of us got into this as a hobby (we are, after all, analyzing baseball), because we like numbers and we like sitting around and having an excuse to talk about baseball. I suppose peer review would endow a bit more respect on the field, but... it would make Sabermetrics sound a lot more like a job than a hobby.
Putting on my clinical psychologist hat, I dare say that we are in the middle of an identity crisis. Do we want to identify with academia or not?
Not.
You have to figure out the purpose first, the question: "How do I put out research of high quality?" And the best way is to:
- do the research
- get others who care about it to read it and give their opinion
The Wisdom of the Crowds approach works for sabermetrics. Look at a genius like David Smyth. Dude's a dentist (with more letters after his name than any academic!). He's the guy I want to reach. Any mechanism you suggest that puts him on the outside, or sets the barrier to entry high enough to keep him out, is not a good mechanism.
In short, what we have works.
What is needed is some sort of IMDB.com approach, where registered users can evaluate each piece in a centralized location, and provide ratings in various categories, along with commentary. When I feel like a snob, I'll read Roger Ebert.
Each user can not only rate the piece, but also rate the reviewer. The "accreditation" is done in a very google-like way.
In short, there is no more powerful mechanism in the world than the Internet. Leverage it.
"*They* need to impress me, not the other way around."
I think this is the key point. Thus far, what have academic economists contributed to our knowledge of the game on the field? (Leaving aside the business of baseball.) All I can think of is Bradbury's Mazzone study. As I said on that thread, I'm personally skeptical about the claims for Mazzone, but the article is well-regarded and the conclusions certainly haven't been disproven. So let's count it.
But what else? Anything of significance? It's just a very thin track record.....
I like Tom's idea.
Not only can you rate reviewers and research, but you can get some idea of any study's relevance by how often it's linked to by other studies, and how those other studies are rated.
Search for stuff in Google Scholar, and the search comes back ranked by number of citations. Imagine if you weighted that number by the ratings of the studies that cited them? You'd pretty easily be able to figure out which work is worthy and which isn't.
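To make that concrete, here's a minimal sketch of a rating-weighted citation count. The study names, ratings, and citation links are all invented for illustration; a real version would pull them from whatever centralized site hosted the reviews.

```python
# Hypothetical data: community ratings (0-10) and citation links between studies.
ratings = {"studyA": 9.1, "studyB": 4.2, "studyC": 7.8}
cites = {
    "studyA": ["studyC"],            # studyA builds on studyC
    "studyB": ["studyA", "studyC"],
    "studyC": [],
}

def weighted_citation_score(target, cites, ratings):
    """Sum the ratings of every study that cites `target`, so a link from a
    well-regarded study counts for more than one from a poorly rated study."""
    return sum(ratings[src] for src, refs in cites.items() if target in refs)

for study in sorted(ratings):
    print(study, weighted_citation_score(study, cites, ratings))
```

That's roughly the Google-style "accreditation" Tom describes: the weight of an endorsement depends on how the community rates the endorser.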