Evaluating scientific debates: some ramblings
Last week's renewed debate on JC Bradbury's aging study (JC posted a new article to Baseball Prospectus, and comments followed there and on "The Book" blog) got me thinking about some things that are tangential to the study itself ... and since I have nothing else to write about at the moment, I thought I'd dump some of those random thoughts here.
1. Peer review works much better after publication than before.
When there's a debate between academics and non-academics, some observers argue that the academics are more likely to be correct, because their work was peer reviewed, while the critics' work was not.
I think it's the other way around. I think post-publication reaction, even informally on the internet, is a much better way to evaluate the paper than academic peer review.
Why? Because academic peer reviewers hear only one side of the question -- the author's. At best, they might have access to the comments of a couple of other referees. That's not enough.
After publication, on the internet, there's a back and forth between people on one side of the question and people on the other. That's the best way to get at the truth -- to have a debate about it.
Peer review is like the police deciding there's enough evidence to lay charges. Post-publication debate is like two lawyers arguing the case before a jury. It's when all the evidence is heard, not just the evidence on one side.
More importantly, no single peer reviewer has as good a mastery of previous work on a subject as the collective mastery of the public. I may be an OK peer reviewer, but you know who's a better peer reviewer? The combination of me, and Tango, and MGL, and Pizza Cutter, and dozens of other informed sabermetricians, some of whom I might only meet through the informal peer review process of blog commenting.
If you took twelve random sabermetricians whom I respect, and they unanimously came to the verdict that paper X is flawed, I would be at least 99% sure they were right and the journal's peer reviewers were wrong.
2. The scientific consensus matters if you're not a scientist.
It's a principle of the scientific method that only evidence and argument count -- the identity of the arguer is irrelevant.
Indeed, there's a fallacy called "argument from authority," where someone argues that a particular view must be correct because the person espousing it is an expert on the subject. That's wrong because even experts can be wrong, and even the expertest expert has to bow to logic and evidence.
But that's a formal principle that applies to situations where you're trying to judge an argument on its merits. Not all of us are in a position to be able to do that all the time, and it's a reasonable shortcut in everyday life to base your decision on the expertise of the arguer.
If my doctor tells me I have disease X, and the guy who cleans my office tells me he saw my file and he thinks I really have disease Y ... well, it's perfectly legitimate for me to dismiss what the office cleaner says, and trust my doctor.
It only becomes "argument from authority" where I assert that I am going to judge the arguments on their merits. Then, and only then, am I required to look seriously at the office cleaner's argument, without being prejudiced by the fact that he has zero medical training.
Indeed, we make decisions based on authority all the time. We have to. There are many claims that are widely accepted, but still have a following of people who believe the opposite. There are people who believe the government is covering up UFO visits. There are people who believe the world is flat. There are people who believe 9/11 was an inside job.
If you're like me, you don't believe 9/11 was an inside job. And, again, if you're like me, you can't actually refute the arguments of those who do believe it. Still, your disbelief is rational, and based solely on what other people have said and written, and your evaluations of their credibility.
Disbelieving solely because of experts is NOT the result of a fallacy. The fallacy only happens when you try to use the experts as evidence. Experts are a substitute for evidence.
You get your choice: experts or evidence. If you choose evidence, you can't cite the experts. If you choose experts, you can't claim to be impartially evaluating the evidence, at least that part of the evidence on which you're deferring to the experts.
The experts are your agents -- if you look to them, it's because you are trusting them to evaluate the evidence in your stead. You're saying, "you know, your UFO arguments are extraordinary and weird. They might be absolutely correct, because you might have extraordinary evidence that refutes everyone else. But I don't have the time or inclination to bother weighing the evidence. So I'm going to just defer to the scientists who *have* looked at the evidence and decided you're wrong. Work on convincing them, and maybe I'll follow."
The reason I bring this up is that, over at BPro, MGL made this comment:
"I think that this is JC against the world on this one. There is no one in his corner that I am aware of, at least that actually does any serious baseball work. And there are plenty of brilliant minds who thoroughly understand this issue who have spoken their piece. Either JC is a cockeyed genius and we (Colin, Brian, Tango, me, et. al.) are all idiots, or..."
Is that comment relevant, or is it a fallacious argument from authority? It depends. If you're planning on reading all the studies and comments, and reaching a conclusion based on that, then you should totally ignore it -- whether an argument is correct doesn't depend on how many people think it is.
But if you're just reading casually and trying to get an intuitive grip on who's right, then it's perfectly legitimate.
And that's how MGL meant it. What he's saying is something like: "I've explained why I think JC is wrong and I'm right. But if you don't want to wade through all that, and if you're basing your unscientific decision on which side seems more credible -- which happens 99% of the time that we read opposing opinions on a question of scientific fact -- be aware that the weight of expert opinion is on my side."
Put that way, it's not an appeal to authority. It's a true statement about the scientific consensus.
3. Simple methods are often more trustworthy than complex ones.
There are lots of studies out there that have found that the peak age for hitters in MLB is about 27. There is one study, JC Bradbury's, that shows a peak of 29.
But it seems to me that there is a perception, in some quarters, that because JC's study is more mathematically sophisticated than the others, it's therefore more trustworthy. I think the opposite: that the complicated methods JC used make his results *less* believable, not more.
I've written before about simpler methods, in the context of regression and linear weights. Basically, there are two different methods that have been used to calculate the coefficients for the linear weights formula. One involves doing a regression. Another involves looking at play-by-play data and doing simple arithmetic. The simple method actually works better.
More importantly, for the argument I'm making here, the simple method is easily comprehensible, even without stats classes. It can be explained in a few sentences to any baseball fan of reasonable intelligence. And if you're going to say you know a specific fact, like that a single is worth about .46 runs, it's always nicer to know *why* than to have to trust someone else, who used a mathematical technique you don't completely understand.
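To make that concrete, here's a minimal sketch of the simple method in Python. The run-expectancy numbers and the handful of play-by-play records are invented for illustration -- the real calculation does the same arithmetic over a full season or more of plays, with all 24 base-out states -- but the logic here is the whole method: the value of an event is the average change in run expectancy, plus any runs that scored, over every time that event occurred.

```python
from collections import defaultdict

# Run expectancy by (runners, outs) state. These numbers are illustrative,
# not real ones; a real table has all 24 base-out states, built by averaging
# runs scored to the end of the inning from each state.
RUN_EXPECTANCY = {
    ("empty",   0): 0.48, ("empty",   1): 0.26, ("empty",   2): 0.10,
    ("1st",     0): 0.85, ("1st",     1): 0.51, ("1st",     2): 0.22,
    ("1st-2nd", 1): 0.89,
    ("end",     3): 0.00,   # third out: no more runs can score this inning
}

# A handful of hypothetical plays: (event, state before, state after, runs scored on the play).
plays = [
    ("single", ("empty", 0), ("1st",     0), 0),
    ("single", ("1st",   1), ("1st-2nd", 1), 0),
    ("out",    ("empty", 0), ("empty",   1), 0),
    ("out",    ("1st",   2), ("end",     3), 0),
]

totals = defaultdict(float)
counts = defaultdict(int)

for event, before, after, runs in plays:
    # Value of this play = change in run expectancy, plus any runs that scored.
    change = RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs
    totals[event] += change
    counts[event] += 1

for event in sorted(totals):
    print("%-6s %+.3f runs" % (event, totals[event] / counts[event]))
```

Run that same arithmetic over real play-by-play data and the single comes out in the neighborhood of .46 runs -- and now you know where the number comes from, instead of taking it on faith.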
Another advantage of the simple technique is that, because so many more people understand it, its pros and cons are discovered early. A complex method can have problems that don't get found out until much later, if ever.
For instance, how much do hitters lose in batting skill between age 28 and age 35? Well, one way to find out is to average the performance of 28-year-olds, and compare it to the averaged performance of 29-year-olds, 30-year-olds, and so on, up to 35-year-olds. Pretty simple method, right, and easy to understand? If you do it, you'll find there's not much difference among the ages. You might conclude that players don't lose much between 28 and 35.
But there's an obvious flaw: the two groups don't comprise the same players. Only above-average hitters stay in the league at 35, so you're comparing good players at 35 to all players at 28. That's why they look similar: the average of a young Joe Morgan and a young Roy Howell looks similar to the average of an old Joe Morgan and a retired, zero-at-bat Roy Howell, even though Morgan and Howell each declined substantially in the intervening seven years.
Now that flaw ... it's easy to spot, and the reason it's easy to spot is that the method is simple enough to understand. It's also easy to explain, and the reason it's easy to explain is again that the method is simple enough to understand.
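If you want to see the selection effect in actual numbers, here's a toy simulation in Python. Everything in it -- the spread of ability, the size of the per-year decline, the cutoff for staying in the league at 35 -- is made up for illustration. Every simulated player declines by exactly the same amount, and yet the group averages at 28 and 35 come out nearly identical, because only the best players are still around at 35.

```python
# A toy simulation of the selection effect. All numbers (ability spread,
# decline rate, survival cutoff) are hypothetical, for illustration only.

import random

random.seed(0)

DECLINE_PER_YEAR = 0.004    # every player loses this much per year, by construction
SURVIVAL_CUTOFF = 0.270     # only players still above this at 35 remain in the league

ability_at_28 = [random.gauss(0.280, 0.020) for _ in range(10000)]
ability_at_35 = [a - 7 * DECLINE_PER_YEAR for a in ability_at_28]

# Everyone is in the sample at 28; only the survivors are in the sample at 35.
sample_28 = ability_at_28
sample_35 = [a for a in ability_at_35 if a >= SURVIVAL_CUTOFF]

print("average at 28 (all players):   %.3f" % (sum(sample_28) / len(sample_28)))
print("average at 35 (survivors):     %.3f" % (sum(sample_35) / len(sample_35)))
print("true decline for every player: %.3f" % (7 * DECLINE_PER_YEAR))
```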
If I use the more complicated method of linear regression (and a not very complicated regression), and describe it mathematically, it looks something like this:
"I ran an ordinary least squares regression, using the model P(it) = ax(it) + b[x(it)^2] + e, where P(it) is the performance of player i at age t, x(it) is the age of player i at age t, and all player-seasons of less than 300 PA were omitted. The e is an error term, assumed iid normal with mean 0."
The flaw is actually the same as in the original, simpler case -- the fact that the sample of players is different at each age. But it's harder to see the flaw that way, isn't it? It's also harder to describe where the flaw resides -- there's no easy one-sentence explanation about Morgan and Howell like there was before.
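To see that it really is the same flaw, here's the regression version as a toy simulation, again with made-up numbers. The model is an OLS fit of performance on age and age squared (with an intercept), in the spirit of the quoted description; the "300 PA" cutoff is stood in for by a simple performance threshold; and every simulated player truly declines at the same rate. The regression runs without complaint and produces tidy coefficients -- it just quietly understates the decline, for exactly the same reason the simple averages did.

```python
# A toy version of the regression approach, with the same selection flaw
# baked in. All numbers here (ability spread, decline rate, the cutoff that
# stands in for the 300 PA minimum) are hypothetical, for illustration only.

import numpy as np

rng = np.random.default_rng(0)

TRUE_DECLINE_PER_YEAR = 0.004   # every player loses this much per year, by construction
CUTOFF = 0.255                  # a season enters the sample only if performance beats this

ages = np.arange(28, 36)
abilities = rng.normal(0.280, 0.020, size=5000)   # each player's ability at age 28

age_col, perf_col = [], []
for ability in abilities:
    for age in ages:
        perf = ability - TRUE_DECLINE_PER_YEAR * (age - 28) + rng.normal(0, 0.010)
        if perf >= CUTOFF:      # the selection step, hidden away in "data cleaning"
            age_col.append(age)
            perf_col.append(perf)

age_col = np.array(age_col, dtype=float)
perf_col = np.array(perf_col)

# OLS of performance on age and age squared, over the selected player-seasons only.
X = np.column_stack([np.ones_like(age_col), age_col, age_col ** 2])
coefs, *_ = np.linalg.lstsq(X, perf_col, rcond=None)

def fitted(age):
    return coefs[0] + coefs[1] * age + coefs[2] * age ** 2

print("true decline, 28 to 35:   %+.3f" % (-TRUE_DECLINE_PER_YEAR * 7))
print("fitted decline, 28 to 35: %+.3f" % (fitted(35.0) - fitted(28.0)))
```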
So why would you trust the complicated method more than the simple one?
Now, I'm not saying that complexity is necessarily bad. A complex method might be more precise, and give you better results, assuming that there aren't any flaws. But, you still have to check for flaws. If the complex method gives you substantially different results (peak age 29) from the simple methods (peak age 27), that's a warning sign. And so you have to explain the difference. Something must be wrong, either with the complex method, or with all the simple methods. It's not enough to just explain why the complex method is right. You also have to explain why the simple methods, which came up with 27, came out so wrong.
In the absence of a convincing explanation, all you have are different methods, and no indication of which is more reliable. In that case, why would you choose to trust the complicated method that you don't understand, but reject the simple methods that you *do* understand? The only reason for doing so is faith that whoever introduced the complicated method got everything right -- the method, the calculations, and the logic.
I don't think that's justified. My experience leads me to think that it's very, very risky to give that kind of blind trust without understanding the method pretty darn well.