Properly interpreting data is more than just "numeracy"
Here's a recent post from Freakonomist Stephen Dubner on numeracy. Dubner quotes John Allen Paulos on a specific, interesting example of how numbers can mislead:
"Consider the temptation to use the five-year survival rate as the primary measure of a treatment for a particular disease. This seems quite reasonable, and yet it’s possible for the five-year survival rate for a disease in one region to be 100 percent and in a second region to be 0 percent, even if the latter region has an equally effective and cheaper approach.
"This is an extreme and hypothetical situation, but it has real-world analogues. Suppose that whenever people contract the disease, they always get it in their mid-60s and live to the age of 75. In the first region, an early screening program detects such people in their 60s. Because these people live to age 75, the five-year survival rate is 100 percent. People in the second region are not screened and thus do not receive their diagnoses until symptoms develop in their early 70s, but they, too, die at 75, so their five-year survival rate is 0 percent. The laissez-faire approach thus yields the same results as the universal screening program, yet if five-year survival were the criterion for effectiveness, universal screening would be deemed the best practice."
Dubner argues that, in order to better spot these kinds of situations, we need a public that's more numerate.
But I don't think numeracy is really what's needed here: spotting the problem doesn't take mathematics, just common logic. Really, you can rewrite Paulos's situation using almost no numbers at all (except numbers of years, which everyone understands):
"Suppose that if you're diagnosed with a certain disease in one region, you always die within five years. But if you're diagnosed with the same disease in another region, you *never* die in five years. Does that mean that treatment is better in the second region? No, not necessarily. It could be that the disease takes ten years to kill you, and no treatment helps. In the "bad" region, it's not diagnosed until year eight. In the "good" region, they have universal screening, and the disease is always diagnosed in year one. So even if the universal screening does no good at all, it *looks* like it does."
All but the most innumerate person should be able to understand that, right? It's not really a question of mathematics: it's a question of *logic*. Sure, the more used to numbers you are, the more likely you are to think of this possibility -- at least to a certain extent. But, in this particular case, having an open mind and being able to think clearly are more important than mathematical ability.
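To show how little arithmetic is involved, here's a minimal sketch of the rewritten example in Python. The numbers (a ten-year disease, diagnosis in year one versus year eight) come straight from the example; the function and variable names are made up for illustration, and I'm assuming that "year N" simply means N years after onset.

```python
# Minimal sketch of the example above; all names are hypothetical.
DISEASE_DURATION_YEARS = 10  # from the example: death comes in year ten, treated or not


def five_year_survival_rate(diagnosis_years):
    """Share of patients still alive five years after their diagnosis."""
    survivors = [
        year for year in diagnosis_years
        if DISEASE_DURATION_YEARS - year >= 5
    ]
    return len(survivors) / len(diagnosis_years)


good_region = [1] * 1000  # universal screening: always diagnosed in year one
bad_region = [8] * 1000   # no screening: not diagnosed until year eight

print(five_year_survival_rate(good_region))  # 1.0
print(five_year_survival_rate(bad_region))   # 0.0
```

Nobody in either region lives a day longer than anyone else, yet the two rates come out at opposite extremes. The only thing that differs is when the clock starts.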
And it needs a certain amount of creativity, cleverness, interest and skepticism. And data. Even a Ph.D. in math wouldn't necessarily be able to figure out what's going on, because you need the right details. Suppose a newspaper editorial calling for pre-screening simply said,
"In region A, which prescreens for the disease, five-year survival rates after diagnosis are 100%. In region B, which does not, the survival rate is zero. This evidence strongly suggest that we need to encourage prescreening."
If you'd never seen this example before, would you immediately realize the possibility that the pre-screening was no use at all? I probably wouldn't, and I regularly read academic studies with a skeptical eye. How would you expect a guy skimming the morning paper on his way to work to think of it?
Sure, I might think of it if I had seen other studies that had shown pre-screening to be ineffective. Then, I'd think, geez, how did this study come up with such a contradictory result? And maybe I'd then figure it out. But I'd have to care enough to want to think about it skeptically.
And even if I'd figured out that was a logical possibility, consistent with the numbers ... a possibility doesn't mean it's actually happening, right? I might correctly think, "yeah, it could be just a question of when the disease is diagnosed ... but is it really likely that that's *all* of it? It's more likely that *some* of the effect is early diagnosis, and the rest of the effect is more time to treat after diagnosis."
To know for sure, it's not enough to be numerate, or reasonable, or logical. You need more data. But the data might not exist. If you're lucky, the issue arises from a study somewhere, and you can read that study and figure out what's going on. But if it's just raw data, and all that's being recorded is the five-year survival rate, all you can do is note that both hypotheses fit the data: that prescreening always helps, and that it never helps.
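Here's a rough sketch, in Python, of why the recorded numbers can't settle it. It builds two toy worlds, one where early treatment does nothing and one where it helps a lot, and shows that both produce identical five-year survival rates. The diagnosis years are the ones from the rewritten example; the 20-year figure in the "always helps" world is purely my assumption for illustration, as are all the names.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    death_year_if_screened: int    # year of death, counted from onset
    death_year_if_unscreened: int


SCREENED_DIAGNOSIS_YEAR = 1    # as in the rewritten example
UNSCREENED_DIAGNOSIS_YEAR = 8  # as in the rewritten example


def observed_rates(h):
    """Return the only thing recorded: (screened, unscreened) five-year survival rates."""
    screened = 1.0 if h.death_year_if_screened - SCREENED_DIAGNOSIS_YEAR >= 5 else 0.0
    unscreened = 1.0 if h.death_year_if_unscreened - UNSCREENED_DIAGNOSIS_YEAR >= 5 else 0.0
    return screened, unscreened


# World 1: treatment is useless, everyone dies in year ten.
never_helps = Hypothesis("never helps", 10, 10)
# World 2 (assumed numbers): early treatment postpones death from year 10 to year 20.
always_helps = Hypothesis("always helps", 20, 10)

for h in (never_helps, always_helps):
    print(h.name, observed_rates(h))
# never helps (1.0, 0.0)
# always helps (1.0, 0.0)
```

To tell the two worlds apart you'd need something the raw data doesn't record, such as age at death or time from onset.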
"Footnotes, I guess, and transparency, and a generally higher level of numeracy among the populace."
All of which are good things. But "footnotes" assume that the authors of the study realize what might be going on. From what I've seen in sports studies, that's generally not the case. My experience is that authors do just fine at the "numeracy" stuff, dealing with the numbers and doing the regressions and statistical tests. But they're weaker at properly understanding what the numbers actually *mean*. I'd bet you that eight out of ten researchers who found data matching Paulos's example wouldn't think of Paulos's explanation. Peer reviewers would be even more likely to miss it.
Since Dubner's kind of "numeracy" is so difficult, isn't it more efficient to better use what's already out there, instead of making what would probably be a futile attempt to create more of it? If you really want to try to solve the problem of improper conclusions in studies like this, just get lots of people to look at the study, and give those people incentives to spot these kinds of things.
Here's how such a plan might work:
-- a researcher submits a study to the journal, as normal, and peer review proceeds as normal.
-- after deciding to provisionally accept the paper, the journal immediately posts it online, and allows for online comments and discussion.
-- after a suitable interval, a second set of senior referees reviews the comments and peer-reviews the paper again in light of those comments.
-- those peer reviewers allocate a fixed sum of money per paper -- $500, say -- among the commenters, in proportion to the usefulness of their comments, whether the paper gets published or not.
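The last step, splitting the money in proportion to usefulness, could be sketched like this in Python. This is only one possible scheme: the function name, the integer usefulness scores, the commenter names, and the rule for handing out leftover cents are all assumptions of mine, not part of the proposal.

```python
def allocate_reward(usefulness, total_cents=50_000):
    """Split a fixed pot among commenters in proportion to their usefulness scores.

    `usefulness` maps a commenter's name to a non-negative integer score
    assigned by the referees. Amounts are in cents so the shares add up exactly.
    """
    total_score = sum(usefulness.values())
    if total_score == 0:
        # Assumption: if no comment was useful, nobody is paid.
        return {name: 0 for name in usefulness}

    shares = {
        name: total_cents * score // total_score
        for name, score in usefulness.items()
    }
    # Integer division can leave a few cents over; hand them to the top scorers.
    leftover = total_cents - sum(shares.values())
    for name in sorted(usefulness, key=usefulness.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


scores = {"alice": 5, "bob": 3, "carol": 2}  # hypothetical referee scores
print(allocate_reward(scores))  # {'alice': 25000, 'bob': 15000, 'carol': 10000}
```

With those made-up scores, the $500 splits into $250, $150 and $100. Working in cents keeps the payouts summing to exactly the fixed amount.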
The benefits of a system like this are pretty clear: a potentially unlimited number of peer reviewers, motivated by money and reputation, are more likely to spot flaws than a couple of anonymous peer reviewers who are perhaps motivated to go easy on their peers. That means a lot fewer flawed papers, and a lot fewer press reports of incorrect conclusions -- conclusions that newspaper readers, no matter how intelligent or numerate, aren't given enough evidence to question.
There are other beneficial side effects beyond the obvious advantage of fewer fallacious papers:
-- the system provides an incentive for authors to give their conclusions more careful thought before submitting, saving the system time and money.
-- it provides a way for smart grad students to make a little extra money to support themselves.
-- it's cheap. I bet $500 a paper would be enough to clear most flaws. If you figure how long a paper takes to write, and how much professors get paid, $500 a study is a drop in the bucket compared to the cost of the original study.
-- the record of corrections makes it clear to everyone in and out of academia what kinds of flaws to avoid. (Within a few months, nobody will be neglecting to talk about regression to the mean, and everyone will know what selective sampling is.)
-- it will allow for a more accurate measure of academic accomplishment. Since academia is a competitive environment with no formal scorekeeping, anything that allows a more accurate rating of a researcher's proficiency, even a subjective one, has to be a good thing.
But I don't think this will ever happen, for political reasons. Nobody likes to have their errors exposed publicly. The current system, where peer review is mostly private, allows everyone to save face.
Furthermore, professors would probably be somewhat reluctant to criticize other professors, so the most effective critiquing would come from outside of academia. I am pretty certain that academic Ph.D.s, as a group, will never stand for a system that lowers their status by having laymen publicly correct their mistakes.