Sabermetric Research: Properly interpreting data is more than just "numeracy"

Friday, May 21, 2010

Properly interpreting data is more than just "numeracy"

Here's a recent post from Freakonomist Stephen Dubner on numeracy. Dubner quotes John Allen Paulos on a specific, interesting example of how numbers can mislead:

"Consider the temptation to use the five-year survival rate as the primary measure of a treatment for a particular disease. This seems quite reasonable, and yet it’s possible for the five-year survival rate for a disease in one region to be 100 percent and in a second region to be 0 percent, even if the latter region has an equally effective and cheaper approach.

"This is an extreme and hypothetical situation, but it has real-world analogues. Suppose that whenever people contract the disease, they always get it in their mid-60s and live to the age of 75. In the first region, an early screening program detects such people in their 60s. Because these people live to age 75, the five-year survival rate is 100 percent. People in the second region are not screened and thus do not receive their diagnoses until symptoms develop in their early 70s, but they, too, die at 75, so their five-year survival rate is 0 percent. The laissez-faire approach thus yields the same results as the universal screening program, yet if five-year survival were the criterion for effectiveness, universal screening would be deemed the best practice."

Dubner argues that, in order to better spot these kinds of situations, we need a public that's more numerate.

But, I think, numeracy isn't really what's needed here: it's not really mathematics, but just common logic. Really, you can rewrite Paulos's situation involving almost no numbers at all (except numbers of years, which everyone understands):

"Suppose that if you're diagnosed with a certain disease in one region, you always die within five years. But if you're diagnosed with the same disease in another region, you *never* die in five years. Does that mean that treatment is better in the second region? No, not necessarily. It could be that the disease takes ten years to kill you, and no treatment helps. In the "bad" region, it's not diagnosed until year eight. In the "good" region, they have universal screening, and the disease is always diagnosed in year one. So even if the universal screening does no good at all, it *looks* like it does."

All but the most innumerate person should be able to understand that, right? It's not really a question of mathematics: it's a question of *logic*. Sure, the more used to numbers you are, the more likely you are to think of this possibility -- at least to a certain extent. But, in this particular case, having an open mind and being able to think clearly are more important than mathematical ability.
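To make the logic concrete, here is a minimal sketch of the rewritten example in Python. It assumes, as in the example, that the disease kills exactly ten years after onset and that no treatment changes that; the function names and the 1,000-patient region sizes are made up for illustration.

```python
# Assumption taken from the example: the disease always kills ten years
# after onset, and treatment makes no difference.
DISEASE_DURATION_YEARS = 10


def survives_five_years(diagnosis_year: int) -> bool:
    """True if the patient is still alive five years after diagnosis."""
    years_left_after_diagnosis = DISEASE_DURATION_YEARS - diagnosis_year
    return years_left_after_diagnosis >= 5


def five_year_survival_rate(diagnosis_years: list[int]) -> float:
    """Share of patients who survive at least five years past diagnosis."""
    survivors = sum(survives_five_years(year) for year in diagnosis_years)
    return survivors / len(diagnosis_years)


# Hypothetical regions of 1,000 patients each.
screened_region = [1] * 1000    # universal screening: diagnosed in year one
unscreened_region = [8] * 1000  # no screening: diagnosed in year eight

print(five_year_survival_rate(screened_region))    # 1.0
print(five_year_survival_rate(unscreened_region))  # 0.0
```

Every patient in both regions dies in year ten. The only thing that differs is when the clock starts.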

And it needs a certain amount of creativity, cleverness, interest and skepticism. And data. Even a Ph.D. in math wouldn't necessarily be able to figure out what's going on, because you need the right details. Suppose a newspaper editorial calling for pre-screening simply said,

"In region A, which prescreens for the disease, five-year survival rates after diagnosis are 100%. In region B, which does not, the survival rate is zero. This evidence strongly suggests that we need to encourage prescreening."

If you'd never seen this example before, would you immediately realize the possibility that the pre-screening was no use at all? I probably wouldn't, and I regularly read academic studies with a skeptical eye. How would you expect a guy skimming the morning paper on his way to work to think of it?

Sure, I might think of it if I had seen other studies that had shown pre-screening to be ineffective. Then, I'd think, geez, how did this study come up with such a contradictory result? And maybe I'd then figure it out. But I'd have to care enough to want to think about it skeptically.

And even if I'd figured out that was a logical possibility, consistent with the numbers ... a possibility doesn't mean it's actually happening, right? I might correctly think, "yeah, it could be just a question of when the disease is diagnosed ... but is it really likely that that's *all* of it? It's more likely that *some* of the effect is early diagnosis, and the rest of the effect is more time to treat after diagnosis."

To know for sure, it's not enough to be numerate, or reasonable, or logical. You need more data. But the data might not exist. If you're lucky, the issue arises from a study somewhere, and you can read that study and figure out what's going on. But if it's just raw data, and all that's being recorded is the five-year survival rate, all you can do is note that both hypotheses fit the data: that prescreening always helps, and that it never helps.

Dubner recommends:

"Footnotes, I guess, and transparency, and a generally higher level of numeracy among the populace."

All of which are good things. But "footnotes" assume that the authors of the study realize what might be going on. From what I've seen in sports studies, that's generally not the case. My experience is that authors do just fine at the "numeracy" stuff, dealing with the numbers and doing the regressions and statistical tests. But they're weaker at properly understanding what the numbers actually *mean*. I'd bet you that eight out of ten researchers who found data matching Paulos' example wouldn't think of Paulos' explanation. Peer reviewers would be even more likely to miss it.

Since Dubner's kind of "numeracy" is so difficult, isn't it more efficient to better use what's already out there, instead of making what would probably be a futile attempt to create more of it? If you really want to try to solve the problem of improper conclusions in studies like this, just get lots of people to look at the study, and give those people incentives to spot these kinds of things.

Here's how such a plan might work:

-- a researcher submits a study to the journal, as normal, and peer review proceeds as normal.

-- after deciding to provisionally accept the paper, the journal immediately posts it online, and allows for online comments and discussion.

-- after a suitable interval, a second set of senior referees review the comments and peer-review the paper again in light of those comments.

-- those peer reviewers allocate a fixed sum of money per paper -- $500, say -- among the commenters, in proportion to the usefulness of their comments, whether the paper gets published or not.
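
The last step, splitting a fixed sum in proportion to usefulness, could work something like the sketch below. This is only an illustration: the usefulness scores are hypothetical numbers the senior referees would assign, and rounding to whole cents by largest remainder is my own assumption, not part of the plan.

```python
def allocate_prize(usefulness: dict[str, float], pool_cents: int = 50_000) -> dict[str, int]:
    """Split a prize pool (in cents) in proportion to non-negative usefulness scores."""
    total = sum(usefulness.values())
    if total <= 0:
        # No useful comments: nobody gets paid.
        return {name: 0 for name in usefulness}

    # Exact proportional shares, then round down to whole cents.
    exact = {name: pool_cents * score / total for name, score in usefulness.items()}
    shares = {name: int(amount) for name, amount in exact.items()}

    # Hand out the cents lost to rounding, largest fractional remainder first.
    leftover = pool_cents - sum(shares.values())
    by_remainder = sorted(usefulness, key=lambda n: exact[n] - shares[n], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical scores assigned by the second set of referees.
scores = {"commenter_a": 3, "commenter_b": 2, "commenter_c": 1}
print(allocate_prize(scores))
# {'commenter_a': 25000, 'commenter_b': 16667, 'commenter_c': 8333}
```

The amounts are in cents so that the shares always add up to exactly $500.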

The benefits of a system like this are pretty clear: a potentially unlimited number of peer reviewers, motivated by money and reputation, are more likely to spot flaws than a couple of anonymous peer reviewers who are perhaps motivated to go easy on their peers. That means a lot fewer flawed papers, and a lot fewer press reports of incorrect conclusions -- conclusions that newspaper readers, no matter how intelligent or numerate, aren't given enough evidence to question.

There are other beneficial side effects beyond the obvious advantage of fewer fallacious papers:

-- the system provides an incentive for authors to give their conclusions more careful thought before submitting, saving the system time and money.

-- it provides a way for smart grad students to make a little extra money to support themselves.

-- it's cheap. I bet $500 a paper would be enough to clear most flaws. If you figure how long a paper takes to write, and how much professors get paid, $500 a study is a drop in the bucket compared to the cost of the original study.

-- the record of corrections makes it clear to everyone in and out of academia what kinds of flaws to avoid. (Within a few months, nobody will be neglecting to talk about regression to the mean, and everyone will know what selective sampling is.)

-- it will allow for a more accurate measure of academic accomplishment. Since academia is known to be a competitive environment but without formal scorekeeping, anything that allows a more accurate subjective rating of a researcher's proficiency has to be a good thing.

But I don't think this will ever happen, for political reasons. Nobody likes to have their errors exposed publicly. The current system, where peer review is mostly private, allows everyone to save face.

Furthermore, professors would probably be somewhat reluctant to criticize other professors, so the most effective critiquing would come from outside of academia. I am pretty certain that academic Ph.D.s, as a group, will never stand for a system that lowers their status by having laymen publicly correct their mistakes.

Labels: academics, numeracy

16 Comments:

At Friday, May 21, 2010 3:48:00 PM, Anonymous said...: This is an interesting post Mr. Birnbaum. Would you be willing to have your research/opinions undergo the same scrutiny that you are proposing for academics?

How can you state that academic PhDs, as a group, will never stand for a system that lowers their status by having laymen publicly correct their mistakes? PhDs are incredibly bright individuals who take great care to ensure their research and findings are above reproach. Consequently they welcome scrutiny from academia and laymen. It would not lower their status – merely validate their findings. You should be on Facebook. LOL
At Friday, May 21, 2010 4:11:00 PM, Gerard Monsen said...: From another part of Phil's article:

"The benefits of a system like this are pretty clear... That means a lot fewer flawed papers, and a lot fewer press reports of incorrect conclusions -- conclusions that newspaper readers, no matter how intelligent or numerate, aren't given enough evidence to question."

This is absolutely wrong and is the primary reason why this will never ever work. Media are all about being the first to publicize things they find out about, and if a particularly interesting preliminary paper is published, the media will not patiently wait for the article to be reviewed, critiqued, and corrected. What's more, if it turns out that the conclusions of a paper they had already discussed were partially or completely wrong, the media will often downplay or not report the correction. This process will lead to an explosion of misinformation being released to the public.

When a researcher gives their results to a reporter before they get published in a peer-reviewed journal, it is sometimes called "publishing in a newspaper," which is considered a very big problem that can potentially destroy a researcher's career. I remember vividly the panic on a researcher's face at Lawrence Berkeley Laboratory, where I worked, when he found out that a preliminary version of his paper had been found in the Physics library on campus by the science writer of a local newspaper. The professors at UC Berkeley would place pre-pub papers there so that fellow professors and graduate students could read and critique them at their leisure. The science writer, new to the business, found out about this binder and started snooping for scoops. When the researcher was called about his paper, he panicked and did his best to emphasize that two other groups were also doing work in the field and that they should be contacted and given credit as well. He then called those groups and tried to explain what happened and tried to reassure them that he wasn't trying to scoop them or claim sole credit for a theory he was proposing (which was one of three proposed theories for a phenomenon). The article in the SF Chronicle came out anyway and gave not nearly enough credit to those other groups or their ideas. To this day, he still is haunted by the reputation of being one who "publishes in the newspaper."

Trust me, this is the back-breaker of the proposal and the primary reason why no self respecting researcher would agree to this system. The release of scientific information to the media and the general public has to be carefully planned and orchestrated in order to make sure that everyone receiving the information gets a good broad understanding of the discovery, the background behind it, who should get credit for the discovery, and what further issues have yet to be resolved. This may not seem critical when talking about baseball research, but when you're talking about major scientific research, it's crucial.
At Friday, May 21, 2010 4:50:00 PM, Gerard Monsen said...: I can’t let this go. Let me give you an example of how releasing information to newspapers can be dangerous. In 1998, Dr. Andrew Wakefield published a small 12-kid study in The Lancet where he claimed that there is a link between the MMR vaccine and autism. So, yeah, technically, he did publish his small paper in a peer-reviewed journal, but let’s keep moving. A brief cartoon synopsis can be found here for those who just want an outline:

http://tallguywrites.livejournal.com/148012.html

It turns out that not only had the study involved a small sample size, the research was also bought with over a million dollars by a class action lawyer. Plus it turns out that Wakefield filed a patent for a one-shot non-MMR measles vaccine. None of this was disclosed to The Lancet. Once the paper was published, Wakefield went on a media blitz publicizing this supposed link between vaccines and autism, causing a huge vaccine scare. The media went nuts, parents of autistic children went nuts, and a Hollywood couple that is nuts (Jenny McCarthy and Jim Carrey) who has an autistic child (some think he might have had another condition) went vocal. Huge numbers of people stopped giving vaccines to their children.

Now, follow-up studies were done. Huge ones with large sample sizes, and all showed that there is in fact no connection between autism and vaccines. Has this been publicized enough? Not really. People like Jenny McCarthy will never rescind their beliefs. So, now measles outbreaks are occurring even in first world countries. Measles of all things! A 6-month-old child died of whooping cough in Australia, because the local population had such low vaccination rates that the “herd immunity” was gone. There are outbreaks and epidemics of diseases that have no reason at all to affect us anymore, because of one bad paper and the media who went nuts over it.

Now, you want to allow any researcher in the world to publish their preliminary papers in a manner that any media outlet can get their hands on it? That is completely and utterly foolish.
At Friday, May 21, 2010 5:02:00 PM, Phil Birnbaum said...: Hi, Gerard,

Thanks for your comments ... I didn't know about the cultural implications of "publishing in the newspaper".

Let me ask you: there must be exceptions, right? Because a lot of the studies I comment on here have actually not been published. There was the big Hamermesh (et al) study on racial bias among umpires. There was that hockey study that (wrongly) claimed that fighting penalties are linked to winning. There was also a study about golfers giving less effort when playing against Tiger Woods -- I commented on that one a couple of years ago, and a couple of weeks ago Slate commented on an updated version.

Those are just off the top of my head ... there are probably many more. And isn't there a repository of preliminary social science papers that are free to download? They have comments on the end not to cite without contacting the author first, but they're free for the taking.

And I think the "Freakonomics" blog writes about unpublished papers fairly regularly.

So what's the exception? As far as I can tell, although I may be wrong, Daniel Hamermesh is a very respected economist, despite his unpublished paper getting lots of press in the mainstream media.
At Friday, May 21, 2010 5:07:00 PM, Unknown said...: Anon -- come on now. Phil has a blog. All his work gets published here or perhaps at BBTN, which he links to here. The sabermetric community has the opportunity to criticize and scrutinize Phil's work more than happens for most peer-reviewed academic papers.
At Friday, May 21, 2010 5:09:00 PM, Phil Birnbaum said...: To address Gerard Monsen's other points:

I am arguing for publishing papers online *only after a first round of peer review* -- where otherwise, absent my proposal, the paper would just wind up being published. So it wouldn't be the free-for-all that you suggest.

Secondly, if this happened with *all* papers, there would be no issue of some researchers "publishing in the newspapers". It would be a standard part of peer review.

Thirdly, newspapers would understand that this is a part of the process where corrections are made and debates occur. You know how when papers cover a lawsuit, they always say "the accusations have not been proven in court?" They might do the same here: "the claims have not yet been approved by the public peer review process."

That means bad papers would be less likely to get implied unconditional approval from the press. Right now, they'd be published and any complaints would be considered uncorroborated or sour grapes. With public peer review, it would be obvious that confirmation is still required.

Fourth, in abuses like your autism example, public peer review might have exposed some of the problems BEFORE the paper got published. Wouldn't that be better? Instead of a near-fraudulent paper in a journal, it would just be some self-promoter getting caught trying to get his faulty research published so he could get rich. That's quite an improvement!
At Saturday, May 22, 2010 10:59:00 AM, minesweeper said...: yeah, I have to laugh at #1. You must still be in graduate school, going on what, year 12?

If I have a problem with Phil's work, I can post my criticism right here on his blog, where everyone will see it. Now if I have a problem with the work of an academic, I cannot criticize it in the medium in which it appears unless I storm the antiquated peer-review process. I have to wait about a year for someone to APPROVE of my criticism and then publish it alongside perhaps other APPROVED criticisms.
At Sunday, May 23, 2010 5:44:00 PM, Unknown said...: I think this is an interesting idea, but it is mostly a formalization of what currently takes place at a post-publication stage, as a given field sifts through various new findings and decides which data are solid enough to be incorporated into its canon. For fields whose papers are generally the result of a single finding based on one particular analysis, I can imagine something like this being practical, but the idea of publicly disclosing findings in fields where many individual results are combined into a single manuscript is simply not practical without major overhauls to "academic scorekeeping." Furthermore, a "system which provides an incentive for authors to give their conclusions more careful thought before submitting" is probably a very good thing in cases where the data is readily available, and a large fraction of a work's value is based on careful or innovative interpretation. When the creation of data itself is highly valuable, and analyses more trivial, however, public good may be better served by rapid reporting of data regardless of the quality of the conclusions.
At Wednesday, May 26, 2010 12:59:00 AM, Don Coffin said...: A couple of points. A $500 prize-for-comments pool per article may sound like it's not a burden, but I can guarantee you that it's well beyond the financial capabilities of most of the journals I have done work for (almost all published by university presses, with unpaid editors and referees, I will add). I edited a journal for a couple of years, and we sent more than 100 articles out for peer review each year; in both years, only about 25% were outright rejected, so we would have had to post 75 articles per year, under this scheme, for public peer review. If the entire $500 pool got paid out for each article, that's what, $37,500? Our entire budget was about $5,000 (we "publish" on-line, and our subscription revenue is not much).

Or consider a larger, better-established journal that I do some refereeing for--the Journal of Economic Education. In 2008, it had about 750 subscribers, and annual subscription revenue of $100,000 (as near as I can calculate). It's a print journal, publishing 4 issues a year (and probably printing close to 1,000 copies per issue). Each issue probably costs about $10,000 to print. The Journal also has some office staff, advertising, and other expenses (postage, supplies, etc.), which probably equalled their printing expenses. Referees don't get paid. (I don't know whether the editor or section editors got paid, but I'd doubt it.) So the total costs were probably around $80,000.

JEE received close to 300 submissions in 2008. My guess is that something like half of those received peer review. The "prize pool," then, at $500 per article would have been $75,000. But there was nothing like that much money available.

JEE is not published by a university press, but by Heldref, which is (I think) a not-for-profit publisher. But that doesn't mean a "we're happy to lose money" publisher.

Finally, this: "...professors would probably be somewhat reluctant to criticize other professors..." You have obviously never submitted a paper to an academic journal that does peer review, or presented a paper at an academic conference. Criticizing each other is not only OK, it is, in most disciplines, a sign that you take each other's work seriously enough to care whether it's right. After all, bad work, in a very real sense, reflects on all of us.
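
The prize-pool arithmetic in the comment above can be sketched in a few lines of Python. The inputs are the commenter's own rough estimates (about 100 articles reviewed with 75% not rejected for the small journal; about 300 submissions with half reviewed for JEE); the function name and structure are illustrative only.

```python
PRIZE_PER_ARTICLE = 500  # dollars, from the proposal


def annual_prize_pool(articles: int, share_posted: float,
                      prize: int = PRIZE_PER_ARTICLE) -> float:
    """Yearly prize money if a given share of articles is posted for comment."""
    return articles * share_posted * prize


# Small online journal: ~100 articles reviewed, ~75% not rejected, ~$5,000 budget.
small_pool = annual_prize_pool(100, 0.75)
print(f"${small_pool:,.0f} in prizes vs. a $5,000 budget")
# $37,500 in prizes vs. a $5,000 budget

# JEE: ~300 submissions, about half peer reviewed,
# ~$100,000 revenue against ~$80,000 estimated costs.
jee_pool = annual_prize_pool(300, 0.5)
jee_surplus = 100_000 - 80_000
print(f"${jee_pool:,.0f} in prizes vs. roughly ${jee_surplus:,} left over")
# $75,000 in prizes vs. roughly $20,000 left over
```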
At Wednesday, May 26, 2010 1:15:00 AM, Don Coffin said...: Just for kicks, I looked at the financial statement of the American Economic Association for 2009. $7.7 million in revenue, $7.1 million in expenses. Nine journals, with in excess of 10,000 submissions in 2009. If only 30% of those were peer-reviewed and posted for open comment as per Phil's suggestion, that's $1,500,000 in prizes...doable? Maybe, but I wouldn't count on it...and the AEA is probably in the best financial shape of any publisher of academic journals in my discipline.
At Wednesday, May 26, 2010 9:55:00 AM, Phil Birnbaum said...: Hi, Doc,

I was suggesting that only articles that pass peer review and are deemed ready to publish receive the $500 peer review. I don't know if that changes your conclusion about the feasibility.

If 3,000 submissions are peer reviewed, that's $1,500,000, which is too much. But how many articles were actually *published*? Those are the only ones that require the extra $500.

In any case, you're much more expert than I am on this point, so I defer to you.

However: given the costs in salaries, offices, etc., which contribute to a high overall cost in getting an article published ... well, it does seem to me that $500 is a small fraction of that. You're saying that the journals can't afford that, which makes sense to me ... but maybe the *schools* could afford it?

It just seems to me that it would take so little extra to catch problematic papers that the money could be found somewhere. Is it just a matter of figuring out how to pay for the public good?
At Wednesday, May 26, 2010 12:41:00 PM, Don Coffin said...: Phil--Maybe. But the problem is that peer review is not an up-or-down system; it's a "revise-and-resubmit" system. It's often two years (or more) between initial submission and acceptance, which is fairly problematic as things stand. So we'd need to expect a substantial improvement in product quality to drag things out even more.

Second, peer reviewers *work for free.* If we create a system in which some reviewers get to share a prize pool for their comments, it would, I think, reduce the incentive to be a peer reviewer--why not wait and get a slice of the prize? In short, a second-order effect is to reduce the quality of peer review.

Third, the suggestion that institutions might pay for this? Again, there's a cost issue. My relatively small public institution employs nearly 200 full-time faculty. We average over 100 peer-reviewed publications a year--that'd be $50,000 just for us, just if those that were provisionally accepted were covered by your scheme. We budget to break even. We don't have a $50,000 pot to use for this.

So maybe the faculty themselves would pay the prize money? I rather doubt it.
At Wednesday, May 26, 2010 1:17:00 PM, Phil Birnbaum said...: Right, with the prize, I can see how peer reviewers might not want to review. Still, the $500 comes into play only if the paper passes peer review the first time, so there's no guarantee that the flaws you see now will still be there at $500 time.

And I can see how $50,000 can break a budget. On the other hand, it's only half a professor, or 0.5% of your budget.

What if the $500 went to the *budgets* of whoever finds the flaws, instead? Then it's zero sum overall, although the high-publishers low-reviewers will subsidize the high-reviewers low-publishers. Just thinking out loud.

Anyway, what you're saying makes sense, that there are financial issues, and you're the expert. So let me try another question:

Is it worth it as a public good? Suppose that (say) 80% of flaws that would otherwise be published would be caught for $500 a paper. Is that good value for money, independent of who would pay for it and where the money would come from?

Or do you think $500 a paper could better be spent improving the quality of publications in other ways? Or is the quality of publications good enough that you wouldn't be buying much for the $500?

My view is that some of the papers that I've reviewed have such important flaws that $500 is a small price to pay to fix them. Especially the ones that get coverage in the popular press. But I'm cherry-picking the flawed ones. If only 1 in 10 is bad, then you're talking $5000 a flaw, which seems like too much.

(I suppose that you could scale the awards to the "size" of the flaw. $500 for a paper with big flaws, $50 for a paper with only nitpicks. Again, I'm thinking out loud.)
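
The cost-per-flaw reasoning in the comment above can be written out as a small Python sketch. The "1 in 10" figure and the "(say) 80%" catch rate are the hypotheticals from this comment; combining them in one formula, and the function name, are assumptions made here for illustration.

```python
def cost_per_flaw_caught(prize_per_paper: float, papers_per_flawed_paper: int,
                         catch_rate: float = 1.0) -> float:
    """Prize money spent, on average, for each flawed paper that gets caught.

    papers_per_flawed_paper: 10 means one paper in ten is flawed.
    catch_rate: share of flawed papers whose flaws the commenters find.
    """
    if papers_per_flawed_paper <= 0 or not 0 < catch_rate <= 1:
        raise ValueError("need a positive paper count and a catch rate in (0, 1]")
    return prize_per_paper * papers_per_flawed_paper / catch_rate


# One paper in ten is flawed and every flaw is caught: the $5,000 figure above.
print(f"${cost_per_flaw_caught(500, 10):,.0f}")                  # $5,000

# Same flaw rate, but only 80% of the flaws are caught.
print(f"${cost_per_flaw_caught(500, 10, catch_rate=0.8):,.0f}")  # $6,250
```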
At Wednesday, May 26, 2010 2:11:00 PM, Don Coffin said...: Nature has tried an open peer review process, not exactly what you've proposed, and uncompensated. Anyway, here's their report on it:

http://www.nature.com/nature/peerreview/debate/nature05535.html
At Wednesday, May 26, 2010 2:15:00 PM, Don Coffin said...: Here's what Shakespeare Quarterly experienced when they did open peer review:

http://mediacommons.futureofthebook.org/mcpress/ShakespeareQuarterly_NewMedia/
At Wednesday, May 26, 2010 2:17:00 PM, Phil Birnbaum said...: Thanks! I'll take a look at both of those.
