Sabermetric Research: GiveWell: Overcomplicating research studies can cost lives

"GiveWell" is an organization that evaluates charities. Not just the usual things -- how well they're run, or how much money goes to administrative expenses -- but also how much good they do for the money they receive.

The idea is: if you have $100 to give to try to make the world a better place, shouldn't you give that $100 where it would give the most benefit? Not just to whoever shows up at your door that day, or whatever organization makes you feel guiltiest, or whoever's suffering kids look the cutest ... but, seriously, to where you can do the most good.

That might not appeal to everyone. If you donate to maximize your own good feelings, instead of the good your donation actually does, GiveWell's evaluations won't make much difference to you. Some people hate to say "no", and so they prefer to give $5 to each of the twenty charities that ask for money. Some people prefer to give to diseases that killed their loved ones, or diseases associated with heroes like Terry Fox. Some people give to causes that signal their political views. Most people prefer to give to help people in their own city or country, even when their dollars will save many more lives abroad.

(I've done all these things, and I'm a bit embarrassed about some of them. But I'm not alone. I mean, people give money to the Children's Wish Foundation to send a terminally ill kid to Disneyland ... which is nice, but that same amount of money might actually save ten lives if they sent it to Africa, where kids are actually dying of things that are easily preventable. I'm not sure what's up with me, and my fellow humans, sometimes. But I digress.)

So, in at least one sense, GiveWell is to donors what sabermetrics is to Joe Morgan. It does analysis to reach conclusions that some might find uncomfortable.

However, in another sense, what GiveWell does is *unlike* sabermetrics, in that it usually doesn't try to get down to the third decimal place. It argues that it can evaluate charities heuristically: the differences are big enough that it can figure out which charities are the best using the charities' own reports. As I interpret what they're saying, GiveWell can very easily tell you whether a charity is a Danny Ainge or an Albert Pujols, and it can even tell you more subtle things, like whether a charity is a Joe Carter or an Albert Pujols. But it doesn't try to figure out if a charity is a Ryan Braun or an Albert Pujols. It will just tell you that both are recommended.

That is, GiveWell argues that its goals are better met by the transparency of its recommendations than by any detailed, opaque analyses.

Which is almost exactly what I argued in one of my recent posts -- that, in research, simplicity and transparency are more important than rigor. Simple studies make it much easier to understand the results and catch the inevitable errors. A gentleman from GiveWell, Elie Hassenfeld, read that post, and pointed me to a particular example of a serious error that his organization uncovered.

(Disclaimer: I don't really know much about GiveWell. However, I've been impressed by what I've seen, and at least two of the blogs I read and respect (here's one) say very good things about them. So my Bayesian evaluation of them is quite high.)

-----

As I said, GiveWell doesn't believe they need detailed statistical cost/benefit studies to decide which charities to recommend. However, charities themselves often use such analyses to decide where the money should be spent. There's a whole bunch of organizations and academics devoted to figuring out how to save the most lives for the fewest dollars.

With that objective, the Bill and Melinda Gates Foundation donated $3.5 million to fund a study, "Disease Control Priorities in Developing Countries" (the "DCP2" in the GiveWell quotes below). The study produced a report ranking various interventions on cost-effectiveness. The Gates Foundation didn't do the study itself -- it was done jointly by The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau. Those sound like heavyweights in the world health field.

The report found that -- unsurprisingly to me -- hygiene promotion was the cheapest way to reduce death and disease. The second cheapest, though, was deworming. Specifically, "soil-transmitted helminth" (STH) deworming treatments.

After the report was released, the Gates Foundation provided another $4.4 million to promote the findings. And the findings did indeed attract serious attention. GiveWell writes,

The DCP2’s cost-effectiveness estimates for deworming have been cited widely to advocate a greater focus on treating STH infections, including in:

-- an article in The Lancet

-- a report by REACH, a consortium of large international NGOs and other organizations working to end child hunger, which labeled deworming one of 11 “promoted interventions”

-- the most-cited paper published in the journal International Health

-- an editorial by Peter Hotez, a co-founder of the Global Network for Neglected Tropical Diseases, which has received more than $40 million in funding from the Gates Foundation

-- work by charity evaluators, such as GiveWell, Giving What We Can, and the University of Pennsylvania’s Center for High Impact Philanthropy.

But, as GiveWell later discovered, it turns out the STH estimate was wrong.

That doesn't sound too serious, but here's the thing: it's not just that the estimate was wrong. It was wrong by a factor of almost ONE HUNDRED. The study said that you could save one "disability-adjusted life year" by spending $3.41 on deworming treatments. But, after correcting for the (acknowledged) errors in the study, the actual number was $326.43.

All these well-respected organizations, with serious researchers and serious money, wound up promoting a conclusion that was about as wrong as it could have been. Until the error was caught, anyone relying on the $3.41 figure was getting only about 1% of the benefit they expected; measured against that promise, 99% of the money devoted to STH treatment was effectively wasted.
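For anyone who wants to check that arithmetic, here's a quick back-of-the-envelope sketch in Python. It uses only the two dollar figures quoted above. The variable names are mine, and treating "benefit per dollar" as simply one divided by the cost per DALY is my own simplification, not anything GiveWell or the study's authors did.

```python
# Cost per disability-adjusted life year (DALY), in US dollars,
# as quoted in the post.
claimed_cost_per_daly = 3.41      # the published DCP2 figure
corrected_cost_per_daly = 326.43  # after fixing the acknowledged errors

# How many times too optimistic the published figure was.
overstatement_factor = corrected_cost_per_daly / claimed_cost_per_daly

# Share of the promised benefit that a dollar did NOT deliver.
# Simplifying assumption: benefit per dollar is just 1 / (cost per DALY).
missing_share = 1 - claimed_cost_per_daly / corrected_cost_per_daly

print(f"Off by a factor of about {overstatement_factor:.1f}")
print(f"Share of expected benefit missing: {missing_share:.1%}")
```

That comes out to a factor of a little under 96, and a shortfall of about 99%, which is where the "almost one hundred" and "99%" come from.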

How did GiveWell catch the error? Subject matter expertise, mostly. In reading the report, they noticed that the STH estimate was much, much lower than other estimates they had seen. Instead of just assuming that this research was somehow better than the previous studies, they investigated.

That seems like just common sense, right? If you see a study that says an iPod can be bought for $3, when you know it usually costs $300, you should look again, shouldn't you? But that didn't happen until someone at GiveWell decided to figure out what was going on.
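The "look again" rule can be written down as a very crude check. This is only a sketch of the idea: the function name `looks_suspicious` and the default threshold of 10 are arbitrary choices of mine, not anything GiveWell uses, and a real review would depend on subject matter knowledge rather than a single ratio.

```python
def looks_suspicious(new_estimate, usual_estimate, max_ratio=10.0):
    """Return True when two positive cost estimates differ by more than max_ratio.

    max_ratio is an arbitrary illustrative threshold, not an established standard.
    """
    if new_estimate <= 0 or usual_estimate <= 0:
        raise ValueError("estimates must be positive")
    # Compare the larger value to the smaller so the order of arguments doesn't matter.
    ratio = max(new_estimate, usual_estimate) / min(new_estimate, usual_estimate)
    return ratio > max_ratio


# The iPod example: a $3 price against a usual price of $300.
print(looks_suspicious(3, 300))
# Two prices that roughly agree.
print(looks_suspicious(280, 300))
```

A flag from a check like this doesn't mean the new number is wrong. It only means somebody should go and find out why it's so different, which is exactly what GiveWell did.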

So they wrote to one researcher, who sent them to other researchers, who sent them complicated spreadsheets. They tried to figure those out, but they couldn't, so they wrote back and forth with questions and explanations. They were referred to still another researcher, who sent them a copy of yet another study that was the source of some of the data.

Eventually, they figured out where the issues were ... if you want a full explanation, it's in their post. It was a lot of detailed, technical effort to figure out what went wrong, and which parameters were in error.

GiveWell's conclusions:

We believe that the errors we’ve found in the estimate would have been caught by a helminth expert independently examining the estimate. Therefore, the presence of these errors implies to us that there has been no such examination. If this is the case, it would argue against the reliability of the DCP2’s estimates in general.

We’ve previously argued for a limited role for cost-effectiveness estimates; we now think that the appropriate role may be even more limited, at least for opaque estimates (e.g., estimates published without the details necessary for others to independently examine them) like the DCP2’s.

More generally, we see this case as a general argument for expecting transparency, rather than taking recommendations on trust - no matter how pedigreed the people making the recommendations. Note that the DCP2 was published by the Disease Control Priorities Project, a joint enterprise of The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau, which was funded primarily by a $3.5 million grant from the Gates Foundation. The DCP2 chapter on helminth infections, which contains the $3.41/DALY estimate, has 18 authors, including many of the world’s foremost experts on soil-transmitted helminths.

Absolutely right. You can't substitute credentials for subject matter expertise, and you can't substitute complexity for transparency.

And, one thing I would add: when a study appears to discover that you can get benefits at 99% off the original, well-accepted price ... you have to be suspicious about accepting that conclusion, even if you have no other reason to believe there was any mistake.

-----

P.S. GiveWell expands on the theme here.

Labels: academics, bayes, charity, GiveWell

1 Comments:

At Wednesday, January 25, 2012 11:03:00 AM, Elie Hassenfeld said...:

This is Elie Hassenfeld from GiveWell.

A quick comment and a clarification:

Even though we found large errors in the standard estimate for deworming's cost-effectiveness, we still believe that deworming is among the best options for donors. Our #2-ranked charity last year was the Schistosomiasis Control Initiative, a deworming charity.

All things considered, we felt that deworming remained competitive with the best options we've found. (All this is written up in detail in our full report on deworming.)

More broadly, decisions about allocation of funding to global health programs aren't just a function of estimated cost-effectiveness. For instance, a great deal of money goes to HIV/AIDS programs that everyone would agree are less cost-effective than other programs which don't receive as much funding. In the context of all global health funding, deworming is poorly funded relative to its (updated and correct) cost-effectiveness.

Finally, one quick clarification in the post. You wrote, "But it doesn't try to figure out if a charity is a Ryan Braun or an Albert Pujols. It will just tell you that both are recommended." We spend more time separating the Joe Carters from the Albert Pujolses, but we do care and expend resources distinguishing Ryan Brauns from Albert Pujolses too. We just don't try to do this based purely on numbers, because we don't think the numbers in this domain are reliable enough to be used that way.

So, while one message is "SCI and AMF are our two top-rated charities," we also rank AMF #1 and SCI #2 (and recommend donors give significantly more to AMF than SCI), a decision we've written about on our blog.


Friday, January 20, 2012
