Thursday, October 30, 2008

Review of "Super Crunchers"

I wrote this review of "Super Crunchers" in January, and for some reason never posted it. Brian Burke's comment in the previous post, which deals with some of the same subject matter, reminded me that I still had it.


I've just finished Ian Ayres' book "Super Crunchers", and I'm a little disappointed. It's an excellent book, and I enjoyed it. But it's not as sabermetric as I thought; I was hoping for lots of meaty examples, like the one in the introduction where economist Orley Ashenfelter came up with a formula, based on temperature and rainfall, to predict how good a year's wine crop will be. But most of the examples are more commonplace.

The book's subtitle is "Why Thinking-by-Numbers Is the New Way to Be Smart," and there are many flavors of "knowledge through numbers" discussed. For instance, there's a chapter on traditional statistics, like the normal distribution and standard deviations. There's a bit on the "false positives" problem, where if a disease is very rare, most of the people who test positive for it won't actually have it.
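To see how stark the false-positive effect is, here's a quick back-of-the-envelope calculation (the disease prevalence and test accuracy numbers are my own invented ones, not Ayres's):

```python
# Hypothetical numbers: a rare disease, and a test that looks quite accurate.
prevalence = 0.001          # 1 in 1,000 people actually have the disease
sensitivity = 0.99          # P(test positive | disease)
false_positive_rate = 0.05  # P(test positive | no disease)

# Bayes' theorem: what fraction of positive tests are real?
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"{p_disease_given_positive:.1%}")  # prints 1.9%
```

Even with a 99%-sensitive test, fewer than 2 in 100 positives are actual cases, because the 5% false-positive rate applied to the huge healthy population swamps the handful of true positives.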

There's an entire chapter on how to make decisions by randomization. In choosing the title for his book, Ayres bought two sets of internet ads: one that used "Super Crunchers," and another that used "The End of Intuition." The first title got 60% more hits than the second, and the rest is history.
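As a sketch of what's going on under the hood of that kind of experiment (the click counts below are invented for illustration; all the book reports is the 60% gap), a simple two-proportion comparison:

```python
import math

# Invented click counts -- Ayres reports only that "Super Crunchers"
# drew about 60% more clicks than the alternative title.
clicks_a, shown_a = 160, 10_000   # "Super Crunchers"
clicks_b, shown_b = 100, 10_000   # "The End of Intuition"

p_a, p_b = clicks_a / shown_a, clicks_b / shown_b
p_pool = (clicks_a + clicks_b) / (shown_a + shown_b)

# Standard two-proportion z-test: is the gap bigger than chance?
se = math.sqrt(p_pool * (1 - p_pool) * (1 / shown_a + 1 / shown_b))
z = (p_a - p_b) / se

print(f"lift = {(p_a - p_b) / p_b:.0%}, z = {z:.1f}")  # lift = 60%, z is about 3.7
```

With counts like these, a z-score near 3.7 is far past the usual 1.96 cutoff, so the gap isn't just noise, and Ayres could pick the winning title with some confidence.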

In Mexico, this kind of technique was used to test a certain anti-poverty program. Certain households, randomly chosen, were offered cash incentives if they took their children to health clinics and kept them in school. The results showed the program worked – the randomly-chosen group had better results than the non-chosen group. Ayres calls this method of policymaking "Government by Chance."

And there's lots of stuff on plain, regular analysis of data, like Steven Levitt's study concluding that legal abortion reduces crime (since the unaborted babies are more likely to commit crimes because they grow up poor). There's also the predicting of trends, like when Wal-Mart knows that, after hurricanes, demand for Pop-Tarts skyrockets.

Those chapters are interesting, but the best parts of the book – and the largest – are where Ayres talks about how well data analysis works in situations where you'd think an informed, intuitive, expert judgment would be better. Such as, for instance, the Moneyball claim that players can be better scouted by their statistical record than by the opinions of the scouts who watch them.

A significant portion of the discussion involves medicine. The impression you get is that the medical professionals fly by the seat of their pants, like baseball scouts, trying to figure out what's best. But, as Ayres demonstrates, the data can be a lot more accurate than intuition. And, again like the scouts, doctors are defiant when outsiders' knowledge competes with theirs.

For instance, in the 1840s, a researcher found that mortality rates dropped by about 80% when doctors washed their hands between patients. The doctors didn't want to wash their hands, so they dismissed the findings, and patients died.

It sounds like we should know better now, but apparently not. In 1999, a doctor named Don Berwick went on a crusade to convince hospitals to implement certain basic procedures that would have a huge effect. Hand washing was one of them, but there were others, like formal procedures to double-check drug doses. There were six reforms in total, and they saved some 122,000 lives.

Berwick compares these procedures in hospitals to formal FAA procedures that guide aviation flights – no matter how experienced the pilot, procedures have to be followed.

What's amazing to me is that even though the data was out there, and the research was done, nobody bothered to change the way they were doing things. Apparently baseball is not unique in clinging to tradition and scorning new knowledge.

This is also the case in diagnosis. There are now computer systems out there that will take patient information – such as symptoms, genetic history, etc. – and produce a list of possible causes. Even the best of doctors can't sift through 11,000 diseases in their heads, and, if you rely on the doctor's memory and intuition, misdiagnoses, perhaps fatal ones, will be made. Ayres writes,

"... about 10 percent of the time, [the] Isabel [software] helps doctors include a major diagnosis that they would not have considered but should have."

Ten percent is a LOT. In my opinion, any doctor who doesn't use this software, and misdiagnoses a patient, should meet a swift and unpleasant death. Or, worse, a lawsuit. There's just no excuse for refusing to use all reasonable methods to check your diagnosis. Especially out of arrogance.

Diagnosing rare medical conditions seems reasonably complex, and you might be comfortable with the idea that computers can do it better than humans. But it turns out that formulas can often beat humans even when the formulas are very simple. For instance, experts were asked to predict how US Supreme Court Justice Sandra Day O'Connor would rule on certain cases. They competed against a flowchart that the researchers had devised in advance, a chart that fits on less than one page of the book.

The flowchart beat the experts.

Ayres makes much of the fact that all this is "number crunching." My view is that it doesn't matter that there are numbers involved. What there is, rather, is evidence. The rules of logic and evidence, and the scientific method, are what make the knowledge, not the arithmetic. I'd argue that the people in the book shouldn't be called "Super Crunchers." They should just be called "scientists."


By the way, it seems to me that the book's main baseball discussion isn't quite correct. In a scouts-versus-sabermetricians discussion, Ayres quotes Michael Lewis quoting Bill James:

"The naked eye was inadequate for learning what you need to know to evaluate players. Think about it. One absolutely cannot tell, by watching, the difference between a .300 hitter and a .275 hitter. The difference is one hit every two weeks."

But James didn't say that to justify sabermetrics – he used that to justify *baseball records*, even traditional ones such as batting average. By this standard, baseball people have been "super crunchers" for over 100 years.
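And James's arithmetic checks out. A quick sanity check, assuming a roughly 500 at-bat season spread over about 26 weeks:

```python
# Rough full-season assumptions (mine, not James's exact figures).
at_bats_per_season = 500
weeks_per_season = 26

# A .300 hitter vs. a .275 hitter over a full season:
extra_hits = (0.300 - 0.275) * at_bats_per_season   # 12.5 extra hits
hits_per_two_weeks = extra_hits / weeks_per_season * 2

print(round(hits_per_two_weeks, 2))  # prints 0.96 -- about one hit every two weeks
```

Twelve or thirteen hits spread across six months really does work out to about one every two weeks, which no eyeball is going to pick up.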

The scouts-vs.-records debate has little to do with which formulas you use to measure productivity, but a lot to do with whether the statistical records of prospects have an importance beyond scouts' impressions. The Jeremy Brown debate – Billy Beane liked him because he could hit, the scouts hated him because he was fat – could have just as easily happened 40 years ago, without Bill James.

And one last point: on page 210, Ayres quotes Ben Polak and Brian Lonergan's statistic that rates players based on changes in win probability. He says "they have done [Bill] James one better." Of course, they have not.
