Sabermetric Research: Acknowledging incorrect facts but not incorrect logic

What is the chance of seeing an "Original Six" NHL Final, like we did this past season with Chicago facing Boston?

Well, the Final pairs one team from each conference. After realignment, five of the Original Six are in the East, and one (Chicago) is in the West. Both conferences have fifteen teams.

So, if all teams have an equal chance, the probability is 5/15 (one of the five reaching the Final from the East) multiplied by 1/15 (Chicago reaching it from the West). That's 1/45, or 2.2 percent.
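As a quick check on that arithmetic, here is a minimal Python sketch. It assumes, as above, that every team in a conference is equally likely to reach the Final; the function name and its parameters are made up for illustration.

```python
from fractions import Fraction


def original_six_final_probability(east_original_six=5, west_original_six=1,
                                   east_teams=15, west_teams=15):
    """Chance that both finalists are Original Six teams.

    Assumption: every team in a conference has the same chance of
    reaching the Final, and the two conferences are independent.
    """
    p_east = Fraction(east_original_six, east_teams)
    p_west = Fraction(west_original_six, west_teams)
    return p_east * p_west


p = original_six_final_probability()
print(p, f"{float(p):.1%}")  # prints: 1/45 2.2%
```

Using exact fractions keeps the 1/45 visible instead of burying it in a rounded decimal.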

-----

This came up in a recent article in Sports Illustrated:

"In the future, with only Chicago in the West, the random odds, calculated by David Madigan, chair of the statistics department at Columbia, shrink to 2.2%."

That made me a little sad, that an expert had to be quoted.

I guess I can't really complain, because the writer had to do it ... even if he understood how to calculate the number -- which I suspect he did -- skeptical readers would believe he pulled the number out of his butt, if it suited their preconceptions.

That got me wondering what the criterion is, for when you quote an expert as opposed to just stating a fact.

"Over 82 games, an NHL team can accumulate as many as 164 standings points, according to Dr. Mary Doe, of the Mathematics department at Harvard."

That wouldn't happen, right? Too simple: 82 games, times two points a game. How about,

"The chance of a fair coin landing heads twice in a row is 25%, according to John Smith, mathematics professor at Yale and author of several texts on probability theory."

Probably not: the reporter would probably just explain how that 25% is calculated. But what about more coins?

"The chance of a fair coin landing heads six times in a row is less than 2%, according to ..."

That one would probably happen.
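For what it's worth, both coin numbers come from the same one-line calculation. A small sketch, with a hypothetical helper name, assuming independent flips of a fair coin:

```python
def chance_of_all_heads(flips, p_heads=0.5):
    """Probability that every one of `flips` independent tosses lands heads."""
    return p_heads ** flips


for flips in (2, 6):
    print(f"{flips} heads in a row: {chance_of_all_heads(flips):.4f}")
```

Two flips give 0.25, and six give 1/64, which is about 1.6 percent and so "less than 2%." The math is no harder the second time; there are just more flips.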

This is just from my gut ... it almost seems like there's a rule, that if it's not something simple enough for most readers to understand, you're not allowed to just state it without having a source. And that seems to be the case even if you know it for yourself. It's almost like ... if it turns out to be wrong, it's important that it not be the reporter's fault.

------

In that regard, it always seemed strange to me how journalism is so careful about correcting "facts," but so lax about acknowledging bad logic. Here's an example I'm making up (but based on actual articles I've seen):

"Speeding on Anytown roads is at its highest level ever. At any given time, 60 percent of city drivers are exceeding the limit, according to researchers at the Anystate Insurance Institute. And the lack of enforcement exacts a hefty toll. The Institute reports that 77 percent of all fatal multi-car collisions involved at least one speeding driver.

"However, at a press conference yesterday, Mayor Doe downplayed this evidence that speeding kills, and resisted calls for additional enforcement."

Now, if the reporter accidentally misquoted the numbers, there would be a correction in the next day's paper:

"Yesterday, we reported an incorrect Insurance Institute statistic on the proportion of speeding drivers. The correct figure is 70 percent, not 60 percent as reported. The Anytown Daily News regrets the error."

But ... even if the facts were correct, the conclusion doesn't follow.

If 60 percent of drivers speed, then only 40 percent of drivers aren't speeding. In that case, all things being equal (that is, if speeding made no difference to the chance of being in a collision), only 16 percent of two-car collisions -- 40 percent of 40 percent -- would involve only non-speeders. That means 84 percent would involve at least one speeder. But the actual number is only 77 percent. At face value, those numbers actually suggest that speeding *prevents* accidents!
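Here is that baseline as a short Python sketch. The function name is invented for illustration, and it bakes in two assumptions: speeding has no effect on crash risk, and the drivers in a collision are independent draws from the driving population. It also treats every multi-car collision as a two-car collision, as the paragraph above does.

```python
def share_with_at_least_one_speeder(speeding_rate, cars=2):
    """Expected share of collisions involving at least one speeder,
    assuming speeding has no effect on the chance of being in a crash
    and the drivers involved are independent of one another."""
    share_with_no_speeders = (1 - speeding_rate) ** cars
    return 1 - share_with_no_speeders


baseline = share_with_at_least_one_speeder(0.60)
reported = 0.77  # the made-up Insurance Institute figure from the example

print(f"expected if speeding didn't matter: {baseline:.0%}")
print(f"reported: {reported:.0%}")
```

The reported 77 percent comes in below the 84 percent no-effect baseline, which is why the quoted numbers don't support the "speeding kills" conclusion on their own.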

So, you'd expect a correction:

"Yesterday, we incorrectly reported that Insurance Institute data proved that speeding is dangerous. However, the quoted facts actually show no evidence that speeding kills, and could even be interpreted as evidence that speeding saves lives. The Anytown Daily News regrets the error."

That would never happen, right? You have to correct facts, but not logic.

-----

From that same SI article:

"So there is at least a small chance that the 2013 finals ... is not last call for the Original Six. But ... the bartender is checking his watch."

Well, it's not that small a chance. Over the next 30 years, say, there's almost a 50-50 chance of at least one Original Six final. (That's 100 percent minus 97.8 percent raised to the 30th power, which comes to about 49 percent.) But that's the author doing his own logic, not quoting the expert.
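A minimal sketch of that 30-year calculation, assuming the 1-in-45 chance stays the same every season and that seasons are independent of each other (both simplifications); the helper name is hypothetical:

```python
def chance_of_at_least_one(p_per_season, seasons):
    """Probability an event with per-season chance `p_per_season`
    happens at least once in `seasons` independent seasons."""
    chance_of_never = (1 - p_per_season) ** seasons
    return 1 - chance_of_never


p = chance_of_at_least_one(1 / 45, 30)
print(f"{p:.1%}")
```

That works out to roughly 49 percent, so "almost 50-50" is a fair description.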

It seems like that's another rule: you have to quote a source for the raw facts, but you can say anything you want about what those facts mean.

------

What's my point? Um ... I'm not completely sure I have one. Well, I guess, I find it frustrating how these things work. Because, usually, if a conclusion is wrong, whether in journalism or academia or blogs or conversation, it's because of the logic, not the facts. So the emphasis is backwards. You get in deep trouble if you accidentally omit part of your dataset, even when it doesn't change your conclusion ... but, if you get the data right and badly misinterpret what it means, it's not a big deal.

I'm all in favor of getting the facts right, but not at the expense of pretending that the reasoning doesn't matter.

Labels: journalism, statistics

10 Comments:

At Tuesday, September 10, 2013 8:56:00 AM, Anonymous said...: How do you get to 40% of 40%? Are you taking 1-60% and it's two non-speeders, so 40x40?

I just want to make sure I have the logic down.
At Tuesday, September 10, 2013 9:07:00 AM, Phil Birnbaum said...: Yes, that's how I'm calculating it.
At Tuesday, September 10, 2013 11:33:00 AM, Anonymous said...: Thanks - do you have any good recommendations on books (maybe textbooks) that discuss probability theory?

It's amazing how much the random person has no sense of probability (i.e. your example of original 6 teams playing over the next 30 years).

Picked some good stuff in this book:

http://www.amazon.com/Seeking-Wisdom-Darwin-Munger-3rd/dp/1578644283/ref=sr_1_1?s=books&ie=UTF8&qid=1378827192&sr=1-1&keywords=seeking+wisdom
At Tuesday, September 10, 2013 11:38:00 AM, Phil Birnbaum said...: I don't know that one! I think I read the Mlodinow book a few years back, and thought it would be pretty good as an introduction. I'll let you know if I think of others ...
At Tuesday, September 10, 2013 11:40:00 AM, Phil Birnbaum said...: Also, I haven't actually seen the one below, but I have the "Statistics" one, and the "Very Short Introduction" books are usually reasonable. They're more "advanced" introductions, more "academicky".

http://www.amazon.ca/Probability-Very-Short-Introduction-ebook/dp/B007CJBYDE/ref=sr_1_10?ie=UTF8&qid=1378827348&sr=8-10&keywords=probability
At Tuesday, September 10, 2013 11:41:00 AM, Phil Birnbaum said...: This review quote from the above book encourages me:

"The book goes a long way in showing that probability, like many other areas of mathematics, is at the bottom of it just applied common sense."
At Tuesday, September 10, 2013 4:35:00 PM, Anonymous said...: Thank you sir. I will check this stuff out. It seems like most people continually misunderstand and misapply probability, etc.

Enjoy your blog and thanks again!
At Saturday, September 14, 2013 2:01:00 PM, James said...: This, especially the lack of knowledge of basic probability theory, bothers me as well. I once tried to explain to someone how hard it is to win the Super Bowl even once you make the playoffs (6.25% for 3-6 seeds, assuming a 50% chance to win each game) and someone responded with "you can't just multiply percentages together". I don't even know how to respond to that.
At Tuesday, September 17, 2013 10:15:00 AM, fizzer555 said...: On last night's Baseball Tonight one expert said, if pushed, he would put a 'cartload' of money on St Louis to win the NL Central purely because Pittsburgh and Cincinnati have to play each other 6 times in the remaining games.

Well, that means that one of Cincy/Pitt is guaranteed at least 3 wins and there is a good probability one of them will come away with at least 4 wins. That sounds like STL will need to win at their current win percentage rate just to keep pace. Wouldn't be taking short odds with that cartload!
At Wednesday, September 18, 2013 5:50:00 AM, Matt said...: I agree with most everything you say here. But doesn't 97% to the 30th power (0.97^30) really approach zero by about the 20th power?

Also, I think it's a gut call on a writer's part when he feels he should cite a source. If I'm writing for a scientific journal, I could say something like, "97% to the 30th power is negligible" and my readers would understand it. If you're writing on a sports blog, the audience may be different and citing an expert might be necessary. Just my two cents.

Interesting blog you have here.


Monday, September 09, 2013
