Significance testing and contradictory conclusions
A bank is robbed.  There are witnesses and video cameras.  There are two suspects -- identical twin brothers.  The police investigate, and are unable to determine which brother is the criminal.
The police call a press conference.  They are familiar with the standards of statistical inference, where you don't have evidence until p < .05.  And, of course, the police have only p = .5 -- a fifty/fifty chance.
"There is no evidence that Al did the crime," they say, accordingly.  "And, also, there is no evidence that Bob did the crime."
But if the newspaper says, "police have no evidence pointing to who did it," that seems wrong.  There is strong evidence that *one* of the brothers did it.
------
"Bad Science," by Ben Goldacre, is an excellent debunking of some bad research and media reporting, especially in the field of medicine.  A lot of the book talks about the kinds of things that we sabermetricians are concerned about -- bad logic, misunderstandings by reporters, untested conventional wisdom, and so forth.  
I just discovered that Goldacre has a blog, and I found this post on significance, which brings up the twins issue.
Let me paraphrase his example.  You have two types of cells -- normal, and mutant.  You give them both a treatment.  In the normal cells, firing increased by 15 percent, but that's not statistically significant.  In the mutant cells, firing increased by 30 percent, and that *is* significant.  (I'm going to call these cases the "first treatment" and the "second treatment," even though it's the cells that changed, and not really the treatment.)
So, what do you conclude? 
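Under the normal standards of statistical inference, you can say the following:
1. There is no evidence that the first treatment has an effect.
2. There is evidence that the second treatment has an effect.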
Seeing that, a lay reader would obviously conclude that the researcher found a difference between the two treatments.
Goldacre objects to this conclusion. Because, after all, the difference between the two treatments was only 15 percent. That's probably not statistically significant, since that same 15 percent number was judged insignificant when applied to the normal cell case. So, given that insignificance, why should we be claiming there's a difference?
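To put rough numbers on that, here is a minimal sketch. The standard error of 10 percentage points per cell type is my assumption, made up for illustration; Goldacre's example doesn't give one. The sketch also assumes the two groups are independent and uses a plain z-test at the usual 1.96 cutoff.

```python
import math

# ASSUMPTION: each cell type's estimate has a standard error of 10
# percentage points. This figure is hypothetical, not from the source.
SE = 10.0
CRITICAL_Z = 1.96  # two-sided cutoff for p < .05

def z_score(estimate, std_error):
    """Return how many standard errors the estimate is from zero."""
    return estimate / std_error

normal_effect = 15.0  # observed increase in normal cells, in percent
mutant_effect = 30.0  # observed increase in mutant cells, in percent

z_normal = z_score(normal_effect, SE)
z_mutant = z_score(mutant_effect, SE)

# The difference of two independent estimates has a larger standard
# error than either one: sqrt(SE^2 + SE^2).
se_diff = math.sqrt(SE ** 2 + SE ** 2)
z_diff = z_score(mutant_effect - normal_effect, se_diff)

for label, z in [("normal", z_normal), ("mutant", z_mutant), ("difference", z_diff)]:
    verdict = "significant" if abs(z) > CRITICAL_Z else "not significant"
    print(f"{label}: z = {z:.2f} ({verdict})")
```

Under those assumptions, only the mutant-cell result clears the cutoff. And the 15-point difference is actually *further* from significance than the 15 percent effect in the normal cells, because the difference carries the noise of both estimates.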
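What Goldacre would have us do, I think, is to say that we don't have evidence that there's a difference. So we'd say this:
1. There is no evidence that the first treatment has an effect.
2. There is evidence that the second treatment has an effect.
3. There is no evidence that the two treatments are different.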
And now we have the twin situation. Because even though there's no evidence for #1, and there's no evidence for #3, there is evidence that one of the two must be true. Either the first treatment has an effect, or the two treatments are different. They can't both be false. At least one of the twins must be guilty.
You have to be especially careful, more careful than usual, that you don't assume that absence of evidence is evidence of absence. Otherwise, if you insist that both coefficients should be treated as zero, you're claiming a logical impossibility.
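If you want to convince yourself of the "they can't both be false" part, a brute-force check is enough. This is just a sketch: the grid of candidate effect sizes is arbitrary, and the function and variable names are mine. It takes as given that the second treatment really does have an effect, and lists which combinations of the other two claims can coexist with that.

```python
from itertools import product

def claims(first_effect, second_effect):
    """Translate two true effect sizes into the three yes/no claims."""
    return {
        "first_works": first_effect != 0,
        "second_works": second_effect != 0,
        "different": first_effect != second_effect,
    }

# Hypothetical candidate effect sizes, in percent; any grid that
# includes zero would do.
grid = [0, 15, 30]

possible = set()
for first_effect, second_effect in product(grid, repeat=2):
    c = claims(first_effect, second_effect)
    if c["second_works"]:  # keep only worlds where the second treatment works
        possible.add((c["first_works"], c["different"]))

# (first_works, different) pairs consistent with "second treatment works"
print(sorted(possible))
```

The pair (False, False), meaning "the first treatment does nothing *and* the two treatments are the same," never shows up. That's the logical impossibility: once the second treatment works, at least one of the other two claims has to be true.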
----
Even if you just assume that one of the two coefficients is zero ... well, how do you know which one? If you choose the first one, you assume the effect is zero, when the observation was 15 percent. If you choose the second one, you assume the effect is 30 percent, when the observation was 15 percent.
And it gets worse. Imagine that instead of 15 percent for the first treatment, the result was 27 percent, which was significant. Now, you can say there is evidence for the first treatment, and there is also evidence for the second treatment. And, you can also say that there is no evidence that the two treatments are different.
That's all good so far. But, now, you head to your regression, and you start computing estimates. And, what do you do? You probably use 27 percent for the first treatment, and 30 percent for the second treatment. But you just finished saying there's no evidence they're different! Shouldn't you be using 27 for both, or 30 for both, or 28.5 for both? Shouldn't it be a problem that you assume one thing on one page, and the opposite on the very next page?
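For what it's worth, if you did want one number for both treatments, one conventional way to get it is an inverse-variance weighted average. That's my choice of method for illustration, not something from Goldacre's post, and the standard errors below are made up.

```python
def pooled_estimate(estimates, std_errors):
    """Inverse-variance weighted average of independent estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    weighted_sum = sum(w * e for w, e in zip(weights, estimates))
    return weighted_sum / sum(weights)

# ASSUMPTION: both estimates have the same hypothetical standard error,
# so the pooled value is just the simple average of 27 and 30.
equal_precision = pooled_estimate([27.0, 30.0], [10.0, 10.0])

# ASSUMPTION: if the first estimate were twice as precise, the pooled
# value would sit closer to 27.
unequal_precision = pooled_estimate([27.0, 30.0], [5.0, 10.0])

print(f"equal standard errors:   {equal_precision:.1f}")
print(f"unequal standard errors: {unequal_precision:.1f}")
```

With equal standard errors you land on the 28.5 figure. With unequal ones you don't, which is one more reason the choice between "27 for both, or 30 for both, or 28.5 for both" isn't obvious.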
If you're going to say, "there's no evidence for a difference but we're going to assume it anyway," why is that better than saying (in the previous example) "there's no evidence that the first treatment works, but we're going to assume it anyway"?


