Friday, February 20, 2015

Replacing "statistically significant"

In his recent book, "How Not To Be Wrong," mathematician Jordan Ellenberg writes about how the word "significant" means something completely different in statistics than it does in real life:

"In common language it means something like 'important' or 'meaningful.' But the significance test scientists use doesn't measure importance ... [it's used] merely to make a judgment that the effect is not zero. But the effect could still be very small -- so small that the drug isn't effective in any sense that an ordinary non-mathematical Anglophone would call significant. ...

"If only we could go back in time to the dawn of statistical nomenclature and declare ... 'statistically noticeable' or 'statistically detectable' instead of 'statistically significant!'"

I absolutely agree.

In fact, in my view, the problem is even more serious the other way, when there is *no* statistical significance. Researchers will say, "we found no statistically-significant effect," which basically means, "we don't have enough evidence to say either way." But readers will take that as meaning, "we find at best a very small effect." That's not necessarily the case. Studies often find values that would be very significant in the real world, but reject them because the confidence interval is wide enough to include zero. 
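To make both failure modes concrete, here's a rough Python sketch. The numbers are invented for illustration, the `summarize` helper is hypothetical, and it assumes a simple normal approximation rather than whatever test a real study would use. One estimate is large but noisy, the other is tiny but precise:

```python
from statistics import NormalDist

def summarize(estimate, std_error, alpha=0.05):
    """Return (ci_low, ci_high, p_value) under a normal approximation.

    Hypothetical helper for illustration only.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    ci_low = estimate - z_crit * std_error
    ci_high = estimate + z_crit * std_error
    z = estimate / std_error
    p_value = 2 * (1 - normal.cdf(abs(z)))        # two-sided test against zero
    return ci_low, ci_high, p_value

# Made-up inputs: a big effect measured badly, and a tiny effect measured well.
big_noisy = summarize(estimate=10.0, std_error=6.0)
tiny_precise = summarize(estimate=0.1, std_error=0.02)

for label, (low, high, p) in [("big, noisy", big_noisy),
                              ("tiny, precise", tiny_precise)]:
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: CI = ({low:.2f}, {high:.2f}), p = {p:.4f} -> {verdict}")
```

Under these assumptions, the big estimate gets labelled "not significant" only because its interval straddles zero, while the tiny one passes easily. Neither label says anything about how large the effect is.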


Tom Tango will often challenge readers to put aside "inertial reasoning" and consider how we would redesign baseball rules if we were starting from scratch. In that tradition, how would we redo the language of statistical significance?

I actually spent a fair bit of time on this a year or so ago. I went to a bunch of online thesauruses, and wrote down every adjective that had some kind of overlap with "significant." Looking at my list ... I notice I actually didn't include Ellenberg's suggestions, "noticeable" or "detectable." Those are very good candidates. I'll add those now, along with a few of their synonyms.

OK, done. Here's my list of possible candidates:

convincing, decisive, unambiguous, probable cause, suspicious, definite, definitive, adequate, upholdable, qualifying, sufficing, signalling, salient, sufficient, defensible, sustainable, marked, rigorous, determinate, permissible, accreditable, attestable, credentialed, credence-ive, credible, threshold, reliable, presumptive, persuasive, confident, ratifiable, legal, licit, sanctionable, admittable, acknowledgeable, endorsable, affirmative, affirmable, warrantable, conclusive, valid, assertable, clear, ordainable, non-spurious, dependable, veritable, creditable, avowable, vouchable, substantive, noticeable, detectable, perceivable, discernable, observable, appreciable, ascertainable, perceptible

You can probably divide these into classes, based on shades of meaning:

1. Words that mean "enough to be persuasive." Some of those are overkill, some are underkill. "Unambiguous," for instance, would be an obvious oversell; you can have a low p-value and still be pretty ambiguous. On the other hand, "defensible" might be a bit too weak. Maybe "definite" is the best of those, suggesting precision but not necessarily absolute truth.

2. Words that mean "big enough to be observed." Those are the ones that Ellenberg suggested, "noticeable" and "detectable." Those seem fine when you actually find significance, but not so much when you don't. "We find no relationship that is statistically detectable" does seem to imply that there's nothing there, rather than that you just don't have enough data in your sample.

3. Words that mean "enough evidence." That's exactly what we want, except I can't think of any that work. The ones in the list aren't quite right. "Probable cause" is roughly the idea we're going for, but it's awkward and sounds too Bayesian. "Suspicious" has the wrong flavor. "Credential" has a nice ring to it -- as an adjective, not a noun, meaning "to have credence." You could say, for instance, "We didn't have enough evidence to get a credential estimate."  Still a bit awkward, though. "Determinate" is pretty good, but maybe a bit overconfident.

Am I missing some? I tried to think, what's the word we use when we say an accused was acquitted because there wasn't enough evidence? "Insufficient" is the only one I can think of. Everything else is a phrase -- "within a reasonable doubt," or "not meeting the burden of proof."

4. Words that mean "passing an objective level," as in meeting a threshold. Actually, "threshold" as an adjective would be awkward, but workable -- "the coefficient was not statistically threshold." There's also "adequate," and "qualifying," and "sufficient," and "sufficing."

5. Finally, there are words that mean "legal," in the sense of, "now the peer reviewers will permit us to treat the effect as legitimate." Those are words like "sanctionable," "admittable," "acknowledgeable," "permissible," "ratifiable," and so on. My favorite of these is "affirmable." You could write, "The coefficient had a p-value of .06, which falls short of statistical affirmability." The reader now gets the idea that the problem isn't that the effect is small -- but, rather, that there's something else going on that doesn't allow the researcher to "affirm" it as a real effect.

What we'd like is a word that has a flavor matching all these shades of meaning, without giving the wrong idea about any of them. 

So, here's what I think is the best candidate, which I left off the list until now:

Dispositive.

"Dispositive" is a legal term that means "sufficient on its own to decide the answer." If a fact is dispositive, it's enough to "dispose" of the question.

Here's a perfect example:

"Whether he blew a .08 or higher on the breathalyzer is dispositive as to whether he will be found guilty of DUI."

It's almost exact, isn't it? .08 for a conviction, .05 for statistical significance.

I think "dispositive" really captures how statistical significance is used in practice -- as an arbitrary standard, a "bright line" between Yes and No. We don't allow authors to argue that their study is so awesome that p=.07 should really be allowed to be considered significant, any more than we allow defendants to argue that they should be acquitted at a blood alcohol level of .09 because they're especially good drivers.

Moreover, the word works right out of the box in its normal English definition. Unlike "significant," the statistical version of "dispositive" has the same meaning as the usual one. If you say to a non-statistician, "the evidence was not statistically dispositive," he'll get the right idea -- that an effect was maybe found, but there's not quite enough there for a decision to be made about whether it's real or not. In effect, the question is not yet decided. 

That's the same as in law. "Not dispositive" means the evidence or argument is a valid one, but it's not enough on its own to decide the case. With further evidence or argument, either side could still win. That's exactly right for statistical studies. A "non-significant" p-value is certainly relevant, but it's not dispositive evidence of presence, and it's not dispositive evidence of absence. 

Another nice feature is that the word still kind of works when you use it to describe the effect or the estimate, rather than the evidence: 

"The coefficient was not statistically dispositive."

It's not a wonderful way to put it, but it's reasonable. Most of the other candidate words don't work well both ways at all -- some are well-suited only to describing the evidence, others only to describing the estimates. These don't really make sense:

"The evidence was not statistically detectable."  
"The effect was not statistically reliable."
"The coefficient was not statistically accreditable."

Another advantage of "dispositive" is that unlike "significant," you can leave out the word "statistical" without ambiguity:

"The evidence was not dispositive."
"The coefficient was not dispositively different from zero."

Those read fine, don't they? I bet they'd almost always read fine. I'd bet that if you were to pick up a random study, and do a global replace of "statistically significant" with "dispositive," the paper wouldn't suffer at all. (It might even be improved, if the change highlighted cases where "significant" was used in ways it shouldn't have been.)
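If you wanted to try that experiment, a global replace along these lines would do it. This is only a sketch: the `swap_term` function and the sample sentence are made up, and the pattern assumes the phrase shows up as "statistically significant" or "statistically-significant," in any capitalization.

```python
import re

# Matches "statistically significant" and "statistically-significant",
# ignoring case and allowing extra whitespace between the two words.
PATTERN = re.compile(r"statistically[\s-]+significant", re.IGNORECASE)

def swap_term(text, replacement="dispositive"):
    """Replace every occurrence of the phrase (hypothetical helper)."""
    return PATTERN.sub(replacement, text)

sample = "We found no statistically-significant effect."
print(swap_term(sample))
```

The replacement is always lowercase, so a sentence that starts with "Statistically significant" would need its capital restored by hand, and an "a" before the phrase happens to still work before "dispositive."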


When I'm finally made Global Despotic Emperor of Academic Standards, the change of terminology will be my first official decree.

Unless someone has a better suggestion. 



At Friday, February 20, 2015 9:46:00 AM, Blogger Colby Cosh said...

So... all statistically significant evidence can correctly be described as "dispositive"? Don't think this works, at all, from that angle: indeed, it seems like a disaster.

At Friday, February 20, 2015 11:59:00 AM, Blogger Phil Birnbaum said...

In practice, it actually IS dispositive: dispositive of the questions, (a) "can my paper proceed to treat the relationship as real?" and (b) "can my paper proceed to dismiss the idea that there's any relationship at all?"

(b) is absolutely wrong: absence of evidence, etc. "Not dispositive" solves that problem. You can't say "the evidence is not dispositive" and then pretend you have proven absence.

If your point for (a) is that it implies scientific proof ... OK, point taken. You have to avoid "We found dispositive evidence that smoking causes cancer." I'm thinking more like "We found dispositive evidence that the relationship between smoking and cancer is unlikely to be random."

At Friday, February 20, 2015 12:18:00 PM, Blogger Phil Birnbaum said...

Hmmm ... maybe it *would* be better to say "statistically dispositive" instead of just "dispositive."

At Thursday, February 26, 2015 1:26:00 PM, Blogger Don Coffin said...

Maybe you're having an influence:

"Psychology Journal Bans Significance Testing"

