Sabermetric Research: Statistical significance is not a property of the real world

Here's a quote from a recent post at a sports research blog:

"... strength of schedule does not have a statistically significant effect on winning percentage in FBS college football."

That sentence doesn't make sense. Why? Because there can be no such thing as a "statistically significant effect" in college football. Real-life effects do not possess the property "statistically significant." They might be big effects, they might be small effects. They might be positive, or maybe negative. They might be zero effects, which means no effect at all.

But they cannot be "statistically significant." Either strength of schedule has an effect on winning percentage, or it doesn't.

(This is obvious if you think of other real life effects. Would an extra week of vacation have a statistically significant effect on your job satisfaction? Would a sale on soup have a statistically significant effect on how much you buy?)

"Statistically significant effect" has no meaning outside of the context of a particular statistical procedure. Statistical significance is a property of the *evidence* of an effect, not the effect itself.

That's not always obvious, because the word "effect" is being used to mean two different things. In normal conversation, we use "effect" to mean the real-life impact. But research studies will often use "effect" to mean the study's ESTIMATE of the real-life effect. The two are not the same.

What you can say, instead, is:

"... my study was not able to find a statistically significant non-zero estimate of the effect of strength of schedule on winning percentage in FBS college football."

That works -- but as a statement about your study, not about college football.

What does that sentence say about real life? You can't tell. It could be that there really is no effect in college football. Or, it could be that you didn't have enough data to find the effect. It could also mean that you didn't look hard enough, one way or another ... maybe you didn't use the right data, or your model is wrong.

It's impossible for us to know what to think, unless we actually look at your study! It's very, very easy to design a study, in almost any context, that won't find statistical significance. Smoking and cancer? No problem: I'll take three random smokers, and three random non-smokers, and compare. It's almost certain that's not enough data to produce statistical significance, even though, of course, smoking *does* cause cancer.
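
The three-versus-three comparison can be checked exactly. What follows is a minimal sketch, assuming the comparison is analyzed with a two-sided Fisher exact test (an assumption for illustration; no particular test is named above). The helper name `fisher_two_sided` is hypothetical. It asks what p-value the most lopsided possible outcome would produce.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (smokers, non-smokers); columns are (cancer, no cancer).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x cancer cases landing in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Most lopsided result possible: all 3 smokers have cancer, no non-smoker does.
p_extreme = fisher_two_sided(3, 0, 0, 3)
print(f"p-value for the most extreme 3-vs-3 outcome: {p_extreme:.2f}")
```

Under that assumption, even the most extreme outcome gives p = 0.10, so this design cannot reach the usual 0.05 threshold no matter what the data show.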

"Not statistically significant" means, at best, "I didn't find evidence." I wish researchers would say that more explicitly:

"... my study failed to find statistically significant evidence that strength of schedule has an effect on winning percentage."

That does two things. First, it reminds the reader that statistical significance is only about a level of evidence. And, second, it leads the reader to wonder, "where did you look?"

-----

OK, so this post was meant to talk just about the use of the words "statistically significant," about how they apply only to you and your study, and not to the real world. But, now, it looks like we're heading into "absence of evidence is not evidence of absence" territory again. Sorry.

-----

If you don't find statistical significance, and you want to use that to argue that there's no real world effect, you can't stop there. You have to tell us where you looked. You have to say, "the estimate was not statistically significant BASED ON MY METHOD AND DATA." And then you show the method and data.

For instance:

"... my regression tried to predict winning percentage from strength of schedule, looking at 10 seasons of 100 teams each, and I found no statistically significant evidence of an effect."

That would work better. Now, we have an idea of how you tried to look for the evidence. Maybe, now, we can say, "Well, if he found no significant effect based on all that data, maybe there really is none! It looks like he looked pretty deep."

But, that's still not sufficient. Because, who knows how much data is enough?

Suppose a researcher says, "I wanted to find out if chemical X causes cancer. So I looked at 10,000,000 people and found no statistically significant effect."

That sounds like chemical X is safe, right? I mean, ten million people, surely that would provide enough evidence if it existed! But ... no. If X causes cancer in only one person in a million, then the sample isn't nearly large enough. Even if your data cleanly splits into X and non-X -- 5,000,000 people each -- the expected difference is only five cases of cancer! There's very little chance of finding statistical significance in that.
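
The chemical X arithmetic can be sketched in a few lines. The group size and the one-in-a-million risk come from the example above; the background cancer rate and the 0.05 significance level are assumptions added here only so that a rough power figure can be computed, using a normal approximation to a two-sided test of two proportions.

```python
from math import sqrt
from statistics import NormalDist

n_per_group = 5_000_000   # from the example: 10,000,000 people split evenly
extra_risk = 1e-6         # from the example: X causes cancer in one person per million
baseline_risk = 0.01      # ASSUMPTION: hypothetical background cancer rate
alpha = 0.05              # ASSUMPTION: conventional two-sided significance level

# Expected number of extra cancer cases in the exposed group.
expected_extra_cases = n_per_group * extra_risk

# Standard deviation of the difference in case counts between the two groups.
exposed_risk = baseline_risk + extra_risk
sd_difference = sqrt(
    n_per_group * exposed_risk * (1 - exposed_risk)
    + n_per_group * baseline_risk * (1 - baseline_risk)
)

# Normal-approximation power of a two-sided test for a difference in proportions.
z = NormalDist()
z_crit = z.inv_cdf(1 - alpha / 2)
shift = expected_extra_cases / sd_difference
power = (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

print(f"expected extra cases: {expected_extra_cases:.1f}")
print(f"sd of the difference in case counts: {sd_difference:.0f}")
print(f"approximate power: {power:.3f}")
```

With the assumed 1% background rate, random variation in the case counts runs to a few hundred, which swamps an expected gap of about five. The chance of reaching significance is barely above the 5% you would get with no effect at all.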

There are ways to actually calculate whether the sample size is large enough. But you don't actually have to go through the trouble. The regression will do that for you automatically, when it gives you a standard error and confidence interval for your estimate. If you *do* have enough data, you'll get a pretty small confidence interval, hugging zero. If you don't have enough data, the interval will be wide.
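
To illustrate how the interval tightens as data accumulates, here is a rough sketch. It swaps the regression for something simpler: a plain difference in means between two equal-sized groups, with a normal approximation. The within-group standard deviation of 10 is a made-up value, and `half_width` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

sd = 10.0  # ASSUMPTION: hypothetical spread of outcomes within each group
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval


def half_width(n_per_group):
    """Half-width of a 95% interval for a difference in means of two equal groups."""
    standard_error = sd * sqrt(2 / n_per_group)
    return z_crit * standard_error


sample_sizes = [10, 100, 1_000, 10_000]
half_widths = [half_width(n) for n in sample_sizes]
for n, hw in zip(sample_sizes, half_widths):
    print(f"n per group = {n:>6}: estimate +/- {hw:.2f}")
```

The width shrinks with the square root of the sample size, so in this setup it takes 100 times the data to get an interval one tenth as wide.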

Suppose you do a study on whether an extra hour of class affects a student's percentage grades. And you find that the estimate of the effect is not statistically significant.

You should certainly tell us that your estimate isn't significantly different from zero. But what's your confidence interval? If it's, say, plus or minus 1 point out of 100, that's pretty decent evidence. But if it's really wide -- say, between minus 10 and plus 30 -- it's obvious that your study isn't strong enough. Because that result doesn't narrow it down much, does it? I mean, if your confidence interval includes every plausible prior guess at what the class is worth ... what's the point?

In a case like that, absence of evidence is really absence of a good enough study.
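
The difference between the two intervals in the extra-hour-of-class example can be written as a simple check. This is only a sketch: `read_interval` is a hypothetical helper, the plausible range of 0 to 5 grade points is an invented prior, and the "plus or minus 1" interval is assumed to be centered on zero.

```python
def read_interval(low, high, plausible_low, plausible_high):
    """Summarize what a confidence interval [low, high] says about an effect.

    plausible_low/plausible_high bound the effects a reader thought possible
    before the study (hypothetical inputs, supplied by the reader).
    """
    significant = not (low <= 0 <= high)
    # An interval that swallows the whole plausible range rules nothing out.
    uninformative = low <= plausible_low and high >= plausible_high
    if significant:
        return "significant: zero is outside the interval"
    if uninformative:
        return "not significant, and uninformative: every plausible effect is still in play"
    return "not significant, but informative: some plausible effects are ruled out"


# ASSUMPTION: before the study, effects from 0 to 5 grade points seemed plausible.
plausible = (0, 5)

narrow = read_interval(-1, 1, *plausible)   # "plus or minus 1 point", assumed centered on zero
wide = read_interval(-10, 30, *plausible)   # "between minus 10 and plus 30"
print(narrow)
print(wide)
```

Both intervals are "not statistically significant," but only the narrow one tells you anything: it rules out effects bigger than a point, while the wide one leaves every prior guess standing.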

-------

A researcher saying there's no effect because he didn't find statistical significance is like a kid saying he didn't do his homework because he couldn't find a pen in the house.

Either way, you have to go out of your way to show how hard you looked, and to prove that you would have found it if it were there.

Labels: significance, statistics

3 Comments:

At Thursday, October 10, 2013 5:57:00 AM, Ari Berkowitz said...: So basically the sentence, this article had no statistically significant effect on my understanding of statistical significance, would be incorrect.
At Thursday, October 10, 2013 9:49:00 AM, Alex said...: Sports is actually an interesting case. I agree with you in terms of, say, medicine, but in some sports analyses you would actually be looking at the real world because all the data is available; we're studying the population, not a sample. Does that mean that if I were to run an analysis that managed to satisfy everyone, I should just go ahead and say 'Strength of schedule did not affect winning in college football (in 2013)' ? Or should I still add some kind of caveat that my conclusion came through statistical analysis, even though I've included and accounted for every piece of real-life data available?
At Wednesday, October 30, 2013 6:40:00 PM, BELIALITH said...: I was just thinking about all of this when I put into the google search engine the words "there are no such thing as statistics." That's when I came across your post and I see that you see what I see, haha, too.

Just before that, I was thinking of a person who had become pregnant at age 18 on account of 'fooling around.' They never got married, only lived together because they did not like one another, so they only stayed together for as long as they could to raise two children. So I wanted to know what the statistics were of shotgun weddings. But then I got to thinking, well heck, I know this other person who got married at 19. She had no kids and divorced two years later. Her choice. She just wasn't happy with the marriage. And then there's this other couple who wanted to get married, so they were engaged, and before they married they got pregnant. They are still together many years later and are a very happy couple.

Therefore, I have realized, there is NO way to make any kind of life events into statistics. Each and every individual is unique. Each and every group of people are unique and how the groups affect one another are all unique. Just one good look at reality tells you, there is no such thing as statistics. Statistics is all a fantasy, for a lazy mind.

Tuesday, October 08, 2013