Sabermetric Research: Dollars increase wealth, but cents don't

Is there a correlation between twenty-dollar bills and money? I think there is. Here's what I did: I took a thousand random middle-class people off the street. I counted how many twenties they had in their wallet, and how much money they had altogether. Then, I ran a regression.

It turns out that there is a strong relationship between twenties and money. The result:

Coefficient of twenties = $19.82 (p=.0000)

Because the coefficient for twenties was very strongly statistically significant, we can say that every twenty-dollar bill increases wealth by about $20.00.

I was so excited by this conclusion that I wondered whether a similar result holds for pocket change. So I added quarters to the regression. The result:

Coefficient of quarters = $0.26 (p=.25)

As you can see, the p-value is only .25, much higher than the .05 we need for statistical significance. Since the coefficient for quarters turns out not to be statistically significant, we conclude that there is no evidence for any relationship between quarters and money.

This is surprising – in the popular press, there is a widespread theory that quarters are worth $0.25. But, as this study shows, no statistically significant effect was found, using 1,000 people and the best methodology, so we have to conclude that quarters aren't worth anything.

--------

That sounds ridiculous, doesn't it? We have a regression that shows that quarters are worth about 25 cents, but we treat them as if they're worth zero just because the study wasn't powerful enough to show statistical significance.
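To see how a real quarter effect can come out "not significant," here is a minimal simulation sketch of the wallet study. Everything in it is an assumption for illustration, not the author's actual simulation: the Poisson counts of twenties and quarters, the normally distributed "other cash," and the hypothetical helper names `simulate_wallets` and `ols`. The p-values use a normal approximation, which is reasonable with 1,000 observations.

```python
import math
import numpy as np

def simulate_wallets(n=1000, seed=0):
    """Hypothetical wallet data: counts of twenties and quarters plus other cash."""
    rng = np.random.default_rng(seed)
    twenties = rng.poisson(3, n)   # assumed distribution of twenties per wallet
    quarters = rng.poisson(4, n)   # assumed distribution of quarters per wallet
    # Everything else in the wallet (tens, fives, ones, other coins),
    # crudely modelled as normal noise. This is an assumption.
    other_cash = rng.normal(40.0, 14.0, n)
    total = 20.0 * twenties + 0.25 * quarters + other_cash
    return twenties, quarters, total

def ols(y, *columns):
    """OLS with an intercept; returns (coefficients, standard errors, two-sided p-values)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    # Two-sided p-value from the normal approximation: erfc(|z| / sqrt(2)).
    p = [math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)]
    return beta, se, p

twenties, quarters, total = simulate_wallets()
beta, se, p = ols(total, twenties, quarters)
print(f"twenties: {beta[1]:.2f} (SE {se[1]:.2f}, p={p[1]:.4f})")
print(f"quarters: {beta[2]:.2f} (SE {se[2]:.2f}, p={p[2]:.2f})")
```

With these assumed inputs the quarter's standard error lands somewhere around 22 cents, so the estimate hovers near $0.25 but routinely fails the .05 test; change `seed` to watch it wander.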

But we're biased here because of the choice of example.

So suppose that instead of adding quarters to the equation, we had added something else, something that we could agree was completely irrelevant. Say, number of siblings.

And suppose that we got exactly the same results for siblings: for each sibling in our random subject's family, he winds up with a 26-cent increase in pocket money, and the significance level is the same 0.25. (The result is certainly not far-fetched: even if siblings had no effect at all, one time in four we'd find an effect at least this large.)

In this case, if we said "there is no evidence for any relationship between siblings and money," that would be quite acceptable.

What's the difference between quarters and siblings? The difference is that there is a good reason to believe that the quarters result is real, but there is no good reason to believe that the siblings result is. By "good reason," I don't mean just our prior intuitive beliefs. Rather, I mean that there's a good reason based partly on the results of the study itself.

The study showed us that twenty-dollar bills were highly significant. We therefore concluded that there was a real relationship between twenties and wealth. But we know, for a fact, that 80 quarters equal one twenty. It is therefore at least reasonable to expect that the effect of 80 quarters should equal the effect of a twenty – or, put another way, that the effect of one quarter should be 1/80 the effect of a twenty. And that was almost exactly what we found.
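The 1/80 argument is simple arithmetic. A quick check in Python, using the two coefficients from the study and the roughly 22-cent standard error for quarters that the author reports in the comments:

```python
# Coefficients reported by the (hypothetical) wallet study above.
twenty_coef = 19.82
quarter_coef = 0.26
# Standard error for quarters: roughly 22 cents, per the author's simulation.
quarter_se = 0.22

# If a twenty is worth twenty_coef, then 80 quarters should be worth the same.
implied_quarter = twenty_coef / 80
gap = quarter_coef - implied_quarter

print(f"implied value of one quarter: ${implied_quarter:.4f}")
print(f"estimate minus implied value: ${gap:.4f} = {gap / quarter_se:.2f} standard errors")
print(f"estimate minus zero: {quarter_coef / quarter_se:.2f} standard errors")
```

The estimate sits about a penny above the implied value of roughly $0.248, a small fraction of one standard error, while it sits more than one standard error away from zero.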

How does it make sense to accept that twenty-dollar bills have an effect, but 1/80 of a twenty-dollar bill does not? It doesn't.

If the convention in these kinds of studies is to treat any non-significant coefficient as zero, I think that's wrong. A reasonable alternative, keeping in mind the "sibling" argument, might be this: only when a factor turns out to be statistically insignificant and there is no other reason to suggest a link should you go ahead and revert to zero. But if there are other reasons (like if you're analyzing cents, and you know dollars are significant), reverting to zero can't be right.

-----

I argued this same point a little while ago when describing the Massey/Thaler football draft study. That situation was remarkably similar to the pocket money example.

The study was attempting to figure out how much an NFL player's production correlates with draft position. They broke production into something similar to dollars and cents.

"Dollars" were the most obvious attributes of the player's skill. Did he make the NFL? Did he play regularly? Did he make the Pro Bowl?

"Cents" is what was left after that. Given that he played regularly but didn't make the Pro Bowl, was he a very good regular or just an average one? If he made the Pro Bowl, was he a superstar Pro Bowler, or simply an excellent player?

The study found strong significance for "dollars" – players drafted early were much more likely to play regularly than players drafted late. They were also more likely to make the Pro Bowl, or to make an NFL roster at all.

But it found less significance for the "cents." The authors did find that players with more "cents" were likely to be better players, but the result was significant only at the 13% level (instead of the required 5%). From this, they concluded:

"there is nothing in our data to suggest that former high draft picks are better players than lower draft picks, beyond what is measured in our broad ["dollar"] performance categories."

And that's got to be just plain wrong. There is not "nothing in the data" to suggest the effect is real. There is actually strong evidence in the data – the significance of the other, broader, measure of skill. If you assume that dollars matter, you are forced to admit that cents matter.

There's an expression, "absence of evidence is not evidence of absence." This is especially true when you find weak evidence of a strong effect. If you find a correlation that's significant in football terms, but not significant in statistical terms, your first conclusion should be that your study is insufficiently powerful to be able to tell if what you found is real. Ideally, you would do another study to check, or add a few years of data to make your study more powerful. But it seems to me that you are NOT entitled to automatically conclude that the observed effect is spurious based on the significance level alone, especially when it leads to a logical implausibility, such as that dollars matter but cents don't.
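How underpowered was the quarters regression? A minimal sketch, assuming the estimate is normally distributed around the true value, borrows the wallet numbers (true value $0.25, standard error about $0.22) and computes the chance that a study of that size declares significance, plus how much more data would be needed. The function names `power_two_sided` and `sample_size_multiplier` are hypothetical.

```python
from statistics import NormalDist

def power_two_sided(effect, se, alpha=0.05):
    """Probability that a study whose estimate has standard error `se`
    declares significance (two-sided, level alpha) when the true effect is `effect`."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = effect / se
    # Reject if the estimate lands beyond +z_crit or below -z_crit standard errors.
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

def sample_size_multiplier(effect, se, target_power=0.8, alpha=0.05):
    """How many times more observations are needed to reach target_power.
    Assumes the SE shrinks as 1/sqrt(n) and ignores the tiny wrong-sign rejection region."""
    z_needed = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(target_power)
    return (se * z_needed / effect) ** 2

print(f"power: {power_two_sided(0.25, 0.22):.2f}")
print(f"data needed for 80% power: {sample_size_multiplier(0.25, 0.22):.1f}x")
```

With those inputs the power comes out near 0.21, so roughly four studies in five would call a real quarter "not significant," and reaching 80% power would take about six times as many people.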

In this particular study, I think that if you accept the coefficient for cents at face value, instead of calling it zero, you reach a conclusion completely opposite to the authors'.

-----

The reason I'm writing about this again is that I've just found another instance of it, in the study discussed here (full review to come). The authors come up with an estimate of a coefficient for three separate NBA seasons. The coefficient is (to oversimplify a bit) the amount by which you'd expect a team that's been eliminated from the playoffs to underperform in winning percentage.

Their three results are .220 (significant), .069 (not significant), and .192 (significant).

My conclusion would be that the effect of being eliminated in the middle season is much lower than the effect in the other two seasons, and I would check whether that difference is statistically significant. I would also point out, for the record, that the .069 is not significantly different from zero.
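Checking whether the middle season really differs from the others is its own test: compare the two coefficients directly rather than comparing each to zero. A minimal sketch, assuming the season estimates are independent and roughly normal. The post does not give standard errors, so the 0.06 used here is a hypothetical placeholder, chosen only so that .069 is non-significant while .220 and .192 are significant, as reported; `diff_z_test` is likewise a hypothetical helper.

```python
from statistics import NormalDist

def diff_z_test(b1, se1, b2, se2):
    """Two-sided z-test of H0: b1 == b2, for two independent estimates."""
    z = (b1 - b2) / (se1 ** 2 + se2 ** 2) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# HYPOTHETICAL standard error: not reported in the post.
placeholder_se = 0.06
middle_season = 0.069
for label, b in [("first season", 0.220), ("third season", 0.192)]:
    z, p = diff_z_test(b, placeholder_se, middle_season, placeholder_se)
    print(f"{label} vs middle season: z={z:.2f}, p={p:.3f}")
```

Substitute the study's actual standard errors before drawing any conclusion; the placeholder values only show the mechanics.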

But the study's authors go further – they say that the .069 should be arbitrarily treated as if it were actually .000:

"Our results show that teams that were eliminated from the playoffs [in the .069 year] were no more likely to lose than noneliminated teams." [emphasis mine.]

That is simply not true. The results show those teams were .069 more likely to lose than noneliminated teams.

That is: given our prior understanding of how basketball works, given the structure of the study (which I'll get to in a future post), given that the study shows a strong effect for other years, and given that the effects are in the same direction, there is certainly enough evidence that the .069 is likely closer to the "real" value than zero is.

In this case, the conclusions of the study -- that the second year is different from the other two -- don't really change. But the conclusion is much punchier when the authors say there is no effect, instead of a small one.

6 Comments:

At Tuesday, December 19, 2006 11:05:00 AM, Phil Birnbaum said...

Two more quick comments:

1. Has anyone made this observation before? I'm sure it's been done better than this.

2. For the record, I ran a simulation for the first "dollars and cents" result.

And, in fairness, I should point out that it was a coincidence that the estimate of the quarter's value came out so accurate. Roughly speaking, less statistical significance means less accuracy in the estimate. For quarters, the standard error was about 22 cents, which is quite high.

About one in four times you run this study, the estimated value of a quarter will actually come out negative -- the more quarters you have, the less money.

My point is not that the estimate is perfect, or even that it's adequate, but rather that assuming it's zero is even less accurate.

At Tuesday, December 19, 2006 1:41:00 PM, JavaGeek said...

All this stuff is about hypothesis testing, and many people confuse "not rejecting the null" with proving the null. [In the real world the null is always false.]

In your example, the null hypothesis for the quarter probably shouldn't be 0.00, but rather 0.25 (the known expected value); similarly, $20 for the bills.

Problem is there is too much software that makes regressions easy, but people no longer understand the results.
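To make the commenter's suggestion concrete: the same estimate can look very different depending on the null it is tested against. A minimal sketch using a normal approximation, with the quarter estimate from the post and the roughly 22-cent standard error from the author's simulation; `p_value` is a hypothetical helper.

```python
from statistics import NormalDist

def p_value(estimate, se, null_value=0.0):
    """Two-sided p-value for H0: true coefficient == null_value (normal approximation)."""
    z = (estimate - null_value) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

quarter_estimate, quarter_se = 0.26, 0.22
print(f"H0: quarter worth $0.00 -> p = {p_value(quarter_estimate, quarter_se):.2f}")
print(f"H0: quarter worth $0.25 -> p = {p_value(quarter_estimate, quarter_se, 0.25):.2f}")
```

Against a null of zero the p-value lands near .24 (the post reports .25; the gap comes from rounding the standard error), while against the known value of $0.25 it is above .9. Neither test proves its null; the data are consistent with both values, which is exactly why failing to reject zero says little.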

At Tuesday, December 19, 2006 3:55:00 PM, Anonymous said...

It seems to me that another part of the problem here is looking at single-year rather than multi-year data. Unless there is some reason to think the middle year would be different (such as different draft rules), the three years should be combined to create a more robust sample. This reminds me of JC Bradbury's study of pitcher salaries and DIPS, and the Sauer "Moneyball" paper, both of which ran separate regressions for each season studied. These authors need to do a better job of recognizing the very small sample size each single season represents (especially if looking at subsets like non-playoff NBA teams).

At Tuesday, December 19, 2006 4:01:00 PM, Phil Birnbaum said...

Guy,

Actually, there *were* different draft rules ... that's why they chose those three seasons in particular. The middle season is one where every non-playoff team got an equal shot at the first draft choice. They wanted to see if out-of-playoff teams lost fewer games that year because they had no incentive to lose more.

So using those three years makes sense. Arbitrarily reducing the middle year from .069 to .000 does not.

More to come in a couple of days in a full review.

At Tuesday, December 19, 2006 4:56:00 PM, Anonymous said...

Fair enough. I'll wait for the review. But I still think a lot of the academic research gives far too much weight to single-season data.

At Wednesday, December 20, 2006 10:48:00 PM, Phil Birnbaum said...

The full review is now up, the post immediately following this one.

Tuesday, December 19, 2006
