Sabermetric Research: A research study is just a peer-reviewed argument

Friday, November 18, 2011

A research study is just a peer-reviewed argument

To make your case in court, you need two things: first, some evidence; and, second, an argument about what the evidence shows.

The same thing is true in sabermetrics, or any other science. You have your data, and your analysis; that's the evidence. Then you have an argument about what it means.

But, most of the time, the "argument" part gets short shrift. Pick up a typical academic paper, and you'll see that most of the pages are devoted to explaining a regression, and listing the results and the coefficients and the corrections and the tests. Then, the author will just make an unstated assumption about what that means in real life, as if the regression has proven the case all by itself.

That's not right. The regression is important, but it's just the gathering of the evidence. You still have to look at that evidence, and explain what you think it means. You have to make an argument. The regression, by itself, is not an argument. The *interpretation* of the regression is the argument.

For instance: suppose you do a simple regression on exercise and lifespan, and you get the result that every extra mile jogged is associated with an increased lifespan of, say, 10 minutes. What does that mean in practical terms? Probably, the researcher will say that if you want Americans' lifespan to increase by a day, we should consider getting each of them to jog 144 more miles than they would otherwise. That would seem reasonable to most of us.

Suppose, now, another study looks at pro sports, and finds that every year spent as a starting MLB shortstop is associated with an extra $2 million in lifetime earnings. Will the researcher now say that if we want everyone to earn an extra $2 million, we should expand MLB so that everyone in the USA can be a starting shortstop? That would be silly.

Still another researcher does a regression to use triples to predict runs scored. That one finds a negative relationship. Should the study conclude that teams stop trying to hit triples, that it's just hurting them? Again, that would be the wrong conclusion.

All three of these regressions have exactly the same structure. The math is the same, the computer software is the same, the testing for heteroskedasticity is the same ... everything about the regressions themselves is the same. The difference is in the *interpretation* of what the regressions mean. The same interpretation, the same argument, makes sense in the first case, but is obviously ludicrous in the other two cases. And even the third case is very different from the second case.

The regression is just data, just evidence. It's the *interpretation* that's crucial, the argument about what that evidence means.

Why, then, do so many academic papers spend pages and pages on the details of the regression, but only a line or two justifying their conclusions? I don't know for sure, but I'd guess it's because regression looks mathematical and scholarly and intellectual and high-status, while arguments sound subjective and imprecise and unscientific and low-status.

Nonetheless, I think the academic world has it backwards. Regressions are easy -- shove some numbers into a computer and see what comes out. Interpretations -- especially correct interpretations -- are the hard part.

-----

If you think my examples are silly because they're too obvious, here's a real-life example that's more subtle: the relationship between salary and wins in baseball, a topic that's been discussed quite a bit over the last few years. If you do a regression on 2009 data, you'll get that

-- the correlation coefficient is .48
-- the r-squared = .23
-- the value of the coefficient is .16 of a win per $1 million spent
-- the coefficient is statistically significant (as compared to the null hypothesis of zero).

That's all evidence. But, evidence of what? So far, it's just numbers. What do they actually *mean*, in terms of actual knowledge about baseball?

To get from the raw numbers to a conclusion, you have to interpret what the regression says. You have to make an argument. You have to use logic and reason.

So you look at the coefficient of .16. From that, you can say, in 2009, every extra $6 million spent resulted, on average, in one extra win. I'm happy calling that a "fact" -- it's exactly what the data shows. But, almost anywhere you go from there now becomes interpretation. What does that *mean*, that every extra $6 million resulted in an extra win? What are the practical implications?
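For anyone who wants to check the arithmetic, here is a minimal Python sketch that converts the 2009 figures quoted above. The variable names are mine, chosen only for illustration; the inputs are just the numbers from the list.

```python
# Figures quoted above for the 2009 salary-vs-wins regression.
correlation = 0.48        # correlation coefficient (r)
wins_per_million = 0.16   # regression slope: wins per $1 million spent

# In a one-variable regression, r-squared is simply r squared.
r_squared = correlation ** 2

# Invert the slope to get dollars per win instead of wins per dollar.
millions_per_win = 1 / wins_per_million

print(f"r-squared: {r_squared:.2f}")
print(f"$ millions per extra win: {millions_per_win:.2f}")
```

The reciprocal comes out a little over $6 million, which is the figure used in the text. Note that nothing in this calculation says anything about *why* the two move together; that part is still interpretation.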

For instance, suppose you're a GM and want to gain an extra win next year. How much extra money do you have to spend on free agents? If you want to convince me that you know the answer, you have to take the evidence provided by the regression, and *make an argument* for why you're right.

A naive interpretation might be to just use that $6 million figure, and say, that's it! Spend an extra $6 million, and get an extra win. It seems obvious from the regression, but it would be wrong.

Why is it wrong? It's wrong because there are other causes of winning than spending money on free agents. There's also spending money on "slaves," and spending money on "arbs". Those are much cheaper than free agents. Effectively, some teams get wins almost for free, by having good young players. The teams that don't have that have to spend double, as it were: they have to buy a free agent just to catch up to the team with the cheap guys, and then they have to buy another one to surpass it.

For instance, team A has 80 wins for "free". Team B has 70 wins for "free" and buys another 20 on the free-agent market. The regression doesn't know free from not free. It sees that team B has 10 more wins, but spent an extra 20X dollars, where X is the actual cost of a free agent per win. Therefore, it spits out that it took 2X dollars to buy each extra win, even though it only took X.

That is: the coefficient of dollars per win from the regression is twice what it actually costs to buy one. The coefficient doesn't measure what a naive researcher might think it does.
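The two-team example can be sketched in a few lines of Python. This is a toy illustration, not a regression on real league data: the helper `ols_slope` and the value picked for X are assumptions made for the sake of the example, and with only two teams the fitted line simply passes through both points.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Hypothetical true price of one free-agent win, in $ millions (the "X" above).
X = 4.5

# Team A: 80 "free" wins and no free-agent spending.
# Team B: 70 "free" wins plus 20 bought wins, so 90 wins for 20X dollars.
free_agent_spending = [0.0, 20 * X]
wins = [80, 70 + 20]

# The regression only sees spending and total wins, not which wins were free.
wins_per_million = ols_slope(free_agent_spending, wins)
regression_cost_per_win = 1 / wins_per_million

print(f"True cost per win:       {X:.1f}")
print(f"Regression cost per win: {regression_cost_per_win:.1f}")
```

Whatever value you choose for X, the regression's cost per win comes out at twice X, because ten of the twenty purchased wins went toward closing the gap in "free" wins.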

My numbers are artificial, but I chose numbers that actually come fairly close to real life. Various sabermetric studies have shown that a free agent win actually costs $4.5 million. But regressions for 2008, 2009, and 2010 show figures of $8.9, $6.2, and $12.6 million, respectively -- about twice as much, on average.
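To see how close "about twice as much" is, here is a short Python sketch that compares each year's regression figure with the $4.5 million free-agent price. It uses only the numbers quoted above; the variable names are illustrative.

```python
# $ millions per win: free-agent price versus the regression figures above.
free_agent_cost = 4.5
regression_cost = {2008: 8.9, 2009: 6.2, 2010: 12.6}

# Ratio of each year's regression figure to the free-agent price.
for year, cost in regression_cost.items():
    print(f"{year}: {cost / free_agent_cost:.1f}x the free-agent price")

# Average the three regression figures, then compare to the same price.
average_cost = sum(regression_cost.values()) / len(regression_cost)
average_ratio = average_cost / free_agent_cost
print(f"average: {average_ratio:.1f}x")
```

The individual years bounce around quite a bit, from well under double to well over, but the three-year average lands close to two.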

Again, the issue is interpretation. If you're just showing the regression results, and saying, "here, figure out what this means," then, fine. But if your paper has a section called "discussion," or "conclusions," that means you're interpreting the results. And that's the part where it's easy to go wrong, and where you have to be careful.

----

Which brings me, finally, to the point that I'm trying to make: we should stop treating academic studies as objective scientific findings, and start treating them as arguments. Sure, we can remember that academic papers are written by experts, and peer reviewed, and that much of the time, there's no political slant behind them. If we want, we can consider them as generally well-reasoned arguments by experts of presumably above-average judgment.

But they're still arguments.

So when an interesting study is published, and the media report on it, they should treat it as an argument. And we should hold it to the same standards of skepticism to which we hold other arguments. A research paper is like an extended op-ed. The fact that there's math, and a review process, doesn't make it any less argument-like. The New York Times wouldn't present Paul Krugman's column as fact just because he used regressions and peer review, would they?

I googled the phrase "a new study shows." I got 55 million results. "A new study claims" gives only 4 million. "A new study argues" gives only 300,000.

But, really, it should be the other way around. New studies normally don't "show" anything but the regression results. Their conclusions are always "claimed" or "argued".

-----

The word "show" should be used only when the writer wants to indicate that the claim is true, or that it has been widely accepted in the field. At the time his original Baseball Abstract came out, you'd have to say Bill James was "arguing" that the Pythagorean Projection is a good estimator of team wins. But now that we know it's right, we say he "showed" it.

"Show" implies that you accept the conclusion. "Argue" or "claim" implies that you're not making a judgment.

The interesting thing is that the media seem to understand this. Sure, 90 percent of the time, they say "show". But when they don't, it's for a reason. The "claims" and "argues" are saved for controversial or frivolous cases, ones that the reporter doesn't want to imply are true. For instance, "New study claims gun-control laws have no effect on Canadian murder rate." And, "a new study argues that poker is a game of skill, not chance."

It's as if the reporters want to pretend scientific papers are always right, unless they conclude something that the reporter or editor doesn't agree with. But it's not the reporter's job to be implying the correctness of a conclusion, unless the reporter has analyzed the paper, and is writing the article as an opinion piece.

Ninety-nine percent of the time, a research paper does not "show" anything -- it only argues it. Because, correct conclusions don't just pop out of a regression. They only show up when you support that regression with a good, solid argument.

Labels: academics, regression, scientific method

15 Comments:

At Friday, November 18, 2011 10:57:00 AM, Chad McEvoy said...: Thanks for posting - some very good points made. I was particularly struck by the following statement: "Why, then, do so many academic papers spend pages and pages on the details of the regression, but only a line or two justifying their conclusions? I don't know for sure, but I'd guess it's because regression looks mathematical and scholarly and intellectual and high-status, while arguments sound subjective and imprecise and unscientific and low-status."

This is an issue I've been chewing on for a while. As an author and journal editor, I think the reason we see this is due to the nature of the review process. I believe authors write with a mindset of trying to avoid major criticism from reviewers. Authors realize that statistical analyses are black-and-white, fact-based areas that (if the stats are sound) won't open themselves up to reviewer critique. Interpretation, on the other hand, inevitably requires authors to form opinions about what their results show. Anytime you write opinion in a manuscript, you open yourself up to the possibility that the reviewer will have an alternative opinion and will shoot holes in yours. I think this risk-averse writing style perpetuates the problem you identify.
At Friday, November 18, 2011 11:16:00 AM, Phil Birnbaum said...: Thanks, Chad, good to hear from someone on the inside.

What you say makes sense ... it's a lot easier to criticize an argument than it is to criticize a rejection.

That would explain why the conclusions are simple, but ... doesn't that presume that the reviewers aren't paying too careful attention to the arguments? I mean, wouldn't the reviewer be just as likely to disagree with the "naive" conclusion as a "non-naive" one?

Maybe it's just that the less time spent on the argument, the better, and the "naive" conclusions are shorter and more likely to be glossed over by the referee ...
At Friday, November 18, 2011 11:26:00 AM, Chad McEvoy said...: The challenge then is how to get authors and reviewers to recognize the issue and work towards improvement. Certainly there isn't a "quick fix" available. Something to ponder as I stare at the gigantic stack of grading on my desk.

Cheers,
Chad
At Friday, November 18, 2011 11:30:00 AM, Phil Birnbaum said...: Two comments up, I meant "regression," not "rejection".
At Friday, November 18, 2011 9:44:00 PM, Alex said...: I would say that 95% or more of the scientific articles I read contain an experiment. That is, someone explicitly manipulates something and measures the outcome. Experiments are analyzed statistically, but for the most part the results are the results and the meat of the article is about why the experiment was done, what was predicted to happen, and explaining what actually happened. The statistical analyses are important and open to criticism, but they are not the crux of the issue. Perhaps you should read some different academic articles?
At Friday, November 18, 2011 11:42:00 PM, Phil Birnbaum said...: Hi, Alex,

Most sabermetrics and economics articles aren't experiments, because you just gotta use what baseball (or other sport's) data is there. It's not like you can get your favorite manager to do a randomized double-blind clutch hitting study, right?

Or am I misunderstanding you?
At Saturday, November 19, 2011 3:54:00 AM, BDF said...: Linked from Tango. This is great stuff. My roots are in the social sciences, and the lesson is badly needed there. Hell, it's badly needed in the humanities, where in, say, English departments, you can substitute "close reading" for "regression" and make the same point about the need for analysis to draw conclusions. Neither numbers nor close readings make their own arguments; interpreters always have to do that.
At Saturday, November 19, 2011 8:19:00 PM, Bill said...: Frankly, I think you're talking about something that you don't know much about. Significant issues about “interpretation” of results are pretty rare in most empirical studies that appear in academic economics journals. That is because, for the most part, researchers usually shy away from making “grand conclusions” even if their empirical results turn out just as they had predicted. Most papers simply conclude with the idea that the results “suggest” something that is in accord with (or is consistent with) what the researcher hypothesized at the start of the analysis.

For the study itself, interpretation is almost never a point of contention. What are common points of contention are things like: unrepresentative data sets, implicit assumptions in the analysis that may not hold water, the use of weak statistical tests, and whether other possibly conflicting hypotheses may be consistent with the same data.

Actually, it is up to the reader to interpret the results in the context of what else he or she knows about the subject in order to assess how the results fit into a larger picture.
At Sunday, November 20, 2011 12:38:00 PM, Alex said...: Bill - You're absolutely right that sabermetrics/economics articles aren't experiments (although there are some 'natural experiments' articles throughout economics). But that isn't what you said. You refer early on to sabermetrics 'or any other science'. You talk about the 'typical academic paper', the 'academic world', and make an analogy to exercise research. And you conclude by saying that people should stop viewing academic studies as objective science. Those are all strong words, especially the conclusion. For those of us in academia who perform actual experiments (dare I say, perform objective science?), it's a bit of a slap in the face to read that we've just been expressing our opinions all this time. If you meant to just badmouth economics and other correlation-based fields, you could have been more specific.
At Sunday, November 20, 2011 1:01:00 PM, Bill said...: Alex, I agree (but I think you meant to say Phil, not Bill). Phil is frankly talking out of his league here.

(PS if anyone cares: Sort of coincidentally, I(Bill) am the same guy who posted earlier as NSD Board -- that was an old blogger ID from years ago that I had forgotten about.)
At Sunday, November 20, 2011 5:59:00 PM, Alex said...: I did mean Phil, sorry about the typo.
At Sunday, November 20, 2011 7:48:00 PM, Percy P Tron said...: Regression has a simple interpretation that is true for all analyses, if all the required assumptions are met. The argument here seems to be simply that some people use their own interpretation, which might be wrong.

For any OLS regression, the interpretation is: 'for this random sample of some population, an increase of __ units in the X variable is associated with an increase of __ units in the Y variable, with everything else held constant.' In my experience, there are three parts of this interpretation that people ignore/misconstrue: 'for this random sample of some population,' 'associated with,' and 'everything else held constant.'

First, your data is a random sample from a very specific population. For instance, if your data is all batters who had at least 250 PAs in 2011, then your regression only really makes sense for players who had at least 250 PAs in 2011. This is the issue with example #2. Your sample is all MLB shortstops, not the US population. You cannot extend your regression beyond the population that your sample comes from. Furthermore, your sample must be random, otherwise your results may be biased, though there are ways to deal with this.

Second, regression is merely a model of association. This is where the classic correlation does not imply causation comes from. In an experimental setting, this association can be interpreted as causation, but in an observational setting, you cannot. This is where the salary-wins example goes awry. You cannot conclude that spending $6 million more will provide you an additional win on average, but you can conclude there is some association between winning and spending money. Unless of course you are controlling for all possible confounders, which if you can pull that off in an observational study I would be impressed.

Third, as Phil explains in the link from his triples example, in multiple regression analysis, the interpretation requires you to hold everything else constant. Specifically, everything else you controlled for (i.e., everything else in your model). In examples with only one covariate, this really doesn't mean anything, but it is a very important part otherwise (especially when your variables are correlated).

To clarify, the triples example is far more related to the second problem than the third. It is 100% correct to say in this sample that triples and runs have a negative association. The existence of confounding variables that you are not controlling for (speed) gives you a weird result, hence one of the reasons why correlation says nothing of causation in observational studies.

The interpretation of a regression analysis really is not an art. Choosing your model is the real art, the interpretation comes directly from this. The problem is that most people don't know what that interpretation is and thus create their own.
At Monday, November 21, 2011 9:11:00 AM, Phil Birnbaum said...: Alex,

OK, point taken. I should have been more specific that I was referring mostly to observational studies.

But ... the principle does hold, to some extent, on other types of study. For instance, in psychology, I've heard some debate on whether results obtained in the artificial environment of the psych lab really apply in real life. That's something that should be addressed in a paper, unless, of course, the paper just talks about what it means for laboratory behavior.

As Bill James once wrote: the question is, "what is the evidence, and what does it mean?" Sometimes the "what does it mean" is obvious, but usually it needs a bit of thinking about.
At Monday, November 21, 2011 4:37:00 PM, Unknown said...: This is an excellent piece, thanks for posting it. I think Percy made a very useful contribution in noting that, in your salaries versus wins example, an additional $6M does not "result in" an additional win, but, rather, is associated with an additional win.

In other words, your "fact" in the example is that there is an association between salaries and wins. Saying that salaries result in wins is part of the interpretation.

Statistics never lie. They say exactly what they say, when properly couched in the context of their sample populations and sampling methodologies.

Interpretation lies all the time. Or, more charitably, is often mistaken.
At Friday, April 13, 2012 6:30:00 PM, Dougie said...: Okay, okay, I know this is an old post and everyone's moved on, but I just found it and I have a minor problem with it.

You say: "For instance, team A has 80 wins for 'free'. Team B has 70 wins for "free" and buys another 20 on the free-agent market."

No. The 80 wins aren't free. We're talking a regression line with a positive slope. The 80 or so wins are for an average $$$ output. If you extrapolate the regression line back to no dollars, you'll get very few wins. Possibly negative wins.

Now back to the discussion.
