Tuesday, December 06, 2011

Transparent studies are better, even if they're less rigorous

In 1985, Orioles manager Joe Altobelli claimed that power pitchers do better than finesse pitchers in cold weather. In the 1986 Baseball Abstract, Bill James wanted to check whether that was true.

So, he went through baseball history, and found pairs of pitchers with exactly the same season W-L record, but where one was a power pitcher and the other was a finesse pitcher. For example, Tom Seaver, a power pitcher, went 22-9 in 1975. He got paired with Mike Caldwell, a finesse pitcher, who went 22-9 in 1978.

Bill found 30 such pairs of pitchers. So, he had two groups of 30 pitchers, each with identical 539-345 (.610) records overall.
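The matching step is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not Bill's actual procedure: the record layout and the function name `pair_by_record` are my own, the "power"/"finesse" labels are taken as given (the write-up above doesn't say how they were assigned), and "Pitcher X" is a made-up row added to show what happens to a season with no partner.

```python
from collections import defaultdict


def pair_by_record(seasons):
    """Pair power and finesse pitcher-seasons that share an identical W-L record.

    `seasons` is a list of dicts with keys: name, year, wins, losses, style.
    This layout is an assumption made for the sketch.
    """
    by_record = defaultdict(lambda: {"power": [], "finesse": []})
    for s in seasons:
        by_record[(s["wins"], s["losses"])][s["style"]].append(s)

    pairs = []
    for groups in by_record.values():
        # zip stops at the shorter list, so seasons without a partner are dropped.
        for power, finesse in zip(groups["power"], groups["finesse"]):
            pairs.append((power, finesse))
    return pairs


seasons = [
    {"name": "Seaver", "year": 1975, "wins": 22, "losses": 9, "style": "power"},
    {"name": "Caldwell", "year": 1978, "wins": 22, "losses": 9, "style": "finesse"},
    # Hypothetical row with no matching finesse season; it should be left out.
    {"name": "Pitcher X", "year": 1980, "wins": 15, "losses": 12, "style": "power"},
]

pairs = pair_by_record(seasons)
for power, finesse in pairs:
    print(power["name"], power["year"], "<->", finesse["name"], finesse["year"])
```

Because every pair shares one record, the two groups are guaranteed to have identical combined W-L totals, which is the whole point of the design.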

He compared the two groups in April, the cold-weather month. As he put it, "Altobelli was dead wrong. He couldn't have been more wrong." It turned out that the power pitchers were only 49-51 (.490) in April, while the finesse pitchers were 63-37 (.630). That's exactly opposite to what Altobelli had thought.
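If you want to check the arithmetic yourself, it takes only a few lines. The Python below uses nothing but the W-L totals quoted above; the helper name `win_pct` is mine.

```python
def win_pct(wins, losses):
    """Winning percentage: wins divided by total decisions."""
    return wins / (wins + losses)


overall = win_pct(539, 345)        # each group's full-season record
power_april = win_pct(49, 51)      # power pitchers in April
finesse_april = win_pct(63, 37)    # finesse pitchers in April
gap = finesse_april - power_april  # finesse edge in April

print(f"overall:       {overall:.3f}")
print(f"power April:   {power_april:.3f}")
print(f"finesse April: {finesse_april:.3f}")
print(f"April gap:     {gap:.3f}")
```

The gap comes out to .140 in favor of the finesse pitchers, over 100 April decisions for each group.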


Nice study, right? I love it ... it's one of my favorites. But it wasn't very "sophisticated" in terms of methodology. For instance, it didn't use regression.

Should it? Well, I suppose you could make that argument. With regression, the study can be more precise. Bill James used only season W-L record in his dataset, but in your regression, you could add a lot more things: ERA, strikeouts, walks. You could include dummy variables for season, park, handedness, and lots of other things. And, of course, you wouldn't be limited to only 30 pairs of pitchers.

And you'd get a precise estimate for the effect, with confidence intervals and p-values.

But ... in my opinion, that would make it a WORSE study, not better. Why? One big reason: Bill's study is completely transparent, and understandable.

Take the four paragraphs above, and show them to a friend who doesn't know much about statistics. That person, if he's a baseball fan, will immediately understand how the study worked, what it showed, and what it means.

On the other hand, if you try to explain the results of a regression, he won't get it. Sure, you could explain the conclusions, and what the coefficients of walks and strikeouts mean, and so on. And he might believe you. But he won't really get it.

With the easy method, anyone can understand what the evidence means. With regression, they have to settle for understanding someone else's explanation of what the evidence means.

Reading Bill's study is like being an eyewitness to a crime. Reading the results of the regression is like hearing an expert witness testify what happened.


Now, you may object: why is that so important? After all, if it takes a sophisticated method to uncover the truth, well, that's what we have to do. Sure, it's nice if the guy on the street can be an eyewitness and understand the evidence, but that's not always possible. If we limited our research studies to methods that were intuitive to the layman, we'd never learn anything! Physics is difficult, but gives us cars and airplanes and electronics and nuclear energy. If it takes a little bit of effort and education to be able to do it, then that's the price we pay!

To which I have two responses. The first is that, actually, I agree. I'm not saying that we should limit our research to *only* studies that laymen can understand. I'm just saying that a study laymen can understand is preferable *when it's possible*.

The second response, though, is: it's not just laymen I'm talking about. It's also sophisticated statisticians and sabermetricians. You, and me, and Tango, and Bill James, and JC Bradbury, and David Berri, and Brian Burke, and all those guys.

Because, the truth is, a regression study is not transparent to ANYBODY. I mean, I've read a lot of regressions in the last few years, and I can tell you, it takes a *lot* of work to figure out what's going on. There are a lot of details, and, even when the regression is simple, there's a lot of work to do in the interpretation.

For instance, a few paragraphs ago, I gave a little explanation of how a regression might work for Bill's study. And, suppose, making some numbers up, I get a coefficient of -.0007 per strikeout, and -.0003 per walk. What does that mean?

Well, you have to think about it. It means, for instance, that if Tom Seaver had 50 more strikeouts and 20 more walks than pitcher X, his April winning percentage will be .041 worse, all things being equal, than X's. But ... it takes a while, and you have to do it in your head.
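Spelled out, the mental arithmetic looks like this. The coefficients are the made-up ones from above, and I'm assuming the simplest possible reading of the model (a straight linear effect per strikeout and per walk); the constant and function names are just for illustration.

```python
# Made-up coefficients from the text: change in April winning percentage
# per additional strikeout and per additional walk.
COEF_PER_STRIKEOUT = -0.0007
COEF_PER_WALK = -0.0003


def predicted_april_shift(extra_strikeouts, extra_walks):
    """Predicted change in April winning percentage relative to pitcher X,
    assuming a purely linear model and everything else held equal."""
    return COEF_PER_STRIKEOUT * extra_strikeouts + COEF_PER_WALK * extra_walks


shift = predicted_april_shift(extra_strikeouts=50, extra_walks=20)
print(f"{shift:+.3f}")  # 50 * -.0007 = -.035, 20 * -.0003 = -.006, total -.041
```

Even this tiny calculation already hides a choice: the "extra" strikeouts and walks are raw counts here, with no adjustment for innings pitched.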

But wait! Not every pitcher in the study had the same number of innings pitched. So Tom Seaver's 50 extra strikeouts, do I have to convert that to a fixed number of innings? What did the study use? Now I have to go back and check.

And how do I interpret the results? Am I sure they really apply to all pitchers? I mean, suppose there's a pitcher like Seaver, who strikes out a lot of guys, but has better control, so he walks 30 fewer guys. Should I really assume that he'll do better in April than Seaver, since he's "less" of a power pitcher in that dimension?

Also, wait a sec. The regression included season W-L record, but that *includes* the April W-L record that it was trying to predict. That will throw off the results, won't it? Maybe the regression should have used only May to October. Or maybe it did? Now I have to go check that.
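To make the overlap concrete, here is the same bookkeeping applied to the group totals from Bill's study, since those are the only real numbers I have. It's a sketch that assumes the April decisions quoted earlier are counted inside the 539-345 season totals; the helper names are mine.

```python
def minus_april(season, april):
    """Remove April wins and losses from a full-season (wins, losses) record."""
    return (season[0] - april[0], season[1] - april[1])


def win_pct(record):
    wins, losses = record
    return wins / (wins + losses)


season = (539, 345)                            # each group's full-season total
power_rest = minus_april(season, (49, 51))     # power pitchers, May onward
finesse_rest = minus_april(season, (63, 37))   # finesse pitchers, May onward

print(power_rest, f"{win_pct(power_rest):.3f}")
print(finesse_rest, f"{win_pct(finesse_rest):.3f}")
```

Under that assumption, the two groups are identical over the full season but not over May to October: the power pitchers come out at 490-294 (.625) and the finesse pitchers at 476-308 (.607). So which months go into the "record" variable really does change what is being compared.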

And what if it didn't, and there were a bunch of pitchers that pitched only one game after April? Will that throw off the results? If the confidence intervals are suspect, are the coefficients suspect too?

I could go on ... there are a thousand issues that affect the interpretation of the regression's results. And it's impossible for any one person, even the best statistician in the world, to hold all thousand in his head at once.


You could now come back with another argument: "OK, the regression is harder to interpret, but we can just do that interpretation. Indeed, that's the duty of the researcher. When you write up a study, your duty isn't just to do a regression and report the results. It's to figure out what the results mean, and check that everything's been done right. Also, that's what the peer reviewers are for, and the journal editors."

To which the most obvious reply is: it's simply not true that peer reviewing makes sure everything is correct. There are loads and loads of problems with actual interpretation of regressions, some of which I've written about here. There was the paper that forgot to hold everything constant when interpreting a coefficient. There was the paper that decided that drafting X picks ahead was worth zero, but drafting 2X picks ahead was significant. There was the paper that decided that a low "coefficient of determination" meant that the effect was insignificant, regardless of what the coefficient showed. And so on.

And those are the easy ones. I mean, sure, I'm no sophisticated peer reviewer with a Ph.D. who looks through a hundred of these a month, but I do have a decent basic idea of how regressions work and how baseball works. But, for some of these regressions, it took me a long time, measured in hours, to figure out what they actually were doing and (in many cases) why the results didn't really mean what the author thought they meant. It's not that you need the right kind of expertise ... it's that every case is different, and there's no formula. To figure out what a result actually means, you have to look at everything: where the data comes from, how it interacts, what is really being measured, what the coefficients mean, and, especially, if the model is realistic and if other models give different results.

As I've said before, regression is easy. Interpreting the regression is hard -- legitimately hard. And, unlike other hard problems, you don't know when or whether you've found the right answer. You can spend days looking at it, and you might still be missing something.


Which, of course, means that the simple method is more likely to be correct than the complicated method: if we understand a study, we're much more likely to spot its flaws. If Bill James did something dumb in how he compiled his data, most of us will catch it. If Joe Regressor does something wrong in his complicated regression, it's likely that nobody will see it (unless it's in a hard science, in which case a plane will crash or a patient will die or something).

And, of course, there's still the advantage that the simple study is easier to understand. That's an advantage even if the regression study is absolutely 100% correct. If you can see the answer in four paragraphs of arithmetic, that's better than if it takes ten pages of regression notation.

Which advantage is more important? Actually, it looks like the first one is more important ... but, you know, the second one can make a strong case.

That Bill James study that I mentioned earlier ... if you have a copy of the 1986 Abstract handy, you should go read it. It's on page 134. It's only two pages long, and easy reading, like most of Bill's prose.

If you've read it, I'd say that, right now, you KNOW the answer to Bill's question. I don't mean you know 100% that he's right, and what the answer is ... I'm saying that you know, almost 100%, the evidence and the logic. You may or may not think the results are conclusive -- for instance, I'd like to see a larger sample size -- but, either way, it's your own decision, not Bill's.

That is: even though Bill did the study, you instantly absorb the answer. You don't have to trust Bill about it. Well, you have to trust that he aggregated the data properly, and didn't cheat in what pitchers he chose. But you don't have to trust Bill's judgment, or Bill's knowledge of baseball, or Bill's interpretation. His interpretation will become yours: you'll see what he did, and understand it well enough that if someone challenges it, you can defend it. Bill's study is so transparent, that, after you read it, you understand the answer as well as Bill did, probably about as well as anyone can.

That's not true for a regression study. Often, for many readers (myself included), the explanation is impenetrable. The references for the methods refer you to textbooks that are hard to find and technical, and, usually, there's no discussion of what the results mean, other than just a surface reading of the coefficients. If you want to truly understand what's going on, you have to read the study, and read critically. Then you have to read it again, filling in the missing pieces. Then you have to look at the tables, and back to the text, then to the model. And then you still probably have to read it again.

And all this is assuming you already know something about how regression works. If you don't, you'll just have no idea.

So the difference is: if the study is simple, you know what it means. If the study is complicated, you don't know anything. You have to trust the person that wrote the study.

It's night and day: knowing versus not knowing. With the Bill James study, you know it's true. With a regression study, you just believe it's true.


To go a bit off topic ...

A lot of the factual things we think we know, and will passionately defend, we don't really know, except from what other people tell us. I bet everyone reading this has a strong opinion on creation vs. evolution, or whether global warming is real or a hoax, or whether 9/11 was or was not partly an inside job by the US government.

Take evolution. Let's suppose you believe in evolution, and you think creationism is not true, just wishful thinking by creationists. I think most of my friends fall into this category, and I've read surveys that say most Americans generally believe this.

But if that's you, can you really say that you KNOW it? You don't, probably. I bet that for most of us (myself included), if someone asked for examples of actual evidence for evolution, we'd have nothing. Seriously, zero. I couldn't even give a half-hearted attempt at a single sentence of why and how we know that evolution actually happened.

Sure, I believe evolution happened, but not because of any actual evidence. I believe it for secondhand reasons. I believe it happened because I know that scientists, serious researchers, have looked at the evidence, and that it's strong evidence. What I *do* think I know, from dealing (at arm's length) with teachers and authors and scientists and journalists, is that it's very, very unlikely that the worldwide supply of scientists, working independently and jockeying for discoveries and status and publications, and being so steeped in scientific method, and competing in an open marketplace of ideas, could have deluded themselves into misinterpreting so much evidence over so many decades.

So, you know, I can't say I know evolution is true. But now I DO know there is strong evidence that power pitchers do not outperform in April.

That's the beauty of the Bill James-type study, the one that lays it all out for you without using fancy techniques. It gives you a completely different feeling, the feeling that you actually know something, instead of just believing it from hearsay.

And that, to me, is an important part of what science is all about.



At Tuesday, December 06, 2011 5:05:00 PM, Blogger David Pinto said...


I redid the study with data from 1986 on, and the result doesn't hold up.

At Tuesday, December 06, 2011 5:10:00 PM, Blogger Phil Birnbaum said...

Beautiful, thanks! I suspected that the effect would disappear, but I didn't think that it would reverse ... at least not that much in ERA.

BTW, could you post the overall records of the two groups? I'm curious about whether the two groups were different in April in other respects.

At Tuesday, December 06, 2011 5:10:00 PM, Blogger Phil Birnbaum said...

BTW, that was FAST. :)

At Tuesday, December 06, 2011 5:19:00 PM, Blogger Phil Birnbaum said...

Now that I think about it, the ERA difference might not actually indicate anything. The power pitchers probably had a better ERA than the finesse pitchers overall, not just in April ... a finesse pitcher probably has lower talent than a power pitcher with the same record. Therefore, he probably gave up more runs for the same number of wins.

Therefore, you'd expect the finesse pitchers to have a higher ERA anyway.

That's probably why Bill used only W-L record in his original study.

At Tuesday, December 06, 2011 8:56:00 PM, Anonymous bradluen said...

"Because, the truth is, a regression study is not transparent to ANYBODY. I mean, I've read a lot of regressions in the last few years, and I can tell you, it takes a *lot* of work to figure out what's going on. There are a lot of details, and, even when the regression is simple, there's a lot of work to do in the interpretation."

I want to send this to every researcher in the world. Perhaps with a note saying that if you must do regression, include some graphs or examples that help us understand what the model is doing and how well it fits.

At Wednesday, December 07, 2011 1:15:00 AM, Blogger Don Coffin said...

I will say, though, that using eyewitness testimony as an example in support of your position is something you probably don't want to do. There's just way too much evidence that eyewitness testimony is often completely wrong. Just one reference out of literally thousands about this: http://agora.stanford.edu/sjls/Issue%20One/fisher&tversky.htm

At Friday, December 09, 2011 11:45:00 AM, Anonymous Ron said...

Baseball is for fun, so baseball stats are just for fun. In this context "better" means "gives the reader more pleasure." That will vary from reader to reader. There are people who couldn't follow the Bill James study (they'd get confused at the idea of matched controls). Many people refuse to accept the evidence that the "hot hand" is a myth, despite the fact that this can be explained fairly simply as well, because it is more fun to believe in streak shooting. Whatever floats your boat. It doesn't matter.

For things that really matter, though, you'd better do your best to understand reality, even when it is complicated. It took some sophisticated analyses to show that hormone replacement therapy was actually slightly harmful, where simpler analyses made it appear beneficial.

At Friday, December 16, 2011 1:43:00 PM, Anonymous Anonymous said...

Excellent point about regressions and understanding analysis. I'm a PhD economist and I feel the same way whenever I review an econometric analysis for a journal.

On "evolution," I suggest you read "The Beak of the Finch." It is exactly the kind of transparent evidence-to-interpretation story that you are describing and after reading it, I feel that I "know" evolution is a good explanation for how life on earth came to be as it is today.

At Friday, December 16, 2011 2:20:00 PM, Blogger Phil Birnbaum said...

Anonymous: thanks, I'll look for that book!

