Sunday, October 11, 2009

Doesn't "The Book" study pretty much settle the clutch hitting question?

The clutch hitting debate continues. For the latest, here's Tango quoting Bradbury quoting Barra. Bradbury references Bill James' essay, and Barra references Dick Cramer's 1977 study.

In Tango's post, he says,

Anyway, as for actually finding a clutch skill, Andy [Dolphin] did in fact find it, and the results are published in The Book.

Absolutely. It's time, I think, that this study be acknowledged as the most relevant to the clutch question. Cramer's study gets quoted because it's the most famous, but recent studies (like Tom Ruane's) have used a lot more data. Dolphin's study improves on Ruane's by including even more data, by correcting for various factors, and by giving an actual quantitative estimate of how much clutch hitting talent there really is.

The one fault with Dolphin's work is that it hasn't been published in full. This is understandable: "The Book" contains a huge number of studies, and if they were all presented in full detail, the book would run to a couple of thousand pages. But this is one of the most important studies, on one of the most asked questions in sabermetrics. If we want sabermetricians, academics, and reporters to accept the results, the study should be published in full, so as to be subject to full peer review. I'm not even completely sure how the study worked. I have a pretty good idea of the outline, but not the details. Part of the reason the study needs to be published is for the technical details to be available, so others can evaluate the method and reproduce the results if they choose to.

Anyway, here's what I *think* Andy did:

-- he took every regular-season game from 1960 to 1992.
-- he considered only PAs involving RHP, to eliminate platoon bias.
-- for every player who met minimum playing time, he computed his clutch and non-clutch OBP.
-- he adjusted those OBPs to reflect the quality of the opposing pitcher, and the fact that overall clutch and non-clutch OBPs differ.
-- he computed clutch performance by subtracting non-clutch from clutch.

That gave him clutch numbers for 848 players.

-- he looked at the distribution of clutch hitting, and figured the observed variance.
-- he then figured what the variance would have been if there were no clutch hitting.

It turned out that the actual variance was higher than the predicted variance, which is what you'd expect if there were something other than just luck causing the results (such as clutch hitting talent). The difference we can presume to be clutch hitting.

If luck and talent are independent (which is a pretty reasonable assumption), then

Variance caused by talent = (Total Variance) - (Variance caused by luck)

That calculation led Andy to conclude that the talent variance was .008 squared, which meant the standard deviation of clutch talent was 8 points of OBA.
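Here is a rough sketch of that variance subtraction. It is a reconstruction under stated assumptions, not the study's actual code: each player is a hypothetical tuple of clutch on-base events, clutch PA, non-clutch on-base events and non-clutch PA; the luck variance for each player is the binomial variance of a difference of two proportions, using his pooled OBP as the true rate; centering on the mean difference stands in for the league-wide clutch adjustment; and there is no pitcher-quality adjustment at all. The simulated league is made up.

```python
import math
import random

def clutch_talent_sd(players):
    """Estimate the SD of clutch talent as sqrt(total variance - luck variance).

    players: list of (clutch_on_base, clutch_pa, other_on_base, other_pa).
    Hypothetical input format; no pitcher-quality adjustment is made.
    """
    diffs, luck = [], []
    for c_ob, c_pa, o_ob, o_pa in players:
        diffs.append(c_ob / c_pa - o_ob / o_pa)
        # With no clutch skill, both samples share one true rate p, so the
        # difference has binomial variance p(1-p)(1/c_pa + 1/o_pa).
        p = (c_ob + o_ob) / (c_pa + o_pa)
        luck.append(p * (1 - p) * (1 / c_pa + 1 / o_pa))
    # Centering on the mean difference removes the league-wide clutch gap.
    mean = sum(diffs) / len(diffs)
    total_var = sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)
    talent_var = total_var - sum(luck) / len(luck)
    # If luck alone explains the spread, report zero rather than a negative variance.
    return math.sqrt(talent_var) if talent_var > 0 else 0.0

def simulate_players(n, talent_sd, c_pa=200, o_pa=2000, seed=1):
    """Hypothetical league: true OBPs between .280 and .380, clutch talent ~ N(0, talent_sd)."""
    rng = random.Random(seed)
    players = []
    for _ in range(n):
        p = rng.uniform(0.28, 0.38)
        p_clutch = min(max(p + rng.gauss(0, talent_sd), 0.0), 1.0)
        c_ob = sum(rng.random() < p_clutch for _ in range(c_pa))
        o_ob = sum(rng.random() < p for _ in range(o_pa))
        players.append((c_ob, c_pa, o_ob, o_pa))
    return players

print(round(clutch_talent_sd(simulate_players(500, 0.030)), 3))
```

With a simulated talent SD of .030 the estimate should come back close to .030, and with no clutch talent it should come back near zero, since the subtraction removes what luck alone would produce.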

Andy phrased it like this:

"Batters perform slightly differently when under pressure. About one in six players increases his inherent "OBP" skill by eight points or more in high-pressure situations; a comparable number of players decreases it by eight points or more."

That finding, I think, is the strongest we have, and I agree with Tango 100% that we should consider Andy's .008 figure to be the best available answer to the clutch hitting question.
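Andy's "one in six" follows from the .008 SD if clutch talent is assumed to be normally distributed (an assumption, not something the variance itself tells you): eight points is exactly one SD, and about 15.9% of a normal distribution lies more than one SD above the mean, with the same share more than one SD below. A quick check:

```python
import math

def share_above(k_sd):
    """Fraction of a normal distribution more than k_sd SDs above its mean."""
    return 0.5 * math.erfc(k_sd / math.sqrt(2))

talent_sd = 0.008  # Andy's estimated SD of clutch talent, in OBP
threshold = 0.008  # "eight points or more"
print(round(share_above(threshold / talent_sd), 4))  # 0.1587, about one player in six
```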


As I said in previous posts, however, I do have some minor reservations about what we can conclude from the analysis, so it's appropriate to add a few caveats.

1. Mostly, I'm not convinced that the .008 represents individual clutch ability in the sense in which most fans think of it -- that the player "bears down" in important situations and performs better than normal. I wonder if, instead, it might just be a matter of both hitters and pitchers using different strategies in those clutch situations.

For instance, suppose you have a power hitter and a singles hitter, and neither gets any better in the clutch. But in those situations, the relative values of offensive events might change. Maybe, with the score close in the late innings, a home run becomes more valuable relative to a single. I'm making these numbers up, but maybe, instead of the HR being three times as valuable as a 1B, it becomes four times as valuable.

Now, the pitcher's strategy changes. Fearing the home run a little more than normal, he'd be apt to pitch around the power hitter, trading fewer home runs for more walks. That would cause the power hitter's OBP to increase more than expected. Even if there's no similar effect for the singles hitter, he'll look relatively worse in the clutch than the power hitter.

So it's possible, and even plausible, that the .008 might not be a reflection of the clutch behavior of an individual hitter, but just an artifact of the strategic manoeuvring in the batter-pitcher matchup.

To find out, you could check whether certain types of hitters have better clutch performances as a group. If you did find that, it would be evidence that at least part of what Andy identified as "clutch ability" is really just a characteristic of that type of hitter.

There is some evidence that some of this is happening: in the book, Andy says that when he used wOBA (which weights events by their value, so HRs are worth about three times what a single is worth) instead of OBP (which weights all on-base events equally), the SD dropped from 8 points to 6. That suggests that clutch performance did indeed involve a trade-off between getting on base and hitting for power.
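A toy example of that trade-off, with made-up numbers for a power hitter over 100 PA: in the clutch, the pitcher works around him, turning two home runs and two outs into walks. The event weights are illustrative assumptions chosen so a HR counts three times a single, as in the text; they are not the official wOBA coefficients, and doubles, triples and HBP are left out.

```python
# Illustrative weights only (not official wOBA coefficients): a HR is set
# to three times a single, per the rough ratio in the text.
WEIGHTS = {"BB": 0.7, "1B": 0.9, "HR": 2.7}

def obp(line, pa):
    # OBP counts every on-base event the same.
    return sum(line.values()) / pa

def weighted_oba(line, pa):
    # A wOBA-style rate weights each event by its (assumed) value.
    return sum(WEIGHTS[event] * n for event, n in line.items()) / pa

PA = 100
normal = {"BB": 10, "1B": 15, "HR": 5}  # hypothetical non-clutch line
clutch = {"BB": 14, "1B": 15, "HR": 3}  # pitched around: 2 HR and 2 outs become walks

for label, line in (("normal", normal), ("clutch", clutch)):
    print(label, round(obp(line, PA), 3), round(weighted_oba(line, PA), 3))
# normal 0.3 0.34
# clutch 0.32 0.314
```

OBP rises by 20 points while the weighted rate falls, which is the kind of trade-off that would make a clutch spread measured in OBP shrink when it's measured in wOBA instead.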

If you went one step further, and analyzed performance in terms of win probability (instead of OBP or wOBA), you might find some other result, such as no evidence of clutch talent at all. It could be that all the clutch differences are the result of hitters adjusting their game to what the situation requires, such as (say) a power hitter trying for a single with the bases loaded, vs. a home run with two outs and nobody on.

2. Just today, Matt Swartz suggested that lefties might be more "clutch" than righties, because they hit better with runners being held at first (I always thought that was because of the hole between first and second, but Matt suggests it's because that limits the defense's ability to shift in other ways). Again, that's something that's real -- so the team would know they could benefit from it -- but not "clutch" in the sense that the hitter is actually better in some way.

3. Another quibble I have with the conclusion is that the result doesn't appear to be significantly different from zero. Andy says there's a 68% probability that clutch talent is between 3 and 12 points; I calculated that the 95% confidence interval easily includes zero (the p-value for zero is somewhere around .14). So even if you're only interested in whether there's an ability to have a higher OBP (in the sense that some players' clutch OBPs differ from their usual OBPs more than others' do), the evidence is not conclusive beyond a reasonable doubt.

4. As Andy implies in "The Book" (and Guy explicitly suggests elsewhere), there could be other explanations for the .008. It could be that some players happened to have more clutch ABs at home, so what we're seeing is partly HFA. It could be that some players happened to see a starter for the third time that game (when batters start gaining an advantage) more often than expected in the clutch. It could be a lot of other things.

Guy suggests doing the same study, but choosing the PAs randomly (instead of splitting them into clutch and non-clutch). That would tell us how much of the .008 happens due to random clustering of factors.

(Note: just as I was about to submit this post, I found an earlier Andy Dolphin study that *does* do this kind of check. Andy found that dividing PA into other situations did not produce any false positives.)
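Guy's random-split check can be sketched like this, assuming each player's PA outcomes are available as a hypothetical list of 1s (on base) and 0s: label the same number of PAs a player really had in the clutch, but pick them at random, and run the same variance subtraction. A random split carries no clutch talent by construction, so any excess variance it shows can't be clutch. The simulated league below is made up and has no clutch talent at all, so the excess should come out close to zero.

```python
import random

def excess_variance(splits):
    """Observed variance of (subset OBP - rest OBP) minus the binomial luck variance.

    splits: list of (on_base_a, pa_a, on_base_b, pa_b), one per player.
    """
    diffs, luck = [], []
    for ob_a, pa_a, ob_b, pa_b in splits:
        diffs.append(ob_a / pa_a - ob_b / pa_b)
        p = (ob_a + ob_b) / (pa_a + pa_b)
        luck.append(p * (1 - p) * (1 / pa_a + 1 / pa_b))
    mean = sum(diffs) / len(diffs)
    observed = sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)
    return observed - sum(luck) / len(luck)

def placebo_split(outcomes, n_fake_clutch, rng):
    """Randomly label n_fake_clutch of a player's PAs as 'clutch'; outcomes are 1/0."""
    chosen = set(rng.sample(range(len(outcomes)), n_fake_clutch))
    a = [o for i, o in enumerate(outcomes) if i in chosen]
    b = [o for i, o in enumerate(outcomes) if i not in chosen]
    return sum(a), len(a), sum(b), len(b)

rng = random.Random(7)
league = []
for _ in range(400):  # hypothetical players, 2,000 PA each, no clutch talent
    p = rng.uniform(0.28, 0.38)
    league.append([1 if rng.random() < p else 0 for _ in range(2000)])

splits = [placebo_split(pas, 160, rng) for pas in league]  # 8% "clutch", chosen at random
print(f"excess variance: {excess_variance(splits):.6f}")
```

Run on real data, a clearly positive excess from random splits would point to something in the method or the data, rather than clutch ability, inflating the variance.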


Even if some of these criticisms turn out to be justified, it doesn't mean that clutch doesn't matter. Even if we find the entire effect is (say) due to lefties hitting better with runners on base, that's still something a manager or a GM should take into account. If you have two .270 hitters, but one hits .270 all the time, while the other hits .268 usually but .276 in the clutch ... well, you want the second guy. It doesn't really matter to you whether the extra performance comes from the player's gutsiness, or just from something that's inherent in the game.
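The arithmetic behind the two-hitter example: an overall average is the usual and clutch rates blended by the share of PAs that come in the clutch. For .268 and .276 to blend to exactly .270, a quarter of the second hitter's PAs would have to be clutch; with a smaller share, such as the roughly 8% Guy mentions in the comments, the same split blends to about .269. The function names here are just for illustration.

```python
def blended_avg(usual, clutch, clutch_share):
    """Overall average when clutch_share of PAs are clutch."""
    return usual * (1 - clutch_share) + clutch * clutch_share

def clutch_share_needed(usual, clutch, overall):
    """Clutch share at which the usual and clutch averages blend to `overall`."""
    return (overall - usual) / (clutch - usual)

print(round(clutch_share_needed(0.268, 0.276, 0.270), 3))  # 0.25
print(round(blended_avg(0.268, 0.276, 0.08), 4))           # 0.2686
```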

But my perception is that fans who talk about "clutch" are talking about something in a player's make-up or psychology that makes him more heroic in critical situations. I'd argue that while "The Book"'s study convincingly showed that some players hit slightly better (or worse) in clutch situations, it has NOT shown that it's because the players themselves are "clutch".


Looking back at what I wrote, I realize I'm repeating things I said before. But the point I was trying to make is that I agree with Tango: the study in "The Book" is state of the art, and, to my mind, the question of whether players hit differently in the clutch now has an answer.

I'm not sure how to get the result accepted. Well, publication of the study would help; the media are more likely to pay attention to a result if it's a full academic-type study instead of a few pages of a book. I'm sure JQAS would be happy to run it. Even a web publication would help.

What else? Well, I suppose that the more the sabermetric community cites the result, the more it'll spread, and the more likely sportswriters will be to come across it when researching clutch.

Or maybe a press release? It works for Steven Levitt!



At Sunday, October 11, 2009 10:12:00 PM, Anonymous Guy said...

I agree that Dolphin's study should be considered the best evidence for clutch. Still, there are a couple of other factors that may cause Dolphin's study to overstate clutch performance. One is the changing spread between clutch and non-clutch performance over time. Dolphin's data spans 1960 to 1992, plus 2000-2004. However, batters' performance in clutch vs. non-clutch situations was not constant over these years -- clutch performance was about equal to non-clutch in 1960, but then declined over the next few years (presumably because of increased use of relievers). Cyril Morong presents this trend nicely in a post. So we would expect some variance among hitters simply as a function of the years they played.
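A minimal sketch of that mechanism, with made-up era gaps (they are not Morong's figures): give every hitter zero clutch talent and zero luck, let the league-wide clutch-minus-non-clutch gap differ by era, and subtract a single pooled gap. The hitters still spread out, purely because of when they played; comparing each hitter with his own era's gap removes that spread. This assumes a single pooled adjustment, which is the scenario being worried about here.

```python
from statistics import mean, pstdev

# Hypothetical league-wide clutch-minus-non-clutch OBP gaps by era (made-up numbers).
ERA_GAP = {"1960s": 0.000, "1970s": -0.005, "1980s": -0.010}

def spread_after_pooled_adjustment(player_eras):
    """SD of clutch differences with no talent and no luck, after subtracting one pooled gap."""
    diffs = [ERA_GAP[era] for era in player_eras]
    pooled = mean(diffs)
    return pstdev([d - pooled for d in diffs])

def spread_after_era_adjustment(player_eras):
    """Same, but each hitter is compared with the average gap of his own era."""
    diffs = [ERA_GAP[era] for era in player_eras]
    era_means = {
        era: mean(d for d, e in zip(diffs, player_eras) if e == era)
        for era in set(player_eras)
    }
    return pstdev([d - era_means[e] for d, e in zip(diffs, player_eras)])

hitters = ["1960s"] * 100 + ["1970s"] * 100 + ["1980s"] * 100
print(round(spread_after_pooled_adjustment(hitters), 4))  # 0.0041, with no clutch talent at all
print(spread_after_era_adjustment(hitters))               # 0.0
```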

Now, Dolphin does apply an adjustment for quality of pitchers faced. I'm not sure how pitcher talent was established, but it seems likely he used each pitcher's career OBP allowed. That will provide only a rough correction for more recent hitters facing a lot of relievers, since many relievers' lifetime stats reflect performance as a starter as well. (A set-up man pitching only the eighth inning may be quite a bit better than his lifetime stats suggest.) Current year's OBP allowed would deal with that problem, but the amount of noise in one year's rate for a reliever is huge, so again the adjustment for pitcher quality will be very crude.

The second issue is player age. A player may not see the same proportion of clutch PAs at each age (especially since only 8% of a hitter's PAs are defined as clutch, on average). Some hitters, for example, may be allowed to hit in clutch situations while at their peak (age 27-28), but then be pinch-hit for in similar situations as they age. If so, their clutch PAs will look better, since they come disproportionately from their peak years.

A lot of this stuff obviously "evens out" when studying groups of hitters. But in this methodology, these small differences all contribute to the variance that is being considered "clutch."

In the end, I think you want to be able to look at the clutch performers and see who they are. Are they more lefthanded than non-clutch players? Do they have more clutch PAs in their home parks? Do they come more from the 1960s and 1970s, less from the 1980s and later? If we had answers to those questions, we'd have a better sense of how much clutch is being measured.

At Sunday, October 11, 2009 10:46:00 PM, Blogger Phil Birnbaum said...

Sure, Guy, I agree with you that all those things should be considered. If you're saying that the .008 should be considered an upper bound on the amount of clutchness, I'd be inclined to go along with that.

And I also agree that it's worth seeing who the clutch guys and non-clutch guys are to help understand what's really going on.

Still, the .008 is the best estimate we have so far, isn't it?

At Monday, October 12, 2009 1:20:00 PM, Anonymous Guy said...

I think I would agree that .006 wOBA represents the likely upper bound of clutch talent. Because pitchers may elect to issue "intentional unintentional walks" to good hitters in high pressure PAs, I think OBP alone is less useful here.

Still, a lot of that .006 could be factors that aren't fully controlled for, rather than true clutch ability. I think you can control for pitcher quality well when comparing two pools of hitters. But at the individual player level, whatever you do is subject to lots of error. Facing a starter for the 3rd time may be much easier than a good 8th-inning reliever, but the stats appear to say the starter is tougher. The study is limited to RHPs, but The Book also shows that pitchers can have different platoon splits -- so a RH hitter might happen to face pitchers in the clutch who are especially tough/easy for RHHs. And some hitters just hit particular pitchers well or poorly, which can't be adjusted for.

The biggest concern to me is years of play: the growing clutch/non-clutch spread over time must create some of this variance (how much, I don't know). Then there's HFA, etc., etc. The method of taking the total variance and subtracting the binomial variance is brilliant, but these issues are its Achilles' heel: you have to control for EVERYTHING else to know what you really have left.

It is interesting that Andy looked at some semi-random splits and found less variance there. But I would note that in each case, if I'm reading him correctly, there was more variance than you would get from the binomial alone. So at least some of what appears to be clutch ability is not.

* *

One thing I noticed going back to The Book is that Andy defines clutch based only on inning and score. So leverage that's created by the presence of baserunners isn't a factor. That removes some possible sources of bias, like the LHH issue raised by Matt S. On the other hand, it leaves out part of what we sometimes think of as clutch, which is hitting with men on base or in scoring position. I do think it might be useful, in any future studies of clutch, to distinguish between situational hitting (base/out situation) and hitting under "pressure" (inning/score). These may be different skills (if they are skills at all).

At Monday, October 12, 2009 1:25:00 PM, Blogger Phil Birnbaum said...

I agree with everything you say. Especially the part about baserunners ... I read that Andy based it only on score, but it didn't occur to me that baserunners were ignored. Like you, I'd want to see what happens with a different definition of clutch.

At Monday, October 12, 2009 3:31:00 PM, Blogger Don Coffin said...

"It turned out that the actual variance was higher than the predicted variance, which is what you'd expect if there were something other than just luck causing the results (such as clutch hitting talent). The difference we can presume to be clutch hitting."

Always assuming that the underlying distribution is, in fact, normal, that there aren't the same sort of "fat tails" that exist in other distributions (e.g., financial markets).

And since one way of interpreting Dolphin's results is that the underlying distribution is, in fact, not normal, attributing the difference to an underlying talent is, in many ways, assuming the conclusion...

At Monday, October 12, 2009 3:37:00 PM, Blogger Phil Birnbaum said...


">Always assuming that the underlying distribution is, in fact, normal, that there aren't the same sort of "fat tails" that exist in other distributions (e.g., financial markets)."

Not sure what you mean here. If you get a higher variance than expected from binomial, then the presumption is that there's something else adding to the variance. You don't have to assume that the "something else" is normally distributed (although I think Andy did in coming up with a confidence interval).

That is: it's possible that the entire .008 comes from a few players who are MUCH better in the clutch, and the rest of the players who are at zero.
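To see how a few players could account for the whole .008: if a share s of hitters have a clutch effect of x and everyone else has zero, the variance of that mixture is s(1 - s)x², so x = .008 / sqrt(s(1 - s)). The 2% share below is an arbitrary illustration; the point is only that the same SD fits very differently shaped distributions.

```python
import math
from statistics import pstdev

def effect_for_few(talent_sd, share):
    """Clutch effect x such that `share` of hitters at +x (the rest at 0) has SD talent_sd.

    Such a two-point mixture has variance share * (1 - share) * x**2.
    """
    return talent_sd / math.sqrt(share * (1 - share))

x = effect_for_few(0.008, 0.02)
print(round(x, 4))  # 0.0571: 2% of hitters about 57 points better, everyone else unchanged

# Check directly on a hypothetical league of 100 hitters.
league = [x] * 2 + [0.0] * 98
print(round(pstdev(league), 4))  # 0.008
```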

I'm thinking, though, that I don't understand what you're getting at here.

At Tuesday, October 13, 2009 11:10:00 AM, Anonymous Guy said...

FYI: JC Bradbury has a new, short clutch study up on his blog.

At Wednesday, October 14, 2009 10:53:00 PM, Blogger Cyril Morong said...

Sorry if I missed it, but I think if we do find that players hit differently in the clutch, we need to ask how many extra wins does it generate? Batting 20 points better in close and late situations will win you more games than if you just hit like you normally did (yes, I know we need to adjust for the fact that hitting is generally lower in CL), but how many more games? How many extra wins a season did or would the best clutch hitter add? And is it worth taking into consideration when making personnel decisions? Has any team ever used clutch performance in trades or salary offers? If the answer to that is no, then is clutch a meaningful concept?

At Friday, October 16, 2009 11:28:00 PM, Anonymous Jim Glass said...

It turned out that the actual variance was higher than the predicted variance, which is what you'd expect if there were something other than just luck causing the results (such as clutch hitting talent). The difference we can presume to be clutch hitting.

That's an awful lot of hard work and straining of the eyesight to detect what is "presumed" to be a clutch hitting effect with a top bound of .008.

we need to ask how many extra wins does it generate?

Exactly. As far as the practical effects of "clutch hitting" are concerned -- from Yankee fans knowing Jeter is better than A-Rod in the clutch, to GMs knowing how much more payroll to hand over to clutch hitters than to other players, I'd say a study like this pretty much proves clutch hitting does not exist.

0.008 is 1 extra hit in 125 clutch at bats. How many games per year does a player win with that?

And considering how many "clutch" at bats there are in the average season, how many years would a player have to play before compiling enough performance data for a GM to reasonably identify him as being worth the going rate for that 0.008 clutch performance bonus?
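Rough numbers for these questions, under hypothetical assumptions: a regular with 650 PA a season, 8% of them clutch (the share Guy mentions above), a .330 OBP, and only binomial luck. Under those assumptions the .008 edge is worth under half an extra time on base per season, and it takes roughly 70 seasons before that edge equals even one standard error of a hitter's clutch-minus-non-clutch split (nearly 300 for two). Converting that into wins would need run and win values, which this sketch doesn't attempt.

```python
import math

PA_PER_SEASON = 650   # hypothetical full-time regular
CLUTCH_SHARE = 0.08   # share of PAs that count as clutch
OBP = 0.330           # hypothetical true OBP

def extra_times_on_base(edge):
    """Extra times on base per season from a clutch OBP edge."""
    return edge * PA_PER_SEASON * CLUTCH_SHARE

def seasons_needed(edge, z):
    """Seasons until the edge equals z standard errors of the clutch-minus-non-clutch OBP."""
    clutch_pa = PA_PER_SEASON * CLUTCH_SHARE
    other_pa = PA_PER_SEASON - clutch_pa
    # Binomial variance of one season's difference; over k seasons it shrinks by 1/k.
    var_one_season = OBP * (1 - OBP) * (1 / clutch_pa + 1 / other_pa)
    return math.ceil(var_one_season / (edge / z) ** 2)

print(round(extra_times_on_base(0.008), 2))  # 0.42
print(seasons_needed(0.008, 1), seasons_needed(0.008, 2))
```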

If the results of this study stand up, then for sabermetric theoreticians it might be very interesting to prove that "clutch performance" does exist at some level ... even if at the same time it proves that level is one so small as to have no effect on the practical play of the game.

