Monday, August 17, 2009

Did "The Book" really find evidence for clutch hitting?

For a long time, the most thorough sabermetric studies showed no evidence for the idea that "clutch hitting" exists -- that some players can "turn it on" more than others when the situation is particularly important. Dick Cramer's 1977 study, which compared batters' 1969 clutch performances to those in 1970, found only a very slight tendency for clutch hitters to repeat. That conclusion was criticized by Bill James in his recent "Underestimating the Fog," but better analyses have existed for many years. Pete Palmer's study in 1990 (.pdf, page 6) compared the actual distribution of players' clutch stats to what would be observed if clutchiness were completely random; it found almost an exact match. Then, in 2005, Tom Ruane did the same thing, but for a much larger population of batters, and came up with a similar result.

But three years ago, in "The Book," authors Tom Tango, Mitchel Lichtman, and Andy Dolphin used a different technique (and, I think, even more data), and came up with a different answer. They found that a tendency to clutch hitting does exist, and has a standard deviation of .008 points of OBA. That is, one out of every six batters will hit more than .008 (8 points) better in the clutch than overall; and, by symmetry, one in six players will hit 8 points *worse* in the clutch.

As far as I know, the authors never published their study in full, and their book gives only an outline of how they did it. But, still, I think I was able to figure out their method -- or at least a method that's probably close to what they did -- and I don't have the same confidence in their conclusions that their book does.

I have two disagreements with their study. First, they used OBA instead of batting average; second, and more seriously, their result of .008 is significant only at the 14% level, which is only moderate evidence against the competing view that clutch talent does not exist.

First, OBA. The difference between OBA and BA is mostly a matter of including walks. Walks are certainly important, and if you're trying to measure a player's ability or performance, on-base percentage is a much better measure than batting average. But when it comes to clutch, the traditional question is about *hitting* in the clutch, not *walking* in the clutch.

To my knowledge, ability to draw a base on balls in clutch situations has not been studied. But, unlike hitting, it wouldn't be surprising to find that some players are "better" at it than others. Take Barry Bonds, for example. In clutch situations, Bonds was more likely to be walked. (Here are his career splits.)

Of course, Bonds' walks were mostly intentional, and "The Book" omitted the IBB from its totals. But, still, if Bonds was much more likely to be walked, you'd think he'd also have been more likely to be pitched around; and so he'd draw more unintentional walks in clutch situations as well. Maybe there weren't as many "semi-intentional" bases on balls as intentional ones, but, still, a small number would be enough to account for a chunk of a standard deviation of .008.

For instance: suppose on every team the best hitter increases his OBA by about 17 points (.017) in the clutch, because of the semi-intentional walk, and the worst hitter decreases his OBA by the same 17 points. If the other 7 batters are exactly the same in clutch situations, and only these two are different, that's enough to give you an SD of almost exactly .008.
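
The arithmetic behind that toy example can be checked with a quick sketch. It assumes a nine-man lineup (the two affected hitters plus the "other 7") and measures each batter's clutch effect in points of OBA; the variable names are just for illustration.

```python
import statistics

# Hypothetical lineup: clutch effect for each of 9 batters, in points of OBA.
# Best hitter gains 17 points, worst hitter loses 17, the other 7 are unchanged.
clutch_effect_points = [17, -17] + [0] * 7

# Population SD, since these nine values are the whole (toy) population.
team_sd = statistics.pstdev(clutch_effect_points)

print(f"SD of clutch effect: {team_sd:.2f} points")
```

The result lands very close to the 8 points reported in "The Book," which is the only purpose of the example; it says nothing about how the effect is really distributed.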

What's 17 points in practice? It's an increase of about 10 walks per 600 PA (.017 times 600). And if a typical hitter gets 60 clutch PA a season, you're talking about one extra walk for one player on the team, and one fewer walk for a second player. That difference of about two walks total is enough to give you the SD of .008 that the authors found.

That seems pretty realistic, and reasonable, doesn't it? Well, maybe not; I've artificially decided that only two players on the team are affected, which makes the variance move a lot more for two walks than it would if every player had some tendency. But, still, intuitively, it does seem like a small effect for walks could explain the whole thing.

And that means:

-- several studies have found no clutch ability in batting average;
-- "The Book" found clutch ability in on-base percentage;
-- intuitively, "clutch walking" would seem to be able to account for everything "The Book" found.

So, with the evidence in that state, I am inclined to believe that clutch hitting skill doesn't exist, but "clutch walking" skill does.


But even if the authors had used batting average instead of OBP, and got the same result, the result isn't statistically significant. That's not just my conclusion, but also theirs; they say, on page 102,

"... we can merely state that there is a 68% probability that [the clutch talent SD] is between 3 and 12 points."

Since a 68% probability is 1 SD each way, the authors seem to be implying a standard error of about 4.5 points. That means a 95% confidence interval is about 9 points either way -- which includes zero.
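
Here is a minimal sketch of that back-of-the-envelope calculation. It assumes the 68% interval is symmetric and roughly normal (which is only an approximation for an estimate of an SD), takes the book's point estimate of 8 points, and uses 2 standard errors for the 95% interval, as the text above does.

```python
# Figures quoted from "The Book": 68% interval of 3 to 12 points, estimate of 8.
low_68, high_68 = 3.0, 12.0
estimate = 8.0

# A 68% interval is about 1 standard error each way.
standard_error = (high_68 - low_68) / 2

# A 95% interval is about 2 standard errors each way.
ci95_low = estimate - 2 * standard_error
ci95_high = estimate + 2 * standard_error
includes_zero = ci95_low <= 0 <= ci95_high

print(f"standard error: {standard_error} points")
print(f"approx. 95% interval: {ci95_low} to {ci95_high} points")
print(f"includes zero: {includes_zero}")
```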

Actually, I get an even wider confidence interval using my method (which might actually be the same as theirs). Let me go through it. For those of you who don't care about the math, you can skip this smaller print.

-- Math/details start here --

The study said that it included 848 players, with an average 2450 PA in non-clutch situations, and 200 in clutch situations. So I created 848 identical players with those numbers, and gave each player exactly zero clutch ability. Every player had an OBA of .340.

From the binomial distribution, the SD of each player's OBA over the non-clutch 2450 PA is .00957. The SD of each player's OBA over the clutch 200 PA is .0335. The SD of the difference between the two is the square root of the sum of the squares, which is .03484. That's 34.84 points of OBA.
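
Those three figures can be reproduced in a few lines. This is only a sketch of the binomial arithmetic, using the .340 OBA and the PA counts given above; the helper name `binomial_sd` is made up for the example.

```python
import math

def binomial_sd(p, n):
    """SD of an observed rate over n independent trials with success probability p."""
    return math.sqrt(p * (1 - p) / n)

OBA = 0.340
sd_non_clutch = binomial_sd(OBA, 2450)   # non-clutch PA
sd_clutch = binomial_sd(OBA, 200)        # clutch PA

# SD of the difference of two independent quantities:
# the square root of the sum of the squared SDs.
sd_difference = math.hypot(sd_non_clutch, sd_clutch)

print(f"non-clutch SD: {sd_non_clutch:.5f}")
print(f"clutch SD:     {sd_clutch:.4f}")
print(f"difference SD: {sd_difference:.5f}")
```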

That's the variance only due to randomness, or luck. If there truly is variance in players' *talent* for clutch hitting, the observed variance would be higher. How much higher? Well, if you assume that talent and luck are independent, then, as the authors often point out on their blog,

Variance (observed) = variance (talent) + variance (luck)

Since the authors concluded a talent variance of 8 points squared, we can assume that

Variance (observed) = 8 points squared + 34.84 points squared

Which means that

Variance (observed) = 35.75 points squared

Since the SD is the square root of the variance, we get

SD(observed) = 35.75 points

So, presumably, in their population of 848 players, the authors observed the SD of the clutch difference was 35.75 points.
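
As a quick check of that step, here is the same calculation as a sketch. It assumes, as above, that talent and luck are independent, so that their variances add.

```python
import math

talent_sd_points = 8.0    # clutch talent SD reported in "The Book"
luck_sd_points = 34.84    # binomial (luck-only) SD computed above

# Variances add when the two sources are independent.
observed_variance = talent_sd_points ** 2 + luck_sd_points ** 2
observed_sd_points = math.sqrt(observed_variance)

print(f"implied observed SD: {observed_sd_points:.2f} points")
```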

Now, if there really was no such thing as clutch ability, how often would we observe an SD of more than 35.75 points due to luck alone, when the expected number is only 34.84? To check, I ran a simulation, and the answer was: about 14% of the time.

At 14%, that's clearly not significant.
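
For anyone who wants to try something similar, here is a rough sketch of that kind of simulation. It is not the original code: it replaces each binomial draw with a normal approximation to keep the run time short, and the number of simulated leagues and the random seed are arbitrary choices.

```python
import math
import random

OBA = 0.340
N_PLAYERS = 848
PA_NON_CLUTCH = 2450
PA_CLUTCH = 200

def simulated_sd_points(rng):
    """Simulate one league of zero-talent players; return the SD (in points)
    of clutch OBA minus non-clutch OBA across the players."""
    sd_non = math.sqrt(OBA * (1 - OBA) / PA_NON_CLUTCH)
    sd_clutch = math.sqrt(OBA * (1 - OBA) / PA_CLUTCH)
    # Assumption of this sketch: normal approximation to each binomial OBA.
    diffs = [rng.gauss(OBA, sd_clutch) - rng.gauss(OBA, sd_non)
             for _ in range(N_PLAYERS)]
    mean = sum(diffs) / N_PLAYERS
    variance = sum((d - mean) ** 2 for d in diffs) / N_PLAYERS
    return 1000 * math.sqrt(variance)

def fraction_exceeding(threshold=35.75, n_sims=1000, seed=2009):
    """Fraction of simulated leagues whose SD exceeds the threshold by luck alone."""
    rng = random.Random(seed)
    hits = sum(simulated_sd_points(rng) > threshold for _ in range(n_sims))
    return hits / n_sims

fraction = fraction_exceeding()
print(f"SD above 35.75 points in {fraction:.1%} of simulated leagues")
```

Under these assumptions the fraction should come out in the neighborhood of the 14% quoted above, though it will wobble by a point or so from run to run with only 1,000 simulated leagues.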

Another way to check: the SD of the simulated SDs was about .88 of a point. The difference between 35.75 and 34.84 is about .91 of a point. So the observed difference was almost exactly 1 SD from zero. Again, that's not significant.

If we look for a 68% confidence interval like the authors had, 1 SD on each side, we get (34.87, 36.63). That means a 68% confidence interval for clutch talent is 0.1 to 11.3 points. That's different from what the authors gave -- 3 to 12 points -- but I'm not sure why.

Either way, the observed effect is certainly not statistically significant.

-- math/details end here --

To restate my conclusions for those who skipped the math:

The effect "The Book" found is about 1 SD from zero, which is certainly not statistically significant. It's at the 14% level, not the traditional 5%. This doesn't mean it can be ignored, but it does constitute fairly weak evidence.


So, to sum up:

-- two previous studies found no evidence of clutch talent in batting average;

-- Tango/mgl/Dolphin found a small measure of clutch talent, but it wasn't statistically significant.

From that alone, I'd say our conclusion still has to be: not enough evidence to assume clutch talent. But if you add:

-- Tango/mgl/Dolphin's non-significant result included clutch walks, which common sense strongly suggests *do* vary by player,

Then, to me, that removes most of the last bit of doubt. I think that even if the effect they found is real, there's a really good chance it's caused by walks.

Hey, guys, how about running the study again using batting average?

(UPDATE: some statements on statistical significance replaced by something more accurate.)



At Monday, August 17, 2009 12:27:00 AM, Blogger Phil Birnbaum said...

Just for interest, here's how often the simulation came up with various estimates for clutch SD:

12 points or more: 2% of the time
8-12 points: 12% of the time
3-8 points: 29% of the time
0-3 points: 6% of the time

No estimate, due to the observed variance being less than the theoretical: 51% of the time

At Monday, August 17, 2009 7:29:00 AM, Blogger Phil Birnbaum said...

Actually, my summary may be a bit overstated. The Palmer and Ruane studies did not quantify their effects in terms of an SD of talent; they just observed that the distribution intuitively looked like what you'd get if there were no such thing as clutch talent. You might argue that those studies *could* be consistent with a clutch talent distribution of something other than zero, and I suppose you could find a way to check.

I'm pretty sure that the Ruane data is not consistent with a talent SD of .008 (in batting average), but I haven't done any work to check that.

Also, since those two previous studies used a subset of the data that this one used, it's not really true that this study contradicts those studies. This study doesn't just add new data that contradicts the old data -- rather, it adds new data and implicitly re-evaluates the old data at the same time.

So my implied argument, "the old data still contradicts your new data" does have a plausible answer.

But the other arguments -- that (a) the confidence interval is wide and includes zero, and (b) walks are likely to be causing a big chunk of the effect -- are still strong enough that I don't think there's enough evidence to say that "clutch hitting" (by its usual definition) exists.

At Monday, August 17, 2009 9:37:00 AM, Blogger Anthony said...

Didn't they use wOBA in The Book, not OBA? How does that change the results?

At Monday, August 17, 2009 10:00:00 AM, Blogger Phil Birnbaum said...

They found an SD of 8 points for OBA, and 6 points for wOBA.

To me, that suggests that walks are a large part of the difference, because when you add other stuff and rescale to the same scale as OBA, you get a smaller effect.

At Monday, August 17, 2009 11:20:00 AM, Blogger Cyril Morong said...


Part of my SABR con clutch presentation in Boston looked at how OBP changes in various situations. I calculated z-scores the same way as Palmer. So it takes into account how much differently a guy did in the clutch, and also takes into account if there is a league-wide differential between the clutch situation and the non-clutch situation. I looked at all the guys who had 6000+ PAs from 1987-2001. I had 70 guys and 5% of that is about 4. So you normally expect 4 guys to have a z-score of 1.96 or more (or less for -1.96)

Here are the z-scores for OBP with RISP (IBBs included)

Barry Bonds 6.347
Tim Raines 4.502
Paul Molitor 4.352
Chili Davis 4.188
Joe Carter 3.638
Mark McGwire 3.581
Wally Joyner 3.374
Tony Gwynn 3.356
Tony Fernandez 3.337
Eddie Murray 3.321
Ken Caminiti 3.161
David Justice 2.893
Robin Ventura 2.827
Terry Pendleton 2.765
Cal Ripken 2.588
John Olerud 2.383
Jay Buhner 2.378
Albert Belle 2.185
Jeff Bagwell 2.107
Tino Martinez 2.069
Ken Griffey Jr. 2.001
Steve Finley -2.000
Omar Vizquel -2.549

Now RISP with IBBs taken out

Tony Fernandez 3.914
Tim Raines 3.822
Jay Buhner 3.073
Jay Bell 2.999
Wally Joyner 2.937
David Justice 2.626
Cal Ripken 2.511
Ken Caminiti 2.448
Mark McGwire 2.419
Greg Vaughn 2.345
Todd Zeile 2.320

Now close and late with IBBs included

Mark Grace 2.667
Edgar Martinez 2.451
Tony Gwynn 2.044
Tino Martinez 1.986
Travis Fryman -2.557

Now close and late with IBBs taken out

Edgar Martinez 2.216
Mark Grace 2.144
Tino Martinez 2.029
B.J. Surhoff -2.063
Travis Fryman -2.158
Ken Caminiti -2.390

So there are more than 4 guys in each case, even in the cases with IBBs taken out. But we never know who got there by luck and who really was clutch at getting on base. Also, we don't know if there were semi-intentional walks.

Then look at RISP with IBBs taken out. All the z-scores are positive and there's a bunch

At Monday, August 17, 2009 11:26:00 AM, Blogger Phil Birnbaum said...

Cy, thanks. Your comment got cut off, there might be a size limit.

So what we have is: with walks taken out, two studies show almost no clutch effect. With (non-intentional) walks in, two studies (yours, and The Book's) show a very slight clutch effect.

So it seems to me like there is a clutch effect for walks, and that's what's being found.

At Monday, August 17, 2009 11:33:00 AM, Blogger Cyril Morong said...

No, it did not get cut off. Bunch was my last word. I really don't know for sure what the z-scores mean. Notice with RISP that taking out IBBs drastically lowers the significant cases and with IBBs taken out for close and late situations there are at most two clutch hitters since we would expect 4 and there are 6 significant z-scores.

But can't you remember the announcers always commenting on how good Tony Fernandez was at taking a walk with RISP?

At Monday, August 17, 2009 11:36:00 AM, Blogger Phil Birnbaum said...

Well, we know that some players are "clutch" IBBers, since they get walked a lot in critical situations. That doesn't really tell us about clutch hitting in general.

I also believe that there are guys who are clutch walkers ... maybe Tony Fernandez is one of them. That would explain why you got 6 significant results instead of 4 ... that's the kind of effect you'd expect if there were a slight clutch walk talent.

At Monday, August 17, 2009 7:28:00 PM, Anonymous Guy said...

Phil/Cyril: Couldn't the "clutch" BBs be a function of the pitcher's decision of how best to pitch to these hitters in clutch situations, rather than a skill of the hitters'? It seems to me too much of a coincidence that Cyril's hitters are almost all good-to-great hitters -- exactly the type of hitters who pitchers should walk more than average in clutch PAs.

Back to the Book study: I would add one more potential objection to Phil's: the observed=real+random method assumes that all variance beyond binomial variation equals an underlying skill. But there are many other sources of variance: some hitters may have had many/few clutch PAs at home; some may have had many/few with the platoon advantage; some may have had more than average clutch PAs when facing a pitcher for the 3rd time in the game.

I think to use this method, researchers first need to determine the variance in players' performance under two sets of conditions we know to be random, like day of the month. If the clutch/non-clutch variance is greater than the odd/even date variance, for example (and using equal sample sizes), then that's evidence of clutch talent. But I suspect it may not be any larger.

At Monday, August 17, 2009 7:31:00 PM, Anonymous James said...

I find the 5% cutoff for statistical significance to be completely arbitrary, so I think it's fair to say that the study in The Book found "some" evidence for clutch hitting.

At Monday, August 17, 2009 8:58:00 PM, Blogger Cyril Morong said...


I agree with your points. I definitely was not saying that those z-scores proved anything. I think that Tom Ruane many years ago on the SABR list did the kind of thing you suggest (odd/even). Maybe it is online somewhere. Here is a link to one of his studies. Not sure if it has what I was thinking of


At Monday, August 17, 2009 10:48:00 PM, Blogger Phil Birnbaum said...


Yes, I agree that "clutch walkers" are probably that way because of the behavior of the pitchers.

For "The Book" study, they did normalize for quality of pitcher and lefty/righty. But I agree with you that there might be other factors.

FWIW, I added a bit of randomness to every PA in my simulation (each PA's randomness was independent of all the others). It hardly changed the results at all.

At Monday, August 17, 2009 10:49:00 PM, Blogger Phil Birnbaum said...

James: sure, what they found is certainly "some evidence." My argument is that, especially in the light of other studies that found nothing, a 15% significance level isn't enough to declare that clutch hitting has been found.

