Did "The Book" really find evidence for clutch hitting?
For a long time, the most thorough sabermetric studies showed no evidence for the idea that "clutch hitting" exists -- that some players can "turn it on" more than others when the situation is particularly important. Dick Cramer's 1977 study, which compared batters' 1969 clutch performances to those in 1970, found only a very slight tendency for clutch hitters to repeat. That conclusion was criticized by Bill James in his recent "Underestimating the Fog," but better analyses have existed for many years. Pete Palmer's study in 1990 (.pdf, page 6) compared the actual distribution of players' clutch stats to what would be observed if clutchiness were completely random; it found almost an exact match. Then, in 2005, Tom Ruane did the same thing, but for a much larger population of batters, and came up with a similar result.
But three years ago, in "The Book," authors Tom Tango, Mitchel Lichtman, and Andy Dolphin used a different technique (and, I think, even more data), and came up with a different answer. They found that a tendency to clutch hitting does exist, with a standard deviation of .008 (8 points) of OBA. That is, about one in six batters will hit more than 8 points better in the clutch than overall; and, by symmetry, about one in six will hit 8 points *worse* in the clutch.
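That "one in six" is just the normal-curve tail beyond one standard deviation. A quick sanity check (my illustration, not anything from the book):

```python
from math import erf, sqrt

# Fraction of a normal distribution lying more than 1 SD above the mean.
tail = 0.5 * (1 - erf(1 / sqrt(2)))
print(round(tail, 4))  # 0.1587 -- about one batter in six
```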
As far as I know, the authors never published their study in full, and their book gives only an outline of how they did it. But, still, I think I was able to figure out their method -- or at least a method that's probably close to what they did -- and I don't have the same confidence in their conclusions that their book does.
I have two disagreements with their study. First, they used OBA instead of batting average; second, and more seriously, their result of .008 is significant only at the 14% level, which is only moderate evidence against the competing view that clutch talent does not exist.
First, OBA. The difference between OBA and BA is mostly a matter of including walks. Walks are certainly important, and if you're trying to measure a player's ability or performance, on-base percentage is a much better measure than batting average. But when it comes to clutch, the traditional question is about *hitting* in the clutch, not *walking* in the clutch.
To my knowledge, ability to draw a base on balls in clutch situations has not been studied. But, unlike hitting, it wouldn't be surprising to find that some players are "better" at it than others. Take Barry Bonds, for example. In clutch situations, Bonds was more likely to be walked. (Here are his career splits.)
Of course, Bonds' walks were mostly intentional, and "The Book" omitted the IBB from its totals. But, still, if Bonds was much more likely to be walked, you'd think he'd also have been more likely to be pitched around; and so he'd draw more unintentional walks in clutch situations as well. Maybe there weren't as many "semi-intentional" bases on balls as intentional ones, but, still, a small number would be enough to account for a chunk of a standard deviation of .008.
For instance: suppose on every team the best hitter increases his OBA by about 17 points (.017) in the clutch, because of the semi-intentional walk, and the worst hitter decreases his OBA by the same 17 points. If the other 7 batters are exactly the same in clutch situations, and only these two are different, that's enough to give you an SD of almost exactly .008.
What's 17 points in practice? Over 600 PA, it's an increase of about 10 walks. And if a typical hitter gets 60 clutch PA a season, you're talking about one extra walk for one player on the team, and one fewer walk for a second player. That difference of about two walks total is enough to give you the SD of .008 that the authors found.
That seems pretty realistic and reasonable, doesn't it? Well, maybe not; I've artificially decided that only two players on the team are affected, which makes the variance move a lot more for those two walks than it would if every player had some tendency. But, still, intuitively, it does seem like a small walk effect could explain the whole thing.
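Here's the arithmetic behind that scenario: nine batters, two of them shifted 17 points in opposite directions, the rest unchanged (my sketch of the calculation):

```python
from math import sqrt

# Hypothetical team: one hitter gains 17 points of clutch OBA (from
# semi-intentional walks), one loses 17 points, the other seven are unchanged.
shifts = [0.017, -0.017] + [0.0] * 7

mean = sum(shifts) / len(shifts)  # zero, by symmetry
sd = sqrt(sum((s - mean) ** 2 for s in shifts) / len(shifts))
print(round(sd, 4))  # 0.008 -- the SD "The Book" found
```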
And that means:
-- several studies have found no clutch ability in batting average;
-- "The Book" found clutch ability in on-base percentage;
-- intuitively, "clutch walking" would seem to be able to account for everything "The Book" found.
So, given that state of the evidence, I'm inclined to believe that clutch hitting skill doesn't exist, but "clutch walking" skill does.
But even if the authors had used batting average instead of OBP, and got the same result, the result isn't statistically significant. That's not just my conclusion, but also theirs; they say, on page 102,
"... we can merely state that there is a 68% probability that [the clutch talent SD] is between 3 and 12 points."
Since a 68% probability is 1 SD each way, the authors seem to be implying a standard error of about 4.5 points. That means a 95% confidence interval is about 9 points either way -- which includes zero.
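Backing the standard error out of their interval is simple arithmetic (my calculation, under the usual normal approximation):

```python
low, high = 3.0, 12.0        # the authors' 68% interval, in points of OBA
center = (low + high) / 2    # implied estimate: 7.5 points
se = (high - low) / 2        # 68% is about +/- 1 SE, so SE is about 4.5

# A 95% interval is roughly +/- 2 SE -- about 9 points either way --
# and it includes zero.
ci95 = (center - 2 * se, center + 2 * se)
print(center, se, ci95)  # 7.5 4.5 (-1.5, 16.5)
```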
Actually, I get an even wider confidence interval using my method (which might actually be the same as theirs). Let me go through it. For those of you who don't care about the math, you can skip this smaller print.
-- Math/details start here --
The study said that it included 848 players, with an average 2450 PA in non-clutch situations, and 200 in clutch situations. So I created 848 identical players with those numbers, and gave each player exactly zero clutch ability. Every player had an OBA of .340.
From the binomial distribution, the SD of each player's OBA over the non-clutch 2450 PA is .00957. The SD of each player's OBA over the clutch 200 PA is .0335. The SD of the difference between the two is the square root of the sums of the squares, which is .03484. That's 34.84 points of OBA.
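Those binomial SDs are easy to verify (assuming, as I did, a flat .340 OBA for every player):

```python
from math import sqrt

p = 0.340            # assumed OBA for every player
pa_nonclutch = 2450
pa_clutch = 200

# Binomial SD of an observed OBA over n trials: sqrt(p * (1 - p) / n)
sd_nonclutch = sqrt(p * (1 - p) / pa_nonclutch)      # ~.00957
sd_clutch = sqrt(p * (1 - p) / pa_clutch)            # ~.0335
sd_diff = sqrt(sd_nonclutch ** 2 + sd_clutch ** 2)   # ~.03484
print(round(sd_nonclutch, 5), round(sd_clutch, 4), round(sd_diff, 5))
```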
That's the variance only due to randomness, or luck. If there truly is variance in players' *talent* for clutch hitting, the observed variance would be higher. How much higher? Well, if you assume that talent and luck are independent, then, as the authors often point out on their blog,
Variance (observed) = variance (talent) + variance (luck)
Since the authors concluded a talent variance of 8 points squared, we can assume that
Variance (observed) = 8 points squared + 34.84 points squared
Which means that
Variance (observed) = 35.75 points squared
Since the SD is the square root of the variance, we get
SD(observed) = 35.75 points
So, presumably, in their population of 848 players, the authors observed the SD of the clutch difference was 35.75 points.
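The same decomposition in code, working in points of OBA:

```python
from math import sqrt

sd_talent = 8.0    # the authors' estimated clutch-talent SD, in points
sd_luck = 34.84    # binomial SD of the clutch/non-clutch difference

# Independent variances add, so the SDs combine in quadrature.
sd_observed = sqrt(sd_talent ** 2 + sd_luck ** 2)
print(round(sd_observed, 2))  # 35.75
```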
Now, if there really was no such thing as clutch ability, how often would we observe an SD of more than 35.75 points due to luck alone, when the expected number is only 34.84? To check, I ran a simulation, and the answer was: about 14% of the time.
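Here's a minimal version of that simulation -- my reconstruction, with an arbitrary seed and trial count, not necessarily the exact procedure I described above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, p = 848, 0.340
pa_nc, pa_c = 2450, 200
observed_sd = 0.03575   # 35.75 points, implied by the authors' numbers

trials = 2000
count = 0
for _ in range(trials):
    # Every player gets the same true OBA -- zero clutch talent by construction.
    oba_nc = rng.binomial(pa_nc, p, n_players) / pa_nc
    oba_c = rng.binomial(pa_c, p, n_players) / pa_c
    sd = np.std(oba_c - oba_nc)   # spread of clutch differences, luck alone
    count += sd > observed_sd

# Fraction of luck-only seasons that exceed the observed SD anyway;
# it comes out around 14%.
print(count / trials)
```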
Obviously, 14% is not statistically significant.
Another way to check: the SD of the simulated SDs was about .88 of a point. The difference between 35.75 and 34.84 is about .91 of a point. So the observed SD was almost exactly 1 SD above the luck-only expectation. Again, that's not significant.
If we look for a 68% confidence interval like the authors had, 1 SD on each side, we get (34.87, 36.63). That means a 68% confidence interval for clutch talent is about 1.4 to 11.3 points. That's different than what the authors gave -- 3 to 12 points -- but I'm not sure why.
Either way, the observed effect is certainly not statistically significant.
-- math/details end here --
To restate my conclusions for those who skipped the math:
The effect "The Book" found is about 1 SD from zero, which is certainly not statistically significant. It's at the 14% level, not the
So, to sum up:
-- two previous studies found no evidence of clutch talent in batting average;
-- Tango/mgl/Dolphin found a small measure of clutch talent, but it wasn't statistically significant.
From that alone, I'd say our conclusion still has to be: not enough evidence to assume clutch talent exists. But if you add:
-- Tango/mgl/Dolphin's non-significant result included clutch walks, which common sense strongly suggests *do* vary by player,
Then, to me, that removes most of the last bit of doubt. I think that even if the effect they found is real, there's a really good chance it's caused by walks.
Hey, guys, how about running the study again using batting average?
(UPDATE: some statements on statistical significance replaced by something more accurate.)