Changing my mind on "The Book" and clutch hitting
My last post talked about the clutch study in "The Book." It turns out that study was written by Andy Dolphin, who responds in a comment at "The Book" blog here, as does co-author Mitchel Lichtman (mgl). The comments are definitely worth reading.
I had two arguments, one about statistical significance, and one about walks. To summarize them (perhaps more clearly than in the original post):
-- previous studies found no evidence of clutch hitting talent.
-- Andy's study found evidence of clutch hitting (OBA) talent with variance .008.
-- The .008 is statistically significant only at p=.14 (14%, rather than the traditional 5%). It therefore constitutes fairly weak evidence.
-- Combine that weak evidence of .008 with the previous studies that found zero, and there's still a fair bit of doubt on whether clutch hitting exists.
-- if you include intentional walks, it seems obvious that the best hitters will appear to be "clutch".
-- there is such a thing as a "semi-intentional walk".
-- generally, the players who receive IBBs will be the same ones who receive "semi" IBBs.
-- so it seems like the best hitters will appear to be "clutch" just because of those semi-intentional walks.
-- but extra semi-intentional walks are not what "clutch hitting" traditionally means;
-- and so Andy's study may not answer the same question that's being asked.
To clarify: I have no objection to anything in Andy's study itself, only to the conclusions you can draw from its results.
Anyway, I've changed my mind; I now think that we can draw firmer conclusions from Andy's study, and lean towards his result that clutch hitting exists. I stand by my original logic; but I did another simulation, and my view of the facts has changed.
Specifically: I no longer believe that the previous studies necessarily found evidence of zero clutch hitting. I thought they did, but, on further examination, I think the Tom Ruane study gives results that are perfectly consistent with what Andy found: clutch hitting variance of .008 points of OBA.
Here's what Tom did. He found 727 players who met his cutoff for plate appearances. For each player, he computed the difference between the player's "clutch" BPS (batting average plus slugging) and his "non-clutch" BPS. Then, he broke the 727 differences into categories -- 0 to 15 points clutch, 45-60 points choke, and so on.
Then, he did the same thing, but, instead of using "clutch" and "non-clutch" AB, he divided the AB in each group randomly. And so, if there is no such thing as clutch talent -- if clutch hitting is, in effect, random -- the two groups should break down exactly the same.
And they pretty much did. Here, from Tom's study, are the two groups:
Bin    -J   -I   -H   -G   -F   -E   -D   -C   -B   -A    A    B    C    D    E    F    G    H    I    J
Real    1    3    6    5   15   21   46   76   77  115  105   88   69   41   31   13   10    1    3    1
Fake    1    2    3    7   14   26   45   70   94  109  109   92   68   45   26   14    7    3    2    1
They're very, very close. To make the extremes easier to compare, let me leave out the middle columns:
Bin    -J   -I   -H   -G  ...    G    H    I    J
Real    1    3    6    5  ...   10    1    3    1
Fake    1    2    3    7  ...    7    3    2    1
Taking groups G to J, which comprise players who were at least 105 points better in the clutch, we see that there were 15 in real life, and 13 random. On the choke side, -G to -J, there were again 15 in real life, and 13 random.
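For anyone who wants to check those tail counts, here is a minimal sketch that sums the outermost four bins on each side. The two lists are copied from the table above, in order from -J to J; the function name `tail_counts` is just something made up for illustration.

```python
# Counts copied from the table above, ordered -J ... -A, A ... J (20 bins each).
real = [1, 3, 6, 5, 15, 21, 46, 76, 77, 115, 105, 88, 69, 41, 31, 13, 10, 1, 3, 1]
fake = [1, 2, 3, 7, 14, 26, 45, 70, 94, 109, 109, 92, 68, 45, 26, 14, 7, 3, 2, 1]


def tail_counts(row, width=4):
    """Return (choke_tail, clutch_tail): sums of the outermost `width` bins."""
    return sum(row[:width]), sum(row[-width:])


real_tails = tail_counts(real)  # bins -J..-G and G..J in the "Real" row
fake_tails = tail_counts(fake)  # same bins in the "Fake" row
print("real tails (choke, clutch):", real_tails)
print("fake tails (choke, clutch):", fake_tails)
```

The "Real" row comes out at 15 on each side and the "Fake" row at 13 on each side, matching the figures in the text.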
Here's where I made my wrong assumption: I figured that if there were any real difference between the two groups, even a small one, we'd see a much larger dispersion in the "real" row. I thought we'd see a lot more extreme values -- more than a ratio of 15:13.
I was wrong. I ran a simulation in which I generated a "fake" row, then added an extra variance of .006 points (which is what Andy found for wOBA) to simulate what the row would look like if Andy's number were real. The results were indistinguishable. Indeed, I think you could add a lot more than .006 and still not be able to see any difference in the two rows. There is just so much randomness there that any difference in talent gets washed out in this kind of comparison.
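To give a feel for why the extra spread is so hard to see, here is a rough sketch of that kind of simulation. It is not the simulation I actually ran, and it is not Tom's method: it uses OBA rather than BPS, a normal approximation to the binomial, and made-up values for league OBA and plate appearances per player (`BASE_OBA`, `N_CLUTCH`, `N_OTHER` are all assumptions). It also treats the .008 figure as a standard deviation in OBA units. Only the 727 players comes from the study.

```python
import math
import random
import statistics

N_PLAYERS = 727    # from Tom's study
BASE_OBA = 0.340   # ASSUMPTION: a round league-average OBA
N_CLUTCH = 400     # ASSUMPTION: clutch PA per player
N_OTHER = 2000     # ASSUMPTION: non-clutch PA per player


def observed_rate(p, n, rng):
    """Observed rate over n PA, via a normal approximation to the binomial."""
    return rng.gauss(p, math.sqrt(p * (1 - p) / n))


def simulate_diffs(talent_sd, rng):
    """Clutch-minus-non-clutch OBA for each simulated player."""
    diffs = []
    for _ in range(N_PLAYERS):
        clutch_edge = rng.gauss(0.0, talent_sd)  # true clutch talent (0 if sd is 0)
        clutch = observed_rate(BASE_OBA + clutch_edge, N_CLUTCH, rng)
        other = observed_rate(BASE_OBA, N_OTHER, rng)
        diffs.append(clutch - other)
    return diffs


def expected_sd(talent_sd):
    """Theoretical SD of the observed difference: luck plus talent, added in variance."""
    luck_var = BASE_OBA * (1 - BASE_OBA) * (1 / N_CLUTCH + 1 / N_OTHER)
    return math.sqrt(luck_var + talent_sd ** 2)


rng = random.Random(1)
no_talent = simulate_diffs(0.0, rng)
with_talent = simulate_diffs(0.008, rng)

for label, diffs, sd in (("no talent", no_talent, 0.0), ("talent .008", with_talent, 0.008)):
    extreme = sum(1 for d in diffs if abs(d) >= 0.060)  # arbitrary "tail" cutoff
    print(f"{label}: simulated SD {statistics.pstdev(diffs):.4f}, "
          f"expected SD {expected_sd(sd):.4f}, players beyond 60 points: {extreme}")
```

Under these assumptions, adding the talent term raises the expected spread of the observed differences by only a few percent, which is small next to the run-to-run noise in a row of 727 players.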
Also, Tom's results are consistent with my simulation of Andy's result. My simulation found a p-value of .14. Tom's study found that the "real" data were at the 11th percentile of the distribution of "fake" data -- a p of .11. So it seems that Tom and Andy are consistent with each other. That makes sense, because some of the data they used overlapped. Also, Tom's data didn't include walks, which calls into question my argument that the walks might be causing a large proportion of the effect.
So what we have is now:
-- Andy found an effect of .008 of OBA;
-- That's completely consistent with Tom Ruane's study;
-- I think it's also consistent with other studies I've seen;
-- So maybe .008 should indeed be our best estimate of the variance of clutch talent, given all the available evidence.
I have to say, though, that I'm still not completely satisfied about the walks thing. In his reply, Andy said he checked the results without including walks, and he got approximately the same result. I guess this should satisfy me, but I'm still a bit dubious, perhaps irrationally. I wouldn't mind if, as commenter Guy suggested, we checked to see whether the clutch hitters also tended to be the better hitters (or the guys who get intentionally walked the most). That would help me feel better about the walk issue.
Guy also points out that Andy's .008 result doesn't actually represent the variance of talent alone -- rather, it represents all the variance other than luck. The implicit assumption in Andy's study is that it's all talent; but some of it might be other factors: park, non-random distribution of pitchers, etc. Guy suggests running the same study, but dividing the AB by day of the month instead of clutch. Assuming that day of the month is irrelevant to hitting, if we get the same .008 result, that would suggest that what Andy found was something other than clutch talent. More likely, we'd get something between .000 and .008, and we could calculate how much of the .008 is really clutch talent, and how much is other, random, things.
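The arithmetic behind that last step could look something like the sketch below. It assumes that the .008 is a standard deviation in OBA units, that clutch talent and the "other" factors are independent, and that their contributions therefore add in variance. The function name `clutch_talent_sd` and the .004 placebo figure are hypothetical; nobody has run the day-of-the-month study yet.

```python
import math


def clutch_talent_sd(total_sd, placebo_sd):
    """SD left over for clutch talent after removing the placebo (day-of-month) spread.

    ASSUMPTION: the two sources are independent, so variances (not SDs) subtract.
    """
    leftover_var = total_sd ** 2 - placebo_sd ** 2
    return math.sqrt(max(leftover_var, 0.0))  # clamp: a noisy placebo could exceed the total


# Hypothetical placebo results, paired with Andy's .008:
for placebo in (0.000, 0.004, 0.008):
    print(f"placebo SD {placebo:.3f} -> clutch talent SD {clutch_talent_sd(0.008, placebo):.4f}")
```

Because variances are what subtract, a placebo result of .004 would still leave about .007 for clutch talent, not .004.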
Both those tests would make me happier. But, until then, I guess I have to agree that the current state of the evidence is that the most reasonable estimate for the extent of clutch talent is closer to .008 than to .000.