Sabermetric Research: Changing my mind on "The Book" and clutch hitting

My last post talked about the clutch study in "The Book." It turns out that study was written by Andy Dolphin, who responds in a comment at "The Book" blog here, as does co-author Mitchel Lichtman (mgl). The comments are definitely worth reading.

I had two arguments, one about statistical significance, and one about walks. To summarize them (perhaps more clearly than in the original post):

-- previous studies found no evidence of clutch hitting talent.
-- Andy's study found evidence of clutch hitting (OBA) talent with variance .008.
-- The .008 is statistically significant only at p=.14 (14%, rather than the traditional 5%). It therefore constitutes fairly weak evidence.
-- Combine that weak evidence of .008 with the previous studies that found zero, and there's still a fair bit of doubt on whether clutch hitting exists.

And also:

-- if you include intentional walks, it seems obvious that the best hitters will appear to be "clutch".
-- there is such a thing as a "semi-intentional walk".
-- generally, the players who receive IBBs will be the same ones who receive "semi" IBBs.
-- so it seems like the best hitters will appear to be "clutch" just because of those semi-intentional walks.
-- but extra semi-intentional walks are not what "clutch hitting" traditionally means;
-- and so Andy's study may not answer the same question that's being asked.

To clarify: I have no objection to anything in Andy's study itself, just as to the conclusions you can draw from its results.

Anyway, I've changed my mind; I now think that we can draw firmer conclusions from Andy's study, and lean towards his result that clutch hitting exists. I stand by my original logic; but I did another simulation, and my view of the facts has changed.

Specifically: I no longer believe that the previous studies necessarily found evidence of zero clutch hitting. I thought they did, but, on further examination, I think the Tom Ruane study gives results that are perfectly consistent with what Andy found: clutch hitting variance of .008 points of OBA.

Here's what Tom did. He found 727 players who met his cutoff for plate appearances. For each player, he computed the difference between the player's "clutch" BPS (batting average plus slugging) and his "non-clutch" BPS. Then, he broke the 727 differences into categories -- 0 to 15 points clutch, 45-60 points choke, and so on.

Then, he did the same thing, but, instead of using "clutch" and "non-clutch" AB, he divided the AB in each group randomly. And so, if there is no such thing as clutch talent -- if clutch hitting is, in effect, random -- the two groups should break down exactly the same.

And they pretty much did. Here, from Tom's study, are the two groups:

       -J   -I   -H   -G   -F   -E   -D   -C   -B   -A    A    B    C    D    E    F    G    H    I    J
Real    1    3    6    5   15   21   46   76   77  115  105   88   69   41   31   13   10    1    3    1
Fake    1    2    3    7   14   26   45   70   94  109  109   92   68   45   26   14    7    3    2    1

They're very, very close. To make the tails easier to compare, let me leave out the middle columns:

       -J   -I   -H   -G  ...    G    H    I    J
Real    1    3    6    5  ...   10    1    3    1
Fake    1    2    3    7  ...    7    3    2    1

Taking groups G to J, which comprise players who were at least 105 points better in the clutch, we see that there were 15 in real life, and 13 random. On the choke side, groups -G to -J, there were again 15 in real life, and 13 random.
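Those tail counts can be checked directly from the two rows of Tom's table. Here is a small Python sketch that does the sums; the variable and function names are mine, chosen only for illustration.

```python
# Bucket labels and counts copied from the table in Tom Ruane's study, as quoted above.
labels = ["-J", "-I", "-H", "-G", "-F", "-E", "-D", "-C", "-B", "-A",
          "A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
real = [1, 3, 6, 5, 15, 21, 46, 76, 77, 115, 105, 88, 69, 41, 31, 13, 10, 1, 3, 1]
fake = [1, 2, 3, 7, 14, 26, 45, 70, 94, 109, 109, 92, 68, 45, 26, 14, 7, 3, 2, 1]

def tail_count(counts, first, last):
    """Sum the counts for every bucket from `first` to `last`, inclusive."""
    i, j = labels.index(first), labels.index(last)
    return sum(counts[i:j + 1])

clutch_real = tail_count(real, "G", "J")    # extreme clutch side, real split
clutch_fake = tail_count(fake, "G", "J")    # extreme clutch side, random split
choke_real = tail_count(real, "-J", "-G")   # extreme choke side, real split
choke_fake = tail_count(fake, "-J", "-G")   # extreme choke side, random split

print("clutch tail (real, fake):", clutch_real, clutch_fake)
print("choke tail  (real, fake):", choke_real, choke_fake)
```

Both tails come out to 15 real versus 13 random, which is the 15:13 ratio discussed in the text.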

Here's where I made my wrong assumption: I figured that if there were any real difference between the two groups, even a small one, we'd see a much larger dispersion in the "real" row. I thought we'd see a lot more extreme values -- more than a ratio of 15:13.

I was wrong. I ran a simulation, where I ran a "fake" row, then added an extra variance of .006 points (which is what Andy found for wOBA) to simulate what a "fake" row would look like if Andy's number were real. The results were indistinguishable. Indeed, I think you could add a lot more than .006 and still not be able to see any difference in the two rows. There is just so much randomness there that any difference in talent gets washed out in this kind of comparison.
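To give a feel for why the extra spread is so hard to see, here is a rough Python sketch of that kind of comparison. It is not the simulation I actually ran: it works in on-base terms rather than BPS, and the plate appearance counts, the league on-base rate, and the choice to treat .006 as a standard deviation of talent are all assumptions made up for the illustration.

```python
import random
import statistics

def simulate_diffs(n_players, pa_clutch, pa_other, base_oba, talent_sd, rng):
    """Return each simulated player's clutch OBA minus his non-clutch OBA."""
    diffs = []
    for _ in range(n_players):
        # Clutch "talent": a per-player shift in on-base probability (zero if none).
        shift = rng.gauss(0.0, talent_sd) if talent_sd > 0 else 0.0
        p_clutch = base_oba + shift
        clutch = sum(rng.random() < p_clutch for _ in range(pa_clutch)) / pa_clutch
        other = sum(rng.random() < base_oba for _ in range(pa_other)) / pa_other
        diffs.append(clutch - other)
    return diffs

# Every input below is an assumption for illustration, not a figure from either study,
# except the player count, which matches Tom's sample size.
N_PLAYERS = 727
PA_CLUTCH = 300      # hypothetical clutch plate appearances per player
PA_OTHER = 1500      # hypothetical non-clutch plate appearances per player
BASE_OBA = 0.330     # hypothetical on-base rate shared by all players
TALENT_SD = 0.006    # Andy's wOBA figure, treated here as an SD in OBA

no_talent = simulate_diffs(N_PLAYERS, PA_CLUTCH, PA_OTHER, BASE_OBA, 0.0, random.Random(1))
with_talent = simulate_diffs(N_PLAYERS, PA_CLUTCH, PA_OTHER, BASE_OBA, TALENT_SD, random.Random(2))

sd_without = statistics.pstdev(no_talent)
sd_with = statistics.pstdev(with_talent)
print(f"spread of differences without clutch talent: {sd_without:.4f}")
print(f"spread of differences with clutch talent:    {sd_with:.4f}")
```

With these made-up plate appearance counts, binomial luck alone gives the differences a standard deviation of about .030. Adding .006 of talent in quadrature moves that only to about .0303, which is smaller than the sampling noise you get from having just 727 players.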

Also, Tom's results are consistent with my simulation of Andy's result. My simulation found a p-value of .14. Tom's study found that the "real" data were at the 11th percentile of the distribution of "fake" data -- a p of .11. So it seems that Tom and Andy are consistent with each other. That makes sense, because some of the data they used overlapped. Also, Tom's data didn't include walks, which calls into question my argument that the walks might be causing a large proportion of the effect.

So what we have now is:

-- Andy found an effect of .008 of OBA;
-- That's completely consistent with Tom Ruane's study;
-- I think it's also consistent with other studies I've seen;
-- So maybe .008 should indeed be our best estimate of the variance of clutch talent, given all the available evidence.

I have to say, though, that I'm still not completely satisfied about the walks thing. In his reply, Andy said he checked the results without including walks, and he got approximately the same result. I guess this should satisfy me, but I'm still a bit dubious, perhaps irrationally. I wouldn't mind if, as commenter Guy suggested, we checked to see whether the clutch hitters also tended to be the better hitters (or the guys who get intentionally walked the most). That would help me feel better about the walk issue.

Guy also points out that Andy's .008 result doesn't actually represent the variance of talent alone -- rather, it represents all the variance other than luck. The implicit assumption in Andy's study is that it's all talent; but some of it might be other factors: park, non-random distribution of pitchers, etc. Guy suggests running the same study, but dividing the AB by day of the month instead of clutch. Assuming that day of the month is irrelevant to hitting, if we get the same .008 result, that would suggest that what Andy found was something other than clutch talent. More likely, we'd get something between .000 and .008, and we could calculate how much of the .008 is really clutch talent, and how much is other, random, things.
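The arithmetic behind Guy's control test might look something like the Python sketch below. It assumes the two figures are standard deviations from independent sources, so that their squares (variances) add; that assumption is mine, and the .004 control value is invented purely to show the calculation, since nobody has run the day-of-month split.

```python
import math

def leftover_sd(clutch_split_sd, control_split_sd):
    """SD attributable to clutch talent after removing the control split's non-luck SD.

    Assumes the two sources are independent, so their variances add.
    """
    if control_split_sd > clutch_split_sd:
        raise ValueError("control spread exceeds the clutch spread")
    return math.sqrt(clutch_split_sd ** 2 - control_split_sd ** 2)

CLUTCH_SD = 0.008    # Andy's non-luck figure for the clutch split
CONTROL_SD = 0.004   # hypothetical non-luck figure for a day-of-month split

talent_sd = leftover_sd(CLUTCH_SD, CONTROL_SD)
print(f"implied clutch talent SD: {talent_sd:.4f}")

# The two extreme cases described in the text:
print(leftover_sd(CLUTCH_SD, 0.0))        # control finds nothing: all .008 is clutch
print(leftover_sd(CLUTCH_SD, CLUTCH_SD))  # control finds the same .008: none is clutch
```

Under those assumptions, a control result of .004 would still leave about .007 for clutch talent, because the subtraction happens on the squared values.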

Both those tests would make me happier. But, until then, I guess I have to agree that the current state of the evidence is that the most reasonable estimate for the extent of clutch talent is closer to .008 than to .000.

Labels: baseball, clutch

4 Comments:

At Monday, August 24, 2009 10:44:00 AM, Guy said...: Phil:
I think you may be conceding too much here. I like the approach that Ruane took very much, comparing real variance to "fake" (random PAs) variance. Many of the factors that could introduce variance -- home/away, pitcher quality, etc. -- are presumably common to both the real and fake samples.

However, there is at least one important distinction that applies only to the real clutch sample: good hitters will tend to have a relatively worse clutch BPS than weak hitters -- relative to their non-clutch performance -- because BPS (BA + SLG) omits walks. As Andy and MGL speculate at their blog, good hitters are in a sense penalized if you ignore walks, because the pitcher was pitching around them in many of these clutch PAs. Many of their non-BB PAs are those in which they swung at bad pitches. If you look at Ruane's over- and under-performers, you can see this is true. His clutch underperformers are much better hitters as a group: .707 BPS, compared to just .660 for the overperformers, nearly a 50-point gap. If you looked at his fake sample, I'd expect to find little or no difference in skill.

We'd have to measure how much variance this hitter-quality factor introduces, but it seems plausible to me that it accounts for most or all of the difference between Ruane's real and fake clutch samples.

The Dolphin approach has the reverse problem, which is probably smaller: that good hitters may have a disproportionate walk advantage. But as we've discussed, it also captures many other possible sources of variance other than clutch. So although the sources are different for the two studies, both may well be measuring non-clutch variance in performance.

At Monday, August 24, 2009 10:58:00 AM, Phil Birnbaum said...: Guy,

Holy crap, I hadn't noticed the difference in two Ruane charts. Good catch. The top 20 clutch hitters contained nobody with a BPS over .800, and eight players under .600. The top 20 chokers were the reverse: three players over .800 and *nobody* under .600. Clearly, that's a different caliber of players.

Now, if the good hitters are "chokier" because they're being pitched around, you could argue that it's "their fault" because they shouldn't be swinging at bad pitches. But then they might be making up for it with more walks. And there are game theory reasons why they might occasionally have to swing.

So, you're right. If you don't include walks, you get false positives on the best hitters being choky. If you DO include walks, you get "semi-false" positives on the clutch hitters being clutchy.

So maybe the question doesn't really have an answer at all ... maybe you have to start by asking "what does clutch hitting mean, anyway?"

But you've convinced me, I think ... if you have to choose between the two methods, I'd choose Andy's. Because, if good hitters are really trading hits for walks, that's part of their clutch ability. On the other hand, whether to include "semi-intentional walks" is debatable, much more so than ignoring paid-for walks.

Still, now you've got me thinking that the question has to be answered in other ways ...

At Monday, August 24, 2009 11:09:00 AM, Phil Birnbaum said...:
OK, maybe it's like this: at different points in the game, there's a tradeoff between throwing good pitches and trying to paint the corners. Also, that tradeoff is different for every player.

Those differences lead to different expected BA, OBP, and SLG clutch differences for every player, even if they are ALL indistinguishable in clutch talent vs. non-clutch talent.

So how do you check? Especially when you're looking for such a very small possible effect?

I guess what you can do is this: first, you define "clutch" such that all players are equally likely to be clutch regardless of their skill level (so Jeter is clutch if and only if he is clutchier than the cohort of players who hit like him in non-clutch situations).

Then, you run a regression, to predict a player's clutch OBA performance difference from his other stats. Then, check the distribution of the residuals and compute the variance of the residuals.

Repeat the same thing, but this time use the regression equation above as each player's expected clutch rate, and create "fake" binomial data from it.

Compare the variances of the two sets of residuals. Any difference represents clutch talent.
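A bare-bones Python sketch of those steps follows. Everything in it is hypothetical: the player count, the clutch plate appearances, the simulated "real" data (built here with no clutch talent at all), and the use of a single predictor (non-clutch OBA, treated as known exactly) where a real study would use more of a player's stats.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def binomial_rate(p, n, rng):
    """Observed success rate in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n)) / n

def residual_variance(xs, diffs, a, b):
    """Variance of the observed differences around the fitted line."""
    return statistics.pvariance([d - (a + b * x) for x, d in zip(xs, diffs)])

rng = random.Random(7)
N_PLAYERS, CLUTCH_PA = 500, 300   # hypothetical sample sizes

# Stand-in for the real data: no clutch talent, but better hitters gain a
# little OBA in the clutch (for example, from being pitched around).
non_clutch = [rng.uniform(0.300, 0.400) for _ in range(N_PLAYERS)]
real_diff = [binomial_rate(x + 0.1 * (x - 0.350), CLUTCH_PA, rng) - x
             for x in non_clutch]

# Step 1: regress the clutch difference on the player's other stats.
a, b = fit_line(non_clutch, real_diff)
real_var = residual_variance(non_clutch, real_diff, a, b)

# Step 2: use the fitted equation as each player's clutch rate and draw fake data.
fake_diff = [binomial_rate(x + a + b * x, CLUTCH_PA, rng) - x for x in non_clutch]
fake_var = residual_variance(non_clutch, fake_diff, a, b)

# Step 3: whatever variance is left over would be attributed to clutch talent.
excess = real_var - fake_var
print(f"real residual variance: {real_var:.6f}")
print(f"fake residual variance: {fake_var:.6f}")
print(f"excess variance:        {excess:.6f}")
```

Because the stand-in data has no clutch talent built in, the excess should hover around zero here. With real data, a single run of fake data is noisy, so you would want to repeat step 2 many times and see where the real variance falls in that distribution.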

That's complicated, though. Any other suggestions? Guy?

At Monday, August 24, 2009 5:06:00 PM, Guy said...: Phil:
That sounds like it would work. But I think the first step is to determine how much of an overall advantage, if any, power hitters have in the clutch. It may be that they gain a bit more in OBP, but fare less well in BA/SLG, and the net effect (using wOBA or OPS) is no real difference. In that case, I think you could just re-run Ruane's study using wOBA/OPS.

If good hitters DO enjoy a systemic advantage because pitchers elect to walk them, then I would think you either need to use some kind of regression-based expected clutch performance, or stratify your sample by hitter ability and separately study good, average, and weak hitters. But I'd guess the latter method would entail serious sample size problems.


Saturday, August 22, 2009
