Sabermetric Research: Clutch hitting: a new study from Pete Palmer and Dick Cramer

Wednesday, July 02, 2008

Clutch hitting: a new study from Pete Palmer and Dick Cramer

Of the many excellent presentations at last weekend's SABR convention in Cleveland, one of my favorites was the study by Pete Palmer and Dick Cramer, on clutch hitting. I have to admit that the subject has been done to death (notably by Palmer and Cramer themselves). And there are probably a lot of people like Chris Jaffe, who is "sooooooo very tired of clutch hitting studies."

So this study could be accused of beating a dead horse – other studies, I think, have already convincingly shown that clutch talent doesn't exist – but, on the other hand, on a controversial issue like clutch, you can never have too much evidence.

More important, the highly regarded "The Book" (along with a previous study by co-author Andy Dolphin) does find some evidence for clutch. So the debate isn't completely settled.

That's why I think this study does add valuable evidence to the pile.

Anyway, many thanks to Pete and Dick, who have allowed me to post their presentation slides, and two writeups of their findings.

------

Let me start with a recap of my three favorite classic clutch studies, before getting to the new one.

(I will also point out that "clutch hitter" doesn't mean a player who hits well in the clutch – it means a hitter who performs *better* in clutch situations than normal, relative to the rest of the league.)

Dick Cramer, 1977

First, there was Dick Cramer's groundbreaking study from 1977. Dick looked at all players in the 1969 and 1970 seasons. He figured the amount by which they increased their team's win probabilities over the season, and compared that to what you'd expect from a raw measure of run performance based on their batting line. The difference was their observed clutchness; a clutch player would have created more wins than his raw batting statistics suggest, because his hits would have come when they were more important.

Comparing 1969 clutchness to 1970 clutchness, Dick found an r-squared of .038 for National League players, and .055 for American League players. Dick's conclusion was that, with such a small correlation, clutch hitting was not shown to exist.

As I write this, it now occurs to me that these aren't actually that small: the r is roughly +/- 0.2 in both cases (the square roots of .038 and .055 are about .19 and .23). The study doesn't actually say if the correlation was positive or negative (Dick, if you're reading this, which was it?). Of course, if it had been a negative correlation, that would be stronger evidence against clutch talent.

It's this study, I think, that Bill James criticizes in his famous "Underestimating the Fog" essay (.pdf). Bill argues that Dick didn't actually prove clutch hitting doesn't exist. That's probably true, but it's pretty good evidence that, if it does exist, it's weak. Assuming the correlation is positive, it means that even a player who hit 100 points better in the clutch in 1969 would be expected to hit only 20 points better in the clutch in 1970.
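The "100 points becomes 20 points" arithmetic can be sketched as follows. This is only an illustration: it assumes the correlation is positive, that both seasons have the same spread of clutch performance, and that the expected carryover is simply r times the observed edge. The function name is made up for this example.

```python
import math

def expected_carryover(r_squared, observed_edge):
    """Expected next-season clutch edge, assuming a positive correlation
    of sqrt(r_squared) and equal spread in both seasons."""
    r = math.sqrt(r_squared)
    return r * observed_edge

# r-squared values quoted from the 1977 study
for league, r2 in (("NL", 0.038), ("AL", 0.055)):
    r = math.sqrt(r2)
    carry = expected_carryover(r2, 100)
    print(f"{league}: r = {r:.3f}, a 100-point edge regresses to about {carry:.1f} points")
```

With r near 0.2, roughly four-fifths of any observed clutch edge is expected to vanish the following year.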

Pete Palmer, 1990

In the March 1990 issue of By the Numbers (.pdf, see page 6), Pete Palmer tackled the question a different way. He noted that even if there were no such thing as clutch talent, some players would *appear* to be clutch just because of dumb luck. He then figured what the distribution should be if it were all just luck, and compared it to the actual distribution.

If the two were the same, that would be evidence that clutch hitting is nothing more than random chance. If the two were different, that would show that clutch talent actually exists, over and above the random effect.

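Consider the analogy of coin flips. A fair coin would land eight consecutive heads 1 time in 256. A "clutch" coin with a .600 heads average would do it about 4.3 times in 256, more than four times as often. So if 10% of coins were clutch, eight-head streaks would turn up about a third more often than chance alone predicts (roughly 1.33 times in 256 instead of 1).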

So clutch hitting talent would certainly show itself if it existed in any significant quantity.
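The coin analogy is easy to check numerically. The sketch below assumes each coin is flipped exactly eight times and that 10% of the coins land heads 60% of the time while the rest are fair; the names are illustrative and not taken from Pete's study.

```python
def streak_prob(p_heads, n_flips=8):
    """Probability that n_flips independent flips all come up heads."""
    return p_heads ** n_flips

fair = streak_prob(0.5)            # ordinary coin
clutch = streak_prob(0.6)          # hypothetical "clutch" coin
mixed = 0.9 * fair + 0.1 * clutch  # population with 10% clutch coins

for label, p in (("fair coin", fair), ("clutch coin", clutch), ("90/10 mix", mixed)):
    print(f"{label}: {p * 256:.2f} eight-head streaks per 256 tries")
```

Under these assumptions a single .600 coin produces the streak about 4.3 times per 256 tries, and the 90/10 mix about 1.33 times, so the excess over pure chance is what a large enough sample would reveal.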

But when Pete looked at the distribution of how players hit in the clutch, he found it was perfectly consistent with a normal distribution. For instance, out of 330 random numbers from a normal distribution, you'd expect about one of those to be more than 3 SD above or below the mean. In real life, there was indeed exactly one: Tim Raines (.352 clutch, .296 non-clutch).

If clutch hitting were indeed a real skill, there would be a lot more than just one player 3 SD from the mean.

Because Pete found no more extreme results than would be expected by chance, his conclusion was that clutch hitting didn't appear to exist.
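The "about one player in 330" expectation comes straight from the normal tail area. A minimal sketch, assuming clutch differences are exactly normal and independent across players (the function names are illustrative):

```python
import math

def two_tailed_prob(z):
    """P(|Z| > z) for a standard normal variable."""
    return math.erfc(z / math.sqrt(2))

def expected_extremes(n_players, z):
    """Expected number of players more than z SD from the mean, by luck alone."""
    return n_players * two_tailed_prob(z)

for z in (2, 3):
    print(f"beyond {z} SD: {expected_extremes(330, z):.2f} of 330 players expected")
```

For 330 players this gives a little under one player beyond 3 SD, which is why a single Tim Raines is about what pure luck would produce.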

Tom Ruane, 2005

In this exhaustive review of a few decades' worth of Retrosheet data, Tom Ruane looks at all players' clutch hitting stats, runs a random simulation as if they were all non-clutch hitters, and finds the distributions match almost exactly. (The relevant section can be found by going to the study and searching for "Is The Data Random?")

His analysis is very similar to Pete's 1990 work, but with a much larger database.

Cramer and Palmer, 2008

Finally, we come to Cramer and Palmer's new study. It's a bit of a cross between Dick's 1977 study and Tom's study: it looks at 50 years' worth of Retrosheet data, but uses the "win probabilities" method.

And there are several sub-studies within it.

The first study calculated clutch performance for each of ten levels of leverage – highest clutch, with the game most on the line, all the way down to lowest clutch, with little chance of changing the outcome (like in a 15-0 ninth inning). Then, it calculated performance for 10 different random subsamples of the games (based on the date).

Comparing the two distributions, it turned out that the distribution of "clutchiness" was almost exactly the same as the distribution of "datiness". Since datiness is random, this suggests that clutchiness is no less random.

The other substudies were:

-- looking at only the 10% highest-leverage situations, there were almost exactly as many players 2 SD and 3 SD away from the mean as if clutch were random;

-- looking at clutch performance for the 897 players with at least 3000 PA in the last 50 years, the SD was about 3 runs of clutch per 500 PA. A random simulation gave 2.5 runs. Pete and Dick write that "it may be that real life variation could be a little different from the simulated value, but the two are pretty close." My take is that the 0.5 runs is fairly significant – it means the SD of clutch would be about 1.66 runs (the square root of (3 squared minus 2.5 squared)). Still, that means the top 2.5% of players would only be about 3 runs better than average.

-- rerunning Dick's year-to-year correlation experiment gave an r of .002, which is very, very close to zero, both theoretically and practically.

-- finally, for rookies first entering the league, there was no improvement from their first at-bat (when they would presumably be very nervous) to their 100th at-bat (when they should be less nervous). While this doesn't speak to the clutch issue directly, it does serve as more evidence that players' performance doesn't seem to be affected by their personal stress level.
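The 1.66-run figure in the second substudy comes from subtracting variances. Here is a small sketch of that step, assuming talent and luck are independent so that observed variance is the sum of the two; the function name is made up for illustration.

```python
import math

def talent_sd(observed_sd, luck_sd):
    """SD of true talent implied by observed and luck-only SDs,
    assuming observed variance = talent variance + luck variance."""
    diff = observed_sd ** 2 - luck_sd ** 2
    if diff < 0:
        raise ValueError("observed spread is smaller than luck alone would give")
    return math.sqrt(diff)

sd = talent_sd(3.0, 2.5)  # runs per 500 PA, as quoted from the study
print(f"implied clutch talent SD: {sd:.2f} runs per 500 PA")
print(f"two SDs above average:    {2 * sd:.2f} runs per 500 PA")
```

Because the subtraction is done on variances, a small error in either SD moves the implied talent figure quite a bit, which is one reason to treat the 1.66 as a rough estimate.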

-----

There are lots of other studies on clutch hitting that I haven't mentioned here; Cy Morong keeps an updated list of them. As I mentioned, Andy Dolphin did find evidence of significant clutch talent. "The Book," which Dolphin co-authored, found evidence of clutch talent with an SD equivalent to about 8 points of OBP. Those are the only studies I remember seeing that actually found something non-zero.

I'd be interested in seeing what Dolphin (and co-authors Tom Tango and Mitchel Lichtman) think about Pete and Dick's recent work.

-----

While I'm on the subject of clutch, when Bill's "Underestimating the Fog" came out a couple of years ago, I responded with a study of my own. Bill disagreed with what I did, and we had a bit of a discussion on the SABR message board. I'm linking to it here, because I don't think it's online anywhere else.

-- Here's Bill's original "Underestimating the Fog" essay (pdf).
-- Here's my response in "By the Numbers" (pdf, see page 7).
-- Here's Bill's response to me, called "Mapping the Fog" (pdf).
-- And here's my response to Bill's response (pdf).

Labels: baseball, Bill James, clutch, SABR 38

17 Comments:

At Wednesday, July 02, 2008 9:44:00 PM, Anonymous said...: Phil

Thanks for posting all of this. James' "Mapping the Fog" link is not working.

Cy Morong
At Wednesday, July 02, 2008 9:48:00 PM, Phil Birnbaum said...: Oops, sorry about the bad link. Now fixed. Thanks, Cy!
At Thursday, July 03, 2008 11:33:00 AM, Phil Birnbaum said...: I added the words "relative to the rest of the league" to my explanation of clutch hitting. That's because players, overall, hit differently in "clutch" situations due to different pitching, defense covering runners on first base, and so on.

Thanks to Mike Emeigh for pointing out the omission.
At Thursday, July 03, 2008 1:22:00 PM, Anonymous said...: "for instance, out of 210 random numbers from a normal distribution, you'd expect about one of those to be more than 3 SD above the mean."

Phil, shouldn't that be something like 1 out of 741 (3 SD)? 99.73% of the normal curve is within 3 SD. The two tails together are .0027, or 1 out of 370.4. You are referring to one of those tails, or 1 out of 741. No?

MGL
At Thursday, July 03, 2008 5:02:00 PM, Phil Birnbaum said...: mgl: absolutely, my mistake ... I misquoted the Palmer study. There were 330 players, not 210, and one player 3 SD above *or below* the mean. That's 1 in 660, which is close to 1 in 741.

Sorry about that ... will fix.
At Thursday, July 03, 2008 5:15:00 PM, Phil Birnbaum said...: What I meant to say is that counting both tails doubles 1 in 741 to about 1 in 370, which is close to 1 in 330.
At Monday, July 07, 2008 8:46:00 PM, Don Coffin said...: Hi, Phil. I had somehow missed your BTN piece, which I really like.

One comment on it. On p. 8, you find that 12 of the 14 year-to-year correlations are positive, and you comment "I have no explanation for why...the correlation is positive in 12 of the 14 years."

Assume that the year-to-year relationship is, in fact, random. Then we'd actually expect about half the correlations to be positive, and half to be negative. So we could use the binomial distribution to calculate the probability of 12 of 14 observations being positive, when there's a 50% chance of each observation being positive. I don't have a binomial table handy, but it'd be easy to check.

If that probability is low, then it's evidence of weak persistence of clutch (and non-clutch) performance.
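For reference, the binomial check described above takes only a few lines. This sketch assumes the 14 correlations are independent coin flips, which may not hold exactly, since consecutive season pairs share a season of data. The function name is illustrative.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

one_sided = prob_at_least(12, 14)
two_sided = 2 * one_sided  # valid here because p = 0.5 makes the distribution symmetric

print(f"P(12 or more positive out of 14): {one_sided:.4f}")
print(f"Two-sided equivalent:            {two_sided:.4f}")
```

The one-sided probability works out to 106 in 16,384, or a bit under 1%.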

One way to check that would be to update things through the 2007 season and see if positive correlations continue to be the rule, or whether the data are now showing a greater mix of positive and negative correlations.
At Monday, July 07, 2008 9:22:00 PM, Don Coffin said...: Having now read through everything, let me add something.

For us to have evidence that clutch performance exists, the magnitude of "clutch performance" has to be large enough to be observable--and meaningful. Suppose the (average) magnitude of positive "clutch performance" is 5 points of OPS. Remember, I'm assuming that there is a real, positive effect. Is it large enough to be meaningful?

The major league mean OPS, to date in 2008, is about 0.740. Assume that 20% of plate appearances are "clutch," and 80% are not, and that OPSClutch = (OPSNonClutch + .005). If I can still do arithmetic, then OPSNonClutch = 0.739 and OPSClutch = 0.744. I don't think there is any reasonable test that's going to identify this consistently.

What if the differential is 50 points of OPS? Then, if overall OPS = .740, OPSNonClutch = .730 and OPSClutch = .780. That's an elephant. And no one has ever seen that elephant, so far as I can determine.

(By the way, this is all very sensitive to assumptions about what percentage of plate appearances is "clutch." If it's 10%, then the differentials between "clutch" and "non-clutch" OPS become correspondingly smaller, and harder to observe in what are, by everyone's admission, noisy data.)
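The arithmetic in this comment can be reproduced with a short sketch. It assumes overall OPS is a simple PA-weighted average of clutch and non-clutch OPS, which is an approximation, and the function name is hypothetical.

```python
def split_ops(overall, clutch_share, gap):
    """Return (non_clutch_ops, clutch_ops) given overall OPS, the share of
    plate appearances that are clutch, and the clutch minus non-clutch gap.
    Assumes overall OPS is a PA-weighted average of the two."""
    non_clutch = overall - clutch_share * gap
    return non_clutch, non_clutch + gap

for share in (0.20, 0.10):
    for gap in (0.005, 0.050):
        non_clutch, clutch = split_ops(0.740, share, gap)
        print(f"share={share:.0%} gap={gap:.3f}: "
              f"non-clutch {non_clutch:.4f}, clutch {clutch:.4f}")
```

Changing `clutch_share` shows how the assumed fraction of clutch plate appearances moves both figures relative to the overall .740.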

So what's a "reasonable" expectation of a clutch/nonclutch differential? It has to be large enough to matter and large enough to be observable.

To contend that "clutch performance" exists, but it's not large enough to observe in the data may simply be another way of saying that it exists, but is too small to make a practical difference.
At Monday, July 07, 2008 9:27:00 PM, Don Coffin said...: And a final comment.

If "clutch performance" exists, why should it be limited to hitters only? What if (some) pitchers also possess "clutchiness"? And suppose managers know who these pitchers are? Wouldn't there be a tendency for managers to try to use their "clutch" pitchers against the other guy's "clutch" hitters? (There's a way to test this, btw. Ask "informed observers" to make lists of "clutch" hitters and "clutch" pitchers. Then see if "clutch" hitters are more likely than non-"clutch" hitters to bat against "clutch" pitchers--and, of course, conversely.)

If managers tend to match "clutch" with "clutch", isn't there a high probability that "clutchiness" cancels out? And, if it cancels out, how could we ever possibly observe it, or verify its existence? The whole thing becomes incredibly metaphysical.
At Monday, July 07, 2008 10:34:00 PM, Phil Birnbaum said...: Hi, Doc,

Yes, you could check to see if 12 out of 14 is significant binomially ... but if you look at the numbers themselves, they aren't very big. The one-tailed significance levels of the 12 positives are:

.43, .18, .20, .22, .48, .3, .4, .19, .41, .48, .16, and .33.

It would be strange indeed if the effect was strong enough to give 12 out of 14, but not strong enough that even one of those 12 turns out close to significance.

So I'm not sure what to think.
At Monday, July 07, 2008 10:39:00 PM, Phil Birnbaum said...: >"To contend that "clutch performance" exists, but it's not large enough to observe in the data may simply be another way of saying that it exists, but is too small to make a practical difference."

Agreed. The problem is that advocates of clutch hitting talk as if clutch hitting talent can make a big difference, but when you show them the results, they ignore the practicalities and triumphantly point out that you haven't proven that it doesn't exist at all.

Clutch is something that makes people feel good, and many resist any effort to understand it analytically.
At Monday, July 07, 2008 10:45:00 PM, Phil Birnbaum said...: >'What if (some)pitchers also possess "clutchiness"?'

I discussed this with (I think) Tango, on his blog, at one point, and we came to the conclusion that pitchers DO pitch better in the clutch. The reason being that they have to pace themselves throughout the game, and they consciously turn it on (by throwing harder, say) in critical situations.

Batters have no need to do that, of course -- they can "bat hard" all the time.

Are there pitchers who are "clutchier" even than that? It depends what you mean by clutch. Is there something to it beyond deliberately throwing better? How do you differentiate *deliberately* changing your approach from *subconsciously* changing your approach?
