Sabermetric Research: A new basketball free throw choking study

Monday, May 30, 2011

A new basketball free throw choking study

"Performance Under Pressure in the NBA," by Zheng Cao, Joseph Price, and Daniel F. Stone; Journal of Sports Economics, 12(3). Ungated copy here (pdf).

------

There's a new paper demonstrating evidence that NBA players choke when shooting free throws under pressure. The link to the paper is above; here's a newspaper article discussing some of the claims.

Using play-by-play data for eight seasons -- 2002-03 to 2009-10 -- the authors find that players' percentage on foul shots goes down significantly late in the game when they're behind. They present their results through a regression, but the effect is easier to see in their summary statistics. Let me show you the trend, which I got by doing some arithmetic on the cells in the paper's Table 1:

In the last 15 seconds of a game, foul shooters hit

.726 when down 1-4 points (922 attempts)
.784 when tied or up 0-4 points (5505)
.776 when up or down 5+ points (4510)

With 16-30 seconds left, foul shooters hit

.748 when down 1-4 points (727 attempts)
.775 when tied or up 0-4 points (2652)
.779 when up or down 5+ points (6174)

With 31-60 seconds left, foul shooters hit

.752 when down 1-4 points (922 attempts)
.742 when tied or up 0-4 points (1634)
.767 when up or down 5+ points (8969)

In all other situations, foul shooters hit about .751 regardless of score differential (400,000+ attempts).

------

Take a look at the first set of numbers, the "last 15 seconds" group. When down 1 to 4 points, it appears that shooters do indeed "choke," shooting almost 2.5 percentage points (.025) worse than normal. In 5+ point "blowout" situations late in the game, they shoot more than 2.5 percentage points *better* than normal.

But neither of these numbers is statistically significantly different from the overall average (which I'm guessing is about .751). The difference of .025 is about 1.7 SDs.

The real statistical significance comes when you compare the "down by 1-4" group, not to the average, but to the "5+ points" group. In that case, the difference is double: the "down 1-4" is .025 below average, and the "5+" group is .025 *above* average. The difference of .050 is now significant at about 3 SDs.
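
Here's a minimal sketch of the arithmetic behind those two SD figures. It treats every foul shot as an independent trial with a fixed success rate and ignores all the regression controls, so it's a back-of-the-envelope check and not the authors' method. The function names are made up for illustration.

```python
import math

def one_sample_z(p_hat, p0, n):
    """SDs between an observed rate p_hat (n attempts) and a fixed rate p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

def two_sample_z(p1, n1, p2, n2):
    """SDs between two independent observed rates (unpooled standard error)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Last 15 seconds, using the rounded figures quoted above from Table 1.
z_vs_average = one_sample_z(0.726, 0.751, 922)          # down 1-4 vs. .751
z_vs_blowout = two_sample_z(0.726, 922, 0.776, 4510)    # down 1-4 vs. 5+

print(f"down 1-4 vs. .751 average: {z_vs_average:.2f} SDs")
print(f"down 1-4 vs. 5+ group:     {z_vs_blowout:.2f} SDs")
```

The second comparison has a slightly larger standard error than the first (both groups are noisy, not just one), but the gap is twice as big, which is why it clears the significance bar.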

UPDATE: The above paragraphs are incorrect in one aspect. Dan Stone, one of the paper's authors, corrected me in the comments. What I didn't notice was that in Table 1, the overall free throw percentage of each group was provided. Those percentages are .779 (down 1-4 group), .795 (up 0-4 group), and .782 (5+ group). So the average for those particular players is more like .787 than .751. All three groups shot below expected, but the "down 1-4" group shot WAY below expected.

So the "down 1-4" group is, on its own, statistically significantly different from expected, without regard to the other two groups. My apologies for not noticing that earlier.
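
As a rough check on that, here's the same kind of calculation against the .779 baseline. It assumes independent shots and treats .779 as a fixed expected rate, which is a simplification (the paper's regression handles this properly).

```python
import math

# Observed .726 on 922 attempts, versus the .779 that Table 1 lists
# as those same shooters' overall percentage.
observed, expected, attempts = 0.726, 0.779, 922

se = math.sqrt(expected * (1 - expected) / attempts)
z = (observed - expected) / se
print(f"shortfall: {observed - expected:+.3f}, or {z:.1f} SDs")
```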

So, if you look only at the last 15 seconds of games, it looks like players down by 1-4 points choke significantly compared to players who are up or down by at least five points.

There are similar (but lesser) differences in the 16-30 seconds group, and the 31-60 seconds group. I haven't done the calculation, but I'm pretty sure you also get statistical significance if you combine the three groups, and compare the "down 1-4" to the "5+" group.
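
For anyone who wants to run the combined calculation, here's a sketch using only the rounded percentages and attempt counts listed earlier in this post. It pools the three time windows by attempts and compares "down 1-4" to "5+" with a simple two-proportion test. The assumptions are mine: independent shots, no controls for shooter or anything else.

```python
import math

# (free throw pct, attempts) for 0-15, 16-30, and 31-60 seconds left,
# copied from the summary numbers earlier in the post.
down_1_to_4 = [(0.726, 922), (0.748, 727), (0.752, 922)]
five_plus = [(0.776, 4510), (0.779, 6174), (0.767, 8969)]

def pool(groups):
    """Combine (pct, attempts) pairs into an attempt-weighted pct and total attempts."""
    attempts = sum(n for _, n in groups)
    makes = sum(p * n for p, n in groups)
    return makes / attempts, attempts

p_down, n_down = pool(down_1_to_4)
p_blow, n_blow = pool(five_plus)

se = math.sqrt(p_down * (1 - p_down) / n_down + p_blow * (1 - p_blow) / n_blow)
z = (p_down - p_blow) / se
print(f"{p_down:.3f} ({n_down}) vs. {p_blow:.3f} ({n_blow}): {z:.1f} SDs")
```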

-------

So that's what we're dealing with: when you compare "down 1-4 late in the game" to "up or down 5+ late in the game", the difference is big enough to constitute evidence of choking. The most obvious explanation is that the foul shooters in the two groups might be different. However, that can't be the case, because the authors controlled for who the shooter was, and the results were roughly the same. Indeed, they controlled for a lot of other stuff, too: whether the previous shot was made or not, which quarter it is, whether it's a single foul shot or multiple, and so on. But even after all those controls, the results are pretty much the way I described them above.

Again, I repeat: the authors (and the data) do NOT say that the "choke" group shoots significantly worse than average. They can only say that the "choke" group is significantly worse than one specific group of players: the "don't care" group, shooting late in the game when the result is pretty much already decided, with a gap of 5+ points.

But this fits in with the authors' thesis: that the higher the leverage of the situation, the more choking you see. They later break down the "5+" group into "5-10" and "11+", and they find that even that breakdown is consistent -- the 11+ group shoots better than the (slightly) higher leverage 5-10 group. Indeed, for most of the study, they compare to "11+" instead of "5+". For some of the regressions, they post two sets of results, one relative to the "5-10 points" group, and one relative to the "11+" group. The "11+" results are more extreme, of course.

-------

As I said, the authors don't present the results the way I did above ... they have a big regression with lots of tables and results and such. The result that comes closest to what I did is the first column of their Table 5. There, they say something like,

"In the last 15 seconds of a game, a player down 1-4 points will shoot 5.8 percentage points (.058) worse than if the game were an 11+ point blowout. The SD of that is 2.1 points, so that's statistically significant at p=.01."

-------

Oh, and I should mention that the authors did try to eliminate deliberate misses, by omitting the last foul shot of a set with 5 seconds or less to go. Also, they omitted all foul shots with less than 5 minutes to go in the game (except those in the last 15/30/60 seconds that they're dealing with). I have absolutely no idea why they did that.

-------

Although the authors do mention the "down 1-4" effect above, it's almost in passing -- they spend most of their time breaking the effect down in a bunch of different ways.

The biggest effect they find is for this situation:

-- shooting a foul shot that's not the last of the set (that is, the first of two, or the first or second of three);
-- in the last 15 seconds of the game;
-- team down exactly one point.

compared to

-- shooting a foul shot that's not the last of the set (that is, the first of two, or the first or second of three);
-- in the last 15 seconds of the game;
-- score difference of 11+ points in either direction.

For that particular situation, the difference is a huge 10.8 percentage points (.108), significant at 2.5 SDs.
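
Since this post quotes significance sometimes in SDs and sometimes as p-values, here's a small sketch for converting between the two. It assumes a normal sampling distribution and a two-sided test; the paper may use something slightly different, so treat the outputs as approximate.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a result z standard deviations from zero,
    assuming a normal sampling distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# 5.8 points with an SD of 2.1 (down 1-4, last 15 seconds), and the
# 2.5 SDs quoted for the down-by-exactly-one case.
for label, z in [("down 1-4", 5.8 / 2.1), ("down exactly 1", 2.5)]:
    print(f"{label}: {z:.2f} SDs, p = {two_sided_p(z):.3f}")
```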

Also: change "down by one point" to "down by two points", and it's a 6.0 percentage point choke. Change "not the last of the set" to "IS the last of the set," and the choke effect is 6.6 points when down by 1, and 6.0 points when down by 2.

This highly specific stuff doesn't impress me that much ... if you look at enough individual cases, you're bound to find some effects that are bigger and some that are smaller. My guess is that the differences between the individual cases and the overall "down 1-4" case are probably random. However, the authors could counter with the argument that the biggest sub-effects were the ones they predicted -- the "down by 1" and "down by 2" case. On the other hand, late performance is actually *better* than blowouts when the score is tied (by around 0.2 points), a finding the authors say they didn't expect.

So my view is that maybe the "1-4 points" result is real, but I'm wary of the individual breakdowns, especially this one. In this situation:

-- last 15 seconds of the game
-- for a visiting team
-- where the most recent foul shot was missed
-- down by 1-4 points

the player is 9.6 percentage points (.096) less likely to make the shot than

-- last 15 seconds of the game
-- for a visiting team
-- where the most recent foul shot was missed
-- score 11+ points in favor of either team.

Despite the large difference in basketball terms, this one's only significant at .05.

------

Another thing about the main finding is ... we actually already knew it. Last year, I wrote about a similar study (which the authors reference) that found roughly the same thing. Here, copied from that other post, are the numbers those researchers found, for all foul shots in the last minute of games, broken down by score differential:

-5 points: -3% [percentage points]
-4 points: -1%
-3 points: -1%
-2 points: -5% (significant at .05)
-1 points: -7% (significant at .01)
+0 points: +2%
+1 points: -5% (significant at .05)
+2 points: +0%
+3 points: -1% ("also significant")
+4 points: +1%
+5 points: -1%

There are some differences in the two studies. The older study controlled for player career percentages, instead of player season percentages. It didn't control for quarter (which is why commenters suspected it might just be late-game fatigue). It didn't control for a bunch of other stuff that this newer study does. And it used only three seasons of data instead of eight.

But the important thing is: the newer study's eight seasons *include* the older study's three seasons. And so, you'd expect the results to be somewhat similar. It's possible that the three significant seasons are enough to make all eight seasons look significant, even if the other five seasons were just average.

Let's try, in a very rough way, to see if we can tease out the new study's result for those other five seasons.

In the older study, if we average the -1, -2, -3, and -4 effects, we get -3.5. So, in the last minute, down by 1-4 points, shooters choked by 3.5 percentage points.

How do we get the same number for the newer study? Well, in the top-left cell of Table 5, we get that, in the last 30 seconds and down by 1-4 points, shooters choked by 3.8 percentage points.

That's our starting point. But the new study's selection criteria are a little different from the old study's, so we need to adjust.

First, the "-3.8" in the new study is from comparing to games in which the point differential is 11 or more. The "5-10" is probably a more reasonable comparison to the previous study. The difference between "11+" and "5-10", at 30 seconds, appears to be about one percentage point (compare the second columns of Tables 3 and 4). So we'll adjust that 3.8 down to 2.8.

Second, the new study is for the last 30 seconds, while the old study is for the last minute. From earlier in this post, we see that the 31-60 difference between the "down 1-4 group" and the "5+" group is only about -1.5 percentage points. Averaging that with the -2.8 from the above step (but giving a bit more weight to the -2.8 because there were more shots there), we get to about -2.4.
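
The two adjustments can be sketched as below. The weights are an assumption: I've used the "down 1-4" attempt counts from the summary numbers earlier in the post, which is one reasonable reading of "a bit more weight." With those weights the result lands within about a tenth of a point of the -2.4 used here; different weights would move it slightly.

```python
# All values in percentage points; inputs are the figures quoted above.
vs_11_plus_last30 = -3.8     # last 30 seconds, down 1-4, relative to 11+ games
baseline_shift = 1.0         # approximate gap between the "11+" and "5-10" baselines
last30 = vs_11_plus_last30 + baseline_shift     # about -2.8, relative to 5-10

last31_60 = -1.5             # 31-60 seconds left, down 1-4 vs. 5+

# Assumed weights: attempts when down 1-4 (922 + 727 in the last 30
# seconds, 922 in the 31-60 second window).
w30, w60 = 922 + 727, 922
last_minute = (last30 * w30 + last31_60 * w60) / (w30 + w60)
print(f"last 30 seconds: {last30:.1f}, last minute: {last_minute:.2f}")
```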

So we can estimate, very roughly, that for the same calculation,

Old study (three seasons): -3.5
New study (eight seasons): -2.4

Let's assume that if the new study had confined itself to only the same three seasons as the older study, it would have come up with the same result (-3.5). In that case, to get an overall average of -2.4, the other five seasons must have averaged -1.74. That's because, if you take five seasons of -1.74, and three seasons of -3.5, you get -2.4.
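
That back-out step is just a weighted average solved for the unknown part. A minimal sketch, assuming every season carries equal weight (the function name is hypothetical):

```python
def implied_remaining(all_avg, known_avg, n_all=8, n_known=3):
    """Average over the remaining seasons, given the overall average and
    the average over a known subset, with every season weighted equally."""
    return (n_all * all_avg - n_known * known_avg) / (n_all - n_known)

other_five = implied_remaining(all_avg=-2.4, known_avg=-3.5)
print(f"{other_five:.2f}")
```

If the seasons had very different numbers of late-and-close foul shots, equal weighting would be off, but for a rough estimate it should be close enough.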

So, as a rough guess, the new study found:

-3.5 -- same three seasons as the old study;
-1.7 -- five seasons the old study didn't cover;
-------------------------------------------------
-2.4 -- all eight seasons combined.

So, in the new data, this study finds only half the choke effect that the other study did. Moreover, I estimate it's only 1 SD from zero.

-------

That's for "down 1-4 points." Here's the same calculation, broken down by individual score. Here "%" means percentage point difference:

-2 points: First three: -5%. Next five: -2.3%. All eight: -3.3%.
-1 points: First three: -7%. Next five: -1.7%. All eight: -3.7%.
+0 points: First three: +2%. Next five: -2.8%. All eight: -1.0%.
+1 points: First three: -5%. Next five: -0.5%. All eight: -2.2%.
+2 points: First three: +0%. Next five: -2.1%. All eight: -1.3%.

Generally, the five new seasons are closer to zero than the three original seasons. That's what you would expect if the original numbers were mostly luck.

-------

So, in summary:

-- The study finds that in the last seconds of games, players behind in close games shoot significantly worse than in blowouts.

-- In the last 30 seconds, they're maybe about 2.8 percentage points worse. In the last 15 seconds, they're maybe about 4.8 percentage points worse.

-- The effect is biggest when down by 1 in the last fifteen seconds.

-- However, they are not statistically significantly better or worse than *average,* just statistically significantly worse than blowouts (although they certainly are "basketball significantly" worse than average).

-- The effect is mostly driven by the three seasons covered in the earlier study. If you look at the other five seasons, the effect is not statistically significant (but still has the same sign).

What do you think? I'm not absolutely convinced there's a real effect overall, but yeah, it seems like it's at least possible.

However, I do think the most extreme individual breakdowns are overstated. For instance, the newspaper article says that in the last 15 seconds, down by 1, players will shoot "5 to 10 percentage points worse than normal." (They really mean "worse than 11+ blowouts," but never mind.) Given that that's the most extreme result the study found, I think it's probably a significant overestimate. I'd absolutely be willing to bet that, over the next five seasons, the observed effect will be less than five percentage points.

--------

P.S. One last side point: the newspaper article says,

"Shooters who average 90 percent from the line performed slightly better than that under pressure, while 60 percent shooters had a choking effect twice as great as 75 percent shooters. That suggests that a lack of confidence begets less confidence, and vice versa."

This is a correct summary of what the authors say in their discussion, but I think it's wrong. The regressions that this comes from (Table 5, columns 2 and 6) don't include an adjustment for the player. So what it really means is that the 60 percent shooter will be *twice as far below the average player* as the 75 percent shooter. That makes sense -- because he's a worse shooter to begin with, even before any choke effect.

------

UPDATE: After posting this, I realized that I may have missed one aspect of the regression ... but I think my analysis here is correct if I make one additional assumption that's probably true (or close to true). I'll clarify in a future post.

Labels: basketball, choking, clutch, free throws

18 Comments:

At Tuesday, May 31, 2011 9:34:00 AM, Mike Harris said...: Any study like this that doesn't control for career percentage is immediately ignored by me.

Teams that are up 2 (for example) with 15 seconds left are expecting to get fouled, and will attempt (and usually succeed) to get the ball to their best free throw shooter. This effect will dwarf all others.
At Tuesday, May 31, 2011 11:15:00 AM, Phil Birnbaum said...: Mike,

They controlled for season percentage.
At Tuesday, May 31, 2011 3:12:00 PM, Michael K said...: Couple of other thoughts:

Do they control for post-timeout ("ice the shooter") situations?

Do they compare home-away splits (e.g. to tease out the effects of crowd antics behind the baskets)?
At Tuesday, May 31, 2011 5:17:00 PM, Phil Birnbaum said...: Timeouts is a good point ... I thought of that, but couldn't think of a reason that "down by 1" would be different that way than "up by 1".

They do control for home/road.
At Tuesday, May 31, 2011 5:26:00 PM, dan said...: Thanks for giving our paper such a careful read! I think you make some great points - in particular noting the importance of the baseline score difference (11+ vs 5-10) and that our results are ultimately quite similar to those of Worthy et al. I've got to push back on a couple things though of course. First, in your analysis of Table 1 I believe you conclude players don't shoot significantly worse than avg when down 1-4 in last 15 sec by comparing .726 to .751. But I think it would be more appropriate to compare the .726 to the 'normal' for those shooters - which is also given in Table 1, the var FTPct, which is 0.779 for down1-4 in last 15. So that indicates rough choke effect of 5 pct pts (seems better than avg shooters are likely 'selected' in those situations). Also, we control for last 15/30/60 sec with dummy var, in addition to quarter (i'm pretty sure you know this, but wasn't clear in post). And we find effects of going from non-last 15 sec to last 15 sec are small when score margin is 11+ (1 pct pt i think, 2 max). So when we find decline of 9 pct pts from 11+ score to down 1 score in last 15 sec, that means decline of 7-8 pct pts when down 1 in last 15 sec vs 11+ score in non-last 15 sec. So this is why i think the newspaper line you referred to ("5 to 10 percentage points worse than normal.") is reasonable. I also think the effect of whether previous shot is missed or not, and the specific score diff results are interesting and 'real'. I like the Worthy theory for why choking disappears when game is tied (that's the only situations shooters are thinking about going 'for win' as opposed to 'avoiding loss'); I think we should have emphasized this more. The loss aversion thing I mentioned in Oregonian piece just recently occurred to me. Anyway.. thanks again very much for your feedback!

Michael K - nope, don't control for icing (there is a paper on that for field goal kickers tho, i believe, by a psychologist at uc-san diego maybe?), and yes, we certainly look at home-away, and find that barely matters. i think it's because pressure is higher at home, but distractions greater on road, and these cancel out.
At Tuesday, May 31, 2011 5:32:00 PM, Phil Birnbaum said...: Hi, Dan,

Right, I didn't notice the FTPct column in Table 1. Absolutely, they shoot significantly worse than average in that situation.

Sorry about that, I should have noticed the first time.

I'll take a look at your other points soon, and update this post or prepare another one.

Thanks!
At Tuesday, May 31, 2011 5:59:00 PM, Michael K said...: What is the data from the new paper for (just) "down by 1" and "up by 1"? Your table from last year's study shows "down by 1" at -7% and "up by 1" at -5%. Which would be consistent with an effect that occurs in both situations.

That still doesn't explain why "up by 1" and "down by 1" would be different than "tied". But I'm not sure the "choke" explanation explains that either.
At Tuesday, May 31, 2011 6:31:00 PM, Phil Birnbaum said...: Mike,

Check the table above where "next five" is bolded.
At Tuesday, May 31, 2011 6:33:00 PM, Phil Birnbaum said...: Dan,

>"Also, we control for last 15/30/60 sec with dummy var..."

Yes, I did notice that ... I mentioned the quarter adjustment in the post.

I assume that the "lastX" and "up/down Y" coefficients by themselves are close to zero, even though you don't give them in the paper? That was my "extra assumption" at the end of my post.
At Tuesday, May 31, 2011 6:39:00 PM, Phil Birnbaum said...: >"So this is why i think the newspaper line you referred to ("5 to 10 percentage points worse than normal.") is reasonable."

If I understand, what you're saying is that "5 to 10 percentage points worse than normal" and "5 to 10 percentage points worse than 11+" are both correct, because of data not published in the paper.

OK, fair enough.

BTW, the main reason I'm skeptical that the "5 to 10 points" is real is that it's the biggest effect you found -- and the biggest effect is usually the least reliable.

Not that I'm disputing your coefficient -- I'm just saying I bet the coefficient is an overestimate of the real life effect.
At Tuesday, May 31, 2011 7:47:00 PM, dan said...: Mike - here are my thoughts again on lack of choking when tied:

I like the Worthy theory for why choking disappears when game is tied (that's the only situation shooters are thinking [mainly] about going 'for win' as opposed to 'avoiding loss') [which reduces pressure substantially]

Phil - just to be clear, the control for last 15 sec is distinct from quarter control. Minutes played would be an even better control for fatigue - which we don't use. But I'm confident this isn't an issue as choke effects are so much higher in last 30 sec as compared to last 31-60. Implausible that fatigue causes that.

Your understanding of my point on the 5-10 effect is correct. And I see your point on being skeptical of it since it's the largest. But as you note it is supported by theory - we expect pressure is greatest down 1, final seconds. So this isn't data mining/cherry picking (at least not in the extreme)

Thanks again for comments. Fyi i am excited about my hot hand paper as it bucks the anticonventional wisdom on that topic. Here's a link- http://people.oregonstate.edu/~stonedan/hothand.pdf
Please don't hesitate to share thoughts
At Tuesday, May 31, 2011 7:52:00 PM, Phil Birnbaum said...: Thanks, Dan! Perhaps cherry picking is not really the best way I could have phrased it ... my point is just that the most extreme finding should be treated with skepticism (at least wrt the "true" value of the coefficient), unless you have strong prior reason to expect it.

As we agree, there was at least some reason beforehand, so the point is well taken.
At Tuesday, May 31, 2011 7:54:00 PM, Phil Birnbaum said...: BTW, Dan, do you have handy the coefficients for "down1" and "last15"? Is it OK that I assumed they're close to zero? Otherwise, the "last15 X down1" interaction needs to be interpreted in the light of those other two.
At Tuesday, May 31, 2011 8:01:00 PM, dan said...: Yes, sorry, meant to address this in my last comment. On p.10 of ungated version we say 'Their coefficients are almost always less than 0.01 and insignificant' (referring to non-interacted Last and Score vars). I hope that's good enough, I don't have exact numbers. :) I remember looking them over and confirming none were too large (i think largest magnitude was 0.02, tops)
At Tuesday, May 31, 2011 8:03:00 PM, Phil Birnbaum said...: Ah, there it is ... thanks! I should have noticed it earlier when I read the paper.
At Wednesday, June 01, 2011 12:39:00 PM, Guy said...: This is very interesting data, and does seem to suggest a "choking" effect. However, I find the dismissal of the tied game result unconvincing, especially if -- as the prior study seems to show -- there is a substantial choking effect at +1. It's also noteworthy that from 30 to 60 seconds the choke effect is apparently the same for -1/-4 and 0/+4. If choking is limited to trailing teams, why don't we see the same difference here? So I don't think the data yet supports a conclusion that choking occurs only when a team is trailing.

More generally, the categories of -1/-4 and 0/+4 seem potentially contrived to fit the outcomes in this particular dataset, rather than resting on a firm theoretical footing. I'd like to see score/time organized by the leverage/importance of the shot. This is easy to calculate: the probability of winning if the FT is made minus win probability if FT is missed. It will not be the case that all shots when team is trailing are more important than all shots when leading (for example, shot importance in last 15 seconds at +1 is certainly greater than at -4). Then, let's see how choking correlates with the objective importance of the FTA. My guess is the authors would find a correlation overall, despite the results at tie score. And that would be much more convincing evidence of a choke effect, I think.
At Wednesday, June 08, 2011 4:25:00 PM, johnr said...: Did they control for the times when the team trails in a close game and intentionally misses the free throw to try to get the rebound and an opportunity for two points?
At Wednesday, June 08, 2011 4:34:00 PM, Phil Birnbaum said...: Johnr,

Yes, they did. They eliminated all final FTs with 5 seconds or less and 2 pts down.
