Monday, May 30, 2011

A new basketball free throw choking study

"Performance Under Pressure in the NBA," by Zheng Cao, Joseph Price, and Daniel F. Stone; Journal of Sports Economics, 12(3). Ungated copy here (pdf).

------

There's a new paper demonstrating evidence that NBA players choke when shooting free throws under pressure. The link to the paper is above; here's a newspaper article discussing some of the claims.

Using play-by-play data for eight seasons -- 2002-03 to 2009-10 -- the authors find that players' foul-shooting percentage drops significantly late in the game when their team is behind. They present their results through a regression, but the pattern is easier to see in their summary statistics. Here's the trend, which I got by doing some arithmetic on the cells in the paper's Table 1:

In the last 15 seconds of a game, foul shooters hit

.726 when down 1-4 points (922 attempts)
.784 when tied or up 0-4 points (5505)
.776 when up or down 5+ points (4510)

With 16-30 seconds left, foul shooters hit

.748 when down 1-4 points (727 attempts)
.775 when tied or up 0-4 points (2652)
.779 when up or down 5+ points (6174)

With 31-60 seconds left, foul shooters hit

.752 when down 1-4 points (922 attempts)
.742 when tied or up 0-4 points (1634)
.767 when up or down 5+ points (8969)

In all other situations, foul shooters hit about .751 regardless of score differential (400,000+ attempts).

------

Take a look at the first set of numbers, the "last 15 seconds" group. When down 1 to 4 points, it appears that shooters do indeed "choke," shooting almost 2.5 percentage points (.025) worse than normal. In 5+ point "blowout" situations late in the game, they shoot more than 2.5 percentage points *better* than normal.

But neither of these numbers is statistically significantly different from the overall average (which I'm guessing is about .751). The difference of .025 is about 1.7 SDs.

The real statistical significance comes when you compare the "down by 1-4" group, not to the average, but to the "5+ points" group. In that case, the difference is double: the "down 1-4" is .025 below average, and the "5+" group is .025 *above* average. The difference of .050 is now significant at about 3 SDs.


UPDATE: The above paragraphs are incorrect in one respect. Dan Stone, one of the paper's authors, corrected me in the comments. What I didn't notice was that Table 1 also provides the overall free throw percentage of each group. Those percentages are .779 (down 1-4 group), .795 (up 0-4 group), and .782 (5+ group). So the expected average for those particular players is more like .787 than .751. All three groups shot below expected, but the "down 1-4" group shot WAY below expected.

So the "down 1-4" group is, on its own, statistically significant from expected, without regard to the other two groups. My apologies for not noticing that earlier.

So, if you look only at the last 15 seconds of games, it looks like players down by 1-4 points choke significantly compared to players who are up or down by at least five points.

There are similar (but lesser) differences in the 16-30 seconds group, and the 31-60 seconds group. I haven't done the calculation, but I'm pretty sure you also get statistical significance if you combine the three groups, and compare the "down 1-4" to the "5+" group.
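
For what it's worth, here's a back-of-the-envelope version of that combined calculation, using only the attempt-weighted percentages quoted above. It's just a sketch -- it ignores all the controls discussed below -- but it comes out to roughly 3.4 SDs:

```python
from math import sqrt

def pool(groups):
    """Attempt-weighted percentage for a list of (pct, attempts) pairs."""
    made = sum(p * n for p, n in groups)
    n = sum(n for _, n in groups)
    return made / n, n

# "down 1-4" across the last-15, 16-30, and 31-60 second windows
p1, n1 = pool([(0.726, 922), (0.748, 727), (0.752, 922)])
# "up or down 5+" across the same three windows
p2, n2 = pool([(0.776, 4510), (0.779, 6174), (0.767, 8969)])

se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(p1, p2, (p2 - p1) / se)   # roughly .74 vs .77, about 3.4 SDs apart
```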

-------

So that's what we're dealing with: when you compare "down 1-4 late in the game" to "up or down 5+ late in the game", the difference is big enough to constitute evidence of choking. The most obvious explanation is that the foul shooters in the two groups might be different. However, that can't be the case, because the authors controlled for who the shooter was, and the results were roughly the same. Indeed, they controlled for a lot of other stuff, too: whether the previous shot was made or not, which quarter it is, whether it's a single foul shot or multiple, and so on. But even after all those controls, the results are pretty much the way I described them above.

Again, I repeat: the authors (and the data) do NOT say that the "choke" group shoots significantly worse than the overall average. They can only say that the "choke" group is significantly worse than one specific group of players: the "don't care" group, shooting late in the game when the result is pretty much already decided, with a gap of 5+ points. (Though, per the update above, the "choke" group does also shoot significantly worse than its own expected percentage.)

But this fits in with the authors' thesis: that the higher the leverage of the situation, the more choking you see. They later break down the "5+" group into "5-10" and "11+", and they find that even that breakdown is consistent -- the 11+ group shoots better than the (slightly) higher leverage 5-10 group. Indeed, for most of the study, they compare to "11+" instead of "5+". For some of the regressions, they post two sets of results, one relative to the "5-10 points" group, and one relative to the "11+" group. The "11+" results are more extreme, of course.

-------

As I said, the authors don't present the results the way I did above ... they have a big regression with lots of tables and results and such. The result that comes closest to what I did is the first column of their Table 5. There, they say something like,

"In the last 15 seconds of a game, a player down 1-4 points will shoot 5.8 percentage points (.058) worse than if the game were an 11+ point blowout. The SD of that is 2.1 points, so that's statistically significant at p=.01."

-------

Oh, and I should mention that the authors did try to eliminate deliberate misses, by omitting the last foul shot of a set with 5 seconds or less to go. Also, they omitted all foul shots with less than 5 minutes to go in the game (except those in the last 15/30/60 seconds that they're dealing with). I have absolutely no idea why they did that.

-------

Although the authors do mention the "down 1-4" effect above, it's almost in passing -- they spend most of their time breaking the effect down in a bunch of different ways.

The biggest effect they find is for this situation:

-- shooting a foul shot that's not the last of the set (that is, the first of two, or the first or second of three);
-- in the last 15 seconds of the game;
-- team down exactly one point.

compared to

-- shooting a foul shot that's not the last of the set (that is, the first of two, or the first or second of three);
-- in the last 15 seconds of the game;
-- score difference of 11+ points in either direction.

For that particular situation, the difference is a huge 10.8 percentage points (.108), significant at 2.5 SDs.

Also: change "down by one point" to "down by two points", and it's a 6.0 percentage point choke. Change "not the last of the set" to "IS the last of the set," and the choke effect is 6.6 points when down by 1, and 6.0 points when down by 2.

This highly specific stuff doesn't impress me that much ... if you look at enough individual cases, you're bound to find some effects that are bigger and some that are smaller. My guess is that the differences between the individual cases and the overall "down 1-4" case are probably random. However, the authors could counter with the argument that the biggest sub-effects were the ones they predicted -- the "down by 1" and "down by 2" cases. On the other hand, late-game performance when the score is tied is actually *better* than in blowouts (by around 0.2 percentage points), a finding the authors say they didn't expect.

So my view is that maybe the "1-4 points" result is real, but I'm wary of the individual breakdowns. I'm especially wary of this one. In this situation:

-- last 15 seconds of the game
-- for a visiting team
-- where the most recent foul shot was missed
-- down by 1-4 points

the player is 9.6 percentage points (.096) less likely to make the shot than

-- last 15 seconds of the game
-- for a visiting team
-- where the most recent foul shot was missed
-- score 11+ points in favor of either team.

Despite the large difference in basketball terms, this one's only significant at .05.

------

Another thing about the main finding is ... we actually already knew it. Last year, I wrote about a similar study (which the authors reference) that found roughly the same thing. Here, copied from that other post, are the numbers those researchers found, for all foul shots in the last minute of games, broken down by score differential:

-5 points: -3% [percentage points]
-4 points: -1%
-3 points: -1%
-2 points: -5% (significant at .05)
-1 points: -7% (significant at .01)
+0 points: +2%
+1 points: -5% (significant at .05)
+2 points: +0%
+3 points: -1% ("also significant")
+4 points: +1%
+5 points: -1%

There are some differences in the two studies. The older study controlled for player career percentages, instead of player season percentages. It didn't control for quarter (which is why commenters suspected it might just be late-game fatigue). It didn't control for a bunch of other stuff that this newer study does. And it used only three seasons of data instead of eight.

But the important thing is: the newer study's eight seasons *include* the older study's three seasons. And so, you'd expect the results to be somewhat similar. It's possible that the three significant seasons are enough to make all eight seasons look significant, even if the other five seasons were just average.

Let's try, in a very rough way, to see if we can tease out the new study's result for those other five seasons.

In the older study, if we average the -1, -2, -3, and -4 effects, we get -3.5. So, in the last minute, down by 1-4 points, shooters choked by 3.5 percentage points.

How do we get the same number for the newer study? Well, in the top-left cell of Table 5, we get that, in the last 30 seconds and down by 1-4 points, shooters choked by 3.8 percentage points.

That's our starting point. But the new study's selection criteria are a little different from the old study's, so we need to adjust.

First, the "-3.8" in the new study is from comparing to games in which the point differential is 11 or more. The "5-10" is probably a more reasonable comparison to the previous study. The difference between "11+" and "5-10", at 30 seconds, appears to be about one percentage point (compare the second columns of Tables 3 and 4). So we'll adjust that 3.8 down to 2.8.

Second, the new study is for the last 30 seconds, while the old study is for the last minute. From earlier in this post, we see that the 31-60 difference between the "down 1-4 group" and the "5+" group is only about -1.5 percentage points. Averaging that with the -2.8 from the above step (but giving a bit more weight to the -2.8 because there were more shots there), we get to about -2.4.

So we can estimate, very roughly, that for the same calculation,

Old study (three seasons): -3.5
New study (eight seasons): -2.4

Let's assume that if the new study had confined itself to only the same three seasons as the older study, it would have come up with the same result (-3.5). In that case, to get an overall average of -2.4, the other five seasons must have averaged -1.74. That's because, if you take five seasons of -1.74, and three seasons of -3.5, you get -2.4.

So, as a rough guess, the new study found:

-3.5 -- same three seasons as the old study;
-1.7 -- five seasons the old study didn't cover;
-------------------------------------------------
-2.4 -- all eight seasons combined.
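
For anyone who wants to retrace that arithmetic, here's the whole chain in one place. It's only a rough sketch: the one-point baseline adjustment is the one described above, and the weights for blending the two time windows are my own guesses based on the Table 1 attempt counts.

```python
# Step 1: new study, last 30 seconds, down 1-4, relative to 11+ blowouts (Table 5)
effect_vs_11plus = -3.8

# Step 2: switch the comparison group from "11+" to "5-10" -- about a one-point
# difference, per the comparison of Tables 3 and 4 discussed above
effect_vs_5to10 = effect_vs_11plus + 1.0        # about -2.8

# Step 3: blend in the 31-60 second window (about -1.5 from the Table 1 numbers),
# weighting roughly by the "down 1-4" attempt counts in each window
w_fast, w_slow = 922 + 727, 922                 # last 30 seconds vs. 31-60 seconds
last_minute = (w_fast * effect_vs_5to10 + w_slow * -1.5) / (w_fast + w_slow)
print(last_minute)                              # about -2.3, which I round to -2.4

# Step 4: back out the five seasons the old study didn't cover, assuming the
# three shared seasons would reproduce the old study's -3.5
all_eight, first_three = -2.4, -3.5
next_five = (8 * all_eight - 3 * first_three) / 5
print(next_five)                                # -1.74
```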

So, in the new data, this study finds only half the choke effect that the other study did. Moreover, I estimate it's only 1 SD from zero.

-------

That's for "down 1-4 points." Here's the same calculation, broken down by individual score. Here "%" means percentage point difference:

-2 points: First three: -5%. Next five: -2.3%. All eight: -3.3%.
-1 points: First three: -7%. Next five: -1.7%. All eight: -3.7%.
+0 points: First three: +2%. Next five: -1.8%. All eight: -1.0%.
+1 points: First three: -5%. Next five: -0.5%. All eight: -2.2%.
+2 points: First three: +0%. Next five: -0.3%. All eight: -1.3%.


Generally, the five new seasons are closer to zero than the three original seasons. That's what you would expect if the original numbers were mostly luck.

-------

So, in summary:

-- The study finds that in the last seconds of games, players behind in close games shoot significantly worse than in blowouts.

-- In the last 30 seconds, they're maybe about 2.8 percentage points worse. In the last 15 seconds, they're maybe about 4.8 percentage points worse.

-- The effect is biggest when down by 1 in the last fifteen seconds.

-- However, they are not statistically significantly better or worse than *average,* just statistically significantly worse than blowouts (although they certainly are "basketball significantly" worse than average).

-- The effect is mostly driven by the three seasons covered in the earlier study. If you look at the other five seasons, the effect is not statistically significant (but still has the same sign).

What do you think? I'm not absolutely convinced there's a real effect overall, but yeah, it seems like it's at least possible.

However, I do think the most extreme individual breakdowns are overstated. For instance, the newspaper article says that in the last 15 seconds, down by 1, players will shoot "5 to 10 percentage points worse than normal." (They really mean "worse than 11+ blowouts," but never mind.) Given that that's the most extreme result the study found, I think it's probably a significant overestimate. I'd absolutely be willing to bet that, over the next five seasons, the observed effect will be less than five percentage points.

--------

P.S. One last side point: the newspaper article says,

"Shooters who average 90 percent from the line performed slightly better than that under pressure, while 60 percent shooters had a choking effect twice as great as 75 percent shooters. That suggests that a lack of confidence begets less confidence, and vice versa."

This is a correct summary of what the authors say in their discussion, but I think it's wrong. The regressions that this comes from (Table 5, columns 2 and 6) don't include an adjustment for the player. So what it really means is that the 60 percent shooter will be *twice as far below the average player* as the 75 percent shooter. That makes sense -- because he's a worse shooter to begin with, even before any choke effect.

------

UPDATE: After posting this, I realized that I may have missed one aspect of the regression ... but I think my analysis here is correct if I make one additional assumption that's probably true (or close to true). I'll clarify in a future post.



Monday, July 26, 2010

An NFL field goal "choking" study

In a comment to last week's post on choking in basketball, commenter Jim A. posted a link to this analysis of choking in football. It comes from a 1998 issue of "Chance" magazine, a publication of the American Statistical Association.

The paper comes to the conclusion that field-goal kickers do indeed choke under pressure.

Authors Christopher R. Bilder and Thomas M. Loughlin looked at every place kick (field goal or extra point) in the 1995 NFL season. They ran a (logit) regression to predict the probability of making the field goal, based on a bunch of criteria, like distance, altitude, wind, and so on. They designated as "clutch" all those attempts that, if successful, would have resulted in a change of lead.

I assume that kicks that start from a tie, or that would result in a tie, count as a "change of lead" -- if so, then clutch kicks are those where the kicking team is behind by 0 to 3 points.

The authors narrowed their model down by eliminating variables that didn't appear to explain the results much. The final model had only four variables:

-- clutch
-- whether it was an extra point or a field goal
-- distance
-- distance * (dummy variable for wind > 15mph)

It turned out that clutch kicks were significantly less successful than non-clutch kicks, by an odds factor of 0.72. If, in a non-clutch situation, your odds of making a field goal were 5.45:1 (which works out to 84.5%, the overall 2008 NFL average), then, to get your clutch odds, you multiply 5.45 by 0.72. And so your corresponding odds in a clutch situation would be 3.93:1 (80%).

It's a small drop -- less than five percentage points overall -- but statistically significant nonetheless.
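
If you want to check that conversion yourself, it's a two-line calculation. (A minimal sketch: the 0.72 odds factor is the paper's estimate, and the 84.5% baseline is the overall average mentioned above.)

```python
def odds(p):
    """Convert a success probability to odds (e.g. 0.845 -> about 5.45)."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

baseline = 0.845                       # non-clutch success rate used above
clutch_odds = odds(baseline) * 0.72    # apply the paper's clutch odds factor
print(odds(baseline))                  # about 5.45
print(clutch_odds, prob(clutch_odds))  # about 3.93, about 0.80
```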

----

Now, to ask the usual question (albeit one the paper doesn't ask): could there be something going on other than choking? Some possibilities:

1. All attempts the study considers "clutch" are, by definition, made by a team that's either tied or behind in the score. Wouldn't that be selection bias, since the "clutch" sample would be disproportionately made up of teams that are, overall, a bit worse than average? Worse teams would have worse field goal kickers, which might explain the dropoff.

The paper ignores that possibility, explicitly assuming that all FG kickers are alike:

"One difference there is no difference between placekickers is that NFL-caliber placekickers are often thought of as "interchangeable parts" by teams. NFL teams regularly allow their free-agent placekickers, who are demanding more money, to leave for other teams because other placekickers are available."


That makes no sense: free-agent quarterbacks leave for other teams too, but that doesn't mean all are equal. Besides, if all placekickers were the same, those "other teams" wouldn't sign them either.

So I wonder if what's really going on is that the kickers in "clutch" situations are simply not as good as the kickers in other situations. The discrepancy seems pretty large, though, so I wonder if that effect would be enough to explain the five percentage point difference.

2. One of the other factors the authors considered was time left on the clock. It turned out to be significant, originally, but, for some reason, it was left out of the final regression.

But clutch kicks would be more likely to occur with less time on the clock. Behind by 3 points with two seconds remaining, a team would try the field goal. Behind by 5 points with two seconds remaining, the team would try for a touchdown instead.

Why does that matter? Maybe because, if there's lots of time on the clock and the team isn't forced to kick, they might not try it if conditions are unfavorable (into the wind, for instance). But with time running out, they'd have to give it a shot even if conditions were less favorable. So time-constrained kicks would have a lower success rate for reasons other than "choking".

3. The assumption in the regression is that all the coefficients are multiplicative. Perhaps that's not completely correct.

In low-wind conditions, the regression found that every yard closer to the goalposts changes your success odds to 108% of their original. And clutch changes your odds to 72% of the original. So, according to the model, going one yard closer but in a clutch situation should change your odds to 108% of 72%, or 78%.

But what if that's not the case? What if multiplying isn't strictly correct? Suppose that "clutch" makes the holder more likely to fumble the snap, by a fixed amount, and there's also an effect on the kicker that's proportional to the final probability. In that case, multiplying the two effects wouldn't be strictly correct -- only an approximation. And, therefore, the regression would give biased estimates for the coefficients. If the "distance" coefficient is biased too high, but "clutch" kicks happen to be for longer distances, that would explain a higher-than-expected failure rate.
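
To make that concrete, here's a toy illustration with made-up numbers: a fixed additive hit to the success probability (say, a botched snap) doesn't correspond to any single odds ratio, so a purely multiplicative model can only approximate it.

```python
def odds(p):
    return p / (1 - p)

# Made-up success rates for a short and a long kick (not from the paper)
for p in (0.95, 0.70):
    clutch_p = p - 0.03          # a fixed 3-point hit, e.g. a botched snap
    print(p, clutch_p, odds(clutch_p) / odds(p))
# Implied "clutch" odds ratios: about 0.61 for the 95% kick, about 0.87 for
# the 70% kick -- no single multiplicative factor reproduces a fixed
# additive effect, so the regression could only approximate it.
```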

4. The paper included kicks for extra points (PATs), which are made some 99% of the time. And there were lots of PAT attempts in the sample, even more than field goal attempts. At first I thought those could confuse the results. If there were no clutch factor, you'd expect only about one clutch PAT to be missed. What if, by random chance, there were two instead? That would imply a large odds ratio factor for the PATs, based only on one extra miss, which wouldn't be statistically significant at all.

Could that screw up the overall results? I did a little check, and I don't think it could. I think the near-100% conversion rate for PATs is pretty much ignored by the logistic regression. But I'm not totally certain of that, so I thought I'd mention it here anyway.

5. The authors found that the odds of making a PAT were very much higher than the odds of making a field goal of exactly the same distance -- an odds ratio of 3.52. That means that if the odds of making a PAT are 100:1, the odds of making the same field goal are only 28:1.

What could be causing that difference? It could be a problem with the model, or it could be that there is indeed something different about a PAT attempt.

What could be different about a PAT attempt? Well, perhaps for an FG attempt, both teams are trying harder to avoid taking a penalty. For the defensive team, a penalty on fourth down could give the kicking team enough yards for a first down, which could turn the FG into a TD. For the offensive team, a penalty might move them out of field goal range completely. Those situations don't apply when kicking a PAT.

In clutch situations, the incentives would be different still. Suppose it's a tie game with one second left on the clock, and a 25-yard attempt coming. An offensive 10-yard penalty would hurt a fair bit: it would turn a 90% kick into an 80% kick, say. A defensive penalty, though, wouldn't help the kicking team nearly as much: it might only turn the 90% kick into a 95% kick.

Normally, a defensive penalty is more costly to the team that commits it than an offensive penalty is: it could create a first down, rather than just a more difficult kick. But in these late-game situations, the offensive penalty matters more: it lowers the success rate by more than a defensive penalty raises it.

Therefore, in a clutch situation, could it be that FGs are intrinsically more difficult, just because the offense has to play more conservatively, but the defense can play more aggressively?



Friday, July 16, 2010

Do foul shooters choke in the last minute of close games?

Searching Google Scholar for studies about "choking," I came across an interesting one, a short, simple analysis of free-throw shooting in NBA games.

It's called "Choking and Excelling at the Free Throw Line," by Darrell A. Worthy, Arthur B. Markman, and W. Todd Maddox. (.pdf)

The authors looked at all free throws in the last minute of games in the three seasons from 2002-03 to 2004-05. They broke their sample down by score differential, and compared the success percentage to the players' career percentages.

They found that for most of the score differentials, the shooters converted less often than expected. Here's the data as I read it off the graph (but see the PDF for yourself). The score differential is from the perspective of the shooting team, and the "%" column is actually percentage points.

-5 points: -3%
-4 points: -1%
-3 points: -1%
-2 points: -5% (significant at 5%)
-1 points: -7% (significant at 1%)
+0 points: +2%
+1 points: -5% (significant at 5%)
+2 points: +0%
+3 points: -1% ("also signficant")
+4 points: +1%
+5 points: -1%

The authors conclude that choking occurs, especially when down by 1 point.

It may not be obvious at first glance from the chart, but there's a tendency to "choke" all the way down the list: there are 8 negatives against only 2 positives and a zero (and the negatives are generally more extreme than the positives). Do players actually shoot worse in the last minute of close games?

----

I couldn't think of any statistical reason the results might be misleading ... but in an e-mail to me, Guy came up with a good one.

Suppose that career shooting percentage is not always a good indicator of a player's percentage that game. Maybe it varies throughout a career, somehow -- higher at peak age and lower elsewhere, or even just increasing throughout a career. (It doesn't matter to the argument *how* it varies, just that it does.)

You might expect that the differences would just cancel out. However, the overestimated shooters would appear in each category more often than the underestimated shooters. Why? Because they would miss the first shot more often, and take a second shot *within the same score category*.

As an example, suppose two players have 75% career percentages, but, on this day, A is a 100% shooter and B is a 50% shooter. Suppose they each go to the line twice with the game tied. On their first shots, A makes both of his and B makes one of his. So far, their combined percentage is 75%, as expected. Perfect.

But only B gets to take a second shot with the game still tied. He does that once -- the one time out of two that he missed his first shot -- and he makes it half the time.

So, on average, you have these guys taking five shots, and making 3.5 of them. That's 70 percent -- 5 percentage points less than the career average would suggest.

Now, the numbers I used here are not very realistic -- nobody's a 100% shooter, and hardly anyone is a 50% shooter. What if I change it to 80% and 70%?

Then, following the same logic, and if my arithmetic is correct, those two players combined would make 74.8% of their shots instead of 75%. It's still something, but not nearly enough to explain the results. Still, I really like Guy's explanation.
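
Here's a quick check of both versions of that arithmetic. It's just the two-player example above, with the one assumption Guy's argument needs: a made first shot moves the second attempt into a different score category, while a missed first shot keeps it in the same one.

```python
def tied_category_pct(true_pcts, trips_each=2):
    """Expected free-throw percentage recorded in the 'tied game' category,
    when a made first shot moves the second attempt into a different score
    category but a missed first shot keeps it in this one."""
    made = attempts = 0.0
    for p in true_pcts:
        made += trips_each * p                    # first shots, game tied
        attempts += trips_each
        made += trips_each * (1 - p) * p          # second shots after a miss
        attempts += trips_each * (1 - p)
    return made / attempts

print(tied_category_pct([1.0, 0.5]))   # 0.70  -- the 100% / 50% example
print(tied_category_pct([0.8, 0.7]))   # 0.748 -- the 80% / 70% example
```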

----

So, there you go: it does look like, for those three seasons, players shot worse in the last minute than expected. Can anyone think of an explanation, other than "choking" and luck, for why that might be the case? Has anyone done this kind of analysis for other seasons to confirm these results?


UPDATE: Maybe it's just fatigue! See comments.


