Thursday, September 13, 2012

Is the evidence against the "hot hand" flawed?

The "hot hand" effect is the purported tendency of players (or teams) to follow success with success, and failure with failure.  For instance, when a shooter makes a few free throws in a row, most basketball fans would expect that he's on a roll, and should continue to shoot better than his overall average, at least until his hot hand cools.

Generally, researchers have concluded that the evidence shows no sign of such an effect.  Alan Reifman is perhaps the busiest researcher on the topic; he has a blog devoted to the hot hand, as well as a recent book (which I still mean to read and review).

However, recently, an academic paper by Dan Stone questioned the conclusion that the hot hand is largely a myth.  Stone argues that even if an effect exists, it would be very hard to find in the data.  His argument is along the lines of, "absence of evidence is not evidence of absence." 

At "Wages of Wins," Jeremy Bretton said he wasn't convinced -- that Stone's critique was "more a quibble" than a conclusive rebuttal.  Stone disagreed, and linked to his study, which I read.

-------

First, and obviously, it's nearly impossible to prove that NO hot hand effect exists.  If the effect is small enough, we'll never see it.

For instance, suppose that players shoot at a 75.002 percent success rate after making a free throw, but only 75.001 percent after missing. That effect would be so small that we'd never be able to disprove it.  Even after a million shots, the SD of the success rate would be 0.04 percentage points -- still 40 times higher than the effect we're looking to find! 
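The arithmetic there is just the binomial standard error.  A quick illustrative check in Python (mine, not from any of the studies):

```python
import math

def sd_of_rate(p, n):
    """SD of an observed success rate over n independent shots,
    as a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# After a million shots at a 75 percent rate, the SD of the observed
# rate is still about 0.043 percentage points -- some 40 times bigger
# than the 0.001-point effect we're trying to detect.
print(round(sd_of_rate(0.75, 1_000_000) * 100, 3))  # 0.043
```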

So, for the question to be meaningful, it has to be a bit more specific: is the evidence enough to disprove a *reasonably large* hot hand effect?

That depends on a lot of things.  It depends what you mean by "reasonably large."  It depends what assumptions you make about how hot hands work.  It depends if you're looking at an individual player, or a team.

What Stone did, in his paper, is make some of those assumptions explicit.  He found, under his assumptions, that it would be difficult to prove a hot hand effect if it did exist.  He therefore concludes that we shouldn't be so quick to deny that it happens.

I don't disagree with his math, but I'm not sure the assumptions translate to the real world. 

-------

Normally, we talk about the hot hand in terms of what happens after a success, vs. what happens after a failure.  Stone did something a bit different: he talked about what happens after a high-probability shot, vs. what happens after a low-probability shot. 

That's a huge difference.

First, it requires that a player has a different probability of making his shot every time.  That's an unusual assumption for this kind of study: that, when Kobe Bryant gets fouled, sometimes he has a 90 percent chance of making the shot, and sometimes only a 65 percent chance.  Usually, the assumption is that his probability is constant.  And that seems, to me, to be much more reasonable. 

Second, it adds a lot of randomness to the model.  Suppose that Kobe hits 86 percent after a hit, but only 82 percent after a miss.  Under the "standard" assumption that Kobe's probability is the same every shot, we only have to check whether those two observed rates are consistent with a single underlying mean.  But, under Stone's assumption, we also have to allow for the possibility that the "miss" shots were just harder -- that Kobe randomly happened to have a lower probability on them.  The more randomness you add, the harder it is to find a real effect.

For instance, if Kobe hits 86 percent in 1000 shots after a hit, and 82 percent in 210 shots after a miss, that's statistically significant if you assume he's otherwise constant at 85 percent.  But it's NOT statistically significant if you assume his probability varies randomly, because maybe the missed shots were "harder".

And that's why Stone finds that it's hard to find a "hot hand," even if it exists -- because he posits a model with so much randomness.

--------

If you want to know exactly what Stone did, here it is, in small font.  I'm going to use his most plausible model as an example.

Suppose the mean probability of success is 75 percent.  But, suppose that rate varies randomly (I'll explain how in a minute).

Now, suppose there *is* a hot hand effect -- not in terms of consecutive successes, but in terms of *probabilities* of consecutive successes.  Specifically, you take the probability of the previous shot, and regress it 90 percent to the mean, and that gives you the expected probability of the next shot. 

However: after you regress to the mean, you add lots of randomness.  So, after a 71 percent shot, you regress to about 74.6 percent -- but then you randomize around that figure.  So your next shot could be 70 percent, or 80 percent, or something else, although, on average, it'll be about 74.6 percent.

The specifics:  Start by figuring out the maximum randomness you can add without going over 100 percent or below 0 percent.  (So, if the current probability is 80 percent, the maximum is 20 percent.)  Then, you take 1/4 of that in either direction, and choose a random number within that range, and add it to the probability.  (So, if you're at 80 percent, 1/4 of 20 is 5.  You take a random number between minus 5 percent, and plus 5 percent, and add it to the 80.)  That becomes your new probability.
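In Python, one step of that procedure might look like this.  This is my reading of the description -- in particular, I'm measuring the "room" to the boundary after the regression step, which the description leaves ambiguous:

```python
import random

MEAN = 0.75  # the long-run success rate
PHI = 0.10   # fraction of the deviation that survives "regressing 90 percent"

def next_probability(p, rng):
    """Regress the previous shot's probability 90 percent to the mean,
    then add uniform noise of up to a quarter of the distance to the
    nearest boundary (0 or 100 percent)."""
    regressed = MEAN + PHI * (p - MEAN)
    room = min(regressed, 1 - regressed)  # distance to the nearest boundary
    return regressed + rng.uniform(-room / 4, room / 4)

rng = random.Random(0)
print(next_probability(0.80, rng))  # somewhere between about 0.694 and 0.816
```

An 80 percent shot regresses to 75.5 percent; the room to the boundary is then 24.5 points, so the noise is up to about 6.1 points in either direction.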

So, on the whole, you have a better-than-average chance after a high-probability shot, and a lower-than-average chance after a low-probability shot.  Therefore, there is indeed a "hot hand" effect.  However, it's heavily diluted by the randomization that follows. 

And so, Stone goes on to show that even though there is an effect, there is virtually NO chance of finding it with statistical significance, by just looking at whether shots were made or missed. 

It's not that you *sometimes* see an effect -- it's that you'll almost *never* see an effect.  Stone ran a series of simulations of 1,000 shots each, and found significance at the 5 percent level almost exactly 5 percent of the time.  That is: the effect is indistinguishable from no effect.

Stone's conclusion (paraphrased): "How can you say there's no hot hand effect, when I've shown you a scenario with a real hot hand effect that's impossible to pick up?  We should admit that our tests aren't adequate to find an effect, instead of claiming that we've debunked the possibility."


--------

To which I say: Stone is correct, but the "hot hand" effect he assumed is very, very small by "classical" standards.  That's hidden, mostly, by the fact that he bases his assumptions on probabilities, rather than successes.

What would be a reasonable "hot hand" effect that's significant in the basketball sense?  Maybe, 1 percentage point?  That is, after a success, you're 0.25 point better than average, but after a failure, you're 0.75 points worse than average? 

Even that, you're not going to be able to find in 1,000 repetitions.  The SD of success rate in 1,000 shots is 1.4 percentage points.  You'd need over 7,000 shots to get the SD down to 0.5 percentage points, which would get the actual effect to 2 SD. Even then, you'd only have a 50/50 chance of finding significance.
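That back-of-the-envelope arithmetic, in Python (the function name is mine):

```python
import math

def shots_needed(p, effect):
    """Shots needed for the effect to equal 2 SDs of the observed rate --
    roughly a 50/50 chance of reaching statistical significance."""
    sd_target = effect / 2
    return p * (1 - p) / sd_target ** 2

# SD of the success rate over 1,000 shots at 75 percent: ~1.4 points.
print(round(math.sqrt(0.75 * 0.25 / 1000) * 100, 1))  # 1.4

# Shots needed for a 1-percentage-point effect to be a 2-SD result:
print(round(shots_needed(0.75, 0.01)))  # 7500
```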

And Stone's effect is much, much smaller than that.  Because he manipulated the probabilities *based on probabilities*, instead of based on successes, it makes the effect tiny. 

I ran Stone's simulation myself, for 100,000 throws.  And the results were:

-- after a success, the chance of another success is 75.049 percent.
-- after a miss, the chance of success is 74.875 percent.

That is: instead of a 1 percentage point difference, which is (in my view) around the lower limit of significance in the basketball sense, Stone's model leads to a difference of only 0.174 percentage points -- about a sixth as large. 

That means, in order to find the effect, you need around 250,000 shots, just to have a 50/50 chance of getting significance!  And that's why Stone's simulation didn't find any effect -- he used only 1,000 shots, for a very, very small effect.
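For the record, a simulation along those lines can be sketched like this.  It's my reconstruction of the model as described above, not Stone's actual code, and the exact figures will vary with the random seed and with implementation details:

```python
import random

MEAN, PHI = 0.75, 0.10  # regress 90 percent to the mean, keeping 10 percent

def simulate(n_shots, seed=0):
    """Tally the success rate after a hit versus after a miss, under the
    regress-then-randomize model sketched above."""
    rng = random.Random(seed)
    p = MEAN
    prev_hit = None
    made = {True: 0, False: 0}  # successes, keyed by the previous result
    seen = {True: 0, False: 0}  # attempts, keyed by the previous result
    for _ in range(n_shots):
        hit = rng.random() < p
        if prev_hit is not None:
            seen[prev_hit] += 1
            made[prev_hit] += hit
        prev_hit = hit
        # regress toward the mean, then add uniform noise of up to a
        # quarter of the distance to the nearest boundary
        regressed = MEAN + PHI * (p - MEAN)
        room = min(regressed, 1 - regressed)
        p = regressed + rng.uniform(-room / 4, room / 4)
    return made[True] / seen[True], made[False] / seen[False]

after_hit, after_miss = simulate(100_000)
print(after_hit, after_miss)  # both very close to 0.75; the gap is tiny
```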

-------

Now, I think Stone could defend his conclusion, by saying something like,

"Yes, a large hot hand effect leads to a very small observational effect.  That's my entire point.  There could be a decently-sized hot hand effect, but we'd never see it because it would barely show up in the game data."

And I'd agree with that.  But, it all depends on how you define a "hot hand".  Most of us define "hot hand" as an effect big enough to materially change our real-life performance expectations.  We are looking for a *reasonably sized* effect.  This doesn't qualify.

What Stone is telling us is this: if the "hot hand" effect is only big enough to produce one more basket after every 575 successes, as compared to after 575 failures, we'll never see it in the data.

Which, I think, we knew all along.






10 Comments:

At Thursday, September 13, 2012 12:35:00 PM, Anonymous Alex said...

Stone's idea of the size of the hot hand might be problematic, but I do like the idea of it in theory. Instead of looking at makes and misses, look at 'good shots' versus 'bad shots'. A hot player should take shots that are more likely to go in while a cold player takes shots that are less likely to go in (not necessarily based on being open or location on the floor, but in some abstract sense).

If being hot exists, we can agree that a hot player might still just miss here and there; that gets lost when looking at makes/misses but not if you try to move it to likelihood of makes/misses.

 
At Thursday, September 13, 2012 6:12:00 PM, Blogger Unknown said...

Hi Phil
Thanks very much for this - appreciate how thoughtful/careful you were with the analysis (not sure if you remember, but you also wrote a nice post on my paper on free throw shooting under pressure).

I agree with much of what you say, but didn't follow this:

"What would be a reasonable "hot hand" effect that's significant in the basketball sense? Maybe, 1 percentage point? That is, after a success, you're 0.25 point better than average, but after a failure, you're 0.75 points worse than average?"

I don't know where the 0.25/0.75 come from.

Anyway, not sure that I agree that the hot hand effects in my simulation are limited to being that small. Think I discuss how probabilities vary from something like 0.2 or 0.3 to 0.7/0.8 for a 50% shooter (in the simulations), and the autoregressive parameter can be as high as 0.9. So if a 50% shooter is a 70% shooter for one shot, he could be a 68% shooter (on avg) on the next shot. But maybe you're focusing on the lower values of the AR parameter

The other point that I discuss in paper - but in last section, and maybe don't stress enough - is that even if this type of behavior is hard to pick up in data, could still be understood in real life, because we have more information then (as opposed to the info statisticians have) about when the player's in the 70% state (information on the difficulty of shots that have been taken, whether they've gone in due to luck, defense, how the shooter feels etc).

Finally i think it's important to make this distinction between probabilities and made/missed shots regardless, and i do think probabilities are the right focus. My model of shot probability evolution is not the only 1 - Jeremy Arkes' is also plausible and he finds even stronger distortions i think

Thanks again !

 
At Thursday, September 13, 2012 9:11:00 PM, Blogger Phil Birnbaum said...

Hi, Dan,

Thanks for the response!

The 0.25/0.75 is just my gut about how much of an effect I'd have to see to say that the effect is significant in the real-world sense. Your mileage may vary.

Right, your hot hand effects *in terms of probabilities* are reasonably large, but they don't translate much into hot hands in terms of results. That's one of my points.

The other one is about the assumption that the probabilities vary so much. Does a foul shooter really jump between 65% and 80%? Why would he? And, relatedly, how would you know if he did or not? Is there an experiment that would settle it? I can't think of one, unless you add further assumptions.

Serious question: suppose player A hits 75%, but varies randomly between a 50% and a 100% chance each shot. Player B is 75% each shot. Is there an experiment to tell which is which? I don't think there is.





 
At Thursday, September 13, 2012 9:11:00 PM, Blogger Phil Birnbaum said...

BTW, is Jeremy Arkes' paper online?

 
At Friday, September 14, 2012 10:52:00 AM, Blogger Unknown said...

I'm not sure i agree re not translating much into results - obviously a 70% shooter makes 1 more shot every 5 attempts than a 50% shooter - a substantial result. What I am trying to show that, just using shot result data, it's really hard to tell when someone is in the 70% state

Re a shooter varying from 65-80% - I am not saying this necessarily happens - but this is what would happen, if the hot hand existed, right? Ie a player would sometimes be a better (ex ante) shooter than at other times?

And your question - 'how would we know if he did' - is very similar to the question i am claiming is the interesting one (how do ex ante shot probabilities vary from shot to shot), which I am saying is very hard to answer. I think you're right that we definitely could not know if the probabilities varied, but were not correlated at all (ie the player who shoots 75% on avg, shoots 90% on 1 shot, and then is still on avg 75% for next shot). But that's not very plausible. Intuitively at least, if the probs varied a lot, i think it's likely they would be highly auto-correlated as well.

Not sure if Arkes' paper's on the web but Im sure he'd email it to you. I think it's likely to be published in J Sports Econ NAASE conference special issue

 
At Friday, September 14, 2012 11:39:00 AM, Blogger Phil Birnbaum said...

>"Intuitively at least, if the probs varied a lot, i think it's likely they would be highly auto-correlated as well."

But your model assumes they vary a lot and *aren't* necessarily highly auto-correlated!

So, in any case, shouldn't your null hypothesis be that they *don't* vary and *aren't* auto-correlated?

 
At Friday, September 14, 2012 2:11:00 PM, Blogger Unknown said...

In the analytical part, I assume they vary, but not necessarily a lot.

In the simulation, they vary a lot - but that's so that a substantial hot hand exists (when rho is high) - to address your exact concern.

Variation is necessary for autocorrelation. I'm really only interested in looking at the case of high variation and autocorrelation (seeing what the regression results are then). So not sure that case of no variation is relevant. But anyway, no autocorrelation (with or without variation) is the null hypothesis for the regressions

Question for you, what is your model of shot probabilities, assuming the hot hand exists? (and is substantial)

 
At Friday, September 14, 2012 6:07:00 PM, Blogger Phil Birnbaum said...

Variation is necessary for autocorrelation, but RANDOM variation (that is, not hot-hand related) is not necessary.

?"what is your model of shot probabilities, assuming the hot hand exists?"

My model is that the probability for a given player is close to constant. Actually, I'd call that my prior. I'm willing to change if evidence is forthcoming, but the idea that Kobe could be an 85% shooter one moment, and then a 65% shooter the next, seems implausible.

I do agree with you that if your model is correct -- in the sense of being close to real life -- a substantial hot hand would be difficult to show. It's just that I find such a model implausible. I could be wrong.

 
At Monday, September 17, 2012 5:54:00 PM, Blogger Unknown said...

The random variation thing is a red herring - not what drives the results. In Arkes model/analysis, players are either hot, cold, or not, and the same issue arises

I was asking 'given a substantial hot hand exists', so not sure your answer (probabilities are constant) is appropriate. Thanks re your last point tho. But I think if you're ruling out a substantial hot hand by assumption, that defeats the whole purpose.

Thanks again !

 
At Tuesday, September 18, 2012 2:53:00 PM, Blogger Phil Birnbaum said...

Well, the amount of random variation is what makes the results fail to be statistically significant, no? I'm confident that if the hot hand was a higher probability after a success, rather than a higher probability after a higher probability, it would be more likely to find significance.

I'm not ruling out a substantial hot hand by assumption. I'm just saying that a substantial hot hand, to me, means a high difference of a hit after a hit. What you posit, I argue is a not-very-substantial hot hand.

 
