Is the evidence against the "hot hand" flawed?
The "hot hand" effect is the purported tendency of players (or teams) to to follow success with success, and failure with failure. For instance, when a shooter makes a few free throws in a row, most basketball fans would expect that he's on a roll, and should continue to shoot better than his overall average, at least until his hot hand cools.
Generally, researchers have concluded that the evidence shows no sign of such an effect. Alan Reifman is perhaps the busiest researcher on the topic; he has a blog devoted to the hot hand, as well as a recent book (which I still mean to read and review).
However, recently, an academic paper by Dan Stone questioned the conclusion that the hot hand is largely a myth. Stone argues that even if an effect exists, it would be very hard to find in the data. His argument is along the lines of, "absence of evidence is not evidence of absence."
At "Wages of Wins," Jeremy Bretton said he wasn't convinced -- that Stone's critique was "more a quibble" than a conclusive rebuttal. Stone replied otherwise, and linked to his study, which I read.
First, and obviously, it's nearly impossible to prove that NO hot hand effect exists. If the effect is small enough, we'll never see it.
For instance, suppose that players shoot at a 75.002 percent success rate after making a free throw, but only 75.001 percent after missing. That effect would be so small that we'd never be able to disprove it. Even after a million shots, the SD of the success rate would be 0.04 percentage points -- still 40 times higher than the effect we're looking to find!
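(For reference, here's that back-of-the-envelope calculation in Python. The SD of an observed success rate is the binomial sqrt(p(1-p)/n):)

    import math

    p, n = 0.75, 1_000_000
    sd = math.sqrt(p * (1 - p) / n)             # binomial SD of the observed rate
    print(f"{sd * 100:.3f} percentage points")  # about 0.043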
So, for the question to be meaningful, it has to be a bit more specific: is the evidence enough to disprove a *reasonably large* hot hand effect?
That depends on a lot of things. It depends on what you mean by "reasonably large." It depends on what assumptions you make about how hot hands work. And it depends on whether you're looking at an individual player or a team.
What Stone did, in his paper, is make some of those assumptions explicit. He found, under his assumptions, that it would be difficult to prove a hot hand effect if it did exist. He therefore concludes that we shouldn't be so quick to deny that it happens.
I don't disagree with his math, but I'm not sure the assumptions translate to the real world.
Normally, we talk about the hot hand in terms of what happens after a success, vs. what happens after a failure. Stone did something a bit different: he talked about what happens after a high-probability shot, vs. what happens after a low-probability shot.
That's a huge difference.
First, it requires that a player has a different probability of making his shot every time. That's an unusual assumption for this kind of study: that, when Kobe Bryant gets fouled, sometimes he has a 90 percent chance of making the shot, and sometimes only a 65 percent chance. Usually, the assumption is that his probability is constant. And that seems, to me, to be much more reasonable.
Second, it adds a lot of randomness to the model. Suppose that Kobe hits 86 percent after a hit, but only 82 percent after a miss. Under the "standard" assumption that Kobe's probability is the same every shot, we only have to check whether those two rates are consistent with a single underlying mean. But, under Stone's assumption, we also have to allow for the possibility that the "miss" shots were just harder -- that Kobe randomly happened to have a lower probability on them. The more randomness you add, the harder it is to find a real effect.
For instance, if Kobe hits 86 percent in 1000 shots after a hit, and 82 percent in 210 shots after a miss, that's statistically significant if you assume he's otherwise constant at 85 percent. But it's NOT statistically significant if you assume his probability varies randomly, because maybe the missed shots were "harder".
And that's why Stone finds that it's hard to find a "hot hand," even if it exists -- because he posits a model with so much randomness.
If you want to know exactly what Stone did, here it is, in small font. I'm going to use his most plausible model as an example.
Suppose the mean probability of success is 75 percent. But, suppose that rate varies randomly (I'll explain how in a minute).
Now, suppose there *is* a hot hand effect -- not in terms of consecutive successes, but in terms of *probabilities* of consecutive successes. Specifically, you take the probability of the previous shot, and regress it 90 percent to the mean, and that gives you the expected probability of the next shot.
However: after you regress to the mean, you add lots of randomness. So, after a 71 percent shot, you regress to 74 percent -- but then you randomize around 74 percent. So your next shot could be 70 percent, or 80 percent, or something else, although, on average, it'll be 74 percent.
The specifics: Start by figuring out the maximum randomness you can add without going over 100 percent or below 0 percent. (So, if the current probability is 80 percent, the maximum is 20 percent.) Then, you take 1/4 of that in either direction, and choose a random number within that range, and add it to the probability. (So, if you're at 80 percent, 1/4 of 20 is 5. You take a random number between minus 5 percent, and plus 5 percent, and add it to the 80.) That becomes your new probability.
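Here's a minimal sketch of that procedure in Python, as I read it (the function names and bookkeeping are mine, not Stone's):

    import random

    MEAN = 0.75      # long-run success rate
    REGRESS = 0.90   # fraction of the gap to the mean closed each shot

    def next_probability(p_prev):
        # Regress the previous shot's probability 90 percent to the mean...
        p = p_prev + REGRESS * (MEAN - p_prev)
        # ...then add uniform noise spanning a quarter of the distance
        # to the nearer boundary (0 or 100 percent), in either direction.
        half_width = min(p, 1.0 - p) / 4.0
        return p + random.uniform(-half_width, half_width)

    def simulate(n_shots):
        """Tally makes and misses, conditional on the previous shot's outcome."""
        p = MEAN
        counts = {"after_hit": 0, "hit_after_hit": 0,
                  "after_miss": 0, "hit_after_miss": 0}
        prev_made = None
        for _ in range(n_shots):
            made = random.random() < p
            if prev_made is True:
                counts["after_hit"] += 1
                counts["hit_after_hit"] += made
            elif prev_made is False:
                counts["after_miss"] += 1
                counts["hit_after_miss"] += made
            prev_made = made
            p = next_probability(p)
        return counts

Run with enough shots, the conditional success rates from this sketch should come out close to the ones I report further down (about 75.0 percent after a hit, and 74.9 percent after a miss).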
So, on the whole, you have a better-than-average chance after a high-probability shot, and a lower-than-average chance after a low-probability shot. Therefore, there is indeed a "hot hand" effect. But, because of the randomization afterwards, it doesn't hold on every shot.
And so, Stone goes on to show that even though there is an effect, there is virtually NO chance of finding it with statistical significance, by just looking at whether shots were made or missed.
It's not that you *sometimes* see an effect -- it's that you'll almost *never* see an effect. Stone ran a series of simulations of 1,000 shots each, and found significance at the 5 percent level almost exactly 5 percent of the time. That is: the effect is indistinguishable from no effect.
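You can see that for yourself by bolting a significance test onto the simulate sketch above. (I'm using a generic two-proportion z-test here; Stone's actual test statistic may differ.)

    import math

    def z_stat(c):
        """Two-proportion z-test: success rate after a hit vs. after a miss."""
        n1, n2 = c["after_hit"], c["after_miss"]
        p1 = c["hit_after_hit"] / n1
        p2 = c["hit_after_miss"] / n2
        pooled = (c["hit_after_hit"] + c["hit_after_miss"]) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        return (p1 - p2) / se

    runs, significant = 1000, 0
    for _ in range(runs):
        if abs(z_stat(simulate(1000))) > 1.96:   # 5 percent, two-tailed
            significant += 1
    print(f"significant in {significant / runs:.1%} of runs")  # about 5 percent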
Stone's conclusion (paraphrased): "How can you say there's no hot hand effect, when I've shown you a scenario with a real hot hand effect that's impossible to pick up? We should admit that our tests aren't adequate to find an effect, instead of claiming that we've debunked the possibility."
To which I say: Stone is correct, but the "hot hand" effect he assumed is very, very small by "classical" standards. That's hidden, mostly, by the fact that he bases his assumptions on probabilities, rather than successes.
What would be a reasonable "hot hand" effect that's significant in the basketball sense? Maybe, 1 percentage point? That is, after a success, you're 0.25 points better than average, but after a failure, you're 0.75 points worse than average? (The split is uneven because about three-quarters of shots follow a success, which keeps the overall average at 75.)
Even that, you're not going to be able to find in 1,000 repetitions. The SD of success rate in 1,000 shots is 1.4 percentage points. You'd need over 7,000 shots to get the SD down to 0.5 percentage points, which would get the actual effect to 2 SD. Even then, you'd only have a 50/50 chance of finding significance.
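(The arithmetic, treating this as a test of a single observed rate against a known 75 percent mean:)

    import math

    p = 0.75

    def sd_points(n):
        """SD of an observed success rate over n shots, in percentage points."""
        return 100 * math.sqrt(p * (1 - p) / n)

    print(sd_points(1000))                  # about 1.4 points
    print(p * (1 - p) / (0.5 / 100) ** 2)   # shots needed for a 0.5-point SD: 7500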
And Stone's effect is much, much smaller than that. Because he manipulated the probabilities *based on probabilities*, instead of based on successes, the effect comes out tiny.
I ran Stone's simulation myself, for 100,000 throws. And the results were:
-- after a success, the chance of another success is 75.049 percent.
-- after a miss, the chance of success is 74.875 percent.
That is: instead of a 1 percentage point difference, which is (in my view) around the lower limit of significance in the basketball sense, Stone's model leads to a difference of only 0.174 percentage points -- about a sixth as large.
That means, in order to find the effect, you need around 250,000 shots, just to have a 50/50 chance of getting significance! And that's why Stone's simulation didn't find any effect -- he used only 1,000 shots, for a very, very small effect.
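(Same back-of-the-envelope arithmetic as before: for the 0.174-point difference to reach 2 SD, the SD has to shrink to about 0.087 points.)

    p, effect_points = 0.75, 0.174
    sd_target = effect_points / 2 / 100          # effect must be about 2 SD
    print(round(p * (1 - p) / sd_target ** 2))   # about 248,000 shots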
Now, I think Stone could defend his conclusion, by saying something like,
"Yes, a large hot hand effect leads to a very small observational effect. That's my entire point. There could be a decently-sized hot hand effect, but we'd never see it because it would barely show up in the game data."
And I'd agree with that. But, it all depends on how you define a "hot hand". Most of us define "hot hand" as an effect big enough to materially change our real-life performance expectations. We are looking for a *reasonably sized* effect. This doesn't qualify.
What Stone is telling us is this: if the "hot hand" effect is only big enough to produce one more basket after every 575 successes, as compared to after 575 failures, we'll never see it in the data.
Which, I think, we knew all along.