Thursday, August 06, 2015

Can a "hot hand" turn average players into superstars?

Last post, I reviewed a study by Joshua Miller and Adam Sanjurjo that found a "hot hand" effect in the NBA's Three-Point Contest. In addition to finding a hot hand, the authors also showed how some influential previous studies improperly underestimated observed streakiness because of the incorrect way they calculated expectations. 

I agreed that the previous studies were biased, and accepted that the authors found evidence of a hot hand in the three-point contest. But I was dubious that you can use that evidence to assume a hot hand in anything other than a "muscle memory" situation.

Dr. Miller, in comments on my post and follow-up e-mails, disagreed. In the comments, he wrote,


"The available evidence shows big effect sizes. Should we infer the same effect in games, given we have no known way to measure them? It is certainly a justifiable inference."

Paraphrasing Dr. Miller's argument: Since (a) the original "no hot hand" studies were based on incorrect calculations, and (b) we now have evidence of an actual hot hand in real life ... then, (c) we should shift our prior for real NBA games from "probably no hot hand" to "probably a significant hot hand."

That's a reasonable argument, but I still disagree.

------

There are two ways you can define a "hot hand":

1. Sometimes, players have higher talent ("talent" means expected future performance) than other times. In other words, some days they're destined to be "hot," better than their normal selves.

2. When players have just completed a streak of good performance, they are more likely to follow it with continued good performance than you'd otherwise expect.

Call (1) the "hot hand talent" hypothesis, and (2) the "streakiness" hypothesis. Each implies the other -- if you have "good days," your successes will be concentrated among those good days, so you'll look streaky. Conversely, if your expectation is to exhibit streakiness, you must be "better in talent" after a streak than after a non-streak.

I think the two definitions are the same thing, under certain other reasonable assumptions. At worst, they're *almost* the same thing.

However, we can observe (2), but not (1). That's why "hot hand" studies, like Miller/Sanjurjo, have to concentrate on streaks.

----

The problem is: it takes a *lot* of variation in talent (1) to produce just a *tiny bit* of observed streakiness (2). 

Observed streakiness is a very, very weak indicator of a variation in talent. That's because players go on streaks for many reasons other than actually being "hot" -- most importantly, luck.

In the three-point contest study, the authors found an average six percentage point increase in hit rate after a sequence of three consecutive hits, from about 53 percent to 59 percent. As Dr. Miller points out, the actual increase in talent when "hot" must be significantly higher -- because not all players who go HHH are necessarily having a hot hand. Some are average, or even "cold," and wind up on a streak out of random luck. 

If only half of "HHH" streaks are from players truly hot at the time, the true "hot hand" effect would have to be double what's observed, or 12 percentage points.

Well, 12 points is huge, by normal NBA standards. I can see it, maybe, in the context of muscle memory, like the uncontested, repeated shots in the Miller/Sanjurjo study -- but not in real life NBA action.

What if there were a 12-point "hot hand" effect in, say, field goal percentage in regular NBA games? Well, for all NBA positions, as far as I can tell, the difference between average and best is much less than 12 points. That would mean that when an average player is +12 points "hot," he'd be better than the best player in the league. 

Hence my skepticism. I'm willing to believe that a hot hand exists, but NOT that it's big enough to turn an average player into a superstar. That's just not plausible.

------

Suppose you discover that a certain player shoots 60% when he's on a three-hit streak, and 50% other times. How good is he when he's actually hot? Again, he's not "hot" every time he's on a streak, because streaks often happen just by random chance. So, the answer depends on *how often* he's hot. You need to estimate that before you can answer the question.

Let's suppose we think he's hot, say, 10 percent of the time.

So, to restate the question as a math problem:


"Joe Average is normally a 50 percent shooter, but, one time in ten, he is "hot", with a talent of P percent. You observe that he hits 60% after three consecutive successes. What's your best estimate of P?"

The answer: about 81 percent.

An 81 percent shooter will make HHH about 4.25 times as often as a 50 percent shooter (that's 81/50 cubed). That means that Joe will hit 4.25 streaks per "hot" game for every one streak per "normal" game.

However: Joe is hot only 1/9 as often as he is normal (10% vs. 90%). Therefore, instead of 425 "hot" HHH for every 100 "regular" HHH, he'll have 425 "hot" HHH for every *900* "regular" HHH.

Over 1325 shots taken right after an HHH, he'll be taking 425 with an expectation of 81 percent, and 900 with an expectation of 50 percent.

Combined, that works out to 794-for-1325, which is the observed 60%.
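The calculation above can be checked, and run in reverse, with a short Python sketch. It bakes in the same simplifications as the example: a player's talent is either his normal rate or a single "hot" rate, shots are otherwise independent, and a shot's chance of coming right after HHH is just the talent cubed. The function names are hypothetical.

```python
def hit_rate_after_streak(p_hot, p_normal, frac_hot, k=3):
    """Expected hit rate on the shot after k straight hits, for a player whose
    talent is p_hot a fraction frac_hot of the time and p_normal otherwise."""
    # How often each state produces a k-hit streak: talent ** k, weighted by
    # how often the player is in that state.
    w_hot = frac_hot * p_hot ** k
    w_normal = (1 - frac_hot) * p_normal ** k
    # The next shot is taken at the talent of whichever state produced the streak.
    return (w_hot * p_hot + w_normal * p_normal) / (w_hot + w_normal)


def implied_hot_talent(observed, p_normal, frac_hot, k=3, tol=1e-9):
    """Find the 'hot' talent that reproduces an observed post-streak hit rate.
    Bisection is safe here because the post-streak rate rises steadily with
    the hot talent once it is above p_normal."""
    lo, hi = p_normal, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if hit_rate_after_streak(mid, p_normal, frac_hot, k) < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Joe Average: 50% normally, observed 60% after HHH, for several guesses
    # at how often he is truly hot.
    for frac in (0.10, 0.25, 0.50):
        talent = implied_hot_talent(0.60, 0.50, frac)
        print(f"hot {frac:.0%} of the time -> {talent:.1%} when hot")
```

Given those inputs, it should reproduce the 81%, 71% and 64.6% figures used in this post, give or take rounding.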

Do you really want to accept that the "hot hand" effect turns an ordinary player into an 81-percent shooter? EIGHTY-ONE PERCENT? 

But that's what the assumptions imply. If you argue that:

-- player X is 50% normally;
-- player X is "hot" 10 percent of the time;
-- player X is expected to hit 60% after HHH

Then, it MUST FOLLOW that

-- player X is 81% when "hot".

To which I say: no way. I say, nobody is an 81% shooter, ever -- not Michael Jordan, not LeBron James, nobody. 

To posit that the increase from 50% to 60% is reasonable, you have to assume that an average player turns into an otherworldly Superman one day in ten, due to some ineffable psychological state called "hotness."  

-----

You can try tweaking the numbers a bit, if you like. What if a player is "hot" 25 percent of the time, instead of 10 percent? In that case,

-- player X is 71% when "hot".

That's not as absurd as 81%, but still not very plausible. What if a player is "hot" fully half the time? Now,

-- player X is 64.6% when "hot". 

That's *still* not plausible. Fifteen points is still superstar territory. Do you really want to argue that half the time a player is ordinary, but the other half he's Michael Jordan? And that nobody would notice without analyzing streaks?

Do you really want to assume that the variation in talent within a single player is wider than the variation of talent among all players?

-----

Let's go the other way, and start with an intuitive prior for what it might mean to be "hot." My gut says, at most, maybe half an SD of league talent. You can go from 50th to 70th percentile when everything is lined up for you -- say, from the 15th best power forward in the league to the 9th best. Does that sound reasonable?

In the NBA context, let's call that ... I have no idea, but let's guess five percentage points.* And let's say a player is "hot" one time in five. 

(* A reader wrote me that five percentage points is a LOT more than half an SD of talent. He's right; my bad. Still, that just makes this part of the argument even stronger.)

So: if one game in five, you were a 55% shooter instead of 50%, what would you hit after streaks?

-- For 1000 "hot" shots, you'd achieve HHH 166 times, and hit 91.3 of the subsequent shots.

-- For 4000 "regular" shots, you'd achieve HHH 500 times, and hit 250 of the subsequent shots.

Overall, you'd be 341.3 out of 666, or 51.25%.

In other words: a hot hand hypothesis that posits a reasonable (but still significant) five-point talent differential expects you to shoot only 1.25 percentage points better after a streak than your normal 50 percent.
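As a brute-force check on that arithmetic, here is a small Monte Carlo sketch. The setup is an assumption for illustration: games of 20 shots (a number I made up), a player who is hot in a random one game in five at 55% and otherwise shoots 50%, and streaks counted only within a game. It pools every shot that immediately follows three straight hits.

```python
import random


def simulate_post_streak_rate(n_games=50_000, shots_per_game=20, frac_hot=0.20,
                              p_hot=0.55, p_normal=0.50, k=3, seed=1):
    """Pooled hit rate on shots that immediately follow k straight hits in the
    same game, for a player who is 'hot' in a random fraction of games."""
    rng = random.Random(seed)
    after_streak = made_after_streak = 0
    for _ in range(n_games):
        # The player's talent is fixed for the whole game.
        p = p_hot if rng.random() < frac_hot else p_normal
        run = 0  # consecutive hits so far in this game
        for _ in range(shots_per_game):
            hit = rng.random() < p
            if run >= k:  # the previous k shots were all hits
                after_streak += 1
                made_after_streak += hit
            run = run + 1 if hit else 0
    return made_after_streak / after_streak


print(f"{simulate_post_streak_rate():.2%}")
```

It should come out within a few tenths of a point of 51.25%. The game length barely matters, because talent is constant within a game.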

Well, you need a pretty big dataset to make 1.25 points statistically significant. 30,000 attempts would do it: 6000 when "hot" and 24,000 when not hot.*

(* That's using binomial approximation, which underestimates the randomness, because the number of attempts isn't fixed or independent of success rate. But never mind for now.)

And even if you had a sample size that big, and you found significance ... well, how can you prove it's a "hot hand"? It's only 1.25 points, which could be an artifact of ... well, a lot of things other than streakiness.

Maybe you didn't properly control for home/road, or you used a linear adjustment for opponent quality instead of quadratic. Maybe the 1.25 doesn't come from a player being hot one game in five, but, rather, the coach using him in different situations one game in five. Or, maybe, those 20 percent of games, the opposition chose to defend him in a way that gave him better shooting opportunities. 

So, it's going to be really, really hard to prove a "hot hand" effect by studying performance after streaks.

------

But maybe there are other ways to analyze the data.

1. Perhaps you could look at player streaks in general, instead of just what happens in the one particular shot after a streak. That would measure roughly the same thing, but might provide more statistical power, since you'd be looking at what happens during a streak instead of just the events at the end. 

Would that work? I think it would at least give you a little more power. Dr. Miller actually does something similar in his three-point paper, with a "composite statistic" that measures other aspects of a player's sequences.

2. Instead of just a "yes/no" for whether to count a certain shot, you could weight it by the recent success rate, or the length of the streak, or something. Because, intuitively, wouldn't you expect a player to be "hotter" after HHHHHH than HHH? Or, even, wouldn't you expect him to be hotter after HHMHMHHMHHHMMHH than HHH? 

I'm pretty sure that kind of thing has been done before, that there are studies that try to estimate the next shot from the success rate in the X previous shots, or some such.
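Here is a rough sketch of that idea in its simplest form. Instead of a yes/no on HHH, it bins every shot by the player's hit rate over the previous X shots and reports the next-shot rate for each bin. The function name and the default window of five are my own assumptions, not taken from any particular study, and the sketch makes no correction for the kind of biased expectations described at the top of this post.

```python
def next_shot_by_recent_rate(shots, window=5):
    """Tally the hit rate on each shot, grouped by the player's hit rate over
    the previous `window` shots. `shots` is a sequence of 1 (hit) / 0 (miss)."""
    groups = {}  # hits in the previous window -> [attempts, makes]
    for i in range(window, len(shots)):
        recent = sum(shots[i - window:i])
        tally = groups.setdefault(recent, [0, 0])
        tally[0] += 1
        tally[1] += shots[i]
    # Map each recent hit rate to the observed hit rate on the following shot.
    return {recent / window: makes / attempts
            for recent, (attempts, makes) in sorted(groups.items())}


print(next_shot_by_recent_rate([1, 1, 1, 1, 0, 1, 1, 0, 0, 1], window=3))
```

If there is a hot hand, the next-shot rate should climb as the recent rate climbs. A weighted version would replace the bins with something like a regression on the recent rate.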

------

But, you can't fight the math: no matter what, it still takes ten haystacks of "hot hand talent" variation to produce a single needle of "streakiness." There just isn't enough data available to make the approach work. 

Having said that ... there's a non-statistical approach that theoretically could work to prove the existence of a real-life hot hand. 

In his e-mails to me, Dr. Miller said that basketball players believe that some of them are intrinsically streakier than others -- and that they even "know" which players those are. In an experiment in one of his papers, he found that the players named as "streaky" did indeed wind up showing a larger "hot hand" effect in a subsequent controlled shooting test.

If that's the case (I haven't read that paper yet), that would certainly be evidence that something real, and observable, is happening.

And, actually, you don't need a laboratory experiment for this. Dr. Miller believes that coaches and teammates can sense variations in talent from body language and experience. If that's the case, there must be sportswriters, analysts, and fans who can do this too.

So, here's what you do: get some funding, set up a website, and let people log on while watching live games to predict, in real time, which players are currently exhibiting a hot hand. If even one single forecaster proves to be able to consistently choose players who outperform their averages, you have your evidence.  
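To make "consistently choose players who outperform their averages" concrete, one way to score such a forecaster is sketched below. It's a hypothetical scoring rule, not anything from Dr. Miller's work. It assumes each flagged player's shots are independent at his usual rate, and it ignores shot difficulty and defensive attention, which a real test would need to control for.

```python
import math


def forecaster_z_score(picks):
    """Score a forecaster's 'hot' calls.

    picks: list of (attempts, makes, baseline_rate) tuples, one per player the
    forecaster flagged, where baseline_rate is that player's usual hit rate.
    Returns (makes above expectation, z-score) under a simple binomial null
    in which the flagged players shoot exactly at their usual rates.
    """
    expected = sum(attempts * rate for attempts, _, rate in picks)
    actual = sum(makes for _, makes, _ in picks)
    variance = sum(attempts * rate * (1 - rate) for attempts, _, rate in picks)
    excess = actual - expected
    return excess, excess / math.sqrt(variance)


print(forecaster_z_score([(10, 6, 0.50), (12, 8, 0.45), (8, 4, 0.55)]))
```

A single good night means little; the z-score only becomes convincing when it stays high across many games and many calls.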

-----

I'd be surprised, frankly, if anyone was able to predict significant overachievement in the long run. And, I'd be shocked -- like, heart attack shocked -- if the identified "hot" players actually did perform with "Superman" increases in accuracy. 

As always, I could be wrong. If you think I *am* wrong, that the "hot hand" is even half as significant a factor in real life as it is in the three-point contest, I think this would easily be your best route to proving it.




Tuesday, July 21, 2015

A "hot hand" is found in the NBA three-point contest

A recent paper provides what I think is rare, persuasive evidence of a "hot hand" in a sporting event.

The NBA Three-Point Contest has been held annually since 1986 (with the exception of 1999), as part of the NBA All-Star Game event. A pair of academic economists, Joshua Miller and Adam Sanjurjo, found video recordings of those contests, and analyzed the results.

They found that players were significantly more likely to make a shot after a series of three hits than otherwise. Among the 33 shooters who had at least 100 shots in their careers, the average player hit 54 percent overall, but 58 percent after three consecutive hits ("HHH").  

(UPDATE: the 58 percent figure is approximate: the study reports a hit rate four percentage points higher after HHH than after other sequences. Because the authors left out some of the shots in some of their calculations (as discussed later in this post), it might be more like 59% vs. 55%, or some such. None of the discussion to follow depends on the exact number.)

The authors corrected for two biases. I'll get to those in detail in a future post, but I'll quickly describe the most obvious one. And that is: after HHH, you'd expect a *lower than normal* hit rate -- that is, an apparent "mean-reverting hand" -- even if results were completely random. 

Why? Because, if a player hit exactly 54 of 100 shots, then, after HHH, the next shot must come out of what remains -- which is 51 remaining hits out of 97 remaining shots. That's only 52.6 percent. In other words, the hit rate not including the "HHH" must obviously be lower than the hit rate including "HHH". 

That might be easier to see if you imagine that the player hit only 3 out of 100 shots overall. In that case, the expectation following HHH must be 0 percent, not 3 percent, since there aren't enough hits to form HHHH!
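You can check the "mean-reverting hand" by brute force. This sketch assumes a player with a fixed record of 54 hits in 100 shots, shuffled into a random order, so there is no hot hand by construction. The names and the trial count are my own choices. For each shuffle, it computes the hit rate on shots that immediately follow HHH, then averages those rates across shuffles.

```python
import random


def mean_rate_after_streak(n_hits=54, n_shots=100, k=3, trials=10_000, seed=7):
    """Average, over random orderings of a fixed hit/miss record, of the
    proportion of hits on shots that immediately follow k straight hits."""
    rng = random.Random(seed)
    shots = [1] * n_hits + [0] * (n_shots - n_hits)
    rates = []
    for _ in range(trials):
        rng.shuffle(shots)  # purely random order: no hot hand by construction
        after = made = run = 0
        for s in shots:
            if run >= k:  # the previous k shots were all hits
                after += 1
                made += s
            run = run + 1 if s else 0
        if after:  # a shuffle with no HHH contributes no rate
            rates.append(made / after)
    return sum(rates) / len(rates)


print(f"{mean_rate_after_streak():.1%}")
```

The average should come out under 54 percent. It may well land under the 52.6 percent from the simple count, too, because averaging one rate per sequence adds a further downward tilt.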

After the authors corrected for this, and for the other bias they noted, the "hot hand" effect jumped from 4 percentage points to 6. 

------

UPDATE: Joshua Miller has replied to some of what follows, in the comments.  I have updated the post in a couple of places to reflect some of his responses.

------

That's a big effect, a difference of 6 percentage points. Maybe it's easier to picture this way:

Of the 33 players, 25 of them shot better after HHH than their overall rate. 

In other words, the "hot hand" beat the "mean-reverting hand" with a W-L record of 25-8. With the adjustments included, the hot hand jumps to 28-5.
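The post doesn't attach a significance level to those records, but a simple sign test gives a feel for them. This sketch assumes that, with no hot hand, each of the 33 players is an independent coin flip to land above or below his overall rate. For the raw 25-8 record, that assumption is, if anything, generous to the no-hot-hand side, since the bias described above tilts each player toward the mean-reverting side.

```python
from math import comb


def sign_test_p_value(wins, n):
    """One-sided probability of at least `wins` successes in n fair coin flips."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


print(sign_test_p_value(25, 33))  # raw 25-8 record
print(sign_test_p_value(28, 33))  # adjusted 28-5 record
```

The one-sided p-values come out to roughly 0.002 for 25-8 and 0.00003 for 28-5.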

------

Could the result be due to something other than a hot hand? Well, to some extent, it could be selective sampling of players.

In the contest, players shoot 25 attempts per round. To get to 100 attempts, and be included in the study, a shooter has to play at least four rounds in his career.  (By the way, here's a YouTube video of the 2013 competition.)

In any given contest, to survive to the next round, a player needs to do well in the current round. That means that players who got enough attempts were probably lucky early. That might select players who concentrated their hits in early rounds, compared to the late rounds, and create a bit of a "hot hand" effect just from that.

And I bet that's part of it ... but a very small part. Even if a player shot 60/60/50 in successive rounds, just by luck, that alone wouldn't be nearly enough to show an overall effect of 6 percentage points, or even 4, or (I think) even 1.

UPDATE: The authors control for this by stratifying by rounds, Dr. Miller replies.

------

One reason I believe the effect is real is that it makes much more intuitive sense to expect a hot hand in this kind of competition than in normal NBA play.

In each round of the contest, players shoot 5 consecutive balls from the same spot on the court, in immediate succession. That seems like the kind of task that would easily show an effect. It seems to me that a large part of this would be muscle memory -- once you figure out the shot, you just want to do exactly the same thing four more times (or however many balls you have left once you figure it out). 

After those five balls, you move to another spot on the arc for another five balls, and so on, and the round ends after you've thrown five balls from each of five locations. However, even though the locations move, the distances are not that much different, so some of the experience gained earlier might extend to the next set of five, making the hot hand even more pronounced.

There's one piece of the evidence that offers support for the "muscle memory" hypothesis. It turns out that the first two shots in each round were awful. The authors report that the first shot was made only 26 percent of the time, and the second shot only 39 percent. For the remaining twenty-three shots, the average success rate was 56 percent.

That "warm up" time is very consistent with a "muscle memory" hot hand.

-----

In fact, those first two shots were so miserable that the authors actually removed them from the dataset! If I understand the authors correctly, a player listed with 100 shots was analyzed for only 92 of those shots.

UPDATE: originally, I thought that rounds were stitched together, so removing those shots would increase observed streakiness from one round to the next. But Dr. Miller notes, in the comments, that they considered streaks within a single round only. In that case, as he notes, removing the first two shots has the effect of reducing "cold hand" streakiness, making the results more conservative.  

The removal of those shots, it seems to me, would be likely to overstate the findings a bit. The authors strung rounds together as if they were just one long series of attempts (even if they spanned different years; that seems a bit weird, that you'd say a player had a "hot hand" if he continued a 2004 streak in 2005, but never mind).

That means that when they string the last five shots of one round with the first five shots of the next, instead of something like


MHHHH MMHMH


they get 


MHHHH   HMH


which tends to create more streaks, since you're taking out shots that tend to be mostly misses, in the midst of a series of shots that tend to be mostly hits. ("M" represents a miss, as you probably gathered.)


I wonder if the significant effect the authors found would still have shown up without those omitted shots. I suspect it would have been, at least, significantly weaker. I may be wrong -- the authors showed streakiness both for hits and misses, so maybe the extra "MM" shots would have shown up in their "cold hand" numbers.
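The stitching concern can be seen on the toy sequences above. This small sketch tallies makes and attempts on shots immediately following three straight hits. It illustrates the mechanism as I originally described it; per the update above, the authors actually counted streaks within a single round only.

```python
def streak_tally(seq, k=3):
    """Return (makes, attempts) on shots that immediately follow k straight
    hits, for a string of 'H' (hit) and 'M' (miss)."""
    after = made = run = 0
    for shot in seq:
        if run >= k:  # the previous k shots were all hits
            after += 1
            made += shot == "H"
        run = run + 1 if shot == "H" else 0
    return made, after


print(streak_tally("MHHHH" + "MMHMH"))  # rounds stitched, all shots kept
print(streak_tally("MHHHH" + "HMH"))    # first two shots of round 2 dropped
```

With all shots kept, the stitched sequence is 1-for-2 after HHH; dropping the two opening misses of the second round makes it 2-for-3.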


------

I bet you'd find a hot hand if you tried the equivalent contest yourself. Position a wastebasket somewhere in the room, a few feet away. Then, stay in one spot, and try to throw wads of paper into the basket. I'm guessing your first one will miss, and you'll adjust your shot, and then you'll get a bit better, and, eventually, you'll be sinking 80 to 90 percent of them. Which means, you have a "hot hand" -- once you get the hang of it, you'll be able to just repeat what you learned, which means hits will tend to follow hits.

Here's a more extreme analogy. Instead of throwing paper into a basket, you're shown a picture of a random member of the Kansas City Royals, and asked to guess his age exactly. After your guess, you're told how far you were off. And then you get another random player (which might be a repeat).

Your first time through the roster, you might get, say, 1/3 of them right. The second time through, you'll get at least 2/3 of them right -- the 1/3 from last time, and at least half the rest (now that you know how much you were off by, you only have to guess which direction). The third time through, you'll get 100%.

So, your list of attempts will look something like this (H for hit, M for miss):

MMMHMHMHMMHHHMMHHMHMMMHMHHHHMHHHMMHHHHHHHHMHHHHHHHH...

Which clearly demonstrates a hot hand.

And that's similar to what I think is happening here. 
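The Royals example is easy to simulate. This is a toy model built on the numbers above, with my own assumptions filled in: a hypothetical 25-player roster, players drawn at random with repeats, a 1-in-3 chance on a first look, exactly a 50/50 chance on a second look after a miss (the text says "at least half"), and a sure hit once you've either guessed right or missed twice. It pools many such runs and compares the overall hit rate to the hit rate right after HHH.

```python
import random


def guessing_game(roster_size=25, attempts=150, rng=None):
    """Hit/miss record (True = exact age guessed) for a learner who gets
    feedback after every guess. Roster size and attempt count are hypothetical."""
    if rng is None:
        rng = random.Random()
    misses = [0] * roster_size      # misses so far on each player
    solved = [False] * roster_size  # True once you know a player's exact age
    record = []
    for _ in range(attempts):
        i = rng.randrange(roster_size)  # random player; repeats allowed
        if solved[i] or misses[i] >= 2:
            hit = True                  # you know the age exactly
        elif misses[i] == 1:
            hit = rng.random() < 0.5    # you know the error size, not its sign
        else:
            hit = rng.random() < 1 / 3  # first look at this player
        if hit:
            solved[i] = True
        else:
            misses[i] += 1
        record.append(hit)
    return record


def pooled_rates(records, k=3):
    """Overall hit rate, and hit rate on shots right after k straight hits."""
    shots = makes = after = made_after = 0
    for record in records:
        run = 0
        for hit in record:
            shots += 1
            makes += hit
            if run >= k:
                after += 1
                made_after += hit
            run = run + 1 if hit else 0
    return makes / shots, made_after / after


rng = random.Random(2015)
overall, after_hhh = pooled_rates([guessing_game(rng=rng) for _ in range(2000)])
print(f"overall {overall:.1%}, after HHH {after_hhh:.1%}")
```

The post-HHH rate comes out well above the overall rate, purely because streaks cluster late, after the learning is done.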

------

The popular belief, among sportswriters and broadcasters, is that the hot hand -- aka "momentum" or "streakiness" -- is real, that a team that has been successful should be expected to continue that way. But almost every study that has looked for such an effect has failed to find one.

That led to the coining of the term "hot hand fallacy" -- the belief that a momentum effect exists, when it does not. Hence the title of this paper: "Is it a Fallacy to Believe in the Hot Hand in the NBA Three Point Contest?"

So, does this study actually refute the hot hand fallacy? 

Well, it refutes it in its strongest form, which is the position that there NEVER exists a hot hand of ANY magnitude, in ANY situation. That's obviously wrong. You can prove it with the Kansas City Royals example, or ... well, you can prove it in your own life. If you score every word you misspelled as a miss, and the rest as a hit ... most of your misses are clustered early in life, when you were learning to read and write, so there's your hot hand right there.

The real "fallacy," as I see it, is not the idea that a hot hand exists at all, but the idea that it is a significant factor in predicting what's going to happen next. In most aspects of sports, the hot hand, when it does exist, is so small as to have almost no predictive value. 

Suppose a player has two kinds of days, equally likely and occurring at random -- "on," where he hits 60%, and "off," where he hits only 50%. That would give rise to a hot hand, obviously. But how big a hot hand? What should you predict as the chance of the player making his next shot?

Before the game, you'd guess 55% -- maybe he's on, or maybe he's off. But, now, he hits three straight shots. He has a hot hand! What do you expect now?

If my math is right, you should now expect him to shoot ... 56.3%. Not much different!
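The 56.3% figure is a single application of Bayes' rule, and a few lines of Python confirm it. The assumptions are the ones in the text: "on" and "off" days are equally likely before the game, the three hits all come on the same day, and shots are otherwise independent. The function name is mine.

```python
def after_streak(p_on=0.60, p_off=0.50, prior_on=0.5, k=3):
    """Posterior probability the player is 'on' after k straight hits, and the
    resulting expected hit rate on his next shot."""
    # Likelihood of k straight hits under each kind of day, times its prior.
    like_on = prior_on * p_on ** k
    like_off = (1 - prior_on) * p_off ** k
    post_on = like_on / (like_on + like_off)
    # Next shot: a mix of the two talents, weighted by the posterior.
    return post_on, post_on * p_on + (1 - post_on) * p_off


post_on, next_shot = after_streak()
print(f"P(on | HHH) = {post_on:.3f}, expected next shot = {next_shot:.1%}")
```

It reports a posterior of about 0.633 that he's "on," and an expected 56.3% on the next shot: three straight hits move the estimate barely more than a point.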

The "50/60 on/off" actually represents a huge variation in talent. The problem is that streaks are a weak indicator of whether the player is actually "on," versus whether he just had a lucky three shots. In real life, it's even weaker than a 1.3 percent indicator, because, for one thing, how do you know how long a player is "on" and how long he's "off"? I assumed a full game, but that's wildly unrealistic.

You can probably think of many reasons streakiness is a weak indicator. Here's just one more. 

The "56.3%" illustration was assuming that all shots were identical. In real life, if it's not a special case of a three-point contest ... well, when a player hits HHH, it might be evidence of a hot hand, but it also just could be that those shots were taken in easier conditions, that they were 60% shots instead of 50% shots because the defense didn't cover the shooter very well.

Real games are much more complicated and random than a three-point shooting contest. That's why I don't like the phrasing, that the authors of this NBA study found evidence of "THE hot hand effect". They found evidence of "A hot hand effect", one particular one that's large enough to show up in the contrived environment of a muscle-memory based All-Star novelty event. It doesn't necessarily translate to a regular NBA game, at least not unless you dilute it enough that it becomes irrelevant.

------

The "hot hand" issue reminds me of the "clutch hitting" issue. Both effects probably exist, but are so tiny that they're pretty much useless for any practical purposes. Academic studies fail to find statistically significant evidence, and treat that "absence of evidence" as evidence that no effect exists. We sabermetricians cheat a little bit, saving effort by saying there's "no effect" instead of "no effect big enough to measure."

So "no effect" becomes the consensus. Then, someone comes up with a finding that actually measures an effect -- this study for the hot hand, and "The Book" for clutch hitting. And those who never disbelieved in it jump on the news, and say, "Aha! See, I told you it exists!"  

But they still ignore effect size. 

People will still declare that their favorite hitter is certainly creating at least a win or two by driving in runs when it really counts. But now, they can add, "Because, clutch hitting exists, it's been proven!" In reality, there's still no way of knowing who the best clutch hitters are, and even if you could, you'd find their clutch contribution to be marginal.

And, now, I suspect, when the Yankees win five games in a row, the sportscasters will still say, "They have momentum! They're probably going to win tonight!" But now, they can add, "Because, the hot hand exists, it's been proven!" In reality, the effect is so attenuated that their "hotness" probably makes them a .501 expectation instead of .500 -- and, probably, even that one point is an exaggeration.  

My bet is: the "hot hand" narrative won't change, but now it will claim to have science on its side.



