Team fatigue and the NBA playoffs, part II
Last month, a FiveThirtyEight study by Nate Silver found that, after winning their first round series 4-0, NBA teams significantly outperformed expectations in the second round, by about 3 points per game. Conversely, teams that took seven games in the first round fell well short in the second, underperforming expectations by a hefty 5.7 points per game.
The study argued that it's fatigue. Teams that sweep their series have lots of time to rest and recover, while the 4-3 teams have to jump right back in immediately.
I was skeptical of the fatigue explanation. Last week, I thought it might be just a mathematical anomaly. I posted about it, and then immediately realized that, while the analysis was correct, the effect wasn't nearly enough to explain what Nate got. In fact, it explained less than one-tenth.
So, I figured, if it's not that, maybe I can try to figure out what it really is. After a couple of days of working on simulations, I have an answer. Well, I think I have an answer for part of it, and an opinion for the other part.
First, there's the effect I talked about last post, where the expected point differential for a favorite should be artificially high because of how games are weighted. My estimate last post was kind of back-of-the-envelope, so I decided to use a simulation to get a better handle on it.
I created a conference of 15 teams, and assigned each a random point differential "talent," with a mean of zero and an SD of 4 points. Then, I played independent 82-game seasons for each, where a game was 100 possessions, two-point field goals only. After the season, I ranked the teams by W-L record, and had the top 8 make the playoffs. I paired them off 1-8, 2-7, 3-6, and 4-5, and played a first-round best-of-seven series. I ranked the four winners, paired them up 1-4 and 2-3, and played a second-round best-of-seven series.
After all that, I took the four second-round teams -- actually, close to 120,000, because I ran 30,000 repetitions -- and compared their simulated second-round point differential to their talent. I expected the teams that went 4-0 in the first round would appear to exceed their talent in the second. (If they did so, the only possible reason would be the weighting anomaly, because of the way the simulation was set up.)
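The setup described above can be sketched in code. Everything here is my own reconstruction: the function names are mine, I assume each "independent" regular-season game is played against a league-average (talent-zero) opponent, I map talent to shooting by giving each team a make probability of 0.5 + talent/200 (so a team's expected margin over an average opponent equals its talent), and I break tied games (possible, since all scores are even) with a coin flip.

```python
import random
from collections import defaultdict

POSSESSIONS = 100   # possessions per team per game, 2-point attempts only
GAMES = 82          # regular-season length
TALENT_SD = 4       # SD of team talent, in points per game

def play_game(talent_a, talent_b):
    """Return team A's margin. Assumed mapping: make probability is
    0.5 + talent/200, so expected margin vs. an average team = talent."""
    p_a = 0.5 + talent_a / 200
    p_b = 0.5 + talent_b / 200
    score_a = 2 * sum(random.random() < p_a for _ in range(POSSESSIONS))
    score_b = 2 * sum(random.random() < p_b for _ in range(POSSESSIONS))
    return score_a - score_b

def play_decisive_game(talent_a, talent_b):
    """Ties go to a coin flip -- an assumption; the post doesn't say."""
    m = play_game(talent_a, talent_b)
    a_won = m > 0 or (m == 0 and random.random() < 0.5)
    return a_won, m

def play_series(talent_a, talent_b):
    """Best-of-seven. Return (a_won, games_played, a_total_margin)."""
    wins_a = wins_b = margin = 0
    while wins_a < 4 and wins_b < 4:
        a_won, m = play_decisive_game(talent_a, talent_b)
        margin += m
        wins_a += a_won
        wins_b += not a_won
    return wins_a == 4, wins_a + wins_b, margin

def one_repetition():
    """One season plus two playoff rounds. Returns (first-round games,
    second-round margin per game minus talent) for the four survivors."""
    talents = [random.gauss(0, TALENT_SD) for _ in range(15)]
    # independent seasons: each game assumed vs. an average (talent-0) team
    wins = [sum(play_decisive_game(t, 0)[0] for _ in range(GAMES))
            for t in talents]
    seeds = sorted(range(15), key=lambda i: -wins[i])[:8]   # top 8 make it
    winners = []
    for hi, lo in [(0, 7), (1, 6), (2, 5), (3, 4)]:         # 1-8, 2-7, 3-6, 4-5
        a, b = seeds[hi], seeds[lo]
        a_won, games, _ = play_series(talents[a], talents[b])
        winners.append((a if a_won else b, games))
    winners.sort(key=lambda w: -wins[w[0]])                 # re-seed; 1-4, 2-3
    out = []
    for (a, ga), (b, gb) in [(winners[0], winners[3]), (winners[1], winners[2])]:
        _, games, margin = play_series(talents[a], talents[b])
        out.append((ga, margin / games - talents[a]))
        out.append((gb, -margin / games - talents[b]))
    return out

# tally second-round excess over talent, grouped by first-round length
tally = defaultdict(list)
for _ in range(50):    # the post used 30,000 repetitions
    for g, excess in one_repetition():
        tally[g].append(excess)
```

With 30,000 repetitions instead of 50, averaging each bucket of `tally` reproduces the kind of table shown below.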
They did appear to score higher than their talent, but not by much:
4 games: +0.17 points/game
5 games: +0.03
6 games: -0.03
7 games: -0.11
I found a difference of 0.28 points between the 4-0 teams and the 4-3 teams. The FiveThirtyEight study found 8.7 points. So, the logic is right, but the magnitude is nowhere near enough to explain the real-life differential.
There's another good reason you'd expect the 4-0 teams to outperform in the second round -- their first round gives us more information about the team. Specifically, the first round sweep suggests that the team is probably better than our original talent estimate.
The FiveThirtyEight study used season SRS as their estimate of talent. That's the regular-season point differential after adjusting for strength of schedule. Like any other observed performance, it's subject to randomness, and will vary from true talent.
Taking the simplified "100 possessions, 2-point attempts only" model, you can calculate that the SD of single game point differential is 14.1 points (10 times the square root of 2). For a season average, you divide that by the square root of 82, which gives 1.56. That means that even in this oversimplified model, the typical team's SRS is more than 1 point different from its true talent.
Some teams' SRSses are underestimates, and some are overestimates. The teams that go 4-0 are now more likely to be underestimates. So, you'd expect them to outperform in the next round.
To check, I re-ran the simulation, but, this time, instead of checking whether 4-0 teams performed better than their talent, I checked whether they performed better than their 82-game SRS. As expected, they did. But the effect was still pretty small:
4 games: +0.22 points/game
5 games: +0.28
6 games: -0.09
7 games: -0.35
We're up to 0.58 points difference between 4-0 and 4-3, still far short of FiveThirtyEight's finding of 8.7 points.
However: the simulation is still missing some hidden talent variation. For one thing, it assumes team talent is exactly the same every game. But that's not the case. Aside from home court advantage (which I ignored, because it wouldn't change the results much), there are things like injuries, trades, changes in player talent as they learn, and so forth.
For instance: if a team acquires a star player worth 2 points a game halfway through the season, its single full-season SRS will blend the "before" and "after." As a result, the overall SRS for the year will be 1 point short of the team's talent at playoff time.
To simulate that, I introduced a "playoff variation" factor. For each of the eight playoff teams, I tweaked their talent by a random number of points, mean 0 and SD 1. The results:
4 games: +0.39 points/game
5 games: +0.17
6 games: -0.08
7 games: -0.38
Larger again. Now, the difference is up to 3/4 of a point.
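In the simulation, the "playoff variation" factor is just a one-line jitter applied to each playoff team's talent (the function name is mine):

```python
import random

PLAYOFF_SD = 1   # bumped to 2 for the second run below

def playoff_talent(regular_season_talent):
    """Jitter a team's talent going into the playoffs, standing in for
    injuries, trades, mid-season improvement, and so on."""
    return regular_season_talent + random.gauss(0, PLAYOFF_SD)
```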
When I up the "playoff variation" to have an SD of 2, it gets bigger still:
4 games: +0.75 points/game
5 games: +0.17
6 games: -0.19
7 games: -0.63
Now, we're up to 1.38 points. Still well short of 8.7, but enough that we would be able to say that this is at least *part* of what the FiveThirtyEight study found.
This suggests that, if fatigue isn't the explanation, maybe it has something to do with differences between SRS and actual talent.
Well, I finished writing all of that, and then I thought, hey, we have an easy way to figure out how well SRS estimates talent -- the Vegas betting line! So I went to check on those, and now I'm writing the rest of this post two days later.
In the FiveThirtyEight study, which covered 2003-2013, there were 17 teams that went 4-0 in the first round.
In 2013, the Heat took on the Bulls in the second round after sweeping the first. SRS ratings had the Heat at 7.04 point-per-game favorites over the Bulls. But the Vegas line had them at 9.5 points better. (The betting line was 13 points, and I subtracted 3.5 for home court advantage.)
So, we can say Vegas estimated the Heat as 2.46 points better (relative to Chicago) than their SRS estimate. I'll chart that like this:
                     Vegas    SRS    Diff
2013 Heat/Bulls       +9.5  +7.04   +2.46
Notes: (a) The Vegas numbers may vary, because I used more than one site, and they might vary by a half point here and there. (b) For all Vegas lines, I looked only at the first game of the series. (c) It's conventional to write a Vegas favorite as "-9.5", but I'm going to use "+9.5" for consistency with SRS; hope that's not too confusing. (d) I used 3.5 points for home court advantage. (e) I'm going to talk about how SRS rated a single team (the one I'm talking about at the time), even though it's actually how SRS rated that team minus how SRS rated the opposing team. That's just to make things easier to read.
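Per those notes, converting a betting line into the chart's "Diff" column works like this (the function name is mine):

```python
HOME_COURT = 3.5   # home court advantage, from note (d)

def vegas_minus_srs(home_line, srs_diff):
    """Convert the home team's betting line into a neutral-court Vegas
    estimate, and compare it to the SRS differential."""
    vegas = home_line - HOME_COURT
    return vegas, vegas - srs_diff

# 2013 Heat/Bulls: Heat favored by 13 at home, SRS edge +7.04
vegas, diff = vegas_minus_srs(13, 7.04)   # +9.5 and +2.46
```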
Here's all seventeen of the 4-0 teams:
4-0 teams            Vegas    SRS    Diff
2013 Heat/Bulls       +9.5  +7.04   +2.46
2013 Spurs/Warriors   +6    +5.35   +0.65
2012 Spurs/Clippers   +8    +4.36   +3.64
2012 Thunder/Lakers   +4.5  +5.32   -0.82
2011 Celtics/Heat     -1.5  -1.93   +0.43
2010 Magic/Hawks      +5.5  +2.78   +2.72
2009 Cavs/Pistons     +8    +6.97   +1.03
2008 Lakers/Jazz      +4.5  +0.47   +3.13
2007 Bulls/Pistons    -1.5  +0.84   -2.34
2007 Pistons/Bulls    +1.5  -0.84   +2.34
2007 Cavs/Nets        +2.5  +4.33   -1.83
2006 Mavs/Grizzlies   -1.5  -0.73   -0.77
2005 Suns/Mavs        +3    +1.23   +1.77
2005 Heat/Wizards     +7.5  +6.48   +1.02
2004 Spurs/Lakers     +1.5  +3.16   -1.66
2004 Nets/Pistons     -2    -3.16   +1.16
2004 Pacers/Heat      +8    +5.06   +2.94
--------------------------------------------
Average               +3.74 +2.81   +0.93
For those 17 series, the bookmakers rated the average favorite 0.93 points better than their SRS differential. In other words, the 4-0 teams were a point better than the FiveThirtyEight study gave them credit for. That explains roughly one point of the three points by which FiveThirtyEight found the favorites outperforming.
Does that mean the "fatigue" effect can now only be two points? Not necessarily. You could still argue that the reason for the extra Vegas point is that bookmakers and bettors *knew* about the fatigue factor, and adjusted their expectations accordingly.
But, in that case, you could also ask, why did Vegas only adjust for one point out of the three? Actually, I don't think even the one point is fatigue adjustment. There's a better explanation, in my view.
Suppose the +0.93 was all fatigue adjustment. In that case, if we repeat the chart for the first round, the discrepancy should be zero, right? Because all teams had roughly equal rest before the first round.
It's not zero. I won't give you the full chart, but the first round average is still positive, at +0.49 points.
You could still defend fatigue. You could say, sure, maybe SRS was wrong by +0.49 points all along, but the remaining +0.44 points for the second round favorites could still be Vegas acknowledging the fatigue factor.
But, I think there's a better explanation for that +0.49 points. After the first round, we have good reason to believe those teams are better than we thought before the first round. After all, they just swept.
The 17 teams were six-point favorites, on average, in the first round. According to a simulation I did, when a +6 team wins, it wins by an average of 14 points. (When it loses, it loses by 9.5 points, but that doesn't factor into these 4-0 series.)
If you add four +14 games to a +6 regular season, it increases the SRS from 6.00 to 6.37 -- almost exactly the +0.44 the favorites moved.
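That update is just a weighted average of the 86 games now on the books:

```python
# updating a +6 SRS with a four-game sweep at +14 per game
old_games, old_srs = 82, 6.00
sweep_games, sweep_margin = 4, 14
new_srs = (old_games * old_srs + sweep_games * sweep_margin) \
          / (old_games + sweep_games)
# 548 / 86, about +6.37
```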
So, I think it's not fatigue that the lines were correcting for, but just new evidence for what the team talent was all along.
But we still have those remaining 2.07 points to account for. Actually, let's adjust for the mathematical anomaly effect, which is .17 points. (It's not a big adjustment, but I did all that work, dammit, and I don't want to waste it.)
That brings us down to 1.9 points. Where did those come from?
Well, it could just be luck. If the SD of a single game point differential is 14, the SD of the average of 56 independent games would be 1.87 points. In that light, the 1.9 point differential is only one SD.
Actually, it's probably a bit more than that. Two of the 17 series were actually identical, just in reverse -- when the 2007 Bulls faced the Pistons, after they both went 4-0 in the first round. Since those two teams have to cancel to zero, the effect is bigger than it looks, perhaps by a factor of the square root of 17/15. (The observed effect rises by 17/15, and the SD rises by only the square root of 17/15.)
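For reference, the two calculations in those paragraphs:

```python
import math

sd_game = 14
sd_56_avg = sd_game / math.sqrt(56)   # ~1.87 points, so +1.9 is about 1 SD

# 17 observed series, but the two 2007 mirror images leave only
# ~15 independent ones; the apparent effect inflates by roughly:
inflation = math.sqrt(17 / 15)        # ~1.06
```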
Still, it's not statistically significant by normal standards, if that's what you like to look at.
Another way of looking at the same discrepancy: in the second round, the 17 teams went a combined 50-41 against the spread. Eliminating the two duplicates, they went 44-35. That's also about one SD away from .500, which you'd expect, since the W-L and point differential are essentially two ways of looking at the same result.
It still *could* be fatigue, but I think you need better evidence than 44-35.
Now let's look at the fourteen 4-3 teams, the ones that underperformed in the second round:
4-3 teams            Vegas    SRS    Diff
2013 Bulls/Heat       -9    -7.04   -1.96
2012 Clippers/Spurs   -8    -4.85   -3.15
2012 Lakers/Thunder   -4    -4.48   +0.48
2010 Hawks/Magic      -5.5  -2.68   -2.82
2009 Hawks/Cavs       -8    -6.97   -1.03
2009 Celtics/Magic    -2    +0.95   -2.95
2008 Celtics/Cavs     +6    +9.84   -3.84
2007 Jazz/Warriors    +0.5  +3.07   -2.57
2006 Suns/Clippers    +2    +3.73   -1.73
2005 Pacers/Pistons   -5    -2.82   -2.18
2005 Mavs/Suns        -3    -1.23   -1.77
2004 Heat/Pacers      -8    -4.80   -3.20
2003 Mavs/Kings       -5    +1.22   -6.22
2003 Pistons/76ers    -5.5  +1.22   -6.72
--------------------------------------------
Average               -3.89 -1.06   -2.73
These are much bigger SRS errors than for the 4-0 teams ... compared to the bookies, SRS overestimated the teams by an average 2.7 points. The FiveThirtyEight study found a 5.7 point difference, which leaves three points unexplained -- 2.8 points after adjusting for the mathematical anomaly.
SRS had also rated those teams higher than the bookies in the first round -- but only by 0.45 points. That's only 1/6 of the full effect. So, this time, if you believe Vegas is adjusting for fatigue, you have a better argument. (But, again: why did bettors only adjust by half the observed effect?)
The 2.8-point shortfall, relative to Vegas, resulted in those teams going 29-45 against the spread. That's a bit less than 2 SDs below .500. (Actually, by "30 points equals one win," 2.8 points works out to 30-44. So there's one game Pythagorean error.)
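Treating each game against the spread as a fair coin, the z-scores quoted in these paragraphs come out like this:

```python
import math

def ats_z(wins, losses):
    """z-score of a record against the spread vs. a fair-coin .500."""
    n = wins + losses
    return (wins - n / 2) / math.sqrt(n * 0.25)

# 29-45 is about 1.9 SDs below .500; 44-35 is about 1 SD above
```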
So we have 44-35 for the sweeping teams, and 29-45 for the seven-game teams. Even if they aren't significant individually, doesn't the *combination* of the two suggest something real is going on?
Not as much as it seems, because the 4-0 and 4-3 results aren't independent. The 4-0 teams played the 4-3 teams some of the time. So, some of the extreme results are counted in both samples.
In 2010, the Magic went 4-0 in the first round, while the Hawks went 4-3. When they faced each other next round, Orlando absolutely crushed Atlanta, with an *average* score of 107-82. That's 22 points per game more than expected.
That shows up as a +22 for the 4-0 teams, and a -22 for the 4-3 teams. You can see those as the two most extreme dots in the FiveThirtyEight chart.
In fact, exactly half the series are independent, and half are exact mirror images, except for which column of the chart they appear in. In the chart, for every dot, you'll find a mirror image dot in one of the four columns somewhere.
If I erased the 4-3 column from the chart, you could reproduce it perfectly. You'd just find all the dots in the first three columns that aren't offset by a mirror-reflection dot, and those must be the ones that go in the last column.
You can still say, "Not only did the 4-0 teams outperform, but, also, the 4-3 teams underperformed!" But that's like saying, "Not only did the Magic score 20 more points than the Hawks last night, but the Hawks also scored 20 fewer points than the Magic!" Well, not exactly, because not every 4-0 team played a 4-3 team -- some of them played 4-1 teams and 4-2 teams. So it's only partially like that. But still enough that you have to keep it in mind.
In the FiveThirtyEight study, the traditional evidence for significance comes when they do a regression, and they find a "first round games" effect that's 3 SDs from the mean. But that's an overestimate of the significance, for the reasons discussed:
1. The series aren't independent; half are duplicates of the other half.
2. The regression doesn't adjust for the mathematical weighting anomaly, which is around 0.15 points for the first and last columns.
3. The regression doesn't adjust for SRS under/overestimating Vegas by the 0.5 points we should all be able to agree on (looking at the first round, where fatigue didn't apply).
4. The regression doesn't adjust for the fact that our estimate of team skill should change for the second round, even independent of fatigue, because of the evidence of how they played in the first round.
After all that, what's left?
1. The unexplained observed point differentials: +1.85 points for the first group, -2.8 points for the second group. Or, equivalently converted to wins against the spread: the unexplained record of 44-35 for the sweeping teams, and 29-45 for the seven-game teams.
2. The possible argument that the difference between SRS and Vegas in the second round -- after subtracting off the difference in the first round, and the new information about team talent -- might be evidence that Vegas is adjusting for fatigue.
Even without a formal significance test for what's left, those effects seem small enough to me that they could just be random luck.
Going from data interpretation to personal opinion, here's my argument for it being just random:
(a) It's probably not significant at the 5% level, or, at best, just barely.
(b) It's rare to find a large effect that bookies and sharp bettors haven't also found.
(c) Nate said the effect didn't repeat for other rounds.
(d) This year failed to follow the pattern (after the study appeared). The 4-3 Pacers performed against the 4-0 Wizards exactly as SRS predicted. The 4-0 Heat performed against the 4-3 Nets a tiny bit worse than predicted. And the 4-3 Spurs handily beat expectations against the 4-2 Trail Blazers. (The remaining two 4-3 teams faced each other, cancelling out.)
(e) The difference is so huge that it's just implausible on its face. Even Nate doesn't really believe it: "the effects are so pronounced I don't trust them." Taking the results at face value would have made Indiana a 35% underdog to win its series against Washington, instead of a 76% favorite. (The Pacers wound up winning, 4 games to 2.)
(f) It seems implausible that a fatigue advantage would persist throughout the entire second round. The first game or two, maybe. But, the numbers are too big for that. A 3-point-per-game advantage over a 5-game series is 15 points overall ... and there's no way one team could have a 7.5 point advantage in the first two games, or a 15-point advantage in the first game.
Feel free to disagree with me.
(P.S. Credit to regular commenter GuyM for suggestions in an e-mail conversation we had. Guy was big on the "teams may be different in the playoffs from the regular season" explanation, which turns out to be the important one.)