Sabermetric Research: Jonathan Gibbs study unconvincing on NBA point shaving

The Tim Donaghy NBA scandal has reignited interest in point shaving in general. In that light, a bunch of sources – a New York Times column, the Freakonomics blog, and TWOW, to name three – have mentioned a new study by Jonathan Gibbs, an undergraduate at Stanford. ESPN.com has an excellent overview article on its TrueHoop blog.

The study is called "Point Shaving in the NBA: An Economic Analysis of the National Basketball Association's Point Spread Betting Market." Gibbs says he found reasonable grounds to suspect point shaving, and everyone (that I've read) seems to believe that the study does indeed show good evidence.

I disagree. Let me start by summarizing Gibbs' results (in between dotted lines), then I'll comment.

-----

Gibbs built a database of a large number of NBA games. He found that the favorite beat the spread almost exactly half the time, 49.95%. However, the larger the spread, the less likely the favorite would cover. If you bet underdogs of 10 or more points, you'll win 52.15% of your bets. At 13 points or more, 53.04% of underdogs covered. The betting market seems to be inefficient when it comes to games between unbalanced teams.

Taking home underdogs only, it's even worse. At 10 points or more, home underdogs had a .548 winning percentage against the spread. At 13 points or more, it was an unbelievably large .718 (although in only 42 games). You can apparently make very good money by betting on home underdogs. [There's a similar effect in the NFL, also – see here.]
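To put the 42-game sample in perspective, here is a minimal sketch of an exact binomial calculation. It rests on assumptions not in the study as quoted here: that .718 of 42 games means about 30 covers (the exact count isn't given), that pushes are ignored, and that each game is an independent 50-50 proposition under an efficient market. The function name is purely illustrative.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_games = 42                     # sample size quoted for home underdogs of 13+
covers = round(0.718 * n_games)  # assumption: nearest whole number of covers
tail = binomial_upper_tail(n_games, covers)

print(f"{covers} covers in {n_games} games")
print(f"chance of that many or more if every game is a coin flip: {tail:.4f}")
```

The result only says how surprising this one subgroup would be in isolation. It does not account for the fact that the subgroup was found by slicing the same database several different ways.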

If you divide all games into three groups, based on the size of the spread (0-6 points, 6.5-12 points, 12.5 points and up), you find a continuum. Narrow favorites tend to cover more than 50% of the time. Medium favorites cover almost exactly 50% of the time. And, as mentioned, heavy favorites cover less than 50% of the time.

Moreover, there is an interesting "last minute" effect. After 47 minutes of play (or 46, 45, 44, or 43), favorites in games with small spreads have actually covered less than half the time. But in the last minute, they tend to outscore their opposition by enough to go from a couple of points below the spread to a couple of points above it. That's because of the opposition's fouling strategy. If a team is up by 4 with less than a minute to play, the opposition will foul deliberately in order to get the ball back. On very rare occasions, the strategy works and they win. Usually, it just gives the team in the lead more points. And so, the favorite's 3-point lead often turns into 6 or 7 following the gift foul shots.

The same pattern appears for medium-spread games, but to a lesser extent – which makes sense, because fewer medium-spread games are within "fouling distance" towards the end.

But for games with large spreads, the effect goes the other way. Heavy favorites are a little bit ahead of the spread with a minute to go, but wind up behind the spread by the end of the game. The author believes this is evidence of point shaving by the favorite.

Finally, Gibbs runs a few regressions of the final score (relative to the spread) on a number of other variables. In several cases, the "point spread" variable is significant, which suggests to Gibbs that teams are aware of, and affected by, the spread.

-----

Okay, now my comments.

First, the shift by the heavy favorites from covering the spread after 47 minutes, to not covering the spread after 48. Isn't there a more likely explanation than point shaving? Couldn't it just be normal "garbage time" strategy? Up by 15 with a minute to go, the team leading will perhaps want to seal the victory by wasting as much of the 24 seconds as it can, rather than rubbing it in by trying to score more points. The opposition has no such motivation, and they might gain a few points in the last minute on the couple of possessions they get.

Second, Gibbs is puzzled that the point spread variable should be significant in regressions. But, of course it should, because it's a proxy for team quality. Suppose team X is tied with team Y with a minute to go. Who will win? From the information I gave, it's 50-50. But now suppose I tell you that X was a ten-point favorite over Y. Now who's more likely to win? Team X. Not because X cares about the spread, but because the spread means they're a better team than Y! Of course, it's only one minute, and so team X might only be a very slight favorite. But Gibbs' study includes an awful lot of games, which provides more than enough data for a slight advantage to become statistically significant.
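The "spread as a proxy for team quality" point can be sketched with a toy model. Everything here is an illustrative assumption rather than anything from Gibbs' paper: the favorite's expected margin is spread evenly over 48 minutes, the one-minute scoring margin is treated as normal with a made-up standard deviation of 2 points, and ties are ignored.

```python
from statistics import NormalDist


def favorite_wins_last_minute(spread: float, sd_one_minute: float = 2.0) -> float:
    """P(favorite outscores the underdog over a single minute).

    Assumes (hypothetically) that one minute is worth spread / 48 points
    on average, with normally distributed noise of sd_one_minute points.
    """
    expected_margin = spread / 48
    return 1 - NormalDist(mu=expected_margin, sigma=sd_one_minute).cdf(0)


for spread in (0, 5, 10, 15):
    p = favorite_wins_last_minute(spread)
    print(f"spread {spread:>2}: favorite wins the last minute {p:.1%} of the time")
```

Under these assumptions the edge for a ten-point favorite is only a few percentage points above 50%, which is small in any one game but easy to detect across thousands of games, with no point shaving involved.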

Third, I think Gibbs misinterprets the results of his regressions. In one model, he tries to estimate the probability of the favorite covering from the betting line and the five score margins with 1 to 5 minutes to go. He finds that "as the ... point spread becomes ... larger, the likelihood of ... covering decreases." (And he takes this as evidence of shaving.)

But that assumes all the other variables being equal: for instance, the amount of the lead (against the spread) after 47 minutes. Consider two situations where that margin is one point.

-- X is five points ahead of Y, against a 4-point spread, with one minute to go.
-- A is fifteen points ahead of B, against a 14-point spread, with one minute to go.

Which team is less likely to beat the spread? A, of course. They're 15 points up, so they're going to kill time, let B score a few points, and win by 11. X is only 5 points up, so they can't afford to do that, and they'll try to score and increase their margin.

As Gibbs says, "the betting line is a significant determinant for whether the favored team covers." Yes, but that's only because the way he set up his regression makes it a proxy for actual margin of victory. It has nothing to do with the spread at all!

Fourth, there's something about these regressions I don't understand. Take the one just described, for instance. It turned out that all the variables were significant – margin with 5 minutes to go, margin with 4 minutes to go, margin with 3 minutes to go, margin with 2 minutes to go, and margin with 1 minute to go. Why is that? If A is ahead by 4 points with a minute to go, what difference does it make how much they led by three minutes ago? Shouldn't only the 1-minute-to-go number be significant? If the score is 93-89, why does it matter how it got that way?

I suppose you could come up with some hypothetical ... maybe if you were 10 points up before, but only 1 point up now, you might have benched the unfocused regulars who blew 9 points in three minutes, and that might make the last minute go differently. But that doesn't seem right, and I wonder what's really going on in that regression.

Fifth, and very important: the fact that some teams beat the spread more or less often than 50% can't possibly be evidence of point shaving. Oddsmakers have studied thousands of games, and are the best in the world at what they do. If 6% of games were being fixed – or even 1% of games – the oddsmakers would have taken that into account. Even if they didn't know, or even suspect, that point shaving was happening, they would just notice that heavy favorites win by fewer points, and adjust accordingly.

Suppose that the oddsmakers figure that team X is good enough that they should beat team Y by 12 points. But, they know from studying past games that their model isn't good enough: because of point shaving, the average is 11 points. So they drop the line to 11. Now, the corrupt player, seeing that, will try to win by only 10. The oddsmaker figures that out, and drops the line to 10. The corrupt player now tries for 9. The oddsmaker matches, and the player drops to 8. On it goes, until the line has dropped so far that the player can just barely shave that many points without arousing suspicion. That's an equilibrium, and now the odds correct exactly for the probability of point-shaving.
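The back-and-forth just described can be traced with a few lines of code. This is only a sketch of the thought experiment: the function name and the `max_shave` cap (how many points a player could shave without arousing suspicion) are hypothetical, and real oddsmakers obviously don't move lines one point at a time in lockstep with a single player.

```python
def settle_line(fair_margin: int, max_shave: int) -> list[int]:
    """Trace the line as the oddsmaker chases a player aiming one point under it.

    fair_margin: margin the favorite would win by with honest play.
    max_shave:   hypothetical cap on points that can be shaved unnoticed.
    """
    line = fair_margin
    history = [line]
    floor = fair_margin - max_shave
    while line > floor:
        target = line - 1  # the player aims to finish one point under the line
        line = target      # the oddsmaker sees the pattern and matches it
        history.append(line)
    return history


print(settle_line(fair_margin=12, max_shave=4))  # [12, 11, 10, 9, 8]
```

Once the line reaches the floor, the player has no room left to get under it, and the line already reflects the shaving.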

So my argument is that looking at how often teams beat the spread can't possibly provide evidence of fixed levels of point-shaving. It can only show evidence of point shaving that's *unexpected* by the oddsmakers. Since Gibbs' sample of games is many seasons long, and oddsmakers are very, very good at what they do, the failure to beat the line can't automatically be attributed to cheating.

Finally, suppose you're a corrupt player trying to miss shots to come under the spread. Wouldn't you do as little as possible to ensure that result? Suppose the spread is 10, you're up by 9, and so you deliberately miss a shot with 30 seconds left. The opposition scores a three, and now you're only up by 6 with five seconds left. Wouldn't you stop trying to miss? No matter what, the opposition is going to cover the spread. You won't risk getting caught just to shave a few more points off a margin that is already safely under the spread.

And you're not going to cheat in the first or second quarter, when the game is close and you don't know how the score's going to wind up. For one thing, your team might lose outright because of your shaving. For another, the opposition might play so well that you don't have to take the risk of shaving at all.

To save your butt, you're not going to start deliberately missing until late in the fourth quarter. And even then, only when the game is close compared to the spread.

And so, if teams were regularly point shaving, you'd see an unusual shape when you plotted the results. You wouldn't expect to see the nice curve that Gibbs found. His graphs are smooth and symmetrical, just shifted left. Not only are -1 and -2 (against the spread) too high, but so are -5, -10, and -15. But why would anyone cheat on a -15 game?

Instead of smoothness, wouldn't you expect a big hump only close to zero, at minus 1 and 2 and 3? And wouldn't all those extra games come from games that were plus 1 and plus 2 and plus 3? Gibbs claims that 6% of lopsided games may involve point shaving. That should create a strikingly large bulge immediately to the left of zero, and a matching valley immediately to the right of zero. But that's not what we see.
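A rough picture of the hump-and-valley shape, as a minimal sketch. The numbers are assumptions chosen only for illustration and do not come from Gibbs' data: honest margins against the spread are modeled as normal with a standard deviation of 11 points, and half of the games that would have finished at +1 to +3 are flipped to the mirror-image margin.

```python
from statistics import NormalDist


def margin_distribution(sd: float = 11.0, lo: int = -20, hi: int = 20) -> dict[int, float]:
    """Share of games finishing at each whole-point margin against the spread."""
    honest = NormalDist(mu=0.0, sigma=sd)
    return {m: honest.cdf(m + 0.5) - honest.cdf(m - 0.5) for m in range(lo, hi + 1)}


def apply_shaving(dist: dict[int, float], flip_share: float, window: int = 3) -> dict[int, float]:
    """Move flip_share of the games at +1..+window to the mirror-image margin."""
    shaved = dict(dist)
    for m in range(1, window + 1):
        moved = dist[m] * flip_share
        shaved[m] -= moved
        shaved[-m] += moved
    return shaved


honest = margin_distribution()
shaved = apply_shaving(honest, flip_share=0.5)  # illustrative share, not an estimate

for m in range(-6, 7):
    bar = "#" * round(shaved[m] * 1000)
    print(f"{m:+3d}  honest {honest[m]:.4f}  shaved {shaved[m]:.4f}  {bar}")
```

In the printed bars, only the bins at -3 to -1 and +1 to +3 change; everything from -4 down and +4 up is untouched. A curve that is simply shifted left across its whole range looks nothing like that.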

---

It's definitely possible that I'm missing something, especially considering that so many people have read Gibbs' paper and found it convincing. But I don't think it gives any real evidence of point shaving whatsoever.

9 comments:

Pizza Cutter, Tuesday, July 31, 2007 11:21:00 AM
Phil, your statistical analysis (as well as the logic behind it) is dead on. It seems to me that any systematic analysis of point shaving, whether by referees or players, based on play-by-play data would run into the problem of looking for a signal in a very noisy field. (Do such PBP data exist for basketball?)

Even if there were "hits", my guess is that the sample size would be so small that the player/ref/coach could invoke plausible deniability ("Hey, I just happened to miss those three shots out of the hundreds I've missed in my career.") Now, if they were backed up with hard evidence from the FBI or IRS or whoever investigates such things, then we're talking.
zjelveh, Thursday, August 09, 2007 7:04:00 PM
Hi Phil,
Have you read this study? (I wrote it up on my blog here.) It involves point shaving in the NCAA, but I'd be very interested in your take. It argues that Wolfers was wrong on NCAA point shaving and that his results merely reflect the characteristics of the game. Needless to say, it has gotten very little attention. Thanks
Phil Birnbaum, Thursday, August 09, 2007 7:08:00 PM
Zubin,

Sure, I'll take a look. Thanks for the link!
Anonymous, Friday, August 10, 2007 4:26:00 PM
Phil,

Interesting comments. Do you know where play-by-play data might exist for the NBA so we can analyze this issue? ESPN.com's PBP contains errors in 20% of the posted games. Thanks.
Phil Birnbaum, Friday, August 10, 2007 4:28:00 PM
Nope, no idea. Anyone else know?
Phil Birnbaum, Friday, August 10, 2007 5:20:00 PM
Zubin,

Done! And posted here. Thanks for the tip!
dlb8685, Thursday, October 04, 2007 11:34:00 PM
I largely agree with you, except for the idea that bookies would adjust their spread down if they knew the game was fixed to be shaved. Here's why. Bookies don't care about accurately predicting the difference between two teams; their only job is to get an equal amount of money on both teams.

Say San Antonio plays Atlanta at home, and the spread is -12.5. That's not because the bookie thinks S.A. will win by 12 or 13, it's because if there are 500k people betting on the game, 250k are with the Spurs and 250k are with the Hawks. The people who benefit from the point shaving are bettors who are in the know. The bookie cleans up no matter what, and even if he did know, he'd probably do better to put his own money on Atlanta somewhere else than to move his whole line down to 9 or 10 to unbalance the betting. The second option would look awfully suspicious when S.A. won by 8 and everyone else had the game at 12.5. I hope that all makes sense.
Phil Birnbaum, Friday, October 05, 2007 12:32:00 AM
It does make sense, thanks. I was assuming that there was enough insider action to move the spread by necessity -- that is, so much money going on the underdog that the bookies have to change the odds to balance out the betting.

But if it's a smallish amount of insider wagering, I agree with you.

Tuesday, July 31, 2007