Another NBA race study
A recent academic paper claims to prove that NBA coaches discriminate in favor of players of their own race, by giving them extra playing time. (It's been mentioned in the press here and here.)
Unlike some of the other race studies I've written about, where the problems were subtle, the problem with this one is obvious.
The authors start by showing differences between white and black players. From 1996-97 to 2004-05 -- the seasons covered -- the black players performed better than the white players. In the average of their previous 20 games, they scored 1.4 more points per 48 minutes, and had more assists and steals. The white players, on the other hand, committed fewer turnovers, and grabbed more rebounds.
The white players' advantages seem smaller, and that's borne out by the fact that the black players got 4.8 minutes more playing time per game.
So, the black players seem generally better than the white players.
Having shown that, the authors now run a regression on a whole bunch of stuff -- including performance stats -- to predict minutes played.
Before the regression, the black players got 4.8 more minutes on the floor than the white players. After the controls, though, it goes the other way: being black *decreased* playing time by 2.9 minutes.
Clearly, the regression doesn't do a particularly good job predicting minutes played.
Remember, the most basic comparison possible showed that the black players played 4.8 minutes more than the whites. After controlling for everything in the regression, the difference is still 2.9 minutes, just in the opposite direction. Even after all that regressing, there still appear to be large unexplained differences between whites and blacks.
It's pretty obvious why this might happen: playing time isn't linear. If you have twice the per-minute stats of Kobe Bryant, you're not going to play twice his minutes, 80 a game. And if you have only 1/10 his stats, you're not going to play a tenth of his minutes, 4 a game: you're going to be out of basketball.
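To make the nonlinearity concrete, here's a small sketch contrasting a proportional rule with a bounded one. It assumes Kobe plays about 40 minutes a game (which is what the 80-minute and 4-minute figures imply), and the bounded rule, including its replacement-level cutoff, is purely hypothetical, not anything estimated in the paper.

```python
# Assumption: Kobe plays about 40 minutes a game, implied by the 80 / 4 figures.
KOBE_MINUTES = 40.0
GAME_MINUTES = 48.0


def linear_minutes(stat_ratio: float) -> float:
    """Minutes if playing time scaled in proportion to per-minute stats
    (stat_ratio = player's stats divided by Kobe's)."""
    return KOBE_MINUTES * stat_ratio


def bounded_minutes(stat_ratio: float, replacement_ratio: float = 0.5) -> float:
    """Hypothetical nonlinear rule: below an illustrative replacement-level
    cutoff a player gets no minutes at all; above it, minutes rise with
    stats but can never exceed the length of a 48-minute game."""
    if stat_ratio < replacement_ratio:
        return 0.0
    return min(GAME_MINUTES, KOBE_MINUTES * stat_ratio)


for ratio in (0.1, 1.0, 2.0):
    print(ratio, linear_minutes(ratio), bounded_minutes(ratio))
```

A straight line fit to data generated by something like the bounded rule will miss in systematic ways at the extremes, which is the kind of error the rest of the argument depends on.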
So the model is very poor. And, since whites and blacks appear to be very different in their statistical characteristics, the model is inaccurate for them in different ways.
So if the black players get 2.9 minutes less playing time than the regression thinks they should, it's probably because the model overweights the things black players do, and underweights the things white players do.
In summary, the model the authors used overpredicts for black players -- even ignoring the race of the coach.
So, what happens when the authors include a dummy for the player and coach being of a different race?
Well, most coaches are white, and most players are black. So, "white-coach/black-player" is going to be much more frequent than "black-coach/white-player". If 70 percent of coaches are white and 70 percent of players are black, and players end up with coaches regardless of race, the "different race" bucket is going to be 84 percent black players.
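Here's the 84 percent figure worked out as a quick sketch. It assumes, as the example implicitly does, that coach race and player race are paired independently; the 70/30 split is the illustrative one from the text, not a figure from the paper.

```python
def black_share_of_different_race(p_white_coach: float, p_black_player: float) -> float:
    """Share of 'different race' coach/player pairings in which the player is black,
    assuming coaches and players are paired independently of race."""
    white_coach_black_player = p_white_coach * p_black_player
    black_coach_white_player = (1 - p_white_coach) * (1 - p_black_player)
    return white_coach_black_player / (white_coach_black_player + black_coach_white_player)


share = black_share_of_different_race(0.7, 0.7)
print(f"{share:.1%}")
```

That's 0.49 / (0.49 + 0.09), or about 84.5 percent.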
And we know the model overpredicts for black players. And that's why, when the player is of a different race than the coach, he gets less playing time than the model thinks he should. Because, 84 percent of the time, he's black, and the model is biased too high for black players.
It's not necessarily race bias at all; it's just a consequence of having a bad model.
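Here's a minimal sketch of that mechanism, using hypothetical inputs: the 70/30 split from the example, a model that overpredicts every black player by 2.9 minutes, and a model that's unbiased for white players. By construction, the coach's race has no effect on minutes at all. Residuals here mean actual minus predicted minutes.

```python
from itertools import product

# Illustrative inputs, not estimates from the paper:
P_WHITE_COACH = 0.7     # the 70/30 example
P_BLACK_PLAYER = 0.7
BLACK_RESIDUAL = -2.9   # actual minus predicted: model overpredicts black players
WHITE_RESIDUAL = 0.0    # hypothetical: model unbiased for white players


def mean_residual_by_bucket() -> dict:
    """Average (actual - predicted) minutes for same-race and different-race
    pairings. Residuals depend only on the player's race, never the coach's."""
    sums = {"same": 0.0, "different": 0.0}
    weights = {"same": 0.0, "different": 0.0}
    for coach, player in product(("white", "black"), repeat=2):
        p_coach = P_WHITE_COACH if coach == "white" else 1 - P_WHITE_COACH
        p_player = P_BLACK_PLAYER if player == "black" else 1 - P_BLACK_PLAYER
        weight = p_coach * p_player  # independent pairing of coach and player
        residual = BLACK_RESIDUAL if player == "black" else WHITE_RESIDUAL
        bucket = "same" if coach == player else "different"
        sums[bucket] += weight * residual
        weights[bucket] += weight
    return {bucket: sums[bucket] / weights[bucket] for bucket in sums}


means = mean_residual_by_bucket()
gap = means["different"] - means["same"]
print(f"same: {means['same']:.2f}  different: {means['different']:.2f}  gap: {gap:.2f}")
```

The same-race bucket is half black players and the different-race bucket is about 84.5 percent black, so the different-race players come out roughly a minute further below the model's prediction: 2.9 × (0.845 − 0.5) ≈ 1.0. If you regressed these residuals on an intercept and a "different race" dummy, the dummy's coefficient would be exactly that gap, even though no coach is doing anything.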
In PowerPoint form:
-- the model overpredicts for black players;
-- the "different race" case is overwhelmingly black players;
-- the model overpredicts for the "different race" case.
That's really all that's going on here.