## Sunday, November 25, 2012

### Another NBA race study

A recent academic paper claims to prove that NBA coaches discriminate in favor of players of their own race, by giving them extra playing time.  (It's been mentioned in the press here and here.)

Unlike some of the other race studies I've written about, where the problems were subtle, the problem with this one is obvious.

------

The authors start by showing differences between white and black players.  From 1996-97 to 2004-05 -- the seasons covered -- the black players performed better than the white players.  In the average of their previous 20 games, they scored 1.4 more points per 48 minutes, and had more assists and steals.  The white players, on the other hand, committed fewer turnovers, and grabbed more rebounds.

The white players' advantages seem smaller, and that's borne out by the fact that the black players got 4.8 minutes more playing time per game.

So, the black players seem generally better than the white players.

Having shown that, the authors now run a regression on a whole bunch of stuff -- including performance stats -- to predict minutes played.

Before the regression, the black players got 4.8 minutes more on the floor than the white players.  After the controls, though, it goes the other way: being black *decreased* playing time by 2.9 minutes.

Clearly, the regression doesn't do a particularly good job predicting minutes played.

Remember, the most basic comparison possible showed that the black players played 4.8 minutes more than the whites.  After controlling for everything in the regression, there's still a difference of 2.9 minutes, now in the opposite direction.  Even after all that regressing, there still appear to be large unexplained differences between whites and blacks.

It's pretty obvious why this might happen: playing time isn't linear.  If you have twice the per-minute stats of Kobe Bryant, you're not going to play 80 minutes a game.  And if you have only 1/10 the stats, you're not going to play 4 minutes: you're going to be out of basketball.

So the model is very poor.  And, since whites and blacks appear to be very different in their statistical characteristics, the model is inaccurate for them in different ways.
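Here's a toy sketch of that mechanism. Everything in it is made up for illustration: the `true_minutes` rule (two minutes per unit of production, capped at 40), the two groups' production numbers, and the one-variable straight-line fit standing in for the authors' much bigger regression. Neither group is treated differently in the data, but the straight line still misses them in opposite directions.

```python
# Toy illustration only: the cap, the production values, and the group
# names are hypothetical, not taken from the paper's data.

def true_minutes(production):
    """Playing time rises with production but is capped (non-linear)."""
    return min(40.0, 2.0 * production)

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Two groups with different production profiles; the same rule sets
# minutes for both, so there is no group effect built into the data.
groups = {"low": [5, 10, 15, 20], "high": [15, 20, 25, 30]}

xs = [x for prods in groups.values() for x in prods]
ys = [true_minutes(x) for x in xs]
intercept, slope = fit_line(xs, ys)

# Mean of (actual - predicted) for each group under the pooled linear fit.
mean_residual = {}
for name, prods in groups.items():
    residuals = [true_minutes(x) - (intercept + slope * x) for x in prods]
    mean_residual[name] = sum(residuals) / len(residuals)
    print(f"{name}-production group: mean residual {mean_residual[name]:+.2f} minutes")
```

In this setup the higher-production group comes out with a negative average residual (the line predicts more minutes than the cap allows), and the lower-production group comes out positive. The size of the gap depends entirely on the invented numbers; the point is only that a group difference can appear without any group-based decision.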

So if the black players get 2.9 minutes less playing time than the regression thinks they should, it's probably because the model overweights the things black players do, and underweights the things white players do.

In summary, the model the authors used overpredicts for black players -- even ignoring the race of the coach.

-------

So, what happens when the authors include a dummy for the player and coach being of a different race?

Well, most coaches are white, and most players are black.  So, "white-coach/black-player" is going to be much more frequent than "black-coach/white-player".  If the ratios are 70/30 in both cases, the "different race" bucket is going to be 84 percent black players.
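The arithmetic behind that figure can be sketched as follows, assuming (as the 70/30 example implicitly does) that coach race and player race are paired independently. The function name is just for illustration.

```python
def different_race_black_share(white_coach_share, black_player_share):
    """Share of black players among coach/player pairs of different races.

    Assumes coach race and player race are independent, so the pair
    frequencies are simple products of the two shares.
    """
    white_coach_black_player = white_coach_share * black_player_share
    black_coach_white_player = (1 - white_coach_share) * (1 - black_player_share)
    different_race = white_coach_black_player + black_coach_white_player
    return white_coach_black_player / different_race

# 70 percent white coaches, 70 percent black players:
# 0.49 / (0.49 + 0.09) = 0.49 / 0.58
share = different_race_black_share(0.70, 0.70)
print(f"Black share of the different-race bucket: {share:.3f}")
```

With these inputs, 49 percent of all pairs are white-coach/black-player and 9 percent are black-coach/white-player, so black players make up 49/58 of the bucket, which is the 84 percent quoted above.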

And we know the model overpredicts for black players.  And that's why, when the player is of a different race than the coach, he gets less playing time than the model thinks he should.  Because, 84 percent of the time, he's black, and the model is biased too high for black players.

It's not necessarily race bias at all; it's just a consequence of having a bad model.

------

In Powerpoint form:

-- the model overpredicts for black players;
-- the "different race" case is overwhelmingly black players;

and therefore,

-- the model overpredicts for the "different race" case.
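As a rough sketch of how those three bullets combine, treat each bucket's average prediction error as a weighted average of the errors for its black and white players. The 2.9-minute figure is the regression's race coefficient; the zero error for white players, the 70/30 shares, and the independence of coach and player race are assumptions made only for this illustration.

```python
# Hypothetical setup: only the 2.9-minute figure comes from the study.
WHITE_COACH = 0.70    # assumed share of white coaches
BLACK_PLAYER = 0.70   # assumed share of black players
ERROR_BLACK = -2.9    # actual minus predicted minutes for black players
ERROR_WHITE = 0.0     # assumed baseline: no error for white players

def bucket_mean_error(black_weight, white_weight):
    """Weighted average prediction error for a bucket of coach/player pairs."""
    total = black_weight + white_weight
    return (black_weight * ERROR_BLACK + white_weight * ERROR_WHITE) / total

# Different race: white coach with black player, or black coach with white player.
different_race = bucket_mean_error(
    black_weight=WHITE_COACH * BLACK_PLAYER,
    white_weight=(1 - WHITE_COACH) * (1 - BLACK_PLAYER),
)

# Same race: black coach with black player, or white coach with white player.
same_race = bucket_mean_error(
    black_weight=(1 - WHITE_COACH) * BLACK_PLAYER,
    white_weight=WHITE_COACH * (1 - BLACK_PLAYER),
)

gap = different_race - same_race
print(f"different-race bucket: {different_race:+.2f} minutes")
print(f"same-race bucket:      {same_race:+.2f} minutes")
print(f"gap:                   {gap:+.2f} minutes")
```

Under these assumptions the different-race bucket averages about a minute less than the same-race bucket, even though nothing in the setup depends on the coach's race. The actual size of any such gap would depend on the real coach and player mix and on how the model's errors are distributed.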

That's really all that's going on here.


At Monday, November 26, 2012 9:52:00 AM,  BMMillsy said...

I'm not sure you have provided us with enough evidence that "this is all that is going on here." Though I am skeptical of the conclusion as well.

We saw the same thing in the Parsons et al. paper, in which, without controls, REVERSE discrimination bias was going on. But once you include all the controls, their data show discrimination IN FAVOR of one's own race.

Now, both papers could have the same limitations you state here, but I don't think we can say anything about that without the data (especially since both use OLS to model a non-linear outcome).

With that said, I always question work done by someone who puts 3-D Excel graphs in an academic paper...

I also take issue with this:

"Statistical discrimination in favor of Black basketball players could also provide an explanation for why a Black player is on the court between 4 and 5 min longer on average per game than a White player."

They make a leap with this statement, though do qualify it later. I don't find this to be a reasonable argument if you can statistically choose at the individual level when putting a player in and know the player intimately with respect to their skill level. It seems that a bias would be straight racial preference bias if it exists.

Also, I think there is a much more interesting finding in the regression models (if the effect exists): higher population areas (i.e. more urban areas) are more likely to play black players, and *some* possibility of higher income folks preferring white players (though, this is not significant...no SEs!!! these would be helpful). Perhaps this would be an interesting future study in terms of "who is signed or drafted" by the team.