Monday, April 25, 2011

Did NFL teams discriminate against black coaching candidates? Part II

I posted recently about a "Rooney Rule" study that appeared in the Journal of Sports Economics. In that paper, the authors found that, from 1990 to 2002, NFL teams with black head coaches won 1.1 games per season more than teams with white head coaches. The authors took this as evidence that the NFL was discriminating against black candidates -- hiring only the best black coaches, and not the average ones.

A few more thoughts on the issue:

1. I'm not a subject matter expert (SME) on NFL coaching, but it seems to me very, very unlikely that a sample of 29 coaches, no matter how you selected them, could be, on average, as much as 1.1 games better than average. That seems way too high. Maybe one coach could, sure, under very specific circumstances (say, if he figures out he should start Tom Brady instead of Drew Bledsoe). But the average of 29 coaches? That would be nearly impossible, wouldn't it?

And it's not like the study chose the best 29 coaches -- they chose the only 29 black coaches there were. That means the best black coaches of the 29 would have to be substantially better than 1.1 wins, season after season. That, again, seems implausible.

It's a critical question, because, if the effect is too big to be coaching, the study is no evidence at all -- it literally has zero value!

Here's the logic. If you argue that the 1.1 games is statistically significant, then you're saying that there's evidence that the teams with the black coaches are significantly different, in some way, from the teams with the white coaches. You may believe that the difference is the coach's race. But since 1.1 is too big an effect to be just the coaches, the difference must be, in part, something else. So, since there must be something else going on, you have very little basis for thinking that there's evidence that even *any part of it* is coaching. After all, whatever the "something else" is, it could be just as easily responsible for all of the 1.1 as part of it. In fact, it could be responsible for *more* than 1.1 games, and the black coaches might be *worse* than the white coaches!

If you get an effect size that couldn't possibly be what you're looking for, then all you have evidence for is that there's something else causing the effect. That means there are confounding factors your study hasn't controlled for, which means you have no evidence at all for your particular hypothesis. That doesn't mean you're wrong -- it's not that you have evidence against it, it's just that you have no evidence *for* it.

This is a little bit counterintuitive -- it means a small effect is better evidence than a large effect. If you get statistical significance with a difference of 1.1 wins, that means nothing. But if you get statistical significance with a difference of 0.1 wins, now at least there's a chance that you're seeing something real.
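To make the confounding argument concrete, here is a toy simulation. It is only a sketch: the roster-talent spread, the luck spread, and the hiring probabilities are all made-up assumptions, not numbers from the study or from real NFL data, and `observed_gap` is a hypothetical helper name. The point is just that an uncontrolled factor (here, roster quality) can produce a gap of about a win even when the coaches are no better, or are actually worse.

```python
import random
from statistics import mean


def observed_gap(coach_effect, n_seasons=50_000, seed=0):
    """Mean wins of black-coached teams minus mean wins of white-coached teams.

    Every number in here is a hypothetical assumption for illustration,
    not an estimate from the study or from NFL data.
    """
    rng = random.Random(seed)
    black, white = [], []
    for _ in range(n_seasons):
        # Roster quality in wins above average: the uncontrolled "something else".
        talent = rng.gauss(0.0, 2.0)
        # Assumed confounder: stronger rosters are slightly more likely
        # to have a black head coach (probability clipped to [0, 1]).
        p_black = min(max(0.10 + 0.025 * talent, 0.0), 1.0)
        is_black = rng.random() < p_black
        # Season-to-season luck, unrelated to roster or coach.
        luck = rng.gauss(0.0, 1.5)
        wins = 8.0 + talent + luck + (coach_effect if is_black else 0.0)
        (black if is_black else white).append(wins)
    return mean(black) - mean(white)


# Same seed, so both runs see identical teams; only the coach effect differs.
gap_equal = observed_gap(coach_effect=0.0)   # black coaches exactly average
gap_worse = observed_gap(coach_effect=-0.5)  # black coaches half a win worse
print(f"gap with equal coaches: {gap_equal:+.2f} wins")
print(f"gap with worse coaches: {gap_worse:+.2f} wins")
```

Under these assumptions, both gaps come out positive, even though the coaches contribute nothing in the first run and cost their teams half a win in the second. A regression that doesn't control for the confounder can't tell these cases apart.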

2. In a different post a while ago, I quoted Bill James on psychology:

"... in order to show that something is a psychological effect, you need to show that it is a psychological effect -- not merely that it isn't something else. Which people still don't get. They look at things as logically as they can, and, not seeing any other difference between A and B conclude that the difference between them is psychology."

After this coaching study, it occurs to me that Bill's argument holds for *any* possible cause, not just psychology. Racial bias, for instance. Editing Bill's quote:

"... in order to show that something is a racial bias effect, you need to show that it is a racial bias effect -- not merely that it isn't something else. Which people still don't get. They look at things as logically as they can, and, not seeing any other difference between A and B conclude that the difference between them is racial bias."

The typical study will spend a lot of time and paragraphs and numbers persuading you that there is evidence that A and B are different at a statistically significant level. But then they'll give you only a few sentences *about what that evidence really means*. Shouldn't it be the other way around?

It's as if you're on trial for murder, and the prosecution spends five days nailing down how many millions of dollars you stand to inherit from the victim. They call a stockbroker, a banker, a real estate agent, all of whom testify for hours about how much the guy left you in his will, down to the penny. And then, after all that, the prosecutor says to the judge, "so, obviously, the accused must have done it. We rest our case."

That's backwards. Showing that A and B are different is the easy part -- it's just regression. The hard part is figuring out *why* A and B are different. Most of the effort should go into the argument, not into the statistics.

3. A reader was kind enough to send me a similar study from "Labour Economics." It's called "Moving on up: The Rooney rule and minority hiring in the NFL," by Benjamin L. Solow, John L. Solow, and Todd B. Walker. (Here's a press release.)

The authors create a model to predict whether a "level-two" assistant coach is promoted to head coach, based on performance, age, calendar year, years of experience, and race. It turns out that race is not significant, either before or after the Rooney Rule. Nonetheless, the coefficient for "minority coach" (most are black) is slightly negative (-0.6 SD) before, and slightly positive (+0.8 SD) after.

If you choose to interpret the pre-2003 coefficient at face value, even though it's not statistically significant (which I don't recommend), it's equivalent to two extra years of high-level coaching experience.

At Monday, April 25, 2011 4:21:00 PM, Anonymous Jim Glass said...

I agree. That was my first thought on seeing the study. Winning 1.1 more games than average in an extremely competitive league where the average team wins just 8 is huge enough on its face to question credibility.

Extraordinary claims require really significant support. To back that claim up you have to identify exactly what these coaches are doing that others aren't to get such better results. And then you have to explain why the rest of the league doesn't immediately copy it, making the effect go away.

They are positing a huge market failure that costs many teams millions of dollars. Market failures happen, but they never "just happen", they always happen for a reason. Supposed major market failures for which no reason can ever be found almost always turn out to be bogus.

The supposed reason: "Highly profit-motivated NFL owners in places like NYC, etc., are happy to throw away millions of dollars to indulge racism" is something they are going to have to do better than, IMHO.

At Monday, April 25, 2011 7:04:00 PM, OpenID underpoint05 said...

Phil, do you know if there have been serious attempts for any sport to put together better coach/manager performance metrics than simple win-loss records?

It seems like the difficulty in measuring a coach's overall success might explain the continued employment of certain coaches known for suboptimal decision-making.

At Monday, April 25, 2011 7:30:00 PM, Blogger Phil Birnbaum said...

Hi, Brad,

Don't know of any, really ... Chris Jaffe's book tries, but it's forced to attribute to the manager any unexpectedly good or bad performance of the team, which obviously isn't a great solution. Bill James had an entire book on managers, but I don't think he tried to formally evaluate them on "good" or "bad".

And, of course, there's the Leo Mazzone study a few years ago, even though that's a pitching coach effect and not a manager effect ...

Anyone else know of anything?

At Monday, April 25, 2011 9:42:00 PM, Anonymous Jim Glass said...

Re.: "A paper in the latest Journal of Sports Economics, by Janice Fanning Madden..."

An interesting tidbit pulled from

In response to a September 2002 study by Janice Madden, Ph.D., commissioned by attorneys Johnnie L. Cochran, Jr. and Cyrus Mehri titled “Black Coaches in the National Football League: Superior Performance, Inferior Opportunities” ....

Lawyers Johnnie L. Cochran, Jr. and Cyrus Mehri have notified the NFL that they will sue unless substantial progress is made by the NFL in the hiring...

Cochran himself on "Our Report":


At Monday, April 25, 2011 10:35:00 PM, Anonymous Jim Glass said...

Can anyone tell me who the "29 coaches" in this Johnnie Cochran/Janice Madden study are?

My sources say there were 18 through 2010. From that subtract Fritz Pollard, 1921 ... and also subtract four "temporary interims" (Terry Robiskie, Emmitt Thomas, Perry Fewell, Eric Studesville). That leaves 13:

Art Shell
Dennis Green
Ray Rhodes
Tony Dungy
Herman Edwards
Marvin Lewis
Lovie Smith
Romeo Crennel
Mike Tomlin
Mike Singletary
Jim Caldwell
Raheem Morris
Leslie Frazier

And as to ...
from 1990 to 2002, NFL teams with black head coaches won 1.1 games per season more than teams with white head coaches...

... of the above list, only five coached during 1990-2002:

Art Shell
Dennis Green
Ray Rhodes
Tony Dungy
Herm Edwards

Is the list above missing at least 11? If not, those five are a *really* small sample size.

Yes, I looked at the study itself, but for all the data it throws out it doesn't name the coaches it looked at. Peculiar.

At Monday, April 25, 2011 11:28:00 PM, Blogger Phil Birnbaum said...

There were 29 team/seasons. I guess those five coaches had an average of 5.8 seasons each?

At Tuesday, April 26, 2011 3:43:00 AM, Anonymous Jim Glass said...

There were 29 team/seasons. I guess those five coaches had an average of 5.8 seasons each?

Right. We are talking about those five coaches. Via their combined record 1990-2002 was indeed 264-207-1, a 56% winning percentage, average per 16 games of 9.1-7.1

This is the evidence that these individuals were superior coaches -- they had to be that good to overcome the handicap of discrimination.

Now after 2002 these superior coaches compiled records of:

Shell: 2-14
Green: 16-32
Edwards: 35-61
Rhodes: Unemployed, after going 37-51-1, .421, during 1990-2002.

Suddenly they aren't looking so superior. They were a combined 53-107, .331 over 10 seasons.

Dungy, the other guy, moved to a team with Peyton Manning as his QB and Bill Polian as his GM, both sure HoFers, and went 75-21. Good for him.

But altogether ... hmmm ....I'm not so sure I buy the conclusion of that study.

(And now I see why they didn't name the coaches in it. "Herm Edwards" as the model of the superior coach!? )
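The record arithmetic in this comment is easy to check. Here is a short sketch using only the win-loss figures quoted above; counting a tie as half a win is an assumed convention, and `win_pct` is just a helper name for illustration. One thing it shows is that the 9.1-7.1 figure matches dividing by the 29 team-seasons, while a strict per-16-games average is slightly lower (about 8.9 wins), since 472 games is 29.5 sixteen-game seasons.

```python
def win_pct(wins, losses, ties=0):
    """Winning percentage, counting a tie as half a win (assumed convention)."""
    return (wins + 0.5 * ties) / (wins + losses + ties)


# Combined 1990-2002 record quoted in the comment.
wins, losses, ties = 264, 207, 1
games = wins + losses + ties
pct = win_pct(wins, losses, ties)

# Two ways to express the record as a season average.
per_16_games = (16 * wins / games, 16 * losses / games)
per_team_season = (wins / 29, losses / 29)  # 29 team-seasons in the study

# Post-2002 records quoted in the comment (Dungy is discussed separately).
after_2002 = {"Shell": (2, 14), "Green": (16, 32), "Edwards": (35, 61)}
later_wins = sum(w for w, _ in after_2002.values())
later_losses = sum(l for _, l in after_2002.values())
later_pct = win_pct(later_wins, later_losses)

# Rhodes's 1990-2002 record, as quoted.
rhodes_pct = win_pct(37, 51, 1)

print(f"1990-2002: {wins}-{losses}-{ties}, {pct:.3f} over {games} games")
print(f"per 16 games: {per_16_games[0]:.1f}-{per_16_games[1]:.1f}")
print(f"per team-season: {per_team_season[0]:.1f}-{per_team_season[1]:.1f}")
print(f"after 2002: {later_wins}-{later_losses}, {later_pct:.3f}")
print(f"Rhodes 1990-2002: {rhodes_pct:.3f}")
```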

At Thursday, April 28, 2011 1:51:00 PM, Blogger NPHard said...

It is interesting to me that a study can find a relationship between a specific physical description of a football coach and team wins in a 16-game season, when people have struggled to find any relationship for managers in general in MLB's 162-game seasons. Besides, shouldn't a coach and team be measured by year-over-year improvement? Judging a coach who takes over a 1-15 team on the same level as a coach who takes over a 12-4 team does not make any sense.

At Monday, May 23, 2011 2:33:00 PM, Blogger Ben said...

Sorry, I realize I'm pretty late to the party (my excuse is finals and qualifying exams; I'm a first-year graduate student). I just wanted to note that our paper has a pretty simple metric for coach performance -- points scored relative to other teams for offensive coordinators (and their equivalents) vs. points allowed relative to other teams for defensive coordinators (and their equivalents), normalized to account for changes in offensive environment. Obviously this doesn't work particularly well for head coaches since they're responsible for both sides of the ball, but that is (more or less) the job description of an offensive or defensive coordinator. We won't claim that the metric is perfect, but so long as the errors are uncorrelated with the coach's race (e.g., minority coaches aren't systematically more likely to coach for teams which run the ball and play slowly), then this doesn't pose much of an issue for the analysis.

In regards to the paper by Madden, et al, I think Jim is exactly right when he says their claim is "this is the evidence that these individuals were superior coaches -- they had to be that good to overcome the handicap of discrimination." I think this is a pretty indirect argument, and their results are very similar to ours when they use more direct techniques (new paper by Madden and Ruther here:

If anyone has questions about our research, however, or would like a copy of a draft version of the paper (I can't send the final copy per the journal's orders), feel free to let me know:

At Monday, May 23, 2011 2:40:00 PM, Blogger Phil Birnbaum said...

Thanks, Ben! Appreciate you dropping by and commenting.


