Sabermetric Research: Are early NFL draft picks no better than late draft picks? Part III

This is about the Berri/Simmons NFL draft paper, in which they say that draft position doesn't matter much for predicting quarterback performance. Here are Parts I and II.

-----

One of the paper's most important claims is that scouts are looking at the wrong things -- specifically, the results of the NFL combine.

At the combine, prospects are tested on a bunch of objective measurements. How fast they run the 40-yard dash. Their BMI (body mass index: weight divided by the square of height). Their intelligence, as measured by the Wonderlic test. And, of course, their height.

But, the authors argue, those things don't matter, and scouts are completely misguided in looking at what happens at the combine. They say that a QB prospect's height, BMI, 40-yard-dash time, and Wonderlic score have almost no effect on performance.

And that one point is key, because most of the authors' argument goes like this (in my words):

Premise 1: Scouts care about combine stats.
Premise 2: Combine stats affect draft position.
Premise 3: Combine stats don't predict performance.

Conclusion: Scouts don't know what they're doing.

So, premise 3 is key. How do the authors prove it?

Here's what they did. They took the 121 QBs drafted from 1999 to 2008 for whom they had full combine data. Then they ran a regression to predict each QB's senior-year college performance from those factors.

They found no statistical significance for any of them. And they conclude:

"Such results indicate that the combine measures are not able to capture key attributes of the quarterback."

And, in a related footnote,

"Such results indicate that there is little relationship between the combine statistics and per play performance."

Again, I think the problem is that the effect is there, but there just isn't enough data for significance. Indeed, it seems to me that they almost COULDN'T find significance in a study of that type.

Look, how much of QB performance is affected by height? Probably not much, right? There are so many other things involved. I mean, this isn't basketball: you don't see a lot of quarterbacks who are 6-foot-9, which suggests that height can't be that big a deal.

If the effect is that small, how are you going to find statistical significance with only 121 datapoints? Especially when you're trying to predict ONE SINGLE YEAR of college performance, which is very noisy (and made even noisier because the authors chose to predict a measure that depends on playing time)?

You can't, and the authors didn't.

But ... that just means your study isn't precise enough. It doesn't show the effect isn't there. You can't look for a needle in a haystack, from fifty feet away, looking through the wrong end of a pair of binoculars, then say, "we didn't see a needle, so it doesn't exist."

Berri and Simmons didn't even show the results of that regression, even though it's key to their story. They just mention "not significant therefore zero" and move on. But if they HAD given their results, I bet you'd see the standard error is wide enough to encompass not just zero, but also many possible values that are perfectly reasonable and perfectly in line with what scouts think height is worth.
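Since the paper's coefficients and standard errors aren't reported, here is a rough stand-in: a minimal sketch of an approximate 95% confidence interval for a single correlation, using the Fisher z-transform. It assumes one predictor at a time rather than the paper's multi-variable regression, so it only illustrates the scale of the uncertainty with 121 QBs, not the paper's actual numbers.

```python
import math

def fisher_ci(r_obs: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a Pearson correlation,
    using the Fisher z-transform (standard error 1 / sqrt(n - 3))."""
    z = math.atanh(r_obs)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical case: a combine stat whose observed correlation with
# performance is exactly zero, in a sample of 121 QBs.
low, high = fisher_ci(0.0, 121)
print(f"95% CI for r: ({low:.3f}, {high:.3f})")
```

Under this approximation the interval runs from roughly -0.18 to +0.18, so any true correlation smaller than that, in either direction, is consistent with "not significant."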

The same goes for the other factors: 40-yard-dash speed, Wonderlic, and BMI. It was almost guaranteed that the regression wouldn't find small effects in that sample.
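To put a rough number on "almost guaranteed," this sketch estimates statistical power: the chance that a sample of 121 QBs flags a correlation as significant at the 5% level, for several hypothetical true correlations. It uses the Fisher z normal approximation for a single correlation, an assumption standing in for the paper's actual four-factor regression.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(true_r: float, n: int, z_crit: float = 1.96) -> float:
    """Approximate power of a two-sided test of r = 0 at alpha = 0.05,
    via the Fisher z-transform."""
    shift = math.atanh(true_r) * math.sqrt(n - 3)
    # Probability the test statistic lands beyond either critical value.
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)

for r in (0.05, 0.10, 0.15, 0.20):
    print(f"true r = {r:.2f}: power ~ {approx_power(r, 121):.2f}")
```

Under this approximation, a true correlation of 0.1 gets detected only about one time in five, and even 0.2 only about 60% of the time.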

What about the overall result for the four factors together? It's possible for none of the individual combine stats to be significant while the overall relationship is. Was it?

We really need to see the estimates for the coefficients. How many of them are reasonable individually? If you add them all up, are they also reasonable? If they are, that's all the more reason to point out that the lack of significance doesn't prove anything.

Again, the authors don't show the results ... but they do give a little hint. They run a second regression, this time using rate statistics instead of playing time stats. In a footnote to that, the authors say,

"The adjusted R-squared from these regressions, though, is in the negative range and the F-statistic is statistically insignificant."

A negative adjusted R-squared ... at first glance, that seems to say no relationship.

Except ... I looked up "adjusted R-squared". And, it turns out, for a regression with 5 variables and 121 rows, you can have a negative adjusted R-squared even if the "real" R-squared is as high as .042. That's not as small as it looks. An r-squared of .042 is an r of around 0.2, which is nothing to sneeze at.
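The adjusted-R-squared arithmetic can be checked directly. This sketch reads "5 variables" as five predictors plus an intercept (an assumption, since only four combine stats are named). With that reading, adjusted R-squared crosses zero exactly when R-squared equals p / (n - 1), which is 5/120, or about .042. It also computes the overall F-statistic from R-squared, because the two are linked: adjusted R-squared is negative exactly when F is below 1.

```python
import math

def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors (plus intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def f_stat(r2: float, n: int, p: int) -> float:
    """Overall F-statistic of the regression, computed from R-squared."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

n, p = 121, 5
# Adjusted R-squared is zero exactly when R-squared equals p / (n - 1).
threshold = p / (n - 1)
print(f"R-squared threshold: {threshold:.4f}")
print(f"F-statistic at the threshold: {f_stat(threshold, n, p):.4f}")
print(f"implied r: {math.sqrt(threshold):.3f}")
# Just below the threshold, adjusted R-squared goes negative.
print(f"adjusted R-squared at R-squared = .040: {adjusted_r2(0.040, n, p):.5f}")
```

So a negative adjusted R-squared already implies F < 1, well short of significance; the footnote's two facts overlap heavily, and both are compatible with a raw R-squared just under .042.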

(That makes sense. According to this calculator, a single-variable regression on 121 rows needs a correlation of 0.178 to reach statistical significance, and I think the "adjusted" is meant to make the 5-variable case comparable to the 1-variable case.)
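The 0.178 figure can be reproduced without an online calculator. This sketch finds the two-sided 5% critical value of Student's t with n - 2 = 119 degrees of freedom (by numerically integrating the t density and bisecting, to stay within the standard library), then converts it to a correlation with r = t / sqrt(t^2 + df). If SciPy is available, `scipy.stats.t.ppf(0.975, 119)` returns the same critical t in one call.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1.0 + t * t / df) ** (-(df + 1) / 2)

def t_area_from_zero(t: float, df: int, steps: int = 2000) -> float:
    """P(0 <= T <= t), by Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def critical_t(df: int, alpha: float = 0.05) -> float:
    """Two-sided critical value of t, found by bisection."""
    target = 0.5 - alpha / 2  # area between 0 and the critical value
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_area_from_zero(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def critical_r(n: int, alpha: float = 0.05) -> float:
    """Smallest |r| that is significant in a one-predictor regression on n rows."""
    df = n - 2
    t = critical_t(df, alpha)
    return t / math.sqrt(t * t + df)

print(f"critical r for n = 121: {critical_r(121):.4f}")
```

The result comes out at roughly 0.1786, in line with the calculator's 0.178.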

But 0.2 is probably higher than the effect we're looking for. Or at least, on par with it.

Suppose you ranked all the QBs on their combine stats. Then you took a QB who was +1 SD in combine stats and compared him to one who was -1 SD. What kind of difference would you expect in on-field performance between the two?

Well, to get a correlation of 0.2, you'd have to expect a difference of about 3 points of NFL Quarterback Rating, or 3 or 4 positions in the performance rankings. (To estimate that, I looked here, and added 3 points to the rating of a typical QB.)
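The arithmetic behind the 3-point figure isn't spelled out. One way to reconstruct it, assuming a linear (bivariate-normal) relationship: two QBs 2 SD apart in combine stats, with r = 0.2, should differ by 0.2 x 2 = 0.4 SD in performance. Backing out from 3 rating points, that implies an SD of single-season QB rating of about 7.5 points. That SD is an inference from the numbers above, not a figure from the source.

```python
# Inputs taken from the text: correlation of 0.2, QBs at +1 SD vs. -1 SD
# in combine stats (a 2 SD gap), expected gap of about 3 rating points.
r = 0.2
gap_in_combine_sd = 2.0
expected_rating_gap = 3.0

# Under a linear model, the expected gap in the outcome (in outcome-SD units)
# is r times the gap in the predictor (in predictor-SD units).
gap_in_rating_sd = r * gap_in_combine_sd
implied_rating_sd = expected_rating_gap / gap_in_rating_sd
print(f"expected gap: {gap_in_rating_sd:.2f} SD of QB rating")
print(f"implied SD of single-season QB rating: {implied_rating_sd:.1f} points")
```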

Now, remember, QB performance is very noisy. 3-4 positions in the performance rankings probably means 5-6 positions in *talent* rankings.
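One way to see why noise makes the talent gap bigger than the performance gap: if single-season performance is talent plus independent luck, talent has a narrower spread than performance, and ranking positions scale roughly with the gap measured in standard deviations. In this sketch, `reliability` (the share of single-season variance due to talent) is a hypothetical value of 0.4, not a figure from the post; with it, the talent gap in SD units is 1/sqrt(0.4), about 1.58 times the performance gap.

```python
import math

def gaps_in_sd(r_with_performance: float, predictor_gap_sd: float,
               reliability: float) -> tuple[float, float]:
    """Expected gap in performance and in talent, each in its own SD units."""
    performance_gap = r_with_performance * predictor_gap_sd
    # Performance = talent + independent noise, so the predictor's correlation
    # with talent is r / sqrt(reliability), and talent's SD is narrower.
    talent_gap = performance_gap / math.sqrt(reliability)
    return performance_gap, talent_gap

# Hypothetical reliability of single-season performance (assumed, not sourced).
perf_gap, talent_gap = gaps_in_sd(0.2, 2.0, reliability=0.4)
print(f"performance gap: {perf_gap:.2f} SD, talent gap: {talent_gap:.2f} SD")
print(f"scale factor: {talent_gap / perf_gap:.2f}")
```

Applied to 3-4 positions, that factor gives roughly 4.7 to 6.3 positions; a different assumed reliability would shift it.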

That seems to me like too much. Could height, BMI, Wonderlic, and 40-yard-dash speed really be *that* important, worth 5 or 6 places in the rankings?

If not, then we're looking for an effect too small to find with only 121 datapoints.

So, I think Berri and Simmons' regression was doomed from the start. They were guaranteed not to find significance, even if the scouts were right.

Labels: Berri, draft, football, freakonomics, NFL

3 Comments:

At Sunday, March 04, 2012 12:51:00 PM, Doug said...: Phil, if you're right in your summary, I have to question Berri's grasp on causation. How can combine stats, which occur AFTER the senior season (and much training at places like the API), be seen as CAUSING the senior season stats?

And is the argument really that combine stats don't relate to COLLEGE performance, so therefore the combine can't predict NFL performance? Again, that's too goofy for words.
At Monday, March 05, 2012 10:16:00 AM, Phil Birnbaum said...: Actually, I don't think that's a problem ... if you have a Ph.D., I can retroactively predict that you did OK in 11th grade.

Plus, the combine stats would presumably be the same before or after the player's senior year ... his expected height and weight and intelligence won't have changed.
At Monday, March 05, 2012 8:12:00 PM, Doug said...: Maybe I'm too much of a causal purist, but it seems strange to me that the dependent variable would occur PRIOR to the independent variable. And as for the combine stats, only one of them (height) can't be affected by what a senior does between the end of his college season and the combine. Many QBs train for the combine, including the Wonderlic.

Also, one could have this hypothesis, which the paper does nothing to refute:
"College QBs can succeed for a variety of reasons. But among the college QBs most ready for the NFL, the combine stats are great differentiators of better and worse prospects." With Berri's paper, someone like Russell Wilson, who's fantastic in college but not great with combine stats, would prove their point. But the reality may be that the combine stats help distinguish between good college players as they prepare for the pros.

The bigger problem with the paper I mentioned in the other thread -- Berri needs to model playing time first, and then performance as a function of playing time.

Saturday, March 03, 2012
