### Are early NFL draft picks no better than late draft picks? Part IV

This is the last post about the Berri/Simmons NFL draft paper, in which they argue that draft position doesn't matter much for predicting quarterback performance. Here are Parts I, II, and III.

------

In his Freakonomics post, Dave Berri argues, reasonably, that quarterbacks are harder to predict from season to season than basketball players.

When he runs a regression to predict NFL quarterbacks' completion percentage this season from that same stat last season, he gets an r-squared of .311. On the other hand, if he does the same thing for "many statistics in the NBA," his r-squared "exceeds 70 percent."

According to Berri,

"This is not surprising since much of what a quarterback does depends upon his offensive line, running backs, wide receivers, tight ends, play calling, opposing defenses, etc. Given that many of these factors change from season to season, we should not be surprised that predicting the performance of veteran quarterbacks is difficult."

But ... aren't basketball players also subject to changes in the quality of their teammates? Why should teammates be so much more important for football than basketball?

Well, they're not. Almost the entire difference is just sample size. Let me show you.

The r-squared from season to season depends on the variances of the things that stay constant between seasons, and the things that don't. For the most part, we can call these "talent" (t) and "noise" (n), respectively.

If the r-squared for QBs between seasons is .31, that means

(t/(t+n)) * (t/(t+n)) = .31

Taking the square root of both sides gives

t / (t+n) = .56

And from there, you can multiply both sides by (t+n) to get t = .56t + .56n, and solve for n:

n = .79 * t

So, for a single NFL season, the variance due to noise is 79% of the variance due to talent.

Now, in the NFL, a quarterback will get maybe 450 passing attempts per season. In the NBA, a full-time player might get three times as many shooting opportunities (FG attempts, FT attempts (even if you count those at half weight), and 3P attempts). So, the noise variance should be only 1/3 as large. Instead of noise being 79% of talent, it will be only maybe 26%. Call the new value of noise n'. Then,

n' = .26 * t

If you sub that back into the first equation, you get

(t/(t+n')) * (t/(t+n')) = .63

See? Just considering opportunities raises the r-squared of .31 all the way up to .63. Berri says it should "exceed 70%", and we probably could get that to happen if we included rebounds, or used a more sophisticated stat than just shooting percentage.
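If you want to check that arithmetic, here it is as a few lines of Python. (The .56 and the three-times-the-attempts figure are the rounded estimates from above.)

```python
# Back out the noise-to-talent ratio from the QB year-to-year r-squared,
# then cut the noise variance to a third (for ~3x the attempts) and
# recompute the r-squared. Numbers are the post's rounded estimates.
r = 0.56                       # t/(t+n), i.e. the square root of .31
n_over_t = (1 - r) / r         # noise variance as a fraction of talent: ~.79
n_prime_over_t = n_over_t / 3  # triple the attempts -> one-third the noise
r2_nba = (1 / (1 + n_prime_over_t)) ** 2
print(round(r2_nba, 2))        # 0.63
```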

So, if quarterbacks are harder to predict than basketball players, it's simply because they don't play enough for their stats to be as reliable.

*(UPDATE: As Alex alludes to in the first comment to this post, my logic assumes that "t" -- the variance of talent -- is roughly the same for QBs and NBA shooters. It might not be. But the point is, assuming they're the same is a reasonable first approximation, and that leads to the conclusion that sample size is the biggest difference.*

*So, maybe I should have been more conservative and said that it **could be** that they don't play enough for their stats to be as reliable.)*

------

Which brings me back to Berri's (and Simmons's) academic study. There, they write,

"[Our] results suggest that NFL scouts are more influenced by what they see when they meet the players at the combine than what the players actually did playing the game of football."

Well, yes -- and perhaps the scouts SHOULD be more influenced by the combine. There's lots of noise in only one season of performance, and a rational scout won't weight it too heavily. What if the scout only saw one play? Then, it's obvious that he should be more influenced by the combine than the results. The less data you have, the more you have to weight the combine results.

Look at it this way. You have two pitchers. One throws 100 mph and had an ERA of 3.50 in 50 innings. The other throws 80 mph and had an ERA of 3.20. Which do you draft? Well, *of course* you draft the 100 mph guy. It's only after 200, 300, 400, 1000 innings that you might have enough evidence to change your mind.

------

The idea of random noise and sample size never figures into this paper at all. I don't think the authors even think about it. When they see unexplained variance, they always argue that it's something like the effect of teammates, instead of looking at binomial randomness. In fact, you get the impression they think there's no randomness at all, and the scouts could be perfect if only they were smarter.

For the record, the paper has no occurrences of the words "luck," "random," "binomial," or "sample size."

Labels: Berri, draft, football, freakonomics, NFL

## 24 Comments:

Phil - If you had to guess, do NBA players take more field goal attempts or free throw attempts? Which one do you think is more consistent across seasons, field goal percentage or free throw percentage? I believe, by your logic, field goal percentage should have a higher correlation because there are more opportunities.

Good question, thanks!

It looks, very roughly, like there are twice as many FGAs as FTAs. So, all things being equal, FG% should be more consistent.

However: the correlation depends not just on the number of attempts, but also on the SD of *talent*. My guess is that the FT% talent is spread much more than the FG% talent. You can have guys like Shaq who are awful at FTs, but if there were guys that bad at FGs, they wouldn't be playing.

I bet that more than offsets the number of attempts, so I'd guess that the correlation of FT% is higher than the correlation for FG%. Am I right?

You bring up a good point: my logic only works if the distribution of QB talent is roughly the same as the distribution of NBA shooting talent. There's no guarantee that it is. I'll update the post to be a bit less confident.

You're right, free throws are more consistent. And in fact, the shooting rate stats (FG% and FT%) are the lowest of any of the basketball stats. So would you guess that, because all those other events happen less often than field goal attempts, they all have a larger talent distribution than shooting?

Yes, I'd guess they have a larger distribution than shooting. Especially the stats that depend on playing time. When you have some guys playing 40 minutes and others playing 20, that would be a larger "talent" distribution (where "talent" really means all things that are not random).

Do you know the correlation of year-to-year shooting stats? If it's not significantly higher than the football number, I guess my argument doesn't really hold up and I should retract it.

It's true that Berri neglects the important role of sample size -- this pervades his work. But I think the larger problem with this analysis is that he doesn't distinguish between metrics and actual performance. He says "Given that many of these factors change from season to season, we should not be surprised that predicting the performance of veteran quarterbacks is difficult." It would be a more accurate statement if he changed "predicting" to "measuring." The importance of the line, wide receivers, etc. means that we will find it hard to measure the unique contribution of the QB to team offensive performance. Yes, the metrics we use to evaluate QBs will be impacted by these other players, but that doesn't mean the QB's performance has changed, only our estimate of it. So QBs might be very consistent in their performance, and yet their statistical performance will bounce around due to outside influences. Berri seems to not understand at all that these are potentially quite different things.

Alex/Phil: I think estimating the spread of talent in basketball is greatly complicated by the elective nature of the use of offensive opportunities. The true spread in three point ability, for example, is enormous. Yet most weak 3P shooters won't make any attempts, so the observed variance shrinks. Weak 2-point shooters (usually big men) mainly take shots from close in, making them appear to be more efficient. Adjusting for this would be quite challenging, I think.

Guy,

1. Right. He says that predicting the "performance" is difficult, because of teammates, but it's not the QB's *performance* that's changed if only the teammates have.

Perhaps he believes that if the QB does worse with different teammates, it's the QB's fault? That doesn't seem likely. I think he just doesn't distinguish because he doesn't think it's important, or maybe because it makes the story more complicated and he wants to keep it simple.

2. Right, not every player takes the same opportunities. That is indeed a difficult problem. I assume adjusting for position helps a bit.

Also, it's not a huge problem just leaving out the guys who don't shoot from far. Running backs don't throw pass attempts, so we just leave them out of the analysis. In a sense, the variance of the guys who don't shoot doesn't matter. (Which is the same thing as saying, just weight by attempts.)

My point is not to solve the problem, but to show that there's good evidence that sample size is probably most of the difference.

I don't have the book handy, but I believe that all the stats are set to per-48 minutes so that playing time is not an issue. If you happen to have Stumbling on Wins, there's a two-page table in an early chapter (maybe 2 or 3) that lists the r-squared for season-to-season prediction of various stats in football, hockey, baseball, and basketball. My memory is that FG% is about .46 and FT% is about .57, and the range goes up into the .8s for things like rebounds and assists.

There would be an easy way to fix your concern about sample size. If you looked at quarterbacks across two or maybe three seasons at a time, so that one observation consisted of more like 700+ passing attempts, you could then run that correlation, right? Or perhaps it would be easier to look at NBA players for the first and second halves of a season. Then you could equate opportunities and would just have to worry about your potential difference in talent distribution.

Also, I'm curious if you guys could point me to any source that demonstrates that a small sample size means a smaller correlation? Because to my knowledge, the correlation will only be relatively noisier. It could very well be higher than the 'true' value.

More when I get home from work, but here's a thought experiment.

Imagine a million attempts each season. Correlation close to 1.00, right? Now imagine just one pass attempt. The correlation should be close to zero -- anyone can have one good pass.

It might help to imagine the cloud of points on an X-Y graph. For one attempt, the cloud has huge scatter. For a million, it's almost a line.

Alex: One place you can see the sample size/correlation effect at work is when people look at the relationship between payroll and wins. At the season level, the correlation tends to be viewed as rather low, and Berri (among many others) concludes from this that "money can't buy you wins." However, if you look at team payroll and win totals over a 5- or 10-year period, you find a much stronger correlation. It becomes very clear that money does indeed buy wins.

What you're thinking of, I suspect, is the regression coefficient. In this example, the estimate of wins added per dollar spent won't necessarily be lower when your sample is small -- it will just be less precise, as you say. But the correlation will rise with sample size.

Guy - you've tried the salary example before, and it doesn't work. I do actually have salary data, and if you look at a single season you get values (r squared) in the .2ish area. If you run it across 10 years (so there are 300 observations), you get .07. If you combine teams so that more data goes into each observation (back to 30 observations), you get .05. Someone using a different way of determining salary got the same general result: http://www.jeremyscheff.com/2011/07/the-correlation-between-spending-and-winning-in-the-nba-trends-by-year-and-by-team/ . Now, I'm not even claiming that all instances have to turn out like this. The correlation from one year should be noisier, not always larger or always smaller.

Phil - I don't see that your example must be true. So I ran a simulation. One group had 20 opportunities go into each point/observation, and another group had 100. There were 200 'people' in each group. I got the correlation for each group and did this 1000 times. The average correlation for both groups was essentially identical.

Guys, you can just look at the formula for the correlation coefficient. N isn't in it.

Alex: I didn't mean increase the sample by adding more seasons. I mean increase your sample size of games played by combining across seasons. So, for example, look at the NY Yankees' total wins from 2006-2011 and their total payroll over those same years (and same for all other teams). Then the correlation will get much stronger, because you are getting a more accurate read on how good the Yankees (and all other teams) really were in those years. If you just add more seasons you aren't reducing the noise, because each additional team-season you add has the same amount of random variance (a function of that sport's season length).

Imagine you were measuring the correlation between point differential and win% in the NBA. If you did this after 10 games, you might get an r of, say, .6. At mid-season, it's probably .8. And at the end of the year it's probably .9 or more. And if you used 10 seasons it would be very close to 1, because all the noise will have washed out. As the noise gets smaller you will approach the true correlation (in this case, but not always, 1).

Another way to think about it: you report the r^2 for FT% is .57. That's using a full season sample for each player (and probably some minimum # of games or FTA). Now imagine you measured these same players' FT% only in games played on a Saturday (dramatically reducing your two sample sizes), and again measured the y-t-y correlation. Wouldn't you expect the r^2 to fall significantly? Of course it will, because some 75% talent guys will shoot 65% on Saturdays one year but 85% the next.

Alex,

Can you give more details of your simulation? Because, the way you describe it, you should indeed get a higher correlation for the situation where 100 opportunities went into each row.

Alex: It occurs to me that the payroll-wins example works well in baseball (and probably the NFL), but not for basketball. In basketball, the season is so long (relative to team talent differences) that season win% has very little noise in it. So grouping together additional seasons won't improve your correlation much, if at all. In baseball, however, the $/wins correlation increases dramatically if you do this. (Tom Tango has shown this.)

So Phil is right, but only up to the point at which larger sample size stops materially improving accuracy. Your r for FT% will be higher if players shoot two rounds of 200 shots than 2 rounds of 20 shots. And higher still at 2x500. But the r will barely budge if you then go to 2x10,000, because at 500 shots you already had a reliable measure of each player's real ability.

Guy - As I said earlier, "If you combine teams so that more data goes into each observation (back to 30 observations), you get .05". By that I meant combining wins across seasons for a particular team as well as combining salary. But at any rate, I'm glad to see you back off your statement. Perhaps we can at least agree that it isn't a mathematical necessity? After all, there are only 82 games in a basketball season and the original point of this article was to compare QB throwing stats (hundreds of opportunities) to basketball shooting stats (hundreds to thousands of opportunities).

Phil - I'll do a post and show the simulation. It might be up tonight, otherwise tomorrow.

Alex,

Here's my simulation.

I randomly picked a skill level from a uniform distribution between .45 and .55. Then, I ran 1000 back-to-back seasons of 20 attempts each, so that I had 1000 rows of two percentages (each based off 20 attempts).

The r-squared was .0009.

I repeated the simulation, but each season was now 200 attempts instead of 20.

The r-squared was .1481.

I repeated again, but this time 2000 attempts.

The r-squared was now .7519.

Try the same simulation, if you can, and let me know.
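In case it helps, here's roughly what that simulation looks like in Python (using numpy for the binomial draws; parameters as described above: talent uniform between .45 and .55, 1000 players, two seasons of equal attempts):

```python
import numpy as np

rng = np.random.default_rng(0)

def season_to_season_r2(attempts, players=1000):
    # Each player's true skill is uniform between .45 and .55
    talent = rng.uniform(0.45, 0.55, players)
    # Two independent "seasons" of binomial outcomes at that same skill
    year1 = rng.binomial(attempts, talent) / attempts
    year2 = rng.binomial(attempts, talent) / attempts
    return np.corrcoef(year1, year2)[0, 1] ** 2

for attempts in (20, 200, 2000):
    print(attempts, round(season_to_season_r2(attempts), 3))
```

The r-squared climbs with the number of attempts, even though the spread of talent never changes.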

"Perhaps we can at least agree that it isn't a mathematical necessity?"

Alex: Well, it's only a mathematical necessity if there is an underlying correlation. If there is, then Phil is correct that the correlation will always be strengthened by increasing sample size (up to the point at which you reach the true correlation). I don't see how there can be any dispute about that.

I got us off track by assuming the salary:wins example in baseball applies to other sports as well. If the correlation in basketball is virtually zero, then increasing sample size will of course not increase it. There is a correlation in baseball, and increasing sample size has the expected effect.

If true talent remains constant, then r will depend on two things: 1) the spread in talent and 2) the amount of noise (a function of sample size). The former increases r, the latter decreases it. Phil showed the impact of sample size in his simulation. If he widened the distribution to .25-.75, all of his r^2's would increase.

None of this is to say that player ability is unchangeable -- it certainly does change, and perhaps it even changes more in the NFL. Phil's point (I think) is that Berri can't know that simply by looking at the R^2, without also accounting for sample size and variance. Instead of concluding "basketball players are more consistent," we should first say "basketball players differ more in their talent and/or have more opportunities to display their talent."

One could actually calculate an "expected R^2" for each stat, since we know the variance and sample size for each one. Then we could compare the observed to expected R^2, to see how consistent player talent actually is. That would be interesting....
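Here's a rough sketch of what that calculation could look like, using the talent/noise framework from the post. The input numbers below are invented, just to show the mechanics:

```python
def expected_r2(observed_var, mean_rate, attempts):
    # Binomial noise variance for a percentage stat over a given
    # number of attempts
    noise_var = mean_rate * (1 - mean_rate) / attempts
    # Whatever spread isn't noise must be talent
    talent_var = observed_var - noise_var
    # Year-to-year r-squared = (t / (t + n)) squared
    return (talent_var / observed_var) ** 2

# e.g., a stat averaging .500 with an observed SD of .060 across players,
# measured over 450 attempts per player (hypothetical numbers):
print(round(expected_r2(0.060 ** 2, 0.5, 450), 2))  # 0.72
```

Comparing that expected figure to the observed R^2 for a stat would show how much player talent actually shifts between seasons.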

"Well, it's only a mathematical necessity if there is an underlying correlation. If there is, then Phil is correct that the correlation will always be strengthened by increasing sample size (up to the point at which you reach the true correlation). I don't see how there can be any dispute about that."

Guy, why does the correlation have to increase as you increase sample size? What if you start off "lucky" and the correlation starts off high, but as you add more samples, it decreases toward the "true correlation". I know I've done regressions where I've observed this behavior.

Evan: I don't see how adding random variance -- which is what shrinking sample size does -- would increase your r. Maybe that could happen 1 time in a thousand, I'm not sure, but it should be quite rare.

Let's say you had 2009-10 player ezPM data and wanted to predict 2010-11 team win%. First, you tried to predict each team's win% by month: Dec, Jan, Feb... Then you tried to predict first- and second-half win%. Finally, you try to predict season win%. I am confident that your R^2 will grow with each successive exercise (assuming that ezPM has some predictive value in the first place!). What I think IS true is that your estimate of the coefficient for ezPM when using small samples could be either higher or lower than the true coefficient. There's no reason the estimate has to be low. But I do think your r^2 will decline as your sample size of games played goes down. (As it also should if you used ezPM ratings based only on games from December 2009.)

Anyway, that's my theory....

I worry that perhaps when we talk about "sample size" here, Phil and I mean something different than Alex (and maybe Evan?) are thinking about. We're talking about the sample underlying each player stat (e.g. # of FTA in year 1 and year 2), not the number of players. For example, Phil's simulation used 1,000 players. I don't think his R^2s would be very much lower if he had instead used 500 players. What matters more is the # of trials underlying each percentage (20 vs. 200 vs. 2000). That's the sample size we're talking about.

So, if you measured the correlation between point differential and wins at the season level, your correlation probably wouldn't change very much if you looked at 5 seasons rather than 20 seasons. But if instead of looking at seasons (82 games) you looked at groups of 20 games, then I'd expect your r^2 to drop.

Try this: pick an MLB season, and run a correlation on salary vs. wins. You'll probably get something like .5.

Now, pick five consecutive MLB seasons, and run a correlation on five-year-salary vs. five-year-wins. (That is, for each team, add up five years worth of salary, and five years worth of wins. You will still only have 30 datapoints, just like in the first regression, but the Xs and Ys will be around five times as big.)

The second regression will have a higher correlation than the first.

My guess is somewhere around .7 or .8.
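If you don't have the data handy, a toy simulation makes the same point. Every number here is invented for illustration (the payroll range, the payroll-to-talent slope, the independent talent noise), but the mechanism is the one we've been discussing: talent is constant, and summing five seasons shrinks the binomial noise per data point.

```python
import numpy as np

rng = np.random.default_rng(1)
teams, games, reps = 30, 162, 500

def mean_corr(seasons):
    # Average the payroll-vs-wins correlation over many simulated leagues
    rs = []
    for _ in range(reps):
        payroll = rng.uniform(60, 200, teams)  # $ millions, invented
        # Talent is partly bought by payroll, partly independent of it
        talent = 0.5 + 0.001 * (payroll - 130) + rng.normal(0, 0.04, teams)
        talent = np.clip(talent, 0.3, 0.7)
        # Sum wins over the given number of seasons at constant talent
        wins = sum(rng.binomial(games, talent) for _ in range(seasons))
        rs.append(np.corrcoef(payroll, wins)[0, 1])
    return float(np.mean(rs))

# The five-year totals should correlate more strongly than single seasons
print(round(mean_corr(1), 2), round(mean_corr(5), 2))
```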

Hey Alex, how's that simulation coming? :>)

Berri has been saying for years that "basketball players are more consistent than football players," and back when I cared, I always commented on his site that it was the metrics themselves that offered a more consistent look at the players. I stopped visiting his site years ago because of this very issue.

I wonder though, what he would say about someone like Charlie Ward. Was he simultaneously consistent on the basketball court and inconsistent on the football field?

Alex has posted some good additional comments on his blog: http://sportskeptic.wordpress.com/2012/03/10/hiatus/.
