Sabermetric Research: David Berri's FAQ and rebounding

Wednesday, January 05, 2011

David Berri's FAQ and rebounding

A few years ago, "The Wages of Wins" introduced a basketball rating statistic called "Wins Produced" (WP). Since the book came out, there's been some debate about whether WP has a problem with rebounds when evaluating players. I have argued that it does; two of my posts on it are here and here, but you can probably find others elsewhere.

Dave Berri has recently updated a FAQ that tries to take on us doubters. I'll get to that, but first I guess I should summarize the disagreement, since it's been a few years.

-----

WP values a rebound at +1 point. That's because a rebound takes a possession that would go to the other team, and effectively eliminates it. Since a possession is worth about one point, on average, so is the rebound.

There's no argument about that. The argument is about *who should get credit* for the +1 point. "The Wages of Wins" (or, more specifically, Berri, who is co-author and main blogger and spokesperson for the book), gives the entire +1 to the player who snagged the ball. Others argue that this is wrong.

The opposing argument goes something like this:

When a shot is missed, the ball is likelier to go to some places than others. Whichever player is stationed in one of those spots is more likely to pick up the rebound. When you award the entire value of the rebound to that player, you are mostly rewarding him for being in that spot. As Guy pointed out in a comment way back, it's like putouts in baseball. Getting an out is quite valuable, and many putouts are made at first base. But that doesn't mean that your 1B is five times more valuable than your CF just because he makes five times the putouts. His high total is because of where he plays.

It's obvious that this is also true in basketball. Here is one sample of overall rebounding percentage based on position played:

15.0% Center
13.8% Power Forward
8.9% SF/SG
5.9% Point Guard

Obviously, it's not the case that centers are 250% as good at rebounding as point guards -- it's just that because of the way offenses and defenses work, they happen to be in position for a rebound much more often. That's why Berri adjusts WP scores for position. Otherwise, the numbers wouldn't make sense, and point guards as a group would look like they're horrible basketball players.
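A simple way to see why some position adjustment is needed is to express each player's rebounding rate relative to his position's average. This Python sketch uses the sample percentages above. Dividing by the position average is just one illustrative adjustment, not necessarily the one WP actually uses.

```python
# Position averages of rebounding percentage, from the sample above.
POSITION_AVG = {
    "C": 15.0,
    "PF": 13.8,
    "SF/SG": 8.9,
    "PG": 5.9,
}

def relative_rebounding(reb_pct: float, position: str) -> float:
    """Rebound rate as a multiple of the position average (1.0 = average)."""
    return reb_pct / POSITION_AVG[position]

# Raw ratio of center to point-guard rebounding: about 2.5 to 1.
raw_ratio = POSITION_AVG["C"] / POSITION_AVG["PG"]

# Two hypothetical players, each exactly average for his position,
# look identical once position is accounted for.
center = relative_rebounding(15.0, "C")
point_guard = relative_rebounding(5.9, "PG")

print(round(raw_ratio, 2))  # 2.54
print(center, point_guard)  # 1.0 1.0
```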

A slight variation on this is "diminishing returns". This is an argument that, when you have one player who snags a lot of rebounds, it's not just that he's good at rebounds -- it's that he's being given more opportunities that would otherwise go to his teammates. Perhaps he's also going into other players' "territory" to get them. Or, perhaps, the team has assigned him the role of primary defensive rebounder, reducing other players' rebounding responsibilities to allow them to better transition to offense.

If that's the case, a player shouldn't necessarily get credit for every extra rebound, because, if he didn't get it, one of his teammates would have. That is, there are diminishing opportunities available to the other four players on the team.
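To make the extreme "fixed pool" version of the diminishing-returns argument concrete, here's a toy Python sketch. It assumes, purely for illustration, that the team's total defensive rebounds don't depend at all on who grabs them; the 32-per-game figure and the 25% and 40% shares are made up. Reality is presumably somewhere between this and the WP assumption that every rebound is entirely the rebounder's doing.

```python
def split_rebounds(team_rebounds: float, primary_share: float, teammates: int = 4):
    """Split a fixed pool of team rebounds between one primary rebounder and
    his teammates (hypothetical model: the team total does not depend on who
    grabs the ball)."""
    primary = team_rebounds * primary_share
    each_teammate = (team_rebounds - primary) / teammates
    return primary, each_teammate

TEAM_DREB = 32.0  # hypothetical defensive rebounds per game

for share in (0.25, 0.40):
    primary, other = split_rebounds(TEAM_DREB, share)
    print(f"share={share:.2f}: primary={primary:.1f}, "
          f"each teammate={other:.1f}, team={primary + 4 * other:.1f}")
```

In this model, giving the primary rebounder a bigger role raises his total from 8.0 to 12.8 while each teammate falls from 6.0 to 4.8, and the team total stays at 32. Crediting him with the full +1 for each of his extra rebounds would credit him with value the team never gained.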

So, while there's no question that a rebound is worth +1 to the team, it certainly doesn't seem that the full value should be credited to the skill of the individual player.

Berri, however, is not as convinced. He acknowledges that *some* diminishing returns are at work, but he still gives the entire +1 to the player who picked up the rebound.

------

OK, now to Berri's FAQ. He makes four separate arguments for why his WP stat doesn't overvalue or misappropriate credit for rebounds. All four of those arguments, I think, are easily rebutted.

I'll go through them one at a time, using Berri's own numbering and titles. Keep in mind that I am *not* trying to provide evidence here for the other side of the debate -- I'm just trying to show why Berri's arguments do not prove his position.

Response #1 -- The Consistency of Rebounds

Rebounds per minute, for individual players, are consistent from year to year. Berri reports a correlation coefficient of over .9, and 0.83 even after adjusting for position played. This is higher than similar correlations in other sports. For instance (examples are Berri's):

0.65 -- Baseball OPS
0.47 -- Baseball batting average
0.37 -- Baseball ERA
0.36 -- NFL rushing yards per attempt (for running backs)
0.24 -- NHL goalie save percentage
0.07 -- NFL QB interceptions per attempt

First, you can't really compare the numbers that way. The actual correlations depend on all kinds of things other than skill -- mostly, the number of opportunities and the variance of the circumstances in which those opportunities happen. The fact that one correlation coefficient is higher than another doesn't necessarily mean that the underlying cause is more consistent.

But, suppose we let that go, and assume, with Berri, that rebounding is more consistent than (say) batting average.

So what? Even if you show that rebounding is consistent, that doesn't prove that rebounding is a skill. To go back to Guy's analogy, the consistency of putouts in baseball would be just as high: Albert Pujols had a lot of putouts in 2009 and 2010, and Alex Rodriguez had a lot fewer putouts in both 2009 and 2010. That doesn't prove that Pujols is a much better "putouter" than A-Rod ... it just proves that Pujols plays first base and Rodriguez plays third base.

To that, you could argue that the analogy isn't perfect, because Berri did adjust by position. Still, there are other reasons you could get a high baseball correlation, other than skill. Maybe some 3B play for teams with pitchers who give up a lot of ground balls, and others don't. That would create a higher correlation, while having nothing to do with skill. Maybe some LF play for teams with lots of RH pitching, so they get fewer fly balls hit to them. And so on.

Or, consider saves. There is a very high correlation between saves one year and saves the next year. That doesn't mean that David Aardsma has a talent for saves, but Felix Hernandez doesn't. It just means that, even though they play the same position on paper -- pitcher -- they are used in very different ways. In this case, the consistency isn't of talent, but of managerial decision-making. (I previously expanded on this thought here.)

So, when Berri says,

"When we look at rebounds, we see a higher correlation than all of these [other sports'] statistics. This leads one to conclude that rebounding is a skill that is primarily about the player credited with the rebound."

... it's obvious that doesn't follow. A high degree of consistency in rebounding rate could mean a consistency of talent, or it could mean a consistency of covering more of the other players' territory.

Consistency just means you're measuring something real. It doesn't mean that the "something real" is necessarily talent.

Response #2 -- Rebounds Are Not the Same For All Teams

Berri writes,

"If a player's rebounds are all "stolen" from his teammates, then teams would have to be getting the same number of rebounds. So do all teams end up with the same number of rebounds?"

As written, this is an egregious straw man. Nobody is saying that rebounds are *all* "stolen" from teammates -- just enough to make the raw statistic unreliable. And nobody is saying teams are exactly the same -- we're saying that teams show more similarity than you'd expect by just adding up the individuals. But I'll assume that Berri knows that, and is just exaggerating for effect.

To show how rebounds differ highly across teams, Berri goes on to compare various statistics by "coefficient of variation" (the SD divided by the mean). Again, as I have written before, that number is not meaningful in the way Berri thinks it is.

For offensive rebounding percentage, Berri gets a figure of .106, which is probably something like .027/.265. The .027 is the SD of OR%, and the .265 is the overall average.

But what if you changed "offensive rebounding percentage" to "offensive rebounding missed percentage"? That is, suppose you start counting missed rebounds instead of made rebounds. In that case, the SD stays the same, but the mean flips from .265 to .735 (26.5% made is 73.5% missed). Now you get a "coefficient of variation" of .027/.735, which is about .037. That falls right within the range of the other stats Berri cites (.035 to .043). Still, that doesn't matter, because, as a raw number, "coefficient of variation" has little to do with the subject at hand.

Intuitively, it may *look* like it does, at least to Berri. But it doesn't.
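Here's a quick Python check of the "made vs. missed" point, using six made-up team offensive rebounding percentages that happen to average .265. Flipping from made to missed leaves the SD untouched but multiplies the mean by about 2.8, so the coefficient of variation drops by the same factor, even though nothing about the teams has changed.

```python
from statistics import mean, pstdev

# Made-up team offensive rebounding percentages (they average .265).
made = [0.30, 0.28, 0.27, 0.26, 0.25, 0.23]
# The same events, counted as "missed" offensive rebounds instead.
missed = [1 - x for x in made]

def coefficient_of_variation(values):
    """SD divided by the mean."""
    return pstdev(values) / mean(values)

cv_made = coefficient_of_variation(made)
cv_missed = coefficient_of_variation(missed)
print(f"SD made = {pstdev(made):.4f}, SD missed = {pstdev(missed):.4f}")
print(f"CV made = {cv_made:.3f}, CV missed = {cv_missed:.3f}")
```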

More generally, I don't understand Berri's argument that the more variation there is among teams, the more skill there is in the statistic. There's a lot more variation in sacrifice bunts than there is in batting average, isn't there? But bunting numbers vary mostly because of managerial decisions, not because of talent. The same is true for intentional walks by pitchers. And, to a lesser extent, it's also true for stolen bases.

Response #3 -- Do We Overvalue Rebounds?

Berri makes an argument that goes something like this: suppose rebounds were overvalued, the way his opponents think they are. Then, if we credit a player for only half the rebounds he makes (and spread the other half around to his teammates), that should change things a lot. But, when you look at the top 20 players in the league, the ranking doesn't change that much. (Chart in FAQ, or alone here.) And the new and old statistic correlate with each other at 0.95.

To which the response is:

First: It DOES make a significant difference in the rankings. Some of those top-20 players drop significantly. Carlos Boozer, for instance, goes from 16.2 wins to 12.5 wins. More importantly, it's the evaluations of Boozer's teammates that would change a lot. Since the Jazz players' stats still have to sum to Utah's total wins, Boozer's teammates will get quite a boost. The standings of the top 20 players may not change a whole lot, but, in the middle, where players are very close together, there will be a wholesale re-evaluation, with non-rebounders moving up and rebounders moving down.

Second: Of the top 20 players last year, 19 of them drop in total wins produced when you credit them with only half their rebounds (the 20th one stays the same). That means that every one of the top 20 players was at or above his team's average in rebounds (otherwise, replacing half his rebounds with half his teammates' rebounds would make him look better). It looks like the average drop among the top 20 players is a win or two.

That means getting it right makes a big difference. If you're an NBA general manager, whether a player is worth 12 wins or 14 is very significant at contract negotiation time.
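The "half credit" adjustment can be sketched like this, under one reading of it: each player keeps half his own rebounds and receives half the average of his four teammates'. The formula and the five-man lineup are assumptions for illustration, not necessarily what's behind Berri's chart.

```python
def half_credit(rebounds: list[float]) -> list[float]:
    """Credit each player with half his own rebounds plus half the average of
    his teammates' rebounds (hypothetical reading of the adjustment)."""
    total = sum(rebounds)
    n = len(rebounds)
    return [0.5 * r + 0.5 * (total - r) / (n - 1) for r in rebounds]

lineup = [12.0, 9.0, 6.0, 4.0, 3.0]  # hypothetical rebounds per 48 minutes
adjusted = half_credit(lineup)
for raw, adj in zip(lineup, adjusted):
    print(f"{raw:5.1f} -> {adj:6.3f}")
print(sum(lineup), round(sum(adjusted), 6))
```

The team total is unchanged (34 either way), and only the players above the lineup average of 6.8 lose credit. That's why a list of top-20 players who all drop tells you they were all above-average rebounders for their teams.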

Third: A correlation coefficient of .95 does not imply that there's not much difference. It's true that .95 seems like a "big number," but you have to evaluate it in context. I feel pretty certain that I could take the established, proven values for baseball events, screw them all up to make them significantly wrong, and still come up with a .95 correlation to the original. I mean, think about it: any not-too-far wrong stat will put Babe Ruth at the top and Mario Mendoza at the bottom. In that light, mismeasuring some of the components will still leave the correlation pretty high.

Response #4 -- WP isn't just about rebounds

This argument of Berri's says that rebounds aren't such a big deal in the entire context of the WP calculation. They're just one small part. Even if rebounds *were* misallocated, it doesn't matter all that much in context, not nearly enough to invalidate WP.

What's the evidence? Well, Berri shows how much a 1 percent change in various statistics changes the final value of WP:

+5.2% -- points per FG attempt
+3.2% -- rebounds
+1.2% -- free throw percentage
-1.1% -- personal fouls
-0.9% -- turnovers
+0.7% -- steals
+0.2% -- blocked shots

Berri concluded,

"Rebounding certainly matters. ... But WP is more "responsive" to shooting efficiency from the field."

Yes, except: a 1% change in Points Per FG Attempt is much less common in the NBA than a 1% change in Rebounding.

For an analogy, consider baseball. The average player might hit .260 with 12 home runs. Now, a 100% change will increase home runs from 12 to 24 -- a significant increase, but not out of this world. On the other hand, a 100% change in batting average will have the player go from .260 to .520 -- which is pretty much impossible.

So the extent to which a statistic is influenced by one of its components is the product of two factors: "elasticity" (responsiveness to change), as Berri calculated, and the extent to which players actually differ in real life (that is, the variance). Berri has only considered the first.

What he could have done, instead, is something that's commonly done in other studies: show the response, not to a 1% change in the value, but to a 1 *standard deviation* change in the value. If Berri had done that, he would have noticed that the SD of rebounds is (I think) approximately 45% of average, while the SD of shooting percentage is only about 11% of average.

So, a 1 SD change in shooting percentage increases value by 5.2 times 11 -- 57.2%. And a 1 SD change in rebounds increases value by 3.2 times 45 -- 144%. So rebounds are indeed much more influential than shooting.
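The one-SD arithmetic is simple enough to write down. This Python sketch uses Berri's elasticities and the rough SD estimates above (11% and 45% of average), which are guesses, not measurements. It also treats each elasticity as constant over a whole standard deviation, which is only an approximation.

```python
# Elasticities from Berri's FAQ: % change in WP per 1% change in the stat.
ELASTICITY = {"points per FG attempt": 5.2, "rebounds": 3.2}
# Rough SDs as a percentage of the average (estimates, not measured here).
SD_PCT = {"points per FG attempt": 11, "rebounds": 45}

# % change in WP for a 1 SD change in each stat.
effects = {stat: ELASTICITY[stat] * SD_PCT[stat] for stat in ELASTICITY}
for stat, effect in effects.items():
    print(f"{stat}: {effect:.1f}% change in WP per 1 SD")
```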

Now, in fairness to Berri, the real-life results won't be that extreme. Berri adjusted all players' stats by position, and, as we saw above, some positions rebound a lot more than others. The adjustment, therefore, will pull the SD of rebounding down. (Having said that, field goal percentage was also adjusted by position, and some positions probably shoot better than others too, so the SD of shooting percentage will drop too. But probably not as much.)

But my point is not to come up with a definitive answer to the question -- it's to argue that Berri's elasticity calculation doesn't mean what Berri thinks it does.

The strange thing is that, for this particular narrow question, it would actually make sense to compare correlation coefficients. You could look at the r (or even r-squared) for player rebounds vs. WP, and compare it to the one for player Points Per FG Attempt vs. WP. That would give you an intuitive idea of which season stat affects WP the most. But, in this case, Berri chose not to run a regression.

(And, while I know I promised not to argue for the facts either way, one note. Commenter Guy, in an e-mail, told me that last year, WP had a .75 correlation with rebounds, but only a .5 correlation with shooting percentage.)

------

So, that's why I think that Berri's four counterarguments are not relevant to the question of whether rebounds are misallocated to players. As for actual evidence and argument one way or another, there have been some posts lately at various basketball sabermetrics sites that I may comment on in the future. Here, for instance, is one of them -- both Guy and Berri make appearances.

Labels: basketball, NBA, rebounds, The Wages of Wins

11 Comments:

At Wednesday, January 05, 2011 8:42:00 PM, Anonymous said...: Berri's general problem is that he is not a subject-matter expert, and he dismisses SMEs' objections when his proclamations don't pass the smell test (because he's missed some element of the sport that throws off his calculation). A far more problematic issue with his basketball "analysis" is how he treats defense. First, he credits defensive value only by minutes played, which any basketball SME could tell you is ludicrous on the face of it. Second, in the FAQ, he's condescending of anyone who would argue the point without going through an academic channel.

-- David A.
RufusOnFire.com
At Thursday, January 06, 2011 12:49:00 AM, Dre said...: "First, you can't really compare the numbers that way. The actual correlations depend on all kinds of things other than skill."
All Dr. Berri is doing is measuring how consistent each is and comparing them. In that regard his choice works remarkably well.
"But, suppose we let that go, and assume, with Berri, that rebounding is more consistent than (say) batting average."
I’m going to call you out here. Dr. Berri looked at the data and listed numbers. You are giving him the “benefit of the doubt” in spite of the fact that. . . .
"Or, consider saves. There is a very high correlation between saves one year and saves the next year."
. . .you’re using an example to prove your point with no numbers.
"Consistency just means you're measuring something real. It doesn't mean that the 'something real' is necessarily talent."
Dr. Berri’s point here is that if rebounding were more random (e.g. someone has to get it so it doesn’t matter who) it would not be consistent from year to year. Also, Dr. Berri is measuring players’ year-to-year rebounds, which, unless you want to go down a weird existential road, are real last I checked.
"Nobody is saying that rebounds are *all* "stolen" from teammates"
I figure this is worth noting as you quote Guy a few times in your post. Guy has left countless comments about this on Dr. Berri’s blog.
"(And, while I know I promised not to argue for the facts either way, one note. Commenter Guy, in an e-mail, told me that last year, WP had a .75 correlation with rebounds, but only a .5 correlation with shooting percentage.)"
I’ll point out that you have taken Dr. Berri’s numbers and explained why they are wrong. You’ve then cited Guy as your number source here without even a second glance (despite showing that you are not completely familiar with his writings).
At Thursday, January 06, 2011 2:16:00 AM, Anonymous said...: Phil - I wonder how sure you are that centers aren't 250% better at rebounding than guards. If Orlando played one game backwards on defense, so that Howard and Bass covered guards on the perimeter while Nelson and Richardson covered the big men in the paint, I wouldn't be shocked at all if their defensive rebounds dropped through the basement.
At Thursday, January 06, 2011 9:11:00 AM, Phil Birnbaum said...: Someone posted a comment here at 12:49 am, and it never showed up. I assume it's because you deleted it immediately after posting, but, if not, if it was a technical glitch, let me know and I'll send it back to you for reposting.
At Thursday, January 06, 2011 9:34:00 AM, DSMok1 said...: Another reason that the +1 point may not make sense on a player level:

For a significant number of defensive rebounds, there are multiple defensive players present for the rebound (could get the rebound), while the offense has already cleared out to cut off the fast break. These rebounds do not show value or skill to the player who gets them, but are rather a random/confounding variable. For some teams, their center will grab such "garbage" rebounds. For other teams, maybe the PG will grab them himself (I see OKC and Russell Westbrook do this).

In addition, often good crafty posts will block out the other team's best rebounder and allow OTHER players on his team to rebound the ball, or will tip the ball to a teammate. For OKC, you see Nick Collison do this--he makes room for teammates to rebound. So some credit should go to him, right?

Further: look at the state changes. After a missed shot, 74% of the rebounds are defensive. Does this not imply that forcing a missed shot already is worth ~0.5 points? The "expected points" for the given possession have dropped from ~1 to ~0.5. And getting the rebound is worth the other ~0.5 points, dropping from ~0.5 points for the offense down to 0?

I would propose that in actuality, a rebound is worth, on the defensive end, something like 0.3 to the player and 0.05 to other defensive players. On the offensive end, perhaps nearly 0.5 would go to the rebounding player, since that seems to be more of an individual effort/skill.
At Thursday, January 06, 2011 9:52:00 AM, EvanZ said...: My ezPM model follows a similar logic to what DSMok1 said. It's also based on PBP data.

http://thecity2.com/2010/12/28/ezpm-1-0-now-with-play-by-play-data/
At Thursday, January 06, 2011 12:43:00 PM, EvanZ said...: Phil, you've been called out. You might want to respond.

http://sportskeptic.wordpress.com/2011/01/06/we-got-a-twofer/
At Thursday, January 06, 2011 12:45:00 PM, Phil Birnbaum said...: Thanks, Evan. Haven't had a chance to read it fully yet. Will do that soon.
At Thursday, January 06, 2011 4:33:00 PM, Phil Birnbaum said...: I've replied to Alex. It's in the comments at his blog. Here's the link again:

http://sportskeptic.wordpress.com/2011/01/06/we-got-a-twofer/
At Friday, January 07, 2011 12:23:00 PM, Hawerhuk said...: "I wonder how sure you are that centers aren't 250% better at rebounding than guards."

The mere existence of a guy like Dennis Rodman (smaller, but focused on rebounding) suggests that a player's assigned role has a huge impact.

I'd suggest looking at rebounds vs both position and height.
At Tuesday, March 15, 2011 5:51:00 PM, tgt said...: intro problems:

The analogy of a first baseman's putouts is incorrect. A better analogy would be to assists. This error flows through the entire post making it hard to see if you ever actually have a valid point.

You discuss rebounds as if the other team does not exist. This destroys your model. Teammates, in general, are not competing with each other for rebounds. They're competing with the other team. This simplification in your model makes it appear the diminishing returns are much greater than they are.

At the end, you say that Berri is not convinced, and just leaves the rebounds at 1 point. In reality, your arguments are bad, but Berri has considered lessening the worth of each rebound. He has tried various coefficients (including .7 and .5). You even mention this later. This sloppy writing shows either unintentional bias, an effort to dissemble, or an inability to analyze well. None of them bode well for the rest of the piece.

response 1 problems:

The opportunities are all high enough to be statistically significant and they are all rate stats. Number of opportunities is not a factor. This is an apples to apples comparison. A better counterargument is that those stats with low correlation aren't as powerful as some people thought they were. BA and ERA are notoriously bad stats.

As for your argument about situations. Rebounds (and WP) correlate across seasons where players change teammates. That part of your argument is now void.

You're left with different roles for different players, and you use saves as a comparison. Saves is not a raw stat. It is highly context-dependent in a way that rebounds are not. Shading the middle infielders toward the lines would be a better parallel, but then it doesn't have any oomph.

response 2 problems:

None that I can obviously see. Berri's argument was poor.

response 3 problems:

First is decent. I saw that as an issue as well.

Second is horrible. That's exactly what you would expect for any stat that is modeled properly. People who are excellent, are normally above average in most categories. If you put stats together to find a list of best baseball players of all time, then give their teammates a portion of their BA (OPS, WHIP, etc), you'll see the same behavior.

Third is also horrible. Your complaint is that it's not perfect. It gets people pretty close to right, so it's not good. I can't follow.

Response 4 problems:

You may overstate it a little, but I agree. Using relative percentage changes without standardizing them was a huge mistake.
