Friday, January 28, 2011

Box-score statistics are the RBIs of basketball

Kevin Pelton has responded to my "basketball box score statistics don't work" post. (It's here, at Basketball Prospectus, but free even if you don't subscribe.)

He makes a couple of points.

First, rebounds. I argue that rebounds aren't necessarily a good measure of a player's skill in rebounding, because players vary in how much they "steal" rebounds from other players' territory. In response, Kevin shows that when players change teams, their rebounding numbers stay fairly constant, compared to other statistics. Doesn't that suggest, Kevin asks, that rebounds are relatively independent of the player's team, coach, and environment?

To which I answer: no, not really. I think that players have a certain style of how they approach rebounds, and that doesn't necessarily change from team to team. I might be wrong about this, but if a player is known for his rebounding, it doesn't seem like the new coach will say, "yes, we thought player X was excellent at rebounding, which is why we acquired him, but we're now asking him to cover less territory and pick up fewer rebounds."

If it's a player who's *not* known for his rebounding, he might be traded to a team that has a famous "stealing" rebounder. His rebounds will go down, but not by very much: the headline rebounder takes his extra rebounds from among many teammates, so the effect on any individual teammate is small.

That's why the correlation doesn't change much.

Compare that to field goal percentage, where the correlation changes a lot when players switch teams. That's because of a big difference between rebounds and field goal percentage. With rebounds, one player takes from the others, and the team doesn't change much. With FG%, every player raises or lowers every other player with him, so the team changes a lot.

Obviously, you'll get a lower individual correlation when the sum of all the players varies a lot (FG%), versus when the sum of all the players varies only a little (REB).
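Here's a toy simulation of that argument, with invented numbers, just to show the mechanism: if a player's rebounding "style" travels with him, while his FG% is swamped by a team effect that changes when he moves, you get exactly Kevin's pattern -- stable rebounds, unstable FG% -- without rebounds measuring skill any better.

```python
import numpy as np

# Toy model (all numbers invented). Rebounds follow a personal "style"
# that travels with the player; FG% is personal skill plus a team
# effect that changes when he switches teams.
rng = np.random.default_rng(7)
n = 500  # player-seasons

style = rng.normal(11, 2, n)             # rebounding style, reb/game
reb_y1 = style + rng.normal(0, 0.7, n)   # year 1
reb_y2 = style + rng.normal(0, 0.7, n)   # year 2: new team, same style

skill = rng.normal(0.48, 0.01, n)        # shooting skill
fg_y1 = skill + rng.normal(0, 0.02, n)   # year 1: old team's effect
fg_y2 = skill + rng.normal(0, 0.02, n)   # year 2: new team's effect

print(np.corrcoef(reb_y1, reb_y2)[0, 1])  # ~0.9: rebounds look stable
print(np.corrcoef(fg_y1, fg_y2)[0, 1])    # ~0.2: FG% looks unstable
```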

-----

Second, Kevin especially disagrees with my conclusions from the Lewin-Rosenbaum (L-R) study. That study found that when you're trying to predict next year's wins based on this year's individual player stats, "minutes played" is a better predictor than some of the sabermetric basketball box-score stats.

Kevin says that, further on in the paper, the authors show a result more favorable to the box score stats, finding they correlate with plus-minus (which we both agree is an accurate stat if you can get past the random noise) much better than minutes played (MP).

He's right about that, and I should have mentioned that in my original post. Also, I shouldn't have given the impression that my conclusion is based on minutes played being better. That was just meant to be icing on the cake.

What I *should* have said was that, even though the correlations are higher for the new stats, that doesn't matter to my argument. My argument isn't that the new stats have a low correlation -- it's that they're biased.

Suppose that you have 30 centers. Some centers are better rebounders than others. And some centers "steal" rebounds more than others. In fact, some centers are big stealers, and some centers are "negative" stealers, in that they let their teammates get lots of rebounds they could have got instead.

If you rate each center by the actual number of rebounds he takes in, you are going to be biased in almost every case, because of the "stealing" variable.

Suppose the average skill is 11 rebounds per game for a center. You have center A, who steals 3 rebounds per game, so he's at 14 every year. You have center B, who lets his teammates take 3 rebounds away, but he's really good, so he gets an extra 3 from opponents; the two cancel out, and he's at 11 every year.

Center B is a lot better than center A, 3 rebounds better. But he comes off looking 3 rebounds *worse*. And that'll happen over an entire career, so long as A and B have the same styles of play their whole careers.

This is NOT random variation, where, one year, by luck, A winds up looking better than B. It's a bias in the statistic: it fails to accurately measure what it claims to be measuring, and the errors go a specific way for each player.
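A quick simulation of the A/B example makes the point (the parameters are just the made-up ones from the scenario above):

```python
import random

# Center A: average skill (11 reb/g) plus 3 "stolen" from teammates.
# Center B: 3 reb/g more skilled, but cedes 3 reb/g to teammates.
def season_avg(skill, stealing, games=82):
    nightly = [random.gauss(skill + stealing, 2.5) for _ in range(games)]
    return sum(nightly) / games

random.seed(1)
for year in range(1, 6):
    a = season_avg(skill=11, stealing=+3)
    b = season_avg(skill=14, stealing=-3)
    print(f"Year {year}: A shows {a:.1f} reb/g, B shows {b:.1f} reb/g")
# B is the better rebounder every single year, and *looks* about three
# rebounds worse every single year. More games never shrink the gap.
```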

It's in that specific sense that I argue that the statistics "don't work".

A good baseball analogy would be RBIs. In 1985, one of the best years of Tim Raines' career, he had 41 RBIs. In 1990, one of the worst years of Joe Carter's career, he had 115 RBIs.

Obviously, that RBI total, by itself, is very misleading. By any reasonable standard, Raines had a much, much better season than Carter. In "OPS+", one of the most respected and accurate baseball rate statistics, Raines came in at 151, which means his OPS was 51% higher than league average. Carter was at 85, which is 15% below average. It's no contest.

But that wasn't just randomness. Almost every full-time season of Carter's career, he had more RBI than almost every full-time season of Raines' career. Why? Like rebounds, it's a matter of sending opportunities to teammates. Carter batted fourth, where his teammates were able to get on base for him to drive them in. Raines batted leadoff, where a lot of the time he would hit with nobody on base. Carter's manager played Carter to "steal" opportunities from his teammates, while Raines' manager played Raines to have his teammates "steal" opportunities from *him*.

Again like rebounds, more skilled players do get more RBIs, all else being equal. If Carter had been better in 1990, he would have got more than 115 RBIs. And if Raines had been better, he would have got more than 41. But, in this particular case, the difference is opportunities, not skill.

So, what happens if you rate players' effectiveness by RBIs? On the whole, you get a substantial positive correlation between RBIs and wins, just like Lewin and Rosenbaum got a substantial positive correlation between Wins Produced and Plus-Minus, or PER and Plus-Minus. But, in individual cases, you can't draw any conclusions. You never know for sure that you're not making an awful, awful error and rating a Carter ahead of Raines. Or even a medium-size error, which probably happens pretty often.

If you're thinking you can argue that, well, you're measuring other skills than rebounding, so the basketball stats aren't that bad ... well, that's not true. First, other statistics are just as biased (FG% is also heavily teammate-dependent, but with positive instead of negative correlation). And, second, even with other stats, the bias in rebounds will still come up and bite you in the ass, just on a smaller scale. (That's one of the reasons that no sabermetric baseball stats include RBIs in their formulations -- their bias makes predictions harder, not easier.)

Indeed, I bet if you ranked every NBA player by even the best of the box-score statistics, and then got a bunch of NBA scouts to rank them based on their own expertise, the scouts would beat the crap out of the stats. That wouldn't happen in baseball, if you used the good sabermetric stats -- I bet the stats would beat the scouts, or at least come close -- but it WOULD happen in baseball if you just used RBIs.

The analogy between sabermetric basketball box-score statistics and RBIs is actually pretty strong. In both cases:

1. When you add up the individual totals, the correlation to team totals is almost perfect.

2. If you're a better player, your individual numbers are better.

3. Year-to-year individual player correlations are fairly high.

4. Individual player correlations to known-good stats (plus-minus in basketball, OPS in baseball) are also fairly high.

5. However, individual numbers depend not just on skill, but on teammates and role within the team.

6. If you move teams, you generally keep your same role, which means the correlation stays high.

7. This means that the statistic is biased for certain types of players, and the bias does not disappear with sample size.

8. Still, if you look casually, players at the top are much better than players at the bottom, which means the statistic looks like it works.

9. But there will be many cases where players with significantly higher totals will actually be worse players than others with significantly lower totals.

In fact, I think this is my new argument in one sentence: "Box score statistics are the RBIs of basketball." They just don't work well enough to properly evaluate players.




Monday, January 24, 2011

"Scorecasting:" is home field advantage just biased officiating?

There's a new book about to come out that you might have heard about. It's called "Scorecasting," and the publisher was kind enough to send me a review copy. It's basically a Freakonomics for sports, in intent, in tone, in writing style, and even down to its similar authorship -- one academic economist (Tobias Moskowitz, a finance professor) and one journalist (Sports Illustrated's L. Jon Wertheim).

The book's website is here; it has an excerpt from one of the chapters, and you'll find other excerpts online if you search the authors' names.


The topics will be very familiar to sabermetricians and regular readers of this blog. There are chapters on the Hot Hand, on competitive balance, on NBA refereeing, on steroids, and so on. There isn't a huge amount of breakthrough stuff there, although there are certainly a few new insights. Mostly, the authors summarize what they've learned from academic articles on sports, and they add the results of a few little studies they did themselves.

Alas, by concentrating on the journals, they've missed much of the scholarship of us amateurs. For instance, in the chapter on competitive balance, they argue that baseball is less balanced than football because MLB teams play 162 games, while NFL teams only play 16. That, of course, is only a small part of the story. There are other parts, such as the distribution of talent in the league, and the internal details of the game itself. Tom Tango has effectively solved the problem of comparing different sports (here's just one of his many posts), but the authors seem unaware of that (although they do mention Tango in the book once, in a discussion of leverage, referring to him as a "stats whiz.")

And they're occasionally completely off, as when they say the sample size of the MLB playoffs is enough that the best team ought to win the series.

Still, a lot of the material is solid; the authors are at their strongest when they're reviewing one of the more famous and established studies, like the Romer "fourth down" paper, and the Massey/Thaler NFL draft study. (I'll probably do a full review of the book later, but, for now, just picture a sports Freakonomics that's not as rigorous as most of the websites, but does mention a few things that you didn't know before.)

Anyway, I'm going through the book, and suddenly I see that the authors claim to have solved the problem of what causes home field advantage (HFA). That was sort of shocking. My personal subjective view is that HFA is the biggest unsolved problem in sabermetrics, and very little progress has been made. There's so little progress, in fact, that I've started to take seriously a hypothesis that seems way off the wall -- the theory that humans have built-in evolutionary programming that makes them more physically and mentally effective when defending their own turf. (I'm not saying that it's necessarily true, just that I have a bizarre attraction to it.) In that light, finding these HFA claims was a bit like picking up a newspaper article on math, and finding that the reporter has proved Fermat's Last Theorem.

So what's the authors' solution to the long-standing HFA conundrum? Refereeing. After dismissing most of the usual suspects (fan enthusiasm, travel, tailoring the team to the park), Moskowitz and Wertheim believe that most, or all, of HFA can be explained by biased officiating.

They list a bunch of supporting evidence, which I'll summarize here. If you want to follow along, some of this stuff is also in a long excerpt from the book that appeared a couple of weeks ago in the Jan. 17 issue of Sports Illustrated (the article, unfortunately, does not appear to be online).

------

Soccer

1. In soccer, the referee controls how much extra "injury time" is added to the end of a match. It turns out that injury time is longer when the home team would benefit. When the home side was ahead by a goal, there were two minutes of injury time, on average, in a sample of Spanish league games. But when they were *behind* by a goal, it was four minutes.

2. In 1998, the point structure changed to give the winning team three standings points instead of two. Immediately, the above injury time bias increased.

3. The same bias exists in England, Italy, Germany, Scotland, and the US.

4. "... home teams receive many fewer red and yellow cards even after controlling for the number of penalties and fouls on both teams."


Baseball

5. In baseball, the authors looked at the percentage of called pitches that are strikes. In crucial situations (high leverage), home teams got a lot more favorable calls. But in low-leverage situations, it was *road* teams that got more favorable calls. "This makes sense," the authors write. "If the umpire is going to show favoritism to the home team, he or she will do it when it is most valuable -- when the outcome of the game is affected the most. You might even contend that in noncrucial situations the umpire might be biased against the home team to maintain an overall appearance of fairness."

6. " ... the success rates of home teams in scoring from second base on a single or scoring from third base on an out -- typically close plays at the plate -- are much higher than they are for their visitors in high-leverage/crucial situations. yet they are no different or even slightly less successful in noncrucial situations."

7. Over a large sample of 5.5 million pitches, "called strikes and balls went the home team's way, *but only* in stadiums without QuesTec ... Not only did umpires not favor the home team when QuesTec was watching them, they actually gave *more* strikes and *fewer* balls to the home team. In short, when umpires knew they were being monitored, home field advantage on balls and strikes didn't simply vanish; the advantage swung all the way to the visiting team."

8. In low-leverage situations, even in non-QuesTec parks, there was no bias at all.

9. The authors then analyzed pitches using Pitch f/x data, to see how many pitches were miscalled based on the recorded location. For pitches on the corner of the strike zone, there were more miscalls in the home team's favor than in the visiting team's favor. The home advantage was largest on full-count pitches, followed by other three-ball counts, other two-strike counts, and, lastly, all other counts. So, the more crucial the pitch, the greater the HFA.

10. "Over the course of the season, all of this adds up to 516 more strikeouts called on away teams, and 195 more walks awarded to home teams than there otherwise should be, thanks to the home plate umpire's bias. And that includes only terminal pitches -- where the next called pitch will result in either a strikeout or a walk. Errant calls given earlier in the pitch count could confer an even greater advantage on the home team."

11. "This adds up to an extra 7.3 runs per season given to each home team by the plate umpire alone. That might not sound significant, but cumulatively, home teams outscore their visitors by only 10.5 runs in a season." [That latter number isn't correct ... in 2010, it was 23.5 runs. 23.5 runs equals 2.35 wins out of 81, which is a .530 winning percentage. (UPDATE: Oops! I forgot to adjust for the home team not batting in the bottom of the ninth when leading. If you adjust for that, the home advantage is a lot bigger than 10.5 or 23.5 runs.)]


Football

12. In the NFL, "Home teams receive fewer penalties than away teams -- about half a penalty less per game -- and are charged with fewer yards per penalty. Of course, this does not necessarily mean officials are biased. But when we looked at more crucial situations in the NFL ... we found that the penalty bias [increases]."

13. When instant replay came to the NFL, the home winning percentage declined from 58.5 percent (1985-98) to 56 percent (1999-2008). "Before instant replay, home teams enjoyed more than an 8 percent edge in turnovers ... When instant replay came along ... the turnover advantage was cut in half." Also, "the home team does not actually fumble or drop the ball less often than the away team ... they simply lose fewer fumbles than away teams. After instant replay was installed, however, the home team advantage of *losing* fewer fumbles miraculously disappeared, whereas the frequency of fumbles remained the same. ... In close games, where referees' decisions may *really* matter ... home teams enjoyed a healthy 12 percent advantage in recovering fumbles. After instant replay was installed, that advantage simply vanished."

14. After instant replay, there was no change in the relative frequency of home and away penalties. That might be because penalties can't be challenged.

15. Away teams have their challenges upheld 37 percent of the time, versus 35 percent for home teams. But when the home team is losing, the visiting team wins 40 percent, versus only 28 percent for the home team. So it looks like the referees favor the home team more when they need it more.


Basketball

16. In the NBA, fouls and turnovers that are not subjective referee calls (like shot clock violations) are equal for home and road teams. But for subjective calls, away teams get between 1 and 1.5 more of those per game. Visiting players are 15 percent more likely to be called for traveling than home players.

17. "How much of the [HFA] in the NBA is due to referee bias? If we attribute the differences in free throw attempts to referee bias, this would account for 0.8 points per game. If we gave credit to the referees for the more ambiguous turnover differences ... this would also capture another quarter of the home team's advantage. Attributing some of the other foul differences to the referees and adding the effects of those fouls (other than free throws) ... brings the total to about three-quarters of the home team's advantage. And, remember, scheduling in the NBA [visiting teams play more back-to-back games than home teams] explained about 21 percent of [HFA]. This adds up to nearly all of the NBA home court advantage."


Hockey

18. In the NHL, home teams get 20 percent fewer penalties and receive fewer minutes per penalty. "On average, home teams get two and a half more minutes of power play opportunities ... than away teams. That is a *huge* advantage." If you multiply that by a 20 percent success rate, you get an extra 0.25 goals per game for the home team. Since the average overall differential is only 0.3 goals for the home team, "this alone accounts for more than 80 percent of the home ice advantage in hockey."

19. There is no apparent HFA in shootouts, where refereeing makes no difference. Also, in NBA foul shooting. And, even in Pitch f/x data. Visiting pitchers throw no worse, according to Pitch f/x, than home pitchers do. It's only the umpires' calls that are different.

-----

It's an impressive array of evidence and argument. But, at least some of it doesn't hold up.

Look at number 5: in baseball, in low leverage situations (I believe this means the bottom 50%), the authors say that umpires favor the visiting team. That would mean that, in less critical situations, we should find a "visiting field advantage." But home teams outscore visiting teams even in medium-leverage situations. For instance, here's the breakdown of home and road runs scored by inning (1954 to 2007). The last column is the percentage by which the home team outscored the visiting team:

1 61872-52071 +18%
2 46823-42539 +10%
3 53590-48188 +11%
4 53357-49593 +8%
5 53203-48448 +10%
6 54401-50603 +8%
7 52231-48641 +7%
8 50451-47781 +6%
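(The last column is just home runs divided by road runs, minus one. Here's the computation, to one decimal place, so the rounding may differ slightly from the table above:)

```python
# Recompute the home-advantage column from the run totals above.
runs = {  # inning: (home runs, road runs), 1954-2007
    1: (61872, 52071), 2: (46823, 42539), 3: (53590, 48188),
    4: (53357, 49593), 5: (53203, 48448), 6: (54401, 50603),
    7: (52231, 48641), 8: (50451, 47781),
}
for inning, (home, road) in runs.items():
    print(f"Inning {inning}: home advantage {home / road - 1:+.1%}")
```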

You would think that you'd have more high-leverage events in the later innings -- but the HFA goes *down* in the last few innings, not up.

But I might be wrong about that, maybe the eighth inning has no more high-leverage situations than the first inning (after all, there are more 8-1 games in the eighth than in the first). So, let's look at innings where, at the start, one team was at least four runs ahead of the other. Those should all be low leverage, for the most part, and should show the visiting team having the advantage.

Nope:

2 2543-2139 +19%
3 4583-4176 +10%
4 8817-7801 +13%
5 10940-10057 +9%
6 14371-13279 +8%
7 15698-14583 +8%
8 16935-16180 +5%

Now, this could be just because, in a four-run game, the home teams are a lot better than the visiting teams. What if we look at situations when the *visiting* team is ahead by at least four runs? Then, we should see a huge effect in favor of the visiting team: first, they're probably a much better team, and, second, the low leverage means the umpire should still be favoring them.

But, no. Even in those situations, the home team still performs a little better, on average, having the advantage in five of the seven cases:

2 957-1022 -6%
3 1974-1799 +10%
4 3609-3355 +8%
5 4435-4645 -5%
6 6269-5705 +10%
7 6627-6562 +1%
8 7309-7179 +2%


So, I just don't see it. If umpires DO call more strikes for visiting teams in low-leverage situations, maybe that's compensated for by those pitches actually being strikes ... but being worse pitches in location and movement and velocity. That is, maybe HFA comes from pitchers throwing more accurately, but more hittably.

In any case, if my data are correct, and the authors' data are also correct, it can't be the case that the authors' findings are an explanation of HFA.

------

Now, let's look at number 18, the hockey case. The authors argue that HFA is caused almost entirely by penalties. If that's the case, then you'd expect home and visiting teams to have similar numbers at equal strength.

They do not. The NHL.com website has home/road goal breakdowns. Here they are for the 2008-09 season, averaged by team:

Even strength... 124-110 (home advantage 12.5%)
Power play...... 35-30 (home advantage 15.1%)
Shorthanded..... 4-4 (home advantage 1.0%)

There's almost as large an advantage at even strength as there is on the power play. Admittedly, the extra power play boost is probably caused by more penalties, as the authors say, but the overall contribution of the extra penalties seems to be pretty small.

Just to make sure it wasn't a fluke, I ran the same numbers for 2009-10:

Even strength... 121-106 (home advantage, 13.9%)
Power play...... 30-25 (home advantage, 21.0%)
Shorthanded..... 4-3 (home advantage, 32.9%)

A bit more extreme in favor of the power play. But how do you explain the sizeable advantage for home teams at even strength? One possible explanation is that visiting teams have to play an overcautious game, to avoid being penalized by biased referees. But for a 13.9% disadvantage, that caution would have to be way out of line, wouldn't it?
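Here's the decomposition spelled out, using the rounded 2009-10 per-team averages above (so the shares will differ a bit from percentages computed on unrounded totals):

```python
# Share of the home team's goal margin coming from each situation.
home = {"even strength": 121, "power play": 30, "shorthanded": 4}
road = {"even strength": 106, "power play": 25, "shorthanded": 3}
margin = sum(home.values()) - sum(road.values())  # 21 goals per team

for situation in home:
    edge = home[situation] - road[situation]
    print(f"{situation}: +{edge} goals, {edge / margin:.0%} of the margin")
```

By this crude accounting, even strength carries roughly 70 percent of the home margin, which is hard to square with a penalties-only explanation.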


------

Both of these examples -- and, by the way, they're the only two I checked -- cast doubt on the authors' hypothesis that HFA is almost all refereeing. I have never disagreed that *some* of it might be refereeing, but there's obviously a lot more going on.

And I have to say that the authors have indeed provided a blueprint for how this kind of research should go -- try to break down performance into its constituent parts, and check those.

If there's no home advantage in foul shooting, why not? If there's no HFA in hockey shootouts, why not? If we get a list of areas with high HFA, and a list of areas with low HFA, we can maybe start narrowing down what the causes might be.

But the authors have amassed a lot of evidence, and there must be something to at least some of it, no? For instance, I can't think of any explanation for the injury time phenomenon (maybe I should look up the relevant study). And it seems reasonable that referees will call more fouls on visitors, even if they're unbiased. Why? Because they might be using crowd noise as a guide to what is and what isn't a foul. If the fans scream when a visitor trips an opponent, but not when a home player trips an opponent, that will simply make it more likely that an unbiased referee will have enough evidence to correctly "convict" the visiting player.

But the question is not just whether referee bias exists, but *how much* of it there is, and how much of HFA it's responsible for. The authors of "Scorecasting" seem more focused on "existence" evidence, and it seems to me they've made only a small dent in terms of explaining the real-life observed HFA. I wish the authors had provided more details of some of their findings, so we can figure out what's going on and maybe quantify it a bit more ... but I guess it is what it is.

I know there are a lot of working sabermetricians reading this ... if you have expertise or evidence on any of the authors' points, please weigh in.

-------


UPDATE: I have a full review of the book here.



Sunday, January 23, 2011

NBA: does taking more shots lead to lower accuracy?

One of the hypotheses about why you can't take a player's FG% at face value is that good players will be asked to take a lot of worse shots, like desperation FG attempts with the shot clock running out. That will, perhaps misleadingly, lower their percentage.

While I had my 2008-09 data typed in, I thought I'd give that a quick test. I'm sure this has been done before, but I figured I'd post it anyway.

I tried to predict a team/position's eFG% using (a) his position; (b) the rest of the team's eFG% (averaged by position); and (c) the percentage of his team's FGA he took. There were 150 rows in the regression -- one for each of the five positions on each of the 30 2008-09 NBA teams.

The resulting equation:

FG% equals:

+ .181
+ .022 if he's a SG
+ .016 if he's a SF
+ .025 if he's a PF
+ .043 if he's a C
+ .643 * the eFG% of the rest of the team
- .114 * the percentage of the team's FGA he takes.

The hypothesis seems to check out. The more shot attempts, the lower the overall percentage. For instance, suppose a player takes 20% of his team's attempts, and shoots .500. If he took only 19% of his team's attempts, he'd shoot .5114. If he took 21% of his team's attempts, he'd shoot .4886.

That's bigger than it looks. A .500 percentage over 20 shots is 10 FG. A .5114 percentage over 19 shots is 9.717 FG. So, the extra shot nets only 0.283 FG. That means that, at the margin, the player shoots only .283 on those extra shots.

Second, note the high correlation between a player's shooting and his teammates'. For every percentage point the teammates shoot better than average, the individual player will shoot 0.643 points better than average.

Everything was statistically significant at the 1% level, except the -.114, which was significant at only 7.2%. Its standard error was 63 points, so we definitely need at least another year's data before we can say we have a true understanding of the size of the "shoots more, therefore shoots less accurately" effect.

Also, if certain positions are asked to take desperation shots more than others, the regression might benefit from interaction terms. I'll leave that to you guys. My dataset is available on request if you want to play with it a bit.
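If you want to replicate the setup, here's a minimal sketch of the least-squares fit. The ten rows are fabricated stand-ins (my numbers, not the real dataset), just to show the structure: position dummies with PG as the baseline, the rest-of-team eFG%, and the share of team FGA taken.

```python
import numpy as np

# Each row: SG, SF, PF, C dummies (PG = baseline), teammates' average
# eFG%, share of team FGA taken, and the position's own eFG%.
rows = [
    (0, 0, 0, 0, 0.495, 0.21, 0.470),  # PG
    (0, 0, 0, 0, 0.480, 0.23, 0.455),  # PG
    (1, 0, 0, 0, 0.500, 0.22, 0.505),  # SG
    (1, 0, 0, 0, 0.470, 0.25, 0.468),  # SG
    (0, 1, 0, 0, 0.490, 0.19, 0.495),  # SF
    (0, 1, 0, 0, 0.505, 0.17, 0.515),  # SF
    (0, 0, 1, 0, 0.485, 0.18, 0.500),  # PF
    (0, 0, 1, 0, 0.495, 0.20, 0.505),  # PF
    (0, 0, 0, 1, 0.480, 0.15, 0.520),  # C
    (0, 0, 0, 1, 0.500, 0.16, 0.540),  # C
]
X = np.array([r[:-1] for r in rows], dtype=float)
y = np.array([r[-1] for r in rows])
X = np.column_stack([np.ones(len(X)), X])  # add the intercept
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, c in zip(["intercept", "SG", "SF", "PF", "C",
                    "teammates_eFG", "FGA_share"], coefs):
    print(f"{name:>14}: {c:+.3f}")
```

With the real data, you'd use all 150 rows, and a package like statsmodels will report the standard errors directly.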

Looking forward to any comments you basketball guys may have.




Thursday, January 20, 2011

Sabermetric basketball statistics are too flawed to work

You know all those player evaluation statistics in basketball, like "Wins Produced," "Player Evaluation Rating," and so forth? I don't think they work. I've been thinking about it, and I don't think I trust any of them enough to put much faith in their results.

That's the opposite of how I feel about baseball. For baseball, if the sportswriter consensus is that player A is an excellent offensive player, but it turns out his OPS is a mediocre .700, I'm going to trust OPS. But, for basketball, if the sportswriters say a guy's good, but his "Wins Produced" is just average, I might be inclined to trust the sportswriters.

I don't think the stats work well enough to be useful.

I'm willing to be proven wrong. A lot of basketball analysts, all of whom know a lot more about basketball than I do (and many of whom are a lot smarter than I am), will disagree. I know they'll disagree because they do, in fact, use the stats. So, there are probably arguments I haven't considered. Let me know what those are, and let me know if you think my own logic is flawed.

------

The most obvious problem is rebounds, which I've posted about many times (including these posts over the last couple of weeks). The problem is that a large proportion of rebounds are "taken" from teammates, in the sense that if the player credited with the rebound hadn't got it, another teammate would have.

We don't know the exact numbers, but maybe 70% of defensive and 50% of offensive rebounds are taken from a teammate's total.

More importantly, it's not random, and it's not the same for all players. Some rebounders will cover much more of other players' territory than others. So when player X has a huge rebounding total, we don't know whether he's just good at rebounding, whether he's just taking rebounds from teammates, or whether it's some combination of the two.

So, even if we decide to take 70% of every defensive rebound, and assign it to teammates, we don't know that's the right number for the particular team and rebounder. This would lead to potentially large errors in player evaluations.

The bottom line: we know exactly what a rebound is worth for a team, but we don't know which players are responsible, in what proportion, for the team's overall performance.

------

Now, that's just rebounds. If that were all there were, we could just leave that out of the statistic, and go with what we have. But there's a similar problem with shooting accuracy.

I ran the same test for shooting that I ran for rebounds. For the 2008-09 season, I ran a regression for each of the five positions. Each row of the regression was a single team for that year, and I checked how each position's shooting (measured by eFG%) affected the average of the other four positions (the simple average, not weighted by attempts).

It turns out that there is a strong positive correlation in shooting percentage among teammates. If one teammate shoots accurately, the rest of the team gets carried along.

Here are the numbers (updated, see end of post):

PG: slope 0.30, correlation 0.63
SG: slope 0.40, correlation 0.62
SF: slope 0.26, correlation 0.27
PF: slope 0.28, correlation 0.27
-C: slope 0.27, correlation 0.43

To read one line off the chart: for every one percentage point increase in shooting percentage by the SF (say, from 47% to 48%), you saw an increase of 0.26% in each of his teammates (say, from 47% to 47.26%).

The coefficients are a lot more important than they look at first glance, because they represent a change in the average of all four teammates. Suppose all five teammates took the same number of shots (which they don't, but never mind right now). That means that when the SF makes one extra field goal, each teammate also makes an extra 0.26, for a team total of 1.04 extra field goals.

That's a huge effect.
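Here's a sketch of the per-position test, run on fabricated data with a shared "team quality" effect deliberately built in -- the kind of structure that produces positive slopes like the ones above:

```python
import numpy as np

# Fabricated 30x5 matrix of eFG% (teams x positions: PG, SG, SF, PF, C):
# a shared team effect plus individual noise.
rng = np.random.default_rng(0)
team_effect = rng.normal(0.49, 0.015, size=(30, 1))
efg = team_effect + rng.normal(0, 0.02, size=(30, 5))

for i, pos in enumerate(["PG", "SG", "SF", "PF", "C"]):
    others_avg = np.delete(efg, i, axis=1).mean(axis=1)
    slope = np.polyfit(efg[:, i], others_avg, 1)[0]
    r = np.corrcoef(efg[:, i], others_avg)[0, 1]
    print(f"{pos}: slope {slope:+.2f}, correlation {r:+.2f}")
```

The built-in team effect here is pure shared quality; whether the real-life slopes reflect that, or genuine lifting of teammates, is the question the Carmelo Anthony discussion below gets at.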

And, it makes sense, if my logic is right (correct me if I'm wrong). Suppose you have a team where everyone has a talent of .450, but then you get a new guy on the team (player X) with a talent of .550. You're going to want him to shoot more often than the other players. For instance, if X and another guy, Y, are equally open for a roughly equal shot, you're going to want to give the ball to X. Even if Y is a little more open than X, you'll figure that X will still outshoot Y -- maybe not .550 to .450, but, in this situation, maybe .500 to .450. So X gets the ball more often.

But, then, the defense will concentrate a little more on X, and a little less on the .450 guys. That means X might see his percentage drop from .550 to .500, say. But the extra attention to X creates more open shots for the .450 guys, and they improve to (say) .480 each.

Most of the new statistics simply treat FG% as if it's solely the achievement of the player taking the shot, when, it seems, it is very significantly influenced by his teammates.

------

Some of that, of course, might be that teams with good players tend to have other good players; that is, it's all correlation, and not causation. But there's evidence that's not the case, as illustrated by a recent debate on the value of Carmelo Anthony.

Last week, Nate Silver showed that if you looked at Carmelo Anthony's teammates' performance, and then looked at that performance when Anthony wasn't on their team, you see a difference of .038 in shooting percentage. That's huge -- about 15 wins a season.

Dave Berri responded with three criticisms. First, that Silver weighted by player instead of by game; second, that Silver hadn't considered the age of the teammates (since very young players improve anyway as they get older); and, third, that if you control for age and a bunch of other things, the results aren't statistically significant from zero. (However, Berri didn't post the full regression results, and did not claim that his estimate was different from .038.)

Finally, over at Basketball Prospectus, Kevin Pelton ran a similar analysis, but within games instead of between seasons (which eliminates the age problem, and a bunch of other possible confounding variables). He found a difference of .028. Not quite as high as Silver, but still pretty impressive. Furthermore, a similar analysis of all of Anthony's career shows similar improvements in team performance, which suggests the effect is real.

To be clear, this kind of analysis is the kind that, I'd argue, works great -- comparing the team's performance with the player and without him. What I think *doesn't* work is just using the raw shooting percentages. Because how do you know what those percentages mean? Suppose one team is all at .460, and another team is all at .490. The .490 probably means that the team has more above-average players than below-average ones. But the above-average players are lifting the percentages of the below-average players, and the below-average players are reducing the percentages of the above-average players. But which are which? We have no way of telling.

Here's a hockey example. Of Luc Robitaille's eight highest-scoring NHL seasons, six of them came while he was a teammate of Wayne Gretzky. In 1990-91, Robitaille finished with 101 points. How much of the credit for those points do you give to Robitaille, and how much of the credit do you give to Gretzky? There's no way to tell from the single season raw totals, is there? You have to know something about Robitaille, and Gretzky, and the rest of their careers, before you can give a decent estimate. And your estimate will be that Gretzky should get some of the credit for Robitaille's performance.

Similarly, when Carmelo Anthony increases all his teammates' shooting percentages by 30 points, *and it's the teammates that get most of that credit* ... that's a serious problem with the stat, isn't it?

------

So far, we've only found problems with two components of player performance -- rebounds and shooting percentage. However, those are the two biggest factors that go into a player's evaluation. And, additionally, you could argue that the same thing applies to some of the other stats.

For instance, blocked shots: those are primarily a function of opportunity, aren't they? Some players take a lot more shots than others, so the guy who defends against Allen Iverson is going to block a lot more shots than his teammates, all else being equal.

------

Still, it could be possible that the problems aren't that big, and that, while the new statistics aren't perfect, they're still better than existing statistics. That's quite reasonable. However, I think that, given the obvious problems, the burden of proof shifts to those who maintain the stats still work.

The one piece of evidence that I know of, with regard to that issue, is the famous study from David Lewin and Dan Rosenbaum. It's called "The Pot Calling the Kettle Black – Are NBA Statistical Models More Irrational than 'Irrational' Decision Makers?" (I wrote about it here; you can find it online here; and you can read a David Berri critique of it here.)

What Lewin and Rosenbaum did was try to predict how teams would perform last year, based on their previous year's statistics. If the new sabermetric statistics were better evaluators of talent than, say, just points per game, they should predict better.

They didn't. Here are the authors' correlations:

0.823 -- Minutes per game
0.817 -- Points per game
0.820 -- NBA Efficiency
0.805 -- Player Efficiency Rating
0.803 -- Wins Produced
0.829 -- Alternate Win Score

As you can see, "minutes per game" -- which is probably the closest representation you can get to what the coach thinks of a player's skill -- was the second highest of all the measures. And the new stats were nothing special, although "Alternate Win Score" did come out on top. Notably, even "points per game," widely derided by most analysts, finished better than PER and Berri's "Wins Produced."

When this study came out, I thought part of the problem was that the new statistics don't measure defense, but "minutes per game" does, in a roundabout way (good defensive players will be given more minutes by their coach). I still think that. But, now, I think part of the problem is that the new statistics don't properly measure offense, either. They just aren't able to do a good job of judging how much of the team's offensive performance to allocate to the individual players.

Now that I think I understand why Lewin and Rosenbaum got the results they did, I have come to agree with their conclusions. Correct me if I'm wrong, but logic and evidence seem to say that sabermetric basketball statistics simply do not work very well for players.

-----

UPDATE: some commenters in the blogosphere are assuming that I mean that sabermetric research can't work for basketball. That's not what I mean. I'm referring here only to the "formula" type stats.

I think the "plus-minus"-type approaches, like those in the Carmelo Anthony section of the post above, are quite valid, if you have a big enough sample to be meaningful.

But, just picking up a box score or looking up standard player stats online, and trying to figure out from that which players are better than others, and by how much (the approach that "Wins Produced" and other stats take) ... well, I don't think you're ever going to be able to make that work.


UPDATE: I found a slight problem with the data: one team was missing and one team I entered twice. I've updated the post. The conclusions don't change.

For the record, the wrong slopes were .30/.39/.31/.25/.24. The corrected slopes, as above, are .30/.40/.26/.28/.27.

The wrong correlations were .59/.58/.37/.26/.40. The corrected correlations are .63/.62/.27/.27/.43.







Wednesday, January 19, 2011

Rebounding results were not caused by position leakage

On the subject of negative correlations between rebounders, one thing occurs to me. It's possible that much of the effect could be accounted for by certain errors in the data. Specifically, if sometimes players are classified in the wrong position, that will cause a negative correlation even if there wouldn't otherwise be one.

Why? Well, suppose there's no correlation between sales of watercolor paints and eye shadow. But they look the same, and so, sometimes, eye shadow sales are carelessly recorded as watercolor sales. Since the error takes from one column and adds to the other column *at the same time*, one negative will be associated with one positive, and a negative correlation results.

For a more extreme example, suppose that the recorder on odd-numbered days always records eye shadow as watercolor, and the recorder on even-numbered days always records watercolor as eye shadow. Then, the correlation will be very close to minus 1. That's because, if eye shadow is positive, watercolor is always zero. And if eye shadow is zero, watercolor is always positive.

----

Let me do a basketball example. Suppose that in a certain league, there are four teams, and the PFs and SFs have rebounding percentages as follows:

PF SF
12  8
12 10
14  8
14 10

If you run a regression on those four rows, you will find there's no relationship at all -- your correlation coefficient will be exactly zero.

But, now, suppose that next year, everything is the same, except that the PF and SF often play each other's position. In fact, they play each other's position 1/4 of the time. The PF now gets a "12" in three-quarters of the games and an "8" in the remaining one quarter, for an average of 11. The SF gets an "8" three quarters of the time, and a "12" the rest, for an average of 9.

Now, suppose the dataset isn't smart enough to know to create a blended PF and SF from the two players' stats, because it doesn't know exactly how many minutes or possessions each played at each position. It should be 3/4 and 1/4, but the data compiler doesn't know that. So he shrugs and says, let's just call the main SF guy an SF, and call the main PF guy a PF.

So now the data for the four teams looks like this:

PF SF
11  9
12 10
14  8
14 10

Now, we get a negative relationship. In this example, it's mild: every additional rebound by an SF is linked to a .27 decline in rebounds by the PF. But that's because our example is pretty mild: we have only 1/16 blending (25% blending in 25% of the teams). If we add more blending, we get a more negative relationship. For instance, if I were to add 50% blending for another team -- changing the "12 10" to an "11 11" -- the "diminishing returns" slope goes from .27 to .6.
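A few lines of code reproduce that arithmetic:

```python
import numpy as np

# Clean data: four teams' (PF, SF) rebounding rates; correlation is zero.
clean = np.array([[12, 8], [12, 10], [14, 8], [14, 10]], dtype=float)
print(np.corrcoef(clean[:, 0], clean[:, 1])[0, 1])  # 0.0

# Blend one team's positions 25% of the time: (12, 8) becomes (11, 9).
blended = clean.copy()
blended[0] = [11, 9]
slope = np.polyfit(blended[:, 1], blended[:, 0], 1)[0]  # PF on SF
print(f"{slope:+.2f}")  # -0.27: a spurious negative relationship
```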

-----

Now, I'm not saying this is actually happening. I don't know much about the process by which 82games.com or DSMok1 create their datasets (which I used in the two previous posts). If they are, indeed, classifying a player's position every minute or every possession, then there shouldn't be a problem.

But if there is a problem created by misclassification, does that mean that statistics like Dave Berri's might be right after all? Well, those statistics need to use correct position data too. If they don't, they suffer from the same problem, just hidden more.

Look again at the breakdown for rebounding percentage by position:

15.0% Center
13.8% Power Forward
8.9% Small Forward
8.9% Shooting Guard
5.9% Point Guard

Players are evaluated relative to their position. So if you misclassify a PG as an SG for a game's worth of playing time, you've given him three extra percentage points of rebounding that he doesn't "deserve". If you misclassify a PF as an SF, you give him almost five extra points a game. This is really almost the same as my "positioning" or "scheme" argument. In the previous case, I said that perhaps some SFs take some of a PF's rebounding opportunities. In this case, I'm saying an SF might actually be playing the PF's position, taking *all* of the PF's opportunities (but relinquishing his own).

So, it depends on the data.

-----

How can we check? Well, one way is to compare the results for the "position" breakdown to another breakdown that we know is always correct. As I was writing this, I discovered that Eli Witus did that almost three years ago. Instead of classifying by position, he classified by height -- each player got a number indicating where he ranked in height among his teammates on the court, from 1.0 (always the shortest on the floor) to 5.0 (always the tallest). Obviously, a player's rank would vary depending on who was on the court with him, so his final number would include a fraction. For instance, a "4.5" might mean he was the tallest player half the time, and the second-tallest player half the time.

Witus then broke the players into five approximately equal-sized groups, and checked for diminishing returns on rebounds among the groups. However, instead of running a regression of each group against the other four groups, he ran a regression of each group against the entire team (including that group). (That makes most of the coefficients positive instead of negative.) So that we'd have a comparison, I ran the same regressions on the half-season 2010-11 data provided to me by DSMok1.

Here are the comparisons. The numbers are the regression coefficients with standard error in parentheses (so, for instance, " + 0.27 (0.12)" means that one extra rebound for that position resulted in 0.27 extra rebounds for the team, plus or minus 0.12.)
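One detail worth spelling out: because team = group + rest-of-team, regressing a group against the entire team (itself included) gives a slope exactly one higher than regressing it against the other four groups. A sketch, on fabricated rebounding numbers:

```python
import numpy as np

# Fabricated 30x5 matrix: rebounds by team and position/height group.
rng = np.random.default_rng(2)
reb = rng.normal(8, 1.5, size=(30, 5))

group = reb[:, 0]
rest = reb[:, 1:].sum(axis=1)
team = group + rest

b_rest = np.polyfit(group, rest, 1)[0]  # vs. teammates (often negative)
b_team = np.polyfit(group, team, 1)[0]  # vs. whole team, Witus-style
print(b_rest, b_team, b_team - b_rest)  # the difference is exactly 1.0
```

You can verify the identity on the real data: the position coefficients in the tables below are each exactly one plus the corresponding "reduces his teammates" figures from the January 13 post.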

First, defensive rebounds:

PG: +0.21 (0.35). Height Group 1: +0.27 (0.12).
SG: +0.08 (0.21). Height Group 2: +0.10 (0.11).
SF: +0.48 (0.20). Height Group 3: -0.01 (0.17).
PF: +0.06 (0.13). Height Group 4: +0.12 (0.07).
-C: +0.23 (0.14). Height Group 5: +0.02 (0.07).

From this, we can see that the numbers are different, but not statistically significantly different in any of the five cases. Moreover, the comparisons are split 3-2 over which estimate is higher.

Here are offensive rebounds:


PG: -0.18 (.57). Height Group 1: 0.56 (.49).
SG: -0.06 (.46). Height Group 2: 0.10 (.24).
SF: +0.85 (.39). Height Group 3: 0.56 (.17).
PF: +0.87 (.16). Height Group 4: 0.47 (.13).
-C: +0.62 (.27). Height Group 5: 0.49 (.15).

Again, the numbers are pretty similar, except for PG. But the standard errors in the PG case are so large that the difference is still less than one SE away from zero.

So, we can conclude: misclassification by position could certainly cause the negative correlations we saw. However, that was probably not a major cause, because, when we use height as a control group, we get approximately the same results. And we know that there is little error in classifying players by height.

However, two caveats: "We get approximately the same results" is just a gut reaction to the two sets of numbers. And, second, Witus' groups are not perfect -- there is still a bit of error. Any given set of five players could include two from the first group, so there is still at least a little bit of misclassification.

But still, I think these results are close enough. We wanted to make sure what we saw for rebounding was real, and not just caused by errors in the data. The evidence suggests that was indeed the case.

Executive Summary: False alarm. Sorry to have bothered you.




Thursday, January 13, 2011

2010-11 NBA rebounding correlations

My last post showed that, in the NBA, players generally "steal" 2/3 of their rebounds away from their teammates. That is, when a player grabs a rebound, there's almost a 70% chance that someone else on his own team would have got it if he hadn't.

That study was based on the 2008-09 season, and I didn't have the data broken down into offensive rebounds and defensive rebounds. In addition, I used raw numbers of rebounds, which means my analysis would underestimate the amount of stealing (for reasons given in the "gravity" argument in the previous post).

But, now, commenter DSMok1 has solved both problems. He's provided data for the current 2010-11 season (up to last night, I assume?), broken down into offensive and defensive rebounds, and his numbers are in rebound percentage rather than raw totals.

And the results are even stronger than before.

For defensive rebounds:

-- PG: every extra DREB% reduces his teammates' DREB% by 0.79.
-- SG: every extra DREB% reduces his teammates' DREB% by 0.92.
-- SF: every extra DREB% reduces his teammates' DREB% by 0.52.
-- PF: every extra DREB% reduces his teammates' DREB% by 0.94.
--- C: every extra DREB% reduces his teammates' DREB% by 0.77.

It looks like the "stealing" rate is about 0.8, on average, for defensive rebounds.

Having said that, however, I should say that the standard errors of these estimates are pretty high. Here are those estimates again, along with the SE:

PG -- 0.79 +/- 0.35
SG -- 0.92 +/- 0.21
SF -- 0.52 +/- 0.20
PF -- 0.94 +/- 0.13
C --- 0.77 +/- 0.14

Here's the same chart for offensive rebounds. This time, I'll put the SEs in brackets at the end.

-- PG: every extra OREB% reduces his teammates' OREB% by 1.18 (+/- 0.55).
-- SG: every extra OREB% reduces his teammates' OREB% by 1.06 (+/- 0.46).
-- SF: every extra OREB% reduces his teammates' OREB% by 0.15 (+/- 0.40).
-- PF: every extra OREB% reduces his teammates' OREB% by 0.13 (+/- 0.16).
--- C: every extra OREB% reduces his teammates' OREB% by 0.38 (+/- 0.27).

A lot different. As expert basketball sabermetricians have said, diminishing returns are at a much lower level for offensive rebounds. The PG and SG numbers are probably enhanced by random errors -- it would be hard to come up with an explanation of why a shooting guard's OREB would cost his teammates *more than one* OREB in exchange. [UPDATE: not true! See comments. I now agree a coefficient more extreme than -1 could actually be correct.]

Even though you could argue that the bottom three estimates are not statistically significantly different from zero (they're all less than two SEs from it), I think our best guess is still the point estimates we have here (since there's no prior reason to believe the diminishing returns rate should be zero).

Overall, other analysts have suggested an average of somewhere between .2 and .3, which seems very reasonable looking at the above table.


Finally, overall rebounds:

-- PG: every extra REB% reduces his teammates' REB% by 0.95 (+/- 0.15).
-- SG: every extra REB% reduces his teammates' REB% by 1.05 (+/- 0.21).
-- SF: every extra REB% reduces his teammates' REB% by 0.99 (+/- 0.26).
-- PF: every extra REB% reduces his teammates' REB% by 0.75 (+/- 0.12).
--- C: every extra REB% reduces his teammates' REB% by 0.71 (+/- 0.14).

All five of these seem to be about the average of OREB and DREB, except for the SF. Not sure why the SF is so different.

------

So, I think this roughly confirms what certain basketball analysts are doing: assigning only about 0.3 of a defensive rebound to the player, and about 0.7 of an offensive rebound. [UPDATE: Guy makes an excellent argument for why my estimate of 0.3 could be too low. See comments.]
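In code, that credit assignment is trivial (a sketch; the 0.3 and 0.7 weights are the rough averages estimated above, not anyone's canonical formula):

```python
def rebound_credit(dreb, oreb, d_weight=0.3, o_weight=0.7):
    """Rebounds credited to the player after discounting 'stolen' ones."""
    return d_weight * dreb + o_weight * oreb

# A center averaging 8 defensive and 3 offensive rebounds a game gets
# credit for about 4.5 of his 11, not all 11.
print(rebound_credit(dreb=8, oreb=3))  # about 4.5
```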

Thanks again to DSMok1 for the data.

------

P.S. One other finding: overall, there is a negative correlation between a team's *overall* OREB% and its overall DREB%.

For every one percentage point extra on a team's OREB%, its DREB% goes down by 0.3 percentage points. It's not statistically significant -- about 1.5 SEs -- but I thought I'd mention it anyway.



Wednesday, January 12, 2011

Do players "steal" rebounding opportunities from teammates?

In my previous post on rebounding, I promised to review some of the evidence supporting the "diminishing returns" hypothesis. That's the theory that, in general, a player's individual rebound totals are not mostly a product of his own ability to snare rebounds, but, rather, predominantly a product of his positioning on the court, or the role he is assigned on his team.

That is: if a player gets a lot of rebounds, even adjusted for what position he plays, a large part of his total is rebounds that other players would have gotten regardless.

Actually, instead of reviewing the evidence, I thought I'd just create my own. However, my numbers here follow on arguments made by others, in other places. (Here, for instance, is a comment by Guy listing four reasons why we should believe that "diminishing returns" is a real phenomenon.)

The easiest test, perhaps, is simply whether, when a player gets more rebounds, his teammates get fewer. If they do, that's evidence that the player is simply grabbing balls his teammates would have gotten anyway. If not, that's support for David Berri's hypothesis that differences in rebounds are the result of the talent of the particular player credited with the rebound.

So, here's what I did. For every team in the 2008-2009 season, I ran a regression comparing the total rebounds by the team's centers (combined) to the total for the rest of the team (combined). I typed in the data manually from some team pages at the 82games.com website, because that's where I happened to find it broken down by position. (Strangely, basketball-reference.com doesn't do that.)

It's only one season's worth of data, but, still, the results are striking.

Every extra rebound by the center results in 0.61 fewer rebounds by teammates.

That means that only 39 percent of center rebounds are "real", in the sense that the other team would have got them if not for the center's efforts. If a center is (say) 20 rebounds above average for the season, then, on average, you should estimate that 12 of those rebounds would have been grabbed by a teammate anyway.


(For those of you who (unlike me) prefer correlation coefficients, the r was -.63, and the r-squared was .395.)
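For the shape of the test, here's a sketch on fabricated team data with a stealing rate of 0.61 built in (the real regression used the actual 82games numbers, of course):

```python
import numpy as np

rng = np.random.default_rng(3)
center_extra = rng.normal(0, 1.6, 30)  # centers' reb/g above/below average
centers = 14 + center_extra
rest = 27 - 0.61 * center_extra + rng.normal(0, 1.2, 30)

slope = np.polyfit(centers, rest, 1)[0]
r = np.corrcoef(centers, rest)[0, 1]
print(f"slope {slope:+.2f}, r {r:+.2f}")  # near -0.61 and -0.6
```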

There's a possible alternative explanation for this: it might be that teams just don't want to acquire too much rebounding talent. So, if they get a center who's good at rebounding, they'll get below-average rebounders at the other positions. Still, even that possibility supports the "diminishing returns" hypothesis. Because, why else would teams limit their rebounders? Baseball teams don't limit their home run hitters ... if basketball teams try to cap the amount of rebounding talent on their team, it must be because they realize that the additional rebounders would be somewhat wasted.

And, in any event, I suspect the effect is much, much too strong to be the result of general manager choice. Here are some other ways of looking at the strength of this finding:

1. Let's convert to a won-loss record.

The center is either above or below average for the league. And the rest of the positions, combined, are either above or below average for the league. Suppose you call it a "win" (for the David Berri theory) when both tendencies are the same (both are above or below average). And suppose you call it a "loss" when the tendencies are different (centers above average and the rest of the team below average, or vice versa).

In that case, the Berri hypothesis goes 7-23.

In extreme cases, it does even worse. Breaking the 7-23 record down into how far centers were from the average:

4-9: Teams' centers within 1 Reb/G of average
3-7: Teams' centers between 1 and 2 Reb/G from average
0-7: Teams' centers more than 2 Reb/G from average

This exactly supports the hypothesis. The farther the center is above (below) average, the less likely his teammates are to also be above (below) average.
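The sign test itself is a couple of lines (again on fabricated data with the negative relationship built in; with the real data you'd plug in the actual above/below-average figures):

```python
import numpy as np

rng = np.random.default_rng(4)
center_vs_avg = rng.normal(0, 1.6, 30)  # centers minus league average
rest_vs_avg = -0.61 * center_vs_avg + rng.normal(0, 1.2, 30)

wins = int(np.sum(np.sign(center_vs_avg) == np.sign(rest_vs_avg)))
print(f"Berri-hypothesis record: {wins}-{30 - wins}")  # well under .500
```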

2. As others have pointed out before, teams are surprisingly close to each other in total rebounds.

So close, in fact, that it leads to this shocking result:

The variance in team rebounds per game is LESS than the variance in rebounds per game by the center alone!

Think about that -- how unusual it is. It doesn't happen with many other player skill stats in the world of team sports, does it?

Suppose I ask you to predict how many home runs a random full-time player hit last year. You might guess, say, 18. You'd be off by 18 if the guy hits zero, or you may be off by 36 if the guy turns out to be Jose Bautista. But 36 is your worst-case scenario.

Now, suppose I ask you to predict how many home runs a random TEAM will hit next year. You guess 154, which was the average in 2010. Now, you're in worse shape. The Blue Jays hit 257, and the Mariners hit only 101. The worst case is now 103 for a team, almost triple the worst case of 36 for a player. Instead of -18 to 36, the range of error is now -53 to 103. The variance, and the margin of error, is much higher for a team.

But for NBA rebounds, it's the other way around -- it's actually easier to predict rebounds for a random team than for a random team's centers!

-- For centers, the SD was 1.63, and the range from worst to best was 7.4 (10.3 to 17.7).
-- For teams, the SD was only 1.43. And the range was only 5.0 (38.8 to 43.8).

This piece of evidence is so surprising, and so strong, that it should be enough, by itself, to convince you that something's going on. The SD of rebounds by centers is higher than the SD of rebounds by their teams. That's it in one sentence.


3. There are five positions on a basketball team. In terms of rebounds, every one of those five positions had a negative correlation with the rest of their team. That's 5-0 for the diminishing returns hypothesis.

There are also 10 different pairs of two positions. If you run the correlation between rebounds for those 10 pairs, eight of them come out negative. So that's 8-2.

The two positive ones were (1) shooting guard vs. small forward (r = .312; every additional rebound by the small forward leads to .31 more rebounds by the shooting guard), and (2) shooting guard vs. power forward (r = .024; every additional rebound by the power forward leads to .01 more rebounds by the shooting guard).


4. There was a very, very interesting result for point guard vs. the rest of the team.

Every additional rebound taken by the point guard reduces the rest of the team's rebound total by ... almost exactly one rebound: .96, to be exact.

Taken at face value, that means that 24 out of 25 times, when a point guard grabs an extra rebound, it's because the rest of the team deliberately leaves it for him!

DSMok1 posted this comment last week:

"For a significant number of defensive rebounds, there are multiple defensive players present for the rebound (could get the rebound), while the offense has already cleared out to cut off the fast break. These rebounds do not show value or skill to the player who gets them, but are rather a random/confounding variable. For some teams, their center will grab such "garbage" rebounds. For other teams, maybe the PG will grab them himself (I see OKC and Russel Westbrook do this)."


That's consistent with the data. If the PG gets the rebound only when the entire offense has already cleared out, then his grabbing it can't be adding any value.

I don't know if this is true or not -- I really don't watch a lot of basketball -- but I find it interesting that the regression and DSMok1 are saying exactly the same thing, only several days apart. Admittedly, it could be just random error ... you'd want to check this for other years to make sure.


5. As I said, every position had a negative correlation with the other positions on the team. Here they are. (UPDATE, 1/23: I realized I accidentally left out one team, and entered one team twice. Corrections have been made below.)

-- PG: every extra rebound reduces his teammates' rebounds by 0.87 (originally reported as 0.96).
-- SG: every extra rebound reduces his teammates' rebounds by 0.64 (originally 0.65).
-- SF: every extra rebound reduces his teammates' rebounds by 0.73 (unchanged).
-- PF: every extra rebound reduces his teammates' rebounds by 0.68 (originally 0.63).
-- C: every extra rebound reduces his teammates' rebounds by 0.69 (originally 0.65).
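
In case it's not obvious where those numbers come from: each one is the slope from regressing the rest of the team's rebounds on that position's rebounds. A minimal sketch (the function name and data layout are mine, for illustration only):

    import numpy as np

    def marginal_rebound_slope(pos_rpg, team_rpg):
        """Regress the rest-of-team's rebounds on the position's rebounds.
        A slope near -0.7 means each extra rebound by the position goes
        with about 0.7 fewer rebounds by his teammates."""
        rest_rpg = np.asarray(team_rpg) - np.asarray(pos_rpg)
        slope, _intercept = np.polyfit(pos_rpg, rest_rpg, 1)
        return slope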

------

So, there you go: about 2/3 of marginal rebounds are taken away from a teammate.

Actually, it's worse than that! I'm not sure how much worse, but it's worse.

Why? Because rebounds are mostly a function of missed shots. The more shots your opponents miss, the more defensive rebounds are available to you. So your defensive rebounds depend, in part, on how good your team defense is.

If you have a good defense, there are more defensive rebounds to be had, and you'd expect all five players' totals to rise above average together. Same, in reverse, if you have a bad defense.

The same is true for offensive rebounds. The worse your shooting, the more offensive rebounds become available, and vice versa. And, again, you'd expect all five of your players to share in those extra chances, at least to some extent. So, once more, teammates' rebounding moves together.

And, finally, there's another factor that should cause a certain amount of positive correlation -- pace. Teams that play faster or slower will see their rebound totals rise or fall in unison.

What all that means is that if the "diminishing returns" factor were zero, you'd see a *positive* correlation between teammates' rebounds. The fact that the actual correlation is negative means the effect is stronger than it looks -- it first has to overcome the positive correlation caused by team defense, shooting, and pace.

Does that make sense?

Look at it this way. The defense, shooting, and pace correlations are like gravity, pulling teammates' rebounds towards a positive relationship. For the "stealing rebounds" factor to turn the overall correlation negative, *against gravity*, the stealing has to be a bigger factor than the raw number suggests.
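
If the gravity metaphor seems hand-wavy, a toy simulation makes it concrete. All numbers are invented; only the direction of the correlations matters:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10_000  # simulated team-seasons

    # "Gravity": team defense/shooting/pace create more (or fewer)
    # available rebounds for all five players at once.
    team_factor = rng.normal(0.0, 1.0, n)

    # Each player's own rebounding, before any stealing.
    center = rng.normal(0.0, 1.0, n) + team_factor
    others = rng.normal(0.0, 1.0, n) + team_factor

    print(np.corrcoef(center, others)[0, 1])  # about +0.5: gravity alone

    # Now let the center "steal": most of his surplus comes out of
    # his teammates' totals.
    steal_rate = 0.8
    others_observed = others - steal_rate * (center - center.mean())

    print(np.corrcoef(center, others_observed)[0, 1])  # about -0.33: the
    # stealing term had to overcome the +0.5 from gravity to get here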

-----

Since I have no idea how to compensate for the "gravity" effect, let's ignore it for now, and stick with the 2/3 estimate. Assume that a typical player rebound takes 2/3 of a rebound away from a teammate, which means that only 1/3 of all individual rebounds are really individual. What does that mean for player evaluation?

The first instinct is to take every rebound, give 1/3 of the credit to the player who grabbed it, and spread the remaining 2/3 among the rest of the players.
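
In code, the instinct looks something like this. (A sketch only: the 1/3 is the rough estimate from above, and the even spread across teammates is exactly the assumption that's about to cause trouble.)

    def reallocate_rebounds(rebounds, credit_share=1/3):
        """Naive fix: each player keeps `credit_share` of his own rebounds;
        the rest is pooled and split evenly among his four teammates.
        `rebounds` maps player -> rebounds per game for one five-man unit."""
        adjusted = {}
        for player, reb in rebounds.items():
            kept = credit_share * reb
            pooled = sum((1 - credit_share) * r
                         for p, r in rebounds.items() if p != player)
            adjusted[player] = kept + pooled / (len(rebounds) - 1)
        return adjusted

Note that the team total is conserved -- the adjustment just moves credit around.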

Would that work? Well, it's better than giving the entire rebound to one player. But it still has a big problem.

And that problem is: the 2/3 figure is not random, nor is it evenly distributed. It varies by player and team.

There are some centers who deliberately let their teammates have more rebounds. There are others who deliberately take rebounds away from their teammates (not necessarily due to selfishness, by the way -- it could be the coach's strategy). If you just give centers 1/3 of their marginal rebounds, some centers will be regularly overestimated, and some will be regularly underestimated. Because it's not random, it won't even out over a career.

In his "Win Shares" baseball book, Bill James talked about assists by first basemen. When fielding a ground ball, some 1Bs would almost always take it themselves (Steve Garvey), and some would always toss it to the pitcher (Bill Buckner). As a result, Buckner would always wind up with more assists than Garvey. But it doesn't mean he was a better fielder.

Suppose that exactly half of first basemen tossed to the pitcher, and half of them didn't. If you just regress all of them by 50%, you may be closer, but you're still wrong. You want to regress Buckner 100%, and Garvey 0%, in order to get an accurate rating.

The same applies here. When you have two players who are above average in rebounding, how do you know which one is above average because he's really, really good at getting to the ball before the opposition, and which one is above average because he's just taking the easy ones away from his teammate?

One suggestion is, don't even try -- just give all the credit for rebounds to the defense. The problem with that, of course, is that the legitimately great rebounders no longer get credit.

Another suggestion is: use subjective evaluations. What do NBA observers think about who the best rebounders are? Combine that information with the numbers, and try to work out "custom" estimates for each player that still add up to the team's overall performance. The problem with that is that even the "experts" are often wrong about these things, both because they're not perfect, and because they almost certainly let the raw numbers influence their evaluations.

Still, there's probably *some* value there ... everyone was able to tell that Brooks Robinson and Ozzie Smith were great baseball fielders, even without statistics. So it should also be possible to tell, just from observation, who the best rebounders are. But, then, there's also the case of Derek Jeter, who won a lot of Gold Glove awards despite the objective statistics pointing to him as among the worst-fielding shortstops in baseball. So, you have to be careful.

-----

I don't have a good solution here. Obviously, it would be best if there were a way to figure the best rebounders objectively, using the evidence of the statistical record. It would be great if we could just take David Berri's stat, and adjust it a little, and wind up with the right answer.

But I don't see how it's possible, given the limitations of the data, to tell the talented from the "stealers."


In that light, maybe we should just admit our ignorance, for now, and treat rebounding as something that has to be evaluated subjectively. Lump it in with defense, as a team measure, while keeping in mind that individual rebounding skill does exist, and adjusting individual evaluations for it when the evidence warrants.

It's not a perfect solution, but it's obviously much, much better than giving the entire statistical credit to one player.

---

UPDATE: Eli Witus' excellent 2008 post (among others) points out that diminishing returns are much lower for offensive rebounds than for defensive ones. Does anyone know of a good source for ORB and DRB breakdowns by team and position, so I can rerun the analysis for them separately?




Sunday, January 09, 2011

Landsburg's brain teaser, translated into baseball

Since Steven E. Landsburg posted his brain teaser a couple of weeks ago, many commenters have argued that the problem was not well described. So I thought I'd try rephrasing it in a baseball context, which might make it easier to understand. I'll try to include all the picky details. Here we go:

----

In a certain baseball league, every rookie is in the starting lineup on opening day.

Every rookie is an expected .500 hitter in all circumstances, and has an independent 50-50 chance of getting a hit in any AB.


Every rookie plays every inning of every game until he has a hitless at-bat, at which point he is immediately replaced by a veteran and never plays again.


What is the expected value of the overall composite rookie batting average at the end of the season?


----

[Details: 1. "overall rookie batting average" is total hits divided by total AB (so if there are two rookies in the league, a 1-for-2 and a 3-for-4, the composite batting average is .667 (4-for-6)).

2. "Expected value" is in the normal statistical sense. One way to explain "expected value": suppose someone offers a bet, where he promises to pay you a dollar multiplied by the composite batting average (so that if, like in the example above, the composite batting average winds up at .666666... , you get 0.66666... dollars). What is a fair price to pay for the bet, in dollars, so that neither party is expected to make or lose money?

3. You may assume, if you like, that no rookie will go the entire season without making an out.]
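
For the simulation-minded, here's a quick Monte Carlo of the setup, with the league's number of rookies as a parameter. I won't spoil the answer, but note that it depends on that parameter:

    import random

    def composite_rookie_average(n_rookies, rng):
        """One simulated season: each rookie gets a hit with probability .5
        in each AB, and is done the moment an AB produces no hit."""
        hits = ab = 0
        for _ in range(n_rookies):
            while True:
                ab += 1
                if rng.random() < 0.5:
                    hits += 1
                else:
                    break  # first hitless AB: replaced, never plays again
        return hits / ab

    rng = random.Random(0)
    for n in (1, 2, 10, 100):
        trials = 50_000
        est = sum(composite_rookie_average(n, rng) for _ in range(trials)) / trials
        print(f"{n} rookie(s): expected composite average = {est:.3f}")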


----

If I've done it right, this question should be completely isomorphic to the intent of the original. The answer and discussions at Landsburg's blog can be found here. My take was here.



Wednesday, January 05, 2011

David Berri's FAQ and rebounding

A few years ago, "The Wages of Wins" introduced a basketball rating statistic called "Wins Produced" (WP). Since the book came out, there's been some debate about whether WP has a problem with rebounds when evaluating players. I have argued that it does; two of my posts on it are here and here, but you can probably find others elsewhere.

Dave Berri has recently updated a FAQ that tries to take on us doubters. I'll get to that, but first I guess I should summarize the disagreement, since it's been a few years.

-----

WP values a rebound at +1 point. That's because a rebound takes a possession that would go to the other team, and effectively eliminates it. Since a possession is worth about one point, on average, so is the rebound.

There's no argument about that. The argument is about *who should get credit* for the +1 point. "The Wages of Wins" (or, more specifically, Berri, who is co-author and main blogger and spokesperson for the book), gives the entire +1 to the player who snagged the ball. Others argue that this is wrong.

The opposing argument goes something like this:

When a shot is missed, the ball is likelier to come off to some spots than to others. Whoever is stationed at one of those spots is more likely to pick up the rebound. When you award the entire value of the rebound to that player, you are mostly rewarding him for being in that spot. As Guy pointed out in a comment way back, it's like putouts in baseball. Getting an out is quite valuable, and many putouts are made at first base. But that doesn't mean your 1B is five times more valuable than your CF just because he makes five times the putouts. His high total comes from where he plays.

It's obvious that this is also true in basketball. Here is one sample of overall rebounding percentage based on position played:

15.0% Center
13.8% Power Forward
8.9% SF/SG
5.9% Point Guard

Obviously, it's not the case that centers are 250% as good at rebounding as point guards -- it's just that because of the way offenses and defenses work, they happen to be in position for a rebound much more often. That's why Berri adjusts WP scores for position. Otherwise, the numbers wouldn't make sense, and point guards as a group would look like they're horrible basketball players.

A slight variation on this is "diminishing returns". This is an argument that, when you have one player who snags a lot of rebounds, it's not just that he's good at rebounds -- it's that he's being given more opportunities that would otherwise go to his teammates. Perhaps he's also going into other players' "territory" to get them. Or, perhaps, the team has assigned him the role of primary defensive rebounder, reducing other players' rebounding responsibilities to allow them to better transition to offense.

If that's the case, a player shouldn't necessarily get credit for every extra rebound, because, if he didn't get it, one of his teammates would have. That is, there are diminishing opportunities available to the other four players on the team.

So, while there's no question that a rebound is worth +1 to the team, it certainly doesn't seem that the full value should be credited to the skill of the individual player.

Berri, however, is not convinced. He acknowledges that there are *some* diminishing returns at work, but he still gives the entire +1 to the player who picked up the rebound.

------

OK, now to Berri's FAQ. He makes four separate arguments for why his WP stat doesn't overvalue or misappropriate credit for rebounds. All four of those arguments, I think, are easily rebutted.

I'll go through them one at a time, using Berri's own numbering and titles. Keep in mind that I am *not* trying to provide evidence here for the other side of the debate -- I'm just trying to show why Berri's arguments do not prove his position.


Response #1 -- The Consistency of Rebounds

Rebounds per minute, for individual players, are consistent from year to year. Berri reports a correlation coefficient of over .9, and 0.83 even after adjusting for position played. This is higher than similar correlations in other sports. For instance (examples are Berri's):

0.65 -- Baseball OPS
0.47 -- Baseball batting average
0.37 -- Baseball ERA
0.36 -- NFL rushing yards per attempt (for running backs)
0.24 -- NHL goalie save percentage
0.07 -- NFL QB interceptions per attempt

First, you can't really compare the numbers that way. The actual correlations depend on all kinds of things other than skill -- mostly, the number of opportunities and the variance of the circumstances in which those opportunities happen. The fact that one correlation coefficient is higher than another doesn't necessarily mean that the underlying cause is more consistent.

But, suppose we let that go, and assume, with Berri, that rebounding is more consistent than (say) batting average.

So what? Even if you show that rebounding is consistent, that doesn't prove that rebounding is a skill. To go back to Guy's analogy, the consistency of putouts in baseball would be just as high: Albert Pujols had a lot of putouts in 2009 and 2010, and Alex Rodriguez had a lot fewer putouts in both 2009 and 2010. That doesn't prove that Pujols is a much better "putouter" than A-Rod ... it just proves that Pujols plays first base and Rodriguez plays third base.

To that, you could argue that the analogy isn't perfect, because Berri did adjust by position. Still, there are other reasons you could get a high baseball correlation, other than skill. Maybe some 3B play for teams with pitchers who give up a lot of ground balls, and others don't. That would create a higher correlation, while having nothing to do with skill. Maybe some LF play for teams with lots of RH pitching, so they get fewer fly balls hit to them. And so on.

Or, consider saves. There is a very high correlation between saves one year and saves the next year. That doesn't mean that David Aardsma has a talent for saves, but Felix Hernandez doesn't. It just means that, even though they play the same position on paper -- pitcher -- they are used in very different ways. In this case, the consistency isn't of talent, but of managerial decision-making. (I previously expanded on this thought here.)

So, when Berri says,

"When we look at rebounds, we see a higher correlation than all of these [other sports'] statistics. This leads one to conclude that rebounding is a skill that is primarily about the player credited with the rebound."


... it's obvious that doesn't follow. A high degree of consistency in rebounding rate could mean a consistency of talent, or it could mean a consistency of covering more of the other players' territory.

Consistency just means you're measuring something real. It doesn't mean that the "something real" is necessarily talent.


Response #2 -- Rebounds Are Not the Same For All Teams

Berri writes,

"If a player's rebounds are all "stolen" from his teammates, then teams would have to be getting the same number of rebounds. So do all teams end up with the same number of rebounds?"


As written, this is an egregious straw man. Nobody is saying that rebounds are *all* "stolen" from teammates -- just enough to make the raw statistic unreliable. And nobody is saying teams are exactly the same -- we're saying that teams show more similarity than you'd expect by just adding up the individuals. But I'll assume that Berri knows that, and is just exaggerating for effect.

To show how rebounds differ highly across teams, Berri goes on to compare various statistics by "coefficient of variation" (the SD divided by the mean). Again, as I have written before, that number is not meaningful in the way Berri thinks it is.

For offensive rebounding percentage, Berri gets a figure of .106, which is probably something like .027/.265. The .027 is the SD of OR%, and the .265 is the overall average.

But, what if you changed "offensive rebounding percentage" to "offensive rebounding missed percentage"? That is, suppose you start counting missed rebounds instead of made rebounds. In that case, the SD stays the same, but the mean flips from .265 to .735 (26.5% made is 73.5% missed). Now you get a "coefficient of variation" of .027/.735, which is .036. That almost exactly matches the other stats Berri cites (which range from .035 to .043). Still, that doesn't matter, because, as just a raw number, "coefficient of variation" has little to do with the subject at hand.

Intuitively, it may *look* like it does, at least to Berri. But it doesn't.
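
If you want to see the flip for yourself, it's two lines of arithmetic (same numbers as above):

    sd = 0.027                   # SD of offensive rebounding pct across teams
    made, missed = 0.265, 0.735  # complementary means: same teams, same data

    print(sd / made)    # about .10 -- "high" variation by Berri's measure
    print(sd / missed)  # about .037 -- suddenly in line with the other stats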

More generally, I don't understand Berri's argument that the more variation there is among teams, the more skill there is in the statistic. There's a lot more variation in sacrifice bunts than there is in batting average, isn't there? But bunting numbers vary mostly because of managerial decisions, not because of talent. The same is true for intentional walks by pitchers. And, to a lesser extent, it's also true for stolen bases.


Response #3 -- Do We Overvalue Rebounds?

Berri makes an argument that goes something like this: suppose rebounds were overvalued, the way his opponents think they are. Then, if we credit a player for only half the rebounds he makes (and spread the other half around to his teammates), that should change things a lot. But, when you look at the top 20 players in the league, the ranking doesn't change that much. (Chart in FAQ, or alone here.) And the new and old statistic correlate with each other at 0.95.

To which the response is:

First: It DOES make a significant difference in the rankings. Some of those top-20 players drop significantly. Carlos Boozer, for instance, goes from 16.2 wins to 12.5 wins. More importantly, it's the evaluations of Boozer's teammates that would change a lot. Since the Jazz players' stats still have to sum to Utah's total wins, Boozer's teammates will get quite a boost. The rankings of the top 20 players may not change a whole lot, but, in the middle, where players are very close together, there will be a wholesale re-evaluation, with non-rebounders moving up and rebounders moving down.

Second: Of the top 20 players last year, 19 of them drop in total wins produced when you credit them only half their rebounds (the 20th one stays the same). That means that every one of the top 20 players was at or above his team's average in rebounds (otherwise, replacing half his rebounds with half his teammates' rebounds would make him look better). It looks like the average drop among the top 20 players is a win or two.

That means it makes a big difference whether you get it right. If you're an NBA general manager, the difference between a player worth 12 wins and one worth 14 is very significant at contract negotiation time.

Third: A correlation coefficient of .95 does not imply that there's not much difference. It's true that .95 seems like a "big number," but you have to evaluate it in context. I feel pretty certain that I could take the established, proven values for baseball events, screw them all up to make them significantly wrong, and still come up with a .95 correlation to the original. I mean, think about it: any not-too-far wrong stat will put Babe Ruth at the top and Mario Mendoza at the bottom. In that light, mismeasuring some of the components will still leave the correlation pretty high.
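
A toy simulation, with made-up numbers, shows how easily a .95 correlation coexists with errors of a win or more per player:

    import numpy as np

    rng = np.random.default_rng(3)

    true_wins = rng.uniform(0, 16, 300)                # invented player values
    mismeasured = true_wins + rng.normal(0, 1.5, 300)  # sizable errors

    print(np.corrcoef(true_wins, mismeasured)[0, 1])   # about .95
    print(np.abs(true_wins - mismeasured).mean())      # yet typical error > 1 win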


Response #4 -- WP isn't just about rebounds

This argument of Berri's says that rebounds aren't such a big deal in the entire context of the WP calculation. They're just one small part. Even if rebounds *were* misallocated, it doesn't matter all that much in context, not nearly enough to invalidate WP.

What's the evidence? Well, Berri shows how much a 1 percent change in various statistics changes the final value of WP:

+5.2% -- points per FG attempt
+3.2% -- rebounds
+1.2% -- free throw percentage
-1.1% -- personal fouls
-0.9% -- turnovers
+0.7% -- steals
+0.2% -- blocked shots


Berri concluded,

"Rebounding certainly matters. ... But WP is more "responsive" to shooting efficiency from the field."


Yes, except: a 1% change in Points Per FG Attempt is much less common in the NBA than a 1% change in Rebounding.

For an analogy, consider baseball. The average player might hit .260 with 12 home runs. Now, a 100% change will increase home runs from 12 to 24 -- a significant increase, but not out of this world. On the other hand, a 100% change in batting average will have the player go from .260 to .520 -- which is pretty much impossible.

So the extent to which a statistic is influenced by one of its components is the product of two factors: "elasticity" (responsiveness to change), as Berri calculated, and the extent to which players actually differ in real life (that is, the variance). Berri has only considered the first.

What he could have done, instead, is something that's commonly done in other studies: show the response, not to a 1% change in the value, but to a 1 *standard deviation* change in the value. If Berri had done that, he would have noticed that the SD of rebounds is (I think) approximately 45% of average, while the SD of shooting percentage is only about 11% of average.

So, a 1 SD change in shooting percentage increases value by 5.2 times 11 -- 57.2%. And a 1 SD change in rebounds increases value by 3.2 times 45 -- 144%. So rebounds are indeed much more influential than shooting.
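
The arithmetic, spelled out (the SD percentages are my rough estimates, as noted):

    # Influence = elasticity (Berri's numbers) times real-life variation
    # (SD as a percent of average -- my rough estimates, not Berri's).
    elasticity = {"points per FG attempt": 5.2, "rebounds": 3.2}
    sd_pct     = {"points per FG attempt": 11,  "rebounds": 45}

    for stat in elasticity:
        print(stat, elasticity[stat] * sd_pct[stat])  # shooting ~57, rebounds ~144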

Now, in fairness to Berri, the real-life results won't be that extreme. Berri adjusted all players' stats by position, and, as we saw above, some positions rebound a lot more than others. The adjustment, therefore, will pull the SD of rebounding down. (Having said that, field goal percentage was also adjusted by position, and some positions probably shoot better than others too, so the SD of shooting percentage will drop too. But probably not as much.)

But my point is not to come up with a definitive answer to the question -- it's to argue that Berri's elasticity calculation doesn't mean what Berri thinks it does.

The strange thing is that, for this particular narrow question, it would actually make sense to compare correlation coefficients. You could look at the r (or even r-squared) for player rebounds vs. WP, and compare it to the r for player Points Per FG Attempt vs. WP. That would give you an intuitive idea of which season stat affects WP the most. But, in this case, Berri chose not to run a regression.

(And, while I know I promised not to argue for the facts either way, one note. Commenter Guy, in an e-mail, told me that last year, WP had a .75 correlation with rebounds, but only a .5 correlation with shooting percentage.)

------

So, that's why I think Berri's four counterarguments are not relevant to the question of whether rebounds are misallocated to players. As for actual evidence and argument one way or the other, there have been some posts lately at various basketball sabermetrics sites that perhaps I'll comment on in the future. Here, for instance, is one of them -- both Guy and Berri make appearances.


