"Stumbling on Wins:" is there really little difference between goalies?
My copy of "Stumbling on Wins," the new book by David Berri and Martin Schmidt, arrived on Friday. It's a quicker read than their first book, "The Wages of Wins"; for one thing, it's shorter, at 140 pages (before appendices and endnotes). For another thing, the writing style is a bit breezier and less technical, more suited to the non-academic (but serious) sports fan.
The theme of this book is how decision-makers in sports make bad decisions because they don't know how to properly evaluate the information they have. Irrationality in decision making is a subject that's been popularized quite a bit lately. In the last few years, you've got "Predictably Irrational" by Dan Ariely, "Nudge" by Richard Thaler and Cass Sunstein, "Sway" by Ori and Rom Brafman, "Priceless" by William Poundstone, and others. The authors of this book acknowledge the trend, and note that they chose their title in tribute to Daniel Gilbert's "Stumbling on Happiness."
I disagree with many (but not all) of the conclusions the authors reach ... it seems like, too often, the authors will do a quick study, look at the results superficially, jump to conclusions that I don't think are justified, and argue from those conclusions that decision-makers are doing it wrong.
For now, I'll just give you one example. In Chapter 3, they argue that NHL goalies are overpaid. Why? Because
"... there simply is little difference in the performance of most NHL goalies."
What evidence do they give for this?
First, they ran a correlation between a goalie's save percentage (SV%) in consecutive seasons. They got an r-squared of .06, or 6%. That's a small number. So goalies are inconsistent, and what is being observed is not really the goalie's talent.
That's not correct at all.
As I wrote before, and Tango has repeatedly said on his own blog, you can't conclude, just because the r-squared is a small number, that the relationship between two variables is weak. Indeed, the same relationship can give you very different r-squareds, depending on other factors in your data, the most obvious of which, here, is sample size.
The r-squared, by definition, is the variance of talent as a percentage of total variance. But the smaller your sample, the more total variance you have just because of luck. And so, the smaller the sample, the lower the r-squared, regardless of whether the talent is low or high.
A low r-squared might mean a small needle -- or it might mean a large haystack.
Unless you take a few seconds to figure out which it is, your r-squared doesn't tell you much of anything about the relationship between the two variables.
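To make the haystack point concrete, here is a rough simulation sketch. The talent spread (.008), the number of simulated goalies, and the use of a normal approximation for luck are all assumptions chosen for illustration, not figures from the book. The goalies' talent is identical in every run; only the number of shots changes.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulated_r_squared(shots, goalies=2000, mean=0.904, talent_sd=0.008, seed=1):
    """Year-to-year r-squared of SV% for simulated goalies facing `shots` shots.

    Assumed for illustration: the goalie count, the talent SD, and a normal
    approximation for shot-by-shot luck.
    """
    rng = random.Random(seed)
    luck_sd = math.sqrt(mean * (1 - mean) / shots)
    year1, year2 = [], []
    for _ in range(goalies):
        talent = rng.gauss(mean, talent_sd)          # same talent both years
        year1.append(talent + rng.gauss(0, luck_sd))  # independent luck, year 1
        year2.append(talent + rng.gauss(0, luck_sd))  # independent luck, year 2
    return pearson_r(year1, year2) ** 2

# Same talent distribution each time; only the sample size differs.
for shots in (100, 1700, 20000):
    print(shots, round(simulated_r_squared(shots), 3))
```

The r-squared climbs as the shot count grows, even though nothing about the goalies themselves has changed.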
What *does* that .06 mean? Well, if the r-squared is .06, then the r is about .25. Roughly speaking, that means you can expect 25% of a goalie's difference from the mean to be repeated next year. Put another way, you have to regress the goalie 75% towards the mean.
Yes, that's not as much as you'd expect. By that calculation, if the average save percentage is .904, and goalie X comes in one season at .924, you'd expect next year he'd be at .909 -- one quarter of the distance between .904 and .924. That's still something: it's .005 above average, which is one goal every 200 shots, or about 10 goals a season.
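The arithmetic in the last two paragraphs can be sketched in a few lines. The function name is made up for illustration, and the 2,000-shot season is only what "one goal every 200 shots, or about 10 goals" implies, not an official figure.

```python
import math

def projected_sv(observed, league_mean, r_squared):
    """Regress an observed SV% toward the league mean, keeping a fraction
    r = sqrt(r_squared) of the observed difference."""
    r = math.sqrt(r_squared)
    return league_mean + r * (observed - league_mean)

projection = projected_sv(0.924, 0.904, 0.06)
edge = projection - 0.904  # expected SV% advantage over an average goalie

print(f"projected SV%: {projection:.3f}")
print(f"goals saved over 2,000 shots: {edge * 2000:.1f}")
```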
What do you think -- the idea that a .924 goalie is really .909, does that mean "there's little difference between goalies?" That's more a matter of opinion ... but at least now you have the numbers you need to get a grip on what's going on. The "r-squared equals only .06" doesn't really help you decide.
Anyway, that's one problem, that the .06 isn't as small as it looks. A bigger problem is that I don't think the .06 is accurate.
I repeated the same correlation for two sets of two consecutive years, 2005-06 to 2006-07, and 2007-08 to 2008-09. I looked at only the 20 goalies with the most minutes played. I got r-squareds of .30 and .25, respectively, both much higher than the authors' .06.
Why? I think it's because the authors included goalies with many fewer shots against. They don't say exactly what their criteria were, except that they "adjusted for time on the ice" (whatever that means: SV% doesn't depend on time played). In other studies in the same chapter, they used 1000 minutes as a criterion, so maybe that's what they did here.
Now, to simplify, suppose the variance of SV% consists of only talent and luck. A full-time goalie plays about 3,500 minutes. In my regression, it turns out that you get 1 part talent to three parts luck (that's where the .25 comes from: 25% of the total is talent). Now, suppose Berri and Schmidt's average goalie played only half that, or 1,750 minutes. Then the luck variance would be twice as high, and they'd get one part talent to *six* parts luck. That would drop the r-squared down from .25 to .14.
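Here is a minimal sketch of that scaling, under the same simplification used above (variance is only talent plus luck, and the talent share is read as the r-squared). The helper names are hypothetical.

```python
def luck_parts_at(minutes, full_minutes=3500, full_luck_parts=3.0):
    """Luck variance scales inversely with playing time:
    half the minutes means twice the luck variance."""
    return full_luck_parts * full_minutes / minutes

def talent_share(talent_parts, luck_parts):
    """Talent variance as a fraction of total (talent + luck) variance."""
    return talent_parts / (talent_parts + luck_parts)

for minutes in (3500, 1750):
    share = talent_share(1.0, luck_parts_at(minutes))
    print(minutes, round(share, 2))
```

Plugging in other minute totals shows how quickly the share drops as the playing-time cutoff gets lower.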
I don't know how the authors got .06 when my analysis shows .14 ... maybe their cutoff was lower than 1,000 minutes. Maybe there's some selection bias in my sample of top goalies only. Maybe my four seasons just happened to be not quite representative. Regardless, the fact that the r-squared varies so much with your selection criterion shows that you can't take it at face value without doing a bit of work to interpret it.
In any case, going back to my r-squared of .25 ... the square root of .25 is .50. That means that exactly half a full-time goalie's observed difference from the mean is real, and will be repeated next season; if a goalie is .020 better than average this year, expect him to be .010 better than average next year. That's pretty reasonable. In that light, I don't think you can say "there's little difference between goalies" at all.
And, in fact, we should be able to figure out the spread in goalie talent directly, by a method I learned from Tango a few years ago.
Suppose a goalie faces 1,700 shots, and is expected to save 90% of them. By random chance, he'll sometimes save more than 90%, and sometimes less. By the normal approximation to the binomial distribution, the standard deviation of his save percentage due to luck will be .0073.
Now, for the five seasons I checked, the top 20 goalies each year had an actual SD between .007 and .013 ... let's call it about .011.
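For anyone who wants to check the .0073, it comes from the standard binomial formula for the SD of a proportion, sqrt(p(1-p)/n). A short sketch (the function name is just for illustration):

```python
import math

def luck_sd(save_prob, shots):
    """SD of an observed save percentage due to binomial luck alone."""
    return math.sqrt(save_prob * (1 - save_prob) / shots)

print(round(luck_sd(0.90, 1700), 4))
```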
That's higher than .0073, as you'd expect. The .0073 is what you'd get if all goalies were identical. But there's also extra variance from the fact that some goalies are better than others. Since
(Observed SD)^2 = (Non-luck SD)^2 + (Luck SD)^2
we can say
.011^2 = (Non-luck SD)^2 + .0073^2
So the non-luck SD should be about .0082. If we consider everything that's not binomial luck to be talent, then we can say that the SD of top-20 goalie talent is .008. (I dropped the last decimal because our numbers are very rough here.)
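That subtraction of variances can be sketched as follows, assuming luck and non-luck are independent so that their variances add. The function name is hypothetical.

```python
import math

def non_luck_sd(observed_sd, luck_sd):
    """Solve observed^2 = non_luck^2 + luck^2 for the non-luck SD."""
    if observed_sd < luck_sd:
        raise ValueError("observed SD is smaller than the luck SD")
    return math.sqrt(observed_sd ** 2 - luck_sd ** 2)

print(round(non_luck_sd(0.011, 0.0073), 4))
```

With an observed SD at the low end of the range (.007), the subtraction fails outright, which is one sign of how rough these inputs are.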
If everything that's "non-luck" should repeat next year, we should get an r-squared of about (.008/.011)^2, which is .53. I only got .25 or .30. Why? Well, there could be more luck involved than just binomial. Not all shots are created equal; maybe some goalies got easier shots, and some harder (search for "Shot Quality" here). Maybe there's some variation in talent because of injury or age. There's definitely the quality of the goalie's defense, and that varies a bit from year to year.
Still, there's quite a bit of evidence of talent here. The theoretical value for r-squared was .53, which means the theoretical value for r is .73. That means that if a goaltender was absolutely perfectly consistent, and every shot gave him the same chance of stopping it, each and every year ... then, 73% of his observed talent would be real. That's what it means to be absolutely consistent.
I didn't find .73, but I found about .50. That's a pretty good proportion of the theoretical maximum. I think we can say that a good part of what we see of a goalie's performance is real.
But, does all this mean that "there's little difference between goalies?" Well, let's check. We got an r-squared of .25, which means that 25% of the variance is talent. The variance observed is .011^2, so the variance due to talent is a quarter of that, which is .0055^2.
That means that a goalie who's one SD above average will have a save percentage .0055 better than average. A goalie who's two SDs above average will be .011 better than the mean.
In the context of 1700 shots, one SD is about 9 goals. Two SDs is about 18 goals. And that's from only the 20 goalies with the most playing time. You'd imagine that if you included backup goalies, the variance would be larger. But, to be conservative, I'll leave the SD at 9 goals for now.
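The conversion from r-squared to goals, as done in the last three paragraphs, looks roughly like this. It follows the reading used above, in which the r-squared is the talent share of observed variance; the names are made up for illustration.

```python
import math

def talent_sd_from_r_squared(observed_sd, r_squared):
    """Talent variance = r_squared * observed variance, so
    talent SD = sqrt(r_squared) * observed SD."""
    return math.sqrt(r_squared) * observed_sd

sd = talent_sd_from_r_squared(0.011, 0.25)
goals_per_sd = sd * 1700  # SV% points times shots faced

print(f"talent SD in SV%: {sd:.4f}")
print(f"goals per SD over 1,700 shots: {goals_per_sd:.1f}")
```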
Berri and Schmidt looked at Martin Brodeur's career and found he saved an average of 13.6 goals per year, compared to an average goalie. That's consistent with a 9 goal SD; it implies that Brodeur is about one and a half SDs above average, which seems very reasonable. The authors also point out that, in terms of wins, an advantage of 13.6 goals a year is very small compared to what an NBA superstar can provide. That's true, but it doesn't mean that goalies don't matter in the context of hockey. To address that point, you need to look at the 9 goal SD. Is that a lot?
Well ... I'm not sure. I think it's more than it looks. Let's compare goalies to skaters.
Looking at the plus-minus statistics from 2008-09, a bunch of Bruins come up near the top, with numbers scattered around +30. That means that, when those players were on the ice in non-power-play situations, the Bruins scored 30 more goals than they gave up. Along with Detroit, that seems to be the highest bunch in the league.
Since five players are on the ice, you could give each of them credit for 6 extra goals. But they're not all equal -- some are better than others. Let's say that instead of 6/6/6/6/6, they might be 10/8/6/4/2.
That means that the best player on the Bruins might be worth 10 goals. Regressing that to the mean, let's call it 8 goals. Adding power plays, which weren't included in plus/minus, let's move it back to 10 goals.
That's the best player on the best team. But maybe the best player in the league wasn't on the Bruins -- he might have been on a mediocre team, and his teammates caused his plus/minus to drop. How do we adjust for that? I don't know, but let's bump it up 4 goals, and estimate that the best player in the NHL was worth 14 goals last year.
Now, figure the best goalie is about 2 SD above average, for 18 goals. So, the best goalie in the league is better than the best skater! That doesn't suggest, at all, that there's little difference between goalies.
Except ... last year's top plus/minus figure of +37 (David Krejci) is low by historical standards. In 1981-82, the top five players had plus-minuses above 66, almost twice what the Bruins had last year (although in a higher-scoring offensive environment). And, in 1970-71, Bobby Orr had a plus-minus of +124. Back then, you could certainly argue that goalies were more homogeneous than skaters, and the best skater (Gretzky, Orr, or Lemieux) was easily better than the best goalie. And I think that coincides with the intuition that people had back then, that a good goalie could help, but would never be a factor like a Gretzky would.
Still, maybe we should bump the 14 goal estimate for the best skater up a little bit, closer to the 18 goals we found for the best goalie.
I may be wrong in my logic somewhere, but, if I've done everything right, it seems that top goalies in this era are very similar in importance to top skaters. So when Berri and Schmidt accuse GMs of signing goalies to big contracts because "the people that write the checks" don't "understand [the] story" that goalies don't matter much ... well, I think they underestimate the capabilities of those hockey executives. Their judgment might not be perfect, but I think they understand the variation of talent at least as well as Berri and Schmidt seem to.
So I think Berri and Schmidt got into trouble by just looking at the number .06 without thinking about what it meant. They do this again, a bit later, when they run a correlation between SV% in the regular season, and SV% in the playoffs. That's just doomed to fail, because the playoff sample is so small. That makes the variance due to luck very large, which, in turn, brings the r-squared very close to zero.
Actually, they find an r-squared of .07, which is larger than the .06 they found over two consecutive regular seasons. You'd think it would be smaller, since playoff samples are so much smaller. I wonder if the .06 came about because they used very small samples over the regular season, including goalies with only a couple of games played?
Anyway, after that, they try the correlation between two consecutive playoff appearances. They found "none" of the performance was predictable, which suggests an r-squared of .00 (or maybe they assume it's .00 because it wasn't statistically significant). But that's probably just a sample size issue. If their intention was to show that playoff performance by goalies has a lot of random luck in it, well, yes, of course it does. But if their intent is to conclude that goalie performance is completely unpredictable, that one r-squared isn't enough evidence of that. And I'd bet that if they looked a little closer, they'd find that goalies perform in the playoffs exactly as you'd expect them to, subject to a substantial amount of binomial random luck. Or maybe not -- maybe playoff hockey is so different from regular season that different goalies excel at it. But if you want to check that, you have to do more than just run a single regression and look at a single r-squared.
Finally, another non sequitur arises where they write,
"Looking at ... goalies ... one sees an average save percentage of [.895]. The standard deviation of that percentage, though, is only .018. Hence the coefficient of variation of save percentage [the SD divided by the mean] is only 0.02. Hence, there simply is very little difference in the performance of most NHL goalies."
Now, I don't get this at all. How does the coefficient of variation tell you whether or not there's a qualitative difference in performance? It just doesn't. The fact that the SD is a small fraction of the mean doesn't have anything to do with how important the statistic is.
Intuitively, I can see how you might jump to that conclusion, if you don't think about it much. But if you do, it makes no sense. The proportion doesn't matter. When it comes to goals, it's the absolute number that matters. If you let in 10 more goals than average over a season, you cost your team 10 goals. It doesn't matter if you and the other goalies get 100 shots, 1000 shots, 10,000 shots, or 100,000 shots -- ten goals in a season is ten goals in a season.
Another way to look at it is that the SV% statistic is arbitrary, which means the coefficient of variation is arbitrary. Suppose the NHL had decided to use "goal percentage" instead of "save percentage", counting up the percentage of shots that went in, instead of the percentage that did not. In that case, the SD would be exactly the same, .018. But the average is now the opposite of what it was -- if 89.5% of shots are stopped, then 10.5% of shots are NOT stopped. And so now your coefficient of variation is .17.
One way, you get .02. Another way, you get .17. So how can the size of the arbitrary coefficient of variation possibly have anything to do with how important goaltending is?
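A quick check of both figures, using the SD and mean quoted from the book (the function name is just for illustration):

```python
def coefficient_of_variation(sd, mean):
    """SD divided by the mean."""
    return sd / mean

sd = 0.018                 # identical under either definition
save_mean = 0.895          # share of shots stopped
goal_mean = 1 - save_mean  # share of shots that go in

cv_save = coefficient_of_variation(sd, save_mean)
cv_goal = coefficient_of_variation(sd, goal_mean)

print(f"save percentage: {cv_save:.2f}")
print(f"goal percentage: {cv_goal:.2f}")
```

Same goalies, same shots, same SD; only the bookkeeping convention differs.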
I'm sure the coefficient of variation has its uses, but this isn't one of them.
In summary: as I read it, Berri and Schmidt's argument goes something like this:
-- The r-squared of SV% in consecutive seasons is .06.
-- The r-squared of SV% between a season and the playoffs is .07.
-- The r-squared of SV% between two consecutive playoffs is .00.
-- The coefficient of variation for SV% is .02.
--> These are all small numbers. Therefore, goalies' performances aren't consistent. That means there's not much difference between them, and GMs don't seem to realize this.
As I wrote, I don't think that logic makes sense. I think the evidence shows that, in the current era, good goalies are about as valuable as good skaters. I haven't looked, but I bet that salary data would show that to be roughly consistent with what GMs think.