Saturday, December 13, 2014

True talent levels for NHL team shooting percentage

How much of the difference in team shooting percentage (SH%) is luck, and how much is talent? That seems like it should be pretty easy to figure out, using the usual Palmer/Tango method.

-----

Let's start with the binomial randomness inherent in shooting. 

In 5-on-5 tied situations in 2013-14 (the dataset I'm using for this entire post), the average team took 721 shots. At an 8 percent SH%, one SD of binomial luck is

The square root of (0.08 * 0.92 / 721)

... which is almost exactly one percentage point.
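As a quick check, here's that calculation in Python (721 shots and 8 percent, as above):

```python
import math

# SD of binomial luck in shooting percentage: sqrt(p * (1 - p) / n)
p, n = 0.08, 721
luck_sd = math.sqrt(p * (1 - p) / n)

print(round(luck_sd * 100, 2))  # in percentage points: prints 1.01
```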

That's a lot. It would move an average team about 10 positions up or down in the standings -- say, from 7.50 (16th) to 8.50 (4th). 

If you want to compare that to Corsi ... for CF% (the percentage of all shot attempts -- on goal, missed, or blocked -- taken by the team rather than its opponents), the SD due to binomial luck is also (coincidentally) about one percentage point. That would take a 50.0% team to 51.0%, which is only maybe three places in the standings.

That's one reason that SH% isn't as reliable an indicator as Corsi: a run of luck can make you look like the best or worst in that category, instead of just moving you a few spots.

----

If we just go to the data and observe the SD of actual team SH%, it also comes out to about one percentage point. 

Since

SD(talent)^2 = SD(observed)^2 - SD(luck)^2

we get

SD(talent)^2 = 1.0 - 1.0 

Which equals zero. And so it appears there's no variance in talent at all -- that SH% is, indeed, completely random!

But ... not necessarily. For two reasons.

----

First, SD(observed) is itself random, based on what happened in the 2013-14 season. We got a value of around 1.00, but it could be that the "true" value, the average we'd get if we re-ran the season an infinite number of times, is different. 

How much different could it be? I wrote a simulation to check. I ran 5,000 seasons of 30 teams, each with 700 shots and a shooting percentage of 8.00 percent. 

As expected, the average of those 5,000 SDs was around 1.00. But the 5,000 values varied with an SD of 0.133 percentage points. (Yes, that's the SD of a set of 5,000 SDs.)  So the standard 95% confidence interval -- two SDs either side -- gives us a range of (0.73, 1.27). 
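Here's a sketch of that kind of simulation, in numpy -- not necessarily the exact program I ran, but the same design: 5,000 seasons of 30 teams, each taking 700 shots at 8 percent (exact output will vary a little with the seed):

```python
import numpy as np

rng = np.random.default_rng(0)

seasons, teams, shots, p = 5000, 30, 700, 0.08

# simulate goal totals for every team-season, convert to SH%
sh_pct = rng.binomial(shots, p, size=(seasons, teams)) / shots

# one SD of team SH% per simulated season, in percentage points
sds = sh_pct.std(axis=1, ddof=1) * 100

print(round(sds.mean(), 2))  # ~1.02 -- the average spread, as expected
print(round(sds.std(), 3))   # ~0.13 -- the SD of the 5,000 SDs
```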

That doesn't look like it would make a whole lot of difference in our talent estimate ... but it does. 

At the top end of the confidence interval, an observed SD of 1.27, we'd get

SD(talent) squared  = 1.27 squared - 1.00 squared 
                    = 0.78 squared

That would put the SD of talent at 0.78 percentage points, instead of zero. That's a huge difference numerically, and a huge difference in how we think of SH% talent. Without the confidence interval, it looks like SH% talent doesn't exist at all. With the confidence interval, not only does it appear to exist, but we see it could be substantial.

Why is the range so wide? It's because the observed spread isn't much different from the binomial luck. In this case, they're identical, at 1.00 each. In other situations or other sports, they're farther apart. In MLB team wins, the SD of actual wins is almost double the theoretical SD from luck. In the NHL, it's about one-and-a-half times as big. In the NBA ... not sure; it's probably triple, or more. 

If you have a sport where the range of talent is bigger than the range of luck, your SD will be at least 1.4 times as big as you'd see otherwise -- and 1.4 times is a significant enough signal to not be buried in the noise. But if the range of talent is only, say, 40% as large as the range of luck, your expected SD will be only 1.077 times as big -- that is, only eight percent larger. And that's easy to miss in all the random noise.
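Since independent variances add, the observed SD is sqrt(1 + k^2) times the luck SD, where k is the ratio of talent SD to luck SD. A quick illustration of the two cases above:

```python
import math

# observed SD as a multiple of the luck SD,
# for a given talent-to-luck ratio k
def inflation(k):
    return math.sqrt(1 + k * k)

print(round(inflation(1.0), 2))  # talent equal to luck: prints 1.41
print(round(inflation(0.4), 3))  # talent 40% of luck: prints 1.077
```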

------

Can we narrow down the estimate with more seasons of data? 

For 2011-12, SD(observed) was 0.966, which actually gives an imaginary number for SD(talent) -- the square root of a negative estimate of var(talent). In other words, the teams were closer than we'd expect them to be even if they were all identical! 

For 2010-11, SD(observed) was 0.88, which is even worse. In 2009-10, it was 1.105. Well, that works: it suggests SD(talent) = 0.47 percentage points. For 2008-09, it's back to imaginary numbers, with SD(observed) = 0.93. (Actually, even 2013-14 gave a negative estimate ... I've been saying SD(luck) and SD(observed) were both 1.00, but they were really 1.01 and 0.99, respectively.)

Out of five seasons, we get four impossible situations: the teams were closer together than we'd expect even if they were identical!

That might be random. It might be something wrong with our assumption that talent and luck are independent. Or, it might be that there's something else wrong. 

I think it's that "something else". I think it's that we're not using a good enough assumption about shot types.

-----

Our binomial luck calculation assumed that all the shots were the same, that every shot had an identical 8% chance of becoming a goal. If you use a more realistic assumption, the effects of luck come out lower.

The typical team in the dataset scored about 56 goals. If that's 700 shots at 8 percent, the luck SD is 1 percent, as we found. But suppose those 56 goals come from a combination of high-probability shots and low-probability shots.

For instance: 

 5 goals =   5 shots at 100% 
15 goals =  30 shots at  50%
30 goals = 300 shots at  10%
 6 goals = 365 shots at   1.64%
-------------------------------
56 goals = 700 shots at   8%

If you do it that way, the luck SD drops from 1.0% to 0.91%.

And that makes a big difference. 1.00 squared minus 0.91 squared is around 0.4 squared. Which means: if that pattern of shots is correct, then the SD of team SH% talent is 0.4 points. 

That's pretty meaningful, about five places in the standings.

I'm not saying that shot pattern is accurate... it's a drastic oversimplification. But "all shots the same" is also an oversimplification, and the one that gives you the most luck. Any other pattern will have less randomness. 

What is actually the right pattern? I have no idea. But if you find one that splits the difference, where the luck SD drops only to 0.95% or something ... you'll still get SD(talent) around 0.35 percentage points, which is still meaningfully different from zero.

(UPDATE: Tango did something similar to this for baseball defense, to avoid a too-high estimate for variation in how teams convert balls-in-play to outs.  He describes it here.)

-----

What's right? Zero? 0.35? 0.78? We could use some other kinds of evidence. Here's some other data that could help, from the hockey research community.

These two studies, which I pointed to in an earlier post, found year-to-year SH% correlations in the neighborhood of 0.15. Since the observed SD is about 1.0, that would put the talent SD in the range of 0.15. That seems reasonable, and consistent with the confidence intervals we just saw and the guesses we just made.

Var(talent) for Corsi doesn't have these problems, so it's easy to figure. If you assume a game's number of shots is constant, and binomial luck applies to whether those shots are for or against -- not a perfect model, but close enough -- the estimate of SD(talent) is around 4 percentage points.

Converting that to goals:

-- one talent SD in SH% =  1 goal
-- one talent SD in CF% = 10 goals

So, Corsi is 10 times as useful to know as SH%! Well, that might be a bit misleading: CF% is based on both offense and defense, while SH% is offense only. So, comparing offense to offense, the ratio is probably more like 5 times. 

Still: Corsi talent dwarfs SH% talent when it comes to predicting future performance, by a weighting of 5 to 1. No wonder Corsi is so much more predictive!

Either way, it doesn't mean that SH% is meaningless. This analysis suggests that teams who have a very high SH% are demonstrating a couple of 5-on-5 tied goals worth of talent. (And, of course, a proportionate number of other goals in other situations.)

-----

And, if I'm not mistaken ... again coincidentally, one point of CF% is worth the same, in terms of what it tells you about a team's talent, as one point of SH%. (Of course, a point of SH% is much harder to achieve -- only a few teams are as much as 1 point of SH% above or below average, while almost every team is at least 1 point of CF% away from 50.0%.)

So, instead of using Corsi alone ... just add CF% and SH%! That only works in 5-on-5 tied situations -- otherwise, it's ruined by score effects. But I wouldn't put too much trust in any shot study that doesn't adjust for score effects, anyway.

-----

I started thinking about this after the shortened 2012-13 season, when the Toronto Maple Leafs had an absurdly high SH% in 5-on-5 tied situations (10.82, best in the league), but an absurdly low CF% (43.8%, second worst to Edmonton).

My argument is: if you're trying to project the Leafs' scoring talent, you can't just use the Corsi and ignore the SH%. If the Leafs are 2 points above average in SH%, that tells you as much about their talent as two Corsi points. Instead of projecting the Leafs to score like a 43.8% Corsi team, you have to project them to score like, maybe, a 45.8% team. Which means that instead of second worst, they're probably only fifth or sixth worst.

That's almost exactly what I estimated a year ago, based on a completely different method and set of assumptions. Neither analysis is perfect, and there's still lots of randomness in the data and uncertainty in the assumptions ... but, still, it's nice to see the results kind of confirmed.





Thursday, December 11, 2014

The best NHL players create higher quality shots

A couple of months ago, I pointed to data showing that team shot quality is a real characteristic of a team, and not just the random noise the hockey analytics consensus believes it to be. 

That had to do with opposition shot distance. In 2013-14, the Wild allowed only 32 percent of opposing shots from in close, as compared to the Islanders, who allowed 61 percent. Those differences are far too large to be explained by luck.

Here's one more argument that -- it seems to me -- is almost undeniable evidence that SH% must be a real team skill.

-----

It's conventional wisdom that some players shoot better than others, right? In a 1991 skills competition, Ray Bourque shot out four targets in four shots. The Hull brothers (and son) were well-known for their ability to shoot. Right now, Alexander Ovechkin is considered the best pure scorer in the league.*

(*Update: yeah, that's not quite right, as a reader points out on Twitter.)

In 1990-91, Brett Hull scored 86 goals with a 22.1 percent SH%. Nobody would argue that was just luck, right? You probably do have to regress that to the mean -- his career average was only 15.7 percent -- but you have to recognize that Brett Hull was a much better shooter than average.

Well, if that's true for players, it's true for teams, right? Teams are just collections of players. The only way out is to take the position that Hull just cherry-picked his team's easiest shots, and he was really just stealing shooting percentage from his teammates. 

That's logically possible. In fact, I think it's actually true in another context, NBA players' rebounds. It doesn't seem likely for hockey, but, still, I figured, I have to check.

-----

I went to hockey-reference.com and found the top 10 players in "goals created" in 2008-09. (I limited the list to one player per team.)  

For each of those players, I checked his team's shooting percentage with and without him on the ice, in even-strength situations in road games, the following season. (Thanks to "Super Shot Search," as usual, for the data.)  

As expected, their teams generally shot better with them than without them:

-----------------------
With  Without
-----------------------
10.1   5.8   Ovechkin
 5.5   9.3   Malkin
 7.1   5.3   Parise
 6.0   8.2   Carter 
10.3   9.1   Kovalchuk
 8.9   5.7   Datsyuk
10.8   7.5   Iginla
10.7   7.2   Nash
 9.8   7.5   Staal
 9.0   9.1   Getzlaf
-----------------------
 8.8   7.5   Average
-----------------------

Seven of the ten players improved their team's SH% (Getzlaf, at -0.1, was essentially a wash). Weighting all players equally, the average increase came out to +1.35 percentage points, which is substantial. 
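For the record, here's the equal-weighted arithmetic on the table above, in Python:

```python
pairs = {                      # team SH% with / without the player
    "Ovechkin":  (10.1, 5.8), "Malkin":   (5.5, 9.3),
    "Parise":    (7.1, 5.3),  "Carter":   (6.0, 8.2),
    "Kovalchuk": (10.3, 9.1), "Datsyuk":  (8.9, 5.7),
    "Iginla":    (10.8, 7.5), "Nash":     (10.7, 7.2),
    "Staal":     (9.8, 7.5),  "Getzlaf":  (9.0, 9.1),
}

diffs = [w - wo for w, wo in pairs.values()]

# players whose team shot strictly better with them on the ice
improved = sum(d > 0 for d in diffs)

print(improved)                           # prints 7
print(round(sum(diffs) / len(diffs), 2))  # average increase: prints 1.35
```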

It would be hard to argue, I think, that this could be anything other than player influence. 

It looks way too big to be random. It can't be score effects, because these guys probably play roughly the same number of minutes per game regardless of the score. It can't be bias on the part of home-team shot counters, because these are road games only. 

And, it can't be players stealing from teammates, because the numbers are for all teammates on the ice at the same time. You can't steal quality shots from players on the bench.

------

I should mention that there was also an effect for defense, but it was so small you might as well call it zero. The opposition had a shooting percentage of 7.7 percent with one of those ten players on the ice, and 7.8 percent without. 

That kind of makes sense -- the players in that list are known for their scoring, not their defense. I wonder if we'd find a real effect if we chose the players on some defensive scale instead? Maybe Selke Trophy voting?

Also ... what's with Malkin? On offense, the Penguins shot 3.8 percentage points worse on his shifts. On defense, the Penguins' opponents shot 3.8 points better. Part of the issue is that his "off" shifts are Sidney Crosby's "on" shifts. But even his raw numbers are unusually low/high.

------

Speaking of Crosby ... if you don't believe that the good players have consistently high shot quality, Crosby's record should help convince you. Every year of his career, the Penguins had higher quality shots with Crosby on the ice than without:

------------------------------
With  Without 
------------------------------
11.7   9.4   2008-09 Crosby
10.1   7.1   2009-10 
 9.6   6.9   2010-11
13.9   7.3   2011-12
13.2   8.8   2012-13
10.4   6.5   2013-14
 7.1   7.0   2014-15 (to 12/9)
-------------------------------
10.9   7.6   Average
-------------------------------

Sidney Crosby shifts show a consistent increase of 3.3 percentage points -- even including the first third of the current season at full weight. 

You could argue that's just random, but it's a tough sell.

------

Now, for team SH%, you could still make an argument that goes something like this:


"Of course, superstars like Sidney Crosby create better quality shots. Everyone always acknowledged that, and this blog post is attacking a straw man. The real point is ... there aren't that many Sidney Crosbys, and, averaged over a team's full roster, their effects are diluted to the point where team differences are too small to matter."

But are they really too small to matter?  How much do the Crosby numbers affect the Penguins' totals?

Suppose we regress Crosby's +3.3 to the mean a bit, and say that the effect is really more like 2.0 points. In 2013-14, about 38 percent of the Penguins' 865 (road, even-strength) shots came with Crosby on the ice. That means that the Crosby shifts raised the team's overall road SH% by about 0.76 percentage points. 
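That calculation, spelled out (the 2.0-point regressed effect and 38 percent shot share are the assumptions from the paragraph above):

```python
# regressed on-ice effect, in percentage points of SH%
crosby_effect = 2.0

# share of the Penguins' road even-strength shots with Crosby on the ice
crosby_share = 0.38

# lift in the team's overall road SH% attributable to Crosby's shifts
team_lift = crosby_effect * crosby_share
print(round(team_lift, 2))  # prints 0.76
```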

That's hardly diluted at all. Looking at the overall team 5-on-5 road shooting percentages, 0.76 points would move an average team up or down about 8 positions in the rankings.

----

Based on all this, I think it would be very, very difficult to continue arguing that team shooting percentage is just random.

Admittedly, that still doesn't mean it's important. Because, even if it's not just random, how is it that all these hockey sabermetric studies have found it so ineffective in projecting future performance? 

The simple answer, I think, is: weak signal, lots of noise. 

I have some ideas about the details, and will try to get them straight in my mind for a future post. Those details, I think, might help explain the Leafs -- which is one of the issues that got this debate started in the first place.




Tuesday, December 09, 2014

Corsi vs. Tango

Tom Tango is questioning the conventional sabermetric wisdom on Corsi. In a few recent posts, he presents evidence that Corsi can be improved upon as a predictor of future NHL team performance. Specifically: goals are important, too.

That has always seemed reasonable to me. In fact, it seems so reasonable that I wonder why it's disputed. But it is. 

Goals are just shots multiplied by shooting percentage (SH%). The consensus among the hockey research community is that their studies show SH% is not a skill that carries over from year to year. And, therefore, goals can't matter once you have shots.

I've been disputing that for a while now -- at least seven posts worth (here's the first). But I've been doing it from argument. Tango jumps right to the equations. He split seasons in half randomly, and ran a regression to try to predict one half from the other half. Goals proved to be very significant. In fact, when you try to predict, you have to weight goals *four times as heavily* as Corsis. (Five times as heavily as unsuccessful Corsis.)

In a tongue-in-cheek jab at statistics named after their inventors, he called that new statistic the "Tango." 

Despite Tango's regression results, the hockey analysts who commented still disagree. I'm still surprised by that ... the hockey sabermetrics community are pretty smart guys, very good at what they do, and a lot of them have been hired by NHL teams. I've had times when I've wondered if I'm missing something ... I mean, when it's 1 dabbler against 20 analysts who do this every day, it's probably the 20 who are right. Well, now it's two against the world instead of one ... and the second is Tango, which makes me a little more confident. 

Also ... Tango jumps right to the actual question, and proves goals significantly improve the prediction. That's hard to argue with; at least, it's harder to argue with than what I'm doing, which is attacking the assumption that shooting percentage is all random. 

You can see one response here, and Tango's reply here.  

Tango got his data from war-on-ice.com, who agreed to make it available to all (thank you!!!). I was planning to do some work with the data myself, but ... I guess Tango and I think about things from different angles, because, the more I thought about it, the more I kept coming back to my "usual," less direct arguments. So, there'll be another post coming soon. I'll play with the data when my thoughts wind up somewhere that requires me to look at it.

For this post, a few of my observations from Tango's posts and the discussion that followed.

-----

In one of his posts, Tango wrote,


"One of the first things that (many) people did was to run a correlation of Corsi v Tango, come up with an r=.98 or some number close to 1 and then conclude: “see, it adds almost nothing”. If only that were true. "

Tango is absolutely right (you should read the whole thing). It's just another case of jumping to conclusions from a high or low correlation coefficient.

Sabermetrics is pretty good at figuring out good and bad. It has to be -- I mean, even fans and sportswriters are pretty good at it, and the whole point of sabermetrics is to do better. We're already in "high correlation" territory, able to separate the good teams and players from the bad teams and players pretty easily. 

Find a 10-year-old kid who's a serious sports fan -- any sport -- and get him to rank the players from best to worst -- whether by formula, or by raw statistics. Then, find the best sabermetric statistic you can, and rank the players that way.

I bet the correlation would be over 0.9. Just by gut.

We're already well into the .9s, when it comes to understanding hockey. Any improvements are going to be marginal, at least if you measure them by correlation. And so, it follows that *of course* Tango and Corsi are going to correlate highly. 

Also, and as Tom again points out, if Corsi already has a high correlation with something, at first glance, Tango can appear to increase it only slightly. If you start with, say, 0.92, and Tango improves it to 0.93 ... well, that doesn't look like much, intuitively. But it IS much. If you look at it another way -- still intuitively -- it was 8% uncorrelated before, and now it's only 7% uncorrelated. You've improved your predictive ability by 12%!

The point is, you have to think about what the numbers actually mean, instead of just reflexively thinking, "93 isn't much bigger than 92, so who cares?"

Tom illustrates the point by noting that, even though Tango and Corsi appear to be highly correlated to each other, Tango improves a related correlation from .44 to .50. There must be some significant differences there.

-----

There's a more important argument, though, than to not underestimate how much better .93 is than .92. And that argument is: *it's not about the correlation*. 

Yes, it's nice to have a higher and higher r-squared, and be able to reduce your error more and more. But it's not really error reduction we're after. It's *knowledge*. 

It's quite possible that a model that gives you a low correlation actually gives you MORE of an understanding of hockey than a model that gives you a high correlation. Here's an easy example: the correlation between points in the standings, and whether or not you make the playoffs, is very high, close to 1.0. The correlation between your Corsi and whether or not you make the playoffs is lower, maybe 0.7 or something (depending how you do it -- which is another reason not to rely on the correlation alone). 

Which tells you more about hockey that you didn't already know? Obviously, it's the Corsi one. Everyone already knows that points determine whether you qualify for the playoffs. When you find out that shots seem to be important, that's new knowledge -- the knowledge that outshooting your opponents means something. (Of course, *what* it means is something you have to figure out yourself -- correlation doesn't tell you what type of relationship you've found.)

And that's what's going on here. Corsi has a high correlation with future winning, but Corsi *and goals* has an even higher correlation (to a significant extent). What does that tell us? 

Goals matter, not just shots. 

That's an important finding!  You can't dismiss it just because the predictions don't improve that much. If you do, you're missing the point completely. 

You wouldn't do that in other aspects of life, would you? Those faulty airbags in the news recently, the ones that kill people with shrapnel ... those constitute a small, small fraction of collision deaths. If you looked only at predicting fatalities, knowing about those airbags is a rounding error. 

But the point is not just to predict fatality rates!  Well, not for most of us. If you're an insurance company, then, sure, maybe it doesn't make that much difference to you, a couple of cents on each policy you write. But that doesn't mean the information isn't important. It's just important for different questions. Like, how can we reduce fatalities? We can reduce fatalities by replacing the defective air bags!

Also, the information means it is false to state that faulty airbags don't matter. You can still argue that they don't matter MUCH, relative to the overall total of collision deaths; that might be true. But for that, you can't argue from correlation coefficients. You have to argue from ... well, you can use the regression equation. You can say, "only 1 person in a million dies from the airbag, but 1000 in a million die from other causes."

In this case, Tango found that a goal matters four times as much as a shot. If, roughly speaking, there are 12 shots for every goal, then every 12 shots, you get 12 "points" of predictive value from the shots, and 4 "points" of predictive value from the goals. 
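The arithmetic behind that, sketched in Python (the 12-shots-per-goal rate is just the rough figure from the paragraph above):

```python
# Tango's regression: a goal counts about 4 times as much as a shot
goal_weight, shot_weight = 4, 1

shots_per_goal = 12  # rough NHL rate

# per 12 shots: predictive "points" from the shots vs. from the one goal
from_shots = shots_per_goal * shot_weight
from_goals = 1 * goal_weight

print(from_shots, from_goals)          # prints 12 4
print(from_shots // from_goals)        # the 3:1 ratio: prints 3
```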

The ratio isn't 1000:1 like for the airbags. It's 3:1. How can you dismiss that? Not only is it important knowledge about hockey, but the magnitude of the effect is really going to affect your predictions.

-----

Why does the conventional wisdom dispute the relevance of goals? Because the consensus is that shooting percentage is random -- just like clutch hitting in baseball is random.

Why do they think that? Because the year-to-year team correlation for shooting percentage is very low.

I think it's the "low number means no effect" fallacy. Here are two studies I Googled. This one found an r-squared of .015, and here's another at .03. 

If you take the square root of those r-squareds, to give you correlations, you get 0.12 and 0.17, respectively.

Those aren't that small. Especially if you take into account how much randomness there is in shooting percentage. A signal of 0.17 in all that noise might be pretty significant.

-----

It's well-known that both Corsi and shooting percentage change with the score of the game. When you're up by one goal, your SH% goes up by almost a whole percentage point -- and your Corsi goes down by four points. When you're up two or more, the differences are even bigger. 

That's probably because when teams are ahead, they play more defensively. Their opponents, who are trailing, play more aggressively -- they press in the offensive zone more, and get more shots. 

So, teams in the lead see their shots go down. But their expected SH% goes up, because they get a lot of good scoring chances when the opposition takes more chances -- more breakaways, odd-man rushes, and so on.

It occurred to me that these score effects could, in theory, explain Tango's finding. 

Suppose Team A has 30 Corsis and 5 goals in a game, and Team B has 30 Corsis and no goals. 

Even if shooting percentage is indeed completely random, team A is probably better than team B. Why? Because, with 5 goals, team A probably had a lead most of the game. If it had 30 Corsis despite leading, it must be a much better Corsi team to overcome the "handicap" of generating fewer shots while ahead. So, when the score is tied or it's behind, it'll probably kick the other team's butt.

I don't think Tango's finding is *all* score effects. And, even if it were, all that would mean is that if you didn't explicitly control for score, "Tango" would be a more accurate statistic than Corsi. And most of the hockey studies I've seen *don't* control for score.

----

Here's one empirical result that might help -- or maybe won't help at all, I'm actually not sure. (Tell me what you think.)

My hypothesis has been that some teams have better shooting percentages, along with lower Corsis, because they choose to set up for higher quality shots. Instead of taking a 5% shot from the point, they take a 50/50 chance on setting up a 10% shot from closer in. 

As I have written, I think the evidence shows some support for that hypothesis. In 5-on-5 tied situations, there's a negative correlation between Corsi rate and SH%. In 2013-14 (raw stats here), it was -0.16. In the shortened 2012-13 season, it was -0.04. In 2011-12, -0.14. 

Translating the -0.14 into hockey: for every additional goal due to shot quality, teams lowered their Corsi by 2.1 shots.

That's a tradeoff of around 2 Corsis per goal. Tango found 4 Corsis per goal. Does that mean that two of Tango's four Corsis come from shot quality, and the other two come from score effects?

Not sure. There's probably too much randomness in there anyway to be confident, and I'm not completely sure that they're measuring the same thing. But, there it is, and I'll think about it some more.

-----

UPDATE, 30 minutes later: Colby Cosh pointed out, on Twitter, that Tango's regression used Corsi and goal *differential* -- offense minus defense.  That means the "goals against" factor is partially a function of save percentage, which partially reflects goalie talent, which, of course, carries over into the future. So, goalie talent absolutely has to be part of the explanation of the "goals" term.


So now we have two factors: score effects and goalie effects. Could that fully explain the predictive value of goals, without resort to shot quality?  I'll have to think about the actual numbers, whether the magnitudes could be high enough to cover the full "4 shots" coefficient.









Tuesday, December 02, 2014

Players being "clutch" when targeting 20 wins -- a follow-up

In his 2007 essay, "The Targeting Phenomenon" (subscription required), Bill James discussed how there are more single-season 20-game winners than 19-game winners. That's the only place in the win distribution where that happens -- where the higher number occurs more frequently than the lower number. 

This is obviously a case of pitchers targeting the 20-win milestone, but Bill didn't speculate on the actual mechanisms for how the target gets hit. In 2008, I tried to figure it out. But, this past June, Bill pointed out that my conclusion didn't fit with the evidence:

"... the Birnbaum thesis is that the effect was caused one-half by pitchers with 19 wins getting extra starts, and one-half by poor offensive support by pitchers going for their 21st win, thus leaving them stuck at 20. But that argument doesn't explain the real life data. 

"[If you look closely at the pattern in the numbers,] the bulge in the data is exactly what it should be if 20 is borrowing from 19 -- and is NOT what it should be if 20 is borrowing both from 19 and 21."

(Here's the link.  Scroll down to OldBackstop's comment on 6/6/2014.)

So, I rechecked the data, and rethought the analysis, and ... Bill is right, as usual. The basic data was correct, but I didn't do the adjustments properly.

-----

My original study covered 1940 to 2007. This study, though, will cover only 1956 to 2000. That's because I couldn't find my original code and data. The "1956" is what I happened to have handy, and I decided to stop at 2000 because Bill did. 

First, here are the raw numbers of seasons with X wins:

17 wins: 159
18 wins: 132
19 wins:  92
20 wins: 113
21 wins:  56
22 wins:  35
23 wins:  20
24 wins:  20

You can see the bulge we're dealing with: there are way too many 20-win pitchers. And the excess can't have come from the 21-win bucket: borrowing from 21 wouldn't change the combined count for 20 and 21 wins, so their average (84.5) should still sit well below the 19-win count, the way a smooth decline would have it -- but it's nearly as high as the 92 seasons at 19 wins. That can't be right. And, as Bill pointed out, even if only *half* the excess came from the 21 bucket, 20 would still be too big relative to 19.

So, let me try fixing the problem.

In the other study, I checked four ways in which 20 wins could get targeted:

1. Extra starts for pitchers getting close
2. Starters left in the game longer when getting close
3. Extra relief appearances for pitchers getting close
4. Better performance or luck when shooting for 20 than when shooting for 21.

I'll take those one at a time.

-------

1. Extra starts

The old study found that pitchers who eventually wound up at 19 or 20 wins did, in fact, get more late-season starts than others -- about 23 more overall. In this smaller study (1956-2000 instead of 1940-2007), that translates down to maybe 18 extra starts. 

That's about 9 extra wins. Let's allocate four of them to pitchers who wound up at 19 instead of 18, and the other five to pitchers who wound up at 20 instead of 19. If we back that out of the actual data, we get:

18 wins: 132 136
19 wins:  92  93
20 wins: 113 108
21 wins:  56  56

(If you're reading this on a newsfeed that doesn't support font variations: the first column is the old values, which should be struck out.)

What happens is: the 18 bucket gets back the four pitchers who won 19 instead. The 19 bucket loses those four pitchers, but gains back the five pitchers who won 20 instead of 19. The 20 bucket loses those five pitchers.
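The bucket-shifting above can be sketched as a small helper function (the counts are the raw 1956-2000 numbers from the table; `back_out` is just a name I'm using for illustration):

```python
def back_out(buckets, moved):
    """Undo targeting: for each (wins, n) in moved, send n pitchers
    from the 'wins' bucket back down to the 'wins - 1' bucket."""
    adjusted = dict(buckets)
    for wins, n in moved.items():
        adjusted[wins] -= n
        adjusted[wins - 1] += n
    return adjusted

raw = {18: 132, 19: 92, 20: 113, 21: 56}

# 4 pitchers reached 19 instead of 18; 5 reached 20 instead of 19
print(back_out(raw, {19: 4, 20: 5}))
# prints {18: 136, 19: 93, 20: 108, 21: 56}
```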

(In the other study, I didn't bother backing out the effects as I found them, so I wound up taking some of them from the wrong place, which caused the problem Bill found.)

So, we've closed the gap from 21 down to 15.

--------

2. Starters left in the game longer

After I had posted the original study, Dan Rosenheck commented,
"You didn't look at innings per start. I bet managers leave guys with 19 W's in longer if they are tied or trailing in the hope that the lineup will get them a lead before they depart."

I checked, and Dan was right. In a subsequent comment, I figured Dan's explanation accounted for about 10 extra twenty-game winners. Those are all taken from the 19-game bucket, because the effect occurred only for starters currently pitching with 19 wins.

For this smaller dataset, I'll reduce the effect from 10 seasons to 7. 

So:

18 wins: 136 136
19 wins:  93 100
20 wins: 108 101
21 wins:  56  56

Now, the bulge is down to 1.  We still have a ways to go, if the 19 is to be significantly higher than the 20, but we're getting there.

---------

3. Extra Relief Appearances

The other study listed every pitcher who got a win in relief while nearing 20 wins. Counting only the ones from 1956 to 2000, we get:

3 pitchers winding up at 19
5 pitchers winding up at 20
2 pitchers winding up at 21

Backing those out:

18 wins: 136 139
19 wins: 100 102
20 wins: 101  98
21 wins:  56  54

The gap now goes the proper direction, but only slightly.

------

4. Luck

This was the most surprising finding, and the one responsible for the "getting stuck at 20" phenomenon. Pitchers who already had 20 wins were unusually unlikely to get to 21 in a subsequent start. Not because they pitched any worse, but because they got poor run support from their offense.

When Bill pointed out the problem, I wondered if the run-support finding was just a programming mistake. It wasn't -- or, at least, when I rewrote the program, from scratch, I got the same result.

For every current starter win level, here are the pitchers' W-L records in those starts, along with the team's average runs scored and allowed:

17 wins:   483-311 .557   4.30-3.61
18 wins:   350-250 .608   4.30-3.61
19 wins:   260-182 .588   4.24-3.56
20 wins:   150-136 .524   3.81-3.54
21 wins:    94- 61 .606   4.49-3.44
22 wins:    59- 23 .720   4.26-2.80

The run support numbers are remarkably consistent -- except at 20 wins. Absent any other explanation, I assume that's just a random fluke.

If we assume that the 20-win starters "should have" gone 171-115 (.598) instead of 150-136 (.524), that makes a difference of 21 wins.
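To check that arithmetic (taking the post's .598 as the baseline winning percentage the 20-win starters "deserved"):

```python
wins, losses = 150, 136        # actual record in starts made with 20 wins
decisions = wins + losses      # 286 decisions
baseline = 0.598               # assumed "deserved" winning percentage

expected_wins = round(baseline * decisions)
print(expected_wins, decisions - expected_wins)  # 171 115
print(expected_wins - wins)                      # 21 wins lost to bad luck
```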

The mistake I made in the previous study was to assume that those wins were all stolen from the "21-win" bucket. Some were, but not all. Some of the unlucky pitchers eventually got past the 20-win mark; a few, for instance, went on to post 23 wins. In their case, it becomes the 23-win bucket stealing a player from the 24-win bucket.

I checked the breakdown. For every starter who tried for his 21st win but didn't achieve it that game, I calculated where he eventually finished the season. From there, I scaled the totals down to 21, the number of wins lost to bad luck. The result:

20  wins:  9 pitchers
21  wins:  5 pitchers
22  wins:  3 pitchers
23  wins:  1 pitcher
24  wins:  2 pitchers
25+ wins:  less than 1 pitcher

So: the 20-win bucket stole 9 pitchers from the 21-win bucket. The 21-win bucket stole 5 pitchers from the 22-win bucket. And so on. 

Adjusting the overall numbers gives this:

18 wins: 139 139
19 wins: 102 102
20 wins:  98  89
21 wins:  54  50
22 wins:  35  33

-------

And that's where we wind up. It's still not quite enough, to judge by Bill's formula and even just the eyeball test. It still looks like there's a little bulge at 20, by maybe five pitchers. If 20 could steal five more pitchers from 19, we'd be at 107/84, which would look about right.

But, we've done OK. We started with a difference of +21 -- that is, 21 more twenty-game winners than nineteen-game winners -- and finished with a difference of -13. That means we found an explanation for 34 games, out of what looks like a 39-game discrepancy.

Where would the other five come from? I don't know. It could be luck and rounding errors. It could also be that the years 1956-2000 aren't a representative sample of the original study, so we lost a bit of accuracy when I scaled down.  Or, it could be some fifth real factor I haven't thought of.

In any case, here's the final breakdown of the number of "excess" 20-game winners:

-- 5 from getting extra starts;
-- 7 from being left in games longer than usual;
-- 3 from getting extra relief appearances;
-- 9 from bad run support getting them stuck at 20;
-- 5 from luck/rounding/sources unknown.

By the way, one important finding still stands through both studies. Starters didn't seem to pitch any better than normal with their 20th win on the line, so you can't accuse them of trying harder in the service of a selfish personal goal.





Thursday, November 20, 2014

Does inequality make NBA teams lose?

I

Some people believe that income inequality can hurt group performance. They think that people work better together when employees are more likely to see themselves as equal.

I don't know if that's true or not. But it's a coherent hypothesis, that makes sense in terms of cause and effect.

On the other hand, here's something that doesn't make sense: the idea that when salaries are more unequal, the result is that the total becomes lower. That doesn't work, right? You can tell the CEO, "if you paid your people more equally last year, the company would have done better." But you can't tell the CEO, "if you paid your people more equally last year, they'd have collectively taken home more money."  

Because, the relationship between total pay and individual pay is already known. The total is just the sum of the individuals. Equality can't possibly cause any additional pay, beyond adding up the amounts.

It would be like saying, "You shouldn't carry $50 bills and $1 bills in the same wallet. If you reduced inequality by carrying only $5 bills and $10 bills, you'd have more money."   

That would be silly.

-------

Well, that's almost exactly what's happening in a recent NBA study, by the same poverty researcher who wrote the baseball inequality article I posted about three weeks ago.

The author looked up individual player Win Shares (WS) for the 2013-14 season. He measured Win Share inequality within each team by calculating the Gini Coefficient for the population of players. He then ran a regression to predict team wins from player inequality. He found a strong negative correlation, -0.43. 

In other words, the more equal teams won significantly more games. 

The author suggests this might be evidence of the benefits of equality. On the more equal teams, the better performance might have been created by the "psychological and motivational benefits" of the weaker players having "better opportunity to develop and showcase their skills."

But ... no, that doesn't make sense, for exactly the same reason as the $10 bill example. 

Win Shares is really just a breakdown and allocation of actual team wins. The formula takes the number of games a team won, and apportions that total among the players. In other words, the team totals equal the sum of the individual totals, the same way the total amount in your wallet equals the sum of the individual bills. (*)

Last year, the Spurs won 62 games, while the 76ers won only 19. That can't have anything to do with equality. It's due to the fact that the Spurs had 62 win dollars in their wallets -- say, eleven $5 bills and seven $1 bills -- while the 76ers had only 19 win dollars -- say, a $10 bill and nine $1 bills. 

It's true that the 76ers players' Win Shares were more concentrated among their best players. In fact, the top 5 percent of their players accounted for more than half the team total. But that doesn't matter. They had 19 wins total because they had a total of 19 wins individually.

If you want the Philadelphia 76ers to win 50 games this year, find players who add up to 50 Win Shares. It doesn't matter if you find ten guys with 5 WS each, or one guy with 30 WS and ten guys with 2 WS. 

In fairness to the author, he does explicitly say that the correlation does not necessarily imply causation here. But the point is: he doesn't realize that he's looking at a relationship where correlation CANNOT POSSIBLY imply causation.

And that's what I found so interesting about the study. At first reading, it looks like such a strong finding, that equality may cause teams to win more ... but after a bit of thought, it turns out it's logically impossible!

The only other time I remember seeing that kind of logical impossibility was that study "proving" that listening to children's music makes you older, by retroactively changing your year of birth. And that one had been created deliberately to make a point.

-------

II

As an aside, another thing I found interesting: in his baseball article, the author argued against unequal pay for baseball players because, he believed, pay seemed to have so little to do with actual merit. But, here, by measuring inequality in Win Shares instead of dollars, he seems to be arguing against inequality of merit itself!

Well, that may be just a tiny bit unfair. Reading between the lines, I think the author thinks Win Shares are much more heavily based on opportunity than they actually are. He writes, "maybe top teams, by virtue of their abundance of success, are more willing to share the glory ... Lack of opportunity, by contrast, can lead to despair and diminished performance."

But, actually, the author never demonstrates that the bad teams have more inequality of opportunity (playing time). I suspect that they don't.

In any case, we can see that the 76ers' high Gini isn't driven mainly by differences in opportunity. Even limiting the analysis to "regulars," players with 1,000 minutes or more, the effect remains. On the 76ers, the top two players had 53 percent of the regulars' total Win Shares. On the Spurs, it was only 29 percent.

-------

III

So why is it that the unequal teams tend to be worse? I think it's a combination of (a) the way the Gini coefficient measures inequality, and (b) the mechanism by which NBA performance creates wins. 

Suppose that on a good team, the five regulars have field goal percentages (FG%) of 59, 57, 55, 53, and 51 percent, respectively. On a bad team, the five players are at 49, 47, 45, 43, and 41 percent.

If you measure inequality on the two teams by variance, it comes out equal: a standard deviation of 2.8 on each team. But if you measure it by Gini coefficient, or a similar calculation of "proportion of total wealth," they're different. 

On the good team, the total percentage points add up to 275. The top player, with 59, has 21.5 percent of the total.

On the bad team, the total percentage points add up to 225. The top player, with 49, has 21.8 percent of the total. So, the bad team is equal by SD, but less equal by "percent of total."

The Gini is more than just the top player, of course ... the formula involves every member of the dataset. Using an online calculator, I found:

The Gini of the good team is 0.029. 
The Gini of the bad  team is 0.036.

So, by Gini, the bad team is less equal than the good team. (A higher Gini means less equality.)
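One standard formulation of the Gini coefficient is the mean absolute difference between all pairs, divided by twice the mean. The two FG% lists are from the example above; the function itself is my sketch, not the online calculator the post used:

```python
def gini(xs):
    """Gini coefficient: mean absolute pairwise difference over twice the mean."""
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(a - b) for a in xs for b in xs) / (n * n)
    return mad / (2 * mean)

good = [59, 57, 55, 53, 51]   # FG% of the good team's five regulars
bad  = [49, 47, 45, 43, 41]   # FG% of the bad team's five regulars

print(round(gini(good), 3))   # 0.029
print(round(gini(bad), 3))    # 0.036
```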

Why does this happen, that the Gini is higher but the variance is the same? Because of the way the two measures differ. Variance stays the same when you *add* the same amount to every player. But not the Gini. The Gini stays the same when you *multiply* every player by the same amount. 

If you *add* a positive number to every player instead of multiplying, the Gini drops. (And, if you *subtract* a positive number from every player, it increases.)

That's often what you want to have happen -- for incomes, say. If I make $50K and you make $10K, we're very unequal. But if you give both of us a $100K raise, now we're at $150K and $110K -- much more equal, intuitively.

The Gini confirms that. Before our raise, the Gini is 0.33. Afterwards, it's 0.08. (But if we use the variance instead, we look the same both ways.)
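Those two properties are easy to verify numerically (same mean-absolute-difference formulation of the Gini as before; my sketch, not the post's calculator):

```python
def gini(xs):
    """Gini coefficient: mean absolute pairwise difference over twice the mean."""
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(a - b) for a in xs for b in xs) / (n * n)
    return mad / (2 * mean)

print(round(gini([50, 10]), 2))     # 0.33 -- incomes of $50K and $10K
print(round(gini([150, 110]), 2))   # 0.08 -- after a $100K raise each

# Multiplying everyone by the same factor leaves the Gini unchanged:
print(gini([100, 20]) == gini([50, 10]))  # True
```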

But for Win Shares, is the Gini-type of inequality really what we want? Are two players with 7 WS and 6 WS, respectively, really that much more equal, in an intuitive basketball sense, than two players with 2 WS and 1 WS? What about two players at 0.002 and 0.2 wins? In that case, one player has 100 times the wins of the other. But does "100 times" really give a proper impression of how different they are?

I don't think so. I think it's just an artifact of the way performance translates to wins.

What's wins? It's performance above replacement value. (Well, actually, WS is measuring above zero value, which is lower, but I'll call it "replacement value" anyway since the logic is the same.)  

So, to get wins, you start with performance, and subtract a constant. As we saw, when you subtract the same positive number from every player, the Gini goes up. It's a "negative raise" that makes employees less equal.

Suppose the average FG% is 50 percent. Suppose that 40 percent is "replacement level" that leads to exactly zero wins, the level at which a team is so bad it will never win a game. Conversely, 60 percent is the level at which a team is so good it will never lose a game. 

If the relationship is linear, it's easy to convert player FG% to Win Shares. Actually, I'll convert to "wins per 100 games," because the "out of 100" scale is easier to follow.

On the good team we talked about earlier, the players had FG% of 59, 57, 55, 53, and 51. That corresponds to W100 of 95, 85, 75, 65, and 55.

On the bad team, the players' FG% of 49, 47, 45, 43, and 41 translate to W100s of 45, 35, 25, 15, and 5.

See what happens? The FG% looks a lot more equal than the wins. On the bad team, the best player was only 20 percent better than the worst player in field goal percentage (49 vs. 41). But in wins ... he's 800 percent better! (45/5.)  On the good team, though, there's still enough performance after subtracting that the numbers look reasonably equal. 

The actual Gini coefficients:

FG%:  The Gini of the good team is 0.029. 
FG%:  The Gini of the bad  team is 0.036.

Wins: The Gini of the good team is 0.11.
Wins: The Gini of the bad  team is 0.32.
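Under the stated assumptions (40% FG is zero wins, 60% is all wins, linear in between), the conversion is just W100 = (FG% - 40) * 5, and the four Gini values follow directly. A sketch:

```python
def gini(xs):
    """Gini coefficient: mean absolute pairwise difference over twice the mean."""
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(a - b) for a in xs for b in xs) / (n * n)
    return mad / (2 * mean)

def w100(fg):
    # Linear map: 40% FG -> 0 wins per 100 games, 60% FG -> 100
    return (fg - 40) * 5

good_fg = [59, 57, 55, 53, 51]
bad_fg  = [49, 47, 45, 43, 41]

good_w = [w100(fg) for fg in good_fg]   # [95, 85, 75, 65, 55]
bad_w  = [w100(fg) for fg in bad_fg]    # [45, 35, 25, 15, 5]

print(round(gini(good_w), 2))   # 0.11
print(round(gini(bad_w), 2))    # 0.32
```

Subtracting the 40-point "replacement tax" is what blows the bad team's Gini up from 0.036 to 0.32.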

That's just how the math works. The Gini coefficient is very sensitive to where you put your "zero". If you measure zero as 0 FG%, inequality looks low. If you measure zero as zero wins (say, FG% of 40 percent), inequality looks higher. If you measure zero as replacement level (say, FG% of 43 percent), inequality looks even higher. And if you measure zero as an NBA average team (say, FG% of 50 percent), it's even more unequal -- the top half of the teams have 100 percent of the wins! (**) 

The higher the threshold that you call zero, the greater the inequality. 

In baseball, a player hitting .304 has only about 1% more hits than a player hitting .301. But he has 300% more "hits above .300". 

In the economy, the top 10% of families may have (say) 45% of the income -- but probably close to 100% of the new Ferraris. 

And a real-life example: In the NHL, over the last ten seasons, the Black Hawks have 13% more standings points than the Maple Leafs -- but 500% more playoff appearances.

-----

One last analogy:

Take a bunch of middle-class workers, and tax them $40,000 each. They become much more unequal, right? Instead of making, say, $50K to $80K, they now take home $10K to $40K. There's a much bigger difference, now, in what they can afford relative to each other.

But if you tax the same $40K away from a bunch of doctors, it matters less. They may have ranged from $200K to $300K, and now it's $160K to $260K. They're a bit less equal than before, but you hardly notice.

Measuring after the $40K tax is measuring "income above $40K," which is like measuring "FG% above replacement level of 40%" -- which is like measuring Win Shares.

So that's why bad teams in the NBA appear more unequal than the good teams -- because "Wins" are what's left of "Performance" after you levy a hefty replacement-level tax. Most of the players on the good teams stay middle-class after paying the tax -- but on the bad teams, while some stay middle class, more of the others drop into poverty.

It has nothing to do with the social effects of equality or inequality.  It's just an artifact of how the Gini Coefficient and basketball interact.


------

* Actually, there's a bit of wiggle room in the particular version of WS the author used, the version from basketball-reference.com. That version doesn't add up perfectly, but it promises to be close, certainly close enough that it doesn't make a difference to this argument. 

** That's if you give the bottom teams zero. If you give them a negative, the Gini actually winds up at infinity. (The overall total has to be zero relative to the average, and you can't divide by zero.)  



Tuesday, November 04, 2014

Corsi, shot quality, and the Toronto Maple Leafs, part VII

In previous posts, I've argued that when it comes to shots, NHL teams might differ in how they choose to trade quantity for quality. That might partly explain why the Toronto Maple Leafs, for the past few seasons now, have had ugly-looking shot stats, but with an above-average shooting percentage.

Skeptics argue that team shooting percentage (SH%) doesn't seem to have predictive value from season to season, which suggests it's luck rather than skill or strategy. But, at the same time, Corsi for teams seems to have a negative correlation to SH%, which is one piece of evidence that shot quality strategy might be a real issue.

Anyway, read the previous six posts for that argument. This is just an anecdote.

It comes from a piece by James Mirtle, the Maple Leafs beat writer for the Globe and Mail. Mirtle notes that the Toronto coaching staff has directed Morgan Rielly to increase his shot attempts:


[In the October 28 game vs. Buffalo,] Rielly rang up two assists – including a beauty cross-crease pass on James van Riemsdyk’s goal – and was all over the puck generally, generating nine shot attempts.

That propensity to shoot has been Rielly’s biggest shift from a year ago. The coaches want him putting more pucks on the net, and he has responded in dramatic fashion, with 2.8 shots a game compared to 1.3 in his rookie year despite similar ice time.

Even more impressively, Rielly leads all NHL defencemen in generating shot attempts, with 21.6 per 60 minutes at even strength, meaning he’s getting a look at the net roughly every 2.5 minutes he’s on the ice.

He’s winding up more frequently than not only every Leafs defenceman but every Leaf, including shot demon Phil Kessel, something that’s helping drive Toronto to respectable totals on the shot clock most nights.

Entering [the October 31] game against the injury-plagued Blue Jackets in Columbus, the Leafs have been outshot, but only by one: 281-280.

"I told myself this year that I would shoot more," Rielly said. 

Well, isn't that exactly the kind of thing Corsi skeptics should be looking for? It's evidence that coaching decisions can affect shot quantity and quality -- in other words, Corsi and SH%.

It's a small sample size -- the Leafs had played only nine games when Mirtle's piece came out -- but let's see what happens if we take Rielly's numbers at face value and make a few estimates.

Assume Rielly gets 20 minutes of ice time per game. If 80 percent of that is at even strength, it's 16 minutes at 5-on-5. Let's call it 15 to make the calculations easier.

Since he's generating 21.6 even-strength Corsis per 60 minutes, that's 5.4 even-strength Corsis per 15-minute game.*  

*I'm assuming that the "shot attempts" in the article refers to Corsi. If it refers to Fenwick, the effect is even larger than what I'm about to calculate, because the denominators are smaller (since Fenwick leaves out blocked shots).

Rielly's shots roughly doubled since last year, so let's assume his Corsis doubled too. That means his increase from last year must be about 2.7 Corsis per game. 

Last year, those extra 2.7 Rielly shot attempts would have been passes or stickhandles. Assuming half those attempts would have eventually resulted in shots by other players, the increase due to Rielly's shooting is down to 1.3. 

How significant is 1.3 Corsis per game?  In 2013-14, the Leafs were out-Corsied 4,342 to 3,259 at even strength, giving them a league-worst 42.9 Corsi percentage. If you add in 107 Corsis to the "for" side (1.3 times 82 games), it's now 4,342 to 3,366. That would bump Toronto to 43.7 percent. Now, only second worst.
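Putting the arithmetic from the last few paragraphs together (the 1.3 extra attempts per game is the estimate derived above):

```python
# Leafs 2013-14 even-strength Corsi totals, from the post
against, for_ = 4342, 3259

print(round(100 * for_ / (for_ + against), 1))   # 42.9 -- league worst

# Add ~1.3 extra Rielly-driven attempts per game over 82 games
extra = round(1.3 * 82)                          # 107
print(round(100 * (for_ + extra) / (for_ + extra + against), 1))  # 43.7
```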

It's not huge, but it's something that would indeed show up in the stats. And, according to Mirtle, it's something that's due to a deliberate coaching decision. 

How big would the effect be if the coaches decided everyone should shoot more, instead of just one defenseman whose minutes comprise only about six percent of total player ice-time?

-------

Also, you would think those extra shots would have to result in a reduction in shooting percentage, right?  Last year, when Rielly wasn't shooting as much, it was probably because he thought he could set up a better quality shot some other way. And, I would assume, Rielly's shots are taken farther from the net than average, since defensemen usually play the point. 

You could come up with a scenario where shot quality wouldn't drop ... maybe shots from the point lead to a lot of juicy rebounds, so long shots lead to a certain number of extra dangerous shots. Sure, that's possible. But I doubt that effect, or any other, would make up the quality difference completely. If there were *never* a tradeoff between quantity and quality, every team would be shooting all the time. So, there must be some level of "dangerousness" above which a point shot is a good idea, and below which a pass is better. For shot quality to stay the same when Rielly shoots more, all his new shots would have to come in situations where not only was the shot the best move, but the shot was SO dangerous from the point that it would even be higher quality than the best alternative from closer in.

That's unlikely to be happening if Rielly now leads the league in shot attempts by defensemen. There aren't that many ultra-super-dangerous shot opportunities, never mind ultra-super-dangerous shot opportunities that Rielly wouldn't have taken advantage of last year.

-----

As I write this, it's only eleven Leaf games into the season, which is a very small sample size.  But I checked anyway.  (Here's a YTD link that may be outdated if you're reading this later than today.)

In those 11 games, the Leafs have an above-average Corsi at 50.9% in 5-on-5 tied situations. But they've scored only 5 goals in 121 shots. That's a shooting percentage of 4.13%, dead last in the league.  

There's not enough data for that to really be meaningful, but it's interesting nonetheless.




(There are seven parts. Part VI was previous. This is Part VII.)
