Wednesday, January 07, 2015

Predicting team SH% from player talent

For NHL teams, shooting percentage (SH%) doesn't seem to carry over all that well from year to year. Here, repeated from last post, are the respective year-to-year correlations: 

-0.19  2014-15 vs. 2013-14
+0.30  2013-14 vs. 2012-13
+0.33  2012-13 vs. 2011-12
+0.03  2011-12 vs. 2010-11
-0.10  2010-11 vs. 2009-10
-0.27  2009-10 vs. 2008-09
+0.04  2008-09 vs. 2007-08

(All data is for 5-on-5 tied situations. Huge thanks to puckalytics.com for making the raw data available on their website.)

They're small. Are they real? It's hard to know, because of the small sample sizes. With only 30 teams, even if SH% were totally random, you'd still get coefficients of this size -- the SD of a random 30-team correlation is 0.19.  
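That 0.19 is easy to verify by simulation. Here's a minimal sketch (my own illustration, not from the post), correlating two sets of 30 pure-noise values many times and taking the SD of the resulting coefficients:

```python
import numpy as np

# Correlate two unrelated sets of 30 "team" values, many times over
rng = np.random.default_rng(0)
n_teams, n_sims = 30, 20_000
rs = [np.corrcoef(rng.standard_normal(n_teams),
                  rng.standard_normal(n_teams))[0, 1]
      for _ in range(n_sims)]

print(np.std(rs))                # comes out around 0.19
print(1 / np.sqrt(n_teams - 1))  # the textbook approximation, also about 0.19
```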

That means there's a lot of noise, too much noise in which to discern a small signal. To reduce that noise, I thought I'd look at the individual players on the teams.  (UPDATE: Rob Vollman did this too, see note at bottom of post.)

Start with last season, 2013-14. I found every player who had at least 20 career shots in the other six seasons in the study. Then, I projected his 2013-14 "X-axis" shooting percentage as his actual SH% in those other seasons.  

For every team, I calculated its "X-axis" shooting percentage as the average of the individual player estimates.  

(Notes: I weighted the players by actual shots, except that if a player had more shots in 2013-14 than the other years, I used the "other years" lower shot total instead of the current one. Also, the puckalytics data didn't post splits for players who spent a year with multiple teams -- it listed them only with their last team. To deal with that, when I calculated "actual" for a team, I calculated it for the Puckalytics set of players.  So the team "actual" numbers I used didn't exactly match the official ones.)
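In code, that weighting scheme would look roughly like this. The column names (`team`, `shots_2013_14`, `shots_other`, `goals_other`) are hypothetical stand-ins for the Puckalytics data, so treat this as a sketch of the method rather than the actual script:

```python
import pandas as pd

# Hypothetical layout: one row per player on a 2013-14 roster, with his
# 2013-14 shots and his shot/goal totals from the other six seasons.
df = pd.read_csv("players.csv")

df = df[df["shots_other"] >= 20]                       # 20-career-shot minimum
df["proj_sh"] = df["goals_other"] / df["shots_other"]  # player "X-axis" SH%

# Weight by 2013-14 shots, capped at the lower "other years" shot total
df["weight"] = df[["shots_2013_14", "shots_other"]].min(axis=1)

expected = df.groupby("team").apply(
    lambda g: (g["proj_sh"] * g["weight"]).sum() / g["weight"].sum())
print(expected)   # one expected ("X-axis") SH% per team
```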

If shooting percentage is truly (or mostly) random, the correlation between team expected and team actual should be low.  

It wasn't that low. It was +0.38.  

I don't want to get too excited about that +0.38, because most other years didn't show that strong an effect. Here are the correlations for those other years:

+0.38  2013-14
+0.45  2012-13
+0.13  2011-12
-0.07  2010-11
-0.34  2009-10
-0.01  2008-09
+0.16  2007-08

They're very similar to the season-by-season correlations at the top of the post ... which, I guess, is to be expected, because they're roughly measuring the same thing.  

If we combine all the years into one dataset, so we have 210 points instead of 30, we get 

--------------
+0.13  7 years

That could easily be random luck.  A correlation of +0.13 would be on the edge of statistical significance if the 210 datapoints were independent. But they're not, since every player-year appears up to six different times as part of the "X-axis" variable.

It's "hockey significant," though. The coefficient is +0.30. So, for instance, at the beginning of 2013-14, when the Leafs' players historically had outshot the Panthers' players by 2.96 percentage points ... you'd forecast the actual difference to be 0.89.  (The actual difference came out to be 4.23 points, but never mind.)

-----

The most recent three seasons appear to have higher correlations than the previous four. Again at the risk of cherry-picking ... what happens if we just consider those three?

+0.38  2013-14
+0.45  2012-13
+0.13  2011-12
--------------
+0.34  3 years

The +0.34 looks modest, but the coefficient is quite high -- 0.60. That means you have to regress out-of-sample performance only 40% back to the mean.  

Is it OK to use these three years instead of all seven? Not if the difference is just luck; only if there's something that actually makes the 2011-12 through 2013-14 seasons more reliable.  

For instance ... it could be that the older seasons do worse because of selective sampling. If players improve slowly over their careers, then drop off a cliff ... the older seasons will be more likely comparing the player to his post-cliff performance. I have no idea if that's a relevant explanation or not, but that's the kind of argument you'd need to help justify looking at only the three seasons.

Well, at least we can check statistical significance. I created a simulation of seven 30-team seasons, where each identical team had an 8 percent chance of scoring on each of 600 identical shots. Then, I ran a correlation for only three of those seven seasons, like here.

The SD of that correlation coefficient was 0.12. So, the +0.34 in the real-life data was almost three SDs above random.
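Here's roughly how that simulation might look. This is my reading of the setup (identical teams, 8 percent on 600 shots, three target seasons pooled against the average of each team's other six seasons), so the details may differ from what was actually run:

```python
import numpy as np

rng = np.random.default_rng(1)
n_teams, n_seasons, shots, p = 30, 7, 600, 0.08
rs = []

for _ in range(5_000):
    # every team identical: goals ~ Binomial(600, 0.08) in each of 7 seasons
    sh = rng.binomial(shots, p, size=(n_teams, n_seasons)) / shots

    xs, ys = [], []
    for season in range(3):   # pool three target seasons -> 90 team-points
        xs.append(np.delete(sh, season, axis=1).mean(axis=1))  # other-season SH%
        ys.append(sh[:, season])                               # that season's SH%
    rs.append(np.corrcoef(np.concatenate(xs), np.concatenate(ys))[0, 1])

print(np.std(rs))   # SD of the pooled correlation; the post quotes about 0.12
```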

Still: we did cherry-pick our three seasons, so the raw probability is very misleading.  If it had been 8 SD or something, we would have been pretty sure that we found a real relationship, even after taking the cherry-pick into account. At 3 SD ... not so sure.

-----

Well, suppose we split the difference ... but on the conservative side. The 7-year coefficient is 0.30. The 3-year coefficient is 0.60.  Let's try a coefficient of 0.40, which is only 1/3 of the way between 0.30 and 0.60.

If we do that, we get that the predictive ability of SH% is: one extra goal per X shots in the six surrounding seasons forecasts 0.4 extra goals per X shots this season.

For an average team, 0.4 extra goals is around 5 extra shots, or 9 extra Corsis.
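To spell out the conversion (the 8 percent SH% and the Corsi-to-shot ratio are my round numbers, chosen to match the 5-shot and 9-Corsi figures above):

```python
extra_goals = 0.4
avg_sh_pct = 0.08        # assumed league-average 5-on-5 shooting percentage
corsi_per_shot = 1.8     # assumed Corsi events per shot on goal

extra_shots = extra_goals / avg_sh_pct       # about 5 shots
extra_corsi = extra_shots * corsi_per_shot   # about 9 Corsi events
print(round(extra_shots, 1), round(extra_corsi, 1))
```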

In his study last month, Tango found a goal was only 4 extra Corsis. Why the difference? Because our studies aren't measuring the same thing. We were asking the same general question -- "if you combine goals and shots, does that give you a better prediction than shots alone?" -- but doing so by asking different specific questions.  

Tango asked how well half a team's games predict the other half. I was asking how well you can predict a team's year from its players' six surrounding years. It's possible that the "half-year" method has more luck in it ... or that other differences factor in, also.

My gut says that the answers we found are still fairly consistent.

---



UPDATE: Rob Vollman, of "Hockey Abstract" fame, did a similar study last summer (which I read, but had forgotten about).  Slightly different methodology, I think, but the results seem consistent.  Sorry, Rob!




Thursday, December 18, 2014

True talent levels for NHL team shooting percentage, part II

(Part I is here)

Team shooting percentage is considered an unreliable indicator of talent, because its season-to-season correlation is low. 

Here are those correlations from the past few seasons, for 5-on-5 tied situations.

-0.19  2014-15 vs. 2013-14
+0.30  2013-14 vs. 2012-13
+0.33  2012-13 vs. 2011-12
+0.03  2011-12 vs. 2010-11
-0.10  2010-11 vs. 2009-10
-0.27  2009-10 vs. 2008-09
+0.04  2008-09 vs. 2007-08

The simple average of all those numbers is +0.02, which, of course, is almost indistinguishable from zero. Even if you remove the first pair -- the 2014-15 stats are based on a small, season-to-date sample size -- it's only an average of +0.055.

(A better way to average them might be to square them (keeping the sign), average those, and take the signed square root. That gives +0.11 and +0.14, respectively. But I'll just use simple averages in this post, even though they're probably not right, because I'm not looking for exact answers.)
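In code, that signed averaging looks like this, using the seven correlations listed above:

```python
import numpy as np

rs = [-0.19, 0.30, 0.33, 0.03, -0.10, -0.27, 0.04]

def signed_rms(values):
    # square each correlation keeping its sign, average, take the signed root
    m = np.mean([np.sign(r) * r ** 2 for r in values])
    return np.sign(m) * np.sqrt(abs(m))

print(round(signed_rms(rs), 2))       # +0.11
print(round(signed_rms(rs[1:]), 2))   # +0.14, dropping the 2014-15 pair
```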

That does indeed suggest that SH% isn't that reliable -- after all, there were more negative seasons than strong positive ones.

But: what if we expand our sample size, by looking at the correlation between pairs that are TWO seasons apart? Different story, now:

+0.35   2014-15 vs. 2012-13
+0.12   2013-14 vs. 2011-12
+0.27   2012-13 vs. 2010-11
+0.12   2011-12 vs. 2009-10
+0.41   2010-11 vs. 2008-09
-0.03   2009-10 vs. 2007-08

These six seasons average +0.21, which ain't bad.

------

Part of the reason that the two-year correlations are high might be that team talent didn't change all that much in the seasons of the study. I checked the correlation between overall team talent, as measured by hockey-reference.com's "SRS" rating. For 2008-09 vs. 2013-14, the correlation was +0.50.

And that's for FIVE seasons apart. So far, we've only looked at two seasons apart.

I chose 2008-09 because you'll notice the correlations that include 2007-08 are nothing special. That, I think, is because team talent changed significantly between 07-08 and 08-09. If I rerun the SRS correlation for 2007-08 vs. 2013-14 -- that is, going back only one additional year -- it drops from +0.50 to only +0.25.

On that basis, I'm arbitrarily deciding to drop 2007-08 from the rest of this post, since the SH% discussion is based on an assumption that team talent stays roughly level.

------

But even if team talent changed little since 2008-09, it still changed *some*. So, wouldn't you still expect the two-year correlations to be lower than the one-year correlations? There's still twice the change in talent, albeit twice a *small* change.

You can look at it a different way -- if A isn't strongly related to B, and B isn't strongly related to C, then how can A be strongly related to C?

Well, I think it's the other way around. It's not just that A *can* be strongly related to C. It's that, if there's really a signal within the noise, you should *expect* A to be strongly related to C.

Consider 2009-10. In that year, every team had a certain SH% talent. Because of randomness, the set of 30 observed team SH% numbers varied from the true talent. The same would be true, of course, for the two surrounding seasons, 2008-09, and 2010-11.

But both those surrounding seasons had a substantial negative correlation with the middle season. That suggests that for each of those surrounding seasons, their luck varied from the middle season in the "opposite" way. Otherwise, the correlation wouldn't be negative.

For instance, maybe in the middle season, the Original Six teams were lucky, and the other 24 teams were unlucky. The two negative correlations with the surrounding seasons suggest that in each of those seasons, maybe it was the other way around, that the Original Six were unlucky, and the rest lucky.

Since the surrounding seasons both had opposite luck to the middle season, they're likely to have had similar luck to each other. 

In this case, they were. The A-to-B correlation is -0.10. The B-to-C correlation is -0.27. But the A-to-C correlation is +0.41. Positive, and quite large.

-0.10   2010-11 (A) vs. 2009-10 (B)
-0.27   2009-10 (B) vs. 2008-09 (C)
-----------------------------------
+0.41   2010-11 (A) vs. 2008-09 (C)


------

This should be true even if SH% is all random -- that is, even if all teams have the same talent. The logic still holds: if A correlates to B the same way C correlates to B ... that means A and C are likely to be somewhat similar. 

I ran a series of three-season simulations, where all 30 teams were equal in talent. When both A and C had a similar, significant correlation to B (same sign, both above +/- 0.20), their correlation with each other averaged +0.06. 
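Here's a sketch of that simulation, borrowing a 600-shot, 8-percent-per-shot setup (my assumption; the post doesn't give the parameters for this particular run):

```python
import numpy as np

rng = np.random.default_rng(2)
n_teams, shots, p = 30, 600, 0.08    # all teams identical in talent
ac_when_conditioned = []

for _ in range(50_000):
    sh = rng.binomial(shots, p, size=(n_teams, 3)) / shots   # seasons A, B, C
    r_ab = np.corrcoef(sh[:, 0], sh[:, 1])[0, 1]
    r_bc = np.corrcoef(sh[:, 1], sh[:, 2])[0, 1]
    # keep only trials where A and C relate to B the same way, fairly strongly
    if np.sign(r_ab) == np.sign(r_bc) and min(abs(r_ab), abs(r_bc)) > 0.20:
        ac_when_conditioned.append(np.corrcoef(sh[:, 0], sh[:, 2])[0, 1])

print(np.mean(ac_when_conditioned))   # the post reports an average of about +0.06
```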

In our case, we didn't get +0.06. We got something much bigger: +0.41. That's because the underlying real-life talent correlation isn't actually zero, as it was in the simulation. A couple of studies suggested it was around +0.15. 

So, the A-B was actually -0.25 "correlation points", relative to the trend: -0.10 relative to zero, plus -0.15 below typical. (I'm sure that isn't the way to do it statistically -- it's not perfectly additive like that -- but I'm just illustrating the point.)  Similarly, the B-C was actually -0.42 points.

Those are much larger effects when you correct them that way, so they have a stronger result. When I limited the simulation sample so both A-B and B-C had to be bigger than +/- 0.25, the average A-C correlation almost tripled, to +0.16. 

Add that +0.16 to the underlying +0.15, and you get +0.31. Still not the +0.41 from real life, but close enough, considering the assumptions I made and shortcuts I took.

------

Since we have six seasons with stable team talent, we don't have to stop at two-season gaps ... we can go all the way to five-season gaps, and pair every season with every other season. Here are the results:


         14-15  13-14  12-13  11-12  10-11  09-10  08-09  
--------------------------------------------------------
14-15           -0.19  +0.35  +0.20  +0.15  +0.46  -0.07  
13-14    -0.19         +0.30  +0.12  +0.27  -0.07  +0.42  
12-13    +0.35  +0.30         +0.33  +0.27  +0.24  +0.26  
11-12    +0.20  +0.12  +0.33         +0.03  +0.12  -0.08  
10-11    +0.15  +0.27  +0.27  +0.03         -0.10  +0.41  
09-10    +0.46  -0.07  +0.24  +0.12  -0.10         -0.27  
08-09    -0.07  +0.42  +0.26  -0.08  +0.41  -0.27


The average of all these numbers is ... +0.15, which is exactly what the other studies averaged out to. That's coincidence ... they used a different set of pairs, they didn't limit the sample to tie scores, and 14-15 hadn't existed yet. (Besides, I think if you did the math, you'd find you wanted the root of the average r-squared, which would be significantly higher than  +0.15.)

Going back to the A-B-C thing ... you'll find it still holds. If you look for cases where A-B and B-C are both significantly below the 0.15 average, A-C will be high. (Look in the same row or column for two low numbers.)  

For instance, in the 14-15 row, 13-14 and 08-09 are both negative. Look for the intersection of 13-14 and 08-09. As predicted, the correlation there is very high -- +0.42. 

By similar logic, if you find cases where A-B and B-C go in different directions -- one much higher than 0.15, the other much lower -- then, A-C should be low.

For instance, in the second row, 09-10 is -0.07, but 08-09 is +0.42. The prediction is that the intersection of 09-10 and 08-09 should be low -- and it is, -0.27.

------

Look at 2012-13. It has a strong positive correlation with every other season in the sample. Because of that, I originally guessed that 2012-13 is the most "normal" of all the seasons, the one where teams most played to their overall talent. In other words, I guessed that 2012-13 was the one with the least luck.

But, when I calculated the SDs of the 30 teams for each season ... 2012-13 was the *highest*, not the lowest. By far! And that's even adjusting for the short season. In fact, every full season had a team SD of 1.00 percentage points or lower -- but 2012-13 came in at the adjusted equivalent of 1.23.

What's going on?

Well, I think it's this: in 2012-13, instead of luck mixing up the differences in team talent, it exaggerated them. In other words: that year, the good teams got lucky, and the bad teams got unlucky. In 2012-13, the millionaires won most of the lotteries.

That kept the *order* of the teams the same -- which means that 2012-13 wound up the most exaggeratedly representative of teams' true talent.

Whether that's right or not, it seems that two things should be true:

-- With all the high correlations, 2012-13 should be a good indicator of actual talent over the seven-year span; and

-- Since we found that talent was stable, we should get good results if we add up all six years for each team, as if it was one season with six times as many games.* 

*Actually, about five times, since there are two short seasons in the sample -- 2012-13, and 2014-15, which is less than half over as I write this.

Well, I checked, and ... both guesses were correct.

I checked the correlation between 2012-13 vs. the sum of the other five seasons (not including the current 2014-15). It was roughly +0.54. That's really big. But, there's actually no value in that ... it was cherry-picked in retrospect. Still, it's just something I found interesting, that for a statistic that is said to have so little signal, a shortened season can still have a +0.54 correlation with the average of five other years!

As for the six-season averages ... those DO have value. Last post, when we tried to get an estimate of the SD of team talent in SH% ... we got imaginary numbers! Now, we can get a better answer. Here's the Palmer/Tango method for the 30 teams' six-year totals:

SD(observed) = 0.543 percentage points
SD(luck)     = 0.463 percentage points
--------------------------------------
SD(talent)   = 0.284 percentage points

That 0.28 percentage points has to be an underestimate. As explained in the previous post, the "all shots are the same" binomial luck estimate is necessarily too high. If we drop it by 9 percent, as we did earlier, we get

SD(observed) = 0.543 percentage points
SD(luck)     = 0.421 percentage points
--------------------------------------
SD(talent)   = 0.343 percentage points
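The arithmetic behind both tables is the usual variance subtraction (luck and talent assumed independent, so their variances add):

```python
import math

def sd_talent(sd_observed, sd_luck):
    # var(observed) = var(talent) + var(luck)  =>  solve for SD(talent)
    return math.sqrt(sd_observed ** 2 - sd_luck ** 2)

print(round(sd_talent(0.543, 0.463), 3))   # 0.284, with the binomial luck estimate
print(round(sd_talent(0.543, 0.421), 3))   # 0.343, with luck dropped by 9 percent
```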

We also need to bump it for the fact that this is the talent distribution for a six-season span -- which is necessarily tighter than a one-season distribution (since teams tend to regress to the mean over time, even slightly). But I don't know how much to bump, so I'll just leave it where it is.

That 0.34 points is almost exactly what we got last post. Which makes sense -- all we did was multiply our sample size by five. 

The real difference, though, is the credibility of the estimate. Last post, it was completely dependent on our guess that the binomial SD(luck) was 9 percent too high. The difference between guessing and not guessing was huge -- 0.34 points, versus zero points.  In effect, without guessing, we couldn't prove there was any talent at all!

But now, we do have evidence of talent ... and guessing adds only around 0.06 points. If you refuse to allow a guess of how shots vary in quality ... well, you still have evidence, without guessing at all, that teams must vary in talent with an SD of at least 0.284 percentage points.





Tuesday, December 09, 2014

Corsi vs. Tango

Tom Tango is questioning the conventional sabermetric wisdom on Corsi. In a few recent posts, he presents evidence that Corsi can be improved upon as a predictor of future NHL team performance. Specifically: goals are important, too.

That has always seemed reasonable to me. In fact, it seems so reasonable that I wonder why it's disputed. But it is. 

Goals is just shots multiplied by shooting percentage (SH%). The consensus among the hockey research community is that their studies show SH% is not a skill that carries over from year to year. And, therefore, goals can't matter once you have shots.

I've been disputing that for a while now -- at least seven posts worth (here's the first). But I've been doing it from argument. Tango jumps right to the equations. He split seasons in half randomly, and ran a regression to try to predict one half from the other half. Goals proved to be very significant. In fact, when you try to predict, you have to weight goals *four times as heavily* as Corsis. (Five times as heavily as unsuccessful Corsis.)
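I haven't reproduced Tango's regression, but its general shape would be something like the sketch below. The file and column names (`team_games.csv`, `corsi_diff`, `goal_diff`) are hypothetical, and I'm using goal differential as the thing being predicted; Tango's actual setup may differ:

```python
import pandas as pd
import statsmodels.api as sm

games = pd.read_csv("team_games.csv")   # hypothetical: one row per team-game

# Randomly assign roughly half the games to each split
half1_idx = games.sample(frac=0.5, random_state=0).index
half1 = games.loc[half1_idx].groupby("team")[["corsi_diff", "goal_diff"]].sum()
half2 = games.drop(half1_idx).groupby("team")["goal_diff"].sum()

# Predict the second half's goal differential from the first half's Corsi and goals
model = sm.OLS(half2, sm.add_constant(half1)).fit()
print(model.params)   # Tango found the goals weight about four times the Corsi weight
```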

In a tongue-in-cheek jab at statistics named after their inventors, he called that new statistic the "Tango." 

Despite Tango's regression results, the hockey analysts who commented still disagree. I'm still surprised by that ... the hockey sabermetrics community are pretty smart guys, very good at what they do, and a lot of them have been hired by NHL teams. I've had times when I've wondered if I'm missing something ... I mean, when it's 1 dabbler against 20 analysts who do this every day, it's probably the 20 who are right. Well, now it's two against the world instead of one ... and the second is Tango, which makes me a little more confident. 

Also ... Tango jumps right to the actual question, and proves goals significantly improve the prediction. That's hard to argue with; at least, it's harder to argue with than what I'm doing, which is attacking the assumption that shooting percentage is all random. 

You can see one response here, and Tango's reply here.  

Tango got his data from war-on-ice.com, who agreed to make it available to all (thank you!!!). I was planning to do some work with the data myself, but ... I guess Tango and I think about things from different angles, because, the more I thought about it, the more I kept coming back to my "usual" arguments rather than direct ones. So, there'll be another post coming soon. I'll play with the data when my thoughts wind up somewhere that I need to look at it.

For this post, a few of my observations from Tango's posts and the discussion that followed.

-----

In one of his posts, Tango wrote,


"One of the first things that (many) people did was to run a correlation of Corsi v Tango, come up with an r=.98 or some number close to 1 and then conclude: “see, it adds almost nothing”. If only that were true. "

Tango is absolutely right (you should read the whole thing). It's just another case of jumping to conclusions from a high or low correlation coefficient.

Sabermetrics is pretty good at figuring out good and bad. It has to be -- I mean, even fans and sportswriters are pretty good at it, and the whole point of sabermetrics is to do better. We're already in "high correlation" territory, able to separate the good teams and players from the bad teams and players pretty easily. 

Find a 10-year-old kid who's a serious sports fan -- any sport -- and get him to rank the players from best to worst -- whether by formula, or by raw statistics. Then, find the best sabermetric statistic you can, and rank the players that way.

I bet the correlation would be over 0.9. Just by gut.

We're already well into the .9s, when it comes to understanding hockey. Any improvements are going to be marginal, at least if you measure them by correlation. And so, it follows that *of course* Tango and Corsi are going to correlate highly. 

Also, and as Tom again points out, if Corsi already has a high correlation with something, at first glance, Tango can appear to increase it only slightly. If you start with, say, 0.92, and Tango improves it to 0.93 ... well, that doesn't look like much, intuitively. But it IS much. If you look at it another way -- still intuitively -- it was 8% uncorrelated before, and now it's only 7% uncorrelated. You've improved your predictive ability by 12%!

The point is, you have to think about what the numbers actually mean, instead of just having it click ".93 isn't much bigger than .92, so who cares?"

Tom illustrates the point by noting that, even though Tango and Corsi appear to be highly correlated to each other, Tango improves a related correlation from .44 to .50. There must be some significant differences there.

-----

There's a more important argument, though, than to not underestimate how much better .93 is than .92. And that argument is: *it's not about the correlation*. 

Yes, it's nice to have a higher and higher r-squared, and be able to reduce your error more and more. But it's not really error reduction we're after. It's *knowledge*. 

It's quite possible that a model that gives you a low correlation actually gives you MORE of an understanding of hockey than a model that gives you a high correlation. Here's an easy example: the correlation between points in the standings, and whether or not you make the playoffs, is very high, close to 1.0. The correlation between your Corsi and whether or not you make the playoffs is lower, maybe 0.7 or something (depending how you do it -- which is another reason not to rely on the correlation alone). 

Which tells you more about hockey that you didn't already know? Obviously, it's the Corsi one. Everyone already knows that points determine whether you qualify for the playoffs. When you find out that shots seem to be important, that's new knowledge -- the knowledge that outshooting your opponents means something. (Of course, *what* it means is something you have to figure out yourself -- correlation doesn't tell you what type of relationship you've found.)

And that's what's going on here. Corsi has a high correlation with future winning, but Corsi *and goals* has an even higher correlation (to a significant extent). What does that tell us? 

Goals matter, not just shots. 

That's an important finding!  You can't dismiss it just because the predictions don't improve that much. If you do, you're missing the point completely. 

You wouldn't do that in other aspects of life, would you? Those faulty airbags in the news recently, the ones that kill people with shrapnel ... those constitute a small, small fraction of collision deaths. If you looked only at predicting fatalities, knowing about those airbags is a rounding error. 

But the point is not just to predict fatality rates!  Well, not for most of us. If you're an insurance company, then, sure, maybe it doesn't make that much difference to you, a couple of cents on each policy you write. But that doesn't mean the information isn't important. It's just important for different questions. Like, how can we reduce fatalities? We can reduce fatalities by replacing the defective air bags!

Also, the information means it is false to state that faulty airbags don't matter. You can still argue that they don't matter MUCH, relative to the overall total of collision deaths; that might be true. But for that, you can't argue from correlation coefficients. You have to argue from ... well, you can use the regression equation. You can say, "only 1 person in a million dies from the airbag, but 1000 in a million die from other causes."

In this case, Tango found that a goal matters four times as much as a shot. If, roughly speaking, there are 12 shots for every goal, then every 12 shots, you get 12 "points" of predictive value from the shots, and 4 "points" of predictive value from the goals. 

The ratio isn't 1000:1 like for the airbags. It's 3:1. How can you dismiss that? Not only is it important knowledge about hockey, but the magnitude of the effect is really going to affect your predictions.

-----

Why does the conventional wisdom dispute the relevance of goals? Because the consensus is that shooting percentage is random -- just like clutch hitting in baseball is random.

Why do they think that? Because the year-to-year team correlation for shooting percentage is very low.

I think it's the "low number means no effect" fallacy. Here are two studies I Googled. This one found an r-squared of .015, and here's another at .03. 

If you take the square root of those r-squareds, to give you correlations, you get 0.12 and 0.17, respectively.

Those aren't that small. Especially if you take into account how much randomness there is in shooting percentage. A signal of 0.17 in all that noise might be pretty significant.

-----

It's well-known that both Corsi and shooting percentage change with the score of the game. When you're up by one goal, your SH% goes up by almost a whole percentage point -- and your Corsi goes down by four points. When you're up two or more, the differences are even bigger. 

That's probably because when teams are ahead, they play more defensively. Their opponents, who are trailing, play more aggressively -- they press in the offensive zone more, and get more shots. 

So, teams in the lead see their shots go down. But their expected SH% goes up, because they get a lot of good scoring chances when the opposition takes more chances -- more breakaways, odd-man rushes, and so on.

It occurred to me that these score effects could, in theory, explain Tango's finding. 

Suppose Team A has 30 Corsis and 5 goals in a game, and Team B has 30 Corsis and no goals. 

Even if shooting percentage is indeed completely random, team A is probably better than team B. Why? Because, with 5 goals, team A probably had the lead for most of the game. If it managed 30 Corsis despite leading, it must be a much better Corsi team, to overcome the "handicap" of playing from ahead, when teams normally shoot less. So, when it's behind, it'll probably kick the other team's butt.

I don't think Tango's finding is *all* score effects. And, even if it were, all that would mean is that if you didn't explicitly control for score, "Tango" would be a more accurate statistic than Corsi. And most of the hockey studies I've seen *don't* control for score.

----

Here's one empirical result that might help -- or maybe won't help at all, I'm actually not sure. (Tell me what you think.)

My hypothesis has been that some teams have better shooting percentages, along with lower Corsis, because they choose to set up for higher quality shots. Instead of taking a 5% shot from the point, they take a 50/50 chance on setting up a 10% shot from closer in. 

As I have written, I think the evidence shows some support for that hypothesis. In 5-on-5 tied situations, there's a negative correlation between Corsi rate and SH%. In 2013-14 (raw stats here), it was -0.16. In the shortened 2012-13 season, it was -0.04. In 2011-12, -0.14. 

Translating the -0.14 into hockey: for every additional goal due to shot quality, teams lowered their Corsi by 2.1 shots.

That's a tradeoff of around 2 Corsis per goal. Tango found 4 Corsis per goal. Does that mean that two of Tango's four Corsis come from shot quality, and the other two come from score effects?

Not sure. There's probably too much randomness in there anyway to be confident, and I'm not completely sure that they're measuring the same thing. But, there it is, and I'll think about it some more.

-----

UPDATE, 30 minutes later: Colby Cosh pointed out, on Twitter, that Tango's regression used Corsi and goal *differential* -- offense minus defense.  That means the "goals against" factor is partially a function of save percentage, which partially reflects goalie talent, which, of course, carries over into the future. So, goalie talent absolutely has to be part of the explanation of the "goals" term.


So now we have two factors: score effects and goalie effects. Could that fully explain the predictive value of goals, without resort to shot quality?  I'll have to think about the actual numbers, whether the magnitudes could be high enough to cover the full "4 shots" coefficient.









Saturday, August 30, 2014

Is MLB team payroll less important than it used to be?

As of August 26, about 130 games into the 2014 MLB season, the correlation between team payroll and wins is very low. So low, in fact, that *alphabetical order* predicts the standings better than salaries!

Credit that discovery to Brian MacPherson, writing for the Providence Journal. MacPherson calculated the payroll correlation to be +0.20, and alphabetical correlation to be +0.24. 

When I tried it, I got .2277 vs. .2346 -- closer, but alphabetical still wins. (I might be using slightly different payroll numbers, I used winning percentage instead of raw win totals, and I may have done mine a day or two later.)

The alphabetical regression is cute, but it's the payroll one that raises the important questions. Why is it so low, at .20 or .23? When Berri/Schmidt/Brook did it in "The Wages of Wins," they got around .40.

It turns out that the season correlation has trended over time, and MacPherson draws a nice graph of that, for 2000-2014. (I'll steal it for this post, but link it to the original article.)  Payroll became more important in the middle of last decade, but then dropped quickly, so that 2012, 2013, and 2014 are the lowest of all 15 years in the chart:






What's going on? Why has the correlation dropped so much?

MacPherson argues it's because it's getting harder and harder to buy wins. There is an "inability of rich teams to leverage their financial resources."  The end of the steroids era  means there are fewer productive free-agent players in their 30s for teams to buy. And the pool of available signings is reduced even further, because smaller-market teams can better afford to hang on to their young stars.


"Having money to spend remains better than not having money to spend. That might not ever change. Unfortunately for the Red Sox and their brethren, however, it matters far less than it once did."


------

My thoughts:

1.  The observed 2014 correlation is artificially low, because it's taken after only about 130 games (late-August), instead of a full season. 

Between now and October, you'd expect the SD due to luck to drop by about 12 percent. So, instead of 2 parts salary to 8 parts luck (for the current correlation of .20), you'll have 2 parts salary to 7.2 parts luck. That will raise the correlation to about .22.

Well, maybe not quite. The non-salary part isn't all binomial luck; there's some other things there too, like the distribution of over- and underpriced talent. But I think .22 is still a reasonable projection.

It's a small thing, but it does explain a tenth of the discrepancy.

------

2.  The lower correlation doesn't necessarily mean that it's harder to buy wins. As MacPherson notes, it could just mean that teams are choosing not to do so. More specifically, that teams are closer in spending than they used to be, so payroll doesn't explain wins as well as it used to.

Here's an analogy I used before: in Rotisserie League Baseball, there is a $260 salary cap. If everyone spends between $255 and $260, the correlation between salary and performance will be almost zero -- the $5 isn't enough of a signal amidst the noise. But: if you let half the teams spend $520 instead, you're going to get a much higher correlation, because the high-spending half will do much, much better than the lower-spending half.

That could explain what's happening here.

In 2006, the SD of payroll was around 42% of the mean (an SD of $32MM on a mean of $78MM). In 2014, it was only 38% ($43MM on $115MM). It doesn't look that much different, but ... teams this year are 10 percent closer to each other than they were, and that has to be contributing to the difference.

(This is the first time I've done something where the "coefficient of variation" -- the SD divided by the mean -- has actually helped me, here as a way to correct SDs for inflation.

Also, this is a rare (for me) case where the correlation (or r-squared) is actually more relevant than the coefficient of the regression equation. That's because we're debating how much salary explains what we've actually observed -- instead of the usual question of how much salary leads to how many more wins.)


------

3.  While doing these calculations, I noticed something unusual. The 2014 standings are much tighter than normal. 

So far in 2014, the SD of team winning percentage is .058 (9.4 games per 162). In 2006, the SD was larger, at .075 (12.2 games per 162). That might be a bit high ... I think .068 (11 games per 162) is the recent historical average.

But even 9.4 compared to 11 is a big difference.  It's even more significant when you remember that the 2014 figure is based on only 130 games. (I'd bet the historical average for late-August would be between 12 and 13 games, not 11.)

What's going on? 

Well, it could be random luck. But, it could be real. It could be that team talent "inequality" has narrowed -- either because of the narrowing of team spending (which we noted), or because all the extra spending isn't buying much talent these days.

I think the surrounding evidence shows that it's more likely to be random luck. 

Last year, the SD of team winning percentage was at normal levels -- .074 (12.04 games per 162). It's virtually impossible for the true payroll/wins relationship to have changed so drastically in the off-season, considering the vast majority of payrolls and players stay the same from year to year.

Also, it turns out that even though the correlation between 2014 payroll and 2014 wins is low, the correlation between 2014 payroll and 2013 wins is higher. That is: this year's payroll predicts last year's wins (0.37) better than it predicts this year's wins (0.23)! 

Are there other explanations than 2014 being randomly weird? 

Maybe the low-payroll teams have young players who improved since last year, and the high-payroll teams have old players who declined. You could test that: you could check if payroll correlates better to last year's wins than this year's for all seasons, not just 2013-2014.

If that happened to be true, though, it would partially contradict MacPherson's hypothesis, wouldn't it? It would say that the money teams spend on contracts *do* buy wins as strongly as before, but those wins are front-loaded relative to payroll.

We can see how weird 2014 really is if we back out the luck variance to get an estimate of the talent variance.

After the first 130 games of 2014, the observed SD of winning percentage is .058. After 130 games, the theoretical SD of winning percentage due to luck is .044.

Since luck is generally independent of talent, we know

SD(observed)^2 - SD(luck)^2 = SD(talent)^2 

Plugging in the numbers: .058 squared minus .044 squared equals .038 squared. That gives us an estimate of SD(talent) of .038, or 6.12 games per 162.

I did the same calculation for 2013, and got 10.2.

2013: Talent SD of 10.2 games
2014: Talent SD of  6.1 games
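Here's that back-out in code form, using the same binomial-luck assumption as above:

```python
import math

games = 130
sd_observed = 0.058                 # SD of team winning percentage, 2014 to date
sd_luck = math.sqrt(0.25 / games)   # binomial luck SD for a .500 team, about .044

sd_talent = math.sqrt(sd_observed ** 2 - sd_luck ** 2)
print(round(sd_talent, 3))          # about .038
print(round(sd_talent * 162, 1))    # about 6.1 games per 162
```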

That kind of drop in one off-season is pretty much impossible, isn't it? 

If that huge a compression were real, it would have to be due to huge changes in the off-season -- specifically, a lot of good players retiring, or moving from good teams to bad teams.

But, the team correlation between 2013 wins and 2014 wins is +0.37. That's a bit lower than average, but not out of line (again, especially taking the short season into account). 

It would be very, very coincidental if the good teams got that much worse while the bad teams got that much better, but the *order* of the standings didn't change any more than normal.

So, I think a reasonable conclusion is that it's just random noise that compressed the standings. This year, for no reason, the good teams have tended to be unlucky while the bad teams have tended to be lucky. And that narrowed the distance between the high-payroll teams and the low-payroll teams, which is part of the reason the payroll/wins correlation is so low. 

------

4. We can just look at the randomness directly, since the regression software gives us confidence intervals. 

Actually, it only gives an interval for the coefficient, but that's good enough. I added 2 SDs to the observed value, and then worked backwards to figure out what the correlation would be in that case. It came out to 0.60. 

That's huge!  The confidence interval actually encompasses every season on the graph, even though 2014 is the lowest of all of them.

To confirm the 0.60 number, I used this online calculator. If the true correlation for the 30 teams is 0.4, the 95% confidence interval goes up to 0.66, and down to 0.05. That's close to my calculation for the high end, and easily captures the observed value of 0.23 in its low end. 
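That calculator presumably uses the Fisher z transformation, and the same numbers fall out of it directly (a sketch, not the calculator's actual code):

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    # Fisher z transformation gives an approximate 95% confidence interval for r
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

print(fisher_ci(0.4, 30))   # roughly (0.05, 0.66)
```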

That's not to say that I think they really ARE all the same, that the differences are just random -- I've never been a big fan of throwing away differences just because they don't meet significance thresholds. I'm just trying to show how easy it is that it *could be* random noise.

I can try to rephrase the confidence interval argument visually. Here's the actual plot for the 2014 teams:




The correlation coefficient is a rough visual measure of how closely the dots adhere to the green regression line. In this case, not that great; it's more a cloud than a line. That's why the correlation is only 0.23.

Now, take a look at the teams between $77 million and $113 million, the ones in the second rectangle from the left.

There are eighteen teams in that group bunched into that small horizontal space, a payroll range of only $36 million in spending. Even at the historically high correlations we saw last decade, and even if the entire difference was due to discretionary free-agent spending, the true talent difference in that range would be only about 3 or 4 games in the standings. That would be much smaller than the effects of random chance, which would be around 12 games between luckiest and unluckiest. 

What that means is:  no matter what happens, that second vertical block is dominated by randomness, and so the dots in that rectangle are pretty much assured of looking like a random cloud, centered around .500. (In fact, for this particular case, the correlation for that second block is almost perfectly random, at -.002.)

So those 18 teams don't help much. How much the overall curve looks like a straight line is going to depend almost completely on the remaining 12 points, the high-spending and low-spending teams. In our case, the two low-spending teams are somewhat worse than the cloud, and the ten high-spending teams are somewhat better than the cloud, so we get our positive correlation of +0.23. 

But, you can see, those two bad teams aren't *that* bad. In fact, the Marlins, despite the second-lowest payroll in MLB, are playing .496 ball.

What if we move the Marlins down to .400? If you imagine taking that one dot, and moving it close to the bottom of the graph, you'll immediately see that the dots would get a bit more linear. (The line would get steeper, too, but steepness represents the regression coefficient, not the correlation, so never mind.)  I made that one change, and the correlation went all the way up to 0.3. 

Let's now take the second-highest-payroll Yankees, and move them from their disappointing  .523 to match the highest-payroll Dodgers, at .564. Again, you can see the graph will get more linear. That brings the correlation up to 0.34 -- almost exactly the average season, after mentally adjusting it a bit higher for 162 games.

Of course, the Marlins *aren't* at .400, and the Yankees *aren't* at .564, so the lower correlation of 0.23 actually stands. But my point is not to argue that it should actually be higher -- my point is that it only takes a bit of randomness to do the trick. 

All I did was move the Marlins down by less than 2 SDs worth of luck, and the Yankees by less than 1 SD worth of luck. And that was enough to bump the correlation from historically low, to historically average.

------

5. Finally: suppose the change isn't just random luck, that there's actually something real going on. What could it be?

-- Maybe money doesn't matter as much any more because low-spending teams are getting more of their value from arbs and slaves. They could be doing that so well that the high-spending teams are forced to spend more on free agents just to catch up. It wouldn't be too hard to check that empirically, just by looking at rosters.

-- It could be that, as MacPherson believes, there are fewer productive free agents to be bought. You could check that easily, too: just count how many free agents there are on team rosters now, as compared to, say, 2005. If MacPherson is correct, that careers are ending after fewer years of free agency, that should show up pretty easily.

-- Maybe teams just aren't as smart as they used to be about paying for free agents. Maybe their talent evaluation isn't that great, and they're getting less value for their money. Again, you could check that, by looking at free-agent WAR, or expected WAR, and comparing it to contract value.

-- Maybe teams don't vary as much as they used to, in terms of how many free-agent wins they buy. I shouldn't say "maybe" -- as we saw, the SD of payroll, adjusted for inflation, is indeed lower in 2014 than it was in 2006, by about 10 percent. So that would almost certainly be part of the answer. 

-- More specifically: maybe the (otherwise) bad teams are *more* likely to buy free agents than before, and the (otherwise) good teams are *less* likely to buy free agents than before. That actually should be expected, if teams are rational. With more teams qualifying for the post-season, there's less point making yourself into a 98-win team when a 93-win team will probably be good enough. And, even an average team has a shot at a wild card, if they get lucky, so why not spend a few bucks to raise your talent from 79 games to (say) 83 games, like maybe the Blue Jays did last year?

-----

I'll give you my gut feeling, but, first, a disclaimer: I haven't really thought a whole lot about this, and some of these arguments occurred to me as I wrote. So, keep in mind that I'm really just thinking out loud.

On that basis, my best guess is ... that most of the correlation drop is just random noise. 

I'd bet that money buys free agents just as reliably as always, and at the usual price. The correlation is down not because spending buys fewer wins, but because more equal spending makes it harder for the regression to notice the differences.

But I'm thinking that part of the drop might really be the changing patterns of team spending, as MacPherson described. I wonder if that knot of 18 mid-range teams, clustered in such a small payroll range, might be a permanent phenomenon, resulting from more small-market teams moving up the payroll chart after deciding their sweet spot should be a little more extravagant than in the past. 

Because, these days, it doesn't take much to almost guarantee a team a reasonable shot at a wildcard spot -- which means, meaningful games later in the season than before, which means more revenue. 

In fact, that's one area where it's not zero-sum among teams. If most of the fan fulfillment comes from being in the race and having hope, any team can enter the fray without detracting much from the others. What's more exciting for fans -- being four games out of a wildcard spot alone, or being four games out of a wildcard spot along with three other teams? It's probably about the same, right? 

Which makes me now think, the price of a free agent win could indeed change. By how much? It depends on how increased demand from the small market teams compares to decreased demand from the bigger-spending teams.

------

Anyway, bottom line: if I had to guess the reasons for the lower correlation:

-- 80% randomness
-- 20% spending patterns

But you can get better estimates with some research, by checking all those things I mentioned, and any others you might think of.





Hat Tip: Craig Calcaterra

