Sunday, May 29, 2016

Leicester City and EPL talent evaluation

Last post, I wondered how accurate bookmakers are at evaluating English Premier League (EPL) team talent. In the comments, Tom Tango suggested a method to figure that out, but one that requires knowing the bookies' actual "over/under" forecasts. I couldn't find those numbers, but bloggers James Grayson and Simon Gleave came to my rescue with three seasons' worth of numbers, and another commenter provided a public link for 2015-16.

(BTW, Simon's blog contains an interesting excerpt from "Soccermatics," a forthcoming book on the sabermetrics of soccer. Worth checking out.)

So, armed with numbers from James and Simon, what can we figure out?

1. SD of actual team points

Here is the SD of team standings points for the previous three seasons:

2015-16: 15.4
2014-15: 16.3
2013-14: 19.2
Average: 17.1

The league was quite unbalanced in 2013-14 compared to the other two seasons. That could have been because the teams were unbalanced in talent, or because the good teams got luckier than normal and the bad teams unluckier than normal. For this post, I'm just going to use the average of the three seasons (by taking the root mean square of the three numbers), which is 17.1.

2. Theoretical SD of team luck

I ran a simulation to figure the SD of points just due to luck -- in other words, if you ran a random season over and over, for the same team, how much would it vary in standings points even if its talent were constant? I wound up with a figure around 8 points. It depends on the quality of the team, but 8 is pretty close in almost all cases.
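Here's a rough sketch of how a simulation like that might look. It's my reconstruction, not the exact code I ran; it assumes Poisson scoring at a league-average rate of 1.35 goals per side for both teams:

```python
import math
import random
import statistics

def poisson(lam, rng):
    # Knuth's method: count uniform draws until their product drops below e^-lam
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def season_points(rng, games=38, rate=1.35):
    # One simulated season for an average team: 3 points for a win, 1 for a draw
    points = 0
    for _ in range(games):
        gf, ga = poisson(rate, rng), poisson(rate, rng)
        points += 3 if gf > ga else 1 if gf == ga else 0
    return points

rng = random.Random(2016)
seasons = [season_points(rng) for _ in range(20000)]
print(round(statistics.pstdev(seasons), 1))  # comes out around 8
```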

3. Theoretical SD of team talent

By the pythagorean relationship between talent and luck, we get

SD(observed) = 17.1
SD(luck)     =  8.0
SD(talent)   = 15.1
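In code, that's just subtraction in quadrature:

```python
import math

sd_observed = 17.1  # three-season average of actual standings points (step 1)
sd_luck = 8.0       # from the luck simulation (step 2)
sd_talent = math.sqrt(sd_observed**2 - sd_luck**2)
print(round(sd_talent, 1))  # 15.1
```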

4. SD of bookmaker predictions

If teams vary in talent with an SD about 15 points, then, if the bookmakers were able to evaluate talent perfectly, their estimates would also have an SD of 15 points. But, of course, nobody is able to evaluate that well. For one thing, talent depends on injuries, which haven't happened yet. For another thing, talent changes over time, as players get older and better or older and worse. And, of course, talent depends on the strategies chosen by the manager, and by the players on the pitch in real time. 

So, we'd expect the bookies' predictions to have a narrower spread than 15 points. They don't:

16.99 -- 2015-16 Pinnacle (opening)
14.83 -- 2015-16 Pinnacle (closing)
16.17 -- 2015-16 Sporting Index (opening)
15.81 -- 2015-16 Spreadexsports (opening)
17.30 -- 2014-15 Pinnacle (opening)
17.37 -- 2014-15 Pinnacle (closing)
16.95 -- 2014-15 Sporting Index (opening)
15.80 -- 2014-15 Sporting Index (closing)
15.91 -- 2013-14 Pinnacle

Only one of the nine sets of predictions is narrower than the expected talent spread, and even that one only barely. This surprised me. In the baseball case, the sportsbooks projected a talent spread that was significantly more conservative than the actual spread. 

Either the EPL bookmakers are overoptimistic, or the last three Premier League seasons had less luck than the expected 8.0 points. 

5.  Bookmaker accuracy

If the bookmakers were perfectly accurate in their talent estimates, we'd expect their 20 estimates to wind up being off by an SD of around 8 points, because that's the amount of unpredictable performance luck in a team-season.

In 2014-15, that's roughly what happened:

7.85 -- 2014-15 Pinnacle (opening)
6.37 -- 2014-15 Pinnacle (closing)
6.90 -- 2014-15 Sporting Index (opening)
7.75 -- 2014-15 Sporting Index (closing)

Actually, every one of the bookmakers' lines was more accurate than 8 points! In effect, in 2014-15, the bookmakers exceeded the bounds of human possibility -- they predicted better than the speed of light. What must have happened is: in 2014-15, teams just happened to be less lucky or unlucky than usual, playing almost exactly to their talent. 

But the predictions for 2015-16 were way off:

15.17 -- 2015-16 Pinnacle (opening)
14.96 -- 2015-16 Pinnacle (closing)
15.13 -- 2015-16 Sporting Index (opening)
14.96 -- 2015-16 Spreadexsports (opening)

And 2013-14 was in between:

9.77 -- 2013-14 Pinnacle

Again, I'll just go with the overall SD of the three seasons, which works out to about 11 points.

15 points -- 2015-16
 7 points -- 2014-15
10 points -- 2013-14
11 points -- average

Actually, 11 points is pretty reasonable, considering 8 is the "speed of light" best possible long-term performance.

6. Bookmaker talent inaccuracy

If 11 points is the typical error in estimating performance, what's the average error in estimating talent? That's an easy calculation, by Pythagoras:

 11 points -- observed error
  8 points -- luck error
  8 points -- talent error

That 8 points for talent should really be 7.5, but I'm arbitrarily rounding it up to create a rule of thumb that "talent error = luck error". 
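Same calculation in code, with this section's numbers:

```python
import math

observed_error = 11.0  # typical error in predicting points (step 5)
luck_error = 8.0       # irreducible performance luck (step 2)
talent_error = math.sqrt(observed_error**2 - luck_error**2)
print(round(talent_error, 2))  # 7.55, rounded up to 8 for the rule of thumb
```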

7. Bookmaker bias

In step 4, it looked like the bookmakers were overconfident, and predicting a wider spread of talent than actually existed. In other words, it looked like they were trying to predict luck.

If they did that, it would have to mean they were overestimating the good teams, and underestimating the bad teams. That's the only way to get a wider spread.

But, in 2013-14, it was the opposite! The correlation between the bookies' prediction and the eventual error was -0.07. (The "error" includes the sign, of course. The argument isn't that the bookies are more wrong for good teams or bad teams per se; it's that they're more likely to be wrong in a particular direction.)

In other words, even though Pinnacle seemed to be trying to predict team luck, it worked out for them!

Which means one of these things happened:

1. Pinnacle got really lucky, and their guesses for which teams would have good luck actually worked out;

2. We're wrong in thinking Pinnacle was overconfident by that much. In other words, the spread of talent is wider than we thought it was. Remember, 2013-14 was more unbalanced than the other two seasons we looked at.

I think it's some of each. The SD(talent) estimate for 2013-14 came out to 17.5 points. In that light, Pinnacle's 15.9-point SD isn't *that* overconfident.

... In 2014-15, on the other hand, Pinnacle *did* overestimate the spread. The better the closing line on a team, the more the line tended to overestimate it, with a correlation of +0.35. Sporting Index, with their more conservative line, correlated at only +0.11.

Part of the reason the correlations are so high is that 2014-15 was the year random luck balanced out so much more than usual. If teams were moving all over the place in the standings for random reasons, that would tend to hide the bookmakers' tendency to rate the teams too extreme. 

... Finally, we come to 2015-16. Now, we see what looks like very strong evidence of overconfidence. For the Pinnacle closing line, the correlation between estimate and overestimate is +.46. The other bookmakers are even higher, at +.52 and +.50.

Much of that comes from two teams. First, and most obviously, Leicester City, predicted at 40.5 points but actually winding up at 81. Second, Chelsea, forecast as the best team in the league at 83 points, but finishing with only 50.

These don't really seem to fit the narrative of "the bookies know who the good and bad teams are, but just tend to overestimate their goodness and badness." But, they kind of do fit the narrative. Favorites like Chelsea are occasionally going to have bad years, so you're going to have an occasional high error. But, that error will be even higher if you overestimated them in the first place.


OK, those are the seven sets of numbers we got from James' and Simon's data. What can we conclude?

Well, the question I wanted to answer was: how much are the bookmakers typically off in estimating team talent? Our answer, from #6: about 8 points.

But ... I'm not that confident. These are three weird seasons. Last year, we have Leicester City and their 5000:1 odds. The season before, we have "better than speed of light" predictions, meaning luck cancelled out. And, two years ago, as we saw in #1, we had a lot more great and awful teams than the other two seasons, which suggests that 2013-14 might be an outlier as well.

I'd sure like to have more seasons of data, maybe a decade or so, to get firmer numbers. For now, we'll stick with 8 points as our estimate.

An eight-point standard error means that, typically, one team per season will be mispriced by 16 points or more. That's not necessarily exploitable by bettors. For one thing, bookmaker prices match public perception, so it's hard to be the one genius among millions who sees the exact one team that's mispriced. For another thing, some of what I'm calling "talent" is luck of a different kind, in terms of injuries or players learning or collapsing.

We still have the case that Leicester City was off by around 40 points. That's 5 SDs if you think it was all talent. It's also 5 SDs if you think it was all luck. 

The "maximum likelihood," then, if you don't know anything about the team, would be if it were 2.5 SDs of each. The odds of that happening are about 1 in 13,000 (1 in 26,000 for each direction). 

My best guess, though, is to trust the bookmakers' current odds of about 30:1 as an estimate of what Leicester City should have been. How do we translate 30:1 into expected points? As it turns out, Liverpool was 28:1, with an over/under of 66. So let's use 66 points as our true talent estimate.

Under that assumption, Leicester City beat their talent by 15 points of luck (81 minus 66), or a bit less than 2 SD. And their assumed true talent of 66 points beat the bookmakers' estimate of 40 by 26 points, which is 3.25 SD.

That seems much more plausible to me. 

Because ... I think it's reasonable to think that luck errors are normally distributed. But I don't think we have any reason to believe that human errors, in estimating team talent, also follow a normal distribution. It seems to me that Leicester City could be a black swan, one that just confounded the normal way bettors and fans thought about performance. They may have been a Babe Ruth jumping into the league -- someone who saw you could win games by breaking the assumptions that led to the typical distribution of home runs.

So, when we see that Leicester was 3.25 SD above the public's estimate of their true talent ... I'm not willing to go with the usual probability of a 3.25 SD longshot (around 1 in 1700). I don't know what the true probability is, but given the "Moneyball" narrative and the team's unusual strategy, I'd suspect those kinds of errors are more common than the normal distribution would predict.

Even if you disagree ... well, with 20 teams, a 1 in 1700 shot comes along every 85 years. It doesn't seem too unreasonable to assume we just saw the inevitable "hundred year storm" of miscalculation.

And, either way, the on-field luck you have to assume -- 15 points -- is less than two standard deviations, which isn't that unusual at all.

So that's my best guess at how you can reasonably get Leicester City to 81 points. 


Monday, May 23, 2016

How much of Leicester City's championship was luck?

How much of Leicester City's run to the Premier League championship was just luck? I was curious to get a better gut feel for how random it might have been, so I wrote a simulation. 

Specifics of the simulation are in small font below. The most important shortcoming, I think, was that I kept teams symmetrical, instead of creating a few high-spending "superteams" like actually exist in the Premier League (Chelsea, Manchester United, Arsenal, etc.). Maybe I'll revisit that in a future post, but I'll just go with it as is for now.


Details: For each simulated season, I created 20 random teams. I started each of them with a goal-scoring and goal-allowing talent of 1.35 goals per game (about what the observed figure was for 2015-16). Then, I gave each a random offensive and defensive talent adjustment, each with mean zero and SD of about 0.42 goals per game. For each season, I corrected the adjustments to sum to zero overall. I played each game of the season assuming the two teams' adjustments were additive, and used Poisson random variables for goals for and against. I didn't adjust for home field advantage. 
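Here's a sketch of that simulation in Python. It's a simplified reconstruction of what I described above; the clamp keeping goal rates positive is just a guard I added so the Poisson rate stays valid:

```python
import math
import random

def poisson(lam, rng):
    # Knuth's method for a single Poisson draw
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_season(rng, n_teams=20, base=1.35, adj_sd=0.42):
    # Random offensive/defensive talent adjustments, re-centred to sum to zero
    off = [rng.gauss(0, adj_sd) for _ in range(n_teams)]
    dfn = [rng.gauss(0, adj_sd) for _ in range(n_teams)]
    off = [x - sum(off) / n_teams for x in off]
    dfn = [x - sum(dfn) / n_teams for x in dfn]

    points = [0] * n_teams
    goal_diff = [0] * n_teams
    for home in range(n_teams):
        for away in range(n_teams):
            if home == away:
                continue
            # Adjustments are additive; no home-field advantage
            hg = poisson(max(0.05, base + off[home] + dfn[away]), rng)
            ag = poisson(max(0.05, base + off[away] + dfn[home]), rng)
            goal_diff[home] += hg - ag
            goal_diff[away] += ag - hg
            if hg > ag:
                points[home] += 3
            elif ag > hg:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
    return points, goal_diff

pts, gd = simulate_season(random.Random(5000))
print(max(pts), gd[pts.index(max(pts))])  # champion's points and goal differential
```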


At the beginning of the season, Leicester City was a 5000:1 longshot. What kind of team, in my simulation, actually showed true odds of 5000:1? We can narrow it down to teams with a goal differential (GD) talent of -4 to -9 for the season. In 500,000 random seasons, here's how many times those teams won:

tal   #tms   ch  odds
 -9  166135  20  8307
 -8  168954  25  6758
 -7  171272  26  6587
 -6  173327  22  7879
 -5  175017  53  3302
 -4  177305  61  2907
    1032010 207  4986

In 500,000 seasons of the simulation, 1,032,010 teams had a GD talent between -4 and -9. Only 207 of them won a championship, for odds of 4,985:1 against, which is close to the 5000:1 we're looking for. 

Even half a million simulated seasons isn't enough for randomness to even out, which is why the odds don't decrease smoothly as the teams get better. Maybe I'll just go with a point estimate of -8. In other words, for Leicester City to be a 5000:1 shot to win the league, their talent would have to be such that you'd expect them to be outscored by 8 goals over the course of the 38-game season. It might be 7 goals instead of 8, but probably not 6 and probably not 9.  (I guess I could run the simulation again to be more sure.)


Leicester City actually wound up outscoring their opponents by 32 goals last year. Could that be luck? What's the chance that a team that should be -8 would actually wind up at +32? That's a 40 goal difference -- Leicester City would have had to be lucky by more than a goal a game.

The SD of goal differential is pretty easy to figure out, if you assume goals are Poisson. Last season, the average game had 1.35 goals for each team. In a Poisson distribution, the variance equals the mean, so, for a single game, the variance of goal differential is 2.70. For the season, multiply that by 38 to get 102.6. For the SD, take the square root of that, which is about 10.1. Let's just call it 10.

So, a 40-goal surprise is about four SDs from zero. Roughly speaking, that's about a 1 in 30,000 shot.
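The calculation, in code:

```python
import math

goals_per_team = 1.35
var_per_game = 2 * goals_per_team      # Poisson: variance = mean; two independent teams
sd_gd = math.sqrt(38 * var_per_game)   # season SD of goal differential, about 10.1
z = 40 / 10                            # "call it 10": a 40-goal surprise is 4 SDs
p = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided normal tail
print(round(sd_gd, 1), round(1 / p))   # roughly 1 in 30,000
```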


If we were surprised that Leicester City won the championship, we should be even *more* surprised that they went +32. In fact, we should be around six times more surprised!

Why are the "+32" odds so much worse than the "championship" odds? Because, on those rare occasions when a simulated -8 team wins the championship, it usually does it with much less than a +32 performance. Maybe it goes +20 but gets "Pythagorean luck" and wins lots of close games. Maybe it goes +17 but the other teams have bad luck and it squeaks in.

If you assume that a team that actually scores +32 in a season has, say, a 3-in-10 chance of winning the championship, then the odds of both things happening -- a -8 talent team going +32 observed, and that being enough to win -- is 1 in 100,000. Well, maybe a bit less, because the two events aren't completely independent.


The oddsmakers have priced Leicester City at around 25:1 for next season. That's a decent first guess for what they should have been this year.

Except ... in retrospect, Leicester should probably have been even better than 25:1 this season (you'd expect them to decline in talent next year -- they have an older-than-average team, they may lose players in the off-season, and other teams should catch on to their strategy). On the other hand, MGL says oddsmakers overcompensate for unexpected random events that don't look random. 

Those two things kind of cancel out. But, commenter Eduardo Sauceda points out that bookmakers build a substantial profit margin into a 20-way bet, so let's lower last season's "true" odds to 35:1, as an estimate.

According to the simulation, for a team to legitimately be a 35:1 shot, its expected goal differential for the season would have to be around +16.

Taking all this at face value, we'd have to conclude:

1. The bookies and public thought Leicester City was a -8 talent, when, in reality, it was a +16 talent. So, they underestimated the club by 24 goals.

2. Leicester City outperformed their +16 talent by 16 goals.

3. And, while I'm here ... the simulation says a team with a +32 GD averages 74 points in the standings. Leicester wound up at 81 points. So, maybe they were +7 points in Pythagorean luck.  


One thing you notice, from all this, is how difficult it is to set good odds on longshots, when you can't estimate true talent well enough.

Suppose you analyze a team -- say, Everton -- as best you can, and you conclude that they should be a league-average team, based on everything you know about their players and manager. (I'm going to call them a ".500 team," which ignores, for now, the Premier League scoring asymmetry of three points for a win and one point for a draw.)

You run a simulation, and you find that a .500 team wins the championship about once every 940 simulated seasons. If the simulation is perfect, can you just go and set odds of 939:1, plus vigorish?

Not really. Because you haven't accounted for the fact that you might be wrong that Everton is a .500 team. Maybe they're a .450 team, or a .600 team, and you just didn't see it. 

But, isn't there a symmetry there? You may be wrong that they're exactly average in talent, but if your analysts' estimate is unbiased, aren't they just as likely to be -8 as they are to be +8? So, doesn't it all even out?

No, it doesn't. Because even if the error in estimating talent is symmetrical, the resulting error in odds is not. 

By the simulation, a team with .500 talent is about 1 in 940 to win the championship. But, what if half the time you incorrectly estimate them at -8, and half the time you incorrectly estimate them at +8?

By my simulation, a team with -8 GD talent is 1 in 6,758 to win. A team with +8 talent is 1 in 157. The average of those two is not 1 in 940, but, rather, 1 in 307. 
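The asymmetry is easy to verify -- average the two championship *probabilities*, not the odds:

```python
p_minus8 = 1 / 6758  # championship probability of a -8 GD talent team (from the table)
p_plus8 = 1 / 157    # championship probability of a +8 GD talent team
avg = (p_minus8 + p_plus8) / 2
print(round(1 / avg))  # 307 -- far better odds than a true .500 team's 1 in 940
```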

If you're that wildly inaccurate in your talent evaluations, you're going to be offering 939:1 odds on what is really only a 307:1 longshot. Even if you're conservative, going, say, 600:1 instead of 939:1, you're still going to get burned.

This doesn't happen as much with favorites. In my simulation, a +30 team was 1 in 5.4. The average of a +22 team and a +38 team is 1 in 4.7. Not as big a difference. Sure, it's probably still enough difference to cost the bookmakers money, but I bet the market in favorites is competitive enough that they've probably figured out other methods to correct for this and get the odds right.


Anyway, the example I used had the bookies being off by exactly 8 goals. Is that reasonable? I have no idea what the SD of "talent error" is for bookmakers (or bettors' consensus). Could it be as high as 8 goals? 

For the record, the calculation of SD(talent) for 2014-15 (the season before Leicester's win), using the "Tango method," goes like this:

SD(observed) = 22.3 goals
SD(luck)     = 10   goals
SD(talent)   = 19.9 goals

For a few other seasons I checked:

2015-16  SD(talent) = 19.9
2013-14  SD(talent) = 27.8
1998-99  SD(talent) = 18.3 

In MLB, the SD of talent is about 9 wins. How well, on average, could you evaluate a baseball team's talent for the coming season? Maybe, within 3 wins, on average? That's a third of an SD.

In the Premier League, a third of an SD is about 6 goals. But evaluation is harder in soccer than in baseball, because there are strategic considerations, and team interactions make individual talent harder to separate out. So, let's up it to 9 goals. Offsetting that, the public consensus for talent -- as judged by market prices of players -- reduces uncertainty a bit. So, let's arbitrarily bring it back down to 8 goals. 

That means ... well, two SDs is 16 goals. That means that in an average year, the public overrates or underrates one team's talent by 16 goals. That seems high -- 16 goals is about 10 points in the standings. But, remember -- that's just talent! If luck (with an SD of 10 goals) goes the opposite direction from the bad talent estimate, you could occasionally see teams vary from their preseason forecast by as many as 36 or 46 goals.

Does that happen? What's the right number? Anyone have an idea? At the very least, we now know it's possible once in a while, to be off by a lot. In this case, it looked like everyone underestimated the Foxes by (maybe) 24 goals in talent.


In light of all this, bookmaker William Hill announced that, next year, they will not be offering any odds longer than 1,000:1. When I first read that, I thought, what's the point? If they had offered a thousand to one on Leicester City, they still would have lost a lot of money, if the true odds were 35:1.

But ... now I get it. Maybe they're saying something like this: "A Premier League team with middle-of-the-road talent -- one that you'd expect to score about as many goals as it allows -- has about a 1 in 1,000 chance of winning the championship. We're not confident enough that we can say, of any bad team, that they can't change their style of play to become average, or that they haven't improved to average over the off-season, or that they've been a .500 team all along but we've just been fooled by randomness. So, we're never again going to set odds based on an evaluation that a team's talent is significantly worse than average, because the cost of a mistake is just too high."

That makes a certain kind of sense. And the logic makes me wonder: were the odds on extreme longshots always strongly biased in bettors' favor, but nobody realized it until now?

(My previous post on Leicester City is here.)


Thursday, May 12, 2016

How did Leicester City do it?

At the start of the season, you could get 5000:1 odds on Leicester City F.C. winning the 2015-16 English Premier League Championship. Of course, Leicester did win, in what one writer called "the unlikeliest feat in sports history".

A friend wrote me that Leicester City is often said to have "defied the odds" to finish on top. That's a metaphor, of course; odds aren't something you can literally "defy," like a bad law or your supervisor's instructions. What does the metaphor mean? To me, it implies that the odds actually *were* 5000:1, that the team did actually hit the longshot outcome, that they were the "1" instead of one of the "5000". 

Let's suppose, for whatever reason, I offer you 5000:1 odds that a fair coin will land heads. You bet $10, the coin does land heads, and I pay you $50,000. Did you really "defy" the 5000:1 odds? That doesn't sound right. At best, you "defied" odds of 1:1.

So, the question is: at the start of the year, was Leicester City's expectation really a 1 in 5,001 chance? Were the Foxes really that bad a team, in terms of talent? The evidence suggests not.

After 17 of the season's 38 matches, Leicester City sat at the top of the table (I think "top of the table" is English for "first in the standings"), two points up on second-place Arsenal, and fourth overall in goal differential (+13, behind teams at +17, +14, and +14). 

Suppose Leicester were truly a bad team, and had just had a run of good luck. In that case, what would their chance be of hanging on to win the championship? Still pretty poor, right? They're only two points up, with several superior teams right on their tail.

But, the bookmakers had them solidly in the mix. Here are the revised odds on December 21, after 17 matches:

                       Odds  Pts
1.  Leicester          10:1   38
2.  Arsenal            10:11  36
3.  Manchester City    15:8   32
4.  Tottenham          20:1   29
5.  Manchester United  18:1   29
9.  Liverpool          22:1   24
15. Chelsea            66:1   18

Clearly, Leicester City was still considered a lower-quality team than its rivals. Despite Arsenal trailing by two points, bookmakers still gave it nearly six times as much chance of winning as Leicester.
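For reference, here's how the fractional odds in that table convert to implied probabilities (ignoring the bookmaker's margin):

```python
def implied_prob(num, den=1):
    # Fractional odds num:den -> implied probability, ignoring vigorish
    return den / (num + den)

print(round(implied_prob(10), 3))      # Leicester at 10:1
print(round(implied_prob(10, 11), 3))  # Arsenal at 10:11
print(round(implied_prob(15, 8), 3))   # Manchester City at 15:8
```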

But ... the odds against Leicester hanging on were only 10:1. If the Foxes were really as gawd-awful a team as was thought at the beginning, their odds would be much worse.

After every Premier League season, the bottom three teams are "relegated" to the lower-tier Championship League, while that lower league's best three teams are promoted to replace them. At the beginning of the season, Leicester City was thought to have a 25 percent chance of relegation -- you could get 3:1 odds that they'd be in the bottom three. If they were still thought to be that bad, wouldn't they be much worse than 10:1 to win it all?

For a mirror-image comparison, look at Chelsea, one of the league's elite teams. After those first 17 matches, Chelsea sat 20 points behind Leicester, in fifteenth place out of twenty teams. Despite the poor start, they were still given a 1-in-67 chance (66:1 against) of coming all the way back to take first place. That's because Chelsea was understood to be a very skilled team -- they were the 13:8 favorite when the season started. 

Now: if Leicester were as bad as Chelsea were good, you'd think the chance of them dropping to relegation would be about the same as the odds of Chelsea rising to the top. Right? It's kind of symmetrical. Not completely, because Leicester is at the top and Chelsea is only *near* the bottom. But, that's mitigated by Chelsea needing to be *the* top team, and Leicester City needing only to be in the bottom three.

Not perfectly symmetrical, but reasonable.

But: the symmetry doesn't extend to those mid-season odds. A Chelsea comeback was pegged at 66:1. But a Leicester collapse was 3500:1.

Clearly, by December 21, the betting market evaluated that Leicester was a pretty good team. Not a great team, but a good team. For more evidence of that, they're pegged at around 25:1 to repeat next year. The bookmakers clearly don't think Leicester City was a bad team that just got very, very, very lucky.

So, I'd argue that Leicester didn't "defy the odds."  They were just a better team than 5000:1 from the beginning.


Well, that might not be strictly true. Maybe as the season started, they were an awful team, but they got better quickly. Maybe in week 2, they signed the soccer equivalent of Babe Ruth and Wayne Gretzky and Peyton Manning and Michael Jordan and Pele. (Sorry, I don't know anything about soccer ... "Pele" was the best I could do.)  Or maybe the coach figured out a strategy to take a bunch of mediocre players and make them great (or make the team able to win despite the players' mediocrity).

But, it seems more likely to me that the team was just good from the beginning, and the oddsmakers got it wrong. Well, more importantly, the community of betting soccer fans got it wrong. Because, you'd think, if even a few sharp bettors figured out that Leicester was a pretty good team, they would have moved the odds -- not just by betting on a championship, but probably on available side bets too. 

So, what happened? That's a huge question. I think this is the biggest betting market inefficiency I've ever seen. It's not like the 1969 Mets, who probably *did* just get lucky ... or the 1980 "Miracle on Ice" team, who, when you watch the game, were clearly inferior but very much lucky. This is a case where a legitimately very good team got evaluated as very bad -- by *everyone*, even the best sabermetric soccer types and bettors. 

Of course, Leicester City isn't actually the most talented team in the Premier League -- at least, not according to betting markets. For next season, the Foxes' 25:1 odds rank only seventh -- the favorite, Manchester City, is at 3:2. Ignoring vigorish, the oddsmakers think Man City has ten times the chance that Leicester does.

Assuming that Leicester should have been that same 25:1 this season instead of 5000:1 ... well, that still seems like an exploitable opportunity that, in theory, never should have happened in an efficient betting market.

So, what did happen? How did everyone miss that Leicester City was a good, if not great, team? That's not a question about oddsmaking. It's a question about soccer. What happened? What made Leicester so good, that nobody had seen coming? How did they build such a successful team on their very low budget? Is there a "Moneyball" secret?

I've seen some articles that broke down some stats, about passing and shots on goal and such. But this is a situation where we need more than that. It's like, if an expansion MLB team goes 105-57 with castoff players, it doesn't help much to say, "well, they won because they had a high OPS and their pitchers struck out a lot of guys."  The question is -- how did they get those replacement-level players to have that high OPS and strike out a lot of guys? Was it something they saw in those players? Was it coaching? Was it sign stealing? 


OK, after wondering about all this, I figured, hey, the internet exists, maybe I should do some research. (Which consisted of Googling, and getting advice from my friends.)

And, yes, there seems to be an explanation. Apparently there is indeed a bit of a Moneyball story here. My friend John steered me to an article by Leicester City fan John Micklethwait (who this year neglected to make his annual 20 pound bet on Leicester, and missed out on what would have been a 100,000 pound win). 

Leicester City made three major acquisitions in the off-season, all with "Moneyball" overtones of underappreciated players. First, N’Golo Kanté, who was number one in France last year in the (apparently overlooked) statistic of interceptions made. Second, Jamie Vardy, who is known for speed (which Leicester used to strategic advantage, as we will see). And, third, Riyad Mahrez, who is known for "a rare ability to dribble past people."*

(* Correction: two of the three were actually signed by Leicester in earlier seasons. See note at end of post.)

They got those guys really cheap.  

Then, they analyzed video, and came up with this:

"Leicester players even seem to foul scientifically, slowing down their opponents by taking turns to obstruct them, so that few of the Leicester players get booked or sent off."

And, finally, the most interesting strategic twist: the rapid counterattack.

"This in itself is another innovation. All teams have always counterattacked, but few have based their game so completely around it. In most matches, the team that keeps control of the ball more scores more goals. Teams like Barcelona and Arsenal are famous for never letting their opponents touch it. Not Leicester. Last weekend, Swansea had possession 62 percent of the time, but they still lost 4-0. Leicester’s tactic is to let their opponents have the ball, wait until they make a mistake and then attack at remarkable speed: Hence all those quick players and the unusual disciplined approach."

Well, I love that theory! Because, it's what I argued the Toronto Maple Leafs might have been doing a couple of years ago, when they made the playoffs despite possession stats near the bottom of the league. 

Not only do I like that theory the best, but I also believe it's the most plausible. Because, when the Foxes bought those three players, it was public knowledge. It was no secret that Kanté is a great tackler, and Vardy is crazy fast, and Mahrez has dribbling skills (videos are easily found on Google). So, the odds should have accurately reflected those acquisitions.

On the other hand, the counterattacking strategy? Probably, nobody knew manager Claudio Ranieri was going to try that tactic until the season was underway. Even then, it would take a while before it became apparent how well it worked. So, that *could* explain why even the most knowledgeable soccer experts didn't see it coming at all.

For what it's worth, here's an article about Leicester's counterattacking strategy, with accompanying video of some of their quick transition goals. And, something my friend Bob wrote me, from observation:

"They used the counterattack as their primary mode of offense. They had several wins early in the year with possession below 30%. Manager Claudio Ranieri would frequently position one or two players close to midfield on opponents' corner kicks and free kicks in order to better exploit his team's speed advantage. As the season progressed, teams adapted to this and Leicester's possession totals increased. Leicester adapted by tightening up its defense, winning a string of 1-0 games (4 out of 5 in one stretch)."


From all this, here's my wild-ass bottom line, the working hypothesis that my imperfect Bayesian brain is pulling out of its metaphorical butt:

1. Leicester City improved over the off-season by pulling a Moneyball, by acquiring underappreciated players at bargain prices.

2. They implemented a novel strategy emphasizing speedy counterattacking, and it worked, but became less effective as the opposition recognized it and learned to adapt to it.

3. They did play unexpectedly well, in the traditional sense, apart from the strategy.

4. That unexpectedly good play might have been playing over their heads. They were significantly luckier than their talent, judging by the odds during and after the season.**

(**Any championship team is, in retrospect, likely to have played better than its talent, but I'm arguing in this case for even more luck than for a usual champion.)

It's kind of a vague hypothesis, I know, a little bit of everything. But my best guess is ... it *was* a little bit of everything. Because: (a) can you really turn a bad team into a champion with just three players? (b) the odds insist Leicester was luckier than its talent; and (c) even if you discount the opinions of observers, the stats show that Leicester did repeatedly win with very low possession time.


So, what does that mean for next year?

The improvement in players (#1) will remain, if the club doesn't sell them off. And, of course, we know luck doesn't persist (#3 and #4). 

That leaves #2, the counterattack. Will the strategy continue to work, or will the opposition adapt to it enough that the advantage will dwindle? Normally, I'd just check the betting market, but I'm not sure what the odds are telling us. As mentioned, Leicester is only the seventh favorite to win next year, at 25:1. That's much smaller than 5000:1, for sure. But how much of the difference is from the skill of their new players, and how much is from an expectation that a less traditionally-skilled team can still win by implementing a disruptive counterattack strategy?


In any case, I think the big story here isn't that Leicester City beat 5000:1 odds. I think the big story here is how Leicester City found a "Moneyball" way to beat the system on a low budget. 

I'd argue that this is the "real" Moneyball story, the one we were theoretically waiting for. 

The original story, about the 2002 Oakland A's, isn't that impressive to me. Sure, the 2002 Oakland A's won 103 games on a low budget, but we kind of know how they did it. Yes, they used sabermetrics, but those gains were marginal. Most of their advantage was having a supply of excellent, pre-free-agent players who came cheap, as well as a large dose of luck. (I don't have public luck estimates handy for 2002, but I once figured the A's were lucky by 12 wins that year. Seven of those were from beating their Pythagorean Projection.)

This Leicester City story is different. This is a legitimately bad team, picking up three "free agents" who were legitimately undervalued and overlooked, and then implementing a system that effectively overcame the skill advantage of some of the best and most expensive football talent on the planet.

Even if only half of Leicester City's improvement was Moneyball, and the other half was luck ... well, even then, the Foxes of Leicester City created millions of pounds worth of wins out of basically nothing. 

Could that really be what happened? 

UPDATE: James Yorke on Twitter has pointed out that two of the three players I mentioned have been with Leicester more than one season. Jamie Vardy was actually signed in 2012, and Mahrez in 2014. Only Kanté was new for 2015-16.

Mr. Yorke also points out that other, lesser players were signed over the past few years, too.

So, Kanté was the main signing in the most recent off-season. This suggests that most of Leicester's success came from #2 through #4, with less of it coming from #1 (which, in turn, is mostly Kanté).


Thursday, April 21, 2016

Noll-Scully doesn't measure anything real

The most-used measure of competitive balance in sports is the "Noll-Scully" measure. To calculate it, you figure the standard deviation (SD) of the winning percentage of all the teams in the league. Then, you divide by what the SD would be if all teams were of equal talent, and the results were all due to luck.

The bigger the number, the less parity in the league.

For a typical, recent baseball season, you'll find the SD of team winning percentage is around .068 (that's 11 wins out of 162 games). By the binomial approximation to normal, the SD due to luck is .039 (6.4 out of 162). So, the Noll-Scully measure works out to .068/.039, which is around 1.74.

In other words: the spread of team winning percentage in baseball is 1.74 times as high as if every team were equal in talent.
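As a quick sketch in Python (the .068 figure is taken from the text as given, not computed from actual standings), the calculation looks like this:

```python
import math

# MLB figures from the text: observed SD of team winning percentage,
# and the binomial ("every team is a true .500 talent") luck SD.
games = 162
observed_sd = 0.068                  # about 11 wins out of 162

luck_sd = 0.5 / math.sqrt(games)     # about .039, or 6.4 wins out of 162
noll_scully = observed_sd / luck_sd

print(round(luck_sd, 3))             # 0.039
print(round(noll_scully, 2))         # 1.73 (the text's 1.74 comes from using the rounded .039)
```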


Both "The Wages of Wins" (TWOW) and a paper on competitive balance I read recently (which I hope to post about soon) independently use Noll-Scully to compare different sports. And it's not just them -- a lot of academic research on the subject does the same.

The Wages of Wins (page 70 of the second edition) runs this chart:

2.84 NBA
1.81 AL (MLB)
1.67 NL (MLB)
1.71 NHL
1.48 NFL

The authors follow up by speculating on why the NBA's figure is so high, why the league is so unbalanced. They discuss their "short supply of tall people" hypothesis, as well as other issues.

But one thing they don't talk about is the length of the season. In fact, their book (and almost every other academic paper I've seen on the subject) claims that Noll-Scully controls for season length. 

Their logic goes something like this: (a) The Noll-Scully measure is actually a multiple of the theoretical SD of luck. (b) That theoretical SD *does* depend on season length. (c) Therefore, you're comparing the league to what it would be with the same season length, which means you're controlling for it.

But ... that's not right. Yes, dividing by the theoretical SD *does* control for season length, but not completely.


Let's go back to the MLB case. We had

.068 observed SD
.039 theoretical luck SD
1.74 Noll-Scully ratio

Using the fact that SDs follow a pythagorean relationship, it follows that

observed SD squared = theoretical luck SD squared + talent SD squared


.068 squared = .039 luck squared + talent squared

Solving, we find that the SD of talent = .056. Let's write that this way:

.039 theoretical luck SD
.056 talent SD
.068 observed SD
1.74 Noll-Scully (.068 divided by .039)

Now, a hypothetical. Suppose MLB had decided to play a season four times as long: 648 games instead of 162. If that happened, the theoretical luck SD would drop in half (we'd divide by the square root of 4). So, the luck SD would be .020. 

The talent SD would remain constant at .056. The new observed SD would be the square root of (.020 squared plus .056 squared), which works out to .059:

.020 theoretical luck SD
.056 talent SD
.059 observed SD
2.95 Noll-Scully (.059 divided by .020)

Under this scenario, the Noll-Scully increases from 1.74 to 2.95. But nothing has changed about the game of baseball, or the short supply of home run hitters, or the relative stinginess of owners, or the populations of the cities where the teams play. All that changed was the season length.
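To see the season-length effect numerically, here's a small sketch using the binomial luck SD and the pythagorean relationship from above (the .056 talent SD is the figure derived earlier):

```python
import math

def noll_scully(talent_sd, games):
    """Noll-Scully ratio implied by a fixed talent SD and a given season length."""
    luck_sd = 0.5 / math.sqrt(games)              # binomial luck SD shrinks as games grow
    observed_sd = math.hypot(talent_sd, luck_sd)  # pythagorean: obs^2 = talent^2 + luck^2
    return observed_sd / luck_sd

talent_sd = 0.056  # MLB talent SD, held constant across both scenarios
print(round(noll_scully(talent_sd, 162), 2))   # ~1.74
print(round(noll_scully(talent_sd, 648), 2))   # ~3.0 (the text's 2.95 uses the rounded .059/.020)
```

Nothing about talent changes between the two calls; only the schedule length does, and the ratio still jumps.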


My only point here, for now, is that Noll-Scully does NOT properly control for season length. Any discussion of why one sport has a higher Noll-Scully than another *must* include a consideration of the length of the season. Generally, the longer the season, the higher the Noll-Scully. (Try a Noll-Scully calculation early in the season, like today, and you'll get a very low number. That's because after only 15 games, luck is huge, so talent is small compared to luck.)

It's not like there's no alternative. We just showed one! Instead of Noll-Scully, why not just calculate the "talent SD" as above? That estimate *is* independent of season length, and it's still a measure of what academic authors are looking for. 

Tango did this in a famous post in 2006. He got

.060 MLB
.058 NHL
.134 NBA

If you repeat Tango's logic for different season lengths, you'll get roughly the same numbers. Well, you'll get slightly different results because of random variation ... but they should average out somewhere close to those figures.


Now, you could argue ... well, sometimes you *do* want to control for season length. Perhaps one of the reasons the best teams dominate the standings is because NBA management wanted it that way ... so they chose a longer, 82-game season, in order to create some space between the Warriors and the other teams. Furthermore, maybe the NFL deliberately chose 16 games partly to give the weaker teams a chance.

Sure, that's fine. But you don't want to use Noll-Scully there either, because Noll-Scully still *partially* adjusts for season length, by using "luck multiple" as its unit. Either you want to consider season length, or you don't, right? Why would you only *partially* want to adjust for season length? And why that particular part?

If you want to consider season length, just use the actual SD of the standings. If you don't, then use the estimated SD of talent, from the pythagorean calculation. 

Either way, Noll-Scully doesn't measure anything anybody really wants.


Tuesday, March 22, 2016

Charlie Pavitt: On some implications of the Adam LaRoche situation

Charlie Pavitt occasionally guest posts here ... it's been a while, but he's back! Here are Charlie's thoughts on Adam LaRoche and education.  


Let me begin this essay by stating that if it is true that the White Sox promised Adam LaRoche full clubhouse access for his son Drake, then they are wrong to reverse themselves on that promise now; and if LaRoche only agreed to sign the contract because of the promise, he has good reason to feel betrayed.

But this essay is not about this specific issue. It is about something more general.  My understanding is that LaRoche made a public statement that included something like the following: School is not so important, you can learn more about life in the baseball clubhouse than in school. Even if I am wrong about what LaRoche did or did not say, the question deserves consideration, so I want to discuss my thinking about this general issue and not about this specific case.

You most certainly can learn something about life in the baseball clubhouse. Here are three important things that you can learn: First, that a group of men from drastically different backgrounds (different races/ethnicities/social classes/religions/etc.) can work together in harmony in pursuit of a shared goal. Second, that a group of men from drastically different backgrounds can forge close friendships. Third, that success in one’s pursuits requires what I call the three D’s: desire, dedication, discipline. These are all extremely valuable lessons, and I suppose there are some other things that you can learn in the baseball clubhouse that I cannot think of right now.
But there are many things that are also important in life that you cannot learn in the baseball clubhouse.

First, you cannot learn how to interact in a mature enough manner with women to work in harmony with them in pursuit of a shared goal and to forge close friendships. I have no idea about now, but certainly in the “old days” a baseball clubhouse was anything but a good training ground for learning how to treat women as potential co-workers or non-sexual friends.

Second, you cannot learn how to interact in a mature enough manner with people with a different sexual orientation than you to work in harmony with them in pursuit of a shared goal and to forge close friendships. In this case, it is pretty clear that instances of disparaging treatment are still occurring, although happily the response to these instances has been to demand that the perpetrators grow up and act like adults. I would like to add that it is likely that there are quite a few gay major league baseball players right now who do not feel comfortable enough with how their teammates would react to come out of the closet, given how few past players have felt comfortable enough to do so even after retirement.

Third, you cannot learn how to interact in a mature enough manner with people with disabilities, be they of sight or hearing or physical limitations or psychiatric problems or low “intelligence,” whatever that is, to work in harmony with them in pursuit of a shared goal and to forge close friendships. I will say that baseball players have as a group been sensitive to and supportive of people such as these, and also to other players with analogous issues (e.g., alcoholism), which is laudable. But the issue here is whether one can learn this sensitivity in a baseball clubhouse as well as one can in school.

Fourth, you cannot learn how to interact in a mature enough manner with straight men with no disabilities, but a different temperament than one usually finds among baseball players, to work in harmony with them in pursuit of a shared goal and to forge close friendships. I am thinking of men who think like artists or musicians or writers (or academics like me), who are basically non-competitive and wish to collaborate with everyone, not only their teammates. I am also thinking of men who have made financial sacrifices to dedicate their lives to the betterment of others: those who work in non-profits for pitiful wages, or school teachers spending part of their much-lower-than-deserved income on their classrooms and students because their schools are so badly underfunded. I would not expect active disparagement of such people in the clubhouse; in fact, if anything, baseball players probably respect the hard work and achievements of such people. But I doubt they learned that attitude in the baseball clubhouse, and in any case the issue is once again whether the ability to understand those mindsets well enough to work alongside and become truly friendly with such people is better learned in a baseball clubhouse than in a school.

Fifth, you cannot learn the sorts of things that can make you a responsible public citizen who contributes to your community and to your nation; for example, someone who votes based on well-thought-out values and relevant knowledge rather than emotion and hearsay. I am not saying that baseball players as a group do not have adequate public citizen skills; just like the general citizenry, my guess is that some do and some don’t. But you can learn those skills in school far better than in a baseball clubhouse. 

These are the sorts of things that you can learn in a school that you cannot learn in a baseball clubhouse. And, I might add, you can also learn in school about the three important things I listed above that you can learn in a clubhouse; in fact, you can learn them better. You can learn to pursue a goal with, or become friends with, people from drastically different backgrounds, and you can learn about the importance of the three D’s, and you can learn them better in a school because you are actually participating and not just observing others, as would generally be the case in the clubhouse. Note that unlike the clubhouse, you are learning these skills while interacting with people your own age rather than 10 or 20 years older. Research has conclusively shown that, while basic language and communication skills are originally learned from intense interaction with one’s immediate family, they are practiced and perfected through intense interaction with one’s age peers.

Now let me add one other thing that you can learn in a school that you cannot learn in a baseball clubhouse. You can learn a marketable skill other than playing baseball. Again, I want to consider the general issue and stay away from the specific LaRoche situation; my understanding is that Drake is being home-schooled, that Adam LaRoche does many things other than play baseball, and I trust that Drake is participating and learning from them. If a player thinks that the baseball clubhouse is a more educational environment for a son than a school, the player is thinking as if there were no question that the son was also going to spend his life in the baseball clubhouse. But what if the son’s interests and temperament are more in line with the examples I mentioned above: the musician/artist/writer/academic, the person who values others’ betterment over personal wealth? Or even if the son has the desire and temperament to play baseball, but not the skill? In particular, the latter type of son is woefully unprepared for work, not just for the nuts and bolts of the job but also for working in tandem with women, gay men, and those of different temperament. Or does the player think that, in that instance, it is fine for the son to live his life off the fortune the player is making?

Now, I understand that Drake LaRoche is being home-schooled, and for all I know he is learning about the things I am concerned about, and whether or not this is true in this instance is beside the general point. And I am not saying that boys should never spend any time in a baseball clubhouse, even if it means missing a little school. In conclusion, I am saying the following:

A baseball team that promises a player that the player’s son can totally share the baseball player life at the detriment of schooling is performing a great disservice to the son.

A baseball player who wants his son to totally share the baseball player life at the detriment of schooling is performing a great disservice to his son.

A baseball team that realizes the detriment to the son that the promise has caused and reverses itself on the promise certainly deserves censure for breaking a promise, but is performing a great service to the son.

And a baseball player in the latter situation who retires because of it and, in so doing, insures educational experiences for the son beyond the clubhouse is performing a great service to the son. 

-- Charlie Pavitt


Thursday, January 07, 2016

When log5 does and doesn't work

Team A, with an overall winning percentage talent of .600, plays against a weaker Team B with an overall winning percentage of .450. What's the probability that team A wins? 

In the 1980s, Bill James created the "log5" method to answer that question. The formula is

P = (A - AB)/(A+B-2AB)

... where A is the talent level of Team A (in this case, .600), and B is the talent level of Team B (.450).

Plug in the numbers, and you get that team A has a .647 chance of winning against team B. 
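As a minimal sketch, the formula above translates directly into a one-line function:

```python
def log5(a, b):
    """Bill James's log5: probability that a team of talent a beats a team of talent b."""
    return (a - a * b) / (a + b - 2 * a * b)

print(round(log5(0.600, 0.450), 3))  # 0.647
print(round(log5(0.600, 0.500), 3))  # 0.6 -- against an average team, A is just a .600 team
```

Note the sanity check in the second line: against a .500 opponent, log5 returns the team's own talent level, exactly as it should.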

That makes sense: A is .600 against average teams. Since opponent B is worse than average, A should be better than .600. 

Team B is .050 worse than average, so you'd kind of expect A to "inherit" those 50 points, to bring it to .650. And it does, almost. The final number is .647 instead of .650. The difference is because of diminishing returns -- those ".050 lost wins" are what B loses to *average* teams because it's bad. Because A is better than average, it would have got some of those .050 wins anyway because it's good, so B can't "lose them again" no matter how bad it is.

In baseball, the log5 formula has been proven to work very well.


There was some discussion of log5 lately on Tango's site (unrelated to this post, but very worthwhile), and that got me thinking. Specifically, it got me thinking: log5 CANNOT be right. It can be *almost* right, but it can never be *exactly* right.

In the baseball context, it can be very, very close, indistinguishable from perfect. But in other sports, or other contexts, it could be way wrong. 

Here's one example where it doesn't work at all.

Suppose that, instead of actually playing baseball games, teams just measured their players' average height, and the taller team wins. And, suppose there are 11 teams in the league, and there's a balanced 100-game season.

What happens? Well, the tallest team beats everyone, and goes 100-0. The second-tallest team beats everyone except the tallest, and winds up 90-10. The third-tallest goes 80-20. And so on, all the way down to 0-100.

Now: when a .600 team plays a .400 team, what happens? The log5 formula says it should win 69.2 percent of those games. But, of course, that's not right -- it will win 100 percent of those games, because it's always taller.

For height, the log5 method fails utterly.
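You can verify that with a tiny simulation (this is just my construction of the 11-team league described above):

```python
from itertools import combinations

# "Height baseball": 11 teams, ranked 0 (shortest) to 10 (tallest).
# Balanced 100-game season: each pair of teams meets 10 times, and the
# taller team always wins.
wins = {t: 0 for t in range(11)}
for a, b in combinations(range(11), 2):
    wins[max(a, b)] += 10          # taller team sweeps all 10 meetings

wpct = [wins[t] / 100 for t in range(11)]
print(wpct)                        # [0.0, 0.1, 0.2, ..., 0.9, 1.0]

def log5(a, b):
    return (a - a * b) / (a + b - 2 * a * b)

# log5 predicts the .600 team beats the .400 team 69.2 percent of the time;
# in height baseball, the .600 team is always taller, so it actually wins 100 percent.
print(round(log5(0.6, 0.4), 3))    # 0.692
```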


What's the difference between real baseball and "height baseball" that makes log5 work in one case but not the other?

I'm not 100% sure of this, but I think it's due to a hidden, unspoken assumption in the log5 method. 

When we say, "Team A is a .600 talent," what does that mean? It could mean either of two things:

-- A1. Team A is expected to beat 60 percent of the opponents it plays.

-- A2. If Team A plays an average team, it is expected to win 60 percent of the time.

Those are not the same! And, for the log5 method to work, assumption A1 is irrelevant. It's assumption A2 that, crucially, must be true. 

In both real baseball and "height baseball," A1 is true. But that doesn't matter. What matters is A2. 

In real baseball, A2 is close enough. So log5 works.

In "height baseball," A2 is absolutely false. If Team A (.600) plays an average team (.500), it will win 100 percent of the time, not 60 percent! And that's why log5 doesn't work there.


What it's really coming down to is our old friend, the question of talent vs. luck. In real baseball, for a single game, luck dwarfs talent. In "height baseball," there's no luck at all -- the winner is just the team with the most talent (height). 

Here are two possible reasons a sports team might have a .600 record:

-- B1: Team C is more talented than exactly 60 percent of its opponents

-- B2: Team C is more talented than average, by some unknown amount (which varies by sport) that leads to it winning exactly 60 percent of its games.

Again, these are not the same. And, in real life, all sports (except "height baseball") are some combination of the two. 

B1 refers completely to talent, but B2 refers mostly to luck. The more luck there is, in relation to talent, the better log5 works.

Baseball has a pretty high ratio of luck to talent -- on any given day, the worst team in baseball can beat the best team in baseball, and nobody bats an eye. But in the NBA, there's much less randomness -- if Philadelphia beats Golden State, it's a shocking upset. 

So, my prediction is: the less that luck is a factor in an outcome, the more log5 will underestimate the better team's chance of winning.

Specifically, I would predict: log5 should work better for MLB games than for NBA games.


Maybe someone wants to do some heavy thinking and figure out how to move this forward mathematically. For now, here's how I started thinking about it.

In MLB, the SD of team talent seems to be about 9 games per season. That's 90 runs. Actually, it's less, because you have to regress to the mean. Let's call it 81 runs, or half a run per game. (I'm too lazy to actually calculate it.) Combining the team and opponent, multiply by the square root of two, to give an SD of around 0.7 runs.

The SD of luck, in a single game, is much higher. I think that if you computed the SD of a team's runs scored over its 162 individual games, you'd get around 3. The SD of runs allowed is also around 3, so the SD of the difference would be around 4.2.

SD(MLB talent) = 0.7 runs
SD(MLB luck)   = 4.2 runs

Now, let's do the NBA. The SD of the SRS rating seems to be just under 5 points. That's based on outcomes, so it's too high to be an estimate of talent, and we need to regress to the mean. Let's arbitrarily reduce it to 4 points. Combining the two teams, we're up to 5.2 points.

What about the SD of luck? This site shows that, against the spread, the SD of score differential is around 11 points. So we have

SD(NBA talent) =  5.2 points
SD(NBA luck)   = 11.0 points

In an MLB game, luck is 6 times as important as talent. In an NBA game, luck is only 2 times as important as talent. 

But, how you apply that to fix log5, I haven't figured out yet. 

What I *do* think I know is that the MLB ratio of 6:1 is large enough that you don't notice that log5 is off. (I know that from studies that have tested it and found it works almost perfectly.) But I don't actually know whether the NBA ratio of 2:1 is also large enough. My gut says it's not -- I suspect that, for the NBA, in extreme cases, log5 will overestimate the underdog enough so that you'll notice. 
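Here's one rough way to make the intuition concrete -- a sketch of my own, assuming game margins are normally distributed around the talent difference, which is a simplification and not anything established. Under that model, a team's record against an average opponent is the normal CDF of (talent / luck SD), and the true head-to-head probability is the CDF of the talent *difference* over the luck SD. We can feed the records into log5 and compare against the truth, using the ballpark SDs from the text:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def log5(a, b):
    return (a - a * b) / (a + b - 2 * a * b)

def log5_vs_truth(talent_a, talent_b, luck_sd):
    # Records against an average (talent 0) team -- the inputs log5 sees:
    p_a, p_b = phi(talent_a / luck_sd), phi(talent_b / luck_sd)
    # True head-to-head probability under the normal model:
    truth = phi((talent_a - talent_b) / luck_sd)
    return round(log5(p_a, p_b), 3), round(truth, 3)

print(log5_vs_truth(0.7, -0.7, 4.2))   # MLB-ish 6:1 luck ratio: (0.63, 0.631) -- nearly exact
print(log5_vs_truth(5.2, -5.2, 11.0))  # NBA-ish 2:1 ratio: (0.821, 0.828) -- favorite shortchanged
```

Under these assumptions, the gap between log5 and the true probability is negligible at the MLB-like ratio but visible at the NBA-like ratio, and always in the direction of underestimating the favorite, which is consistent with the prediction above.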


Anyway, let me summarize what I speculate is true:

1. The log5 formula never works perfectly. Only as the luck/talent ratio goes to infinity will log5 be theoretically perfect. (But, then, the predictions will always be .500 anyway.) In all other cases, log5 will underestimate, to some extent, how much the better team will dominate.

2. For practical purposes, log5 works well when luck is large compared to talent. The 6:1 ratio for a given MLB game seems to be large enough for log5 to give good results.

3. When comparing sports, the more likely it is that the more-talented team beats the less-talented team, the worse log5 will perform. In other words: the bigger the Vegas odds on underdogs, the worse log5 will perform for that sport.

4. You can also estimate how well log5 will perform with a simple test. Take a team near the extremes of the performance scale (say, a .600/.400 team in MLB, or a .750/.250 team in the NBA), and see how it performed specifically against only those teams with talent close to .500.

If a .750 team has a .750 record against teams known to be average, log5 will work great. But if it plays .770 or .800 or .900 ball against teams known to be average, log5 will not work well. 


All this has been mostly just thinking out loud. I could easily be wrong.
