Sunday, May 29, 2016

Leicester City and EPL talent evaluation

Last post, I wondered how accurate bookmakers are at evaluating English Premier League (EPL) team talent. In the comments, Tom Tango suggested a method to figure that out, but one that requires knowing the bookies' actual "over/under" forecasts. I couldn't find those numbers, but bloggers James Grayson and Simon Gleave came to my rescue with three seasons' worth of numbers, and another commenter provided a public link for 2015-16.

(BTW, Simon's blog contains an interesting excerpt from "Soccermatics," a forthcoming book on the sabermetrics of soccer. Worth checking out.)

So, armed with numbers from James and Simon, what can we figure out?

1. SD of actual team points

Here is the SD of team standings points for the previous three seasons:

2015-16: 15.4
2014-15: 16.3
2013-14: 19.2
-------------
Average  17.1

The league was quite unbalanced in 2013-14 compared to the other two seasons. That could have been because the teams were unbalanced in talent, or because the good teams got more lucky than normal and the bad teams more unlucky than normal. For this post, I'm just going to use the average of the three seasons (by taking the root mean square of the three numbers), which is 17.1.

2. Theoretical SD of team luck

I ran a simulation to figure the SD of points just due to luck -- in other words, if you ran a random season over and over, for the same team, how much would it vary in standings points even if its talent were constant? I wound up with a figure around 8 points. It depends on the quality of the team, but 8 is pretty close in almost all cases.
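Here's a minimal Python sketch of that kind of simulation, not necessarily how the original one was set up. It assumes every game is independent with fixed win/draw/loss probabilities, and it ignores home/away effects and opponent strength. The probabilities and the function names are illustrative assumptions, not figures from the data.

```python
import random
import statistics

def simulate_season_points(p_win, p_draw, n_games=38, rng=random):
    """Total points for one simulated season: 3 per win, 1 per draw, 0 per loss."""
    points = 0
    for _ in range(n_games):
        r = rng.random()
        if r < p_win:
            points += 3
        elif r < p_win + p_draw:
            points += 1
    return points

def luck_sd(p_win, p_draw, n_seasons=20000, seed=1):
    """SD of season points across many replays of the same constant-talent team."""
    rng = random.Random(seed)
    totals = [simulate_season_points(p_win, p_draw, rng=rng)
              for _ in range(n_seasons)]
    return statistics.pstdev(totals)

# Assumed probabilities for a roughly average team (illustrative only)
average_team_sd = luck_sd(p_win=0.375, p_draw=0.25)

# Assumed probabilities for a strong team (illustrative only)
strong_team_sd = luck_sd(p_win=0.65, p_draw=0.20)

print(f"average team: {average_team_sd:.1f} points")
print(f"strong team:  {strong_team_sd:.1f} points")
```

Under these assumptions the average team comes out close to 8 points and the strong team somewhat lower, which fits the "depends on the quality of the team" caveat.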

3. Theoretical SD of team talent

By the pythagorean relationship between talent and luck, we get

SD(observed) = 17.1
SD(luck)     =  8.0
-------------------
SD(talent)   = 15.1
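
The arithmetic in steps 1 through 3 can be sketched in a few lines of Python. The variable names are mine, and the luck SD of 8.0 is simply taken as given from the simulation rather than recomputed.

```python
import math

# SD of team standings points, by season (from step 1)
season_sds = {"2015-16": 15.4, "2014-15": 16.3, "2013-14": 19.2}

# Root mean square: average the variances, then take the square root
mean_variance = sum(sd ** 2 for sd in season_sds.values()) / len(season_sds)
sd_observed = math.sqrt(mean_variance)

# Luck SD taken as given from step 2
sd_luck = 8.0

# Variances add, so talent variance = observed variance - luck variance
sd_talent = math.sqrt(sd_observed ** 2 - sd_luck ** 2)

print(f"SD(observed) = {sd_observed:.2f}")
print(f"SD(talent)   = {sd_talent:.2f}")
```

With the rounded season figures as inputs, both results land within a tenth of a point of the 17.1 and 15.1 shown above; the small gap is presumably rounding in the inputs.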

4. SD of bookmaker predictions

If teams vary in talent with an SD about 15 points, then, if the bookmakers were able to evaluate talent perfectly, their estimates would also have an SD of 15 points. But, of course, nobody is able to evaluate that well. For one thing, talent depends on injuries, which haven't happened yet. For another thing, talent changes over time, as players get older and better or older and worse. And, of course, talent depends on the strategies chosen by the manager, and by the players on the pitch in real time.

So, we'd expect the bookies' predictions to have a narrower spread than 15 points. They don't:

16.99 -- 2015-16 Pinnacle (opening)
14.83 -- 2015-16 Pinnacle (closing)
16.17 -- 2015-16 Sporting Index (opening)
17.30 -- 2014-15 Pinnacle (opening)
17.37 -- 2014-15 Pinnacle (closing)
16.95 -- 2014-15 Sporting Index (opening)
15.80 -- 2014-15 Sporting Index (closing)
15.91 -- 2013-14 Pinnacle

Only one of the eight sets of predictions is narrower than the expectation of team talent, and even that one, barely. This surprised me. In the baseball case, the sports books projected a talent spread that was significantly more conservative than the actual spread.

Either the EPL bookmakers are overoptimistic, or the last three Premier League seasons had less luck than the expected 8.0 points.

5. Bookmaker accuracy

If the bookmakers were perfectly accurate in their talent estimates, we'd expect their 20 estimates to wind up being off by an SD of around 8 points, because that's the amount of unpredictable performance luck in a team-season.

In 2014-15, that's roughly what happened:

7.85 -- 2014-15 Pinnacle (opening)
6.37 -- 2014-15 Pinnacle (closing)
6.90 -- 2014-15 Sporting Index (opening)
7.75 -- 2014-15 Sporting Index (closing)

Actually, every one of the bookmakers' lines was more accurate than 8 points! In effect, in 2014-15, the bookmakers exceeded the bounds of human possibility -- they predicted better than the speed of light. What must have happened is: in 2014-15, teams just happened to be less lucky or unlucky than usual, playing almost exactly to their talent.

But the predictions for 2015-16 were way off:

15.17 -- 2015-16 Pinnacle (opening)
14.96 -- 2015-16 Pinnacle (closing)
15.13 -- 2015-16 Sporting Index (opening)

And 2013-14 was in between:

9.77 -- 2013-14 Pinnacle

Again, I'll just go with the overall SD of the three seasons, which works out to about 11 points.

15 points -- 2015-16
7 points -- 2014-15
10 points -- 2013-14
--------------------
11 points -- average

Actually, 11 points is pretty reasonable, considering 8 is the "speed of light" best possible long-term performance.

6. Bookmaker talent inaccuracy

If 11 points is the typical error in estimating performance, what's the average error in estimating talent? That's an easy calculation, by Pythagoras:

11 points -- observed error
8 points -- luck error
----------------------------
8 points -- talent error

That 8 points for talent should really be 7.5, but I'm arbitrarily rounding it up to create a rule of thumb that "talent error = luck error".
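
For anyone who wants to check the numbers in steps 5 and 6, here is a short Python sketch. It uses only the rounded per-season errors listed above, and the variable names are mine.

```python
import math

# Rounded forecast-error SDs by season: 2015-16, 2014-15, 2013-14 (from step 5)
season_errors = [15, 7, 10]

# Overall error: root mean square of the three seasons
observed_error = math.sqrt(sum(e ** 2 for e in season_errors) / len(season_errors))

# Luck SD taken as given from step 2
luck_error = 8.0

# Talent error from the unrounded overall error
talent_error = math.sqrt(observed_error ** 2 - luck_error ** 2)

# Talent error starting from the rounded 11-point figure instead
talent_error_from_11 = math.sqrt(11 ** 2 - luck_error ** 2)

print(f"observed error         = {observed_error:.2f}")
print(f"talent error           = {talent_error:.2f}")
print(f"talent error (from 11) = {talent_error_from_11:.2f}")
```

Starting from the rounded 11 gives about 7.5, as stated; carrying the unrounded figure through gives a value a little closer to 8. Either way, "talent error = luck error" holds up as a rule of thumb.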

7. Bookmaker bias

In step 4, it looked like the bookmakers were overconfident, and predicting a wider spread of talent than actually existed. In other words, it looked like they were trying to predict luck.

If they did that, it would have to mean they were overestimating the good teams, and underestimating the bad teams. That's the only way to get a wider spread.

But, in 2013-14, it was the opposite! The correlation between the bookies' prediction and the eventual error was -0.07. (The "error" includes the sign, of course. The argument isn't that the bookies are more wrong for good teams and bad teams, it's that they're more likely to be wrong in a particular direction.)

In other words, even though Pinnacle seemed to be trying to predict team luck, it worked out for them!

Which means one of these things happened:

1. Pinnacle got really lucky, and their guesses for which teams would have good luck actually worked out;

2. We're wrong in thinking Pinnacle was overconfident by that much. In other words, the spread of talent is wider than we thought it was. Remember, 2013-14 was more unbalanced than the other two seasons we looked at.

I think it's some of each. The SD(talent) estimate for 2013-14 came out to 17.5 points. In that light, Pinnacle's 15.9-point SD isn't *that* overconfident.

... In 2014-15, on the other hand, Pinnacle *did* overestimate the spread. The better the closing line on the team, the less extreme it performed, with a correlation of +0.35. Sporting Index, with their more conservative line, correlated only at +0.11.

Part of the reason the correlations are so high is that 2014-15 was the year random luck balanced out so much more than usual. If teams were moving all over the place in the standings for random reasons, that would tend to hide the bookmakers' tendency to rate the teams too extreme.

... Finally, we come to 2015-16. Now, we see what looks like very strong evidence of overconfidence. For the Pinnacle closing line, the correlation between estimate and overestimate is +.46. The other bookmakers are even higher, at +.52 and +.50.

Much of that comes from two teams. First, and most obviously, Leicester City, predicted at 40.5 points but actually winding up at 81. Second, Chelsea, forecasted the best team in the league at 83 points, but finishing with only 50.

These don't really seem to fit the narrative of "the bookies know who the good and bad teams are, but just tend to overestimate their goodness and badness." But, they kind of do fit the narrative. Favorites like Chelsea are occasionally going to have bad years, so you're going to have an occasional high error. But, that error will be even higher if you overestimated them in the first place.
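
The logic of this section (an overconfident forecaster shows a positive correlation between estimate and overestimate, while a well-calibrated one shows roughly zero) can be illustrated with a small simulation. This is a sketch under loud assumptions: talent and luck are normally distributed, the forecaster knows talent exactly but stretches it away from the league mean by a hypothetical `stretch` factor, and the league mean of 52 points is an arbitrary stand-in. It uses a very large number of simulated teams to show the long-run tendency; a real season has only 20, so single-season correlations are much noisier.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def forecast_vs_overestimate(stretch, n_teams=100_000, mean=52.0,
                             sd_talent=15.0, sd_luck=8.0, seed=1):
    """Correlation between forecast and (forecast - actual points).

    stretch = 1.0 is a calibrated forecaster; stretch > 1.0 is overconfident.
    All parameters are assumptions for illustration.
    """
    rng = random.Random(seed)
    forecasts, overestimates = [], []
    for _ in range(n_teams):
        talent = rng.gauss(mean, sd_talent)
        actual = talent + rng.gauss(0.0, sd_luck)   # talent plus on-field luck
        forecast = mean + stretch * (talent - mean)  # spread widened by stretch
        forecasts.append(forecast)
        overestimates.append(forecast - actual)      # signed error
    return pearson(forecasts, overestimates)

calibrated = forecast_vs_overestimate(stretch=1.0)
overconfident = forecast_vs_overestimate(stretch=1.2)
print(f"calibrated:    {calibrated:+.2f}")
print(f"overconfident: {overconfident:+.2f}")
```

In this toy setup, widening the forecast spread by 20 percent is enough to push the correlation clearly positive, while the calibrated forecaster stays near zero.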

-------

OK, those are the seven sets of numbers we got from James' and Simon's data. What can we conclude?

Well, the question I wanted to answer was: how much are the bookmakers typically off in estimating team talent? Our answer, from #6: about 8 points.

But ... I'm not that confident. These are three weird seasons. Last year, we have Leicester City and their 5000:1 odds. The season before, we have "better than speed of light" predictions, meaning luck cancelled out. And, two years ago, as we saw in #1, we had a lot more great and awful teams than the other two seasons, which suggests that 2013-14 might be an outlier as well.

I'd sure like to have more seasons of data, maybe a decade or so, to get firmer numbers. For now, we'll stick with 8 points as our estimate.

An eight-point standard error means that, typically, one team per season will be mispriced by 16 points or more. That's not necessarily exploitable by bettors. For one thing, bookmaker prices match public perception, so it's hard to be the one genius among millions who sees the exact one team that's mispriced. For another thing, some of what I'm calling "talent" is luck of a different kind, in terms of injuries or players learning or collapsing.
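
The "one team per season" figure can be sketched as follows, assuming talent errors are normally distributed (an assumption that is questioned later in this post). The names are mine.

```python
import math

def two_sided_tail(z):
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

talent_error_sd = 8.0   # estimate from step 6
threshold = 16.0        # "mispriced by 16 points or more", in either direction
n_teams = 20

# Expected number of teams per season beyond the threshold
expected_mispriced = n_teams * two_sided_tail(threshold / talent_error_sd)
print(f"expected teams off by {threshold:.0f}+ points: {expected_mispriced:.2f}")
```

The expected count comes out a little under one team per season.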

We still have the case that Leicester City was off by around 40 points. That's 5 SDs if you think it was all talent. It's also 5 SDs if you think it was all luck.

The "maximum likelihood," then, if you don't know anything about the team, would be if it were 2.5 SDs of each. The odds of that happening are about 1 in 13,000 (1 in 26,000 for each direction).
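
Those odds can be reproduced with a few lines of Python, assuming the talent error and the luck error are independent and both normally distributed. The function name is mine.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# One 2.5 SD event in a given direction
p_single = upper_tail(2.5)

# Talent error AND luck error both 2.5 SD in the same direction
p_one_direction = p_single ** 2

# Either both high or both low
p_either_direction = 2 * p_one_direction

print(f"one direction:    1 in {1 / p_one_direction:,.0f}")
print(f"either direction: 1 in {1 / p_either_direction:,.0f}")
```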

My best guess, though, is to trust the bookmakers' current odds of about 30:1 as an estimate of what Leicester City should have been. How do we translate 30:1 into expected points? As it turns out, Liverpool was 28:1, with an over/under of 66. So let's use 66 points as our true talent estimate.

Under that assumption, Leicester City beat their talent by 15 points of luck (81 minus 66), or a bit less than 2 SD. And their assumed true talent of 66 points beat the bookmakers' estimate of 40 by 26 points, which is 3.25 SD.

That seems much more plausible to me.

Because ... I think it's reasonable to think that luck errors are normally distributed. But I don't think we have any reason to believe that human errors, in estimating team talent, also follow a normal distribution. It seems to me that Leicester City could be a black swan, one that just confounded the normal way bettors and fans thought about performance. They may have been a Babe Ruth jumping into the league -- someone who saw you could win games by breaking the assumptions that led to the typical distribution of home runs.

So, when we see that Leicester was 3.25 SD above the public's estimate of their true talent ... I'm not willing to go with the usual probability of a 3.25 SD longshot (around 1 in 1700). I don't know what the true probability is, but given the "Moneyball" narrative and the team's unusual strategy, I'd suspect those kinds of errors are more common than the normal distribution would predict.

Even if you disagree ... well, with 20 teams, a 1 in 1700 shot comes along every 85 years. It doesn't seem too unreasonable to assume we just saw the inevitable "hundred year storm" of miscalculation.

And, either way, the on-field luck you have to assume -- 15 points -- is less than two standard deviations, which isn't that unusual at all.
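
Here is a sketch of the arithmetic behind this decomposition, using the normal distribution for both pieces purely as a baseline (the argument above is that the talent-error piece probably has fatter tails than that). The 66-point talent figure is the assumption borrowed from Liverpool's line, and the variable names are mine.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

actual_points = 81
assumed_talent = 66      # assumption: borrowed from Liverpool's over/under
bookmaker_estimate = 40  # rounded, as in the text above
sd = 8.0                 # used for both the luck SD and the talent-error SD
n_teams = 20

luck_z = (actual_points - assumed_talent) / sd         # on-field luck, in SDs
talent_z = (assumed_talent - bookmaker_estimate) / sd  # bookmaker miss, in SDs

p_talent_miss = upper_tail(talent_z)
years_between = 1 / (n_teams * p_talent_miss)

print(f"luck:   {luck_z:.2f} SD")
print(f"talent: {talent_z:.2f} SD, about 1 in {1 / p_talent_miss:,.0f}")
print(f"with {n_teams} teams: roughly once every {years_between:.0f} years")
```

The results line up with the "1 in 1700" and "every 85 years" figures above, give or take rounding.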

So that's my best guess at how you can reasonably get Leicester City to 81 points.

At Monday, May 30, 2016 6:21:00 PM,  Anonymous said...

"I think it's reasonable to think that luck errors are normally distributed."

In the EPL scoring system, luck is not zero sum.

If a lucky team gets a win instead of a draw, it gets a two point advantage (3 instead of 1) while the unlucky team loses one point (0 instead of 1).

If a lucky team gets a draw instead of a loss it's advantaged only one point (1 instead of 0), while the unlucky team loses two points (1 instead of 3).

Would that complicate the normal distribution of luck?
