Tuesday, December 18, 2018

Does the NHL's "loser point" help weaker teams?

Back when I calculated that it took 73 NHL games for skill to catch up with luck in the standings, I was surprised it was so high. That's almost a whole season. In MLB, it was less than half a season, and in the NBA, Tango found it was only 14 games, less than one-fifth of the full schedule.

Seventy-three games seemed like a lot of luck. Why so much? As it turns out, it was an anomaly -- the NHL was just having an era where differences in team talent were small. Now, it's back under 40 games.

But I didn't know that at the time, so I had a different explanation: it must be the extra point the NHL started giving out for overtime losses. The "loser point," I reasoned, was reducing the importance of team talent, by giving the worse teams more of a chance to catch up to the better teams.

My line of thinking was something like this: 

1. Loser points go disproportionately to worse teams. For team-seasons, there's a correlation of around .4 between negative goal differential (a proxy for team quality) and OTL. So, the loser point helps the worse teams gain ground on the better teams.

2. Adding loser points adds more randomness. When you lose by one goal, whether that goal comes early in the game, or after the third period, is largely a matter of random chance. That adds "when the goals were" luck to the "how many goals there were" luck, which should help mix up the standings more. In fact, as I write this, the Los Angeles Kings have two more wins and three fewer losses than the Chicago Blackhawks. But, because Chicago has five OTL to the Kings' one, they're actually tied in the standings.

But ... now I realize that argument is wrong. And, the conclusion is wrong. It turns out the loser point actually does NOT help competitive balance in the NHL. 

So, what's the flaw in my old argument? 


I think the answer is: the loser point does affect how compressed the standings get in terms of actual points, but it doesn't have much effect on the *order* of teams. The bottom teams wind up still at the bottom, but (for instance) instead of having only half as many points as the top teams, they have two-thirds as many points.

Here's one way to see that. 

Suppose there's no loser point, so the winner always gets two points and the loser always gets none (even if it was an overtime or shootout loss). 

Now, make a change so the losing team gets a point, but *every time*. In that case, the difference between any two teams gets cut in half, in terms of points -- but the order of teams stays exactly the same. 

The old way, if you won W games, your point total was 2W. Now, it's W+82. Either way, the order of standings stays the same -- it's just that the differences between teams are cut in half, numerically.

It's still true that the "loser point" goes disproportionately to the worse teams -- the 50-32 team gets only 32 loser points, while the 32-50 team gets 50 of them. But that doesn't matter, because those points are never enough to catch up to any other team. 

If you ran the luck vs. skill numbers for the new system compared to the old system, it would work out exactly the same.
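
Here's a minimal sketch of that thought experiment. The five teams and their win totals are made up purely for illustration; the only thing taken from the argument above is the scoring rule (2W under the old system, W+82 when every loss earns a point).

```python
# Hypothetical win totals for five teams in an 82-game season (illustrative only).
wins = {"A": 50, "B": 45, "C": 41, "D": 37, "E": 32}
GAMES = 82

def standings(points):
    """Return team names sorted from most points to fewest."""
    return sorted(points, key=points.get, reverse=True)

# Old system: 2 points per win, nothing for a loss.
old_points = {team: 2 * w for team, w in wins.items()}

# Thought-experiment system: 2 per win plus 1 for EVERY loss, i.e. W + 82.
new_points = {team: 2 * w + (GAMES - w) for team, w in wins.items()}

print(standings(old_points))
print(standings(new_points))

# Gap between the top and bottom team under each system.
old_gap = old_points["A"] - old_points["E"]
new_gap = new_points["A"] - new_points["E"]
print(old_gap, new_gap)
```

The two orderings come out identical, and the top-to-bottom gap under the second system is half the gap under the first.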


In real life, of course, the losing team doesn't get a point every time: only when it loses in overtime. Last season, that happened in about 11.6 percent of team-games, league-wide, or about 23.3 percent of losses.

If the loser point happened in *exactly* 23.3 percent of losses, for every team, with no variation, the situation would be the same as before -- the standings would get compressed, but the order wouldn't change. It would be as if, every loss, the loser got an extra 0.233 points. No team could pass any other team, since for every two points it was behind, it only gets 0.233 points to catch up. 

But: what if you assume that it's completely random which losses become overtime losses? Now, the order can change. A 40-42 team can catch up to a 41-41 team if its losses had randomly included two more overtime losses than its rival. The chance of that happening is helped by the fact that the 40-42 team has one extra loss to try to randomly convert. It needs two random points to catch up, but it starts with a positive expectation of a 0.233-point head start.
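
To put a rough number on that scenario, here's a sketch that treats every loss as an independent coin flip that becomes an overtime loss with probability 0.233. That independence is exactly the "completely random" assumption in the paragraph above, not a claim about how the real NHL works. The function names are my own.

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def catch_up_probability(p, losses_behind=42, losses_ahead=41, deficit=2):
    """Chance the trailing team earns at least `deficit` more OTL points
    than the leading team, if each loss is an OTL with probability p."""
    total = 0.0
    for a in range(losses_behind + 1):       # OTLs for the 40-42 team
        for b in range(losses_ahead + 1):    # OTLs for the 41-41 team
            if a - b >= deficit:
                total += binom_pmf(losses_behind, a, p) * binom_pmf(losses_ahead, b, p)
    return total

# Probability the 40-42 team ties or passes the 41-41 team on points.
print(catch_up_probability(0.233))

# Expected head start from having one extra loss to convert.
print(42 * 0.233 - 41 * 0.233)
```

Setting `p` to 1.0 reproduces the "every loss gets a point" case: the trailing team gains only one point on its rival, so the probability of catching up drops to zero.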

If losses became overtime losses in a random way, then, yes, the OTL would make luck more important, and my argument would be correct. But they don't. It turns out that better teams turn losses into OTL much more frequently than worse teams, on a loss-for-loss basis.

Which makes sense. Worse teams' losses are more likely to be blowouts, which means they're less likely to be close losses. That means fewer one-goal losses, proportionately. 

In other words: 

(a) bad teams have more losses, but 
(b) those losses are less likely to result in an OTL. 

Those two forces work in opposite directions. Which is stronger?

Let's run the numbers from last year to find out.

If we just gave two points for a win, and zero for a loss, we'd have: 

SD(luck)    =  9.06
SD(talent)  = 13.76

But in real life, which includes the OTL, the numbers are

SD(luck)    =  8.48
SD(talent)  = 12.90

Converting so we can compare luck to talent:

35.5 games until talent=luck (no OTL point)
35.4 games until talent=luck (with OTL point)
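
The conversion can be sketched in a couple of lines, assuming the usual rule that the number of games is the season length times the squared ratio of SD(luck) to SD(talent). The function name is mine; the four SDs are the ones quoted above.

```python
def games_until_even(sd_luck, sd_talent, season_games=82):
    """Games needed before SD(talent) equals SD(luck), given full-season SDs."""
    return season_games * (sd_luck / sd_talent) ** 2

print(round(games_until_even(9.06, 13.76), 1))  # no OTL point
print(round(games_until_even(8.48, 12.90), 1))  # with OTL point
```

Both SDs shrink by almost the same proportion when the OTL point is included, which is why the ratio, and so the number of games, barely moves.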

It turns out, the two factors almost exactly cancel out! Bad teams have more chances for an OTL point because they lose more -- but those losses are less likely to be OTL almost in exact proportion.

And that's why I was wrong -- why the OTL point doesn't increase competitive balance, or make the standings less predictable. It just makes the NHL *look* more competitive, by making the point differences smaller.


Wednesday, December 12, 2018

2007-12 was an era of competitive balance in the NHL

Five years ago, I calculated that in the NHL, it took 73 games until talent was as important as luck in determining the standings. But in a previous study, Tango found that it took only 36 games. 

Why the difference?

I think it's because the years for which I ran the study -- 2006-07 to 2011-12 -- were seasons in which the NHL was much more balanced than usual. 

For each of those six seasons, I went to hockey-reference to find the SD of team standings points:

2006-07  16.14
2007-08  10.43
2008-09  13.82
2009-10  12.95
2010-11  13.27
2011-12  11.73
average  13.18  (root mean square)

Tango's study was written in August, 2006. The previous season had a higher spread:

2005-06  16.52

So, I think that's the answer. It just happened that the seasons I looked at had more competitive balance than the season or seasons Tango looked at.

But what's the right answer for today's NHL? Well, it looks like the standings spread in recent seasons has moved back closer to Tango's numbers:

2013-14  14.26
2014-15  15.91
2015-16  12.86
2016-17  15.14
2017-18  15.44
average  14.76

What does that mean for the "number of games" estimate? I'll do the calculation for last season, 2017-18.

From the chart, SD(observed) is 15.44 points. SD(luck) is roughly the same for all years of the shootout era (although it varies very slightly with the number of overtime losses), so I'll use the old study's number of 8.44 points. 

As usual, 

SD(talent)^2 = SD(observed)^2 - SD(luck)^2
SD(talent)^2 = 15.44^2 - 8.44^2
SD(talent)   = 12.93

So last year, SD(talent) was 12.93. For the six seasons I looked at, it was 8.95. 

2006-12   8.95
2017-18  12.93

Now, let's convert to games.*  

*Specifically, "luck as important as talent" means SD(luck)=SD(talent). Formula: using the numbers for a full season, divide SD(luck) by SD(talent), square it, and multiply by the number of games (82).
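
One way to see where that formula comes from, as a sketch: assume game outcomes are independent, so that after n games the spread due to talent grows in proportion to n, while the spread due to luck grows only in proportion to the square root of n. The subscripted notation is mine.

```latex
\begin{gather*}
\mathrm{SD}_{\text{talent}}(n) = \frac{n}{82}\,\mathrm{SD}_{\text{talent}}(82),
\qquad
\mathrm{SD}_{\text{luck}}(n) = \sqrt{\frac{n}{82}}\,\mathrm{SD}_{\text{luck}}(82) \\[6pt]
\mathrm{SD}_{\text{talent}}(n) = \mathrm{SD}_{\text{luck}}(n)
\;\Longrightarrow\;
\sqrt{\frac{n}{82}} = \frac{\mathrm{SD}_{\text{luck}}(82)}{\mathrm{SD}_{\text{talent}}(82)}
\;\Longrightarrow\;
n = 82 \left( \frac{\mathrm{SD}_{\text{luck}}(82)}{\mathrm{SD}_{\text{talent}}(82)} \right)^{2}
\end{gather*}
```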

When SD(talent) is 8.95, like the seasons I looked at, it takes 73 games for luck and talent to even out. When SD(talent) = 12.93, like it was last year, it takes only ... 36 games.

Coincidentally, 36 games is exactly what Tango found in his own sample.

talent=luck, after
2006-12  73 games
2017-18  36 games

Two things we can conclude from this:

1. Actual competitive balance (in terms of talent) does seem to change over time in non-random ways. The NHL from 2006-12 does actually seem to have been a more competitive league than from 2013-18. 

2. The "number of games" way of expressing the luck/talent balance is very sensitive to moderate changes in the observed range of the standings.


To expand a bit on #2 ... 

There must be significant random fluctuations in observed league balance.  We mention that sometimes in passing, but I think we don't fully appreciate how big those random fluctuations can be.

Here, again, is the SD(observed) for the seasons 2014-17:

2014-15  15.91
2015-16  12.86
2016-17  15.14

It seems unlikely that 2015-16 really had that much tighter a talent distribution than the surrounding seasons. What probably happened, in 2015-16, was just a fluke -- the lucky teams happened to be lower-talent, and the unlucky teams happened to be higher-talent. 

In other words, the difference was probably mostly luck. 

A different kind of luck, though -- luck in how each individual team's "regular" luck correlated, league-wide, with their talent. When the better teams (in talent) are luckier than the worse teams, the standings spread goes up. When the worse teams are luckier, the standings get compressed.

Anyway ... the drop in the chart from 15.91 to 12.86 doesn't seem that big. But it winds up looking bigger once you subtract out luck to get to talent:

2014-15  13.49
2015-16   9.70
2016-17  12.57

The difference is more pronounced now. But, check out what happens when we convert to how many games it takes for luck and talent to even out:

Talent=luck, after
2014-15  32 games
2015-16  62 games
2016-17  37 games
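
Here's a sketch of the whole chain for those three seasons, from SD(observed) to SD(talent) to games, assuming SD(luck) is fixed at the 8.44 points used earlier. The function names are mine.

```python
from math import sqrt

SD_LUCK = 8.44        # points; the shootout-era figure used in this post
SEASON_GAMES = 82

# SD of observed standings points, from the chart above.
observed = {"2014-15": 15.91, "2015-16": 12.86, "2016-17": 15.14}

def sd_talent(sd_observed, sd_luck=SD_LUCK):
    """SD(talent)^2 = SD(observed)^2 - SD(luck)^2."""
    return sqrt(sd_observed**2 - sd_luck**2)

def games_from_observed(sd_observed, sd_luck=SD_LUCK, games=SEASON_GAMES):
    """Games until talent = luck: games * SD(luck)^2 / SD(talent)^2."""
    return games * sd_luck**2 / (sd_observed**2 - sd_luck**2)

for season, sd_obs in observed.items():
    print(season, round(sd_talent(sd_obs), 2), round(games_from_observed(sd_obs)))
```

The denominator is a difference of two squares, so as SD(observed) drops toward SD(luck) it shrinks quickly and the games figure climbs steeply. That's the sensitivity at work.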

Now, the differences are too large to ignore. From 2014-15 to 2015-16, SD(observed) went down only 19 percent, but the "number of games" figure nearly doubled.

And that's what I mean by #2 -- the "number of games" estimate is very sensitive to what seem like mild changes in standings variation. 


Just for fun, let's compare 2006-07, one of the most unbalanced seasons, to 2007-08, one of the most balanced. Just looking at the standings, there's already a big difference:

2006-07  16.14
2007-08  10.43

But it becomes *huge* when you express it in games: 

Talent=luck, after
2006-07   31 games
2007-08  156 games

In one year, our best estimate of how many games it takes for talent to exceed luck changed by a factor of *five*. And, I think, almost all that difference is itself just random luck.


Monday, December 03, 2018

Answer to: a flawed argument that marginal offense and defense have equal value

The puzzle from last post was this:  What's wrong with this argument, that a run scored has to be worth exactly the same as a run prevented?

Imagine giving a team an extra run of offense over a season.  You pick a random game, and add on a run, and see if that changes the result.  Maybe it turns an extra-inning loss into a nine-inning win, or turns a one-run loss into an extra-inning game.  Or, maybe it just turns an 8-3 blowout into a 9-3 blowout.

But, it will always be the same as giving them an extra run of defense, right?  Because, it doesn't matter if you turn a 5-4 loss into a 5-5 tie, or into a 4-4 tie.  And it doesn't matter if you turn an 8-3 blowout into a 9-3 blowout, or into an 8-2 blowout.  

Any time one more run scored will change the result of a game, one less run allowed will change it in exactly the same way!  So, how can the value of the run scored possibly be different from the value of the run allowed?

The answer is hinted at by a comment from Matthew Hunt:

"Is it the zero lower bound for runs? You can always increase the number of offensive runs, but you can't hold an opponent to -1 runs."

It's not specifically the zero lower bound -- the argument is wrong even if shutouts are rare -- but it does have to do with the issue of runs prevented.


(Note: for this post, I'm going to treat runs as if they have a Poisson distribution, to make the argument smoother. In reality, runs in baseball come in bunches, and aren't Poisson at all. If that bothers you, just transfer the argument to hockey or soccer, where goals are much closer to Poisson.)


The answer, I think, is this:  If you want to properly remove a single opponent's run from a season, you don't do it by choosing a random game. You have to do it by choosing a random *run*.

When you *add* runs, it's OK to do it by choosing a game first, because all games have roughly equal opportunities to score more runs. But when you *remove* runs, you have to remove a run that's already there ... and you have to weight them all equally when deciding which one to remove.

If you don't weight the runs equally ... well, suppose you have game A with ten runs, and game B with two runs. If you choose a random game first, each B run has five times the chance of being chosen as each of the A runs. 
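
That two-game example can be checked with exact fractions. This is only a sketch of the two selection rules as I read them; the function names are mine.

```python
from fractions import Fraction

runs_by_game = {"A": 10, "B": 2}

def per_run_chance_game_first(runs_by_game):
    """Chance that one specific run is removed if you pick a game at random,
    then pick one of that game's runs at random."""
    n_games = len(runs_by_game)
    return {g: Fraction(1, n_games) * Fraction(1, r) for g, r in runs_by_game.items()}

def per_run_chance_run_first(runs_by_game):
    """Chance that one specific run is removed if every run is equally likely."""
    total = sum(runs_by_game.values())
    return {g: Fraction(1, total) for g in runs_by_game}

game_first = per_run_chance_game_first(runs_by_game)
run_first = per_run_chance_run_first(runs_by_game)

print(game_first)                        # per-run chance in A and in B
print(game_first["B"] / game_first["A"]) # how much likelier a B run is
print(run_first)                         # equal weighting
```

Choosing the game first gives each B run a 1-in-4 chance against 1-in-20 for each A run; choosing the run first gives all twelve runs the same 1-in-12 chance.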

Here's another way of looking at it. Suppose you randomly allocate 700 runs among 162 games, and then you realize you made a mistake, you only meant to allocate 699 runs. You'd look up the 700th run you added, and reverse it. 

But, that 700th run is more likely to come from a high-scoring game than a low-scoring game. Why? Because, before you added the last run, the game you were about to add it to was as average as the 161 other games. But after you add the run, that game must now be expected to be one run more than average. (Actually, 699/700 more, but close enough).

So, if you removed a 700th run by choosing a random game first, you'd be choosing it from an expected average game, not an expected above-average-game. And so your distribution will be more bunched up than it should be, and it would no longer be the same as the distribution would be if you just stopped at 699 runs.
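
A quick simulation makes the point concrete. It's a sketch under the assumption that each of the 700 runs lands in a uniformly random game, independently of the others. The trial count and seed are arbitrary choices of mine.

```python
import random

def average_game_sizes(trials=2000, total_runs=700, n_games=162, seed=1):
    """Compare the average size of the game holding the last run added
    with the average size of a game chosen uniformly at random."""
    rng = random.Random(seed)
    last_run_total = 0
    random_game_total = 0
    for _ in range(trials):
        runs = [0] * n_games
        last_game = 0
        for _ in range(total_runs):
            last_game = rng.randrange(n_games)   # game that receives this run
            runs[last_game] += 1
        last_run_total += runs[last_game]                  # game of the 700th run
        random_game_total += runs[rng.randrange(n_games)]  # a random game
    return last_run_total / trials, random_game_total / trials

via_last_run, via_random_game = average_game_sizes()
print(via_last_run, via_random_game)
```

The game containing the 700th run should average roughly one run more than a randomly chosen game, which is the gap the argument above relies on.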

And, of course, you might randomly choose a shutout, which brings that game's runs to -1, proving more obviously that your distribution is wrong.

You don't actually have to reverse the 700th run ... there's nothing special about that one compared to the other 699. You can pick the first run, or the 167th run, or a random run. But you have to choose a particular run without regard to the game it's in, or any other context.


Why does a random run have a different value from a run from a random game? 

Because the probabilities change. 

For one thing, you're now much less likely to choose a game where you only allowed one run. You probably won those games anyway, so those runs are less valuable than average. Since you choose less valuable runs less often than before, the value of the run goes up.

But, for another thing, you're now much more likely to choose a game where you gave up a lot of runs. You probably lost those games anyway, so the saved run again probably wouldn't help; you'd just lose 8-3 instead of 9-3. Since you're more likely to choose these less-valuable runs than before, the value of the run goes down.

So some runs where the value is low, you're more likely to choose. Others, you're less likely to choose. Which effect dominates? I don't think we can decide easily from this line of thinking alone. We'd have to do some number crunching.

If we did, we'd find out (as the other argument proved) that "choose a run instead of a game" makes runs prevented more valuable when you already score more than you allow, but less valuable when you allow more than you score. 

But, I don't see a way to prove that from this argument. If you do, let me know!


Finally, let me make one part of the argument clearer. Specifically, why is it OK to pick a random game when adding a run *scored*, but not when subtracting a run *allowed*? Shouldn't it be symmetrical?

Actually, it *is* symmetrical.

When you add a run, you're taking a non-run and changing it to a run. Well, there are so many occurrences of non-runs that they're roughly equal in every game. If you think about changing an out to a run, every game has roughly 27 outs, so every game is already equal.

If you think about hockey ... say, every 15-second interval has a chance of a goal. That's 240 segments per game. In a two-goal game, there are 238 non-goal segments that can be converted into a goal. In a 10-goal game, there are only 230 segments. But 230 is so close to 238 that you can treat them as equal.*

(* In a true Poisson distribution, they're exactly equal, because you model the game as an infinite number of intervals. Infinity minus 2 is equal to infinity minus 10.)

When you subtract a run ... the process is symmetrical, but the numbers are different. A two-goal game has only two chances to convert a goal to a non-goal, while a ten-goal game has ten -- five times as many. Instead of a 230:238 ratio, you have a 2:10 ratio. The 2 and 10 aren't close enough to treat as equal.

In theory, the two cases are symmetrical in the sense that both are wrong. But, in practice, choosing goals scored by game is wrong but close enough to treat as right. Choosing goals allowed by game is NOT close enough to treat as right.

The fact that goals are rare compared to non-goals is what makes the difference. That difference is why the statistics textbooks say that Poisson is used for the distribution of "rare events."  

Goals are rare events. Non-goals are not.
