Tuesday, January 15, 2019

Fun with splits

This was Frank Thomas in 1993, a year in which he was American League MVP with an OPS of 1.033.

                 PA   H 2B 3B HR  BB  K   BA   OPS 
'93 F. Thomas   676 174 36  0 41 112 54 .317 1.033  

Most of Thomas's hitting splits were fairly normal:

Home/Road:              1.113/0.950
First vs. Second Half:  0.970/1.114
Vs. RHP/LHP:            1.019/1.068
Outs in inning:         1.023/1.134/0.948
Team ahead/behind/tied: 1.016/0.988/1.096
Early/mid/late innings: 1.166/0.950/0.946
Night/day:              1.071/0.939

But I found one split that was surprisingly large:

              PA   H 2B 3B HR BB  K   BA   OPS  RC/G 
Thomas 1     352 108 22  0 33 58 34 .367 1.251 14.81 
Thomas 2     309  66 14  0  8 54 20 .259 0.796  5.45 

"Thomas 1" was so much better than "Thomas 2" (14.81 runs created per game versus 5.45) that you wouldn't recognize them as the same player.

This is a real split ... it's not a selective-sampling trick, like "team wins vs. losses," where "team wins" were retroactively more likely to have been games in which Thomas hit better. (For the record, that particular split was 1.172/.828 -- this one is wider.)

So what is this split? The answer is ... 


The first line is games on odd-numbered days of the month. The second line is even-numbered days.

In other words, this split is random.

In terms of OPS difference -- 455 points -- it's the biggest odd/even split I found for any player in any season from 1950 to 2016 with at least 251 PA in each half.

If we go down to a 201 PA minimum, the biggest is Ken Phelps in 1987:

1987 Phelps   PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
odd          204  31  3  0  8 39 33 .188 0.695  3.79 
even         208  55 10  1 19 41 42 .329 1.204 13.03 

And if we go down to 101 PA, it's Mike Stanley, again in 1987, but on the opposite days from Phelps:

1987 Stanley  PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
odd          134  42  6  1  6 18 23 .362 1.034 10.49 
even         113  17  2  0  0 13 25 .170 0.455  1.55 

But, from here on, I'll stick to the 251 PA standard.

That 1993 Frank Thomas split was also the biggest gap in home runs, with a 25 HR difference between odd and even (33 vs. 8). Here's another I found interesting -- Dmitri Young in 2001:

2001 D Young  PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
Odd          285  68 12  2  2 18 40 .255 0.639  3.48 
Even         292  95 16  1 19 19 37 .348 1.013  9.51 

Only two of Young's 21 home runs came on odd-numbered days. The binomial probability of that happening randomly (19-2/2-19 or better) is about 1 in 4520.*  And, coincidentally, there were exactly 4516 players in the sample!

(* Actually, it must be more likely than 1 in 4520. The binomial probability assumes each opportunity is independent, and equally likely to occur on an even day as an odd day. But PA tend to come in daily clusters of 3 to 5. Since PA cluster, so do HR.

To see that more easily, imagine extreme clustering, where there are only two games a year (instead of 162), with 250 PA each game. Half of all players would have either all odd PA or all even PA, and you'd see lots of extreme splits.)

For K/BB ratio, check out Derek Jeter's 2004:  

2004 Jeter   PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
odd         362 113 27  1 15 14 63 .325 0.888  7.12 
even        327  75 17  0  8 32 36 .254 0.720  4.40 

There were bigger differences, but I found Jeter's the most interesting. 

In 1978, all 10 of Rod Carew's triples came on even-numbered days:

1978 Carew   PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
odd         333  92 10  0  0 45 34 .319 0.766  5.46 
even        309  96 16 10  5 33 28 .348 0.950  8.69 

A 10-0 split is a 1-in-512 shot. I'd say again that it's actually a bit more likely than that because of PA clustering, but ... Carew actually had *fewer* PA in that situation! 

Oh, and Carew also hit all five of his HR on even days. Combining them into 15-0 is binomial odds of 16383 to 1, if you want to do that.
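For anyone who wants to check the binomial figures in this post (Young's 2-of-21 home runs, Carew's 10-0 triples, and the combined 15-0), here is a minimal Python sketch. It computes the two-tailed chance of a split at least as lopsided as the one observed, assuming each event is independent and equally likely to land on an odd or even day. That's the assumption the PA-clustering footnote says is too generous, so the true chances are probably somewhat better than these figures.

```python
from math import comb


def two_tailed_split_prob(n: int, k: int) -> float:
    """Chance that n events split at least as unevenly as k vs. n - k.

    Assumes each event independently lands on an odd or even day with
    probability 1/2 (ignores the clustering of PA within games).
    """
    extreme = min(k, n - k)
    one_tail = sum(comb(n, i) for i in range(extreme + 1)) / 2**n
    # Count both directions (odd-heavy or even-heavy); cap at 1 for even splits.
    return min(1.0, 2 * one_tail)


young = two_tailed_split_prob(21, 2)      # Young 2001: 19 even, 2 odd HR
carew_3b = two_tailed_split_prob(10, 0)   # Carew 1978: 10 even, 0 odd triples
carew_all = two_tailed_split_prob(15, 0)  # triples and HR combined, 15-0

print(f"Young:       1 in {1 / young:.0f}")
print(f"Carew 3B:    1 in {1 / carew_3b:.0f}")
print(f"Carew 3B+HR: {1 / carew_all - 1:.0f} to 1")
```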

Strikeouts and walks aren't quite as impressive. It's Justin Upton 2013 for strikeouts:

2013 Upton     PA   H 2B 3B HR BB   K   BA  OPS  RC/G
odd           330  71 14  1 16 31 102 .237 0.761 4.67 
even          303  76 13  1 11 44  59 .293 0.875 6.84 

And Mike Greenwell 1988 for walks:

88 Greenwell   PA   H 2B 3B HR BB   K  BA   OPS  RC/G 
odd           357  91 15  3 10 62  18 .308 0.910 7.61 
even          320 101 24  5 12 25  20 .342 0.973 8.85 

Interestingly, Greenwell was actually more productive on the even-numbered days, when he drew fewer than half as many walks.

Finally, here's batting average, Grady Sizemore in 2005:

2005 Sizemore  PA   H 2B 3B HR BB   K  BA   OPS  RC/G 
odd           344  69  9  4 12 26  79 .217 0.660 3.45 
even          348 116 28  7 10 26  53 .360 0.992 9.50 

Another anomaly -- Sizemore hit more home runs on his .217 days than on his .360 days.


Anyway, what's the point of all this? Fun, mostly. But, for me, it did give me a better idea of what kinds of splits can happen just by chance. If it's possible to have a split of 33 odd homers and 8 even homers, just by luck, then it's possible to have 33 first-half homers and 8 second-half homers, just by luck. 

Of course, you should only expect an effect that size once every 40 years or so. It might be more intuitive to go from a 40-year standard to a single-season standard, to get a better idea of what we can expect each year.

To do that, I looked at 1977 to 2016 -- 39 seasons plus 1994. Averaging the top 39 should roughly give us the average for the year. Instead of the average, I figured I'd just (unscientifically) take the 25th biggest ... that's probably going to be close to the median MLB-leading split for the year, taking into account that some years have more than one of the top 39.

For HR, the 25th ranked is Fred McGriff's 2002. It's an impressive 22/8 split:

02 McGriff   PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         297  70 11  1 22 42  47 .275 0.961  7.74 
even        289  73 16  1  8 21  52 .272 0.754  4.89 

For OPS, it's Scott Hatteberg in 2004:

04 Hatteberg PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         312  92 19  0 10 37  23 .335 0.926  8.12 
even        310  64 11  0  5 35  25 .233 0.647  3.47

For strikeouts, it's Felipe Lopez, 2005. Not that huge a deal ... only 27 K difference.

05 F. Lopez  PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         316  78 15  2 12 19  69 .263 0.755  4.75 
even        321  91 19  3 11 38  42 .322 0.928  7.95 

For walks, it's Darryl Strawberry's 1987. The difference is only 23 BB, but to me it looks more impressive than the 27 strikeouts:

87 Strwb'ry  PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         315  77 15  2 19 37  55 .277 0.912  7.02 
even        314  74 17  3 20 60  67 .291 1.045  9.49 

For batting average, number 25 is Omar Infante, 2011, but I'll show you the 24th ranked, which is Rickey Henderson in his rookie card year. (Both players round to a .103 difference.)

1980 Rickey  PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         340 100 13  1  2 60  21 .357 0.903  8.07 
even        368  79  9  3  7 57  33 .254 0.739  4.67 


I'm going to think of this as, every year, the league-leading random split is going to look like those. Some years it'll be higher, some lower, but these will be fairly typical.

That's the league-leading split for *each category*. There'll be a random home/road split of this magnitude (in addition to actual home/road effect). There'll be a random early/late split of this magnitude (in addition to any fatigue/weather effects). There'll be a random lefty/righty split of this magnitude (in addition to actual platoon effects). And so on.

Another way I might use this is to get an intuitive grip on how much I should trust a potentially meaningful split. For instance, if a certain player hits substantially worse in the second half of the season than in the first half, how much should you worry? To figure that out, I'd list a season's biggest even/odd splits alongside the season's biggest early/late splits. If the 20th biggest real split is as big as the 10th biggest random split, then, knowing nothing else, you can start with a guess that there's a 50 percent chance the decline is real.
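The ranking comparison described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a method from the post: the helper name is made up, the split lists are invented, and it assumes the odd/even splits are a fair stand-in for the pure-chance part of the real splits.

```python
def share_of_real_splits_not_chance(real_splits, random_splits, threshold):
    """Rough share of real splits >= threshold that reflect something real.

    Treats the number of random (odd/even) splits reaching the threshold as
    the expected number of chance-only splits among the real ones.
    """
    n_real = sum(1 for s in real_splits if s >= threshold)
    n_random = sum(1 for s in random_splits if s >= threshold)
    if n_real == 0:
        return None  # no real splits that large; nothing to judge
    return max(0.0, 1 - n_random / n_real)


# Invented OPS gaps: 20 first-half/second-half splits of .300 or more,
# but only 10 odd/even splits that large, among 50 players each.
real_gaps = [0.300 + 0.010 * i for i in range(20)] + [0.100] * 30
random_gaps = [0.300 + 0.010 * i for i in range(10)] + [0.100] * 40

threshold = sorted(real_gaps, reverse=True)[19]  # the 20th-biggest real split
print(share_of_real_splits_not_chance(real_gaps, random_gaps, threshold))
```

With 20 real splits at least as big as the threshold and 10 random ones, about half the real ones would be expected from luck alone, so the estimate comes out at 0.5.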

Sure, you could do it mathematically, by figuring out the SD of the various stats. But that's harder to appreciate. And it's not nearly as much fun as being able to say that, in 1978, Rod Carew hit every one of his 10 triples and 5 homers on even-numbered days. Especially when anyone can go to Baseball Reference and verify it.


Tuesday, December 18, 2018

Does the NHL's "loser point" help weaker teams?

Back when I calculated that it took 73 NHL games for skill to catch up with luck in the standings, I was surprised it was so high. That's almost a whole season. In MLB, it was less than half a season, and in the NBA, Tango found it was only 14 games, less than one-fifth of the full schedule.

Seventy-three games seemed like a lot of luck. Why so much? As it turns out, it was an anomaly -- the NHL was just having an era where differences in team talent were small. Now, it's back under 40 games.

But I didn't know that at the time, so I had a different explanation: it must be the extra point the NHL started giving out for overtime losses. The "loser point," I reasoned, was reducing the importance of team talent, by giving the worse teams more of a chance to catch up to the better teams.

My line of thinking was something like this: 

1. Loser points go disproportionately to worse teams. For team-seasons, there's a correlation of around .4 between negative goal differential (a proxy for team quality) and OTL. So, the loser point helps the worse teams gain ground on the better teams.

2. Adding loser points adds more randomness. When you lose by one goal, whether that goal comes early in the game, or after the third period, is largely a matter of random chance. That adds "when the goals were" luck to the "how many goals there were" luck, which should help mix up the standings more. In fact, as I write this, the Los Angeles Kings have two more wins and three fewer losses than the Chicago Blackhawks. But, because Chicago has five OTL to the Kings' one, they're actually tied in the standings.

But ... now I realize that argument is wrong. And, the conclusion is wrong. It turns out the loser point actually does NOT help competitive balance in the NHL. 

So, what's the flaw in my old argument? 


I think the answer is: the loser point does affect how compressed the standings get in terms of actual points, but it doesn't have much effect on the *order* of teams. The bottom teams wind up still at the bottom, but (for instance) instead of having only half as many points as the top teams, they have two-thirds as many points.

Here's one way to see that. 

Suppose there's no loser point, so the winner always gets two points and the loser always gets none (even if it was an overtime or shootout loss). 

Now, make a change so the losing team gets a point, but *every time*. In that case, the difference between any two teams gets cut in half, in terms of points -- but the order of teams stays exactly the same. 

The old way, if you won W games, your point total was 2W. Now, it's W+82. Either way, the order of standings stays the same -- it's just that the differences between teams are cut in half, numerically.

It's still true that the "loser point" goes disproportionately to the worse teams -- the 50-32 team gets only 32 loser points, while the 32-50 team gets 50 of them. But that doesn't matter, because those points are never enough to catch up to any other team. 

If you ran the luck vs. skill numbers for the new system compared to the old system, it would work out exactly the same.
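A minimal sketch of the every-loss-gets-a-point thought experiment, assuming an 82-game season and four hypothetical teams: it checks that the order of the standings doesn't change and that every points gap is cut exactly in half.

```python
from itertools import combinations

GAMES = 82


def points_without_loser_point(wins):
    return 2 * wins


def points_with_loser_point_every_loss(wins):
    # 2 points per win plus 1 per loss: 2W + (82 - W) = W + 82
    return 2 * wins + (GAMES - wins)


wins = {"Team A": 50, "Team B": 45, "Team C": 41, "Team D": 32}  # hypothetical

old = {team: points_without_loser_point(w) for team, w in wins.items()}
new = {team: points_with_loser_point_every_loss(w) for team, w in wins.items()}

old_order = sorted(old, key=old.get, reverse=True)
new_order = sorted(new, key=new.get, reverse=True)
print(old_order == new_order)  # same order of teams

for a, b in combinations(wins, 2):
    print(a, b, old[a] - old[b], new[a] - new[b])  # each gap is halved
```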


In real life, of course, the losing team doesn't get a point every time: only when it loses in overtime. Last season, that happened in about 11.6 percent of team-games league-wide (each game counts once for each team), or about 23.3 percent of losses.

If the loser point happened in *exactly* 23.3 percent of losses, for every team, with no variation, the situation would be the same as before -- the standings would get compressed, but the order wouldn't change. It would be as if, every loss, the loser got an extra 0.233 points. No team could pass any other team, since for every two points it was behind, it only gets 0.233 points to catch up. 

But: what if you assume that it's completely random which losses become overtime losses? Now, the order can change. A 40-42 team can catch up to a 41-41 team if its losses randomly included two more overtime losses than its rival's. The chance of that happening is helped by the fact that the 40-42 team has one extra loss to try to randomly convert. It needs two extra OTL points to catch up, but it starts with an expected 0.233-point head start.
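To put a rough number on the random-OTL scenario, here is a sketch that assumes every loss independently becomes an overtime loss with probability 0.233 (the hypothesis being tested here, which the post goes on to reject). It computes the exact chance that the 40-42 team collects at least two more OTL than the 41-41 team, enough to tie it, and contrasts that with the fixed-rate case, where the trailing team only ever gains 0.233 points.

```python
from math import comb


def binom_pmf(n, p):
    """Probabilities of 0..n successes in n independent trials."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_otl_catch_up(trailer_losses, leader_losses, points_behind, p_otl):
    """P(trailer's OTL count - leader's OTL count >= points_behind).

    Each OTL is worth one point, so that many extra OTL erases the gap.
    Assumes each loss independently becomes an OTL with probability p_otl.
    """
    trailer = binom_pmf(trailer_losses, p_otl)
    leader = binom_pmf(leader_losses, p_otl)
    return sum(
        pt * pl
        for x, pt in enumerate(trailer)
        for y, pl in enumerate(leader)
        if x - y >= points_behind
    )


# 40-42 team vs. 41-41 team: two points back, one extra loss to convert
random_case = prob_otl_catch_up(42, 41, 2, 0.233)

# Fixed-rate case: every loss is worth exactly 0.233 points, no randomness
fixed_head_start = 0.233 * (42 - 41)
fixed_case = 1.0 if fixed_head_start >= 2 else 0.0

print(f"random OTL: {random_case:.3f}, fixed rate: {fixed_case}")
```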

If losses became overtime losses in a random way, then, yes, the OTL would make luck more important, and my argument would be correct. But they don't. It turns out that better teams turn losses into OTL much more frequently than worse teams, on a loss-for-loss basis.

Which makes sense. Worse teams' losses are more likely to be blowouts, which means they're less likely to be close losses. That means fewer one-goal losses, proportionately. 

In other words: 

(a) bad teams have more losses, but 
(b) those losses are less likely to result in an OTL. 

Those two forces work in opposite directions. Which is stronger?

Let's run the numbers from last year to find out.

If we just gave two points for a win, and zero for a loss, we'd have: 

SD(luck)   =  9.06
SD(talent) = 13.76

But in real life, which includes the OTL, the numbers are

SD(luck)   =  8.48
SD(talent) = 12.90

Converting so we can compare luck to talent:

35.5 games until talent=luck (no OTL point)
35.4 games until talent=luck (with OTL point)

It turns out, the two factors almost exactly cancel out! Bad teams have more chances for an OTL point because they lose more -- but those losses are less likely to be OTL almost in exact proportion.
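The conversion from the two SDs to a number of games can be sketched as below, assuming luck and talent are independent. Over n games, luck's SD in standings points grows like the square root of n while talent's grows like n, so they're equal at 82 times the square of SD(luck)/SD(talent).

```python
def games_until_talent_equals_luck(sd_luck, sd_talent, season_games=82):
    """Games at which SD(talent) = SD(luck), given full-season SDs in points.

    Over n games, luck's SD scales by sqrt(n / season_games) and talent's by
    n / season_games; setting them equal gives the formula below.
    """
    return season_games * (sd_luck / sd_talent) ** 2


no_otl = games_until_talent_equals_luck(9.06, 13.76)
with_otl = games_until_talent_equals_luck(8.48, 12.90)
print(f"{no_otl:.1f} games (no OTL point), {with_otl:.1f} games (with OTL point)")
```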

And that's why I was wrong -- why the OTL point doesn't increase competitive balance, or make the standings less predictable. It just makes the NHL *look* more competitive, by making the point differences smaller.


Wednesday, December 12, 2018

2007-12 was an era of competitive balance in the NHL

Five years ago, I calculated that in the NHL, it took 73 games until talent was as important as luck in determining the standings. But in a previous study, Tango found that it took only 36 games. 

Why the difference?

I think it's because the years for which I ran the study -- 2006-07 to 2011-12 -- were seasons in which the NHL was much more balanced than usual. 

For each of those six seasons, I went to hockey-reference to find the SD of team standings points:

2006-07  16.14
2007-08  10.43
2008-09  13.82
2009-10  12.95
2010-11  13.27
2011-12  11.73
average  13.18  (root mean square)

Tango's study was written in August, 2006. The previous season had a higher spread:

2005-06  16.52

So, I think that's the answer. It just happened that the seasons I looked at had more competitive balance than the season or seasons Tango looked at.

But what's the right answer for today's NHL? Well, it looks like the standings spread in recent seasons has moved back closer to Tango's numbers:

2013-14  14.26
2014-15  15.91
2015-16  12.86
2016-17  15.14
2017-18  15.44
average  14.76
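The "average" rows in these two tables are root mean squares rather than plain means (the first table says so, and the second one matches an RMS too), which amounts to averaging the variances. A quick Python check:

```python
from math import sqrt


def root_mean_square(values):
    """Square root of the mean of the squares, i.e. an average of variances."""
    return sqrt(sum(v * v for v in values) / len(values))


# SD of standings points, from the two tables above
seasons_2006_12 = [16.14, 10.43, 13.82, 12.95, 13.27, 11.73]
seasons_2013_18 = [14.26, 15.91, 12.86, 15.14, 15.44]

print(f"{root_mean_square(seasons_2006_12):.2f}")  # 13.18
print(f"{root_mean_square(seasons_2013_18):.2f}")  # 14.76
```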

What does that mean for the "number of games" estimate? I'll do the calculation for last season, 2017-18.

From the chart, SD(observed) is 15.44 points. SD(luck) is roughly the same for all years of the shootout era (although it varies very slightly with the number of overtime losses), so I'll use the old study's number of 8.44 points. 

As usual, 

SD(talent)^2 = SD(observed)^2 - SD(luck)^2
SD(talent)^2 = 15.44^2 - 8.44^2
SD(talent)   = 12.93

So last year, SD(talent) was 12.93. For the six seasons I looked at, it was 8.95. 

2006-12   8.95
2017-18  12.93

Now, let's convert to games.*  

*Specifically, "luck as important as talent" means SD(luck)=SD(talent). Formula: using the numbers for a full season, divide SD(luck) by SD(talent), square it, and multiply by the number of games (82).

When SD(talent) is 8.95, like the seasons I looked at, it takes 77 games for luck and talent to even out. When SD(talent) = 12.93, like it was last year, it takes only ... 36 games.

Coincidentally, 36 games is exactly what Tango found in his own sample.

talent=luck, after
2006-12  77 games
2017-18  36 games

Two things we can conclude from this:

1. Actual competitive balance (in terms of talent) does seem to change over time in non-random ways. The NHL from 2006-12 does actually seem to have been a more competitive league than from 2013-18. 

2. The "number of games" way of expressing the luck/talent balance is very sensitive to moderate changes in the observed range of the standings.


To expand a bit on #2 ... 

There must be significant random fluctuations in observed league balance.  We mention that sometimes in passing, but I think we don't fully appreciate how big those random fluctuations can be.

Here, again, is the SD(observed) for the seasons 2014-17:

2014-15  15.91
2015-16  12.86
2016-17  15.14

It seems unlikely that 2015-16 really had that much tighter a talent distribution than the surrounding seasons. What probably happened, in 2015-16, was just a fluke -- the lucky teams happened to be lower-talent, and the unlucky teams happened to be higher-talent. 

In other words, the difference was probably mostly luck. 

A different kind of luck, though -- luck in how each individual team's "regular" luck correlated, league-wide, with their talent. When the better teams (in talent) are luckier than the worse teams, the standings spread goes up. When the worse teams are luckier, the standings get compressed.

Anyway ... the drop in the chart from 15.91 to 12.86 doesn't seem that big. But it looks bigger once you subtract out luck to get to talent:

2014-15  13.49
2015-16   9.70
2016-17  12.57

The difference is more pronounced now. But, check out what happens when we convert to how many games it takes for luck and talent to even out:

Talent=luck, after
2014-15  32 games
2015-16  62 games
2016-17  37 games

Now, the differences are too large to ignore. From 2014-15 to 2015-16, SD(observed) went down only 19 percent, but the "number of games" figure nearly doubled.

And that's what I mean by #2 -- the "number of games" estimate is very sensitive to what seem like mild changes in standings variation. 
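The whole chain for a single season (subtract luck from the observed spread in quadrature, then convert to games) can be sketched as below, using the post's SD(luck) of 8.44 points for every shootout-era season. It also shows where the sensitivity comes from: the games figure works out to 82 * SD(luck)^2 / (SD(observed)^2 - SD(luck)^2), and that denominator shrinks quickly as SD(observed) gets closer to SD(luck).

```python
from math import sqrt

SD_LUCK = 8.44  # the post's shootout-era SD(luck), in standings points
SEASON_GAMES = 82


def sd_talent(sd_observed, sd_luck=SD_LUCK):
    """SD(talent)^2 = SD(observed)^2 - SD(luck)^2."""
    return sqrt(sd_observed**2 - sd_luck**2)


def games_until_even(sd_observed, sd_luck=SD_LUCK, season_games=SEASON_GAMES):
    """Games until SD(talent) = SD(luck): 82 * (SD(luck) / SD(talent))^2."""
    return season_games * (sd_luck / sd_talent(sd_observed, sd_luck)) ** 2


observed = {"2014-15": 15.91, "2015-16": 12.86, "2016-17": 15.14,
            "2006-07": 16.14, "2007-08": 10.43}
for season, sd_obs in observed.items():
    print(season, f"{sd_talent(sd_obs):.2f}", round(games_until_even(sd_obs)))
```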


Just for fun, let's compare 2006-07, one of the most unbalanced seasons, to 2007-08, one of the most balanced. Just looking at the standings, there's already a big difference:

2006-07  16.14
2007-08  10.43

But it becomes *huge* when you express it in games:

Talent=luck, after
2006-07   31 games
2007-08  156 games

In one year, our best estimate of how many games it takes for talent to exceed luck changed by a factor of five. And, I think, almost all that difference is itself just random luck.


Monday, December 03, 2018

Answer to: a flawed argument that marginal offense and defense have equal value

The puzzle from last post was this:  What's wrong with this argument, that a run scored has to be worth exactly the same as a run prevented?

Imagine giving a team an extra run of offense over a season.  You pick a random game, and add on a run, and see if that changes the result.  Maybe it turns an extra-inning loss into a nine-inning win, or turns a one-run loss into an extra-inning game.  Or, maybe it just turns an 8-3 blowout into a 9-3 blowout.

But, it will always be the same as giving them an extra run of defense, right?  Because, it doesn't matter if you turn a 5-4 loss into a 5-5 tie, or into a 4-4 tie.  And it doesn't matter if you turn an 8-3 blowout into a 9-3 blowout, or into an 8-2 blowout.

Any time one more run scored will change the result of a game, one less run allowed will change it in exactly the same way!  So, how can the value of the run scored possibly be different from the value of the run allowed?

The answer is hinted at by a comment from Matthew Hunt:

"Is it the zero lower bound for runs? You can always increase the number of offensive runs, but you can't hold an opponent to -1 runs."

It's not specifically the zero lower bound -- the argument is wrong even if shutouts are rare -- but it does have to do with the issue of runs prevented.


(Note: for this post, I'm going to treat runs as if they have a Poisson distribution, to make the argument smoother. In reality, runs in baseball come in bunches, and aren't Poisson at all. If that bothers you, just transfer the argument to hockey or soccer, where goals are much closer to Poisson.)


The answer, I think, is this:  If you want to properly remove a single opponent's run from a season, you don't do it by choosing a random game. You have to do it by choosing a random *run*.

When you *add* runs, it's OK to do it by choosing a game first, because all games have roughly equal opportunities to score more runs. But when you *remove* runs, you have to remove a run that's already there ... and you have to weight them all equally when deciding which one to remove.

If you don't weight the runs equally ... well, suppose you have game A with ten runs, and game B with two runs. If you choose a random game first, each B run has five times the chance of being chosen as each of the A runs.

Here's another way of looking at it. Suppose you randomly allocate 700 runs among 162 games, and then you realize you made a mistake, you only meant to allocate 699 runs. You'd look up the 700th run you added, and reverse it. 

But, that 700th run is more likely to come from a high-scoring game than a low-scoring game. Why? Because, before you added the last run, the game you were about to add it to was as average as the 161 other games. But after you add the run, that game must now be expected to be one run more than average. (Actually, 699/700 more, but close enough).

So, if you removed a 700th run by choosing a random game first, you'd be choosing it from an expected average game, not an expected above-average game. And so your distribution would be more bunched up than it should be, and it would no longer match the distribution you'd get if you had just stopped at 699 runs.

And, of course, you might randomly choose a shutout, which brings that game's runs to -1, proving more obviously that your distribution is wrong.

You don't actually have to reverse the 700th run ... there's nothing special about that one compared to the other 699. You can pick the first run, or the 167th run, or a random run. But you have to choose a particular run without regard to the game it's in, or any other context.
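Here is a small numerical check of "random run vs. run from a random game," assuming runs allowed per game are Poisson with a mean of 700/162 (the 700-run season above). Picking a game first weights every game equally; picking a run first weights a game with k runs by k. Under the Poisson assumption, the game you land on by picking a run averages exactly one run more than a typical game, and a shutout can never be chosen.

```python
from math import exp


def poisson_pmf(lam, k_max):
    """P(X = k) for k = 0..k_max, built up recursively."""
    pmf = [exp(-lam)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def selection_weights(lam, by_run, k_max=60):
    """Chance of landing on a game with k runs allowed, k = 0..k_max.

    by_run=False: choose a game uniformly (weight proportional to P(k)).
    by_run=True:  choose a run uniformly (weight proportional to k * P(k)).
    """
    pmf = poisson_pmf(lam, k_max)
    raw = [p * (k if by_run else 1) for k, p in enumerate(pmf)]
    total = sum(raw)
    return [w / total for w in raw]


lam = 700 / 162
by_game = selection_weights(lam, by_run=False)
by_run = selection_weights(lam, by_run=True)

mean_by_game = sum(k * w for k, w in enumerate(by_game))
mean_by_run = sum(k * w for k, w in enumerate(by_run))
print(f"{lam:.3f} {mean_by_game:.3f} {mean_by_run:.3f}")  # mean_by_run = lam + 1
print(by_game[0], by_run[0])  # shutouts: possible by game, impossible by run
```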


Why does a random run have a different value from a run from a random game? 

Because the probabilities change. 

For one thing, you're now much less likely to choose a game where you only allowed one run. You probably won those games anyway, so those runs are less valuable than average. Since you choose less valuable runs less often than before, the value of the run goes up.

But, for another thing, you're now much more likely to choose a game where you gave up a lot of runs. You probably lost those games anyway, so the saved run again probably wouldn't help; you'd just lose 8-3 instead of 9-3. Since you're more likely to choose these less-valuable runs than before, the value of the run goes down.

So some runs where the value is low, you're more likely to choose. Others, you're less likely to choose. Which effect dominates? I don't think we can decide easily from this line of thinking alone. We'd have to do some number crunching.

If we did, we'd find out (as the other argument proved) that "choose a run instead of a game" makes runs prevented more valuable when you already score more than you allow, but less valuable when you allow more than you score. 

But, I don't see a way to prove that from this argument. If you do, let me know!


Finally, let me make one part of the argument clearer. Specifically, why is it OK to pick a random game when adding a run *scored*, but not when subtracting a run *allowed*? Shouldn't it be symmetrical?

Actually, it *is* symmetrical.

When you add a run, you're taking a non-run and changing it to a run. Well, there are so many occurrences of non-runs that they're roughly equal in every game. If you think about changing an out to a run, every game has roughly 27 outs, so every game is already equal.

If you think about hockey ... say, every 15-second interval has a chance of a goal. That's 240 segments per game. In a two-goal game, there are 238 non-goal segments that can be converted into a goal. In a 10-goal game, there are only 230 segments. But 230 is close enough to 238 that you can treat them as equal.*

(* In a true Poisson distribution, they're exactly equal, because you model the game as an infinite number of intervals. Infinity minus 2 is equal to infinity minus 10.)

When you subtract a run ... the process is symmetrical, but the numbers are different. A two-goal game has only two chances to convert a goal to a non-goal, while a ten-goal game has ten -- five times as many. Instead of a 230:238 ratio, you have a 2:10 ratio. The 2 and 10 aren't close enough to treat as equal.

In theory, the two cases are symmetrical in the sense that both are wrong. But, in practice, choosing goals scored by game is wrong but close enough to treat as right. Choosing goals allowed by game is NOT close enough to treat as right.

The fact that goals are rare compared to non-goals is what makes the difference. That difference is why the statistics textbooks say that Poisson is used for the distribution of "rare events."  

Goals are rare events. Non-goals are not.


Tuesday, November 27, 2018

A flawed argument that marginal offense and defense have equal value

Last post, I argued that a defensive run saved isn't necessarily equally as valuable as an extra offensive run scored.  

But I didn't realize that was true right away.  Originally, I thought that they had to be equal.  My internal monologue went like this:

Imagine giving a team an extra run of offense over a season.  You pick a random game, and add on a run, and see if that changes the result.  Maybe it turns an extra-inning loss into a nine-inning win, or turns a one-run loss into an extra-inning game.  Or, maybe it just turns an 8-3 blowout into a 9-3 blowout.

(It turns out that about one time in ten, that run will turn a loss into a win ... see here.  But that's not important right now.)

But, it will always be the same as giving them an extra run of defense, right?  Because, it doesn't matter if you turn a 5-4 loss into a 5-5 tie, or into a 4-4 tie.  And it doesn't matter if you turn an 8-3 blowout into a 9-3 blowout, or into an 8-2 blowout.

Any time one more run scored will change the result of a game, one less run allowed will change it in exactly the same way!  So, how can the value of the run scored possibly be different from the value of the run allowed?

That argument is wrong.  It's obvious to me now why it's wrong, but it took me a long time to figure out the flaw in this argument.

Maybe you're faster than I was, and maybe you have an easier explanation than I do.  Can you figure out what's wrong with this argument?  

(I'll answer next post if nobody gets it.  Also, it helps to think of runs (or goals, or points) as Poisson, even if they're not.)


Monday, October 01, 2018

When is defense more valuable than offense?

Is it possible, as a general rule, for a run prevented to be worth more than a run scored?

I don't think so. 

Suppose every team in the league scored one fewer run, and allowed one fewer run. If runs prevented were more valuable than runs scored, every team would improve. But, then, the league would no longer balance out to .500.

But the values of offensive and defensive runs *are* different for individual teams.

Suppose a team scores 700 runs and allows 600. That's an expected winning percentage of .57647 (Pythagoras, exponent 2). 

Suppose it gains a run of offense, so it scores 701 instead of 700. At 701-600, its expectation becomes .57717, an improvement of .00070.

Now, instead, suppose its extra run comes on defense, and it goes 700-599. Now, its expectation is .57728, an improvement of .00081.

So, for that team, the run saved is more valuable than the run scored.

It turns out that if a team scores more than it allows, a run on defense is more valuable than a run on offense. If a team allows more than it scores, the opposite is true. 
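The Pythagorean arithmetic above, plus the mirror-image case for a team that allows more than it scores, as a short Python sketch (exponent 2, as in the post):

```python
def pythag(runs_scored, runs_allowed, exponent=2):
    """Pythagorean expected winning percentage."""
    rs, ra = runs_scored**exponent, runs_allowed**exponent
    return rs / (rs + ra)


base = pythag(700, 600)
gain_offense = pythag(701, 600) - base
gain_defense = pythag(700, 599) - base
print(f"{base:.5f} +{gain_offense:.5f} (offense) +{gain_defense:.5f} (defense)")

# Mirror image: a team that scores 600 and allows 700
under = pythag(600, 700)
under_offense = pythag(601, 700) - under
under_defense = pythag(600, 699) - under
print(f"underdog: +{under_offense:.5f} (offense) +{under_defense:.5f} (defense)")
```

For the 700-600 team the run saved is worth more; for the 600-700 team the run scored is.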


Just recently, I figured out an intuitive way to show why that happens, without having to use Pythagoras at all. I'm going to switch from baseball to hockey, because if you assume that goals scored have a Poisson distribution, the explanation works out easier.

Suppose the Edmonton Oilers score 5 goals per game, and allow 4. If they improve their offense by a goal a game, the 5-4 advantage becomes 6-4. If they improve their defense by a goal, the 5-4 becomes 5-3.

Which is better? 

Even though both scenarios have the Oilers scoring an average two more goals than the opposition, that doesn't happen every game, because there's random variation in how the goals are distributed among the games. With zero variation, the Oilers win every game 5-3 or 6-4. But, with the kind of variation that actually occurs, there's a good chance that the Oilers will lose some games. 

For instance, Edmonton might win one game 7-1, but lose the next 5-3. Over those two games, the Oilers do indeed outscore their opponents by two goals a game, on average, but they lose one of the two games.

The average is "Oilers finish the game +2." The Oilers fail to win whenever the result falls at least two goals short of that: a deviation of exactly -2 leaves the game tied, and anything worse is a loss.

The more variation around the mean of +2, the greater the chance the Oilers lose. That means the team with the advantage wants less variation in scores, and the underdog wants more variation.

Now, let's go to the assumption that goals follow a Poisson distribution.*  

(*Poisson is the distribution you get if you assume that in any given moment, each team has its own fixed probability of scoring, independent of what happened before. In hockey, that's a reasonable approximation -- not perfect, but close enough to be useful.)

For a Poisson distribution, the SD of the difference in goals is exactly the square root of the total goals expected from both teams combined.

In the 5-3 case, the SD of goal differential is the square root of 8. In the 6-4 case, the SD is the square root of 10. Since root-10 is higher than root-8, the underdog should prefer 6-4, but the favored Oilers should prefer 5-3.

Which means, for the favorite, a goal of defense is more valuable than a goal of offense.
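A sketch of the Poisson argument, assuming each team's goals in a game are independent Poisson variables (so the goal differential follows what statisticians call a Skellam distribution). It sums over all scores up to a cutoff to get the chance of a regulation win and loss, plus the SD of the goal differential, for the 5-3 and 6-4 versions of the Oilers.

```python
from math import exp, sqrt


def poisson_pmf(lam, k_max=40):
    """P(X = k) for k = 0..k_max."""
    pmf = [exp(-lam)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def goal_diff_summary(goals_for, goals_against, k_max=40):
    """Return (P(win), P(loss), SD of goal differential) for Poisson scoring."""
    p_for = poisson_pmf(goals_for, k_max)
    p_against = poisson_pmf(goals_against, k_max)
    win = loss = mean = second_moment = 0.0
    for i, pi in enumerate(p_for):
        for j, pj in enumerate(p_against):
            p, diff = pi * pj, i - j
            mean += p * diff
            second_moment += p * diff * diff
            if diff > 0:
                win += p
            elif diff < 0:
                loss += p
    return win, loss, sqrt(second_moment - mean**2)


win_53, loss_53, sd_53 = goal_diff_summary(5, 3)
win_64, loss_64, sd_64 = goal_diff_summary(6, 4)
print(f"5-3: win {win_53:.3f}, loss {loss_53:.3f}, SD {sd_53:.3f}")  # SD = sqrt(8)
print(f"6-4: win {win_64:.3f}, loss {loss_64:.3f}, SD {sd_64:.3f}")  # SD = sqrt(10)
```

Whatever is left over (1 - win - loss) is a tie after regulation. The favored Oilers both win more and lose less at 5-3 than at 6-4.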

This "proof" is only for Poisson, but, for the other sports, the same logic holds. In baseball, football, soccer, and basketball, the more goals/runs/points per game, the more variation around the expectation.

Think about what a two goal/point/run spread means in the various sports leagues. In the NBA, where 200 points are scored per game, a 2-point spread is almost nothing. In the NFL, it means more. In MLB, it means a lot more. In the NHL, more still. And, in soccer, where the average is fewer than three goals per game, a two-goal advantage is almost insurmountable.


Thursday, September 13, 2018

Are soccer goals scored less valuable than goals prevented?

During this year's World Cup of Soccer, I found a sabermetric soccer book discounted at a Toronto bookstore. It's called "The Numbers Game," and subtitled "Why Everything You Know About Soccer Is Wrong."

Actually, I don't know that much about soccer, but much of the book fails to convince me -- for instance, when the authors argue that defense is more important than offense:

"To see if attacking leads to more wins, and whether defense leads to fewer wins and more draws, we conducted a set of rigorous, sophisticated regression analyses on our Premier League data."

As far as I can tell, the regressions tried to predict team wins based on team goals scored and conceded. The results:

0.230 wins -- value of a goal scored
0.216 wins -- value of a goal conceded

The authors write,

"That means goals created and goals prevented contribute about equally to manufacturing wins in English soccer."

But, when it came to losses:

0.176 losses -- value of a goal scored
0.235 losses -- value of a goal conceded


"... defense provides a more powerful statistical explanation for why teams lose. ... when it came to avoiding defeat, the goals that clubs didn't concede were each 33 percent more valuable than the goals they scored."


The authors argue that 

(a) goals scored and conceded contribute equally to wins;
(b) goals conceded contribute more to losses than goals scored.

Except ... aren't those results logically inconsistent with each other?

Suppose you look at the last 20 games where Chelsea faced Arsenal. From (b), you would deduce, 

If Chelsea had scored one goal fewer, but also conceded one goal fewer, they'd probably have had fewer losses.

That's because, according to the authors' numbers, the lost goal would have cost Chelsea 0.176 losses, but the goal prevented would have saved them 0.235 losses. Net gain: 0.059 fewer losses.

But Chelsea's goals scored are Arsenal's goals conceded, and vice versa. Also, Chelsea's losses are Arsenal's wins, and vice versa. So, you can rephrase that last quote as,

If Arsenal had conceded one goal fewer, but also scored one goal fewer, they'd probably have had fewer wins.

Except ... the authors just argued that goals scored and conceded are *equal* in terms of wins.

Without realizing it, the book simultaneously makes two contradictory arguments!
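Here's that bookkeeping as a short Python sketch, using only the four coefficients quoted from the book. The constant names are mine.

```python
# Per-goal coefficients as quoted from "The Numbers Game".
WINS_PER_GOAL_SCORED = 0.230
WINS_PER_GOAL_CONCEDED = 0.216
LOSSES_PER_GOAL_SCORED = 0.176
LOSSES_PER_GOAL_CONCEDED = 0.235

# Chelsea scores one fewer and concedes one fewer. Losses regression, Chelsea's side:
# the lost goal adds losses, the prevented goal removes losses.
chelsea_loss_change = LOSSES_PER_GOAL_SCORED - LOSSES_PER_GOAL_CONCEDED

# Same games, Arsenal's side: Arsenal concedes one fewer and scores one fewer.
# Wins regression: the prevented goal adds wins, the lost goal removes wins.
arsenal_win_change = WINS_PER_GOAL_CONCEDED - WINS_PER_GOAL_SCORED

print(f"Chelsea losses change by {chelsea_loss_change:+.3f}")
print(f"Arsenal wins change by   {arsenal_win_change:+.3f}")
```

Because every Chelsea loss in those games is an Arsenal win, the two numbers ought to be identical. Instead, the losses regression implies about 0.059 fewer, and the wins regression implies about 0.014 fewer.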


So why did the coefficients for goals scored and goals allowed come out so different in the regression? I think it's just random chance.

If a team scores 20 goals and concedes 20 goals, you'd expect them to win as many games as they lose. But that might not happen if the goals aren't evenly distributed over games. For instance, the team might have lost 19 games by a score of 1-0, while winning a 20th game 20-1. 

In other words, team wins and losses vary randomly from their goal differential expectation. If the teams that underperformed happened to be teams that scored more than they conceded, and the teams that overperformed happened to be teams that conceded more than they scored ... in that case, the regression notices that overperformance is correlated with defense, and adjusts accordingly. And you wind up with the result the authors got.
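To see how randomness alone can pull those coefficients apart, here's a minimal simulation sketch in Python. Everything in it is hypothetical: each team-season is 38 games against a league-average opponent, a team's expected goals scored and conceded per game are drawn uniformly between 1 and 2, goals are Poisson, and the regression is plain least squares with an intercept. Because scoring and conceding are generated the same way, the wins coefficient for a goal scored should match the losses coefficient for a goal conceded (and vice versa), apart from noise.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate(n_teams, games, seed):
    """Hypothetical team-seasons: goals for, goals against, wins, losses per team."""
    rng = random.Random(seed)
    gf, ga, wins, losses = [], [], [], []
    for _ in range(n_teams):
        attack = rng.uniform(1.0, 2.0)   # assumed expected goals scored per game
        defense = rng.uniform(1.0, 2.0)  # assumed expected goals conceded per game
        scored = conceded = w = l = 0
        for _ in range(games):
            f = poisson(rng, attack)
            a = poisson(rng, defense)
            scored += f
            conceded += a
            w += f > a
            l += f < a
        gf.append(scored)
        ga.append(conceded)
        wins.append(w)
        losses.append(l)
    return gf, ga, wins, losses


def fit(x1, x2, y):
    """Least-squares slopes (b1, b2) for y = b0 + b1*x1 + b2*x2, via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


for seed in (1, 2):
    gf, ga, wins, losses = simulate(n_teams=200, games=38, seed=seed)
    w_scored, w_conceded = fit(gf, ga, wins)
    l_scored, l_conceded = fit(gf, ga, losses)
    # Report each as a positive "value per goal," the way the book does.
    print(f"seed {seed}: wins {w_scored:.3f} scored / {-w_conceded:.3f} conceded, "
          f"losses {-l_scored:.3f} scored / {l_conceded:.3f} conceded")
```

The exact coefficients depend on the random seed, which is the point: they move around from run to run even though nothing about the underlying scoring changes.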

(Another source of error is that performance isn't linear in terms of goals; it's Pythagorean. But that's probably a minor issue compared to simple randomness.)

I'd bet that, for the "wins" regression, there was no pattern for which teams randomly outperformed their win projections. But for the "losses" regression, there *was* that kind of pattern, where the teams with better defense did lose fewer games than projected.

I'd bet that if you grouped the games differently, and reran the regression, you'd get a different result. Instead of your regression rows being team-based, like "Chelsea's 38 games from 2007-08," make them time-based, like "the first four weeks of the 2007-08 schedule." That will scramble up the projection anomalies a different way, and I'd bet that the four coefficient estimates wind up much closer to each other.


Thursday, May 03, 2018

NHL referees balance penalty calls between teams

A finding from Michael Lopez shows that the next penalty in an NHL game is significantly less likely to go to the team that's had more penalties so far in the game.

That was a new finding to me. A few years ago, I found that the next penalty is more likely to go to the team that benefited from the (one) most recent penalty -- but I hadn't realized that quantity matters, too.

(My previous research can be found here: part one, two, three.)

So, I dug out my old hockey database to see if I could extend Michael's results. All the findings here are based on the same data as my other study -- regular season NHL games from 1953-54 to 1984-85, as provided by the Hockey Summary Project as at the end of 2011.


Quickly revisiting the old finding: referees do appear to call "make-up" penalties. The team that got the benefit of the most recent power play is almost 50 percent more likely to have the next call go against them. That team got the next penalty 59.7% of the time, versus only 40.3% for the previously penalized team.

39599/98167 .403 -- team last penalized
58568/98167 .597 -- other team
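For the record, here's the arithmetic behind those two lines in Python. The "almost 50 percent" is the ratio of the two counts.

```python
last_penalized = 39599  # next penalty went to the team penalized most recently
other_team = 58568      # next penalty went to the team that just had the power play
total = last_penalized + other_team

print(f"total            {total}")
print(f"last penalized   {last_penalized / total:.3f}")
print(f"other team       {other_team / total:.3f}")
print(f"ratio            {other_team / last_penalized:.2f}")
```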

Now, let's look at total numbers of penalties instead. I've split the data into home and road teams, because road teams do get more penalties -- 52 percent vs. 48 percent overall.  (That difference is mitigated by the fact that referees balance out the calls. The first penalty of the game goes to the road team 54 percent of the time. The drop from 54 percent for the first call, down to 52 percent overall, is due to the referees balancing out the next call or calls.)

So far, nothing exciting. But here's something. It turns out that the *second* call of the game is much more likely than average to be a make-up call:

.703 -- visiting penalty after home penalty
.297 -- home penalty after home penalty

.653 -- home penalty after visiting penalty
.347 -- visiting penalty after visiting penalty

Those numbers are huge. Overall, there are more than twice as many "make-up" calls as "same team" calls.

In this case, quantity and recency are the same thing. Let's move on to the third penalty of the game, where they can be different.  From now on, I'll show the results in chart form:

.705 0-2 
.462 1-1
.243 2-0

Here's how to read the chart: when the home team has gone "0-2" in penalties -- that is, both previous penalties to the visiting team -- it gets 70.5% of the third penalties. When the previous two penalties were split, the home team gets 46.2%, similar to the overall average. When the home team got both previous penalties, though, it draws the third only 24.3% of the time (in other words, the visiting team drew 75.7%).
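Here's a minimal Python sketch of how a chart like this could be tabulated. It assumes each game's penalties are stored in order as a string of "H" (home) and "V" (visitor), with simultaneous penalties already removed; the function name and the sample games are made up for illustration, not taken from the actual database.

```python
from collections import defaultdict


def home_share_by_count(games, n):
    """Home team's share of the n-th penalty (1-based), keyed by (home, visitor)
    penalty counts over the first n-1 penalties. Returns {(h, v): (share, games)}."""
    home = defaultdict(int)
    total = defaultdict(int)
    for seq in games:
        if len(seq) < n:
            continue  # this game never reached an n-th penalty
        prior = seq[: n - 1]
        key = (prior.count("H"), prior.count("V"))
        total[key] += 1
        home[key] += seq[n - 1] == "H"
    return {key: (home[key] / total[key], total[key]) for key in sorted(total)}


# Hypothetical games, one string per game, penalties in order.
sample_games = ["VVH", "HVV", "VHH", "HHV", "VV", "HVH"]
for (h, v), (share, count) in home_share_by_count(sample_games, 3).items():
    print(f"{share:.3f} {h}-{v} ({count})")
```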

Here's the fourth penalty. I've added sample sizes, in parentheses.

.701 0-3 (755)
.559 1-2 (6951)
.373 2-1 (5845)
.261 3-0 (468)

It's a very smooth progression, from .701 down to .261, exactly what you would expect given that make-up calls are so common. 

Here's the fifth penalty:

.677 0-4 ( 195)
.619 1-3 (3244)
.465 2-2 (6950)
.351 3-1 (2306)
.316 4-0 ( 117)

That's the chart that corresponds to Michael Lopez's tweet, and if you scroll back up you'll see that these numbers are pretty close to his.

Sixth penalty:

.667 0-5 (  48)
.637 1-4 (1182)
.520 2-3 (4930)
.413 3-2 (4134)
.323 4-1 ( 773)
.226 5-0 (  31)

Again, the percentages drop every step ("monotonically," as they say in math).

Seventh penalty:

.692 0-6 (  13)
.585 1-5 ( 369)
.577 2-4 (2528)
.489 3-3 (4140)
.399 4-2 (1798)
.379 5-1 ( 219)
.200 6-0 (  13)

Eighth penalty:

.667 0-7 (   3)
.607 1-6 ( 122)
.588 2-5 ( 969)
.527 3-4 (2721)
.422 4-3 (2414)
.374 5-2 ( 652)
.412 6-1 (  68)
.000 7-0 (   1)

Nearly a perfect pattern; only the 6-1 row, with just 68 cases, is out of order. It breaks up a little more for the ninth penalty, but that's probably just small sample size.

.000 0-8 (   1)
.553 1-7 (  38)
.586 2-6 ( 348)
.566 3-5 (1358)
.484 4-4 (2063)
.392 5-3 (1037)
.340 6-2 ( 191)
.333 7-1 (  21)
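One way to check the "drops every step" idea is a small Python function over the (home share, sample size) rows, with an optional minimum sample size. The shares and counts below are copied from the eighth- and ninth-penalty charts; the 100-case cutoff is an arbitrary choice for illustration.

```python
def drops_every_step(rows, min_n=0):
    """True if the home share strictly falls from row to row, ignoring rows with
    fewer than min_n cases. Rows are (home share, sample size), fewest home penalties first."""
    shares = [share for share, n in rows if n >= min_n]
    return all(a > b for a, b in zip(shares, shares[1:]))


# (home share, sample size) from the eighth- and ninth-penalty charts.
eighth = [(.667, 3), (.607, 122), (.588, 969), (.527, 2721),
          (.422, 2414), (.374, 652), (.412, 68), (.000, 1)]
ninth = [(.000, 1), (.553, 38), (.586, 348), (.566, 1358),
         (.484, 2063), (.392, 1037), (.340, 191), (.333, 21)]

for name, rows in [("eighth", eighth), ("ninth", ninth)]:
    print(name, drops_every_step(rows), drops_every_step(rows, min_n=100))
```

With every row included, neither chart falls at every step. Drop the rows with fewer than 100 cases, and both do, which fits the small-sample explanation.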

(This is getting boring, so here's a technical note to break the monotony. I included all penalties, including misconducts. I omitted all cases where both teams took a penalty at the same time, even if one team took more penalties than the other. In fact, I treated those as if they never happened, so they don't break the string. This may cause the results to be incorrect in some cases: for instance, maybe Boston takes a minor, then there's a fight and Montreal gets a major and a minor while Boston gets only a major. Then, Montreal takes a minor. In that case, the study will treat the Montreal minor as a make-up call, when it's really not. I think this happens infrequently enough that the results are still valid.)

I'll give two more cases. Here's the twelfth penalty:

.692 2-9 ( 13)
.623 3-8 ( 61)
.532 4-7 (250)
.506 5-6 (478)
.488 6-5 (459)
.449 7-4 (198)
.457 8-3 ( 35)
.200 9-2 (  5)

Almost perfect.  But ... the pattern does seem to break down later on, at the 14th to 16th penalty (I stopped at 16), probably due to sample size issues. Here's the fourteenth, which I think is the most random-looking of the bunch. You could almost argue that it goes the "wrong way":

.000  2-11 (  1)
.375  3-10 (  8)
.333  4- 9 ( 27)
.516  5- 8 ( 95)
.438  6- 7 (169)
.480  7- 6 (148)
.465  8- 5 ( 71)
.577  9- 4 ( 26)
.600 10- 3 (  5)

Still, I don't think this threatens the overall conclusion that quantity is a factor in make-up calls.


OK, so now we know that quantity matters. But couldn't that mean that recency doesn't matter? We did find that the team with the most recent penalty was less likely to get the next one -- but that might just be because that team is also more likely to have a higher quantity at that point. After all, when a team takes three of the first four penalties, there's a 75 percent chance* it also took the most recent one. 

(* It's actually not 75 percent, because make-up calls make the sequence non-random. But the point remains.)
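If every ordering of the four penalties were equally likely, the 75 percent would come from simple counting. Here's a Python sketch that enumerates the distinct orderings, with "A" as the team that took three penalties and "B" as the team that took one:

```python
from itertools import permutations

# "A" took three of the first four penalties, "B" took one.
# set() collapses duplicate orderings of the identical A's; sorted() gives a stable order.
orderings = sorted(set(permutations("AAAB")))
ends_with_a = sum(order[-1] == "A" for order in orderings)

for order in orderings:
    print("".join(order))
print(f"{ends_with_a} of {len(orderings)} orderings end with an A penalty")
```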

So, maybe the recency effect is just an illusion caused by the quantity effect. Or vice versa.

So, here's what I did: I broke down every row in every table by which team got the most recent call. It turns out that recency does matter.

Let's take that 3-for-4 example I just used:

.619 home team overall     (3244)
.508 after VVVH            ( 486)
.639 after other sequences (2758)

From this, it looks like both aspects are at work. When the home team is "up 3-1" in penalty advantage, it gets only 51 percent of the penalties if its one penalty was the last of the four. That's still more than the 46.1 percent it gets to start the game, or the 46.5 percent it would get if it had been 2-2 instead of 3-1.

This seems to be true for most of the breakdowns -- maybe even all the ones with large enough sample sizes. I'll just arbitrarily pick one to show you ... the ninth penalty, home team 5-3.

.392 home team overall     (1037)
.362 when most recent was H (743)
.469 when most recent was V (294)
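Each "overall" rate should be the sample-weighted average of its two recency rows. Here's a small Python check of both examples; the first combination gives .619, matching the 1-3 row of the fifth-penalty chart, and the second gives .392.

```python
def weighted_share(parts):
    """Combine (share, sample size) pairs into (overall share, total sample size)."""
    total = sum(n for _, n in parts)
    return sum(share * n for share, n in parts) / total, total


# Fifth penalty, home team 1-3: after VVVH vs. after other sequences.
print(weighted_share([(0.508, 486), (0.639, 2758)]))
# Ninth penalty, home team 5-3: most recent H vs. most recent V.
print(weighted_share([(0.362, 743), (0.469, 294)]))
```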

Even better, here's the entire chart for the eighth penalty: overall, vs. last penalty went to the home team ("last H"), vs. last penalty went to the visiting team ("last V").

overall   last H    last V
 .607      .750      .596      1-6 
 .588      .477      .609      2-5 
 .527      .446      .584      3-4 
 .422      .372      .518      4-3 
 .374      .357      .466      5-2 
 .412      .406      .500      6-1 

Clearly, both recency and quantity matter. Holding one constant, the other still follows the "make-up penalty" pattern. 
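Here's a sketch of how the recency breakdown could be tabulated in Python, keying each penalty count by the team that took the most recent penalty. The input format (one "H"/"V" string per game, in order, simultaneous penalties removed), the function name, and the sample games are all hypothetical.

```python
from collections import defaultdict


def home_share_by_count_and_recency(games, n):
    """Home team's share of the n-th penalty (1-based), keyed by
    (home count, visitor count, team that took penalty n-1)."""
    if n < 2:
        raise ValueError("recency needs at least one earlier penalty")
    home = defaultdict(int)
    total = defaultdict(int)
    for seq in games:
        if len(seq) < n:
            continue  # this game never reached an n-th penalty
        prior = seq[: n - 1]
        key = (prior.count("H"), prior.count("V"), prior[-1])
        total[key] += 1
        home[key] += seq[n - 1] == "H"
    return {key: (home[key] / total[key], total[key]) for key in sorted(total)}


# Hypothetical games, one "H"/"V" string per game, penalties in order.
sample_games = ["VVVHV", "VVHVH", "HVVVH", "VHVVV", "VVVH"]
for (h, v, last), (share, count) in home_share_by_count_and_recency(sample_games, 5).items():
    print(f"{h}-{v}, last {last}: {share:.3f} ({count})")
```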

Can we figure out *how much* is recency and *how much* is quantity?  It's probably pretty easy to get a rough estimate with a regression. I'm about to leave for the weekend, but I'll look at that next week. Or you can download the results (spreadsheet here) and do it yourself.
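Here's one way such a regression might look: a minimal Python sketch of a linear probability model, where each row is one penalty after the first, the outcome is whether it went to the home team, and the two predictors are the home-minus-visitor penalty count and whether the home team took the most recent penalty. Rather than the real data, it runs on hypothetical games generated from a made-up rule (a 46 percent home baseline, minus 4 points per net home penalty, minus 8 points if the last call went to the home team), just to show that the regression can pull the two effects apart when both are present.

```python
import random


def simulate_games(n_games, n_penalties, qty_effect, recency_effect, seed=0):
    """Hypothetical penalty sequences ("H"/"V") from a made-up linear rule."""
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        seq = ""
        for _ in range(n_penalties):
            diff = seq.count("H") - seq.count("V")   # home minus visitor so far
            last_h = 1 if seq.endswith("H") else 0   # did home take the last penalty?
            p_home = 0.46 + qty_effect * diff + recency_effect * last_h
            # Comparing against rng.random() effectively clips p_home to [0, 1].
            seq += "H" if rng.random() < p_home else "V"
        games.append(seq)
    return games


def regression_rows(games):
    """One row per penalty after the first: (count difference, last-was-home, next-was-home)."""
    rows = []
    for seq in games:
        for i in range(1, len(seq)):
            prior = seq[:i]
            rows.append((prior.count("H") - prior.count("V"),
                         1 if prior[-1] == "H" else 0,
                         1 if seq[i] == "H" else 0))
    return rows


def fit(rows):
    """Least-squares slopes (b_qty, b_recency), with intercept, via centered normal equations."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


games = simulate_games(10000, 8, qty_effect=-0.04, recency_effect=-0.08)
b_qty, b_recency = fit(regression_rows(games))
print(f"per net home penalty: {b_qty:+.3f}, last call went to home: {b_recency:+.3f}")
```

On real data, you'd build the rows from the actual game sequences instead of simulate_games, and the two slopes would be rough estimates of the quantity and recency effects.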
