Sunday, October 13, 2019

A study on NBA home court advantage

Economist Tyler Cowen often links to NBA studies in his "Marginal Revolution" blog ... here's a recent one, from an August post. (Follow his link to download the study ... you can also find a press release by Googling the title.)

The study used a neural network to try to figure out what factors are most important for home (court) advantage (which I'll call "HCA"). The best fit model used twelve variables: two-point shots made, three-point shots made, and free throws made -- repeated for team at home, opposition on road, team on road, and opposition at home.

The authors write, 

"Networks that include shot attempts, shooting percentage, total points scored, field goals, attendance statistics, elevation and market size as predictors added no improvement in performance. ...

"Contrary to previous work, attendance, elevation and market size were not relevant to understanding home advantage, nor were shot attempts, shooting percentage, overall W-L%, and total points scored."

On reflection, it's not surprising that those other variables don't add anything ... the ones they used, shots made, are enough to actually compute points scored and allowed. Once you have that, what does it matter what the attendance was? If attendance matters at all, it would affect wins through points scored and allowed, not something independent of scoring. And "total points scored" weren't "relevant" because they were redundant, given shots made.


The study then proceeds to a "sensitivity analysis," where they increase the various factors, separately, to see what happens to HCA. It turns out that when you increase two-point shots made by 10 percent, you get three to four times the impact on HCA compared to when you increase three-point shots made by the same 10 percent.

The authors write,

"[This] suggests teams can maximize their advantage -- and hence their odds of winning -- by employing different shot selection strategies when home versus away. When playing at home, teams can maximize their advantage by shooting more 2P and forcing opponents to take more 2P shots. When playing away, teams can minimize an opponent's home advantage by shooting more 3P and forcing opponents to take more 3P shots."

Well, yes, but, at the same time, no. 

The reason increasing 2P by 10 percent leads to a bigger effect than increasing 3P by 10 percent is ... that 10 percent of 2P is a lot more points! Eyeball the graph of "late era" seasons the authors used (I assume it's the sixteen seasons ending with 2015-16). Per team-season, it looks like the average is maybe 2500 two-point shots made, but only 500 three-point shots.

Adding 10 percent more 2P is 250 shots for 500 points. Adding 10 percent more 3P is 50 shots for 150 points. 500 divided by 150 gives a factor of three-and-a-third -- almost exactly what the paper shows!
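That arithmetic, as a quick check (the 2,500 and 500 are the eyeballed totals above, not the paper's exact figures):

```python
# Rough per-team-season totals eyeballed from the paper's "late era" graph
two_pt_made = 2500
three_pt_made = 500

extra_2p_points = 0.10 * two_pt_made * 2    # 10% more 2P makes = 500 points
extra_3p_points = 0.10 * three_pt_made * 3  # 10% more 3P makes = 150 points

ratio = extra_2p_points / extra_3p_points
print(ratio)   # ~3.33, right in the paper's three-to-four-times range
```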

I'd argue that what the study discovered is that points seem to affect HCA and winning percentage equally, regardless of how they are scored. 


Even so, the argument in the paper doesn't work. By the authors' own choice of variables, HCA is increased by *making* 2P shots, not by *taking* 2P shots. Rephrasing the above quote, what the study really shows is,

"When playing at home, teams can maximize their advantage by concentrating on *making* more 2P and on forcing opponents to *miss* more 2P. That's assuming that it's just as easy to impact 2P percentages by 10 percent than to impact 3P percentages by 10 percent."

But we could have figured that out easily, just by noticing that 10 percent of 2P is more points than 10 percent of 3P.


The authors found that you increase your HCA more with a 10 percent increase in road three-pointers than by a 10 percent increase in road two-pointers. 

Sure. But that's because 10 percent more 3P adds fewer road points than 10 percent more 2P would. Fewer extra road points means fewer extra road wins. Which makes your HCA larger, since winning fewer road games increases the difference between home and road. 

It's because the worse you do on the road, the bigger your home court advantage!

Needless to say, you don't really want to increase your HCA by tanking road games. The authors didn't notice that's what they were suggesting.

I think the issue is that the paper assumes that increasing your HCA is always a good thing. It's not. It's actually neutral. The object isn't to increase or decrease your HCA. It's to *win more games*. You can do that by winning more games at home, increasing your home court advantage, or by winning more games on the road, decreasing your home court advantage.

It's one of those word biases we all have if we don't think too hard. "Increasing your advantage" sounds like something we should strive for. The problem is, in this context, the home "advantage" is relative to *your own performance* on the road. So it really isn't an "advantage," in the sense of something that makes you more likely to beat the other team. 

In fact, if you rotate "Home Court Advantage" 360 degrees and call it "Road Court Disadvantage," now it feels like you want to *decrease* it -- even though it's exactly the same number!

But HCA isn't something you should want to increase or decrease for its own sake. It's just a description of how your wins are distributed.


Friday, September 06, 2019

Evidence confirming the DH "penalty"

In "The Book," Tango/Lichtman/Dolphin found that batters perform significantly worse when they play a game as DH than when they play a fielding position. Lichtman (MGL) later followed up with detailed results -- a difference of about 14 points of wOBA. That translates to about 6 runs per 500 PA.

A side effect of my new "luck" database is that I'm able to confirm MGL's result in a different way.

The way my luck algorithm works: it tries to "predict" a player's season by averaging the rest of his career -- before and after -- while adjusting for league, park, and age. Any difference between actual and predicted I ascribe to luck.
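As a minimal sketch of that idea (my own phrasing, not the actual database code; the league/park/age adjustments are stubbed out as a single multiplier):

```python
def predicted_rate(seasons, year, adjust=lambda y: 1.0):
    """Predict a season's runs-created rate from the rest of the career.

    seasons: {year: (runs_created, pa)}.  adjust(y) is a stand-in for the
    league/park/age multiplier; the default applies no adjustment.
    """
    rest = [(rc / adjust(y), pa) for y, (rc, pa) in seasons.items() if y != year]
    rate = sum(rc for rc, _ in rest) / sum(pa for _, pa in rest)
    return rate * adjust(year)

def luck_per_500(seasons, year, adjust=lambda y: 1.0):
    """Actual minus predicted performance, in runs per 500 PA."""
    rc, pa = seasons[year]
    return 500 * (rc / pa - predicted_rate(seasons, year, adjust))
```

A batter who creates 60 runs in 500 PA against a rest-of-career norm of 50 per 500 shows +10 runs of luck.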

I calibrated the algorithm so the overall average luck, over thousands of player-seasons, works out to zero. For most breakdowns -- third basemen, say, or players whose first names start with "M" -- average luck stays close to zero. But, for seasons where the batter was exclusively a DH, the average luck worked out negative -- an average of -3.8 runs per 500 PA.  I'll round that to 4.

-6 R/500PA  MGL
-4 R/500PA  Phil

My results are smaller than what MGL found, but that's probably because we used different methods. I considered only players who never played in the field that year. MGL's study also included the DH games of players who did play fielding positions. 

(My method also included pinch hitters who never fielded that year. I made sure to cover the same set of seasons as MGL -- 1998 to 2012.)

MGL's study would have included players who were DHing temporarily because they were recovering from injury, and I'm guessing that's the reason for my missing 2 runs.

But, what about the 4 runs we have in common? What's going on there? Some possibilities:

1. Injury. Maybe when players spend a season DHing, they're more likely to be recovering from some longer-term problem, which also winds up impacting their hitting.

2. It's harder to bat as a DH than when playing a position. As "The Book" suggests, maybe "there is something about spending two hours sitting on the bench that hinders a player's ability to make good contact with a pitch."

3. Selective sampling. Most designated hitters played a fielding position at some point earlier in their careers. The fact that they are no longer doing so suggests that their fielding ability has declined. Whatever aspect of aging caused the fielding decline may also have affected their batting. In that case, looking at DHs might be selectively choosing players who show evidence of having aged worse than expected.

4. Something else I haven't thought of.

You could probably get a better answer by looking at the data a little closer. 

For the "harder to DH" hypothesis, you could isolate PA from the top of the first inning, when all hitters are on equal footing with the DH, since the road team hasn't been out on defense yet. And, for the "injury" hypothesis, you could maybe check batters who had DH seasons in the middle of their careers, rather than the end, and check if those came out especially unlucky. 

One test I was able to do is a breakdown of the full-season designated hitters by age:

Age     R/500PA   sample size
28-32    -13.7     2,316 PA
33-37    - 6.4     4,305 PA
38-42    + 1.4     6,245 PA

(I've left out the age groups with too few PA to be meaningful.)

Young DHs underperform, and older DHs overperform. I think that's suggestive more of the injury and selective-sampling explanations than of the "it's hard to DH" hypothesis. 


UPDATE: This 2015 post by Jeff Zimmerman finds a similar result. Jeff found that designated hitters had a larger "penalty" for the season in cases where they normally played a fielding position, or when they spent some time on the DL.


Wednesday, August 14, 2019

Aggregate career year luck as evidence of PED use

Back in 2005, I came up with a method to try to estimate how lucky a player was in a given season (see my article in BRJ 34, here). I compared his performance to a weighted average of his two previous seasons and his two subsequent seasons, and attributed the difference to luck.

I'm working on improving that method, as I've been promising Chris Jaffe I would (for the last eight years or something). One thing I changed was that now, I use a player's entire career as the comparison set, instead of just four seasons. One reason I did that is that I realized that, the old way, a player's overall career luck was based almost completely on how well he did at the beginning and end of his career.

The method I used was to weight the four surrounding seasons in a ratio of 1/2/2/1. If the player didn't play all four of those years, the missing seasons just get left out.

So, suppose a batter played from 1981 to 1989. The sum of his luck wouldn't be zero:

(81 luck) = (81)                     - 2/3(82) - 1/3(83) 
(82 luck) = (82) - 2/5(81)           - 2/5(83) - 1/5(84) 
(83 luck) = (83) - 2/6(82) - 1/6(81) - 2/6(84) - 1/6(85) 
(84 luck) = (84) - 2/6(83) - 1/6(82) - 2/6(85) - 1/6(86) 
(85 luck) = (85) - 2/6(84) - 1/6(83) - 2/6(86) - 1/6(87) 
(86 luck) = (86) - 2/6(85) - 1/6(84) - 2/6(87) - 1/6(88) 
(87 luck) = (87) - 2/6(86) - 1/6(85) - 2/6(88) - 1/6(89)
(88 luck) = (88) - 2/5(87) - 1/5(86) - 2/5(89) 
(89 luck) = (89) - 2/3(88) - 1/3(87) 
total luck = 13/30(81) - 1/6(82) - 7/30(83) - 1/30(84) - 1/30(86) - 7/30(87) - 1/6(88) + 13/30(89)

(*Year numbers not followed by the word "luck" refer to player performance level that year).


If a player has a good first two years and last two years, he'll score lucky. If he has a good second, third, or fourth year (or the same counting from the end of his career), he'll score unlucky. The years in the middle (in this case, 1985, but, for longer careers, any seasons other than the first four and last four) cancel out and don't affect the total.
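Those coefficients are easy to get wrong by hand, so here's a short script, using exact fractions, that rebuilds them from the 1/2/2/1 scheme (my own sketch, not the actual implementation):

```python
from fractions import Fraction

YEARS = list(range(1981, 1990))   # the 1981-89 career

def luck_coeffs(year):
    """Coefficients of each season's performance in `year`'s luck estimate."""
    coeffs = {year: Fraction(1)}
    weights = {year - 2: 1, year - 1: 2, year + 1: 2, year + 2: 1}
    available = {y: w for y, w in weights.items() if y in YEARS}
    total_weight = sum(available.values())
    for y, w in available.items():
        coeffs[y] = coeffs.get(y, Fraction(0)) - Fraction(w, total_weight)
    return coeffs

# Sum each season's coefficient across all nine single-year luck estimates
total = {y: Fraction(0) for y in YEARS}
for year in YEARS:
    for y, c in luck_coeffs(year).items():
        total[y] += c

for y in YEARS:
    print(y, total[y])
```

The first and last seasons come out at 13/30, the third season from either end at -7/30, and the middle season at zero; the coefficients sum to zero overall, so it's where the good years fall, not how good the career is, that drives the total.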

Now, by comparing each year to the player's entire career, that problem is gone. Now, every player's luck will sum close to zero (before regressing to the mean).

It's not that big a deal, but it was still worth fixing.


This meant I had to adjust for age. The old way, when a player was (say) 36, his estimate was based on his performance from age 34-38 ... reasonably close to 36. Although players decline from 34 to 38, I could probably assume that the decline from 34 to 36 was roughly equal to the decline from 36 to 38, so the age biases would cancel out.

But now, I'm comparing a 36-year-old player to his entire career ... say, from age 25 to 38. Now, we can't assume the 25-35 years, when the player was in his prime, cancel out the 37-38 years, when he's nowhere near the player he was.


So ... I have to adjust for age. What adjustment should I use? I don't think there's an accepted aging scale. 

But ... I think I figured out how to calculate one.

Good luck should be exactly as prevalent as bad luck, by definition. That means that when I look at all players of any given age, the total luck should add up to zero.

So, I experimented with age adjustments until all ages had overall luck close to zero. It wasn't possible to get them to exactly zero, of course, but I got them close.

From age 20 to 36, for both batting and pitching, no single age came out lucky or unlucky by more than half a run per 500 PA. Outside that range, there were sample size issues, but that's OK: with samples that small, you wouldn't expect the luck to come out close to zero anyway.
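That trial-and-error search can be automated. Here's a toy version of the idea -- entirely a sketch, with `avg_luck_by_age` standing in for rerunning the full luck computation under a given set of adjustments:

```python
def calibrate(adjustments, avg_luck_by_age, rate=0.02, iters=1000):
    """Nudge each age's adjustment until average luck at that age is ~zero.

    Positive average luck at an age means that age is under-adjusted, so its
    multiplier goes up; negative luck pushes it down.
    """
    adj = dict(adjustments)
    for _ in range(iters):
        for age, luck in avg_luck_by_age(adj).items():
            adj[age] *= 1 + rate * luck
    return adj

# Toy stand-in: known "true" age factors, with luck proportional to the error
true_factor = {25: 0.90, 30: 0.92, 35: 0.83}
fake_luck = lambda adj: {a: 10 * (true_factor[a] - adj[a]) for a in adj}

calibrated = calibrate({a: 1.0 for a in true_factor}, fake_luck)
print(calibrated)   # converges to the true factors
```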


Anyway, it occurred to me: maybe this is an empirical way to figure out how players age! Even if my "luck" method isn't perfect, as long as it's imperfect roughly the same way for various ages, the differences should cancel out. 

As I said, I'm still fine-tuning the adjustments, but, for what it's worth, here's what I have for age adjustments for batting, from 1950 to 2016, denominated in Runs Created per 500 PA:

      age(1-17) = 0.7
        age(18) = 0.74
        age(19) = 0.75
        age(20) = 0.775
        age(21) = 0.81
        age(22) = 0.84
        age(23) = 0.86
        age(24) = 0.89
        age(25) = 0.9
        age(26) = 0.925
        age(27) = 0.925
        age(28) = 0.925
        age(29) = 0.925
        age(30) = 0.91
        age(31) = 0.8975
        age(32) = 0.8775
        age(33) = 0.8625
        age(34) = 0.8425
        age(35) = 0.8325
        age(36) = 0.8225
        age(37) = 0.8025
        age(38) = 0.7925
     age(39-42) = 0.7
       age(43+) = 0.65

These numbers only make sense relative to each other. For instance, players created 11 percent more runs per PA at age 24 than they did at age 37 (.89 divided by .8025 equals 1.11).

(*Except ... there might be an issue with that. It's kind of subtle, but here goes.

The "24" number is based on players at age 24 compared to the rest of their careers. The "37" number is based on players at age 37 compared to the rest of their careers. It doesn't necessarily follow that the ratio is the same for those players who were active both at 24 and 37. 

If you don't see why: imagine that every active player had to retire at age 27, and was replaced by a 28-year-old who never played MLB before. Then, the 17-27 groups and the 28-43 groups would have no players in common, and the two sets of aging numbers would be mutually exclusive. (You could, for instance, triple all the numbers in one group, and everything would still work.)

In real life, there's definitely an overlap, but only a minority of players straddle both groups. So, you could have somewhat of the same situation here, I think.

I checked batters who were active at both 24 and 37, and had at least 1000 PA combined for those two seasons. On average, they showed lucky by +0.2 runs per 500 PA. 

That's fine ... but from 750 to 999 PA, there were 73 players, and they showed unlucky by -3.7 runs per 500 PA. 

You'd expect those players with fewer PA to have been unlucky, since if they were lucky, they'd have been given more playing time. (And players with more PA to have been lucky.)  But is 3.7 runs too big to be a natural effect? (And is the +0.2 runs too small?)

My gut says: maybe, by a run or two. Still, if this aging chart works for this selective sample within a couple of runs in 500 PA, that's still pretty good.

Anyway, I'm still thinking about this, and other issues.)


In the process of experimenting with age adjustments, I found that aging patterns weren't constant over that 67-year period. 

For instance: for batters from 1960 to 1970, the peak ages from 27 to 31 all came out unlucky (by the standard of 1950-2015), while 22-26 and 32-34 were all lucky. That means the peak was lower that decade, which means more gentle aging. 

Still: the bias was around +1/-1 run of luck per 500 PA -- still pretty good, and maybe not enough to worry about.


If the data lets us see different aging patterns for different eras, we should be able to use it to see the effects of PEDs, if any.

Here's luck per 500 PA by age group for hitters, 1995 to 2004 inclusive:

-1.75   age 17-22
-0.74   age 23-27
+0.61   age 28-32
+0.99   age 33-37
+0.45   age 38-42

That seems like it's in the range we'd expect given what we know, or think we know, about the prevalence of PEDs during that period. It's maybe 2/3 of a run better than normal for ages 28 to 42. If, say, 20 percent of hitters in that group were using PEDs, that would be around 3 runs each. Is that plausible? 

Here's pitchers:

-1.22   age 17-22
-0.51   age 23-27
+1.36   age 28-32 
+1.42   age 33-37 
+1.07   age 38-42 

Now, that's pretty big (and statistically significant), all the way from 28 to 42: for a starter who faces 800 batters, it's about 2 runs. If 20 percent of pitchers are on PEDs, that's 10 runs each.

By checking the post-steroid era, we can check the opposing argument that it's not PEDs, it's just better conditioning, or some such. Here's pitchers again, but this time 2007-2013:

-0.06   age 17-22
+1.01   age 23-27
+0.30   age 28-32
-1.67   age 33-37
+0.59   age 38-42

Now, from 28 to 42, pitchers were *unlucky* on average, overall.

I'd say this is pretty good support for the idea that pitchers were aging better due to PEDs ... especially given actual knowledge and evidence that PED use was happening.


Tuesday, March 26, 2019

True talent levels for individual players

(Note: Technical post about practical methods to figure MLB distribution of player talent and regression to the mean.)


For a long time, we've been using the "Palmer/Tango" method for estimating the spread of talent among MLB teams. You're probably sick of seeing it, but I'll run it again real quick for 2013:

1. Find the SD of observed team winning percentage from the standings. In 2013, SD(observed) was 0.0754.

2. Calculate the theoretical SD of luck in a team-season. Statistical theory tells us the formula is the square root of p(1-p)/162, where p is the probability of winning. Assuming teams aren't that far from .500, SD(luck) works out to around 0.039.

3. Since luck is independent of talent, we can say that SD(observed)^2 = SD(luck)^2 + SD(talent)^2 . Substituting the numbers gives our estimate that SD(talent) = 0.0643. 
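The three steps, in code, with the 2013 numbers:

```python
import math

sd_observed = 0.0754                    # SD of 2013 team winning percentages
sd_luck = math.sqrt(0.5 * 0.5 / 162)    # binomial SD over 162 games, ~0.0393
sd_talent = math.sqrt(sd_observed ** 2 - sd_luck ** 2)
print(round(sd_talent, 4))              # ~0.064
```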

That works great for teams. But what about players? What's the spread of talent, in, say, on-base percentage, for individual hitters?

It would be great to use the same method, but there's a problem. Unlike team-seasons, where every team plays 162 games, every player bats a different number of times. Sure, we can calculate SD(luck) for each hitter individually, based on his playing time, but then how do we combine them all into one aggregate "SD(luck)" for step 3? 

Can we use the average number of plate appearances? I don't think that would work, actually, because the SD isn't linear. It's inversely proportional to the square root of PA, but even if we used the average of that, I still don't think it would work.

Another possibility is to consider only batters with close to some arbitrary number of plate appearances. For instance, we could just take players in the range 480-520 PA, and treat them as if they all had 500 PA. That would give a reasonable approximation.

But, that would only help us find talent for batters who make it to 500 PA. Those batters are generally the best in baseball, so the range we find will be much too narrow. Also, batters who do make it to 500 PA are probably somewhat lucky (if they started off 15-for-100, say, they probably wouldn't have been allowed to get to 500). That means our theoretical formula for binomial luck probably wouldn't hold for this sample.

So, what do we do?

I don't think there's an easy way to figure that out. Unless Tango already has a way ... maybe I've missed something and reinvented the wheel here, because after thinking about it for a while, I came up with a more complicated method. 

The thing is, we still need to have all hitters have the same number of PA. 

We take the batter with the lowest playing time, and use that. It might be 1 PA. In that case, for all the hitters who have more than 1 PA, we reduce them down to 1 PA. Now that they're all equal, we can go ahead and run the usual method. 

Well, actually, that's a bit of an exaggeration ... 1 PA doesn't work. It's too small, for reasons I'll explain later. But 20 PA does seem to work OK. So, we reduce all batters down to 20 PA.*  

*The only problem is, we'll only be finding the talent range for the subset of batters who are good (or lucky) enough to make it to 20 plate appearances. That should be reasonable enough for most practical purposes, though.  

How do we take a player with 600 PA, and reduce his batting line to 20 PA? We can't just scale down. Proportionally, there's much less randomness in large samples than small, so if we treated a player's 20 PA as an exact replica of his performance in 600 PA, we'd wind up with the "wrong" amount of luck compared to what the formulas expect, and we'd get the wrong answer.

So, what I did was: I took a random sample of 20 PA from every batter's batting line, sampling "without replacement" (which means not using the same plate appearance twice). 
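Concretely, the sampling step might look like this (a sketch -- real batting lines have more outcome categories than three):

```python
import random

def sample_pa(batting_line, n=20, rng=random):
    """Draw n plate appearances without replacement from a season's line.

    batting_line maps outcome -> count, e.g. {"H": 160, "BB": 60, "OUT": 380}.
    Returns outcome counts for the n sampled PA.
    """
    pool = [outcome for outcome, count in batting_line.items()
            for _ in range(count)]
    picked = rng.sample(pool, n)
    return {outcome: picked.count(outcome) for outcome in batting_line}

line = {"H": 160, "BB": 60, "OUT": 380}   # a 600-PA season, .367 OBP
small = sample_pa(line)                   # 20 PA, sampling luck included
```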

Once that's done, and every hitter is down to 20 PA, we can just go ahead and use the standard method. Here it is for 2013:

1. There were 602 non-pitchers in the sample. The SD of the 602 observed batter OBP values (based on 20 PA per player) was 0.1067.

2. Those batters had an aggregate OBP of .2944. The theoretical SD(luck) in 20 PA with a .2944 expectation is 0.1019.

3. The square root of (0.1067 squared - 0.1019 squared) equals 0.0317.

So, our estimate of SD(talent) = 0.0317. 

That implies that 95% of batters range between .247 and .373. Seems pretty reasonable.


I think this method actually works quite decently. One issue, though, is that it includes a lot of randomness. All the regulars with 500 or 600 plate appearances ... we just randomly pick 20, and ignore the rest. The result is sensitive to which random numbers are pulled. 

How sensitive? To give you an idea, here are the results of 10 different random runs:


I should explain the "imaginary" one. That happens when, just by random chance, SD(observed) is smaller than the expected SD(luck). It's more frequent when the sample size is so small -- say, 20 PA -- that luck is much larger than talent. 

In our original run, SD(observed) was 0.1067 and SD(luck) was 0.1019.  Those are pretty close to each other. It doesn't take much random fluctuation to reverse their order ... in the "imaginary" run, the numbers were 0.1021 and 0.1022, respectively.

More generally, when SD(observed) and SD(luck) are so close, SD(talent) is very sensitive to small random changes in SD(observed). And so the estimates jump around a lot.

(And that's the reason I used the 20 PA minimum. With a sample size of 1 PA, there would be too much distortion from the lack of symmetry. I think. Still investigating.)

The obvious thing to do is just do a whole bunch of random runs, and take the average. That doesn't quite work, though. One problem is that you can't average the imaginary numbers that sometimes come up. Another problem -- actually, the same problem -- is that the errors aren't symmetrical. A negative random error decreases the estimate more than a positive random error increases the estimate. 

To help get around that, I didn't average the 500 estimates in the list. Instead, I averaged the 500 values of SD(observed), and 500 estimates of SD(luck). Then, I calculated SD(talent) from those.
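To see that the averaging approach recovers the right answer, here's a simulation with a known talent spread. All the numbers are invented for the test, and straight binomial sampling stands in for sampling from real batting lines:

```python
import math
import random
import statistics

random.seed(1)
TRUE_TALENT_SD = 0.0356   # the spread we'll try to recover
TALENT_MEAN = 0.310
N_PA, N_BATTERS, N_RUNS = 20, 600, 200

# Fixed hypothetical talents; each run re-draws 20 binomial PA per batter
talents = [random.gauss(TALENT_MEAN, TRUE_TALENT_SD) for _ in range(N_BATTERS)]

obs_sds, luck_sds = [], []
for _ in range(N_RUNS):
    obps = [sum(random.random() < t for _ in range(N_PA)) / N_PA
            for t in talents]
    obs_sds.append(statistics.pstdev(obps))
    m = statistics.fmean(obps)
    luck_sds.append(math.sqrt(m * (1 - m) / N_PA))

# Average the SDs first, then solve for talent -- not the other way around
sd_obs = statistics.fmean(obs_sds)
sd_luck = statistics.fmean(luck_sds)
sd_talent = math.sqrt(sd_obs ** 2 - sd_luck ** 2)
print(round(sd_talent, 3))   # should land near the true 0.0356
```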

The result:

SD(talent) = 0.0356

Even with this method, I suspect the estimate is still a bit off. I'm thinking about ways to improve it. I still think it's decent enough, though.


So, now we have our estimate that for 2013, SD(talent)=0.0356. 

The next step: estimating a batter's true talent based on his observed OBP.

We know, from Tango, that we can estimate any player's talent by regressing to the mean -- specifically, "diluting" his batting line by adding a certain number of PA of average performance. 

How many PA do we need to add? As Tango showed, it's the number that makes SD(luck) equal to SD(talent). 

In the 500 simulations, SD(luck) averaged 0.1023 in 20 PA. To get luck down to 0.0356, where it would equal SD(talent), we'd need 166 PA. (That's 20 multiplied by the square of (0.1023 / 0.0356)). I'll just repeat that for reference:

Regress by 166 PA

A value of 166 PA seems reasonable. To check, I ran every season from 1950 to 2016, and 166 was right in line. 

The average of the 67 seasons was 183 PA. The highest was 244 PA (1981); the lowest was 108 PA (1993).  
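In code, using the figures above (the regression function is my own phrasing of the dilution idea; the mean to regress to is worked out later in the post):

```python
n_pa, sd_luck, sd_talent = 20, 0.1023, 0.0356

# PA of average performance to add: the amount that shrinks SD(luck)
# down to match SD(talent)
regress_pa = n_pa * (sd_luck / sd_talent) ** 2
print(round(regress_pa))   # ~165, i.e. the ~166 figure in the text

def estimate_talent(times_on_base, pa, league_mean, r=166):
    """Regress an observed line toward the mean by diluting with r average PA."""
    return (times_on_base + r * league_mean) / (pa + r)
```

For example, a .333 hitter over 600 PA, regressed toward a .310 mean, comes out around .328.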


Now we know we need to add 166 PA of average performance to a batting line to go from observed performance to estimated talent. But what, exactly, is "average performance"?

There are at least four different possibilities:

1. Regress to the observed real-life OBP. In MLB in 2013, for non-pitchers with at least 20 PA, that was .3186. 

2. Regress to the observed real-life OBP weighting every batter equally. That works out to .2984. (It's smaller than the actual MLB number because, in real life, worse hitters get fewer-than-equal PA.)

3. Regress to the average *talent*, weighted by real-life PA.

4. Regress to the average *talent*, weighting every batter equally.

Which one is correct? I had never actually thought about the question before. That's because I had only ever used this method on team talent, and, for teams, all four averages are .500. Here, they're all different. 

I won't try to explain why, but I think the correct answer is number 4. We want to regress to the average talent of the players in the sample.

Except ... now we have a Catch-22. 

To regress performance to the mean, we need to know the league's average talent. But to know the league's average talent, we need to regress performance to the mean!

What's the way out of this? It took me a while, but I think I have a solution.

The Tango method has an implicit assumption that -- while some players may have been lucky in 2013, and some unlucky -- overall, luck evened out. Which means, the observed OBP in MLB in 2013 is exactly equal to the expected OBP based on player talent.

Since the actual OBP was .3186, it must be that the expected OBP, based on player talent, is also .3186. That is: if we regress every player towards X by 166 PA, the overall league OBP has to stay .3186. 

What value of X makes that happen?

I don't think there can be an easy formula for X, because it depends on the distribution of playing time -- most importantly, how much more playing time the good hitters got that year compared to the bad hitters.

So I had to figure it out by trial and error. The answer:

Mean of player talent = .30995

(If you want to check that yourself, just regress every player's OBP while keeping PA constant, and verify that the overall average (weighted by PA) remains the same. Here's the SQL I used for that:

SELECT
sum(H+bb)/sum(ab+bb) AS actual, 
sum((h+bb+.30995*166)/(ab+bb+166)*(ab+bb)) / sum(ab+bb) AS regressed 
FROM batting
WHERE yearid=2013 and ab+bb>=20 and primarypos <> "P"

The idea is that "actual" and "regressed" should come out equal.

The "primarypos" column is one I created and populated myself, but the rest should work right from the Lahman database. You can leave out the "primarypos" and just use all hitters with 20+ PA. You'll probably find that it'll be something lower than .30995 that makes it work, since including pitchers brings down the average talent.  Also, with a different population of talent, the correct number of PA to regress should be something other than 166 -- probably a little lower? -- but 166 is probably close.

While I'm here ... I should have said earlier that I used only walks, AB, and hits in my definition of OBP, all through this post.)
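The trial and error can also be automated with a bisection search (my own sketch, assuming each player's line is reduced to a (times-on-base, PA) pair; the regressed league average rises as X rises, which is what makes bisection valid):

```python
def league_obp(lines):
    """PA-weighted league OBP from (times_on_base, pa) pairs."""
    return sum(h for h, pa in lines) / sum(pa for h, pa in lines)

def regressed_league_obp(lines, x, r=166):
    """PA-weighted league OBP after regressing every player toward x."""
    total_pa = sum(pa for h, pa in lines)
    return sum((h + r * x) / (pa + r) * pa for h, pa in lines) / total_pa

def find_target(lines, r=166, tol=1e-7):
    """Bisect for the x that leaves the league OBP unchanged by regression."""
    goal, lo, hi = league_obp(lines), 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if regressed_league_obp(lines, mid, r) < goal:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

If every player has the same OBP, X is just that OBP; when the better hitters get more PA, X comes out below the league figure, which is the .30995-versus-.3186 gap.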


So, a summary of the method:

1. For each player, take a random 20 PA subset of his batting line. Figure SD(observed) and SD(luck).

2. Repeat the above enough times to get a large sample size, and average out to get a stable estimate of SD(observed) and SD(luck).

3. Use the Tango method to calculate SD(talent).

4. Use the Tango method to calculate how many PA to regress to the mean to estimate player talent.

5. Figure what mean to regress to by trial and error, to get the playing-time-weighted average talent equal to the actual league OBP.


If I did that right, it should work for any stat, not just OBP. Eventually I'll run it for wOBA, and RC27, and BABIP, and whatever else comes to mind. 

As always, let me know if I've got any of this wrong.


Tuesday, January 15, 2019

Fun with splits

This was Frank Thomas in 1993, a year in which he was American League MVP with an OPS of 1.033.

                 PA   H 2B 3B HR  BB  K   BA   OPS 
'93 F. Thomas   676 174 36  0 41 112 54 .317 1.033  

Most of Thomas's hitting splits were fairly normal:

Home/Road:              1.113/0.950
First vs. Second Half:  0.970/1.114
Vs. RHP/LHP:            1.019/1.068
Outs in inning:         1.023/1.134/0.948
Team ahead/behind/tied: 1.016/0.988/1.096
Early/mid/late innings: 1.166/0.950/0.946
Night/day:              1.071/0.939

But I found one split that was surprisingly large:

              PA   H 2B 3B HR BB  K   BA   OPS  RC/G 
Thomas 1     352 108 22  0 33 58 34 .367 1.251 14.81 
Thomas 2     309  66 14  0  8 54 20 .259 0.796  5.45 

"Thomas 1" was an order of magnitude better than "Thomas 2," to the extent that you wouldn't recognize them as the same player. 

This is a real split ... it's not a selective-sampling trick, like "team wins vs. losses," where "team wins" were retroactively more likely to have been games in which Thomas hit better. (For the record, that particular split was 1.172/.828 -- this one is wider.)

So what is this split? The answer is ... 


The first line is games on odd-numbered days of the month. The second line is even-numbered days.

In other words, this split is random.

In terms of OPS difference -- 455 points -- it's the biggest odd/even split I found for any player in any season from 1950 to 2016 with at least 251 AB in each half. 

If we go down to a 150 AB minimum, the biggest is Ken Phelps in 1987:

1987 Phelps   PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
odd          204  31  3  0  8 39 33 .188 0.695  3.79 
even         208  55 10  1 19 41 42 .329 1.204 13.03 

And if we go down to 100 AB, it's Mike Stanley, again in 1987, but on the opposite days to Phelps:

1987 Stanley  PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
odd          134  42  6  1  6 18 23 .362 1.034 10.49 
even         113  17  2  0  0 13 25 .170 0.455  1.55 

But, from here on, I'll stick to the 251 AB standard.

That 1993 Frank Thomas split was also the biggest gap in home runs, with a 25 HR difference between odd and even (33 vs. 8). Here's another I found interesting -- Dmitri Young in 2001:

2001 D Young  PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
Odd          285  68 12  2  2 18 40 .255 0.639  3.48 
Even         292  95 16  1 19 19 37 .348 1.013  9.51 

Only two of Young's 21 home runs came on odd-numbered days. The binomial probability of that happening randomly (19-2/2-19 or better) is about 1 in 4520.*  And, coincidentally, there were exactly 4516 players in the sample!

(* Actually, it must be more likely than 1 in 4520. The binomial probability assumes each opportunity is independent, and equally likely to occur on an even day as an odd day. But PA tend to happen in daily clusters of 3 to 5, and since PA cluster, HR cluster too. 

To see that more easily, imagine extreme clustering, where there are only two games a year (instead of 162), with 250 PA each game. Half of all players would have either all odd PA or all even PA, and you'd see lots of extreme splits.)
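The 1-in-4,520 figure is straightforward to reproduce (before any clustering correction):

```python
from math import comb

n, k = 21, 2   # 21 HR, at most 2 on one specified parity of day
one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
either = 2 * one_sided          # 19-2 or 2-19, or more extreme
print(round(1 / either))        # 4520
```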

For K/BB ratio, check out Derek Jeter's 2004:  

2004 Jeter   PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
odd         362 113 27  1 15 14 63 .325 0.888  7.12 
even        327  75 17  0  8 32 36 .254 0.720  4.40 

There were bigger differences, but I found Jeter's the most interesting. 

In 1978, all 10 of Rod Carew's triples came on even-numbered days:

1978 Carew   PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
odd         333  92 10  0  0 45 34 .319 0.766  5.46 
even        309  96 16 10  5 33 28 .348 0.950  8.69 

A 10-0 split is a 1-in-512 shot. I'd say again that it's actually a bit more likely than that because of PA clustering, but ... Carew actually had *fewer* PA on even days! 

Oh, and Carew also hit all five of his HR on even days. Combining them into 15-0 is binomial odds of 16383 to 1, if you want to do that.
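Both Carew numbers come from the same "all on one side" binomial calculation, sketched here (the clustering caveat applies again):

```python
# Chance that all n of a player's events land on the same parity of
# day, treating each as a fair coin flip: 2 * (1/2)^n.
def all_one_side(n):
    p = 2 * 0.5 ** n            # all-odd or all-even
    return 1 / p                # "1 in ..." form

print(all_one_side(10))         # 512.0  -> a 1-in-512 shot
print(all_one_side(15) - 1)     # 16383.0 -> odds of 16383 to 1
```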

Strikeouts and walks aren't quite as impressive. It's Justin Upton 2013 for strikeouts:

2013 Upton     PA   H 2B 3B HR BB   K   BA  OPS  RC/G 
odd           330  71 14  1 16 31 102 .237 0.761 4.67 
even          303  76 13  1 11 44  59 .293 0.875 6.84 

And Mike Greenwell 1988 for walks:

88 Greenwell   PA   H 2B 3B HR BB   K  BA   OPS  RC/G 
odd           357  91 15  3 10 62  18 .308 0.910 7.61 
even          320 101 24  5 12 25  20 .342 0.973 8.85 

Interestingly, Greenwell was actually more productive on the even-numbered days, when he took fewer than half as many walks.

Finally, here's batting average, Grady Sizemore in 2005:

2005 Sizemore  PA   H 2B 3B HR BB   K  BA   OPS  RC/G 
odd           344  69  9  4 12 26  79 .217 0.660 3.45 
even          348 116 28  7 10 26  53 .360 0.992 9.50 

Another anomaly -- Sizemore hit more home runs on his .217 days than on his .360 days.


Anyway, what's the point of all this? Fun, mostly. But it did give me a better idea of what kinds of splits can happen just by chance. If it's possible to have a split of 33 odd homers and 8 even homers, just by luck, then it's possible to have 33 first-half homers and 8 second-half homers, just by luck. 

Of course, you should expect a split that size only once every 40 years or so. It might be more intuitive to go from a 40-year standard to a single-season standard, to get a better idea of what we can expect each year. 

To do that, I looked at 1977 to 2016 -- 39 full seasons plus strike-shortened 1994. Over that span, the top 39 splits should work out to roughly one per year. Instead of averaging them, I figured I'd just (unscientifically) take the 25th biggest ... that's probably going to be close to the median MLB-leading split for the year, taking into account that some years contribute more than one of the top 39.

For HR, the 25th ranked is Fred McGriff's 2002. It's an impressive 22/8 split:

02 McGriff   PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         297  70 11  1 22 42  47 .275 0.961  7.74 
even        289  73 16  1  8 21  52 .272 0.754  4.89 

For OPS, it's Scott Hatteberg in 2004:

04 Hatteberg PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         312  92 19  0 10 37  23 .335 0.926  8.12 
even        310  64 11  0  5 35  25 .233 0.647  3.47

For strikeouts, it's Felipe Lopez, 2005. Not that huge a deal ... only 27 K difference.

05 F. Lopez  PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         316  78 15  2 12 19  69 .263 0.755  4.75 
even        321  91 19  3 11 38  42 .322 0.928  7.95 

For walks, it's Darryl Strawberry's 1987. The difference is only 23 BB, but to me it looks more impressive than the 27 strikeouts:

87 Strwb'ry  PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         315  77 15  2 19 37  55 .277 0.912  7.02 
even        314  74 17  3 20 60  67 .291 1.045  9.49 

For batting average, number 25 is Omar Infante, 2011, but I'll show you the 24th ranked, which is Rickey Henderson in his rookie card year. (Both players round to a .103 difference.)

1980 Rickey  PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
odd         340 100 13  1  2 60  21 .357 0.903  8.07 
even        368  79  9  3  7 57  33 .254 0.739  4.67 


I'm going to think of it this way: every year, the league-leading random split is going to look like one of those. Some years it'll be higher, some lower, but these will be fairly typical.

That's the league-leading split for *each category*. There'll be a random home/road split of this magnitude (in addition to actual home/road effect). There'll be a random early/late split of this magnitude (in addition to any fatigue/weather effects). There'll be a random lefty/righty split of this magnitude (in addition to actual platoon effects). And so on.

Another way I might use this is to get an intuitive grip on how much I should trust a potentially meaningful split. For instance, if a certain player hits substantially worse in the second half of the season than in the first half, how much should you worry? To figure that out, I'd list a season's biggest even/odd splits alongside the season's biggest early/late splits. If the 20th biggest real split is as big as the 10th biggest random split, then, knowing nothing else, you can start with a guess that there's a 50 percent chance the decline is real -- since about 10 of those 20 real-looking splits are presumably just luck.

Sure, you could do it mathematically, by figuring out the SD of the various stats. But that's harder to appreciate. And it's not nearly as much fun as being able to say that, in 1978, Rod Carew hit every one of his 10 triples and 5 homers on even-numbered days. Especially when anyone can go to Baseball Reference and verify it.


Tuesday, December 18, 2018

Does the NHL's "loser point" help weaker teams?

Back when I calculated that it took 73 NHL games for skill to catch up with luck in the standings, I was surprised it was so high. That's almost a whole season. In MLB, it was less than half a season, and in the NBA, Tango found it was only 14 games, less than one-fifth of the full schedule.

Seventy-three games seemed like a lot of luck. Why so much? As it turns out, it was an anomaly -- the NHL was just going through an era where differences in team talent were small. Now, it's back under 40 games.

But I didn't know that at the time, so I had a different explanation: it must be the extra point the NHL started giving out for overtime losses. The "loser point," I reasoned, was reducing the importance of team talent, by giving the worse teams more of a chance to catch up to the better teams.

My line of thinking was something like this: 

1. Loser points go disproportionately to worse teams. For team-seasons, there's a correlation of around .4 between negative goal differential (a proxy for team quality) and OTL. So, the loser point helps the worse teams gain ground on the better teams.

2. Adding loser points adds more randomness. When you lose by one goal, whether that goal comes early in the game, or after the third period, is largely a matter of random chance. That adds "when the goals were scored" luck to the "how many goals were scored" luck, which should help mix up the standings more. In fact, as I write this, the Los Angeles Kings have two more wins and three fewer losses than the Chicago Blackhawks. But, because Chicago has five OTL to the Kings' one, the two teams are actually tied in the standings.
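The Kings/Blackhawks arithmetic works out because two extra wins (four points) exactly offset four extra OTL points. A quick check with hypothetical records (the post doesn't give the actual W/L totals, so these are made up to match the description):

```python
# NHL standings points: 2 for a win, 1 for an overtime/shootout loss.
def points(wins, otl):
    return 2 * wins + otl

# Hypothetical records: two more wins, three fewer regulation losses,
# but five OTL against one.
kings   = points(15, 1)   # say, 15-10-1 (W-L-OTL)
chicago = points(13, 5)   # say, 13-13-5
print(kings, chicago)     # 31 31 -- tied
```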

But ... now I realize that argument is wrong. And, the conclusion is wrong. It turns out the loser point actually does NOT help competitive balance in the NHL. 

So, what's the flaw in my old argument? 


I think the answer is: the loser point does affect how compressed the standings get in terms of actual points, but it doesn't have much effect on the *order* of teams. The bottom teams wind up still at the bottom, but (for instance) instead of having only half as many points as the top teams, they have two-thirds as many points.

Here's one way to see that. 

Suppose there's no loser point, so the winner always gets two points and the loser always gets none (even if it was an overtime or shootout loss). 

Now, make a change so the losing team gets a point, but *every time*. In that case, the difference between any two teams gets cut in half, in terms of points -- but the order of teams stays exactly the same. 

The old way, if you won W games, your point total was 2W. Now, it's W+82. Either way, the order of standings stays the same -- it's just that the differences between teams are cut in half, numerically.

It's still true that the "loser point" goes disproportionately to the worse teams -- the 50-32 team gets only 32 loser points, while the 32-50 team gets 50 of them. But that doesn't matter, because those points are never enough to catch up to any other team. 

If you ran the luck vs. skill numbers for the new system compared to the old system, it would work out exactly the same.
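That thought experiment is easy to verify: award the loser a point every game, and the point gaps shrink but the order never changes. A minimal sketch with made-up win totals:

```python
# 82-game season: compare 2W (no loser point) against W + 82 (loser
# point awarded in every single game).
wins = [50, 45, 41, 32, 25]                 # hypothetical teams

old = [2 * w for w in wins]                 # 100, 90, 82, 64, 50
new = [w + 82 for w in wins]                # 132, 127, 123, 114, 107

# Identical rankings under both systems...
assert sorted(range(len(wins)), key=old.__getitem__) == \
       sorted(range(len(wins)), key=new.__getitem__)

# ...but the top-to-bottom gap is cut in half.
print(old[0] - old[-1], new[0] - new[-1])   # 50 25
```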


In real life, of course, the losing team doesn't get a point every time: only when it loses in overtime. Last season, league-wide, that happened in about 11.6 percent of a team's games, or about 23.3 percent of its losses.

If the loser point happened in *exactly* 23.3 percent of losses, for every team, with no variation, the situation would be the same as before -- the standings would get compressed, but the order wouldn't change. It would be as if, with every loss, the loser got an extra 0.233 points. No team could pass any other team, since each loss that puts it two points behind gives back only 0.233 points. 

But: what if you assume it's completely random which losses become overtime losses? Now, the order can change. A 40-42 team can catch up to a 41-41 team if its losses randomly include two more overtime losses than its rival's. That's helped by the fact that the 40-42 team has one extra loss available to convert: it needs two random points to catch up, but it starts with an expected head start of about 0.233 points.
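Under that random-conversion assumption, the catch-up chance is easy to simulate. A sketch, using the 23.3 percent per-loss OTL rate from above:

```python
import random

# Monte Carlo: how often does a 40-42 team tie or pass a 41-41 team
# if each loss independently becomes an OTL with probability 0.233?
random.seed(1)
p, trials, catches = 0.233, 100_000, 0
for _ in range(trials):
    ahead  = 82 + sum(random.random() < p for _ in range(41))  # 41-41
    behind = 80 + sum(random.random() < p for _ in range(42))  # 40-42
    catches += behind >= ahead
print(catches / trials)
```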

If losses became overtime losses in a random way, then, yes, the OTL would make luck more important, and my argument would be correct. But they don't. It turns out that better teams turn losses into OTL much more frequently than worse teams, on a loss-for-loss basis.

Which makes sense. Worse teams' losses are more likely to be blowouts, which means they're less likely to be close losses. That means fewer one-goal losses, proportionately. 

In other words: 

(a) bad teams have more losses, but 
(b) those losses are less likely to result in an OTL. 

Those two forces work in opposite directions. Which is stronger?

Let's run the numbers from last year to find out.

If we just gave two points for a win, and zero for a loss, we'd have: 

SD(luck)   =  9.06
SD(talent) = 13.76

But in real life, which includes the OTL, the numbers are

SD(luck)   =  8.48
SD(talent) = 12.90

Converting so we can compare luck to talent:

35.5 games until talent=luck (no OTL point)
35.4 games until talent=luck (with OTL point)
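The conversion uses the standard relationship that, measured in points, luck's SD grows with the square root of games while talent's grows linearly; setting them equal gives n = 82 * (SD(luck)/SD(talent))^2. A sketch:

```python
# Games until SD(talent) catches up to SD(luck), given full-season
# (82-game) standard deviations in points.
def crossover_games(sd_luck, sd_talent, season_len=82):
    return season_len * (sd_luck / sd_talent) ** 2

print(round(crossover_games(9.06, 13.76), 1))   # 35.5 (no OTL point)
print(round(crossover_games(8.48, 12.90), 1))   # 35.4 (with OTL point)
```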

It turns out, the two factors almost exactly cancel out! Bad teams have more chances for an OTL point because they lose more -- but those losses are less likely to be OTL, in almost exact proportion.

And that's why I was wrong -- why the OTL point doesn't increase competitive balance, or make the standings less predictable. It just makes the NHL *look* more competitive, by making the point differences smaller.
