Tuesday, January 15, 2019

Fun with splits

This was Frank Thomas in 1993, a year in which he was American League MVP with an OPS of 1.033.

                 PA   H 2B 3B HR  BB  K   BA   OPS 
--------------------------------------------------   
'93 F. Thomas   676 174 36  0 41 112 54 .317 1.033  

Most of Thomas's hitting splits were fairly normal:

Home/Road:              1.113/0.950
First vs. Second Half:  0.970/1.114
Vs. RHP/LHP:            1.019/1.068
Outs in inning:         1.023/1.134/0.948
Team ahead/behind/tied: 1.016/0.988/1.096
Early/mid/late innings: 1.166/0.950/0.946
Night/day:              1.071/0.939

But I found one split that was surprisingly large:

              PA   H 2B 3B HR BB  K   BA   OPS  RC/G 
----------------------------------------------------
Thomas 1     352 108 22  0 33 58 34 .367 1.251 14.81 
Thomas 2     309  66 14  0  8 54 20 .259 0.796  5.45 

"Thomas 1" was so much better than "Thomas 2" that you wouldn't recognize them as the same player. 

This is a real split ... it's not a selective-sampling trick, like "team wins vs. losses," where "team wins" were retroactively more likely to have been games in which Thomas hit better. (For the record, that particular split was 1.172/.828 -- this one is wider.)

So what is this split? The answer is ... 

.
.
.

The first line is games on odd-numbered days of the month. The second line is even-numbered days.

In other words, this split is random.

In terms of OPS difference -- 455 points -- it's the biggest odd/even split I found for any player in any season from 1950 to 2016 with at least 251 AB in each half. 
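For anyone who wants to reproduce this kind of split, here is a minimal Python sketch. It assumes the PA column in these tables is just AB plus BB (which is consistent with the batting averages shown, but is my assumption), and the game-log format fed to `split_by_day_parity` is hypothetical, not any particular data source's layout.

```python
from collections import defaultdict
from datetime import date

def ops(pa, h, doubles, triples, hr, bb):
    """OPS from one table row, assuming PA = AB + BB (no HBP or sac flies)."""
    ab = pa - bb
    obp = (h + bb) / pa
    total_bases = h + doubles + 2 * triples + 3 * hr
    return obp + total_bases / ab

def split_by_day_parity(game_logs):
    """Sum per-game counting stats into 'odd' and 'even' day-of-month buckets.

    game_logs: iterable of (date, dict of counting stats), a made-up format.
    """
    totals = {"odd": defaultdict(int), "even": defaultdict(int)}
    for day, stats in game_logs:
        bucket = "odd" if day.day % 2 else "even"
        for name, value in stats.items():
            totals[bucket][name] += value
    return totals

# The two Frank Thomas lines from the table above
thomas_odd = ops(352, 108, 22, 0, 33, 58)
thomas_even = ops(309, 66, 14, 0, 8, 54)
print(round(thomas_odd, 3), round(thomas_even, 3))  # 1.251 0.796
```

Under that simplified OPS formula, the two lines come out to the same 1.251 and 0.796 shown in the table.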

If we go down to a 150 AB minimum, the biggest is Ken Phelps in 1987:

1987 Phelps   PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
----------------------------------------------------
odd          204  31  3  0  8 39 33 .188 0.695  3.79 
even         208  55 10  1 19 41 42 .329 1.204 13.03 

And if we go down to 100 AB, it's Mike Stanley, again in 1987, but on the opposite days to Phelps:

1987 Stanley  PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
----------------------------------------------------
odd          134  42  6  1  6 18 23 .362 1.034 10.49 
even         113  17  2  0  0 13 25 .170 0.455  1.55 

But, from here on, I'll stick to the 251 AB standard.

That 1993 Frank Thomas split was also the biggest gap in home runs, with a 25 HR difference between odd and even (33 vs. 8). Here's another I found interesting -- Dmitri Young in 2001:

2001 D Young  PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
----------------------------------------------------
Odd          285  68 12  2  2 18 40 .255 0.639  3.48 
Even         292  95 16  1 19 19 37 .348 1.013  9.51 

Only two of Young's 21 home runs came on odd-numbered days. The binomial probability of that happening randomly (19-2/2-19 or better) is about 1 in 4520.*  And, coincidentally, there were exactly 4516 players in the sample!

(* Actually, it must be more likely than 1 in 4520. The binomial probability assumes each opportunity is independent, and equally likely to occur on an even day as an odd day. But, PA tend to happen in daily clusters of 3 to 5. Since PAs are more likely to cluster, so are HR. 

To see that more easily, imagine extreme clustering, where there are only two games a year (instead of 162), with 250 PA each game. Half of all players would have either all odd PA or all even PA, and you'd see lots of extreme splits.)
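The clustering argument can be checked with a rough simulation. This is only a sketch under simplifying assumptions: 600 PA, 21 HR scattered at random among them, and each game landing on an odd or even day by a fair coin flip (real schedules alternate more regularly than that). All of the function and parameter names are mine.

```python
import random
from statistics import pstdev

def odd_hr_counts(total_pa, pa_per_game, home_runs, trials, rng):
    """Simulate how many of a player's HR land on 'odd' days.

    Every PA in the same game shares that game's parity; each game's
    parity is a fair coin flip (a simplification of a real schedule).
    """
    counts = []
    for _ in range(trials):
        parity = {}  # game index -> True if the game fell on an odd day
        odd = 0
        for pa in rng.sample(range(total_pa), home_runs):
            game = pa // pa_per_game
            if game not in parity:
                parity[game] = rng.random() < 0.5
            odd += parity[game]
        counts.append(odd)
    return counts

rng = random.Random(0)
for pa_per_game in (1, 4, 50):
    counts = odd_hr_counts(600, pa_per_game, 21, 20000, rng)
    # SD of the number of odd-day HR; a bigger SD means more extreme splits
    print(pa_per_game, round(pstdev(counts), 2))
```

With one PA per "game" the spread should sit near the pure binomial value (about 2.3 HR). At 4 PA per game it should be only slightly wider, and at 50 PA per game it should be much wider, which is the extreme-clustering thought experiment in miniature.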

For K/BB ratio, check out Derek Jeter's 2004:  

2004 Jeter   PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
---------------------------------------------------
odd         362 113 27  1 15 14 63 .325 0.888  7.12 
even        327  75 17  0  8 32 36 .254 0.720  4.40 

There were bigger differences, but I found Jeter's the most interesting. 

In 1978, all 10 of Rod Carew's triples came on even-numbered days:

1978 Carew   PA   H 2B 3B HR BB  K  BA   OPS   RC/G 
---------------------------------------------------
odd         333  92 10  0  0 45 34 .319 0.766  5.46 
even        309  96 16 10  5 33 28 .348 0.950  8.69 

A 10-0 split is a 1-in-512 shot. I'd say again that it's actually a bit more likely than that because of PA clustering, but ... Carew actually had *fewer* PA on even-numbered days! 

Oh, and Carew also hit all five of his HR on even days. Combining them into 15-0 is binomial odds of 16383 to 1, if you want to do that.
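If you want to check those binomial numbers yourself, here is a small sketch. It assumes every event is an independent 50/50 coin flip between odd and even days, and counts a lopsided split in either direction. The function name is just something I made up for illustration.

```python
from fractions import Fraction
from math import comb

def two_sided_tail(total, smaller):
    """Chance that either side gets `smaller` or fewer of `total` events.

    Assumes a fair coin per event; only valid when smaller < total / 2.
    """
    one_side = sum(comb(total, k) for k in range(smaller + 1))
    return Fraction(2 * one_side, 2 ** total)

print(two_sided_tail(10, 0))             # 1/512   (Carew's triples)
print(two_sided_tail(15, 0))             # 1/16384 (triples plus homers)
print(float(1 / two_sided_tail(21, 2)))  # roughly 4520 (Young's 19-2)
```

A probability of 1/16384 is the same thing as odds of 16383 to 1.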

Strikeouts and walks aren't quite as impressive. It's Justin Upton 2013 for strikeouts:

2013 Upton     PA   H 2B 3B HR BB   K   BA  OPS  RC/G 
-----------------------------------------------------
odd           330  71 14  1 16 31 102 .237 0.761 4.67 
even          303  76 13  1 11 44  59 .293 0.875 6.84 

And Mike Greenwell 1988 for walks:

88 Greenwell   PA   H 2B 3B HR BB   K  BA   OPS  RC/G 
-----------------------------------------------------
odd           357  91 15  3 10 62  18 .308 0.910 7.61 
even          320 101 24  5 12 25  20 .342 0.973 8.85 

Interestingly, Greenwell was actually more productive on the even-numbered days, when he took fewer than half as many walks.

Finally, here's batting average, Grady Sizemore in 2005:

2005 Sizemore  PA   H 2B 3B HR BB   K  BA   OPS  RC/G 
-----------------------------------------------------
odd           344  69  9  4 12 26  79 .217 0.660 3.45 
even          348 116 28  7 10 26  53 .360 0.992 9.50 

Another anomaly -- Sizemore hit more home runs on his .217 days than on his .360 days.

-------

Anyway, what's the point of all this? Fun, mostly. But it did give me a better idea of what kinds of splits can happen just by chance. If it's possible to have a split of 33 odd homers and 8 even homers, just by luck, then it's possible to have 33 first-half homers and 8 second-half homers, just by luck. 

Of course, you should only expect that size of effect once every 40 years or so. It might be more intuitive to go from a 40-year standard to a single-season standard, to get a better idea of what we can expect each year. 

To do that, I looked at 1977 to 2016 -- 39 seasons plus 1994. Averaging the top 39 should roughly give us the average for the year. Instead of the average, I figured I'd just (unscientifically) take the 25th biggest ... that's probably going to be close to the median MLB-leading split for the year, taking into account that some years have more than one of the top 39.
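Here is a quick sanity check of that "25th biggest" shortcut, as a sketch only. It uses made-up random numbers as stand-ins for split sizes (absolute values of normal draws, 100 qualifying players a year, both of which are my assumptions), not real baseball data.

```python
import random
from statistics import median

def compare(seasons=40, players=100, rank=25, seed=0):
    """Return (median of yearly leaders, the rank-th biggest value overall)."""
    rng = random.Random(seed)
    # Stand-in for split sizes: absolute value of a standard normal draw
    data = [[abs(rng.gauss(0, 1)) for _ in range(players)]
            for _ in range(seasons)]
    yearly_max = [max(season) for season in data]
    pooled = sorted((x for season in data for x in season), reverse=True)
    return median(yearly_max), pooled[rank - 1]

for seed in range(3):
    med, ranked = compare(seed=seed)
    print(f"median yearly leader {med:.2f}   25th overall {ranked:.2f}")
```

In this toy setup the two numbers should generally land close to each other, which suggests the shortcut is reasonable as a rough guide, at least when the splits really are pure luck.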

For HR, the 25th ranked is Fred McGriff's 2002. It's an impressive 22/8 split:

02 McGriff   PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
----------------------------------------------------
odd         297  70 11  1 22 42  47 .275 0.961  7.74 
even        289  73 16  1  8 21  52 .272 0.754  4.89 

For OPS, it's Scott Hatteberg in 2004:

04 Hatteberg PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
----------------------------------------------------
odd         312  92 19  0 10 37  23 .335 0.926  8.12 
even        310  64 11  0  5 35  25 .233 0.647  3.47

For strikeouts, it's Felipe Lopez, 2005. Not that huge a deal ... only 27 K difference.

05 F. Lopez  PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
----------------------------------------------------
odd         316  78 15  2 12 19  69 .263 0.755  4.75 
even        321  91 19  3 11 38  42 .322 0.928  7.95 

For walks, it's Darryl Strawberry's 1987. The difference is only 23 BB, but to me it looks more impressive than the 27 strikeouts:

87 Strwb'ry  PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
----------------------------------------------------
odd         315  77 15  2 19 37  55 .277 0.912  7.02 
even        314  74 17  3 20 60  67 .291 1.045  9.49 


For batting average, number 25 is Omar Infante, 2011, but I'll show you the 24th ranked, which is Rickey Henderson in his rookie card year. (Both players round to a .103 difference.)

1980 Rickey  PA   H 2B 3B HR BB   K  BA   OPS   RC/G 
----------------------------------------------------
odd         340 100 13  1  2 60  21 .357 0.903  8.07 
even        368  79  9  3  7 57  33 .254 0.739  4.67 

-------

I'm going to think of this as, every year, the league-leading random split is going to look like those. Some years it'll be higher, some lower, but these will be fairly typical.

That's the league-leading split for *each category*. There'll be a random home/road split of this magnitude (in addition to actual home/road effect). There'll be a random early/late split of this magnitude (in addition to any fatigue/weather effects). There'll be a random lefty/righty split of this magnitude (in addition to actual platoon effects). And so on.

Another way I might use this is to get an intuitive grip on how much I should trust a potentially meaningful split. For instance, if a certain player hits substantially worse in the second half of the season than in the first half, how much should you worry? To figure that out, I'd list a season's biggest even/odd splits alongside the season's biggest early/late splits. If the 20th biggest real split is as big as the 10th biggest random split, then, knowing nothing else, you can start with a guess that there's a 50 percent chance the decline is real.
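That rule of thumb can be written down in a few lines. This is a sketch of the back-of-the-envelope reasoning, not a proper statistical test: if N real splits are at least as big as the one you care about, and only M random splits are that big, then roughly M of the N are luck. The function name and the example numbers are made up.

```python
def estimated_real_share(real_splits, random_splits, observed):
    """Rough share of splits at least as big as `observed` that aren't luck.

    real_splits:   sizes of a potentially meaningful split (say, first/second half)
    random_splits: sizes of a known-random split (say, odd/even days),
                   taken over the same players and seasons
    """
    n_real = sum(1 for s in real_splits if s >= observed)
    n_random = sum(1 for s in random_splits if s >= observed)
    if n_real == 0:
        return None  # nothing that big in the real list, so no estimate
    return max(0.0, 1 - n_random / n_real)

# Made-up data: 20 real splits of .300 or more, but only 10 random ones
real = [0.300] * 20 + [0.100] * 80
luck = [0.300] * 10 + [0.100] * 90
print(estimated_real_share(real, luck, 0.300))  # 0.5
```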

Sure, you could do it mathematically, by figuring out the SD of the various stats. But that's harder to appreciate. And it's not nearly as much fun as being able to say that, in 1978, Rod Carew hit every one of his 10 triples and 5 homers on even-numbered days. Especially when anyone can go to Baseball Reference and verify it.
