Do batters really hit .463 when gunning for a .300 average? Part II
Monday, I reviewed a study that found that players hitting .299 late in the season managed to get to .300 suspiciously often, hitting over .400 in their last plate appearance of the season. The authors argued that this is a result of the players expending extra effort to improve performance with .300 on the line. However, I argued that it was just a case of selective sampling: many players, once they hit .300, are benched for the remainder of the season. Cause and effect are reversed: getting the hit is what causes that at-bat to be the last.
After I wrote that, I found that the authors say their results account for substitutions. In footnote 4, they say that they didn't actually use the last plate appearance, but the last *scheduled* PA. So if a player got the hit that took him to .300 and was then pinch hit for, he would go into the study as "substituted for" instead of "base hit". However, the authors mentioned only pinch hitters, and it's possible that if the player was taken out for a defensive substitute, or a pinch runner, his hit would still have stayed in the study.
Because the paper is so unclear, I decided to try to reproduce the results. Using Retrosheet data like the authors did, I found every player from 1975 to 2008 who was hitting .299 before his last plate appearance of the season. (However, unlike the authors, I used only last PAs after September 25.) My results were similar to the study's.
According to the New York Times article describing the study, there were 62 players who went into their last PA at .299, and recorded an at-bat. My attempt, however, found 68 (listed here). Those "last PAs" break down as follows:
-- 33 played the entire final game and hit .242 in their last PA
-- 13 were replaced during the final game and hit .692 in their last PA before being replaced
-- 22 didn't play the final game and had hit .636 in their last PA before being sat for the rest of the season.
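As a rough check on that breakdown, here is a small Python sketch that turns each group's rounded average back into a whole number of hits and combines the three groups. It assumes every one of the 68 last PAs counts as an at-bat (consistent with the zero walks mentioned below) and that the hit counts are the nearest whole numbers to the reported averages; the `implied_hits` helper and the variable names are mine, not anything from the study.

```python
# (at-bats, reported average) for each group in the breakdown above.
groups = {
    "played entire final game": (33, 0.242),
    "replaced during final game": (13, 0.692),
    "did not play final game": (22, 0.636),
}


def implied_hits(at_bats, average):
    """Nearest whole number of hits to a rounded batting average (an assumption)."""
    return round(at_bats * average)


total_ab = sum(ab for ab, _ in groups.values())
total_hits = sum(implied_hits(ab, avg) for ab, avg in groups.values())
overall = total_hits / total_ab

for name, (ab, avg) in groups.items():
    print(f"{name}: {implied_hits(ab, avg)} for {ab}")
print(f"all groups: {total_hits} for {total_ab} ({overall:.3f})")
```

Under those assumptions the three groups work out to 8 for 33, 9 for 13, and 14 for 22, or 31 for 68 overall (about .456).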
I have six players the study didn't. My best guess is that 6 of the 13 who were replaced during the final game were pinch hit for, and those were the ones the original study omitted. I'm pretty convinced that's what's going on and that I reproduced the results fairly. That's because the batting averages seem to be about right, but mostly because both the original study and my replication show that these players had zero walks -- and zero walks in 60+ PA is very unusual.
If that's the case, and the study only controlled for pinch hitters, there's still a large amount of selective sampling in the 22 players who didn't play the final game. 14 of those 22 guys got a hit. How did that happen? At a .299 pace, it would normally take about 47 AB to get to 14 hits.
What happened is that there probably *were* 47 of those guys. The other 25, who made an out, dropped to .297 or .298, and therefore weren't benched -- they played again to try to get past .300. Therefore, they weren't included in the study, because that hitless AB wasn't their last.
So, the conclusion remains: the effect the authors found is simply the result of players quitting immediately after they get the hit that pushes them over .300.
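To see how strong that kind of selection can be, here is a toy simulation in Python. It is a deliberately simplified model, not a reconstruction of the study: every batter is assumed to be a true .300 hitter with at most `max_ab` at-bats left, and he sits out the rest of the season the moment he gets a hit. The function name and parameters are made up for the illustration.

```python
import random


def simulate(num_players=100_000, true_avg=0.300, max_ab=2, seed=1):
    """Return (average in each player's last AB, average over all ABs)."""
    rng = random.Random(seed)
    last_ab_hits = 0
    all_hits = 0
    all_ab = 0
    for _ in range(num_players):
        hit = False
        for _ in range(max_ab):
            hit = rng.random() < true_avg
            all_ab += 1
            all_hits += hit
            if hit:
                break  # he got his hit, so he sits for the rest of the season
        # 'hit' now holds the outcome of this player's final at-bat
        last_ab_hits += hit
    return last_ab_hits / num_players, all_hits / all_ab


last_ab_avg, overall_avg = simulate()
print(f"last AB only: {last_ab_avg:.3f}")
print(f"all ABs:      {overall_avg:.3f}")
```

In this toy model, with two at-bats available, the "last AB" average comes out near .510 (that's 1 - 0.7 * 0.7) even though nobody is hitting any better than .300 overall. The inflated number comes entirely from which at-bat gets to be the last one.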
Just in case you're not convinced, I decided to check how well players hit late in the season when they're shooting for .300, in a method that avoids the sampling bias. Instead of checking just their last plate appearance, I checked a group of plate appearances chosen in advance.
I looked at every team's last two games of the season, and took every PA by a player who went up to the plate hitting .299. That is: not just their last PA, but *any* such PA in the last two games. That way, there's no bias: that PA winds up in the sample if he gets a hit then quits for the year, but, unlike the original, it also winds up in the sample if he makes an out, and bats again later to try to make up the ground he lost.
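In code, the selection rule looks something like the sketch below. The `PlateAppearance` record and its field names are hypothetical (this is not Retrosheet's format), and I'm assuming "hitting .299" means the average entering the PA rounds to .299 at three decimals.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    hits_before: int     # season hits entering this PA
    at_bats_before: int  # season at-bats entering this PA
    games_left: int      # team games remaining after this one (0 = final game)
    is_at_bat: bool      # False for walks, sacrifices, and so on
    is_hit: bool


def avg_points(hits, at_bats):
    """Batting average in thousandths, rounded half up, using integers only."""
    return (2000 * hits + at_bats) // (2 * at_bats)


def sample_result(pas, target_points=299, last_n_games=2):
    """Hits and at-bats over every qualifying PA, not just each player's last."""
    chosen = [
        pa for pa in pas
        if pa.at_bats_before > 0
        and pa.games_left < last_n_games
        and avg_points(pa.hits_before, pa.at_bats_before) == target_points
    ]
    at_bats = [pa for pa in chosen if pa.is_at_bat]
    hits = sum(pa.is_hit for pa in at_bats)
    return hits, len(at_bats)


example = [
    PlateAppearance(149, 498, 0, True, False),   # .299, final game, out
    PlateAppearance(149, 498, 1, True, True),    # .299, second-last game, hit
    PlateAppearance(149, 498, 2, True, True),    # too early in the season
    PlateAppearance(150, 500, 0, True, True),    # hitting .300, not .299
    PlateAppearance(149, 498, 0, False, False),  # a walk: sampled, but no AB
]
hits, at_bats = sample_result(example)
print(f"{hits} for {at_bats}")
```

The point of the design is that the filter looks only at what was true before the PA (the date and the average going in), never at what happened in it or after it.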
The results: not much different from other at-bats. Those players hit .313, going 101 for 323 with 23 walks.
What about batters hitting .300, who, if they made an out, would drop to .299? Those guys hit only .288.
Batters hitting .298, still close enough to three hundred to have a decent shot, hit .310. And batters currently at .301 subsequently hit only .289. In chart form:
.298: .310 in 365 AB, 45 walks
.299: .313 in 323 AB, 23 walks
.300: .288 in 233 AB, 30 walks
.301: .289 in 277 AB, 37 walks
Not a whole lot different than you'd expect. Combined, the .299/.300 group hit .302. That's far from the .463 cited by the Times, for the biased sample.
I then tried something slightly different. I stopped the clock before the team's last two games, found all players who were between .297 and .302 at that point in time, and then checked their performance afterwards, regardless of how their batting average may have moved up or down during those two days. That's under the assumption that if the batter is that close that late in the season, every at-bat is critical in his quest to finish at .300 or above.
The results: those players hit only .302.
However, they did hit higher than other nearby groups, as seen below. Plus, you'd expect the players in the .297-.302 group to regress to the mean a bit, and maybe hit .290 or something. So not only did they beat the surrounding groups, but they also beat their expectation.
.291 - .296 — 692/2434 (.284), 229 BB
.297 - .302 — 593/1962 (.302), 205 BB
.303 - .308 — 421/1587 (.265), 167 BB
This is very slight support for the hypothesis that players on the cusp do succeed in pushing themselves a little harder towards .300. I say "very slight" because the standard deviation for these batting averages is about 10 points, so the results certainly aren't statistically significant. Personally, I expect they're mostly random, and that if you did this same study for other seasons, you'd find the effect goes away. But there still might be something there.
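For anyone who wants to check the "about 10 points" figure, here is a quick Python sketch using the usual binomial approximation for the standard deviation of a batting average. The comparison against .290 uses my rough regression-to-the-mean guess from above, not a measured baseline, and the function names are just for illustration.

```python
from math import sqrt

# (hits, at-bats) for each group, from the chart above.
groups = {
    ".291-.296": (692, 2434),
    ".297-.302": (593, 1962),
    ".303-.308": (421, 1587),
}


def avg_and_sd(hits, at_bats):
    """Observed average and its binomial standard deviation."""
    p = hits / at_bats
    return p, sqrt(p * (1 - p) / at_bats)


for name, (h, ab) in groups.items():
    p, sd = avg_and_sd(h, ab)
    print(f"{name}: {p:.3f}, SD {sd * 1000:.1f} points")

# How far is the .297-.302 group from a guessed expectation of .290?
expected = 0.290  # rough guess from the text, an assumption
mid_avg, mid_sd = avg_and_sd(*groups[".297-.302"])
z = (mid_avg - expected) / mid_sd
print(f"z-score vs. {expected:.3f}: {z:.2f}")
```

The .297-.302 group comes in a little over one standard deviation above the guessed .290, which is well short of the usual two-SD threshold.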
So I think the case is pretty well proven:
1. The factoid, "players hitting .299 or .300 batting a whopping .463 in their final at-bat" is true -- but it's the result of cherry-picking the AB in the sample. If the player got a hit to pass .300, it was likely to *become* his last at-bat, as he tended to sit out the rest of the season. But if he made an out, the AB wouldn't be his last.
2. If you look at *all* AB, not just the cherry-picked "last" ones, players around .300 hit only slightly better than expected, and the difference is not statistically significant at all.
So it's fair to say that, while there does seem to be motivation to hit .300, that doesn't seem to translate into higher performance when it matters. However, there is strong evidence that players who have achieved the .300 mark late in the season are motivated to stay out of games in order to avoid dropping back to .299 or below. It's the result of that motivation that biases the sample, creating the false impression that batters demonstrate superstar talent in those situations.
P.S. Lots of discussion on this issue at "The Book" blog. Thanks to commenters there, especially Guy and MGL, for their assistance.