J.C. Bradbury on aging in baseball
J.C. Bradbury is on vacation from blogging, but is still posting occasionally. This week, he wrote that his article on baseball aging patterns has been published. Here's the link to the published version (gated), and here's a link to a freely-available version from last August.
Here's what JC did. He took every player with at least 5000 PA (4000 batters faced for pitchers) who debuted in 1921 or later. Then, for those players, he considered every season in which they had at least 300 PA (or 200 batters faced). That left a total of 4,627 player-seasons for hitters, and 4,145 for pitchers.
He then ran a regression to predict the player's performance that season from:
-- the player's career average
-- the player's age that year, and age-squared (that is, quadratic on age)
-- a dummy variable for the league-season
-- a "player-specific error term".
Numbers are park adjusted.
After running the regression, Bradbury calculates the implied "peak age" for each metric:
29.41 linear weights
28.26 DPT (doubles plus triples rate)
23.56 Strikeouts (for pitchers)
32.47 Walks (allowed)
27.39 Home Runs (allowed)
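For a quadratic in age, the implied peak is just the vertex of the parabola: if the fitted age terms are b1*age + b2*age^2, with b2 negative, the maximum is at age = -b1 / (2*b2). Here's a minimal sketch of that arithmetic in Python. The function name and the coefficients are made up for illustration; they are not Bradbury's estimates.

```python
def peak_age(b_age: float, b_age_sq: float) -> float:
    """Vertex of a + b_age*age + b_age_sq*age**2 (a maximum only if b_age_sq < 0)."""
    if b_age_sq >= 0:
        raise ValueError("no interior maximum unless the age-squared coefficient is negative")
    return -b_age / (2.0 * b_age_sq)


# Hypothetical coefficients, chosen only to illustrate the formula
example_peak = peak_age(b_age=5.8, b_age_sq=-0.1)
print(f"implied peak age: {example_peak:.2f}")
```

Note that the peak depends only on the ratio of the two age coefficients, so the player's career average and the league-season dummies shift the curve up or down without moving the peak.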
For most of the hitting categories, the peak age is above the conventional wisdom of 27 – most are around 29. After quoting various studies that have found younger peaks, Bradbury writes,
"The results indicate that both hitters and pitchers peak around 29. This is older than some estimates of peak performance ..."
Bradbury also notes that the results are consistent with the idea that the more raw athleticism is required, the earlier the skill peaks; strikeouts, for instance, which require raw arm speed, peak the earliest, and walks, which are largely mental, peak the latest:
"Consistent with studies of ageing in specific athletic skills, baseball players peak earlier (later) in abilities that require more (less) physical stress."
I agree with Bradbury on this last point, but I don't think his actual age estimates can be relied upon. Specifically, I think peak ages are really closer to 27 than to 29.
One reason for this is that the model specifically requires the curve to be a quadratic – that is, symmetrical before and after the peak. But are careers really symmetrical? Suppose they are not – suppose the average player rises sharply when he's young, then falls gradually until old age. The curve, then, would be skewed, with a longer tail to the right.
Now, suppose you try to fit a symmetrical curve to a skewed curve, as closely as you can. If you pull out a sheet of paper and try it, you'll see that the peak of the symmetrical curve will wind up to the right of the peak of the actual curve. The approximation peaks later than the actual, which matches the later peaks JC found.
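If you don't have a sheet of paper handy, here's a rough sketch of the same experiment in Python. Everything in it is invented for illustration: the curve rises 3 units a year to a peak at age 27 and then falls 1 unit a year, the age range of 22 to 36 is my own choice, and `quadratic_peak` is a hypothetical helper that does an ordinary least-squares quadratic fit and returns the age at the vertex.

```python
def quadratic_peak(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns the x of the vertex."""
    n = len(xs)
    mean_x = sum(xs) / n
    t = [x - mean_x for x in xs]  # center the ages for numerical stability
    s1, s2, s3, s4 = (sum(v ** k for v in t) for k in (1, 2, 3, 4))
    rhs = [sum(ys),
           sum(v * y for v, y in zip(t, ys)),
           sum(v * v * y for v, y in zip(t, ys))]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    # Normal equations for (a, b, c), solved for b and c by Cramer's rule
    A = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    d = det3(A)
    b = det3([[A[i][0], rhs[i], A[i][2]] for i in range(3)]) / d
    c = det3([[A[i][0], A[i][1], rhs[i]] for i in range(3)]) / d
    return mean_x - b / (2 * c)  # undo the centering


TRUE_PEAK = 27
ages = list(range(22, 37))  # ages 22 through 36
# Steep rise (3 per year) before the peak, gentle decline (1 per year) after it
skewed = [-3 * (TRUE_PEAK - a) if a < TRUE_PEAK else -1 * (a - TRUE_PEAK)
          for a in ages]

fitted_peak = quadratic_peak(ages, skewed)
print(f"true peak: {TRUE_PEAK}, fitted quadratic peak: {fitted_peak:.2f}")
```

With these made-up numbers, the fitted peak lands about two and a half years to the right of the true one. The size of the gap depends on the shape and the age range you pick, so treat this as a demonstration of the direction, not the magnitude.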
I have no proof that the actual aging curve is asymmetrical in this exact way, but players' careers are not as regular as the orbits of asteroids. There's no particular reason that you'd expect players to fall at exactly the same rate as they rise, especially when you factor in playing time and injuries. The quadratic is a reasonable approximation, but that's all it is.
Another reason is selective sampling. By choosing only players with long careers, Bradbury left out any player who flames out early. And so, his sample is overpopulated with players who aged particularly gracefully. That would tend to overestimate the age at which players peak.
(He limited his data to players between 24 and 35, which he says is done to minimize selection bias, but I'm not sure how that would help.)
There is perhaps some evidence that there's a real effect. JC ran the same regression again, but this time including only players with Hall of Fame careers. For hitters, the peak age dropped by almost an entire year, from 29.41 to 28.51. That might make sense; HOFers are the best players ever, and were more likely to have had long careers even if they aged less gracefully. That is, they'd still be good enough to stay in the league after a substantial drop, and would be much more likely to hit the 5000 PA cutoff even if they peaked early and dropped sharply.
(In fairness, you could argue that HOFers were less likely to be injured, and therefore more likely to peak later. But I think the "good enough to stay in the league" effect is larger than that, although I have no proof. Also, the HOF pitchers' peak age dropped only 0.08 years from the non-HOFers, so the effect I cite seems to hold only for hitters.)
Finally, there's selective sampling on individual seasons. A player who falls sharply and suddenly won't get enough playing time to qualify for Bradbury's study that year. And so, a plot of his career will be gentler at the right side. His true curve would be nearly vertical between his next-to-last season and his last season. But, since Bradbury doesn't consider his last season, the study won't see that vertical drop, and the quadratic will be gentler, with its peak to the right of where it would be otherwise.
Try this yourself: draw an aging curve that peaks, drops a bit, then falls off vertically. Draw the best fit symmetrical curve on it.
Now, draw the same curve again, but, instead of the vertical line, have it just end before the vertical line starts. Draw the best-fit symmetrical curve on this second one. You'll see it peaks later than when the vertical line was there.
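The same pencil-and-paper exercise can be roughed out in Python. Again, everything here is invented for illustration: a curve that rises and falls by 1 unit a year around a peak at age 28, a final season at age 35 that crashes to -30, and a hypothetical `quadratic_peak` helper that does the least-squares quadratic fit.

```python
def quadratic_peak(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns the x of the vertex."""
    n = len(xs)
    mean_x = sum(xs) / n
    t = [x - mean_x for x in xs]  # center the ages for numerical stability
    s1, s2, s3, s4 = (sum(v ** k for v in t) for k in (1, 2, 3, 4))
    rhs = [sum(ys),
           sum(v * y for v, y in zip(t, ys)),
           sum(v * v * y for v, y in zip(t, ys))]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    # Normal equations for (a, b, c), solved for b and c by Cramer's rule
    A = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    d = det3(A)
    b = det3([[A[i][0], rhs[i], A[i][2]] for i in range(3)]) / d
    c = det3([[A[i][0], A[i][1], rhs[i]] for i in range(3)]) / d
    return mean_x - b / (2 * c)  # undo the centering


ages = list(range(22, 36))             # ages 22 through 35
curve = [-abs(a - 28) for a in ages]   # peaks at 28, drops 1 unit a year
curve[-1] = -30                        # the "vertical" final season at age 35

# Fit once with the crash season, once with it cut off (as if it never qualified)
peak_with_crash = quadratic_peak(ages, curve)
peak_truncated = quadratic_peak(ages[:-1], curve[:-1])

print(f"with the crash season:    {peak_with_crash:.2f}")
print(f"without the crash season: {peak_truncated:.2f}")
```

With these numbers, the truncated version peaks at 28, and the version that includes the crash peaks somewhat earlier. How much earlier depends entirely on how deep you make the crash, so this only shows the direction of the effect.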
(Again, in fairness: Bradbury ran a version of his study in which there was no season minimum for plate appearances or batters faced – just the career minimums – and the results were similar. I've explained why I think, in theory, the minimums should skew the results, but I have to admit that, in real life, they didn't. There are perhaps some other reasons it didn't happen – perhaps a lot of the effect comes from the "vertical" players released in spring training, so they didn't make the study at all – but still, the results do seem to contradict this third theory of mine.)
So you've got three ways in which the study may have made assumptions or simplifications that forced the peak age to be higher than it should be:
-- assuming symmetry;
-- selective sampling of long careers;
-- selective sampling of seasons.
In that light, my conclusion is that Bradbury's methodology might yield a reasonable approximation, but not much more than that. I think the study correctly identifies the basic trend, and is probably correct within a couple of years, but I wouldn’t bet on it being any closer than that.