The Bradbury aging study, re-explained
A few days ago, J.C. Bradbury responded to my recent post on his age study.
Bradbury had authored a study claiming that hitters peak at age 29.4, contradicting other studies that showed a peak around 27. His study was based on the records of all batters playing regularly between ages 24 and 35. I argued that, by choosing only players with long careers progressing to a relatively advanced age, his results were biased towards players who peak late -- because, after all, someone with the same career trajectory, just starting a few years earlier, would be out of baseball by 35 and therefore not make the study.
In response, Bradbury denies that selective sampling is a problem. He writes,
"Phil Birnbaum has a new theory as to why I’m wrong (I suspect it won’t be his last)."
Actually, it's not a new theory. I mentioned it at exactly the same time and in the same post as another theory, last April. Bradbury actually linked to that post a few days ago.
Also, the reason "it won't be my last" is that, like many other sabermetricians, I am curious to find out why there's a difference between Bradbury's findings, which find a peak age of 29+, and many previous studies, which find a peak age of 27. They can't both be correct, and the way to resolve the contradiction is to suggest reasons and investigate whether they might be true.
But, Bradbury also said that I showed "a serious lack of understanding of the technique I employed." He's partially right -- I did misunderstand what he did. After rereading the paper and playing around with the numbers a bit, I think I have a better handle on it now. In this post, I'm going to try to explain it (and why I still believe it's biased). Please let me know if I've got anything wrong.
Previously, I had incorrectly assumed that Bradbury's study worked like other aging studies I've seen (such as Justin Wolfers', or Jim Albert's (.pdf)). In those other studies, the authors took a player's performance over time, smoothed it out into a quadratic, and figured out the peak for each player.
Then, after doing that for a whole bunch of players, those other studies would gather all the differently shaped curves, and analyze them to figure out what was going on. They implicitly assumed that every player has his own unique trajectory.
Bradbury's study doesn't do that. Instead, Bradbury uses least-squares to estimate the best single trajectory for *every batter in the study*. That's 450 players, all with exactly the same curve, based on the average.
According to this model, the only difference between the players is that some players are more productive than others. Otherwise, every batter has exactly the same shaped curve. The only difference the model allows, between the curves of different players, is vertical movement, up for a better player, down for a worse one.
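To make the shared-curve idea concrete, here's a rough sketch (with made-up numbers, not Bradbury's actual data or code): two players with opposite-shaped careers, fit by least squares with one common quadratic in age and a separate intercept for each player.

```python
import numpy as np

# Hypothetical toy data: two players with opposite-shaped careers.
ages_a = np.arange(22, 34)                     # "early peaker" -- true peak 24
perf_a = 100 - 0.8 * (ages_a - 24) ** 2
ages_b = np.arange(22, 34)                     # "late peaker" -- true peak 31
perf_b = 120 - 0.8 * (ages_b - 31) ** 2

# Shared-curve model: perf = (player's own intercept) + b1*age + b2*age^2.
# Only the intercept differs by player; the curve's shape is common.
ages = np.concatenate([ages_a, ages_b])
perf = np.concatenate([perf_a, perf_b])
dummy_a = np.concatenate([np.ones_like(ages_a), np.zeros_like(ages_b)])
dummy_b = 1.0 - dummy_a
X = np.column_stack([dummy_a, dummy_b, ages, ages ** 2])

coef, *_ = np.linalg.lstsq(X, perf, rcond=None)
b1, b2 = coef[2], coef[3]
peak_age = -b1 / (2 * b2)   # vertex of the one shared quadratic
```

The fitted `peak_age` lands midway between the two players' true peaks: the regression averages their shapes into a single curve, and the intercepts just slide that curve up and down.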
For instance: take Carlos Baerga, whose career peaked early, in his early 20s, with a short tail on the left and a long tail on the right. Then take Barry Bonds, whose career is the opposite: he peaked late, with a long tail on the left and a short tail on the right.
What Bradbury's model does is take both curves, put them in a blender, and come out with two curves that look exactly the same, peaking in the late 20s. The only difference is that Bonds' is higher, because his level of performance is better.
The model fits 450 identical curves to the actual trajectories of the 450 players. They can't be particularly good fits, because they're all the same. If you look at those 450 fitted curves, they're like a vertical stack of 450 identical boomerangs: some great hitter at the top, some really crappy hitter at the bottom, and the 448 other players in between.
I can pull a boomerang off the top, and show you, this is what Barry Bonds looks like. The best fit is that he started low, climbed until he reached 29 or so, then started a symmetrical decline (the model assumes symmetry). You'll ask, "what does Carlos Baerga look like?" I'll say, "it's exactly the same as Barry Bonds, but lower." I'll take my Barry Bonds boomerang, and lower my arm a couple of inches. Or, I can just pull the Baerga boomerang out of the middle of the stack.
(One more way of putting it. See this chart? This is how Justin Wolfers represents the careers of a bunch of great pitchers. He smoothed the actual trajectories, but modeled that every pitcher gets his own peak age, and his own steepness of curve. But for this study, they would all be the same shape, just one stacked above the other.)
Now, it seems to me that the model is way oversimplified. It's obviously false that all players have the same trajectory and the same peak age. People are different. They mature at different rates, both in raw physical properties, and in how fast they learn and adapt. Indeed, this is something the study acknowledges:
"Doubles plus triples per at-bat peaks 4.5 years later for Hall-of-Famers, which indicates that elite hitters continue to improve and maintain some speed and dexterity while other players are in decline."
So, implicitly, even Bradbury admits that the model's assumptions are wrong: some players age differently than others.
However, even if the model is wrong in its assumptions and in how it predicts individual players, it's possible to argue that the composite player it spits out is still reasonable.
For instance, suppose you have three people. One is measured to be four feet tall, one five feet, and one six feet. There are two ways you can get the average. You can just average the three numbers, and get five feet.
Or, you can create a model, an unrealistic model, that says that all three are really the same height, and any discrepancies are due to uncorrelated errors by the person with the measuring tape. If you run a regression to minimize the sum of squares of those errors, you get an estimate that all three people are actually ... five feet.
The model is false. The three people aren't really of equal height, and nobody is so useless with a tape measure that their observations would be off by that much. But the regression nonetheless gives the correct number: five feet. And so you'll be OK if you use that number as the average, so long as you don't actually assume that the model matches reality, that the six-foot guy is really the same height as the four-foot guy. Because there's no evidence that they are -- it was just a model that you chose.
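In code, that "wrong model, right average" point is just the fact that least squares on a single shared parameter recovers the ordinary mean -- a trivial sketch:

```python
import numpy as np

# Unrealistic model: all three people are "really" the same height h,
# and every measurement is h plus an uncorrelated error.
heights = np.array([4.0, 5.0, 6.0])   # feet
X = np.ones((3, 1))                   # one shared parameter: h

h_hat, *_ = np.linalg.lstsq(X, heights, rcond=None)
# h_hat[0] comes out to 5 feet -- the ordinary average
```

The model's assumption is false, but the number it produces is exactly the average you would have computed directly.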
I think that's what's happening here. It's obvious that the model doesn't match reality, but it has the side effect of creating a composite average baseball player, whose properties can be observed. As long as you stick to those average properties, and don't try to assume anything about individual players, you should be OK. And that's what Bradbury does, for the most part, with one exception.
A consequence of the curves having the same shape is that declines are denominated in absolute numbers, rather than percentages of a player's level. If the model says you lose 5 home runs between age X and age Y, then it assumes *everyone* loses 5 home runs, everyone from Barry Bonds to Juan Pierre -- even if Juan Pierre didn't have 5 home runs a year to lose!
If Bonds is a 30 home run guy at age X, he's predicted to drop to 25 -- that's a 17% decline. If Juan Pierre is a 5 home run guy at age X, he's predicted to drop to 0 -- a 100% decline.
In real life, that's probably not the way it works -- players probably drop closer to the same percentage than by the same amount. Table VII of the paper says that a typical hitter would lose about half his homers (on a per PA basis) between 30 and 40. If Bradbury used a season rate of 16 homers as "typical," that's an 8 HR decline. But what about players who hit only 4 homers a year, on average? The model predicts them dropping to minus 4 home runs!
Now, that's a bit of an unfair criticism. The text of the study doesn't explicitly argue that a Bonds will drop by the same number of home runs as a Baerga, even though the study deliberately chose a model that says exactly that. Remember, the model is unrealistic, so as long as you stick to the average, you're OK. Bonds and Pierre are definitely not the average.
But, then, why does Bradbury's Table VII deal in percentages? The model deals in absolutes. Bradbury obtained the percentages by applying the absolutes to a "typical" player, presumably one close to average. So why not put "-8 HR" in that cell, rather than "-48.95%"?
By showing percentages, there's an unstated implication: since the model shows an average player with 16 HR dropping to 8, you can extrapolate to say that a player with 40 HR will drop to 20. But that would have to be backed up by evidence or argument. And the paper provides neither.
So, to summarize the model:

-- it assumes all players have the same peak age, and the same declines from their peak (which is another way of saying that it assumes that all players have the same shape of trajectory).
-- it does allow some players (Barry Bonds) to have a higher absolute peak than others (Jose Oquendo), but they still have the same shape of career.
-- it assumes that all players rise and decline annually by the same absolute amount. In the agespan it takes for a 10-triple player to decline to 5 triples, a 6-triple player will decline to 1 triple, and Willie Aikens will decline to -5 triples.
What can you get out of a model like that, with its unrealistic assumptions? I think that you can reasonably look at the peak and shape as applied to some kind of hypothetical composite of the players used in the study. But I don't think you can go farther than that, and make any assumptions about other types of players.
So: when Bradbury's study comes up with the result that his sample of players peaked at 29.5 years (for Linear Weights), I think that's probably about right -- for his sample of players. When he says that the average home run hitter loses 8 home runs between 30 and 40, I think that's probably about right too -- for his sample of players.
My main argument is not that the model is unrealistic, and it's not that there's something wrong with the regression used to analyze the model. It's that the sample of players that went into the model is biased, and that's what's causing the peak to be too high.
Bradbury's model works for his sample -- but not for all baseball players, just the ones he chose. Those were the ones who, in retrospect, had long careers.
To have a long career, you have to keep up your performance for many years. To keep up your performance for many years, you need to have a slower decline than average. If you have a slower decline than average, a higher proportion of your value comes later in your career. If a higher proportion of your value comes later in your career, that means that you'll have an older-than-average peak.
So choosing players with long careers results in a peak age higher than if you looked at all players.
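That survivorship argument is easy to check with a toy simulation (every number here is invented for illustration -- this is not Bradbury's model or data). Give each player a symmetric quadratic career with true peak ages averaging 27, then keep only the players still above replacement level at both 24 and 35, roughly mirroring the study's selection rule:

```python
import random

random.seed(0)

DECLINE = 0.1   # value lost per (year from peak)^2 -- made-up curvature

def value(peak_age, peak_level, age):
    # Symmetric quadratic career: value above replacement at a given age.
    return peak_level - DECLINE * (age - peak_age) ** 2

all_peaks, sampled_peaks = [], []
for _ in range(100_000):
    peak_age = random.gauss(27, 2)     # population's true peaks average 27
    peak_level = random.gauss(10, 3)   # value above replacement at peak
    all_peaks.append(peak_age)
    # The selection rule: still above replacement at both 24 and 35.
    if value(peak_age, peak_level, 24) > 0 and value(peak_age, peak_level, 35) > 0:
        sampled_peaks.append(peak_age)

mean_all = sum(all_peaks) / len(all_peaks)
mean_sampled = sum(sampled_peaks) / len(sampled_peaks)
# mean_sampled comes out measurably higher than mean_all
```

In this sketch, the selected group's average peak age comes out noticeably above 27, even though nothing about the population changed -- only the selection.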
Bradbury disagrees. He thinks that Hall of Fame players may have a significantly different peak than non-Hall-of-Fame players, but doesn't think that players with long careers might have a different peak than players with short careers.
That really doesn't make sense to me. But Bradbury has evidence. In his response to my post, he reran his study, but for all players with a minimum of 1000 PA, instead of his previous minimum 5000 PA. That is, he added players with short careers.
He found no difference in the peak age.
That's a pretty persuasive argument. I argued A, Bradbury argued B, and the evidence appears to be consistent with B. No matter how good my argument sounds, if the evidence doesn't support it, I better either stop arguing A, or explain why the evidence isn't consistent with B.
Still, the logic didn't seem right to me. So I spent a couple of days trying to replicate Bradbury's study. I wasn't able to duplicate his results perfectly, but many of them are close. And I'm not sure, but I think I have an idea about what's going on -- why the evidence is actually consistent with A after all. That is, why Bradbury's 1000+ study comes up with a peak of 29 years, while other studies have come up with 27.
I'll get to that in the next post.