Sabermetric Research: Why you can't calculate aging trajectories with a standard regression

I found myself in a little Twitter discussion last week about using regression to analyze player aging. I argued that regression won't give you accurate results, and that the less elegant "delta method" is the better way to go.

Although I did a small example to try to make my point, Tango suggested I do a bigger simulation and a blog post. That's this.

(Some details if you want:

For the kind of regression we're talking about, each season of a career is an input row. Suppose Damaso Griffin created 2 WAR at age 23, 2.5 WAR at age 24, and 3 WAR at age 25. And Alfredo Garcia created 1, 1.5, and 1.5 WAR at ages 24, 25, and 26. The file, with columns for WAR, age, and player name, would look like:

2 23 Damaso Griffin
2.5 24 Damaso Griffin
3 25 Damaso Griffin
1 24 Alfredo Garcia
1.5 25 Alfredo Garcia
1.5 26 Alfredo Garcia

And so on, for all the players and ages you're analyzing. (The names are there so you can have dummy variables for individual player skills.)

You take that file and run a regression, and you hope to get a curve that's "representative" or an "average" or a "consolidation" of how those players truly aged.)
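As a rough illustration of that input layout, here is a minimal Python sketch that turns the six example rows into regression inputs: one 0/1 dummy column per player, followed by age and age squared. The function name `design_rows` and the choice of a quadratic age term are assumptions made for this sketch, not details taken from any particular study.

```python
# Season rows from the example above: (WAR, age, player).
seasons = [
    (2.0, 23, "Damaso Griffin"),
    (2.5, 24, "Damaso Griffin"),
    (3.0, 25, "Damaso Griffin"),
    (1.0, 24, "Alfredo Garcia"),
    (1.5, 25, "Alfredo Garcia"),
    (1.5, 26, "Alfredo Garcia"),
]

def design_rows(seasons):
    """Return (players, X, y) for a dummy-variable regression.

    Each row of X holds one 0/1 dummy per player (alphabetical order),
    then age and age squared. The quadratic term is an assumption.
    """
    players = sorted({name for _, _, name in seasons})
    X, y = [], []
    for war, age, name in seasons:
        dummies = [1.0 if name == p else 0.0 for p in players]
        X.append(dummies + [float(age), float(age) ** 2])
        y.append(war)
    return players, X, y

players, X, y = design_rows(seasons)
for row, war in zip(X, y):
    print(row, "->", war)
```

The rows of `X` and the values in `y` could then be handed to any ordinary least-squares routine. The dummies absorb each player's overall level, so the age columns are left to describe the shared shape.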

------

I simulated 200 player careers. I decided to use a quadratic (parabola), symmetric around peak age. I would have used just a linear regression, but I was worried that it might seem like the conclusions were the result of the model being too simple.

Mathematically, there are three parameters that define a parabola. For this application, they represent (a) peak age, (b) peak production (WAR), and (c) how steep or gentle the curve is.*

(*The equation is:

y = (x - peak age)^2 / -steepness + peak production.

"Steepness" is related to how fast the player ages, but note the direction: because it sits in the denominator, a higher steepness value means a gentler curve and slower decay. Assuming a player has a job only when his WAR is positive, his career length can be computed as twice the square root of (peak WAR * steepness). So, if steepness is 2 and peak WAR is 4, that's a 5.7 year career. If steepness is 6 and peak WAR is 7, that's a 13-year career.

You can also represent a parabola as y = ax^2+bx+c, but it's harder to get your head around what the coefficients mean. They're both the same thing ... you can use basic algebra to convert one into the other.)
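To make the algebra concrete, here is a small Python sketch of the curve, the career-length formula, and the conversion to y = ax^2+bx+c. The function names are made up for this sketch.

```python
import math

def war_at_age(age, peak_age, peak_war, steepness):
    """y = (x - peak age)^2 / -steepness + peak production."""
    return peak_war - (age - peak_age) ** 2 / steepness

def career_length(peak_war, steepness):
    """Years with positive WAR: twice the square root of (peak WAR * steepness)."""
    return 2.0 * math.sqrt(peak_war * steepness)

def to_standard_form(peak_age, peak_war, steepness):
    """Expand the curve into the coefficients (a, b, c) of a*x^2 + b*x + c."""
    a = -1.0 / steepness
    b = 2.0 * peak_age / steepness
    c = peak_war - peak_age ** 2 / steepness
    return a, b, c

print(round(career_length(4, 2), 1))   # the steepness 2, peak WAR 4 example
print(round(career_length(7, 6), 1))   # the steepness 6, peak WAR 7 example
```

Since a = -1/steepness, a squared-term coefficient closer to zero corresponds to a higher steepness value, which means a gentler curve and a longer career.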

For each player, I randomly gave him parameters from these distributions: (a) peak age normally distributed with mean 27 and SD 2; (b) peak WAR with mean 4 and SD 2; and (c) steepness (mean 2, SD 5; but if the result was less than 1.5, I threw it out and picked a new one).

I arbitrarily decided to throw out any careers of length three years or fewer, which reduced the sample from 200 players to 187. Also, I assumed nobody plays before age 18, no matter how good he is. I don't think either of those decisions made a difference.
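For readers who want to try something similar, here is a minimal Python sketch of the sampling step. It makes several assumptions: the seed is arbitrary, career length is measured with the continuous formula (twice the square root of peak WAR times steepness), a player whose peak WAR is zero or negative is treated as having no career, and the age-18 floor is left out. It will not reproduce the exact 187-player sample.

```python
import math
import random

def draw_player(rng):
    """Draw (peak_age, peak_war, steepness) from the distributions described above."""
    peak_age = rng.gauss(27, 2)
    peak_war = rng.gauss(4, 2)
    steepness = rng.gauss(2, 5)
    while steepness < 1.5:           # throw it out and pick a new one
        steepness = rng.gauss(2, 5)
    return peak_age, peak_war, steepness

def career_length(peak_war, steepness):
    """Years with positive WAR; zero if the player never gets above zero."""
    if peak_war <= 0:
        return 0.0
    return 2.0 * math.sqrt(peak_war * steepness)

def simulate(n_players=200, seed=0):
    """Draw n_players careers and keep those longer than three years."""
    rng = random.Random(seed)        # arbitrary seed, for repeatability only
    players = [draw_player(rng) for _ in range(n_players)]
    return [p for p in players if career_length(p[1], p[2]) > 3]

kept = simulate()
print(len(kept), "careers kept out of 200")
```

Different seeds will keep slightly different numbers of players, so the count printed here should not be expected to match the 187 above.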

Here's the plot of all 187 aging curves on one graph:

The idea, now, is to consolidate the 187 curves into one representative curve. Intuitively, what are we expecting here? Probably, something like, the curve that belongs to the average player in the list.

The average random career turned out to be peak age 26.9, peak WAR 4.19, and steepness 5.36. Here's a curve that matches those parameters:

That seems like what we expect, when we ask a regression to find the best-fit curve. We want a "typical" aging trajectory. Eyeballing the graph, it does look pretty reasonable, although to my eye, it's just a bit small. Maybe half a year bigger left and right, and a bit higher? But close. Up to you ... feel free to draw on your monitor what you think it should look like.

But when I ran the regression ... well, what came out wasn't close to my guess, and probably not close to your guess either:

It's much, much gentler than it should be. Even if your gut told you something different than the black curve, there's no way your gut was thinking this. The regression came up with a 19-year career. A career that long happened only once in the entire 187-player sample. We expected "representative," but the regression gave us 99.5th percentile.

What happened?

It's the same old "selective sampling"/"survivorship bias" problem.

The simulation decided that when a player's curve scores below zero, those seasons aren't included. It makes sense to code the simulation that way, to match real life. If Jerry Remy had played five years longer than he did, what would his WAR be at age 36? We have no idea.

But, with this simulation, we have a God's-eye view of how negative every player would go. So, let's include that in the plot, down to -20:

See what's happening? The black curve is based on *all* the green data, both above and below zero, and it lands in the middle. The red curve is based only on the green data above zero, so it ignores all the green negatives at the extremes.

If you like, think of the green lines as magnets, pulling the lines towards them. The green magnets bottom-left and bottom-right pull the black curve down and make it steeper. But only the green magnets above zero affect the red line, so it's much less steep.

In fact, if you scroll back up to the other graph, the one that's above zero only, you'll see that at almost every vertical age, the red line bisects the green forest -- there's about as much green magnetism above the red line as there is below it.

In other words: survivorship bias is causing the difference.
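The effect can be sketched in Python by fitting one pooled quadratic twice: once to every simulated season, negatives included, and once to the positive seasons only. This is an illustration under stated assumptions, not the regression used for the graphs: it has no player dummies, it keeps short careers, the age range 18 to 45 and the seed are arbitrary, and `fit_quadratic` is a hand-rolled least-squares helper written for this sketch.

```python
import random

def draw_player(rng):
    """Draw (peak_age, peak_war, steepness) from the distributions in the post."""
    peak_age = rng.gauss(27, 2)
    peak_war = rng.gauss(4, 2)
    steepness = rng.gauss(2, 5)
    while steepness < 1.5:           # redraw, as the post describes
        steepness = rng.gauss(2, 5)
    return peak_age, peak_war, steepness

def season_points(player, ages=range(18, 46)):
    """God's-eye (age, WAR) points for one player, negatives included."""
    peak_age, peak_war, steepness = player
    return [(age, peak_war - (age - peak_age) ** 2 / steepness) for age in ages]

def fit_quadratic(points, center=27.0):
    """Least-squares fit of y = a*x^2 + b*x + c, with x = age - center."""
    sx = [0.0] * 5                   # sums of x^0 .. x^4
    sxy = [0.0] * 3                  # sums of y*x^0 .. y*x^2
    for age, y in points:
        x = age - center
        for k in range(5):
            sx[k] += x ** k
        for k in range(3):
            sxy[k] += y * x ** k
    # Augmented normal equations for the unknowns (c, b, a).
    m = [[sx[i + j] for j in range(3)] + [sxy[i]] for i in range(3)]
    for col in range(3):             # Gauss-Jordan elimination with pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c

rng = random.Random(0)               # arbitrary seed
players = [draw_player(rng) for _ in range(200)]
all_points = [pt for p in players for pt in season_points(p)]
observed = [(age, war) for age, war in all_points if war > 0]

a_all = fit_quadratic(all_points)[0]
a_obs = fit_quadratic(observed)[0]
print("squared-term coefficient, all seasons:     ", round(a_all, 3))
print("squared-term coefficient, positive seasons:", round(a_obs, 3))
```

Centering age at 27 only keeps the arithmetic well behaved; it does not change the squared-term coefficient. A coefficient closer to zero means a gentler curve, so comparing the two printed values shows how much steepness is lost when the below-zero seasons are dropped.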

------

What's really going on is the regression is just falling for the same classic fallacy we've been warning against for the past 30 years! It's comparing players active (above zero) at age 27 to players active (above zero) at age 35. And it doesn't find much difference. But that's because the two sets of players aren't the same.

One more thing to make the point clearer.

Let's suppose you find every player active last year at age 27, and average their performance (per 500PA, or whatever). And then you find every player active last year at age 35, and average their performance.

And you find there's not much difference. And you conclude, hey, players age gracefully! There's hardly any dropoff from age 27 to age 35!

Well, that's the fallacy saberists have been warning against for 30 years, right? The canonical (correct) explanation goes something like this:

"The problem with that logic is that it doesn't actually measure aging, because those two sets of players aren't the same. The players who are able to still be active at 35 are the superstars. The players who were able to be active at 27 are ... almost all of them. All this shows is that superstars at 35 are almost as good as the league average at 27. It doesn't actually tell us how players age."

Well, that logic is *exactly* what the regression is doing. It's calculating the average performance at every age, and drawing a parabola to join them.

Here's one last graph. I've included the "average at each age" line (blue) calculated from my random data. It's almost a perfect match to the (red) regression line.
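The "average at each age" calculation is easy to sketch in Python. The example below reuses the two-player file from the top of the post rather than the simulated data, and the function name is made up for this sketch.

```python
from collections import defaultdict

# (WAR, age, player) rows from the two-player example earlier in the post.
seasons = [
    (2.0, 23, "Damaso Griffin"),
    (2.5, 24, "Damaso Griffin"),
    (3.0, 25, "Damaso Griffin"),
    (1.0, 24, "Alfredo Garcia"),
    (1.5, 25, "Alfredo Garcia"),
    (1.5, 26, "Alfredo Garcia"),
]

def mean_war_by_age(seasons):
    """Average WAR of whoever happens to be active at each age."""
    totals = defaultdict(lambda: [0.0, 0])   # age -> [sum of WAR, count]
    for war, age, _ in seasons:
        totals[age][0] += war
        totals[age][1] += 1
    return {age: total / n for age, (total, n) in sorted(totals.items())}

print(mean_war_by_age(seasons))
```

In this toy file the average falls from 2.0 at age 23 to 1.75 at age 24, even though Griffin improved from 2 to 2.5. The drop comes entirely from Garcia joining the pool at 24, which is a change in who is being averaged, not a change in how anyone aged.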

------

Bottom line: all the aging regression does is commit the same classic fallacy we repeatedly warn about. It just winds up hiding it -- by complicating, formalizing, and blackboxing what's really going on.

Labels: aging, regression

11 Comments:

At Monday, November 18, 2019 6:24:00 PM, Anonymous said...: I'm building a probabilistic age estimator based on understanding references to Damaso Griffin and Alfredo Garcia in 2019. It, uh, thinks I'm pretty old. ;)

Friend of the site Tyler D.
At Monday, November 18, 2019 6:51:00 PM, Phil Birnbaum said...: You're not old! You're a student of baseball history. :)
At Monday, November 18, 2019 10:33:00 PM, Mike said...: I've thought about this a bit myself over the years, and I think it's likely that players who put up (say) 2 WAR at age 20 have differently *shaped* careers than players who put up 5 WAR, particularly on the back end.
I haven't made the effort to get the data to try to examine that, but I really should.
At Tuesday, November 19, 2019 2:15:00 AM, Jonas said...: Correct me if I'm wrong, but if teams are picking players optimally, without any age bias, then given a sufficiently large sample, the average performance for each age should be the same, i.e., league average. Right?
At Tuesday, November 19, 2019 8:13:00 AM, Rodney Fort said...: Hi Phil.

The question seems to be the impact of aging on performance. And it seems you find that the relationship is “contaminated” by experience varying across ages. I’ve used the following to untangle age and experience when thinking about player value (and salary, too).

Just put in age and its square, and experience and its square. Then, the regression gives you the non-linear impact of age on performance, holding experience constant. It’s also fair to include other impacts by player (injury for example).

Or, in work with Roger Noll, we regressed age and its square on experience and used the residual of experience and its square. A bit less intuitive interpretation since now it is “more experienced than age would suggest” or “less experienced than age would suggest”.

Or it’s entirely possible I’ve missed the point (you know that I sometimes do). Cheers. Rod Fort
At Friday, November 22, 2019 6:07:00 PM, Alex said...: I'm not familiar with baseball research, but has anyone done a random effects regression to calculate aging curves? I imagine someone must have. From reading through the link you have on the delta method it seems like the key difference is that the delta method compares each player to himself, whereas the standard regression just throws everyone together. A random effects regression would be more similar in that it compares individual players to themselves.
At Saturday, November 23, 2019 1:38:00 PM, Phil Birnbaum said...: Jonas,

Hmmm ... I don't think that's necessarily the case. Suppose ALL 27 year olds were better than ALL 26 year olds. If there aren't enough 27s to fill a whole league, teams will also have 26s, but the average performance of 27s will be higher than the average performance of 26s.

Sorry for the late response!
At Saturday, November 23, 2019 1:41:00 PM, Phil Birnbaum said...: Hi, Rod,

Sorry so delayed getting back to you!

I think experience and age are different issues ... it's a good point that experience can "contaminate" age if what you're really looking for is just age-related performance changes without the effects of experience or coaching.

I was getting at something else, that when you mix different careers, the regression is unable to combine trajectories and it just smooths the average at each age, so what you get is the effects of the population change rather than the age changes.

Phil
At Saturday, November 23, 2019 1:45:00 PM, Phil Birnbaum said...: Hi, Alex,

Sorry for the delay getting back to this.

Not sure what a random effects regression is.

The biggest difference is the delta compares 25s only to 26s, rather than comparing 25s to 35s (and all other ages). By doing it that way, it lessens the effects of attrition of the lesser players.

What you want is to compare a player to himself. If you compare all 25s to all 35s, the pool has shrunk from (say) 200 players to 40 (better) players, so you're comparing the better 40 players to themselves, but also comparing the worse 160 players to the better 40 players.

By doing only 25 to 26, the pools might be 200 players to 190 players, so you're mostly comparing players to themselves.

Then you compare 26 to 27, 27 to 28, and so on, and combine all the results.

Phil
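The steps described in the comment above can be sketched in Python as follows. This is one simple reading of the delta method, with assumptions: each player's change counts equally (no weighting by playing time), ages with paired data are contiguous, and the resulting curve is expressed relative to the youngest age in the data. The function name and the input layout are made up for this sketch.

```python
from collections import defaultdict

def delta_method(careers):
    """careers: {player: {age: WAR}}.

    Returns (deltas, curve): deltas maps each age to the mean change from
    that age to the next among players active at both ages; curve chains
    those changes, starting from 0.0 at the youngest age.
    """
    changes = defaultdict(list)
    for by_age in careers.values():
        for age, war in by_age.items():
            if age + 1 in by_age:    # compare a player only to himself
                changes[age].append(by_age[age + 1] - war)
    deltas = {age: sum(v) / len(v) for age, v in sorted(changes.items())}
    curve, level = {}, 0.0
    for age, change in deltas.items():
        curve.setdefault(age, level)
        level += change
        curve[age + 1] = level
    return deltas, curve

careers = {
    "Damaso Griffin": {23: 2.0, 24: 2.5, 25: 3.0},
    "Alfredo Garcia": {24: 1.0, 25: 1.5, 26: 1.5},
}
deltas, curve = delta_method(careers)
print(deltas)
print(curve)
```

With the two-player example from the post, every year-to-year change is measured within a single player, so a weaker player entering or leaving the pool does not by itself move the curve.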
At Sunday, November 24, 2019 2:11:00 PM, Jonathan Judge said...: Correct. A random effects regression in theory can help solve a few different problems presented by an aging curve: (1) players are always tracked relative to themselves; (2) players of any career length still borrow strength from the other players in the sample, including those with longer careers; (3) because the random effects automatically shrink the values toward their probable mean for all players for all years, you are working with more accurate underlying estimates, which helps everything. You may still need an additional control for survivorship, but this is an approach worth looking at. You could model the aging effect itself as a spline or an additional random effect, but the values I suspect would be fairly similar.
At Sunday, December 08, 2019 8:56:00 PM, Alex said...: Phil - I tried out some random effects regression stuff and it worked relatively well. I sent you an email about it, not sure if it's still an address you use.


Monday, November 18, 2019
