Sabermetric Research: Can we measure player improvement over the decades?

Conventional wisdom is that baseball players are getting better and better over the decades. How can we know if that’s true?

We can’t go by hitting stats, because the pitchers are improving just as much as the batters. We can’t go by pitching stats, either, because the batters are improving just as much as the pitchers. It could be that players have improved so much that if Babe Ruth came back today, he’d only hit like, say, Tino Martinez, or maybe Raul Mondesi. But can we prove that?

One way to measure improvement is to look at evidence that doesn’t involve other players. If the Babe regularly hit a 90-mph fastball 400 feet, but Martinez and Mondesi don’t, that’s good evidence that Ruth is better. If the best pitchers in 1960 could throw fastballs at 95 miles per hour, and that’s still their speed today, that might be evidence that there hasn’t been much improvement. But there are problems with this approach. For one, we don’t have enough data on pitch speeds or bat speeds to do valid comparisons. For another, it’s hard to compare intangibles, like the average deceptive movement on a slider today versus 15 years ago.

But it would be nice if there were a way to measure improvement just from the statistics, from the play-by-play data. One such attempt to measure batter improvement was a study by Dick Cramer in SABR’s 1980 “Baseball Research Journal”. For every year in baseball history, Cramer compared each player’s performance that year to the same player’s performance the year before. Cramer found, for instance, that the same group of batters hit .004 worse in 1957 than they did in 1956. The inference is that the competition got tougher: by this estimate, baseball players in general improved by four points in 1957.
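To make the matched-player idea concrete, here is a minimal Python sketch. The player labels and batting averages are made up for illustration, and the simple unweighted average is my assumption; Cramer's actual study may have weighted or filtered players differently.

```python
def matched_pair_change(year1, year2):
    """Average change in batting average for players who appear in both seasons.

    year1, year2: dicts mapping player -> batting average (hypothetical data).
    Unweighted mean; an assumption, not necessarily Cramer's weighting.
    """
    common = sorted(set(year1) & set(year2))
    if not common:
        raise ValueError("no players appear in both seasons")
    return sum(year2[p] - year1[p] for p in common) / len(common)


# Made-up averages. Player D leaves after 1956 and player E arrives in 1957,
# so only A, B and C are compared.
avg_1956 = {"A": 0.300, "B": 0.280, "C": 0.250, "D": 0.310}
avg_1957 = {"A": 0.294, "B": 0.277, "C": 0.247, "E": 0.265}

change = matched_pair_change(avg_1956, avg_1957)

# The method's reading: if the same hitters got worse, the league got better
# by the same amount.
implied_league_improvement = -change

print(f"matched players changed by {change:+.3f}")
print(f"implied league improvement: {implied_league_improvement:+.3f}")
```

With these invented numbers, the three returning players hit four points worse on average, which the method reads as a four-point improvement in the league.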

Of course, there’s a fair bit of randomness in this measure, and so when Cramer graphs it over the years, there are lots of little ups and downs. In general, the line rises a bit every year, but, like a stock price graph, there are lots of little adjustments. In fact, the method works quite well in identifying those seasons when there were reasons for the level of play getting better or worse. There’s a sharp drop in the mid-1940s, when many players went off to war, and an increase in 1946. There are drops in 1961 and 1977 for expansion.

But while the method may look like it works, it doesn’t. As Bill James explained, the study doesn’t take into account that players’ performances change naturally with age. A group of 26-year-olds will play better, on average, than they did at 25 – not because the league is worse, but because those players themselves are better. And the reverse is true for 36-year-olds; they’ll play worse than when they were 35, independent of the rest of the league.

Now, since the method covers all players, not just one age, you’d think it would all cancel out. But it doesn’t, necessarily. There’s no guarantee that if you take all the players in the league one year, and compare their performance the next year, that the age-related performance change will be zero. And it has to be zero for this to work. Even if it’s merely close to zero rather than exactly zero, the error repeats every year, and that ruins the study. So when you compare 1974 to 1973, and find a difference of 2 points in batting average, that’s the sum of two things:

-- The improvement in the league, plus

-- The age-related improvement/decline in the players studied.

And there’s no easy way to separate the two. However, if you assume that the age-related term is constant, you can see the year-to-year relative improvement. That’s why the method correctly shows a decline during the war and an improvement after. You can see which years had the biggest changes, but you can’t figure out just how much that change was, or even if it was an improvement or decline. On page 440 of his first Bill James Historical Baseball Abstract (the 1986 hardcover), Bill James had a long discussion of the method and why it doesn’t work. He wrote:

"Suppose you were surveying a long strip of land with a surveying instrument which was mismarked, somehow, so that when you were actually looking downward at an 8% grade, you thought you were looking level. As you moved across the strip of land, what would you find? Right. You would wind up reporting that the land was rising steadily, at a very even pace over a large area. You would report real hills and real valleys, but you would report these things as occurring along an illusory slope.

"And what does Cramer report? Average batting skill improving constantly at a remarkably even rate, over a long period of time."

Cramer’s graph is a jagged, irregular line, at about 45 degrees. What James is saying is that the jaggedness is correct, but the 45 degrees is likely wrong. Imagine rotating the jagged line so that it’s at 60 degrees, which means a stronger batting improvement than Cramer thought. That’s consistent with the data. Rotate it down so it’s now at 20 degrees, which means slight improvement. That’s also consistent. Rotate it down to zero degrees, so it’s roughly horizontal, meaning no improvement. That’s still consistent. And even rotate it so it slopes down, meaning a gradual decline in batting ability over time. That, also, is perfectly consistent with the statistical record.
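The rotation argument can be sketched in a few lines of Python. The yearly estimates below are invented, not Cramer's figures, and `age_term` is a hypothetical constant standing for the unknown average age-related change in the matched players (negative if they decline on average). Under that assumption, the true league change each year is Cramer's estimate plus the age term.

```python
from itertools import accumulate


def league_level(cramer_estimates, age_term):
    """Cumulative league level implied by a given constant age term.

    cramer_estimates: yearly league improvement as the matched-player method
    reports it (made-up numbers below).
    age_term: assumed constant age-related change in the matched players.
    """
    return list(accumulate(c + age_term for c in cramer_estimates))


# Invented yearly estimates: a sharp drop in year 3, a rebound in year 4.
cramer = [2.0, 1.0, -3.0, 4.0, 1.0, 1.0]

for age_term in (1.0, 0.0, -1.0, -2.0):
    levels = league_level(cramer, age_term)
    print(f"age term {age_term:+.1f}: {levels}")
```

Every choice of `age_term` keeps the same dip in year 3 and the same rebound in year 4; only the overall slope changes. With an age term of zero the line ends six points up, with -1.0 it ends flat, and with -2.0 it ends six points down. The data alone can't tell you which one is right.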

Here’s a hypothetical situation showing how this might work. Suppose that every player enters the league with 15 "points" of talent. This declines by one point every year of the player’s career – he retires 15 years later, after he’s dropped to zero. If you were to draw this as a graph, it would be a straight line going down at 45 degrees – the older the player, the worse the talent. Now, that curve is the same every year, so it’s obvious that the talent level of the league is absolutely steady from year to year – right? But if you look at any player who is in the league in consecutive seasons, his talent drops by one point. By Cramer’s logic, it looks like the league is improving by one point a year. But it isn’t – it’s standing still!
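Here is a small simulation of that hypothetical league, as a sketch. I'm assuming one player enters each season, and that a player's last active season is the one where his talent is 1 (he retires on reaching zero); neither detail changes the conclusion.

```python
ENTRY_TALENT = 15  # every player enters with 15 points and loses one per year


def league_roster(season):
    """Active players in a season, keyed by the season they entered.

    Assumes one new player per season; a player is active while talent >= 1.
    """
    first_entry = season - ENTRY_TALENT + 1
    return {entry: ENTRY_TALENT - (season - entry)
            for entry in range(first_entry, season + 1)}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def matched_change(season):
    """Average talent change for players active in both season and season + 1."""
    now, nxt = league_roster(season), league_roster(season + 1)
    returning = set(now) & set(nxt)
    return mean(nxt[p] - now[p] for p in returning)


for season in range(20, 25):
    league_avg = mean(league_roster(season).values())
    print(f"season {season}: league average {league_avg}, "
          f"returning players changed by {matched_change(season)}")
```

The league average is the same every season, yet the returning players always drop by exactly one point. Read the way the matched-player method reads it, that looks like a league improving by a point a year.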

If you don’t like the declining talent curve, here’s one that’s more realistic. Suppose that again, players come into the league at 15 talent points. But this time, they rise slowly to 25 talent points at age 27, then decline down to zero, at which point they retire after a 16-year career. The exact same analysis holds. If you were to actually put some numbers in, and look at the players who play consecutive seasons, you would find that, on average, they drop one point a year. (Mathematically, if every player declines from 15 to zero in fifteen years, the average must be minus one per year.) It still looks like the league is improving, when in fact it’s standing still.
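To put some numbers in, here is one possible version of that curve. The specific shape is my assumption: entry at age 22, a straight-line rise of two points a year to 25 at age 27, then a straight-line decline of 2.5 points a year to zero at age 37, for 16 seasons in all.

```python
# Hypothetical talent by season of career (ages 22 through 37).
CURVE = [15, 17, 19, 21, 23, 25,                           # rise to the peak at 27
         22.5, 20, 17.5, 15, 12.5, 10, 7.5, 5, 2.5, 0]     # decline to zero

# Year-to-year changes for one career. In a steady league with one player at
# each stage of this curve, these are also the changes of all returning players.
steps = [later - earlier for earlier, later in zip(CURVE, CURVE[1:])]

mean_step = sum(steps) / len(steps)

# The individual ups and downs cancel: only the endpoints matter.
endpoint_shortcut = (CURVE[-1] - CURVE[0]) / len(steps)

print(f"{len(CURVE)} seasons, {len(steps)} year-to-year changes")
print(f"average change per year: {mean_step}")
print(f"endpoint shortcut:       {endpoint_shortcut}")
```

The young players gain two points a year and the older ones lose 2.5, but the average over the 15 year-to-year changes still comes out to minus one, because the career starts at 15 and ends at zero no matter what happens in between.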

It’s easy to come up with similar examples where the average consecutive-year player declines even though the league itself is declining, or where the average player improves even while the league is getting better.

The bottom line is that even if players appear to be declining, it does not mean the league is improving -- or vice versa.

Why do I mention this now, 26 years after Cramer’s study? Because this year, in the excellent Baseball Prospectus book “Baseball Between the Numbers,” Nate Silver repeats the same study in Chapter 1. Silver figures that if Babe Ruth played in the modern era, he’d look like – yes, Tino Martinez or Raul Mondesi. For personal, non-empirical, hero-worshipping reasons, I’m glad that’s not necessarily true.

7 Comments:

At Friday, August 04, 2006 10:01:00 AM, Tangotiger said...: A few years ago, I also did such a study, and also came up with a similar conclusion.

The problem is not the age issue, which can be corrected by applying an aging curve, or limiting your sample to some age group, like 23-29.

The problem is regression toward the mean. Because the guys you choose in your year-to-year matches more likely have performed above average in the first year, that performance is higher than his actual true talent (more good luck than bad luck). So, just by virtue of the regression, we expect his performance to drop. Not because the league got better, but simply because we factor out the luck.

I repeated my study, this time controlling for regression. What I found is that over the last 30 or 40 years, there's been very little change in performance. And that Babe Ruth would not be BABE RUTH, but more like a great hitter.

I'll publish it some day.
At Friday, August 04, 2006 10:54:00 AM, Phil Birnbaum said...: Hi, tangotiger,

Good point, never thought of that.

I guess even if you choose based on symmetrical criteria for the two years, you'll have that problem, because whether you get playing time the second year depends on performance the first year, and not vice-versa.
At Friday, August 04, 2006 12:05:00 PM, Tangotiger said...: Well, you'll always have a problem. For example, the more PA you get, the more likely you performed well. So, you've got a lot of selection bias to account for.

You might like this study.

It's not that big of a problem, year-to-year, but once you start chaining, those 0.5% error rates you have, over 50-80 years, become a 30% to 50% error rate.
At Friday, August 04, 2006 12:34:00 PM, Phil Birnbaum said...: Tango,

Interesting study. I'll have to give some thought to what to make of it.

Either players get better when they're given playing time (which is consistent with your "The Book" study that pinch hitters don't hit well), or managers are good at knowing which bench warmers have improved in the off-season.

Phil
At Friday, August 04, 2006 1:49:00 PM, Tangotiger said...: Or, it's pure selective sampling. He is hot for the first 200 PA, and so the manager perceives this as a change in performance level. He decides to give him 400 PA.

Or, injuries.

The key point is that PA itself is not something independent. The number of trials given is dependent on the prior performance!

You flip a coin, and you get heads 40 times out of 100. You decide to put it away for the year. You pick it up again next year, and you get 70 heads on 100 tries! Wow! Let me flip this sucker 100 more times. It comes up heads 50 times. End-of-season totals? 120 heads on 200 tries, for a rate of .600. (Last year it was .400 on 100 trials).

Nothing changed about the coin. But, the perceived change caused the manager to give the player more chances to play.
At Friday, August 04, 2006 1:57:00 PM, Phil Birnbaum said...: That makes sense too ... never thought of it.
At Monday, August 07, 2006 11:17:00 AM, Anonymous said...: There is a long discussion of this post at Baseball Think Factory


Thursday, August 03, 2006
