Replacement level talent vs. observations: a study
JC Bradbury has expressed some skepticism about the concept of "replacement value". In theory, replacement value is the level of talent that can be obtained quickly at the league minimum salary -- your best minor-leaguer, say, or the free agent who almost got an offer but didn't.
Generally, conventional sabermetric wisdom is that a replacement-level player is one who performs at a level 20 runs (or two wins) below average, if pro-rated to a full season. (Two wins below average is zero "wins above replacement," or "WAR".) By this standard, no team should play anyone expected to perform below this level.
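In arithmetic terms, that convention is a simple conversion. Here's a quick sketch (the 10-runs-per-win figure is the standard sabermetric rule of thumb; the function name is mine):

```python
RUNS_PER_WIN = 10            # standard rule of thumb: ~10 runs = 1 win
REPLACEMENT_RUNS_BELOW_AVG = 20  # replacement level: 20 runs below average

def wins_above_replacement(runs_above_average):
    """Convert a full-season runs-above-average figure to WAR."""
    return (runs_above_average + REPLACEMENT_RUNS_BELOW_AVG) / RUNS_PER_WIN

print(wins_above_replacement(0))    # league-average player: 2.0 WAR
print(wins_above_replacement(-20))  # replacement level: 0.0 WAR
```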
In his argument, Bradbury points out that many players have, in fact, performed below this level. In a recent post on replacement value, King Kaufman checked, and found that
"In the major leagues in 2010, 24.5 percent of all innings were thrown by pitchers who ended the season with a negative WAR."
(UPDATE: In the original version of this post, I had originally incorrectly painted Kaufman as a replacement-value skeptic, which he is not. See his comment below.)
The explanation, of course, is that the fact that they *performed* below replacement doesn't mean teams *expected* them to perform below replacement. Teams might have overestimated their abilities, or, more likely, the players just had bad years due to random chance. If you bet on heads ten times, some fair coins are going to land heads only 4 times out of 10. That doesn't mean there was anything wrong with your prior expectation that the coins were fair.
Anyway, I thought I'd run a little experiment.
I started with every batter in Jeff Sackmann's "Marcel" database from 2000-2009. ("Marcel" is a prediction method created by Tom Tango, which forecasts a player's performance this year based on his statistics the previous three years.) Assuming the Marcel predictions are reasonable, I counted how many player-seasons were expected to be below replacement level. If the theory is correct, it should be "zero."
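For readers unfamiliar with Marcel: the core of the method is a 5/4/3 weighting of the player's last three seasons, regressed toward the league average. Here's a rough sketch of that rate projection (it leaves out Marcel's age adjustment and other bookkeeping; the function name is mine):

```python
def marcel_rate(rates, pas, league_rate, regress_pa=1200):
    """Weighted three-year rate, regressed toward the league average.

    rates/pas: per-PA rates and plate appearances for the player's last
    three seasons, most recent first.
    """
    weights = (5, 4, 3)  # most recent season counts most
    num = sum(w * r * pa for w, r, pa in zip(weights, rates, pas))
    den = sum(w * pa for w, pa in zip(weights, pas))
    # regression: blend in ~1200 PA worth of league-average performance
    return (num + regress_pa * league_rate) / (den + regress_pa)
```

A player who has hit exactly at the league rate projects to the league rate; anyone above or below it gets pulled partway back toward it.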
It wasn't zero, but it was close. Over 10 years, only 152 players total were expected below replacement value that year. That's 15 players per year, spread among 30 teams. Half a player per team. That's not bad.
And, it's possible I had replacement value wrong. I didn't include fielding, just basic linear weights. And I used arbitrary position adjustments -- catchers and middle infielders had to be -40 runs per 600 PA to be replacement level, 1B and DH had to be 0, and everyone else had to be -20. It's very possible that, of the 152 players, many of them actually *weren't* below replacement level because of their defensive skills.
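In code, those thresholds look something like this (a sketch of my arbitrary offense-only cutoffs; the position labels and helper function are just for illustration):

```python
# arbitrary per-position replacement thresholds, in runs per 600 PA
REPLACEMENT_PER_600 = {
    "C": -40, "2B": -40, "SS": -40,   # catchers and middle infielders
    "1B": 0, "DH": 0,
    "3B": -20, "LF": -20, "CF": -20, "RF": -20,  # everyone else
}

def below_replacement(runs, pa, position):
    """Is a linear-weights runs total below replacement, pro-rated to PA?"""
    threshold = REPLACEMENT_PER_600[position] * pa / 600
    return runs < threshold

print(below_replacement(-10, 600, "SS"))  # False: -10 beats -40
print(below_replacement(-10, 100, "1B"))  # True: -10 is below 0
```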
Also, there were playing-time limits. Jeff's database excludes players with low expected playing time (the smallest he forecasts is for 185 PA). And I left out all players who had fewer than 50 actual plate appearances that season. I figured that if a guy projected below replacement, but the team only gave him 10 AB, that's close enough to zero that we won't count it.
So, half a player per team per season seems reasonably consistent with the concept of replacement level. It's not like teams are signing these guys left and right.
If there were only 15 players per season *projected* to be below replacement level, how many players actually *performed* below replacement level?
The answer: 1,025 total, or about 102 per season. There were about seven times as many players *observed* to be below replacement level as there were *predicted* to be below it.
That makes sense -- if you flip 100 pennies, with "replacement level" at 0.5, there would be *infinitely* more coins observed below 0.5 than truly below 0.5. (Assuming all the pennies are fair, none is truly below 0.5 -- but about half will be observed there.)
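The penny example is easy to check with a quick simulation (a sketch, with a fixed seed for reproducibility):

```python
import random

random.seed(1)

pennies = 100
flips_per_penny = 100

# count pennies that come up heads fewer than half the time
observed_below = sum(
    1 for _ in range(pennies)
    if sum(random.random() < 0.5 for _ in range(flips_per_penny)) < flips_per_penny / 2
)

# no fair penny is *truly* below 0.5, but roughly half are *observed* below it
print(observed_below)
```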
Now, the experiment. For every player in the database, I decided to randomly simulate their season based on their Marcels. Basically, I treated their Marcel prediction like an APBA card, and ran off a bunch of plate appearances. I simulated the exact number of PA that they *actually* had that season, regardless of how Marcel predicted their playing time.
Then, to simulate the uncertainty about the player's talent, I chose an adjustment from a normal curve with a standard deviation of 5 runs, and added that to the performance. (UPDATE: the 5 runs was for 500 Marcel PA. For smaller Marcel samples, I scaled the SD up by the square root of the PA ratio -- so for 125 PA, I used an SD of 10.)
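Here's roughly what one simulated season looks like under those rules -- a simplified sketch, not the exact code I ran. The 5-run talent SD and the square-root PA scaling are as described above; the per-PA outcome noise (SD of 0.25 runs) is a placeholder standing in for the APBA-card-style randomness:

```python
import math
import random

def simulate_season(marcel_runs_per_pa, marcel_pa, actual_pa, rng):
    """One simulated season, in linear-weights runs above average."""
    # talent uncertainty: SD of 5 runs per 500 Marcel PA, scaled up for
    # smaller Marcel samples by the square root of the PA ratio
    talent_sd_runs = 5 * math.sqrt(500 / marcel_pa)
    true_rate = marcel_runs_per_pa + rng.gauss(0, talent_sd_runs) / 500
    # per-PA noise: the 0.25-run SD is a placeholder, not a studied figure
    return sum(rng.gauss(true_rate, 0.25) for _ in range(actual_pa))

rng = random.Random(42)
# a hitter projected at -10 runs per 500 PA off 400 Marcel PA, given 300 actual PA
season_runs = simulate_season(-10 / 500, 400, 300, rng)
```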
If Marcels are good, unbiased predictors, and teams were indeed getting rid of players who fell below replacement, then the simulation should also produce about 1,025 below-replacement performances, matching real life.
Well, we don't.
I ran the simulation 10 times, and the average was 561 players, not 1,025. We got a little over half.
Why? After I ran this, I realized the reason is selective sampling. Suppose you have two players, each with talent of -10. Six weeks into the season, just by luck, one of them is awful, performing at a rate of -30, while the other one is doing OK, at a rate of +10.
What happens? The -30 guy is released, and winds up the season at -30 over 100 AB. The second guy is allowed to play the whole year, and winds up at -5 over 500 AB.
One out of two wound up having performed below replacement in real life. But, in the simulation, it'll be less than that.
In the simulation, there's less than a 50% chance that the first guy will wind up below -20 over 100 AB. And there's a much, much smaller chance that the second guy will wind up below -20 over a full 500 AB.
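To put rough numbers on that intuition: suppose, purely for illustration, that a player's observed full-season rate is normal around his true talent, with an SD of 20 runs over 100 AB, shrinking with the square root of the at-bats. Then:

```python
import math
from statistics import NormalDist

talent = -10        # both players' true talent, full-season rate
replacement = -20   # replacement level

# illustrative assumption: SD of 20 runs over 100 AB, sqrt-of-AB shrinkage
sd_100 = 20.0
sd_500 = sd_100 / math.sqrt(500 / 100)   # about 8.9

p_100 = NormalDist(talent, sd_100).cdf(replacement)   # about 0.31
p_500 = NormalDist(talent, sd_500).cdf(replacement)   # about 0.13

print(round(p_100, 2), round(p_500, 2))
```

Under those (made-up but plausible) numbers, the 100-AB guy finishes below replacement about 31 percent of the time, the 500-AB guy only about 13 percent.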
So the simulation will underestimate the number of below-replacement performances, because, in real life, once a marginal player is below replacement, he's not often given a chance to rise back out of it. But in the simulation, he gets his full number of PA regardless.
In that light, I adjusted the simulation to add one new rule: if a player was expected to be +10 or less, and, a third of the way through his expected season, he's below replacement, he gets released. (If, after a third of the season, he's above replacement -- even a little bit -- he plays the entire rest of the season, regardless of what happens afterwards.)
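Here's a sketch of that release rule -- again a simplified stand-in for the actual simulation, with the 0.25-run per-PA noise SD as a placeholder:

```python
import random

def simulate_with_release(projected_runs_per_600, true_rate_per_pa, season_pa, rng):
    """A marginal player (projected +10 or worse per 600 PA) who is below
    replacement (-20 per 600 PA here) a third of the way through his season
    gets released; otherwise he plays out his full PA."""
    checkpoint = season_pa // 3
    runs = sum(rng.gauss(true_rate_per_pa, 0.25) for _ in range(checkpoint))
    marginal = projected_runs_per_600 <= 10
    pace_per_600 = runs * 600 / checkpoint
    if marginal and pace_per_600 < -20:
        return runs, checkpoint               # released: season ends here
    runs += sum(rng.gauss(true_rate_per_pa, 0.25)
                for _ in range(season_pa - checkpoint))
    return runs, season_pa

rng = random.Random(0)
runs, pa_played = simulate_with_release(0, -30 / 600, 600, rng)
```

Once a player survives the checkpoint, he keeps all his remaining PA no matter how badly the rest of the season goes -- which is exactly the asymmetry that inflates observed below-replacement seasons.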
Now, the simulation goes from 561 below-replacement performances, to 800. Still less than 1,025, but better.
So, finally, I did one more thing: I changed the standard deviation of the uncertainty of the player's talent from 5 runs to 10.
Now, we get to 863. That's 84 percent of the way there.
After all that, I'm not sure quite how much the simulation tells us. To do a proper comparison, we need a better model of how teams decide how much playing time to give a hitter based on expectations and performance.
What we *do* find out, though, is:
-- If you trust Marcel, then it does seem that few teams are willing to keep a player who has performed below replacement.
-- Regardless, many players *do* perform below replacement.
-- Simple probability shows that, at a bare minimum, over half the players who perform below replacement do so because of luck.
-- With other not-too-unreasonable assumptions, we can get that percentage up into the 80s.
My view is that all this is less than fully conclusive. Still, it should be fairly persuasive. If you didn't accept the "replacement player" hypothesis before, this little study should have enough in it to get you to reconsider.
What do you think?
UPDATE, 12/14: King Kaufman posts in the comments that I misinterpreted his views on replacement value. My apologies to King, and I've revised the post accordingly.