Sabermetric Research: Does WAR undervalue injured superstars?

In an article last month, "Ain't Gonna Study WAR No More" (subscription required), Bill James points out a flaw in WAR (Wins Above Replacement), when used as a one-dimensional measure of player value.

Bill gives an example of two hypothetical players with equal WAR, but not equal value to their teams. One team has Player A, an everyday starter who created 2.0 WAR over 162 games. Another team has Player B, a star who normally produces at the rate of 4.0 WAR, but one year created only 2.0 WAR because he was injured for half the season.

Which player's team will do better? It's B's team. He creates 2.0 WAR, but leaves half the season for someone from the bench to add more. And, since bench players create wins at a rate higher than 0.0 -- by definition, since 0.0 is the level of player that can be had from AAA for free -- you'd rather have the half-time player than the full-time player.

This seems right to me, that playing time matters when comparing players of equal WAR. I think we can tweak WAR to come up with something better. And, even if we don't, I think the inaccuracy that Bill identified is small enough that we can ignore it in most cases.

------

First: you have to keep in mind what "replacement" actually means in the context of WAR. It's the level of a player just barely not good enough to make a Major League Roster. It is NOT the level of performance you can get off the bench.

Yes, when your superstar is injured, you often do find his replacement on the bench. That causes confusion, because that kind of replacement isn't what we really mean when we talk about WAR.

You might think -- *shouldn't* it be what we mean? After all, part of the reason teams keep reasonable bench players is specifically in case one of the regulars gets injured. There is probably no team in baseball that, when its 4.0 WAR player goes down the first day of the season, can't replace at least a portion of those wins with an available player. So if your centerfielder normally creates 4.0 WAR, but you have a guy on the bench who can create 1.0 WAR, isn't the regular really only worth 3.0 wins in a real-life sense?

Perhaps. But then you wind up with some weird paradoxes.

You lease a blue Honda Accord for a year. It has a "VAP" (Value Above taking Public Transit) of, say $10,000. But, just in case the Accord won't start one morning, you have a ten-year-old Sentra in the garage, which you like about half as much.

Does that mean the Accord is only worth $5,000? If it disappeared, you'd lose its $10,000 contribution, but you'd gain back $5,000 of that from the Sentra. If you *do* think it's only worth $5,000 ... what happens if your neighbor has an identical Accord, but no Sentra? Do you really want to decide that his car is twice as valuable as yours?

It's true that your Accord is worth $5,000 more than what you would replace it with, and your neighbor's is worth $10,000 more than what he would replace it with. But that doesn't seem reasonable as a general way to value the cars. Do you really want to say that Willie McCovey has almost no value just because Hank Aaron is available on the bench?

------

There's also another accounting problem, one that commenter "Guy123" pointed out on Bill's site. I'll use cars again to illustrate it.

Your Accord breaks down halfway through the year, for a VAP of $5,000. Your mother has only an old Sentra, which she drives all year, for an identical VAP of $5,000.

Bill James' thought experiment says, your Accord, at $5,000, is actually worth more than your mother's Sentra, at $5,000 -- because your Accord leaves room for your own Sentra to add value later. In fact, you get $7,500 in VAP -- $5,000 from half a year of the Accord, and $2,500 from half a year of the Sentra.

Except that ... how do you credit the Accord for the value added by the Sentra? You earned a total of $7,500 in VAP for the year. Normal accounting says $5,000 for the Accord, and $2,500 for the Sentra. But if you want to give the Accord "extra credit," you have to take that credit away from the Sentra! Because, the two still have to add up to $7,500.
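Here is that bookkeeping as a rough Python sketch. The dollar figures and the half-year split come from the example; the variable names and the $1,000 of "extra credit" are made up purely for illustration.

```python
# Full-year VAP (Value Above taking Public Transit), from the example.
ACCORD_FULL_YEAR = 10_000
SENTRA_FULL_YEAR = 5_000

# The Accord runs for half the year, then your backup Sentra takes over.
accord_share = 0.5
accord_vap = ACCORD_FULL_YEAR * accord_share        # half a year of Accord
sentra_vap = SENTRA_FULL_YEAR * (1 - accord_share)  # half a year of Sentra
household_total = accord_vap + sentra_vap

# Hypothetical "extra credit" for the Accord. Whatever you add here has to
# come out of the Sentra's share, because the two must still sum to the total.
extra_credit = 1_000
accord_credited = accord_vap + extra_credit
sentra_credited = household_total - accord_credited

print("total:", household_total)
print("Accord:", accord_credited, "Sentra:", sentra_credited)
```

However you set `extra_credit`, the total stays fixed, so the credit is only ever moved from one car to the other.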

So what do you do?

------

I think what you do, first, is not base the calculation on the specific alternatives for a particular team. You want to base the calculation on the *average* alternative, for a generic team. That way, your Accord winds up worth the same as your neighbor's.

You can call that, "Wins Above Average Bench." If only 1 in 10 households has a backup Sentra, then the average alternative is one tenth of $5,000, or $500. So the Accord has a WAAB of $9,500.
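As a minimal sketch of that calculation (the function and parameter names are my own, not any standard formula), the baseline is just the backup's value weighted by how often a backup exists:

```python
def value_above_average_bench(value, backup_value, share_with_backup):
    """Value measured against the average alternative across all owners.

    share_with_backup is the fraction of households (or teams) that have
    the backup available; everyone else falls back to zero.
    """
    average_alternative = backup_value * share_with_backup
    return value - average_alternative


# The Accord example: $10,000 car, $5,000 Sentra, 1 in 10 households has one.
print(value_above_average_bench(10_000, 5_000, 0.10))
```

Because the baseline is an average over everyone, your Accord and your neighbor's get the same number, whether or not there's a Sentra in the garage.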

All this needs to happen because of a specific property of the bench -- it has better-than-replacement resources sitting idle.

When Jesse Barfield has the flu, you can substitute Hosken Powell for "free" -- he would just be sitting on the bench anyway. (It's not like using the same starting pitcher two days in a row, which has a heavy cost in injury risk.)

That wouldn't be the case if teams didn't keep extra players on the bench, like if the roster size for batters were fixed at nine. Suppose that when Jesse Barfield has the flu, you have to call Hosken Powell up from AAA. In that case, you DO want Wins Above Replacement. It's the same Hosken Powell, but, now, Powell *is* replacement, because replacement is AAA by definition.

Still, you won't go too wrong if you just stick to WAR. In terms of just the raw numbers, "Wins Above Replacement" is very close to "Wins Above Average Bench," because the bottom of the roster, the players that don't get used much, is close to 0.0 WAR anyway.

For player-seasons between 1982 and 1991, inclusive, I calculated the average offensive expectation (based on a weighted average of surrounding seasons) for regulars vs. bench players. Here are the results, in Runs Created per 405 outs (roughly a full-time player-season), broken down by "benchiness" as measured by actual AB that year:

500+ AB: 75
401-500: 69
301-400: 65
201-300: 62
151-200: 60
101-150: 59
76-100: 45
51-75: 33

A non-superstar everyday player, by this chart, would probably come in at around 70 runs. A rule of thumb is that everyday players are worth about 2.0 WAR. So, at roughly 10 runs per win, 0.0 WAR -- replacement level -- would be about 50 runs.

The marginal bench AB, the ones that replace the injured guy, would probably come from the bottom four rows of the chart -- maybe around 55. That's 5 runs above replacement, or 0.5 wins.
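The conversion from runs to WAR can be sketched like this. The chart values are copied from above; the 10-runs-per-win rate is an assumption, taken from the "5 runs ... or 0.5 wins" arithmetic, and the 55-run figure for the marginal bench is the rough estimate from the text, not something the code derives.

```python
# Runs Created per 405 outs, by actual AB that year (from the chart above).
RC_PER_405_OUTS = {
    "500+": 75,
    "401-500": 69,
    "301-400": 65,
    "201-300": 62,
    "151-200": 60,
    "101-150": 59,
    "76-100": 45,
    "51-75": 33,
}

RUNS_PER_WIN = 10.0   # assumption: implied by "5 runs ... or 0.5 wins"
EVERYDAY_RUNS = 70.0  # non-superstar everyday player, per the chart
EVERYDAY_WAR = 2.0    # rule-of-thumb WAR for an everyday player

# Replacement level in runs: back 2.0 wins out of the everyday player.
REPLACEMENT_RUNS = EVERYDAY_RUNS - EVERYDAY_WAR * RUNS_PER_WIN


def war_per_season(rc_per_405_outs):
    """Convert Runs Created per 405 outs into WAR per full-time season."""
    return (rc_per_405_outs - REPLACEMENT_RUNS) / RUNS_PER_WIN


for ab_range, runs in RC_PER_405_OUTS.items():
    print(f"{ab_range:>8} AB: {war_per_season(runs):+.1f} WAR")

# The rough 55-run estimate for the marginal bench at-bats.
print(f"marginal bench: {war_per_season(55):+.1f} WAR")
```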

So, the bench guys are 0.5 WAR. That means when the 4.0 guy plays half a season, and gets replaced by the 0.5 guy for the other half, the combination is worth 2.25 WAR, rather than 2.0 WAR. As Bill pointed out, the WAR accounting credits the injured star with only 2.0, and he still comes out looking only as good as the full-time guy.

But if we switch to WAAB ... now, the full-time guy is 1.5 WAAB (2.0 minus 0.5). The half-time star is 1.75 WAAB (4.0 minus 0.5, all divided by 2). That's what we expected: the star shows more value.
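To make the comparison concrete, here is a small sketch that puts the two players side by side. The 0.5 bench rate is the estimate from above; the function names are mine, and "share" just means the fraction of the season the player is in the lineup.

```python
BENCH_WAR = 0.5  # WAR per full season for the average bench fill-in (estimate)


def war(rate, share):
    """WAR credited to a player producing at `rate` for `share` of a season."""
    return rate * share


def waab(rate, share):
    """Wins Above Average Bench: same idea, measured from the bench baseline."""
    return (rate - BENCH_WAR) * share


def slot_total(rate, share):
    """WAR the team gets from the lineup slot: the player plus his fill-in."""
    return war(rate, share) + war(BENCH_WAR, 1.0 - share)


players = [("full-time regular", 2.0, 1.0), ("half-time star", 4.0, 0.5)]
for name, rate, share in players:
    print(f"{name}: WAR {war(rate, share):.2f}, "
          f"WAAB {waab(rate, share):.2f}, "
          f"slot total {slot_total(rate, share):.2f}")
```

The gap in WAAB between the two players (0.25) is the same as the gap in what their lineup slots actually produce, which is the point of switching baselines.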

But: not by much. 0.25 wins is 2.5 runs, which is a small discrepancy compared to the randomness of performance in general. And even that discrepancy is random, since something as large as a quarter of a win only shows up when a superstar loses half the season to injury. The only time when it's large and not random is probably star platoon players -- but there aren't too many of those.

(The biggest benefit to accounting for the bench might be when evaluating pitchers, who, unlike hitters, vary quite a bit in how much they're physically capable of playing.)

I don't see it as that big a deal at all. I'd say, if you want, when you're comparing two batters, give the less-used player a bonus of 0.1 WAR for each 100 AB by which his playing time falls short of the other player's.

Of course, that estimate is very rough ... the 0.1 wins could easily be 0.05, or 0.2, or something. Still, it's going to be fairly small -- small enough that I'd bet it wouldn't change too many conclusions that you'd reach if you just stuck to WAR.
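If you wanted to try the adjustment out, a rough sketch might look like the following. It assumes the bonus scales with the *gap* in AB between the two players, which is my reading of the rule of thumb; the function name and the sample AB totals are hypothetical.

```python
def playing_time_bonus(ab_a, ab_b, bonus_per_100_ab=0.1):
    """Return (bonus_for_a, bonus_for_b); only the less-used player gets one.

    Assumption: the bonus is proportional to the difference in AB.
    """
    bonus = abs(ab_a - ab_b) / 100 * bonus_per_100_ab
    return (bonus, 0.0) if ab_a < ab_b else (0.0, bonus)


# Hypothetical comparison: an injured star with 300 AB vs. a regular with
# 600 AB, both at 2.0 WAR. Try the rule at a few plausible rates.
for rate in (0.05, 0.1, 0.2):
    star_bonus, regular_bonus = playing_time_bonus(300, 600, rate)
    print(f"rate {rate}: star {2.0 + star_bonus:.2f}, "
          f"regular {2.0 + regular_bonus:.2f}")
```

Even at the high end of those rates, the adjustment is a fraction of a win.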

Labels: Bill James, WAR

5 Comments:

At Thursday, March 26, 2015 12:02:00 AM, Matt said...: Excellent analysis, Phil. A couple of points/questions.

Last year, there were only 42 hitters with a 4.0 WAR or better. With 750 players rostered at any given time (roughly 2/3 hitters), probably 600+ any given year, a WAR better than 4.0 puts a player (hitter) in roughly the top 8%. Maybe this isn't "elite" superstar, but it's up there and I'd say this player had a superstar year.

Your analysis compared a 4.0 player getting hurt and being replaced with a 0.5. What if you compared a 3.0 player (of which there are many more) with a bench-level 0.5 player? This would obviously make the end delta even smaller.

If a team has to replace a 6.0* WAR guy with a 0.5, the difference would be much greater than replacing a 2.5 guy with the same 0.5.

So if your conclusion is that we can safely ignore the inaccuracy Bill James is pointing out, I would agree for most all cases. Even with your analysis (using the 4.0 WAR guy), one could claim that a 0.1 difference is largely in the noise. Was Puig at 5.4 a significantly different player than Victor Martinez at 5.3? Not at all.

Just curious... what would you consider being a significant difference in two players' WARs? Maybe 0.4 or 0.5?

Again, nice analysis. It was a good read.

*There were only 13 hitters with a WAR of 6.0 or higher in 2014. These guys are true superstars-- less than 2% of the league.
At Thursday, March 26, 2015 12:07:00 AM, Phil Birnbaum said...: I'm not 100% sure what you mean. A difference of 0.5 WAR in *talent* is very significant, more than two million dollars. A difference of 0.5 WAR in *performance* is partly signal (talent) and partly noise (luck) ... you want to look at the two players' full careers, and anything else that's relevant, in order to figure out who's better.

Unless you mean that WAR itself isn't sufficiently accurate that you can measure even performance that accurately. That is, you don't know if you'd rather have Puig's 5.4 or Martinez's 5.3 added to your team's stats.

In general, if the stat distinguishes good from bad, you'd expect a large group of 5.4s to be 0.1 better than a large group of 5.3s. It has to be that way -- otherwise, you couldn't say a group of 5.4s is better than a group of 1.4s. But if you're trying to figure out which of two players is actually more talented ... it's really close to 50/50, maybe 53/47 or something.

Does that help?
At Thursday, March 26, 2015 2:05:00 PM, Anonymous said...: One of my issues with WAR in general, and it relates well to what Phil posts here, is that the varying levels of replacement player, depending on timing (lose a man for the year vs a month vs a game), make it so we cannot logically use one replacement level that works for all cases.
I propose that players who are above average get full (100%) value for those runs created above that line, and that we set two other lines; temp replacement and worst-case replacement; for which the player may only generate something like 2/3rds and 1/3rd value for their production above those levels.
This goes toward solving the issue Phil brings up; it helps solve the "negative WAR" problem by lowering the bottom replacement level to where it will be rare to produce below zero; and in the end will value stars properly, as B James's article mentioned.
This system works well for me in simulation games (strat o matic, Scoresheet) where defining replacement is a key to success.

Tom H
At Thursday, March 26, 2015 2:28:00 PM, SrMeowMeow said...: This seems wrong to me.

Here's my simplification of your argument:

a) 2 WAR over half a season is worth more than 2 WAR over a full season, because the replacement player for the other half of the season will be worth at worst 0 WAR and at best will allow you to leverage some above-replacement-level bench player you weren't using.
b) Since you'd rather have the 2 WAR in half a season, it should be worth more.

I agree with b), not with a).

Your examples all rely on having "freely" available replacements that are better than 0 WAR. It's true that baseball teams have a bench of part-time players who can substitute for an injured player already available, but you're ignoring the cost to having an above-replacement-level bench in the first place.

Here are two options.

Player X: A 4 WAR per 162 games true talent player who I know will miss exactly 81 games per season.
Player Y: A 2 WAR per 162 games true talent player who will never miss a game.

Let's say 5M/WAR for simplicity.

Say you have one bench slot. If you sign a replacement level bench player in both cases, you get 2 WAR from each option.

If you sign a 1 WAR bench player, you get 2.5 WAR from Player X + bench, and 2 WAR from Player Y + bench (never used). But you have to pay for the bench player! Player X essentially gives you the opportunity to spend more and build a composite better player out of Player X and another above-replacement-level player, but this isn't free. You pay the same cost for the bench player's WAR that you do for Players X and Y.

The car example makes this more obvious. There is a real cost to just keeping a second working car sitting unused in your garage: the money you're not getting by selling it. That's why people don't just leave above-replacement-level assets lying around unused - it's wasteful. The assets have a fixed value, however you're using them.

Would you really argue that a car that breaks half the year, allowing people to leverage the second car they might have, provides more value? Why wouldn't you just sink your entire car budget into the car that provides you the most value? Similarly, why not just take Player X + his replacement's salary and buy one player with the same projected full-season WAR?
At Thursday, April 02, 2015 2:19:00 AM, custom scholarship essays writing said...: The sentences are super and excellent to read

Wednesday, March 25, 2015