Sabermetric Research: The "Hot Stove Economics" salary model, Part II

Last post I talked about the economics and logic behind the player valuation model JC Bradbury used in "Hot Stove Economics." This post will talk about the actual player valuations, which is probably more interesting.

The main question the valuation model tries to answer is: how much is an extra (marginal) win worth to a baseball team? Conventional sabermetric wisdom says it's somewhere between $4 million and $5 million (for example, here's a recent post by Tango saying it's about twice $2.36 million). I have a vague feeling I've read something that made me think it's a bit lower ... but, for purposes of this post, I'll go with $4.5 million.

JC's results are very different. His model isn't linear, so there isn't one fixed value, but on average a win seems to be worth $1.2 million.

The two models differ by a factor of three! Why the discrepancy?

I think part of the reason is how JC interpreted the results of his regression.

JC came up with a model that suggested that every team receives the same revenue for their Nth win, regardless of the size of the market. For instance, if the Atlanta Braves make $X million more when they win 83 games than when they win 82, then the New York Yankees will also value their 83rd win at $X million. That's really at odds with conventional wisdom -- not just sabermetric conventional wisdom, but announcer and columnist and fan conventional wisdom, too.

So why doesn't the model value a Yankee win at more than an Atlanta win? It's because when he tested the "different teams have different values" theory, JC found that its coefficient wasn't statistically significantly different from zero in the regression. However, it came out very "baseballically significant." According to the rejected regression (page 183, column 4), a difference in population of 1 million people gives an effect of $72,000 per win.

The New York metro area has a population of about 19 million. Atlanta has a population of about 5 million. At $72,000 per million people, that means a marginal win for the Yankees should be worth about $1 million more than a marginal win for the Braves.
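As a quick check of that arithmetic, here is a minimal sketch. The variable names are mine, and the populations are the rounded figures quoted above.

```python
# Per-win revenue effect from the rejected regression, per million people.
EFFECT_PER_MILLION_PEOPLE = 72_000

# Rounded metro populations quoted in the post, in millions.
NEW_YORK_POP_MILLIONS = 19
ATLANTA_POP_MILLIONS = 5

# Extra value of a marginal Yankee win over a marginal Braves win.
gap = EFFECT_PER_MILLION_PEOPLE * (NEW_YORK_POP_MILLIONS - ATLANTA_POP_MILLIONS)
print(f"${gap:,} per marginal win")  # $1,008,000 per marginal win
```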

Remember, the average win is worth $1.2 million. So a Yankee win is worth almost twice the average! Nonetheless, JC rejects the idea because the coefficient didn't come out statistically significant.

The problem is, the regression had only five datapoints per team. That was enough to show evidence of an effect, but not at the 95% level. It was 1.41 SD from zero -- that's 92% significance one-tailed, or 84% significance two-tailed. But, because that was still short of the required 95%, JC chose to proceed as if there were ZERO effect of population size on marginal wins.
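Those percentages can be reproduced from the 1.41 SD figure. This is only a sketch, and it assumes a normal approximation; the regression itself would presumably have used a t distribution, which gives nearly the same answer with this many observations.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.41  # coefficient's distance from zero, in standard errors

one_tailed_p = 1.0 - normal_cdf(z)  # chance of a result this high under "no effect"
two_tailed_p = 2.0 * one_tailed_p   # chance of a result this extreme in either direction

print(f"one-tailed: {1.0 - one_tailed_p:.0%}")  # one-tailed: 92%
print(f"two-tailed: {1.0 - two_tailed_p:.0%}")  # two-tailed: 84%
```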

As I have written before, I think that's a big problem. The regression found the effect we were looking for, in the direction we were looking for, and even approximately the *size* we were looking for. If, after all that, we don't have statistical significance, the problem is that the sample isn't big enough.

Another way to look at it is that it has to do with the choice of null hypothesis. JC chose to test whether there was enough evidence that the Yankee effect was not zero. He didn't find enough. But what if he had instead started from the hypothesis that the effect was *proportional to population*? He'd probably have found that there wasn't enough evidence to reject that hypothesis, either.

------

But, even if we accepted the idea that different teams have different revenue curves, we still have a problem. After all, the regression would still show an average of about $1.2 million per win. It's just that the Yankees would be at about $2 million, while the small-market teams would be below $1 million. So the discrepancy between $4 million and $1.2 million is still there.

I think what's happening is that there's a variation of "Simpson's Paradox" here, where the shape of the overall curve is very different from the shapes of the individual curves.

First, let's talk about the fact that JC chose to fit a cubic instead of just a straight line. Over at "Beyond the Box Score" and "The Book" blog, commenter Sky K. shows how the individual teams could be linear, and it's only when you plot them together that you get the illusion that the overall curve looks like a cubic.

He posts two beautiful illustrations, which I'll repost here (hope Sky doesn't mind). The first one shows that the linear fit is almost as good as the cubic except for five outlying points, which *all* belong to the Yankees.

The second one shows how every team, including the Yankees, could be linear, while the overall curve would look curved.

Sky's curves illustrate that the individual teams could be different from the overall curve in terms of whether they're linear or not. Now, I'm going to adapt Sky's curve (although much uglier) to show that the individual teams could be different from overall in terms of the *slope* of the curve:

That shows how it's possible that every team has a high marginal win value (of $4 million, say) when you fit a curve to them individually, but it looks like they only have a low marginal win value (of $1.2 million, say) when you fit a line to them as a group.

To check if that's really what's happening, you'd want to run the same regression, but with dummy variables for every team (and without the population variables). I'm not sure what you'll get with only five datapoints per team, but I'd bet that even if the 30 individual estimates were all over the place, the average would be higher than $1.2 million.
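To show how the pooled slope can come out far below every team's own slope, here is a toy sketch. All the numbers are made up for illustration (three hypothetical teams, five seasons each, and a within-team slope of $4 million per win that I chose). It is not JC's regression or real revenue data. Subtracting each team's own averages before fitting gives the same slope as adding a dummy variable for every team.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical setup: every team gains $4 million per extra win,
# but teams differ in how many games they typically win.
TRUE_SLOPE = 4.0                               # $ millions per win (assumed)
BASE_REVENUE = 200.0                           # $ millions in a typical year (assumed)
typical_wins = {"A": 70, "B": 80, "C": 90}     # made-up teams

data = {}
for team, typical in typical_wins.items():
    wins = [typical + d for d in (-2, -1, 0, 1, 2)]  # five seasons
    revenue = [BASE_REVENUE + TRUE_SLOPE * (w - typical) for w in wins]
    data[team] = (wins, revenue)

# Slope fitted to each team separately.
within = {team: ols_slope(w, r) for team, (w, r) in data.items()}

# Slope fitted to all 15 team-seasons at once, ignoring team identity.
all_wins = [w for ws, _ in data.values() for w in ws]
all_revenue = [r for _, rs in data.values() for r in rs]
pooled = ols_slope(all_wins, all_revenue)

# Team-dummy (fixed effects) slope: remove each team's own averages first.
dm_wins, dm_revenue = [], []
for ws, rs in data.values():
    mean_w = sum(ws) / len(ws)
    mean_r = sum(rs) / len(rs)
    dm_wins += [w - mean_w for w in ws]
    dm_revenue += [r - mean_r for r in rs]
fixed_effects = ols_slope(dm_wins, dm_revenue)

print("within-team slopes:", within)
print(f"pooled slope: {pooled:.2f}")
print(f"team-dummy slope: {fixed_effects:.2f}")
```

In this toy data the pooled line is nearly flat, because most of the variation in wins is between teams rather than within them, while the team-dummy fit recovers the $4 million slope exactly.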

------

So that explains why JC's model has too low an estimate for the marginal value of a win. But if that's the case, shouldn't all the player salary estimates also be way too low? Let's take a look.

As far as I understand, the current sabermetric thinking on the way to estimate a player's value is to figure that a replacement-level player is worth the league minimum of $450,000, and that every win above that adds about $4.5 million. Although JC rejects the idea of "replacement level," that's not a problem -- it turns out that league average is about 2 wins above replacement, so we can just say that the average player is worth $9.45 million, and then you add or subtract $4.5 million for each win above or below average.

I'm going to call this the "sabermetric method", even though I'm sure there are sabermetricians who disagree with it or have slight variations on it.
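Written out as a function, the "sabermetric method" as described above looks something like this. It's a sketch using this post's round numbers; the names are mine.

```python
LEAGUE_MINIMUM = 0.45               # $ millions, value of a replacement-level player
DOLLARS_PER_WIN = 4.5               # $ millions per marginal win
AVERAGE_ABOVE_REPLACEMENT = 2.0     # league average is about 2 wins above replacement

def sabermetric_value(wins_above_average):
    """One-season value in $ millions for a full-time player."""
    wins_above_replacement = wins_above_average + AVERAGE_ABOVE_REPLACEMENT
    return LEAGUE_MINIMUM + DOLLARS_PER_WIN * wins_above_replacement

for waa in (-2, 0, 2, 6):
    print(f"{waa:+d} wins vs. average: ${sabermetric_value(waa):.2f} million")
```

A player 2 wins below average is at replacement level, so the formula puts him at the league minimum.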

JC's method, of course, is different. To get the estimate of a player's value, you start by figuring the value of the average player. JC has that at $4.8 million in batting value (for a full-time player who takes 10% of his team's plate appearances), and another million or two in fielding value (depending on position). A typical fielding value is $1.65 million for a player with 95% of his team's innings at third base. Adding batting to fielding, we get that an average full-time position player is worth about $6.4 million a year.

To go from an average player to a specific player, you add the value of that player's batting and fielding performance, above or below average. That's just JC's original team revenue curve (the first curve in my previous post).

Finally, you adjust for inflation. I've bumped the estimates in the book by 20%, since JC's values are in 2007 dollars.

So, now, here's the important graph. It shows both estimates of player value: JC's (curved line), and the sabermetric one (straight line).

For players who are about average, or a bit below, the two curves aren't that far apart. Since a substantial proportion of major league baseball players are indeed in this range, it's hard to choose one over the other for those players.

Where the estimates diverge is for better players, the stars and superstars -- and for the bad players, the guys who can barely hold on to their jobs.

For the mediocre players, JC has a full-time bad player (-2 wins, say) at over $5,000,000. The sabermetric view would argue that, if you can get a minor leaguer to do the job just as well for the major league minimum, that player can't possibly be worth more than $450,000. JC says that if we think we can find a -2 player at the league-minimum salary, we're mistaken, and it's actually difficult to find a -2 player. His own curve crosses the $450K mark somewhere below -5, which I think is far too low.

However, the issue of sub-par players is not all that important in practical terms, because there are very few bad players signed as free agents to be full-time players.

Where it gets interesting is superstars. Take, for instance, Albert Pujols, who's typically 6 wins above average (8 wins above replacement) every year. JC would say he's worth about $25 million (he had him at $21.6 million in 2010, but that's because it was a bit of an off-year). The sabermetric method would say he's $36.5 million.

Those estimates aren't even close: one is almost 50 percent higher than the other. How can we figure out which one is better?

What we could do is wait for Albert Pujols to become a free agent, and see what he gets. Is it closer to $25MM, or $36MM? Whichever it is, that would give valuable evidence on how the market values him (although, to be sure, either side of the debate could argue that the market is wrong).

That would be great, except: most players as good as Pujols sign long-term contracts, and a player's performance over that contract varies due to the effects of aging. So that complicates things. To figure out what a player is worth on a multi-year deal, you have to estimate how he'll perform after the effects of aging. In addition, pay rates normally increase over time, so there's a certain amount of inflation built in to the deal.

So you have to make assumptions about inflation, and about aging patterns.

As it turns out, the assumptions that JC's method makes are very different from the assumptions sabermetrics makes. And the direction of the difference is exactly opposite to the difference in the single-season curves shown in the graph. That means that when you look at the numbers for a long-term deal, the two methods aren't off by nearly as much as they would be for a one-year deal.

1. Salary inflation

Suppose that player X is projected to be 2 wins above average five years from now. JC has a season like that worth $10 million today, and the sabermetric method says it's worth $18.45 million today.

But what will it be worth in five years?

JC assumes that salaries in general will rise by 9 percent a year. At 9 percent, his $10 million salary will inflate to $15.4 million.

I'm not sure what rate of inflation various sabermetricians use; Tango has it at 6 percent over the last 10 years, but 3 percent over the last four years. If we settle on 5%, then today's $18.45 million value turns into a $23.5 million value in five years.

So that closes the gap a bit. Originally, it was $10 million to $18.45 million, a gap of 84 percent. After inflation, it's $15.4 million to $23.5 million, a gap of only 53 percent.
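Here is the compounding behind those numbers, as a small sketch. The 9 percent rate is JC's assumption and the 5 percent rate is the one I settled on above; the function name is mine.

```python
def inflate(value_today, annual_rate, years):
    """Grow a value in today's dollars at a constant annual rate."""
    return value_today * (1.0 + annual_rate) ** years

YEARS = 5

jc_today, saber_today = 10.0, 18.45           # $ millions, from the post
jc_future = inflate(jc_today, 0.09, YEARS)    # JC: 9% salary inflation
saber_future = inflate(saber_today, 0.05, YEARS)  # sabermetric: 5% assumed

gap_today = saber_today / jc_today - 1.0
gap_future = saber_future / jc_future - 1.0

print(f"JC: ${jc_future:.1f} million, sabermetric: ${saber_future:.1f} million")
print(f"gap today: {gap_today:.1%}, gap in five years: {gap_future:.1%}")
```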

2. Aging

As has been noted here and elsewhere, JC believes a player's age-related decline starts later, and is much less severe, than the sabermetric community does.

Tom Tango has found that the best players (like the ones we're discussing here) decline by about half a win per season, starting at 27. So, consider a 30-year-old player who's 2 wins above average next year. Five years from now, he'll be 2.5 wins worse than he is now. That's a huge difference -- the decline is worth $11.25 million off the player's value. That means Tango will have the player's fifth year worth $7.2 million in today's dollars, or $9.2 million after five percent inflation.

JC's aging curve is much flatter -- he'll have the 35-year-old within about half a win of his performance at 30 (not half a win per season, but half a win total). So JC will have the player worth still about $8.5 million in today's dollars, or about $13 million after nine-percent inflation.
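Putting aging and inflation together for that fifth season, here is a rough sketch. The sabermetric side is computed from the numbers above. The JC side simply takes the $8.5 million figure as given, since I'm not reproducing his aging curve here.

```python
DOLLARS_PER_WIN = 4.5        # $ millions per win (sabermetric method)
AVERAGE_PLAYER_VALUE = 9.45  # $ millions for a league-average player
YEARS = 5

# Sabermetric side: half a win of decline per season for five seasons.
wins_now = 2.0
wins_later = wins_now - 0.5 * YEARS                       # -0.5 wins vs. average
saber_today = AVERAGE_PLAYER_VALUE + DOLLARS_PER_WIN * wins_later
saber_future = saber_today * 1.05 ** YEARS                # 5% inflation assumed

# JC side: about $8.5 million in today's dollars (taken as given), 9% inflation.
jc_today = 8.5
jc_future = jc_today * 1.09 ** YEARS

print(f"sabermetric: ${saber_today:.1f}M today, ${saber_future:.1f}M in year five")
print(f"JC:          ${jc_today:.1f}M today, ${jc_future:.1f}M in year five")
```

For this far-off season the ordering flips: JC's figure ends up higher than the sabermetric one.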

For seasons in the near future, JC's method gives lower values than the sabermetric method. But for seasons in the far future, JC's method gives *higher* values than the sabermetric method.

For a long-term deal, the far seasons cancel out the near seasons, and so the total values wind up comparable, in a lot of cases.

------

So, my view on JC's methodology is that the reason his method gives reasonable values is mostly coincidence -- his positive errors cancel out his negative errors. Of course, JC would say the same thing for the sabermetric method, but in reverse. He'd say it's *us* whose logic doesn't work, and it's *our* errors that, coincidentally, cancel each other out.

So, how to resolve which version is correct?

The first way is just to judge the logic. In my previous post, and here, I explained why I think JC's model's assumptions don't hold. If you read JC's book and posts, you can evaluate my comments, and make up your own mind.

The second way is to look at the empirical data of how much free agents actually sign for. Take all the superstar players who recently came to the end of a long-term contract. Figure out, in retrospect, how much each of the seasons was worth by each method, adjusted for actual inflation. Add them all up and compare them to the actual value of the contract. See which method came closer, and keep a W-L record.

I bet that the conventional sabermetric method would soundly defeat JC's method. And I'd be willing to bet on that going forward.

Labels: baseball, economics, payroll

4 Comments:

At Friday, December 10, 2010 10:46:00 AM, Sky said...: Phil, here's a third pretty graph (in my opinion) showing how revenue changes for actual teams over the 2005-2009 time period. It's very very flat, as opposed to your sketch -- in other words, team revenues hold steady over a short time period regardless of wins. And that makes some intuitive sense. One down year for the Yankees doesn't kill their fan base. Making the playoffs causes more season tickets the *following* year. Etc.

http://www.beyondtheboxscore.com/2010/12/7/1861137/sabernomics-werth-deal-is-aggressive-but-reasonable#53907754

For the record, I don't think team rev/win curves are linear. It's just really hard to draw ANY conclusions from simply plotting five years' worth of team data.

At Friday, December 10, 2010 10:51:00 AM, Phil Birnbaum said...: Sky,

Thanks. You're right ... revenues are probably partly a function of *expectations* of winning, not just of winning. Your example of the Mets is particularly to the point.

There's probably a certain amount of lag in the relationship between wins and revenue, and we're all oversimplifying a bit by implicitly assuming it's immediate.

At Friday, December 10, 2010 11:08:00 AM, Sky said...: Also, and I've said this elsewhere, but "aging" is somewhat a semantics issue. JC, by limiting his sample to 82 players throughout history who had 5000 PAs and at least 300 PAs every year for 10 years, is pretty much measuring ONLY the effects of getting older/wiser. If we KNEW a player wouldn't get injured, it might be a solid model.

But that's a bad assumption. When others address "aging", they're including all types of attrition -- such as injuries that cause missed playing time or injuries that hurt performance. When projecting a player's performance you need to account for this type of drop-off, whether or not you call it "aging". Even if a magic 27-year-old never aged, his production and playing time would dwindle -- simply playing the game causes injuries.

At Friday, December 10, 2010 11:15:00 AM, Phil Birnbaum said...: Right. If I may paraphrase you ... JC is saying that decline over time has two causes: "normal" aging, and injury-related (which also increases with age).

Us sabermetrics guys combine them into one overall thing we call "aging". JC prefers to separate them. However, in his analysis, he includes only "normal" aging and doesn't project other aging-related attrition.

The latter omission is what causes his estimates to be too high. And that's a problem even if you 100% agree with JC's definitions.

Friday, December 10, 2010
