## Tuesday, December 28, 2010

### On babies, batting averages, and weighted means

Here's a famous old math problem you've probably seen many times before:

A king decides that his country has too many men and not enough women. So he issues a decree: once a couple has a boy, they're not allowed to have any more babies.

The king reasons as follows: no family will have more than one boy. But some families will have two girls, or three girls, or even six girls before they have a boy. So there will wind up being a lot more girls than boys.

Is the king's reasoning correct?

The answer: the king's reasoning is not correct. There will still be 50 percent boys, and 50 percent girls. There are many ways to figure this out. I'd refer you to a solution on the internet, but I can't seem to find the problem. (Can anyone supply a link?)

Now: last week, Steven Landsburg posted a variation on the question, as follows:

There's a certain country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. What [is the expected value of the] fraction of the population [that] is female?

At first, this seems like the same question. But it's not. As Landsburg explains, the answer to this one is *not* 50 percent.

But even though the answer isn't 50 percent, it *seems* like it should be -- so much so that commenters are perplexed. One reader, a physics professor, seems certain that Landsburg is wrong, and that the answer is indeed 50 percent. Landsburg has challenged him to a \$15,000 bet.

Landsburg is indeed correct. I'm going to try explaining why, with a baseball analogy. But before I do, think about it a bit, and maybe read Landsburg's posts, to try to figure it out for yourself. It took me a while to get my head around it.

-----

OK, here we go.

Suppose that in a given season, the overall MLB batting average is .250. What is the average player's batting average?

The answer is NOT .250. But this time it's a lot easier to figure out why.

If you check, you will find that the average major leaguer hits less than the league average .250.

Why? Because when you average individual players, you give them equal weight. But when you figure out the composite MLB average, you weight by the number of at-bats. And good players have more AB than bad players. Therefore, the overall average is inflated by the fact that you weight the good players more heavily, and will wind up being higher than the average of the individual players weighted equally.

This is easier to see with an example. Suppose there are two players in the league. Player A goes 100 for 399, for a batting average of .251. Player B, who is a pitcher, goes 0 for 1. The average player went .125 (the average of .251 and .000). But, overall, the league hit .250 (100 for 400).

Weighted by AB, the league hit .250. Weighted by player, the league hit .125.
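Here is a minimal Python sketch of the two weightings, using the two players from the example above. The function names are made up for illustration.

```python
# Each player is (hits, at_bats); these are the two players from the example.
players = [(100, 399), (0, 1)]

def average_by_player(players):
    """Unweighted mean of the individual batting averages."""
    return sum(hits / ab for hits, ab in players) / len(players)

def average_by_at_bat(players):
    """League average: total hits over total at-bats (weights players by AB)."""
    total_hits = sum(hits for hits, _ in players)
    total_ab = sum(ab for _, ab in players)
    return total_hits / total_ab

print(f"weighted by player: {average_by_player(players):.3f}")
print(f"weighted by AB:     {average_by_at_bat(players):.3f}")
```

The only difference between the two functions is where the division happens: once per player, or once for the whole league.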

Got it for baseball? Now, let's apply it to Landsburg's problem. Because births are random, and we want the average, imagine repeating the birth simulation for a million different countries.

In baseball, players with more AB have a higher proportion of hits. In Landsburg's example, countries with more births have a higher proportion of girls. That's obvious, isn't it? If there are 100 families in the country, there'll always be 100 boys at the end. The only difference, then, must be the number of girls. Countries with more babies, then, *must* have more girls, and therefore a higher proportion of girls.

So we have exactly the same situation for countries as for batters.

-- Country A might have 100 boys and 91 girls, which means a .476 "girling average".
-- Country B might have 100 boys and 109 girls, which means a .522 "girling average".

Overall, there are 200 boys, and 200 girls, for a composite average of .500. However, the average *country* has an average of only .499 (the average of .476 and .522).

If you weight the average by *babies*, you get .500, as expected. But if you weight it by *countries*, you get .499.

Landsburg's question requires you to weight the average by country, and that's why the answer is less than .500.
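If you'd rather check this by brute force, here is a rough simulation sketch in Python. The number of countries, the number of families per country, and the seed are arbitrary assumptions of mine (I use small countries of 10 families so the effect is easy to see); nothing here comes from Landsburg's posts.

```python
import random

def simulate_country(families, rng):
    """Each family has children until its first boy; return (boys, girls)."""
    boys = girls = 0
    for _ in range(families):
        while rng.random() < 0.5:   # girl: the family keeps going
            girls += 1
        boys += 1                   # boy: the family stops
    return boys, girls

def girl_fractions(countries=20000, families=10, seed=1):
    """Return (average of per-country girl fractions, pooled girl fraction)."""
    rng = random.Random(seed)
    results = [simulate_country(families, rng) for _ in range(countries)]
    # Weighted by country: every country counts equally.
    by_country = sum(g / (b + g) for b, g in results) / countries
    # Weighted by baby: pool all births from all countries together.
    total_boys = sum(b for b, _ in results)
    total_girls = sum(g for _, g in results)
    pooled = total_girls / (total_boys + total_girls)
    return by_country, pooled

by_country, pooled = girl_fractions()
print(f"average country: {by_country:.3f}   all babies pooled: {pooled:.3f}")
```

The pooled figure should land very close to .500, while the per-country average comes in lower. The gap shrinks as you raise `families`, which is why it is only about .001 in the 100-family example above.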

-----

We'll see if he gets any takers for his bet. I'm guessing he doesn't.


## Thursday, December 23, 2010

### How many wins do baseball teams buy? Some evidence

Pretty much every model of baseball team payroll assumes that wins are worth more the closer you get to playoff contention. If you have 75 wins, buying a 76th win won't help you that much, because the fans don't care a whole lot whether you have 75 wins or 76. But if you have 86 wins, the 87th win is good value, because it puts you closer to the post-season, which is what the fans care about. So that 87th win can net you a lot more money.

One consequence of this hypothesis is that, when they decide how many wins to buy on the free-agent market, teams should *never* settle on any number between, say, 80 and 85. That's because, if the 82nd win is worth buying, or the 85th, then the 86th should *always* be worth buying too. It will generate more revenue than the 82nd win, but cost the same amount of money. (For an imperfect analogy: nobody buys just three tires for their car.)

I think Tango has been somewhat skeptical of this theory ... his hypothesis is that team ownership gives the GM an arbitrary budget, without actually figuring out what the revenue curve would look like or doing a cost/benefit analysis. It sounds implausible to me that that would happen frequently (although I'm sure it could happen occasionally, or even regularly). But neither one of us really has any evidence.

Well, actually, I now have a little bit. A couple of weeks ago, frequent commenter Guy e-mailed me and suggested I look at actual win totals to see if there is a gap between 80 and 85 wins. You wouldn't expect *zero* teams in that range, of course, since better and worse teams will occasionally fall into the 80-85 range just due to luck, or misjudgment of talent. But you should expect fewer 84s than 88s, for instance.

There is some evidence that that's the case. I looked at raw win totals from 1998 to 2010, and looked at them in various ways. Here are the frequencies of individual wins:

70 wins: 6 teams
71 wins: 9 teams
72 wins: 11 teams
73 wins: 7 teams
74 wins: 12 teams
75 wins: 15 teams
76 wins: 9 teams
77 wins: 8 teams
78 wins: 10 teams
79 wins: 13 teams
80 wins: 11 teams
81 wins: 6 teams
82 wins: 8 teams
83 wins: 17 teams
84 wins: 7 teams
85 wins: 11 teams
86 wins: 15 teams
87 wins: 8 teams
88 wins: 16 teams
89 wins: 12 teams
90 wins: 11 teams
91 wins: 9 teams
92 wins: 9 teams
93 wins: 9 teams
94 wins: 6 teams
95 wins: 15 teams

Nothing obvious looking at it that way ... it is interesting, though, that 81-81 happened less often than any other record within 10 games of .500 -- you'd expect it to be the most. But the highest frequency record was 83-79, just two wins away, so it's probably just coincidence.

We can see more if we group in bins of five wins:

70-74 wins: 45 teams
75-79 wins: 55 teams
80-84 wins: 49 teams
85-89 wins: 62 teams
90-94 wins: 44 teams

Since the 80-84 group is closest to the average of 81, you'd expect that to be the highest group. Instead, the two groups before and after are higher.

The best breakdown to see the pattern is by threes:

71-73 wins: 27 teams
74-76 wins: 36 teams
77-79 wins: 31 teams
80-82 wins: 25 teams
83-85 wins: 35 teams
86-88 wins: 39 teams
89-91 wins: 32 teams
92-94 wins: 24 teams

That shows it neatly. As hypothesized: teams appear more likely to choose either mid-70s, or mid-80s -- less in the middle.
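For anyone who wants to try other bin widths, here is a small Python sketch that regroups the individual win frequencies listed earlier. The `bin_counts` helper is a name I made up; the counts are copied from the table of individual win totals.

```python
# Number of teams at each win total, 1998-2010, from the list above.
WIN_COUNTS = {
    70: 6, 71: 9, 72: 11, 73: 7, 74: 12, 75: 15, 76: 9, 77: 8, 78: 10,
    79: 13, 80: 11, 81: 6, 82: 8, 83: 17, 84: 7, 85: 11, 86: 15, 87: 8,
    88: 16, 89: 12, 90: 11, 91: 9, 92: 9, 93: 9, 94: 6, 95: 15,
}

def bin_counts(counts, first_low, width, last_low):
    """Sum team counts into bins [low, low + width - 1] for each low value."""
    bins = {}
    for low in range(first_low, last_low + 1, width):
        high = low + width - 1
        bins[(low, high)] = sum(counts.get(w, 0) for w in range(low, high + 1))
    return bins

for (low, high), teams in bin_counts(WIN_COUNTS, 71, 3, 92).items():
    print(f"{low}-{high} wins: {teams} teams")
```

Calling `bin_counts(WIN_COUNTS, 70, 5, 90)` gives the five-win bins instead.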

Admittedly, this is weak evidence ... but Bill James did also mention a tendency for teams to peak higher than .500, back in the 1985 Abstract (p. 116).

------

UPDATE: I reran the study, except that instead of using actual wins, I used expected wins. The "expected" is the actual wins, after subtracting out

-- pythagorean luck
-- runs created luck
-- runs created against luck
-- offense luck (players having career years)
-- pitching luck (pitchers having career years).

The last two categories of luck are very approximate ... the algorithm I used probably isn't perfect. It involves calculating a player's expectation by a weighted average of the two previous and two following years, regressed to the mean a bit. (Details are in Powerpoint slides at my website.)

Anyway, I expected the results for expected wins to be even stronger, but they weren't. Here are the three-win groupings:

71-73 wins: 35 teams
74-76 wins: 44 teams
77-79 wins: 46 teams
80-82 wins: 44 teams
83-85 wins: 55 teams
86-88 wins: 36 teams
89-91 wins: 30 teams
92-94 wins: 13 teams

There's still an effect, but a smaller one. The 80-82 group is smaller than the 77-79 and 83-85 groups, which is something.

This is from 1998 to 2007. Going back to 1977, the results are a little more pronounced:

71-73 wins: 78 teams
74-76 wins: 111 teams
77-79 wins: 143 teams
80-82 wins: 127 teams
83-85 wins: 139 teams
86-88 wins: 93 teams
89-91 wins: 72 teams
92-94 wins: 45 teams

There's one element of luck that's not included here: injuries. Expected wins do not include any losses due to "too many" injuries to key players, or gains due to "too few" injuries to key players.


## Monday, December 13, 2010

### Replacement level talent vs. observations: a study

Recently, both JC Bradbury and King Kaufman expressed some skepticism about the concept of "replacement value". In theory, replacement value is the level of talent that can be obtained quickly at league minimum salary, like your best minor-leaguer, or the free agent who almost got an offer but didn't.

Generally, conventional sabermetric wisdom is that a replacement-level player is one who performs at a level 20 runs (or two wins) below average, if pro-rated to a full season. (Two wins below average is zero "wins above replacement," or "WAR".) By this standard, no team should play anyone expected to perform below this level.

In their arguments, Bradbury points to the fact that there were, in fact, many players who had performed at less than this level of performance. In a recent post on replacement value, King Kaufman checked, and found that

"In the major leagues in 2010, 24.5 percent of all innings were thrown by pitchers who ended the season with a negative WAR."

(UPDATE: In the original version of this post, I had originally incorrectly painted Kaufman as a replacement-value skeptic, which he is not. See his comment below.)

The explanation, of course, is that the fact that they *performed* below replacement doesn't mean teams *expected* them to perform below replacement. Teams might have overestimated their abilities, or, more likely, they just had bad years due to random chance. If you bet on heads ten times, some coins are going to land heads only 4 times out of 10. It doesn't mean that there was anything wrong with your prior expectations of the fairness of the coin.
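To put a number on the coin example, here is a short Python sketch of the binomial arithmetic for a fair coin flipped ten times. The helper name is mine.

```python
from math import comb

def prob_heads(k, n=10, p=0.5):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

exactly_four = prob_heads(4)
four_or_fewer = sum(prob_heads(k) for k in range(5))

print(f"exactly 4 heads: {exactly_four:.3f}")
print(f"4 or fewer:      {four_or_fewer:.3f}")
```

A perfectly fair coin falls short of 5 heads more than a third of the time, which is the same thing that happens to a player whose true talent sits right at the cutoff.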

Anyway, I thought I'd run a little experiment.

I started with every batter in Jeff Sackmann's "Marcel" database from 2000-2009. ("Marcel" is a prediction method created by Tom Tango, which forecasts a player's performance this year based on his statistics the previous three years.) Assuming the Marcel predictions are reasonable, I counted how many player-seasons were expected to be below replacement level. If the theory is correct, it should be "zero."

It wasn't zero, but it was close. Over 10 years, only 152 players total were expected below replacement value that year. That's 15 players per year, spread among 30 teams. Half a player per team. That's not bad.

And, it's possible I had replacement value wrong. I didn't include fielding, just basic linear weights. And I used arbitrary position adjustments -- catchers and middle infielders had to be -40 runs per 600 PA to be replacement level, 1B and DH had to be 0, and everyone else had to be -20. It's very possible that, of the 152 players, many of them actually *weren't* below replacement level because of their defensive skills.

Also, there were playing-time limits. Jeff's database excludes players with low expected playing time (the smallest he forecasts is for 185 PA). And I left out all players who had fewer than 50 actual plate appearances that season. I figured that if a guy projected below replacement, but the team only gave him 10 AB, that's close enough to zero that we won't count it.

So, half a player per team per season seems reasonably consistent with the concept of replacement level. It's not like teams are signing these guys left and right.

If there were 15 players per season *projected* to be below replacement level, how many of them actually *performed* below replacement level?

The answer: 1,025 total, or 102 per season. There were about seven times as many players *observed* to be below replacement level as *predicted* to be below replacement level.

That makes sense -- if you flip 100 pennies, where replacement level is 0.5, there would be *infinitely* more coins observed below 0.5 than actually below 0.5. (Assuming all pennies are fair coins, that is.)

-----

Now, the experiment. For every player in the database, I decided to randomly simulate their season based on their Marcels. Basically, I treated their Marcel prediction like an APBA card, and ran off a bunch of plate appearances. I simulated the exact number of PA that they *actually* had that season, regardless of how Marcel predicted their playing time.

Then, to simulate the uncertainty about the player's talent, I chose an adjustment from a normal curve, with standard deviation of +5 runs, and added that to the performance. (UPDATE: the +5 was for 500 Marcel PA. I adjusted accordingly for fewer Marcel PA by the square root of the ratio, so for 125 PA, I used +10.)
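The square-root scaling is easy to get backwards, so here is a sketch of it in Python. This is not the code I ran for the study; the function and constant names are invented for illustration, and only the 5-run and 500-PA figures come from the description above.

```python
import math
import random

BASE_SD_RUNS = 5.0   # talent uncertainty (in runs) at 500 Marcel PA
BASE_PA = 500

def talent_sd(marcel_pa):
    """Scale the talent SD by the square root of the PA ratio."""
    return BASE_SD_RUNS * math.sqrt(BASE_PA / marcel_pa)

def draw_talent_adjustment(marcel_pa, rng):
    """Draw one talent adjustment (in runs) from a normal curve centered on 0."""
    return rng.gauss(0.0, talent_sd(marcel_pa))

rng = random.Random(0)   # arbitrary seed, just for repeatability
print(talent_sd(500), talent_sd(125))
print(draw_talent_adjustment(125, rng))
```

Fewer Marcel PA means less information about the player, so the spread gets wider: a quarter of the PA doubles the standard deviation.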

If Marcels are good, unbiased predictors, and teams were indeed getting rid of players who fell below replacement, then we should see 1,025 below-replacement performances in the simulation, not just in real life.

Well, we don't.

I ran the simulation 10 times, and the average was 561 players, not 1,025. We got a little over half.

Why? After I ran this, I realized the reason is selective sampling. Suppose you have two players who have talent of -10. Six weeks into the season, and just by luck, one of them is awful, at a rate of -30, and the other one is doing OK, at a rate of +10.

What happens? The -30 guy is released, and winds up the season at -30 over 100 AB. The second guy is allowed to play the whole year, and winds up at -5 over 500 AB.

One out of two wound up having performed below replacement in real life. But, in the simulation, it'll be less than that.

In the simulation, there's less than a 50% chance that the first guy will wind up at less than -20 over 100 AB. And there's a much, much smaller than 50% chance that the -10 guy will wind up below -20 over a full 500 AB.

So the simulation will underestimate the number of below-replacement performances, because, in real life, once a marginal player is below replacement, he's not often given a chance to rise back out of it. But in the simulation, he gets his full number of PA regardless.

-----

In that light, I adjusted the simulation to add one new rule: if a player was expected to be +10 or less, and, a third of the way through his expected season, he's below replacement, he gets released. (If, after a third of the season, he's above replacement -- even a little bit -- he plays the entire rest of the season regardless of what happens afterwards.)
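To show how much a release rule like this can matter, here is a toy Python sketch. It is not the simulation from the study: it uses a single hypothetical type of player (true talent of -10 runs per 500 PA), a normal approximation instead of plate-appearance-by-plate-appearance results, and a per-PA luck figure (`NOISE_SD_PER_PA`) that I picked purely for illustration.

```python
import math
import random

REPLACEMENT = -20.0     # runs per 500 PA, the replacement level used in the post
NOISE_SD_PER_PA = 0.4   # hypothetical luck SD per plate appearance, in runs

def rate_per_500(runs, pa):
    """Convert a run total over `pa` plate appearances to a per-500-PA rate."""
    return runs * 500.0 / pa

def count_below_replacement(release_rule, players=10000, talent=-10.0,
                            season_pa=500, seed=7):
    """Count players whose final batting line is below replacement level."""
    rng = random.Random(seed)
    first_pa = season_pa // 3            # first third of the season
    rest_pa = season_pa - first_pa
    below = 0
    for _ in range(players):
        # Runs above average in each stretch: talent plus normal luck.
        first_runs = rng.gauss(talent * first_pa / 500.0,
                               NOISE_SD_PER_PA * math.sqrt(first_pa))
        rest_runs = rng.gauss(talent * rest_pa / 500.0,
                              NOISE_SD_PER_PA * math.sqrt(rest_pa))
        if release_rule and rate_per_500(first_runs, first_pa) < REPLACEMENT:
            below += 1                   # released; his short line stays bad
            continue
        if rate_per_500(first_runs + rest_runs, season_pa) < REPLACEMENT:
            below += 1                   # played all year and still finished low
    return below

print("no release rule:  ", count_below_replacement(False))
print("with release rule:", count_below_replacement(True))
```

Both runs use the same random draws, so the only thing that changes is whether a bad first third gets frozen into the record. The released players never get the chance to regress back above replacement, so the count with the rule is higher.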

Now, the simulation goes from 561 below-replacement performances, to 800. Still less than 1,025, but better.

So, finally, I did one more thing: I changed the standard deviation of the uncertainty of the player's talent from 5 runs to 10.

Now, we get to 863. That's 84 percent of the way there.

-----

After all that, I'm not sure quite how much the simulation tells us. To do a proper comparison, we need a better model of how teams decide how much playing time to give a hitter based on expectations and performance.

What we *do* find out, though is:

-- If you trust Marcel, then it does seem that few teams are willing to keep a player who has performed below replacement.

-- Regardless, many players *do* perform below replacement.

-- Simple probability shows that, at a bare minimum, over half the players who perform below replacement do so because of luck.

-- With other not-too-unreasonable assumptions, we can get that percentage up into the 80s.

My view about all this is that it's less than fully conclusive. Still, it should be fairly persuasive. If you didn't accept the "replacement player" hypothesis before, this little study should have enough in it to get you to reconsider.

What do you think?

-----

UPDATE, 12/14: King Kaufman posts in the comments that I misinterpreted his views on replacement value. My apologies to King, and I've revised the post accordingly.


## Friday, December 10, 2010

### The "Hot Stove Economics" salary model, Part II

Last post I talked about the economics and logic behind the player valuation model JC Bradbury used in "Hot Stove Economics." This post will talk about the actual player valuations, which is probably more interesting.

The main question the valuation model tries to answer is: how much is an extra (marginal) win worth to a baseball team? Conventional sabermetric wisdom says it's somewhere between \$4 million and \$5 million (for example, here's a recent post by Tango saying it's about twice \$2.36 million). I have a vague feeling I've read something that made me think it's a bit lower ... but, for purposes of this post, I'll go with \$4.5 million.

JC's results are very different. His model isn't linear, so there isn't one fixed value, but on average a win seems to be worth \$1.2 million.

The two models differ by a factor of three! Why the discrepancy?

I think part of the reason is how JC interpreted the results of his regression.

JC came up with a model that suggested that every team receives the same revenue for their Nth win, regardless of the size of the market. For instance, if the Atlanta Braves make \$X million more when they win 83 games than when they win 82, then the New York Yankees will also value their 83rd win at \$X million. That's really at odds with conventional wisdom -- not just sabermetric conventional wisdom, but announcer and columnist and fan conventional wisdom, too.

So why doesn't the model value a Yankee win at more than an Atlanta win? It's because when he tested the "different teams have different values" theory, JC found that its coefficient didn't come out statistically significantly different from zero in the regression. However, it came out very "baseballically significant." According to the rejected regression (page 183, column 4), a difference in population of 1 million people gives an effect of \$72,000 per win.

The New York metro area has a population of about 19 million. Atlanta has a population of about 5 million. At \$72,000 per million people, that means a marginal win for the Yankees should be worth about \$1 million more than a marginal win for the Braves.

Remember, the average win is worth \$1.2 million. So a Yankee win is worth almost twice the average! Nonetheless, JC rejects the idea because the coefficient didn't come out statistically significant.

The problem is, the regression had only five datapoints per team. That was enough to show evidence of an effect, but not at the 95% level. It was 1.41 SD from zero -- that's 92% significance one-tailed, or 84% significance two-tailed. But, because that was still short of the required 95%, JC chose to proceed as if there were ZERO effect of population size on marginal wins.
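The conversion from 1.41 SD to those percentages is just the normal curve. Here is a short Python sketch of it, using only the standard library; the variable names are mine.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.41                               # coefficient's distance from zero, in SDs
one_tailed = normal_cdf(z)             # P(Z < z)
two_tailed = 2.0 * normal_cdf(z) - 1.0 # P(-z < Z < z)

print(f"one-tailed: {one_tailed:.1%}   two-tailed: {two_tailed:.1%}")
```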

As I have written before, I think that's a big problem. The regression found the effect we were looking for, in the direction we were looking for, and even approximately the *size* we were looking for. If, after all that, we don't have statistical significance, the problem is that your sample isn't big enough.

Another way to look at it is that it has to do with the choice of null hypothesis. JC chose to look for evidence that the Yankee effect was not zero. He didn't find enough evidence. But what if he had instead tested the hypothesis that the effect was *proportional to population*? He'd probably have found that there wasn't enough evidence to reject that hypothesis, either.

------

But, even if we accepted the idea that different teams have different revenue curves, we still have a problem. After all, the regression would still show an average of about \$1.2 million per win. It's just that the Yankees would be at about \$2 million, while the small-market teams would be below \$1 million. So the discrepancy between \$4 million and \$1.2 million is still there.

I think what's happening is that there's a variation of "Simpson's Paradox" here, where the shape of the overall curve is very different from the shapes of the individual curves.

First, let's talk about the fact that JC chose to fit a cubic instead of just a straight line. Over at "Beyond the Box Score" and "The Book" blog, commenter Sky K. shows how the individual teams could be linear, and it's only when you plot them together that you get the illusion that the overall curve looks like a cubic.

He posts two beautiful illustrations, which I'll repost here (hope Sky doesn't mind). The first one shows that the linear is almost as good as the cubic except for five outlying points, which *all* belong to the Yankees.


The second one shows how every team, including the Yankees, could be linear, while the overall curve would look curved.

Sky's curves illustrate that the individual teams could be different from the overall curve in terms of whether they're linear or not. Now, I'm going to adapt Sky's curve (although much uglier) to show that the individual teams could be different from overall in terms of the *slope* of the curve:

That shows how it's possible that every team has a high marginal win value (of \$4 million, say) when you fit a curve to them individually, but it looks like they only have a low marginal win value (of \$1.2 million, say) when you fit a line to them as a group.

To check if that's really what's happening, you'd want to run the same regression, but with dummy variables for every team (and without the population variables). I'm not sure what you'll get with only five datapoints per team, but I'd bet that even if the 30 individual estimates were all over the place, the average would be higher than \$1.2 million.

------

So that explains why JC's model has too low an estimate for the marginal value of a win. But if that's the case, shouldn't all the player salary estimates also be way too low? Let's take a look.

As far as I understand, the current sabermetric thinking on the way to estimate a player's value is to figure that a replacement-level player is worth the league minimum of \$450,000, and that every win above that adds about \$4.5 million. Although JC rejects the idea of "replacement level," that's not a problem -- it turns out that league average is about 2 wins above replacement, so we can just say that the average player is worth \$9.45 million, and then you add or subtract \$4.5 million for each win above or below average.

I'm going to call this the "sabermetric method", even though I'm sure there are sabermetricians who disagree with it or have slight variations on it.
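As a quick sketch, the "sabermetric method" as I've described it boils down to one line of arithmetic. The Python below uses the round numbers from this post (league minimum of \$450,000, \$4.5 million per win, average equal to 2 wins above replacement); the function name is just for illustration.

```python
LEAGUE_MINIMUM = 0.45    # $ millions, value of a replacement-level player
DOLLARS_PER_WIN = 4.5    # $ millions per win above replacement
REPLACEMENT_GAP = 2.0    # league average is about 2 wins above replacement

def sabermetric_value(wins_above_average):
    """Estimated one-season value, in $ millions, under the linear model."""
    wins_above_replacement = wins_above_average + REPLACEMENT_GAP
    return LEAGUE_MINIMUM + DOLLARS_PER_WIN * wins_above_replacement

for waa in (-2, 0, 2):
    print(f"{waa:+d} wins vs. average: ${sabermetric_value(waa):.2f} million")
```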

JC's method, of course, is different. To get the estimate of a player's value, you start by figuring the value of the average player. JC has that at \$4.8 million in batting value (for a full-time player who takes 10% of his team's plate appearances), and another million or two in fielding value (depending on position). A typical fielding value is \$1.65 million for a player with 95% of his team's innings at third base. Adding batting to fielding, we get that an average full-time position player is worth about \$6.4 million a year.

To go from an average player to a specific player, you add the value of that player's batting and fielding performance, above or below average. That's just JC's original team revenue curve (the first curve in my previous post).

Finally, you adjust for inflation. I've bumped the estimates in the book by 20%, since JC's values are in 2007 dollars.

So, now, here's the important graph. It shows both estimates of player value: JC's (curved line), and the sabermetric one (straight line).

For players who are about average, or a bit below, the two curves aren't that far apart. Since a substantial proportion of major league baseball players are indeed in this range, it's hard to choose one over the other for those players.

Where the estimates diverge is for better players, the stars and superstars -- and for the bad players, the guys who can barely hold on to their jobs.

For the mediocre players, JC has a full-time bad player (-2 wins, say) at over \$5,000,000. The sabermetric view would argue that, if you can get a minor leaguer to do the job just as well for the major league minimum, that player can't possibly be worth more than \$450,000. JC says that if we think we can find a -2 player at the league-minimum salary, we're mistaken, and it's actually difficult to find a -2 player. His own curve crosses the \$450K mark somewhere below -5, which I think is far too low.

However, the issue of sub-par players is not all that important in practical terms, because there are very few bad players signed as free agents to be full-time players.

Where it gets interesting is superstars. Take, for instance, Albert Pujols, who's typically 6 wins above average (8 wins above replacement) every year. JC would say he's worth about \$25 million (he had him at \$21.6 million in 2010, but that's because it was a bit of an off-year). The sabermetric method would say he's \$36.5 million.

Those estimates aren't even close: one is almost 50 percent higher than the other. How can we figure out which one is better?

What we could do is wait for Albert Pujols to become a free agent, and see what he gets. Is it closer to \$25MM, or \$36MM? Whichever it is, that would give valuable evidence on how the market values him (although, to be sure, either side of the debate could argue that the market is wrong).

That would be great, except: most players as good as Pujols sign long-term contracts, and a player's performance over that contract varies due to the effects of aging. So that complicates things. To figure out what a player is worth on a multi-year deal, you have to estimate how he'll perform after the effects of aging. In addition, pay rates normally increase over time, so there's a certain amount of inflation built in to the deal.

So you have to make assumptions about inflation, and about aging patterns.

As it turns out, the assumptions that JC's method makes are very different from the assumptions sabermetrics makes. And the direction of the difference is exactly opposite to the difference in the single-season curves shown in the graph. That means that when you look at the numbers for a long-term deal, the two methods aren't off by nearly as much as they would be for a one-year deal.

1. Salary inflation

Suppose that player X is projected to be 2 wins above average five years from now. JC has that a season like that is worth \$10 million today, and the sabermetric method says it's worth \$18.45 million today.

But what will it be worth in five years?

JC assumes that salaries in general will rise by 9 percent a year. At 9 percent, his \$10 million salary will inflate to \$15.4 million.

I'm not sure what rate of inflation various sabermetricians use; Tango has it at 6 percent over the last 10 years, but 3 percent over the last four years. If we settle on 5%, then today's \$18.45 million value turns into a \$23.5 million value in five years.

So that closes the gap a bit. Originally, it was \$10 million to \$18.45 million, a gap of 84 percent. After inflation, it's \$15.4 million to \$23.5 million, a gap of only 53 percent.
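The compounding is easy to check. Here is a short Python sketch with the figures from this section; the 5% rate is the compromise number I settled on above, and the names are mine.

```python
def inflate(value_today, annual_rate, years):
    """Compound a value in today's dollars forward at a fixed annual rate."""
    return value_today * (1.0 + annual_rate) ** years

JC_TODAY = 10.0        # $ millions, JC's value for a +2 season today
SABER_TODAY = 18.45    # $ millions, sabermetric value for the same season

jc_future = inflate(JC_TODAY, 0.09, 5)
saber_future = inflate(SABER_TODAY, 0.05, 5)

gap_today = SABER_TODAY / JC_TODAY - 1.0
gap_future = saber_future / jc_future - 1.0

print(f"JC in five years:          ${jc_future:.1f} million")
print(f"sabermetric in five years: ${saber_future:.1f} million")
print(f"gap today: {gap_today:.1%}   gap in five years: {gap_future:.1%}")
```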

2. Aging

As has been noted here and elsewhere, JC believes a player's age-related decline starts later, and is much less severe, than the sabermetric community does.

Tom Tango has found that the best players (like the ones we're discussing here) age at about half a win per season, starting at 27. So, consider a 30-year-old player who's 2 wins above average next year. Five years from now, he'll be 2.5 wins worse than he is now. That's a huge difference -- the decline is worth \$11.25 million off the player's value. That means Tango will have the player's fifth year worth \$7.2 million in today's dollars, or \$9.2 million after five percent inflation.

JC's aging curve is much flatter -- he'll have the 35-year-old within about half a win of his performance at 30 (not half a win per season, but half a win total). So JC will have the player worth still about \$8.5 million in today's dollars, or about \$13 million after nine-percent inflation.
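Here is the Tango side of that arithmetic as a Python sketch, combining the half-win-per-season decline with the linear value formula and 5% inflation used earlier. JC's figures come off his own curve, so I haven't tried to reproduce them; the function names are illustrative only.

```python
def sabermetric_value(wins_above_average):
    """$ millions: league minimum plus $4.5M per win above replacement."""
    return 0.45 + 4.5 * (wins_above_average + 2.0)

def aged_wins(wins_now, years, decline_per_year=0.5):
    """Tango-style aging: lose about half a win per season."""
    return wins_now - decline_per_year * years

def inflate(value_today, annual_rate, years):
    """Compound a value in today's dollars forward at a fixed annual rate."""
    return value_today * (1.0 + annual_rate) ** years

wins_in_five = aged_wins(2.0, 5)                    # +2 player, five years on
value_today_dollars = sabermetric_value(wins_in_five)
value_inflated = inflate(value_today_dollars, 0.05, 5)

print(f"wins vs. average in year five: {wins_in_five:+.1f}")
print(f"value in today's dollars: ${value_today_dollars:.1f} million")
print(f"value after inflation:    ${value_inflated:.1f} million")
```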

For seasons in the near future, JC's method gives lower values than the sabermetric method. But for seasons in the far future, JC's method gives *higher* values than the sabermetric method.

For a long-term deal, the far seasons cancel out the near seasons, and so the total values wind up comparable, in a lot of cases.

------

So, my view on JC's methodology is that the reason his method gives reasonable values is mostly coincidence -- his positive errors cancel out his negative errors. Of course, JC would say the same thing for the sabermetric method, but in reverse. He'd say it's *us* whose logic doesn't work, and it's *our* errors that, coincidentally, cancel each other out.

So, how to resolve which version is correct?

The first way is just to judge the logic. In my previous post, and here, I explained why I think JC's model's assumptions don't hold. If you read JC's book and posts, you can evaluate my comments, and make up your own mind.

The second way is to look at the empirical data of how much free agents actually sign for. Take all the superstar players who recently came to the end of a long-term contract. Figure out, in retrospect, how much each of the seasons was worth by each method, adjusted for actual inflation. Add them all up and compare them to the actual value of the contract. See which method came closer, and keep a W-L record.

I bet that the conventional sabermetric method would soundly defeat JC's method. And I'd be willing to bet on that going forward.

Labels: , ,

## Sunday, December 05, 2010

### The "Hot Stove Economics" salary model

“Hot Stove Economics,” JC Bradbury’s new book, concentrates on the question of assigning a dollar value to baseball player production. JC has changed his methodology somewhat from his first book, but … I still don’t agree with the book’s valuations.

JC does, roughly, what other sabermetricians do: First, he figures out how many wins a player’s performance is worth. Then, he translates that into salary. The first part, I think, is generally OK; it’s the second part that I’m not sure about.

The valuation part starts out the same way that others have done – by figuring out, at the team level, how winning more games helps them make more money. (Actually, he uses run differential, and throughout this post I’ll translate that to wins via the traditional sabermetric shortcut of ten runs per win.)

JC takes every team from 2003 to 2007, and plots revenues vs. wins. He gets a graph that looks fairly flat at low levels of wins, but starts rising, and accelerating, at 71 wins. To fit a curve, he settles on a cubic, and runs some regressions to get an equation that relates on-field performance to revenues. Here’s what he comes up with.

Revenue in millions = (0.0641)*RD + (0.000979) * RD^2 + (0.00000312) * RD^3 + (.0000061 * metro population) + (19.55 if there’s a “new stadium honeymoon”) + 95.5

(RD = run differential, which is runs scored minus runs allowed. Population is that of the metropolitan statistical area. A new stadium honeymoon is considered to exist if the team is playing in a stadium that’s 8 or fewer years old.)
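For anyone who wants to play with the numbers, here is a minimal Python sketch of that equation and its derivative. The coefficients are the ones quoted above; the function names, the assumption that population is counted in persons, and the little wins-to-runs helper (using the ten-runs-per-win shortcut) are mine, for illustration only.

```python
def revenue_millions(rd, metro_population=0.0, new_stadium=False):
    """Fitted team revenue, in $ millions, for a given run differential."""
    performance = 0.0641 * rd + 0.000979 * rd**2 + 0.00000312 * rd**3
    market = 0.0000061 * metro_population   # assumes population in persons
    honeymoon = 19.55 if new_stadium else 0.0
    return performance + market + honeymoon + 95.5


def marginal_revenue_per_run(rd):
    """Derivative of the cubic with respect to RD, in $ millions per run."""
    return 0.0641 + 2 * 0.000979 * rd + 3 * 0.00000312 * rd**2


def wins_to_rd(wins, runs_per_win=10.0, average_wins=81.0):
    """Translate a win total into run differential (ten runs per win)."""
    return (wins - average_wins) * runs_per_win


for wins in (60, 65, 71, 75, 81, 90, 100):
    rd = wins_to_rd(wins)
    per_win = marginal_revenue_per_run(rd) * 10.0
    print(f"{wins} wins: marginal revenue about {per_win:+.2f}M per win")
```

Note that the derivative has no population term in it at all, since population only shows up as a separate additive piece of the equation.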

There’s a surprise here. Under the model, an additional Nth win (or run) is worth exactly the same amount, regardless of the size of the market. (Population enters the equation only as a separate additive term, so it shifts total revenue without changing the value of a win.) That’s a bit counterintuitive. You’d think that if the Yankees improve from 65 wins to 95, that’ll create more revenue than when the Brewers move from 65 wins to 95. Fans spend more money when their team wins more, so you’d think that, since the Yankees have at least twice the fan base of the Brewers, an extra win by them would create a lot more revenue than an extra win by the Brewers.

But the model says no.

Why? In a nutshell, I think there weren’t enough datapoints in JC’s regression for the “win worth more to a big market team” coefficient to come out statistically significant. That’s a subject I’ll talk more about in a future post. For this one, I’ll just talk about other aspects of the model.

------

Let me start by showing you a graph of JC’s equation. This one is the above cubic function for revenues above average vs. wins, leaving out the fixed team effects. (It’s figure 6.1 in the book, page 121.)

And here’s a graph of revenues per marginal (additional) win. That one’s not in the book. I graphed it by taking the derivative of the previous curve. (This is the more important curve, and I’ll be referring to it often.)


There’s something not right here. From the second curve, you can see that once you get past 71 wins, each win is worth more than the last. If that’s the case, why would any team ever stop spending? The Yankees have seen fit to spend themselves into a 95- to 100-win team every year. But if the 101st win is worth even more than the 100th win, why do they stop there? Why don’t they spend and spend and spend, into the 130s and the 140s?

Obviously, this can’t be the way things are in the real world.

A more reasonable model would have the value of a win coming down somewhere in the range of 95 wins or so, on the theory that once the team has shown that it’s good enough to make the post-season, the fans won’t necessarily spend much more when their team wins 107 games than when it wins 106.

Another thing that doesn’t look quite right is the behavior of the curves in the range for mediocre teams. From about 63 wins to 78 wins, JC’s model suggests that teams earn *less* money by winning more! That doesn’t really make sense, does it?

JC calls the dip the “loss trap.” In fairness, he does suggest (page 122) that it might just be an artifact of having chosen a cubic equation for the model. But, he also wonders if it could have something to do with revenue sharing, that when teams get successful, they have to pay more into the common pool. Unless I’m missing something, I don’t see how it could. The marginal tax rate on additional revenues is only 31%, so you’re still making more money after revenue sharing is deducted. It’s like if you get a raise at work – even though you pay more taxes, you’re still better off, so long as you’re in a tax bracket that’s less than 100%.

And, in addition, if you look at the empirical data instead of the fitted model – which JC shows on page 76 – there’s no “loss trap” there at all. It shows rising revenue, as you’d expect. Very slightly rising, but still rising.

Even if you flatten the “loss trap” to be horizontal instead of negative (as JC does), the curve would show that, between about 62-100 and 81-81, revenue barely changes at all. That doesn’t make sense. Suppose you’re at 62-100, and you spend millions of dollars to buy 19 wins worth of free agents, like a Pujols or three. Should you really expect that you’re throwing 100% of your money down the drain, that the fans won’t wind up buying any more tickets than before? Or, going the other way: if you’re at .500, and you sell off all your decent players, pocket the payroll savings, and drop to 100 losses … will the fans really just shrug and spend the same amount they used to?

That can’t really be right, can it?

So I think the problem is with the model. Part of the problem is that it treats every team as having the same revenue curve, and part of the problem is the fact that JC insisted on fitting a cubic equation. The cubic causes a couple of common-sense principles to be left out of the model – that marginal revenue per win has to top out somewhere before 100 wins, and that every win has to increase revenues at least a little bit.

If you look at other models (such as Nate Silver’s, and my tweak of Nate’s), they look similar to JC’s, but they satisfy those two conditions -- they deliberately bring down the value of wins once you reach the mid-90s. JC’s actually looks a lot like mine – all you have to do is raise it a bit, so that every win is positive, and reverse it at about 95, so it heads down instead of up.

For the record, here’s Nate’s curve, followed by mine.

------

In any case, maybe all that is a nitpick. You can always just take JC’s curve and tweak it in all the right places, and JC’s calculations will actually not change that much.

But, they still matter. JC’s methodology is to estimate a player’s value by his “Marginal Revenue Product” (MRP), and that depends, to a large extent, on getting the revenue curves right.

In economics, the MRP is defined as the value the last unit of labor is worth to a firm. It’s a principle of economics that, in a free market, with perfect information, a worker gets paid exactly his MRP. The key is that it has to be the *last* unit of labor.

In the past, I’ve disagreed with JC about how this actually works in a baseball context, but here’s how I think it works in real life:

Suppose you run a mechanic shop, business varies randomly from day to day, and you’re open 9 hours a day. You charge customers \$60 an hour for a mechanic’s labor, and you pay the mechanics \$20 an hour.

Suppose you have enough business to keep one mechanic occupied for 100% of his time. Obviously, you’ll hire him. His revenue will be \$540, and he’ll only cost \$180.

Will you hire a second mechanic? Suppose there’s not enough business to be sure both mechanics are always working. Maybe the second mechanic will only be busy for 6 hours a day. (In real life, you’re not going to have one mechanic working 9 hours and the other one 6 – you’ll probably work them both 7.5 hours each. But let’s ignore that for now.) Still, 6 hours is enough that you’ll hire him. He’ll bring in an extra \$360, and you’ll still pay him \$180.

If the third mechanic would bring in only an extra five hours a day, you’ll hire him too – you gain \$300 and pay \$180.

Now, maybe the fourth mechanic will add only three hours of billable work a day. He’ll bring in \$180, and cost \$180. Now, you’re indifferent to whether you hire him or not. Let’s suppose he actually brings in a tiny bit more than \$180, so you hire him.

For a fifth mechanic, the extra potential hours of work he adds might be mostly wasted – there might be only two extra hours when you need a fifth guy, on only the busiest days. On average, he adds two hours of revenue a day, or \$120, but still costs \$180. So, obviously, you don’t hire him.

The “workers earn their MRP” principle states that the wages of *every* mechanic are equal to the MRP of the *last mechanic you hire*. That works, here. The last mechanic is the fourth one. He brings in \$180 in revenue. And all your mechanics earn \$180.
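Here is the hiring logic from the garage example as a short Python sketch. The hours, the \$60 rate, and the \$180 daily wage come from the example above; the function names are mine, and the loop assumes (as in the example) that each additional mechanic brings in less than the one before.

```python
HOURLY_RATE = 60   # dollars billed to customers per hour of labor
DAILY_WAGE = 180   # 9 hours a day at $20 an hour

# Extra billable hours per day added by the 1st, 2nd, 3rd, ... mechanic.
extra_hours = [9, 6, 5, 3, 2]


def marginal_revenues(hours, rate=HOURLY_RATE):
    """Daily revenue added by each successive mechanic."""
    return [h * rate for h in hours]


def mechanics_hired(hours, wage=DAILY_WAGE, rate=HOURLY_RATE):
    """Keep hiring while the next mechanic brings in at least his wage.

    Ties count as a hire, matching the 'tiny bit more than $180' assumption.
    Relies on marginal revenue declining with each hire.
    """
    hired = 0
    for revenue in marginal_revenues(hours, rate):
        if revenue < wage:
            break
        hired += 1
    return hired


n = mechanics_hired(extra_hours)
last_mrp = marginal_revenues(extra_hours)[n - 1]
print(f"Hire {n} mechanics; the last one brings in ${last_mrp} a day")
```

The wage every mechanic gets is the marginal revenue of the last hire, not of the first.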

In this example, we knew the revenue curve for the shop, and also the market wage for mechanics. But suppose we didn’t know the wage? No problem. We could just look at the shop, and see that it has four mechanics. We then check, if we added a fifth mechanic (or, actually, a fraction of a fifth mechanic, in order to keep the marginal-revenue curve continuous), how much more revenue would he bring in? Maybe it turns out that if we hired the fifth guy for six minutes, the shop would earn an extra \$1.98 in revenue. So those six minutes are worth \$1.98 in wages, which works out to \$19.80 an hour, close to the actual figure of \$20.

Now, you could do this for *any* mechanic shop, not just this one. If you find the biggest Chevrolet dealer in town, with 70 mechanics, you’ll still find that six minutes of a hypothetical 71st mechanic is still worth about \$20 an hour.

No matter which garage you look at, the salary always equals the MRP of the last unit of labor.

------

JC now tries to extend that to baseball players – or, more specifically, the wins they bring in. If you believe a player is paid his MRP, the first thing you have to do is figure out how much the *last player hired* brings to the team.

So, what JC does, is he says, OK, just like you can take any mechanic shop and figure out the MRP of the last mechanic, you can take any baseball team and figure out the MRP of the last player. So, let’s just take an average team. Suppose it hires one last player, who’s 1 win (about 10 runs) above average, and it becomes an 82-win team. How much is that extra win worth in revenue?

And JC’s cubic equation tells him that. So that’s the player’s MRP, and so that’s what a win is worth.

That would work, except for one thing: for the value of the 82nd win to be the MRP, it has to be the *last win hired*. But it won’t be! If you look at JC’s curve, or even Nate’s or mine, we all agree that the 83rd win is worth more than the 82nd, and the 84th win is worth more than the 83rd, and so on up to at least 90.

So, it will never be the case that a rational team, in a free market for wins, will stop hiring after the 81st or 82nd win. Because, if they chose to hire the 81st win, they will *always* hire an 82nd win – it costs the same as the 81st win, but brings in more revenue!

For a rational team, the 82nd win will never be the last one hired, so it can’t be the MRP.

It’s like … suppose I offer that if you go out and buy me apples, I’ll pay \$1 for the first one, \$2 for the second one, and \$6 for the third one. How many will you bring me? It depends. If apples cost less than \$3, you’ll bring me three. If apples cost more than \$3, you’ll bring me none. There is *no* case where you bring me exactly one apple or exactly two apples. And so the marginal value of the second apple -- \$2 – is almost meaningless in terms of figuring out when an apple is worth buying.
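If you want to check the apple example by brute force, here is a small Python sketch. The \$1, \$2, and \$6 payments are from the example; the function name and the choice to bring nothing when you would only break even are my own assumptions.

```python
PAYMENTS = [1, 2, 6]   # what I pay for the 1st, 2nd, and 3rd apple


def best_quantity(price, payments=PAYMENTS):
    """Number of apples that maximizes your profit at a given store price."""
    best_n, best_profit = 0, 0.0
    total_paid = 0.0
    for n, payment in enumerate(payments, start=1):
        total_paid += payment
        profit = total_paid - n * price
        if profit > best_profit:   # strictly better only; a tie means stay home
            best_n, best_profit = n, profit
    return best_n


for price in (0.50, 1.50, 2.50, 2.99, 3.01, 5.00):
    print(f"apples at ${price:.2f}: bring {best_quantity(price)}")
```

Whatever price you try, the answer is either zero or three. One and two never come up.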

What JC is doing is figuring out what the 82nd win is worth, but, because that win isn’t the last, it’s NOT the same as figuring out the MRP of a win.

------

I’ve argued that no team will ever stop buying free agents at 81-81, that it will always continue on to at least the high 80s. Why, then, is it the case that you can find teams that win between 81 and 85 games, if the model says it shouldn’t happen?

Luck, mostly. The model doesn’t say that no team will *finish* 81-81 – it says that no team will *buy an 81-81 level of talent*. Teams that go 81-81 were probably teams that got lucky when management budgeted for (say) 74-88, or teams that got unlucky when management budgeted for (say) 90-72. The standard deviation of wins in a season from luck is about 6, so, about 1 time in 20, a team will finish 12 or more games above or below where you’d expect.
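The "1 time in 20" figure is just the usual two-standard-deviation rule. A quick Python sketch, assuming luck is roughly normally distributed (that assumption, and the function name, are mine):

```python
import math

LUCK_SD_WINS = 6.0   # standard deviation of season wins due to luck


def prob_off_by_at_least(games, sd=LUCK_SD_WINS):
    """Two-sided chance of finishing at least `games` away from true talent,
    under a normal approximation."""
    z = games / sd
    return math.erfc(z / math.sqrt(2.0))


p = prob_off_by_at_least(12)
print(f"Chance of missing by 12 or more games: {p:.3f} (about 1 in {1 / p:.0f})")
```

Under that approximation the chance comes out a little under 5 percent, which is where the rule of thumb comes from.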

In addition, teams don’t have perfect information or perfect scouts. They might think they have a team capable of winning 88 games … but, alas, they overestimated the value of their stopper and their free-agent first baseman, and, unbeknownst to them, they really assembled an overpriced 83-win team.

Still, it’s not hard to think of reasons for teams to target 81-81 that sound plausible. For instance, maybe they’re in such a small market that it’s not worth it for them to try to get to 89 wins – they’d rather stay at 81 and hope for luck. They might prefer to sell some of their young, underpaid players for cash, but they can’t. And so, they shrug and go with what they’ve got.

That’s not unreasonable, but it’s not consistent with JC’s model. First, the model assumes that all teams have the same revenue curve, so there are no teams whose revenues per win are smaller than others. Second, if a team holds on to players only because they can’t sell them, that contradicts the assumptions of the “salary=MRP” hypothesis, which assumes an efficient market both ways.

-----

So, when valuing individual players, JC estimates the number of runs above or below average they were responsible for. He then goes to the revenue graph and figures out how much revenue those runs are worth to the average team (positive runs are worth more than negative runs, because the curve is steeper on the positive side of 0 than the negative side).

Then, since the curve is denominated in runs above or below average, he adds in what the “average” is worth for a player with that many plate appearances. The total, then, is the MRP for that player.

Two concerns about how he does all that:

1. In some cases, JC’s valuation comes in a lot different from the market price of the player. CC Sabathia, for instance, signed a contract worth some 15% more than the model estimates, even after assuming a generous 9% inflation rate for future years.

Why is that? JC explains that it’s because the Yankees are a better than average team. The better the team, as you remember from the graph, the more additional wins are worth. Therefore, because Sabathia joins a better-than-average Yankee team, his contribution is higher up the revenue curve than it would be for an average team, and so he’s worth more.

But that’s not right at all!

Basically, what JC is doing is saying that the bigger garage pays a mechanic more because he brings in more revenue for them. But that’s not how MRP works.

If you recall, the revenue values of the mechanics to the owner were:

1st mechanic: \$540
2nd mechanic: \$360
3rd mechanic: \$300
4th mechanic: \$180

Now, suppose there’s another garage that charges more per hour (because it’s in a more expensive part of town, say, or it has better equipment). Its revenue curve looks like:

1st mechanic: \$720
2nd mechanic: \$480
3rd mechanic: \$400
4th mechanic: \$240
5th mechanic: \$180

Suppose these two garages are hiring free-agent mechanics. So far, they’ve hired two each, and they’re looking for a third. They interview a candidate.

Now, if the first garage hires the new guy, revenue will increase by \$300. If the second garage hires the new guy, revenue will increase by \$400.

Does it follow, then, that the new guy has a higher MRP if he’s hired by the second garage? Will the second garage offer more money for that mechanic?

No, to both questions. MRP is the value of the LAST mechanic hired, not the NEXT mechanic hired. Neither garage has hired its last one yet. When they do, it will be the fourth one for the first garage, and the fifth one for the second garage. Both will bring in \$180, and therefore earn \$180. \$180 is the MRP.
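Here are the two garages side by side, as a Python sketch. The revenue figures are the ones listed above (plus the \$120 fifth mechanic from the earlier example); the function name is mine, and it assumes both garages face the same \$180 market wage and hire when they're indifferent.

```python
MARKET_WAGE = 180   # daily wage, the same for every garage

# Daily revenue added by each successive mechanic.
first_garage = [540, 360, 300, 180, 120]
second_garage = [720, 480, 400, 240, 180]


def last_hire(marginal_revenue, wage=MARKET_WAGE):
    """Return (number of mechanics hired, revenue added by the last one)."""
    hired = 0
    for revenue in marginal_revenue:
        if revenue < wage:
            break
        hired += 1
    last = marginal_revenue[hired - 1] if hired else None
    return hired, last


for name, curve in (("first", first_garage), ("second", second_garage)):
    count, mrp = last_hire(curve)
    print(f"The {name} garage hires {count}; its last mechanic brings in ${mrp}")
```

The garage with the richer revenue curve ends up with more mechanics, not with better-paid ones.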

In mechanics, and baseball players, and real life, when one party values something higher than the next party, that does NOT mean they pay more for it. It means they buy a higher quantity. If I love Taco Bell and you only like it, it doesn’t mean I wind up paying \$3 for a taco while you pay \$1. It means that we both pay \$1, but I wind up going more often.

Similarly, the bigger garage doesn’t pay more for mechanics – it just hires more of them. And the New York Yankees don’t pay more for free agents – they just sign more of them.

According to my (admittedly limited) knowledge of economics, that’s one of the fundamentals of markets, the “Law of One Price.” It doesn’t matter how much I want, need, or value a product, compared to you: in a competitive market, we wind up paying the same price.

I might value my first taco of the week at \$3, and you might value it at \$1. But we each pay \$1. The difference is that I value my second taco at \$1, and you value it only at 50 cents. Since it costs \$1, I buy a second one and you don’t.

What I think JC is doing here is saying, “well, it looks like the Yankees are paying \$2 for a \$1 taco, but that’s because they like tacos so much.” That’s not right. If the Yankees like tacos so much, they won’t pay \$2 for them. They’ll just buy more of them at \$1.

If CC Sabathia’s salary is higher than the model thinks it is, it could be because:

-- there’s something wrong with the model or calculation
-- the Yankees overestimated Sabathia’s value when they signed him
-- the market was imperfect at the time of Sabathia’s signing (maybe he was the only decent pitcher left and the Yankees were desperate).

But it’s *not* because Sabathia’s MRP is higher with the Yankees.

2. If a player is worth 50 runs, the calculation is the difference in revenue between a team with a run differential of zero, and a run differential of 50. That’s not quite right, and can overestimate the value of good players.

If you’re trying to estimate the marginal value, you take the thinnest slice you can. So if a player has a run differential of +50, you pretend he’s a +1, figure out the gain, and multiply that by 50. Even better, you pretend he’s a +.0001, figure out the gain, and multiply by 500,000.

If you don’t do that, you’ll wind up rating a +60/+0 combination higher than a +30/+30 combination. You’ll also wind up giving a +60 more positive revenue than you give a –60 negative revenue. That makes no sense, because if you hire a +60 and a –60 simultaneously, you should get zero.

Another way to look at this: in equilibrium, the unit of labor you want to use is the smallest one possible – it’s not the player, it’s the win (or the run). The “MRP=Salary” identity tells you that a team’s last *run* should be valued at the revenue the run provides. It does not say that the last ten runs, or hundred runs, or million runs, should be valued at the revenue that many runs provide.

It’s impossible for both to be the same, since the value of the run changes as the team gets better.
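To see the size of the difference, here is a Python sketch comparing the two ways of valuing a player, using the cubic coefficients quoted near the top of this post. Both valuations start from an average team (RD of zero). The function names are mine, and this is only an illustration of the arithmetic, not a reconstruction of the book's actual calculations.

```python
def revenue_above_average(rd):
    """Performance part of the fitted cubic, in $ millions, relative to RD = 0."""
    return 0.0641 * rd + 0.000979 * rd**2 + 0.00000312 * rd**3


def marginal_revenue_per_run(rd):
    """Derivative of the cubic, in $ millions per run."""
    return 0.0641 + 2 * 0.000979 * rd + 3 * 0.00000312 * rd**2


def chunk_value(runs, base_rd=0.0):
    """Value the player as one big jump along the revenue curve."""
    return revenue_above_average(base_rd + runs) - revenue_above_average(base_rd)


def thin_slice_value(runs, base_rd=0.0):
    """Value every run at the marginal rate where the team currently sits."""
    return runs * marginal_revenue_per_run(base_rd)


pairs = [("+60 and +0", (60, 0)), ("+30 and +30", (30, 30)), ("+60 and -60", (60, -60))]
for label, players in pairs:
    chunk = sum(chunk_value(r) for r in players)
    thin = sum(thin_slice_value(r) for r in players)
    print(f"{label}: chunk method {chunk:.2f}M, thin-slice method {thin:.2f}M")
```

With the thin-slice method, the +60/+0 pair and the +30/+30 pair come out equal, and the +60 and –60 cancel. With the chunk method, neither of those things happens.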

------

So, how *do* you find MRP based on this model? Well, maybe you start by figuring out where the equilibrium is. Given the model, how will teams target their spending?

Start by recalling that JC has every team with exactly the same marginal revenue curve. That means that, since we’re assuming that every team is rational, they all have to be pursuing the same strategy. Since the average team has to be 81-81, that might be what they’re all shooting for.

But, as we saw, it’s not rational to shoot for 81-81 … if the 81st win is worth buying, then so is the 82nd, and 83rd, and so on, all the way to infinity.

So, it probably works like this: some teams trade away their talent to other teams. As a result, you wind up with two sets of teams. One set of teams targets 130-32 or something. The other set of teams winds up getting rich on other teams’ money by selling them talent and winding up deep in the second division.

In that case, the task is to figure out exactly where on the curve that equilibrium winds up. After crunching the numbers, you’ll probably wind up with something like, 15 teams at 130-32, and the other half at 32-130. Then, you can go to the revenue curve, and figure out the value of the 131st win (or, actually, the run that brings you from 130 wins to 130.1 wins). That becomes your MRP.

But that’s obviously not convincing, because it’s not true to real life. To get a realistic MRP, you need a realistic equilibrium. And if the logical implication of your model is that half the teams should be at 130 wins and the other half at 130 losses … well, that certainly isn’t realistic.

The bottom line: in order to draw valid conclusions about revenue and wins, you need an accurate model. It’s just not possible to get a decent estimate for MRP by assuming team revenue follows 30 identical cubic equations.


## Wednesday, December 01, 2010

### The Economist reports on the cricket study

A few months ago, I reviewed a cricket study, by Shekhar Aiyar and Rodney Ramcharan, that purported to show that rookie players who debuted at home had longer careers than those who debuted on the road.

The authors argued that this was because cricket decision-makers looked at the players' raw statistics for their debut, and neglected to adjust for home-field advantage (HFA) when doing so. Therefore, they incorrectly concluded that the home players were better than the road players.

However, my take was that the data showed no such thing. I argued that while the "decision makers are improperly ignoring HFA" hypothesis was not refuted by the data, neither was the "decision makers are acting correctly" hypothesis. In other words, the sample was not large enough to distinguish the two hypotheses, and the authors simply privileged one of them over the other.

There's now a new version of the paper (.pdf); I haven't gone through it, but it looks pretty much the same as the old.

In addition, The Economist recently ran an article on the study. They reported on it uncritically, and I think they may have missed part of the point -- my reading is that they seem to believe the study also discovered HFA, instead of just quantifying it.
