Friday, September 24, 2010

Are Traded Players "Lemons"? An updated study

Here's something kind of shocking I found when I looked at the performance of traded players.

I took all batters from 1901 to 1975 who started a season with a new team. I eliminated all players whose Marcel predictions had them projected for fewer than 400 PA that year. I also eliminated all players who had fewer than 1,000 Runs Created for their career so far. So I'm left with 102 full-time players with good careers so far.

Then, for each of those players, I used similarity scores to find a control, the closest match in Marcels for that year among non-traded players. The control had to be the same age and play the same position.

So now I have two groups of 102 players each, controlled for age and position, with almost identical projections for that year. The only difference appears to be that one group was traded, and the other was not.
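(For the technically inclined, here's roughly what that matching step looks like in code. This is just a sketch -- the field names, and the use of a single projected runs-created number as the similarity score, are my simplifications, not the study's actual method.)

```python
# Sketch of the control-matching step: for each traded player, find the
# non-traded player of the same age and position whose Marcel projection
# is closest. Field names ("marcel_rc", etc.) are hypothetical.

def closest_control(traded, pool):
    """Return the non-traded player of the same age and position
    with the nearest Marcel-projected runs created."""
    candidates = [p for p in pool
                  if p["age"] == traded["age"] and p["pos"] == traded["pos"]]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p["marcel_rc"] - traded["marcel_rc"]))

traded = {"age": 29, "pos": "2B", "marcel_rc": 85}
pool = [
    {"age": 29, "pos": "2B", "marcel_rc": 83},
    {"age": 29, "pos": "2B", "marcel_rc": 95},
    {"age": 30, "pos": "2B", "marcel_rc": 85},  # wrong age -- excluded
]
control = closest_control(traded, pool)         # the 83-RC projection wins
```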

How do you think their actual performance would compare for the subsequent season?

If you thought the two groups would have similar performance that year, you'd be right. Their composite batting lines were very close.

So far, no surprises.

Now, consider moving beyond that season. You still have the same two groups, who are the same age, play the same position, had identical Marcels last year, and performed identically last year. The only difference between the two groups is that, in one of the groups, every player was traded before last season.

How would you expect the rest of their careers to match up?

This time, if you thought they'd be nearly identical, you'd be very wrong. It turns out that the control group played 60 percent longer than the traded group, and, in addition, was more productive -- by almost three quarters of a run created per 27 outs.


Why does that happen? I'm not sure, but I have some guesses (and full details with all the numbers) in a draft of a followup to my 2004 "Are Traded Players Lemons?" study. The draft of the new study is here (and contains a link to the old study).

I'd appreciate any comments you might have on the new study, and any other hypotheses you might have that I didn't think of.


(P.S. Many thanks to Jeff Sackmann, who published a database of Marcels last month, making the study possible.)


UPDATE: In light of some of the comments, especially from Guy, I've updated the study. Before, I thought there might be a lemons effect; now, I'm not so sure.

Use the same link.


Sunday, September 19, 2010

Does MLB payroll matter less than it used to?

In MLB this year, team payroll barely matters. Money is less important in 2010 than in any season since 1994.

That's according to an article a few days ago in the Wall Street Journal. Unfortunately, I don't think that's right ... or at least, I can't reproduce the result.

The article says,

"According to estimated payroll figures released throughout the season, the correlation between a team's player payroll and its winning percentage is 0.14, a number that makes the relationship almost statistically irrelevant. That figure is 67 percent below last year's mark and is easily the lowest since the strike."

However, when I run the same numbers, I get a correlation of .36. Where does the .14 come from? My best guess is that it's actually the r-squared, since .36 squared equals about .13.
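That guess is easy to check, and the correlation itself is a one-liner. Here's a minimal sketch -- the payroll and record figures below are made up for illustration, not the actual 2010 data:

```python
import numpy as np

# Hypothetical payroll ($MM) and winning-percentage figures, for
# illustration only -- not the actual 2010 numbers.
payroll = np.array([206., 162., 146., 105., 98., 84., 75., 62., 55., 35.])
win_pct = np.array([.586, .549, .512, .543, .475, .506, .469, .414, .525, .400])

r = np.corrcoef(payroll, win_pct)[0, 1]   # Pearson correlation
r_squared_guess = round(0.36 ** 2, 2)     # 0.13 -- right around the WSJ's .14
```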

It's also possible that the author of the article used different salary data than I did -- mine is USA Today data from the beginning of the season. But could that data differ enough to turn a .36 into a .14? I doubt it, especially since USA Today's numbers are very similar to Baseball Reference's.

The 2010 numbers are roughly in line with what I get for 2008 and 2009:

2010: correlation of .36
2009: correlation of .48
2008: correlation of .29

It seems to me that 2010 is fairly normal. BTW, because the 2010 season isn't over yet, it's actually a bit lower now than it will likely wind up -- but only by a point or two.

The actual measures of payroll vs. wins, obtained from the regressions, are also similar:

2010: $ 8.9 MM in payroll associated with one win
2009: $ 6.2 MM in payroll associated with one win
2008: $12.6 MM in payroll associated with one win

These differences might look large, but they're really not, because of the wide confidence intervals around the estimates. For instance, the 2009 estimate has a 95% confidence interval of anywhere between $3.6 MM and $20.4 MM per win. The 2008 estimate is not even significantly different from zero, with a p-value of .085.
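Here's a sketch of how those "$MM per win" figures and their confidence intervals fall out of a simple regression. The payroll and win numbers below are invented for illustration:

```python
import numpy as np

# Regress wins on payroll, then invert the slope to get "$MM per win".
# The figures below are made up, not actual team data.
payroll = np.array([200., 160., 140., 110., 95., 85., 70., 60., 50., 40.])
wins    = np.array([ 95.,  92.,  84.,  88., 78., 80., 75., 70., 74., 65.])

n = len(payroll)
slope, intercept = np.polyfit(payroll, wins, 1)       # wins per $MM
resid = wins - (slope * payroll + intercept)
se = np.sqrt((resid ** 2).sum() / (n - 2)
             / ((payroll - payroll.mean()) ** 2).sum())

dollars_per_win = 1.0 / slope                          # $MM of payroll per win
ci = (1.0 / (slope + 2 * se), 1.0 / (slope - 2 * se))  # rough 95% CI, $MM/win
```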


Also, these results don't mean that a free-agent win actually costs this much ... other studies have shown the correct number is about $4.5 million per win. These numbers are higher because they look at team payrolls overall, and there are ways to get wins other than free agents. Therefore, the connection between salary and wins is looser overall than it would be if everyone were a free agent.

For instance, suppose team A has $50 MM worth of arbs and slaves good enough for 80 wins, while team B has $45 MM worth of arbs and slaves only good for 75 wins.

Team A buys 5 free-agent wins for $25 MM, bringing it to 85 wins. Team B buys 15 free-agent wins for $75 MM, bringing it to 90 wins. Overall, team B spent $45 MM more than team A, but has only 5 more wins to show for it. The regression shows that wins are associated with $9 MM in spending, when in reality free-agent wins cost only $5 MM each.
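Working the example through in code, just to confirm the arithmetic:

```python
# The two-team example above: a team-level comparison sees $9MM per
# marginal win even though every free-agent win cost $5MM.
a_payroll, a_wins = 50 + 25, 80 + 5     # team A: $75MM, 85 wins
b_payroll, b_wins = 45 + 75, 75 + 15    # team B: $120MM, 90 wins

team_level = (b_payroll - a_payroll) / (b_wins - a_wins)  # $45MM / 5 wins
free_agent_cost = 75 / 15               # same as 25 / 5 for team A
```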

There are probably other scenarios that would give you a similar result.


The article says that back in 1998, the correlation between payroll and wins was a huge .71. Yup ... I ran a regression, and got .76 (the difference is probably because I used a different data source than the WSJ). The 1998 list is scary. Of the top 15 teams by payroll, only two finished below .500. And of the bottom 15 teams, only one finished above .500. In 1998, payroll was pretty close to destiny.

But that season may have been an anomaly. The WSJ article has a little graph of the trend, and 1998 was chosen for a mention because it's the highest point on the curve. Still, there's an obvious decline in correlation that takes place around 1999-2000. What might have caused that?

As I've mentioned before, a change in correlation doesn't necessarily mean that there's a real change in the relationship between the variables. If teams suddenly decide to all spend similar amounts, that will cause an apparent drop in correlation even if money is just as important as ever. (For instance, if you drop the seven highest-spending teams from the 2010 regression, as well as the seven lowest-spending teams, the correlation drops from .36 all the way down to .10, but the "dollars per win" value stays roughly the same, moving only from $9MM to $12MM.)
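You can see the same restriction-of-range effect with simulated data -- the underlying dollars-per-win relationship stays put while the correlation shrinks. Everything below is synthetic, just to illustrate the point:

```python
import numpy as np

# Same underlying payroll-wins relationship throughout, but trimming the
# spending extremes shrinks the correlation while leaving the slope
# (dollars per win) roughly intact. All numbers are synthetic.
payroll = np.linspace(40, 200, 30)
noise = np.tile([4.0, -4.0], 15)            # fixed, zero-mean "luck" term
wins = 60 + 0.15 * payroll + noise          # true slope: 0.15 wins per $MM

def fit(p, w):
    return np.corrcoef(p, w)[0, 1], np.polyfit(p, w, 1)[0]

r_full, slope_full = fit(payroll, wins)
r_mid, slope_mid = fit(payroll[7:-7], wins[7:-7])  # drop 7 richest, 7 poorest
```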

But there hasn't really been any payroll compression. In 1998, the SD of payroll was 43 percent of the mean. In 2008, 2009, and 2010, it was 44 percent, 38 percent, and 42 percent, respectively.

So what else could it be?

Was there a change in the labor agreement around then that somehow created more slaves and arbs? That would do it, because, the easier it is for poorer teams to keep cheap players, the easier it is for them to compete with a low payroll.

Or, are slaves and arbs better players now than they were then? Joey Votto will earn only $500,000 this year ... if there are more Vottos than there used to be, scattered around the league, that would weaken the link between payroll and success.

Or, maybe with the crackdown on PEDs, older players are retiring earlier, and so slaves and arbs are getting more playing time? I like this theory, but it doesn't really explain the low correlations during the steroid years of 2000-2003.

Any other ideas?


Sunday, September 12, 2010

A study on how travel affects NBA teams

Something I didn't know: NBA teams score more points in the second half of the season than in the first half.

From the 1990-91 to the 2006-07 seasons, both the home and visiting teams scored almost a point higher later in the season than earlier. Specifically, the average score in the first half (that is, where both teams are playing games 1-41) was 100.0 to 96.5. In the second half, it was 100.9 to 97.4. (The home team's winning percentage was .608 in both cases.)

The difference of 0.9 points is statistically significant, more than 4.7 SDs away from zero. (The SD of home team points in a game was 12.9 in the first half, 12.7 in the second half. There were 9,206 games total in each half.)
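The significance arithmetic is just a difference-of-means test:

```python
import math

# Difference of two means, each taken over 9,206 games, with per-game
# SDs of 12.9 (first half) and 12.7 (second half).
n = 9206
diff = 100.9 - 100.0
se_diff = math.sqrt(12.9 ** 2 / n + 12.7 ** 2 / n)
z = diff / se_diff          # comes out to more than 4.7 SDs from zero
```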

I learned all this from a study in the October, 2010 issue of the "Journal of Sports Economics," which arrived in the mail last week. It's called "Travel Costs in the NBA Production Function," by Andrew W. Nutting.

Of all the findings in the study, I think that one is the most interesting. Anyone have any ideas why this happens, why scoring changes significantly as the season goes on? The study doesn't speculate. As you can tell by the title, its main concern is how travel affects team performance, so it concentrates on that issue.

The study's main regression predicts home team winning percentage based on a bunch of variables, repeated for home and visiting team. There are actually two identical regressions -- one for first half of the season, and one for second half. The variables were:

-- travel distance since last game
-- total travel distance in last 7 days (excluding since last game)
-- total travel distance in previous days 8-14
-- total travel distance in previous days 15-28
-- total travel distance more than 28 days ago (log)
-- days since last game
-- games in last 7 days
-- games 8-14 days ago
-- games 15-28 days ago
-- total games so far this season (log)
-- dummies for the teams involved (one set for home teams, one for road teams).

As you might expect, the fatigue variables were significant, at least in the first half of the season. The more rest since last game, the better the chance of winning. But the visiting team effect was almost three times as large as the home team effect (.029 winning percentage points per rest day for the visitors, as compared to .011 for the home side).

Strangely, the number of games in the last seven days had the reverse effect -- the more recent games you played, the *better* your chances of winning this one. It's statistically significant for the home team, but almost zero for the visiting team.

Why would that be? I think maybe there's confounding with the "rest since last game" variable. Is there any reason to think that four days of rest before this game is twice as good as two days? Probably not. So maybe the "four days rest" estimate is too high. But, if you've had four days of rest, you probably haven't played a lot of games in the last week. And so, the games-played coefficient might come out negative, to offset the inflated "four days rest" estimate.

I bet that would disappear if you used dummies for days of rest, instead of the actual number.
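Here's what that fix would look like, sketched in code -- one dummy column per rest level, so each level gets its own coefficient instead of being forced onto a straight line:

```python
import numpy as np

# Expand a "days of rest" column into one dummy column per rest level.
# The rest_days values here are made up for illustration.
rest_days = np.array([0, 1, 1, 2, 4, 3, 0, 2])
levels = np.unique(rest_days)                          # [0 1 2 3 4]
dummies = (rest_days[:, None] == levels).astype(int)   # one column per level
```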

But ... in the second half of the season, the direction is reversed. The more games in the last week, the *worse* the performance this game. Moreover, the effect now seems to have moved from the home team to the visiting team: the home team's coefficient is not significant, but the visiting team's is (.021).

Another puzzler: in the first half of the season, the more total games already played, the worse the performance -- an effect that's statistically significant at more than 3 SDs. But, again, the effect reverses in the second half! That puzzled me a bit. I suspect it's because the study actually uses the logarithm of games: that means the numerical distance between games 1 and 2 is the same as between games 32 and 64. In the first half of the season, there's a wide variation in log(games) -- from 0 (game 1) to 3.7 (game 41). In the second half, it runs only from 3.7 to 4.4. So if the effect exists but isn't logarithmic at all, that might be causing funny things to happen.
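To see how lopsided the log spacing is, the arithmetic is quick:

```python
import math

# The first half of the season spans almost all of log(games)'s range;
# the second half spans almost none of it.
first_half_span = math.log(41) - math.log(1)     # 0 to ~3.7
second_half_span = math.log(82) - math.log(41)   # ~3.7 to ~4.4, i.e. log(2)
```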

Other than those, there are a few other variables that come up as barely significant in one half, but not the other. This suggests to me that they're random noise -- with 20 variables in each half, you're bound to wind up with some of that.

One such variable is "log of the number of miles flown by the team as of a month ago." You'd think that when a variable comes up significant only in the first half of the season, and only at the 10% level, you'd just dismiss it. Instead, the author writes, "This relationship indicates a very substantial lag of distance travelled in the win production function."


A second regression examines the effects of time zone change on the probability of winning.

However, there's a problem. The study uses a signed value that indicates the direction of travel -- East Coast to West Coast is -3, but West to East is +3. That means things will just wash out.

Suppose jet lag causes bad play whichever way you move. That means that -3 will be bad, and +3 will be bad, and 0 will be good. That creates a U-shaped curve. But the regression is looking for a straight line -- and the best-fit straight line through a U is simply horizontal!

And, indeed, the regression comes up with a near-zero coefficient for time zone change. But we can't tell whether that's because (a) there truly is no effect, or (b) there *is* an effect, but it's the same no matter which way you're travelling.
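A tiny simulation makes the problem concrete. Assume (hypothetically) a symmetric penalty of .02 per time zone crossed, in either direction:

```python
import numpy as np

# If jet lag hurts equally in both directions, the effect is symmetric in
# the signed zone change, and a straight-line fit comes out flat.
# Fitting on |zones| instead recovers it. The .02 penalty is invented.
zones = np.array([-3, -2, -1, 0, 1, 2, 3], dtype=float)
effect = -np.abs(zones) * 0.02          # U-shaped: worst at the extremes

linear_slope = np.polyfit(zones, effect, 1)[0]         # ~0: washes out
abs_slope = np.polyfit(np.abs(zones), effect, 1)[0]    # ~-0.02: recovered
```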


Finally, an apparent time-zone anomaly. When teams from the West play in the Eastern time zone, they get better. But when teams from the East play in a Western time zone, they get worse. This happens in both halves of the season, and both halves generally have the same pattern -- the more time zones difference, the larger the effect. For instance, here are the numbers for visiting teams in the first half:

3 zones west = -0.092
2 zones west = -0.076
1 zone west  = -0.039
1 zone east  = +0.037
2 zones east = +0.072
3 zones east = +0.085

I can't explain that, but, again, it could be some confounding. The regression includes the log of cumulative time-zone changes so far this season, and that might be less than linear (in the sense that twice the travel should give you less than twice the effect). So that variable might overstate the cumulative travel effect for west-coast teams (which travel more), and the "moving east" variable favors them in order to compensate.


My overall impression of the paper is that there are too many similar variables confounding things, and so it's hard to get a true idea of whether or not there's anything real happening here. But I still wonder why teams score more in the second half of the season.


Friday, September 03, 2010

Did the Yankees engineer themselves into unprofitability?

My last post talked about how the New York Yankees, baseball's richest team, are actually losing money, in part because they make large revenue sharing payments to help out the other teams.

In the comments, "Johnmeister" responded by arguing that the Yankees' reported earnings are understated, because they don't include local TV revenue. That's a fair point. In fact, I had forgotten that I had posted about that a few years ago. Back then, I had made the same argument -- that if you include the TV money, now the loss turns into a profit.

However, that may not be right. The New York article on which I based my previous post counts the $60 million as part of Yankees revenues, which implies that the $60 million is already taken into account in computing the team's bottom line. So their $28 million loss in 2006 really *was* a loss.

But, after thinking about the TV situation a bit, I realized: the Yankees really *are* highly profitable, at least in the sense that really matters.

In 2001, the Yankees created YES, and sold it the rights to broadcast games for 15 years. At the same time, they sold 63 percent of the newly-formed network to Goldman Sachs for (as near as I can tell) about $535 million. (The Yankees kept the remaining 37 percent.)

So, basically, they sold about 2/3 of the TV rights to Yankee games. Those broadcast rights are a huge revenue generator, bringing in even more cash than the revenues from ticket sales. Effectively, the Yankees sold off a large chunk of the business.

Now, that sale wouldn't necessarily have to change the profitability of the team. The Yankees could just take the $535 million, invest it in some other business of roughly equal risk, and earn profits that might be about the same as the profits from YES. It's like when you sell one stock in your investment portfolio, and buy a different one. Your overall return should be about the same.

But: what if, instead, owner George Steinbrenner had taken the $535 million and, instead of reinvesting it in the team, paid it to himself as a dividend? That's perfectly legitimate -- it was his team, after all, and he could do what he wanted.

But: taking assets out of the Yankees makes the team less profitable. If the Yankees' 37 percent share of broadcast revenues was worth $60 million, that means the other 63 percent, the portion belonging to Goldman Sachs, was worth about $100 million. With a profit margin of 60 percent of revenues (according to the New York Times), that's a perennial $60 million in profits (as of 2006 -- it's probably more now) that the Yankees sold off.

If you add Goldman Sachs' $60 million back in, you get that the Yankees' baseball operation didn't really lose $28 million: it made a *profit* of $32 million!
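The back-of-the-envelope, spelled out:

```python
# A 37% YES stake worth about $60MM in revenue implies Goldman's 63% was
# about $100MM; at a 60% margin, that's roughly $60MM a year in profit
# the team sold off. All figures are the rough 2006 numbers from the post.
yankees_rev = 60.0                          # $MM, the Yankees' 37% share
goldman_rev = yankees_rev / 0.37 * 0.63     # ~$102MM, Goldman's 63% share
profit_sold = goldman_rev * 0.60            # ~$61MM a year in profit
adjusted_result = -28 + 60                  # reported loss plus ~$60MM back
```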

Indeed, if Steinbrenner had wanted to, he could have made the loss even larger. For instance, he could have sold the remaining 37 percent of YES. He could have sold his concession profits by telling Goldman Sachs (or any other group of investors), "look, I made $5 million in concession profits last year -- make me a one-time payment of $50 million, and you can have those profits yourself, forever." He could even have gotten rid of his entire gate receipts, by telling the world, "if you give me $1.5 billion in cash, you can take over ticket sales and keep every penny." And so on -- securitizing every stream of revenue he could think of: luxury boxes, merchandise profits, and so on.

Then, Steinbrenner would have paid himself a huge dividend of all the money raised.

If he did that, then the Yankees would have a massive loss every year. The team's expenses would have stayed the same, but all its revenues would be gone, sold off to others. The Yankees would still have had to pay all those salaries, and all those stadium expenses, and so on -- but they'd have no baseball revenues to pay them with. The result, year after year, would be a massive loss, maybe $300 million or more.

Steinbrenner would then have had to pay off the Yankee losses out of his own pocket. That wouldn't have been a problem, because he'd have pocketed at least a couple of billion dollars out of the sale of all those revenue streams, which he'd have invested elsewhere.

But the fact is that the Yankees would still be losing big money -- because almost all the assets of the team had been sold off. The *business* of Yankees baseball would still be profitable. But most of that business would no longer belong to the Yankees themselves.

What Steinbrenner did was just a small part of that. He didn't sell his ticket revenues, or his merchandise revenues, or his concession revenues. He just sold 63 percent of his local TV revenues. But that was enough to turn the profit into a loss.

To summarize: when you ask whether the Yankees are making money, you could be asking two different questions:

1. If you take all the Yankees' baseball revenues, and subtract all the Yankees' baseball expenses, is there money left over?

2. If you take all the Yankees' baseball revenues *except for 63% of the TV money, which you've voluntarily sold to someone else*, and subtract all the Yankees' baseball expenses, is there money left over?

Question #1 is the one that makes the most sense, and shows the Yankees are profitable. Question #2 is not particularly relevant, but it seems to be the one the media cares most about, and the answer gives the impression that Yankees Baseball is an unprofitable business.

I'm not sure if George Steinbrenner arranged this just to make it look like the Yankees were money-losers. My guess is that was a side effect, and that the real benefit was that, by selling $100 million in annual revenues for a lump sum, it was sheltered from MLB's 31 percent revenue sharing "tax".

Still, all things considered, I'm sure MLB is not unhappy that the world thinks the Yankees are significantly less profitable than they actually are.
