Sabermetric Research: How dominant should the Yankees be to maximize league revenues?

Every couple of years, you read complaints that World Series TV ratings are going to be very low because the pennant winners are small-market teams. The implication is that Major League Baseball would make a lot more money if it could arrange for big-city teams to win most of the time.

If that's true for the World Series, it's also true for the regular season. If winning lots of regular-season games increases attendance by 10%, then it's better for the league as a whole if it's the Yankees who do the winning. Ten percent of the Yankees' revenues is much, much higher than ten percent of the Expos' revenues.

At some point, though, that arrangement could backfire. If the Yankees are 15 games up by the end of May, fan interest in Baltimore and Toronto might drop so much that even higher attendance in New York won't make up for it. Therefore, there must be some kind of equilibrium, where making the Yankees good, but not *too* good, maximizes the revenues of MLB as a whole.

Finding that equilibrium is the goal of a 1976 study (from the American Economic Review, JSTOR access required) by Joseph W. Hunt Jr. and Kenneth A. Lewis, "Dominance, Recontracting, and the Reserve Clause: Major League Baseball."

Hunt and Lewis run regressions on team-seasons from 1969-73 to predict home and road attendance from a bunch of factors. These factors are:

-- metro population
-- if the team is a division winner, its average lead over the season
-- a weighted average of games behind on August 15 and end of season
-- that same weighted average for last year
-- a dummy variable (2, 1, or 0) for the length of time since the last pennant
-- the number of superstars on the team (45 HR or 25 wins)
-- a bunch of other stuff specific to the team – stadium, ticket price, etc.
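A few of these inputs are mechanical enough to sketch in code. The Python below is a minimal sketch, not the authors' procedure: the superstar rule (45 HR or 25 wins) is from the study as described here, but the equal weighting of the August 15 and end-of-season games-behind figures is an assumption (the actual weights aren't given), as is computing the average lead as a plain mean of hypothetical day-by-day leads.

```python
from dataclasses import dataclass


@dataclass
class Player:
    # Hypothetical season line: only the two stats the superstar rule uses.
    home_runs: int
    wins: int


def count_superstars(roster: list[Player]) -> int:
    """Count players with 45+ home runs or 25+ pitching wins
    ("45 HR or 25 wins", read here as "at least")."""
    return sum(1 for p in roster if p.home_runs >= 45 or p.wins >= 25)


def weighted_games_behind(gb_aug15: float, gb_final: float, w: float = 0.5) -> float:
    """Weighted average of games behind on August 15 and at season's end.

    ASSUMPTION: w, the weight on the August 15 figure, is a placeholder;
    the weights Hunt and Lewis actually used are not given in this post.
    """
    return w * gb_aug15 + (1.0 - w) * gb_final


def average_lead(daily_leads: list[float]) -> float:
    """Average lead over the season for a division winner.

    ASSUMPTION: a plain mean of hypothetical day-by-day leads.
    """
    return sum(daily_leads) / len(daily_leads)


# Usage with made-up players: a 47-HR slugger and a 25-game winner qualify.
print(count_superstars([Player(47, 0), Player(10, 3), Player(0, 25)]))  # 2
```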

Armed with their attendance estimators, they can then compute expected revenues (the sum of revenues from home attendance, road attendance, the post-season, and TV). I won't say much about the results of the regression, because the coefficients come out about as you'd expect, in both direction and magnitude.
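The revenue step is bookkeeping on top of the attendance regression. Here is a minimal Python sketch, assuming a linear attendance equation. The coefficient names and every number in the usage lines are hypothetical placeholders, not the study's estimates, and pricing all gates at one average ticket price with a fixed visitors' share is a simplification.

```python
def predicted_attendance(intercept: float, coefs: dict[str, float],
                         features: dict[str, float]) -> float:
    """Linear regression prediction: intercept + sum(coef * feature)."""
    return intercept + sum(coefs[name] * features[name] for name in coefs)


def team_revenue(home_att: float, road_att: float, ticket_price: float,
                 visitor_share: float, postseason: float, tv: float) -> float:
    """Sum of the four revenue sources: home gate, road gate, post-season, TV.

    ASSUMPTIONS: one average ticket price everywhere, and a fixed
    visitor_share of each gate going to the visiting club.
    """
    home_gate = home_att * ticket_price * (1 - visitor_share)
    road_gate = road_att * ticket_price * visitor_share
    return home_gate + road_gate + postseason + tv


# Usage with made-up numbers (NOT the study's coefficients):
att = predicted_attendance(
    intercept=500_000,
    coefs={"pop_millions": 100_000, "games_behind": -20_000},
    features={"pop_millions": 7.0, "games_behind": 9.0},
)
print(att)  # 1020000.0
```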

But having done all that, the authors now want to figure out exactly where the equilibrium level of dominance is, in terms of winning the division. How often should the team with the largest population base (call it the Yankees) win the AL East, in order to maximize the entire division's total revenues?

To calculate this, the authors have to do some additional work, because their regression doesn't include "chance of winning the division" as a variable. They must therefore figure out the relationship between the probability of winning and those other variables above.

To do that, they run more regressions relating the explanatory variables listed above (average games behind, number of superstars) to the probability of winning the division. Based on those, they work out, by experimenting, what a typical division might look like for a given Yankee probability of winning the division.

Suppose you wanted to construct a division where the Yankees win 30% of the time. The authors have it looking like this:

Team ...... prob ... pop .... GBL .. lead ... flag .. stars
-----------------------------------------------------------
Yankees .... 30% ... 7.0 .... 9.0 ... 2.0 ... 1.67 ... 0.36
Team 2 ..... 25% ... 3.5 ... 16.0 ... 1.5 ... 1.20 ... 0.34
Team 3 ..... 15% ... 3.0 ... 18.0 ... 1.0 ... 0.50 ... 0.32
Team 4 ..... 15% ... 2.5 ... 20.0 ... 1.0 ... 0.50 ... 0.32
Team 5 ..... 10% ... 2.0 ... 22.0 ... 0.5 ... 0.45 ... 0.30
Team 6 ...... 5% ... 1.5 ... 25.0 ... 0.2 ... 0.23 ... 0.23

To read the top row of the chart: for the Yankees to have a long-term 30% chance of winning the division (with a population of 7.0 million), they would average 9.0 games behind the leader, they would lead the division by an average of 2.0 games over the 30% of seasons they win, their pennant-recency dummy would average 1.67 out of a possible 2, and they would have 0.36 of a superstar on their team.
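Treating the table as data makes its internal logic checkable. The Python sketch below, with the table typed in by hand, confirms that the six probabilities sum to 100% and fits a least-squares line of GBL on win probability. This is only a descriptive fit to the authors' constructed division, not a reproduction of the auxiliary regressions they actually ran.

```python
# Rows of the 30% division above: (team, win probability in %, GBL).
division = [
    ("Yankees", 30, 9.0),
    ("Team 2", 25, 16.0),
    ("Team 3", 15, 18.0),
    ("Team 4", 15, 20.0),
    ("Team 5", 10, 22.0),
    ("Team 6", 5, 25.0),
]

# Somebody wins the division every year, so the probabilities must total 100%.
assert sum(prob for _, prob, _ in division) == 100


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


probs = [p for _, p, _ in division]
gbls = [g for _, _, g in division]
slope = ols_slope(probs, gbls)
print(f"{slope:.2f} games behind per point of win probability")  # -0.57 ...
```

On these six rows, each extra percentage point of division-winning probability goes with a bit more than half a game less behind the leader.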

With their model constructed, the authors can go back to their original regressions to see how much revenue each team will now have. Totalling up the revenues of all six teams gives $29,307,095.

That $29.3 million is the total when the Yankees win the division 30% of the time. The maximum revenue actually occurs when the Yankees win 43% of the time. Here's their chart of total division revenue against the Yankees' chance of winning:

20% ... $28.5 million
30% ... $29.3
40% ... $29.4
50% ... $29.3
60% ... $28.9
70% ... $28.5
80% ... $28.0
90% ... $27.2
100% .. $25.1
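The chart also makes the flatness of the revenue curve easy to quantify. This Python sketch uses only the printed (rounded) values: it finds the best grid point, the gap between the 30% and 70% rows, and the peak of a parabola through the three points around the maximum. With the rounded numbers the parabola peaks at exactly 40%, because the 30% and 50% rows are printed identically; the 43% figure presumably comes from unrounded or finer-grained calculations in the paper.

```python
# Total division revenue ($ millions) by Yankee division-win probability (%),
# as printed in the chart above (rounded to $0.1 million).
chart = {20: 28.5, 30: 29.3, 40: 29.4, 50: 29.3, 60: 28.9,
         70: 28.5, 80: 28.0, 90: 27.2, 100: 25.1}

best = max(chart, key=chart.get)
print(best, chart[best])  # 40 29.4

# How much more revenue does the 30% division produce than the 70% one?
gap = chart[30] / chart[70] - 1
print(f"{gap:.1%}")  # 2.8%


def parabola_vertex(x0: float, x1: float, x2: float,
                    y0: float, y1: float, y2: float) -> float:
    """x-coordinate of the vertex of the parabola through three equally
    spaced points (x0, y0), (x1, y1), (x2, y2)."""
    h = x1 - x0
    return x1 + h * (y0 - y2) / (2 * (y0 - 2 * y1 + y2))


peak = parabola_vertex(30, 40, 50, chart[30], chart[40], chart[50])
print(round(peak, 1))  # 40.0
```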

Hunt and Lewis write,

"Over the past thirty years, the level of domination achieved by the New York Yankees has been 50 percent, while that achieved by the Los Angeles/Brooklyn Dodgers has been about 37 percent. The long-run experience is consistent with the predictions implied by the experiments, indicating that the recontracting market [where players are sold from small market teams to the Yankees and Dodgers] operates reasonably well over the long run."

From an economic standpoint, what the authors are saying is this: as predicted by the Coase Theorem, players will go where they can earn the most revenue for a team. Since a superstar makes a lot more money for the Yankees than he would for the Brewers, the Brewers should be selling players to the Yankees so as to maximize the two teams' combined revenues, but only until the Yankees are good enough to win the division 43% of the time. And the study's historical comparison supports the idea that this is what actually happens.
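The Coase logic can be sketched in a few lines. This is a minimal Python illustration with hypothetical dollar values: a sale makes sense whenever a player is worth more to the buyer than to the seller, and any price between the two valuations leaves both clubs ahead.

```python
def sale_makes_sense(value_to_seller: float, value_to_buyer: float) -> bool:
    """A player should move if he's worth more to the buyer than to the seller."""
    return value_to_buyer > value_to_seller


def split_surplus(value_to_seller: float, value_to_buyer: float,
                  price: float) -> tuple[float, float]:
    """Return (seller_gain, buyer_gain) from a sale at `price`.

    Any price strictly between the two valuations makes both gains
    positive; their sum is always value_to_buyer - value_to_seller,
    wherever the price lands.
    """
    seller_gain = price - value_to_seller
    buyer_gain = value_to_buyer - price
    return seller_gain, buyer_gain


# Hypothetical: a star worth $300,000 a year to the Brewers, $800,000 to the Yankees.
seller_gain, buyer_gain = split_surplus(300_000, 800_000, price=500_000)
print(seller_gain, buyer_gain)  # 200000 300000
```

At a $500,000 price, both clubs come out ahead, and the $500,000 total gain is the same at any price in between; the price only decides how it is split.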

But I don't think the technique is strong enough to support the conclusions. Notice that the revenue estimates are very close: the 30% figure is only about 3% higher than the 70% figure. Are the regressions, based as they are on only five seasons of data, reliable enough that we can trust the results to within 3%? I think it's obvious that we cannot.
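One way to see how fragile a 3% margin is: pretend each revenue estimate carries some error and ask how often the 40% row still comes out on top. The sketch below is a toy, assuming independent Gaussian errors with a standard deviation of 2% of each value (an arbitrary choice, not anything estimated from the study). Real errors would be correlated, since every row comes from the same regression coefficients, so treat the output as an illustration of sensitivity rather than an estimate of it.

```python
import random

# Printed chart: total division revenue ($ millions) by Yankee win probability (%).
chart = {20: 28.5, 30: 29.3, 40: 29.4, 50: 29.3, 60: 28.9,
         70: 28.5, 80: 28.0, 90: 27.2, 100: 25.1}


def argmax_frequencies(chart: dict[int, float], rel_error: float,
                       trials: int = 10_000, seed: int = 1) -> dict[int, float]:
    """Perturb each revenue by independent Gaussian noise with standard
    deviation rel_error * value, and return the fraction of trials in
    which each win probability has the highest revenue.

    ASSUMPTION: errors are independent across rows (a simplification).
    """
    rng = random.Random(seed)
    counts = {p: 0 for p in chart}
    for _ in range(trials):
        noisy = {p: v * (1 + rng.gauss(0, rel_error)) for p, v in chart.items()}
        counts[max(noisy, key=noisy.get)] += 1
    return {p: c / trials for p, c in counts.items()}


for p, share in argmax_frequencies(chart, rel_error=0.02).items():
    print(f"{p:3d}%  {share:.1%}")
```

Varying `rel_error` shows how the "winning" row spreads across neighbouring probabilities as the assumed error grows.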

And some of the regression variables seem arbitrary. For instance, the pennant dummy is set to 2 if the team won a pennant within the past four years, and 1 if it won within the past nine years. Is that really what we'd expect? What if teams that haven't won in a long time (say, the Cubs and Red Sox) draw fans *because* of that? And why should the *average* number of games ahead matter? Won't a close race attract more fans than a runaway championship?
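Written out as code, the pennant variable looks like this. This is a minimal Python sketch of the rule as described above; the convention that `years_since_pennant = 1` means last season, and that `None` means the team has never won, is an assumption, and the study's exact boundary handling may differ.

```python
from typing import Optional


def pennant_dummy(years_since_pennant: Optional[int]) -> int:
    """Pennant-recency variable: 2 if the team won a pennant in the past
    four years, 1 if within the past nine, else 0.

    ASSUMPTION: 1 = won last season; None = never won.
    """
    if years_since_pennant is None:
        return 0
    if years_since_pennant <= 4:
        return 2
    if years_since_pennant <= 9:
        return 1
    return 0


print(pennant_dummy(10), pennant_dummy(60))  # 0 0
```

Written out this way, the objection is easy to see: a team ten years removed from a pennant scores exactly the same as one sixty years removed, so a long-suffering-fans effect like the one suggested for the Cubs and Red Sox can't show up in this variable at all.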

Generally, my gut says that if you changed the variables used and the experiment even a little bit, you'd get substantially different results.

My feeling is that the method behind this paper is reasonable, and if you had perfect data, it might work. But I'd bet that you'd need so much data, and there are so many unknown variables driving attendance and revenues, that even doing the best you can with the best information available, as Hunt and Lewis did, just isn't going to be enough.

Sunday, January 07, 2007
