Sabermetric Research: February 2011

Saturday, February 26, 2011

Why is there no home-court advantage in foul shooting?

There's a home-site advantage in every sport.

Why is that? Nobody knows. One hypothesis is that it's officials favoring the home team. One piece of data that appears to support that hypothesis is that when you look at situations that don't involve referee decisions, the home field advantage (HFA) tends to disappear. In "Scorecasting," for instance, the authors report that, in the NBA, the overall home and road free-throw percentages are an identical .759. Also, in the NHL, shootout results seem to be the same for home and road teams, and likewise for penalty kick results in soccer.

However, there's a good reason for the results to look close to identical even if HFA is caused by something completely unrelated to refereeing.

The reason is that free-throw shooting involves only one player. At the simplest level, you could argue that foul shooting is offense. All other points scored in basketball are a combination of offense and defense. Not only is the offense playing at home, but the defense is playing on the road, which, in a sense, "doubles" the advantage. Therefore, if the home free-throw shooting advantage is X, the home field-goal shooting advantage should be at least 2X.

That's an oversimplification. A better way to think about it is that a foul shot attempt is the work of one player. A field goal attempt, on the other hand, is the end result of the efforts of *ten* players. Not every player is directly involved in the eventual shot attempt, but every player has the potential to be. A missed defensive assignment could lead to an easy two points, and the offense will take advantage regardless of which of the five defensive players is at fault. The same for offense: if a player beats his man and gets open, he's much more likely to be the one who gets the shot away. The weakest or strongest link could be any one of the ten players on the court.

So it might be better to guess that the HFA for a possession is 10X, rather than just X. We can't say that for sure -- it could be that the things a player has to do on a normal possession are so much more complex than a free throw, that the correct number is 20X. Or it could be that a normal possession is less complex than a free throw, so perhaps 5X is better. I don't know the answer to this, but 10X seems like a reasonable first approximation.

------

What would the actual numbers look like?

The home court advantage in basketball is about three points. That means that instead of (say) 100-100, the average game winds up 101.5 to 98.5.

Three points per game, divided by 10 players, is 0.3 points per game per player. Over (say) 200 possessions, that's 0.0015 points per possession per player.

If home-court advantage were made up only of serious mistakes, mistakes that turn a normal 50 percent possession into a 100 percent or zero percent possession, then that works out to exactly one point per mistake. In that case, the average player would make one such extra mistake every 667 possessions. That's a little less than one every three games. If you assume that a mistake is worth only half a point, then it's one mistake per player for every 333 team possessions.
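The arithmetic in the last few paragraphs can be checked with a few lines of Python. This is a minimal sketch using the post's own round-number inputs (a three-point HFA, ten players, about 200 combined possessions per game); the constant and function names are just illustrative.

```python
# Back-of-the-envelope HFA arithmetic, using the post's round-number assumptions.
HFA_POINTS_PER_GAME = 3.0    # home margin of about three points
PLAYERS_ON_COURT = 10        # five per side, any of whom can affect a possession
POSSESSIONS_PER_GAME = 200   # both teams combined ("say" 200)

# Share of the home advantage attributed to one player, per game and per possession.
per_player_per_game = HFA_POINTS_PER_GAME / PLAYERS_ON_COURT
per_player_per_possession = per_player_per_game / POSSESSIONS_PER_GAME


def possessions_per_mistake(points_per_mistake: float) -> float:
    """Team possessions between one player's extra 'mistakes' of a given size."""
    return points_per_mistake / per_player_per_possession


if __name__ == "__main__":
    print(f"points per player per possession: {per_player_per_possession:.4f}")
    for size in (1.0, 0.5):
        n = possessions_per_mistake(size)
        games = n / POSSESSIONS_PER_GAME
        print(f"{size}-point mistakes: one every {n:.0f} possessions ({games:.2f} games)")
```

Running it gives one extra full-point mistake per player every 667 possessions (a bit over three games), and one half-point mistake every 333.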

In reality, of course, it's probably not nearly as granular as "mistakes" or "good plays". It's probably something more like this: on any given possession, a player's effectiveness varies randomly around an overall average of 50 "effectiveness units." But that overall figure is an average of home, where he averages 51 units, and road, where he averages 49.

Still, that doesn't matter to the argument: the important thing is HFA is one point per player for every 667 total team possessions, regardless of how it manifests itself.

------

Now, let's go back to free throws. I'm going to assume that a player's HFA on a single possession should be about the same as a player's HFA on a single free throw. Is that OK? It's a big assumption. I don't have any formal justification for it, but it doesn't seem unreasonable. I'd have to admit, though, that there are a lot of alternative assumptions that also wouldn't seem unreasonable.

But the point of this post is that it is NOT reasonable to assume that a player's HFA on a free throw should be the same as the overall HFA for an entire game. That wouldn't make any sense at all. That would be like seeing that the average team wins 50 percent of games, and therefore expecting that the average team should win 50 percent of championships. It would be like seeing that the Cavaliers are winning 17 percent of their games, and expecting that they score 17 percent of the total points.

In any case, the overall argument stays the same even if you argue that the HFA on a single possession should be twice that of a single free throw, or half, or three times. But I'll proceed anyway with the assumption that it's one time.

If the HFA on a free throw is the same 0.0015 points per player as on a possession, then you'd expect the difference between home and road free-throw percentages to be 0.15 percentage points (.0015). Instead of the observed .759 home and road, it should be something like .75975 home, and .75825 road.

Why don't we see this? Well, here's one possible explanation. Visiting teams are behind more often, so will commit more deliberate fouls late in the game. They will try to foul home players who are worse foul shooters. Therefore, the pool of home foul shooters is worse than the pool of road foul shooters, which is why it looks like there's no home field advantage in foul shooting.

Since we're talking about such a very small HFA in the first place, this doesn't seem like an unreasonable explanation. It would be interesting to run the numbers, but controlling for who the shooter is. I suspect if you have enough data, you'd spot a very small home-court advantage in foul shooting.
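Here is a toy illustration of that explanation, in Python. It's a sketch under made-up assumptions: two kinds of shooters (an .800 shooter and a .600 shooter), a per-shot home boost of 0.00075 (half of the .0015 home/road gap above), and attempt counts chosen so that the home team sends its weaker shooters to the line slightly more often. None of these numbers come from real data.

```python
HFA = 0.00075  # assumed per-shot boost at home (and equal penalty on the road)

# shooter type -> (neutral-site FT%, home attempts, road attempts); all hypothetical
SHOOTERS = {
    "good": (0.80, 59_625, 60_375),
    "poor": (0.60, 40_375, 39_625),  # fouled a bit more often at home
}


def pooled_pct(site: str) -> float:
    """Aggregate FT% across all shooters at 'home' or 'road'."""
    made = attempts = 0.0
    for pct, home_fta, road_fta in SHOOTERS.values():
        fta = home_fta if site == "home" else road_fta
        boost = HFA if site == "home" else -HFA
        made += (pct + boost) * fta
        attempts += fta
    return made / attempts


def per_shooter_gap() -> dict:
    """Home minus road FT% for each shooter type, holding the shooter fixed."""
    return {name: (pct + HFA) - (pct - HFA) for name, (pct, _, _) in SHOOTERS.items()}


if __name__ == "__main__":
    print(f"pooled home: {pooled_pct('home'):.5f}")
    print(f"pooled road: {pooled_pct('road'):.5f}")
    for name, gap in per_shooter_gap().items():
        print(f"{name} shooters, home minus road: {gap:.5f}")
```

With these inputs, each type of shooter is .0015 better at home, yet the pooled home and road percentages both come out to .720. Controlling for the shooter is what exposes the hidden advantage.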

Labels: basketball, free throws, home field advantage, NBA

Monday, February 21, 2011

"Scorecasting" reviews

Coincidentally, Chris Jaffe and I both have reviews of "Scorecasting" out today. Here's Chris, at The Hardball Times. Here's me, at Baseball Prospectus.

Labels: Scorecasting

Thursday, February 17, 2011

Two issues of "By the Numbers" available

Two new issues of SABR's "By the Numbers" are now available at my website. One came out today, the other two weeks ago.

The issues are pretty thin, due to low submission volume. I hope to get more aggressive in asking online authors to allow us to reprint.

Labels: baseball, SABR

Saturday, February 12, 2011

Bleg: Know any good referee studies?

I've been invited to this year's MIT Sloan Sports Analytics Conference, to participate in the "Referee Analytics" panel. I guess if I'm going to be talking about refereeing, I should try to get up to date on some of the research that's been going on.

So, a bleg: could you guys refer me, either in the comments or online, to what you think is important research on refereeing/umpiring in any sport? Much appreciated.

Oh, and speaking of umpiring ... a couple of years back, I had a nine-post analysis of the study about umpires and racial discrimination. Recently, I distilled those posts into an article that ran in the Fall, 2010 issue of SABR's "Baseball Research Journal."

Here's a .PDF of that article. I recommend it over my original posts ... back then, I was trying to figure it out as I was going along. This article is a distillation of the analysis, and what I actually concluded. (If you do want the original posts, they're linked at my website.)

And, by the way, SABR has an archive of past BRJ articles. It's a pretty good resource. There's at least one Bill James piece, for instance.

Labels: race, referee bias, umpires

Wednesday, February 09, 2011

Packers win, casinos barely profit

It turns out that Las Vegas casinos didn't make a whole lot of money on this year's Super Bowl. According to the State of Nevada Gaming Board (.pdf), the sports books' profit was a mere 0.83 percent of the $87,491,098 wagered on the game. That's a total profit of only $724,176.

For the last ten years, the average profit was about 8 percent, or about 10 times higher.

If you make the assumption that all bettors need to wager $6 to win $5, then, on average, the casino keeps $1 of every $12 bet, which is 8.3%. That's not too far off from the actual amounts for the last decade, so let's assume that's the case, and see where it leads us.

Making that assumption and doing a bit of algebra, I get that the relationship between the amount of profit and the percentage bet on the winning team is

Percent Winners = 6 * (1 - profit percentage) / 11

Plugging 0.83% into the equation gives that 54.09% of bettors won their bets on Sunday.
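Here's that algebra as a small Python function. It's a sketch that assumes every bet is made at the same odds (risk `risk` to win `win`) and that the reported profit percentage is the books' net take on the whole handle; the function name and parameters are mine, for illustration.

```python
def winner_share(profit_pct: float, risk: float = 6.0, win: float = 5.0) -> float:
    """Estimated fraction of the handle wagered on the winning side.

    Assumes every bet risks `risk` to win `win`. A winning bet returns
    risk + win for each `risk` staked, so the book's profit as a fraction
    of the handle is
        profit_pct = 1 - winners * (risk + win) / risk
    and solving for `winners` gives the expression returned below.
    """
    return risk * (1.0 - profit_pct) / (risk + win)


if __name__ == "__main__":
    # Profit percentages from the Nevada figures quoted in the post.
    for year, profit in [(2007, 0.139), (2005, 0.170), (2003, 0.073)]:
        at_5_6 = winner_share(profit)                         # $6 to win $5
        at_10_11 = winner_share(profit, risk=11.0, win=10.0)  # $11 to win $10
        print(f"{year}: {at_5_6:.2%} at 5:6, {at_10_11:.2%} at 10:11")
```

`winner_share(0.0083)` gives the 54.09% figure for 2011, and `winner_share(1/12)` gives exactly 50%, the break-even case where the book keeps $1 of every $12.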

Here are the numbers for all 10 years:

2011: 54.09% winners, 0.83% profit
2010: 50.01% winners, 8.30% profit
2009: 50.07% winners, 8.20% profit
2008: 56.07% winners, 2.90% loss
2007: 46.96% winners, 13.9% profit
2006: 49.47% winners, 9.30% profit
2005: 45.27% winners, 17.0% profit
2004: 46.20% winners, 15.3% profit
2003: 50.56% winners, 7.30% profit
2002: 52.74% winners, 3.30% profit

The highest proportion of winners was 56.07 percent, in 2008 (when the Giants beat the undefeated Patriots). That's pretty high. Put into baseball team terms (the way I think all percentages should be expressed), it's almost 91-71.

However, we need to take those percentages with a grain of salt, for a couple of reasons.

First, we assumed all bettors are betting $6 to win $5. That's not necessarily true. Big bettors probably get better odds than that. And some proposition bets probably pay worse odds than $5 to $6. Without knowing the expected percentage the house takes, the percentages of winners in the table are only rough estimates.

If you instead assume that bettors get a better deal than 5:6, the estimated percentage of winners drops in every case.

UPDATE: here are the numbers assuming 10:11 odds:

2011: 51.94% winners, 0.83% profit
2010: 48.03% winners, 8.30% profit
2009: 48.09% winners, 8.20% profit
2008: 53.84% winners, 2.90% loss
2007: 45.10% winners, 13.9% profit
2006: 47.51% winners, 9.30% profit
2005: 43.48% winners, 17.0% profit
2004: 44.37% winners, 15.3% profit
2003: 48.56% winners, 7.30% profit
2002: 50.66% winners, 3.30% profit

Second: the usual assumption is that the casinos want to eliminate risk by having the same amount of money on both sides of the bet. That way, the bookies are certain to win a fixed amount: no matter what happens, they pay the winners with $5 of the losers' money, and keep the remaining $1.

So, one naive assumption is that the sports books weren't that great at predicting how bettors would behave. They obviously had the spread too low if 54% of bettors successfully picked the Packers -- and they nearly lost money because of it.

That might be true: the original line favored the Packers by 2.5 points. In response to the Packers attracting too much action, the bookmakers could have moved the spread to -3. But, because so many games are won by a field goal, 3 points might have been too big a gap from 2.5, and the pendulum might have swung too far towards the Steelers.

However, it's also possible that the bookmakers have an excellent idea of the "true" odds, and are willing to take on some additional risk if it's in their favor. For instance, suppose the casinos realized that the chance the Packers would beat the spread was only 45 percent. In that case, they might have been happy to take a bit more action on the Packers. They assumed a bit more risk for that game, and it cost them -- but, over time, they make more money on average by going with the odds.

So, we can't really draw any detailed conclusions, because of our assumptions. However, we CAN say that:

1. If all bets were taken at 5:6 odds, then about 54% of "pick 'em" bets on the Super Bowl were winners last Sunday.

2. Regardless, there was much more winning than usual on Sunday, enough to almost wipe out the oddsmakers' profits.

I know there are lots of gambling experts out there who might have enough information to explain what really did happen on Sunday (and correct any bad assumptions I may have made). Anyone?

Hat Tip: The Sports Economist

Labels: football, gambling, NFL

Friday, February 04, 2011

"Scorecasting" on players gunning for .300

A few months ago, I wrote about a study by two psychology researchers, Devin Pope and Uri Simonsohn. The study found that players hitting .299 going into their last at-bat of the season hit well over .400 in that at-bat. The authors concluded that it's because .299 hitters really want to get to .300, and therefore try extra hard (and succeed).

But, really, that isn't the case. It's really just an illusion caused by selective sampling. When a player hitting .299 gets a hit to push him over .300, he is much more likely to be taken out (or held out) of the lineup, to preserve the .300. Therefore, it's not that they're more likely to get a hit in their last at-bat -- it's that their last at-bat is more likely to be one that results in a hit.

(For an analogy: when a game ends with fewer than three outs in the final inning, the last batter probably hits well over .500, since the winning run must have scored on the play. But that's not because the player rises to the situation; it's because, as it were, the situation rises to the player. When he gets a hit, he's the last batter because the game ends. When he doesn't, he's not the last batter.)

Since the original study and article, the authors have modified their paper a bit, saying that the batting average effect is "likely to be at least partially explained" by selective sampling. However, the data given in the previous posts does suggest that almost the *entire* effect is explained by selective sampling. (PDFs: Old paper; new paper.)

There is one part of the study's findings that's probably partially real, and that's the issue of walks. None of the .299 hitters walked in their last at-bat. That's partially selective sampling -- if they walked, they're still at .299, and stayed in the game, so it's not their last at-bat -- but probably partially real, in that .299 hitters were more likely to swing away.

(My results are in previous posts here and here.)

------

The study is given featured status in "Scorecasting," in the chapter on round numbers. However, while the authors of the original paper mention the selective sampling issue, the authors of "Scorecasting" do not:

"What's more surprising is that when these .299 hitters swing away, they are remarkably successful. According to Pope and Simonsohn, in that final at-bat of the season, .299 hitters have hit almost .430. ... (Why, you might ask, don't *all* batters employ the same strategy of swinging wildly? ... if every batter swung away liberally throughout the season, pitchers would probably adjust accordingly and change their strategy to throw nothing but unhittable junk.) ...

"Another way to achieve a season-ending average of .300 is to hit the goal and then preserve it. Sure enough, players hitting .300 on the season's last day are much more likely to take the day off than are players hitting .299."

"Scorecasting" treats these two paragraphs as two separate effects. In reality, the second causes the first.

You can read an excerpt -- almost the entire thing, actually -- at Deadspin, here.

------

One thing that interested me in the chapter was this:

"But no benchmark is more sacred than hitting .300 in a season. It's the line of demarcation between All-Stars and also-rans. It's often the first statistic cited when making a case for or against a position player in arbitration. Not surprisingly, it carries huge financial value. By our calculations, the difference between two otherwise comparable players, one hitting .299 and the other .300, can be as high as two percent of salary, or, given the average major league salary, $130,000."

The authors don't say how they calculated that, but it seems reasonable. A free-agent win is worth $4.5 million, according to Tom Tango and others. That means a run is worth $450,000. One point of batting average, in 500 AB, is turning half an out into half a hit. Assuming the average hit is worth about 0.6 runs and an out is worth negative 0.25 runs, that means the single point of batting average is worth a bit over 0.4 runs. That's close to $200,000.
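The same estimate in Python, as a sketch: the $4.5 million per win, the 10 runs per win it implies, and the +0.6 / -0.25 run values are the assumptions stated above, not league-derived constants.

```python
DOLLARS_PER_WIN = 4_500_000   # free-agent price of a win (Tango and others)
RUNS_PER_WIN = 10             # implied by $450,000 per run
AT_BATS = 500
HIT_VALUE = 0.60              # assumed average run value of a hit
OUT_VALUE = -0.25             # assumed average run value of an out


def value_of_ba_points(points: float = 1.0, at_bats: int = AT_BATS) -> tuple:
    """Return (runs, dollars) for raising batting average by `points` thousandths."""
    hits_gained = points / 1000 * at_bats          # outs converted into hits
    runs = hits_gained * (HIT_VALUE - OUT_VALUE)   # each swap gains hit value minus out value
    dollars = runs * DOLLARS_PER_WIN / RUNS_PER_WIN
    return runs, dollars


if __name__ == "__main__":
    runs, dollars = value_of_ba_points()
    print(f"{runs:.3f} runs, about ${dollars:,.0f}")
```

This prints 0.425 runs, about $191,250: "close to $200,000."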

That figure is higher than the authors' figure of $130,000. The difference is probably just that the authors used the average MLB salary, which includes players not yet free agents (arbs and slaves). However, they imply that the difference between .299 and .300 is worth more than other one-point differences. That might be true, but it would be nice to know how they figured it out and what they found.

------

Finally, two bloggers weigh in. Tom Scocca, at Slate, criticizes the original study. Then, Christopher Shea, at the Wall Street Journal, criticizes Scocca.

Labels: .300 hitters, baseball, psychology, Scorecasting

Tuesday, February 01, 2011

Scorecasting: are the Cubs unlucky, or is it management's fault?

The Chicago Cubs, it has been noted, have not been a particularly huge success on the field in the past few decades. Is Cubs' management to blame? The last chapter of the recent book "Scorecasting" says it's true. I'm not so sure.

The authors, Tobias J. Moskowitz and L. Jon Wertheim, set out to debunk the idea that the Cubs' lack of success -- they haven't won a World Series since 1908 -- is simply due to luck.

How do they check that? How do they try to estimate the effects of luck on the Cubbies? Not the way sabermetricians would. Instead, the authors ... well, I'm not really sure what they did, but I can guess. Here's how they start:

"Another way to measure luck is to see how much of a team's success or failure can't be explained. For example, take a look at how the team performed on the field and whether, based on its performance, it won fewer games than it should have."

So far, so good. There is an established way to look at certain aspects of luck. You can look at the team's Pythagorean projection, which estimates its won-lost record from its runs scored and runs allowed. If it beat its projection, it was probably lucky.
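For readers who haven't seen it, here's a minimal Python sketch of the Pythagorean check. It uses Bill James's original exponent of 2 (many analysts prefer something closer to 1.83), and the team in the example is hypothetical.

```python
def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins from runs scored and allowed (Pythagorean projection)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def pythagorean_luck(actual_wins: float, runs_scored: float, runs_allowed: float,
                     games: int = 162, exponent: float = 2.0) -> float:
    """Positive means the team won more games than its runs 'deserved' (lucky)."""
    return actual_wins - pythagorean_wins(runs_scored, runs_allowed, games, exponent)


if __name__ == "__main__":
    # Hypothetical team: scores exactly as many runs as it allows, but goes 76-86.
    print(pythagorean_wins(750, 750))      # an 81-81 team on paper
    print(pythagorean_luck(76, 750, 750))  # five games unlucky
```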

You can also compute its Runs Created estimate. The Runs Created formula takes a team's batting line, and projects the number of runs it should have scored. If the Cubbies scored more runs than their projection, they were somewhat lucky. If they scored fewer, they were somewhat unlucky.
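Runs Created comes in several versions; the Python sketch below uses the simplest one, (H + BB) x TB / (AB + BB). The batting line in the example is invented for illustration.

```python
def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Bill James's basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def run_scoring_luck(actual_runs: float, h: int, bb: int, tb: int, ab: int) -> float:
    """Actual runs minus Runs Created; positive suggests the offense was lucky."""
    return actual_runs - runs_created_basic(h, bb, tb, ab)


if __name__ == "__main__":
    # Hypothetical team batting line.
    rc = runs_created_basic(h=1500, bb=500, tb=2300, ab=5500)
    luck = run_scoring_luck(740, 1500, 500, 2300, 5500)
    print(f"Runs Created: {rc:.1f}")
    print(f"Luck if they actually scored 740: {luck:+.1f} runs")
```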

But that doesn't seem to be what the authors do. At least, it doesn't seem to follow from their description. They continue:

"If you were told that your team led the league in hitting, home runs, runs scored, pitching, and fielding percentage, you'd assume your team won a lot more games than it lost. If it did not, you'd be within your rights to consider it unlucky."

Well, yes and no. Those criteria are not independent. If I were told that my team scored a certain number of runs, I wouldn't care whether it also led the league in home runs, would I? A run is a run, whether it came from leading the league in home runs, or leading the league in "hitting" (by which my best guess is that the authors meant batting average).

The authors do the same thing in the very same paragraph:

"How, for instance, did the 1982 Detroit Tigers finish fourth in their division, winning only 83 games and losing 79, despite placing eighth in the Majors in runs scored that season, seventh in team batting average, fourth in home runs, tenth in runs against, ninth in ERA, fifth in hits allowed, eighth in strikeouts against, and fourth in fewest errors?"

Again, if you know runs scored and runs against, why would you need anything else? Do they really think that if your pitchers give up four runs while striking out a lot of batters, you're more likely to win than if your pitchers give up four runs while striking out fewer batters?

(As an aside, just to answer the authors' question: The 1982 Tigers underperformed their Pythagorean estimate by 3 games. They underperformed their Runs Created by 2 games. But their opponents underperformed their own Runs Created estimate by 1 game. Combining these three measures shows the '82 Tigers finished four games worse than they "should have".)

Now, we get to the point where I don't really understand their methodology:

"Historically, for the average MLB team, its on-the-field statistics would predict its winning percentage year to year with 93 percent accuracy."

What does that mean? I'm not sure. My initial impression is that they ran a regression to predict winning percentage based on that bunch of stats above (although if they included runs scored and runs allowed, the other variables in the regression should be almost completely superfluous, but never mind). My guess is that's what they did, and they got a correlation coefficient of .93 ... or perhaps an r-squared of .93. But that's not how they explain it:

"That is, if you were to look only at a team's on-the-field numbers each season and rank it based on those numbers, 93 percent of the time you would get the same ranking as if you ranked it based on wins and losses."

Huh? That can't be right. If you were to take the last 100 years of the Cubs, and run a projection for each year, the probability that you'd get *exactly the same ranking* for the projection and the actual would be almost zero. Consider, for instance, 1996, when the Cubs outscored their opponents by a run, and nonetheless wound up 76-86. And now consider 1993, when the Cubs were outscored by a run, and wound up 84-78. There's no way any projection system would "know" to rank 1993 eight games ahead of 1996, and so there's no way the rankings would be the same. The probability of getting the same ranking, then, would be zero percent, not 93 percent.

What I think is happening is that they're really talking about a correlation of .93, and this "93 percent of the time you would get the same ranking" is just an oversimplification in explaining what the correlation means. I might be wrong about that, but that's how I'm going to proceed, because that seems the most plausible explanation.

So, now, from there, how do the authors get to the conclusion that the Cubs weren't unlucky? What I think they did is to run the same regression, but for Cub seasons only. And they got 94 percent instead of 93 percent. And so, they say,

"The Cubs' record can be just as easily explained as those of the majority of teams in baseball. ... Here you could argue that the Cubs are actually less unlucky than the average team in baseball."

What they're saying is, since the regression works just as well for the Cubs as any other team, they couldn't have been unlucky.

But that just doesn't follow. At least, if my guess is correct that they used regression. I think the authors are incorrect about what it means to be lucky and how that relates to the correlation.

The correlation in the data suggests the extent to which the data linearly "explain" the year-to-year differences in winning percentage. But the regression doesn't distinguish luck from other explanations. If the Cubs are consistently lucky, or consistently unlucky, the regression will include that in the correlation.

Suppose I try to guess whether a coin will land heads or tails. And I'm right about half the time. I might run a bunch of trials, and the results might look like this:

1000 trials, 550 correct
200 trials, 90 correct
1600 trials, 790 correct
100 trials, 40 correct

If I run a regression on these numbers, I'm going to get a pretty high correlation -- .9968, to be more precise.

But now, suppose I'm really lucky. In fact, I'm consistently lucky. And, as a result, I do 10 percent better on every trial:

1000 trials, 605 correct
200 trials, 99 correct
1600 trials, 869 correct
100 trials, 44 correct

What happens now? If I run the same regression (try it, if you want), I will get *exactly the same correlation*. Why? Because it's just as easy to predict the number of successes as before. I just do what I did before, and add 10%. It's not the correlation that changes -- it's the regression equation. Instead of predicting that I get about 50% right, the equation will just predict that I get about 55% right. The fact that I was lucky, consistently lucky, doesn't change the r or the r-squared.
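Here's the "try it" in Python, as a sketch. The Pearson correlation and least-squares slope are computed by hand so the example runs on any Python 3; the data are the made-up coin-guessing trials above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


trials = [1000, 200, 1600, 100]
correct = [550, 90, 790, 40]
lucky = [605, 99, 869, 44]  # the same trials, 10 percent better each time

if __name__ == "__main__":
    print(f"r, ordinary: {pearson(trials, correct):.4f}")
    print(f"r, lucky:    {pearson(trials, lucky):.4f}")
    print(f"slope, ordinary: {slope(trials, correct):.4f}")
    print(f"slope, lucky:    {slope(trials, lucky):.4f}")
```

Both correlations print as .9968; only the slope changes, by exactly the 10 percent luck factor.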

The same thing will happen in the Cubs case. Suppose the Cubs are lucky, on average, by 1 win per season. The regression will "see" that, and simply adjust the equation to somehow predict an extra win per season. It'll probably change all the coefficients slightly so that the end result is one extra win. Maybe if the Cubs are lucky, and a single "should be" worth 0.046 wins, the regression will come up with a value of 0.047 instead, to reflect the fact that, all other things being equal, the Cubs' run total is a little higher than for other teams. Or something like that.

Regardless, that won't affect the correlation much at all. Whether the Cubs were a bit lucky, a bit unlucky, about average in luck, or even the luckiest or unluckiest team in baseball history, the correlation might come out higher than .93, less than .93, or the same as .93.

So, what, then, does the difference between the Cubs' .94, and the rest of the league's .93, tell us? It might be telling us about the *variance* of the Cubs' luck, not the mean. If the Cubs hit the same way one year as the next, but one year they win 76 games and another they win 84 games ... THAT will reduce the correlation, because it will turn out that the same batting line can't pinpoint the number of wins very accurately.

If you must draw a conclusion from the regression in the book -- which I am reluctant to do, but if you must -- it should be only that the Cubs' luck is very slightly *more consistent* than other teams' luck. But it will *not* tell you if the Cubs' overall luck is good, bad, or indifferent.

------

So, have the Cubs been lucky, or not? The book's study doesn't tell us. But we can just look at the Cubs' Pythagorean projections, and runs created projections. Actually, a few years ago, I did that, and I also created a method to try to quantify a "career year" effect, to tell if the team's players underperformed or overachieved for that season, based on the players' surrounding seasons. (For instance, Dave Stieb's 1986 was marked as an unlucky year, and Brady Anderson's 1996 a lucky year, because both look out of place in the context of the players' careers.)

My study gave a total of a team's luck based on five factors:

-- did it win more or fewer games than expected by its runs scored and allowed?
-- did it score more or fewer runs than expected by its batting line?
-- did its opponents score more or fewer runs than expected by their batting line?
-- did its hitters have over- or underachieving years?
-- did its pitchers have over- or underachieving years?
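The way the five factors combine into a season total can be sketched in Python. This is an illustration, not the actual spreadsheet: the class and field names are hypothetical, and each factor is assumed to be already expressed in wins, with positive numbers meaning good luck (so opponents falling short of their batting line counts as good luck for the team). The example uses the three 1982 Tigers figures from earlier in the post, with the two player factors set to zero only because the post doesn't give them.

```python
from dataclasses import dataclass


@dataclass
class SeasonLuck:
    """Luck components for one team-season, in wins (positive = lucky)."""
    pythagorean: float  # actual wins minus wins expected from runs scored/allowed
    own_offense: float  # runs scored vs. own batting line, converted to wins
    opponents: float    # opponents' runs vs. their batting line (fewer = positive)
    hitters: float      # hitters' over/underachievement vs. surrounding seasons
    pitchers: float     # pitchers' over/underachievement vs. surrounding seasons

    def total(self) -> float:
        return (self.pythagorean + self.own_offense + self.opponents
                + self.hitters + self.pitchers)


if __name__ == "__main__":
    tigers_1982 = SeasonLuck(pythagorean=-3, own_offense=-2, opponents=+1,
                             hitters=0, pitchers=0)
    print(tigers_1982.total())  # -4: four games worse than they "should have" been
```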

(Here's a PowerPoint presentation explaining the method, and here's a .ZIP file with full team and player data.)

The results: from 1960 to 2001, the Cubs were indeed unlucky ... by an average of slightly over half a win per season. That half win consisted of about 1.5 wins of unlucky underperformance by their players, offset by about one win of good luck in turning that performance into wins.

But the Cubs never really had seasons in that timespan in which bad luck cost them a pennant or division title. The closest were 1970 and 1971, when in both years they finished about five games unluckier than they should have (they would have challenged for the pennant in 1970 with 89 wins, but not in 1971 with 88 wins). Mostly, when they were unlucky, they were a mediocre team that bad-lucked their way into the basement. In 1962 and 1966, they lost 103 games, but, with normal luck, would have lost only 85 and 89, respectively.

However, when the Cubs had *good* luck, it was at opportune times. In 1984, they won 96 games and the NL East, despite being only an 80-82 team on paper. And they did it again in 1989, winning 93 games instead of the expected 77.

On balance, I'd say that the Cubs were lucky rather than unlucky. They won two divisions because of luck, but never really lost one because of luck. Even if you want to consider that they lost half a title in 1970, that still doesn't come close to compensating for 1984 and 1989.

------

But things change once you get past 2001. It's not in the spreadsheet I linked to, but I later ran the same analysis for 2002 to 2007, at Chris Jaffe's request for his book. And, in recent years, the Cubs have indeed been unlucky:

2002: 67-95, "should have been" 86-76 (19 games unlucky)
2003: 88-74, "should have been" 86-76 (2 games lucky)
2004: 89-73, "should have been" 90-72 (1 game unlucky)
2005: 79-83, "should have been" 86-76 (6 games unlucky)
2006: 66-96, "should have been" 82-80 (16 games unlucky)
2007: 85-77, "should have been" 88-74 (3 games unlucky)

That's 42 games of bad luck over six seasons -- an average of 7 games per season. That's huge. Even if you don't trust my "career year" calculations, the Pythagorean and Runs Created bad luck alone sum to almost 5.5 of those 7 games.

So, yes ... in the last few years, the Cubs *have* been unlucky. Very, very unlucky.

------

In summary: from 1960 to 2001, the Cubs were a bit of a below-average team, with about average luck. Then, starting in 2002, the Cubs got good -- but, by coincidence or curse, their luck turned very bad at exactly the same time.

------

But if the "Scorecasting" authors don't believe that the Cubs have been unlucky, then what do they think is the reason for the Cubs' lack of success?

Incentives. Or, more accurately, the lack thereof. The Cubs sell out almost every game, win or lose. So, the authors ask, why should Cubs management care about winning? They gain very little if they win, so they don't bother to try.

To support that hypothesis, the authors show the impact (elasticity) of wins on tickets sold. It turns out that the Cubs have the lowest elasticity in baseball, at 0.6. If the Cubs' winning percentage drops by 10 percent, ticket sales drop by only 6 percent.

On the other hand, their crosstown rivals have one of the highest elasticities in the league, at about 1.2. For every 10 percent drop in winning percentage, White Sox ticket sales drop by 12 percent -- almost twice as much.
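In code, the elasticity figures amount to a simple linear rule of thumb. This Python sketch (function name mine) just multiplies the percentage change in winning percentage by the elasticities the book reports; treating elasticity as constant is an approximation that's only reasonable for fairly small changes.

```python
def ticket_change_pct(win_pct_change_pct: float, elasticity: float) -> float:
    """Approximate % change in tickets sold for a % change in winning percentage.

    Uses the constant-elasticity rule of thumb, which is only a
    reasonable approximation for fairly small changes.
    """
    return elasticity * win_pct_change_pct


ELASTICITIES = {"Cubs": 0.6, "White Sox": 1.2}  # figures quoted from "Scorecasting"

if __name__ == "__main__":
    for team, e in ELASTICITIES.items():
        print(f"{team}: 10% drop in winning pct -> {ticket_change_pct(-10, e):+.0f}% tickets")
```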

But ... I find this unconvincing, for a couple of reasons. First, if you look at the authors' tables (p. 245), it looks like it takes a year or so after a good season for attendance to jump. That makes sense. In 2005, it probably took a month or two for White Sox fans to realize the team was genuinely good; in 2006, they all knew beforehand, at season-ticket time.

Now, if you look at the Cubs' W-L record for the past 10 years, it really jumps up and down a lot; from 1998 to 2004, the team seesawed between good and bad. For seven consecutive seasons, they either won 88 games or more (four times) or lost 88 games or more (three times). So fan expectations were probably never in line with team performance. Because the authors predicted attendance from current performance rather than lagged performance, they might not have seen a strong relationship even if one exists.

But that's a minor reason. The bigger reason I disagree with the authors' conclusions is that, even when they're selling out, the Cubs still have a strong incentive to improve the team -- and that's ticket prices. Isn't it obvious that the better the team, the higher the demand, and the more you can charge? It's no coincidence that the Cubs have the highest ticket prices in the Major Leagues (.pdf) at the same time as they're selling out most games. If the team is successful, and demand rises, the team just charges more instead of selling more.

Also, what about TV revenues, and merchandise sales, which also rise when a team succeeds?

It seems a curious omission that the authors would consider only that the Cubs can't sell more tickets, and not that total revenues would significantly increase in other ways. But that's what they did. And so they argue,

"So, at least financially, the Cubs seem to have far less incentive than do other teams -- less than the Yankees and Red Sox, and certainly less than the White Sox. ... Winning or losing is often the result of a few small things that require extra effort to gain a competitive edge: going the extra step to sign the highly sought-after free agent, investing in a strong farm team with diligent scouting, monitoring talent, poring over statistics, even making players more comfortable. All can make a difference at the margin, and all are costly. When the benefits of making these investments are marginal at best, why undertake them?"

Well, the first argument is the one I just made: the benefits are *not* "marginal at best," because with a winning team, the Cubs would earn a lot more money in other ways. But there's a more persuasive and obvious argument. If the Cubs have so small an incentive to win, if they care so little that they can't even be bothered to hire "diligent" scouts ... then why do they spend so much money on players?

In 2010, the Cubs' payroll was $146 million, third in the majors. In 2009, they were also third. Since 2004, they have never been lower than ninth, in a 30-team league. Going back as far as 1991, there are only a couple of seasons in which the Cubs were below average -- and in those cases, just barely. In the past 20 years, the Cubs have spent significantly more money on player salaries than the average major-league team.

It just doesn't make sense to assume that the Cubs don't care about winning, does it, when they routinely spend literally millions of dollars more than other teams, in order to try to win?

Labels: baseball, Cubs, luck, payroll, regression, Scorecasting
