Wednesday, August 29, 2012

Do NHL teams draw more fans when they're more likely to win?

When you do a regression, you've got to figure out not just what the coefficients are, but also what they *mean*.  Sometimes, there's no obvious answer.

I came across a recent hockey study (which will remain nameless) that tries to figure out if teams sell more tickets when they're more likely to win the game.

They ran a regression to estimate attendance (actually, the log of attendance) from these variables:

-- previous season attendance
-- new arena
-- first game of season
-- home previous season win %
-- home current season win % to date
-- home goals scored per game to date
-- home goals allowed per game to date
-- visitor win % to date
-- visitor goals scored per game to date
-- visitor goals allowed per game to date
-- visitor penalty minutes allowed per game to date
-- home team game number (actual and squared)
-- franchise and season dummies
-- probability of home team winning (Vegas odds).
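In equation form, the specification is roughly the following. The notation is ours, and it is written as a plain linear regression for readability; per the comments at the end of the post, the study actually used a censored model. Here g indexes games, A_g is attendance, x_g stacks the controls listed above, p_g is the Vegas home-win probability, and alpha and delta are the franchise and season effects. The partial derivative is the formal version of "holding all the other variables constant":

```latex
\log A_g = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_g + \gamma\, p_g
           + \alpha_{f(g)} + \delta_{s(g)} + \varepsilon_g,
\qquad
\gamma = \frac{\partial\, \mathbb{E}[\log A_g \mid \mathbf{x}_g, p_g]}{\partial p_g}
```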

The last coefficient, "probability of home team winning," was significant and positive.  The authors concluded that fans are attracted to the games where the home team is more likely to win.

But ... that's not necessarily the case. 

What the regression might suggest is that more fans show up to the high-probability games *holding all the other variables constant*.  That changes everything.

Let's hold those other variables constant.  Suppose the home team was .600 last year, is .600 this year, and has scored 2.5 goals and allowed 2.0 goals per game so far this year.  And the visiting team is .470 this year, and has scored 3.0 goals and allowed 3.25 goals per game so far this year. 

Holding all those things constant, more fans show up when the odds of winning are better for the home team.

But what does that actually mean? 

Don't the odds of winning depend almost completely on those numbers, the ones we're holding constant?  A .600 team playing a .470 team should be expected to win, say, 70% of its home games (making that number up).  

If the Vegas odds are higher than 70%, what does that mean?  It means that all those performance numbers aren't an adequate indication of the team's quality.  Why might that be?  Maybe an opposing star player is injured.  Maybe Sidney Crosby has just come back from injury.  Maybe the home team's rookies have started to blossom, or it made a good trade recently.  Maybe ... well, there are probably other things. 

In any case, it's hard to see how any of that stuff would lead to the conclusion that the home fans care about the probability of winning.  Couldn't they just be responding to Sidney Crosby being back?  Might they not just want to see the new superstar the team just acquired?  Or maybe fans respond more to a team that's gotten better recently than to a team that's been the same quality for a couple of years now.

Or, maybe they're more likely to come to the game if the *opposition's* star player was injured.  But that doesn't sound right.  Aren't visiting teams' stars supposed to be an attraction?

In any case, the conclusion "fans respond to probability of winning" doesn't necessarily follow from the regression -- because each coefficient has to be interpreted while holding everything else constant. 
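One way to see what "holding everything else constant" does is the Frisch-Waugh-Lovell theorem: the coefficient on the odds in the full regression equals the slope you get after stripping out of both attendance and the odds whatever the other variables already explain. Here is a minimal sketch on made-up data, with a single hypothetical control (home win % to date) standing in for the study's full list; every number and variable name is invented for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Sample covariance; the n - 1 scaling cancels in every ratio below.
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(x, y):
    # Slope from a simple regression of y on x (with an intercept).
    return cov(x, y) / cov(x, x)

def residuals(y, x):
    # The part of y left over after regressing it on x (with an intercept).
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

random.seed(1)
n = 500
# Hypothetical data: home win % to date, a Vegas win probability largely
# driven by it, and log attendance that depends on both.
win_pct = [random.gauss(0.5, 0.1) for _ in range(n)]
odds = [0.5 + 0.8 * (w - 0.5) + random.gauss(0, 0.05) for w in win_pct]
log_att = [9.8 + 0.4 * w + 0.3 * p + random.gauss(0, 0.05)
           for w, p in zip(win_pct, odds)]

# Coefficient on odds in log_att ~ win_pct + odds, solved directly from
# the 2x2 normal equations on centered data.
sxx, szz, sxz = cov(odds, odds), cov(win_pct, win_pct), cov(odds, win_pct)
sxy, szy = cov(odds, log_att), cov(win_pct, log_att)
b_full = (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)

# Same coefficient via Frisch-Waugh-Lovell: remove what win_pct explains
# from both odds and attendance, then regress residual on residual.
odds_surprise = residuals(odds, win_pct)
b_fwl = slope(odds_surprise, residuals(log_att, win_pct))

print(f"full regression: {b_full:.6f}   residual-on-residual: {b_fwl:.6f}")
```

The two numbers agree, which means only `odds_surprise` (the part of the odds that the record can't predict) identifies the coefficient. In the real data, that leftover is exactly the "Crosby is back" or "their star is hurt" kind of variation.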


By the way, here's what I think is really going on.  The study includes factors for the quality of the teams, but they're not that accurate.  The teams' records on a "so far this year" basis are almost meaningless early in the season.  For the first few games played, they're mostly random. 

Therefore, many of the games have only an imperfect measure of how good the teams are.  Therefore, the Vegas odds pick up the slack.  And, therefore, what that coefficient might be telling you is that a team draws more fans *when they're a better team*.  That's not news.
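A small simulation makes this concrete. It is a sketch under loudly hypothetical assumptions: fans respond only to a team's true (unobserved) quality; the record to date is true quality plus binomial-sized noise that shrinks as games are played (left unbounded, for simplicity); and the Vegas line is a sharper but still imperfect read of true quality. The odds never enter attendance directly, yet they get a clearly positive coefficient once the noisy record is held constant.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def coef_on_x(y, x, z):
    # Coefficient on x in the OLS regression y ~ 1 + x + z (2x2 normal equations).
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)

random.seed(2)
n = 2000
quality = [random.gauss(0.5, 0.08) for _ in range(n)]  # true home strength, unobserved
games = [random.randint(1, 20) for _ in range(n)]      # games played so far this season
# Win % to date: true quality plus noise of binomial size, sqrt(p(1-p)/g) with p ~ .5.
record = [q + random.gauss(0, 0.5 / g ** 0.5) for q, g in zip(quality, games)]
# Vegas odds: a much sharper (but imperfect) read on true quality.
odds = [q + random.gauss(0, 0.02) for q in quality]
# Fans respond to true quality ONLY. The odds do not appear here at all.
log_att = [9.8 + 0.5 * q + random.gauss(0, 0.03) for q in quality]

b_odds = coef_on_x(log_att, odds, record)
b_record = coef_on_x(log_att, record, odds)
print(f"coefficient on odds: {b_odds:.3f}   on record to date: {b_record:.3f}")
```

Under these made-up assumptions the odds soak up nearly all of the quality effect, because they measure quality better than an early-season record does.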

But ... looking again at the list of variables, there's a measure of the home team's record last year, but not the visiting team's.  So it could also be that the results hold because fans come out more for bad visiting teams than for good ones.  Except that the regression appears to show higher attendance for better visiting teams ... but, again, if you interpret the coefficients that way, you have to hold everything else constant, including win probability.

There's no real way to tell from this regression whether it's the good home team or the bad visiting team causing the effect.  My gut says it's all the home team, but I could be wrong.
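A sketch of why the two stories can't be told apart, again on made-up data and deliberately simplified to one control, the home team's previous-season record. The visitor's to-date record is left out for brevity; the study includes it, but early in the season it is mostly noise. In the first hypothetical scenario, fans care only about home-team quality; in the second, only about how weak the visitor is. The odds get a positive coefficient either way.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def coef_on_x(y, x, z):
    # Coefficient on x in the OLS regression y ~ 1 + x + z (2x2 normal equations).
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)

random.seed(3)
n = 2000
home_q = [random.gauss(0.5, 0.08) for _ in range(n)]
vis_q = [random.gauss(0.5, 0.08) for _ in range(n)]
# Home previous-season record: a decent but imperfect proxy for home quality.
home_prev = [h + random.gauss(0, 0.06) for h in home_q]
# Vegas home-win probability (linear and unbounded, for simplicity) reflects both teams.
odds = [0.5 + (h - v) + random.gauss(0, 0.02) for h, v in zip(home_q, vis_q)]

# Scenario 1: attendance rises with home-team quality only.
att_home_story = [9.8 + 0.5 * h + random.gauss(0, 0.03) for h in home_q]
# Scenario 2: attendance rises as the VISITOR gets worse, nothing else.
att_visitor_story = [9.8 - 0.5 * v + random.gauss(0, 0.03) for v in vis_q]

b1 = coef_on_x(att_home_story, odds, home_prev)
b2 = coef_on_x(att_visitor_story, odds, home_prev)
print(f"odds coefficient, home-quality story: {b1:.3f}")
print(f"odds coefficient, weak-visitor story: {b2:.3f}")
```

Same regression, same sign. The sizes differ, but they depend entirely on the invented noise levels, so they can't tell you which story is the real one.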


More importantly, what exactly is it that the authors are trying to figure out? 

They talk about whether fans come out more when there's a better chance of winning.  Well, the chance of winning depends almost completely on two things: the quality of the home team, and the quality of the visiting team. 

We already know, don't we, that a winning home team draws more fans?  If the authors agree that we already know that, then they must be wondering whether more fans come out for a bad visiting team. 

In that case, why include the visiting team quality variables, which eliminate the possibility of answering that question? 



At Wednesday, August 29, 2012 3:41:00 PM, Anonymous mettle said...

What kind of regression did they actually do?
Simultaneous multiple? Hierarchical? Each variable one at a time?
Those sorts of things matter and could help sort out the (obvious) collinearity between all these variables.

At Wednesday, August 29, 2012 3:45:00 PM, Blogger Phil Birnbaum said...

Simultaneous multiple regression ...
"standard censored normal regression model." Right-censored because arenas have attendance limits.
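Why the censoring matters: when a game sells out, the recorded attendance is the capacity, not the demand. A minimal sketch with invented numbers (the capacity, demand curve, and noise are all hypothetical, and it uses levels rather than logs for simplicity) shows that plain OLS on capped attendance understates how strongly demand responds. It does not fit the censored-normal (Tobit) model itself; it only shows the bias that model is meant to correct.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    # OLS slope of y on x (with an intercept).
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(4)
CAPACITY = 19000  # hypothetical arena limit
n = 5000
appeal = [random.random() for _ in range(n)]  # hypothetical 0-1 measure of game appeal
demand = [15000 + 8000 * a + random.gauss(0, 1500) for a in appeal]
observed = [min(d, CAPACITY) for d in demand]  # sellouts are recorded at capacity

share_sold_out = sum(d >= CAPACITY for d in demand) / n
print(f"sold out: {share_sold_out:.0%}")
print(f"slope on true demand:      {slope(appeal, demand):.0f}")
print(f"slope on capped attendance: {slope(appeal, observed):.0f}")
```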

At Wednesday, August 29, 2012 4:14:00 PM, Anonymous David said...

So, other than the censoring, they just did single equation, plain vanilla linear regression?

If so, then yes, I totally agree with Phil. Even if not, the fact that their controls are not symmetric (past season home and past season visiting, etc.) suggests major data mining.

At Wednesday, August 29, 2012 4:38:00 PM, Anonymous Anonymous said...

isn't there a major problem with running a regression on attendance without factoring in the price of the tickets? if the team was expected to be better and management raised prices in line with elasticity they could keep attendance flat but still realize demand growth in increased attendance revenue.

i appreciate the fact that attendance numbers are public record while club finances are not, but i'd remain skeptical of any results that assume Quantity equals Demand without factoring in Price.
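The commenter's point in numbers, using a hypothetical linear demand curve and made-up prices (capacity is ignored here): if demand for a game rises and the team raises prices just enough to absorb it, attendance doesn't move at all, and the extra demand shows up only in revenue.

```python
SENSITIVITY = 200.0  # hypothetical: tickets lost per extra dollar of price

def tickets_demanded(price, demand_level):
    # Hypothetical linear demand curve.
    return demand_level - SENSITIVITY * price

old_level, new_level = 25000.0, 27000.0  # demand shifts up: better team expected
old_price = 40.0
# Raise the price just enough that the quantity sold stays the same.
new_price = old_price + (new_level - old_level) / SENSITIVITY

q_old = tickets_demanded(old_price, old_level)
q_new = tickets_demanded(new_price, new_level)
print(q_old, q_new)                          # 17000.0 17000.0
print(q_old * old_price, q_new * new_price)  # 680000.0 850000.0
```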

At Wednesday, August 29, 2012 4:56:00 PM, Blogger Phil Birnbaum said...

Pretty sure it's just normal regression. If you want to see the study: current (Aug.) issue of JSE, second article.

