## Thursday, August 10, 2006

### NCAA home field advantage estimated within 14 points

How much blood can one study try to squeeze from a tiny little stone?

A lot. This academic study by Byron J. Gajewski, “There’s no Place Like Home: Estimating Intra-Conference Home Field Advantage Using a Bayesian Piecewise Linear Model,” tries to estimate home field advantage in Big 12 NCAA football from a sample of only 432 intra-conference games played from 1996 to 2004.

With 432 games, the standard deviation of winning percentage for a .600 team is about .024. So, even if you observed a home winning percentage of .600, the 95% confidence interval (roughly two standard deviations either side) would still be (.553, .647) – a pretty wide interval. So you’re not going to get all that much useful information from a sample of this size.
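For anyone who wants to check that arithmetic, here is a minimal sketch using the usual binomial approximation. The function name `win_pct_interval` and the choice of two standard deviations for the 95% interval are my own shorthand, not anything taken from the study.

```python
import math

def win_pct_interval(p, n, z=2.0):
    """Return (sd, low, high) for a winning percentage p observed over n games.

    Uses the binomial standard deviation sqrt(p * (1 - p) / n) and an
    interval of z standard deviations either side (z=2 is roughly 95%).
    """
    sd = math.sqrt(p * (1 - p) / n)
    return sd, p - z * sd, p + z * sd

# A .600 team over the study's 432 games
sd, low, high = win_pct_interval(0.600, 432)
print(f"SD = {sd:.4f}, interval = ({low:.3f}, {high:.3f})")
```

The standard deviation comes out to roughly .0236, and two of those either side of .600 gives the (.553, .647) interval.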

But the study is much more ambitious than even that – it tries to estimate a separate home field advantage (HFA) for each of the twelve teams. The assumption that each team has its own particular home field advantage is, I suppose, not unreasonable, but trying to find it off a sample of only 72 games per team seems like overreaching.
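To put a rough number on that, carry over the .600 winning percentage from above. That figure is purely an assumption for the arithmetic, not a number from the study. Each game involves two teams, so:

$$
\text{games per team} = \frac{2 \times 432}{12} = 72, \qquad \text{SD} = \sqrt{\frac{0.6 \times 0.4}{72}} \approx 0.058
$$

Two standard deviations is then about .115 in either direction for a single team's overall record, and only half of those 72 games are at home.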

Further, those 72 games are played over nine years. If you assume that every team is going to have a different home field advantage, wouldn’t you also assume that it could vary from year to year, along with the players? This is college football, where there’s complete turnover every four years. Why assume that the 1996 Sooners will have the same HFA as the 2002 Sooners? The author calls the assumption “reasonable” because “the fan base is likely to be very stable,” making the implicit assumption (and I think I’ve seen at least one study disproving this for baseball) that HFA is a function of attendance.

There’s still more complexity. The study doesn’t just figure out the difference between a team’s home record and its road record. It actually tries to estimate each team’s intrinsic quality, and the quality of its opponent. And that’s hard to do. You can’t just take the season record, because of luck. You could take each season and regress it to the mean, but then you’re ignoring information about the previous or next season. For instance, a team that goes 6-2 is perhaps really a 5-3 team that got lucky -- but a team that goes 6-2 between 8-0 seasons might actually have 6-2 talent, or even 7-1.

The study chooses to solve this problem by fitting a straight line over the nine years, but allowing it to change direction at three fixed points. (That is, the best-fitting four straight lines of certain fixed length with no discontinuity.) This seems reasonable, but other, equally reasonable decisions could give substantially different results, especially with so few data points.
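To make the "bent straight line" idea concrete, here is a rough sketch of a continuous piecewise-linear fit with three fixed knots, done by ordinary least squares. This is not the study's method, which is Bayesian. The knot positions, the function names, and the team-strength numbers are all made up for illustration.

```python
import numpy as np

def hinge_basis(x, knots):
    """Design matrix for a continuous piecewise-linear curve with fixed knots.

    Columns: intercept, x, and one hinge term max(x - k, 0) per knot.
    Each hinge lets the slope change at its knot without a jump in the curve.
    """
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x]
    cols += [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_piecewise_linear(x, y, knots):
    """Least-squares coefficients for the hinge basis (not a Bayesian fit)."""
    X = hinge_basis(x, knots)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

# Hypothetical setup: nine seasons, with 0..8 standing in for 1996..2004
seasons = np.arange(9)
knots = [2, 4, 6]  # assumed knot positions; the post does not say where they are

# Made-up "team strength" values that follow a four-segment line exactly
true_coef = np.array([0.50, 0.05, -0.10, 0.08, -0.02])
strength = hinge_basis(seasons, knots) @ true_coef

coef = fit_piecewise_linear(seasons, strength, knots)
fitted = hinge_basis(seasons, knots) @ coef
print(np.round(fitted, 3))
```

Note that this uses five parameters to describe nine seasons for one team. Move the knots, or use two of them instead of three, and the fitted line can look quite different.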

Finally, the author uses a Bayesian model and a simulation via “Markov Chain Monte Carlo” to get the final results. I’m not sure how this affects the conclusions, but some of them are unexpected. For instance, over the years of the study, Baylor was .167 (6-30) at home but .000 (0-36) on the road. A naive estimate of the home field advantage would be half of the difference, or .083. Gajewski’s method comes up with -.025, suggesting that Baylor was actually worse at home than on the road. (Part of the reason, presumably, is that they faced easier opponents at home, a fact the naive method wouldn’t consider.)
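The naive estimate is simple enough to write down. This is a small sketch; `naive_hfa` is just a name I picked, and it deliberately ignores strength of schedule.

```python
def naive_hfa(home_wins, home_losses, road_wins, road_losses):
    """Half the gap between home and road winning percentage."""
    home_pct = home_wins / (home_wins + home_losses)
    road_pct = road_wins / (road_wins + road_losses)
    return (home_pct - road_pct) / 2

# Baylor over the study period: 6-30 at home, 0-36 on the road
baylor = naive_hfa(6, 30, 0, 36)
print(f"{baylor:.3f}")
```

That gives .083 for Baylor.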

Here’s the full list. The study’s numbers are approximate, as I had to read them off a graph.

| Team | Naive estimate | Study estimate |
|---|---|---|
| Baylor | .083 | -.02 |
| Iowa State | .069 | .15 |
| Kansas State | .069 | .03 |
| Kansas | .097 | .10 |
| Missouri | .083 | .12 |
| Oklahoma State | .083 | .15 |
| Oklahoma | .069 | .08 |
| Texas A&M | .097 | .11 |
| Texas Tech | .139 | .17 |
| Texas | .111 | .05 |

While I admittedly don’t understand all of the Bayesian and Monte Carlo aspects of the study, I can’t imagine how this small amount of data, with so many variables, could yield anything close to an accurate estimate of anything.

And the study admits it. The estimates in the table above have very wide 90% error bars -- estimating from the graph, about .15 (almost exactly one touchdown) in each direction. The Oklahoma State home field advantage, estimated at .15, could therefore be as low as zero points, or as high as 14 points. Which, really, isn’t anything we didn’t already know.