Thursday, January 29, 2009

Tim Harford on NFL overtime

At Slate, economist Tim Harford lends support to the "auction" suggestion for NFL overtime.

As it stands now, possession in OT is determined by a coin toss; the team that loses the toss kicks off from its own 30-yard line. But that's too big an advantage for the receiving team: Brian Burke reports that, from 2000 to 2007, teams winning the toss won 60% of their games.

Harford suggests that the 30-yard line was appropriate back when the rule was created in 1974, before field-goal kickers got so good. But now, it gives too much of an advantage to the flip winner.

He suggests, as did Brian before him, letting the teams themselves decide what yard line is fair. One way to do this is to flip a coin again: the loser of the flip picks a yard line, and the winner then decides whether to start on offense (taking the ball at that yard line) or on defense (with the other team taking the ball there).

Another possibility is an auction. The referee starts naming yard lines, beginning at the 1 and moving up; as soon as one of the coaches is willing to take the ball on offense at that line, he throws down his challenge flag.
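For concreteness, here's a toy simulation of both mechanisms in Python. It assumes each coach privately knows his "indifference line" -- the worst field position at which he'd still rather have the ball than kick it away. The thresholds are invented purely for illustration:

# Toy simulation of the two proposed overtime mechanisms.
# "Indifference line" = the deepest yard line at which a coach
# still prefers offense. All numbers here are made up.

def divide_and_choose(picker_line, chooser_line):
    """Flip loser names a yard line; flip winner picks offense or defense."""
    # The picker's best play is to name his own indifference line,
    # so he's equally happy however the chooser decides.
    yard_line = picker_line
    offense = "chooser" if yard_line >= chooser_line else "picker"
    return yard_line, offense

def auction(line_a, line_b):
    """Referee counts up from the 1; first coach to flag takes the ball."""
    for yard_line in range(1, 100):
        if yard_line >= min(line_a, line_b):
            # line_a == line_b is the both-flags-at-once case
            return yard_line, ("A" if line_a <= line_b else "B")

print(divide_and_choose(picker_line=18, chooser_line=15))  # (18, 'chooser')
print(auction(line_a=18, line_b=15))                       # (15, 'B')

Either way, the ball ends up close to whatever the two coaches jointly consider a fair line.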

Either of these options sounds reasonable to me. I prefer the first, because after a few tries a consensus will emerge on what the right yard line is, and what looks like fun auction drama will get boring pretty quickly. Plus, you might need a replay to see which coach threw his flag first when both want the ball at the same line.

Labels: , ,

Thursday, January 22, 2009

Mid-season coaching changes are correlated with disappointing records

The new issue of JQAS is out. Its second article, by three Swedish researchers, is called "Coach Succession and Team Performance: The Impact of Ability and Timing – Swedish Ice Hockey Data." The idea is that you can use a regression to try to figure out whether a coaching change helps or hurts the team. Alas, I think the authors are confusing cause and effect.

Their regression predicts this year's winning percentage based on a bunch of factors: last year's winning percentage, the coach's career winning record, the number of games coached, whether there was a coaching change, and whether that coaching change came in mid-season.
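In code, the specification looks something like this. It's only a sketch with made-up data, and the variable names are mine, not the authors':

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical team-season data; values and column names are invented.
df = pd.DataFrame({
    "win_pct":          [0.55, 0.40, 0.62, 0.48, 0.35, 0.58, 0.44, 0.51],
    "last_win_pct":     [0.50, 0.45, 0.60, 0.52, 0.44, 0.55, 0.49, 0.47],
    "coach_career_pct": [0.52, 0.48, 0.58, 0.50, 0.42, 0.54, 0.46, 0.50],
    "games_coached":    [200, 80, 350, 120, 40, 260, 90, 150],
    "coach_change":     [0, 1, 0, 1, 1, 0, 1, 0],
    "midseason_change": [0, 0, 0, 1, 1, 0, 0, 0],
})

model = smf.ols("win_pct ~ last_win_pct + coach_career_pct + games_coached"
                " + coach_change + midseason_change", data=df).fit()

# A negative coefficient on midseason_change is the authors' headline
# result -- but it can't distinguish "the change hurt the team" from
# "a bad season triggered the change."
print(model.params)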

They find that mid-season coaching changes are associated with bad records, and conclude that it takes time for the team to adapt to the new coach. But isn't it more likely that the bad record led to the coaching change, rather than the other way around? Teams don't fire their coach when he's having a good season.

Actually, it's possible that the current season's record covers only the new coach's games; but, still, you'd expect a bad record in those cases. The previous coach was fired because the team underperformed, and (even allowing for regression to the mean) that underperformance is partly intrinsic to the team's talent. So performance should still be similarly sub-par under the new coach, even if the coaching change made no difference at all.

Another result finds that within teams, there's a high correlation between the coach's previous record and the current record – for all six teams studied, it was between .72 and .77. I didn't really expect it to be that high. You could assume that teams go through stages of badness and goodness, and each is likely to be associated with one particular coach. But I didn't think that would bring the correlation up as high as it did. Then again, there are only 21 seasons in the study for each team.

Labels: ,

Sunday, January 18, 2009

Sabermetrics and Malcolm Gladwell's "Blink"

I just finished reading "Blink," by Malcolm Gladwell. (The book actually came out in 2005, but I'm slow.)

The book's subtitle is "The Power of Thinking Without Thinking." Purportedly, it's about intuition and non-analytical ways of solving problems. But I took away something different. To me, the common thread in the book is that the way we usually try to solve complex problems doesn't always lead to the best result, and that simpler methods, ones you wouldn't think would be good enough, often work much better.

What are these other, simpler methods? The one most mentioned in the reviews I've seen is plain intuition. When the Getty Museum was approached in 1983 to purchase what was purportedly an ancient Greek statue, it did all the requisite scientific analysis to rule out forgery. The marble was indeed of a kind found in Greece, and the surface had weathered exactly as you'd expect of a statue that old.

But when experts viewed it, they knew within two seconds that it was a fake. How? They couldn't explain, and still can't. It just seems that, with their years of expertise, they instantly felt something wasn't right. Their exact methods were behind the "locked door" of their unconscious.

So from the book's title, and the first chapter, and the reviews, you'd think the book was about how intuition trumps analysis – that scouts (to switch to a sports analogy) can spot things that sabermetricians can't. But, surprisingly, the rest of the book is kind of the reverse. It's not about how analysis falls short of intuition; it's about how a lot of analysis is wrong, and how simpler analyses are better. Instead of "intuition is better than analysis," it's more like "the kind of analysis your intuition suggests is wrong." To me, aside from the first chapter, the book vindicates sabermetrics over traditional scouting.

For instance, take Gladwell's example of trying to figure out whether a particular married couple will stay together or get divorced. When psychologists were allowed to watch and listen to the couples converse, they were able to guess right only 50% of the time, no better than luck.

But one researcher, John Gottman, figured out a better way. He analyzed a bunch of conversations and scored the partners on twenty different aspects of their argument – disgust, contempt, anger, defensiveness, stonewalling, whining, and so forth. Based on that coding, he was able to find a formula that was accurate over 95% of the time, given only an hour's worth of data. It turns out that "contempt" is the number one predictor of divorce; "stonewalling," "defensiveness," and "criticism" are the next three (in some order). From those four measurements, you can apparently get a phenomenal predictor.
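Structurally, such a predictor is dead simple: code the conversation, then threshold a weighted sum. Here's a sketch -- the weights and cutoff are invented, not Gottman's; only the structure is the point:

# Sketch of a Gottman-style classifier: weighted counts of coded
# behaviors. Weights and cutoff are made up for illustration.
WEIGHTS = {"contempt": 4.0, "stonewalling": 2.0,
           "defensiveness": 2.0, "criticism": 1.5}

def predict_divorce(counts, cutoff=10.0):
    """counts: occurrences of each coded behavior in an hour of talk."""
    score = sum(WEIGHTS[b] * counts.get(b, 0) for b in WEIGHTS)
    return score >= cutoff

print(predict_divorce({"contempt": 3, "criticism": 2}))  # True (score 15.0)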

The obvious analogy is forecasting a baseball player's performance. A scout, or a sportswriter, would look at the guy, and see how fast he is, and how smooth his swing looks, and what his batting stance looks like, and so on. But if you know what's relevant and what's irrelevant, you can limit yourself to a few basic statistics, and the guy's age; you wind up with something like the Marcels, with accuracy pretty close to the maximum.
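Here's what I mean, as a sketch: a Marcel-like projection in miniature. It follows Tango's recipe only roughly -- the weights, the regression amount, and especially the age adjustment are all simplified:

def marcel_like(rates, pas, age, league_rate=0.330):
    """Project next year's rate (say, OBP) from the last three seasons,
    most recent first. Marcel-flavored: 5/4/3 weights, regression to
    the league mean, and a crude linear age adjustment."""
    weights = [5, 4, 3]
    num = sum(w * pa * r for w, pa, r in zip(weights, pas, rates))
    den = sum(w * pa for w, pa in zip(weights, pas))
    blended = (num + 1200 * league_rate) / (den + 1200)  # regress to mean
    return blended + 0.003 * (29 - age)                  # peak around age 29

# A hypothetical hitter: .360/.345/.330 OBP in ~600-PA seasons, now 31.
print(round(marcel_like([.360, .345, .330], [600, 580, 610], age=31), 3))  # ~0.339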

Most of the book similarly supports the idea of analysis – and shows that the analysis that works is often a lot simpler than the analysis that doesn't. When a patient walks into the emergency room with chest pain, how do you know if it's a serious heart problem or not? Every doctor uses his own intuition, with mixed results. But, at one Chicago hospital, an administrator named Brendan Reilly championed a simple algorithm that beat the doctors handily – 95 percent correct, versus "between 75 and 89 percent" correct. As Gladwell writes,

"For all the rigor of his calculations, it seemed that no one wanted to believe what he was saying, that an equation could perform better than a trained physician."


That seems like a defense of analysis over intuition rather than the other way around, doesn't it?
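It does. And the algorithm itself – developed by cardiologist Lee Goldman, in Gladwell's telling – is tiny: the ECG plus three urgent risk factors. Here's a sketch of its structure, from my memory of the book, not the actual clinical protocol:

def chest_pain_triage(ecg_acute, unstable_angina, lung_fluid, low_bp):
    """Goldman-style rule as described in "Blink": combine the ECG with
    three risk factors (unstable angina, fluid in the lungs, systolic BP
    under 100). A structural sketch only -- not the real clinical
    criteria, and certainly not medical advice."""
    risk_factors = sum([unstable_angina, lung_fluid, low_bp])
    if ecg_acute and risk_factors >= 1:
        return "intensive care"
    if ecg_acute or risk_factors >= 2:
        return "monitored bed"
    return "observation"

print(chest_pain_triage(True, False, True, False))  # 'intensive care'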



Labels: ,

Monday, January 12, 2009

Kaufman: "Ignorance is not a sportswriting skill"

In this Salon column, King Kaufman is critical of a certain breed of sportswriters, the ones who know little about sabermetrics and tout their ignorance as a virtue.

Labels: ,

Saturday, January 10, 2009

NFL home underdogs win late in the season

A while ago, Steven Levitt pointed out that betting on NFL home underdogs is a winning strategy. In the sample he covered, home dogs beat the spread more than 53% of the time.

In the comments to my last post on the subject, Brian Burke pointed me to an academic study that found the effect is almost completely caused by late season effects.

That paper is called "The Late-Season Bias: Explaining the NFL's Home Underdog Effect." It's by Richard Borghesi of Texas State University.

In a series of tables, Borghesi shows us that in weeks 15-18 of the NFL season, home teams tend to overperform relative to the spread. First, the raw numbers. In weeks 1-14, home teams were favored by an average of 2.57 points, and actually won by 2.48. In the later weeks, the spread dropped slightly, to 2.40 points, but the home teams won by 4.46. That is, in the 704 late-season games in the sample (1981-2000), the home team beat the spread by an average of 2.06 points.

The effect was even more extreme in the 188 playoff games, where home teams beat the average 5.75-point spread by 2.85 points.

Here it is in an easier-to-read format:

Weeks 1-14: visiting team beat spread by 0.09 points
Weeks 15-18: home team beat spread by 2.06 points
Playoffs: home team beat spread by 2.85 points


If you consider only home underdogs, the effect persists. This is just the home dogs, from Borghesi's Table 3:

Weeks 1-14: home underdogs beat spread by 0.29 points
Weeks 15-18: home underdogs beat spread by 3.13 points
Playoffs: home underdogs beat spread by 11.33 points

The playoff effect is a huge 11.33 points, but there were only 18 games in the sample.

In Table 4, Borghesi shows that the effect is getting stronger over time. Here are the "late home underdogs" by era:

1981-1985: late home underdogs beat spread by 1.28 points
1986-1990: late home underdogs beat spread by 2.95 points
1991-1995: late home underdogs beat spread by 2.42 points
1996-2000: late home underdogs beat spread by 6.92 points


Why is the effect confined to the later weeks of the season? Could it be the weather? In Table 5, Borghesi defines "cold weather advantage" as a game in which a northern team hosts a southern team. The home advantage is greatest in the cold months:

Aug-Sept: cold-weather teams missed spread by 1.36 points
October: cold-weather teams beat spread by 0.84 points
November: cold-weather teams beat spread by 1.49 points
Dec-Jan: cold-weather teams beat spread by 1.93 points


Remember, none of these tables tell us anything about the relative skills of the teams – just how they did against the spread. Since the spread could (and probably does) adjust somewhat for these effects, we don't actually know whether warm weather teams play worse in the cold. It's possible that, say, the Dolphins actually play a touchdown *better* in Green Bay in December, but the spread corrects by 10 points, making the Packers three points better than the spread. Admittedly that's unlikely, but the point is that we can only draw conclusions about the betting line, and not about the teams themselves.

How much of the "home underdog" effect is caused by the weather? To find out, you'd have to take all the home-underdog games, subtract out the weather-related ones, and see what's left. I don't think Borghesi gives us enough data to do that.

Borghesi then posits a few simple betting rules, and figures out how you'd have done if you'd followed them:

.4905 -- bet on early-season home teams
.5144 -- bet on early-season home underdogs

.5422 -- bet on late-season home teams
.6000 -- bet on late-season home underdogs
.6071 -- bet on late-season home 2+ (points) underdogs
.5789 -- bet on late-season home 8+ (points) underdogs


And, finally, Borghesi runs a couple of regressions. The first tries to predict the winner based on which team is at home, and the spread. Its predictions are 55% accurate early in the season, and 57% accurate late in the season. However, the regression is run separately for each year, which means it can capitalize on random year-to-year fluctuations, "explaining" noise rather than skill. When the same regression formulas are used to predict *next year's* winners, the accuracy drops back to 50%.

However: Borghesi finds that if you run the regression every week, based only on the last month's games, you can predict next week's winners late in the season at a 53% clip. But that's only a 361-317 record, which is less than 2 SDs from .500.
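The significance check is one line of arithmetic (mine, not the paper's):

import math

wins, losses = 361, 317
n = wins + losses                     # 678 games
sd = math.sqrt(n * 0.25)              # binomial SD of wins at p = .500: ~13.0
print(round((wins - n / 2) / sd, 2))  # ~1.69 standard deviations -- under 2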

One last regression in the paper tries to retrospectively predict winners based not only on the spread and the home team, but various performance statistics of the two teams. This gives 67% accuracy, but so what? Again, the regression might just be picking up randomness and spurious correlations.

(Why are the regressions there? Maybe (he said, mischievously) you can't get a paper published just by counting and tabulating things, even when counting answers the question better than the fancier methods do.)

If you ignore the regressions, the paper does present very good evidence that the entire "home underdog" effect happens late in the season, and that weather is at least partially responsible for it.

Labels: , , ,

Sunday, January 04, 2009

The home underdog effect isn't holding up

A year ago, I posted about a Steven Levitt study showing that NFL bettors don't seem to like home underdogs. In the online tournament Levitt studied, it turned out that when the home team was the dog, only 31.8% of bettors backed it – the other 68.2% bet on the road favorite.

And, as it turned out, the home underdogs would have been good bets; they beat the spread almost 58% of the time. Levitt hypothesized that because bookies know their customers prefer road favorites, they shade the point spread in favor of the home team, in order to win more of those bets.

But in a post last week on Freakonomics, Levitt revisited the home underdog effect, and found that, over the past two seasons, it no longer held. Home underdogs went 44-45-1 against the spread in 2007, and 32-45-2 in 2008.

Why the change? Levitt thinks it's just luck, and not a change in the way bookmakers set their lines.

Thinking about the issue again, I wonder if the original finding, that 58% of home underdogs beat the spread, might itself have been just luck. In Levitt's original paper, there were 2,286 bets on home underdogs. But that's the number of bets, not the number of games; only 85 NFL games were studied in total. That means there were probably fewer than 40 home underdogs to bet on. 58% of 40 games is only 23-17, which is really not that big a deal, is it?

When Levitt looked at 21 seasons' worth of NFL betting lines, instead of just those 85 games, home underdogs were still a good bet, beating the spread 53.3% of the time. That's significant, but not as impressive as 58%. Assuming 53.3% is the "real" figure, that means that in 2007-8, home dogs should have gone 90-79. They actually went 76-90-3, or 13 games below what they "should" have. That's almost exactly two standard deviations.
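Here's that calculation, with the three pushes dropped:

import math

p = 0.533                  # long-run home-dog cover rate, per Levitt
wins, losses = 76, 90      # 2007-08 home dogs against the spread
n = wins + losses          # 166 decided games
sd = math.sqrt(n * p * (1 - p))
print(round((p * n - wins) / sd, 2))  # ~1.94 SDs below expectation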

If you don't assume 53.3% is the "real" number, and instead you test, statistically, for the difference between the two means (53.3% earlier, and 45.9% later), you wind up with a bit less than two standard deviations. I'd agree with Levitt that it's probably just luck.

Labels: ,

Thursday, January 01, 2009

Measuring the Dolphins' improvement

The Miami Dolphins were 11-5 this year, improving by 10 games over their abysmal 1-15 record in 2007. Carl Bialik wonders just how historic an improvement this was. On the one hand, he says, the improvement of 62.5 percentage points (.625) is huge. But on the other hand, there are only 16 games in the NFL season, so we're really only talking about 10 games.

Because of the shorter season, changes in winning percentages in the NFL tend to be larger than in other leagues, at least those changes that arise due to luck. For a .500 team, the standard deviation of wins in the NFL is 2 (the square root of .500 times .500 times 16). Expressed in winning percentage, that works out to .125. By comparison, it's only .039 in baseball, and .055 in basketball or hockey (ignoring the NHL's extra standings point for an overtime loss).

To get the SD of the difference between two consecutive seasons, you multiply the single-season SD by the square root of 2 (about 1.414). So a typical between-seasons luck difference would be 2.8 games. And 5% of the NFL – that's 1 or 2 teams – should have a swing of more than 5.6 games.
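Those numbers all fall out of the binomial; here's the quick check:

import math

def luck_sd(games):
    """SD of a .500 team's winning percentage over one season."""
    return 0.5 / math.sqrt(games)

for league, games in [("NFL", 16), ("MLB", 162), ("NBA/NHL", 82)]:
    print(league, round(luck_sd(games), 3))   # 0.125, 0.039, 0.055

# Variances add across independent seasons, so the SD of the
# season-to-season *change* is sqrt(2) times bigger.
swing_sd = math.sqrt(2) * luck_sd(16) * 16
print(round(swing_sd, 2), round(2 * swing_sd, 2))  # 2.83 games; 5.66 = 5% cutoff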

But it's not likely that *all* of the Dolphins' improvement is random luck. A substantial amount is probably due to better talent.

One way to check is to look, not just at the win-loss record, but at the component stats. In 2007, the Dolphins' opponents outscored them by 170 points. In 2008, the Dolphins actually scored more than the opposition, by 28 points.

According to "The Hidden Game of Football," by Bob Carroll, Pete Palmer, and John Thorn, it takes about 36 points to turn a loss into a win. That means the Dolphins "should have" been about 3-13 last year, and 9-7 this year. By that standard, they were 2 games unlucky last year, and 2 games lucky this year. (That four game swing is only about 1.4 standard deviations, which is nothing special.) This means that their talent improved by six games: 1-15 last year, plus 2 games luck last year, plus 2 games luck this year, plus 6 games skill difference, adds up to 11-5.
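In code, the rule of thumb is just this:

def expected_wins(point_diff, games=16, pts_per_win=36.0):
    """Hidden Game of Football rule of thumb: every ~36 points of
    scoring margin is worth about one win relative to .500."""
    return games / 2 + point_diff / pts_per_win

print(round(expected_wins(-170), 1))  # 2007 Dolphins: ~3.3, call it 3-13
print(round(expected_wins(28), 1))    # 2008 Dolphins: ~8.8, call it 9-7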

You can drill down even deeper – instead of points, you can look at yards gained and allowed, penalties, and turnovers. Brian Burke did that for 2007 and 2008. He found that Miami was the unluckiest team in the NFL last year, by a long shot, winning 4.4 games fewer than expected. In 2008, the Dolphins were 0.9 games lucky. That's a swing of 5.3 games due to luck, which leaves the remaining improvement of 4.7 games attributable to talent.

It makes sense that Brian found more luck with his method than just by looking at points. The total random effect can be broken up into three components:

-- players having "lucky" years by playing over their head and accumulating gaudier stats than normal;
-- teams scoring more points than you'd expect based on their stats;
-- teams winning more games than you'd expect based on their points scored.

In general, teams with extreme records, or extreme changes, are likely to rank high in each of these three categories. The "points" method counts only the third; Brian's method counts only the last two. It's likely that if you had a way of looking at the individual players, both years, you'd find even more luck. I have no way of knowing for sure, but if I had to give a best estimate, I'd say the Dolphins' 10-game renaissance looks like about 6 games luck, and 4 games skill.

Labels: , ,