Tuesday, June 16, 2009

Why are there so many overtime games in the NBA?

There are a lot of NBA games that end up tied after 48 minutes – almost twice as many as expected.

A post from the "Cheap Talk" blog charted score differences for every NBA game between 1997 and 2009. It found a fairly smooth curve, except at zero, where that particular outcome was about twice as frequent as expected.

As it turns out, the spike comes only in the last few seconds of the game. As the authors show in a video on their post, there is barely any spike at all with 40 seconds left. The spike starts to emerge at about 20 seconds, and then grows steadily until 0:00.

Why does this happen? The authors don't really give a hypothesis. One obvious reason, though, is that for the team that's behind, closing the deficit is worthless unless they tie or take the lead. In the first quarter, a team down by three might go for the easy two-pointer instead of the unlikely three. But, with five seconds to go, only the three has any value. Therefore, it's either a tie or nothing.

And if you look at the video again, you'll see that it's not just that zero spikes, but that the values surrounding zero drop. That makes sense – if the team behind by three fails to make it, they'll start fouling the opposition. The most likely result is that they fall further behind. So you get a spike at zero, but a drop at (say) –3, because a minus three with five seconds left gets turned into a minus six or something.
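A minimal simulation makes the mechanics concrete. Every number in it is an assumption I'm inventing for illustration (a 30% chance of hitting the game-tying three, 75% free-throw shooting by the leading team after the foul); the point is only the shape of the result, a spike at zero and a hole at minus three:

```python
import random
from collections import Counter

random.seed(1)

def final_margin(p_three=0.30, p_ft=0.75):
    """One trailing-by-3 possession with a few seconds left.

    Hypothetical parameters: p_three is the chance the trailing team
    hits a game-tying three; p_ft is the leader's free-throw percentage
    after the trailing team fouls following a miss.
    """
    if random.random() < p_three:
        return 0                      # tie: game goes to overtime
    # Missed the three; trailing team fouls, leader shoots two free throws.
    made = sum(random.random() < p_ft for _ in range(2))
    return -(3 + made)                # the margin gets worse, not better

margins = Counter(final_margin() for _ in range(100_000))
for m in sorted(margins):
    print(m, margins[m])
```

With these made-up inputs, about 30% of the games end at exactly zero, while minus three almost empties out (it survives only when the leader misses both free throws), which is the spike-and-dip pattern the video shows.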

Another possibility (as mentioned by commenters to the post) is that teams are overly conservative. With a tie game and five seconds left, the team with possession might decide to run out the clock instead of risking a turnover. Or, a team behind by two might go for the field goal instead of the three, even if the three is the better strategy. Consider two equally-matched teams. A 40% chance to make two points (and tie) gives you a 20% chance of winning the game. A 30% chance to make a three (and win) gives you a 30% chance of winning the game. A conservative coach might go for the two anyway.
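The arithmetic behind that two-versus-three comparison is simple enough to write out (the 40% and 30% shot probabilities, and the coin-flip overtime between equally-matched teams, are the same assumed numbers as above):

```python
# Hypothetical shot probabilities for the end-of-game decision:
# a 40% two-pointer that forces overtime (then a coin flip between
# equally matched teams) versus a 30% three that wins outright.
p_two, p_three, p_overtime_win = 0.40, 0.30, 0.50

win_via_two = p_two * p_overtime_win     # tie first, then win overtime
win_via_three = p_three                  # win immediately

print(win_via_two, win_via_three)        # 0.2 vs 0.3: the three is better
```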

That possibility needs more investigation; I wouldn't accuse teams of playing suboptimally without a bit more evidence. I have to admit, though, that it does have some intuitive plausibility.

UPDATE: follow-up post at Cheap Talk here.

Hat tip: The Sports Economist


Monday, June 01, 2009

A new golf handicapping system

The Royal Canadian Golf Association (RCGA), Canada's governing body for golf, has a committee to consider updating the system by which a golfer's handicap is computed. Tim B. Swartz, the committee's statistical guy, has a paper in the most recent JQAS explaining the new proposed system.

I'm going to simplify things a bit and explain the situation as I understand it from the paper. Leaving out the technical adjustments (even at the cost of a bit of inaccuracy), I'll describe the current handicap system like this:

You start with your 20 most recent scores (with a few adjustments that I'll discuss later), relative to par. Then, you drop the 10 worst scores, leaving only the 10 best. You average those 10 best. That's your handicap.

Why change this system? For one thing, as Swartz points out, your handicap isn't a true indication of your expected score; golfers fail to shoot their handicap more often than not. As I see it, you're taking the average of the best half of your scores, so you might expect the handicap to sit near the 25% mark. But if you assume scores are normal, the tail pulls the mean of that half a bit farther out than the 25th percentile. As it turns out, the mean of one half of the normal curve is about .798 standard deviations from the overall mean, and the chance of beating a Z-score of .798 is about 21.2%. So you'd expect a golfer to beat his handicap about 21% of the time.
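Those two numbers are just textbook facts about the half-normal distribution, and you can verify them with the error function:

```python
import math

# Mean of the better half of a standard normal (the "best 10 of 20"
# half), and the chance of beating a score that far from the mean.
mean_of_half = math.sqrt(2 / math.pi)            # about 0.798 SDs

def phi_tail(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_beat = phi_tail(mean_of_half)                  # about 0.212
print(round(mean_of_half, 3), round(p_beat, 3))  # 0.798 0.212
```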

Swartz checked, using a database of scores from a golf club in Alberta. As it turns out, golfers actually beat their handicap 36% of the time, not 21%. Maybe I made a mistake in the calculation; maybe golf scores aren't really normal; or maybe the various adjustments are causing the difference.

Another problem with the current system is that in casual head-to-head play, it favors the better golfer. Swartz generated a bunch of random matches from his database, and found that the better golfer won 55 percent of the time, rather than 50 percent.

A third problem, and an important one, is that in multi-player tournaments, the winner is likely to be a golfer with a higher handicap. That's because a bad golfer, with a handicap of (say) 20 (which represents a score of about 92), could reasonably have a very good day and shoot an 80, finishing at -12. But a scratch golfer (0 handicap, 72 average) is much less likely to match the -12 by shooting a 60 on the day.

The more players in the tournament, the more likely someone will have a much better game than normal. And those "much better than normal" games are likely to come from the worst golfers, because their scores vary the most.

In his simulation, Swartz found that the top third of golfers won only 27% of the 99-player tournaments. The middle third won 33%, and the worst third won 40%. So the current system favors the better golfer in tournaments of two players, but favors the worse golfers in tournaments of many players.
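You can reproduce the flavor of that result, though not Swartz's actual numbers (his came from a real database), with a toy simulation. The means and SDs for the three tiers below are invented; the only thing that matters is that worse golfers have more variance, and with these made-up parameters the effect comes out much stronger than Swartz's 27/33/40 split:

```python
import random

random.seed(2)

# Illustrative sketch (not Swartz's actual simulation): three tiers of
# golfers whose scoring SD grows with their handicap. Net score is raw
# score minus handicap, so every tier has the same expected net score;
# the lowest net score wins the tournament.
tiers = [
    ("best",   72.0, 2.5),   # (label, mean score, SD of scores) -- invented
    ("middle", 82.0, 4.0),
    ("worst",  92.0, 6.0),
]

def run_tournament(players_per_tier=33):
    entries = []
    for label, mean, sd in tiers:
        handicap = mean - 72.0
        for _ in range(players_per_tier):
            net = random.gauss(mean, sd) - handicap
            entries.append((net, label))
    return min(entries)[1]           # tier of the winner

wins = {"best": 0, "middle": 0, "worst": 0}
for _ in range(5000):
    wins[run_tournament()] += 1
print(wins)   # the high-SD "worst" tier wins most often
```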

So how does Swartz fix the current system? Two ways: first, he makes the handicap represent the player's average score, instead of a score the player rarely beats. Second, he divides by the player's standard deviation (effectively converting a raw score to a Z-score), which neutralizes the luck factor in large tournaments.

Here are the details.

Like the current system, Swartz considers only the 20 most recent scores. But instead of dropping the worst 10, he drops only the worst four, leaving 16 scores. Then, instead of just averaging them, the new system uses statistical techniques to estimate the best normal curve to fit the data (keeping in mind that the four worst scores are missing). That is, it asks the question: what is the best-fit normal curve that takes into account that we're looking at only the best 16 of 20 observations?
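To see why a plain average of the 16 remaining scores won't do, note that dropping the worst four biases the average low; that's the bias the best-fit curve has to correct for. A quick simulation (with an invented true mean and SD) shows the size of the effect:

```python
import random
import statistics

random.seed(3)

# Dropping the worst 4 of 20 scores biases a plain average low.
# The true mean and SD here are invented for illustration.
true_mean, true_sd = 90.0, 5.0

plain_averages = []
for _ in range(10_000):
    scores = sorted(random.gauss(true_mean, true_sd) for _ in range(20))
    plain_averages.append(statistics.mean(scores[:16]))  # best 16 only

bias = statistics.mean(plain_averages) - true_mean
print(round(bias, 2))   # negative: the naive average underestimates
```

With these parameters the naive average comes out more than a stroke and a half below the golfer's true mean, which is why the new system fits a censored normal curve instead of just averaging.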

Swartz gives linear formulas (like a Linear Weights estimate of the 16 scores) to estimate the mean and SD of that best-fit curve; he says that those formulas are minimum variance linear unbiased estimators, which means you can't do better (by using different weights) unless you go to a non-linear estimator.

Those estimates of mean and SD become the player's stated handicap (so, effectively, there are two numbers for the handicap instead of one). Then, for his next (21st) round, his raw score is converted to a Z-score, and that's what gets compared to the other players' Z-scores to determine the winner.
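Here's what that comparison looks like, with invented numbers for the two golfers (the means and SDs below stand in for the estimates the formulas would produce):

```python
# Comparing two golfers under the proposed system (numbers invented):
# each raw score is converted to a Z-score against that player's own
# estimated mean and SD, and the lower Z-score wins.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

# A 20-handicap who shoots 86 against a 92 average with SD 6 ...
duffer = z_score(86, mean=92.0, sd=6.0)    # -1.0
# ... exactly ties a scratch golfer who shoots 69 against a 72 average, SD 3.
scratch = z_score(69, mean=72.0, sd=3.0)   # -1.0
print(duffer, scratch)
```

The high-SD golfer needs a much bigger raw improvement to earn the same Z-score, which is how the new system takes away the variance advantage in big tournaments.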

In the study's simulations, golfers beat their handicap 45% of the time with this new system (closer to the ideal 50% than the old system's 36%); one-on-one matchups were won by the better golfer only 48 to 51 percent of the time (fairer than 55%); and in tournaments, the best golfers won 29% of the time (fairer than 27%) while the worst won 32 to 34 percent of the time (fairer than 40%).

I promised some details of the adjustments to player scores that go into the formulas. I'll outline them here, and you can see the paper (which is nicely presented and very easy to read) for the details.

First, under both systems, scores are adjusted twice for the difficulty of the course. There's the course rating, which specifies how hard the course is for excellent (scratch) golfers, and the slope rating, which specifies how hard the course is for worse (bogey) golfers after adjusting for the course rating.

Then, there's something called "equitable stroke control" (ESC). That sets a maximum possible score for each hole, so that (for instance) a bad golfer can't score more than a quadruple bogey. Even if it takes him ten strokes to finish a par-three, he can't put more than 7 down on the scorecard. (In Canada, the stroke limit varies by handicap between bogey and quadruple-bogey; in the US, it seems the limits are fixed and not based on par. Swartz says this is the only difference in the current system between the two countries.)
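Here's a sketch of how the ESC cap works on a scorecard. I'm using the quadruple-bogey variant for every hole; as noted, the real limit depends on the golfer's handicap (and, in the US, isn't based on par at all):

```python
# A sketch of equitable stroke control (ESC): cap each hole's score at
# a maximum before the round counts toward the handicap. The cap rule
# here (quadruple bogey, i.e. par + 4, on every hole) is a simplifying
# assumption; real limits vary with the golfer's handicap.
def esc_adjust(hole_scores, pars, max_over_par=4):
    return [min(score, par + max_over_par)
            for score, par in zip(hole_scores, pars)]

# A ten on a par three gets written down as a seven.
print(esc_adjust([10, 5, 6], [3, 4, 5]))   # [7, 5, 6]
```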

The idea is that very high scores measure golfer frustration rather than skill, and should be discounted. Also, Swartz says, it discourages "sandbagging," which is deliberately trying to inflate your handicap, and provides a maximum if you forget to write down your score.

In this study, Swartz often gives results both with ESC and without, and the results are fairly similar.

But, after all those adjustments, I think the essence is: