Sabermetric Research: A new golf handicapping system

The Royal Canadian Golf Association (RCGA), Canada's governing body for golf, has a committee to consider updating the system by which a golfer's handicap is computed. Tim B. Swartz, the committee's statistical guy, has a paper in the most recent JQAS explaining the new proposed system.

I'm going to simplify things a bit and explain the situation as I understand it from the paper. Leaving out the technical adjustments (even at the cost of a bit of inaccuracy), I'll describe the current handicap system like this:

You start with your 20 most recent scores (with a few adjustments that I'll discuss later), relative to par. Then, you drop the 10 worst scores, leaving only the 10 best. You average those 10 best. That's your handicap.
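
Here's a minimal Python sketch of that simplified calculation. It leaves out every adjustment (course rating, slope, ESC, and so on), and the function name and parameters are mine, not anything from the paper.

```python
def simple_handicap(scores_vs_par, window=20, keep=10):
    """Average of the `keep` best (lowest) of the last `window` scores.

    `scores_vs_par` is a list of scores relative to par, oldest first.
    This is the simplified version described above, with no adjustments.
    """
    recent = scores_vs_par[-window:]
    if len(recent) < window:
        raise ValueError(f"need at least {window} scores")
    best = sorted(recent)[:keep]  # lower is better in golf
    return sum(best) / keep
```

For a golfer whose last 20 rounds were +1, +2, ..., +20, the best ten are +1 through +10, so the handicap comes out to 5.5.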

Why change this system? For one thing, as Swartz points out, your handicap isn't a true indication of your expected score; golfers fail to shoot their handicap more often than not. As I see it, you're taking the average of the top half, so you'd expect the handicap to be at about the 25% mark. If you assume that scores are normal, then since the normal curve is fatter near the middle, it's a bit more than 25%. As it turns out, the mean of the right half of the normal curve is about ~~.6367~~ .798, and the chance of beating a Z-score of ~~+.6367~~ +.798 is about ~~26.2%~~ 21.2%. So you'd expect a golfer to beat his handicap about 21% of the time.
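
If you want to check that arithmetic, here's a short sketch using Python's standard library. It assumes scores are exactly normal, which is only an approximation, and the variable names are just for illustration.

```python
import math
from statistics import NormalDist

# Mean of one half of a standard normal curve: E[Z | Z > 0] = sqrt(2/pi)
half_normal_mean = math.sqrt(2 / math.pi)

# Chance that a single round lands beyond that half-curve mean
p_beat = 1 - NormalDist().cdf(half_normal_mean)

print(round(half_normal_mean, 3), round(p_beat, 3))  # 0.798 0.212
```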

Swartz checked, using a database of scores from a golf club in Alberta. As it turns out, golfers actually beat their handicap 36% of the time, not 21%. Maybe I made a mistake in the calculation; maybe golf scores aren't really normal; or maybe the various adjustments are causing the difference.

Another problem with the current system is that in casual head-to-head play, it favors the better golfer. Swartz generated a bunch of random matches from his database, and found that the better golfer won 55 percent of the time, rather than 50 percent.

A third problem, and an important one, is that in multi-player tournaments, the winner is likely to be a golfer with a higher handicap. That's because a bad golfer, with a handicap of (say) 20 (which represents a score of about 92), could reasonably have a very good day and shoot an 80, finishing at -12. But a scratch golfer (0 handicap, 72 average) is much less likely to match the -12 by shooting a 60 on the day.

The more players in the tournament, the more likely someone will have a much better game than normal. And those "much betters" are likely to be from the worst golfers.

In his simulation, Swartz found that the top third of golfers won only 27% of the 99-player tournaments. The middle third won 33%, and the worst third won 40%. So the current system favors the better golfer in tournaments of two players, but favors the worse golfers in tournaments of many players.
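
To see why big fields favor the high handicappers, here's a toy simulation. It is not Swartz's simulation and doesn't use his data: it assumes every player's net score is normal around zero, and that the worse golfers have a wider spread (SDs running from 3 to 4 strokes, numbers I made up for illustration).

```python
import random

def simulate_tournaments(n_players=99, n_tournaments=2000, seed=0):
    """Share of tournaments won by the best, middle, and worst thirds.

    Players are ordered best to worst. Each player's net score is drawn
    from a normal curve centered on zero; only the spread differs.
    """
    rng = random.Random(seed)
    # Hypothetical spreads: 3.0 strokes for the best player, 4.0 for the worst
    sds = [3.0 + i / (n_players - 1) for i in range(n_players)]
    wins_by_third = [0, 0, 0]
    for _ in range(n_tournaments):
        nets = [rng.gauss(0.0, sd) for sd in sds]
        winner = min(range(n_players), key=nets.__getitem__)  # lowest net wins
        wins_by_third[winner * 3 // n_players] += 1
    return [w / n_tournaments for w in wins_by_third]
```

Even though everyone has the same expected net score here, the winner of a 99-player field is whoever has the most extreme good day, and that's more likely to be someone with a big spread.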

So how does Swartz fix the current system? Two ways. First, he makes the handicap represent the player's average score, instead of roughly his 79th percentile score (the flip side of the 21% figure above). Second, he divides by a player's standard deviation (effectively converting a raw score to a Z-score), which neutralizes the luck factor in large tournaments.

Here are the details.

Like the current system, Swartz considers only the 20 most recent scores. But instead of dropping the worst 10, he drops only the worst four, leaving 16 scores. Then, instead of just averaging them, the new system uses statistical techniques to estimate the best normal curve to fit the data (keeping in mind that the four worst scores are missing). That is, it asks the question: what is the best-fit normal curve that takes into account that we're looking at only the best 16 of 20 observations?

Swartz gives linear formulas (like a Linear Weights estimate of the 16 scores) to estimate the mean and SD of that best-fit curve; he says that those formulas are minimum variance linear unbiased estimators, which means you can't do better (by using different weights) unless you go to a non-linear estimator.
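
I don't reproduce Swartz's weights here. As a rough stand-in, the sketch below fits a straight line through the 16 kept scores against the approximate expected positions of the 16 lowest out of 20 normal draws (Blom's approximation). That's also linear in the scores, but it's ordinary least squares, not the minimum variance estimator from the paper, and the function name is hypothetical.

```python
from statistics import NormalDist

def fit_normal_from_best(scores, n_total=20, n_kept=16):
    """Estimate (mean, sd) of a normal curve from the n_kept lowest scores.

    Regresses the kept order statistics on Blom's approximate expected
    standard normal order statistics for a sample of size n_total.
    """
    if len(scores) != n_total:
        raise ValueError(f"expected {n_total} scores")
    kept = sorted(scores)[:n_kept]  # drop the worst (highest) scores
    std_normal = NormalDist()
    m = [std_normal.inv_cdf((i - 0.375) / (n_total + 0.25))
         for i in range(1, n_kept + 1)]
    m_bar = sum(m) / n_kept
    x_bar = sum(kept) / n_kept
    # Slope of the least-squares line is the SD estimate
    sd = (sum((mi - m_bar) * (xi - x_bar) for mi, xi in zip(m, kept))
          / sum((mi - m_bar) ** 2 for mi in m))
    mean = x_bar - sd * m_bar  # intercept is the mean estimate
    return mean, sd
```

Note that the plain average of the 16 kept scores would be too low, since the four worst rounds are gone; the intercept of the fitted line corrects for that.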

Those estimates of mean and SD become the player's stated handicap (so, effectively, there are two numbers for the handicap instead of one). Then, for his next (21st) round, his raw score is converted to a Z-score, and that's what gets compared to the other players' Z-scores to determine the winner.
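
The comparison step itself is simple. Here's a sketch, with made-up players and numbers, and helper names of my own:

```python
def z_score(raw_score, mean, sd):
    """Convert a raw score to a Z-score against the player's own curve."""
    return (raw_score - mean) / sd

def pick_winner(entries):
    """entries maps player name -> (raw_score, mean, sd). Lowest Z wins."""
    return min(entries, key=lambda name: z_score(*entries[name]))

# Hypothetical match: a steady good golfer against an erratic weaker one
players = {
    "A": (75, 74.0, 2.0),  # one stroke worse than usual: Z = +0.5
    "B": (88, 92.0, 5.0),  # four strokes better than usual: Z = -0.8
}
winner = pick_winner(players)  # "B"
```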

In the study's simulations, golfers shot their handicap 45% of the time with this new system (fairer than 36% with the old system); one-on-one matchups were won by the better golfer only 48 to 51 percent of the time (fairer than 55%); and in tournaments, the best golfers won 29% of the time (fairer than 27%) while the worst won 32 to 34 percent of the time (fairer than 40%).

I promised some details of the adjustments to player scores that go into the formulas. I'll outline them here, and you can see the paper (which is nicely presented and very easy to read) for the details.

First, under both systems, scores are adjusted twice for the difficulty of the course. There's the course rating, which specifies how hard the course is for excellent (scratch) golfers, and the slope rating, which specifies how hard the course is for worse (bogey) golfers after adjusting for the course rating.

Then, there's something called "equitable stroke control" (ESC). That sets a maximum possible score for each hole, so that (for instance) a bad golfer can't score more than a quadruple bogey. Even if it takes him ten strokes to finish a par-three, he can't put more than 7 down on the scorecard. (In Canada, the stroke limit varies by handicap between bogey and quadruple-bogey; in the US, it seems the limits are fixed and not based on par. Swartz says this is the only difference in the current system between the two countries.)
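
As a sketch of how the cap works, assuming a single fixed limit relative to par (the real Canadian limit depends on the golfer's handicap, so `max_over_par` here is a hypothetical parameter):

```python
def apply_esc(strokes, pars, max_over_par=4):
    """Cap each hole's score at par + max_over_par (4 = quadruple bogey)."""
    return [min(s, p + max_over_par) for s, p in zip(strokes, pars)]

# Ten strokes on a par-three gets recorded as 7; a par on a par-four is untouched
capped = apply_esc([10, 4], [3, 4])  # [7, 4]
```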

The idea is that very high scores measure golfer frustration rather than skill, and should be discounted. Also, Swartz says, it discourages "sandbagging," which is deliberately trying to inflate your handicap, and provides a maximum if you forget to write down your score.

In this study, Swartz often gives results both with ESC and without, and the results are fairly similar.

But, after all those adjustments, I think the essence is:

-- the old system adjusts for your score relative to your own 63rd percentile.
-- the new system adjusts any outliers in your worst quintile, and gives you a Z-score relative to your own distribution.

I'm not a serious golfer, so I don't know if the added complexity of the new system is worth the advantages. It does seem to me that the new system is better, though.

Labels: golf

7 Comments:

At Tuesday, June 02, 2009 3:27:00 PM, JavaGeek said...: I believe the right-half normal curve's mean is E(X|x>0) = 0.798, which has a p(x<-0.798)=0.212.

Your link says it should be sqrt(2/Pi), which is what I calculated above.

RE 21% vs. 36%:
I'm curious what type of golfers we are dealing with in Alberta; I take it these are not pros. Would it be unreasonable to presume that a golfer gets better after playing 20 rounds? Would a -1 or -2 improvement be about right for playing that much golf?

- The normality conclusion seems correct (the data itself is almost normal, and 18 holes is almost enough for a reasonable CLT). You can always use a more flexible distribution, with very similar results.
- The "adjustments" should average out.

At Tuesday, June 02, 2009 5:43:00 PM, Phil Birnbaum said...: Aargh! I forgot to take the square root. Thanks! Will fix the post.

So, as you say, 36% is not just too high, it's WAY too high. I suppose you could improve a bit in 20 rounds ... you could check by seeing if scratch golfers beat their handicap more often than bad golfers. You'd think it'd be harder to improve if you're already shooting par a good portion of the time.

At Thursday, June 04, 2009 12:10:00 PM, Don Coffin said...: To me, the real problem is that, for many golfers, this system leads to a handicap that is calculated in a manner that is hard to understand and hard to explain. There is surely some tradeoff between improved accuracy and degree of transparency.

At Tuesday, June 09, 2009 12:30:00 PM, Kiran Rasaretnam said...: There are two issues that I see with the golf handicapping system (current and new) that aren't addressed in any formal manner.

The first is the tendency for golfers to "sandbag" their reporting of golf scores. I suppose one could make the argument that the newer system will reduce the likelihood of sandbagging, as it's supposed to represent the expected score of a golfer as opposed to the potential ability of the golfer.

The second is that neither system accounts for the time period over which the 20 rounds are included. 20 rounds in 2 months is a lot different than 20 rounds over 2 years, but bith systems treat those the same way...

Great post.

Cheers,
Kiran

At Tuesday, June 09, 2009 1:28:00 PM, Kiran Rasaretnam said...: I meant to say "both", as opposed to "bith".

One additional comment regarding time period. Ideally, the handicapping system should also take into account recency of scores, so 20 rounds completed in 2 months a year ago is treated differently than 20 rounds completed in the last 2 months....

At Wednesday, June 24, 2009 1:20:00 PM, parinella said...: I'm a little late to this, but comments:
--Golfers aren't going to get better over 20 rounds unless they are new or they have some specific game improvement program.
--Good golfers win more head-to-head matches in part because there is a 96% multiplier (average of 10 best rounds * 0.96) specifically designed to reward better golfers.
--ESC in USA varies with handicap.
--Handicap is intended to be what a golfer is capable of. Dropping out bottom 10 also reduces one type of sandbagging (inflating bad scores). This and the 96% thing are features, not bugs.
--The distribution has a longer right tail than a normal one does. I have 200 rounds over 10 years. Low differential is 6.0, average handicap of 10.5 (varied at any time between 9 and 12), median is 13.5, mean is 14.1, max is 26. (Actual score distribution would have an even wider tail, since ESC chops off the really bad holes.)
--Dean Knuth helped develop the handicap system and knows everything. http://www.popeofslope.com/ From that page:
The odds of scoring better than your handicap in any given round are one in five, according to Dean Knuth, the former USGA handicapping official who devised the Slope rating system and whose Web site, popeofslope.com , is a fount of fun facts about handicaps. The chances of beating your handicap by three strokes? One round in 20. By eight strokes? One round in 1,138. For most of us, that's once in a lifetime. Wall Street Journal,
"The Genius of Handicapping", Nov 1, 2008, by John Paul Newport

At Tuesday, November 23, 2010 7:06:00 AM, Unknown said...: All of which is true but theoretical.
1.Even if no one sandbagged, the following complicating factors exist
* Some golfers have a much wider spread of scores than others
* Some golfers are less able to cope with winter conditions (effective course playing length much longer)
* some golfers are not good under the stress of match play

And in terms of the use of a handicap: -
* Why is winning 50% of the time fair? Surely a better definition of fair would be winning 50% of the rounds in which a golfer played reasonably (Say within his buffer zone)?
* how can you expect the same handicap to work for a large field competition and also for a one on one?

IMHO it is asking too much to expect a single simple handicap system to work in all these cases and bias is bound to exist.


Monday, June 01, 2009
