Wednesday, March 19, 2008

Guest post: Don Coffin on golf performance measures

(This is Phil. In response to a previous post on golf scores, Don Coffin, an economist and sabermetrician from Indiana University Northwest, did a little extra research. Here's Don:)


I [Don] have done some research on the relationship between various measures of golfer performance and overall performance (strokes per round; prize money), but have found it difficult to know exactly where to go with it. Here’s how I have approached it.

Overall performance, measured as strokes per round, has three components:

1. Shots off the tee. There is essentially no variation in this measure of performance: everyone takes one tee shot per hole, or 18 per round, and the standard deviation is almost zero.

2. Putts. There is some variation here, but less than one would expect. In 2007, according to data at, the average number of putts per round (averaged across golfers, so this is an average of averages) was 29.30, with a standard deviation of 0.52. The coefficient of variation was 1.77%.

3. All other shots. There's slightly more variation here. Again using 2007 data, the average was 23.98 “other” shots per round, with a standard deviation of 0.63 and a coefficient of variation of 2.62%.

Overall, golfers in 2007 averaged 71.28 strokes per round, with a standard deviation of 0.59 strokes per round, for a coefficient of variation of only 0.83%. Measured by the coefficient of variation, then, overall performance was less variable than either of the two scoring components that show any variation.
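As a check on the arithmetic above: the coefficient of variation is the standard deviation expressed as a percentage of the mean. The Python sketch below recomputes it from the rounded figures quoted in the post. Assuming tee shots count exactly 18 per round, strokes = putts + other + 18, so Var(strokes) = Var(putts) + Var(other) + 2 Cov(putts, other), and the three reported standard deviations imply a covariance between putts and other shots across golfers. The helper name `coefficient_of_variation` is illustrative, not from the post.

```python
# 2007 figures quoted in the post: (mean, standard deviation) across golfers
stats = {
    "putts": (29.30, 0.52),
    "other": (23.98, 0.63),
    "strokes": (71.28, 0.59),
}


def coefficient_of_variation(mean, sd):
    """Standard deviation as a percentage of the mean."""
    return 100.0 * sd / mean


for name, (mean, sd) in stats.items():
    print(f"{name:8s} CV = {coefficient_of_variation(mean, sd):.2f}%")

# Assumption: exactly 18 tee shots per round, so strokes = putts + other + 18
# and Var(strokes) = Var(putts) + Var(other) + 2 * Cov(putts, other).
sd_p, sd_o, sd_s = stats["putts"][1], stats["other"][1], stats["strokes"][1]
implied_cov = (sd_s**2 - sd_p**2 - sd_o**2) / 2
implied_corr = implied_cov / (sd_p * sd_o)
print(f"implied Cov(putts, other)  = {implied_cov:.4f}")
print(f"implied Corr(putts, other) = {implied_corr:.2f}")
```

With the rounded inputs, the CVs come out at about 1.77%, 2.63% and 0.83%, matching the post to within rounding. Taken at face value, the standard deviations imply a correlation of roughly -0.49 between putts and other shots across golfers, which is one arithmetic way to see how total strokes can have a smaller standard deviation than other shots alone.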

The PGA reports a number of what it calls “skill statistics;” all of these are reported in Table 1. (Putts per round shows up in the “skill statistics;” “Other Shots” is Strokes per Round, minus Putts per Round, minus 18). If our objective is to explain overall performance, as measured by Strokes per Round, then we have to select explanatory variables from among the available performance measures. For 2007, the PGA reported all the “skill statistics” data for 196 golfers.
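The definition of “Other Shots” in the parenthesis above is a simple subtraction. A minimal sketch in Python, assuming the fixed 18 tee shots per round described earlier (the function and constant names are hypothetical):

```python
TEE_SHOTS_PER_ROUND = 18  # one tee shot per hole, as assumed in the post


def other_shots(strokes_per_round: float, putts_per_round: float) -> float:
    """'Other Shots' = Strokes per Round - Putts per Round - 18."""
    return strokes_per_round - putts_per_round - TEE_SHOTS_PER_ROUND


# Applied to the 2007 tour averages quoted above
print(round(other_shots(71.28, 29.30), 2))  # 23.98
```

Because the definition is linear, applying it to the tour-average Strokes and Putts gives the tour-average Other Shots (23.98), consistent with the figure quoted earlier; applied per golfer, it would give each golfer's Other Shots from the “skill statistics.”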

I believe it is inappropriate to use Putts per Round as an explanatory variable. Putts are themselves a component of Strokes per Round, so if we could control adequately for the other performance measures, the (expected) coefficient on Putts in a multiple regression would simply be 1: each additional putt would raise Strokes per Round by 1. What would be useful, however, would be to find explanatory factors for the components of Strokes per Round, namely Putts and Other Shots.
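The point about the coefficient on Putts can be illustrated with simulated data. The sketch below draws hypothetical season averages for 196 golfers (the count reported for 2007; the draws are not real PGA data, and treating putts and other shots as independent normals is an assumption), builds Strokes per Round from the identity strokes = 18 + putts + other, and fits an ordinary least-squares regression with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 196  # number of golfers with complete 2007 "skill statistics"

# Simulated (not real) season averages, drawn to roughly match the quoted means/SDs
putts = rng.normal(29.30, 0.52, n)
other = rng.normal(23.98, 0.63, n)
strokes = 18 + putts + other  # scoring identity: tee shots + putts + other shots

# OLS of strokes on [intercept, putts, other]
X = np.column_stack([np.ones(n), putts, other])
coef, *_ = np.linalg.lstsq(X, strokes, rcond=None)
print(np.round(coef, 6))  # intercept close to 18, both slopes close to 1
```

Because the identity holds exactly, the fit is perfect and the coefficient on putts reflects only the accounting, not anything about what makes a golfer good. That is the sense in which Putts per Round adds little as an explanatory variable for Strokes per Round.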

( ... continued)



At Friday, March 21, 2008 12:56:00 AM, Blogger Unknown said...

Reading this and some of the other articles on "golfermetrics," I think it is becoming apparent that regression is a flawed technique for analyzing causal relationships.

If we think back to regression 101, we know that we should identify the variable we want to predict (the dependent variable) and the possible independent variables.

The critical word here is "independent".

Ignore sports for a second and suppose we were trying to find a relationship between cancer and other variables. We might choose variables such as whether the person is a smoker, whether the family has a history of cancer, whether the person lives near a nuclear reactor, and so on. It is easy to see that these variables are (relatively) independent.

The problem with the golf example presented by Don is that the variables are far from independent.

Part of this stems from the fact that golf is a linear game. For instance, it is impossible to putt unless you hit a drive. This means that putting and driving are not independent variables. What you do on your drive *directly* affects what you do on your approach which *directly* affects your putting.

The clue is in the regression equation itself. Take the final putts regression. Its independent variables include hit fairway, GIR (greens in regulation), and drive distance, among others.

Take GIR. If you tried to predict GIR, you would use other independent variables from the putts regression, especially drive distance and hit fairway. In other words, the variables are not independent.

Phil -- this is a classic case of an intermediate outcome as discussed in your previous golf article. And that is going to be a problem with any golf regression.
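The intermediate-outcome problem can be illustrated with a small simulation. The sketch below assumes a made-up causal chain in which drive distance affects GIR and GIR affects putts, with no direct path from distance to putts; the effect sizes, noise levels, and the helper `ols` are all hypothetical. It shows that the coefficient on drive distance depends on whether the intermediate outcome (GIR) is also included as a regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 196

# Hypothetical causal chain (effect sizes invented for illustration only):
# drive distance -> GIR -> putts, with no direct distance -> putts path.
distance = rng.normal(0.0, 1.0, n)
gir = 0.8 * distance + rng.normal(0.0, 0.6, n)
putts = 0.5 * gir + rng.normal(0.0, 0.5, n)


def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Distance alone picks up its indirect effect through GIR (about 0.8 * 0.5 = 0.4).
total = ols(putts, distance)
# With GIR included, the distance coefficient collapses toward 0.
direct = ols(putts, distance, gir)

print("corr(distance, GIR) =", round(np.corrcoef(distance, gir)[0, 1], 2))
print("distance coef, GIR omitted: ", round(total[1], 3))
print("distance coef, GIR included:", round(direct[1], 3))
```

Neither regression is wrong as such; they estimate different things (the total effect of distance versus its effect holding GIR fixed). When regressors sit on the same causal chain, as in this simulation, their coefficients are hard to interpret on their own.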

This is why the t-tests and coefficients aren't particularly stable, and, for me, it reduces the value of regression in this sport.


