Sabermetric Research: An economist predicts the Olympic medal standings

Daniel Johnson is an economics professor who, according to Forbes magazine, makes "remarkably accurate" predictions on how many Olympic medals each country will win. But I'm not sure, based on the description given, that the predictions are all that remarkable.

From the article, it sounds like what Johnson is doing is running some kind of regression, on "per-capita income, the nation's population, its political structure, its climate and the home-field advantage for hosting the Games or living nearby." He doesn't consider anything specific about the sports or athletes.

How accurate are Johnson's predictions? I'm not really sure. Forbes says,

"Over the past five Olympics, from the 2000 Summer Games in Sydney through the 2008 Summer Games in Beijing, Johnson's model demonstrated 94% accuracy between predicted and actual national medal counts. For gold medal wins, the correlation is 87%."

What does that mean? From the word "correlation," my guess is that those numbers are the correlation coefficient, or "r". But an r of .94 doesn't mean that the predictions are 94% accurate. It means only that the predictions and the actual medal counts fall close to *some* best-fit straight line, and that line doesn't have to be the one where the prediction equals the result. It's possible to be wrong, perhaps badly wrong, in every guess, but still have an r of 1, which means 100% correlation. For instance, if you put every country twice as far from the average as it actually finishes, so that you overpredict every above-average country and underpredict every below-average one, you'll get a perfect correlation, but really crappy guesses. Here's an example of how that might happen:

Country A: estimate 80, actual 65
Country B: estimate 60, actual 55
Country C: estimate 40, actual 45
Country D: estimate 20, actual 35

Regressing estimate on actual (or is it actual on estimate? I forget which direction the word "on" implies, but it doesn't matter here, because the correlation is the same either way) gives a 100% correlation, even though every guess is off by 5 to 15 medals.
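To check the arithmetic on the four-country example, here is a minimal Python sketch. The helper name `pearson_r` is mine, not anything from Johnson's paper; it just computes the ordinary Pearson correlation and sets it beside the average size of the miss.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Countries A through D from the example above.
estimates = [80, 60, 40, 20]
actuals = [65, 55, 45, 35]

r = pearson_r(estimates, actuals)
mean_abs_error = sum(abs(e - a) for e, a in zip(estimates, actuals)) / len(estimates)

print("r =", round(r, 6))                      # perfect correlation
print("mean absolute error =", mean_abs_error)  # average miss, in medals
```

The correlation comes out at 1 while the average miss is 10 medals, which is the whole point: r measures how well the guesses line up, not how close they are.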

Anyway, it might be some other method that Johnson uses to compute the accuracy percentage, but it's hard to evaluate the claims without an explanation.

(UPDATE: as this blog post was going to press, I found Johnson's website, which confirms that it *is* correlation. It still could be that it's some kind of method that doesn't have the flaw of my example above. The site contains a media release, but no actual copy of the paper, which was published in "Social Science Quarterly" in December, 2004.)

More importantly, you can't tell how impressive a set of predictions is without something to compare it to. At Forbes, commenter "Doubter" points this out, and tries using the results of the previous Olympics to predict the current one. For the top five countries, he gets an 85% accuracy rating, and correctly points out that "include a bunch of countries with stable medal counts (Jamaica, Japan, Nigeria, Kenya, most European countries) and I am sure it gets much better."

I'm pretty confident that if you were to just use a weighted average of previous Olympics results and adjust for home field advantage, you'd come pretty close to what Mr. Johnson was able to do. Forbes should have realized that the results probably aren't "remarkably accurate" -- just "accurate".
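For what it's worth, the naive baseline I have in mind could be sketched like this in Python. Everything here is an assumption for illustration: the function name `baseline_forecast`, the weights, and the host multiplier are mine, and the sample numbers are made up purely to exercise the function, not real medal counts.

```python
def baseline_forecast(past_counts, weights, host_multiplier=1.0):
    """Weighted average of a country's past medal counts (oldest first),
    scaled by an assumed multiplier if the country is hosting."""
    if len(past_counts) != len(weights):
        raise ValueError("need one weight per past Olympics")
    weighted_avg = sum(c * w for c, w in zip(past_counts, weights)) / sum(weights)
    return weighted_avg * host_multiplier

# Made-up counts for three previous Games, weighting recent ones more heavily.
made_up_history = [10, 14, 12]
recency_weights = [1, 2, 3]

away_forecast = baseline_forecast(made_up_history, recency_weights)
# The 1.5 multiplier is a placeholder, not an estimate of home field advantage.
host_forecast = baseline_forecast(made_up_history, recency_weights, host_multiplier=1.5)

print(round(away_forecast, 2), round(host_forecast, 2))
```

The fair test of Johnson's model would be to run something like this over the same Olympics and see how much worse its correlation really is.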

Also confusing is the estimate of home field advantage. There were no results given for the Winter Olympics, but for the summer games, the host team "typically garners 25 additional medals compared with its expected performance, 12 of them gold." It doesn't really make sense that the home field advantage should be a fixed number of medals. Shouldn't it be a percentage increase? Canada won 11 medals in 1976 in Montreal, none of them gold. That was a few more medals than usual, probably because of home field. Should they really have been expected to win minus 14 medals in 1988 in Seoul, of which minus 12 would be gold? Or, on the other hand, were they just unlucky in Montreal, where they should have won about 30, when they were in the single digits in 1968 and 1972?

Or, if Pakistan were to host the Olympics, would you really expect them to jump from (say) 1 medal to 26?
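The problem with a fixed bonus is easy to see if you write both versions down. This is only a sketch in Python: the 25 and 12 are the Forbes figures quoted above, the Canada and Pakistan counts are the ones from my examples, and the 1.5 proportional factor is a placeholder I made up, not an estimate.

```python
# Forbes figures quoted above: 25 additional medals for the host, 12 of them gold.
HOST_BONUS_TOTAL = 25
HOST_BONUS_GOLD = 12

def implied_away_count(host_count, bonus):
    """Under a fixed-bonus model, what the host 'should' win when not hosting."""
    return host_count - bonus

def additive_host_count(away_count, bonus):
    """Fixed-bonus model: the host gains the same number of medals regardless of size."""
    return away_count + bonus

def proportional_host_count(away_count, factor):
    """Percentage model: the host's usual count is scaled by a factor."""
    return away_count * factor

# Canada in Montreal, 1976: 11 medals, no golds.
canada_away_total = implied_away_count(11, HOST_BONUS_TOTAL)
canada_away_gold = implied_away_count(0, HOST_BONUS_GOLD)

# A country that usually wins 1 medal, like the Pakistan example.
small_country_additive = additive_host_count(1, HOST_BONUS_TOTAL)
ASSUMED_FACTOR = 1.5  # hypothetical, for illustration only
small_country_proportional = proportional_host_count(1, ASSUMED_FACTOR)

print(canada_away_total, canada_away_gold)                 # negative medal counts
print(small_country_additive, small_country_proportional)  # 26 versus 1.5
```

The additive version gives impossible negative expectations for a small medal-winner and a 26-fold jump for a one-medal country; a proportional version at least stays in a sensible range for both.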

Oh, and one more thing: in his 2010 predictions, Johnson has the top 13 countries winning 250 medals, but only 57 golds. Overall, golds make up 33.3% of all medals, but Johnson has those 13 countries winning only 22.8% golds. How come? An eyeballing of the 2006 chart shows about 1/3 golds for those countries then ... I wonder why the drop?
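Here is that gold-share arithmetic as a short Python sketch, using the 13 predicted (total, gold) pairs from the Forbes table, which a commenter reproduces below. The variable names are mine.

```python
# Johnson's 2010 predictions as (total medals, gold medals), from the Forbes table.
predictions = {
    "Canada": (27, 5), "United States": (26, 5), "Norway": (26, 4),
    "Austria": (25, 4), "Sweden": (24, 4), "Russia": (23, 8),
    "Germany": (20, 7), "Italy": (19, 3), "Finland": (14, 4),
    "Switzerland": (13, 4), "China": (12, 2), "South Korea": (11, 4),
    "Netherlands": (10, 3),
}

predicted_total = sum(total for total, _ in predictions.values())
predicted_gold = sum(gold for _, gold in predictions.values())
gold_share = predicted_gold / predicted_total

print(predicted_total, predicted_gold, f"{gold_share:.1%}")
```

That reproduces the 250 medals, 57 golds, and 22.8% gold share, well short of the one-third you'd expect if golds were spread like the other medals.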

Labels: Forbes, forecasting, olympics

5 Comments:

At Thursday, February 18, 2010 1:51:00 PM, BMMillsy said...: As for the regression coefficient for home field, maybe it's a proper censoring/truncating in the model, but they report a simple mean estimate of the medal increase for being at home. I'd have to read the paper though.

I haven't read the paper, but in response to the last comment, it could be a selection bias toward countries with fewer athletes. Perhaps when the smaller countries actually send athletes to the games, it's because they're so far and away better at that sport than others. If you're only going to send 5 athletes to the games, there doesn't seem to be much reason to send athletes that aren't likely to win a gold.

I'm not sure what the answer is, but it seems plausible that, despite the movie Cool Runnings, Jamaica wouldn't find it worth sending a couple of athletes to Vancouver to come in 4th place. However, since the US already has a team there, why not send some extra athletes and increase the chances of getting more total medals? The marginal cost of sending one more marginal athlete would seem pretty small compared to what it would be for another country. But I'm just thinking aloud.

At Thursday, February 18, 2010 2:23:00 PM, Phil Birnbaum said...: Yup, it could be that Forbes quoted what Johnson found that the average host gains, but phrased it as what *every* host is expected to gain.

I don't have the paper to verify that.

BTW, 25 medals of HFA seems like a *lot*. It's at least a doubling for all but the top few countries.

At Thursday, February 18, 2010 2:35:00 PM, BMMillsy said...: Ah. That's one thing I had no clue about: how many do these countries actually win? Doubling seems like a *lot*, as you say.

At Sunday, February 21, 2010 8:56:00 PM, Anonymous said...: Looking at his predictions: Italy & Finland are both way overrated. They haven't even made a dent in the Vancouver games thus far. Underrated? The U.S. (who have been dominating) and South Korea (which is right up there in the standings right now).

At Saturday, March 20, 2010 3:04:00 PM, Anonymous said...: Not such a great prediction. 6 of 13 actually finished within 1 rank in the medal count of their predicted rank.

Predicted / Actual
1st / 3rd
2nd / 1st
3rd / 4th
4th / 5th
5th / 8th
6th / 6th
7th / 2nd
8th / 15th
9th / 17th
10th / 11th
11th / 9th
12th / 7th
13th / 12th
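As a rough check on that count, here is a small Python sketch that tallies the rank misses from the predicted/actual list above. The summary measures (count within one rank, average miss) are just one way of scoring it, not anything from the Forbes article.

```python
# (predicted rank, actual rank) pairs from the list above.
rank_pairs = [
    (1, 3), (2, 1), (3, 4), (4, 5), (5, 8), (6, 6), (7, 2),
    (8, 15), (9, 17), (10, 11), (11, 9), (12, 7), (13, 12),
]

rank_misses = [abs(predicted - actual) for predicted, actual in rank_pairs]
within_one = sum(1 for miss in rank_misses if miss <= 1)
mean_rank_miss = sum(rank_misses) / len(rank_misses)

print(within_one, "of", len(rank_pairs), "within one rank")
print("average miss:", round(mean_rank_miss, 2), "places")
```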

Predictions for 2010

Nation           Total   Gold
Canada              27      5
United States       26      5
Norway              26      4
Austria             25      4
Sweden              24      4
Russia              23      8
Germany             20      7
Italy               19      3
Finland             14      4
Switzerland         13      4
China               12      2
South Korea         11      4
Netherlands         10      3

Source: http://www.forbes.com/2010/01/19/olympic-medal-predictions-business-sports-medals-table.html

Actual Results for 2010

Nation           Total   Gold
United States       37      9
Germany             30     10
Canada              26     14
Norway              23      9
Austria             16      4
Russia              15      3
South Korea         14      6
Sweden              11      5
China               11      5
France              11      2
Switzerland          9      6
Netherlands          8      4
Czech Republic       6      2
Poland               6      1
Italy                5      1
Japan                5      0
Finland              5      0
Australia            3      2
Slovakia             3      1
Belarus              3      1
Slovenia             3      0
Croatia              3      0
Latvia               2      0
Great Britain        1      1
Kazakhstan           1      0
Estonia              1      0

Source: http://www.nbcolympics.com/medals/2010-standings/index.html
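With both tables in hand, the same kind of check the post describes can be run on the 13 predicted countries. This Python sketch is only an illustration: it computes a Pearson correlation and an average miss on total medals for these 13 countries, which is not necessarily how Johnson or Forbes scored the model, and the helper name `pearson_r` is an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Total medals as (predicted, actual), taken from the two tables above.
totals = {
    "Canada": (27, 26), "United States": (26, 37), "Norway": (26, 23),
    "Austria": (25, 16), "Sweden": (24, 11), "Russia": (23, 15),
    "Germany": (20, 30), "Italy": (19, 5), "Finland": (14, 5),
    "Switzerland": (13, 9), "China": (12, 11), "South Korea": (11, 14),
    "Netherlands": (10, 8),
}

predicted = [p for p, _ in totals.values()]
actual = [a for _, a in totals.values()]

r_totals = pearson_r(predicted, actual)
mean_abs_miss = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(totals)

print("r on total medals:", round(r_totals, 2))
print("average miss:", round(mean_abs_miss, 2), "medals")
```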


Thursday, February 18, 2010
