Monday, February 26, 2007

Do women choke in pressure situations?

Of top corporate executives, only one in 40 is a woman. That's perhaps because women have a higher tendency to choke under pressure.

That's according to a study by Israeli researcher M. Daniele Paserman. Paserman analyzed the results of men's and women's tennis tournaments. He found that men scored roughly the same percentage of "unforced errors" regardless of the clutchness of the situation, but women's unforced error rate went up significantly when the chips were down.

(Hat tip to Slate and Steven E. Landsburg, whose summary of the study is

(Tennis scoring is described here. Basically, four points make a game and six games make a set. A match is best of three sets for women, or best of five sets for men. There are other rules, like you have to win by two – see the link for details.)

Generally, points are divided, after the fact, into one of three types. "Winners" are volleys that the other player can't handle. "Unforced errors" occur when an opponent "has time to prepare and position himself" but makes a point-ending mistake. And "forced errors" are mistakes that were in part caused by a skilled return from the opponent. These are just to describe what happened – they don't affect scoring or ranking or anything.

Men and women have different percentages of the error types. According to the study, the numbers look like this:

Men .... 31% winners, 30% unforced errors, 35% forced errors
Women .. 30% winners, 37% unforced errors, 30% forced errors

The study doesn’t go into detail about why the overall numbers are different, but my guess is that it has to do with the relative strength of the sexes. Men, being stronger, have faster serves and returns than women; faster volleys would lead to opponents having less time to react to a shot, which would increase the incidence of forced errors. The more forced errors, the fewer unforced errors (since the three types sum roughly to 100%).

That may not be the correct explanation, but, in any case, the point is that women's higher percentage of unforced errors doesn't necessarily mean they're chokers, or more careless players in general. (And the study makes no such claim.)

Paserman starts by comparing unforced error rates in two situations – one more important (the last set of the match – 5th for men, 3rd for women), and the other less important (all other sets). After adjusting for a whole bunch of things – player quality, location, etc. -- he finds that men make about 1.4 percentage points more unforced errors in the important situations, but women make 2.9 percentage points more:

Men .......... +1.4 percentage points (1.9 standard deviations)
Women ........ +2.9 percentage points (3.5 standard deviations)
Difference: .. +1.5 percentage points (1.4 standard deviations)

The results moderate slightly when, instead of using a binary "yes/no" variable corresponding to the last set, he uses a sliding scale for how "important" the set is. In that situation, the women's number falls to +2.4 (1.9 standard deviations).

Paserman then switches from set-level data to play-by-play data, which is where it gets more interesting. First, he figures out the "importance" of any given point, in terms of how much it affects the outcome of the set. In a tiebreaker, importance would be high, but when one player already leads 5 games to zero, it would be very small. (In baseball, Tangotiger calls this "leverage".)

Then, he divides all the situations into quartiles, and calculates unforced error percentage in each.

Lowest importance ... Men 31% ... Women 34%
Quartile 2 .......... Men 30% ... Women 37%
Quartile 3 .......... Men 31% ... Women 39%
Highest importance .. Men 31% ... Women 40%

As you can see, women make almost 15% more unforced errors in high-pressure situations than in low-pressure situations. And there is no such effect for men.

These are controversial results, and Paserman ran a few checks to see if other factors might account for the effect.

-- he added variables for ability, which set in the match it is, which round of the tournament it is, and others. The effect remained.
-- he checked whether changing the definition of "importance" would change the results. The effect remained.
-- he replaced the "importance" variable with a series of dummy variables representing the actual game score. The effect remained. In fact, that "explains more than 40 percent of the variation in ... unforced errors."

Paserman then hypothesized that the errors might be the result of risk aversion. If women play safer with the match on the line, that will make things easier for opponents, and thus reduce the number of winning shots and forced errors. (As Steven Landsburg writes, "If both players just keep lobbing the ball back and forth, there can't be any forced errors, so all errors are recorded as unforced.")

To check for risk aversion, Paserman looked at first serves. (In tennis, a player is given two chances to serve: if the first one fails, the server gets one more opportunity.) If women are risk averse in the clutch, this should lead to a higher percentage of successful first serves.

And it does:

Lowest importance .......... baseline
Quartile 2 .......... Men +10% ... Women +11%
Quartile 3 .......... Men + 7% ... Women +25%
Highest importance .. Men + 7% ... Women +28%

If I've interpreted the results of the study properly, women are a full 28% more likely to hit a legal first serve in clutch situations than in meaningless situations. That can be considered a "conservative" strategy because there's a tradeoff between effectiveness and legalness – a risky shot is more likely to be out, but also more likely to win the point if it's in. And, in this case, the strategy of more legal serves was costly – women servers were 11% less likely to win the clutch points, even with (or perhaps "because of") the greater proportion of legal first serves.

Also, men hit harder first serves in more important situations (+1.5 mph difference between the two extremes of importance), while women hit softer ones (-1.5 mph). In second serves, men's speeds dropped by 1.7 mph, but women's dropped by a much larger 3.5 mph. Again, this is evidence that women are more conservative.

Finally, conservative play is evident in the number of strokes per rally. Men's strokes per point (both players combined) increased by 1.4, and women's by 1.2. By this measure, though, it looks like it's the men who are more conservative.

In his conclusions, Paserman does argue that it would be dangerous to extrapolate from poor judgement in tennis results to poor judgement in corporate management. However, he does say that

"... this sample is more representative of the extreme right tail of the talent distribution that is of interest for understanding the large under-representation of women in top corporate jobs ..."

The impression you get is that Paserman does think the results have some value as evidence on the question of women's achievement in other fields.


Well, I'm not so sure.

First, here are the percentages for the three types of women's shots for all four quartiles:

Lowest importance ... 29% winners, 34% unforced, 31% forced
Quartile 2 .......... 30% winners, 37% unforced, 31% forced
Quartile 3 .......... 29% winners, 39% unforced, 30% forced
Highest importance .. 30% winners, 40% unforced, 28% forced

Notice that the percentage of winners is constant, and the percentage of forced errors is also pretty steady. It's the percentage of unforced errors that varies a bit. But even then, and as the commenter at the end of Landsburg's article argues, it's mostly the 34% figure that's responsible for the trend. Change the top-middle cell to, say, 38%, and the effect pretty much disappears.

And that's not so farfetched. The first line adds up to only 94%, but the other three lines add up to 98%. Where did the other 4% go? Paserman doesn't tell us. If you add it on to unforced errors, you get to 38%, and there's barely any clutch effect left at all! Is it possible that some unforced errors are being misclassified for the low importance case?
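The row totals are easy to check. Here's a minimal sketch in Python that uses only the percentages from the table above; the variable names are mine, and the "adjusted" figure is just the what-if from the previous paragraph, not anything Paserman reports.

```python
# Women's shot-type percentages by importance quartile, as quoted above.
# Tuple order: (winners, unforced errors, forced errors).
women = {
    "Lowest importance": (29, 34, 31),
    "Quartile 2": (30, 37, 31),
    "Quartile 3": (29, 39, 30),
    "Highest importance": (30, 40, 28),
}

# Sum each row to see how much of 100% is accounted for.
row_totals = {label: sum(row) for label, row in women.items()}

for label, total in row_totals.items():
    print(f"{label}: {total}%")

# Hypothetical: assign the lowest quartile's shortfall (relative to the
# other rows) entirely to unforced errors.
shortfall = row_totals["Quartile 2"] - row_totals["Lowest importance"]
adjusted_unforced = women["Lowest importance"][1] + shortfall
print(f"Adjusted lowest-importance unforced errors: {adjusted_unforced}%")
```

With the adjusted figure, the unforced error column would run 38, 37, 39, 40 instead of 34, 37, 39, 40.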

So that's an important unanswered question.

Here's another: is it possible that the differences between men's and women's tennis might explain some of the differences?

For instance, there's the difference in rules -- five sets for men against three sets for women. And there are strategic differences. Just a few days ago, an article in the Globe and Mail talked about " ... the significance of strategy in the women's game -- as opposed to the men's game which is often dictated by power serves."

Could these be part of the explanation? Here's one attempt:

Suppose that on any given shot, you can try a strategy varying between two extremes. You can hit the ball very hard, hoping to catch your opponent off-guard and win the point right there. Alternatively, you can lob the ball over, hoping your opponent makes an unforced error and gives you the point.

Now, consider the "least important" quartile. Those are situations when it's obvious who's going to win the set. Maybe it's 5-0 in games. In that case, it would take a miracle for the other player to come back. Therefore, winning a point in this situation isn't that important. What's more important is wearing the other player out. If one player can make the opponent run around and get tired out, s/he gains an advantage that way even if the results of that set aren't affected.

What's the best way to tire the opponent out while not tiring yourself out? Maybe, instead of lobbing the ball, hit it hard. That way, the unwary opponent has to start and stop, change direction quickly, and perhaps run all over the place. And there's a good chance the power shot will end the point, so maybe you yourself won't have to run any more at all.

Of course, the opponent is playing the same strategy, and will exert less effort to return those hard shots (since it doesn't matter much who wins that point). This leads to shorter rallies (as observed).

Now, suppose the harder you hit the ball, the less control you have. Men hit the ball harder than women, and so more of their hard shots fail. That's an unforced error. And so, in the least important situations, men show up as having (relatively) more unforced errors than women.

Implausible, perhaps. But the idea that women intrinsically have trouble under pressure, isn't that implausible too?

Or another possibility: Suppose that judges tend to call an error "forced" if it came from a volley above X mph, but "unforced" if it came from a volley below X mph. If X is chosen such that it's easy for a man to achieve, but harder for a woman to achieve, that would lead to more unforced errors called against women overall (which is what the data show). In the lowest quartile, both men and women might try to hit hard shots to wear their opponents out. Even so, men would have roughly the same number of Xs as in important situations, since they hit Xs as a matter of course. But, if women only hit Xs by putting out extra effort, those shots will disproportionately appear in the first quartile, which will move that category's errors from unforced to forced. And again, that's what we see in the data.

Finally, a third theory: when the set is 5-0 and the outcome is all but certain, men have more incentive to slack off than women do. That's because the men have five sets to play, rather than three, and have a greater need to conserve energy. This leads, somehow, to a different pattern of play, one that leads to more "unimportant" unforced errors than for the women. I don't know what that pattern might be, but there is good reason to think it should be different.

Anyway, as unlikely as these hypotheses may seem, they do seem more reasonable than the "successful women choke under pressure" theory. Are there any other plausible theories that I didn't think of?


Wednesday, February 21, 2007

Is home advantage in boxing because of biased judging?

Here's still another study from the Home Field Advantage issue of "Journal of Sports Sciences," this one called "Do judges enhance home advantage in European championship boxing?"

In boxing, if one fighter does not knock out his opponent, the winner of the fight is determined solely by the opinions of judges – that is, entirely subjectively. If the "home field advantage" in sports is partly the result of biased officials, you would expect that boxing would show a very large HFA, compared to similar sports where judges are not as large a factor.

In the paper, authors N. J. Balmer, A. M. Nevill, and A. M. Lane look at the issue by, in effect, comparing boxing to itself. Specifically, they compare fights in which there was a knockout, to fights in which the judges decided the winner. Since knockouts are independent of the judges' scoring decisions, the authors argue that if HFA is larger in decisions than in knockouts, this represents evidence of judge bias.

And, after a bunch of logistic regressions that adjust for boxer quality, that's indeed what they found.

"For equally matched boxers, expected probability of a home win was 0.57 for knockouts, 0.66 for technical knockouts, and 0.74 for points decisions. ... We suggest that interventions should be designed to inform judges to counter home advantage effects."

But there's a big problem getting from the numbers to the conclusion. Specifically, there is no reason why you should expect the HFA in knockouts to equal the HFA in decisions.

For instance: suppose that when a boxer knows he's losing on points, late in a fight, he knows the only way he can win is to knock out his opponent. So he takes lots of chances, hoping to land a lucky punch for the knockout. The opponent who's leading, on the other hand, concentrates only on defense, hoping to protect himself from a knockout to win the fight on points.

If that scenario is the cause of most actual knockouts, then it's the *weaker* boxer, not the stronger, who wins the most knockouts. It's perfectly possible that the HFA might even be *negative* on fights that ended in knockouts. And that would be the case whether or not the judges were biased in the other 90% of the fights.

In general, selectively sampling games based on *after-the-fact* criteria can give you almost any HFA at all.

For instance, suppose you want to find a category of hockey games in which the intrinsic home winning percentage is over .900. Here's one, based on a simplified model of hockey:

Suppose team A and team B are equal. Overall, they each get 30 shots on goal per game, with a shooting percentage of .100 overall. But that's an average of the home team actually shooting .120, and the visiting team shooting .080 (for an expected score of 3.6 to 2.4).

I ran a simulation of this game, and, counting a tie as half a win, the home team has an overall winning percentage of around .695.
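The simulated figure can be cross-checked without any randomness. This is a sketch, not the simulation I actually ran: it assumes every shot is an independent trial, so each team's goal total is binomial, and it sums over all possible scores. The function and variable names are made up for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


SHOTS = 30  # shots on goal per team per game, from the model above

# Probability of each possible goal total, 0 through 30, for each team.
home = [binom_pmf(k, SHOTS, 0.120) for k in range(SHOTS + 1)]
road = [binom_pmf(k, SHOTS, 0.080) for k in range(SHOTS + 1)]

# Home team wins outright when it scores more goals than the road team.
win = sum(home[h] * road[r] for h in range(SHOTS + 1) for r in range(h))
# Ties are counted as half a win.
tie = sum(home[k] * road[k] for k in range(SHOTS + 1))

home_win_pct = win + 0.5 * tie
print(f"Home winning percentage: {home_win_pct:.3f}")
```

If the independence assumption is reasonable, the result should land close to the simulated number.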

But you can easily show, using the binomial distribution, that the home team will have a winning percentage of .906 in games in which either team scores exactly eight goals.

How often will each team score exactly 8 goals? By the binomial formula, for the home team, it's

p(home) = .120^8 * .880^22 * C(30,8) = 1/66

For the visiting team, it's

p(road) = .080^8 * .920^22 * C(30,8) = 1/637

1/66 divided by (1/66 + 1/637) equals .906. (This ignores 8-8 ties, which I didn't bother accounting for.)
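Here is the same arithmetic as a short Python sketch, for anyone who wants to try other goal totals or shooting percentages. The function name is mine, and, like the calculation above, it ignores 8-8 ties.

```python
from math import comb


def prob_exact_goals(goals, shots, shooting_pct):
    """Binomial probability of scoring exactly `goals` on `shots` shots."""
    return (
        comb(shots, goals)
        * shooting_pct**goals
        * (1 - shooting_pct) ** (shots - goals)
    )


p_home = prob_exact_goals(8, 30, 0.120)
p_road = prob_exact_goals(8, 30, 0.080)

# Share of "somebody scored exactly 8" games in which it was the home team.
home_share = p_home / (p_home + p_road)

print(f"p(home) is about 1 in {1 / p_home:.1f}")
print(f"p(road) is about 1 in {1 / p_road:.1f}")
print(f"Home share: {home_share:.3f}")
```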

And so:

-- in hockey games as a whole, the HFA is .195.
-- in hockey games where one team scores a "knockout" of exactly 8 goals, the HFA is .406.

Compare this to what the boxing study found:

-- in boxing matches by a decision, the HFA is .24.
-- in boxing matches by a knockout, the HFA is .06.

In the hockey case, the results are due to the structure of the game, not any bias on the part of the referees. And the same could be true in the boxing case. Which means that, unfortunately, the conclusion that boxing judges are biased is not supported by the evidence.


A study with season-by-season home field advantage data

Another home field advantage article from the April, 2005 "Journal of Sports Sciences":

This one is "Long-term trends in home advantage in professional team sports in North America and England (1876-2003)," by R. Pollard and G. Pollard. There's not much statistical analysis here, but it gives tables for year-by-year HFA data for the four major sports (including AL and NL separately), as well as for the four levels of English soccer. So if you need to know what the home winning percentage was for the 1966-67 NHL, you can look it up here (.606).

The authors note that for every sport, the HFA was highest in the early years of the respective league. They suggest travel as the explanation. Also, there was a large decline in English Football following the sport's seven-year hiatus during WWII – home winning percentage instantly dropped from the high .600s to the low-to-medium .600s. The authors mention players' loss of familiarity with the stadiums as one possible cause, but you get the impression they don't really believe strongly in that explanation. In any case, the HFA remained permanently lower, which wouldn't have happened if the cause was a temporary adjustment to the home stadiums.

Notable is that home field advantage in the NHL dropped significantly in the last 40 years or so. It was about .600 around 1970, then fell steadily to the .540s today. Any idea why? Maybe worse competitive balance? I suppose easier travel might also have something to do with it, but the other sports don't show it (although the NBA's HFA started dropping in the mid-80s).

Trivia: the highest season HFAs for the North American sports were:

NHL: .741 (1918-19, only 27 games played)
NBA: .749 (1950-51)
AL : .629 (1902)
NL : .655 (1877, only 177 games played)
NFL: .667 (1940, only 51 games played)


Tuesday, February 20, 2007

Huge home field advantages in Aussie Rules football

The April, 2005 issue of "Journal of Sports Sciences" is dedicated to the topic of home field advantage (thanks to Drew for pointing this out in a previous comment), and I've started going through some of the papers. Here's one to start off.

It's by Stephen R. Clarke, and titled "Home advantage in the Australian Football League." And it found some interesting home field effects.

Most of the 16 teams in the AFL are located in the state of Victoria. It turns out that teams from outside Victoria have higher long-term (1980-98) HFAs than the others – quite a bit higher. The six "non-Victorian" teams have HFAs of 36, 25, 19, 17, 17, and 10 points, respectively. Those are the top five HFAs, plus one in the middle. (The overall average is 10.)

AFL game scores average around 100 points per team, so HFAs of 17 to 36 are pretty significant. From Table II of the study, it looks like it takes about 100 points to turn a loss into a win. So Adelaide's 36-point HFA extrapolates to a home advantage of .360 – the equivalent of a .860 home winning percentage for an average team. (Of course, that advantage will be less if scoring overall is higher in Adelaide.) Clarke tells us that in 2002, the two teams from the state of Western Australia – who had HFAs of 19 and 17 points -- were 16-4 at home, but 2-18 on the road. (That doesn't include games against each other.)

Another way of looking at it is that, according to Clarke, the average margin of victory in the AFL is about 30 points. An advantage of 36 points where the average margin is 30, is, of course, quite significant.

Why does this happen? Clarke cites crowd effects. There are four teams that share a single stadium in Melbourne, and those teams have among the lowest HFAs. Clarke argues that the teams that play in the shared stadium attract not just fans of their own team, but fans of the opposition, as well as neutral fans. Also, the stadium (which holds 100,000) is usually not filled. Clubs that moved to the shared stadium have seen their advantage drop. And the one non-Victorian team that doesn’t have an outsized HFA is from Sydney, where AFL is "not the traditional game and crowds have not been strong."

Also, there's the "familiarity" explanation. Opposition teams have at least four times as much familiarity with the shared stadium as any others, which may make their road disadvantage smaller.

Clarke does mention travel – some of the non-Victorian teams are some 2,000 km from Victoria -- but dismisses it on the grounds that (a) after a few years, the clubs should have got used to it, and (b) travel generally becomes easier over time, but the HFA is fairly constant.

Finally, there's weather. Some of the non-Victorian teams play in much hotter climates than Victoria, "and it is probably true that away sides have ... difficulty coping with this."

I'm not sure what to make of all this. For the shared stadium, it could be that the HFA is normal when an out-of-town team visits – but that when two "home" teams play each other there, the HFA disappears. And so the low observed HFA is just the average of a normal HFA and a zero HFA.

For the non-Victorian teams, the weather seems to be the most plausible explanation – running around for two hours in very hot weather seems quite different from running around for two hours in normal weather. But, obviously, I don't really know.


Saturday, February 17, 2007

How much referee bias would it take to account for home field advantage?

One more thought on the subject of whether the home field advantage (HFA) could simply be due to refereeing bias: it occurred to me that you can measure *how much* bias it would take to account fully for the HFA.

For baseball, most of umpiring is ball/strike calls. The home winning percentage in MLB is about .540, an "excess" of .040. To convert .040 losses into wins, you need about .4 runs. That's about three strikes turned into balls (or vice-versa, when the visiting team bats).

Does that seem reasonable? How many borderline calls are there in a game, and are there really enough that the home team could get three more in its favor than the visiting team?

The NFL home winning percentage is, I think, around .590. It takes about 400 yards to equal one win, so the home advantage is about 36 yards. Are there enough controversial calls (or non-calls) in a game to add up to a net of +36 home yards? This one, it seems to me, is harder to answer – there's so much simultaneous action in the game that it would be hard to notice missed infractions.

Assume that referee discretion in basketball is mostly foul calls. A foul on the offense that leads to two foul shots turns an expected one point (teams score about a point per possession) into about a point and a half (75% foul shooting percentage times two shots). So a foul is worth half a point.

It takes 30 points to turn a loss into a win, and the NBA home winning percentage is .625. That means the refereeing has to favor the home team by almost four points a game – which is eight foul calls (or non-calls). Seems high, but I don't really know.

The NHL home winning percentage last year was .573 (not considering the bonus point for a "regulation tie"). I don't know how many goals it takes to turn a loss into a win – but, to be conservative, let's suppose it's 5. That means the home team has an advantage of .36 goals per game.

Assuming a minor penalty is worth .18 goals (which looks like a typical power-play conversion percentage), that would require referee bias of an extra two penalties per game to the visiting team. That seems a bit high to me.
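To make it easier to plug in better numbers, here's the back-of-envelope arithmetic for all four sports as a rough Python sketch. Every figure is one of the guesses used above (the 10 runs per win for baseball is implied by turning .040 wins into .4 runs), and the names are just for illustration.

```python
# (home winning percentage, units needed to turn a loss into a win)
# All values are the rough figures used in this post, not measured data.
leagues = {
    "MLB": (0.540, 10),   # runs
    "NFL": (0.590, 400),  # yards
    "NBA": (0.625, 30),   # points
    "NHL": (0.573, 5),    # goals (a deliberately conservative guess)
}


def home_edge(home_win_pct, units_per_win):
    """Per-game advantage, in the sport's own units, implied by the HFA."""
    return (home_win_pct - 0.5) * units_per_win


edge = {name: home_edge(pct, units) for name, (pct, units) in leagues.items()}

# Convert the edge into a number of favorable calls per game.
POINTS_PER_FOUL = 0.5   # 1.5 expected points from two free throws, minus 1.0
GOALS_PER_MINOR = 0.18  # assumed power-play conversion rate

nba_foul_calls = edge["NBA"] / POINTS_PER_FOUL
nhl_penalties = edge["NHL"] / GOALS_PER_MINOR

for name, value in edge.items():
    print(f"{name}: {value:.3f}")
print(f"NBA foul calls per game: {nba_foul_calls:.1f}")
print(f"NHL extra penalties per game: {nhl_penalties:.1f}")
```

Swapping in a different runs-per-win or goals-per-win figure changes the required bias proportionally, which is why I'd welcome corrections to the inputs.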

But I have to emphasize that I don't really know – I don't watch enough games to know the frequency of controversial or missed calls for sure. Just going by my intuition, and what the TV announcers say, I'd guess that referees don't make enough biased calls to account for the home field advantage.

I suppose you could study this by listening to both the home and away broadcasts of games, and counting the number of times the announcers or commentators question a call.

I'd be interested in what others think. Have you noticed any apparent bias in officiating? If so, how much do you think there is?

(And please correct my numbers above, if I've used incorrect ones, which I probably have.)


Thursday, February 15, 2007

QB Score and football "box score" statistics

In "The Wages of Wins," the authors introduced a quarterback evaluation statistic called "QB Score." The formula is

QB Score = Yards – (3 * plays) – (50 * turnovers)

Basically, any play that gains fewer than three yards is a negative; any play that gains more than three yards is a positive.
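As a quick illustration, here is the formula as a Python function. The stat line in the first example is invented for illustration; the other two calls score single plays in isolation, using the yardages from Roland Beech's example discussed in this post.

```python
def qb_score(yards, plays, turnovers):
    """QB Score as defined in "The Wages of Wins"."""
    return yards - 3 * plays - 50 * turnovers


# Hypothetical game line: 250 yards on 40 plays with one turnover.
game = qb_score(yards=250, plays=40, turnovers=1)

# Single plays, scored in isolation. Down and distance never enter the
# formula, so an 8-yard gain always scores higher than a 5-yard gain.
eight_yard_gain = qb_score(yards=8, plays=1, turnovers=0)
five_yard_gain = qb_score(yards=5, plays=1, turnovers=0)

print(game, eight_yard_gain, five_yard_gain)
```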

In our respective reviews of the book, Roland Beech and I both criticized QB Score. Beech argued that the statistic doesn't take situation into account. "By the writers' measure," he wrote, "an 8 yard pickup on 3rd and 20 is worth more to a team than a 5 yard pickup on 3rd and 5."

My criticism was similar; I argued that teams who play a different style of football might be able to achieve the same yardage and first-down success, but with more plays. QB score would rate these kinds of teams lower than their success rate would suggest.

But I've been thinking about this a bit, and I'm wondering if the stat might not possibly be reasonable after all. (Or, at least as reasonable as a QB stat could be, given that it doesn't separate the contribution of the quarterback from the other players on his team.)

My criticism could be answered by empirical evidence showing that QB score "works" – in the sense of predicting points scored – for all kinds of teams, rushing teams as well as passing teams. I have no actual evidence that teams actually have different enough tendencies in short/long play calling to make the statistic invalid for some of them, and it's possible that it works well enough for all kinds of teams. And, even if not, you might be able to adjust the stat fairly easily by adding another variable to the regression, maybe "percentage running plays" or some such.

Beech's criticism might be countered by the fact that, over the long term, the effect of situation evens out. After all, we do accept that for baseball. Linear Weights values a home run higher than a single. That means that a solo home run in a 12-0 game is worth more than a two-run game-winning single with two outs in the bottom of the ninth. That doesn't invalidate Linear Weights, which implicitly makes the assumption that timing is random, and evens out in the long run. Can't QB Score make the same assumption?

There is a difference between the QB Score case and the Linear Weights case. In baseball, any given play almost always either helps the team, or hurts the team – that is, it always has the same sign. With rare exceptions, a single is always a good thing, always makes the team's chances of winning better. An out, on the other hand, (almost) always reduces the team's chances of winning. Hits are good; outs are bad.

On the other hand, as Roland Beech points out, an 8-yard gain on third down is sometimes good (when it's third-and-7) and sometimes bad (when it's third-and-9). A one-yard gain is sometimes good (on third-and-inches) and sometimes bad (on first-and-10). But QB Score always counts the 8-yard gain as a positive, and the 1-yard gain as a negative.

It seems intuitively wrong to give the quarterback a "credit" for failing to make a first down, or to "penalize" his statistics for converting a third-and-inches. In baseball, we often don't credit the situation properly, but at least we always give it the right sign. That's partly what's disconcerting to me about QB Score, that we sometimes reward failure or penalize success. It would be disconcerting to be watching a game, see the quarterback convert a two-yard pass on third-and-1, and realize that made his rating go *down*.

But if these anomalies cancel out over the long term, and QB Score actually "works," should we care that some of the component plays aren't individually accurate? I'm not sure. Maybe we're just spoiled. The structures of baseball, basketball, and hockey just happen to be such that the atom of performance we analyze – the plate appearance or the possession – turns out to be unambiguously good or bad. In football, the sign of the atom just happens to be dependent on the situation, and we're not used to that. It might be a compromise we just have to accept, unless we're willing to move to a different atom, like the "first down" or the "possession". But those comprise many separate plays, and make it impossible to isolate the performance of individual players, like the quarterback.

All this, of course, is contingent on QB Score actually being shown to work well for all types of teams and quarterbacks. I haven't seen any studies validating QB Score, and I think that's still the biggest reason for doubt. But if we want to see any non-situational "box score" statistic for football, we may have to accept that it's sometimes going to get the signs wrong at the level of individual plays.


Monday, February 12, 2007

A "value added" Super Bowl study

Economists love gambling markets data almost as much as sabermetricians love Retrosheet data. And so it's not that surprising that less than VII days after Super Bowl XLI, an academic economist has run an analysis of the betting market during the big game.

It's by Keith Jacks Gamble, and it's called "An Analysis of the Super Bowl Using Price Changes on an Online Prediction Exchange." It's funny that he calls it "online prediction" instead of "online gambling," especially considering his name.

Anyway, I don't want to sound like I'm making fun of it, because it's a really nice paper. Gamble followed the market's estimate of the teams' winning probabilities during the game, and checked how those changed after specific big plays. For instance, the Bears' touchdown on the opening kickoff reduced the Colts' chances of winning – according to the market -- from 68% to 57.25%. And after the Colts threw an interception on their first drive, the Colts' chances dropped even farther, down to 49.25%.
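The per-play swings are just differences between consecutive probability readings. A tiny Python sketch, using only the three Colts figures quoted above (the list layout and labels are mine, not Gamble's):

```python
# Colts' market-implied win probability in percent, as quoted above.
colts_win_prob = [
    ("before kickoff", 68.0),
    ("Bears return opening kickoff for a touchdown", 57.25),
    ("Colts throw an interception on their first drive", 49.25),
]

# Swing attributed to each play: the reading after it minus the one before.
swings = [
    (later_label, later_prob - earlier_prob)
    for (_, earlier_prob), (later_label, later_prob)
    in zip(colts_win_prob, colts_win_prob[1:])
]

for label, change in swings:
    print(f"{label}: {change:+.2f} points")
```

So the kickoff return cost the Colts 10.75 points of win probability, and the interception another 8.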

(One thing I wish is that Gamble had told us more about the bid-ask spread. Gamble took the market's probability estimate as the mean of the spread; this is probably decent enough if the spread is narrow, but not so much if the spread is wide. That's because market efficiency puts the "correct" probability somewhere between the bid and ask, but not necessarily the middle. That is, in an efficient market, you don't see a 20-dollar-bill to have a bid-ask of $18/$19 – because that would enable easy profits to be made buying them up at $19. But $15/$20 is possible in a rational market, as are $20/$25 and $19.99/$20.01. And only in that last case does averaging the spread lead to the "right" estimate.

So if Gamble got his 68% from a bid/ask of, say, 65%-71%, there's more reason to doubt it than if it came from 67.5%/68.5%.)

There's no guarantee, of course, that the market is correctly calculating the probabilities – but economic conventional wisdom says it should be close. And it does seem pretty reasonable. The Colts were favored by 7 points over 60 minutes. After the Bears' touchdown, the seven points were erased – but the Colts now would expect one extra possession over the rest of the game. That extra possession is worth something – maybe two or three points? – so a probability of 57.25% seems about right.

After the subsequent interception that gave possession back to the Bears, they were up by 7. If the Colts normally would outscore the Bears by 7 in 60 minutes, their advantage might be down to only 6.5 with the time remaining in the game – giving a half-point advantage to the Bears. And now the number of possessions should be about even (since the Bears had the ball now, but the Colts would receive the kickoff in the second half). So you'd expect the Colts to be at about a half-point disadvantage now, which is reasonably consistent with the observed 49.25%.

(It isn't enough to consider possessions and time alone – there's still the issue of field position. It's possible that if you take field position into account (discussed a bit here), you might come up with 49.25% exactly.)
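Here's a rough way to check that back-of-the-envelope reasoning in Python. The big assumption, which is mine and not Gamble's, is that the final margin is roughly normally distributed around the expected margin with a standard deviation of about 13.5 points (a ballpark figure I'm assuming for NFL games). The expected margins are the ones guessed at above: 7 before the game, 2.5 after the kickoff return (splitting "two or three points"), and -0.5 after the interception.

```python
from math import erf, sqrt

# ASSUMPTION: standard deviation of the final margin around its expectation.
# This number is not from the paper; it is a ballpark used for illustration.
MARGIN_SD = 13.5

def win_probability(expected_margin, sd=MARGIN_SD):
    """Chance of winning given an expected margin, under a normal model."""
    return 0.5 * (1 + erf(expected_margin / (sd * sqrt(2))))

situations = [
    ("before kickoff", 7.0),
    ("after the Bears' kickoff return", 2.5),   # midpoint of "two or three points"
    ("after the interception", -0.5),
]
for label, margin in situations:
    print(f"{label}: Colts expected margin {margin:+.1f}, win chance {win_probability(margin):.1%}")
```

Under those assumptions, all three estimates land within about two percentage points of the market's 68%, 57.25%, and 49.25%. A different standard deviation would move them somewhat.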

Gamble then computes the game's MVP by adding up all the probability changes of the plays he was involved in – similar to the "value added" method for baseball. I'm not absolutely thrilled by this method, because of the difficulty of assigning the individual plays to the individual players, and because using the specific probabilities tends to reward "clutchness," which is sensitive to the time and situation.

But, in any case, Kelvin Hayden was the most valuable player by this method, adding 18 percent of a win on the strength of his one interception/touchdown return; the top Bear was Muhsin Muhammad, at 12.25. (But illustrating one of the weaknesses of the method – the Hayden interception was on a pass meant for Muhammad. If you attribute any of that interception's impact to Muhammad, he becomes much less valuable.)

Rex Grossman had the biggest negative impact of any player on either team, at –36.5. Peyton Manning was a quite-respectable 9.75.
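The bookkeeping behind this kind of "value added" total is simple enough to sketch in Python. The two probability swings below are the ones quoted earlier (68% to 57.25%, then 57.25% to 49.25%); the player labels are placeholders of mine, not Gamble's actual attributions, and deciding who gets credit is exactly the judgment call I'm uneasy about.

```python
from collections import defaultdict

# (player label, player's team, Colts win % before the play, Colts win % after)
# Player labels are hypothetical placeholders for illustration only.
plays = [
    ("Bears returner (placeholder)", "Bears", 68.00, 57.25),
    ("Colts passer (placeholder)", "Colts", 57.25, 49.25),
]

def value_added(plays):
    """Sum each player's win-probability swings from his own team's point of view."""
    totals = defaultdict(float)
    for player, team, colts_before, colts_after in plays:
        swing = colts_after - colts_before          # change in the Colts' chances
        totals[player] += swing if team == "Colts" else -swing
    return dict(totals)

totals = value_added(plays)
for player, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{player}: {total:+.2f}")
```

Splitting a play's swing between two players (the interceptor and the intended receiver, say) would just mean adding two rows with partial swings, which is where the arbitrariness comes in.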

(Hat tip: Marginal Revolution.)


Friday, February 09, 2007

"Win Zone" gives golfer win probabilities for tournaments in progress

Suppose that after the first round of a four-round PGA golf tournament, Tiger Woods is at 67 while some guy you never heard of is leading at 66. Who has the better chance of being the eventual winner?

It's obviously Tiger: he's capable of putting a good score together every round, while the leader probably just had a lucky Thursday. Tiger is likely to wind up around 67-67-67-67, while the other guy will probably go something like 66-73-78-74.

A new statistic from the Golf Channel, called "Win Zone," follows this logic to try to estimate players' chances of winding up the winner of the tournament. It looks at the players' histories, their scores, what hole they're on, how hard the course is, and so on, and comes up with a number representing the player's actual chances.

The Golf Channel website describes the system via a text summary and a video (follow above link), but doesn't give a lot of details on how it works. They do say that it runs "over 2 million calculations every minute," which suggests a simulation, or perhaps a Markov Chain analysis (but a simulation seems much more likely).
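To show what a simulation of this kind might look like, here's a bare-bones Python sketch. It is not the Golf Channel's method, which they haven't published. Every number in it is made up: I'm assuming each player's round scores are normally distributed around a personal average, and I've invented the averages and spreads for a star who is one back and an unknown who is leading.

```python
import random

def simulate_win_probabilities(players, rounds_left, trials=20000, seed=1):
    """Estimate each player's chance of finishing with the lowest total.

    players maps a name to (score so far, assumed mean round, assumed round SD).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in players}
    for _ in range(trials):
        totals = {}
        for name, (so_far, mean_round, sd_round) in players.items():
            remaining = sum(rng.gauss(mean_round, sd_round) for _ in range(rounds_left))
            totals[name] = so_far + remaining
        winner = min(totals, key=totals.get)   # lowest total wins in golf
        wins[winner] += 1
    return {name: count / trials for name, count in wins.items()}

# Hypothetical two-man field after round one of four.
players = {
    "star, one back": (67, 68.0, 2.5),
    "unknown leader": (66, 73.0, 3.0),
}
probs = simulate_win_probabilities(players, rounds_left=3)
for name, p in probs.items():
    print(f"{name}: {p:.1%}")
```

A real system would need the whole field, hole-by-hole progress, and course difficulty, but the idea is the same: play out the rest of the tournament many times and count how often each player comes out on top.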

As I write this, the second round of the "2007 AT&T Pebble Beach National Pro-Am" is complete. Here are the Golf Channel's top 10 in "Win Zone," along with those players' leaderboard stats. (This link is probably good until tomorrow when the next round starts.)

01. 35.7% Jim Furyk .......... –12 (tied for 1st)
02. 33.0% Phil Mickelson ..... –12 (tied for 1st)
03. 11.1% John Mallinger ..... – 9 (tied for 3rd)
04. 06.5% Kevin Sutherland ... – 9 (tied for 3rd)
05. 03.4% Craig Kanada ....... – 7 (tied for 5th)
06. 03.0% Davis Love III ..... – 7 (tied for 5th)
07. 02.5% Mark Hensby ........ – 7 (tied for 5th)
08. 01.8% Vijay Singh ........ – 5 (tied for 11th)
09. 01.5% Aaron Baddeley ..... – 5 (tied for 11th)
10. 01.0% Jesper Parnevik .... – 4 (tied for 18th)
10. 01.0% Arjun Atwal ........ – 2 (tied for 40th)
10. 01.0% Kirk Triplett ...... – 2 (tied for 40th)
10. 01.0% Justin Leonard ..... – 1 (tied for 57th)

This is pretty much what you'd expect – but I'm surprised that Justin Leonard is seen as having as good a chance to win as Jesper Parnevik. Parnevik is three strokes ahead of Leonard, and has only 17 players to pass. Leonard has to make up 11 strokes in two rounds, and jump over 56 other players on the way up.

You'd expect that Leonard's relatively high probability of winning, despite his mediocre score so far, would mean he's a better golfer than Parnevik. It doesn't seem that way. I don't follow golf much, but Leonard has a "World Golf Rating" (whatever that is) of 176th, while Parnevik is 107th. So that can't be it.

What is it, then? According to the Golf Channel article and video, the ranking takes into account such other factors as:

-- who played well on the course in previous rounds of this tournament
-- how players have done on this course in the past
-- how players have done on *specific holes* of this course in the past.

So I'm wondering whether Win Zone is reading too much into the small samples of previous player/course/hole results. There's no way to tell, because they don't give details of their calculations or weightings. But I'd be wary of any statistic that considers whether a player has a "hot hand," in light of the fact that studies have generally been unable to find such an effect in other sports.

Aside from that, does Win Zone work?

Well, there's really no way to know; Win Zone is new, and there's not enough data to analyze. The Golf Channel's arguments in its favor aren't really all that relevant. For instance, they tell us that Win Zone gives a better chance of picking the winner than just the current leaderboard. But that's not much of an achievement – in the example in the first paragraph, it would be obvious to any fan that Tiger had a better chance of winning than his no-name opponent, and we wouldn't need a fancy methodology to tell us that.

(Also, I disagree with some of the explanations in the video – for instance, they argue that a Win Zone probability of 50% is a "milestone," because at that point "the odds are with you." To which I say, "so?" Why should 50.1% be that much more important than 49.9%?)

But one reasonable way to check the system is to compare it to "market odds" from

Furyk ...... 31.8 to 34.3% (Win Zone says 35.7%)
Mickelson .. 33.2 to 35.5% (Win Zone says 33.0%)
Love III .... 3.8 to  6.7% (Win Zone says  3.0%)
Singh ....... 1.9 to  4.3% (Win Zone says  1.8%)

Of course, the bettors' actions could be influenced somewhat by Win Zone – but betting markets tend to be pretty smart, and I trust their estimates to be hard to beat.

So it seems to me that Win Zone would give you a reasonably accurate rundown of the probabilities, at least for the favorites. I do wish they had given more details of their system, but even without those details, their win probabilities are a useful addition to the regular leaderboard stats.
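For anyone who wants to line the two sets of numbers up more carefully, here's a short Python sketch that checks each Win Zone figure against the market range quoted above. The figures are straight from the two lists; treating the midpoint of the range as the market's best guess is my simplification.

```python
# name: (market low %, market high %, Win Zone %), copied from the lists above
quotes = {
    "Furyk": (31.8, 34.3, 35.7),
    "Mickelson": (33.2, 35.5, 33.0),
    "Love III": (3.8, 6.7, 3.0),
    "Singh": (1.9, 4.3, 1.8),
}

def compare(low, high, win_zone):
    """Return the market midpoint, Win Zone's gap from it, and where Win Zone falls."""
    midpoint = (low + high) / 2
    if win_zone < low:
        verdict = "below the market range"
    elif win_zone > high:
        verdict = "above the market range"
    else:
        verdict = "inside the market range"
    return midpoint, win_zone - midpoint, verdict

for name, (low, high, win_zone) in quotes.items():
    midpoint, gap, verdict = compare(low, high, win_zone)
    print(f"{name}: midpoint {midpoint:.2f}%, Win Zone {gap:+.2f} points, {verdict}")
```

On these four players, Win Zone comes out a bit above the market range for Furyk and a bit below it for the other three, never by more than a point and a half from the nearest edge.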

(Thanks to John Matthew IV for the pointer.)


Wednesday, February 07, 2007

Robin Hanson: Will academics respect blog postings?

Pertinent to the discussion on academic vs. "amateur" sabermetric research, Robin Hanson posts at Overcoming Bias:

"So can we create an academic blog world, where blog posts get academic credit? If someone gets a Nobel prize for developing an idea that was first explained in someone else's carefully written but short blog post, will that blog author be celebrated, or will he be ignored as the sort of distraction that academics can't be expected to pay attention to?"

Worth reading the whole thing.

Also from Hanson's post,

"People almost never look up ten year old newspaper columns, but they do often read ten year old academic papers. So an academic paper may still have a better chance at long term influence than a newspaper column."

The latter quote is why I think that publishing research in "By the Numbers" is valuable, even if that research is already published online. As BTN editor, I admittedly may be biased, and a lot of people disagree with me on this – Tangotiger, for instance (see comments starting


Monday, February 05, 2007

Earl Weaver muses on the value of baserunning speed

Remember when Bill James sort-of-quoted Earl Weaver on the sacrifice? If I remember, he had Earl saying something like "you can take the bunt and shove it up someone’s ass and leave it there."

I thought Bill was somewhat joking. But now, after hearing this (courtesy of Jeff Merron), I’m thinking that was probably a direct quote from Earl.

Now I much better understand why Bill liked Earl so much.

How much of home field advantage is the officiating?

There are many factors that could cause home field advantage (HFA): refereeing, travel, park familiarity, crowd support, and many more. And there have been studies that have checked some of these. In “The Diamond Appraised,” for instance, Craig Wright found no effect for crowd size, and found that teams playing in brand new parks had only slightly below-average HFAs. But when the new park was in the same home city, the HFA was actually higher than average – the overall new-park figure was brought down by lower-than-average HFAs for the new-city parks.

Wright also found a huge HFA for triples – he advanced the theory that players at home know exactly how to play balls hit off the wall to limit opposing hitters to two bases.

I just ran across this recent NBA study on home field advantage by Roland Beech. What that study suggests to me is that refereeing plays a significant role.

The study compared NBA home and road statistics in several different categories. The home team was superior in every category – with one exception: free throw percentage. Here are some of the results:

............ Home ... Road

FG% ........ .460 ... .447
Off. Reb. %. 30.6 ... 28.5
Def. Reb. %. 71.5 ... 69.4
FTA ........ 27.0 ... 25.6
FT% ........ .746 ... .744

Where does the refereeing come in? Well, free throw percentage is the one category in basketball where there’s no possible influence from the referees. And in that one category, home teams had almost exactly the same performance as road teams.

That’s not what I expected to see. I would have expected a modest improvement in every category across the board. If players are just physically and mentally not at their prime when on the road, that should affect free throws too. But there was hardly any difference there.

With regard to foul calling, Beech notes that the free-throw HFA accounted for 1.3 points per game, out of an overall HFA of 3.4 points per game. That’s pretty significant: 38% of the total. You’d have to acknowledge, though, that not all of that difference – in fact, not any of it – need be referee bias. It could just be that road players commit more infractions than home players. But still, the difference is suggestive.

And all the other categories could also be significantly influenced by referees. If road players know they are more likely to be called for a foul, they will play less aggressively. And so even in cases where a foul isn’t called, their stats should be lower, both on offense and defense.

Similar research for baseball can be found in this study from Tom Meagher (previously mentioned in this blog in the comments here). Meagher found that most of the HFA can be explained by walks and strikeouts. In fact, if home and road teams were equal in every category except K and BB, the home team would have a winning percentage of about .520. Since the actual HFA is only about .532, we can see that balls and strikes form a very large proportion of total home field advantage. And, of course, balls and strikes are the category in which umpire judgement figures most prominently.
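To put a number on "very large proportion," here's the arithmetic as a tiny Python sketch. It assumes the natural baseline of a .500 winning percentage for evenly matched teams on neutral ground; the .520 and .532 figures are the ones quoted above, and the function name is mine.

```python
def share_of_home_advantage(partial_win_pct, actual_win_pct, baseline=0.500):
    """Fraction of the home edge over .500 that the partial figure accounts for."""
    return (partial_win_pct - baseline) / (actual_win_pct - baseline)

# .520 from walks and strikeouts alone, against an actual home winning pct of .532
share = share_of_home_advantage(0.520, 0.532)
print(f"Walks and strikeouts account for {share:.1%} of the home edge")
```

That works out to 20 points of winning percentage out of 32, or a bit over 60% of the total advantage.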

And, as in basketball, the other categories would also be influenced by umpires. A pitcher on the road who notices a shrunken strike zone will get behind in the count and have to give batters better pitches to hit. This could account for the fact that there is still some HFA in the results of ground balls put into play – the umpire effect would mean home batters hit “better” ground balls off worse pitches.

Following this line of thinking, it’s even theoretically possible that *all* of the home field advantage could be attributed to refereeing. I don’t think that’s the case – I believe that other factors exist (Wright’s explanation of the triples effect, for instance, seems right to me). But I do wonder if refereeing is a bigger effect than originally thought.

How would you find out? I’m not sure you can. But you can start by finding categories where, as for free-throw percentage, refereeing should have little or no effect. You could look at NFL field goal conversions, or the results of NHL shootouts. If every such category shows no advantage to the home team, that would reinforce the theory that HFA is predominantly refereeing.

And you can also look at sports where refereeing isn’t a factor at all – golf or tennis, say, or bowling. The HFA in those sports could give a strong suggestion of how much of the advantage in other sports is inherent in the players’ performances. Anything above that amount, in refereed sports, we’d suspect is a function of the officiating.


Sunday, February 04, 2007

Levitt: the Super Bowl betting line is wrong

I think Steven Levitt is joking with us.

In this blog post, Levitt argues that the Super Bowl oddsmakers have it wrong. Indianapolis was favored by 6.5 points. Levitt wonders why that could be, considering the Bears defeated their opponents by a much higher margin than the Colts did theirs:

"A good rule of thumb during the regular season is the spread is equal to half of the gap between the two teams point differentials in games so far, adjusted for the home field advantage. During the regular season, Indianapolis outscored its opponents by 67 points. Chicago outscored its opponents by 172 points. During the playoffs both teams outscored their opponents by 28 points (Indy in 3 games, Chicago in 2 games). By this usually reliable rule of thumb, Chicago should be favored by 2 or 3 points ...

"So I’ve got my money on the Bears ..."

It’s unusual for an economist to discount the opinion of a market, even in the best of times with the best of analysis. To casually dismiss the market’s opinion based on such a simple rule of thumb, without even trying to figure out what other factors are being taken into account – quality of opposition, and so forth – would be virtually unheard of.

So I suspect Levitt is just having a little fun with us here.
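For what it's worth, here's how the rule of thumb seems to work out, as a Python sketch. This is my reading of it, not Levitt's own calculation: I'm assuming the "gap" is measured in points per game, that the regular season is 16 games, and that no home field adjustment applies because the Super Bowl is at a neutral site. The point totals are the ones in his quote.

```python
def rule_of_thumb_spread(diff_a, games_a, diff_b, games_b, home_edge=0.0):
    """Points by which team A 'should' be favored over team B.

    ASSUMPTION: the rule means half the gap in per-game point differential,
    plus any home field adjustment (zero at a neutral site).
    """
    per_game_a = diff_a / games_a
    per_game_b = diff_b / games_b
    return (per_game_a - per_game_b) / 2 + home_edge

# Chicago vs. Indianapolis, regular season only (16 games each)
regular_season = rule_of_thumb_spread(172, 16, 67, 16)

# Same, adding the playoffs: both +28, Chicago in 2 games, Indianapolis in 3
with_playoffs = rule_of_thumb_spread(172 + 28, 16 + 2, 67 + 28, 16 + 3)

print(f"Bears by {regular_season:.1f} (regular season only)")
print(f"Bears by {with_playoffs:.1f} (including playoffs)")
```

Either way the rule has the Bears favored by about three points, which is in line with Levitt's "2 or 3," and about ten points away from the actual line.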
