Sabermetric Research: Do women choke in pressure situations?

Of top corporate executives, only one in 40 is a woman. That's perhaps because women have a higher tendency to choke under pressure.

That's according to a study by Israeli researcher M. Daniele Paserman, who analyzed the results of men's and women's tennis tournaments. He found that men made roughly the same percentage of "unforced errors" regardless of how clutch the situation was, but women's unforced error rate went up significantly when the chips were down.

(Hat tip to Slate and Steven E. Landsburg, whose summary of the study is here.)

(Tennis scoring is described here. Basically, four points make a game and six games make a set. A match is best of three sets for women, or best of five sets for men. There are other rules, such as having to win by two; see the link for details.)

Generally, points are classified, after the fact, into one of three types. "Winners" are shots the other player can't get a racket on. "Unforced errors" occur when a player "has time to prepare and position himself" but still makes a point-ending mistake. And "forced errors" are mistakes caused in part by a skilled shot from the opponent. These categories are purely descriptive; they don't affect scoring or rankings or anything else.

Men and women have different percentages of the error types. According to the study, the numbers look like this:

Men .... 31% winners, 30% unforced errors, 35% forced errors
Women .. 30% winners, 37% unforced errors, 30% forced errors

The study doesn't go into detail about why the overall numbers differ, but my guess is that it has to do with the relative strength of the sexes. Men, being stronger, hit faster serves and returns than women; faster shots leave opponents less time to react, which would increase the incidence of forced errors. And the more forced errors, the fewer unforced errors, since the three types sum to roughly 100%.

That may not be the correct explanation, but in any case, the point is that women's higher percentage of unforced errors doesn't necessarily mean they're chokers, or more careless players in general. (And the study makes no such claim.)

Paserman starts by comparing unforced error rates in two situations: a more important one (the last set of the match, which is the 5th for men and the 3rd for women) and a less important one (all other sets). After adjusting for a whole bunch of things (player quality, location, etc.), he finds that men make about 1.4 percentage points more unforced errors in the important situations, but women make 2.9 percentage points more:

Men .......... +1.4 percentage points (1.9 standard deviations)
Women ........ +2.9 percentage points (3.5 standard deviations)
Difference: .. +1.5 percentage points (1.4 standard deviations)
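As a consistency check on these three lines, here's a minimal Python sketch. It assumes that "standard deviations" means each estimate divided by its standard error, and that the men's and women's estimates are statistically independent; neither assumption comes from the study itself. Under those assumptions, the implied z-score for the difference comes out at about 1.35, which rounds to the 1.4 reported.

```python
import math


def implied_standard_error(estimate, z):
    """Standard error implied by an estimate and its reported z-score."""
    return estimate / z


def z_of_difference(est_a, z_a, est_b, z_b):
    """z-score for (est_b - est_a), assuming the two estimates are independent."""
    se_a = implied_standard_error(est_a, z_a)
    se_b = implied_standard_error(est_b, z_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # variances add for independent estimates
    return (est_b - est_a) / se_diff


# Men: +1.4 points (z = 1.9); women: +2.9 points (z = 3.5)
z = z_of_difference(1.4, 1.9, 2.9, 3.5)
print(f"difference = {2.9 - 1.4:.1f} points, z = {z:.2f}")
```

If the two estimates were positively correlated (for instance, because they come from one pooled regression), the standard error of the difference would be smaller and the z-score somewhat larger.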

The results weaken slightly when, instead of a binary yes/no variable for the last set, he uses a sliding scale for how important the set is. In that version, the women's number falls to +2.4 (1.9 standard deviations).

Paserman then switches from set-level data to play-by-play data, which is where it gets more interesting. First, he figures out the "importance" of any given point: how much it affects the outcome of the set. In a tiebreaker, importance would be high, but when one player already leads 5 games to zero, it would be very small. (In baseball, Tangotiger calls this "leverage".)
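To make "importance" concrete, here's a minimal Python sketch at the level of a single game rather than a set. It is my own simplification, not necessarily Paserman's exact method: it assumes the server wins each point independently with a fixed probability p, and defines a point's importance as the server's chance of winning the game if the point is won, minus that chance if the point is lost. The function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def game_win_prob(server_pts, receiver_pts, p):
    """Chance the server wins the game from the given point score,
    assuming the server wins each point independently with probability p."""
    q = 1 - p
    if server_pts >= 4 and server_pts - receiver_pts >= 2:
        return 1.0  # server has already won the game
    if receiver_pts >= 4 and receiver_pts - server_pts >= 2:
        return 0.0  # receiver has already won the game
    if server_pts >= 3 and server_pts == receiver_pts:
        # Deuce: whoever first wins two points in a row takes the game
        return p * p / (p * p + q * q)
    return (p * game_win_prob(server_pts + 1, receiver_pts, p)
            + q * game_win_prob(server_pts, receiver_pts + 1, p))


def point_importance(server_pts, receiver_pts, p):
    """Swing in game-win probability riding on the next point (score must be in progress)."""
    return (game_win_prob(server_pts + 1, receiver_pts, p)
            - game_win_prob(server_pts, receiver_pts + 1, p))


# Points are counted 0, 1, 2, 3 (love, 15, 30, 40); 3-3 is deuce.
for label, score in [("deuce", (3, 3)), ("40-0", (3, 0)), ("0-40", (0, 3))]:
    print(label, point_importance(*score, p=0.5))
```

With evenly matched players (p = 0.5), a point at deuce carries an importance of 0.5, while a point at 40-0 or 0-40 carries only 0.125, which is the same idea as the tiebreaker versus 5-0 contrast, just at the game level.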

Then he divides all the points into quartiles by importance, and calculates the unforced error percentage in each:

Lowest importance ... Men 31% ... Women 34%
Quartile 2 .......... Men 30% ... Women 37%
Quartile 3 .......... Men 31% ... Women 39%
Highest importance .. Men 31% ... Women 40%

As you can see, women make almost 15% more unforced errors in high-pressure situations than in low-pressure situations. And there is no such effect for men.

These are controversial results, and Paserman ran a few checks to see if other factors might account for the effect.

-- he added variables for ability, which set of the match it is, which round of the tournament it is, and others. The effect remained.
-- he checked whether changing the definition of "importance" would change the results. The effect remained.
-- he replaced the "importance" variable with a series of dummy variables representing the actual game score. The effect remained. In fact, that "explains more than 40 percent of the variation in ... unforced errors."

Paserman then hypothesized that the errors might be the result of risk aversion. If women play it safer with the match on the line, their shots are easier for opponents to handle, which reduces the number of winners and forced errors. (As Steven Landsburg writes, "If both players just keep lobbing the ball back and forth, there can't be any forced errors, so all errors are recorded as unforced.")

To check for risk aversion, Paserman looked at first serves. (In tennis, a player is given two chances to serve: if the first one fails, the server gets one more opportunity.) If women are risk averse in the clutch, this should lead to a higher percentage of successful first serves.

And it does:

Lowest importance .......... baseline
Quartile 2 .......... Men +10% ... Women +11%
Quartile 3 .......... Men + 7% ... Women +25%
Highest importance .. Men + 7% ... Women +28%

If I've interpreted the results of the study properly, women are a full 28% more likely to hit a legal first serve in clutch situations than in meaningless ones. That can be considered a "conservative" strategy because there's a tradeoff between effectiveness and legality: a risky serve is more likely to be out, but also more likely to win the point if it's in. And in this case, the strategy of more legal serves was costly: women servers were 11% less likely to win the clutch points, even with (or perhaps "because of") the greater proportion of legal first serves.
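The tradeoff can be written as a simple formula. If f1 and f2 are the chances that the first and second serves land in, and w1 and w2 are the server's chances of winning the point once each serve is in, then the server wins the point with probability f1*w1 + (1 - f1)*f2*w2, since a missed second serve is a double fault. The Python sketch below plugs in made-up numbers (not from the study) to show how a safer first serve can land more often and still win fewer points.

```python
def serve_point_win_prob(f1, w1, f2, w2):
    """Probability the server wins the point.
    f1, f2: chance the first/second serve lands in.
    w1, w2: chance of winning the point once that serve is in.
    A missed second serve is a double fault, which loses the point."""
    return f1 * w1 + (1 - f1) * f2 * w2


# Hypothetical numbers for illustration only; the second serve is the same in both cases.
aggressive = serve_point_win_prob(f1=0.60, w1=0.72, f2=0.90, w2=0.50)
conservative = serve_point_win_prob(f1=0.70, w1=0.64, f2=0.90, w2=0.50)
print(f"aggressive first serve:   {aggressive:.3f}")    # 0.612
print(f"conservative first serve: {conservative:.3f}")  # 0.583
```

With these particular inputs, the safer first serve lands ten percentage points more often but wins about three fewer points per hundred served. Different inputs could reverse that, which is why the direction of the effect is an empirical question.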

Also, men hit harder first serves in more important situations (+1.5 mph difference between the two extremes of importance), while women hit softer ones (-1.5 mph). In second serves, men's speeds dropped by 1.7 mph, but women's dropped by a much larger 3.5 mph. Again, this is evidence that women are more conservative.

Finally, conservative play should also show up in the number of strokes per rally. Men's strokes per point (both players combined) increased by 1.4, and women's by 1.2. By this measure, it looks like it's the men who are more conservative.

In his conclusions, Paserman does caution that it would be dangerous to extrapolate from poor judgement on the tennis court to poor judgement in corporate management. However, he does say that

"... this sample is more representative of the extreme right tail of the talent distribution that is of interest for understanding the large under-representation of women in top corporate jobs ..."

The impression you get is that Paserman does think the results have some value as evidence on the question of women's achievement in other fields.

----------------------

Well, I'm not so sure.

First, here are the percentages for the three types of women's shots for all four quartiles:

Lowest importance ... 29% winners, 34% unforced, 31% forced
Quartile 2 .......... 30% winners, 37% unforced, 31% forced
Quartile 3 .......... 29% winners, 39% unforced, 30% forced
Highest importance .. 30% winners, 40% unforced, 28% forced

Notice that the percentage of winners is constant, and the percentage of forced errors is also pretty steady. It's the percentage of unforced errors that varies a bit. But even there, as the commenter at the end of Landsburg's article argues, it's mostly the 34% figure that's responsible for the trend. Change the top-middle cell to, say, 38%, and the effect pretty much disappears.

And that's not so farfetched. The first line adds up to only 94%, but the other three lines add up to 98%. Where did the other 4% go? Paserman doesn't tell us. If you add it on to unforced errors, you get to 38%, and there's barely any clutch effect left at all! Is it possible that some unforced errors are being misclassified for the low importance case?
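The arithmetic in that paragraph is easy to check. Here's a small Python sketch that totals each row of the women's table and then, purely as a what-if, credits each row's missing points to unforced errors. Treating the whole shortfall as unforced errors is the speculative assumption from the paragraph above, not something the study reports.

```python
# Women's percentages by importance quartile: (label, winners, unforced, forced),
# copied from the table above.
WOMEN = [
    ("Lowest importance", 29, 34, 31),
    ("Quartile 2", 30, 37, 31),
    ("Quartile 3", 29, 39, 30),
    ("Highest importance", 30, 40, 28),
]


def row_totals(rows):
    """Sum of winners + unforced + forced for each row."""
    return [w + u + f for _, w, u, f in rows]


def unforced_with_shortfall_added(rows):
    """What-if: credit each row's shortfall (vs. the largest row total) to unforced errors."""
    totals = row_totals(rows)
    target = max(totals)
    return [u + (target - t) for (_, _, u, _), t in zip(rows, totals)]


original = [u for _, _, u, _ in WOMEN]
adjusted = unforced_with_shortfall_added(WOMEN)
print(row_totals(WOMEN))  # [94, 98, 98, 98]
print(original, "spread:", original[-1] - original[0])  # [34, 37, 39, 40] spread: 6
print(adjusted, "spread:", adjusted[-1] - adjusted[0])  # [38, 37, 39, 40] spread: 2
```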

So that's an important unanswered question.

Here's another: is it possible that differences between men's and women's tennis might explain some of the gap?

For instance, there's the difference in rules -- five sets for men against three sets for women. And there are strategic differences. Just a few days ago, an article in the Globe and Mail talked about " ... the significance of strategy in the women's game -- as opposed to the men's game which is often dictated by power serves."

Could these be part of the explanation? Here's one attempt:

Suppose that on any given shot, you can try a strategy varying between two extremes. You can hit the ball very hard, hoping to catch your opponent off-guard and win the point right there. Alternatively, you can lob the ball over, hoping your opponent makes an unforced error and gives you the point.

Now, consider the "least important" quartile. Those are situations when it's obvious who's going to win the set. Maybe it's 5-0 in games. In that case, it would take a miracle for the other player to come back. Therefore, winning a point in this situation isn't that important. What's more important is wearing the other player out. If one player can make the opponent run around and get tired out, s/he gains an advantage that way even if the results of that set aren't affected.

What's the best way to tire the opponent out while not tiring yourself out? Maybe, instead of lobbing the ball, hit it hard. That way, the unwary opponent has to start and stop, change direction quickly, and perhaps run all over the place. And there's a good chance the power shot will end the point, so maybe you yourself won't have to run any more at all.

Of course, the opponent is playing the same strategy, and will exert less effort to return those hard shots (since it doesn't matter much who wins that point). This leads to shorter rallies (as observed).

Now, suppose the harder you hit the ball, the less control you have. Men hit the ball harder than women, and so more of their hard shots fail. That's an unforced error. And so, in the least important situations, men show up as having (relatively) more unforced errors than women.

Implausible, perhaps. But the idea that women intrinsically have trouble under pressure, isn't that implausible too?

Or another possibility: suppose that judges tend to call an error "forced" if it came in response to a shot above X mph, but "unforced" if it came in response to a shot below X mph. If X is a speed that's easy for a man to reach but harder for a woman, that would lead to more unforced errors called against women overall (which is what the data show). In the lowest quartile, both men and women might try to hit hard shots to wear their opponents out. Even so, men would hit roughly the same number of shots above X as in important situations, since they hit them as a matter of course. But if women reach X only by putting out extra effort, those shots will disproportionately appear in the lowest-importance quartile, which will move that category's errors from unforced to forced. And again, that's what we see in the data.
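The mechanism in this second theory can be sketched numerically. Everything here is hypothetical: I'm assuming that the speed of the shot that provokes an error is normally distributed, that errors against shots faster than a threshold X are scored as forced, and I'm using made-up means and spreads rather than anything measured. The point is only that the same extra effort moves a player who rarely reaches X much more than one who usually exceeds it.

```python
from statistics import NormalDist


def share_scored_forced(mean_mph, sd_mph, threshold_mph):
    """Fraction of errors a hypothetical scorer would call 'forced':
    those provoked by a shot faster than threshold_mph."""
    return 1 - NormalDist(mean_mph, sd_mph).cdf(threshold_mph)


X = 65  # hypothetical threshold, mph
EXTRA_EFFORT = 5  # hypothetical speed gain when hitting hard in low-importance points

for label, mean in [("men", 85), ("women", 60)]:
    normal = share_scored_forced(mean, 10, X)
    hard = share_scored_forced(mean + EXTRA_EFFORT, 10, X)
    print(f"{label}: {normal:.3f} -> {hard:.3f} (change {hard - normal:+.3f})")
```

Under these made-up numbers, most men's errors are already scored as forced, so extra effort barely changes their mix, while the women's forced share jumps noticeably.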

Finally, a third theory: when the set is 5-0 and the outcome is all but certain, men have more incentive to slack off than women do, because they have five sets to play, rather than three, and a greater need to conserve energy. That might lead, somehow, to a different pattern of play, one that produces more "unimportant" unforced errors for the men than for the women. I don't know what that pattern might be, but there is good reason to think it should be different.

Anyway, as unlikely as these hypotheses may seem, they strike me as more reasonable than the "successful women choke under pressure" theory. Are there any other plausible theories that I didn't think of?

Labels: clutch, gender, tennis

8 Comments:

At Monday, February 26, 2007 4:56:00 PM, Anonymous said...: Very interesting set of issues. I do think you need to distinguish between the conclusion that women play more conservatively in pressure situations and the conclusion that they "choke." From this data, it certainly looks like the first is true. But it's only choking if it results in a reduced likelihood of winning, and that's not clear from your summary.

A few thoughts:

1) Could the missing percentage be service aces (or service winners)? That would be consistent with the tendency toward more conservative serving as leverage increases. And if so, it means that women are serving -- and probably playing -- more conservatively on high-leverage points.

2) Given the definition of importance/leverage, these points will come disproportionately in games among comparably skilled players, less often in mismatches. While the regressions try to control for ranking of each player, they don't appear to control for the level of parity. It could be that winners and forced errors are more likely in mismatches among women, but less likely when parity is high, such that unforced errors necessarily rise in matches between peers (while this is not true for men). Also, the structure of tennis tournaments ensures that matchups of peers, and thus high-leverage points, will tend to come late in tournaments, while mismatches come in the early rounds.

3) It seems possible to me that the "correct" level of risk/caution may be different in matches between players of equal talent as opposed to mismatches. And the risk/reward calculation for aggressive play also may not be the same for men as for women. For example, men probably hit more service winners, and the differential between 1st serve and 2nd serve success may not be the same for both genders. It would be interesting to see if women who display the biggest risk aversion are any less successful than those whose high-leverage play shows the least change.
At Monday, February 26, 2007 5:09:00 PM, Phil Birnbaum said...: Guy, interesting points.

The study can't tell you about reduced likelihood of winning, because it pooled both players. The author did find that the server's chance of winning the point was lower in the more conservative serve situations, but that's all. And, as you point out, there may be other reasons for that.

But you're right -- a higher percentage of unforced errors doesn't necessarily mean worse play or choking. A singles hitter might try for a home run in a 9th-inning tie with two outs, but the increase in strikeouts doesn't mean it's the wrong strategy.
At Tuesday, February 27, 2007 10:44:00 AM, Anonymous said...: The chance of winning the point was lower in the 4th quartile (high-leverage), as you say, but also for the 2nd and 3rd quartiles by about the same magnitude. In other words, the server is more likely to win very low-leverage points. I'm not sure that really tells us whether more conservative serving is an effective strategy.

There's a lot being mixed up here. The high-leverage points will be played disproportionately 1) among players of similar talent, 2) players of above-avg talent (weak players never get to play other weak players), and 3) in later rounds. The regressions try to control for all of that, but you have to wonder if they succeed. It could just be that in matches between similarly-ranked players the server tends to win less often, and/or that such matches tend to have fewer winners/forced errors and more unforced errors.

I think you could better control for a lot of these factors by looking at each individual match, and calculating the ratio of high-leverage unforced errors to low-leverage unforced errors within each match. That would perfectly control for players, round, conditions, etc. You could then see whether, for any two players, there's a tendency to have more errors in high-leverage situations.

(More generally, I think academic sports researchers could design studies that take more advantage of natural controls, and rely less on the regression model to try to do that.)
At Wednesday, February 28, 2007 3:49:00 PM, Anonymous said...: Good points, Guy. I think you are right there is a bias here that a greater than average share of high-leverage points come against players of similar (and above-average) ability. That probably affects all the results.

The most striking statistic for me was the +28% first serve % in the top quartile of women leverage points. (Confirmed by serve speed) I cannot come up with a rational explanation for why a serving strategy (with respect to risk choice) would change depending on the leverage situation.

To me, regardless of the importance of the point, a server would choose along a continuum from no-risk to high-risk serves in their arsenal to maximize the probability of winning the point. The level of riskiness that maximizes the probability of winning the point is not a function of the importance of the point...

This is different than in football (for example) where a risky offensive play could lead to 7 the other way. Here, the worst that can happen on any point is that you lose the point.

I wonder if there are other examples of cautious play correlating positively with leverage of the play even if every play should have the same strategy.

Jump-serves in volleyball, penalty kicks in soccer (kick middle vs. corner)?
At Wednesday, February 28, 2007 8:43:00 PM, Phil Birnbaum said...: Nate,

When I play squash, my serving arm gets weak towards the end of the match. So if one of the games is lopsided, I ease up on my serves to save my arm for more important times later.

Same strategy could be at work in tennis.
At Thursday, March 01, 2007 9:50:00 AM, Anonymous said...: Phil, by "ease up" do you mean slow down your serve? From the data for women, a lopsided game (low leverage) results in more aggressive serving (faster and lower accuracy).

If you are saying: "in a lopsided game of squash I'll make a faster and more aggressive serve that will either be an ace or a fault so I don't have to tire myself in a long rally" then that makes sense to me.
At Thursday, March 01, 2007 9:57:00 AM, Phil Birnbaum said...: You're right, my effect is opposite to the one in the study -- I slow my serve down in lopsided games, while the tennis players slowed their serves in close games.

But that doesn't really affect my point, which is that avoiding fatigue would be one reason to play differently in low-leverage situations than in high-leverage situations. But, agreed, that doesn't explain the observations, which go the other way. Maybe, as you suggest, the object is to cause shorter rallies -- that's where the fatigue might be, in the cardio rather than in the arm.

FWIW, in my squash example, service faults are uncommon, so there is no reason to slow down just for purposes of making a legal serve, like there is in tennis.
At Thursday, March 01, 2007 10:17:00 AM, Anonymous said...: Again, I think the level of parity may be the key factor. In mismatches (big difference in player rankings), both players have a potential incentive to play aggressively: the better player wants to get match over with and save strength for subsequent, more important, matches; the weaker player's only hope of victory is likely that his/her 1st serve and passing shots are "on" that day. So the better player is not pursuing a strategy that maximizes the chance of winning that point; the weaker player probably is, but in a more competitive match the ideal level of aggressiveness for that same player might be lower. And of course, the low-importance quartile points come mainly from mismatches, and vice-versa (Federer probably never plays a high-importance point until at least the semi-finals!).

I don't know if there is such a link between talent parity and aggressiveness, or that it's stronger for women than for men, but it seems plausible.

Monday, February 26, 2007
