"Clemens Report" criticism misses the point
A couple of weeks ago, Hendricks Sports Management (HSM), Roger Clemens' agents, put together a document purporting to show that Clemens' late-career effectiveness was not unusual, compared to certain other great pitchers with long careers. While the report doesn't mention steroids at all, the intent of the report is clear: to show that you can't conclude any illegal behavior on Clemens' part simply by the fact that he remained effective late in his career.
An article in today's New York Times, by Eric Bradlow, Shane Jensen, Justin Wolfers, and Adi Wyner (BJWW), tries to debunk HSM's "Roger Clemens Report." In my opinion, it fails.
BJWW criticize the Clemens Report on the main grounds that if you want to see if Clemens' career trajectory is unusual, he should be compared to *all* "durable" pitchers, not just the three pitchers (Randy Johnson, Curt Schilling, Nolan Ryan) that Clemens' defenders chose.
So they found the 31 pitchers since 1968 with at least 15 seasons of 10 starts and 3000 IP over their careers. They plotted Clemens' career trajectory against the average of the group of 31. Here's the chart (am I allowed to show it here under fair use laws? Hope so.)
Clemens is markedly different. The average pitcher shows a U-shaped curve: an improvement up to about age 31, then a decline to the end of his career. Clemens, on the other hand, shows a straight line with a slight decline (for ERA), and an *inverted* U-shaped curve for WHIP: getting worse up to about age 37, then improving after that.
Therefore, the authors say, Clemens really IS unusual. His "statisticians-for-hire" agents are guilty of selection bias. "A careful analysis, and a better informed public, are the best defense against such smoke and mirrors."
Well, I don’t agree. I think BJWW should also have done a more careful analysis, and thought about their conclusions a bit more.
First: is this group of 31 pitchers (which, by the way, BJWW don't list) really the best control group to use? It has been well known among sabermetricians, since Bill James discovered it back in the 1980s, that power pitchers have much longer career expectations than control pitchers. Comparing Clemens to a mix of power and control pitchers would bias the comparison against him.
In their article, BJWW conclude that the graphs show Clemens to be "unusual" compared to the other pitchers. Well, of course he's unusual compared to most pitchers: he is an extreme power pitcher, of a type that was shown, over 20 years ago, to have significantly longer careers than others! The Times authors think they have evidence that Clemens is on steroids, but what they've probably found is just evidence that Clemens is a power pitcher!
And this is the *less* important criticism of the Times article.
The second, and by far the most important, point is that the authors are attacking a straw man. Clemens' agents are NOT saying that his career is *usual* – they are saying his career is *not unprecedented for a non-steroid user*. There's a big difference there, and it's not one of statistics or regressions or comparisons – it's one of common logic.
The public was saying, "look – Clemens' longevity is unusual – therefore he's probably taking steroids." HSM is replying, "Clemens' career is unusual, but not THAT unusual. Indeed, here are three pitchers with similar career trajectories, and nobody is saying *they* took steroids."
That's a convincing reply. To rebut it, it's not enough to show that Clemens' career is even farther from the average than HSM said – because even if that's true, it's irrelevant. The HSM argument doesn't depend on the average – it depends on the extremes. What HSM is saying is, "look, you have to understand, there is a certain type of pitcher, very atypical, who has this kind of career. It's not an outlier, it's not that rare, Clemens fits right into that group, and it has nothing to do with steroids."
Look at it this way: suppose that five years ago, your neighbor Clem, down the street, comes into some money and builds a big extension on his house and buys a Ferrari. People think he robbed a bank or something. Subpoenaed to appear before a congressional investigation, he denies that he stole the money.
But the public still thinks Clem is a thief. Clem hires a lawyer to rebut the charges. The lawyer says, look, Clem won the lottery in 2003, that's how he got rich. There's no theft at all. In fact, here are three other well-regarded rich guys who also won the lottery – Ryan, Schilling, and Johnson. They're rich too, and nobody thinks THEY stole anything! See, it's quite possible to get rich without robbing a bank, so lay off my client!
Then, four reporters, in a New York Times investigative article, say, well, why the heck should we compare Clem to only these three guys, cherry picked by Clem's lawyer? We should compare him to *everyone* who made a million dollars ever! They do, and find that, of everyone who made a million dollars in 2003, most of them were CEOs, and made similar amounts in 2004, 2005, and 2006. But Clem didn't make anything in those years – his career earnings trajectory is very different from the average million-dollar earner. See? We *should* be suspicious that Clem robbed a bank! His agents are full of crap!
Well, that argument is obviously silly -- but it's exactly the argument the Times authors make.
Even if the statistical analysis is correct, it simply doesn't matter whether Clem's earnings differ from those of CEOs. What matters is whether other people have won the lottery, and whether it's reasonable to think that Clem did too.
The relevant baseball question is not "how far is Roger Clemens from the norm?" The question is: "If a player is as far from the norm as Roger Clemens, what is the chance that he took steroids?"
And the answer is: if you acknowledge that Schilling, Ryan, and Johnson have career trajectories roughly similar to Clemens', and you believe that none of them took steroids, then, from the statistical evidence alone, your first estimate of the probability Clemens cheated should be approximately *zero*.
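To make that last step concrete, here is a minimal Python sketch of the arithmetic. The function name `first_estimate` and the True/False labels are my own illustration, not anything from the HSM report or the Times article; the labels simply encode the premise above, that none of the three comparison pitchers took steroids.

```python
# Hypothetical illustration only. The False labels encode the premise of
# the argument (none of the three comparables used steroids); they are an
# assumption, not established fact.
comparables = {
    "Nolan Ryan": False,
    "Curt Schilling": False,
    "Randy Johnson": False,
}


def first_estimate(used_steroids_by_pitcher):
    """Share of pitchers with a Clemens-like trajectory who used steroids.

    This is a naive frequency estimate: count the users among the
    comparable pitchers and divide by the size of the group.
    """
    flags = list(used_steroids_by_pitcher.values())
    return sum(flags) / len(flags)


estimate = first_estimate(comparables)
print(f"First estimate of P(steroids | Clemens-like trajectory): {estimate:.2f}")
```

The point of the sketch is only that the estimate is driven by the pitchers who look like Clemens, not by how far Clemens sits from the average of all 31 durable pitchers.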