The Wharton "Clemens Report" criticism -- Part II
At Freakonomics today, Justin Wolfers has a follow-up to yesterday's New York Times piece (for which, see my previous post).
In yesterday's article, Wolfers (and his three Wharton co-authors) showed that Roger Clemens' career trajectory is very different from the average veteran pitcher's. Today, he shows all 31 curves instead of just Clemens and the average:
(Chart: the fitted career curves for all 31 pitchers.)
Looking more closely at the methodology convinces me even more that the article's conclusions are inappropriate. For one thing, there are a few lines that are pretty close to Clemens'. For another thing, extrapolating the lines shows that many are a very poor fit -- Nolan Ryan, for instance, looks like he can pitch effectively at least into his 80s.
And if you look at what the regressions are actually doing, it turns out the curves can't have much to do with the effects of steroids at all.
The methodology that created the curves is such that, when you fit a line to a career, it has to be quadratic, which means the line has to be symmetrically U-shaped. (The U can be right-side up, or upside down, but must be symmetrical.) Therefore, there is an implicit assumption built in: that the slope of the player's improvement as he approaches his peak (or, in Clemens' case, the slope of his decline to the trough) has to equal the slope of his decline after the peak (in Clemens' case, the slope of his improvement after the trough).
That is: if a player's peak is (say) 32, the model insists that his numbers at 31 equal his numbers at 33; that his numbers at 25 equal his numbers at 39; and so on. (That's why Nolan Ryan's curve looks like he can pitch forever.)
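The symmetry is easy to check numerically. Here is a minimal sketch using NumPy's `polyfit`; the "WHIP" numbers are made up for illustration and are not any real pitcher's record, and the helper names (`fit_quadratic`, `vertex_age`, `curve`) are my own, not anything from the Wharton analysis.

```python
import numpy as np

def fit_quadratic(ages, values):
    """Least-squares fit of values = a*age^2 + b*age + c; returns (a, b, c)."""
    a, b, c = np.polyfit(ages, values, 2)
    return a, b, c

def vertex_age(a, b):
    """Age at which the fitted parabola peaks or bottoms out."""
    return -b / (2 * a)

# Invented WHIP-like numbers, for illustration only (not real player data).
ages = np.arange(22, 41)
whip = 1.20 + 0.002 * (ages - 30) ** 2 + 0.03 * np.sin(ages)

a, b, c = fit_quadratic(ages, whip)
v = vertex_age(a, b)

def curve(age):
    """Value of the fitted quadratic at a given age."""
    return a * age**2 + b * age + c

# The fitted value d years before the vertex matches the value d years after it.
for d in (1, 5, 10):
    print(d, curve(v - d), curve(v + d))
```

Whatever data you feed in, the two columns agree up to rounding error, because a quadratic has no way to rise faster than it falls.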
What this means is that the shape of the curve depends, equally, on both ends of the player's career. What the Wharton curve is doing is not evaluating the player's old-age performance, but, rather, comparing the middle of the pitcher's career *to both ends*.
Now, of the 31 pitchers in the chart, many probably had sub-par starts to their careers. Take, for instance, Nolan Ryan. His first five seasons were all worse (that is, higher) than his career average in WHIP. This helps keep his curve concave. If you look at the later part of his career, from 1979 to 1991, he was godlike -- but those early years keep his curve from looking like that of Roger Clemens.
For his part, Clemens started out well: his first five seasons, as a whole, were roughly in line with his career. So he doesn't get that initial downhill momentum that would lift the right end of his curve in symmetry.
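To see how much the early seasons matter, here is a rough sketch with two invented pitchers whose careers are identical from their sixth season on, including two bad years in the middle. The only difference is the first five seasons. All the numbers are hypothetical (WHIP-like, where higher is worse), and the functions `career` and `curvature` are names I made up for the illustration.

```python
import numpy as np

ages = np.arange(22, 41)  # 19 seasons, ages 22 through 40

def career(early_whip):
    """Invented career: flat 1.15 WHIP, two bad mid-career years,
    and the first five seasons set to early_whip."""
    whip = np.full(ages.shape, 1.15)
    whip[:5] = early_whip                      # ages 22-26
    whip[(ages == 31) | (ages == 32)] = 1.35   # two worst years, mid-career
    return whip

def curvature(whip):
    """Leading coefficient of the least-squares quadratic fit.
    Positive means U-shaped in WHIP; negative means upside-down U."""
    return np.polyfit(ages, whip, 2)[0]

slow_start = curvature(career(1.40))  # poor first five seasons
fast_start = curvature(career(1.15))  # first five seasons at career norm

print("slow starter curvature:", slow_start)
print("fast starter curvature:", fast_start)
```

The slow starter's fitted curve opens upward and the fast starter's opens downward, even though nothing about their later years differs. In this toy setup, at least, the direction of the curve is decided entirely by what happened at the start of the career.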
Which brings up another point: Clemens' "right end" is also excellent. Eventually he will age, and it will decline. Even if he now retires, is there any doubt that, if he kept pitching, he would *eventually* decline? Give him a few more years of pitching, and he'll look like other pitchers who were effective from the beginning but faltered with age – and his trajectory will look more like the others.
Most excellent pitchers nonetheless start out simply average, and end with a few mediocre seasons. Clemens started out well, and hasn't hit his decline phase yet. His curve is flatter than those of the 31 other pitchers because he is the only one who:
(1) Started out pretty well;
(2) Hasn't had many mediocre career-ending years yet;
(3) Happened to have his two worst years right in the middle of his career.
If you believe the Wolfers curve indicates steroids, then you have to believe that the above three points also indicate steroids.
But (1) has nothing to do with steroids, and (3) simply has to do with the timing of the study. So you're left with (2). That has very little value as evidence; and, in any case, it doesn't require the fitting of quadratic curves.
So the Wharton study doesn't really tell us much of anything.
And, when you think about it, how can a career curve tell you much about steroids anyway? If steroids make you better, they'll let you play longer before the inevitable decline. That will stretch out your career trajectory, but not change its basic convex shape.