Monday, February 11, 2008

The Wharton "Clemens Report" criticism -- Part II

At Freakonomics today, Justin Wolfers has a follow-up to yesterday's New York Times piece (for which, see my previous post).

In yesterday's article, Wolfers (and his three Wharton co-authors) showed that Roger Clemens' career trajectory is very different from the average veteran pitcher's. Today, he shows all 31 curves instead of just Clemens and the average:

[Chart: the 31 fitted career curves]

Looking more closely at the methodology convinces me even more that the article's conclusions are inappropriate. For one thing, there are a few lines that are pretty close to Clemens'. For another thing, extrapolating the lines shows that many are a very poor fit -- Nolan Ryan, for instance, looks like he can pitch effectively at least into his 80s.

And if you look at what the regressions are actually doing, it turns out the curves can't have much to do with the effects of steroids at all.

The methodology that created the curves is such that, when you fit a line to a career, it has to be quadratic, which means the line has to be symmetrically U-shaped. (The U can be right-side up, or upside down, but must be symmetrical.) Therefore, there is an implicit assumption built in: that the slope of the player's improvement as he approaches his peak (or, in Clemens' case, the slope of his decline to the trough) has to equal the slope of his decline after the peak (in Clemens' case, the slope of his improvement after the trough).

That is: if a player's peak is (say) 32, the model insists that his numbers at 31 equal his numbers at 33; that his numbers at 25 equal his numbers at 39; and so on. (That's why Nolan Ryan's curve looks like he can pitch forever.)
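To see that symmetry in action, here is a minimal sketch in plain Python. The WHIP numbers are made up for illustration (they are not any real pitcher's record, and this is not the Wharton authors' code): the hypothetical pitcher improves quickly over four seasons and then declines slowly over fourteen. The helper names `solve3`, `fit_quadratic`, and `predict` are my own.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    rows = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            f = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= f * rows[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - tail) / rows[i][i]
    return x


def fit_quadratic(ages, values):
    """Least-squares fit of value = a*d**2 + b*d + c, where d = age - mean age."""
    mean_age = sum(ages) / len(ages)
    d = [age - mean_age for age in ages]
    s = [sum(di ** p for di in d) for p in range(5)]  # sums of d**0 .. d**4
    t = [sum(di ** p * y for di, y in zip(d, values)) for p in range(3)]
    a, b, c = solve3([[s[4], s[3], s[2]],
                      [s[3], s[2], s[1]],
                      [s[2], s[1], s[0]]],
                     [t[2], t[1], t[0]])
    return a, b, c, mean_age


def predict(coeffs, age):
    """Evaluate the fitted quadratic at a given age."""
    a, b, c, mean_age = coeffs
    d = age - mean_age
    return a * d * d + b * d + c


# Hypothetical, deliberately lopsided career: fast improvement, slow decline.
ages = list(range(22, 41))
whip = [1.45, 1.35, 1.25, 1.15, 1.10, 1.10, 1.11, 1.12, 1.13, 1.15,
        1.17, 1.19, 1.21, 1.23, 1.25, 1.28, 1.30, 1.33, 1.35]

coeffs = fit_quadratic(ages, whip)
a, b, c, mean_age = coeffs
vertex_age = mean_age - b / (2 * a)  # age at the bottom of the fitted U

for k in (1, 5, 10):
    before = predict(coeffs, vertex_age - k)
    after = predict(coeffs, vertex_age + k)
    print(f"{k:2d} years from the vertex: {before:.3f} before, {after:.3f} after")
```

Even though the made-up data is lopsided, each "before" and "after" pair printed by the loop matches, because a quadratic has no way to be steeper on one side of its vertex than the other.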

What this means is that the shape of the curve depends, equally, on both ends of the player's career. What the Wharton curve is doing is not evaluating the player's old-age performance, but, rather, comparing the middle of the pitcher's career *to both ends*.

Now, of the 31 pitchers in the chart, many of them probably had sub-par starts to their careers. Take, for instance, Nolan Ryan. His first five seasons were all above (that is, worse than) his career average in WHIP. This helps keep his curve U-shaped. If you look at the later part of his career, from 1979 to 1991, he was godlike – but those early years keep his curve from looking like that of Roger Clemens.

For his part, Clemens started out well: his first five seasons, as a whole, were roughly in line with his career. So he doesn't get that initial downhill momentum that would lift the right-end of his curve in symmetry.
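Here is a rough sketch of how much the early seasons alone can move the fitted shape. Both pitchers below are hypothetical, with invented WHIP numbers: they share exactly the same last 14 seasons (including two bad years in mid-career) and differ only in their first five. The `curvature` helper is my own shortcut, and it assumes one season per consecutive age, in which case the least-squares coefficient on the squared term has a simple closed form.

```python
def curvature(values):
    """Coefficient on age**2 in a least-squares quadratic fit.

    Assumes one value per consecutive age, so the centered ages are
    symmetric around zero and q = d**2 - mean(d**2) is orthogonal to
    both the constant and the linear term.
    """
    n = len(values)
    d = [i - (n - 1) / 2 for i in range(n)]
    mean_d2 = sum(di * di for di in d) / n
    q = [di * di - mean_d2 for di in d]
    return sum(y * qi for y, qi in zip(values, q)) / sum(qi * qi for qi in q)


# Shared (hypothetical) seasons 6 through 19: steady, except two bad years.
rest = [1.15] * 4 + [1.30, 1.30] + [1.15] * 8

slow_start = [1.45, 1.40, 1.35, 1.30, 1.25] + rest   # sub-par first five years
strong_start = [1.15] * 5 + rest                     # good from day one

print("slow start:  ", curvature(slow_start))
print("strong start:", curvature(strong_start))
```

A positive coefficient means a right-side-up U and a negative one means an upside-down U. The sign flips here even though the two pitchers are identical from their sixth season on, which is the sense in which the fitted curve says as much about the start of a career as about the end.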

Which brings up another point: Clemens' "right end" is also excellent. Eventually he will age, and it will decline. Even if he now retires, is there any doubt that, if he kept pitching, he would *eventually* decline? Give him a few more years of pitching, and he'll look like other pitchers who were effective from the beginning but faltered with age – and his trajectory will look more like the others.

Most excellent pitchers nonetheless start out simply average, and end with a few mediocre seasons. Clemens started out well, and hasn't hit his decline phase yet. His curve is flatter than those of the 31 other pitchers because he is the only one who:

(1) Started out pretty well;
(2) Hasn't had many mediocre career-ending years yet;
(3) Happened to have his two worst years right in the middle of his career.

If you believe the Wolfers curve indicates steroids, then you have to believe that the above three points also indicate steroids.

But (1) has nothing to do with steroids, and (3) simply has to do with the timing of the study. So you're left with (2). That has very little value as evidence; and, in any case, it doesn't require the fitting of quadratic curves.

So the Wharton study doesn't really tell us much of anything.

And, when you think about it, how can a career curve tell you much about steroids anyway? If steroids make you better, they'll let you play longer before the inevitable decline. That will stretch out your career trajectory, but not change its basic convex shape.
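As a toy illustration of that last point, the sketch below takes an idealized U-shaped WHIP trajectory and slows its aging clock. Everything here is an assumption for illustration: the trough age, the curve's steepness, the 1.25 stretch factor, and the idea that a longer career is simply the same curve stretched in time.

```python
def baseline_whip(age, trough_age=30.0, a=0.004, best=1.10):
    """Idealized U-shaped trajectory: best WHIP at trough_age, worse on either side."""
    return best + a * (age - trough_age) ** 2


def stretched_whip(age, factor, start_age=22.0):
    """Same trajectory with aging slowed: `factor` calendar years per baseline year."""
    return baseline_whip(start_age + (age - start_age) / factor)


def second_differences(values):
    """Discrete curvature; all positive means the curve bends upward everywhere."""
    return [values[i - 1] - 2 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]


normal = [baseline_whip(age) for age in range(22, 41)]
slowed = [stretched_whip(age, factor=1.25) for age in range(22, 45)]

print("normal career bends upward:", min(second_differences(normal)) > 0)
print("slowed career bends upward:", min(second_differences(slowed)) > 0)
print("best season, normal:", 22 + normal.index(min(normal)))
print("best season, slowed:", 22 + slowed.index(min(slowed)))
```

Under these assumptions the stretched career has a later best season and a gentler bend, but it still bends the same way. Stretching alone never turns a right-side-up U into an upside-down one.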



At Tuesday, February 12, 2008 7:47:00 PM, Blogger Cyril Morong said...

I like what you did here.

I would like to see them or anyone else say by exactly how much Clemens deviated from the trend. Did he perform, say, 10% better than expected based on the normal aging pattern? Suppose that deviation is the biggest one. Then tell me who had the second biggest deviation and how much it was. If the next guy deviated 9% and then the next 8%, and so on, Clemens just happens to be the biggest deviator. Someone has to come first. Before there were PEDs, there was a biggest deviator.

Now if Clemens deviated by 20%, and then it was 9%, 8%, 7%, and so on, now he really starts to stick out.

It is like saying a guy batted .400. If the next highest average is .395, then .390, then .385, etc., it is not as amazing as a guy who batted .400 when the next highest was .350, then .345, and so on.

At Wednesday, February 13, 2008 12:03:00 PM, Blogger John C said...

Another simple way of looking at the given plot is to notice the tilt of the curve - the tilt is upward indicating the overall trend is an increase in WHIP. Or instead of a quadratic regression use a linear regression. The linear regression clearly depicts an upward slope indicating a trend toward WHIP increasing with age.

At Wednesday, February 13, 2008 4:35:00 PM, Blogger Nate Hebel said...

Very good point about what the enforcement of the quadratic equation means to your assumptions.

What is lacking in the analysis is discussion of whether the concavity of Clemens' WHIP curve is statistically significant. By visual inspection, the data seem to have large error terms. And you're bleeding away degrees of freedom on a data set only 15-20 points deep per player to begin with, which can't help.

But to be fair, the intent of the Wharton study was not to prove steroids, but to note that the use of Nolan Ryan alone as a comparison was not proof of innocence. (Which is actually not a really difficult job...)

At Sunday, February 17, 2008 11:43:00 AM, Anonymous Anonymous said...

Don't know if this has been noted somewhere else, but the Wharton study listed criteria for durable pitchers to match against Clemens and said there were 31. [Criteria: at least 3000 career innings and 15 seasons of 10 or more starts since 1968.] I get 33 others, interpreting "since 1968" to mean starting in 1969 and ignoring any seasons before 1969 for pitchers whose career began earlier.
The list:

Carlton, Jenkins, John, Koosman, P.Niekro, J.Niekro, Palmer, Perry, Ryan, Seaver, and Sutton all had careers starting before 1969. Ryan has a labelled curve in their graph, and since it starts at age 22, it seems to exclude his 1966 and 1968 seasons, when he was 19 and 21.

Did they in fact exclude these pre '69 seasons in constructing their aging curves? Obviously the run environment changed drastically between 1968 and 1969, but since they don't seem to have adjusted for other fluctuations in run environments, it might have been better to include only pitchers whose major league careers began in 1969 or later.

At Sunday, February 17, 2008 11:49:00 AM, Anonymous Anonymous said...

It's also important to note that during the pre-age-36 portion of Clemens' career the Great Offensive Explosion of 1993/1994 occurred. WHIP in the AL was 1.37 in 1991-92, but rose to 1.47 in 1994-95. So even if his performance didn't decline at all over those years, it would appear that it had. His move to the NL late in his career would of course have the opposite impact (as would his substantially reduced workloads in the later years).

Failing to control for offensive environment is the kind of obvious mistake that once shocked me to see coming from big name academics, but which I've now come to expect....

At Sunday, February 17, 2008 12:03:00 PM, Blogger Phil Birnbaum said...

Thanks for the list of pitchers, Joe.

Guy: Absolutely, the authors should have corrected for the 1993-94 jump and for league. I didn't think it would have a big effect, but now that you mention it, yeah, of course, it's quite significant.

Why would reduced workloads tend to reduce Clemens' WHIP? Unless he played only home games, or against easy teams, or something ...

At Sunday, February 17, 2008 12:24:00 PM, Anonymous Anonymous said...

wrt Guy's point, shouldn't lowered workload for a starter like a low dose of the reliever's advantage? Facing a few batters one less time, a little less need to pace oneself? longer recovery between starts and/or less to recover from?

Now that I've looked a bit closer, The wharton authors are using an unusual approach to define age (or the graphical representation has a bug). Clemens was born Aug 62 and so should be 21 or 22 in his first season 1984, depending on how you want to define seasonal age. but his curve looks like it starts at 23. Schilling's curve looks like it starts at 27; he was both in Nov 66 , debuted in 1988 and had his first qualifying season in 1992. So his first qualifying season should occur at age 25 or 26. Randy Johnson's curve also appears to start at age 27, but he was born Sep 1963 and debuted in 1988, with his first qualifying season in 1989, so again his age is a year higher than it should be by the most aggressive seasonal age approach ...

Even if the error is just in the graph labels, the quality here doesn't inspire confidence ...

At Sunday, February 17, 2008 12:26:00 PM, Anonymous Anonymous said...

my own typos are no indication of low quality :-)

At Sunday, February 17, 2008 1:14:00 PM, Anonymous Anonymous said...

My thinking was the same as Joe's: that pitching 6.4 IP/GS (as Clemens did in Houston) should be easier than going 7.4 IP/GS (as Clemens did in Toronto). I suppose I was shooting from the hip a bit, as I'm not sure I can prove the point. But I certainly believe that if Clemens had been forced to obtain 3 more outs per start in his Houston years, it's highly likely his performance would have deteriorated. For one thing, we know that pitchers do worse facing hitters for a 2nd and 3rd time. In any case, 6.4 IP of Clemens' performance is certainly less valuable than 7.4 IP, and that should be taken into account.

