Thursday, July 31, 2008

Batters improve when young -- but it looks like pitchers don't

In preparation for my upcoming presentation on aging patterns in baseball, I ran a little study. I found all players of a specific age – 25, say – and compared their performance at 25 to their performance at 26. I took all the arithmetic differences and averaged them (weighted by the smaller number of batting outs in each of the two seasons). The study covered 1948-49 to 2006-07.
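The bookkeeping is simple enough to sketch in a few lines of Python. This is my own reconstruction of the method just described, not the code actually used, and the player numbers are invented:

```python
# A sketch of the paired-seasons comparison: for every player with seasons
# at both ages, take the difference in runs created per 27 outs, and
# average the differences weighted by the smaller season's batting outs.

def weighted_age_delta(pairs):
    """pairs: one tuple per player with seasons at both ages:
    (rc27 at age N, rc27 at age N+1, outs at age N, outs at age N+1).
    Returns the weighted average change in runs created per 27 outs."""
    total_delta = 0.0
    total_weight = 0.0
    for rc_n, rc_n1, outs_n, outs_n1 in pairs:
        w = min(outs_n, outs_n1)            # weight by the smaller season
        total_delta += (rc_n1 - rc_n) * w   # positive = improvement
        total_weight += w
    return total_delta / total_weight

# Three hypothetical players going from age 25 to age 26:
pairs = [(4.8, 5.1, 400, 380),
         (5.5, 5.3, 300, 350),
         (3.9, 4.2, 150, 200)]
print(round(weighted_age_delta(pairs), 2))   # about +0.12
```

Weighting by the smaller of the two seasons keeps a player who barely played in one of the years from counting as much as a full-time regular.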

Here are the results for hitters, by age. The numbers are the change in "runs created per 27 outs," adjusted for league offense. The large number to the right is the number of batting outs in the group. I've left out the extreme ages with fewer than 200 batting outs.

18-19 ... +0.75 ... 1079
19-20 ... +0.81 ... 6303
20-21 ... +0.31 ... 24915
21-22 ... +0.14 ... 73503
22-23 ... +0.28 ... 144312
23-24 ... +0.09 ... 238990
24-25 ... +0.13 ... 332481
25-26 ... +0.05 ... 408934
26-27 ... –0.12 ... 439207
27-28 ... –0.08 ... 432988
28-29 ... –0.09 ... 408142
29-30 ... –0.20 ... 377018
30-31 ... –0.12 ... 330300
31-32 ... –0.25 ... 276881
32-33 ... –0.24 ... 226988
33-34 ... –0.25 ... 176198
34-35 ... –0.40 ... 137175
35-36 ... –0.32 ... 95257
36-37 ... –0.33 ... 66625
37-38 ... –0.60 ... 40682
38-39 ... –0.26 ... 27436
39-40 ... –0.49 ... 15966
40-41 ... –0.73 ... 9027
41-42 ... –0.41 ... 4848
42-43 ... –0.89 ... 1739
43-44 ... –0.24 ... 917
44-45 ... –0.94 ... 412
45-46 ... –0.48 ... 251

The results are just a little bit different from conventional wisdom ... it's normally accepted that the peak of performance is at age 27, but this study seems to show it's 26 (but 27 isn't actually much different).

Other than that, it's exactly as you'd expect – decelerating improvement up to a certain age, near flatness for a few years, and accelerated decline after that. I should probably draw this as a graph, and as a chart of cumulative performance, but I'll leave it like this for now.

However, for pitching, the results are not so neat. Here's the chart – the numbers are the change in component ERA (note that for pitchers, unlike the batting table, positive numbers mean a *decline*, since the ERA went up), and outs pitched (thirds of innings):

18-19 ... +0.03 ... 1441
19-20 ... -0.07 ... 9272
20-21 ... +0.11 ... 32909
21-22 ... +0.13 ... 83914
22-23 ... -0.01 ... 162500
23-24 ... +0.15 ... 256394
24-25 ... +0.05 ... 339126
25-26 ... +0.08 ... 390041
26-27 ... +0.20 ... 396202
27-28 ... +0.16 ... 391072
28-29 ... +0.18 ... 351132
29-30 ... +0.18 ... 323144
30-31 ... +0.20 ... 274521
31-32 ... +0.26 ... 233170
32-33 ... +0.13 ... 193837
33-34 ... +0.24 ... 156676
34-35 ... +0.23 ... 122400
35-36 ... +0.22 ... 92301
36-37 ... +0.34 ... 66984
37-38 ... +0.13 ... 45787
38-39 ... +0.16 ... 34301
39-40 ... +0.19 ... 25753
40-41 ... +0.48 ... 17021
41-42 ... +0.08 ... 11922
42-43 ... +0.24 ... 5906
43-44 ... +0.61 ... 4369
44-45 ... +0.30 ... 2790
45-46 ... +0.58 ... 2002
46-47 ... +0.97 ... 865
47-48 ... +0.53 ... 385

Only in two cases – 19-20 and 22-23 – do pitchers, as a group, actually improve. At all other ages, pitchers get worse from one year to the next.

However, the younger pitchers do seem to decline less than the older pitchers, as you'd expect. If you were to subtract 0.16 from all the Component ERAs in the table, every age up to 25-26 would be an improvement, and 19 of the 22 after that would be declines.

Still, I don’t understand why pitching should decline almost every year. A few possibilities:

1. I made a mistake in the calculations.

2. Unlike batters, pitchers at any age run the risk of a complete loss of effectiveness. And so the small decline at age 20, for instance, is a combination of 90% of pitchers improving by 0.20 runs per game, and 10% of pitchers declining by more than 2.00 runs per game.

3. There is asymmetry in the measurements. In hitting, a bad decline might be from 5 runs per game to 3. In pitching, a bad decline might be from 4 runs per game to 8. So it could be that the aging pattern is the same, but a bad pitching season could be a very large number, which skews the results.

4. It could be that because pitching is close to a one-dimensional physical skill, young pitchers are, in one sense, at their peak when young, and their entire career is a decline. This is somewhat supported by the fact that there are more young pitchers than young hitters, at least as measured by outs.
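Possibilities 2 and 3 come down to the same arithmetic point: a group average can show a decline even when most of the group improves, if the few decliners decline by a lot. A tiny made-up example (all numbers invented for illustration):

```python
# Nine pitchers improve their component ERA by 0.20; one loses
# effectiveness and worsens by 2.00. The group average still declines.
changes = [-0.20] * 9 + [2.00]
avg = sum(changes) / len(changes)
print(round(avg, 2))   # +0.02: a net decline, even though 90% improved
```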

What do you think? I'm at a loss to explain what's going on.


Tuesday, July 29, 2008

Wanted: pointers to baseball aging studies

Next week, at the American Statistical Association convention in Denver, there will be a sabermetrics-related session. It's called "What Can Statistical Methods Tell Us About Steroid Use and Its Effects Among Major League Baseball Players?", and it's partly in response to some of the analyses of Roger Clemens' career that have recently got some press.

I have been invited as a panelist, and will be giving a short presentation on what sabermetrics knows about aging in baseball. Most of what I know comes from Bill James' long essay at the back of the 1982 Abstract. I'm also aware of some of J.C. Bradbury's work over at his Sabernomics blog, and the recent JQAS article by Ray C. Fair.

Does anyone know of other studies? Any help would be appreciated.

P.S. In another session, there will be three papers presented on home-field advantage ... I'm not involved with those, but I'm looking forward to attending.


Friday, July 25, 2008

Golf numbers: comparing amateurs to pros

Last Monday, the New York Times described some interesting findings in golf sabermetrics.

The analysis, by golf researcher Mark Broadie (who does research for the PGA), describes and quantifies some of the differences between amateur golfers and the pros. So if you're looking to compare Tiger Woods to Phil Mickelson, Broadie's work probably won't help you quite yet. But it's interesting nonetheless.

Broadie had the PGA database with which to analyze pro scores. For amateur scores, he got players at a local course to log all their shots. He wound up with 43,000 amateur strokes, which, I suppose, is about 400-500 rounds.

He then analyzed the entire database to break down some of the score differences. For instance, at what distance is there a 50% chance of sinking a putt? The professionals break even on 8-foot putts, but, as you would imagine, it's shorter for amateurs. The breakdown by handicap:

8 feet: Pros
6 feet: Amateurs with handicap of 0-9 [I'll call these "A" amateurs]
5 feet: Amateurs with handicap of 10-19 ["B"]
4 feet: Amateurs with handicap of 20-36 ["C"].

In yardage off the tee:

279 yards: Pros
248 yards: Group A amateurs
237 yards: Group B amateurs
216 yards: Group C amateurs

And, a very interesting statistic: when hitting from 100 to 150 yards away from the hole, what percentage of that distance is left after the stroke?

5.6%: Pros
8.7%: Group A amateurs
12.0%: Group B amateurs
17.3%: Group C amateurs

And so Broadie suggests a way to see if your short game needs more help than your long game: figure out your own percentage, and compare it to the figure for your handicap group. For instance, suppose your handicap is 3, which puts you in Group A. If your percentage is higher than 8.7, your short game is worse than that of your peers (which means your long game must be relatively better).
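The bookkeeping for that self-test is trivial; here's a sketch. The shot distances are invented, and I'm assuming a simple average of the per-shot percentages is close enough to Broadie's measure:

```python
def short_game_pct(shots):
    """shots: (start_yards, remaining_yards) for each approach shot hit
    from 100-150 yards out. Returns the average percentage of the
    original distance still left after the stroke."""
    pcts = [remaining / start * 100.0 for start, remaining in shots]
    return sum(pcts) / len(pcts)

# A hypothetical handful of approach shots:
shots = [(120, 9), (145, 15), (105, 8)]
print(round(short_game_pct(shots), 1))   # 8.5, just under the Group A benchmark of 8.7
```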

There are more tidbits: check out the entire article.

HT: Bob Timmermann


Sunday, July 20, 2008

Clutch hitting: why do some studies find it, and some not?

A couple of weeks ago, I listed a bunch of clutch hitting studies that were unable to find any evidence of clutch hitting. They found that some players did indeed hit better in the clutch, but on almost exactly the scale you'd expect by chance.

However, there were a few other studies that *did* find some clutch talent (albeit in very small quantities). Andy Dolphin found some here, and Tom Tango found some here.

Why the difference? I'm not sure; Dolphin and Tango didn't publish the details of their studies. But one difference is that the "couldn't find any" studies were based on batting average and OPS, while Dolphin's study was based on on-base percentage.

So maybe the clutch effect is present in walks only? Walks are a big part of OBP, but a smaller part of OPS, and no part of BA at all. Maybe pitchers do more "pitching around" in clutch situations, and some players are better at laying off those pitches than others?

I don't know if that's the case, but it's one possibility. We'd probably be able to answer the question with more authority if one of these authors were to repeat exactly the same study, but using the metric used by one of the other authors.

Anyone have any other suggestions for why we might have these different results?


Tuesday, July 15, 2008

Slate understands sabermetrics

Great, great article on Slate explaining the sabermetric view of Derek Jeter's defense. Tango is quoted, as is his amazing Hardball Times study on Jeter's fielding (which I think can be found in the printed book only).

I'm not sure why there are so few mainstream articles discussing sabermetric findings ... my theory is that old-school sportswriters aren't interested in objective knowledge, so these articles have to come from outside. And there aren't that many "outside" writers about sports.


Friday, July 11, 2008

A new 2007-08 streakiness study

If you're sick of clutch hitting studies, you're probably *really* sick of streakiness studies. But here's a new one from "Eric Karros" of "Sons of Steve Garvey."

He found that after winning one or two consecutive games, teams were slightly more likely to win the next. But after winning three, four, or five consecutive games, they won the subsequent game far less often than you'd expect from their record.

Bottom line: like many other previous studies, no evidence of "momentum" or a "hot hand."

Some of this might be the pitching rotation: teams that win four in a row are likely to have done it with their top four starters, and their fifth starter is a worse pitcher, so more likely to lose. But that doesn't explain the five-in-a-row effect.

Hat tip: Carl Bialik


Sunday, July 06, 2008

Do teams play worse after a time zone change? Part IV

I realized I made a mistake in my last two posts on the time-zone study … I had three of the teams in the wrong time zone. The updated results still aren't significant, but I'm going to go back and correct the numbers in the previous two posts.

I found this out because Dr. Winter, the study's author, was kind enough to send his own results for comparison. These are the home team's record with various amounts of "circadian advantage". (The circadian advantage occurs when one team has less time-zone lag than the other, where the lag is the number of time zones crossed, minus the number of days since the crossing. So if you flew from Seattle to Chicago yesterday, you have a disadvantage of 1 – you crossed 2 time zones, but have had only 1 day to recover.)
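Under that definition, the bookkeeping looks like this (my own sketch; I'm assuming the lag bottoms out at zero once a team has had enough days to recover):

```python
def circadian_lag(zones_crossed, days_since_crossing):
    """Residual jet lag: time zones crossed minus days available to
    recover, floored at zero once the team is fully adjusted."""
    return max(zones_crossed - days_since_crossing, 0)

def circadian_advantage(home_lag, visitor_lag):
    """Positive numbers mean the home team has the advantage."""
    return visitor_lag - home_lag

# Seattle to Chicago yesterday: 2 zones crossed, 1 day to recover.
lag = circadian_lag(2, 1)
print(lag)                           # 1
print(circadian_advantage(0, lag))   # 1: a fully adjusted home team has the edge
```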

Here are Dr. Winter's numbers:

Home record/3-hour circadian advantage: 77-48 (.616)
Home record/2-hour circadian advantage: 487-426 (.533)
Home record/1-hour circadian advantage: 1438-1204 (.544)
Home record/0 circadian advantage: 10207-8872 (.535)
Home record/1-hour circadian disadvantage: 577-465 (.554)
Home record/2-hour circadian disadvantage: 152-133 (.533)
Home record/3-hour circadian disadvantage: 15-20 (.429)

The only sign of an effect is in the 3-hour cases. Home teams that had just got back from the other coast went only .429; when the *visiting* team had just arrived from the other coast, the home team beat them up to the tune of .616. However, neither of those two results is statistically significant – both are between 1 and 2 standard deviations from the mean.

My numbers are still a bit different from Dr. Winter's, but not by much – the conclusions are the same. And Dr. Winter did say that these numbers have been adjusted since the original study, and that the analysis is continuing.

For the record, here are the time zones I used (Retrosheet abbreviations):

3 hours from east: LAA, OAK, SFN, LAN, SDN, SEA, ARI, ANA
2 hours: COL
0 hours: all others

I'm now going to go back and update the other two posts with the correct numbers.


Wednesday, July 02, 2008

Clutch hitting: a new study from Pete Palmer and Dick Cramer

Of the many excellent presentations at last weekend's SABR convention in Cleveland, one of my favorites was the study by Pete Palmer and Dick Cramer, on clutch hitting. I have to admit that the subject has been done to death (notably by Palmer and Cramer themselves). And there are probably a lot of people like Chris Jaffe, who is "sooooooo very tired of clutch hitting studies."

So this study could be accused of beating a dead horse – other studies, I think, have already convincingly shown that clutch talent doesn't exist – but, on the other hand, on a controversial issue like clutch, you can never have too much evidence.

More important, the highly-regarded "The Book" (along with a previous study by author Andy Dolphin) does believe there is some evidence for clutch. So the debate isn't completely settled.

That's why I think this study does add valuable evidence to the pile.

Anyway, many thanks to Pete and Dick, who have allowed me to post their presentation slides, and two writeups of their findings.


Let me start with a recap of my three favorite classic clutch studies, before getting to the new one.

(I will also point out that "clutch hitter" doesn't mean a player who hits well in the clutch – it means a hitter who performs *better* in clutch situations than normal, relative to the rest of the league.)

Dick Cramer, 1977

First, there was Dick Cramer's groundbreaking study from 1977. Dick looked at all players in the 1969 and 1970 seasons. He figured the amount by which they increased their team's win probabilities over the season, and compared that to what you'd expect from a raw measure of run performance based on their batting line. The difference was their observed clutchness; a clutch player would have created more wins than his raw batting statistics would suggest, because his hits would have come when they were more important.

Comparing 1969 clutchness to 1970 clutchness, Dick found an r-squared of .038 for National League players, and .055 for American League players. Dick's conclusion was that, with such a small correlation, clutch hitting was not shown to exist.

As I write this, it now occurs to me that these aren't actually that small – the r is about +/- 0.2 in both cases. The study doesn't actually say whether the correlation was positive or negative (Dick, if you're reading this, which was it?). Of course, if it had been negative, that would be even stronger evidence against clutch talent.

It's this study, I think, that Bill James criticizes in his famous "Underestimating the Fog" essay (.pdf). Bill argues that Dick didn't actually prove clutch hitting doesn't exist. That's probably true, but it's pretty good evidence that, if it does exist, it's weak. Assuming the correlation is positive, it means that even a player who hit 100 points better in the clutch in 1969 would be expected to hit only 20 points better in the clutch in 1970.
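That last sentence is just regression to the mean; the arithmetic, for anyone checking:

```python
# From Dick's r-squared back to r, and what r implies for repeating:
# the best guess for next year's clutch deviation is r times this year's.
import math

r = math.sqrt(0.038)       # the NL r-squared as an r
print(round(r, 3))         # 0.195
print(round(r * 100, 1))   # 19.5: the "about 20 points" figure
```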

Pete Palmer, 1990

In the March 1990 issue of By the Numbers (.pdf, see page 6), Pete Palmer tackled the question a different way. He noted that even if there were no such thing as clutch talent, some players would *appear* to be clutch just because of dumb luck. He then figured what the distribution should be if it were all just luck, and compared it to the actual distribution.

If the two were the same, that would be evidence that clutch hitting is nothing more than random chance. If the two were different, that would show that clutch talent actually exists, over and above the random effect.

Consider the analogy of coin flips. A fair coin lands eight consecutive heads 1 time in 256, but a "clutch" coin with a .600 heads rate does it about 4 times in 256. So if 10% of coins were clutch, you'd see runs of eight heads about a third more often than fair coins alone would produce.
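Checking the arithmetic (the .600 heads rate and the 10% share are the hypothetical figures above):

```python
p_fair = 0.5 ** 8        # a fair coin: 8 straight heads, exactly 1 in 256
p_clutch = 0.6 ** 8      # a .600 "clutch" coin
p_mix = 0.9 * p_fair + 0.1 * p_clutch   # a population with 10% clutch coins

print(round(p_fair * 256, 2))     # 1.0
print(round(p_clutch * 256, 2))   # 4.3
print(round(p_mix * 256, 2))      # 1.33, a detectable excess over pure chance
```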

So clutch hitting talent would certainly show itself if it existed in any significant quantity.

But when Pete looked at the distribution of how players hit in the clutch, he found it was perfectly consistent with a normal distribution. For instance, out of 330 random numbers from a normal distribution, you'd expect about one to be more than 3 SD above or below the mean. In real life, there was indeed exactly one – Tim Raines (.352 clutch, .296 non-clutch).

If clutch hitting were indeed a real skill, there would be a lot more than just one player 3 SD from the mean.

Because Pete found no "extra" extreme results than what would be expected by chance, his conclusion was that clutch hitting didn't appear to exist.

Tom Ruane, 2005

In this exhaustive review of a few decades' worth of Retrosheet data, Tom Ruane looks at all players' clutch hitting stats, runs a random simulation as if they were all non-clutch hitters, and finds the distributions match almost exactly. (The relevant section can be found by going to the study and searching for "Is The Data Random?")

His analysis is very similar to Pete's 1990 work, but with a much larger database.

Cramer and Palmer, 2008

Finally, we come to Cramer and Palmer's new study. It's a bit of a cross between Dick's 1977 study and Tom's study – it looks at 50 years' worth of Retrosheet data, but uses the "win probabilities" method.

And there are several sub-studies within it.

The first study calculated clutch performance for each of ten levels of leverage – highest clutch, with the game most on the line, all the way down to lowest clutch, with little chance of changing the outcome (like in a 15-0 ninth inning). Then, it calculated performance for 10 different random subsamples of the games (based on the date).

Comparing the two distributions, it turned out that the distribution of "clutchiness" was almost exactly the same as the distribution of "datiness". Since datiness is pure chance, this suggests that clutchiness is just as random.

The other substudies were:

-- looking at only the 10% highest-leverage situations, there were almost exactly as many players 2 SD and 3 SD away from the mean as if clutch were random;

-- looking at clutch performance for the 897 players with at least 3000 PA in the last 50 years, the SD was about 3 runs of clutch per 500 PA. A random simulation gave 2.5 runs. Pete and Dick write that "it may be that real life variation could be a little different from the simulated value, but the two are pretty close." My take is that the 0.5 runs is fairly significant – it means the SD of clutch would be about 1.66 runs (the square root of (3 squared minus 2.5 squared)). Still, that means the top 2.5% of players would only be about 3 runs better than average.

-- rerunning Dick's year-to-year correlation experiment gave an r of .002, which is very, very close to zero, both theoretically and practically.

-- finally, for rookies first entering the league, there was no improvement from their first at-bat (when they would presumably be very nervous) to their 100th at-bat (when they should be less nervous). While this doesn't speak to the clutch issue directly, it does serve as more evidence that players' performance doesn't seem to be affected by their personal stress level.
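For the 3000-PA substudy above, the step from the two SDs to the "1.66 runs" figure is the usual variance decomposition: if the observed spread is the sum of independent luck and talent, the variances add, so the talent SD is the square root of the difference of the squares.

```python
# sqrt(observed^2 - luck^2) = the implied SD of true clutch talent
import math

sd_observed = 3.0   # runs of clutch per 500 PA, real players
sd_luck = 2.5       # from the random simulation
sd_talent = math.sqrt(sd_observed ** 2 - sd_luck ** 2)
print(round(sd_talent, 2))   # 1.66
```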


There are lots of other studies on clutch hitting that I haven't mentioned here; Cy Morong keeps an updated list of them. As I mentioned, Andy Dolphin did find evidence of significant clutch talent. "The Book," which Dolphin co-authored, found evidence of clutch talent with an SD equivalent to about 8 points of OBP. Those are the only studies I remember seeing that actually found something non-zero.

I'd be interested in seeing what Dolphin (and co-authors Tom Tango and Mitchel Lichtman) think about Pete and Dick's recent work.


While I'm on the subject of clutch, when Bill's "Underestimating the Fog" came out a couple of years ago, I responded with a study of my own. Bill disagreed with what I did, and we had a bit of a discussion on the SABR message board. I'm linking to it here, because I don't think it's online anywhere else.

-- Here's Bill's original "Underestimating the Fog" essay.
-- Here's my response in "By the Numbers" (.pdf, see page 7).
-- Here's Bill's response to me, called "Mapping the Fog".
-- And here's my response to Bill's response.
