How many runs are created by good baserunning?
There's a nice paper on baserunning in the latest issue of JQAS, "Using Simulation to Estimate the Impact of Baserunning Ability in Baseball." It's by Ben Baumer, the New York Mets' stats guy.
Baumer set out to quantify baserunning skill, in terms of runs. Specifically, he considered these seven skills:
-- advancing first to third on a single
-- advancing first to home on a double
-- advancing second to home on a single
-- beating out a DP attempt on a ground out
-- stealing second
-- stealing third
-- tagging up on a fly ball when on second or third
He created a (Markov) simulation using 2005-2007 league-average results for each of the seven skills, and verified that his model came close to actual league runs scored.
Then, he substituted actual team lineups and, for every player, used that player's actual baserunning percentages for each of the seven situations. There were two probabilities for each situation: the probability of trying for the extra base (for double plays, this is the probability of there being a force play on the runner on first with less than two outs), and the probability of success given that an attempt was made.
For each team, he then ran the same simulation, but using league-average baserunning. The difference is an estimate of how many runs the team's players gained (or lost) with their baserunning.
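To make the method concrete, here's a toy Monte Carlo sketch of that comparison. It models only one of the seven skills (first-to-third on a single), and every event rate, function name, and simplification below is my own illustrative assumption, not Baumer's actual model or his 2005-2007 inputs:

```python
import random

OUT, SINGLE, DOUBLE = 0, 1, 2
P_EVENT = [0.68, 0.24, 0.08]  # assumed P(out), P(single), P(double) per PA

def sim_inning(p_try, p_safe, rng):
    """One simplified half-inning.  The only baserunning decision modeled
    is the runner on first trying for third on a single."""
    outs, runs = 0, 0
    on1, on2, on3 = False, False, False
    while outs < 3:
        ev = rng.choices([OUT, SINGLE, DOUBLE], weights=P_EVENT)[0]
        if ev == OUT:
            outs += 1
        elif ev == SINGLE:
            runs += on3                          # runner on third scores
            new1, new2, new3 = True, False, on2  # 2B runner holds at third
            if on1:
                if not new3 and rng.random() < p_try:
                    if rng.random() < p_safe:
                        new3 = True              # safe at third
                    else:
                        outs += 1                # thrown out trying
                else:
                    new2 = True                  # stops at second
            on1, on2, on3 = new1, new2, new3
        else:                                    # DOUBLE
            runs += on2 + on3                    # runners on 2B and 3B score
            on1, on2, on3 = False, True, on1     # 1B runner stops at third
    return runs

def baserunning_runs(team_try, team_safe, lg_try, lg_safe,
                     n=50_000, seed=1):
    """Runs per 1,458-inning season gained (or lost) versus league-average
    baserunning -- the paper's comparison, in miniature."""
    rng = random.Random(seed)
    team = sum(sim_inning(team_try, team_safe, rng) for _ in range(n))
    rng = random.Random(seed)  # re-seed so both runs see similar innings
    lg = sum(sim_inning(lg_try, lg_safe, rng) for _ in range(n))
    return 1458 * (team - lg) / n
```

With those assumed rates, a team that always tries for third and always makes it shows up as gaining runs over a league that tries 28 percent of the time and succeeds 95 percent of the time; a team that tries every time but gets thrown out half the time shows up as losing runs.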
The top three and bottom three:
-12.7 Red Sox
-20.8 White Sox
Baumer concludes that most teams should be within 25 runs of average baserunning.
But now he wants to figure out, in theory, how many runs a really great baserunning team would gain, and how much a really bad team would lose. He tries a bunch of different selection criteria for "best" and "worst." The results, simplified a bit:
+41.0 runs, -54.6 runs -- high/low attempt rates
+39.4 runs, -35.1 runs -- high/low success rates
+68.4 runs, -42.5 runs -- high/low combination of attempts/successes
The +68 lineup consisted of: Joey Gathright, Willy Taveras, Jose Reyes, Willie Harris, Chone Figgins, Nook Logan, Josh Barfield, Pablo Ozuna, and Juan Pierre. The -54 lineup was Bengie Molina, Mike Piazza, Josh Bard, Bill Mueller, Frank Thomas, Olmedo Saenz, Ryan Garko, Jay Gibbons, and Toby Hall.
As I said, I really like this paper; it asks an interesting and well-defined question and answers it well. Moreover, it's written for readers who know baseball a bit. It does use more mathematical notation than is necessary for sabermetricians, but given that it's an academic paper, and given that the notation is not overdone and clearly explained, I'd have to say that it's very well done.
The one criticism I have is that, as far as I can tell, Baumer used actual raw success rates and didn't regress them to the mean at all. That means that while the results wind up accurate in terms of what the actual run contribution was, they're exaggerated estimates of the underlying skill of the players involved. If you're trying to project 2010, there's probably no way to estimate, in advance, what any given set of baserunners will do. While the "combination" group added 68.4 runs a season from 2005 to 2007, they'd regress to the mean in 2009-2011 by some amount. What's that amount? We don't really know.
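For what it's worth, the standard fix is to shrink each observed rate toward the league mean before feeding it into the simulation. A minimal sketch, where the amount of regression (expressed as phantom league-average opportunities) is a pure placeholder, since, as noted, nobody knows the right number for baserunning:

```python
def regress_to_mean(observed, league, n_opps, regression_n=200):
    """Shrink an observed success rate toward the league rate by mixing in
    regression_n phantom league-average opportunities.  regression_n=200
    is an arbitrary placeholder, not an estimated value."""
    return (observed * n_opps + league * regression_n) / (n_opps + regression_n)

# A runner who went first-to-third 95% of the time in 50 tries, in a league
# at 70%, gets credited with a 75% "true" rate under this assumption:
# regress_to_mean(0.95, 0.70, 50) -> 0.75
```

The fewer the opportunities, the harder the estimate collapses toward league average, which is exactly why the raw-rate extremes in the paper overstate true talent.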
Oh, and one useful point that I'll use in future: for leagues that score 0.531 runs per inning, the variance of runs per inning is 1.125. I've always used 1.000 as an estimate, based on some research I did on the 1988 AL a long time ago, but I think that league scored only .5 runs per inning. Also of note: a simulation that assumes average pitching and an average lineup has a slightly smaller variance: around 1.1 instead of 1.125. That's obviously because the pitching doesn't vary in the simulation, only the hitting.
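The bookkeeping behind a figure like that is simple enough to show. Here's the mean/variance calculation on a made-up runs-per-inning distribution (the probabilities below are illustrative, not real league data):

```python
# Assumed distribution of runs scored in an inning (illustrative only).
dist = {0: 0.72, 1: 0.15, 2: 0.07, 3: 0.035, 4: 0.015, 5: 0.01}

mean = sum(r * p for r, p in dist.items())               # expected runs/inning
var = sum(p * (r - mean) ** 2 for r, p in dist.items())  # variance of runs/inning
# mean -> 0.505, var -> about 0.98
```

Plug in an empirical distribution of innings from real play-by-play data and the same two lines give you the league's actual runs-per-inning variance.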