Wednesday, May 13, 2009

How many runs are created by good baserunning?

There's a nice paper on baserunning in the latest issue of JQAS, "Using Simulation to Estimate the Impact of Baserunning Ability in Baseball." It's by Ben Baumer, the New York Mets' stats guy.

Baumer set out to quantify baserunning skill, in terms of runs. Specifically, he considered these seven skills:

-- advancing first to third on a single
-- advancing first to home on a double
-- advancing second to home on a single
-- beating out a DP attempt on a ground out
-- stealing second
-- stealing third
-- tagging up on a fly ball when on second or third

He created a (Markov) simulation using 2005-2007 league-average results for each of the seven skills, and showed that his model came close to actual league runs scored.

Then, he substituted actual team lineups, and, for every player, used that player's actual baserunning percentages for each of the seven situations. There were two probabilities for each situation: the probability of trying for an extra base (for double plays, this is the probability of there being a force play on the runner on first with fewer than two outs), and the probability of success given that an attempt was made.

For each team, he then ran the same simulation, but using league-average baserunning. The difference is an estimate of how many runs the team's players gained (or lost) with their baserunning.
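To make the "simulate with the team's rates, simulate with league-average rates, take the difference" idea concrete, here's a toy sketch in Python. It is not Baumer's model: every plate appearance is just a single or an out, the only skill modeled is taking an extra base on a single, and the function names, the 0.3 single rate, and the attempt/success rates are all made up for illustration.

```python
import random

def try_extra_base(rng, p_attempt, p_safe):
    """Return 'hold', 'safe', or 'out' for one runner on one single."""
    if rng.random() >= p_attempt:
        return "hold"
    return "safe" if rng.random() < p_safe else "out"

def simulate_inning(rng, p_single, p_attempt, p_safe):
    """Toy half-inning: each batter singles or makes an out (an assumption)."""
    outs, runs = 0, 0
    first = second = third = False
    while outs < 3:
        if rng.random() >= p_single:
            outs += 1              # batter is out, runners hold
            continue
        runs += int(third)         # runner on third always scores on a single
        third = False
        if second:                 # runner on second may try to score
            second = False
            result = try_extra_base(rng, p_attempt, p_safe)
            if result == "safe":
                runs += 1
            elif result == "out":
                outs += 1
            else:
                third = True       # held at third
        if first and outs < 3:     # runner on first may try for third
            first = False
            if third:
                second = True      # third base is occupied, so he stops at second
            else:
                result = try_extra_base(rng, p_attempt, p_safe)
                if result == "safe":
                    third = True
                elif result == "out":
                    outs += 1
                else:
                    second = True
        first = True               # the batter takes first
    return runs

def runs_per_inning(p_attempt, p_safe, p_single=0.3, n=20000, seed=0):
    """Average runs over n simulated innings, with a fixed seed."""
    rng = random.Random(seed)
    total = sum(simulate_inning(rng, p_single, p_attempt, p_safe) for _ in range(n))
    return total / n

def baserunning_gain(team_rates, league_rates, **kwargs):
    """Runs per inning gained by team_rates over league_rates.

    Each argument is an (attempt rate, success rate) pair.
    """
    return runs_per_inning(*team_rates, **kwargs) - runs_per_inning(*league_rates, **kwargs)

# Hypothetical rates: an aggressive team against a made-up league average.
print(baserunning_gain((0.6, 0.95), (0.4, 0.9)))
```

The real simulation tracks all seven skills, real batting events, and player-by-player rates, but the structure is the same: two runs of the same simulator that differ only in the baserunning inputs.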

The top three and bottom three:

+21.1 Mets
+18.0 Yankees
+14.7 Rockies

-12.3 Marlins
-12.7 Red Sox
-20.8 White Sox

Baumer concludes that most teams should be within 25 runs of average baserunning.

But now he wants to figure out, in theory, how many runs a really great baserunning team would gain, and how much a really bad team would lose. He tries a bunch of different selection criteria for "best" and "worst." The results, simplified a bit:

+41.0 runs, -54.6 runs -- high/low attempt rates
+39.4 runs, -35.1 runs -- high/low success rates
+68.4 runs, -42.5 runs -- high/low combination of attempts/successes

The +68 lineup consisted of: Joey Gathright, Willy Taveras, Jose Reyes, Willie Harris, Chone Figgins, Nook Logan, Josh Barfield, Pablo Ozuna, and Juan Pierre. The -54 lineup was Bengie Molina, Mike Piazza, Josh Bard, Bill Mueller, Frank Thomas, Olmedo Saenz, Ryan Garko, Jay Gibbons, and Toby Hall.

As I said, I really like this paper; it asks an interesting and well-defined question and answers it well. Moreover, it's written for readers who know baseball a bit. It does use more mathematical notation than is necessary for sabermetricians, but given that it's an academic paper, and given that the notation is not overdone and is clearly explained, I'd have to say that it's very well done.

The one criticism I have is that, as far as I can tell, Baumer used actual raw success rates and didn't regress to the mean at all. That means that while the results wind up accurate in terms of what the actual run contribution was, they are exaggerated estimates of the actual skill of the players involved. If you're thinking about 2010, there's probably no way to estimate, in advance, what any given set of baserunners will do. While the "combination" group added 68.4 runs a season from 2005 to 2007, they'd regress to the mean in 2009-2011 by some amount. What's that amount? We don't really know.
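For anyone who wants to see what regressing a raw rate looks like mechanically, here's a minimal sketch of one common approach: add some number of league-average "phantom" attempts to the player's record. The function name and the constant k are hypothetical. The right k for baserunning is exactly the amount we don't know, so it's left as an input.

```python
def regress_rate(successes, attempts, league_rate, k):
    """Shrink an observed success rate toward the league rate.

    k is a hypothetical number of league-average attempts mixed in with
    the player's own record. k = 0 returns the raw rate; a larger k pulls
    the estimate closer to league_rate.
    """
    if attempts + k <= 0:
        raise ValueError("need at least one real or phantom attempt")
    return (successes + k * league_rate) / (attempts + k)

# Made-up example: 8 for 10 going first to third, league rate 60%.
raw = regress_rate(8, 10, 0.6, k=0)
shrunk = regress_rate(8, 10, 0.6, k=10)
print(raw, shrunk)
```

The regressed rate always lands between the raw rate and the league rate, and the fewer attempts a player has, the further it gets pulled.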

Oh, and one useful point that I'll use in future: for leagues that score 0.531 runs per inning, the variance of runs per inning is 1.125. I've always used 1.000 as an estimate, based on some research I did on the 1988 AL a long time ago, but I think that league scored only 0.5 runs/inning. Also of note: a simulation that assumes average pitching and an average lineup has a variance a bit smaller: around 1.1 instead of 1.125. That's obviously because the pitching doesn't vary in the simulation, only the hitting.
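Here's the kind of thing a per-inning variance is useful for, as a small sketch. It assumes innings are independent with the same variance, which is a simplification of mine and not something from the paper, and the function name is made up.

```python
import math

def sd_of_total_runs(var_per_inning, innings):
    """Standard deviation of total runs over a number of innings.

    Assumes innings are independent with equal variance, so the
    variances simply add.
    """
    return math.sqrt(var_per_inning * innings)

# One nine-inning game, and a 162-game season of nine-inning games.
print(sd_of_total_runs(1.125, 9))
print(sd_of_total_runs(1.125, 162 * 9))
```

Under that assumption, a variance of 1.125 works out to a standard deviation of about 40.5 runs over a full season.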



At Thursday, May 14, 2009 1:13:00 AM, Blogger Phil Birnbaum said...

By the way, when I say "league average," I mean "MLB average." Just force of habit.

At Thursday, May 14, 2009 2:29:00 AM, Blogger Tangotiger said...

I get Ben's numbers... if I exclude basestealing:

WRT to baserunning, see post 25:

Markov

Just on basestealing, they should be +50. And if you really had a team of 9 speedsters, you can bet your a$$ they would steal far more, like the go-go 80s.

At Thursday, May 14, 2009 8:46:00 AM, Blogger Unknown said...

Is there a way that someone knows of to read the J. of Quant. Anal. in Sports if my institution does not have a subscription?
They do not offer individual subscriptions on their website, otherwise I'd just get that. Thanks!

At Thursday, May 14, 2009 9:50:00 AM, Blogger Phil Birnbaum said...

Bill: there's a "guest" policy and a "guest login" where you can download without a subscription.

At Thursday, May 14, 2009 12:14:00 PM, Anonymous Guy said...

For those of us who haven't read the article yet: How does Baumer's approach differ from Dan Fox's?

At Thursday, May 14, 2009 12:26:00 PM, Blogger Phil Birnbaum said...

I'd have to read Dan's series again, but I think (and someone can correct me) that Dan classified outs by what kind they were and where they were hit (ground outs to the left side of the infield, say), and used the 24-state matrix to calculate values.

Baumer treated all outs/hits the same, and used a simulation instead of the theoretical values.

I prefer Fox's approach, because it's reproducible and you don't have to check assumptions in someone else's simulation. But verifying a previous finding using a different approach has value too.

