Sabermetric Research: A Diamond Mind simulation as baseball strategy research

A science column from Alan Schwarz a couple of weeks ago investigates the effects of various baseball strategies, using a simulation.

To check out batting orders, Schwarz got Luke Kraemer at Diamond Mind to simulate two sets of 100 seasons of the 2008 Yankees. In one set, A-Rod batted fourth; in the other set, he batted ninth. The difference was 42 runs; the regular Yanks scored 789 runs, while the A-Rod-at-the-bottom-of-the-order Yanks scored only 747.

Schwarz doesn't tell us how he checked intentional walks, but finds that they are a bad strategy, costing five runs per season. That's not a very useful result; there are times when the IBB makes more sense, and times when it makes less sense. Which did Diamond Mind simulate?

Stolen Bases: Diamond Mind took the 2008 Rays and the 2008 A's, and reversed their respective propensities to steal ("switched their mind-sets," is what the article says). The A's dropped by 20 runs, but the Rays *improved* by 47 runs, "suggesting that perhaps the Rays were running too often in real life."

As it turns out, the real Tampa Bay team stole 142 bases and were caught only 50 times, for a 74% success rate; that should put them well in the black, compared to the rule of thumb that you need to be successful 67% of the time to break even. So I'm at a loss to explain the 47 run difference.
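To make the break-even arithmetic concrete, here is a minimal Python sketch. The run values in it (0.2 runs gained per successful steal, 0.4 runs lost per caught stealing) are illustrative assumptions, chosen only because they reproduce the 67% rule of thumb; they are not figures from the article or from Diamond Mind, and the function names are my own.

```python
def success_rate(sb, cs):
    """Fraction of steal attempts that succeeded."""
    return sb / (sb + cs)


def break_even_rate(sb_gain, cs_cost):
    """Success rate at which expected gains equal expected losses."""
    return cs_cost / (sb_gain + cs_cost)


def net_runs(sb, cs, sb_gain, cs_cost):
    """Rough net run value of a team's running game."""
    return sb * sb_gain - cs * cs_cost


# ASSUMED run values (hypothetical, not from the article):
# a caught stealing costs twice what a steal gains, which yields
# the 67% break-even rule of thumb.
SB_GAIN = 0.2
CS_COST = 0.4

# 2008 Rays totals as quoted in the post.
rays_sb, rays_cs = 142, 50

rate = success_rate(rays_sb, rays_cs)
threshold = break_even_rate(SB_GAIN, CS_COST)
net = net_runs(rays_sb, rays_cs, SB_GAIN, CS_COST)

print(f"success rate: {rate:.1%}")
print(f"break-even:   {threshold:.1%}")
print(f"net runs:     {net:+.1f}")
```

Under those assumed values, the Rays' running comes out a handful of runs ahead, which is nowhere near a 47-run loss.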

The only thing I can think of is a sample size issue. I think the SD of a team's runs scored in a single game is about 3. So the SD of a season's worth of runs is 3 times the square root of 162, or about 38 runs. The SD of the average of 100 seasons' worth is one-tenth of that, or about 4 runs. The SD of the difference between two 100-season averages is the square root of 2 times that, or about 5.4 runs.

But 47 runs is almost 9 standard deviations. So I'm still not sure what's going on.
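The sample-size arithmetic can be checked with a short Python sketch. It assumes a per-game SD of 3 runs (my rough guess, not a measured value), that games are independent of one another, and that the two 100-season sets are independent; the function name is hypothetical.

```python
import math


def sd_of_difference(sd_per_game, games, seasons):
    """SD of the difference between two independent averages,
    each taken over `seasons` simulated seasons of `games` games."""
    sd_season = sd_per_game * math.sqrt(games)   # one season's run total
    sd_mean = sd_season / math.sqrt(seasons)     # average over many seasons
    return math.sqrt(2) * sd_mean                # difference of two averages


# ASSUMED per-game SD of 3 runs; 162 games; 100 seasons per set.
sd_diff = sd_of_difference(3.0, 162, 100)
z = 47 / sd_diff  # the Rays' 47-run swing, in SD units

print(f"SD of difference: {sd_diff:.1f} runs")
print(f"47 runs is {z:.1f} SDs")
```

If the per-game SD were somewhat larger than 3, the 47-run gap would shrink in SD terms, but it would take an implausibly large value to bring it down to the range of random noise.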

Finally, the sacrifice bunt. When the simulation forced the bunt-avoiding Red Sox (27 SH in 2008, compared to the league-average 34) to bunt more often, they lost 19 runs. But when it made the bunt-loving Mets (73, league average 66) bunt less, the result was also a loss, this time of 15 runs. Schwarz concludes that the Mets' real-life bunting was better than the Red Sox', in that the Mets chose to bunt in more favorable situations. But weren't both of these numbers based on the simulation? If so, the real-life situations should make no difference.

If the comparisons, however, *were* based on real life, then we have sample size issues with the real-life sample, which is only one 162-game season, with an SD of about 38 runs. Maybe the 2008 Mets and Red Sox scored more or fewer runs than the simulation predicted because of luck? We should be able to tell by looking at Runs Created, but, for some reason, almost all teams undershot their RC estimate in 2008 (and their Base Runs estimate too, at least for the versions I tried).

Anyway, while I like the simulation method, I wish the results had been presented more clearly. As it stands, I'll stick to "The Book"'s conclusions on these issues of baseball strategy.

P.S. Here's what Tony LaRussa thinks of these results:

“There’s way too much importance given to what you can produce from a machine,” he said. “These are human beings, and I don’t think any computer is going to model that close to what we deal with at this level.”

Hat Tip: Daniel Hamermesh at Freakonomics
