Wednesday, April 22, 2009

Has Runs Created stopped working?

Does Runs Created not work any more?

The reason I ask is that, if you take a look at the 2008 AL and NL pages on Baseball Reference, you'll see that RC overestimated actual runs for 28 of the 30 teams. The average discrepancy was a huge +58 runs in the NL, and +19 runs in the AL.

To emphasize: that's not the average after removing the signs, that's the average *including* the signs. If half the teams had been +58 and the other half had been –58, the average would have been zero. It wasn't.

So what I'm saying is, Runs Created now appears to be biased too high.

This has been happening since the mid-90s. Here is the average team discrepancy by season:

1985 –4
1986 –1
1987 +2
1988 –5
1989 –5
1990 +4
1991 –7
1992 +7
1993 +0
1994 +8
1995 +7
1996 +7
1997 +19
1998 +15
1999 +19
2000 +19
2001 +18
2002 +19
2003 +19
2004 +27
2005 +25
2006 +27
2007 +24
2008 +26
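
For anyone who wants to reproduce the table, the computation is just an average of signed differences. Here's a sketch -- the file and column names are hypothetical, and it assumes you've already computed an RC estimate for each team-season:

```python
import pandas as pd

# Hypothetical input: one row per team-season, with actual runs ("R")
# and a Runs Created estimate ("RC") already computed.
df = pd.read_csv("team_seasons.csv")
df["diff"] = df["RC"] - df["R"]   # positive means RC overestimates

# Average *signed* discrepancy by season -- signs included, as described above
print(df.groupby("year")["diff"].mean().round(0))
```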

Now, we know that Runs Created is biased too high in higher run environments, so that might be part of it. But it can't be all of it: in the three seasons 1994 to 1996, there were 4.92, 4.84, and 5.03 runs per game, respectively. In 2005, there were only 4.59 runs per game, and in 2008, only 4.65 – yet the discrepancies were about three times as large.

Could it be that the mix of offensive events has changed? Maybe the proportions are different than they used to be (more walks per single, or something like that), and Runs Created doesn't work well for the new mix?

By the way, I tried Base Runs, using the first version found on page 18 here (.pdf), with X=.535; the discrepancies weren't as extreme, but the pattern was similar.
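
Base Runs has the general form A×B/(B+C) + D. I can't reproduce the PDF's exact weights here, so this sketch uses the widely-circulated basic version, with the B factor scaled by X as described:

```python
def base_runs(ab, h, tb, bb, hr, x=0.535):
    """Base Runs, basic form: A*B/(B+C) + D.

    These are the commonly quoted basic weights, which may not match
    the exact version on page 18 of the linked PDF.
    """
    a = h + bb - hr                                       # baserunners, excluding HR
    b = (1.4 * tb - 0.6 * h - 3.0 * hr + 0.1 * bb) * x    # advancement factor
    c = ab - h                                            # outs, roughly
    d = hr                                                # home runs score themselves
    return a * b / (b + c) + d
```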

Anyone know what's going on? Is this a well-known problem and I just missed it?

P.S. For the record, I think I'm using the "technical" version of Runs Created found on this Wikipedia page.
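
For reference, here's that technical version as I read it from the Wikipedia article – a minimal sketch, and if I've transcribed a coefficient wrong, the error is mine:

```python
def runs_created_technical(ab, h, tb, bb, ibb, hbp, sb, cs, sh, sf, gidp):
    """'Technical' Runs Created: RC = A * B / C."""
    a = h + bb - cs + hbp - gidp                               # times on base
    b = tb + 0.26 * (bb - ibb + hbp) + 0.52 * (sh + sf + sb)   # advancement
    c = ab + bb + hbp + sh + sf                                # opportunities
    return a * b / c
```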


Monday, April 20, 2009

A Diamond Mind simulation as baseball strategy research

A science column by Alan Schwarz from a couple of weeks ago investigates the effects of various baseball strategies using a simulation.

To check out batting orders, Schwarz got Luke Kraemer at Diamond Mind to simulate two sets of 100 seasons of the 2008 Yankees. In one set, A-Rod batted fourth; in the other set, he batted ninth. The difference was 42 runs; the regular Yanks scored 789 runs, while the A-Rod-at-the-bottom-of-the-order Yanks scored only 747.

Schwarz doesn't tell us how he checked intentional walks, but finds that they are a bad strategy, costing five runs per season. That's not a very useful result; there are times when the IBB makes more sense, and times when it makes less sense. Which did Diamond Mind simulate?

Stolen Bases: Diamond Mind took the 2008 Rays and the 2008 A's, and reversed their respective propensities to steal ("switched their mind-sets," is what the article says). The A's dropped by 20 runs, but the Rays *improved* by 47 runs, "suggesting that perhaps the Rays were running too often in real life."

As it turns out, the real Tampa Bay team stole 142 bases and were caught only 50 times, for a 74% success rate; that should put them well in the black, compared to the rule of thumb that you need to succeed about 67% of the time to break even. So I'm at a loss to explain the 47-run difference.

The only thing I can think of is a sample size issue. I think the SD of a team's runs scored in a single game is about 3. So the SD of a season's worth of runs is 3 times the square root of 162, or about 38 runs. The SD of the average of 100 seasons' worth is one-tenth of that, or about 3.8 runs. And the SD of the difference between two 100-season averages is the square root of 2 times that, or about 5.4 runs.
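
Here's that chain of arithmetic in one place (the per-game SD of 3 is my own rough figure):

```python
import math

sd_game = 3.0                            # rough SD of a team's runs in one game
sd_season = sd_game * math.sqrt(162)     # ~38 runs over a 162-game season
sd_avg100 = sd_season / math.sqrt(100)   # ~3.8 runs for a 100-season average
sd_diff = math.sqrt(2) * sd_avg100       # ~5.4 runs for the gap between two such averages

print(round(sd_season, 1), round(sd_avg100, 1), round(sd_diff, 1))  # 38.2 3.8 5.4
```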

But 47 runs is almost 9 standard deviations. So I'm still not sure what's going on.

Finally, the sacrifice bunt. When the simulation forced the bunt-avoiding Red Sox (27 SH in 2008, compared to the league-average 34) to do it more often, they lost 19 runs. But when it got the bunt-loving Mets (73, against a league average of 66) to do it less, the result was also a loss – 15 runs. Schwarz concludes that the Mets' real-life bunting was better than the Red Sox's – that they chose to bunt in more favorable situations. But weren't both these numbers based on the simulation? If so, the real-life situations should make no difference.

If the comparisons, however, *were* based on real life, then we have sample-size issues with the real-life sample, which is only 162 games, with an SD of about 38 runs. Maybe the 2008 Mets and Red Sox scored more or fewer runs than the simulation predicted because of luck? We should be able to tell by looking at Runs Created – but, for some reason, almost all teams undershot their RC estimate in 2008 (and their Base Runs estimate too, at least for the versions I tried).

Anyway, while I like the simulation method, I wish the results had been presented more clearly. As it stands, I'll stick to "The Book"'s conclusions on these issues of baseball strategy.

P.S. Here's what Tony La Russa thinks of these results:

“There’s way too much importance given to what you can produce from a machine,” he said. “These are human beings, and I don’t think any computer is going to model that close to what we deal with at this level.”

Hat Tip: Daniel Hamermesh at Freakonomics



Wednesday, April 15, 2009

New issues of "By the Numbers"

Two new issues of "By the Numbers," the SABR baseball research newsletter I edit, are now available for download at my website.


Sunday, April 12, 2009

The Utah Jazz have been playing much better when rested

This year, the Utah Jazz are 3-16 (.158) when they've played the night before, compared to 44-17 (.721) when they had the night off.

That's obviously a huge difference, close to 5 standard deviations. In two articles this week, Carl Bialik suggests that playing better in back-to-back games is a characteristic of a team, and may have predictive value in the playoffs:

"Two years ago, the Dallas Mavericks' 15-1 record in second legs of back-to-back games helped them earn the Western conference's top seed. Conversely, the Golden State Warriors were 5-17 without a day off. When the two teams met in the opening round of the playoffs, Golden State showed they were better than their No. 8 seed by sending the Mavericks home."


But: a quick check over at Basketball Reference shows that there might be a simple reason the Jazz haven't played well in those 19 games -- they're predominantly road games. Only three of the 19 games were at home (although Utah did lose all three).

The home-court advantage in basketball is by far the biggest of the four major sports (the home team wins 60.8% of games, according to this presentation (.pdf)). If the average team is only .392 on the road, that works out to about 7-12 over those 19 games (actually, close to 7-and-a-half wins). The Jazz still undershot that, especially when you consider that they're a better-than-average team -- but not by as much as if you had expected them to play .721.
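
Here's the arithmetic, along with a slightly better version that credits the three home games at the home winning percentage (both versions assume a league-average team, which the Jazz aren't):

```python
road_wp, home_wp = 1 - 0.608, 0.608   # from the 60.8% home-win figure above
games, home_games = 19, 3

naive = road_wp * games                                            # ~7.4 wins, all-road approximation
weighted = road_wp * (games - home_games) + home_wp * home_games   # ~8.1 wins
print(round(naive, 1), round(weighted, 1))                         # 7.4 8.1
```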

Basketball is also the sport in which the better team wins most often, and you could probably close the gap even more by accounting for the quality of the Jazz's opposition in those 19 games. I haven't looked, though. And, of course, the Jazz were cherry-picked for the article because of their extreme split -- Bialik says it's the largest in the past five seasons. If you assume the Jazz "should have" been, say, 9-11, then 3-16 doesn't seem that weird for the single worst outlier of the past five seasons.

---

As for the Mavericks in 2006-07 ... they also played most of the second games of their back-to-backs on the road -- 10 out of 14 (I must have missed a couple in the game logs). Again, you'd have to check the quality of the opposition to see if they happened to play particularly weak opponents.

By the way, Bialik says that over the past five seasons, teams win 44% of their played-the-night-before games. That seems unremarkable, considering that it looks like those are predominantly road games.

Finally, I almost forgot about this study on how NBA teams play when rested.



Friday, April 03, 2009

J.C. Bradbury on aging in baseball

J.C. Bradbury is on vacation from blogging, but is still posting occasionally. This week, he wrote that his article on baseball aging patterns has been published. Here's the link to the published version (gated), and here's a link to a freely-available version from last August.

Here's what JC did. He took every player with at least 5000 PA (4000 batters faced for pitchers) who debuted in 1921 or later. Then, for those players, he considered every season in which they had at least 300 PA (or 200 batters faced). That left a total of 4,627 player-seasons for hitters, and 4,145 for pitchers.

Then, for each of several measures of performance, such as linear-weights batting runs, he ran a regression. The regression predicted a player's single-season number (actually, a Z-score) based on:

-- the player's career average
-- the player's age that year, and age-squared (that is, quadratic on age)
-- a dummy variable for the league-season
-- a "player-specific error term".

Numbers are park adjusted.
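
To make the setup concrete, here's roughly what such a regression might look like. The data file and column names are hypothetical, and I've simplified to plain OLS, which ignores the player-specific error term:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per qualifying player-season, with the
# park-adjusted Z-score ("z"), the player's career average ("career_avg"),
# his age that season ("age"), and a league-season label ("lg_season").
df = pd.read_csv("player_seasons.csv")
df["age2"] = df["age"] ** 2

model = smf.ols("z ~ career_avg + age + age2 + C(lg_season)", data=df).fit()

# With z = ... + b*age + c*age^2, the fitted curve peaks where the
# derivative is zero: peak age = -b / (2c)
b, c = model.params["age"], model.params["age2"]
print("implied peak age:", -b / (2 * c))
```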

After running the regression, Bradbury calculates the implied "peak age" for each metric:

29.41 linear weights
29.13 OPS
30.04 OBP
28.58 SLG
28.35 AVG
32.30 BB
28.26 DPT (doubles plus triples rate)
29.89 HR
29.16 ERA
29.05 RA
23.56 Strikeouts (for pitchers)
32.47 Walks (allowed)
27.39 Home Runs (allowed)

For most of the hitting categories, the peak age is above the conventional wisdom of 27 – most are around 29. After quoting various studies that have found younger peaks, Bradbury writes,

"The results indicate that both hitters and pitchers peak around 29. This is older than some estimates of peak performance ..."


Bradbury also notes that the results are consistent with the idea that the more raw athleticism a skill requires, the earlier it peaks; strikeouts, for instance, which require raw arm speed, peak the earliest, and walks, which are largely mental, peak the latest:

"Consistent with studies of ageing in specific athletic skills, baseball players peak earlier (later) in abilities that require more (less) physical stress."


I agree with Bradbury on this last point, but I don't think his actual age estimates can be relied upon. Specifically, I think peak ages are really closer to 27 than to 29.

One reason for this is that the model specifically requires the curve to be a quadratic – that is, symmetrical before and after the peak. But are careers really symmetrical? Suppose they are not – suppose the average player rises sharply when he's young, then falls gradually until old age. The curve, then, would be skewed, with a longer tail to the right.

Now, suppose you try to fit a symmetrical curve to that skewed curve, as closely as you can. If you pull out a sheet of paper and try it, you'll see that the peak of the symmetrical curve winds up to the right of the actual peak. The approximation peaks later than the real curve does, which is exactly what JC found.
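
If you'd rather not pull out the sheet of paper, here's the same exercise as a tiny simulation, using a curve I made up that rises four points a year to a peak at 27, then declines a point and a half a year:

```python
import numpy as np

ages = np.arange(20, 40)
# Made-up asymmetric career: sharp rise to age 27, gentle decline after
curve = np.where(ages <= 27, 100 - 4.0 * (27 - ages), 100 - 1.5 * (ages - 27))

b2, b1, _ = np.polyfit(ages, curve, 2)            # best-fit (symmetric) quadratic
print("fitted peak:", round(-b1 / (2 * b2), 1))   # ~30.0, vs. the true peak of 27
```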

I have no proof that the actual aging curve is asymmetrical in exactly this way, but players' careers are not as regular as the orbits of asteroids. There's no particular reason to expect players to decline at exactly the same rate as they rise, especially when you factor in playing time and injuries. The quadratic is a reasonable approximation, but that's all it is.

Another reason is selective sampling. By choosing only players with long careers, Bradbury left out any player who flames out early. And so, his sample is overpopulated with players who aged particularly gracefully. That would tend to overestimate the age at which players peak.

(He limited his data to players between 24 and 35, which he says is done to minimize selection bias, but I'm not sure how that would help.)

There is perhaps some evidence that this is a real effect. JC ran the same regression again, but this time including only players with Hall of Fame careers. For hitters, the peak age dropped by almost a full year, from 29.41 to 28.51. That might make sense; HOFers are the best players ever, and were more likely to have had long careers even if they aged less gracefully. That is, they'd still be good enough to stay in the league after a substantial decline, and would be much more likely to hit the 5000 PA cutoff even if they peaked early and dropped sharply.

(In fairness, you could argue that HOFers were less likely to be injured, and therefore more likely to peak later. But I think the "good enough to stay in the league" effect is larger than that, although I have no proof. Also, the HOF pitchers' peak age dropped only 0.08 years from the non-HOFers, so the effect I cite seems to hold only for hitters.)

Finally, there's selective sampling of individual seasons. A player who declines sharply and suddenly won't get enough playing time in his final year to qualify for Bradbury's study. His true curve would be nearly vertical between his next-to-last season and his last one. But since the study doesn't see that last season, it misses the vertical drop, and the fitted quadratic comes out gentler, with its peak to the right of where it would otherwise be.

Try this yourself: draw an aging curve that peaks, drops a bit, then falls off vertically. Draw the best-fit symmetrical curve on it.

Now, draw the same curve again, but, instead of the vertical drop, have it end just before the vertical part starts. Draw the best-fit symmetrical curve on this second one. You'll see that it peaks later than when the vertical line was there.
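
Or, as a quick simulation (again, the curve is invented): start with a symmetrical career peaking at 29, tack a 40-point collapse onto the final season, and fit quadratics with and without that last point.

```python
import numpy as np

ages = np.arange(22, 37)
perf = 100 - 0.8 * (ages - 29) ** 2   # symmetric curve, true peak at 29
perf[-1] -= 40                        # sudden collapse in the final season

def quad_peak(x, y):
    b2, b1, _ = np.polyfit(x, y, 2)
    return -b1 / (2 * b2)

print(round(quad_peak(ages, perf), 2))            # ~28.5 with the collapse included
print(round(quad_peak(ages[:-1], perf[:-1]), 2))  # 29.0 with it excluded -- a later peak
```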

(Again, in fairness: Bradbury ran a version of his study in which there was no season minimum for plate appearances or batters faced – just the career minimums – and the results were similar. I've explained why I think, in theory, the minimums should skew the results, but I have to admit that, in real life, they didn't. There are perhaps some other reasons it didn't happen – perhaps a lot of the effect comes from the "vertical" players released in spring training, so they didn't make the study at all – but still, the results do seem to contradict this third theory of mine.)

So you've got three ways in which the study may have made assumptions or simplifications that forced the peak age to be higher than it should be:

-- assuming symmetry;
-- selective sampling of long careers;
-- selective sampling of seasons.

In that light, my conclusion would be that Bradbury's methodology might yield a reasonable approximation, but not much more than that. I think the study correctly identifies the basic trend, and is probably right within a couple of years, but I wouldn't bet on it being any closer than that.



