Wednesday, November 08, 2006

"Who controls the plate?" A review by Charlie Pavitt

Benjamin Alamar, Jeff Ma, Gabriel M. Desjardins, and Lucas Ruprecht, Who Controls the Plate? Isolating the Pitcher/Batter Subgame, Journal of Quantitative Analysis in Sports, Volume 2 Issue 3, 2006, Article 4.

Benjamin Alamar is the editor of the on-line JQAS, and, based on the evidence provided in this article, undoubtedly a very competent researcher. Having said this, this effort is a textbook case of what happens when very competent researchers with little understanding of advances in statistical baseball research attempt to contribute to our knowledge. The authors make two fundamental errors: first, they ignore past research on the topic, and second, they use ill-chosen performance measures in their estimations.

The issue at stake here is, quoting the abstract, “to determine the percentage of the outcome of an at bat that is controlled by a pitcher and the percentage that is controlled by the batter.” They seem to start off well enough. First, using play-by-play data for 2001 to 2003 obtained from STATS, they estimated expected run value for all base-out situations and then determined through multiple regression which game situation variables had an impact on scoring; the significant variables were out, base, league, ninth inning, extra inning, batter lineup position, and park effect. The use of the regression coefficient for each of these variables allowed for a more precise final determination of expected run values. Second, to increase accuracy for expected run values on plays ending with batted balls other than home runs, they combined this information with hit location data, specifically the odds of getting a hit based on where the ball was batted. Third, the results of all this were regressed on pitcher and batter indices to see which accounted for more variance. It is in their choice of these indices that the study goes to pieces. For batters, they chose strikeouts per plate appearance and home runs per plate appearance. These are useful measures, but seem incomplete without some further ways of distinguishing batters (both Babe Ruth and Dave Kingman struck out a lot and hit a lot of homers, but to say the least they had a few differences). For pitchers, they chose strikeouts per home run allowed and outs per bases allowed. The latter is relevant but biased by team defense; the former makes absolutely no sense to me at all. At no point do the authors provide a rationale for these choices. Further, the whole exercise ignores bases on balls as a predictor. Anyway, the authors conclude that batters are responsible for 62% of expected run value and pitchers are responsible for the remaining 38%.
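For readers unfamiliar with the first step, here is a minimal sketch of how an expected run value table for base-out situations is commonly built from play-by-play data. It is not the authors' procedure (they went further, adjusting with regression coefficients and hit location data); the function names, the tuple layout, and the sample plays are all made up for illustration.

```python
from collections import defaultdict

def run_expectancy(plays):
    """Average runs scored from each base-out state to the end of the inning.

    plays: iterable of (outs, bases, runs_rest_of_inning) tuples.
    The tuple layout is an assumption for this sketch, not the STATS format.
    """
    totals = defaultdict(lambda: [0.0, 0])  # state -> [run total, count]
    for outs, bases, runs in plays:
        cell = totals[(outs, bases)]
        cell[0] += runs
        cell[1] += 1
    return {state: total / count for state, (total, count) in totals.items()}

def play_value(table, before, after, runs_scored):
    """Change in expected runs produced by one play.

    after is None when the play ends the inning (expected runs = 0).
    """
    end_value = 0.0 if after is None else table[after]
    return end_value - table[before] + runs_scored

# Hypothetical sample: three plate appearances starting with the bases
# empty and nobody out, two starting with a runner on first and one out.
sample_plays = [
    (0, "---", 0), (0, "---", 1), (0, "---", 2),
    (1, "1--", 0), (1, "1--", 1),
]
table = run_expectancy(sample_plays)

# A play that takes the inning from (0 out, bases empty) to
# (1 out, runner on first) with no run scoring.
value = play_value(table, (0, "---"), (1, "1--"), 0)
```

Each play's value is then the kind of outcome measure that could be regressed on batter and pitcher indices, which is where the choice of indices starts to matter.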

Time to editorialize. First, statistical baseball research as a discipline has a lot to learn from other sciences about the cumulative nature of knowledge. Far too many studies are performed in a historical vacuum, and as a consequence even well-done efforts are too often wasted in constantly reinventing the wheel. Researchers are either ignorant of or unwilling to cite previous relevant work (case in point: as Phil Birnbaum noted in his review in a recent "By the Numbers", most of the research in the Baseball Prospectus folks’ Baseball Between the Numbers is well-done wheel reinvention). Alamar et al. are academics who should know better, but they have cited no past efforts to compare the impact of pitching and hitting on team performance. I am aware of two studies in the academic literature that attempted what Alamar et al. tried.

In An Empirical Estimation of a Production Function: The Case of Major League Baseball, American Economist, Volume 25 Number 2, Fall 1981, pages 19-23, Charles E. Zech used the following indices to measure player performance: batting average and home runs to represent batting, stolen bases to stand in for speed, total fielding chances as an indicator of fielding, strikeout to walk ratio for pitching, and career manager won-loss record and years of experience to measure managing. Zech determined that batting accounted for 6 times more variance in team won-loss than pitching; fielding and managing had no significant impact.

Also, there's An Actuarial Analysis of the Production Function of Major League Baseball, Journal of Sport Behavior, Volume 11 Number 2, 1988, pages 99-112. Michael D. Akers and Thomas E. Buttross purposely patterned their study after Zech’s, although they replaced total fielding chances with fielding average. They discovered that both hitting, as measured by batting average alone, and managing, as measured by manager’s career won-loss record only, were better predictors than pitching, as measured by strikeout to walk ratio. Again, managing and fielding impacts were small.

Given the similar findings of hitting being more important than pitching, Alamar et al.’s research might have some value as a replication of these past efforts with different measurement indices and data sets, increasing our confidence in the validity of the conclusion. But this brings us to the second problem: the choice of indices to represent batting and pitching. At the time of Zech’s work, all that researchers had to work with were the standard measures of pitching, batting, and fielding that predated sabermetric work. Even by the time of Akers/Buttross, all that was available other than the standard measures was a couple of years of raw Project Scoresheet data. But even then, we knew that on-base average was a better measure than batting average for getting on base and slugging average better than home runs alone for power, and we knew that we could do a better job measuring fielding with some measure of plays made than with fielding average. Strikeout to walk ratio is actually a pretty good index for representing pitching, but now we know we can do better by adding home runs allowed to the mix. The point is, Alamar et al. are responsible for knowing this. Their choice of measurement indices is indefensibly ignorant.
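To make the point about indices concrete, here is a small sketch using the standard definitions of batting average, on-base average, and slugging average. The two batters and their stat lines are invented for illustration; they are not drawn from any of the studies discussed.

```python
def batting_average(hits, at_bats):
    """Hits per at bat; ignores walks entirely."""
    return hits / at_bats

def on_base_average(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base per plate appearance, using the standard formula."""
    return (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sac_flies
    )

def slugging_average(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at bat; credits all extra-base hits, not just homers."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

# Two hypothetical batters with identical batting averages
# but very different walk totals.
free_swinger = {"hits": 150, "walks": 20, "hit_by_pitch": 0,
                "at_bats": 500, "sac_flies": 0}
patient_hitter = {"hits": 150, "walks": 100, "hit_by_pitch": 0,
                  "at_bats": 500, "sac_flies": 0}

ba_swinger = batting_average(free_swinger["hits"], free_swinger["at_bats"])
ba_patient = batting_average(patient_hitter["hits"], patient_hitter["at_bats"])
oba_swinger = on_base_average(**free_swinger)
oba_patient = on_base_average(**patient_hitter)

# A hypothetical hit distribution for the same 150 hits.
slg = slugging_average(singles=100, doubles=30, triples=5,
                       home_runs=15, at_bats=500)
```

Batting average treats the two batters as identical, while on-base average separates them; likewise, slugging average picks up the doubles and triples that a home-run-only index throws away.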

In conclusion, Alamar et al. is a lot of sound and fury signifying nothing.

-- Charlie Pavitt

Charlie Pavitt writes reviews of sabermetric studies for "By the Numbers." (Click here, scroll down for current and back issues.) He also maintains a sabermetric bibliography.


At Wednesday, November 08, 2006 10:21:00 AM, Blogger Phil Birnbaum said...

Did my review really imply that most of the research in "Baseball Between the Numbers" is reinventing the wheel? If so, I didn't really mean that ... some of it is stuff we already knew, like the chapter on RBIs, but a lot of it is new and interesting.

At Wednesday, November 08, 2006 4:17:00 PM, Blogger Tangotiger said...

I reviewed the first paper Charles cited here:

The author also came by to chat.

At Friday, November 10, 2006 10:46:00 PM, Blogger Phil Birnbaum said...

Tangotiger's link URL is getting cut off in my browser. Here's a direct link.

