Sabermetric Research: "The Cheater's Guide to Baseball," and Bonds/steroids evidence?

A reader of this blog sent me a copy of Derek Zumsteg's "The Cheater's Guide to Baseball," on condition that I write about it. So here you go. And I'm happy to report that it's a fine book.

Zumsteg takes us through many different kinds of cheating, even kinds that aren't cheating at all (like heckling and the hidden-ball trick). We get a good explanation of the 1919 Black Sox, and I learned about some groundskeeper shenanigans that I'd never heard of before. For instance, Cleveland's Emil Bossard would water down the infield when a ground-ball pitcher was scheduled to start for the Indians, so that opposition balls would die in the soggy ground.

The book is written in a friendly, easy style. Zumsteg usually sticks to the issue at hand, but, just when you least expect it, he'll throw in some pretty good sarcasm, and then get serious again. It works very well, not just because he's genuinely funny, but because his sarcasm is so well-directed that you feel you can trust him on the other stuff.

Zumsteg gives himself freer rein in the little boxes that dot the book. He mocks Pete Rose's evasions with an imaginary conversation in which you've just videotaped Pete hitting your car:

"You hit my car."
"No, I didn't."
"Yes, you did. I was right here. You slammed right into it."
"I don't know what you're talking about."
"I have video footage of your car ramming mine."
"I wasn't driving."
"You're clearly visible in the tape."
"Maybe that tape's from some other car I hit, I don't know."
"You can see today's newspaper on the dash of my car."
"Maybe it was some guy who looks like me."
"When you got out of your car, your wallet flopped open to your driver's license."
"I don't know why you're persecuting me like this. I don't deserve this kind of treatment."
"You hit my car!"

Readers who didn't understand the Pete Rose situation before reading this should now have a pretty vivid idea of Rose's attempts at spin, especially since most of us have had this kind of argument with some idiot or other.

As far as sabermetrics is concerned, there's one major study (which I'll talk about in a bit) and a bunch of claims that would be interesting to follow up on. For instance, Ichiro Suzuki is very particular about the moisture content of his bats. An academic study found that a dry bat could have a one percent increase in performance over a bat with a higher moisture content. I don't know exactly what that means, but if it means that a batted ball would go 1% farther, that's a huge advantage for Ichiro, or anyone.

Also notable:

-- Jamie Moyer is credited with being able to work an umpire, by arguing "as if he's giving the umpire a shoulder massage ... he pumps the umpire for information on balls ('Too low?' Ump nods. 'Okay.') as he works out what can and can't get called a strike that day."

-- The Twins experimented with the Metrodome ventilation system, trying to get air currents blowing out toward the outfield when the Twins were at bat, and in toward the batter when the opposition was at bat.

-- According to Baseball Prospectus, umpire Chuck Meriwether "was about two and a half times more likely than Mike Everitt" to call a base stealer out. Also, according to Zumsteg, "Derek Jeter is a master of the graceful-looking fake tag."

There are also several occasions where one character or another estimates the benefit of a particular form of cheating:

-- Lou Boudreau said "I wouldn't be surprised if [Bossard's groundskeeping] helped us win as many as ten games a year."

-- Earl Weaver argued that groundskeeping increased the Orioles' batting average by "more than 30 points."

-- Zumsteg himself argues that exceptional grounds crews "might be worth a few games a year."

-- And George Steinbrenner argued that Earl Weaver gained "eight or ten games a year" through intimidation of umpires.

Obviously, these are huge overestimates, and they could probably be shown to be. Were the Indians 10 games better at home than teams with less creative grounds crews? Were the Orioles actually 30 points better at home than on the road, after accounting for normal home-field advantage? I haven't checked, but I'm pretty sure the answers would be no.
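One way to see that these claims are checkable is to compare each claimed edge with the noise that luck alone produces in a season. The inputs below are illustrative assumptions, not figures from the book: 77 home games for Boudreau's Indians (a 154-game schedule), and about 2,700 at-bats each at home and on the road at a .260 average for Weaver's Orioles. A real test would also need to subtract normal home-field advantage first.

```python
from math import sqrt


def sd_wins(games: int, p: float = 0.5) -> float:
    """Standard deviation of a win total over `games` games, from luck alone."""
    return sqrt(games * p * (1 - p))


def sd_home_road_ba_gap(ab_home: int, ab_road: int, avg: float) -> float:
    """SD of (home BA - road BA) when the true average is the same in both places."""
    return sqrt(avg * (1 - avg) * (1 / ab_home + 1 / ab_road))


# Assumption: 77 home games; Boudreau's claim is +10 wins per year.
boudreau_z = 10 / sd_wins(77)

# Assumption: ~2,700 AB at home and on the road, true average .260;
# Weaver's claim is +.030 at home, beyond normal home-field advantage.
weaver_z = 0.030 / sd_home_road_ba_gap(2700, 2700, 0.260)

print(f"Boudreau: {boudreau_z:.1f} SD per season; Weaver: {weaver_z:.1f} SD per season")
# prints: Boudreau: 2.3 SD per season; Weaver: 2.5 SD per season
```

On these assumptions either edge would be roughly 2.3 to 2.5 standard deviations in a single season, and because a steady yearly edge grows faster than the noise (by the square root of the number of seasons), a decade of home/road splits should make a real effect of that size hard to miss.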

The book's only major sabermetric study comes near the end of the book, in the chapter on steroids. And it's an ingenious idea.

Zumsteg notes that according to the book "Game of Shadows," Barry Bonds was on a three-week-on, one-week-off steroid cycle in 2002. Moreover, Bonds complained that he didn't feel his usual self on his clean weeks.

If that were true, it should show up in Bonds' day-to-day records. He should be playing three weeks great, then one week not-so-great, then another three weeks great, and so on.

The only problem: Zumsteg didn’t know when Bonds' cycle started. So he ran all 28 possible breakdowns and looked for the one with the biggest drop-off. That corresponded to a cycle starting on April 1 (or, equivalently, April 29, exactly 28 days later). If you divide 2002 into four-week blocks, starting April 1, and consider Bonds "on" the first three weeks of each block, and "off" the last week of each block, you get:

           BA     SLG    HR/H
"On"      .392   .878    34%
"Off"     .293   .533    19%
--------------------------------
Diff      .099   .345    15%
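For concreteness, here's a small sketch of how a given start date splits the season into "on" and "off" days under a three-weeks-on, one-week-off cycle. The function name and the date window are illustrative assumptions, not anything from the book. It also shows why there are exactly 28 distinct breakdowns, and why April 1 and April 29 count as the same cycle.

```python
from datetime import date, timedelta

CYCLE_DAYS = 28  # three weeks "on" plus one week "off"
ON_DAYS = 21     # the first three weeks of each block are "on"


def is_off_week(day: date, cycle_start: date) -> bool:
    """True if `day` falls in an "off" week of a cycle that began on cycle_start.
    Python's % returns a non-negative result here, so days before
    cycle_start are classified correctly too."""
    return (day - cycle_start).days % CYCLE_DAYS >= ON_DAYS


# Illustrative window of dates (an assumption, not the real 2002 schedule).
season = [date(2002, 4, 1) + timedelta(days=k) for k in range(183)]

# The 28 candidate start dates; any later start repeats one of them.
candidates = [date(2002, 4, 1) + timedelta(days=k) for k in range(CYCLE_DAYS)]
splits = {tuple(is_off_week(d, s) for d in season) for s in candidates}
print(len(splits))  # 28 different on/off splits

# April 1 and April 29 are 28 days apart, so they label every day the same way.
print(all(is_off_week(d, date(2002, 4, 1)) == is_off_week(d, date(2002, 4, 29))
          for d in season))  # True

# First "off" week under an April 1 start: April 22 through April 28.
print([d.isoformat() for d in season if is_off_week(d, date(2002, 4, 1))][:7])
```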

Does this evidence support the thesis that Bonds was on PEDs? Zumsteg says yes. One reason is that if you look at the second-best cycle, and the third-best cycle, and so on, they're all clustered around April 1 (or 29). "If we tried to apply this ... grouping to a season where a player was not experiencing [non-random] cycles ... the best fit dates would be randomly scattered."

But I don't think that's right. There will always be a cycle that looks bad, even if the results are just luck: the worst of 28 random cycles will always look poor. And if a player hit poorly from (say) April 20-26, he will also have hit poorly from April 21-27, since, after all, the two intervals have six of their seven days in common! And that's true regardless of *why* he hit poorly those days. So I'd argue that the clustering of the bad cycles means nothing at all as far as steroids are concerned.
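A quick simulation makes the overlap point concrete. It's a sketch under illustrative assumptions: a hitter with a fixed .370 true average and exactly three at-bats a day for 180 days. It ranks the 28 start days by how much worse he hit in the "off" week, then measures how far apart the best and second-best start days sit on the 28-day circle. If those two were unrelated random picks, the gap would average about 7.3 days; because neighboring off weeks share six of their seven days, it should come out much smaller, with no change in talent at all.

```python
import random

CYCLE, OFF_LEN, DAYS = 28, 7, 180


def ba_drop_by_start(rng: random.Random, p_hit: float = 0.370,
                     ab_per_day: int = 3) -> list[float]:
    """On-minus-off batting average for each of the 28 possible cycle start days,
    for a hitter whose true average never changes (assumed inputs)."""
    ab = [0] * CYCLE
    h = [0] * CYCLE
    for d in range(DAYS):
        ab[d % CYCLE] += ab_per_day
        h[d % CYCLE] += sum(rng.random() < p_hit for _ in range(ab_per_day))
    total_ab, total_h = sum(ab), sum(h)
    drops = []
    for start in range(CYCLE):
        # Days 21-27 of a cycle beginning on day `start` are the off week.
        off = [(start + 21 + j) % CYCLE for j in range(OFF_LEN)]
        off_ab = sum(ab[r] for r in off)
        off_h = sum(h[r] for r in off)
        drops.append((total_h - off_h) / (total_ab - off_ab) - off_h / off_ab)
    return drops


def circular_gap(a: int, b: int, n: int = CYCLE) -> int:
    """Distance in days between two start days on an n-day cycle."""
    d = abs(a - b) % n
    return min(d, n - d)


def mean_gap_best_to_second(seasons: int = 1000, seed: int = 7) -> float:
    """Average distance between the most and second-most 'suspicious' start days."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(seasons):
        drops = ba_drop_by_start(rng)
        ranked = sorted(range(CYCLE), key=lambda k: drops[k], reverse=True)
        gaps.append(circular_gap(ranked[0], ranked[1]))
    return sum(gaps) / len(gaps)


print(round(mean_gap_best_to_second(), 2))
```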

Zumsteg also tried this for other seasons (actually, he had Keith Woolner, of Baseball Prospectus, do it). They don't give much information on those other seasons, but say that in 1997 and 1998, when Bonds was almost certainly steroid-free, the effect is half the size of 2002, with less clustering. But in the suspect 2001 season, they again find clustering.

But still.

I ran a rough simulation of Bonds' 2002 season, assuming at-bats were random and his hitting was constant every day. Then I checked all 28 cycles, and found the one that made Bonds look the most suspicious. I then repeated that test 10,000 times. The results:

Simulation difference: .106 BA, .282 SLG, 17% HR/H
Observed 2002 numbers: .099 BA, .345 SLG, 15% HR/H

(technical details: I divided 2002 into 180 days. Each day, Bonds had 3 PA two-thirds of the time, and 4 PA one-third of the time. That meant he may have had more PA than his actual 2002, or fewer. Also, I found the *three* cycles with the highest difference, one cycle for each of the three stats. Zumsteg concentrated on finding only one cycle that maximized some function of all three stats, but it turned out that they all had their highs during that same cycle, so I figure this comparison is still legitimate.)

In truth, the actual Bonds was almost exactly as consistent as the random, non-steroidal one. The flesh-and-blood Bonds was actually slightly *more* consistent in terms of BA and HR/H percentage. He was, however, moderately more cyclic in terms of slugging percentage.

In terms of p-values, none of Zumsteg's findings are statistically significant compared to random:

p=.54 for BA
p=.28 for SLG
p=.58 for HR/H

(The p-values were derived from the simulation: in 10,000 trials, about 5,400 of them showed a "worst cycle" difference for BA of more than the actual .099 observed.)
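Here's a minimal sketch of this kind of simulation, for anyone who wants to try it on other players; it is not the code behind the numbers above. The per-PA probabilities are assumptions chosen only to resemble Bonds' 2002 line (about a .370 average and .800 slugging, with roughly a third of his PAs not counting as at-bats), and the hit mix is approximate. As described in the technical details, each stat gets its own worst cycle, and the p-value is the share of simulated seasons whose worst drop is at least as large as the observed one.

```python
import random

# Assumed per-PA inputs, chosen only to resemble Bonds' 2002 line.
# These are NOT the exact values used for the results in the post.
P_NO_AB = 0.34                       # walk, HBP or sacrifice: no at-bat
P_HIT = 0.370                        # hits per at-bat
BASES = [1, 2, 3, 4]                 # single, double, triple, home run
HIT_MIX = [0.47, 0.21, 0.01, 0.31]   # assumed share of hits of each type
DAYS, CYCLE, ON_DAYS = 180, 28, 21


def simulate_season(rng: random.Random) -> list[tuple[int, int, int, int]]:
    """Per-day (AB, H, TB, HR) for a hitter whose talent is the same every day."""
    season = []
    for _ in range(DAYS):
        pa = 4 if rng.random() < 1 / 3 else 3   # 3 PA two-thirds of the time
        ab = h = tb = hr = 0
        for _ in range(pa):
            if rng.random() < P_NO_AB:
                continue
            ab += 1
            if rng.random() < P_HIT:
                bases = rng.choices(BASES, weights=HIT_MIX)[0]
                h += 1
                tb += bases
                if bases == 4:
                    hr += 1
        season.append((ab, h, tb, hr))
    return season


def rates(ab: int, h: int, tb: int, hr: int) -> tuple[float, float, float]:
    """(BA, SLG, HR/H), treating an empty denominator as zero."""
    return (h / ab if ab else 0.0, tb / ab if ab else 0.0, hr / h if h else 0.0)


def worst_cycle_drops(season) -> list[float]:
    """Largest on-minus-off drop in BA, SLG and HR/H over all 28 start days,
    finding the worst cycle for each stat separately."""
    by_res = [[0, 0, 0, 0] for _ in range(CYCLE)]   # totals by (day mod 28)
    for d, day in enumerate(season):
        for i, x in enumerate(day):
            by_res[d % CYCLE][i] += x
    total = [sum(r[i] for r in by_res) for i in range(4)]
    best = [float("-inf")] * 3
    for start in range(CYCLE):
        # Cycle days 21-27 (the off week) fall on these residues.
        off_res = [(start + ON_DAYS + j) % CYCLE for j in range(CYCLE - ON_DAYS)]
        off = [sum(by_res[r][i] for r in off_res) for i in range(4)]
        on = [t - o for t, o in zip(total, off)]
        drops = [a - b for a, b in zip(rates(*on), rates(*off))]
        best = [max(b, x) for b, x in zip(best, drops)]
    return best


def p_values(observed, trials: int = 10_000, seed: int = 2002) -> list[float]:
    """Share of constant-talent seasons whose worst drop is >= the observed one."""
    rng = random.Random(seed)
    exceed = [0, 0, 0]
    for _ in range(trials):
        for i, sim in enumerate(worst_cycle_drops(simulate_season(rng))):
            if sim >= observed[i]:
                exceed[i] += 1
    return [n / trials for n in exceed]


# Observed 2002 drops from the post: .099 BA, .345 SLG, 15 points of HR/H.
print(p_values((0.099, 0.345, 0.15), trials=1_000))
```

Exact p-values depend on the assumed inputs, the seed and the number of trials, so this sketch shouldn't be expected to reproduce the figures above to the decimal.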

I have to admit that I was disappointed by these findings. Zumsteg's idea was so intriguing that I was hoping he'd indeed discovered something new. If the method had "worked," it would have been great for testing other players and coming up with a "suspects" list. Alas, it's not so.

Still, Zumsteg's book is good reading. Thanks again, anonymous benefactor, for the free copy.

Labels: baseball, steroids, streakiness

6 Comments:

At Friday, July 13, 2007 12:24:00 PM, Anonymous said...: Phil,

Would it be possible to do the following test:
(1) Measure Bonds' power by HR/(AB-SO), which would be a binomial event.
(2) Test whether any 'fourth' week pattern showed a drop in that binomial event beyond normal binomial variation.
(3) Apply a certain type of test (not sure of the name) in which you account for the fact that you are not testing the statistical significance of a POSITED fourth week pattern, but of ANY fourth week pattern.

michael_humphreys@aya.yale.edu
At Friday, July 13, 2007 3:03:00 PM, Phil Birnbaum said...: Michael, not sure if you could do that, because the 28 patterns aren't independent. If they were, you could use the distribution of the first order statistic, or something like that (if I remember my fourth-year stats courses).

I think under the circumstances, the simulation is the easiest way to go.
At Saturday, July 14, 2007 12:49:00 AM, Anonymous said...: Phil,

Is it true that the 28 patterns aren't independent? If we didn't think that a different 'treatment' had been applied to one of the 28 patterns, we would have no reason to expect any one to differ from the others.

I'm sure I'm missing something, and will do some further research on my end.

Michael
At Saturday, July 14, 2007 1:13:00 AM, Phil Birnbaum said...: Well, I think the 28 patterns aren't independent because they overlap. As I said in the post, if the 12th to 18th days are poor, the 13th to 19th should be poor too, since they are 86% the same days.
At Monday, July 16, 2007 10:13:00 PM, Tangotiger said...: I did an extensive study of Zumsteg's claim of Bonds on my blog, starting here:
http://www.insidethebook.com/ee/index.php/site/comments/cheater_cheater#12

There are several posts on the subject, including Google docs of my research.
At Monday, July 16, 2007 10:13:00 PM, Tangotiger said...: here you go

Thursday, July 12, 2007