### Did Jose Canseco teach his teammates to use steroids?

Here's a study that claims to have evidence that Jose Canseco influenced his teammates to take steroids. The evidence is that those teammates hit more home runs after they had played with Canseco.

It's called "Learning Unethical Practices from a Co-worker: The Peer Effect of Jose Canseco," by Eric D. Gould and Todd R. Kaplan.

Here's what Gould and Kaplan did. They took all hitters since 1970 in the "power positions" (C, DH, OF and 1B) and examined how they did before playing with Canseco, while playing with Canseco, and after playing with Canseco (once they were no longer his teammates). They controlled for a bunch of factors (year, manager's record, etc.) and, most importantly, they controlled for the player himself, so that they are comparing parts of his career to other parts of his own career.

They found that there was indeed an effect. While playing with Canseco, the average player hit almost 1.1 home runs more than before Jose was on his team. And later, once separated from Canseco, the average player hit an additional 2.9 home runs! The 1.1 figure is almost statistically significant (about 1.7 SD), while the 2.9 is extremely significant (4.5 SD).

However, those numbers don't control for playing time. It turns out that while playing with Canseco, players averaged 16 more AB than before. 1.1 HR in 16 AB works out to a rate of about 34 HR per 500 AB (a full season), which is more than average, even for a player in a power position, but really not that big a deal. And in the "after Canseco" years, the players had 53 extra AB in which to hit their 2.9 HR, a rate of about 27 HR per 500 AB, which is probably still a little more than what you'd expect.
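The per-season scaling is easy to check. Here is a minimal sketch in Python; the function name `hr_per_season` is just for illustration, and the 500-AB season is the same convention used in the paragraph above.

```python
def hr_per_season(extra_hr, extra_ab, season_ab=500):
    """Scale a marginal HR-per-AB rate up to a season of `season_ab` at-bats."""
    return extra_hr / extra_ab * season_ab

# Figures quoted from the study: extra HR and extra AB relative to "before Canseco"
during = hr_per_season(1.1, 16)
after = hr_per_season(2.9, 53)

print(f"with Canseco:  {during:.1f} HR per 500 AB")
print(f"after Canseco: {after:.1f} HR per 500 AB")
```

Note that these are rates on the *extra* at-bats only, not the players' overall home run rates.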

So far, it's a bit more power than you would have figured, but nothing too serious, and certainly not enough to persuasively show a "peer effect" of Canseco on his teammates.

Also, keep in mind that the study didn't control for the age of the players. Obviously, they'd be older in their "after Canseco" years than in their "before Canseco" years, and power generally increases with age. Taking that into account, wouldn't you now think that these numbers are pretty much as expected? Players get older, and, as they age, they get more playing time and hit for a little more power. I don't really see what the big deal is.

There is one caveat: the authors repeated their study for players other than Jose Canseco -- Rafael Palmeiro, Jason Giambi, Mark McGwire, Juan Gonzalez, Ivan Rodriguez, Dave Martinez, Ken Caminiti, Ken Griffey Jr, Ryne Sandberg, and Cecil Fielder. (The authors say they checked 30 players, but they only show the AB results for these 10.) It turned out that of all these players, Canseco showed the strongest effect in raw AB numbers. In home runs, the most comparable player was Ryne Sandberg. After playing with Sandberg, players had 1.6 more home runs per season than before Sandberg. (Recall that after Canseco, they showed 2.9.) After Canseco, players had 53 more AB. But after Sandberg, they had 24 *fewer* at-bats. More home runs in fewer at-bats means a bigger increase in home run rate for Sandberg's former teammates than for Canseco's. From these numbers, it would seem to me that Sandberg would be a better candidate as a bad steroid influence than Canseco.
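To see why the Sandberg numbers imply the bigger change in home run rate, here is a rough sketch. Only the changes in HR and AB are quoted here, so the baseline is an assumption: it uses the sample mean of 310 AB mentioned later in this post, plus a purely hypothetical 10 HR per season before playing with either player. The names `rate_change`, `BASE_AB`, and `BASE_HR` are illustrative only.

```python
BASE_AB = 310    # sample mean AB reported for the study
BASE_HR = 10.0   # hypothetical baseline, chosen only for illustration

def rate_change(delta_hr, delta_ab, base_hr=BASE_HR, base_ab=BASE_AB):
    """Change in HR per AB implied by a change in HR and a change in AB."""
    before = base_hr / base_ab
    after = (base_hr + delta_hr) / (base_ab + delta_ab)
    return after - before

canseco = rate_change(2.9, 53)     # +2.9 HR in 53 more AB
sandberg = rate_change(1.6, -24)   # +1.6 HR in 24 fewer AB

print(f"Canseco:  {canseco * 500:+.1f} HR per 500 AB")
print(f"Sandberg: {sandberg * 500:+.1f} HR per 500 AB")
```

The exact sizes depend on the assumed baseline, but with 310 AB the ordering holds for any baseline of more than a handful of home runs.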

In any case, the authors do acknowledge that the raw home run numbers are not very meaningful without also taking into account the changes in AB. So they ran another Canseco regression, this time including a variable for AB. Now, instead of 3 extra home runs, these players hit only 1 extra home run. That still comes out statistically significant, at 2.2 SD.

But again, this regression doesn't include player age. For players in the sample, the mean number of AB was 310. Would an increase of 1 HR per 300 AB not be consistent with normal patterns of aging for players in "power positions"? And that's not a 1 HR increase every year – it's the average increase between the several years the hitter played with Canseco, and the several years following. Again, it seems reasonable to me.

We'd have a better idea if the authors repeated this analysis for the other 10 hitters, controlling HR for AB there also. But they didn't, so we don't know if Canseco is a special case in this regard as well.

Finally, Gould and Kaplan dropped six players from the sample, the players that Canseco claimed to have personally injected with steroids. (Those were Palmeiro, Giambi, McGwire, Gonzalez, Rodriguez, and Martinez.) Without those six, the "after Canseco" increase in HRs dropped from 2.9 to 1.5. That's still statistically significant. But there's a corresponding increase in strikeouts (6.8) and walks (4.7), while batting average and slugging percentage barely change (.002 and .003 respectively). That suggests that, again, it's an increase in AB that's responsible for all these increases.

The question we're left with, then, is: why do Canseco's teammates increase their AB once he's gone? It could just be the situations, or the managers. Suppose Canseco tended to be signed by teams that were trying to win it all this year. Those teams wouldn't be playing a lot of rookies. But once they dropped back in the standings, they would tend to trade Canseco to a contender, and give their young players more at-bats.

Does that sound plausible? Canseco played on a lot more teams than most of the other ten players in the comparison set. That would suggest that he would have left a lot of teams in a rebuilding stage, wouldn't it?

Or maybe the types of managers who pushed for Canseco to be signed are the same types of managers that like to keep lots of guys on the bench. Those bench players get 75 AB when Canseco's on the team, and when they wind up as regulars a couple of years later, those 75 at-bats give them a data point in this study (the authors used a minimum of 50 AB).

Anyway, I'm mostly just thinking out loud here; I have no idea if this is the correct answer or not. The fact remains that "post-Canseco" players did tend to increase their AB. But the significance level in the regression has a hidden assumption, that the players on Canseco's teams were random. And they're not – the patterns of player AB have a lot to do with manager and GM tendencies, which means the variances of the observations are underestimated, and the significance overestimated. That's probably why, of the 11 regressions, three of their "post" AB numbers are statistically significant at the 5% level.

Indeed, look at the "change in AB" numbers for the other 10 players:

-33, 10, 20, 1, 32, -15, -22, -46, -24, 0

Compared to these, +53 isn't *that* out of place, is it? And this hardly looks like a normal distribution, which again suggests team tendencies are at work.
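One rough way to put a number on "out of place" is to measure how far +53 sits from the other ten changes, in units of their own standard deviation. This is only a sketch: it treats the ten values as a sample from one distribution, which is exactly the assumption in question. The variable names are illustrative.

```python
from statistics import mean, stdev

# "Change in AB" for the ten comparison players, as listed above
other_changes = [-33, 10, 20, 1, 32, -15, -22, -46, -24, 0]
canseco_change = 53

center = mean(other_changes)
spread = stdev(other_changes)   # sample standard deviation (n - 1)
z = (canseco_change - center) / spread

print(f"mean of others: {center:.1f} AB")
print(f"SD of others:   {spread:.1f} AB")
print(f"Canseco is {z:.1f} SDs above the others' mean")
```

That comes out to roughly 2.5 sample SDs, though with only ten comparison points and a spread that doesn't look normal, the figure is a crude gauge at best.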

My best guess is that players had more AB after Canseco left because of circumstances, not because of Canseco. And the fact that players had more HR after Canseco left is almost perfectly explained by the increase in AB, and normal aging.

Canseco may very well have influenced his teammates to use steroids -- but the evidence contained in this paper does not substantiate that hypothesis very much, if at all.

(Thanks to John Matthew for the link.)

------

UPDATE: while it is correct that the regression didn't control for age, I just noticed it *did* control for years of experience. That means that the one-home-run-per-300-AB figure is more significant than I originally thought.

However, the regression assumes that power increases over the years are constant and linear, which might not be the case. For instance, if power increases faster when the player is young, the regression might underestimate power for (say) 29-year-olds, and overestimate it for (say) 39-year-olds.
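Here is a small sketch of that concern. It fits a straight line to a made-up power curve that rises quickly early in a career and then flattens out. The curve's shape, its constants, and the debut age of 21 are all assumptions for illustration, not anything taken from the study.

```python
import math

# Hypothetical concave power curve: fast gains early, then a plateau.
def true_hr(experience):
    return 30 * (1 - math.exp(-experience / 5))

years = list(range(19))            # 0..18 years of experience (ages 21..39)
hr = [true_hr(x) for x in years]

# Ordinary least squares for a straight line: hr ~ intercept + slope * experience
x_bar = sum(years) / len(years)
y_bar = sum(hr) / len(hr)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, hr)) / sum(
    (x - x_bar) ** 2 for x in years
)
intercept = y_bar - slope * x_bar

def fitted_hr(experience):
    return intercept + slope * experience

for age in (29, 39):
    exp_years = age - 21           # assumes a debut at age 21
    print(f"age {age}: curve {true_hr(exp_years):.1f}, line {fitted_hr(exp_years):.1f}")
```

Under these assumptions the line comes in below the curve at 29 and above it at 39, which is the pattern described above. A different curve would shift where the line crosses it.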

My gut says it's still very possible that a long-term increase of 1 HR per 300 AB, even after taking tenure into account, has many other causes, such as management decisions on one or more of Canseco's teams. And it could be that the entire effect is caused by those players, among the six Canseco named, who are generally acknowledged to have juiced.

I wish the authors had run more regressions that controlled for playing time, so that we'd have more useful data.

Slightly off topic--but it has to do with your point about how teams are not built randomly but with a GM's intentions in mind.

I read Moneyball only about a year ago, well after revelations about steroid use by A's players, including Giambi and others. As a stat freak I loved the whole sabermetric approach by Beane, but I cynically suspected that the A's success was based on steroids rather than Excel spreadsheets.

Basically, the A's drafted lots of high-OBP college guys without power. Then they get to the big leagues and start hitting HRs. I wonder why?

Perhaps it's the high OBP hitters that have the most to gain by bulking up. Power hitters probably aren't going to improve their timing or acuity. But contact hitters can improve their strength while retaining what made them high OPS batters to begin with.

I'm not saying Beane or the A's intentionally planned some steroid conspiracy, just that their type of player had the most to gain by using steroids.

The same concept could explain the Canseco effect as well.

