How does a player's team affect his output?
Does what team a batter is on affect his performance? We know there are some factors, such as park effects, in which the characteristics of the team affect a hitter. Are there others? And how important are they?
In "A Variance Decomposition of Individual Offensive Baseball Performance," David Kaplan tries to find out. I didn't understand the full details of his statistical methodology, but he starts off by using a statistical technique called "analysis of variance." I studied this a long time ago, and have since forgotten most of it. But if I understand it correctly, it examines the variance of the performance of players within teams, and then the variance of the team means themselves, to figure out how important teams are relative to players.
For instance, suppose there are three teams, each with three players, and their batting averages are:
Expos ....... .260 .260 .260
Pilots ...... .270 .270 .270
Browns ...... .280 .280 .280
In this case, we can conclude that 100% of the variance comes from the teams, and zero percent is inherent in the players. Perhaps the Browns' manager is great at bringing out the best in his hitters, or their home park is very hitter-friendly, or perhaps the Browns can afford to sign all the .280 hitters while the other teams can't.
Now, suppose the numbers look like this:
Expos ....... .260 .270 .280
Pilots ...... .260 .270 .280
Browns ...... .260 .270 .280
Here, the opposite is happening: there is zero variance between teams, and so 100% of the variance is among the individual players.
And here's one more case:
Expos ....... .260 .270 .280
Pilots ...... .270 .280 .290
Browns ...... .280 .290 .300
In this example, the variance among teams seems to be the same as among the players on the team, so we can probably say the variance is decomposed 50-50 between players and teams.
And, of course, in real life, the percentage can be anywhere between 0 and 100% -- not just 100%, 0%, or 50% as in the three examples above.
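As a rough check on the three toy examples above, here is a minimal Python sketch. It is not Kaplan's actual method, just the simplest decomposition I can think of: the helper name `team_share` is made up for illustration, and it assumes every team has the same number of players and uses population variances (dividing by n, not n-1). Under those assumptions, total variance splits exactly into the variance of the team means plus the average within-team variance.

```python
from statistics import fmean, pvariance

def team_share(teams):
    """Fraction of total variance that lies between team means.

    Assumption: equal-size teams and population variances, so that
    total variance = between-team variance + average within-team variance.
    """
    all_players = [avg for players in teams.values() for avg in players]
    team_means = [fmean(players) for players in teams.values()]
    return pvariance(team_means) / pvariance(all_players)

# The three toy examples from the text.
all_team = {
    "Expos": [.260, .260, .260],
    "Pilots": [.270, .270, .270],
    "Browns": [.280, .280, .280],
}
all_player = {
    "Expos": [.260, .270, .280],
    "Pilots": [.260, .270, .280],
    "Browns": [.260, .270, .280],
}
mixed = {
    "Expos": [.260, .270, .280],
    "Pilots": [.270, .280, .290],
    "Browns": [.280, .290, .300],
}

for name, example in [("all team", all_team),
                      ("all player", all_player),
                      ("mixed", mixed)]:
    # Rounded to hide floating-point noise: about 1.0, 0.0 and 0.5.
    print(name, round(team_share(example), 3))
```

With unequal team sizes, or with the n-1 version of variance, the split is no longer this clean, which is part of why the real numbers are harder to read than the toy ones.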
So what did Kaplan find? Here are a few of his numbers. He analyzed the 2000 and 2003 seasons separately; I've taken the simple average of the two percentages. What's shown is how much of the variance can be attributed to the team:
What to make of this? I have no idea. I'm not sure why teams would be responsible for 28% of the variance in total bases, but only 12% of the variance in runs created.
However: Kaplan appears to have used actual counts, not rates, for all the counting stats. And he considered everyone with 200 plate appearances. What that means is that the variances are very heavily influenced by how the manager used his players. For instance, suppose the Browns and Pilots have identical players, but the Browns platoon and the Pilots don't. Then their hits might look like this:
Pilots: 120, 140, 160
Browns: 60, 60, 70, 70, 80, 80
This makes it look as though there are big differences between the teams, since Browns players average 70 hits and Pilots players average 140. Furthermore, the Pilots' within-team variance is four times as high as the Browns'!
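The arithmetic in the platoon example can be verified with a few lines of Python. This is only a sketch of the toy numbers above, and it assumes population variance (dividing by n); with the n-1 sample variance the ratio comes out to five rather than four.

```python
from statistics import fmean, pvariance

# Hit totals from the toy example: same talent, different usage.
pilots = [120, 140, 160]             # three everyday players
browns = [60, 60, 70, 70, 80, 80]    # six platooned players

pilots_mean = fmean(pilots)          # average hits per Pilots player
browns_mean = fmean(browns)          # average hits per Browns player

# Population variance within each team (assumption: divide by n).
pilots_var = pvariance(pilots)
browns_var = pvariance(browns)
variance_ratio = pilots_var / browns_var

print("means:", pilots_mean, browns_mean)
print("within-team variance ratio:", round(variance_ratio, 2))
# Team totals are identical, even though the per-player numbers differ.
print("team totals:", sum(pilots), sum(browns))
```

Both teams get the same 420 hits in total, so all of the apparent between-team difference here comes from how the playing time was divided up.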
I'd guess that player usage patterns are the reason for some of the numbers, such as between-team effects being 28% of the variance of total bases, but only 3% of slugging percentage, when, really, those two statistics measure the same skill.
Also, the percentages result from a large muddle of a bunch of factors:
-- how much a team spreads out its at-bats among players;
-- park effects;
-- whether teams that have good players are more likely to have other good players, because they can afford to spend more;
-- whether teams that have good players are less likely to have other good players, because there's effectively a salary limit on how many good players they can afford;
-- whether GMs concentrate on certain types of players, such as Billy Beane buying up a lot of OBP;
-- whether managers are more likely to give playing time to certain types of players;
-- managers' decisions on elective strategies like bunts and steals;
-- and lots more that you can probably think of.
(As for that point about elective strategies: it's safe to assume that variation in the rate of sacrifice hits should be very strongly team-related. That's because the bunt is a strategy held in different levels of esteem by different managers, especially in the American League. In the 2003 AL, team sac hits ranged from 11 (Blue Jays) to 65 (Tigers). The fact that this study didn't show any meaningful effect – it showed that only 0.5% of bunt variance was team-related – suggests that the methodology is flawed, or at least not powerful enough for its intended purpose.)
Because the methodology tangles so many causes together, I don't know what this study tells us. I have no idea, absolutely none, of what any of Kaplan's numbers might mean in the baseball sense. Kaplan doesn't make any suggestions either. Could it be there's nothing we can conclude? Is it possible that the figures in the chart are no better than random numbers? Or am I missing something important?