Can managers induce "career years" from their players?
Over at the "Ask Bill" section of Bill James' website, there was some discussion last week about the 1980 Yankees (subscription required; start at April 7). They finished 103-59 despite a roster that didn't look all that great on paper. Was it that manager Dick Howser somehow got more out of his players than expected?
A few years ago, I did a study that tried to estimate how much a team was affected by the "career years" or "slump years" of their players. (Go here, look for "1994 Expos".) What I did, basically, was take a weighted average of a player's stats the two years before and two years after, regress it to the mean a bit, and use that as an estimate of what the guy "should have" done that year. Any difference, I attributed to luck. In the 1980 Yankees case, it was 12 games of "career years" from their hitters, and effectively zero for their pitchers.
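Here's a minimal sketch, in code, of the kind of estimate I mean. The weights, the regression fraction, and the numbers plugged in at the bottom are illustrative only, not the study's actual constants:

```python
# Hypothetical sketch of a "career year" talent estimate: weight the two
# seasons on each side of year Y, then regress toward the league mean.
# The weights and the regression fraction are illustrative, not the
# study's actual constants.

def expected_performance(y_minus2, y_minus1, y_plus1, y_plus2,
                         league_mean, regress=0.20):
    """Estimate what a player 'should have' done in year Y, from the two
    seasons before and the two seasons after (any rate stat works)."""
    # Nearer seasons get more weight than farther ones.
    weighted = (y_minus2 + 2 * y_minus1 + 2 * y_plus1 + y_plus2) / 6
    # Regress part of the way to the league mean to damp out noise.
    return weighted + regress * (league_mean - weighted)

# Whatever the player actually did, minus this estimate, gets booked as luck.
luck = 0.850 - expected_performance(0.780, 0.800, 0.790, 0.770,
                                    league_mean=0.750)
```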
A bit of discussion followed; Bill James wrote that he wasn't convinced:
"I am leery of describing as luck things that we don't understand. It may well be that players had good years because Howser or someone else was able to help them have good years."
Fair enough. In response, I posted a short statistical argument that if it *was* the manager, it couldn't happen very often, and another reader (Chris DeRosa) disputed what I said (partly, I think, because I didn't say it very well).
Since "Ask Bill" is not a good place for a long explanation, I thought I start again here and better explain what I'm talking about.
Suppose we knew the exact talent level of every team in the majors. That is: for every single game, between any two teams, we know the exact chance either team will win. If both teams have an equal chance, it's exactly like flipping a fair coin. If the favorite has a 64 percent chance of winning, it's like flipping a coin that has a 64 percent chance of landing heads.
In real life, this is pretty much the way it works. If not, the Vegas odds on baseball games wouldn't be so close to even. If you could look at the specifics of a game and predict the winner with 90 percent confidence, Vegas would routinely offer 9:1 odds on underdogs. And they don't. That means that a huge part of who wins a baseball game is unpredictable.
So, a team's season record is like a series of 162 coin tosses -- heads is a win, tails is a loss. Mathematically, using the normal approximation to the binomial distribution, you can show that the SD of team wins over a season, for a .500 team, is about 6.4 wins. That is, you expect 81-81, but you could easily wind up 87-75, or even 69-93, just due to luck.
The SD drops as the team gets better or worse than .500, but it doesn't drop much. If it's a .600 team, rather than a .500 team, the SD due to "coin tossing" is still 6.2 wins. Even for a .700 team, the SD is still about six games a season -- 5.8, to be exact.
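If you don't trust the math, a quick simulation will confirm it. This sketch flips weighted coins for a full season, many times over, and measures the spread of the win totals:

```python
import random

# Simulate many 162-game seasons for a team of fixed talent, and measure
# the SD of its win totals. Compare to the closed form sqrt(162*p*(1-p)).

def sd_of_wins(p_win, seasons=20_000, games=162):
    totals = [sum(random.random() < p_win for _ in range(games))
              for _ in range(seasons)]
    mean = sum(totals) / seasons
    return (sum((w - mean) ** 2 for w in totals) / seasons) ** 0.5

for p in (0.500, 0.600, 0.700):
    print(p, round(sd_of_wins(p), 1), round((162 * p * (1 - p)) ** 0.5, 1))
# Prints roughly 6.4, 6.2, and 5.8 wins, respectively.
```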
Also, there's no need to keep the assumption that all games are the same. Suppose, before every game starts, you know the exact talent of both teams, and even the exact home field advantage for that game. You can even be omniscient enough to adjust for the weather, and injuries, and the fact that the starting pitcher had a big fight with his wife last night. Before the game starts, you'll have an extremely accurate estimate of the chance of the home team winning.
Still, that chance will be substantially less than 100%. There will still be a huge amount of luck in play. Your estimate is almost always going to be less than, say, .700. It is absolutely impossible to get much better than that, for the same reason it's impossible to predict what the temperature will be exactly one year from now.
In theory, it could be predictable -- but the prediction would depend on uncountable numbers of molecules, beyond any possible computing capability humans could ever devise. So what is left is essentially random.
That means that, when we total up your wins and losses for the season and compare them to your talent, no matter how accurate your talent estimates are, you're going to find that your SD is *still* around 6.2. That's an unalterable, natural limit of the universe, like the speed of light.
If you have a model for estimating team talent, a good test of that model is how close your error can get to the natural lower bound of 6.2 wins.
The most naive model is to predict that every team will wind up 81-81. If you check that, you'll find that the standard error of your estimates is around 11 wins. If you use a prediction method like Tom Tango's "Marcel", you'll get substantially closer. You could also check any other predictions, like the Vegas over/under line. I don't actually know what those are, but I'm guessing they'd come in around 8 or 9 wins.
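To be clear about what I mean by "standard error" here: it's the root mean squared difference between predicted and actual wins. A minimal sketch, with made-up win totals:

```python
# "Standard error" here is just the root mean squared difference between
# predicted and actual wins. The win totals below are made up.

def standard_error(predicted, actual):
    n = len(actual)
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n) ** 0.5

actual_wins = [103, 97, 89, 81, 76, 67, 59]   # a hypothetical league
naive = [81] * len(actual_wins)               # predict .500 for everyone
print(standard_error(naive, actual_wins))
```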
My model is at 7.2 wins. I'm pretty sure it's better than Marcel and Vegas, but that's only because it uses more data. Oddsmakers are predicting the team's talent *before* it happens; I'm estimating it after. Obviously, I have far more information to work with. From looking at the rest of Norm Cash's career, I know that Cash wasn't as good a player in 1962 as his 1961 numbers suggested, and I can adjust accordingly. Marcel looks only backwards, so it doesn't know that.
If that seems like I'm cheating, well, not really. I'm not using the method to show how good a predictor I am. I'm using it to try to figure out, after the fact, how good a team actually was. I'm not trying to predict the future; I'm trying to explain the past.
My method works like this. Suppose you have a team with a talent of X wins that, instead, got Y wins. The difference between Y and X is, by definition, luck. How might we measure that luck?
I think these five measurements add up to the full amount of luck, without overlapping:
-- how much the team's hitters got lucky and had a career year;
-- how much the team's pitchers got lucky and had a career year;
-- how much the team differed from its Runs Created estimate;
-- how much the team's opponents differed from their Runs Created estimate; and
-- how much the team's wins differed from its Pythagorean Projection.
The first two items deal with the raw batting and pitching lines. The second two items deal with converting those lines to runs. And the last item deals with converting those runs to wins. (You don't have to consider the opposition's "career year", because the opposition's career year in hitting is your career year in pitching, and vice-versa.)
Any source of luck you can think of winds up in one of those five categories. A pitcher has a lucky BABIP? That shows up as a career year. Team gets lucky and hits unusually well in the clutch? Partly career years, partly beating their Runs Created estimate. Team gets lucky and goes 15-6 in extra-inning games? Shows up in their Pythagorean discrepancy. Your shortstop has a lucky defensive year? That shows up in a pitcher's career year (which is based on opposition batting outcomes, and therefore includes defense).
It's all there.
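Schematically, the decomposition looks something like the sketch below. I'm using basic Runs Created, the exponent-2 Pythagorean formula, and ten runs per win as stand-ins, and the field names are hypothetical; the study's actual versions differ in the details:

```python
# Schematic five-way luck decomposition, in wins. Basic Runs Created,
# the exponent-2 Pythagorean formula, and ten runs per win are stand-ins
# for whatever versions the study actually used.

RUNS_PER_WIN = 10.0

def runs_created(h, bb, tb, ab):
    # Basic Runs Created; this is what would produce the "rc_*_estimate"
    # fields below from a raw batting line.
    return (h + bb) * tb / (ab + bb)

def pythag_wins(runs_scored, runs_allowed, games=162):
    return games * runs_scored ** 2 / (runs_scored ** 2 + runs_allowed ** 2)

def luck_breakdown(team):
    return {
        # 1 and 2: actual batting/pitching lines vs. talent estimates
        "hitters_career_year": team["batting_wins_vs_talent"],
        "pitchers_career_year": team["pitching_wins_vs_talent"],
        # 3 and 4: actual runs vs. the Runs Created estimates of those lines
        "rc_offense": (team["runs_scored"] - team["rc_scored_estimate"])
                      / RUNS_PER_WIN,
        "rc_defense": (team["rc_allowed_estimate"] - team["runs_allowed"])
                      / RUNS_PER_WIN,
        # 5: actual wins vs. the Pythagorean projection of those runs
        "pythagorean": team["wins"] - pythag_wins(team["runs_scored"],
                                                  team["runs_allowed"]),
    }
```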
So, for every team since 1961, I figured out its luck in each of the five categories. As I said earlier, the "career year" luck was measured against players' talent estimates based on the four surrounding seasons. The Runs Created and Pythagorean estimates were straightforward.
After all that, the unexplained discrepancy, as I said above, was 7.2 games.
That seems very close to the law-of-the-universe binomial limit of 6.2 games. The difference, however, is substantial: it's 3.7 games. (It works that way because independent sources of variance add, so 7.2 squared minus 6.2 squared equals 3.7 squared.)
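If you want to check that arithmetic:

```python
# Independent variances add, so the unexplained piece comes from
# subtracting in quadrature.
print(round((7.2 ** 2 - 6.2 ** 2) ** 0.5, 1))  # 3.7
```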
What does that 3.7 represent? It's not luck we haven't accounted for, because, I think, we've accounted for all the luck. We haven't accounted for it perfectly -- Pythagoras and Runs Created aren't exact. And, of course, the way I estimated a player's talent isn't perfect either.
So, here's what accounts for that extra 3.7 game standard deviation:
1. imperfections in Pythagoras and Runs Created
2. the fact that my method of estimating talent for "career years" is probably not that great
3. managerial influence in temporarily making players better or worse for a single season (Billy Martin's 1980 pitchers?)
4. injury patterns that make players look better or worse (but not injuries affecting playing time; that's reflected in the estimates already)
5. other sources of good or bad single years that aren't luck or injuries (steroids? Steve Blass disease?)
6. other things I'm forgetting (let me know in the comments and I'll add them here).
If I had to guess, I'd say that #2 is the biggest of all these things. My method just looks at four years. It may not be regressing to the mean properly. It doesn't distinguish between starters and relievers. It doesn't consider age (which is fine for most ages, but not for, say, 27, when it should give an extra boost over the average of ages 25, 26, 28, and 29). It takes previous or future career years at face value, so that, for instance, it bases Brady Anderson's 1997 expectation significantly on his 1996. (If you showed a human Brady's entire career, that person probably wouldn't weight 1996 quite so heavily.)
UPDATE: Tango describes it better than I do:
"As for the reason for that 3.7, a large portion of that is almost certainly the uncertainty of the true talent for each player. There’s only so much we can know about a player, given such a small sample as 3000 plate appearances, combined with such a narrow talent base that is MLB."----------
In light of all that, my point about Dick Howser is this: since the entire unexplained residual SD is only 3.7 games, there can't be a whole lot of manager influence in temporarily increasing a player's talent. It's certainly possible that Dick Howser managed his team into 12 extra games' worth of talent, but things like that certainly can't happen very often.
If you square the unexplained SD of 3.7, you get an unexplained variance of about 14. Multiply that by the 26 teams that existed in 1980, and you get about 356 total units of unexplained variance.
If Dick Howsers are routine, and there's typically one every season creating a 12-win discrepancy, then that Dick Howser singlehandedly contributes a variance of 144. That's about 40 percent of the total unexplained variance for a typical league. That's a lot.
Furthermore, it's absolutely impossible for there to be an average of two and a half Dick Howsers in MLB per year, each boosting his team by 12 wins worth of talent. If that were the case, then that would account for the entire 356 units of variance, which means all the other sources of error would have to be zero. That's obviously impossible.
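Here's all of that back-of-the-envelope variance accounting in one place, using only the numbers already given above:

```python
# Back-of-the-envelope variance accounting, using only the numbers above.
per_team = 3.7 ** 2            # about 14 units of unexplained variance
league = per_team * 26         # about 356 for a 26-team league

one_howser = 12 ** 2           # one 12-win manager effect = 144 units
print(one_howser / league)     # about 0.40 -- 40 percent of the total
print(2.5 * one_howser)        # 360 -- already more than the whole 356
```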
Even if there were only half a Dick Howser every year, that would still be 21 Howsers in the period I studied. In that case, instead of seeing the discrepancies normally distributed, we'd see a normal distribution with 21 outliers.
But we don't.
If "batting career year discrepancy" is normally distributed, we should expect about 24 teams out of 1042 to have discrepancies of 2 SD or more. The actual number of teams at 2 SD or more in the study: 25, almost exactly as expected.
We should also expect 24 teams to have discrepancies of 2 SD or more going the other way. Actual number: 22.
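(Those expected counts come straight from the normal curve. Here's the calculation, using the study's 1042 team-seasons:)

```python
from statistics import NormalDist

# Expected teams beyond 2 SD in one tail of the normal curve, out of the
# 1042 team-seasons in the study.
tail = 1 - NormalDist().cdf(2.0)    # about 0.0228
print(round(1042 * tail))           # 24 per tail
```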
So there is no evidence at all that there's anything more than luck going on. That still doesn't mean that Dick Howser can't be a special case ... it could be that career years are just random, *except for 1980 Dick Howser.* But, obviously, the numbers alone don't give us any reason to believe he is. A certain number of teams are going to show an effect as big as the 1980 Yankees', just by chance. (And, in fact, three other teams beat them; the 1993 Phillies led the study with a "career year hitting" effect of 13.1 games.)
So, if you think Dick Howser is something other than a random point on the tail of the normal distribution, you have to explain why. It's like when Daphne Weedington, from Anytown, Iowa, wins the $200 million lottery jackpot. You don't know *for sure* that Daphne doesn't have some kind of supernatural power. But, after all, *someone* had to win. Why not Daphne?