New Bill James study on Pythagoras
In light of all the discussion about why the Diamondbacks beat their Pythagorean projection, and what that means, Bill James wrote up a new study on whether such teams improve more than expected in the following season.
Bill sent the study to interested readers on the SABR Statistical Analysis mailing list, but kindly allowed me to post his article, and the accompanying spreadsheet, for anyone interested:
You will notice that at the end of the article, Bill asks for comments ... if you comment here, I'll post to the list. I already posted some comments, which I will reprint below in small font. You probably want to read the study first.
Bill, thanks for the article! I thought I'd comment, since nobody else has (oops, Dvd Avins posted as I was writing this).
Summarizing Bill's study, if I've got it right:
If you take teams that overachieved – that "should have" been about .482 according to Pythagoras, but were instead actually .538 – they wound up at .496 the next season.
If you take the most closely matched teams in runs scored and runs allowed, but that *underachieved* Pythagoras – they "should have been" .478, but were actually .447 – they wound up at .474 the next season.
Converting everything to a 162 game season, to make things easier to understand:
Group 1: Should have been 78-84. Were actually 87-75. Next year were actually 80-82.
Group 2: Should have been 77-85. Were actually 72-90. Next year were actually 77-85.
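For anyone who wants to check the conversion, here is a minimal sketch that turns the winning percentages quoted above into 162-game records. The function name `record` and the dictionary layout are just my own illustration; the percentages are the rounded ones from the summary, so this is only as exact as they are.

```python
GAMES = 162  # length of a modern MLB season


def record(win_pct, games=GAMES):
    """Convert a winning percentage into a (wins, losses) record."""
    wins = round(win_pct * games)
    return wins, games - wins


# Rounded winning percentages quoted in the summary above.
groups = {
    "Group 1": {"should have been": 0.482, "were actually": 0.538, "next year": 0.496},
    "Group 2": {"should have been": 0.478, "were actually": 0.447, "next year": 0.474},
}

for name, pcts in groups.items():
    for label, pct in pcts.items():
        wins, losses = record(pct)
        print(f"{name}, {label}: {wins}-{losses}")
```

Running it reproduces the six records listed for the two groups.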
The difference: three games (actually, 3.7 if you don't round everything). Adjusting for the fact that Group 1 was (according to Pythagoras) about 0.6 games better than Group 2 in the first place, brings us back to about 3 games.
In terms of runs scored and runs allowed, the difference is only about 2.3 games. The other 0.7 comes from Pythagoras. That is, the teams with the Pythagorean advantage of 13 wins the first year had a Pythagorean advantage of only 0.7 wins the next year.
First, I think it's plausible that the 0.7 win advantage is real. Pythagoras just counts runs; it doesn't count how important they are. Bill once wrote (and several others have verified) that each run given up by a stopper is, in terms of wins, worth double what it's worth to an average pitcher. So if the stopper on one team has an ERA 1.00 run better than another, and he pitches 90 innings, the 10 runs he saves are actually worth 20. That will mean his team beats Pythagoras by one win.
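The stopper arithmetic can be sketched in a few lines. This assumes the usual rule of thumb of about 10 runs per win (my assumption here, though it matches the run-to-win conversions used elsewhere in this comment) and takes the "worth double" figure as a leverage multiplier of 2. The function and parameter names are hypothetical.

```python
RUNS_PER_WIN = 10.0  # assumption: rule-of-thumb conversion, not from the study


def stopper_extra_wins(era_gap, innings, leverage=2.0, runs_per_win=RUNS_PER_WIN):
    """Wins by which a better stopper lets his team beat Pythagoras.

    era_gap  -- how many runs per nine innings better the stopper is
    innings  -- innings the stopper pitches
    leverage -- how much more each of his runs is worth than an average run
    """
    runs_saved = era_gap * innings / 9.0      # runs Pythagoras "sees"
    effective_runs = runs_saved * leverage    # what those runs are worth in wins
    hidden_runs = effective_runs - runs_saved # the part Pythagoras misses
    return hidden_runs / runs_per_win


print(stopper_extra_wins(era_gap=1.00, innings=90))  # 1.0
```

With a 1.00 ERA gap over 90 innings, that is 10 runs saved, worth 20, so 10 "hidden" runs, or one win.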
It's probably reasonable to assume that the teams that beat Pythagoras the most would have better stoppers than the ones who "un-beat" Pythagoras the most. 0.7 wins – 7 runs difference in stopper talent – seems reasonable to me.
That brings the unexplained difference down to 2.3 wins. What could explain that?
Dvd Avins suggested that it's management making changes: that, next year, when the overachieving team drops back to normal, the team will make some improvements.
Or, perhaps the changes might come in the off-season. The team that went 87-75 thought it was really an 87-75 team, and went out and signed an expensive free agent – the one guy they thought could take them over the top to the playoffs. The 72-90 team did not.
But an average of one free agent per team, over 100 teams, seems large, especially when Bill's sample covered all of baseball history, much of which didn't allow for easy free-agent signings.
However, these 87-75 teams are different from other 87-75 teams, in that the players' season records (not including pitcher W-L) look like the records of a 78-84 team, not an 87-75 team. That means that there's more opportunity for management to make changes. When your team scores 120 more runs than they allow, almost everyone looks good. But when your team gives up more runs than they score, there are going to be more players that obviously seem in need of replacement. This may be *especially* true if the team is perceived to be a playoff contender.
That is, take two identical teams who give up a few more runs than they score. They both have a below-average DH who hits .260 with 15 home runs. The team that went 87-75 might consider it urgent to replace him – they think they need just one or two moves to make the playoffs. The team that went 72-90 knows they need a lot more than that, and may not spend money on free agents until their young players start improving.
Anyway, this may be wrong ... I’m just thinking out loud.
As for significance testing ... Tom Tango has figured that in MLB these days, the SD of team wins is 11.66 games. That breaks down as 9.7 games (in 162) for team talent, and 6.3 games for luck.
For the teams in Bill's list, 9.7 games for talent is too much, since the teams were not chosen randomly, but specifically to match the other group. So let's assume that instead of 9.7 games, the SD of the talent difference between the matched pairs is, say, 3 games.
That makes the SD of actual next-season wins come out to about 7 games (the square root of 6.3 squared plus 3 squared).
Since there were 100 pairs in the study, the SD of the average is 7 divided by the square root of 100, which is 0.7. And so the 2.3 win difference is three-and-a-bit SDs. Obviously that's significant, and so not just chance. But the management effect could be causing it.
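Here is that back-of-the-envelope significance calculation as a short sketch. The 6.3-game luck SD is Tango's figure quoted above, the 3-game talent SD is my guess from above, and the variable names are just for illustration.

```python
import math

LUCK_SD = 6.3        # SD of wins due to luck over 162 games (Tango's figure)
TALENT_SD = 3.0      # assumed SD of the talent difference within matched pairs
PAIRS = 100          # number of matched pairs in the study
OBSERVED_DIFF = 2.3  # unexplained difference in next-season wins

# Independent sources of variation add in quadrature.
sd_single = math.hypot(LUCK_SD, TALENT_SD)

# The SD of an average of n independent values shrinks by sqrt(n).
sd_mean = sd_single / math.sqrt(PAIRS)

# How many SDs the observed difference is from zero.
z = OBSERVED_DIFF / sd_mean

print(f"SD of next-season wins: {sd_single:.2f}")  # 6.98
print(f"SD of the average:      {sd_mean:.2f}")    # 0.70
print(f"Observed difference:    {z:.2f} SDs")      # 3.30
```

Changing `TALENT_SD` shows how sensitive the result is to that guess: anything from 0 to 5 games still leaves the 2.3-win difference at well over 2 SDs.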
Suppose you wanted to get that down to 1 SD, to feel comfortable calling it random variation. You'd have to reduce the difference by 1.6 wins. The most plausible way to do that is to attribute those 1.6 wins to management.
Suppose you have those two identical teams, but one overshoots Pythagoras to win 87 games, and the other undershoots to win 72 games. Both teams may try to improve ... but is it plausible to argue that, averaged over all 100 pairs, the 87-win teams will deliberately go out and improve their teams by 1.6 wins more than the 72-win teams?
I honestly don't know if that sounds reasonable. It's only 16 runs, though ...