Pythagorean good luck associated with Runs Created bad luck
I noticed recently that there's a negative correlation between certain measures that we think are random and independent. For instance, outshooting Pythagoras tends to be associated with undershooting Runs Created. I don't know why, and I'm looking for ideas.
Let me give you some background on the luck numbers I'm using here.
Back in 2005, I did a study to estimate real teams' historical talent levels from their stats. I figured that there were five mutually exclusive ways a team could perform differently from its talent:
1. Its batters could have lucky or unlucky years, in terms of raw batting line.
2. Its pitchers could have lucky or unlucky years, in terms of the opposition's raw batting line.
3. It could create more or fewer runs than expected from its batting line (runs created).
4. Its opponents could create more or fewer runs than expected from their batting line (runs created).
5. It could over- or undershoot its Pythagorean projection.
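To make the fifth item concrete, here's a minimal sketch of how "Pythagoras luck" could be computed. It assumes the classic exponent of 2 and a conversion of 10 runs per win; the study may have used slightly different values, and the function names are made up for illustration.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from run totals (exponent of 2 is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def pythagoras_luck_runs(actual_wins, runs_scored, runs_allowed,
                         games=162, runs_per_win=10.0):
    """Wins above the Pythagorean projection, converted to runs.

    The 10-runs-per-win conversion is a rule of thumb, assumed here.
    """
    expected = pythagorean_wins(runs_scored, runs_allowed, games)
    return (actual_wins - expected) * runs_per_win
```

A team that scores and allows 800 runs projects to 81 wins; if it actually wins 85, this sketch would credit it with 40 runs of Pythagoras luck.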
The last three were easy -- I just compared the actual runs and wins to their estimates. The first two were harder. How can you tell whether a player is having a career year? What I did was take the weighted average of the player's four surrounding seasons, and regress that to the mean. The results for players came out fairly reasonable.
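Roughly, the "career year" idea looks like the sketch below. The weights, the regression fraction, and the function names are all placeholders I've made up for illustration, not the values from the study.

```python
def expected_performance(surrounding, weights=(1, 2, 2, 1),
                         league_mean=0.0, regression=0.25):
    """Estimate a player's expected season from the four surrounding ones.

    surrounding: the stat in (two years before, one before, one after,
                 two after). Weights and regression fraction are
                 hypothetical placeholders.
    """
    weighted = sum(w * x for w, x in zip(weights, surrounding)) / sum(weights)
    # Pull the weighted average part of the way back toward the league mean.
    return weighted + regression * (league_mean - weighted)


def career_year_luck(actual, surrounding, **kwargs):
    """Positive means the player beat his estimate (a 'lucky' year)."""
    return actual - expected_performance(surrounding, **kwargs)
```

For example, a hitter worth 100 runs in each of the four surrounding seasons, in a league where the mean is 80, would be expected to produce 95 runs under these placeholder settings; a 110-run season would then count as +15 runs of luck.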
The results for teams came out reasonable too, IMO. The luckiest team from 1960-2001 was the 2001 Mariners (who the study said "should have" won 89 games instead of 116), and the unluckiest was the 1962 Mets ("should have" won 61 instead of 40).
[If you want more details, see my web page (search for "1994 Expos"). You can actually download the spreadsheet there that I'm using. Also, I wrote up the findings for SABR's "Baseball Research Journal," and I found a repost here (.pdf).]
The "career year" estimates for teams seemed pretty good. I had tweaked the formulas to make them close to unbiased. For 1973 to 2001 -- the subset of seasons I'm using for this, less strike years -- the mean batting luck was +1.8 runs, and the mean pitching luck was -0.1 runs.
So, I was pretty happy with the overall results.
OK, so ... while I was working with the data yesterday, I noticed some correlations I didn't expect.
First: it turns out there's a small but unexpected correlation between "Pythagoras luck" and "career year luck" (batting plus pitching). That correlation is negative 0.1. Why would that happen?
The only theory I can think of -- when a team plays well, it wins a lot of games. That means it plays fewer ninth innings on offense, and more ninth innings on defense. That artificially makes it look lucky in Pythagoras (which is based on run differential).
But that should create a *positive* correlation with player performance luck, not negative!
Pythagoras luck had an SD of around 40 runs per season. Career year luck was around 65. So, every four extra Pythagoras wins (about 40 runs, or one SD) are associated with around negative 6 runs of "career year" effect. Not a whole lot, but I still don't know what's going on.
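Here's the arithmetic behind that, as a quick sketch. It uses the usual regression-slope identity (slope = r times the ratio of the SDs) and assumes 10 runs per win; the function name is just for illustration.

```python
def regression_slope(r, sd_x, sd_y):
    """Slope of y on x implied by a correlation and the two SDs."""
    return r * sd_y / sd_x


# Figures from the text: r = -0.1, SD of 40 runs (Pythagoras luck)
# and 65 runs (career year luck).
slope = regression_slope(-0.1, sd_x=40.0, sd_y=65.0)

# Four Pythagoras wins, at an assumed 10 runs per win, is 40 runs.
career_year_effect = slope * 4 * 10.0
print(round(career_year_effect, 1))
```

That works out to about -6.5 runs of career year luck for every four wins of Pythagoras luck.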
And, worse: there's a larger correlation between "Pythagoras luck" and "Runs Created luck". This time, negative 0.15.
So: for every win by which a team beats its Pythagoras, it's given up one-tenth of a win in Runs Created luck. How would that happen? The only thing I can think of is walkoff wins with runners on base: every one of those might lead RC to believe you were unlucky by ... what, half a run? So that's not really enough.
Finally ... there's an even larger correlation (minus 0.2) between "career year luck" and "Pythagoras + RC luck". For every four wins a team gained due to Pythagoras/RC luck, they lost one back to player underperformance.
For that, I have a hypothesis. Runs Created is known for overestimating the best offenses. So, when a team beats its RC estimate, it's less likely to be having a great year. That means its batters are more likely to be underperforming.
Here's something to support that idea: when I checked, I found almost all the correlation comes from comparing batting career years with batting RC luck, and from comparing pitching career years with pitching RC luck. Comparing pitching to batting gives almost zero correlations.
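If you want to run the same breakdown yourself, something like this sketch would do it. The column names (`bat_career`, `pit_career`, `bat_rc`, `pit_rc`) are hypothetical; they don't necessarily match the headings in the spreadsheet.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def luck_correlations(teams):
    """Correlate each career-year column with each RC-luck column.

    teams: one dict per team-season, keyed by the hypothetical
           column names listed below.
    """
    pairs = [("bat_career", "bat_rc"), ("pit_career", "pit_rc"),
             ("bat_career", "pit_rc"), ("pit_career", "bat_rc")]
    return {(a, b): pearson([t[a] for t in teams], [t[b] for t in teams])
            for a, b in pairs}
```

The first two pairs are the same-side comparisons (batting with batting, pitching with pitching); the last two are the cross comparisons that came out near zero.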
I'm not sure if that explanation is enough to explain the -0.2, but it's something.
So what's going on? Shouldn't clutch hitting (which is what RC luck is) be uncorrelated with, say, scoring runs when you need them the most (which is what Pythagoras luck is)? Shouldn't whether you get a few extra hits one season (which is career year luck) be uncorrelated with *when* those hits happen (which is RC luck)?
Why are these things associated? It must be something about the way I'm measuring them, rather than being lucky one way causing you to be lucky another way. Right?