The OBP/SLG regression puzzle -- Part III
In the previous posts, I ran regressions to predict winning percentage from on-base percentage (OBP) and slugging percentage (SLG). In those regressions, I adjusted every team's OBP and SLG for its league-season. I had to: if you don't adjust, the results vary a lot.
Here's the completely unadjusted regression, for all teams from 1961 to 2009, except strike seasons. (I'll put the OBP/SLG coefficient ratio in brackets after each equation.)
wpct = (2.19*OBP) + (0.07*SLG) - .2405 [ratio: 30]
That's an OBP/SLG ratio of over 30! We were expecting 1.7. It seems like slugging barely matters at all!
Compare that to the "regular" regression, which adjusts for league-season:
wpct = (2.70*OBP) + (0.89*SLG) - .7843 [ratio: 3]
OK, that's a bit better. The ratio is down to 3.
Guy argued, in the comments to the first post, that I need to adjust for park, too. He's right.
If I change winning percentage to what it would be if the team had posted those stats in a neutral park -- while still keeping the league adjustment -- I get this equation:
wpct = (2.65*OBP) + (1.09*SLG) - .8504 [ratio: 2.43]
Even better: we're down to 2.43!
An easier way might be just to not adjust anything, but include the league and park in the regression:
wpct = (2.63*OBP) + (1.15*SLG) - (2.58*league OBP) - (1.16*league SLG) - (0.0029* BPF) + 0.091 [ratio: 2.1]
Now, the ratio is all the way down to 2.1.
What's going on?
This one's pretty simple. When a team has a high OBP or SLG, it's a combination of two things:
-- batting talent, and
-- a high run environment for the league and park.
The first one actually has an impact on winning percentage. The second one doesn't. A high SLG doesn't help you if it's caused by the park, because the opposition benefits from it too.
The same is true for OBP. But ... SLG should be affected *more*. There are more high-HR and low-HR parks than there are, say, high-walk or low-walk parks. And the jump in offense during the "steroids era" was mostly home-run related.
Comparing 1968 to 2001:
1968: OBP .299, SLG .340
2001: OBP .332, SLG .427
OBP increased 11 percent, but SLG increased 26 percent.
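Here's that arithmetic as a quick check, using only the four league figures quoted above (the function name is just for illustration):

```python
def pct_increase(old, new):
    """Percentage increase from old to new."""
    return 100.0 * (new - old) / old

obp_change = pct_increase(0.299, 0.332)  # 1968 -> 2001 OBP
slg_change = pct_increase(0.340, 0.427)  # 1968 -> 2001 SLG

print(round(obp_change))  # 11
print(round(slg_change))  # 26
```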
So, when you don't adjust, slugging appears to matter less, because more of the variation in SLG comes from the run environment, and that benefits the opposition too. That makes OBP look more important, relatively speaking.
(All credit for this finding goes to Guy ... he actually explained all this to me in his comment.)
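To see the effect in isolation, here's a toy simulation. It's a sketch, not the regression from this post: every number in it is made up, it ignores pitching entirely, and it uses only the Python standard library. Each fake team gets some batting talent plus a shared run-environment bump that moves SLG more than OBP. Winning percentage is built from talent alone, with a true OBP/SLG ratio of 1.7. Then it runs the regression twice, once on the raw stats and once with the environment removed.

```python
import random

def cov(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def two_var_slopes(x1, x2, y):
    """OLS slopes for y ~ x1 + x2 (with intercept), via the normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate(n_teams=20000, seed=1):
    rng = random.Random(seed)
    raw_obp, raw_slg, adj_obp, adj_slg, wpct = [], [], [], [], []
    for _ in range(n_teams):
        t_obp = rng.gauss(0, 0.010)  # batting talent in OBP (made-up spread)
        t_slg = rng.gauss(0, 0.020)  # batting talent in SLG (made-up spread)
        env = rng.gauss(0, 1)        # league/park run environment, shared with opponents
        # Assumption: the environment moves SLG about three times as much as OBP.
        raw_obp.append(0.330 + t_obp + 0.008 * env)
        raw_slg.append(0.400 + t_slg + 0.025 * env)
        adj_obp.append(0.330 + t_obp)
        adj_slg.append(0.400 + t_slg)
        # Only talent wins games here; the true OBP/SLG ratio is 1.7.
        wpct.append(0.5 + 1.7 * t_obp + 1.0 * t_slg + rng.gauss(0, 0.03))
    raw_b_obp, raw_b_slg = two_var_slopes(raw_obp, raw_slg, wpct)
    adj_b_obp, adj_b_slg = two_var_slopes(adj_obp, adj_slg, wpct)
    return raw_b_obp / raw_b_slg, adj_b_obp / adj_b_slg

raw_ratio, adj_ratio = simulate()
print(f"unadjusted ratio: {raw_ratio:.2f}")
print(f"adjusted ratio:   {adj_ratio:.2f}")
```

With these inputs, the adjusted ratio should land near the built-in 1.7, and the unadjusted ratio should come out noticeably higher, even though nothing about how teams win has changed. The exact figures depend on the made-up spreads and the seed.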
As you'd expect, the problem goes away when you combine team offense with opposition offense in the same regression. Even without adjusting, you don't have a big problem, because both teams are affected the same way.
I used the *differences* between team OBP/SLG and opposition OBP/SLG, without any adjustment, and got this (here, OBP and SLG stand for those team-minus-opposition differences):
wpct = (2.09*OBP) + (0.897*SLG) + .5 [ratio: 2.33]
That's a ratio of 2.33.
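To get a feel for what that equation says, here's a worked example. The team is hypothetical: it out-OBPs its opponents by 10 points and out-slugs them by 20. The coefficients are the ones from the equation above, and the 162-game schedule is an assumption.

```python
def predicted_wpct(obp_diff, slg_diff):
    """Winning percentage from team-minus-opposition OBP and SLG differences."""
    return 2.09 * obp_diff + 0.897 * slg_diff + 0.5

# Hypothetical team: +.010 OBP and +.020 SLG relative to its opponents.
wpct = predicted_wpct(0.010, 0.020)
wins = wpct * 162  # assumes a 162-game season

print(round(wpct, 3))  # 0.539
print(round(wins, 1))  # 87.3
```

A team that exactly matches its opponents in both stats comes out at .500, which is why the constant term is .5.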
But why do we need to care about the opposition at all? Commenter Alex suggested that if we try to predict "runs per game" instead of "winning percentage," we'll get even better results, because the opposition won't matter.
I'm checking that out for a future post.