Monday, March 05, 2007

Can batters decrease strikeouts by reducing their power?

Here is a baseball study called "The Tradeoff Between Home Runs and Contact Rate."

Lawrence W. Cinamon ran a regression on home run rate (HR/AB), as predicted by independent variables for park, season, and contact rate (non-K/AB). He uses two different models, with two different statistical corrections, the first for autocorrelation, and the second for autocorrelation and heteroskedasticity. (I leave it to the statisticians reading this to comment on the corrections, which I don't understand.)

For the first model, Cinamon finds that home run rate increases by .041 points for every one-point increase in strikeout rate. For the second model, it's .14 percentage points.

So, using the second model, increasing strikeouts from 20% of 500 AB to 21% of 500 AB (five extra strikeouts) should increase the number of home runs by .7 of a home run. Five strikeouts for .7 of a home run works out to roughly seven strikeouts per home run.
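Here is a minimal sketch of that arithmetic in Python. The coefficient and the 500 AB season are the figures quoted above; the function name and constants are made up for illustration, and this is not Cinamon's code.

```python
# Coefficient quoted above for the second model:
# +0.14 percentage points of HR/AB per +1 percentage point of K/AB.
HR_PTS_PER_K_PT = 0.14
AT_BATS = 500  # season length used in the example above


def extra_per_season(k_rate_before, k_rate_after, at_bats=AT_BATS):
    """Return (extra strikeouts, extra home runs) implied by the coefficient."""
    delta_k_rate = k_rate_after - k_rate_before
    extra_k = delta_k_rate * at_bats
    extra_hr = HR_PTS_PER_K_PT * delta_k_rate * at_bats
    return extra_k, extra_hr


extra_k, extra_hr = extra_per_season(0.20, 0.21)
k_per_hr = extra_k / extra_hr  # strikeouts "paid" per extra home run

print(f"extra K: {extra_k:.1f}, extra HR: {extra_hr:.2f}, K per HR: {k_per_hr:.2f}")
```

Note that the ratio does not depend on the number of at-bats: it is just 1/0.14, or a little over seven.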

My comments:

First, if you look at it a slightly different way, the effect is even larger. Another way to think of home run power is per ball in play: HR/BIP, rather than HR/AB. That is, the chance of hitting a home run *given that you don't strike out* increases by more than .14 points. This is consistent with Voros McCracken's model (also used by Jim Albert), where a player's ability is the combination of four skills: (a) walking; (b) striking out when not walking; (c) hitting HRs when not walking or striking out; and (d) getting a hit when not walking, striking out, or hitting a home run.

To illustrate: suppose Dave Kingman could hit a home run every time he put the ball in play, but he struck out 95% of the time. In 500 AB, he'd have 25 hits, all of them home runs. We'd still call him a power hitter, despite his .050 batting average.
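A rough sketch of the difference between the two denominators, in Python. The baseline hitter (4% HR/AB, 20% K/AB) is a hypothetical chosen only for illustration and does not come from the study; the .14 coefficient and the Kingman numbers are the ones used above. "Contact" here means any at-bat that doesn't end in a strikeout.

```python
def hr_per_contact(hr_per_ab, k_per_ab):
    """HR per non-strikeout at-bat, given HR/AB and K/AB as fractions."""
    return hr_per_ab / (1.0 - k_per_ab)


# Hypothetical baseline hitter (an assumption, not from the study).
base_hr, base_k = 0.04, 0.20

# Apply the quoted coefficient: +1 point of K/AB -> +0.14 points of HR/AB.
new_hr, new_k = base_hr + 0.0014, base_k + 0.01

gain_per_ab = new_hr - base_hr
gain_per_contact = hr_per_contact(new_hr, new_k) - hr_per_contact(base_hr, base_k)

# The Kingman hypothetical: strikeouts in 95% of AB, a home run in the other 5%.
kingman_per_ab = 0.05
kingman_per_contact = hr_per_contact(kingman_per_ab, 0.95)

print(f"gain in HR/AB:          {gain_per_ab * 100:.2f} points")
print(f"gain in HR per contact: {gain_per_contact * 100:.2f} points")
print(f"Kingman: {kingman_per_ab:.3f} HR/AB, {kingman_per_contact:.3f} HR per contact")
```

For this made-up hitter, the gain per contact comes out larger than the gain per at-bat, because the extra home runs are spread over fewer balls put in play.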

Second, Cinamon had a quadratic variable for years of experience, but dropped it because power peaked at 22 years, and "only three players in the sample have years numbered 22 or higher." But that's no reason to drop the variable. (I'm sure you could find diseases that peak at age 100, even though not many people live to be 100.) Cinamon says the result is due to "survivorship bias" – that home runs cause long careers, not the other way around. But both are true. Players do hit for more power as they get older, as Bill James told us many years ago.

In any case, even if there is survivorship bias, I don't see why that should invalidate the results. Cinamon says that "long ball causes long careers." But isn't that *more* reason to adjust for age?

What Cinamon does instead is limit his study to the first five seasons of players' careers. Since the tradeoff between HR and K is very likely to change as a player ages, the study's results now apply only to younger players.

Third, are we sure what the study observes is really a tradeoff? Suppose players are independently either high- or low-strikeout, and independently high- or low-power. And suppose there's no tradeoff possible – players are what they are, and can't change. And everyone makes the majors except the bad group, the high-K low-power group.

In that case, you'd get exactly the results you see here: HR hitters (a combination of low-K and high-K) have more strikeouts than singles hitters (who are low-K only). But there would be no tradeoff, just the illusion of one.

In that model, the effect is 100% selection bias, and 0% tradeoff. Cinamon assumes the effect is 0% selection bias, and 100% tradeoff. What's the real answer? Obviously, it's somewhere in the middle. But where? And how could we design a study to tell?
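The selection-only scenario can be simulated in a few lines of Python. This is a toy sketch: the four rate values, the 50/50 split of types, and the sample size are all assumptions picked for illustration, not estimates from the study.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical skill levels (fractions of AB); none come from the study.
K_RATE = {True: 0.25, False: 0.12}    # high-K, low-K
HR_RATE = {True: 0.05, False: 0.015}  # high-power, low-power

# Each player draws the two traits independently; nobody can change type.
players = [(random.random() < 0.5, random.random() < 0.5) for _ in range(10_000)]

# Selection: everyone reaches the majors except high-K, low-power players.
majors = [(hk, hp) for hk, hp in players if not (hk and not hp)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def ols_slope(pairs):
    """Slope of y on x by ordinary least squares, for (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def rates(group):
    """Map each player to a (K rate, HR rate) pair."""
    return [(K_RATE[hk], HR_RATE[hp]) for hk, hp in group]


k_power = mean(K_RATE[hk] for hk, hp in majors if hp)
k_singles = mean(K_RATE[hk] for hk, hp in majors if not hp)
slope_all = ols_slope(rates(players))
slope_majors = ols_slope(rates(majors))

print(f"mean K rate, power hitters:   {k_power:.3f}")
print(f"mean K rate, singles hitters: {k_singles:.3f}")
print(f"slope of HR on K, all players: {slope_all:.3f}")
print(f"slope of HR on K, majors only: {slope_majors:.3f}")
```

Among all players the slope of HR rate on K rate should hover near zero, since the traits are independent. Among the players who survive selection it is clearly positive, even though no player in the simulation can trade strikeouts for power.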

(Hat tip: Marginal Revolution.)



At Tuesday, March 06, 2007 5:34:00 PM, Blogger Tangotiger said...

In FIP, the equation is 13*HR + 3*BB - 2*SO all per IP. The "tradeoff" is 6.5 K to 1 HR.

In LWTS, you have 1.4 runs for a HR, -.30 runs for a K, and all other events are a total of +.01 runs each. If you accept 4.5 more K and 1 more HR, and give up 5.5 other events, you are breaking even.

So, somewhere in the 5 or 6 range seems about right to me.
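Both break-even figures can be checked with a short Python sketch. The weights are the ones quoted in this comment; the variable and function names are hypothetical, and the linear-weights case assumes, as the comment does, that each extra HR or K replaces one "other" event.

```python
# FIP weights quoted above (per inning): 13*HR + 3*BB - 2*SO.
FIP_HR, FIP_SO = 13, 2
fip_k_per_hr = FIP_HR / FIP_SO  # strikeouts that offset one home run

# Linear-weights run values quoted above.
RUN_HR, RUN_K, RUN_OTHER = 1.4, -0.30, 0.01


def net_runs(extra_hr, extra_k):
    """Net runs when the extra HR and K replace an equal number of other events."""
    other_given_up = extra_hr + extra_k
    return extra_hr * RUN_HR + extra_k * RUN_K - other_given_up * RUN_OTHER


# Exact break-even for one extra HR: solve 1.4 - 0.30*k - 0.01*(k + 1) = 0 for k.
lwts_k_per_hr = (RUN_HR - RUN_OTHER) / (RUN_OTHER - RUN_K)

print(f"FIP break-even:  {fip_k_per_hr:.1f} K per HR")
print(f"LWTS break-even: {lwts_k_per_hr:.2f} K per HR")
print(f"net runs at 4.5 K per HR: {net_runs(1, 4.5):+.3f}")
```

The linear-weights break-even lands just under 4.5 strikeouts per home run, so trading exactly 4.5 K for 1 HR is a hair below break-even, which matches the rounding in the comment.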


The "Jim Albert" model is also the "Voros" model, which Voros used when he introduced DIPS back in 1998 or 1999:

Starting at Step 5, Voros proceeds to:
a. remove HBP
b. then remove BB
c. then remove K
d. then remove HR

At Tuesday, March 06, 2007 5:44:00 PM, Blogger Tangotiger said...

I should also point out that the MLB average is 6 K per HR.

At Wednesday, March 07, 2007 9:48:00 AM, Blogger Phil Birnbaum said...

Thanks, Tango.

Those are calculations of the theoretical breakeven rate, right? So any batter who makes a tradeoff at less favorable terms than 5-6 K for a HR is hurting his team.

I'll update the post to give Voros credit.

