Sunday, January 24, 2010

Do bad hitters see more fastballs than good hitters?

I love simple studies, where you ask an interesting question, and then just answer it by looking at the data, without the need for any fancy techniques.

So here's one from Dave Allen. Allen asks, do better hitters get worse pitches to hit? It turns out they do: at almost every ball-strike count, they get fewer fastballs. For instance, the 20 best hitters in the league got 62.6% fastballs on the first pitch, while the 20 worst hitters saw 66.3% fastballs.

A similar finding holds for the location of those fastballs. Again at 0-0, the good hitters saw only 50.7% of them in the zone, but the bad hitters got 54.8%.

For fastball frequency, the trend reverses at 0-2, 1-2, and 2-2 -- the good hitter gets more fastballs there. I'm not sure why that would be. Maybe on those counts it makes sense for the pitcher to "waste" a pitch outside, hoping the batter will swing at it? And maybe those pitches are more likely to be fastballs? (That would also explain why on 3-2, it's again the bad hitters that get all the fastballs -- you don't want to deliberately miss the strike zone on a full count.)

I don't know if that hypothesis makes sense, but you guys probably know more about this stuff than I do.

Here's the subset of Allen's data that I talked about above ... see his study for all of it.

    Fastball Frequency by count
    ("top" = best hitters, "bottom" = worst hitters)
    ----------------------------------
    0-0 count: top 0.626, bottom 0.663
    0-2 count: top 0.549, bottom 0.511
    1-2 count: top 0.497, bottom 0.484
    2-2 count: top 0.530, bottom 0.528
    3-2 count: top 0.591, bottom 0.705
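If you want to see the reversal at a glance, here is a quick Python sketch that just subtracts the two columns. The numbers are copied from the subset of Allen's data quoted in this post; the names `fastball_freq` and `gap_in_points` are my own, made up for illustration, and this is not how Allen produced his figures.

```python
# Fastball frequencies by count: (top hitters, bottom hitters).
# Values are the ones quoted in this post, not Allen's full data set.
fastball_freq = {
    "0-0": (0.626, 0.663),
    "0-2": (0.549, 0.511),
    "1-2": (0.497, 0.484),
    "2-2": (0.530, 0.528),
    "3-2": (0.591, 0.705),
}


def gap_in_points(top, bottom):
    """Top minus bottom, in percentage points, rounded to one decimal."""
    return round((top - bottom) * 100, 1)


gaps = {count: gap_in_points(top, bottom)
        for count, (top, bottom) in fastball_freq.items()}

# Counts where the good hitters see MORE fastballs than the bad hitters.
reversed_counts = [count for count, gap in gaps.items() if gap > 0]

for count, gap in gaps.items():
    who = "good" if gap > 0 else "bad"
    print(f"{count}: {gap:+.1f} points ({who} hitters see more fastballs)")
```

The positive gaps land on exactly the two-strike counts short of a full count, and the 2-2 gap is tiny compared to the others.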

It's studies like this that make me think that this kind of instantly-publishable "open-source" research (as a commenter on Tango's blog described it) delivers better results than peer-reviewed academic research, at least in sabermetrics. In academia, it seems like, to be accepted, it's not enough that a study teaches us something -- it also has to be "clever" or complex or sophisticated in a certain fashion -- usually a mathematical one. It's hard to describe, except in an "I know it when I see it" kind of way, but I bet anyone who reads a lot of papers will know what I mean.

I can't imagine a study like this one would make it into a journal. It just gets its results by counting. If you wanted to get these results into print, you'd have to embed them in another study of some kind, one that's more mathematically complex. That's just my impression, of course, and I could be wrong ... any academics out there, tell me what you think.

Regardless, I think it's true that on the internet, all that counts is whether the reader learns something about baseball. And we definitely do learn something here.

In my mind, studies like this require cleverness too, but in a different way: figuring out that you can get a quick answer to an important question, with a very simple method, is something that's not easy to do. Kudos to Dave Allen for thinking of this one and writing it up.

---

Update: forgot to hat tip The Book blog, where mgl discusses the findings.


At Sunday, January 24, 2010 11:17:00 PM,  Financial Planner said...

What is the probability that the difference between the top & bottom hitters, occurred by chance alone?

At Monday, January 25, 2010 1:07:00 AM,  Phil Birnbaum said...

In the 0-0 buckets, there are probably around 10,000 pitches. That makes the SD about 1/2 of 1%. So that result is definitely significant.
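For anyone who wants to check that arithmetic, here is a rough Python sketch. It assumes roughly 10,000 first pitches in each group (the guess from the comment above, not a figure from Allen's study) and treats every pitch as an independent binomial trial, which is a simplification. The names `binomial_sd`, `sd_diff`, and `z` are just illustrative.

```python
import math

# Assumption: about 10,000 0-0 pitches per group (a guess, not Allen's count).
n = 10_000

# Observed 0-0 fastball rates quoted in the post.
top, bottom = 0.626, 0.663


def binomial_sd(p, n):
    """Standard deviation of an observed proportion p over n independent trials."""
    return math.sqrt(p * (1 - p) / n)


sd_top = binomial_sd(top, n)
sd_bottom = binomial_sd(bottom, n)

# SD of the difference between two independent proportions.
sd_diff = math.sqrt(sd_top ** 2 + sd_bottom ** 2)

# How many SDs apart the two groups are.
z = (bottom - top) / sd_diff

print(f"SD of each rate: {sd_top:.4f} and {sd_bottom:.4f}")
print(f"SD of the difference: {sd_diff:.4f}")
print(f"Gap of {bottom - top:.3f} is about {z:.1f} SDs")
```

Under those assumptions each rate has an SD of a bit under half a percentage point, and the 3.7-point gap is more than five SDs of the difference, so chance alone is a very unlikely explanation.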