Sabermetric Research: The "K" study for real

(Previous posts on the K study: one two three.)

[UPDATE: originally, this post said that the study used only players whose initials were both K. Commenters told me that I had misread the study, and that it covered players where EITHER initial was K. Sorry for my screw-up. The original (incorrect) post is available from me if you want it. What follows is the rewritten version.]

Commenters pointed me to the actual study, which is here.

The authors checked all players whose first name or last name began with K. They found those players struck out 18.8% of the time, against only 17.2% for non-K players.

Again, I couldn't reproduce this. Overall, I got 12.4% for K players vs. 11.8% for the others. That's an effect in the same direction as the original study's, but smaller: 0.6 percentage points instead of 1.6.

And, again, this can be explained by eras, as in my previous post. From 1960+, the numbers are 14.5% for Ks, and 14.2% for others. Here are the decade numbers:

1910s: Ks 8.7%, non-Ks 8.5% (starting 1913)
1920s: Ks 7.4%, non-Ks 6.2%
1930s: Ks 7.9%, non-Ks 7.5%
1940s: Ks 8.1%, non-Ks 8.2%
1950s: Ks 9.1%, non-Ks 10.3%
1960s: Ks 14.2%, non-Ks 13.6%
1970s: Ks 12.8%, non-Ks 12.7%
1980s: Ks 14.1%, non-Ks 13.5%
1990s: Ks 15.6%, non-Ks 15.5%
2000s: Ks 15.4%, non-Ks 16.4% (up to 2003)

60s to 00s: Ks 14.5%, non-Ks 14.2%

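The biggest difference in the Ks' direction was in the 1920s, and even that was only 1.2 percentage points (the 1950s show a gap of the same size the other way). The biggest modern difference is in the 60s and 80s, at 0.6 points each. From 2000-2003, the effect went the other way: 15.4% for K players, 16.4% for others.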
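
The decade gaps can be checked directly from the figures listed above. Here is a minimal sketch in Python; the dictionary just transcribes the list, and the variable names are made up for illustration.

```python
# Strikeout rates (percent) by decade, transcribed from the list above:
# decade -> (K players, non-K players)
rates = {
    "1910s": (8.7, 8.5),
    "1920s": (7.4, 6.2),
    "1930s": (7.9, 7.5),
    "1940s": (8.1, 8.2),
    "1950s": (9.1, 10.3),
    "1960s": (14.2, 13.6),
    "1970s": (12.8, 12.7),
    "1980s": (14.1, 13.5),
    "1990s": (15.6, 15.5),
    "2000s": (15.4, 16.4),
}

# Gap in percentage points: positive means K players struck out more.
gaps = {decade: round(k - other, 1) for decade, (k, other) in rates.items()}

largest_positive = max(gaps, key=gaps.get)  # biggest K excess
largest_negative = min(gaps, key=gaps.get)  # biggest K deficit

for decade, gap in gaps.items():
    print(f"{decade}: {gap:+.1f}")
print("largest K excess: ", largest_positive, gaps[largest_positive])
print("largest K deficit:", largest_negative, gaps[largest_negative])
```

Three of the ten decades (the 1940s, 1950s and 2000s) come out negative, meaning the K players struck out less often than everyone else.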

So I don't see where the authors' 1.6-point difference comes from. I'll try a significance test simulation later and update this post.

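UPDATE: The "K" players in real life had 566,374 PA. I ran a simulation whose "K" groups averaged 568,558 PA. The SD of the strikeout rate in the simulation was 0.43 percentage points, which is higher than the 0.3 points that I observed. In other words, the difference I found is less than one SD, which is nowhere near statistically significant.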
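
The post doesn't spell out how the simulation worked, so the following is only a sketch of one plausible approach: randomly relabel players as "K" players many times and see how much the PA-weighted gap bounces around by chance. The player data is synthetic, and the 5% "K" share, the function names and the number of trials are all assumptions. It measures the SD of the gap between the groups, which may or may not be the quantity the 0.43 figure refers to.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical league: each player is (plate appearances, strikeouts).
# These are made-up numbers, not the real data used in the post.
players = []
for _ in range(2000):
    pa = random.randint(50, 8000)
    career_rate = random.uniform(0.05, 0.25)  # unrelated to any name
    players.append((pa, round(pa * career_rate)))

def pooled_rate(group):
    """Strikeouts per PA, weighting every PA equally."""
    total_pa = sum(pa for pa, _ in group)
    total_so = sum(so for _, so in group)
    return total_so / total_pa

def simulated_gap(all_players, k_fraction):
    """Label a random subset as 'K' players and return the rate gap."""
    shuffled = all_players[:]
    random.shuffle(shuffled)
    n_k = int(len(shuffled) * k_fraction)
    k_group, others = shuffled[:n_k], shuffled[n_k:]
    return pooled_rate(k_group) - pooled_rate(others)

# Repeat the random labeling many times and look at the spread.
gaps = [simulated_gap(players, k_fraction=0.05) for _ in range(500)]
sd_points = statistics.stdev(gaps) * 100  # convert to percentage points
print(f"SD of simulated gap: {sd_points:.2f} percentage points")
```

If the real-life gap is smaller than the SD that comes out of something like this, random labeling alone produces gaps that size routinely.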

So the big question remains: why did the authors get such a high strikeout rate difference?

FURTHER UPDATE: Mystery solved by Tango! It looks like the authors weighted every player equally, instead of weighting every PA equally. See comments.
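
To see how much the choice of weighting can matter, here is a small illustration in Python. The three players are invented for the example and are not from the real data.

```python
# Hypothetical players: (plate appearances, strikeouts). Made-up numbers.
players = [
    (10_000, 1_000),  # long career, 10% strikeout rate
    (100, 30),        # brief career, 30% strikeout rate
    (200, 50),        # brief career, 25% strikeout rate
]

# Weighting every PA equally: total strikeouts / total PA.
pa_weighted = sum(so for _, so in players) / sum(pa for pa, _ in players)

# Weighting every player equally: average of the individual career rates.
player_weighted = sum(so / pa for pa, so in players) / len(players)

print(f"PA-weighted:     {pa_weighted:.1%}")
print(f"player-weighted: {player_weighted:.1%}")
```

With these toy numbers the PA-weighted rate is about 10.5%, while the player-weighted rate is about 21.7%, because the two brief careers count as much as the long one. That matches the pattern in the real numbers, where both groups' rates come out several points higher when every player is weighted the same.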

Labels: baseball, critiquing a study based on secondhand sources

3 Comments:

At Wednesday, November 28, 2007 10:41:00 AM, Tangotiger said...: Try weighting each player the same.

At Wednesday, November 28, 2007 10:56:00 AM, Phil Birnbaum said...: Genius! That did it!

Weighting each player the same duplicates the study's results. I get

18.6% for "K" players
16.9% for others

These are lower than what the study found because I stopped at 2003.

Of course, weighting a player with 100 PA the same as a player with 10,000 PA isn't the best idea. In any case, how would you determine a significance level for that without simulating?

I hope the authors didn't just use a binomial or something.

In any case: mystery solved!

At Sunday, December 02, 2007 10:06:00 PM, js said...: Fwiw, I have a discussion of this paper, along with some follow-up analyses, here at my blog.

Also, they used a t-test, treating career strikeout rate as a normally distributed continuous variable.

Marketing professors.

Monday, November 26, 2007
