Monday, November 26, 2007

The "K" study for real

(Previous posts on the K study: one two three.)

[UPDATE: this post originally said that the study used only players whose initials were both K. Commenters told me that I misread the study, and that it used players where EITHER initial was K. Sorry for my screw-up. The original (incorrect) post can be obtained from me if you want it. What follows is the rewritten version.]

Commenters pointed me to the actual study, which is

The authors checked all players whose first name or last name began with K. They found those players struck out 18.8% of the time, against only 17.2% for non-K players.

Again, I couldn't reproduce this. Overall, I got numbers of 12.4% vs. 11.8%. This gets an effect in the same direction as the original study, but of a smaller magnitude.

And, again, this can be explained by eras, as in my previous post. From 1960+, the numbers are 14.5% for Ks, and 14.2% for others. Here are the decade numbers:

1910s: Ks 8.7%, non-Ks 8.5% (starting 1913)
1920s: Ks 7.4%, non-Ks 6.2%
1930s: Ks 7.9%, non-Ks 7.5%
1940s: Ks 8.1%, non-Ks 8.2%
1950s: Ks 9.1%, non-Ks 10.3%
1960s: Ks 14.2%, non-Ks 13.6%
1970s: Ks 12.8%, non-Ks 12.7%
1980s: Ks 14.1%, non-Ks 13.5%
1990s: Ks 15.6%, non-Ks 15.5%
2000s: Ks 15.4%, non-Ks 16.4% (up to 2003)

60s to 00s: Ks 14.5%, non-Ks 14.2%

The biggest difference in the Ks' direction was in the 1920s, and it was only 1.2 percentage points. The biggest modern differences are in the 60s and 80s, at 0.6 points each. From 2000-2003, the effect went the other way -- 15.4% for K players, 16.4% for others.
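As a quick check on the arithmetic, here is a minimal Python sketch that recomputes the decade gaps from the table above. The rates are copied straight from the table; the variable names are just for illustration.

```python
# Decade strikeout rates in percent, copied from the table above,
# stored as (K players, non-K players).
rates = {
    "1910s": (8.7, 8.5),
    "1920s": (7.4, 6.2),
    "1930s": (7.9, 7.5),
    "1940s": (8.1, 8.2),
    "1950s": (9.1, 10.3),
    "1960s": (14.2, 13.6),
    "1970s": (12.8, 12.7),
    "1980s": (14.1, 13.5),
    "1990s": (15.6, 15.5),
    "2000s": (15.4, 16.4),
}

# Gap in percentage points; positive means K players struck out more often.
gaps = {decade: round(k - non_k, 1) for decade, (k, non_k) in rates.items()}

# Decade with the largest gap in the Ks' direction.
largest_k_gap = max(gaps, key=gaps.get)

# Decades where non-K players actually struck out more.
reversed_decades = [decade for decade, gap in gaps.items() if gap < 0]

for decade, gap in gaps.items():
    print(f"{decade}: {gap:+.1f}")
```

Three of the ten decades come out with the sign reversed, which is hard to square with a steady name effect.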

So I don't see where the authors' 1.6% difference comes from. I'll try a significance test simulation later and update this post.

UPDATE: The "K" players in real life had 566,374 PA. I ran a simulation in which the random "K" groups averaged 568,558 PA. The SD of the strikeout rate was 0.43 percentage points, which is higher than the 0.3 points that I observed. So the observed difference is less than one SD, nowhere near significant.
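For anyone who wants to try this kind of simulation, here is a rough Python sketch. It is not the code I ran, and the details are assumptions: it builds a made-up pool of careers (`make_synthetic_pool` is a hypothetical helper, not real data), repeatedly picks a random group of players to stand in for the "K" group, and measures how much the PA-weighted strikeout rate gap bounces around by chance.

```python
import random
import statistics


def make_synthetic_pool(n_players, seed=0):
    """Build a made-up pool of (PA, strikeouts) careers. Not real data."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_players):
        pa = rng.randint(50, 8000)
        # Assumed spread of true strikeout rates, clamped to a sane range.
        rate = min(max(rng.gauss(0.12, 0.04), 0.02), 0.40)
        pool.append((pa, round(pa * rate)))
    return pool


def simulate_gap_sd(players, group_size, trials, seed=0):
    """SD (in percentage points) of the PA-weighted strikeout rate gap
    between a randomly chosen group and everyone else."""
    rng = random.Random(seed)
    total_pa = sum(pa for pa, _ in players)
    total_so = sum(so for _, so in players)
    gaps = []
    for _ in range(trials):
        group = rng.sample(players, group_size)  # random stand-in "K" group
        g_pa = sum(pa for pa, _ in group)
        g_so = sum(so for _, so in group)
        rest_rate = (total_so - g_so) / (total_pa - g_pa)
        gaps.append(100 * (g_so / g_pa - rest_rate))
    return statistics.stdev(gaps)


pool = make_synthetic_pool(2000)
sd = simulate_gap_sd(pool, group_size=200, trials=500)
print(f"SD of gap: {sd:.2f} percentage points")
```

With real data, you would replace the synthetic pool with actual career PA and strikeout totals and set `group_size` to the number of real "K" players. The observed gap is then compared against the simulated SD.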

So the big question remains: why did the authors get such a high strikeout rate difference?

FURTHER UPDATE: Mystery solved by Tango! It looks like the authors weighted every player equally, instead of weighting every PA equally. See comments.
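To see how much the choice of weighting can matter, here is a small Python sketch with two made-up players. The numbers and function names are purely illustrative, not taken from the study or from my data.

```python
def pa_weighted_rate(players):
    """Strikeout rate with every PA counted equally: total SO / total PA."""
    total_pa = sum(pa for pa, _ in players)
    total_so = sum(so for _, so in players)
    return total_so / total_pa


def player_weighted_rate(players):
    """Strikeout rate with every player counted equally:
    the plain average of each player's career rate."""
    return sum(so / pa for pa, so in players) / len(players)


# Made-up careers as (PA, strikeouts): one long career at 10%,
# one cup of coffee at 30%.
players = [(10000, 1000), (100, 30)]

print(f"PA-weighted:     {pa_weighted_rate(players):.1%}")
print(f"Player-weighted: {player_weighted_rate(players):.1%}")
```

The short career barely moves the PA-weighted figure but pulls the player-weighted average all the way up to 20%. That fits with both groups' rates jumping when every player counts the same, if players with few PA tend to strike out more.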



At Wednesday, November 28, 2007 10:41:00 AM, Blogger Tangotiger said...

Try weighting each player the same.

At Wednesday, November 28, 2007 10:56:00 AM, Blogger Phil Birnbaum said...

Genius! That did it!

Weighting each player the same duplicates the study's results. I get

18.6% for "K" players
16.9% for others

These are lower than what the study found because I stopped at 2003.

Of course, weighting a player with 100 PA the same as a player with 10,000 PA isn't the best idea. In any case, how would you determine a significance level for that without simulating?

I hope the authors didn't just use a binomial or something.

In any case: mystery solved!

At Sunday, December 02, 2007 10:06:00 PM, Blogger js said...

Fwiw, I have a discussion of this paper, along with some follow-up analyses, here at my blog.

Also, they used a t-test, treating career strikeout rate as a normally distributed continuous variable.

Marketing professors.

