Thursday, November 22, 2007

"K" study results can't be duplicated

The "Dave Kingman strikes out a lot because his name starts with K" study found that "K" batters struck out 18.8% of the time, versus 17.2% for everyone else. It also found the difference to be statistically significant.

I can't duplicate either result.

First, the numbers are very high. The lower of the authors' two figures is 17.2%. But between 1913 and 2003, the years in the study, only four seasons had an overall strikeout rate at least that high. The highest overall was only 17.8%, while some of the early years had rates below 10% and the middle years were in the low teens. (All those numbers define PA as AB plus BB only.)

The authors did limit their dataset to players with 100 PA or more, but that should *lower* the strikeout rates, by eliminating lots of pitchers. So how did they get 17.2%?

Maybe they used AB as the denominator instead of PA. But that gives an overall rate of only about 14% up to 2003, still well short of 17.2%.

Second, the difference between the K players and everyone else isn't as big as the authors say. The authors found a 1.6 percentage-point difference. David Gassko's study (using a different dataset) found a difference of only 0.5 percentage points (15.5% versus 15.0%). I checked all players from 1913 to 2003, and found a difference of only 0.2 points:

13.1% for "K" players (50761 out of 387611)
12.9% for everyone else (1352281 out of 10472268)


(I used all players, though, not just players with enough PA.)
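As a quick arithmetic check, the two rates and the gap between them can be recomputed from the counts quoted above. This is only a sketch; the variable names are mine, and the counts are taken as given from the post.

```python
# Counts quoted in the post: strikeouts and plate appearances (PA = AB + BB).
k_so, k_pa = 50761, 387611              # players whose last name starts with "K"
other_so, other_pa = 1352281, 10472268  # everyone else

k_rate = k_so / k_pa
other_rate = other_so / other_pa

# Difference expressed in percentage points, as in the text.
diff_points = 100 * (k_rate - other_rate)

print(f"K players:     {100 * k_rate:.1f}%")
print(f"everyone else: {100 * other_rate:.1f}%")
print(f"difference:    {diff_points:.1f} percentage points")
```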

Finally, to check statistical significance, I ran a simulation. I pulled random players who were born after 1934 (to roughly match Gassko's dataset), and arbitrarily decided their last names started with "K". I kept pulling until the total PA went past 464664, to come close to Gassko's number (although each sample's total ends up a bit higher than 464664). Then I computed their overall strikeout rate.

I repeated that 100 times. The results:

Mean 14.94%, SD 0.44%
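For readers who want to try something similar, here is a minimal sketch of that kind of simulation. It is not the code used for the numbers above. It assumes each player is a (strikeouts, PA) pair and that players are drawn without replacement, which the post does not specify. The pool below is made-up placeholder data, not real players, so its output will not match the 14.94% and 0.44% figures.

```python
import random
import statistics

def fake_k_group_rate(players, target_pa, rng):
    """Draw players in random order until total PA passes target_pa,
    then return the overall strikeout rate of the players drawn."""
    order = list(players)
    rng.shuffle(order)  # random order = sampling without replacement (an assumption)
    so_total = 0
    pa_total = 0
    for so, pa in order:
        so_total += so
        pa_total += pa
        if pa_total > target_pa:
            break
    return so_total / pa_total

def simulate(players, target_pa, trials, seed=0):
    """Repeat the draw `trials` times; return (mean, sample SD) of the rates."""
    rng = random.Random(seed)
    rates = [fake_k_group_rate(players, target_pa, rng) for _ in range(trials)]
    return statistics.mean(rates), statistics.stdev(rates)

# Placeholder pool: hypothetical careers, NOT real data. In practice this
# would be loaded from a batting database, restricted to players born after 1934.
pool_rng = random.Random(1)
pool = []
for _ in range(2000):
    pa = pool_rng.randint(50, 8000)
    rate = pool_rng.uniform(0.05, 0.25)
    pool.append((round(pa * rate), pa))

mean_rate, sd_rate = simulate(pool, target_pa=464664, trials=100)
print(f"Mean {100 * mean_rate:.2f}%, SD {100 * sd_rate:.2f}%")
```

Because whole careers are drawn at once, the SD reflects how much players differ from each other, which is why it is much larger than a simple binomial calculation on 464664 PA would suggest.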

That means that Gassko's result, a 0.5-point difference, is only about 1 SD higher than the mean. And my result is only half an SD higher than the mean. So, no statistical significance.
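The SD comparison can be written out explicitly. This sketch assumes the comparison is simply the observed difference in percentage points divided by the simulated SD, and that roughly 2 SD is the usual threshold for significance; both are my reading of the paragraph above.

```python
# Simulated SD of a random "K" group's strikeout rate, in percentage points.
sim_sd = 0.44

# Observed K-versus-everyone-else differences, in percentage points.
gassko_diff = 0.5  # 15.5% versus 15.0%
my_diff = 0.2      # 13.1% versus 12.9%

gassko_z = gassko_diff / sim_sd
my_z = my_diff / sim_sd

print(f"Gassko: {gassko_z:.2f} SD")
print(f"Mine:   {my_z:.2f} SD")
# Both fall well short of the roughly 2 SD usually required.
```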

So what's going on? I guess we have to wait for the original study to be released before we find out.

P.S. From a quick glance, it looks like only one letter in Gassko's study is statistically significant: the "O". And exactly one significant result out of 26 is itself not significant.
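To put a rough number on that last point: assuming a 5% significance level and treating the 26 letters as independent tests (both are assumptions, not details from Gassko's study), chance alone produces a "significant" letter most of the time.

```python
alpha = 0.05   # assumed significance level
letters = 26   # one test per initial, assumed independent

# Expected number of letters that come out significant by chance alone.
expected_false_positives = letters * alpha

# Probability that at least one letter comes out significant by chance.
p_at_least_one = 1 - (1 - alpha) ** letters

# Probability that exactly one letter does.
p_exactly_one = letters * alpha * (1 - alpha) ** (letters - 1)

print(f"Expected significant letters by chance: {expected_false_positives:.1f}")
print(f"P(at least one): {p_at_least_one:.2f}")
print(f"P(exactly one):  {p_exactly_one:.2f}")
```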


4 Comments:

At Sunday, November 25, 2007 4:12:00 PM, Anonymous joe arthur said...

I can't fully replicate the Nelson/Simmons results either, but I can get a somewhat similar result, when the initial used is the initial of the FIRST name.

As you mention, the authors appear to have used a definition of PA=AB+BB. Using this I do match the number of players mentioned in the Newsweek article as meeting the plate appearance criterion. For K/PA, 1st initial K strikes out 14.69% versus 12.90% for everyone else. Using K/AB, 1st initial K strikes out 16.15% compared to 14.15% for everyone else. Using the initial of the last name, there is virtually no difference between "K" names and everyone else (13.00/12.94 for K/PA and 14.26/14.19 for K/AB).

This is probably an illustration of the danger of relying on a secondhand source's summary of the original. When I read the Newsweek article (by Sharon Begley) I saw a weird mix of examples of the effect in action for hitting, school grades and law school admissions, sometimes using first names and sometimes using last names.

As you say, we'll have to wait for the study itself to fully understand what the researchers did ...

 
At Sunday, November 25, 2007 5:27:00 PM, Blogger Phil Birnbaum said...

Aha ... FIRST names! Thanks! I should have thought of that. When the SI article mentioned Kingman, I assumed it was last names. As you say, secondhand sources can be unreliable. Off I go to check first names!

 
At Sunday, November 25, 2007 6:15:00 PM, Blogger Phil Birnbaum said...

OK, addressed in my new post, here.

 

 
