Sabermetric Research: New issues of "By the Numbers"

Wednesday, April 15, 2009

New issues of "By the Numbers"

Two new issues of "By the Numbers," the SABR baseball research newsletter I edit, are now available for download at my website.

Labels: baseball

1 Comments:

At Tuesday, April 21, 2009 10:15:00 AM, Guy said...: A comment on Jim Albert's article on hit streaks:

Albert revisits the analysis of Trent McCotter (which we discussed here in another thread), which compares the actual frequency of hit streaks to the frequency obtained when you randomize the order of players' games within a season. (McCotter found more actual streaks than predicted by a random sequence of games.)
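For readers who want to see the shuffling idea concretely, here is a minimal sketch in Python. It is not McCotter's or Albert's actual code. The function names, the representation of a season as a list of True/False values (one per game, True meaning at least one hit), and the choice to count runs of at least a given length are all assumptions made for illustration.

```python
import random

def count_streaks(games, min_len):
    """Count maximal runs of consecutive hit games with length >= min_len.

    `games` is a sequence of booleans, one per game (assumed format).
    """
    count, run = 0, 0
    for hit in games:
        if hit:
            run += 1
        else:
            if run >= min_len:
                count += 1
            run = 0
    if run >= min_len:  # a streak that is still alive at season's end
        count += 1
    return count

def expected_streaks_shuffled(games, min_len, n_shuffles=1000, seed=0):
    """Average streak count over random reorderings of one player-season.

    Shuffling keeps the number of hit games fixed but destroys any
    game-to-game dependence, which is the "random sequence" baseline.
    """
    rng = random.Random(seed)
    games = list(games)  # work on a copy so the caller's data is untouched
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(games)
        total += count_streaks(games, min_len)
    return total / n_shuffles
```

Comparing `count_streaks(season, 20)` with `expected_streaks_shuffled(season, 20)`, summed over many player-seasons, gives the kind of actual-versus-random comparison described above.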

Albert argues that if this effect is significant, we should be able to observe it within a single season. So he repeats McCotter's analysis for 5 separate seasons (McCotter aggregated many seasons), finding a significantly higher streak frequency in some seasons, but not others. He concludes: "From this brief analysis, the IID model appears useful in explaining the variation...for some seasons."

I think this is a real misuse of the idea of statistical significance. By breaking the sample into smaller parts, Albert makes the significance disappear (in most seasons). But you can do that with any study. That we can't see the effect clearly in every single season just means the effect is subtle, as we would expect (if it exists at all), and can only be seen over a large number of game sequences. After all, 20- and 30-game streaks are rare. For example, the number of 20-game streaks in Albert's data is significantly higher than expected in only 1 of his 5 seasons. But over all 5 seasons, there are 31% more streaks than expected. Breaking up the sample just hides the relationship we're trying to find.
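The loss of power from splitting a sample can be illustrated with a rough calculation. The sketch below assumes streak counts are roughly Poisson and uses a made-up baseline of 10 expected streaks per season. Neither assumption comes from Albert's or McCotter's data; only the 31% excess is taken from the figures above. It asks how often a one-sided test at the 5% level would detect a true 31% excess in a single season versus in five seasons pooled.

```python
import math

def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu), computed directly from the pmf."""
    cdf = sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def power_one_sided(mu_null, ratio, alpha=0.05):
    """Chance of rejecting 'no excess' when the true mean is mu_null * ratio.

    The critical value is the smallest count k whose tail probability
    under the null mean is at most alpha.
    """
    k = 0
    while poisson_sf(k, mu_null) > alpha:
        k += 1
    return poisson_sf(k, mu_null * ratio)

# Hypothetical baseline: 10 expected streaks in one season, 50 in five.
single_season = power_one_sided(10, 1.31)
pooled = power_one_sided(50, 1.31)
print(f"power, one season:   {single_season:.2f}")
print(f"power, five seasons: {pooled:.2f}")
```

Under these assumptions the same real effect is much more likely to reach significance in the pooled sample than in any one season, so a mix of significant and non-significant seasons is what a genuine but modest effect would be expected to produce.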

I have other concerns with McCotter's method which I raised in the other thread. But the fact that the difference he identifies is often not statistically significant within a single season has no bearing at all on how real his findings are. Demanding in-season statistical significance is an arbitrary standard.
