### New issues of "By the Numbers"

Two new issues of "By the Numbers," the SABR baseball research newsletter I edit, are now available for download at my website.

A comment on Jim Albert's article on hit streaks:

Albert revisits the analysis of Trent McCotter (which we discussed here in another thread), which compares the actual frequency of hit streaks to the frequency obtained when you randomize the order of players' games within a season. (McCotter found more actual streaks than predicted by a random sequence of games.)
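
For readers who want to see the mechanics, here is a minimal sketch of that kind of shuffle comparison for a single player-season. It is not McCotter's or Albert's actual code: the function names, the boolean "got a hit in this game" representation, and the number of shuffles are all my own assumptions for illustration.

```python
import random

def count_streaks(hit_flags, min_len):
    """Count maximal runs of consecutive hit games that are at least min_len long."""
    streaks = 0
    run = 0
    for got_hit in hit_flags:
        if got_hit:
            run += 1
        else:
            if run >= min_len:
                streaks += 1
            run = 0
    if run >= min_len:  # a run that lasts through the final game
        streaks += 1
    return streaks

def expected_streaks_under_shuffle(hit_flags, min_len, n_shuffles=1000, seed=0):
    """Average streak count over random reorderings of the same games.

    Hypothetical helper: shuffling keeps the player's number of hit games
    fixed and changes only their order.
    """
    rng = random.Random(seed)
    games = list(hit_flags)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(games)
        total += count_streaks(games, min_len)
    return total / n_shuffles
```

You would compare `count_streaks(season, 20)` on the real game order against `expected_streaks_under_shuffle(season, 20)`, summed over all the player-seasons in the sample.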

Albert argues that if this effect is significant, we should be able to observe it within a single season. So he repeats McCotter's analysis for 5 separate seasons (McCotter aggregated many seasons), finding a significantly higher streak frequency in some seasons, but not others. He concludes: "From this brief analysis, the IID model appears useful in explaining the variation...for some seasons."

I think this is a real misuse of the idea of statistical significance. By breaking the sample into smaller parts, Albert makes the significance disappear (in most seasons). But you can do that with any study. That we can't see the effect clearly in every single season just means the effect is subtle, as we would expect (if it exists at all), and can only be seen over a large number of game sequences. After all, 20- and 30-game streaks are rare. For example, the number of 20-game streaks in Albert's data is significantly higher than expected in only 1 of his 5 seasons. But over all 5 seasons, there are 31% more streaks than expected. Breaking up the sample just hides the relationship we're trying to find.
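
A rough back-of-the-envelope illustration of why splitting the sample hides the effect. The counts here are made up (they are not Albert's numbers), and I am assuming streak counts behave roughly like a Poisson variable, so the standard deviation is about the square root of the expected count.

```python
import math

def z_score(observed, expected):
    """Approximate z-score for a count, assuming Poisson-like variation."""
    return (observed - expected) / math.sqrt(expected)

# Hypothetical numbers: 10 streaks expected per season, 13 observed (30% more).
expected_per_season = 10.0
observed_per_season = 13.0
n_seasons = 5

single = z_score(observed_per_season, expected_per_season)
pooled = z_score(observed_per_season * n_seasons,
                 expected_per_season * n_seasons)

print(f"one season:  z = {single:.2f}")   # about 0.95, not significant
print(f"five seasons: z = {pooled:.2f}")  # about 2.12, past the usual 1.96 cutoff
```

The same 30% excess goes from "nothing to see" to "significant" just by pooling, because the z-score grows with the square root of the number of seasons.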

I have other concerns about McCotter's method, which I raised in the other thread. But the fact that the difference he identifies is often not statistically significant within a single season has no bearing at all on how real his findings are. Demanding in-season statistical significance is an arbitrary standard.

