Is the distribution of no-hitters "memoryless"?
Here's a new research study (.pdf) by Michael Huber and Andrew Glen, recently published on the Retrosheet website. The paper tests whether certain rare events in baseball – namely, no-hitters, triple plays, and hitting for the cycle – are "memoryless".
A "memoryless" distribution would mean that the odds of seeing a no-hitter (or cycle, or triple play) on any given day don’t depend on whether or not there was a no-hitter yesterday, or last month, or last year. If the distribution is memoryless, it doesn't matter how long it's been since the last one – the probability of seeing the next one is the same no matter how long the drought has lasted.
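To make the definition concrete, here's a minimal Python sketch of the memoryless property. It assumes, for illustration only, that every game independently carries the same small probability p of a no-hitter (the value of p is made up). Under that assumption the number of games until the next no-hitter follows a geometric distribution, and the chance of waiting another t games is the same whether the current drought has lasted zero games or five thousand.

```python
# Memoryless property of the geometric distribution.
# Hypothetical assumption: every game independently has the same
# probability p of producing a no-hitter.

def survival(p: float, k: int) -> float:
    """P(X > k): no event in the next k games, where X counts games until the next event."""
    return (1.0 - p) ** k

def conditional_survival(p: float, s: int, t: int) -> float:
    """P(X > s + t | X > s): chance of t more event-free games, given s have already passed."""
    return survival(p, s + t) / survival(p, s)

def hazard(p: float, s: int) -> float:
    """P(X = s + 1 | X > s): chance the very next game has the event, given s games without one."""
    return (survival(p, s) - survival(p, s + 1)) / survival(p, s)

p = 0.0005  # hypothetical per-game probability, chosen only for illustration
for s in (0, 100, 1000, 5000):
    # Both columns match: the length of the drought so far doesn't matter.
    print(s, round(conditional_survival(p, s, 500), 6), round(survival(p, 500), 6))
    # The per-game chance stays at p regardless of s.
    print("  next-game chance:", round(hazard(p, s), 8))
```

The point of the hazard function is that "memoryless" and "constant per-game probability" are the same statement here.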
Now, obviously, there are reasons to believe this should not be the case, at least theoretically. Consider no-hitters. In the low-hitting environment of the mid-1960s, getting 27 outs without allowing a hit would be a more frequent event than in the modern era. All else being equal, then, we might expect more no-hitters then than now. And so if there was a no-hitter on day X, it's more likely that X is in 1967 than in 2006, and the odds of a no-hitter on day X+1 should therefore be higher. In theory, the distribution should not be memoryless.
But, in practice, the era effect is small. Figure 4 of the study compares the theoretical distribution of inter-arrival times (that is, the number of games between successive no-hitters) with the actual distribution, and the two curves look very close.
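Here's one way a comparison like Figure 4 could be set up. It's a sketch, not necessarily the authors' method: it assumes each event is recorded by the sequential number of the game it happened in, estimates the per-game rate as events divided by games, and compares the share of gaps longer than k games with the geometric model's (1 - p)^k. The game numbers below are invented, purely to show the mechanics, and are not real no-hitter data.

```python
# Sketch: compare observed gaps between events with the memoryless (geometric) model.
# Hypothetical assumptions: events are identified by the sequential index of the
# game in which they occurred, and the rate is estimated as events per game.

from typing import List, Tuple

def inter_arrival_times(event_games: List[int]) -> List[int]:
    """Games from one event to the next (differences of successive game indices)."""
    ordered = sorted(event_games)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def survival_curves(gaps: List[int], p: float,
                    checkpoints: List[int]) -> List[Tuple[int, float, float]]:
    """For each k: (k, observed share of gaps > k, geometric-model P(gap > k))."""
    n = len(gaps)
    rows = []
    for k in checkpoints:
        observed = sum(1 for g in gaps if g > k) / n
        expected = (1.0 - p) ** k
        rows.append((k, observed, expected))
    return rows

# Made-up game indices, purely to show the mechanics.
events = [120, 1900, 2150, 5400, 7020, 9800, 10100, 13900]
total_games = 14000
p_hat = len(events) / total_games
gaps = inter_arrival_times(events)
for k, obs, model in survival_curves(gaps, p_hat, [0, 500, 1000, 2000, 3000]):
    print(f"gap > {k:>5}: observed {obs:.2f}, memoryless model {model:.2f}")
```

Plotting the observed and model columns against k gives the kind of "theoretical versus actual" pair of curves described above; with real data, close curves are what "roughly memoryless" looks like.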
The authors do later break down the results by era, but, unfortunately, they don’t show any more graphs. They do give us significance levels, though. They reject memorylessness (at the .05 level) for triple plays in 1961-76, hitting for the cycle in 1901-19, and no-hitters in 1920-41 and 1961-76. Again, these are probably era effects – 1961-76 is not a homogeneous era. It comprises high offense in the early 1960s, low offense in the mid-to-late 1960s, and part of the DH era.
Also, many of the other eras show effects that are "nearly" significant – for cycles, only two of the six eras have significance levels above 0.25, and for no-hitters, only one of six does.
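The paper's significance levels come from its own tests, which aren't reproduced here. As a stand-in, here's a different, simple check that targets the era problem directly: a Monte Carlo test on the coefficient of variation (standard deviation divided by mean) of the gaps. A memoryless (exponential) distribution has a CV of 1, while pooling gaps from eras with different rates produces a mixture whose CV is at least 1, so an unusually high CV is evidence against memorylessness. The sketch treats gap lengths as continuous, which seems reasonable when gaps run to hundreds or thousands of games, and the sample gaps are made up.

```python
# Sketch of a memorylessness check, NOT the paper's own test: a one-sided
# Monte Carlo test on the coefficient of variation (CV) of the gaps.
# Assumption: gaps are long enough to treat as continuous (exponential).

import math
import random
import statistics
from typing import List

def coefficient_of_variation(gaps: List[float]) -> float:
    """Sample standard deviation divided by sample mean (needs at least 2 gaps)."""
    return statistics.stdev(gaps) / statistics.mean(gaps)

def overdispersion_p_value(gaps: List[float], n_sims: int = 2000, seed: int = 1) -> float:
    """One-sided p-value for H0: gaps are exponential (memoryless).

    The CV doesn't depend on the rate, so unit-rate exponential samples of the
    same size give its null distribution without estimating any parameter.
    """
    rng = random.Random(seed)
    observed = coefficient_of_variation(gaps)
    n = len(gaps)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        sim = [rng.expovariate(1.0) for _ in range(n)]
        if coefficient_of_variation(sim) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed sample itself, keeping p strictly positive.
    return (at_least_as_extreme + 1) / (n_sims + 1)

# Usage with made-up gaps (not real data):
evenly_exponential = [-math.log(1 - (i - 0.5) / 10) for i in range(1, 11)]
two_regimes = [50] * 8 + [5000] * 2  # many short gaps plus a few very long ones
print(overdispersion_p_value(evenly_exponential))
print(overdispersion_p_value(two_regimes))
```

The first sample looks like an ordinary exponential and should not be rejected; the second, which mimics a burst of events in one era and long droughts in another, should get a small p-value. Like the paper's tests, this says whether memorylessness can be rejected, not how far from memoryless the data really are.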
So can we still say that the rare events are "roughly" memoryless? This might be one of those cases where one picture is indeed worth a thousand numbers. All those significance levels confirm what we already knew – that playing conditions change over time, and that we have to reject pure memorylessness. But they don't tell us *how close* to memoryless the distributions actually are. Thanks to Figure 4, though, we can probably say "close enough."