Why does a more precise fielding system give less precise results?
I was puzzled by something in Dan Fox's latest post on the SFR fielding method (which, as described in my previous post, is similar to Sean Smith's "Total Zone" method). My puzzlement isn't about the method itself, but about how its results compare to the "plus/minus" system in "The Fielding Bible."
Dan got the "plus/minus" numbers from the 2008 Hardball Times (which I don't have yet – it's in the mail). What bothers me is that their numbers are so much more extreme than Dan's.
If you look at Dan's chart, his top three extremes (ignoring sign) are 62, 48, and 46 runs. The THT extremes, though, are bigger: the top three are 98, 81, and 68 plays, which at 0.8 runs per play works out to 78, 65, and 54 runs. The standard deviation of Dan's results is 28.6 runs; the SD of the Hardball Times' results is 33.5 runs.
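Here's the plays-to-runs conversion written out, as a small sketch. The only inputs are the numbers quoted above and the 0.8 runs-per-play factor; the function name is just for illustration.

```python
# Convert THT plus/minus extremes (in plays) to runs with the 0.8
# runs-per-play factor quoted in the text, so they line up with Dan's runs.
RUNS_PER_PLAY = 0.8

def plays_to_runs(plays, runs_per_play=RUNS_PER_PLAY):
    """Return each play total converted to runs."""
    return [p * runs_per_play for p in plays]

dan_top_runs = [62, 48, 46]    # Dan's three largest extremes, already in runs
tht_top_plays = [98, 81, 68]   # THT's three largest extremes, in plays

tht_top_runs = [round(r) for r in plays_to_runs(tht_top_plays)]
print("Dan:", dan_top_runs)    # Dan: [62, 48, 46]
print("THT:", tht_top_runs)    # THT: [78, 65, 54]
```

Position by position, each THT extreme is bigger than the corresponding Dan extreme.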
The SDs aren't all *that* different – what bothers me is not the difference, but the fact that Dan's SD is smaller than THT's SD. It should be bigger.
That's because the more luck in the statistic, the larger the variance. As Tangotiger has written many times,
Variance of observations = Variance of talent + Variance of luck
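A quick way to convince yourself of this is a simulation. This is a minimal sketch that assumes talent and luck are independent. The talent range (75% to 80%) and the size of the luck (an SD of 2 percentage points) are made-up numbers for illustration, not estimates from any real fielding data.

```python
import random
import statistics

def simulate(n_fielders=20_000, luck_sd=0.02, seed=42):
    """Observed rate = talent + luck, with the two drawn independently."""
    rng = random.Random(seed)
    talent = [rng.uniform(0.75, 0.80) for _ in range(n_fielders)]  # assumed talent range
    luck = [rng.gauss(0.0, luck_sd) for _ in range(n_fielders)]    # assumed size of luck
    observed = [t + e for t, e in zip(talent, luck)]
    return talent, luck, observed

talent, luck, observed = simulate()
var_talent = statistics.pvariance(talent)
var_luck = statistics.pvariance(luck)
var_obs = statistics.pvariance(observed)
print(f"var(talent) + var(luck) = {var_talent + var_luck:.6f}")
print(f"var(observed)           = {var_obs:.6f}")
```

The two printed numbers should agree closely. They won't match exactly, because in any finite sample talent and luck will show a tiny bit of accidental correlation; the formula holds on average.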
We can break down the luck further. In the case of Dan's measure, there's luck in this sense: here's a ground ball that the shortstop will get to 80% of the time, but this time it just gets past him. Over the season, just from that kind of luck, he winds up at 78.5% instead of 80%. Call that "binomial" luck.
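How big is binomial luck? For a fielder with a true rate p on n chances, the SD of his observed rate is sqrt(p(1 - p)/n). The sketch below assumes a hypothetical 400 chances in a season; that figure is an illustration, not a real count.

```python
import math

def binomial_rate_sd(p, chances):
    """SD of an observed success rate that comes purely from binomial luck."""
    return math.sqrt(p * (1 - p) / chances)

TRUE_RATE = 0.80   # the shortstop's "talent" on this kind of ball, from the text
CHANCES = 400      # hypothetical number of such balls in a season (assumption)

sd = binomial_rate_sd(TRUE_RATE, CHANCES)
z = (0.785 - TRUE_RATE) / sd
print(f"SD of observed rate: {sd:.3f}")   # SD of observed rate: 0.020
print(f"z-score of 78.5%: {z:.2f}")       # z-score of 78.5%: -0.75
```

With 400 chances the SD is 2 percentage points, so finishing at 78.5% is less than one SD of bad luck.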
Then, there's luck involved in misclassifying balls. Dan docks the shortstop for a percentage of all ground-ball singles to center. Some of those were actually playable, and some weren't. Dan doesn't know, and so sometimes a shortstop will be assigned too few chances, or too many. Call that "misclassification luck."
So for Dan's measure, we get
Var(observations) = Var(talent) + Var(binomial luck) + Var(misclassification luck)
But now, for THT's observations, there's no misclassification luck – every ball is observed precisely, and assigned exactly to the proper fielder. So for THT,
Var(observations) = Var(talent) + Var(binomial luck)
Comparing the two, and remembering that a variance can never be negative, it seems that it's Dan's results that should have more variance, not THT's results.
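Here's that comparison as a simulation. It's a simplified sketch under assumed numbers: 400 true chances per fielder, talent between 75% and 80%, and misclassification modeled as a random number of chances (SD of 10) wrongly added to or removed from the fielder's denominator. Both versions use the same plays made, so the only difference is the extra misclassification luck. It checks the logic under these assumptions only; it doesn't explain the real-world numbers.

```python
import random
import statistics

CHANCES = 400      # hypothetical true chances per fielder (assumption)
MISCLASS_SD = 10   # hypothetical SD of wrongly assigned chances (assumption)

def simulate(n_fielders=5_000, seed=7):
    rng = random.Random(seed)
    tht_rates, dan_rates = [], []
    for _ in range(n_fielders):
        talent = rng.uniform(0.75, 0.80)
        # Binomial luck: plays actually made out of the true chances.
        made = sum(rng.random() < talent for _ in range(CHANCES))
        # THT-style: every ball is classified correctly, so the denominator is exact.
        tht_rates.append(made / CHANCES)
        # Dan-style: misclassification luck adds or removes credited chances.
        error = round(rng.gauss(0, MISCLASS_SD))
        dan_rates.append(made / (CHANCES + error))
    return tht_rates, dan_rates

tht, dan = simulate()
print(f"SD without misclassification: {statistics.pstdev(tht):.4f}")
print(f"SD with misclassification:    {statistics.pstdev(dan):.4f}")
```

Under this model the misclassified version always comes out with the larger SD, which is the opposite of what the real Dan-vs-THT numbers show.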
(If you don't like the formulas, here's a non-mathematical explanation. It's a fact of life that what you observe has a wider spread than what caused the observations. For instance, if you repeatedly toss a coin 10 times, sometimes you'll get 3 heads, sometimes 5, sometimes 7, and so on. But what *caused* this distribution is that the coin always has a "talent" of 5 heads, and sometimes it gets lucky or unlucky. The observations are 3, 5, 7, but the talent is narrower: 5, 5, 5.
The same thing happens for fielders. Suppose fielders' true talent ranges from, say, 75% to 80%. But, just like a coin may land more than 50% heads, a fielder may "land" more than 80% of ground balls. Indeed, with so many fielders, it's almost certain that at least a few will overshoot their talent and finish above 80%, even though none of them is really good enough to average more than 80%. So the observations may be 70%-85%, but the talent is narrower: 75%-80%.
Now, suppose you add even more luck. For each player, you take a random number of plays and reverse their status, converting makeable plays to nonmakeable, or nonmakeable ones to makeable. What happens? You get an even wider spread, because it's likely that one of the players at the top will be boosted even higher by a positive random number. If your three top players are at 85%, 86%, and 87%, and you move the three of them randomly by a few points, you're likely going to boost one of them past the 87% mark, and the spread will be even wider. So the observations might now be 68%-87%, but the talent is narrower: still 75%-80%.
In general: the more independent sources of randomness, the wider the spread of observations.)
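The coin version of the argument is easy to simulate. In this sketch the coin's "talent" is always 5 heads in 10 tosses; the optional second source of luck (shifting each count by a random whole number from -2 to +2) is a made-up stand-in for the reversed plays described above.

```python
import random
import statistics

def heads_counts(reps=10_000, tosses=10, extra_noise=0, seed=1):
    """Count heads in `reps` runs of `tosses` fair-coin flips.

    If extra_noise > 0, add a second, independent source of luck: each count
    is shifted by a random integer in [-extra_noise, extra_noise].
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(reps):
        heads = sum(rng.random() < 0.5 for _ in range(tosses))  # "talent" is always 5
        if extra_noise:
            heads += rng.randint(-extra_noise, extra_noise)
        counts.append(heads)
    return counts

coin_only = heads_counts()
coin_plus_noise = heads_counts(extra_noise=2)
for label, counts in [("coin only", coin_only), ("coin + extra luck", coin_plus_noise)]:
    print(f"{label}: from {min(counts)} to {max(counts)}, "
          f"variance {statistics.pvariance(counts):.2f}")
```

The talent has zero spread, yet the observed variance should come out near 10 × 0.5 × 0.5 = 2.5 for the coin alone, and near 2.5 + 2 = 4.5 once the extra luck is added (a uniform whole number from -2 to +2 has variance 2).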
So what's going on? Is my logic wrong? Why are we seeing the reverse of what we expect? Any ideas? Because I'm stumped.