Wednesday, December 26, 2007

Why does a more precise fielding system give less precise results?

I was puzzled by something in Dan Fox's latest post on the SFR fielding method (which, as described in my previous post, is similar to Sean Smith's "Total Zone" method). My puzzlement isn't about the method itself, but about how its results compare to the "plus/minus" system in "The Fielding Bible."

Dan got the "plus/minus" numbers from the 2008 Hardball Times (which I don't have yet – it's in the mail). What bothers me is that their numbers are so much more extreme than Dan's.

If you look at Dan's chart, his top three extremes (ignoring sign) are 62, 48, and 46. If you look at the THT extremes, though, they're bigger. The top three are 98, 81, and 68 plays; at 0.8 runs per play, that's 78, 65, and 54. The standard deviation of Dan's results is 28.6 runs; the SD of the Hardball Times' results is 33.5 runs.
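As a quick check of the plays-to-runs conversion, here is a minimal sketch. The 0.8 runs per play is the figure used above; the function name `plays_to_runs` is made up for illustration.

```python
RUNS_PER_PLAY = 0.8  # conversion factor used in this post

def plays_to_runs(plays, runs_per_play=RUNS_PER_PLAY):
    """Convert plays above/below average into runs, rounded to the nearest run."""
    return round(plays * runs_per_play)

tht_top_three_plays = [98, 81, 68]
tht_top_three_runs = [plays_to_runs(p) for p in tht_top_three_plays]
print(tht_top_three_runs)  # [78, 65, 54]
```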

The SDs aren't all *that* different – what bothers me is not the difference, but the fact that Dan's SD is smaller than THT's SD. It should be bigger.

That's because the more luck in the statistic, the larger the variance. As Tangotiger has written many times,

Variance of observations = Variance of talent + Variance of luck

We can break down the luck further. In the case of Dan's measure, there's luck in the sense of, here's a ground ball that the shortstop will get to 80% of the time, but this time the ball just gets past him, and, over the season, just by that kind of luck, he winds up at 78.5% instead of 80%. Call that "binomial" luck.

Then, there's luck involved in misclassifying balls. Dan docks the shortstop for a percentage of all ground-ball singles to center. Some of those were actually playable, and some weren't. Dan doesn't know, and so sometimes a shortstop will be assigned too few chances, or too many. Call that "misclassification luck."

So for Dan's measure, we get

Var(observations) = var(talent) + var(binomial luck) + var(misclassification luck)

But now, for THT's observations, there's no misclassification luck – every ball is observed precisely, and assigned exactly to the proper fielder. So for THT,

Var(observations) = var(talent) + var(binomial luck)

Comparing the two, and remembering that variance can never be negative, it seems that it's Dan's results that should have more variance, not THT's results.
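To put rough numbers on the two formulas, here is a small sketch. Everything in it is an assumption for illustration: 500 chances per season, an 80% success rate, a talent SD of 10 plays, and a misclassification SD of 5 plays. None of those come from Dan's data or from THT. The only thing borrowed from the argument above is that independent variances add.

```python
import math

def observed_sd(talent_sd, *luck_sds):
    """SD of observations when talent and each luck source are independent."""
    return math.sqrt(talent_sd ** 2 + sum(sd ** 2 for sd in luck_sds))

# All numbers below are assumptions for illustration only.
chances = 500                # ground-ball chances in a season
success_rate = 0.80          # typical conversion rate
talent_sd = 10.0             # spread of true talent, in plays
misclassification_sd = 5.0   # extra noise from mis-assigned chances, in plays

# Binomial luck: SD of plays made out of `chances` tries at `success_rate`.
binomial_sd = math.sqrt(chances * success_rate * (1 - success_rate))

sd_precise = observed_sd(talent_sd, binomial_sd)
sd_imprecise = observed_sd(talent_sd, binomial_sd, misclassification_sd)
print(round(binomial_sd, 1), round(sd_precise, 1), round(sd_imprecise, 1))
# 8.9 13.4 14.3
```

Whatever values you plug in, the version with the extra luck term can't come out with the smaller SD, which is exactly why the observed numbers are puzzling.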

(If you don't like the formulas, here's a non-mathematical explanation. It's a fact of life that what you observe has a wider spread than what caused the observations. For instance, if you repeatedly toss a coin 10 times, sometimes you'll get 3 heads, sometimes 5, sometimes 7, and so on. But what *caused* this distribution is that the coin always has a "talent" of 5 heads – but sometimes it gets lucky. The observations are 3, 5, 7, but the talent is narrower -- 5, 5, 5.

The same thing happens for fielders. Fielders may range from, say, 75% to 80%. But, just like a coin may land more than 50% heads, a fielder may "land" more than 80% of ground balls. Indeed, with so many fielders, it's almost certain that at least a few will overshoot their talent and break the 80% mark – and so some fielders will show more than 80%, even though none of them is really good enough to average more than 80%. So the observations may be 70%-85%, but the talent is narrower: 75%-80%.

Now, suppose you add even more luck. For each player, you take a random number of plays and reverse their status, converting makeable plays to nonmakeable, or nonmakeable ones to makeable. What happens? You get an even wider spread, because it's likely that one of the players at the top will be boosted even higher by a positive random number. If your three top players are at 83, 84, and 85%, and you move the three of them randomly by a few points, you're likely going to boost one of them past the 85% mark, and the spread will be even wider. So the observations might now be 68%-87%, but the talent is narrower: still 75%-80%.

In general: the more independent sources of randomness, the wider the spread of observations.)
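The fielder example can also be checked with a small simulation. This is only a sketch under assumed numbers: 1,000 fielders, 400 chances each, true talent spread evenly between 75% and 80%, and misclassification modeled crudely as adding or subtracting up to 15 credited plays at random. It measures spread by standard deviation rather than by the range, and the function name `simulate` is hypothetical.

```python
import random
import statistics

def simulate(n_fielders=1000, chances=400, max_flip=15, seed=1):
    """Return the SD of (talent, observed rate, observed rate plus extra noise)."""
    rng = random.Random(seed)
    talent, observed, noisy = [], [], []
    for _ in range(n_fielders):
        p = rng.uniform(0.75, 0.80)          # true talent (assumed range)
        # Binomial luck: each chance is converted with probability p.
        made = sum(rng.random() < p for _ in range(chances))
        # Misclassification luck (simplified): shift credited plays randomly.
        shifted = made + rng.randint(-max_flip, max_flip)
        talent.append(p)
        observed.append(made / chances)
        noisy.append(shifted / chances)
    return tuple(statistics.pstdev(x) for x in (talent, observed, noisy))

talent_sd, observed_sd, noisy_sd = simulate()
print(talent_sd, observed_sd, noisy_sd)
```

With these settings the three SDs should come out in increasing order: talent is narrowest, observations are wider, and observations with the extra noise are wider still.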

So what's going on? Is my logic wrong? Why are we seeing the reverse of what we expect? Any ideas? Because I'm stumped.



At Wednesday, December 26, 2007 12:33:00 PM, Anonymous Anonymous said...

I can't think of any obvious reasons for the disparity. Dan's method might share some of the credit/blame for performance on line drives with the OF, which would potentially reduce variance. But my guess is that even if you looked only at GBs, the Plus/minus SD would still be larger.

One way to get a sense of the correct variance would be to look at the SD for team DER on GBs. That should be the "ceiling" on real talent variance, since it includes both the infielders' talent and the pitching staff impact (how hard/easy the GBs were to field).

At Wednesday, December 26, 2007 1:42:00 PM, Anonymous Anonymous said...

Phil: It occurs to me now that your run value of .8 runs is much too high for the IF. I'd guess the average run value of a GB single/error (some of which don't leave the IF and have less runner advancement value) is more like .60 or .65, which would result in a lower SD for the THT/Dewan data (though not much smaller).

At Wednesday, December 26, 2007 1:59:00 PM, Anonymous Anonymous said...

IIRC, the infield single is worth some .06 runs less than an average single.

A groundball single is probably some .03 runs or so less than an overall single.

The gap is nowhere near what you are implying. I wouldn't make it any less than 0.70 runs for a GB single minus GB out.

At Wednesday, December 26, 2007 2:30:00 PM, Blogger Phil Birnbaum said...

For the record, a regression between the two columns gives 0.59 runs per out.

I agree with Tango that a GB single shouldn't be much less than an average single. Infield singles aren't that common, and aren't a lot of GB singles slow enough that runners can advance two bases?

At Wednesday, December 26, 2007 7:14:00 PM, Anonymous Anonymous said...

I had the out value wrong, so yes, it's more like .70 for infield hit/error. But we can skip all that and just work with plays made. The SD for plays is 36 in SFR and 39 for THT/Dewan. So THT is a bit larger, despite ostensibly controlling for difficulty of chances and park. Still, I'm not sure a difference of 3 plays in one season of data necessarily signals a problem.

At Wednesday, December 26, 2007 7:16:00 PM, Blogger Phil Birnbaum said...

Guy, how did you get 36 and 39?

At Wednesday, December 26, 2007 9:18:00 PM, Anonymous Anonymous said...

Phil: I just used the data from Dan's table, taking the difference between expected and actual runners.

At Wednesday, December 26, 2007 9:34:00 PM, Anonymous Anonymous said...

I wonder if the difference is mostly because plus/minus doesn't use a platoon adjustment. Thus for plus/minus there is "luck" in whether that precisely located ground ball a little to the left of straightaway SS was hit by a LHB or a RHB; in reality it was either an easy play or a hard play, but plus/minus will make it a blended "average" difficulty.

Retrosheet doesn't have precise location, but Dan and (I think) Sean make adjustments for batter handedness, so they are removing some luck that plus/minus leaves in.

As for the run value to use to convert plays made to runs, it sticks in my mind that I got a value of .77 or .78 a year ago for missed plays by infielders, counting outfield singles and extra base hits. 2007 may differ slightly, but .80 is a reasonable approximation. [.70 is reasonable specifically for infield singles (though I didn't follow why Guy started talking about those)]. There is an element of "luck" in the run value conversion as well, since a few teams may be outliers in terms of the trade-off they make in guarding the line, and regular plus/minus does not capture that. This probably is only a small part of the difference, but the retrosheet systems count extra base hits exactly and therefore directly "measure" the runs prevented tradeoff between taking away extra singles and allowing extra doubles.

At Wednesday, December 26, 2007 10:09:00 PM, Blogger Phil Birnbaum said...

Guy: Thanks! Interesting that the SD of plays made isn't just the SD of runs divided by .8 or whatever ...

Joe: the platooning point could be it, but that would assume that there's a difference between lefty/righty balls that Dewan's method isn't picking up. And they *are* supposedly considering speed and direction ...

At Wednesday, December 26, 2007 10:35:00 PM, Anonymous Anonymous said...

There is a difference which isn't picked up by Dewan's plus/minus: the starting position of the fielder. Fielders move around mostly according to the pull tendency of the batter. Thus the "medium hit ground ball" just to the left of straightaway SS is not full information about its difficulty. For a right-handed batter, the SS might have been positioned a couple of steps away from where the ball went; for a left-handed batter, maybe 6 steps away. And the left-handed batter should on average get to first a couple tenths of a second quicker. All in all, a significantly tougher play with a left-handed batter. Dewan's system "knows" more precisely where the ball went and how fast, but batter handedness is also important information about the real difficulty. The retrosheet systems use it and Dewan's plus/minus to my knowledge does not use it.

At Wednesday, December 26, 2007 10:39:00 PM, Blogger Phil Birnbaum said...

Joe: Okay, good point, I see what you mean now. I should do a little back-of-the-envelope calculation to see how much difference that might make...

At Wednesday, December 26, 2007 10:50:00 PM, Anonymous Anonymous said...

I'm surprised to learn that Plus/minus does not use batter handedness as a parameter. I agree with Joe that it can be an important factor. Joe: does Dewan include runner-on-first as a variable? It can have a big impact on 1B ratings in particular.

Thinking about the need to hold runners on 1B, I wonder if some of the credit for Pujols' incredible numbers should actually go to Molina. Molina really shuts down the running game to an incredible degree -- not just in CS%, but few SBAs. That may allow Pujols to 'cheat' off the bag much more than other 1Bmen, because he can rely on Molina to prevent the SB (Molina's reputation alone keeps most runners from attempting a steal). As long as Pujols covers the bag often enough that the runner can't feel totally immune from a pickoff, he can probably play further from the bag than most other 1Bmen. Anyone watch enough StL games to have an opinion on this?

At Wednesday, December 26, 2007 11:06:00 PM, Anonymous Anonymous said...

"Interesting that the SD of plays made isn't just the SD of runs divided by .8 or whatever."

Phil: that's because Dan has the run value of extra runners allowed varying a bit by team. For example, WAS is -9 runs on -9 plays, while PHI is -9 runs on -15 plays. His average run value looks to be about .73.
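The two teams quoted above make the point concretely. A minimal sketch, using only the WAS and PHI figures from this comment (the variable names are made up):

```python
# (runs, plays) relative to average, as quoted in the comment above
teams = {"WAS": (-9, -9), "PHI": (-9, -15)}

runs_per_play = {team: runs / plays for team, (runs, plays) in teams.items()}
print(runs_per_play)  # {'WAS': 1.0, 'PHI': 0.6}
```

Because the implied run value differs from team to team, the SD of runs is not simply a fixed multiple of the SD of plays.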

At Thursday, December 27, 2007 12:14:00 AM, Anonymous Anonymous said...

Interesting theory on Pujols - should be able to test it to some degree with the retrosheet systems, by a variation of Tango's with-or-without-you system - checking Pujols' performance with different catchers!

On-line descriptions of the plus/minus system are here. The paper fielding bible mentioned that there was a "hit-and-run" adjustment for middle infielders (based on whether or not the runner was going). First base has a holding runner/not holding runner adjustment. It's possible the descriptions are incomplete, but I don't think I've seen batter handedness mentioned in any of the descriptions of plus/minus.

John Walsh published some nice graphs on ground ball out-rates (using hit locations) earlier this year.

At Thursday, December 27, 2007 11:16:00 PM, Blogger Phil Birnbaum said...

I'm liking the theory that the extra variance is caused by not adjusting for lefty/righty hitters.

So now, the question: does Dewan's higher variance mean that SFR is a better (more accurate) method than Dewan's? After all, usually when an adjustment causes the SD to drop, it's because it's correcting for something real.

It seems strange that a simple lefty/righty correction is more important than detailed measurement of ball speed and direction, but maybe that's just the way it is.

At Friday, December 28, 2007 10:24:00 AM, Anonymous Anonymous said...

Phil: one other possible reason for the difference is LDs. Dan's method effectively shares "blame" for missed LDs (which is the large majority of all LDs) among three players -- the OF who picks it up and the two closest infielders. Plus/minus, at least in theory, assigns responsibility much more precisely, mainly or entirely to just one player. So Dan's approach would tend to reduce variance, but not in a way that increases accuracy.

That said, I don't know if that can explain much of the difference in variance.

At Friday, December 28, 2007 11:52:00 PM, Blogger Phil Birnbaum said...

That is a very good point. I was assuming that any increase in accuracy would REDUCE the variance -- for instance, park adjustments reduce variance; removing luck reduces variance; era adjustments reduce variance.

But, in this case, adjusting for the fielder INCREASES variance.

That's because all the other adjustments take something that was attributed to the fielder, and attribute it to something else. But this adjustment takes something that was attributed to something else (the other fielders) and attributes it to the actual fielder. I didn't realize that possibility.

Good catch! I think this is probably a big part of the answer.
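A rough sketch of why sharing credit shrinks the spread, under simplifying assumptions that are not from either system: every play is split evenly among the same number of fielders, and each fielder's own plays are independent with the same SD. If a fielder's total is the average of three such independent amounts instead of one, its variance falls to one third, so the SD is divided by the square root of 3. The function name and the 12-play SD are hypothetical.

```python
import math

def sd_with_shared_credit(sd_full_credit, n_sharers):
    """SD of a fielder's total when every play is split evenly among
    n_sharers fielders instead of being credited in full to one of them.

    Assumes the fielders' own plays are independent and equally variable.
    """
    return sd_full_credit / math.sqrt(n_sharers)

# Assumed example: an SD of 12 plays with full credit, shared three ways.
print(round(sd_with_shared_credit(12.0, 3), 1))  # 6.9
```

The spread drops even though nothing about the measurement got more accurate, which is the opposite of what a park or luck adjustment does.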

At Sunday, January 13, 2008 2:37:00 AM, Blogger Dan Agonistes said...

Sorry I didn't see this thread earlier, but in the latest revision of the system, and the one on which the chart with the comparison to Plus/Minus was made, I no longer consider line drives or fly balls that make it into the outfield as mentioned here.
