Sunday, July 08, 2007

Charlie Pavitt: A long-winded rant concerning the evaluation of fielding

By Charlie Pavitt

I start with Branch Rickey’s famous quote from his (and Allan Roth’s) ground-breaking Life Magazine statistical-analytic article “Goodby to Some Old Baseball Ideas” (August 2, 1954): “there’s nothing anybody can do with fielding” (this and subsequent quotes are all from page 83). Rickey/Roth realized that fielding averages were “utterly worthless as a yardstick” because they say nothing about fielding range. But their conclusion that “fielding could not be measured” is surprising given their insights into evaluating batting and pitching. There’s quite a bit we can do with fielding. But we’ve not gotten as far in this regard as we have with batting and pitching, and when I see many of the evaluation methods used by some of our otherwise-best analysts, I am disappointed. My goal in this post is to rant long-windedly about how I think we ought to be thinking about fielding evaluation in the first place.

Let’s start with the basics. We make progress in evaluating an aspect of baseball when we can successfully break the aspect into its component skills, measure each skill, and then combine these measurements in a meaningful common metric. Take batting prowess. There are four major component skills: the ability to get base hits, the ability to hit for power, the ability to coax bases on balls, and the ability to steal bases without being caught. One factor common among the four that suggests a metric is that successful performance occurs when bases are gained without the loss of outs. Each can be measured fairly easily in that way, and the measurements combined, resulting in tools such as OPS (sans the steals) and total average, both of which work fairly well despite their simplicity. Another common metric is number of runs gained, leading to runs created and a large set of regression-based methods.
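To make the "bases gained against outs lost" idea concrete, here is a minimal sketch of the two simple measures just named. The field names and the sample line are made up for illustration, and the total average formula is one common statement of it (bases from hits, walks, hit-by-pitches and steals, divided by outs from at-bats, caught stealing and double plays); other versions differ in detail.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season batting totals. Field names are illustrative, not a standard."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sf: int
    sb: int
    cs: int
    gidp: int

    @property
    def total_bases(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr


def ops(b: BattingLine) -> float:
    """On-base percentage plus slugging average (ignores steals)."""
    obp = (b.h + b.bb + b.hbp) / (b.ab + b.bb + b.hbp + b.sf)
    slg = b.total_bases / b.ab
    return obp + slg


def total_average(b: BattingLine) -> float:
    """Bases gained per out made, one common form of total average."""
    bases = b.total_bases + b.bb + b.hbp + b.sb
    outs = (b.ab - b.h) + b.cs + b.gidp
    return bases / outs


# Hypothetical season line, for illustration only.
line = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=20,
                   bb=60, hbp=5, sf=5, sb=10, cs=5, gidp=10)
print(round(ops(line), 3), round(total_average(line), 3))
```

Both functions reduce four skills to one number on a shared scale, which is the property the fielding discussion is looking for.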

Defense as a whole works analogously; it is relatively simple to concoct measures of bases or runs given up relative to either outs gained or opportunity (e.g., innings played). The trick is to distinguish the pitching and fielding parts of the equation. Recent work starting with Voros McCracken’s insight implies that pitchers are best evaluated through examination of events that fielders cannot influence. This means that pitching prowess has the following three components: the ability to hit the strike zone (as measured by walks and hit-by-pitches), the ability to miss bats (as measured by strikeouts), and the ability to keep batted balls in the park (as measured by home runs allowed). Measures based only on these skills would follow; the Baseball Prospectus people have proposed just that (DIPS ERA) in their recent Baseball Between the Numbers book. (I should add that pitchers do have some influence on batted balls in play, which implies a fourth skill; but this influence is far less than we thought pre-McCracken, and I agree with those recommending that we ignore it for the time being.)
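As an illustration of a measure built only from those three components, here is a sketch of Tom Tango's Fielding Independent Pitching (FIP), a later and simpler member of the same family. It is not the DIPS ERA calculation mentioned above, and the additive constant is league- and season-dependent; the default used here is only a placeholder assumption.

```python
def fip(hr: int, bb: int, hbp: int, so: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching from events fielders cannot influence.

    ip is innings pitched as a true decimal (e.g. 200.333 for 200 1/3),
    not the box-score "200.1" notation. The constant is a placeholder;
    in practice it is chosen so league FIP matches league ERA.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Hypothetical pitcher season, for illustration only.
print(round(fip(hr=20, bb=50, hbp=5, so=180, ip=200.0), 2))
```

Nothing a fielder does enters the formula, which is why batted balls in play are left over as the fielders' responsibility.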

This leaves batted balls in play as the responsibility of the fielder, so that the evaluation of what fielders do with these is the issue at hand. Can we distinguish the component skills? (We need to exclude catcher defense here, which is a whole ‘nother matter.) Surely they include range and sure-handedness on batted balls and throwing ability. For middle infielders, one must consider adeptness as the middle man in a double play; for first basemen, the knack of picking up throws in the dirt. The trick lies in measuring each of them and then combining them into a common metric. This is a challenging project. Baseball differs from football, basketball, soccer, etc. in that it is an individual sport in a team context; i.e. its outcomes are primarily due to the pitcher versus hitter matchup. However, with fielding, particularly infielding, team coordination matters. As a whole it makes sense to credit fielders with the out when they successfully field a ball in their territory. But what do we do with the second out in a double play? Should the fielder get full credit, or the middleman, or should they split it? What do we do with assists from the outfield, especially when cut-off men are involved?

In order to make this rant particularly long-winded, I shall continue with a bit of history. Back in the March 1976 issue of "Baseball Digest", Bill James proposed the seemingly-novel idea that we measure infielding by the number of putouts and assists the infielder makes per game (in so doing reinventing, a hundred years later, an identical but soon-forgotten measure credited to Al Wright). Range factor was clearly an advance over fielding percentage, but it was laced with problems. First, it intermixes different skills without previous reflection concerning each. Infielders amass putouts and assists both through fielding batted balls and through participation in double plays and force outs, but these are reflective of different skills, only the first of which is directly relevant to range. I did a couple of studies in which I attempted to solve this problem by measuring infielding purely by assists, under the assumption that they were a purer measure of range than putouts. Bill published both, respectively in issues 24 (June 1986) and 31 (August 1987) of the "Baseball Analyst", although I believe that he disagreed with my method given the display of range that can be shown when infielders catch pop-ups far from their position. And I knew full well that many assists are racked up as a double-play middleman. Second, and these were the issues that my two studies were really about, range factor ignored two biases due to the pitching staff. Staffs with high strikeout totals limit infielder opportunities to field balls. And staffs with a high proportion of innings taken by lefthanded pitchers will face a preponderance of righthanded batters, leading to proportionally more grounders to the third baseman and shortstop and fewer to the second and first basemen, when compared to staffs with few lefty innings.
I presented this material at a SABR convention near Washington D.C., if I remember correctly in 1986; during my presentation, an audience member noted that pitching staff groundball vs. fly ball tendencies have analogous implications. Interestingly enough, John Dewan had also taken the pitching-handedness bias into account and presented fielding measures adjusted for this problem at the same SABR convention, beginning for Dewan a concern with this issue that has continued to this day.
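The strikeout bias is easy to see with a little arithmetic. The sketch below compares raw range factor with assists per 100 balls in play; it is an illustration of the idea only, not the method of my two "Baseball Analyst" studies, and the balls-in-play estimate and all of the numbers are assumptions made up for the example.

```python
def range_factor(putouts: int, assists: int, games: int) -> float:
    """Putouts plus assists per game played."""
    return (putouts + assists) / games


def balls_in_play(batters_faced: int, strikeouts: int, walks: int,
                  hbp: int, home_runs: int) -> int:
    """Rough team estimate: plate appearances that ended with a ball in play."""
    return batters_faced - strikeouts - walks - hbp - home_runs


def assists_per_100_bip(assists: int, bip: int) -> float:
    """Assists scaled to opportunity rather than to games."""
    return 100 * assists / bip


# Two hypothetical shortstops with identical raw totals behind different staffs.
high_k_staff = balls_in_play(6200, strikeouts=1300, walks=500, hbp=50, home_runs=150)
low_k_staff = balls_in_play(6200, strikeouts=900, walks=500, hbp=50, home_runs=150)

print(range_factor(250, 450, 150))                          # same for both
print(round(assists_per_100_bip(450, high_k_staff), 2))     # fewer chances
print(round(assists_per_100_bip(450, low_k_staff), 2))      # more chances
```

The shortstop behind the high-strikeout staff made the same number of plays on fewer chances, which raw range factor cannot show. A handedness or groundball adjustment would follow the same pattern, with opportunities further weighted by where balls are expected to go.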

It was obvious that if we wanted to measure fielding plays made on batted balls independently of participation in double plays and free of biases due to pitching staff tendencies, we would have to go beyond the standard statistical measures of fielding and use play-by-play data to measure the proportion of balls hit into the portion of the ballpark for which each position is responsible that are successfully fielded. Fortunately, at about this time Project Scoresheet was beginning to supply the needed data, and analysts started using it for this purpose. The earliest effort of which I am aware was Pete DeCoursey’s work on what he called defensive average, first published in the March 1989 issue of a (sadly) short-lived publication called the "Philadelphia Baseball File." I believe others among “amateur” statistical analysts continued in this vein, and would be happy to hear from readers who have information on anybody doing this work during the 1990s. As for the “professionals,” and probably thanks to Dewan, the STATS annual Player Profiles books during the 1990s included a measure called zone rating, which unfortunately gave credit for two plays for fielded balls turned into double plays, in so doing conflating two different skills.
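The core calculation behind this family of measures is short. Here is a minimal sketch, assuming play-by-play records that carry a hit-location zone and the fielder (if any) who turned the ball into an out. The zone labels and the zone-to-position map are hypothetical, not Project Scoresheet's or STATS's actual codes, and each batted ball counts as one opportunity and at most one play, so a double play earns no extra credit.

```python
from collections import Counter

# Hypothetical map from hit-location zones to the responsible position.
ZONE_OWNER = {
    "ss_hole": "SS",
    "ss_middle": "SS",
    "2b_middle": "2B",
    "2b_hole": "2B",
}


def zone_rates(balls, zone_owner=ZONE_OWNER):
    """Proportion of balls hit into each position's zones that it fielded for an out."""
    opportunities = Counter()
    plays_made = Counter()
    for ball in balls:
        owner = zone_owner.get(ball["zone"])
        if owner is None:
            continue  # ball hit outside any mapped zone
        opportunities[owner] += 1
        if ball["out_by"] == owner:
            plays_made[owner] += 1
    return {pos: plays_made[pos] / opportunities[pos] for pos in opportunities}


# Made-up sample of batted balls.
sample = [
    {"zone": "ss_hole", "out_by": "SS"},
    {"zone": "ss_hole", "out_by": None},
    {"zone": "ss_middle", "out_by": "SS"},
    {"zone": "ss_middle", "out_by": "SS"},
    {"zone": "2b_middle", "out_by": "2B"},
    {"zone": "2b_middle", "out_by": None},
    {"zone": "foul_ground", "out_by": None},
]
print(zone_rates(sample))
```

Because the denominator is balls actually hit toward the fielder, strikeout, handedness, and groundball tendencies of the staff drop out without any separate adjustment.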

What are the lessons I think we should take home from all this? Let’s start with two do nots. First, do not use the standard indices, because no matter how well they are massaged they do not provide valid information. An example of this is the Defense ratings appearing in the Baseball Prospectus group’s annual. They are not always clear about their methods; from a description in "Baseball Between The Numbers" (page 97), it seems that Clay Davenport’s version of fielding runs begins with the standard measures and then adjusts them for park factor and the pitching staff tendencies mentioned above. As far as I can tell, they do not take the double play problem into account, but otherwise these adjustments are right-headed. But the method doesn’t seem to work. If you glance through their books, their Defense ratings for players differ wildly from year to year, at least by eyeball analysis far more than random factors would allow. And they don’t trust their own numbers, regularly making verbal comments clearly inconsistent with their own calculations. For two examples from the 2007 book: on page 418 they ask whether Chris Duncan is “the single worst defensive outfielder in modern memory,” but his 2006 ratings are slightly above average (+1) in both left and right field; on page 381, they wittily call Pat Burrell “the Zeno’s Paradox Outfielder, in that no matter how close he seems to be to catching the ball he’s only halfway there,” but his 2006 rating (-2) isn’t all that bad. An interesting case, of course, is the normally-maligned Derek Jeter. According to Davenport’s numbers, after years of futility (-12 in 1999, -22 in 2000, -17 in 2001, -19 in 2002, -15 in 2003) Jeter improved to -4 in 2004 and became a good shortstop the past two seasons (+12 in 2005, +7 in 2006). This change, attributed to having Alex Rodriguez next to him, is the main theme in Chapter 3-1 of "Baseball Between The Numbers". As I will describe below, we have good evidence that Jeter’s defense has not improved, and, while I like most of what BP does, I don’t trust their fielding numbers for a second.

Second, if you are going to combine indices for the different skills involved in fielding, do not do so arbitrarily. The example here is Bill James. I admire what he attempted to do in his Win Shares book, but much of it is based on what seem to be arbitrary decisions that make no sense to me. To begin, pitching is given 67.5 percent of the credit for defense and fielding the remaining 32.5 percent; the reader is never told where these numbers come from. The division of this 32.5 across positions is performed according to criteria that the author himself admits to be arbitrary. Ratings for each position are made in the context of their different skills; here is the method for infielders:


Second base - forty points double plays, thirty points assists, twenty points error percentage, and ten points putouts.
Third base - fifty points assists, thirty points errors, ten points sacrifice hits allowed, and ten points double plays.
Shortstops - forty points assists, thirty points double plays, twenty points error percentage, and ten points putouts.

There is no indication of where these numbers come from: why double plays are the most important part of second base play, or why putouts are irrelevant to third base and count for so little at the other positions. Could this be a late recognition that I was correct, more than fifteen years before the book was written, about removing putouts from range factor? Unless and until we get a convincing rationale for these proportions, as with the BP work I don’t trust any of Bill’s ratings for a second.

What would I like to see? First and foremost, I would like to see all measurements of range and sure-handedness based on play-by-play data. Dewan has continued work in this regard with his Baseball Info Solutions; his book The Fielding Bible is a gold mine of valuable data on defense. I might add that Dewan’s work makes plain Jeter’s continued defensive shortcomings; in 2005, he ranked 31st in Dewan’s metric among 32 rated shortstops. David Pinto’s Probabilistic Model of Range and Mitchel Lichtman’s Ultimate Zone Rating are basically identical with Dewan’s work in this regard.

But more generally, I think it is possible to come up with a fielding metric that does a fairly good job of evaluating most aspects of fielding (catchers excluded) in the context of either bases or runs. Beginning with range and sure-handedness, turning a measure such as Dewan’s or Pinto’s into either a base or run measure should be easy; Lichtman already calculates Ultimate Zone Rating in run metrics. As for the other aspects of fielding: in Volume 10 Number 3 of SABR’s Baseball by the Numbers, Clem Comly proposed a nice method (Average Arm Equivalent Method, or ARM) for evaluating the number of runs outfielders either save or cost their team based on their number of assists relative to their number of opportunities to throw out baserunners. As ARM is already in a run metric, it would merely have to be summed with Ultimate Zone Rating for outfielders. We do need to come up with a good method for dividing up responsibility for the second out on double plays; are there any out there of which I am unaware? I know that Pinto has recently given some attention to the double-play problem. I admit that the first baseman’s ability to turn errant throws into outs gets shortchanged here; I’m not sure whether any of the currently available play-by-play data provides enough detail for us to enter that into the equation. Such a metric may not be perfect, but contra Rickey/Roth there’s quite a bit we can do with fielding.
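The combination step really is just arithmetic once everything is in runs. Here is a rough sketch, assuming a zone-style measure expressed as plays made versus the league-average rate on the same chances. The run value per play is a placeholder assumption (roughly the cost of a hit plus the value of an out), not a figure from Lichtman's or Comly's work, and the arm figure is simply taken as given.

```python
def range_runs(plays_made: int, opportunities: int,
               league_rate: float, run_value_per_play: float = 0.75) -> float:
    """Runs saved (or cost) on batted balls relative to an average fielder.

    league_rate is the league-average proportion of chances converted at the
    position; run_value_per_play is a placeholder assumption.
    """
    expected_plays = opportunities * league_rate
    return (plays_made - expected_plays) * run_value_per_play


def total_fielding_runs(range_component: float, arm_component: float = 0.0) -> float:
    """Sum components that are already expressed in runs."""
    return range_component + arm_component


# Hypothetical outfielder: 420 plays on 500 chances, league converts 82 percent,
# and an arm rating of +3 runs supplied by some separate method.
rng = range_runs(plays_made=420, opportunities=500, league_rate=0.82)
print(round(total_fielding_runs(rng, arm_component=3.0), 1))
```

A double-play component, once someone settles how to split the credit, would be one more term in the same sum.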






6 Comments:

At Monday, July 09, 2007 11:46:00 AM, Blogger Tangotiger said...

UZR already has an arm component for DP and outfielders.

UZR is not "identical" to Dewan, but you can say they are similar. UZR does far more adjustments than Dewan.

My preferred framework for a fielding system is described here.

 
At Monday, July 09, 2007 11:47:00 AM, Blogger Tangotiger said...

I should also note that John Walsh in the THT 2007 annual does the best arm component around.

MGL's arm component is a somewhat lighter version of this.

 
At Wednesday, July 11, 2007 8:17:00 PM, Anonymous JoeArthur said...

"Dewan’s work makes plain Jeter’s continued defensive shortcomings... David Pinto’s Probabilistic Model of Range and Mitchel Lichtman’s Ultimate Zone Rating are basically identical with Dewan’s work in this regard." [emphasis added]

I took Charlie to be saying these systems agreed with Dewan's in their negative evaluation of Jeter, not that Charlie was saying these systems in general were "identical" to Dewan's.

I don't know (directly) the intellectual history of the late 80s/early 90s fielding systems, but the "Great American Baseball Stat Book", based on Project Scoresheet data and published after the 1987 season, looks like a step on the way to Pete DeCoursey's work, and apparently was published a year before his. That book contained a system credited to Gary Gillette (and Dave Nichols). It presented a form of adjusted range factor which adjusted for batter handedness and total balls in play while the fielder was in the field. I don't know DeCoursey's work from its original publication in Philadelphia Baseball File, but rather from an article he contributed to Brock Hanke's 1990 Baseball Sabermetric after the '89 season. In that article he credits Sherri and David Nichols as co-creators. The advance over the Gillette/Nichols system was to measure opportunity less imprecisely by making use of the fairly general hit location data available in the '88 Project Scoresheet data - instead of any ball in play on the field being an opportunity, balls hit to or through the area near the fielder were counted. Performance on ground and "air" balls was subtotalled for infielders, but no adjustment was made in weighting them for the overall "defensive average." So opportunities were accounted for rather directly but no direct effort was made to adjust for the difficulty of those opportunities. At least in the Baseball Sabermetric article, further information on extra base hits allowed and double plays initiated was also presented and discussed, but was not part of the defensive average itself.

In the early '90s in his Baseball Player and Team Ratings, Mike Gimbel used STATS data and a linear weights-style system for hits and errors to construct a plus/minus defensive run value. He also attempted park adjustments. Because STATS measured hit location in a more granular way than Project Scoresheet, Gimbel had more precise zones, reducing some of the uncertainty about a fielder's real opportunities. In the infield, Gimbel counted only ground balls, finessing the problem of accounting for the variant difficulty of opportunity from different hit types.

If I recall properly, MGL has stated that he was independently developing his system about this same time, using the same granular data from STATS as Gimbel, and that "defensive average" was an inspiration. His advance over Gimbel was to exploit the granular data more fully (measuring difficulty in "subzones"), with more elaborate park adjustment and adjustments to recognize the impact of contextual factors on fielder position and therefore difficulty (base-out situation and batter-handedness).

Gimbel is mainly infamous for his braggadocio as Dan Duquette's statistical consultant with the Red Sox in the mid-90s, but as far as I know his fielding system was the (published) state of the art in the early 90s. Unfortunately for him, his books were published by a small press in the pre-internet age, and his place in fielding-system-history seems to be forgotten.

 
At Tuesday, July 17, 2007 3:14:00 PM, Blogger Tangotiger said...

I just want to make a general point that all the PBP fielding systems are straightforward (as they should be). It is no surprise that the "creators" are the first people to have access to the data.

In short, it requires fairly little inspiration to create a good fielding system. The prerequisites are:
- being a baseball fan
- having as granular data as possible
- time to parse through the data

This puts you 90% of the way there.

 

 
