## Friday, February 04, 2011

### "Scorecasting" on players gunning for .300

A few months ago, I wrote about a study by two psychology researchers, Devin Pope and Uri Simonsohn. The study found that players hitting .299 going into their last at-bat of the season wound up hitting well over .400 in that at-bat. The authors concluded that this happens because .299 hitters really want to get to .300, and so they try extra hard (and succeed).

But that isn't really what's happening. It's an illusion caused by selective sampling. When a player hitting .299 gets a hit to push him over .300, he is much more likely to be taken out (or held out) of the lineup, to preserve the .300. Therefore, it's not that they're more likely to get a hit in their last at-bat -- it's that their last at-bat is more likely to be one that results in a hit.
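A toy simulation makes the selection effect concrete. This is a sketch under assumed numbers, not anything taken from the Pope/Simonsohn data: a hypothetical hitter with a true .300 chance of a hit in every at-bat enters his final game at 149-for-498 (.2992), and the game gives him at most four at-bats. With `pull_at_300=True`, the manager sits him the moment his average reaches .300. The function name and all parameter values are illustrative.

```python
import random

def last_ab_hit_rate(p_hit=0.300, hits=149, at_bats=498, max_abs=4,
                     pull_at_300=True, trials=100_000, seed=1):
    """Fraction of simulated final games whose last at-bat is a hit.

    Every at-bat has the same true hit probability, so any gap between
    the result and p_hit comes only from when the at-bats stop.
    """
    rng = random.Random(seed)
    last_ab_hits = 0
    for _ in range(trials):
        h, ab = hits, at_bats
        last_was_hit = False
        for _ in range(max_abs):
            last_was_hit = rng.random() < p_hit  # same odds every time up
            h += last_was_hit
            ab += 1
            # Selective sampling: once he reaches .300, he sits.
            if pull_at_300 and h / ab >= 0.300:
                break
        last_ab_hits += last_was_hit
    return last_ab_hits / trials

print(round(last_ab_hit_rate(pull_at_300=False), 3))  # close to 0.300
print(round(last_ab_hit_rate(pull_at_300=True), 3))   # close to 0.657
```

For this setup the second figure can also be worked by hand: a hit in the first or second at-bat ends his day at .300 or better, so the chance his last at-bat is a hit is 0.3 + 0.7 × 0.3 + 0.49 × 0.3 = 0.657. Same hitter, same talent; only the stopping rule differs.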

(For an analogy: when a game ends with less than 3 outs, the last batter probably hits well over .500 (since the winning run must have scored on the play). But that's not because the player rises to the situation; it's because, as it were, the situation rises to the player. When he gets a hit, he's the last batter because the game ends. When he doesn't, he's not the last batter.)

Since the original study and article, the authors have modified their paper a bit, saying that the batting average effect is "likely to be at least partially explained" by selective sampling. However, the data given in the previous posts does suggest that almost the *entire* effect is explained by selective sampling. (PDFs: Old paper; new paper.)

There is one part of the study's findings that's probably partially real, and that's the issue of walks. None of the .299 hitters walked in their last at-bat. That's partially selective sampling -- if they walked, they're still at .299, and stayed in the game, so it's not their last at-bat -- but probably partially real, in that .299 hitters were more likely to swing away.

(My results are in previous posts here and here.)

------

The study is given featured status in "Scorecasting," in the chapter on round numbers. However, while the authors of the original paper mention the selective sampling issue, the authors of "Scorecasting" do not:

"What's more surprising is that when these .299 hitters swing away, they are remarkably successful. According to Pope and Simonsohn, in that final at-bat of the season, .299 hitters have hit almost .430. ... (Why, you might ask, don't *all* batters employ the same strategy of swinging wildly? ... if every batter swung away liberally throughout the season, pitchers would probably adjust accordingly and change their strategy to throw nothing but unhittable junk.) ...

"Another way to achieve a season-ending average of .300 is to hit the goal and then preserve it. Sure enough, players hitting .300 on the season's last day are much more likely to take the day off than are players hitting .299."

"Scorecasting" treats these two paragraphs as two separate effects. In reality, the second causes the first.

You can read an excerpt -- almost the entire thing, actually -- at Deadspin, here.

------

One thing that interested me in the chapter was this:

"But no benchmark is more sacred than hitting .300 in a season. It's the line of demarcation between All-Stars and also-rans. It's often the first statistic cited when making a case for or against a position player in arbitration. Not surprisingly, it carries huge financial value. By our calculations, the difference between two otherwise comparable players, one hitting .299 and the other .300, can be as high as two percent of salary, or, given the average major league salary, \$130,000."

The authors don't say how they calculated that, but it seems reasonable. A free-agent win is worth \$4.5 million, according to Tom Tango and others. At 10 runs per win, that means a run is worth \$450,000. Over 500 AB, one point of batting average amounts to turning half an out into half a hit. Assuming the average hit is worth about 0.6 runs and an out is worth negative 0.25 runs, the swap gains half of 0.85 runs, so the single point of batting average is worth a bit over 0.4 runs. That's close to \$200,000.
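The arithmetic in the previous paragraph, written out step by step. The constants are the round figures quoted above (\$4.5 million per win, 10 runs per win, +0.6 runs per hit, -0.25 runs per out); the function name is just for illustration.

```python
DOLLARS_PER_WIN = 4_500_000
RUNS_PER_WIN = 10        # implied by $4.5M per win -> $450K per run
RUN_VALUE_HIT = 0.60     # approximate run value of an average hit
RUN_VALUE_OUT = -0.25    # approximate run value of an out

def value_of_one_point(at_bats=500):
    """Runs and dollars gained by raising batting average by .001."""
    hits_swapped = 0.001 * at_bats                         # 0.5: half an out becomes half a hit
    runs = hits_swapped * (RUN_VALUE_HIT - RUN_VALUE_OUT)  # 0.5 * 0.85
    dollars_per_run = DOLLARS_PER_WIN / RUNS_PER_WIN       # $450,000
    return runs, runs * dollars_per_run

runs, dollars = value_of_one_point()
print(f"{runs:.3f} runs, ${dollars:,.0f}")  # 0.425 runs, $191,250
```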

That figure is higher than the authors' figure of \$130,000. The difference is probably just that the authors used the average MLB salary, which includes players not yet free agents (arbs and slaves). However, they imply that the difference between .299 and .300 is worth more than other one-point differences. That might be true, but it would be nice to know how they figured it out and what they found.

------

Finally, two bloggers weigh in. Tom Scocca, at Slate, criticizes the original study. Then, Christopher Shea, at the Wall Street Journal, criticizes Scocca.

#### 4 Comments:

At Saturday, February 19, 2011 8:15:00 AM,  White Boner said...

(I linked to your site after it was mentioned in this interview with a 'Scorecasting' author by Jonah Keri.)

There are just two quick points I want to make.

First, I think that it should absolutely be mentioned that the extensive analysis of why there are more .299 hitters than .300 hitters was actually done (more convincingly and extensively, in my opinion) by Bill James.

In the '2008 Bill James Gold Mine', James wrote about this in an article entitled, "The Targeting Phenomenon." The study extended well beyond .300 hitters and found that RBI numbers were particularly targeted by hitters.

(It's kinda funny how Bill James does this as a throw-away, whereas when other guys do it, they need the permission of some scientific journal and the New York Times thinks it's just terrific.)

The second point I wanted to make was a criticism of your comment accepting the idea that an MLB hit should be worth \$200,000. In the first place, we know that not all hits are created equal - a hit by a 40/40 hitter has more value than an Ichiro Suzuki single. In the second place, there's obviously a strong element of luck or transience with hits and batting average, and batting average doesn't repeat with great accuracy. In the third place, I believe that the most accurate way to judge a hit would be to somehow relate the value of the PA to the value of a replacement level hitter.

Finally, the biggest issue I have with the notion that a hit is worth \$200,000 is the fact that, were that true, then the average MLB salary would be \$283,626,667!

This is because there were 42,544 hits in MLB last season. If each one is worth \$200k, and all 30 teams paid equally for them, then:

(42,544 x \$200,000)/30 = \$283,626,667.

(This is assuming that pitchers were paid the same amount as hitters, of course.)

But I'm happy I came across your blog, and I'm happy that you're critiquing 'Scorecasting'. I haven't read the book, but listening to Jonah Keri's interview with the author, it really seems to lapse into tons of psychological hypotheses (which I think are lazy and unprovable), and it just seems to have that aura of shallow arrogance that made 'Moneyball' (and a hundred copycats) kinda disingenuous. That's my opinion.

At Saturday, February 19, 2011 9:53:00 AM,  Phil Birnbaum said...

Thanks, David ... I wasn't aware of the podcast.

Right, Bill James did it first ... the difference is that he just looked at whether targeting happened. These other guys looked at the actual *performance* as the target approached.

The value of a hit is not calculated from zero hits, but, rather, from "replacement level," which is what you can get out of a minor-leaguer who would earn league-minimum salary.

That replacement level is, IIRC, two wins, or 20 runs, below average for a full-time player (10 runs for a half-time player, and so on).

If you retry your calculation, at \$200,000 per half-hit (not per hit, per half-hit), but counting only hits above average, plus a maximum of 25 hits below average, you should come closer to the average free-agent salary.

(That \$200,000 also includes one less half-out.)
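The suggested recalculation can be sketched for a single player. Assumptions, labeled: each hit gained in place of an out counts as two half-hits at \$200,000 each; the replacement gap is about 25 hits for a full-time player (roughly 20 runs at about 0.85 runs per hit swapped for an out), scaled by a hypothetical `playing_time_share`. The function name is illustrative only.

```python
DOLLARS_PER_HALF_HIT = 200_000  # already includes the half-out that disappears
REPLACEMENT_GAP_HITS = 25       # full-time player: ~20 runs / ~0.85 runs per hit

def hit_value_over_replacement(hits_above_average, playing_time_share=1.0):
    """Dollar value of a hitter's hits, counted from replacement level, not zero.

    hits_above_average may be negative; playing_time_share scales the
    replacement gap (1.0 = full-time, 0.5 = half-time).
    """
    gap = REPLACEMENT_GAP_HITS * playing_time_share
    hits_over_replacement = max(0.0, hits_above_average + gap)
    half_hits = 2 * hits_over_replacement
    return half_hits * DOLLARS_PER_HALF_HIT

print(hit_value_over_replacement(0))        # average regular: 10000000.0
print(hit_value_over_replacement(10, 0.5))  # half-time, 10 hits above average: 9000000.0
print(hit_value_over_replacement(-30))      # below replacement: 0.0
```

Summed over a roster, this credits only the hits above replacement level, rather than every one of the 42,544 hits counted from zero.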

If you want to be even more accurate, use runs instead of hits. Convert to runs using Runs Created, and then use \$500K per run. Remember to count only runs that are over replacement value.
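The runs-based version might look like this. It uses Bill James's basic Runs Created formula, RC = (H + BB) × TB / (AB + BB). The league-average runs per plate appearance (`avg_rc_per_pa`), the 650-PA full-time season, and the sample batting line are hypothetical inputs chosen only for illustration, and plate appearances are simplified to AB + BB.

```python
DOLLARS_PER_RUN = 500_000        # round figure from the comment above
REPLACEMENT_RUNS_FULL_TIME = 20  # ~2 wins below average for a full-time player

def runs_created_basic(h, bb, tb, ab):
    """Bill James's basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)

def salary_value(h, bb, tb, ab, avg_rc_per_pa, full_time_pa=650):
    """Dollar value of runs created above replacement (hypothetical inputs)."""
    pa = ab + bb  # simplification: ignores HBP and sacrifices
    rc = runs_created_basic(h, bb, tb, ab)
    runs_above_average = rc - avg_rc_per_pa * pa
    # Replacement gap scales with playing time (10 runs for a half-timer, etc.).
    replacement_gap = REPLACEMENT_RUNS_FULL_TIME * pa / full_time_pa
    runs_above_replacement = max(0.0, runs_above_average + replacement_gap)
    return runs_above_replacement * DOLLARS_PER_RUN

# Hypothetical line: 150 H, 50 BB, 225 TB in 500 AB; league average 0.12 runs/PA.
print(f"${salary_value(150, 50, 225, 500, avg_rc_per_pa=0.12):,.0f}")  # $16,370,629
```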

At Saturday, February 19, 2011 11:02:00 AM,  White Boner said...

Phil,

I definitely think that what's most significant about the "More .300 hitters than .299 hitters" concept is the concept itself.

Because of this, I think that it's safe to say that Bill James's article is the pioneer (as far as I know, anyway). The specifics and smaller distinctions of the more-hyped peer reviewed article in 'Scorecasting' don't trump Bill James's work, definitely not in my opinion. (In fact, I'll take Bill James's article over what I've read about 'Scorecasting' at DeadSpin and the NY Times any day of the week - James's work is much more impressive, in my opinion.)

Separately, I think that lots of the details about the dollar value of a hit are a little bit over my head (or at least they definitely require more concentration and work than I can put in), and so I'll just kinda defer on that dialogue. But I will say that, in reading your article, the idea that I took from it is that each ML hit is worth \$200k. But I'll definitely concede that I might've misinterpreted your point.

Thanks for the response.

At Saturday, February 19, 2011 11:22:00 AM,  Phil Birnbaum said...

David,

Search this blog for "targeting" ... I think I have a couple of posts on the Bill James article and 20-win pitchers.