Wednesday, February 25, 2009

Can the fans identify clutch hitters?

Tom Tango has released the results of his "Great Clutch Project."

Last year, Tom asked his readers to come up with a list of the 30 players, one per team, whom they would most like to see up at bat in a clutch situation. Often, the "fans" (as Tom called them) just picked the best hitter on the team; but, sometimes, they came up with a different player, one whom they thought would raise his game in important situations. Presumably, when they picked Derek Jeter (who had an .840 OPS in 2007) over Alex Rodriguez (1.067), they thought his clutchness would compensate for the 227-point difference in overall OPS.

In this particular case, it didn't. Jeter OPSed .771 in the clutch last year; A-Rod came in at .965. So Rodriguez was still the superior hitter in clutch situations. But you can perhaps give the fans credit in that the gap did, in fact, narrow a little; instead of a 227-point difference, it's only 194 points. Even if you consider that clutch numbers are lower overall (probably because the opposition puts in its best pitchers in those situations), that's still a narrowing.
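As a quick sanity check on that narrowing, here's the arithmetic, using the OPS figures quoted above:

```python
# OPS figures quoted above (A-Rod vs. Jeter)
overall_gap = 1.067 - 0.840   # 2007 overall OPS: 227 points
clutch_gap = 0.965 - 0.771    # 2008 clutch OPS: 194 points

# the gap narrowed by 33 points of OPS
narrowing = round((overall_gap - clutch_gap) * 1000)
print(narrowing)  # → 33
```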

When Tango added up the numbers for all the fans' choices that weren't actually the best player on the team, it turns out they were worse overall by 21 points of "wOBA" (that was 11 points worse of OBP and 46 points worse of SLG). But, in the clutch, they were worse by only 12 points.
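The wOBA arithmetic, spelled out (figures as reported by Tango):

```python
# points of wOBA by which the fans' "clutch" picks trailed the best hitters
overall_deficit = 21   # over all situations
clutch_deficit = 12    # in clutch situations only

clutch_skill = overall_deficit - clutch_deficit   # 9 points recovered in the clutch
platoon_advantage = 20                            # typical platoon gap, per Tango

print(clutch_skill, platoon_advantage)  # → 9 20
```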

So you have to admit the fans *did* wind up successfully identifying clutch players, to the tune of 9 points of wOBA. That's still much less than the platoon difference, which is 20 points of wOBA. So Tango writes,

"Let's let this clutch debate end today (please?), and simply agree that: a) yes, clutch exists, b) yes, fans can perceive clutch players, but c) the impact of clutch players is limited to less than the platoon advantage. "

Tango is being a bit sarcastic here. As a skeptic, he says he's willing to admit that clutch hitting exists in exchange for believers admitting that the effect is very small. But fans who passionately believe in clutch hitting are unlikely to accept that the advantage could be only a few points, and so are unlikely to take him up on his implicit offer.

From my standpoint, I don't think the results are enough to change my views on clutch. First, as Tango notes, the result is less than one SD above random, which isn't much. Secondly, the fans chose contact hitters as their clutch champions, while the best hitters overall tended to be power hitters. I see no reason why it can't just be that these two types of hitters adjust differently when the game is on the line, especially since those are usually situations where singles are more valuable relative to home runs. (More discussion of this point at Tango's blog here.)

But, of course, I agree with Tango that half a platoon advantage isn't really enough to worry about. It's certainly not enough to prefer Jeter over A-Rod, and there's a good chance it's just a random effect anyway.

So this study won't affect my decision on whether to offer my clutch bet again this year.

Labels: ,

Sunday, February 15, 2009

Shane Battier as the NBA's answer to "Moneyball"

I'm not completely sure what to make of this long Michael Lewis article extolling the Houston Rockets and their forward Shane Battier.

Lewis's treatment of Battier reminds me a lot of his treatment of Billy Beane in Moneyball. The idea is that Battier excels at things that aren't counted in the box score, and that makes him affordable and underrated.

How so? Mostly on defense. Battier is said to cover the NBA's superstars exceptionally well. The evidence is mainly hearsay – the Rockets argue that they have a version of a "plus/minus" stat, the kind that figures out how the Rockets do when Battier is on the floor, and compares it to how the Rockets do when he's on the bench.

The stat is not a new one, and from what I've read, there are obvious problems with it, problems that Lewis acknowledges. Specifically, how the team does when a player is on depends on who he's playing with. You can control for that, but then you might wind up with insufficient data. For instance, if player A plays with B 90% of the time, then you wind up with only maybe three minutes per game when A plays without B. That makes the comparison difficult, because, first, you only have three minutes, and, second, you also have to take into account the quality of player C, who replaced player B.
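To make the on/off idea concrete, here is a minimal sketch of the *raw* (unadjusted) version of plus-minus, computed from made-up stint data; the stint numbers and the `net_per48` helper are invented for illustration, and Morey's adjusted version would additionally have to control for the other nine players on the floor:

```python
# (battier_on_court, rockets_points, opponent_points, minutes) per stint
# -- all numbers invented for illustration
stints = [
    (True, 30, 24, 12),
    (True, 26, 25, 12),
    (False, 18, 20, 12),
    (False, 22, 22, 12),
]

def net_per48(stints, on):
    # net scoring margin with the player on (or off) the floor,
    # scaled to a 48-minute game
    pts = sum(us - them for b, us, them, m in stints if b == on)
    mins = sum(m for b, us, them, m in stints if b == on)
    return 48.0 * pts / mins

raw_plus_minus = net_per48(stints, True) - net_per48(stints, False)
print(raw_plus_minus)  # → 18.0
```

With real data, the problem described above shows up immediately: the "off" stints involve a different mix of teammates and opponents, so the raw difference confounds Battier's effect with theirs.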

There's nothing in the article on how that problem was solved, except this:

"[Rockets GM Daryl] Morey says that he and his staff can adjust for these potential distortions — though he is coy about how they do it — and render plus-minus a useful measure of a player’s effect on a basketball game."

Morey says that over his career, Battier is a +6, which means that, per game, when he's on the court, his team will score six more points than the opposition. I'm not sure if that's per 48 minutes, or per 33 minutes (Battier's average).

In any case, if you figure that Battier alone is worth 6 points per game, then, over an 82-game season, that's 492 points. At 30 points per win, which is David Berri's estimate in "The Wages of Wins," you get 16.4 wins. (Morey says the effect is larger, that +6 "is the difference between 41 wins and 60 wins." That works out to 26 points per win.)
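The conversion works out like this (a trivial calculation, using the figures above):

```python
games = 82
net_per_game = 6.0                     # Morey's career figure for Battier
season_points = net_per_game * games   # 492 points over a season

berri_wins = season_points / 30.0      # Berri: ~30 points per win
print(round(berri_wins, 1))            # → 16.4

# Morey's implied conversion: +6 turns a 41-win team into a 60-win team
morey_points_per_win = season_points / (60 - 41)
print(round(morey_points_per_win))     # → 26
```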

How does Battier do it? According to Lewis, the Rockets have figured out players' strengths and weaknesses, and Battier tries to defend in such a way that the opposition is forced to do things they're weak at. For Kobe Bryant:

"When he drives to the basket, he is exactly as likely to go to his left as to his right, but when he goes to his left, he is less effective. When he shoots directly after receiving a pass, he is more efficient than when he shoots after dribbling. He’s deadly if he gets into the lane and also if he gets to the baseline; between the two, less so."

So what happens is that Shane Battier gets all this data before the game – he's the only player the Rockets give it to – and he tries to force Kobe into going to his left instead of his right.

"The ideal outcome, from the Rockets’ statistical point of view, is for Bryant to dribble left and pull up for an 18-foot jump shot; force that to happen often enough and you have to be satisfied with your night. “If he has 40 points on 40 shots, I can live with that,” Battier says. “My job is not to keep him from scoring points but to make him as inefficient as possible.” The court doesn’t have little squares all over it to tell him what percentage Bryant is likely to shoot from any given spot, but it might as well."

The effect, according to the article, is that when Battier guards Kobe Bryant, he does it so well that Kobe is rendered a below-average player.

Battier is also said to be "abnormally unselfish," and exceptionally good at playing the intangibles. "Instead of grabbing uncertainly for a rebound ... Battier would tip the ball more certainly to a teammate." "Guarding a lesser rebounder, Battier would, when the ball was in the air, leave his own man and block out the other team’s best rebounder." "He blocked the ball when Bryant was taking it from his waist to his chin, for instance, rather than when it was far higher and Bryant was in the act of shooting." "His whole thing is to stay in front of guys and try to block the player’s vision when he shoots."

Anyway, as I said, I'm a bit skeptical, still. I accept that Battier must be exceptionally good at defense, since (a) he plays 33 minutes a game and doesn't have very much in the way of traditional offensive statistics; (b) the Rockets have watched him and studied him and think he's great; and (c) his teams have done well. Still, from a scientific standpoint, the article is mostly anecdote and hearsay.

It shouldn't be all that hard to confirm the article's thesis and measure the size of the effect. If Kobe is good from one place but worse from another, that can be figured out by watching games and counting. If Battier holds him to those low-percentage shots when covering him, that can be counted too. And at the most fundamental level, can't you see what Kobe (and the other players) do when covered by Battier, and compare to what they do against the Rockets when Battier's on the bench? Something is better than nothing.

It's not really that I don't believe the Rockets. It's just that +6 points a game -- when it's acknowledged that Battier isn't all that great on offense -- seems pretty high to me, and my instinct is to ask for more evidence.

Oh, and one more question for readers who actually know something about basketball (which I really don't). Assuming that everything in the article is correct, how much of Battier's value is due to his athletic skill? That is, suppose you took a league-average player and trained him to try to handle Kobe Bryant the same way that Battier does. Could he do it almost as well, or are Battier's instincts so good that he's exceptional in this regard?

(Hat tip: The Sports Economist)

Labels: , ,

Monday, February 09, 2009

Charlie Pavitt reviews a hitting streak study

I [Charlie Pavitt] am writing this in response to Trent McCotter’s piece on hitting streaks from the 2008 Baseball Research Journal. I want to begin by commending Trent on this fine piece of work. In short, a series of Monte Carlo tests revealed that the number of actual hitting streaks of lengths beginning with 5 games and ending with 35 games or more between 1957 and 2006 was, in each case, noticeably greater than what would have been expected by chance. It is always good to see evidence inconsistent with our “received wisdom.” What I have to say here in no way attempts to contradict his research findings. My problem is with his attempt to explain them.

Trent first proposed three “common-sense” explanations for what he found. The first was that a batter might face relatively poor pitching for a significant stretch of time, increasing the odds of a long streak. But, in his words (page 64), “the problem with this explanation is that it’s too short-sided; you can’t face bad pitching for too long without it noticeably increasing your numbers, plus you can’t play twenty games in a row against bad pitching staffs, which is what would be required to put together a long streak.” He then goes on (page 65): “The same reasoning is why playing at a hitter-friendly stadium doesn’t seem to work either, since these effects don’t continue for the necessary several weeks in a row.” His third “common-sense explanation” is that, as hitting overall is thought to be better during the warm months, hitting streaks may be more common than expected during June through August. This is because, and this is critical (page 65), “hitting streaks are exponential…a player who hits .300 for two months will be less likely to have a hitting streak than a player who hits .200 one month and .400 the next...[because]…hitting streaks tend to highly favor batters who are hitting very well, even if it’s just for a short period.” This is absolutely correct. Unlike the first two proposed explanations, in this case Trent looked for relevant evidence: he checked whether there were more streaks in June, July, or August, and found no more than in May. Trent, how about April and September?
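Trent's "exponential" point can be sketched with a toy calculation. Assuming four at-bats per game and independent at-bats (both simplifications), the chance of a 20-game streak over a fixed stretch rises much faster than linearly in batting average:

```python
def p_hit_game(avg, ab=4):
    # chance of at least one hit in a game, assuming `ab` independent at-bats
    return 1 - (1 - avg) ** ab

def p_streak(avg, length=20, ab=4):
    # chance of hitting safely in `length` specific consecutive games
    return p_hit_game(avg, ab) ** length

steady = p_streak(0.300)   # a .300 hitter for the whole stretch
hot = p_streak(0.400)      # the .400 month of a .200/.400 split
cold = p_streak(0.200)     # the .200 month contributes almost nothing

print(hot / steady)        # the hot month alone is roughly 15x as likely to streak
```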

Anyway, rejecting all three of these, Trent then proposed two possible psychological explanations. The first is that hitters aware of a streak intentionally change their approach to go for more singles, particularly when the streak gets long; and he has evidence that longer streaks occur less randomly than shorter ones, which is what you'd expect under this assumption (players are more likely to think about keeping a streak going once it is already long). The second is that hot hands really exist, and his claimed evidence is that removing from his random sample the games in which the player did not start increases the number of predicted hitting streaks, bringing it more in line with the number that actually occurred. Makes sense; a hitting streak is easier to maintain the more at-bats one has in a game. He proposes that this could reflect real life because managers would start a player proportionally more often when he was hitting well. True, but we should keep in mind that the same statistical effect of starting games would occur whether there is a hot hand or not. In other words, I don’t think his evidence is very telling.

I want to be very clear here about my position on this issue. I have absolutely no problem with the suggestion that players’ performance is impacted by psychological factors; I don’t see how they aren’t. My problem is with the way in which those suggestions are treated. If we are serious about sabermetrics as a science, then we have to meet the standards of scientific explanation. As esteemed philosopher Karl Popper pointed out in his now-classic 1934 book "The Logic of Scientific Discovery," if a proposed explanation for observations is impossible to disconfirm, then we can’t take it seriously as scientific explanation. This is my problem with Trent’s treatment. Let us suppose that rather than finding more hitting streaks than chance would allow, Trent had found fewer. He could then say that the reason for this is that batters crumble under the stress of thinking about the streak and perform worse than they would normally. If Trent found no difference, he could then say that batters are psychologically unaffected by their circumstance. The point is that this sort of attempted explanation can be used to explain anything, and given our present store of knowledge about player psychology they are impossible to evaluate. Again, Trent’s proposals may be correct, but we can’t judge them, so we can’t take them as seriously as Trent appears to.

In contrast, the first three proposed explanations can be disconfirmed, so we can take them more seriously. Trent claims to have disconfirmed the third, but we need to know about April and September. But the real issue I have is with his dismissal of the first two, because he did not apply the logic in their case that he correctly applied for his “hot weather” proposal. Let me begin with the first. A batter does not have to face a bad pitching staff in consecutive games for his odds of a hitting streak to increase. Suppose a batter faces worse pitching than average during only 10 of 30 games in May and makes up for it by facing worse pitching than average during 20 of 30 games in June. By the same logic that Trent used correctly for the “hot weather” proposal, his odds of having a hitting streak, which would occur during June, would be greater than those of another batter who faced worse pitching than average during 15 games in May and 15 games in June. The same explanation goes for hitter-friendly and hitter-unfriendly ballparks, and is strengthened in that case by well-supported known differences in park effects. If a player’s home field was hitter-friendly and, during a stretch of time, many of his road games were in hitters’ parks, he could easily have 20 or more games in that context in a given month.
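The same convexity argument can be sketched numerically. The batting averages below are invented for illustration: say a hitter bats .320 against weak staffs and .280 against average ones, with four independent at-bats per game:

```python
def p_hit_game(avg, ab=4):
    # chance of at least one hit in a game, with `ab` independent at-bats
    return 1 - (1 - avg) ** ab

weak_p = p_hit_game(0.320)   # per-game hit probability vs. weak staffs (invented)
avg_p = p_hit_game(0.280)    # per-game hit probability vs. average staffs (invented)

# 20-game stretch in which all 20 games are against weak staffs (clustered)
clustered = weak_p ** 20
# 20-game stretch with the weak-staff games spread out (10 of each)
spread = weak_p ** 10 * avg_p ** 10

print(clustered / spread)    # clustering roughly doubles the streak odds
```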

I have no idea whether either of these two explanations for Trent’s findings is correct. But the difference between these and his psychological proposals is that we could test these two and not those he favors. Given the importance of Trent’s original findings, I would obviously like to see that happen. And I would very much like it if we remain very careful about not taking our psychological speculations too seriously.

Labels: , ,

Tuesday, February 03, 2009

A neural-net Hall of Fame prediction method

"Predicting Baseball Hall of Fame Membership using a Radial Basis Function Network," by Lloyd Smith and James Downey

This JQAS article, from the most recent issue, describes a new system to empirically predict who is and who is not in the Hall of Fame. But it's not a series of formulas -- it's an algorithm called a "radial basis function network," which is a type of neural net. I don't know much about this kind of thing, but it's called a "machine learning approach" because the algorithm figures out the algorithm, as it were.
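For readers curious what a radial basis function network actually computes, here is a minimal pure-Python sketch. This is emphatically *not* the authors' model -- the toy stat lines, centers, and width below are all invented. Each hidden unit responds to how close a player's stat line is to a stored "prototype" line, and the output is a weighted sum of those responses, with the weights fit to the training labels:

```python
import math

def rbf(x, c, sigma):
    # Gaussian radial basis: response decays with distance from center c
    return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * sigma ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting (A is n x n)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def train_rbf(X, y, sigma):
    # Exact-interpolation RBF net: one hidden unit centered on each training point
    Phi = [[rbf(x, c, sigma) for c in X] for x in X]
    return solve(Phi, y)

def predict(x, X, w, sigma):
    return sum(wi * rbf(x, c, sigma) for wi, c in zip(w, X))

# Toy training data (invented): (career hits in thousands, All-Star selections),
# label 1.0 = Hall of Famer, 0.0 = not
X = [(3.0, 12), (2.8, 10), (1.5, 2), (1.2, 1)]
y = [1.0, 1.0, 0.0, 0.0]
w = train_rbf(X, y, sigma=3.0)

score = predict((2.9, 11), X, w, 3.0)  # a stat line near the "HOF" cluster scores high
```

The opacity the authors mention is visible even here: the fitted weights `w` say nothing interpretable about what "qualifies" a player; the model just scores similarity to training examples.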

The advantage, it seems to me, is that you don't have to figure out an algorithm yourself. But the disadvantage is that you have no idea what the "real" qualifications for the HOF are -- the neural net spits out a probability for each player, but you have no idea where that probability came from. As the authors note,

"A disadvantage of the approach described here is that the neural network model is opaque -- it is impossible to understand, with any degree of confidence, why the model fails to classify a player such as Lou Brock as a Hall of Fame member."

That, perhaps, is an unfortunate choice of example. Obviously, Brock is in the Hall of Fame because of his stolen bases. But the authors didn't feed steals into the model. Indeed, the words "stolen base" don't appear anywhere in the paper!

What *does* the model include? For pitchers, it considers: wins, saves, ERA, winning percentage, win shares, and number of times selected to the All-Star Game. And it seems to do a reasonable job distinguishing HOFers from non. For pitchers retiring between 1950 and 2002, it makes only six errors -- it mistakenly calls Billy Pierce, Lee Smith, and John Wetteland Hall of Famers, but omits Fergie Jenkins, Hoyt Wilhelm, and Dennis Eckersley.

For hitters, the model includes: hits, HR, OPS, WS, and again All-Star selections. This time the algorithm misidentifies 13 players (Rice was listed as an error, but now is not). The incorrect selections are: McGwire, Dawson, Garvey, Baines, Santo, and Parker. The incorrect omissions are Brock, Appling, Yount, Kiner, Boudreau, Campanella, and Jackie Robinson.

Many of the errors are understandable; McGwire, Campanella, and Robinson, for instance, whose status is heavily influenced by factors other than their statistics. But a couple of the mistakes arise from the choice of data; Brock, of course, but also Robin Yount, who winds up misclassified because he had only three All-Star selections -- by far the lowest of any HOFer in the 1950-2002 era. (The next lowest was 6, by Willie McCovey and Billy Williams. And all of the HOFers that the model failed to predict had 8 or fewer.)

The authors defend the use of All-Star selections on the basis that it's a proxy for position played; that's somewhat reasonable, and I guess that's why it somewhat works.

Anyway: is this method better than others, most notably Bill James' algorithms? Strangely, although the authors cite both of James' methods, they don't compare them to their own. My guess is that Bill's methods are probably at least as accurate as the ones in this paper. And Bill's have the advantage that we actually learn something from them -- they help us figure out what it takes to get into the Hall of Fame. The method in this paper, while perhaps being objective, accurate, and complex, doesn't tell us anything except its predictions, and so we don't learn very much about baseball from it.

P.S. The paper assumes that all sabermetrics comes from SABR. This is, of course, not true.

Labels: ,