Thursday, May 25, 2017

Pete Palmer on luck vs. skill

Pete Palmer has a new article on skill and luck in baseball, in which he crams a whole lot of results into five pages. 

It's called "Calculating Skill and Luck in Major League Baseball," and appears in the new issue of SABR's "Baseball Research Journal."  It's downloadable only by SABR members at the moment, but will be made publicly available when the next issue comes out this fall.

For most of the results, Pete uses what I used to call the "Tango method," which I should call the "Palmer method," because I think Pete was actually the first to use it in the context of sabermetrics, in the 2005 book "Baseball Hacks."  (The mathematical method is very old; Wikipedia says it's the "Bienaymé formula," discovered in 1853. But its use in sabermetrics is recent, as far as I can tell.)

Anyway, to go through the method yet one more time ... 

Pete found that the standard deviation (SD) of MLB season team wins, from 1981 to 1990, was 9.98. Mathematically, you can calculate that the expected SD of luck is 6.25 wins. Since a team's wins are the sum of (a) its expected wins due to talent, and (b) its deviation due to luck, the 1853 formula says

SD(actual)^2 = SD(talent)^2 + SD(luck)^2

Subbing in the numbers, we get

9.98^2 = SD(talent)^2 + 6.25^2

Which means SD(talent) = 7.78.

In terms of the variation in team wins for single seasons from 1981 to 1990, we can estimate that differences in skill were only slightly more important than differences in luck -- 7.8 games to 6.3 games.
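
If you want to check the arithmetic yourself, it's only a couple of lines of code. Here's a minimal Python sketch -- the 9.98 and 6.25 figures come from Pete's article, and the binomial helper is just my illustration of where a luck SD of that size comes from:

import math

def talent_sd(observed_sd, luck_sd):
    # var(actual) = var(talent) + var(luck), so solve for SD(talent)
    return math.sqrt(observed_sd ** 2 - luck_sd ** 2)

def binomial_luck_sd(games, p=0.5):
    # luck SD of wins for a coin-flip team over a season of 'games' games
    return math.sqrt(games * p * (1 - p))

print(binomial_luck_sd(162))     # about 6.4 for a full 162-game season
print(talent_sd(9.98, 6.25))     # about 7.78

(Pete's 6.25 is a bit lower than the 6.4 you'd get for a full 162-game season; my guess is that reflects the strike-shortened 1981 season in the sample, but that's only a guess.)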

------

That 7.8 is actually the narrowest spread of team talent for any decade. The talent spread had been narrowing since the beginning of baseball, but it seems to have widened a bit since 1990. Here's part of Pete's table:

decade 
ending   SD(talent)
-------------------
 1880     9.93
 1890    14.44
 1900    14.72
 1910    15.33
 1920    13.06
 1930    12.51
 1940    13.66
 1950    12.99
 1960    11.95
 1970    11.17
 1980     9.75
 1990     7.78
 2000     8.46
 2010     9.87
 2016     8.91

Anyway, we've seen that many times, in various forms (although perhaps not by decade). But that's just the beginning of what Pete provides. I don't want to give away his entire article, but here are some of the findings I hadn't seen before, at least not in this form:

1. For players who had at least 300 PA in a season, the spread in their batting averages is caused roughly equally by luck and skill.

2. Switching from BA to NOPS (normalized on-base plus slugging), skill now surpasses luck, by an SD of 20 points to 15.

3. For pitchers with 150 IP or more, luck and skill are again roughly even.

In the article, these are broken down by decade. There's other stuff too, including comparisons with the NBA and NFL (OK, that's not new, but still). Check it out if you can.

-------

OK, one thing that surprised me. Pete used simulations to estimate the true talent of teams, based on their W-L record. For instance, teams who win 95-97 games are, on average, 5.6 games lucky -- they're probably 90- or 91-win talents rather than 96.

That makes sense, and is consistent with other studies that tried to figure out the same thing. But Pete went one step further: he found actual teams that won 95-97 games, and checked how they did next year.

For the year in question, you'd expect them to have been 91-win teams. For the following year, though, you'd expect them to be *worse* than 91 wins, because team talent tends to revert to .500 over the medium term, unless you're a Yankee dynasty or something.

But ... for those teams, the difference was only six-tenths of a win. Instead of being 91 wins (90.8), they finished with an average of 90.2.

I would have thought the difference would have been more than 0.6 wins. And it's not just this group. Among the groups of teams that finished anywhere from 58 to 103 wins, none regressed more than 1.8 wins beyond its luck estimate.

I guess that makes sense, when you think about it. A 90-win team is really an 87-win talent. If they regress to 81-81 over the next five seasons, that's only about one win per year. It's my intuition that was off, and it took Pete's chart to make me see that.
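
Here's a quick simulation that shows the general idea -- not Pete's actual method, just my own sketch, assuming team talent is normally distributed around 81 wins with the 7.78 SD from above, and that every game is a coin flip weighted by talent:

import random

TALENT_SD = 7.78     # SD of true-talent wins, from the 1981-90 estimate above
GAMES = 162
SIMS = 100_000

lucky_talents = []
for _ in range(SIMS):
    talent_wins = random.gauss(81, TALENT_SD)    # the team's true-talent win total
    p_win = talent_wins / GAMES
    wins = sum(random.random() < p_win for _ in range(GAMES))
    if 95 <= wins <= 97:
        lucky_talents.append(talent_wins)

print(sum(lucky_talents) / len(lucky_talents))   # roughly 90-91 -- about 5-6 wins of luck

Note that the simulation holds talent fixed from season to season; in real life, talent itself drifts back toward .500, which is the extra bit of regression Pete measured -- and which turns out to be surprisingly small.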







Wednesday, May 17, 2017

The hot hand debate vs. the clutch hitting debate

In the "hot hand" debate between Guy Molyneux and Joshua Miller I posted about last time, I continue to accept Guy's position, that "the hot hand has a negligible impact on competitive sports outcomes."

Josh's counterargument is that some evidence for a hot hand has emerged, and it's big. That's true: after correcting for the error in the Gilovich paper, Miller and co-author Adam Sanjurjo did find evidence for a hot hand in the shooting data of Gilovich's experiment. They also found a significant hot hand in the NBA's three-point shooting contest.

I still don't believe that those necessarily suggest a similar hot hand "in the wild" (as Guy puts it), especially considering that to my knowledge, none has been found in actual games. 

As Guy says,


"Personally, I find it easy to believe that humans may get into (and out of) a rhythm for some extremely repetitive tasks – like shooting a large number of 3-point baskets. Perhaps this kind of “muscle memory” momentum exists, and is revealed in controlled experiments."

-------

Of course, I keep an open mind: maybe players *do* get "hot" in real game situations, and maybe we'll eventually see evidence for it. 

But ... that evidence will be hard to find. As I have written before, and as Josh acknowledges himself, it's hard to pinpoint when a "hot hand" actually occurs, because streaks happen randomly without the player actually being "hot."

I think I've used this example in the past: suppose you have a player who's a 50 percent shooter when he's normal, but who turns into a 60 percent shooter when he's "hot," which is one-tenth of the time. His overall rate is 51 percent.

Suppose that player makes three consecutive shots. Does that mean he's in his "hot" state? Not necessarily. Even when he's "normal," he's going to have times where he makes three consecutive shots just by random luck. And since he's "normal" nine times as often as he's "hot," the normal streaks will outweigh the hot streaks.

Specifically, only about 16 percent of three-hit streaks will come when the player is hot. In other words, roughly five out of six streaks are false positives.

(Normally, he makes three consecutive shots one time in 8. Hot, he makes three consecutive shots one time in 4.63. In 100 sequences, he'll be "normal" 90 times, for an average of 11.25 streaks. In his 10 "hot" times, he'll make 2.16 streaks. That's about a 5:1 ratio.)

Averaging the real hotness with the fake hotness, the player will shoot about 51.6 percent after a streak. But his overall rate is 51.0 percent. It takes a huge sample size to notice the difference between 51 percent and 51.6 percent.
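
If you want to verify those numbers, the Bayes calculation is only a few lines of Python:

P_HOT = 0.10             # the player is "hot" one-tenth of the time
P_MAKE_NORMAL = 0.50
P_MAKE_HOT = 0.60

streak_normal = P_MAKE_NORMAL ** 3      # 1 in 8
streak_hot = P_MAKE_HOT ** 3            # about 1 in 4.63

# What fraction of three-make streaks come while the player is actually hot?
p_hot_given_streak = (P_HOT * streak_hot) / (P_HOT * streak_hot + (1 - P_HOT) * streak_normal)
print(p_hot_given_streak)               # about 0.16

# Expected shooting percentage on the shot after a streak
after_streak = p_hot_given_streak * P_MAKE_HOT + (1 - p_hot_given_streak) * P_MAKE_NORMAL
print(after_streak)                     # about 0.516, against an overall rate of 0.510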

Even if you do notice a difference, does it really make an impact on game decisions? Are you really going to give the player the ball more because his expectation is 0.6 percentage points higher, for an indeterminate amount of time?

-------

And that's my main disagreement with Josh's argument. I do acknowledge his finding that there's evidence of a "muscle memory" hot hand, and it does seem reasonable to think that if there's a hot hand in one circumstance, there's probably one in real games. After all, *part* of basketball is muscle memory ... maybe it fades when you don't take shots in quick succession, but it still seems plausible that maybe, some days you're more calibrated than others. If your muscles and brain are slightly different each day, or even each quarter, it's easy to imagine that some days, the mean of your instinctive shooting motion is right on the money, but, other days, it's a bit short.

But the argument isn't really about the *existence* of a hot hand -- it's about the *size* of the hot hand, whether it makes a real difference in games. And I think Guy is right that the effect has to be negligible. Because, even if you have a very large change in talent, from 50 percent to 60 percent -- and a significant frequency of "hotness," 10 percent of the time -- you still only wind up with a 0.6 percentage point increase in expectation after a streak of three hits.

You could argue that, well, maybe 50 to 60 percent understates the true effect ... and you could get a stronger signal by looking at longer streaks.

That's true. But, to me, that argument actually *hurts* the case for the hot hand. Because, with so much data available, and so many examples of long streaks, a signal of high-enough strength should have been found by now, no? 


-------

This debate, it seems to me, echoes the clutch hitting debate almost perfectly.

For years, we framed the state of the evidence as "clutch hitting doesn't exist," because we couldn't find any evidence of signal in the noise. Then, a decade ago, Bill James published his famous "Underestimating the Fog" essay, in which he argued (and I agree) that you can't prove a negative, and that the "fog" is so thick that there could, in fact, be a true clutch hitting talent that we have been unable to notice.

That's true -- clutch hitting talent may, in fact, exist. But ... while we can't prove it doesn't exist, we CAN prove that if it does exist, it's very small. My study (.pdf) showed the spread (SD) among hitters would have to be less than 10 points of batting average (.010). "The Book" found it to be even smaller, .008 of wOBA (a metric that includes all offensive components, but is scaled to look like on-base percentage). 

In my experience, a sizable part of the fan community seizes on the "clutch hitting could be real" finding, but ignores the "clutch hitting can't be any more than tiny" finding.

The implicit logic goes something like this:

1. Bill James thinks clutch hitting exists!
2. My favorite player came through in the clutch a lot more than normal!
3. Therefore, my favorite player is a clutch hitter who's much better than normal when it counts!

But that doesn't follow. Most strong clutch hitting performances will happen because of luck. Your great clutch hitting performance is probably a false positive. Sure, a strong clutch performance is more likely to happen given that a player is truly clutch, but, even then, with an SD of 10 points, there's no way your .250 hitter who hit .320 in the clutch is anything near a .320 clutch hitter. If you did the math, maybe you'd find that you should expect him to be .253, or something. 
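
That ".253" is just regression to the mean. Here's a rough sketch of the calculation -- the 70 clutch at-bats are my own assumption for the sake of illustration, not a number from the study:

CLUTCH_AB = 70            # assumed number of clutch at-bats (my illustration only)
overall_ba = 0.250
observed_clutch_ba = 0.320
clutch_talent_sd = 0.010  # the "10 points of batting average" spread

luck_var = overall_ba * (1 - overall_ba) / CLUTCH_AB                 # binomial noise in 70 AB
shrink = clutch_talent_sd ** 2 / (clutch_talent_sd ** 2 + luck_var)

estimate = overall_ba + shrink * (observed_clutch_ba - overall_ba)
print(round(estimate, 3))    # about 0.253

With so few clutch at-bats and so small a talent spread, the shrinkage factor is only a few percent, which is why the estimate barely moves off .250.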

Well, it's the same here, with the hot hand:

1. Miller and Sanjurjo found a real hot hand!
2. Therefore, hot hand is not a myth!
3. My favorite player just hit his last five three-point attempts!
4. Therefore, my player is hot and they should give him the ball more!

Same bad logic. Most streaks happen because of luck. The streak you just saw is probably a false positive. Sure, streaks will happen given that a player truly has a hot hand, but, even then, given how small the effect must be, there's no way your usual 50-percent-guy is anything near superstar level when hot. If you had the evidence and did the math, maybe you'd find that you should expect him to be 52 percent, or something.

-------

For some reason, fans do care about whether clutch hitting and the hot hand actually happen, but *don't* care how big the effect is. I bet psychologists have a name for this cognitive fallacy -- the "Zero Shades of Grey" fallacy, or the "Give Them an Inch" fallacy, or the "God Exists Therefore My Religion is Correct" fallacy, or something -- where people refuse to believe something exists without evidence, but, once given license to believe, are willing to assign it whatever properties their intuition comes up with.

So until someone shows us evidence of an observable, strong hot hand in real games, I would have to agree with Guy:


"... fans’ belief in the hot hand (in real games) is a cognitive error."  

The error is not in believing the hot hand exists, but in believing the hot hand is big enough to matter. 

Science may say there's a strong likelihood that intelligent life exists on other planets -- but it's still a cognitive error to believe every unexplained light in the sky is an alien flying saucer.


