Tuesday, December 02, 2014

Players being "clutch" when targeting 20 wins -- a follow-up

In his 2007 essay, "The Targeting Phenomenon" (subscription required), Bill James discussed how there are more single-season 20-game winners than 19-game winners. That's the only win total that occurs more often than the one just below it.

This is obviously a case of pitchers targeting the 20-win milestone, but Bill didn't speculate on the actual mechanisms for how the target gets hit. In 2008, I tried to figure it out. But, this past June, Bill pointed out that my conclusion didn't fit with the evidence:

"... the Birnbaum thesis is that the effect was caused one-half by pitchers with 19 wins getting extra starts, and one-half by poor offensive support by pitchers going for their 21st win, thus leaving them stuck at 20. But that argument doesn't explain the real life data. 

"[If you look closely at the pattern in the numbers,] the bulge in the data is exactly what it should be if 20 is borrowing from 19 -- and is NOT what it should be if 20 is borrowing both from 19 and 21."

(Here's the link.  Scroll down to OldBackstop's comment on 6/6/2014.)

So, I rechecked the data, and rethought the analysis, and ... Bill is right, as usual. The basic data was correct, but I didn't do the adjustments properly.

-----

My original study covered 1940 to 2007. This study, though, will cover only 1956 to 2000. That's because I couldn't find my original code and data. The "1956" is what I happened to have handy, and I decided to stop at 2000 because Bill did. 

First, here are the raw numbers of seasons with X wins:

17 wins: 159
18 wins: 132
19 wins:  92
20 wins: 113
21 wins:  56
22 wins:  35
23 wins:  20
24 wins:  20

You can see the bulge we're dealing with: there are way too many 20-win pitchers. And it can't be that the excess comes from the 21-win bucket, because then the average of the 20 and 21 buckets would be unchanged by the correction, and that average (84.5) isn't much lower than the 19 bucket (92). That can't be right. And, as Bill pointed out, even if only *half* the excess came from the 21 bucket, 20 would still be too big relative to 19.
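As a quick check on the raw table, here's a minimal Python sketch that scans the counts for any win total that occurs more often than the total just below it. The dictionary just restates the numbers above; the function name is only illustrative.

```python
# Raw season counts from the table above (1956-2000).
seasons = {17: 159, 18: 132, 19: 92, 20: 113, 21: 56, 22: 35, 23: 20, 24: 20}

def find_bulges(counts):
    """Return win totals that occur strictly more often than the total one below."""
    return [w for w in sorted(counts)
            if w - 1 in counts and counts[w] > counts[w - 1]]

bulges = find_bulges(seasons)
raw_gap = seasons[20] - seasons[19]
print(bulges, raw_gap)
```

Only the 20-win bucket gets flagged (23 and 24 are tied, which doesn't count as a bulge here), and the raw gap between 20 and 19 is the +21 the rest of the post tries to explain.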

So, let me try fixing the problem.

In the other study, I checked four ways in which 20 wins could get targeted:

1. Extra starts for pitchers getting close
2. Starters left in the game longer when getting close
3. Extra relief appearances for pitchers getting close
4. Better performance or luck when shooting for 20 than when shooting for 21.

I'll take those one at a time.

-------

1. Extra starts

The old study found that pitchers who eventually wound up at 19 or 20 wins did, in fact, get more late-season starts than others -- about 23 more overall. In this smaller study (1956-2000 instead of 1940-2007), that translates down to maybe 18 extra starts. 

That's about 9 extra wins. Let's allocate four of them to pitchers who wound up at 19 instead of 18, and the other five to pitchers who wound up at 20 instead of 19. If we back that out of the actual data, we get:

18 wins: 132 136
19 wins:  92  93
20 wins: 113 108
21 wins:  56  56

(If you're reading this on a newsfeed that doesn't support font variations: the first column is the old values, which should be struck out.)

What happens is: the 18 bucket gets back the four pitchers who won 19 instead. The 19 bucket loses those four pitchers, but gains back the five pitchers who won 20 instead of 19. The 20 bucket loses those five pitchers.

(In the other study, I didn't bother backing out the effects as I found them, so I wound up taking some of them from the wrong place, which caused the problem Bill found.)

So, we've closed the gap from 21 down to 15.

--------

2. Starters left in the game longer

After I had posted the original study, Dan Rosenheck commented,
"You didn't look at innings per start. I bet managers leave guys with 19 W's in longer if they are tied or trailing in the hope that the lineup will get them a lead before they depart."

I checked, and Dan was right. In a subsequent comment, I figured Dan's explanation accounted for about 10 extra twenty-game winners. Those are all taken from the 19-game bucket, because the effect occurred only for starters currently pitching with 19 wins.

For this smaller dataset, I'll reduce the effect from 10 seasons to 7. 

So:

18 wins: 136 136
19 wins:  93 100
20 wins: 108 101
21 wins:  56  56

Now, the bulge is down to 1.  We still have a ways to go, if the 19 is to be significantly higher than the 20, but we're getting there.

---------

3. Extra Relief Appearances

The other study listed every pitcher who got a win in relief while nearing 20 wins. Counting only the ones from 1956 to 2000, we get:

3 pitchers winding up at 19
5 pitchers winding up at 20
2 pitchers winding up at 21

Backing those out:

18 wins: 136 139
19 wins: 100 102
20 wins: 101  98
21 wins:  56  54

The gap now goes the proper direction, but only slightly.

------

4. Luck

This was the most surprising finding, and the one responsible for the "getting stuck at 20" phenomenon. Pitchers who already had 20 wins were unusually unlikely to get to 21 in a subsequent start. Not because they pitched any worse, but because they got poor run support from their offense.

When Bill pointed out the problem, I wondered if the run-support finding was just a programming mistake. It wasn't -- or, at least, when I rewrote the program, from scratch, I got the same result.

For every current starter win level, here are the pitchers' W-L records in those starts, along with the team's average runs scored and allowed:

17 wins:   483-311 .608   4.30-3.61
18 wins:   350-250 .583   4.30-3.61
19 wins:   260-182 .588   4.24-3.56
20 wins:   150-136 .524   3.81-3.54
21 wins:    94- 61 .606   4.49-3.44
22 wins:    59- 23 .720   4.26-2.80

The run support numbers are remarkably consistent -- except at 20 wins. Absent any other explanation, I assume that's just a random fluke.

If we assume that the 20-win starters "should have" gone 171-115 (.598) instead of 150-136 (.524), that makes a difference of 21 wins.
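To make that arithmetic explicit, here's a small Python sketch. The .598 is the assumed "should have" rate from the text, not something derived here, and the variable names are just for illustration.

```python
wins, losses = 150, 136      # actual record in starts made with 20 wins
expected_pct = 0.598         # the assumed "should have" winning percentage

decisions = wins + losses
expected_wins = round(decisions * expected_pct)
expected_losses = decisions - expected_wins
missing_wins = expected_wins - wins

print(f"{expected_wins}-{expected_losses}, {missing_wins} wins short")
```

Changing `expected_pct` by a few points moves the answer by a win or two, so the 21 should be read as an estimate rather than an exact count.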

The mistake I made in the previous study was to assume that those wins were all stolen from the "21-win" bucket. Some were, but not all. Some of the unlucky pitchers eventually got past the 20-win mark; a few, for instance, went on to post 23 wins. In their case, it becomes the 23-win bucket stealing a player from the 24-win bucket.

I checked the breakdown. For every starter who tried for his 21st win but didn't achieve it that game, I calculated where he eventually finished the season. From there, I scaled the totals down to 21, the number of wins lost to bad luck. The result:

20  wins:  9 pitchers
21  wins:  5 pitchers
22  wins:  3 pitchers
23  wins:  1 pitcher
24  wins:  2 pitchers
25+ wins:  less than 1 pitcher

So: the 20-win bucket stole 9 pitchers from the 21-win bucket. The 21-win bucket stole 5 pitchers from the 22-win bucket. And so on. 

Adjusting the overall numbers gives this:

18 wins: 139 139
19 wins: 102 102
20 wins:  98  89
21 wins:  54  58
22 wins:  35  37

-------

And that's where we wind up. It's still not quite enough, to judge by Bill's formula and even just the eyeball test. It still looks like there's a little bulge at 20, by maybe five pitchers. If 20 could steal five more pitchers from 19, we'd be at 107/84, which would look about right.

But, we've done OK. We started with a difference of +21 -- that is, 21 more twenty-game winners than nineteen-game winners -- and finished with a difference of -13. That means we found an explanation for 34 pitchers, out of what looks like a 39-pitcher discrepancy.
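The whole chain of adjustments can be replayed in a few lines. This is only a sketch of the bookkeeping, assuming the pitcher counts quoted in each of the four sections; the `back_out` helper and the step labels are mine.

```python
def back_out(counts, moves):
    """Apply (src, dst, n) moves: n pitchers leave the bucket they finished in
    (src) and return to the bucket they'd have finished in without the effect (dst)."""
    adjusted = dict(counts)
    for src, dst, n in moves:
        adjusted[src] = adjusted.get(src, 0) - n
        adjusted[dst] = adjusted.get(dst, 0) + n
    return adjusted

raw = {17: 159, 18: 132, 19: 92, 20: 113, 21: 56, 22: 35, 23: 20, 24: 20}

steps = [
    ("extra starts", [(19, 18, 4), (20, 19, 5)]),
    ("left in longer", [(20, 19, 7)]),
    ("relief wins", [(19, 18, 3), (20, 19, 5), (21, 20, 2)]),
    # Bad run support: pitchers stuck one win short move UP a bucket.
    # Bucket 25 isn't in the raw table, so it only ever holds the change.
    ("run support", [(20, 21, 9), (21, 22, 5), (22, 23, 3), (23, 24, 1), (24, 25, 2)]),
]

counts = raw
gaps = [counts[20] - counts[19]]          # 20-win bucket minus 19-win bucket
for label, moves in steps:
    counts = back_out(counts, moves)
    gaps.append(counts[20] - counts[19])
    print(f"{label:15s} 19: {counts[19]:3d}  20: {counts[20]:3d}  gap: {gaps[-1]:+d}")
```

The gap goes +21, +15, +1, -4, -13, matching the running totals in the sections above.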

Where would the other five come from? I don't know. It could be luck and rounding errors. It could also be that the years 1956-2000 aren't a representative sample of the original study, so we lost a bit of accuracy when I scaled down.  Or, it could be some fifth real factor I haven't thought of.

In any case, here's the final breakdown of the number of "excess" 20-game winners:

-- 5 from getting extra starts;
-- 7 from being left in games longer than usual;
-- 3 from getting extra relief appearances;
-- 9 from bad run support getting them stuck at 20;
-- 5 from luck/rounding/sources unknown.

By the way, one important finding still stands through both studies. Starters didn't seem to pitch any better than normal with their 20th win on the line, so you can't accuse them of trying harder in the service of a selfish personal goal.





Sunday, August 02, 2009

Pitchers targeting 20 wins -- followup and slides

Last year, I ran a study on why there are more pitchers who win 20 games in a season than 19. I updated that study slightly for my presentation at last week's SABR convention, and the Powerpoint slides (.ppt) are now available on my website, or by direct click here.



Thursday, March 27, 2008

Players being "clutch" when targeting 20 wins -- a study

In my previous post, I speculated about the anomaly, discovered by Bill James, that there are more 20-game winners than 19-game winners in the major leagues. That is the only case, between 0 and 30, where a higher-win season happens more frequently than a lower-win season.

Here, once again, are some of the win frequencies. For instance, there were 123 seasons of exactly 19 wins since 1940. (All numbers in this study are 1940-2007.)


16 311
17 221
18 185
19 123
20 144
21 92
22 54

In the previous post, I suggested that the bulge at twenty wins appears to be about 29 "too high". So we'll proceed as if there are an extra 29 twenty-win seasons to be explained.

I did a little digging to see if I could figure out what caused this to happen. I think I have an answer, and it's a bit of a surprise.


1. Extra Starts
The first thing I looked at was whether pitchers with 18 or 19 wins late in the season would be given an extra start near the end of the season to try to hit the 20. So for each group of pitchers, I checked what percentage of their starts came in September or later:

16  win pitchers: 17.53% of starts in September
17  win pitchers: 17.77% of starts in September
18  win pitchers: 18.36% of starts in September
19  win pitchers: 18.49% of starts in September
20  win pitchers: 18.47% of starts in September
21  win pitchers: 18.15% of starts in September
22+ win pitchers: 18.18% of starts in September


So it looks like there's a positive relationship between September starts and eventual wins, and a little bulge that happens in the 18-20 range. Maybe those pitchers are getting extra starts, or, as Greg Spira suggested, perhaps the other pitchers miss a start in favor of a minor-league callup, while the pitchers with a shot at 20 are given all their usual starts. The bulge appears to be about a quarter of a percent.

If we assume that, without the special treatment, the September share for the 19- and 20-win pitchers would have been 0.25 percentage points lower, that's 23 of their 9,229 combined starts. If half those starts led to a 19-game winner becoming a 20-game winner, that's an extra 11 pitchers in the twenty-win column. It seems reasonable – it's less than 29, anyway, which is the number we're trying to explain.
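Here's that estimate as a minimal Python sketch. The quarter-percent bulge and the "half become wins" rate are the assumptions stated in the text; the variable names are illustrative.

```python
combined_starts = 9229   # starts by eventual 19- and 20-game winners
bulge = 0.0025           # assumed excess share of starts made in September
win_rate = 0.5           # assumed: half the extra starts turn a 19 into a 20

extra_starts = round(combined_starts * bulge)
extra_twenty_game_winners = int(extra_starts * win_rate)   # 11.5, rounded down

print(extra_starts, extra_twenty_game_winners)
```

The result is sensitive to the size of the bulge: reading it as 0.20% or 0.30% instead would give roughly 9 or 14 pitchers.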


2. Relief Appearances
Greg also suggested, in the previous post, that pitchers with 19 wins may be given an extra late-season relief appearance to try to get their twentieth win. I checked Retrosheet game logs, and Greg is right – there has been some of that going on.

I found all September relief appearances for eventual 20-game winners, where they had at least 18 wins at the time of the relief appearance, and they got a decision (Retrosheet game logs won't list a reliever unless he wins or loses, but that doesn't matter for this study). Here they are:

1951: Early Wynn gets his 18th win [in relief]
1951: Mike Garcia gets his 19th win
1956: Billy Hoeft gets his 20th win
1957: Jim Bunning gets his 20th win
1964: Dean Chance has 19 wins but loses
1966: Chris Short gets his 20th win
1991: John Smiley gets his 19th win
1997: Randy Johnson gets his 20th win


So that accounts for seven 20-game winners who would otherwise have won only 19 games.

But what about 19-game winners? Those guys may get extra relief appearances too. I checked, and there are fewer of them.

1956: Brooks Lawrence (18th win, loss at 19 wins)
1962: Art Mahaffey has 19 wins but loses
1964: Bob Gibson gets his 19th win
1964: Jim Bunning gets his 19th win
1974: Ken Holtzman has 19 wins but loses


(Just out of interest, Bob Gibson's appearance was in the last game of the 1964 season, so he couldn't have been going for 20.)

And here are the late-season relief decisions for the eventual 21-game winners:

1940: Bobo Newsom (20th win)
1940: Rip Sewell (20th win)
1946: Howie Pollet (18th win)
1947: Johnny Sain (21st win)
1959: Sam Jones (18th win, loss at 20 wins)
1960: Warren Spahn (loss at 17 wins)
1960: Ernie Broglio (19th win)
1965: Mudcat Grant (loss at 21 wins)


So it looks like pitchers do get extra relief appearances in pursuit of high win totals (or *did* -- most of these guys were pre-1970). There were four 19-game winners created this way; seven 20-game winners; and six 21-game winners.

The difference between 19 and 20, here, is three players – a lot fewer than I would have thought. But 3 is something, again when there's only 29 to explain.


3. Clutch Pitching
Maybe, when going for their 20th win, a player will bear down and pitch better than usual. I found all starting pitchers with exactly 19 wins, and looked at how they did in the start(s) that would give them their 20th win.

In 704 such starts, they went 367-193 (.655). I couldn't get their ERA or runs allowed from the Retrosheet game logs, but I did get the average number of runs their *team* allowed in those games. It was 3.54.

That doesn't mean much without context. Here are the results for some other win totals:

17 wins: 3.72 runs allowed, .658, 670-348 in 1385 starts
18 wins: 3.54 runs allowed, .652, 487-260 in 982 starts
19 wins: 3.54 runs allowed, .655, 367-193 in 704 starts
20 wins: 3.62 runs allowed, .615, 227-142 in 490 starts
21 wins: 3.53 runs allowed, .676, 138-66 in 273 starts
22 wins: 3.34 runs allowed, .774, 82-24 in 148 starts


Now, we have something: immediately after hitting the 20-win mark, the starters suddenly became a lot less likely to win. Instead of a winning percentage of maybe .660, which you would have expected (remember that the more wins, the better the pitchers, so the winning percentage should increase down the list), they wound up at only .615. That's .045 points in 369 decisions, or about 17 wins – more than half of the 29 we're trying to explain!

By this measure, it looks like this half of the anomaly is not too many 20-game winners relative to 19-game winners, but that poor performance at 20 causes a logjam keeping the 20s from getting to 21.

But: if you look at runs allowed, the performance at 20 wins doesn't seem all that bad. It should be around 3.54, and it's at 3.62. That's .08 runs for each of their 490 starts -- about 40 runs. How did these pitchers win 17 fewer games while allowing only 40 extra runs? Forty runs is 4 games, not 17 games.

The answer: run support. Here's the pitchers' run support for each category:

17 wins: 4.39 run support
18 wins: 4.41 run support
19 wins: 4.45 run support
20 wins: 4.05 run support
21 wins: 4.46 run support
22 wins: 4.48 run support


The 4.05 is not a typo. When starting a game with 20 wins, pitchers got four-tenths of a run less support than they should have. That's huge. Over 490 games, it's almost 200 runs. That wipes out 20 wins, which keeps twenty 20-win pitchers from getting to 21 wins.

I have no idea why this should happen. I suppose it's possible that, seeing how the ace already has 20 wins, the manager might play his bench for this meaningless September game. But how often would that happen? No way it would be enough for 0.4 runs per game, would it?

By the way, it looks like these 20-game winners beat Pythagoras in these starts. They finished only 17 games below expectation, while losing 240 runs (40 pitching, 200 hitting). Assigning blame in proportion over those 17 extra games, we'll say that 3 of the extra losses came from pitching, and 14 from run support.
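The proportional split can be sketched in Python as follows. The 10-runs-per-win conversion is the rule of thumb implied by "forty runs is 4 games"; the dictionary keys and variable names are only illustrative.

```python
lost_runs = {"pitching": 40, "run support": 200}   # approximate run deficits
missing_wins = 17
runs_per_win = 10   # rule of thumb implied by "forty runs is 4 games"

total_runs = sum(lost_runs.values())
expected_losses = total_runs / runs_per_win   # what 240 runs "should" have cost

# Split the 17 actual missing wins in proportion to the runs lost on each side.
blame = {source: round(missing_wins * runs / total_runs)
         for source, runs in lost_runs.items()}

print(expected_losses, blame)
```

At 10 runs per win, 240 runs would normally cost about 24 games, which is the sense in which finishing only 17 below expectation beats Pythagoras.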

I find it something of a relief that it was run support, and not (positive) clutch performance on the part of the pitchers, that caused the effect – it wasn't the case that they pitched better when close to a (selfish) goal. Going for their 20th win, pitchers did not appear to do any better or worse than when going for their 18th, 19th, or 22nd wins. And they pitched only marginally better than when going for their 21st.

It's human nature that pitchers want to win 20 for personal reasons, but at least the evidence is that they try just as hard every other game of the year.


4. Concluding
Summarizing these results, we were looking for 29 "extra" 20-game seasons. We got:

-- 11 from extra starts
-- 3 from extra relief appearances
-- 3 from pitchers' own poorer performance in subsequent games
-- 14 from poor run support from their teammates in subsequent games.

That adds up to 31 games, which is close enough to our original estimate of 29.

It's interesting that about half the effect comes from 19-game winners getting extra chances to hit 20, and the other half comes from 20-game winners being unable to rise to 21.

And, to me, the biggest surprise is that about 45% of the 20-game-winner effect (14 of the 31) came from that huge hole in run support. In other words, a big part of the surplus of 20-game pitchers is probably just random luck.



------

UPDATE (12/2014, almost seven years later): Bill James points out that the results don't quite work.  I've updated the analysis in a new post here.



Tuesday, March 25, 2008

Do players turn "clutch" when chasing a personal goal?

The better a level of performance, the harder it is to achieve it. There should be more .270 hitters than .275 hitters, more .275 hitters than .285 hitters, and so on.

But, surprisingly, there's an exception: significantly more players hit .300 to .304 in a season than .299 to .296.

That finding comes from Bill James' study, "The Targeting Phenomenon" (subscription required, but the essay is also on page 67 of "The Bill James Gold Mine").

For pitcher wins, Bill found a similar exception that's even more striking. More pitchers win 0 games than 1. More pitchers win 1 game than 2. More pitchers win 2 games than 3. And so on, all the way up to 30 wins. But there's one exception – 20. Significantly more pitchers finish with 20 wins than with 19.

Why? Because, Bill argues, players care about hitting their "targets."

"[Brooks Robinson] had a miserable year in 1963, and went into his last at bat of the season hitting exactly .250—147 for 588. If he made an out, he wound up the season hitting under .250—but he got a hit, and wound up at .251. He said it was the only hit he got all season in a pressure situation. ...
"[P]layers WANT to wind up the season hitting .250, rather than in the .240s. They tend to make it happen."


The implication is that there's a kind of clutch effect happening here, where the player somehow gets better when the target is near. But if that's true, wouldn't that point to baseball players as selfish? Studies have shown very little evidence for clutch hitting when the *game* is on the line. If players care more about hitting .300 than winning the game, that doesn't say much for their priorities.

(Although, in fairness, it should be acknowledged that the opposition is probably trying harder to stop Brooks Robinson from driving in the game-winning run than it is to keep him from getting to .250. For the record, Robinson's final 1963 hit drove in the third run in the ninth inning of a 7-3 loss to the Tigers.)

The study also finds that while this kind of targeting happens for batting average, RBIs, wins, and (pitcher) strikeouts, there's no evidence for targeting in SLG, OBP, OPS, saves, or runs scored. For ERA, there's some evidence of targeting, but not enough to say for sure.

Also, Bill finds that targeting seems to have started around 1940. He argues that's the same time as a jump in fan interest in players' statistical accomplishments.

These are very interesting findings, and I wouldn't have expected as much targeting as seems to have actually occurred. But I'm a bit skeptical about clutchness, and whether players really can boost their performance in target-near situations. I wondered if, instead of clutch performance, it might be something else. Maybe, if a player is close to his goal, he is given additional playing time in support of reaching the target.

That is, if a pitcher has 19 wins late in the season, perhaps the manager will squeeze in an extra start for him. Or if a player is hitting .298, maybe they'll let him play every day until he gets to .300, instead of resting him in favor of the September callup. If and when he reaches .300, then they could sit him (as, I think I remember reading, Bobby Mattick did for Alvis Woods in 1980).

To test the "extra start theory," I looked at pitchers since 1940, grouping them by number of wins. I then looked at their winning percentage, number of starts, and the number of seasons in the group:

Wins  W%    Starts  Seasons
16    .606  32.3    311
17    .613  33.0    221
18    .648  33.3    185
19    .650  34.4    123
20    .667  34.9    144
21    .673  34.7     92
22    .691  35.9     54
23    .705  35.9     34
24    .707  38.5     23


So, reading one line of the chart, 20-win pitchers had a .667 winning percentage and an average of 34.9 starts that year. There were 144 seasons in the group.

Looking at the numbers, we do see a bit of an anomaly. More wins normally means more starts, except that pitchers with 20 wins had more starts than pitchers with 21 wins. And, there's a big jump between 18 and 19, more than you'd expect based on the other gaps in that win range.

Suppose we wanted to smooth out the "number of starts" column. We might adjust them like this:

17 33.0
18 33.3
19 34.4 33.8
20 34.9 34.3
21 34.7
22 35.9


Now we have a smooth increasing trend. To get it, we had to remove 0.6 starts from each of the 19- and 20-win groups.

One possible interpretation: when a pitcher has 19 wins near the end of the season, they give him 1.2 extra starts. Half the time, that gives him an extra win, and he goes to 20 (which now shows 0.6 extra starts). The other half, he fails to get the win, and stays at 19 (which also shows 0.6 extra starts).

Another way to look at this in the "win percentage" column: pitchers with 19 wins have almost the same winning percentage as the 18-win guys, which means more losses. And the 20-win guys, at .667, are only .006 away from the 21-win pitchers, which suggests more wins. That's exactly what happens if you take a bunch of 19-win guys, give them an extra start, and reclassify them.

So what do you think of this as an explanation? Does the *average* 19 win late-September pitcher really get 1.2 extra starts? That seems too high to me, although I don't really know. And, some of the effect might not be extra starts, but leaving the pitcher in the game longer when he's losing or tied, long enough for his offense to bail him out and give him the win.


Now look at the last column, the number of seasons. If we were to smooth out that column, we might do it this way:

17 221
18 185
19 123 152
20 144 115
21 92
22 54
23 34


The difference is 29 pitchers in the nineteen-win row, and 29 pitchers in the twenty-win row. Assume those 29 pitchers moved from 19 to 20 because of the extra start. If you figure that these pitchers generally win half their starts, that means about 58 pitchers were given that one extra shot.

58 pitchers in the 68 baseball seasons since 1940 works out to a little less than one pitcher a year getting that extra start. There are normally only about two 19-win pitchers a year, so that means about half of them would have to get the special treatment.
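For what it's worth, here's that back-of-envelope calculation as a Python sketch. It assumes, as the text does, that the extra start is won half the time, and it uses the 123 observed 19-win seasons as the comparison pool; the names are just illustrative.

```python
moved = 29                      # pitchers assumed to have moved from 19 to 20
win_rate = 0.5                  # assumed chance the extra start is a win
n_seasons = 2007 - 1940 + 1     # 1940 through 2007 inclusive
nineteen_win_seasons = 123      # observed 19-win seasons, 1940-2007

given_extra = moved / win_rate                        # pitchers given the extra shot
per_year = given_extra / n_seasons
nineteen_per_year = nineteen_win_seasons / n_seasons
share = per_year / nineteen_per_year

print(round(per_year, 2), round(nineteen_per_year, 2), round(share, 2))
```

That comes out to about 0.85 extra starts a year against about 1.8 nineteen-win pitchers a year, or a bit under half.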

Again, that seems high. However, in support of this theory, the effect diminishes after 1980. In fact, there are now *fewer* pitchers winning 20 than 19:

17 97
18 84
19 43
20 41
21 25
22 12


There's still a bit of an effect, but not as much – in line with Bill's idea that, these days, managers are less likely to pitch an ace on short rest (or leave him in longer in a tie game) just to help him reach a personal goal.

There are probably other possible causes that I haven't thought of.

In any case, it wouldn't be too hard to figure out a decent answer: just head to Retrosheet and look at 19- and 20-game winners. See if their days of rest varied late in the season, which would mean the "extra start" theory is correct. Check whether they were left in the game longer than normal. And check whether they pitched better in late-season games, which would mean the "clutch" theory is correct.

And you can do the same thing for hitting, for players around .300. Is it just a matter of opportunities, or is there some clutchness too? If the latter, that would be a very significant finding. It would suggest, perhaps, that

(a) clutch hitting does exist, and either

(b1) it only shows up for personal goals, or
(b2) it only shows up when the situation is not clutch for the other team.

Maybe I'll do this myself, if nobody else does ...



UPDATE: Part II is here.

