Which players are making baseball games so slow?
Who are the fastest and slowest players in baseball, in terms of keeping the games moving along?
Four years ago, I wrote two posts that tried to answer that question. I took Retrosheet game logs from 2000 to 2009, and ran a huge regression, trying to predict game time (hours and minutes) from the 18 players in the starting lineups, the two starting pitchers, the two teams, the years, the number of pitches, the number of stolen base attempts, and anything else I could think of.
It turned out that players appeared to vary quite a bit -- at the extremes, it appeared that some batters could affect game time by as much as four minutes, and pitchers as much as seven minutes. For the record, here are the "fastest" and "slowest" players I found, in minutes and seconds they appear to add:
+4:10 Slowest batter: Denard Span
-4:11 Fastest batter: Chris Getz
+7:43 Slowest pitcher: Matt Garza
-7:13 Fastest pitcher: Tim Wakefield
+5:45 Slowest catcher: Gary Bennett
-4:39 Fastest catcher: Eddie Perez
A full spreadsheet of my 2010 estimates is here.
Last week, Carl Bialik, of FiveThirtyEight, revisited the question of slow games, and I learned that FanGraphs now has real data available. They took timestamped PITCHf/x game logs, and calculated the average time between pitches, for each batter and pitcher. They call that statistic "pace," and it's available for 2007 to 2014.
So, now, there's some hard data to verify my numbers with.
I found 290 batters who were in both my study and the FanGraphs data. Then, I ran a regression to predict FanGraphs' number from mine.
It turned out that for every extra minute per game that I found, FanGraphs found only about a half a second per pitch (.4778). So if I have a player at +2 minutes per game, you'd expect FanGraphs to have clocked him at +0.95 seconds per pitch.
How many pitches does a starting batter see in a game? Maybe 16? At +0.95 seconds per pitch, that works out to about 15 seconds per game. So, I have 120 seconds, and FanGraphs has 15 seconds. That's a factor of eight!
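Here's that arithmetic as a quick sketch. The slope (.4778) comes from the regression above; the 16 pitches per game is just my guess, and the function and variable names are made up for illustration.

```python
# Slope from the batter regression: FanGraphs seconds per pitch
# for each regression minute per game.
SLOPE_SEC_PER_PITCH_PER_MIN = 0.4778
# Rough guess at pitches seen by a starting batter in one game (an assumption).
PITCHES_PER_GAME = 16


def implied_seconds_per_game(regression_minutes,
                             slope=SLOPE_SEC_PER_PITCH_PER_MIN,
                             pitches=PITCHES_PER_GAME):
    """Per-game time implied by FanGraphs' pace for a given regression estimate."""
    return regression_minutes * slope * pitches


regression_minutes = 2.0
pace_seconds = implied_seconds_per_game(regression_minutes)

# How many times larger the regression estimate is than the pace-implied one.
ratio = (regression_minutes * 60) / pace_seconds

print(f"pace-implied: {pace_seconds:.1f} s, regression: {regression_minutes * 60:.0f} s")
print(f"ratio: {ratio:.1f}")
```

Changing the pitches-per-game guess moves the ratio around, but it would take something like 125 pitches per batter per game to close the gap entirely.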
But, hang on. The correlation between the two measures is surprisingly high -- +0.40. That suggests that, generally, I got the *order* of the players right. Generally, the regression was successful in separating the fast batters from the slow batters.
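For anyone who wants to try the same comparison, here's a minimal sketch of the calculation: an ordinary least-squares slope and a correlation between two lists of paired player estimates. The five data points are invented purely to show the mechanics; they are not from my study or from FanGraphs.

```python
from math import sqrt


def slope_and_correlation(x, y):
    """Return (least-squares slope of y on x, Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up illustration data, one entry per player:
# regression estimate (minutes per game) and pace (seconds per pitch vs. average).
regression_minutes = [-2.0, -1.0, 0.0, 1.0, 2.0]
pace_seconds = [-0.5, -1.0, 0.3, 0.2, 1.0]

slope, r = slope_and_correlation(regression_minutes, pace_seconds)
print(f"slope: {slope:.2f} seconds per pitch per regression minute")
print(f"correlation: {r:.2f}")
```

The slope tells you about the units (how big the effects are), while the correlation tells you about the ordering. The two can disagree, which is what seems to be happening with the batters.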
If you look at the 20 players FanGraphs has as slowest, my study estimated 19 of them as slower than average. (Wil Nieves was the exception.)
For their fastest 19 batters (I skipped over the six-way tie for 20th), my study wasn't quite as good. It found only 11 of 19 as faster than average. But, of FanGraphs' next 19 fastest, my study was 15-4.
My feeling is ... it's not great, but it's not bad. I'm willing to argue that the regression is reasonably capable of differentiating the speedy batters from the time-wasters.
For pitchers, the fit was much better.
The correlation was +0.73, much higher than I had expected. And the units were decent, too. Every minute of slowness per game that the regression found translated to 42 seconds per game in the FanGraphs data (based on a guess of 94 pitches per start).
Why did the pitchers come out so much better than the hitters? My guess: it's hard to separate the effect of Derek Jeter from the other eight batters, because they play together so much of the time. In contrast, for every starting pitcher, there are at least 120 games without him where the rest of the lineup is almost the same. That makes the differences much more obvious for the regression to figure out.
(BTW, Derek Jeter was the second "slowest" batter in my study, at +3.78 minutes per game.)
But, still ... why is the regression so far off for the batters, by a factor of 8?
No doubt, some of it is the effects of random luck. But I think the most important factor is that time between pitches isn't the only factor affecting the length of a game.
In his article, Carl Bialik noted that from 2008 to 2014, Yankees games took 12.8 minutes longer, on average, than Blue Jays games. But, after adjusting for the pace of the hitters and pitchers, there were still 6.5 minutes unexplained.
For the Red Sox, it was +11.8 minutes in total, and +3.0 minutes after adjusting for pace.
That's just a two-team sample, but "pace" appears to be only about half to three-quarters of the reason games take so long.
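The shares implied by those two teams can be worked out directly from Bialik's numbers. This is only a sketch of the arithmetic; it assumes the gap is measured against Blue Jays games in both cases, and the function name is made up.

```python
def pace_share(total_gap_minutes, unexplained_minutes):
    """Fraction of the extra game time that pace accounts for."""
    return (total_gap_minutes - unexplained_minutes) / total_gap_minutes


# Total extra minutes per game, and minutes left over after adjusting for pace.
yankees_share = pace_share(12.8, 6.5)
red_sox_share = pace_share(11.8, 3.0)

print(f"Yankees: pace explains {yankees_share:.0%} of the gap")
print(f"Red Sox: pace explains {red_sox_share:.0%} of the gap")
```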
What's the rest?
There are lots of possibilities. Time between innings? 16 half-inning breaks, an extra 5 seconds each, adds up to 80 seconds. Seventh-inning stretches, I bet those take longer in some parks than others. The time it takes to announce a batter ... are some parks faster than others, and does the batter wait for it? Does it take five extra seconds for crowd cheering when Derek Jeter bats? What about defense ... how long it takes to get the ball back to the pitcher after an out? Do some outfielders take their time throwing the ball back?
To check for the park-related factors, I re-ran my regression, but, this time, I used an extra dummy variable for home/road. And, indeed, some significant differences showed up. Even after controlling for everything else, including the players, games take an extra 1.9 minutes at Yankee Stadium, and an extra 1.4 minutes at Fenway Park.
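To show what a park dummy does in a regression like this, here's a toy version. It is not my actual model, which had far more variables; it is a minimal sketch with one control (total pitches) and one dummy (whether the game was at a hypothetical "park A"). The games are synthetic, built from a made-up rule with no noise, so the regression should recover the park effect that was baked in.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic games: (total pitches, home park). All values are invented.
games = [(280, "A"), (300, "B"), (310, "A"), (290, "B"), (320, "A"), (270, "B")]

# Made-up rule: 100 minutes base + 0.3 minutes per pitch + 2.0 extra minutes at park A.
minutes = [100 + 0.3 * p + (2.0 if park == "A" else 0.0) for p, park in games]

# Design matrix columns: intercept, pitches, park-A dummy (1 if at park A, else 0).
rows = [[1.0, float(p), 1.0 if park == "A" else 0.0] for p, park in games]

intercept, per_pitch, park_a_effect = ols(rows, minutes)
print(f"park A effect: {park_a_effect:.2f} minutes per game")
```

The coefficient on the dummy is the extra time at that park after holding the other variables constant, which is how the Yankee Stadium and Fenway numbers should be read. With real data you'd want one dummy per park and a proper statistics package instead of hand-rolled linear algebra.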
And ... it looks like there might be a pattern: larger-market teams appear to be hosting slower games than smaller-market teams. Here are the teams with the largest unexplained slowness:
+1.45 Red Sox
And the most unexplained quickness:
-2.94 Blue Jays
Maybe it's not a perfect pattern, but it's still suggestive.
So, that's some support for the idea that there is some kind of park-related effect. It doesn't explain why the batter numbers are so extreme, since this is after controlling for players. But, it could still be part team and part player, if (say) the fans cheer an extra 10 seconds for any Yankee batter, but an extra 30 seconds for Derek Jeter.
Another thing that might be happening: there are many players who miss only a few games. I checked Derek Jeter in 2010. He started all but seven games, and every one of those seven was on the road. Could the regression be conflating Jeter games with home games? (It doesn't really show up as a huge confidence interval in the regression, though.)
Or, maybe everyone works faster in meaningless end-of-season games, and those are the ones where Derek Jeter is more likely to be sat out.
You can probably think of other possibilities, some being a real effect, and some a statistical artifact. Or, I might have made a mistake somewhere.
But, I suspect there's something real going on, something other than what's measured by "pace." And, the regression seems to be assigning a lot of it to individual batters -- with some to the team in general, and some to the home park.
I'm not sure what it is, though.