Monday, April 21, 2014

Which players are making baseball games so slow?

Who are the fastest and slowest players in baseball, in terms of keeping the games moving along?

Four years ago, I wrote two posts that tried to answer that question.  I took Retrosheet game logs from 2000 to 2009, and ran a huge regression, trying to predict game time (hours and minutes) from the 18 players in the starting lineups, the two starting pitchers, the two teams, the years, the number of pitches, the number of stolen base attempts, and anything else I could think of.
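To make that setup concrete, here is a minimal sketch of a regression of this kind. It assumes each game becomes one row: an intercept, numeric covariates such as pitch count, and a 0/1 dummy for every starting player. The feature names, the two players `A` and `B`, and the six games are made-up toy values, generated from known coefficients so the fit can be checked. The real study used Retrosheet game logs and many more variables, and this is not necessarily how it was coded.

```python
# Minimal sketch: predict game length (minutes) from starting-player dummies
# plus numeric covariates, fitted by ordinary least squares.
# Feature names and the toy games below are hypothetical.
from typing import Dict, List, Sequence


def design_row(game: Dict, features: Sequence[str]) -> List[float]:
    """Encode one game as a numeric row: an intercept, numeric covariates,
    and a 0/1 dummy for each 'player:<id>' feature."""
    row = []
    for name in features:
        if name == "intercept":
            row.append(1.0)
        elif name.startswith("player:"):
            player = name[len("player:"):]
            row.append(1.0 if player in game["starters"] else 0.0)
        else:
            row.append(float(game[name]))  # e.g. pitches thrown
    return row


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some effects can't be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy games generated from minutes = 60 + 0.5*pitches + 3*A - 2*B (no noise).
features = ["intercept", "pitches", "player:A", "player:B"]
games = [
    {"starters": {"A"}, "pitches": 280, "minutes": 203},
    {"starters": {"B"}, "pitches": 300, "minutes": 208},
    {"starters": {"A", "B"}, "pitches": 290, "minutes": 206},
    {"starters": set(), "pitches": 270, "minutes": 195},
    {"starters": {"A"}, "pitches": 310, "minutes": 218},
    {"starters": {"B"}, "pitches": 260, "minutes": 188},
]
rows = [design_row(g, features) for g in games]
coef = fit_ols(rows, [g["minutes"] for g in games])
print(dict(zip(features, (round(c, 3) for c in coef))))
```

Each player coefficient is then read as the minutes that player's presence adds to (or subtracts from) a game, holding everything else in the model fixed. With hundreds of player dummies you would use a numerical library rather than a hand-rolled solver; it is written out here only to keep the sketch dependency-free.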

It turned out that players appeared to vary quite a bit -- at the extremes, it appeared that some batters could affect game time by as much as four minutes, and pitchers as much as seven minutes.  For the record, here are the "fastest" and "slowest" players I found, in minutes and seconds they appear to add:

+4:10 Slowest batter:  Denard Span
-4:11 Fastest batter:  Chris Getz

+7:43 Slowest pitcher: Matt Garza
-7:13 Fastest pitcher: Tim Wakefield

+5:45 Slowest catcher: Gary Bennett
-4:39 Fastest catcher: Eddie Perez

A full spreadsheet of my 2010 estimates is here.
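The list above is in minutes:seconds, while other figures in this post are quoted in decimal minutes (Jeter's +3.78, for example). A pair of small helpers, hypothetical and not part of the original analysis, converts between the two formats, rounding to the nearest second:

```python
def minutes_to_mss(minutes: float) -> str:
    """Format signed decimal minutes as '+m:ss', rounded to the nearest second."""
    sign = "-" if minutes < 0 else "+"
    total = round(abs(minutes) * 60)  # whole seconds
    return f"{sign}{total // 60}:{total % 60:02d}"


def mss_to_minutes(text: str) -> float:
    """Parse a signed 'm:ss' string such as '+4:10' into decimal minutes."""
    sign = -1.0 if text.startswith("-") else 1.0
    mins, secs = text.lstrip("+-").split(":")
    return sign * (int(mins) + int(secs) / 60)


print(minutes_to_mss(3.78))               # +3:47
print(round(mss_to_minutes("+4:10"), 3))  # 4.167
```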

----

Last week, Carl Bialik, of FiveThirtyEight, revisited the question of slow games, and I learned that FanGraphs now has real data available.  They took timestamped PITCHf/x game logs, and calculated the average time between pitches, for each batter and pitcher.  They call that statistic "pace," and it's available for 2007 to 2014.

So, now, there's some hard data to verify my numbers with.

I found 290 batters who were in both my study and the FanGraphs data.  Then, I ran a regression to predict FanGraphs' number from mine.

It turned out that for every extra minute per game that I found, FanGraphs found only about half a second per pitch (.4778).  So if I have a player at +2 minutes per game, you'd expect FanGraphs to have clocked him at about +0.96 seconds per pitch.

How many pitches does a starting batter see in a game? Maybe 16?  So, where I have 120 seconds, FanGraphs has about 15.  My estimate is eight times as large as theirs!
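The unit conversion in the last two paragraphs can be written out as a short calculation. The slope (0.4778 seconds per pitch per regression minute) comes from the post; the 16 pitches per game is the post's own rough guess, and the function name is hypothetical:

```python
SLOPE = 0.4778         # FanGraphs seconds/pitch per regression minute/game
PITCHES_PER_GAME = 16  # rough guess: pitches seen by a starting batter


def pace_seconds_per_game(regression_minutes: float,
                          slope: float = SLOPE,
                          pitches: int = PITCHES_PER_GAME) -> float:
    """Seconds per game that FanGraphs' pace would be expected to show for a
    batter whom the regression rates at `regression_minutes`."""
    return regression_minutes * slope * pitches


regression_seconds = 2 * 60                   # the regression's +2 minutes
fangraphs_seconds = pace_seconds_per_game(2)  # 2 * 0.4778 * 16, about 15.3
ratio = regression_seconds / fangraphs_seconds
print(f"{fangraphs_seconds:.1f} s vs {regression_seconds} s: {ratio:.1f}x")
```

The ratio comes out a little under 8 regardless of the player, since it is just 60 / (0.4778 × 16); a different guess for pitches per game changes it proportionally.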

But, hang on.  The correlation between the two measures is surprisingly high -- +0.40.  That suggests that, generally, I got the *order* of the players right.  Generally, the regression was successful in separating the fast batters from the slow batters.

If you look at the 20 players FanGraphs has as slowest, my study estimated 19 of them as slower than average.   (Wil Nieves was the exception.)

For their fastest 19 batters (I skipped over the six-way tie for 20th), my study didn't do quite as well: it had only 11 of the 19 as faster than average.  But of FanGraphs' next 19 fastest, my study had 15 faster than average and only 4 slower.

My feeling is ... it's not great, but it's not bad.  I'm willing to argue that the regression is reasonably capable of differentiating the speedy batters from the time-wasters.
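The two checks above, the correlation and the "same side of average" counts, can be sketched as follows. The five players' numbers are invented for illustration, and "average" is assumed to mean the mean of the regression estimates, which the post doesn't spell out. Higher FanGraphs pace means more seconds between pitches, i.e. slower.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def same_side(ours: Sequence[float], theirs: Sequence[float], k: int,
              slowest: bool = True) -> int:
    """Of the k players `theirs` rates slowest (or fastest), count how many
    `ours` also puts on that side of its own average. Ties are not handled."""
    avg = mean(ours)
    order = sorted(range(len(theirs)), key=lambda i: theirs[i], reverse=slowest)
    top = order[:k]
    if slowest:
        return sum(1 for i in top if ours[i] > avg)
    return sum(1 for i in top if ours[i] < avg)


# Hypothetical: regression minutes/game vs. FanGraphs pace (seconds/pitch).
ours = [1.5, -0.5, 0.8, -1.2, 0.1]
theirs = [24.1, 21.0, 22.9, 20.5, 23.0]
print(round(pearson(ours, theirs), 2))
print(same_side(ours, theirs, 2), same_side(ours, theirs, 2, slowest=False))
```

On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.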

------

For pitchers, the fit was much better.

The correlation was +0.73, much higher than I had expected. And the units were decent, too: each minute per game of slowness that the regression found corresponded to 42 seconds per game in the FanGraphs data (based on a guess of 94 pitches per start).
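One way to put the batter and pitcher results side by side is to express both on a per-pitch basis. This is only arithmetic on the post's rounded figures (0.4778 seconds per pitch for batters; 42 seconds over a guessed 94 pitches for pitchers; the guessed 16 pitches for a starting batter):

```python
BATTER_SLOPE = 0.4778            # FanGraphs s/pitch per regression minute (batters)
PITCHER_SECONDS_PER_GAME = 42.0  # FanGraphs s/game per regression minute (pitchers)
PITCHES_PER_START = 94           # the post's guess
BATTER_PITCHES = 16              # the post's guess for a starting batter

pitcher_slope = PITCHER_SECONDS_PER_GAME / PITCHES_PER_START  # ~0.447 s/pitch
batter_seconds_per_game = BATTER_SLOPE * BATTER_PITCHES       # ~7.6 s/game

print(f"per pitch: batters {BATTER_SLOPE:.3f} s, pitchers {pitcher_slope:.3f} s")
print(f"per game:  batters {batter_seconds_per_game:.1f} s, "
      f"pitchers {PITCHER_SECONDS_PER_GAME:.0f} s")
```

Per pitch, the two slopes come out close (about 0.48 and 0.45 seconds per regression minute). In this arithmetic, the large per-game gap comes from how many pitches each player's pace applies to: roughly 94 for a starter versus a guessed 16 for a batter.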

Why did the pitchers come out so much better than the hitters?  My guess: it's hard to separate the effect of Derek Jeter from the other eight batters, because they play together so much of the time.  In contrast, for every starting pitcher, there are at least 120 games without him where the rest of the lineup is almost the same.  That makes the differences much more obvious for the regression to figure out.

(BTW, Derek Jeter was the second "slowest" batter in my study, at +3.78 minutes per game.)
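A rough way to see why pitchers separate more cleanly than batters is to count, for a pair of players, the games in which exactly one of them started. Games where both (or neither) started carry no direct information about the difference between their two effects, and if the count is zero, the two dummy columns are identical and no regression can tell the players apart. The season below is hypothetical: two everyday batters who each sit seven games, and a starter in a five-man rotation.

```python
from typing import List, Set


def separating_games(games: List[Set[str]], a: str, b: str) -> int:
    """Number of games in which exactly one of players a and b started."""
    return sum(1 for starters in games if (a in starters) != (b in starters))


# Hypothetical 162-game season:
#   X and Y are everyday batters; X sits games 0-6, Y sits games 100-106.
#   P is a starting pitcher who starts every fifth game.
games: List[Set[str]] = []
for i in range(162):
    starters = set()
    if not 0 <= i <= 6:
        starters.add("X")
    if not 100 <= i <= 106:
        starters.add("Y")
    if i % 5 == 0:
        starters.add("P")
    games.append(starters)

print(separating_games(games, "X", "Y"))  # 14
print(separating_games(games, "P", "Y"))  # 126
```

The everyday teammates are distinguished by only 14 games, while the starter is distinguished from the same teammate by 126, which is in line with the explanation above.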

------

But, still ... why is the regression so far off for the batters, by a factor of 8?

No doubt, some of it is random luck.  But I think the most important factor is that time between pitches isn't the only thing affecting the length of a game.

In his article, Carl Bialik noted that from 2008 to 2014, Yankees games took 12.8 minutes longer, on average, than Blue Jays games.  But, after adjusting for the pace of the hitters and pitchers, there were still 6.5 minutes unexplained.

For the Red Sox, the gap was +11.8 minutes before adjusting for pace, and +3.0 minutes after.

That's just a two-team sample, but "pace" appears to account for only about half to three-quarters of the extra time.
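The share of each team's gap that pace accounts for is just (raw gap minus unexplained gap) divided by the raw gap. A quick sketch using the figures from Bialik's article quoted above (the function name is hypothetical):

```python
def share_explained(raw_gap: float, residual_gap: float) -> float:
    """Fraction of a team's raw game-time gap accounted for by pace."""
    return (raw_gap - residual_gap) / raw_gap


# Gaps in minutes, relative to Blue Jays games, 2008-2014.
yankees = share_explained(12.8, 6.5)
red_sox = share_explained(11.8, 3.0)
print(f"Yankees {yankees:.0%}, Red Sox {red_sox:.0%}")  # Yankees 49%, Red Sox 75%
```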

What's the rest?

There are lots of possibilities. Time between innings?  If each of the 16 half-inning breaks ran an extra 5 seconds, that would add up to 80 seconds.  Seventh-inning stretches: I bet those take longer in some parks than in others.  The time it takes to announce a batter: are some parks faster than others, and does the batter wait for it?  Does it take five extra seconds for the crowd to stop cheering when Derek Jeter bats?  What about defense: how long does it take to get the ball back to the pitcher after an out?  Do some outfielders take their time throwing the ball back?

To check for park-related factors, I re-ran my regression, this time adding a dummy variable for each home park (on top of the team variables). And, indeed, some significant differences showed up.  Even after controlling for everything else, including the players, games take an extra 1.9 minutes at Yankee Stadium, and an extra 1.4 minutes at Fenway Park.
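The post doesn't say exactly how the park variables were coded, so here is one common approach as a sketch: one 0/1 dummy per home park, leaving out a single baseline park whose effect is absorbed into the intercept. The team codes are hypothetical placeholders.

```python
from typing import Dict, List


def add_park_dummies(row: Dict[str, float], home_team: str,
                     parks: List[str], baseline: str) -> Dict[str, float]:
    """Return a copy of `row` with one 0/1 dummy per home park, omitting
    `baseline`, whose effect is absorbed into the intercept."""
    out = dict(row)
    for park in parks:
        if park != baseline:
            out[f"park:{park}"] = 1.0 if park == home_team else 0.0
    return out


parks = ["NYY", "BOS", "TOR"]
row = add_park_dummies({"intercept": 1.0}, "NYY", parks, baseline="TOR")
print(row)  # {'intercept': 1.0, 'park:NYY': 1.0, 'park:BOS': 0.0}

# Why drop one: with every park included, the park dummies in any game sum
# to 1, exactly duplicating the intercept column (perfect collinearity).
full = {p: 1.0 if p == "BOS" else 0.0 for p in parks}
assert sum(full.values()) == 1.0
```

Under this coding, each park coefficient is relative to the baseline park. Estimates like the ones listed below may instead have been re-expressed relative to an average park; the post doesn't say.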

And ... it looks like there might be a pattern: larger-market teams appear to host slower games than smaller-market teams. Here are the teams with the largest unexplained slowness (minutes per game):

+1.91 Yankees
+1.88 Nationals
+1.71 Braves
+1.45 Red Sox
+1.40 Dodgers

And the most unexplained quickness (minutes per game):

-2.94 Blue Jays
-2.45 Giants
-2.25 A's
-1.83 Tigers
-1.17 Cubs

Maybe it's not a perfect pattern, but it's still suggestive.

-----

So, that's some support for the idea that there is some kind of park-related effect.  It doesn't explain why the batter numbers are so extreme, since the park effects were estimated after controlling for players.  But the effect could still be partly team and partly player, if (say) the fans cheer an extra 10 seconds for any Yankee batter, but an extra 30 seconds for Derek Jeter.

Another thing that might be happening: there are many players who miss only a few games.  I checked Derek Jeter in 2010.  He started all but seven games, and every one of those seven was on the road.  Could the regression be conflating Jeter games with home games?  (That doesn't show up as an unusually wide confidence interval for Jeter in the regression, though.)
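As a rough check on how unusual the Jeter pattern is, here is a sketch that treats each missed game as an independent coin flip between home and road. That is a simplifying assumption which ignores any schedule structure. Under it, all seven missed games landing on the road has probability 1/128. With hundreds of regulars in the data, though, a few players will show a pattern like this by chance alone.

```python
from math import comb


def prob_at_least_k_road(n_missed: int, k: int, p_road: float = 0.5) -> float:
    """Binomial P(at least k of n_missed games are road games), treating each
    missed game as an independent draw with probability p_road."""
    return sum(comb(n_missed, j) * p_road ** j * (1 - p_road) ** (n_missed - j)
               for j in range(k, n_missed + 1))


p = prob_at_least_k_road(7, 7)
print(p)  # 0.0078125, i.e. 1 in 128
```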

Or, maybe everyone works faster in meaningless end-of-season games, and those are the ones where Derek Jeter is more likely to be sat out.

You can probably think of other possibilities, some being a real effect, and some a statistical artifact.  Or, I might have made a mistake somewhere.

But, I suspect there's something real going on, something other than what's measured by "pace." And, the regression seems to be assigning a lot of it to individual batters -- with some to the team in general, and some to the home park.

I'm not sure what it is, though.


At Monday, April 21, 2014 2:01:00 PM,  Guy said...

Phil: did you control for the number of hits, BBs, or GDP in the game? I see that you controlled for runs scored, but that's not quite the same thing. At first glance, it appears that your "slow" hitters are guys who hit for high BA. And many of the "fast" guys are low-BA and/or high-GDP hitters. So I'm wondering if your regression is capturing the impact of these offense events on game duration....

At Monday, April 21, 2014 2:08:00 PM,  Phil Birnbaum said...

Yup, I've got GIDP ... click on the link to the results, then the first worksheet. It's row 11.

I don't have hits, but I have PA. But, good point, not all PA are equal. Still, what's an extra hit ... 30 seconds? A .300 hitter would add 30 seconds every five games compared to a .250 hitter.

But, yes, maybe I should put in all kinds of hits, not just HR. Strikeouts are fast, too.

At Monday, April 21, 2014 2:41:00 PM,  Guy said...

OK, now I see it -- I was looking at an earlier spreadsheet. It might not explain a lot, but I would consider adding 1B, 2B/3B, HBP, and ROE to the model. HBP especially can take a fair amount of time, as it can lead to umpire warnings, players arguing, etc.

Managers can also have a big impact, in terms of tendencies to argue with umpires. I suppose the team variables are capturing that to some extent.

At Monday, April 21, 2014 3:14:00 PM,  Phil Birnbaum said...

Right, managers! Team variables might not capture manager effects perfectly, since they're aggregated over a bunch of years.

At Monday, April 21, 2014 3:47:00 PM,  Guy said...

I would think managers can have a pretty big impact, in terms of mound visits (frequency and duration), fighting with umpires, retaliating for hit batsmen, and tendency to remove pitchers mid-inning (which I don't think you control for).

You might also want to add PB/WP to the model, if you don't want that ability to be incorporated in your catcher and pitcher coefficients.

At Monday, April 21, 2014 3:49:00 PM,  Phil Birnbaum said...

Ah, mid-inning pitcher changes! I'll check, I don't think I controlled for those.

Managers will have an impact, but, over a 10-year span, that would largely even out, right? And it would be a big coincidence if NYY and BOS managers were the most argumentative. (Or perhaps not.)

At Monday, April 21, 2014 3:58:00 PM,  Phil Birnbaum said...

Retrosheet game logs don't include mid-inning pitching changes. I'll have to do a bit of extra work to get those in.

At Monday, April 21, 2014 4:44:00 PM,  Guy said...

I agree that managers probably don't explain much of the puzzle of large coefficients for certain position players. But they probably do explain a considerable portion of the team coefficients.

If mid-inning pitching changes are hard to get, maybe you could add # of relievers/relief innings as a proxy. And maybe include a dummy variable for whether the starter pitched a fractional inning?

Are ejections (players and/or managers) in the game logs? I bet that variable would be highly significant. :>)

At Tuesday, April 22, 2014 12:23:00 AM,  Phil Birnbaum said...

And, I added dummies for all managers who managed 300+ games, but not more than half of any one team's games. (Otherwise, how do you differentiate Mike Scioscia from the Angels?)

Not much changed. A couple of managers came out significant. Any ideas which ones those "should" be? If the numbers match your intuition, we can be more confident we've found something real.

I suspect a good part of it has to do with mid-inning pitching changes, which is the next task.

At Tuesday, April 22, 2014 12:24:00 AM,  Phil Birnbaum said...

Oh, an HBP added 26 seconds. My guess is: 0 for the first one, 30 for the second one, and 120 for the third one. :)

Actually, I'll check that next run!

At Tuesday, April 22, 2014 12:44:00 AM,  Phil Birnbaum said...

The HBP effect was about the same even when considering only games with 3 or more HBP. Interesting!

At Tuesday, April 22, 2014 1:32:00 PM,  Guy said...

I would guess La Russa made games longer, because of frequent pitching changes. And maybe Bobby Cox (most career ejections)?

It might be fun to throw in home plate umpires. Not because they will bias any of your other estimates (they shouldn't), just to see if the umps have much impact.

At Tuesday, April 22, 2014 1:35:00 PM,  Phil Birnbaum said...

It's hard to isolate La Russa and Cox because they're so closely tied to a single team ... but you can use the team estimates for that.

At Tuesday, April 22, 2014 1:44:00 PM,  Guy said...

In terms of the large differences by ballparks, I wonder whether there is variation in terms of which teams still pause to play "America the Beautiful" and/or do a tribute to U.S. veterans. The Nationals still do this regularly, but I'm not sure all U.S. teams still do. This might explain Toronto being such a fast team!

At Tuesday, April 22, 2014 1:47:00 PM,  Phil Birnbaum said...

Do you know what the Yankees and Red Sox do?

At Tuesday, April 22, 2014 1:58:00 PM,  Guy said...

The Yankees still play America the Beautiful, according to Wikipedia: http://en.wikipedia.org/wiki/Seventh-inning_stretch. I don't know about the Red Sox, but the Sox do play "Sweet Caroline" by Neil Diamond in the 8th inning. According to the article, there are a lot of ballpark-specific traditions regarding 7th and 8th innings "intermissions." This is probably a big factor in team coefficients (as well as any differences in the break for TV advertising).

At Tuesday, April 22, 2014 6:15:00 PM,  Phil Birnbaum said...

OK, mid-inning pitching changes helped a lot, for the batters. The extremes dropped from the low 4s to the mid 3s, roughly.

Getz is around -3:46, Span +3:24, Jeter +3:03.

Pitching and catching didn't change much, I don't think.

One possible explanation: pitchers' numbers are an actual reflection of their slowness, while, for batters, there's more luck. Adding the extra field took out that luck, so the batters are affected more.

At Tuesday, April 22, 2014 6:16:00 PM,  Phil Birnbaum said...

BTW, a mid-inning reliever adds 2:18 as compared to a between-innings reliever, according to the regression. Seems reasonable to me.

At Wednesday, April 23, 2014 10:30:00 AM,  Guy said...

Interesting that pitching changes impact the hitters that much. It's still hard to imagine that Jeter could increase the length of games by 3 minutes, so it seems you are still capturing some other factor other than random variation (the sample sizes for Span and Getz, on the other hand, are pretty small).

*

Question: Since you have a variable for each ballpark, does it make sense to also include a franchise variable? I'm thinking that is really just capturing the impact of the manager, and maybe you'd be better off dropping it in favor of including all managers. What else could the franchise variable be telling us, if you are controlling for park, manager, and players?

At Wednesday, April 23, 2014 10:58:00 AM,  Alex said...

I don't know if it's true now, with so many games being televised one place or another, but I wonder if some games are longer due to commercials; maybe these still differ for national versus local broadcasts? In that case, some team differences could be due to popularity, and thus being on national TV more often.

At Wednesday, April 23, 2014 12:09:00 PM,  Phil Birnbaum said...

Guy,

Right, I think the franchise numbers are smaller than the park numbers ... I'll try substituting managers and see what happens. You'd be losing road speed that's team-specific ... like, say, booing all the Yankees. But that doesn't seem like it would be a big deal.

Maybe it takes 30 seconds for the Yankees to dig up that Bob Sheppard recording?

At Wednesday, April 23, 2014 12:09:00 PM,  Phil Birnbaum said...

Alex: yes, I'd love to know if there are team differences for national broadcasts. People have said they don't think so, but I haven't seen any real data.

At Wednesday, April 23, 2014 12:40:00 PM,  Phil Birnbaum said...

Looked at a quick non-random sample of PITCHf/x files ... it does look like it takes longer for Jeter to see his first pitch than other players, by a few seconds or so?

When Jeter leads off an inning, it's usually around 3 minutes after the previous pitch. For other players, the average seems to be lower, somewhere between 2:30 and 3:00.

I could be imagining it ... confirmation bias? I guess I should see about getting a full season's worth of data.

At Wednesday, April 23, 2014 2:08:00 PM,  Guy said...

I see that Nomah has a negative coefficient in your study. That blows my mind. I would have thought he'd be at the top of the longer-game list, with all that unwrapping and wrapping of the batting glove before each pitch.

At Wednesday, April 23, 2014 4:37:00 PM,  Phil Birnbaum said...

Good call on the managers. Substituting manager for visiting team drops Span to 3.06, Jeter to 2.36.