### Why are Yankees/Red Sox games so slow?

There's been lots of talk lately about how Yankees and Red Sox games take too long and move too slowly.

Part of the "take too long" part is that those games tend to have lots of plate appearances and pitches. But another part is just that Red Sox and Yankee players tend to play slowly -- Derek Jeter appearing to be the worst of many offenders.

I figured that out with a study that's basically a large regression (thanks to WSJ's Carl Bialik for requesting it; Carl wrote about it earlier this week).

Here's what I did. I took every game from 2000 to 2009, and tried to predict game time from a bunch of different factors -- the number of pitches in the game, the number of innings, how many of the last few innings were close, how many steal attempts there were (to try to account for pickoff throws), the attendance, how many relievers were used, how many plate appearances there were, and how many runs were scored.

Those coefficients were mostly as you'd expect -- every extra pitch took about 23.3 seconds extra. Every half-inning that was close (as opposed to when the score was a blowout) added 47 seconds. Every reliever added 2:13 (probably the time to warm up the pitcher, if the change was mid-inning). And so on.
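To make the coefficients concrete, here's a minimal sketch (not the code used for the study) that adds up the extra time implied by the three figures quoted above. The comparison game, with 20 more pitches, two more close half-innings, and one more reliever, is a made-up example.

```python
# Coefficients quoted in the post, converted to seconds.
SEC_PER_PITCH = 23.3
SEC_PER_CLOSE_HALF_INNING = 47.0
SEC_PER_RELIEVER = 2 * 60 + 13  # 2:13


def extra_seconds(pitches=0, close_half_innings=0, relievers=0):
    """Extra game time implied by changes in these three factors,
    holding everything else in the regression fixed."""
    return (pitches * SEC_PER_PITCH
            + close_half_innings * SEC_PER_CLOSE_HALF_INNING
            + relievers * SEC_PER_RELIEVER)


# Hypothetical comparison: 20 more pitches, 2 more close half-innings, 1 more reliever.
delta = extra_seconds(pitches=20, close_half_innings=2, relievers=1)
print(f"{delta:.0f} seconds, or about {delta / 60:.1f} minutes")
```

The other factors (innings, steal attempts, attendance, plate appearances, runs) would enter the same way, but their coefficients aren't quoted here, so they're left out.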

I also adjusted for the season in which the games took place. The results surprised me a bit; I hadn't realized that, all else being equal, games were almost four minutes slower in 2000-2001 than they were in 2009. But 2009 was the slowest game time since 2003. The fastest games of the decade were in 2004, when they were 4:54 faster than 2009, all else being equal.

I checked months, too, and April and September are fastest. Summer games are about two minutes longer than April. Maybe everyone hurries a bit more to get out of the cold?

Then, the fun part: For 1,105 different players, I assigned each a dummy variable, which represented whether or not he was in the starting lineup that day (I was using Retrosheet game logs, so starting lineup was all I could get without digging into play-by-play data). I included any player who had at least one season of 250 AB between 2000 and 2009, or, for pitchers, at least one season of 25 games started.
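Here's a rough sketch of what that setup looks like in code. It is an illustration only: the player IDs, the `ab`/`gs` field names, and the four-player "study" are all hypothetical, and the real regression had 1,105 of these columns.

```python
def qualifies(seasons):
    """True if any season from 2000-2009 has 250+ AB or 25+ games started.

    `seasons` is a list of dicts with hypothetical keys 'ab' and 'gs'.
    """
    return any(s["ab"] >= 250 or s["gs"] >= 25 for s in seasons)


def dummy_row(starting_lineup, study_players):
    """One 0/1 indicator per study player: 1 if he started this game."""
    starters = set(starting_lineup)
    return [1 if player in starters else 0 for player in study_players]


# Hypothetical mini example.
study_players = ["jeter", "posada", "pedroia", "wakefield"]
lineup = ["jeter", "posada", "callup_x"]  # callup_x is not a study player
row = dummy_row(lineup, study_players)
print(row)  # [1, 1, 0, 0]
print(qualifies([{"ab": 120, "gs": 0}, {"ab": 310, "gs": 0}]))  # True
```

A starter who isn't in the study simply gets no column, which is why each coefficient ends up being measured against those unlisted players.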

Finally, I calculated one factor separately for every team, after adjusting for the players in the starting lineup. Those weren't that interesting. I think most of what they represent is how fast the *other* players are -- spot starters, relief pitchers, September callups who never got to 250 AB.

However: Boston and the Yankees were still among the slowest: their games were a minute or two longer than the average other team, even after adjusting for individual players. I wonder if that means there's something else going on: maybe the Yankees and Red Sox announcers are really slow, and the batters have to wait longer to get started? More research is required there.

Anyway, after doing all that, I got a separate estimate for every player: how much longer or shorter games were with him in the lineup, compared to the same game with his spot filled by someone who wasn't one of the 1,104 others in the study. It turns out that those missing players (bench players and relievers) are faster than the regulars and starters, by about 26 seconds a batter, 11 seconds a catcher, and 19 seconds a pitcher. Because of that, I adjusted every regular by subtracting the average of his group; it makes more sense to compare him to the other 1,104 regulars than to the September callups and relievers.
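The re-centering step can be sketched like this. The raw coefficients below are invented for illustration; the only idea taken from the study is "subtract the average of the player's own group."

```python
from collections import defaultdict


def center_by_group(raw):
    """raw: {player: (group, raw_minutes)}.

    Returns {player: minutes relative to the mean of his own group}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for group, minutes in raw.values():
        totals[group][0] += minutes
        totals[group][1] += 1
    means = {group: total / count for group, (total, count) in totals.items()}
    return {player: minutes - means[group]
            for player, (group, minutes) in raw.items()}


# Hypothetical raw coefficients, in minutes versus a non-study fill-in.
raw = {
    "batter_a": ("batter", 1.0),
    "batter_b": ("batter", -0.2),
    "pitcher_a": ("pitcher", 2.0),
    "pitcher_b": ("pitcher", -1.0),
}
adjusted = center_by_group(raw)
for player, minutes in adjusted.items():
    print(f"{player}: {minutes:+.2f}")
```

After centering, each group averages zero, so a plus means "slower than the typical regular at his position group" rather than "slower than a callup."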

So the final result gives a kind of "with or without you" factor for every player. For instance, take Derek Jeter. He was the second-highest in "delay of game" factor among batters (excluding catchers). All else being equal, the regression tells us that a game with Derek Jeter in the starting lineup was 3 minutes and 30 seconds slower than the exact same game where he wasn't in the lineup.

3:30 seems like a lot to me. How much is Jeter really involved in the play? Maybe 4 or 5 plate appearances a game, which is 15 or 20 pitches? That works out to between 10 and 15 seconds a pitch.
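Spelling out that division, as a quick check of the range (the 15 and 20 pitch counts are the rough guesses from the paragraph above, not measured values):

```python
def seconds_per_pitch(total_seconds, pitches):
    """Average delay per pitch if the whole effect came while batting."""
    return total_seconds / pitches


jeter_seconds = 3 * 60 + 30  # the 3:30 game factor

for pitches in (15, 20):
    per_pitch = seconds_per_pitch(jeter_seconds, pitches)
    print(f"{pitches} pitches -> {per_pitch:.1f} seconds per pitch")
```

That gives 14.0 seconds at 15 pitches and 10.5 seconds at 20 pitches.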

Does Derek Jeter really take ten seconds more between pitches than the average batter? I haven't watched him bat that closely, but maybe you guys can let me know if that's a reasonable estimate. There is some randomness involved in the regression, and, since Jeter was the second highest in the league, you might want to regress his number to the mean a little bit. But still -- his game factor of 3:30 was 3.6 standard deviations from zero, so it's almost certain that he's pretty slow.

I suppose that theoretically, it could also be his defense ... when he catches a pop up, does he do a little 30 second dance before throwing the ball? Doesn't seem likely to me. Baserunning seems like a better candidate ... maybe Jeter draws a lot of throws when he's on first base, but doesn't steal much (steal attempts appear in the regression). I should have had the regression account for pickoff throws, but I didn't think of it until now.

As in any regression, there could be something else going on. It could be that there's something special about the games that Jeter misses that make them a lot faster than otherwise. For instance: when I did my first pass at this, I found that Jeter was slow by almost six minutes, rather than three-and-a-half. Why? Because my first pass didn't control for season. And about half the games Jeter missed in the decade came in 2003, when games were about four minutes shorter than normal. So the games he missed were faster than the games he played for reasons other than his slowness.
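Here's a toy version of that season problem, with no randomness at all, to show how missing games in a fast year inflates the naive estimate. Every number in it is hypothetical: the 180-minute baseline, the game counts, and the assumption that the player's true effect is exactly 3.5 minutes.

```python
BASE = 180.0                                  # hypothetical baseline game length, minutes
SEASON_EFFECT = {"2003": -4.0, "other": 0.0}  # 2003 games about four minutes shorter
PLAYER_EFFECT = 3.5                           # assumed true effect of the player

# Hypothetical (games with him, games without him) in each season group;
# half of the 50 missed games fall in 2003.
GAMES = {"2003": (120, 25), "other": (1300, 25)}


def game_time(season, in_lineup):
    return BASE + SEASON_EFFECT[season] + (PLAYER_EFFECT if in_lineup else 0.0)


def naive_difference():
    """Average game with him minus average game without him, ignoring season."""
    with_total = sum(n_with * game_time(s, True) for s, (n_with, _) in GAMES.items())
    without_total = sum(n_wo * game_time(s, False) for s, (_, n_wo) in GAMES.items())
    n_with = sum(w for w, _ in GAMES.values())
    n_without = sum(wo for _, wo in GAMES.values())
    return with_total / n_with - without_total / n_without


def within_season_difference():
    """Compare with/without inside each season, then average the gaps."""
    gaps = [game_time(s, True) - game_time(s, False) for s in GAMES]
    return sum(gaps) / len(gaps)


print(f"ignoring season:    {naive_difference():.2f} minutes")
print(f"controlling season: {within_season_difference():.2f} minutes")
```

The naive comparison overstates the effect because the "without" games are drawn disproportionately from the fast season. Any other uncontrolled factor would have to work the same way to bias the number.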

So, if you want to take a guess at other reasons Jeter's number might be too high, you need something like that. Something that makes games faster, that would be disproportionately applicable to games that Jeter missed. And that "something" has to be something not controlled for in the study -- something other than year, month, attendance, other players in the lineup, and so on.

I couldn't come up with anything, but that doesn't mean that you won't.

-----------

Anyway, let me show you the top ten fast and slow players, and you can decide for yourself if the results seem reasonable. Here are the slow batters. Minutes are in decimals (4.5 equals four minutes thirty seconds) because I'm too lazy to convert:

+4.31 Denard Span

+3.51 Derek Jeter

+3.28 Miguel Tejada

+2.87 Rickie Weeks

+2.72 Albert Belle

+2.45 Dustin Pedroia

+2.43 Dante Bichette

+2.25 Greg Dobbs

+2.22 David Segui

+2.21 Reggie Abercrombie

And here are the fast batters:

-3.41 Chris Getz

-3.12 Kevin Jordan

-2.97 Nick Markakis

-2.79 Jake Fox

-2.74 Mark Ellis

-2.62 Will Venable

-2.49 Jose Lopez

-2.29 Mark McGwire

-2.23 Warren Morris

-2.19 Chris Davis
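For anyone who'd rather read these as minutes and seconds, the conversion is a one-liner. This is just a convenience sketch; the function name is made up.

```python
def to_min_sec(decimal_minutes):
    """Convert decimal minutes (e.g. 4.5) to a signed m:ss string (e.g. +4:30)."""
    sign = "-" if decimal_minutes < 0 else "+"
    total_seconds = round(abs(decimal_minutes) * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{sign}{minutes}:{seconds:02d}"


for value in (4.5, 4.31, 3.51, -3.41):
    print(value, "->", to_min_sec(value))
```

So Span's +4.31 is about +4:19, Jeter's +3.51 is about +3:31, and Getz's -3.41 is about -3:25.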

I did catchers separately, because you can't know how much of their speed is due to their batting, and how much is due to their catching. If a guy catches 140 pitches a game, and takes an extra half-second to throw each one back to the pitcher, that's an extra minute he's adding to the game. So you'd expect the catchers to have more extreme numbers than the other batters, and they do. Here are the slow catchers:

+6.01 Gary Bennett

+5.63 Benito Santiago

+4.73 Einar Diaz

+4.43 Tom Wilson

+4.12 Ryan Hanigan

+3.76 Doug Mirabelli

+3.05 Javier Valentin

+2.87 Eliezer Alfonzo

+2.44 Kelly Shoppach

+2.40 Mike Piazza

And the fast catchers:

-4.67 Eddie Perez

-4.09 Josh Bard

-3.88 Omir Santos

-3.88 Chris Coste

-3.58 Jeff Clement

-3.35 Ken Huckaby

-3.33 Charles Johnson

-2.97 John Flaherty

-2.97 Tom Lampkin

-2.61 Ben Davis

Finally, starting pitchers. These guys have a huge impact on game time ... I guess they vary a lot in how much time they take to get ready for the next pitch. Slow pitchers:

+7.32 Gil Heredia

+7.14 Steve Trachsel

+6.98 Matt Garza

+5.49 Armando Reynoso

+5.32 Jason Johnson

+5.22 Kevin Appier

+5.04 Chien-Ming Wang

+4.98 Ross Ohlendorf

+4.88 Edinson Volquez

+4.86 Elmer Dessens

And the fast pitchers. It's ironic that the guy who throws the slowest actually pitches the fastest. (Well, maybe not *that* ironic, but certainly more ironic than rain on your wedding day.)

-7.71 Tim Wakefield

-7.36 Kevin Tapani

-6.69 Glendon Rusch

-5.88 Steve Sparks

-5.39 Kirk Rueter

-5.13 James Baldwin

-5.08 Joe Blanton

-5.02 Ben Sheets

-4.99 Mark Buehrle

-4.98 Matt Morris

Tim Lincecum is 16th fastest, by the way, at -4.24.

------------

Now that we know the slow and fast players, we can do teams by adding up all the players. I'll just do a version of the Yankees and Red Sox, to see if those guys really do slow down the game. Here are the starting lineups from the Red Sox/Yankees game of April 4, 2010:

+3.5 Derek Jeter

-0.8 Nick Johnson

-0.7 Mark Teixeira

+0.9 Alex Rodriguez

+1.9 Robinson Cano

+1.3 Jorge Posada

-0.9 Curtis Granderson

+0.0 Nick Swisher

+1.1 Brett Gardner

-0.4 CC Sabathia

-----------------------

+5.9 Yankees total

-0.2 Jacoby Ellsbury

+2.5 Dustin Pedroia

+1.2 Victor Martinez

+0.6 Kevin Youkilis

+0.4 David Ortiz

+0.3 Adrian Beltre

-0.6 J.D. Drew

+0.2 Mike Cameron

+1.2 Marco Scutaro

+3.6 Josh Beckett

----------------------

+9.1 Red Sox total

So, our estimate is that the game took 15 minutes longer than it would have if average teams had been playing, instead of Boston and New York. That seems like a lot to me.
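The team totals are just sums of the player factors, which is easy to reproduce from the one-decimal figures listed above. One caveat: the rounded Boston figures add to +9.2 rather than the +9.1 shown, presumably because the posted total was computed from unrounded coefficients (that's an assumption, not something the table states).

```python
YANKEES = {
    "Derek Jeter": 3.5, "Nick Johnson": -0.8, "Mark Teixeira": -0.7,
    "Alex Rodriguez": 0.9, "Robinson Cano": 1.9, "Jorge Posada": 1.3,
    "Curtis Granderson": -0.9, "Nick Swisher": 0.0, "Brett Gardner": 1.1,
    "CC Sabathia": -0.4,
}
RED_SOX = {
    "Jacoby Ellsbury": -0.2, "Dustin Pedroia": 2.5, "Victor Martinez": 1.2,
    "Kevin Youkilis": 0.6, "David Ortiz": 0.4, "Adrian Beltre": 0.3,
    "J.D. Drew": -0.6, "Mike Cameron": 0.2, "Marco Scutaro": 1.2,
    "Josh Beckett": 3.6,
}


def lineup_total(lineup):
    """Sum of the per-player factors, in minutes."""
    return sum(lineup.values())


yankees_total = lineup_total(YANKEES)
red_sox_total = lineup_total(RED_SOX)
print(f"Yankees: {yankees_total:+.1f}")
print(f"Red Sox: {red_sox_total:+.1f}")
print(f"Combined: {yankees_total + red_sox_total:+.1f}")
```

Either way, the combined figure rounds to about 15 minutes.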

As it turns out, it was a 9-7 slugfest that went 3:46.

----------

On August 18, 2006, in the first game of a doubleheader, the visiting Yankees beat the Red Sox 12-4, in 3 hours and 55 minutes. The starting lineups for those two teams featured players who would be expected to be 24.4 minutes slower than average. That was the slowest-playered game in the decade; of the 20 men in the combined starting lineups, 18 of them were slow. Only Jason Giambi and Craig Wilson were faster than average, by a combined 50 seconds.

The game with the fastest players last decade took place on April 16, 2008. Seattle beat Oakland 4-2. The regression predicted that the game should have taken 20.2 minutes less than normal. It was indeed a very fast game, at 2:09, but, of course, that's partly because it didn't turn out to be much of a slugfest.

---------

Keep in mind that these estimates for individual players really aren't all that precise. The standard error for a typical player is between half a minute and a minute. Kevin Youkilis comes in at 0.6 minutes slower than average, but his standard error is also 0.6, so there's a decent chance (about 1 in 6) that he's actually *faster* than average.
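The "1 in 6" figure comes from a one-tailed normal probability. Here's a sketch of that calculation, under the simplifying assumption that the estimate is normally distributed around the true value (and ignoring any regression to the mean):

```python
import math


def prob_opposite_sign(estimate, std_error):
    """Rough chance the true effect has the opposite sign from the estimate,
    treating the estimate as normal with the given standard error."""
    z = abs(estimate) / std_error
    return 0.5 * math.erfc(z / math.sqrt(2))


youkilis = prob_opposite_sign(0.6, 0.6)  # one standard error from zero
jeter = prob_opposite_sign(3.6, 1.0)     # 3.6 standard errors, as quoted earlier

print(f"Youkilis: {youkilis:.3f} (about 1 in {1 / youkilis:.0f})")
print(f"Jeter:    {jeter:.5f}")
```

At one standard error the chance is roughly 16 percent; at 3.6 standard errors it's a small fraction of one percent, which is why the extreme players are the safer calls.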

You're on more solid ground when you assume that an extreme player (like Jeter or Markakis) is fast or slow, or when you add a bunch of players together.

--------

I've put the data up on my website, in an Excel spreadsheet. It contains two worksheets: one that gives you slowness estimates for all the players, and another that's the full regression results. I'll annotate that one later so it's easier to understand, and I'll come back and update this post.

I might also rerun this for the 1980s ... if only to see just how slow Mike Hargrove actually was.

## 24 Comments:

Phil -- in the second half of the article you spend a lot of time talking about whether Jeter is a slower player etc.

The question for me is which way does the causation go?

Is it that the NYY have selected players who play more slowly? Or is it that a player who plays for NYY plays more slowly?

In other words, is it a player effect or a team effect? For instance, it could be something about how the Yankees coaches want the players to set up; or it could be that the pressure of playing for the Yankees means that, for whatever reason, Yankee players just play more slowly. Or, as you assert, it could be that Jeter does a jig after every out.

As you say, the NYY variable that you have mostly controls for the other (bench) players. But the Jeter variable actually conflates, I think, a Jeter effect and a NYY effect.

I also suspect that the Yankees play more slowly against the Red Sox - to control for that we'd need to look at team interactions - which I think you could do if you omitted all the player dummy variables.

To answer that I think we need to look at specific players when they swapped teams. Even then it isn't easy as we probably need to control for pitcher and perhaps catcher - although if the player moves within league then perhaps that isn't so important.

One thing I was going to do with your player coefficients was to group them by team and see at a team level what the pattern is (but I don't have the data to hand to assign team codes to player codes).

If you did that I reckon you might find that the big market teams come out highest. It could also be a contention issue - do teams leading the division play more slowly perhaps because they feel more pressure. I thought that might be going on when I looked at the month coefficients as they rise from April to August. But you report that Sep has a lower coefficient which could discount that theory (unless there are enough teams out of contention to outweigh the effect of teams in contention) - btw I couldn't see the sep coefficient in the xl download.

The other way to test the team effect is to re-run the regression without the players, although you are still conflating players and teams if you did that.

As you say, more research required, but fascinating.

Just to follow up my last comment: if you look at the Yankees' team table you show, only Tex, CC, and Johnson show negative time.

Those players are new to the roster in 2010, so I suspect the negative coefficient is a result of that. Had they been with the Yankees for the last five years, I bet their coefficients would have been positive.

I guess I'm saying I'm not sure you can use the regression to predict future game time where players have moved teams UNLESS you adjust for their moving team.

Does that make sense??

Hi, John,

I think the way the regression works, it should be able to separate the team effect from the player effects. If it had trouble doing that, you'd see the results in terms of a high standard error on the coefficient it had trouble with.

I think that's how it works.

If what you suggest (in your second comment) was happening, you'd expect to find players with the Yankees all decade to have higher coefficients than players who were with NY for only part of their careers. You could check that (and for Boston too).

As for causation ... yes, I suppose there might be something about playing for New York that slows players down. I speculated that it might be the PA announcer's timing, but only because that was the only example I could think of.

Phil

Yes, the standard errors aren't too bad so the regression may separate player and team.

I guess that with the 250AB cut-off you've probably got 80-90% of the PA accounted for with the player variables so the team variables should be reasonably clean. That makes sense.

However, just going through the list of some former Yankee hitters/pitchers in the past who weren't on the list you get:

- Matsui +0.5

- Cabrera +0.6

- Damon +0.4

- Phillips +1.2

- Abreu +0.9

- Williams +1.2

- Sheff +0.6

- Mussina +3.7

- Wang +5.4

- Clemens +2.8

- Pettitte +1.2

- Giambi -0.5

I realize that some of those guys have moved around but based on the above and on your 2010 list (where the negative guys are either brand new or, like Tex, have one year of service) I'd say that there is probably some elements of the team effect in the player effect.

Part of the issue is that to understand it properly you need to look at pairwise comparisons of hitters or pitchers that move teams. That's quite a bit of work. And if you're good, once you join the Yankees you tend not to leave so the sample size of two-team players is small. If you're bad you tend to hop around many teams and don't really rack up enough PA I suspect to make the comparison that meaningful.

What do you think?

[REPOSTED AFTER EDITING]

Phil

Maybe the list of +ve and -ve players is meaningless. Adding up all the batter coefficients I get 242 minutes and all the pitcher coefficients gives me 140 minutes ... so you'd expect a lot more +ve than -ve. And even if you get all +ve that doesn't of course imply the team effect is wrong, although I think it'd mean you want to take a closer look.

Later on I'll put together a dummy regression and see what happens when I play with team and player effects. I'll post the results here.

Stepping back, intuitively I'd have thought that different teams would have had a bigger effect than your results report. But perhaps that's not right. Teams with much better defense or in a pitcher's park should have faster games, and the opposite should be true in higher run-scoring environments. Also, teams who score more runs are generally more likely to be in contention, which may have an effect as October rolls round.

Hi, John,

Good stuff! For the list of pluses and minuses, use the other worksheet, the one where everyone has been adjusted for the average. Part of the reason you're getting more pluses than minuses is that the regression is comparing everyone to the average utility player, who's faster.

Subtract 0.4 from the pitchers, .2 from the catchers, and .3 from the hitters.

You'll still have "too many" pluses, but to a lesser extent. That could indeed have something to do with an attribute of the Yankees we haven't considered.

If you look at the Yankees coefficient, the SE is 1.56. Maybe (say) it should be higher by a minute, and all the Yankees players lower by a minute. That would be well within the bounds of statistical insignificance.

Let me think about that a bit ...

Oops, my last comment wasn't right ... if you bump the Yankees team coefficient by 2 minutes, you have to bump the Yankees *player* coefficients by 1/10 of 2 minutes, since there are 10 regulars per game and the numbers still have to add up.

Maybe 1/9, if you figure one of the ten starting players is a utility guy.

But still, that won't explain why so many of the Yankees are positive. Maybe the majesty of being a Yankee makes you more likely to take time to survey your kingdom. :)

Were prime time games and Fox Saturday games taken into consideration? The reason I ask is that some claim the commercials during those games are longer than normal. Since the Yankees & Red Sox played in more prime time & Fox games over the last decade, maybe this was a factor?

Someone else suggested that ... how much longer are the commercials than normal?

Suppose there are 40 games like that a year, and each is 8 or 9 minutes longer. That means each team would get 1.3 games if they were distributed equally. Suppose the Yankees get 6 of those games instead of 1.3. They roughly get 5 more than anyone else. That's about 41 minutes in 162 games, which is 0.25 minutes per game.

That could be part of it, if someone can confirm the commercial breaks really are longer.

Looking at the Yankees' May schedule, I see 5 games on ESPN, 2 on FOX and 2 on TBS. That's 9 games out of 29. If that rate holds up over the 162-game schedule, it means they are on national TV about 50 times.

According to this Yahoo report (http://sports.yahoo.com/mlb/news;_ylt=Am_N6HucrWr7s26X2P6vLNgRvLYF?slug=ti-westpace041510):

"(The typical allotment between innings is two minutes, five seconds. The commissioner’s office allows the networks – ESPN, TBS, Fox – an additional 30 seconds for commercials.)"

There are a lot more than 40 games on ESPN, TBS and FOX, not sure how many. We'd need to know that to calibrate for the average team, but I wonder if those 50 games that the Yankees play on National TV (almost definitely a lot more than the average team) is the reason for Yankee slowness.
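A quick sketch of the arithmetic in this comment. The projection uses the May counts above; the per-game figure assumes the extra 30 seconds applies to every break between half-innings of a full nine-inning game (17 breaks, or 16 if the bottom of the ninth isn't played), which is an assumption rather than something the Yahoo piece spells out.

```python
MAY_NATIONAL_GAMES = 9
MAY_GAMES = 29
SEASON_GAMES = 162

# Projected national TV games if the May rate held all season.
projected_national = MAY_NATIONAL_GAMES / MAY_GAMES * SEASON_GAMES

# Extra time per nationally televised game, in minutes.
BREAKS_PER_GAME = 17          # 18 half-innings -> 17 breaks between them
EXTRA_SECONDS_PER_BREAK = 30  # the additional network allotment
extra_minutes_per_game = BREAKS_PER_GAME * EXTRA_SECONDS_PER_BREAK / 60

print(f"projected national TV games: {projected_national:.1f}")
print(f"extra minutes per such game: {extra_minutes_per_game:.1f}")
```

That works out to about 50 games and 8.5 extra minutes per game, in line with the "8 or 9 minutes" guessed earlier in the thread.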

Yes, if there are that many games with longer between-inning times, that might explain a big part of the discrepancy.

But wouldn't "Fox" also refer to the regional FSNs? And didn't TBS broadcast a lot of Braves games in the early part of the decade?

Still, that might explain why Derek Jeter's times are so long ... his missed games were mostly in 2003, when maybe MLB didn't give them that extra 30 seconds.

One more thing is that Mike Fast's analysis of NYA/BOS games vs. SEA/TEX games showed that time between pitches was a much bigger factor than time between innings ... and that's only the last three seasons, so any "extra 30 seconds" factor would be fully included there.

Okay, this took a little longer than I planned and maybe I've simplified too much. Anyway, let me throw it out there.

I constructed a really simple regression to try to understand whether the team and player effects combine.

Here's what I did. I created 3 teams, each with 20 players, playing 162 games. I gave each player a 'true talent' time they add to a game. For simplicity I gave them a uniform distribution, randomly generated, between -5 mins and +5 mins. I chose 10 players to start each game (randomly generated). So over the course of 162 games each player should play 81. I simulated 3 seasons and assumed that no player will move teams.

I also give each game a baseline time of 100 minutes. To calculate the game time I add 100 minutes to all the player minutes adjusting for player time - i.e. which players started.

There are no other team effects included. To ensure different game times I add extra minutes to the game (from 0 to 25 minutes). So in aggregate the average game time should be 112.5 minutes.

I ran a regression with minutes as the dependent variable and with the teams and starter players as dummy variables.

Here are some findings:

1/ The player coefficients matched true talent reasonably well, and were stabilizing. The team coefficients didn't stabilize - i.e. high standard error ...

I was half expecting the team coefficients to capture some of the player true talent - i.e., show that the team and player effects mix but this didn't happen

2/ When I add in an extra 5 minutes, say, for Team 3 - this, as you'd expect directly affects the Team 3 coefficient -- suggesting that the player and team coefficients are working

So in summary the short model is working as you, Phil, thought it would.
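For anyone who wants to try a version of this, here's a simplified, standard-library-only sketch of the simulation described above. It is not the commenter's actual code. It uses a single team, leaves out the team dummy and the intercept (with exactly 10 of the 20 players starting every game, both would be exact linear combinations of the player dummies), and compares coefficients to true talent after centering each on its own mean. The seed and the function names are arbitrary.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def simulate(n_players=20, n_starters=10, n_games=3 * 162, seed=1):
    """Simulate games and fit least squares on starter dummies (no intercept)."""
    rng = random.Random(seed)
    talent = [rng.uniform(-5, 5) for _ in range(n_players)]
    xtx = [[0.0] * n_players for _ in range(n_players)]  # X'X
    xty = [0.0] * n_players                              # X'y
    for _ in range(n_games):
        starters = rng.sample(range(n_players), n_starters)
        # 100-minute baseline, plus starters' talent, plus 0-25 minutes of noise.
        minutes = 100 + sum(talent[p] for p in starters) + rng.uniform(0, 25)
        for p in starters:
            xty[p] += minutes
            for q in starters:
                xtx[p][q] += 1.0
    return talent, solve(xtx, xty)


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


talent, coef = simulate()
errors = [abs(t - c) for t, c in zip(centered(talent), centered(coef))]
print(f"mean absolute error: {sum(errors) / len(errors):.2f} minutes")
```

The raw coefficients absorb a share of the baseline and the average noise, so only the differences between players are meaningful here, which is the same reason the study re-centers each player against his group.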

****

So, given that, is there anything else that could be causing the NYY team coefficient to underplay what it should be?

I think it is hard to say unless you do the pairwise comparisons. And the only reason I can think of that the Yankees should suffer from this issue is because they are more successful - they score more runs, are more in contention etc. than other teams (more regularly) ... but you seem to control for all that.

The Fox TV ads may have something to do with it. Perhaps it is only the Fox national channels which have longer ad breaks and FSN doesn't?

Even that though you'd expect to find in the team variable.

Hey Phil: Very nice study. To the extent you may have missing variables of any significance (and you may not), I would think the biggest element would be how often a runner is on first with 2nd open. Throws to 1B and stepping off can add a fair amount of time, and SBAs may not fully capture that. So players who hit a lot of singles and/or draw NIBBs, and are fast enough to be SB threat, may lengthen games. Similarly, IFs who allow more singles/errors, and turn fewer GDPs, create that same situation more often on defense. (And Derek Jeter fits all those criteria!)

So it might be that including singles/ROEs and BBs in the model would strengthen it. I also wonder if including DPs might help -- the time effect is probably fully captured by other variables (PA, pitches, runs), but maybe not.

Thanks, Guy. Maybe I'll add pickoff throws -- even if they're not perfect, they're better than nothing -- and the other stuff you suggest.

It can't hurt to add more stuff ... if it's not significant, you haven't lost anything.

BB/1B, pickoff throws, maybe DPs if they're in the game logs ... and day of week, in case nationally-televised games with longer inning breaks happen mostly on weekends.

If it turned out that games with Derek Jeter turned out to be longer because of his *defense* ... that would be pretty cool. I doubt it's true, but that won't stop me from contemplating just how interesting it would be.

May as well throw in HRs as well. They might have a negative effect on game length, once you account for runs and PAs.

Sure, why not!

Perhaps you could sue, Phil

http://www.baseballprospectus.com/article.php?articleid=10753

John,

Nice experiment, thanks! That's what I thought.

Now, do you have any ideas why Russell's regression comes up with a coefficient of .742 for inning breaks, when we know an inning break is at least two minutes long?

Seems like if the regression can distinguish between player effects and team effects, it should be able to distinguish between inning breaks and other causes ...

OK, done. Added weekend dummy, 1B+BB, pickoff throws, HR, GIDP. Pickoff throws were scarce for 2000-2004, but that would just mostly bump up the year dummies for those years.

Day of week (weekend vs. not) was not significant. Everything else was, as expected. The coefficients for players and teams still appear to be pretty close to what they were in the original regression.

Results available on request.

Phil

Can you send me the coefficients - keen to see. I think you have my email. If not it is jrbeamer at gmail - thanks.

A couple of thoughts on the inning coefficient. I think someone said elsewhere, I can't remember where, that perhaps this is driven by lack of variance in the variable, since 90% of games will be either 8.5 or 9 innings, so that coefficient gets subsumed somewhere. Now that makes sense to me.

Only issue is that I'm not sure where. The ONLY place I can even partly justify it is in the PA and pitches variables. Because of the above that's where it gets subsumed - i.e., we know that 40 PA = 9 innings, or whatever. And same on the pitch variable - they work together (the PA variable doesn't work in isolation; you need to use pitches AND PA to get a semi-sensible answer, i.e., a four pitch PA = 4*0.4 + 0.6 = 2.2 mins ... that sounds about right). Could that be a touch high so be taking some of the inning coefficient too??

I see from your xl that for your half-inning variable you got a -0.15 coefficient, which seems odd too. Is that tracking the same thing as Pizza's inning variable?? Assume it is.

Probably the right list of variables there if you wanted to be fully independent is something like:

- Pitches per PA

- PA per inning

- Inning

although that makes the results a bit harder to interpret.

Interesting to note that between inning pitcher changes adds 3 minutes, which is probably about right (if not marginally high).

Yeah, I originally thought the low variance in innings might be part of it, but now I'm not sure. Because there's also low variance in all the player dummies -- there are fewer games with Denard Span in them than there are extra-inning games.

Your suggestion that PAs must be "stealing" some of the inning value sounds reasonable ... but why does that happen? I bet if you did a simulation like the one you did for players/teams, where a PA takes X minutes and an inning takes Y minutes, you'd get that exact result coming out. If that's true, which it might not be, why doesn't that happen in real life like it would in the simulation?

Ah! I think I used full innings instead of half-innings (so a game that goes 8.5 innings went in as 9). Maybe that was enough fuzziness to confuse the regression? I'll check.
