Monday, April 21, 2014

Which players are making baseball games so slow?

Who are the fastest and slowest players in baseball, in terms of keeping the games moving along?

Four years ago, I wrote two posts that tried to answer that question.  I took Retrosheet game logs from 2000 to 2009, and ran a huge regression, trying to predict game time (hours and minutes) from the 18 players in the starting lineups, the two starting pitchers, the two teams, the years, the number of pitches, the number of stolen base attempts, and anything else I could think of.  
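As a sketch of how a regression like that can recover individual effects from lineup dummy variables, here's a toy version with made-up players, effect sizes, and game counts (not my actual data or model), using ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three "players", each either in or out of a game's
# starting lineup, plus an intercept.  True effects are minutes added per game.
true_effects = np.array([4.0, -4.0, 0.0])   # made-up numbers
base_time = 175.0                           # assumed average game length, minutes

n_games = 2000
# Dummy variables: 1 if the player was in the starting lineup, else 0.
X = rng.integers(0, 2, size=(n_games, 3)).astype(float)
y = base_time + X @ true_effects + rng.normal(0, 12, n_games)

# Add an intercept column and solve by least squares.
design = np.column_stack([np.ones(n_games), X])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)

print(coefs)  # roughly [175, 4, -4, 0] given enough games
```

With enough games, the estimated coefficients converge on the true per-player effects; the real problem, as discussed below, is that teammates' dummies are highly correlated because they appear in the lineup together.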

It turned out that players appeared to vary quite a bit -- at the extremes, it appeared that some batters could affect game time by as much as four minutes, and pitchers as much as seven minutes.  For the record, here are the "fastest" and "slowest" players I found, in minutes and seconds they appear to add:

+4:10 Slowest batter:  Denard Span 
-4:11 Fastest batter:  Chris Getz

+7:43 Slowest pitcher: Matt Garza 
-7:13 Fastest pitcher: Tim Wakefield

+5:45 Slowest catcher: Gary Bennett
-4:39 Fastest catcher: Eddie Perez

A full spreadsheet of my 2010 estimates is here.


Last week, Carl Bialik, of FiveThirtyEight, revisited the question of slow games, and I learned that FanGraphs now has real data available.  They took timestamped PITCHf/x game logs, and calculated the average time between pitches, for each batter and pitcher.  They call that statistic "pace," and it's available for 2007 to 2014.

So, now, there's some hard data to verify my numbers with.  

I found 290 batters who were in both my study and the Fangraphs data.  Then, I ran a regression to predict FanGraphs' number from mine.

It turned out that for every extra minute per game that I found, FanGraphs found only about half a second per pitch (0.4778, to be exact).  So if I have a player at +2 minutes per game, you'd expect FanGraphs to have clocked him at about +0.96 seconds per pitch.

How many pitches does a starting batter see in a game? Maybe 16?  So, I have 120 seconds, and FanGraphs has about 15 seconds.  That's a factor-of-eight difference!  
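The mismatch can be checked with quick arithmetic (the 16 pitches per game is the rough guess from the text):

```python
slope = 0.4778          # FanGraphs seconds/pitch per minute/game, from the regression
my_minutes = 2.0        # a batter my study has at +2 minutes per game
pitches_per_game = 16   # rough guess for a starting batter

my_seconds = my_minutes * 60                               # 120 seconds
fangraphs_seconds = my_minutes * slope * pitches_per_game  # about 15.3 seconds

print(round(my_seconds / fangraphs_seconds, 1))  # 7.8 -- the "factor of eight"
```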

But, hang on.  The correlation between the two measures is surprisingly high -- +0.40.  That suggests that, generally, I got the *order* of the players right.  Generally, the regression was successful in separating the fast batters from the slow batters.

If you look at the 20 players Fangraphs has as slowest, my study estimated 19 of them as slower than average.   (Wil Nieves was the exception.)  

For their fastest 19 batters (I skipped over the six-way tie for 20th), my study wasn't quite as good.  It found only 11 of 19 as faster than average.  But, of Fangraphs' next 19 fastest, my study was 15-4.  

My feeling is ... it's not great, but it's not bad.  I'm willing to argue that the regression is reasonably capable of differentiating the speedy batters from the time-wasters. 


For pitchers, the fit was much better.  

The correlation was +0.73, much higher than I had expected. And the units were decent, too.  For every minute of slowness per game the regression found, it translated to 42 seconds in Fangraphs (based on a guess of 94 pitches per start).  
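Under the same assumption from the text (94 pitches per start), the implied per-pitch slope and the agreement ratio can be checked:

```python
fangraphs_seconds = 42.0   # FanGraphs seconds per start, per minute/game found by the regression
my_seconds = 60.0          # one minute per game, in seconds
pitches_per_start = 94     # assumed pitches per start, from the text

implied_pace = fangraphs_seconds / pitches_per_start  # ~0.45 sec/pitch per minute/game
print(round(my_seconds / fangraphs_seconds, 2))  # 1.43 -- far better than the factor of 8 for batters
```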

Why did the pitchers come out so much better than the hitters?  My guess: it's hard to separate the effect of Derek Jeter from the other eight batters, because they play together so much of the time.  In contrast, for every starting pitcher, there are at least 120 games without him where the rest of the lineup is almost the same.  That makes the differences much more obvious for the regression to figure out.

(BTW, Derek Jeter was the second "slowest" batter in my study, at +3.78 minutes per game.)


But, still ... why is the regression so far off for the batters, by a factor of 8?

No doubt, some of it is the effects of random luck.  But I think the most important factor is that time between pitches isn't the only factor affecting the length of a game.  

In his article, Carl Bialik noted that from 2008 to 2014, Yankees games took 12.8 minutes longer, on average, than Blue Jays games.  But, after adjusting for the pace of the hitters and pitchers, there were still 6.5 minutes unexplained.  

For the Red Sox, it was +11.8 minutes overall, and +3.0 minutes after adjusting for pace.

That's just a two-team sample, but "pace" appears to be only about a half to two-thirds of the reason games take so long.

What's the rest?  

There are lots of possibilities. Time between innings?  16 half-inning breaks, an extra 5 seconds each, adds up to 80 seconds.  Seventh-inning stretches, I bet those take longer in some parks than others.  The time it takes to announce a batter ... are some parks faster than others, and does the batter wait for it?  Does it take five extra seconds for crowd cheering when Derek Jeter bats?  What about defense ... how long it takes to get the ball back to the pitcher after an out?  Do some outfielders take their time throwing the ball back? 

To check for the park-related factors, I re-ran my regression, but, this time, I used an extra dummy variable for home/road. And, indeed, some significant differences showed up.  Even after controlling for everything else, including the players, games take an extra 1.9 minutes at Yankee Stadium, and an extra 1.4 minutes at Fenway Park.

And ... it looks like there might be a pattern: larger-market teams appear to host slower games than small-market teams. Here are the teams with the largest unexplained slowness:

+1.91 Yankees
+1.88 Nationals
+1.71 Braves
+1.45 Red Sox
+1.40 Dodgers

And the most unexplained quickness:

-2.94 Blue Jays
-2.45 Giants
-2.25 A's
-1.83 Tigers
-1.17 Cubs

Maybe it's not a perfect pattern, but it's still suggestive.


So, that's some support for the idea that there is some kind of park-related effect.  It doesn't explain why the batter numbers are so extreme, since this is after controlling for players.  But, it could still be part team and part player, if (say) the fans cheer an extra 10 seconds for any Yankee batter, but an extra 30 seconds for Derek Jeter.  

Another thing that might be happening: there are many players who miss only a few games.  I checked Derek Jeter in 2010.  He started all but seven games, and every one of those seven was on the road.  Could the regression be conflating Jeter games with home games?  (It doesn't really show up as a huge confidence interval in the regression, though.)

Or, maybe everyone works faster in meaningless end-of-season games, and those are the ones Derek Jeter is more likely to sit out.  

You can probably think of other possibilities, some being a real effect, and some a statistical artifact.  Or, I might have made a mistake somewhere.  

But, I suspect there's something real going on, something other than what's measured by "pace." And, the regression seems to be assigning a lot of it to individual batters -- with some to the team in general, and some to the home park.

I'm not sure what it is, though. 


Monday, April 14, 2014

Accurate prediction and the speed of light III

There's a natural "speed of light" limitation on how accurate pre-season predictions can be.  For a baseball season, that lower bound is 6.4 games out of 162.  That is, even if you were to know everything that can possibly be known about a team's talent -- including future injuries -- the expected SD of your prediction errors could never be less than 6.4.   (Of course, you could beat 6.4 by plain luck.)

Some commenters at FiveThirtyEight disagree with that position.  One explicitly argued that the limit is zero -- which implies that, if you had enough information, you could be expected to get every team exactly right.  That opinion isn't an outlier  -- other commenters agreed, and the original comment got five "likes," more than any other comment on the post where it appeared.


Suppose it *were* possible to get the win total exactly right. By studying the teams and players intently, you could figure out, for instance, that the 2014 Los Angeles Dodgers would definitely go 92-70.

Now, after 161 games, the Dodgers would have to be 91-70 or 92-69.  Either way, for them to finish exactly 92-70, you would have to *know*, before the last game, whether it would be a win or a loss.  If there were any doubt at all, there would be a chance the prediction wouldn't be right.

Therefore, if you believe there is no natural limit to how accurate you can get in predicting a season, you have to believe that it is also possible to predict game 162 with 100% accuracy.

Do you really want to bite that bullet?  

And, is there something special about the number 162?  If you also think there's no limit to how accurate you can be for the first 161 games ... well, then, you have the same situation.  For your prediction to have been expected to be perfect, you have to know the outcome of the 161st game in advance.

And so on for the 160th game, and 159th, and so on.  A zero "speed of light" means that you have to know the result of every game before it happens.  


From what I've seen, when readers reject the idea that the lowest error SD is 6.4, they're reacting to the analogy of coin flipping.  They think something like, "sure, the SD is 6.4 if you think every baseball game is like a coin flip, or a series of dice rolls like in an APBA game.  But in real life, there are no dice. There are flesh-and-blood pitchers and hitters.  The results aren't random, so, in principle, they must be predictable."

I don't think they are predictable at all.  I think the results of real games truly *are* as random as coin tosses.  

As I've argued before, humans have only so much control of their bodies. Justin Verlander may want to put the ball in a certain spot X, at a certain speed Y ... but he can't.  He can just come as close to X and Y as he can, and those discrepancies are random.  Will it be a fraction of a millimeter higher than X, or a fraction lower?  Who knows?  It depends on which neurons fire in his brain at which times.  It depends on whether he's distracted for a millionth of a second by crowd noise, or his glove slipping a bit.  It probably depends on where the seam of the baseball is touching his finger. (And we haven't even talked about the hitter yet.)

It's like the "chaos theory" example of how a butterfly flapping its wings in Brazil can cause a hurricane in Texas. Even if you believe it's all deterministic in theory, it's indistinguishable from random in practice.  I'd bet there aren't enough atoms in the universe to build a computer capable of predicting the final position of the ball from the state of Justin Verlander's mind while he goes into his stretch -- especially, to an extent where you can predict how Mike Trout will hit it, assuming you have a second computer for *his* mind.

What I suspect is, people think of the dice rolls as substitutes for identifiable flesh-and-blood variation. But they aren't. The dice rolls are substitutes for randomness that's already there. The flesh-and-blood variation goes *on top of that*.

APBA games *don't* consider the flesh-and-blood variation, which is why it's much easier to predict an APBA team's wins than a real-life team's wins.  In a game, you know the exact probabilities before every plate appearance.  In real life, you don't know that the batter is expecting fastballs today, but the pitcher is throwing more change-ups.  

The "speed of light" is *higher* in real-life than in simulations, not lower.


Now, perhaps I'm attacking a straw man here.  Maybe nobody *really* believes that it's possible to predict with an error of zero.  Maybe it's just a figure of speech, and what they're really saying is that the natural limit is much lower than 6.4.

Still, there are some pretty good arguments that it can't be all that much lower.

The 6.4 figure comes from the mathematics of flipping a fair coin.  Suppose you try to predict the outcome of a single flip. Your best guess is, "0.5 heads, 0.5 tails".  No matter the eventual outcome of the flip, your error will be 0.5.  That also happens to be the SD, in this case.  

(If you want, you can guess 1-0 or 0-1 instead ... your average error will still be 0.5.  That's a special case that works out for a single fair coin.)  

It's a statistical fact that the expected SD of the total error grows with the square root of the number of independent flips.  The square root of 162 is around 12.7. Multiply that by 0.5, and you get 6.364, which rounds to 6.4. That's it!
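That calculation fits in three lines:

```python
import math

per_flip_error = 0.5   # error on a single fair coin flip (also its SD)
n_games = 162

speed_of_light = per_flip_error * math.sqrt(n_games)
print(round(speed_of_light, 3))  # 6.364
```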

Skeptics will say ... well, that's all very well and good, but baseball games aren't independent, and the odds aren't 50/50!  

They're right ... but, it turns out, fixing that problem doesn't change the answer very much.

Let's check what happens if one team is a favorite.  What if the home team wins 60% of the time instead of 50%?

Well, in that case, your best bet is to guess the home team will have a one-game record of 0.6 wins and 0.4 losses.  Six times out of 10, your error will be 0.4.  Four times out of ten, your error will be 0.6.  The root mean square of (0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6) is around 0.4899. Multiply that by the square root of 162, and you get 6.235.  
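The root-mean-square step above can be verified directly; it comes out the same as sqrt(0.6 × 0.4):

```python
import math

# Predict 0.6 wins for the favorite: six errors of 0.4, four errors of 0.6.
errors = [0.4] * 6 + [0.6] * 4
rms = math.sqrt(sum(e * e for e in errors) / len(errors))  # equals sqrt(0.6 * 0.4)

print(round(rms, 4))                    # 0.4899
print(round(rms * math.sqrt(162), 3))   # 6.235
```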

That, indeed, is smaller than 6.364.  But not by much.  Still, the critics are right ... the assumption that all games are 50/50 does make for slightly too much pessimism.

Mathematically, the SD is always equal to the square root of (chance of team A winning)*(chance of team B winning)*(162).  That has its maximum value when each game is a 50/50 toss-up.  If there's a clear favorite, the SD drops;  the more lopsided the matchup, the lower the SD.  

But the SD drops very slowly for normal, baseball levels of competitive balance.  As we saw, a season of 60/40 games (60/40 corresponds to Vegas odds of +150/-150 before vigorish) only drops the speed of light from 6.36 to 6.23.  If you go two-thirds/one-third -- which is roughly the equivalent of a first-place team against a last-place team -- the SD drops to exactly 6.000, still pretty high.  

In real life, every game is different; the odds depend on the teams, the starting pitchers, the park, injuries, and all kinds of other things. Still, that doesn't affect the logic much. With perfect information, you'll know the odds for each individual game, and you can just use the "average" in some sense.  

Looking at the betting lines for tomorrow (April 15, 2014) ... the average favorite has an implied expected winning percentage of .568 (or -132 before vigorish).  Let's be conservative, and say that the average is really .600.  In that case, the "speed of light" comes out to 6.2 instead of 6.4.  

For an analogy, if you like, think of 6.4 as the speed of light in a vacuum, and 6.2 as the speed of light in air.  


What about independence?  Coin flips are independent, but baseball games might not be.  

That's a fair point.  If games are positively correlated with each other, the SD will increase; if they're negatively correlated, the SD will decrease.  

To see why: imagine that games are so positively correlated that every result is the same as the previous one.  Then, every team goes 162-0 or 0-162 ... your best bet is to predict each team at their actual talent level, close to .500. Your error SD will be around 81 games, which is much higher than 6.2.

More importantly: imagine that games are negatively correlated so that every second game is the opposite of the game before.  Then, every team goes 81-81, and you *can* predict perfectly.

But ... these are very extreme, unrealistic assumptions.  And, as before, the SD drops very, very slowly for less implausible levels of negative correlation.

Suppose that every second game, your .500 team has only a 40% chance of the same result as the previous game.  That would still be unrealistically huge ... it would mean an 81-81 team is a 97-65 talent after a loss, and a 65-97 talent after a win.  

Even then, the "speed of light" error SD drops only slightly -- from 6.364 to 6.235.  It's a tiny drop, for such a large implausibility.

But, yes, the point stands, in theory.  If the games are sufficiently "non-independent" of each other, you can indeed get from 6.4 to zero.  But, for anything remotely realistic, you won't even be able to drop your error by a tenth of a game. For that reason, I think it's reasonable to just go ahead and do the analysis as if games are independent.  


Also, yes, it is indeed theoretically possible to get the limit down if you have very, very good information.  To get to zero, you might need to read player neurons.  But how realistic is even a more modest goal, say, cutting your error in half?  How good would you need to be to get from 6.4 down to 3.2?  

*Really* good.  You'd need to be able to predict the winner 93.3% of the time ... 15 out of 16 games.
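That figure comes from inverting the SD formula, solving p(1-p) × 162 = 3.2² for the larger root:

```python
import math

target_sd = 3.2   # half the 6.4 "speed of light"
n_games = 162

# Solve p * (1 - p) = target_sd**2 / n_games, taking the larger root.
q = target_sd ** 2 / n_games
p = (1 + math.sqrt(1 - 4 * q)) / 2

print(round(p, 3))  # 0.932 -- roughly 15 games out of 16
```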


You have to be able to say something like, "well, it looks like the Dodgers are 60/40 favorites on opening day, but I know they're actually 90/10 favorites because Clayton Kershaw is going to be in a really good mood, and his curve will be working really well."  And you have to repeat that 161 times more.  And 90/10 isn't actually enough, overall ... you need to average around 93/7. 

Put another way: when the Dodgers win, you'd need to have been able to predict that 15/16 of the time.  And when the Dodgers lose, you'd need to have been able to predict that 15/16 of the time.  

That's got to be impossible.  

And, of course, you have to remember: bookies' odds on individual games are not particularly extreme.  If you can regularly predict winners with even 65% accuracy, you can get very, very rich.  This suggests that, as a practical matter, 15 out of 16 is completely out of reach.  

As a theoretical possibility ... if you think it can be done in principle, what kind of information would you actually need in order to forecast a winner 93% of the time?  What the starter ate for breakfast?  Which pitches are working and which ones aren't?  The algorithm by which the starter chooses his pitches, and by which each batter guesses?  

My gut says, all the information in the world wouldn't get you anywhere close to 93%.

Take a bunch of games where Vegas says the odds are 60/40.  What I suspect is: even if you had a team of a thousand investigative reporters, who can hack into any computer system, spy on the players 24 hours a day, ask the manager anything they wanted, and do full blood tests on every player every day ... you still wouldn't have enough information to pick a winner even 65 percent of the time.  

There's just too much invisible coin-flipping going on.


Tuesday, April 08, 2014

Predictions should be narrower than real life

Every year since 1983 (strike seasons excepted), at least one MLB team finished with 97 wins or more.  More than half the time, the top team had 100 wins or more.

In contrast, if you look at ESPN's 2014 team projections, their highest team forecast is 93-69.  

What's going on?  Does ESPN really expect no team to win more than 93 games?

Nope.  I bet ESPN would give you pretty good odds that some team *will* win 94 games or more, once you add in random luck.  

The standard deviation (SD) of team wins due to binomial randomness is around 6.4.  That means, on average, about six teams per season will be lucky by six wins or more (and about as many will be equally unlucky).  If you have a bunch of teams forecasted in the low 90s -- and ESPN has five of those -- chances are, one of them will get lucky and finish around a hundred wins.  

But you can't predict which teams will get that luck.  So, if you care only about getting the best accuracy of the individual team forecasts, you're always going to project a narrower range than the actual outcomes.

A more obvious way to see that is to imagine simulating a season by flipping coins -- heads the home team wins, tails the visiting team wins.  Obviously, any one team is as "good" as any other.  Under those circumstances, the best prediction you can make is that every team will go 81-81.  Of course, that probably won't happen, and some teams, by just random chance, will go 92-70, or something.  But you don't know which teams those will be, and, since it's just luck, there's no way of being able to guess.  

It's the same logic for real baseball.  No *specific* team can be expected to win more than 93 games.  Some teams will probably win more than 96 by getting lucky, but there's no way of predicting which ones.


That's why your range of projections has to be narrower than the expected standings.  How much narrower?

Over the past few seasons, the SD of team wins has been around 11.  Actually, it fluctuates a fair bit (which is expected, due to luck and changing competitive balance).  In 2002, it was over 14.5 wins; in 2007, it was as low as 9.3.  But 11 is pretty typical.

Since a team's observed performance is the sum of talent and luck, and because talent and luck are independent, 

SD(observed)^2 = SD(talent)^2 + SD(luck)^2.

Since SD(observed) equals 11, and SD(luck) = 6.4, we can figure that, after rounding,

SD(talent) = 9
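The subtraction works in squares, not in wins; as a quick check:

```python
import math

sd_observed = 11.0   # typical SD of observed team wins
sd_luck = 6.4        # binomial luck over 162 games

sd_talent = math.sqrt(sd_observed ** 2 - sd_luck ** 2)
print(round(sd_talent))  # 9
```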

So: if a season prediction has an SD that's significantly larger than 9, that's a sign that someone is trying to predict which teams will be lucky.  And that's impossible.  


As I wrote before, it's *really* impossible, as in, "exceeding the speed of light" impossible.  It's not a question of just having better information about the teams and the players.  The "9" already assumes that your information is perfect -- "perfect" in the sense of knowing the exact probability of every team winning every game.  If your information isn't perfect, you have to settle for even less than 9.

Let's break down the observed SD of 11 even further.  Before, we had

Observed = talent + luck

We can change that to:

Observed = talent we can estimate + talent we can't estimate + luck

Clearly, we'd be dumb to try to estimate talent we don't know about -- by definition, we'd just be choosing random numbers.  What kind of things are there that affect team talent that we can't estimate?  Lots:

-- which players will get injured, and for how long?
-- which players will blossom in talent, and which will get worse?
-- how will mid-season trades change teams' abilities?
-- which players will the manager choose to play more or less than expected?

How big are those issues?  I'll try guessing.

For injuries: looking at this FanGraphs post, the SD of team player-days lost to injury seems to be around 400.  If the average injured player has a WAR of 2.0, that's an SD of about 5 wins (400 player-days is around 2.5 player-seasons).  

But that's too high.  A WAR of 2.0 is the standard estimate for full-time players, but there are many part-time players whose lost WAR would be negligible.  The Fangraphs data might also include long-term DL days, where the team would have had time to find a better-than-replacement substitute.

I don't know what the right number is ... my gut says, let's change the SD to 2 wins instead of 5.

What about players blossoming in talent?  I have no idea.  Another 2 wins?  Trades ... call it 1 win?  And, playing time ... that could be significant.  Call that another 2 wins.

Use your own estimates if you think I don't know what I'm talking about (which I don't).  But for now, we have:

9 squared equals
 -- SD of known talent squared +
 -- 2 squared [injuries] +
 -- 2 squared [blossoming] +
 -- 1 squared [trades] +
 -- 2 squared [playing time].

Solving, and rounding, we get 

SD of known talent = 8
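The arithmetic above, as a sketch (the win estimates for each unknowable are the guesses from the text):

```python
import math

sd_talent = 9.0
unknowables = {"injuries": 2, "blossoming": 2, "trades": 1, "playing time": 2}

known_var = sd_talent ** 2 - sum(v ** 2 for v in unknowables.values())
sd_known_talent = math.sqrt(known_var)

print(round(sd_known_talent))  # 8
```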


What that means is: the SD of your predictions shouldn't be more than around 8.  If it is, you're trying to predict something that's random and unpredictable.

And, again: that calculation is predicated on *perfect knowledge* of everything that we haven't listed here.  If you don't have perfect knowledge -- which you don't -- you should be even lower than 8.

In a sense, the SD of your projections is a measure of your confidence.  The higher the SD, the more you think you know.  A high SD is a brag.  A low SD is humility.  And, a too-high SD -- one that violates the "speed of light" limit -- is a sign that there's something wrong with your methodology.


What about an easy, naive prediction, where we just project based on last year's record?  

This blog post found a correlation of .58 between team wins in 2012 and 2013.  That would suggest that, to predict next year, you take last year, and regress it to the mean by around 42 percent.  

If you do that, your projections would have an SD of 6.38.  (It's coincidence that it works out nearly identical to the SD of luck.)

I'd want to check the correlation for other pairs of years, to get a bigger sample size for the .58 estimate.  But, 6.38 does seem reasonable.  It's less than 8, which assumes excellent information, and it's closer to 8 than 0, which makes sense, since last year's record is still a pretty good indicator of how good a team will be this year.
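The naive method's SD follows directly from the regression-to-the-mean step:

```python
r = 0.58             # year-to-year correlation of team wins
sd_observed = 11.0   # SD of observed team wins

# Regressing last year's record toward the mean by (1 - r) shrinks the
# spread of the projections by the factor r.
sd_projection = r * sd_observed
print(round(sd_projection, 2))  # 6.38
```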


A good practical number has to be somewhere between 6.38 (where we only use last year's record), and 8 (where we have perfect information, everything that can truly be known). 

Where in between?  For that, we can look to the professional bookmakers.  

I think it's safe to assume that bookies pretty much need to have the most accurate predictions of anyone.  If they didn't, smart bettors would bankrupt them.

The Bovada pre-season Over/Under lines had a standard deviation of 7.16 wins.  The Las Vegas Hotel and Casino also came in at 7.16.   (That's probably coincidence -- their lines weren't identical, just the SD.)

7.16 seems about right.  It's almost exactly halfway between 6.38 and 8.00.

If we accept that number, it turns out that more of the season is unpredictable than predictable.  The SD of 11 wins per team comes from 7.2 wins that the sports book can figure out, and 8.3 that the sports book *can't* figure out (and is probably unknowable).  
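The unpredictable portion is the same Pythagorean subtraction as before, this time against the Vegas SD:

```python
import math

sd_observed = 11.0
sd_vegas = 7.16   # SD of the Bovada / LVH win projections

sd_unpredictable = math.sqrt(sd_observed ** 2 - sd_vegas ** 2)
print(round(sd_unpredictable, 2))  # 8.35 -- the ~8.3 in the text
```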


So, going back to ESPN: how did they do?  When I saw they didn't predict any teams higher than 93 wins, I suspected their SD would come out reasonable.  And, yes, it's OK -- 7.79 wins.  A little bit immodest, in my judgment, but not too bad.

I decided to check some others.  I did a Google search to find all the pre-season projections I could, and then added the one I found in Sports Illustrated, and the one from "ESPN The Magazine" (without their "chemistry" adjustments).  

Here they are, in order of [over]confidence:

  11.50 Sports Illustrated
   9.23 Mark Townsend (Yahoo!)
   9.00 Speed of Light (theoretical est.)
   8.76 Jeff Passan (Yahoo!)
   8.72 ESPN The Magazine 
   8.53 Jonah Keri (Grantland)
   8.51 Sports Illustrated (runs/10)
   8.00 Speed of Light (practical est.)
   7.83 Average ESPN voter (FiveThirtyEight)
   7.79 ESPN Website
   7.78 Mike Oz (Yahoo!)
   7.58 David Brown (Yahoo!)
   7.16 Vegas Betting Line
   6.90 Tim Brown (Yahoo!)
   6.38 Naive previous year method (est.)
   5.55 Fangraphs (4/7/14)
   0.00 Predict 81-81 for all teams

Anything that's not from an actual prediction is an estimate, as discussed in the post.  Even the theoretical "speed of light" is an estimate, since we arbitrarily chose 11.00 as the SD of observed wins.  None of the estimates are accurate to two decimals (or even one decimal), but I left them in to make the chart look nicer.

Sports Illustrated ran two sets of predictions: wins, and run differential.  You'd think they would have predicted wins by using Pythagoras or "10 runs equals one win", but they didn't.  It turns out that their runs estimates are much more reasonable than their win predictions.  

Fangraphs seems way too humble.  I'm not sure why.  Fangraphs updates their estimates every day, for the season's remaining games.  At time of writing, most teams had 156 games to go, so I pro-rated everything from 156 to 162.  Still, I think I did it right; their estimates are very narrow, with no team projected better than 90-72.

FiveThirtyEight got access to the raw projections of the ESPN voters, and ran a story on them.  It would be fun to look at the individual voters, and see how they ranged, but FiveThirtyEight gave us only the averages (which, after rounding, are identical to those published on the ESPN website).

If you have any others, let me know and I'll add them in.  


If you're a forecaster, don't think you need to figure out what your SD should be, and then adjust your predictions to it.  If you have a logical, reasonable algorithm, it should just work out.  Like, when we realized we had to predict 81-81 for every team in the coin-flip example.  We didn't need to say, "hmmm, how do I get SD to come out to zero?"  We just realized that we knew nothing, so 81-81 was the right call.  

The SD should be a check on your logic, not necessarily part of your algorithm.


Related links: 

-- Last year, Tango and commenters discussed this in some depth and tested the SDs of some 2013 predictions.  

-- Here's my post on why you shouldn't judge predictions by how well they match outliers.  

-- Yesterday, FiveThirtyEight used this argument in the context of not wanting to predict the inevitable big changes in the standings.

-- IIRC, Tango had a nice post on how narrow predictions and talent estimates should be, but I can't find it right now.  
