Sabermetric Research: A breakdown of the luck in MLB season records

If you take a .500 baseball team and flip it 162 times, you should expect it to come up "win" 81 times. But that will vary -- sometimes it'll win fewer, and sometimes it'll win more. The number of wins follows a binomial distribution, which is very close to a normal distribution with a mean of 81 and a standard deviation of 6.36 (the square root of 162 times 0.5 times 0.5).
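To make the coin-flip picture concrete, here is a minimal Python sketch that computes the binomial standard deviation directly and checks it by simulating many 162-game seasons for a true .500 team. The function names and the seed are illustrative choices, not anything from the original analysis.

```python
import math
import random
import statistics

GAMES = 162
P_WIN = 0.5  # a true .500 team


def theoretical_sd(games: int = GAMES, p: float = P_WIN) -> float:
    """Binomial standard deviation of season wins: sqrt(n * p * (1 - p))."""
    return math.sqrt(games * p * (1 - p))


def simulated_sd(seasons: int = 10_000, seed: int = 1) -> float:
    """Play many 162-game coin-flip seasons and measure the spread of win totals."""
    rng = random.Random(seed)
    wins = [sum(rng.random() < P_WIN for _ in range(GAMES)) for _ in range(seasons)]
    return statistics.pstdev(wins)


print(f"theoretical SD: {theoretical_sd():.2f} wins")  # 6.36
print(f"simulated SD:   {simulated_sd():.2f} wins")
```

The simulated figure should land close to 6.36, drifting a little with the seed and the number of seasons.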

Using the rule of thumb that 95% of observations fall within two standard deviations of the mean, you can figure that, around one time in 20, that team will win 94 or more games, or 68 or fewer, just by luck alone.
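The "one time in 20" figure comes from the normal approximation. As a sanity check, this short sketch sums the exact binomial probabilities for a .500 team in both tails (94+ wins or 68 or fewer). The helper names are illustrative.

```python
from math import comb

GAMES = 162


def prob_at_least(k: int, n: int = GAMES) -> float:
    """P(a .500 team wins k or more of n games), exact binomial with p = 0.5."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def prob_at_most(k: int, n: int = GAMES) -> float:
    """P(a .500 team wins k or fewer of n games)."""
    return sum(comb(n, j) for j in range(0, k + 1)) / 2 ** n


# Two SDs is about 12.7 wins, so the tails start at 94 wins and at 68 wins.
two_tailed = prob_at_least(94) + prob_at_most(68)
print(f"P(94+ or 68-): {two_tailed:.3f}")  # roughly 0.05, i.e. about 1 in 20
```

Because p = 0.5, the two tails are mirror images (94 = 162 - 68), so each contributes half of the total.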

Where does that luck show up?

The way I see it, you can break it up into five mutually exclusive categories (as I described in a previous post):

1. The team's hitters could have better or worse performances than their talent expectation -- that is, "career years" in either direction -- in terms of their basic batting line.

2. The team's pitchers could do the same (that is, the opposing team's batters could have "career years").

3. The team could score more (or fewer) runs than expected from its composite batting line. In other words, it could beat its Runs Created (or Linear Weights, or Base Runs) estimate. That usually happens if the team hits better (or worse) than expected in high-leverage situations in terms of run scoring -- for instance, with the bases loaded and two outs.

4. The opposition could do the same.

5. The team could win more (or fewer) games than expected from the number of runs it scored and allowed. In other words, it could beat its Pythagorean projection (or, alternatively, the "10 runs equals a win" rule of thumb). That usually happens when a team scores more runs in high-leverage situations in terms of game outcome -- for instance, with the score tied in the ninth inning.

If my logic is right, those five calculations cover all the binomial luck, and none of them overlap (that is, no luck is counted twice).

The question I spent the last couple of days looking at is: how does the overall variation break down into the five parts? That is, which of the five types is most important? It wasn't that hard to figure out; I'm not sure why I didn't do it years ago.

------

First, the "career years" thing. How do you figure that out? Well, it's pretty easy to get an estimate. I took the overall MLB stats for the 1984 season (1984 chosen arbitrarily), and divided by 26 to get a team average. It worked out that there were 25.27 hitless at-bats per game, per team. (It's not 27 because of bottom-of-the-ninth issues, outs made on base, double plays, and so on.)

So, I ran a little simulation. I generated random plate appearances, with league-average probabilities, until I got to 25.27 times 162 batting outs. Then, I calculated the Linear Weights runs for that simulated batting line. (I used weights of .47/.85/1.02/1.4/.33 for 1B/2B/3B/HR/BB. The value assigned to the out didn't matter here, because every simulated season had the same number of outs.)

In that simulation, the standard deviation of runs was 31.9. So, that's my estimate of how much random variation there is in terms of "career years". It's 31.9 for the team's hitters, and 31.9 for the team's pitchers.
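The simulation described above can be sketched in Python roughly as follows. The Linear Weights values and the 25.27-outs-per-game figure come from the post, but the per-plate-appearance event rates in `RATES` are hypothetical placeholders (only roughly the right size for the 1980s), not the actual 1984 league numbers, so the resulting SD will not necessarily match 31.9.

```python
import bisect
import random
import statistics
from itertools import accumulate

# Linear Weights values from the post for 1B, 2B, 3B, HR, BB. Outs get no value,
# because every simulated season ends after the same number of outs.
WEIGHTS = {"1B": 0.47, "2B": 0.85, "3B": 1.02, "HR": 1.4, "BB": 0.33}

# ASSUMPTION: placeholder per-plate-appearance rates, only roughly 1980s-sized.
# They are NOT the real 1984 league figures; swap in the actual rates.
RATES = {"1B": 0.160, "2B": 0.040, "3B": 0.006, "HR": 0.023, "BB": 0.080}

OUTS_PER_SEASON = round(25.27 * 162)  # hitless at-bats per team-season

EVENTS = list(RATES)
VALUES = [WEIGHTS[e] for e in EVENTS]
# Cumulative probabilities; a uniform draw above the last threshold is an out.
THRESHOLDS = list(accumulate(RATES[e] for e in EVENTS))


def simulate_season(rng: random.Random) -> float:
    """Generate plate appearances until the season's outs are used up; return LW runs."""
    outs = 0
    runs = 0.0
    while outs < OUTS_PER_SEASON:
        i = bisect.bisect_right(THRESHOLDS, rng.random())
        if i == len(EVENTS):
            outs += 1
        else:
            runs += VALUES[i]
    return runs


def career_year_sd(seasons: int = 300, seed: int = 1984) -> float:
    """Standard deviation of Linear Weights runs across simulated team-seasons."""
    rng = random.Random(seed)
    return statistics.stdev([simulate_season(rng) for _ in range(seasons)])


print(f"SD of simulated LW runs: {career_year_sd():.1f}")
```

Fixing the number of outs, rather than the number of plate appearances, mirrors the post's setup: a team's season ends when it runs out of outs, so luck shows up as more or fewer non-out events along the way.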

------

For beating the "runs created" estimate, I looked at real-life data. Actually, I used Linear Weights instead, because I think it's more accurate. I used the above weights for the basic events, and I calculated the value of the batting out separately for each season. (I probably should have done it by league-season, but I don't feel like redoing it.)

I think I did 1960 to 2001, omitting strike seasons.

The standard deviation of Linear Weights luck was 23.9 runs. I did only batting, because I didn't have detailed statistics handy for opposition batting. But, I'm going to assume that would have worked out about the same.
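One way to compute Linear Weights luck from real team-seasons is sketched below. The post doesn't say exactly how the out value was derived, so this assumes it is chosen each season so that estimated runs equal actual runs across the whole league; the `TeamSeason` record and the helper names are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

WEIGHTS = {"1B": 0.47, "2B": 0.85, "3B": 1.02, "HR": 1.4, "BB": 0.33}


@dataclass
class TeamSeason:  # hypothetical record layout
    year: int
    singles: int
    doubles: int
    triples: int
    homers: int
    walks: int
    outs: int  # batting outs (hitless at-bats)
    runs: int  # actual runs scored

    def event_runs(self) -> float:
        return (WEIGHTS["1B"] * self.singles + WEIGHTS["2B"] * self.doubles
                + WEIGHTS["3B"] * self.triples + WEIGHTS["HR"] * self.homers
                + WEIGHTS["BB"] * self.walks)


def out_value(teams: list) -> float:
    """ASSUMPTION: out weight chosen so estimated runs equal actual runs league-wide."""
    runs = sum(t.runs for t in teams)
    events = sum(t.event_runs() for t in teams)
    outs = sum(t.outs for t in teams)
    return (runs - events) / outs  # typically a negative number


def lw_residuals(team_seasons: list) -> list:
    """Actual runs minus Linear Weights estimate, with a separate out value per season."""
    by_year = defaultdict(list)
    for t in team_seasons:
        by_year[t.year].append(t)
    residuals = []
    for teams in by_year.values():
        w_out = out_value(teams)
        for t in teams:
            residuals.append(t.runs - (t.event_runs() + w_out * t.outs))
    return residuals


def rms(values: list) -> float:
    """Root-mean-square of the residuals (they average zero within each season)."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```

Calling `rms(lw_residuals(team_seasons))` on real 1960 to 2001 data would give the analogue of the 23.9-run figure, under the out-value assumption above.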

------

Finally, for Pythagoras, I looked at real-life teams from 1973 to 2001 (again omitting strike seasons). The standard error was 3.91 wins, which I converted to 39.1 runs at 10 runs per win.
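A minimal sketch of the Pythagoras-luck measurement follows. The post doesn't state which exponent it used, so exponent 2 (the classic formula) is an assumption here, and the "standard error" is taken to be the root-mean-square of wins minus expected wins. The input format, a list of `(wins, games, runs_scored, runs_allowed)` tuples, is hypothetical.

```python
import math

RUNS_PER_WIN = 10  # the rule of thumb used in the post


def pythag_wins(runs_scored: float, runs_allowed: float, games: int,
                exponent: float = 2.0) -> float:
    """Expected wins from runs scored and allowed (ASSUMPTION: exponent 2)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return games * rs / (rs + ra)


def pythag_error(teams: list) -> float:
    """RMS of (actual wins - Pythagorean wins) over (wins, games, rs, ra) tuples."""
    residuals = [w - pythag_wins(rs, ra, g) for w, g, rs, ra in teams]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))
```

Multiplying the result by `RUNS_PER_WIN` converts it to runs, which is how 3.91 wins becomes 39.1 runs.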

------

So, here are the results:

31.9 -- career years by hitters
31.9 -- career years by pitchers
23.9 -- linear weights luck for hitters
23.9 -- linear weights luck for opposition hitters
39.1 -- Pythagoras luck

To get the overall SD, you square the five numbers, add them up, and take the square root. If you do that, you get 68.6. That's somewhat higher than what we expected: 63.6 runs, which is the 6.36-win binomial standard deviation at 10 runs per win.

68.6 -- five categories combined
63.6 -- theoretical expectation
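The arithmetic behind those two numbers is short enough to check directly. This sketch adds the five SDs in quadrature (which presumes the five sources are independent, as the post argues) and converts the binomial win SD to runs at 10 runs per win.

```python
import math

COMPONENTS = {
    "career years by hitters": 31.9,
    "career years by pitchers": 31.9,
    "linear weights luck for hitters": 23.9,
    "linear weights luck for opposition hitters": 23.9,
    "Pythagoras luck": 39.1,
}

# Independent sources of variation add in variance, so combine the SDs in quadrature.
combined = math.sqrt(sum(sd ** 2 for sd in COMPONENTS.values()))

# Binomial expectation: 6.36 wins, at 10 runs per win.
expected = 6.36 * 10

print(f"combined: {combined:.1f} runs")  # 68.6
print(f"expected: {expected:.1f} runs")  # 63.6
```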

Why the difference? I'm not sure. Some possibilities:

1. Linear Weights is known to overestimate luck for very bad and very good teams, and underestimate it for medium teams. That would inflate those two SDs a bit.

2. Teams that have good Linear Weights Luck score more runs. Teams that score more runs play fewer bottom-of-the-ninths on offense, and more bottom-of-the-ninths on defense. That would compress their run differential, which would make them look like they had more Pythagoras luck than they did. That is: we *are* double counting a little bit of luck in this case, due to the fact that Pythagoras doesn't take innings into account, just games.

3. I used real-life teams for Pythagoras luck. But, in real life, there are things that make a team beat its Pythagoras that have nothing to do with luck. For instance, a team with much better relief pitchers will be able to hold on to small leads, and win more games than expected. Also, managers who make blowouts worse by using their worst pitchers to mop up will show more apparent Pythagoras luck, by giving up more runs that don't affect who wins.

4. The same thing could be true for Linear Weights luck. Linear Weights assumes that each individual event -- a single, say -- occurs in a league-average context of runners on base. But teams that score primarily by the home run hit in a below-average context. To see why, imagine a team that hits ONLY home runs: each one would be worth only 1.0 runs, instead of the 1.4 the formula thinks it's worth. And teams that score primarily by singles probably hit in an above-average context. So, that would tend to magnify the errors, in either direction.

I suspect Pythagoras is the biggest factor ... maybe what I'll do, eventually, is build "teams" out of real-life games picked at random from across all the different teams, and calculate the Pythagoras error that way. And maybe I'll do the same thing for Linear Weights. I'm guessing that will bring the numbers down at least a bit. Whether the total will drop from 68.6 all the way to 63.6 ... well, I doubt it, but you never know.
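The planned re-check could look something like the sketch below: assemble pseudo-teams from games drawn at random across the whole league, so no pseudo-team has its own bullpen or mop-up habits, and measure the Pythagoras error on those. Drawing with replacement, the exponent of 2, and the `(runs_for, runs_against)` input format are all assumptions for illustration, not the method the post settles on.

```python
import math
import random


def pythag_wins(runs_scored: float, runs_allowed: float, games: int,
                exponent: float = 2.0) -> float:
    """Pythagorean expected wins (ASSUMPTION: exponent 2)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return games * rs / (rs + ra)


def resampled_pythag_error(pool: list, seasons: int = 1000,
                           games_per_season: int = 162, seed: int = 0) -> float:
    """RMS Pythagorean error, in wins, for pseudo-teams built from random real games.

    pool: (runs_for, runs_against) pairs, one per team-game (hypothetical format).
    Games are drawn with replacement from the whole league.
    """
    rng = random.Random(seed)
    residuals = []
    for _ in range(seasons):
        sample = rng.choices(pool, k=games_per_season)
        wins = sum(1 for scored, allowed in sample if scored > allowed)
        total_scored = sum(scored for scored, _ in sample)
        total_allowed = sum(allowed for _, allowed in sample)
        residuals.append(wins - pythag_wins(total_scored, total_allowed, games_per_season))
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))
```

The same resampling idea could be applied to Linear Weights by drawing team-games' batting lines instead of final scores.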

------

If you like talking in terms of variances -- that is, r-squareds -- or you like it when things add up to 100 percent, here are the five variances as a percentage of the total. (I'll also change "linear weights luck" to "cluster luck", in honor of Joe Peta, since Linear Weights luck is the result of clutch hitting, which means clustering of offensive events.)

22% -- career years by hitters
22% -- career years by pitchers
12% -- cluster luck by offense
12% -- cluster luck by opposition offense
32% -- Pythagoras luck
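These percentages are each component's variance (SD squared) divided by the total variance. A short sketch of that calculation, using the SDs from the results above:

```python
COMPONENTS = {
    "career years by hitters": 31.9,
    "career years by pitchers": 31.9,
    "cluster luck by offense": 23.9,
    "cluster luck by opposition offense": 23.9,
    "Pythagoras luck": 39.1,
}

total_variance = sum(sd ** 2 for sd in COMPONENTS.values())
shares = {name: 100 * sd ** 2 / total_variance for name, sd in COMPONENTS.items()}

for name, pct in shares.items():
    print(f"{pct:.0f}% -- {name}")
```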

And, combining offense/defense:

44% -- career years by team's players
24% -- overall cluster luck
32% -- Pythagoras luck

I suspect that after that other simulation I plan on doing, the career years numbers will wind up a bit higher, and the others a bit lower. But I think this will still be pretty close.

Labels: baseball, luck, randomness

6 Comments:

At Friday, April 26, 2013 3:25:00 AM, j holz said...: Maybe I'm reading this incorrectly, but it looks like you're conflating two sources of variance: 1) deviation from expected performance, and 2) variation of outcomes from known performance.

Remember that the 6.36 win standard deviation is true only if we are "flipping coins", i.e. the winning probability for each game is known in advance. When you add uncertainty of performance, that number is going to go up. My research indicated that the best preseason projections have a SD of about 8.5 wins, due to all the reasons you have outlined, plus injuries, trades, etc.
At Friday, April 26, 2013 1:26:00 PM, Phil Birnbaum said...: I'm arguing that both sources of variance are part of the coin-flipping luck. A team can flip 91 heads because it hit and pitched well, or because it won more games than expected given its hitting and pitching, or a combination of both.
At Friday, April 26, 2013 1:28:00 PM, Phil Birnbaum said...: That is: uncertainty of performance is part of the 6.36. Uncertainty of TALENT is not.
At Tuesday, April 30, 2013 2:21:00 PM, James said...: Phil, am I thinking about this properly?

Assuming the best preseason projections have a SD of 8.5 wins, and the uncertainty of performance has a SD of 6.36 wins, then that implies the projections have an uncertainty of talent SD of 5.64. Is that correct?
At Tuesday, April 30, 2013 2:48:00 PM, Phil Birnbaum said...: Seems right to me ... keeping in mind that "talent" means "average team talent", and includes trades, callups, injuries, etc.
At Friday, August 07, 2015 7:22:00 PM, Anonymous said...: Please stop calling anything you can't quantify or explain "luck" unless you can PROVE it is luck, and not some failure on your part to grasp an element of your subject.

Thursday, April 25, 2013
