Monday, December 03, 2018

Answer to: a flawed argument that marginal offense and defense have equal value

The puzzle from last post was this:  What's wrong with this argument, that a run scored has to be worth exactly the same as a run prevented?

Imagine giving a team an extra run of offense over a season.  You pick a random game, and add on a run, and see if that changes the result.  Maybe it turns an extra-inning loss into a nine-inning win, or turns a one-run loss into an extra-inning game.  Or, maybe it just turns an 8-3 blowout into a 9-3 blowout.

But, it will always be the same as giving them an extra run of defense, right?  Because, it doesn't matter if you turn a 5-4 loss into a 5-5 tie, or into a 4-4 tie.  And it doesn't matter if you turn an 8-3 blowout into a 9-3 blowout, or into an 8-2 blowout.

Any time one more run scored will change the result of a game, one less run allowed will change it in exactly the same way!  So, how can the value of the run scored possibly be different from the value of the run allowed?

The answer is hinted at by a comment from Matthew Hunt:

"Is it the zero lower bound for runs? You can always increase the number of offensive runs, but you can't hold an opponent to -1 runs."

It's not specifically the zero lower bound -- the argument is wrong even if shutouts are rare -- but it does have to do with the issue of runs prevented.


(Note: for this post, I'm going to treat runs as if they have a Poisson distribution, to make the argument smoother. In reality, runs in baseball come in bunches, and aren't Poisson at all. If that bothers you, just transfer the argument to hockey or soccer, where goals are much closer to Poisson.)


The answer, I think, is this:  If you want to properly remove a single opponent's run from a season, you don't do it by choosing a random game. You have to do it by choosing a random *run*.

When you *add* runs, it's OK to do it by choosing a game first, because all games have roughly equal opportunities to score more runs. But when you *remove* runs, you have to remove a run that's already there ... and you have to weight them all equally when deciding which one to remove.

If you don't weight the runs equally ... well, suppose you have game A with ten runs, and game B with two runs. If you choose a random game first, each B run has five times the chance of being chosen as each of the A runs.
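Here is that weighting spelled out as a minimal Python sketch. It takes the two hypothetical games from the paragraph above (A with ten runs, B with two) and computes the chance that each individual run gets removed under the two schemes. The function name is just for illustration, and it assumes every game has at least one run.

```python
from fractions import Fraction

def per_run_probabilities(runs_by_game):
    """Chance that one particular run is the one removed, per game.

    game_first: pick a game uniformly, then a run uniformly within that game.
    run_first:  pick uniformly among all runs, ignoring which game they're in.
    Assumes every game has at least one run.
    """
    n_games = len(runs_by_game)
    total_runs = sum(runs_by_game.values())
    game_first = {g: Fraction(1, n_games) * Fraction(1, r) for g, r in runs_by_game.items()}
    run_first = {g: Fraction(1, total_runs) for g in runs_by_game}
    return game_first, run_first

game_first, run_first = per_run_probabilities({"A": 10, "B": 2})
print(game_first)                         # {'A': Fraction(1, 20), 'B': Fraction(1, 4)}
print(run_first)                          # {'A': Fraction(1, 12), 'B': Fraction(1, 12)}
print(game_first["B"] / game_first["A"])  # 5
```

Choosing by run gives every run the same 1/12 chance; choosing by game first gives each B run five times the chance of each A run.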

Here's another way of looking at it. Suppose you randomly allocate 700 runs among 162 games, and then you realize you made a mistake: you only meant to allocate 699 runs. You'd look up the 700th run you added, and reverse it.

But, that 700th run is more likely to come from a high-scoring game than a low-scoring game. Why? Because, before you added the last run, the game you were about to add it to was no different, on average, from the 161 other games. But after you add the run, that game must now be expected to be one run more than average. (Actually, 699/700 more, but close enough.)

So, if you removed the 700th run by choosing a random game first, you'd be choosing it from an expected average game, not an expected above-average game. And so your distribution would be more bunched up than it should be, and it would no longer be the same as the distribution you'd get if you just stopped at 699 runs.

And, of course, you might randomly choose a shutout, which brings that game's runs to -1, proving more obviously that your distribution is wrong.

You don't actually have to reverse the 700th run ... there's nothing special about that one compared to the other 699. You can pick the first run, or the 167th run, or a random run. But you have to choose a particular run without regard to the game it's in, or any other context.
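The claim that the game holding the last run is about one run above average can be checked with a quick simulation. This sketch assumes runs are dropped into 162 games uniformly at random (which makes per-game totals close to Poisson, as assumed above) and compares the average size of the game you land in when you pick a game first with when you pick a run first. The function and its parameters are hypothetical.

```python
import random

def game_size_bias(total_runs=700, n_games=162, seasons=200, seed=1):
    """Average runs in the chosen game: game-first vs. run-first selection.

    Each simulated season drops runs into games uniformly at random.
    """
    rng = random.Random(seed)
    game_first_total = 0.0
    run_first_total = 0.0
    for _ in range(seasons):
        games = [0] * n_games
        for _ in range(total_runs):
            games[rng.randrange(n_games)] += 1
        # Picking a game uniformly: its expected size is the season mean.
        game_first_total += sum(games) / n_games
        # Picking a run uniformly lands in each game with probability
        # runs_in_game / total_runs, so weight each game by its own total.
        run_first_total += sum(r * r for r in games) / total_runs
    return game_first_total / seasons, run_first_total / seasons

by_game, by_run = game_size_bias()
print(f"game first: {by_game:.2f} runs, run first: {by_run:.2f} runs")
```

The game-first average is always exactly 700/162, about 4.32 runs; the run-first average should come out roughly one run higher, because every game is weighted by how many runs it holds.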


Why does a random run have a different value from a run from a random game? 

Because the probabilities change. 

For one thing, you're now much less likely to choose a game where you only allowed one run. You probably won those games anyway, so those runs are less valuable than average. Since you choose less valuable runs less often than before, the value of the run goes up.

But, for another thing, you're now much more likely to choose a game where you gave up a lot of runs. You probably lost those games anyway, so the saved run again probably wouldn't help; you'd just lose 8-3 instead of 9-3. Since you're more likely to choose these less-valuable runs than before, the value of the run goes down.

So some runs where the value is low, you're more likely to choose. Others, you're less likely to choose. Which effect dominates? I don't think we can decide easily from this line of thinking alone. We'd have to do some number crunching.

If we did, we'd find out (as the other argument proved) that "choose a run instead of a game" makes runs prevented more valuable when you already score more than you allow, but less valuable when you allow more than you score. 

But, I don't see a way to prove that from this argument. If you do, let me know!
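Here is one way to do that number crunching, as a sketch rather than a proof. It assumes, hypothetically, that runs scored and allowed in each game are independent Poisson counts with illustrative means of 5 and 4, and it counts a tie as half a win. Let D be runs scored minus runs allowed. Adding a run scored to a random game is worth ½[P(D=0) + P(D=−1)] wins. Removing a randomly chosen run allowed picks games in proportion to their runs allowed; for a Poisson count, that size-biased count is distributed as 1 plus a fresh Poisson with the same mean, which makes the run saved worth ½[P(D=0) + P(D=1)].

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """P(X == k) for a Poisson variable with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)

def diff_pmf(d, scored, allowed, max_runs=40):
    """P(runs scored - runs allowed == d), independent Poisson counts, truncated."""
    return sum(
        poisson_pmf(s, scored) * poisson_pmf(s - d, allowed)
        for s in range(max(d, 0), max_runs)
    )

def marginal_values(scored, allowed):
    """Expected wins gained: one extra run scored vs. one fewer run allowed."""
    p_tie = diff_pmf(0, scored, allowed)
    p_up1 = diff_pmf(1, scored, allowed)
    p_down1 = diff_pmf(-1, scored, allowed)
    run_scored = 0.5 * (p_tie + p_down1)  # extra run added to a random game
    run_saved = 0.5 * (p_tie + p_up1)     # randomly chosen run allowed removed
    return run_scored, run_saved

for scored, allowed in [(5.0, 4.0), (4.0, 5.0)]:
    run_scored, run_saved = marginal_values(scored, allowed)
    print(f"{scored:.0f}-{allowed:.0f}: run scored {run_scored:.4f}, run saved {run_saved:.4f}")
```

Under these assumptions the two values differ by ½[P(D=1) − P(D=−1)], and for independent Poisson counts P(D=1)/P(D=−1) equals scored/allowed, so the run saved comes out ahead exactly when the team scores more than it allows.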


Finally, let me make one part of the argument clearer. Specifically, why is it OK to pick a random game when adding a run *scored*, but not when subtracting a run *allowed*? Shouldn't it be symmetrical?

Actually, it *is* symmetrical.

When you add a run, you're taking a non-run and changing it to a run. Well, there are so many occurrences of non-runs that they're roughly equal in every game. If you think about changing an out to a run, every game has roughly 27 outs, so every game is already equal.

If you think about hockey ... say, every 15-second interval has a chance of a goal. That's 240 segments per game. In a two-goal game, there are 238 non-goal segments that can be converted into a goal. In a 10-goal game, there are only 230 segments. But 230 is so close to 238 that you can treat them as equal.*

(* In a true Poisson distribution, they're exactly equal, because you model the game as an infinite number of intervals. Infinity minus 2 is equal to infinity minus 10.)

When you subtract a run ... the process is symmetrical, but the numbers are different. A two-goal game has only two chances to convert a goal to a non-goal, while a ten-goal game has ten -- five times as many. Instead of a 230:238 ratio, you have a 2:10 ratio. The 2 and 10 aren't close enough to treat as equal.
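The asymmetry is just arithmetic, sketched here under the same assumed 240 intervals per game. For each direction of change, the function compares the two-goal game's opportunities with the ten-goal game's; a ratio near 1 means picking a game first is harmless.

```python
SEGMENTS_PER_GAME = 240  # assumed: 15-second intervals in a 60-minute game

def chance_ratio(goals_low, goals_high, segments=SEGMENTS_PER_GAME):
    """Ratio of the low-scoring game's opportunities to the high-scoring game's.

    Adding a goal converts a non-goal interval into a goal interval, so the
    opportunities are the non-goal intervals. Removing a goal converts a goal
    interval back, so the opportunities are the goals themselves.
    """
    adding = (segments - goals_low) / (segments - goals_high)
    removing = goals_low / goals_high
    return adding, removing

adding, removing = chance_ratio(2, 10)
print(f"adding: {adding:.3f}, removing: {removing:.3f}")  # adding: 1.035, removing: 0.200
```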

In theory, the two cases are symmetrical in the sense that both are wrong. But, in practice, choosing goals scored by game is wrong but close enough to treat as right. Choosing goals allowed by game is NOT close enough to treat as right.

The fact that goals are rare compared to non-goals is what makes the difference. That difference is why the statistics textbooks say that Poisson is used for the distribution of "rare events."  

Goals are rare events. Non-goals are not.


Tuesday, November 27, 2018

A flawed argument that marginal offense and defense have equal value

Last post, I argued that a defensive run saved isn't necessarily as valuable as an extra offensive run scored.

But I didn't realize that was true right away.  Originally, I thought that they had to be equal.  My internal monologue went like this:

Imagine giving a team an extra run of offense over a season.  You pick a random game, and add on a run, and see if that changes the result.  Maybe it turns an extra-inning loss into a nine-inning win, or turns a one-run loss into an extra-inning game.  Or, maybe it just turns an 8-3 blowout into a 9-3 blowout.

(It turns out that every ten games, that run will turn a loss into a win ... see here.  But that's not important right now.)

But, it will always be the same as giving them an extra run of defense, right?  Because, it doesn't matter if you turn a 5-4 loss into a 5-5 tie, or into a 4-4 tie.  And it doesn't matter if you turn an 8-3 blowout into a 9-3 blowout, or into an 8-2 blowout.

Any time one more run scored will change the result of a game, one less run allowed will change it in exactly the same way!  So, how can the value of the run scored possibly be different from the value of the run allowed?

That argument is wrong.  It's obvious to me now why it's wrong, but it took me a long time to figure out the flaw in this argument.

Maybe you're faster than I was, and maybe you have an easier explanation than I do.  Can you figure out what's wrong with this argument?  

(I'll answer next post if nobody gets it.  Also, it helps to think of runs (or goals, or points) as Poisson, even if they're not.)


Monday, October 01, 2018

When is defense more valuable than offense?

Is it possible, as a general rule, for a run prevented to be worth more than a run scored?

I don't think so. 

Suppose every team in the league scored one fewer run, and allowed one fewer run. If runs prevented were more valuable than runs scored, every team would improve. But, then, the league would no longer balance out to .500.

But the values of offensive and defensive runs *are* different for individual teams.

Suppose a team scores 700 runs and allows 600. That's an expected winning percentage of .57647 (Pythagoras, exponent 2). 

Suppose it gains a run of offense, so it scores 701 instead of 700. At 701-600, its expectation becomes .57717, an improvement of .00070.

Now, instead, suppose its extra run comes on defense, and it goes 700-599. Now, its expectation is .57728, an improvement of .00081.

So, for that team, the run saved is more valuable than the run scored.
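The Pythagorean arithmetic above is easy to reproduce. A minimal sketch, using exponent 2 as in the text:

```python
def pythag(runs_scored, runs_allowed, exponent=2):
    """Pythagorean expected winning percentage."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

base = pythag(700, 600)
plus_offense = pythag(701, 600)  # one extra run scored
plus_defense = pythag(700, 599)  # one fewer run allowed
print(f"{base:.5f} {plus_offense:.5f} {plus_defense:.5f}")  # 0.57647 0.57717 0.57728
print(f"offense +{plus_offense - base:.5f}, defense +{plus_defense - base:.5f}")
# offense +0.00070, defense +0.00081
```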

It turns out that if a team scores more than it allows, a run on defense is more valuable than a run on offense. If a team allows more than it scores, the opposite is true. 


Just recently, I figured out an intuitive way to show why that happens, without having to use Pythagoras at all. I'm going to switch from baseball to hockey, because if you assume that goals scored have a Poisson distribution, the explanation works out easier.

Suppose the Edmonton Oilers score 5 goals per game, and allow 4. If they improve their offense by a goal a game, the 5-4 advantage becomes 6-4. If they improve their defense by a goal, the 5-4 becomes 5-3.

Which is better? 

Even though both scenarios have the Oilers scoring an average two more goals than the opposition, that doesn't happen every game, because there's random variation in how the goals are distributed among the games. With zero variation, the Oilers win every game 5-3 or 6-4. But, with the kind of variation that actually occurs, there's a good chance that the Oilers will lose some games. 

For instance, Edmonton might win one game 7-1, but lose the next 5-3. Over those two games, the Oilers do indeed outscore their opponents by two goals a game, on average, but they lose one of the two games.

The average is "Oilers finish the game +2." The Oilers fail to win when the result falls at least two goals short of that expectation; in other words, when the goal differential varies from expectation by -2 or worse.

The more variation around the mean of +2, the greater the chance the Oilers lose. That means the team with the advantage wants less variation in scores, and the underdog wants more.

Now, let's go to the assumption that goals follow a Poisson distribution.*  

(*Poisson is the distribution you get if you assume that in any given moment, each team has its own fixed probability of scoring, independent of what happened before. In hockey, that's a reasonable approximation -- not perfect, but close enough to be useful.)

If each team's goals follow an independent Poisson distribution, the variance of the goal differential equals the expected total goals per game, so its SD is exactly the square root of that total.

In the 5-3 case, the SD of goal differential is the square root of 8. In the 6-4 case, the SD is the square root of 10. Since root-10 is higher than root-8, the underdog should prefer 6-4, but the favored Oilers should prefer 5-3.

Which means, for the favorite, a goal of defense is more valuable than a goal of offense.
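Under the Poisson assumption, this can be checked numerically. The sketch below (function names are illustrative) builds the distribution of Oilers goals minus opponent goals for the 5-3 and 6-4 scenarios, treating each team's goals as independent Poisson counts truncated at 40, and reports the SD and the chance that the Oilers are outscored.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """P(X == k) for a Poisson variable with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)

def goal_diff_stats(goals_for, goals_against, max_goals=40):
    """Mean, SD, and P(outscored) of (for - against) for independent Poissons."""
    pmf = {}
    for f in range(max_goals):
        for a in range(max_goals):
            p = poisson_pmf(f, goals_for) * poisson_pmf(a, goals_against)
            pmf[f - a] = pmf.get(f - a, 0.0) + p
    mean = sum(d * p for d, p in pmf.items())
    var = sum(d * d * p for d, p in pmf.items()) - mean ** 2
    p_outscored = sum(p for d, p in pmf.items() if d < 0)
    return mean, var ** 0.5, p_outscored

for gf, ga in [(5, 3), (6, 4)]:
    mean, sd, p_outscored = goal_diff_stats(gf, ga)
    print(f"{gf}-{ga}: mean {mean:+.2f}, SD {sd:.3f}, P(outscored) {p_outscored:.3f}")
```

The SDs come out at root-8 (about 2.83) and root-10 (about 3.16), and the chance of being outscored is higher in the 6-4 scenario, even though the expected margin is +2 in both.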

This "proof" is only for Poisson, but, for the other sports, the same logic holds. In baseball, football, soccer, and basketball, the more goals/runs/points per game, the more variation around the expectation.

Think about what a two goal/point/run spread means in the various sports leagues. In the NBA, where 200 points are scored per game, a 2-point spread is almost nothing. In the NFL, it means more. In MLB, it means a lot more. In the NHL, more still. And, in soccer, where the average is fewer than three goals per game, a two-goal advantage is almost insurmountable.


Thursday, September 13, 2018

Are soccer goals scored less valuable than goals prevented?

During this year's World Cup of Soccer, I found a sabermetric soccer book discounted at a Toronto bookstore. It's called "The Numbers Game," and subtitled "Why Everything You Know About Soccer Is Wrong."

Actually, I don't know that much about soccer, but much of the book fails to convince me -- for instance, when the authors argue that defense is more important than offense:

"To see if attacking leads to more wins, and whether defense leads to fewer wins and more draws, we conducted a set of rigorous, sophisticated regression analyses on our Premier League data."

As far as I can tell, the regressions tried to predict team wins based on team goals scored and conceded. The results:

0.230 wins -- value of a goal scored
0.216 wins -- value of a goal conceded

The authors write,

"That means goals created and goals prevented contribute about equally to manufacturing wins in English soccer."

But, when it came to losses:

0.176 losses -- value of a goal scored
0.235 losses -- value of a goal conceded


"... defense provides a more powerful statistical explanation for why teams lose. ... when it came to avoiding defeat, the goals that clubs didn't concede were each 33 percent more valuable than the goals they scored."


The authors argue that 

(a) goals scored and conceded contribute equally to wins;
(b) goals conceded contribute more to losses than goals scored.

Except ... aren't those results logically inconsistent with each other?

Suppose you look at the last 20 games where Chelsea faced Arsenal. From (b), you would deduce, 

If Chelsea had scored one goal fewer, but also conceded one goal fewer, they'd probably have had fewer losses.

That's because, according to the authors' numbers, the lost goal would have cost Chelsea 0.176 losses, but the goal prevented would have saved them 0.235 losses. Net gain: 0.059 fewer losses.

But Chelsea's goals scored are Arsenal's goals conceded, and vice versa. Also, Chelsea's losses are Arsenal's wins, and vice versa. So, you can rephrase that last quote as,

If Arsenal had conceded one goal fewer, but also scored one goal fewer, they'd probably have had fewer wins.

Except ... the authors just argued that goals scored and conceded are *equal* in terms of wins.

Without realizing it, the book simultaneously makes two contradictory arguments!
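Putting the book's four coefficients side by side shows the inconsistency as plain arithmetic: the same hypothetical change to the Chelsea-Arsenal games, evaluated once with the losses regression and once with the wins regression, gives two different answers. This treats each coefficient as a simple per-goal effect, which is how the argument above reads them.

```python
# Regression coefficients as quoted from "The Numbers Game" (per goal).
WINS_PER_GOAL_SCORED = 0.230
WINS_PER_GOAL_CONCEDED = 0.216
LOSSES_PER_GOAL_SCORED = 0.176
LOSSES_PER_GOAL_CONCEDED = 0.235

# Chelsea scores one fewer and concedes one fewer, using the losses model.
chelsea_losses_via_losses = +LOSSES_PER_GOAL_SCORED - LOSSES_PER_GOAL_CONCEDED

# The same games from Arsenal's side: Arsenal scores one fewer and concedes
# one fewer, using the wins model. Arsenal's wins are Chelsea's losses.
chelsea_losses_via_wins = -WINS_PER_GOAL_SCORED + WINS_PER_GOAL_CONCEDED

print(f"Chelsea losses via losses model: {chelsea_losses_via_losses:+.3f}")  # -0.059
print(f"Chelsea losses via wins model:   {chelsea_losses_via_wins:+.3f}")    # -0.014
```

The wins model calls the change close to a wash; the losses model says it is about four times larger.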


So why did the coefficients for goals scored and goals allowed come out so different in the regression? I think it's just random chance.

If a team scores 20 goals and concedes 20 goals, you'd expect them to win as many games as they lose. But that might not happen if the goals aren't evenly distributed over games. For instance, the team might have lost 19 games by a score of 1-0, while winning a 20th game 20-1. 

In other words, team wins and losses vary randomly from their goal differential expectation. If the teams that underperformed happened to be teams that scored more than they conceded, and the teams that overperformed happened to be teams that conceded more than they scored ... in that case, the regression notices that overperformance is correlated with defense, and adjusts accordingly. And you wind up with the result the authors got.

(Another source of error is that performance isn't linear in terms of goals; it's pythagorean. But that's probably a minor issue compared to simple randomness.)

I'd bet that, for the "wins" regression, there was no pattern for which teams randomly outperformed their win projections. But for the "losses" regression, there *was* that kind of pattern, where the teams with better defense did lose fewer games than projected.

I'd bet that if you grouped the games differently, and reran the regression, you'd get a different result. Instead of your regression rows being team-based, like "Chelsea's 38 games from 2007-08," make them time-based, like "the first four weeks of the 2007-08 schedule." That will scramble up the projection anomalies a different way, and I'd bet that the four coefficient estimates wind up much closer to each other.


Thursday, May 03, 2018

NHL referees balance penalty calls between teams

That finding, from Michael Lopez, shows that the next penalty in an NHL game is significantly less likely to go to the team that's had more penalties so far in the game.

That was a new finding to me. A few years ago, I found that the next penalty is more likely to go to the team that had the (one) most recent penalty -- but I hadn't realized that quantity matters, too.

(My previous research can be found here: part one, two, three.)

So, I dug out my old hockey database to see if I could extend Michael's results. All the findings here are based on the same data as my other study -- regular season NHL games from 1953-54 to 1984-85, as provided by the Hockey Summary Project as of the end of 2011.


Quickly revisiting the old finding: referees do appear to call "make-up" penalties. The team that got the benefit of the most recent power play is almost 50 percent more likely to have the next call go against them. That team got the next penalty 59.7% of the time, versus only 40.3% for the previously penalized team.

39599/98167 .403 -- team last penalized
58568/98167 .597 -- other team

Now, let's look at total numbers of penalties instead. I've split the data into home and road teams, because road teams do get more penalties -- 52 percent vs. 48 percent overall.  (That difference is mitigated by the fact that referees balance out the calls. The first penalty of the game goes to the road team 54 percent of the time. The drop from 54 percent for the first call, down to 52 percent overall, is due to the referees balancing out the next call or calls.)

So far, nothing exciting. But here's something. It turns out that the *second* call of the game is much more likely than average to be a makeup call:

.703 -- visiting penalty after home penalty
.297 -- home penalty after home penalty

.653 -- home penalty after visiting penalty
.347 -- visiting penalty after visiting penalty

Those numbers are huge. Overall, there are more than twice as many "make up" calls as "same team" calls.

In this case, quantity and recency are the same thing. Let's move on to the third penalty of the game, where they can be different.  From now on, I'll show the results in chart form:

.705 0-2 
.462 1-1
.243 2-0

Here's how to read the chart: when the home team has gone "0-2" in penalties -- that is, both previous penalties to the visiting team -- it gets 70.5% of the third penalties. When the previous two penalties were split, the home team gets 46.2%, similar to the overall average. When the home team got both previous penalties, though, it draws the third only 24.3% of the time (in other words, the visiting team drew 75.7%).
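The charts come from a straightforward tally. As a sketch of that method, assume (hypothetically) that each game has been reduced to an ordered string of penalties, 'H' for the home team and 'V' for the visitors, with simultaneous penalties already dropped as described in the technical note later in this post. The function name and the toy data are illustrative, not the real database.

```python
from collections import defaultdict

def next_penalty_share(games, call_number):
    """Home team's share of the call_number-th penalty, grouped by prior count.

    games: iterable of strings like "VHHV", one character per penalty in order.
    Returns {(home_before, visitors_before): (home_share, sample_size)}.
    """
    home_calls = defaultdict(int)
    totals = defaultdict(int)
    for seq in games:
        if len(seq) < call_number:
            continue  # this game never reached the penalty in question
        before = seq[:call_number - 1]
        state = (before.count("H"), before.count("V"))
        totals[state] += 1
        if seq[call_number - 1] == "H":
            home_calls[state] += 1
    return {s: (home_calls[s] / totals[s], totals[s]) for s in sorted(totals)}

# Toy data only.
toy_games = ["VVH", "VVH", "VVV", "HVH", "VHV", "HHV", "HHV"]
print(next_penalty_share(toy_games, 3))
```

The keys follow the charts' notation: (0, 2) is the "0-2" row, where the visitors took both earlier penalties.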

Here's the fourth penalty. I've added sample sizes, in parentheses.

.701 0-3 (755)
.559 1-2 (6951)
.373 2-1 (5845)
.261 3-0 (468)

It's a very smooth progression, from .701 down to .261, exactly what you would expect given that make-up calls are so common. 

Here's the fifth penalty:

.677 0-4 ( 195)
.619 1-3 (3244)
.465 2-2 (6950)
.351 3-1 (2306)
.316 4-0 ( 117)

That's the chart that corresponds to Michael Lopez's tweet, and if you scroll back up you'll see that these numbers are pretty close to his.

Sixth penalty:

.667 0-5 (  48)
.637 1-4 (1182)
.520 2-3 (4930)
.413 3-2 (4134)
.323 4-1 ( 773)
.226 5-0 (  31)

Again, the percentages drop every step ("monotonically," as they say in math).

Seventh penalty:

.692 0-6 (  13)
.585 1-5 ( 369)
.577 2-4 (2528)
.489 3-3 (4140)
.399 4-2 (1798)
.379 5-1 ( 219)
.200 6-0 (  13)

Eighth penalty:

.667 0-7 (   3)
.607 1-6 ( 122)
.588 2-5 ( 969)
.527 3-4 (2721)
.422 4-3 (2414)
.374 5-2 ( 652)
.412 6-1 (  68)
.000 7-0 (   1)

Still a nearly perfect pattern; only the small-sample 6-1 row is out of order.  It breaks up a little more here, for the ninth penalty, but that's probably just small sample size.

.000 0-8 (   1)
.553 1-7 (  38)
.586 2-6 ( 348)
.566 3-5 (1358)
.484 4-4 (2063)
.392 5-3 (1037)
.340 6-2 ( 191)
.333 7-1 (  21)

(This is getting boring, so here's a technical note to break the monotony. I included all penalties, including misconducts. I omitted all cases where both teams took a penalty at the same time, even if one team took more penalties than the other. In fact, I treated those as if they never happened, so they don't break the string. This may cause the results to be incorrect in some cases: for instance, maybe Boston takes a minor, then there's a fight and Montreal gets a major and a minor while Boston gets only a major. Then, Montreal takes a minor. In that case, the study will treat the Montreal minor as a make-up call, when it's really not. I think this happens infrequently enough that the results are still valid.)

I'll give two more cases. Here's the twelfth penalty:

.692 2-9 ( 13)
.623 3-8 ( 61)
.532 4-7 (250)
.506 5-6 (478)
.488 6-5 (459)
.449 7-4 (198)
.457 8-3 ( 35)
.200 9-2 (  5)

Almost perfect.  But ... the pattern does seem to break down later on, at the 14th to 16th penalty (I stopped at 16), probably due to sample size issues. Here's the fourteenth, which I think is the most random-looking of the bunch. You could almost argue that it goes the "wrong way":

.000  2-11 (  1)
.375  3-10 (  8)
.333  4- 9 ( 27)
.516  5- 8 ( 95)
.438  6- 7 (169)
.480  7- 6 (148)
.465  8- 5 ( 71)
.577  9- 4 ( 26)
.600 10- 3 (  5)

Still, I think the overall conclusion isn't threatened, that quantity is a factor in make-up calls.


OK, so now we know that quantity matters. But couldn't that mean that recency doesn't matter? We did find that the team with the most recent penalty was less likely to get the next one -- but that might just be because that team is also more likely to have a higher quantity at that point. After all, when a team takes three of the first four penalties, there's a 75 percent chance* it also took the most recent one. 

(* It's actually not 75 percent, because make-up calls make the sequence non-random. But the point remains.)
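The 75 percent figure is the share of orderings, assuming every ordering of the four calls is equally likely (which, as the footnote says, isn't quite true). A short enumeration, with a hypothetical helper name:

```python
from itertools import permutations

def share_last_is_majority(n_major=3, n_minor=1):
    """Fraction of distinct orderings in which the team with more penalties
    also took the most recent one. 'M' = that team, 'm' = the other team."""
    orderings = set(permutations("M" * n_major + "m" * n_minor))
    return sum(seq[-1] == "M" for seq in orderings) / len(orderings)

print(share_last_is_majority())  # 0.75
```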

So, maybe the recency effect is just an illusion created by the quantity effect. Or vice versa.

So, here's what I did: I broke down every row in every table by who got the more recent call. It turns out: recency does matter.

Let's take that 3-for-4 example I just used:

.619 home team overall     (3244)
.508 after VVVH            ( 486)
.639 after other sequences (2758)

From this, it looks like both aspects are at work. When the home team is "up 3-1" in penalty advantage, it gets only 51 percent of the penalties if its previous penalty was the last of the four. That's still more than the 46.1 percent it gets to start the game, or the 46.5 percent it would get if it had been 2-2 instead of 3-1.

This seems to be true for most of the breakdowns -- maybe even all the ones with large enough sample sizes. I'll just arbitrarily pick one to show you ... the ninth penalty, with the home team down 3-5 in penalty advantage (the 5-3 row of the ninth-penalty chart).

.392 home team overall     (1037)
.362 when most recent was H (743)
.469 when most recent was V (294)
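Because each overall row is the sample-size-weighted average of its two recency rows, the splits can be checked against the earlier charts. A minimal sketch (the helper name is hypothetical):

```python
def pooled_share(rows):
    """Sample-size-weighted average of (share, sample_size) rows."""
    total = sum(n for _, n in rows)
    return sum(share * n for share, n in rows) / total, total

# Fifth penalty, 1-3 row: split by who took the most recent call.
print(pooled_share([(0.508, 486), (0.639, 2758)]))
# Ninth penalty, 5-3 row: most recent to home vs. to visitors.
print(pooled_share([(0.362, 743), (0.469, 294)]))
```

The pooled shares come out at about .619 and .392, matching the 1-3 row of the fifth-penalty chart and the 5-3 row of the ninth-penalty chart.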

Even better: here's the entire chart for the eighth penalty: overall, when the last penalty went to the home team ("last H"), and when it went to the visiting team ("last V").

overall   last H    last V
 .607      .750      .596      1-6 
 .588      .477      .609      2-5 
 .527      .446      .584      3-4 
 .422      .372      .518      4-3 
 .374      .357      .466      5-2 
 .412      .406      .500      6-1 

Clearly, both recency and quantity matter. Holding one constant, the other still follows the "make-up penalty" pattern. 

Can we figure out *how much* is recency and *how much* is quantity?  It's probably pretty easy to get a rough estimate with a regression. I'm about to leave for the weekend, but I'll look at that next week. Or you can download the results (spreadsheet here) and do it yourself.


Tuesday, February 20, 2018

How much of success in life is luck instead of skill?

How much of MLB teams' success is due to skill, and how much due to luck? We have a pretty good idea of the answer to that. But what about success in life, in general? If a person is particularly successful in their chosen field, how much of that success is due to luck?

That's the question Robert Frank asks in his 2016 book, "Success and Luck."  He believes that luck is a substantial contributor to success, as evidenced by his subtitle: "Good Fortune and the Myth of Meritocracy."

On the basic question, I agree with him that luck is a huge factor in how someone's life turns out. There is a near-infinite number of alternative paths our lives could have taken. If a butterfly had flapped its wings differently in China decades ago, I might not even exist now, never mind be sitting here typing this blog post.

In his preface, Frank favorably quotes Nicholas Kristof:

"America's successful people['s] ... big break came when they were conceived in middle-class American families who loved them, read them stories, and nurtured them .... They were programmed for success by the time they were zygotes."

But ... that's not a very practical observation, is it? Sure, I am phenomenally lucky that my parents decided to have sex that particular moment that they did, and that the winning sperm cell turned out to be me. In that light, you could say that luck explains almost 100 percent of my success. 

So, maybe a better question is: suppose I was born as me, but in random circumstances, in a random place and time. How much more or less successful would I be, on average?

As Frank writes:

"I often think of Birkhaman Rai, the young hill tribesman from Bhutan who was my cook long ago when I was a Peace Corps volunteer in a small village in Nepal. To this day, he remains perhaps the most enterprising and talented person I've ever met....

"... Even so, the meager salary I was able to pay him was almost certainly the high point of his life's earnings trajectory. If he'd grown up in the United States or some other rich country, he would have been far more prosperous, perhaps even spectacularly successful."

Agreed. Those of us who are alive in a wealthy society in 2017 are pretty much the luckiest people, in terms of external circumstances, of anyone in the history of the world.  For all of us, almost all of our success is due to having been born at the right time in the right place. 

But, again, that's not a very useful answer, is it? Even the most talented, hardest-working person would have nothing if he had been born in the wrong place and time, so you have to conclude that every successful person has been overwhelmingly lucky.

I think we have to hold our personal characteristics as a given, too. Because, almost everyone who is successful in a given field has far-above-average talent or interest in that field. I was lucky to have been born with a brain that likes math. Wilt Chamberlain was lucky to have been born with a genetic makeup that made him grow tall. Bach was born with the brain of a musical genius.

It gets even worse if you consider not just innate talent for a particular field, but other mental characteristics that we usually consider character rather than luck. Suppose you have an ability to work hard, or to persevere under adversity. Those likely have at least some genetic -- which is to say, random -- basis. So when someone with only average musical talent becomes a great composer by hard work, we can say, "well, sure, but he was lucky to have been born with that kind of drive to succeed."

Frank says:

"I hope we can agree that success is much more likely for people with talents that are highly valued by others, and also for those with the ability and inclination to focus intently and work tirelessly. But where do those personal qualities come from? We don't know, precisely, other than to say that they come from some combination of genes and the environment. ...

"In some unknown proportion, genetic and environmental factors largely explain whether someone gets up in the morning feeling eager to begin work. If you're such a person, which most of the time I am not, you're fortunate."

So, even if you got to where you are by working hard, Frank says, that's still luck! Because, you're lucky to have the kind of personality that sees the value of hard work.

I don't disagree with Frank that the kind of person you are, in terms of morals and virtues, is partly determined by luck. But, in that case, what *isn't* luck?


That's the problem with Frank's argument. Drill down deep enough, and everything is luck. You don't even need a book for that; I can do it in one paragraph, like this:

There are seven billion people in the world right now. Which one I am, out of those seven billion, is random, as far as I'm concerned; I had no say in which person I would be born as. Therefore, if I wind up being Bill Gates, the richest man in the world, I hit a 6,999,999,999 to 1 shot, and I am very, very lucky!

What Frank never explicitly addresses is: what kind of success does he consider NOT caused by luck? I don't think that anywhere, in his 200-page book, he even gives one example. 

We can kind of figure it out, though. At various points in the book, Frank illustrates his own personal lucky moments. There was the time he got his professor job at Cornell by the skin of his teeth (he was the last professor hired, in a year where Cornell hired more economics professors than ever before). Then, there was the time he almost drowned while windsurfing, but just in time managed to free himself from under his submerged sail. "Survival is sometimes just a matter of pure dumb luck, and I was clearly luck's beneficiary that day."

Frank's instances of luck are those that occurred on his path while he was already himself. He doesn't tell us that he was almost born in Nepal and destined for a life of poverty, or that he was lucky that one of his cells didn't mutate while in the womb to make him intellectually disabled.

I'll presume, then, that the luck Frank is talking about is the normal kind of career and life luck that most of us think about, and that the "your success is mostly luck because you were born smart" is just a rhetorical flourish.


We don't have a definition problem in our usual analysis of baseball luck, because we are careful to talk about what we consider luck and what we don't. For a team's W-L record, we specify that the "luck" we're talking about is the difference between the team's talent and the team's outcome. So, if a team is good enough to finish with an average of 88 wins, but it actually wins 95 games, we say it was lucky by 7 games.

We specifically ignore certain types of luck, such as injuries and weather and bad calls by the umpire. And, we specifically exclude certain types of luck, like how an ace pitcher randomly happened to meet and marry a woman from Seattle, which led him to sign at a discount with the Mariners, which meant that they wound up more talented than they would have otherwise.

By specifically defining what's luck and what's not, we can come up with a specific answer to the specific question. We know that luck (as we define it), the gap between talent (as we define it) and outcome, follows a binomial distribution, which we can approximate with a normal distribution. So, we can calculate that the effect of luck is a standard deviation of about 6.4 games per season, and the variation in talent is about 9 games per season.

From that, we can calculate a bunch of other things. Such as: on average, a team that finishes with a 96-66 record is most likely a 91-71 team that got lucky. In other words, if the season were replayed again, like in an APBA simulation, that team would be more likely to finish with 91 wins than with 96.
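Here is a sketch of that calculation, assuming talent and luck are independent and roughly normal, and using the binomial SD for a .500 team as the luck component for everyone (it changes little for teams near .500). Observed win totals are then regressed toward 81 by talent's share of the total variance. This is the standard approach, not necessarily the exact computation behind the figures above.

```python
from math import sqrt

GAMES = 162
luck_sd = sqrt(GAMES * 0.5 * 0.5)  # binomial SD of wins for a .500-talent team
talent_sd = 9.0                    # spread of team talent, as given in the text
observed_sd = sqrt(luck_sd ** 2 + talent_sd ** 2)

def expected_talent_wins(observed_wins, mean_wins=GAMES / 2):
    """Regress an observed win total toward the mean by talent's share of variance."""
    talent_share = talent_sd ** 2 / observed_sd ** 2
    return mean_wins + talent_share * (observed_wins - mean_wins)

print(f"luck SD {luck_sd:.1f} games, observed SD {observed_sd:.1f} games")
print(f"a 96-win team projects to about {expected_talent_wins(96):.0f} wins")  # 91
```

Talent accounts for 81 of the roughly 121.5 units of variance, about two-thirds, so a team 15 wins above average is expected to be about 10 wins above average in talent.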

I think that's the question Frank really wants to answer -- that if you took Bill Gates, and made him play his life over, he wouldn't come close to being the richest man in the world. He just had a couple of very lucky breaks, breaks that probably wouldn't have come his way if God rolled the dice again in his celestial APBA simulation of humanity.


Another reason to think that's what Frank means is that, when he gets down to mathematical business, that seems to be the definition he uses. There, he talks about luck as distinct from "skill" and "effort". 

When Frank does that, his view of success and luck is a lot like the sabermetrician's view of success and luck. We assume a person (or team) has a certain level of talent, and the observed level of success might be higher or lower than expectations depending on whether good luck or bad luck dominates.

In his Chapter 4, and its appendix, Frank tries to work that out mathematically.

Suppose everyone has a skill level distributed uniformly between 0 and 100, and a level of luck distributed uniformly between 0 and 100 (where 50 is average). And, suppose that the level of success is determined 95 percent by skill and 5 percent by luck.

Even though luck creates only 5 percent of the outcome, it's enough to almost ensure that the most skilled person winds up NOT the most successful. With 1,000 participants, the most skilled will "win" about 55 percent of the time. With 100,000 participants, the most skilled will win less than 13 percent of the time.

Frank gives an excellent explanation of why that happens:

"The most skilled competitor in a field of 1,000 would have an expected skill level of 99.9, but an expected luck level of only 50.
"It follows that the expected performance level of the most skillful of 1,000 contestants is P=0.95 * 99.9 + 0.05 * 50 = 97.4 ... but with 999 other contestants, that score usually won't be good enough to win.

"With 1,000 contestants, we expect that 10 will have skill levels of 99 or higher. Among those 10, the highest expected luck level is ... 90.9. The highest expected performance score among 1,000 contestants must therefore be at least P = 0.95 * 99 + 0.05 * 90.9 = 98.6, which is 1.2 points higher than the expected performance score of the most skillful contestant. 

"... The upshot is that even when luck counts only for a tiny fraction of total performance, the winner of a large contest will seldom be the most skillful contestant but will usually be one of the luckiest."*

(* I feel like I should point out that this sentence, while true, may be misleading. Frank is comparing the chance of being the *very highest* in skill with the chance of being *one of the highest* in luck. When skill is more important than luck (it's 19 times as important in Frank's example), it's also true (perhaps "19 times as true") that "the winner of a large contest will seldom be the luckiest contestant but will usually be one of the most skillful."  And, it's also true that "the winner of a large contest will seldom be the most skillful contestant, but will even more seldom be the luckiest.")
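Frank's arithmetic is easy to reproduce. This sketch uses the standard result that the expected largest of n independent Uniform(0, 100) draws is 100 × n / (n + 1), which gives his 99.9 for n = 1,000 and 90.9 for n = 10. The helper names are mine.

```python
def expected_max_uniform(n, high=100.0):
    """Expected largest of n independent Uniform(0, high) draws: high * n / (n + 1)."""
    return high * n / (n + 1)


def performance(skill, luck, skill_weight=0.95):
    """Frank's performance score: a weighted sum of skill and luck."""
    return skill_weight * skill + (1 - skill_weight) * luck


# Most skillful of 1,000, with average luck.
most_skilled = performance(expected_max_uniform(1000), 50)
# Someone at skill 99 with the best luck among the 10 people at 99 or higher.
lucky_contender = performance(99, expected_max_uniform(10))

print(round(most_skilled, 1), round(lucky_contender, 1),
      round(lucky_contender - most_skilled, 1))  # 97.4 98.6 1.2
```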


So, the most skilled of 1,000 competitors will wind up the winner only 55 percent of the time. Doesn't that prove that success is largely due to luck?

It depends what you mean by "largely due to luck."  Frank's experiment does show that, often, the luckier competitor wins over the more skillful competitor. Whether that alone constitutes "largely" is up to you, I guess. 

You could argue otherwise. As it turns out, the competitor with the most skill is still the one most likely to win the tournament, with a 55 percent chance. The person with the most luck is much less likely to win. Indeed, in Frank's simulation, perfect luck is only a 2.5 point bonus over average luck. So if the luckiest competitor isn't in the top 5 percent of skill, he or she CANNOT win.

It's true that the most successful competitors were likely to have been very lucky. But it's not true that the luckiest competitors were also the most successful.

Having said that ... I agree that in Frank's simulation, luck was indeed important, and the winner of the competition should realize that he or she was probably lucky -- especially in the 100,000 case, where the best player wins only 13 percent of the time. But Frank doesn't just talk about winners -- he talks about "successful" people. And you can be successful without finishing first. More on that later.


A big problem with Frank's simulation is that the results wind up enormously overinflated on the side of luck. That's because he uses uniform distributions for both luck and skill, rather than a bell-shaped (normal) distribution. This has the effect of artificially increasing competition at the top, which makes skill look much less important than it actually is. 

Out of 100,000 people in Frank's uniform distribution, more than 28,000 are within 1 SD of the highest-skilled competitor. But in a normal distribution, that number would be ... 70. So Frank inflates the relevant competition by a factor of about 400.

To correct that, I created a version of Frank's simulation that used normal distributions instead of uniform. 

What happened? Instead of the top-skilled player winning only 13 percent of the time, that figure jumped to 88 percent.
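Here's a hypothetical sketch of that kind of simulation (not the code used for the numbers in this post). It assumes talent and luck are independent standard normals, with performance as a weighted sum (0.95/0.05 for Frank's split, 0.6/0.4 for the split used later on), and counts how often the most talented competitor also finishes first. Small runs will be noisy.

```python
import random


def top_talent_win_rate(n, w_talent=0.6, w_luck=0.4, trials=200, seed=0):
    """Estimate how often the most talented of n competitors also finishes
    first in performance.

    Assumes talent and luck are independent standard normals and
    performance = w_talent * talent + w_luck * luck.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        best_talent = best_perf = float("-inf")
        top_talent = top_perf = -1
        for i in range(n):
            talent = rng.gauss(0.0, 1.0)
            perf = w_talent * talent + w_luck * rng.gauss(0.0, 1.0)
            if talent > best_talent:
                best_talent, top_talent = talent, i
            if perf > best_perf:
                best_perf, top_perf = perf, i
        wins += top_talent == top_perf  # True counts as 1
    return wins / trials


# Frank's 95/5 split, but with normal distributions and a smaller pool.
print(top_talent_win_rate(10_000, w_talent=0.95, w_luck=0.05, trials=50))
```

Pure Python is slow at this scale; for pools of 100,000 or 1,000,000, expect to wait, or use fewer trials and accept more noise.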

Still ... Frank's use of the uniform distribution doesn't actually ruin his basic argument. That's because he assumed only 5 percent luck, and 95 percent skill. This, I think, vastly understates the amount of luck inherent in everyday life. 

It's easy to see that luck is important. The important question is: *how* important? I don't know how to find the answer to that, and when I discovered Frank's book, I was hoping he'd at least have taken a stab at it.

But, since we don't know, I'm just going to pick an arbitrary amount of luck and see where that leads. The arbitrary amount I'm going to pick is: 40 percent luck, and 60 percent skill. Why those numbers? Because that's roughly the breakdown of an MLB team's season record. Most readers of this blog have an intuitive idea of how much luck there is in a season, how often a team surprises the oddsmakers and its fans.

In effect, we're asking: suppose there are 100,000 teams in MLB, with only one division. How often does the most talented team finish at the top of the standings?

The answer to that question appears to be: about 11 percent of the time. 

(That's pretty close to the 13 percent that Frank gave, but it's a coincidence that his uniform distribution with a 5/95 luck/talent split comes out close to my normal distribution with a 40/60 split.)

Here's something that surprised me. Suppose now, instead of 100,000 competitors, you make the competition ten times as big, so there are 1,000,000. How often does the best competitor win now?

I would have expected it to drop significantly lower than 11 percent. It doesn't. It actually rises a bit, to 14 percent. (Both these numbers are from simulations, so I'm not sure they're "statistically significantly" different.)

Why does this happen? I think it's because of the way the normal distribution works. The larger the population, the farther the highest value pulls away from the pack. 

The most talented one-millionth of the population are all more than about 4.75 SD above the mean. Suppose the average of that group is 4.9 SD. So, we'll say the best competitor out of a million is around 4.9 SD above the mean.

Suppose luck can realistically close a gap of about 0.7 SD; call that the "catching distance." Then, to have a real chance of catching the leader, you need to be at least 4.2 SD above the mean, which means your main competition consists of about 13 competitors (out of a million).

But if there are only 100,000 in the pool, the most talented player is only around 4.4 SD above the mean, so the catching threshold drops to 3.7 SD. How many competitors are there above 3.7 SD? About 11 (out of 100,000).

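So, even though the pool is ten times as large, the number of competitors close enough to catch the leader barely changes (13 versus 11). The more competitors, the farther ahead the best one tends to be, which keeps the number with a decent chance to catch him from growing along with the pool.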
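The tail numbers in the last few paragraphs can be checked with Python's standard-library `NormalDist`. The 0.7 SD catching distance and the 4.9 and 4.4 SD leaders are the rough assumptions from the text, not derived quantities; the helper names are mine.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1


def tail_count(pool, sd_threshold):
    """Expected number of people in the pool more than sd_threshold SDs above the mean."""
    return pool * (1 - Z.cdf(sd_threshold))


def mean_above(sd_threshold):
    """Average z-score of everyone above sd_threshold (the inverse Mills ratio)."""
    return Z.pdf(sd_threshold) / (1 - Z.cdf(sd_threshold))


cutoff = Z.inv_cdf(1 - 1e-6)  # where the top one-millionth starts
print(round(cutoff, 2), round(mean_above(cutoff), 2))  # group average is a bit under 5 SD

CATCH = 0.7  # assumed catching distance, in SD
print(round(tail_count(1_000_000, 4.9 - CATCH)))  # 13
print(round(tail_count(100_000, 4.4 - CATCH)))    # 11
```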


I decided to use the larger simulation, with a million competitors. A couple of results:

On average, the top performer in the simulation was 442nd overall in talent. At first that may sound like merit doesn't matter much, but 442nd out of one million is still in the top one-twentieth of one percent -- the 99.95th percentile.

Going the other way, if you searched for the top player by talent, how did he or she perform? About 99th overall, or the 99.99th percentile. 


We know (from Tango and others) that to get from observed performance to talent, we regress to the mean by this amount:

1 - (SD(talent)/SD(observed))^2

Assume SD(talent) = 60, and SD(luck) = 40. That means that SD(observed) = 72.1, which is the square root of 60 squared plus 40 squared.

So, we regress to the mean by 1-(60/72.1)^2, which is about 31 percent. 

If our top performer is at 4.9 SD observed, that's 72.1*4.9 = 353.29 units above average. Regressing 31 percent gives us an estimate of 243.77 units of talent. Since talent has an SD of 60, that's the equivalent of 4.06 SD of talent.

That means if the top performer comes in at 4.9 SD above zero, his or her likeliest talent is 4.06 SD. That's about 27th out of a million, or some such.

In other words, the player with top performance should be around 27th in talent.
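Here's the same regression as a short sketch. The text rounds the regression to 31 percent, which gives 4.06 SD; carrying the unrounded 30.8 percent gives about 4.08 SD. Either way, it amounts to multiplying the observed SD score by SD(talent)/SD(observed). Names are illustrative.

```python
import math

SD_TALENT, SD_LUCK = 60.0, 40.0
SD_OBSERVED = math.hypot(SD_TALENT, SD_LUCK)   # sqrt(60^2 + 40^2), about 72.1
REGRESS = 1 - (SD_TALENT / SD_OBSERVED) ** 2   # about 0.31


def talent_sd_from_observed_sd(z_observed, regress=REGRESS):
    """Convert an observed SD score into the likeliest talent SD score."""
    units_observed = z_observed * SD_OBSERVED      # distance above average, raw units
    units_talent = units_observed * (1 - regress)  # regress toward the mean
    return units_talent / SD_TALENT                # re-express in talent SDs


print(round(talent_sd_from_observed_sd(4.9), 2))                # 4.08
print(round(talent_sd_from_observed_sd(4.9, regress=0.31), 2))  # 4.06
```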

(Why, then, did the simulation come up with 442nd instead of 27th? I think it's because converting SDs to rankings isn't symmetrical, so when the talent estimates vary a lot, the average of the ranks ends up far from the rank of the average.

For instance: suppose you wind up with two winners, one at 3.06 SD and one at 5.06 SD. The average of the SDs is 4.06, like we said. But, the 5.06 ranks first, while the 3.06 ranks 1000th or something. The average of the ranks doesn't wind up at 27 -- it's about 500.)
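A quick sketch of that asymmetry, using the hypothetical 3.06 and 5.06 SD winners from the example. It approximates a rank as the expected number of people (out of a million) above a given SD level, floored at 1st; that approximation is my assumption, not something taken from the simulation.

```python
from statistics import NormalDist

POOL = 1_000_000
Z = NormalDist()


def rank_of(z, pool=POOL):
    """Approximate rank (1 = best) of someone z SDs above the mean:
    the expected number of people above that level, but never better than 1st."""
    return max(1.0, pool * (1 - Z.cdf(z)))


winners = [3.06, 5.06]
average_of_ranks = sum(rank_of(z) for z in winners) / len(winners)
rank_of_average = rank_of(sum(winners) / len(winners))  # rank at 4.06 SD

# The average rank comes out in the hundreds; the rank of the average SD is in the 20s.
print(round(average_of_ranks), round(rank_of_average))
```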


The book is called "Success and Luck," but it really could be called "Money and Luck," because when Frank talks about "success," he really means "high income."  The point about luck is to support his idea of a consumption tax on the rich.

Frank's argument is that successful people should be willing to put up with higher taxes. His case, paraphrased, goes like this: "Look, the ultra-rich got that way because they were very lucky. So, they shouldn't mind paying more, especially once they understand how much their success depended on luck, and not their own actions."

About half the book is devoted to Frank discussing his proposal to change the tax system to get the ultra-rich to pay more. That plan comes from his 1999 book, "Luxury Fever." There and here, Frank argues that the ultra-rich don't actually value luxuries for their intrinsic value, but, rather, for their ability to flaunt their success. If we tax high consumption at a high rate, Frank argues, the wealthiest person will buy a $100K watch instead of a $700K watch (since the $100K watch will still cost $700K after tax) -- but he or she will still be as happy, since his or her social competitors will also downgrade the price of their watch, and the wealthiest person will still have the most expensive watch, which was his or her primary goal in the first place.

So, the rich still get the status of their expensive purchases, but the government has an extra $600K to spend on infrastructure, and that benefits everyone, including the rich. 

There are only a few pictures in the book, but one of them is a cartoon showing a $150,000 Porsche on a smooth road, as compared to a $333,000 Ferrari on a potholed road. Wouldn't the rich prefer to spend the extra $183,000 on taxes, Frank asks, so that the government can pave the roads properly and they can have a better driving experience overall? 

Almost every chapter of the book mentions that consumption tax ... especially Chapter 7, which is completely devoted to Frank's earlier proposal.


Since money is really the topic here, it would be nice to translate luck into dollars, instead of just standard deviations. Especially if we want to make sure Frank's consumption tax burden is fair, when compared to estimates of luck.

If money were linear with talent, it would be easy: we just regress 31 percent to the mean, and we're done. But, it's not. Income accelerates all the way up the percentile scale: slowly at the bottom, but increasingly as you get to the top. 

If you look at the bottom 99.7% of income tax filers, their income goes from zero to about a million dollars. If income were linear, the top 0.3% would go from $1 million to about $1.003 million, right? But it doesn't: it explodes. In fact, the top 0.3% go from $1 million to maybe $500 million or more. 

(Income numbers come from IRS Table 1.1, for 2015, and articles discussing the 400 highest-income Americans.)

That means plain old regression to the mean won't work. So, I ran another simulation.

Well, it's actually the same simulation, but I added one thing. I assigned an income to each rank, from the top down, based on the IRS table. A participant's performance rank determines what he or she actually earned; his or her talent rank determines what he or she "deserved." I assumed the most talented person "deserved" $500 million, and that's what he or she would earn if there were no luck involved. I assigned the second most talented person $300 million, and the third $100 million. Then, I used the IRS table to assign incomes all the way down the list of the 1 million people in the simulation. I rescaled the table to a million people, of course, and I assumed income was linear within an IRS category.

(BTW, if you disagree with the idea that even the most talented individuals deserve the high incomes seen in the IRS chart, that's fine. But that's a separate issue that has nothing to do with luck, and isn't discussed in the book.)

With the IRS table, I was able to calculate, for all performance ranks, how much they "should have" earned if their luck was actually zero.
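Here's a sketch of the bookkeeping, under loud assumptions: it does NOT use the IRS table, but a made-up power-law income schedule that starts at $500 million and falls off steeply, just to get the same "explodes at the top" shape. Talent and luck are normal with SDs of 60 and 40, as before. Because everyone draws from the same list of incomes, total luck across all participants sums to zero. All function names are hypothetical.

```python
import random


def income_for_rank(rank):
    """HYPOTHETICAL income schedule, not the IRS table: rank 1 gets
    $500 million and income falls off as a power of rank."""
    return 500e6 * rank ** -0.9


def luck_by_performance_rank(n=100_000, sd_talent=60.0, sd_luck=40.0, seed=1):
    """Entry r-1 is the luck, in dollars, of the person who finished r-th in
    performance: income at that performance rank minus the income at that
    person's talent rank (what he or she 'deserved')."""
    rng = random.Random(seed)
    talent = [rng.gauss(0.0, sd_talent) for _ in range(n)]
    performance = [t + rng.gauss(0.0, sd_luck) for t in talent]

    # Rank 1 is the best; sort indices by descending score.
    talent_rank = [0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda j: -talent[j]), start=1):
        talent_rank[i] = rank
    by_performance = sorted(range(n), key=lambda j: -performance[j])

    return [income_for_rank(rank) - income_for_rank(talent_rank[i])
            for rank, i in enumerate(by_performance, start=1)]


def average_luck(luck, first, last):
    """Average luck for performance ranks first..last (inclusive, 1-based)."""
    chunk = luck[first - 1:last]
    return sum(chunk) / len(chunk)


luck = luck_by_performance_rank()
for first, last in [(1, 10), (11, 100), (101, 500), (501, 1000), (1001, 2000)]:
    print(f"{first}-{last}: {average_luck(luck, first, last) / 1000:+,.0f} thousand")
```

With a different income curve, the dollar figures will differ from the chart below; the point of the sketch is the calculation, not the numbers.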

The best performer earned $500 million. How much would he or she have earned based on talent alone, and no luck? A lot less: $129 million. The second-place finisher earned $300 million but deserved only $78 million. The third-place finisher earned $100 million instead of $48 million.

So, the top three finishers were lucky by $371 million, $222 million, and $52 million, respectively.

The 4-10 finishers were lucky by an average of $62 million. 

The 11 to 100 finishers were lucky by less, only $40 million.

The 101 to 500 finishers were lucky by much less, $4.2 million each.

At this point, we're only at the first 500 competitors out of a million. You'd expect the trend to continue, that the next few thousand high-earners would also have been lucky, right? I mean, we're still in the multi-million-dollar range.

But, no.

At around 500, luck turns *negative*. Starting there, the participants actually made *less* than their skill was worth.

Those who finished 501-1000 are still in the income stratosphere -- they're the top 0.05% to the top 0.1%, earning between $10 million and $2.3 million. But, on average, their incomes were $460,000 less than what each would have earned based on skill alone.

It stays unlucky from there. The next 8,000 people -- that is, the top 0.1 to 0.9 percent -- lost significant income to luck, more than $250,000 each on average. It's not just random noise in the simulation, either, because (a) every group comes out unlucky, (b) there's a fairly smooth trend, and (c) I ran multiple simulations and they all came out roughly equivalent.

Here's a chart of all the ranges, showing average luck per person (income earned minus income deserved), in thousands of dollars:

Performance rank  Luck ($000s)
            1-10       +61,107
          11-100       +39,906
         101-500        +4,227
        501-1000          -460
       1001-2000          -503
       2001-3000          -401
       3001-4000          -320
       4001-5000          -265
       5001-6000          -135
       6001-7000          -224
       7001-8000          -178
       8001-9000          -201

(My chart stops at 9,000, because 9000 was about all I could keep track of with the software I was using. I believe the results would soon swing from unlucky back to lucky, and stay lucky until the average income of around $68,000.)

If we believe the simulation, then it's true that the ultra-ultra-rich, at least the top 0.05% of the population, benefitted from good luck. But the "only" ultra-rich, from the top 0.05% to the top 0.9%, the vast majority of the "one percenters" -- those people actually *lost* income due to *bad* luck.

This surprised me, but then I thought about it, and it makes sense. It's a consequence of the fact that income rises so dramatically at the top, where the top 0.01 percent earn ten times as much as the next 0.99 percent.

Suppose you finish 3,000th in performance, earning $1 million. If you're actually 2,500th in talent (so you were unlucky), you should have earned $2 million. If you're 3,500th in talent (so you were lucky), you should have earned maybe $900,000.

If you were lucky, you gained $100,000. If you were unlucky, you lost $1 million. 

So if those two have equal probabilities (which they almost do, in this case), the unlucky lose much more than the lucky gain. And that's why the "great but not really great" finishers were, on average, unlucky in terms of income.
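The arithmetic of that example, as a tiny sketch. The 50/50 split between the two cases is the assumption from the text; the function name is mine.

```python
def expected_luck(earned, income_if_talent_higher, income_if_talent_lower, p_higher=0.5):
    """Expected luck (earned minus deserved) at a given performance rank, when
    true talent is either higher (unlucky case) or lower (lucky case)."""
    deserved = (p_higher * income_if_talent_higher
                + (1 - p_higher) * income_if_talent_lower)
    return earned - deserved


# 3,000th in performance earning $1 million; 2,500th in talent would have
# paid $2 million, 3,500th about $900,000.
print(expected_luck(1_000_000, 2_000_000, 900_000))  # -450000.0
```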


Here's a baseball analogy. 

Normally, we think of team luck in MLB in terms of wins. But, instead, think of it in terms of pennants. 

The team that wins the pennant was clearly lucky, winning 100% of a pennant instead of (say) the true 40% probability given its talent. The other teams must have all been unlucky.

Which teams were the *most* unlucky? Clearly not the second division, which wouldn't have come close to winning the pennant even if the winning team hadn't gotten hot. The most unlucky, obviously, must be the teams that came close. Those are the teams where, if the winning team had had worse luck, they would have been able to take advantage and finish first instead.

In our income simulation, the top 100 is like a pennant, since it's worth so much more than the rankings farther below. So, when a participant gets lucky and finishes in the top 100, where did the offsetting bad luck fall? On the participants who actually had a good chance, but didn't make it.

Suppose only the top 1 percent in skill have an appreciable chance to make the top 100 in income. That means that if the top 0.01 percent had good luck and made more than they were worth, it must have been the next 0.99 percent who had bad luck and made less than they were worth, since they were the only ones whose failure to make the top 100 was due to luck at all.


Frank does seem to understand that it's the very top of the scale that's benefitted disproportionately from luck. In 1995, he co-wrote a book called "The Winner-Take-All Society", which argues that, over time, the rewards from being the best rise much faster than the rewards from being the second best or third best.

Recapping that previous book, Frank writes:

"[Co-author Philip] Cook and I argued that what's been changing is that new technologies and market institutions have been providing growing leverage for the talents of the ablest individuals. The best option available to patients suffering from a rare illness was once to consult with the most knowledgeable local practitioner. But now that medical records can be sent anywhere with a single click, today's patients can receive advice from the world's leading authority on that illness.

"Such changes didn't begin yesterday. Alfred Marshall, the great nineteenth-century British economist, described how advances in transportation enabled the best producers in almost every domain to extend their reach. Piano manufacturing, for example, was once widely dispersed, simply because pianos were so costly to transport ...

"But with each extension of the highway, rail, and canal systems, shipping costs fell sharply, and at each stop production became more concentrated. Worldwide, only a handful of the best piano producers now survive. It's of course a good thing that their superior offerings are now available to more people. But an inevitable side effect has been that producers with even a slight edge over their rivals went on to capture most of the industry's income.

"Therein lies a hint about why chance events have grown more important even as markets have become more competitive ..."

In other words: these days, the best doctor nationally has taken business away from the best doctor locally. But, the best doctor is the best doctor in part because of luck. So, luck rewards the best doctor nationally, but hurts the best doctor locally. And the best doctor locally is still pretty successful, maybe one of the richest people in town.

Which is what we see here, that the "ultra rich" gained from luck, and the merely "very rich" were actually hurt by it. Frank writes about the first part, but ignores the second part.


Frank's implicit argument is that if people's success is more due to luck, it's more appropriate to tax them at a higher rate. I say "implicit" because I don't think he actually says it outright. I can't say for sure without rereading the book, but I think Frank's explicit argument is that if the rich are made to realize that they got where they were substantially because of good luck, they would be less resistant to his proposed high-rate consumption tax.

But if Frank *does* believe the lucky should pay tax at a higher rate, it follows logically that he has to also believe that the unlucky should pay tax at a lower rate. If Joe has been taxed more than Mary (at an identical income) because he was luckier, then Mary must have been taxed less than Joe because she was unluckier.

By Frank's own logic, applied to my simulation, that would mean that those who earned between $300,000 and $3,000,000 last year were unlucky, and deserve to pay less tax. I bet that's not what Frank had in mind.


Of course, the model and numbers are debatable. In fact, they're almost certainly wrong. The biggest problem is probably the assumption that luck is normally distributed. There must be thousands of cases where a bit of luck turns a skilled performer, maybe someone normally in the $100K range, into a multi-million-dollar CEO or something. 

But who knows who those people are? They must be the minority, if we continue to assume that talent matters more than luck. But how small a minority, and how can we identify them to tax them more?

Anyway, regardless of what model you use, it does seem to me that the "second tier" of success, whoever those are, must be unlucky overall. 

In most cases, when you look at whoever is at the top of their category, they were probably lucky. If they hadn't been, who would be at the top instead? Probably the second or third in the category. Steve Wozniak instead of Bill Gates. Betamax (Sony) instead of VHS (JVC). Al Gore instead of George W. Bush. 

It seems pretty obvious to me that Wozniak, Betamax, and Al Gore have been very, very successful -- but not nearly as successful as they could have been, in large part because of bad luck. 

The main point of "The Winner-Take-All Society" is that the lucky (rich) winner winds up with a bigger share of the pie compared to the unlucky (but still rich) second-best, the unlucky (but still pretty rich) third best, and so on. In other words, the more "winner take all" there is, the bigger the difference between first and second place. 

The same forces that make the winner's income that much more a matter of good luck, must make the second-place finisher's income that much more a matter of bad luck. In a "Winner-Take-All Society," where only pennants pay off ... that's where luck becomes less important to the second division, not more.
