Wednesday, July 24, 2013

Luck, careers, and income inequality

According to this (anonymous) blog post from "The Economist," increased income inequality is partly due to random chance becoming more important in picking winners and losers.  

The post (and accompanying article) cites a recent paper on "herd effects," where a song or book becomes popular kind of randomly, because people tend to flock to whatever other people have already flocked to.  So, you can have two songs of equal merit, but one is a smash hit just because, for random reasons, its early adopters had better social networks.

Furthermore, there's experimental evidence.  In a recent academic study, researchers sent out pairs of fictitious job applications to employers.  The two resumes in each pair were identical, except that in one, the applicant said he had been unemployed for several years, while in the other, the applicant said he had a current job.  

Overall, the "out of work" applicants got around half the callbacks of the currently employed applicants.  Presumably, the employers believed that those who were out of work were less likely to be suitable candidates than those who weren't.  

The author writes,

" ... the implication is troubling: someone who ends up unemployed through bad luck, and for some idiosyncratic reason doesn't quickly land a job, finds his chances of reemployment diminish until he’s part of the long-term unemployed. ... 
"... this research opens a new and troubling dimension on  inequality. Unlike deficiencies of skill, it’s hard to tell society's  losers they should go back to school to become luckier. "


Certainly, there's a huge, huge amount of luck involved in career success. There has to be: there's already luck in hockey, golf, and chess, and real life is infinitely more random and complicated than those games.

For starters, there's when and where you were born, what kind of schools you went to, what books you happened to read, what kind of influences you had from your friends and parents, how good your teachers were in various subjects, chance remarks from strangers about what careers are likely to be lucrative, and many more.

Even in the more concrete sense, there are many, many small random things that can make a big difference.  What company you choose to join for your first job, what your first manager thinks of you, how your bosses think of your performance amidst the noise of team accomplishments, whether your co-workers give you credit or blame behind your back ... and you can probably think of hundreds more.

Basically, where you wind up in your life, in your career or any other sense, is overwhelmingly the result of so many random events that it's basically ... well, it's kind of a miracle that we are where we are now instead of somewhere else.  If we had to live our lives over again, from any given point, we'd probably be in different jobs, doing different work, with a different spouse, maybe living in a different city, earning a different income.  

In that light, I find it a bit perplexing that the authors chose these two specific examples as evidence of how luck factors into it.  They seem pretty minor, don't they?  I mean, the unemployed applicants still got half as many calls as the employed ones ... that seems like a pretty small career stopper as opposed to, say, if their first manager didn't like them and put them into a dead-end position and never promoted them, among hundreds of other possible events.

As for the "herding" argument ... wouldn't it be the case that there's LESS herding luck now, with the internet?  It used to be that you only knew songs you heard on the radio.  Now, you can find whatever you want on YouTube, on demand. There's more opportunity to find a niche.  Isn't that established wisdom?  Isn't that what "The Long Tail" is about, that, these days, there are many different markets that never existed before?

In terms of "herding," I'd bet that luck is much LESS a factor than it used to be.  


Also: as much as luck is a huge, huge part of where any individual winds up, I don't think it's a big factor in the overall distribution.

In baseball, luck is, what, around 30 percent of the variation in team W-L percentage?  

Now, suppose you expanded the major leagues to include 30 little league teams.  Luck becomes a much smaller factor, doesn't it?  The "real" teams always beat the kids, so whether or not you're over .500 is now completely a matter of talent.  Luck is still important within your group, but, overall, I'd bet the r-squared drops from 30 percent to ... I dunno, maybe 5 percent. 

Something similar happens with careers.  Two accountants, Andy and Beth, graduate from the same school with the same talent.  Andy winds up in a job where the culture matches his talents, while Beth does not.  Just by that random happening, Andy eventually winds up a CFO making $1 million a year, and Beth winds up in some middle-manager type position, making $70,000.  

Yes, Andy outearns Beth because of a lucky break.  But, the difference is contained within the accountants' income distribution, in the same way luck in our expanded MLB is contained within the teams in a single age level.

Luck matters more, obviously, when the differences in talent are small.  Within the subgroup of accountants, luck is going to matter a lot.  Within the subgroup of burger-flippers, luck is going to matter a lot.  But within the COMBINED group of accountants and burger-flippers, luck isn't that big a deal. You're not going to have many burger flippers, with average talent, getting lucky and outearning a typical accountant.
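The accountant/burger-flipper point can be put in rough numbers.  Here's a sketch, with all the dollar figures made up for illustration: within one occupation, luck explains a big share of income variance, but once you pool two occupations with a large gap between their average incomes, the between-group talent differences dominate.

```python
# Hypothetical illustration: luck matters a lot WITHIN a group, much less
# in the COMBINED group.  All numbers are assumed, not from any data.
luck_sd = 15.0        # $K: SD of income due to random career luck (assumed)
talent_sd = 20.0      # $K: SD of income due to talent, within one occupation (assumed)
gap = 60.0            # $K: difference between the two occupations' mean incomes (assumed)

within_var = talent_sd**2 + luck_sd**2
luck_share_within = luck_sd**2 / within_var

# Pooling two equal-sized groups whose means differ by `gap` adds
# between-group variance of (gap/2)**2 to the total.
pooled_var = within_var + (gap / 2)**2
luck_share_pooled = luck_sd**2 / pooled_var

print(f"luck share of variance within one group: {luck_share_within:.0%}")
print(f"luck share of variance in pooled group:  {luck_share_pooled:.0%}")
```

With these made-up numbers, luck's share of the variance drops from about 36 percent within a group to about 15 percent in the combined group, even though luck itself is unchanged.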

You can probably see this for yourself with a little introspection.  For my own part, I can see my career, and life, turning out radically differently based on a few key random events and decisions.  But ... I don't think I'd be poor.  I don't think I'd be making minimum wage.  I like programming, and I'd find a programming job somewhere, and even with the worst luck, I'd be making a decent living.  

Also, I don't think I'd be ultra-rich.  Even with extreme good luck -- say, 3 or 4 SDs above average -- I doubt I'd be a charismatic, celebrated CEO of a big company.  That requires skills and personality traits that I just don't have.

There's a lot of luck in individual baseball batting performance, but that doesn't mean you'll see Ozzie Smith hitting 60 home runs.  And there's a lot of luck in career outcomes too, but that doesn't mean you'll see lots of Bill Gateses working minimum wage.

At least, not because of the kind of luck the authors are talking about here.  


But, fundamentally ...  I wonder about the entire premise of the argument, that the prevalence of luck causes inequality to increase.  I wonder if it's the other way around.

A "system that rewards ability", as the author puts it, is an important and desirable goal.  A job, or promotion, or salary, or other reward, should go to the person who does the job best, not the one with the right resume, or the right skin color, or the right DNA.

Nor, by extension, to the one who happened to obtain the best roll of the dice.  

But, why does it follow that a system that rewards ability must also be more equal?  I think that's the wrong conclusion. It's understandable ... we see that some people get rewarded for being lucky, and we think, well, if only those people didn't get to be so rich, randomly, we'd all be more equal!  

But, I think that's backwards, at least in the hiring context. The more merit matters, the *bigger* the differences in salary.  Because, an employer is only going to pay you the expected value of your production.  

Let's use a simplified sports analogy.  

A team is trying to sign players who are worth $100 on average ... but, some are worth more, perhaps much more, and some less.  

Suppose you have no scouts.  So, you have no idea of the players' relative talent (assume, also, that neither do the other teams, or the players themselves).  

That means you have to hire randomly.  What are you going to pay?  Obviously, $100 per player.  You have complete income equality among your players, even though the guys on the team got there by 100% luck.

Later, you hire some mediocre scouts.  You're able to tell the better players from the worse ones, to a certain extent, based on "whether they look like ballplayers".  So, you hire some at $130, and some at $70.   Less equality.

Finally, you hire some sabermetricians and some expert scouts, and now you can estimate each player's value within $10.  You hire the superstar at $400, and a mediocre mop-up man at $5.  Even less equality.
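The scouting progression can be sketched as a toy model.  Assume (these numbers are mine, not from anywhere) that true player values are normally distributed around $100, a scout's estimate is the true value plus normal noise, and the team pays each player his expected value given the estimate.  Then the spread of salaries grows as the scouting noise shrinks:

```python
import math

value_sd = 50.0  # SD of true player values around the $100 mean (assumed)

def salary_sd(noise_sd):
    """SD of salaries when each player is paid his expected value given a
    scouting estimate = true value + normal noise with SD `noise_sd`.
    For normal value and normal noise, this is value_sd**2 / sqrt(value_sd**2 + noise_sd**2)."""
    return value_sd**2 / math.sqrt(value_sd**2 + noise_sd**2)

# No scouts: infinite noise, everyone gets the $100 average -- zero spread.
# Mediocre scouts: very noisy "looks like a ballplayer" estimates.
# Expert scouts + sabermetricians: estimates within roughly $10.
mediocre = salary_sd(100.0)
expert = salary_sd(10.0)

print(f"mediocre scouts: salary SD about ${mediocre:.0f}")
print(f"expert scouts:   salary SD about ${expert:.0f}")
```

More information, more spread: the mediocre scouts produce a salary SD around $22, while the expert scouts push it up near the full $49 spread of true value.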

Isn't that a better description of how real life works?  You can't be a well-paid employee unless you can prove to an employer that you're worth it.  The more merit-based the system, the better chance you have of getting full value for what you're worth.

If you don't like that analogy ... try this.  There are a hundred brand new Honda Civics on the lot, and they all cost the same.  But, now, suppose you have more information about their merit -- somehow, it becomes possible to predict reliability.  That blue Civic over there will run trouble-free for 20 years. But that red one ... the transmission is going to blow at 80,000 miles.  

Now, the top 1% of Civics will sell for a lot more than the bottom 1%.  Right?


In general: the closer you look for merit, the more differences you find.  The more differences you find, the wider the distribution of the sum of those differences.   So, the more you hire by merit, the more spread you get in salaries.

Is that a problem?  Not for me.  I'm strongly in favor of hiring by merit ... and I don't think income inequality is necessarily always a bad thing.  So, to me, there's no serious trade-off. But, if you like both merit and equality, I think you have to choose which is the lesser evil, in this context.

As a society, I think we've made our consensus trade-off ... because, there are many, many unspoken ways in which we deliberately ignore merit.  For instance, we pay our teachers by seniority and experience, rather than by how good they are at teaching.  We try to avoid situations where long-term employees get demoted or have their pay cut, even when their performance drops.  We discourage comparisons of which employees are better workers than their colleagues in the same position.

We give lip service to the idea that we should be paid and rewarded by merit, but we don't actually believe it as much as we say we do.  We say we can't allow nepotism, because it's unfair to candidates with more talent, but then we believe we CAN favor older, worse teachers without being unfair to younger teachers with more merit.  

Like a lot of other things in our society, we start with the outcome we want and work backwards to rationalize it.  We hate nepotism, and we hate firing mediocre teachers.  Merit is just an incidental consideration.

Labels: , , ,

Tuesday, July 16, 2013

Is there evidence for a "hot hand" effect?

You're in the desert, exactly 10 miles south of home.  Instead of walking straight home, you decide to walk east one mile first.  How far away from home are you when you're done?

The answer: about 10.05 miles.  Your one-mile detour only added 0.05 miles to your return trip: one-half of one percent.

That's just a simple application of the Pythagorean theorem.  Draw a right triangle with sides 10 miles and 1 mile, and the hypotenuse will be 10.05 miles long.
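Checking the detour arithmetic:

```python
import math

# One mile east, ten miles from home: the hypotenuse barely grows.
direct = 10.0
with_detour = math.hypot(10.0, 1.0)   # sqrt(10**2 + 1**2)

print(round(with_detour, 4))              # ~10.0499 miles
print(f"{with_detour / direct - 1:.2%}")  # ~0.50% longer
```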


That triangle thing is meant to explain why it might be difficult to find evidence for or against streakiness in baseball hitting.

There's a belief that there are times when a player has a "hot hand", in the sense that he's somehow "dialed in" and is likely to perform better than his usual.  And, there might be periods when he's "cold," and should be expected to perform worse.  

Torii Hunter hit .370 in April, 2013.  Did he have a hot hand, or was he just lucky?  That's the question that we want to answer.  Maybe not specifically about Torii Hunter, but, in general ... when a player has a hot streak or a cold streak, is there something behind that?

The difficulty answering that is that there's a lot of luck involved, and it's hard to separate out the luck from the "hot hand" (if any).

If you assume each AB is independent of the previous one, then, over a month's worth -- say, 100 AB, which was Torii Hunter's total -- the SD of batting average, by luck alone, will be about 43 points.  That's massive ... it means that one time out of 20, a player will be 86 points off expected in a given month.
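That 43-point figure is just the binomial SD.  As a quick check, treating each AB as an independent "hit with probability p" trial:

```python
import math

def ba_luck_sd(p, n):
    """SD of batting average over n at-bats, by luck alone,
    if each AB is an independent trial with hit probability p."""
    return math.sqrt(p * (1 - p) / n)

# For a .250-to-.270 hitter over 100 AB, luck alone gives an SD of
# about 43-44 points of batting average.
print(round(ba_luck_sd(0.250, 100), 4))   # ~0.0433
print(round(ba_luck_sd(0.270, 100), 4))   # ~0.0444
```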

Now, suppose there's also a real "hot hand" effect, that's 1/10 the size of the luck effect.  Now, the SD of batting average, by luck and hot hands combined, will be one half of one percent higher, just like in the Pythagorean triangle example.  That's almost nothing -- .0002 in batting average.

Effectively, this level of hot hand is invisible, if it exists.

This finding might be good news to those who believe that players are often "on" or "off" (Morgan Ensberg is one of those believers).  But ... there are still things we *can* say about hot-hand effects, and those things won't be compatible with some of the ways people talk about streakiness.


First, let's try to make an intelligent guess on how much hot-handedness can remain invisible.  As we saw, 10 percent of luck is impossible to see because the SD difference is only 0.5 percent.  If hot hand variation were 20 percent of luck, the SD would go up by two percent.  If hot hands were 30 percent of luck, the SD would go up by around 4.5 percent.  If it were 40 percent of luck, the SD would go up by around 7.7 percent. 
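Those percentages all come from the same formula.  If the hot-hand SD is some fraction r of the luck SD, the combined SD is sqrt(1 + r squared) times the luck SD -- the Pythagorean point again:

```python
import math

# SD inflation when a hot-hand effect of relative size r is added to luck.
for r in (0.10, 0.20, 0.30, 0.40):
    inflation = math.sqrt(1 + r**2) - 1
    print(f"hot hand at {r:.0%} of luck: combined SD up {inflation:.1%}")
```

That reproduces the numbers in the paragraph: 0.5 percent, 2 percent, about 4.5 percent, and 7.7 percent.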

Actually, let me quote Bill James (subscription required), from his recent essay on streakiness that inspired me to write this:

" ... how much causal variation is it reasonable to think might be completely hidden in a mix of causal and random variation? 

"Well, if it was 50-50, it would be extremely easy ... [i]f you add a second level of variation equal to the first, it will create obviously non-random patterns.

"If it 70-30—in other words, if the causal variation was roughly half the size of the random variation—that, again, would be easy to distinguish from pure random variation.    Even if it was 90-10, we should be able to distinguish between that and pure random variation.   If it was 99-1, maybe we would have a hard time telling one from the other."

I think Bill is a little optimistic on the 90-10 ... you're still talking one-half of one percent.  On 70-30, though, I think he's right.  That would be around a 9 percent increase in the standard deviation, which I think we'd be able to find without too much difficulty.

For the sake of this argument, let's say that the limit we can observe is ... I dunno, let's say, 75-25.  That would mean the SD of hot hands was 33 percent of the SD of luck, which means the overall SD goes up by around 5.3 percent.  I think we could find that if we looked.  I think.

I may be wrong ... substitute your own number if you prefer.


It might be legitimate for a hot-hand advocate to say to us, "well, you don't know for sure that there's no talent streakiness, because you admit that your methods can't pick it up."

Well, maybe.  But if you make that argument, don't forget to also limit yourself to hot hand effects that are 33 percent of luck.  That's around 14 points (again, that's batting average over 100 AB).

That means only one player in 20 will be more than 28 points off his expected batting average for any given month.  Only one player in 400 will be more than 42 points "hot" or "cold" for the month.
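The "1 in 20" and "1 in 400" figures are rounded normal tail probabilities (2 SD and 3 SD, respectively).  A quick check using the complementary error function:

```python
import math

def two_tailed(k):
    """P(|z| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2))

# More than 2 SD off expected, in either direction: about 1 in 22.
# More than 3 SD off: about 1 in 370.
print(round(1 / two_tailed(2)))
print(round(1 / two_tailed(3)))
```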

Are you prepared to live with that?  When you find ten players hitting 100 points off their expected for the month of August, you'd have to say, "Well, it's virtually impossible that *any* of them were *that* hot ... that's 7 standard deviations, which never happens.  They may have been a bit hot, but they were mostly lucky."

Seriously, are you prepared to say that?

There's more.  With a 75-25 mix, the correlation between "hot hand" and performance would be a bit over 0.3.  That means, if you find a player hitting 2 SD better than expected, his "hot hand" expectation is +0.6 SDs.  That's around 8 points.
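The +0.6 SD figure is just regression to the mean: with correlation r between hot hand and observed performance, a player observed z SDs above expected has an expected hot hand of r times z, in hot-hand SDs.  Plugging in the 75-25 numbers:

```python
import math

hot_frac = 1 / 3       # hot-hand SD as a fraction of the luck SD (the 75-25 case)
hot_sd_points = 14     # hot-hand SD in points of batting average (1/3 of ~43)

# Correlation between the hot hand and observed performance:
# hot SD divided by the combined SD.
r = hot_frac / math.sqrt(1 + hot_frac**2)

observed_z = 2.0                 # player is 2 SD above expected
expected_hot = r * observed_z    # expected hot hand, in hot-hand SDs

print(round(r, 2))                             # a bit over 0.3
print(round(expected_hot * hot_sd_points, 1))  # 8-9 points of batting average
```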

Seriously.  When a career .250 player hits .340 in May, you'd have to be saying, "wow, I bet that player has a hot hand.  I bet in May, he was really a .258 hitter!!!"

My guess is, most "hot hand" enthusiasts wouldn't bite that bullet.  They don't just want "hot hands" to exist, they want them to be a big part of the game.  When they see a player on a hot streak, they want to believe that player is *significantly better*, that he's a force to be reckoned with.  "He's 8 points hot" is just not a very good narrative.

But that's pretty much what it amounts to.  If 75-25 is the right ratio, then only around TEN PERCENT of a player's observed "hotness" would come from a hot hand.  The other 90 percent would still be luck.


Now, one assumption in all those calculations is that "hot hands" are random and normally distributed.  Followers of streakiness would probably argue that's not the case.  Intuitively, it would seem that most players are just "themselves" most of the time, but, occasionally, someone will get exceptionally hot and really be 100 points better for a brief period.

For instance, maybe hot-handedness manifests itself in outliers.  Maybe 38 out of 40 players are just normal.  But, the 39th player is cold and drops 60 points, while the 40th is hot, and gains those 60 points.   That still works out to the SD of around 14 points that we hypothesized as our limit.
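Checking that the outlier version really does match the 14-point SD:

```python
import math

# Outlier model: 38 of 40 players are normal, one is 60 points hot,
# and one is 60 points cold.  SD of that talent distribution:
deviations = [0.060, -0.060] + [0.0] * 38
mean = sum(deviations) / len(deviations)   # zero by construction
sd = math.sqrt(sum((d - mean)**2 for d in deviations) / len(deviations))

print(round(sd * 1000, 1))   # ~13.4 points -- right around the 14-point limit
```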

If that's the case, then, the "check the SD" method would still fail to find anything.  

But ... now, we'd have other methods that would actually work.  We could count outliers.  If one player in 40 gains .060 every month, we should see a lot more exceptional months than we would otherwise.  

So, if that's how you think hot hands work, let us know.  Those kinds of hot hands *won't* be invisible.


And even if they exist at that level, it won't tell us much.  

Suppose a player has a month where he actually hits .060 better than normal.  Can you now say, "Ha!  See, that player had a hot hand!"

Nope.  Because, by the model, the chance of a player having a hot hand is 1 in 40.  But, the chance of a player having a +60 point month, just by luck, is, I think, something like 1 in 12.

A quick naive calculation is that, even in this extreme case, there's a 3-in-4 chance the player was just lucky!  (It's probably actually higher than that, if you do the proper calculation.)
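The 3-in-4 figure comes from a simple count.  It's naive, as noted, because it pretends the hot player always shows exactly +60:

```python
# Of 40 players, 1 is "hot" (and will show roughly +60 points), while each
# of the other 39 has about a 1-in-12 chance of showing +60 by luck alone.
hot = 1
lucky = 39 * (1 / 12)

p_lucky_given_plus60 = lucky / (hot + lucky)
print(round(p_lucky_given_plus60, 2))   # about 3 in 4
```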


So here's what I think we have:

1.  Moderate variations in talent might be impossible to disprove by the standard method.  For monthly "hot hands", that might be 14 points (.014) of batting average, as a guess.

2.  If those exist, they will rarely be higher than, say, 30 or 40 points.  So, when a .270 player hits .380 in August, it *can't* all be a hot hand.  It'll still be mostly luck.

3.  If hot hands manifest themselves in outliers, we should have no problem finding them.  The idea that players will regularly get "scorching hot" in talent ... well, I bet we could debunk that pretty easily.

4.  Even if hot hands do exist, unusual performances will be more driven by luck than by performance variations.  There will never be a case where you can look at a hot month, and have good reason to believe it wasn't just luck.

(Hat tip and more discussion: Tango)

Labels: , , ,

Saturday, July 13, 2013

Disputing Hakes/Sauer, part III

(This may not be of general interest ... it's mostly a technical addendum to part I and part II, which are where the meat of the argument is.   You should read those first, if you haven't already.)

To recap again: the Hakes/Sauer study ran a regression to predict the log of player salary from last season's "eye" (BB/PA), "bat" (batting average), and "power" (TB/H).  I argued that one of the problems with the regression is that they were using a geometric model for an arithmetic relationship.

Specifically: when it comes to salaries, research has shown that every expected walk is worth the same -- about $150,000, for a free agent.  But the Hakes/Sauer model had every walk increasing salary by a certain *percentage*.  That would make good players' walks appear to be worth more than mediocre players' walks, at the margin.

I just wanted to show evidence that that's happening.  In my own regression, where I used "next year salary value of offense" instead of just salary, I got these coefficients, which I reported last post:

       eye  bat power
2001   4.06 5.52 1.43
2002   5.82 3.53 1.20
2003   5.97 4.76 1.46
2004   8.48 4.76 1.46
2005   6.92 2.58 1.08
2006   7.61 5.98 0.82

Now, I'll give you the same chart, but with the sample divided into "full time" (400+ PA) and "part time" (130-399) players.

This is full time:

       eye  bat power
2001   7.49 8.94 1.98
2002   4.90 3.28 1.39
2003   5.10 9.97 1.21
2004   6.04 5.68 1.29
2005   4.43 5.26 1.26
2006   4.77 9.90 1.63

And this is part time:

       eye  bat power
2001   0.41 1.87 0.88
2002   6.89 3.84 1.10
2003   7.19 1.03 1.61
2004  11.58 4.63 1.07
2005  10.48 1.50 1.03
2006  10.33 3.60 0.77

There's a huge difference in coefficients for the two groups.   

The return on "eye" is consistently higher for part-time players than for full-time players, as I suspected.  It's the reverse, though, for batting average.  

How come?  Isn't the logic the same?  

Maybe the difference is this: when one part-time player has a higher batting average than another part-time player, it's probably just luck, and there's not much relationship to next year's value.  But when one part-time player has a higher *walk rate* than another part-time player, that's more likely to be real, and that comes out in next year's stats.  

Walk talent does vary more among players than batting average talent ... the SD of batting average, among players with 4000+ career AB, is about 21 points.  For walk average, it's about 30 points.  Furthermore, the season SD due to luck is about 50% higher for batting average (since hits happen more often than walks).  

So, overall, the "signal to noise ratio" is more than twice as high for walks as for hits.
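Putting numbers to the signal-to-noise claim, using the figures from the paragraph above (talent SDs of 21 and 30 points, and luck noise roughly 50 percent higher for batting average):

```python
# Signal-to-noise ratio: talent spread (signal) divided by seasonal luck (noise).
talent_sd_ba = 0.021     # SD of batting-average talent, 4000+ AB players
talent_sd_walk = 0.030   # SD of walk-rate talent

# Seasonal luck SD is roughly 50% higher for batting average than for
# walk rate, since hits happen more often than walks.
luck_ratio_ba_to_walk = 1.5

# Ratio of the two signal-to-noise ratios: walks vs. batting average.
snr_walk_vs_ba = (talent_sd_walk / talent_sd_ba) * luck_ratio_ba_to_walk
print(round(snr_walk_vs_ba, 2))   # more than twice as high for walks
```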

I think that's what's going on, why the walk numbers show one effect and the batting average numbers show another.


Oh, and, in case it's not obvious: the entire "Moneyball" effect for walks comes from the part-time players.  There's no effect at all for the full-time players.   What drove the Hakes/Sauer result, I think, is that, in 2004 and 2005, part-time players with walks just happened to have good numbers the following year.  

Labels: , ,

Tuesday, July 02, 2013

Disputing Hakes/Sauer, part II

(continued from previous post)

To recap: the Hakes/Sauer study ran a regression to predict the log of player salary from last season's "eye" (BB/PA), "bat" (batting average), and "power" (TB/H).  From 2001 to 2006, they got these coefficients:

       eye  bat power
2001  0.53 5.28 0.84
2002  1.52 3.64 0.68
2003  2.12 3.07 0.57
2004  5.26 4.14 0.78
2005  4.19 5.38 0.86
2006  2.14 4.66 0.58

The salary return to "eye" was much higher in 2004 and 2005, immediately following the release of "Moneyball."   The authors conclude that's because GMs were quick to adjust their salary offers in light of the new information.

Last post, I argued why, in theory, the results and conclusion are doubtful at best.  Now, I'm going to argue using the data.


Let me tell you what I think is *really* going on.  I think it has little to do with valuation of walks.  I think that, in 2004 and 2005, it just turned out that the players who walked the most happened to be more valuable than normal.  

As an analogy ... the data show that general managers paid much more for cars in 2004 than in 2003.  But it's not because "Motorball" showed them that automobiles were underpriced.  It's because in 2003, they were buying Chevrolets, but in 2004, they were buying Cadillacs.  

Why do I think that?  Because I ran a similar regression to the authors, but using performance rather than salary.  And I got similar results.


Here's what I did.  Like the authors, I found all players in 2003 who had at least 130 plate appearances.  And I ran a regression on those players for 2004.  But: instead of trying to predict their 2004 *salary*, I tried to predict their 2004 *performance*.

Specifically, I tried to predict their "batting runs," which is the linear weights measure of their offensive performance.  However, I wanted to use the logarithm, like the authors did.  The problem is that runs can be negative, and you can't take the log of a negative number.

So, I converted batting runs to "runs above replacement," by adding 20 runs per 600 PA, and rounding any negative numbers to zero.  That still leaves a "log of zero" problem.  Since even "zero" players make a minimum salary equal to about 4/5 of a run, I added 0.8 to every player's total.  

Actually, when I got to that point, I figured, why not just convert to dollars?  So I took that "runs above replacement plus 0.8" and multiplied it by $500,000.  Effectively, I converted performance to "free agent value" -- what a team would have paid in salary if they had known in advance what the performance would be.  

And then, of course, I took the logarithm of that earned salary.  
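The whole transformation, step by step, looks like this (a sketch; the runs-to-dollars scale is the $500,000 per run stated above):

```python
import math

def log_earned_salary(batting_runs, pa):
    """Log of "earned salary": convert linear-weights batting runs into the
    dependent variable described in the text."""
    # Runs above replacement: add 20 runs per 600 PA, floor negatives at zero.
    rar = max(batting_runs + 20.0 * pa / 600.0, 0.0)
    # Even a "zero" player earns the minimum, about 0.8 runs' worth.
    dollars = (rar + 0.8) * 500_000
    return math.log(dollars)

print(round(log_earned_salary(-10.0, 600), 2))  # a below-average regular
print(round(log_earned_salary(30.0, 600), 2))   # a star season
```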

(Notes: Unlike the Hakes and Sauer regression, mine didn't include a term for position, and I didn't include year dummies.  Also, I didn't include free-agent status, which doesn't matter much here because I used "earned salary" on the free agent scale for all players.  Oh, and I used only a player's first "stint" both years, just to save myself programming time.  I don't think any of those compromises affect the results too much.  

I did, however, include a variable for plate appearances, as the original study did.)

I ran my regression for every pair of seasons from 2000-01 to 2005-06.  Here are my coefficients:

       eye  bat power
2001   4.06 5.52 1.43
2002   5.82 3.53 1.20
2003   5.97 4.76 1.46
2004   8.48 4.76 1.46
2005   6.92 2.58 1.08
2006   7.61 5.98 0.82

My coefficients are higher than the originals.  That's because I used a fake salary exactly commensurate with the player's performance.  In real life, much of the performance is random, which means it wouldn't be reflected in salary.

That is: some mediocre players might hit .300 just by luck.  My study would value that .300 at face value.  In real life, though, that player would probably have been paid as the .250 hitter he really is, which means the real life coefficient would be smaller.

But my point isn't about the actual magnitude of the coefficients -- it's about the year-to-year trend.  For the "eye" coefficient that we're talking about here, what does the Hakes/Sauer study show? 

Small increases through 2003, then a big jump in 2004, then a small dropoff in 2005, then a bigger dropoff in 2006.  

Mine show almost exactly the same pattern, except for 2006.

Here, I'll put them in the same chart for you:

       H/S   Me
2001  0.53  4.06
2002  1.52  5.82
2003  2.12  5.97
2004  5.26  8.48
2005  4.19  6.92
2006  2.14  7.61

See that they're the same trend?  If not, run a regression on these two columns.  You'll see that the correlation coefficient is +.82.  That's significant at the .05 level (actually, .0465).  

(It's cheating to drop the last datapoint ... but, if you drop it anyway, the correlation rises to .96, and is now significant at 1 percent.)
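You can verify the correlation directly from the two columns (small rounding differences from the quoted values are possible):

```python
import math

hs = [0.53, 1.52, 2.12, 5.26, 4.19, 2.14]   # Hakes/Sauer "eye" coefficients
me = [4.06, 5.82, 5.97, 8.48, 6.92, 7.61]   # my "eye" coefficients

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my_ = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my_) for x, y in zip(xs, ys))
    vx = sum((x - mx)**2 for x in xs)
    vy = sum((y - my_)**2 for y in ys)
    return cov / math.sqrt(vx * vy)

print(round(correlation(hs, me), 2))            # all six seasons
print(round(correlation(hs[:-1], me[:-1]), 2))  # dropping 2006
```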


What does this tell us?  It tells us that the observed effect remains *even when you don't look at the team's salary decisions*.  That is: the spike in 2004/5 is NOT about how much the teams paid the players.  It's about the players' performances.  It just so happens -- whether for random reasons or otherwise -- that walks were a particularly good predictor in 2003 of how well a player would do in 2004.  

Hakes and Sauer claim teams paid more for players with walks in 2004 because they were enlightened by Moneyball.  However, this analysis shows that the teams would have paid more for those players *even if Moneyball had never happened* -- because those players were simply better players, even under the old, flawed methods of evaluation.

The spike, I would argue, is not because teams chose to pay more for Chevrolets in 2004 and 2005.  It's because they just happened to buy more Cadillacs those years.


One loose end:   As I showed, there's a strong correlation between my "eye" column and the Hakes/Sauer "eye" column.  But there's no similar correlation for the other two columns.  Why not?

It's many factors combined.  One of them is probably the size of the effect.  For whatever reason, the values in the walk column jump around a lot more than the other two columns.  In the Hakes/Sauer regression, the range is huge ... from 0.53 to 5.26, a factor of nearly 10 times.  The other two columns are much narrower.

When I rerun my regression, I'm effectively taking the Hakes/Sauer regression, changing some of the stuff and removing some randomness and adding other randomness and leaving out some variables.  That shakes the sand castle around some, and many of the features get evened out a bit.  Only in the "eye" column was the original situation extreme enough to preserve the shape in the new situation.

More on this subject next post.

Labels: , ,

Monday, July 01, 2013

Disputing Hakes/Sauer, part I

The renowned 2003 book, "Moneyball," famously suggested that walks were undervalued by baseball executives.  Jahn K. Hakes and Raymond D. Sauer wrote a paper studying the issue, in which they concluded that teams immediately adapted to the new information.  They claim that, as early as the very next season, teams had adjusted their salary decisions to eliminate the market inefficiency and pay the right price for walks.

Hakes and Sauer’s claim seems to have been widely accepted as conventional wisdom, as far as I can tell.  A quick Google search shows many uncritical references.  

Here's Business Week from 2011.  Here's Tyler Cowen and Kevin Grier from the same year.  This is J.C. Bradbury from one of his books, and later on the Freakonomics blog.  Here's David Berri from 2006, and Berri and Schmidt in their second book (on the authors' earlier, similar paper).  Here’s Berri talking about the new paper, just a couple of months ago.  Here's more, and more, and more, and more.  

I reviewed the study back in 2007, but, on re-reading, I think my criticisms were somewhat vague.  So, I thought I’d revisit the subject, and do a bit more work.  What I think I found is strong evidence that what the authors found *has nothing to do with Moneyball or salary.*  There is no evidence there was an inefficiency, and no evidence that teams changed their behavior. 

Read on and see if you agree with me.  I’ll start with the intuitive arguments and work up to the hard numbers.


First, the results of the study.  Hakes and Sauer ran a regression to predict (the logarithm of) a player’s salary, based on three statistics they call "eye", "bat," and "power".  "Eye" is walks per PA.  "Bat" is batting average.  "Power" is bases per hit.

They predict this year’s salary based on last year’s eye/bat/power, on the reasonable expectation that a player’s pay is largely determined by his recent performance.  They included a variable for plate appearances, and dummy variables for year, position, and contracting status (free agent/arbitration/neither).

Here are the coefficients the authors found:

       eye  bat power
1986  0.69 2.26 0.22
1987  1.27 3.87 0.46
1988  0.20 2.76 0.37
1989  1.15 4.04 0.50
1990  1.48 1.75 0.63
1991  1.13 1.20 0.52
1992  0.40 2.76 0.57
1993  0.71 4.42 0.65
1994  0.36 4.78 0.86
1995  2.86 5.33 0.76
1996  0.78 1.85 0.73
1997  1.84 5.80 0.52
1998  2.21 4.23 0.74
1999  2.77 3.81 0.77
2000  2.72 5.30 0.73
2001  0.53 5.28 0.84
2002  1.52 3.64 0.68
2003  2.12 3.07 0.57
2004  5.26 4.14 0.78
2005  4.19 5.38 0.86
2006  2.14 4.66 0.58
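For reference, here's the "eye" column from the table pulled out programmatically, confirming that 2004 really is by far the biggest one-year jump in the series:

```python
# The "eye" (walks) coefficients from the table above, keyed by year
eye = {1986: 0.69, 1987: 1.27, 1988: 0.20, 1989: 1.15, 1990: 1.48,
       1991: 1.13, 1992: 0.40, 1993: 0.71, 1994: 0.36, 1995: 2.86,
       1996: 0.78, 1997: 1.84, 1998: 2.21, 1999: 2.77, 2000: 2.72,
       2001: 0.53, 2002: 1.52, 2003: 2.12, 2004: 5.26, 2005: 4.19,
       2006: 2.14}

jumps = {y: eye[y] - eye[y - 1] for y in range(1987, 2007)}
biggest = max(jumps, key=jumps.get)
print(biggest, round(jumps[biggest], 2))  # -> 2004 3.14
```

No other season comes close; the runner-up jump is 1995's, at 2.50.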

Moneyball was published in 2003 … the very next season, the coefficient of "eye" -- walks -- jumped by a very large amount!  Hakes and Sauer claim this shows how teams quickly competed away the inefficiency by which players were undercompensated for their walks.

Those 2004/2005 numbers are indeed very high, compared to the other seasons.  The next highest "eye" from 1986 on was only 2.86.  It does seem, intuitively, that 2004 and 2005 could be teams adjusting their payroll evaluations.

But it’s not, I will argue.


First: it’s too high a jump to happen over one season.  At the beginning of the 2004 season, most players will have already been signed to multi-year contracts, with their salaries already determined.  You’d think any change in the market would have to show itself more gradually, as contracts expire over the following years and players renegotiate in the newer circumstances.

Using Retrosheet transactions data, I found all players who were signed as free agents from October 1, 2003 to April 1, 2004.  Those players wound up accumulating 40,840 plate appearances in the 2004 season.  There were 188,539 PA overall, so those new signings represented around 22 percent.

The Retrosheet data doesn’t include players who re-signed with their old team.  It also doesn’t include players who signed non-free-agent contracts (arbs and slaves).  Also, what’s important for the regression isn’t necessarily plate appearances, but player count, since Hakes and Sauer weighted every player equally (as long as they had at least 130 PA in 2003). 

So, from 22 percent, let’s raise that to, say, 50 percent of eligible players whose salaries were determined after Moneyball.  That means the jump in the coefficient, from 2.12 to 5.26, was caused by only half the players.  Those players, then, must have been evaluated at well over 5.26.  In fact, if the overall coefficient jumped around 3 points, the real jump for the affected players must have been around six points.
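The arithmetic of that dilution argument, spelled out.  (The 50 percent figure is my rough guess from above, not something measured.)

```python
# Back-of-envelope version of the dilution argument
before, after = 2.12, 5.26      # eye coefficient, 2003 vs. 2004
fraction_repriced = 0.50        # assumed share of players signed post-Moneyball

overall_jump = after - before                    # what the regression shows
implied_jump = overall_jump / fraction_repriced  # jump for repriced players only
implied_coef = before + implied_jump             # their implied new coefficient

print(round(implied_jump, 2), round(implied_coef, 2))  # -> 6.28 8.4
```

So a one-season move from 2.12 to 5.26 overall implies a move from about 2 to about 8 for the players actually repriced.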

Basically, Hakes and Sauer are claiming that teams recalibrated their assessment of walks from 2 points to 8 points.  That is -- the salary value of walks *quadrupled* because of Moneyball.

That doesn’t make sense, does it?  Nobody ever suggested that teams were undervaluing walks by a factor of four.  I don’t know if Hakes and Sauer would even suggest that.  That’s way too big.  It suggests an undervaluing of a free-agent walk by more than $100,000 (in today’s dollars). 

For full-time players, the SD of walks is around 18 per 500 AB.  That means your typical player would have had to have been misallocated -- too high or too low -- by $1.8 million.  That seems way too high, doesn’t it?  Can you really go back to 2003, adjust each free agent by $1.8 million per 18 walks above or below average, and think you have something more efficient than before?
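In code, that back-of-envelope calculation looks like this.  Both inputs are the rough figures from the text, not precise estimates:

```python
# Rough per-player dollar impact implied by a quadrupled walk price
sd_walks = 18            # SD of walks per 500 AB, full-time players (approx.)
per_walk_gap = 100_000   # implied salary difference per walk, in dollars (approx.)

typical_misallocation = sd_walks * per_walk_gap
print(typical_misallocation)  # -> 1800000, i.e. $1.8 million per typical player
```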

Also: even if a factor of four happened to be reasonable, you’d expect the observed coefficient to keep rising, as more contracts came up for renewal.  Instead, though, we see a drop from 2004 to 2005, and, in 2006, it drops all the way back to the previous value!  Even if you think the effect is real, that doesn’t suggest a market inefficiency -- it suggests, maybe, a fad, or a bubble.  (Which doesn't make sense either, that "Moneyball" was capable of causing a bubble that inflated the value of a walk by 300 percent.)

In my opinion, the magnitude, timing, and pattern of the difference should be enough to make anyone skeptical.  You can’t say, "well, yeah, the difference is too big, but at least that shows that teams *did* pay more, at least for one year."  Well, you can, but I don’t think that’s a good argument.  When you have that implausible a result, it’s more likely something else is going on.

Suppose I ask a co-worker what kind of car he has, and he says, "well, I have three Bugattis, eight Ferraris, and a space shuttle."  You don’t leave his office saying, "well, obviously his estimate is too high, but he must at least have a couple of BMWs!"  (Even if it later turns out that he *does* have two BMWs.)


Second: the model is wrong.

We know, from existing research, that salary appears to be linear in terms of wins above replacement, which means it’s linear in terms of runs, which means it’s linear in terms of walks.  That is: one extra walk is worth the same number of dollars to a free agent, regardless of whether he’s a superstar or just an average player. 

The rule of thumb is somewhere around $5 million per win, or $500K per run.  That means a walk, which is worth about a third of a run, should be worth maybe around $150,000.  (Turning an out into a walk is more, maybe around $250,000.)

But the Hakes/Sauer study didn’t treat events as linear on salary.  They treated them as linear on the *logarithm* of salary.  In effect, instead of saying a walk is worth an additional $150K, they said a walk should be worth (say) an additional 0.5% of what the salary already is.

That won’t work.  It will try to fit the data on the assumption that, at the margin, a $10 million player’s walk is *ten times as valuable* as a $1 million player’s walk.
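To see what that assumption implies in dollars -- using the 0.5 percent figure, which is just the made-up example from above:

```python
import math

# What "linear in the log of salary" implies about the dollar value of a walk
b = 0.005  # assumed per-walk coefficient on log(salary), i.e. +0.5% per walk

values = {}
for salary in (1_000_000, 10_000_000):
    values[salary] = salary * (math.exp(b) - 1)  # dollar value of one extra walk

print({s: round(v) for s, v in values.items()})
# the $10M player's walk comes out worth exactly ten times the $1M player's
```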

The other coefficients in the regression will artificially adjust for that.  For instance, maybe plate appearances picks up the slack … if double the plate appearances *should* mean 5x the salary, the regression can decide, maybe, to make it only 2x the salary.  That way, the good player’s walk may be counted at 10 times as much as it should, but his plate appearances will be counted at only 40 percent as much as they should. 

There are other factors that work in one direction or another.  For instance, a utility player’s walks actually *should* be worth less, since, with fewer plate appearances, differences between players are more likely to be random luck.  Also, the authors used walk *percentage*, and it takes fewer walks to increase walk percentage with fewer AB.  So, that will also work to absorb some of the "10 times" difference.

But there’s no guarantee all that stuff evens out … in fact, it would be an incredible coincidence if it did. 

So that means that the coefficient of walks now means something other than what you think it means.  And, so, when you have the coefficient of a walk jumping between seasons … you can’t be sure it’s really measuring the actual salary assigned to the walk.  It could be just a difference in the distribution of plate appearances, or one of a thousand other things.

Again, I would argue that this flaw -- on its own -- is enough to have us reject the conclusions of the study.  When you try to fit a linear relationship with a non-linear regression -- or vice versa -- all bets are off.  The results can be very unreliable.  I bet I could create an artificial example where walks would appear to be worth almost any reasonable-sounding value you could name.
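Here's a quick toy version of that kind of artificial example.  Two invented pools of players get the *same* true linear price per walk ($150K, the rule-of-thumb figure from above); they differ only in salary level.  A log-salary regression prices their walks very differently:

```python
import numpy as np

# Same linear walk price, different log-salary coefficients.  All invented.
rng = np.random.default_rng(0)
n = 2000
walks_vs_avg = rng.uniform(-5, 5, n)  # walks above/below average

def fitted_walk_coef(base_salaries):
    salary = base_salaries + 150_000 * walks_vs_avg  # identical linear pricing
    slope, _intercept = np.polyfit(walks_vs_avg, np.log(salary), 1)
    return slope

low_pool  = fitted_walk_coef(rng.uniform(1e6, 2e6, n))    # ~$1M-2M players
high_pool = fitted_walk_coef(rng.uniform(8e6, 15e6, n))   # ~$8M-15M players
print(round(low_pool, 3), round(high_pool, 3))
```

The true price of a walk is identical in both pools, but the fitted coefficient comes out several times higher for the cheaper pool -- which is the sense in which the coefficient isn't measuring what you think it's measuring.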


These two objections are nice in theory, but I bet they won’t convince many people who already believe the study’s conclusions are correct.  My arguments sound too conjectural, too nitpicky.  There, you have a real study with hard numbers and confidence intervals, and, here, you just have a bunch of words about why it shouldn't work.  

So, next post, I’ll get into the numbers.  Instead of arguing about why my co-worker's sarcasm shouldn't be used as evidence, I'll try to actually show you his driveway.
