Sabermetric Research

Thursday, May 12, 2016

How did Leicester City do it?

At the start of the season, you could get 5000:1 odds on Leicester City F.C. winning the 2015-16 English Premier League Championship. Of course, Leicester did win, in what one writer called "the unlikeliest feat in sports history".

A friend wrote me that Leicester City is often said to have "defied the odds" to finish on top. That's a metaphor, of course; odds aren't something you can literally "defy," like a bad law or your supervisor's instructions. What does the metaphor mean? To me, it implies that the odds actually *were* 5000:1, that the team did actually hit the longshot outcome, that they were the "1" instead of one of the "5000".

Let's suppose, for whatever reason, I offer you 5000:1 odds that a fair coin will land heads. You bet $10, the coin does land heads, and I pay you $50,000. Did you really "defy" the 5000:1 odds? That doesn't sound right. At best, you "defied" odds of 1:1.
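Here's a rough sketch of that coin example in Python, just to put numbers on the gap between the odds you were paid and the odds you actually beat. The function names are mine, made up for illustration.

```python
def implied_probability(odds_against):
    """Chance of winning implied by odds of odds_against:1."""
    return 1 / (odds_against + 1)


def expected_profit(stake, payout_odds, true_prob):
    """Average profit on a bet that pays payout_odds:1 when the real win chance is true_prob."""
    win_amount = stake * payout_odds
    return true_prob * win_amount - (1 - true_prob) * stake


# The price I offered says "1 in 5,001"; the coin itself says "1 in 2".
print(implied_probability(5000))
print(implied_probability(1))

# Expected profit on the $10 bet, given that the coin is fair.
print(expected_profit(stake=10, payout_odds=5000, true_prob=0.5))
```

The bet is worth almost $25,000 on average, which is another way of saying the 5000:1 price never described the coin.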

So, the question is: at the start of the year, was Leicester City's expectation really a 1 in 5,001 chance? Were the Foxes really that bad a team, in terms of talent? The evidence suggests not.

After 17 of the season's 38 matches, Leicester City sat at the top of the table (I think "top of the table" is English for "first in the standings"), two points up on second-place Arsenal, and fourth overall in goal differential (+13, behind teams at +17, +14, and +14).

Suppose Leicester were truly a bad team, and had just had a run of good luck. In that case, what would their chance be of hanging on to win the championship? Still pretty poor, right? They're only two points up, with several superior teams right on their tail.

But, the bookmakers had them solidly in the mix. Here are the revised odds on December 21, after 17 matches:

                          Odds    Pts
--------------------------------------
 1. Leicester             10:1     38
 2. Arsenal              10:11     36
 3. Manchester City       15:8     32
 4. Tottenham             20:1     29
 5. Manchester United     18:1     29
--------------------------------------
 9. Liverpool             22:1     24
--------------------------------------
15. Chelsea               66:1     18

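Clearly, Leicester City is still considered a lower-quality team than its rivals. Despite Arsenal trailing by two points, the bookmakers' odds still imply roughly six times as much chance of winning for Arsenal as for Leicester (about 52 percent versus 9 percent, before adjusting for the bookmakers' margin).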

But ... the odds against Leicester hanging on were only 10:1. If the Foxes were really as gawd-awful a team as was thought at the beginning, their odds would be much worse.
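For anyone who wants to see the December odds as probabilities, here's a quick sketch. I'm reading each price as odds against, so 10:11 means risking 11 to win 10, and I'm making no adjustment for vigorish. The dictionary and function names are just mine.

```python
# Odds against winning the title on December 21, as (against, in_favor) pairs.
december_odds = {
    "Leicester": (10, 1),
    "Arsenal": (10, 11),
    "Manchester City": (15, 8),
    "Tottenham": (20, 1),
    "Manchester United": (18, 1),
    "Liverpool": (22, 1),
    "Chelsea": (66, 1),
}


def implied_probability(against, in_favor):
    """Raw probability implied by odds of against:in_favor, vigorish included."""
    return in_favor / (against + in_favor)


implied = {team: implied_probability(a, f) for team, (a, f) in december_odds.items()}

for team, prob in implied.items():
    print(f"{team:<18} {prob:6.1%}")

# Only seven of the twenty teams are listed, so this is a partial total.
total_listed = sum(implied.values())
print(f"Sum over the listed teams: {total_listed:.1%}")
```

The seven listed teams already add up to more than 100 percent, which is the bookmakers' margin showing through, so each true probability is somewhat lower than its raw number.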

After every Premier League season, the bottom three teams are "relegated" to the lower-tier Championship, while that lower league's best three teams are promoted to replace them. At the beginning of the season, Leicester City was thought to have a 25 percent chance of relegation -- you could get 3:1 odds that they'd be in the bottom three. If they were still thought to be that bad, wouldn't they be much worse than 10:1 to win it all?

For a mirror-image comparison, look at Chelsea, one of the league's elite teams. After those first 17 matches, Chelsea sat 20 points behind Leicester, in fifteenth place out of twenty teams. Despite the poor start, they were still given a 1-in-67 chance (66:1 against) of coming all the way back to take first place. That's because Chelsea was understood to be a very skilled team -- they were the 13:8 favorite when the season started.

Now: if Leicester were as bad as Chelsea were good, you'd think the chance of them dropping to relegation would be about the same as the odds of Chelsea rising to the top. Right? It's kind of symmetrical. Not completely, because Leicester is at the top and Chelsea is only *near* the bottom. But, that's mitigated by Chelsea needing to be *the* top team, and Leicester City needing only to be in the bottom three.

Not perfectly symmetrical, but reasonable.

But: the symmetry doesn't extend to those mid-season odds. A Chelsea comeback was pegged at 66:1. But a Leicester collapse was 3500:1.

Clearly, by December 21, the betting market evaluated that Leicester was a pretty good team. Not a great team, but a good team. For more evidence of that, they're pegged at around 25:1 to repeat next year. The bookmakers clearly don't think Leicester City was a bad team that just got very, very, very lucky.

So, I'd argue that Leicester didn't "defy the odds." They were just a better team than 5000:1 from the beginning.

--------

Well, that might not be strictly true. Maybe as the season started, they were an awful team, but they got better quickly. Maybe in week 2, they signed the soccer equivalent of Babe Ruth and Wayne Gretzky and Peyton Manning and Michael Jordan and Pele. (Sorry, I don't know anything about soccer ... "Pele" was the best I could do.) Or maybe the coach figured out a strategy to take a bunch of mediocre players and make them great (or make the team able to win despite the players' mediocrity).

But, it seems more likely to me that the team was just good from the beginning, and the oddsmakers got it wrong. Well, more importantly, the community of betting soccer fans got it wrong. Because, you'd think, if even a few sharp bettors figured out that Leicester was a pretty good team, they would have moved the odds -- not just by betting on a championship, but probably on available side bets too.

So, what happened? That's a huge question. I think this is the biggest betting market inefficiency I've ever seen. It's not like the 1969 Mets, who probably *did* just get lucky ... or the 1980 "Miracle on Ice" team, who, when you watch the game, you can tell were very much inferior but very much lucky. This is a case where a legitimately very good team got evaluated as very bad -- by *everyone*, even the best sabermetric soccer types and bettors.

Of course, Leicester City isn't actually the most talented team in the Premier League -- at least, not according to betting markets. For next season, the Foxes' 25:1 odds rank only seventh -- the favorite, Manchester City, is at 3:2. Ignoring vigorish, the oddsmakers think Man City has ten times the chance that Leicester does.

Assuming that Leicester should have been that same 25:1 this season instead of 5000:1 ... well, that still seems like an exploitable opportunity that, in theory, never should have happened in an efficient betting market.
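To get a feel for how exploitable that would have been, here's a back-of-the-envelope sketch. It assumes, purely for the sake of argument, that 25:1 was the "true" price this season, and it ignores vigorish. The function name is my own.

```python
def expected_profit_per_unit(payout_odds, true_odds_against):
    """Average profit on a 1-unit bet paid at payout_odds:1 when the fair price is true_odds_against:1."""
    true_prob = 1 / (true_odds_against + 1)
    return true_prob * payout_odds - (1 - true_prob) * 1


# Paid at 5000:1, but (by assumption) really a 25:1 shot.
edge = expected_profit_per_unit(payout_odds=5000, true_odds_against=25)
print(f"Expected profit per 1 pound staked: {edge:.2f} pounds")
```

Under that assumption, every pound bet on Leicester in August was worth something like 190 pounds in expectation. That's not a small pricing error.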

So, what did happen? How did everyone miss that Leicester City was a good, if not great, team? That's not a question about oddsmaking. It's a question about soccer. What happened? What made Leicester so good, that nobody had seen coming? How did they build such a successful team on their very low budget? Is there a "Moneyball" secret?

I've seen some articles that broke down some stats, about passing and shots on goal and such. But this is a situation where we need more than that. It's like, if an expansion MLB team goes 105-57 with castoff players, it doesn't help much to say, "well, they won because they had a high OPS and their pitchers struck out a lot of guys." The question is -- how did they get those replacement-level players to have that high OPS and strike out a lot of guys? Was it something they saw in those players? Was it coaching? Was it sign stealing?

-----

OK, after wondering about all this, I figured, hey, the internet exists, maybe I should do some research. (Which consisted of Googling, and getting advice from my friends.)

And, yes, there seems to be an explanation. Apparently there is indeed a bit of a Moneyball story here. My friend John steered me to an article by Leicester City fan John Micklethwait (who this year neglected to make his annual 20 pound bet on Leicester, and missed out on what would have been a 100,000 pound win).

Leicester City made three major acquisitions in the off-season, all with "Moneyball" overtones of underappreciated players. First, N’Golo Kanté, who was number one in France last year in the (apparently overlooked) statistic of interceptions made. Second, Jamie Vardy, who is known for speed (which Leicester used to strategic advantage, as we will see). And, third, Riyad Mahrez, who is known for "a rare ability to dribble past people."*

(* Correction: two of the three were actually signed by Leicester in earlier seasons. See note at end of post.)

They got those guys really cheap.

Then, they analyzed video, and came up with this:

"Leicester players even seem to foul scientifically, slowing down their opponents by taking turns to obstruct them, so that few of the Leicester players get booked or sent off."

And, finally, the most interesting strategic twist: the rapid counterattack.

"This in itself is another innovation. All teams have always counterattacked, but few have based their game so completely around it. In most matches, the team that keeps control of the ball more scores more goals. Teams like Barcelona and Arsenal are famous for never letting their opponents touch it. Not Leicester. Last weekend, Swansea had possession 62 percent of the time, but they still lost 4-0. Leicester’s tactic is to let their opponents have the ball, wait until they make a mistake and then attack at remarkable speed: Hence all those quick players and the unusual disciplined approach."

Well, I love that theory! Because, it's what I argued the Toronto Maple Leafs might have been doing a couple of years ago, when they made the playoffs despite possession stats near the bottom of the league.

Not only do I like that theory the best, but I also believe it's the most plausible. Because, when the Foxes bought those three players, it was public knowledge. It was no secret that Kanté is a great tackler, and Vardy is crazy fast, and Mahrez has dribbling skills (videos are easily found on Google). So, the odds should have accurately reflected those acquisitions.

On the other hand, the counterattacking strategy? Probably, nobody knew manager Claudio Ranieri was going to try that tactic until the season was underway. Even then, it would take a while before it became apparent how well it worked. So, that *could* explain why even the most knowledgeable soccer experts didn't see it coming at all.

For what it's worth, here's an article about Leicester's counterattacking strategy, with accompanying video of some of their quick transition goals. And, something my friend Bob wrote me, from observation:

"They used the counterattack as their primary mode of offense. They had several wins early in the year with possession below 30%. Manager Claudio Ranieri would frequently position one or two players close to midfield on opponents' corner kicks and free kicks in order to better exploit his team's speed advantage. As the season progressed, teams adapted to this and Leicester's possession totals increased. Leicester adapted by tightening up its defense, winning a string of 1-0 games (4 out of 5 in one stretch)."

-------

From all this, here's my wild-ass bottom line, the working hypothesis that my imperfect Bayesian brain is pulling out of its metaphorical butt:

1. Leicester City improved over the off-season by pulling a Moneyball, by acquiring underappreciated players at bargain prices.

2. They implemented a novel strategy emphasizing speedy counterattacking, and it worked, but became less effective as the opposition recognized it and learned to adapt to it.

3. They did play unexpectedly well, in the traditional sense, apart from the strategy.

4. That unexpectedly good play might have been playing over their heads. They were significantly luckier than their talent, judging by the odds during and after the season.**

(**Any championship team is, in retrospect, likely to have played better than its talent, but I'm arguing in this case for even more luck than for a usual champion.)

It's kind of a vague hypothesis, I know, a little bit of everything. But my best guess is ... it *was* a little bit of everything. Because: (a) can you really turn a bad team into a champion with just three players? (b) the odds insist Leicester was luckier than its talent; and (c) even if you discount the opinions of observers, the stats show that Leicester did repeatedly win with very low possession time.

-------

So, what does that mean for next year?

The improvement in players (#1) will remain, if the club doesn't sell them off. And, of course, we know luck doesn't persist (#3 and #4).

That leaves #2, the counterattack. Will the strategy continue to work, or will the opposition adapt to it enough that the advantage will dwindle? Normally, I'd just check the betting market, but I'm not sure what the odds are telling us. As mentioned, Leicester is only the seventh favorite to win next year, at 25:1. That's much smaller than 5000:1, for sure. But how much of the difference is from the skill of their new players, and how much is from an expectation that a less traditionally-skilled team can still win by implementing a disruptive counterattack strategy?

-------

In any case, I think the big story here isn't that Leicester City beat 5000:1 odds. I think the big story here is how Leicester City found a "Moneyball" way to beat the system on a low budget.

I'd argue that this is the "real" Moneyball story, the one we were theoretically waiting for.

The original story, about the 2002 Oakland A's, isn't that impressive to me. Sure, the 2002 Oakland A's won 103 games on a low budget, but we kind of know how they did it. Yes, they used sabermetrics, but those gains were marginal. Most of their advantage was having a supply of excellent, pre-free-agent players who came cheap, as well as a large dose of luck. (I don't have public luck estimates handy for 2002, but I once figured the A's were lucky by 12 wins that year. Seven of those were from beating their Pythagorean Projection.)

This Leicester City story is different. This is a legitimately bad team, picking up three "free agents" who were legitimately undervalued and overlooked, and then implementing a system that effectively overcame the skill advantage of some of the best and most expensive football talent on the planet.

Even if only half of Leicester City's improvement was Moneyball, and the other half was luck ... well, even then, the Foxes of Leicester City created millions of pounds worth of wins out of basically nothing.

Could that really be what happened?

-----
UPDATE: James Yorke on Twitter has pointed out that two of the three players I mentioned have been with Leicester more than one season. Jamie Vardy was actually signed in 2012, and Mahrez in 2014. Only Kanté was new for 2015-16.

Mr. Yorke also points out that other, lesser players were signed over the past few years, too.

So, Kanté was the main signing in the most recent off-season. This suggests that most of Leicester's success is #2-#4, with less of it being #1 (being mostly Kanté).

Labels: football, football -- but English football that's actually soccer and not NFL or CFL or aussie rules, Leicester City, luck, Moneyball, Premier League, soccer

Monday, July 01, 2013

Disputing Hakes/Sauer, part I

The renowned 2003 book, "Moneyball," famously suggested that walks were undervalued by baseball executives. Jahn K. Hakes and Raymond D. Sauer wrote a paper studying the issue, in which they concluded that teams immediately adapted to the new information. They claim that, as early as the very next season, teams had adjusted their salary decisions to eliminate the market inefficiency and pay the right price for walks.

Here's the paper (.pdf): "The Moneyball Anomaly and Payroll Efficiency: A Further Investigation."

Hakes and Sauer’s claim seems to have been widely accepted as conventional wisdom, as far as I can tell. A quick Google search shows many uncritical references.

Here's Business Week from 2011. Here's Tyler Cowen and Kevin Grier from the same year. This is J.C. Bradbury from one of his books, and later on the Freakonomics blog. Here's David Berri from 2006 and Berri and Schmidt in their second book (on the authors' earlier, similar paper). Here’s Berri talking about the new paper, just a couple of months ago. Here's more and more and more and more.

I reviewed the study back in 2007, but, on re-reading, I think my criticisms were somewhat vague. So, I thought I’d revisit the subject, and do a bit more work. What I think I found is strong evidence that what the authors found *has nothing to do with Moneyball or salary.* There is no evidence there was an inefficiency, and no evidence that teams changed their behavior.

Read on and see if you agree with me. I’ll start with the intuitive arguments and work up to the hard numbers.

-----

First, the results of the study. Hakes and Sauer ran a regression to predict (the logarithm of) a player’s salary, based on three statistics they call "eye", "bat," and "power". "Eye" is walks per PA. "Bat" is batting average. "Power" is bases per hit.

They predict this year’s salary based on last year’s eye/bat/power, on the reasonable expectation that a player’s pay is largely determined by his recent performance. They included a variable for plate appearances, and dummy variables for year, position, and contracting status (free agent/arbitration/neither).

Here are the coefficients the authors found:

       eye    bat   power
--------------------------
1986   0.69   2.26   0.22
1987   1.27   3.87   0.46
1988   0.20   2.76   0.37
1989   1.15   4.04   0.50
1990   1.48   1.75   0.63
1991   1.13   1.20   0.52
1992   0.40   2.76   0.57
1993   0.71   4.42   0.65
1994   0.36   4.78   0.86
1995   2.86   5.33   0.76
1996   0.78   1.85   0.73
1997   1.84   5.80   0.52
1998   2.21   4.23   0.74
1999   2.77   3.81   0.77
2000   2.72   5.30   0.73
2001   0.53   5.28   0.84
2002   1.52   3.64   0.68
2003   2.12   3.07   0.57
2004   5.26   4.14   0.78
2005   4.19   5.38   0.86
2006   2.14   4.66   0.58

Moneyball was published in 2003 … the very next season, the coefficient of "eye" -- walks -- jumped by a very large amount! Hakes and Sauer claim this shows how teams quickly competed away the inefficiency by which players were undercompensated for their walks.

Those 2004/2005 numbers are indeed very high, compared to the other seasons. The next highest "eye" from 1986 on was only 2.86. It does seem, intuitively, that 2004 and 2005 could be teams adjusting their payroll evaluations.

But it’s not, I will argue.

--------

First: it’s too high a jump to happen over one season. At the beginning of the 2004 season, most players will have already been signed to multi-year contracts, with their salaries already determined. You’d think any change in the market would have to show itself more gradually, as contracts expire over the following years and players renegotiate in the newer circumstances.

Using Retrosheet transactions data, I found all players who were signed as free agents from October 1, 2003 to April 1, 2004. Those players wound up accumulating 40,840 plate appearances in the 2004 season. There were 188,539 PA overall, so those new signings represented around 22 percent.

The Retrosheet data doesn’t include players who re-signed with their old team. It also doesn’t include players who signed non-free-agent contracts (arbs and slaves). Also, what’s important for the regression isn’t necessarily plate appearances, but player count, since Hakes and Sauer weighted every player equally (as long as they had at least 130 PA in 2003).

So, from 22 percent, let’s raise that to, say, 50 percent of eligible players whose salary was determined after Moneyball.

That means the jump in coefficient, from 2.12 to 5.26, was caused by only half the players. Those players, then, must have been evaluated at well over 5.26. If the overall coefficient jumped around 3 points, it must have been that, for those players affected, the real jump was actually six points.

Basically, Hakes and Sauer are claiming that teams recalibrated their assessment of walks from 2 points to 8 points. That is -- the salary value of walks *quadrupled* because of Moneyball.
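Here's that arithmetic spelled out as a sketch. It leans on a simplifying assumption of mine: that the overall coefficient behaves like a plain average of a "repriced" group and an "unchanged" group, which a real regression won't do exactly. The 50 percent share is the guess from above, not a measured number.

```python
# Retrosheet counts quoted above.
new_signing_pa = 40_840
total_pa = 188_539
print(f"Share of 2004 PA from new free-agent signings: {new_signing_pa / total_pa:.1%}")

# "Eye" coefficients from the table, 2003 and 2004.
eye_2003 = 2.12
eye_2004 = 5.26

# Guess: half of the eligible players had salaries set after the book came out.
share_repriced = 0.5

overall_jump = eye_2004 - eye_2003
jump_for_repriced = overall_jump / share_repriced
implied_new_coefficient = eye_2003 + jump_for_repriced

print(f"Overall jump: {overall_jump:.2f}")
print(f"Jump needed among repriced players: {jump_for_repriced:.2f}")
print(f"Implied coefficient for those players: {implied_new_coefficient:.2f}")
print(f"Ratio to the 2003 value: {implied_new_coefficient / eye_2003:.2f}")
```

With those inputs the repriced players need a coefficient a bit over 8, which is where the "from 2 points to 8 points" comes from.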

That doesn’t make sense, does it? Nobody ever suggested that teams were undervaluing walks by a factor of four. I don’t know if Hakes and Sauer would even suggest that. That’s way too big. It suggests an undervaluing of a free-agent walk by more than $100,000 (in today’s dollars).

For full-time players, the SD of walks is around 18 per 500 AB. That means your typical player would have had to have been misallocated -- too high or too low -- by $1.8 million. That seems way too high, doesn’t it? Can you really go back to 2003, adjust each free agent by $1.8 million per 18 walks above or below average, and think you have something more efficient than before?

Also: even if a factor of four happened to be reasonable, you’d expect the observed coefficient to keep rising, as more contracts came up for renewal. Instead, though, we see a drop from 2004 to 2005, and, in 2006, it drops all the way back to the previous value! Even if you think the effect is real, that doesn’t suggest a market inefficiency -- it suggests, maybe, a fad, or a bubble. (Which doesn't make sense either, that "Moneyball" was capable of causing a bubble that inflated the value of a walk by 300 percent.)

In my opinion, the magnitude, timing, and pattern of the difference should be enough to make anyone skeptical. You can’t say, "well, yeah, the difference is too big, but at least that shows that teams *did* pay more, at least for one year." Well, you can, but I don’t think that’s a good argument. When you have that implausible a result, it’s more likely something else is going on.

Suppose I ask a co-worker what kind of car he has, and he says, "well, I have three Bugattis, eight Ferraris, and a space shuttle." You don’t leave his office saying, "well, obviously his estimate is too high, but he must at least have a couple of BMWs!" (Even if it later turns out that he *does* have two BMWs.)

--------

Second: the model is wrong.

We know, from existing research, that salary appears to be linear in terms of wins above replacement, which means it’s linear in terms of runs, which means it’s linear in terms of walks. That is: one extra walk is worth the same number of dollars to a free agent, regardless of whether he’s a superstar or just an average player.

The rule of thumb is somewhere around $5 million per win, or $500K per run. That means a walk, which is worth about a third of a run, should be worth maybe around $150,000. (Turning an out into a walk is more, maybe around $250,000.)
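As a quick check on that rule of thumb, here's the calculation as a sketch. The 10 runs per win is implied by the $5 million and $500K figures; the one-third of a run per walk is the rough value used above. The variable names are mine.

```python
dollars_per_win = 5_000_000
runs_per_win = 10            # implied by $5 million per win and $500K per run
runs_per_walk = 1 / 3        # rough linear-weights value of a walk

dollars_per_run = dollars_per_win / runs_per_win
dollars_per_walk = dollars_per_run * runs_per_walk

print(f"Dollars per run:  {dollars_per_run:,.0f}")
print(f"Dollars per walk: {dollars_per_walk:,.0f}")
```

That lands a little above $150,000, which is close enough for a rule of thumb.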

But the Hakes/Sauer study didn’t treat events as linear on salary. They treated them as linear on the *logarithm* of salary. In effect, instead of saying a walk is worth an additional $150K, they said a walk should be worth (say) an additional 0.5% of what the salary already is.

That won’t work. It will try to fit the data on the assumption that, at the margin, a $10 million player’s walk is *ten times as valuable* as a $1 million player’s walk.
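A small sketch shows the "ten times" problem directly. The 0.5 percent figure is the made-up example from above, not an estimate from the paper, and the function name is mine.

```python
import math

# Hypothetical log-salary coefficient: each extra walk raises salary by 0.5 percent.
coef_per_walk = math.log(1.005)


def dollar_value_of_walk(salary, coef=coef_per_walk):
    """Dollar change from one extra walk when the model is linear in log(salary)."""
    return salary * (math.exp(coef) - 1)


for salary in (1_000_000, 10_000_000):
    print(f"${salary:>10,} player: one walk adds ${dollar_value_of_walk(salary):,.0f}")
```

Same coefficient, same walk, but the $10 million player's walk gets priced at about $50,000 and the $1 million player's at about $5,000. A model that is linear in dollars would give both the same value.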

The other coefficients in the regression will artificially adjust for that. For instance, maybe plate appearances takes over the slack … if double the plate appearances *should* mean 5x the salary, the regression can decide, maybe, to make it only 2x the salary. That way, the good player’s walk may be counted at 10 times as much as it should, but his plate appearances will be counted at only 40 percent as much as they should.

There are other factors that work in one direction or another. For instance, a utility player’s walks actually *should* be worth less, since, with fewer plate appearances, differences between players are more likely to be random luck. Also, the authors used walk *percentage*, and it takes fewer walks to increase walk percentage with fewer AB. So, that will also work to absorb some of the "10 times" difference.

But there’s no guarantee all that stuff evens out … in fact, it would be an incredible coincidence if it did.

So that means that the coefficient of walks now means something other than what you think it means. And, so, when you have the coefficient of a walk jumping between seasons … you can’t be sure it’s really measuring the actual salary assigned to the walk. It could be just a difference in the distribution of plate appearances, or one of a thousand other things.

Again, I would argue that this flaw -- on its own -- is enough to have us reject the conclusions of the study. When you try to fit a linear model to a non-linear relationship -- or vice versa -- all bets are off. The results can be very unreliable. I bet I could create an artificial example where walks would appear to be worth almost any reasonable-sounding value you could name.

---------

These two objections are nice in theory, but I bet they won’t convince many people who already believe the study’s conclusions are correct. My arguments sound too conjectural, too nitpicky. There, you have a real study with hard numbers and confidence intervals, and, here, you just have a bunch of words about why it shouldn't work.

So, next post, I’ll get into the numbers. Instead of arguing about why my coworker's sarcasm shouldn't be used as evidence, I'll try to actually show you his driveway.

UPDATE: Here's Part II.

Labels: baseball, economics, Moneyball

Tuesday, February 14, 2012

Two new "Moneyball"-type possibilities

I'm usually doubtful that significant "Moneyball"-type inefficiencies still exist in sports. But, recently, two possibilities came up that got me wondering.

First, in a discussion about baseball player aging, commenter Guy suggested that there are lots of good young players kept in the minors when they're good enough to be playing full-time in the majors. He mentions Wade Boggs, whom the Red Sox held back in the early 80s in favor of Carney Lansford.

It's certainly a possibility, especially when you consider the Jeremy Lin story. Of course, baseball and hockey are different from basketball and football, because they have minor leagues in which players get to show their stuff. But, still.

Second, and even bigger, is something Gabriel Desjardins discovered.

For the past several seasons, the NHL has been keeping track of the player who draws a penalty -- that is, the victim who was fouled. Desjardins grabbed the information and tallied the numbers.

Most of the players near the top of the list are who you would expect -- Crosby, Ovechkin, and so on. But the runaway leader is Dustin Brown, of the Los Angeles Kings.

Over the past seven seasons, Brown drew 380 opposition penalties. Ovechkin was second, with 255; Ryan Smyth was twentieth, at 181.

That means the difference between first and second place was almost twice the difference between second and twentieth place. Dustin Brown is exceptionally good at getting his team a power play.

Desjardins writes,

"Incidentally, 380 non-coincidental penalties is worth roughly $33M in 2012 dollars relative to the league average, and quite a bit more relative to replacement level. ... Dustin Brown has made roughly $15M so far in his career, making him one of the biggest deals in the entire league."

Wow. If you had tried to convince me that you could find an official NHL stat that would uncover $33 million worth of hidden value, I wouldn't have believed you. But there it is.

Labels: aging, baseball, Moneyball, NHL

Friday, December 09, 2011

A "Grantland" article on Moneyball effects

Here's a baseball salary article at Grantland, by economists Tyler Cowen and Kevin Grier. It’s a strange one ... the impression I get is that the authors are just going on the basics of the "Moneyball" story, but don’t really follow baseball discussions very much. And so some of their arguments are obviously behind the curve.

For instance, they talk about how closers used to be paid inefficiently, but aren't any more, except by free-spending teams like New York:

"This year, the Yankees' Mariano Rivera was ranked fifth in total saves with 44. At a salary of $14.9 million, that works out to be a hefty $338,600 per save. The four closers ranked ahead of him averaged 46.5 saves and a salary of $2.9 million, or $63,771 per save — quite the bargain."

The problem here is obvious to almost any serious baseball fan: closers aren’t normally evaluated by the number of saves, which is mostly a function of the opportunities the team provides. Rather, and like any other member of the roster, the closer is paid according to how many wins he can contribute to the team's record, as compared to a replacement player. For Rivera to be worth $15 million, he has to contribute about three extra wins (at a going rate of $4.5 million per win). Which means, basically, he has to blow three fewer saves, given his opportunities. Or, rather, he has to be *expected* to blow three fewer saves; there's still a lot of randomness there.
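The two ways of pricing Rivera are easy to put side by side. This is only a sketch using the numbers quoted above; the variable names are mine.

```python
# The article's approach: dollars per save.
rivera_salary = 14_900_000
rivera_saves = 44
cost_per_save = rivera_salary / rivera_saves
print(f"Cost per save: ${cost_per_save:,.0f}")

# The wins-based approach: extra wins needed to justify roughly $15 million.
rounded_salary = 15_000_000
dollars_per_win = 4_500_000
wins_needed = rounded_salary / dollars_per_win
print(f"Wins above replacement needed: {wins_needed:.1f}")
```

The first number reproduces the article's figure. The second is the one that matters for deciding whether the contract makes sense.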

But Cowen and Grier don't mention randomness at all. And their only reference to blown saves is in one sentence that mentions the Twins' Joe Nathan and Matt Capps, who blew 12 saves out of 41 opportunities.

Another thing, too, is that the article doesn't mention one big difference between Rivera and the others: Rivera is a free agent, while young players like Neftali Feliz can be paid whatever the team wants. The Yankees might prefer Feliz to Rivera, but that’s not a choice they have open to them.

It's not a new "Moneyball" discovery that "slaves" make less money than established free-agent stars ... but the article seems to imply that teams don’t realize that the $400,000 stopper can be just as valuable, for the money, as the $15,000,000 stopper.

To me, it looks like the problem is that if you don’t know baseball that well, you tend to overrate the “Moneyball” possibilities, because that’s the story that you’ve heard the most.

-----

The authors then go on to say:

"The best-known Moneyball theory was that on-base percentage was an undervalued asset and sluggers were overvalued. At the time, protagonist Billy Beane was correct. Jahn Hakes and Skip Sauer showed this in a very good economics paper. From 1999 to 2003, on-base percentage was a significant predictor of wins, but not a very significant predictor of individual player salaries. That means players who draw a lot of walks were really cheap on the market, just as the movie narrates."

The authors imply that “walks were really cheap on the market,” means that the A’s had a huge hole to exploit.

But ... even if walks were indeed “really cheap,” it would still be a small hole. Walks are a significant part of a player’s value, but still in the sense of a small edge, not a huge one. Suppose teams valued walks at only half their actual value. If you can pick up a player with 60 walks, for the price of 30, you’ll gain about 10 runs, or one win. Not a big deal.

Of course, if you can do that nine times, that’s nine free wins. But the A’s didn’t. In 2002, they walked 609 times, third in the league. But that was only 157 more walks than Baltimore, second-worst in the league. If 157 was the number of walks they got at half-price, that’s still only two or three wins.

You could choose, instead, to compare the A’s to the 2002 Tigers, who walked only 363 times. That would be completely unrealistic, in my view, to assume the A’s would have been as bad as one of the worst recent teams ever. But if you do, you *still* only gain four wins.
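Here's the walk arithmetic from the last few paragraphs as a sketch. It assumes a walk is worth about a third of a run and that 10 runs make a win, which are the rough values I've been using; the function name and the half-price default are mine.

```python
RUNS_PER_WALK = 1 / 3
RUNS_PER_WIN = 10


def free_wins(walks_bought, price_fraction=0.5):
    """Wins gained for nothing when walks_bought walks are paid for at price_fraction of their value."""
    unpaid_walks = walks_bought * (1 - price_fraction)
    return unpaid_walks * RUNS_PER_WALK / RUNS_PER_WIN


athletics_walks = 609
orioles_walks = athletics_walks - 157   # Baltimore, second-worst in the league
tigers_walks = 363

print(f"One 60-walk player:  {free_wins(60):.1f} wins")
print(f"A's vs. Orioles gap: {free_wins(athletics_walks - orioles_walks):.1f} wins")
print(f"A's vs. Tigers gap:  {free_wins(athletics_walks - tigers_walks):.1f} wins")
```

Even the Tigers comparison, which I think is unrealistically generous, only gets to about four wins.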

----

The authors also put too much faith in the Hakes/Sauer paper. As I wrote a few years ago, it seems to me that the paper has a few problems, and I don’t think it shows what it purports to show.

The study found a huge increase in the correlation between salary and OBP between 2003 (when the "Moneyball" book was released) and 2004. The numbers for 2004 almost exactly matched the actual value of a walk, so the authors concluded that the market became efficient in the off-season, and teams wised up after reading the book.

But that conclusion doesn’t make sense. Since only a small percentage of players got new contracts between 2003 and 2004, for the overall average to move so much, the market would have had to overcompensate for walks by double, or triple their real value! That doesn’t sound like a reasonable possibility, and it’s certainly not consistent with GMs now learning to be efficient.

-----

Finally, on the subject of correlation:

"Here's something funny about the Moneyball strategy: It is bringing us a world where payroll matters more and more. Spotting undervalued players boosts their salaries and makes money more important for the general manager; little did Billy Beane know that in the long run he would be strengthening the hand of the large home-market teams, such as the Yankees. From 1986 to 1993, payroll explained 2.2 percent of the variation in team winning percentage, and that meant spending more money yielded little return in terms of quality on the field. In the 2004 to 2006 seasons, after the Moneyball revolution was under way, payroll explained 27.1 percent of the variation in team winning percentage, which means a stronger reason to spend more."

I've written about this before, and Tango’s written about it several times: a higher r-squared does NOT necessarily mean money is more important in buying wins. Rather, the r-squared is a combination of:

1. the extent to which money can actually buy wins;
2. the extent to which teams differ in spending, in real life.

When the authors say, "spending more money yielded little return," they seem to be assuming it’s all the first thing, when it might be all the second thing.

As an example, take dueling, where two people go out at dawn, draw weapons, and one of them kills the other. Back when it was legal, dueling would explain a lot of the variation in death rates of people who didn’t like each other. Now that it’s illegal, it explains zero.

However, the fact that the r-squared dropped doesn’t mean that dueling is any less dangerous than it used to be (point 1) -- it just means that people no longer vary in how often they get killed in duels (point 2).
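The two points can be separated with a small calculation. This is a sketch under a deliberately simple model (wins = slope × payroll + luck, with payroll and luck independent); the slope, the luck spread, and both payroll spreads are invented numbers chosen only to make the effect visible.

```python
def r_squared(slope, payroll_sd, luck_sd):
    """Share of variance in wins explained by payroll in a simple linear model.

    Model: wins = slope * payroll + luck, with payroll and luck independent.
    """
    explained = (slope * payroll_sd) ** 2
    return explained / (explained + luck_sd ** 2)


# Hypothetical units: payroll in $ millions, wins per season.
SLOPE = 0.1      # assumed: each extra $1 million buys 0.1 wins in BOTH eras
LUCK_SD = 10.0   # assumed: spread in wins that is unrelated to payroll

narrow = r_squared(SLOPE, payroll_sd=15.0, luck_sd=LUCK_SD)  # teams spend alike
wide = r_squared(SLOPE, payroll_sd=60.0, luck_sd=LUCK_SD)    # teams spend very differently
print(round(narrow, 3), round(wide, 3))  # 0.022 0.265
```

The price of a win (point 1) is identical in both cases. Only the spread in spending (point 2) changed, and that alone moves the r-squared by a factor of twelve.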

The same thing could be happening here. I did a Google search and found an article (.pdf) that gives some team payroll data for the period the article covers. Its Table 1 shows that from 1985 to 1990, fourth-quartile teams (the 25% of teams with the highest payrolls) outspent first-quartile teams by only about 2 to 1. From 1998 to 2002, the ratio jumped to 3 to 1. The paper covers only the years up to 2002, but a glance at later numbers seems to show around 2.5 to 1 (but up to 3.1 to 1 for the 2011 season).

This is evidence that at least *some* of the difference is probably caused by teams being willing to spend more.

I may be unfair to the authors here ... that might be partly what they’re saying. If I read them right, they’re saying that, armed with "Moneyball" concepts, teams are realizing they can buy wins cheaper by evaluating players more accurately (1) -- and, that teams are therefore more likely to vary in how much they pay when they know it’s money well spent (2).

But ... well, I think these effects are pretty small. As I argued, walks are a small part of the overall equation, even if they were undervalued by half (which itself is probably an overestimate). It’s not like, in 1990, teams were paying Jose Oquendo as much as Wade Boggs. To be sure, teams weren’t perfect in evaluating players -- but they were still reasonably good. Any improvement since then has to be relatively small, at the margins.

So, the idea that teams would say, "hey, we can now evaluate players slightly more accurately, so let’s go on a spending spree" doesn’t seem all that plausible.

------

What actually *did* happen to tighten the relationship between payroll and wins? As usual, you guys probably know better than I do. I’ll give you my guess anyway, which is that it’s a combination of a bunch of things:

1. It became more "socially acceptable" for teams to pay big money to free agents. Remember, 1985 to 1990 includes the collusion year, and there was probably a significant amount of pressure to keep spending down. That pressure was probably more significant in discouraging headline-grabbing salaries, rather than routine signings, so maybe a player who was twice as valuable wouldn’t be able to sign for twice as much. That would help keep the correlation between salary and success low.

2. When baseball revenues exploded, they grew more in some cities than others. That meant that marginal wins would be extremely valuable to the Yankees, but not so much to the Pirates. That increased the variation in team spending, which pushed up the r-squared.

3. Teams got smarter, in line with Cowen and Grier’s theory. But I think that was a small part of what happened. Also, I’d guess that a lot of improvement in that regard would have happened well before Moneyball, as Bill James’ discoveries got around a bit. Conventional wisdom denies that baseball executives put any faith in what Bill James had to say, but ... I dunno, good ideas tend to get noticed, even if people say they don’t believe in them. Also, Bill James’ ideas showed up early in arbitration hearings, which affected the teams’ bottom lines pretty much immediately.

4. Randomness. In a team payroll to wins regression, Cowen and Grier give an r-squared of .022 for 1986 to 1993.

(By the way, I assume Cowen and Grier's regression adjusted for payroll inflation ... salaries more than doubled between 1986 and 1993. If they didn't adjust, that might explain the low correlation.)

I wonder if that .022 might just be an outlier. Here are equivalent numbers from Berri/Schmidt/Brook in "The Wages of Wins," page 40:

Wages of Wins:

1988 to 1994: r-squared = .062, r = .25
1995 to 1999: r-squared = .325, r = .57
2000 to 2005: r-squared = .176, r = .42

Cowen/Grier:

1986 to 1993: r-squared = .022, r = .15

The numbers sure do move around a lot! It probably doesn’t take much to knock the correlation down: you need a few teams to get lucky in exceeding their talent, and a few teams to get lucky and get some good slaves and arbs (cheap players whose salaries are set by the team or by arbitration rather than by free agency). Maybe I’ll try a simulation and see how common a .022 might actually be.
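A simulation along those lines might look like the sketch below. Everything in it is an assumption: a "true" r-squared of .10, 26 teams over 8 seasons treated as independent observations, and normally distributed payroll and luck. It is not a reconstruction of the Cowen/Grier regression, and the function names are made up.

```python
import random


def sample_r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def share_at_or_below(threshold, true_r2, n_obs, n_sims, seed=0):
    """Fraction of simulated samples whose r-squared is <= threshold."""
    rng = random.Random(seed)
    signal, noise = true_r2 ** 0.5, (1 - true_r2) ** 0.5
    hits = 0
    for _ in range(n_sims):
        # Standardized payroll, and wins built to have the chosen true r-squared.
        payroll = [rng.gauss(0, 1) for _ in range(n_obs)]
        wins = [signal * p + noise * rng.gauss(0, 1) for p in payroll]
        if sample_r_squared(payroll, wins) <= threshold:
            hits += 1
    return hits / n_sims


# Assumed setup: 26 teams x 8 seasons, true r-squared of .10.
print(share_at_or_below(0.022, true_r2=0.10, n_obs=26 * 8, n_sims=1000))
```

Team-seasons are not really independent (the same clubs stay rich or poor for years), so the effective sample is smaller than 208 and a low figure is likely more common in real life than this version suggests.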

Labels: baseball, economics, Moneyball, payroll, regression

Wednesday, October 19, 2011

How much does "Moneyball" help a team?

How much is sabermetrics worth to a team?

That's probably a hard question to answer. Every team uses statistics to some extent. Even before sabermetrics, teams were looking at player statistics to decide who to play and who not to play. They may not have had any fancy formulas, but they had a pretty good idea of how to weight the relative contributions of players. Nobody ever released a 30-HR guy because he was only hitting .240, and nobody ever released a .330 hitter because he had no power. Intuitive evaluations weren't perfect, of course, but they were pretty reasonable most of the time.

Where sabermetrics helps, I think, is not in evaluating actual performance, but in helping figure out *future* performance. How to extrapolate minor-league performance into major-league performance ... how to take luck out of a player's batting or pitching line ... figuring how different kinds of players age ... that sort of thing.

Suppose you took a team management right out of the early 1970s, and gave them a team today without letting them learn anything discovered after 1977. How much would that team underperform compared to the rest of MLB? I don't have an answer to the question, but I'd be interested in hearing yours.

Anyway, here's a narrower question. How much can a more sabermetric approach *today* benefit a team, compared to, say, the typical team's sabermetric approach? For instance, how much did Billy Beane really mean to the A's?

A couple of weeks ago, Tango did a study to figure out which teams did better or worse than expected, given their payroll. The A's were the team that outperformed the most over the last decade -- about 7 games per season, it looks like. That's a lot, but there's probably a whole bunch of luck there, since we're cherry-picking them as the best of the lot. Also, it's possible that much of their outperformance came in the early years, when, as many critics of "Moneyball" hype have pointed out, they had three underpriced ace starters.

So, we'd have to regress that 7 games to the mean a fair bit. If you made me make an arbitrary guess, I'd be willing to bet that less than half of that seven game advantage came from sabermetrics. (But, I have no real basis for that guess without studying it.)

Anyway, with the Cubs signing Theo Epstein, we now have a market estimate for what sabermetrics might be worth today. Epstein's new agreement is for about $4 million per season. He still had one year to go on his contract with the Red Sox, for which they will receive some sort of compensation from the Cubs. Let's say that compensation will be worth $1 million. So Epstein's value is around $5 million. I don't know how much an ~~average~~ replacement level GM makes, by comparison. To be conservative, let's say it's $500,000, although it's probably more than that. That means that Epstein's excess value is $4.5 million, exactly what it costs in free agent players to gain one extra win.

It looks like that's what Epstein is worth: one win per season.
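Spelled out, the back-of-envelope calculation is just the following. Every input is one of the rough guesses from the paragraph above, not a verified figure.

```python
# All dollar figures are the post's rough guesses, in $ millions per season.
epstein_salary = 4.0        # reported size of the new agreement with the Cubs
compensation_to_sox = 1.0   # assumed value of what Boston receives
replacement_gm = 0.5        # assumed pay of a replacement-level GM
cost_per_win = 4.5          # assumed free-agent price of one extra win

excess_value = epstein_salary + compensation_to_sox - replacement_gm
wins_per_season = excess_value / cost_per_win
print(excess_value, wins_per_season)  # 4.5 1.0
```

Changing any one guess by a million dollars moves the answer by only about a fifth of a win, so the "roughly one win" conclusion is not very sensitive to the exact inputs.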

Is that a lot? Frankly, I don't know. It's a competitive market for players these days, with lots of money on the line, and there's lots of random luck in who makes it and who doesn't. In that light, it could be that one win per season is an exceptional, genius-level performance.

If that's the case, doesn't it mean that the "Moneyball" approach is overrated? I mean, one win a year. At that rate, it would take decades, even centuries, to have good statistical evidence that the sabermetric approach works.

Of course, you have to remember that that's compared to other teams ... and, nowadays, those other teams are doing a fair amount of statistical work themselves. Maybe it's three or four games over a team that won't look at anything new at all, that never heard of Voros McCracken and winds up overpaying pitchers with lucky BABIPs. And, maybe Epstein took less pay than he was worth in order to become a Cub. Maybe it's a win and a quarter, or a win and a half.

Still ... to me, one game doesn't seem that unreasonable. The point might not be that you can win pennants just by embracing sabermetrics. The point might be that, with every team in a sabermetric arms race against every other team, you certainly can *lose* pennants if you persist in living in the 70s.

But, again ... one game. Doesn't that mean that if a team does well, and someone credits "Moneyball," they're probably just blowing smoke?

UPDATES:

1. In the comments, Bill Waite suggests that sabermetrically-savvy managers might have a significant impact, too. He says that just rejigging the lineup is worth almost half a game a season, and says that the difference between best and worst could be as much as eight games.

Food for thought. It would be interesting to consider how to try to look for this in the historical record (if indeed that is possible), since we know that some managers are indeed more numbers-oriented than others.

2. Matt Swartz e-mailed me about a study where he found a positive correlation between sabermetric management and team performance. It's here.

Labels: baseball, Moneyball, sabermetrics

Sunday, February 15, 2009

Shane Battier as the NBA's answer to "Moneyball"

I'm not completely sure what to make of this long Michael Lewis article extolling the Houston Rockets and their forward Shane Battier.

Lewis's treatment of Battier reminds me a lot of his treatment of Billy Beane in Moneyball. The idea is that Battier excels at things that aren't counted in the box score, and that makes him affordable and underrated.

How so? Mostly on defense. Battier is said to cover the NBA's superstars exceptionally well. The evidence is mainly hearsay – the Rockets argue that they have a version of a "plus/minus" stat, the kind that figures out how the Rockets do when Battier is on the floor, and compares it to how the Rockets do when he's on the bench.

The stat is not a new one, and from what I've read, there are obvious problems with it, problems that Lewis acknowledges. Specifically, how the team does when a player is on depends on who he's playing with. You can control for that, but then you might wind up with insufficient data. For instance, if player A plays with B 90% of the time, then you wind up with only maybe three minutes per game when A plays without B. That makes the comparison difficult, because, first, you only have three minutes, and, second, you also have to take into account the quality of player C, who replaced player B.

There's nothing in the article on how that problem was solved, except this:

"[Rockets GM Daryl] Morey says that he and his staff can adjust for these potential distortions — though he is coy about how they do it — and render plus-minus a useful measure of a player’s effect on a basketball game."

Morey says that over his career, Battier is a +6, which means that, per game, when he's on the court, his team will score six more points than the opposition. I'm not sure if that's per 48 minutes, or per 33 minutes (Battier's average).

In any case, if you figure that Battier alone is worth 6 points per game, then, over an 82-game season, that's 492 points. At 30 points per win, which is David Berri's estimate in "The Wages of Wins," you get 16.4 wins. (Morey says the effect is larger, that +6 "is the difference between 41 wins and 60 wins." That works out to 26 points per win.)
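Here is that conversion as a quick sketch. The only number not taken from the text is the 82-game regular season.

```python
GAMES_PER_SEASON = 82          # NBA regular season
plus_minus_per_game = 6        # Morey's career figure for Battier

season_points = plus_minus_per_game * GAMES_PER_SEASON
print(season_points)                                # 492

berri_points_per_win = 30      # estimate attributed to "The Wages of Wins"
print(season_points / berri_points_per_win)         # 16.4 wins

morey_extra_wins = 60 - 41     # "the difference between 41 wins and 60 wins"
print(round(season_points / morey_extra_wins, 1))   # 25.9 points per win
```

Either way, the claim amounts to one non-scorer being worth somewhere between 16 and 19 wins a season, which is why it seems worth asking for more evidence.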

How does Battier do it? According to Lewis, the Rockets have figured out players' strengths and weaknesses, and Battier tries to defend in such a way that the opposition is forced to do things they're weak at. For Kobe Bryant:

"When he drives to the basket, he is exactly as likely to go to his left as to his right, but when he goes to his left, he is less effective. When he shoots directly after receiving a pass, he is more efficient than when he shoots after dribbling. He’s deadly if he gets into the lane and also if he gets to the baseline; between the two, less so."

So what happens is that Shane Battier gets all this data before the game – he's the only player the Rockets give it to – and he tries to force Kobe into going to his left instead of his right.

"The ideal outcome, from the Rockets’ statistical point of view, is for Bryant to dribble left and pull up for an 18-foot jump shot; force that to happen often enough and you have to be satisfied with your night. “If he has 40 points on 40 shots, I can live with that,” Battier says. “My job is not to keep him from scoring points but to make him as inefficient as possible.” The court doesn’t have little squares all over it to tell him what percentage Bryant is likely to shoot from any given spot, but it might as well."

The effect, according to the article, is that when Battier guards Kobe Bryant, he does it so well that Kobe is rendered a below-average player.

Battier is also said to be "abnormally unselfish," and exceptionally good at playing the intangibles. "Instead of grabbing uncertainly for a rebound ... Battier would tip the ball more certainly to a teammate." "Guarding a lesser rebounder, Battier would, when the ball was in the air, leave his own man and block out the other team’s best rebounder." "He blocked the ball when Bryant was taking it from his waist to his chin, for instance, rather than when it was far higher and Bryant was in the act of shooting." "His whole thing is to stay in front of guys and try to block the player’s vision when he shoots."

Anyway, as I said, I'm a bit skeptical, still. I accept that Battier must be exceptionally good at defense, since (a) he plays 33 minutes a game and doesn't have very much in the way of traditional offensive statistics; (b) the Rockets have watched him and studied him and think he's great; and (c) his teams have done well. Still, from a scientific standpoint, the article is mostly anecdote and hearsay.

It shouldn't be all that hard to confirm the article's thesis and measure the size of the effect. If Kobe is good from one place but worse from another, that can be figured out by watching games and counting. If Battier holds him to those low-percentage shots when covering him, that can be counted too. And at the most fundamental level, can't you see what Kobe (and the other players) do when covered by Battier, and compare to what they do against the Rockets when Battier's on the bench? Something is better than nothing.

It's not really that I don't believe the Rockets. It's just that +6 points a game -- when it's acknowledged that Battier isn't all that great on offense – seems pretty high to me, and my instinct is to ask for more evidence.

Oh, and one more question for readers who actually know something about basketball (which I really don't). Assuming that everything in the article is correct, how much of Battier's value is due to his athletic skill? That is, suppose you took a league-average player and trained him to try to handle Kobe Bryant the same way that Battier does. Could he do it almost as well, or are Battier's instincts so good that he's exceptional in this regard?

(Hat tip: The Sports Economist)

Labels: basketball, Moneyball, NBA

Thursday, December 13, 2007

Stats vs. scouting: a thought experiment

I was thinking about the Moneyball debate, about traditional scouting vs. statistical analysis. Here's a thought experiment I came up with.

Suppose you take the 25 best scouts today, and put them in suspended animation for 40 years. Then, you wake them up. You ask them to evaluate the major-league first basemen of 2047. Of course, none of the scouts know anything about the players, who weren't even born when the scouts went to sleep in 2007.

The scouts get to watch the players hit. You don't want them to evaluate the players by keeping track of their stats, so you make sure all the stats work out the same. To do that, you show them only 300 PA for every player. You pick those plate appearances by making sure to include exactly 80 hits, 10 home runs, 14 doubles, and so on. (The exact PA in each category are picked randomly.) The scouts can watch those plate appearances as many times as they want. The technology of 2047 lets them see everything holographically, in 3D, from any angle. They can even use radar guns if they like. (Indeed, since this is a thought experiment, assume any additional technology you want.)

You then ask the scouts to rank the 30 players by how well they'll do next year. Would they do a decent job?

I'm probably less qualified than most readers of this post to guess at this question, but I'll try anyway. I'd bet that the scouts wouldn't do very well. I'd bet that an Albert Pujols single doesn't look that much different from a Kevin Youkilis single. However, I think the scouts might figure out who has power by looking at home run distances, and who walks a lot by noting plate discipline and the ability to lay off pitches. They'd also see who has good speed.

Now suppose you froze 25 sabermetricians. To this group, instead of showing them holographic replays of plate appearances, you were to show them only the players' stats. Would they do better than the scouts? I think it's almost certain they would. The sabermetricians would have the stats for the player's whole career in front of them. The traditional scouts wouldn't have that. They might know a few small things the stats group doesn't – plate discipline, for instance – but unless they counted, their impressions would be off a bit, over 300 PA times 30 players. But the sabermetricians would know a LOT more than the scouts -- batting average, home runs, walk

And suppose that you *included* statistics for all these things for the sabermetricians – speed, pitch counts, home run distances, line drive frequency, average pitch speed against, and so on. In fact, let the sabermetricians have any stats they want (within reason). Would there be anything left for the scouts? Only things that can't be measured. What are those things? Subjective impressions of personality and drive to win? Leadership? Certain aspects of body type? Are those really enough to measure up against all that data?

Doesn't it seem like a copy of the 2047 Baseball Prospectus and 2047 Baseball Forecaster should beat the crap out of a bunch of scouts who aren't allowed to count things?

Before this thought experiment, I felt like traditional scouting was of substantial value – although not as important as the statistical record. But now, it seems to me that hard data would trump live scouting in almost every case.

Here's an experiment you could do right now, to check that. Find your top 25 scouts right now, and ask them: you've seen a lot of current major-league players live this year. For which players have you seen live indications that suggest the player's prospects are better or worse than what his statistical record suggests? Maybe you've seen something like, "hey, Joe Blow normally hits .320, but he's weak on curve balls on the outside corner, and once pitchers catch on, he'll only hit .270." Or, maybe, "you know, these five guys have had stats very similar to those five guys. But these five have drive and leadership, and are going to make themselves into better players. Those other five just coast through the season, and they're going to be washed up before too long."

That is: ask scouts to make testable predictions that are based only on observations of things that can't be measured by sabermetricians.

Can any scouts reliably make successful predictions like that? If they can, that would be evidence that scouting is valuable, much more valuable than I think it is. If not, though, isn't that itself evidence that traditional scouting only has value because there isn't enough good data?

It seems to me that scouting is a *substitute* for data, and an inferior one. For those who think it's a *complement* to data, my view is that you have to show me where the benefit is.

P.S. As Tango points out, scouts sometimes add value by noticing trends that statistical analysts can verify. In that case, you can argue that they're really doing sabermetrics ...

Labels: baseball, Moneyball, scouting

Wednesday, October 10, 2007

An updated "Moneyball Effect" study

According to "Moneyball," walks were underrated by major league baseball teams. The Oakland A's recognized this, and were able to sign productive players cheaply by looking for undervalued hitters with high OBPs. This (among other strategies) allowed them to make the playoffs several years on a small-budget payroll.

If this is correct, then, once "Moneyball" was published and the A's thinking was made public, the OBP effect should have disappeared. Teams should have started fully valuing a player's walks, and the salaries of players excelling in that skill should have taken a jump.

About a year ago, I reviewed a study that claimed to find such a sudden salary increase. I wasn't convinced. Now, the same authors have updated their study, with better stats and more seasons worth of data. Again, they claim to find a large effect. And, again, I am not convinced.

The authors, Jahn K. Hakes and Raymond D. Sauer, took the years 1986-2006 and divided them into four time periods. They regressed salary against OBP and SLG. They found that the value of a point of SLG didn't change much over that time period, but OBP did. It increased gradually across the first three periods, but starting in 2004, after the release of "Moneyball," it took a huge jump.

They repeated the study using a measure of bases on balls, instead of OBP (the latter includes hits, which might have confounded the results). Again, they found a huge jump in remuneration for walks starting in 2004.

The numbers are striking, but I'm not sure they mean what the authors think they do. There are several reasons for this. In comments to a post at "The Sports Economist," Guy points out some of them. (The points are Guy's, but the commentary here is mine.)

First, the study grouped together all players, regardless of whether salary was determined by free agency, arbitration, or neither (players with little major league experience have their salaries set by the team; I'll call those "slaves"). In the regression the authors used, the hidden assumption is that for all three types, player salaries increase the same way. That is, if an additional 20 walks over 500 PA will increase a free agent's salary by 10%, it will also increase an arbitration award by 10%, and a team will even offer a slave 10% more.

That's not necessarily true. Suppose that free agent salaries rise because 20% of teams read "Moneyball." That's probably enough that almost 100% of high-OBP players have their salaries bid up. But if the same 20% of arbitrators read "Moneyball," what happens? Only 20% of salaries will increase. And, actually, it'll be less than that, because most of the teams won't be emphasizing walk rate at the hearing.

I'm sure you could come up with scenarios where the changes in compensation are due to changes in patterns between the three groups, rather than walks. For instance, suppose that slave salaries are increasing faster than free-agent salaries, and slave OBPs are increasing faster than free-agent OBPs. That could account for the observed effect. I'm not saying this is true, because I have no idea. But there are lots of hypothetical scenarios that could also account for what the study found.

Second, the authors used a very low cutoff for inclusion: only 130 plate appearances. They do include a multiplier for plate appearances, so that each PA is worth x% more dollars. However, as Guy points out, the study assumes that the performance of a part-time player is as accurate an indication of his talent as it would be for a full-time player. This can cause problems. Anything can happen in 150 PA; someone who OBPs .400 in that stretch is probably a mediocre player having a lucky year, not an unheralded star.

Also, salaries probably don't correlate all that well to plate appearances. Someone with 200 PA might be a regular who got injured, or he could be a pinch-hitter who had to play regularly for a month because someone else got injured. So the assumption that salary is proportional to PA adds a lot of noise to the data.

In addition, suppose that salaries for star players are increasing faster than for part-time players. That would make sense; there is a much larger pool of mediocre players than regulars, and the competition among the ordinary players keeps their cost down. When the Rangers decided to spend $252 million, they used it to buy Alex Rodriguez, not to give ten bench players $25 million each.

If that's the case, the observed effect could be an increasing difference in walks between regulars and bench players, rather than an increase in the market value of walks.

Suppose in 2002, full-time players OBP .400 and part-time players are also at .400. In 2004, full-timers are still at .400, but part-timers drop to .300. If that happened, that would certainly account for the observed jump in OBP value. The regression would notice that suddenly the spread between the .400 guys and the .300 guys was on the rise, and would attribute that to their walks instead of their full-time status.

Did this actually happen? I don't have data for 2004 handy, but, in 2003, the full-time guys (500+ AB) outwalked the replacement guys (160-499 AB) by .103 to .093 (BB per AB). That's a difference of .010. I ran the same calculation for a few other years (this is a complete list of the years I checked. It averages all players equally, regardless of AB, and includes pitchers):

2003: .103 - .093 = .010
2002: .105 - .098 = .007
2001: .107 - .091 = .016
1997: .107 - .102 = .005
1992: .101 - .096 = .005
1987: .103 - .102 = .001
1982: .099 - .091 = .008

So there is some evidence of a recent increase in the amount by which regulars outwalk non-regulars, which corresponds to the gradual increase in OBP value the authors found. I don't have data for the jump years 2004-2006, but maybe I'll visit Sean Lahman's site and download some.

Thirdly – and this is now my point, not Guy's – there is a lag between a player's performance and his salary being set. Most free agents would have negotiated their 2004 salary well before "Moneyball" was released. If you take that into account, the huge effect the authors found must be even huger in real life, having been created by only a fraction of the players!

This should actually make the authors' conclusions stronger, not weaker, except that if you find the size of the jump implausible, it's even more so when you take this effect into account.

----

Now, I'm not saying that there isn't a Moneyball Effect, just that this study doesn't measure it very well. How *can* you measure it? Here's a method. It's not perfect, but it'll probably give you a reasonably reliable answer.

First, find a suitable estimator of a player's expected performance in 2007. Bill James used to use a weighted (3-2-1) average of the player's last three years, which seems reasonable to me. You could regress that to the mean a bit, if you like. Or, you could use published predictions, like PECOTA, or Marcel.
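As a minimal sketch of that first step, here is a 3-2-1 weighted average with an optional pull toward the league mean. The player's numbers, the league mean of .330, and the regression weight of 0.2 are all hypothetical. A more careful version would weight each season by plate appearances as well.

```python
def weighted_321(most_recent, one_year_back, two_years_back):
    """Weighted average with weights 3, 2, 1 (the most recent season counts most)."""
    return (3 * most_recent + 2 * one_year_back + 1 * two_years_back) / 6


def regress_to_mean(estimate, league_mean, weight_on_mean):
    """Pull an estimate part of the way toward the league mean."""
    return (1 - weight_on_mean) * estimate + weight_on_mean * league_mean


# Hypothetical player: OBP of .380, .350, .340 in 2006, 2005, 2004.
raw = weighted_321(0.380, 0.350, 0.340)
print(round(raw, 3))  # 0.363

# Hypothetical league mean and regression weight.
print(round(regress_to_mean(raw, league_mean=0.330, weight_on_mean=0.2), 3))  # 0.357
```

The same two functions work for any rate stat (walk rate, slugging, and so on), which is all the later steps need as input.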

Now, take that estimate and figure the player's expected value to his team in 2007. Use any reasonable method: linear weights, VORP, extrapolated runs, whatever. Let's assume you use VORP.

Take all full-time players ("full-time" based on expectations for 2007, to control for selection bias) who signed a free-agent contract during the off-season. Run a regression to predict salary from expected VORP. Include variables to adjust for age, position, and so on, until you're happy.

Now, run the same regression, *but add a variable for BB rate*. That coefficient will give you the amount by which the market over- or undervalues walks. That is, suppose the regression says that salary is $2 million per win, less $10,000 per walk. That tells you that if you have two identical players, each of whom creates five wins above replacement, but where one walks 20 times more than the other, that one will earn $200,000 less. That would mean that walks are undervalued – you get less money if you have fewer hits and more walks, even if the walks exactly compensate for the hits.
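To show how that walk coefficient would be read, here is a toy version of the regression. It is only a sketch: the seven players are invented, their salaries are generated exactly from the hypothetical "$2 million per win, less $10,000 per walk" rule (so there is no noise), expected wins stand in for VORP, and the age and position controls are left out. The helpers `solve` and `ols` are written out by hand to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, ys):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Invented free agents: (expected wins above replacement, walks).
players = [(5, 80), (5, 60), (3, 40), (2, 70), (4, 30), (1, 50), (6, 90)]
# Salaries built from the hypothetical rule: $2M per win, less $10K per walk.
salaries = [2_000_000 * w - 10_000 * bb for w, bb in players]

rows = [(1.0, w, bb) for w, bb in players]   # intercept, wins, walks
intercept, per_win, per_walk = ols(rows, salaries)
print(round(per_win), round(per_walk))  # 2000000 -10000
```

The first two invented players are the "identical except for 20 walks" pair: both are worth five wins, and the one with 80 walks is paid $200,000 less than the one with 60. A negative walk coefficient is the signature of a market that undervalues walks; a coefficient near zero means walks are priced at what they are worth.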

Repeat this for all years from 1986 to 2006 (adjusting salaries for inflation). If the Moneyball Effect is real, you should find that the coefficient for walks is negative up to 2003, then rises to zero after 2004.

A fun side-effect is that you can include all kinds of variables, not just walks – batting average, home runs, and so forth – to see which skills are more or less valued through the years. For instance, you could check for a "Bill James" effect, to see if the perceived value of batting average drops through the 80s and 90s. You could include RBIs, to see if the market pays more for cleanup men than leadoff men. And so on.

Labels: baseball, economics, Moneyball
