Sabermetric Research: "Trading Bases" beats Vegas odds on baseball

"Trading Bases" is a new book on the theme of sabermetrics and money. The publisher was kind enough to send me a copy.

Author Joe Peta was a Wall St. trader. A couple of years back, a traffic accident seriously injured his leg, and he lost his job shortly thereafter. Unemployed and with limited mobility, he spent his time figuring out a method of betting on baseball. This book is primarily about that system and its results, although Peta also touches on finance, risk, and gambling in general.

You might expect that Peta's betting system worked -- otherwise, the book wouldn't exist. You'd be right. Over the 2011 season, where most of the story takes place, Peta made a profit of 41 percent on his original capital. In an addendum, Peta shows his results for 2012, where he made 14 percent.

------

What was the system? It was pretty simple, actually. Peta started with teams' 2010 stats. He adjusted for Pythagorean luck, and for Runs Created luck. He used PECOTA projections to adjust for player performance luck ... except for relievers, where he assumed each team would perform at league-average rates.

On game day, he would figure the winning percentage for each team, using his projections for the starting pitchers and projected lineups. He then used log5 to calculate the odds of each team winning. If they were substantially different from the Vegas odds, he'd place a bet.
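
For readers who want to see the two formulas named above, here is a minimal sketch. It assumes the classic Pythagorean exponent of 2 and the standard log5 form; Peta's exact inputs and adjustments aren't given here, and the run totals in the usage lines are made up for illustration.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and allowed.

    The exponent of 2 is the classic choice (an assumption here);
    other exponents are sometimes used.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def log5(p_a: float, p_b: float) -> float:
    """Chance that team A beats team B, given each team's winning
    percentage against average opposition."""
    a_wins = p_a * (1.0 - p_b)
    b_wins = p_b * (1.0 - p_a)
    return a_wins / (a_wins + b_wins)


# Hypothetical teams: one scores 750 and allows 700, the other the reverse.
team_a = pythagorean_win_pct(750, 700)
team_b = pythagorean_win_pct(700, 750)
print(round(team_a, 3), round(team_b, 3), round(log5(team_a, team_b), 3))
```

Against a .500 opponent, log5 simply returns the team's own winning percentage, which is a quick sanity check on the formula.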

Over the 2011 season, Peta bet on 2,095 games, winding up with a 1087-1008 record. That was enough for a gain of 28.81 percent on his original bankroll. Combining that with a winning record in the postseason, and over/under bets on season wins, gives a final figure of a 41 percent overall return, the number you read about in the blurbs and on the cover.

------

Peta describes the sabermetrics quite well. He explains Pythagoras, and DIPS, and why those are better estimators than actual outcomes. For what I call "Runs Created" luck, Peta uses the term "cluster luck", since it's caused by clustering of offensive events. He doesn't actually mention Runs Created, or any of the other estimators (perhaps because they seem to not work as well these days). Instead, he explains the logic behind them, how you can look at teams with similar batting lines and see how they have different numbers of runs scored. He also gives a good intuitive explanation of log5, again without mentioning the term.

If there is a fault, it's that the book sometimes seems a little too breathless in hyping the sabermetrics as revolutionary. For instance,

"Some trial-and-error fiddling with the [Pythagorean] formula reveals that for every ten runs ... [you] can expect to win one more game. ... This really is an amazing revelation, not just for statistically minded fans but for the front offices of major league teams as well."

Of course, that "revelation" is thirty years old, and I'd be shocked if there were any front offices for whom this would be news.

------

One thing that Peta didn't do in his projections, which surprised me, was regress performance to the mean. In one respect, he didn't have to, because, presumably, PECOTA takes care of that for him. But, then, he explicitly rejects the idea when talking about his postseason predictions:

"I wasn't going to adjust for players who had played above or below expectations during the regular season. I had to assume that the best indicator of a player's production over the next week was the body of work he had put forth over the previous 162 games."

And, he didn't do any regressing in his over/unders, either. When adjusting for RC luck and Pythagorean luck, he assumed that the resulting estimate was the team's true talent. The 2010 Astros, for instance, came out as a 66-96 team. Adjusting for talent luck, I might have bumped them up a bit; Peta did not. (In any case, he easily won his "under" bet, as the Astros collapsed to 56 wins.)

------

A big part of Peta's system was his money management -- deciding how much to bet on each game. The bigger his advantage, the more he'd bet. As a percentage of his capital, his bets went:

Bet 2.0% of capital with advantage 15 percent plus
Bet 1.5% of capital with advantage 13 to 15 percent
Bet 1.0% of capital with advantage 11 to 13 percent
Bet 0.5% of capital with advantage 9 to 11 percent
Bet 0.4% of capital with advantage 6 to 9 percent
Bet 0.2% of capital with advantage 3 to 6 percent
Bet 0.1% of capital with advantage 0 to 3 percent

The advantages are percentage points. For instance, one game Peta projected the Reds at a 57.6% chance of beating the Brewers. Vegas odds were only 51.2%, so Peta "deemed the Reds had a 6.4 percent better chance of winning than the implied odds".
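
The staking schedule can be written as a small lookup. This is only a sketch: the function name is made up, and since the listed ranges share their endpoints, it assumes an edge exactly on a boundary falls into the higher tier.

```python
def stake_fraction(edge_points: float) -> float:
    """Fraction of capital to bet for an edge given in percentage points.

    Tiers follow the schedule listed above. Boundary handling (a tie goes
    to the higher tier) is an assumption, as is betting nothing when the
    edge is zero or negative.
    """
    if edge_points <= 0:
        return 0.0
    tiers = [
        (15.0, 0.020),
        (13.0, 0.015),
        (11.0, 0.010),
        (9.0, 0.005),
        (6.0, 0.004),
        (3.0, 0.002),
        (0.0, 0.001),
    ]
    for lower_bound, fraction in tiers:
        if edge_points >= lower_bound:
            return fraction
    return 0.0


# The Reds example: projected 57.6% against an implied 51.2%.
edge = 57.6 - 51.2
print(stake_fraction(edge))
```

For the Reds game, the 6.4-point edge lands in the 6-to-9 tier, so the bet would be 0.4% of capital.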

Most of his wagers came with a smaller advantage, rather than a larger one. Here's his table of results (from page 296):

2.0% bets: 26- 13, cash return of +14.8 percent
1.5% bets: 20- 15, cash return of +4.0 percent
1.0% bets: 38- 34, cash return of +2.5 percent
0.5% bets: 59- 56, cash return of -4.1 percent
0.4% bets: 189-164, cash return of +10.2 percent
0.2% bets: 307-339, cash return of -12.1 percent
0.1% bets: 448-387, cash return of +5.6 percent
------------------------------------------------
Total 1087-1008, cash return of +28.8 percent

The great majority of bets were in the bottom three groups.

------

I was surprised that Peta was able to beat the bookies using a system that's really not that complicated. Are the oddsmakers so ill-informed that they can be so easily exploited? Does Vegas really leave that many 15 percent overlays, when thousands of people have read Bill James and follow sabermetrics?

So I did some calculating and simulating ... and, no, the oddsmakers aren't quite that bad.

Take a look at Peta's table. He placed 72 bets in the "1%" group. On those bets, Peta went 38-34. On "pick-em" bets of a dollar each, he'd earn $38 for the winners, and lose $35.70 for the losers (losses cost $1.05, which is the source of the casino's profit). That's a total win of $2.30. Since each bet is 1 percent, the total bankroll must be $100. So that's a return of 2.3 percent.

That comes close to Peta's report of 2.5 percent. Why the difference? Because mine is a simplified estimate. First, Peta bet a combination of favorites and underdogs, which would change the numbers slightly. Second, the bankroll varied throughout the season, so not all bets were for the same amount.
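
The simplified pick-em arithmetic is easy to reproduce. The sketch below assumes every bet is the same size, a win pays one unit, and a loss costs 1.05 units; the function name is made up.

```python
def pickem_return_pct(wins: int, losses: int, stake_pct: float,
                      loss_cost: float = 1.05) -> float:
    """Return, in percent of bankroll, on flat pick-em bets.

    stake_pct is the size of each bet as a percent of bankroll. A win
    earns one stake; a loss costs loss_cost stakes (1.05 by default,
    the simplification used above).
    """
    net_units = wins - losses * loss_cost
    return net_units * stake_pct


# The "1%" group: 38 wins, 34 losses, each bet 1 percent of bankroll.
print(round(pickem_return_pct(38, 34, 1.0), 2))
```

Passing a different `loss_cost` shows how sensitive the result is to the vig on a sample this small.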

Now, what "should" have happened? Well, on those bets, Peta thought he had an edge of 11 to 13 percentage points. Let's call it 12, which means that he should have won 62 percent of his bets instead of 50 percent. That is: he "should have" gone roughly 45-27. That would have been a return of over 15 percent, not 2.5 percent.

The difference might just be luck; after all, we're only talking about 72 games. But, if you repeat the calculations for the other categories, you find that, if Peta's assumptions were correct, he should have made WAY more money. Here are the "should haves", based on Peta's percentages (I used an 18 percent edge for the top group, and the midpoint of the range for all the others):

2.0% bets: 26- 13, cash return of +28 percent
1.5% bets: 22- 13, cash return of +15 percent
1.0% bets: 45- 27, cash return of +18 percent
0.5% bets: 69- 46, cash return of +12 percent
0.4% bets: 203-150, cash return of +21 percent
0.2% bets: 352-294, cash return of +12 percent
0.1% bets: 430-405, cash return of + 3 percent
-----------------------------------------------
Total 1147-948, cash return of +108 percent
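
A rough way to rebuild the "should have" records is to multiply each group's bet count (from the actual results table) by 50 percent plus the assumed edge. The sketch below does that. The `loss_cost` parameter is an assumption: at 1.0 (even money, no vig) the total comes out near 107 percent, close to the table, while 1.05 gives a lower figure. Which convention the table itself used isn't stated.

```python
# (stake as percent of bankroll, number of bets, assumed edge)
# Bet counts are wins plus losses from the actual results table;
# edges are 18 points for the top group and range midpoints otherwise.
GROUPS = [
    (2.0, 39, 0.18),
    (1.5, 35, 0.14),
    (1.0, 72, 0.12),
    (0.5, 115, 0.10),
    (0.4, 353, 0.075),
    (0.2, 646, 0.045),
    (0.1, 835, 0.015),
]


def expected_record(n_bets: int, edge: float) -> tuple[float, float]:
    """Expected wins and losses if the true win chance is 0.5 + edge."""
    wins = n_bets * (0.5 + edge)
    return wins, n_bets - wins


def expected_return_pct(stake_pct: float, n_bets: int, edge: float,
                        loss_cost: float = 1.0) -> float:
    """Expected return in percent of bankroll, flat stakes, pick-em bets."""
    wins, losses = expected_record(n_bets, edge)
    return stake_pct * (wins - losses * loss_cost)


for stake_pct, n_bets, edge in GROUPS:
    wins, losses = expected_record(n_bets, edge)
    ret = expected_return_pct(stake_pct, n_bets, edge)
    print(f"{stake_pct}% bets: {wins:.1f}-{losses:.1f}, return {ret:+.1f}%")

total = sum(expected_return_pct(s, n, e) for s, n, e in GROUPS)
print(f"Total return: {total:+.1f}%")
```

The expected records are fractional, so rounding can move a group by a game in either direction.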

The system should have made 108 percent, not 28! And, taking compounding into account, because the capital would be growing over the course of the season ... he probably should have wound up with almost triple his original bankroll. (Why triple? Because 100 percent interest, compounded "instantly", yields "e" times your original capital, or 271.8 percent. Here, we have 108 percent compounded daily, which, coincidentally, is very close.)
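
The compounding claim can be checked in a few lines. The figure of 180 betting days is an assumption for illustration, not a number from the book.

```python
import math


def growth_multiple(total_rate: float, periods: int) -> float:
    """Bankroll multiple when total_rate is split evenly over the given
    number of periods and compounded each period."""
    return (1.0 + total_rate / periods) ** periods


# 100 percent compounded over ever more periods approaches e.
print(growth_multiple(1.00, 1_000_000), math.e)

# 108 percent compounded over an assumed 180 betting days.
print(growth_multiple(1.08, 180))
```

With these assumptions the second figure comes out a little under 3, in line with "almost triple".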

28 percent is about one-quarter of 108 percent, which suggests that Peta overestimated his edge by a factor of about four!

I ran a simulation to confirm, and, yes, it seems to work out. If I move every edge three-quarters of the way back to even -- that is, regress 75 percent to the mean -- I get almost the same bottom line Peta did:

Peta actual: 1087-1008, cash return of 29 percent
Simulation: 1072-1023, cash return of 30 percent

Of course, there's still plenty of luck involved. In my simulation, the SD of the final return was around 27 percent. Peta was around 1 SD better than breaking even. But: breaking even shouldn't be the reference point. On Peta's bets, the casino had an advantage of around 1.67 percent. That eats away at your capital. If you were no better than an even guesser, my simulation says you'd wind up with around a negative 10 percent return, with an SD of 20 percent or so. In that light, Peta's +29 percent is about 2 SD above random.

And the simulation confirms that almost exactly ... when I ran 1,000 unskilled seasons, 25 of them came out above +28.8%.

I'm actually surprised that a random bettor can beat the house edge by this much, even 25 times out of 1000. I expected it to be much harder.
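
A simulation along these lines can be sketched as follows. This is not the code behind the numbers above, only a simplified stand-in: it assumes every bet is a pick-em where a loss costs 1.05 stakes, stakes are a fixed fraction of the starting bankroll (no compounding), and the `regress` parameter shrinks each claimed edge toward zero. Because the vig assumption differs from the 1.67 percent house edge quoted above, its averages will not match exactly.

```python
import random
import statistics

# (stake as fraction of starting bankroll, number of bets, claimed edge)
GROUPS = [
    (0.020, 39, 0.18),
    (0.015, 35, 0.14),
    (0.010, 72, 0.12),
    (0.005, 115, 0.10),
    (0.004, 353, 0.075),
    (0.002, 646, 0.045),
    (0.001, 835, 0.015),
]


def simulate_season(rng: random.Random, regress: float = 1.0,
                    loss_cost: float = 1.05) -> float:
    """Simulate one season; return profit as a fraction of bankroll.

    regress=1.0 removes the edge entirely (an unskilled bettor);
    regress=0.75 keeps a quarter of each claimed edge; 0.0 keeps it all.
    """
    profit = 0.0
    for stake, n_bets, edge in GROUPS:
        p_win = 0.5 + (1.0 - regress) * edge
        for _ in range(n_bets):
            if rng.random() < p_win:
                profit += stake
            else:
                profit -= stake * loss_cost
    return profit


def summarize(n_seasons: int, regress: float, seed: int = 0):
    """Mean and standard deviation of season profit over many seasons."""
    rng = random.Random(seed)
    results = [simulate_season(rng, regress) for _ in range(n_seasons)]
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    for label, regress in [("unskilled", 1.0), ("edge regressed 75%", 0.75)]:
        mean, sd = summarize(1000, regress)
        print(f"{label}: mean {mean:+.1%}, SD {sd:.1%}")
```

Counting how many unskilled seasons finish above +28.8 percent is then a one-line filter over the simulated results.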

------

So Joe Peta's edge was only about a quarter as large as he thought ... but it was still enough to give him a hefty profit, and finish 2 SD above random.

But ... was Peta just lucky? It's possible. But ... my gut thinks that the system was truly good enough to be profitable. Part of the reason is that Peta turned another profit the year after (albeit a smaller one, only 14 percent). The other part is ... well, the system seems reasonable. If bettors have biases in favor of certain teams or pitchers, or they overbet momentum when they shouldn't, or they have other irrationalities, there should be money to be made using a decent, disciplined system.

Still, I suspect there was *some* luck in Peta's results. If Peta does this again for 2013, my over/under for his profit would be somewhere in the single digits.

I could be wrong.

------

Update, 3/10: good comments at Tango's site ... in particular, check out MGL's #2 and #3.

4 comments:

Anonymous, Saturday, March 09, 2013 11:02:00 PM
If he used the log5 without adjustments he would overestimate the odds of good teams winning and underestimate the chances of weaker teams winning. The log5 formula is biased because it assumes the league average is 0.500 but if you are a 0.600 team your opponents are below 0.500.

j holz, Sunday, March 10, 2013 1:07:00 AM
I bet on baseball as a profession for five years and averaged a higher ROI than this, but I too made far less profit than I "should have" based on my calculations. It makes perfect sense to me that if my system captures something the market is missing, the converse would also be true.

Mike, Monday, March 11, 2013 9:43:00 AM
I assume this is also why my fantasy baseball team never performs up to my original expectations, and why my opponents' teams always do better than I thought they would!

Garcelle Beavuis, Thursday, March 14, 2013 2:13:00 PM
Nice insight on the book. More and more books and articles like this are exposing the average fan to some in depth and new ways of looking at the game. I'm a big fan of finding a way to use statistics to explain the game and make predictions with an accuracy that most people are amazed by. A book about the same subject that I recently read was Scorecasting, if you get the chance you should check it out, well worth the read.

Saturday, March 09, 2013