Tuesday, March 19, 2013

NFL coaching decisions cost 0.73 wins per team


By making bad decisions on fourth down, NFL coaches are sacrificing almost three-quarters of a win per season.  That's from Matt Meiselman, who crunched some numbers with Brian Burke and posted on Brian's site.  

In 2012, the Cleveland Browns were the "worst", sacrificing a probabilistic 1.02 wins by making 42 "wrong" decisions.  The Packers were the least "worst", giving up only around half an expected win.  I would have expected New England to represent well in this measure, since Bill Belichick has often been touted as a sabermetrically-savvy coach, but the Patriots were only a bit better than average, at 0.6.

Those numbers are based on expectations for an average team.  It's quite likely that they overstate the cost, if the probabilities vary a lot based on quality of team.  My suspicion is that the quality effect is pretty small, because the spread of "wrongness" is so narrow.  In fact, the spread suggests to me that coaches are generally following the same "book" of conventional wisdom, with individual differences being pretty minor.  

The article implies that the losses are due to coaches generally being risk-averse, but doesn't give the numbers.  Is *every* bad decision caused by playing it too safe?  95 percent?  50 percent?  I don't know the answer.  My gut says ... I dunno, I'll guess 92 percent of cases are when the coach should have gone for it and didn't, instead of when he shouldn't have and did.  Matt/Brian, if you're reading this, am I close?  

-------

I'm shocked at how high the numbers are.  Losing 0.73 wins is huge, considering that the difference between a playoff team and an average team is only, what, two games out of sixteen?  

I'd bet that's by far the biggest in-game coaching factor in any major sport (leaving out the decision of who plays).  In baseball, it's the equivalent of 4.6 games per 162, which is about the same percentage of distance to the playoffs.  But I can't see that MLB managers would have anything near that much influence.

-------

At the Sloan convention, there was a lot of talk about how analytics people can increase their influence ... like, what to do or say to get coaches and management to listen to us numbers geeks.  

But, in this case, I think there's an easier path.  Any time there's a fourth-down decision, the TV broadcast could put the probabilities on the screen.  Like, for instance, "teams that go for it should be expected to win 48% of the time, while teams that punt should win only 30% of the time."  That's simple enough for viewers to understand ... which means, fans will be second-guessing the coach based on the numbers, rather than random feelings.  It would still be fun to discuss ... the ESPN guys could argue about why the percentages don't apply in this particular case, because the offense is poor, or the defense has momentum, or whatever.

In any case, it would change the nature of the second-guessing.  Right now, a coach may attract 1 pound of criticism when he plays it by the book, and 5 pounds when he goes for it.  With the probabilities on the screen, maybe the 1:5 ratio will immediately change to 1:3 or something, and then, over time, as the stats gain acceptance, all the way to 1:1.  Then, you've reached the tipping point where the coaches' incentives change.  Now, they take more flak, and sacrifice more job security, when they *don't* go by the percentages.  It wouldn't take long, I suspect, for things to change after that.




Monday, March 18, 2013

Voice of Fire

(Warning: non-sports post.)


This painting is "Voice of Fire," by Barnett Newman.  It's around 18 feet tall and 7 feet wide.  

[Photo: "Voice of Fire" by Barnett Newman]

In 1990, the National Gallery of Canada purchased it for $1,800,000.  An uproar ensued.  That's because the painting is ... well, it's pretty simple.  It consists of three solid vertical stripes of equal width, blue/red/blue.  

Now, I'm not much into art, so I'm one of those "uproar" people.  I just don't get the idea of spending almost two million taxpayer dollars on this.  But ... well, maybe it's my own ignorance.  Maybe I just don't get art, the same way Joe Morgan doesn't get sabermetrics. 

So, I've put together some possible reasons that someone might think this painting is worth the cost.  I'm hoping you guys can help me out, maybe let me know if I've hit on something.


1.  It's beautiful

Maybe this is just a beautiful work of art, and I don't appreciate its beauty enough.  If that's the case, though ... why did it take until 1967 for someone to discover how well those vertical stripes go together?  Why didn't the fashion industry figure this out and put it on ties or blouses long before that?

And is there some subtlety that I don't get?  Does the blue have to be exactly the right shade, or it looks ugly and amateurish?  Would it not work in other colors completely? 

Even if that's the case, that it's so aesthetically striking ... why not put up a reproduction?  You wouldn't have to copy it exactly, just hire some guy with a canvas and a roller.  Seriously ... I've never heard anyone say that it's the detail, the brushstrokes, that make it so nice.  It really just seems to be stripes.  

I don't mean to suggest that it's ugly or anything ... I honestly like it.  I just don't see how it's so good that the public needs to be able to see it, or how it's worth $2 million.  It's not a question of "is it nice."  There's lots of nice out there.  It's a question of, "is it $2 million National Gallery nice?"  What is it that makes this a "major league" piece of art?


2.  It needs context

Bobby Fischer moves his rook.  Weeks later, after analyzing the game, critics realize what a great move it was -- it won the game for him.  But ... there's nothing special about the WAY he lifted the piece, or even the beauty of the board.  It's that the move was a brilliant answer to the question, "what's the best response to that other move?"

Or, think of a running gag.  "Don't call me Shirley."  It's funny in conversation, only in the right context, and only if the other person understands the reference.  (One of my favorite variations: "Frankly, my dear, I don't give a damn."  "Well, I don't give a damn either, and stop calling me Frankly."  Most people wouldn't get that at all.)

Maybe that's it.  Maybe "Voice of Fire" is a culmination of an abstract art conversation.  One guy does green splotches, and another guy paints yellow cubes, and the connoisseurs see it as a stunning follow-up, and then this guy says, "I have the perfect response!"  And he does 17-foot stripes, and the art people appreciate the subtle nuances of the riposte, because that's the way art works.  

As this article says,


" ... there’s a level of importance to the work, which has social value to many people ... it isn’t enough to simply look at a painting and evaluate it on the basis of how elaborate it is. In order to enjoy the work and see its value, you have to learn more about it."

Still: why not a reproduction?  And an explanation?


3.  Mood affiliation

Tyler Cowen writes about "mood affiliation."  That's when people adopt a certain feeling, position or attitude, and then reject anything not consistent with that mood.

For instance, just try to get any rabid Republican to say anything good about Barack Obama.  Try to get an environmentalist to say something bad about an Al Gore policy prescription.  You can't.  They just identify so strongly with one side, and it feels so good, that you can't help opposing anything on the other, or you'll break your emotional bubble.  

Maybe in this case ... well, some people like art, and they feel that government should fund art, and the public should see art, and art is subtle, and art is best appreciated by experts.  And, so, any view that challenges their good feeling about those beliefs, even if it's reasonable and not really that threatening, doesn't penetrate.  The idea that some paintings aren't worth it -- or that "simple" abstracts are less worthy than, say, portraits -- creates an "urgent feeling that the idea needs to be countered".  

They all love art, and none of their fellow art-lovers would even think of breaking the taboo by suggesting that the painting isn't worth $2 million.


4.  Collectibility

I own a few game-worn NHL jerseys.  They were expensive.  Looking at them, you can't tell them apart from the official jerseys you can buy at stores or online, the ones you can have custom-made by the same companies that make the real ones.  But ... these were worn by actual players, in actual, real NHL games!  That makes them worth substantially more.  There must be some invisible aura ... or maybe it's Hal Gill's actual, real-life sweat molecules that get me to reach deeper into my wallet.

Is that what's happening?  This is a real, original, Barnett Newman painting.  Who cares how good it is?  He painted it!  It's like the jersey Don Awrey wore in Game 1 of the Canada-Russia series.  Who cares if the sportswriters said he played very poorly that game?  It's an important artifact!

I actually have some sympathy for this view, as you may guess.  It would explain why a reproduction just wouldn't do.  (Well, it would suggest why the price is much higher for the original, anyway.)  And it would explain why galleries don't really seem to offer a lot of reproductions.  You'd think they'd want to show the best art, to the best of their ability ... but, maybe it's about having a nice collection of originals.  Nobody lines up to see a photocopy of the Honus Wagner card.


5.  Investment

Suppose the painting appreciates at the rate of inflation.  Then, the only cost is the forgone interest on the money used to buy it.  Suppose that's 5 percent a year, or $90,000.  

Is it possible that "Voice of Fire" attracts an extra $90,000 in revenue, through Gallery admissions or sales?  Recent admission and parking revenues were around $2 million for the year (.pdf, see page 31).  Most of that is for the special exhibitions, I would think.  It doesn't seem likely to me that 5 or 10 percent of patrons would stay away if "Voice of Fire" weren't there.
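The carrying-cost arithmetic, as a quick sketch.  The $1.8 million price, the 5 percent rate and the roughly $2 million revenue figure come from the paragraphs above; treating the rate as a flat annual opportunity cost is my simplifying assumption.

```python
PRICE = 1_800_000      # 1990 purchase price of "Voice of Fire"
RATE = 0.05            # assumed annual forgone interest (the post's "suppose")
REVENUE = 2_000_000    # approximate annual admission and parking revenue

carrying_cost = PRICE * RATE                  # interest given up each year
share_of_revenue = carrying_cost / REVENUE    # fraction of that revenue the painting must "earn"

print(f"${carrying_cost:,.0f} a year, {share_of_revenue:.1%} of admission and parking revenue")
```

In other words, the painting would have to be responsible for a bit under 5 percent of that revenue just to pay its own way.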


6.  Moral Rights

Maybe, legally, you can't exhibit a reproduction, so you have to buy the original.  Or, maybe you can, legally, but, morally, there's an unwritten rule in the art industry that it's wrong to reproduce the idea, even when it's so simple that it can't be copyrighted, out of respect to the artist.

But that only explains why you don't hire a guy with a roller.  It doesn't explain why you should think this canvas is worth $1.8 million, instead of, say, a few thousand.  


7.  BS

Finally, maybe the critics are right ... and it's all just bullsh*t.  It really IS just an 18-foot canvas with three stripes.

In support of this theory: 


'Shirley Thompson, who was the National Gallery’s director at the time, says that with no outrageous content to concentrate on, the clash over Voice kept circling back to how individuals responded to it. “You have to look at yourself,” says Thompson, who turns 80 next month and remains a presence in Ottawa art circles. “You have to look at your understanding of the metaphysical dimension of life.”'  (source)

"You have to look at your understanding of the metaphysical dimension of life."  Geez, I couldn't make up something BSier if I tried.  It's almost self-parody.

And, honestly, I hadn't seen that quote until I started writing this and started Googling.  (Also, I didn't notice her name was Shirley until I was almost finished editing.)

------

So, what is it?  I think it's all of these, in some proportion.  My hypothesis is that there's a certain amount of context involved, but it's still mostly mood affiliation, bullsh*t, and groupthink.  I hypothesize that, just like intelligent Republicans can convince each other that Obama wasn't born in the USA, intelligent art fans can get jobs at the Gallery and convince each other that three stripes is a good way to spend two million dollars.  Nobody who works in the artistic community can, or wants to, say the emperor has no clothes.  And those who do, being outside the art world, are seen as uncomprehending philistines to be dismissed.

Am I wrong?  I know there are art fans reading this.  If you disagree, tell me why this painting is a piece of important art that's worth $2 million.  I truly want to know, and I mean that in the sense that I really am keeping an open mind.










Saturday, March 09, 2013

"Trading Bases" beats Vegas odds on baseball


"Trading Bases" is a new book on the theme of sabermetrics and money.  The publisher was kind enough to send me a copy.

Author Joe Peta was a Wall St. trader.  A couple of years back, a traffic accident seriously injured his leg, and he lost his job shortly thereafter.  Unemployed and with limited mobility, he spent his time figuring out a method of betting on baseball.  This book is primarily about that system, and its results, although Peta also touches on finance, risk, and gambling in general.

You might expect that Peta's betting system worked -- otherwise, the book wouldn't exist.  You'd be right.  Over the 2011 season, where most of the story takes place, Peta made a profit of 41 percent on his original capital.  In an addendum, Peta shows his results for 2012, where he made 14 percent.

------

What was the system?  It was pretty simple, actually.  Peta started with teams' 2010 stats.  He adjusted for Pythagorean luck, and for Runs Created luck.  He used PECOTA projections to adjust for player performance luck ... except for relievers, where he assumed each team would perform at league-average rates.

On game day, he would figure the winning percentage for each team, using his projections for the starting pitchers and projected lineups.  He then used log5 to calculate the odds of each team winning.  If they were substantially different from the Vegas odds, he'd place a bet.
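log5 is short enough to write out.  A minimal sketch, assuming each team has already been boiled down to a single winning percentage for that day (starters and lineups folded in) and that the Vegas line has already been converted to an implied probability.  Home field and the vig are ignored here, and the `edge` helper is a hypothetical name, not Peta's.

```python
def log5(p_a: float, p_b: float) -> float:
    """Bill James's log5: chance that a team with true winning percentage p_a
    beats a team with true winning percentage p_b (neutral site assumed)."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def edge(model_prob: float, market_prob: float) -> float:
    """Edge in percentage points: model probability minus the market's implied probability."""
    return 100 * (model_prob - market_prob)


# A .550 team against a .450 team, with a hypothetical market price of 56 percent
p = log5(0.550, 0.450)
print(f"log5 says {p:.3f}; edge vs. market: {edge(p, 0.56):+.1f} points")
```

A handy sanity check: against a .500 opponent, any team comes out at exactly its own winning percentage.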

Over the 2011 season, Peta bet on 2,095 games, winding up with a 1087-1008 record.  That was enough for a gain of 28.81 percent on his original bankroll.  Combining that with a winning record in the postseason, and over/under bets on season wins, gives a final figure of a 41 percent overall return, the number you read about in the blurbs and on the cover.

------

Peta describes the sabermetrics quite well.  He explains Pythagoras, and DIPS, and why those are better estimators than actual outcomes.  For what I call "Runs Created" luck, Peta uses the term "cluster luck", since it's caused by clustering of offensive events.  He doesn't actually mention Runs Created, or any of the other estimators (perhaps because they seem to not work as well these days).  Instead, he explains the logic behind them, how you can look at teams with similar batting lines and see how they have different numbers of runs scored.  He also gives a good intuitive explanation of log5, again without mentioning the term.

If there is fault, it's that the book sometimes seems a little too breathless in hyping the sabermetrics as revolutionary.  For instance,


"Some trial-and-error fiddling with the [Pythagorean] formula reveals that for every ten runs ... [you] can expect to win one more game. ... This really is an amazing revelation, not just for statistically minded fans but for the front offices of major league teams as well."

Of course, that "revelation" is thirty years old, and I'd be shocked if there were any front offices for whom this would be news.
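The "ten runs per win" rule of thumb falls straight out of the Pythagorean formula.  A sketch, assuming a team that scores and allows 700 runs (roughly average; my choice) and using either the classic exponent of 2 or the commonly used 1.83:

```python
def pythag_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Bill James's Pythagorean expectation for winning percentage."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


def wins_from_extra_runs(runs: float, extra: float, games: int = 162,
                         exponent: float = 2.0) -> float:
    """Extra wins for a team that scores and allows `runs`, if it scores `extra` more."""
    base = pythag_win_pct(runs, runs, exponent)
    return games * (pythag_win_pct(runs + extra, runs, exponent) - base)


print(round(wins_from_extra_runs(700, 10), 2))                 # exponent 2
print(round(wins_from_extra_runs(700, 10, exponent=1.83), 2))  # exponent 1.83
```

Either way, ten extra runs buy a little over one win, or somewhere between about 8.7 and 9.5 runs per win.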

------

One thing that Peta didn't do in his projections, which surprised me, was regress performance to the mean.  In one respect, he didn't have to, because, presumably, PECOTA takes care of that for him.  But, then, he explicitly rejects the idea when talking about his postseason predictions:


"I wasn't going to adjust for players who had played above or below expectations during the regular season.  I had to assume that the best indicator of a player's production over the next week was the body of work he had put forth over the previous 162 games."

And, he didn't do any regressing in his over/unders, either.  When adjusting for RC luck and Pythagorean luck, he assumed that the resulting estimate was the team's true talent.  The 2010 Astros, for instance, came out as a 66-96 team.  Adjusting for talent luck, I might have bumped them up a bit; Peta did not.  (In any case, he easily won his "under" bet, as the Astros collapsed to 56 wins.)

-------

A big part of Peta's system was his money management -- deciding how much to bet on each game.  The bigger his advantage, the more he'd bet.  As a percentage of his capital, his bets went:

Bet 2.0% of capital with advantage 15 percent plus
Bet 1.5% of capital with advantage 13 to 15 percent
Bet 1.0% of capital with advantage 11 to 13 percent
Bet 0.5% of capital with advantage  9 to 11 percent
Bet 0.4% of capital with advantage  6 to  9 percent
Bet 0.2% of capital with advantage  3 to  6 percent
Bet 0.1% of capital with advantage  0 to  3 percent
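The ladder reads naturally as a lookup table.  A sketch: how Peta handled an edge sitting exactly on a boundary isn't stated, so sending it to the higher tier is my assumption, as is placing no bet on a zero or negative edge.

```python
# Peta's staking ladder: (minimum edge in percentage points, stake as % of capital)
STAKE_LADDER = [
    (15.0, 2.0),
    (13.0, 1.5),
    (11.0, 1.0),
    (9.0, 0.5),
    (6.0, 0.4),
    (3.0, 0.2),
    (0.0, 0.1),
]


def stake_pct(edge_points: float) -> float:
    """Stake, as a percent of current capital, for an edge measured in percentage points.

    Assumptions: an edge exactly on a boundary goes to the higher tier, and a zero
    or negative edge means no bet.
    """
    if edge_points <= 0:
        return 0.0
    for min_edge, stake in STAKE_LADDER:
        if edge_points >= min_edge:
            return stake
    return 0.0  # unreachable for positive edges; kept so the return type is always float
```

For the Reds game discussed next (57.6 vs. 51.2 percent), `stake_pct(57.6 - 51.2)` returns 0.4.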

The advantages are percentage points.  For instance, one game Peta projected the Reds at a 57.6% chance of beating the Brewers.  Vegas odds were only 51.2%, so Peta "deemed the Reds had a 6.4 percent better chance of winning than the implied odds".

Most of his wagers came with a smaller advantage, rather than a larger one.  Here's his table of results (from page 296):

2.0% bets:  26- 13, cash return of +14.8 percent
1.5% bets:  20- 15, cash return of  +4.0 percent
1.0% bets:  38- 34, cash return of  +2.5 percent
0.5% bets:  59- 56, cash return of  -4.1 percent
0.4% bets: 189-164, cash return of +10.2 percent
0.2% bets: 307-339, cash return of -12.1 percent
0.1% bets: 448-387, cash return of  +5.6 percent
------------------------------------------------
Total    1087-1008, cash return of +28.8 percent

The great majority of bets were in the bottom three groups.

------

I was surprised that Peta was able to beat the bookies using a system that's really not that complicated.  Are the oddsmakers so ill-informed that they can be so easily exploited?  Does Vegas really leave that many 15 percent overlays, when thousands of people have read Bill James and follow sabermetrics?

So I did some calculating and simulating ... and, no, the oddsmakers aren't quite that bad.  

Take a look at Peta's table.  He placed 72 bets in the "1%" group.  On those bets, Peta went 38-34.  On "pick-em" bets of a dollar each, he'd earn $38 for the winners, and lose $35.70 for the losers (losses cost $1.05, which is the source of the casino's profit).  That's a total win of $2.30.  Since each bet is 1 percent, the total bankroll must be $100.  So that's a return of 2.3 percent.

That comes close to Peta's report of 2.5 percent.  Why the difference?  Because mine is a simplified estimate.  First, Peta bet a combination of favorites and underdogs, which would change the numbers slightly ... and, second, the bankroll varied throughout the season, so not all bets were for the same amount.  

Now, what "should" have happened?  Well, on those bets, Peta thought he had an edge of 11 to 13 percentage points. Let's call it 12, which means that he should have won 62 percent of his bets instead of 50 percent.  That is: he "should have" gone roughly 45-27.  That would have been a return of over 15 percent, not 2.5 percent. 
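The flat-bet arithmetic for the 1-percent group, as a sketch.  It assumes, as the paragraphs above do, that every bet is a pick-em that wins $1 or loses $1.05, and that each $1 is 1 percent of a $100 bankroll, so dollars of profit read directly as percent return.

```python
def flat_bet_profit(wins: float, losses: float, loss_cost: float = 1.05) -> float:
    """Profit, in dollars, from $1 pick-em bets that win $1 or lose loss_cost."""
    return wins - losses * loss_cost


# 1-percent group: $1 bets against a $100 bankroll, so dollars equal percent return
actual = flat_bet_profit(38, 34)                        # Peta's actual 38-34
n_bets = 38 + 34
should = flat_bet_profit(0.62 * n_bets, 0.38 * n_bets)  # 62 percent winners, as the edge implies

print(f"actual: {actual:.2f}%  should have: {should:.1f}%")
```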

The difference might just be luck; after all, we're only talking about 72 games.  But, if you repeat the calculations for the other categories, you find that, if Peta's assumptions were correct, he should have made WAY more money.  Here are the "should haves", based on Peta's percentages (I used an 18 percent edge for the top group, and the midpoint of the range for all the others):

2.0% bets:  26- 13, cash return of +28 percent
1.5% bets:  22- 13, cash return of +15 percent
1.0% bets:  45- 27, cash return of +18 percent
0.5% bets:  69- 46, cash return of +12 percent
0.4% bets: 203-150, cash return of +21 percent
0.2% bets: 352-294, cash return of +12 percent
0.1% bets: 430-405, cash return of + 3 percent
-----------------------------------------------
Total     1147-948, cash return of +108 percent         
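Here's one way to generate the return column.  The post doesn't spell out the exact model, so this is a sketch under my own assumptions: every bet is a pick-em at -105 (risk $1.05 to win $1, so the break-even win probability is 1.05/2.05, about 51.2 percent), the true win probability is that break-even figure plus the edge, and stakes are flat percentages of the starting bankroll with no compounding.  The `skill` parameter is hypothetical: 1.0 takes Peta's edges at face value, and 0.25 moves each one three-quarters of the way back to even.

```python
# (bets in the group, stake as % of starting bankroll, assumed edge as a probability)
GROUPS = [(39, 2.0, 0.180), (35, 1.5, 0.140), (72, 1.0, 0.120), (115, 0.5, 0.100),
          (353, 0.4, 0.075), (646, 0.2, 0.045), (835, 0.1, 0.015)]
LOSS_COST = 1.05                          # risk $1.05 to win $1 (a -105 line)
BREAK_EVEN = LOSS_COST / (1 + LOSS_COST)  # win probability that exactly breaks even, ~0.512


def expected_returns(skill: float = 1.0) -> list[float]:
    """Expected cash return of each group, in percent of the starting bankroll.

    Assumes true win probability = BREAK_EVEN + skill * edge, with flat stakes.
    """
    returns = []
    for n_bets, stake, edge in GROUPS:
        p = BREAK_EVEN + skill * edge
        profit_per_dollar = p - (1 - p) * LOSS_COST   # expected profit per $1 staked
        returns.append(n_bets * stake * profit_per_dollar)
    return returns


face_value = expected_returns(1.0)
regressed = expected_returns(0.25)
print([round(r, 1) for r in face_value], round(sum(face_value), 1))
print(round(sum(regressed), 1))
```

With `skill=1.0`, each group lands within about a point of the table above and the total comes to roughly 109 percent.  With `skill=0.25`, the total drops to about 27 percent, in the neighborhood of Peta's actual 28.8, before any compounding.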

The system should have made 108 percent, not 28!  And, taking compounding into account, because the capital would be growing over the course of the season ... he probably should have wound up with almost triple his original bankroll.  (Why triple?  Because 100 percent interest, compounded "instantly", yields "e" times your original capital, or 271.8 percent.  Here, we have 108 percent compounded daily, which is almost the same as compounding instantly, and works out to about 2.9 times.)
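A quick check on the "almost triple" figure.  The sketch assumes the 108 percent arrives in equal daily slices over about 180 betting days (roughly the length of a regular season; my assumption) and that each day's winnings are reinvested.

```python
import math


def compounded_multiple(total_return: float, periods: int) -> float:
    """Final bankroll multiple when total_return (e.g. 1.08 for 108%) is earned
    in equal slices over `periods` periods, with winnings reinvested each time."""
    return (1 + total_return / periods) ** periods


continuous_100 = math.exp(1.0)               # 100% compounded "instantly": e
daily_108 = compounded_multiple(1.08, 180)   # 108% compounded over ~180 days
continuous_108 = math.exp(1.08)              # the limit as compounding gets finer

print(f"{continuous_100:.3f}  {daily_108:.3f}  {continuous_108:.3f}")
```

Daily compounding is already within a hair of the continuous limit, so the answer is about 2.9 times the starting bankroll.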

28 percent is about one-quarter of 108 percent, which suggests that Peta overestimated his edge by a factor of about four!  

I ran a simulation to confirm, and, yes, it seems to work out.  If I move every edge three-quarters of the way back to even -- that is, regress 75 percent to the mean -- I get almost the same bottom line Peta did:

Peta actual: 1087-1008, cash return of 29 percent
Simulation:  1072-1023, cash return of 30 percent

Of course, there's still plenty of luck involved.  In my simulation, the SD of the final return was around 27 percent.  Peta was around 1 SD better than breaking even.  But: breaking even shouldn't be the reference point.  On Peta's bets, the casino had an advantage of around 1.67 percent.  That eats away at your capital.  If you were no better than an even guesser, my simulation says you'd wind up with around a negative 10 percent return, with an SD of 20 percent or so.  In that light, Peta's +29 percent is about 2 SD above random.

And the simulation confirms that almost exactly ... when I run 1,000 unskilled seasons, 25 of them came out above +28.8%.

I'm actually surprised that a random bettor can beat the house edge by this much, even 25 times out of 1000.  I expected it to be much harder.
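A Monte Carlo sketch of the "unskilled bettor" check.  The bet counts and stakes come from Peta's table; everything else is my assumption, chosen to match the figures in the post: each bet is a coin flip, stakes are flat percentages of the starting bankroll (no compounding), and each bet risks $1 to win $(1 - 2h), where h = 0.0167 is the 1.67 percent house edge, so an even guesser loses about 1.67 cents per dollar wagered.  The seed is arbitrary.

```python
import random
import statistics

# (number of bets, stake as % of starting bankroll), taken from Peta's results table
BUCKETS = [(39, 2.0), (35, 1.5), (72, 1.0), (115, 0.5), (353, 0.4), (646, 0.2), (835, 0.1)]
HOUSE_EDGE = 0.0167   # the post's ~1.67 percent casino advantage


def unskilled_season(rng: random.Random, house_edge: float = HOUSE_EDGE) -> float:
    """One season's cash return, in percent of the starting bankroll, for a coin-flip bettor.

    Assumption: each bet risks its stake to win stake * (1 - 2 * house_edge), so a
    50 percent bettor loses house_edge per dollar wagered on average. Stakes are flat.
    """
    win_multiple = 1 - 2 * house_edge
    total = 0.0
    for n_bets, stake in BUCKETS:
        for _ in range(n_bets):
            if rng.random() < 0.5:
                total += stake * win_multiple
            else:
                total -= stake
    return total


def run(n_seasons: int = 1000, threshold: float = 28.8, seed: int = 2013):
    """Simulate many unskilled seasons; return (mean, SD, count above threshold)."""
    rng = random.Random(seed)
    returns = [unskilled_season(rng) for _ in range(n_seasons)]
    above = sum(1 for r in returns if r > threshold)
    return statistics.mean(returns), statistics.stdev(returns), above


if __name__ == "__main__":
    mean, sd, above = run()
    print(f"mean {mean:+.1f}%, SD {sd:.1f}%, seasons above +28.8%: {above} of 1000")
```

Under these assumptions the expected season return is about -10 percent with an SD of about 20 percent, so something like 2 to 3 percent of random seasons clear +28.8 percent, in line with the 25 out of 1,000 reported above.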

------

So Joe Peta's edge was only about a quarter as large as he thought ... but it was still enough to give him a hefty profit, and finish 2 SD above random.  

But ... was Peta just lucky?  It's possible.  But ... my gut thinks that the system was truly good enough to be profitable.  Part of the reason is that Peta turned another profit the year after (albeit a smaller one, only 14 percent).  The other part is ... well, the system seems reasonable.  If bettors have biases in favor of certain teams or pitchers, or they overbet momentum when they shouldn't, or they have other irrationalities, there should be money to be made using a decent, disciplined system.

Still, I suspect there was *some* luck in Peta's results.  If Peta does this again for 2013, my over/under for his profit would be somewhere in the single digits.  

I could be wrong. 


-----

Update, 3/10: good comments at Tango's site ... in particular, check out MGL's #2 and #3.  





Tuesday, March 05, 2013

Crashing the basket, and the tradeoff between offense and defense


Your NBA teammate is taking a jump shot.  Should you crash the basket, or should you retreat back on defense?  If you crash, you gain a better chance at grabbing the rebound if the shot misses.  If you drop back, you can play better defense on the transition, since you're not trapped behind the play.

One of the research papers presented at the 2013 Sloan Conference (.pdf) talks about this trade-off.  I'm not sure I agree with its conclusions, but I was struck by one of the findings that came out of the data.

It turns out that some teams crash the basket more than others.  That's probably not a surprise ... it's part of a team's particular strategy.  Some coaches like to crash more, and some like to crash less.

Eyeballing figure 4 of the paper, it looks like the Houston Rockets crashed (meaning, in this context, that video replays showed more players moved towards the basket than away from it) on more than 40 percent of missed jump shots (taken minimum 15 feet from the basket).  The Golden State Warriors, on the other hand, crashed only 26 percent of the time.  

Could the difference be random?  I don't think so.  It looks like there are around 500 missed shots in the database for each team.  Coincidentally, that gives us a perfect baseball analogy: the Warriors hit .260 in 500 AB, while the Rockets hit .400.  Intuitively, we know that's a non-random difference.  

Doing the calculations ... the difference is around 4.8 SD.  Even after adjusting for selective sampling -- I chose the highest and lowest of the 12 teams that had enough data to be included in the paper -- it seems like the authors are measuring something real.  
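The 4.8 SD figure is a two-proportion z-test.  A sketch, treating the crash rates as .400 and .260 on 500 missed shots each (the eyeballed figures above) and using the unpooled standard error.  The 66-pair adjustment at the end is my own crude, conservative stand-in for "adjusting for selective sampling," not necessarily what was done here.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z-score for the difference between two independent proportions (unpooled SE)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


z = two_proportion_z(0.400, 500, 0.260, 500)
p_two_sided = math.erfc(z / math.sqrt(2))   # normal-approximation two-sided p-value

# Picking the highest and lowest of 12 teams means choosing one of C(12, 2) pairs;
# a Bonferroni-style correction multiplies the p-value by that count.
n_pairs = math.comb(12, 2)
p_adjusted = min(1.0, p_two_sided * n_pairs)

print(f"z = {z:.2f}, p = {p_two_sided:.1e}, adjusted p = {p_adjusted:.1e}")
```

Even after that deliberately heavy-handed correction, the chance of a gap this large arising from luck alone is well under one in a thousand.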

This has obvious implications for the rebounding debate.

We now know that some teams get more rebounding opportunities than others, by choice.  Which means some *players* get more rebounding opportunities than others, by choice -- by the role the coach and system are asking them to play.

And that means that you can't necessarily judge a player's rebounding on the basis of his stats alone.  We've talked about that before, in terms of "diminishing returns", players taking rebounds from their teammates.  This is a bit different -- the *coach* taking rebounds from his players, by his decision to have the team challenge for offensive boards less often.

Now, you might argue that the difference is minimal.  Offensive rebounds are, what, 30 percent of the total?  And we see the most extreme difference between teams is 14 percent of that 30 percent.  And there are five players on the court, so, on average, a player's stats will fall by 20 percent of 14 percent, of 30 percent ... which is less than one percent of the opponent's missed shots.

But ... even that small effect could be significant when ranking players.  And, it could be one particular player taking most of the brunt, if it's always the same guy that the Warriors send back.  And, each offensive rebound not snagged means one more defensive rebound for the other team -- which, again, could be some players more than others.

In any case, in the past, we've had circumstantial evidence about whether rebounding numbers are affected by style of play ... but now, I think, we have evidence that's much more direct. 

What I find even more interesting is that now we have evidence that rebounding numbers aren't even accurate for TEAMS.  The Warriors' totals are low by design -- the team is trading rebounds for defense.  Which means, you can't just look at team offensive numbers in isolation.  Offense and defense really are intertwined.  We knew that already, in theory, but ... well, for me, personally, I often just pay it lip service, and forget about it.  But, now, this kind of throws it in my face.  It's the first time I've seen this well-defined a statistical footprint of the tradeoff between O and D.  

I won't be looking at NBA team stats quite the same way again.  From now on, any time I see a team with poor offense, I'll be thinking, well, how much of that is because they decided to sell some of it off in exchange for defense?  
