Monday, June 27, 2016

NHL teams strategize when to play for overtime

Here's an article I found a year ago in the Journal of Sports Economics, but didn't get around to writing about until now.

It's by Michael Lopez, and it's called "Inefficiencies in the National Hockey League Points System and the Teams That Take Advantage."

As is well-known, NHL teams have an incentive to get games to go to overtime. If a game is settled in regulation, the winning team gets two standings points, while the loser gets none. However, if a game goes to overtime or a shootout, the winning team gets the same two points -- but the losing team gets one point too.

An overtime game is better for teams, in general, because they get to split three points between them instead of just two. So it's no surprise that NHL teams respond to the incentive. In the first thirteen seasons after the "loser point" rule was adopted, the frequency of overtime games jumped from 20.2 percent to 23.6 percent. (Coincidentally, it's the same 23.6 percent before and after the shootout was adopted.)

Lopez's paper was able to quantify two new findings:

1. Games are more likely to go to overtime later in the season than earlier; and

2. Games are more likely to go into overtime when teams are not in the same conference.

These make sense, intuitively. Later in the season, some teams are fighting desperately for a playoff spot, and the extra standings point is much more important for them in terms of leverage. And, whether a team makes the playoffs depends only on the other teams in its own conference, so sharing an extra point with an other-conference opponent doesn't cost anything at all. (Well, maybe it might, rarely, cost home-ice advantage in the finals, but that's highly unlikely.)

As mentioned, 23.6 percent of games went to overtime in the shootout era. But the overtime percentage varies substantially by situation:

25.4% Inter-conference games
23.2% Within-conference games 

22.0% September-December games
23.8% January-February games
25.6% March games
29.3% April games

The conference difference is only significant at p=.08, but the month difference is significant at p=.001.


But, Lopez found, the differences are actually larger than those raw percentages, because the two situations aren't independent. As it turns out, the NHL tends to schedule within-conference games late in the season. That's for drama, so that the most meaningful, high-leverage games are likely to be against historical rivals.

Because of that, the two effects partially cancel each other out. The late-season effect tends to increase overtimes, but those games tend to be within-conference, which decreases them.

Lopez separated out those factors with a regression. Calculating from his coefficients, and assuming teams of equal talent, I get:

23.5% within conference, early in season
26.2% different conference, early in season
31.8% within conference, April
35.0% different conference, April

So, the differences are a lot bigger than the raw numbers show.


Something else that's interesting: the "within conference" effect is very recent.

The overall conference effect was 2.7 percentage points (26.2 versus 23.5). But, almost the entire effect came from the last two seasons in the study. For the study's first twelve years, there was almost no difference at all, on average. But in 2010-11 and 2011-12, the conference effects were 4.7 and 5.8 percentage points, respectively.

It's like teams suddenly caught on to the idea that they don't want to give points away to conference opponents.

But ... well, it seems to me that strategy doesn't really make a whole lot of sense.

Yes, it's true that you don't get an advantage against your rival by playing for three points instead of two. It almost seems like it's worse -- if you win in overtime, you only gain one point on your opponent (you get two, they get one). But if you win in regulation, you gain two points!  Except that it's symmetrical ... if *they* beat *you* in overtime, they only gain one point on you. 

The disadvantage comes not from any negative expectation -- it's symmetrical, after all -- but that the other-conference games come with a *positive* expectation.  You share three points instead of two, but your opponent's gain is not your loss, so the more points to split, the better.

So, against that particular opponent, the inter-conference overtime game is much better for you, with 50 percent more points up for grabs, and no penalty for the points the other team takes, beyond your disappointment at not getting them yourself.

The problem, though, is: that's only true for the one team you're playing against. But, you're not just competing in the standings against this one particular opponent. You're also competing against the other 12 (West) or 14 (East) teams in the conference! If you can raise the expected payoff to 1.5 points each instead of 1.0, you break even against the one same-conference opponent, but gain an expected half point against at least 12 other teams!

Sure, there's *a bit* less incentive within conference, because you stand to gain on only 12 teams, instead of 13 teams when you win an inter-conference game. But, that's so small a drop in incentive that you shouldn't even see it. 
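
The arithmetic behind that argument can be sketched in a few lines of Python. This is only an illustration, assuming two teams of equal talent (so each expects half of whatever points are on offer). The function name is made up for the example, and the 12 and 13 rival counts are the Western Conference numbers from the reasoning above, not anything from Lopez's paper.

```python
def expected_points_per_team(goes_to_overtime):
    """Expected standings points for each of two equally matched teams."""
    total_points = 3 if goes_to_overtime else 2  # the loser point adds a third point
    return total_points / 2

regulation = expected_points_per_team(False)
overtime = expected_points_per_team(True)

# Expected gain, from playing for overtime, on every team NOT in this game.
gain_on_bystanders = overtime - regulation

# Western Conference example: within conference, the opponent is one of your
# rivals, so you gain on 12 teams; against the other conference, on all 13.
rivals_gained_on = {"within conference": 12, "inter-conference": 13}

for game_type, rivals in rivals_gained_on.items():
    print(f"{game_type}: +{gain_on_bystanders} expected points on {rivals} rivals")
```

Either way the expected gain per rival is the same half point; only the number of rivals changes, from 13 to 12.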

To repeat an analogy I've used in the past: If you see a $2 coin in the street, you'll pick it up. If it's only a $1 coin, sure, you're less likely to pick it up, in theory. But, in practice? You'll still pick it up so often that nobody will be able to tell the difference. It's like a 99.99% chance compared to a 99.98% chance, or something.


Also: why should there be a March/April effect?  Every game counts equally in the standings. A November game is just as important for making the playoffs, on average, as an April game.

Of course, in April you *know* how important the game is, whereas for a November game, it might, in retrospect, turn out to have been meaningless. But, since games count equally, the overall leverages have to be the same. If that's the case, then for every absolutely crucial April game, there must be an offsetting meaningless one, in order for the April average to equal the November average.

I wonder if the April effect applies only to the most important games. Maybe teams are thinking, "well, we feel a bit weird lowering our intensity to play for the regulation tie, so we're only going to do it when it's really, really important."  In other words, the probability of overtime doesn't increase smoothly with leverage -- instead, it takes a big jump when the pressure to gain points is exceptionally high. 

Maybe I'm 100% willing to steal food if I'm on the brink of starvation, but I'm not 50% willing to steal food if I'm only halfway to starving. In the latter case, the risk isn't worth it.

It could be the same thing here. Maybe teams aren't willing to play a less intense strategy (or whatever they do to play for overtime) when it's an ordinary, early-season game. But, when it's *really* important, that's when it's worth the trouble.


Thursday, June 16, 2016

Consumer Reports on Sunscreens

Consumer Reports' July, 2016 article on sunscreens promises to "expose startling truths about product claims and effectiveness."  Their most important claim is this:

"43% of sunscreens in our tests failed to meet the SPF claim on their labels."

Call me skeptical, but my first reaction was to suspect that their testing wasn't quite right. 

CR and I have quite different priors about how companies operate. I believe that companies are generally honest, and dishonest manufacturers and bad products get shunned by consumers quicker than CR thinks, and there's no way 43 percent of manufacturers are outright lying and ripping us off. 

But, sometimes, it seems like CR thinks the manufacturers would cheat us from head to toe if they thought they could get away with it. 

Here's how they tested:

"A standard amount of sunscreen is applied to small areas of our panelists' backs. Then they soak in a tub of water. Afterward, each of those areas is exposed to six intensities of UVB light from a sun simulator for a set time. About a day later, the six spots are examined for redness."

Aha! CR rated sunscreens only on how they performed after water exposure, like swimming and heavy rain. They didn't include "dry" protection, or even protection after sweating. Their accusation that "almost half" the products fail to "live up to the SPF claim on their labels" is based solely on performance after a soaking period. 

It took me a while to figure that out. CR does state that explicitly, but not in the main article. There are only two places that I can see where CR mentions it.

The first is in a full-page sidebar on page 27, though not in the sidebar's main text. There, CR touts their "troubling" findings and demands FDA action, but doesn't mention that they report and rate only on post-immersion performance. You have to look down into the smaller print, beside a couple of pie charts, to find that out. (It's in larger print on their website, which includes only a part of the full magazine article.)

The second is in the ratings. Well, not the ratings themselves -- the charts themselves are headed "Tested SPF" with no mention of water. You have to look at the footnotes.

In fairness to CR, they do note that every product does make claims of water resistance, either for 40 or 80 minutes, which is why they feel justified in testing this way. But ... they never say that the manufacturers guarantee the exact same SPF value after immersion. Couldn't it be that the sunscreens just vary in how water resistant they are?

In other words -- maybe the sunscreens just vary in water resistance! Isn't it possible that the manufacturers are claiming a "dry" SPF, and the products just vary in how well they perform after swimming? 

That was my first guess, but, probably not. I found a website that says the law *does* require the label SPF to be met after immersion.

But, maybe the immersion tests are weaker than real life, the same way that cars usually get lower mileage than their EPA estimates. According to the website, the test involves submerging an arm in a Jacuzzi, then measuring the SPF afterwards. It could easily be that arms are not the same as backs, or a Jacuzzi is not the same as the tub CR used. 

Who knows, maybe, in their 40 or 80 minutes of soaking, CR's participants leaned their backs against the sides of the tub, and some of the lotion rubbed off? I'm betting CR was too careful to let that happen, but ... in any case, it would be nice if CR would consider that there might be something going on other than fraud. 

If it's true that the FDA does require independent lab testing, it wouldn't just be that 43 percent of manufacturers are lying, it would be that 43 percent of manufacturers are falsifying results. That's a strong accusation to be making, and, to me, not a very plausible one.


Even if CR's testing is perfectly accurate, why don't they mention the products' performance in other contexts? Nowhere in the article or sidebars or footnotes does CR mention, even once, how the products did in dry or sweaty conditions. Their accusation that "almost half" the products "fail to live up to the SPF claim on their labels" is based only on performance after soaking. 

I count at least five times where CR accuses the manufacturers of overreporting the SPF, but only one of those times does CR mention "after water immersion" at the same time.

And why don't they report on dry performance at all? In a second test, they report that they "smear sunscreen on plastic plates and pass UV light through and measure the amount of UVA and UVB rays that are absorbed."  So, they actually *do* have test results for UVB effectiveness when dry. 

They report on the dry UVA numbers -- those form the basis of their UVA rating -- but the dry UVB numbers are completely ignored.

(UVB is the part of the spectrum that's relevant to SPF; UVA is a more dangerous component of sunlight that doesn't cause burning or tanning, but still can lead to damage and cancer. CR chose to use a more strenuous test for UVA than the FDA requires. They didn't accuse any products of failing to meet their UVA claims, even those that rated "poor".)

Why don't they tell us how the products performed dry? Is it because, in non-immersive use, they do indeed live up to their claims? Wouldn't it be fair to mention that? 

Why bury the important qualifier that the failures are all related to post-immersion performance? 


It must be that even for the best products, water resistance isn't perfect -- that the performance of all the sunscreens is reduced after a swim. 

In that case, if a product tests at SPF 50 after immersion, doesn't that mean it must have been higher than SPF 50 beforehand? That must be the case, right? Unless some products are so completely, perfectly unaffected by water that they don't degrade at all.

But if products didn't degrade in water ... well, first, wouldn't manufacturers claim more than 80 minutes of wet protection? Second, if natural friction with water isn't sufficient to degrade the layer of sunscreen even a little bit, wouldn't that make it really hard to rub off by other means, like a towel? And, third, if there's no degradation, why are we advised to reapply sunscreen after swimming?

My guess is, if Coppertone Water Babies SPF 50 "meets [its] claim" after 80 minutes of immersion, it must have been a lot more than SPF 50 beforehand. 

Why isn't that a problem for CR, that people are buying what they think is SPF 50, but it's actually SPF 100 when dry? Probably because CR thinks higher is always better. They rate the products by actual (wet) SPF, "not how close a sunscreen comes to meeting its SPF claim."

Which is OK, I guess. The difference between 50 and 100 isn't that big a deal. The SPF is the inverse of the fraction of UVB the product allows through, so SPF 50 means you're exposed to only 1/50 of what you'd get without it, and SPF 100 means you're exposed to a 1/100 dose. Sure, you get twice as much UVB from SPF 50 as SPF 100, but it's twice a very small amount. The difference, after four hours in the sun, is the equivalent of 2.4 unprotected minutes. 

Fair enough. I can't see anyone saying, "I feel ripped off that my SPF 50 is actually SPF 100. I was counting on those 2.4 extra minutes to tan better."  If you wanted to get a tan, you'd probably use something lower, like SPF 4 or 8 or 15. (That's when you have a problem, that your SPF 4 is actually SPF 10 when dry, which *is* a big difference.)

In any case ... why does CR not advise, explicitly, that higher is better? They imply it, by using a "higher is better" standard in their ratings, but don't actually say, "just get a big number over a small number."


Instead, CR advises you to use only products that are SPF 30 or higher. That "30" doesn't come from any first-hand evidence or argument, but because CR just defers to what the American Academy of Dermatology recommends. 

Accordingly, CR didn't test or rate any products labelled lower than SPF 30.

What's strange is ... they don't even *mention the existence* of those other products! More importantly, they don't even mention why lower SPF products exist. I'm guessing it has something to do with tanning but not burning? Except that CR doesn't talk about the difference between a tan and a burn, except that "a burn and the tan are kind of the same response ... you've received an injury that can lead to cell mutations that trigger skin cancer."

Nowhere in the article does CR mention what to do if you want to tan as safely as possible.

CR's mission statement says they "empower consumers with the knowledge they need to make better and more informed choices". But, in this case, they don't seem to care about helping us make the choice of whether a tan is worth the risk. For pale caucasians like me, "Stay pasty white in order to be safe!" seems to be the only valid choice in their eyes. 


When a product failed to live up to its SPF label after immersion, CR reports the actual SPF they found, like those two kids' sunscreens that tested at 8 instead of 50.

But when a product *did* meet its claim, CR doesn't tell you the actual number. In that column on the chart, they just say, "Meets Claim".

Well, that's strange, isn't it? If an SPF 50 product tested at SPF 60 after immersion, you'd expect CR to say, "hey, this is better than advertised!"

Why don't they? Maybe it's part of their "companies are always out to rip us off" attitude, so they don't want to promote the idea that sometimes we get more than we pay for.

Or, maybe they think that if the manufacturer is only promoting SPF 50, that's all they're obligated to deliver. So, maybe this batch turned out to be SPF 60, but the next batch might not.

That would actually have some logic behind it. But that can't be CR's rationale. If it were, then, for those products, they'd base their ratings just on the label claims. But, instead, they seem to be actually basing their ratings on the full, observed SPF. 

We know they're doing that because: in the lotions chart, "California Baby Super Sensitive SPF 30+" has a "Meets Claim" and a UVB rating of "Good". But, "California Kids #supersensitive SPF 30+" also has a "Meets Claim," but a UVB rating of "Very Good". 

Clearly, they can't be basing the rating on the label "30+" claims, because then the ratings would be identical. So, they must be using the actual, observed SPF.

So, the mystery remains. Why won't they tell us about the products that deliver more than they promise?


A fair bit of the article isn't about specific products, but about sunscreen use in general. And a lot of it doesn't make sense to me. For instance:

"It's not true that sunscreens with higher SPF block double or triple the rays as those with lower ones. They really only provide slightly more protection," [Dr. Mina] Gohara [associate clinical professor of dermatology at Yale School of Medicine] warns. The breakdown: SPF 30 blocks 97 percent of UVB rays, SPF 50 blocks 98 percent, and SPF 100 blocks 99 percent."

OK, sure. But you can recast that like this:

"It IS true that sunscreens with lower SPF allow double or triple the rays as those with higher ones. The breakdown: SPF 30 admits 3 percent of UVB rays, SPF 50 admits 2 percent, and SPF 100 admits only 1 percent."

Put that way, do you still want to say that higher-SPF products only give "slightly more protection?"  

Which perspective makes more sense -- 97 to 99 percent, or 1 to 3 percent? Do you care how much UVB gets rejected, or how much gets through?

Well, you do, in fact, get three times as much harmful UVB using the SPF 30 than you do using the SPF 100. Sure, it's three times a small number, and, sure, you expose yourself to *one hundred* times the harm if you don't use sunscreen at all. So, on the one hand, who cares if the SPF 30 is triple the exposure, if that exposure is small in the first place? 

But, CR isn't about to take that tack, that 1 percent or 3 percent of UVB is nothing to worry about. If they did, they wouldn't stop at SPF 30. Because, they could easily have made the same argument for SPF 20:

"The breakdown: SPF 20 blocks 95 percent of UVB rays, SPF 30 blocks 97 percent, SPF 50 blocks 98 percent, and SPF 100 blocks 99 percent."

But CR won't even consider recommending an SPF 20, even though it takes you 95 percent of the way to safety. It seems kind of arbitrary to insist that (a) 97 percent is acceptable, (b) going up to 99 percent doesn't matter much, but (c) dropping to 95 percent is too dangerous to even talk about.
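
Here's a minimal sketch of the two ways of looking at the same numbers. It assumes the simple model used throughout this post, that a product with SPF n lets through 1/n of the UVB; the function names are just for illustration.

```python
def fraction_admitted(spf):
    """Share of UVB that gets through, assuming SPF n admits 1/n of it."""
    return 1 / spf

def fraction_blocked(spf):
    """Share of UVB that is stopped under the same assumption."""
    return 1 - fraction_admitted(spf)

for spf in (20, 30, 50, 100):
    blocked = fraction_blocked(spf)
    admitted = fraction_admitted(spf)
    print(f"SPF {spf}: blocks {blocked:.1%}, admits {admitted:.1%}")
```

The "blocked" column moves only a few points from SPF 20 to SPF 100, while the "admitted" column shrinks by a factor of five. Same data, very different impression.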


It's not that hard to quantify the SPF difference in other terms. 

Suppose you spend 8 hours in the sun. With SPF 50, it's like spending 9.6 minutes unprotected, since 8 hours divided by 50 equals 9.6 minutes.

With SPF 100, it's like spending 4.8 minutes unprotected. 
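
That conversion is easy to script. A small sketch, again assuming that SPF n simply divides your exposure by n (the function name is mine, not anything from CR):

```python
def unprotected_minutes(hours_in_sun, spf):
    """Minutes of unprotected sun equivalent to `hours_in_sun` behind a given SPF."""
    return hours_in_sun * 60 / spf

for spf in (50, 100):
    print(f"8 hours at SPF {spf}: {unprotected_minutes(8, spf)} unprotected minutes")
```

Plug in any combination of hours and SPF you like to see how quickly the equivalent exposure shrinks.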

Either way, you're not going to get a sunburn, right? But, maybe 4.8 minutes is still harmful, to a small extent. And maybe that 4.8 minutes worth of damage is cumulative, like smoking. Maybe you don't get cancer from a one-time 4.8-minute session, but expose yourself regularly and your odds get much worse. 

So, is the damage cumulative? CR doesn't tell us. They imply that it is, but their argument doesn't say what they think it does. They hypothetically ask, "If I've never used sunscreen my whole life ... isn't the damage already done?"  And then they respond with a completely irrelevant factoid: 

"Hardly. For years, it was believed that we got the majority of our lifetime dose of UVA and UVB exposure before the age of 18. But now experts know that it's cumulative over your lifetime. By age 40, you've received just 47 percent of your lifetime sun exposure. ... The upshot: It's never too late to start protecting your skin."

That sounds like it's answering the question, but it's not. The "cumulative" here talks only about exposure, not damage.

It's more obvious if you make the same "exposure" argument for car accidents: 

"By age 40, you've driven just 40 percent of your lifetime miles behind the wheel ... so it's never too late to start wearing a seat belt."  

That's true, but that obviously doesn't mean the miles you drove beltless before you were 40 are going to cause you damage in the future.


From the main discussion of sunscreen use in general, here's something that didn't make sense to me:

"Not realizing [the small differences in higher SPF values] may lead people to think that if they use a higher SPF, they don't need to reapply or practice other sun-savvy behaviors, such as seeking the shade or covering up."

Well, I'm "led to think that" because it has to be somewhat true, doesn't it, that if you use a higher SPF, you probably don't have to reapply as much? Not because the higher SPF lotion is more durable, but as it breaks down, there's more protection left over.

The standard advice is to reapply every two hours. But it's not like the protection turns into a pumpkin right at the 2:00 mark. It drops gradually. Suppose, for the sake of argument, that after two hours, you've lost half your sunscreen to friction and sweat, and your SPF 50 is an SPF 25. (This seems reasonable, since CR tells us that if we use half the recommended amount of sunscreen, we get half the SPF.)

If you had used an SPF 100 instead, then after two hours, you're still at SPF 50. That's still pretty good. 

I'd be tempted to get an SPF 100 and reapply every four hours instead of getting an SPF 50 and reapplying every two hours, just for convenience. Would that work? CR doesn't care, or won't tell me. 
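
To see what that guess would imply, here's a sketch that stretches my for-the-sake-of-argument assumption into a full model: the effective SPF halves every two hours, and keeps halving. That's purely hypothetical. Neither CR nor anyone else has told me that's how sunscreen wears off.

```python
def remaining_spf(label_spf, hours, half_life_hours=2.0):
    """Effective SPF after `hours`, ASSUMING it halves every `half_life_hours`."""
    return label_spf * 0.5 ** (hours / half_life_hours)

for hours in (0, 2, 4):
    print(f"after {hours} hours: SPF 50 -> {remaining_spf(50, hours)}, "
          f"SPF 100 -> {remaining_spf(100, hours)}")
```

Under that assumption, an SPF 100 at the four-hour mark is exactly where an SPF 50 is at the two-hour mark. If the real decay curve looks anything like this, the "SPF 100 every four hours" plan isn't crazy.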

But maybe the protection *does* turn into a pumpkin after two hours, after all. Maybe sunscreen drops off bodies at a constant rate, like sand drops through an hourglass -- half is gone after one hour, and the other half after the second hour.* Maybe, when hit by a photon of UVB, each particle of sunscreen absorbs it like a human shield, dying in the attempt, and there are exactly enough particles to last exactly two hours.

(* Actually, I'm pretty sure the second half of the sand in the hourglass drops slower than the first half ... for water, I think I read that the flow rate is proportional to the depth. Maybe it's different for sand. Whatever.  :))

Maybe the deterioration is even faster than constant -- like human aging. If you hire an army of teenagers to protect you, you still have 90 percent of your protection after 40 years. But then your protection gets worse, and by 80 years, your army has almost all died away. 

Would be nice to know, wouldn't it? "Reapply every two hours" sounds like an arbitrary rule, not an attempt to maximize protection given constraints of cost and convenience.

Instead of "reapply every two hours," CR should tell us, "almost all the protection disappears between the second and third hour."  Give us useful information, instead of perplexing advice.


"Won't getting a tan actually protect my skin?" the article asks. And CR answers:

"A tan provides the equivalent of up to an SPF 4, but any darkening is just a sign that your skin is defending itself against a solar assault and attempting to prevent further damage."

So, yes, CR admits, a tan *does* protect my skin. But, CR says, some damage was caused in the process of getting the tan in the first place.

What they don't say is: is the tradeoff worth it? Maybe it takes me 50 hours to tan, but, as a result, my next 4000 hours only count as 1000 hours. In that case, I've saved myself the equivalent of 2950 unprotected hours by doing the extra 50 in the first place!
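
Spelled out as a calculation, using my made-up numbers and CR's "up to SPF 4" figure for a tan (the function is just a sketch of the bookkeeping, not a medical model):

```python
def net_unprotected_hours_saved(tanning_hours, later_hours, tan_spf):
    """Unprotected-equivalent hours saved by tanning first, net of the tanning itself."""
    later_equivalent = later_hours / tan_spf  # what the later hours "count as" with a tan
    return later_hours - later_equivalent - tanning_hours

print(net_unprotected_hours_saved(tanning_hours=50, later_hours=4000, tan_spf=4))
```

The 4000 later hours count as 1000, a saving of 3000, and the 50 hours spent getting the tan bring the net down to 2950.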

Of course, it depends. The act of tanning might be especially damaging. I can't get a tan with an hour a week over 50 weeks; it has to be 50 hours over a short period. Maybe those hours are especially dangerous. Or, maybe in the 4000 hours that come later, I'd have gotten a tan anyway. Or, maybe if I didn't get a tan because it was only an hour a day, those hours wouldn't be dangerous at all.

In any case, once I have a tan, I may have an automatic SPF 4. Does that mean if I use an SPF 15, I really have an SPF 60? Because, first the sunscreen lets only 1/15 of UVB get to the surface of the skin, and then the tan admits only 1/4 of what remains. That works out to 1/60. 

It does seem like that should be how it works, based on what CR tells us about how sunscreens provide protection. Doesn't that mean that people with tans can get by with a lower SPF?

Not only does CR not admit the possibility -- they actually go out of their way to implicitly reject the idea, by warning black people to follow the same advice as white people:

"Nor are people of color immune to skin damage from the sun's rays. People of all ethnicities can get sunburn and skin cancer. "There is no circumstance under which dark-skinned people shouldn't be wearing sunscreen when exposed to the sun," Gohara adds."

Well, taken literally, Dr. Gohara's assertion is pure nonsense. Of course there are circumstances in which dark-skinned people -- or people of any other color -- don't need sunscreen. Walking 100 feet to the mailbox, say. I'm pretty sure that even Dr. Gohara herself doesn't slather on the minimum one ounce of SPF 30 to walk from her house to her car before driving to work. 

And what about winter?

This sounds like I'm nitpicking ... after all, I know what she means! But it's not. Because, it's an evasion. CR (and Dr. Gohara) are ducking out of the obligation to admit there's a tradeoff between convenience and safety, and to quantify the tradeoff. To them, wearing sunscreen is a moral obligation, like wearing a bicycle helmet or a seatbelt -- you should do it just because you should do it, because that's what right-thinking people do!

Dr. Gohara isn't just saving words by saying "there is no circumstance."  She, and CR, are trying to bully their way out of acknowledging that the risk is much lower in some situations than others, and that the sun is indeed less dangerous for darker-skinned people than lighter-skinned people. 

I mean, *of course* darker-skinned people can still get skin cancer, just like, *of course* you can still get a concussion if you fall on your head walking without a helmet instead of biking without a helmet. That's not the point. The point is: what's the risk, and is it high enough to justify the trouble and inconvenience?

A quick Google search tells me: white people have *twenty-five times* the risk of skin cancer that African Americans do. CR could have, and should have, told us that.

And, given that that's the case, CR should definitely be advocating less stringent requirements for dark-skinned people than light-skinned people. By any cost-benefit calculation under which CR determined that SPF 30 is good enough for white people, the same calculation would come up with a much lower threshold for black people. 


Oh, and by the way: there is no mention of time of day. Ultraviolet radiation, as measured by the "UV index" you see on weather reports, is at its highest around noon -- in the earlier morning, or late afternoon, it looks like it drops to maybe half its peak level. Wouldn't it be OK to wear just SPF 15 if we go out in the morning? Wouldn't a walk in the evening, where the UV index is "low," be one of those non-existent circumstances where dark-skinned (and light-skinned) people can skip the sunscreen? 

I'm pretty sure that if I went to the beach at 7pm, when the shadows are long, and started slathering on the sunscreen, people would look at me like an idiot. But at least Dr. Gohara would be happy with me!

How could CR do an entire article on sunscreens and sunburn and cancer without mentioning the danger varies significantly by time of day?


Are the chemicals in sunscreens toxic? CR says: no. But they hedge their bets. First, they say the risk of getting skin cancer from sunburn trumps any concerns about toxicity. And then, having spent most of the article so far telling us over and over and over that we should use sunscreen, now they tell us, well, maybe it's not a good idea to rely on it too much anyway!

It's a weird non-sequitur. It reads to me like CR admits only reluctantly that sunscreen is safe, because they're vaguely bothered by the idea that people now feel freer to lie on the beach. Because ... well, sunbathing is kind of unwholesome, and even with sunscreen it's not *perfectly* safe.

I bet if you ask CR if smoking causes ingrown toenails, they won't say "no."  They'll say, "no, but it causes lung cancer and emphysema and all kinds of other dangerous conditions, so don't smoke!"  

So, instead of just saying "sunscreen doesn't cause ingrown toenails," CR says "sunscreen doesn't cause ingrown toenails, but it could still be dangerous because you might stay out in the sun longer and not use it properly and wind up getting cancer and even if you do use it you might still get cancer even if you're black and look at all the people who use sunscreen and get cancer anyway!"

Of course, they express it in a higher-class way than that:

"Experts speculate that sunscreen use -- and misuse -- may give some people a false sense of security; people think that no matter how casually they slather the stuff on, they will be protected and therefore they stay out in the sun longer -- often without reapplying. Remember, no sunscreen blocks 100 percent of UV rays, and sun damage is cumulative."

This has absolutely nothing to do with the question of toxicity. They just want to scare us. And, having just finished telling us that sunscreen itself doesn't actually cause cancer in the normal, cause-and-effect way, they proceed to try to scare us into believing it causes cancer statistically. Their statistical arguments are appallingly bad:

"In two European studies, people who used SPF 30 sunscreen sunbathed up to 30 percent longer than those using SPF 10."

Well, duh! Phrase it this way, which means the same thing:

"In two European studies, people who chose to sunbathe longer chose SPF 30 instead of SPF 10."

People are being rational, and making reasonable tradeoffs -- but because they like sunbathing, and CR doesn't, they can't possibly be rational. 

Here's another one:

"... a number of studies have shown a correlation between the use of sunscreen and increased incidence of sunburn."

Well, duh! People who don't go out in the sun (a) don't get sunburn, and (b) don't use sunscreen. CR might as well have written,

"... a number of studies have shown a correlation between the amount of time wearing a seatbelt and increased incidence of car accidents."

This one is just as bad:

"For instance, a Danish study reported that 66 percent of sunburned people had used sunscreen to prolong time spent in the sun."

Well, yes. And probably 66 percent of people who die in car accidents had used seatbelts to prolong time spent in the car.

And, finally:

"On average, your risk for melanoma doubles if you've had more than five sunburns."

Doubles? Meh! That's nothing! You know what? Your chance of induction into the Baseball Hall of Fame probably goes up by twenty-five thousand times if you've had at least one MLB base hit!

The point, of course, being: "more than five" includes six, but also includes 50 and 100 and 1000, and people who are so sunburned that their skin looks dried up and wrinkly and creepy. CR is trying to shoehorn "six sunburn" people into the same classification as "100 sunburn" people, the same way I shoehorned "one base hit" into the same group as "4192 base hits."

Moreover ... your risk doubles compared to what? Compared to zero sunburns? Compared to average? I think it's probably compared to everyone with five or fewer sunburns, but I'm not sure. So not only is the statement misleading, it's fatally ambiguous.

The American Academy of Dermatology reports that the incidence rate of melanoma, for non-Hispanic Caucasians, is 25 cases per 100,000 population per year. That includes everyone who has had sunburns, as well as those who haven't.

For argument's sake, let's assume that the "five or less" group is the same size as the "more than five" group. In that case, the rates must be roughly 16 cases per 100,000 and 32 cases per 100,000, respectively.

So if you're in the latter group, and you're typical of that group -- which probably means substantially more than six sunburns, since six is the minimum -- your excess risk is 16 cases per 100,000 per year. Over 40 years, that's 640 cases per 100,000, or 6.4 cases per 1,000.

In other words, your sunburns have given you an excess risk of six-tenths of a percentage point.

The AAD reports that skin cancer is "highly curable" if detected early, with a 98 percent survival rate. So, the death rate is 50 times smaller than the incidence rate -- 12.8 deaths per 100,000 people over a lifetime of "more than five sunburns."

How risky is that? Well, it's the equivalent fatality risk you get driving an extra 12,800 miles in your lifetime. 

Suppose that the average "more than five" sunburns is 10. That means one sunburn is the equivalent death risk of driving 1,280 miles. 

Coincidentally, I actually make a drive of about that distance, Ottawa to Toronto to Pittsburgh to Ottawa, twice a year. If that's all the risk I get from a sunburn, it seems pretty safe to me. It's a risk I'm certainly willing to take.
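For anyone who wants to check the arithmetic, here's a rough Python sketch of the chain of calculations above. The variable names are mine. Two inputs are assumptions rather than reported figures: the equal-sized groups (as assumed above), and a driving fatality rate of about one death per 100 million miles, which is what the 12,800-mile equivalence implies.

```python
# Rough sketch of the sunburn-risk arithmetic; all names are illustrative.

EXCESS_PER_100K_PER_YEAR = 32 - 16      # "more than five" rate minus "five or less" rate
YEARS = 40                              # exposure period used in the text
SURVIVAL_RATE = 0.98                    # "highly curable" if detected early
MILES_PER_DRIVING_DEATH = 100_000_000   # assumption implied by the text, not a quoted source
AVG_SUNBURNS = 10                       # assumed average for the "more than five" group

excess_cases_per_100k = EXCESS_PER_100K_PER_YEAR * YEARS               # 640
excess_deaths_per_100k = excess_cases_per_100k * (1 - SURVIVAL_RATE)   # about 12.8
death_risk = excess_deaths_per_100k / 100_000                          # per person
equivalent_miles = death_risk * MILES_PER_DRIVING_DEATH                # about 12,800
miles_per_sunburn = equivalent_miles / AVG_SUNBURNS                    # about 1,280

print(f"excess cases per 100,000 over {YEARS} years: {excess_cases_per_100k}")
print(f"excess deaths per 100,000: {excess_deaths_per_100k:.1f}")
print(f"equivalent driving miles: {equivalent_miles:,.0f}")
print(f"equivalent miles per sunburn: {miles_per_sunburn:,.0f}")
```

Change any of the assumed constants and the mileage equivalent scales proportionally.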


Remember when I wrote about Consumer Reports' article on extended automobile warranties? They spent the entire article telling me why I shouldn't buy one, and then, at the end, out of the blue, they said I should!

They did it again here, in the context of "false sense of security":

"Don't think that reapplying sunscreen meticulously allows you to stay in the sun longer than a sunscreen's approximate maximum protection time. After you've exceeded that time, your best option is to cover up, wear a hat, and seek the shade."


Both Dr. Gohara and CR spent the first part of the article warning me to reapply every two hours -- but now CR is telling me that reapplying doesn't work!

But reapplying *must* work, right? I mean, the physics of UVB reflection and the laws of chemistry still apply. If the SPF 30 was blocking 97 percent of ultraviolet for the first two hours, it should do the same for the second two hours, no? 

This feels like the kind of tainted, emotion-driven "logic" that a panicky, anxious parent would say. "You've spent two hours in the sun already! That's enough for one day! Come inside! Playing in the sun is dangerous, you have to do it in moderation!"

Sunbathing has a health stigma attached to it, so it feels like a vice, even when the risk is mitigated. There's no way CR would take this kind of position for activities it approves of.

"When your bicycle helmet breaks, don't think that meticulously reapplying a new helmet allows you to ride safely longer than a helmet's maximum protection time. After you've exceeded that time, your best option is to walk, drive, or seek a safer activity."

That's not any more absurd than what CR wrote about sunscreen. 


Sunday, June 05, 2016

Charlie Pavitt on "Redskins"

Here's occasional guest blogger Charlie Pavitt commenting on the use of the nickname "Redskins."  He invites intelligent responses, as always.


Along with others who are troubled by the commercial use of the term "Redskins" in reference to the Washington area football team, I was surprised by the results of the survey indicating that about 75% of Native American respondents did not find the term insulting, and only about 10% did (I don’t remember the exact numbers). Now one thing I have learned over my years as a social scientist is to be suspicious of surprising survey results. I automatically think in terms of biased question writing, but I did have the opportunity to read the question, which was worded something like (again memory fails) the following:

Do you or do you not find the term “Redskin” insulting?

I don’t find that biased, so I can’t quibble with the reported results on that issue.

But I wish we could find out how that sample of Native Americans would have responded to two other questions. One was suggested by a statement in response made by an activist on this issue (third memory flaw – forgotten name):

Are you or are you not comfortable with the commercial use of the term “Redskin” by the Washington area football team?

Is that not the issue that is really at stake here?  The second comes from a remark by a sportswriter made on ESPN (fourth memory failure):

Would you or would you not be insulted if a Caucasian referred to you personally as a “redskin”?

My guess here is that a large proportion would find that insulting.  And if I am correct about that guess, then the term is insulting no matter the results of the question that was actually asked.

One more issue – I am perturbed by those who defend the use of the term because of its “tradition.” In a roughly parallel circumstance, more and more people have realized that the “tradition” of flying the Confederate States flag on state government grounds is insulting, because the reason why there was a Confederate States independent of the United States was to protect the institution of slavery. (And I don’t want to hear about “states’ rights”; the reason why the Confederate states wanted their “rights” was to protect the institution of slavery.) And there is a tradition in part of Africa to mutilate female genitalia.  I could go on and on with examples, but I’ll stop here; “tradition” is no reason for continuing a bad practice.

Anyway, I do think that some of the discussion of this issue is overblown; two hundred years of ill-treatment has left the Native American nations with far worse problems than a football team’s commercial trademark. And maybe I am wrong about the answers one would get from the two survey questions suggested above. But it doesn’t matter to me. I shall continue my practice of referring to the “Washington area football team” by that title rather than their commercial trademark.  And I invite all like-minded people to do the same.

-- Charlie Pavitt


Sunday, May 29, 2016

Leicester City and EPL talent evaluation

Last post, I wondered how accurate bookmakers are at evaluating English Premier League (EPL) team talent. In the comments, Tom Tango suggested a method to figure that out, but one that requires knowing the bookies' actual "over/under" forecasts. I couldn't find those numbers, but bloggers James Grayson and Simon Gleave came to my rescue with three seasons' worth of numbers, and another commenter provided a public link for 2015-16.

(BTW, Simon's blog contains an interesting excerpt from "Soccermatics," a forthcoming book on the sabermetrics of soccer. Worth checking out.)

So, armed with numbers from James and Simon, what can we figure out?

1. SD of actual team points

Here is the SD of team standings points for the previous three seasons:

2015-16: 15.4
2014-15: 16.3
2013-14: 19.2
Average  17.1

The league was quite unbalanced in 2013-14 compared to the other two seasons. That could have been because the teams were unbalanced in talent, or because the good teams got more lucky than normal and the bad teams more unlucky than normal. For this post, I'm just going to use the average of the three seasons (by taking the root mean square of the three numbers), which is 17.1.

2. Theoretical SD of team luck

I ran a simulation to figure the SD of points just due to luck -- in other words, if you ran a random season over and over, for the same team, how much would it vary in standings points even if its talent were constant? I wound up with a figure around 8 points. It depends on the quality of the team, but 8 is pretty close in almost all cases.
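As a cross-check on the 8-point figure, here's a minimal sketch that skips the simulation and computes the luck SD directly for an exactly average team. This is an analytic shortcut, not the simulation itself. It assumes each side's goals are independent Poisson with a mean of 1.35 per game (the 2015-16 league average), 38 independent games, and three points for a win and one for a draw. The function names are made up for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson-distributed score."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def luck_sd_points(goals_for=1.35, goals_against=1.35, games=38, max_goals=25):
    """SD of season points for a team of fixed talent, treating games as independent."""
    p_win = p_draw = 0.0
    for gf in range(max_goals + 1):
        for ga in range(max_goals + 1):
            p = poisson_pmf(gf, goals_for) * poisson_pmf(ga, goals_against)
            if gf > ga:
                p_win += p
            elif gf == ga:
                p_draw += p
    mean_points = 3 * p_win + 1 * p_draw       # expected points per game
    mean_square = 9 * p_win + 1 * p_draw       # expected squared points per game
    var_per_game = mean_square - mean_points ** 2
    return math.sqrt(games * var_per_game)

print(f"luck SD for an average team: {luck_sd_points():.2f} points")
```

For an average team this lands close to 8 points. Passing different `goals_for` and `goals_against` values shows how the figure moves with team quality.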

3. Theoretical SD of team talent

By the pythagorean relationship between talent and luck, we get

SD(observed) = 17.1
SD(luck)     =  8.0
SD(talent)   = 15.1
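In code, the "pythagorean" step is just a subtraction of variances. A minimal sketch, assuming talent and luck are independent so that var(observed) = var(talent) + var(luck); the function name is mine:

```python
import math

def sd_talent(sd_observed, sd_luck):
    """Talent SD implied by var(observed) = var(talent) + var(luck)."""
    return math.sqrt(sd_observed ** 2 - sd_luck ** 2)

print(round(sd_talent(17.1, 8.0), 1))  # 15.1
```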

4. SD of bookmaker predictions

If teams vary in talent with an SD about 15 points, then, if the bookmakers were able to evaluate talent perfectly, their estimates would also have an SD of 15 points. But, of course, nobody is able to evaluate that well. For one thing, talent depends on injuries, which haven't happened yet. For another thing, talent changes over time, as players get older and better or older and worse. And, of course, talent depends on the strategies chosen by the manager, and by the players on the pitch in real time. 

So, we'd expect the bookies' predictions to have a narrower spread than 15 points. They don't:

16.99 -- 2015-16 Pinnacle (opening)
14.83 -- 2015-16 Pinnacle (closing)
16.17 -- 2015-16 Sporting Index (opening)
15.81 -- 2015-16 Spreadexsports (opening)
17.30 -- 2014-15 Pinnacle (opening)
17.37 -- 2014-15 Pinnacle (closing)
16.95 -- 2014-15 Sporting Index (opening)
15.80 -- 2014-15 Sporting Index (closing)
15.91 -- 2013-14 Pinnacle

Only one of the nine sets of predictions is narrower than the expectation of team talent, and even that one, barely. This surprised me. In the baseball case, the sports books projected a talent spread that was significantly more conservative than the actual spread. 

Either the EPL bookmakers are overoptimistic, or the last three Premier League seasons had less luck than the expected 8.0 points. 

5.  Bookmaker accuracy

If the bookmakers were perfectly accurate in their talent estimates, we'd expect their 20 estimates to wind up being off by an SD of around 8 points, because that's the amount of unpredictable performance luck in a team-season.

In 2014-15, that's roughly what happened:

7.85 -- 2014-15 Pinnacle (opening)
6.37 -- 2014-15 Pinnacle (closing)
6.90 -- 2014-15 Sporting Index (opening)
7.75 -- 2014-15 Sporting Index (closing)

Actually, every one of the bookmakers' lines was more accurate than 8 points! In effect, in 2014-15, the bookmakers exceeded the bounds of human possibility -- they predicted better than the speed of light. What must have happened is: in 2014-15, teams just happened to be less lucky or unlucky than usual, playing almost exactly to their talent.

But the predictions for 2015-16 were way off:

15.17 -- 2015-16 Pinnacle (opening)
14.96 -- 2015-16 Pinnacle (closing)
15.13 -- 2015-16 Sporting Index (opening)
14.96 -- 2015-16 Spreadexsports (opening)

And 2013-14 was in between:

9.77 -- 2013-14 Pinnacle

Again, I'll just go with the overall SD of the three seasons, which works out to about 11 points.

15 points -- 2015-16
 7 points -- 2014-15
10 points -- 2013-14
11 points -- average

Actually, 11 points is pretty reasonable, considering 8 is the "speed of light" best possible long-term performance.

6. Bookmaker talent inaccuracy

If 11 points is the typical error in estimating performance, what's the average error in estimating talent? That's an easy calculation, by Pythagoras:

 11 points -- observed error
  8 points -- luck error
  8 points -- talent error

That 8 points for talent should really be 7.5, but I'm arbitrarily rounding it up to create a rule of thumb that "talent error = luck error". 

7. Bookmaker bias

In step 4, it looked like the bookmakers were overconfident, and predicting a wider spread of talent than actually existed. In other words, it looked like they were trying to predict luck.

If they did that, it would have to mean they were overestimating the good teams, and underestimating the bad teams. That's the only way to get a wider spread.
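One way to test for that is to look at the correlation between a forecast and its signed error (forecast minus actual). Here's a toy model of why overconfidence should show up as a positive correlation. Everything in it is an assumption for illustration: the forecast is the league mean plus a "stretch" factor times the team's true distance from the mean, actual points are talent plus independent luck, and the SDs are the 15 and 8 points from above.

```python
import math

def forecast_error_correlation(stretch, sd_talent=15.0, sd_luck=8.0):
    """Correlation between a forecast and its signed error (forecast minus actual).

    Toy model (an assumption): forecast = mean + stretch * (talent - mean),
    actual = talent + luck, with luck independent of talent.
    """
    bias_sd = (stretch - 1.0) * sd_talent          # signed; positive when overconfident
    error_sd = math.sqrt(bias_sd ** 2 + sd_luck ** 2)
    return bias_sd / error_sd

for stretch in (0.8, 1.0, 1.2):
    r = forecast_error_correlation(stretch)
    print(f"stretch {stretch:.1f}: correlation {r:+.2f}")
```

A stretch of exactly 1.0 (perfect calibration) gives zero correlation, a stretch above 1.0 gives a positive one, and a too-conservative forecaster gives a negative one.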

But, in 2013-14, it was the opposite! The correlation between the bookies' prediction and the eventual error was -0.07. (The "error" includes the sign, of course. The argument isn't that the bookies are more wrong for good teams and bad teams, it's that they're more likely to be wrong in a particular direction.)

In other words, even though Pinnacle seemed to be trying to predict team luck, it worked out for them!

Which means one of these things happened:

1. Pinnacle got really lucky, and their guesses for which teams would have good luck actually worked out;

2. We're wrong in thinking Pinnacle was overconfident by that much. In other words, the spread of talent is wider than we thought it was. Remember, 2013-14 was more unbalanced than the other two seasons we looked at.

I think it's some of each. The SD(talent) estimate for 2013-14 came out to 17.5 points. In that light, Pinnacle's 15.9-point SD isn't *that* overconfident.

... In 2014-15, on the other hand, Pinnacle *did* overestimate the spread. The better the closing line on the team, the less extreme it performed, with a correlation of +0.35. Sporting Index, with their more conservative line, correlated only at +0.11.

Part of the reason the correlations are so high is that 2014-15 was the year random luck balanced out so much more than usual. If teams were moving all over the place in the standings for random reasons, that would tend to hide the bookmakers' tendency to rate the teams too extremely.

... Finally, we come to 2015-16. Now, we see what looks like very strong evidence of overconfidence. For the Pinnacle closing line, the correlation between estimate and overestimate is +.46. The other bookmakers are even higher, at +.52 and +.50.

Much of that comes from two teams. First, and most obviously, Leicester City, predicted at 40.5 points but actually winding up at 81. Second, Chelsea, forecasted the best team in the league at 83 points, but finishing with only 50.

These don't really seem to fit the narrative of "the bookies know who the good and bad teams are, but just tend to overestimate their goodness and badness." But, they kind of do fit the narrative. Favorites like Chelsea are occasionally going to have bad years, so you're going to have an occasional high error. But, that error will be even higher if you overestimated them in the first place.


OK, those are the seven sets of numbers we got from James' and Simon's data. What can we conclude?

Well, the question I wanted to answer was: how much are the bookmakers typically off in estimating team talent? Our answer, from #6: about 8 points.

But ... I'm not that confident. These are three weird seasons. Last year, we have Leicester City and their 5000:1 odds. The season before, we have "better than speed of light" predictions, meaning luck cancelled out. And, two years ago, as we saw in #1, we had a lot more great and awful teams than the other two seasons, which suggests that 2013-14 might be an outlier as well.

I'd sure like to have more seasons of data, maybe a decade or so, to get firmer numbers. For now, we'll stick with 8 points as our estimate.

An eight-point standard error means that, typically, one team per season will be mispriced by 16 points or more. That's not necessarily exploitable by bettors. For one thing, bookmaker prices match public perception, so it's hard to be the one genius among millions who sees the exact one team that's mispriced. For another thing, some of what I'm calling "talent" is luck of a different kind, in terms of injuries or players learning or collapsing.

We still have the case that Leicester City was off by around 40 points. That's 5 SDs if you think it was all talent. It's also 5 SDs if you think it was all luck. 

The "maximum likelihood," then, if you don't know anything about the team, would be if it were 2.5 SDs of each. The odds of that happening are about 1 in 13,000 (1 in 26,000 for each direction). 

My best guess, though, is to trust the bookmakers' current odds of about 30:1 as an estimate of what Leicester City should have been. How do we translate 30:1 into expected points? As it turns out, Liverpool was 28:1, with an over/under of 66. So let's use 66 points as our true talent estimate.

Under that assumption, Leicester City beat their talent by 15 points of luck (81 minus 66), or a bit less than 2 SD. And their assumed true talent of 66 points beat the bookmakers' estimate of 40 by 26 points, which is 3.25 SD.

That seems much more plausible to me. 

Because ... I think it's reasonable to think that luck errors are normally distributed. But I don't think we have any reason to believe that human errors, in estimating team talent, also follow a normal distribution. It seems to me that Leicester City could be a black swan, one that just confounded the normal way bettors and fans thought about performance. They may have been a Babe Ruth jumping into the league -- someone who saw you could win games by breaking the assumptions that led to the typical distribution of home runs.

So, when we see that Leicester was 3.25 SD above the public's estimate of their true talent ... I'm not willing to go with the usual probability of a 3.25 SD longshot (around 1 in 1700). I don't know what the true probability is, but given the "Moneyball" narrative and the team's unusual strategy, I'd suspect those kinds of errors are more common than the normal distribution would predict.

Even if you disagree ... well, with 20 teams, a 1 in 1700 shot comes along every 85 years. It doesn't seem too unreasonable to assume we just saw the inevitable "hundred year storm" of miscalculation.
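The tail probabilities in this section can be checked with a few lines of Python. This is a sketch under the same assumptions used above: normally distributed errors, an 8-point SD for both luck and talent error, and independence between the two. The helper name `upper_tail` is mine.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

SD_POINTS = 8.0   # rule-of-thumb SD for both luck and talent error
N_TEAMS = 20

# Scenario 1: the 40-point miss split evenly, 2.5 SDs of luck and 2.5 of talent error.
p_each = upper_tail(2.5)
p_both = p_each ** 2          # both in the favorable direction, assuming independence
print(f"2.5 SD of each, one direction: about 1 in {1 / p_both:,.0f}")
print(f"2.5 SD of each, either direction: about 1 in {1 / (2 * p_both):,.0f}")

# Scenario 2: true talent of 66 points, forecast of 40, finish of 81.
luck_z = (81 - 66) / SD_POINTS       # 1.875 SDs of luck
talent_z = (66 - 40) / SD_POINTS     # 3.25 SDs of talent error
p_talent = upper_tail(talent_z)
seasons_between = 1 / (p_talent * N_TEAMS)
print(f"luck: {luck_z:.3f} SD, talent error: {talent_z:.2f} SD")
print(f"3.25 SD talent miss: about 1 in {1 / p_talent:,.0f}")
print(f"with {N_TEAMS} teams, roughly once every {seasons_between:.0f} seasons")
```

The unrounded results differ slightly from the round numbers quoted in the text, but they lead to the same conclusions.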

And, either way, the on-field luck you have to assume -- 15 points -- is less than two standard deviations, which isn't that unusual at all.

So that's my best guess at how you can reasonably get Leicester City to 81 points. 


Monday, May 23, 2016

How much of Leicester City's championship was luck?

How much of Leicester City's run to the Premier League championship was just luck? I was curious to get a better gut feel for how random it might have been, so I wrote a simulation. 

Specifics of the simulation are in small font below. The most important shortcoming, I think, was that I kept teams symmetrical, instead of creating a few high-spending "superteams" like actually exist in the Premier League (Chelsea, Manchester United, Arsenal, etc.). Maybe I'll revisit that in a future post, but I'll just go with it as is for now.


Details: For each simulated season, I created 20 random teams. I started each of them with a goal-scoring and goal-allowing talent of 1.35 goals per game (about what the observed figure was for 2015-16). Then, I gave each a random offensive and defensive talent adjustment, each with mean zero and SD of about 0.42 goals per game. For each season, I corrected the adjustments to sum to zero overall. I played each game of the season assuming the two teams' adjustments were additive, and used Poisson random variables for goals for and against. I didn't adjust for home field advantage. 
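For readers who want to try something similar, here is a minimal Python sketch of one simulated season along the lines described above. It is not the original code. Several details are my own guesses and are marked as such in the comments: the sign convention for the defensive adjustment, the small floor that keeps scoring rates positive, and breaking ties on points by goal differential.

```python
import math
import random

BASE_GOALS = 1.35   # league-average goals per team per game (from the text)
TALENT_SD = 0.42    # SD of each offensive/defensive adjustment (from the text)
N_TEAMS = 20

def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def centered_adjustments(rng):
    """Random talent adjustments, shifted so they sum to zero across the league."""
    raw = [rng.gauss(0.0, TALENT_SD) for _ in range(N_TEAMS)]
    mean = sum(raw) / N_TEAMS
    return [x - mean for x in raw]

def simulate_season(rng):
    """Play a double round robin; return (champion index, GD talents, points)."""
    offense = centered_adjustments(rng)   # positive = scores more than average
    defense = centered_adjustments(rng)   # assumed convention: positive = allows more
    points = [0] * N_TEAMS
    goal_diff = [0] * N_TEAMS
    for home in range(N_TEAMS):
        for away in range(N_TEAMS):
            if home == away:
                continue
            # Additive adjustments, no home advantage. The 0.05 floor is my own guard.
            mean_home = max(0.05, BASE_GOALS + offense[home] + defense[away])
            mean_away = max(0.05, BASE_GOALS + offense[away] + defense[home])
            gh, ga = poisson(rng, mean_home), poisson(rng, mean_away)
            goal_diff[home] += gh - ga
            goal_diff[away] += ga - gh
            if gh > ga:
                points[home] += 3
            elif ga > gh:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
    # Expected season goal differential for each team, ignoring the floor.
    gd_talent = [
        sum(2 * ((offense[t] + defense[o]) - (offense[o] + defense[t]))
            for o in range(N_TEAMS) if o != t)
        for t in range(N_TEAMS)
    ]
    # Assumed tiebreak: most points, then best goal differential.
    champion = max(range(N_TEAMS), key=lambda t: (points[t], goal_diff[t]))
    return champion, gd_talent, points

if __name__ == "__main__":
    rng = random.Random(2016)
    champion, gd_talent, points = simulate_season(rng)
    print(f"champion: team {champion}, GD talent {gd_talent[champion]:+.1f}, "
          f"{points[champion]} points")
```

To estimate odds by talent level, you would call `simulate_season` many times and tally, for each rounded GD talent, how many teams had it and how many of them won.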


At the beginning of the season, Leicester City was a 5000:1 longshot. What kind of team, in my simulation, actually showed true odds of 5000:1? We can narrow it down to teams with a goal differential (GD) talent of -4 to -9 for the season. In 500,000 random seasons, here's how many times those teams won:

tal   #tms   ch  odds
 -9  166135  20  8307
 -8  168954  25  6758
 -7  171272  26  6587
 -6  173327  22  7879
 -5  175017  53  3302
 -4  177305  61  2907
    1032010 207  4986

In 500,000 seasons of the simulation, 1,032,010 teams had a GD talent between -4 and -9. Only 207 of them won a championship, for odds of 4,985:1 against, which is close to the 5000:1 we're looking for. 

Even half a million simulated seasons isn't enough for randomness to even out, which is why the odds don't decrease smoothly as the teams get better. I'll maybe just go with a point estimate of -8. In other words, for Leicester City to be a 5000:1 shot to win the league, their talent would have to be such that you'd expect them to be outscored by 8 goals over the course of the 38-game season. Maybe it might be 7 goals instead of 8, but probably not 6 and probably not 9.  (I guess I could run the simulation again to be more sure.)
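The odds column, and a rough sense of how noisy it is, can be reproduced from the counts in the table. In this sketch the noise estimate is my own addition: it treats each championship count as roughly Poisson, so the relative noise is about one over the square root of the count.

```python
import math

# (GD talent, number of teams, championships), copied from the table above.
rows = [(-9, 166135, 20), (-8, 168954, 25), (-7, 171272, 26),
        (-6, 173327, 22), (-5, 175017, 53), (-4, 177305, 61)]

for talent, teams, titles in rows:
    one_in = teams / titles
    rel_noise = 1 / math.sqrt(titles)   # rough relative SD of a Poisson-like count
    print(f"{talent:+d}: 1 in {one_in:,.0f}  (count noise about {rel_noise:.0%})")

total_teams = sum(row[1] for row in rows)
total_titles = sum(row[2] for row in rows)
pooled = total_teams / total_titles
print(f"pooled: 1 in {pooled:,.0f}")
```

With only 20 to 60 championships per row, relative noise in the low-to-mid teens or higher is expected, which is consistent with the odds not decreasing smoothly from row to row.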


Leicester City actually wound up outscoring their opponents by 32 goals last year. Could that be luck? What's the chance that a team that should be -8 would actually wind up at +32? That's a 40 goal difference -- Leicester City would have had to be lucky by more than a goal a game.

The SD of goal differential is pretty easy to figure out, if you assume goals are Poisson. Last season, the average game had 1.35 goals for each team. In a Poisson distribution, the variance equals the mean, so, for a single game, the variance of goal differential is 2.70. For the season, multiply that by 38 to get 102.6. For the SD, take the square root of that, which is about 10.1. Let's just call it 10.

So, a 40-goal surprise is about four SDs from zero. Roughly speaking, that's about a 1 in 30,000 shot.
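Here's the same calculation as a short Python sketch, assuming goals for and against are independent Poisson variables and that the season total is close enough to normal for a tail probability to mean something. The helper name is mine.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

GOALS_PER_TEAM_GAME = 1.35
GAMES = 38

var_per_game = 2 * GOALS_PER_TEAM_GAME         # Var(for - against) = 1.35 + 1.35
sd_season = math.sqrt(GAMES * var_per_game)    # about 10.1 goals
surprise = 32 - (-8)                           # observed GD minus assumed talent

print(f"season SD of goal differential: {sd_season:.1f}")
print(f"surprise in SDs (unrounded SD): {surprise / sd_season:.2f}")
print(f"4 SD one-tailed: about 1 in {1 / upper_tail(4.0):,.0f}")
```

Using the unrounded SD makes the surprise slightly less than four SDs, so the odds come out a little shorter than the round-number figure, but still in the tens of thousands to one.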


If we were surprised that Leicester City won the championship, we should be even *more* surprised that they went +32. In fact, we should be around six times more surprised!

Why are the "+32" odds so much worse than the "championship" odds? Because, on those rare occasions when a simulated -8 team wins the championship, it usually does it with much less than a +32 performance. Maybe it goes +20 but gets "Pythagorean luck" and wins lots of close games. Maybe it goes +17 but the other teams have bad luck and it squeaks in.

If you assume that a team that actually scores +32 in a season has, say, a 3-in-10 chance of winning the championship, then the odds of both things happening -- a -8 talent team going +32 observed, and that being enough to win -- are 1 in 100,000. Well, maybe a bit less, because the two events aren't completely independent.


The oddsmakers have priced Leicester City at around 25:1 for next season. That's a decent first guess for what they should have been this year.

Except ... in retrospect, Leicester should probably have been even better than 25:1 this season (you'd expect them to decline in talent next year -- they have an older-than-average team, they may lose players in the off-season, and other teams should catch on to their strategy). On the other hand, MGL says oddsmakers overcompensate for unexpected random events that don't look random. 

Those two things kind of cancel out. But, commenter Eduardo Sauceda points out that bookmakers build a substantial profit margin into a 20-way bet, so let's lower last season's "true" odds to 35:1, as an estimate.

According to the simulation, for a team to legitimately be a 35:1 shot, its expected goal differential for the season would have to be around +16.

Taking all this at face value, we'd have to conclude:

1. The bookies and public thought Leicester City was a -8 talent, when, in reality, it was a +16 talent. So, they underestimated the club by 24 goals.

2. Leicester City outperformed their +16 talent by 16 goals.

3. And, while I'm here ... the simulation says a team with a +32 GD averages 74 points in the standings. Leicester wound up at 81 points. So, maybe they were +7 points in Pythagorean luck.  


One thing you notice, from all this, is how difficult it is to set good odds on longshots, when you can't estimate true talent well enough.

Suppose you analyze a team as best you can, and you conclude that they should be a league average team, based on everything you know about their players and manager. (I'm going to call them a ".500 team," which ignores, for now, the Premier League scoring asymmetry of three points for a win and one point for a draw.)

You run a simulation, and you find that a .500 team wins the championship once every 940 simulated seasons. If the simulation is perfect, can you just go and set odds of 939:1, plus vigorish?

Not really. Because you haven't accounted for the fact that you might be wrong that Everton is a .500 team. Maybe they're a .450 team, or a .600 team, and you just didn't see it. 

But, isn't there a symmetry there? You may be wrong that they're exactly average in talent, but if your analysts' estimate is unbiased, aren't they just as likely to be -8 as they are to be +8? So, doesn't it all even out?

No, it doesn't. Because even if the error in estimating talent is symmetrical, the resulting error in odds is not. 

By the simulation, a team with .500 talent is about 1 in 940 to win the championship. But, what if half the time you incorrectly estimate them at -8, and half the time you incorrectly estimate them at +8?

By my simulation, a team with -8 GD talent is 1 in 6,758 to win. A team with +8 talent is 1 in 157. The average of those two is not 1 in 940, but, rather, 1 in 307. 

If you're that wildly inaccurate in your talent evaluations, you're going to be offering 939:1 odds on what is really only a 307:1 longshot. Even if you're conservative, going, say, 600:1 instead of 939:1, you're still going to get burned.
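The asymmetry is easy to verify: average the probabilities, not the odds. A quick sketch using the two simulation figures quoted above (variable names are mine):

```python
p_minus8 = 1 / 6758    # title chance for a -8 GD talent team (from the simulation)
p_plus8 = 1 / 157      # title chance for a +8 GD talent team (from the simulation)

# Half the time the team is really -8, half the time it is really +8.
p_avg = (p_minus8 + p_plus8) / 2

print(f"blended chance: 1 in {1 / p_avg:.0f}")                 # 1 in 307
print(f"share of the blend from +8: {p_plus8 / (p_minus8 + p_plus8):.1%}")
```

Almost all of the blended probability comes from the +8 case, which is why a symmetric error in talent produces such a lopsided error in odds.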

This doesn't happen as much with favorites. In my simulation, a +30 team was 1 in 5.4. The average of a +22 team and a +38 team is 1 in 4.7. Not as big a difference. Sure, it's probably still enough difference to cost the bookmakers money, but I bet the market in favorites is competitive enough that they've probably figured out other methods to correct for this and get the odds right.


Anyway, the example I used had the bookies being off by exactly 8 goals. Is that reasonable? I have no idea what the SD of "talent error" is for bookmakers (or bettors' consensus). Could it be as high as 8 goals? 

For the record, the calculation of SD(talent) for 2014-15 (the season before Leicester's win), using the "Tango method," goes like this:

SD(observed) = 22.3 goals
SD(luck)     = 10   goals
SD(talent)   = 19.9 goals

For a few other seasons I checked:

2015-16  SD(talent) = 19.9
2013-14  SD(talent) = 27.8
1998-99  SD(talent) = 18.3 

In MLB, the SD of talent is about 9 wins. How well, on average, could you evaluate a baseball team's talent for the coming season? Maybe, within 3 wins, on average? That's a third of an SD.

In the Premier League, a third of an SD is about 6 goals. But evaluation is harder in soccer than in baseball, because there are strategic considerations, and team interactions make individual talent harder to separate out. So, let's up it to 9 goals. Offsetting that, the public consensus for talent -- as judged by market prices of players -- reduces uncertainty a bit. So, let's arbitrarily bring it back down to 8 goals. 

That means ... well, two SDs is 16 goals. That means that in an average year, the public overrates or underrates one team's talent by 16 goals. That seems high -- 16 goals is about 10 points in the standings. But, remember -- that's just talent! If luck (with an SD of 10 goals) goes the opposite direction from the bad talent estimate, you could occasionally see teams vary from their preseason forecast by as many as 36 or 46 goals.

Does that happen? What's the right number? Anyone have an idea? At the very least, we now know it's possible, once in a while, to be off by a lot. In this case, it looked like everyone underestimated the Foxes by (maybe) 24 goals in talent.


In light of all this, bookmaker William Hill announced that, next year, they will not be offering any odds longer than 1,000:1. When I first read that, I thought, what's the point? If they had offered a thousand to one on Leicester City, they still would have lost a lot of money, if the true odds were 35:1.

But ... now I get it. Maybe they're saying something like this: "A Premier League team with middle-of-the-road talent -- one that you'd expect to score about as many goals as it allows -- has about a 1 in 1,000 chance of winning the championship. We're not confident enough that we can say, of any bad team, that they can't change their style of play to become average, or that they haven't improved to average over the off-season, or that they've been a .500 team all along but we've just been fooled by randomness. So, we're never again going to set odds based on an evaluation that a team's talent is significantly worse than average, because the cost of a mistake is just too high."

That makes a certain kind of sense. And the logic makes me wonder: were the odds on extreme longshots always strongly biased in bettors' favor, but nobody realized it until now?

(My previous post on Leicester City is here.)
