Tuesday, September 01, 2015

Consumer Reports on auto insurance, part IV

(Previous posts: part I; part II; part III)


Consumer Reports' biggest complaint is that insurance companies set premiums by including criteria that, according to CR, don't have anything to do with driving. The one that troubles them the most is credit rating:

"We want you to join forces with us to demand that insurers -- and the regulators charged with watching them on ouir behalf -- adopt price-setting practices that are more meaningfully tethered to how you drive, not to who they think you are. ..."

"In the states where insurance companies don't use credit information, the price of car insurance is based mainly on how people actually drive and other factors, not some future risk that a credit score 'predicts'. ..."

"... an unfair side effect of allowing credit scores to be used to set premium prices is that it effectively forces customers to dig deeper into their pockets to pay for accidents that haven't happened and may never happen."

----

Well, you may or may not agree on whether insurers should be allowed to consider credit scores, but, even if CR's conclusion is correct, their argument still doesn't make sense.

First: the whole idea of insurance is EXACTLY what CR complains about:  to "pay for accidents that haven't happened and may never happen." I mean, that's the ENTIRE POINT of how insurance works -- those of us who don't have accidents wind up paying for those of us who do. 

In fact, we all *hope* we're paying for accidents that haven't happened and never will! It's better if we don't suffer injuries, our car stays in good shape, and our premium stays low. 

Well, maybe CR didn't actually mean that literally. What they were *probably* thinking, but were unable or unwilling to articulate explicitly, is that they think credit scores are not actually indicative of car accident risk -- or, at least, not correlated sufficiently to make the pricing differences fair.

But, I'm sure the insurance industry could demonstrate, immediately, that credit history IS a reliable factor in predicting accident risk. If that weren't true, the first insurance company to realize that could steal all the bad-credit customers away by offering them big discounts!

It's possible, I guess, that CR is right and all the insurance companies are wrong. Since it's an empirical question ... well, CR, show us your evidence! Prove to us, using actual data, that bad-credit customers cause no more accidents than their neighbors with excellent credit. If you can't do that, maybe show us that the bad-credit customers aren't as bad as the insurers think they are. Or, at the very, very least, explain how you figured out, from an armchair thought experiment and without any numbers backing you up, that insurance company actuaries are completely wrong, and have been for a long time, despite having the historical records of thousands, or even millions, of their own customers.

------

Just using common sense, and even without data, it IS reasonable that a better credit rating should predict a lower accident rate, holding everything else equal. You get better credit by paying your bills on time, and not overextending your finances -- both habits that demonstrate a certain level of reliability and conscientiousness. And driving safely requires ... conscientiousness. It's no surprise at all, to me, that credit scores are predictive, to some extent, of future accident claims.

And CR's own evidence supports that! As I mentioned, the article lauds USAA as being the cheapest, by far, of the large insurers they surveyed. 

But USAA markets to only a subset of the population. As Brian B. wrote in the comments to a previous post:


"[USAA insurance is available only to] military and families. So their demographics are biased by a subset of hyper responsible and conscientious professionals and their offspring."

Consumer Reports did, in fact, note that USAA restricts who can become a customer. But they didn't bother demanding that USAA raise its rates, or stop unfairly judging military families by "who they think they are" -- more conscientious than average.

-----

Not only does CR not bother mentioning the possibility that drivers with bad credit scores might actually be riskier drivers, they don't even hint that it ever crossed their minds. They seem to stick to the argument that nothing can possibly "predict" future risk except previous driving record. They even put "predict" in scare quotes, as if the idea is so obviously ludicrous that this kind of "prediction" must be a form of quackery.

Except when it's not. In the passage I quoted at the beginning of this post, they squeeze in a nod to "other factors" that might legitimately affect accident risk. What might those factors be? From the article, it seems they have no objection to charging more to teenagers. Or, to men. They never once mention the fact that female drivers pay less than males -- which, you'd think, would be the biggest, most obvious non-driving factor there is.

CR demands that I be judged "not by who the insurance companies think I am!" Unless, of course, I'm young and male, in which case, suddenly it's OK.

Why is it not a scandal that I pay more just for being a man? I may not be the aggressive testosterone-fueled danger CR might "think I am."  If I'm actually as meek as the average female, the insurer is going to "profit from accidents I may never have!"

------

I suspect they're approaching the question from a certain moral standard, rather than empirical considerations of the actual risk. It just bugs them that the big, bad insurance companies make you pay more just for having worse credit. On the other hand, men are aggressive, angry, muscle-car loving speeders, and it's morally OK for them to get punished. As well as young people, who are careless risk-takers who text when they drive.

A less charitable interpretation is that CR is just jumping to the conclusion that higher prices are unjustified, even when based on statistical risk, when they affect "oppressed" groups, like the poor -- but OK when they favor "oppressor" groups, like men. (Recall that CR also complained about "good student" discounts because they believe those discounts benefit wealthier families.)

A more charitable interpretation might see CR's thinking as working something like this:

-- It's OK to charge more to a certain group where it's obvious that they generally have a higher risk. Like teenage drivers, who don't have much experience. Common sense says that, of course, they get into more accidents.

-- Higher rates are like a "punishment," and it's OK, and even good, to punish people who do bad things. People who have at-fault accidents did something bad, so their rates SHOULD go up, to teach them a lesson! As CR says,

"In California, the $1,188 higher average premium our single drivers had to pay because of an accident they caused is a memorable warning to drive more carefully. ... In New York, our singles received less of a slap, only $429, on average."

-- It's OK for men to pay more than women because psychologists have long known that men are more aggressive and prone to take more risks.

-- But it's *not* OK to charge more for someone in a high-risk group when (a) there's no proof that they're actually, individually, a high risk, and (b) the group is a group that CR feels has been unfairly disadvantaged already. Just because someone has bad credit doesn't mean they're a worse driver, even if, statistically, that group has more accidents than others. Because, maybe a certain driver has bad credit because he was conned into buying a house he couldn't afford. First, he was victimized by greedy bankers and unscrupulous developers ... now he's getting conned a second time, by being gouged for his auto policy, even though he's as safe as anyone else!


If CR had actually come out and said this explicitly, and argued for it in a fair and unbiased fashion, maybe I would change my mind and come to see that CR is right. But ... well, that doesn't actually seem to be what CR is arguing. They seem to believe that their plausible examples of bad credit customers with low risk are enough to prove that the overall correlation must be zero!

When a certain model of car requires twice as many repairs as normal, CR recommends not buying it. But when a certain subset of drivers causes twice as many accidents as average, CR not only suggests we ignore the fact -- they even refuse to admit that it's true!

------

Here's a thought experiment to see how serious CR is about considering only driving history.

Suppose an insurer decided to charge more for drivers who don't wear a helmet when riding a bicycle, based on evidence that legitimately shows that people who refuse to wear bicycle helmets are more likely to refuse to wear seatbelts.

But, they note, it's not a perfect correlation. I, for instance, am an exception. I don't wear a bicycle helmet, but I wouldn't dream of driving without a seatbelt (and I might even be scared to drive a car without airbags). 

Would CR come to my defense, demanding that my insurer stop charging me extra?  Would they insist they judge me by how I drive, not by "who they think I am" based on my history of helmetlessness?

I doubt it. I think they'd be happy that I'm being charged more. I think it's about CR judging which groups "deserve" higher or lower premiums, and then rationalizing from there.

(If you want to argue that bicycling is close enough to driving that this analogy doesn't work, just substitute hockey helmets, or life jackets.)

------

I'm not completely unsympathetic to CR's position. They could easily make a decent case.  They could say, "look, we know that drivers with bad credit cause more accidents, as a group, than drivers with good credit. But it seems fundamentally unfair, in too many individual cases, to judge people by the characteristics of their group, and make them pay twice as much without really knowing whether they fit the group average."

I mean, if they said that about race, or religion, we'd all agree, right? We'd say, yeah, it DOES seem unfair that a Jewish driver like Chaim pays less (or more) than a Muslim driver like Mohammed, just because his group is statistically less (or more) risky. 

But, what if it's actually the case that, statistically, one group causes more accidents than the other? We tell the insurance companies, look, it's not actually because of religion that the groups are different. It must be just something that correlates to religion, perhaps by circumstance or even coincidence.  So, stop being so lazy.  Instead of deciding premiums based on religion, get off your butts and figure out what's actually causing the differences! 

Maybe the higher risk is because of what neighborhoods the groups tend to live in, that some neighborhoods have narrower streets and restricted sightlines that lead to more accidents. Shouldn't the insurance company figure that out, so that if they find that Chaim (or Mohammed) actually lives in a safer neighborhood, they can set his premium by his actual circumstances, instead of his group characteristics, which they will now realize don't apply here?  That way, fewer drivers will be stuck paying unfairly high or low premiums because of ignorance of their actual risk factors.

If that works for religion, or race, it should also work for credit score. Can't the insurance companies do a bit more work, and drill down a bit more, to figure out who has bad credit out of irresponsibility, and who has bad credit because of circumstances out of their control?

Yes! And, I'd bet, the insurance companies are already doing that! Their profits depend on getting risk right, and they can't afford to ignore anything that's relevant, lest other insurers figure it out first, and undercut their rates.

And CR actually almost admits that this is happening. Remember, the article tells us that the insurers aren't actually using the customer's standard credit score -- they're tweaking parts of it to create their own internal metric. CR mentions that only to complain about it -- it's arbitrary, and secret! -- but it might actually be the way insurers make premiums more accurate, and therefore fairer. It might be the way they make it less likely that a customer winds up paying higher premiums for "accidents that may never happen."

-----

But I don't think CR really cares that premiums are mathematically fair. Their notion of fairness seems to be tied to their arbitrary, armchair judgments about who should be paying what. 

I suspect that even if the insurance companies proved that their premiums were absolutely, perfectly correlated with individual driving talent, CR would still object. They don't have a good enough understanding of risk -- or a willingness to figure it out.

A driver's expected accident rate isn't something that's visible and obvious. It's hard for anyone but an actuary to really see that Bob is likely to have an accident every 10 years, while Joe is likely to have an accident every 20. To an outsider, it looks arbitrary, like Bob is getting ripped off, having to pay twice as much as Joe for no reason. 

The thing is: some drivers really ARE double the risk. But, because accidents are so rare, their driving histories look identical, and there doesn't seem to be any reason to choose between them. But, often, there is.

If you do the math, you'll see that a pitcher who retires batters at a 71% rate has more than double the "risk" of pitching a perfect game compared to a pitcher with only a 69% rate. But their normal, everyday baseball-card statistics don't look much different at all -- just a two-percentage-point difference in opposition OBP.
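(If you want to check that arithmetic, here's a quick Python sketch. The 71 and 69 percent out rates are just the illustrative numbers from the paragraph above.)

```python
# Chance of a perfect game = chance of retiring all 27 batters in a row
p_good = 0.71 ** 27   # pitcher who retires 71% of batters
p_ok   = 0.69 ** 27   # pitcher who retires 69% of batters

print(p_good)          # roughly 0.0001
print(p_ok)            # roughly 0.00004
print(p_good / p_ok)   # about 2.2 -- more than double the "risk"
```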

I think a big part of the problem is just that luck, risk, and human behavior follow rules that CR isn't willing to accept -- or even try to understand.



(to be continued)



Wednesday, August 26, 2015

Consumer Reports on auto insurance, part III

(This is part III.  Part I is here; part II is here.)

------

As part of its auto insurance research, Consumer Reports says they "analyzed more than 2 billion car insurance price quotes."  

That number seems reasonable. CR looked at all 33,419 general US ZIP codes. Multiply that by the 20 demographic groups they considered, then by up to 19 different insurance companies per state, and you're up to roughly 13 million. That leaves something like 150 to 200 variations of the remaining factors (credit rating, accident history, speeding tickets, and so on) for each.

In practical terms, how do you arrange to get two billion quotes? Even if you can get 20 at a time from a website, typing in all that information would take forever. Even at one quote per second, two billion quotes would take 63 years. Or, just one year, if CR had 63 staffers doing the work.
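(Here's the back-of-the-envelope arithmetic as a Python sketch -- the counts are just the rough figures from above, not CR's exact methodology.)

```python
zip_codes    = 33_419           # general US ZIP codes
demographics = 20               # demographic groups CR considered
insurers     = 19               # up to this many insurers per state

base = zip_codes * demographics * insurers
print(base)                     # 12,699,220 -- roughly 13 million

quotes = 2_000_000_000
print(quotes / base)            # ~157 variations per combination (credit, accidents, tickets...)

seconds_per_year = 60 * 60 * 24 * 365
print(quotes / seconds_per_year)   # ~63 years at one quote per second
```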

Well, it's much easier than that. CR reports that in most states, insurers have to file their mathematical pricing formulas -- their actuarial trade secrets, by the sound of it -- with state regulators. A private company, Quadrant Information Services, has access to all those formulas, somehow, and sells its services to clients like CR. So, two billion quotes was probably just a matter of contracting with Quadrant, who would just query its database and send along the results.

I always wondered how Progressive was able to get competitors' quotes, in real time, to compare alongside their own. Now I know.

----

CR says those quotes are the real deal, the actual prices policyholders pay:


"Under the state laws that regulate automobile insurance, carriers are required to adhere to the prices generated by their public rate filings. So the premiums we obtained from Quadrant are what each company legally obligates itself to charge consumers."

But ... I'm skeptical. If those quotes are really the actual premiums paid, that would have to contradict some of the issues CR raises elsewhere in the article.

For instance, one thing they're upset about is that some companies practice "price optimization." That's a euphemism for jacking up the price for certain customers -- the ones the company thinks won't complain. CR says some insurers might bump your premium if "you're sticking with Verizon FIOS when DirectTV [sic] might be cheaper."

Except ... how can that be possible, if it's all done by formula? When you ask Progressive for quotes, they don't ask you who your TV provider is (or how many iPhones or beers you've purchased, which are other criteria CR mentions). 

Second, CR mentions that each insurer creates their own proprietary credit score, "very different from the FICO score you might be familiar with."  But, again, the formulas can't be taking that into account, can they? CR requested premiums for "poor," "good," and "excellent" credit scores ... but how would they know which was which, without knowing each insurer's proprietary formula?

Third, and also in the context of price optimization, they advise,


"... don't be shy about complaining a little more [to show you're not a pushover for next time]."

But if those formula prices are fixed and non-negotiable, how will complaining help? Unless "number of times having complained" is in the formulas filed with regulators.  

------

So, it doesn't make sense that the entire pricing system is encapsulated in the online (or Quadrant) pricing formulas.

So, what's actually going on? 

Maybe what's happening is that the companies aren't obligated to charge EXACTLY those formula prices -- maybe they're obligated to charge those prices OR LESS. 

Kind of like those prices you see on the back of your hotel room door, the maximum that room would ever go for, like (I imagine) Cooperstown on induction weekend. Or, they're like the MSRP on cars, where you can negotiate down from there. Or, maybe they're like car repair estimates, where, if they don't know for sure how much it will cost, they give you the worst-case scenario, because they can lower their estimate much easier than they can raise it.

If that's what's going on, that would easily and plausibly explain the pricing anomalies that CR found.

Take, for instance, the one that surprised me most -- the finding that some companies discriminate against long-term customers. As CR puts it, "some insurers salute your allegiance with a price hike." 

In the state of Washington, the article says, almost half the insurers surveyed didn't offer any discount at all to customers who had been with them for at least 15 years. That doesn't sound right, but, get this: not only did Geico not offer a discount, they actually charged their loyal customers MORE: almost $700 more, according to the article.

That smells a bit fishy to me.  But here's one that smells ... well, I don't have a good metaphor.  Maybe, like a rotting pile of fish carcasses in your driveway?


"Geico Casualty gave us whiplash with its $3,267 loyalty penalty in New Jersey and its $888 discount just across the state line in New York for longtime customers."

Well, that's just not possible, right? Overcharging loyal New Jersey customers THREE THOUSAND DOLLARS A YEAR? That would triple the typical price, wouldn't it? 

When CR came up with that result, didn't any of their staff think, "WTF, can that really be true?" At Consumer Reports, they must have some weird priors. I know they think insurance companies are out to gouge consumers for everything they can, but ... this is too big to make any sense at all, even under CR's own assumptions. 

------

I'd wager that those particular automated quotes aren't at all representative of what those particular customers actually pay. 

Insurance companies don't ask their long-term policyholders to go online and renew anonymously. They send renewal quotes directly. Which they have to, if premiums are tailored to characteristics that aren't captured in online applications -- details from the customer's credit record, say, or beer purchases.

What CR found could just be a Geico company practice, of not bothering to produce competitive "formula" quotes for established customers who won't be using them anyway. 

I don't know if that's actually the right answer, but, whatever the true explanation is ... well, I'd bet a lot of money that if the magazine surveyed real long-term Geico policyholders in New Jersey, and asked about their premiums, CR would find that "loyalty penalty" doesn't actually exist.  Or at least, not at anything within an order of magnitude of $3,267.

I might be wrong. Feel free to tell me what I'm missing.




(to be continued)






Thursday, August 20, 2015

Consumer Reports on auto insurance, part II

(Part I is here.)

------

Last post, I linked to an article showing auto insurance profit margins were very low, less than 10 percent of premiums. And, I wondered, if that's the case, how is it possible that CR reports such a large difference in pricing between companies?

In its research, Consumer Reports got quotes for thousands of different drivers -- that is, different combinations of age, sex, and ZIP code -- from five different companies. The average premiums worked out to:  

$1,540 Allstate
$1,414 Progressive
$1,177 Geico
$1,147 State Farm
$  817 USAA

How is it possible that Allstate charged roughly a third more than Geico, but still made less profit (and only 5.3% profit, at that)? How does USAA stay in business charging about a third less than the others, on average, when those others are barely in the black as it is?

For anyone to take those numbers seriously, Consumer Reports has to explain this apparent impossibility. Otherwise, the only reasonable conclusion is that something went badly wrong with CR's analysis or methodology.

Which I think is what happened. I'm going to take a guess at what's actually going on. I don't know for sure, but I'd be willing to bet it's pretty close.

------

With margins so low, and competition so tight, companies really, really have to get their risk estimates right. If not, they're in trouble.

Let's make some simple assumptions, to keep the analysis clean. First, suppose all customers shop around and always choose the lowest-priced quote. 

Second, suppose that the average teenage driver carries $3,000 in annual risk -- that is, the average teenager will cause $3,000 worth of claims each year. 

Now, we, the blog readers, know the correct number is $3,000 because we just assumed it -- we gave ourselves a God's-eye view. The insurance companies don't have that luxury. They have to estimate it themselves. That's hard work, and they're not going to be perfect, because there's so much randomness involved. (Also, they're all using different datasets.)

Maybe the actuaries at Progressive come up with an estimate of $3,200, while Geico figures it's $2,700. (I'll ignore profit to keep things simple -- if that bothers you, add $100 to every premium and the argument will still work.)

What happens? Every teenager winds up buying insurance from Geico. And Geico loses huge amounts of money: $300 per customer, as the claims start to roll in. Eventually, Geico figures out they got it wrong, and they raise their premiums to $3,000. They're still the cheapest, but now, at least, they're not bleeding cash.

This goes on for a bit, but, of course, Progressive isn't sitting still. They hire some stats guys, do some "Insurance Moneyball," and eventually they make a discovery: good students are better risks than poor students. They find that good students claim $2,500 a year, while the others claim $3,500.

Progressive changes their quotes to correspond to their new knowledge about the "driving talent" of their customers. Instead of charging $3,200 to everyone, they now quote the good students $2,500, and the rest $3,500, to match their risk profiles. That's not because they like the pricing that way, or because they think good students deserve a reward ... it's just what the data shows, the same way it shows that pitchers who strike out a lot of batters have better futures than pitchers who don't.

Now, when the good students shop around, they get quotes of $2,500 (Progressive) and $3,000 (Geico). The rest get quotes of $3,500 (Progressive) and $3,000 (Geico).

So, what happens? The good students go to Progressive, and the rest go to Geico. Progressive makes money, but Geico starts bleeding again: they're charging $3,000 to drivers who cost them $3,500 per year.

Geico quickly figures out that Progressive knows something they don't -- that, somehow, Progressive figured out which teenage customers are lower risk, and stole them all away by undercutting their price. But they don't know how to tell low risk from high risk. They don't know that it has to do with grades. So, Geico can't just follow suit in their own pricing.

So what do they do? They realize they've been "Billy Beaned," and they give up. They raise their price from $3,000 to $3,500. That's the only way they can keep from going bankrupt.

The final result is that, now, when a good student looks for quotes, he sees

$2,500 Progressive
$3,500 Geico

When a bad student looks for quotes, he sees

$3,500 Progressive
$3,500 Geico

Then Consumer Reports comes along, and gets a quote for both. When they average them for their article, they find

$3,000 Progressive
$3,500 Geico

And they triumphantly say, "Look, Progressive is 14 percent cheaper than Geico!"

But it's not ... not really. Because, no good student actually pays the $3,500 Geico quotes them. Since everyone buys from the cheapest provider, Geico's "good student" quote is completely irrelevant. They could quote the good students a price of $10 million, and it wouldn't make any difference at all to what anyone paid.

That's why averaging all the quotes, equally weighted, is not a valid measure of which insurance company gives the best deal.

------

Want a more obvious example?

Company X:
------------------------------
$1,300      25-year-old male
$1,000      30-year-old female
$1 million  16-year-old male

Company Y:
------------------------------
$1,400      25-year-old male
$1,100      30-year-old female
$4,000      16-year-old male

By CR's measure, which is to take the average, company Y is much, much cheaper than company X: $2,166 to $334,100. But in real life, which company is giving its customers greater value? Company X, obviously. NOBODY is actually accepting the $1,000,000 quote. In calculating your average, you have to give it a weight of zero. 

Once you've discarded the irrelevant outlier, you see that, contrary to what the overall average suggested, company X is cheaper than company Y in every (other) case.
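(Here's the same point as a little Python sketch, using the made-up Company X and Company Y quotes from above, and assuming every driver simply takes the lower quote.)

```python
quotes_x = {"25-year-old male": 1_300, "30-year-old female": 1_000, "16-year-old male": 1_000_000}
quotes_y = {"25-year-old male": 1_400, "30-year-old female": 1_100, "16-year-old male": 4_000}

# CR-style comparison: every quote gets equal weight
avg_x = sum(quotes_x.values()) / len(quotes_x)   # 334,100
avg_y = sum(quotes_y.values()) / len(quotes_y)   # about 2,167

# Comparison-shopper weighting: the losing quote gets a weight of zero
accepted = {driver: ("X" if quotes_x[driver] < quotes_y[driver] else "Y")
            for driver in quotes_x}

print(avg_x, avg_y)   # equal weighting says Y is hugely cheaper
print(accepted)       # but X wins two of the three drivers; nobody ever pays the $1,000,000
```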

------

Want a non-insurance analogy?

"Darget" competes with Target. Their prices are all triple Target's, which is a big ripoff -- except that every product that starts with "D" sells for $1. By a straight average of all items, Darget is almost 300% as expensive as Target. Still, at any given time, Darget has twenty times the number of customers in the store, all crowding the aisles buying diamonds and diapers and DVD players. 

When evaluating Darget, the "300%" is irrelevant. Since everyone buys deodorant at Darget, but nobody buys anti-perspirant at Darget, it makes no sense to average the two equally when calculating a Darget price index.

-----

And I suspect that kind of thing is exactly what's happening in CR's statistics. Allstate *looks* more expensive than USAA because, for some demographics of customer, they haven't studied who's less risky than whom. They just don't know. And so, to avoid getting bled dry, they just quote very high prices, knowing they probably won't get very many customers from those demographics.

I don't know which demographics, but, just to choose a fake example, let's say, I dunno, 75-year-olds. USAA knows how to price seniors, how to figure out the difference between the competent ones and the ones whose hands shake and who forget where they are. Allstate, however, can't tell them apart. 

So, USAA quotes the best ones $1,000, and the worst ones $5,000. Allstate doesn't know how to tell the difference, so they have to quote all seniors $5,000, even the good ones. 

What Allstate is really doing is telling the low-risk seniors, "we are not equipped to recognize that you're a safe driver; you'll have to look elsewhere."  But, I'm guessing, the quote system just returns an uncompetitively high price instead of just saying, "no thank you."

----

Under our assumption that customers always comparison shop, it's actually *impossible* to compare prices in a meaningful way. By analogy, consider -- literally -- apples and oranges, at two different supermarkets.

Store A charges $1 an apple,  and $10 an orange.
Store B charges $2 an orange, and  $5 an apple.

Who's cheaper overall? Neither! Everyone buys their apples at Supermarket A, and their oranges at Supermarket B. There's no basis for an apples-to-apples comparison.

We *can* do a comparison if we relax our assumptions. Instead of assuming that everyone comparison shops, let's assume that 10 percent of customers are so naive that they buy all their fruit at a single supermarket. (We'll also assume those naive shoppers eat equal numbers of apples and oranges, and that they're equally likely to shop at either store.)

Overall, combining both the savvy and naive customers, Store A sells 100 Apples and 10 Oranges for a total of $200. Store B sells 100 Oranges and 10 Apples for a total of $250.

Does that mean Store B is more expensive than Store A? No, you still can't compare, because Store B sells mostly oranges, and Store A sells mostly apples.

To get a meaningful measure, you have to consider only the 10 percent of customers who don't comparison shop. At store A, they spend $11 for one of each fruit. At store B, they spend $7 for one of each fruit.

Now, finally, we see that store B is cheaper than store A!

But:

1. To be able to say that, we had to know that the naive customers are evenly split both on the fruit they buy, and the stores they go to. We (and CR) don't know the equivalent statistics in the auto insurance case.

2. If "Store B is cheaper" it's only for those customers who don't shop around. For the 90 percent who always accept only the lowest price, the question still has no answer. CR wants us to be one of those 90 percent, right? So, their comparison is irrelevant if we follow their advice!

3. All CR's analysis tells us is, if we're completely naive customers, getting one quote at random from one insurance company, then blindly accepting it ... well, in that case, we're best off with USAA.

But, wait, even that's not true! It's only true if we're exactly, equally likely to be any one of CR's thousands of representative customers. Which we're not, since they gave ZIP code 10019 in Manhattan (population 42,870) equal weight with ZIP code 99401 in Alaska (population 273).

------

CR's mistake was to weight the quotes equally, even the absurdly high ones. They should have weighted them by how often they'd actually be accepted. Of course, nobody actually has that information, but you could estimate it, or at least try to. One decent proxy might be: consider only quotes that are within a certain (small) percentage of the cheapest. 

Also, you want to weight by the number of drivers in the particular demographic, not treat each ZIP code equally. You don't want to give a thousand 30-year-old Manhattanites the same total weight as the three 80-year-olds in a rural county of Wyoming.

By adjusting for both those factors, CR would be weighting the products by at least a plausible approximation of how often they're actually bought. 

------

Anyway, because of this problem -- and others that I'll get to in a future post -- most of CR's findings wind up almost meaningless. Which is too bad, because it was a two-year project, and they did generate a couple of billion quotes in the effort. And, they're not done yet -- they promise to continue their analysis in the coming months. Hopefully, that analysis will be more meaningful.



(to be continued)






Tuesday, August 18, 2015

Consumer Reports on auto insurance

The September issue of Consumer Reports (CR) magazine features a "Special Investigation" called "The Truth About Car Insurance." In their investigation, CR discovers a number of "hidden truths" they're concerned about.

CR's biggest beef is that auto insurance companies practice "unfair discrimination" by basing premiums on other factors than just driving record -- credit history being the one they deem most inequitable. They demand that this practice be stopped, by legislative action if necessary. They ask readers to sign a petition to insurance companies and state regulators: "price me by how I drive, not by who you think I am!"

Reasonable people can disagree on what risk factors are "fair" and which are "unfair," so if Consumer Reports laid out a decent argument, you could treat it as a productive part of an ongoing debate. But ... well, not only does CR seem to have no idea how insurance actually works and how prices are actually set, but they wander all over the place, with scattershot attacks on insurance companies for completely unrelated practices. 

The article reads as if, having already decided that insurance companies are the bad guys, CR just decided to blame them for every aspect of auto insurance that makes them feel unhappy. As a result, the article is pretty much a total mess.

Here are some of the more obvious problems; I'll get to the meat of CR's arguments in a future post.

------

At one point in the article, CR asks, "Which Insurers Charge More Or Less?"  It turns out that, by their calculation, Allstate charges $1,570, while State Farm charges $1,147, and USAA charges only $817. 

Well, that's impossible. Auto insurance margins are tiny. According to this article, which corresponds to what I've read elsewhere, less than 10% of premiums are kept as profit -- the remaining amount goes toward paying claims and covering expenses. 

So, with only single digit margins, at best, how can one company possibly sell insurance 40% cheaper than another?

My first reaction was, well, maybe CR didn't adjust for the type of policyholder. It was probably that Allstate had lots of teenagers, while USAA had lots of Grannies who only drove to church on Sundays.

But, no: CR actually did control for state (and, I think, also ZIP code), age, credit score, and driving record. 

So what's going on? I have an idea, which I'll get to in a future post. For now, my point is only that CR *does* believe some companies charge almost twice as much as other companies, on average, for the exact same risk.

I know that because, in light of the wide range of prices, the CR writers very wisely "recommend you check prices from at least a dozen companies in your state, [to] help you assess whether you have a good deal or it might come up with an even better one." And they recommend that when shopping, you start with the lower-priced companies, like USAA.

OK, fair enough. But, then, they turn around and contradict themselves in the very same article, by endorsing the idea that prices actually *don't* vary.


"'The advertising creates the impression of price competition when there actually isn't any,' says Doug Heller, a consumer advocate and an insurance consultant in California."

Do they not see how this is the exact opposite of everything else they say in the article? With a straight face, they tell us that 

(a) some companies are substantially cheaper than others, but
(b) there is no price competition

How can they possibly believe both of these are true? 

------

CR is also critical of the insurance companies' advertised claims. Geico, famously, says that "15 minutes could save you 15 percent or more on car insurance." But CR's "special investigation" reveals:


"But did you know that the word "could" could also mean "could not" just as easily? When we checked, Geico's state-average premiums [were actually higher than two of their four big competitors]."

Well, yes, CR, now that you mention it, I *did* know that the phrase "could save you" is not the same as "is guaranteed to save you" or "is likely to save you."

Similarly, I also know that "can save" is also not a guarantee, even when uttered by Flo, the pitchwoman for Progressive. But CR warns me anyway:


"... Flo likens the number of bouncy balls -- 500 -- to the number of dollars you can save by switching to Progressive. In fact, she says, 'you could save even more.'"

But that's not fair, CR complains, because of selective sampling. Progressive's calculation was based only on those who actually DID save over the other companies. So,


"... whether a typical shopper would save with Progressive is still an open question."

Well, duh. *Some* people save with Allstate, and *some* people save with Progressive -- exactly the same way that, for instance, some items are cheaper at Target, and others are cheaper at K-Mart. What's the problem? 

What's really funny is how Consumer Reports is shocked -- shocked!! -- that the two companies aren't both cheapest across the board. Presumably, before their "special investigation," CR was willing to believe that, on average, Geico was 15% cheaper than Progressive, while, simultaneously, Progressive was $500 cheaper than Geico!

Reading between the outraged lines, Consumer Reports' argument really goes something like this:

-- the insurance companies advertised that you *could* save.
-- a naive shopper, or a careless eight-year-old, might interpret that as meaning "overall, we are cheaper than the competition."
-- if the insurance companies said that, they'd be lying.
-- therefore, the insurance companies are almost liars!

------

Oh, and by the way ... on the inside cover of the very same issue, CR advertises its car buying service by saying that "Buyers have saved an average of $2,990 off MSRP."

And ... yup, in a footnote, they confirm: that figure is based only on those customers who actually bought their vehicle after receiving a quote from CR.

So, it's exactly the same practice for which they rake Progressive over the coals! In fact, it's WORSE. Progressive at least compares the savings to the competition -- to the policyholder's current premium. CR compares the savings to MSRP, which is higher than almost anyone would pay after even the briefest negotiation.

------

One of the "hidden truths" headlines in the article is that "promised discounts might not materialize." For instance,


"Some of the discounts that are advertised the most, such as ... installing anti-theft equipment, save very little: just ... $2 per year ..."

Well, OK, fair enough, $2 is indeed "very little." But how much was CR expecting? In five minutes on Google, I found the following statistics:

-- There are 253 million cars and trucks on the road in the USA.
-- There were 721,000 thefts in 2012.
-- The average loss per theft was $6,019.

721K divided by 253MM multiplied by $6K is ... $17. Take into account that not all the stolen vehicles were cars -- some were buses and snowmobiles -- and maybe you're down to $15.  (Probably less, because losses from buses and transport trucks are probably substantially more than the $6019 average. But, never mind.)

So, you save $2 from $15 by installing anti-theft equipment. That corresponds to the idea that an anti-theft device reduces the risk by about 13 percent. That doesn't sound too unreasonable, does it? (Especially when you consider that most new vehicles are *already* built with anti-theft technology.)
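(Same arithmetic in Python, using the rough figures from the Google search above.)

```python
vehicles = 253_000_000   # cars and trucks on US roads
thefts   = 721_000       # vehicle thefts in 2012
avg_loss = 6_019         # average loss per theft, in dollars

expected_annual_loss = thefts / vehicles * avg_loss
print(expected_annual_loss)   # about $17 per vehicle per year

# If the cars-only figure is closer to $15, a $2 discount implies roughly:
print(2 / 15)                 # ~0.13, i.e. a 13 percent reduction in theft risk
```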

So, if $2 is too low, what percentage does CR think it should be? My guess: they haven't thought about it at all.

-----

CR also objects that insurers give discounts to students who can show proof of good academic performance:


"It's nice that Johnny does his homework, but ... the good-student discount doesn't emphasize factors relating to actual driving behavior. ... 'According to our research, young drivers are inexperienced no matter how good a student they are, and that is their primary risk,' says Ruth Shults, senior epidemiologist at the National Center for Injury Prevention and Control at the Centers for Disease Control and Prevention."

Well, that sounds reasonable, that inexperience is the primary source of teenage driving risk. But ... well, it's also the primary source of teenage insurance costs! Because, the discount for academic performance is relatively small -- only "up to 14 percent," as CR told us just one paragraph earlier.

The article already told us how adding a teenage driver bumps premiums by an average 90 percent. So, even with a good student, premiums still rise by a minimum 63 percent (86% of 1.9, assuming the discount applies to the entire family, not just the student's portion).
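(Spelling out that arithmetic in a quick sketch -- it assumes, as noted above, that the discount applies to the whole family premium.)

```python
family_premium = 1.00                     # family premium before adding the teen
with_teen      = family_premium * 1.90    # adding a teenage driver: +90 percent on average
with_discount  = with_teen * (1 - 0.14)   # good-student discount of "up to 14 percent"

print(with_teen - family_premium)       # 0.90  -> a 90 percent increase
print(with_discount - family_premium)   # 0.634 -> still a 63 percent increase
```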

That certainly sounds like the premium DOES still emphasize "factors relating to actual driving behavior," exactly as CR demands.

------

Oh, and another problem with the good student discount:


" ... it might reward families with high incomes at the expense of lower-income ones."

Notice that CR is assuming, without argument or acknowledgement, that students in high-income families do better in academics. Why would CR assume that? Are poor people dumb? Maybe we need a petition to force magazines to judge students by their actual academic record, not by "who CR thinks they are."

OK, that was just easy sarcasm. Taking CR's argument more seriously: 

First, a lower price is not a "reward," if the insurance companies are accurately evaluating the risk, and better students actually have fewer accidents than worse students. It's like, pork is cheaper than beef, but that doesn't mean supermarkets are "rewarding" Christians and atheists with lower prices for their meat "at the expense" of Muslims and Jews.

Second: if certain students learn math better, for whatever reason, is it not reasonable to assume they also learn *driving* better, for those same reasons? Shouldn't this have at least occurred to CR?

-----

That one example, I think, encapsulates the problems in CR's thinking. They simply choose not to confront the reality of how risk works. They choose to ignore the idea that you can tell, statistically, who's high risk and who's low risk by factors other than actual accident history. 

I don't know whether they don't understand how insurance works, or whether they just choose to ignore it. For what it's worth, the article says nothing about matching premiums to risk, and does not once quote an expert or actuary on the subject. 

By CR's logic, pitchers should be paid only on their W-L record, even if they had ERAs above 5.00. "Judge them by how they win games, not by who you think they are!"




(to be continued.)


Thursday, August 06, 2015

Can a "hot hand" turn average players into superstars?

Last post, I reviewed a study by Joshua Miller and Adam Sanjurjo that found a "hot hand" effect in the NBA's Three-Point Contest. In addition to finding a hot hand, the authors also showed how some influential previous studies improperly underestimated observed streakiness because of the incorrect way they calculated expectations. 

I agreed that the previous studies were biased, and accepted that the authors found evidence of a hot hand in the three-point contest. But I was dubious that you can use that evidence to assume a hot hand in anything other than a "muscle memory" situation.

Dr. Miller, in comments on my post and follow-up e-mails, disagreed. In the comments, he wrote,


"The available evidence shows big effect sizes. Should we infer the same effect in games, given we have no known way to measure them? It is certainly a justifiable inference."

Paraphrasing Dr. Miller's argument: Since (a) the original "no hot hand" studies were based on incorrect calculations, and (b) we now have evidence of an actual hot hand in real life ... then, (c) we should shift our prior for real NBA games from "probably no hot hand" to "probably a significant hot hand."

That's a reasonable argument, but I still disagree.

------

There are two ways you can define a "hot hand":

1. Sometimes, players have higher talent ("talent" means expected future performance) than other times. In other words, some days they're destined to be "hot," better than their normal selves.

2. When players have just completed a streak of good performance, they are more likely to follow it with continued good performance than you'd otherwise expect.

Call (1) the "hot hand talent" hypothesis, and (2) the "streakiness" hypothesis. Each implies the other -- if you have "good days," your successes will be concentrated among those good days, so you'll look streaky. Conversely, if your expectation is to exhibit streakiness, you must be "better in talent" after a streak than after a non-streak.

I think the two definitions are the same thing, under certain other reasonable assumptions. At worst, they're *almost* the same thing.

However, we can observe (2), but not (1). That's why "hot hand" studies, like Miller/Sanjurjo, have to concentrate on streaks.

----

The problem is: it takes a *lot* of variation in talent (1) to produce just a *tiny bit* of observed streakiness (2). 

Observed streakiness is a very, very weak indicator of a variation in talent. That's because players also go on streaks for a lot of other reasons than that they're actually "hot" -- most importantly, luck.

In the three-point contest study, the authors found an average six percentage point increase in hit rate after a sequence of three consecutive hits, from about 53 percent to 59 percent. As Dr. Miller points out, the actual increase in talent when "hot" must be significantly higher -- because not all players who go HHH are necessarily having a hot hand. Some are average, or even "cold," and wind up on a streak out of random luck. 

If only half of "HHH" streaks are from players truly hot at the time, the true "hot hand" effect would have to be double what's observed, or 12 percentage points.

Well, 12 points is huge, by normal NBA standards. I can see it, maybe, in the context of muscle memory, like the uncontested, repeated shots in the Miller/Sanjurjo study -- but not in real life NBA action.

What if there were a 12-point "hot hand" effect in, say, field goal percentage in regular NBA games? Well, for all NBA positions, as far as I can tell, the difference between average and best is much less than 12 points. That would mean that when an average player is +12 points "hot," he'd be better than the best player in the league. 

Hence my skepticism. I'm willing to believe that a hot hand exists, but NOT that it's big enough to turn an average player into a superstar. That's just not plausible.

------

Suppose you discover that a certain player shoots 60% when he's on a three-hit streak, and 50% other times. How good is he when he's actually hot? Again, he's not "hot" every time he's on a streak, because streaks happen often just by random chance. So, the answer depends on *how often* he's hot. You need to estimate that before you can answer the question. 

Let's suppose we think he's hot, say, 10 percent of the time.

So, to restate the question as a math problem:


"Joe Average is normally a 50 percent shooter, but, one time in ten, he is "hot", with a talent of P percent. You observe that he hits 60% after three consecutive successes. What's your best estimate of P?"

The answer: about 81 percent.

An 81 percent shooter will make HHH about 4.25 times as often as a 50 percent shooter (that's 81/50 cubed). That means that Joe will hit 4.25 streaks per "hot" game for every one streak per "normal" game.

However: Joe is hot only 1/9 as often as he is normal (10% vs. 90%). Therefore, instead of 425 "hot" HHH for every 100 "regular" HHH, he'll have 425 "hot" HHH for every *900* "regular" HHH.

Over 1325 shots, he'll be taking 425 shots with an expectation of 81 percent, and 900 shots with an expectation of 50 percent. 

Combined, that works out to 794-for-1325, which is the observed 60%.
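(If you want to check that, or try other assumptions, here's a small Python sketch that solves for the implied "hot" talent. The 50 percent base rate, the 60 percent observed rate after HHH, and the fraction of time spent hot are all just the assumptions from the setup above; it also prints the variants I discuss a few paragraphs down.)

```python
def implied_hot_talent(base=0.50, after_hhh=0.60, frac_hot=0.10):
    """Find the 'hot' talent P that makes the mixture of hot and normal
    shooting produce the observed hit rate after three straight hits."""
    lo, hi = base, 1.0
    for _ in range(60):                         # bisection; the observed rate rises with P
        p = (lo + hi) / 2
        w_hot  = frac_hot * p ** 3              # share of HHH streaks shot while "hot"
        w_norm = (1 - frac_hot) * base ** 3     # share that are just ordinary luck
        observed = (w_hot * p + w_norm * base) / (w_hot + w_norm)
        if observed < after_hhh:
            lo = p
        else:
            hi = p
    return p

for f in (0.10, 0.25, 0.50):
    print(f, round(implied_hot_talent(frac_hot=f), 3))
# 0.10 -> about 0.81;  0.25 -> about 0.71;  0.50 -> about 0.646
```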

Do you really want to accept that the "hot hand" effect turns an ordinary player into an 81-percent shooter? EIGHTY-ONE PERCENT? 

But that's what the assumptions imply. If you argue that:

-- player X is 50% normally;
-- player X is "hot" 10 percent of the time;
-- player X is expected to hit 60% after HHH

Then, it MUST FOLLOW that

-- player X is 81% when "hot".

To which I say: no way. I say, nobody is an 81% shooter, ever -- not Michael Jordan, not LeBron James, nobody. 

To posit that the increase from 50% to 60% is reasonable, you have to assume that an average player turns into an otherworldly Superman one day in ten, due to some ineffable psychological state called "hotness."  

-----

You can try tweaking the numbers a bit, if you like. What if a player is "hot" 25 percent of the time, instead of 10 percent? In that case,

-- player X is 71% when "hot".

That's not as absurd as 80%, but still not very plausible. What if a player is "hot" fully half the time? Now,

-- player X is 64.6% when "hot". 

That's *still* not plausible. Fifteen points is still superstar territory. Do you really want to argue that half the time a player is ordinary, but the other half he's Michael Jordan? And that nobody would notice without analyzing streaks?

Do you really want to assume that the variation in talent within a single player is wider than the variation of talent among all players?

-----

Let's go the other way, and start with an intuitive prior for what it might mean to be "hot." My gut says, at most, maybe half an SD of league talent. You can go from 50th to 70th percentile when everything is lined up for you -- say, from the 15th best power forward in the league,  to the 9th best. Does that sound reasonable?

In the NBA context, let's call that ... I have no idea, but let's guess five percentage points.* And let's say a player is "hot" one time in five. 

(* A reader wrote me that five percentage points is a LOT more than half an SD of talent. He's right; my bad. Still, that just makes this part of the argument even stronger.)

So: if one game in five, you were a 55% shooter instead of 50%, what would you hit after streaks?

-- For 1000 "hot" shots, you'd achieve HHH 166 times, and hit 91.3 of the subsequent shots.

-- For 4000 "regular" shots, you'd achieve HHH 500 times, and hit 250 of the subsequent shots.

Overall, you'd be 341.3 out of 666, or 51.25%.

In other words: a hot hand hypothesis that posits a reasonable (but still significant) five-point talent differential expects you're only 1.25 percentage points better after a streak. 
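(The same mixture arithmetic run forward, as a Python sketch -- the five-point differential and the one-game-in-five assumption are just the guesses from above.)

```python
base, hot, frac_hot = 0.50, 0.55, 0.20

w_hot  = frac_hot * hot ** 3          # relative weight of post-HHH shots taken while "hot"
w_norm = (1 - frac_hot) * base ** 3   # relative weight of post-HHH shots taken while normal

after_streak = (w_hot * hot + w_norm * base) / (w_hot + w_norm)
print(after_streak)   # about 0.5125 -- only 1.25 points above the normal 50 percent
```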

Well, you need a pretty big dataset to make 1.25 points statistically significant. 30,000 attempts would do it: 6000 when "hot" and 24,000 when not hot.*

(* That's using binomial approximation, which underestimates the randomness, because the number of attempts isn't fixed or independent of success rate. But never mind for now.)

And even if you had a sample size that big, and you found significance ... well, how can you prove it's a "hot hand"? It's only 1.25 points, which could be an artifact of ... well, a lot of things other than streakiness.

Maybe you didn't properly control for home/road, or you used a linear adjustment for opponent quality instead of quadratic. Maybe the 1.25 doesn't come from a player being hot one game in five, but, rather, the coach using him in different situations one game in five. Or, maybe, those 20 percent of games, the opposition chose to defend him in a way that gave him better shooting opportunities. 

So, it's going to be really, really hard to prove a "hot hand" effect by studying performance after streaks.

------

But maybe there are other ways to analyze the data.

1. Perhaps you could look at player streaks in general, instead of just what happens in the one particular shot after a streak. That would measure roughly the same thing, but might provide more statistical power, since you'd be looking at what happens during a streak instead of just the events at the end. 

Would that work? I think it would at least give you a little more power. Dr. Miller actually does something similar in his three-point paper, with a "composite statistic" that measures other aspects of a player's sequences.

2. Instead of just a "yes/no" for whether to count a certain shot, you could weight it by the recent success rate, or the length of the streak, or something. Because, intuitively, wouldn't you expect a player to be "hotter" after HHHHHH than HHH? Or, even, wouldn't you expect him to be hotter after HHMHMHHMHHHMMHH than HHH? 

I'm pretty sure that kind of thing has been done before, that there are studies that try to estimate the next shot from the success rate in the X previous shots, or some such.

------

But, you can't fight the math: no matter what, it still takes ten haystacks of "hot hand talent" variation to produce a single needle of "streakiness." There just isn't enough data available to make the approach work. 

Having said that ... there's a non-statistical approach that theoretically could work to prove the existence of a real-life hot hand. 

In his e-mails to me, Dr. Miller said that basketball players believe that some of them are intrinsically streakier than others -- and that they even "know" which players those were. In an experiment in one of his papers, he found that the players named as "streaky" did indeed wind up showing a larger "hot hand" effect in a subsequent controlled shooting test.

If that's the case (I haven't read that paper yet), that would certainly be evidence that something real, and observable, is happening.

And, actually, you don't need a laboratory experiment for this. Dr. Miller believes that coaches and teammates can sense variations in talent from body language and experience. If that's the case, there must be sportswriters, analysts, and fans who can do this too.

So, here's what you do: get some funding, set up a website, and let people log on while watching live games to predict, in real time, which players are currently exhibiting a hot hand. If even one single forecaster proves to be able to consistently choose players who outperform their averages, you have your evidence.  

-----

I'd be surprised, frankly, if anyone was able to predict significant overachievement in the long run. And, I'd be shocked -- like, heart attack shocked -- if the identified "hot" players actually did perform with "Superman" increases in accuracy. 

As always, I could be wrong. If you think I *am* wrong, that the "hot hand" is even half as significant a factor in real life as it is in the three-point contest, I think this would easily be your best route to proving it.




Tuesday, July 21, 2015

A "hot hand" is found in the NBA three-point contest

A recent paper provides what I think is rare, persuasive evidence of a "hot hand" in a sporting event.

The NBA Three-Point Contest has been held annually since 1986 (with the exception of 1999), as part of the NBA All-Star Game event. A pair of academic economists, Joshua Miller and Adam Sanjurjo, found video recordings of those contests, and analyzed the results. (.pdf)

They found that players were significantly more likely to make a shot after a series of three hits than otherwise. Among the 33 shooters who had at least 100 shots in their careers, the average player hit 54 percent overall, but 58 percent after three consecutive hits ("HHH").  

(UPDATE: the 58 percent figure is approximate: the study reports an increase of four percentage points after HHH than after other sequences. Because the authors left out some of the shots in some of their calculations (as discussed later in this post), it might be more like 59% vs. 55%, or some such. None of the discussion to follow depends on the exact number.)

The authors corrected for two biases. I'll get to those in detail in a future post, but I'll quickly describe the most obvious one. And that is: after HHH, you'd expect a *lower than normal* hit rate -- that is, an apparent "mean-reverting hand" -- even if results were completely random. 

Why? Because, if a player hit exactly 54 of 100 shots, then, after HHH, the next shot must come out of what remains -- which is 51 remaining hits out of 97 remaining shots. That's only 52.6 percent. In other words, the hit rate not including the "HHH" must obviously be lower than the hit rate including "HHH". 

That might be easier to see if you imagine that the player hit only 3 out of 100 shots overall. In that case, the expectation following HHH must be 0 percent, not 3 percent, since there aren't enough hits to form HHHH!
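(A quick simulation sketch of the 54-out-of-100 case: shuffle that fixed shooting record many times and pool every shot that immediately follows three straight hits. Even with no hot hand at all, the pooled rate comes out around 51/97, not 54 percent.)

```python
import random

def rate_after_hhh(n_shuffles=100_000, hits=54, total=100):
    record = [True] * hits + [False] * (total - hits)
    made, attempts = 0, 0
    for _ in range(n_shuffles):
        random.shuffle(record)
        for i in range(3, total):
            if record[i-3] and record[i-2] and record[i-1]:   # previous three were hits
                attempts += 1
                made += record[i]
    return made / attempts

print(rate_after_hhh())   # about 0.526 -- roughly 51/97, not 0.54
```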

After the authors corrected for this, and for the other bias they noted, the "hot hand" effect jumped from 4 percentage points to 6. 

------

UPDATE: Joshua Miller has replied to some of what follows, in the comments.  I have updated the post in a couple of places to reflect some of his responses.

------

That's a big effect, a difference of 6 percentage points. Maybe it's easier to picture this way:

Of the 33 players, 25 of them shot better after HHH than their overall rate. 

In other words, the "hot hand" beat the "mean-reverting hand" with a W-L record of 25-8. With the adjustments included, the hot hand jumps to 28-5.

------

Could the result be due to something other than a hot hand? Well, to some extent, it could be selective sampling of players.

In the contest, players shoot 25 attempts per round. To get to 100 attempts, and be included in the study, a shooter has to play at least four rounds in his career.  (By the way, here's a YouTube video of the 2013 competition.)

In any given contest, to survive to the next round, a player needs to do well in the current round. That means that players who got enough attempts were probably lucky early. That might select players who concentrated their hits in early rounds, compared to the late rounds, and create a bit of a "hot hand" effect just from that.

And I bet that's part of it ... but a very small part. Even if a player shot 60/60/50 in successive rounds, just by luck, that alone wouldn't be nearly enough to show an overall effect of 6 percentage points, or even 4, or (I think) even 1.

UPDATE: The authors control for this by stratifying by rounds, Dr. Miller replies.

------

One reason I believe the effect is real is that it makes much more intuitive sense to expect a hot hand in this kind of competition than in normal NBA play.

In each round of the contest, players shoot five balls in immediate succession from the same spot on the court. That seems like the kind of task that would easily show an effect. It seems to me that a large part of this would be muscle memory -- once you figure out the shot, you just want to do exactly the same thing four more times (or however many balls you have left once you figure it out).

After those five balls, you move to another spot on the arc for another five balls, and so on, and the round ends after you've thrown five balls from each of five locations. However, even though the locations move, the distances are not that much different, so some of the experience gained earlier might extend to the next set of five, making the hot hand even more pronounced.

There's one piece of evidence that offers support for the "muscle memory" hypothesis. It turns out that the first two shots in each round were awful. The authors report that the first shot was made only 26 percent of the time, and the second shot only 39 percent. For the remaining twenty-three shots, the average success rate was 56 percent.

That "warm up" time is very consistent with a "muscle memory" hot hand.

-----

In fact, those first two shots were so miserable that the authors actually removed them from the dataset! If I understand the authors correctly, a player listed with 100 shots was analyzed for only 92 of those shots.

UPDATE: originally, I thought that rounds were stitched together, so removing those shots would increase observed streakiness from one round to the next. But Dr. Miller notes, in the comments, that they considered streaks within a single round only. In that case, as he notes, removing the first two shots has the effect of reducing "cold hand" streakiness, making the results more conservative.  

The removal of those shots, it seemed to me, was likely to overstate the findings a bit. My thinking was that the authors strung rounds together as if they were just one long series of attempts (even when they spanned different years; it seems a bit weird to say a player had a "hot hand" because he continued a 2004 streak in 2005, but never mind).

That means that when they string the last five shots of one round together with the first five shots of the next, instead of something like

MHHHH MMHMH

they get

MHHHH   HMH

which tends to create more streaks, since you're taking out shots that tend to be mostly misses, in the midst of a series of shots that tend to be mostly hits. ("M" represents a miss, as you probably gathered.)

I wonder if the significant effect the authors found would still have shown up had those omitted shots been kept in. I suspect it would have been, at least, significantly weaker. I may be wrong -- the authors found streakiness both for hits and for misses, so maybe the extra "MM" shots would just have shown up in their "cold hand" numbers.


------

I bet you'd find a hot hand if you tried the equivalent contest yourself. Position a wastebasket somewhere in the room, a few feet away. Then, stay in one spot, and try to throw wads of paper into the basket. I'm guessing your first one will miss, and you'll adjust your shot, and then you'll get a bit better, and, eventually, you'll be sinking 80 to 90 percent of them. Which means, you have a "hot hand" -- once you get the hang of it, you'll be able to just repeat what you learned, which means hits will tend to follow hits.

Here's a more extreme analogy. Instead of throwing paper into a basket, you're shown a picture of a random member of the Kansas City Royals, and asked to guess his age exactly. After your guess, you're told how far you were off. And then you get another random player (who might be a repeat).

Your first time through the roster, you might get, say, 1/3 of them right. The second time through, you'll get at least 2/3 of them right -- the 1/3 from last time, and at least half the rest (now that you know how much you were off by, you only have to guess which direction). The third time through, you'll get 100%.

So, your list of attempts will look something like this (H for hit, M for miss):

MMMHMHMHMMHHHMMHHMHMMMHMHHHHMHHHMMHHHHHHHHMHHHHHHHH...

Which clearly demonstrates a hot hand.

And that's similar to what I think is happening here. 
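
To put a number on that intuition, here's a minimal sketch of a toy model -- my own assumption, not anything from the paper: a shooter whose hit probability climbs from 30 percent to 90 percent as he gets the hang of it, with no "hot hand" beyond the learning itself.

import random

# Toy learning-curve shooter (my own assumption): the hit probability starts
# at 30% and rises two points per attempt, topping out at 90%.
def learning_shooter(n_shots=50):
    return [1 if random.random() < min(0.9, 0.3 + 0.02 * i) else 0
            for i in range(n_shots)]

def overall_vs_post_hhh(n_sims=20000):
    total_hits = total_shots = 0
    hhh_follows = hhh_hits = 0
    for _ in range(n_sims):
        seq = learning_shooter()
        total_hits += sum(seq)
        total_shots += len(seq)
        for i in range(3, len(seq)):
            if seq[i-3] == seq[i-2] == seq[i-1] == 1:
                hhh_follows += 1
                hhh_hits += seq[i]
    return total_hits / total_shots, hhh_hits / hhh_follows

print(overall_vs_post_hhh())   # post-HHH rate comes out well above the overall rate

Hits follow hits, and a "hot hand" shows up, even though nothing is going on except the shooter learning the shot.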

------

The popular belief, among sportswriters and broadcasters, is that the hot hand -- aka "momentum" or "streakiness" -- is real: that a team that has been successful should be expected to continue that way. But almost every study that has looked for such an effect has failed to find one.

That led to the coining of the term "hot hand fallacy" -- the belief that a momentum effect exists, when it does not. Hence the title of this paper: "Is it a Fallacy to Believe in the Hot Hand in the NBA Three Point Contest?"

So, does this study actually refute the hot hand fallacy? 

Well, it refutes it in its strongest form, which is the position that there NEVER exists a hot hand of ANY magnitude, in ANY situation. That's obviously wrong. You can prove it with the Kansas City Royals example, or ... well, you can prove it in your own life. If you score every word you misspelled as a miss, and the rest as a hit ... most of your misses are clustered early in life, when you were learning to read and write, so there's your hot hand right there.

The real "fallacy," as I see it, is not the idea that a hot hand exists at all, but the idea that it is a significant factor in predicting what's going to happen next. In most aspects of sports, the hot hand, when it does exist, is so small as to have almost no predictive value. 

Suppose a player has two kinds of days, equally likely and occurring at random -- "on" days, where he hits 60%, and "off" days, where he hits only 50%. That would give rise to a hot hand, obviously. But how big a hot hand? What should you predict as the chance of the player making his next shot?

Before the game, you'd guess 55% -- maybe he's on, or maybe he's off. But, now, he hits three straight shots. He has a hot hand! What do you expect now?

If my math is right, you should now expect him to shoot ... 56.3%. Not much different!
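
Here's the arithmetic, as a minimal sketch -- assuming, as above, that the two kinds of days are equally likely and that shots are independent given the kind of day:

# Bayes' rule: how likely is the player to be "on," given three straight hits?
p_on, p_off = 0.5, 0.5          # the two kinds of days are equally likely
hit_on, hit_off = 0.60, 0.50    # hit rates on "on" and "off" days

like_on  = p_on  * hit_on  ** 3   # chance of being "on" AND hitting HHH
like_off = p_off * hit_off ** 3   # chance of being "off" AND hitting HHH
post_on  = like_on / (like_on + like_off)

next_shot = post_on * hit_on + (1 - post_on) * hit_off
print(round(post_on, 3), round(next_shot, 3))   # about 0.633 and 0.563

Even after HHH, there's only about a 63 percent chance the player is actually "on," which is why the estimate barely moves.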

The "50/60 on/off" actually represents a huge variation in talent. The problem is that streaks are a weak indicator of whether the player is actually "on," versus whether he just had a lucky three shots. In real life, it's even weaker than a 1.3 percent indicator, because, for one thing, how do you know how long a player is "on" and how long he's "off"? I assumed a full game, but that's wildly unrealistic.

You can probably think of many reasons streakiness is a weak indicator. Here's just one more. 

The "56.3%" illustration was assuming that all shots were identical. In real life, if it's not a special case of a three-point contest ... well, when a player hits HHH, it might be evidence of a hot hand, but it also just could be that those shots were taken in easier conditions, that they were 60% shots instead of 50% shots because the defense didn't cover the shooter very well.

Real games are much more complicated and random than a three-point shooting contest. That's why I don't like the phrasing that the authors of this NBA study found evidence of "THE hot hand effect." They found evidence of "A hot hand effect" -- one particular effect that's large enough to show up in the contrived environment of a muscle-memory-based All-Star novelty event. It doesn't necessarily translate to a regular NBA game, at least not in any form strong enough to matter.

------

The "hot hand" issue reminds me of the "clutch hitting" issue. Both effects probably exist, but are so tiny that they're pretty much useless for any practical purposes. Academic studies fail to find statistically significant evidence, and imply that "absence of evidence" implies that no effect exists. We sabermetricians cheat a little bit, saving effort by saying there's "no effect" instead of "no effect big enough to measure."

So "no effect" becomes the consensus. Then, someone comes up with a finding that actually measures an effect -- this study for the hot hand, and "The Book" for clutch hitting. And those who never disbelieved in it jump on the news, and say, "Aha! See, I told you it exists!"  

But they still ignore effect size. 

People will still declare that their favorite hitter is certainly creating at least a win or two by driving in runs when it really counts. But now, they can add, "Because clutch hitting exists -- it's been proven!" In reality, there's still no way of knowing who the best clutch hitters are, and even if you could tell, you'd find their clutch contribution to be marginal.

And, now, I suspect, when the Yankees win five games in a row, the sportscasters will still say, "They have momentum! They're probably going to win tonight!" But now, they can add, "Because the hot hand exists -- it's been proven!" In reality, the effect is so attenuated that their "hotness" probably makes them a .501 expectation instead of .500 -- and, probably, even that one point is an exaggeration.

My bet is: the "hot hand" narrative won't change, but now it will claim to have science on its side.



