Probabilities, genetic testing, and doctors, part II
(Part I is here)
Kira Peikoff ordered "direct-to-consumer" genetic tests from three competing companies. In some cases, they gave her results that were very different from each other. This led Peikoff to think that maybe she got ripped off, or that the firms aren't able to deliver what they promise. In a New York Times article, she writes,
"At a time when the future of such companies hangs in the balance, their ability to deliver standardized results remains dubious, with far-reaching implications for consumers."
But: I think her concern stems from a misunderstanding of how the probabilities work.
The provider "23andMe" -- the one recently shut down by the FDA -- reported to Peikoff that she had a higher-than-normal risk of contracting psoriasis, twice the normal chance. But a rival company, Genetic Testing Laboratories (GTL), told her she had a much *lower* risk -- 80% less than average.
The two companies differed by a factor of ten, a proverbial "order of magnitude". Clearly, those results can't both be right, can they?
Well, actually, they can, because GTL tested more genes than 23andMe.
In the illustration that accompanies the article, we can see that GTL tested eight sets of genes: HLA, IL12B, IL23R, Intergenic_1q21, SPATA2, STAT2, TNFAIP3, and TNIP1.
The article doesn't say what genes 23andMe tested, but, in my own report, my result is based on only three genes: HLA-C, IL12B, and IL23R.
So, it's quite reasonable that the two analyses would give different results, since they're based on different information. And, they're both correct, as far as they go. If all you have is the three genes that 23andMe looked at, it's reasonable to say that your risk is twice normal. The extra genes that GTL tested provided more information, and more information always changes an estimate.
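Here's a toy sketch of how that can happen. The multipliers below are made up for illustration -- they are NOT either company's real figures -- and they assume relative risks from independent variants simply multiply together, which is a common simplification, not necessarily the labs' actual models:

```python
from math import prod

# Made-up relative-risk multipliers for the gene variants named in the
# article. The numbers are invented for illustration; only the gene names
# come from the article's graphic.
three_genes = {"HLA-C": 1.6, "IL12B": 1.25, "IL23R": 1.0}
five_more = {"Intergenic_1q21": 0.5, "SPATA2": 0.4, "STAT2": 0.5,
             "TNFAIP3": 1.0, "TNIP1": 1.0}

partial = prod(three_genes.values())          # estimate from 3 genes only
full = partial * prod(five_more.values())     # estimate from all 8 genes

print(round(partial, 2))  # 2.0 -> "twice the normal risk"
print(round(full, 2))     # 0.2 -> "80 percent below normal risk"
```

With numbers like these, a three-gene test honestly reports "double the risk," while an eight-gene test honestly reports "one-fifth the risk" -- and neither is wrong.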
This is the essence of Bayesian reasoning: start with your prior, and update your beliefs based on new information.
You flip two coins, and leave them covered. You ask a statistician what the chance is that you got two heads. He says, "one in four." That is the correct answer.
Then, you call in a second statistician, and you uncover the first coin, which turns out to be a head. You ask the same question. The second statistician says, "one in two". That is again the correct answer.
It's tempting to say the first statistician was wrong. I mean, he said "one in four," and now we know the answer is "one in two". You can argue that he did the best he could with the information available, and it's not his fault, but ... his answer wasn't right.
But his answer WAS right. That's because the two statisticians were asked two different questions. The first one was asked, "what's the chance that both coins landed heads?" The second one was asked, "what's the chance that both coins landed heads given that we know the first one did?"
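You can verify both answers by brute force -- just enumerate the four equally likely outcomes (a sketch of the coin example above, nothing more):

```python
from itertools import product

# All equally likely outcomes of two fair coin flips.
outcomes = list(product(["H", "T"], repeat=2))

# Question 1: what's the chance both coins landed heads?
p_both = sum(1 for o in outcomes if o == ("H", "H")) / len(outcomes)

# Question 2: what's the chance both landed heads,
# given that we know the first one did?
given = [o for o in outcomes if o[0] == "H"]
p_both_given = sum(1 for o in given if o == ("H", "H")) / len(given)

print(p_both)        # 0.25 -- one in four
print(p_both_given)  # 0.5  -- one in two
```

Same coins, different questions, two different correct answers.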
Doesn't this at least demonstrate that 23andMe is being lax in its testing, not using enough information? No, it doesn't. Any information is better than none. 23andMe costs $99 and uses saliva. The GTL test costs $259 and uses blood. I'm sure if you wanted to spend $1000, you could find even more genes to test.
Say you're buying car insurance. Company A asks if you use a seat belt. You say no, and they quote you a high rate. You go to company B. They secretly shadow you around for a week, and discover that you're actually such a safe and cautious driver that it completely cancels out your non-seatbelt-risk, and they quote you a lower rate.
Was Company A wrong in quoting you a high rate? No, they weren't. It was the right answer for the information they had. Unless you fault them for not following you around to get the information for a better estimate. If you do choose to fault them for that, then you have to fault every real-life risk estimate ever made, because there's always more information you can get if you take the time to uncover it. Risk estimates *always* change with additional relevant information, which is what Bayes' Theorem is all about.
This is a variation of the thinking I argued against last post: the idea that since there's always more information -- including information that hasn't even yet been discovered -- the information we do have is incomplete, and therefore not relevant. From Peikoff's article:
"Imagine if you took a book and you only looked at the first letter of every other page," said Dr. Robert Klitzman, a bioethicist and professor of clinical psychiatry at Columbia. (I [reporter Peikoff] am a graduate student there in his Master of Bioethics program.) "You're missing 99.9 percent of the letters that make the genome. The information is going to be limited."
Again: the information is limited, but still useful -- like the fact that you don't wear a seatbelt. If gene X is linked to double the risk, it's not reasonable to say, "well, we might later find that gene Y turns off gene X, so don't worry about it."
Interestingly, in the same article, another bioethicist implicitly contradicts Klitzman! Arthur L. Caplan, director of medical ethics at New York University, writes,
"If you want to spend money wisely to protect your health and you have a few hundred dollars, buy a scale, stand on it, and act accordingly."
That completely contradicts Dr. Klitzman, doesn't it? Klitzman is saying, "if you don't have all the information on risk factors, the genetic information you do have isn't useful." Caplan is saying, "if you don't have all the information on risk factors, the obesity information you do have is still very important."
What's going on, when the same story can quote two opposite arguments without noticing the contradiction? I think, maybe, it's the fallacy of mental accounting. There's the obesity mental account, and the DNA mental account. We have full knowledge of how fat you are, so we should consider what we know. But we have only partial knowledge of your DNA, so we have to ignore what we know.
Except, probabilities don't work that way. They don't keep separate mental buckets. If there are 100 independent DNA datapoints, and 1 obesity datapoint, the laws of probability treat them the same, as 101 datapoints.
It's like, if you roll 101 dice, but 100 are blue and only one is red ... the first blue die is just as useful in predicting the overall total as the first (and only) red one.
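A quick simulation makes the point (this is just a sketch of the dice analogy, not anything from the article): the correlation between any single die and the grand total doesn't care what color the die is.

```python
import random

random.seed(0)
TRIALS = 20_000

def corr(xs, ys):
    """Pearson correlation, computed by hand to keep this self-contained."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

red, first_blue, totals = [], [], []
for _ in range(TRIALS):
    dice = [random.randint(1, 6) for _ in range(101)]
    red.append(dice[0])         # call die 0 the lone red die
    first_blue.append(dice[1])  # and die 1 the first of 100 blue dice
    totals.append(sum(dice))

# Both correlations come out near 1/sqrt(101), about 0.10.
print(round(corr(red, totals), 3), round(corr(first_blue, totals), 3))
```

The red die and any one blue die each explain the same small slice of the total -- about 1/101 of the variance apiece.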
Sure, my obesity might give me twice the risk of disease X. But if a gene you looked at gives you three times the risk ... you should be more worried than me, even if you only looked at one gene, and even if your other 999 genes might cancel it out.
That's just how probabilities work.