Thursday, January 02, 2014

Probabilities, genetic testing, and doctors, part II

(Part I is here)


Kira Peikoff ordered "direct-to-consumer" genetic tests from three competing companies.  In some cases, they gave her results that were very different from each other.  This led Peikoff to think that maybe she got ripped off, or that the firms aren't able to deliver what they promise.  In a New York Times article, she writes,

"At a time when the future of such companies hangs in the balance, their ability to deliver standardized results remains dubious, with far-reaching implications for consumers."

But: I think her concern stems from a misunderstanding of how the probabilities work.

The provider "23andMe" -- the one recently shut down by the FDA -- reported to Peikoff that she had a higher-than-normal risk of contracting psoriasis, twice the normal chance.  But a rival company, Genetic Testing Laboratories (GTL), told her she had a much *lower* risk -- 80% less than average.  

The two companies differed by a factor of ten, a proverbial "order of magnitude".  Clearly, those results can't both be right, can they?

Well, actually, they can, because GTL tested more genes than 23andMe.

In the illustration that accompanies the article, we can see that GTL tested eight sets of genes: HLA, IL12B, IL23R, Intergenic_1q21, SPATA2, STAT2, TNFAIP3, and TNIP1.

The article doesn't say what genes 23andMe tested, but, in my own report, my result is based on only three genes: HLA-C, IL12B, and IL23R.

So, it's quite reasonable that the two analyses would give different results, since they're based on different information. And, they're both correct, as far as they go.  If all you have is the three genes that 23andMe looked at, it's reasonable to say that your risk is twice normal.  The extra genes that GTL tested provided more information, and more information always changes an estimate.  

This is the essence of Bayesian reasoning: start with your prior, and update your beliefs based on new information.
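Here's a sketch of how that updating can work. Assume (as a simplification) that each tested gene contributes an independent relative risk, so the estimates multiply. The numbers below are hypothetical, chosen only to show how a three-gene panel can honestly say "2x" while a larger panel, seeing the same three genes plus five protective ones, honestly says "0.2x":

```python
from math import prod

# Hypothetical per-gene relative risks (not real study values).
# 1.0 means "no effect"; below 1.0 is protective.
three_genes = [1.6, 1.25, 1.0]                         # what a small panel sees
eight_genes = three_genes + [0.5, 0.5, 0.8, 1.0, 0.5]  # a larger panel

print(prod(three_genes))  # 2.0 -- "twice the normal risk"
print(prod(eight_genes))  # 0.2 -- "80% less than average"
```

Both numbers are correct answers to different questions: "what's your risk, given these three genes?" versus "given these eight?"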


You flip two coins, and leave them covered.  You ask a statistician what the chance is that you got two heads.  He says, "one in four."  That is the correct answer.

Then, you call in a second statistician, and you uncover the first coin, which turns out to be a head.  You ask the same question.  The second statistician says, "one in two".  That is again the correct answer.

But the first statistician was not wrong.  He was absolutely correct.  It's perhaps counterintuitive.  I mean, he said "one in four," and now we know the answer is "one in two".  How could he have been right?  You can argue that he did the best he could with the information available, and it's not his fault that he was wrong, but ... his answer wasn't right.

But his answer WAS right.  That's because the two statisticians were asked two different questions.  The first one was asked, "what's the chance that both coins landed heads?"  The second one was asked, "what's the chance that both coins landed heads given that we know the first one did?"
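You can check both questions with a quick simulation (a sketch; the exact answers are 1/4 and 1/2):

```python
import random

random.seed(1)
trials = 100_000

both_heads = 0        # count of trials where both coins land heads
first_heads = 0       # count of trials where the first coin lands heads
both_given_first = 0  # both heads, among trials where the first is heads

for _ in range(trials):
    c1 = random.random() < 0.5
    c2 = random.random() < 0.5
    if c1 and c2:
        both_heads += 1
    if c1:
        first_heads += 1
        if c2:
            both_given_first += 1

print(both_heads / trials)             # ~0.25: question asked of statistician 1
print(both_given_first / first_heads)  # ~0.50: question asked of statistician 2
```

Same coins, same flips -- but conditioning on the revealed coin changes the question, and so changes the (correct) answer.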


Doesn't this at least demonstrate that 23andMe is being lax in its testing, not using enough information?  No, it doesn't. Any information is better than none.  23andMe costs $99 and uses saliva.  The GTL test costs $259 and uses blood.  I'm sure if you wanted to spend $1000, you could find even more genes to test.

Say you're buying car insurance.  Company A asks if you use a seat belt.  You say no, and they quote you a high rate. You go to company B.  They secretly shadow you around for a week, and discover that you're actually such a safe and cautious driver that it completely cancels out your non-seatbelt-risk, and they quote you a lower rate.  

Was Company A wrong in quoting you a high rate?  No, they weren't.  It was the right answer for the information they had -- unless you fault them for not following you around to gather the information for a better estimate.  If you do choose to fault them for that, then you have to fault every real-life risk estimate ever made, because there's always more information you could get if you took the time to uncover it.  Risk estimates *always* change with additional relevant information, which is what Bayes' Theorem is all about.


This is a variation of the thinking I argued against last post: the idea that since there's always more information -- including information that hasn't even been discovered yet -- the information we do have is incomplete, and therefore not relevant.  From Peikoff's article:

"Imagine if you took a book and you only looked at the first letter of every other page," said Dr. Robert Klitzman, a bioethicist and professor of clinical psychiatry at Columbia. (I [reporter Peikoff] am a graduate student there in his Master of Bioethics program.) "You're missing 99.9 percent of the letters that make the genome. The information is going to be limited."

Again: the information is limited, but still useful -- like the fact that you don't wear a seatbelt.  If gene X is linked to double the risk, it's not reasonable to say, "well, we might later find that gene Y turns off gene X, so don't worry about it."

Interestingly, in the same article, another bioethicist implicitly contradicts Klitzman!  Arthur L. Caplan, director of medical ethics at New York University, writes,

"If you want to spend money wisely to protect your health and you have a few hundred dollars, buy a scale, stand on it, and act accordingly."

That completely contradicts Dr. Klitzman, doesn't it?  Klitzman is saying, "if you don't have all the information on risk factors, the genetic information you do have isn't useful."  Caplan is saying, "if you don't have all the information on risk factors, the obesity information you do have is still very important."

What's going on, where the same story can quote two opposite arguments without noticing the contradiction?  I think, maybe, it's the fallacy of mental accounting.  There's the obesity mental account, and the DNA mental account. We have full knowledge of how fat you are, so we should consider what we know.  But we have only partial knowledge of your DNA, so we have to ignore what we know.

Except, probabilities don't work that way.  They don't keep separate mental buckets.  If there are 100 independent DNA datapoints, and 1 obesity datapoint, the laws of probability treat them the same, as 101 datapoints.  

It's like, if you roll 101 dice, but 100 are blue and only one is red ... the first blue die is just as useful in predicting the overall total as the first (and only) red one.
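That dice analogy can be checked with a simulation (a sketch): correlate one of the 100 "blue" dice, and the lone "red" die, against the total of all 101. Each single die should predict the total equally well, with a correlation of about 1/&#8730;101 &#8776; 0.10.

```python
import random

random.seed(0)

def corr(xs, ys):
    """Pearson correlation, computed from scratch to stay self-contained."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

trials = 20_000
blue_first, red, totals = [], [], []
for _ in range(trials):
    dice = [random.randint(1, 6) for _ in range(101)]
    blue_first.append(dice[0])   # one of the 100 "blue" dice
    red.append(dice[100])        # the lone "red" die
    totals.append(sum(dice))

print(corr(blue_first, totals))  # ~0.10
print(corr(red, totals))         # ~0.10, same predictive value as the blue die
```

The color of the die -- which mental bucket it sits in -- makes no difference to the math.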

Sure, my obesity might give me twice the risk of disease X. But if a gene you looked at gives you three times the risk ... you should be more worried than me, even if you only looked at one gene, and even if your other 999 genes might cancel it out.  

That's just how probabilities work.  



At Thursday, January 02, 2014 3:32:00 PM, Blogger Zach said...

This isn't really right. Increased chance of obesity might be one of the causes of your increased genetic chance of getting diabetes.

Genes influence how you got to where you are and aren't completely external factors.

Life is one long series of coin flips, the genes tell you the odds you started with and you then start your flips to determine the outcomes.

At Thursday, January 02, 2014 3:37:00 PM, Blogger Phil Birnbaum said...

True, if obesity isn't independent of genes, then there could be some confounding going on.

I've assumed that the testing companies looked at the genetic studies in the journals, and took it into account if a paper said that gene X may lead to disease Y through obesity.

It occurs to me ... if that were the case, that obesity is the "delivery system" for the gene, wouldn't 23andMe have told me my risk of obesity? Hmmm, I wonder.

At Friday, January 03, 2014 5:30:00 PM, Anonymous Anonymous said...

Phil, you are assuming that the reason for the vastly different results is a disparity of information. That may or may not be correct.

That is also a Bayesian problem! What are the chances that the different probabilities are due to looking at different amounts of information (or just different information, not necessarily different amounts), versus what are the chances that one or all of the tests are not very reliable (and then, how unreliable)?


At Friday, January 03, 2014 9:03:00 PM, Blogger Don Coffin said...

"You flip two coins, and leave them covered. You ask a statistician what the chance is that you got two heads. He says, "one in four." That is the correct answer."

I would argue that it is not the correct answer. The correct answer is that there is either a 100% chance of two heads or a 0% chance--because the coins have already been flipped. That is, the *outcome* is already determined, and is no longer probabilistic. What we *can* say is that the a priori--in this case, before flipping--probability was 25% that both coins show heads, 50% that one is heads and the other tails, and 25% that both show tails. One of those outcomes *has already occurred,* we just don't know which. You get a 25% probability by saying "I'm *about to flip* these two coins; what's the probability that both *will come up* heads?"

And, yes, I'm being picky.

At Saturday, January 04, 2014 11:36:00 AM, Blogger Phil Birnbaum said...


I've assumed that the tests are accurate in what actual genes they find. That's mostly because I've never heard anything about problems actually figuring out what genes are where, and the NYT article, I'm sure, would have pointed out discrepancies. (The test I took tells me the exact SNPs they found, like "CC", so if two different tests had two different sets of letters, that would be a problem at the lab.)

So, yes, I assume the tests are 100% reliable, or close.

I also assume that the results in the studies are robust -- that if a study concludes that this gene combination is 2x the risk, it actually IS 2x the risk. That may not be right -- file drawer problem, regression to the mean, and all that -- but I'm not going to review every study to see if it's mistaken.

If you want to take the tack that even peer-reviewed genetic studies might be mistaken -- and I'm sure many are -- then you have to have that level of skepticism for EVERY result, not just genetic probabilities. So, if you say, "That 2x genetic study might be wrong," I'll say, "that study showing obesity is risky might also be wrong."

At Saturday, January 04, 2014 11:36:00 AM, Blogger Phil Birnbaum said...


Point taken. I'll think about updating the post, just for clarity.

At Saturday, January 04, 2014 4:47:00 PM, Anonymous Anonymous said...

Phil, fair enough about the accuracy of the tests. I assumed, perhaps incorrectly, that these tests may be somewhat of a hokey commercial gimmicky thing.

For the record, I don't think doc is correct, other than that there is a "Schroedinger's cat" kind of conundrum.


At Monday, January 06, 2014 3:04:00 PM, Blogger Don Coffin said...

MGL--the Schroedinger's Cat issue is that observing the outcome may change the outcome--the cat is neither dead nor alive until you look. In this case, the coins either both show heads or they don't, already, and nothing will change that. What you can't know, until they are revealed, is which. How does that make the probability of H-H 1-in-4, *after* the coins have been tossed? (I'm willing to be convinced I'm wrong.)

At Tuesday, January 07, 2014 12:58:00 AM, Blogger Phil Birnbaum said...


I don't think you're wrong. I should have phrased it your way.

At Thursday, January 09, 2014 8:37:00 AM, Blogger Scott Segrin said...

I’m not sure I’m agreeing with Doc’s reasoning. Don’t we use probabilities to make decisions about the unKNOWN and not just the unHAPPENED? If someone walked up to you on the street and said, “I’ll bet you $100 that you can’t guess whether there was measurable rain on the day you were born, in the city you were born.” If I were born in Minneapolis on January 10th, I’d take the bet in a heartbeat and guess ‘no’. If I were born in Miami on May 10th, I probably wouldn't take the bet – unless of course I had looked it up once and knew whether it had rained. You are taking all of the information you know and determining whether the odds are in your favor. Based on that calculation, you take the bet or not. Just because the day you were born has already happened doesn't mean that at that moment in time, in your mind, it is not a probabilistic event.

At Thursday, January 09, 2014 10:42:00 AM, Blogger Phil Birnbaum said...

I think this is a "known" philosophical issue. I don't mind talking about probabilities of events that already happened, but some people do. I should google it and see if there's an "official" position on it.

At Thursday, January 09, 2014 6:49:00 PM, Blogger Don Coffin said...

Scott--I think my position would be that it's meaningless to talk about the "probability" of an event that has occurred. We can talk about whether we have *knowledge* of that event, but our knowledge, or lack thereof, does not change the event. If you asked me whether I could guess whether it rained on the day I was born, in the place where I was born (Feb. 4, 1948; Frankfort, IN), my response would be, "No, I can't guess...but I could find out." (Without looking, I'd say, "Probably not," but I wouldn't take that as a statement about the "probability" of rain on Feb 4, 1948. It either rained, or it didn't.)

That said, there is a range of philosophical positions on the correct meaning of probability theory, ranging from the a priorists to the frequentists to the Bayesians (and others too numerous to mention). But I suspect that the notion that an event that has occurred can be discussed in probabilistic terms is not one that would receive much support.

At Thursday, January 09, 2014 8:38:00 PM, Blogger Don Coffin said...

And, by the way, the answer ("Did it rain on your birth date?") is--no precipitation, with a low temperature of 3 and a high of 28.

At Tuesday, January 28, 2014 4:10:00 PM, Blogger Zach said...

It's not meaningless if you are trying to make a decision but can't check the actual result before making it, as in the bet hypothetical. And as in real life when you're born with a mix of genes and trying to make decisions based on what those genes tell you. Maybe some gene combinations are actually determinative of cancer or death, but the current knowledge can only assign a percentage to a subset. The event has happened, but you can't get access to the result.

I don't know what 23andMe would tell you, and that is a big problem I have with them. I don't think many researchers, let alone gene testing companies, really understand what their confounding variables are. Your understanding of stats is great, but 23andMe's understanding is completely unknown to me. I wouldn't even know how to check it. Do they list all the findings they base their numbers on? Not just the findings relevant to the ones they highlight to you, but all of them?

