Does a Trillion-To-One Make Sense?
Do statements about DNA tests make sense?
Nancy Grace is not a theoretician, of course, but was one of the original anchors on Court TV. This channel used to broadcast criminal trials live, and Grace was one of their best commentators—in my opinion.
Today I want to talk about the use of probabilities in trials and in other non-technical areas. I think that there is an interesting theorem that can be proved, but I am unable to exactly formulate it. Perhaps you will be able to help.
I want to add that I no longer watch Court TV, and I do not even watch Nancy Grace’s new show. She is now on another channel and anchors a gossip-oriented show, which does not appeal to me. Oh well. She is a great commentator.
On Court TV, I would sometimes hear statements like this one:
the probability that D is not the murderer is at most one in a trillion.
Okay, they never said “D,” but they would say a real person’s name in place of “D”. I use “D” since this is a theory discussion and we like variables.
Back to the statement: I have always doubted that such statements made any sense. The part that I think is wrong is the estimation of the probability being so low: a trillion to one is extremely small.
Let me state a point: I am not arguing that anyone who was convicted was innocent; I am arguing that claims of such extremely small probabilities need to be supported by reasoning. They cannot be made based on beliefs alone.
The people who made the above statements were typically DNA experts. Their reasoning was simple. Suppose a piece of human DNA can be tested for the presence or absence of DNA markers. For ease of calculation I will assume that each marker is either present or not; in general DNA markers are not binary, but that will not change my point. Then a lab tests the blood of the defendant D, and also the blood found at the scene of the crime from an unknown person X. Assuming that each marker is present or not with 50% probability, they would say:
1. If there is even one marker that is different, then it is not D’s blood. Thus, the defendant D is different from X, which is good news for the defendant.
2. If all the markers are the same, then the probability the blood is not D’s is at most $2^{-k}$, where $k$ is the number of markers tested. Thus, it is extremely likely that the defendant D is the same person as X, which is bad news for the defendant: go to jail and stay there.
Of course statement (1) is true; statement (2) follows only if the tests are independent.
My problem is how can they claim the tests are independent? There may be biological reasons to assume this is the case, but that does not prove that they are independent.
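To see how much the "trillion" depends on independence, here is a minimal sketch in Python. It uses the simplified binary-marker model from above, with each marker matching a random person with probability 1/2. The function names and the "groups of markers that always agree" dependence model are my own illustrative assumptions, not anything a DNA lab actually uses.

```python
import math


def match_probability(num_markers: int, p_match: float = 0.5) -> float:
    """Chance that a random person matches on every marker, ASSUMING the
    markers are independent."""
    return p_match ** num_markers


def markers_needed(target: float, p_match: float = 0.5) -> int:
    """Smallest number of independent markers whose joint match
    probability is at most `target`."""
    return math.ceil(math.log(target) / math.log(p_match))


def effective_match_probability(
    num_markers: int, group_size: int, p_match: float = 0.5
) -> float:
    """Toy dependence model (an assumption for illustration): markers come
    in groups of `group_size` that always agree with each other, so only
    one marker per group carries information."""
    independent_groups = math.ceil(num_markers / group_size)
    return p_match ** independent_groups


if __name__ == "__main__":
    k = markers_needed(1e-12)
    print("markers needed for one in a trillion:", k)
    print("independent markers:", match_probability(k))
    print("markers tied in pairs:", effective_match_probability(k, 2))
```

Under independence, 40 binary markers are enough to get below one in a trillion. If the markers are merely tied together in pairs, the same 40 tests only support a bound of about one in a million.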
Just as I was writing this I saw in the Sunday New York Times, in an article on Chess, a claim that a certain event was a trillion to one. There is that “trillion to one” again.
a Boston University professor who oversees the federation’s ratings, wrote in an e-mail message that the odds of a player with a 700 rating beating six players in a row in the 1,500 range, as Nguyen did, is about one in a trillion.
With all due respect to the professor, I agree that what David Nguyen did was unlikely, that it has a very low probability, but not a trillion to one. I do not believe this claim is reasonable.
I have a question that I have not been able to formalize, but I would like to state a rough version of it. I hope that you might be able to make it into a precise statement—hopefully a precise true statement.
Imagine a world where you can perform experiments. The experiments are random in nature and each time you perform them you learn something about the “world.” Suppose that you have some statement about this world. Then, you may try to estimate the probability that the statement is true. My assertion is that we can correctly claim that a non-trivial event has very low probability only after we do many experiments. Quantitatively, to claim that the event has probability less than $\epsilon$ we must perform at least order $1/\epsilon$ experiments. Does this seem right to you?
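One very special case can be checked directly, and it at least agrees with the rough statement. Assume the experiments are independent trials with a fixed but unknown probability p of the event, and that the event was never observed in n trials. Then the largest p still consistent with the data at a given confidence solves (1 - p)^n = 1 - confidence, which is roughly 3/n at 95% (the so-called rule of three). The sketch below is in Python; the function names are mine, and the i.i.d. model is an assumption that is much narrower than the "world" described above.

```python
import math


def upper_bound_after_no_occurrences(n: int, confidence: float = 0.95) -> float:
    """Largest event probability p still consistent, at the given
    confidence, with seeing the event zero times in n independent trials.
    Solves (1 - p)^n = 1 - confidence for p."""
    alpha = 1.0 - confidence
    # -expm1(x) computes 1 - exp(x) without cancellation for tiny x.
    return -math.expm1(math.log(alpha) / n)


def trials_needed(eps: float, confidence: float = 0.95) -> int:
    """Smallest n such that n event-free trials rule out p >= eps at the
    given confidence, i.e. (1 - eps)^n <= 1 - confidence."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log1p(-eps))


if __name__ == "__main__":
    print("bound after 1000 clean trials:", upper_bound_after_no_occurrences(1000))
    print("trials needed for one in a trillion:", trials_needed(1e-12))
```

In this model, supporting a bound of one in a trillion takes about three trillion event-free trials, which is order $1/\epsilon$ as conjectured. It says nothing about worlds that are not a sequence of identical independent trials.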
One might argue that the probability of getting all heads when tossing 100 fair coins is $2^{-100}$, and that we can be sure of this probability. We haven’t done $2^{100}$ experiments, but still can be sure. That does not contradict the above thought. This is because in the former case, we were trying to model a “world”, whereas in the latter case we are analyzing events in a known probability space. In the example with DNA markers, to say “one in a trillion”, the experts are assuming independence. To justify the independence assumption on a space that is as large as a trillion, one ought to have conducted on the order of a trillion experiments.
I think in order to make it formal we need to be careful what we mean by a “world.” If the world is random, but changes over time, then clearly it is very hard to state anything about probabilities. So allowing a world that is time dependent seems too strong. In the case of DNA and Chess examples, it is reasonable to assume that the world is approximately time independent.
If the world is simply an unknown distribution, then I believe that we should be able to prove something. One problem—I am not sure if modeling the world as a distribution captures my intuition.
Is this a standard fact that I am unaware of, or is this a new result? I am unsure. I do know that something like this should be true. It is the reason, I believe, that extremely low probability claims about DNA and Chess are wrong. Clearly, if the above is true, then it is absurd for anyone to make claims of a trillion to one for a probability bound, since there is no way they could have done enough experiments.
I can say a bit more about the DNA question. Suppose that two unrelated people in the whole world have the same DNA markers. How hard is it to detect this? Let’s assume that DNA experts have found the markers for $m$ people. Then, the probability that they would have found the pair that are the same is

$$\binom{m}{2} \Big/ N,$$

where $N$ is the number of pairs of people in the whole world. Clearly, unless $m$ is huge, there is a very good chance that they would not have found the pair.
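To get a feel for the numbers, here is a small Python sketch. It assumes exactly one pair of people share markers and that the tested people are a uniformly random sample of the population; the chance that both members of the pair were tested is then the number of pairs among the tested people divided by the number of pairs in the world. The population of 7 billion and the database sizes are illustrative assumptions, not real figures.

```python
import math


def chance_pair_in_sample(sample_size: int, world_population: int) -> float:
    """Probability that both members of one specific pair of people fall
    inside a uniformly random sample of `sample_size` people:
    C(sample_size, 2) / C(world_population, 2)."""
    return math.comb(sample_size, 2) / math.comb(world_population, 2)


if __name__ == "__main__":
    world = 7_000_000_000  # assumed round figure for the world population
    for tested in (10**5, 10**7, 10**9):  # hypothetical database sizes
        p = chance_pair_in_sample(tested, world)
        print(f"{tested:>13,} people tested: chance of finding the pair = {p:.3g}")
```

Even with ten million people tested, the chance of having seen the matching pair is only a few in a million under these assumptions.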
The point is: how can they say things like “no two people in the world have the same markers”? I cannot follow that at all.
There has been some work on testing whether or not a distribution is independent. For example, Tuǧkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White consider the question of independence in their paper.
I note that their bound on independence testing requires at least order $1/\epsilon$ samples, and actually more. Here $\epsilon$ measures how close a distribution is to being independent: thus, their result seems to show that getting trillion to one odds is impossible for DNA. I will talk about this paper in the future in more detail.
Can we make the quasi-theorem into a theorem and prove it? Are there related theorems?