# Does a Trillion-To-One Make Sense?

* Do statements about DNA tests make sense? *

Nancy Grace is not a theoretician, of course, but was one of the original anchors on Court TV. This channel used to broadcast criminal trials live, and Grace was one of their best commentators—in my opinion.

Today I want to talk about the use of probabilities in trials and in other non-technical areas. I think that there is an interesting theorem that can be proved, but I am unable to exactly formulate it. Perhaps you will be able to help.

I want to add that I no longer watch Court TV, and do not even watch Nancy Grace’s new show. She is now on another channel and anchors a gossip-oriented show, which does not appeal to me. Oh well. She is a great commentator.

** The Problem **

On Court TV, I would sometimes hear statements like this one:

the probability that D is not the murderer is at most one in a trillion.

Okay, they never said “D,” but they would say a real person’s name in place of “D”. I use “D” since this is a theory discussion and we like variables.

Back to the statement: I have always doubted that such statements made any sense. The part that I think is wrong is the estimation of the probability being so low: a trillion to one is extremely small.

Let me state a point: I am not arguing that anyone who was convicted was innocent; I am arguing that probability statements this extreme need to be supported by reasoning. They cannot be made on belief alone.

The people who made the above statements were typically DNA experts. Their reasoning was simple. Suppose a piece of human DNA can be tested for the presence or absence of $n$ DNA markers. For ease of calculation I will assume that each marker is either present or not (in general DNA markers are not binary, but that will not change my point). Then, a lab tests the defendant D’s blood and also the blood found at the scene of the crime from an unknown person X. Assuming that each marker is present or not with 50% probability, they would say:

1. If there is even one marker that is different, then it is not D’s blood. Thus, the defendant D is different from X, which is good news for the defendant.
2. If all the markers are the same, then the probability the blood is not D’s is at most $2^{-n}$. Thus, it is extremely likely that the defendant D is the same person as X, which is bad news for the defendant: go to jail and stay there.

Of course statement (1) is true; statement (2) follows only **if the tests are independent**.

My problem is how can they claim the tests are independent? There may be biological reasons to assume this is the case, but that does not prove that they are independent.
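To see how much rides on independence, here is a minimal sketch of a toy model. The model is my own assumption, not anything a DNA lab uses: markers come in blocks, markers inside a block are perfectly linked, and different blocks are independent. Each marker on its own is still present with probability 1/2, so looking at markers one at a time would not reveal the problem.

```python
from fractions import Fraction


def match_probability(num_markers: int, block_size: int = 1) -> Fraction:
    """Chance that two unrelated people agree on every marker (toy model).

    Hypothetical model: markers are grouped into blocks of `block_size`.
    Markers in the same block always take the same value; blocks are
    independent and each block is present with probability 1/2.
    block_size=1 is the fully independent case.
    """
    if num_markers % block_size != 0:
        raise ValueError("block_size must divide num_markers")
    independent_units = num_markers // block_size
    return Fraction(1, 2 ** independent_units)


independent = match_probability(40)              # 2^-40, about 9.1e-13
linked = match_probability(40, block_size=4)     # 2^-10, about 9.8e-4

print(f"independent markers: {float(independent):.3e}")
print(f"linked in blocks of 4: {float(linked):.3e}")
```

With 40 markers the independent answer is below one in a trillion, while the linked answer is about one in a thousand. The marginal frequencies are identical in both cases; only the dependence structure changed.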

Just as I was writing this I saw in the Sunday New York Times, in an article on Chess, a claim that a certain event was a trillion to one. There is that “trillion to one” again.

a Boston University professor who oversees the federation’s ratings, wrote in an e-mail message that the odds of a player with a 700 rating beating six players in a row in the 1,500 range, as Nguyen did, is about one in a trillion.

With all due respect to the professor, I agree that what David Nguyen did was unlikely, that it has a very low probability, but not a trillion to one. I do not believe this claim is reasonable.

** A Quasi-Theorem **

I have a question that I have not been able to formalize, but I would like to state a rough version of it. I hope that you might be able to make it into a precise statement—hopefully a precise true statement.

Imagine a *world* where you can perform experiments. The experiments are random in nature and each time you perform them you learn something about the “world.” Suppose that you have some statement about this world. Then, you may try to estimate the probability that the statement is true. My assertion is that we can correctly claim that a non-trivial event has very low probability only after we do many experiments. Quantitatively, to claim that the event has probability less than $\epsilon$ we must perform at least order $1/\epsilon$ experiments. Does this seem right to you?
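One standard way to see where a bound of this shape comes from is the statistical "rule of three". The sketch below assumes the experiments are independent trials with a fixed unknown probability, which is an extra assumption and only one possible formalization. If the event never occurs in m trials, the largest probability p still consistent with that at 95% confidence solves (1 - p)^m = 0.05, which is about 3/m.

```python
import math


def upper_bound_after_no_hits(num_trials: int, confidence: float = 0.95) -> float:
    """Largest event probability not ruled out by `num_trials` misses.

    Assumes independent, identically distributed trials (an assumption).
    Solves (1 - p) ** num_trials == 1 - confidence for p.
    """
    # 1 - exp(log(1 - confidence) / m), written with expm1 for accuracy
    return -math.expm1(math.log(1.0 - confidence) / num_trials)


for trials in (10**3, 10**6, 10**9, 10**12):
    bound = upper_bound_after_no_hits(trials)
    print(f"{trials:>16,d} trials with no hits -> p could still be {bound:.2e}")
```

Under these assumptions, even a trillion clean trials only supports a bound of about three in a trillion, and a million trials supports nothing better than about three in a million.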

One might argue that the probability of getting all heads when tossing 100 fair coins is $2^{-100}$, and that we can be sure of this probability. We haven’t done $2^{100}$ experiments, but still can be sure. That does not contradict the above thought. This is because in the former case, we were trying to model a “world”, whereas in the latter case we are analyzing events in a known probability space. In the example with DNA markers, to say “one in a trillion”, the experts are assuming independence. To justify the independence assumption on a space as large as a trillion, one ought to have conducted order a trillion experiments.

I think in order to make it formal we need to be careful what we mean by a “world.” If the world is random, but changes over time, then clearly it is very hard to state anything about probabilities. So allowing a world that is time dependent seems too strong. In the case of DNA and Chess examples, it is reasonable to assume that the world is approximately time independent.

If the world is simply an unknown distribution, then I believe that we should be able to prove something. One problem—I am not sure if modeling the world as a distribution captures my intuition.

Is this a standard fact that I am unaware of, or is this a new result? I am unsure. I do know that something like this should be true. It is the reason, I believe, that extremely low probability claims about DNA and Chess are wrong. Clearly, if the above is true, then it is absurd for anyone to make claims of a trillion to one for a probability bound, since there is no way they could have done enough experiments.

** DNA Experiments **

I can say a bit more about the DNA question. Suppose that two unrelated people in the whole world have the same DNA markers. How hard is it to detect this? Let’s assume that DNA experts have found the markers for $m$ people. Then, the probability that they would have found the pair that are the same is

$$\binom{m}{2} \Big/ N,$$

where $N$ is the number of pairs of people in the whole world. Clearly, unless $m$ is huge, there is a very good chance that they would not have found the pair.
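To put rough numbers on this, here is a small sketch. It assumes the tested people are a uniform random sample of the world, so the chance that both members of one particular pair are in the sample is the number of pairs in the sample divided by the number of pairs in the world. The population figure of 7 billion and the sample sizes are illustrative assumptions only.

```python
from math import comb

WORLD_POPULATION = 7_000_000_000  # assumed round figure, for illustration


def chance_pair_is_sampled(sample_size: int, population: int = WORLD_POPULATION) -> float:
    """Chance that both members of one fixed pair land in a random sample."""
    return comb(sample_size, 2) / comb(population, 2)


for sample_size in (10**5, 10**6, 10**7, 10**8):
    p = chance_pair_is_sampled(sample_size)
    print(f"{sample_size:>12,d} people tested -> chance of catching the pair: {p:.2e}")
```

Even with ten million profiles on file, the chance of having caught a single matching pair is on the order of a few in a million under these assumptions.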

The point is how can they say things like “**no** two people in the world have the same markers?” I cannot follow that at all.

** Testing Independence **

There has been some work on testing whether or not a distribution is independent. For example, Tuǧkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White consider the question of independence in their paper.

I note that their bound on independence testing requires at least order $1/\epsilon$ samples, and actually more. Here $\epsilon$ measures how close a distribution is to being independent. Thus, their result seems to show that getting trillion to one odds is impossible for DNA. I will talk about this paper in the future in more detail.

** Open Problems **

Can we make the quasi-theorem into a theorem and prove it? Are there related theorems?

Great post! Court-probability claims always bug me.

As you say, we can make the claim that the probability that 100 ideal ‘abstract’ coin flips all turn up heads is 2^-100, since we are only counting elements in a known probability space. But we can also make approximately the same claim about a real coin in the physical world; this is because we have a good understanding of the *mechanism* behind coin flipping.

I’m not an expert in biology, but I can imagine that one could similarly understand the mechanism that produces the distribution of DNA markers, and from this, make a claim about independence that does not require treating it as a black box distribution that one must test.

Tangentially, it is worth mentioning that tosses of a fair coin are easily biased. In fact, Shannon actually built a machine to flip coins deterministically. Moreover, catching a coin in midair for a generic flip automatically produces a distribution of (.51,.49), where the .51 corresponds to the initial upward face. Diaconis and coworkers analyzed this in some detail. I have made some more detailed metatheoretical comments about this sort of thing in the context of statistical physics at

http://blog.eqnets.com/2009/08/24/dynamical-bias-in-the-dice-roll/

On another tangential note, I don’t see why the presence or absence of a marker would have a probability of 1/2 even assuming independence.

The question reminds me somewhat of Dempster–Shafer theory, namely the belief and plausibility functions they defined. Sadly, I am not very familiar with the field. (See http://en.wikipedia.org/wiki/Upper_and_lower_probabilities and http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory for some more details).

It’s a very interesting quasi-theorem, that undoubtedly can be formulated in some theoretical setting. Here’s a particular formulation: suppose we get an oracle that samples elements from an unknown distribution D over a known set S. Suppose we get as input the set S, the oracle to D, an element x in S, and a number epsilon. Suppose that we want to determine whether the weight of x in D is more than epsilon or less than epsilon/2, and we want to succeed with probability at least 2/3. Then I’m pretty sure that you’d have to sample at least 1/eps elements from the distribution. I think this is a plausible, albeit simple, theoretical formulation of your quasitheorem.

However, I don’t think it tells us something concrete about the real world. In the real world we have some knowledge about what things can or cannot happen or how things behave; some of this knowledge is obtained using Occam’s razor, which specifically tells us that we do not require many experiments to be reasonably sure of a claim; it’s enough if a simple rule explains many perceived phenomena. There are other ways to get rules of nature, but I believe that Occam’s razor is strikingly opposed to your quasitheorem, and that this is a point worth some extra thought. Is the principle you’re proposing an anti-Occam principle, in the sense that it proposes a scientific method where Occam’s razor is invalid? And does that model a world of ignorance (or of agnosticism), where we cannot trust any scientific knowledge except that gained by taking statistics over experiments, and not using any induction or deduction of any kind?

I, personally, do not agree that your quasitheorem applies to every situation. For example, I claim that there is a chance of less than 10^{-50} that the broken glass on the floor will reunite into a whole glass in the next 24 hours. I assure you that I have not (implicitly or explicitly) done 10^{50} experiments. Rather, I know the laws of physics, and know that an event of the glass coming together is highly unlikely. (My estimate would have been different were I to assign non-minuscule probability to the existence of a personal God; my estimate also takes into account a possible invention by DARPA, say, of a glass-reconstructing machine; but I digress).

By the way, on the topic of entropy, have you watched the Feynman Messenger lectures? They’re great, especially the one on entropy (lecture 5).

What is true, however, is that people make estimates irresponsibly. “One in a trillion” is a rhetorical statement, not a factual or rigorous one. It says “There’s a greater than 0 probability that the claim is wrong, but I completely believe it and so should you”. When getting a realistic estimate on the probability of mistake of such procedures, a good rule of thumb for a guesstimate on the probability of the event can indeed be 1 over the number of experiments made (explicitly or implicitly). For example, I’d say that it’s a good guesstimate that the probability that the US will collapse tomorrow is 1 over the number of days that the US has existed.

Your quasitheorem also reminds me of the unseen-elements problem (see e.g. here). In particular, your argument that 1/eps is necessary intuitively connects in my head with the Laplace add-1-to-all-counters approach (see link above).

Good points. I have seen a 495 trillion to one claim on a DNA result. I love the extra precision: it’s 495 trillion to one, not 500 trillion to one. I think these types of claims are not science, but they are stated with certainty. I agree that if there was some physical law that we believed, like your glass-breaking example, then perhaps it is possible to make such claims. But lacking that, a trillion to one seems to me to be wrong.

Actually the supposed claim about sampling you make doesn’t even type check without some modification.

You want to infer that your distribution D assigns a probability D({H}) to the outcome H with D({H}) between 3/4 and 3/8 with probability at least 2/3. As stated this doesn’t make sense since D is a function assigning reals to subsets of the outcome space defined in terms of a preexisting space of outcomes. Thus it’s incoherent to talk about an outcome specifying the behavior of D. I mean what the heck would P( P({H})=.5) = 1/3 even mean?

Now what you could do is specify some space of probability distributions and some other (meta) probability on that space. Thus we can define P({D}) to give the probability that D is selected from some collection of distributions but in this context the statement is still a type error.

Rather than proving this formally I will give an intuitive argument. Suppose you know for a fact that the coin you are flipping is biased (you are drawing from D1 where D1({H})=.8). Since you know that for an absolute fact, no number of samples from your unknown distribution could warrant giving any probability but 0 to the outcome that your unknown distribution was that of a fair coin.

—-

Ultimately this is the difference between hypothesis testing and assigning probability.

After many flips of a coin I might be able to say something like “The probability of getting this many more heads than tails IF this coin is fair is less than 5%” but I couldn’t infer from just this anything about the probability the coin is fair. For instance I might simply infallibly know it’s fair and no amount of hypothesis testing to the contrary should prompt me to stop believing what I have infallible knowledge of.

what about the fact that DNA can be created? http://www.scientificamerican.com/blog/post.cfm?id=lab-creates-fake-dna-evidence-2009-08-18

There is also the issue of cheating. Labs have been found to cheat and take shortcuts.

I always wondered how well a lab would do if given say 100 unlabeled samples, and asked which are the same? Would they always get it right? I doubt it.

1. I wonder if whoever made that 1 in a trillion quote did a union bound over the number of profiles in the database they compared against to avoid the mortal sin of computing the probability of an event after it has happened. Why this is important: if every pair has a million to one chance of matching and you compare against a million profiles in a database, the chances of finding a match are about 1-1/e.

2. An extension of your idea: it only makes sense to predict 1 in N odds for an event if the odds of a serious modeling error are also at most 1 in N. Therefore meaningful predictions of tiny probabilities can only be made by extraordinarily solid models. For real-world problems there’s usually only one airtight model, namely historical experience, which as you say requires N trials to give a meaningful 1 in N prediction.

3. In a setting of repeated predictions, the technique of always adding 1 to the number of failures when computing empirical probabilities might yield some sort of regret bound.
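The arithmetic in point 1 can be checked with a short sketch. It assumes each database comparison is an independent one-in-a-million coincidence, which is the commenter's illustrative figure rather than a real forensic number.

```python
import math


def chance_of_some_match(per_comparison: float, num_profiles: int) -> float:
    """Chance of at least one coincidental match in `num_profiles` comparisons.

    Assumes comparisons are independent (an assumption). Computes
    1 - (1 - p) ** k using log1p/expm1 so tiny p does not lose precision.
    """
    return -math.expm1(num_profiles * math.log1p(-per_comparison))


p = 1e-6          # one-in-a-million coincidence per comparison
k = 1_000_000     # profiles in the database

exact = chance_of_some_match(p, k)
union_bound = min(1.0, p * k)   # the union bound is useless here: it gives 1

print(f"chance of at least one match: {exact:.4f}")
print(f"1 - 1/e                     : {1 - 1 / math.e:.4f}")
print(f"union bound                 : {union_bound:.4f}")
```

A per-comparison figure of one in a million therefore turns into roughly a 63% chance of a coincidental hit once a million profiles are searched.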

I think the most recent post at the blog “33 Bits of Entropy” is very relevant to the current discussion: http://33bits.org/2009/12/02/the-entropy-of-a-dna-profile/

Thanks for the pointer. It is quite relevant.

I had not heard of the Nguyen chess case. Yes, the trillion-to-1 is what the rating differential is supposed to mean. Mark Glickman, the Boston Univ. professor mentioned in the article, is certainly the one to know. He was instrumental in shifting the chess rating system from Arpad Elo’s original bell-curve-based model to one based on a logistic distribution, which eliminates some arbitrary cutoffs and gets a better understanding of the tail. It also treats chess players as a dynamic population model, and part of the motivation—here I think US Chess Federation statistician Mike Nolan had a lead role—is to reduce the ratings lag for usually-young players who are rapidly improving. Glickman also has his own rating system called “Glicko” (http://en.wikipedia.org/wiki/Glicko_rating_system), motivated precisely to measure deviations more accurately.
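For readers who want to see how a figure like this can fall out of a rating gap, here is a minimal sketch. It uses the standard Elo logistic expected-score formula with a 400-point scale. That choice is an assumption: it is not necessarily the exact formula behind the quoted estimate. The sketch also treats expected score as win probability (ignoring draws) and treats the six games as independent.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo logistic expected score for `rating` vs `opponent_rating`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


per_game = expected_score(700, 1500)   # 800-point gap -> 1/101
six_in_a_row = per_game ** 6           # assumes six independent games

print(f"per-game expected score: {per_game:.5f}")
print(f"six wins in a row      : {six_in_a_row:.2e}")
print(f"roughly one in         : {1 / six_in_a_row:.3e}")
```

Under these assumptions the answer is close to one in a trillion. The number is only as good as the assumption that the 700 rating was accurate and that the six results were independent draws from it.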

I infer that part of the consternation here is that Nguyen is not a youngster. However, I buy Nguyen’s argument. What’s the alternative? It could only be that he was computer-cheating, but I’ve seen no such accusation. My own work (http://www.cse.buffalo.edu/~regan/chess/fidelity/; there’s much additional material and data a busy term has kept me from processing and posting) aims not only to test such eventualities, but also to evaluate the “Intrinsic Rating” based on the objective quality of the moves made, rather than the results of games. With on-average 30-or-so applicable moves per game (excluding the first few and moves where one side is clearly winning), I get a much larger sample space.

And I think small sample space is relevant here too—when “a trillion to one” pops up from only 7 event items, it’s yea-much-more believable that a systematic explanation is responsible—here that he got good on the Net. Finally, a chess-specific tidbit: 30 years ago new players would be started out at 1200, and the federation still considers 1000 to be “a bright beginner”. So for a guy with 15 years Internet play to be over 1600 is not unusual, and over 1800 I’ll believe it.

Dear Dick, One part of the difficulty is that it is quite hard to give the probabilistic statement you have mentioned a precise formal meaning.

In many cases, testing pairwise (and small-wise) independence can strengthen the claims you discussed. (This goes against your quasi-theorem intuition.) If indeed the DNA markers are 2-wise independent or k-wise independent for small values of k, this will support the estimates based on independence (or at least their overall relevance). This can be tested by not-so-many tests. (I would expect that such a test will fail the independence assumption in the chess example.)

For the purpose of courts there is no real difference between a 1:10^10 probability and 1:10^5 probability.

The claim that there are no two people in the world with the same DNA markers looks overall plausible. (You can make it more plausible by further statistical tests.)

I agree that for courts 1 in 100,000 is plenty strong enough. It’s the claim that it’s 459 trillion to one that got me going.

Two remarks: Probability is at the heart of court decisions and often the intuitions and implicit calculations are wrong. It is rare that judges explain their probabilistic “calculations”; here is a link to such a case, which is very interesting:

http://arielrubinstein.tau.ac.il/papers/02.pdf

I am also skeptical of claims involving tiny probabilities and especially very “accurate” tiny probabilities. (I wrote a paper analyzing two experiments where in the first a probability for an event was claimed to be 1: 1.2 x 10^-9 and in the replication it was claimed to be 1:1.3 x 10^-9 .) Such tiny probabilities appear often in various risk assessments where it is clear that the underlying independence assumptions are false.

Gil, you write “In many cases, testing pairwise (and small wise) independence can strengthen the claims you discussed. (This goes against your quasi-theorem intuition.) […] This can be tested by not-so-many tests”

However, I’m not quite sure this is true. The Alon et al. paper needs 1/eps^2 samples to test 2-wise independence, even on a constant-sized domain. They show a lower bound of 1/eps, again, even for constant-sized domains. In fact, there’s a simple hard instance: take a two-variable space, which with probability (1-eps)/2 chooses (0,0), w.p. (1-eps/2) chooses (1,1), w.p. eps/2 chooses (1,0) and w.p. eps/2 chooses (0,1). It takes 1/eps samples to “see” that the coordinates are not independent.

So I’m not at all sure you can test 2-wise independence in a good enough way. I vaguely feel that if we’re eps-far from 2-wise independent then the independence might fail “exactly in our instance”. This whole thing seems to *strengthen* Dick’s claim rather than making it weaker.

In fact, there’s a simple hard instance: take a two-variable space, which with probability (1-eps)/2 chooses (0,0), w.p. (1-eps/2) chooses (1,1), w.p. eps/2 chooses (1,0) and w.p. eps/2 chooses (0,1).

Those probabilities don’t appear to sum to one, and I couldn’t figure out what distribution you intended.

But there *are* simple counterexample distributions:

With probability (1 – 10^-12) choose a random 40 bit string.

With probability 10^-12 choose a string of 40 ones.

If you take far fewer than a trillion samples from that distribution, what you see will almost certainly be identical to 40 independent bits, but the actual probability of all ones will be roughly twice what you would predict from that.
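The numbers for this planted distribution can be worked out directly. The sketch below is exact arithmetic rather than a simulation; the sample sizes are arbitrary choices for illustration.

```python
import math

BITS = 40
PLANT = 1e-12                      # chance of the planted all-ones string

predicted = 2.0 ** -BITS           # what full independence would predict
actual = (1.0 - PLANT) * predicted + PLANT
ratio = actual / predicted


def chance_of_seeing_all_ones(num_samples: int) -> float:
    """Chance that the all-ones string shows up at least once in the samples."""
    return -math.expm1(num_samples * math.log1p(-actual))


print(f"predicted P(all ones): {predicted:.3e}")
print(f"actual P(all ones)   : {actual:.3e}  ({ratio:.2f}x the prediction)")
for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,d} samples -> chance of ever seeing it: {chance_of_seeing_all_ones(n):.3e}")
```

With a million or even a billion samples, the all-ones string almost never appears, so nothing in the data distinguishes this distribution from 40 independent bits, even though the tail probability is off by about a factor of two.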

But does the existence of such distributions really invalidate the reasoning in the criminal case? If the distribution of human genomes was specially chosen to frame the defendant, it surely could, but how likely is that? Are we to believe that the whole population of the earth revolves around this case?

Such pathological distributions seem so much less likely than, say, a lab error swapping the results between two suspects, that they aren’t even worth considering.

(Let me clarify: when I said that the claim that there are no two people in the world with the same DNA markers looks overall plausible, I assumed that the rough probabilistic argument based on independence shows that this is the case with a huge margin. I don’t know how many DNA markers there are and what the relevant numbers are.)

Your quasi-theorem reminds me of the “No Free Lunch Theorems” (NFL) that were very popular a few years back.

There is probably a formulation in which your theorem is true, but truth does not imply relevance. NFL implies that no inductive reasoning can ever work. It is a theorem, but the conditions of the theorem don’t hold, so things aren’t that bad.

Where does the “world” on which you are to do experiments come from? Is it chosen from a distribution? If so which? The theorem is probably true for most definitions of a “uniform” distribution, but those may not be as reasonable as they seem.

Is it chosen by an adversary? That is probably closer to what you mean. Again, the theorem is probably true, but if the actual world was worst case, we would have worse things to worry about than the probability of DNA matches. In a worst case (or uniformly chosen) world the glass mentioned in a previous comment *would* be just as likely to re-assemble as it was to break.

Be all that as it may, the trillion to one probabilities given are hogwash. It isn’t even true that no two people have ever been found with the same DNA. Surely the probability that the defendant has an unrecorded identical twin is greater than one in a trillion.

Great comment. I like your last point.

I did not even get into issues of lab error. If they even use a computer in any step, I doubt that it’s a trillion to one that there was no computer error.

While your point stands that “one in a trillion” is likely an inaccurate statement given limited knowledge (or even with it, as some of the other commentators point out) it might possibly miss the semantics intended by the maker of that statement. I’ve found in my personal discourse that statements like this (sometimes formulated differently like: “99.9999% accurate” or “one in a million”) really just are a way of saying “this has a really high/low probability.” The accuracy and precision of the number itself are immaterial to the speaker and the statement is idiomatic. Of course this raises all sorts of problems in the context of something like a trial.

I just came across an event with a very tiny probability. See this:

http://gilkalai.files.wordpress.com/2009/12/random1.pdf

The empirical probability that when you print something an English letter will transform into a meaningless rectangle is 1/8. The printout that I linked to represents an event whose probability is 1:617.2 trillion. Nevertheless it did happen!

As others have written, in the DNA evidence context you can’t empirically justify the independence assertion, nor can you exclude other sources of error. My colleague Terry Speed spent a morning, testifying for the defense in the O.J. Simpson murder trial, repeating these two points over and over …..

More constructively, it’s fun to spend part of an undergraduate class period discussing what really does have a 1-in-a-million chance, up to a factor of 2 —

see this page.

Actually it’s quite easy to demonstrate that your “Quasi-Theorem” is in fact false.

For the easy proof imagine you are going to flip a coin N times. WITHOUT assuming anything about independence we see that there are 2^N mutually incompatible events each of which specifies some possible sequence of outcomes. Thus as the total probability is one, at least one of these events must have a probability of at most 1/2^N. As that’s the probability that is assigned before the experiment is started I can just pick N big enough to violate your rule no matter how many observations we’ve made in the past about coins or the like.

Alternatively one could argue:

There are a countably infinite number of mutually incompatible logically possible hypotheses about future outcomes. For instance we can associate any positive integer N to the claim “The universe will end between N and N+1 billion years after New Years 2010.”

Now there are only finitely many total observations we have performed up to this time, say O. If your quasi-theorem was true then every one of these infinitely many hypotheses would have probability at least 1/O and since they are disjoint events the total probability that the universe ends between now and O+2 billion years from now is at least (O+1)/O > 1. Contradiction.

—

At a more qualitative level what is going on is simply a reflection of the philosophical problem of induction. In order to make any sense of the world at all we have to come pre-armed with some innate sense of which hypotheses are more plausible and which are less plausible. The very fact that we can do science at all is a result of the fact that our innate plausibility judgement assigns extremely low probabilities to hypotheses like, “Aliens put a bunch of evidence out here just to screw with us for a reality TV show which ends tomorrow.”

Fundamentally I think you are a bit confused about what exactly it means to say something has a certain probability. And that’s understandable, many famous philosophers are as well.

Ultimately probability is just a mathematical tool for making predictions. Like any other aspect of science we postulate some informally described mathematical model for the behavior, e.g., coin flips are modeled as independent binary events. That is, adopting a particular probability model is no different than assuming we live in a frictionless Newtonian world. In this sense the probability assignment is a fallible hypothesis like any other in science. Becoming more certain this hypothesis is valid is simply irrelevant to the probability since it’s not a measure of your personal conviction or willingness to bet.

What makes it confusing is we use the exact same terminology when we use probability as a model for our (ideal?) degree of belief that various outcomes will come to pass. In this sense you assign an event a probability based merely on your confidence it will come to pass. It doesn’t matter that Waterloo was fought long in the past; you might not know who won and assign Napoleon only a 40% chance of losing.

As the courts want objective scientific testimony the reports are all in the former sense, which is unfortunate since the jury interprets them in the latter sense.

It occurs to me that ultimately what is causing the confusion here is the idea that one could somehow prove things about various real world probabilities without starting with some background a priori probability distribution or the equivalent.

If you are a Bayesian you (definitionally) start with some prior probability function that you then modify by conditionalization. Even if you aren’t a Bayesian you have to start off with something that performs the same work and has the same implications (forces you to assign extremely low probabilities to certain outcomes).

To give a suggestive example suppose I tell you that I’ve written a program on my computer to take some physical random process as input (geiger counter) and produce a string of 0’s and 1’s of length 10. Surely for any given string there are some odds that you’d (assuming you aren’t risk averse) be indifferent between betting that string would and wouldn’t come up. If so either I can expose you as irrational and make a guaranteed profit from you (google Dutch book argument) or those odds yield a probability function and at least one such string must have been assigned probability at most 1/2^10.

The key point here is that it didn’t matter that you started without any information about what was going on at all. In order to make decisions you are forced to make brute judgements of relative likelihood and if there are enough options some options have to be regarded as very unlikely PRIOR to any experience.

Part of the problem is that, yes in fact the tests are NOT independent, the chances of a false positive are MUCH MUCH MUCH higher than one in a trillion, but are not a real problem if you have another reason to believe that D is guilty, because the probability you actually care about is not P(sample is D’s blood) but P(D guilty), your estimate of which is going to be .

Somewhere I found a great post explaining this much more clearly, but yes the ‘one in a trillion’ number is ridiculous (especially since it’s based on the idea that somehow either D or the crime scene sample was randomly selected, which is not true of either).