Amplifying on the PCR Amplifier
PCR is an algorithm that amplifies DNA
Kary Mullis is not a theorist, but is a chemist who won the 1993 Nobel Prize for inventing the PCR method—an amplifier for DNA. He is also an alumnus of Georgia Tech, where I am these days, yet we do not always claim him as such—according to my friends in biology. While Kary is famous, and justly so, he has a quirky personality that does not always make everyone thrilled to be connected with him.
Today I want to discuss the details of his PCR method, which I hope you will enjoy. It really is an algorithm. Really. I think this is, at least partially, why there was such a controversy swirling around whether or not Mullis did enough to win a Nobel Prize. He certainly is one of the few who have been honored with a Nobel Prize for the invention of an algorithm; for example, Tjalling Koopmans and Leonid Kantorovich won their economics Nobel Prizes for linear programming in 1975. But there are few others. Also, PCR raises a pure theory problem that I hope someone will be able to resolve. More on that shortly.
Mullis invented PCR while he was employed at Cetus, now a defunct company. He got a $10,000 invention award, while Cetus eventually sold the patent rights for 300 million dollars. This was back when this was real money, even today that is a pretty good amount for a single patent. Mullis was not happy with the size of his award, which is not hard to understand.
This reminds me of a story about IBM. They have an award for the best invention of the year, which comes along with a nice, but relatively modest, cash award. Years ago the inventor of a certain encoding trick for improving the density of disk storage units was awarded this prize. Then, he was awarded the same prize the following year: for the same invention. This is like being “Rookie of the Year” for two years in a row. Since the invention made IBM billions of dollars in additional sales of disk units, they needed someway to thank him. So he got the best invention of the year, for the same invention, twice. Perhaps Cetus could have done something like this too.
Back to PCR. There was a huge court battle over whether the Cetus patent for PCR was valid or not. Several famous biologists, including some Nobel Laureates, testified against Cetus and Mullis: arguing that the PCR method was already known, that Mullis did not invent anything, and therefore that the patent should be thrown out. Mullis won.
I have read parts of the trial transcript and other accounts, and I think I have a theory why there was such outcry an against Mullis. First, he is—as I stated already—a character. If you have any doubts read his book—“Dancing Naked in the Mind Field.” Second, there was a great deal of money and prestige at stake. The patent, if valid, was extremely valuable. Finally, there was—and this is my two cents—the algorithm issue. Mullis invented no new chemistry, he discovered no new biological process, he found nothing new in the usual sense. What he did that was worthy of a Nobel was “just” put together known facts in a new way.
To me this is like arguing that the famous RSA algorithm is not new: we all knew about modular arithmetic, about primes, about Fermat’s Little Theorem, and so on. So how could Ron Rivest, Adi Shamir, and Len Adleman be said to have invented something? They did. As theorists, we appreciate the huge breakthrough they made—but the biologists not being oriented toward algorithms might have had trouble with PCR.
Let’s now turn to the Polymerase Chain Reaction (PCR) algorithm and the open question about it that I wish to raise. If you want you can jump to the next to the last section where I discuss the open question, but I worked hard on the next sections—the DNA diagrams were tricky—so at least look at them.
The cool thing about PCR is you do not need to know a lot about the deep structure of DNA in order to understand the method. There are a couple of key facts that you do, however, need to know. A single strand of DNA can, for us, be viewed as a string of four letters from the alphabet , where each letter stands for one of four nucleic acids. Thus,
is an example of a single strand of DNA. The bonds denoted by the symbol are strong bonds. In is important that the bonds are directional.
Two single pieces of DNA can form a double strand of DNA, which consists of two such sequences placed together: one running left to right, the other right to left, and where the letters meet they follow the two famous Watson-Crick rules:
The symbol denotes a weak bond. See these figures for some physical intuition why they pair the way they do:
Thus, the following is an example of a double strand of DNA:
Note, the top strand goes in the opposite direction of the bottom strand.
Mullis used a few simple operations that can be done to DNA. One is that DNA can be heated. When a double strand of DNA is heated properly (not too hot, and not too fast) it breaks nicely into the top and bottom strands. Thus, the above double strand would break into:
A key point is that
are the same.
Another simple operation is that DNA can be cooled. In this case single strands will try to find their Watson-Crick mates and bond. Note, the above two strings could reform again into,
The key is could. In order for this to happen they must find each other. If they were alone in a test tube, it would be very unlikely to happen. But, if there are many copies of each type, then many of the above double strand would form.
The last operation that Mullis uses is the most interesting, and it relies on the enzyme polymerase. Consider the following piece of DNA made of two single strands,
Note, that the bottom is incomplete. How do we get the remaining part of the bottom strand constructed? The answer is to add the enzyme polymerase and a sufficient supply of each nucleic acid . Then, polymerase moves along and adds the missing bases—shown in bold—one at a time. In this case first a , then an , and finally a .
An important point: polymerase only completes the bottom if it is already started. Thus, a single strand of DNA would not be changed at all. The top must have some of the bottom started already: actually the bottom part must be at least a few bases. One more key point to make: polymerase only adds in the direction of the DNA strand. Thus, in this case nothing will happen:
All these steps—heating, cooling, and polymerase—were known for decades before Mullis’s work; amazingly that is all he needs to do the PCR algorithm.
I would like to explain what I will call the basic method first, then I will explain the full method. Suppose that you start with a double strand of DNA of the form:
where , , and are chains formed of the nucleic acids . Also, and are long enough length for polymerase to work. Put the following all together in a test tube. The double strands of the above DNA that you wish to copy; lots of single copies of ; lots of polymerase. Then, add one more ingredient: lots of copies of the primers, which are single strands of DNA of the form and .
Then, all you do is heat, cool, heat, cool, for some number of cycles. The magic is that this is enough to make an exponential number of copies of the original DNA. To see this consider what will happen after the first heat and cool cycle. The above will become:
This follows since during heating the double strand breaks into the top and bottom strands; then during cooling the most likely event is for the abundant primers to attach to the top and the bottom. It is very unlikely for the double strands to re-attach together—since there are so many primers. Next, polymerase does its magic and after a while the above becomes:
Thus, we have doubled the number of strands of the DNA—we have an amplifier.
The full PCR method can copy a selected part of a long strand of DNA. This is really the method that Mullis invented. The ability to copy only part of a much longer strand is of huge importance to biologists. Curiously, it is exactly the same method as the basic one. The only difference is the analysis of its correctness is a bit more subtle: I find this interesting, since this happens all the time in the analysis of our algorithms.
Suppose that you start with the following double strand of DNA:
As before, put the following all together in a test tube. The double strands of the above DNA; lots of single copies of ; lots of polymerase. Then, add one more ingredient: lots of copies of the primers, which are single strands of DNA of the form and . Then, it is just a bit harder to see that the following becomes amplified:
Thus, only the DNA between the primers is exponentially increased in number. Other, partial pieces of DNA will grow in quantity, but only at a polynomial rate. This is the PCR algorithm.
Can PCR Amplify One Strand of DNA?
Can PCR copy one piece of DNA? I have asked this over the years of my biology friends and have been told that they believe that it can. However, usually PCR is used to amplify many strands of DNA into a huge amount of DNA. So the problem that I want to raise today is: can theory help determine the ability of PCR to amplify even one piece of DNA? The obvious lab protocol would be to prepare test tubes that have exactly one piece and see how the PCR method works. The trouble with this is simple: how do we make a test tube that is guaranteed to have exactly one piece of the required DNA? I believe that this is not completely trivial—especially for short strands of DNA.
The Amplifier Sensitivity Problem
Note, even if biologists can prepare test tubes with one strand, the following problem is interesting since it could have applications to other amplification methods. The problem is an ideal version of trying to decide if PCR can amplify one strand or not, without the ability to create test tubes with one strand.
I have thought about this issue some, and would like to present it as a pure theory problem. I will give a simple version, but you can probably imagine how we could make the problem more realistic.
Imagine that you have boxes—think test tubes. A box can have a number of balls—think DNA strands—inside: the number can range from to a very large number. You can hold a box, but no amount of shaking or any other trick will tell you anything about its contents. If you hold it up to the light, you see just the box.
However, you do have a detector, that given a box does the following: if the box has less than balls in it, then the detector says none; if the box has more than balls in it, the detector says yes there are balls. Once a box is given to the detector, the box is destroyed.
Your task is to determine the value of ; this value is always at least , but is otherwise, secret. So far there is not much you can do. I will let you have a number of operations that you can use in your quest to find the value of :
- You can ask for a new box. This will be filled with a varying number of balls, but the number of balls in the box will always be vastly larger than .
- You can ask that the contents of a box be randomly divided into boxes. This operation destroys the original box and returns to you boxes. The balls of the original are distributed in a uniformly random manner into the boxes. For example, you might wish to split the contents of a box into two new boxes. If the original contained balls, then the new ones will have and where and the division is random.
That is all you can do. The question is: can you devise a protocol that will get a good estimate on the secret value of ? A related question: suppose that I tell you that is or . Can you devise a protocol that determines which is true? Note, you also have access to randomness so the protocol is assumed to be random, and therefore the results need to only hold with high probability.
The main question today is to solve the amplifier sensitivity problem and find . Can we develop a protocol that can solve this? Also can we discover a protocol that tells from ?
Note, even if PCR has been shown in the lab to be able to amplify one piece of DNA, these protocols may be of use in the future with other types of amplifying mechanisms. As science moves down into the depths of nano scale I can certainly imagine other applications of a protocol that can determine how sensitive an amplifier is.
Please solve these problems. Or prove that they cannot be solved.