Programming Turing Machines is Hard
Two classic Turing Machine simulation theorems
Patrick Fischer is one of the founders of complexity theory. He played a critical role in starting the field, in proving some of the early results, and in making theory as vibrant an area as it is today. We owe him a great deal of thanks. The picture is of his wife, Charlotte Froese Fischer, who is a research professor of Computer Science—I could not find one of him.
Today I would like to talk about two central results of complexity theory. They are old, they are important, and they have not been improved for years. I believe that the time may be right to finally make progress on them.
One of the curious features of theory, especially results concerning Turing Machines (TM’s), is that their proofs often depend on the ability to program TM’s in clever ways. Programming a TM well is a skill that requires a programmer to understand how linear memory—Turing Tapes—can be manipulated. Linear memory is not as easy to use as random access memory, and this is why TM programmers get paid so well. Okay, you probably can make much more money programming iPhone apps, but being a master TM programmer is hard.
I recall one conference, not FOCS or STOC, where a speaker gave a talk on TM’s that did not seem correct to me, but I was not sure. Pat was also in the audience and said nothing during the talk. Immediately after the talk was over and a coffee break was in progress, Pat walked up to the speaker and said:
Son, what you said about Turing Machines is not right. I program them for a living, and you cannot do what you claim.
I have always liked the phrase “I program them for a living.” Pat has been doing that for almost half a century. His first paper that I can locate was “On computability by certain classes of restricted Turing machines” in the proceedings of the 1963 Fourth Annual Symposium on Switching Circuit Theory and Logical Design—the conference that eventually became FOCS.
Let’s turn to the two theorems.
The Two Theorems
The first is by Fred Hennie and Richard Stearns in 1966 and is titled “Two-Tape Simulation of Multitape Turing Machines.”
Theorem: Let $M$ be a multi-tape TM that runs in time $t(n)$. Then, there is a two-tape TM $M'$ that accepts the same language as $M$ and runs in time $O(t(n) \log t(n))$.
For the next theorem we need to define what it means for a TM to be oblivious. A TM is called oblivious provided for all $n$ and for all inputs $x$ of length $n$, the heads of the TM move in exactly the same way. That is, the tape motions do not depend on the input $x$—they only depend on its length.
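To make the definition concrete, here is a tiny Python sketch (purely illustrative, not from either paper): a machine that just sweeps over its input and back is oblivious, because its head trajectory is determined by the length of the input alone.

```python
# Toy illustration of "oblivious": the head trajectory of this little
# single-tape machine depends only on the length of the input, never on
# its contents.  (A sketch for intuition, not the actual constructions.)

from itertools import product

def head_trace(x):
    """Run a trivial TM that sweeps right over its input and then sweeps
    back to the left end; return the sequence of head positions."""
    tape = list(x) + ['_']          # '_' is the blank symbol
    pos, trace = 0, [0]
    while tape[pos] != '_':         # sweep right to the first blank
        pos += 1
        trace.append(pos)
    while pos > 0:                  # sweep back to the left end
        pos -= 1
        trace.append(pos)
    return trace

# Every binary input of length 5 yields exactly the same head trajectory.
traces = {tuple(head_trace(''.join(bits))) for bits in product('01', repeat=5)}
assert len(traces) == 1
print(next(iter(traces)))
```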
The second is by Nicholas Pippenger and Michael Fischer in 1979 and is titled “Relations Among Complexity Measures.” They extend the Hennie-Stearns result to make the simulation oblivious. An aside: Michael and Pat are brothers. They are probably one of the first pairs of brothers to both work in complexity theory.
Theorem: Let $M$ be a multi-tape TM that runs in time $t(n)$. Then, there is a two-tape TM $M'$ that accepts the same language as $M$ and runs in time $O(t(n) \log t(n))$. Further, the machine $M'$ is oblivious.
I have already discussed the second theorem here. The reason I wanted to raise the two theorems together is that they really show how programming TM’s is a skill that we are still learning how to do well.
The main application of the Hennie-Stearns theorem is to prove efficient deterministic time hierarchy theorems. The main application of the Pippenger-Fischer theorem is to prove efficient circuit simulations of languages computable by Turing Machines. Note, I also wish to fix an error that I made earlier: the order of the authors is “Pippenger-Fischer” not “Fischer-Pippenger.” I apologize to Nick and Mike.
The best known separation theorems for nondeterministic time are much stronger than those known for deterministic time, which are based on Hennie-Stearns. The best is still, I believe, due to Joel Seiferas, Michael Fischer, and Albert Meyer in Separating Nondeterministic Time Complexity Classes.
Theorem: Suppose that $t_1(n)$ and $t_2(n)$ are time-constructible functions such that $t_1(n+1) = o(t_2(n))$. Then,

$\mathsf{NTIME}(t_1(n)) \subsetneq \mathsf{NTIME}(t_2(n))$.
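For a concrete feel of the gap, here is one instance of my own, using the standard statements of both hierarchy theorems rather than anything from the paper. The nondeterministic theorem separates classes that the deterministic theorem, which inherits the $\log$ factor from Hennie-Stearns, does not give.

```latex
% Seiferas-Fischer-Meyer: t_1(n+1) = o(t_2(n)) already suffices, so
\mathsf{NTIME}(n^2) \subsetneq \mathsf{NTIME}(n^2 \log n).
% The deterministic hierarchy needs t_1(n)\log t_1(n) = o(t_2(n)), so the
% same machinery only yields, for example,
\mathsf{DTIME}(n^2) \subsetneq \mathsf{DTIME}(n^2 \log^2 n).
```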
One key reason is the beautiful simulation theorem of Ron Book that I discussed before.
Proof Strategies of the Theorems
I will give the flavor of how these theorems are proved—as usual see the papers for the actual details.
What is interesting is that while most—many?—know the theorems, I believe that most do not know the proofs. I did know the proofs at one time, but have long forgotten the details.
Somehow the results are important, but their proofs are no longer well known. I think that this is a shame for two reasons. First, they are beautiful non-trivial theorems. Second, if we forget how they are proved, then they will never get improved. This would be too bad, since an improvement to either of the theorems would have important consequences.
The main idea both proofs use is what we today call amortized complexity, though neither paper calls it that. I am not a historian, but I believe that the notion of amortized complexity is usually associated with the work of Danny Sleator and Bob Tarjan in their famous work on splay trees. For example, they won the prestigious Paris Kanellakis Award in 1999 for this brilliant work.
Both theorems, Hennie-Stearns and Pippenger-Fischer, essentially have to use a kind of amortized simulation. The critical issue is this: Suppose that you want to simulate multiple Turing tapes. The obvious method is to store their contents on one tape, and use your other tape for “working” storage. This is exactly what they both do.
The problem is that one linear device does not seem well suited to store two linear sequences. Suppose that you store the sequences $a_1 a_2 \dots a_n$ and $b_1 b_2 \dots b_n$ as one interleaved sequence

$a_1 b_1 \; a_2 b_2 \; \cdots \; \underline{a_i b_i} \; \cdots \; a_n b_n$.
Here the underlined pair marks the location of the tape head, and $a_i$ and $b_i$ are the currently scanned contents of the two tapes that are being simulated.
The simulation is easy. Just look at the two symbols, decide what to do next, and do it. No problem. Wrong. The difficulty is this: what if the simulation needs to keep one tape head stationary and move the other one to the right? Then, the next state should be:

$a_1 b_2 \; a_2 b_3 \; \cdots \; \underline{a_i b_{i+1}} \; \cdots \; a_n b_{n+1}$.
The trouble is that to get to this global state is very expensive: it requires the simulator to move about $n$ symbols. Since $n$ can be huge, this would be a very inefficient simulation.
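Here is a small Python sketch of that cost, with names and representation of my own choosing rather than anything from the papers: each cell of the simulator's tape holds one pair of simulated symbols, and realigning the two tracks after a single simulated step touches every pair.

```python
# Naive interleaved layout: cell i of the simulator's tape holds the pair
# (a_i, b_i).  Keeping both scanned symbols in the same cell is what makes
# a single simulated step potentially cost n moves.

def shift_b_left(cells, blank='_'):
    """Tape b moved its head one cell right while tape a stayed put, so
    every b-symbol slides one pair to the left to keep the scanned pair
    together -- touching every one of the n cells."""
    a_track = [a for a, _ in cells]
    b_track = [b for _, b in cells]
    b_track = b_track[1:] + [blank]       # the O(n) realignment
    return list(zip(a_track, b_track))

tape = list(zip('abcde', 'vwxyz'))        # n = 5 interleaved pairs
print(shift_b_left(tape))                 # all n pairs had to change
```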
What to do? The details are complex, but the high-level idea is to be conservative. Why move everything? The tape heads might go back to where they were previously at the very next step, or they might continue to diverge—move further and further apart. The clever idea is to think of the linear storage as a series of blocks that double in size. Then, as the two heads move, the simulator at first moves only a block of size $1$, then a block of size $2$, then one of size $4$, and so on. Further, it is necessary to organize the tape so that there are “slack” or empty blocks. These allow content to be moved into the slack area from neighboring blocks. Also, to get obliviousness, one can schedule in advance block moves that may turn out to be unnecessary—if they are found to be redundant when the time comes, the simulator writes dummy characters.
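To see why the doubling blocks help, here is a toy cost-accounting sketch in Python. It is emphatically not the actual Hennie-Stearns or Pippenger-Fischer construction; it only models the schedule in which blocks twice as big are rearranged half as often, and checks what the total work adds up to.

```python
import math

# Toy cost model for the block idea: pretend that on step i the simulator
# rearranges one block of size 2^j for every level j such that 2^j divides
# i, so blocks twice as big are moved half as often.

def total_block_cost(t):
    """Sum the toy per-step rearrangement costs over t simulated steps."""
    total = 0
    for step in range(1, t + 1):
        k = 0                               # highest level moved this step
        while step % (2 ** (k + 1)) == 0:
            k += 1
        total += sum(2 ** j for j in range(k + 1))   # move levels 0..k
    return total

for t in (2 ** 10, 2 ** 14, 2 ** 18):
    print(t, total_block_cost(t) / (t * math.log2(t)))   # ratio stays bounded
```

The printed ratio stays bounded as $t$ grows, which is exactly the amortized $O(t \log t)$ total that the theorems need.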
The insight is that there is no uniform bound on the cost of a single simulation step: some take constant time, and others take time $O(b)$ where $b$ is the size of the block that they need to move. But the total cost over the whole simulation is bounded by the “average” cost of a step times the number of steps, which is shown to be $O(t(n) \log t(n))$.
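Written as a rough sum, in the spirit of the toy accounting above rather than the exact bookkeeping of the papers: a block at level $k$ has size about $2^k$ and needs to be rearranged on the order of once every $2^k$ simulated steps, so over $t = t(n)$ steps the total work is

```latex
\sum_{k=0}^{\log t} \underbrace{\frac{t}{2^{k}}}_{\#\text{ of moves at level } k}
\cdot \underbrace{O\!\left(2^{k}\right)}_{\text{cost per move}}
\;=\; \sum_{k=0}^{\log t} O(t) \;=\; O\!\left(t \log t\right).
```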
Very neat. Again see the papers for the details.
The obvious open problems are to try to improve these classic theorems. I conjecture that it should be possible to improve them, but I have no clear idea how to proceed. I hope that you can find a trick and program TM’s better than I can.
A final comment. I wonder what would have happened if the proofs, not the results, of these early TM theorems had become well known. The proofs both used the key idea of amortization that today plays such an important role in the theory of algorithms.
A related question is: are there tricks in complexity theorems that could be used in non-complexity situations? More bluntly: is there some fundamental programming trick that is used in complexity theory today that could be used in other parts of theory or even computer science in general? I wonder.