# Programming Turing Machines is Hard

* Two classic Turing Machine simulation theorems *

Patrick Fischer is one of the founders of complexity theory. He played a critical role in starting the field, in proving some of the early results, and in making theory as vibrant an area as it is today. We owe him a great deal of thanks. The picture is of his wife, Charlotte Froese Fischer, who is a research professor of Computer Science—I could not find one of him.

Today I would like to talk about two central results of complexity theory. They are old, they are important, and they have not been improved for years. I believe that the time may be right to finally make progress on them.

One of the curious features of theory, especially results concerning Turing Machines (TM’s), is that their proofs often depend on the ability to program TM’s in clever ways. Programming a TM well is a skill that requires a programmer to understand how linear memory—Turing Tapes—can be manipulated. Linear memory is not as easy to use as random access memory, and this is why TM programmers get paid so well. Okay, you probably can make much more money programming iPhone apps, but being a master TM programmer is hard.

I recall one conference, not FOCS or STOC, where a speaker gave a talk on TM’s that did not seem correct to me, but I was not sure. Pat was also in the audience and said nothing during the talk. Immediately after the talk was over and a coffee break was in progress, Pat walked up to the speaker and said:

Son, what you said about Turing Machines is not right. I program them for a living, and you cannot do what you claim.

I have always liked the phrase “I program them for a living.” Pat has been doing that for almost half a century. His first paper that I can locate was On computability by certain classes of restricted Turing machines in the proceedings of the 1963 Fourth Annual Symposium on Switching Circuit Theory and Logical Design—the conference that became the conference that became FOCS.

Let’s turn to the two theorems.

** The Two Theorems **

The first is by Fred Hennie and Richard Stearns in 1966 and is titled “Two-Tape Simulation of Multitape Turing Machines.”

Theorem:Let be a multi-tape TM that runs in time . Then, there is a two-tape TM that accepts the same language as and runs in time .

For the next theorem we need to define what it means for a TM to be *oblivious*. A TM is called oblivious provided for all and for all inputs of length , the heads of the TM move in exactly the same way. That is the tape motions do not depend on the input —they only depend on its length.

The second is by Nicholas Pippenger and Michael Fischer in 1979 and is titled “Relations Among Complexity Measures.” They extend the Hennie-Stearns result to make the simulation oblivious. An aside: Michael and Pat are brothers. They probably are one of the first brothers that both worked in complexity theory.

Theorem:Let be a multi-tape TM that runs in time . Then, there is a two-tape TM that accepts the same language as and runs in time . Further, the machine is oblivious.

** Applications **

I have already discussed the second theorem here. The reason I wanted to raise the two theorems together is that they really show how programming TM’s is a skill that we are still learning how to do well.

The main application of the Hennie-Stearns theorem is to prove efficient deterministic time hierarchy theorems. The main application of the Pippenger-Fischer theorem is to prove efficient circuit simulations of languages computable by Turing Machines. Note, I also wish to fix an error that I made earlier: the order of the authors is “Pippenger-Fischer” not “Fischer-Pippenger.” I apologize to Nick and Mike.

The best known separation theorems for nondeterministic time are much stronger than those known for deterministic time, which are based on Hennie-Stearns. The best is still, I believe, due to Joel Seiferas, Michael Fischer, and Albert Meyer in Separating Nondeterministic Time Complexity Classes.

Theorem:Suppose that and are time-constructible functions such that , then

One key reason is the beautiful simulation theorem of Ron Book that I discussed before.

** Proofs Strategies of the Theorems **

I will give the flavor of how these theorems are proved—as usual see the papers for the actual details.

What is interesting is that while most—many?—know the theorems, I believe that most do not know the proofs. I did know the proofs at one time, but have long forgotten the details.

Somehow the results are important, but their proofs are no longer well known. I think that this is a shame for two reasons. First, they are beautiful non-trivial theorems. Second, if we forget how they are proved, then they will never get improved. This would be too bad, since an improvement to either of the theorems would have important consequences.

The main idea both use is what we call today *amortized complexity*. Both proofs use amortization, but do not call it that. I am not a historian, but I believe that the notion of amortized complexity is usually associated with the work of Danny Sleator and Bob Tarjan in their famous work on splay-trees. For example, they won the prestigious Paris Kanellakis Award in 1999 for this brilliant work.

Both theorems, Hennie-Stearns and Pippenger-Fischer, essentially have to use a kind of amortized simulation. The critical issue is this: Suppose that you want to simulate multiple Turing tapes. The obvious method is to store their contents on one tape, and use your other tape for “working” storage. This is exactly what they both do.

The problem is that one linear device does not seem well suited to store two linear sequences. Suppose that you store the sequences and as one sequence

Here is the location of the tape head, and and are the contents of the two tapes that are being simulated.

The simulation is easy. Just look at the two symbols, decide what to do next, and do it. No problem. Wrong. The difficulty is that what if the simulation needs to keep one tape head stationary and move the next one to the right. Then, the next state should be:

The trouble is that to get to this global state is very expensive: it requires the simulator to move symbols. Since can be huge, this would be a very inefficient simulation.

What to do? The details are complex, but the high level idea is to be conservative. Why move everything? The tape heads might go back to where they were previously at the very next step, or they might continue to diverge—move further and further apart. The clever idea is to think of the linear storage as a series of blocks that double in size. Then, as the two heads move the simulator only moves at first a block of size , then a block of size , and so on. Further, it is necessary to organize the tape so that there are “slack” or empty blocks. These allow content to be moved into the slack area from neighboring blocks. Also to get obliviousness, one can schedule in advance block moves that may turn out to be unnecessary—if they are found to be redundant when the time comes, the simulator writes dummy characters.

The insight is that there is *no* bound on the cost of any simulation step: some take constant time, and others take time where is the size of the block that they need to move. But, the total cost over the whole simulation is bounded by the “average” cost of each step times the number of steps, which is shown to be

Very neat. Again see the papers for the details.

** Open Problems **

The obvious open problems are to try and improve these classic theorems. I conjecture that it should be possible to improve them, but I have no clear idea how to proceed. I hope that you can find a trick and program TM’s better than I can.

A final comment. I wonder what would have happened if the proofs, not the results, of these early TM theorems had become well known. The proofs both used the key idea of amortization that today plays such an important role in the theory of algorithms.

A related question is: are there tricks in complexity theorems that could be used in non-complexity situations? More bluntly: is there some fundamental programming trick that is used in complexity theory toady that could be used in other parts of theory or even computer science in general? I wonder.

The Pippenger-Fischer proof and its timing can be approached by considering a kind of sequence that I believe was studied by Axel Thue, before Turing was even born. Define:

Jag(1) = 1,0,-1,0

Jag(2) = 1,2,3,2,1,0,-1,-2,-3,-2,-1,0

Jag(3) = 1,2,3,4,5,6,7,6,5,4,3,2,1,0,-1,-2,-3,-4,-5,-6,-7,-6,-5,-4,-3,-2,-1,0

…and so on, with Jag(i) going out to +/-(2^i – 1).

Then the infinite sequence is:

Jag(1)Jag(2)Jag(1)Jag(3)Jag(1)Jag(2)Jag(1)Jag(4)Jag(1)Jag(2)…

The program of the 2-tape oblivious TM M’ can be written so that each “jag” simulates 1 step of the original k-tape TM M, with the jag describing the head motions on the first tape (which has 2k tracks, a data track and “slack track” for each tape of M). There are analogous motions on the second tape, and since the sequence is followed regardless of bit values (until the M’ sees that M has halted), M’ is oblivious. It remains to count the total sizes of the jags thru the t-th one, where we may assume t is a power of 2, t = 2^k, so the t-th jag is Jag(k+1). Then the contributions from all Jag(1), all Jag(2), all Jag(3).. are roughly equal, so the sum of steps is roughly 4t(k+1), which gives time O(t*log t) to simulate t steps of M.

Thue didn’t include the negative numbers, so I call the above the “two-way Thue sequence” :-). I’ve never written out the program of this M’ in full, and would love to know if someone has.

The circuits resulting from M’ embed the full binary tree, and so have exponential expansion. Whereas, the t^2-size circuits from the standard Savage space-time tableau simulation can be laid out on a square mesh, and so have only quadratic expansion. One of the improvements I’d be interested to see is obtaining t^{1+\epsilon} size with reasonable polynomial expansion.

This is O(t log t) simulation is very clever. But it seems to me that what it really reveals is that the Turing machine is the wrong model of computation if you want to prove strong time hierarchy theorems. If you adopt a RAM model with a simple programming language, then a universal program can simulate other programs with only constant overhead. Jones, Sudborough and Zalcberg, and Ben-Amram have shown that in such a model—which is the one that students are used to working with in the real world—there are constant-factor time hierarchy theorems, and even (1+epsilon)-factor hierarchy theorems. Their results are not well-known in the theory community, but they should be.

So, is the Turing machine model of computation really the right one for teaching computational complexity? Many students get the wrong impression that the O(t log t) overhead is something fundamental about simulation and the hierarchy theorem, but it’s really just an artificial limitation of the constant-tape TM.

The TM is certainly important historically. But in this case it obscures the central aspect of the hierarchy theorem—namely, simulation and bounded diagonalization—rather than making it clear.

Hi, Cris—

I actually devised a variant of the TM model that has constant-factor time overhead for getting k tapes down to

one. It has random-access (actually “tree-access”) to the input tape, for four “fingers” labeled a,b,c,d. Its computations consist of a series of finite transductions that read from [a,...,b] and write output to [c,...,d]. The cost for each is |b – a|plusm(max{a,b,c,d}), where m(a) stands for the time to address memory location a. This and several other nice robustness theorems hold for functions m(a) = a^{1/d}, where I think of d as the implicit dimensionality of the memory. But the memory itself is linear, thus embodying the notion that “locality is one-dimensional”. My papers on this model are SIAM J. Comput. 25, 1996 and STACS’94; the latter uses the number of simple finite transductions as a parallel time measure to characterize the NC^k classes. I’d be curious to know whether this helps restore some of the “honour” of the Turing machine architecture in the face of your comment…My “4t(k+1)” should be 2t(k+1)—though programming details might force each jag to be repeated once, a-la how “Jag(1)” is repeated in the standard linear-speedup-theorem proof, thus bringing back the “4”.

I really liked this post. It’s been a long time since I’ve actually seen a TM argument — are they still used in class? I was at the conference presentation where Pat made his famous “I program these for a living” comment. My recollection is that it was one of those Johns Hopkins conferences although we were also all together with him at a Computer Science Conference — early, when it was all abstracts and 15 minute presentations — session that same year where he took off on someone who was messing up a multi-tape TM proof.

Dick,

The proofs of these claims are not too lost. The recently published Arora-Barak complexity textbook proves the Hennie-Stearns theorem you cite (although, it does so in an appendix). It leaves the Pippenger-Fischer result as an exercise. Other books might mention these results, but I am not sure.

-Michael

Thanks. I do wish we could improve these results. I really do not think they are best. But I am stuck.

Homer/Selman goes through this in blow-by-blow detail. Many students here have seen these proofs.

s.

Arora-Barak covers the first theorem in detail in Chapter 1, not in an appendix.

Programming

non-deterministic Turing machines is that much harder…In this line, you may be interested in a recent result showing that any language accepted in NTIME(f(n)) can be accepted by a non-deterministic machine running in time f(n) and space \sqrt(f(n)). See

Hello.

What if, in a general theory of everything kind of way, some singular human conscious had used simple Yes(I should feel guilty) or No(I do not feel guilty) answers to answer every moral question that he could remember that had ever occurred in his life. In this way he could have become completely pure. He would have truly asked himself all the questions that could be asked of him. He would have emotionally evolved back to when he was a child.

What if he then assigned an ‘It doesn’t matter as life is only made to make mistakes’ label on every decision that he had analysed.

This would not make him God or the devil, but just very still and very exhausted. Anybody can do this but just for the purpose of this experiment lets say I have done this. Which I have.

There are no fears in me and if I died today I could deal with that because who can know me better than myself? Neither God or the Devil. I am consciously judging myself on ‘their’ level.

To make this work, despite my many faults, take ME as the ONLY universal constant. In this sense I have killed God and the Devil external to myself.The only place that they could exist is if I chose to believe they are internal.

This is obviously more a matter for a shrink more than a mathematician, but that would only be the case depending on what you believed the base operating system of the universe to be. Math / Physics / morals or some other concept.

As long I agreed to do NOTHING, to never move or think a thought, humanity would have something to peg all understanding on. Each person could send a moral choice and I would simply send back only one philosophy. ‘ LIFE IS ONLY FOR MAKING MISTAKES’.

People, for the purpose of this experiment could disbelief their belief system knowing they could go back to it at any time. It would give them an opportunity to unburden themselves to someone pure. A new Pandora’s box. Once everyone had finished I would simply format my drive and always leave adequate space for each individual to send any more thoughts that they thought were impure. People would then eventually end up with clear minds and have to be judged only on their physical efforts. Either good or bad. It would get rid of a lot of maybes which would speed lives along..

If we then assume that there are a finite(or at some point in the future, given a lot of hard work, there will be a finite amount) amount of topics that can be conceived of then we can realise that there will come to a time when we, as a people, will NOT have to work again.

Once we reach that point we will only have the option of doing the things we love or doing the things we hate as society will be completely catered for in every possible scenario. People will find their true path in life which will make them infinitely more happy, forever.

In this system there is no point in accounting for time in any context.

If we consider this to be The Grand Unified Theory then we can set the parameters as we please.

This will be our common goals for humanity. As before everyone would have their say. This could be a computer database that was completely updated in any given moment when a unique individual thought of a new idea / direction that may or may not have been thought before.

All that would be required is that every person on the planet have a mobile phone or access to the web and a self organising weighting algorithm biased on an initial set of parameters that someone has set to push the operation in a random direction.

As I’m speaking first I say we concentrate on GRAINE.

Genetics – Robotics – Artificial Intelligence – Nanotechnology and Zero Point Energy.

I have chosen these as I think the subjects will cross breed information(word of mouth first) at the present day optimum rate to get us to our finishing point, complete and finished mastery of all possible information.

Surely mastery of information in this way will lead us to the bottom of a fractal??? What if one end of the universes fractal was me??? What could we start to build with that concept???

As parameters, we can assume that we live only in 3 dimensions. We can place this box around The Milky Way galaxy as this gives us plenty of scope for all kinds of discoveries.

In this new system we can make time obsolete as it only making us contemplate our things that cannot be solved because up to now, no one has been willing to stand around for long enough. It has been holding us back.

All watches should be banned so that we find a natural rhythm with each other, those in our immediate area and then further afield.

An old clock watch in this system is should only be used when you think of an event that you need to go to. It is a compass, a modern day direction of thought.

A digital clock can be adapted to let you know where you are in relation to this new milky way boxed system.(x,y,z).

With these two types of clocks used in combination we can safely start taking small steps around the box by using the digital to see exactly where you are and then using the old watch to direct you as your thoughts come to you.

We can even build as many assign atomic clocks as are needed to individually track stars. Each and every star in the Milky Way galaxy.

I supposed a good idea would be to assume that I was inside the centre of the super-massive black hole at the centre of the galaxy. That would stop us from being so Earth centric.

We could even go as far as to say that we are each an individual star and that we project all our energies onto the super-massive black hole at the centre of the galaxy.

You can assume that I have stabilized the black hole into a non rotating perfect cube. 6 outputs /f aces in which we all see ‘the universe and earth, friends’ etc. This acts like a block hole mirror finish. Once you look it is very hard to look away from.

The 6 face cube should make the maths easier to run as you could construct the inner of the black hole with solid beams of condensed light(1 unit long) arranged into right angled triangles with set squares in for strength.

Some people would naturally say that if the universe is essentially unknowable as the more things we ‘discover’ the more things there are to combine with and therefore talk about. This is extremely fractal in both directions. There can be no true answers because there is no grounding point. Nothing for the people who are interested, to start building there own personal concepts, theories and designs on.

Is it possible that with just one man becoming conscious of a highly improbable possibility that all of universes fractals might collapse into one wave function that would answer all of humanities questions in a day? Could this be possible?

Answers to why we are here? What the point of life really is et al?

Is it possible that the insignificant possibility could become an escalating one that would at some point reach 100%???

Could it be at that point that the math would figure itself out so naturally that we would barely need to think about it. We would instantly understand Quantum theory and all.

Can anybody run the numbers on that probability?

I don’t think science or math could ever answer philosophical questions. Also, why center coordinate systems on the black hole? The only significance of that point in space is its extraordinary mass gradient.

Also, how high were you when you wrote this?

Note that Wolfgang Paul reported in 1982 (Information and Control 53:1-8) that the 2-tape Hennie-Stearns simulation can be improved at best to time O(n(log n)^{1/3}). (See also Paturi et al., Information and Computation 88:88-104.)

Joel,

Thanks for this pointer…dick