Using the weak trace method to get better simulations of nondeterministic Turing machines Rao Kosaraju is a world famous computer scientist who has worked on many aspects of the theory of computation. He has done seminal work on algorithms, on data structures, on geometry, on applications of theory to diverse areas, and on complexity theory. For example, he worked on, perhaps, the first “correct” proof of the Vector Addition Reachability Problem–see this for more details.

This is a continuation of my last two posts on the simulation of nondeterministic Turing machines (NTM). The last couple of days have been busy trying to get my head around what is going on with the major question: How fast can we simulate a non-deterministic Turing Machine (NTM)?

As I thought about Turing machines, I could not stop thinking about one of Rao’s beautiful results: his algorithm for finding the strong connected components (SCC) of a directed graph. I thought of this for two reasons: first, his major trick is a “reversal” of the edges of the graph, and second there is a time-reversal aspect to the history of his algorithm. See this for a nice exposition of how Rao’s algorithm works. Reversals play an important role, as you will see, in understanding how to simulate NTM’s.

As I understand, Rao discovered his algorithm in 1978, while the first algorithm that solved SCC in linear time was published in 1972 by Bob Tarjan. Our friends at Wikipedia get the time sequence a bit messed up: sometimes they point out that Bob’s algorithm is a variation of Rao’s. That is a bit strange, to call an earlier algorithm a variation of a later one. Bob was first. Rao was second. Rao’s algorithm has some advantages over Bob’s; for example, it is somewhat simpler than Bob’s. But there is no doubt that Tarjan’s algorithm came first.

The story of how Rao discovered his algorithm may help explain the confusion of who did what when. Is that clear: ”who did what when”–sounds a bit like a comment made at a Senate hearing.

Rao told me that one day he was eating lunch at the Johns Hopkins’ faculty club, which is a quite pleasant place to eat lunch, when he realized that he had forgotten his notes for his algorithms class. Right after lunch he had planned to present Tarjan’s SCC algorithm to this class. Rao was in trouble, since he could not remember Bob’s linear time algorithm. Rao’s office was too far away: by the time he got back to his office and got his notes, he would be very late to class. This was way before the Internet, so there was no way to lookup anything, which meant there was no way to get to his notes. No way to find the details of Tarjan’s algorithm.

What to do? Rao said he had a few minutes, so he sat down and worked out Bob’s linear time algorithm for SCC, “from memory”. He was relieved that he seemed to “recall” all the details, and just made it to his class on-time. There, he presented “Bob’s” linear time algorithm to his students, which went well. After class he went back to his office and finally looked at his notes. He was shocked to see that what he presented in class was an algorithm that was fundamentally different than Tarjan’s: Rao had “remembered” a new algorithm for SCC. Pretty neat. Perhaps, from time to time, we should all forget our notes before teaching?

I would like to say that is how I found the better simulation I discussed in my last post, or how I found the simple theorem I will discuss shortly. But, that would be incorrect. I also think Rao’s story shows something about his tremendous abilities that few have.

Summary of Results

Here is a summary of what is going on: Suppose that the NTM ${M}$ has ${Q}$ states, ${k}$ tapes, and ${a}$ distinct non-blank symbols for the tapes. Let the time the machine takes be ${t = t(n)}$. In the following I will summarize what I know about the simulation of ${M}$ by a deterministic Turing machine.

I will not be concerned with terms of the form ${t^{O(1)}}$ where ${O(1)}$ is absolute, since the dominant term is usually exponential in ${t}$. The exact powers of ${t}$ will be worked out in a paper.

Some of the results require a bit of care since the simulating machine is a deterministic Turing machine and not a random access machine. See Table 1.  ${\bullet }$ Classic Method Based on Tree Search: This uses time ${c^{t}}$ and polynomial in ${t}$ space. The value of ${c}$ is $(3Q)^k$. The idea here is to view the computation as a tree. Each node is a Turing machine configuration. The branching factor is based on the number of ways the finite state control can change and the possible motions of the tape heads. ${\bullet }$ Method Based on DAG search: This uses time and space ${Qc^{t}}$. The value of ${c}$ is ${a^{k}}$. The idea here is to view the computation as a directed acyclic graph (DAG). Each node is a Turing machine configuration. Since it is a DAG, the number vertices is roughly the number of possible tape configurations. This places ${Q}$ outside of the exponential term. See the post on Bart’s idea for more details. ${\bullet }$ Our Method Based on Weak Traces: This uses time ${Qc^{t}}$ and space that is polynomial in ${t}$. The value of ${c}$ is ${(3a)^{k}}$.

Note, if space must be polynomial in ${t}$ the weak trace method is best. If only time matters, then the DAG method is best. However, we have a new method that dominates all the above provided one is only interested in the exponential base of ${t}$. ${\bullet }$ A New Method Based on Weak Traces: This uses time and space ${d^{Q}c^{t}}$. The value of ${c}$ is ${(a+\epsilon)^{k/2}}$ where ${\epsilon > 0}$ is arbitrary and the value of ${d}$ depends on ${\epsilon}$.

The key to the last result is that we get ${a^{k/2}}$ as the base on the exponential term. This is a step toward the conjecture. If we could push this a bit further than I believe that we could solve the conjecture.

The high level view of the last result is this. First, we prove that we can reduce the number of reversals in a NTM without making any change to the running time. This yields a weak trace method that removes the factor of ${3}$ due to tape motion. The second idea is based on a simple observation: in a weak trace there is no reason to record the symbol left on a tape location that will not ever be re-visited. This symbol can be dropped. The way we exploit this is to argue that either (i) dropping these redundant bits yields a big improvement, or (ii) we can use a different strategy based on the DAG method to simulate the NTM.

This will all be discussed in detail in a future post–not the next one–enough on NTM for a few days. Finally, I am working with Subrahmanyam Kalyanasundaram and Farbod Shokrieh on a detailed paper on the new bound.

Reducing Reversals

A key lemma is the following reversal reduction lemma:

Lemma: Suppose that ${M}$ is a multi-tape NTM that runs in time ${t(n)}$. Then, for any ${\epsilon>0}$, there is another multi-tape NTM that simulates ${M}$ and runs in time $\displaystyle t(n) + O(1/\epsilon)$

and makes at most ${\epsilon t(n) + O(1)}$ reversals. Further, both machines have the same number of tapes and tape alphabet.

The point of this lemma is that the time of the simulation is essentially the same, yet the number of reversals is reduced by an arbitrary factor.

Open Problems

Thus, we have proved, for the case of two tapes and binary alphabet,

Theorem: For a two-tape binary alphabet NTM with running time ${t(n)}$, there is a deterministic simulation with running time, $\displaystyle (2 + \delta)^{t(n)}$

for any ${\delta>0}$.

Can we decrease the constant below ${2}$? I believe that this should be possible.

A last open problem for today: what happens with the simulation of a bounded error probabilistic Turing machine? Can we get the same type of efficient deterministic simulations?

1. May 16, 2009 7:10 pm

Kosarju’s story is great! Reminds me of the joke that floated around when Laci Lovasz taught a course on graph representation theory at University of Washington. Sometimes you could see that Laci was solving things on the fly and someone suggested we steal his notes and put in an open problem there. Then one could just copy off the solution from the board 🙂

2. May 17, 2009 5:59 pm

Can we decrease the constant below 2?

You will need non-relativizing techniques. See Theorem 0.7 here: http://dl.getdropbox.com/u/131739/lipton-conjecture.pdf

• May 18, 2009 7:01 am

I will read this carefully. But sounds like a great insight. The methods we are using to get the new bounds use special combinatorial properties of Turing machines that should be okay.

I did wonder about oracles, great that you proved it. I will be happy to highlight in future post.

• May 18, 2009 7:20 am

Thank you!

3. May 17, 2009 5:59 pm

Thanks for this post. I must say, I am curious about the details of this new approach.

4. May 18, 2009 8:25 am

Re. simulations of probabilistic Turing machines, Dieter van Melkebeek and I proved that for each language L in probabilistic time t on a multi-tape TM, there is a constant c < 2 such that L can be decided in deterministic time c^t. Our proof technique is an adaptation of the Nisan-Zuckerman idea of recycling random bits, and does not relativize (since it makes use of the tape structure of multi-tape TMs).

• May 18, 2009 12:13 pm

I apologize, I missed referencing this. Will fix in the future. It is very neat work.

• May 18, 2009 5:02 pm

I don’t understand why your argument about the tape structure of the TM does not relativize. You use the property that given a configuration C of a TM, if the TM has k tapes and alphabet B, then letting d=2*(k+1)*log(B), only d*s bits of C can affect the next s steps of the TM, due to the fact that the tape heads cannot move far enough to read more bits of the configuration in s steps. As you state in the paper, this statement is not true for configurations of RAM machines, but it it is true for oracle TMs with an arbitrary oracle. In fact, I can’t tell any part of your proof that does not relativize, except possibly the error-reduction technique you invoke at the very end (due to Karp, Pippenger, Sipser). I don’t understand that technique so I don’t know.

Here is a (seemingly) related question for anyone reading. It is not true that relative to every oracle, the palindrome language takes time Omega(n^2) to decide on a 1-tape machine (for instance, if the palindrome language is the oracle then it takes time n to decide). But is it true that for every oracle A, there exists a language decidable in time O(n) relative to A on a 2-tape machine but not decidable in time less than Omega(n^2) relative to A on a 1-tape machine? I tentatively conjecture yes, viewing this as akin to the idea that for some oracles A, CLIQUE is not NP^A-complete, but relative to every oracle A, there exists an NP^A-complete language (it just may not be CLIQUE).

• May 21, 2009 12:36 pm

Dave: If the contents of the oracle tape are taken to be part of the configuration, then the entire contents of the oracle tape are relevant to what happens when the oracle call is actually made (note that the oracle call conventionally takes just one time step to answer). So I don’t understand what you mean when you say that the statement is true for oracle TMs with an arbitrary oracle.

• May 21, 2009 9:53 pm

The contents of the oracle tape are part of the configuration in the sense that they are needed to compute a next configuration given a current one. But those contents are the same throughout the computation no matter what configuration it is (which is not true of the other tapes that comprise the configuration), and I believe this is relevant. More precisely, the following statement formalizes (and relativizes to an arbitrary oracle) your statement in the proof that “there can be at most 2^(ds) such functions”.

Let A be an arbitrary oracle. Let M with state set Q be a k-tape OTM with a binary tape alphabet. Let s be a natural number and let b = k*(2s+1)*|Q|. There is a set F (depending on M and A) of boolean functions of s inputs, where |F| <= 2^b, such that the following holds. Let C be a configuration of M. F contains the function f that maps the final s random coin flips to the answer M^A outputs s steps after configuration C. The existence of F follows from the fact that the current state and the 2s+1 bits surrounding the tape head on each of the k tapes, uniquely identify which function f maps the next s random bits to the answer, even if they do not suffice to compute it (knowledge of A is necessary for that).

F depends on both A and M. Changing A could change which functions F contains, but it cannot increase the size of F beyond 2^b. If I read your proof correctly, it depends only on the fact that the number of functions in F is "small" in some sense, not on which functions it contains. This is why I think this portion of your proof, and in particular inequality (6) that employs it, remain true relative to an arbitrary oracle.

5. 6. 