A better better simulation of nondeterministic Turing machines

Bart de Keijzer just sent me a great comment. A great one. I was working on a continuation of my last post on the simulation of nondeterministic Turing machines (NTM). I have a additional insight into the simulation technology that was sketched there. For example, I now can prove,

Theorem: For a two-tape binary alphabet NTM with running time ${t(n)}$, there is a deterministic simulation with running time,

$\displaystyle (4 + \delta)^{t(n)}$

for any ${\delta>0}$.

In the previous post, the constant was ${36}$. But while I was preparing that post I got a great comment from de Keijzer that proves a much better result that I missed:

Theorem: For a ${k}$-tape NTM with alphabet of size ${a}$ and running time ${t(n)}$, there is a deterministic simulation with running time,

$\displaystyle O(a^{kt(n)}t(n)^{k+3}).$

His Proof

Let ${M}$ be a ${k}$-tape NTM running in time ${t(n)}$ with tape alphabet of size ${a}$. A configuration is a description of the non-blank contents of the tapes, the position of the tape heads, and a state. The length of each tape can be assumed to be ${t(n)}$, therefore a configuration for ${M}$ can be encoded as a string of length ${k \cdot t(n)}$ of symbols from an alphabet of size ${a}$ and at most ${k \cdot \log t(n) +O(1)}$ additional bits. Thus, the total number of configurations is at most,

$\displaystyle N= O(a^{kt(n)} \cdot t(n)^{k}).$

Let ${t=t(n)}$. The deterministic machine ${M'}$ will construct a series of sets of configurations, ${C_{i}}$ for ${i=0,\dots,t}$. Inductively, ${C_{i}}$ is the set of all configurations that ${M}$ can be in after ${i}$ steps. Clearly, we only accept if some accepting configuration is in some ${C_{i}}$. Call the construction of ${C_{i}}$ the ${i^{th}}$ stage.

The set ${C_{0}}$ consists only of the initial configuration of ${M}$. We will show how to go from ${C_{i}}$ to ${C_{i+1}}$ efficiently, and that will complete his proof. The machine ${M'}$ can scan each configuration in ${C_{i}}$ and output all the configurations that can be reached in one step. Call this set ${D_{i+1}}$.

We cannot use ${D_{i+1}}$ as ${C_{i+1}}$ since some configurations may occur more than once. So the machine ${M'}$ sorts the set ${D_{i+1}}$ and removes any duplicate configurations to form ${C_{i+1}}$.

That is how ${M'}$ simulates ${M}$. The remaining issue is how much time do the stages take in total? Clearly, there are ${t}$ stages, so the question is what is the cost of each stage. The first contribution to the cost is the pass over each configuration in ${C_{i}}$ and outputting the next configurations to form ${D_{i+1}}$. This clearly takes at most

$\displaystyle O(t) \cdot | C_{i} |$

time since ${M'}$ can simulate a configuration for one step in linear time. The remaining is the time to sort to remove duplicates. This takes at most

$\displaystyle O(N \log N ) \cdot t.$

Note, the machine ${M'}$ has to sort ${O(N)}$ objects of size ${O(t)}$. Thus, the total time for a stage is

$\displaystyle O(a^{kt} \cdot t^{k}) \cdot O(t^2).$

Hence, the total time for the simulation is ${O(a^{kt} \cdot t^{k+3}).}$

Pretty neat.

Open Problems

The main question remains, can we prove the conjecture of the last post?

Conjecture: For any ${\epsilon>0}$,

$\displaystyle \mathsf{NTIME}(t(n)) \subseteq \mathsf{DTIME}(2^{\epsilon t(n)}).$

By the way this is, in my opinion, what this blog is all about. I am thrilled that some of the questions I have raised are getting solved. I think what Bart noticed, may seem simple now, but was missed by all of us for years. Great job Bart.

May 14, 2009 6:37 pm

Thanks a lot for featuring my result!
Let’s hope we will soon be able to prove the target conjecture.

May 14, 2009 7:10 pm

My pleasure. There is a difference that I will post on soon. Your idea uses exponential space and time. The trace based method only uses exponential time, since a trace is short. But I still like the result very much.

May 14, 2009 7:51 pm

I am missing some subtle detail in the description of this technique. It looks like a straightforward bread-first search of the configuration space to me, which is a standard way to argue that deterministic machines can simulate nondeterministic machines. Which detail am I missing that makes this different from a standard breadth-first search?

In particular, your goal (as stated in the former post) is to avoid the dependence of the time bound on the number of states of M. But in this construction, you state

“A configuration is a description of the non-blank contents of the tapes, the position of the tape heads, and a state.”

followed immediately by

“a configuration for M can be encoded as a string of length k * t(n) of symbols from an alphabet of size a and at most k * log t(n) +O(1) additional bits.”

Presumably, the portion of the string describing the state constitutes some of the “O(1) additional bits”. Surely, once you have fixed a machine M, it takes “O(1)” bits to describe a state. But this “constant” still depends on the state set of M even if it does not depend on the input to M.

I believe the second theorem you have stated, since the big-O hides the multiplicative constant (specifically, the constant is |Q| if the state set of M is Q) introduced by the state set of M. But I do not see how the second theorem implies the first, which has removed this constant entirely. I suppose one could use the Linear Speedup Theorem to argue that the constant disappears that way. But the machinery of the Linear Speedup Theorem is what is making the dependence on the state set of M disappear, not the simulation technique. Presumably your goal was to argue that a binary-alphabet, two-tape deterministic machine carries out the simulation of the first theorem, in which case you cannot apply the Linear Speedup Theorem, which increases the alphabet size.

May 14, 2009 9:26 pm

By using exponential space he runs in time only exponential in the number of configurations. That depends on the number of states, but the constant c in the main term c^kt does not depend on the number of states. That is the point.

Also usually people think of an NTM as defining a search tree. In that case it will be too large. Now with the sort step it is not a search tree but a dag. That allows the better bound.

I will have another post to go over all this in next day or so. Stay tuned for even better results–I hope.

May 14, 2009 10:31 pm

Correct me if I am missing something…

Assuming k = 2 :
– The way you defined the transition function allows the NTM to modify tape1 and tape2 in one step.
– When you are counting the bits required to encode the tapes, you used the bound 2.t(n), implicitly assuming that each bit written on the tapes takes its own separate step.

If we fix one of them, are we not saving the factor 2 in the exponent (at the expense of a multiplicative factor of t(n) in processing time for a stage) ? The extra multiplicative factor is to detect the end of tape1 and beginning of tape2.

May 15, 2009 6:56 am

Think of the method as a graph search. There are N vertices where N=4^t for two tapes. The path we are looking for is at most t long. Let D be the max degree, which is a function of the number of states of the machine.

There is an algorithm that checks for such a path on a random access machine in time O(DNt) and space O(N). For each vertex just mark if reached yet etc. This is what he does.

I am planning a last, for a while post, that will explain all and give a better bound based on the trace method.

May 15, 2009 4:17 am

Your conjecture can be proved for simulation of
single-tape nondeterministic machines
by deterministic multitape machines.
for the number of tapes k=1,
not giving an answer for the general case
(but better than nothing).
More precisely:

Theorem: Suppose that is a single-tape nondeterministic
Turing machine M that runs (weakly) in time T(n).
Then, there is a deterministic two-tape machine that simulates
M in (strong) time at most
2^{\epsilon * T(n)}.
This holds for each T(n) the value of which can be computed
in time O(T(n)) by some deterministic two-tape machine
(it is time constructible),
moreover, \lim_{n\rightarrow\infy} n/T(n) = 0
(it grows faster than n).

*****
The argument is based on the following:

Theorem 3.2 in
[Viliam Geffert, A speed-up theorem without tape
compression, Theoretical Computer Science 118 (1993) 49-65]:

“For each single-tape nondeterministic Turing machine M
using an H-letter worktape alphabet (H>2),
weakly time-bounded by T(n),
satisfying \lim_{n\rightarrow\infy} n/T(n) = 0,
and weakly space-bounded by S(n),
and each \epsilon>0,
there exists an equivalent single-tape machine M’
using an (H+l)-letter worktape alphabet,
weakly time-bounded by T’(n) < \epsilon * T(n)
and weakly space-bounded by S’(n) < S(n).”

*****
To clarify what “weak” and “strong” means here:

Definition:
(a) A nondeterministic Turing machine is said to be
strongly T(n)-time bounded, if no computation path on any input
of length n executes more than r(n) steps.
(b) It is weakly T(n)-time-bounded if, for each accepted input of
length n, there exists at least one accepting computation path which
executes at most T(n) steps.

*****
original worktape alphabet, we can achieve an arbitrary speed-up,
T’(n) < \epsilon * T(n), for any \epsilon.
The only restriction is that T(n) must grow faster than n.

Now, by applying your technique to the machine M’ from Theorem 3.2,
you can easily get the desired result.

May 15, 2009 7:23 am

Oh, I see, that’s really interesting! Great
Perhaps we can try to generalize it

But I think it will be hard.
Yeah, it requires a generalization of theorem 3.2, but also an even faster simulation method, I guess..

May 15, 2009 7:55 am

Very cool..rjlipton

May 15, 2009 7:00 am

I don’t entirely see your point I think. You write:

> – When you are counting the bits required to encode the tapes, you used the
> bound 2.t(n), implicitly assuming that each bit written on the tapes takes its own
> separate step.

I have the feeling that we are actually assuming that tape1 and tape2 can be modified in one single step. It’s simply that we need to describe the contents of both tapes.. could you please elaborate?

May 15, 2009 9:37 am

I meant :
– in time t(n) the combined length of the tapes is at most t(n) not 2.t(n)

May 16, 2009 12:50 pm

I’m sorry, no offense meant to Bart, but I really don’t think this argument is new. In my complexity theory course (we used the textbook “Computability and Complexity Theory”, by Homer and Selman), I was taught the following simulation argument:

Let M = (Q, Sigma, Gamma, delta, s, q_a, q_r) be a k-tape NTM with running time t, where a = |Gamma| is the size of the tape alphabet. Simulate M(x) with a deterministic algorithm as follows. Let t = t(|x|). Conduct a breadth-first search (BFS) of the configuration space of M to t levels beginning at the initial configuration of M on input x, maintaining a set of nodes (i.e., configurations) already visited so that no node is visited twice, as is standard for any BFS or DFS implementation. The set of nodes discovered at the i’th level would roughly (see below) correspond to what Bart calls C_i.

There are at most a^(k*t) * |Q| * t^k configurations possible to reach on input x; call this quantity N. One can do a BFS in O(N log N) time, using, for instance, using a red-black tree to store a list of already-visited nodes. Then the running time of the algorithm is O(a^(k*t) * |Q| * t^k * log (a^(k*t) * |Q| * t^k)) = O(a^(k*t) * |Q| * t^k * (k * t * log a + log |Q| + k * log t)). If we call |Q|, a, and k “constant with respect to t”, then we can swallow them into the big-O and the above bound is O(a^(k*t) * t^(k+1)). We can then use the Linear Speedup Theorem to eliminate the big-O constants and say, for instance, that a binary 2-tape NTM (a=k=2) can be simulated in time 4^t * t^3, or (4+delta)^t for any delta > 0, and then |Q| does not even appear as a multiplicative constant. But without the Linear Speedup Theorem – if you require that the machine doing the simulating has the same tape alphabet and number of tapes – then I think you are stuck with |Q| affecting the running time, if only as a multiplicative constant. (Not to mention that the BFS probably will take longer, like O(N^2) time since we don’t have random access memory to make the red-black tree fast.)

The point is that at no time in the argument does |Q| ever appear in either the mantissa or the exponent of the exponential term a^(k*t). Maybe this point about the mantissa depending only on Gamma is not typically emphasized in an introductory complexity course. To make the analysis simpler, one might (see the proof of Theorem 5.11 in Homer and Selman’s textbook, for instance) state, “given k, a, and |Q|, there is a constant d such that for all t, d^t > a^(k*t) * |Q| * t^k”, and then you have a mantissa d that *does* depend on |Q|. But this is done to simplify the algebra for analysis, since the typical goal is simply to show an exponential-time simulation, and not to demonstrate that the exponential term can be made not to depend on |Q|. Homer and Selman in their Theorem 5.11 actually describe their algorithm as a quadratic-time depth-first search, but a similar analysis as above applies to their algorithm. The point I want to emphasize is that Homer and Selman definitely take into account the idea of not revisiting nodes, which is the key in their argument, the argument above, and Bart’s argument, in making the exponential term not depend on Q.

As far as I can tell, Bart’s algorithm is a variant of the BFS algorithm that is somewhat less efficient than a garden-variety BFS, since Bart eliminates duplicates only within a level, but not between different levels. But unless there is something that I am missing (and please, someone point it out if you see it), not only was the argument already known, but it *is* the standard argument for showing that NTMs can simulate TMs (such as the proof of Theorem 5.11 in Homer and Selman), with no new simulation techniques required, but simply a more careful running time analysis of the existing technique.

May 16, 2009 2:34 pm

Of course I would not like it if my idea has already been dicovered 😦
Unfortunately I do not have access to Homer & Selman’s book right now, but I will look it up next monday at the library.

To my knowledge, the idea of removing duplicate states among levels in BFS has not been considered. Saving configurations has only been used in DFS in cases when the runtime of the NTM is not known. But anyway, I’ll look into it.

May 16, 2009 3:10 pm

Hey, I’ve made a rather busy private career so far out of proving things I thought were new and discovering that they aren’t, so don’t feel bad. 🙂

Also, I could be totally wrong. Here is a photo of the proof in Homer and Selman so you can check for yourself: http://dl.getdropbox.com/u/131739/homer-selman.JPG

Like I said, they get sloppy with the time analysis at the end of the first paragraph since they aren’t explicitly trying to keep |Q| out of the exponential term. But I’m pretty sure if they would carry through the expression on the left hand side of the inequality then they’d get a time bound only quadratically worse than yours, with the number of states (denoted s in their proof) still outside of the exponential term.