Skip to content

Simulation of Nondeterministic Turing Machines-Again

May 14, 2009

A better better simulation of nondeterministic Turing machines


Bart de Keijzer just sent me a great comment. A great one. I was working on a continuation of my last post on the simulation of nondeterministic Turing machines (NTM). I have a additional insight into the simulation technology that was sketched there. For example, I now can prove,

Theorem: For a two-tape binary alphabet NTM with running time {t(n)}, there is a deterministic simulation with running time,

\displaystyle  (4 + \delta)^{t(n)}

for any {\delta>0}.

In the previous post, the constant was {36}. But while I was preparing that post I got a great comment from de Keijzer that proves a much better result that I missed:

Theorem: For a {k}-tape NTM with alphabet of size {a} and running time {t(n)}, there is a deterministic simulation with running time,

\displaystyle  O(a^{kt(n)}t(n)^{k+3}).

His Proof

Let {M} be a {k}-tape NTM running in time {t(n)} with tape alphabet of size {a}. A configuration is a description of the non-blank contents of the tapes, the position of the tape heads, and a state. The length of each tape can be assumed to be {t(n)}, therefore a configuration for {M} can be encoded as a string of length {k \cdot t(n)} of symbols from an alphabet of size {a} and at most {k \cdot \log t(n) +O(1)} additional bits. Thus, the total number of configurations is at most,

\displaystyle  N= O(a^{kt(n)} \cdot t(n)^{k}).

Let {t=t(n)}. The deterministic machine {M'} will construct a series of sets of configurations, {C_{i}} for {i=0,\dots,t}. Inductively, {C_{i}} is the set of all configurations that {M} can be in after {i} steps. Clearly, we only accept if some accepting configuration is in some {C_{i}}. Call the construction of {C_{i}} the {i^{th}} stage.

The set {C_{0}} consists only of the initial configuration of {M}. We will show how to go from {C_{i}} to {C_{i+1}} efficiently, and that will complete his proof. The machine {M'} can scan each configuration in {C_{i}} and output all the configurations that can be reached in one step. Call this set {D_{i+1}}.

We cannot use {D_{i+1}} as {C_{i+1}} since some configurations may occur more than once. So the machine {M'} sorts the set {D_{i+1}} and removes any duplicate configurations to form {C_{i+1}}.

That is how {M'} simulates {M}. The remaining issue is how much time do the stages take in total? Clearly, there are {t} stages, so the question is what is the cost of each stage. The first contribution to the cost is the pass over each configuration in {C_{i}} and outputting the next configurations to form {D_{i+1}}. This clearly takes at most

\displaystyle  O(t) \cdot | C_{i} |

time since {M'} can simulate a configuration for one step in linear time. The remaining is the time to sort to remove duplicates. This takes at most

\displaystyle  O(N \log N ) \cdot t.

Note, the machine {M'} has to sort {O(N)} objects of size {O(t)}. Thus, the total time for a stage is

\displaystyle  O(a^{kt} \cdot t^{k}) \cdot O(t^2).

Hence, the total time for the simulation is {O(a^{kt} \cdot t^{k+3}).}

Pretty neat.

Open Problems

The main question remains, can we prove the conjecture of the last post?

Conjecture: For any {\epsilon>0},

\displaystyle \mathsf{NTIME}(t(n)) \subseteq \mathsf{DTIME}(2^{\epsilon t(n)}).

By the way this is, in my opinion, what this blog is all about. I am thrilled that some of the questions I have raised are getting solved. I think what Bart noticed, may seem simple now, but was missed by all of us for years. Great job Bart.

18 Comments leave one →
  1. Bart de Keijzer permalink
    May 14, 2009 6:37 pm

    Thanks a lot for featuring my result!
    Let’s hope we will soon be able to prove the target conjecture.

    • rjlipton permalink*
      May 14, 2009 7:10 pm

      My pleasure. There is a difference that I will post on soon. Your idea uses exponential space and time. The trace based method only uses exponential time, since a trace is short. But I still like the result very much.

  2. Dave permalink
    May 14, 2009 7:51 pm

    I am missing some subtle detail in the description of this technique. It looks like a straightforward bread-first search of the configuration space to me, which is a standard way to argue that deterministic machines can simulate nondeterministic machines. Which detail am I missing that makes this different from a standard breadth-first search?

    In particular, your goal (as stated in the former post) is to avoid the dependence of the time bound on the number of states of M. But in this construction, you state

    “A configuration is a description of the non-blank contents of the tapes, the position of the tape heads, and a state.”

    followed immediately by

    “a configuration for M can be encoded as a string of length k * t(n) of symbols from an alphabet of size a and at most k * log t(n) +O(1) additional bits.”

    Presumably, the portion of the string describing the state constitutes some of the “O(1) additional bits”. Surely, once you have fixed a machine M, it takes “O(1)” bits to describe a state. But this “constant” still depends on the state set of M even if it does not depend on the input to M.

    I believe the second theorem you have stated, since the big-O hides the multiplicative constant (specifically, the constant is |Q| if the state set of M is Q) introduced by the state set of M. But I do not see how the second theorem implies the first, which has removed this constant entirely. I suppose one could use the Linear Speedup Theorem to argue that the constant disappears that way. But the machinery of the Linear Speedup Theorem is what is making the dependence on the state set of M disappear, not the simulation technique. Presumably your goal was to argue that a binary-alphabet, two-tape deterministic machine carries out the simulation of the first theorem, in which case you cannot apply the Linear Speedup Theorem, which increases the alphabet size.

    • rjlipton permalink*
      May 14, 2009 9:26 pm

      By using exponential space he runs in time only exponential in the number of configurations. That depends on the number of states, but the constant c in the main term c^kt does not depend on the number of states. That is the point.

      Also usually people think of an NTM as defining a search tree. In that case it will be too large. Now with the sort step it is not a search tree but a dag. That allows the better bound.

      I will have another post to go over all this in next day or so. Stay tuned for even better results–I hope.

  3. kintali permalink
    May 14, 2009 10:31 pm

    Correct me if I am missing something…

    Assuming k = 2 :
    – The way you defined the transition function allows the NTM to modify tape1 and tape2 in one step.
    – When you are counting the bits required to encode the tapes, you used the bound 2.t(n), implicitly assuming that each bit written on the tapes takes its own separate step.

    If we fix one of them, are we not saving the factor 2 in the exponent (at the expense of a multiplicative factor of t(n) in processing time for a stage) ? The extra multiplicative factor is to detect the end of tape1 and beginning of tape2.

    • rjlipton permalink*
      May 15, 2009 6:56 am

      Think of the method as a graph search. There are N vertices where N=4^t for two tapes. The path we are looking for is at most t long. Let D be the max degree, which is a function of the number of states of the machine.

      There is an algorithm that checks for such a path on a random access machine in time O(DNt) and space O(N). For each vertex just mark if reached yet etc. This is what he does.

      I am planning a last, for a while post, that will explain all and give a better bound based on the trace method.

  4. Viliam Geffert permalink
    May 15, 2009 4:17 am

    Your conjecture can be proved for simulation of
    single-tape nondeterministic machines
    by deterministic multitape machines.
    This is only a partial answer for your question,
    for the number of tapes k=1,
    not giving an answer for the general case
    (but better than nothing).
    More precisely:

    Theorem: Suppose that is a single-tape nondeterministic
    Turing machine M that runs (weakly) in time T(n).
    Then, there is a deterministic two-tape machine that simulates
    M in (strong) time at most
    2^{\epsilon * T(n)}.
    This holds for each T(n) the value of which can be computed
    in time O(T(n)) by some deterministic two-tape machine
    (it is time constructible),
    moreover, \lim_{n\rightarrow\infy} n/T(n) = 0
    (it grows faster than n).

    The argument is based on the following:

    Theorem 3.2 in
    [Viliam Geffert, A speed-up theorem without tape
    compression, Theoretical Computer Science 118 (1993) 49-65]:

    “For each single-tape nondeterministic Turing machine M
    using an H-letter worktape alphabet (H>2),
    weakly time-bounded by T(n),
    satisfying \lim_{n\rightarrow\infy} n/T(n) = 0,
    and weakly space-bounded by S(n),
    and each \epsilon>0,
    there exists an equivalent single-tape machine M’
    using an (H+l)-letter worktape alphabet,
    weakly time-bounded by T’(n) < \epsilon * T(n)
    and weakly space-bounded by S’(n) < S(n).”

    To clarify what “weak” and “strong” means here:

    (a) A nondeterministic Turing machine is said to be
    strongly T(n)-time bounded, if no computation path on any input
    of length n executes more than r(n) steps.
    (b) It is weakly T(n)-time-bounded if, for each accepted input of
    length n, there exists at least one accepting computation path which
    executes at most T(n) steps.

    Thus, by Theorem 3.2, adding a single additional symbol to the
    original worktape alphabet, we can achieve an arbitrary speed-up,
    T’(n) < \epsilon * T(n), for any \epsilon.
    The only restriction is that T(n) must grow faster than n.

    Now, by applying your technique to the machine M’ from Theorem 3.2,
    you can easily get the desired result.

    • Bart de Keijzer permalink
      May 15, 2009 7:23 am

      Oh, I see, that’s really interesting! Great
      Perhaps we can try to generalize it

      But I think it will be hard.
      Yeah, it requires a generalization of theorem 3.2, but also an even faster simulation method, I guess..

    • rjlipton permalink*
      May 15, 2009 7:55 am

      Very cool..rjlipton

  5. Bart de Keijzer permalink
    May 15, 2009 7:00 am

    I don’t entirely see your point I think. You write:

    > – When you are counting the bits required to encode the tapes, you used the
    > bound 2.t(n), implicitly assuming that each bit written on the tapes takes its own
    > separate step.

    I have the feeling that we are actually assuming that tape1 and tape2 can be modified in one single step. It’s simply that we need to describe the contents of both tapes.. could you please elaborate?

    • kintali permalink
      May 15, 2009 9:37 am

      I meant :
      – in time t(n) the combined length of the tapes is at most t(n) not 2.t(n)

  6. Dave permalink
    May 16, 2009 12:50 pm

    I’m sorry, no offense meant to Bart, but I really don’t think this argument is new. In my complexity theory course (we used the textbook “Computability and Complexity Theory”, by Homer and Selman), I was taught the following simulation argument:

    Let M = (Q, Sigma, Gamma, delta, s, q_a, q_r) be a k-tape NTM with running time t, where a = |Gamma| is the size of the tape alphabet. Simulate M(x) with a deterministic algorithm as follows. Let t = t(|x|). Conduct a breadth-first search (BFS) of the configuration space of M to t levels beginning at the initial configuration of M on input x, maintaining a set of nodes (i.e., configurations) already visited so that no node is visited twice, as is standard for any BFS or DFS implementation. The set of nodes discovered at the i’th level would roughly (see below) correspond to what Bart calls C_i.

    There are at most a^(k*t) * |Q| * t^k configurations possible to reach on input x; call this quantity N. One can do a BFS in O(N log N) time, using, for instance, using a red-black tree to store a list of already-visited nodes. Then the running time of the algorithm is O(a^(k*t) * |Q| * t^k * log (a^(k*t) * |Q| * t^k)) = O(a^(k*t) * |Q| * t^k * (k * t * log a + log |Q| + k * log t)). If we call |Q|, a, and k “constant with respect to t”, then we can swallow them into the big-O and the above bound is O(a^(k*t) * t^(k+1)). We can then use the Linear Speedup Theorem to eliminate the big-O constants and say, for instance, that a binary 2-tape NTM (a=k=2) can be simulated in time 4^t * t^3, or (4+delta)^t for any delta > 0, and then |Q| does not even appear as a multiplicative constant. But without the Linear Speedup Theorem – if you require that the machine doing the simulating has the same tape alphabet and number of tapes – then I think you are stuck with |Q| affecting the running time, if only as a multiplicative constant. (Not to mention that the BFS probably will take longer, like O(N^2) time since we don’t have random access memory to make the red-black tree fast.)

    The point is that at no time in the argument does |Q| ever appear in either the mantissa or the exponent of the exponential term a^(k*t). Maybe this point about the mantissa depending only on Gamma is not typically emphasized in an introductory complexity course. To make the analysis simpler, one might (see the proof of Theorem 5.11 in Homer and Selman’s textbook, for instance) state, “given k, a, and |Q|, there is a constant d such that for all t, d^t > a^(k*t) * |Q| * t^k”, and then you have a mantissa d that *does* depend on |Q|. But this is done to simplify the algebra for analysis, since the typical goal is simply to show an exponential-time simulation, and not to demonstrate that the exponential term can be made not to depend on |Q|. Homer and Selman in their Theorem 5.11 actually describe their algorithm as a quadratic-time depth-first search, but a similar analysis as above applies to their algorithm. The point I want to emphasize is that Homer and Selman definitely take into account the idea of not revisiting nodes, which is the key in their argument, the argument above, and Bart’s argument, in making the exponential term not depend on Q.

    As far as I can tell, Bart’s algorithm is a variant of the BFS algorithm that is somewhat less efficient than a garden-variety BFS, since Bart eliminates duplicates only within a level, but not between different levels. But unless there is something that I am missing (and please, someone point it out if you see it), not only was the argument already known, but it *is* the standard argument for showing that NTMs can simulate TMs (such as the proof of Theorem 5.11 in Homer and Selman), with no new simulation techniques required, but simply a more careful running time analysis of the existing technique.

    • Bart de Keijzer permalink
      May 16, 2009 2:34 pm

      Of course I would not like it if my idea has already been dicovered 😦
      Unfortunately I do not have access to Homer & Selman’s book right now, but I will look it up next monday at the library.

      To my knowledge, the idea of removing duplicate states among levels in BFS has not been considered. Saving configurations has only been used in DFS in cases when the runtime of the NTM is not known. But anyway, I’ll look into it.

      • Dave permalink
        May 16, 2009 3:10 pm

        Hey, I’ve made a rather busy private career so far out of proving things I thought were new and discovering that they aren’t, so don’t feel bad. 🙂

        Also, I could be totally wrong. Here is a photo of the proof in Homer and Selman so you can check for yourself:

        Like I said, they get sloppy with the time analysis at the end of the first paragraph since they aren’t explicitly trying to keep |Q| out of the exponential term. But I’m pretty sure if they would carry through the expression on the left hand side of the inequality then they’d get a time bound only quadratically worse than yours, with the number of states (denoted s in their proof) still outside of the exponential term.

    • rjlipton permalink*
      May 17, 2009 8:16 am

      The linear speedup theorem does decrease the time, but the alphabet blowups. So I do not see how that can help. I think the linear speedup proofs that I know will not help here.

      As for the rest I think that the idea of Bart is essentially a search of the DAG not the tree. That what I point out in my next post.

      We do now get a much better bound I believe than Selman or anyone previous. For two tapes binary alphabet get essentially (2.00001)^t say.

      The new proof is not just a simulation of the graph of the states. I need to use properties that are special to Turing machine tapes.


  1. Daily News About People : A few links about People - Thursday, 14 May 2009 18:04
  2. New Ideas On Nondeterministic Simulation « Gödel’s Lost Letter and P=NP
  3. How Fast Can We Simulate Guessing? « Gödel’s Lost Letter and P=NP

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s