A nondeterministic Turing machine can be simulated faster than searching its configuration graph

Subrahmanyam Kalyanasundaram is one of my current graduate students, and is working on various problems in complexity theory. He is—as I have said before—one of the reasons these discussions are half as coherent as they are. I have no idea what I would do without his help.

Today I want to make a short announcement about a paper Subruk, Ken Regan, Farbod Shokrieh, and I have finally written. The paper is on the problem: How fast can a nondeterministic Turing machine (NTM) be simulated by a deterministic Turing machine? I have discussed this problem before, and we now have a draft of the paper.

Besides stating our theorem, I want to discuss some possible extensions and conjectures. When I was a beginning researcher I read most of Albert Meyer’s papers; they were always interesting and usually quite deep. I noticed Albert used the phrase “strong conjecture,” so I asked him one day: what is the difference between a strong conjecture and a conjecture? He explained that his papers had three types of statements:

• Theorems and Lemmas: These were statements he could prove.
• Conjectures: These were statements he believed were true, but could not prove.
• Strong Conjectures: These were statements he was sure were true, but did not have the time to work out all the details.

I think whenever Albert said “strong conjecture” he was always right.

A Sale: ${50\%}$ Off

I wish I could report a terrific result, a shockingly great simulation; what we do have is a significant improvement over the best known simulations. Here is the main theorem—see the full paper for the details:

Theorem: Any ${k}$-tape nondeterministic Turing machine with tape alphabet size ${a}$ that runs in time ${t(n)}$ can be simulated by a deterministic Turing machine in time

$\displaystyle a^{0.5 \cdot kt(n) + o(t(n))}.$

The theorem in the paper has a more precise bound, but the key information is all here. The whole point of this theorem is the factor ${0.5}$ in the exponent, a ${50\%}$ improvement over the known methods. I could have written the key expression as

$\displaystyle a^{\mathbf{\Phi} \cdot kt(n) + o(t(n))}$

where ${\Phi = 0.5}$.

The standard way to simulate an NTM yields ${\Phi=1}$; thus, we have improved the deterministic time by a square root. The usual method notes that there are about

$\displaystyle S \approx a^{kt(n)}$

total configurations of an NTM running in time ${t(n)}$. These configurations form a natural directed graph with ${S}$ vertices, and so a deterministic simulation need only search this graph. Our theorem uses the special structure of this graph—it comes from a Turing machine—to reduce the search cost from ${S}$ to ${\sqrt{S}}$.
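To make the standard ${\Phi=1}$ simulation concrete, here is a toy sketch in Python: breadth-first search over the configurations of a one-tape NTM, where a configuration is a (state, head position, tape contents) triple. The machine, its states, and its transition relation are invented purely for illustration; a real simulation would not, of course, store configurations explicitly like this.

```python
from collections import deque

# A made-up 1-tape NTM: it nondeterministically writes bits moving right,
# then may turn around and accept iff the last two written symbols are "11".
# delta maps (state, symbol) -> list of (new_state, write, move) choices.
delta = {
    ("q0", "_"): [("q0", "0", +1), ("q0", "1", +1), ("check", "_", -1)],
    ("check", "1"): [("check2", "1", -1)],
    ("check2", "1"): [("accept", "1", 0)],
}

def ntm_accepts(t, tape_len=4):
    """BFS over all configurations reachable within t steps."""
    start = ("q0", 0, ("_",) * tape_len)          # (state, head, tape)
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        (state, head, tape), steps = frontier.popleft()
        if state == "accept":
            return True
        if steps == t:
            continue
        for ns, w, mv in delta.get((state, tape[head]), []):
            new_tape = tape[:head] + (w,) + tape[head + 1:]
            nh = head + mv
            if 0 <= nh < tape_len:
                conf = (ns, nh, new_tape)
                if conf not in seen:              # visit each configuration once
                    seen.add(conf)
                    frontier.append((conf, steps + 1))
    return False
```

The `seen` set is exactly what makes the cost proportional to the number of configurations ${S}$ rather than the number of computation paths, which is the point of the configuration-graph method.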

Some Extensions

I will follow Albert Meyer and make two conjectures about our theorem. They are not strong conjectures, but I do think there is a good chance they can be proved with our technology.

Conjecture Suppose ${M}$ is a ${k}$-tape nondeterministic Turing machine with tape alphabet size ${a}$ that runs in time ${t(n)}$. Then, there is a deterministic Turing machine that runs in time

$\displaystyle a^{\mathbf{0.5} \cdot kt(n) + o(t(n))}$

and computes the exact number of accepting computations of ${M}$ on a given input ${x}$ of length ${n}$.

My co-authors and I think our proof can be extended to compute the number of accepting computations, not just determine whether there is one. The reason it is only a conjecture and not a strong conjecture is that there are some subtle details that need to be checked carefully. Of course the interest in this result is the application to the simulation of other complexity classes such as #P and PP.

In a different direction I think we should be able to use the famous quantum algorithm of Lov Grover to prove the following:

Conjecture Any ${k}$-tape nondeterministic Turing machine with tape alphabet size ${a}$ that runs in time ${t(n)}$ can be simulated by a quantum Turing machine in time

$\displaystyle a^{\mathbf{0.25} \cdot kt(n) + o(t(n))}$

with bounded error probability.

Note that the exponent is half of the deterministic exponent. This is exactly what one would expect if Grover’s method can be applied here.
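One back-of-the-envelope way to see where the ${0.25}$ comes from (a heuristic, not the argument of any proof): if the deterministic simulation can be cast as an unstructured search over roughly ${N = a^{0.5 \cdot kt(n) + o(t(n))}}$ candidates, then Grover search over ${N}$ items takes ${O(\sqrt{N})}$ steps, and

$\displaystyle \sqrt{a^{0.5 \cdot kt(n) + o(t(n))}} = a^{0.25 \cdot kt(n) + o(t(n))}.$

The real difficulty, of course, is showing that the deterministic simulation actually has the unstructured-search form Grover’s method requires.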

Open Problems

The obvious problem is to improve the main simulation, that is, to decrease ${\Phi}$. Is it possible to prove a much better simulation? I think so; perhaps ${\Phi}$ can be any positive number. Also, are either or both of the stated conjectures true? Are they strong conjectures?

P.S. Happy Earth Day.

1. April 22, 2010 3:28 pm

I don’t know about your conjecture (or guess?) that it can be done for any positive number. The simulation problem is severely hard under the Exponential Time Hypothesis (ETH). In fact, with reservation for annoying complications, I believe that a positive answer to that conjecture would imply FPT = W[1] (at least non-uniformly?), which may or may not be a stronger statement.

April 22, 2010 6:02 pm

First, I do not believe ETH is so likely. We cannot prove anything like it, and even P not equal to NP does not imply ETH. So I am not sure this is a good argument against the conjecture.

Second, I believe ETH for 3-SAT and so on does not even contradict the conjecture. If NTM(t) is in deterministic time exponential in o(t), this does not contradict ETH. Recall that even a sparse 3-SAT instance needs O(n log n) bits to encode, so as long as the o(n) in the exponent is not too small there is no contradiction. Right?

• April 23, 2010 3:28 am

You are free to believe what you want about ETH. I just wanted to point it out.

The connection is not quite direct, but goes via parameterized complexity. To explain it requires a few more details. (Big post ahead…)

I guess you’re familiar with the parameterized complexity classes FPT and W[1].

The NTM halting problem, with one tape, unbounded alphabet, and the number of steps as parameter, is complete for W[1].

By Yijia Chen, Martin Grohe: An Isomorphism Between Subexponential and Parameterized Complexity Theory (http://dx.doi.org/10.1137/070687153), there is something called the miniaturization mapping, which forms a bijection between parameterized and subexponential-time complexity. A (parameterized) problem P can be solved in subexponential time (in the parameter) if and only if the miniaturization of P is FPT. It also preserves reducibility between problems (with subexponential-time reduction families on the subexponential time side, FPT-reductions on the FPT side).

The preimage of the above-specified arbitrary-alphabet NTM problem under the miniaturization mapping is the same problem with parameter t*log a, if t is the number of steps and a the size of the alphabet.

This can be shown directly, but it also follows from the tools of Yijia Chen, Jörg Flum: Subexponential Time and Fixed-parameter Tractability: Exploiting the Miniaturization Mapping (http://dx.doi.org/10.1093/logcom/exn029). Succinctly put, the Linear Speedup Theorem constitutes an (alphabet size)-distillation for the NTM problem.

Of course, having arbitrary alphabet and parameter t*log a is more or less equivalent to having a binary alphabet and using t’=O(t*log a) time steps.

So if the problem you describe can be solved in time poly(instance size)*2^o(t), then it can be solved in time poly(instance size)*f(t) for arbitrary alphabet, for some function f(t), and this implies FPT=W[1] (and, e.g., an f(k)*poly(n)-time algorithm for the k-clique problem).

Like I said, FPT=W[1] may or may not be a weaker assumption than ETH. If FPT=W[1], then ETH is false, but the converse is not known to be true. (The image of 3-SAT under the miniaturization mapping is an annoying class called M[1] which lies between FPT and W[1].)

There is a reduction implicit in all of this, but it has the form of a subexponential-time reduction family: for any parameter l, it uses time f(l)*2^(n/l) to reduce 3-SAT to a binary-tape NTM halting problem with t=f(l)*n steps, and if we take l to be an arbitrary constant, then we can solve 3-SAT in arbitrarily low exponential time if the NTM problem can be solved with a 2^o(t) dependency on t.

Reservation: I do not know if this exactly covers your original statement. Could it be possible that for every epsilon, there is an algorithm with running time poly(instance size)*2^(epsilon*t), but that there is not even a non-uniform subexponential-time algorithm? This issue always confused me.

April 23, 2010 6:42 am

Thanks for the detailed comment. These conjectures, ETH is a conjecture too, just show how little we know about our basic complexity models. I am, of course, unsure what to believe, but I am open to thinking the conventional wisdom could be wrong on many of these conjectures. Time will hopefully let us see what is really true.

Thanks again.

2. April 22, 2010 4:50 pm

The halving of the exponent works because t = t(n) is the overall running time—it does not apply to a separate measure u(n) of the number of nondeterministic bits used. Achieving t(n)/2 seems to require “balancing away” some of the gains in the previous posts, while other aspects of them reappear. A technical matter in one of the simulations being balanced is that the computation is divided into blocks of size order sqrt(t). Different patterns of tape-head movements in each block are amortized against the number of uniquely-visited cells that were written to, with the effect that a potential 3^{kt} term for nondeterminism in head movements gets absorbed. This process, however, conflates paths that end a block with the same head positions but different head movements within it—and this is why counting the number of accepting paths of the original machine does not fall out so easily.

One overall interpretation is that nondeterminism in writing characters is essentially the only kind that matters—nondeterminism in next-state choices or head movements has a lower-order effect. The latter kinds do not even change the base of the exponent from “a” to “a+\delta” as in the previous post. Of course some sources make string-writing nondeterminism the only kind. We also get a uniform simulation (meaning an arbitrary NTM N is also given as input to the DTM) with essentially the same time bounds, because the k-tapes-to-2 overhead just goes into the lower-order term. We could express the uniform time as a function of n’ = n + the program size of N, but with the latter bounded essentially by 3a^2 q^2\log(q), the time expressed as a function of n’ does not “look” notably smaller.

3. April 23, 2010 8:00 am

If your algorithm can simulate a TM with an execution graph of size S in time \sqrt{S}, isn’t that a non-quantum version of Grover’s algorithm?

April 23, 2010 10:51 am

Stephan: This is not a general graph-search algorithm. Here we exploit the fact that the graph is the configuration graph of an NTM. It is not really a non-quantum version of Grover’s algorithm.

April 23, 2010 10:14 am

Forgive me for being slow. I looked at the paper. It’s very technical and I could not, on one reading, figure out what the “trick” was in getting the square root reduction.

Furthermore, I would have thought before reading it that, by definition, in order for a deterministic Turing machine to simulate a nondeterministic Turing machine, the deterministic Turing machine would have to trace the whole graph defined by the configurations of the nondeterministic Turing machine, which couldn’t possibly take a square root of the size of the graph.

What am I missing?

April 23, 2010 10:50 am

Craig: Let me try to explain the “trick”. We have two algorithms. One is the classic BFS on the configuration graph. The other is a trace method, in which the various computation paths of the NTM are enumerated using the head movements and the tape symbols seen by the heads at each time step. It is well known that the BFS takes a^{kt} time. The trace method is optimized to run in a similar time bound as well.

But the two approaches are inherently different. If the NTM uses more than kt/2 tape cells (note that in time t, with k tapes, the maximum possible tape usage is kt), then the trace approach gives us better bounds, and if it uses fewer than kt/2, then the BFS gives us better bounds. This is what I would call the trick: balancing these two cases gets the exponent down to kt/2.
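As a cartoon of this balancing (my toy cost model, not the precise bounds from the paper): if the NTM visits s tape cells in total, take the BFS cost to be about a^s and the trace cost to be about a^{kt-s}. Running whichever is cheaper, the worst case over s lands exactly at the midpoint:

```python
# Toy model of the balancing argument.  If the NTM visits s tape cells
# in total (0 <= s <= kt), model the BFS cost as a^s and the trace cost
# as a^(kt - s).  The simulator runs the cheaper method, so its exponent
# on an input with space usage s is min(s, kt - s).
def simulation_exponent(kt: int, s: int) -> int:
    return min(s, kt - s)

def worst_case_exponent(kt: int) -> int:
    # Adversarial choice of space usage s: the maximum of the minimum
    # is attained at s = kt/2, giving exponent kt/2.
    return max(simulation_exponent(kt, s) for s in range(kt + 1))
```

So in this model `worst_case_exponent(kt)` is kt/2 (rounded down), which is the shape of the a^{kt/2} bound.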

April 23, 2010 1:26 pm

I notice that the paper says on page 7:

“We make the following observation: the last time we visit a location in the NTM tape, we need not write any character there. This is because the tape head is not going to return to that position. If the NTM visits at least kt/2 locations on all tapes together, then there are at least kt/2 locations visited for a last time. Now, when we consider block traces, we do not need to have a symbol to write down, if we are visiting a tape location for a last time. We could potentially save on a factor of a^{kt/2} on the running time. This brings down the main factor in the running time in Theorem 4 to a^{kt/2} as well.”

But if we don’t write the symbol for at least kt/2 locations, how is this a simulation? A deterministic simulation of a nondeterministic Turing machine should emulate all aspects of it, not just the parts that have any bearing on whether or not it accepts or rejects a path. What am I missing?

• April 23, 2010 4:50 pm

MW—great query. I haven’t gotten the details to line up (halting for NTM vs. ATM, the effectiveness detail, what exactly the parameter is…) yet this afternoon, but am continuing with it…

Craig—good relevant point. Exactly what it means to “simulate” can be subjective, but your issue comes out completely objectively in the stated open problem about counting the exact number of accepting computations of the original machine.

Subjectively we could reply that when the symbol written isn’t subsequently read, its identity is not really part of the behavior of the original machine. Grayer is the case of the original NTM being regarded as computing a “multivalued function”, where potentially all a^t guessed strings could be legal outputs. If the deterministic machine were required to cycle thru all a^t outputs to constitute a “simulation”, then indeed we don’t get the “t/2”. However, in section 5 we require the DTM only to find some legal output.