The Fischer-Pippenger Theorem, oblivious Turing Machines and a strange FOCS talk

Michael Fischer has made a huge impact on at least two quite separate areas of complexity theory. In the first part of his career he made important and seminal contributions to the basic understanding of Turing Machines (TM). This work is quite technically challenging and still stands as some of the deepest work done on the behavior of TM’s. While today we are more interested, often, in the behavior of general algorithms, the care and feeding of TM’s is important. In my opinion, there remain many nuggets to be mined regarding their behavior.

In the second part of his career, he switched to the area of distributed computing. He has made many important contributions to this area: both upper and lower bounds. Much of the work—especially in the beginning–was joint work with Nancy Lynch. It is interesting to note that Nancy started her career in recursion theory, before also switching to help create the theory of distributed systems. She was awarded the prestigious Donald Knuth Prize in 2007 for her work on distributed computing. I plan on talking more about that in the future.

Today I plan to talk about a wonderful theorem that is due to Mike Fischer and Nicholas Pippenger. Their result showed a tight relationship between time complexity of a TM and boolean complexity. The bound they proved three decades ago is still the best known. Amazing.

At the FOCS 1975, which was held that year at Berkeley, I had two great memories. One concerns discovering on checking into my room that I was sharing with Rich DeMillo, that there was another occupant sharing the room. There was a mouse racing around our room. A mouse. The meeting was being held in the old Claremont Hotel, and when we told the desk that we had cleverly trapped the mouse under a wastebasket, their response was, “yes, and what would you like us to do?” We, Rich and I, changed rooms—I believe the mouse stayed.

The other memory, from that FOCS, is one about Mike and the word “crock”. See this for the meaning if you are unsure what “crock” means, but I prefer not to directly explain it. That way we can keep our G rating for this blog.

Here is what happened. I was walking the hall when a talk entitled “Economy of Descriptions by Parsers, DPDA’s, and PDA’s” was about to start. I had no intention of going to this talk; it was great work no doubt, but it was not something I knew about, nor cared to know about. As I was trying to find a quiet place to think I ran into Professor Dr. Fischer. I was junior and he was senior. Mike told me that the next talk was on an important topic—one that he had worked on—and I should come hear it. So off we went and sat down in the middle of the hotel ballroom.

The talk was given by Matt Geller and his co-authors were: Harry Hunt, Thomas Szymanski, and Jeffrey Ullman. As soon as the talk started Mike proceeded to fall asleep. I figured that I could not leave in the middle of the talk, and since I had a blank pad and a pen, I started to try to solve some problem I was working on at the time. Talks can be a quite relaxing place to think, all the more true if you are not really listening to the speaker. I knew Matt well, but as I said the talk’s topic was not one I followed.

The talk ended and Matt proceeded to get the usual technical questions. He answered them in turn, and all seemed fine to the naïve observer—me. Somehow Matt then said something that hit a raw nerve with his co-authors. Ullman, recall he was one of the co-authors, raised his hand, and was called on by the session chair. Jeff said, “Matt did you just say X?” Matt immediately answered yes he had. With that Jeff said,

“That is a crock, and I want to completely disassociate myself from that.”

As soon as “crock” was uttered, Mike Fischer’s head popped up and he was fully awake. Is there some subconscious mechanism we have for certain words? Anyway, Mike quickly began to whisper to me what was going on? I started to try to explain the context the best I could while the session chair quickly pointed out that were on a coffee break.

The next thing that happened was unique. Matt was soon surrounded by a circle of experts and all his co-authors. They had a heated argument about was X true. The more Matt said the further, I noticed, that his co-authors moved away—the circle around him keep growing in size. I never did find out what the issue was.

By the way, Geller moved out of theory a few years later and announced that he was moving to Hollywood to become a screenwriter. He actually had some success with TV shows, and there on the screen from time to time was, screenplay by Matt Geller.

Let’s turn now to a wonderful theorem on TM’s.

Oblivious Turing Machines

A TM operates on an input ${x}$ of length ${n}$. The TM can move its heads based on the values of the input. However, we call a TM oblivious (OTM) if the motions of the heads are independent of the input ${x}$. Thus, the head motion can only depend on the length ${n}$ of the input, but not on the individual bits of the input.

Oblivious TM’s are important for several reasons. First, there is a strong connection between the time an OTM takes and the size of the boolean circuit that the machine computes. Second, there have been applications to cryptography over the years of OTM’s, the fact that the heads move the same way independent of the input bits is useful in certain information hiding schemes.

Suppose a TM accepts some language ${L \subseteq \{0,1\}^{*}}$. In a natural way there is a boolean function ${f_{n}}$, for each ${n}$, so that for all ${x \in \{0,1\}^{n}}$,

$\displaystyle f_{n}(x) = 1 \text{ if and only if } x \in L.$

By the boolean complexity of ${f_{n}(x)}$ we mean the size of the smallest general circuit that computes ${f_{n}(x)}$.

Theorem: Suppose that ${M}$ is a deterministic oblivious TM that runs in time ${T(n)}$. Then, the language it accepts has boolean complexity at most ${O(T(n))}$.

The famous Fischer-Pippenger Theorem is:

Theorem: Suppose that ${M}$ is a deterministic TM that runs in time ${T(n)}$. Then, there is a deterministic TM that accepts the same language as ${M}$ and is oblivious. More, it runs in time ${O(T(n)\log T(n))}$.

See this for a modern description of these results. Earlier in 1972 John Savage proved essentially the first theorem and the second with the weaker bound of ${O(T(n)^{2})}$ in his paper “Computational work and time on finite machines.” John’s paper is a very important early paper that had many cool ideas in it. I think it does not get as much attention as it deserves because John tried to make connections between “real” complexity measures and more theoretical ones. Yet it is an early important piece of work.

The Proof

I will not give a detailed proof of their theorem. The reference does a great job, but I do want to say something about how it works in general.

The central idea of simulating a TM and making the tapes oblivious is really a neat kind of amortized complexity argument. This was before the concept of amortized complexity was invented. Mike and Nick had to cleverly move the Turing heads in a fixed manner, but yet do an arbitrary computation.

I think the following may help explain what they needed to do, without getting into the details of Turing tapes. Consider the problem of sorting ${n}$ numbers

$\displaystyle x_{1}, \dots, x_{n}.$

There are many algorithms for sorting ${n}$ numbers in ${O(n \log n)}$ comparisons. However, the data motion is quite adaptive, that is what is stored where and when depends on the value of the data.

At a cost of only an extra ${O(\log n)}$ one can make the data motion independent of the data values. The trick is to use a sorting network, such as: Batcher odd-even mergesort, bitonic sort, or Shell sort. In a sorting network the basic primitive is swap: compare two data items and move the larger to the “bottom” position and the smaller to the “top” position. The order of which items to perform the swap operation on is fixed independent of the values of the items. Here is an example of a sorting network and a trace of how it works on a given input.

See this for details on sorting networks.

In a natural sense a sorting network is oblivious, since which data items are compared at each step are determined independent of the data values. Of course, it is possible to do even better with a more complex sorting network, but I will leave that for another day.

Open Problems

Improve the Fischer-Pippenger Theorem, or prove that it is optimal. A separate question is to improve the circuit simulation result. There is no reason that a TM that runs in time ${n^{100}}$ cannot always be computed by a linear size circuit. I have commented earlier that the famous Andrey Kolmogorov may have believed this. I have tried over the years to improve both these theorems with no success. I hope you are more clever and can see a new approach.

July 28, 2009 6:11 pm

If we drop the oblivious requirement from the Fischer-Pippenger Theorem, do we know if we can improve upon the log T(n) term?

July 28, 2009 9:34 pm

Not sure what you mean. They prove that time T goes to TlogT oblivious time. Then, TlogT oblivious time goes to circuit size O(TlogT).

If you mean can we avoid the step of going to oblivious and go directly to circuit size, the answer is unknown. But I think if this is what you mean it a great idea. If we could somehow avoid making the TM oblivious then perhaps could get a better bound.

2. July 28, 2009 11:39 pm

The proof given in the Vollmer book referenced is not actually based on the original Pippenger-Fischer paper (“Relations Among Complexity Measures”), but is rather based on the paper “The Network Complexity and the Turing Machine Complexity of Finite Functions”, by C.P. Schnorr. From what I can parse, Schnorr based his results off of lectures given by Fischer. Schnorr’s proof seems much simpler to me — the Pippenger-Fischer proof uses a recursive method that I can’t seem to unwind (while Schnorr’s paper gives a straightforward procedure).

Also, I think the above anonymous is referring to how efficient we can simulate TM’s if we drop the oblivious requirement. The T\log T simulation (without obliviousness) was first achieved by Hennie and Stearns in “Two-Tape Simulation of Multitape Turing Machines” (in 1966), and this implied a better deterministic time hierarchy theorem: that for f with g\log g=o(f), TIME(g) is strictly contained in TIME(f). This hasn’t been improved to my knowledge, so I think the answer is that the T\log T simulation (even without obliviousness) is the best we know.

It might also be worth mentioning that the oblivious T\log T simulation comes up in other areas, such as showing time-space lower bounds for SAT.

3. August 6, 2009 12:55 pm

When I introduce this theorem, I diagram the oblivious walk first:
Jag(1)Jag(3)Jag(1)Jag(7)Jag(1)Jag(3)Jag(1)Jag(15)Jag(1)… where

Jag(m) = 0->1->…->m->(m-1)->…->0->(-1)->…->(-m)->(-m+1)->…->0.

I think the one-sided version of this was employed by Axel Thue—I have to re-hunt the reference—so I call it the above the “two-way Thue walk.” Then I claim that n steps of the TM M can be done in n “jags”, and prove that n jags take O(nlogn) steps, with a small constant on the O. This top-down visual approach lets me shortcut the proof of the claim, especially useful when the class has 50-minute lectures and I’m anxious to segue from the bit-level underpinnings to the higher-level NP theory. For Cook’s Theorem I then use Schnorr’s O(nlogn) reduction directly to 3SAT from the same paper, though translating z = x NAND y to clauses (x v z) && (y v z) && (-x v -y v -z).

Savitch’s original O(n^2)-sized circuits have one important virtue: they can be easily stamped out on a 2-dim square mesh, which entails that the distance-d neighborhood of any cell has O(d^2) neighbors . The O(n log n) circuits—whose essence one can grasp by drawing the Thue walk and adding vertical lines connecting successive visits to cell j (for all j)—have exp(d)-sized neighborhoods. The question that most interests me is, suppose we allow a cubic mesh, or more generally allow O(d^m)-sized neighborhoods for general m > 2. Can we achieve circuit size better than O(n^2), such as O(n^{m/(m-1)), perhaps ignoring log factors? This says O(n^2) for m=2, O(n^1.5) for m=3, O(n^1.333…) for m=4, …, and yes O-tilde(n) for “m=infinity”.