The role of fads in computational complexity theory

Heidi Klum is of course not a computer scientist, but is a world famous model and the host of the hit TV show “Project Runway.” To paraphrase her: “In theory (fashion), one day you are in and one day you are out.”

What I would like to discuss today is the way that computational theory is driven, at least partly, by fads. According to the dictionary a fad is: a temporary fashion, notion, manner of conduct, etc., especially one followed enthusiastically by a group. What is in yesterday, may be out today.

In any science it is natural for topics to change over time. As progress is made, topics must evolve. This evolution may make old topics obsolete, and open up entire new ones, which will have their own questions. These questions may require new methods for further progress to be made, and so on. This type of change is natural and healthy.

I have two concerns, first, that important old ideas maybe tossed aside, forgotten. I will shortly give a list of topics that once were very much in favorite and are now out. This may be okay, but one consequence is that important earlier work is not generally known by today’s researchers.

Those who cannot remember the past are condemned to repeat it.

From George Santayana.

Second, I am concerned that the dominance of fads, in computational theory, could negatively effect the future. In my last post I discussed the reasons that Georg Cantor kept a low profile in naming his first paper, which proved that the reals are uncountable. He was afraid of the controversy his work had already stirred up. In that post, I raised a question: if his work was happening today, would he be able to get funded, get promoted, or get into top conferences? I wonder.

A Disclaimer

I want to say up front that I have worked on many of these “fads”, so am throwing stones at my work too. I do want to point out that there are areas that were once central to theory, and have gone out of favor. I think there are important results in some of these areas that we all should know, but many do not. Also we might ask ourselves, the next time we judge what is “in” and what is “out”, will our own area go out of favor someday?

What is a Blum Complexity measure? If you guessed it is a measure named after the famous Manny Blum, you are on the right track. In the late 1960’s Blum decided to generalize the notion of time and space resource measures. He did this by creating a simple pair of axioms that any “reasonable” complexity measure should have. More importantly he, and others, proved a number of interesting theorems about any resource measure that satisfied his axioms.

One of the big results in Blum complexity measure theory is that there are problems with no “optimal” algorithm. A basic example of this is:

Theorem: There is a recursive set so that given any algorithm for the set with running time ${T(n),}$ there is another algorithm for the set that runs in time ${\log T(n)}$ for large enough ${n}$.

There were much stronger results, I only give this as an indication of the type of results that were proved. This theorem and its cousins are called “Speed-Up Theorems.”

When I was a graduate student, Blum complexity theory was the rage. One half of the theory qualifier at Carnegie-Mellon, the year I took it, was a question of the form: consider a resource measure ${X}$, is it or is it not a Blum Complexity Measure? I had a very short proof that showed that the answer was no–the measure ${X}$ violated the speedup theorem. But I worried a bit that my answer could be wrong, since it was so short.

The beauty of the area was that the complexity measures were defined by two simple axioms and so the results applied equally to time or space or even other exotic measures. The reason the area became less important, I think, was not that the results were not beautiful, nor technically difficult, but that they applied only to artificial problems. Further, the focus of theory shifted to not what properties did time and space have in common, but how were they different. See this for Fortnow’s thoughts on this topic.

What is Program Schema Theory? If you guessed it is a theory of programs named after Schema you are wrong. It is the study of programs started in the late 1950’s by Iu Ianov, that attempted to avoid the undecidability of the Halting problem. The halting problem makes all interesting questions about programs undecidable. This is actually a theorem called Rice’s Theorem–roughly any non-trivial property that depends on the behavior of a program is undecidable. Alan Perlis called this phenomena the “Turing Tar-pit.” Alan’s point was whenever we try to do something general with programs, like the dinosaurs, we get stuck in the tar-pit.

The central idea of Ianov was to strip a program of all semantics except the flow structure. Thus, replace statements in programs like:

$\displaystyle x \leftarrow y + z^{2};$

by

$\displaystyle x \leftarrow f(y,g(z)).$

His idea was to try to prove properties that were true for all instances of the functions ${f(,)}$ and ${g()}$. Perhaps one could avoid falling into the Turing Tar-pit, if the analysis could avoid the subtleties of the addition and squaring operation.

The good news was that for very simple cases such as one variable and monadic operations all was decidable; the bad news was that anything more became undecidable. There were many very pretty results that were proved in this area, but again the central idea did not work exactly as planned.

One example of the work that was done is Zohar Manna thesis on program schema theory. It is a classic: it is relatively short, and very beautiful. I once looked at it as a student and thought, with great envy, I wish I could write a thesis like that.

What is a Balloon Automata? I hope you do not know the answer to this; if you do, for bonus points what is an AFL? The correct answer has nothing to do with American football. Balloon automata were an attempt to generalize the notion of finite automata, pushdown automata and others into one general package. The idea was reasonable: perhaps some theorems could be proved for whole classes of automata rather than having to prove them piecemeal. The difficulty was that the interesting questions tend to be special for each type of automata.

Perhaps some areas deserve to go away.

What is a Reducible Graph? These are a subset of all directed graphs. In a natural way, directed graphs can be used to model the information flow of a program. After the rise of structured programming, an important insight was that many real flow graphs were reducible. I will not give the definition here, but they had a beautiful inductive flavor. This inductive structure allowed algorithms that were slow on general directed graphs to be faster on reducible graphs, often even run in linear time.

This work was quite neat and had an interesting structural result called “The Ice Cream Cone Lemma.” I do not know why this is not more central today. However, see this for a modern treatment.

What is a Ultra Computer? In the 1980’s Jack Schwartz of NYU started a joint project with IBM to build the “ultra machine.” This was to be the ultimate in parallel computers, that’s why it was called the ultra machine. The design at a top level was a collection of processors that were connected to a collection of memory units via a network. The project included other clever ideas such as “combiners” that you can read about on your own, if you are interested.

There was a large amount of theory research into the structure of such machines. Quite a bit was done at NYU, but most papers were written by those at other institutions. One of the major theory questions was how to avoid collisions in the network from slowing down the machine: if many processors needed data from the same memories that would be a problem. Some clever methods were developed that used randomness and hashing techniques to attempt to avoid such bottlenecks.

Actually, a few leaders of the theory field, like Wolfgang Paul, left theory to go and actually build their designs. A huge lost for theory. I will have a post of Paul’s important theory work soon.

What is VLSI? In the 1980’s Carver Mead and Lynn Conway wrote a great book on Very Large Scale Intergration, that made it possible for mortals to design VLSI chips. We did just that at Princeton where I was at the time, and many other places did the same. A theory for the complexity of chips was created at Carnegie-Mellon by by H. T. Kung and his students. A typical theorem looked like this:

Theorem 2 Any chip that computes ${X}$ has ${AT^{2} \ge cn^{2}}$, for large enough ${n}$ and a constant ${c>0}$.

Here ${A}$ is the area of the chip, and ${T}$ is the time that the chip takes to compute the problem. For example, ${X}$ could be “the product of two ${n}$ bit numbers.”

All of these proofs were similar. First you showed that there had to be a geometric bisection of the chip into two parts, call them ${L}$ and ${R}$, which had a certain property. Second, one proved that ${L}$ and ${R}$ had to exchange many bits of information, in order for the computation to work. This was the special property. Finally, the lower bound on ${AT^{2}}$ followed by showing that either: (i) the bisection cut was large and hence ${A}$ was large, or (ii) the bisection cut was small and so ${T^{2}}$ had to be large since it would take a long time to move many bits across a small cut.

One lasting contribution was the creation of the notion of communication complexity. For example, nondeterministic communication complexity, I believe, was first introduced by Bob Sedgewick and myself in this context. We needed nondeterministic communication complexity to solve a particular chip lower bound question.

Unfortunately, one of the failings of this area is that the constant ${c}$ was tiny. See this earlier post for a comment on this.

What is WSI? It is wafer scale intergration. A company called Trilogy Systems was founded around 1980 to make WSI work in practice. They eventually spent over $230 million trying to succeed, although they eventually failed. So what is WSI? When VLSI chips are made they are fabricated on large wafers that contain many individual chips. The wafers are then cut into separate pieces, these pieces are tested, and the working ones are used. Trilogy’s idea was to avoid the whole cut apart and test operations. Rather, their idea was to keep the wafer whole and let software find the good chips, and then use software to route around the faulted chips. This sounded like a promising idea, but it did not work. There were many practical issues that made for the failure: the main one was the cost benefit of keeping the wafer intact was not large enough. Thus,$230 million was lost, when that was a large amount of money. Remember those days.

There were some very pretty theory questions that arose in the theory of WSI. For example, theoreticians like Charles Leiserson and Tom Leighton wrote a paper on the theory of WSI. The specific problem they studied is interesting in its own right. Roughly one has a square grid of devices and each can fail with some probability ${p>0}$, but they fail independently. The question is how large a set of devices can you construct that are connected by relatively short wires. The analysis was non-trivial and interesting.

What is a Decision Problem? Okay I sure you got this one right. The issue with this area is that it was still a “fad” that somehow went mostly out of favor. Two examples come to mind.

A long standing, very hard, open problem was given two deterministic pushdown automata, do they accept the same language? Leslie Valiant proved one of the first non-trivial results and showed that for a very special sub-case the answer was yes. His proof is a clever simulation of both pushdown machines by one pushdown machine. He showed that if the two machines stay “near” each other the simulation works; however, if they “drift” apart, then the simulation fails, but in this case they are not equivalent. Very clever. This appeared in a STOC. Yet later when the general problem was solved it was not to appear in STOC or FOCS.

The same thing happen with the work of Jérôme Leroux on the decision problem for Vector Addition Systems. I have discussed such systems in an earlier post, and must say that I am not unbiased since I worked on them. But, I cannot understand why the FOCS program committee would reject his work. He said that one of the three FOCS reviews was “I think that the author should find a conference that is better suited for this paper”.

Clearly, decision problems are out.

What is Univariate Complexity? Okay I think you know this one too. What you may not be aware is that this “fad” had methods that are still relevant today. It is a good example of Santayana’s warning.

In the early days of complexity theory we studied the complexity of univariate polynomials. There were a series of results on the complexity of a polynomial such as

$\displaystyle a(x) = a_{n}x^{n} + a_{n-1}x^{n-1} + \dots + a_{0} .$

A simple degree argument showed that no straight-line computation could compute such a polynomial in less that ${\log n}$, provided ${a_{n} \neq 0}$. But in general a polynomial required almost order ${n}$ steps.

Then, as today, the goal was to get lower bounds on explicit polynomials. The best that was proved then were results like this due to Volker Strassen.

Theorem: The polynomial,

$\displaystyle \sum_{k=0}^{n} 2^{2^{k}}x^{k}$

requires ${{\Omega}(n)}$ operations to compute.

His proof technology is worth recalling. His idea was to show that if a polynomial ${a(x)}$ could be computed fast, then its coefficients must lie on a hyper-surface

$\displaystyle S(a_{0},\dots,a_{n}) =0$

and moreover the surface had relatively small degree and height. Thus, he reduced the challenge to showing that there were explicit points that were not on the surface. The reason his theorem worked is the double exponential growth kept the points off the hyper-surface.

I cannot resist to mention that I worked in this area and proved this:

Theorem: There exist polynomials,

$\displaystyle \sum_{i=0}^{n} c_{i}x^{i}$

with ${c_{i} \in \{0,1\} }$ that require ${{\Omega}(n^{1/4}/\log n )}$ operations to compute.

This has been improved to $\Omega(\sqrt n/\log n)$ by Claus Schnorr. Note, in all these results arbitrary complex numbers are allowed in the computation.

Recently Ran Raz has proved some pretty results on multi-variate polynomials that are based on, I believe, similar technology to that of Strassen. Ran’s work goes way beyond the work that was done earlier, and he did his work without knowing anything about Strassen’s paper. I know this because I pointed out Strassen’s work to him when Ran visited Georgia Tech. Ran was very kind to reference our earlier work. However, it seems too bad that this earlier work was on the outs, and therefore was unknown to modern researchers.

Open Problems

This is a different type of post, so I guess there should be a different type of questions. How do make sure that results from old “fads” are not completely forgotten? I think there are lots of good ideas that have been lost to the community. Perhaps you agree? Let me know.

Also it may be interesting to note that several of the old “fads” had a similar goal: how could we find common structure among disparate concepts? This clearly applies to Blum measures, and Ballloon automata. Today we seem to focus completely on problems that are special to each area. Could we be missing the forest for the trees by not trying to unify things more? Clearly, such a unification must be done properly or we will lose the fine structure of complexity theory. But I think there could be something there to do.

Finally, this issue of “fads” has arisen in the community, so check out this to see what Michael Mitzenmacher has to say on a related issue.

1. April 22, 2009 4:01 pm

On reducible graphs: I guess a modern theory take are algorithms for bounded tree-width. For instance, people study constraint satisfaction problems on bounded tree-width graphs. [Thorup’97] shows that the control flow of structured programming languages is a graph of bounded tree-width.

April 22, 2009 5:34 pm

Although fads occur in many areas, I think perhaps they are more endemic to TCS because of its conference-focused publishing system. Because conference publications (in certain conferences) are given so much weight in TCS — even more than journal publications — accepting a paper to a conference is a much more important decision than it otherwise would be. This imposes more of a limitation on the number of accepted papers than there otherwise would be, which in turn leads PCs to accept mostly papers that are currently in vogue.

1) publications were released early on the arXiv/ECCC/etc.
2) conferences were for presenting new results to an audience (and the usual networking and getting together), but NOT for giving de facto credit
3) journal publications are what get you credit

then we could perhaps accept more papers to conferences, including ones that weren’t necessarily part of the latest, greatest (?) fads.

On another note: I don’t think we’re necessarily missing the forest for the trees, in that I don’t think there’s some Grand Unified Theory that will pull together lots of disparate research strands. But our complete inability to answer some of the most basic questions indicates that we’re definitely missing something. Our situation is more akin to knowing lots about the bark and leaves of the trees, but not realizing that the trees would die without their roots. Unfortunately, that analogy isn’t quite as pithy.

3. April 22, 2009 9:04 pm

Width Parameters Beyond Tree-width and their Applications …(Imagine, for instance, that someone would manage to prove NP-hardness of some …… Unfortunately, only weaker partial results in the case of branch-width 4 are known so far (for …… whose decomposition tree T is rooted at some vertex p0 = root(T). ….. This is defined in analogy to brambles of graphs [189]. [P=NP][EZZ=Musatov].