A study in collaboration

Eric Allender, Luke Friedman, and William Gasarch have written a paper that shows why it is good for a student to be exposed to experts in various areas. Luke is Eric’s student. Both Eric and Bill can individually be called “experts in various areas,” but here the symmetric difference of their area clouds seems to have been more important than their intersection. Eric is particularly known as an expert in Kolmogorov complexity, and has led a longstanding project to relate various versions of it to computational randomness. Bill has training in logic and recursion theory, and is adept at proof techniques that work by meeting—and sometimes injuring—infinitely many requirements in some priority order. The paper features a new result in Kolmogorov complexity obtained by meeting such requirements.

In such cases, what is a student to do? I (Ken) say learn the symmetric difference—then you don’t have to worry about just keeping up with either advisor.

Bill has a paper published in the Journal of Symbolic Logic, a logic journal which I (Dick) find extremely impressive. The paper is also impressive: it is joint with Stephen Fenner and Brian Postow, the latter when a student of the late and much-loved Carl Smith. (I, Ken, also have a paper with Smith and Postow, one well outside my areas.)

Bill of course also co-authors the famous blog Computational Complexity with Lance Fortnow. Blogging promotes his research by exposing him more to other areas, such as Eric’s. Eric has not written for any blogs—not even a guest post can we find. Non-blogging promotes his research by giving him more time to write papers that involve other areas, such as Bill’s.

Today I (Dick) wish to talk about their paper itself, which will appear at ICALP 2011. It has the interesting, but general, title, “Limits on the Computational Power of Random Strings.” While its title is general I believe it is a terrific, important, and potentially seminal paper. As usual see the paper for the full statements of the theorems, and for all the details. What I want to do is just give a flavor of their results—while Ken wants to take a larger view on the problems they are trying to solve. Hopefully the symmetric difference of what we brought to this post will still be clear enough to motivate you to read their paper carefully, and even write the next paper.

Random Strings

Recall the set ${R}$ of random strings comprises those strings ${s}$ that have no shorter description than just ${s}$ itself. This elides a technical issue—but essentially it is correct. A string ${s}$ can be described in many ways, and it is in ${R}$ if no way of describing ${s}$ is much better than presenting the string itself. This notion was created in 1965 in a short note by the great Andrei Kolmogorov. In an article on this notion, Bruno Durand and Alexander Zvonkin said:

As was common with him, Kolmogorov published, in 1965, a short note that started a new line of research.

Very impressive. Perhaps we should all publish more short notes—alas probably only someone like Kolmogorov can create whole areas this way.

There now is a large body of results on Kolmogorov random strings. They have been used in many areas of theory, including lower bounds, proof theory, and protocols. There are two basic results from this area. First, a string chosen uniformly at random is likely to be in ${R}$. Second, determining whether a string is in ${R}$ is uncomputable. The former follows from a simple, almost trivial, counting argument; the latter follows essentially from the undecidability of the halting problem.

The Main Problem

Bill, Eric, and Luke study the set of Kolmogorov random strings ${R}$ from this point of view: How powerful is the ability to access an oracle with questions of the form ${x \in R}$?

Previously results were known of the form:

$\displaystyle \text{(some complexity class)} \subseteq \text{(some other complexity class)}^{R}.$

The classes are ${\mathsf{P}}$, ${\mathsf{NP}}$, ${\mathsf{BPP}}$, among others. The left-hand sides were always computable complexity classes, but the right-hand sides were not obviously even computable. This follows since the right-hand side uses access to ${R}$, and ${R}$ is of course well known to be uncomputable. Three prior such results by Eric and/or his frequent co-authors (not Bill—indeed this is their first collaboration), using ${\mathsf{P}_{tt}^X}$ to mean the class of languages that polynomial-time truth-table reduce to ${X}$, are:

$\displaystyle \begin{array}{rcl} \mathsf{BPP} &\subseteq& \mathsf{P}_{tt}^R\\ \\ \mathsf{PSPACE} &\subseteq& \mathsf{P}^R\\ \\ \mathsf{NEXP} &\subseteq& \mathsf{NP}^R \end{array}$

Since the right-hand sides have power beyond what is computable, however, are these inclusions really meaningful? What one would like are equalities, giving new characterizations of these classes, which would quantify the algorithmic power of the random strings.

A simple idea is to intersect the right-hand sides of these inclusions with the class ${\mathsf{DEC}}$ of decidable sets. As one might expect, this yields no bound better than ${\mathsf{DEC}}$ itself: for any computable time bound ${t(n)}$ there is a set ${A \in \mathsf{DEC} \setminus \mathsf{DTIME}[t(n)]}$ that reduces to ${R}$, so arbitrarily complex decidable sets remain.

However, they observed that the manner of building such sets ${A}$ always exploited a technical feature of how ${R}$ is defined, namely the choice of a particular universal Turing machine. This means that what was previously seen as an “academic” issue was mattering concretely to complexity classes. Well, let’s approach it as an academic issue, along the lines of today’s “postmodern” liberal arts culture.

The “Postmodern Problem” of Information Theory

As described at the outset by Wikipedia, Postmodernism is characterized by the revelation that apparent absolutes are based on relative constructs, especially regarding the language by which they are conveyed.

Now if there is one quantity in Computer Science that we feel should be absolute, surely it is the measure of information. After all, in many human languages, computer science itself is called some word form of Informatics. In computational complexity theory, we have by-and-large agreed that the multitape Turing machine is the model that provides the “standard meter.” The maximum number of steps taken by a program on inputs of a given length ${n}$ is a unique way to quantify the time complexity of the program. All Kolmogorov-complexity measures of information, however, are relative to a particular choice of a universal Turing machine ${U}$ by which to define the measure:

$\displaystyle K(x) = K_U(x) = \min\{|s|: U(s) = x\}.$

It is ironic that the machine model is absolute for time complexity, but the machines are themselves the relative element for ${K(x)}$. In normative terms, one might expect a need to choose the best ${U}$, but the theory doesn’t do this. Instead it shows that the choice affects ${K(x)}$ only up to a fixed additive constant ${c_U}$ depending only on ${U}$. Results in the theory either work the same for any particular ${U}$, or are asymptotic in a way that the constant ${c_U}$ doesn’t matter.

In this recent contest to find short universal Turing machines, the first concrete items under “Goals and Motivations” involve Kolmogorov complexity, next to a photo of Kolmogorov himself. They reference John Tromp’s “utterly simple computer model” and this paper by Jean-Paul Delahaye and Hector Zenil trying to find a “stable” definition. Delahaye and Zenil remark that the range of values of ${c_U}$ arising from natural ${U}$ is “so large that in practice it plays a major role” in concretely quantifying ${K(x)}$. One feels that how much information a string contains should be independent of choices within the machine model.

For Eric, Luke, and Bill this is a problem because they don’t really have a single set ${R}$. Instead what the theory gives them is a multitude of sets

$\displaystyle R_U = \{x: K_U(x) \geq |x|\}.$

The particular point is that the constant ${c_U}$ which is hidden in this definition can matter to infinitely many strings. This is not just a finite difference in the sets ${R_U}$. The program ${U}$ may have a special feature exploited to give infinitely many strings ${x}$ a description less than ${|x|}$, though greater than ${|x| - c_U}$. The difference of having or not having infinitely many strings can affect even asymptotic complexity results. They found that this was causing the power problem even after intersecting with ${\mathsf{DEC}}$.
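To see how the choice of machine can bite on infinitely many strings, here is a toy illustration of our own—these are not genuine universal machines, and this is not the paper’s construction. The first description language has only a literal rule, so ${K_1(x) = |x| + 1}$; the second adds a shortcut for strings of the form ${0^n}$, describing them in roughly ${\log n}$ bits. Every long all-zeros string is then “random” for the first machine but compressible for the second, so the two sets of random strings differ on infinitely many strings.

```python
# Two toy description languages (an illustration of machine-dependence,
# not real universal TMs). Descriptions are bit strings:
#   U1: '1' + x        -> the literal string x
#   U2: '1' + x        -> the literal string x
#       '0' + bin(n)   -> the string '0' * n  (a shortcut for runs of zeros)

def K1(x: str) -> int:
    # U1 has only the literal rule, so the best description is '1' + x.
    return 1 + len(x)

def K2(x: str) -> int:
    best = 1 + len(x)                               # the literal description
    if x and set(x) == {'0'}:                       # shortcut applies to 0^n
        best = min(best, 1 + len(bin(len(x))) - 2)  # '0' + binary digits of n
    return best

# 0^64 is incompressible for U1 but highly compressible for U2, and the
# same holds for 0^n for every large n: the "random" strings of U1 and U2
# differ on infinitely many strings, not just finitely many.
x = '0' * 64
print(K1(x), K2(x))   # 65 versus 8 (= 1 + the 7 binary digits of 64)
```

The constant ${c_U}$ of invariance theory guarantees ${|K_1(x) - K_2(x)|}$ is bounded, but as the example shows, a bounded difference can still flip membership in ${R_U}$ for infinitely many strings.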

Here we note the technical issue elided above. Given a seed string ${s}$ for ${U(s)}$, how do we know whether ${s}$ is describing itself or commands to produce some other string? We can disambiguate by adding a leading or trailing quote symbol, like ' or quote in Lisp. Adding a trailing quote symbol produces a prefix code, since ${s}$'${y}$ is illegal for all non-empty strings ${y}$. This is handled more naturally without introducing a quote symbol by demanding that the domain of ${U}$ be a prefix code to begin with. Thus ${K_U(x)}$ is technically defined by restricting to universal machines ${U}$ that have this property.
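The prefix-code condition is easy to state algorithmically: no description in the domain of ${U}$ may be a proper prefix of another. Here is a small sketch of our own (illustrative names), together with Kraft’s inequality, which every binary prefix code must satisfy:

```python
def is_prefix_code(codewords: list[str]) -> bool:
    """True iff no codeword is a proper prefix of another.
    After sorting, any prefix immediately precedes one of its extensions,
    so checking adjacent pairs suffices."""
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def kraft_sum(codewords: list[str]) -> float:
    """Sum of 2^-|w|; Kraft's inequality says this is at most 1
    for any binary prefix code."""
    return sum(2.0 ** -len(w) for w in codewords)

# {'0', '10', '110'} is a prefix code; adding '1' breaks it,
# since '1' is a proper prefix of '10'.
print(is_prefix_code(['0', '10', '110']))        # True
print(is_prefix_code(['0', '10', '110', '1']))   # False
print(kraft_sum(['0', '10', '110']))             # 0.875, within Kraft's bound
```

Kraft’s inequality is also why the prefix version ${K_U}$ behaves well probabilistically: the descriptions form the leaves of a binary tree, so ${\sum_x 2^{-K_U(x)} \leq 1}$.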

Their Solution and New Results

Looking for a “best” ${U}$ would not solve the problem. Every individual ${U}$—even with the prefix-code restriction—still makes arbitrarily complex decidable sets reduce to ${R_U}$, even under the strongly restrictive notion of disjunctive truth-table (${dtt}$) reducibility. Instead they take all the ${U}$‘s by the horns by intersecting over them.

One might think to define ${R = \cap_U R_U}$ with the idea that truly random strings aren’t helped by any ${U}$. However this set is empty—every string has a ${U}$ that embeds it and outputs it from a short command. The distinctive insight, which Eric and company actually had from an earlier paper, is to intersect the reduction classes. This gave a characterization of ${\mathsf{P}}$ via the ${dtt}$ reducibility:

$\displaystyle \mathsf{P} = \mathsf{DEC} \cap \bigcap_U \mathsf{P}_{dtt}^{R_U}.$

This is accomplished by turning the problem on its head: for every decidable language ${L}$ not in ${\mathsf{P}}$, we can find a ${U}$ such that ${L}$ does not poly-time ${dtt}$-reduce to ${R_U}$. This excludes ${L}$ from the intersection, leaving just ${\mathsf{P}}$ on the right-hand side. This already is very neat, an upside-down kind of diagonalization where you build a machine rather than defeat one. But how to diagonalize against the stronger reductions, even against ${\mathsf{NP}}$ oracle machines?

Enter Bill and inter Luke, in-between. Here is one sentence that shows the kind of intellectual proof complexity needed:

We break this requirement up into an infinite number of requirements.

Then each new requirement gives rise to a game, which is another tool of logicians. The games body-check the requirements in priority order and sometimes injure them—since Atlanta is about to lose its ice hockey team, this will have to substitute. As before, please see the paper for full details.

Even so, they are only able to give upper and lower bounds—not quite matching for a characterization. Here are their two main results:

$\displaystyle \begin{array}{rcl} \mathsf{BPP} &\subseteq& \mathsf{DEC} \cap \bigcap_U \mathsf{P}_{tt}^{R_U} \subseteq \mathsf{PSPACE}\\ \\ \mathsf{NEXP} &\subseteq& \mathsf{DEC} \cap \bigcap_U \mathsf{NP}^{R_U} \subseteq \mathsf{EXPSPACE}. \end{array}$

The amazing thing about both the old and the new results is that they seem like they should not hold. Why should intersecting over machines ${U}$ be the key step in getting a complexity class? The authors express some doubt about whether the “unnatural” diagonalizing ${U}$ they construct should really be allowed. However, to exclude them by fiat would add unnatural conditions to the definition of Kolmogorov complexity itself. We (Dick and Ken) add that it is satisfying to get natural uniform complexity classes out of the intersection, at least as bounds on it.

This last note about uniform classes speaks to possible larger significance. Any computable pseudo-random generator will have to have range disjoint from any ${R_U}$. Since ${R_U}$ is large, it makes an effective statistical test to distinguish the range of the generator from the uniform distribution. However, if ${L}$ is a language in a class to which we wish to apply de-randomization, but there is a ${U}$ such that ${L}$ does not reduce to ${R_U}$, then we cannot simulate the effect of this statistical test. Thus the upper bounds in the main theorems represent limitations on the size of complexity classes whose relationships can be affected by uniform hardness assumptions.

We also wonder, if the upper bound on the second main theorem can be improved to ${\mathsf{NEXP}}$ so that it matches the lower bound, whether that can help make further progress on lower bounds for ${\mathsf{NEXP}}$ itself.

Open Problems

Eric, Luke, and Bill list as open questions whether one can remove the “${\mathsf{DEC} \cap}$” term from the middles of the main results, and whether one can replace ${K_U(x)}$ by the original Kolmogorov complexity notion ${C_U(x)}$ which doesn’t require ${U}$ to implement a prefix code. There is one other “game-changing” trick in their proof which we have omitted here, namely changing the oracle from ${R_U}$ to a related set, and it is open how far this matters.

Of course their main open question is, can the ${\mathsf{BPP}}$-to-${\mathsf{PSPACE}}$ and ${\mathsf{NEXP}}$-to-${\mathsf{EXPSPACE}}$ bounds be tightened? What about ${\cap_U \mathsf{P}^{R_U}}$, does it equal ${\mathsf{PSPACE}}$?

Is there a better way to judge the true power of random strings? Can this help for general lower bounds?

[changed “and” to “and/or” after Eric for three results; some format fixes.]

1. June 2, 2011 5:09 pm

To be sure, Complexity Theory has its own “postmodern problems.” Only the running time of a program is well-defined, not that of an algorithm. By Blum’s Speed-Up Theorem, some problems are known to have no best program, hence no well-defined complexity. The theory is largely asymptotic anyway. Yet it has been one of the most scientifically important achievements of the last half-century, with P=?NP among the most important mathematical open problems.

“Postmodern problems” weren’t unknown to the ancient philosophers either. Heraclitus around 500 BC is the first known to argue that there ought to be a unique Logos underlying natural and human science the way logic serves for mathematics. But even granting one, Heraclitus himself wrote that man would be unlikely to understand it concretely.

I would define a small-ell logos as “a means by which truth and other quantities are determined.” In this sense, a particular choice of universal TM serves as a logos for the definition of Kolmogorov complexity—at least it is needed to define the value K(x), even though the value is almost always uncomputable. Thus the problem of finding a unique definition for information content is analogous to the classical philosophical problem of finding the unique Logos for the world, which Christianity and various traditions answer in one way, while science tries to answer in another.

June 2, 2011 6:26 pm

Great post. Great (post-)comment.

• June 6, 2011 12:33 pm

There’s an assertion in the Allender/Friedman/Gasarch article that reads:

It [is] not clear that it [is] even meaningful to speak about eﬃcient reductions to an undecidable set.

It would have been very helpful if the article had given examples of “decidable” versus “undecidable” sets.

E.g., is the set of even integers decidable? Is the set of random integers decidable? Is the set of Turing machines in $P$ decidable? For me, the answers are “yes”, “no”, and “it’s hard to say” … and yet the answers to this class of questions seemingly are of considerable importance in complexity theory …

… and that’s why this particular Gödel’s Lost Letter column (for me) would be substantially enhanced by a discussion (with examples?) of what complexity theorists mean by “decidable sets.”

2. June 4, 2011 8:11 am

Wow… although some spots are still obscure, I can say I grasped the essence of both the problem and the proof.

Great proof technique and great job explaining it. I see great potential in this line of research, perhaps even answering the great one.

3. June 11, 2011 1:40 pm

I have a question regarding the definition of what is a random string. To avoid choosing any particular universal Turing machine, couldn’t we simply define the complexity of a string as the smallest possible number of states, such that there exists a Turing machine with this number of states?

4. June 11, 2011 1:45 pm

In my previous comment it should be “a Turing machine with this number of states which outputs the string.”

June 12, 2011 2:19 pm

@Lukasz: in your reasoning, equally valid definitions would be e.g. “number of states times 2” or “number of states + f(M)”, where f(M) is some function measuring the “complexity” (arbitrarily defined) of the machine’s transition graph. Freedom in this choice corresponds to the freedom of choice of universal TM you get in the classical definition of Kolmogorov complexity. In the classical definition, this arbitrariness is “hidden” in the definition of the universal machine (i.e. how it translates binary strings into descriptions of transition graphs).

• June 13, 2011 3:38 am

@Marcin: thanks, that shed some light, but could you convince me why “number of states” alone is not a good complexity measure?