Plus a long-promised discussion on diagonalization

 TRUST security source

Dexter Kozen has been on the faculty of computer science at Cornell for almost 30 of the department’s 50 years. He first came to Cornell 40 years ago as a graduate student and finished a PhD under Juris Hartmanis in just over 2 years. He was named to the Joseph Newton Pew, Jr., professorship 20 years ago, and celebrated his 60th birthday in 2012. Besides many research achievements he co-authored an award-winning book on Dynamic Logic.

Today we salute the 50th anniversary of Cornell’s department and keep a promise we made 5 years ago to talk about a diagonalization theorem by Dexter. It yields what may be an interesting finite puzzle.

The 50th anniversary symposium earlier this fall was a great occasion; I wish I’d been able to drive over for it. There were two days of talks by Cornell faculty and alumni. It coincided with the department moving into a new building, Gates Hall, named for—who else?—Bill and Melinda Gates as principal donors. Bill Gates came for the dedication and engaged Cornell president David Skorton in a one-hour conversation on how to serve students best.

I have fond memories of my time at Cornell in 1986–7 and 1988–9 split with time at Oxford. I was given a home in the department though my postdoc came from Cornell’s Mathematical Sciences Institute, and I was treated incredibly well. It was a great personal as well as professional time for me. When I last visited two years ago, Gates Hall was just going up across the street, beyond the left-field fence of the baseball field I viewed from my office window in the 1980s. Dick and I wish them the best for the next 50 years.

 “Innovative Departments” source

## 1982

Dexter is on sabbatical in the Netherlands and Denmark this academic year. I first met him in Denmark, at ICALP 1982 in Aarhus. We took part in a local tradition of performing musical numbers at the conference banquet. He had a band called the “Steamin’ Weenies” when I was at Cornell. He has kept his music up with bands of various names through the “Katatonics,” which performed in a Superstorm Sandy benefit two years ago at Ithaca’s club “The Gates” (not named for—who else?—Bill and Melinda Gates).

It is hard to believe, but summer 1982 is three-fourths of the way back to Steve Cook’s 1971 paper which put the ${\mathsf{P}}$ versus ${\mathsf{NP}}$ question on everyone’s map. Back then we knew it was hard but there was more optimism. I have told the story of sharing a train car with Mike Sipser on the way to that conference and hearing Mike’s confidence in being able to solve the question.

The aspect that captivated me that summer was formal logic issues involved in separating complexity classes. They can be approached in two ways: as sets of languages and as subsets of machines. The former I tried to conceptualize as a topological space. As with the Zariski topology of algebraic sets employed by Alexander Grothendieck, it satisfies only the ${T_1}$ axiom and so is not a Hausdorff space. The latter view was developed by Dexter’s 1979 paper, “Indexings of Subrecursive Classes.”

Dexter’s paper cut to the chase away from logic or topology by treating the problem as one about programming systems, i.e., recursive enumerations of machines that have nice properties like efficient substitution and composition. He showed a tradeoff between power and niceness of what one can do with them. The niftiest theorem, in section 7 of his paper, shows that if you insist on representing ${\mathsf{P}}$ by machines that mark ${n^k}$ cells on a tape as a polynomial time clock, and insist on a composition operator giving only a constant-factor overhead in program size, then both universal simulation and diagonalization need more than polynomial space. Thus such representations of ${\mathsf{P}}$ will not support a proof of ${\mathsf{P}}$ different from ${\mathsf{PSPACE}}$, let alone ${\mathsf{NP}}$. But in the previous section he gave a theorem showing that if you give up all efficient closure properties, then you can in principle achieve any diagonalization you like.

## Kozen’s Theorem

The only property of the class ${\mathsf{P}}$ that is used by the theorem is that it is closed under finite differences: if ${L}$ is different from a language ${B \in \mathsf{P}}$ by a finite set, meaning that the symmetric difference ${L \;\Delta\; B = (L \setminus B) \cup (B \setminus L)}$ is finite, then ${L}$ is in ${\mathsf{P}}$. And of course it uses that ${\mathsf{P}}$ has an effective programming system, which is what distinguished bounded closed sets in my topological view.

Theorem 1 Let ${[P_u]_{u=1}^{\infty}}$ be any enumeration of ${\mathsf{P}}$ by machines, and let ${A}$ be any language outside of ${\mathsf{P}}$. Then we can define a function ${g : \mathbb{N} \longrightarrow \mathbb{N}}$ such that

1. ${\mathsf{P} = \{L(P_{g(u)}) : u \in \mathbb{N}\}}$, and

2. ${A = D_g}$ where we define ${D_g = \{u : u \notin L(P_{g(u)})\}}$.

Moreover, any reasonable formal system that can prove ${A \notin \mathsf{P}}$ can prove statements 1 and 2. Indeed, we can make ${g}$ a permutation, so that the new indexing involves the same machines as the original, only scrambled. In the case ${A = \mathsf{SAT}}$, this yields his prose conclusion:

If ${\mathsf{P \neq NP}}$ is provable at all, then it is provable by diagonalization.

As he was quick to add, this doesn’t say that the mere fact of ${\mathsf{P \neq NP}}$ enables diagonalization, only that working by diagonalization does not impose any additional burden on a reasonable formal system.

Proof: We define ${g(u)}$ in stages. At each stage we have “marked” the values ${g(t)}$ for ${t < u}$. At stage ${u}$, take ${v}$ to be the least unmarked value such that

$\displaystyle u \in (L(P_v) \;\Delta\; A)$

and define ${g(u) = v}$. This also marks ${v}$. A suitable ${v}$ always exists since infinitely many languages in ${\mathsf{P}}$ include ${u}$, which handles the case ${u \notin A}$, and infinitely many exclude ${u}$, so there’s always a ${v}$ when ${u \in A}$ too. Also clearly ${g}$ is one-to-one since we mark each value. To see that ${g}$ is onto, let ${v}$ be the least unmarked value at any stage ${u}$. Since ${A}$ and ${L(P_v)}$ differ by an infinite set, there will always be a ${u' \geq u}$ in the symmetric difference, and the algorithm will set ${g(u') = v}$ for the least such ${u'}$. $\Box$

## Interpretations

The proof works even if ${A}$ is undecidable—we just get that ${g = g_A}$ is uncomputable—and there are uncountably many ${g}$‘s to go with uncountably many ${A}$‘s. When ${A}$ is computable the ${g_A}$ in the algorithm is computable, and if the indexing includes many copies of machines accepting ${\emptyset}$ or ${\Sigma^*}$ then we can get ${g(u) < 3u}$ or lower.

The inverse of ${g_A}$, that is, the mapping ${e_A(v) = u}$, however, may have explosive growth for infinitely many ${v}$, depending on how sparse the symmetric difference of ${A}$ with some languages in ${\mathsf{P}}$ is. Still, if the statement ${A \notin \mathsf{P}}$ is provable in a given formal system, then the inverse is provably total, so the diagonalization can be verified. Another way of putting this is that the formal system is able to verify that every language in ${\mathsf{P}}$—that is, every ${L(P_v)}$, is eventually enumerated, so the formal system can tell that the class being diagonalized against is all of ${\mathsf{P}}$.

The ease of the proof and its relation to oracle results contributed to controversy over how to interpret it. I’ve mostly felt that the logical “horse” was being put behind or at best alongside the “cart” of combinatorial proof methods. Recently I’ve appreciated how the proof focuses the question on how large your idea of “${\mathsf{P}}$” is rather than how hard ${\mathsf{SAT}}$ is, which was a key issue in the attempted ${\mathsf{P \neq NP}}$ proof by Vinay Deolalikar. The oracle result showing ${\mathsf{P}^B = \mathsf{NP}^B}$ for some computable languages ${B}$ (any ${\mathsf{PSPACE}}$-complete language will do) was supposed to argue that diagonalization against the class ${\mathsf{P}}$ had no special power as a tool for ${\mathsf{P \neq NP}}$.

The proof relativizes by taking ${[P_u]}$ to be an ensemble of polynomial-time oracle Turing machines giving ${\mathsf{P}^B = \{L(P_u^B)\}}$ for each oracle set ${B}$, and by taking a fixed oracle TM ${N}$ such that ${L(N^B)}$ is ${\mathsf{NP}^B}$-complete for each ${B}$. We get a fixed oracle transducer ${T}$ such that ${T^B}$ executes the algorithm for ${A^B = L(N^B)}$. Whenever ${\mathsf{P}^B \neq \mathsf{NP}^B}$ it computes a total function ${g_A^B}$ whose diagonal is ${A^B}$. If you think of “${g_A^B}$” as “${g_A}$ relativized to ${B}$” then this seems like a recipe for contradiction if the unrelativized ${g_A}$ is total and ${\mathsf{P}^B = \mathsf{NP}^B}$. But what happens when ${\mathsf{P}^B = \mathsf{NP}^B}$ is just that ${T^B}$ computes an undefined function—indeed it doesn’t halt. To make an algebraic analogy, ${g_A^B}$ should be parsed as ${(g^B)_A}$ not ${(g_A)^B}$, so “the diagram does not commute” and there is no problem.

## Variations

I came back to Dexter’s theorem this fall term while re-thinking how best to introduce diagonalization in an undergrad or grad theory course. The classical way is

$\displaystyle D = \{e : e \notin L(M_e)\},$

where ${e}$ is the “Gödel Number” of the machine ${M_e}$. My own usual style is to identify a machine or program with its code as a string, thus writing

$\displaystyle D = \{M : M \notin L(M)\}.$

This removes any need to talk about the correspondence between strings and numbers, Gödel or any kind of numbers. However, styling this via programming raises questions such as whether “${M}$” means the source or compiled code, and does it matter how they differ?

Recently I’ve preferred to think of ${M}$ as an object, indeed analogizing components ${(Q,\Sigma,\delta,\dots)}$ to fields of a class. Then “${e}$” re-appears as an encoding of the object by a (number or) string. We define:

$\displaystyle D^e = \{e(M): e(M) \notin L(M)\}.$

Now it doesn’t matter if we say ${e(M)}$ is the source code or the compiled executable, and it can be anything—we can downplay the “self-reference” aspect if we wish. The encoding function need not be 1-to-1, provided it is “extensional” in the sense that

$\displaystyle e(M) = e(M') \implies L(M) = L(M').$

We still get the diagonal contradiction: if ${D^e = L(M)}$, then ${e(M) \notin L(M) \implies e(M) \in D^e}$ contradicting ${D^e = L(M)}$. And ${e(M) \in L(M) \implies e(M) \in D^e}$, which implies that there is some ${M'}$ such that ${e(M') = e(M) \wedge e(M') \notin L(M')}$, but by extensionality we have ${L(M') = L(M)}$ and again that is a contradiction. (Well, you don’t have to tell students this more-complicated proof—just say that ${e}$ is 1-to-1 and it’s just as quick as the usual diagonal proof.)

Returning to Dexter’s diagonal set ${D_g = \{u: u \notin L(M_{g(u)})}$, we see it is the same as ${D^e}$ with ${e = g^{-1}}$. Looking at ${D_g}$ by itself, much as we can relax ${e}$ being 1-to-1, we can relax ${g}$ being onto, so long as ${g}$ is “functionally onto” the languages, in keeping with statement 1 of Theorem 1. Indeed, when the programming system ${[P_u]}$ repeats each language infinitely often, we could redo the proof to make ${g}$ an increasing function.

We can abstract this by considering any function ${F}$ from a set ${S}$ into its power set ${P(S)}$ and functions ${e,g}$ on ${S}$. It is interesting to try to formalize exactly when ${D^e = D_g}$, that is,

$\displaystyle \{e(x) : e(x) \notin F(x)\} = \{u: u \notin F(g(u))\},$

with ${e}$ being extensional with respect to ${F}$ and ${F\circ g}$ being onto the image of ${F}$. One interesting requirement is that when ${e}$ is not onto ${S}$, ${Ran(F)}$ must cover ${S \setminus Ran(e)}$, that is:

$\displaystyle (\forall z \in S)(\exists x \in S)[z = e(x) \vee z \in F(x)].$

Again I wonder if there is a useful analogy in abstract algebra or category theory for this situation. Given the function ${e}$ and ${y \in Ran(e)}$, we can define ${g(y) = x}$ for some choice of ${x}$ such that ${e(x) = y}$. For other ${y}$ we can put ${g(y) = z}$ for any ${z}$ such that ${y \in F(z)}$. Given the function ${g}$, we can define ${e(x)}$ for any ${x}$ by taking ${y}$ such that ${F(x) = F(g(y))}$, since ${g}$ is functionally onto ${Ran(F)}$, and define ${e(x) = y}$.

## A Finite Puzzle

The abstraction nicely gives a concrete puzzle when ${S}$ is a finite set. Then we don’t have Dexter’s property of ${Ran(F)}$ being closed under finite differences, but we can still ask:

Is every unindexed subset a diagonal? That is, for all ${A \subseteq S}$ with ${A \notin Ran(F)}$, does there exist ${g: S \longrightarrow S}$ such that ${A = D_g(F)}$?

Equivalently, does there exist ${e: S \longrightarrow S}$ such that ${A = D^e(F)}$? If ${F}$ is 1-to-1, then ${g}$ must be onto, which means ${g}$ must be a permutation since ${S}$ is finite, and likewise ${e}$ must be its inverse. There are interesting cases where ${F}$ is not 1-to-1, in which the extra latitude for ${g}$ and ${e}$ becomes important.

For example, let ${S = \{a,b,c\}}$ and let ${F(a) = \{a,b\}}$, ${F(b) = \{a\}}$, and ${F(c) = \{b,c\}}$. Then ${F}$ is 1-to-1, so we have six permutations ${g}$ to consider. They become the six columns after the first in the following table:

$\displaystyle \begin{array}{c|cccccc} \hline a & a,b & a,b & a & a & b,c & b,c\\ b & a & b,c & a,b & b,c & a & a,b\\ c & b,c & a & b,c & a,b & a,b & a\\ \hline D_g & \{b\} & \{c\} & \emptyset & \{c\} & \{a,b,c\} & \{a,c\} \end{array}$

Every subset not in the range appears; curiously ${\{c\}}$ appears twice. If instead we define ${F(a) = \{a\}}$ only, then only three permutations are relevant, but having ${F(a) = F(b)}$ gives us three other functions ${g}$ to consider:

$\displaystyle \begin{array}{c|cccccc} \hline a & a & a & b,c & a & b,c & b,c\\ b & a & b,c & a & b,c & a & b,c\\ c & b,c & a & a & b,c & b,c & a\\ \hline D_g & \{b\} & \{c\} & \{a,b,c\} & \emptyset & \{a,b\} & \{a,c\} \end{array}$

Once again we have all eight subsets between the range and the diagonals, this time with no repetition.

Which functions ${F}$ have this “pan-diagonal” property? If some element ${a \in S}$ belongs to every set ${F(x)}$ then we can never get ${a}$ into any diagonal, so except for some small-${S}$ cases the answer is ‘no’ for such ${F}$. Note that the complementary function ${F'(x) = S \setminus F(x)}$ gives equivalent results by duality. Hence the answer is also generally ‘no’ when some element is excluded from all subsets ${F(x)}$. This extends to say that if some ${k}$ elements are collectively included in ${k-1}$ or fewer subsets, and ${F}$ is one-to-one, then it is impossible to get ${\emptyset}$ as a diagonal, so the answer is ‘no’ unless ${\emptyset}$ is already part of the range. Note, however, that our second example shows that success is still possible if ${F}$ is not 1-to-1.

Is there an easily-stated criterion for ${F}$ to be pan-diagonal? Or is ${\mathsf{NP}}$-hardness lurking about? That is the puzzle. One can also pose it for infinite ${S}$, when the range of ${F}$ is not closed under finite differences.

## Open Problems

How powerful is diagonalization? Does the puzzle have a simple answer?

We also wish everyone a Happy Thanksgiving.

November 26, 2014 8:25 pm

I would be ashamed to be associated with the Cornell University in any way. See http://www.tcm.phy.cam.ac.uk/~bdj10/archivefreedom/main.html about Cornell’s arxiv.org preprint server blacklisting of authors. This still hasn’t changed in the ten years since that was written. Why does everyone pretend that everything is wonderful at Cornell University when it’s not? And why does everyone remain silent?

Don’t scientists understand that many of the important scientific discoveries were considered silly at one time by the mainstream? If arxiv.org were around in Einstein’s time, I have no doubt that Einstein would have been blacklisted by Cornell for either not having a proper email address (before he got the job at the patent office) or having crazy ideas that contradict Newton’s laws of physics.

Anyway, happy 50th birthday to Cornell’s CS department. And Happy Thanksgiving! The Pilgrims came to America trying to find freedom of expression. Today, Cornell University’s arxiv.org embodies exactly what the Pilgrims were running from.

November 27, 2014 12:36 am

They not only have discriminating policy on publishing papers, they sound like they are giving a charity!
Once I was trying to publish a different paper with the same issue on NP=?P, they came out with claim that it is the same paper with a different title. Upon providing an explanation, they told me to “take my research somewhere else”.
There have been other examples of many similar papers with different titles in a more explicit way but they were ignored.

2. November 27, 2014 2:18 pm

In the sentence about Bill Gates and the Cornell President David Skorton discussing “how to serve students best,” I didn’t think it necessary to say “how students can be served best both on campus and in career choices.” My economy of words did, however, call to mind this famous “Twilight Zone” episode.😉

3. December 1, 2014 10:35 am

I am blacklisted at arXiv.org, and then after I’ve published http://arxiv.org/ftp/arxiv/papers/0907/0907.3965.pdf, I had tried hard but I could never more publish another paper in it, as http://www.andrebarbosa.eti.br/The_Size_of_the_Hilbert_Hotel_Computer.pdf, http://www.andrebarbosa.eti.br/The_Randomness_Delusion.pdf, and several others were all summarily rejected.

However, I think that this is fair, since my papers are considered crackpottery, and the official channels of serious Science cannot open all their scarce doors to all cranks of the world, which have little contributions to do e great thirsty to gain huge attention of the sternest researchers of the respective fields.

4. SG from Melbourne AU permalink
December 4, 2014 12:22 am

As someone who has worked with Dexter Kozen, I’d say that it’s been a great pleasure knowing him and working with him. The last time I met him was at Perth (AU) 2-3 years ago, and he got very involved with the theory community in Australia through their (rather, our) annual theory symposium.

January 13, 2015 12:25 pm

Is this an answer to the finite puzzle? :
S = {a,b,c,d}, F(a) = {a,b}, F(b) = {a}, F(c) = {b}, F(d) = {c,d}.
Then F is 1-1, so to have Range(F o g) = Range(F), g must be 1-1.
The empty set is “unindexed”, but if D_g(F) = empty, then for all x in S,
x is in F(g(x)). This means that Range(F) has a “system of distinct representatives”,
but for A = {a,b,c}, union(F(x), x in A) = {a,b}, which is too small to afford distinct
representatives.
So, there is no g with both Range(F o g) = Range(F) and D_g(F) = empty.
(Also, union(Range(F)) = S, which wasn’t required by the “puzzle” but could be
added easily by including the extra element d.)

• January 13, 2015 7:33 pm

This case appears to be covered by the sentence:

“This extends to say that if some k elements are collectively included in k-1 or fewer subsets, and F is one-to-one, then it is impossible to get \emptyset as a diagonal, so the answer is ‘no’ unless \emptyset is already part of the range.”

Here k=2 and the two elements c,d are collectively included in just 1 subset, F(d) = {c,d}. Hence there is no way of choosing 4 distinct representatives, 1 from each of the four subsets in the range. There is a dual to this condition which is analogous to the previous sentence: if some k elements overlap all but k-1 or fewer subsets then also no-go in general. The problem is whether obeying these restrictions always suffices to be pan-diagonal.