A problem about “unraveling” graphs into strings

László Babai is a great combinatorial mathematician as well as a great complexity theorist. Among his many illustrious results are surprising upper bounds and nontrivial lower bounds, both kinds obtained by connecting group theory with complexity theory.

Today we wish to talk about Babai’s work on the Graph Isomorphism problem, and present an open problem about the complexity of finding a “canonical form” for a graph.

It is emblematic that his 1985 STOC paper which introduced Arthur-Merlin games, and whose main result was that some matrix-group problems belong to ${\mathsf{NP}^R \cap \mathsf{co}\mathrm{-}\mathsf{NP}^R}$ for a random oracle ${R}$, was titled “Trading Group Theory for Randomness.” This led into his famous paper with Shlomo Moran which developed the Arthur-Merlin theory in full and shared the inaugural Gödel Prize with Shafi Goldwasser, Silvio Micali, and Charles Rackoff’s equally famous paper on Interactive Proof Systems.

Laci, as he is often called, is also an ardent popularizer of mathematics for undergraduates. He co-founded the Budapest Semesters in Mathematics program to bring North Americans to his native Hungary, won a major teaching award at the University of Chicago in 2005, and has led an 8-week summer REU course at Chicago every year since 2001.

## Graph Isomorphism

A graph ${G}$ is usually encoded as an adjacency matrix or as a list of its edges. Either way involves numbering the vertices ${1}$ to ${n}$. Two different numberings will often produce two different encodings ${g_1}$ and ${g_2}$, even though they represent the same graph. The Graph Isomorphism (GI) problem is, given any two encodings ${g_1}$ and ${g_2}$, do they represent the same graph? Or if we think of the encodings themselves as the “graphs,” are those graphs isomorphic?

GI belongs to ${\mathsf{NP}}$: when the graphs are isomorphic, one can guess a permutation ${\sigma}$ of ${\{1,\dots,n\}}$, and verify for each pair ${i,j \in \{1,\dots,n\}}$ that ${(i,j)}$ is an edge in ${g_1}$ if and only if ${(\sigma(i),\sigma(j))}$ is an edge in ${g_2}$. This is one of the amazingly few natural decision problems in ${\mathsf{NP}}$ whose status for many years has been “Intermediate.” It is not known to be ${\mathsf{NP}}$-complete, and is not known to be in ${\mathsf{P}}$. If GI is ${\mathsf{NP}}$-complete then the polynomial hierarchy collapses at the second level, as follows from its complement belonging to Babai’s Arthur-Merlin class ${\mathsf{AM}}$, which is a randomized form of ${\mathsf{NP}}$.

The famous text by Michael Garey and David Johnson listed twelve problems in ${\mathsf{NP}}$ that in 1979 were not known to be complete or in ${\mathsf{P}}$. Of these only two remain with “Intermediate” status: GI and Factoring. As stated in the previous post, each of us believes one of them is in ${\mathsf{P}}$. What can be said now is that many special cases of GI belong to ${\mathsf{P}}$, as enumerated here. This includes a deep paper by Laci with Dima Grigoriev and David Mount.

## Why GI May Be Hard

We get several claims now and then to have discovered a polynomial time algorithm for GI. So far it still seems to be open whether or not GI is in ${\mathsf{P}}$. Those working on the problem should, we think, be aware of several basic facts about GI.

${\bullet }$ Pure Matrix Methods Are Weak: Several obvious ideas are to look at the adjacency matrix of a graph and try to use this to give the graph a canonical name. If the method depends only on the eigenvalues of the matrix, then the method must fail. This follows since there are many graphs that are co-spectral: they are graphs that are not isomorphic, but have exactly the same spectrum:

${\bullet }$ Strongly Regular Graphs Are Tough: A great test case for any new idea on GI is the class of graphs that are called strongly regular. A regular graph, of course, is just a graph with the constraint that all its degrees are the same. Another way to think about this is: every vertex in the graph has the same number of neighbors. A strongly regular carries this property to the next level. A regular graph is ${(\lambda,\mu)}$ strongly regular if,

1. Every two adjacent vertices have exactly ${\lambda}$ common neighbors.
2. Every two non-adjacent vertices have ${\mu}$ common neighbors.

The GI problem restricted to these graphs is also open. The reason it is so hard is that all naive methods that depend on the local neighborhood structure fail—because the graphs are so “regular.”

${\bullet }$ The GI “Cluster”: Another piece of evidence is that a few dozen different-seeming problems, from group theory, matrix theory, and other fields, are known to be polynomial-time equivalent to GI. One example is a term-equality problem proved “GI-complete” by David Basin. This is a miniature version of the argument that since no one has found a poly-time algorithm for any of the thousands of different ${\mathsf{NP}}$-complete problems, they are all hard. This argument has its doubters, Dick included, but at least for GI more than Factoring, there is some “salience” of the level of complexity it represents.

See this 2005 survey for other information on the hardness of GI.

## Canonizing Graphs

An isomorphism invariant is a function ${I}$ from graph representations to binary strings such that ${I(g) = I(g')}$ if and only if ${g}$ is isomorphic to ${g'}$.

Yuri Gurevich showed that any such ${I}$ can be modified with polynomial-time overhead into an invariant ${I'}$ such that ${I'(g)}$ outputs an adjacency matrix ${g_0}$ for the graph. Thus ${I'(g') = g_0}$ if and only if ${g'}$ represents the same graph ${G}$ as ${g}$. Then ${g_0}$ is called a canonical form for ${G}$, and ${I'}$ is called a canonizing function.

Babai and Ludek Kučera found an ${O(n^2)}$-time computable function that acts as a canonizing function on all but a ${1/2^{\delta n}}$ fraction of random ${n}$-vertex graphs. The scheme is simple to execute but a bit tricky to state, so we refer to the paper for details. This implies that graph-isomorphism is in polynomial time for “random” graphs.

Note that there are many classes of objects, including special types of graphs, for which canonical naming schemes are known. For example, this is long known to be true for trees. Rather then present that naming method, let’s consider a much simpler problem of naming sets of numbers. A simple method for naming a set ${S = \{s_{1},\dots,s_{n}\}}$ is to sort the set of numbers and then make them into a string: if ${s_{\pi_{1}},\dots,s_{\pi_{n}}}$ is the sorted order of the set ${S}$‘s elements, then the “name” of the set is the sequence

$\displaystyle s_{\pi_{1}},\dots,s_{\pi_{n}}.$

Note the key point is that two sets are equal as sets if and only if they have the exact same sequence. Thus, the set ${\{8,3,41\}}$ is named by the sequence ${3,8,41}$.

## An Approach

We wish to consider a more simple-minded way to try to get an invariant or a canonical form, using any polynomial-time computable 1-1 function ${f}$ from graphs to strings. Define

$\displaystyle \ell_f(g) = \max\{f(g') : g' \text{ is isomorphic to } g\}.$

Here the maximum can be taken with regard to the standard lexicographic order on binary strings, and is called the lex-max canonical form. A decision problem that helps one find it by binary search is:

Definition 1 Decision problem ${\mathsf{CAN}(f)}$: Given ${g}$ and a string ${w \in \{0,1\}^*}$, does there exist a ${g'}$ isomorphic to ${g}$ such that ${f(g') \geq w}$?

Then computing ${\ell_f}$ is polynomial-time equivalent to deciding ${\mathsf{CAN}(f)}$. Also ${\mathsf{CAN}(f)}$ belongs to ${\mathsf{NP}}$, since we can guess ${g'}$. Thus ${\mathsf{CAN}(f)}$ is always between GI and the ${\mathsf{NP}}$-complete problems. We wonder, is it always ${\mathsf{NP}}$-complete? Is it ever equivalent to GI itself? Strangely, no canonizing function is known to be equivalent to GI, but many are ${\mathsf{NP}}$-hard.

## Which “Unravelings” are NP-Hard?

Let us simply try to “unravel” an adjacency matrix into a binary string. For an undirected graph, we need only the upper triangle of the matrix. The three most obvious rules we can think of are to list those entries in order going: 1. across by the rows, 2. down by the columns, or 3. starting with the largest diagonal and ending at the upper-right corner. These are best shown by a diagram:

$\displaystyle A = \begin{array}{|ccccc|} 0 & 1 & 1 & 0 & 1\\ { } & 0 & 1 & 1 & 1\\ { } & { } & 0 & 0 & 1\\ { } & { } & { } & 0 & 0\\ { } & { } & { } & { } & 0\\ \end{array}$

$\displaystyle \begin{array}{rcl} f_1(A) &=& 1101.111.01.0 = 1101111010\\ f_2(A) &=& 1.11.010.1110 = 1110101110\\ f_3(A) &=& 1100.111.01.1 = 1100111011 \end{array}$

Theorem 2 ${\mathsf{CAN}(f_2)}$ is ${\mathsf{NP}}$-complete, by reduction from Clique, and ${\mathsf{CAN}(f_3)}$ is ${\mathsf{NP}}$-complete, by reduction from Hamiltonian Path.

To prove this, given an instance ${(g,k)}$ of Clique, define ${w}$ to consist of ${k}$-choose-${2}$ ${1}$‘s, followed by ${0}$‘s out to length ${n}$-choose-${2}$. Then the graph represented by ${g}$ has a clique of size ${k}$ if and only if there is a permutation that numbers the vertices of the clique ${1,\dots,k}$. Applying this permutation yields an adjacency matrix ${A'}$ such that ${f_2(A')}$ begins with ${k}$-choose-${2}$ ${1}$‘s, so ${f_2(A') \geq w}$. (Note that for the above ${A}$, we can show a 4-clique by swapping vertex labels ${4}$ and ${5}$.)

For ${f_3}$, given an instance ${g}$ of Hamiltonian Path, take ${w}$ to begin with ${n-1}$ ${1}$‘s and the rest ${0}$. Then ${g}$ has such a path if and only if there is a re-numbering of the vertices in the path from ${1}$ to ${n}$, and this majorizes ${w}$.

However, we do not know whether ${\mathsf{CAN}(f_1)}$ is ${\mathsf{NP}}$-complete. If we try ${w = 1^{n-1}0\cdots 0}$ as for ${f_3}$, then having a permutation creating ${A'}$ such that ${f_1(A') \geq w}$ only means that the original graph ${g}$ has a node connected to every other node, which is easy to determine. We do not see how to choose ${w}$ to gain information much more useful than that.

If we compare graphs to sets, we can see why canonization is possibility hard. For sets of numbers the making of a canonical name is helped by our ability to sort numbers—the act of sorting preserves sets, but places them into a canonical form. The difficulty for graphs, and therefore GI, is that the corresponding “sorting” step seems to be impossible. Or said better, today we have no idea how to “sort” an adjacency matrix in polynomial time.

## Open Problems

Is ${\mathsf{CAN}(f_1)}$ ${\mathsf{NP}}$-complete? Or is row-major order somehow weaker than the column-major and diagonal unravelings?

Is there a canonizing function that is equivalent to GI itself, or is canonizing graphs always harder than testing for isomorphism?

Are GI and canonizing really in ${\mathsf{P}}$? An ostensibly easier task is to put GI into ${\mathsf{BQP}}$ as with Factoring—see this StackExchange thread for more.

June 28, 2011 2:00 pm

We know at this point that no “Shor-like” quantum algorithm can solve GI. Specifically, we know that any quantum algorithm which treats the graph as a “black box”, an oracle that returns a permuted graph in an unstructured way, requires highly-entangled measurements (Hallgren et al., http://www.cse.psu.edu/~hallgren/multireg.pdf ). Moreover, the families of entangled measurements that we know how to carry out efficiently don’t work (Moore, Russell, and Sniady, http://arxiv.org/abs/quant-ph/0612089 ).

My personal guess at this point is that GI is in BQP if and only if it is in P. But this is because one should always make guesses that one would like to have proved wrong.

2. June 28, 2011 2:17 pm

For the general question of which equivalence classes have poly-time computable complete invariants and/or canonizations, there’s a lucid and interesting treatment in this paper of Fortnow and Grochow:

http://arxiv.org/abs/0907.4775

• June 28, 2011 2:25 pm

(Of course, it doesn’t *answer* the question; but it gives what I think is the right complexity-theoretic framework to think about it, and proves some structural results.)

3. June 28, 2011 8:59 pm

You mention that strongly regular graphs seem particularly hard to distinguish. That led me to consider the following family of decision problems related to GI:

Let A be a set of graphs. Define GI(A) to be this problem: given a graph H, is H isomorphic to a graph in A?

I am specifically interested in the case where A = {G_n | n = 1, 2, …}, where G_n is an n-vertex graph. GI(A) asks: is H isomorphic to the unique n-vertex graph in A?

Since the set A may not be computable, it might be best to have it as an oracle — I don’t want to try to define the problem too carefully. What I wonder is: are problems of this type easy? Or are there in fact specific graphs which are “difficult to recognize”?

If this problem is easy, we can extend the definition of GI(A) to any set of graphs and ask the same question. If it seems as hard as GI, maybe it will help to shed light on GI.

• June 28, 2011 9:10 pm

Here is I think a better formulation of the question I am really trying to ask.

Let f(G) be the number of gates needed in a circuit which outputs 1 if the input graph is isomorphic to G, and 0 otherwise. Let f(n) = max{f(G) | G has n vertices}.

How fast does f(n) grow? Is it polynomial? What families of graphs attain this maximum?

4. June 29, 2011 8:54 am

1) Is strongly regular graphs hard? See, D. A. Spielman, Faster isomorphism testing of strongly regular graphs, Proc. 28th ACM STOC, 1996, 576-–584. (O((n^1/3) log n)

2) I think that canonizing graphs is harder than testing for isomorphism. See, A. Rahnamai Barghi, I. Ponomarenko, Non-isomorphic graphs with cospectral symmetric powers, Electronic Journal of Combinatorics, http://www.combinatorics.org/Volume_16/PDF/v16i1r120.pdf

5. June 30, 2011 7:22 am

GI is a great problem and the evidence regarding what is the answer is confusing. I remember people trying to show it NP complete in the 70s (Now, we have very strong evidence aganst it; it will collapse the hierarchy.) It looked unrelated to factoring before Shor’s algorithm and then suddenly it looked like the noncommutative analog, but then it looked again not so related to factoring. It is related to very interesting group theory and very interesting graph theory. The question of finding parameters that will allow to identify non isimorphic graph is closely related to main themes of mathematical studies. The related study of isospectral graphs is quite exciting. The (somewhat remotely) possibly related edge- and vertex- reconstruction problems is of interest. Really a great problem!

• June 30, 2011 7:10 pm

Thank you for good reply! In 1979, M. R. Garey and D. S. Johnson (well-known book “Computers and intractability”) noted that proofs of NP-completeness seem to require a certain amount of redundancy, which redundancy graph isomorphism problem lacks (pages 155–156).

July 20, 2011 8:55 am

As you mentioned cospectral graphs. It would be interesting to find graphs with equal spectrum plus equal Laplacian spectrum!

July 4, 2013 12:11 am

For characterizing graphs up to isomorphism it could be worthwhile to try for a complete set of invariants which can be computed easily.

Consider following set which seems to me a complete set of invariants:

it is as follows:

C(G) = (C1(G), C2(G), ……., Cr(G), ……, Cp(G))

where, Cr(G) = (Cr(T1), Cr(T2), ….Cr(Tm)), a sub-sequence of numbers , representing count of trees on (r-1) edges of different (non-isomorphic) types among the all possible non-isomorphic types that are possible for trees on (r-1) edges, taken in the same order for both graphs whose isomorphism is to be checked.

As an alternative way of attaching the problem do have a look at:

http://vixra.org/pdf/1204.0019v1.pdf

DPM