Name That Graph
A problem about “unraveling” graphs into strings
László Babai is a great combinatorial mathematician as well as a great complexity theorist. Among his many illustrious results are surprising upper bounds and nontrivial lower bounds, both kinds obtained by connecting group theory with complexity theory.
Today we wish to talk about Babai’s work on the Graph Isomorphism problem, and present an open problem about the complexity of finding a “canonical form” for a graph.
It is emblematic that his 1985 STOC paper which introduced Arthur-Merlin games, and whose main result was that some matrix-group problems belong to for a random oracle , was titled “Trading Group Theory for Randomness.” This led into his famous paper with Shlomo Moran which developed the Arthur-Merlin theory in full and shared the inaugural Gödel Prize with Shafi Goldwasser, Silvio Micali, and Charles Rackoff’s equally famous paper on Interactive Proof Systems.
Laci, as he is often called, is also an ardent popularizer of mathematics for undergraduates. He co-founded the Budapest Semesters in Mathematics program to bring North Americans to his native Hungary, won a major teaching award at the University of Chicago in 2005, and has led an 8-week summer REU course at Chicago every year since 2001.
A graph is usually encoded as an adjacency matrix or as a list of its edges. Either way involves numbering the vertices to . Two different numberings will often produce two different encodings and , even though they represent the same graph. The Graph Isomorphism (GI) problem is, given any two encodings and , do they represent the same graph? Or if we think of the encodings themselves as the “graphs,” are those graphs isomorphic?
GI belongs to : when the graphs are isomorphic, one can guess a permutation of , and verify for each pair that is an edge in if and only if is an edge in . This is one of the amazingly few natural decision problems in whose status for many years has been “Intermediate.” It is not known to be -complete, and is not known to be in . If GI is -complete then the polynomial hierarchy collapses at the second level, as follows from its complement belonging to Babai’s Arthur-Merlin class , which is a randomized form of .
The famous text by Michael Garey and David Johnson listed twelve problems in that in 1979 were not known to be complete or in . Of these only two remain with “Intermediate” status: GI and Factoring. As stated in the previous post, each of us believes one of them is in . What can be said now is that many special cases of GI belong to , as enumerated here. This includes a deep paper by Laci with Dima Grigoriev and David Mount.
Why GI May Be Hard
We get several claims now and then to have discovered a polynomial time algorithm for GI. So far it still seems to be open whether or not GI is in . Those working on the problem should, we think, be aware of several basic facts about GI.
Pure Matrix Methods Are Weak: Several obvious ideas are to look at the adjacency matrix of a graph and try to use this to give the graph a canonical name. If the method depends only on the eigenvalues of the matrix, then the method must fail. This follows since there are many graphs that are co-spectral: they are graphs that are not isomorphic, but have exactly the same spectrum:
Strongly Regular Graphs Are Tough: A great test case for any new idea on GI is the class of graphs that are called strongly regular. A regular graph, of course, is just a graph with the constraint that all its degrees are the same. Another way to think about this is: every vertex in the graph has the same number of neighbors. A strongly regular carries this property to the next level. A regular graph is strongly regular if,
- Every two adjacent vertices have exactly common neighbors.
- Every two non-adjacent vertices have common neighbors.
The GI problem restricted to these graphs is also open. The reason it is so hard is that all naive methods that depend on the local neighborhood structure fail—because the graphs are so “regular.”
The GI “Cluster”: Another piece of evidence is that a few dozen different-seeming problems, from group theory, matrix theory, and other fields, are known to be polynomial-time equivalent to GI. One example is a term-equality problem proved “GI-complete” by David Basin. This is a miniature version of the argument that since no one has found a poly-time algorithm for any of the thousands of different -complete problems, they are all hard. This argument has its doubters, Dick included, but at least for GI more than Factoring, there is some “salience” of the level of complexity it represents.
See this 2005 survey for other information on the hardness of GI.
An isomorphism invariant is a function from graph representations to binary strings such that if and only if is isomorphic to .
Yuri Gurevich showed that any such can be modified with polynomial-time overhead into an invariant such that outputs an adjacency matrix for the graph. Thus if and only if represents the same graph as . Then is called a canonical form for , and is called a canonizing function.
Babai and Ludek Kučera found an -time computable function that acts as a canonizing function on all but a fraction of random -vertex graphs. The scheme is simple to execute but a bit tricky to state, so we refer to the paper for details. This implies that graph-isomorphism is in polynomial time for “random” graphs.
Note that there are many classes of objects, including special types of graphs, for which canonical naming schemes are known. For example, this is long known to be true for trees. Rather then present that naming method, let’s consider a much simpler problem of naming sets of numbers. A simple method for naming a set is to sort the set of numbers and then make them into a string: if is the sorted order of the set ‘s elements, then the “name” of the set is the sequence
Note the key point is that two sets are equal as sets if and only if they have the exact same sequence. Thus, the set is named by the sequence .
We wish to consider a more simple-minded way to try to get an invariant or a canonical form, using any polynomial-time computable 1-1 function from graphs to strings. Define
Here the maximum can be taken with regard to the standard lexicographic order on binary strings, and is called the lex-max canonical form. A decision problem that helps one find it by binary search is:
Definition 1 Decision problem : Given and a string , does there exist a isomorphic to such that ?
Then computing is polynomial-time equivalent to deciding . Also belongs to , since we can guess . Thus is always between GI and the -complete problems. We wonder, is it always -complete? Is it ever equivalent to GI itself? Strangely, no canonizing function is known to be equivalent to GI, but many are -hard.
Which “Unravelings” are NP-Hard?
Let us simply try to “unravel” an adjacency matrix into a binary string. For an undirected graph, we need only the upper triangle of the matrix. The three most obvious rules we can think of are to list those entries in order going: 1. across by the rows, 2. down by the columns, or 3. starting with the largest diagonal and ending at the upper-right corner. These are best shown by a diagram:
Theorem 2 is -complete, by reduction from Clique, and is -complete, by reduction from Hamiltonian Path.
To prove this, given an instance of Clique, define to consist of -choose- ‘s, followed by ‘s out to length -choose-. Then the graph represented by has a clique of size if and only if there is a permutation that numbers the vertices of the clique . Applying this permutation yields an adjacency matrix such that begins with -choose- ‘s, so . (Note that for the above , we can show a 4-clique by swapping vertex labels and .)
For , given an instance of Hamiltonian Path, take to begin with ‘s and the rest . Then has such a path if and only if there is a re-numbering of the vertices in the path from to , and this majorizes .
However, we do not know whether is -complete. If we try as for , then having a permutation creating such that only means that the original graph has a node connected to every other node, which is easy to determine. We do not see how to choose to gain information much more useful than that.
If we compare graphs to sets, we can see why canonization is possibility hard. For sets of numbers the making of a canonical name is helped by our ability to sort numbers—the act of sorting preserves sets, but places them into a canonical form. The difficulty for graphs, and therefore GI, is that the corresponding “sorting” step seems to be impossible. Or said better, today we have no idea how to “sort” an adjacency matrix in polynomial time.
Is -complete? Or is row-major order somehow weaker than the column-major and diagonal unravelings?
Is there a canonizing function that is equivalent to GI itself, or is canonizing graphs always harder than testing for isomorphism?
Are GI and canonizing really in ? An ostensibly easier task is to put GI into as with Factoring—see this StackExchange thread for more.