Skip to content

Permutation Problems With Strings

December 7, 2015


From “Parsons Puzzles” to Babai’s breakthrough, and more?

ParsonsHadenFernEricson
NZ silver fern source: Robin Ducker (CC)

Dale Parsons and Patricia Haden, of Otago Polytechnic in New Zealand, wrote a seminal paper on a genre of programming puzzles that Barbara Ericson at Georgia Tech is furthering in her doctoral work. The puzzles give lines of code in a scrambled order, sometimes with incorrect lines thrown in. The goal is to find the correct order of the correct lines.

Today Ken and I wish to discuss various string rearrangement problems in relation to the Graph Isomorphism problem.

The Parsons method was pointed out to me by Ericson, who also serves as our Director of Computing Outreach. Her thesis abstract says in part:

Parson’s programming puzzles are a family of code construction assignments where lines of code are given, and the task is to form the solution by sorting and possibly selecting the correct code lines. We introduce a novel family of Parson’s puzzles where the lines of code need to be sorted in two dimensions. The vertical dimension is used to order the lines, whereas the horizontal dimension is used to change control flow and code blocks based on indentation as in Python.

This really intrigued me and I looked into what exactly is the Parsons method. One of her major references is a paper by Petri Ihantola and Ville Karavirta of Aalto University in Finland. Here is one of their examples:

puzzle

One can make a similar but harder puzzle out of Ken’s example a year ago in our memorial post for Susan Horwitz, by scrambling his six lines of code that swap two nodes in a circularly linked list. In a class teaching proofs, I wonder how good an exercise it would be to scramble lines of a formal derivation.

All this has us thinking further about problems involving rearranging strings. We can think of each given line as a character. Some Parsons puzzles have multiple occurrences of the same line which is like having the same character multiple times in a string.

Viewed this way, we can even call “Parson’s puzzle” a Parsons puzzle: the apostrophe should if anywhere be after the final ‘s’. There’s nothing wrong with “Parsons puzzle” (reading the name as an adjective not possessive), which is also a legal Parsons solution since the apostrophe “line of code” is skipped. There are isomorphic answers since we could swap either ‘s’ or ‘z’, and these swaps remain automorphisms of the final string. Now we are getting into the domain of László Babai’s new results about strings, to which we turn.

Graph Isomorphism With Subgroups

To understand Babai’s usage of “string isomorphism,” a concept introduced by Eugene Luks in 1980 under the name “color isomorphism,” let’s first consider a natural, known add-on to the graph isomorphism (GI) problem:

Given two labeled graphs {X,Y} on nodes {1,\dots,n} plus a group {G} presented by permutations {\pi_1,\dots,\pi_k} of {1,\dots,n}, is there an isomorphism {\sigma} from {X} to {Y} that belongs to {G}?

For example, consider {n = 6} and the subgroup {G} generated by {\pi_1 = (1~2~3~6)(4~5)} and {\pi_2 = (2~4)(5~6)}. These are both even permutations since {\pi_1 = (3~6)(2~3)(1~2)(4~5)} so we immediately know {G} is not all of the symmetric group {S_6}. Now consider these two labeled graphs {X} and {Y}:

PrismIso

Then {X} and {Y} are isomorphic by the interchange {(3~5)}, but being odd, this does not belong to {G}. Does that mean the answer to the problem for {(X,Y,\pi_1,\pi_2)} is “no”? Well, the graph {X} has odd automorphisms such as its mirror flip—perhaps one of them composed with {(3~5)} belongs to {G}? It’s a tricky question. We could force a “no” answer by adding a “dongle” to nodes 3 and 5 in both graphs so they can only be swapped with each other and different “dongles” to other nodes that prevent any other possible isomorphism. The dongles could be a single node connected to 3 and 5 and {i} nodes connected only to node {i} for {i = 1,2,4,6} in each graph.

GI is then the special case where we are given generators of {S_n}. Whether the general case reduces back to GI is a major open problem.

String Isomorphism

The simplest form of string isomorphism is, given strings {x,y} of length {N}, is there a permutation of {1,\dots,N} that carries {x} to {y}? This has a trivial answer: yes if and only if for every character {c} in the alphabet, the numbers of occurrences of {c} in {x} and in {y} are the same. This is easily in polynomial time, as are some other problems by the name “string isomorphism” on the Net. But here is the analogous subgroup problem:

Given strings {x,y} of length {N} and generators {\pi_1,\dots,\pi_k} of a subgroup {G} of {S_N}, does permuting the entries of {x} by some member of {G} give {y}?

This is not trivial. It is exactly as hard as the graph subgroup version, hence hard for GI. Let us first reduce GI to it. Having {G} enforces that only permutations derived from rearrangements of the {n} node indices are allowed on the {N = \binom{n}{2}} string indices. Specifying some way to “unroll” a graph’s adjacency matrix {A} into a string yields generators for the right {G}.

Since our graphs are undirected we need only unroll the upper triangle. Let’s do it by first going down the largest diagonal

\displaystyle A[1,2].A[2,3],\dots,A[n-1,n],

then the next-largest, and so on in “bands” ending at the upper-right corner {A[1,n]}. For {n = 4} we get {N = 6} entries in the following order:

K4

Here the six “edges” are really the slots in the string for possible edges. A pair of generators for {S_4} is {(A~B~C~D)} and {(A~B)}. The corresponding permutations of the edge slots are {\pi_1 = (1~2~3~6)(4~5)} and {\pi_2 = (2~4)(5~6)}. By “design of accident” these are the same as in the example above though there is no relation—this just expedites our seeing that the subgroup {G_N} generated by {\pi_1,\pi_2} is not all of {S_6}. The idea extends for any {n > 4} in canonical fashion: from the cycle and transposition generators of {S_n} we get two generators {\pi_1,\pi_2} for {G_N}. Babai calls the group {G_N} in such cases {S_n^{(2)}} to emphasize the action being on pairs.

Now given any labeled graphs {X} and {Y} on {n} nodes, we can read off the corresponding strings {x} and {y}, and then {X} is isomorphic to {Y} if and only if {(x,y,\pi_1,\pi_2)} is a yes-case of the string problem. This completes the reduction from GI.

To reduce from the subgroup version of GI we map the given generators of {G} to their images in {S_n^{(2)}} under the above embedding. This reduction can be inverted: Given the string {x}, define the graph {X} to be a disjoint union of isolated pairs of nodes {i_0,i_1} connected by an edge for each {i} such that {x_i = 1}, and isolated pairs not connected by an edge for each {i} such that {x_i = 0}. Likewise map {y} to a graph {Y}. Finally map each generator {\pi_{\ell}} by replacing each move {i \rightarrow j} in {\pi_{\ell}} by {i_0 \rightarrow j_0} and {i_1 \rightarrow j_1}.

Graph and String Automorphisms Too

An automorphism is a rearrangement that leaves a structure {X} the same. Such permutations form a group called the automorphism group {\mathsf{Aut}(X)}. GI has long been known to be polynomial-time equivalent to the problem of computing a complete set of generators for {\mathsf{Aut}(X)} when {X} is a labeled graph. Assuming {X} and {Y} are connected graphs (else use their complements), the forward reduction takes {X' = X \cup Y} and notes that every generating set of {\mathsf{Aut}(X')} contains a member that maps {V(X)} onto {V(Y)} if and only if {X} is isomorphic to {Y}.

The reverse direction is trickier and requires both decomposing the group and progressively adding “dongles” to pairs of graphs and asking whether they are isomorphic. The decision version, “does {X} have a nontrivial automorphism?” (GA), reduces to GI but is not known to be equivalent.

This extends to an equivalence of the subgroup-{G} version of GI and finding generators for all automorphisms in a subgroup. We almost get a reduction to the subgroup version of GA which is a decision problem: Make two copies {[n_0],[n_1]} of {[n]} and convert each given generator {\pi} of {G} into a {\pi'} that maps {i_0} to {(\pi i)_1} and {i_1} to {(\pi^{-1} i)_0}. Also convert the identity which yields the swap {\sigma: i_0 \leftrightarrow i_1} and let these generate {G'}. Then {X} and {Y} are isomorphic by a member of {G} if and only if {X \cup Y} has an automorphism that is an odd-length word over these generators. We still get a reduction to finding all generators of {G' \cap \mathsf{Aut}(X \cup Y)}. The reverse reduction is still tricky, but no more so than the original since it was decomposing groups anyway.

For strings {x}, we again have a situation where the basic problem is trivial: the automorphisms permute the indices for each individual character among themselves, so we need only write the usual generators for a product of symmetric groups. However, the subgroup version is again nontrivial. We state it for a binary string, but like the above it extends to any set of characters or “colors.”

Given a string {x \in \{0,1\}^N} and generators for a subgroup {G} of {S_N}, find generators for {G \cap (S_0 \times S_1)} where {S_0} permutes the indices {i} having {x_i = 0} and {S_1} permutes {j} with {x_j = 1}.

Note how this involves finding generators for the intersection of two subgroups. By ideas like those just above this is equivalent to the subgroup version of string isomorphism, hence in turn to the subgroup version of GI. The method of the last section can also reduce the functional graph-automorphism problem directly to this string version, and conversely.

The extra generality of the string problems is that the subgroup {G'} involved need not be {S_n^{(2)}} as in the reductions from GI, indeed need not arise from subgroups {G} acting on graphs at all. This feature is exploited in the recursions that Babai’s proof sets up. His algorithm classifies all of these problems into quasipolynomial time. Thus, defining the problems with subgroups makes them nicer.

Use in the Algorithms

Although automorphisms seem simpler than isomorphisms, if our objective is to reduce the groups {G} involved, the last reduction above involving {X' = X \cup Y} is going in the wrong direction since {G'} is larger than {G}. Moreover, the odd-length words that interested us formed not a subgroup but a coset of the even-length words under the swap {\sigma}. The following key idea of Eugene Luks’s seminal 1980-82 paper putting the bounded-degree case of GI into {\mathsf{P}} shows how cosets can be handled directly and without resort to the reduction to automorphisms.

Suppose in general that one has a subgroup {H} of {G} and coset representatives {\sigma_1,\sigma_2,\dots,\sigma_r} where {\sigma_1} is the identity and {r = [G:H]}. For right-cosets this means

\displaystyle G = H \cup H\sigma_2 \cup \cdots \cup H\sigma_r.

For any subset {K} of {G} write {\mathsf{ISO}_K(X,Y)} to mean the set of isomorphisms from {X} to {Y} that belong to {K}. That is, writing actions on the right, {\mathsf{ISO}_K(X,Y) = \{\rho \in K: X\rho = Y\}}. Then just by simple set theory,

\displaystyle \mathsf{ISO}_G(X,Y) \;=\; \bigcup_j \;\mathsf{ISO}_{H\sigma_j}(X,Y).

The terms for {j \geq 2} aren’t recursing on subgroups but the trick is that we can make it so by permuting {Y} by {\sigma_j^{-1}} to get {Y_j}. Then:

\displaystyle \begin{array}{rcl} \mathsf{ISO}_{H\sigma_j}(X,Y) &=& \{h\sigma_j: h \in H \wedge Xh\sigma_j = Y\}\\ &=& \{h\sigma_j: h \in H \wedge Xh = Y\sigma_j^{-1}\}\\&=& \{h \in H: Xh = Y\sigma_j^{-1}\}\,\sigma_j\\ &=& \mathsf{ISO}_{H}(X,Y_j)\,\sigma_j. \end{array}

This allows making progress by {r}-fold recursion on the isomorphism problem for the subgroup {H}. The idea applies equally well for graphs or strings or other structures.

Babai’s algorithm builds on this by creating new ways to achieve either this or some other progress, meanwhile metering progress via certain homomorphisms from {G} into symmetric groups on smaller structures rather than by trying to divide up the domain of {X} or {Y}. Automorphisms still frame essential concepts in the algorithm, in part because whenever {\mathsf{ISO}_G(X,Y)} is nonempty, any one member {\sigma} makes {\mathsf{ISO}_G(X,Y) = \mathsf{Aut}_G(X)\sigma}. We hope to survey more of this in a future post.

What to Call These Problems?

We hasten to add that these computational problems are not new. Their contexts in Luks’s paper are brought out by further references including a 1983 paper by Gary Miller and Babai’s 1994 Handbook of Combinatorics survey, and they are foreshadowed in the treatment of coloring-preserving auto/iso-morphisms in Babai’s own 1979 paper which introduced the term “Las Vegas algorithm.”

In highlighting them all together we’d like to find a common and consistent naming scheme. Adapting names from these papers and echoing Luks’s 1993 survey might suggest names like {\text{\sc String-Iso}_G} or {\text{\sc Graph-Auto}_G} or {\text{\sc Graph-Auto-Gen}_G}, the last distinguishing the problem of giving a full set of generators for {G \cap \mathsf{Aut}(X)} from the decision problem of whether {G \cap \mathsf{Aut}(X)} is nontrivial. We’ve had a suggestion to put the ‘G’ out front and the class of relational structures in parentheses, such as {\text{\sc G-Iso}(\mathsf{Strings})} or {G\text{\sc -Auto}(\mathsf{Graphs})}, with the latter’s math-italicized {G} indicating a particular subgroup {G}.

For abbreviations we like appending the ‘G’ to turn GI into GIG and similarly SIG for the string version. The rub is that for the automorphism problems this could lead to names like GAG and SAG, even GAGG and SAGG for the generator-set versions, which don’t sound so nice. Writing {\equiv} for polynomial-time or some tighter equivalence, and using less-gaggy notation, the current state is:

  • {\text{\sc Iso}(\mathsf{Strings})} and {\text{\sc Auto-Gen}(\mathsf{Strings})} belong to {\mathsf{P}}.

 

  • {\text{\sc Iso}(\mathsf{Graphs}) \equiv} {\text{\sc Auto-Gen}(\mathsf{Graphs})}.

 

  • {\text{\sc G-Iso}(\mathsf{Graphs}) \equiv} {\text{\sc G-Iso}(\mathsf{Strings}) \equiv} {\text{\sc G-Auto-Gen}(\mathsf{Strings}) \equiv} {\text{\sc G-Auto-Gen}(\mathsf{Graphs})}.
  • {\text{\sc Auto}(\mathsf{Graphs})} reduces to GI, which reduces to the {\text{\sc G-Iso}} problems.We don’t know about the {\text{\sc G-Auto}} decision versions for graphs and strings. All of these in turn reduce to various forms of the canonization problem, which features in this 1983 paper by Babai and Luks, Luks’s 1993 survey, this 1997 note by Yuri Gurevich, and this recent paper by Lance Fortnow and Joshua Grochow.

    More Isomorphism and Permutation Problems

    As a hint that much more can be said, consider this 1998 dissertation by David Christie titled “Genome Rearrangement Problems.” Then consider that the word “genome” stops appearing after page 15 but there are 143 pages of neat material on permutations and sorting and string edit-distance type problems and cases that are {\mathsf{NP}}-hard, or not. Evidently there are “Parsons puzzles” for genetic code. We’re not saying that Babai’s algorithm has any relevance to these—Laci is the first to disclaim any competitiveness with existing heuristic GI solvers—but something in the new structural and algorithmic ideas might click.

    Open Problems

    The most particular problem is whether GIG—that is, {\text{\sc G-Iso}(\mathsf{Graphs})}—reduces back to GI. If not, then does it sit properly between GI and the canonization problems? What to call these problems?

    We end with a simple Parsons problem, with one line to omit:

    and we await further news expectantly.

    talk last Tuesday

    to the University of Chicago,

    We were glad to hear that Laci’s part-III

    the proof is by double recursion

    was not disturbed by the previous day’s hoax threat

    [wording change at end of “Use” section.]

6 Comments leave one →
  1. December 8, 2015 4:22 pm

    P vs NP?

    We define an interesting problem called $MAS$. We show $MAS$ is actually a succinct version of the well known $\textit{NP–complete}$ problem $\textit{SUBSET–PRODUCT}$. When we accept or reject the succinct instances of $MAS$, then we are accepting or rejecting the equivalent and large instances of $\textit{SUBSET–PRODUCT}$. Moreover, we show $MAS \in \textit{NP–complete}$.

    In our proof we start assuming that $P = NP$. But, if $P = NP$, then $MAS$ and $\textit{SUBSET–PRODUCT}$ would be in $\textit{P–complete}$, because they are $\textit{NP–complete}$. A succinct version of a problem that is complete for $P$ can be shown not to lie in $P$, because it will be complete for $EXP$. Indeed, in Papadimitriou’s book is proved the following statement: “$NEXP$ and $EXP$ are nothing else but $P$ and $NP$ on exponentially more succinct input”. Since $MAS$ is a succinct version of $\textit{SUBSET–PRODUCT}$ and $\textit{SUBSET–PRODUCT}$ would be in $\textit{P–complete}$, then we obtain that $MAS$ should be also in $\textit{EXP–complete}$.

    Since the classes $P$ and $EXP$ are closed under reductions, and $MAS$ is complete for both $P$ and $EXP$, then we could state that $P = EXP$. However, as result of Hierarchy Theorem the class $P$ cannot be equal to $EXP$. To sum up, we obtain a contradiction under the assumption that $P = NP$, and thus, we can claim that $P \neq NP$ as a direct consequence of the Reductio ad absurdum rule.

    You could see more on (version 2)…

    https://hal.archives-ouvertes.fr/hal-01233924v2/document

    Best Regards,
    Frank.

    • Aula permalink
      December 10, 2015 6:33 am

      Your argument has a flaw: P=NP doesn’t imply that P-completeness and NP-completeness are the same thing. You would have to assume that L=P=NP (if your definition of P-hard uses linear time) or NC=P=NP (if P-hard uses bounded circuit size), but then you have no hope of showing that P=NP by itself leads to a contradiction.

  2. Javaid Aslam permalink
    December 12, 2015 9:09 pm

    Profs RJLipton+KWRegan,
    Here is an idea on graph isomorphism, not particularly original, but I thought I should run by you. This extends an algorithm for the search of a permutation pi in a group G < S_n given by its generators. [Chris Hoffman 1982, LNCS]. Considering the length of time for keeping this GI problem unsolved, it is unlikely that a similar idea has not been tried out, but again I thought you might provide an insight into its failure.

    Original algorithm: Compares the action of each Coset Representative of G, with that of pi on each element of the domain X = {1,2, …, n}.
    This clearly gives an O(n^2) time bound determined by the size of the generators of G.
    I have the following extension which does not “quite” work because I don’t know what that new comparison of two the “actions” should be here. Anyway, the heuristics is here:
    Given: Graphs g1 and g2 labeled from X, and the generators of S_n as the (right) Coset Representatives: {() (1 2), (1 3), …(1, n)
    ( 2 3), (2 4), …(2 n), … , (n, n-1)}

    Now we compare the action of a generator psi of S_n on the given graph g1 (leading to an isomorphic graph of g1) with possibly a sub-graph of g2 (induced by psi).
    The comparison progresses in an order beginning with the Coset Rep in {(n-1 n)}.
    Each step must lead to a “successful” comparison for an isomorphism to exist. The bound is then O(n^2) comparisons. The isomorphism (if successful) is the product of all “psi” because of the transitive (in fact equivalence) relation.

    A major difficulty in extending the original algorithm, as I see ,is that, unlike in the original search for a permutation in a group, the two domains of the isomorphic graphs may not be the same, even though the labels are taken from the same set X. This makes the comparison “action” difficult to determine.

    This is perhaps why Babai is using the strategy of string isomorphism, to sync the two domains.
    But the fact that the isomorphism relation is an equivalence relation, we will always have a polynomial (O(n^2)) size n-partite directed acyclic graph (DAG) representing the equivalence class of the isomorphic graphs of g1. Therefore, should it not be possible to search for a (directed) path representing the graph g2 in this DAG?
    Thank you,

    • Javaid Aslam permalink
      December 12, 2015 10:39 pm

      Of course, I meant the “search” in polynomial time:
      >Therefore, should it not be possible to search for a (directed) path representing the graph g2 in this DAG?

  3. December 15, 2015 10:33 am

    Thanks Aula,

    All currently known $NP-complete$ problems are $NP-complete$ under log space reductions including my problem $MAS$. I should have mentioned to avoid misunderstandings, sorry: Maybe in the next version🙂.

    Best Regards,
    Frank.

Trackbacks

  1. Blog Post #2000: Barbara Ericson Proposes: Effectiveness and Efficiency of Adaptive Parsons Problems #CSEdWeek | Computing Education Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s