Zeke Zalcstein worked, before he retired a few years ago, on the boundary between mathematics and computational complexity. He started his career in mathematics, and then moved into computational complexity theory. His PhD advisor was John Rhodes, who is an expert on many things, including the structure of semigroups; Rhodes has recently worked on the P=NP question himself, but more on that in another post. The picture is of John Rhodes, since Zeke is camera shy.

I have known Zeke since I was a graduate student at Carnegie-Mellon and he was on the faculty there. Later, Zeke worked at a number of universities, including Stony Brook, and then spent the latter part of his career as a program director at CISE/NSF.

Zeke loves to laugh, he is funny, and he knows tons of mathematics, especially algebra; he is a character in the best sense of the word. For example, he is very particular about what he eats and drinks; even his water cannot be just any water. When Zeke was at Stony Brook he discovered a water source in a certain open field that had the best water in the world. Not just in Stony Brook, the world. The water source was a faucet coming out of the ground in an otherwise empty field. I asked Zeke where the water came from; the faucet was simply connected to a pipe that came out of the ground, with no label, no sign, nothing. Did it come from a spring, from a well, or from the city water main? Zeke said he just knew it was great water.

Unfortunately, one day Zeke fell and broke his arm so it was impossible for him to drive for several weeks. While Zeke was incapacitated, a certain graduate student was kind enough to make periodic trips out to the faucet, fill up plastic jugs with the special water, and deliver them to Zeke.

I was once telling this story to John Hennessy, who is the President of Stanford University. He started to laugh, and confirmed the story: John had been the graduate student who was kind enough to help Zeke.

Enough, on to the main topic of today’s post. Zeke and I worked together on a number of problems over the years, and today I will talk about one that has a curious history, a neat proof, and an interesting application that never really happened.

The Problem and Our Results

The problem is: what is the space complexity of the word problem for the free group on two letters? This is not how we first heard the problem, but it is equivalent to what we were asked. The free group $F_2$ on two letters “a” and “b” is the group where the only relations are:

$\displaystyle aa^{-1} = a^{-1}a = bb^{-1} = b^{-1}b = 1.$

It is called the free group since these relations, which must hold in any group, are the only ones. As usual, the word problem is to determine, given any word over ${\{a,a^{-1},b,b^{-1}\}}$, whether or not it equals ${1}$. For the rest of the post, when we say word problem, we mean “the word problem for the free group on two letters.”

Zeke and I proved:

Theorem: The word problem is in ${\mathsf{DLOG}}$.

Actually we proved more:

Theorem: A probabilistic log-space machine that has a one-way read only input tape can solve the word problem with error at most ${\epsilon}$, for any ${\epsilon>0}$.

There is a simple linear-time algorithm for the word problem. The algorithm uses a pushdown store, which is initially empty, and processes each input symbol ${x}$ as follows: if the top of the pushdown is ${y}$ and ${xy=1}$, then pop the top of the pushdown; if not, then push ${x}$ onto the pushdown. Then, go to the next input symbol. When there are no more input symbols, accept only if the pushdown is empty. The algorithm clearly runs in linear time, and it is not hard to show that it is correct.
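The pushdown algorithm can be sketched in a few lines. This is a minimal illustration under my own encoding (uppercase 'A' and 'B' stand for ${a^{-1}}$ and ${b^{-1}}$), not code from the paper:

```python
# Linear-time pushdown check for the word problem in the free group F_2.
# Encoding (an assumption of this sketch): 'a', 'b' are the generators,
# 'A', 'B' are their inverses.
INVERSE = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}

def is_identity(word):
    """Return True iff the word reduces to 1 in F_2."""
    stack = []
    for x in word:
        # If x cancels the symbol on top of the pushdown, pop it;
        # otherwise push x.
        if stack and stack[-1] == INVERSE[x]:
            stack.pop()
        else:
            stack.append(x)
    return not stack  # accept only if the pushdown is empty

print(is_identity("abBA"))  # True:  a b b^-1 a^-1 = 1
print(is_identity("abAB"))  # False: a b a^-1 b^-1 is not 1
```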

This algorithm uses linear space: a string that starts with many ${a}$'s, for example, will require the pushdown to hold many symbols. Thus, the goal is to find a different algorithm that avoids using so much space.

My on-going agenda, in these posts, is to explain the role people play in doing research. In the next two sections I will explain the “curious history” of the question: who I “think” first asked it, and why they may have asked it. Then, I will explain how we solved the problem and proved our theorems. You can skip the next two sections and go right to the proof method. No history. No background. But I hope that you find the history and motivation interesting. Your choice.

My memory for technical conversations is usually pretty good, but this question on the space complexity of the word problem has a murky history. My version is that at a STOC conference, years ago, Juris Hartmanis mentioned a related problem to a number of us over a coffee break. I instantly liked the problem, and during the next few months Zeke and I worked on the problem, until we found a solution.

At the next conference, FOCS, we told Juris our solution to “his” problem. Juris said that he liked the problem and liked our solution, but that he had never asked us the problem. Who knows. At least we did not call the paper “On a Conjecture of Hartmanis.” I still believe, sometimes, that maybe Juris told it to us, but I must be confused. Anyway, I think you will like the method we used to solve it.

The problem that someone asked us was not “what is the space complexity of the word problem for the free group on two letters?” Instead we were asked a more “language”-type question that is equivalent to the word problem. We were asked: what is the space complexity of a certain language ${D2}$?

Before defining ${D2}$ it may help to explain where ${D2}$ comes from. Define the following language ${D}$ over the alphabet that contains two types of parentheses: “${[}$” and “${]}$”, and “${(}$” and “${)}$”. A string ${w}$ is in ${D}$ provided the application of the following rules eventually leads to the empty string:

$\displaystyle u[]v \rightarrow uv \text{ and } u()v \rightarrow uv.$

Thus,

$\displaystyle [ \quad [ \quad ( \quad ) \quad ] \quad]$

is in the language, but

$\displaystyle [ \quad ( \quad [ \quad ) \quad ] \quad ]$

is not. Note, ${D}$ is closed under concatenation.

The language ${D}$ consists of expressions with two types of parentheses that are well nested; it is sometimes called the Dyck language. Each ${[}$ must have a matching ${]}$, and each ${(}$ must have a matching ${)}$. Moreover, the two types of parentheses must not get in each other's way; thus

$\displaystyle \big[ \quad \big( \quad \big] \quad \big)$

is not in ${D}$.

The Dyck language ${D}$ is a context-free language, and is central to the theory of context-free languages. Thus, another way to define ${D}$ is by the following context-free grammar, where the symbol ${\lambda}$ is the empty string:

$\displaystyle S \rightarrow \lambda \mid SS \mid [S] \mid (S).$

The following is an easy theorem:

Theorem: The language ${D}$ is in ${\mathsf{DLOG}}$.

I will now explain how to prove this theorem; to do this we need the notion of matching parentheses. In a string ${w}$ say that ${w_{i}=[}$ matches ${w_{j}=]}$ provided ${i < j}$ and, counting left-to-right, ${j}$ is the first position after ${i}$ at which the number of ${[}$'s and ${]}$'s in ${w_{i} \cdots w_{j}}$ are equal. There is a similar definition for the parentheses ${(}$ and ${)}$. We call ${[}$ and ${(}$ left parentheses and ${]}$ and ${)}$ right parentheses.

Call a string ${w}$ good provided it satisfies the following:

1. for every left parenthesis of either type there is a matching right one of the same type;
2. for every right parenthesis of either type there is a matching left one of the same type;
3. if ${w_{i}}$ matches ${w_{j}}$ and ${w_{k}}$ matches ${w_{l}}$, then $(i,j)$ and $(k,l)$ are disjoint intervals, or one wholly contains the other.

The last condition means that there are no overlapping matches. It should be clear that checking whether or not a string is good can be done easily in log-space.
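To illustrate why goodness is easy to check, here is a small sketch of my own (not from the paper). For readability it stores the list of matched pairs; a log-space machine would instead recompute each match by counting, keeping only a constant number of indices and counters:

```python
# Check the three "good" conditions for a string over [ ] ( ).
def match_index(w, i):
    """Index of the right parenthesis matching the left one at w[i]."""
    left = w[i]
    right = {'[': ']', '(': ')'}[left]
    depth = 0
    for j in range(i, len(w)):
        if w[j] == left:
            depth += 1
        elif w[j] == right:
            depth -= 1
            if depth == 0:
                return j
    return None  # no match exists

def is_good(w):
    pairs = []
    for i, c in enumerate(w):
        if c in '[(':
            j = match_index(w, i)
            if j is None:
                return False                  # (1) unmatched left
            pairs.append((i, j))
    matched = {j for _, j in pairs}
    for i, c in enumerate(w):
        if c in '])' and i not in matched:
            return False                      # (2) unmatched right
    for (i, j) in pairs:
        for (k, l) in pairs:
            if i < k <= j < l:
                return False                  # (3) crossing matches
    return True

print(is_good("[[()]]"))  # True
print(is_good("[(])"))    # False: crossing matches
print(is_good("((((("))   # False: unmatched left parentheses
```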

Lemma: A string ${w}$ is good if and only if ${w}$ is in ${D}$.

It is easy to see that all strings in ${D}$ are good. So suppose that there is a good string that is not in ${D}$, and select a shortest such string ${w}$. The first symbol of ${w}$ must be a left parenthesis by (2). So assume that ${w_{1} = [}$. There must be a right parenthesis in ${w}$ by (1), so assume that ${w_{i}}$ is the first one. Clearly, ${i>1}$. Suppose that ${w_{i} = )}$. Then the previous symbol ${w_{i-1}}$ must equal ${[}$; otherwise, ${w}$ would not be a shortest possible counterexample. Thus,

$\displaystyle w=[ \quad x \quad [_{1} \quad )_{2} \quad y$

where ${x}$ consists entirely of left parentheses. By (2) the ${)_{2}}$ matches some earlier ${(}$, but the ${[_{1}}$ matches some later ${]}$, and this contradicts (3).

Next suppose that ${w_{i} = ]}$. Then the previous symbol ${w_{i-1}}$ must equal ${(}$; otherwise, ${w}$ would not be a shortest possible counterexample. Thus,

$\displaystyle w=[ \quad x \quad (_{1} \quad ]_{2} \quad y$

where again ${x}$ consists entirely of left parentheses. By (2) the ${]_{2}}$ matches some earlier ${[}$, but the ${(_{1}}$ matches some later ${)}$, and this contradicts (3).

Now let us finally define the language ${D2}$. It is a variation of the Dyck language that allows a different type of pairing of the symbols. This difference makes proving that it lies in ${\mathsf{DLOG}}$ harder. This is the question someone asked us, I guess not Juris. Sometimes ${D2}$ is called a “two-sided” Dyck language.

A string ${w}$ is in ${D2}$ provided the application of the following rules eventually leads to the empty string:

$\displaystyle u[]v \rightarrow uv \text{ and } u()v \rightarrow uv$

and

$\displaystyle u][v \rightarrow uv \text{ and } u)(v \rightarrow uv.$

The point of ${D2}$ is that the parentheses have a type but no “direction.”

The trouble with ${D2}$ is that there does not seem to be a counterpart to the notion of matching; a string of the form

$\displaystyle \dots [ w ] \dots$

could have the ${[}$ match a ${]}$ that comes earlier. This is why I do not know a direct counting method for recognizing ${D2}$. I guess this is why someone asked us the question. I wish I could remember who.

Our Solution

Zeke and I built on two insights, one trivial and the other based on a classic result from group theory. The first is that recognizing ${D2}$ is really the same as recognizing the words that equal ${1}$ in the free group on two letters, under the translation

$\displaystyle [ \rightarrow a \text{ and } ] \rightarrow a^{-1}$

and

$\displaystyle ( \rightarrow b \text{ and } ) \rightarrow b^{-1}.$

The second insight is that the free group on two letters has a faithful representation by ${2 \times 2}$ integer matrices; this result was proved by Ivan Sanov in 1947.

Theorem: There are two integer matrices $A,B$ so that the mapping ${a \rightarrow A}$ and ${b \rightarrow B}$ is a faithful representation of the free group on ${a,b}$.

Actually we can construct the matrices explicitly. Consider the following two matrices:

$\displaystyle A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}.$

Both matrices are invertible, and their inverses are also integer matrices. The map that sends

$\displaystyle a \rightarrow A \text{ and } a^{-1} \rightarrow A^{-1} \text{ and } b \rightarrow B \text{ and } b^{-1} \rightarrow B^{-1}$

defines a mapping $\rho$ from the free group on two letters, $F_2$, into $\mathsf{SL}(2,\mathsf{Z})$: the group of $2 \times 2$ integer matrices with determinant $1.$ This mapping $\rho : F_2 \to \mathsf{SL}(2,\mathsf{Z})$ is an injective homomorphism, so $F_2$ is isomorphic to its image, a subgroup of $\mathsf{SL}(2,\mathsf{Z})$.

That means that we can replace the word problem by: does a sequence of matrices over ${A,B,A^{-1},B^{-1}}$ equal the identity matrix ${I}$? This transformation of the word problem into a problem about matrices is the key to our results. For example,

$\displaystyle aba^{-1}b \rightarrow ABA^{-1}B.$

Here is how Zeke and I used this transformation to prove our theorem. We show how to check whether or not

$\displaystyle M = M_{1} \times M_{2} \times \dots \times M_{n}$

is equal to ${I}$, where each ${M_{i}}$ is from ${A,B,A^{-1},B^{-1}}$. The obvious way to do this is to compute the matrix ${M}$ and see if it is equal to ${I}$. The difficulty with this approach is that the matrix ${M}$ may have entries that are too large to store in log-space.

We solve this with the Chinese Remainder Theorem. Suppose that ${p}$ is a prime with at most ${O(\log n)}$ bits. Let ${M_{p} = M \bmod p}$. A log-space machine can pass over the input and compute the product ${M_{p}}$: we do all the arithmetic modulo ${p}$, but we are still multiplying ${2 \times 2}$ matrices. Then the machine checks whether or not ${M_{p} = I}$. If not, then clearly ${M \neq I}$, and we can reject the input. The machine does this for all primes ${p}$ of the given size. If ${M_{p} \equiv I \bmod p}$ for all such ${p}$, then the machine accepts.
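The modular check can be sketched as follows. This is my own illustration, not code from the paper: the uppercase encoding of inverses and the particular list of primes are assumptions of the sketch, and a true log-space machine would stream over the input rather than hold the word in memory.

```python
# Multiply the Sanov matrices modulo a small prime p and compare the
# result with the identity mod p.
SANOV = {
    'a': [[1, 2], [0, 1]], 'A': [[1, -2], [0, 1]],   # A and A^{-1}
    'b': [[1, 0], [2, 1]], 'B': [[1, 0], [-2, 1]],   # B and B^{-1}
}

def mat_mul_mod(X, Y, p):
    """2x2 matrix product with all arithmetic done modulo p."""
    return [[(X[0][0]*Y[0][0] + X[0][1]*Y[1][0]) % p,
             (X[0][0]*Y[0][1] + X[0][1]*Y[1][1]) % p],
            [(X[1][0]*Y[0][0] + X[1][1]*Y[1][0]) % p,
             (X[1][0]*Y[0][1] + X[1][1]*Y[1][1]) % p]]

def product_mod(word, p):
    M = [[1, 0], [0, 1]]  # identity
    for x in word:
        M = mat_mul_mod(M, SANOV[x], p)
    return M

def word_is_identity(word, primes):
    # Accept iff M_p = I mod p for every prime p in the list.
    return all(product_mod(word, p) == [[1, 0], [0, 1]] for p in primes)

print(word_is_identity("abBA", [5, 7, 11]))  # True:  a b b^-1 a^-1 = 1
print(word_is_identity("abAB", [5, 7, 11]))  # False: caught already mod 5
```

Note that $p = 2$ would be useless here: both $A$ and $B$ are congruent to the identity modulo 2, which is why the sketch starts its primes at 5.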

We claim that this algorithm is correct. The insight is that if ${M}$ is not equal to ${I}$, then ${M-I}$ has some non-zero entry, and this entry cannot be too large: it is at most exponential in ${n}$, since the entries of each ${M_{i}}$ are bounded. But if ${M_{p} \equiv I}$ for every prime ${p}$ of ${O(\log n)}$ bits, then the product of all these primes divides that non-zero entry, forcing it to be larger than it can be. This contradiction shows that the algorithm is correct. This uses the well known fact that

$\displaystyle \prod_{p \le t} p \ge c^{t}$

where the product is over primes and ${c>1}$.
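A quick numeric illustration of this fact (not a proof; the cutoff $t = 30$ and the base $c = 2$ are arbitrary choices for the demo):

```python
# Compute the primorial prod_{p <= t} p with a simple sieve and compare
# it against an exponential lower bound c^t.
def primorial(t):
    prod = 1
    sieve = [True] * (t + 1)
    for i in range(2, t + 1):
        if sieve[i]:           # i is prime
            prod *= i
            for j in range(i * i, t + 1, i):
                sieve[j] = False
    return prod

print(primorial(10))            # 210 = 2 * 3 * 5 * 7
print(primorial(30) > 2 ** 30)  # True: already beats 2^t at t = 30
```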

Finally, the probabilistic result follows in essentially the same way. The only difference is that now the machine randomly selects one prime ${p}$ to check. We then argue that the non-zero entry of ${M-I}$ is unlikely to be divisible by a randomly selected prime. This relies on the simple fact that the non-zero entry can have only a polynomial number of prime factors. Thus, as long as we randomly select the prime from a large enough set, it is unlikely that the non-zero entry will be $0$ modulo the prime.

Karp-Rabin

What does the Karp-Rabin pattern matching algorithm have to do with our result on the word problem? Quite a bit. Dick Karp and Michael Rabin are both famous for many things, but one of my favorites of their results is their randomized pattern matching algorithm.

Rabin has told the story of how they first thought of their algorithm. They needed a way to hash strings that had a certain property, and our mapping from the free group to matrices modulo a prime worked perfectly. So at one time they were using the same technology that we used to solve the word problem in log-space.

Unfortunately for Zeke and me, they quickly got rid of the matrix ideas and replaced them with a much simpler method. But Michael has repeatedly told me that the matrix ideas played a role in his initial thinking.

Open Problems

Our proof method can handle a larger class of word problems than just the word problem for free groups. Suppose that ${G}$ is any linear group over the rationals. Then the word problem for this group can also be solved in ${\mathsf{DLOG}}$. One can even prove more; see our paper for the details. Thus, many infinite groups have word problems that can be done in log-space.

The power of matrix representation theory is something that you may be able to use to solve other problems. I believe that we have not made as much use of the power of representation methods in other parts of computer science as perhaps we could. In any event the mapping $\rho$ may itself be useful to you.

Now if I could only recall who initially asked me the problem ${\dots}$

April 16, 2009 9:49 am

Lovely post!

April 16, 2009 3:53 pm

Great post! I love the Karp-Rabin algorithm; it’s beautifully simple, one of the many examples Rabin has given of the power of randomness (your post that mentioned his O(n) closest-pair algorithm in 2d space was also interesting). If you pay attention to parameters, it’s also a useful building block for many neat string matching algorithms (finding all palindromes in a long string, for example).

Interesting to know the relation to your work on the word problem. Somehow the more complicated solution came first.

April 16, 2009 4:06 pm

One question about balance lemma: suppose we have w = “(((((“, it seems it satisfies mentioned condition, but indeed it doesn’t belong to D.

April 16, 2009 7:20 pm

I am sorry, I made a silly error in the post. The definition and the Balance Lemma are slightly off. First, the definition needs to allow [][][] to be in the language–I have fixed this. Second, the Balance Lemma needs to be fixed too: the string w itself must be [-balanced and (-balanced.

I will update post in am. Thanks for reading so closely. Again sorry for the errors…rjlipton

Did you ever think that a post is unlucky? I seem to be having a great deal of trouble getting this one right. I am again sorry for any trouble or confusion.

September 24, 2016 5:18 pm

Is this post unlucky? At least the following statement is also wrong: “This mapping $$\rho : F_2 \to \mathsf{SL}(2,\mathsf{Z})$$ is a group isomorphism.” It is not an isomorphism, but only a monomorphism. The matrices in the image are congruent to the identity modulo 2…

April 16, 2009 4:52 pm

for D and D2 and in particular the Balance Lemma, is it correct that you’re also assuming the languages are closed under concatenation? e.g. is ([]()) in D?

April 18, 2009 5:10 am

“A probabilistic log-space machine that has a one-way read only input tape”

Thought provoking! So this is the first streaming algorithm 🙂

April 18, 2009 7:41 am

Ah, in your fix, you seem to have cut out the part where you defined D2…

April 19, 2009 3:27 pm

It could be added that the one-sided Dyck languages are in TC0 (Lynch’s algorithm seen as a circuit) while the two-sided ones are NC1-hard as shown in Robinson’s thesis (1993, San Diego).

April 20, 2009 7:58 am

Is the group generated by the two 2×2 matrices

[3 5]
[0 5]

and

[2 0]
[0 3]

free?

April 20, 2009 8:07 am

I do not know if the group generated by these matrices is free; I see the issue. Is there an application in mind? As you may know, the proof that the ones I mention generate a free group relies on showing that the sequence of matrices is coded into one of the entries of the matrix product.

April 20, 2009 10:37 am

It’s just an open problem that I don’t know how to solve. Seems amazing that such a simple problem could cause difficulties.