Stringology—The “Real” String Theory
Real-time computation and the start of stringology
Zvi Galil is my boss's boss: he is the Dean of the College of Computing at Georgia Tech. He is also one of the world's experts on stringology, an area that he helped create and name in 1984. He is well known for many other beautiful results as well; perhaps one of his most famous is in the paper with Ofer Gabber entitled: Explicit Constructions of Linear Size Superconcentrators.
Today I want to talk about another result of Zvi that is quite surprising and helped launch the area of stringology.
I still remember when Gabber and Galil announced their breakthrough result in 1979. I was at Berkeley, and a group of us got their paper and began to study it, trying to understand the proof. One of the key lemmas was a deep inequality about complex numbers—do not let it scare you:
Lemma: Let $\{a_{m,n}\}$ be a doubly infinite array of complex numbers. Also assume $a_{0,0} = 0$ and $\sum_{m,n} |a_{m,n}|^2 < \infty$. Then
$$\sum_{m,n} |a_{m,n}|^2 \le c \sum_{m,n} \left( |a_{m,n} - a_{m,n+m}|^2 + |a_{m,n} - a_{m+n,n}|^2 \right).$$
Here $c$ is a fixed absolute constant.
Their proof showed that a certain explicit graph was essentially an expander. Their idea was to think of this property not as a graph property, but as an inequality. This is a powerful idea that allowed them to use methods from analysis to prove their theorem. It is another example of the unexpected connections between different parts of mathematics.
I recall spending a good part of the summer trying to really understand where this inequality came from, and why it was true. While their construction has since been improved, the original paper and the proof still stand as one of the great results of theory.
What makes a problem important? I have discussed this many times before, but it is still one of the questions that I think about, especially late at night. Some problems are clearly important because they are hard, because they "fight back." Certainly the work on superconcentrators was important, by this measure and others.
Some problems are important because they combine a lovely realistic model with deep mathematics. The other problem of Galil that helped start stringology is a combination: it is a hard problem, and a problem that is about a realistic model of computation. Curiously, the model of computation has somehow become less studied than it was years ago; I am not sure why that is true. The model concerns real-time computation. In order to be concrete we will define real-time only for Turing machines, but the notion can and has been defined for almost any model of computation.
In a real-time computation we imagine that a Turing machine is required to read one character from its input tape at each time step. It is not allowed to compute for multiple steps before reading the next input character: it must read one. This can be relaxed to insist that the machine read a character every constant number of steps; what is important is that the time between reading input symbols is bounded. The machine must then give its verdict within the same bounded time after reading the last input character. This is a tremendous restriction on algorithms; even a linear time computation is rarely a real-time computation, since it may read in the whole input and then rescan it. Of course a linear time computation takes constant amortized time per input symbol, but the real-time constraint is much stronger.
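For a concrete feel, here is a toy sketch of mine (not a standard formalism): an algorithm that keeps a running digit sum can answer after every symbol with a constant amount of work, so it fits the spirit of the real-time restriction.

```python
# A real-time-style computation: after each input digit, report whether
# the digit sum so far is divisible by 3, doing O(1) work per symbol.

def divisible_by_3_flags(digits):
    total = 0
    flags = []
    for ch in digits:          # one symbol per step
        total += int(ch)       # constant work per step: one addition
        flags.append(total % 3 == 0)
    return flags

print(divisible_by_3_flags("1234"))  # [False, True, True, False]
```

The key point is not the total running time, which is linear either way, but that no single step ever does more than a constant amount of work.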
In modern terms the real-time restriction is a reasonable and practical model: in many situations data arrives whether or not you want it to. An internet router gets packets as they come. It can buffer or store them, but it cannot simply ignore them. Further, performance requirements dictate that the router be as close to real-time as possible; otherwise packets will be delayed or even lost.
I think the interest in real-time computation in the past, especially algorithms on Turing machines, was not driven by practical concerns, however. It was driven by the surprise that real-time Turing machines were so powerful. Even with this restriction they could do magical things. One of the classes of things they could do involved basic questions about strings.
So let’s turn now to stringology.
Recall a string $x$ is a palindrome if $x$ is identical to $x^R$, where $x^R$ is the reversal of the string $x$. Palindromes are strings that read forward the same as backward. Some favorites are:
- A man, a plan, a canal: Panama
- Able was I ere I saw Elba
- Never odd or even
Note in these examples we are a bit loose about the notion: we do not count spaces, nor punctuation, and we ignore case. Thus the last example as a string should be neveroddoreven.
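The loose convention is easy to make precise; here is a small sketch (function name mine):

```python
# "Loose" palindrome test: drop spaces and punctuation, ignore case,
# then compare the cleaned string with its reversal.

def is_loose_palindrome(s):
    t = "".join(ch.lower() for ch in s if ch.isalnum())
    return t == t[::-1]

print(is_loose_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_loose_palindrome("Never odd or even"))               # True
print(is_loose_palindrome("Not a palindrome"))                # False
```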
The notion of palindrome is a mathematical one: the string reads the same forward or backward. However, it is not the same picture forward and backward: only the symbols are the same, not the images. For instance, "did" is a palindrome as a string, but not as a figure, since in a mirror it would look like "bib."
There has been work on writing that reads the same from different orientations; this has nothing, I believe, to do with stringology, but it is beautiful work. The artist John Langdon has created many pictures with miraculous symmetry properties: ambigrams that read the same even when turned upside down.
The Discovery: Act I
Zvi tells me that in the fall of 1975 he was a postdoc at IBM Yorktown Heights. Albert Meyer was also visiting there, and one day he handed Zvi a handwritten manuscript. It was a translation of a Russian paper by Anatol Slisenko—the translation had been made by Bob Daley. The paper had one result. An amazing result. Slisenko had proved:
Theorem: There is a multi-tape Turing Machine that can recognize initial non-trivial palindromes in real-time.
Galil explains that he thought this result was “incredible and impossible.” Real-time is a very strong constraint, and Zvi could not believe that palindromes could be recognized by such machines.
The intuition I imagine he had is this: Suppose the machine has read $x_1 \cdots x_n$ of the input $x$. Since the machine is real-time, it has only constant time to decide if $x_1 \cdots x_n$ is or is not a palindrome. It just seems impossible.
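A sketch of the obvious on-line algorithm (mine, not Slisenko's or Galil's) makes the difficulty concrete: it answers correctly after every symbol, but the $n$-th symbol costs it about $n$ comparisons, far beyond the constant budget that real-time allows.

```python
# Naive on-line recognizer for palindrome prefixes. Correct, but the
# work per arriving symbol grows linearly with the prefix length,
# so it is nowhere near real-time.

def palindrome_prefix_flags(text):
    prefix = []
    flags = []
    for ch in text:
        prefix.append(ch)
        # O(n) rescan of the whole prefix when the n-th symbol arrives.
        flags.append(prefix == prefix[::-1])
    return flags

print(palindrome_prefix_flags("ababa"))
# [True, False, True, False, True]
```

The surprise of the Slisenko and Galil results is that this linear per-symbol cost can be avoided entirely.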
Yet Galil had a translation of Slisenko's paper: Recognizing a symmetry predicate by multihead Turing machines with input. Here the "symmetry" predicate refers to palindromes. The paper was huge. Slisenko's journal paper was 173 pages, and Daley's translation was 500 handwritten pages.
Galil was taken aback by Slisenko's result. He decided to try to read the paper, but after about 30 of the 500 pages he gave up. Yet he felt, correctly as it turned out, that there should be a much shorter and simpler proof. He set out to find it. He soon succeeded, with a proof that is only 15 pages. Of course some of the notions of Slisenko were used, some were modified, and many new ideas were added by Zvi.
And as he told me:
This began my love affair with string algorithms.
The Discovery: Intermission
I cannot do justice to Galil’s proof, so I will not even try to explain it in any detail. The critical insight is the following “meta” theorem:
Theorem: If a linear time off-line algorithm $A$ satisfies the predictability condition, then $A$ can be transformed into a real-time algorithm.
The predictability condition is a technical condition on the behavior of an algorithm; it is due to Galil, and is one of the principal reasons his proof is so much shorter. The intuition is this: if the algorithm spends too much time processing some input symbol, then it must be the case that many of the next output symbols are already determined, and can be emitted without further computation. This "prediction" allows the theorem to convert an algorithm that is not real-time into one that is. See his paper for the details.
The Discovery: Act II
The story does not end there: it was really the beginning of a much longer story. Slisenko answered by creating a much shorter proof of his own, which was later published. The abstract reads: "A comparatively short proof is given of the recognizability of palindromes in real-time on multitape Turing machines. It is based on the same idea as the original proof by the author, and on Z. Galil's idea for simplifying the proof by using the Fischer-Paterson algorithm for finding all symmetric suffixes in linear time."
The Discovery: Act III
Perhaps more important than these papers is that they helped launch an area of theory that is called Stringology. It is roughly the study of strings, their properties, and algorithms. It is quite active today: there are countless papers, current workshops, and journals devoted to this important area.
In 1997 Zvi gave a talk about the state of stringology; the talk's title was: Open Problems in Stringology, Thirteen Years Later. His abstract is quite amusing, as are many of his talks. Somehow he worked "thirteen" into the abstract three times: the talk was on Friday, June thirteenth, 1997; it was about his 1984 talk in which he created the field and listed thirteen open problems; and that 1984 talk was exactly thirteen years earlier.
Stringology is doing well today. Modern string problems often allow approximations—since the exact questions are too hard—and many algorithms are probabilistic. Alberto Apostolico is also at Tech, and is another leader of this important area.
I cannot resist giving one elementary result about strings, yet one that I find amazing. I guess one insight from stringology is that the notion of strings seems so simple, so basic, yet strings have many wonderful properties and surprising structure. In my opinion that is the importance of much of stringology: the illumination of how to think about and manipulate strings.
The problem I have always liked is this: Suppose that $x$ and $y$ are two strings over some alphabet. When are there positive integers $n$ and $m$ so that
$$x^n = y^m?$$
Here $x^n$ is the concatenation of $n$ copies of $x$; for instance, $(ab)^3 = ababab$. The question is, given $x$ and $y$, determine whether or not there are $n$ and $m$ so that $x^n = y^m$. A moment's thought should allow you to see that there is a simple quadratic-time algorithm. Just let $n = |y|$ and $m = |x|$. Then check to see if $x^n$ is equal to $y^m$; note both strings have length $|x| \cdot |y|$. It is not hard to see that this yields a correct quadratic, worst-case, algorithm. The surprise, in my opinion, is that there is a simple linear time algorithm. The algorithm is just to check whether or not
$$xy = yx.$$
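Both checks are easy to code; here is a sketch of mine, assuming both strings are nonempty:

```python
# Two tests for whether x^n = y^m has a solution in positive integers,
# assuming x and y are nonempty.

def commensurable_quadratic(x, y):
    # If any powers agree, then x^|y| = y^|x|; both sides have length
    # |x|*|y|, so this is a quadratic-time (and space) test.
    return x * len(y) == y * len(x)

def commensurable_linear(x, y):
    # Linear-time test: some powers of x and y agree iff xy = yx.
    return x + y == y + x

print(commensurable_quadratic("abab", "ab"))  # True: (ab)^4 = (abab)^2
print(commensurable_linear("abab", "ab"))     # True
print(commensurable_linear("ab", "ba"))       # False: abba != baab
```

The two tests always agree on nonempty strings; that equivalence is exactly the content of the theorem below.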
Of course the hard part, like many fast algorithms, is proving that this is correct. It is because of the following theorem:
Theorem: Suppose that $x$ and $y$ are two nonempty strings. Then there are positive integers $n$ and $m$ so that $x^n = y^m$ if and only if $xy = yx$.
The proof of this theorem is not too hard, but it is typical of many string theorems. It requires a non-trivial argument, with careful case analysis. Added: here is a reference to a proof of this theorem.
Galil asked me if the original Slisenko paper was one of the longest papers in Theory to date. Recall Slisenko’s original paper was 173 journal pages. What do you think?