On The Intersection of Finite Automata
Can we beat the classic emptiness test for the intersection of finite automata?
George Karakostas and Anastasios Viglas are two excellent theorists, who both have their PhD’s from the Princeton computer science department. I was the PhD advisor of Viglas, and later had the pleasure of working with both of them on several projects.
Today I want to talk about finite state automata, and an open question concerning them. The question is very annoying, since we have no idea how to improve a classic algorithm. This algorithm is decades old, but it is still the best known. There seems to be no hope to show that algorithm result is optimal, yet there also seems to be no idea how to beat it. Oh well, we have seen this before.
It was a great pleasure to work with George and Anastasios. I enjoyed our work together very much, and they have continued working together over the years—on a wide variety of topics.
I really do not have any really great stories about them, or even one of them. One story that does comes to mind is about steaks. George likes his steak cooked well done, as I do too. I know that is probably not the proper way to order a steak, but that is the way I like them. But, when George says “well-done” he means “well-done.” I always loved the way George would explain this to our server: They would ask George, “how would you like your steak cooked?” And George would say, “well done—actually burn it.” This usually caused some discussion with the server, and George had to be insistent that he really wanted it “burnt.” I like someone who is definite.
Let’s turn from steaks to finite automata, and to a question about them that is wide open.
Intersection of Finite State Automata
The power of deterministic finite state automata (FSA) is one of the great paradoxes, in my opinion, of computer science. FSA are just about the simplest possible computational device that you can imagine. They have finite storage, they read their input once, and they are deterministic—what could be simpler? FSA are closed under just about any operation; almost any questions concerning their behavior is easy to decide. They seem “trivial.”
On the other hand, FSA are more powerful than you might guess—this is the paradox. While FSA cannot even count, they can do magic. I have already pointed out that FSA play a major role in the modern analysis of hardware—BDD’s, they can be used to prove the decidability of complex logical theories—Presburger’s Arithmetic, and they are used everyday by millions to do searches. Every time you search for a pattern in a text string, especially when you use “don’t cares”, the search is done via FSA technology. Simple, yet powerful.
I want to raise a question about FSA that seems to be interesting, yet is unyielding to our attempts to understand it. In order to understand the question we first need some simple definitions. If is a FSA, let be the language that it accepts, and let be the number of states of .
One of the classic results in the theory of FSA is this:
Theorem: If and are FSA’s, then there is an FSA with at most states that accepts .
The proof of this uses the classic cartesian product method: one creates a machine that has pairs of states, and simulates both and at the same time. This is why the simulation uses states. A corollary of this result is the following:
Corollary: If and are FSA’s, then there is an algorithm that runs in time and decides whether or not is empty.
George, Anastasios, and I noticed that there is no reason to believe that this cannot be improved. I guess that is a double negative, what I mean is: let’s find an algorithm that solves the emptiness problem for in time linear in the description the two FSA’s. Note, there are FSA’s and so that the smallest number of states of a machine so that
is approximately the product of the number of states of and . Thus, a faster algorithm must avoid the step of building the product automata. But, this is precisely the issue I have raised before: the algorithm for determining emptiness is free to doing anything it wants.
We worked hard on trying to beat the standard approach and could not improve on it. So eventually we wrote a paper that in the true spirit of theory, assumed that there was a faster than product algorithm for the intersection of FSA. Then, we looked at what the consequences would be of such an algorithm. Our main discovery was that if there existed faster methods for the intersection of automata, then cool things would happen to many long standing open problems.
You could of course take this two ways: you could use these results to say it is unlikely that any algorithm can beat the product algorithm, or you could use these results to say how wonderful a better algorithm would be. It’s your choice. I like to be positive, so I tend to hope that there is a faster algorithm.
Suppose that are FSA each with at most states. We asked what would the consequences be there was an algorithm that ran in time and determined whether or not there was an input they all accepted. That is determined whether or not the following language is empty or not:
Call this the product emptiness problem (PEP). Note, if is allowed to be arbitrary, then the answer is known. Dexter Kozen proved long ago that in this case the problem is complete. Thus, we have to restrict ourselves to either fixed , or allow algorithms that run in time
Here are a couple of results from our paper that should help give you a flavor of the power of assuming that PEP could be solved more efficiently, than the product algorithm. As usual please check out the paper for the details. All the results assume which means that for any fixed there is an algorithm for solving PEP with FSA’a where each has at most states in time , for any .
Theorem: If , then there is an algorithm for the knapsack problem that runs in time , for any .
Theorem: If , then there is an algorithm for the factoring problem that runs in time , for any .
I have discussed the following earlier.
Theorem: If , then
for any .
A Sample Proof
None of the proofs of these theorems are too difficult, but the most complex is the last result. This result uses the block respecting method. Here is a sample proof that should demonstrate how is used. I will outline the proof of the factoring theorem: actually proving a type bound.
Let be the FSA that given the input checks whether or not
and neither nor is . We assume that and are bit numbers and is a fixed bit number that we wish to factor. Then, select three distinct primes each about size . Look at the PEP,
There are two cases. If there are no solutions, then must be a prime. If there are solutions, then suppose that is one. Then,
But, so that this is an equality, and
The above is a primality test. We make it into a factoring algorithm by adding a simple test: we insist that the first automata also test that lie in some range. This requires only a polynomial factor increase in the number of states of . But, we can then do a binary search to find a factor of .
Can one solve the PEP in time faster than the product algorithm? Even for two automata this seems like a neat problem. I find it hard to believe that the best one can do is to build the product machine, and then solve the emptiness problem.
I do have one idea on a possible method to improve the PEP in the case of two automata. Suppose that and are two FSA. Then, construct the non-deterministic pushdown automata that operates as follows on the input : The machine reads the input and pushes the input onto the pushdown store; at the same time then machines checks whether or not the input is accepted by . Then, the machine pops off the input from the pushdown and checks whether or not the value coming off the pushdown, (the reversal of ), is accepted by the machine .
This can be done by a non-deterministic pushdown automata that uses at most states. The emptiness problem for pushdown machines is in polynomial time. If it could be done is less than quadratic time, then at least for two automata we would be able to beat the product algorithm. An open problem is can this be made to work?