One-way probabilistic finite state automata are universal

Anne Condon is a theoretician who early in her career did fundamental work in complexity theory. She then moved into the area of DNA computing, and next to the area of algorithms for biology. Her work in all these areas is terrific, and she is a wonderful person to work with on any project. I had the privilege of working with her on a number of papers; I will talk about part of one of these today.

The reason I plan to present our result is two fold. First, it shows that randomness can convert a weak computational model, into one that is universal, i.e. can compute anything. If I did not know this result, I might have guessed that randomness could increase the power of a computational model by an exponential amount, but not by an “unbounded amount”. The second reason is that our result uses some interesting “tricks” that might be useful in your research. We use an old but important result of Minsky–yes that Minsky, a clever theorem of Freivalds, and a prover-verifier argument.

The Problem

I still remember visiting Madison, one winter, when she was a still a member of their Computer Science Department. It was beyond cold. Inside a warm office we chatted about her work and then she asked me a simple question about Markov Chains. Suppose that ${A}$ and ${B}$ are two ${n \times n}$ stochastic matrices. Also suppose that ${v}$ is a fixed ${n}$ dimension probability column vector: $v_i \ge 0$ for each $i$ and $v_1 + \dots v_n = 1.$

How hard is it determine whether or not there is a sequence of matrices ${M_{1},M_{2},\dots,M_{t}}$ where each ${M_{i} \in \{A,B\}}$ so that the first coordinate of the column vector

$\displaystyle M_{1}M_{1}\cdots M_{t}v$

is above ${2/3}$? Or that the first coordinate is always below ${1/3}$? Note, there is no prior bound on $t$: this is why the problem is not trivial.

I was immediately intrigued and began to work on the problem when I was back at Princeton. At first I guessed that the problem must be decidable, since I had previously worked on problems that were close to this and they were decidable. But I quickly began to hit the wall, nothing seemed to work. Then, I read a beautiful paper by Freivalds on probabilistic finite state automata. I quickly saw that it might be possible to improve his result to solve Anne’s problem.

Probabilistic Finite State Automata

A one-way probabilistic finite automata (1pfsa) is a finite state automata that has the extra ability to flip a fair coin. The automata reads the input tape and then based on its current state and the coin flip enters a state and moves the input tape head to the right. The machine has accept and reject states. The probability that an input tape is accepted (rejected) is defined in the obvious manner. An aside: often the definition of such automata allow arbitrary coins, but we will always assume that are coins are fair.

Probabilistic automata have been studied since the 1960′s, with some of earliest results due to Michael Rabin and to Azaria Paz. For example, Rabin showed that they could accept non-regular languages, among many other results. See Anne’s survey for more details. A two-way probabilistic finite automata (2pfsa) is a probabilistic finite state automata that has a two-way read-only input tape. The ends of the input tape are marked with special symbols, so the automata can tell when the input begins and ends. Since the machine is two-way it can scan the input multiple times, and can even have a loop and never stop.

The connection with Anne’s problem should be clear. If the alphabet is binary, then the matrices ${A}$ and ${B}$ encode the operation of the automata in the obvious manner. Suppose currently there is a probability vector $u$ that summaries the states of the automata. Then, if the input is a $0$, the new state vector is $Au$; if the input is a $1$, then the new state vector is $Bu$. Note, these will always be probability vectors since the matrices are stochastic.

Thus, Anne’s question was really: Given a 1pfsa could we decide whether or not there was some input tape so the machine accepted with probability ${2/3}$ or did all inputs get rejected with probability ${1/3}$? This is a kind of promise problem: the automata was assumed to operate so that for any input either the accept probability was bigger than ${2/3}$ or less than ${1/3}$. This is, of course, a natural assumption, think of it like the assumption that defines $\mathsf{BPP}.$ All our probabilistic automata, one-way or two-way, will have error bounded away from ${1/2}$. The problem, in this form, we call the emptiness problem for 1pfsa (2psa).

Rusins Freivalds (1981) had studied 2pfsa and proved the following surprising theorem:

Theorem 1 The language ${\{ a^{n}b^{n} \mid n \ge 0 \}}$ is accepted by a 2pfsa with bounded error.

His beautiful proof is central to our extension so we will sketch it in some detail. Suppose that the input tape contains ${a^{n}b^{m}}$. We can assume this since this property is easily checked by a finite state automata without any coins. The automata will then use a fixed size counter and check that ${n \equiv m \bmod 1000}$. This also can be checked by a finite state automata. If the input tape fails either of these test, then it rejects.

So we can assume that the input is of the form ${a^{n}b^{m}}$ with ${n \equiv m \bmod 1000}$. Freivald’s clever idea is now to make repeated passes over the input tape. During each pass he will flip a fair coin for each ${a}$. If all of these come up heads, let’s call it a win for a. During the same pass he with also flip a fair coin for each ${b}$. If all of these come up heads, let’s call it a win for b. There are four possibilities: if neither ${a}$ nor ${b}$ get a win, then do nothing. If both get a win also do nothing. Suppose that ${a}$ gets a win and ${b}$ does not. Then, increment a special counter ${C_{a}}$: it starts out at ${0}$. In a similar manner, if ${b}$ gets a win and ${a}$ does not, then increment another special counter ${C_{b}}$; it also starts at ${0}$.

These special counters keep track of how many times there is an $a$ win or a $b$ win. Of course, the probability of either winning is exponentially small in $n$ and $m$, but that is okay. With probability $1$, eventually one of the counters gets to ${100}$ and the automata stops.

The automata now must decide whether or not to accept or reject the input tape. Suppose the ${C_{a}}$ counter hits $100$ first. Then, the automata checks the value of the other counter ${C_{b}}$. If ${C_{b}>0}$ it declares that ${n=m}$ and accepts; otherwise, it declares that ${n \neq m}$ and rejects. The same is done if the other counter hits $100$ first: just interchange the roles of the two counters.

The question is why does this work? Suppose first that ${n=m}$. Then, clearly the chance of an ${a}$ win is exactly the same as an ${b}$ win. So the chance that ${C_{a} = 100}$ and ${C_{b}}$ is still ${0}$ is small. Thus, in the case that $n=m$ the automata will accept with high probability.

Next assume that ${m > n}$. The same argument will work if ${n > m}$. The key point is that since ${m > n}$ and ${ n \equiv m \bmod 1000}$, ${ m \ge n + 1000}$. So the probability of an ${b}$ win is ${2^{-1000}}$ less likely than a ${a}$ win independent of ${n}$. This shows that when $C_a$ hits $100$ the other counter is very likely to still be $0$. Thus, with high probability the automata will reject, as it should. Pretty cool idea, I think.

Once Freivald proved the above theorem it was not too difficult to prove his main result:

Theorem 2 The emptiness problem for 2pfsa is undecidable.

The details of going from counting, i.e. accepting the language ${\{ a^{n}b^{n} \mid n \ge 0 \}}$, to simulation of a Turing Machine is a bit technical, but not very hard.

Sketch of Our Proof

Condon and I proved this:

Theorem 3 The emptiness problem for 1pfsa is undecidable.

The big difference is now that the automata is one-way. This seems at the face of it a major problem. Freivald’s construction depends strongly on repeatedly going over and over the same input. How can we simulate his method if we get only one pass over the input?

The answer is that we view the problem as a prover-verfier problem. Suppose that a prover has a series of strings of the form:

$\displaystyle a^{n_{k}}b^{m_{k}}$

for ${k=1,\dots}$. The prover is consistent in the following sense: either all ${n_{k}=m_{k}}$, or none of the ${n_{k} = m_{k}}$. Our job is to use only a 1pfsa to tell which situation we are in: the equal case or the unequal case. Note, the prover can still be malicious:

1. The values of the ${n_{k}}$ and ${m_{k}}$ can vary as long as he is consistent; either they are all equal pairs, or all unequal pairs;
2. In the unequal case the prover can make ${n_{k} > m_{k}}$ or ${n_{k} < m_{k}}$: there is no requirement to be consistent from one index to another.

This last ability destroys Freivald’s method. He needed to have a race between ${a}$ and ${b}$. Now that is thrown out, since the prover can vary which is larger.

The new trick is to play a more complex game with the prover. We think of the prover as presenting us with ${a^{n}b^{m}}$ each time. As before we can assume that the string is always of this form and that ${n \equiv m \bmod 1000}$. We know that either ${n=m}$ always or that ${n \neq m}$ always. We cannot trust the order, i.e. sometimes ${n}$ could be bigger and sometimes ${m}$ in the unequal case.

The idea is to find a “game” that is like Freivalds but is symmetric: so no matter what order the prover uses in the unequal case we will discover it. The basic insight is to flip fair coins again but do four games:

1. this one wins if ${n+m}$ coins all come up heads;
2. this one wins if ${n+m}$ coins all come up heads;
3. this one wins if if ${2n}$ coins all could up heads;
4. his one wins if if ${2m}$ coins all could up heads.

In the equal case all have the same probability. In the unequal case, the probability of “(1) or (2)” compared to “(3) or (4)” is vastly different. This allows us to proceed essentially as in Freivald’s proof; note, it is critical that these games do not depend on the order. See Condon for details.

Open Problems

There are two main open types of problems that arise from this work. Consider the original Markov Chain model of two stochastic matrices $A,B$. Are there any natural restrictions on $A$ and $B$ that will make it possible to decide the emptiness problem? What if $A$ and $B$ commute? or if they have some weaker matrix property? This is close to another result of mine that I will post on latter on.

A more practical set of problems is can we use the ideas here for other problems? The ability to count via the probability methods that we used here all require exponential time. Yet they seem like they might be useful in other applications. A trivial example is the following–that I will end the post with–it is possible to build a counter that counts to approximately $N$ and only uses $\ln \ln N$ bits. The idea is to increment the counter only when you get $\ln N$ heads in a row of a fair coin. I leave it to you to work out the details.

1. March 20, 2009 1:39 pm

For theorem 3, why does it make sense to talk about $a^{n_{k}}b^{m_{k}}$ for ${k=1,\dots}$ rather than just for $k=1$, since we can only read the string once? If $k$ is a constant I see that this could be built into the machine.

March 20, 2009 1:52 pm

I am sorry if I was not clear. Imagine an input tape of the form:
$\displaystyle a^{n_1}b^{m_1}\#a^{n_2}b^{m_2}\# \dots$ Essentially its a series of $a^nb^m$‘s. Freivald could look over the same tape. We get to see repeated strings of the same type as Freivald did. The difference now is that they are all equal or all unequal. Our problem is more difficult because there is a weaker consistency: before when you scanned the same tape you got exactly the same values for $n$ and $m$. Now you are only sure that either always $n_k = m_k$ or that $n_k \neq m_k.$ This is harder than the previous situation.

Does that help?

….rjlipton

2. March 20, 2009 2:15 pm

Thanks! I think my question wasn’t precise in identifying my confusion: in the previous section you talked only about the language {a^n b^n | n}, (which you can read many times), then in talking about Thm 3 you change to the language { … a^{nk} b^{nk} … | n}. Upon further reflection I would infer the following: either of these is equally suitable to is show that emptiness is Turing-complete (details left out of this post). Is that correct?

Or another way of stating it: {a^n b^n | n} may not be recognizable by 1pfsa, but that is not needed for the proof.

March 20, 2009 2:20 pm

Exactly right. A one-way pfsa is weaker but still can do enough to be universal.

4. April 2, 2009 8:12 am

Have you ever thought about the Skolem-Mahler-Lech problem: given an integer matrix A and an integer vector v, determine whether the first coordinate of A^n v is zero for any n? The SLM theorem is that the set of n for which this holds is eventually periodic. In other words, there is some N, depending on (A,v), such that this set is periodic for n>N. But it is not known whether N is computable from (A,v)!

5. April 3, 2009 6:56 pm

An elementary question: what is the meaning of v, the probability vector? I took it to mean the probability vector over the states of the PFSA. However, in the case of Freivald’s proof, this v should be (1,0,….), because you always start in the designated start state, correct?

April 4, 2009 9:03 am

Yes that is right. The start vector is as you describe.

6. April 14, 2009 4:44 pm

If the states of the machine correspond to probability vectors, then aren’t there infinitely many states, rather than the finitely many that you need for the machine? Also, I didn’t quite see how the “coin flips” come in to give a probability of accepting or rejecting.

I did find the arguments about acceptance of the language of equal string lengths quite interesting though!

April 14, 2009 6:04 pm

There are actually an infinite number of states, even thought the machine is finite state: each different vector is a different “state”. However, the algorithm only needs the vectors to be finite dimensional to encode the undecidable problem. I not sure I understand the question about coin flips, perhaps you can explain that to me better? I hope this helps some.