# Cleverer Automata Exist

*A breakthrough on the separating words problem*

Zachary Chase is a graduate student of Ben Green at Oxford. Chase has already solved a number of interesting problems; check his site for more details. His advisor is famous for his brilliant work—especially in additive combinatorics. One example is his joint work with Terence Tao proving this amazing statement:

**Theorem 1** *The prime numbers contain arbitrarily long arithmetic progressions.*

Today we wish to report Chase’s new paper on a problem we have twice discussed before.

But first Ken wants to say something about Oxford where he got his degree long before Green arrived.

## Oxford Making Waves

Green moved to Oxford in 2013. He holds a professorship associated to Magdalen College. I (Ken) did not know him when I started at Oxford in 1981. It would have been hard, as Green was only 4 years old at the time. But I did know the preteen Ruth Lawrence when she started there and even once played a departmental croquet match including her in which Bryan Birch made some epic long shots. Lawrence had joined St. Hugh’s College in 1983 at the age of twelve.

Oxford has been on Dick's and my minds more in the past six years than before. Both of us were guests of Joël Ouaknine in 2012–2015 when he was there. Oxford has developed a front-line group in quantum computation, which fits, as David Deutsch's role as an originator began there—note my story in the middle of this recent post.

Recently Oxford has been in the news for developing a promising Covid-19 vaccine. ChAdOx1 heads Wikipedia’s list of candidate vaccines and has gone to final trials, though there is still a long evaluation process before approval for general use.

Before that, a modeling study from Oxford in March raised the question of whether many more people have had Covid-19 without symptoms or any knowledge. This kind of possibility has since been likened to a “dark matter” hypothesis, not just now regarding Covid-19 but a decade ago and before.

A main supporting argument is that a wide class of mathematical models can be fitted with higher relative likelihood if the hypothesis is true. I have wanted to take time to evaluate this argument amid the wider backdrop of controversy over inference methods in physics, but online chess with unfortunately ramped-up frequency of cheating has filled up all disposable time and more.

## The Problem

Back to Chase’s new results on the following problem:

Given two distinct binary strings of length $n$, there is always a finite-state deterministic automaton (FSA) that accepts one and rejects the other. How few states can such a machine have?

This is called the *separating words problem* (SWP). Here we consider it for binary strings only.
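For intuition, here is a minimal sketch (ours, not from any of the papers) of the simplest kind of separating machine: a $q$-state cycle that counts 1s modulo $q$. It separates any two strings containing different numbers of 1s modulo $q$.

```python
# A q-state DFA that counts 1s modulo q, simulated in Python.
# It accepts exactly the strings whose count of 1s equals `target` mod q.

def count_ones_mod_q_dfa(q, target):
    def accepts(s):
        state = 0                      # states are 0, 1, ..., q-1
        for c in s:
            if c == "1":
                state = (state + 1) % q
        return state == target         # accept iff the counter hits target
    return accepts

dfa = count_ones_mod_q_dfa(2, 0)       # a 2-state machine
print(dfa("0101"), dfa("0111"))        # True False: 2 ones vs 3 ones
```

The hard cases of SWP are pairs that agree on all such simple counts; those call for cleverer hash-like machines.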

John Robson proved $O(n^{2/5})$ states are enough—we suppress any log factors. Some like to write this as $\tilde{O}(n^{2/5})$. Chase improves this to $\tilde{O}(n^{1/3})$:

**Theorem 2** *For any distinct $x, y \in \{0,1\}^{\leq n}$, there is a finite-state deterministic automaton with $O(n^{1/3}\log^{7} n)$ states that accepts $x$ but not $y$.*

We previously discussed this twice at GLL. We discussed the background and early results here. The original problem is due to Pavel Goralcik and Vaclav Koubek. They proved an upper bound that was $o(n)$. Then we went over Robson's bound here. Robson's result stood as the best upper bound until Chase came along.

## The Approach

All the approaches to SWP seem to have a common thread. They find some family $\mathcal{H}$ of "hash" functions so that:

- Any $h$ in $\mathcal{H}$ can be computed by an FSA with few states.
- For any distinct binary strings $x, y$ of length $n$, there is an $h \in \mathcal{H}$ so that $h(x) \neq h(y)$.

The challenge is to find clever families that can do both: be easy to compute and also be able to tell strings apart. Actually this is only a coarse outline—Chase's situation is a bit more complicated.
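As a toy instance of this outline (our illustration; Chase's actual family is subtler), take $h_{p,i}(x)$ to be the number of 1s of $x$ at positions congruent to $i$ modulo $p$. Each such $h$ is computable by an automaton with roughly $p$ states plus a small counter.

```python
# Toy hash family: h_{p,i}(x) counts the 1s of x at positions j with
# j mod p == i. One automaton per (p, i) suffices to compute it.

def h(p, i, x):
    return sum(1 for j, c in enumerate(x) if c == "1" and j % p == i)

def some_hash_separates(x, y, primes=(2, 3, 5, 7)):
    return any(h(p, i, x) != h(p, i, y) for p in primes for i in range(p))

x, y = "110100", "101100"          # 1-positions {0,1,3} vs {0,2,3}
print(h(2, 0, x), h(2, 0, y))      # 1 2 -- the hash h_{2,0} tells them apart
print(some_hash_separates(x, y))   # True
```

Of course small primes cannot separate all pairs; how large the moduli must grow is exactly what the results below quantify.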

## The Proof

We have taken the statement of Theorem 2 verbatim from the paper. It has a common peccadillo of beginning a sentence for a specific $n$ but writing $O(\cdot)$ later. However, this is how we think intuitively: in terms of how the pieces of the formula behave. Chase declares right away his intent to ignore the power of $\log n$. How he gets the power $n^{1/3}$ is the real point. We can convey the intuition in brief.

A length-$n$ binary string $x$ can be identified with its set $A \subseteq [n]$ of positions where the string has a $1$. Chase begins by showing how an exponent of $1/2$ on $n$ is obtainable by considering sets of the form

$$A(i,p) = \{a \in A : a \equiv i \pmod{p}\},$$

where $p$ is prime and $0 \leq i < p$. Suppose we know a bound $m$ such that for all distinct $A, B \subseteq [n]$ (that is, all distinct binary strings of length $n$) there is a prime $p \leq m$ and an $i$ such that

$$|A(i,p)| \neq |B(i,p)|.$$

Then by the Chinese Remainder Theorem, there is a prime $q$ of magnitude about $\log n$ such that

$$|A(i,p)| \not\equiv |B(i,p)| \pmod{q}.$$

Now we can make a finite automaton with $O(pq)$ states that always increments a position counter modulo $p$ and increments a second counter modulo $q$ each time it reads a $1$ at a position congruent to $i$ modulo $p$. Then the automaton has order-of $m\log n$ states. The finisher is that $m = O(n^{1/2}\log n)$ suffices. Again we ignore the peccadillo but we add some redundant words to the statement in the paper between dashes:

**Lemma 3** *For any distinct $A, B \subseteq [n]$—of size at most $n$—there is a prime $p = O(n^{1/2}\log n)$ such that for some $i$,*

$$|A(i,p)| \neq |B(i,p)|.$$
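The counting automaton sketched above can be simulated directly. This is our rendering of it: the state is the pair (position mod $p$, count mod $q$), so $pq$ states suffice.

```python
# Simulate the O(pq)-state automaton: track position mod p, and count
# (mod q) the 1s seen at positions congruent to i mod p.

def residue_count_dfa(p, i, q):
    def run(s):
        pos_mod_p, cnt_mod_q = 0, 0    # this pair is the DFA state
        for c in s:
            if c == "1" and pos_mod_p == i:
                cnt_mod_q = (cnt_mod_q + 1) % q
            pos_mod_p = (pos_mod_p + 1) % p
        return cnt_mod_q               # accept iff this equals a fixed value
    return run

x, y = "110100", "101100"              # 1-positions A={0,1,3}, B={0,2,3}
run = residue_count_dfa(2, 0, 3)       # |A(0,2)| = 1 but |B(0,2)| = 2
print(run(x), run(y))                  # 1 2
```

Declaring one final counter value accepting makes this a DFA that accepts $x$ but not $y$.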

The power $n^{1/2}$ is of course weaker than Robson's $n^{2/5}$, but this statement conceals two "levers" that enable leap-frogging to get $n^{1/3}$. The first is that we don't have to limit attention to sets that come from places where the corresponding strings have a $1$. Consider any string $w$ and take $A_w$ to be the set of index positions $j$ in which $x$ has the substring $w$ beginning at place $j$. Define $B_w$ likewise for $y$. Then we can try to prove results of the following form given $\ell$ and $N$:

**Proposition 4** *For all distinct $x, y \in \{0,1\}^n$ there is $w$ with $|w| \leq \ell$ such that $A_w \neq B_w$ and $|A_w|, |B_w| \leq N$.*
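The occurrence sets in this first lever are easy to make concrete (the helper and the example strings are ours, not from the paper):

```python
# A_w = set of positions where w occurs as a substring of x.

def occ(x, w):
    return {j for j in range(len(x) - len(w) + 1) if x.startswith(w, j)}

x, y = "00110110", "00110011"
w = "011"
print(sorted(occ(x, w)))   # [1, 4]
print(sorted(occ(y, w)))   # [1, 5]
```

A pattern-matching automaton for a fixed $w$ (in the style of Knuth–Morris–Pratt) needs only about $|w|$ states, which is what makes this lever affordable.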

A finite automaton using this extension needs $O(\ell)$ states to store $w$ in its finite control. The second lever is to try to prove results of this form, where now the words "of size at most $N$" are not redundant:

**Lemma 5 (?)** *For any distinct $A, B \subseteq [n]$ of size at most $N$, there is a prime $p = O(N^{1/2}\log n)$ such that for some $i$,*

$$|A(i,p)| \neq |B(i,p)|.$$

[Update: see note below.]

Now we need to balance the levers using the proposition and the lemma together. Since $w$ will add order-$\ell$ states to the automaton, we balance it against the $N^{1/2}$ from the previous argument. So take $\ell = n^{1/3}$ and $N = n/\ell = n^{2/3}$. Then Lemma 5 gives the bound

$$p = O(N^{1/2}\log n) = O(n^{1/3}\log n)$$

on the magnitude of the needed primes $p$. This yields the breakthrough on SWP.
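The arithmetic of the balance, in numbers (log factors ignored; the sketch is ours):

```python
# Balance the two levers: storing w costs about ell states, while the
# prime from the size-restricted lemma costs about sqrt(N) states.
# Taking ell = n^(1/3) and N = n / ell equalizes the two costs.

n = 10**6
ell = round(n ** (1 / 3))     # length budget for w
N = n // ell                  # size budget for the occurrence sets
print(ell, round(N ** 0.5))   # 100 100 -- both levers cost ~ n^(1/3)
```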

Here a famous New Yorker cartoon with the caption "If only it were so simple" comes to mind. But there is a catch. Chase is not quite able to prove Lemma 5. However, the $w$ lever comes with extra flexibility that enables finding $w$ that make $A_w \neq B_w$ and also give those sets an extra regularity property $\mathcal{P}$. Using $\mathcal{P}$, he is able to show the existence of good hash functions of a certain type. The modified lemma is enough to prove his new bound. The proof still uses intricate analysis including integrals.

This is classic high-power mathematics. When some idea is blocked, try to weaken the requirements. Sometimes it is possible to still proceed. It is a lesson that we sometimes forget, but a valuable one nevertheless.

## Open Problems

We like the SWP and think Chase's contribution is impressive. Note that it adds a third element $w$ to the $p$ and $q$ in the automaton. Can the argument be pushed further by finding more levers to add more elements? Is Lemma 5 true as stated, and with what (other) tradeoffs of $\ell$ and $N$ between it and Proposition 4? [**Update**: not for extreme tradeoffs—see this comment—but plausibly when $n$ is polynomially related to $N$.]

We feel there could also be interesting applications for his theorem as it stands. Is the ability to tell two strings apart with a simple device—a FSA with not many states—useful? Could it solve some open problem? It does seem like a basic insight, yet we have no candidate application. Perhaps you have an idea.

[Updates: added Q on Lemma 5 to "Open Problems"; "lower" bound → "upper" bound in third section; update in Open Problems.]

## Comments

Is there an obvious lower bound, or a conjecture about the optimal?

Both are log n. Here is a good survey: https://arxiv.org/abs/1103.4513

This is awesome! I’m still going through, but thanks so much for the nice explanation.

Regarding Lemma 3:

(1) You don’t need to focus on primes for Lemma 3. Actually, I believe that primes add a log factor.

You should be able to show that for distinct A, B subset [n], there exists d = O(sqrt(n)) and i in [d] such that |A(i,d)| != |B(i,d)|. In other words, d doesn’t need to be a prime number which turns your O~ to O.

(2) As far as I can tell, this approach was presented in “Separating Words by Occurrences of Subwords” by Vyalyi and Gimadeev. I mentioned it in the previous comments using a less succinct notation: https://rjlipton.wordpress.com/2019/09/16/separating-words-decoding-a-paper/#comment-105362

Note: My comments (1) and (2) were just in regards to Lemma 3.
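Point (1) is easy to probe empirically for tiny $n$ (the brute force below is ours and uses positions $0,\dots,n-1$): over all pairs of distinct subsets, find the largest minimal modulus $d$ that separates them.

```python
from itertools import chain, combinations

# For distinct subsets A, B of {0,...,n-1}, find the least d (prime
# or not) with |A(i,d)| != |B(i,d)| for some residue i.

def min_sep_d(A, B, n):
    for d in range(1, n + 1):
        if any(sum(a % d == i for a in A) != sum(b % d == i for b in B)
               for i in range(d)):
            return d
    return n  # unreachable: d = n always separates distinct subsets

n = 7
subsets = list(chain.from_iterable(combinations(range(n), k)
                                   for k in range(n + 1)))
worst = max(min_sep_d(set(A), set(B), n)
            for A, B in combinations(subsets, 2))
print("worst minimal d for n =", n, "is", worst)
```

This only checks tiny cases, of course; it says nothing about the asymptotic $O(\sqrt{n})$ claim.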

What a nice summary! Thanks for bringing attention to this; I really hope someone can improve upon n^(1/3). I wanted to quickly mention that Lemma 5 is not true as stated. For example, for any m, if you take n large enough, you can have A={1} and B={(\prod_{p ≤ m} p)+1}. However, I do think Lemma 5 is true if n is a polynomial of N, with the power of the implicit logarithm for the upper bound on p depending on the degree of the polynomial.

You're very welcome, and thanks for the note about the queried lemma—I've added an update. This has been a personal favourite problem of ours for a while—besides our both knowing Jeff Shallit for a long time, I all the way back to Princeton undergrad days—and note the comments above by my former student Mike Wehar.

I came across a technical report from 2016 which claims an O(log n) bound on the number of states required to separate two words (of equal length too) http://invenio.nusl.cz/record/251413/files/content.csg.pdf

Am I missing something?

Dear Anoop SKM:

I believe that the proof is not correct. The machines would work if they scanned the two inputs at the same time. But I will check it out again…

Thanks for the comment.

Best

Dick