A breakthrough result shows the power of “almost”

 Cropped from Quanta Magazine source

Terry Tao has done it again. In two beautiful papers with modest titles, he has evidently proved the famous Discrepancy Conjecture (DC) of Paul Erdős. This emerged from discussion of his two earlier posts this month on his blog. They and his 9/18 announcement post re-create much of the content of the papers.

Today we wish to present just the statement of his new result in a vivid manner and some meta-observations on how he arrived at it.

Erdős originally offered a $500 prize for a solution. If we date that offer to an article he wrote in 1957, then it would be about$4,250 in today’s money, still a lot from one person. In 2013 we mused on whether a solution could be applied to win higher prizes. Last year there was an exhaustive proof of a partial case, but it was by computer brute force.

What worked now were the twin insights that DC follows from another conjecture and that proving an “almost” case of that conjecture could be enough to prove DC. We wonder whether finding new “almost” cases of our big conjectures in complexity theory will help win big prizes—or at least “almost” prizes.

A frog named Gorf sits on the shore of a large pond. Before him in the water are a stretch of lily pads. On or above each pad is a beetle or a fly. Gorf is hungry.

Gorf doesn’t worry about reaching the lily pads. He is a powerful jumper—he can jump any number ${d}$ of pads at once. Gorf’s problem is that once he starts jumping he can’t stop: he will keep jumping to every ${d}$-th pad.

Gorf needs a balanced diet. If he keeps eating more bugs than flies he will get a tummyache; if he eats too many flies, well those wings—you know what happens if you eat too much fiber. He needs there to be a constant ${C > 0}$ so that as he eats and eats, the absolute difference between the numbers of beetles and flies consumed never exceeds ${C}$.

Finally, Gorf is a worrywort. He is afraid the wrong choice of jumping distance ${d}$ could cause indigestion. So unless the sequence is such that every initial jump will lead to a ${C}$-balanced diet, he will stay on the shore and starve. Stated as a problem, what Erdős asked was this:

Does there exist an infinite sequence of beetles and flies so that Gorf won’t starve?

Erdős conjectured that there isn’t. Well OK, Erdős didn’t talk about frogs and bugs and flies. But he could have. Anyway he was right. Here is how he thought of it.

## By the Numbers

Each lily pad contains a number. Each fly is ${+1}$ and each beetle is ${-1}$. Or it could be the opposite; the problem is symmetric. Let ${a_n}$ be the number on lily pad ${n}$. Given the sequence ${[a_n]}$, what we care about is whether there is a ${C > 0}$ such that for all ${d}$ and ${k}$,

$\displaystyle \left|\sum_{i=1}^k a_{di}\right| \leq C.$

So as the frog jumps to every ${d}$-th pad and eats he is adding ${a_{di}}$ to his running total ${b_i}$. The question is: Can he jump from lily pad to lily pad and keep the number ${b_i}$ bounded—that is, keep it always between ${-C}$ and ${+C}$, for some ${C}$ and all ${i}$? If he can, then say the sequence is good.

Which sequences are good? Of course the answer depends on the sequence. If the pads are labeled

$\displaystyle 1,1,1,1,\dots,$

then clearly he will always get a larger and larger number. Note, this will happen no matter what jump ${d}$ he does.

Next, consider the sequence of lily pads

$\displaystyle 1,-1,1,-1,\dots$

In this case Gorf can just jump one pad at a time (${d = 1}$) and keep his count at ${0}$ or ${1}$. However, if he jumps 2, he only eats beetles. Since Gorf is worried that he would overhop the first lily pad and never be able to stop jumping two-by-two, he won’t budge. So this is still not good for all ${d}$, which is what Gorf needs.

## Almost Good?

Gorf reads a book on the Probabilistic Method. He wonders if he can trust that Nature is random—that the sequence of bugs and flies he encounters will conform to a normal distribution as ${k}$ grows, no matter what ${d}$ he chooses. So he should just take a random initial leap of faith and eat and be merry. But he realizes that the discrepancy—the difference between ${b_k}$ and zero—will expect to vary as ${\sqrt{k}/2}$, not constant. This is the wrong way around for the method to imply the existence of a good sequence. Just the thought of this gives Gorf indigestion.

Hope comes from the fact that we can construct sequences that give vastly lower discrepancy than random ones. Given ${n}$, divide out all factors of 3. The resulting number will either be congruent to 1 mod 3, in which case ${a_n = +1}$, or congruent to 2, whereupon ${a_n = -1}$. Then with ${d=1}$ the absolute partial sums stay within ${1 + \log_3 k}$. But this is not bounded by a constant, so not good enough for Gorf.

This last sequence is also multiplicative: ${a_{id} = a_i a_d}$. Until Tao’s result, no one had even ruled out the existence of a good sequence that is multiplicative. Multiplicative sequences can also be generalized by allowing values on—or at least within—the complex unit circle. For instance, let ${\omega}$ be on the circle and let ${h}$ be a function giving ${h(id) = h(i) + h(d)}$, such as ${h(n) =}$ the number of prime factors of ${n}$ including multiplicity (and ${h(1) = 0}$). Then the sequence ${\lambda_n = \omega^{h(n)}}$ is multiplicative and stays on the circle. Notions related to discrepancy can be extended to these sequences, perhaps summing just the real parts.

## Almost False

There is an easy way to make a sequence for Gorf that any diet doctor would approve: Leave some of the lily pads empty. That is, allow ${0}$ as a sequence value.

In fact, just repeat “beetle-fly-empty” over and over again: ${a_n = n \bmod 3}$ (regarding ${2}$ as ${-1}$). If the initial jump is ${1}$ or ${4}$ or ${7}$ etc., Gorf eats beetle-fly-beetle-fly…, as balanced as can be. If it is ${2}$ or ${5}$ or ${8}$, Gorf is equally happy: it’s fly-beetle-fly-beetle…

The problem for Gorf is that if ${d}$ is a multiple of 3 then he never eats. The problem for us is that the balancing is trivial. If you try filling in the zeros, you just have DC back again for cases where ${d}$ is a multiple of 3; if you do this balancing recursively, you get the ${[a_n]}$ sequence in the last section which doesn’t quite work.

Still, this showed that if the DC is true, it is true because Gorf is being “force-fed” at every pad. Exactly why that detail should matter may still be unclear.

DC is also true in a uniform sense: for every ${C > 0}$ there exists ${N}$ such that for every ${\pm 1}$ sequence of length ${N}$, some jump ${d}$ makes Gorf’s absolute partial sums ${|b_i|}$ exceed ${C}$ before reaching lily pad ${N}$. This follows from a famous lemma of Dénes Kőnig: if every branch of a subtree of the infinite binary tree is finite, then the whole subtree is finite. This does not, however, prevent the existence of an infinite sequence ${a_n}$ such that for every ${d}$ there exists ${C > 0}$ making ${|b_i| \leq C}$ for all ${i}$. Such a sequence is discussed in Remark 1.13 of Tao’s paper and is defined recursively by ${a_1 = 1}$ and for ${m = 1,2,\dots}$, ${j = 1,2,\dots,m}$, and ${k = 1,2,\dots,m!}$, by ${a_{jm! + k} = (-1)^j a_k.}$

Another delicate point is that if Gorf were allowed to start jumping at any initial pad ${a}$—say with ${a}$ relatively prime to ${d}$—then it was known that no sequence can meet the requirement of being good for all ${d}$ and all ${a}$. That is, any sequence has arithmetical progressions of unbounded discrepancy. Starting at zero—on the shore—is important to the problem.

Tao listed basically the same two mod-3 examples in his first post just before reporting the first bombshell of his breakthrough, which is that DC is implied by another conjecture.

 Quanta Magazine src1, Wired src2

## Almost True

The other conjecture only needs the idea of a Dirichlet character to state. This is a completely multiplicative function ${\chi: \mathbb{N^+} \rightarrow \mathbb{C}}$ that has some integer period ${k}$, for which ${\chi(n) \neq 0 \iff n}$ is relatively prime to ${k}$. The jumping-off point was a conjecture of Sarvadaman Chowla that for the sequence ${\lambda_n}$ above and all ${d \in \mathbb{N}^+}$, the magnitude of

$\displaystyle \sum_{n=1}^N \lambda_n \lambda_{n+d}$

is asymptotically ${o(N)}$. Peter Elliott generalized it to include Dirichlet characters but used an asymptotic condition that was initially too strong. Building on work earlier this year with Kaisa Matomaki and Maksim Radziwill, Tao first repaired Elliott’s conjecture and then made it concrete. We’ll call it TEC for Tao’s Elliott conjecture; we have changed his variable names to try to connect it more to the uniform version of DC above:

Conjecture (TEC). For all ${\epsilon > 0}$ there is ${k_0}$ such that for all ${k \geq k_0}$ there is ${N_k}$ such that for all ${N \geq N_k}$, and all multiplicative sequences ${[a_n]}$ into the unit complex disk: if all Dirichlet characters ${\chi}$ of period at most ${k}$ give that the real part of

$\displaystyle \sum_{p = 1}^N \frac{1}{p} \left(1 - a_p\overline{\chi(p)}p^{-it}\right)$

has absolute value ${\geq k}$ for all ${t}$ with ${|t| \leq kN}$, then for all ${d \leq \frac{1}{\epsilon}\;}$,

$\displaystyle \left|\sum_{n=1}^N a_n \overline{a_{n+d}}\right| \leq \epsilon N.$

There are differences in the connection, most notably that the conclusion bounds a sum over all ${d}$ rather than show it to be unbounded. However, what shakes out in the analysis is that keeping bounded this kind of sum of products over all possible ${d}$-jumps between lily pads (which can have complex life forms not just beetles and flies) enables you to avoid losing a handle on the imbalances from the jumping that Gorf actually does. The two kinds of jumping are set up to be related by a Fourier transform, which characteristically relates constancy in one signal to waviness and periodicity in the other. The connection with products of the form ${a_n \overline{a_{n+d}}}$ is analyzed by an old trick which Tao once discussed here. There is finally an important use of Andrew Granville’s notion of pretending, which we once briefly covered in connection with EDC here, and which became a focal point in much public PolyMath work credited liberally by Tao.

Well we think this is what happens when one looks at the details—we always say read the papers for the details, though this time we haven’t had time to absorb them ourselves. But we will say one final thing: Tao does not prove TEC. No. What he does instead is come up with a way to say it holds “on average” with respect to a separate quantity ${x}$ such that the sum limits ${n}$ to a range ${[\frac{x}{\omega(x)}, x]}$ where ${\omega(x)}$ is unbounded, ${n}$ is put in the denominator of the summand, and the right-hand side’s ${\epsilon N}$ is replaced by ${o(\log \omega(x))}$ as ${x \rightarrow \infty}$. It takes a lifetime of experience with the relevant tools of analysis to recognize when to apply this kind of transformation. But its success might stimulate us to think of more creative ways to use averaging arguments and average-case analysis in our field.

## Open Problems

There is still much to do and digest after this breakthrough. Is any nice constructive bound on ${N}$ in terms of ${C}$ implied by the analysis? Recall that ${C = 2}$ gave ${N = 1,161}$ last year. Gil Kalai and Tao have exchanged some further ideas beginning here.

One thing for sure, the problem of Erdős discrepancy is still a problem. Yogi Berra who passed away yesterday said “it ain’t over ’til it’s over,” and it isn’t over.

[fixed EDC statement with ${C}$ before ${d}$, added example to uniformity statement]

1. September 24, 2015 12:20 pm

Hat-tip to Jeff Shallit for telling us via Gowers’s weblog on Tuesday. Left on the cutting-room floor were further mentions of the Liouville sequence ${\lambda_n}$ including observations about its discrepancy in a 2008 paper by Peter Borwein, Ron Ferguson, and Michael Mossinghof; and speculations on relating averaged discrepancy to vanishing amplitudes of quantum algorithms on certain large enough inputs.

2. September 24, 2015 2:38 pm

While this story is an amazing showing by Terry Tao, it is nice to see how the “useful kibitzing” nature of blog discussions (so typical to polymath projects) plays a role also in the last few decisive days before the conjecture became a theorem.

Tao wrote a blog post in September 6 about sign patterns of Mobius functions for three consecutive integers based on a paper by Matomäki, Radziwiłł, and Tao. On the late morning of September 9, Uwe Stroinski wrote: “The Sudoku-flavor arguments remind me on the EDP Polymath project, where some of us tried to prove (without computer) that completely multiplicative sequences with values in {\pm 1} have discrepancy greater than 3. Can these recent results of Matomaki and Radziwill be used/adapted/generalized to help with this problem or is there some obstacle to make that hopeless ?” Terry replied: that “There is indeed some similarity on the surface,” but was at first rather skeptical on an actual connection.

A second commentator “kodlu” wrote “I’d like to second Uwe Stroinsky’s query, re: EDP for the multiplicative case.” Terry had a third look at it and wrote “Hmm, actually there does appear to be a connection between the EDP and something we looked at in our previous paper, namely Elliott’s conjecture (a generalization of Chowla’s conjecture to arbitrary bounded multiplicative functions). There is in fact a chance that a suitable version of this conjecture implies that all +1,-1 sequences have unbounded discrepancy. I’ll work this out in detail a subsequent blog post.”

This post appeared in September 11 and included the derivation of DC from (the corrected) Eliott’s conjecture. A few days later the conjecture was proved.

Apparently, possible connections with sign patterns of Mobius functions and DC were briefly considered in the 2010 discussion. See this comment by Tao on Gowers’s blog. https://gowers.wordpress.com/2015/09/20/edp28-problem-solved-by-terence-tao/#comment-150152

• September 24, 2015 3:37 pm

Gil, thanks for this account! Incidentally I (Ken) meant to include in the note above that we also left out mentioning the Mo”bius function as an example of a -1,0,1 sequence to consider, so thanks for that remark and links too. We try to keep the posts to within 5 small-“article”-format LaTeX pages.

3. September 24, 2015 3:14 pm

“Then with {d=1} the absolute partial sums stay within {1 + \log_3 k}.”

Is this a typo? Should it say, “with {d=k}”?

• September 24, 2015 4:11 pm

Did mean d=1; the partial sum is of the first k terms. Could say more about it… Our previous discussion of this example here was based on the example near the top of Tim Gowers’s 2009 post on EDC. The sequence begins (n = 1 to 30) 1,2,1,1,2,2,1,2,1,1, 2,1,1,2,2,1,2,2,1,2, 1,1,2,2,1,2,1,1,2,1… so unlike the simple alternating ones the case d=1 has nontrivial behavior.

4. September 25, 2015 11:34 am

“Notions related to discrepancy can be extended to these sequences, perhaps summing just the real parts.”

I wonder if taking just the real part doesn’t actually allow a multiplicative sequence with bounded discrepancy. (Without multiplicativity you could just color everything with $i$.) The safer (and standard) choice here is to actually add up the complex numbers and take the norm of the sum. The same thing can be done for coloring with higher-dimensional vectors. In fact, Terry’s proof in fact shows a lower bound for coloring with vectors in arbitrary dimension, which is equivalent to proving a lower bound for the natural SDP relaxation of EDP. So implicitly he is showing that there exists a family of dual solutions with unbounded value. One interesting problem is construct such a family explicitly.

The best upper bound for vector colorings is $O(\sqrt{\log N})$, so another natural open problem is to show this is tight.

As you note, one of the technical reasons EDP is hard is that a Dirichlet character provides a coloring that is large on average and has bounded discrepancy. It was very interesting to me that Terry’s proof also works with colorings that are large on average, but the average is taken differently: a random number is picked by picking random powers for the primes in its factorization. Under this distribution, a Dirichlet character is in fact quite small on average, as most of the probability mass is on numbers that have all small primes as factors.

5. September 28, 2015 4:35 am

You say “Given the sequence {[a_n]} and an initial jump {d}, what we care about is whether there is a {C > 0} such that for all {k},
\displaystyle \left|\sum_{i=1}^k a_{di}\right| \leq C.”
This is a little different from DC and there is indeed a sequence satisfying Gorf, which can be found in Tao’s Remark 1.13. The sequence in question being f(jD!+k)=(-1)^jf(k) for k=1,…,D!, j=1,…,D.

• September 28, 2015 10:26 am

Thanks very much! Dick had queried that but I thought the uniformity handled it. I have adjusted the wording accordingly and added a note to the paragraph on uniformity.