How some longstanding open problems were made to disappear

Ernie Croot, Vsevolod Lev, and Péter Pach (CLP) found a new application of polynomials last month. They proved that every set ${A \subseteq U = \mathbb{Z}_4^n}$ of size at least ${|U|^{1-\epsilon}}$ has three distinct elements ${a,b,c}$ such that ${2b = a + c}$. Jordan Ellenberg and Dion Gijswijt extended this to ${\mathbb{F}_q^n}$ for prime powers ${q}$. Previous bounds had the form ${|U| n^{-c}}$ at best. Our friend Gil Kalai and others observed impacts on other mathematical problems including conjectures about sizes of sunflowers.

Today we congratulate them—Croot is a colleague of Dick’s in Mathematics at Georgia Tech—and wonder what the breakthroughs involving polynomials might mean for complexity theory.

What’s amazing is that the above papers are so short, including a new advance by Ellenberg that is just 2 pages. In his own post on the results, Tim Gowers muses:

[The CLP argument presents a stiff challenge to my view that] mathematical ideas always result from a fairly systematic process—and that the opposite impression, that some ideas are incredible bolts from the blue that require “genius” or “sudden inspiration” to find, is an illusion that results from the way mathematicians present their proofs after they have discovered them. …[T]he argument has a magic quality that leaves one wondering how on earth anybody thought of it.

We don’t know if we can explain the source of the ‘magic’ but we will try to describe it in a way that might help apply it.

## Rank and Degree

At top level there is no more sleight-of-hand than a simple trick about matrix rank. We discussed ideas of rank some time ago.

If a matrix ${H}$ is a sum of ${s}$ matrices each of rank at most ${r}$, then any condition that would force ${H}$ to have rank ${> rs}$ must be false.

A simple case is where the condition zeroes every off-diagonal element of ${H}$. Then the main diagonal can have at most ${rs}$ nonzero entries. This actually gets applied in the papers. The fact that column rank equals row rank also helps for intuition, as Peter Cameron remarks.

A second trick might be called “degree-halving”: Suppose you have a polynomial ${P(x_1,\dots,x_n)}$ of degree ${d}$. Even if ${P}$ is irreducible, ${P}$ might be approximated or at least “subsumed” term-wise by a degree-${d}$ product ${P' = QR}$. When ${P}$ is multi-linear, or at least of bounded degree ${k}$ in each variable—call this ${P \in S_{d,k}}$—we may get ${P' = Q(x_1,\dots,x_m)\cdot R(x_{m+1},\dots,x_n)}$ where ${m}$ is close to ${n/2}$.

In any case, at least one of ${Q,R}$ must have degree at most ${d/2}$, say ${Q}$. If we can treat ${R}$ and its variables as parameters, maybe even substitute them by well-chosen constants, then we are down to ${Q}$ of degree ${d/2}$. Then ${Q}$ is a sum of terms each having a monomial of total degree ${d/2}$ in variables ${x_1,\dots,x_m}$ each with power at most ${k}$.

The number ${N(m,d/2,k)}$ of such monomials is relatively small. This limits the dimension of spaces spanned by such ${Q}$, which may in turn connect to the bound ${rs}$ above and/or limit the size of exceptional subsets ${A}$ of the whole space ${U}$. We discussed Roman Smolensky’s famous use of the degree-halving trick in circuit complexity here.

## Polynomial Leverage

These tricks of linear algebra and degree are all very well, but how can we use them to attack our problem? We want to bound the size of subsets ${A}$ having no element ${a}$ such that for some nonzero ${r \in U}$, ${a}$, ${a+r}$, and ${a + 2r}$ all belong to ${A}$. This is equivalent to ${A}$ having no three elements ${a,b,c}$ such that ${a + b = 2c}$. This means that the following two subsets of ${U}$ are disjoint:

$\displaystyle \begin{array}{rcl} D_A &=& \{2c: c \in A\};\\ S_A &=& \{a+b: a \in A, b \in A, a \neq b\}. \end{array}$

How can we use polynomials to gain leverage on this? The insight may look too trivial to matter:

Any polynomial supported only on ${D_A}$ must vanish on ${S_A}$.

Let ${C}$ be the complement of ${D_A}$ and let ${V = V_{C,d,k}}$ be the space of polynomials ${P}$ vanishing on ${C}$ that belong to our set ${S_{d,k}}$. We can lower-bound the size of ${V}$ by observing that the evaluation map ${E}$ from ${P}$ to the graph of its values on ${C}$ is a linear transformation. Its image has size at most ${q^{|C|}}$, and since ${V}$ is the kernel, we have ${|S_{d,k}|/|V| \leq q^{|C|}}$, so ${|V| \geq q^{N(n,d,k) - |C|}}$.

Well, this is useless unless ${|C| < N(n,d,k)}$, but ${C}$ is the complement of ${D_A}$ which is no bigger than the set ${A}$ we are trying to upper-bound. So it is useless—unless ${N(n,d,k)}$ is pretty big. So we need to choose ${d}$—and maybe ${k}$—to be not so low. We can do this, but how can this lower bound on ${V}$ help? We need a “clashing” upper bound. This is where the presto observation by CLP came in.

## Magic Box

Given the set ${A}$, make a matrix whose entry in row ${a}$, column ${b}$, is ${a + b}$. In APL notation this is the “${+}$ outerproduct” of ${A}$ with itself. Its diagonal is ${D_A}$ and the rest is ${S_A}$.

Now apply ${P}$ to every entry to get a matrix ${M_P}$. By ${P \in V}$ every off-diagonal entry vanishes, so ${M_P}$ is a diagonal matrix. Its rank is hence the number ${\rho}$ of nonzero diagonal entries. If we can upper-bound ${\rho}$, then we can upper-bound ${|V|}$ by the hoc-est-corpus rubric of description complexity:

Every ${P \in V}$ can be described by its up-to-${\rho}$ nonzero values on ${D_A}$, so there are at most ${\sum_{r \leq \rho} (q-1)^{r} \binom{|A|}{r} < q^{\rho} \binom{|A|}{\rho}}$ of them.

The papers use bounds on the dimension of ${V}$ in place of description complexity, but this is enough to see how to get some kind of upper bound. Since ${|C| = q^n - |D_A| \geq q^n - |A|}$, taking logs base ${q}$ gives us:

$\displaystyle N(n,d,k) - q^n + |A| \leq \log_{q}(V) \leq \rho + \log_q \binom{|A|}{\rho}.$

It remains to bound ${\rho}$, but it seems to take X-ray vision just to see that a bound can give us anything nontrivial. OK, any fixed bound on ${\rho}$ makes the right-hand side only ${O(\log |A|)}$ which yields a contradiction, so there is hope. The rank trick combines with degree-halving to pull a bound involving ${N(n,d/2,k)}$ out of the hat. Here is the version by Ellenberg and Gijswijt where the nonce choice ${k = q-1}$ suffices and the coefficients ${(1,1,-2)}$ on ${a,b,c}$ in ${a + b - 2c = 0}$ are replaced by a general triple ${(\alpha,\beta,\gamma)}$ such that ${\alpha + \beta + \gamma = 0}$:

Lemma 1 With ${A \subseteq \mathbb{F}_q^n}$, ${\alpha,\beta,\gamma \in \mathbb{F}_q}$, and ${d,k}$ as above, put ${V'}$ to be the set of polynomials in ${S_{d,k}}$ that vanish on ${S'_{A} = \{\alpha a + \beta b: a,b \in A, a \neq b\}}$. Then for all ${P \in V',}$ there are at most ${2N(n,d/2,k)}$ values ${c \in A}$ for which ${P(-\gamma c) \neq 0.}$

Proof: Let ${x = (x_1,\dots,x_n)}$ and ${y = (y_1,\dots,y_n)}$ be vectors of variables, and write

$\displaystyle P(\alpha x + \beta y) = \sum_{\mu_1,\mu_2} c_{1,2}\mu_1(x)\mu_2(y),$

where each coefficient ${c_{1,2}}$ is in ${\mathbb{F}_q}$ and the sum is over pairs of monomials ${\mu_1,\mu_2}$ whose product has degree at most ${d}$ and at most ${k}$ in any variable. Collect the terms in which ${\mu_1}$ has total degree at most ${d/2}$ separately from those where ${\mu_2}$ does, so that we get

$\displaystyle P(\alpha x + \beta y) = \sum_{\mu} \mu(x) F_{\mu}(y) + G_{\mu}(x) \mu(y),$

where each ${F_{\mu}}$ and ${G_{\mu}}$ is an arbitrary function and now the sum is over monomials ${\mu}$ of total degree at most ${d/2}$ (and still no more than ${k}$ in any variable if we care). Now look at the matrix ${M'_P}$ whose ${(a,b)}$ entry is ${P(\alpha a + \beta b)}$, including the diagonal where ${a = b}$. We have

$\displaystyle M'_P(a,b) = \sum_{\mu} \mu(a) F_{\mu}(b) + G_{\mu}(a) \mu(b).$

This is a sum of at most ${2N(n,d/2,k)}$ single-entry matrices, so the rank of ${M'_P}$ is at most that. Since ${M'_P}$ is a diagonal matrix and ${b = a}$ makes ${\alpha a + \beta a = -\gamma a}$, there are at most ${2N(n,d/2,k)}$ nonzero values ${P(-\gamma c)}$ over ${c \in A}$. $\Box$

Sawing the degree in half stacks up ${N(n,d,k)}$ against ${N(n,d/2,k)}$. We retain freedom to choose ${d}$ (and possibly ${k}$) to advantage. There are still considerable numerical details needed to ensure this works and tweaks to tighten bounds—for which we refer to the papers—but we have shown the “Pledge,” the “Turn,” and the “Prestige” of the argument.

## Open Problems

Can you find more applications of the polynomial technique besides those enumerated in the papers and posts we have linked? For circuit complexity we’d not only like to go from ${\mathbb{F}_q^n}$ back to ${\mathbb{Z}_q^n}$ as CLP have it, but also get results for ${\mathbb{Z}_q^n}$ when ${q}$ is not a prime power. Can we make assumptions (for sake of contradiction) that create situations with higher “leverage” than ${D_A,S_A}$ merely being disjoint?

[changed subtitle; linked hoc-est-corpus which literally means, “here is the body”; deleted and changed remarks before “Open Problems”; inserted tighter sum into description complexity formula.]

1. June 15, 2016 3:36 am

Can you explain the last sentence before “Open Problems”, about needing full generality? I can’t work out what it means: what can you do for general $(\alpha,\beta,\gamma)$ that you can’t do for $(1,1,-2)$?

• June 15, 2016 6:43 am

It was based on E+G’s remarks about the case ${\gamma = 0}$ but I see it’s not needed as a qualifier. Thanks.

2. June 15, 2016 12:57 pm

Another amazing result that follows from these techniques for one of my favorite problems: http://arxiv.org/pdf/1606.01230v2.pdf. Fox and LM Lovasz improve the bounds for the arithmetic triangle removal lemma dramatically, from a tower of two’s to polynomial!

3. June 15, 2016 3:42 pm

Finally someone mentions Smolensky! His proof was the first thing that came to my mind too after reading the CLP paper.