A win for multi-variable polynomials

David Harvey and Joris van der Hoeven are computational mathematicians. They have just released a paper showing how to multiply two ${n}$-bit integers in ${O(n\log n)}$ time. A companion paper gives details of a simpler second algorithm that is ${O(n\log n)}$ time pending a widely-believed conjecture about primes.

Today we discuss what makes these new results possible.

The first paper leads off with a nice history of algorithms that beat the ${O(n^2)}$-time method all of us learned in school. Far from being galactic, the longtime-champion ${O(n\log n \log\!\log n)}$ algorithm of Arnold Schönhage and Volker Strassen is a workhorse of the GNU Multiple Precision Library (GMP). The ${\log\!\log n}$ factor was improved by Martin Fürer in 2007 and further in 2014 by Harvey and van der Hoeven with Grégoire Lecerf. Now it is gone altogether.

Their algorithm is currently not practical, but it is also not yet tuned. At the start of section 5, they note that their estimates take effect for ${n \geq 2^{d^{12}}}$ where they use ${d = 1,\!729}$. When they note that ${2^{d^{12}} > 2^{4,\!096}}$ they don’t mean the numbers need at least ${4,\!096}$ bits, rather ${2^{4,\!096}}$ is the number of bits. Since there are only about ${2^{270}}$ particles in the observable universe, this puts the term galactic to shame.

The number ${1,\!729}$ recalls a famous story about Godfrey Hardy and Srinivasa Ramanujan. This is not wholly a coincidence. They get a bounding multiplier of ${36}$ from one place, another factor of ${6}$ from another, and finally a factor of ${8}$ giving ${1,\!728}$ which equals ${12^3}$. Then they just make ${d}$ be higher by ${1}$, setting up ${1,\!729 = 12^3 + 1^3}$ which happens to equal ${9^3 + 10^3}$. But there is much flex in the choices and they sketch how to improve the needed bound from ${1,\!728}$ to just ${8}$. They also uncover some new tricks for calculating in certain large finite rings defined via multi-variable polynomials.

## Polynomials and Tricks

When we heard the news on Tuesday, both of us thought of the famous primality test of Manindra Agrawal, Neeraj Kayal, and Nitin Saxena (AKS), which placed deciding primality into deterministic polynomial time. Its message was the title of an old post on other joint work by Kayal: “Polynomials Are Easier Than Integers.”

The crux of the AKS test is to employ the theorem that for any co-prime ${(a,N)}$, the polynomial congruence relation

$\displaystyle (x + a)^N \equiv x^N + a \pmod{N}$

holds in ${\mathbb{Z}[x]}$ if and only if ${N}$ is prime. This is tested in the quotient ring ${\mathbb{Z}[x]/(x^r - 1)}$. The game becomes finding ${r}$ small enough and minimizing the number of ${a}$ that need to be checked in order for the body of tests in the quotient ring to be conclusive.
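As a toy illustration of the theorem (our own brute-force check, exponential-time and nothing like AKS itself), one can compare the coefficients of ${(x+a)^N}$ against ${x^N + a}$ modulo ${N}$ directly:

```python
from math import comb

def passes_congruence(N, a=1):
    # Check (x + a)^N ≡ x^N + a (mod N) in Z[x]; true iff N is prime
    # (for gcd(a, N) = 1).  Brute force over all N+1 coefficients, so
    # exponential in the bit length -- for illustration only.
    if N < 2:
        return False
    # interior coefficients C(N, k) * a^(N-k) must all vanish mod N
    if any(comb(N, k) * pow(a, N - k, N) % N for k in range(1, N)):
        return False
    # constant term a^N must reduce to a mod N (both leading terms are 1)
    return pow(a, N, N) == a % N

print([N for N in range(2, 40) if passes_congruence(N)])
# → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
```

The point of AKS is precisely to avoid touching all ${N+1}$ coefficients, by working in the quotient ring with small ${r}$.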

We take time to note that the case ${a = N-1}$ is possibly sufficient while testing a small set of ${r}$. Agrawal has conjectured that if ${(x - 1)^N \equiv x^N - 1}$ modulo both ${N}$ and ${x^r - 1}$ then either ${N}$ is prime or ${r}$ divides ${N^2 - 1}$. This conjecture is doubted on heuristic counting grounds but it has been verified for ${N < 10^{12}}$ and can be modified by including the case ${a = 2}$. It would improve the polynomial time in the number ${n}$ of bits in ${N}$ from ${\tilde{O}(n^6)}$ to ${\tilde{O}(n^3)}$ and also improve the constants. We mention this because Harvey and van der Hoeven have a number-theoretic conjecture that (as shown in their companion paper) improves both their constant on ${n\log n}$ and the simplicity of the proof.

The further message of the new papers is the added power that comes from giving the polynomials more variables. They use ${d}$ variables: ${x_1,\dots,x_{d-1}}$ plus a nominal ${x_d}$ that is later substituted by ${e^{\pi i/r} y}$. Here ${r}$ is a certain power of ${2}$; it is combined with the new variable ${y}$ to build the base ring ${R}$ of complex polynomials in ${y}$ modulo ${y^r + 1}$. The other variables are used to mod out by ${(x_1^{t_1} - 1,\dots,x_{d-1}^{t_{d-1}} - 1)}$, where the ${t_i}$ are powers of ${2}$ no higher than ${2r}$. Operations in quotient rings of this type are especially efficient. They are used to emulate operations in ${\mathbb{Z}[x]/(x^s - 1)}$, where ${s}$ is a product of primes ${s_1\cdots s_d}$ that are near powers of ${2}$. The emulation is done via the isomorphism

$\displaystyle \mathbb{Z}[x]/(x^{s_1\cdots s_d} - 1) \cong \mathbb{Z}[x_1,\dots,x_d]/(x_1^{s_1} - 1,\dots,x_d^{s_d} - 1), \ \ \ \ \ (1)$

which follows from the Chinese remainder theorem.
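A quick numerical check of (1) (our own sketch, with invented helper names) for ${s_1 = 3}$ and ${s_2 = 5}$: re-index the coefficient of ${x^k}$ to position ${(k \bmod 3, k \bmod 5)}$, and one-variable cyclic multiplication agrees with two-variable cyclic multiplication:

```python
s1, s2 = 3, 5                    # coprime, so s = s1 * s2 = 15
s = s1 * s2
f = [(7 * k + 3) % 11 for k in range(s)]   # two arbitrary test polynomials
g = [(5 * k + 1) % 13 for k in range(s)]

def cyclic_mul(f, g, s):
    # product in Z[x]/(x^s - 1)
    h = [0] * s
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[(i + j) % s] += fi * gj
    return h

def to_grid(f):
    # CRT re-indexing: coefficient of x^k goes to x1^(k mod s1) x2^(k mod s2)
    F = [[0] * s2 for _ in range(s1)]
    for k, c in enumerate(f):
        F[k % s1][k % s2] = c
    return F

def grid_mul(F, G):
    # product in Z[x1, x2]/(x1^s1 - 1, x2^s2 - 1)
    H = [[0] * s2 for _ in range(s1)]
    for i1 in range(s1):
        for i2 in range(s2):
            for j1 in range(s1):
                for j2 in range(s2):
                    H[(i1 + j1) % s1][(i2 + j2) % s2] += F[i1][i2] * G[j1][j2]
    return H

assert to_grid(cyclic_mul(f, g, s)) == grid_mul(to_grid(f), to_grid(g))
```

The map ${k \mapsto (k \bmod s_1, k \bmod s_2)}$ is a bijection exactly because the ${s_i}$ are coprime, and exponent addition modulo ${s}$ becomes componentwise addition, which is all the isomorphism says.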

They start with Schönhage and Strassen’s original trick of splitting the two ${n}$-bit integers into ${\Theta(\log n)}$-sized pieces so that the problem reduces to multiplying polynomials of degree ${\Theta(n/\log n)}$ in ${\mathbb{Z}[x]}$ with coefficients of ${O(\log n)}$ size. They use the ${d}$-dimensional Discrete Fourier Transform over ${\mathbb{C}}$ with complex operations that individually need only ${O(\log n)}$ bits of precision. This setting also gives leeway for the approximation methods in their final algorithm. But first we describe what was evidently their original strategy.
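The splitting step is easy to demonstrate. Here is a toy version of ours (function names invented) that cuts the integers into ${b}$-bit pieces, multiplies the resulting polynomials by the schoolbook rule (where the actual algorithms use transforms), and evaluates at ${x = 2^b}$ to restore the carries:

```python
def split(n, b, m):
    # little-endian list of m coefficients of b bits each
    mask = (1 << b) - 1
    return [(n >> (b * i)) & mask for i in range(m)]

def mul_via_polynomials(u, v, b=8):
    # number of b-bit coefficients needed to hold the larger input
    m = max(u.bit_length(), v.bit_length()) // b + 1
    f, g = split(u, b, m), split(v, b, m)
    h = [0] * (2 * m)                    # schoolbook polynomial product
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    # evaluating at x = 2^b performs all the carries at once
    return sum(c << (b * k) for k, c in enumerate(h))

assert mul_via_polynomials(123456789, 987654321) == 123456789 * 987654321
```

Note that the product coefficients in `h` can exceed ${2^b}$; that slack is why the coefficient size, and hence the needed working precision, stays ${O(\log n)}$ when ${b = \Theta(\log n)}$.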

## The Recursive Strategy

Their first task in either case is to find ${d}$-many distinct primes ${s_1,\dots,s_d}$ of order ${(n/\log n)^{1/d}}$ in magnitude. Now when ${d = 1,\!729}$ the exponent ${1/d}$ makes the primes quite small indeed, and the need to find distinct ones already drives ${n}$ up quite high before their algorithm can even go into effect. Moreover, for the approximation to work, the primes need to be congruent to ${1}$ modulo ${r}$.

Their original strategy—detailed in the companion paper—builds on a result of Charles Rader to reduce the DFT computation to multiplications in

$\displaystyle \mathbb{C}[x_1,\dots,x_d]/(x_1^{s_1 - 1} - 1,\dots,x_d^{s_d - 1} - 1),$

where compared to (1) they have reduced the exponents by one. Using ${s_i \equiv 1 \pmod{r}}$, they next reduce this to multiplications in

$\displaystyle \mathbb{C}[x_1,\dots,x_d]/(x_1^{r} - 1,\dots,x_d^{r} - 1).$

Then the above-mentioned substitution of ${x_d}$ by ${e^{\pi i/r} y}$ makes them into products over

$\displaystyle R[x_1,\dots,x_{d-1}]/(x_1^{t_1} - 1,\dots,x_{d-1}^{t_{d-1}} - 1) \qquad\text{with}\qquad R = \mathbb{C}[y]/(y^r + 1).$

These last products are at last amenable to the usual Fast Fourier Transform trick of applying FFT for each ${t_i}$ by pretending that ${y^{2r/t_i}}$ is a primitive ${t_i}$-th root of unity, multiplying pointwise in ${R}$, and doing the inverse FFT. Finally the products in ${R}$ can be converted into ${O(\log n)}$-bit integer products, and this sets up the vast recursion from multiplying the original ${n}$-bit integers to a batch of ${O(\log n)}$-bit ones.
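To make the root-of-unity pretense concrete, here is a miniature Python rendition (entirely ours, with toy parameters ${r = t = 4}$; the names `ymul`, `rmul`, and `dft` are our inventions): elements of ${R = \mathbb{C}[y]/(y^r + 1)}$ are length-${r}$ coefficient vectors, multiplication by ${y^k}$ is just a signed rotation, and a length-${t}$ transform with twiddle factors that are powers of ${y^{2r/t}}$ computes cyclic convolutions over ${R}$:

```python
import numpy as np

r, t = 4, 4                 # toy sizes: t is a power of 2 with t <= 2r
w = 2 * r // t              # omega = y^w plays the role of a t-th root of unity

def ymul(a, k):
    # multiply a in R = C[y]/(y^r + 1) by y^k: a rotation with sign flips,
    # since y^r = -1 and y^(2r) = 1
    k %= 2 * r
    sgn = 1.0
    if k >= r:
        k, sgn = k - r, -1.0
    out = np.empty(r, dtype=complex)
    out[k:] = sgn * a[:r - k]
    out[:k] = -sgn * a[r - k:]
    return out

def rmul(a, b):
    # full product in R: negacyclic convolution of the coefficient vectors
    h = np.zeros(r, dtype=complex)
    for i in range(r):
        for j in range(r):
            if i + j < r:
                h[i + j] += a[i] * b[j]
            else:
                h[i + j - r] -= a[i] * b[j]
    return h

def dft(vec, inverse=False):
    # length-t DFT over R; every twiddle multiplication is a cheap ymul
    sgn = -1 if inverse else 1
    out = [sum(ymul(vec[j], sgn * w * j * k) for j in range(t))
           for k in range(t)]
    return [x / t for x in out] if inverse else out

rng = np.random.default_rng(7)
u = [rng.standard_normal(r) + 0j for _ in range(t)]
v = [rng.standard_normal(r) + 0j for _ in range(t)]

# cyclic convolution over R via transform, pointwise rmul, inverse transform
fft_way = dft([rmul(a, b) for a, b in zip(dft(u), dft(v))], inverse=True)

# the same convolution computed directly, for comparison
direct = [np.zeros(r, dtype=complex) for _ in range(t)]
for i in range(t):
    for j in range(t):
        direct[(i + j) % t] += rmul(u[i], v[j])

assert all(np.allclose(x, y) for x, y in zip(fft_way, direct))
```

The transform above is the plain quadratic DFT for clarity; a real implementation would use the radix-2 butterfly. The point of the trick is that every twiddle multiplication is a monomial shift rather than a genuine multiplication in ${R}$.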

## Finding Good Primes

In order for the recursion to solve to ${O(n\log n)}$, they need the values ${q_i = (s_i - 1)/r}$ to be not too large. Viewing this task from the bottom up, they want to choose a single value ${r}$ and choose small ${q_i}$ such that ${s_i = rq_i + 1}$ is prime. Since ${r}$ is a power of ${2}$, there is some resemblance to the harder task of finding Sophie Germain primes, which are also relevant to the AKS algorithm. In this case, however, all they need is a sufficiently small ${\epsilon > 0}$ such that:

> For all coprime ${a,k}$ with ${k}$ sufficiently large, there is a prime ${q < k^{1+\epsilon}}$ such that ${q \equiv a \pmod{k}}$.

This is, however, only known for ${\epsilon = 4.18}$, and the Generalized Riemann Hypothesis is only known to improve the bound to ${k^{2+\delta}}$ for any ${\delta > 0}$. The statement is conjectured to hold for any ${\epsilon > 0}$, but that hasn’t been proved. They currently need it for ${\epsilon = 1/303}$.
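As a small experiment (our own script, with no claim about the paper’s constants), one can scan for the first primes of the form ${s = qr + 1}$ with ${r = 2^{10}}$ and see how small the multipliers ${q}$ come out in practice:

```python
def is_prime(n):
    # trial division -- fine for the small numbers scanned here
    return n > 1 and all(n % p for p in range(2, int(n**0.5) + 1))

r = 2 ** 10                       # r a power of two
found, q = [], 1
while len(found) < 5:
    if is_prime(q * r + 1):
        found.append((q, q * r + 1))
    q += 1

print(found)   # first hit is q = 12: 12 * 1024 + 1 = 12289 is prime
```

Unconditionally bounding how far such a scan must go is exactly where the Linnik-type statement above comes in.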

The companion paper uses the conjecture to improve multiplication of degree-${n}$ polynomials over the field ${\mathbb{F}_{p^\ell}}$ to time ${O(n\ell \log(p)(\log(n) + \log(\ell) + \log\!\log(p)))}$. This uses a single algorithm over any prime ${p}$ and power ${\ell}$; of course if ${p}$ and ${\ell}$ are fixed, then this time likewise becomes ${O(n\log n)}$. This currently needs ${\epsilon = 1/2^{1,\!162}}$.

The companion paper has full details of how all this works and many more numerical tricks than we’ve mentioned—a tour de force. Getting the result for integer multiplication unconditionally, however, requires one more major tool in their arsenal. To use it they set up the primes ${s_i}$ to be slightly below respective powers of two ${t_i}$ (rather than above the single power of two ${r}$ by the aforementioned factors ${q_i}$), while retaining ${t_1\cdots t_d = O(s_1 \cdots s_d)}$.

Their innovation compared to the translations in the first strategy is that one can still reduce the DFT computation in dimensions ${s_1,\dots,s_d}$ needing ${O(\log n)}$ precision to a complex DFT in dimensions ${t_1,\dots,t_d}$ by employing approximation. Their algorithm is completely deterministic but it employs weights that conform to Gaussian distributions to effect the approximation. The preservation of Gaussians under Fourier transforms enables close enough approximations to be obtained by solving relatively small and easily-specified systems of linear equations. The resulting complex DFT is then massaged in a manner similar to the first strategy. The resulting recursion is not quite as steep—it goes from ${n}$ to ${n' = n^{1/d + o(1)}}$—but it gives

$\displaystyle M(n) = \frac{Kn}{n'} M(n') + O(n\log n)$

with ${K}$ and the ${O(n\log n)}$ term independent of ${d}$ (indeed, ${K}$ becomes the aforementioned ${1,\!729}$). This gives ${M(n) = O(n\log n)}$. The approximation technique does not work for multiplication in ${\mathbb{F}_{p^\ell}}$, however. As usual, here is where we say to consult the papers for further details—there is a wealth of them.
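To sketch why this recursion solves to ${O(n\log n)}$ (our back-of-envelope reading, suppressing the ${o(1)}$ in the exponent): put ${T(n) = M(n)/(n\log n)}$. Since ${n' \approx n^{1/d}}$ gives ${\log n' \approx (1/d)\log n}$, dividing the recursion through by ${n\log n}$ yields

$\displaystyle T(n) \;\leq\; K\,\frac{\log n'}{\log n}\,T(n') + C \;\approx\; \frac{K}{d}\,T(n') + C.$

Iterating gives a geometric series with ratio ${K/d}$, so ${T(n) = O(1)}$, and hence ${M(n) = O(n\log n)}$, once ${d}$ is chosen larger than the bounding multiplier, which squares with taking ${d}$ one higher than ${1,\!728}$ as noted above.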

## Open Problems

Will the ${O(n\log n)}$ time be concretely improvable? Will it tell us anything about the problem of proving a super-linear lower bound? And short of those two goals, what new results will become possible through the multi-variable polynomial techniques they employ?
