Factoring, factoring, the whole day through, keeps on my mind—apologies to Hoagy Carmichael and Stuart Gorrell

Michael Rubinstein is an expert on number theory on the faculty of the University of Waterloo. He is one of the organizers of a 61${^{st}}$-birthday symposium being held December 15–19 in Princeton for my friend and former colleague, Peter Sarnak. I guess it is a matter of taste for a number theorist whether to observe a birthday with a lot of factors (60) or a prime (59 or 61). Rubinstein also does extensive experimental mathematics and lists several code libraries below his publications on his website, which also has interesting articles on the math, history, and practice of musical tuning.

Today Ken and I wish to discuss a paper of his on one of my favorite problems: integer factoring.

The paper appears in the 2013 volume of the electronic journal INTEGERS, one of whose sponsors is the University of West Georgia. It is titled, “The distribution of solutions to ${xy = N \pmod{a}}$ with an application to factoring integers.” He studies the structure of solutions to ${xy \equiv N \pmod{a}}$, and uses this to prove the following result:

Theorem 1 For any ${\epsilon>0}$, there is a deterministic factoring algorithm that runs in ${O(N^{1/3+\epsilon})}$ time.

The “${+ \epsilon}$” hides lower-order factors including ${\exp(O(\frac{\ln N}{\ln\ln N}))}$ coming from the divisor bound, logarithmic terms from the overhead in nearly-linear time integer multiplication, and related sources.

## The Factoring Divide

Factoring algorithms partition into two types: unprovable and provable. The unprovable algorithms usually use randomness and/or rely for their correctness on unproved hypotheses, yet they are observed to be the fastest in practice for numbers of substantial size. The provable algorithms are usually deterministic, but their key feature is that their correctness is unconditional.

Those who actually need to factor numbers, to break codes for example, use the fastest unprovable algorithms, such as the general number field sieve (GNFS). The cool part of factoring is that one can always check the result of any algorithm quickly, so whenever an unprovable algorithm fails, it fails in a visible manner.

Why care, then, about slower algorithms that are provable? The answer is that we would like to know the best provable algorithm for every problem, and that includes factoring. We are also interested in these algorithms because they often use clever tricks that might be usable elsewhere in computational number theory. But for factoring there is a special reason that is sometimes hidden by the notation. The GNFS has postulated runtime

$\displaystyle L_N[1/3,c] = \exp((c + o(1))(\ln N)^{1/3}(\ln\ln N)^{2/3})$

where ${c = \sqrt[3]{64/9} \approx 1.92 < 2}$. This is of a much smaller order than ${N^{1/3}}$. To see this more clearly, let ${n}$ be the length of ${N}$ in binary, so ${N \approx 2^n}$. Then the GNFS runtime is roughly, and slightly above, ${2^{n^{1/3}}}$, but ${N^{1/3}}$ is ${2^{\frac{1}{3}n}}$, which is not only miles bigger, it is a properly exponential function.
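To make the gap concrete, here is a small Python sketch comparing the two bounds. It drops the ${o(1)}$ term and fixes ${c \approx 1.923}$, so this is a heuristic growth comparison rather than an exact runtime; it prints roughly how many decimal digits each cost has for a given bit length of ${N}$.

```python
import math

def log10_gnfs(bits, c=1.923):
    """log10 of the heuristic GNFS cost L_N[1/3, c], with the o(1) term dropped."""
    ln_n = bits * math.log(2)          # ln N for an n-bit number N
    return c * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3) / math.log(10)

def log10_cube_root(bits):
    """log10 of N^(1/3) for an n-bit number N."""
    return bits * math.log10(2) / 3

# number of decimal digits in each cost, for 256-, 512-, and 1024-bit N
for bits in (256, 512, 1024):
    print(bits, round(log10_gnfs(bits)), round(log10_cube_root(bits)))
```

Already at 1024 bits the ${N^{1/3}}$ bound has on the order of a hundred digits while the GNFS estimate has a few dozen: sub-exponential versus properly exponential.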

Nobody knows a deterministic factoring algorithm that beats properly exponential time.

The unprovable algorithms taunt and tease us, because they hold out the promise of being able to beat exponential time, but nobody knows how to prove it. Indeed, as Rubinstein remarks in his intro, each “teaser” is a conceptual child of one of the exponential algorithms. A reason to care about his new algorithm is its potential to be a new way to attack factoring, even though it currently loses to the best known provable methods. These are all slight variations of the Pollard-Strassen method, all running in ${O(N^{1/4+\epsilon})}$-type times; there are also ${O(N^{1/5+\epsilon})}$ algorithms assuming the generalized Riemann hypothesis. See this for details.

## The New Factoring Algorithm

Most factoring algorithms—even Peter Shor’s quantum algorithm—use the idea of choosing a number ${a \ll N}$ and working modulo ${a}$. If ${a}$ and ${N}$ share a factor then we can quickly find it by Euclid’s gcd algorithm, while otherwise the problem of finding ${x,y < a}$ such that ${xy \equiv N \pmod{a}}$ provides useful information. The starting point of Rubinstein’s algorithm is to do this for values ${a}$ that are right near each other, and compare the values obtained.
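As a warm-up, a few lines of Python (a toy illustration, not Rubinstein's algorithm) show both halves of this dichotomy: the gcd check, and the fact that once ${\gcd(a,N)=1}$, each invertible ${x}$ forces a unique partner ${y}$ with ${xy \equiv N \pmod{a}}$.

```python
import math

def solutions(N, a):
    """All pairs (x, y) with 0 < x, y < a, gcd(x, a) = 1, and x*y ≡ N (mod a)."""
    return [(x, (N * pow(x, -1, a)) % a)       # y is forced once x is chosen
            for x in range(1, a) if math.gcd(x, a) == 1]

N = 91                                         # 7 * 13, a toy stand-in
assert math.gcd(N, 35) == 7                    # a = 35 shares a factor: done
assert all(x * y % 33 == N % 33 for x, y in solutions(N, 33))
```

The three-argument `pow(x, -1, a)` (Python 3.8+) computes the modular inverse of ${x}$ modulo ${a}$.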

In its outermost structure, it uses this fact: Suppose ${N = pq}$ where ${p}$ and ${q}$ are primes of the same length, so ${p < q < 2p}$. Suppose ${a \approx N^{1/3}}$. Then if we write

$\displaystyle \begin{array}{rcl} p &=& u_1 a + u_0\\ q &=& v_1 a + v_0, \end{array}$

with ${u_0,v_0 < a}$ (note they are relatively prime to ${a}$ or else gcd gives us a factor) we get that ${u_1}$ and ${v_1}$ are fairly small, about ${N^{1/6}}$. They are also roughly bounded by ${p^{1/3}}$ and by ${a^{1/2}}$. Multiples of them will stay small. So now let us work modulo ${a - \delta}$ for ${\delta = 1,2,3\dots}$. We have:

$\displaystyle \begin{array}{rcl} p &=& u_1 a + u_0 = u_1(a - \delta) + (u_0 + \delta u_1)\\ q &=& v_1 a + v_0 = v_1(a - \delta) + (v_0 + \delta v_1), \end{array}$

so

$\displaystyle N \equiv (u_0 + \delta u_1)(v_0 + \delta v_1) \pmod{a - \delta}.$
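This identity is easy to check numerically; the sketch below uses small hypothetical primes rather than anything of cryptographic size.

```python
p, q = 1009, 1013            # toy primes of the same length
N = p * q
a = 100                      # stand-in for a ≈ N^(1/3)
u1, u0 = divmod(p, a)        # p = u1*a + u0
v1, v0 = divmod(q, a)        # q = v1*a + v0

for delta in range(4):
    m = a - delta
    # N ≡ (u0 + delta*u1)(v0 + delta*v1)  (mod a - delta)
    assert N % m == (u0 + delta * u1) * (v0 + delta * v1) % m
```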

Now an issue is that the values ${x_{\delta},y_{\delta}}$ giving ${x_{\delta}y_{\delta} \equiv N \pmod{a - \delta}}$ are far from unique. However, among them as ${\delta = 0,1,2,3\dots}$ will be the collinear points

$\displaystyle (u_0,v_0),\;(u_0 + u_1, v_0 + v_1),\;(u_0 + 2u_1,v_0 + 2v_1),\; (u_0 + 3u_1,v_0 + 3v_1),\;\dots$

The increments ${u_1,v_1}$ are small enough that the points stay close to each other in Euclidean space. If we divide the space into boxes of the right size and offset, they will all stay inside one box. If we can search a box well enough to find them and get all of ${u_0,u_1,v_0,v_1}$ exactly, then we get ${p}$ and ${q}$. Moreover—and this is why I like factoring—once we find them we can verify that we have the right ones; if the check fails and we were fooled by ${\delta}$-many other points we can move on.
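Continuing the toy example from above, one can watch the “signal” points march along their line inside the cloud of foolers: for each shift ${\delta}$, the pair ${(u_0 + \delta u_1, v_0 + \delta v_1)}$ really is among the solutions modulo ${a - \delta}$.

```python
import math

def solutions(N, m):
    """All (x, y) with 0 < x, y < m, gcd(x, m) = 1, and x*y ≡ N (mod m)."""
    return {(x, (N * pow(x, -1, m)) % m)
            for x in range(1, m) if math.gcd(x, m) == 1}

p, q = 1009, 1013                 # toy primes, as before
N, a = p * q, 100                 # a is a stand-in for N^(1/3)
u1, u0 = divmod(p, a)
v1, v0 = divmod(q, a)

for delta in range(4):
    point = (u0 + delta * u1, v0 + delta * v1)
    assert point in solutions(N, a - delta)   # the collinear "signal" points
```

The algorithm's real work, of course, is picking these four collinear points out of the roughly ${\phi(a-\delta)}$ foolers per modulus without knowing ${u_0,u_1,v_0,v_1}$ in advance.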

Using ${u_1,v_1 < a^{1/2}}$, Rubinstein is able to show that the amortized number of “foolers” in an ${a^{1/2} \times a^{1/2}}$ square is ${O(1)}$. Since there are ${O(a)}$-many such squares and ${a \approx N^{1/3}}$, we get the target runtime. Note this is amortized, not just expected, so the algorithm is deterministic. The most clever and painstaking aspect is that estimates of the asymptotic convergence of solutions of ${xy \equiv N \pmod{a}}$ to uniform distribution on ${[a] \times [a]}$ are needed to get the “amortized” part. The details become complicated, but Rubinstein writes in an engaging top-down manner, and he includes a section on how this might—might—be used to break the sub-exponential barrier.

## A Related Question

I found this work interesting not just because it is a new approach to factoring. I have tried in the past to answer the following type of question: Is there an asymmetric cryptosystem that is breakable if and only if factoring can be done in polynomial time?

I want to replace systems like AES with ones that use the hardness of factoring for their security. Systems like AES rely on intuition and experimental testing for their security—there is not even a conditional proof that they are secure.

As stated, my goal is trivial: any public-key system can be viewed as an asymmetric system. What I want in addition is that encryption and decryption be very fast. This is one of the reasons that modern systems use public-key methods only to establish symmetric keys, and then switch to fast private-key systems: performance. Using public-key systems for all messages is too slow.

My idea is not far in spirit from what Rubinstein does in his factoring algorithm. This is what struck me when I first read his paper. His algorithm is slow because it has no idea which “box” to look at. Can we share some secret that would allow ${N}$ to be factored faster, yet still make it hard for those without the secret?

## Open Problems

Can we extend Rubinstein’s algorithm to break the ${N^{1/4}}$ barrier? Can his methods be used as described above to create asymmetric systems that are based on factoring? What is the real cost of factoring?

Our own Ken joins the team for TEDx

Adrienne Bermingham is the manager of this year’s TEDx Buffalo event, which will be held this Tuesday at the Montante Center of Canisius College in downtown Buffalo.

Today I wish to proudly announce that our own Ken Regan is one of the presenters at this year’s event.

Applying deep number theory to algorithms

Alan Baker won the Fields Medal in 1970 for his work on algebraic and transcendental numbers. He greatly extended a theorem of Aleksandr Gelfond and Theodor Schneider which had solved the seventh problem on David Hilbert’s famous list. Baker’s extension says that if ${\alpha_1,\dots,\alpha_k}$ are algebraic numbers other than ${0}$ or ${1}$, and if ${\beta_1,\dots,\beta_k}$ are irrational algebraic numbers that together with ${1}$ are linearly independent over the rationals, then the product of ${\alpha_i^{\beta_i}}$ for ${i = 1}$ to ${k}$ is transcendental. Hilbert had stated the ${k=1}$ case, which Gelfond and Schneider solved, and believed it would be harder than the Riemann Hypothesis.

Today Ken and I want to talk about computing numbers to high precision and their relationship to our recent discussion of Freeman Dyson’s challenge.

It’s harder than you think

William Kahan is a numerical analyst and an expert on all things about floating point numbers. He won the 1989 Turing award for his pioneering work on how to implement floating point computations correctly. He has also been a tireless advocate for the care needed to avoid insidious numerical bugs, and his strong feelings come out in a 1997 interview for Dr. Dobb’s Journal.

Today I want to talk about one of his results on how to sum a series of numbers.
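The full post does not appear in this excerpt, but the result in question is presumably Kahan's famous compensated-summation algorithm. A minimal sketch, under that assumption:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: carry the rounding error forward."""
    total = 0.0
    c = 0.0                      # running compensation for lost low-order bits
    for v in values:
        y = v - c                # apply the stored correction to the next term
        t = total + y
        c = (t - total) - y      # recover what rounding just discarded
        total = t
    return total

# a million copies of 0.1: compensated summation beats naive left-to-right sum
vals = [0.1] * 1_000_000
assert abs(kahan_sum(vals) - 100_000.0) < abs(sum(vals) - 100_000.0)
```

The trick is that `(t - total) - y` is computed exactly in floating point and captures the low-order bits that the addition `total + y` rounded away.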

Can the impossible become practical?

Tibor Radó was a mathematician who was the PhD advisor to a number of computer scientists. They include Shen Lin, Paul Young, and several others. All were his students while he was at Ohio State University. He is not the Rado of the Erdős-Ko-Rado theorem—that is Richard Rado, without an accent. Tibor Radó had an Erdős Number of 3, through Lin and then Ron Graham.

Today we want to continue to talk about the famous “Busy Beaver” problem which we mentioned in our discussion on the Microsoft SVC lab closing.

Omer Reingold is a brilliant researcher, and now all can see that he is also a very classy and thoughtful person. As you all know by now, he and many others have suddenly lost their jobs with the closing of Microsoft’s Silicon Valley Campus (SVC) Research Lab. The lab closed this week: it has been removed from Microsoft’s “Research labs worldwide” page. The closing affects about 50 jobs directly, and is part of an overall reduction of about 18,000 staff worldwide.

Today, Ken and I wish to express some of our feelings about this. At first we thought we could not add to what others have already said about the closing, but we realized we could still express support and add to the wider conversation.