Factoring, factoring, the whole day through, keeps on my mind—apologies to Hoagy Carmichael and Stuart Gorrell

Michael Rubinstein is an expert on number theory, who is on the faculty of the University of Waterloo. He is one of the organizers of a 61 ${^{st}}$-birthday symposium being held December 15–19 in Princeton for my friend and former colleague, Peter Sarnak. I guess it is a matter of taste for a number theorist whether to observe a birthday with a lot of factors (60) or a prime (59 or 61). Rubinstein also does extensive experimental mathematics and lists several code libraries below his publications on his website, which also has interesting articles on the math, history, and practice of musical tuning.

Today Ken and I wish to discuss a paper of his on one of my favorite problems: integer factoring.

The paper appears in the 2013 volume of the electronic journal INTEGERS, one of whose sponsors is the University of West Georgia. It is titled, “The distribution of solutions to ${xy = N \pmod{a}}$ with an application to factoring integers.” He studies the structure of solutions to ${xy = N}$ (mod a), and uses this to prove the following result:

Theorem 1 For any ${\epsilon>0}$, there is a deterministic factoring algorithm that runs in ${O(N^{1/3+\epsilon})}$ time.

The “ ${+ \epsilon}$” hides sub-linear terms including ${\exp(O(\frac{\ln n}{\ln\ln n}))}$ coming from the divisor bound, logarithmic terms from the overhead in nearly-linear time integer multiplication, and related sources.

## The Factoring Divide

Factoring algorithms partition into two types: unprovable and provable. The unprovable algorithm usually use randomness and/or rely for their correctness on unproved hypotheses, yet are observed to be the fastest in practice for numbers of substantial size. The provable algorithms are usually deterministic, but their key feature is that their correctness is unconditional.

For those trying to factor numbers, to break codes for example, they use the fastest unprovable algorithms such as the general number field sieve (GNFS). The cool part of factoring is that one can always check the result of any algorithm quickly, so anytime an unprovable algorithm fails, it fails in a visible manner.

Why care then about slower algorithms that are provable? Indeed. The answer is that we would like to know the best provable algorithm for every problem, and that includes factoring. We are also interested in these algorithms because they often use clever tricks that might be useable elsewhere in computational number theory. But for factoring there is a special reason that is sometimes hidden by the notation. The GNFS has postulated runtime $\displaystyle L_N[1/3,c] = \exp((c + o(1))(\ln N)^{1/3}(\ln\ln N)^{2/3})$

where ${c = \sqrt{64/9}{3} < 2}$. This is not a similar order to ${N^{1/3}}$. To see this more clearly, let ${n}$ be the length of ${N}$ in binary, so ${N \approx 2^n}$. Then the GNFS is roughly and slightly above ${2^{n^{1/3}}}$, but ${N^{1/3}}$ is ${2^{\frac{1}{3}n}}$, which is not only miles bigger, it is a properly exponential function.

Nobody knows a deterministic factoring algorithm that beats properly exponential time.

The unprovable algorithms taunt and tease us, because they hold out the promise of being able to beat exponential time, but nobody knows how to prove it. Indeed, as Rubinstein remarks in his intro, each “teaser” is a conceptual child of one of the exponential algorithms. A reason to care about his new algorithm is its potential to be a new way to attack factoring, even though it currently loses to the best known provable methods. These are all slight variations of the Pollard-Strassen method, all running in ${O(N^{1/4+\epsilon})}$-type times; there are also ${O(N^{1/5+\epsilon})}$ algorithms assuming the generalized Riemann hypothesis. See this for details.

## The New Factoring Algorithm

Most factoring algorithms—even Peter Shor’s quantum algorithm—use the idea of choosing a number ${a \ll N}$ and working modulo ${a}$. If ${a}$ and ${N}$ share a factor then we can quickly find it by Euclid’s gcd algorithm, while otherwise the problem of finding ${x,y < a}$ such that ${xy \equiv N \pmod{a}}$ provides useful information. The starting point of Rubinstein’s algorithm is to do this for values ${a}$ that are right near each other, and compare the values obtained.

In its outermost structure, it uses this fact: Suppose ${N = pq}$ where ${p}$ and ${q}$ are primes of the same length, so ${p < q < 2p}$. Suppose ${a \approx N^{1/3}}$. Then if we write $\displaystyle \begin{array}{rcl} p &=& u_1 a + u_0\\ q &=& v_1 a + v_0, \end{array}$

with ${u_0,v_0 < a}$ (note they are relatively prime to ${a}$ or else gcd gives us a factor) we get that ${u_1}$ and ${v_1}$ are fairly small, about ${N^{1/6}}$. They are also roughly bounded by ${p^{1/3}}$ and by ${a^{1/2}}$. Multiples of them will stay small. So now let us work modulo ${a - \delta}$ for ${\delta = 1,2,3\dots}$. We have: $\displaystyle \begin{array}{rcl} p &=& u_1 a + u_0 = u_1(a - \delta) + (u_0 + \delta u_1)\\ q &=& v_1 a + v_0 = v_1(a - \delta) + (v_0 + \delta v_1), \end{array}$

so $\displaystyle N \equiv (u_0 + \delta u_1)(v_0 + \delta v_1) \pmod{a - \delta}.$

Now an issue is that the values ${x_{\delta},y_{\delta}}$ giving ${x_{\delta}y_{\delta} \equiv N \pmod{a - \delta}}$ are far from unique. However, among them as ${\delta = 0,1,2,3\dots}$ will be the collinear points $\displaystyle (u_0,v_0),\;(u_0 + u_1, v_0 + v_1),\;(u_0 + 2u_1,v_0 + 2v_1),\; (u_0 + 3u_1,v_0 + 3v_1).$

The increments ${u_1,v_1}$ are small enough that the points stay close to each other in Euclidean space. If we divide the space into boxes of the right size and offset, they will all stay inside one box. If we can search a box well enough to find them and get all of ${u_0,u_1,v_0,v_1}$ exactly, then we get ${p}$ and ${q}$. Moreover—and this is why I like factoring—once we find them we can verify that we have the right ones; if the check fails and we were fooled by ${\delta}$-many other points we can move on.

Using ${u_1,v_1 < a^{1/2}}$, Rubinstein is able to show that the amortized number of “foolers” in a ${a^{1/2} \times a^{1/2}}$ square is ${O(1)}$. Since there are ${O(a)}$-many such squares and ${a \approx N^{1/3}}$, we get the target runtime. Note this is amortized, not just expected, so the algorithm is deterministic. The most clever and painstaking aspect is that estimates of the asymptotic convergence of solutions of ${xy \equiv N \pmod{a}}$ to uniform distribution on ${[a] \times [a]}$ are needed to get the “amortized” part. The details become complicated, but Rubinstein writes in an engaging top-down manner, and he includes a section on how this might—might—be used to break the sub-exponential barrier.

## A Related Question

I found this work interesting not just because it is a new approach to factoring. I have tried in the past to prove the following type of result: Is there an asymmetric cryptosystem that is breakable if and only if factoring can be done in polynomial time?

I want to replace systems like AES with ones that uses the hardness of factoring for their security. Systems like AES rely on intuition and experimental testing for their security—there is not even a conditional proof that they are secure.

My goal is really trivial. Any public-key system can be viewed as an asymmetric system. But what I want is that the encryption and decryption should be very fast. This is one of the reasons that modern systems use private-key systems to create symmetric keys: performance. Using public-key systems for all messages is too slow.

My idea is not far in spirit from what Rubinstein does in his factoring algorithm. This is what struck me when I first read his paper. His algorithm is slow because it has no idea which “box” to look at. Can we share some secret that would allow ${N}$ to be factored faster, yet still make it hard for those without the secret?

## Open Problems

Can we extend Rubinstein’s algorithm to break the ${N^{1/4}}$ barrier? Can his methods be used as described above to create asymmetric systems that are based on factoring? What is the real cost of factoring?

1. October 18, 2014 10:12 am

Sharing a secret that would allow {N} to be factored faster seems pretty close to the problem of creating a backdoor in an asymmetric encryption system. It isn’t quite the same problem because a backdoor can be very specialized (only one instance is needed), while you want an efficient backdoor with very many instances that can be created relatively quickly. I don’t know if such an animal exists, but it might be a productive direction to look for ideas.

2. October 18, 2014 2:39 pm

In the third paragraph from the end you wrote asymmetric, but it seems you mean symmetric (I.e. private-key)

3. October 19, 2014 3:28 pm

He also shows a subexponential result in his paper?

4. October 19, 2014 7:49 pm

If you want efficient for symmetric-key yet provable, maybe lattices are the way to go?

5. October 21, 2014 1:05 am

Unrelated to factoring, but this may be of interest: “The mate-in-n problem of infinite chess is decidable” http://arxiv.org/abs/1201.5597

• October 23, 2014 1:23 am

Indeed, thanks!

6. October 22, 2014 6:14 pm

I think it’s a bit expansive to suggest block ciphers don’t have any hardness assumptions. Their difficulty relates to assumed difficulty of inverting finite group and field operations, and replacing a block cipher such as AES with a cipher that depends on the implicit hardness of factoring strikes me as counterproductive, especially since we don’t even know if factoring is hard. My intuition places more confidence in the difficulty of breaking established block ciphers than the hardness of factoring.

It may be appropriate for a protocol that uses a public key infrastructure that utilizes factoring as the hardness assumption for key exchange, where if factoring is broken the key is in the clear anyways, but for systems security we should have multiple independent hardness assumptions, so that if one falls (like factoring, or discrete log) the house wont fall down; unless some day we manage to make a provably secure public key infrastructure to replace them all.

7. October 23, 2014 6:53 am

I don’t think any cryptosystem has been proven to be as hard as factoring, but I understand that when working with elliptic curves, breaking ECDH encryption has been proven to be equivalent to solving ECDLP.

Obviously solving DLP would make factoring easy, so any cryptosystem which is provably as hard as DLP is provably at least as hard as factoring.

8. October 23, 2014 8:46 am

@m50d, wouldn’t the Rabin cryptosystem (https://en.wikipedia.org/wiki/Rabin_cryptosystem) satisfy the requirement?

“[T]he Rabin cryptosystem has the advantage that the problem on which it relies has been proved to be as hard as integer factorization […]”

9. October 30, 2014 9:29 am

I think that the Rabin cryptosystem was based on factoring but not still proven ‘semantically secure’ as done later in the Goldwasser and Micali’s seminal paper.
Am I right?
There is also a provable CCA-secure public-key encryption system from factoring of Dennis Hofheinz, Eike Kiltz, Victor Shoup, published at Eurocrypt 2009.

10. June 1, 2016 3:49 pm

I do not want to be optimistic, and of cause I made some mistakes 🙂 Just seems that p-adic numbers emerging in the problem. https://mkatkov.wordpress.com/2016/06/01/semiprime-factorization-and-modular-arithmetics/

11. 