A provable example of the linearization method

Nicolas Courtois is one of the leading experts on attacks on practical crypto-systems. He has worked on trying to break, for example, various symmetric cipher systems, including AES. Since 2000, AES, the Advanced Encryption Standard, is the symmetric cipher approved by NIST. It is claimed that it will be safe for the next 200 years. See Courtois’ page for up to date information on AES and related issues.

Today I would like to talk about a heuristic technology that has been quite powerful in attacking crypto-systems—called linearization. There is a great new result that there is a class of problems that can be provably solved by this heuristic.

There are two cultures in cryptography: the practical cryptography researchers and the theoretical cryptography researchers. Let’s call them the ${\cal P}$‘s and ${\cal T}$‘s. Both are important, both study related problems, they share techniques, and many researchers work in both areas. For example, the Siegenthaler inequality has been used by both communities—named for Thomas Siegenthaler. It relates the degree of the a boolean function and the size of its smallest non-zero Fourier coefficient.

Courtois is certainly one of the leaders in ${{\cal P}}$, and Sanjeev Arora is one of the leaders in ${{\cal T}}$. Our discussion is about a method that both groups have used in the past. However, I believe it is fair to say that the method, called linearization, has been mostly used by those in ${{\cal P}}$. What is exciting to report is that Arora and his student Rong Ge have a paper that shows how researchers in ${{\cal T}}$ can add to our global understanding of this important method. Perhaps their ideas, which are interesting on their own, can help shed light on ${{\cal P}}$. We will see.

Linearization Method

The linearization method is based on the simple idea of converting a quadratic equation into a linear equation. For instance,

$\displaystyle \sum_{i,j} a_{ij} \ x_i y_j = b$

is mapped to

$\displaystyle \sum_{i,j} a_{ij} \ V_{ij} = b.$

There is good news and bad news here: The good news that the new equation is linear in the variables ${V_{ij}}$; the bad news is that the new equation has lost some information. The following is now possible,

$\displaystyle V_{13} = 1, V_{24} = 1, V_{12} = 0.$

If we go back to the actual variables this is impossible, since the first two equations make ${x_1}$ and ${x_2}$ non-zero, yet the last equation means that one of these must be zero. This is a linear equation, and since linear is “good” this has the potential to change a hard problem into simple one.

The method typically takes a set of quadratic equations and converts them into a set of linear equations. Then these equations are solved by Gaussian elimination. Any solution of the quadratic equations is of course a solution of the linear ones; however, the linear equations may have many extra solutions. The method works provided the linear equations have few extra solutions.

We can try the same idea on equations of any degree ${d}$. We replace terms of degree at most ${d}$ by variables ${V_S}$ where ${S}$ is a set of size up to ${d}$. Of course, for greater ${d}$ we have many more ways to lose information. To keep the number of extra solutions down in proportion, we need many more equations. Note also that we could rewrite our higher-degree equations as quadratic by introducing new variables, but doing so might not improve the situation.

Heuristically, we can judge how many initial equations we need to have a good chance of making this work. In practice, in ${\cal P}$ this works sometimes. In theory, in ${\cal T}$ there are certain cases where this can be proved to work. That is the insight of Arora and Ge.

Linearization In Practice

The key technical insight is that AES has a nice algebraic structure. Nice algebraic structures can hurt crypto-systems, especially symmetric ciphers. Bruce Schneier and Niels Ferguson have written:

We have one criticism of AES: we don’t quite trust the security ${\dots}$ What concerns us the most about AES is its simple algebraic structure ${\dots}$ No other block cipher we know of has such a simple algebraic representation. We have no idea whether this leads to an attack or not, but not knowing is reason enough to be skeptical about the use of AES.

One problem is that there are attacks on AES that reduce to solving many quadratic equations—apply linearization. Solving such systems of equations in general is NP-hard, and so the hope is that AES is safe. But in practical cryptography one must look at the actual systems. In this case there are about 8000 equations with 1600 variables for the 128-bit AES.

These are apparently not enough equations to make linearization to work directly on AES. An extension of linearization called the XSL attack was created in 2002 by researchers Nicolas Courtois and Josef Pieprzyk to try to overcome this roadblock. The XSL attack is based on linearization and some further tricks. There is an on-going debate whether these or related methods will one day break AES.

Enter Arora and Ge. They have a pretty paper on using linearization methods to solve a learning problem. I think this paper and their results are important. They can prove that linearization actually works on their learning problem. This is in sharp contrast where in the practical work on ciphers like AES there are no proofs yet. There are ample experiments to suggest that certain attacks will or will not work. But no proofs.

Learning Noisy Parity

Arora and Ge work on the problem of learning parities with noise. This is a classic problem, well studied, and extremely difficult. The standard model is that there is a secret boolean vector ${s}$ of some length ${n}$. There is a “box” so that when you press the “button” the following happens:

• The system generates ${x \in \{0,1\}^n}$ uniformly at random and shows you ${x}$.
• The system generates a hidden “noise bit” ${r}$ under a non-uniform distribution, such as ${r = 1}$ happening with probability ${0.30}$.
• The system shows you the bit ${b = x\cdot s + r}$.

After many button presses, one gets a system of linear equations, most of which are correct. However, a substantial fraction of them are incorrect, and this makes solving the linear system very difficult.

Arora and Ge modify the problem in a interesting manner. Each time you press the button you get, say, ten ${x}$‘s and ten corresponding ${b}$‘s. You are promised that at most 3 are incorrect. Note this is the same on average, but it is not exactly the same as the original problem.

The surprise is that they can solve this new problem, which they call structured noise. The problem reminds me some of list decoding, and it seems to be related to that in some way.

Each time the button is pressed we get ${m}$ values. The ${d}$ incorrect solutions collectively satisfy a certain noise polynomial ${P}$. For example, they may satisfy a certain polynomial ${P_3}$ that is zero on all binary ${m}$-tuples of Hamming weight at most ${3}$. Note that ${P_3}$ can have degree ${4}$. The one condition on their theorem is that the correct pattern (or some correct pattern) cannot be written as the GF(2) sum of two patterns on which ${P}$ is zero. This rules out arbitrary noise patterns.

In their model the original variables stand for bits ${s_i}$ of the secret vector ${s}$. The new variables after the linearization have the form ${s_J}$ for sets ${J}$ of size up to ${d+1}$. A solution to those variables is good provided ${s_{\{i\}} = s_i}$ for all ${i}$. In this model they give a randomized algorithm ${A}$ and prove:

Theorem: If there are ${m}$ values and ${n}$ variables, then ${A}$ runs in ${O(n^m)}$ time and finds a good solution with high probability.

Of course for constant ${m}$—which entails constant ${d}$—this yields a probabilistic polynomial-time algorithm. I really like this paper, not only for proving a nice result, but for the type of result that they get. We need to prove more results of the form: the following heuristic method can be proved to work in the case of ${\dots}$ One of the mysteries of complexity theory is that some heuristics work very well in practice, and we do not understand why. I think that their result helps unravel this.

Open Problems

Can we prove that linearization works on more problems? Can we prove that other heuristics also work?

Is there connection between the beautiful work of Arora-Ge and crypto-analysis? Does the structure in AES allow an attack in which a certain proportion of “hits” are bound to be correct, and the non-hits satisfy a noise polynomial of suitable low degree? Can ${{\cal T}}$ help ${{\cal P}}$?

December 13, 2010 9:02 am

This is a great post. Unfortunately, the richness and heterogeneity of modern cryptography have reduced communication between the P and T worlds. Your post makes a good case for getting the communities to talk more to each other.

> The problem reminds me some of list decoding, and it seems to be related to that in some way.

There is a close relationship between learning parity with noise and coding: in a nutshell, we can think of the box as giving us some kind of random access to a noisy version of the Hadamard encoding of the secret s. The Hadamard encoding (for blog readers who aren’t familiar with it) is a list of all the inner products of the form $mod 2$, where x ranges over n-bit vectors.

If you can *choose* which random vectors (think “positions of the codeword”) $x$ are used by the box, then you are in the “usual” model of local decoding. Because you can use the box to verify a guess for $s$, a list-decoding algorithm suffices. This is exactly what Goldreich and Levin provided for their hardcore predicates result, and what Kushilevitz and Mansour provided for their result on finding large Fourier coefficients of a boolean function. Their algorithms access the codeword in a random but carefully correlated set of locations.

What seems to make learning parity with noise (LPN) hard is that you get access to the codeword in a uniformly random set of positions. This breaks the usual decoding algorithms.

Arora and Ruan seem to get around this obstacle by assuming a hard bound on how many errors are present in a given small block of positions. This appears (?) to open the door for an “error locator” polynomial approach very similar to the Berlekamp-Welch decoder for Reed-Solomon and BCH codes (see, e.g., Madhu Sudan’s lecture notes), which also uses a linearization idea similar to what you describe in the post. I haven’t actually read the Arora-Ruan paper yet, though, so my superficial impression may be way off.

December 13, 2010 9:04 am

In my previous comment, the math got screwed up by the HTML parser. I meant to write: The Hadamard encoding of an n-bit string $s$ is a list of all the inner products of the form $(x . s) mod 2$, where x ranges over n-bit vectors.

December 13, 2010 10:51 am

Again I would like to mention Daniel A. Spielman and Shang-Hua Teng 2004 “Smoothed Analysis: Why The Simplex Algorithm Usually Takes Polynomial Time.” (http://www-math.mit.edu/~spielman/simplex/), which shows rigorously why the Simplex Algorithm for solving linear equations performs so excellent in practice although it has such bad worst case bounds in theory.

P.S. This comment did not seem to appear the first time I posted it, perhaps it looked too much like spam with so little text around the link.

December 13, 2010 11:13 am

All,

WordPress has a decent spam filter, but I sometimes get alot of real spam. I check ever single spammed email and try to correct any mistakes. I apologize if sometiems takes a bit of time to fix. Also I do sometimes make an error. Sorry for this.

• December 13, 2010 12:41 pm

I tried to break this spam filter called NoSpamNX but it seems to be pretty good:
http://rankexploits.com/musings/2010/diet-spam/

December 14, 2010 3:56 am

@RB
The real problem is that Dick Lipton has no control over his anti-spam nor over “implicit” features of WordPress because he relies on WordPress hosting not his own.
Less hassles but there is a price to pay (beyond the hosting fees 😉 ).
I would not deem acceptable that someone tweaks my blog input behind my back, other people have other choices…

BTW, I would not call the default WordPress anti-spam “decent”, rather lame and meddling.

December 13, 2010 10:52 am

Again I would like to mention Daniel A. Spielman and Shang-Hua Teng 2004 “Smoothed Analysis: Why The Simplex Algorithm Usually Takes Polynomial Time.”, which shows rigorously why the Simplex Algorithm for solving linear equations performs so excellent in practice although it has such bad worst case bounds in theory.

December 13, 2010 4:17 pm

Linearization is also at the heart of the list decoding algorithms for Reed-Solomon (RS) and related codes. There are of course other ideas involved but I think the original idea is linearizartion.

This is most apparent in the “base case,” i.e. in the case of unique decoding of RS codes. In particular, here is the Gemmell-Sudan interpretation of the Berlekamp-Welch algorithm. (Recall that RS codes of dimension $k$ and block length $n\le q$ is the evaluation of polynomials over $\mathbb{F}_q$ of degree at most $k-1$ over some fixed subset $\{\alpha_1,\dots,\alpha_n\}\subseteq \mathbb{F}_q$.)

Given a received word $(y_1,\dots,y_n)$, let $P(X)$ correspond to the (unique) closest codeword of degree at most $k-1$. Let $E(X)$ be the error locator polynomial, i.e. $E(\alpha_i)=0$ if and only if $P(\alpha_i)\neq y_i$. Now define $Q(X,Y)= (Y-P(X))\cdot E(X)$. Note that $Q(\alpha_i,y_i)=0$ for every $i\in [n]$. If we think of the coefficients of $P(X), Q(X)$ as variables in the above $n$ equations then we have quadratic equations. If we can solve these equations for these unknowns, then we can recover $P(X)$.

The Berlekamp-Welch algorithm then proceeds to think of $P(X)\cdot E(X)$ as another univariate polynomial $A(X)$ of appropriate degree. Now define $Q'(X,Y)= Y\cdot E(X) - A(X)$ and insist on $Q'(\alpha_i,y_i)=0$ for every $i\in [n]$. It so turns out that for unique decoding all the parameters work out: one can solve the $n$ linear equations above and compute $P(X)=A(X)/E(X)$.

The genius of Sudan of course was to generalize the idea above to list decoding RS codes (and linearization in the “interpolation” step still plays an important role)

December 13, 2010 11:26 pm

I agree with Atri and others. I cut my teeth on the PCP constructions and the linearization technique is embedded in my mind from that early exposure. I think in the simplest occurs even in our 1992 ALMSS paper, in the constant bit verifier (which is based upon the Hadamard code).

Thank you Dick for your kind words. By the way, our paper also has some results for Regev’s LWE problem which may be of interest to the T’s at least.

6. December 14, 2010 1:45 am

I think that T can help P.

Actually, this is only tangentially related, but I will shamelessly plug an article I wrote some while ago. The basic thesis of the article is that P, while important and composed of very skilled individuals, are erecting cryptosystems (specifically crypto primitives and symmetric ciphers) much like we’ve historically erected buildings (without proper stress/strain analysis). In other words, we get them to work well, but not provably well.

I make a suggested fix, which I wouldn’t defend very heavily right now but still think is interesting. In the response to a question I replied that elliptic curve crypto would work in a post-quantum world (which I later discovered is false – ECC can be solved by fast quantum hidden abelian subgroup finders similar to those used in Shor’s Algorithm). Overall I still agree strongly with the article that we should look to find hardness garuntees for cryptographic primitives. Anyway, the link to the article is here: http://procrast-nation.com/?p=3647

December 14, 2010 10:43 am

The relationship between T’s and P’s does not have to be restricted to cryptanalysis. They can (and should) also share ideas in the area of cryptographic design. What we seem to lack is a theoretically sound methodology for assessing security of practical constructs. From the T’s perspective, lack of concrete cryptanalysis by the P’s may not be sufficient to ascertain the security of a cryptographic construction. From the P’s perspective, the asymptotic analysis that is routinely applied by the T’s is considered to be too coarse to be meaningful (though it may sometimes be applicable for cryptanalysis, as demonstrated by this post).

A couple of days ago NIST announced the list of 5 finalists in the SHA-3 competition. If all goes as planned, one of these 5 functions will be eventually chosen to be the next standard for secure hashing, and is expected to be extensively used in practice. As pointed out by Jonathan Katz WordPress Codex, it seems like the choice of finalists came down more to issues of efficiency than issues of security, as there have not been any serious attacks on any of the 14 semi-finalists (to quote NIST’s announcement: “Security was our greatest concern, and we took this very seriously, but none of these candidates was clearly broken.”). This seems to put NIST in a position in which they may find it difficult to differentiate between candidates in terms of security (from NIST’s report: “in some cases [we] did not select algorithms … largely because something about them made us ‘nervous,’ even though we knew of no clear attack”.)

We will most lilely end up with a SHA-3 that will not be broken in its lifetime, but at the same time we may never know what it is that makes the chosen function more secure than the other candidates. Hopefully, the T’s and the P’s will eventually join forces and try to develop a meaningful framework that would bridge these two worlds.

8. December 15, 2010 10:45 am

Dick asks: Can we prove that linearization works on more problems?

This is another of Dick’s “Great Questions” of 21st century mathematics. They’re “great” in the sense of Neils Bohr: when we evert these questions, we obtain more great questions. 🙂

One natural eversion of Dick’s question is: Can we prove that linearization fools us on more problems?

Euclidean geometry is a classic example in which linearization fools us: by elevating locally linear geometric principles to globally linear geometric principles, we arrive at a misleading picture of classical geometry … misleading in that the induced notion of natural state-space manifolds is too uniform.

Quantum geometry too suffers from the excessive embrace linearization: by elevating locally linear quantum geometry (aka Kählerian state-spaces) to globally linear quantum geometry (aka Hilbert state-space) principles, we obtain a misleadingly simplified picture of quantum dynamics … misleading (again) in the sense that the induced notion of natural state-space manifolds is too uniform.

Thus, with regard to issues of mathematical linearity, naturality, and uniformity, a good strategy is encapsulated in Andy Groves’ maxim: “Let linearity reign, then rein-in linearity !” 🙂

As is well-known, Groves’ strategy of creative mixing is accompanied (necessarily) by discomfiting-yet-creative adjustments in shared notions of naturality and uniformity. Groves has written extensively on this topic in the context of science, technology, and engineering; see for example Groves’ highly interesting book (with Robert Burgelman) Strategy Is Destiny.

Moreover, Groves-style mixed strategies are characteristic not only of successful mathematical enterprises, but of successful STEM enterprises broadly … in every previous century, and (hopefully!) in our 21st century too.

9. December 16, 2010 4:34 am

Hi, I wish to correct a few mistakes in your post, which I believe are widespread in the cryptographic community.

1) there is a theory about polynomial system solving: Gröbner basis theory. In the case of a single solution, computing a Gröbner basis is equivalent to solving.

2) XL has been shown a redundant version of a Gröbner basis algorithm called F4 http://www.springerlink.com/content/h1ejp6nvvr4bd74c/

3) XSL was shown to never terminate as presented http://www.springerlink.com/content/j33291m21n83/#section=568442&page=1 Thus there is no ongoing debate whether XSL will break AES: it cannot.

4) We do have complexity results about Gröbner bases under the assumption that “semi-regular sequences” exist and that random systems behave like semi-regular sequences. http://www-calfor.lip6.fr/ICPSS/papers/43BF/43bf.htm It is unclear whether GB techniques or similar techniques can break AES although it does not look likely.

5) There is no indication that AES is indeed week because of the algebraic structure.

6) If one has roughly m = n^2 equations in n unknowns then a Gröbner basis algorithm essentially does linearisation.

• December 16, 2010 8:49 am

Martinralbrecht: We do have complexity results about Gröbner bases …

Martin, we looked into Gröbner bases as a computational tool for solving a canonical problem in quantum dynamics: given a general vector |ψ⟩ in a Hilbert space, either exhibit a (computationally compressed) matrix product state that approximates |ψ⟩ , or establish that no such compressed representation exists (the similarity to guessing a cryptographic key is clear).

In our hands (admittedly clumsy hands) Gröbner basis methods were too slow for practical high-dimension state compression … they performed much less well than geometric “downhill flow” search methods.

It seemed to us (maybe our impression was mistaken), that the performance of Gröbner basis methods was too uniform … that the performance of Gröbner basis methods was uniformly as efficient (approximately) as their worst-case performance. While in contrast, geometric methods were highly efficient (in our hands) on the state-fitting cases that arise naturally in practical quantum simulations.

In summary, the computational efficiency of geometric search methods was markedly non-uniform … but in a favorable way. This helped us appreciate that, in some cases, a too-rigid insistence on uniform performance might cause us to overlook algorithms that perform exceedingly well FAPP (“for all practical purposes”).

• December 16, 2010 9:27 am

Hi, I’m not saying that Gröbner bases are the best algorithm in all cases. If you found another approach that works well for you, that’s great! But GBs are clearly preferable over linearisation, if only for the fact that linearisation is just a special case. XL too is a Gröbner basis algorithm, but a redundant one.

Also, GB algorithms do not behave uniform at all. The worst case complexity is double exponential (!) but if you have about n^2 quadratic polynomials in n unknowns the complexity is polynomial (about n^6). Finally, the performance of Gröbner basis algorithms varies greatly depending on the implementation (and other parameters such as the term ordering, selection strategy etc.). Which implementation did you use?

• December 16, 2010 9:56 am

Martin, thank you for that fine reply. We used Mathematica;s GroebnerBasis[] function (version 5.0) … and I freely confess that we did not have a solid understanding of the internal workings of this function.

In early trials on low-dimension test-cases for quantum state-compression, the geometric flow methods outperformed the Gröbner basis methods pretty substantially, and so we didn’t pursue the Gröbner methods.

Your remarks on the non-uniform performance of Gröbner methods make me think that our choice may well have been premature … that perhaps we should reexamine Gröbner methods with a view toward tuning them for quantum simulation purposes.

It is my understanding that quantum chemists are vigorously pursing (what may amount to) important practical applications of this line of research … see (as a start) Beylkin, Mohlenkamp , and Perez (2007) Approximating a Wavefunction as an Unconstrained Sum of Slater Determinants.

10. December 16, 2010 10:21 am

> We used Mathematica;s GroebnerBasis[] function (version 5.0)

There’s your problem right there (there might be other problems as well though). I know of no-one who would consider Mathematica as a serious contender in the area of Gröbner basis computation. Your options are as far as I know: Maple’s FGb, Magma or Singular which is free (e.g. via Sage, full disclosure: I’m a Sage developer http://www.sagemath.org). For many problems Magma is the fastest, but not for all.

• December 17, 2010 4:27 am

Sorry for replying to myself, but thinking about this some more: I assume you are computing over the rational numbers (sorry, I don’t know anything about your field)? Over the rationals, what is probably killing you is coefficient growth during the computation even if the solution isn’t that big. I’m usually working with finite fields and thus don’t have that problem. Especially, if your systems have very few solutions (even unique) then you could try a multi-modular approach: compute modulo a few primes and reconstruct the result using the Chinese Remainder Theorem. Magma, I believe, does that by default. Singular just implemented this strategy. I have no clue about Maple.