Rejoining many who have forgotten or overlooked results

 St. Andrews Mac Tutor biographies source

Henry Smith was a mathematician of the 19th century who worked mainly in number theory. He especially did important work on the representation of numbers by various quadratic forms. We have discussed how even in something seemingly settled, like Joseph Lagrange’s theorem that every natural number is representable as a sum of four squares, new questions are always around—especially when one considers complexity.

Today Ken and I want to discuss a private slip of forgetfulness, and how often others may have done the same.

For a variety of reasons we recently were thinking about one of the most basic questions in linear algebra: the solvability of

$\displaystyle Ax = b,$

where ${A}$ and ${b}$ are fixed and ${x}$ is to be determined. Over a field there is a polynomial time algorithm that determines whether there is a solution and finds one if there is. The algorithm is, of course, Gaussian elimination. It is named after the famous Carl Gauss, although it was cited already by Chinese mathematicians in 179 AD. It was rediscovered by Isaac Newton, called “the most natural way [but…]” by Leonhard Euler six years before Gauss was born, and molded by many of their contemporaries including Lagrange.

What Gauss did was streamline the notation for use by professional human computers—as he wrote in Latin, per eliminationem vulgarem. A recent historical article by Joseph Grcar has a table of the whole history, listing Camille Jordan the first to say “Gaussian elimination” in German in 1895, and Alan Turing the first to juxtapose Gauss’s name and “elimination” in English—in 1948—with non-human computers in mind.

Neither this nor a second article by Grcar mentions Smith, however.

## A Quiz

We are always interested in the history of results—that is a lot of what we are about. But we mainly focus on the actual theorems. So let’s look at

$\displaystyle Ax = b$

in some detail. Before we start, we’ve noticed a strange shift over the years. In old papers the equation is written

$\displaystyle AX = B.$

In more modern papers only a matrix is in capitals and vectors are lowercase. Just an observation. It appears that in the past they were actually using correct notation since they were interested in the more general case where ${x}$ could be a matrix not just a vector. So to be fair it is not a notational shift, but a shift in interest to just the case where the unknown is a vector.

Back to ${Ax=b}$ where ${A}$ and ${b}$ are integer-valued. Consider the following questions:

1. Is there an algorithm for finding ${x}$ in polynomial time over the rationals?

2. Does it really work in polynomial time?

3. Really?

4. What happens if we restrict ${x}$ to be an integer vector?

If we ask people these questions they will answer the first three with: Gaussian Elimination, yes, and yes. The fourth question provoked several puzzled looks from both students and colleagues. They were not sure. Diophantine in an unbounded number of variables but linear, hmmm…

The embarrassment is how we were confused between the first three questions and the last. It was only for a short while, but it is an example of the forgetting of old results. So let’s take a look at these questions.

## Bits of Trouble

We all know that Gaussian elimination (GE) takes only ${O(n^{3})}$ arithmetic operations to find ${x}$ in ${Ax=b}$, provided a solution exists. So what is the problem? The issue is that during execution, the intermediate results are not obviously bounded sufficiently. As noted in this StackExchange thread, a simple lower-triangular ${n \times n}$ matrix with 2’s on the diagonal and all 1’s underneath blows up to size ${2^{n-1}}$ bits when the straightforward algorithm is applied.

To run in time ${n^{O(1)}}$, an algorithm must not only stay within ${\mathsf{poly}(n)}$ arithmetic operations, those operations must also operate only with values that have ${\mathsf{poly}(n)}$ bits. This is not obvious for GE. How can we fix the claim that the problem is in ${\mathsf{P}}$, polynomial time?

Already in 1968, Erwin Bareiss proved a polynomial-time algorithm. He showed that if all initial values are at most ${L}$ bits then the complexity is at most

$\displaystyle O(n^{5}L^{2} (\log(n)^{2} + L^{2})).$

There is a simple way to see how to prove this via the Chinese Remainder Theorem (CRT)—my favorite theorem. The general strategy is by Cramer’s rule, which requires evaluating polynomially-many determinants involving the entries of ${A}$ and ${b}$, which are integers. It is easy to bound the size of each determinant, and by applying CRT we can get each of them exactly. We need only select distinct primes ${p_{1},\dots,p_{m}}$ so that we can run GE to get each determinant modulo ${p_{i}}$. This can be made to work and yields a true polynomial time algorithm.

## Smith Normal Form

Let’s now consider the case ${Ax=b}$ where we want ${x}$ to an integer vector. Note if ${x}$ is unique then we can use GE to get the rational answer and then see if it is integral. But if the system is under-determined then we are not able to do something so simple.

Enter Smith. One of his major results was to show that an integer matrix could be put into a certain form that allowed the solution of such equations. The theorem holds over any principal ideal domain (PID), which means it holds not only for the integers but also the ring of polynomials in one variable over the integers or any field.

Theorem 1 For every ${m \times n}$ matrix ${A}$ of determinantal rank ${r}$ over a PID, there are square matrices ${S,T}$ making ${D = SAT}$ into a diagonal matrix with nonzero elements ${d_{1,1},\dots,d_{r,r}}$ such that for all ${i}$, ${d_{i-1,i-1}}$ divides ${d_{i,i}}$. The matrices ${S,T}$ are invertible in the PID, and ${D}$, called the Smith normal form of ${A}$, is unique up to multiplication by units in the PID.

The proof works by iteration. The PID comes equipped with a well-founded measure of “degree” which is just absolute value for the integers. Permute rows and columns to put an element of minimum degree in the upper-left corner as ${a_{1,1}}$. If there is some ${a_{1,j}}$ in the top row that it doesn’t divide, then we have ${a_{1,j} = \beta a_{1,1} + r}$ with ${r}$ of lower degree than ${a_{1,1}}$. Hence subtracting off ${\beta}$ times the first column and swapping columns ${1}$ and ${j}$ makes progress. The same is done with rows until we get all multiples of ${a_{1,1}}$ in the first column as well, whereupon we can zero out the rest of the first row and column.

We repeat for ${a_{2,2}}$ and so forth until all non-diagonal entries are zero. If ${a_{1,1}}$ then does not divide some ${a_{j,j}}$, we add row ${j}$ to row 1, reduce and swap columns 1 and ${j}$ as before, and ultimately get ${a_{1,1}}$ dividing all of ${a_{j,1},a_{1,j},a_{j,j}}$. Zeroing out the two off-diagonal elements and repeating leaves a diagonal matrix in which ${a_{1,1}}$ does divide the other elements. This property is left undisturbed by subsequent operations on the other elements—since no division is ever done—and so the process completes.

This proof gives an algorithm, but is it polynomial? During the formation of the first diagonal matrix intermediate values in higher rows and columns can grow. However, the matrices ${S}$ and ${T}$ are products of elementary row and column operations that—for reasons similar to how Euclid’s gcd algorithm can be done with iterated subtraction—retain inverses over the PID.

## What I Forgot

Since I have some friends of whom I can ask quick questions, I asked one and quickly received a brief answer. It did little more than second my puzzlement, but it spurred me to recall the magic words “Smith normal form” which I put in a followup. Let’s see how this solves the problem of whether ${Ax = b}$ has a solution ${x}$ in integers. We have ${SAT = D}$ where ${T}$ in particular has an integer inverse. Thus ${SAx = Sb}$ and so ${DT^{-1}x = Sb}$. Now ${y = T^{-1}x}$ and ${c = Sb}$ are integer vectors. Letting ${y'}$ and ${c'}$ stand for their top ${r}$-many elements, and ${D'}$ for the top-left ${r \times r}$ part of ${D}$, we must have

$\displaystyle D'y' = c'.$

This is solvable for ${y'}$ in integers if and only if ${c(i)}$ is an integer multiple of ${d_{i,j}}$, for ${1 \leq i \leq r}$. Since ${D'}$ is invertible, the solution is unique in ${y'}$, and the remainder of ${y}$ is immaterial. Indeed the method extends to having matrices ${X}$ and ${B}$ in place of the vectors ${x}$ and ${b}$—those old papers had a point.

Ken was copied on the e-mails, and before using his brain to try to work the above out, he tried a Google search. Its top hit was a 2009 post on this blog.

This post featured Ravi Kannan who has been a friend for a long time. We were colleagues at Berkeley and have stayed close ever since. The post was on our collaboration to find a truly polynomial-time algorithm for a different problem about matrices: given rational ${A,x,y}$, does there exist ${k}$ such that ${A^k x = y}$? But the post of course hit the search terms about Smith normal form and polynomial time, and Ken noted laconically in his reply:

It was Ravi Kannan’s PhD topic.

Indeed, my old post tells the story of how Ravi spoke on this thesis as a postdoc at Berkeley, and upon being challenged in the talk on weren’t polynomial algorithms for Smith already known, had sought refuge in my office. Oh well. His 1979 journal paper took care to cite authorities in both the abstract and introduction:

As — pointed out, none of these algorithms is known to be polynomial. In transforming an integer matrix into Smith or Hermite normal form using known techniques, the number of digits of intermediate numbers does not appear to be bounded by a polynomial in the length of the input data as was pointed out by — and —…

Thus the answer to my question has been yes since 1979, and I knew it as recently as 2009. Or if you take the ${AX = B}$ matrix case as primary, maybe I didn’t know it—the paper linked above for that case dates only to 2009. It helps that there are blogs and resources like StackExchange and MathOverflow to look things up by keyword—and it seems to matter less whether you’ve actually authored said materials. At least in regard to Smith I had company.

Charles Hermite—he of the other normal form analyzed by Ravi—was also an expert on quadratic forms and was an influential member of France’s Academy of Sciences. The Academy periodically staged a Grand Prix competition in mathematics and the sciences, and in line with our recent discussion of the Breakthrough Prizes, it was awarded for work in the future.

Or at least it was supposed to be. One day in 1882 while reading the Comptes Rendus of the Academy, Smith came across the announcement for the next competition, and found it rather familiar. It was about the number and manner of ways of decomposing an integer into a sum of 5 squares. Of course Lagrange had shown how to do 4, but the question was how much more freedom came from 5. Well Smith had answered the question for all ${k \geq 5}$ in a paper published in 1867 in the proceedings of the British Royal Society.

He wrote to Hermite calling attention to his precedent. He was assured in reply that the prize committee had not been aware of this and his other related papers. To spare embarrassment and preserve the spirit of a competition already underway, Hermite requested Smith to play along and submit his answer under the competition’s blind-review process. It seems also that Hermite kept this news from the committee, both the precedent and the submission.

Smith agreed and got his memoir in before the deadline, but unfortunately passed away before he was announced as a joint winner. Judging from Wikipedia’s telling of the story, it appears Smith played along so graciously that his reply did not mention or otherwise cause the committee to recall his published papers. When the situation was finally realized and Hermite let on about Smith’s communication to him, Hermite was asked why he had not brought it to the committee’s notice. His reply boiled down to:

I forgot.

What’s not forgotten is how the prize launched the career of the other winner, the 18-year-old Hermann Minkowski.

## Open Problems

Did you know that solving for integer vectors is in polynomial time? I did, but somehow forgot. Oh well.

[word changes for Turing and Kannan]

January 15, 2015 2:44 am

Reblogged this on Pink Iguana.

2. January 15, 2015 2:23 pm

Reblogged this on Kelly J. Rose and commented:
This is a wonderful discussion on some pretty straightforward math.

January 16, 2015 9:14 am

I’ve often seen the discovery of a polynomial-time algorithm for Gaussian elimination attributed to Edmonds (1967), such as by Schrijver in Theory of Linear and Integer Programming. How does this relate to the Bareiss (1968) result?

4. January 16, 2015 11:08 am

the field of math (and science in general) is so large and so specialized that results are forgotten and rediscovered in different ways, and its interesting that even in the 1800s this was the case also, whereas the body of math is now several times larger. there are very few sociologists studying this type of cultural aspect. it is estimated sometimes that there may be more scientists or mathematicians alive now than ever lived in the entire history of the human race…..

5. January 18, 2015 12:58 pm

“a simple lower-triangular $n \times n$ matrix with 2’s on the diagonal and all 1’s underneath blows up to size $2^{n-1}$ bits when the straightforward algorithm is applied”

I don’t think that this example actually generates the claimed exponential blow-up. There is a slightly more complicated example in an ISSAC 97 paper “On the worst case of Gaussian elimination”, by Fang and Havas.

January 18, 2015 5:12 pm

Consider the problem: Given A ad b, where A is an n by n matrix with positive integers and b is a n by 1 vector of positive integers. Find, if it exists, a n by 1 vector x of positive integers such that Ax = b.

subset sum is a special case of this problem where A has n identical rows of weights w1, w2, …, wn and b is c (1, 1, …, 1)^T.

So clearly it seems your problem (4) is NP-hard.

7. January 19, 2015 1:41 pm

For matrices over a finite field there is a nice connection between the complexity of matrix reduction and the diameter of Cayley graphs for GL(n,q), when the complexity is measured in terms of the number of row operations.
In this paper I and two of my coauthors presented an algorithm which is faster than Gaussian elimination for some finite fields.
http://www.sciencedirect.com/science/article/pii/S0196885807000711
(There is also a freely available preprint version)

More recently this algorithm was shown to be essentially optimal by Demetres Christofides
https://www.mittag-leffler.se/preprints/files/IML-1314s-12.pdf

8. February 2, 2015 12:28 am

Reblogged this on Pathological Handwaving and commented:
AX=b vs Ax=b
This explains so much.