Tale of two approaches to the matrix product problem

Shmuel “Sam” Winograd was a researcher and later a researcher-manager at IBM Yorktown. He is famous for his work with Don Coppersmith on matrix product, but made many other important seminal contributions to the early work on arithmetic complexity.

Today I wish to talk about what Sam did on matrix product before Volker Strassen’s famous work.

I have know Sam for many years, especially through my many visits to IBM Yorktown years ago. Sam was always fun to talk to, and had a habit that few of us have today: he smoked. Where many mathematicians and theorists are said to convert caffeine into theorems, I always thought that Sam made great theorems out of smoke. Sam was not alone: Rick Statman and Forbes Lewis, strong members of the theory community at the time, smoked too.

I avoided smoking I believe because my parents were such heavy smokers—it a generational jump phenomenon. I recall my dad, who eventually replaced smoking cigarettes with smoking a pipe, smoking his favorite pipe while taking his morning shower. One time he turned the wrong way and the relatively cool water hitting the very hot pipe split it almost in half. He still used that pipe for quite a while until it completely broke in half.

## Matrix Product: 2012

Just the other day Virginia Williams gave one of the best talks we have had at Tech in years on her recent work on improving the Coppersmith-Winograd algorithm for matrix product. Oh, if you have given a talk in the last few years at Tech, then your talk was also terrific.

What was so wonderful about her talk was that she, in less than an hour, gave a beautiful overview of the key ideas that have been used since Strassen’s initial breakthrough in 1969. His paper laid out implicitly an approach to matrix product that has been followed ever since: find a solution to a small problem, apply recursion, and mix as needed. Here is a chart, courtesy of Virginia, of some of the exponents ${\omega}$ over the years on matrix product.

Recall ${\omega}$ is an allowed exponent if it is possible to compute the product of two ${n \times n}$ matrices over any ring in ${n^{\omega + \epsilon}}$ arithmetic operations, where ${\epsilon>0}$ is as small as you like. The ${\epsilon}$ allows extra stuff to be avoided and swept under the “rug.”

She got the two questions that always arise in any discussion of matrix product:

${\bullet }$ Are These Algorithms Practical? She smiled at this question. Strassen’s algorithm is apparently used today after computers have increased their speeds dramatically from 1969. But all the later algorithms are galactic in the sense that we use here. That does not decrease their importance in my opinion. These are questions that are simple to state, concern a basic computational problem, and must be understood better, if we are to claim to understand computation. But they will not be run any time soon—they are galactic.

${\bullet }$ What Do You Think ${\omega}$ Really Is? She answered this with a hope that ${\omega}$ should be equal to two. She did not say that her methods, or any methods known, would yield this value. But she seems to think that such a value may be possible. Since the known lower bounds are still linear after over forty years, betting on a better upper bound seems right to me. Virginia has talked to Strassen and he told her that he believes that ${\omega}$ will be strictly larger than two. Who can argue against the creator of the whole field—not me.

I would like to make a final point about Virginia’s work that I think is quite important. She reduces the finding of an exponent for the matrix product problem to a highly non-linear optimization problem. She then runs various computations, some using C/C++ code she wrote, some using libraries that others wrote, and some using Maple. The computations are run at high precision—currently 40 decimal digits.

But the point is that there is no problem with using computers: the final result that she gets is not a computer “proof” that cannot be checked by humans. The computers find a small set of numbers that must satisfy certain constraints. Finding the numbers is really hard, both in the theory sense and in the practical sense. Huge amounts of computing are required. However, once the numbers are found, checking that they satisfy the constraints can be done “by hand.” Okay, I would not want to do that, but they can be checked in principle because the number of values and constraints are small. This is important since she then avoids the issues that surround, for example, the computer proof of the Four Color Problem, and other problems.

## Matrix Product: 1968

The year 1968 was far away from today, not just in years. Many terrible things were happening in the world, but one highlight was on February 1, Lisa Marie Presley, the American singer was born. I was in school—I refuse to say what grade—you can figure it out.

Sam was working on improving the cost of computing a matrix product and looked at the problem in a way that was fundamentally different from Strassen’s later approach. Sam searched for a better algorithm for the general ${n \times n}$ case. He made the interesting discovery that he could save about a factor of two in cost, but this only worked when the matrix entries belonged to a commutative ring. His result is quite clever, quite surprising, quite simple—once you see it—but it used commutativity explicitly .

This is the main issue. Since Sam’s method assumes commutativity it could never be used as a recursive method. Strassen on the other hand discovered a method for multiplying ${2 \times 2}$ matrices in seven multiplications, but assumed no commutativity. This allowed him to apply recursion and get an asymptotic bound that beat cubic.

Here is the idea behind Sam’s construction. He actually worked out how to speed up the computation of inner products of two vectors ${x}$ and ${y}$. He does this by forming products of the form:

$\displaystyle ( X + Y) \times (X' + Y'),$

where ${X,X'}$ are from the ${x}$ vector and ${Y,Y'}$ are from the ${y}$ vector. Note that he gets two products towards the inner product, ${XY'}$ and ${YX'}$, for the price of one—note the order of the latter. But he also gets extra “junk,” namely the self-terms, ${XX'}$ and ${YY'}$. The trick, if we can call it that, is to pre-compute the self-terms so they can be removed easily.

Let’s work out the full details of how he does this in the next section.

The paper is “A new algorithm for inner product,” IEEE Transactions on Computers (1968). Later Richard Brent in his 1970 Master’s thesis, “Algorithms For Matrix Multiplication,” commented on both Strassen’s and Winograd’s work. Here is Sam’s paper, it was just two pages, with no references:

And here is a “blow-up” of the key equations:

For those who cannot read these images, I have trouble, here are the equations, for ${n}$ even:

$\displaystyle \xi = \sum_{j=1}^{n/2} x_{2j-1} \cdot x_{2j}.$

$\displaystyle \eta = \sum_{j=1}^{n/2} y_{2j-1} \cdot y_{2j}.$

and

$\displaystyle (x,y) = \sum_{j=1}^{n/2} (x_{2j-1} + y_{2j}) \cdot (x_{2j}+y_{2j-1}) - \xi - \eta .$

Note what happens in the above calculation. The terms ${x_{k}\cdot y_{k}}$ for ${k}$ odd come from the left times the right, while the ${y_{k}\cdot x_{k}}$ for ${k}$ even come from the left times the right. The terms that are quadratic in ${x}$ or in ${y}$ are eliminated by the subtraction of the ${\xi,\eta}$ terms. Sam can put all this together to save about a factor of two. Pretty neat.

## Open Problems

Winograd’s idea is one of those great insights that did not come to full fruition. But I still wonder if there is some possible way to save it. Since 1969 all work on matrix product, including Sam’s, has been on improving Strassen. In particular all have tried to get better algorithms for matrix product that work in the non-commutative case. Perhaps we should time-travel back to 1968 and see if there is a direct way to attack the commutative case. Can we improve on Sam’s ideas? Can we use new technology that was unknown back in the late 1960′s to improve matrix product for the commutative case?

$\displaystyle \S$

Can we also use similar ideas to get better ${n^{O(1)}}$-time approximations to ${2^n}$-sized inner products, such as those that arise in quantum computation? Our coverage of quantum computation will resume shortly $\dots$

1. February 1, 2012 2:37 pm

“Our coverage of quantum computation will resume shortly…”
Oh, I see, A superbowl commercial. A good one though.

2. February 1, 2012 3:46 pm

Note that the rank (noncommutative complexity) of a bilinear mapping cannot be more than twice its quadratic (commutative) complexity, so that commutativity cannot help much asymptotically.

Keeping that in mind, there have actually been a few improvements to the commutative case: Waksman (“On Winograd’s algorithm for inner products”, IEEE Transactions on Computers 19(4), 360-361, 1970) gave an algorithm that is slightly faster than Winograd’s. Makarov (“An algorithm for multiplying 3×3 matrices”, USSR Computational Mathematics and Mathematical Physics 26, 1987, 179-180) found a way to multiply 3×3 matrices over a commutative rings using 22 scalar multiplications, while the best bound over a general ring is 23. Finally, Drevet et al. (“Optimization techniques for small matrix multiplication”, Theoretical Computer Science 412(22), 2219-2236, 2011) searched for formulae that yield good multiplication counts for small matrices by systematically combining known algorithms.

3. February 1, 2012 5:29 pm

thx god that i choosed economics not math

4. February 3, 2012 8:02 pm

Reblogged this on Between numbers and commented:
A very informative blog post about the different algorithms for computing a product of matrices.

April 6, 2012 2:36 pm

Hi,

I want to clarify something: for positive integers n,p,m, let K_n,p,m denote
the map for multiplying n x p by p x m matrices over some field K.

The rank R(n,p,m) is defined to be the minimal length r >= 1 of a bilinear algorithm A that computes K_n,p,m (as def. in Burgisser et.al.).

Must this algorithm A be noncommutative (the matrix entries need not come from
a commutative ring), or if there is a commutative algorithm A’ that computes K_n,p,m, of smallest length r’ < r, then is R(n,p,m) = r'? Secondly, are commutative bilinear algorithms related to quadratic algorithms / multiplicative complexity.

Thanks for any help.