Galactic Sorting Networks
An improved sorting network to appear at STOC 2014
Michael Goodrich is a Chancellor’s Professor at the University of California at Irvine, and recently served a year as Chair of the UCI Computer Science Department. He has designed algorithms that cover all of the area spanned by axes of cleverness and high performance. This includes attention to algorithms that we have called (clever but) galactic for having enormous constant factors and/or high exponents that forestall practical implementation. An example is his paper last year with his colleague Pawel Pszona, “Cole’s Parametric Search Technique Made Practical,” which greatly tunes up an algorithm-improvement technique of Richard Cole. Now he has a new paper taking a notorious behemoth head-on.
Today Ken and I wish to talk about Michael’s recent work improving on the size of the famous AKS sorting network.
The initials AKS have become legion twice over in complexity. They are engraved on Wikipedia for the deterministic polynomial-time primality test of Manindra Agrawal, Neeraj Kayal, and Nitin Saxena proved in 2002. Two decades earlier, however, they designated the sorting network devised by Miklós Ajtai, János Komlós, and Endre Szemerédi. For sorting n elements these networks have asymptotically optimal size O(n log n) and depth O(log n), but as noted by Wikipedia,
…[they have] little or no practical application because of the large linear constants hidden by the Big-O notation.
Michael begs to differ, and will explain during his talk at the STOC conference being organized May 31–June 3 by Columbia University, with workshops there and the main conference on 34th Street. His paper is one of the few that is single-authored, one of the few that is on sorting, and is titled “Zig-zag Sort: A Simple Deterministic Data-Oblivious Sorting Algorithm Running in O(n log n) Time.”
A sorting network is one that sorts in an oblivious manner, meaning that the comparisons done at each step are the same for all inputs. Such a network is usually drawn as a diagram in which the inputs enter on horizontal wires from the left and each comparison is a vertical line joining two wires; the vertical line swaps the two values if needed, with the larger value moving up and the smaller going down.
Usually “oblivious” is not a great thing to be. But for sorting algorithms it is quite a useful property: it makes the algorithm simpler to describe, easier to implement in hardware and even in software, and useful in many security applications. The latter follows since the operations at each step are the same regardless of the input; thus, someone who watches which positions are compared, but not the values, can learn nothing from the algorithm’s execution. Of course there is an interesting cost to being oblivious: the algorithm runs in the same time even if the original input is already sorted. I find the latter an interesting issue, since in practice the input may be in sorted order, or at least close to it. But that is another question.
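To make “oblivious” concrete, here is a tiny illustration in Python (ours, not from the paper): a fixed comparator network whose list of compared positions is chosen once and never depends on the data.

```python
from itertools import permutations

# A comparator network is just a fixed list of position pairs.
# This 5-comparator network sorts any 4 inputs; the pairs never
# depend on the data, which is exactly the oblivious property.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def apply_network(values, network):
    """Apply each comparator in order, putting the smaller value at the earlier position."""
    a = list(values)
    for i, j in network:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a

# The same five comparisons, in the same order, sort every 4-element input.
assert all(apply_network(p, NETWORK_4) == sorted(p) for p in permutations(range(4)))
```

Whatever the input, only the outcomes of the swaps differ; an observer who sees which positions are compared learns nothing about the data.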
The best sorting network for many years was due to Kenneth Batcher and has size O(n log² n) and depth O(log² n), where n is the number of items to be sorted. The constant hidden behind the O-notation is quite small. AKS brilliantly solved the longstanding open question of whether one needs only O(n log n) comparisons, getting O(log n) depth to boot. But in 1998, Don Knuth noted that:
Batcher’s method is much better, unless n exceeds the total memory capacity of all computers on earth!
As we say here at GLL, the AKS sorting network is galactic: n must be astronomically large before the AKS network finally becomes smaller than Batcher’s network on n items. Hence Knuth’s point—see here for some details. There is also a whole book on sorting networks by Sherenaz Al-Haj Baddar and Batcher.
I must add that before 1983 the search for a better method than Batcher’s was a “hot” research problem. I worked on it, trying to use random networks, but could not make any progress. Even in 1971, David Van Voorhis could beat Batcher by one comparison for a particular small number of items, and he parlayed this and other ideas into a method that improved Batcher’s bound by a lower-order additive term. See this for the details.
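For concreteness, here is one standard way to generate the comparators of Batcher’s odd-even merge sort, sketched in Python under the assumption that the number of items is a power of two; the recursion makes the O(n log² n) size visible.

```python
def oddeven_merge(lo, hi, r):
    """Comparators that merge the stride-r subsequences of positions lo..hi (hi inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)      # merge the "even" subsequence
        yield from oddeven_merge(lo + r, hi, step)  # merge the "odd" subsequence
        for i in range(lo + r, hi - r, step):       # clean-up comparisons
            yield (i, i + r)
    else:
        yield (lo, lo + r)

def oddeven_merge_sort(lo, hi):
    """Comparators of Batcher's network for positions lo..hi; the length must be a power of two."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)       # sort the first half
        yield from oddeven_merge_sort(mid + 1, hi)   # sort the second half
        yield from oddeven_merge(lo, hi, 1)          # merge the two halves
```

For example, list(oddeven_merge_sort(0, 7)) yields the 19 comparators of the 8-input network, and the count grows proportionally to n log² n for larger powers of two.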
The Amazing 0-1 Law
One of the coolest results in sorting theory is the 0-1 Law. This principle shows that a sorting network is valid if it can sort all sequences of 0s and 1s. This not only drastically cuts down on the number of tests needed to determine the validity of a network, it is of great use in creating many constructions of sorting networks as well. The principle is found in Knuth’s book on sorting and is attributed by him to Willard Bouricius. Hoary it may be, but Michael uses it in several ways, most notably to simplify measures of distances of sub-arrays from being sorted.
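In code the principle is easy to exploit: to certify a candidate network on n wires one can test the 2^n bit-vectors instead of all n! orderings of distinct keys. A minimal checker (our sketch) looks like this:

```python
from itertools import product

def apply_network(values, network):
    """Apply each comparator in order, putting the smaller value at the earlier position."""
    a = list(values)
    for i, j in network:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a

def is_sorting_network(n, network):
    """By the 0-1 principle, a network sorts everything iff it sorts every 0-1 vector."""
    return all(apply_network(v, network) == sorted(v) for v in product((0, 1), repeat=n))

# The 5-comparator network on 4 wires passes; dropping its last comparator fails.
net4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
assert is_sorting_network(4, net4)
assert not is_sorting_network(4, net4[:-1])
```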
It is interesting to note that the 0-1 law has been generalized to networks that sort most vectors, rather than all of them. This is contained in a 2005 paper by Sanguthevar Rajasekaran and Sandeep Sen. I thought also about this years ago, but never got a result that seemed interesting. Theirs does.
Again the 0-1 law is one of those few results that we have in theory that say that an algorithm works for all inputs from a large universe provided it works correctly for a small subset of the universe. It would be interesting to see if such theorems could be proved for other classes of algorithms, even if we have to restrict the type of algorithm greatly. Such theorems could be extremely useful, but they seem to be hard to find.
Michael’s new algorithm can be stated in just a few lines of pseudocode, given in full in the paper.
There are some comments that I want to make, besides the usual invitation to see the paper for the details. One is that the code is quite short; the main issue, though, is proving that the code works—we sketch some details in a moment. The fact that the proof of correctness is much harder than the algorithm is a recurrent theme in modern algorithms. The classic quicksort algorithm is even shorter:
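For instance, one common compact rendering in Python (ours, and far from the only way to write it) is:

```python
def quicksort(a):
    # Not oblivious: which comparisons happen depends entirely on the data.
    if len(a) <= 1:
        return a
    pivot = a[0]
    return (quicksort([x for x in a[1:] if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort([x for x in a[1:] if x > pivot]))
```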
This phenomenon suggests to me a possible conjecture:
Could there be algorithms for SAT or other hard problems, factoring for example, that are very short as programs but have long, complex proofs of correctness?
A related conjecture is: can we somehow prove that there is no very short program for SAT that runs fast? I think one of my first statements on GLL was that we could not rule out a polynomial time algorithm for SAT that fits on a single page. Is there any way to make this into a serious conjecture? How would we formalize it?
The last comment is that Michael does have to sacrifice the O(log n) depth attained by AKS. In fact, so far he does not even get polylogarithmic depth—he says in a footnote that the parallel running time is much larger than that.
The name “zig-zag” comes from places where the algorithm does a shellsort-type percolation going up rather than down, and more simply, flips upside down some sequences of elements that were nearly sorted from a previous step. This seems “counter-intuitive” as he says, but such steps are allowed also in Batcher’s bitonic sort, since flipping a bitonic sequence upside-down still leaves it bitonic. Such flips can ultimately be eliminated in Batcher’s network, as noted here, whereas they seem more integral to Michael’s networks.
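To see why such flips are harmless in the bitonic setting, here is a small Python sketch (ours) of Batcher’s bitonic sort: one half is deliberately sorted in the reverse direction, so that the concatenation is bitonic, which is exactly what the merge step needs.

```python
def bitonic_sort(a, ascending=True):
    """Sort a list whose length is a power of two, using a fixed pattern of compare-exchanges."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    # Sort one half up and the other half down (the "flip"); the
    # concatenation is then bitonic, which is all the merge needs.
    return bitonic_merge(bitonic_sort(a[:half], True) + bitonic_sort(a[half:], False), ascending)

def bitonic_merge(a, ascending):
    """Merge a bitonic sequence into sorted order."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    a = list(a)
    for i in range(half):
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    # Each half is again bitonic, so the same step applies recursively.
    return bitonic_merge(a[:half], ascending) + bitonic_merge(a[half:], ascending)

assert bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]) == [1, 2, 3, 4, 5, 6, 7, 8]
```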
The key step needs only to be a “black box” that meets the following three partial-progress conditions on a pair of n-element arrays A and B, where A is supposed to end up holding the smaller elements and B the larger ones, for suitably chosen values of parameters, call them ε, β, δ. The first two were previously known, but the third one is new, and really helps “kick” the partial progress into completion:
- For any k ≤ n, at most εk of the k largest elements in A ∪ B wind up in A, and at most εk of the k smallest elements in A ∪ B end up in B.
- For any k ≤ δn, the above holds with ε replaced by the somewhat smaller value β.
- For any k ≤ n, and at any stage, whenever p is the number of the k largest elements of A ∪ B that are still in A, at most βp of those elements remain in A after the step, and whenever q is the count of elements of B that are among the k smallest, at most βq of them remain in B afterward.

In the third clause, it is not mandated that these elements be a subset of the ones originally misplaced—the analysis only cares about how the count of still-misplaced elements is “attenuated” by each application of the step. The inductive description of the algorithm building on this progress is very pretty.
There remains the task of coming up with gadgets achieving the needed parameter values for various sizes. Here is where the dreaded fellow-traveler of galacticity, namely non-constructivity, enters. The smallest gadget sizes are known only via non-constructive probabilistic/counting arguments. What the paper shows is that the “penalty” for making the gadgets constructive is appreciably smaller for his algorithm than for AKS, resulting in relatively great savings. For non-constructive versions his algorithm is “competitive” while still being simpler.
Is there a really practical sorting network that is asymptotically optimal? Can it have O(log² n) or even O(log n) depth?