An improved sorting network to appear at STOC 2014
Teaching page source
Michael Goodrich is a Chancellor’s Professor at the University of California at Irvine, and recently served a year as Chair of the UCI Computer Science Department. He has designed algorithms that cover all of the area spanned by axes of cleverness and high performance. This includes attention to algorithms that we have called (clever but) galactic for having enormous constant factors and/or high exponents that forestall practical implementation. An example is his paper last year with his colleague Pawel Pszona, “Cole’s Parametric Search Technique Made Practical,” which greatly tunes up an algorithm-improvement technique of Richard Cole. Now he has a new paper taking a notorious behemoth head-on.
Today Ken and I wish to talk about Michael’s recent work improving on the size of the famous AKS sorting network.
The initials AKS have become legion twice over in complexity. They are engraved on Wikipedia for the deterministic polynomial-time primality test of Manindra Agrawal, Neeraj Kayal, and Nitin Saxena proved in 2002. Two decades earlier, however, they designated the sorting network devised by Miklós Ajtai, János Komlós, and Endre Szemerédi. For sorting elements these networks have asymptotically optimal size and depth, but as noted by Wikipedia,
…[they have] little or no practical application because of the large linear constants hidden by the Big-O notation.
Michael begs to differ, and will explain during his talk at the STOC conference being organized May 31—June 3 by Columbia University, with workshops there and the main conference on 34th Street. His paper is one of the few that is single authored, one of the few that is on sorting, and is titled “Zig-zag Sort: A Simple Deterministic Data-Oblivious Sorting Algorithm Running in Time.”
A sorting network is one that sorts in an oblivious manner, meaning that the comparisons done at each step are the same for all inputs. The following diagram is used to represent a small such network: the comparisons are the vertical lines, the inputs enter on the left, and each vertical line forms a swap with the larger value moving up and the lower going down.
Usually “oblivious” is not a great thing to be. But for sorting algorithms it is quite a useful property: it makes the algorithm simpler to describe, it makes it easier to implement in hardware and even software, and makes it useful in many security applications. The latter follows since the operations at each step are the same independent of the input; thus, someone watching the comparison operations, not their values, can learn nothing from the algorithm’s execution. Of course there is an interesting cost to being oblivious: the algorithm runs in the same time even if the original input is already sorted. I find the latter an interesting issue, since in practice the input may be in sorted order or at least close to this. But that is another question.
The best sorting network for many years was due to Kenneth Batcher and has size and depth , where is the number of items to be sorted. The constant hidden behind the notation is quite small. AKS brilliantly solved the longstanding open question of whether one needs only comparisons, getting depth to boot. But in 1998, Don Knuth noted that:
Batcher’s method is much better, unless exceeds the total memory capacity of all computers on earth!
As we say here at GLL, the AKS sorting network is galactic: it needs that be larger than or so to finally be smaller than Batcher’s network for items. Hence Knuth’s point—see here for some details. Also there is a whole book on sorting networks by Sherenaz Al-Haj Baddar and Batcher.
I must add that before 1983 the search for a better method than Batcher was a “hot” research problem. I worked on it trying to use random networks, but could not make any progress. Even in 1971, David Van Voorhis could beat Batcher when by one comparison, and he parlayed this and other ideas into a method that improved Batcher by an additive term of form for a positive . See this for the details.
The Amazing 0-1 Law
One of the coolest results in sorting theory is the 0-1 Law. This principle shows that a sorting network is valid if it can sort all sequences of s and s. This not only drastically cuts down on the number of tests needed to determine the validity of a network, it is of great use in creating many constructions of sorting networks as well. The principle is found in Knuth’s book on sorting and is attributed to Willard Bouricius by him. Hoary it may be, but Michael uses it in several ways, most notably to simplify measures of distances of sub-arrays from being sorted.
It is interesting to note that the 0-1 law has been generalized to networks that sort most vectors, rather than all of them. This is contained in a 2005 paper by Sanguthevar Rajasekaran and Sandeep Sen. I thought also about this years ago, but never got a result that seemed interesting. Theirs does.
Again the 0-1 law is one of those few results that we have in theory that say that an algorithm works for all inputs from a large universe provided it works correctly for a small subset of the universe. It would be interesting to see if such theorems could be proved for other classes of algorithms, even if we have to restrict the type of algorithm greatly. Such theorems could be extremely useful, but they seem to be hard to find.
Here is Michael’s new algorithm in pseudocode:
There are some comments that I want to make, besides the usual invitation to see the paper for the details. One is that the code is quite short, and that while the code is short, the main issue is proving that the code works—we sketch some details in a moment. The fact that the proof of correctness is much harder than the algorithm is a recurrent theme in modern algorithms. The classic quicksort algorithm is even shorter:
This phenomenon suggests to me a possible conjecture:
Could there be very short algorithms for SAT or other hard problems, factoring for example, that are short as programs, but have long complex proofs?
A related conjecture is: can we somehow prove that there is no very short program for SAT that runs fast? I think one of my first statements on GLL was that we could not rule out a polynomial time algorithm for SAT that fits on a single page. Is there any way to make this into a serious conjecture? How would we formalize it?
The last comment is that Michael does have to sacrifice the depth attained by AKS. In fact, so far he does not even get depth—he says in a footnote that he gets only parallel time.
The name “zig-zag” comes from places where the algorithm does a shellsort-type percolation going up rather than down, and more simply, flips upside down some sequences of elements that were nearly sorted from a previous step. This seems “counter-intuitive” as he says, but such steps are allowed also in Batcher’s bitonic sort, since flipping a bitonic sequence upside-down still leaves it bitonic. Such flips can ultimately be eliminated in Batcher’s network, as noted here, whereas they seem more integral to Michael’s networks.
The step needs only to be a “black box” that meets the following three partial-progress conditions on the -element arrays and , for suitably chosen values of the parameters . The first two were previously known, but the third one is new, and really helps “kick” the partial progress into completion:
- For any , at most of the largest elements in wind up in , and at most of the smallest elements in end up in .
- For any , the above holds with replaced by the somewhat smaller value .
- For any , and any , whenever is the number of the largest elements in that are still in , at most of those elements remain in after the step, and whenever is the count of elements in that are among the smallest in , at most of the smallest elements remain in afterward.
In the third clause, it is not mandated that the elements be a subset of the original —the analysis only cares about how the trace number of still-misplaced elements is “attenuated” by each application of . The inductive description of the algorithm building on this progress is very pretty.
There remains the task of coming up with gadgets for the needed choices and for various sizes . Here is where the dreaded fellow-traveler of galacticity, namely non-constructivity, enters in. The smallest gadget sizes are known only via non-constructive probabilistic/counting arguments. What the paper shows is that the “penalty” for making the gadgets constructive is appreciably smaller for his algorithm than for AKS, resulting in a relatively great savings. For non-constructive versions his algorithm is “competitive” while still being simpler.
Is there a really practical sorting network that is asymptotically optimal? Can it have or even depth?
Why they can be important
Joe Bloggs is not a computer scientist, at least not one that we know. As noted by Wikipedia, Joe Bloggs is a “placeholder name” like “John Doe” or “Jane Roe” in the US, or “Ashok Kumar” in India. Sometimes real people have or acquire the placeholder names: Ashok Kumar (born Kumudlal Ganguly) was a famous actor in India, whose acting brothers also took the surname “Kumar,” and the “Roe” in Roe v. Wade is Norma McCorvey. We especially like the fact that “Joe Bloggs” came about long before there were blogs. A public restraining order is indeed called an “Ashok Kumar order” in India the way it is called a “John Doe order” in the US.
Today Ken and I consider whether there should be a “John Doe order” against submitting claimed proofs of or , and instead point out some reasons that we should look at such attempts by Ashok, Jane, or Joe.
Ideas from algebraic geometry and arithmetic complexity
Hyman Bass is a professor of both mathematics and mathematics education at the University of Michigan, after a long and storied career at Columbia University. He was one of the first generation of mathematicians to investigate K-theory, and gave what is now the recognized definition of the first generation of K-groups, that is for a ring . He co-founded with Jean-Pierre Serre a theory of graphs of groups, that is mappings associating a group to each vertex and edge of a graph. He is a past President of the American Mathematical Society—and exemplifies the idea that the AMS promotes mathematics at all levels: since 1996 he has worked on on elementary school math education with his Michigan colleague Deborah Ball.
Can chess statistics help design multiple-choice exams?
UT Dallas source
Bruce Pandolfini is one of very few living people who have been played by an Oscar-winning actor. He was the real-life chess teacher of junior chess player Joshua Waitzkin, who went on to become the world champion—in T’ai Chi. Their story is told in the movie “Searching For Bobby Fischer” with Sir Ben Kingsley, which is still iconic after 20-plus years. Pandolfini is still doing what he loves as an active chess teacher in New York City. For much of this time he has also written a popular feature called “Solitaire Chess” for Chess Life magazine, which is published by the United States Chess Federation.
Today Dick and I wish to compare styles of multiple-choice exams, with reference to “Solitaire Chess,” and have some fun as well.
Cases where we can count objects in polynomial time
New Interim Dean source—our congrats
Mike Sipser is one of the top theorists, especially in complexity theory, and has been Head of the Mathematics Department at MIT since 2004. He is long known for some very clever proofs and his famous survey paper on the status of . He is recently known for co-authoring the paper that introduced adiabatic quantum computing, which is the kind being given physical realization by the company D-Wave Systems. But he is perhaps best known nowadays for his textbook Introduction to the Theory of Computation. I have used the earlier version many times for teaching our undergraduate class; I have not used the third edition, mainly because I teach other things these days.
Today Ken and I wish to talk about a topic that is covered in his book, finite state automata, in relation to counting.
It’s that time of year again
Altered from NPR source.
Faadosly Polir is back in contact with us. We have encountered him before, but this time he has company. He has teamed up with two well-known radio personalities to launch a quiz show about mathematics.
Today Ken and I wish to talk about material for the show that Faadosly has sent us.
A famous algorithm, a new paper, a full correctness proof
Vijay Vazirani is one of the world experts on algorithmic theory, and is especially known for his masterful book on approximation algorithms. Among many results in computational complexity, two have been Wikified: his theorem with Les Valiant on unique SAT, and the subsequent generalized Isolation Lemma with Ketan Mulmuley and his brother Umesh Vazirani. Lately his focus has been more on computational aspects of auctions, economic systems, and game theory.
Today Ken and I wish to discuss a recent paper by Vijay on matching.