Testing Galactic Algorithms
How to get others to take a galactic algorithm seriously
Fred Sayward is not a theorist, but is an expert on software engineering, and in particular on testing programs for correctness.
Today I want to talk about the problem of getting someone to read a complex proof of a famous conjecture in complexity theory.
Often a claimed proof of such a conjecture consists of an algorithm that can be quite complex, and even worse may have a terrible running time—the algorithm is galactic. Recall this means that the algorithm runs in a time bound that is huge, that the algorithm is unlikely to be usable in practice, and that it only runs in “theory.” This is not unexpected: we would be thrilled with a galactic algorithm for any of our hard open questions. Actually, as Tim Gowers has said here more than once, he would be okay with a proof that P=NP, provided the algorithm was galactic.
The puzzle that I have thought about is: can we test a galactic algorithm, or must we rely only on the claimed proof that the algorithm is correct? Since often these claims are complex, messy arguments that could be wrong, what happens is that people refuse to read them. This means that the algorithms are not checked at all. They may be wrong, but it seems to me that there should be some way to test the algorithms. Testing cannot prove an algorithm correct, but it can show that an algorithm is incorrect; even better, it can show the author of the algorithm that the work is incorrect. As I pointed out before, Roger Brockett once said:
It’s hard to argue with a counter-example.
I have several ideas from software engineering that could help here. It may be possible at least to do limited testing of many galactic algorithms, by using some methods from software testing. I must add that I have had many interests over the years, and one was—is?—in the theory and practice of testing programs.
Years ago I was interested in program testing—not the theory of program testing—but actual methods for testing real programs. One method that I worked on, with Rich DeMillo and Tim Budd and others, was called program mutation. This method is based on a thesis of how people actually program and how they make mistakes—perhaps I will discuss it another day.
Fred Sayward worked for me when I was just starting out, in the mid 1970’s, when I was an assistant professor at Yale University. Alan Perlis was then the chair of the Computer Science Department at Yale, and Roger Schank was also on the faculty. Roger was more senior than me, and he had a well deserved reputation as a strong leader of the Yale AI group—I am being polite here. His word in that group was law, which was probably fair, since he created the group from nothing. He put Yale “on the map” in AI.
I had no money at the time to hire a full-time post-doc, but I did anyway. One day at a faculty meeting Perlis casually remarked that the department had enough money to hire a two-year post-doc. There was no further discussion, and we moved on to other matters.
I instantly began looking for a possible post-doc candidate for my project on program mutation. I quickly found Sayward, who was finishing his Ph.D. at Brown University. I had him visit, got letters of recommendation, and then approached Perlis with the suggestion that we hire Fred. Perlis had been to his talk, liked the letters, and quickly we had an official letter off to Sayward offering him the job. He accepted. Done deal.
A few months later, Schank walked into Perlis’s office and told Alan that he had thought about the post-doc position. Schank began to explain to Perlis that he, Roger, had a couple of ideas of who he planned to hire to fill this position—of course the position was his. Perlis waved him off and said,
Lipton already has found a candidate and the position is gone.
I was not there, so all this is second-hand from Perlis, but I can imagine that Roger was not pleased—in his mind all resources of the department should have gone to build up the AI group.
There were two consequences of this: First, Fred turned out to be a great choice. For example, we together with DeMillo wrote a paper on program testing that has well over a thousand citations. The second was that while Roger was initially upset, after this event he acted differently toward me. He treated me with more respect—in general theory was not his type of research. I always thought that he respected someone who “beat” him at the resource game, since he was such a strong player. I probably should not do it again, but doing it once was cool.
Years later, after Roger had moved to Northwestern University, he tried hard to hire me. Perhaps I should have gone. Oh well.
Often new algorithms, as I stated, that claim to solve a hard open problem can run in “polynomial time,” but not really in feasible time. If an algorithm runs in a worst-case time bound with a huge exponent, then for any reasonable problem the algorithm is unlikely to be runnable.
Especially for claims of new factoring algorithms, new Graph Isomorphism (GI) algorithms, or new SAT algorithms, the inability to implement the algorithm is a serious problem. Simply said, if you claim a new factoring algorithm and break one of the open RSA challenge problems, there is no doubt that people will take you seriously. The same goes for GI and SAT.
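Note that checking such a claim is far easier than making one: a claimed factorization can be verified with a single multiplication, no matter how the factors were found. A minimal sketch of such a verifier (the function name and tiny test numbers are my own, for illustration):

```python
def verify_factoring(n, factors):
    """Check that a claimed factorization multiplies back to n
    and that each claimed factor is a nontrivial divisor of n."""
    product = 1
    for f in factors:
        if f <= 1 or n % f != 0:
            return False
        product *= f
    return product == n

# Tiny sanity checks -- any claimed factoring algorithm must pass these.
assert verify_factoring(15, [3, 5])
assert not verify_factoring(15, [3, 4])
```

The same asymmetry holds for SAT: a satisfying assignment is checked in linear time, which is exactly why a broken RSA challenge number would settle all doubt.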
I often hear from those who claim to have, say, a new factoring algorithm: “but my algorithm cannot run on real examples, so you must read my proof.” I claim this is too narrow a view: there are ways to use testing to check even galactic algorithms.
You have a galactic algorithm that factors, or does GI, or solves SAT, or cracks some other hard problem. Yet no one is taking the algorithm seriously. Here are some suggestions that you might consider. Note, even before you announce to the world that you have this wonderful algorithm, you can use these ideas yourself to check your own work. I know that I have many times used them to check a new algorithm, and sometimes have discovered a problem. Computers are powerful; they should be part of the tool kit for anyone working in math and theory.
Really Small Cases: Your galactic algorithm may be able to run on a tiny problem. I have seen algorithms that claimed to solve a hard open problem fail on the simplest cases.
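On tiny inputs you can compare the claimed algorithm against a brute-force oracle that is obviously correct. A sketch of such a harness for factoring; here `claimed_factor` is a hypothetical stand-in that you would replace with the real implementation:

```python
import random

def trial_division(n):
    """Brute-force oracle: obviously correct, feasible only for tiny n."""
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return d
    return None  # n is prime

# Stand-in for the claimed galactic algorithm; here it just calls the
# oracle so the harness runs -- substitute the real implementation.
claimed_factor = trial_division

for _ in range(1000):
    n = random.randint(4, 10**6)
    assert claimed_factor(n) == trial_division(n), f"disagrees on n={n}"
```

A single disagreement is a counter-example, and as Brockett said, it is hard to argue with one.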
Worst-Case: Okay, although the worst-case running time you can prove for your algorithm is huge, that need not mean that it really runs in that time. Your analysis is worst-case: the key insight is that the algorithm runs, you believe, in polynomial time. You will be famous whatever the exponent turns out to be. In my experience you may not have the tightest bounds; that is, your worst-case bound may not be the actual running time of the algorithm. The algorithm on some small problems may run much faster than you think.
For example, a proof that some procedure eventually converges within some enormous number of steps may be the best you can prove. Or that bound may simply be the easiest one that follows quickly from a simple proof. But in practice the procedure might converge much, much faster. This can be tested.
Even if the convergence is slow, running the algorithm could still show that it seems to be converging. This can help us gain insights into the behavior of the algorithm.
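A toy illustration of the gap between a provable bound and observed behavior (my own example, not from the post): for Newton's iteration for square roots, a crude argument only shows the error shrinks each step, yet simply counting steps reveals it converges in a handful of iterations even for large inputs.

```python
def steps_to_converge(n, tol=1e-12):
    """Newton iteration for sqrt(n); count steps until the
    residual |x^2 - n| drops below tol * n."""
    x = float(n)
    steps = 0
    while abs(x * x - n) > tol * n:
        x = 0.5 * (x + n / x)  # Newton update
        steps += 1
    return steps

# Observed step counts stay tiny even as n grows by many orders
# of magnitude -- far below any pessimistic worst-case bound.
for n in [10, 10**3, 10**6, 10**9, 10**12]:
    assert steps_to_converge(n) <= 50
```

The same experiment applied to a galactic procedure, on whatever inputs it can handle, can show whether the true convergence rate is far better than the proven one.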
Subroutines: Your algorithm is likely to be built out of many subroutines, some of which may be run many, many times. For example, suppose the algorithm takes a partial assignment to a SAT problem and then runs some procedure to improve it. Even if the improvement procedure might have to run a galactic number of times, you can implement and test this procedure itself.
If the procedure does not improve the metric the way you claim it does, that is a problem. However, if for all the examples you try the procedure improves the metric a tiny amount, then you have gained some real credibility.
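The claimed property of the subroutine can be tested in isolation, even when the full algorithm cannot run. A sketch with a hypothetical SAT improvement step (the greedy-flip procedure and the clause encoding below are my own illustration): the claim under test is that one step never decreases the number of satisfied clauses.

```python
import random

def num_satisfied(clauses, assign):
    """Count satisfied clauses; literal v > 0 means variable v is True,
    literal v < 0 means variable |v| is False."""
    return sum(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

def improve_step(clauses, assign):
    """Hypothetical improvement subroutine: flip the one variable whose
    flip most increases the satisfied-clause count, if any flip helps."""
    base = num_satisfied(clauses, assign)
    best_var, best_gain = None, 0
    for v in assign:
        assign[v] = not assign[v]          # try the flip
        gain = num_satisfied(clauses, assign) - base
        assign[v] = not assign[v]          # undo it
        if gain > best_gain:
            best_var, best_gain = v, gain
    if best_var is not None:
        assign[best_var] = not assign[best_var]

# Test the claimed property on random instances: the metric never drops.
random.seed(0)
for _ in range(100):
    n_vars = 8
    clauses = [[random.choice([-1, 1]) * random.randint(1, n_vars)
                for _ in range(3)] for _ in range(30)]
    assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
    before = num_satisfied(clauses, assign)
    improve_step(clauses, assign)
    assert num_satisfied(clauses, assign) >= before
```

Passing such a test on thousands of random instances does not prove the subroutine correct, but a single failure refutes the claim outright.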
Special Cases: Algorithms are often created for high-dimensional general cases. Perhaps you can test your algorithm on very special cases. If the algorithm is geometric, then you could check that it works in the one-dimensional case, or in the two-dimensional case. I am surprised by the number of times complex algorithms have been claimed that fail even in the one-dimensional case. The line. This has happened in a real claimed result a while ago—no names—and it will probably happen again in the future.
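In one dimension many geometric problems have trivially correct solutions, which makes them perfect oracles. A sketch for closest-pair (the function names are my own; `claimed_closest_pair` is a stand-in for the general-dimension algorithm restricted to the line):

```python
import random

def closest_pair_1d_oracle(points):
    """On the line the answer is easy: sort, then the closest pair
    must be adjacent, so scan the adjacent gaps."""
    pts = sorted(points)
    return min(b - a for a, b in zip(pts, pts[1:]))

# Stand-in for the claimed general-dimension algorithm run with d = 1;
# substitute the real implementation here.
claimed_closest_pair = closest_pair_1d_oracle

random.seed(1)
for _ in range(200):
    pts = [random.uniform(0, 100) for _ in range(20)]
    assert claimed_closest_pair(pts) == closest_pair_1d_oracle(pts)
```

An algorithm that fails on the line will certainly fail in the plane, so this cheap check is worth running before anyone reads the general proof.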
Can these ideas be applied? Would you take a galactic algorithm more seriously if it were shown to work on non-trivial examples? Even in the partial way that we outline above?