Andrey Kolmogorov, Fred Hennie, Richard Stearns, and Walter Savitch are all famous separately; but they have something in common. Read on, and see.
Today I wish to discuss some algorithmic tricks and show that they were initially used by complexity theorists, years before they were used by algorithm designers.
To steal a phrase: it's computational complexity all the way down. Well, not exactly. The situation is slightly more complex, a bad pun. The complexity theorists often invented a concept and used it in a narrow way, while later it was rediscovered and made into a general notion. This is another example of the principle: the last to discover X often gets the credit for X. I note that the dictionary gets this wrong:
It's not the first but the last. For after the last gets the credit, it's no longer a "discovery." Let's look at three examples of this phenomenon.
Who invented it? Andrey Kolmogorov in 1953.
Who really invented it? Harold Lawson in 1964.
Details: Kolmogorov did so much that one should not be surprised that he invented "pointers." In 1953 he wrote a short paper, really an abstract, "To the Definition of an Algorithm." This later became a 26-page joint paper with the still-active Vladimir Uspensky. In that decade and the preceding ones, several researchers had advanced formal notions of an algorithm. Of course we now know they are all the same, in the sense that they define the same class of functions. Whether one uses Turing machines, recursive functions, or lambda calculus, to name just a few, you get the same functions. This is an important point. In mathematics, confidence that a definition is right is often best gained by showing that there are many equivalent ways to define the concept.
Kolmogorov’s notion was similar to a Turing machine, but he allowed the “tape” to be an almost arbitrary graph. During the computation, in his model, the machine could add and change edges. How this and other models constitute a “Pointer Machine” is discussed in a survey by Amir ben-Amram. A beautiful explanation of Kolmogorov’s ideas is in the survey: “Algorithms: A Quest for Absolute Definitions,” by Andreas Blass and Yuri Gurevich. I quote from their paper:
The vertices of the graph correspond to Turing’s squares; each vertex has a color chosen from a fixed finite palette of vertex colors; one of the vertices is the current computation center. Each edge has a color chosen from a fixed finite palette of edge colors; distinct edges from the same node have different colors. The program has this form: replace the vicinity U of a fixed radius around the central node by a new vicinity W that depends on the isomorphism type of the digraph U with the colors and the distinguished central vertex. Contrary to Turing’s tape whose topology is fixed, Kolmogorov’s “tape” is reconfigurable.
Harold Lawson invented pointers as we know them today in 1964. He received the IEEE Computer Pioneer Award in 2000:
for inventing the pointer variable and introducing this concept into PL/I, thus providing for the first time, the capability to flexibly treat linked lists in a general-purpose high level language.
The idea that a pointer can vary goes back to Kolmogorov, and as Ben-Amram notes it is distinct from the fixed pointers in Lisp implementations, but making the pointer a syntactic variable in a programming language is what made the difference.
Who invented it? Fred Hennie and Richard Stearns in 1966.
Who really invented it? Robert Tarjan in 1985.
Details: Hennie and Stearns proved one of the basic results in complexity theory. Initially Turing machines had one tape that was used both for input and for temporary storage. It was quickly realized that one could easily generalize this to have multiple tapes: some for input, others for temporary storage. This does not change the class of computable functions, which is stable under such changes. But it does change the time that such a machine takes to compute some task. What they proved is that the number of tapes is relatively unimportant provided there are at least two. More precisely, they proved that a machine with $k$ tapes that runs in time $t(n)$ can be simulated in time $O(t(n)\log t(n))$ by a machine with two tapes.
This result is very important and fundamental to the understanding of time complexity. The simulation is also very clever. Even for simulating three tapes by two, the issue is that the obvious way to do the simulation seems to require that the time increase from $t(n)$ to order $t(n)^2$. But they show that the obvious method of making three (or more) tracks on one tape can be fixed to work so that the "average" simulation step takes order $\log t(n)$, using the second tape to move data. Note some simulation steps take a constant number of steps, and others take a huge number of steps. But the average is logarithmic and this proves the theorem. I have discussed this before; see here.
Bob Tarjan years later used the same fundamental idea in what is now called amortized complexity. Bob's ideas, explained in his 1985 paper "Amortized Computational Complexity," made a useful formal distinction within the concept of an operation being good on average. The distinction is whether the operation can possibly be bad all the time, in highly "unlucky" cases. If you are finding the successor of a node in a binary search tree, you might unluckily have to go all the way up the tree, and all the way back down on the next step. But the step after that will be just one edge, and overall, the entire inorder traversal of $n$ nodes takes at most $2n$ edge steps total, giving an average under $2$. It doesn't matter how unbalanced the tree is. The idea is used throughout algorithm design.
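As a concrete illustration (my own sketch, not Tarjan's presentation), the following builds a maximally unbalanced search tree and counts edge traversals during a full inorder walk by repeated successor steps. A single step can cost $\Theta(n)$ edges, yet the whole walk uses at most $2(n-1)$, so the amortized cost per step is under 2:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def insert(root, key):
    # standard unbalanced BST insertion
    node, parent = root, None
    while node:
        parent = node
        node = node.left if key < node.key else node.right
    child = Node(key)
    child.parent = parent
    if key < parent.key:
        parent.left = child
    else:
        parent.right = child

steps = 0  # counts every edge traversed

def leftmost(node):
    global steps
    while node.left:
        node, steps = node.left, steps + 1
    return node

def successor(node):
    # next key in sorted order; a single call may walk far up the tree
    global steps
    if node.right:
        steps += 1
        return leftmost(node.right)
    while node.parent and node is node.parent.right:
        node, steps = node.parent, steps + 1
    if node.parent:
        steps += 1
    return node.parent

keys = [1, 2, 3, 4, 5, 6, 7, 8]   # inserting in order gives a path: worst balance
root = Node(keys[0])
for k in keys[1:]:
    insert(root, k)

walk, node = [], leftmost(root)
while node:
    walk.append(node.key)
    node = successor(node)

assert walk == sorted(keys)
assert steps <= 2 * (len(keys) - 1)   # amortized: under 2 edges per step
```

The last step, climbing from the maximum back past the root, costs $n-1$ edges on this tree, yet the total still meets the $2(n-1)$ bound because each tree edge is crossed at most twice.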
Who invented it? Walter Savitch in 1965.
Who really invented it? Whitfield Diffie and Martin Hellman in 1977.
Details: The most famous example in computational complexity is certainly Savitch's brilliant result, still the best known, showing that nondeterministic space $s(n)$ is contained in deterministic space $O(s(n)^2)$. His idea is to change a search problem for a path from $u$ to $v$ into two search problems. One searches forward from $u$ and also backward from $v$. If there is some point $w$ in common, then there is of course a path from $u$ to $v$. If we insist on halving the allowance for the search each time we recurse, then we will succeed when, and only when, $w$ is exactly midway between $u$ and $v$. This insistence guarantees recursion height $O(\log n)$ plus local storage for the names of the current nodes at each level, giving space $O(\log^2 n)$ in the logspace case. Ken likes to cite a "Modified Chinese Proverb" in his lectures:
A journey of a thousand miles has a step that is 500 miles from the beginning and 500 miles from the end.
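The recursion can be sketched as a toy reachability routine (my own code; the real theorem is about a space-bounded simulation, and Python's call stack stores more per level than the proof charges):

```python
def reachable(adj, u, v, k):
    # Is there a path from u to v of length at most 2**k?
    if k == 0:
        return u == v or v in adj[u]
    # Guess the midpoint w: each half then has length at most 2**(k-1).
    # The recursion depth is k = O(log n), and each level remembers only
    # (u, v, w, k), which is the source of the O(log^2 n) space bound.
    return any(reachable(adj, u, w, k - 1) and reachable(adj, w, v, k - 1)
               for w in adj)

adj = {1: {2}, 2: {3}, 3: set(), 4: set()}  # a small example graph
assert reachable(adj, 1, 3, 2)       # path 1 -> 2 -> 3 has length 2 <= 4
assert not reachable(adj, 1, 4, 2)   # nothing reaches node 4
```

The price of the small space is time: every level retries all possible midpoints, which is exactly the trade Savitch's theorem makes.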
Diffie and Hellman in 1977 created the idea of using this method for an attack on a block cipher. Their attack largely deflates the idea that a composition of encryptions $E_{k_m} \circ \cdots \circ E_{k_1}$,
where each $E_{k_i}$ uses a different key $k_i$, can be substantially more secure than a single application of some $E_k$. The idea, explained in full by our friends at Wikipedia, is to guess an intermediate stage $j$, let $E_1, E_2$ be the encodings up to and after stage $j$, and $D_1, D_2$ the corresponding decodings. Plaintexts $P$ and ciphertexts $C$ are related by $C = E_2(E_1(P))$, equivalently $D_2(C) = E_1(P)$.
If we had a known pair $(P, C)$, we could try all single encoding keys to get $E_1(P)$, and store the results in a lookup table. Then we can try all decryption keys to see if $D_2(C)$ appears in the table, and preserve key pairs that match. Then for each pair try other $(P', C')$ and discard it if they don't match, until one pair survives. Already we have time bounded by the sum, not the product, of the numbers of keys, multiplied instead by the number of trials. But the main point is that the time savings (at the expense of space for the table) are largely preserved in a recursion that guesses a breakpoint $j$ and recurses on the supposition that $j$ is at or near the midpoint, by an analysis similar to Savitch's.
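Here is the attack in miniature (my own construction: a made-up 8-bit "cipher," not DES). Brute force on the double cipher would try $256^2$ key pairs; the table lets us try $256 + 256$ keys and then filter the survivors with extra plaintext/ciphertext pairs:

```python
M = 256
INV17 = 241  # inverse of 17 mod 256, since 17 * 241 = 4097 = 16*256 + 1

def enc(k, x):
    # toy block cipher on bytes; invertible for every key k
    return (17 * (x ^ k)) % M

def dec(k, y):
    return ((INV17 * y) % M) ^ k

def attack(pairs):
    # pairs are (p, c) with c = enc(k2, enc(k1, p)) for unknown (k1, k2)
    p0, c0 = pairs[0]
    table = {}
    for k1 in range(M):                # forward half: 256 encryptions
        table.setdefault(enc(k1, p0), []).append(k1)
    survivors = []
    for k2 in range(M):                # backward half: 256 decryptions
        for k1 in table.get(dec(k2, c0), []):
            # filter false matches using the remaining pairs
            if all(enc(k2, enc(k1, p)) == c for p, c in pairs[1:]):
                survivors.append((k1, k2))
    return survivors

k1, k2 = 42, 200                       # the secret keys
pairs = [(p, enc(k2, enc(k1, p))) for p in (1, 2, 3)]
assert (k1, k2) in attack(pairs)
```

With one known pair there are roughly $256$ surviving candidates; each extra pair cuts the survivors by a factor of about $256$, which mirrors the "until one pair survives" step in the text.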
Okay, not all algorithmic ideas started in complexity theory. But a number of very important ones did. All three that we’ve covered in fact came from analyzing basic models of computation. Can you name some others?
How exceptions in theorems may affect their complexity
Manjul Bhargava is a mathematician who just won one of the 2014 Fields Medals. We offer our congratulations on this achievement. He is an expert in number theory, which makes him special to us among Fields medalists. His Fields citation includes his doctoral work on a powerful reformulation and extension of Carl Gauss’s composition law for quadratic forms. He also proved a sense in which 290 is special to us among numbers, since we have been thinking recently about quadratic forms as tools in complexity theory.
Today we talk about his “290 Theorem” with Jonathan Hanke, which is quite accessible, and also raise complexity-related questions about this result.
One of his papers is in the American Math Monthly. It is on the factorial function and its generalizations. We always applaud when a pathbreaker writes a popular survey. It is hard to imagine a more concrete subject and a more accessible journal.
A sub-surface question in computational complexity is how much difference can finite changes make? With asymptotic time or space bounds, finite changes make zero difference—finite sets have zero asymptotic complexity. However, for measures of concrete complexity, such as the size of proofs and the information content of strings, even small exceptions may make a difference.
Why is $290$ special? Well the answer starts with, why is $15$ special? Call a quadratic form $f(x_1,\dots,x_n)$ in $n$ integer variables universal if it represents all positive integers, that is, if for every positive integer $m$ there are integers $x_1,\dots,x_n$ with $f(x_1,\dots,x_n) = m$.
That $x_1^2 + x_2^2 + x_3^2 + x_4^2$ is universal is Joseph Lagrange's famous Four-Square Theorem, whose implications in complexity theory we have already noted.
Indeed, Jeffrey Shallit and Michael Rabin proved that this form is effectively universal in the sense that given any positive integer $m$, we can find in random polynomial time integers $x_1, x_2, x_3, x_4$ so that $x_1^2 + x_2^2 + x_3^2 + x_4^2 = m$.
Our complexity problems here are perhaps related but different: given a form $f$, how easy is it to test whether $f$ is universal, and how easy is it to prove the theorem on which the test is based?
We need to define the allowed range of forms carefully. Every form $f$ can be specified by an $n \times n$ matrix $A$ such that for all $x$, $f(x) = x^T A x$.
If we were using non-commutative algebra where $x_i x_j \neq x_j x_i$, then $A$ would be unique, but now the coefficient of $x_i x_j$ can be split any way between the entries $A_{i,j}$ and $A_{j,i}$. The convention is to split evenly, so that $A$ is symmetric, and once again unique. Then the condition that $f(x) > 0$ for $x \neq 0$ is the same as $A$ being positive definite, and often this is assumed when talking about quadratic forms.
In Lagrange's case, $A$ is the $4 \times 4$ identity matrix. Diagonal matrices are positive definite if and only if all the diagonal entries are positive, but for symmetric real matrices in general things are trickier: a matrix can be positive definite despite having negative entries, while a matrix with all-positive entries can be a "cheater" that is not positive definite. In the case $f = x^2 + xy + y^2$, $A$ is positive definite but has half-integer entries:
$A = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}.$
Then $f$ is said to be integer-valued but not classically integral. For much of two centuries such forms were also thought to "cheat," so that only forms with even cross-term coefficients were distinguished, but Bhargava's work has helped solidify the looser "twos-out" condition as the standard.
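These conditions are easy to script. The sketch below (the matrices are my own illustrative choices) evaluates $f(x) = x^T A x$ and tests positive definiteness via Sylvester's criterion, namely that all leading principal minors are positive:

```python
from fractions import Fraction

def quad_form(A, x):
    # f(x) = x^T A x
    n = len(A)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def det(M):
    # Laplace expansion; fine for the tiny matrices here
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

def is_positive_definite(A):
    # Sylvester's criterion for symmetric matrices
    return all(det([r[:k] for r in A[:k]]) > 0 for k in range(1, len(A) + 1))

half = Fraction(1, 2)
A1 = [[1, -1], [-1, 2]]        # negative entries, yet positive definite
A2 = [[1, 2], [2, 1]]          # all-positive entries, yet a "cheater"
A3 = [[1, half], [half, 1]]    # x^2 + xy + y^2: integer-valued, but the
                               # matrix is not classically integral
assert is_positive_definite(A1)
assert not is_positive_definite(A2)
assert is_positive_definite(A3)
assert quad_form(A3, (3, 5)) == 3*3 + 3*5 + 5*5   # matches x^2 + xy + y^2
```

Using exact `Fraction` arithmetic avoids any floating-point doubt about the half-integer entries.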
Obviously, universal forms are interesting. In 1993, John Conway and William Schneeberger proved that if a positive-definite quadratic form with integer matrix represents all positive integers up to $15$, then it represents all positive integers. Pretty neat. This clearly makes the checking of such a form to see if it is universal a relatively simple task. The proof was not published and was quite intricate. This result is naturally called the 15 Theorem.
Bhargava found a simpler proof in 2000, which was hailed by Conway himself. He followed this up by proving Conway and Schneeberger's conjecture that one could relax to an integer-valued form upon replacing $15$ by $290$: If such a form represents all positive integers up to $290$, then it is universal. Precisely stated:
Theorem 1 For any integer-valued positive-definite quadratic form $f$, if the set of values of $f$ includes
$1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, 290,$
then it includes all positive integers. Moreover, this set of twenty-nine numbers is minimal: remove any one and the statement becomes false.
This last theorem alternates between senses of being easy and complex. Its statement is complex: where do those numbers come from? However, having only twenty-nine numbers to check makes it easy to prove that a given form is universal; just produce arguments yielding each of the twenty-nine numbers as a value. However, the problem of finding such arguments, especially given an arbitrary target $m$, remains possibly complex: can there be any kind of extended Rabin-Shallit theorem?
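For example, here is a brute-force check (my own sketch) that Lagrange's form $x_1^2 + x_2^2 + x_3^2 + x_4^2$ hits all twenty-nine critical numbers; by the 290 theorem this alone certifies universality, which of course Lagrange already tells us directly:

```python
import math

CRITICAL = [1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29,
            30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, 290]

def is_sum_of_four_squares(n):
    # brute force over 0 <= a <= b <= c <= d; any representation can be sorted
    r = math.isqrt(n)
    return any(a*a + b*b + c*c + d*d == n
               for a in range(r + 1)
               for b in range(a, r + 1)
               for c in range(b, r + 1)
               for d in range(c, r + 1))

assert len(CRITICAL) == 29
assert all(is_sum_of_four_squares(n) for n in CRITICAL)
```

The same loop applied to any other integer-valued positive-definite form would give an equally short universality certificate, which is exactly the "easy to check" half of the story.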
What we fix on, however, is what the nature of the statement says about the complexity of the proof. This raises the question of a measure of complexity of theorems. Our intent can be approached by considering four types of theorems:
Type I. Every $x$ of type $T$ has property $P$.
Type II. Every $x$ of type $T$ has property $P$, except possibly for $x$ belonging to a fixed finite set $S$ of values.
Type III. If every member of $S$ has property $P$, then every $x$ of type $T$ has property $P$.
Type IV. A theorem about objects $f$ in which a test of kind II or III is part of the statement.
To interpret this, think of $x$ as a positive integer, fix a form $f$, and read $P(x)$ as "$f$ represents $x$." Then for the particular form $f$, statement I is "$f$ is universal," II is "$f$ is almost universal" (in a sense we haven't discussed but that is clear), and III is an instance of Bhargava's theorem for the particular $f$ (when $S$ is the set of twenty-nine numbers, that is). Then Bhargava's theorem itself is type IV, where the objects are forms $f$, and all it does is universally quantify the assertion of type III over those $f$.
Note that in types II and III, the statement remains true if $S$ is replaced by a larger set $S'$. We intend this also in type IV; that is, the theorem is monotone in the test. So define a number $m$ to be special if there is an $S$ such that: the statement of the given type holds with the set $S$; it fails for every proper subset of $S$; and $m$ is the largest element of $S$.
In types II and III it is enough to specify $S$; in IV we also give the theorem statement about the objects $f$.
Here are some examples of all these types:
Theorem 2. Every odd-order group is solvable. (Type I)
Theorem 3. Every finite group that is not cyclic or alternating or of certain Lie types is not simple, unless it is one of 26 so-called sporadic simple groups. (Type II)
Theorem 4. Every Fibonacci number has a prime factor that does not divide any earlier Fibonacci number, except for $F_1 = F_2 = 1$, $F_6 = 8$, and $F_{12} = 144$. (Also type II)
A statement of type II logically implies one of type III, but in a trivial sense. For instance, let $P_n$ be the assertion that the (topological generalized) Poincaré conjecture holds in $n$ dimensions. Suppose we went back in time before Michael Freedman won his Fields Medal for proving $P_4$ in 1982, let alone Grigori Perelman for $P_3$, but after Steve Smale won his in 1966 largely for proving $P_n$ for $n \geq 5$. Then with $S = \{3, 4\}$ we had a theorem of type II, and also type III in the form "if Poincaré is true for $3$ and $4$ then it is true for all $n$." But the implication generally does not go the other way, and since the Poincaré conjecture is true in all dimensions there was no special number.
Our point is that a theorem of type I seems easier to understand than one of type II or III, let alone IV. Hence a theorem of type I might have a clearer proof than one of type II. None of this intuition is solid, but it does seem reasonable. The presence of $S$ gives the statements other than type I higher information complexity. Our questions are whether the freedom to enlarge $S$ mitigates the increase in complexity compared to type I, whether this might allow easier proofs of the resulting statements, and how much harder it is to prove special numbers, where $S$ must be minimal.
For now we only have a few sketchy ideas on these questions, involving the notion of the Kolmogorov complexity $K(x)$ of a string $x$. This is named for Andrey Kolmogorov, but various related notions were discovered at the same time by others. Here it is enough to say that $K(x)$ is the length of the shortest program that generates $x$. The notion extends to define $K(S)$ for finite sets $S$.
Can we prove that certain theorems of types II–IV cannot be proved in a fixed formal system? Our intent is to argue like this: Suppose that we could prove a statement $\Phi$ saying that property $P$ holds for all $x$ outside a given finite set $S$, and that $S$ is minimal. If $S$ itself has large complexity, then this might lead to a contradiction.
A simple result can be proved. Let $|\Phi|$ denote the length of formula $\Phi$ as a string, and let ZF be the usual set theory, or any reasonable theory, provided its set of theorems is r.e.
Theorem 5. Suppose that the statement $\Phi$ above, where $S$ is a given finite set, is provable in ZF. Then there is an absolute constant $c$ that depends only on ZF so that $K(S) \leq |\Phi| + c$.
Proof: Let a machine search the proofs of ZF for a proof of the above statement. It will eventually find one. This gives us the set $S$, and so we have a description of $S$ of size at most $|\Phi|$ plus a constant that depends on the machine that does the search.
Since we can get a bound on $K(S)$, we may be able to argue that certain concrete properties cannot be proved to fail only on sets that are finite but too high in complexity. This would be great if we could actually do something like this. We would like even better to remove the dependence on $|\Phi|$, and obtain results of this kind:
There are only finitely many numbers that ZF can prove to be special.
Although when $m$ is the largest element in $S$, we don't get a bound on $K(S)$ in terms of $m$ by imitating the above proof, because the machine could output an infinite sequence of sets $S$ that go with different statements $\Phi$. It isn't a contradiction because something like $\Phi$, or the index of $S$ in this sequence, needs to be given as well to specify $S$, and this need not be bounded by any function of $m$. One can craft artificial $\Phi$'s to falsify it. But in specific cases, or in the presence of restrictions on $\Phi$, this might yield (or at least suggest) interesting results.
For instance, we've noted that the relevant homotopy groups are finite and computable, so that the sequence of their structures must have Kolmogorov complexity at most a value that we can determine. We need not develop a fast algorithm to compute them, just a short one. This shows that while the intuitive complexity of the structure of homotopy groups seems high, we can bound their K-complexity. Having tight bounds would be interesting, and might then help analyze what kinds of relaxations of theorems make them easier to prove.
We have raised a diverse array of complexity-related questions. Can more light be shed on them? Is there any kind of global limit on special numbers?
Edward Barbeau is now a professor emeritus of mathematics at the University of Toronto. Over the years he has worked to increase interest in mathematics in general, and to enhance education in particular. He has published several books that are targeted to help both students and teachers see the joys of mathematics: one is called Power Play; another Fallacies, Flaws and Flimflam; and another After Math.
Today I want to discuss his definition of the derivative of a number, yes a number.
We all know the concept of the derivative of a function. It is one of the foundational concepts of calculus, and is usually defined by using limits. For the space of polynomials it can be viewed as a linear operator $D$ so that $D(x^n) = nx^{n-1}$.
The derivative operator in general satisfies many properties; one is the product law: $D(fg) = D(f)\,g + f\,D(g)$.
This rule is usually credited to Gottfried Leibniz. Somehow the great Isaac Newton did not know this rule, at least that is what some claim.
Barbeau defined the derivative of a natural number in 1961. Define $D(n)$ for a natural number $n$ by the following rules:
(1) $D(p) = 1$ for any prime $p$;
(2) $D(ab) = D(a)\,b + a\,D(b)$ for any natural numbers $a$ and $b$.
Here is a picture from his paper:
This proves that he really did it a long time ago. Note the typewriter typeface: no LaTeX back then. He proved the basic result that $D$ is well defined. This is not hard, but it is necessary to make the definition meaningful; we will leave it unproved here. See his paper for details.
A simple consequence of the rules is that $D(p^k) = kp^{k-1}$ for $p$ a prime. This follows by induction on $k$. For $k = 1$ it is rule (1). Suppose that $D(p^k) = kp^{k-1}$; then
$D(p^{k+1}) = D(p \cdot p^k) = D(p)\,p^k + p\,D(p^k) = p^k + kp^k = (k+1)p^k.$
Unfortunately the formula $D(m^k) = km^{k-1}$ does not hold in general: the rules give $D(m^k) = km^{k-1}D(m)$ instead. Also $D$ is not a linear operator: $D(2) + D(3) = 2$ while $D(2+3) = D(5) = 1$. This double failure, that the derivative of a power is not simple and that the derivative is not linear in general, makes $D$ difficult to use. One of the beauties of the usual derivative, even just for polynomials, is that it is a linear operator.
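The rules give a closed formula: if $n = p_1^{e_1} \cdots p_m^{e_m}$ then $D(n) = n \sum_i e_i / p_i$. The following sketch (my own code) implements it and checks the failures just mentioned:

```python
def D(n):
    # arithmetic derivative: D(p) = 1 for primes, D(ab) = D(a)b + aD(b),
    # equivalently D(n) = n * sum(e_i / p_i) over the factorization of n
    if n < 2:
        return 0
    total, m, p = 0, n, 2
    while p * p <= m:
        while m % p == 0:
            total += n // p   # one contribution of n/p per prime factor p
            m //= p
        p += 1
    if m > 1:                 # leftover prime factor
        total += n // m
    return total

assert D(5) == 1                       # rule (1)
assert D(6) == 5                       # D(2)*3 + 2*D(3) = 3 + 2
assert D(8) == 12                      # D(2^3) = 3 * 2^2
assert D(2) + D(3) != D(2 + 3)         # no additive rule
assert D(6**2) == 2 * 6 * D(6) != 2 * 6  # power rule picks up a D(6) factor
```

The formula is just the logarithmic derivative in disguise: $D(n)/n$ is additive over products, which is why a single pass over the factorization suffices.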
The derivative notion of Barbeau is interesting, yet it does not seem to have been intensively studied. I am not sure why; it may be because it is a strange function.
There is hope. Recently there have been a number of papers on his notion. Perhaps researchers are finally starting to realize there may be gold hidden in the derivative of a number. We will see.
Most of the papers on $D$ have been about intrinsic properties of $D$ rather than applications. A small point: most of the papers write $n'$ in place of $D(n)$, so if you look at the papers be prepared for this notation shift. I decided to follow the original paper's notation.
The papers have results of three major kinds. One kind is the study of what are essentially differential equations. For example, what can we say about the solutions of $D(x) = a$,
where $a$ is a constant? The others are growth or inequality results: how fast and how slow can $D(n)$ grow? For example, for $n$ not a prime, $D(n) \geq 2\sqrt{n}$.
A third natural class of questions is: can we extend $D$ to more than just the natural numbers? It is easy to extend $D$ to the integers, a bit harder to the rationals, and not so easy beyond that.
Here are two interesting papers to look at:
I tried to use $D$ to prove something interesting. I think if we could use $D$ to prove something not about $D$ itself but about something that does not mention $D$ at all, that would be exciting. Tools must be developed in mathematics, but the key test of their power is their ability to solve problems from other areas. One example: the power of complex analysis was manifest when it was used to prove deep theorems of number theory. Another example: the power of the theory of Turing machines was clear when it was used to yield an alternate proof of the Incompleteness Theorem.
The best I could do is use $D$ to prove an ancient result: that $\sqrt{2}$ is not rational. Well, I may be able to prove a bit more.
We note that from the product rule: $D(a^2) = 2aD(a)$, for any $a$. Recall if $a$ were prime this would be $2a$.
Now assume by way of contradiction that $\sqrt{2}$ is a rational number. Then for some positive numbers $a$ and $b$ we have
$2b^2 = a^2.$
As usual we can assume that $a$ and $b$ are co-prime.
So let's take derivatives of both sides of the equation; we have to use $D$ sometime, might as well start with it.
Note that it is valid to apply $D$ to both sides of an equation, so long as one is careful to obey the rules. For example, $5 = 2 + 3$ allows $D(5) = D(2+3)$, but there is no additive rule to make the right-hand side become $D(2) + D(3)$, which would make the equation false.
The result of taking the derivative of both sides, using $D(2b^2) = D(2)b^2 + 2D(b^2) = b^2 + 4bD(b)$, is:
$b^2 + 4bD(b) = 2aD(a).$
Now square both sides and substitute $2b^2$ for $a^2$ to get:
$b^2(b + 4D(b))^2 = 4a^2D(a)^2 = 8b^2D(a)^2,$ so that $(b + 4D(b))^2 = 8D(a)^2.$
This implies that $2$ divides $b$. This leads to a contradiction, since it implies that $a$ and $b$ are not co-prime. Whether we also get that $2$ divides $a$ without circularity is a worry, but anyway this is enough. The point is that owing to $D(2) = 1$, the derivative removed the problematic factor of $2$.
Note that from the original equation, we only get that $2$ divides $a^2$, which is too weak to immediately get a contradiction. Admittedly ours is not the greatest proof, not better than the usual one especially owing to the squaring step, but it does use the derivative of a number.
One idea: I believe that this idea can be used to prove more than the usual fact that $x^2 = 2y^2$ has no nonzero solutions over the integers. I believe we can extend it to prove the same result in any ring where $D$ can be defined, possibly modulo issues about lack of unique factorization. This handles the Gaussian integers, for example.
Can we use this strange function to shed light on some open problem in number theory? Can we use it in complexity theory? A simple question is: what is the complexity of computing $D(n)$? If $n = pq$ where $p$ and $q$ are primes, then $D(n) = p + q$ by the rules. But we know that $n = pq$, and thus we have two equations in two unknowns and can solve for $p$ and $q$. So in this case computing $D(n)$ is equivalent to factoring $n$. What happens in the general case? An obvious conjecture is that computing $D(n)$ is always equivalent to factoring.
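For a product of two primes the two equations reduce to the quadratic formula: $p$ and $q$ are the roots of $x^2 - D(n)x + n = 0$. The sketch below (my own code) recovers the factors; the example $n = 9991 = 97 \cdot 103$ with $D(9991) = 97 + 103 = 200$ is mine:

```python
import math

def recover_factors(n, dn):
    # If n = p*q with p, q prime, then D(n) = p + q and n = p*q,
    # so p and q are the roots of x^2 - D(n)*x + n = 0.
    disc = dn * dn - 4 * n
    s = math.isqrt(disc)
    assert s * s == disc, "not a semiprime with this derivative value"
    return (dn - s) // 2, (dn + s) // 2

assert recover_factors(9991, 200) == (97, 103)   # 9991 = 97 * 103
```

So an oracle for $D$ factors semiprimes instantly; the open question is whether computing $D$ is as hard as factoring in general.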
[fixed error in proof, which owed to a typo, as noted by user "Sniffnoy" and others, and changed some following text accordingly; further fix to that proof; fixed typo $p^k$ to $p^{k-1}$]
A new result on our three body problem
Allan Grønlund and Seth Pettie are leaders in algorithm design and related problems.
Today I want to give a quick follow up on our discussion of 3SUM based on a recent paper of theirs.
I had planned to report on the accepted papers that will soon appear at the upcoming 2014 FOCS conference. See here for all conference details. But Pettie just sent me their joint paper that almost answers the questions raised in the last discussion of "our three body problem." The paper is called "Threesomes, Degenerates, and Love Triangles." I have no comment on this title. Wait, saying "I have no comment" is really making a comment. Oh well.
Let's start by recalling the definition of 3SUM. Given a set $S$ of $n$ integers, the problem is to determine if there are three elements $a, b, c$ in $S$ with
$a + b + c = 0.$
Actually the version they work on allows $S$ to be a set of real numbers, not just integers. This is important because there could be tricks that work with integers (taking mods, for example) that cannot be made to work with reals. Since they are getting upper bounds this generalization is just fine.
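For orientation, here is the standard quadratic algorithm (a sketch in my own code, not from their paper): sort, then scan with two pointers. The decision-tree question is exactly whether this many comparisons is inherently necessary:

```python
def three_sum(S):
    # classic O(n^2)-time algorithm: sort, then a two-pointer scan per element
    A = sorted(S)
    n = len(A)
    for i in range(n):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = A[i] + A[lo] + A[hi]
            if s == 0:
                return A[i], A[lo], A[hi]   # witness triple
            if s < 0:
                lo += 1                     # need a bigger sum
            else:
                hi -= 1                     # need a smaller sum
    return None

assert three_sum([-5, 1, 4, 10]) == (-5, 1, 4)
assert three_sum([1, 2, 3]) is None
```

Note the algorithm works for reals just as for integers, since it only adds and compares, which is the model the lower-bound discussions care about.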
They prove a variety of theorems in their paper, so look at it for all the details. One of the main results is:
Theorem:
The decision tree complexity of 3SUM is $O(n^{3/2}\sqrt{\log n})$.
So this means that the 3SUM problem has a sub-quadratic algorithm? No. It means that it has decision tree complexity that is order $n^{3/2}$, dropping the logarithmic factor. So let's look quickly at what this really means, and at the same time give some idea of how they prove such a result.
Decision tree complexity is based on a simple model: a tree is defined that at each node makes a binary decision based on some test of the inputs. Various tests are allowed but here they are always linear in the inputs. This model has been studied for ages and many interesting results are known.
Some of the most exciting ones are based on insights of Michael Fredman. In particular he showed in 1976 several cool results about the power of decision trees. I should report in more detail on these ideas, since they are old but as the new 3SUM results show, still extremely powerful. One idea is used repeatedly in this paper:
we shall refer to the ingenious observation that $a + b < c + d$ if and only if $a - c < d - b$ as Fredman's Trick.
Michael also proved an amazing result, in my opinion, that is also used in the current paper.
Lemma: A list of $n$ numbers whose sorted order is one of $\Gamma$ possible permutations can be sorted with $2n + \log_2 \Gamma$ pairwise comparisons.
Here is their algorithm for 3SUM:
In their paper they prove the key facts: the algorithm is correct, and it can be implemented using Fredman's and other ideas to have the claimed number of comparisons. What is also interesting is that such decision-tree type algorithms sometimes "cheat" by using the existence of the decision tree, while the cost of computing what the next comparison should be can be very high. This happens, for example, with the Knapsack Problem; see here. In their algorithm the cost of deciding the comparisons is kept low enough that the total running time still beats $n^2$ by logarithmic-type factors. Very neat.
Grønlund and Pettie's beautiful result on 3SUM is evidence that there should be an actual sub-quadratic algorithm. I remain optimistic, as always. Perhaps they will be able to prove that soon. In any case, the result is still wonderful.
Ellis Horowitz is one of the founders of the theory of algorithms. His thesis with George Collins in 1969 had the word “algorithm” in the title: “Algorithms for Symbolic Integration of Rational Functions.” He is known for many things, including an algorithm that after forty years is still the best known.
Today I want to talk about this algorithm, and one of the most annoying open problems in complexity theory.
We have just recently talked about the three body problem from physics, but now it is our turn in computer theory. Our version is called the 3SUM problem. The problem (I guess you probably know it already) is: given three sets $A, B, C$ of integers, are there $a \in A$, $b \in B$, and $c \in C$ so that
$a + b + c = 0?$
Suppose the sets each contain at most $n$ elements and the elements are bounded by a polynomial in $n$. Then there is a trivial algorithm that takes time $O(n^3)$: just try all triples. Our friends at Wikipedia say
3SUM can be easily solved in $O(n^2)$ time
This is unfair. The algorithm may be simple, but it is in my opinion very clever. It is due essentially to Ellis and his then PhD student Sartaj Sahni, who gave in 1974 a related algorithm for knapsack. More on knapsack in a moment. They called their result a "square root improvement."
Here is the general idea. Build a table of all possible values $a + b$ and then check to see if some $-c$ is in the table. With some care this can be done in time about $n^2$. I once talked about this result; it is one of my favorite algorithms. For example it can be used to solve the knapsack problem in time roughly $2^{n/2}$. Recall this is the problem of finding $x_1, \dots, x_n \in \{0, 1\}$ so that
$a_1x_1 + \cdots + a_nx_n = b.$
Here $a_1, \dots, a_n$ and $b$ are given integers. This is what Ellis and Sahni did in 1974.
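Here is the square-root improvement in miniature (my own sketch of the idea, in subset-sum form): enumerate the roughly $2^{n/2}$ subset sums of each half and look for a match, instead of walking through all $2^n$ subsets:

```python
def subset_sum(weights, target):
    # Horowitz-Sahni meet in the middle: about 2^(n/2) sums per half
    half = len(weights) // 2

    def sums(items):
        # all subset sums of items, built incrementally
        out = {0}
        for w in items:
            out |= {s + w for s in out}
        return out

    left = sums(weights[:half])
    right = sums(weights[half:])
    # a solution splits as s_left + s_right = target
    return any(target - s in right for s in left)

assert subset_sum([3, 34, 4, 12, 5, 2], 9)        # e.g. 4 + 5 = 9
assert not subset_sum([3, 34, 4, 12, 5, 2], 1)
```

The set `right` plays the role of the lookup table; membership tests in it are what turn the product of the two halves into a sum.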
In ancient times, before 3SUM was called 3SUM, I thought about how to improve the above algorithm. The idea of breaking things into three or more sets was natural. I worked pretty hard on trying to get a better method. I failed. Here is what I said in the previous discussion:
A natural problem is what happens with a set like $A + B + C = \{a + b + c : a \in A, b \in B, c \in C\}$? Is there a way to tell if $x$ is in this set in time close to the sizes of the sets themselves, for example? If this is possible, then the knapsack problem has an algorithm whose exponential term is $2^{n/3}$. I have thought about this and related approaches quite often. I have yet to discover anything; I hope you have better luck.
Actually the usual version now is that we are given one set $S$. Then we must determine whether there are three elements $a, b, c$ in $S$ so that $a + b + c = 0$.
It is not hard to show that having one set instead of three is essentially no real change: just code the set name into the number.
There are lower bounds on the 3SUM problem in various restricted models of computation. Jeff Erickson showed a quadratic lower bound provided the elements are real numbers and the computation is a special type of decision tree. The result is interesting because it is an adversary argument where it is useful to use infinitesimals in the argument. See the paper for details.
Back to the 3SUM problem for integers. If the set has all elements less than $M$ in absolute value, then 3SUM can be computed in time $O(M \log M + n)$.
The trick is to cheat: use operations beyond the comparisons allowed in Erickson's model.
Suppose that $S$ is a set of positive numbers less than $M$; this is not needed but I hope it makes the idea clearer. The problem is to see if $S + S$ contains an element from another set $T$. Define the polynomial $p(x)$ by
$p(x) = \sum_{s \in S} x^s.$
Note its degree is at most $M$. Using the Fast Fourier Transform we can compute $p(x)^2$ in time $O(M \log M)$. The critical point is that $p(x)^2$ is equal to $\sum_i c_i x^i$ where
$c_i$ is equal to the number of pairs $(a, b)$ in $S \times S$ so that $a + b = i$. But then we are almost done. Just check each element $t$ from $T$ to see if its coefficient $c_t$ in $p(x)^2$ is positive.
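The whole trick can be sketched as follows (my own code, with a textbook radix-2 Cooley-Tukey FFT): square the characteristic polynomial of $S$ and read off which sums occur.

```python
import cmath

def fft(a, invert=False):
    # recursive radix-2 Cooley-Tukey; len(a) must be a power of two
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def pair_sum_counts(S, M):
    # coefficients of p(x)^2 where p(x) = sum_{s in S} x^s, elements < M
    size = 1
    while size < 2 * M + 2:
        size *= 2
    coeff = [0.0] * size
    for s in S:
        coeff[s] = 1.0
    F = fft(coeff)
    inv = fft([v * v for v in F], invert=True)
    return [round((v / size).real) for v in inv]

counts = pair_sum_counts({1, 3}, 3)
# (x + x^3)^2 = x^2 + 2x^4 + x^6
assert counts[2] == 1 and counts[4] == 2 and counts[6] == 1
assert counts[0] == 0
# the 3SUM-style query: does S + S meet T?
assert any(counts[t] > 0 for t in {4})
```

Rounding the inverse transform back to integers is safe here because the true coefficients are small integers and the floating-point error is tiny at these sizes.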
I wish I could report that the problem has been solved, that there is a sub-quadratic algorithm now for 3SUM. That we have finally beat Horowitz and Sahni. But that is not the case. We do have some results that are additional evidence that either the old algorithm is best, or that it will be hard to improve it.
The modern view is to make a failure to find an algorithm into a useful tool. Modern cryptography is based on this: factoring has been studied forever and no polynomial-time algorithm is known, so we posit that it is hard and use it as the basis of crypto-systems. Think RSA.
The same has been done with 3SUM. Since in forty years we cannot beat quadratic, let's assume that there is no sub-quadratic algorithm and use this as a tool to "prove" conditional lower bounds on problems. It's lemons into lemonade.
Well, to be careful, there are better algorithms. The work of Ilya Baran, Erik Demaine, and the late Mihai Pǎtraşcu gives slightly better times on RAM machines. These results are interesting, but do not get $O(n^{2-\epsilon})$ for any fixed $\epsilon > 0$, for example. See their paper for details. Pǎtraşcu also showed a tight reduction from 3SUM to listing 3-cliques.
There is other work on why the 3SUM problem is hard, by Amir Abboud, Kevin Lewi, and Ryan Williams, in the paper "Losing Weight by Gaining Edges." They prove many interesting results; here is one that I especially like:
Theorem 1 For any , if -Clique can be solved in time , then -SUM can be solved in time .
Of course $k$-SUM is the generalization of 3SUM where we seek $k$ numbers in a set that sum to zero.
Is there a sub-quadratic algorithm for 3SUM? I would love to see one. The problem seems hard, very hard. It has been open for forty years, and perhaps it will be open for another forty. Who knows. As always I believe that there could very well be a better algorithm—I am always optimistic about the existence of new clever algorithms.
Demons and other curiosities
Pierre-Simon Laplace was a French scientist, perhaps one of the greatest ever, French or otherwise. His work affected the way we look at both mathematics and physics, among other areas of science. He may be least known for his discussion of what we now call Laplace’s demon.
Today I want to talk about his demon, and whether predicting the future is possible.
Can we predict the past? Can we predict the present? Can we predict the future? Predicting the past and predicting the present sound a bit silly. The usual question is: Can we predict the future? Although I think predicting the past—if taken to mean “what happened in the past?”—is not so easy.
So can we see the future? I would argue that many can and do very well every day predicting the future. The huge profits of options traders and hedge funds must say something about prediction. They make a lot of money by knowing—at least with some reasonable probability—what the future price will likely be of a stock or commodity.
There are many other predictions that we make that are often correct. We can predict that the sun will rise tomorrow in Atlanta at 6:54 am. This prediction works very well. The Weather Channel does a reasonable job of predicting the weather for later today, a less good job on tomorrow, and not so good a job on predicting the weather this calendar day next year.
The issue is not these predictions, but whether it is possible to predict the future exactly. Laplace in 1814 claimed that given the exact position and speed of all objects in the universe at some time, a “demon” would be able to use the laws of physics to predict their positions at an arbitrary time in the future. This is now called Laplace’s demon. Translated, he said:
We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.
Since Laplace’s work there has been much discussion against the idea that even in principle there could be such a demon. Many attacks are now possible. Most are based on ideas and concepts that Laplace did not have available to him in 1814. Chaos theory has been advanced as one way to show his demon is impossible, the random nature of quantum mechanics is another, and even the nature of computational complexity is a third.
Laplace of course had no idea of the quantum nature of the world. So it is a bit unfair for us to attack him in this way. We could extend Laplace’s intent to a quantum world, noting that quantum mechanics is a deterministic theory even as it describes branching-off worlds. Not all conceivable branches are possible, and we could ask the demon to identify the excluded ones in advance.
We would assume the demon has perfect knowledge of the initial conditions of the universe or of any local Big-Bang event, thus distinguishing our setting from the more human-relevant one in this in-depth essay by Scott Aaronson. Still, it is easier to address Laplace’s argument in the kind of world where Newtonian mechanics holds sway, and where the demon could solve N-body problems exactly even with collisions.
Accordingly, some researchers have looked at the problem of prediction of the future in a Laplacian type world, where the future is deterministic. Not long ago, in 2008, David Wolpert used a riff on Cantor’s diagonalization argument to show that prediction machines could not exist. The latter is one of the reasons that I find the question relevant to theory. His theorem is here and is summarized here:
The theorem’s proof, similar to the results of Gödel’s incompleteness theorem and Turing’s halting problem, relies on a variant of the liar’s paradox—ask Laplace’s demon to predict the following yes/no fact about the future state of the universe: “Will the universe not be one in which your answer to this question is yes?”
Recently a short note, called a “Mathbit,” was published in the Math Monthly by Josef Rukavicka. A Mathbit is always at most a single page and is set in a gray font style.
He claims an even shorter proof that Laplace’s demon is impossible—David’s is more formal and has precise definitions. Here is the main part of Rukavicka’s argument:
Suppose that there is a device that can predict the future. Ask that device what you will do in the evening. Without loss of generality, consider that there are only two options: (1) watch TV or (2) listen to the radio. After the device gives a response, for example, (1) watch TV, you instead listen to the radio on purpose. The device would, therefore, be wrong. No matter what the device says, we are free to choose the other option. This implies that Laplace’s demon cannot exist.
I have several reservations about this proof that there is no Laplace demon. For starters, it makes a complexity-type assumption: that the prediction of the future is fast. What if predicting one day into the future took more than one day? Then of course the argument would fail. This raises an interesting issue. Suppose that predicting $d$ days into the future takes more than $d$ days; then the prediction is clearly not useful. However, even if the predictor runs faster than that, the time needed to get a useful prediction could still be immense. What if the prediction took time exponential in $d$? This would clearly not allow Rukavicka’s argument to be meaningful.
Another basic issue that struck me is the choice of watching TV vs. radio. Rukavicka assumes implicitly in his argument that we have the free will to decide what to do. But this seems to be the essence of the whole issue. What if we cannot make this choice? We might listen to the predictor say “TV” and tomorrow we forget our contrarian intent to listen to the radio and watch TV anyway. What if we really do not have a choice? This seems to devolve into a circular argument—or am I missing something?
Well these two issues do take the argument back into the realm of Scott’s long essay.
What do you think? In a deterministic world could there be complexity results about predictions? Are these questions related to P=NP in some manner?
Avoiding actual infinities
Carl Gauss is of course beyond famous, but he had a view of infinity that was based on old ideas. He once wrote in a letter to Heinrich Schumacher in 1831:
so protestiere ich zuvörderst gegen den Gebrauch einer unendlichen Größe als einer vollendeten, welcher in der Mathematik niemals erlaubt ist. Das Unendliche ist nur eine façon de parler, indem man eigentlich von Grenzen spricht, denen gewisse Verhältnisse so nahe kommen als man will, während anderen ohne Einschränkung zu wachsen gestattet ist.
Today we want to show that the famous diagonal argument can be used without using infinite sets.
The issue that Gauss was troubled by was the notion of an actual infinite object. For those who do not read German—or French—Ken translates the above quotation as follows:
first of all I must protest against the use of an infinite magnitude as a completed quantity, which is never allowed in mathematics. The Infinite is just a manner of speaking, in which one is really talking in terms of limits, which certain ratios may approach as close as one wishes, while others may be allowed to increase without restriction.
Georg Cantor famously created the theory of sets, the theory of actual infinite objects, and proved many amazing results about them. His proof that the subsets of the natural numbers are uncountable is a classic result, which we have talked about before and even in video.
Suppose that we reject the actual infinite. But we allow a process that goes on forever. This is like saying per Gauss that it is allowed to “increase without restriction.” This is certainly a reasonable position, and one that perhaps Gauss and those of his time would have accepted. Thus we will avoid actual infinite sets, and follow ancient mathematicians who felt that actual infinity was unreasonable, even dangerous. So let’s agree and look toward infinity as an unending process.
More precisely we will consider an infinite set as represented by a generator. From time to time, the generator outputs a natural number. The numbers need not come in order, and elements can be repeated. Thus we could imagine a generator that outputs $0, 2, 4, 6, 8, \dots$
This puts out the even numbers—do you see the pattern?
The property that we fix on for a generator $G$ is that for every natural number $n$, there is some $m \ge n$ such that $G$ eventually generates $m$. Note that this implies that $G$ generates an infinite set of numbers, but we have avoided referring to this set as a “completed quantity.” Thus we are shying away from the common exposition of Cantor’s theorem, in which one speaks of the rows of the table as infinite lists of numbers.
Suppose that we have a collection $G_{1}, G_{2}, G_{3}, \dots$ of generators of natural numbers. We claim that there is a Cantor-like diagonal argument that constructs a new generator $D$ so that $D$ is different from all the above generators. By different we mean that for each $n$ there is some number that $G_{n}$ generates and $D$ does not generate. We will arrange that $D$ also generates an infinite collection of numbers. Thus the conclusion will be that $D$ is not equal to any of the generators $G_{n}$. It is a new infinite generator that we left out of the collection.
Let’s assume that all the generators we consider have the above property: for each $n$, there is some $m \ge n$ that they eventually generate. Thus we are throwing away feeble generators that only generate a finite number of elements.
The construction of $D$ is simple. At each stage $n$ we will arrange that $D$ is different from $G_{n}$ by adding a certain element to $D$.
Stage $n$: Consider $G_{n}$. Suppose the current largest element in $D$ is $b$. Wait until $G_{n}$ outputs an $x > b$. Then put $x+1$ into $D$. This ends stage $n$. Now move on to stage $n+1$.
We claim that $D$ is not equal to any $G_{n}$. This is clear, since at stage $n$ we skip the element $x$ that $G_{n}$ outputs. Since we never put elements into $D$ that are smaller than the current largest, there is no way for $x$ to get into $D$ at some later stage. Also it is clear that $D$ outputs some element at each stage, so it is an infinite process as required. This proves, as claimed, that the collection is incomplete.
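The whole construction can be animated with ordinary Python generators. This is a sketch under the stated assumptions; the names `evens` and `diagonal` are mine:

```python
from itertools import count

def evens():
    # A sample generator: outputs 0, 2, 4, 6, ...
    for n in count(0):
        yield 2 * n

def diagonal(generators):
    """At stage n, wait until the n-th generator outputs some x larger than
    the current largest element of D, then put x + 1 into D. The skipped x
    can never enter D later, since all later elements exceed x + 1."""
    largest = -1
    for G in generators:
        for x in G:
            if x > largest:
                break
        largest = x + 1
        yield largest
```

Running `list(diagonal([evens(), evens(), evens()]))` produces `[1, 3, 5]`: at each stage the diagonal generator skips an even number that the corresponding generator output, so it differs from every copy of `evens`.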
I wonder about this generator argument. Could those who rejected infinite sets have found this argument years before the creation of set theory? Or does it need a notion of an unbounded computational process that wasn’t talked about until the late 1800s anyway? What do you think?
Note, we could even allow the generators themselves to be generated one at a time; this would avoid any actual infinity.
And whose theorem is it anyway?
Georg Cantor, Felix Bernstein, and Ernst Schröder are each famous for many things. But together they are famous for stating, trying to prove, and proving, a basic theorem about the cardinality of sets. Actually the first person to prove it was none of them. Cantor had stated it in 1887 in a sentence beginning, “One has the theorem that…,” without fanfare or proof. Richard Dedekind later that year wrote a proof—importantly one avoiding appeal to the axiom of choice (AC)—but neither published it nor told Cantor, and it wasn’t revealed until 1908. Then in 1895 Cantor deduced it from a statement he couldn’t prove that turned out to be equivalent to AC. The next year Schröder wrote a proof but it was wrong. Schröder found a correct proof in 1897, but Bernstein, then a 19 year old student in Cantor’s seminar, independently found a proof, and perhaps most important, Cantor himself communicated it to the International Congress of Mathematicians that year.
Today I want to go over proofs of this theorem that were written in the 1990s, not the 1890s.
Often Cantor or Schröder gets left out when naming it, but never Bernstein, and Dedekind never gets included. Steve Homer and Alan Selman say “Cantor-Bernstein Theorem” in their textbook, so let’s abbreviate it CBT. The theorem states:
Theorem: If there is an injective map from a set $A$ to a set $B$, and also an injective map from $B$ to $A$, then the sets $A$ and $B$ have the same cardinality.
Recall that a map $f: A \to B$ is injective provided $f(x) = f(y)$ implies that $x = y$, surjective if its range is all of $B$, and bijective if it is both injective and surjective. We can thank the great nonexistent French mathematician Nicolas Bourbaki for standardizing these terms, noting that sur is French for “on.” The definition of $A$ and $B$ having the same cardinality is that there exists a map from $A$ to $B$ that is bijective. All of this applies for sets of any cardinality, finite or infinite.
The key insight for me is to think about CBT as a theorem about directed bipartite graphs. This insight is due to Gyula König. He was a set theorist, but his son became a famous graph theorist. So perhaps this explains the insight. The ideas which follow are related to proofs in a 1994 paper by John Conway and Peter Doyle, which is also summarized here and relates to this 2000 note by Berthold Schweizer. We mentioned Doyle recently here.
A directed bipartite graph has two disjoint sets of vertices, the left side and the right side. All edges go from a left vertex to a right one or from a right vertex to a left one.
In a directed graph every vertex has an out-degree and an in-degree: the former is the number of edges leaving the vertex and the latter is the number of edges that enter the vertex.
In order to study CBT via graph theory we need to restrict our attention to a special type of directed bipartite graph. Say $G$ is an injective bipartite graph provided it is a directed bipartite graph with the following restrictions: (1) every vertex has out-degree exactly one, and (2) every vertex has in-degree at most one.
The claim is:
Theorem 2 Any injective directed bipartite graph has the same number of left and right vertices.
This theorem proves CBT. Let $f: A \to B$ and $g: B \to A$ be the injective maps as above, where we assume that $A$ and $B$ are disjoint. Then define $G$ as a directed bipartite graph with $A$ as the left vertices and $B$ as the right. There is an edge from each $a$ to $f(a)$ and an edge from each $b$ to $g(b)$. The graph is injective: property (1) follows since both $f$ and $g$ are functions, and property (2) follows since both $f$ and $g$ are injective.
From now on, let $G$ be an injective directed bipartite graph with left vertices $A$ and right vertices $B$. Note that every path in $G$ must alternate left and right vertices, because $G$ is bipartite.
The key concept is that of a maximal path. A maximal path in $G$ is a path that cannot be extended on either end. One simple example of a maximal path is a cycle: $x_{1} \to x_{2} \to \cdots \to x_{2m} \to x_{1}$.
However, in an infinite graph there can be other maximal paths. One is a two-way infinite path $\cdots \to x_{-2} \to x_{-1} \to x_{0} \to x_{1} \to x_{2} \to \cdots$
And the other is a one-way infinite path $x_{1} \to x_{2} \to x_{3} \to \cdots$
Here there is no edge into $x_{1}$. A simple but important observation is that two distinct maximal paths have no vertices in common.
A final basic observation is the following: Suppose that $A$ is partitioned into sets $A_{1}, A_{2}, A_{3}, \dots$ and $B$ is also partitioned into $B_{1}, B_{2}, B_{3}, \dots$ If for each index $i$ there is a bijection from $A_{i}$ to $B_{i}$, then $A$ and $B$ have the same cardinality. Let’s call this the partition trick.
I am trying to include even the simplest idea of the proof. Is this helpful, or am I being too detailed? You can skip the easy parts, but my experience is that people sometimes get hung up on the most basic steps of a proof. This is why I am including all the details.
Let’s prove the theorem for the case when the graph is finite.
Theorem 3 An injective directed finite bipartite graph has the same number of left and right vertices.
We had better be able to do this. We claim the following decomposition fact: the graph $G$ is the union of disjoint cycles. This follows since the only maximal paths in $G$ are cycles—this uses that $G$ is finite.
This proves the theorem, since each cycle has the same number of left vertices as right, and therefore each cycle has a bijection from left to right. By the partition trick the theorem is proved.
So far, pretty easy.
We now prove the theorem in the case that $G$ is infinite. By the partition trick we need only show that there is a bijection on each maximal path. This is clear for cycles—and it is the same as above even for a two-way infinite path: since the path alternates left and right vertices, we can just pair each left vertex with its out-neighbor to define the bijection. The case of a one-way infinite path is just barely harder. Let
$x_{1} \to x_{2} \to x_{3} \to \cdots$ be such a path. If $x_{1}$ is a left node, then we pair $x_{1}$ with $x_{2}$ to make the bijection go over the first edge in the path, then pair $x_{3}$ with $x_{4}$ similarly over the third edge, and so on. If $x_{1}$ is a right vertex, we instead pair each left vertex $x_{i}$, for even $i$, with the right vertex $x_{i-1}$, so that $x_{1}$ is matched too. Pretty easy too.
Even though the proof is about sets of any cardinality—as large as you like—it employs paths that are either finite or countable in length. This seems a bit strange—no? I have wondered whether this ability to work only with such paths can be exploited in some manner. I do not see how to use this fact. Oh well.
The countable case matters most in computability and complexity theory. John Myhill proved that if $f$ and $g$ are computable, then so is some bijection $h$. This at first seems strange—how can you recognize you are in an infinite cycle?—but it works this way in finite stages: At any odd stage, let $x$ be the least number for which $h(x)$ has not been defined. Let $y = f(x)$. If $y$ has not been listed as a value of $h$ at a previous stage, put $h(x) = y$. Else, we have already handled some $x_{1}$ such that $h(x_{1}) = y$. Then $x_{1} \neq x$ since we hadn’t handled $x$, and $f(x_{1}) \neq y$ since $f$ is injective. So put $y_{1} = f(x_{1})$. If $y_{1}$ has not already been listed as a value of $h$, then put $h(x) = y_{1}$. Else, we have already handled some $x_{2}$ such that $h(x_{2}) = y_{1}$, and we repeat with $y_{2} = f(x_{2})$ …
Eventually we must exhaust the finitely many values previously placed into the range of $h$, and the first new value becomes $h(x)$. Since $h(x)$ is a value of $f$, inductively we are also preserving the property that $x \in A$ if and only if $h(x) \in B$. Next, however, we need to do an even stage. Then we call $y$ the least number not already placed into the range of $h$, and consider $x = g(y)$. Then $x \in A$ if and only if $y \in B$. If $x$ is not already in the domain of $h$, we define $h(x) = y$. Else, we have already handled $x$ with some value $y_{1} = h(x)$, and inductively $g(y_{1}) \neq x$ since $g$ is injective and $y_{1} \neq y$, so we repeat with $x_{1} = g(y_{1})$. Eventually we obtain some $x_{k}$ not already in the domain of $h$, and define $h(x_{k}) = y$. This entire process computably defines both $h$ and $h^{-1}$ in “zipper” fashion for any $x$ or $y$, yielding a computable bijection of $\mathbb{N}$ that induces an isomorphism between $A$ and $B$.
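Here is a finite-stage Python sketch of the zipper. The encoding is my own: `f` and `g` are injections on the naturals given as Python functions, and `myhill_stages` builds a finite piece of the bijection `h` as a dict:

```python
from itertools import count

def myhill_stages(f, g, n_stages):
    """Back-and-forth construction: odd stages give the least unhandled x an
    image, chasing y = f(x), f(x1), ... past values already used; even stages
    give the least unused y a preimage, chasing x = g(y), g(y1), ... ."""
    h, h_inv = {}, {}
    for stage in range(n_stages):
        if stage % 2 == 0:            # an "odd stage" in the text: extend domain
            x = next(i for i in count() if i not in h)
            y = f(x)
            while y in h_inv:         # y already used: chase y1 = f(x1), ...
                y = f(h_inv[y])
            h[x], h_inv[y] = y, x
        else:                         # an "even stage": extend range
            y = next(j for j in count() if j not in h_inv)
            x = g(y)
            while x in h:             # x already handled: chase x1 = g(h(x)), ...
                x = g(h[x])
            h[x], h_inv[y] = y, x
    return h
```

For instance, with $f(x) = x+1$ and $g$ the identity, four stages produce the pairing $\{0 \mapsto 1,\ 1 \mapsto 0,\ 2 \mapsto 3,\ 3 \mapsto 2\}$, and both chases get exercised.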
In complexity theory, we want $h$ to be polynomial-time computable if $f$ and $g$ are. This is true provided $f$ and $g$ are length-increasing as well as polynomial-time invertible, but not by the same algorithm—given an $x$ we can’t go back and do all previous stages in polynomial time. Instead, the algorithm found by Leonard Berman and Juris Hartmanis alternates trying
$g^{-1}(x)$, then $f^{-1}(g^{-1}(x))$, then $g^{-1}(f^{-1}(g^{-1}(x)))$, and so on as far as possible, which must stop within length-of-$x$ steps because the length is always decreasing. If $g^{-1}$ fails first then $h(x) = f(x)$, else $h(x) = g^{-1}(x)$.
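The alternating-inverse idea can be sketched in Python on strings. This is a toy example of my own: `f` appends “0” and `g` appends “1”, so both are length-increasing and trivially invertible, with inverses returning `None` when undefined:

```python
def bh_map(f, g, f_inv, g_inv, x):
    """Follow the chain x, g_inv(x), f_inv(g_inv(x)), ... as far as possible;
    lengths strictly decrease, so this stops within length-of-x steps.
    If g_inv fails first, h(x) = f(x); if f_inv fails first, h(x) = g_inv(x)."""
    z, g_turn = x, True
    while True:
        w = g_inv(z) if g_turn else f_inv(z)
        if w is None:
            return f(x) if g_turn else g_inv(x)
        z, g_turn = w, not g_turn

# Toy length-increasing injections on binary strings (my own example).
f = lambda a: a + "0"
g = lambda b: b + "1"
f_inv = lambda s: s[:-1] if s.endswith("0") else None
g_inv = lambda s: s[:-1] if s.endswith("1") else None
```

On `x = ""` the chain stops immediately at `g_inv`, giving `h("") = f("") = "0"`; on `x = "1"` the inverse `f_inv` fails first, giving `h("1") = g_inv("1") = ""`.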
This is the second puzzle: why do the ideas have to be so different? Is there a common formulation that might be used with other levels and kinds of complexity? The Berman-Hartmanis proof does resemble the one-way-infinite path case more than it does Myhill’s proof.
Does this help in understanding the proof? There are many proofs of CBT on the web; perhaps this is a better version. Take a look.
[fixed "domain"->"range" in one place in Myhill proof; worked around WordPress bug with length-of-x for |x|.]
Karl Sundman was a Finnish mathematician who solved a major open problem in 1906. His solution would have been regarded as paradigm-“shifty” had it been a century or so earlier. It employed power series to represent the solutions of equations, namely the equations of the 3-body problem famously left unsolved by Isaac Newton. The tools of analysis needed to regard a convergent power series as defining a valid real number had been well accepted for a century, and the explicit giving of series and convergence proof even met the demands of mathematical constructivism of his day.
Today Ken and I want to explain why that problem is nevertheless still considered open, even though Sundman solved it over a hundred years ago.
This is a cautionary tale: For some problems there are solutions and there are solutions. There are solutions that make you famous and there are solutions that no one cares about. Unfortunately for Sundman his solution, which is correct mathematically, is the latter type; hence, his lack of fame—did you know about him? He did win honors from the top French and Swedish academies of science for this and other work in mechanics, and he has a Moon crater and an asteroid named for him.
His solution was generalized to any finite number of bodies in 1991 by Qiudong Wang of the University of Arizona, albeit with more restrictions on collisions and singularities than Sundman needed. All of this including Wang’s “beautiful paper” is surveyed in a great 1996 article for the Mathematical Intelligencer by Florin Diacu. We try to add a little more to what Diacu says from the perspective of our paradigms in computational complexity.
There is another aspect to this tale: Sometimes impossibility theorems are not ironclad. Sundman solved a problem that had previously been shown to be “impossible.” This is another lesson for us—beware claims that something is impossible. It may not be.
Perhaps his solution failed to bring him fame for two reasons. It went against what was thought to be impossible, and it was useless. That is, the solution, in modern terms, did not yield a practical algorithm.
Ken and I like this tale, even though it is about an old result on a non-theory problem, because it sheds light on problems that are dear to us in complexity theory. We have talked many times before about galactic algorithms—recall these are algorithms that are useless in practice. We have also talked before about impossibility results—we believe that some of these may not be ironclad. So what happened to Sundman over one hundred years ago is relevant to complexity theory today.
One more word about the following tale: Ken and I are not even amateurs in Sundman’s area of research so be careful to check what we say—another caution.
The problem Sundman solved concerns the motion of three bodies under the sole force of gravity. In more detail, the problem is taking an initial set of data that specifies the positions, masses, and velocities of three bodies at some particular point in time, and then determining the motions of the three bodies at a given future time. All of this must be in accord with the laws of Newtonian mechanics; relativity need not enter. It is an important problem: think sun, earth, and moon. The one-body problem is easy, while much of the two-body problem was solved by Johannes Kepler before Newton.
Simulating a system with $n$ bodies is not the issue. Subject to limitations of numerical precision and computing power, it is straightforward to apply the attractions and momenta of the bodies at each instant of time to calculate their positions at the next. Animations of particular $n$-body cases are readily available—see here for instance. The issue is representing the positions at an arbitrary future time $t$ by a formula in $t$, namely a “closed-form solution.”
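To show what the straightforward simulation looks like—as opposed to a closed-form solution—here is a toy Euler-step integrator in the plane, entirely my own sketch, with units chosen so that the gravitational constant is 1:

```python
def step(pos, vel, mass, dt):
    """One naive Euler step of n-body gravity in the plane (G = 1).
    pos, vel: lists of (x, y) tuples; mass: list of floats."""
    n = len(pos)
    acc = []
    for i in range(n):
        ax = ay = 0.0
        for j in range(n):
            if i == j:
                continue
            # Acceleration on body i from body j: m_j * (r_j - r_i) / |r|^3.
            dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
            r3 = (dx * dx + dy * dy) ** 1.5
            ax += mass[j] * dx / r3
            ay += mass[j] * dy / r3
        acc.append((ax, ay))
    new_vel = [(vx + ax * dt, vy + ay * dt)
               for (vx, vy), (ax, ay) in zip(vel, acc)]
    new_pos = [(x + vx * dt, y + vy * dt)
               for (x, y), (vx, vy) in zip(pos, new_vel)]
    return new_pos, new_vel
```

Starting two unit masses at rest at $(-1, 0)$ and $(1, 0)$, one step pulls them symmetrically toward each other; iterating the step is the simulation, but no formula for the position at time $t$ ever comes out of it.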
Getting a closed-form solution is what we commonly mean by “solving” parametric equations. At least that’s what Newton meant by solving when he posed the problem, and what others meant by offering prizes for solving it. Newton could not solve it. Nor could anyone else. Let’s look and see why the great Newton failed—what was the roadblock that left the problem open?
The obvious way to solve any problem is to look at the number of quantities that one needs to solve for, and then to get enough equations to allow a solution to be found. In linear systems this is easy: if there are $n$ unknowns, then in general one needs at least $n$ linear equations. Of course we may need more if the equations are not independent. The precise theorem is well known: to solve for $n$ quantities we need $n$ independent linear equations.
This can roughly be generalized to non-linear systems, like the three-body problem. A natural approach is to try to find additional equations that can be added to the basic equations of motion. If enough equations can be found, even though the system is nonlinear, then it could be solved. This was tried for years by some of the best mathematicians of the 19th century, and later rigorized in analytic geometry as we know it today.
Finally, in 1887 Ernst Bruns and Henri Poincaré showed that there were not enough extra equations. More precisely, there were not enough extra conserved quantities to yield enough equations. This leads to the “obvious” conclusion that there could be no analytic solution. For $n$ bodies there are $6n$ quantities and only $10$ equations that meet a natural constraint of defining quantities called “integrals”—that is, constants of motion that are algebraic functions of positions and velocities alone, or integrals of such functions. Thus in the three-body case with $n=3$ this yields $18$ variables and only $10$ equations. The theorem is:
Theorem: There are only ten integrals, algebraic with respect to the time, the positions, and the velocities of the bodies.
This is the “impossibility” result for the three-body problem. This barrier seems definitive because it concerns equations, and so trumps any issue of how real numbers are represented.
Yet Sundman was able to solve the three-body problem, by a completely different method. This shows that care must be used when discussing impossibility theorems. Bruns and Poincaré had really shown only that one kind of natural approach must fail. In modern terms this would be like showing that there is no fast algorithm of a special type for a certain problem. They did not rule out all possible algorithms. This is why Sundman’s result is not in contradiction with the above impossibility theorem.
We are not the best place to explain what Sundman did. For more details, see Diacu’s article, this 1995 MAA Chauvenet Prize survey by Donald Saari involving complex variables, and also this recent term paper by Peter Senchyna, who is currently a physics undergraduate at the University of Washington.
Roughly speaking, Sundman used an analytic method to get a power series for the three-body problem. The key issue then was: does the series converge for all time in the future? The problem is that the series will not converge for initial data that yields singularities. Sundman used some nontrivial results to show that for generic data he could avoid singularities. The trouble arises from the fact that he uses several transformations that make the series converge slowly. That is, getting the value to any useful precision requires so many terms that his solution is of little practical use. Or is it?
So why is Sundman’s result not famous? The answer seems to be that it is a galactic algorithm. He solves the problem by finding a convergent series that correctly predicts where the bodies will be in the future, but the convergence is galactically glacial. Our friends at Wikipedia state that to get reasonable accuracy one would need to sum on the order of $10^{8{,}000{,}000}$ terms. That is obviously way too many.
The point we want to make is simple:
Who cares if the series converges slowly, there may be a fast way to get the same result…?
I wish we could show that Sundman’s series could be sped up tremendously. We cannot. But we can show that there are many examples of a slow series that looks useless but can be summed quickly. For instance, consider the sum $\sum_{n=1}^{\infty} \frac{1}{n^{2}}.$
To get this to an error of, say, $10^{-10}$ by direct summation would require an astronomical number of terms—about $10^{10}$ of them. So should we give up? Of course not.
Leonhard Euler in 1735 proved the amazing result that $\sum_{n=1}^{\infty} \frac{1}{n^{2}} = \frac{\pi^{2}}{6}.$
This means that your laptop can almost instantly compute the sum to the required accuracy not by summing the series but by computing $\pi^{2}/6$. This is easy to do.
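A quick numerical check of the contrast (a sketch; the estimate that the tail of the series after $N$ terms is about $1/N$ is the standard integral bound):

```python
import math

# Summing 1/n^2 directly: after N terms the error is roughly 1/N,
# so each additional digit of accuracy costs ten times more terms.
N = 10**6
direct = sum(1.0 / (n * n) for n in range(1, N + 1))

# Euler's closed form gives the full answer instantly.
closed = math.pi ** 2 / 6
error = closed - direct   # about 1/N, despite summing a million terms
```

After a million terms the direct sum is still only accurate to about six digits, while evaluating $\pi^2/6$ costs essentially nothing.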
Getting the closed form for this series was called the Basel problem. It was first stated in 1644 by Pietro Mengoli, and worked on by many top mathematicians. It became a famous open problem of the day. Euler’s solution was a shock that made him immediately famous.
Of course the three-body series may not be able to be sped up in this way. But maybe it can. It might even be possible to transform it analytically into another series for which any decimal place of the answer can be computed locally, in the manner we have covered of “spigot algorithms” for $\pi$. This leads to two further observations:
Still, the naive remark that the series itself converges too slowly may be no barrier at all. There are many other examples of slow series that can be computed via various tricks. Perhaps the three-body case is one of these. The slow convergence is not an ironclad impossibility theorem—not even close.
The obvious problem is to look at the series for the three-body problem and see if we can apply methods to speed it up. Can we use any theory tricks to make the calculation go exponentially faster?
Jennifer Chayes is the current director of a research lab in Cambridge—that is Cambridge Massachusetts—for a company called Microsoft. She is famous for her own work in many areas of theory, including phase transitions of complex systems. She is also famous for her ability to create and manage research groups, which is a rare and wonderful skill.
Today Ken and I wish to talk about how to be “shifty” in algorithm design. There is nothing underhanded, but it’s a different playing field from what we grew up with.
Ken first noted the paradigm shift in 1984 when hearing Leslie Valiant’s British unveiling of his “Probably Approximately Correct” learning theory at the Royal Society in London. Last year Leslie wrote a book with this title. That’s about learning, while here we want to discuss the effect on algorithm design.
We illustrate this for a paper (ArXiv version) authored by Chayes and Christian Borgs, Michael Brautbar, and Shang-Hua Teng. It is titled, “Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation.” Since PageRank is a vital application, they don’t want their algorithms to be galactic. They trade off by not having the algorithms “solve” the problems the way we used to.
I, Dick, recall the “good old days of theory.” When I first started working in theory—a sort of double meaning—I could only use deterministic methods. I needed to get the exact answer, no approximations. I had to solve the problem that I was given—no changing the problem. Well sometimes I did that, but mostly I had to solve the problem that was presented to me.
In the good old days of theory, we got a problem, we worked on it, and sometimes we solved it. Nothing shifty, no changing the problem or modifying the goal. I actually like today better than the “good old days,” so I do not romanticize them.
One way to explain the notion of the good old days is to quote from a Monty Python skit about four Yorkshiremen talking about the good old days. We pick it up a few lines after one of them says, “I was happier then and I had nothin’. We used to live in this tiny old house with great big holes in the roof.” …
First Yorkshireman: You were lucky. We lived for three months in a paper bag in a septic tank. We used to have to get up at six in the morning, clean the paper bag, eat a crust of stale bread, go to work down t’ mill, fourteen hours a day, week-in week-out, for sixpence a week, and when we got home our Dad would thrash us to sleep wi’ his belt.
Second Yorkshireman: Luxury. We used to have to get out of the lake at six o’clock in the morning, clean the lake, eat a handful of ‘ot gravel, work twenty hour day at mill for tuppence a month, come home, and Dad would thrash us to sleep with a broken bottle, if we were lucky!
Third Yorkshireman: Well, of course, we had it tough. We used to ‘ave to get up out of shoebox at twelve o’clock at night and lick road clean wit’ tongue. We had two bits of cold gravel, worked twenty-four hours a day at mill for sixpence every four years, and when we got home our Dad would slice us in two wit’ bread knife.
Fourth Yorkshireman: Right. I had to get up in the morning at ten o’clock at night half an hour before I went to bed, drink a cup of sulphuric acid, work twenty-nine hours a day down mill, and pay mill owner for permission to come to work, and when we got home, our Dad and our mother would kill us and dance about on our graves singing Hallelujah.
: And you try and tell the young people of today that ….. they won’t believe you.
: They won’t!
Those were the days. I did sometimes feel like I worked twenty-nine hours a day. I was paid by Yale, but so little that perhaps it felt like I paid them. I never had to drink a cup of sulphuric acid—but the coffee—oh, you get the idea.
Now today, in the 21st century, we have a better way to attack problems. We change the problem, often to one that is more tractable and useful. In many situations solving the exact problem is not really what a practitioner needs. If computing X exactly requires too much time, then it is useless to compute it. A perfect example is the weather: computing tomorrow’s weather in a week’s time is clearly not very useful.
The brilliance of the current approach is that we can change the problem. There are at least two major ways to do this:
Change the answer required. Allow approximation, or allow a partial answer. Do not insist on an exact answer.
Change the algorithmic method. Allow algorithms that can be wrong, or allow algorithms that use randomness. Do not insist that the algorithm is a perfect deterministic one.
This is exactly what Chayes and her co-authors have done. So let’s take a look at what they do in their paper.
In their paper they study PageRank, which is the definition and algorithm made famous by Google. It gives a way to rank webpages in response to a query, supplementing criteria drawn from the query itself. An old query-specific criterion was the number of matches to a keyword in the query. Rather than rank solely by this count, PageRank emphasizes a general page score. The score is sometimes interpreted as a measure of “popularity” or “authority,” leading to the following circular-seeming definitions:
A webpage is popular if it has a healthy number of links from popular pages.
A webpage is authoritative if it is well cited, especially by other authoritative pages.
What the PageRank score actually denotes mathematically is the likelihood that a person randomly traversing links will arrive at any particular page. This includes a frequency $\alpha$ with which the person will stop clicking, do something healthier like ride a bicycle, and start again on a “random” webpage.
The situation can be modeled by the classic random walk on a directed graph. We have a graph on $n$ nodes and an $n \times n$ matrix $M$ that is row-stochastic, meaning the entries in each row are non-negative and sum to $1$. Given that the web-walker is at node $i$, the entry $M[i,j]$ is the probability of going next to node $j$. If node $i$ has out-degree $d$, then

$$M[i,j] = \begin{cases} \dfrac{1-\alpha}{d} + \dfrac{\alpha}{n} & \text{if } (i,j) \text{ is an edge,}\\[1ex] \dfrac{\alpha}{n} & \text{otherwise,}\end{cases}$$

where $\alpha$ is the small probability of “teleporting” to a uniformly random page.
We can tweak this e.g. by modeling the user hitting the “Back” button on the browser, or jumping to another browser tab, or using a search engine. We could also set $\alpha$ higher in case page $i$ has few or no outgoing links. We still get an $M$, and since the use of $\alpha$ effectively makes the graph strongly connected and averts certain pathologies, we get a beautiful conclusion from random-walk theory: There is a unique stationary distribution $p$, which is the unique left-eigenvector for the largest eigenvalue, which as normalized above is $1$:

$$p = pM.$$
Then the PageRank of node $j$ is $p_j$. It is remarkable that this simple, salient idea from the good old days works so well. A further fact from the theory (and use of $\alpha$) is that if you start at any node, in the long run you will find yourself on page $j$ with frequency $p_j$. Here is Wikipedia’s graphical example:
The issue is: how to compute $p$? In the good old days this was a trivial problem—just use linear algebra. But now the issue is that $n$ is really big, let alone $n^2$ being unspeakably big. The $n$ is too big even to get an approximation via the “further fact,” that is by simulating a random walk on the whole graph, and classical sparse-matrix methods might only help a little. This is where Chayes and company change the game: let us care about computing $p_j$ only for some $j$’s, and even then, let us be content with fairly rough approximation.
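On a toy graph, the good-old-days linear-algebra approach is just a few lines of power iteration. The sketch below is ours, not from the paper: the graph, the teleport value $\alpha = 0.15$, and the helper names are illustrative choices.

```python
# Sketch: classical ("good old days") PageRank by power iteration.
# The toy graph, teleport probability, and names are illustrative.

def build_matrix(edges, n, alpha=0.15):
    """Row-stochastic M: follow a random out-link with prob. 1-alpha,
    teleport to a uniformly random page with prob. alpha."""
    M = [[alpha / n] * n for _ in range(n)]
    out = [[] for _ in range(n)]
    for i, j in edges:
        out[i].append(j)
    for i in range(n):
        if out[i]:
            for j in out[i]:
                M[i][j] += (1 - alpha) / len(out[i])
        else:                       # dangling page: jump anywhere
            M[i] = [1.0 / n] * n
    return M

def pagerank(M, iters=200):
    """Power iteration: repeatedly apply p <- p M."""
    n = len(M)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]
    return p

# A 4-cycle is symmetric, so every page gets PageRank 1/4.
M = build_matrix([(0, 1), (1, 2), (2, 3), (3, 0)], 4)
p = pagerank(M)
```

Each iteration costs on the order of $n^2$ here, which is exactly what becomes hopeless at web scale.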
The approximation to PageRank is called SignificantPageRank. The paper gives a randomized algorithm that solves the following problem.
Let us be given a graph. Then, given a target threshold $\Delta$ and an approximation factor $c > 1$, we are asked to output a set $S$ of nodes such that with high probability, $S$ contains all nodes of PageRank at least $\Delta$, and no node of PageRank smaller than $\Delta/c$.
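To see what the promise buys, here is a brute-force validator of a claimed answer set $S$ against exact PageRank values. The helper and its name are our own illustration, not part of the paper.

```python
# Sketch: validate a claimed SignificantPageRank answer S against
# exact PageRank values p (illustrative helper, not from the paper).

def valid_output(p, S, Delta, c):
    """S must contain every node with PageRank >= Delta and
    no node with PageRank < Delta / c."""
    S = set(S)
    must_have = {j for j, pj in enumerate(p) if pj >= Delta}
    forbidden = {j for j, pj in enumerate(p) if pj < Delta / c}
    return must_have <= S and not (S & forbidden)
```

Nodes with PageRank between $\Delta/c$ and $\Delta$ may go either way; that slack is exactly what the randomized algorithm exploits.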
This is a perfect example of the shift. The algorithm is randomized, and the problem is not to find exactly the nodes above a given PageRank, but rather those that are not too far below it.
The nifty point is that the algorithm can tolerate fuzzing the matrix $M$, in a manner called SARA for “sparse and approximate row access”:
Given $i$ and a precision $\epsilon > 0$, return a set $C$ of columns and values $v_j$ for $j \in C$ such that for all $j$:

- $j \in C \implies |v_j - M[i,j]| \le \epsilon$, and
- $j \notin C \implies M[i,j] \le \epsilon$.

It is important to use this for different values of $\epsilon$. The cost of a query is $O(1/\epsilon)$, which squares with the row summing to $1$ and hence having at most $1/\epsilon$ entries above $\epsilon$.
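A toy simulation of the SARA interface makes the two conditions concrete. We cheat by secretly holding the full row just to play the oracle; the rounding models the “approximate” part, and all names are ours.

```python
# Sketch: a "sparse and approximate row access" (SARA) oracle,
# simulated from a fully known row (names and rounding are ours).

def sara(row, eps):
    """Return {j: v_j} with |v_j - row[j]| <= eps for reported j,
    while every unreported j has row[j] <= eps.  Since the row sums
    to 1, at most 1/eps entries can exceed eps, matching the
    O(1/eps) query cost."""
    return {j: round(v / eps) * eps
            for j, v in enumerate(row) if v > eps}

row = [0.7, 0.2, 0.05, 0.05]   # one row of M, summing to 1
approx = sara(row, 0.1)        # reports only the two large entries
```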
If we picture “$n$” as “exponential,” say $n = 2^m$, and take $\epsilon = 1/\mathrm{poly}(m)$, then this becomes an approximative version of $M$ being succinct, which we just talked about. In this scaling of $\epsilon$ we are effectively limiting $C$ to a local portion of the graph around node $i$. Since we also have $\alpha/n \ll \epsilon$, under SARA entries outside $C$ would become effectively zero, so that the chance of “teleporting” outside $C$ on the whole would be regarded as negligible. In fact the paper also researches the case where each Web user always starts afresh at a “home node” $u$ in that portion, making the teleport target be $u$ just for that user. Then the $\alpha$-related probability is not negligible, and the resulting user-dependent estimate is called PersonalizedPageRank.
The problem they need to solve for SignificantPageRank then becomes “SignificantColumnSums”:
Given $M$ and $\Delta, c$ as above, find a set $C$ of columns such that for all columns $j$:

- $\sum_i M[i,j] \ge \Delta \implies j \in C$;
- $j \in C \implies \sum_i M[i,j] > \Delta/c$.
An even simpler problem which they use as a stepping-stone is “VectorSum”:
Given a length-$n$ vector $x$ with entries in $[0,1]$, and $\Delta$ and $c$ as above:

- output yes if $\sum_j x_j \ge \Delta$;
- output no if $\sum_j x_j \le \Delta/c$, don’t-care otherwise.
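The promise can be phrased as a three-valued reference decision. The exhaustive version below (our naming) is of course the very thing the sublinear algorithm must avoid computing:

```python
# Sketch: the VectorSum promise as an exhaustive reference decision
# (clearly not sublinear; the function name is ours).

def vector_sum_decision(x, Delta, c):
    s = sum(x)              # reads every entry -- the luxury a
    if s >= Delta:          # sublinear algorithm cannot afford
        return "yes"
    if s <= Delta / c:
        return "no"
    return "either"         # inside the promise gap: any answer is OK
```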
The goal is always to avoid looking at all nodes or entries, but only an $\tilde{O}(n/\Delta)$ or so portion of them, where $\Delta$ is large, say $\Delta = \sqrt{n}$. Thus the problem shift is necessitated by $n$ being huge. This isn’t my good-old-days idea of solving a problem, but can be called “$(\Delta,c)$-solving” it.
Ken and I are interested because these are similar to problems we have been encountering in our quest for more cases where quantum algorithms can be simulated in classical randomized polynomial time. Thus any new ideas are appreciated, and what catches our eye is a multi-level approximation scheme that exploits the requirement of SARA to work for different values of $\epsilon$. The situations are different, but we hope to adapt their ideas.
The situation for VectorSum is that a probe of an entry still costs order-of $1/\epsilon$, and returns $0$ unless $x_j \ge \epsilon$. A simple-minded use of a set $R$ of $r$ random probes with the same $\epsilon$ would yield the estimate

$$X = \frac{n}{r}\sum_{j \in R} x_j.$$
The resulting error has order $\epsilon n$, so we need $\epsilon \approx \Delta/n$, which is rather demanding. Indeed the total cost would have order $r/\epsilon = rn/\Delta$, where $r$ needs to be so large as to kill any hope of making the cost $\tilde{O}(n/\Delta)$ or even $o(n)$. In the pivotal case where $\Delta = \sqrt{n}$, we would need $\epsilon \approx 1/\sqrt{n}$, incurring cost on the order of $r\sqrt{n}$.
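A quick simulation shows the single-precision estimate $X$ and its cost bind. The probe model (a probe sees nothing below $\epsilon$ and costs $1/\epsilon$) follows the discussion above; the parameters and names are our illustrative assumptions.

```python
import random

# Sketch: single-precision sampling for VectorSum.  Each probe costs
# ~1/eps and sees nothing below eps, so driving the eps*n error term
# below Delta forces eps ~ Delta/n, and total cost r/eps ~ r*n/Delta.

def naive_estimate(x, r, eps, rng):
    n = len(x)
    total = 0.0
    for _ in range(r):
        j = rng.randrange(n)
        total += x[j] if x[j] >= eps else 0.0  # sub-eps mass is invisible
    return n * total / r                       # estimate of sum(x)

def probe_cost(r, eps):
    return r / eps                             # r probes at ~1/eps each

rng = random.Random(0)
x = [1.0] * 10 + [0.0] * 90                    # true sum is 10
est = naive_estimate(x, 2000, 0.5, rng)
```

Even on this tiny example the cost $r/\epsilon$ dwarfs $n$; the multi-scale trick is precisely about escaping that bind.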
However, they show that by using a different precision for each random probe, they can get acceptable error with a reasonably small number of probes. The case where we have the finest precision $\epsilon \approx \Delta/n$ occurs only once, so its cost can be tolerated. Other probes have smaller cost, and while their precisions are looser, the aggregate precision of the estimate becomes good enough for the following result:
Theorem 1 Given $x$, $\Delta$, and $c$ as above and a confidence parameter $\delta > 0$, VectorSum can be $(\Delta,c)$-solved with probability at least $1-\delta$ and cost

$$\tilde{O}\!\left(\frac{n}{\Delta}\right),$$

where the $\tilde{O}$ hides factors polylogarithmic in $n$ and $1/\delta$ and polynomial in $c/(c-1)$.
Well in the good old days before LaTeX we wouldn’t even have been easily able to write such a formula with a typewriter, let alone prove it. But it is certainly better than $n$, and allows taking $\Delta = \sqrt{n}$ to meet the goal of runtime $\tilde{O}(\sqrt{n})$. As usual, for the details on SignificantColumnSums and the application problems, see the paper.
Do you miss the good old days? Or do you like the current approaches? What shifts, what new types of changing the goals, might we see in the future? For clearly today will one day be the “good old days.”
[changed qualifier on “certain pathologies”; may as well footnote here that the “Back” button creates a Markov chain with memory; changed intro to say “British unveiling” of PAC learning.]