Dick Lipton is of course the founder and driving writer of this weblog. He is also a computer scientist with a great record of pathbreaking research. The latter has just been recognized, I am delighted and proud to say, with the award of the 2014 Knuth Prize. The prize is awarded jointly by the ACM Special Interest Group on Algorithms and Computation Theory and the IEEE Technical Committee on the Mathematical Foundations of Computing, and was instituted in 1996, shortly after the formal retirement of the great—and very much active—Donald Knuth.
Today I am glad to give my congratulations in public, and also my thanks for a wonderful and long association.
Only in reading around while writing this post did I learn that Dick won another major honor earlier this year: election to the American Academy of Arts and Sciences. The end of April was crazy with personal and unusual professional matters for both of us, and it didn’t come up. In the 2014 list for CS he is alongside Jennifer Chayes, Ron Fagin, David Harel, Daphne Koller, Leslie Lamport, Leonid Levin, and Mendel Rosenblum.
The Knuth Prize is “for outstanding contributions to the foundations of computer science.” The description also says it may be partly based on educational contributions such as textbooks and students, but “is not based on service work for the community, although service might be included in the citation for a winner if it is appropriate.” The ACM release does not mention this blog. So just for now I will imitate Basil Fawlty’s advice in a famous episode of the immortal 1970’s BBC TV sitcom “Fawlty Towers”:
Don’t mention the blog.
But I will mention foundations, since that is an aspect of the recognition that comes with a price but is a longstanding purpose of this—wait, don’t mention the…
The release includes some of Dick’s major contributions, and some others have been noted on StackExchange. I can add the first work by Dick that I was aware of: two papers in 1979–80 with Rich DeMillo—also now at Georgia Tech—which were part of various efforts to answer questions about P versus NP via logic and model theory. There is also his involvement with time-space tradeoffs for SAT, which represent the best lower bounds—well, in a broad sense, kind-of lower bounds—known for any major NP-complete problem. Most recent and potentially important in my purview—but you might have to comb through his DBLP page to realize they’re there—are papers on multivariable polynomials modulo composite numbers, which are at the frontier of circuit complexity lower bounds.
What generates all this is a powerful beacon of ideas that have had a photoelectric effect on many parts of computer science, more than a typical theoretician encounters. I should know—it has been my privilege to do radiographic processing for many more, even some that haven’t yet appeared on this—oops—anyway, more ideas than I can often keep up with. Not to mention that I’m also pursuing (and opening) some ideas of my own. The point is that these ideas are almost all aimed at foundations. The examples above are aimed at foundations of complexity theory. That goes double for some ideas that haven’t progressed well, at least not yet.
The problem—with the P versus NP problem and much more—is that the foundations of complexity are set in sturdy concrete. They basically haven’t budged. Beams of ideas bounce off or just get absorbed. Major related parts of algorithms and crypto and other fields are likewise shielded. This is the polar opposite of the situation Bill Thurston recounted about his field of manifold foliations in his famous essay, “On Proof and Progress in Mathematics.”
The difficulty of direct progress has been so discouraging that through the theory applications grapevine I’ve heard whispers that one can phrase as, “Don’t mention foundations…” We tried to present aspects of this more positively in a recent post featuring Chayes.
However, if you aim a beam hot enough and keep it focused for long enough, events do happen. The question should not be whether it’s worth aiming the beam, but rather what’s needed to establish and maintain the intensity and focus. What we suspect is that just as in particle physics, a century on from when the likes of Joseph Thomson and Ernest Rutherford and Robert Millikan could work alone or with one associate or two, progress will require greater co-ordination of multiple researchers and shared focus on ideas than our field is yet accustomed to. Hence this…well, we can mention that various people we know and admire are trying to foster this in their own ways, not just proposed but some activated to a wonderful degree.
As I said, aiming at foundations has a price—and the payment is alas more evenly distributed among those who try than the rewards. That’s why it is good to remember that every so often, in small as well as big, a prize comes with the price. The above, too, has an added price, but perhaps it will be ushered in with a new model for appreciating the credits.
The Knuth Prize officially has the unusual period of being awarded every 1-1/2 years. The last five awards are dated 2010, 2011, 2012, 2013, and now 2014. Can this seeming contradiction be resolved mathematically? Perhaps the “1-1/2” is governed by left-and-right approximands in the manner of John Conway’s surreal numbers, which Knuth described in an engaging book. A left-bound of 1.25 would just barely allow it. However, I suspect that any such involvement by Knuth would have instead fixed the period at a specific number of days, and then I don’t see a solution…unless maybe arithmetic is inconsistent after all.
Of course we congratulate Dick without any inconsistency.
Things we did not know
[Ulam holding a strange device]
Stanislaw Ulam was a Polish-American mathematician whose work spanned many areas of both continuous and discrete mathematics. He did pioneering research in chaos theory and Monte Carlo algorithms, and also invented the concept of a measurable cardinal in set theory. His essential modification of Edward Teller’s original H-bomb design is used by nearly all the world’s thermonuclear weapons, and he co-originated the Graph Reconstruction conjecture. His name is also associated with the equally notorious 3n+1 conjecture. Thus he was involved in some strange corners of math.
Today Ken and I want to talk about some strange facts observed by Ulam and others that we did not know or fully appreciate.
Perhaps you can use them, perhaps you may enjoy them, but they are all kind of fun. At least we think so. Ulam’s autobiography Adventures of a Mathematician shows his sense of fun, and he was described by Gian-Carlo Rota in these words:
Ulam’s mind is a repository of thousands of stories, tales, jokes, epigrams, remarks, puzzles, tongue-twisters, footnotes, conclusions, slogans, formulas, diagrams, quotations, limericks, summaries, quips, epitaphs, and headlines. [H]e simply pulls out of his mind the fifty-odd relevant items, and presents them in linear succession. A second-order memory prevents him from repeating himself too often before the same public.
There is another reason for discussing this today. Class is underway and we both are trying to get things going, and we wanted to start a discussion. We are working on a longer, more technical post, about—“that would be telling”—so let us do this now. Note: fifty extra points for where that phrase comes from—without a search.
Even before we come to Ulam’s famous observation about the pattern of prime numbers in a spiral, some things strike us as strange about spirals. Here we mean the simple spiral walk pattern starting from an origin square in the infinite planar lattice:
We can begin with any number in place of 1—thus if we begin with k this is the same as adding k − 1 to all the numbers. One basic fact is that if we take any ray from the origin that goes through the center of another cell, then the nth number skewered along that ray is given by a quadratic function of n. For instance, the horizontal ray is given by 4n^2 − 3n + 1, the northeast ray by 4n^2 − 2n + 1, and the ray of north-northeast knight’s moves by a similar quadratic. Similar rays originating from other squares also have quadratic formulas.
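These claims are easy to check in code; here is a sketch that builds the spiral under the common convention (1 at the origin, 2 one step east, counterclockwise turns)—the exact quadratics assume that orientation:

```python
def ulam_spiral(radius):
    # Build the spiral, assuming the common orientation: 1 at the origin,
    # 2 one step east, then counterclockwise turns.
    pos, x, y, n = {}, 0, 0, 1
    dirs = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # east, north, west, south
    d, step = 0, 1
    while step <= 2 * radius + 2:
        for _ in range(2):            # each run length is used twice
            for _ in range(step):
                if abs(x) <= radius and abs(y) <= radius:
                    pos[(x, y)] = n
                x, y = x + dirs[d][0], y + dirs[d][1]
                n += 1
            d = (d + 1) % 4
        step += 1
    return pos

pos = ulam_spiral(10)
assert all(pos[(n, 0)] == 4*n*n - 3*n + 1 for n in range(10))  # east ray
assert all(pos[(n, n)] == 4*n*n - 2*n + 1 for n in range(10))  # northeast ray
# The full southwest-northeast diagonal, sorted, follows n^2 + n + 1:
diagonal = sorted(pos[(i, i)] for i in range(-7, 8))
assert diagonal == [n*n + n + 1 for n in range(15)]
```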
If we sequence the numbers that fall on full lines rather than rays, however, we do not get a quadratic function. For instance, the numbers that fall on the horizontal axis in order are 1, 2, 6, 11, 19, 28, 40, 53, 69, … They do not fit any polynomial formula at all—one has to do integer division by 2 (throwing away remainders) to get a formula. Indeed, this sequence doesn’t have an entry in OEIS, the Online Encyclopedia of Integer Sequences. [Update: see this.] Likewise, the northwest-southeast line has all the odd squares but no nice formula. Nor does the line of knight’s moves.
But there is a singular exception. The southwest-northeast diagonal gives 1, 3, 7, 13, 21, 31, 43, 57, …, which has the formula n^2 + n + 1. Why does this happen?
If we add 40 to this sequence, we get: 41, 43, 47, 53, 61, 71, 83, 97, …
Each of these numbers is prime. The sequence n^2 + n + 41 is all prime until you substitute n = 40 to get 1681, which is 41^2. Of course, this is the famous prime-generating formula of Leonhard Euler. The extreme good fortune of this formula has been “explained” using class-number theory, but we did not notice the strange diagonal exception until writing this post.
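Euler’s polynomial and its first failure can be verified directly; a quick sketch:

```python
def is_prime(m):
    # Trial division; fine for numbers this small.
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

values = [n*n + n + 41 for n in range(41)]
assert all(is_prime(v) for v in values[:40])  # n = 0..39: all prime
assert not is_prime(values[40])               # n = 40 fails
assert values[40] == 41 * 41                  # ... giving 1681 = 41^2
```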
The story goes that Ulam became bored during a long lecture in 1963, doodled a number spiral like the one pictured above, and circled the primes to see what the plot would look like. He was struck by how many substantially long diagonal segments of primes there were, going northwest as well as northeast.
This also happens if we start the spiral with numbers other than 1. Here is a plot for a different starting value from Helgi Rudd’s beautiful website devoted to Ulam’s spiral:
Of course, starting with 41 gives Euler’s sequence as a huge diagonal swath through the origin, but as Wikipedia’s article notes, the effects of formulas like Euler’s show up on other diagonals even when the spiral starts with 1.
Despite connections noted in both sources to earlier theorems and conjectures about quadratic polynomials, it is still considered that this affinity for diagonals has not been sufficiently “explained.” The larger picture that strikes us the most is given immediately on the front page of Rudd’s site. Many unsolved problems in number theory, including the Riemann Hypothesis itself, involve the idea of how much the primes behave like a “random” sequence. This supports belief in the Goldbach Conjecture and the Twin Primes Conjecture, with probabilistic reasoning like Freeman Dyson’s in our previous post.
Thus the prime pictures exhibit some of their non-random structure, along lines of more-extreme deficiencies shown by George Marsaglia for some 1960s-vintage pseudorandom generators. How the primes can be random and non-random at the same time relates to that other post we’re working on, but “we’re not telling.”
In 1947, Nathan Fine proved that almost all binomial coefficients are even. This seems strange to me. More precisely, let f(N) be the number of odd binomial coefficients C(m, k) with 0 ≤ k ≤ m < N. Then f(N) = o(N^2). Even stronger: f(N) = Θ(N^c) with c = log_2 3 ≈ 1.585.
See this for some related facts.
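One handle on the count (a sketch in our own notation f(N)) is Lucas’ theorem: C(m, k) is odd exactly when the binary digits of k form a submask of those of m, so row m of Pascal’s triangle has 2^popcount(m) odd entries:

```python
from math import comb

def odd_in_row(m):
    # Lucas' theorem: C(m, k) is odd iff k's binary digits are a submask of m,
    # so row m has exactly 2^popcount(m) odd entries.
    return 2 ** bin(m).count("1")

def f(N):
    return sum(odd_in_row(m) for m in range(N))

# Sanity check against direct computation for small N:
assert f(20) == sum(1 for m in range(20) for k in range(m + 1) if comb(m, k) % 2 == 1)

# f(2^t) = 3^t exactly, versus about 4^t / 2 coefficients in all, so the
# odd fraction decays like (3/4)^t: almost all coefficients are even.
for t in range(1, 12):
    assert f(2**t) == 3**t
```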
This one comes from here.
Let’s say you invest $10 in the market and you make a 10 percent return. You now have $11. Now, let’s say you lose 10 percent. Out of $11, that’s $1.10, leaving you with $9.90, which means you are down ten cents on the deal. You gained the same percentage as you lost, yet you came out behind.
Well, you might speculate it has to do with the order of the transaction. After all, the 10 percent you lost was bigger than the 10 percent you gained because you were already up on the deal. That means reversing the order should have the opposite effect. Right?
Start with $10. Now lose 10 percent first. You have nine dollars. Then gain ten percent. That’s 90 cents leaving you with…$9.90.
Yep. You lost money again.
Strange as it may seem, a gain and a loss of the exact same percentage will always leave you with less cash – regardless of the order in which they occur.
I wonder if we can use this simple but strange fact in some algorithm analysis?
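The arithmetic behind it is just a difference of squares: a gain and a loss of the same fraction p multiply to (1 + p)(1 − p) = 1 − p², whatever the order. A quick sketch:

```python
from math import isclose

start = 10.00
up_down = start * 1.10 * 0.90   # gain 10 percent, then lose 10 percent
down_up = start * 0.90 * 1.10   # lose 10 percent, then gain 10 percent
assert isclose(up_down, 9.90) and isclose(down_up, 9.90)

# In general (1 + p) * (1 - p) = 1 - p*p < 1 for any p > 0, so the order
# never matters and the round trip is always a net loss.
for p in (0.01, 0.10, 0.25, 0.50):
    assert isclose(start * (1 + p) * (1 - p), start * (1 - p * p))
    assert start * (1 + p) * (1 - p) < start
```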
Suppose you take the average of the numbers of friends each of your friends has, and compare that with your own number of friends. Chances are the former will be significantly higher than the latter. This leaves a lot of people wondering: why are my friends more popular, more vivacious, and more successful than I am?
This phenomenon has been well documented on Facebook and in other contexts where “friend” is rigorously quantified. It is ascribed to a 1991 paper (titled as above) by the sociologist Scott Feld, but Ken and I would be surprised if it hadn’t already been noted in connection with the collaboration graph of mathematicians. Ken recalls a talk given by Donald Knuth on properties of this graph 30 years ago at Oxford, a few years after a paper by Tom Odda that also cites studies by Ron Graham and others, but has no recollection of this being mentioned.
It is easy to explain in those terms. Think of Paul Erdős, who famously had hundreds of collaborators. Many people were touched by him. But each of them had averages that included Erdős, which alone was probably enough to boost their average over their own valence in the graph. More generally, let us assign to every node v a “collaborativeness potential” p(v), and consider various ways to generate random graphs in which the probability of an edge uv depends on p(u) and p(v). For instance, every node u might pick a set S of 100 potential co-authors at random, with each edge placed with probability proportional to p(u)p(v). Potential connections within S will have a higher chance of succeeding with those v having high p(v), who in turn will expect to have more successful connections.
The result is shown with more rigor by Steven Strogatz in this New York Times column. Still, we find it strange as a raw fact of life.
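One standard way to quantify this (our own sketch, not Feld’s or Strogatz’s exact derivation): sampling a random friend means sampling a random endpoint of a random edge, so a person with d friends is counted d times. The average degree seen this way is Σd²/Σd, which by Cauchy–Schwarz is never below the plain average Σd/n:

```python
import random

def mean_degree(degrees):
    return sum(degrees) / len(degrees)

def mean_friend_degree(degrees):
    # Degree of a uniformly random end of a uniformly random edge:
    # a person with d friends is counted d times.
    return sum(d * d for d in degrees) / sum(degrees)

# A star: one "Erdos" with 5 collaborators, each having only him.
star = [5, 1, 1, 1, 1, 1]
assert mean_friend_degree(star) == 3.0      # (25 + 5) / 10
assert mean_degree(star) == 10 / 6          # about 1.67

# Cauchy-Schwarz: sum(d^2)/sum(d) >= sum(d)/n for any degree sequence,
# with equality only when all degrees are equal.
random.seed(1)
for _ in range(100):
    degs = [random.randint(1, 20) for _ in range(30)]
    assert mean_friend_degree(degs) >= mean_degree(degs) - 1e-12
```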
What is your favorite strange fact?
[fixed to f(N) = o(N^2) for the binomial coefficients; added update about the x-axis of the Ulam spiral]
Freeman Dyson celebrated his 90th birthday last December. He is world famous for his work in both physics and mathematics. Dyson proved, in work that was joint with Andrew Lenard and independent of two others, that the main reason a stack of children’s blocks doesn’t coalesce into pulp is the exclusion principle of quantum mechanics opposing gravity. He shaved a factor of √2 off the exponent for bounds on rational approximation of algebraic irrationals, before the result was given its best-known form by Klaus Roth. He has received many honors—recently, in 2012, he was awarded the Henri Poincaré Prize at the meeting of the International Mathematical Physics Congress.
Today Ken and I want to talk about one of his relatively recent ideas, which is more mathematics than physics. Perhaps even more theory than mathematics.
It is about an interesting challenge he made, and its significance for knowledge in general. In his popular books on science and public policy, Dyson has issued many challenges to the scientific community and society at large, daring to “disturb the universe” as one of his books is titled. In this challenge he disturbs mathematics—at least we feel that way. Let’s look at it now.
Dyson was one of over a hundred invited respondents to the 2005 edition of the “EDGE Annual Question”:
“What Do You Believe Is True Even Though You Cannot Prove It?”
His son, science historian George Dyson—two of whose books we covered—was also a respondent, and gave a social example that would have interested Ken and his wife Debbie on their trip to Vancouver this July. George suspects that well-documented differences between calls of ravens in different parts of the Pacific Northwest influenced First Nations languages in these regions. His father, however, kept it strictly mathematical, aiming for the most needling contrast between what we believe and what we can prove.
Ken may have his own answer; mine would have been more like: Fermat’s Last Theorem or the Four Color Theorem, since in both cases I believe the results, but I cannot prove them.
Dyson answered as follows:
“Since I am a mathematician, I give a precise answer to this question. Thanks to Kurt Gödel, we know that there are true mathematical statements that cannot be proved. But I want a little more than this. I want a statement that is true, unprovable, and simple enough to be understood by people who are not mathematicians. Here it is.
Numbers that are exact powers of two are 2, 4, 8, 16, 32, 64, 128 and so on. Numbers that are exact powers of five are 5, 25, 125, 625 and so on. Given any number such as 131072 (which happens to be a power of two), the reverse of it is 270131, with the same digits taken in the opposite order. Now my statement is: it never happens that the reverse of a power of two is a power of five.
The digits in a big power of two seem to occur in a random way without any regular pattern. If it ever happened that the reverse of a power of two was a power of five, this would be an unlikely accident, and the chance of it happening grows rapidly smaller as the numbers grow bigger. If we assume that the digits occur at random, then the chance of the accident happening for any power of two greater than a billion is less than one in a billion. It is easy to check that it does not happen for powers of two smaller than a billion. So the chance that it ever happens at all is less than one in a billion. That is why I believe the statement is true.
But the assumption that digits in a big power of two occur at random also implies that the statement is unprovable. Any proof of the statement would have to be based on some non-random property of the digits. The assumption of randomness means that the statement is true just because the odds are in its favor. It cannot be proved because there is no deep mathematical reason why it has to be true. (Note for experts: this argument does not work if we use powers of three instead of powers of five. In that case the statement is easy to prove because the reverse of a number divisible by three is also divisible by three. Divisibility by three happens to be a non-random property of the digits).
It is easy to find other examples of statements that are likely to be true but unprovable. The essential trick is to find an infinite sequence of events, each of which might happen by accident, but with a small total probability for even one of them happening. Then the statement that none of the events ever happens is probably true but cannot be proved.”
See this and this for some discussion about his answer. Of course there is the general issue of his “other examples,” but can we tackle this particular one head-on?
One can easily try numbers of the form 2^n for modest n to check that Dyson’s intuition is correct. We wonder if it is possible to at least check his intuition for an n of astronomical size.
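Dyson notes the check for powers of two below a billion; a few lines push it to 2^100 (a sketch; the bound 10^40 comfortably covers every reversed value in range):

```python
# Collect powers of five large enough to cover any reversal of 2^k, k <= 100.
powers_of_five = set()
p = 5
while p < 10**40:
    powers_of_five.add(p)
    p *= 5

# No power of two up to 2^100 reverses to a power of five.
for k in range(1, 101):
    reversed_value = int(str(2**k)[::-1])
    assert reversed_value not in powers_of_five
```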
Our challenge is:
Is there an efficient algorithm that, given an n, determines whether the reversal of 2^n as a decimal number is a power of 5?
Of course we want the algorithm to run in time polynomial in the length of n. This would at least allow us to check Dyson’s intuition for extremely large numbers. Not all large numbers, but any particular large number. This seems like a plausible challenge.
Indeed we will now sketch an attack on how one might do this efficiently. The sketch is not a proof—we have not had the time to work out all the details, but we believe that it might be made into a real algorithm.
Of course, being able to check the conjecture for exponentially wider ranges is not the same as proving it for all n. But making the machinery more efficient is a good way to understand the problem. We can try to work in some related ideas, however vague. One is encoding numbers into matrices. There the reversal would perhaps be just the transpose. Another is that the reversal of a regular language remains regular: Given a deterministic finite automaton M recognizing a language L, one can create an NFA for the reversal by adding a new start state that nondeterministically transits to some final state of M, reversing each arrow of M, and declaring M’s start state the new final state. This NFA can then be converted back to a DFA. The languages of powers of 2 or 5 in decimal are not regular, but they are sparse enough that the idea might still help.
Let R(x) be the reversal of the number x when written in decimal. Thus, for example, R(131072) = 270131.
The reversal operator is nasty, non-linear, and hard to understand. But there is hope.
Suppose that x = 2^n is a number that we wish to check to see if R(x) = 5^k for some k. The idea is two-step:
- Compute the several leading decimal digits of x; reversed, these give the low-order digits of R(x).
- Check whether this pattern is possible as the low-order digits of a power of 5.
Several comments are in order about this potential algorithm. Clearly, since the pattern is just a few digits, computing it is easy given the leading digits of x. Another point is that if x is “random,” then there is a high probability that the reversed pattern will not be a possible pattern for the low-order digits of a power of 5. This depends on the key fact that powers of 5 have very constrained decimal patterns for their low-order digits. Of course all powers of 5 end in 5, but much more is true. Here is a picture of how they behave:
Note: these cycles come in lengths of powers of two—see here for details. The critical point is that the cycles grow exponentially slower than the number of decimal digit patterns. So if we can compute the top digits of 2^n and they are random, we are likely to be able to show that R(2^n) is not a power of 5 without looking at many digits. This is an important, but simple, insight. Note: of the five lowest-order digits, only eight out of 100,000 patterns are possible.
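Both claims are quick to confirm: the last five digits of powers of 5 settle into a cycle of just 8 patterns, and for d ≥ 3 the last d digits repeat with period 2^(d−2), the multiplicative order of 5 modulo 2^d. A sketch:

```python
# For k >= 5, the last five digits of 5^k take only 8 of the 100000 patterns.
residues = {pow(5, k, 10**5) for k in range(5, 500)}
assert len(residues) == 8

# The cycle lengths are powers of two: the order of 5 modulo 2^d is 2^(d-2).
for d in range(3, 12):
    t = 1
    while pow(5, t, 2**d) != 1:
        t += 1
    assert t == 2 ** (d - 2)
```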
Next we need to show that we can compute the high-order digits of 2^n when it is written in decimal. Clearly, 2^n = d_1 d_2 ⋯ d_m in decimal, where the d_i’s are decimal digits and d_1 is nonzero. Let’s try and obtain just the lead digit d_1. The simple approach is to use the fact that we can compute the logarithm of 2 to polynomially many bits of precision in polynomial time. The idea is then to take logs base ten of 2^n and try to get d_1. This almost works, but numbers whose leading digits run 9999999⋯ or 10000000⋯, hugging a power of ten, could make the precision required more than polynomial. We believe that we can make this work by using the following idea. Use logarithms to tell d_1 = 1 from d_1 ≥ 2, and so on. The point is that this should be able to avoid the above problem.
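Here is a sketch of the logarithm idea in code. It uses Python’s `decimal` module at a fixed 60-digit precision; as noted above, an adversarial n whose power hugs a power of ten could in principle demand more precision than this:

```python
from decimal import Decimal, getcontext

def leading_digits_of_power_of_two(n, digits=8, prec=60):
    # Leading decimal digits of 2^n from the fractional part of n*log10(2).
    getcontext().prec = prec
    log10_2 = Decimal(2).ln() / Decimal(10).ln()
    frac = (Decimal(n) * log10_2) % 1
    lead = Decimal(10) ** frac        # a number in [1, 10)
    return str(lead).replace(".", "")[:digits]

# Agrees with exact arithmetic for a large exponent:
assert leading_digits_of_power_of_two(1000) == str(2**1000)[:8]
```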
Whether this can be made to work we will leave for the future.
Is Dyson right? Can we make our “algorithm” work?
See a number, make a set
Henning Bruhn and Oliver Schaudt are mathematicians or computer scientists, or both. They are currently working in Germany, but wrote their survey on the Frankl Conjecture (FC) while working together in Paris. Schaudt is also known as an inventor of successful mathematical games.
Today Ken and I wish to talk about the Frankl conjecture and a principle of mathematics.
Before we start let’s be up-front: The FC is a mathematical disease. Ken and I are trying to avoid catching a full-blown case—well, Ken also has a lot on his plate in the chess world. We have some small insights to share—perhaps doing this will help someone solve the problem, or at least advance the understanding of it. In any case talking about it may keep us from being completely infected.
We covered the conjecture last spring. As Bruhn and Schaudt’s survey relates, FC probably pre-dates Péter Frankl’s 1979 formulation of it while he was in-transit from Paris to Montreal, but all agree that Frankl was the last to discover it. It reads:
If a family of subsets of a finite set is closed under union, then there is an element of the set that belongs to at least half of the sets in the family.
One may suppose that the empty set always belongs to the union-closed family. The “half” is tight, as follows on considering the power set of a finite set as the family.
The survey covers several equivalent forms of the FC, but not the simple re-statement involving the following “Frankl Property” (FP) for Boolean functions f on {0,1}^n: if f(x) = f(y) = 1, then f(x ∨ y) = 1, where x ∨ y is the bitwise OR.
The Frankl Conjecture (FC) property then reads: there is an i such that at least half of the satisfying assignments x have x_i = 1. The conjecture itself then states:
If f has FP, then it satisfies FC.
The survey actually emphasizes the dual form, which in our terms involves the function g(x) = f(x̄), where x̄ is the bit-flip of x. FP becomes: if g(x) = g(y) = 1, then g(x ∧ y) = 1, which means that the satisfying assignments of the flipped function correspond to an intersection-closed family of sets. The FC then states that there is an i such that at most half the satisfying assignments of g have x_i = 1. Just to be different, we stay with the union-closed form, though in fact Frankl stated the other. Then the survey’s equivalent statements include:
We say more about lattices below—the nifty fact is that the Frankl Property makes the satisfying assignments (plus the all-zero vector) into a lattice. The graph statement actually comes from a paper by Bruhn and Schaudt with Pierre Charbit and Jan Arne Telle, the last of whom hosted Ken for lectures in Bergen, Norway, two years ago. Since every vertex cover must include at least one vertex from every edge, at least one vertex from every edge must belong to half or more of the minimal covers. Extending this reasoning shows that every odd cycle has a popular edge, so the statement is problematic only for bipartite graphs.
The survey authors say the lattice statement strips FC “down to its bare essential parts: the elements have vanished and all that counts is the inclusion relationship between the sets.” Indeed the sets seem to vanish too. We have the opposite idea: how can we make the structure surrounding the conjecture richer?
We are complexity theorists, so our hammer is Boolean functions: forget lattices and graphs. We love Boolean functions. We try to see everything through properties of Boolean functions; this is both the strength of our area and its weakness. Okay, ours may be just a trivial restatement of FC: the function is just the indicator function of the sets. Yet. Yet, we believe that this simple insight may help us make progress on FC. There are many equivalent statements of FC as questions about lattices, about graphs, and about other mathematical objects. What we find exciting about this statement is that it suggests two things:
The latter echoes a motivation for the lattice and graph forms expressed in the survey—they yield interesting special cases. We can try to prove theorems like this:
Theorem: For any f that has property P in addition to FP, the FC holds.
We can prove this when P is the property “symmetric” or “monotone.” We wonder about other properties such as having low degree as polynomials, having read-once decision trees, and so on.
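For tiny n this can be checked exhaustively. The following sketch (our own framing of FP and FC as predicates on the set of satisfying assignments) enumerates all Boolean functions on 3 variables, keeps the nontrivial OR-closed ones, and confirms FC for each:

```python
from itertools import product

def has_fp(sat):
    # FP: the satisfying assignments are closed under bitwise OR.
    return all(tuple(a | b for a, b in zip(x, y)) in sat for x in sat for y in sat)

def has_fc(sat, n):
    # FC: some coordinate is 1 in at least half of the satisfying assignments.
    return any(2 * sum(x[i] for x in sat) >= len(sat) for i in range(n))

n = 3
points = list(product((0, 1), repeat=n))
checked = 0
for bits in product((0, 1), repeat=2**n):
    sat = {p for p, b in zip(points, bits) if b}
    if not sat or sat == {(0,) * n}:   # skip trivial families
        continue
    if has_fp(sat):
        checked += 1
        assert has_fc(sat, n)
assert checked > 0
```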
Let f be a Boolean function. The notion of the influence of the ith input of f is defined by Inf_i(f) = Pr_x[ f(x) ≠ f(x^{⊕i}) ], where x^{⊕i} is equal to x with its ith bit flipped. Also, the probability is over x selected uniformly. There are many applications of this basic notion, and it is used in complexity theory in many places.
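For concreteness, a brute-force sketch of the definition (exponential in n, fine for small examples):

```python
from itertools import product

def influence(f, n, i):
    # Inf_i(f) = Pr over uniform x of [ f(x) != f(x with bit i flipped) ].
    flips = 0
    for x in product((0, 1), repeat=n):
        y = x[:i] + (1 - x[i],) + x[i + 1:]
        if f(x) != f(y):
            flips += 1
    return flips / 2**n

maj = lambda x: int(sum(x) >= 2)      # majority of three bits
assert all(influence(maj, 3, i) == 0.5 for i in range(3))

dictator = lambda x: x[0]             # only the first bit matters
assert influence(dictator, 3, 0) == 1.0
assert influence(dictator, 3, 1) == 0.0
```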
If you see a number or a count of something, it often helps to replace it by the set of “somethings.” This yields additional information, and perhaps additional insight. Among many examples of this phenomenon in mathematics, we mention the notion of Betti numbers in topology. Named for Enrico Betti, they measure the complexity of spaces: the higher the number the more complex the connectivity of the space, roughly speaking. For example, Andy Yao used them to bound the depth of decision trees.
While Betti numbers remain useful, a major advance in topology was made when they were replaced by homology groups. These groups yield the Betti numbers but contain additional information. This ability to recover the old idea, yet yield a richer context, is what makes homology groups so powerful. See this for an interesting discussion of how Betti numbers evolved into homology groups from the 1920’s on, with Emmy Noether in a leading role.
Following our principle we plan on replacing Inf_i(f), as a number, by a set. Let us use S_i(f) to denote the set of those x such that f(x) ≠ f(x^{⊕i}).
Obviously the following is true: Inf_i(f) = |S_i(f)| / 2^n.
There must be a better notation than S_i(f)—we are open to suggestions. Any?
The key is the following lemma:
Lemma: Let be a Boolean function and let be fixed. Then the Boolean inputs can be partitioned into six sets:
These sets have the following properties:
- The variable is equal to on ;
- The variable is equal to on ;
- The union is equal to ;
- The function is always on ;
- The function is always on ;
- Finally and and .
The key observation from this lemma is that the number of such that and is exactly
where is the set of so . This implies that for any , it satisfies FC if and only if some decomposition has for at least half of . When is monotone the next theorem makes this clear, but the hope is that FP alone can be made to imply that at least half of these have .
Every monotone function immediately has FP, since x ∨ y ≥ x and monotonicity implies f(x ∨ y) ≥ f(x). This insight says to us that FP is a kind of weaker notion than monotone. But perhaps not too much weaker.
Theorem 3 Every monotone Boolean function satisfies FC.
Proof: Since f is monotone, an x in f^{-1}(1) with x_i = 0 must have f(x^{⊕i}) = 1 as well. So every i witnesses FC for f. This uses that at least one half of the places where f is 1 are ones where x_i = 1.
This also yields the following result.
Theorem 4 Every symmetric Frankl function satisfies FC.
Proof: Let f be a non-trivial symmetric Boolean function. Let L_k be the set of all x that have k 1’s and n − k 0’s: these are the level sets. Since f is symmetric it is easy to see that for each level set, either f is all 1 on this set or all 0. Let m be the smallest k such that f is 1 on the level L_k. Then we claim that f is also 1 on all level sets L_k with k ≥ m. This claim shows that f is monotone and completes the proof.
So assume that f is 1 on level L_k, where m ≤ k < n. Let x be in L_{k+1}. It is clear there are y and z both in L_k so that y ∨ z = x. But then by the Frankl property, f(x) = 1. Thus the claim is proved.
It is interesting to try to get these results from the lattice version of FC. Given a Frankl function f, let L be the set of its satisfying assignments together with the all-zero vector 0.
Given any x, y in L, the least possible upper bound of x and y is the bitwise OR x ∨ y, and by FP this belongs to L, so L obeys the unique-join property of lattices. If the set of lower bounds of x and y had no maximum, then there would be distinct maximal lower bounds z and z′ with z ∨ z′ ≤ x and z ∨ z′ ≤ y. However, by FP again z ∨ z′ belongs to L, and it is above both z and z′, a contradiction. Thus L has unique meets, though the meet need not equal the bitwise-AND x ∧ y—indeed the meet can be 0. So FP always makes L into a lattice.
For an example, consider f = x_1 ∨ x_2 on three variables. This is monotone, and excludes only 000 and 001.
Now consider x = 010, y = 101, and z = 011. Then y ∧ z = 000 in L, since 001 is not in L, so x ∨ (y ∧ z) = 010. On the other hand x ∨ y = 111, so (x ∨ y) ∧ z = 011. Thus L violates the equation
x ∨ (y ∧ z) = (x ∨ y) ∧ z for x ≤ z,
which defines a modular lattice. Thus we have a monotone Boolean function whose lattice is not modular, hence not distributive either. However, for monotone f, L “almost” satisfies the following condition, which is dual to a condition called “lower semimodular” in the survey:
For all incomparable x, y in L, if there is no element of L properly between y and x ∨ y, then there is no element of L properly between x ∧ y and x.
Given f monotone, it follows that unless the all-zero vector acts as the meet, we have x ∧ y in L, so the lattice meet is the bitwise AND. The if-clause then implies there must be exactly one i such that x_i = 1 and y_i = 0. This implies that x flips only one place compared to x ∧ y, so the then-clause also holds. Thus the only exception is when 0 acts as the meet, and indeed the above lattice is a counterexample with x = 011 and y = 101.
We suspect that the proof method of a paper by Tetsuya Abe and Bumpei Nakano cited by the survey might overcome this exception, and hence prove the lattice version of FC in the monotone case directly in such manner, but we are not sure. All this should say something interesting about lattices, since monotonicity is such a natural Boolean property.
We noted that monotone Boolean functions have FP. Turned around, this says that Frankl functions generalize monotone functions. Can we hence prove complexity lower bounds on these functions?
One advantage of thinking of this as a Boolean problem is that new tools might become available. Can we use the kind of Fourier methods that have already worked for results on Boolean functions? A prime example of the Boolean view being so powerful is the proof of the famous theorem of Kenneth Arrow on voting schemes.
Andrey Kolmogorov, Fred Hennie, Richard Stearns, and Walter Savitch are all famous separately; but they have something in common. Read on, and see.
Today I wish to discuss some algorithmic tricks and show that they were initially used by complexity theorists, years before they were used by algorithm designers.
To steal a phrase: it’s computational complexity all the way down. Well, not exactly. The situation is slightly more complex—a bad pun. The complexity theorists often invented a concept and used it in a narrow way, while later it was rediscovered and made a general notion. This is another example of the principle: The last to discover X often gets the credit for X. I note that the dictionary gets this wrong:
It's not the first but the last. For after the last gets the credit, it's no longer a "discovery." Let's look at three examples of this phenomenon.
Who invented it? Andrey Kolmogorov in 1953.
Who really invented it? Harold Lawson in 1964.
Details: Kolmogorov did so much that one should not be surprised that he invented "pointers." In 1953 he wrote a short paper, really an abstract, "To the Definition of an Algorithm." This later became a 26-page joint paper with the still-active Vladimir Uspensky. In that and the preceding decades, several researchers had advanced formal notions of an algorithm. Of course we now know they all are the same, in the sense that they define the same class of functions. Whether one uses Turing machines, recursive functions, or lambda calculus—to name just a few—you get the same functions. This is an important point. In mathematics, confidence that a definition is right is often best gained by showing that there are many equivalent ways to define the concept.
Kolmogorov’s notion was similar to a Turing machine, but he allowed the “tape” to be an almost arbitrary graph. During the computation, in his model, the machine could add and change edges. How this and other models constitute a “Pointer Machine” is discussed in a survey by Amir Ben-Amram. A beautiful explanation of Kolmogorov’s ideas is in the survey “Algorithms: A Quest for Absolute Definitions,” by Andreas Blass and Yuri Gurevich. I quote from their paper:
The vertices of the graph correspond to Turing’s squares; each vertex has a color chosen from a fixed finite palette of vertex colors; one of the vertices is the current computation center. Each edge has a color chosen from a fixed finite palette of edge colors; distinct edges from the same node have different colors. The program has this form: replace the vicinity U of a fixed radius around the central node by a new vicinity W that depends on the isomorphism type of the digraph U with the colors and the distinguished central vertex. Contrary to Turing’s tape whose topology is fixed, Kolmogorov’s “tape” is reconfigurable.
Harold Lawson invented pointers as we know them today in 1964. He received the IEEE Computer Pioneer Award in 2000:
for inventing the pointer variable and introducing this concept into PL/I, thus providing for the first time, the capability to flexibly treat linked lists in a general-purpose high level language.
The idea that a pointer can vary goes back to Kolmogorov, and as Ben-Amram notes it is distinct from the fixed pointers of Lisp implementations, but making the pointer a syntactic variable in a programming language is what made the difference.
Who invented it? Fred Hennie and Richard Stearns in 1966.
Who really invented it? Robert Tarjan in 1985.
Details: Hennie and Stearns proved one of the basic results in complexity theory. Initially Turing machines had one tape that was used for input and for temporary storage. It was quickly realized that one could easily generalize this to have multiple tapes—some for input, others for temporary storage. This does not change the class of computable functions. That is stable under such changes. But it does change the time that such a machine takes to compute some task. What they proved is that the number of tapes was relatively unimportant provided there were at least two. More precisely, they proved that a machine with k tapes that ran in time T(n) could be simulated in time O(T(n) log T(n)) by a machine with two tapes.
This result is very important and fundamental to the understanding of time complexity. The simulation is also very clever. Even for simulating three tapes by two, the issue is that the obvious way to do the simulation seems to require that the time increase from T(n) to order T(n)^2. But they showed that the obvious method of making three (or more) tracks on one tape can be fixed to work so that the "average" simulation step takes order log T(n), using the second tape to move data. Note some simulation steps take a constant number of steps, and others take a huge number of steps. But the average is logarithmic and this proves the theorem. I have discussed this before—see here.
Bob Tarjan years later used the same fundamental idea in what is called amortized complexity. Bob's ideas, explained in his 1985 paper "Amortized Computational Complexity," made a useful formal distinction within the concept of an operation being good on average. The distinction is whether the operation can possibly be bad all the time, in highly "unlucky" cases. If you are finding the successor of a node in a binary search tree, you might unluckily have to go all the way up the tree, and all the way back down on the next step. But the step after that will be just one edge, and overall, the entire inorder traversal of an n-node tree takes at most 2n edge steps total, giving an average under 2 per step. It doesn't matter how unbalanced the tree is. The idea is used throughout algorithm design.
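The traversal claim above is easy to check empirically. Here is a minimal sketch (the helper names are my own, not from Tarjan's paper): an unbalanced BST with parent pointers, walked in sorted order by repeated successor steps while counting every edge traversed. Each tree edge is crossed at most twice over the whole walk, so the total is at most 2(n-1) no matter how lopsided the tree is.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def bst_insert(root, key):
    """Standard unbalanced BST insert; returns the root."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key); cur.left.parent = cur
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key); cur.right.parent = cur
                return root
            cur = cur.right

def inorder_with_edge_count(root):
    """Visit all keys in sorted order via successor steps,
    counting every edge traversed; returns (keys, edge_count)."""
    edges = 0
    node = root
    while node.left is not None:          # descend to the minimum
        node = node.left
        edges += 1
    keys = []
    while node is not None:
        keys.append(node.key)
        if node.right is not None:        # successor: leftmost of right subtree
            node = node.right
            edges += 1
            while node.left is not None:
                node = node.left
                edges += 1
        else:                             # climb until we arrive from a left child
            while node.parent is not None and node is node.parent.right:
                node = node.parent
                edges += 1
            if node.parent is not None:
                edges += 1
            node = node.parent
    return keys, edges
```

Even inserting keys in decreasing order, which makes a path, yields exactly 2(n-1) edge steps in total: an amortized bound, with no balancing needed.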
Who invented it? Walter Savitch in 1965.
Who really invented it? Whitfield Diffie and Martin Hellman in 1977.
Details: The most famous example in computational complexity is certainly Savitch's brilliant result—still the best known—showing that nondeterministic space S(n) is contained in deterministic space O(S(n)^2). His idea is to change a search problem for a path from s to t into two search problems. One searches from s and the other searches from t. If there is some point in common, then there is of course a path from s to t. If we insist on halving the allowance for the search each time we recurse, then we will succeed when—and only when—the guessed meeting point is exactly midway between s and t. This insistence guarantees recursion height O(S(n)), with local storage O(S(n)) per level for the current configuration, giving space O(S(n)^2) overall. Ken likes to cite a "Modified Chinese Proverb" in his lectures:
A journey of a thousand miles has a step that is 500 miles from the beginning and 500 miles from the end.
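The midpoint recursion is short enough to sketch on a plain graph (a toy version with my own names, not Savitch's configuration graph): instead of searching forward, guess a midpoint w and solve both halves with half the allowance, so the recursion depth is only log of the path length.

```python
def midpoint_reach(nodes, edges, u, v, steps):
    """Is there a path of length <= steps from u to v?
    Guess a midpoint and recurse on both halves; recursion depth
    is O(log steps), the heart of Savitch's theorem."""
    if steps == 0:
        return u == v
    if steps == 1:
        return u == v or (u, v) in edges
    half = steps // 2
    return any(
        midpoint_reach(nodes, edges, u, w, steps - half)
        and midpoint_reach(nodes, edges, w, v, half)
        for w in nodes
    )
```

Run on the configuration graph of a nondeterministic machine, the stack of midpoints is what costs O(S(n)) per level, giving the quadratic space bound.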
Diffie and Hellman in 1977 created the idea of using this method for an attack on a block cipher. Their attack largely deflates the idea that a composition of encryptions
E_{k_r}(...E_{k_2}(E_{k_1}(x))...), where each stage uses a different key k_i, can be substantially more secure than a single application of some E_k. The idea, explained in full by our friends at Wikipedia, is to guess an intermediate stage, let E1 and E2 be the composed encodings up to and after that stage, and D1 and D2 the corresponding decodings. Plaintexts P and ciphertexts C are then related by C = E2(E1(P)).
If we had C = E2(E1(P)) for a known pair, we could try all single encoding keys to get every candidate value of E1(P), and store the results in a lookup table. Then we can try all decryption keys to see if D2(C) lands in the table, and preserve the key pairs that match. Then for each surviving pair try other plaintext-ciphertext pairs and discard it if they don't match, until one pair survives. Already we have time bounded by the sum, not the product, of the numbers of keys, multiplied instead by the number of trials. But the main point is that the time savings (at the expense of space for the table) are largely preserved in a recursion that guesses a breakpoint and recurses on the supposition that it is at or near the midpoint, by an analysis similar to Savitch's.
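As a toy illustration, here is a sketch of the meet-in-the-middle attack on an invented 8-bit keyed permutation (the cipher E and all names are made up for this demo, not Diffie and Hellman's target): build a table of E under every first key, then meet it with decryptions under every second key, at cost 2*256 trial encryptions instead of 256^2.

```python
def E(k, x):
    """Toy invertible 8-bit 'cipher' (illustration only, not secure)."""
    return ((x ^ k) * 5 + k) % 256

def D(k, y):
    """Inverse of E: 205 is the inverse of 5 modulo 256."""
    return (((y - k) * 205) % 256) ^ k

def mitm(pairs):
    """Recover (k1, k2) from plaintext/ciphertext pairs under
    c = E(k2, E(k1, p)), meeting in the middle."""
    p0, c0 = pairs[0]
    table = {}
    for k1 in range(256):                 # forward half: all E(k1, p0)
        table.setdefault(E(k1, p0), []).append(k1)
    candidates = []
    for k2 in range(256):                 # backward half: all D(k2, c0)
        for k1 in table.get(D(k2, c0), []):
            candidates.append((k1, k2))
    for p, c in pairs[1:]:                # weed out accidental collisions
        candidates = [(k1, k2) for (k1, k2) in candidates
                      if E(k2, E(k1, p)) == c]
    return candidates
```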
Okay, not all algorithmic ideas started in complexity theory. But a number of very important ones did. All three that we’ve covered in fact came from analyzing basic models of computation. Can you name some others?
Manjul Bhargava is a mathematician who just won one of the 2014 Fields Medals. We offer our congratulations on this achievement. He is an expert in number theory, which makes him special to us among Fields medalists. His Fields citation includes his doctoral work on a powerful reformulation and extension of Carl Gauss’s composition law for quadratic forms. He also proved a sense in which 290 is special to us among numbers, since we have been thinking recently about quadratic forms as tools in complexity theory.
Today we talk about his “290 Theorem” with Jonathan Hanke, which is quite accessible, and also raise complexity-related questions about this result.
One of his papers is in the American Math Monthly. It is on the factorial function and its generalizations. We always applaud when a pathbreaker writes a popular survey. It is hard to imagine a more concrete subject and a more accessible journal.
A sub-surface question in computational complexity is how much difference can finite changes make? With asymptotic time or space bounds, finite changes make zero difference—finite sets have zero asymptotic complexity. However, for measures of concrete complexity, such as the size of proofs and the information content of strings, even small exceptions may make a difference.
Why is 290 special? Well, the answer starts with: why is 15 special? Call a quadratic form f in n integer variables universal if it represents all positive integers, that is, if every positive integer m equals f(x_1, ..., x_n) for some integers x_1, ..., x_n.
That x^2 + y^2 + z^2 + w^2 is universal is Joseph Lagrange's famous Four-Square Theorem, whose implications in complexity theory we have already noted.
Indeed, Jeffrey Shallit and Michael Rabin proved that this form is effectively universal in the sense that given any positive integer m, we can find in random polynomial time integers a, b, c, d so that a^2 + b^2 + c^2 + d^2 = m.
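For intuition, a brute-force version of the four-square guarantee fits in a few lines (this is emphatically not the Rabin-Shallit algorithm, which runs in random polynomial time; this search is exponential in the bit-length of m):

```python
from math import isqrt

def four_squares(m):
    """Find (a, b, c, d) with a^2 + b^2 + c^2 + d^2 = m by brute force.
    Lagrange's theorem guarantees success for every m >= 0."""
    for a in range(isqrt(m) + 1):
        ra = m - a * a
        for b in range(a, isqrt(ra) + 1):
            rb = ra - b * b
            for c in range(b, isqrt(rb) + 1):
                rc = rb - c * c
                d = isqrt(rc)
                if d * d == rc:
                    return (a, b, c, d)
    return None  # unreachable for m >= 0, by Lagrange's theorem
```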
Our complexity problems here are perhaps related but different: given a form f, how easy is it to test whether f is universal, and how easy is it to prove the theorem on which the test is based?
We need to define the allowed range of forms carefully. Every form f can be specified by a matrix A such that for all column vectors x of the variables, f(x) = x^T A x.
If we were using non-commutative algebra where x_i x_j and x_j x_i differ, then A would be unique, but now the coefficient of x_i x_j can be split any way between the entries A_{ij} and A_{ji}. The convention is to split it evenly, so that A is symmetric, and once again unique. Then the condition that f(x) > 0 for x not zero is the same as A being positive definite, and often this is assumed when talking about quadratic forms.
In Lagrange’s case, A is the identity matrix. Diagonal matrices are positive definite if and only if all the diagonal entries are positive, but for symmetric real matrices in general things are trickier: a matrix can be positive definite despite having negative entries, while another can “cheat” and fail to be despite having all-positive entries. In the case f = x^2 + xy + y^2, A is positive definite but has half-integer entries: its off-diagonal entries are 1/2.
Then f is said to be integer-valued but not classically integral. For much of two centuries, such a form was also thought to “cheat,” so that only forms with even cross-term coefficients were distinguished, but Bhargava’s work has helped solidify the looser “twos-out” condition as the standard.
Obviously, universal forms are interesting. In 1993, John Conway and William Schneeberger proved that if a positive-definite quadratic form with integer matrix represents all positive integers up to 15, then it represents all positive integers. Pretty neat. This clearly makes checking whether such a form is universal a relatively simple task. The proof was not published and was quite intricate. This result is naturally called the 15 Theorem.
Bhargava found a simpler proof in 2000, which was hailed by Conway himself. He followed this up by proving Conway and Schneeberger's conjecture that one could relax to an integer-valued form upon replacing 15 by 290: If a form represents all positive integers up to 290, then it is universal. Precisely stated:
Theorem 1 For any integer-valued positive-definite quadratic form f, if the set of values of f includes the twenty-nine numbers
1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, 290,
then it includes all positive integers. Moreover, this set of twenty-nine numbers is minimal—remove any one and the statement becomes false.
This last theorem alternates senses of being easy and complex. Its statement is complex—where do those numbers come from? However, having only twenty-nine numbers to check makes it easy to prove that a given form is universal—just produce arguments yielding each number as a value. However, the problem of finding such arguments, especially given an arbitrary , remains possibly complex—can there be any kind of extended Rabin-Shallit theorem?
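For diagonal forms with positive integer coefficients the universality test is mechanical, since any representation of m uses variables at most sqrt(m). A minimal sketch (names mine; the truant list is the twenty-nine critical numbers of the Bhargava-Hanke theorem):

```python
from itertools import product

# The twenty-nine "truant" values from the Bhargava-Hanke 290 theorem.
TRUANTS = [1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29,
           30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, 290]

def diagonal_form_values(coeffs, bound):
    """All values of sum_i c_i * x_i^2 with 0 <= x_i <= bound.  For a
    diagonal positive form, every representation of m <= bound^2 lies
    in this box, so membership below bound^2 is decided exactly."""
    vals = set()
    for xs in product(range(bound + 1), repeat=len(coeffs)):
        vals.add(sum(c * x * x for c, x in zip(coeffs, xs)))
    return vals

def universal_by_290(coeffs, bound=17):
    """Per the 290 theorem, an integer-valued positive form is universal
    iff it represents the twenty-nine truants (all at most 290 = 17^2 + 1)."""
    vals = diagonal_form_values(coeffs, bound)
    return all(t in vals for t in TRUANTS)
```

So x^2 + y^2 + z^2 + w^2 passes, while the sum of three squares fails, since 7 is not a sum of three squares.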
What we fix on, however, is what the nature of the statement says about the complexity of the proof. This raises the question of a measure of complexity of theorems. Our intent can be approached by considering four types of theorems:
Type I. Every x of type T has property P.
Type II. Every x of type T has property P, except possibly for x belonging to a fixed finite set S of values.
Type III. If every member of S has property P, then every x of type T has property P.
Type IV. A theorem about objects in which a test of kind II or III is part of the statement.
To interpret this, think of x as a positive integer, fix a form f, and read "x has property P" as "x is represented by f." Then for the particular form f, statement I is "f is universal," II is "f is almost universal" (in a sense we haven't discussed but that is clear), and III is an instance of Bhargava's theorem for the particular f (when S is the set of twenty-nine numbers above, that is). Then Bhargava's theorem itself is type IV, where the objects are forms f, and all it does is universally quantify the assertion of type III over those f.
Note that in types II and III, the statement remains true if S is replaced by a larger set. We intend this also in type IV—that is, the theorem is monotone in the test. So define a number m to be special if there is an S whose largest element is m such that the statement holds with S but fails with m removed from S.
In types II and III it is enough to specify S; in IV we also give the theorem statement about the objects.
Here are some examples of all these types:
Theorem 2. Every odd-order group is solvable. (Type I)
Theorem 3. Every finite group that is not cyclic or alternating or of certain Lie types is not simple, unless it is one of 26 so-called sporadic simple groups. (Type II)
Theorem 4. Every Fibonacci number has a prime factor that does not divide any earlier Fibonacci number, except for 1, 8, and 144. (Also type II)
A statement of type II logically implies one of type III, but in a trivial sense. For instance, let P_n be the assertion that the (topological generalized) Poincaré conjecture holds in dimension n. Suppose we went back in time before Michael Freedman won his Fields Medal for proving P_4 in 1982, let alone Grigori Perelman for P_3, but after Steve Smale won his in 1966 largely for proving P_n for n at least 5. Then with S = {3, 4} we had a theorem of type II, and also type III in the form "if Poincaré is true for dimensions 3 and 4 then it is true for all dimensions." But the implication generally does not go the other way, and since the Poincaré conjecture is true in every dimension there was no special number.
Our point is that a theorem of type I seems easier to understand than one of type II or III, let alone IV. Hence theorem I might have a clearer proof than theorem II. None of this intuition is solid, but it does seem reasonable. The presence of S gives the statements other than type I higher information complexity. Our questions are whether the freedom to enlarge S mitigates the increase in complexity compared to type I, whether this might allow easier proofs of the resulting statements, and how much harder it is to prove numbers special—when S must be minimal.
For now we only have a few sketchy ideas on these questions, involving the notion of the Kolmogorov complexity K(x) of a string x. This is named for Andrey Kolmogorov, but various related notions were discovered at the same time by others. Here it is enough to say that K(x) is the length of the shortest program that generates x. The notion extends to define K(S) for finite sets S.
Can we prove that certain theorems of types II–IV cannot be proved in a fixed formal system? Our intent is to argue like this: Suppose that we could prove a statement of the above kind whose exception set is minimal, where S is a given finite set. The minimality means we're taking S as small as possible. If S itself has large complexity, then this might lead to a contradiction.
A simple result can be proved. Let |Φ| denote the length of a formula Φ as a string, and let ZF be the usual set theory—or any reasonable theory, provided its set of theorems is r.e.
Theorem 5. Suppose that a statement Φ of the above kind, where S is a given finite set, is provable in ZF. Then there is an absolute constant c that depends only on ZF so that K(S) is at most |Φ| + c.
Proof: Let a machine search the proofs of ZF for a proof of the statement Φ. It will eventually find one. This gives us the set S, and so we have a description of S of size at most |Φ| plus a constant that depends on the machine that does the search.
Since we can get a bound on K(S), we may be able to argue that certain concrete properties cannot be proved to fail only on sets that are finite but too high in complexity. This would be great if we could actually do something like this. We would like even better to remove the dependence on |Φ|, and obtain results of this kind:
There are only finitely many numbers that ZF can prove to be special.
Although when m is the largest element in S, we don't get a bound on K(S) in terms of m by imitating the above proof, because the machine could output an infinite sequence of S's that go with different statements. It isn't a contradiction, because something like |Φ| or the index of S in this sequence needs to be given as well to specify S, and this need not be bounded by any function of m. One can craft artificial statements to falsify it. But in specific cases, or in the presence of restrictions on the statements, this might yield (or at least suggest) interesting results.
For instance, we’ve noted that certain homotopy groups are finite and computable, so that the sequence of their structures must have Kolmogorov complexity at most a value that we can determine. We need not develop a fast algorithm to compute them, but just a short one. This shows that while the intuitive complexity of the structure of homotopy groups seems high, we can bound their K-complexity. Having tight bounds would be interesting, and might then help analyze what kinds of relaxations of theorems make them easier to prove.
We have raised a diverse array of complexity-related questions. Can more light be shed on them? Is there any kind of global limit on special numbers?
Edward Barbeau is now a professor emeritus of mathematics at the University of Toronto. Over the years he has been working to increase the interest in mathematics in general, and enhancing education in particular. He has published several books that are targeted to help both students and teachers see the joys of mathematics: one is called Power Play; another Fallacies, Flaws and Flimflam; and another After Math.
Today I want to discuss his definition of the derivative of a number. Yes, a number.
We all know the concept of the derivative of a function. It is one of the foundational concepts of calculus, and is usually defined by using limits. For the space of polynomials it can be viewed as a linear operator D so that D(x^n) = n x^(n-1).
The derivative operator in general satisfies many properties; one is the product law: D(fg) = f D(g) + g D(f).
This rule is usually credited to Gottfried Leibniz. Somehow the great Isaac Newton did not know this rule—at least that is what is claimed by some.
Barbeau defined the derivative of a natural number in 1961. Define D(n) for a natural number n by the following rules: (1) D(p) = 1 for any prime p; (2) D(ab) = a D(b) + b D(a) for all natural numbers a and b.
Here is a picture from his paper:
This proves that he really did it a long time ago. Note the typewriter typeface: no LaTeX back then. He proved the basic result that D is well defined. This is not hard, and is necessary to make the definition meaningful, but we will leave it unproved here. See his paper for details.
A simple consequence of the rules is that D(p^k) = k p^(k-1) for p a prime. This follows by induction on k. For k = 1 it is rule (1). Suppose that it holds for k; then D(p^(k+1)) = D(p * p^k) = p D(p^k) + p^k D(p) = p * k p^(k-1) + p^k = (k+1) p^k.
Unfortunately this formula does not hold with a general number in place of the prime p: for example D(6^2) = D(36) = 60, while 2 * 6 = 12. Also D is not a linear operator: D(2) + D(3) = 2, while D(2 + 3) = D(5) = 1. This double failure, the derivative of a power is not simple and the derivative is not linear in general, makes D difficult to use. One of the beauties of the usual derivative, even just for polynomials, is that it is a linear operator.
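The two rules pin D down completely, which makes it easy to compute. A minimal sketch (function names mine), using the closed form D(n) = n * sum of k/p over the prime factorization n = product of p^k, which follows from the Leibniz rule:

```python
def factorize(n):
    """Trial-division factorization: returns {prime: exponent}."""
    f = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def deriv(n):
    """Arithmetic derivative: D(1) = 0, D(p) = 1 for prime p, and
    D(ab) = a*D(b) + b*D(a).  Closed form: D(n) = n * sum(k/p)."""
    if n <= 1:
        return 0
    return sum(n * k // p for p, k in factorize(n).items())
```

Note the division n * k // p is exact because p divides n, so everything stays in integers.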
The derivative notion of Barbeau is interesting, yet it does not seem to have been intensively studied. I am not sure why—it may be because it is a strange function—I am not sure.
There is hope. Recently there have been a number of papers on his notion. Perhaps researchers are finally starting to realize there may be gold hidden in the derivative of a number. We will see.
Most of the papers on D have been more about intrinsic properties of D than about applications. A small point: most of the papers write the derivative of n as n', so if you look at papers be prepared for this notation shift. I decided to follow the original paper's notation.
The papers have results of three major kinds. One kind is the study of what are essentially differential equations. For example, what can we say about the solutions of D(n) = c, where c is a constant? The others are growth or inequality results: how fast and how slow does D(n) grow? For example, for n not a prime, D(n) is at least 2 sqrt(n).
A third natural class of questions is: can we extend to more than just the natural numbers? It is easy to extend to integers, a bit harder to rationals, and not so easy beyond that.
Here are two interesting papers to look at:
I tried to use to prove something interesting. I think if we could use to prove something not about but about something that does not mention at all, that would be exciting. Tools must be developed in mathematics, but the key test of their power is their ability to solve problems from other areas. One example: the power of complex analysis was manifest when it was used to prove deep theorems from number theory. Another example: the power of the theory of Turing machines was clear when it was used to yield an alternate proof of the Incompleteness Theorem.
The best I could do is use D to prove an ancient result: that the square root of 2 is not rational. Well, I may be able to prove a bit more.
We note that from the product rule: D(a^2) = 2a D(a), for any a. Recall if a were prime this would be 2a.
Now assume by way of contradiction that the square root of 2 is a rational number. Then for some positive numbers a and b we have a^2 = 2b^2.
As usual we can assume that a and b are co-prime.
So let’s take derivatives of both sides of the equation—we have to use sometime, might as well start with it.
Note that it is valid to apply D to both sides of an equation, so long as one is careful to obey the rules. For example, 2 + 2 = 4 allows D(2 + 2) = D(4), but there is no additive rule to make the left-hand side become D(2) + D(2) = 2, which would make the equation false, since D(4) = 4.
The result of taking the derivative of both sides is: 2a D(a) = 4b D(b) + b^2, using D(2) = 1.
Now square both sides and substitute 2b^2 for a^2 to get: 8 D(a)^2 = (4 D(b) + b)^2, after canceling the common factor of b^2.
This implies that 2 divides 4 D(b) + b, and hence that 2 divides b. This leads to a contradiction, since together with the original equation it implies that a and b are not co-prime. Whether we also get that 2 divides a is possibly circular, but anyway this is enough. The point is that owing to D(2) = 1, the derivative removed the problematic factor of 2.
Note from the original equation, we only get that 2 divides a^2, which is too weak to immediately get a contradiction. Admittedly ours is not the greatest proof, not better than the usual one especially owing to the squaring step, but it does use the derivative of a number.
One idea: I believe that this idea can be used to prove more than the usual fact that x^2 = 2y^2 has no nonzero solutions over the integers. I believe we can extend it to prove the same result in any ring where D can be defined, possibly modulo issues about lack of unique factorization. This handles the Gaussian integers, for example.
Can we use this strange function to shed light on some open problem in number theory? Can we use it in complexity theory? A simple question is: what is the complexity of computing D(n)? If n = pq where p and q are primes, then D(n) = p + q by the rules. But we know that pq = n, and thus we have two equations in two unknowns and we can solve for p and q. So in this case computing D(n) is equivalent to factoring n. What happens in the general case? An obvious conjecture is that computing D(n) is equivalent to factoring.
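The two-equations observation can be made concrete. A minimal sketch (names mine): given n = pq and the value D(n) = p + q, the primes are the roots of x^2 - D(n) x + n = 0, recovered with one integer square root.

```python
from math import isqrt

def factor_from_derivative(n, dn):
    """Given n = p*q for primes p, q and dn = D(n) = p + q,
    recover p and q as the roots of x^2 - dn*x + n."""
    disc = dn * dn - 4 * n            # (p + q)^2 - 4pq = (p - q)^2
    r = isqrt(disc)
    assert r * r == disc, "inputs are not of the form n = p*q, dn = p + q"
    return (dn - r) // 2, (dn + r) // 2
```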
[Fixed an error in the irrationality proof that owed to a typo, as noted by user "Sniffnoy" and others, and changed some following text accordingly; further fix to that proof; fixed typo p^k to p^(k-1).]
A new result on our three body problem
Allan Grønlund and Seth Pettie are leaders in algorithm design and related problems.
Today I want to give a quick follow-up on our discussion of 3SUM, based on a recent paper of theirs.
I had planned to report on the accepted papers that will soon appear at the upcoming 2014 FOCS conference. See here for all conference details. But Pettie just sent me their joint paper that almost answers the questions raised in the last discussion on "our three body problem." The paper is called "Threesomes, Degenerates, and Love Triangles." I have no comment on this title. Wait, saying "I have no comment" is really making a comment—oh well.
Let’s start by recalling the definition of 3SUM. Given a set of n numbers, the problem is to determine if there are three elements a, b, c in the set with a + b + c = 0.
Actually the version they work on allows the input to be a set of real numbers, not just integers. This is important because there could be tricks that work with integers—taking mods, for example—that cannot be made to work with reals. Since they are getting upper bounds this generalization is just fine.
They prove a variety of theorems in their paper, so look at it for all the details. One of the main results is:
Theorem:
The decision tree complexity of 3SUM is O(n^{3/2} sqrt(log n)).
So does this mean that the 3SUM problem has a sub-quadratic algorithm? No. It means that it has decision tree complexity of order n^{3/2}, dropping the logarithmic factor. So let's look quickly at what this really means, and at the same time give some idea of how they prove such a result.
Decision tree complexity is based on a simple model: a tree is defined that at each node makes a binary decision based on some test of the inputs. Various tests are allowed but here they are always linear in the inputs. This model has been studied for ages and many interesting results are known.
Some of the most exciting ones are based on insights of Michael Fredman. In particular he showed in 1976 several cool results about the power of decision trees. I should report in more detail on these ideas, since they are old but as the new 3SUM results show, still extremely powerful. One idea is used repeatedly in this paper:
we shall refer to the ingenious observation that a + b is at most c + d if and only if a - c is at most d - b as Fredman's Trick.
Michael also proved an amazing result, in my opinion, that is also used in the current paper.
Lemma: A list of n numbers whose sorted order is one of Γ possible permutations can be sorted with 2n + log_2 Γ pairwise comparisons.
Here is their algorithm for 3SUM:
In their paper they prove the key facts: the algorithm is correct, and it can be implemented using Fredman's and other ideas to have the claimed number of comparisons. What is also interesting is that such decision-tree type algorithms sometimes "cheat" by using the existence of the decision tree, while the cost of computing what the next comparison should be can be very high. This happens, for example, with the Knapsack Problem—see here. In their algorithm the cost of deciding the comparisons is kept under control as well. Very neat.
Grønlund and Pettie’s beautiful result on 3SUM is evidence that there should be an actual sub-quadratic algorithm. I remain optimistic, as always. Perhaps they will be able to prove that soon. In any case, the result is still wonderful.
Ellis Horowitz is one of the founders of the theory of algorithms. His thesis with George Collins in 1969 had the word “algorithm” in the title: “Algorithms for Symbolic Integration of Rational Functions.” He is known for many things, including an algorithm that after forty years is still the best known.
Today I want to talk about this algorithm, and one of the most annoying open problems in complexity theory.
We have just recently talked about the three body problem from physics, but now it is our turn in computer theory. Our version is called the 3SUM problem. The problem—I guess you probably know it already—is: given three sets A, B, C of integers, are there x in A, y in B, and z in C so that x + y + z = 0?
Suppose the sets each contain at most n elements and the elements are bounded by a polynomial in n. Then there is a trivial algorithm that takes O(n^3) time: just try all triples. Our friends at Wikipedia say
3SUM can be easily solved in O(n^2) time
This is unfair. The algorithm may be simple, but it is in my opinion very clever. It is due essentially to Ellis and his then PhD student Sartaj Sahni, who proved in 1974 a related algorithm for knapsack. More on knapsack in a moment. They called their result a “square root improvement.”
Here is the general idea. Build a table of all possible values x + y and then check to see if the needed complement is in the table. With some care this can be done in time O(n^2). I once talked about this result—it is one of my favorite algorithms. For example it can be used to solve the knapsack problem in time roughly 2^{n/2}. Recall this is the problem of finding x_1, ..., x_n in {0, 1} so that a_1 x_1 + ... + a_n x_n = b.
Here a_1, ..., a_n and b are given integers. This is what Ellis and Sahni did in 1974.
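The "square root improvement" for knapsack can be sketched in a few lines (names mine, not from the Horowitz-Sahni paper): split the items in half, tabulate all 2^{n/2} subset sums of one half, and look up the complement of each subset sum of the other half.

```python
def subset_sum(weights, target):
    """Horowitz-Sahni meet-in-the-middle: decide whether some subset of
    `weights` sums to `target`, touching ~2^{n/2} sums instead of 2^n."""
    half = len(weights) // 2
    left, right = weights[:half], weights[half:]

    def sums(items):
        """All subset sums of `items` (with repetition of values allowed)."""
        s = [0]
        for w in items:
            s += [x + w for x in s]
        return s

    table = set(sums(left))               # 2^{n/2} sums of the left half
    return any(target - r in table for r in sums(right))
```

The same tabulate-and-look-up pattern is the quadratic 3SUM algorithm in disguise: a table of pairwise values, scanned against the third coordinate.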
In ancient times, before 3SUM was called 3SUM, I thought about how to improve the above algorithm. The idea of breaking things into three or more sets was natural. I worked pretty hard on trying to get a better method. I failed. Here is what I said in the previous discussion:
A natural problem is what happens with a set like {x + y + z : x in X, y in Y, z in Z}? Is there a way to tell if b is in this set in time close to linear in the sizes of the sets? If this is possible, then the knapsack problem has an algorithm whose exponential term is 2^{n/3}. I have thought about this and related approaches quite often. I have yet to discover anything; I hope you have better luck.
Actually the usual version now is that we are given one set S. Then we must determine whether there are three elements x, y, z in S so that x + y + z = 0.
It is not hard to show that having one set instead of three is essentially no real change: just code the set name into the number.
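For concreteness, here is a standard quadratic algorithm for the one-set version (a sketch in the usual sort-plus-two-pointers style, which matches the Horowitz-Sahni bound; names mine):

```python
def three_sum(S):
    """Return a triple of elements (distinct positions) of S summing to 0,
    or None.  Sort once, then for each fixed element scan inward with two
    pointers: O(n^2) total."""
    a = sorted(S)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return (a[i], a[lo], a[hi])
            if s < 0:
                lo += 1               # need a larger middle element
            else:
                hi -= 1               # need a smaller top element
    return None
```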
There are lower bounds on the 3SUM problem in various restricted models of computation. Jeff Erickson showed a quadratic lower bound provided the elements are real numbers and the computation is a special type of decision tree. The result is interesting because it is an adversary argument where it is useful to use infinitesimals in the argument. See the paper for details.
Back to the 3SUM problem for integers. If the set has all elements less than u in absolute value, then 3SUM can be computed in time O(u log u).
The trick is to cheat: use more than just comparisons, going outside Erickson's model.
Suppose that A is a set of positive numbers less than u—this is not needed but I hope makes the idea clearer. The problem is to see if A + A contains an element from another set B. Define the polynomial p_A by p_A(x) = the sum of x^a over all a in A.
Note its degree is at most u. Using the Fast Fourier Transform we can compute p_A(x)^2 in time O(u log u). The critical point is that p_A(x)^2 is equal to the sum of c_s x^s where
c_s is equal to the number of pairs (a, a') in A x A so that a + a' = s. But we are almost done. Just check each element of B to see if its coefficient in p_A(x)^2 is positive.
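A minimal sketch of this coefficient trick (names mine): for clarity the squaring below is schoolbook convolution, which is O(u^2); swapping in an FFT-based multiplication is what brings it down to O(u log u).

```python
def sums_hit(A, B, u):
    """Does A + A contain an element of B, for A a set of non-negative
    integers below u?  Square the characteristic polynomial of A and
    read off which coefficients are positive."""
    p = [0] * u
    for a in A:
        p[a] = 1                      # p_A(x) = sum of x^a
    sq = [0] * (2 * u - 1)            # coefficients of p_A(x)^2
    for i, pi in enumerate(p):
        if pi:
            for j, pj in enumerate(p):
                if pj:
                    sq[i + j] += 1    # c_s counts pairs with a + a' = s
    return any(0 <= b < len(sq) and sq[b] > 0 for b in B)
```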
I wish I could report that the problem has been solved, that there is a sub-quadratic algorithm now for 3SUM. That we have finally beat Horowitz and Sahni. But that is not the case. We do have some results that are additional evidence that either the old algorithm is best, or that it will be hard to improve it.
The modern view is to make a failure to find an algorithm into a useful tool. Modern cryptography is based on this: factoring has been studied forever and no polynomial-time algorithm is known, so we posit that it is hard and use it as the basis of crypto-systems. Think RSA.
The same has been done with 3SUM. Since in forty years we cannot beat quadratic, let's assume that there is no sub-quadratic algorithm and use this as a tool to "prove" conditional lower bounds on other problems. It's lemons into lemonade.
Well, to be careful, there are better algorithms. The work of Ilya Baran, Erik Demaine, and the late Mihai Pǎtraşcu gives slightly better times on RAM machines. These results are interesting, but they shave only sub-polynomial factors and do not get a bound like n^{1.9}, for example. See their paper for details. Pǎtraşcu also showed a tight reduction from 3SUM to listing 3-cliques.
There is other work on why the 3SUM problem is hard by Amir Abboud, Kevin Lewi, and Ryan Williams: in the paper: Losing Weight by Gaining Edges. They prove many interesting results; here is one that I especially like:
Theorem 1 For any , if -Clique can be solved in time , then -SUM can be solved in time .
Of course k-SUM is the generalization of 3SUM where we seek k numbers in a set that sum to zero.
Is there a sub-quadratic algorithm for 3SUM? I would love to see one. The problem seems hard, very hard. It has been open for forty years, and perhaps it will be open for another forty. Who knows. As always I believe that there could very well be a better algorithm—I am always optimistic about the existence of new clever algorithms.
Demons and other curiosities
Pierre-Simon Laplace was a French scientist, perhaps one of the greatest ever, French or otherwise. His work affected the way we look at both mathematics and physics, among other areas of science. He may be least known for his discussion of what we now call Laplace’s demon.
Today I want to talk about his demon, and whether predicting the future is possible.
Can we predict the past? Can we predict the present? Can we predict the future? Predicting the past and predicting the present sound a bit silly. The usual question is: Can we predict the future? Although I think predicting the past—if taken to mean “what happened in the past?”—is not so easy.
So can we see the future? I would argue that many can and do very well every day predicting the future. The huge profits of options traders and hedge funds must say something about prediction. They make a lot of money by knowing—at least with some reasonable probability—what the future price of a stock or commodity will likely be.
There are many other predictions that we make that are often correct. We can predict that the sun will rise tomorrow in Atlanta at 6:54 am. This prediction works very well. The Weather Channel does a reasonable job of predicting the weather for later today, a less good job for tomorrow, and not such a good job of predicting the weather on this calendar day next year.
The issue is not these predictions, but whether it is possible to predict the future exactly. Laplace in 1814 claimed that given the exact position and speed of all objects in the universe at some time a “demon” would be able to use the laws of physics to predict their positions at an arbitrary time in the future. This is now called Laplace’s demon. Translated, he said:
We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.
Since Laplace’s work there has been much discussion against the idea that even in principle there could be such a demon. Many attacks are now possible. Most are based on ideas and concepts that Laplace did not have available to him in 1814. Chaos theory has been advanced as one way to show his demon is impossible, the random nature of quantum mechanics is another, and even the nature of computational complexity has been invoked.
Laplace of course had no idea of the quantum nature of the world. So it is a bit unfair for us to attack him in this way. We could extend Laplace’s intent to a quantum world, noting that quantum mechanics is a deterministic theory even as it describes branching-off worlds. Not all conceivable branches are possible, and we could ask the demon to identify the excluded ones in advance.
We would assume the demon has perfect knowledge of the initial conditions of the universe or of any local Big-Bang event, thus distinguishing our setting from the more human-relevant one in this in-depth essay by Scott Aaronson. Still, it is easier to address Laplace’s argument in the kind of world where Newtonian mechanics holds sway, and where the demon could solve N-body problems exactly even with collisions.
Accordingly, some researchers have looked at the problem of prediction of the future in a Laplacian type world, where the future is deterministic. Not long ago, in 2008, David Wolpert used a riff on Cantor’s diagonalization argument to show that prediction machines cannot exist. This diagonal flavor is one of the reasons that I find the question relevant to theory. His theorem is here and is summarized here:
The theorem’s proof, similar to the results of Gödel’s incompleteness theorem and Turing’s halting problem, relies on a variant of the liar’s paradox—ask Laplace’s demon to predict the following yes/no fact about the future state of the universe: “Will the universe not be one in which your answer to this question is yes?”
Recently a short note, called a “Mathbit,” was published in the Math Monthly by Josef Rukavicka. A Mathbit is always at most a single page and is set in a gray font style.
He claims an even shorter proof that Laplace’s demon is impossible—David’s is more formal and has precise definitions. Here is the main part of Rukavicka’s argument:
Suppose that there is a device that can predict the future. Ask that device what you will do in the evening. Without loss of generality, consider that there are only two options: (1) watch TV or (2) listen to the radio. After the device gives a response, for example, (1) watch TV, you instead listen to the radio on purpose. The device would, therefore, be wrong. No matter what the device says, we are free to choose the other option. This implies that Laplace’s demon cannot exist.
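The argument is the same diagonalization move as in the liar’s paradox or the halting problem: whatever the device outputs, do the other thing. A toy sketch of the contrarian strategy (the function names here are illustrative, not from Rukavicka’s note):

```python
def contrarian(predictor):
    """Given any predictor that outputs 'TV' or 'radio', perform
    the opposite action -- the diagonalization move in the argument
    against a fast, queryable Laplacian predictor."""
    prediction = predictor()
    return "radio" if prediction == "TV" else "TV"

# Whatever the device predicts, the chosen action differs from it:
for guess in ("TV", "radio"):
    assert contrarian(lambda: guess) != guess
```

Note the hidden assumptions: the predictor answers quickly, and the agent is genuinely free to act on the answer—both of which are questioned below.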
I have several reservations about this proof that there is no Laplace demon. For starters, it makes a complexity-type assumption: that the prediction of the future is fast. What if predicting one day into the future took more than one day? Then of course the argument would fail. This raises an interesting issue. Suppose that predicting $t$ days into the future takes more than $t$ days; then the prediction is clearly not useful. However, even if the predictor takes less than $t$ time to make the prediction, the time needed to get a useful prediction could still be immense. What if the prediction took, say, time exponential in $t$?
This would clearly not allow Rukavicka’s argument to be meaningful.
Another basic issue that struck me is the choice of watching TV vs. radio. Rukavicka assumes implicitly in his argument that we have the free will to decide what to do. But this seems to be the essence of the whole issue. What if we cannot make this choice? We might listen to the predictor say “TV” and tomorrow we forget our contrarian intent to listen to the radio and watch TV anyway. What if we really do not have a choice? This seems to devolve into a circular argument—or am I missing something?
Well, these two issues do take the argument back into the realm of Scott’s long essay.
What do you think? In a deterministic world could there be complexity results about predictions? Are these questions related to P=NP in some manner?