April CACM source
Juris Hartmanis did much to lay the landscape of computational complexity beginning in the 1960s. His seminal paper with Richard Stearns, “On the Computational Complexity of Algorithms,” was published 50 years ago this month, as observed by Lance Fortnow in his blog with Bill Gasarch. It is a great achievement to open a new world, but all the more mysterious that after 50 years so much of its landscape remains unknown.
Today we ask what might determine the unseen topography and how much some recent large-data discoveries may help to map it.
The idea for this post arose from a possibly phantom memory that Juris (co-)wrote a short draft survey on “Shapes of Computations” sometime in 1986–1989 when I was at Cornell. I recall the specific phrase “long and skinny” to describe space-bounded computations. A logarithmic-space-bounded computation can explore all of a polynomial-sized undirected graph by saving just the current node and some auxiliary information by which to choose a neighbor for the next step. The trace of this computation becomes a polynomial-length sequence of O(log n)-bit strings. A polynomial-space computation doing an exponential search of a game tree has the same long-and-skinny “shape” even though the scale is greater with regard to the input length n. Polynomial time-bounded computations, however, can use polynomial space, whereupon they become “short and fat.” Breadth-first search of a graph is a canonical algorithm that hogs space for its relatively short duration.
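As a toy illustration of the two shapes (our own sketch, not Reingold's logspace algorithm or anything from the survey), the following contrasts a walk that remembers essentially only its current node with a breadth-first search that must store a whole frontier:

```python
import random
from collections import deque

def bfs_max_frontier(adj, start):
    """Breadth-first search: 'short and fat' -- the frontier can hold
    many nodes of the graph at once."""
    seen = {start}
    frontier = deque([start])
    max_width = 1
    while frontier:
        max_width = max(max_width, len(frontier))
        node = frontier.popleft()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return seen, max_width

def random_walk_visit(adj, start, steps, seed=0):
    """Random walk: 'long and skinny' -- remembers only the current node.
    (The visited set here is bookkeeping to measure coverage, not part
    of the walker's memory.)"""
    rng = random.Random(seed)
    node = start
    visited = {start}
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.add(node)
    return visited

# A cycle on 16 nodes as the test graph.
n = 16
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
seen, width = bfs_max_frontier(adj, 0)
walked = random_walk_visit(adj, 0, steps=4000)
print(len(seen), width, len(walked))
```

On the cycle the BFS frontier stays tiny, but on an expander or grid it balloons, while the walk's state is always one node: a long trace of short strings versus a short trace of wide ones.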
Which computations fall between these “Laurel and Hardy” extremes? For SAT and the other NP-complete problems, this is the great question. The surest way to separate P from NP and NP from PSPACE would be to characterize these problems by a third distinct shape of computation. But we have not even separated P from PSPACE, nor logspace from P, so what can we pretend to know?
My memory has probably been transferred from a column Juris wrote with his students Richard Chang, Desh Ranjan, and Pankaj Rohatgi for the May 1990 issue of the Bulletin of the EATCS. It has this nice diagram:
The analogy between computations and proofs has been instrumental since the early days of Kurt Gödel and Alonzo Church and Alan Turing. Proofs do, however, give nondeterminism “for free”: NP is treated the same as P in their diagram, same as coNP, while “nondeterministic polynomial space” equals PSPACE. Hence I’ve regarded “proofs” as secondary to “computations” as objects for complexity. However:
The EATCS column defines a notion of width of a proof, characterizes PSPACE via polynomial-width proofs, and marvels at how the classic interactive protocol for PSPACE retains the “skinny” shape with far less length. Indeed, in cases where the verifier is able directly to check evaluations of the unique multilinear extension of the arithmetization of a quantified Boolean formula, every proof step involves just a couple of terms per round. A related form of skinniness had been brought out by Jin-Yi Cai and Merrick Furst in their 1987 paper, “PSPACE Survives Three-Bit Bottlenecks.” The column goes on to emphasize that the form of the proof lends itself to quicker probabilistic verification. This aspect was shortly codified in the definition of probabilistically checkable proof, which lends itself most readily to characterize NP and NEXP.
Amid all this development on “long and skinny” proofs, what can we say about “short and fat” ones? Intuitively, such proofs have lots of cases, but that is not their full story. The network of logical dependence matters too. Hence we think there are most helpfully three kinds of proofs in regard to shape:

1. Long and skinny: a linear chain of deductions, each step leaning on the one before.
2. Tree-like: a branching analysis into cases, each handled on its own.
3. Web-like: a network of lemmas proved once and re-used in many places.
Direct evaluations of quantified Boolean formulas in PSPACE have type 2, while the interactive proof with polynomials gives the “feel” of type 1 to both the prover and the recipient of the proof.
Chess problems prefer type 1 or a limited form of type 2 for esthetics. The website ChessBase.com recently republished the longest branching-free “Mate-In-N” problem ever created, by Walther Jörgenson in 1976. It is mate-in-203 with no alternative move allowed to the winner, and virtually no case analysis of alternate defensive tries by the loser either.
However, a chess search often has type 3. Often there will be different starting sequences of moves that come to the same position. The value computed the first time the position is reached is stored in a hash table, so that the later sequences are immediately given that value, cutting off their need for any further work. This resembles breadth-first search insofar as marked nodes may be touched later along other paths. The dependencies of values become web-like. Injured values from hash collisions can cause huge ripple effects, as I covered in a post three years ago.
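To make the hash-table idea concrete, here is a minimal sketch (using the game of Nim rather than chess, purely for brevity) of a game-tree search whose transposition table lets positions reached by different move orders be evaluated only once:

```python
# Toy game-tree search with a transposition table: positions reached by
# different move orders are evaluated once and the stored value is reused.
# The game is Nim; a position is a tuple of heap sizes, and value 1 means
# the side to move can force a win.

table = {}      # position -> game value
hits = [0]      # how often a stored value cuts off the search

def value(heaps):
    pos = tuple(sorted(h for h in heaps if h > 0))   # canonical form
    if pos in table:
        hits[0] += 1
        return table[pos]
    result = 0                                       # no moves left: a loss
    for i, h in enumerate(pos):
        for take in range(1, h + 1):
            if value(pos[:i] + (h - take,) + pos[i + 1:]) == 0:
                result = 1                           # a move to a lost position wins
                break
        if result:
            break
    table[pos] = result
    return result

print(value((1, 2, 3)), hits[0])
```

Nim theory says (1, 2, 3) is a loss for the side to move (the heap sizes XOR to zero), and the hit counter shows how many transpositions the table absorbed; the dependency graph of stored values is exactly the web-like shape described above.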
The stored-hash tactic is much the same as a lemma in a proof that is used multiple times. We suspect that last year’s 6-gigabyte computer-generated proof of discrepancy > 2 in the Erdős Discrepancy Conjecture has many such lemmas, and hence is more type 3 than type 2. The provers Boris Konev and Alexei Lisitsa have an updated page with further computations. They do not link the whole impossibility proof for a sequence of length 1,161 or more having discrepancy 2, but do give some of the “DRUP” certificates of unsatisfiability. DRUP stands for reverse unit propagation with clause deletion, and that propagation strikes us as largely composed of cases and lemmas. The subcases might be codable at high level via predicates for m < 1,161 expressing the unavailability of length-m subsequences fulfilling some extra conditions, with such predicates being copiously re-used.
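For a flavor of what such a certificate checks, here is a bare-bones sketch (ours, not the real DRUP format or a production checker): a clause follows by “reverse unit propagation” if assuming its negation and running unit propagation on the formula yields a contradiction.

```python
# Minimal reverse-unit-propagation (RUP) check, the core step behind
# DRUP certificates.  A clause is a list of nonzero ints in DIMACS style:
# positive = variable, negative = negated variable.

def unit_propagate(clauses, assignment):
    """Propagate unit clauses to a fixpoint; return False on a conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue                      # clause already satisfied
            unassigned = [l for l in clause if -l not in assignment]
            if not unassigned:
                return False                  # conflict: clause falsified
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                changed = True
    return True

def is_rup(clauses, candidate):
    assignment = {-l for l in candidate}      # assume the candidate is false
    return not unit_propagate(clauses, assignment)

F = [[1, 2], [-1, 2], [1, -2]]
print(is_rup(F, [1]))
```

Each checked clause then plays the role of a lemma available to later propagation steps, which is what gives such proofs their web-like texture.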
One finds an explosion of stored sub-cases in chess endgame tables. However, in many positions a vast majority of them are to prove wins that a chess master would see as “trivially true” in a few seconds. In other cases an alternative by the loser might simply jump to a later stage of the main trunk line, thus merely accelerating the same loss rather than posing a new case. (Similarly, an alternative by the winner might merely allow the defender to wind back to an initial stage, without much need for separate verification.) We wonder how far this pertains to the longest endgame win discovered in the new 7-piece endgame tables: mate in 545 moves. That is, how large is the part of the proof tree that needs to be specified, so that a chess program given values for positions in the tree could verify the rest via its local search?
This last eventuality prompts our new speculation: Perhaps we can rigorously develop a science of when large sections of proofs can be effectively handwaved. This would appeal to the computational style most often postulated for our human brains: shallow but broad and with great power of heuristic association, analogous to the complexity class TC^0 of poly-size constant-depth threshold circuits. Even when yoked with computers the added value of our grey matter is not to be discounted, as attested in my joint paper last summer, whose section 6 shows how human-computer tandems performed better at chess in 2005–08 than computers alone.
We have recently twice covered a conjecture by Freeman Dyson that one feels should lend itself to this kind of treatment. Many other open conjectures in number theory are felt to be “probably” true, where “probably” has a technical sense that might be developed further into some kind of dependent structure: if that handwave is valid then all-the-more certainly so is this one. The idea could be helped by enumeration of exceptions that, once handled, enable the rest of the proof to be executed with a broad brush. As linked from an essay post by Scott Aaronson, Tim Gowers relates a relevant opinion by Don Zagier in a MathOverflow comment. We morph this opinion to say that mathematicians may need a “handwave heuristic” simply because many “obviously true” statements don’t connect to anything substantial that would give a reason for their truth.
This could push many proofs of type 3 toward types 2 or 1. Note that in the way interaction and randomness combine to move type 2 toward type 1, we are already agreeing to tolerate a chance of error. It is the nature of the kind of error involved in converting instances of type 3 into type 2 that needs further consideration. We wonder whether current developments such as homotopy type theory are embracing not just exact patterns but also heuristics for when a search is overwhelmingly likely to succeed—or to fail.
This still leaves our original question of shapes of computations. In the past both Dick and I have undertaken some exploration of conditions under which computations might be similarly “self-improved.” That idea will have to be for another time.
Can we assign particular “shapes” of computations canonically to specific computational problems? Can this help guide concrete attacks, or is it no more than tantamount to solving the big open questions to begin with?
Again we congratulate Juris and Richard on this 50th anniversary of their achievement. We also tip our hat to a comment by John Sidles in our “Case Against Cases” post which partly prompted this further essay.
Benjamin Rossman, Rocco Servedio, and Li-Yang Tan have made a breakthrough in proving lower bounds on constant-depth circuits. It came from a bi-coastal collaboration: Rossman visiting the Simons Institute in Berkeley from Japan, and Tan visiting Servedio at Columbia University in New York from Berkeley. Their new paper solves several 20- and 30-year-old open problems.
Today we congratulate them on their achievement and describe part of how their new result works.
What exactly did they prove? As some have already remarked, how one expresses this says something about communications in our field. Exactly what they proved is:
Theorem 1 For some explicit constant c > 0 and all depths d up to order (log n)/(log log n), there is an n-ary monotone Boolean function f that has a simple linear-size circuit (indeed, a formula) of depth d with unbounded fan-in AND and OR gates, but such that every circuit of depth d with the opposite kind of gate at its output as f’s, or with small bottom fan-in at the inputs, either has size above exp(n^Ω(1/d)) or else agrees with f on at most a 1/2 + o(1) proportion of the inputs.
That is quite a mouthful. We can at least simplify it, as they do up front in their paper, by noting that every circuit of depth d-1 trivially obeys both of the stated restrictions on circuits of depth d:
Theorem 2 Every Boolean circuit of depth d-1 and size at most exp(n^O(1/d)) gets at least a 1/2 - o(1) fraction of the inputs wrong with regard to computing f.
Johan Håstad’s famous 1986 PhD thesis had proved a similar lower bound only in the worst case. If this still seems hard to parse, however, here is a consequence they proved. The total influence of a Boolean function f on n variables is the sum from i = 1 to n of the proportion of inputs x for which flipping the i-th bit of x changes the value f(x).
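Here is a brute-force computation of total influence for two small examples (our own illustration, not from the paper): parity, where every bit flip changes the output, and AND, whose influence is tiny.

```python
from itertools import product

def total_influence(f, n):
    """Sum over coordinates i of Pr_x[f(x) != f(x with bit i flipped)],
    computed exhaustively over all 2^n inputs."""
    total = 0.0
    for i in range(n):
        flips = 0
        for x in product([0, 1], repeat=n):
            y = list(x)
            y[i] ^= 1
            if f(x) != f(tuple(y)):
                flips += 1
        total += flips / 2 ** n
    return total

n = 4
parity = lambda x: sum(x) % 2
and_all = lambda x: int(all(x))
print(total_influence(parity, n))    # 4.0 -- every flip matters, so influence n
print(total_influence(and_all, n))   # 0.5 -- n / 2^(n-1)
```

Parity has the maximum possible total influence n, while AND's shrinks exponentially; the theorem below concerns functions at the low end of this scale.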
Theorem 3 For some non-constant depths d = d(n) and size functions S(n) greater than quasi-polynomial in n, there are monotone Boolean functions f whose total influence is only polylogarithmic in n, but that still cannot be approximated on more than a 1/2 + o(1) fraction of inputs by circuits of depth d and size at most S(n).
This gives a big “No” to a question in two posts by our friend Gil Kalai in 2010 on his blog and in 2012 on StackExchange. It rules out any kind of strong converse to a famous 1993 theorem of Nathan Linial, Yishay Mansour, and Noam Nisan, later improved by Ravi Boppana, showing that small constant-depth circuits compute functions of low sensitivity. This theorem has many applications, so the big bound against a converse is significant news, but perhaps the above statement still does not come trippingly off the tongue. Well, here’s something else they proved:
Theorem 4 Relative to a random oracle A, the polynomial hierarchy is infinite.
Now that’s a nice short statement that can grab people—at least people like us who did complexity in the 1980s and before. We know this as an open problem going back even before Charlie Bennett and John Gill proved that P^A ≠ NP^A for a random oracle A in 1981.
However, there is not much surprise and not much mileage in that statement. It was believed even before Ron Book observed in 1994 that its negation collapses the hierarchy without an oracle: Given a relativizable class C, define almost-C to be the set of languages L such that the measure of oracles A giving L ∉ C^A is zero. The measure is properly on the space of infinite 0-1 sequences, but it is OK to think of the usual Lebesgue measure on [0,1], where e.g. the binary expansion 0.0110101000101…
denotes the set of primes, ignoring the clash between finite sets like {2} and their co-finite counterparts like {3,4,5,…} that map to the same dyadic rational number.
For classes C enjoying certain closure properties, almost-C equals BP·C. The dot means we have BP-machines each of whose random branches ends with one query to the given language in C, whose answer becomes the result of that branch. For instance, BP·NP equals the Arthur-Merlin class AM. Now if Σ_{k+1} is not contained in BP·Σ_k for every k, then by standard 0-1 laws the measure of oracles A putting Σ_{k+1}^A = Σ_k^A is zero. Then by the fact that a countable union of sets of measure zero has measure zero, the hierarchy is infinite for a random oracle. Hence its collapse for a random oracle implies Σ_{k+1} ⊆ BP·Σ_k for some k. This in turn would collapse the hierarchy to a finite level without oracle, much as NP ⊆ BPP collapses it to Σ_2.
The point in BP·C is that random oracles furnish random bits for the computations. The random oracles could be richer, in that exponentially many poly-length computations could use exponentially many random oracle bits in toto. But the aforementioned closure properties together with probability amplification bridge the difference to machines using polynomially many independent bits when C is a hierarchy level.
The point in the new lower bound is that the random oracle bits connect instead to distributions over the input space in an average-case argument. This connection is expressed well in their paper.
Sometimes a new idea comes from a new mathematical object, but other times it comes from a new way of applying and controlling a known object. Leslie Valiant in the late 1970s introduced the kind of projection that can do the following to each variable x in a Boolean or numerical formula:

1. replace x by the constant 0;
2. replace x by the constant 1; or
3. replace x by another variable y or its negation.
An equivalent rule to the last is that you can rename two variables to be the same variable. The substitution applies simultaneously to every occurrence of a variable. By undoing this rule one can convert every formula with m occurrences of variables into a formula with m different variables each occurring once. This is called a read-once formula and is unique up to permuting or renaming variables, so every formula is a projection of a read-once formula. The formulas targeted in their proof are already read-once, so the game becomes how to analyze the formulas that arise as projections of them.
When only the first two rules are applied, the result is a restriction of the formula. Håstad’s 1986 proof technique analyzed random restrictions obtained by independently leaving each variable alone with some probability p and then assigning a random 0-or-1 value to each variable not left alone. Restrictions applied to a read-once formula not only preserve its structure but also preserve read-onceness, which keeps the behavior of different parts of the formula independent. This independence is sacrificed by using projections, which could “entangle” different parts of the formula and thereby worsen bias.
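A bare-bones sketch of a restriction acting on a small formula tree (our own illustration; the real proofs use the Sipser functions and carefully tuned probabilities):

```python
import random

# Restriction of a Boolean formula tree: each variable is independently
# left free with probability p, otherwise fixed to a random bit, and the
# formula is simplified.  Leaves are variable names; 0 and 1 are constants.

def restrict(t, rho):
    if isinstance(t, str):
        return rho.get(t, t)          # fixed bit, or still a free variable
    op, *kids = t
    kids = [restrict(k, rho) for k in kids]
    absorber = 0 if op == 'AND' else 1
    if absorber in kids:
        return absorber               # 0 kills an AND, 1 kills an OR
    kids = [k for k in kids if k != 1 - absorber]   # drop identity elements
    if not kids:
        return 1 - absorber
    if len(kids) == 1:
        return kids[0]
    return (op, *kids)

def random_restriction(variables, p, rng):
    return {v: rng.randint(0, 1) for v in variables if rng.random() >= p}

f = ('OR', ('AND', 'x0', 'x1'), ('AND', 'x2', 'x3'))
rho = random_restriction(['x0', 'x1', 'x2', 'x3'], p=0.5, rng=random.Random(2))
print(restrict(f, rho))
```

Because each surviving variable occurs in only one subtree of a read-once formula, the simplification of one branch never interferes with another, which is exactly the independence the paragraph above describes.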
The proof technique both then and now works by reductio ad absurdum on the depth d. The hard function f_d is always a monotone alternating AND–OR formula of a kind introduced by Mike Sipser, with variables at the inputs and the output gate type alternating with the parity of d. It is “tuned” by choosing the fan-ins at each level. Håstad had shown that with the right combination of parameters, a random restriction makes f_d equivalent to a formula with a large enough expected number of variables that embeds the relevant structure of f_{d-1}. A circuit of small enough size and depth, however, with high probability gets beaten down by the restrictions into a simpler form on a pace that cannot be sustained for keeping up with f_{d-1}.
The reductio can then continue with f_{d-1}, until an obvious contradiction is reached on going from depth 2 to depth 1 with the circuit that results. One can with more care actually structure the proof as a standard induction going from lower bounds for depth d-1 to the desired ones for depth d, but the mechanism is the same either way.
For an average-case argument one also needs to preserve the balance between arguments giving f(x) = 1 and those giving f(x) = 0. Otherwise the trivial circuit that always says “no” or its partner that always says “yes” already has more advantage than the desired conclusion allows. This is where one might first think of the third projection rule as counter-productive. By identifying two variables, any “bias” in one might be propagated to more parts of the formula. However, this also intuitively creates greater sensitivity in terms of influence as defined above. If x is sensitive, then letting x' be x with the relevant bit flipped, we have that x and x' balance each other.
Rossman, Servedio, and Tan craft the argument, using a slightly different “tuning” of the Sipser functions from Håstad’s, so that the benefits carry the day: the hard function retains its structure and balance and hardness but the prospective smaller-depth circuit “collapses.” As they note, random projections had been used in a 2001 paper by Russell Impagliazzo and Nathan Segerlind for a proof complexity lower bound, but apparently had not been applied so bluntly to circuits.
Their projections first determine blocks of variables that get mapped to the same new variable, a different new variable for each block. Then they can be specified by a distribution over restrictions, thus simplifying the randomness analysis. If the circuit has a gate that accesses some variable and the negation of another variable in the same block, then after the projection the gate accesses both y and ¬y for the block’s new variable y, and hence can be collapsed to 0 (if the gate is AND) or 1 (if OR). Another difference from Håstad’s technique is that they need to adjust the projection probability adaptively depending on outcomes at previous depths. This is the point where we say to refer to their 59-page paper for details, but they also have excellent further explanations in sections 4 and 7.
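The collapse effect can be pictured in a few lines of code (a schematic of ours, not the paper's machinery): once two variables land in the same block, a gate holding contradictory literals becomes a constant.

```python
# Toy version of gate collapse under projection: map each variable to its
# block's representative; a gate that then sees both y and NOT-y is constant.
# Literals are (variable name, polarity) pairs.

def project_gate(op, literals, block_of):
    mapped = {(block_of[v], pol) for v, pol in literals}
    for v in {name for name, _ in mapped}:
        if (v, True) in mapped and (v, False) in mapped:
            return 0 if op == 'AND' else 1   # y AND NOT-y = 0; y OR NOT-y = 1
    return (op, sorted(mapped))

block_of = {'x1': 'y', 'x2': 'y', 'x3': 'z'}
print(project_gate('AND', [('x1', True), ('x2', False), ('x3', True)], block_of))  # 0
```

Gates that survive without contradiction simply shrink, which is how the projected circuit loses depth faster than the projected hard function loses structure.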
Dick and I are excited because we have thought about inductive lower bound arguments whose base cases involve read-once formulas and whose inductive steps are projections. In a post with Dick three years ago I described one case that may retain interest even though the statement it was trying to prove turned out to be false.
My planned attack began with a “niceness” theorem for functions having a funny kind of read-once arithmetical formula that has powering as a primitive operation—for instance, x^3 counts as one occurrence of the variable x, not three, in contrast to x·x·x—but does not have an additive constant in any subterm. I proved that the first partial derivatives form a Gröbner basis under any admissible monomial ordering. If one allows additive constants, then the partials still bound such a Gröbner basis. The argument is less simple than one might expect and does not obviously extend to higher derivatives—at least not obviously to me, as I’ve sat on revisions of a paper for over a decade while trying to prove that. But the theorem is still good enough to use as a base case and ask:
How fast can various complexity measures associated to Gröbner bases grow under successive applications of (random) projections?
My computer runs back then were sobering: even after just a few projections in some targeted cases the answer was, mighty fast. And as I related in the post, the desired assertion about a certain “monomial complexity” measure was refuted. But there are other formulations and conjectures and measures to try, and this new result in the Boolean case may give encouragement to try them. There is also the difference that in my case the read-once formula was “easy” and was derived from a given function we are trying to prove “hard” by undoing projections, whereas in the new case the functions are read-once but are the ones we are showing hard for lower-depth circuits. So perhaps this all runs in the opposite direction—but still the prospect of new ways to control projections is pretty cool.
What new results may come out of this breakthrough lower bound technique?
Problems beyond brute force search
Cropped from Wikipedia source
Hans-Joachim Bremermann was a mathematician and biophysicist. He is famous for a limit on computation, Bremermann’s limit, which is the maximum computational speed of a self-contained system in the material universe.
Today Ken and I wish to talk about the limit and why it is not a limit.
A transcomputational problem is a problem that requires processing of more than 10^93 bits of information. The number comes from Earth-scale considerations, but adding less than 30 to the exponent breaks the scale of the known universe. Our friends at Wikipedia say:
This number is, according to Bremermann, the total number of bits processed by a hypothetical computer the size of the Earth within a time period equal to the estimated age of the Earth. The term transcomputational was coined by Bremermann.
What is interesting is that he thought about “transcomputational” problems in 1962. Yes, almost a decade before the P=NP problem was stated. See his paper for details.
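The figure can be re-derived on the back of an envelope (the constants below are modern approximations we supply; only the order of magnitude matters):

```python
# Back-of-envelope reconstruction of Bremermann's transcomputational bound.
C = 2.998e8                    # speed of light, m/s
H = 6.626e-34                  # Planck's constant, J*s
MASS_EARTH = 5.97e24           # kg
AGE_EARTH = 1e10 * 3.156e7     # seconds (Bremermann's round 10^10 years)

rate = C ** 2 / H              # Bremermann's limit: ~1.36e50 bits/s per kg
total_bits = rate * MASS_EARTH * AGE_EARTH
print(f"{total_bits:.2e}")     # 2.56e+92 -- which Bremermann rounds to ~10^93
```

The product of mass-energy rate, Earth's mass, and Earth's age lands within a small factor of 10^93, matching the quoted description of "a hypothetical computer the size of the Earth" running for the age of the Earth.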
He noted back then that certain problems were beyond any reasonable brute-force search. In his own words:
The experiences of various groups who work on problem solving, theorem proving and pattern recognition all seem to point in the same direction: These problems are tough. There does not seem to be a royal road or a simple method which at one stroke will solve all our problems. My discussion of ultimate limitations on the speed and amount of data processing may be summarized like this: Problems involving vast numbers of possibilities will not be solved by sheer data processing quantity. We must look for quality, for refinements, for tricks, for every ingenuity that we can think of. Computers faster than those of today will be a great help. We will need them. However, when we are concerned with problems in principle, present day computers are about as fast as they ever will be. We may expect that the technology of data processing will proceed step by step—just as ordinary technology has done. There is an unlimited challenge for ingenuity applied to specific problems. There is also an unending need for general notions and theories to organize the myriad details.
Quite insightful for a paper that dates a decade before Cook-Karp-Levin on P=NP. It also predates the limits associated to Jacob Bekenstein’s bound on the information capacity of finite space and/or to Rolf Landauer’s principle.
One wonders what might have happened if Bremermann’s paper had been better known in our theory community. Ken notes that the Russian theory community in the 1960s highlighted the question of perebor—brute-force search. But he senses the emphasis was on problems for which it could be necessary in the abstract, rather than tied to Earth-scale considerations like Bremermann’s.
Of course there are several eventualities that were missed. One is quantum computation—I believe all his calculations depend on a classical view of computation. There are several other points that we can raise to attempt to beat his “limit.”
Change the algorithms: Of course his limit could be applied to computing primality, for example. The brute force method is hopeless for even modest-sized numbers. Yet we know methods that are much better than brute force and so we can easily beat his limit.
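As a concrete instance (our sketch, not Bremermann's): a deterministic Miller–Rabin test with a fixed witness set settles primality of numbers for which naive divisor search would be hopeless.

```python
# Deterministic Miller-Rabin for moderate-size integers.  The witness set
# of the first 12 primes is known to be sufficient for all n < 3.3e24,
# which comfortably covers 64-bit inputs.

def is_prime(n):
    if n < 2:
        return False
    small = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
    for p in small:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

print(is_prime(2 ** 61 - 1))   # True: the Mersenne prime 2^61 - 1
```

Twelve modular exponentiations replace an astronomically long divisor scan, which is exactly the "quality over quantity" Bremermann called for.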
Steve Wozniak visited Buffalo yesterday as a UB Distinguished Speaker. In a small group attended by Ken, he told his standard anecdote about the first program he ever wrote. This was to solve the Knight’s Tour problem on an 8×8 chessboard. He first coded a brute-force solution trying all knight moves at each step, but realized before he hit “run” that it would take about 10^25 years. This awakened him, he said, to the fact that good algorithms have to go hand-in-hand with good hardware.
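The algorithmic fix is easy to demonstrate (a generic sketch, not Wozniak's actual program): Warnsdorff's rule, which always tries the square with the fewest onward moves first, turns the astronomically slow search into one that finishes immediately on an 8×8 board.

```python
# Knight's tour by backtracking with Warnsdorff's heuristic ordering:
# at each step, try the most constrained next squares first.

MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def neighbors(sq, visited):
    r, c = sq
    return [(r + dr, c + dc) for dr, dc in MOVES
            if 0 <= r + dr < 8 and 0 <= c + dc < 8
            and (r + dr, c + dc) not in visited]

def tour(path, visited):
    if len(path) == 64:
        return path
    for nxt in sorted(neighbors(path[-1], visited),
                      key=lambda s: len(neighbors(s, visited))):
        visited.add(nxt)
        result = tour(path + [nxt], visited)
        if result:
            return result
        visited.remove(nxt)
    return None

t = tour([(0, 0)], {(0, 0)})
print(len(t))   # 64: a full tour
```

The heuristic prunes the move tree so aggressively that backtracking is rarely needed, collapsing the brute-force explosion to a fraction of a second.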
Change the answers: Another method is to change what we consider as an answer. Approximation algorithms of course are one important example: allow the answer to be near the optimal one. This has opened the floodgates to increase the class of problems that we can solve.
Change the problems: Another method is to change the problems that we attack. In many cases we can avoid general problems and exploit special structure of a problem. Examples that come to mind include: replace dense matrices by sparse ones; replace arbitrary graphs by planar ones or those with restricted minors; and replace data analysis of arbitrary data sets by analysis of data that is generated with specific noise, like Gaussian.
Change the world: We have posted about the idea of a world without true randomness, presenting Leonid Levin’s proof that SAT is nearly polynomially solvable in such a world. That post offered the weaker idea that every efficient generator of SAT instances might be solved by Levin’s algorithm on all but finitely many instances. The finite bound might be huge, but the fact of Levin’s algorithm would carry weight: it would solve everything else based solely on the principle that “nothing succeeds like success.” We can put this idea in concrete terms like Bremermann’s:
Could we live in a world where the act of creating an instance that requires processing more than 10^93 bits of information itself requires processing more than 10^93 bits of information?
We note that Larry Stockmeyer proved that every Boolean circuit capable of deciding a certain class of logical formulas that fit into 8 lines of 80-column ASCII text must have more gates than atoms in the observable universe. But this does not rule out a little algorithm solving every such formula that we could generate—unless we spend the time to cycle through every legal formula that fits into 640 characters.
Are there realistic limits on computation of the type that Bremermann postulated? What are the right limits in light of today’s insights into computation?
A conjecture about faculty behavior
“Dr. Kibzwang” source
Colin Potts is Vice Provost for Undergraduate Education at Georgia Tech. His job includes being a member of the President’s Cabinet—our president, not the real one—and he is charged with academic policies and changes to such policies. He is also a College of Computing colleague and fellow chess fan.
Today I want to state a conjecture about the behavior of faculty that arose when Tech tried to change a policy.
I am currently at Georgia Tech, but this conjecture applies, I believe, to all institutions and all faculty. Ken mostly agrees. Potts recently supplied a wonderful example of the conjecture in action—I will get to that after I formally state it. Perhaps we should call it Potts’s Conjecture?
The conjecture is easy to state:
Conjecture 1 Let I be any issue and let F be any collection of n distinct faculty members. Then during a long enough period of email exchanges among the above faculty on I, at least n+1 opinions will be voiced.
You can see why I refer to it as an anti-pigeonhole principle. Right?
I have tried to prove the conjecture—I view it as a kind of Arrow’s Paradox. I have failed so far to obtain a formal proof of it. The conjecture does have the interesting corollary:
Corollary 2 Let I be any issue and let F be any collection of n distinct faculty members. Then during a long enough period of email exchanges on the issue, some faculty member will voice at least two different opinions.
A weaker version that we will cleverly call The Weak Conjecture is the following:
Conjecture 3 Let I be any issue and let F be any collection of n distinct faculty members. Then during a long enough period of email exchanges on the issue, at least n opinions will be voiced.
The point is that the total number of opinions is unbounded. Strong or weak, we can call it the Conjecture.
Of course, being mathematicians we want proofs not examples. But as in areas like number theory, one is often led to good conjectures by observations. In any event simple tests of conjectures are useful to see if they are plausible enough to try to prove.
Here is the policy change that has been suggested. You are free to skip this or go here for even more detail. The point is that this is the issue .
Per the proposal, starting in fall 2015, classes would not meet on the Wednesday before Thanksgiving, giving students an additional day for their break. A change implemented as a pilot this spring, which eliminated the last exam session on the Friday before Commencement so that finals would not overlap with graduation festivities, will continue to stand. Starting the next academic year, it was approved to extend the individual course withdrawal deadline by two weeks, allowing students more time to evaluate whether to drop a class.
In Spring 2016, the current Dead Week would be replaced with Final Instructional Class Days and Reading Periods. The new schedule would designate Monday and Tuesday of the penultimate week of the semester as Final Instructional Class Days, followed by a day and a half of reading period, and administering the first final on Thursday afternoon. Finals would be broken up by that weekend and resume Monday, with an additional reading period the next Tuesday morning. Finals would finish that Thursday, allowing Friday for conflict periods and a day between exams and Commencement.
The final recommendation would extend the length of Monday/Wednesday/Friday classes during spring and fall semesters from 50 to 55 minutes. Breaks between classes would extend from 10 to 15 minutes.
Pretty exciting, no? No.
The result of Potts announcing the above was a storm of emails from our faculty members. As you would expect, given the Conjecture, this quickly led to a vast number of opinions. The number of opinions seems easily to follow the Conjecture.
Ken analogizes this kind of policy tuning for a university to finding a regional optimum in the landscape of a multivariable function f. A proposal like Potts’s, with so many little changed parts, resembles a step in simulated annealing, where one periodically jumps out of a well to test for better conditions in another. He is not surprised that such a “jump” would bring multiple reactions from faculty.
Even so, however, one would expect there to be a gradient in the new region so that opinions could converge to the bottom of the new well. This is a different matter: a helpful gradient should be in force after a jump.
April is the month when US undergraduates have been informed of all their college acceptances and in many—fortunate—cases must make a choice. Ken has a front-row seat this year. From comparing various colleges and universities with widely different policies, and noting the market incentive to diversify, he has come to a conjecture of his own:
Conjecture 4 There is no gradient: for any university’s utility function f, the gradient ∇f is defined only on a set of measure zero.
To all appearances, this conjecture implies the others. Is it capable of being proved? Again you—our readers—are best placed to furnish input for a proof.
Do you believe any of the conjectures? I hope we get lots of opinions…
Ken and I are divided: he thinks we will not get many, I think we will get a lot, and we both think that we may get just a couple. But in my opinion it is possible that …
Theon of Alexandria was history’s main editor of Euclid’s Elements.
Today I want to talk about case analysis proofs.
The connection is that Euclid is sometimes credited with inventing case analysis proofs. He can also be credited as the first to evince a desire to avoid them. Euclid made a habit of giving just one case and leaving the reader to imitate his proof for others. One example is the theorem that given a triangle with apex C, every other point D on the same side of the base as C makes either DA ≠ CA or DB ≠ CB. Euclid gives a proof only for the case where D is outside the triangle. In other cases D could be inside the triangle or incident to one of its edges.
Theon lived from 335 to 405 CE in Euclid’s town of Alexandria. He was the last Director of the Library of Alexandria before its final destruction by fire toward the end of the 4th century. It is possible that he directed the entire research institution, called the Musaeum in honor of the Muses, that had been built around the library, and whose vestiges and successor the Serapeum were destroyed in or around 391 CE. He was also the father of the mathematician Hypatia, who was gruesomely murdered in 415 by a mob supporting the Christian bishop Cyril against the Roman governor Orestes whom she advised.
Sometimes Euclid left out cases of theorems or proofs because he wasn’t going to use them later. Theon filled in several such cases and even added a theorem or two of his own. His greatest service is that he filled in extra propositions and lemmas to some of Euclid’s arguments to make them easier to follow. Theon’s editions were the only ones known until an earlier one without his emendations was discovered at the Vatican in 1803, and they remained the standard for school texts in Britain and elsewhere clear through the 1800s.
Many theorems have clean proofs, some have technical proofs, some have messy proofs. One type of messy proof uses many different cases. These proofs begin like this:
We note that there are ten possible cases based on the last digit of n.
The proof then proceeds to analyze each of the ten cases. Ugh. I personally do not like such proofs—who does? The main problem with them is the danger of both omission and perdition: If you omit a case that needs to be there then the proof is incomplete—or worse, wrong—while if you include all cases but err on a single one, then the whole proof is lost.
Even if the proof is correct, who will read it? Our friends at Wikipedia redirect “proof by cases” to the old classical name, proof by exhaustion. Today this connotes how the reader might feel. Concretely, they note:
A proof with a large number of cases leaves an impression that the theorem is only true by coincidence, and not because of some underlying principle or connection.
Nevertheless, they note some theorems for which the only known proofs have many cases:
Thomas Hales’s proof of the last of these has recently been formally completed using two computerized proof assistants.
Moreover, last year’s computerized advance on the Erdős Discrepancy Problem used SAT-solvers in an overt kind of proof-by-exhaustion. Perhaps only a computer can love such a proof.
More concretely, perhaps some theorems elude proof because we humans don’t see a good way to break things down into cases, whereas computers can try many options. But saying this brings the matter around full circle because it is asking us humans for a guiding principle to form the cases. This might encourage us instead to go back and look for a guiding principle by which to do the proof without cases.
Sometimes one can avoid case-analysis proofs. Sometimes finding the right wording will do it. In a pumping-lemma proof that the language of strings of the form a^n b^n c^n is not context-free, one might break into cases according to whether a certain pair of critical substrings includes a’s, b’s, and/or c’s. Or you can just say the pair cannot include all three, so that the three “regions” of a pumpable string cannot stay in sync with each other when forming the pumped-up or pumped-down strings, and so on. But this is only wording. Can we give a more concrete suggestion?
Here is a method. If you can manage it then I think it is the best method. It is to prove a more general statement that implies what you previously could only prove by case analysis.
Here are some simple examples of what I am thinking about.
Consider the following theorem from a webpage on proofs by exhaustion:
Theorem: If n is a positive integer, then n^5 − n is divisible by 5.
The following is the case analysis proof that is given there:
This is all correct, but not very convincing. A better proof is to show the stronger result:
Theorem: If n is a positive integer and p is a prime, then n^p − n is divisible by p.
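The stronger statement is Fermat’s little theorem in disguise, and it is easy to spot-check empirically. A small Python sketch, my own illustration rather than anything from the original page:

```python
# Spot-check: for prime p, n^p - n is divisible by p (Fermat's little theorem).
def divides(p: int, n: int) -> bool:
    return (n ** p - n) % p == 0

# The case-analysis theorem is the single instance p = 5.
assert all(divides(5, n) for n in range(1, 500))

# The generalization needs no cases on the last digit of n.
for p in (2, 3, 5, 7, 11, 13):
    assert all(divides(p, n) for n in range(1, 200))
```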
Suppose you have a graph from a certain family and want to prove that it has a path of a given length. A case analysis might work, but a better method might be to note that all graphs in this family have high degree. Then invoke the famous theorem of Andrew Dirac:
Theorem: A graph with n ≥ 3 vertices is Hamiltonian if every vertex has degree n/2 or greater.
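As a sanity check on Dirac’s condition, here is a brute-force sketch in Python; the six-vertex graph is my own small example, not one from the post:

```python
from itertools import permutations

# Check Dirac's hypothesis: n >= 3 and minimum degree >= n/2.
def satisfies_dirac(n, edges):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return n >= 3 and min(deg) >= n / 2

# Brute-force search for a Hamiltonian cycle (fine for tiny n).
def hamiltonian_cycle_exists(n, edges):
    adj = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    return any(
        all((c[i], c[(i + 1) % n]) in adj for i in range(n))
        for c in permutations(range(n))
    )

# A 6-cycle plus three long chords: every vertex has degree 3 = 6/2.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)] + [(0, 3), (1, 4), (2, 5)]
assert satisfies_dirac(n, edges)
assert hamiltonian_cycle_exists(n, edges)   # as Dirac's theorem guarantees
```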
Suppose that you are confronted with proving that all the roots of some polynomial are real. You could compute the actual roots of the polynomial, but that is potentially error-prone. A better approach might be to find a symmetric real matrix A so that the eigenvalues of A are the roots of the polynomial.
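Here is a minimal NumPy sketch of that approach; the symmetric matrix is a made-up example rather than one constructed from a given polynomial:

```python
import numpy as np

# By the spectral theorem, a real symmetric matrix has only real eigenvalues.
# So if a polynomial is the characteristic polynomial of such a matrix,
# all of its roots are real, with no root-finding argument needed.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
assert np.allclose(A, A.T)                       # symmetric

coeffs = np.poly(A)                              # characteristic polynomial of A
roots = np.roots(coeffs)
assert np.allclose(roots.imag, 0.0, atol=1e-8)   # every root is real
```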
A case analysis involving sums can sometimes be tamed by replacing the sums with integrals. Ken recalls a take-home final on which he spent ten pages arranging terms of a multiple summation in various ways according to the values of some parameters, only to learn from the answer key that it was a one-page consequence of Fubini’s Theorem.
Is there a way to quantify the extent to which a theorem needs case analysis in its proofs?
Congratulations to John Nash and Louis Nirenberg on the 2015 Abel Prize
John Nash and Louis Nirenberg have jointly won the 2015 Abel Prize for their work on partial differential equations (PDEs). They did not write any joint papers, but Nirenberg evidently got Nash excited about David Hilbert’s 19th problem during Nash’s frequent visits to New York University’s Courant Institute in the mid-1950s. Nash in return stimulated Nirenberg by his verbal approach of barraging a problem with off-kilter ideas. The Norwegian Academy of Sciences and Letters recognized their ‘great influence on each other’ in its prize announcement.
Today we congratulate both men on their joint achievement.
Hilbert’s 19th problem asked whether all solutions to certain partial differential equations must be analytic functions—that is, expressible as power series on local neighborhoods. Enough progress had been made since the 1930s that the remaining task could be likened to building a short road bridge without central stanchions or much room below for support. If you just shoot the road across it is hard to maintain the level needed to reach the other side. But if you aim up then you create more room to make an arch for best support.
The level known before Nash went to work—and Ennio De Giorgi slightly earlier in Italy—was that solutions to Hilbert’s equations gave a property on second derivatives (pardoning a negligible set of points with discontinuities or other bad non-differentiable behavior) that was insufficient. The support structure on the other side of the bridge needed some kind of continuity condition on the first partial derivatives. The task was aiming at a condition low enough to prove but high enough to land where needed on the other side. This makes us wonder whether we have similar situations in computational complexity without fully realizing it.
Nirenberg is one of few surviving researchers who worked on the Manhattan Project—in his case, on a part that had been contracted to Canada’s National Research Council in Montreal. Richard Courant’s son Ernst was a co-worker and suggested NYU as a destination for a Master’s, which led to doctoral work and affiliation there. For his PhD thesis he completed an attack by Hermann Weyl on the problem of embedding the 2-sphere equipped with any Riemannian metric having positive Gaussian curvature into R^3 as a convex surface with the standard Euclidean metric, so that paths of corresponding points have the same length under the respective metrics. With Luis Caffarelli and Joseph Kohn he gave what are still considered the most stringent restrictions on possible singularities in solutions to the Navier-Stokes equations. He is acclaimed as the grand master of enduringly useful inequalities, which he said he “loves” more than equations.
The basic definition for a function f to be continuous is that for every x and every ε > 0 there exists δ > 0 such that whenever |x − y| < δ, |f(x) − f(y)| < ε. Actually, a more basic definition is that for every open subset V of the range, the inverse image f^{-1}(V) is open as a subset of the domain, but we are presupposing metrics on the domain and range so that distance is defined for each.
If for every ε there is a δ that works for all x, then f is uniformly continuous. In much of real or complex analysis this is the strongest uniformity condition one needs. It comes for free if the domain is compact. We can postulate a mapping sending each ε to a corresponding δ = δ(ε).
However, this still does not articulate a numerical relationship between ε and δ. It does not guarantee differentiability, much less that the first derivatives be continuous.
The key conditions have the form |f(x) − f(y)| ≤ C·|x − y|^α,
where C and α are real constants. If C < 1 and α = 1 then f is a contracting mapping. Contracting mappings can be called fellow-travelers of Nash’s work. Nash used the Brouwer fixed-point theorem to prove his famous theorem about equilibria in non-zero-sum games, but one can also use a more-general fixed-point theorem together with contracting mappings. Matthias Günther found an alternative proof of Nash’s equally great embedding theorem for Riemannian manifolds into Euclidean space via contracting mappings.
If α = 1 and C is arbitrary, then f satisfies a Lipschitz condition, named for Rudolf Lipschitz. Although Lipschitz conditions are commonly available and frequently useful, they were still too high to aim for this problem. What De Giorgi and Nash accomplished was showing that things could work with any appropriately given or chosen exponent. The criterion with 0 < α < 1 allowed is called Hölder continuity, named for Otto Hölder.
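To make the distinction concrete, here is a small numerical sketch of my own: the square-root function on [0, 1] obeys a Hölder condition with α = 1/2 and C = 1, yet no Lipschitz bound holds near 0.

```python
import numpy as np

# Holder check: |sqrt(x) - sqrt(y)| <= |x - y|^(1/2) on [0, 1].
rng = np.random.default_rng(0)
x, y = rng.random(10_000), rng.random(10_000)
holder_ok = np.abs(np.sqrt(x) - np.sqrt(y)) <= np.abs(x - y) ** 0.5 + 1e-12
assert holder_ok.all()

# Lipschitz fails: the difference quotient near 0 is unbounded.
quotient = (np.sqrt(1e-12) - np.sqrt(0.0)) / 1e-12
assert quotient > 1e5
```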
Hölder continuity turned out to be the key for Hilbert’s 19th. The proof again was like building a bridge. For any instance of Hilbert’s equations, one could take C high enough and find α > 0 to make the inequality work. Given any such α (and C), analyticity could be shown to follow. I wonder if we can solve open problems in computer theory not by attacking them directly but by trying to set up this kind of proof structure.
Applying this idea could be as simple as the condition for saying one complexity class is contained in another. Consider the statement NP ⊆ P,
where the classes are represented by standard computable enumerations N_1, N_2, N_3, … of poly-time NTMs and P_1, P_2, P_3, … of poly-time DTMs, respectively. The enumerations are nondecreasing with respect to code size. In terms of those machines, the statement is: for every i there is a j such that L(N_i) = L(P_j).
We can strengthen this by insisting that j be given by a computable mapping f of i, so that for every i, L(N_i) = L(P_{f(i)}).
Thus a complexity class containment involves a mapping between “spaces” of machines. We can ask what further conditions such a mapping may or must meet. If we start with machines N_i and N_{i′} that are “close” in some respect, how far apart can the machines P_{f(i)} and P_{f(i′)} be?
Because NP has complete sets we get some further properties for free. If NP ⊆ P then there is a k such that L(P_k) = SAT. The mapping f from i executes the reduction from L(N_i) to SAT and composes its output with P_k to get j = f(i) such that L(P_j) = L(N_i).
It is important to note that although inputs of length n given to N_i are expanded by the reduction into formulas of size more than linear in n which are input to P_k, the code of P_j simply embeds that of N_i and P_k and so has size linear in that of N_i. Moreover, if we weaken the mapping condition to say that L(P_{f(i)}) and L(N_i) agree up to a finite set, meaning their symmetric difference is finite, then we can employ Leonid Levin’s universal search algorithm to write the code of P_k in advance. This ensures that the code expansion from N_i to P_j has a reasonable additive constant. In any event, with respect to the ‘metric’ of code size we can deduce a kind of Lipschitz condition: for all i, the code size of P_{f(i)} is at most that of N_i plus a constant.
And with respect to running time, although that of P_k can be astronomical (as we noted in last year’s post), it is still n^c for some fixed c. The running time of P_j does expand inputs into formulas of size Õ(t) when N_i runs in time t (the tilde means ignoring polynomials in log t), which makes an overall runtime of roughly t^c. Rather than use exact runtimes, let us ignore more than just log factors by calling a function t(n) roughly of order n^a if for all ε > 0 and all sufficiently large n, n^{a−ε} ≤ t(n) ≤ n^{a+ε}.
What we’d like to do is take two machines M and M′—deterministic or nondeterministic—that have runtimes roughly of order n^a and n^b, respectively, and define a distance d(M, M′) in terms of a and b. We’d further like to arrange at least that under the hypothesized mapping f, the distance between f(M) and f(M′) is bounded by C·d(M, M′)^α,
perhaps with α < 1. This uses the putative runtime of P_k to create an analogue of a Hölder condition.
If we define the metric simply on the exponents as d(M, M′) = |a − b| then we get a Lipschitz condition. The running times of f(M) and f(M′) become roughly n^{ac} and n^{bc}, so their d-distance is (at most) c·|a − b|. However, we would like to involve quantities like “n^a” and “n^b” or something else that is exponential in a and/or b in the metric. We could try the distance |n^a − n^b|, but then to get even a Hölder condition on the mapping we are seeking constants C and α that make the inequality hold.
This is not valid without further qualification because is possible, among other things. We would be interested to find a reasonable metric based on running-time and/or program size that gives a Hölder but not Lipschitz condition.
Can the containment happen anyway with a Hölder or even Lipschitz condition under a metric like this? It does with an oracle. The construction giving the containment relative to a suitable oracle A
basically maps each oracle NTM to an oracle DTM that simply bundles the translation of the code of the NTM into a formula into the oracle queries, and so has the same polynomial running time as the NTM up to log factors. Hence we can get a Lipschitz property under various distances that use the exponents in running times. This property does not necessarily “relativize,” and it may be interesting to ask what happens if it does.
Perhaps ideas like this can help probe other complexity class relations. When the classes do not have complete sets (or are not known to have them), even getting a computable embedding function can be problematic. That the concepts may be simple is not a block; the point is to find a combination of ideas that are conducive to deep uses. For instance, the maximum principle simply states that solutions to elliptic and parabolic PDEs attain, over any connected open subset, their maximum on the boundary of that subset. A Simons Foundation feature on Nirenberg quotes him as saying,
“I have made a living off the maximum principle.”
Of course this is similar to principles in convex optimization which Nash initially studied.
As with similar ideas we’ve posted, we’re casting about for a new handle on open problems. To quote Sylvia Nasar’s biography A Beautiful Mind on pages 218–221 about Hilbert’s 19th problem:
[Nash] had a theory that difficult problems couldn’t be attacked frontally. He approached the problem in an ingeniously roundabout manner, first transforming the nonlinear equations into linear equations and then attacking these by nonlinear means.
At worst we can pass on advice from De Giorgi, who narrowly beat Nash to the solution with a markedly different proof:
“If you can’t prove your theorem, keep shifting parts of the conclusion to the assumptions, until you can.”
Wikipedia’s bio of De Giorgi cites this from a MathOverflow thread titled “Should one attack hard problems?” that leads with
Nasar finishes her account by addressing the “shock” of De Giorgi’s earlier proof on Nash, quoting Peter Lax that when the two met at Courant in 1957, “it was like Stanley meeting Livingstone.” She puts more blame for Nash’s subsequent troubles, quoting Nash himself, on “his attempt to resolve the contradictions in quantum theory.” Which we have also been guilty of promoting. Oh well.
Can such ideas of continuity, metrics, and more broadly topology help to gain insight about complexity classes?
Sean Carroll is a cosmologist in the Department of Physics at Caltech. He also maintains a blog, “Preposterous Universe,” and writes books promoting the public understanding of science. I have recently been enjoying his 2010 book From Eternity to Here: The Quest for the Ultimate Theory of Time.
Today—yes, Carroll would agree that there is a today—I would like to share an interpretation of a little quantum computing example that occurred to me while reading his book.
As befits a book about how much of physics is time-reversible, I’ve been reading it in reverse order of chapters. From my own scientific knowledge and reading of similar books I figured I knew the run-up material to his conclusions and speculations. Now I’m working backward to fill some gaps in my knowledge and highlight how he supported those speculations I query. I’ve been pleased to find freshness in his coverage even of well-traveled topics, and this extends to his chapter on quantum basics.
In a popular science book one tries to maximize mileage from simple examples but must heed the dictum ascribed to Albert Einstein to “make everything as simple as possible, but not simpler.” I didn’t think that quantum decoherence could be adequately illustrated with just two elements, but the reliable Carroll does so. Since this augments an example that Dick and I use in our quantum algorithms textbook, I thought we’d share it.
Since Carroll does not claim his answer based on papers with Jennifer Chen in 2004-05 does anything more than illustrate how an “ultimate theory” could work, I don’t think it’s “spoiling” to mention it here. His blog’s two regular graphics are the equation defining entropy on Ludwig Boltzmann’s tombstone in Vienna and this poster based on his book:
The similar diagram in his book identifies the waist as a point of lowest entropy for the entire cosmos. ‘Baby universes’ branch off in funnels in both (or one could say all) directions of time. We are in one of them, inflating out from our Big Bang, and while that was a state of incredibly low entropy within our branch, it is but a blip in a grand system that is able to increase its entropy without limit.
I would have liked to see more in the book on how this squares with the “BGV” conditions under which general relativity requires an expanding universe to have a boundary in the past, and how their model relates to the “all-or-nothing argument” covered in our post two years ago on Jim Holt’s book Why Does the World Exist? Why should the lowest entropy have some particular nonzero finite value—that is, why, if the waistline is to be labeled some kind of origin for the whole multiversal shebang? These elements are of course subservient to quantum theory in whichever theory of quantum gravity proves to reconcile best with relativity and observation.
Carroll has addressed much of this on his blog, in particular within this post a year ago on an origins/theology debate. That’s one thing blogs are good for, and he supplies many links including this long critique with much on BGV. But leaving everything aside, let’s think about systems with just two quantum elements that give binary classical values when observed. Carroll calls them “Mr. Dog” and “Miss Kitty,” whereas I’ll use other animals toward further conceptual associations.
In quantum computation, Boolean strings of n bits are coded via n qubits. The qubits are represented via orthogonal basis vectors in the vector space C^N where N = 2^n. The standard quantum indexing scheme enumerates the strings in lexicographical order as x_0 = 00…0, x_1 = 00…01, and so on up to x_{N−1} = 11…1.
Labeling these strings x_0, …, x_{N−1}, the rule is that x_i is encoded by the standard basis vector e_i
with the lone 1 in the i-th position (counting from 0). Any linear combination a_0 e_0 + ⋯ + a_{N−1} e_{N−1} is allowed as a quantum pure state provided |a_0|² + ⋯ + |a_{N−1}|² = 1. Measuring the state entirely then yields the output x_i with probability |a_i|². In particular, measuring e_i yields x_i with certainty.
This scheme telescopes in that for any binary strings x and y, the basis vector for their concatenation xy is given by the tensor product e_x ⊗ e_y. Tensor products can be visualized first in the case of matrices A and B of arbitrary sizes: each entry of A is replaced by that entry times a whole copy of B.
If A is a k × m matrix and B is k′ × m′, then A ⊗ B becomes a kk′ × mm′ matrix, but it could also be regarded as a higher-dimensional object. Two vectors in column form are just the case m = m′ = 1. We can still visualize that the second vector is getting multiplied in block form by entries of the first vector. The indexing scheme kicks off with e_0 = (1, 0)^T and e_1 = (0, 1)^T.
Then we obtain e_{00} = e_0 ⊗ e_0, e_{01} = e_0 ⊗ e_1, e_{10} = e_1 ⊗ e_0, e_{11} = e_1 ⊗ e_1,
and so on for three or more qubits.
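The telescoping rule is easy to check with NumPy’s Kronecker product; `basis` below is my own helper name:

```python
import numpy as np

def basis(bits: str) -> np.ndarray:
    """Standard basis column vector e_x for the binary string x."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

# e_{xy} = e_x (tensor) e_y, via the Kronecker product.
assert np.array_equal(np.kron(basis("1"), basis("0")), basis("10"))
assert np.array_equal(np.kron(basis("01"), basis("1")), basis("011"))
```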
Just as concatenation of strings is a basic operation apart from indexing notation, so too with tensor product. When n = 2m, so that x and y have length m, our representation of the tensor product involves 2^n-many indices. It seems silly to posit a quadratic expansion on top of an exponential explosion just for juxtaposing two m-bit strings that don’t even interact, but the classical notation does this. Since we will limit to two qubits we don’t mind: n = 2, so N = 4. We think of the two qubits as quantum coordinates and the four basis vectors as classical indices for the same system.
Quantum operations on n qubits are represented by N × N matrices that are unitary, meaning that the matrix multiplied by its conjugate transpose gives the identity. The Hadamard matrix H, with rows (1, 1) and (1, −1) all divided by √2,
is not only unitary but also self-adjoint, so HH = I. It works on one qubit, mapping e_0 to (e_0 + e_1)/√2 and mapping e_1 to (e_0 − e_1)/√2.
We can interpret each entry of H as the transition amplitude of going from a column state to a row state. Since we are using column vectors the “from” is a column and the “to” is a row. Then H is like a maze where each column point of entry allows either row as an exit, but the path that enters at 1 and exits at 1 picks up a −1/√2 amplitude along the way.
Think of 0 as coming in “like a lamb” and 1 as “like a lion.” Then the lion-to-lamb transition (with its +1 entry, divided by √2) represents the saying:
March comes in like a lion and goes out like a lamb.
Whereas, the lion-to-lion transition with its −1 entry represents this winter’s actuality of coming in and going out like a lion. Let us call going lion-to-lamb or lamb-to-lion a flip, lamb-to-lamb a flap, and lion-to-lion a flop.
Both inputs e_0 and e_1 encounter amplitude ±1/√2 for a flip, which translates to 50% probability in a measurement. The state (e_0 + e_1)/√2 gives equal amplitude to “lamb” or “lion” as outcomes, hence probability 50% each of observing them. The state (e_0 − e_1)/√2 gives different signs (speaking more generally, different phases) to the outcomes but still the same 50%-50% probabilities.
We might think that following H with another H operation would preserve the even chance of doing a flip, but instead it zeroes it. Multiplying gives HH = I because both off-diagonal entries sum a +1 and a −1 (divided by 2). For input e_0 the +1 comes from the history of a flap then a flip, while the −1 comes from a flip then a flop. The components flap-flip and flip-flop interfere, leaving no way the composed operations can accomplish a flip. If you come in as a lion then the actions flip-flap and flop-flip would individually make you a lamb, but they likewise interfere, leaving no option but to go out as a lion.
Put another way, applying H to (e_0 + e_1)/√2 leaves a deterministic outcome of “lamb.” The coherence of the superposed state makes this possible. Likewise, applying H to (e_0 − e_1)/√2 entails “lion.”
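The flip/flap/flop bookkeeping can be verified numerically; here is a short NumPy sketch:

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
lamb = np.array([1.0, 0.0])   # e_0, "lamb"
lion = np.array([0.0, 1.0])   # e_1, "lion"

# One Hadamard: a 50-50 chance of flipping.
probs = (H @ lamb) ** 2
assert np.allclose(probs, [0.5, 0.5])

# Two Hadamards: the flap-flip and flip-flop histories cancel,
# so the input comes back deterministically.
assert np.allclose(H @ (H @ lamb), lamb)
assert np.allclose(H @ (H @ lion), lion)
```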
Now let’s introduce our second qubit and describe it as a butterfly, with 0 meaning its wings are open and 1 meaning closed. Here is an operation on two qubits, written in the basis order e_{00}, e_{01}, e_{10}, e_{11}:

1 0 0 0
0 1 0 0
0 0 0 1
0 0 1 0
This is a permutation matrix that swaps e_{10} and e_{11} but leaves e_{00} and e_{01} fixed. Viewed as a maze it routes column entrances to row exits with no choice, no meeting of corridors, and so executes a deterministic operation. In our system of lamb/lion and open/closed it says that March being a lamb leaves the butterfly undisturbed, but “lion” makes it shift its wings. So inversion of the second qubit is controlled by the e_1 component of the first qubit, hence the name CNOT for “Controlled NOT.”
Just by dint of juxtaposing the second qubit, we change the brute matrix notation for the Hadamard operation on our first qubit to H ⊗ I, where I is the 2 × 2 identity matrix acting on the butterfly.
Now (H ⊗ I)(H ⊗ I) equals the identity matrix, so we still have interference on the first qubit. But suppose we perform H ⊗ I and then CNOT instead. Viewed as a quantum circuit going left to right it looks like this:
But since we are composing right-to-left on column vectors, the matrix for the combined operation is CNOT·(H ⊗ I).
On argument e_{00} it yields (e_{00} + e_{11})/√2. This state—call it Ψ—cannot be written as a tensor product of two one-qubit states. This means Ψ is entangled. In our terms it means the butterfly’s wings are open if and only if March is like a lamb; in case of lion they are shut. But if we only care about lamb-versus-lion we still have the same amplitudes and 50-50 split that we had with the one-qubit state (e_0 + e_1)/√2. On argument e_{10} the circuit produces the entangled state (e_{00} − e_{11})/√2 instead.
Suppose we forgot about the butterfly and tried the same trick of applying H ⊗ I once more to the entangled state (e_{00} + e_{11})/√2. We might expect to get “lamb” again, but what we get is (e_{00} + e_{01} + e_{10} − e_{11})/2.
This state is also entangled but gives equal probability 1/4 to all four outcomes. To the holder of the first qubit it works the same as a classical coin flip between “lamb” and “lion.” Indeed if we trace out the unseen butterfly, the state yields a mixed state that presents a classical 50-50 distribution on lamb-versus-lion, and applying H to that creates no interference. Indeed, there are no cancellations at all in the entire matrix computation.
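The whole two-qubit computation fits in a few lines of NumPy; the matrices follow the basis order e_{00}, e_{01}, e_{10}, e_{11} used in the text:

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
H1 = np.kron(H, np.eye(2))      # Hadamard on the first qubit only
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

e00 = np.zeros(4); e00[0] = 1.0

# Without the butterfly interaction, the second Hadamard undoes the first.
assert np.allclose(H1 @ (H1 @ e00), e00)

# With CNOT in between, entanglement kills the interference:
# all four outcomes become equally likely.
final = H1 @ (CNOT @ (H1 @ e00))
assert np.allclose(final ** 2, [0.25, 0.25, 0.25, 0.25])
```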
This is diagrammed by the following two-qubit quantum circuit:
This example shows two other fine points. First, the “copy-uncompute” trick (see Fig. 3 here) fails markedly when the value at the control of the CNOT gate is a superposition that is not close to a basis state. The entanglement involved in attempting to copy it via the CNOT destroys the interference needed to “uncompute” the first qubit back to e_0, despite H being its own inverse. Thus decoherence does not involve any “collapse” into classical behavior, but makes it emerge from unseen entanglements.
Second, if we run the circuit on the input state ((e_0 + e_1)/√2) ⊗ e_0 then the first Hadamard gives e_{00}, which is fixed by CNOT, so the final H ⊗ I gives back the input state again. At no time was there any entanglement. Hence the entangling action of a quantum circuit is not invariant under the input being separable nor under choice of basis, though perhaps some algebraic-geometric measure of its entangling “potential” can be.
I had previously pictured ‘decoherence’ in terms of many little entanglements with the environment. What happens if the entanglement with our butterfly is minor? Let b be a possible starting state for the butterfly, and consider a state that is mostly ((e_0 + e_1)/√2) ⊗ b plus an entangling term of small amplitude ε,
with the leading coefficient chosen to normalize. Then applying H ⊗ I no longer restores the first qubit to e_0 exactly.
The probability of observing “lion” on the first qubit is on the order of ε², regardless of b. This practically negligible deviation can be interpreted in two different ways:
To continue the “butterflies” metaphor, pervasive decoherence could entail entanglement with millions of them. The question is whether this would collectively drive the deviation up beyond what is compatible with experimental results. Perhaps it wouldn’t; I’m not conversant enough with the models to guess. But I have a fresh angle that was brought again to mind by an association Carroll makes between information and the holographic principle (page 281):
[It] seems to be telling us that some things just can’t happen, that the information needed to encode the world is dramatically compressible.
If our environment is highly compressible, would this manifest as pervasive entanglement? Here are two naive thoughts. The first is that the maximally entangled state (e_{00} + e_{11})/√2 subtracts one bit of information from a two-bit system, likewise the related state (e_{00} − e_{11})/√2. The naiveness is that the local computation H ⊗ I, as we have seen, restores having equal-weight outcomes in the standard basis—so the idea is not local-invariant and prefers a basis. Yet it is possible that cross-cutting entanglements could embody a large information deficit in any (natural) basis. The second naive thought is that such entangled states might contribute even amounts to certain statistical measures in large-block fashion, so as to skew expectations of (non-)uniformity that assume full independence.
Neither thought is new—related ones have been raised in connection with cosmological problems and theories discussed in Carroll’s book, though apparently without traction as yet. What my “Digital Butterflies” post brings is a digital arena in which the information content of the environment can be varied and its dynamical effects studied. The environment stems from about 50,000 bits that initialize a tabulation hashing scheme employed by all major chess programs. Many programs generate the 50,000 bits pseudorandomly, though one could just drop them in from a truly random source since they are fixed. Perhaps the hash-table dependent effects observed in the post can inform these issues, and also work toward the “grail” of developing a practical distinguisher of general pseudorandomness.
How far can quantum computation help understand issues in cosmology?
Update 5/17/15: This 4/28 article in Quanta magazine by Jennifer Ouellette—Carroll’s wife—covers the “MERA” proposal and others attempting to derive space-time from entanglements. We have not had time to study this.
Faadosly Polir is the older brother of Lofa Polir. You may recall he invented new ways to apply powerful mathematical techniques to prove trivial theorems, and she once claimed a great result on integer factoring. We have heard from both since, but they haven’t given us any new April Fool’s Day material, mainly because they weren’t fooling to begin with.
Today Ken and I wished to help you enjoy April Fool’s day.
We usually try to have fun on this day, but this year we have no idea that seems funny. We considered fooling that some university had created a faculty line for a computer mathematician. We’ve considered making light of some recent data on theory and rankings, but it’s high noon of hiring season, hence too serious. Chess cheating jokes?—no, Ken has been dealing with more cases at once than ever before. Even our recent shout-out to co-pilots has been overcome by events beyond our ability to give other than condolences. Nor were we able to create a phony result to celebrate this day—because good phony results still require much research effort. We thought about re-running an old April Fool discussion, as something easy while lots of other stuff is happening. But we are committed to giving you, our readers, your money’s worth. So we rejected the re-run idea.
April Fool’s Day itself may have begun because of a typographical error. Reading between the lines of Wikipedia’s account, Geoffrey Chaucer may have intended to write “Since March be gone, thirty days and two” in order to place the events of “The Nun’s Priest’s Tale” on the May 2 anniversary of a royal engagement. The text in surviving copies of his Canterbury Tales reads, however, “Since March began, thirty days and two”—placing on April 1 the action in which the proud rooster Chanticleer is tricked by a fox.
So since we ran out of ideas we decided to give up on April fooling and list results that sound like April Fool jokes, but are actually true. We hope this is still fun, interesting, and informative.
Smale’s Paradox:
Stephen Smale proved in 1958 that there is a regular homotopy between the standard immersion f of the sphere in 3-space and its negative −f, which represents the sphere being turned inside-out. He proved this indirectly, by showing that there was only one homotopy class for a category that includes f and −f, by virtue of both causing the corresponding homotopy group in the Stiefel manifold to vanish. When faced with this particular example, Smale’s graduate adviser, Raoul Bott, retorted that the result was “obviously wrong.” The resulting eversion of the sphere—a process of turning it inside out through a continuous family of differentiable immersions (allowing self-intersection but not creasing)—was first visualized concretely by Arnold Shapiro in consultation with others including Bernard Morin, who has been blind since childhood. An animation is included in Vladimir Bulatov’s gallery of geometrical VRML movies.
Nikodym Set:
A Nikodym set, named for Otto Nikodym who powerfully extended a theorem by Johann Radon in measure theory, is a set of points in the unit square that has full measure 1, yet such that through every one of its points there passes a straight line meeting the set in that point alone.
This is not like the Banach-Tarski paradox involving tricks with non-measurable sets: everything works nicely according to Hoyle or at least according to Henri Lebesgue.
Needle Sets:
A needle embedded in a circular disk of diameter 1 can be rotated continuously and snugly by 180 degrees within the disk. The same needle can also be inverted by rotating it inside a deltoid curve whose three points poke somewhat outside the circle but which has smaller overall area. How small can a shape allowing the needle to rotate be? Soichi Kakeya raised the question and Abram Besicovitch answered it by showing that its Lebesgue measure can be as close to zero as desired. Moreover, if we only require that the needle can be placed pointing in any direction, without the continuous rotation, the measure can be zero. Then it supplements the Nikodym set. Zeev Dvir has written a nice survey connecting conjectured properties of analogous sets in higher dimensions to constructions over finite fields that relate to randomness extractors.
Potato Paradox:
Let’s move from topology and measure to simple counting. Suppose we have a 100-pound sack of hydrated potatoes that is 99% water. We let the sack dry out until it is 98% water. How much does it weigh now? The answer surprisingly is not 99 pounds but rather 50. The simplest of various reasonings given here is that the 1% non-water of 100 pounds has stayed the same while becoming 2% of something, so that something must be 50 pounds.
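The whole computation is two lines of exact arithmetic; a Python sketch:

```python
from fractions import Fraction

# The 1 pound of solids is conserved; afterwards it is 2% of the total.
non_water = 100 * (1 - Fraction(99, 100))        # 1 pound
final_weight = non_water / (1 - Fraction(98, 100))
assert non_water == 1
assert final_weight == 50
```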
Polynomial Time is Big:
For all real numbers $r \geq 1$ the complexity classes $\mathsf{DTIME}[n^r]$ are all distinct owing to the Time Hierarchy Theorem: whenever $r < s$ there is a language in $\mathsf{DTIME}[n^s] \setminus \mathsf{DTIME}[n^r]$.
Moreover the classes nest, so $\mathsf{DTIME}[n^r] \subsetneq \mathsf{DTIME}[n^s]$ whenever $r < s$. Since there are uncountably many real numbers $r$, this seems to suggest there must be uncountably many languages involved. But $\mathsf{P}$ contains only countably many languages.
What is the answer? The answer is that over a hundred years before people defined complexity classes we had this kind of ‘paradox’ just in the number line. Define $S_r$ to be the set of rational numbers between $0$ and $r$. There are only countably many rational numbers, and yet $S_r \neq S_s$ whenever $r < s$, since any rational $q$ with $r < q < s$ lies in $S_s$ but not in $S_r$. Uncountably many distinct sets can be drawn from countably many elements.
Probabilistic Polynomial Time Goes Hyper:
For any real number $r$, $0 < r < 1$, we can also define $\mathsf{PP}_r$ to be the class of languages $L$ such that for some relation $R(x,y)$ decidable in time polynomial in $|x|$, $x \in L \iff \Pr_y[R(x,y)] \geq r$,
where $y$ ranges over $\{0,1\}^{p(|x|)}$ for some polynomial $p$. The standard complexity class $\mathsf{PP}$ uses $r = \frac{1}{2}$. It is not difficult to show that for any rational $r$, indeed any $r$ for which the first $m$ places in binary are computable in $\mathrm{poly}(m)$ time, $\mathsf{PP}_r = \mathsf{PP}$. However, when $r$ is uncomputable, $\mathsf{PP}_r$ contains undecidable languages. Indeed, take $R$ to be simply the relation $y < x$ (comparing strings as binary numbers), take $p(n) = n$, and take $L$ to be the “standard right cut” of $r$: $L = \{x : 0.x > r\}$, where $0.x$ denotes the string $x$ read as a binary fraction.
Then $L \in \mathsf{PP}_r$, but $L$ is undecidable. The paradox is, why should the complexity class $\mathsf{PP}_r$ stay equal to the nice class $\mathsf{PP}$ on a dense set of thresholds and yet suddenly jump up in power to include “hyper-computable” languages when $r$ infinitesimally passes through an uncomputable value?
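To see concretely how an uncomputable threshold smuggles in undecidability, here is a toy sketch (with the rational $r = 1/3$ standing in for an uncomputable real): when $R(x,y)$ is “$y < x$ as binary numbers,” the acceptance probability of $x$ is exactly $0.x$, so membership queries against the threshold read off the binary expansion of $r$ bit by bit.

```python
from fractions import Fraction

def accept_prob(x: str) -> Fraction:
    # With y uniform over {0,1}^|x| and R(x,y) = "y < x as binary numbers",
    # exactly int(x, 2) of the 2^|x| strings y satisfy R.
    return Fraction(int(x, 2), 2 ** len(x))

def in_right_cut(x: str, r: Fraction) -> bool:
    # Membership in L = { x : 0.x > r }: deciding it means comparing to r.
    return accept_prob(x) > r

assert accept_prob("11") == Fraction(3, 4)

# Membership queries reveal the binary expansion of r, one bit at a time.
r = Fraction(1, 3)   # stand-in for an uncomputable threshold
bits = ""
for _ in range(8):
    # If 0.(bits)1 already exceeds r, the next bit of r must be 0.
    bits += "0" if in_right_cut(bits + "1", r) else "1"
print(bits)  # "01010101", the binary expansion of 1/3 = 0.010101...
```

If $r$ were uncomputable, this loop would still be well-defined mathematically, but no machine could decide the comparisons, which is exactly how $L$ escapes decidability.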
From math and complexity we go on to physics.
Mpemba Effect:
Erasto Mpemba of Tanzania, while a secondary school student in 1963, observed that ice cream mixes froze faster when they had been initially heated, compared to mixes that had been kept in cold storage before freezing. This effect is still under debate even at the level of basic controlled evidence. It may be the simplest natural physical phenomenon for which the human race, despite splitting the atom and finding the Higgs boson, has been unable to devise a convincing controlled experiment, let alone a verified explanation. Even subjective matters such as whether tea tastes different depending on whether the milk or the tea has been added to the cup first have been verified by rigorous experiments.
Table-top Dark Energy:
This compact device enables extracting the ubiquitous tension energy of space in order to power homes in regions with too-frequent cloud cover for reliable solar energy. It departs from previously patented approaches based on the Casimir effect and improves a previous design for a room-temperature power device from 1989 by Martin Fleischmann and Stanley Pons. “This is after all almost 70% of the entire mass-energy content of the Universe, so we should be able to harness it,” said a spokesperson. Indeed according to equations of physics the device should have many times the yield of a standard home gas furnace, and it was no surprise when Faadosly and Lofa told us they’d spent their year investing in it.
Have a fun and safe April Fool’s Day.
How to tell algorithms apart
Edgar Daylight was trained both as a computer scientist and as a historian. He writes a historical blog themed for his near-namesake Edsger Dijkstra, titled “Dijkstra’s Rallying Cry for Generalization.” He is a co-author with Don Knuth of the 2014 book Algorithmic Barriers Failing: P=NP?, which consists of a series of interviews with Knuth, extending their first book from 2013.
Today I wish to talk about this book, focusing on one aspect.
The book is essentially a conversation between Knuth and Daylight that ranges over Knuth’s many contributions and his many insights.
One of the most revealing discussions, in my opinion, is Knuth’s discussion of his view of asymptotic analysis. Let’s turn and look at that next.
We all know what asymptotic analysis is: Given an algorithm, determine how many operations the algorithm uses in the worst case. For example, the naïve matrix product of square $n \times n$ matrices runs in time $O(n^3)$. Knuth dislikes the use of $O$-notation, which he thinks is often used to hide important information.
For example, the correct count of arithmetic operations for the naïve matrix product is actually $2n^3 - n^2$: each of the $n^2$ output entries takes $n$ multiplications and $n - 1$ additions.
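A quick self-check of this count (a sketch that tallies one scalar multiplication and one scalar addition per step of the schoolbook algorithm):

```python
def naive_matmul_ops(n: int) -> int:
    # Count scalar operations in the schoolbook n x n matrix product:
    # each of the n^2 output entries takes n multiplications
    # and n - 1 additions to sum the products.
    ops = 0
    for i in range(n):
        for j in range(n):
            ops += n        # multiplications a[i][k] * b[k][j]
            ops += n - 1    # additions combining the n products
    return ops

for n in (1, 2, 5, 10):
    assert naive_matmul_ops(n) == 2 * n**3 - n**2
print(naive_matmul_ops(10))  # 1900
```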
In general Knuth suggests that we determine, if possible, the number of operations as $f(n) + O(g(n))$,
where $f$ and $g$ are both explicit functions and $g$ is lower-order. The idea is that not only does this indicate more precisely that the number of operations is $\Theta(f(n))$, not just $O(f(n))$, but also it forces us to give the exact constant instead of hiding it under the $O$. If the constant is only approached as $n$ increases, perhaps the difference can be hidden inside the lower-order term.
An example from the book (page 29) is a discussion of Tony Hoare’s quicksort algorithm. Its running time is $O(n \log n)$, on average. This allows one, as Knuth says, to throw all the details away, including the exact machine model. He goes on to say that he prefers to know:
that quicksort makes $2n \ln n + O(n)$ comparisons, on average, and correspondingly exact counts of exchanges and stack adjustments, when sorting $n$ random numbers.
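As a rough experiment (not Knuth’s machine-level accounting, just a toy quicksort counting element-to-pivot comparisons), the leading $2n \ln n$ term can be seen emerging empirically:

```python
import math
import random

def quicksort_comparisons(arr):
    # Count element-to-pivot comparisons in a plain quicksort
    # (first element as pivot; the input is randomly ordered).
    comparisons = 0
    stack = [arr]
    while stack:
        a = stack.pop()
        if len(a) <= 1:
            continue
        pivot, rest = a[0], a[1:]
        comparisons += len(rest)
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons

random.seed(2015)
n = 4000
avg = sum(quicksort_comparisons(random.sample(range(n), n))
          for _ in range(5)) / 5
ratio = avg / (2 * n * math.log(n))
print(round(ratio, 2))  # somewhat below 1: the -Theta(n) second-order term shows
```

The ratio sits noticeably below 1 at this scale, which is precisely Knuth’s point: the linear correction term carries real information that plain $O(n \log n)$ throws away.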
Theorists create algorithms as one of their favorite activities. A classic way to get a paper accepted into a top conference is to say: In this paper we improve the running time of the best known algorithm for problem X from order $f(n)$ to $g(n)$ by applying methods Y.
But is the algorithm of this paper really new? One possibility is that the analysis of the previous paper was too coarse and the algorithms are actually the same. Or at least equivalent. The above information is logically insufficient to rule out this possibility.
Asymptotic analysis à la Knuth comes to the rescue. Suppose that we had proved that the older algorithm for X runs in time $c \cdot f(n)\log n$ plus lower-order terms, with the constant $c$ explicit, while the new algorithm runs in time $c' \cdot f(n)$ plus lower-order terms.
Then we would be able to conclude—without any doubt—that the new algorithm was indeed new. Knuth points this out in the interviews, and adds a comment about practice. Of course losing the logarithmic factor may not yield a better running time in practice, if the constant $c'$ is huge. But whatever the constant is, the new algorithm must be new. It must contain some new idea.
This is quite a nice use of analysis of algorithms in my opinion. Knowing that an algorithm contains, for certain, some new idea, may lead to further insights. It may eventually even lead to an algorithm that is better both in theory and in practice.
Daylight’s book is a delight—a pun? As always Knuth has lots to say, and lots of interesting insights. The one caveat about the book is the subtitle: “P=NP?” I wish Knuth had added more comments about this great problem. He does comment on the early history of the problem: for example, explaining how Dick Karp came down to Stanford to talk about his brilliant new paper, and other comments have been preserved in a “Twenty Questions” session from last May. Knuth also reminds us in the book that as reported in the January 1973 issue of SIGACT News, Manny Blum gave odds of 100:1 in a bet with Mike Paterson that P and NP are not equal.
Neil L. is a Leprechaun. He has visited me every St. Patrick’s Day since I began the blog in 2009. In fact he visited me every St. Patrick’s Day before then, but I never talked about him. Sometimes he comes after midnight the night before, or falls asleep on my sofa waiting for me to rise. But this time there was no sign of him as I came back from a long day of teaching and meetings and went out again for errands.
Today Ken and I wish you all a Happy St. Patrick’s Day, and I am glad to report that Neil did find me.
When I came back I was sorting papers and didn’t see him. I didn’t know he was there until I heard,
Top o’ the evening to ye.
Neil continued as he puffed out some green smoke: “I had some trouble finding you this year. Finally got where you were—good friends at your mobile provider helped me out.” I was surprised, and told him he must be kidding. He answered, “Of course I always can find you, just having some fun wi’ ye.” Yes I agreed and added that I was staying elsewhere. He puffed again and said “yes I understand.”
I said I had a challenge for him, a tough challenge, and asked if he was up for it. He said, “Hmmm, I do not owe you any wishes, but a challenge… Yes I will accept a challenge from ye, any challenge that ye can dream up.” He laughed, and added, “we leprechauns have not lost a challenge to a man for centuries. I did have a cousin once who messed up.”
I asked if he would share his cousin’s story, and he nodded yes. “‘Tis a sad story. My cousin was made a fool of once, a terrible black mark on our family. Why, we were restricted from any St. Patrick’s Day fun for a hundred years. Too long a punishment in our opinion—the usual is only a few decades. Do ye want to know what my cousin did? Or just move on to the challenge? My time is valuable.”
I nodded sympathetically, so he carried on.
“One fine October day in Dublin me cousin was sitting under a bridge—under the lower arch where a canalside path went.
“He spied a gent walking with his wife along the path but lost in thought and completely ignoring her. He thought the chap would be a great mark for a trick but forgot the woman. She spied him and locked on him with laser eyes and of course he was caught—he could not run unless she looked away.
“He tried to ply her with a gold coin but she knew her leprechaun lore and was ruthless. He resigned himself to granting wishes but she would not have that either. With her stare still fixed she took off her right glove, plucked a shamrock, and laid both at his feet for a challenge. A woman had never thrown a challenge before, and there was not in the lore a provision for return-challenging a woman. So my cousin had to accept her challenge. It came with intense eyes:
“I challenge you to tell the answer to what is vexing and estranging my husband.”
“Aye,” Neil sighed, “you or I or any lad in the face of such female determination would be reduced to gibberish, and that is what me cousin blurted out: $i^2 = j^2 = k^2 = ijk = -1$.
“The gent looked up like the scales had fallen from his eyes, and he embraced his wife. This broke the stare, and my cousin vanished in great relief. And did the gent show his gratitude? Nay—he even carved that line on the bridge but gave no credit to my cousin.”
I clucked in sympathy, and Neil seemed to like that. He put down his pipe and gave me a look that seemed to return comradeship. Then I understood who the “cousin” was. Not waiting to register my understanding, he invited my challenge as a peer.
I had in fact prepared my challenge last night—it was programmed by a student in my advanced graduate course using a big-integer package. Burned onto a DVD was a Blum integer of one trillion bits. I pulled it out of its sleeve and challenged Neil to factor it. The shiny side flashed a rainbow, and I joked there could really be a pot of gold at the end of it.
Neil took one puff and pushed the DVD—I couldn’t tell how—into my MacBook Air. The screen flashed green and before I could say “Jack Robinson” my FileZilla window opened. Neil blew mirthful puffs as the progress bar crawled across. A few minutes later came e-mail back from my student, “Yes.”
I exclaimed, “Ha—you did it—but the point isn’t that you did it. The point is, it’s doable. You proved that factoring is easy. Could be quantum or classical but whatever—it’s practical.”
Neil puffed and laughed as he handed me back the suddenly-reappeared disk and said, “Aye, do ye really think I would let your lot fool me twice?”
I replied, “Fool what? You did it—that proves it.”
“Nay,” he said, “indeed I did it—I cannot lie—but ye can’t know how I did it enough to tell whether a non-leprechaun can do it. And a computer that ye build—be it quantum or classical or whatever—is a non-leprechaun.”
It hit me that a quantum computer that cannot be built is a leprechaun, and perhaps Peter Shor’s factoring algorithm only runs on those. But I wasn’t going to be distracted away from my victory.
“How can it matter whether a leprechaun does it?” Neil retorted that he didn’t have to answer a further challenge, “it’s not like having three wishes, you know.” But he continued, “since ye are a friend, I will tell ye three ways it could be, and you can choose one ye like but know ye: it could still be a fourth way.
“And I left ye a factor, but your student already had it, so I left ye no net knowledge at all.” And with a puff of smoke, he was gone.
Did I learn anything from the one-time factoring of my number? Happy St. Patrick’s Day anyway.