John Nash and Louis Nirenberg have jointly won the 2015 Abel Prize for their work on partial differential equations (PDEs). They did not write any joint papers, but Nirenberg evidently got Nash excited about David Hilbert’s 19th problem during Nash’s frequent visits to New York University’s Courant Institute in the mid-1950s. Nash in return stimulated Nirenberg by his verbal approach of barraging a problem with off-kilter ideas. The Norwegian Academy of Sciences and Letters recognized their ‘great influence on each other’ in its prize announcement.
Today we congratulate both men on their joint achievement.
Hilbert’s 19th problem asked whether all solutions to certain partial differential equations must be analytic functions—that is, expressible as power series on local neighborhoods. Enough progress had been made since the 1930s that the remaining task could be likened to building a short road bridge without central stanchions or much room below for support. If you just shoot the road across it is hard to maintain the level needed to reach the other side. But if you aim up then you create more room to make an arch for best support.
The level known before Nash went to work—and Ennio De Giorgi slightly earlier in Italy—was that solutions to Hilbert’s equations gave a property on second derivatives (pardoning a negligible set of points with discontinuities or other bad non-differentiable behavior) that was insufficient. The support structure on the other side of the bridge needed some kind of continuity condition on the first partial derivatives. The task was aiming at a condition low enough to prove but high enough to land where needed on the other side. This makes us wonder whether we have similar situations in computational complexity without fully realizing it.
Nirenberg is one of few surviving researchers who worked on the Manhattan Project, in his case on a part that had been contracted to Canada’s National Research Council in Montreal. Richard Courant’s son Ernst was a co-worker and suggested NYU as a destination for a Master’s, which led to doctoral work and affiliation there. For his PhD thesis he completed an attack by Hermann Weyl on the problem of embedding the 2-sphere equipped with any Riemannian metric having positive Gaussian curvature into $\mathbb{R}^3$ as a convex surface with the standard Euclidean metric, so that paths of corresponding points have the same length under the respective metrics. With Louis Caffarelli and Joseph Kohn he gave what are still considered the most stringent restrictions on possible singularities in solutions to the Navier-Stokes equations. He is acclaimed as the grand master of enduringly useful inequalities, which he said he “loves” more than equations.
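The basic definition for a function $f: X \to Y$ to be continuous is that for every $x \in X$ and $\epsilon > 0$ there exists $\delta > 0$ such that whenever $d(x, x') < \delta$, $d(f(x), f(x')) < \epsilon$. Actually, a more basic definition is that for every open subset $U$ of $Y$, $f^{-1}(U)$ is open as a subset of $X$, but we are presupposing metrics on $X$ and $Y$ so that $d(\cdot,\cdot)$ is defined for each.
If for every $\epsilon > 0$ there is a $\delta > 0$ that works for all $x$, then $f$ is uniformly continuous. In much of real or complex analysis this is the strongest uniformity condition one needs. It comes for free if $X$ is compact. We can postulate a mapping $\mu$ sending $\epsilon$ to $\delta$:
$$d(x, x') < \mu(\epsilon) \implies d(f(x), f(x')) < \epsilon \quad \text{for all } x, x' \in X \text{ and } \epsilon > 0.$$
However, this still does not articulate a numerical relationship between $d(x, x')$ and $d(f(x), f(x'))$. It does not guarantee differentiability, much less that the first derivatives be continuous.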
The key conditions have the form
$$d(f(x), f(x')) \le C \cdot d(x, x')^{\alpha},$$
where $C$ and $\alpha$ are real constants. If $C < 1$ and $\alpha = 1$ then $f$ is a contracting mapping. Contracting mappings can be called fellow-travelers of Nash’s work. Nash used the Brouwer fixed-point theorem to prove his famous theorem about equilibria in non-zero-sum games, but one can also use a more-general fixed-point theorem together with contracting mappings. Matthias Günther found an alternative proof of Nash’s equally great embedding theorem for Riemannian manifolds into Euclidean space via contracting mappings.
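As a hedged illustration of why contracting mappings pair so naturally with fixed points, here is a minimal Python sketch of Banach's contraction principle. This is a different fixed-point theorem from the Brouwer theorem that Nash used. The choice of $\cos$ on $[0,1]$ is our own example. It maps $[0,1]$ into $[\cos 1, 1] \subset [0,1]$ and satisfies the key condition with $\alpha = 1$ and $C = \sin 1 < 1$, so plain iteration converges to its unique fixed point (about 0.739085).

```python
import math
from typing import Callable


def banach_fixed_point(f: Callable[[float], float], x0: float,
                       tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Iterate x_{k+1} = f(x_k) until successive iterates differ by < tol.

    For a contracting mapping (C < 1, alpha = 1) the iterates form a
    Cauchy sequence, so they converge to the unique fixed point.
    """
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence; f may not be a contraction")


# cos maps [0, 1] into itself and |cos x - cos y| <= sin(1) * |x - y| there.
fixed = banach_fixed_point(math.cos, 0.5)
print(f"fixed point of cos: {fixed:.9f}")
```

The stopping rule only looks at successive iterates. For a contraction with constant $C$, the true error is at most $C/(1-C)$ times the last step, so a small step does certify closeness.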
If $\alpha = 1$ and $C$ is arbitrary, then $f$ satisfies a Lipschitz condition, named for Rudolf Lipschitz. Although Lipschitz conditions are commonly available and frequently useful, they were still too high to aim for in this problem. What De Giorgi and Nash accomplished was showing that things could work with any appropriately given or chosen $\alpha > 0$. The criterion with $0 < \alpha < 1$ allowed is called Hölder continuity, named for Otto Hölder.
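A small numerical sketch of the gap between Lipschitz and Hölder uses an example of our own, not one from the post. The function $f(x) = \sqrt{x}$ on $[0,1]$ satisfies the key condition with $C = 1$ and $\alpha = 1/2$. No constant $C$ works with $\alpha = 1$, because the slope blows up near $0$. The helper `holder_ratio` is hypothetical. Sampled ratios can only suggest, not prove, which condition holds.

```python
import math
from itertools import combinations


def holder_ratio(f, points, alpha):
    """Largest |f(x) - f(y)| / |x - y|**alpha over all pairs of sample points.

    A value that stays bounded as the sample is refined is evidence for
    |f(x) - f(y)| <= C |x - y|**alpha; it is not a proof.
    """
    best = 0.0
    for x, y in combinations(points, 2):
        if x != y:
            best = max(best, abs(f(x) - f(y)) / abs(x - y) ** alpha)
    return best


# Sample [0, 1] ever more finely near 0, where sqrt is steepest.
points = [0.0] + [2.0 ** -k for k in range(40)]
print("alpha = 1/2:", holder_ratio(math.sqrt, points, 0.5))  # stays about 1
print("alpha = 1:  ", holder_ratio(math.sqrt, points, 1.0))  # blows up
```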
Hölder continuity turned out to be the key for Hilbert’s 19th. The proof again was like building a bridge. For any instance of Hilbert’s equations, one could take $C$ high enough and find $\alpha > 0$ to make the inequality work. Given any such $\alpha$ (and $C$), analyticity could be shown to follow. I wonder if we can solve open problems in computer theory not by attacking them directly but by trying to set up this kind of proof structure.
Applying this idea could be as simple as the condition for saying one complexity class $\mathcal{C}$ is contained in another class $\mathcal{D}$. Consider the statement
$$\mathsf{NP} \subseteq \mathsf{P},$$
where the classes are represented by standard computable enumerations $[N_i]$ of poly-time NTMs and $[P_k]$ of poly-time DTMs, respectively. The enumerations are nondecreasing with respect to code size. In terms of those machines, the statement is:
$$(\forall i)(\exists k)\; L(N_i) = L(P_k).$$
We can strengthen this by insisting that $k$ be given by a computable mapping $f$ of $i$:
$$(\exists \text{ computable } f)(\forall i)\; L(N_i) = L(P_{f(i)}).$$
Thus a complexity class containment involves a mapping between “spaces” of machines. We can ask what further conditions such a mapping may or must meet. If we start with machines $N_i$ and $N_j$ that are “close” in some respect, how far apart can the machines $P_{f(i)}$ and $P_{f(j)}$ be?
Because $\mathsf{NP}$ has complete sets we get some further properties for free. If $\mathsf{NP} \subseteq \mathsf{P}$ then there is a $k$ such that $L(P_k) = \mathrm{SAT}$. Given $i$, the mapping $f$ builds a machine that executes the reduction from $L(N_i)$ to $\mathrm{SAT}$ and composes its output with $P_k$, giving $P_{f(i)}$ such that $L(P_{f(i)}) = L(N_i)$.
It is important to note that although inputs of length $n$ given to $N_i$ are expanded by the reduction into formulas of size more than linear in $n$ which are input to $P_k$, the code of $P_{f(i)}$ simply embeds that of $N_i$ and $P_k$ and so has size linear in $|N_i|$. Moreover, if we weaken the mapping condition to say
$$L(P_{f(i)}) =^* L(N_i),$$
where $A =^* B$ means that the symmetric difference $A \,\triangle\, B$ is finite, then we can employ Leonid Levin’s universal search algorithm to write the code of $P_k$ in advance. This ensures that the code expansion from $N_i$ to $P_{f(i)}$ has a reasonable additive constant. In any event, with respect to the ‘metric’ of code size we can deduce a kind of Lipschitz condition: for all $i$,
$$|P_{f(i)}| \le |N_i| + c$$
for some fixed constant $c$.
And with respect to running time, although that of can be astronomical (as we noted in last year’s post), it is still for some fixed . The running time of does expand inputs into formulas of size (the tilde means ignoring polynomials in ), which makes an overall runtime of . Rather than use “” for exact runtimes, let us ignore more than just factors by defining a function to be if for all ,
What we’d like to do is take two machines and (deterministic or nondeterministic) that have runtimes in and , respectively, and define a distance in terms of and . We’d further like to arrange at least that under the hypothesized mapping ,
perhaps with . This uses the putative runtime of to create an analogue of a Hölder condition.
If we define the metric simply on the exponents as then we get a Lipschitz condition. The running times of and become and , so their -distance is (at most) . However, we would like to involve quantities like “” and “” or something else that is exponential in and/or in the metric. We could try , but then to get even a Hölder condition on the mapping we are seeking such that
This is not valid without further qualification because is possible, among other things. We would be interested to find a reasonable metric based on running-time and/or program size that gives a Hölder but not Lipschitz condition.
Can happen anyway with a Hölder or even Lipschitz condition under a metric like ? It does with an oracle. The construction giving
basically maps each oracle NTM to an oracle DTM that simply bundles the translation of the code of into a formula into the oracle queries, and so has the same polynomial running time as up to log factors. Hence we can get a Lipschitz property under various distances that use the exponents in running times . This property does not necessarily “relativize,” and it may be interesting to ask what happens if it does.
Perhaps ideas like this can help probe other complexity class relations. When the classes do not have complete sets (or are not known to have them), even getting a computable embedding function can be problematic. That the concepts may be simple is not a block; the point is to find a combination of ideas that are conducive to deep uses. For instance, the maximum principle simply states that solutions to elliptic and parabolic PDEs attain their maximum over a bounded connected open set on the boundary of that set. A Simons Foundation feature on Nirenberg quotes him as saying,
“I have made a living off the maximum principle.”
Of course this is similar to principles in convex optimization which Nash initially studied.
As with similar ideas we’ve posted, we’re casting about for a new handle on open problems. To quote Sylvia Nasar’s biography A Beautiful Mind on pages 218–221 about Hilbert’s 19th problem:
[Nash] had a theory that difficult problems couldn’t be attacked frontally. He approached the problem in an ingeniously roundabout manner, first transforming the nonlinear equations into linear equations and then attacking these by nonlinear means.
At worst we can pass on advice from De Giorgi, who narrowly beat Nash to the solution with a markedly different proof:
“If you can’t prove your theorem, keep shifting parts of the conclusion to the assumptions, until you can.”
Wikipedia’s bio of De Giorgi cites this from a MathOverflow thread titled “Should one attack hard problems?” that leads with
Nasar finishes her account by addressing the “shock” of De Giorgi’s earlier proof on Nash, quoting Peter Lax that when the two met at Courant in 1957, “it was like Stanley meeting Livingstone.” She puts more blame for Nash’s subsequent troubles, quoting Nash himself, on “his attempt to resolve the contradictions in quantum theory.” Which we have also been guilty of promoting. Oh well.
Can such ideas of continuity, metrics, and more broadly topology help to gain insight about complexity classes?
[typo fix: oracle NTM to oracle DTM, P_j –> P_k consistently, some other word changes]
Sean Carroll is a cosmologist in the Department of Physics at Caltech. He also maintains a blog, “Preposterous Universe,” and writes books promoting the public understanding of science. I have recently been enjoying his 2010 book From Eternity to Here: The Quest for the Ultimate Theory of Time.
Today—yes, Carroll would agree that there is a today—I would like to share an interpretation of a little quantum computing example that occurred to me while reading his book.
As befits a book about how much of physics is time-reversible, I’ve been reading it in reverse order of chapters. From my own scientific knowledge and reading of similar books I figured I knew the run-up material to his conclusions and speculations. Now I’m working backward to fill some gaps in my knowledge and highlight how he supported those speculations I query. I’ve been pleased to find freshness in his coverage even of well-traveled topics, and this extends to his chapter on quantum basics.
In a popular science book one tries to maximize mileage from simple examples but must heed the dictum ascribed to Albert Einstein to “make everything as simple as possible, but not simpler.” I didn’t think that quantum decoherence could be adequately illustrated with just two elements, but the reliable Carroll does so. Since this augments an example that Dick and I use in our quantum algorithms textbook, I thought we’d share it.
Since Carroll does not claim his answer based on papers with Jennifer Chen in 2004-05 does anything more than illustrate how an “ultimate theory” could work, I don’t think it’s “spoiling” to mention it here. His blog’s two regular graphics are the equation defining entropy on Ludwig Boltzmann’s tombstone in Vienna and this poster based on his book:
The similar diagram in his book identifies the waist as a point of lowest entropy for the entire cosmos. ‘Baby universes’ branch off in funnels in both (or one could say all) directions of time. We are in one of them, inflating out from our Big Bang, and while that was a state of incredibly low entropy within our branch, it is but a blip in a grand system that is able to increase its entropy without limit.
I would have liked to see more in the book on how this squares with the “BGV” conditions under which general relativity requires an expanding universe to have a boundary in the past, and how their model relates to the “all-or-nothing argument” covered in our post two years ago on Jim Holt’s book Why Does the World Exist? Why should the lowest entropy have some particular nonzero finite value, if the waistline is to be labeled some kind of origin for the whole multiversal shebang? These elements are of course subservient to quantum theory in whichever theory of quantum gravity proves to reconcile best with relativity and observation.
Carroll has addressed much of this on his blog, in particular within this post a year ago on an origins/theology debate. That’s one thing blogs are good for, and he supplies many links including this long critique with much on BGV. But leaving everything aside, let’s think about systems with just two quantum elements that give binary classical values when observed. Carroll calls them “Mr. Dog” and “Miss Kitty,” whereas I’ll use other animals toward further conceptual associations.
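In quantum computation, Boolean strings of $n$ bits are coded via $n$ qubits. The qubits are represented via orthogonal basis vectors in the vector space $\mathbb{C}^N$ where $N = 2^n$. The standard quantum indexing scheme enumerates $\{0,1\}^n$ in lexicographical order as
$$0^n,\; 0^{n-1}1,\; 0^{n-2}10,\; \dots,\; 1^n.$$
Labeling these strings $x_0, x_1, \dots, x_{N-1}$, the rule is that $x_k$ is encoded by the standard basis vector
$$e_k = (0, \dots, 0, 1, 0, \dots, 0)^T$$
with the lone $1$ in the $k$-th position (counting from $0$). Any linear combination $a = \sum_k a_k e_k$ is allowed as a quantum pure state provided $\sum_k |a_k|^2 = 1$. Measuring $a$ entirely then yields the output $x_k$ with probability $|a_k|^2$. In particular, measuring $e_k$ yields $x_k$ with certainty.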
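Here is a minimal Python sketch of this encoding. It uses plain lists of real amplitudes, since complex phases are not needed for the examples in this post. The names `basis_vector` and `outcome_probabilities` are our own hypothetical helpers, not any library's API. The key observation is that a string's lexicographic index is simply its value read in binary.

```python
import math


def basis_vector(bits: str) -> list[float]:
    """Standard basis vector e_k encoding the n-bit string x_k.

    In lexicographic order the index k of a bit string is its value as a
    binary number, so "0...0" -> e_0 and "1...1" -> e_{N-1}.
    """
    vec = [0.0] * (2 ** len(bits))
    vec[int(bits, 2)] = 1.0
    return vec


def outcome_probabilities(state: list[float]) -> dict[str, float]:
    """Map each n-bit outcome x_k to |a_k|^2 for a normalized pure state."""
    n = (len(state) - 1).bit_length()
    if not math.isclose(sum(abs(a) ** 2 for a in state), 1.0):
        raise ValueError("amplitudes must satisfy sum |a_k|^2 = 1")
    return {format(k, f"0{n}b"): abs(a) ** 2 for k, a in enumerate(state)}


print(basis_vector("10"))  # [0.0, 0.0, 1.0, 0.0]
s = 1 / math.sqrt(2)
print(outcome_probabilities([s, 0.0, 0.0, s]))  # about 0.5 on "00" and "11"
```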
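This scheme telescopes in that for any binary strings $x$ and $y$, the basis vector $e_{xy}$ for their concatenation is given by the tensor product $e_x \otimes e_y$. Tensor product can be visualized first in the case of matrices $A$ and $B$ of arbitrary sizes. If $A = [a_{ij}]$ is an $m \times r$ matrix, then
$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1r}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mr}B \end{bmatrix}.$$
If $B$ is a $p \times q$ matrix, then $A \otimes B$ becomes an $mp \times rq$ matrix, but it could also be regarded as a higher-dimensional object. Two vectors in column form are just the case $r = q = 1$. We can still visualize that the second vector is getting multiplied in block form by entries of the first vector. The indexing scheme kicks off with:
$$e_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad e_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
Then we obtain:
$$e_{00} = e_0 \otimes e_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad e_{01} = e_0 \otimes e_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad e_{10} = e_1 \otimes e_0 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad e_{11} = e_1 \otimes e_1 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix},$$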
and so on for , , and .
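The block-form tensor product can be sketched on plain Python lists of rows. The helper `kron` is our own, named after the usual Kronecker product, and is not any particular library's function. The sketch checks that tensoring the basis vectors for two strings gives the basis vector for their concatenation.

```python
def kron(A, B):
    """Tensor (Kronecker) product of matrices given as lists of rows.

    Entry (i*p + k, j*q + l) of A (x) B is A[i][j] * B[k][l]: each entry
    of A multiplies a full copy of B, in block form.
    """
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def column(vec):
    """Turn a flat vector into a column matrix (one entry per row)."""
    return [[v] for v in vec]


def flat(col):
    """Turn a column matrix back into a flat vector."""
    return [row[0] for row in col]


e0, e1 = [1, 0], [0, 1]
e10 = flat(kron(column(e1), column(e0)))
print(e10)  # [0, 0, 1, 0], the basis vector for the string "10"
```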
Just as concatenation of strings is a basic operation apart from indexing notation, so too with tensor product. When , so that and have length , our representation of the tensor product involves -many indices. It seems silly to posit a quadratic expansion on top of an exponential explosion just for juxtaposing two -bit strings that don’t even interact, but the classical notation does this. Since we will limit to two qubits we don’t mind: , so . We think of as quantum coordinates and as classical indices for the same system.
Quantum operations on $n$ qubits are represented by $N \times N$ matrices $U$ that are unitary, meaning that $U$ multiplied by its conjugate transpose $U^*$ gives the identity. The Hadamard matrix
$$H = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
is not only unitary but also self-adjoint: $H^* = H$. It works on one qubit and maps $e_0$ to $\frac{1}{\sqrt{2}}(e_0 + e_1)$, which we abbreviate as $e_+$, and maps $e_1$ to $\frac{1}{\sqrt{2}}(e_0 - e_1)$, or $e_-$ for short.
We can interpret each entry $U[i,j]$ as the transition amplitude of going from $x_j$ to $x_i$. Since we are using column vectors the “from” is a column and the “to” is a row. Then $H$ is like a maze where each column point of entry allows either row as an exit, but the path that enters at $1$ and exits at $1$ picks up a $-1$ amplitude along the way.
Think of $0$ as coming in “like a lamb” and $1$ as “like a lion.” Then the upper-right entry of $H$, which is $1$ (divided by $\sqrt{2}$) and leads from $1$ to $0$, represents the saying:
March comes in like a lion and goes out like a lamb.
Whereas, the lower-right entry $-1$ represents this winter’s actuality of coming in and going out like a lion. Let us call going lion-to-lamb or lamb-to-lion a flip, lamb-to-lamb a flap, and lion-to-lion a flop.
Both inputs $e_0$ and $e_1$ encounter amplitude $\frac{1}{\sqrt{2}}$ for a flip, which translates to 50% probability in a measurement. The state $e_+$ gives equal amplitude to “lamb” or “lion” as outcomes, hence probability 50% each of observing them. The state $e_-$ gives different signs (speaking more generally, different phases) to the outcomes but still the same 50%-50% probabilities.
We might think that following $H$ with another $H$ operation would preserve the even chance of doing a flip, but instead it zeroes it. Multiplying $H \cdot H$ gives the identity $I$ because both off-diagonal entries sum a $+1$ and a $-1$ (divided by $2$). For the lamb-to-lion entry the $+1$ comes from the history of a flap then a flip, while the $-1$ comes from a flip then a flop. The components flap-flip and flip-flop interfere, leaving no way the composed operations can accomplish a flip. If you come in as a lion then the actions flip-flap and flop-flip would individually make you a lamb, but they likewise interfere, leaving no option but to go out as a lion.
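The maze picture can be checked directly by listing the two-step histories and their amplitudes. The sketch below assumes the post's convention that a matrix entry `U[to][from]` is the amplitude for moving from a column to a row. The names `histories`, `MOVE` and `matmul` are our own. Summing the histories for an entry reproduces that entry of the matrix product.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]  # H[to][from]: column = entry point, row = exit

# Labels for one step of the maze, keyed by (from, to): 0 = lamb, 1 = lion.
MOVE = {(0, 0): "flap", (1, 1): "flop", (0, 1): "flip", (1, 0): "flip"}


def histories(U, V, start, end):
    """Amplitude of each two-step history start -> mid -> end for V.U
    (apply U first, then V). Their sum is the entry (V.U)[end][start]."""
    return [(MOVE[(start, mid)] + "-" + MOVE[(mid, end)],
             U[mid][start] * V[end][mid])
            for mid in range(len(U))]


def matmul(A, B):
    """Plain matrix product of lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Lamb (0) in, lion (1) out: flap-flip gives +1/2, flip-flop gives -1/2.
for name, amp in histories(H, H, start=0, end=1):
    print(name, round(amp, 6))
print(matmul(H, H))  # the identity, up to floating-point rounding
```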
Put another way, applying $H$ to $e_+$ leaves a deterministic outcome of “lamb.” The coherence of the superposed state makes this possible. Likewise, applying $H$ to $e_-$ entails “lion.”
Now let’s introduce our second qubit and describe it as a butterfly, with $0$ meaning its wings are open and $1$ meaning closed. Here is an operation on two qubits:
$$\mathrm{CNOT} = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{bmatrix}$$
This is a permutation matrix that swaps $e_{10}$ and $e_{11}$ but leaves $e_{00}$ and $e_{01}$ fixed. Viewed as a maze it routes column entrances to row exits with no choice, no meeting of corridors, and so executes a deterministic operation. In our system of lamb/lion and open/closed it says that March being a lamb leaves the butterfly undisturbed, but “lion” makes it shift its wings. So inversion is controlled by the $1$ component of the first qubit, hence the name CNOT for “Controlled NOT.”
Just by dint of juxtaposing the second qubit, we change the brute matrix notation for the Hadamard operation on our first qubit to:
$$H \otimes I = \frac{1}{\sqrt{2}}\begin{bmatrix} 1&0&1&0 \\ 0&1&0&1 \\ 1&0&-1&0 \\ 0&1&0&-1 \end{bmatrix}$$
Now $(H \otimes I)(H \otimes I)$ equals the $4 \times 4$ identity matrix, so we still have interference on the first qubit. But suppose we perform $H \otimes I$ and then CNOT instead. Viewed as a quantum circuit going left to right it looks like this:
But since we are composing right-to-left on column vectors, the matrix for the combined operation is
$$\mathrm{CNOT} \cdot (H \otimes I) = \frac{1}{\sqrt{2}}\begin{bmatrix} 1&0&1&0 \\ 0&1&0&1 \\ 0&1&0&-1 \\ 1&0&-1&0 \end{bmatrix}$$
On argument $e_{00}$ it yields $\frac{1}{\sqrt{2}}(e_{00} + e_{11})$. This state (call it $\Phi$) cannot be written as a tensor product of two one-qubit states. This means $\Phi$ is entangled. In our terms it means the butterfly’s wings are open if and only if March is like a lamb; in case of lion they are shut. But if we only care about lamb-versus-lion we still have the same amplitudes and 50-50 split that we had with the one-qubit state $e_+$. On argument $e_{01}$ the circuit entangles $e_{01}$ and $e_{10}$ instead.
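A minimal sketch of this circuit on plain lists follows. It also includes a test for entanglement, which is our addition and not something from the post. A two-qubit pure state $a_{00}e_{00} + a_{01}e_{01} + a_{10}e_{10} + a_{11}e_{11}$ is a tensor product of one-qubit states exactly when the $2 \times 2$ coefficient matrix $\begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix}$ has rank 1, that is, determinant zero. The helper names are our own.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]  # swaps e_10 and e_11, fixes e_00 and e_01


def kron(A, B):
    """Tensor product of matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matvec(M, v):
    """Apply matrix M to column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def is_entangled(state, tol=1e-12):
    """True iff the 2x2 coefficient matrix [[a00, a01], [a10, a11]] has
    nonzero determinant, i.e. the state is not a tensor product."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) > tol


HI = kron(H, I2)                                # Hadamard on the first qubit
phi = matvec(CNOT, matvec(HI, [1, 0, 0, 0]))    # H on qubit 1, then CNOT
print(phi, is_entangled(phi))                   # (e00 + e11)/sqrt 2, entangled
print(is_entangled(matvec(HI, [1, 0, 0, 0])))   # e_+ (x) e_0 alone is separable
```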
Suppose we forgot about the butterfly and tried the same trick of applying $H$ once more to $\Phi$. We might expect to get “lamb” again, but what we get is:
$$(H \otimes I)\,\Phi = \tfrac{1}{2}\left(e_{00} + e_{01} + e_{10} - e_{11}\right).$$
This state is also entangled but gives equal probability to all outcomes. To the holder of the first qubit it works the same as a classical coin flip between “lamb” and “lion.” Indeed if we trace out the unseen butterfly, $\Phi$ yields a mixed state that presents a classical distribution on $\{0, 1\}$, and applying $H$ to that creates no interference. In fact, there are no cancellations at all in the entire matrix computation:
$$(H \otimes I) \cdot \mathrm{CNOT} \cdot (H \otimes I) = \frac{1}{2}\begin{bmatrix} 1&1&1&-1 \\ 1&1&-1&1 \\ 1&-1&1&1 \\ -1&1&1&1 \end{bmatrix}$$
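The "trace out the butterfly" step can be made concrete with a short sketch of our own. It multiplies out the whole circuit, computes the first qubit's reduced density matrix $\rho_{ij} = \sum_b a_{ib}\,a_{jb}$ for the state $(e_{00} + e_{11})/\sqrt{2}$ (real amplitudes assumed), and then applies $H$ to it. The reduced state is the maximally mixed $\frac{1}{2}I$, which $H$ leaves unchanged. This is why no interference survives. The helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]


def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def reduced_first_qubit(state):
    """Density matrix of qubit 1 after tracing out qubit 2 (real amplitudes):
    rho[i][j] = sum_b a[i b] * a[j b], where a[i b] = state[2*i + b]."""
    return [[sum(state[2 * i + b] * state[2 * j + b] for b in range(2))
             for j in range(2)] for i in range(2)]


HI = kron(H, I2)
circuit = matmul(HI, matmul(CNOT, HI))    # rightmost factor acts first
print([[round(x, 6) for x in row] for row in circuit])  # every entry is +-1/2

phi = [S, 0.0, 0.0, S]                    # (e00 + e11)/sqrt 2
rho = reduced_first_qubit(phi)            # maximally mixed: diag(1/2, 1/2)
after_h = matmul(H, matmul(rho, H))       # H is real and self-adjoint
print(rho, after_h)                       # H changes nothing: no interference
```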
This is diagrammed by the following two-qubit quantum circuit:
This example shows two other fine points. First, the “copy-uncompute” trick (see Fig. 3 here) fails markedly when the value at the control of the CNOT gate is a superposition that is not close to a basis state. The entanglement involved in attempting to copy it via the CNOT destroys the interference needed to “uncompute” the first qubit back to $e_0$, despite $H$ being its own inverse. Thus decoherence does not involve any “collapse” into classical behavior, but makes it emerge from unseen entanglements.
Second, if we run the circuit on the input state then we get which is fixed by CNOT, so the final gives back again. At no time was there any entanglement. Hence the entangling action of a quantum circuit is not invariant under the input being separable nor under choice of basis, though perhaps some algebraic-geometric measure of its entangling “potential” can be.
I had previously pictured ‘decoherence’ in terms of many little entanglements with the environment. What happens if the entanglement with our butterfly is minor? Let be a possible starting state for the butterfly, and consider the state
with chosen to normalize. Then
The probability of observing (“lion”) on the first qubit is just , regardless of . This practically negligible deviation can be interpreted in two different ways:
To continue the “butterflies” metaphor, pervasive could entail entanglement with millions of them. The question is whether this would collectively drive up beyond what is compatible with experimental results. Perhaps it wouldn’t; I’m not conversant enough with the models to guess. But I have a fresh angle that was brought again to mind by an association Carroll makes between information and the holographic principle (page 281):
[It] seems to be telling us that some things just can’t happen, that the information needed to encode the world is dramatically compressible.
If our environment is highly compressible, would this manifest as pervasive entanglement? Here are two naive thoughts. The first is that the maximally entangled state subtracts one bit of information from a two-bit system, likewise the related state . The naiveness is that the local computation , as we have seen, restores having equal-weight outcomes in the standard basis—so the idea is not local-invariant and prefers a basis. Yet it is possible that cross-cutting -entanglements could embody a large information deficit in any (natural) basis. The second naive thought is that states like might contribute even amounts to certain statistical measures in large-block fashion, so as to skew expectations of (non-)uniformity that assume full independence.
Neither thought is new—related ones have been raised in connection with cosmological problems and theories discussed in Carroll’s book, though apparently without traction as yet. What my “Digital Butterflies” post brings is a digital arena in which the information content of the environment can be varied and its dynamical effects studied. The environment stems from about 50,000 bits that initialize a tabulation hashing scheme employed by all major chess programs. Many programs generate the 50,000 bits pseudorandomly, though one could just drop them in from a truly random source since they are fixed. Perhaps the hash-table dependent effects observed in the post can inform these issues, and also work toward the “grail” of developing a practical distinguisher of general pseudorandomness.
How far can quantum computation help understand issues in cosmology?
[added point to end of “Decoherence” section; two minor typo fixes]
Faadosly Polir is the older brother of Lofa Polir. You may recall he invented new ways to apply powerful mathematical techniques to prove trivial theorems, and she once claimed a great result on integer factoring. We have heard from both since, but they haven’t given us any new April Fool’s Day material, mainly because they weren’t fooling to begin with.
Today Ken and I wish to help you enjoy April Fool’s Day.
We usually try to have fun on this day, but this year we have no idea that seems to be funny. We considered fooling that some university had created a faculty line for a computer mathematician. We’ve considered making light of some recent data on theory and rankings, but it’s high noon of hiring season hence serious. Chess cheating jokes?—no, Ken has been dealing with more cases at once than ever before. Even our recent shout-out to co-pilots has been overcome by events beyond our ability to give other than condolences. Nor were we able to create a phony result to celebrate this day—because good phony results still require much research effort. We thought about re-running an old April Fool discussion, as something easy while lots of other stuff is happening. But we are committed to give you, our readers, your money’s worth. So we rejected the re-run idea.
April Fool’s Day itself may have begun because of a typographical error. Reading between the lines of Wikipedia’s account, Geoffrey Chaucer may have intended to write “Since March be gone, thirty days and two” in order to place the events of “The Nun’s Priest’s Tale” on the May 2 anniversary of a royal engagement. The text in surviving copies of his Canterbury Tales reads, however, “Since March began, thirty days and two”—placing on April 1 the action in which the proud rooster Chanticleer is tricked by a fox.
So since we ran out of ideas we decided to give up on April fooling and list results that sound like April Fool jokes, but are actually true. We hope this is still fun, interesting, and informative.
Smale’s Paradox:
Stephen Smale proved in 1958 that there is a regular homotopy between the standard immersion $f$ of the sphere in 3-space and $-f$, which represents the sphere being turned inside-out. He proved this indirectly, by showing that there was only one homotopy class for a category that includes $f$ and $-f$, by virtue of both causing the corresponding homotopy group in the Stiefel manifold to vanish. When faced with this particular example, Smale’s graduate adviser, Raoul Bott, retorted that the result was “obviously wrong.” The resulting eversion of the sphere (a process of turning it inside out differentiably and continuously, allowing self-intersection but not creasing) was first visualized concretely by Arthur Shapiro in consultation with others including Bernard Morin, who has been blind since childhood. An animation is included in Vladimir Bulatov’s gallery of geometrical VRML movies.
Nikodym Set:
A Nikodym set, named for Otto Nikodym who powerfully extended a theorem by Johan Radon in measure theory, is a set of points in the unit square such that:
This is not like the Banach-Tarski paradox involving tricks with non-measurable sets: everything works nicely according to Hoyle or at least according to Henri Lebesgue.
Needle Sets:
A needle embedded in a circular disk of diameter 1 can be rotated continuously and snugly by 180 degrees within the disk. The same needle can also be inverted by rotating it inside a deltoid curve whose three points poke somewhat outside the circle but has smaller overall area. How small can a shape allowing the needle to rotate be? Soichi Kakeya raised the question and Abram Besicovitch answered it by showing that its Lebesgue measure can be as close to zero as desired. Moreover, if we only require that the needle can be placed pointing in any direction, without the continuous rotation, the measure can be zero. Then it supplements the Nikodym set. Zeev Dvir has written a nice survey connecting conjectured properties of analogous sets in higher dimensions to constructions over finite fields that relate to randomness extractors.
Potato Paradox:
Let’s move from topology and measure to simple counting. Suppose we have a 100-pound sack of hydrated potatoes that is 99% water. We let the sack dry out until it is 98% water. How much does it weigh now? The answer surprisingly is not 99 pounds but rather 50 pounds. The simplest of various reasonings given here is that the 1% non-water of 100 pounds has stayed the same while becoming 2% of something, so that something must be 50 pounds.
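The conserved-solids argument amounts to two lines of arithmetic. Here it is as a tiny sketch with exact fractions, under the stated assumption that only water evaporates. The function name `dried_weight` is our own.

```python
from fractions import Fraction


def dried_weight(total, water_before, water_after):
    """Weight after drying, assuming only water evaporates.

    The non-water mass (1 - water_before) * total is conserved and must
    make up the fraction (1 - water_after) of the new total.
    """
    solids = (1 - water_before) * total
    return solids / (1 - water_after)


w = dried_weight(Fraction(100), Fraction(99, 100), Fraction(98, 100))
print(w)  # 50
```

Exact `Fraction` arithmetic avoids floating-point noise in the two subtractions from 1.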
Polynomial Time is Big:
For all real numbers $c \ge 1$ the complexity classes $\mathsf{DTIME}[n^c]$ are all distinct owing to the Time Hierarchy Theorem: whenever $c < d$ there is a language
$$L \in \mathsf{DTIME}[n^d] \setminus \mathsf{DTIME}[n^c].$$
Moreover the classes nest, so $\mathsf{DTIME}[n^c] \subsetneq \mathsf{DTIME}[n^d] \subseteq \mathsf{P}$ whenever $c < d$. Since there are uncountably many real numbers $c$, this seems to suggest there must be uncountably many languages involved. But $\mathsf{P}$ contains only countably many languages.
What is the answer? The answer is that over a hundred years before people defined complexity classes we had this kind of ‘paradox’ just in the number line. Define $Q_c$ to be the set of rational numbers between $0$ and $c$. There are only countably many rational numbers, and yet $Q_c \subsetneq Q_d$ whenever $c < d$.
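The number-line resolution rests on one fact: every interval $(c, d)$ with $c < d$ contains a rational, so the set of rationals below a bound grows strictly as the bound grows. A minimal constructive sketch follows, with our own helper name. Choose $q > 1/(d - c)$ and $p = \lfloor cq \rfloor + 1$. Then $c < p/q \le c + 1/q < d$. The endpoints are taken as exact fractions, which is an assumption for illustration.

```python
import math
from fractions import Fraction


def rational_between(c: Fraction, d: Fraction) -> Fraction:
    """Return a rational r with c < r < d (requires c < d).

    Pick q with 1/q < d - c, then p = floor(c*q) + 1 gives
    c < p/q <= c + 1/q < d.
    """
    if not c < d:
        raise ValueError("need c < d")
    q = math.floor(1 / (d - c)) + 1
    p = math.floor(c * q) + 1
    return Fraction(p, q)


# Two very close endpoints (the float nearest sqrt 2, converted exactly to
# a fraction, and that value plus 1e-9); a rational still fits between them.
c = Fraction(math.sqrt(2))
d = c + Fraction(1, 10 ** 9)
r = rational_between(c, d)
print(r, c < r < d)
```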
Probabilistic Polynomial Time Goes Hyper:
For any real number $p \in (0,1)$ we can also define $\mathsf{PP}_p$ to be the class of languages $L$ such that for some relation $R$ decidable in time polynomial in $|x|$,
$$x \in L \iff \Pr_y[R(x,y)] > p,$$
where $y$ ranges over $\{0,1\}^{q(|x|)}$ for some polynomial $q$. The standard complexity class $\mathsf{PP}$ uses $p = 1/2$. It is not difficult to show that for any rational $p$, indeed any $p$ whose first $n$ places in binary are computable in $n^{O(1)}$ time, $\mathsf{PP}_p = \mathsf{PP}$. However, when $p$ is uncomputable, $\mathsf{PP}_p$ contains undecidable languages. Indeed, take $R(x,y)$ to be simply the relation $y < x$ (comparing $n$-bit strings as binary numbers), take $q(n) = n$, and take $L_p$ to be the “standard right cut” of $p$:
$$L_p = \{x : 0.x > p\}.$$
Then $L_p \in \mathsf{PP}_p$, but $L_p$ is undecidable. The paradox is, why should the complexity class $\mathsf{PP}_p$ stay equal to the nice class $\mathsf{PP}$ on a dense subset of $(0,1)$ and yet suddenly jump up in power to include “hyper-computable” languages when $p$ infinitesimally passes through an uncomputable value?
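A brute-force sketch shows why this kind of construction works. The function names are our own. Take the relation "$y < x$" on $n$-bit strings read as binary numbers, with $y$ uniform over $\{0,1\}^n$. Exactly $\mathrm{val}(x)$ of the $2^n$ strings satisfy it, so the acceptance probability equals the binary fraction $0.x$. Membership at threshold $p$ therefore reduces to the comparison $0.x > p$. For a rational stand-in $p$, as used below, that comparison is trivially decidable. The undecidability arises only when $p$ itself is uncomputable, which no program can exhibit.

```python
from fractions import Fraction
from itertools import product


def R(x: str, y: str) -> bool:
    """The relation: y < x when both n-bit strings are read in binary."""
    return int(y, 2) < int(x, 2)


def acceptance_probability(x: str) -> Fraction:
    """Pr over uniform y in {0,1}^n that R(x, y) holds, by brute force."""
    n = len(x)
    hits = sum(R(x, "".join(y)) for y in product("01", repeat=n))
    return Fraction(hits, 2 ** n)


def binary_value(x: str) -> Fraction:
    """The number 0.x, reading x as binary digits after the point."""
    return Fraction(int(x, 2), 2 ** len(x))


# Pr_y[R(x, y)] equals 0.x exactly, so x is accepted at threshold p iff 0.x > p.
p = Fraction(5, 13)  # a computable stand-in; the paradox needs an uncomputable p
for x in ["0101", "0110", "1000"]:
    print(x, acceptance_probability(x) == binary_value(x), binary_value(x) > p)
```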
From math and complexity we go on to physics.
Mpemba Effect:
Erasto Mpemba of Tanzania, while a secondary school student in 1963, observed that ice cream mixes froze faster when they had been initially heated, compared to mixes that had been kept in cold storage before freezing. This effect is still under debate even at the level of basic controlled evidence. This must be the simplest natural physical phenomenon for which the human race, despite splitting the atom and finding the Higgs Boson, has been unable to devise a convincing closed-form experiment, let alone verify. Even subjective matters such as whether tea tastes different depending on whether milk or the tea has been added to the cup first have been verified by rigorous experiments.
Table-top Dark Energy:
This compact device enables extracting the ubiquitous tension energy of space in order to power homes in regions with too frequent cloud-cover for reliable solar energy. It departs from previously patented approaches based on the Casimir effect and improves a previous design for a room-temperature power device in 1989 by Martin Fleischmann and Stanley Pons. “This is after all almost 70% of the entire mass-energy content of the Universe, so we should be able to harness it,” said a spokesperson. Indeed according to equations of physics the device should have about times the yield of a standard home gas furnace, and it was no surprise when Faadosly and Lofa told us they’d spent their year investing in it.
Have a fun and safe April Fool’s Day.
How to tell algorithms apart
Edgar Daylight was trained both as a computer scientist and as a historian. He writes a historical blog themed for his near-namesake Edsger Dijkstra, titled, “Dijkstra’s Rallying Cry for Generalization.” He is a co-author with Don Knuth of the 2014 book: Algorithmic Barriers Failing: P=NP?, which consists of a series of interviews of Knuth, extending their first book in 2013.
Today I wish to talk about this book, focusing on one aspect.
The book is essentially a conversation between Knuth and Daylight that ranges over Knuth’s many contributions and his many insights.
One of the most revealing discussions, in my opinion, is Knuth’s discussion of his view of asymptotic analysis. Let’s turn and look at that next.
We all know what asymptotic analysis is: Given an algorithm, determine how many operations the algorithm uses in worst case. For example, the naïve matrix product of square $n$ by $n$ matrices runs in time $O(n^3)$. Knuth dislikes the use of $O$ notation, which he thinks is used often to hide important information.
For example, the correct count of operations for matrix product is actually $n^3$ multiplications and $n^3 - n^2$ additions, that is, $2n^3 - n^2$ operations in all.
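In the spirit of Knuth's exact counts, here is a small instrumented sketch of our own. It multiplies two $n \times n$ matrices the schoolbook way and tallies the scalar operations. It assumes each entry is formed as a sum of $n$ products, which costs $n$ multiplications and $n - 1$ additions. Over the $n^2$ entries that gives $n^3$ multiplications and $n^3 - n^2$ additions.

```python
def schoolbook_product(A, B):
    """Multiply square matrices the naive way, counting scalar
    multiplications and additions. Returns (C, mults, adds)."""
    n = len(A)
    mults = adds = 0
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = A[i][0] * B[0][j]      # first product starts the sum
            mults += 1
            for k in range(1, n):          # remaining n - 1 products
                total = total + A[i][k] * B[k][j]
                mults += 1
                adds += 1
            C[i][j] = total
    return C, mults, adds


C, mults, adds = schoolbook_product([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, mults, adds)  # [[19, 22], [43, 50]] 8 4
```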
In general Knuth suggests that we determine, if possible, the number of operations as
where and are both explicit functions and is lower-order. The idea is that not only does this indicate more precisely that the number of operations is , not just , but also is forces us to give the exact constant hiding under the . If the constant is only approached as increases, perhaps the difference can be hidden inside the lower-order term.
An example from the book (page 29) is a discussion of Tony Hoare’s quicksort algorithm. Its running time is $O(n \log n)$, on average. This allows one, as Knuth says, to throw all the details away, including the exact machine model. He goes on to say that he prefers to know:
that quicksort makes comparisons, on average, and exchanges, and stack adjustments, when sorting random numbers.
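Knuth's own constants depend on the exact program he analyzes, and we do not reproduce them here. As a hedged illustration of the style only, the sketch below uses the classical textbook variant in which partitioning compares the pivot against each of the other $n-1$ keys. For random distinct keys, its expected comparison count $C_n$ satisfies $C_n = n - 1 + \frac{2}{n}\sum_{k<n} C_k$, whose exact solution is $2(n+1)H_n - 4n$, where $H_n$ is the $n$th harmonic number. The function names are our own.

```c
#include <assert.h>

/* Expected comparisons C(n) of the textbook quicksort that compares the pivot
   with each of the other n-1 keys, on a random permutation of n distinct keys:
   C(0) = 0,  C(n) = (n - 1) + (2/n) * (C(0) + ... + C(n-1)). */
double quicksort_comparisons(int n)
{
    assert(n >= 0);
    double c = 0.0;       /* C(k-1) at the top of each iteration */
    double prefix = 0.0;  /* C(0) + ... + C(k-2) at the top of each iteration */
    for (int k = 1; k <= n; k++) {
        prefix += c;                       /* now C(0) + ... + C(k-1) */
        c = (k - 1) + 2.0 * prefix / k;    /* now C(k) */
    }
    return c;
}

/* Closed form of the same quantity: 2(n+1)H_n - 4n, H_n the harmonic number. */
double quicksort_closed_form(int n)
{
    assert(n >= 0);
    double h = 0.0;
    for (int k = 1; k <= n; k++)
        h += 1.0 / k;
    return 2.0 * (n + 1) * h - 4.0 * n;
}
```

Since $H_n = \ln n + O(1)$, this comes to $2n\ln n + O(n)$ comparisons. The explicit leading constant is exactly what an $O(n\log n)$ statement throws away.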
Theorists create algorithms as one of their favorite activities. A classic way to get a paper accepted into a top conference is to say: In this paper we improve the running time of the best known algorithm for X from order to by applying methods Y.
But is the algorithm of this paper really new? One possibility is that the analysis of the previous paper was too coarse and the algorithms are actually the same. Or at least equivalent. The above information is logically insufficient to rule out this possibility.
Asymptotic analysis à-la Knuth comes to the rescue. Suppose that we proved that the older algorithm X ran in time
Then we would be able to conclude—without any doubt—that the new algorithm was indeed new. Knuth points this out in the interviews, and adds a comment about practice. Of course losing the logarithmic factor may not yield a better running time in practice, if the hidden constant in is huge. But whatever the constant is, the new algorithm must be new. It must contain some new idea.
This is quite a nice use of analysis of algorithms in my opinion. Knowing that an algorithm contains, for certain, some new idea, may lead to further insights. It may eventually even lead to an algorithm that is better both in theory and in practice.
Daylight’s book is a delight—a pun? As always Knuth has lots to say, and lots of interesting insights. The one caveat about the book is the subtitle: “P=NP?” I wish Knuth had added more comments about this great problem. He does comment on the early history of the problem: for example, explaining how Dick Karp came down to Stanford to talk about his brilliant new paper, and other comments have been preserved in a “Twenty Questions” session from last May. Knuth also reminds us in the book that as reported in the January 1973 issue of SIGACT News, Manny Blum gave odds of 100:1 in a bet with Mike Paterson that P and NP are not equal.
[fixed picture glitch at top]
Neil L. is a Leprechaun. He has visited me every St. Patrick’s Day since I began the blog in 2009. In fact he visited me every St. Patrick’s Day before then, but I never talked about him. Sometimes he comes after midnight the night before, or falls asleep on my sofa waiting for me to rise. But this time there was no sign of him as I came back from a long day of teaching and meetings and went out again for errands.
Today Ken and I wish you all a Happy St. Patrick’s Day, and I am glad to report that Neil did find me.
When I came back I was sorting papers and didn’t see him. I didn’t know he was there until I heard,
Top o’ the evening to ye.
Neil continued as he puffed out some green smoke: “I had some trouble finding you this year. Finally got where you were—good friends at your mobile provider helped me out.” I was surprised, and told him he must be kidding. He answered, “Of course I always can find you, just having some fun wi’ ye.” Yes I agreed and added that I was staying elsewhere. He puffed again and said “yes I understand.”
I said I had a challenge for him, a tough challenge, and asked if he was up for it. He said, “Hmmm, I do not owe you any wishes, but a challenge… Yes I will accept a challenge from ye, any challenge that ye can dream up.” He laughed, and added, “we leprechauns have not lost a challenge to a man for centuries. I did have a cousin once who messed up.”
I asked if he would share his cousin’s story, and he nodded yes. “‘Tis a sad story. My cousin was made a fool of once, a terrible black mark on our family. Why, we were restricted from any St Patrick Day fun for a hundred years. Too long a punishment in our opinion—the usual is only a few decades. Do ye want to know what my cousin did? Or just move on to the challenge? My time is valuable.”
I nodded sympathetically, so he carried on.
“One fine October day in Dublin me cousin was sitting under a bridge—under the lower arch where a canalside path went.
“He spied a gent walking with his wife along the path but lost in thought and completely ignoring her. He thought the chap would be a great mark for a trick but forgot the woman. She spied him and locked on him with laser eyes and of course he was caught—he could not run unless she looked away.
“He tried to ply her with a gold coin but she knew her leprechaun lore and was ruthless. He resigned himself to granting wishes but she would not have that either. With her stare still fixed she took off her right glove, plucked a shamrock, and lay both at his feet for a challenge. A woman had never thrown a challenge before, and there was not in the lore a provision for return-challenging a woman. So my cousin had to accept her challenge. It came with intense eyes:
“I challenge you to tell the answer to what is vexing and estranging my husband.”
“Aye,” Neil sighed, “you or I or any lad in the face of such female determination would be reduced to gibberish, and that is what me cousin blurted out:
$i^2 = j^2 = k^2 = ijk = -1$
“The gent looked up like the scales had fallen from his eyes, and he embraced his wife. This broke the stare, and my cousin vanished in great relief. And did the gent show his gratitude? Nay—he even carved that line on the bridge but gave no credit to my cousin.”
I clucked in sympathy, and Neil seemed to like that. He put down his pipe and gave me a look that seemed to return comradeship. Then I understood who the “cousin” was. Not waiting to register my understanding, he invited my challenge as a peer.
I had in fact prepared my challenge last night—it was programmed by a student in my graduate advanced course using a big-integer package. Burned onto a DVD was a Blum integer of one trillion bits. I pulled it out of its sleeve and challenged Neil to factor it. The shiny side flashed a rainbow, and I joked there could really be a pot of gold at the end of it.
Neil took one puff and pushed the DVD—I couldn’t tell how—into my MacBook Air. The screen flashed green and before I could say “Jack Robinson” my FileZilla window opened. Neil blew mirthful puffs as the progress bar crawled across. A few minutes later came e-mail back from my student, “Yes.”
I exclaimed, “Ha—you did it—but the point isn’t that you did it. The point is, it’s doable. You proved that factoring is easy. Could be quantum or classical but whatever—it’s practical.”
Neil puffed and laughed as he handed me back the suddenly-reappeared disk and said, “Aye, do ye really think I would let your lot fool me twice?”
I replied, “Fool what? You did it—that proves it.”
“Nay,” he said, “indeed I did it—I cannot lie—but ye can’t know how I did it enough to tell whether a non-leprechaun can do it. And a computer that ye build—be it quantum or classical or whatever—is a non-leprechaun.”
It hit me that a quantum computer that cannot be built is a leprechaun, and perhaps Peter Shor’s factoring algorithm only runs on those. But I wasn’t going to be distracted away from my victory.
“How can it matter whether a leprechaun does it?” Neil retorted that he didn’t have to answer a further challenge, “it’s not like having three wishes, you know.” But he continued, “since ye are a friend, I will tell ye three ways it could be, and you can choose one ye like but know ye: it could still be a fourth way.
“And I left ye a factor, but your student already had it, so I left ye no net knowledge at all.” And with a puff of smoke, he was gone.
Did I learn anything from the one-time factoring of my number? Happy St. Patrick’s Day anyway.
[moved part of dialogue at end from 2. to 1.]
Larry Shaw apparently created the concept of Pi Day in 1988. He was then a physicist who worked at the San Francisco Exploratorium. He and his colleagues initially celebrated by marching around in circles, and then eating pies—that is fruit pies. As Homer Simpson would say: hmm.
Today Ken and I want to add to some of the fun of Pi Day, and come back to a different Pi that has occupied us.
See here and here for many of the more exotic celebrations that revolve around Pi Day. Of course Pi Day, or $\pi$ day, is based on the famous number
$\pi = 3.14159265358979\ldots$
The extra excitement is that this year the date is 3/14/15. This uses the American “month first” convention. The international “date first” convention would go 31.4.15 but April does not have 31 days. The “year first” convention will have to wait another 1,126 years to hit 3141-5-9, and even so they pad with 0’s which makes it impossible. So month-first it has to be. We can get more digits by raising a toast at exactly 9:26:53 in the morning or evening.
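Recall that $\pi$ can be defined in many ways. One is that it is the reciprocal of the number
$\displaystyle \frac{2\sqrt{2}}{9801} \sum_{k=0}^{\infty} \frac{(4k)!\,(1103 + 26390k)}{(k!)^4\, 396^{4k}}.$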
This is due to Srinivasa Ramanujan.
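A few lines of C show how quickly Ramanujan's series converges. The first term alone gives $\pi$ to about seven digits, and each further term adds roughly eight more. This is a double-precision sketch of our own. The function name `ramanujan_pi` is hypothetical, and $\sqrt{2}$ is written as a literal only to avoid linking the math library.

```c
#include <assert.h>

/* Approximate pi from the first `terms` terms of Ramanujan's series
   1/pi = (2*sqrt(2)/9801) * sum_k (4k)! (1103 + 26390k) / ((k!)^4 * 396^(4k)). */
double ramanujan_pi(int terms)
{
    assert(terms >= 1);
    const double sqrt2 = 1.41421356237309504880;     /* literal, avoids libm */
    const double q = 396.0 * 396.0 * 396.0 * 396.0;  /* 396^4 */
    double ratio = 1.0;  /* (4k)! / ((k!)^4 * 396^(4k)), starting at k = 0 */
    double sum = 0.0;
    for (int k = 0; k < terms; k++) {
        if (k > 0) {
            /* update the ratio from its value at k-1 */
            double f = 4.0 * k;
            ratio *= f * (f - 1.0) * (f - 2.0) * (f - 3.0)
                     / ((double)k * k * k * k * q);
        }
        sum += ratio * (1103.0 + 26390.0 * k);
    }
    return 9801.0 / (2.0 * sqrt2 * sum);
}
```

With a single term, `ramanujan_pi(1)` is already about 3.1415927, and three terms exhaust double precision.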
The classic definition is based on the fact that the circumference of a circle is a constant multiple of its radius, and half of that constant is $\pi$. This must be proved and is not trivial. It was known to Euclid; see here for a detailed discussion.
The symbol Π is like ∏.
One is a capital pi and one is a product symbol: can you tell which is which? Perhaps you can, but they are pretty close, and their derivation is the same. So let’s talk about the product symbol during this wonderful Pi Day.
The product is quite basic to mathematics and theory too. We note that
$\prod_{i=1}^{n} x_i$
can be used to define various important functions. But the simplest such function is already an important open problem. That’s right: simply multiplying two integers, $x \cdot y$.
What is the complexity of integer product? There are multiple views on multiplication, going back to when it was first formulated as a problem.
In 1960 the great Andrey Kolmogorov conjectured that integer multiplication required quadratic time. That is, the simple high school method was optimal. This conjecture was plausible, especially back then when the field of computer theory was just beginning. So he organized a seminar at Moscow State University to study his conjecture with the hope of proving it. Within a week one of the students, Anatoly Karatsuba, found a clever algorithm that ran in time order-of
$n^{\log_2 3} \approx n^{1.585}$
for $n$-bit number multiplication.
The idea is to let $m = n/2$ and write $x = a2^m + b$ and $y = c2^m + d$. Then we have
$xy = A\,2^{2m} + B\,2^m + C$
where $A = ac$ and $B = ad + bc$ and $C = bd$. The multiplications by $2^{2m}$ and $2^m$ are just bit-shifts to known locations independent of the values of $x$ and $y$, so they don’t affect the time much. It is neat that $A$ and $C$ need just one recursive multiplication of $m$-bit numbers each, but the two such multiplications for $B$ would remove all the advantage and still give $n^2$ time.
What Karatsuba noted instead was that
$B = (a + b)(c + d) - A - C.$
This needs just one more multiplication. End of conjecture. Boom: the conjectured lower bound was wrong. Kolmogorov explained the beautiful result at the next meeting, and then terminated the seminar.
Pretty cool to terminate a seminar—no?
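Karatsuba's identity translates directly into a recursion. The following C sketch is restricted, purely for illustration, to factors of at most 32 bits so that every intermediate value fits in a `uint64_t`. Real big-integer libraries split arrays of machine words instead, and the name `karatsuba` here is our own.

```c
#include <assert.h>
#include <stdint.h>

/* Karatsuba product of x, y < 2^n with n <= 32, so x*y fits in 64 bits.
   Split x = a*2^m + b and y = c*2^m + d with m = n/2; then
   x*y = ac*2^(2m) + ((a+b)(c+d) - ac - bd)*2^m + bd,
   which needs three recursive products instead of four. */
uint64_t karatsuba(uint64_t x, uint64_t y, unsigned n)
{
    assert(n <= 32 && (x >> n) == 0 && (y >> n) == 0);
    if (n <= 4)
        return x * y;                          /* tiny inputs: multiply directly */
    unsigned m = n / 2;
    uint64_t mask = (UINT64_C(1) << m) - 1;
    uint64_t a = x >> m, b = x & mask;         /* a has n-m bits, b has m bits */
    uint64_t c = y >> m, d = y & mask;
    uint64_t ac = karatsuba(a, c, n - m);
    uint64_t bd = karatsuba(b, d, m);
    /* a+b and c+d fit in n-m+1 bits, since m <= n-m */
    uint64_t mid = karatsuba(a + b, c + d, n - m + 1) - ac - bd;  /* = ad + bc */
    return (ac << (2 * m)) + (mid << m) + bd;
}
```

Three recursive calls on inputs of about half the size give $T(n) = 3T(n/2) + O(n)$, which solves to $O(n^{\log_2 3})$.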
SODA is the premier conference on algorithms, which has the wonderful properties of being held in January and usually at a warm locale.
The SODA view is to optimize the algorithm. The simplest meaning of this is to optimize the asymptotic running time. Andrei Toom, 5 years younger than Karatsuba was, took up the task and in a 1963 paper gave a hierarchy of improved exponents approaching but not reaching 1. The next step, called “Toom-3,” breaks each number into 3 equal pieces, not 2, and by reducing 9 recursive multiplications to 5, achieves running time order-of
$n^{\log_3 5} \approx n^{1.465}.$
Steve Cook streamlined the description and the analysis in his doctoral thesis in 1966 and often gets joint credit.
If you break each number into $k$ pieces, the running time is proportional to $c(k)\, n^{\log_k(2k-1)}$, whose exponent behaves roughly like $1 + 1/\log_2 k$ and approaches $1$. For any fixed $k$ the term $c(k)$ is a constant, and one could be tempted to ignore it. However, as Wikipedia’s article avers,
[T]he function $c$ unfortunately grows very rapidly.
One can try to improve the algorithm by making $k$ depend on $n$ in the recursion, but as the article notes in the next sentence, this is an open research problem.
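The Toom-Cook exponent comes from a standard recurrence. Splitting each factor into $k$ pieces turns one product into $2k-1$ products of size about $n/k$, plus linear-time splitting and recombination. As a simplification on our part, the $k$-dependent constant of that linear term is taken to be absorbed into $c(k)$.

```latex
T(n) = (2k-1)\,T(n/k) + O(n)
\;\Longrightarrow\;
T(n) = O\!\left(n^{\log_k (2k-1)}\right),
\qquad
\log_k(2k-1) = 1 + \log_k\!\left(2 - \tfrac{1}{k}\right) \approx 1 + \frac{1}{\log_2 k}.
```

For $k = 2, 3, 4$ the exponent is about $1.585$, $1.465$, and $1.404$.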
The barrier was broken by Arnold Schönhage and Volker Strassen in 1971. Their algorithm achieved a running time of the form
$O(n \log n \cdot f(n))$
with $f(n) = \log\log n$. The constant in the $O$ is high but not astronomical, and their algorithm has been programmed to give superior performance to Cook-Toom for numbers of over 30,000 bits.
The latest major part of the history is that Martin Fürer in 2007 improved $f(n)$ to $2^{O(\log^* n)}$. This looks like more than $\log\log n$ but becomes less as $n$ grows. Alas, the constant hidden in the $O$ is so high that the algorithm is galactic. Work last year by David Harvey, Joris van der Hoeven, and Grégoire Lecerf improving the constant in the exponent has not changed that.
Fürer’s paper appeared in STOC not SODA, but I have a different picture in my mind when I think of STOC.
STOC is, of course, one of the top theory conferences, which is held in late spring or early summer and where it is held follows a complex secret formula. I have no idea what that formula is, but according to Wikipedia:
STOC is traditionally held in a different location each year.
Well not exactly: what does “different” mean here? STOC has repeated sites, just never at the same site two years in a row. Ken attended STOC in Montreal twice in fairly close succession: 1994 and 2002.
Okay let’s look at multiplication from a STOC view, that is from a more foundational view. We have already discussed the relationship between the cost of multiplication on Turing Machines and the famous conjecture of Juris Hartmanis that for any algebraic irrational number , the first digits of cannot be computed on a multitape Turing machine in time—and could even require time.
Part of the significance of this question is that it impacts the separation between and then integer multiplication is almost linear time. We know they are different, but only by a hair—less than even. The idea is that if and are really that close, then a common alternation method used to check integer multiplication can be simulated deterministically.
The method using alternation is to guess $z = xy$ and check it by selecting a prime $p$ of $O(\log n)$ bits. Then we must check $xy \equiv z \pmod{p}$. This is easy if one can compute an $n$-bit number mod $p$ fast. This can be done by a block structure trick and by using alternation again so that only one block has to be checked. We haven’t worked this all out, but clearly integer multiplication is involved in a fundamental way in complexity class relations. Another use of similar structure ideas is in this paper on checking integer multiplication by Dima Grigoriev and Gérald Tenenbaum.
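The mod-$p$ check itself is easy to sketch. The C code below is a plain randomized-fingerprint version, not the alternating-machine construction described above. It reduces decimal numerals modulo a single given $p$ by Horner's rule and compares $(x \bmod p)(y \bmod p) \bmod p$ with $z \bmod p$. The function names are hypothetical, and drawing $p$ at random from a suitable range of primes is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce a decimal numeral modulo p by Horner's rule, one digit at a time. */
static uint64_t mod_decimal(const char *digits, uint64_t p)
{
    uint64_t r = 0;
    for (const char *s = digits; *s != '\0'; s++) {
        assert(*s >= '0' && *s <= '9');
        r = (r * 10 + (uint64_t)(*s - '0')) % p;
    }
    return r;
}

/* Fingerprint test: return 1 iff x*y == z (mod p). A correct z always passes;
   a wrong z passes only if p divides x*y - z. */
int product_check_mod(const char *x, const char *y, const char *z, uint64_t p)
{
    /* p < 2^32 keeps r*10 + 9 and the product of two residues below 2^64 */
    assert(p >= 2 && p < (UINT64_C(1) << 32));
    return (mod_decimal(x, p) * mod_decimal(y, p)) % p == mod_decimal(z, p);
}
```

A correct $z$ always passes. A wrong $z$ slips through only when $p$ divides the nonzero number $xy - z$. If that number has at most about $2n$ bits, it has at most about $2n$ prime factors, so a prime drawn at random from a large enough range exposes the error with high probability.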
Ken has found a different way this week to escape to a warm locale, by workshop rather than by conference. There is in fact a new branch of algebra and geometry named for the tropics, in which the binary minimum (or maximum) function plays the role of $+$ and $+$ replaces multiplication. Then the integer product problem goes away, perhaps.
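In min-plus arithmetic, the analogue of an inner product replaces a sum of products by a minimum of sums. Below is a minimal C sketch with the hypothetical name `minplus_dot`. It returns $\min_k (r_k + c_k)$, which is the same loop as an ordinary dot product with the two operations swapped.

```c
#include <assert.h>
#include <limits.h>

/* Tropical (min-plus) inner product: min over k of row[k] + col[k].
   min plays the role of addition and + plays the role of multiplication.
   Assumes entries are small enough that row[k] + col[k] cannot overflow. */
long minplus_dot(int n, const long *row, const long *col)
{
    assert(n >= 1);
    long best = LONG_MAX;                /* tropical "zero": identity for min */
    for (int k = 0; k < n; k++) {
        long term = row[k] + col[k];     /* tropical "product" */
        if (term < best)
            best = term;                 /* tropical "sum" */
    }
    return best;
}
```

Taking such products of a matrix of edge lengths with itself gives the lengths of shortest two-step paths. This is one reason min-plus matrix multiplication is studied in algorithms.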
Ken has a different idea, however. Perhaps we can avoid doing multiplication altogether. We don’t even need to use integers. Let stand for any “decoding” function from strings to integers, which need not be 1-to-1: an integer might have many string codes. The idea is to ask:
Are there linear-time computable binary functions , , and such that for some decoding function —of any complexity or maybe not even computable–and all strings :
The last clause allows us to simulate a comparison function, so that the strings emulate a discretely ordered ring whose operations are all in linear time. Is there such a fish? We don’t care so much about the complexity of since it would only need to be applied once after a plethora of efficient and operations, and in view of might not need to be computed at all.
What is the complexity of $\Pi$? The operation, not the number.
[fixed radius/circumference in intro]
Stephen Johnson is one of the world’s top programmers. Top programmers are inherently lazy: they prefer to build tools rather than write code. This led Steve to create some of the great software tools that made UNIX so powerful, especially in the “early days.” These included the parser generator named Yacc, for “Yet Another Compiler Compiler.”
Today I (Dick) want to talk about another of his tools, called lint. Not an acronym, it really means lint.
Steve was also famous for this saying about an operating system environment for IBM mainframes named TSO which some of us were unlucky enough to need to use:
Using TSO is like kicking a dead whale down the beach.
Hector Garcia-Molina told me a story about using TSO at Princeton years before I arrived there. One day he wrote a program that was submitted to the mainframe. While Hector was waiting for it to run he noticed that it contained a loop that would never stop, and worse the loop had a print statement in it. So the program would run forever and print out junk forever. Yet Hector, because of the nature of TSO, could not kill the program. Hector went to the system people to ask them to kill his program. They answered that they could not kill it until it started to run. Even better: the program would not run until that evening—do not ask why. So they could not kill it. But the evening crew could once it started. So they left a handwritten note to kill Hector’s program later that night. A whale indeed.
Steve’s lint program took your C program, examined it, and flagged code that looked suspicious. The brilliant insight was that lint had no idea what you were really doing, but could say some constructs were likely to be bugs. These were flagged and often lint was right. A beautiful idea.
For example, consider the following simple C fragment:
while (x = y)
{
...
}
This is legal C code. But, it is most likely an error. The programmer probably meant to write:
while (x == y)
{
...
}
Recall that in C the test for equality is x == y, while x = y assigns y to x. The assignment version could be what the programmer intended, yet it is likely a mistake. These are exactly the kind of simple things that lint could flag.
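Here is a toy C sketch of a single lint-style heuristic. It looks at the parenthesized condition of the first `while` or `if` on a line and flags a lone `=`. It does plain string scanning with no tokenizing or parsing, so it can misfire, for instance on identifiers that merely contain `if`. The name `suspicious_assignment` is hypothetical, and this is not how lint itself was implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if the condition of the first `while (...)` or `if (...)` on the
   line contains a bare `=`, i.e. one not part of `==`, `!=`, `<=` or `>=`.
   A toy heuristic: no tokenizer, no handling of strings or comments. */
int suspicious_assignment(const char *line)
{
    assert(line != NULL);
    const char *kw = strstr(line, "while");
    size_t kwlen = 5;
    if (kw == NULL) {
        kw = strstr(line, "if");
        kwlen = 2;
    }
    if (kw == NULL)
        return 0;
    const char *p = kw + kwlen;
    while (isspace((unsigned char)*p))
        p++;
    if (*p != '(')
        return 0;
    int depth = 0;
    for (; *p != '\0'; p++) {
        if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            if (--depth == 0)
                break;                   /* end of the condition */
        } else if (*p == '=') {
            if (p[1] == '=') {           /* `==`: skip both characters */
                p++;
                continue;
            }
            char prev = p[-1];           /* safe: '(' precedes this point */
            if (prev == '!' || prev == '<' || prev == '>')
                continue;                /* `!=`, `<=`, `>=` */
            return 1;                    /* lone `=`: likely a bug */
        }
    }
    return 0;
}
```

Like lint, it reports a likely bug, not a certain one. `while (x = y)` is legal C, and no such check can tell whether the assignment was intended.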
The lint program has changed over the years and now there are more powerful tools that can flag suspicious usage in software written in many computer languages. It was originally developed by Steve in 1977 and described in a paper “Lint, a C program checker” (Computer Science Technical Report 65, Bell Laboratories, 1978).
I believe that we could build a lint for math that would do what Steve’s lint did for C code: flag suspicious constructs. Perhaps this already exists—please let me know if it does. But assuming it does not, I think even a tool that could catch very simple mistakes could be quite useful.
There is lots of research on mechanical proof systems. There is lots of interest in proving important theorems in formal languages so they can be checked. See this and this for some examples. Yet the vast majority of math is only checked by people. I think this is fine, even essential, but a lint program that at least caught simple errors would be of great use.
Let me give three types of constructs that it could catch. I assume that our lint would take in a LaTeX file and output warnings.
Unused variables. Consider
The lint program would notice that the variable is never used. Almost surely the intent was to write
Again note: this is not a certainty, since the former is a legal math expression.
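Here is a hypothetical illustration of the unused-variable check, using a summation of our own choosing. The C sketch finds the first `\sum_{i=...}` in a LaTeX string and reports whether the index letter ever appears on its own after the limits. It is a crude character-level heuristic, not a LaTeX parser. It handles only single-letter indices and treats a letter adjacent to another letter as part of a command or word. The name `sum_index_unused` is made up.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if the first \sum_{v=...} in `tex` binds a one-letter index v
   that never appears as a stand-alone letter after the limits. */
int sum_index_unused(const char *tex)
{
    assert(tex != NULL);
    const char *p = strstr(tex, "\\sum_{");
    if (p == NULL)
        return 0;
    p += strlen("\\sum_{");
    char var = *p;
    if (!isalpha((unsigned char)var))
        return 0;
    p = strchr(p, '}');                  /* end of the lower limit */
    if (p == NULL)
        return 0;
    p++;
    if (*p == '^') {                     /* skip the upper limit */
        p++;
        if (*p == '{') {
            p = strchr(p, '}');
            if (p == NULL)
                return 0;
        }
        if (*p != '\0')
            p++;
    }
    for (const char *q = p; *q != '\0'; q++) {
        if (*q != var)
            continue;
        int left_ok = !isalpha((unsigned char)q[-1]);   /* q > tex here */
        int right_ok = !isalpha((unsigned char)q[1]);
        if (left_ok && right_ok)
            return 0;                    /* index is used in the summand */
    }
    return 1;                            /* bound but never used: warn */
}
```

So `\sum_{i=1}^{n} x_j` is flagged while `\sum_{i=1}^{n} x_i` is not. As with the C version, a flag is only a warning, since the flagged expression is still legal.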
Unbound variables. Consider
If there is nothing before to constrain , this is at best poor writing. Does range over all reals, all integers, or just all natural numbers? Again a construct that should be flagged.
Under-constrained variables. Consider the statement,
For some it follows that .
The statement may be technically true when , but for purpose of clear communication it needs a qualifier that . Perhaps the writer wrote that stands for a positive real number some pages earlier—we would not expect lint to pick that up. But we could reasonably ask lint to check for a mention of “” in a previous formula and/or paragraph.
The TextLint applet page hosted by Lukas Renggli with Fabrizio Perin and Jorge Ressia does not flag the unused-variable condition, and evidently does not try to handle the other two situations. It also fails to catch 2^16, which LaTeX typesets as $2^16$ (a superscript 1 followed by a plain 6) rather than the undoubtedly intended $2^{16}$. This is more a LaTeX syntax issue than the kind of math-semantics error we are gunning for; the programs mentioned here also seem limited to this level.
Does a lint program like this—for general mathematical writing not just LaTeX code—already exist? If not, should we build one?
[added “environment” qualifier to TSO]
Eric Allender, Bireswar Das, Cody Murray, and Ryan Williams have proved new results about problems in the range between $\mathsf{P}$ and $\mathsf{NP}$-complete. According to the wide majority view of complexity the range is vast, but it is populated by scant few natural computational problems. Only Factoring, Discrete Logarithm, Graph Isomorphism (GI), and the Minimum Circuit Size Problem (MCSP) regularly get prominent mention. There are related problems like group isomorphism and others in subjects such as lattice-based cryptosystems. We covered many of them some years back.
Today we are delighted to report recent progress on these problems.
MCSP is the problem: given a string $x$ of length $N = 2^n$ and a number $s$, is there a Boolean circuit $C$ with $s$ or fewer wires such that
$x = C(0^n)\,C(0^{n-1}1) \cdots C(1^n)$?
For $x$ of other lengths $m$, $2^{n-1} < m < 2^n$, we catenate the values of $C$ for the first $m$ strings in $\{0,1\}^n$ under the standard order. Since every $n$-ary Boolean function has circuits of size $O(2^n/n)$, which are encodable in $O(2^n)$ bits, MCSP belongs to $\mathsf{NP}$ with linear witness size.
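The encoding convention can be made concrete with a short C sketch. It writes out $f(z)$ for the first $m$ strings $z \in \{0,1\}^n$ in the standard order, reading $z$ as the binary numeral $0, 1, \ldots, m-1$. With $m = 2^n$ this is the full truth table that serves as an MCSP instance. The names `truth_table` and `parity` are hypothetical, and parity is just a sample function.

```c
#include <assert.h>
#include <stddef.h>

/* Write f(z) for the first m strings z in {0,1}^n, in the standard order
   (z read as the binary numeral 0, 1, ..., m-1), as '0'/'1' characters.
   With m = 2^n this is the full truth table of f. `out` needs m+1 chars. */
char *truth_table(int (*f)(unsigned z, int n), int n, size_t m, char *out)
{
    assert(n >= 0 && n < 31 && m <= ((size_t)1 << n));
    for (size_t z = 0; z < m; z++)
        out[z] = f((unsigned)z, n) ? '1' : '0';
    out[m] = '\0';
    return out;
}

/* Sample Boolean function: parity (XOR) of the n input bits of z. */
int parity(unsigned z, int n)
{
    int p = 0;
    for (int i = 0; i < n; i++)
        p ^= (int)((z >> i) & 1u);
    return p;
}
```

For example, parity on two bits yields the instance string `0110`.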
Several Soviet mathematicians studied MCSP in the late 1950s and 1960s. Leonid Levin is said to have desired to prove it $\mathsf{NP}$-complete before publishing his work on $\mathsf{NP}$-completeness. MCSP seemed to stand aloof until Valentine Kabanets and Jin-Yi Cai connected it to Factoring and Discrete Log via the “Natural Proofs” theory of Alexander Razborov and Steven Rudich. Eric, Harry Buhrman, Michal Koucký, Dieter van Melkebeek, and Detlef Ronneburger improved their results in a 2006 paper to read:
Theorem 1 Discrete Log is in $\mathsf{BPP}^{\mathrm{MCSP}}$ and Factoring is in $\mathsf{ZPP}^{\mathrm{MCSP}}$.
Now Eric and Bireswar complete the triad of relations to the other intermediate problems:
Theorem 2 Graph Isomorphism is in $\mathsf{RP}^{\mathrm{MCSP}}$. Moreover, every promise problem in $\mathsf{SZK}$ belongs to $\mathsf{BPP}^{\mathrm{MCSP}}$ as defined for promise problems.
Cody and Ryan show on the other hand that proving $\mathsf{NP}$-hardness of MCSP under various reductions would entail proving breakthrough lower bounds:
Theorem 3
- If then , so .
- If then .
- If then (so ), and also has circuit lower bounds high enough to de-randomize .
- In any many-one reduction from (let alone ) to , no random-access machine can compute any desired bit of in time.
The last result is significant because it is unconditional, and because most familiar -completeness reductions are local in the sense that one can compute any desired bit of in only time (with random access to ).
The genius of MCSP is that it connects two levels of scaling—input lengths and —in the briefest way. The circuits can have exponential size from the standpoint of . This interplay of scaling is basic to the theory of pseudorandom generators, in terms of conditions under which they can stretch a seed of bits into bits, and to generators of pseudorandom functions .
An issue articulated especially by Cody and Ryan is that reductions to MCSP carry seeds of being self-defeating. The ones we know best how to design involve “gadgets” whose size scales as not . For instance, in a reduction from we tend to design gadgets for individual clauses in the given 3CNF formula —each of which has constant-many variables and encoded size. But if involves only -sized gadgets and the connections between gadgets need only lookup, then when the reduction outputs , the string will be the graph of a -sized circuit. This means that:
The two horns of this dilemma leave little room to make a non-trivial reduction to MCSP. Log-space and reductions are (to different degrees) unable to avoid the problem. The kind of reduction that could avoid it might involve, say, -many clauses per gadget in an indivisible manner. But doing this would seem to require obtaining substantial non-local knowledge about in the first place.
Stronger still, if the reduction is from a polynomially sparse language in place of , then even this last option becomes unavailable. Certain relations among exponential-time classes imply the existence of hard sparse sets in . The hypothesis that MCSP is hard for these sets impacts these relations, for instance yielding the conclusion.
A paradox that at first sight seems stranger emerges when the circuits are allowed oracle gates. Such gates may have any arity and output 1 if and only if the string formed by the inputs belongs to the associated oracle set . For any we can define to be the minimum size problem for such circuits relative to . It might seem axiomatic that when is a powerful oracle such as then should likewise be -complete. However, giving such an oracle makes it easier to have small circuits for meaningful problems. This compresses the above dilemma even more. In a companion paper by Eric with Kabanets and Dhiraj Holden they show that is not complete under logspace reductions, nor even hard for under uniform reductions. More strikingly, they show that if it is hard for under logspace reductions, then .
Nevertheless, when it comes to various flavors of bounded-error randomized Turing reductions, MCSP packs enough hardness to solve Factoring and Discrete Log and GI. We say some more about how this works.
What MCSP does well is efficiently distinguish strings having -sized circuits from the vast majority having no -sized circuits, where . The dense latter set is a good distinguisher between pseudorandom and uniform distributions on . Since one-way functions suffice to construct pseudorandom generators, MCSP turns into an oracle for inverting functions to an extent codified in Eric’s 2006 joint paper:
Theorem 4 Let be a dense language of strings having no -sized circuits, and let be computable in polynomial time with of polynomially-related lengths. Then we can find a polynomial-time probabilistic oracle TM and such that for all and ,
Here is selected uniformly from and is uniform over the random bits of the machine. We have restricted and more than their result requires for ease of discussion.
To attack GI we set things up so that “” and “” represent a graph and a permutation of its vertices, respectively. More precisely “” means a particular adjacency matrix, and we define to mean the adjacency matrix obtained by permuting according to . By Theorem 4, using the MCSP oracle to supply , one obtains and such that for all and -vertex graphs ,
Since is 1-to-1 we can simplify this while also tying “” symbolically to :
Now given an instance of GI via adjacency matrices, do the following for some constant times independent trials:
This algorithm has one-sided error since it will never accept if and are not isomorphic. If they are isomorphic, then arises as with the same distribution over permutations that it arises as , so Equation (1) applies equally well with in place of . Hence finds the correct with probability at least on each trial, yielding the theorem .
The proof for $\mathsf{SZK}$ is more detailed but similar in using the above idea. There are many further results in the paper by Cody and Ryan and in the oracle-circuit paper.
These papers also leave a lot of open problems. Perhaps more importantly, they attest that these open problems are attackable. Can any kind of many-one reducibility stricter than reduce every language in to MCSP? Can we simply get from the assumption ? The most interesting holistic aspect is that we know new lower bounds follow if MCSP is easy, and now we know that new lower bounds follow if MCSP is hard. If we assume that MCSP stays intermediate, can we prove lower bounds that combine with the others to yield some non-trivial unconditional result?
[added paper links]
Jeff Skiles was the co-pilot on US Airways Flight 1549 from New York’s LaGuardia Airport headed for Charlotte on January 15, 2009. The Airbus A320 lost power in both engines after striking birds at an altitude of about 850 meters and famously ditched in the Hudson River with no loss of life. As Skiles’s website relates, he had manual charge of the takeoff, but when the engines failed and he lost his instrument panel,
“Captain Chesley Sullenberger took over flying the plane and tipped the nose down to retain airspeed.”
Skiles helped contact nearby airports for emergency landing permission but within 60 seconds Sullenberger and he determined that the Hudson was the only option. His front page does not say he did anything else.
Today we tell some stories about the technical content of forms of emptiness.
I am teaching Buffalo’s undergraduate theory of computation course again. I like to emphasize early on that an alphabet need not be only a set of letter or digit symbols, even though it will be or or similar in nearly all instances. The textbook by Mike Sipser helps by having examples where tokens like “REAR” or “<RESET>” denoting actions are treated as single symbols. An alphabet can be the set of atomic actions from an aircraft or video game console. Some controls such as joysticks may be analog, but their output can be transmitted digitally. What’s important is that any sequence of actions is represented by a string over an appropriately chosen alphabet.
I go on to say that strings over any alphabet can be re-coded over $\{0,1\}$. Or over ASCII or UTF-8 or Unicode, but those in turn are encoded in 8-bit or 16-bit binary anyway. I say all this justifies flexible thinking in that we can regard $\{0,1\}$ as “the” alphabet for theory but can speak in terms of a generic char type for practice. Then in terms of the C++ standard library I write alphabet = set<char>, string = list<char>, language = set<string>, and class = set<language>. I go on to say how “$M = (Q, \Sigma, \delta, s, F)$” abbreviates object-oriented class notation in which set<State> Q; and alphabet Sigma; and State start; and set<State> finalStates; are fields and delta can be variously a map or a function pointer or a set of tuples regarded as instructions.
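The field-by-field view can also be sketched in plain C. Below, a DFA over the alphabet $\{0,1\}$ is a struct whose fields stand for $Q$ (as a state count), $\delta$ (as a transition table, one of the representations mentioned), the start state, and the final states. The struct layout and the example machine, which accepts strings with an even number of 1s, are our own assumptions.

```c
#include <assert.h>

#define MAX_STATES 16

/* A DFA over the alphabet {0,1}: Q = {0, ..., num_states-1}, delta as a
   transition table, a start state, and F as a membership array. */
struct dfa {
    int num_states;
    int delta[MAX_STATES][2];   /* delta[q][c]: next state on input bit c */
    int start;
    int final[MAX_STATES];      /* final[q] != 0 iff q is in F */
};

/* Run the DFA on a string of '0'/'1' characters; return 1 iff it accepts. */
int dfa_accepts(const struct dfa *m, const char *w)
{
    int q = m->start;
    for (; *w != '\0'; w++) {
        assert(*w == '0' || *w == '1');
        q = m->delta[q][*w - '0'];
        assert(q >= 0 && q < m->num_states);
    }
    return m->final[q] != 0;
}

/* Example machine: accepts binary strings with an even number of 1s. */
const struct dfa even_ones = {
    .num_states = 2,
    .delta = { {0, 1}, {1, 0} },
    .start = 0,
    .final = { 1, 0 },
};
```

A table is the simplest choice when the alphabet is small and fixed. A map or a set of tuples would suit sparse or partial transition functions better.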
In the course I’m glad to go into examples of DFAs and NFAs and regular expressions right away, but reaching the last makes it high time to say more about formal language theory. I’ve earlier connected $\cup$ and $\cap$ to Boolean or and and, but concatenation of languages needs explaining as a kind of “and then.” One point needing reinforcement is that the concatenation of a language $A$ with itself, written $AA$ or $A^2$, equals $\{xy : x, y \in A\}$, not $\{xx : x \in A\}$. The most confusion and error I see, however, arises from the empty language $\emptyset$ versus the empty string $\epsilon$ (or $\lambda$ in other sources).
I explain the analogy between multiplication and concatenation, although the latter is not commutative, and that the operation $AB = \{xy : x \in A, y \in B\}$ between languages naturally “lifts” the definition of concatenation for strings. I then say that $\emptyset$ behaves like $0$ and $\{\epsilon\}$ behaves like $1$ under this analogy, but I don’t know how well that catches on with the broad range of students. So, not always but a few times when lecture prep and execution have left 6–10 minutes in the period, I wrap a story into an example:
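A small C sketch can check these claims on finite languages. With $A = \{a, b\}$, the product $AA$ has four strings rather than two. Concatenating with $\emptyset$ gives $\emptyset$, like multiplying by $0$, and concatenating with $\{\epsilon\}$ gives $A$ back, like multiplying by $1$. The fixed-size array representation and all names are our own simplifications, adequate only for tiny examples.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 64
#define MAX_LEN 16

/* A finite language: `size` distinct strings, each shorter than MAX_LEN. */
struct lang {
    int size;
    char word[MAX_WORDS][MAX_LEN];
};

static int lang_contains(const struct lang *l, const char *w)
{
    for (int i = 0; i < l->size; i++)
        if (strcmp(l->word[i], w) == 0)
            return 1;
    return 0;
}

/* Add w to l unless it is already present (languages are sets). */
static void lang_add(struct lang *l, const char *w)
{
    if (lang_contains(l, w))
        return;
    assert(l->size < MAX_WORDS && strlen(w) < MAX_LEN);
    strcpy(l->word[l->size++], w);
}

/* Concatenation AB = { xy : x in A, y in B }. */
struct lang lang_concat(const struct lang *a, const struct lang *b)
{
    struct lang r = { 0 };
    char buf[2 * MAX_LEN];
    for (int i = 0; i < a->size; i++)
        for (int j = 0; j < b->size; j++) {
            strcpy(buf, a->word[i]);
            strcat(buf, b->word[j]);
            lang_add(&r, buf);
        }
    return r;
}

const struct lang empty_lang = { 0 };              /* the empty language, like 0 */
const struct lang epsilon_lang = { 1, { "" } };    /* { epsilon }, like 1 */
const struct lang example_a = { 2, { "a", "b" } };
```

With no pair $(x, y)$ to draw from, concatenating with $\emptyset$ produces nothing at all. Concatenating with $\{\epsilon\}$ pairs each $x$ with the empty string and changes nothing.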
Let denote the alphabet of controls on a typical twin-engine Cessna business jet. I will define two languages over this alphabet—you tell me what they are:
After carefully writing this on board or slide, I say, “you have enough information to answer this; it could be a pop quiz.” I let 15–20 seconds go by to see if someone raises a hand amid bewildered looks in silence, and then I say, “OK—I’ll tell a real-life story.”
My father Robert Regan was a financial reporter specializing in aluminum and magnesium. Once in the 1970s he covered a meeting of aluminum company executives in North Carolina. One of the executives failed to show for the first evening, and the news of why was not conveyed until he appeared in splint and bandages at breakfast the next morning.
He told how his twin-engine crew-of-two jet had lost all power immediately after takeoff. With no time or room for turning back, the pilot spotted the flat roof of a nearby bowling alley and steered for it as best he could. The jet pancaked on the roof and bounced into the mercifully empty parking lot. Everyone survived and could thank the way the force of impact had been broken into two lesser jolts. The end of the executive’s tale and interaction with my father in the breakfast-room audience went about as follows:
Exec: I have never seen such a great piece of quick thinking and calm control in my lifetime of business, to say nothing of the sheer flying skill. That pilot ought to get a medal.
RR: The co-pilot deserves a medal too.
Exec: Why? He didn’t do anything.
RR: Exactly.
Maybe only then does the significance of the words “appropriate for the co-pilot to initiate” in my definitions of the two languages dawn on the class, along with the Boolean and. The appropriate string is $\epsilon$ in both cases: the co-pilot should not “initiate” any actions.
As witnessed by the stories above, in the case of the first language there is a good chance of surviving even if both engines fail, so the second clause is certainly satisfied. Thus the first language is $\{\epsilon\}$. Perhaps the example of Sullenberger and Skiles at 850 meters makes it too pessimistic for me to say the plane is a goner at 2,000 meters, but the point of the example is that an unsatisfied conjunct in a set definition makes the whole predicate false even if the part depending on $x$ is true. Thus the intent is that the second language is $\emptyset$.
There it is: the difference between $\emptyset$ and $\{\epsilon\}$ can be one of life and death. How much the story helps burnish the difference is hard to quantify, but at least much of the class tends to get a later test question involving this difference right.
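The logical point, that a conjunct not mentioning $x$ can empty out a whole set definition, can be sketched in Python. Everything here is hypothetical: the three-symbol alphabet stands in for the unspecified cockpit controls, strings are tuples of symbols (so `()` plays the role of $\epsilon$), and the boolean `side_condition` models the clause about the plane that does not depend on $x$:

```python
from itertools import product

# Hypothetical toy alphabet of controls; the real Σ in the story is not specified.
SIGMA = ["throttle", "flaps", "gear"]

def strings_up_to(alphabet, max_len):
    """All strings (tuples of symbols) of length 0..max_len; () is the empty string ε."""
    for n in range(max_len + 1):
        yield from product(alphabet, repeat=n)

def language(condition_on_x, side_condition, max_len=2):
    """{x in Σ^{≤max_len} : condition_on_x(x) and side_condition}."""
    return {x for x in strings_up_to(SIGMA, max_len)
            if condition_on_x(x) and side_condition}

def copilot_should_initiate(x):
    # Hypothetical predicate: the only appropriate action sequence is to do nothing.
    return len(x) == 0

print(language(copilot_should_initiate, True))   # {()}: just ε, the language {ε}
print(language(copilot_should_initiate, False))  # set(): the empty language ∅
```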
Whether I tell the story or not, I next have to convey why $\emptyset^*$ turns around and becomes $\{\epsilon\}$. I say that the convention $A^0 = \{\epsilon\}$ helps make the power law $A^{m+n} = A^m A^n$ true for all $m, n \geq 0$, but why is this law relevant for $A = \emptyset$? Why do we need to define $\emptyset^0$ anyway, let alone stipulate that it equals $\{\epsilon\}$?
If I say it’s like $0^0 = 1$ in arithmetic, the students can find various sources saying $0^0 = 1$ is a “convention” and “controversial.” So I say it’s like the convention that a for-loop
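A minimal Python sketch of powers and the star, again assuming finite languages as sets of strings; `power` and `star_up_to` are our own illustrative helpers, and `star_up_to` is only a finite truncation $A^0 \cup \cdots \cup A^k$ of $A^*$. It checks the power law on a sample language and shows that only the $n = 0$ term survives in $\emptyset^*$:

```python
def concat(A, B):
    """Concatenation of finite languages represented as sets of strings."""
    return {x + y for x in A for y in B}

def power(A, n):
    """A^n by repeated concatenation; A^0 = {ε} because the loop body never runs."""
    result = {""}
    for _ in range(n):
        result = concat(result, A)
    return result

def star_up_to(A, k):
    """Finite approximation A^0 ∪ A^1 ∪ ... ∪ A^k of the Kleene star A*."""
    out = set()
    for n in range(k + 1):
        out |= power(A, n)
    return out

A = {"a", "bb"}
for m in range(4):
    for n in range(4):
        assert power(A, m + n) == concat(power(A, m), power(A, n))  # A^{m+n} = A^m A^n

print(power(set(), 0))        # {''}: ∅^0 = {ε}
print(power(set(), 3))        # set(): ∅^n = ∅ for n >= 1
print(star_up_to(set(), 10))  # {''}: only the n = 0 term contributes to ∅*
```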
for (int i = 0; i < n; i++) { ... }
naturally “falls through” when $n = 0$. Even if the loop is checking for conditions that might force your code to terminate—and even if the body is definitely going to kill your program on entry—if the loop executes 0 times then you’re still flying. It’s a no-op represented by $\{\epsilon\}$ rather than a killing $\emptyset$, so the whole flow-of-control analysis is
Thus it comes down to the logical requirement that a universally quantified test on an empty domain defaults to true. Not just the students but I too can feel this requirement better in programming terms.
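The same point in Python terms, as a small sketch: `run_checks` is a hypothetical loop whose body is guaranteed to kill the program, yet it is harmless when it runs zero times, and the built-in `all` returns `True` on an empty iterable just as a universal test over an empty domain must:

```python
def run_checks(n):
    """Hypothetical loop: the body 'kills the program' on entry, yet n == 0 is safe."""
    for i in range(n):
        raise RuntimeError(f"fatal check failed at i={i}")
    return "still flying"

print(run_checks(0))              # still flying: the body never executes
print(all(x > 100 for x in []))   # True: a universal test over an empty domain
print(any(x > 100 for x in []))   # False: an existential test over an empty domain
```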
To go deeper—usually as notes for TAs if time permits in recitations or as a note in the course forum—I connect to logic and relations. I’ve defined a function from a set $A$ to a set $B$ as a relation $R \subseteq A \times B$ that satisfies the test
$$(\forall x \in A)(\exists!\, y \in B)\; R(x, y).$$
Now we can ask:
Is the empty relation a function?
There’s an impulse to answer, “of course it isn’t—there aren’t any function values.” But when $A = \emptyset$ the test becomes a universally quantified formula over an empty domain, and so it defaults to true. Thus $\emptyset$ counts as a function regardless of what $B$ is, even if $B = \emptyset$ too.
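The test above can be transcribed directly into a Python sketch, assuming relations are sets of pairs; `is_function` is our illustrative name. The empty relation passes vacuously when $A$ is empty, whatever $B$ is, but fails as soon as $A$ has an element:

```python
def is_function(R, A, B):
    """Is the relation R (a set of pairs) a function from A to B?
    Requires R ⊆ A × B and, for every x in A, exactly one y in B with (x, y) in R."""
    inside = all(x in A and y in B for (x, y) in R)
    return inside and all(sum(1 for y in B if (x, y) in R) == 1 for x in A)

print(is_function(set(), set(), set()))                        # True: vacuous, even with B empty
print(is_function(set(), set(), {"u", "v"}))                   # True: B does not matter
print(is_function(set(), {1, 2}, {"u", "v"}))                  # False: x = 1 has no value
print(is_function({(1, "u"), (2, "u")}, {1, 2}, {"u", "v"}))   # True
```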
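Because $\emptyset \times B = \emptyset$, the only possible relation on $\emptyset \times B$ is $\emptyset$. So the cardinality of the set of functions from $\emptyset$ to $B$ is $1$. The notation for the set of functions from a set $A$ to a set $B$, namely $B^A$, is motivated by examples like $\{0,1\}^A$ being the set of binary functions on $A$. There are $2^{|A|}$ such functions, and in general $|B^A| = |B|^{|A|}$.
With $A = B = \emptyset$ all this gives $0^0 = |\emptyset^\emptyset| = 1$. Thus $0^0 = 1$ and $\emptyset^0 = \{\epsilon\}$ are needed for the universe of mathematics based on sets and logic to come out right.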
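The count $|B^A| = |B|^{|A|}$, including its degenerate cases, can be checked by brute force. This Python sketch (our own helper, representing each function as a dict) relies on `itertools.product` with `repeat=0` yielding exactly one empty tuple, which is the empty function:

```python
from itertools import product

def all_functions(A, B):
    """Every function from A to B, each as a dict; there are |B| ** |A| of them.
    product(..., repeat=0) yields exactly one empty tuple, i.e. the empty function."""
    domain = sorted(A)
    return [dict(zip(domain, values))
            for values in product(sorted(B), repeat=len(domain))]

print(all_functions(set(), set()))          # [{}]: one function from ∅ to ∅, so 0^0 = 1
print(len(all_functions({1}, set())))       # 0: no function from a nonempty set to ∅
print(len(all_functions({0, 1}, {0, 1})))   # 4 = 2^2 binary functions on {0, 1}
print(0 ** 0)                               # 1: Python's integer power agrees
```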
The same analysis shows that an empty relation on a nonempty domain is not a function. This means that even when stuff is empty, the type signature of the stuff matters too. One student in my course told me last week that the realization that “empty” could come with a type helped him figure things out.
Real advances in mathematics have come from structure channeling content even when the content is degenerate or empty. There are more hints of deeper structure even in basic formal language theory. I generally encourage the notation $+$ for regular expressions over Sipser’s $\cup$ in order to leverage the analogy between concatenation and multiplication, even though $A + A$ equals $A$, not any notion of “$2A$.” The property $1 + 1 = 1$ does not hold over any field, except for the possibility of the “field of one element” which we discussed some time back.
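A quick Python sketch of where the analogy holds and where it breaks, assuming languages as sets of strings with $+$ read as union (the sample languages are arbitrary): concatenation distributes over $+$ as multiplication does over addition, but $+$ is idempotent, which no field allows since $1 + 1 = 1$ would force $1 = 0$:

```python
def concat(A, B):
    """Concatenation of finite languages represented as sets of strings."""
    return {x + y for x in A for y in B}

A, B, C = {"a"}, {"b", ""}, {"ab"}

# Reading + as union: A + A = A, and ∅ is the additive identity, like 0.
print(A | A == A, set() | A == A)                          # True True
# Concatenation distributes over +, as multiplication does over addition.
print(concat(A, B | C) == concat(A, B) | concat(A, C))     # True
# In any field 1 != 0, so 1 + 1 = 1 is impossible; spot-check the prime fields Z/p.
print(all((1 + 1) % p != 1 for p in [2, 3, 5, 7, 11]))     # True
```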
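Now consider the suggestive analogy
$$A^* = \{\epsilon\} + A + A^2 + A^3 + \cdots \qquad\text{versus}\qquad \frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots$$
What substance does it have beyond optics? The latter equation holds provided $|x| < 1$, even over the complex numbers, and also holds in a sense for some values with $|x| \geq 1$ under suitable summation methods. The analogy $\emptyset \leftrightarrow 0$, $\{\epsilon\} \leftrightarrow 1$ works in both equations to yield $\emptyset^* = \{\epsilon\}$ and $\frac{1}{1-0} = 1$. We then find it disturbing, however, that substituting $A = \{\epsilon\}$, $x = 1$ fails, because $\{\epsilon\}^* = \{\epsilon\}$, which is not infinite, whereas $\frac{1}{1-1}$ blows up.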
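The match and the mismatch can be seen side by side in a small Python sketch that truncates both series at the same $k$; both helpers are our own illustrations, not standard functions. Under $\emptyset \leftrightarrow 0$ both sides give “one,” while under $\{\epsilon\} \leftrightarrow 1$ the language side stays at one string and the numeric side grows without bound:

```python
def concat(A, B):
    """Concatenation of finite languages represented as sets of strings."""
    return {x + y for x in A for y in B}

def star_up_to(A, k):
    """A^0 ∪ A^1 ∪ ... ∪ A^k, a finite stand-in for A*."""
    out, layer = set(), {""}
    for _ in range(k + 1):
        out |= layer
        layer = concat(layer, A)
    return out

def geometric_partial_sum(x, k):
    """1 + x + x^2 + ... + x^k, the matching truncation of 1/(1 - x)."""
    return sum(x ** n for n in range(k + 1))

for k in (5, 50):
    print(k, star_up_to(set(), k), geometric_partial_sum(0, k))  # ∅ ↔ 0: both give "1"
    print(k, star_up_to({""}, k), geometric_partial_sum(1, k))   # {ε} ↔ 1: {''} vs k + 1
```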
Does it really fail? Perhaps it succeeds in some structure that embraces both equations—perhaps involving $\mathbb{F}_1$? Our earlier post and its links noted that $\mathbb{F}_1$ has an awful lot of structure and connections to other parts of mathematics despite its quasi-empty content.
We know several ways to build a universe on emptiness. In them the supporting cast of structure rather than $\emptyset$ is the real lead. The new actor in town, Homotopy Type Theory, aims to find the right stuff directly in terms of types and the identity relation and a key notion and axiom of univalence. As related in a recent survey by Álvaro Pelayo and Michael Warren in the AMS Bulletin, the object is to make $\emptyset$ and other sets emerge from the framework rather than form its base.
Does handling $\emptyset$ and $\{\epsilon\}$ right take care of everything?
Plus updated links to our Knuth and TED talks
Ada Lovelace was nuts. Some have used this to minimize her contributions to the stalled development of Charles Babbage’s “Analytical Engine” in the 1840s. Judging from her famously over-the-top “Notes” to her translation of the only scientific paper (known as the “Sketch”) published on Babbage’s work in his lifetime, we think the opposite. It took nuttily-driven intensity to carry work initiated by Babbage several square meters of print beyond what he evidently bargained for.
This month we have been enjoying Walter Isaacson’s new book The Innovators, which leads with her example, and have some observations to add.
Martin Campbell-Kelly and William Aspray, in their 2013 book Computer: A History of the Information Machine with Nathan Ensmenger and Jeffrey Yost, represent a consensus scholarly view:
“One should note, however that the extent of Lovelace’s intellectual contribution to the Sketch has been much exaggerated. … Later scholarship has shown that most of the technical content and all of the programs in the Sketch were Babbage’s work. But even if the Sketch were based almost entirely on Babbage’s ideas, there is no question that Ada Lovelace provided its voice. Her role as the prime expositor of the Analytical Engine was of enormous importance to Babbage…”
We agree with much of this but feel the intellectual aspect of amplification given by her notes is being missed and needs its special due.
Babbage was a polymath and rose to the Lucasian Professorship at Cambridge in the line of Isaac Newton and Paul Dirac and Stephen Hawking, but he never gave a lecture there while making many forays into politics and polemics and industrial practice and theology. He held weekly public gatherings in London all through the 1830s which Lovelace frequented. They included a prototype of his “Difference Engine,” which the British government had funded for the creation of error-free military and scientific tables to the tune of over ten million dollars in our money, but which he abandoned on perceiving the loftier idea of universal computation. In 1837 he wrote a long manuscript on the design of his “Analytical Engine” and its mechanics for the four basic arithmetical operations plus root-extraction. He dated it finished on his forty-sixth birthday 12/26/37, but did not publish it in any form. In 1840 he gave invited lectures on the engine at the University of Turin. They were scribed by a military mathematician and later politician named Luigi Menabrea who produced a paper in French two years later.
Lovelace was also friends with the electrical pioneers Charles Wheatstone and Michael Faraday. Wheatstone prompted her to translate Menabrea’s paper, which Babbage encouraged further by suggesting she add her own notes to it. Her translation was dutiful, but her seven “Notes” labeled A–G swelled to almost triple the length.
Her Note G mainly concerned the steps for calculating the $n$th Bernoulli number (using notation $B_{2n-1}$ rather than today’s $B_{2n}$ or $B_n$). The technical parts included the following:
This was worked out in lengthy correspondence with Babbage and then, as Isaacson details, a month of “crunch time” for Lovelace in July 1843 before the printer deadline. Babbage wrote the following in his autobiography two decades later:
“[I] suggested that she add some notes to Menabrea’s memoir, an idea which was immediately adopted. We discussed together the various illustrations that might be introduced: I suggested several but the selection was entirely her own. So also was the algebraic working out of the different problems, except, indeed, that relating to the numbers of Bernoulli, which I had offered to do to save Lady Lovelace the trouble. This she sent back to me for an amendment, having detected a grave mistake which I had made in the process.”
There is debate on whether “algebraic working out” refers to the first part of Note G or the whole, but Lovelace asked for “the necessary … formulae” in a letter, so it seems clear to us that only the run-up to formula (8) is referred to. The identity of the “grave mistake” is not known, but it seems like an error of derivation, not programming.
Menabrea’s paper has a brief mention of Bernoulli numbers toward the end, which points toward Babbage having raised but not elaborated their computation in his Turin lectures. What we don’t know is how far Babbage had worked out the programming details before and after 1840. What we do see is evidence of layers of stepwise refinement from spec to program over time.
Per Babbage’s account, Lovelace worked out at least the bottom layer of all her examples. The examples in her notes B–F have stronger ties to Menabrea’s coverage. Despite Babbage’s crediting her also for their initial-layer algebraic work, it is plausible that he had already digested all details. The start of Note A seems to indulge algebraic whimsy in how it styles the Difference Engine’s limitation to polynomials of degree six. She says the “particular function whose integral it was constructed to tabulate” is
$$\Delta^7 u_x = 0,$$
thus reducing it in a sense to nothingness. She puns on six applications of addition being “the whole sum and object of that engine,” trashing it by comparison to the Analytical Engine. Babbage tried to add a preface inveighing against the government’s refusal to fund the Analytical Engine in a way that would have come from her in print; her principled refusal may have surprised him into asking to scrap the whole paper before he saw sense and relented. In any event we can understand those who limit her credit in sections A–F to exposition and programming.
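To make concrete why $\Delta^7 u_x = 0$ characterizes degree-six polynomials, here is a minimal Python sketch of the method of differences. It is a modern reconstruction under a simple register-per-difference assumption, not Babbage’s mechanism, and the function names and sample polynomial are hypothetical. Seven registers hold $u, \Delta u, \dots, \Delta^6 u$; each new table entry then costs exactly six additions and no multiplications:

```python
def differences(values):
    """First differences Δu_n = u_{n+1} - u_n of a list."""
    return [b - a for a, b in zip(values, values[1:])]

def initial_column(values, order):
    """u_0, Δu_0, Δ²u_0, ..., Δ^order u_0 read off the difference table."""
    column, row = [], list(values)
    for _ in range(order + 1):
        column.append(row[0])
        row = differences(row)
    return column

def tabulate_by_addition(column, count):
    """Difference-Engine-style tabulation using additions only.
    Register i holds Δ^i u_n; each step adds each register's (old) right
    neighbour into it, i.e. len(column) - 1 additions per table entry."""
    registers = list(column)
    table = []
    for _ in range(count):
        table.append(registers[0])
        for i in range(len(registers) - 1):
            registers[i] += registers[i + 1]   # uses the not-yet-updated Δ^{i+1}
    return table

def p(x):
    return x**6 - 3 * x**2 + 5   # hypothetical sample polynomial of degree six

column = initial_column([p(x) for x in range(7)], 6)
assert tabulate_by_addition(column, 20) == [p(x) for x in range(20)]
```

With seven registers the inner loop performs six additions per step, and the last register never changes because the seventh difference is zero.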
In the Bernoulli section, however, her impact in all layers comes out strong. There is a curious brief passage on why she omitted filling in plus or minus signs. Its stated deliberateness could be covering a gap in command, but in any event it represents deferring a non-crucial step. The table and programming parts involve details of procedure and semantics at levels beyond what is evident from Babbage’s 1837 manuscript.
Hence we feel that, translated into today’s terms, her “Notes” make at least a great master’s project. The question is, would it be more? Our analogy means to factor out niceties like all this happening a cool 100 years before Turing-complete computers began to be built and programmed. We’re trying to map it fairly onto graduate work in computer systems today. So which is it, master’s or doctorate?
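For readers who want to see the numbers Note G targets, here is a minimal Python sketch. It uses the standard modern recurrence $\sum_{k=0}^{m}\binom{m+1}{k}B_k = 0$ with $B_0 = 1$, which is a swapped-in technique and not Lovelace’s formula (8) or her table of operations; the function name is ours. It then relabels the results in her odd-index notation, under which her $B_7$ is today’s $B_8$:

```python
from fractions import Fraction
from math import comb

def bernoulli(m_max):
    """Modern Bernoulli numbers B_0..B_{m_max} from the recurrence
    sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1, with B_0 = 1 (so B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B

B = bernoulli(8)
# Lovelace's B_1, B_3, B_5, B_7 correspond to today's B_2, B_4, B_6, B_8.
lovelace = {2 * n - 1: B[2 * n] for n in range(1, 5)}
for k, value in lovelace.items():
    print(f"Lovelace B_{k} = modern B_{k + 1} = {value}")
```

Exact `Fraction` arithmetic avoids rounding, and the output makes visible that the signs of the nonzero $B_{2n}$ alternate.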
We see several seeds of a PhD. In a long paragraph early in Note A and again later she expands on the semantic categories of the machine, which Allan Bromley toward the end of his 1982 paper opines had not been “clearly resolved” by Babbage. Unlike Babbage she keeps this development completely apart from the hardware. This includes her perceptive distinction:
“First, the symbols of operation are frequently also the symbols of the results of operations.”
Between Notes B and E she grapples with the numerics of real-number division and a hint of the idea of dividing by a series and achieving better approximation through iteration. Also in Note E, she develops the structural difference between nested and single iterations. In Note G there is a hint of care about variables being free versus bound in logical sequences. Riding atop all this are her philosophical conclusions and prognostications, some of which Dick discussed and which Alan Turing answered at peer level in his 1950 paper “Computing Machinery and Intelligence.” They may be airy but all except perhaps the negative one on originality were right, and in our book that counts for a lot. As does the non-nutty overall precision of her work.
It is not unusual for a systems PhD to be “based on” and stay within the domain of equipment and topics by which the advisor applied for funding. The criterion for impact should be, did the student amplify the advisor’s vision? Did he—or she—find solutions that were not charted in advance? Does the student’s work enlarge the advisor’s prospects? Could he/she continue to develop the applications? That she and Babbage didn’t is mostly ascribable to lack of external funding and to engineering challenges that are still proving tough today for a project to build the Analytical Engine.
Isaacson’s book goes on to cover intricacies faced by female programmers of the ENIAC that were evidently less appreciated by the men on the hardware side. The end of Menabrea’s paper reflects the same mode of emphasizing the operations in hardware as Babbage’s 1837 manuscript. Even as Babbage helped with the complexities of the Bernoulli application in their correspondence, the point is that he did not pre-digest them—and as Isaacson sums up, it’s her initials that are on the paper. Isaacson does not say “amplifier,” but Maria Popova does in writing about his chapter:
“But Ada’s most important contribution came from her role as both a vocal champion of Babbage’s ideas, at a time when society questioned them as ludicrous, and as an amplifier of their potential beyond what Babbage himself had imagined.”
Hence to those who say that calling her the world’s first programmer is “nonsense” we reply “nuts.” Along with Autodesk co-founder John Walker, whose Fourmilab website hosts the definitively formatted Web copy of the “Sketch” and much else on the engine, we feel reading her paper is evidence enough that she was the first to face a wide array of the vicissitudes and significances of programming.
We note that last month polished recordings of our talks from last October became available:
We also note a long update by Gil Kalai of multiple developments “pro” and “con” on the feasibility of quantum computing—to cite a remark by Dick in our private conversations about this post, “if only Babbage and Lovelace had been able to show how the Analytical Engine could break cryptosystems…”
How do you regard our “advisor-student” terms for assessing Ada Lovelace’s contributions?
[clarified in the intro that Isaacson leads with Lovelace; added “much else” for Walker; added update on Kalai (as originally intended); “three weeks” –> “a month” in July 1843; added note about her treatment of division as one of “some” –> “several” seeds of a PhD; other minor edits]