Thomas Muir coined the term “permanent” as a noun in his 1882 treatise on determinants. He took it from Augustin Cauchy’s 1815 distinction between symmetric functions that alternate when rows of a matrix are interchanged versus ones that “stay permanent.” To emphasize that all terms of the permanent have positive sign, he modified the contemporary notation $|a_{ij}|$ for the determinant of a matrix into a plus-marked variant for the permanent. Perhaps we should be glad that this notation did not become permanent.
Today Ken and I wish to highlight some interesting results on computing the permanent modulo some integer value.
Recall the permanent of an $n \times n$ matrix $A = (a_{ij})$ is the function defined as follows by summing over all permutations $\sigma$ of $\{1,\dots,n\}$, that is, over all members of the symmetric group $S_n$:
$$\mathrm{per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i,\sigma(i)}.$$
It looks simpler than the determinant formula,
$$\det(A) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)},$$
but soon acquired a reputation as being ‘strangely unfriendly’ compared to the determinant. We owe to Les Valiant the brilliant explanation that computing the permanent exactly, even restricted to matrices with all entries 0 or 1, is likely to be very hard, whereas the determinant is easy to compute.
Muir is known for rigorizing a lemma by Arthur Cayley involving another matrix polynomial that at first looks hard to compute but turns out to be easy. The Pfaffian of a $2n \times 2n$ matrix $A$ is defined by
$$\mathrm{pf}(A) = \frac{1}{2^n\, n!} \sum_{\sigma \in S_{2n}} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} a_{\sigma(2i-1),\sigma(2i)}.$$
This vanishes unless $A$ is skew-symmetric, meaning $A^T = -A$, whereupon Muir, following Cayley, proved the relation
$$\det(A) = \mathrm{pf}(A)^2.$$
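As a sanity check on the relation, here is a brute-force sketch (exponential time by design, with helper names of our own choosing) that computes the Pfaffian straight from the permutation sum and compares its square with the determinant on a random skew-symmetric matrix:

```python
import itertools
import math
import random
from fractions import Fraction

def perm_sign(p):
    """Sign of a permutation (as a tuple) via cycle parity."""
    sgn, seen = 1, [False] * len(p)
    for i in range(len(p)):
        if not seen[i]:
            j, clen = i, 0
            while not seen[j]:
                seen[j] = True
                j = p[j]
                clen += 1
            if clen % 2 == 0:
                sgn = -sgn
    return sgn

def pfaffian(A):
    """Pfaffian of a 2n x 2n matrix by the full sum over S_{2n}."""
    m = len(A)
    n = m // 2
    total = sum(perm_sign(p) * math.prod(A[p[2*i]][p[2*i+1]] for i in range(n))
                for p in itertools.permutations(range(m)))
    return Fraction(total, 2**n * math.factorial(n))

def determinant(A):
    m = len(A)
    return sum(perm_sign(p) * math.prod(A[i][p[i]] for i in range(m))
               for p in itertools.permutations(range(m)))

# A random 4x4 skew-symmetric integer matrix
random.seed(1)
u = [[0] * 4 for _ in range(4)]
for i in range(4):
    for j in range(i + 1, 4):
        u[i][j] = random.randint(-5, 5)
        u[j][i] = -u[i][j]
```

For the $4 \times 4$ case the sum collapses to the familiar closed form $a_{12}a_{34} - a_{13}a_{24} + a_{14}a_{23}$, which the code reproduces.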
The Pfaffian, and its use in the FKT algorithm for counting matchings in planar graphs, figures large in Valiant’s 2006 discovery that some generally-hard computations become easy modulo certain primes. All this is background to our query about which matrices might have easy permanent computations modulo which primes.
Herbert Ryser found an inclusion-exclusion method to compute the permanent in $O(2^n n)$ arithmetic operations (using Gray-code ordering of the subsets):
$$\mathrm{per}(A) = (-1)^n \sum_{S \subseteq \{1,\dots,n\}} (-1)^{|S|} \prod_{i=1}^{n} \sum_{j \in S} a_{ij}.$$
This was found in 1963 and still stands as the best exact method. Note that it is exponential but is still better than the naive method, which would sum $n!$ terms. David Glynn recently found a different formula giving the same order of performance.
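Ryser’s formula is easy to code directly. A minimal sketch (naive subset enumeration, without the Gray-code optimization) checks it against the defining sum over permutations:

```python
from itertools import permutations, combinations
from math import prod

def permanent_naive(A):
    """The definition: sum over all n! permutations."""
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def permanent_ryser(A):
    """Ryser's inclusion-exclusion:
    per(A) = (-1)^n * sum_S (-1)^|S| * prod_i (sum_{j in S} a_ij)."""
    n = len(A)
    total = 0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            total += (-1)**r * prod(sum(A[i][j] for j in S) for i in range(n))
    return (-1)**n * total

# The adjacency matrix of a 6-cycle's bipartite half: permanent = 2
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
```

Both routines agree, but Ryser sums $2^n$ terms instead of $n!$.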
Mark Jerrum, Alistair Sinclair, and Eric Vigoda found an approximation method for non-negative matrices that runs in probabilistic polynomial time. This award-winning result is based on delicate analysis of certain random walks. It fails for a matrix with even one negative term, since they show that such matrices can have permanents that are NP-hard to approximate.
Modulo 2, of course, the determinant and permanent of integer matrices are the same. It seems to be less well known that the permanent is easy modulo any power of 2. Modulo $2^k$, the known time is $O(n^{4k-3})$, and this too was proved in Valiant’s famous paper. However, subsequent to that paper, computing the permanent modulo any other integer was shown to be NP-hard under randomized reductions.
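The mod-2 coincidence is immediate from the two formulas above, since $\mathrm{sgn}(\sigma) \equiv 1 \pmod 2$; a quick check on random 0-1 matrices:

```python
from itertools import permutations
from math import prod
import random

def perm_sign(p):
    """Sign of a permutation via cycle parity."""
    sgn, seen = 1, [False] * len(p)
    for i in range(len(p)):
        if not seen[i]:
            j, clen = i, 0
            while not seen[j]:
                seen[j] = True
                j = p[j]
                clen += 1
            if clen % 2 == 0:
                sgn = -sgn
    return sgn

def per_and_det(A):
    """Compute permanent and determinant in one sweep over permutations."""
    n = len(A)
    per = det = 0
    for p in permutations(range(n)):
        t = prod(A[i][p[i]] for i in range(n))
        per += t
        det += perm_sign(p) * t
    return per, det

random.seed(0)
for _ in range(20):
    A = [[random.randint(0, 1) for _ in range(4)] for _ in range(4)]
    per, det = per_and_det(A)
    assert (per - det) % 2 == 0   # per ≡ det (mod 2), always
```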
But wait. There are some special cases modulo $3$ that we would like to point out that actually are easy to compute—that take only polynomial time.
Grigory Kogan wrote a paper in FOCS 1996 that addresses this issue. It is titled “Computing Permanents over Fields of Characteristic 3: Where and Why It Becomes Difficult.” His main positive result was the following:
Theorem 1 Let $\mathbb{F}$ be a field of characteristic $3$. Let $A$ be an $n \times n$ matrix over $\mathbb{F}$ such that $AA^T = I$. Then $\mathrm{per}(A)$ can be computed in polynomial time.
Further, he gave a slightly worse polynomial time for computing $\mathrm{per}(A)$ when $AA^T - I$ is a matrix of rank one. When $AA^T - I$ has rank two, however, computing the permanent mod 3 remains randomly hard, indeed complete for $\mathsf{Mod}_3\mathsf{P}$ under randomized reductions.
The details in the full version involve using mod 3 to regard certain matrices as skew-symmetric and hence work with their Pfaffians. The proof also uses extension fields in which $\sqrt{-1}$ exists, and the theorem holds over any such field.
We wonder what similar tricks might be available modulo other primes. One advantage of working modulo a prime $p$ is that the permanent becomes randomly self-reducible with no loss in numerical accuracy.
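Here is a toy illustration of that self-reducibility. Since each entry of $A + tR$ is linear in $t$, the value $\mathrm{per}(A + tR)$ is a polynomial of degree at most $n$ in $t$; so its values at $n+1$ nonzero points (on random-looking matrices) determine $\mathrm{per}(A)$ exactly by interpolation at $t = 0$. The modulus and matrix sizes below are illustrative choices:

```python
from itertools import permutations
from math import prod
import random

P = 10007  # an illustrative prime modulus

def permanent_mod(A, p):
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n))) % p

def self_reduce(A, p):
    """Recover per(A) mod p from permanents of the matrices A + t*R,
    t = 1..n+1, by Lagrange interpolation of the degree-n polynomial
    q(t) = per(A + tR) at t = 0."""
    n = len(A)
    R = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
    ts = list(range(1, n + 2))                   # n+1 evaluation points
    ys = [permanent_mod([[(A[i][j] + t * R[i][j]) % p for j in range(n)]
                         for i in range(n)], p)
          for t in ts]
    total = 0
    for k, tk in enumerate(ts):
        num = den = 1
        for j, tj in enumerate(ts):
            if j != k:
                num = num * (0 - tj) % p
                den = den * (tk - tj) % p
        total = (total + ys[k] * num * pow(den, -1, p)) % p
    return total

random.seed(0)
A = [[random.randrange(P) for _ in range(3)] for _ in range(3)]
```

Notice that the reduction only queries permanents of matrices that look uniformly random, which is the heart of worst-case-to-average-case equivalence mod $p$.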
Let’s look at using this idea to answer questions about the permanent of the famous Hadamard matrices. As we have remarked before, the original ones of order $n = 2^k$ were previously defined by Joseph Sylvester:
$$H_1 = \begin{bmatrix} 1 \end{bmatrix}, \qquad H_{2n} = \begin{bmatrix} H_n & H_n \\ H_n & -H_n \end{bmatrix}.$$
Jacques Hadamard gave the functional equation
$$H H^T = nI$$
for any $n \times n$ matrix bearing his name,
which for $n > 2$ is known to require $n$ to be a multiple of $4$. Put $n = 2^a m$ with $m$ odd and $a \geq 2$. Whether such matrices exist is known when $m = 1$, when $n - 1$ is a prime power, or when $n = 2(q+1)$ and $q$ is a prime power. The case $n = 668$ avoids these since $667 = 23 \cdot 29$ and $333 = 9 \cdot 37$, and remains unknown.
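A quick sketch constructs Sylvester’s matrices and verifies Hadamard’s equation $HH^T = nI$:

```python
def sylvester(k):
    """Sylvester's construction: H_1 = [[1]], H_{2n} = [[H, H], [H, -H]]."""
    H = [[1]]
    for _ in range(k):
        H = ([row + row for row in H] +
             [row + [-x for x in row] for row in H])
    return H

def times_transpose(H):
    """Compute H * H^T for a square matrix given as lists of rows."""
    n = len(H)
    return [[sum(H[i][k] * H[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]

H = sylvester(3)          # order n = 8
G = times_transpose(H)
n = len(H)
assert all(G[i][j] == (n if i == j else 0)
           for i in range(n) for j in range(n))
```

The same check works at every doubling, since each step preserves orthogonality of the rows.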
If $n \equiv 1 \pmod{3}$ then $H$ satisfies Kogan’s theorem. If $n \equiv 2 \pmod{3}$ then it appears we can still leverage the proof by working with $\sqrt{-1}\,H$ instead. However, and this is by way of caution, if $n$ is a multiple of 3 then every such $H$ is nilpotent mod 3, and it follows that $\mathrm{per}(H) \equiv 0 \pmod{3}$. Nevertheless, all this means that we can compute the permanent of any Hadamard-type matrix modulo $3$ in polynomial time.
A further open question is whether there exists a Hadamard matrix $H$ with $\mathrm{per}(H) = 0$. This is open even for Sylvester’s matrices. Small orders have been checked and ruled out, so any example is known to need $n$ fairly large. Of course, for the permanent to vanish, it must vanish mod 3. We wonder how far gaining knowledge about behavior modulo 3 and other primes might help with these problems.
Some of these papers treat related questions for other matrices of entries $\pm 1$, perhaps with some $0$’s and/or a normalizing constant factor. Greater reason for interest in questions about permanents has come recently from boson sampling, for which we also recommend these online notes scribed by Ernesto Galvão of lectures given by Scott Aaronson in Rio de Janeiro a year ago. A main issue is whether the randomized equivalence of worst and average case for permanents mod $p$ can be carried over to the kinds of real-or-complex matrices that arise.
Can we do better? Can we compute the permanent for a larger class of orthogonal matrices?
What to do when afraid to see if what you want is true
Cropped from Canadian Bergler Society source
Edmund Bergler coined the term in 1947. The great writers Francis Fitzgerald (F. Scott to most) and Joseph Conrad, among many others, suffered from it, as did the great cartoonist Charles Schulz. The problem is writer’s block.
Today Ken and I want to write about something that I wonder if any of you have ever had.
I will call it prover’s block. It is related to, but different from, writer’s block. Of course writer’s block is the condition that makes one unable to write, unable to create new sentences, unable to produce. It is the fear of the blank sheet of paper, which today is more likely the fear of that blank laptop screen in front of you.
There are many suggestions on how to overcome writer’s block. One I like is from the poet William Stafford who offered this advice to poets:
There is no such thing as writer’s block for writers whose standards are low enough.
The point is not to write garbage. The point is to write something: get started and be prepared to throw away lots, but write. Start getting your ideas down and trust that later, with much re-writing and edits, the writing will be okay. Of all the advice I find this one very useful. I certainly use it for GLL. I hope we do enough re-writes and edits so that most of what gets out is not garbage.
So what is prover’s block? Let me explain in a personal way, since I am just about over a bad case. I actually hope that writing this piece will help me overcome my block.
I have been working for a long time—let’s not say how long right now—to prove a certain Lemma X. I have thought at least a hundred times I had found a proof of X, but alas each time I started to work out the details the proof failed. After a while I began to doubt that X was true, but I really want X to be true. If it is true I will have proved something quite nice. No not that—not a “breakthrough”—but something that is still quite important.
A few weeks ago I looked at the statement of X from a new angle. How I missed this angle before, who knows; somehow I did miss it. A quick rough check showed that this new approach should yield a proof of X. So I ran right off to the computer to write up the LaTeX version of the full details of the proof. Right.
No. I did nothing. I am afraid. I want this new approach to work so very much. I think it will. But the fear is that, as with all the previous ones, this approach will collapse when I start hashing out all the details. This is prover’s block. I am stuck right here.
I have a great new approach to X, but am afraid to work out all the details. Perhaps this is one of the advantages of working with co-authors. On this one, however, I am alone.
Sometimes I, Ken writing now, find it helps even just to define a few new LaTeX macros in a document header to get rolling. A similar idea definitely works for “programmer’s block”: define a few routines to make the problem smaller.
The “T³” typesetting program, which I introduced to Oxford in 1985 and which is still going strong today as “Scientific Word,” had the philosophy that nothing is ever started from scratch. There was no “New Document” menu item; every document had to begin as a modification of another document. I still do that with many LaTeX documents, including solo posts for this blog.
Personality-wise I work better in the mode of modifying and extending over creating ex nihilo. It may not be simplistic to ascribe this trait to the ‘P’ versus ‘J’ component of the Myers-Briggs typology. The ‘P’ stands for “perceiving” but may as well stand for “perfectionistic” or “procrastinating,” whereas those with high ‘J’ (for “judging”) may align with those able to generate content quickly from scratch with less concern over errors or polish.
Specifically with regard to proofs, one thing I’ve noticed comes in trying to prove a “simple” lemma on the fly while typing. Often the details mill around and cause backtracking, to the extent that I’m not even sure the lemma is true anymore. I still find I need to sit with a notebook or sheets of paper to nail it down.
Is Lemma X proved by this new method? I, Dick, am about to find out. This has energized me to delve into seeing if it works or not. The worst that can happen is that I will still have a new angle on X and potentially new ideas will emerge. The best that can happen is that I will finally prove X.
I will let you know. Thanks for listening.
Error correction for chatting
Bernhard Haeupler is an Assistant Professor in Computer Science at CMU. He previously spent a year as a postdoc at Microsoft Research, Silicon Valley, and a year visiting Bob Tarjan at Princeton. He is currently hard at work on the Program Committee of STOC 2015.
Today we wish to talk about the problem of error correction for chatting.
Often in communication between two or more parties errors can occur owing to the nature of the communication link, and these errors can corrupt the meaning of the messages. Error detection and correction is a well-studied, immensely important area, started formally by Claude Shannon in 1948 and still extremely active today.
At the recent FOCS 2014 meeting held in November, Haeupler presented work on the problem of adding error correction efficiently to an interactive channel. Thus there are still interesting basic questions in error correction.
The situation is simple: Imagine that Alice wants to chat with Bob, her constant companion. Alice sends a message to Bob; he thinks a while, then sends a message back to Alice. She thinks some and replies with her next message, and so on. This continues for some time, and of course they both want the chat to be accurate. Since the channel they are using may add errors, Alice and Bob want to be sure that no errors are propagated:
Alice: Hi Bob let’s go to lunch. Are u free?
Bob: Sure. Do u know Thai place on New St.?
Alice: I ate there once. Good.
Bob: Ok. How about 12:30?
Alice: See u then.
Of course if a small error occurred and the message “Ok. How about 12:30?” became “Ok. How about 12:00?” then Bob would be waiting 30 minutes for Alice. This requires only a two-bit error. A larger error could make the third message say, “I hate their food.”
The standard answer is that they should use some type of error correction to protect against that. But how to do that in the most efficient manner is the subject of Haeupler’s paper, “Interactive Channel Capacity Revisited.”
There seems to be a simple answer: just use error correction on each message and you are done. Suppose the message takes up $b$ bits in the absence of errors. If you need to transmit $b' \geq b$ bits to cope with the errors, then your overhead factor is $L = b'/b$ and your rate of meaningful bits is $R = b/b'$. As a function of an error parameter $\epsilon$ on the channel, the best achievable $R$ is also called the capacity $C(\epsilon)$, but the bounds to come will be easier to understand if we think of $T = 1 - R$ as the “tax” imposed by the channel. Thus to get constant overhead we need to bound $T$ away from $1$.
Shannon showed that for the binary symmetric channel, in which $\epsilon$ is the independent probability of flipping a transmitted 0 to 1 or 1 to 0, one can get the tax down to the binary entropy
$$H(\epsilon) = \epsilon \log_2\frac{1}{\epsilon} + (1-\epsilon)\log_2\frac{1}{1-\epsilon}$$
asymptotically as the message length grows, with tiny chance of overall failure. Efficient coding schemes were found to approach this, and were later extended to most cases where up to an $\epsilon$ fraction of errors can be made in an adversarial manner, provided the adversary observes only what is transmitted on the channel.
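For concreteness, a few values of the tax $H(\epsilon)$ show how it sinks toward zero as the channel gets cleaner:

```python
from math import log2

def H(eps):
    """Binary entropy: the asymptotic 'tax' on a binary symmetric channel."""
    if eps in (0.0, 1.0):
        return 0.0
    return eps * log2(1 / eps) + (1 - eps) * log2(1 / (1 - eps))

for eps in (0.11, 0.05, 0.01, 0.001):
    print(f"eps={eps}: tax H(eps)={H(eps):.4f}, rate={1 - H(eps):.4f}")
```

Around $\epsilon = 0.11$ the tax is close to $1/2$, that is, half the transmitted bits go to redundancy.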
The issue, however, is that when there are multiple messages, the $\epsilon$ error rate applies to the whole of them. This situation may seem artificial (hey, it’s theory) but is interesting. The considerations might not apply in force to human chats, but could become significant in chats between agents.
Let’s say Alice and Bob have some conversation that can be done in the error-free case in $N$ symbols. Here this means $N$ is the classical communication complexity of their conversation, but let’s suppose $N$ is just the total length. To simplify further let’s suppose that in the error-free case they would alternate in $r$ rounds, with each message being $N/r$ bits. How can they cope if $N$ and $r$ are known to the channel adversary in advance, and if the adversary can read their transmissions before deciding whether to strike? Quoting the paper:
They want a coding scheme which adds redundancy and transforms any such conversation into a slightly longer $N'$-symbol conversation from which both parties can recover the original conversation outcome even when any $\epsilon$ fraction of the coded conversation is corrupted.
The reason this is harder than the usual error model is as follows: suppose Alice and Bob use a standard error-correction method. The adversary could completely destroy any one message and still not affect more than an $\epsilon$ fraction of the bits exchanged, so long as each message takes up at most an $\epsilon$ fraction of the total, which means using at least $1/\epsilon$ rounds. It is this type of error burst that makes the problem harder than the usual error model. There is also the question of whether Alice and Bob’s revised chat of $N'$ bits must stick to a similar fixed order with messages of some fixed length.
It is not at first clear that there is any solution. What if the first message is completely wiped out? Suppose the error wasn’t Alice saying she hated the food, but rather starting with: “I hate u Bob”? The key idea is that after some subsequent incoherence they could take a deep breath, reset, figure the channel adversary had shot most of his bolt of error, and settle down assured of a relatively error-free retake provided they keep $\epsilon$ small. But how can they communicate so as to gauge how much error, without taking up much more bandwidth?
Leonard Schulman, in his 1993 paper, “Deterministic Coding for Interactive Communication,” gave a clever solution that bounded the tax away from 1, that is, $T \leq 1 - c$ for a constant $c > 0$, provided either $\epsilon \leq 1/240$ for adversarial errors, or $\epsilon < 1/2$ with independent random errors. But even with the tiny $\epsilon$ his code was proved to exist nonconstructively rather than computed. Although his rate was linear, it was impractical and nowhere close to the Shannon bound.
At FOCS 2012, Zvika Brakerski and Yael Kalai (not related to our friend Gil Kalai) made Schulman’s bound constructive and reasonable for adversarial errors. At FOCS 2013, Gillat Kol and Ran Raz upper-bounded the tax by
$$O\!\left(\sqrt{\epsilon \log(1/\epsilon)}\right)$$
for random errors, provided Alice and Bob’s pre-error chat alternates with a bounded number of bits per message. Most significantly, they gave an asymptotically matching lower bound on the tax of
$$\Omega\!\left(\sqrt{\epsilon \log(1/\epsilon)}\right)$$
when $\epsilon$ is small, even if the errors are independent, provided the encoded chat similarly follows fixed alternation with bounded message length. For small enough $\epsilon$, this showed that Shannon’s tax $H(\epsilon)$ was concretely unachievable, thus separating the interactive and non-interactive cases.
Note, however, that the message length per round expands as $\epsilon$ gets smaller. Though this might seem an artificial tradeoff, it carries an important interpretation:
What empowers the adversary is not so much the ability to wipe out any one message, as the ability to strike when one of Alice or Bob is scheduled to speak for an extended time but the other has important timely knowledge.
The most flexible way for Alice and Bob to chat is to alternate bits, and just ignore one person’s bits when the other holds the floor. This policy could immediately double the overhead factor. However, we can use it to circumvent the above restriction on the period. Can this make a difference?
Haeupler’s new theorem improves the upper bound on the tax in a way tabbed as “surprising” because it shatters the optical impression of Kol and Raz’s matching bounds and holds for adversarial error. His main result is:
Theorem 1 For small enough $\epsilon > 0$, Alice and Bob can simulate any protocol of classical communication complexity $N$ by a protocol alternating bits and tolerating an $\epsilon$ fraction of adversarial errors with tax
$$O\!\left(\sqrt{\epsilon \log\log(1/\epsilon)}\right).$$
If the errors are random then the tax is $O(\epsilon \log\log(1/\epsilon))$.
Although square roots of numbers smaller than 1 can be tricky, to see that this is an improvement, note that Kol and Raz’s bound is
$$\sqrt{\epsilon \log(1/\epsilon)},$$
whereas Haeupler’s is
$$\sqrt{\epsilon \log\log(1/\epsilon)}.$$
And it is just $O(\epsilon \log\log(1/\epsilon))$ when each interaction is on the binary symmetric channel. Despite having just broken through someone else’s matching bounds, Haeupler goes on to boldly conjecture that his bounds are optimal in their respective settings. This would imply that a higher tax for interaction than the Shannon bound cannot be avoided.
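Taking the bounds in the forms displayed above, a few numerical values show the ordering of the taxes for small $\epsilon$; the function names here are our own labels, not notation from the papers:

```python
from math import log, sqrt

def tax_kol_raz(eps):
    return sqrt(eps * log(1 / eps))

def tax_haeupler_adversarial(eps):
    return sqrt(eps * log(log(1 / eps)))

def tax_haeupler_random(eps):
    return eps * log(log(1 / eps))

for eps in (1e-2, 1e-4, 1e-6):
    print(eps,
          tax_kol_raz(eps),
          tax_haeupler_adversarial(eps),
          tax_haeupler_random(eps))
```

The gap between $\log$ and $\log\log$ inside the square root is modest numerically, but the random-error bound without the square root is far smaller.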
Further reason to consult his paper for details is that his proof method is quite pretty. Both Alice and Bob chat as if there were no errors. However, they periodically pause to exchange short random hash values (fingerprints) of their transcripts so far; whenever the fingerprints disagree, they backtrack to the last point of agreement and resume from there.
The key is that although an adversary can destroy completely some earlier messages, the periodic checks will reveal that and all will be saved. The shortness of the checks is crucial to the savings.
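A toy illustration of why short fingerprints suffice: a corrupted transcript escapes detection by a $b$-bit hash with probability only about $2^{-b}$ per check. The helper below is an illustrative stand-in, not Haeupler’s actual hash family:

```python
import hashlib
import random

def fingerprint(transcript, salt, bits=16):
    """A short hash of the whole transcript so far. The salt models shared
    randomness that keeps an adversary from engineering collisions."""
    data = (salt + "|" + "|".join(transcript)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % (1 << bits)

random.seed(42)
alice = ["Hi Bob let's go to lunch. Are u free?",
         "I ate there once. Good.",
         "See u then."]
bob_ok = list(alice)                        # transcripts agree
bob_bad = list(alice)
bob_bad[1] = "I hate their food."           # one corrupted message

salt = str(random.getrandbits(64))
```

Matching transcripts always produce matching fingerprints; a corrupted one is caught on almost every check, after which the parties rewind.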
Can his bounds be improved, or are his optimality conjectures correct? Is this a useful model?
Announcing publication of our textbook with MIT Press
By permission of Nataly Meerson, artist (source)
Richard Feynman had a knack for new ways of seeing. His Feynman diagrams not only enabled visualizing subatomic processes, they also rigorously encapsulated an alternative formalism that cross-validated the equations and procedures of quantum field theory. His 1948 path-integral formulation sprang out of work by Paul Dirac that re-interpreted a continuous Lagrangian operator as a matrix multiplication. Fast forward to his 1985 article “Quantum Mechanical Computers” (a followup to his 1981/82 keynote speech “Simulating Physics With Computers”) and there are only matrices and circuit diagrams to be seen.
Today, December 5 as Dick and I write, is the US publication day of our textbook with MIT Press, titled Quantum Algorithms Via Linear Algebra: A Primer. It is also available from Amazon. Both places offer it for less than two adult IMAX tickets to see “Interstellar.” Publication abroad is on 1/1/15.
Quantum computing has captured the imagination of scientists and entrepreneurs from all walks of research and business. Whether any computers that operate in the quantum regime exist in the world today, however, remains a puzzle. Hence what has really been driving the surge are quantum algorithms, which by our expectant understanding of Nature promise to accomplish tasks beyond the feasibility of our abundant classical computers. The algorithms have stunning beauty yet can be taught with minimal prior involvement of either ‘quantum’ or ‘computing’ as they are made of matrices. Our text builds on elementary linear algebra and discrete mathematics to tell their story at an undergraduate level.
We first intended to make it a short story, growing out of a pair of posts by Dick four years ago. With a few shortcuts on arguing the feasibility of certain quantum states we could have dispensed with quantum circuits and held to a “Brief” format under 100 pages. Desire for completeness and the visual appeal of circuits led us to enlarge the fundamentals. Then we realized we could support some advanced topics, including what we believe is the first coverage in any general text of quantum walks and quantum walk search algorithms. Interaction with the quantum group at IBM Thomas Watson Labs, including Charles Bennett whose inspiration shows on the first page of Feynman’s 1985 paper, led me to include an expanded treatment of quantum gates, framed in the exercises of five chapters to minimize interference with the main flow. We still kept it under 200 pages.
Here is the table of contents, including page numbers and a few section titles:
Our idea of a 10-to-12-week undergraduate course runs up through section 13.4, possibly including chapter 14. A longer or advanced course or graduate seminar may include some of the later advanced topics.
The last main chapter 16 is notable for what we didn’t do in the earlier chapters: talk about complexity classes and the theory of quantum circuits. No complexity class names appear before that chapter. We limit “machine” models to an informal presence in chapter 4, and we describe “polynomial time” as meaning that whenever the problem size doubles, the time can increase by a constant factor $c$ that might be greater than 2. Hence there is no prescribed dependence on computer theory, beyond Boolean logic networks as often included in a discrete mathematics course.
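That definition is easy to see for a concrete polynomial running time: if $T(n) = 5n^3$, doubling $n$ multiplies the time by the constant $c = 2^3 = 8$, no matter how large $n$ is.

```python
def T(n):
    """A stand-in polynomial running time: 5 * n^3."""
    return 5 * n**3

# Doubling the input size always multiplies the time by the same constant.
ratios = [T(2 * n) / T(n) for n in (10, 100, 1000)]
print(ratios)  # [8.0, 8.0, 8.0]
```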
Nor is any physics required—even the sum-over-paths idea is introduced by showing how matrix multiplication counts paths in graphs. Then it is visualized via “maze diagrams” introduced in chapter 7, whose title plays on how the subsequent algorithms are named after people and also plays on Feynman’s middle name. (There are no Feynman diagrams.)
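The path-counting fact is a one-liner to check: entry $(i,j)$ of the $k$-th power of a graph’s adjacency matrix counts the length-$k$ walks from $i$ to $j$.

```python
def mat_mul(A, B):
    """Plain matrix multiplication over the integers."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m))
             for j in range(p)] for i in range(n)]

# Directed graph on vertices 0, 1, 2 with edges 0->1, 0->2, 1->2, 2->0
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
A2 = mat_mul(A, A)
A3 = mat_mul(A2, A)
# A2[0][2] = 1: the single length-2 walk 0->1->2.
# A3[0][0] = 1: the single length-3 walk 0->1->2->0.
print(A2[0][2], A3[0][0])
```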
We are both chess fans, and we close chapter 15 with the result that quantum computers can speed up evaluating formulas and playing chess. My favorite childhood chess book was An Invitation to Chess: A Picture Guide to the Royal Game by Irving Chernev and Kenneth Harkness. It assumes nothing and begins with how the pieces move, but unlike any other chess guide I know, it progresses smoothly and with pictures up through some fairly advanced strategy. It ends with a chess endgame composition by Leonid Kubbel as an ode to beauty, which inspired me to compose endgames of my own. We hope that our book will provide the same smoothness and encouragement.
One thing important to us is that the book should look and feel like a linear algebra text. This entailed keeping to an ordinary column-vector (or transposed row vector) representation of quantum pure states, and avoiding the customary physics notation of Paul Dirac. We followed recent ISO/IEC standards of bold lowercase italic for vectors and bold uppercase italic for matrices, in heavier, less-serifed fonts. We did include some examples of Dirac notation that especially show its advantages, so as not to obstruct its usage when desired.
We skirted famous philosophical issues of quantum mechanics, but instead tried to promote the issue of scale between natural processes and the notation. I knew Oxford physicist James Binney as a Fellow of Merton College in the 1980s, and I’m delighted to find a similar emphasis in his recent textbook with David Skinner used for undergraduate physics at Oxford. They begin their section 6.2 on “Quantum computing” with the famous old story of the creation of the game of chess, whose agreed royal reward was one grain of rice for the first square, two grains for the second square, four grains for the third square, and (unwittingly to the king) doubling to a mammoth total of $2^{64} - 1$ grains after the last square. They continue (their emphasis):
What is the relevance of this old story for quantum mechanics? … By the time we have built a system from 64 two-state systems, our composite system will have $2^{64}$ basis states. …[It is] physically miniscule, [but] to calculate the dynamics of this miniscule system we would have to integrate the equations of motion of $2^{64}$ amplitudes! This is seriously bad news for physics.
The idea behind quantum computing is to turn this disappointment for physics into a boon for mathematics. We may not be able to solve equations of motion, but Nature can evolve the physical system, and appropriate measurements made on the system should enable us to discover what the results of our computations would have been if we had the time to carry them out. If this approach to computation can be made to work in practice, calculations will become possible that could never be completed on a conventional computer.
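The arithmetic behind both halves of the story:

```python
# The chessboard story: 1 + 2 + 4 + ... over 64 squares.
grains = sum(2**i for i in range(64))
print(grains)                 # 18446744073709551615, i.e. 2**64 - 1

# The state of 64 two-state systems: 2**64 complex amplitudes.
dim = 2**64
bytes_needed = dim * 16       # one complex amplitude at ~16 bytes
print(bytes_needed / 2**60, "exbibytes")   # 256.0 exbibytes
```

Storing the state vector explicitly would take about 256 EiB, which is exactly the “seriously bad news” that quantum algorithms try to turn into a boon.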
Our saving grace is that although the linear algebraic objects—that is, the vectors and matrices—are so huge as to make “our computations” with them unscalable, the linear algebraic formulas do scale when put in succinct functional form. The question is how and whether Nature has a way to treat those functions in turn as some kind of object whose form may be ineffable to us. It may be necessarily ineffable if factoring and some other quantum-feasible tasks require exponential time in the classical regime. But how could Nature do it? Feynman famously advised:
Do not keep saying to yourself, if you can possibly avoid it, “But how can it be like that?” because you will get “down the drain,” into a blind alley from which nobody has yet escaped. Nobody knows how it can be like that.
We certainly have no idea. However, we have an idea of what might jar new ideas loose, and accordingly our book promotes the view from combinatorics. That is why we blend numbers and strings early on, why graphs come in chapter 3 (where they also help for reading circuits in the next chapter), why we have a whole chapter on handy “tricks,” and why we include a chapter on the number theory used to make period-finding solve factoring though it has no quantum content. It is why we incorporate the “coin space” of a quantum walk on a graph into a “doubled-up” graph and then phrase the interference analysis in terms of counting heads/tails sub-sequences in the coin-flips. Finally, Chapter 16 includes my quasi-original extension of the proof of upper bounds for $\mathsf{BQP}$ in this paper, whose authors expressly reference Feynman’s sum-over-paths formulation, with lighter theorem statements and proofs than in my post and “cookbook” draft paper on this subject two years ago.
Our final submitted typescript included everything in new LaTeX macros commissioned by MIT Press, even the exact front-matter, and came to 206 pages (192 numbered). Yet the published version, with no other content besides the cover, has 208 pages. The reason is a law of quantization that limits one’s ability to “save trees” by improving page-breaks and line-breaks. Can you explain this quantum principle?
Susan Horwitz was—it is hard to type “was”—a computer scientist who did important work in the area of software engineering. She passed away this summer on June 11th.
Today Ken and I wish to talk about Susan’s work.
Ken and I send belated, but heart-felt, thoughts to her family including her husband Tom Reps. I, Dick, knew her for quite some time and found her always a joy to be around. She was too young to have died. Too young.
One thing I mean is that she started out as an undergraduate music major—not just performing music, but studying worldwide cultural approaches to music. She had visiting appointments and sabbaticals all over Europe. Thomas Ball of Microsoft Research wrote a tribute in August with more on this side of her as well as her professional work. Rather than duplicate what he and others have said of her accomplishments, we will try to catch the larger spirit of problems that engaged her.
One of the great paradoxes of computer theory, in my opinion, is that we are always looking left and right for new applications, while a very important one stands just in front of us. The “new applications” are many and include biology, economics, computer architecture, and materials design to name a few. But the one that is right there, and has been there from the dawn of computing, is software. All digital systems from cloud computing to personal computing really are immense amounts of software. Without software there would be no smartphones, no laptops, no modern airplanes, cars, and soon not even toasters.
Yet the paradox is that theorists tend to find the new areas much more exciting than software. Take a look at the titles from any recent theory conference: there are few, if any, papers on software. I never see “Quantum approaches to software design,” or “Approximation methods for rapid code optimization via Semi-Definite Programming.” Have you?
I think the reason is simple: software engineering is just plain hard. We have Alan Turing to thank, since everything that one might want to do with software hits the Turing Tarpit as Alan Perlis called it. That is, when you consider fundamental questions, the features of programming languages that give them human appeal take a back seat to information properties that might as well be analyzed in lambda calculus or some graph-based computational network model. Questions about software from the original “does it halt?” to “is it correct?” all run up against fundamental computational limits.
The situation with software is even more difficult because it also runs up against limits of humans. The problem with software is that it is written by humans to perform some task, often in a complex environment, and the lack of a clear understanding of what it should do in all situations leads to errors—“bugs.” Even if one attempts to prove that a program is correct, one of the difficult tasks is to understand what correct means. Is correct completely defined? Does it cover the complex environment that the software will operate in? All of this is why software engineering is so hard.
A shift that has happened over the years is the key to the success that Susan had in her work, and more generally, accounts for why the field has been able to make more progress in the last few years than over previous decades. The shift is that one really—all claims aside—does not look for correctness in any program. Rather the shift is to look to see that the program satisfies certain important properties.
For example, a device driver may have a complex job to do in talking to real-time hardware and also to the rest of the software system. Defining its full correctness criterion is probably very difficult indeed. But we all agree that the driver should have a simple property like termination. When the driver is called it should return some value. The correct one would be nice, but any value is better than having the driver “hang” and never terminate. This is an example of the shift:
Replace complex correctness properties with simpler desirable properties.
I am not really sure who should get the main credit for this shift, but Susan and her colleagues certainly were part of this brilliant insight.
Here is a mathematical example of this type of shift. Suppose we have some function $f$ defined on the unit interval $[0,1]$. We may wish to be able to prove that the function is exactly the one that we want, but that may be too hard to do. However, we may easily be able to show that $f$ is continuous or even smooth. These properties are much weaker than what we might desire, yet they are important properties. They may be sufficient for the result that we are trying to prove. This type of paradigm shift is what caused software correctness to be more attackable than ever before. It curiously coincided with the advances made by Susan and her colleagues in their work on program analysis.
Susan was an expert on data flow and pointer analysis, and generally on program analysis. For example, she created with Thomas Reps and Mooly Sagiv the concept of grammar-restricted reachability—now called context-free language reachability. Their POPL 1995 paper “Precise Interprocedural Dataflow Analysis via Graph Reachability,” also known as the RHS algorithm, showed how many hard program-analysis problems can be solved efficiently by transforming them into the problem of graph reachability. Later Thomas Ball, whom we mentioned above, used the RHS algorithm as part of Microsoft’s SLAM checking project. This work had a real impact on the world: over 85 per cent of the crashes in versions of Windows at the time were found and fixed by this project.
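The flavor of CFL-reachability can be conveyed by a tiny worklist solver. This is our own illustrative sketch, not the RHS algorithm itself; the grammar, graph, and function names are invented for the example. Facts are labeled edges (X, u, v), and grammar productions are given in a binary normal form A → B or A → B C.

```python
from collections import deque

def cfl_reach(edges, unary, binary):
    """Close a set of labeled edges under the grammar productions."""
    facts = set(edges)
    work = deque(facts)
    while work:
        X, u, v = work.popleft()
        new = set()
        for A, B in unary:              # A -> X gives (A, u, v)
            if B == X:
                new.add((A, u, v))
        for A, B, C in binary:          # A -> B C joins two adjacent edges
            if B == X:
                new |= {(A, u, w) for (Y, s, w) in facts if Y == C and s == v}
            if C == X:
                new |= {(A, s, v) for (Y, s, t) in facts if Y == B and t == u}
        for fact in new - facts:
            facts.add(fact)
            work.append(fact)
    return facts

# Matched "a...b" nesting: S -> a S b | a b, in binary form via helper T.
binary = [("S", "a", "T"), ("T", "S", "b"), ("S", "a", "b")]
edges = {("a", 0, 1), ("a", 1, 2), ("b", 2, 3), ("b", 3, 4)}
facts = cfl_reach(edges, [], binary)
assert ("S", 1, 3) in facts and ("S", 0, 4) in facts   # balanced paths
assert ("S", 0, 3) not in facts                        # unbalanced path
```

The point of the reduction is visible even here: asking whether one node “reaches” another along a grammar-conforming path answers a dataflow question about the underlying program graph.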
By the way, Ball wrote:
“SLAM originally was an acronym, but we found it too cumbersome to explain. We now prefer to think of ‘slamming’ the bugs in a program.”
There is archaeological evidence that it stood for “Software, Languages, Analysis, and Model-Checking.” This work was summarized in 2002 by Bill Gates as:
[T]hings like even software verification, this has been the Holy Grail of computer science for many decades but now in some very key areas, for example, driver verification we’re building tools that can do actual proof about the software and how it works in order to guarantee the reliability.
Susan did some hardcore theory work too. She proved that certain very simple properties, of the kind we discussed already, were not always possible to determine. Here is a quote from the introduction of her paper, “On the Non-Approximability of ‘Points-to’ Analysis”:
Unfortunately, in this paper, we show that no such approximation algorithms exist, either for flow-sensitive or for flow-insensitive pointer analysis, unless P=NP. In the flow-insensitive case, we show that if there were such an algorithm we could solve the NP-complete Hamiltonian Path problem in polynomial time. In the flow-sensitive case, we prove a stronger result. We relax our notion of accuracy and require that the approximate solution be bounded by a polynomial in the size of the precise solution (instead of by a constant) and show that even such weaker approximate algorithms are infeasible unless P=NP. We prove this by showing that if there were such an algorithm we could solve the NP-complete 3-SAT problem in polynomial time.
Ken has a tiny example that conveys some of the flow-analysis flavor without getting onto the scale of the papers. Consider this code for swapping the two nodes after those referenced by pointers p and q in a circularly linked list—the point was to write it without any if-then tests for small cases or nodes p and q being adjacent or identical:
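The code itself did not survive in our copy of the post, so here is a hedged reconstruction in Python of what such branch-free code can look like: first exchange the links into the two nodes, then the links out of them, with no case analysis. The class and function names are ours.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.next = None

def swap_after(p, q):
    """Swap the nodes following p and q in a circular singly linked list,
    with no tests for adjacent or identical nodes."""
    b, d = p.next, q.next                 # the two nodes to be swapped
    p.next, q.next = d, b                 # swap the links *into* b and d
    b.next, d.next = d.next, b.next       # swap the links *out of* b and d

def make_circular(labels):
    nodes = [Node(c) for c in labels]
    for i, nd in enumerate(nodes):
        nd.next = nodes[(i + 1) % len(nodes)]
    return nodes

nodes = make_circular("ABCDEF")
swap_after(nodes[0], nodes[2])            # p -> A, q -> C: swap B and D
order, cur = [], nodes[0]
for _ in range(6):
    order.append(cur.label)
    cur = cur.next
assert "".join(order) == "ADCBEF"
```

The simultaneous assignments matter: each right-hand side is read in full before either link is written, which is what lets the adjacent and p = q cases fall through without any if-then tests.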
For instance, if p points to node A in ABCDEF and q points to C, then B and D are swapped to give ADCBEF. The code works correctly in all cases, and can be understood as swapping the links into and out of B and D. But can you prove it? Ken conceived it while he was at Cornell as a challenge for the pre/post-condition proof methods championed by David Gries and others. The methods seem unavoidably to incur a bushy case analysis, and Ken doesn’t recall whether he or anyone else completed it by hand.
Nevertheless, Susan and her co-workers programmed real tools for cases of far higher complexity. One of my favorites is presented in her 2003 paper with her student, Suan Hsi Yong, titled “Protecting C programs from attacks via invalid pointer dereferences.” Again, the paradigm shift is that you need not try to make the programs perfectly secure, but can demonstrate the ability to stop many kinds of attacks, and maybe not telegraph the weaknesses that inevitably remain.
The computer science community is poorer without Susan. We will miss her smile, her activities to promote women in computing such as WISE, and her research. Again our thoughts to Susan’s family, to Tom, and to all whose lives were touched by her.
Dexter Kozen has been on the faculty of computer science at Cornell for almost 30 of the department’s 50 years. He first came to Cornell 40 years ago as a graduate student and finished a PhD under Juris Hartmanis in just over 2 years. He was named to the Joseph Newton Pew, Jr., professorship 20 years ago, and celebrated his 60th birthday in 2012. Besides many research achievements he co-authored an award-winning book on Dynamic Logic.
Today we salute the 50th anniversary of Cornell’s department and keep a promise we made 5 years ago to talk about a diagonalization theorem by Dexter. It yields what may be an interesting finite puzzle.
The 50th anniversary symposium earlier this fall was a great occasion; I wish I’d been able to drive over for it. There were two days of talks by Cornell faculty and alumni. It coincided with the department moving into a new building, Gates Hall, named for—who else?—Bill and Melinda Gates as principal donors. Bill Gates came for the dedication and engaged Cornell president David Skorton in a one-hour conversation on how to serve students best.
I have fond memories of my time at Cornell in 1986–7 and 1988–9 split with time at Oxford. I was given a home in the department though my postdoc came from Cornell’s Mathematical Sciences Institute, and I was treated incredibly well. It was a great personal as well as professional time for me. When I last visited two years ago, Gates Hall was just going up across the street, beyond the left-field fence of the baseball field I viewed from my office window in the 1980s. Dick and I wish them the best for the next 50 years.
Dexter is on sabbatical in the Netherlands and Denmark this academic year. I first met him in Denmark, at ICALP 1982 in Aarhus. We took part in a local tradition of performing musical numbers at the conference banquet. He had a band called the “Steamin’ Weenies” when I was at Cornell. He has kept his music up with bands of various names through the “Katatonics,” which performed in a Superstorm Sandy benefit two years ago at Ithaca’s club “The Gates” (not named for—who else?—Bill and Melinda Gates).
It is hard to believe, but summer 1982 is three-fourths of the way back to Steve Cook’s 1971 paper which put the P versus NP question on everyone’s map. Back then we knew it was hard but there was more optimism. I have told the story of sharing a train car with Mike Sipser on the way to that conference and hearing Mike’s confidence in being able to solve the question.
The aspect that captivated me that summer was formal logic issues involved in separating complexity classes. They can be approached in two ways: as sets of languages and as subsets of machines. The former I tried to conceptualize as a topological space. As with the Zariski topology of algebraic sets employed by Alexander Grothendieck, it satisfies only the axiom and so is not a Hausdorff space. The latter view was developed by Dexter’s 1979 paper, “Indexings of Subrecursive Classes.”
Dexter’s paper cut to the chase away from logic or topology by treating the problem as one about programming systems, i.e., recursive enumerations of machines that have nice properties like efficient substitution and composition. He showed a tradeoff between power and niceness of what one can do with them. The niftiest theorem, in section 7 of his paper, shows that if you insist on representing by machines that mark cells on a tape as a polynomial time clock, and insist on a composition operator giving only a constant-factor overhead in program size, then both universal simulation and diagonalization need more than polynomial space. Thus such representations of will not support a proof of different from , let alone . But in the previous section he gave a theorem showing that if you give up all efficient closure properties, then you can in principle achieve any diagonalization you like.
The only property of the class C that is used by the theorem is that it is closed under finite differences: if L′ is different from a language L in C by a finite set, meaning that the symmetric difference L △ L′ is finite, then L′ is in C. And of course it uses that C has an effective programming system, which is what distinguished bounded closed sets in my topological view.
Theorem 1 Let L_0, L_1, L_2, … be any enumeration of C by machines, and let D be any language outside of C. Then we can define a function π such that
- L_{π(0)}, L_{π(1)}, L_{π(2)}, … is also an enumeration of C, and
- D = K_π, where we define K_π = { n : n ∉ L_{π(n)} }.
Moreover, any reasonable formal system that can prove D ∉ C can prove statements 1 and 2. Indeed, we can make π a permutation, so that the new indexing involves the same machines as the original, only scrambled. In the case where C is P and D is SAT, this yields his prose conclusion:
If P ≠ NP is provable at all, then it is provable by diagonalization.
As he was quick to add, this doesn’t say that the mere fact of P ≠ NP enables diagonalization, only that working by diagonalization does not impose any additional burden on a reasonable formal system.
Proof: We define π in stages. At each stage n we have “marked” the values π(0), …, π(n−1). At stage n, take j to be the least unmarked value such that
n ∈ D ⟺ n ∉ L_j,
and define π(n) = j. This also marks j. A suitable j always exists since infinitely many languages in C include n, which handles the case n ∉ D, and infinitely many exclude n, so there’s always a j when n ∈ D too. Also clearly π is one-to-one since we mark each value. To see that π is onto, let j be the least unmarked value at any stage. Since D and L_j differ by an infinite set, there will always be a later n in the symmetric difference D △ L_j, and the algorithm will set π(n) = j for the least such n.
The proof works even if D is undecidable—we just get that π is uncomputable—and there are uncountably many π’s to go with uncountably many D’s. When D is computable the search for j in the algorithm is computable, and if the indexing includes many copies of machines accepting ∅ or Σ* then we can get π computable in polynomial time or lower.
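The staged construction can be animated on a toy example. Here the enumeration, the target language D, and the number of stages are all our own choices for the demo: L_j contains n exactly when bit n of j is 1 (so every n is in, and out of, infinitely many L_j), and D is the set of perfect squares.

```python
def L(j, n):                 # the j-th language as a membership predicate
    return (j >> n) & 1 == 1

def in_D(n):                 # target language for the demo: perfect squares
    return int(n ** 0.5) ** 2 == n

def build_pi(stages):
    """Stage n: map n to the least unmarked j with  n in D <=> n not in L_j."""
    pi, marked = {}, set()
    for n in range(stages):
        j = 0
        while j in marked or in_D(n) != (not L(j, n)):
            j += 1
        pi[n] = j
        marked.add(j)
    return pi

pi = build_pi(12)
# D agrees with the diagonal of the re-indexed enumeration, so far
assert all(in_D(n) == (not L(pi[n], n)) for n in pi)
assert len(set(pi.values())) == len(pi)      # the map is one-to-one
```

This only exhibits the injection; making π onto, so that the re-indexing still enumerates the whole class, is the extra bookkeeping in the actual proof.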
The inverse of π, that is, the mapping j ↦ π^{-1}(j), however, may have explosive growth for infinitely many j, depending on how sparse the symmetric difference of D with some languages in C is. Still, if the statement D ∉ C is provable in a given formal system, then the inverse is provably total, so the diagonalization can be verified. Another way of putting this is that the formal system is able to verify that every language in C—that is, every L_j—is eventually enumerated, so the formal system can tell that the class being diagonalized against is all of C.
The ease of the proof and its relation to oracle results contributed to controversy over how to interpret it. I’ve mostly felt that the logical “horse” was being put behind or at best alongside the “cart” of combinatorial proof methods. Recently I’ve appreciated how the proof focuses the question on how large your idea of “P” is rather than how hard SAT is, which was a key issue in the attempted proof by Vinay Deolalikar. The oracle result showing P^A = NP^A for some computable languages A (any PSPACE-complete language will do) was supposed to argue that diagonalization against the class NP had no special power as a tool for separating P from NP.
The proof relativizes by taking to be an ensemble of polynomial-time oracle Turing machines giving for each oracle set , and by taking a fixed oracle TM such that is -complete for each . We get a fixed oracle transducer such that executes the algorithm for . Whenever it computes a total function whose diagonal is . If you think of “” as “ relativized to ” then this seems like a recipe for contradiction if the unrelativized is total and . But what happens when is just that computes an undefined function—indeed it doesn’t halt. To make an algebraic analogy, should be parsed as not , so “the diagram does not commute” and there is no problem.
I came back to Dexter’s theorem this fall term while re-thinking how best to introduce diagonalization in an undergrad or grad theory course. The classical way is
D = { i : i ∉ L(M_i) },
where i is the “Gödel number” of the machine M_i. My own usual style is to identify a machine or program with its code as a string, thus writing
D = { M : M ∉ L(M) }.
This removes any need to talk about the correspondence between strings and numbers, Gödel or any kind of numbers. However, styling this via programming raises questions such as whether “M” means the source or compiled code, and does it matter how they differ?
Recently I’ve preferred to think of as an object, indeed analogizing components to fields of a class. Then “” re-appears as an encoding of the object by a (number or) string. We define:
Now it doesn’t matter if we say is the source code or the compiled executable, and it can be anything—we can downplay the “self-reference” aspect if we wish. The encoding function need not be 1-to-1, provided it is “extensional” in the sense that
We still get the diagonal contradiction: if , then contradicting . And , which implies that there is some such that , but by extensionality we have and again that is a contradiction. (Well, you don’t have to tell students this more-complicated proof—just say that is 1-to-1 and it’s just as quick as the usual diagonal proof.)
Returning to Dexter’s diagonal set , we see it is the same as with . Looking at by itself, much as we can relax being 1-to-1, we can relax being onto, so long as is “functionally onto” the languages, in keeping with statement 1 of Theorem 1. Indeed, when the programming system repeats each language infinitely often, we could redo the proof to make an increasing function.
We can abstract this by considering any function from a set into its power set and functions on . It is interesting to try to formalize exactly when , that is,
with being extensional with respect to and being onto the image of . One interesting requirement is that when is not onto , must cover , that is:
Again I wonder if there is a useful analogy in abstract algebra or category theory for this situation. Given the function and , we can define for some choice of such that . For other we can put for any such that . Given the function , we can define for any by taking such that , since is functionally onto , and define .
The abstraction nicely gives a concrete puzzle when is a finite set. Then we don’t have Dexter’s property of being closed under finite differences, but we can still ask:
Is every unindexed subset a diagonal? That is, for all A ⊆ S with A not in the range of f, does there exist g such that A = { x : x ∉ f(g(x)) }?
Equivalently, does there exist such that ? If is 1-to-1, then must be onto, which means must be a permutation since is finite, and likewise must be its inverse. There are interesting cases where is not 1-to-1, in which the extra latitude for and becomes important.
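Here is a brute-force check of the puzzle for small S, assuming the natural reading of the elided formulas: the diagonal of g is { x : x ∉ f(g(x)) }, and the question is whether every subset outside the range of f arises this way. The function names are ours.

```python
from itertools import combinations, product

def diagonals(S, f):
    """All sets {x : x not in f(g(x))} as g ranges over functions S -> S."""
    out = set()
    for values in product(S, repeat=len(S)):
        g = dict(zip(S, values))
        out.add(frozenset(x for x in S if x not in f[g[x]]))
    return out

def is_pan_diagonal(S, f):
    """Does every subset of S outside the range of f occur as a diagonal?"""
    rng = {frozenset(v) for v in f.values()}
    subsets = {frozenset(c) for r in range(len(S) + 1)
               for c in combinations(S, r)}
    return subsets - rng <= diagonals(S, f)

S = (0, 1)
assert is_pan_diagonal(S, {0: {0}, 1: {0, 1}})          # {} and {1} both arise
assert not is_pan_diagonal(S, {0: {0, 1}, 1: {0, 1}})   # 0 is in every f(i)
```

The second example illustrates the obstruction noted below: an element belonging to every f(i) can never appear in any diagonal.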
For example, let and let , , and . Then is 1-to-1, so we have six permutations to consider. They become the six columns after the first in the following table:
Every subset not in the range appears; curiously appears twice. If instead we define only, then only three permutations are relevant, but having gives us three other functions to consider:
Once again we have all eight subsets between the range and the diagonals, this time with no repetition.
Which functions have this “pan-diagonal” property? If some element belongs to every set then we can never get into any diagonal, so except for some small- cases the answer is ‘no’ for such . Note that the complementary function gives equivalent results by duality. Hence the answer is also generally ‘no’ when some element is excluded from all subsets . This extends to say that if some elements are collectively included in or fewer subsets, and is one-to-one, then it is impossible to get as a diagonal, so the answer is ‘no’ unless is already part of the range. Note, however, that our second example shows that success is still possible if is not 1-to-1.
Is there an easily-stated criterion for f to be pan-diagonal? Or is NP-hardness lurking about? That is the puzzle. One can also pose it for infinite S, when the range of f is not closed under finite differences.
How powerful is diagonalization? Does the puzzle have a simple answer?
We also wish everyone a Happy Thanksgiving.
The role of 2 and 3 in mathematics
Margaret Farrar was the first crossword puzzle editor for The New York Times. Ken fondly recalls seeing her name while watching his father do the daily and weekly NYT puzzles—they were under Farrar’s byline as editor until 1969 when she retired from the Times. More than a maker of countless puzzles, she also created many of the meta-rules for crossword puzzles, which are still used today in modern puzzle design.
Today Ken and I wish to discuss a light topic: how 2 and 3 are different in many parts of theory and mathematics.
What do 2 and 3 have to do with crossword puzzles? Farrar enshrined the “rule of 2 and 3” while producing the first crossword puzzle book in 1924 for the fledgling publisher Simon & Schuster. The rule says that 2-letter words are forbidden but 3-letter words are fine in moderation. In the crossword game Scrabble, however, 2-letter words are not only allowed but are vital to strategy. So 2 and 3 are different—yes.
Additional meta-rules include this interesting one:
Nearly all the Times crossword grids have rotational symmetry: they can be rotated 180 degrees and remain identical.
When asked why, Farrar said:
“Because it is prettier.”
In other respects crossword puzzles are more liberal than Scrabble rules. Proper names, abbreviations, multiple-word phrases, prominent foreign words, and clean/trendy slang terms are allowed. Clues ending in ‘?’ may have puns and other wordplay. Here is a small example from us at GLL:
While 2 and 3 are different enough between crossword puzzles and Scrabble, they are even more so in mathematics. For example, 2 is magic: 2 + 2 = 2 × 2 = 2^2.
Try that with 3 or any other number. But we are after deeper examples of how 2 differs from 3.
In Number Theory: The number 2 is the only even prime. I recalled here a story of a colleague who works in systems. He was listening to a talk by a famous number theorist. The latter constantly said things like:
Let p be an odd prime and …
My friend asked, “what is an odd prime?”—thinking it must be special in some way. The answer back was: not 2.
In Group Theory: The famous Feit-Thompson Theorem shows that 2 is very special. Any group of odd order—a group with an odd number of elements—must be a solvable group. This was immensely important in the quest to classify all simple groups. Every non-cyclic simple group must have even order, and so must have an element x ≠ 1 so that x^2 = 1.
In Complexity Theory: The evaluation of the permanent is believed to be hard. The best known algorithm for an n × n permanent is still exponential. Yet modulo 2 the permanent and the determinant are equal, and so it is easy to evaluate a permanent modulo 2. This relies on the deep insight that −1 = +1 modulo 2.
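Concretely, since the signs disappear mod 2, every signed term of the determinant matches the corresponding unsigned term of the permanent, so Gaussian elimination over GF(2) computes the permanent’s parity. A quick randomized check (our own sketch):

```python
import random
from itertools import permutations

def perm_mod2(A):
    """Brute-force permanent of a 0/1 matrix, reduced mod 2."""
    n = len(A)
    return sum(all(A[i][s[i]] for i in range(n))
               for s in permutations(range(n))) % 2

def det_mod2(A):
    """Determinant mod 2 by Gaussian elimination over GF(2)."""
    A = [row[:] for row in A]
    n = len(A)
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col]), None)
        if piv is None:
            return 0                      # singular over GF(2)
        A[col], A[piv] = A[piv], A[col]   # row swap: the sign flip is invisible mod 2
        for r in range(col + 1, n):
            if A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return 1                              # all pivots are 1 over GF(2)

for _ in range(200):
    n = random.randint(1, 5)
    A = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
    assert perm_mod2(A) == det_mod2(A)
```

The elimination runs in polynomial time, while the brute-force sum on the left has n! terms—exactly the easy/hard contrast the text describes.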
In Quadratic Forms: The theory is completely different over fields with odd characteristic compared to those with characteristic 2. A neat book by Manfred Knebusch begins with this telling verse:
In Algebraic Geometry: I have talked about the famous, still open, Jacobian Conjecture (JC) many times. It is open for two variables or more. But it has long been solved for polynomial maps of degree at most 2. Degree three is enough to prove the general case:
Theorem 1 If the JC is true for any number of variables and maps of degree at most three, then the general case of JC is true.
In Complexity Theory: Many problems flip from easy to hard when a parameter goes from 2 to 3. This happens for coloring graphs and for SAT—to name just two examples.
In Physics: It is possible to solve the two-body problem exactly in Newtonian mechanics, but not the three-body problem.
In Diophantine Equations: x^2 + y^2 = z^2 is solvable in positive integers, but as Pierre Fermat correctly guessed, x^3 + y^3 = z^3 and all higher powers are not.
In Voting and Preferences: Kenneth Arrow’s paradox and other headaches of preferences and apportionment set in as soon as there are 3 or more parties.
In Computing: Off-on, up-down, NS-EW, open-closed, excited-slack, hot-cold, 0-1 is all we need for computing, indeed for counting in binary notation.
In Counting Complexity: Although 2SAT is in polynomial time, the counting version #2SAT is just as hard as #SAT. Even more amazing, it remains #P-complete even for monotone formulas in 2CNF or in 2DNF (they are dual to each other).
In Polynomial Ideals: Every system of polynomial equations can be converted to equations of three terms, indeed where one term is a single variable occurring in just one other equation. The idea is simply to convert t_1 + t_2 + t_3 + t_4 = 0 into t_1 + t_2 - w = 0 and w + t_3 + t_4 = 0, and so on. This cannot be done with two terms.
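The rewriting can be mechanized. A sketch (variable names ours): chain the terms of t_1 + ⋯ + t_k = 0 through fresh variables, so every equation has at most three terms and each fresh variable appears in exactly two of them.

```python
from itertools import count

def three_term_chain(terms):
    """Rewrite t1 + ... + tk = 0 as a chain of at-most-three-term equations."""
    fresh = (f"w{i}" for i in count(1))
    eqs, acc = [], terms[0]
    for t in terms[1:-1]:
        w = next(fresh)
        eqs.append(f"{acc} + {t} - {w} = 0")   # defines w = acc + t
        acc = w
    eqs.append(f"{acc} + {terms[-1]} = 0")
    return eqs

eqs = three_term_chain(["t1", "t2", "t3", "t4", "t5"])
assert eqs == ["t1 + t2 - w1 = 0",
               "w1 + t3 - w2 = 0",
               "w2 + t4 - w3 = 0",
               "w3 + t5 = 0"]
```

Substituting each w back recovers the original equation, so the chained system has exactly the same solutions in the original variables.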
However, systems with two terms, which generate so-called binomial ideals, share all the (bad) complexity properties of general ideals. In particular, testing whether a system forces two variables to be equal is EXPSPACE-complete. The proof takes the two variables to be the start and accept states of a kind of 2-counter machine known to characterize exponential space. For example, an instruction to decrement counter c, increment counter d, and go from state q to state r yields the binomial qc − rd. Thus a configuration such as qc^2d becomes rcd^2 on substituting rd for qc.
In Diophantine Equations: Hilbert’s 10th problem is known to be undecidable for equations in 11 variables. A broad swath of classes of 2-variable Diophantine equations have been shown to be decidable, enough to promote significant belief that a decision procedure for all of them will be found. For three variables, however, the situation is highly unknown according to this 2008 survey by Bjorn Poonen. Only the trivial one-variable decidability is known.
In Diophantine Equations, yet again: Poonen also relates Thoralf Skolem’s observation that every Diophantine equation is equivalent to one of degree 4 in a few more variables. One simply breaks down a power like x^4 into the terms you had with x^4 replaced by y^2, where a new variable y is set equal to x^2. Degree 2 is decidable, but degree 3 is unknown—the feeling is that it’s likely undecidable. All this recalls an old quote by James Thurber:
“Two is company, four is a party, three is a crowd. One is a wanderer.”
What is your favorite example of the “2 is different from 3” phenomenon? Recall that Albert Meyer once said:
Prove your theorem for 3 and then let 3 go to infinity.
Creating vast beautiful mansions from the becoming of nothing
Alexander Grothendieck, who signed his works in French “Alexandre” but otherwise kept the spelling of his German-Jewish heritage, passed away Thursday in southwestern France.
Today we mourn his passing, and try to describe some of his vision.
Part of the story of this amazing mathematician is that in 1970 he renounced his central position at the Institut des Hautes Études Scientifiques (IHES) in Paris, and made himself so remote shortly after formally retiring from the University of Montpellier in 1988 that not even family and friends could track him. He boycotted his 1966 Fields Medal ceremony in Moscow to protest the Red Army’s presence in eastern Europe, and declined the Crafoord Prize in 1988.
As captured by this obituary, he had left to seek a society kinder and more just than the ones that killed his father at Auschwitz and convicted him in 1977 of violating a French law dating to 1945 against feeding and sheltering an unregistered alien. More will be told of this story as his voluminous writings from the hinterland are being read. But between 1945 and 1970 he published mathematics of unparalleled sweep and power, conveying escalations of abstraction to the solution of concrete problems, and this is the part we wish to appreciate.
One of humanity’s greatest intellectual tropes is Plato’s “Allegory of the Cave,” which likens what we apprehend through our senses to shadows of forms projected on a wall by a dimly-lit fire. The forms hail from an outside world whose light is blinding to one prisoner unchained and led out of the cave. Although Plato speaking through Socrates addressed all of reality, let us just imagine this outside world as Euclidean space, in which the Platonic solids shimmer in their ideal forms. Then what Grothendieck perceived when he was led into the light is the following:
The outside of the cave is another cave.
In this outer cave the focal point is 0, zero. This zero is the only solution to the equation x = 0. It is also the only solution in that cave to the equation x^2 = 0. Likewise x^3 = 0, x^4 = 0, and so on. These are different equations, but each has only the same single root in the Euclidean space of the outer cave. We can add the words “with different multiplicities,” but what difference do they make to the objects from which we draw our solution?
Perhaps this is but a projection along a beam of elements in a higher space that can furnish different solution structures to these different equations. What Grothendieck regarded as needed for “truly natural methods in geometry,” as related by Jacob Murre quoting a lecture by Grothendieck in 1959, is the employment of nilpotents, that is, elements x such that the sequence x, x^2, x^3, … eventually gives zero. Such elements can be as simple as 2 modulo 4 or a 2×2 matrix with a lone 1 off the diagonal, but organizing them is what unchains us from the single zero.
In moving essays written by Grothendieck’s friend and colleague Pierre Cartier for the 40th and 50th anniversaries of the IHES, coinciding with Grothendieck’s 70th and 80th birthdays, Cartier did not shrink from invoking Albert Einstein for intellectual comparison. Nor did Grothendieck, as the latter essay relates regarding the approach to space.
Einstein famously derived the core of his physical theories by working out all the logical consequences of the visions in his thought experiments. One of them is that there is no focal point of space. Nor is space a pre-existing entity, as Isaac Newton had posited, but rather space emerges from relational properties of its contents. A manifold is not made by its points, and it need not be determined by the locally Euclidean structure near any one point, but rather by how open sets around points mesh together. But in math, when we have no matter, what can we take as the content that drives the structure?
In executing mathematics we can approach contents only via definitions and formulas and proofs, which are pieces of syntax. Plato was aware of this. In his “Allegory of the Divided Line,” which immediately precedes the “Cave” passage in his Republic, Plato divided the mathematical world internally in the same ratio by which he divided it from the world of sense experience. Mathematical Platonists distinguish themselves from formalists by affirming reality beyond formulas and proofs, which can seem like chains on the intellect. However, all schools can alike acclaim the way major advances in the 20th Century came from treating syntax as objects. One example comes clearest in Leon Henkin’s proof of Kurt Gödel’s Completeness Theorem by employing logical statements as elements of the constructed model.
Cartier’s 1998 essay picks right up from this. Consider a model M assigning a truth value to every proposition p in some Boolean algebra B. Associate to p the set V(p) of all models that make p true. These sets obey the rules V(p ∧ q) = V(p) ∩ V(q), V(p ∨ q) = V(p) ∪ V(q), and V(¬p) = the complement of V(p).
Thus our “points” correspond to special subsets of the space of models. The sets of can be generalized to sets of valuation functions obeying , , and for all ,
giving . In algebraic geometry there is a similar relation between points and special sets of functions giving equations that they solve. The upshot is that if we can identify the “special” property so that other sets besides our original ‘s have it, then from those sets we can harvest more “points.”
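Concretely, taking models to be truth assignments over a few variables, these rules are the homomorphism laws relating propositions to sets of models, and we can verify them by enumeration (a sketch with three variables; the names are ours):

```python
from itertools import product

VARS = ("x", "y", "z")
MODELS = [dict(zip(VARS, bits)) for bits in product([False, True], repeat=3)]
ALL = frozenset(range(len(MODELS)))

def V(p):
    """The set of models (by index) that make proposition p true."""
    return frozenset(i for i, m in enumerate(MODELS) if p(m))

p = lambda m: m["x"]
q = lambda m: m["y"] or m["z"]

assert V(lambda m: p(m) and q(m)) == V(p) & V(q)   # AND becomes intersection
assert V(lambda m: p(m) or q(m)) == V(p) | V(q)    # OR becomes union
assert V(lambda m: not p(m)) == ALL - V(p)         # NOT becomes complement
```

The “points” of the text are exactly these subsets of the space of models.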
Suppose we have a system of equations f_1(x) = 0, …, f_k(x) = 0, where x ranges over some space X. Consider all objects of the form
g = m_1 f_1 + ⋯ + m_k f_k,
where the multipliers m_i are arbitrary functions, including constants. Then any common solution x to the equations also makes g(x) = 0. The set I of such functions g can be regarded as the “algebraic consequences” of the system. I is clearly closed under addition and under multiplication by arbitrary elements, so it forms an ideal in the function space.
Now consider any point a = (a_1, …, a_n) in an n-dimensional Euclidean space. It is the unique solution to the simple system of equations
x_1 − a_1 = 0, …, x_n − a_n = 0,
of course setting each x_i = a_i. The ideal I_a generated by these polynomials is then maximal in the ring R of polynomials over x_1, …, x_n, meaning that I_a ≠ R but for any other ideal J properly containing I_a, J = R.
Every maximal ideal I is prime, meaning that if a product fg belongs to I, then either f ∈ I or g ∈ I. If neither were in I, then the ideal generated by I and f would properly contain I and hence have to be all of R, whereupon we could find h ∈ I and a multiplier m such that h + mf = 1. But then g = hg + m(fg) would belong to I after all.
The concepts of ideal and prime and maximal can be applied even in simpler spaces such as the set Z of integers. Every integer n generates the ideal (n) of multiples of n. If n factors properly as n = ab, then ab ∈ (n) with a ∉ (n) and b ∉ (n), so (n) is not prime, while (n) is properly contained in (a), which is properly contained in Z, so (n) is not maximal. But when p is prime, (p) is both prime and maximal, and these are the only prime or maximal ideals of Z. For other spaces such as our polynomials over x_1, …, x_n, however, the concepts of prime and maximal do not coincide. Grothendieck culminated a long list of people who realized that while maximality is the “pointy” property, primality is the “special” one.
With respect to our original points, for any ideal I we can identify the set
V(I) = { x : f(x) = 0 for all f ∈ I },
called the algebraic set or variety determined by I. If I is generated by f_1, …, f_k, then it is enough that V(I) be the common solution set of the f_i in X. There is something analogous to the above example of Boolean valuations going on, except that the set operations are flipped around:
V(I + J) = V(I) ∩ V(J), where I + J means the ideal closure of I ∪ J; the product I·J also gives V(I·J) = V(I) ∪ V(J).
We must skip over some wonderful finiteness theorems by David Hilbert and his students—and over distinctions such as the “base ring” being a field that is/is-not algebraically closed and projective versus affine space—to say only that algebraic sets are primitively defined in first-order arithmetic and hence are “neat” in many senses. Specially neat are those that cannot be written as V = V_1 ∪ V_2 in a nontrivial way, that is without one of V_1 or V_2 being all of V. Then we can’t have a product fg vanish on V without f or g vanishing on V, and exactly what this means is that the ideal I(V) of all functions that vanish on V is prime. Such a V is called irreducible, and sometimes the term “variety” is still reserved for this case. In an abstract but natural way, irreducible varieties of all dimensions can be made to behave like points. To quote Cartier on “the meaning of the word scheme” (his emphasis):
“One must, of course, understand that the space Grothendieck associated with an algebraic variety is not the set of its own points, but the set of its irreducible subvarieties.”
To see where the nilpotents come in, and how Grothendieck unchained us not only from zero but from Euclidean points overall, we can begin by perceiving how evaluating a function is like doing long division with remainder. With respect to any ideal I of R, the relation f ≡ g (mod I) is an equivalence relation, and allows us to write the quotient R/I. For our Euclidean point a, evaluating f(a) can be achieved syntactically by reducing f modulo (the ideal generated by) x_1 − a_1, …, x_n − a_n. Taking f modulo x_1 − a_1 by long division works out the same as substituting x_1 = a_1, and the same goes iteratively with the other elements of I_a.
The reduction process works for any ideal I and gives a unique result f mod I, provided a special kind of basis for I named after Wolfgang Gröbner is used for the iterated long division. Thus the “evaluation” is well-defined for any ideal I and can be carried out by an algorithm that first expands any initial set of generators for I into a Gröbner basis. Alas all known algorithms have doubly exponential worst-case time complexity, perhaps unavoidably since deciding whether f ∈ I is complete for exponential space even when f is linear and the initial generators for I have constant degree. Nevertheless, these algorithms are run all the time for important equation-solving applications, and impress on us this philosophical fact:
We can do richer kinds of evaluation in the space delineated by our syntax than in the external Euclidean space.
This holds even when we return to our simple equations x = 0 and x^2 = 0, with only one variable. The ideal (x) is prime—indeed maximal—but (x^2) is not prime. Also both ideals define the same one-point variety {0}, and the ideal of all functions vanishing there equals (x). Nevertheless, when we reduce a polynomial with a nonzero linear term modulo these respective ideals, we get different results.
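A pure-Python sketch of the two “evaluations” (the polynomial 3x^2 + 2x + 5 is our own stand-in example): reduction modulo (x − a) is exactly evaluation at a by Horner’s rule, while reduction modulo (x^2) also retains the linear term—the extra information a nilpotent carries.

```python
def reduce_linear(coeffs, a):
    """Remainder of f upon division by (x - a): Horner's rule gives f(a).
    coeffs are listed from the highest degree down."""
    r = 0
    for c in coeffs:
        r = r * a + c
    return r

def reduce_mod_x2(coeffs):
    """Remainder modulo (x**2): keep only the constant and linear terms."""
    return coeffs[-2:]

f = [3, 2, 5]                           # 3x^2 + 2x + 5
assert reduce_linear(f, 0) == 5         # modulo (x): evaluation at 0
assert reduce_linear(f, 1) == 10        # modulo (x - 1): evaluation at 1
assert reduce_mod_x2(f) == [2, 5]       # modulo (x^2): 2x + 5 survives
```

Modulo (x) the polynomial collapses to the single value 5; modulo (x^2) the remainder 2x + 5 remembers the derivative at 0 as well, which is the richer evaluation the text describes.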
Thus we can dispense with the original points, even the origin in . However, we would still like to preserve our primitive idea of “evaluation” in some kind of external space. How can we do this, and in what kind of space?
We cannot squeeze these answers out of our Euclidean space. We can interpret a quotient $R/I$ as endowing the zero set of $I$ with coordinates as a space in its own right, but that only works up to the prime ideals. Once we connected irreducible varieties to prime ideals, that much was one-and-done. We can’t get multiple function values out of our single, irreducible zero. There is no “square root of zero” different from zero. To go further, Grothendieck drew inspiration from how multiple-valued complex functions such as $\sqrt{z}$ and $\log z$ can still be treated as holomorphic, by “snipping” and then “layering” to fan out their branches.
For square-root, let us snip the non-positive real axis out of the complex plane. This leaves an open subset $U$, on which every $z$ has a unique square root with positive real part. The function $f(z) = \sqrt{z}$ is analytic on $U$, as is the other branch $-f(z)$. To get them to coexist as a single entity with the essence of being holomorphic, however, requires a way of building “layers” on $U$, and on other open subsets as needed to cover the part of the plane that was snipped out.
Here is where the edifices become tall and the abstraction too steep to cover in a single post. Considering the infinitely-branching function $\log z$ on one hand, and equations of infinitely many degrees on the other, we can expect that infinite structures will be employed. Indeed, Grothendieck built them above every open subset in a “glued-together” manner. We cannot even easily resort to our usual sign-off to “see the paper for details.”
Yet we can say that the structures carry the idea of “becoming” points via the concepts of fibres and sheaves, and that nilpotent elements are employed. Einstein’s relational foundation is actuated by what Grothendieck termed his “relative view” of defining morphisms between representations as regulated by category theory, rather than defining stand-alone objects. The category of sheaves is abstracted to topos theory, by which the Greek word for “place” supplants the original idea of “point.” His French word étale described a flat sea as “spreading” like these layerings. It further reflects his heritage by deriving from a Germanic root, via the Old French estal, meaning “place,” whereby it also connotes spreading out goods in layers in a stall that can be one of many spread out over a marketplace.
All this became massive, so much so that Grothendieck’s manuscripts before and after leaving IHES ran to hundreds and then thousands of pages, as did his personal memoir in the 1980s. Indeed, as related by Winfried Scharlau in a 2008 article for the AMS Notices, some of Grothendieck’s colleagues believed that he tired at the prospect of climbing his own mountains.
We can “morph” the description of Grothendieck’s “rising sea” approach in an essay by Colin McLarty to say that Grothendieck preferred to harness surveyors, engineers, and dam-builders so he could float to the top on rising waters, rather than do the ascent by “hammer and chisel.” He decried Pierre Deligne’s 1974 completion of their program of proving the famous conjectures by André Weil by methods he and others felt were not “morally right” on account of bypassing Grothendieck’s still-open “standard conjectures.”
A 2004 article by Allyn Jackson reproduces a cartooned abstract by Grothendieck for a colloquium in 1971 by which he warned that doing his lecture “in black-and-white detail for Springer Lecture Notes would likely take 400–500 pages,” ending by writing that from “a life-distancing logical delirium” it was “high time to change course.” Today’s mathematical community has in two years still barely touched a similarly-motivated though procedurally different theory erected by Shinichi Mochizuki on foundations named for Oswald Teichmüller, despite its “mere” 512 pages in first drafts.
So what can all this mean for us who work in what Grothendieck described as a “mansion” in which “the windows and blinds are all closed,” while he was one of those “whose spontaneous and joyful vocation it has been to be ceaselessly building new mansions”? At least he did not call our dwelling a cave. However, in complexity theory we have it worse than Plato’s cave-prisoners in not merely missing the blinding world outside, but sensing its impact as a negative image in our present ignorance of lower bounds.
Much of complexity theory translates naturally to questions about polynomials over finite fields. This goes not only to $\mathbb{F}_2$ for Boolean functions but also to $\mathbb{F}_p$ and $\mathbb{F}_q$ for higher primes $p$ and $q$, which in turn yield questions about Boolean solutions to equations over these fields. There are possible advances to be had by improving the partial correspondence to problems in zero characteristic. The larger program growing out of the Weil conjectures seeks to transfer geometry to positive characteristic. Can we see how its further development might allow us to extract combinatorial results needed to put bounds on complexity?
Polynomials modulo composite numbers give us a more immediate frontier, one represented by the complexity class $\mathsf{ACC}^0$, whose nonuniform version was only recently separated from nondeterministic exponential time. These polynomials behave badly in manners stemming from nilpotent elements in the rings $\mathbb{Z}_m$ for composite $m$. Can we somehow supply “extra points” and valuations to raise their structure toward that of polynomials over finite fields, and thus at least achieve bounds known for polynomials modulo primes?
A third example, my favorite and most immediate for the theme of this post, concerns the famous lower bound of Volker Strassen and Walter Baur on the size of arithmetic circuits computing some natural families of polynomial functions in zero characteristic. Its proof, as we recounted in 2010, turns on a property of geometric degree that pertains only to affine Euclidean space (or to its projective cousin). It employs the ideal generated by
$$y_1 - f_1(x_1,\dots,x_n),\;\dots,\;y_n - f_n(x_1,\dots,x_n),$$
where the “mapping variables” $y_i$ ensure the ideal is prime since the graphs of all mappings are irreducible varieties. Unfortunately, the highest geometric degree attainable for this ideal when $f$ has ordinary total degree $d$ is $d^n$, whose logarithm gives the known $\Omega(n\log d)$ lower bound. This bound however already holds for simple functions such as $x_1^d + \cdots + x_n^d$, while it comes nowhere close to the exponential lower bounds we conjecture for functions such as the permanent.
Higher algebraic geometry has yielded notions of “algebraic degree” that can go one or more exponential orders higher. If the Strassen-Baur technique could be transferred to their higher spaces, then we could hope for strong lower bounds. The étale idea and related facets of algebraic geometry and representation theory also animate Ketan Mulmuley’s “Geometric Complexity Theory” programme. I once tried to find a combinatorial shortcut using a degree-like measure of counting “minimal monomials” in ideals, which we described here. It is striking that the determinant polynomials score zero on this measure, whereas the permanents score astronomically even for small $n$, but as with the other degree measures, there are counterexamples to its being a circuit-size lower bound.
Can the answer to $\mathsf{P}$ versus $\mathsf{NP}$ be caught up in the “rising sea”? Or will it need something even stronger than “hammer and chisel”? What can we learn from his work? A sign of hope is that for all their heft and abstraction, his schemes can be programmed.
Our condolences to his relations and friends.
Zohar Manna is an expert on the mathematical concepts behind all types of programming. For example, his 1974 book Mathematical Theory of Computation was one of the first on the foundations of computer programming. He wrote textbooks with the late Amir Pnueli on temporal logic for software systems. As remarked by Heiko Krumm in some brief notes on temporal logic, there is a contrast between analyzing the internal logic of pre- and post-conditions as each statement in a program is executed, and analyzing sequences of events as a system interacts with its environment.
Today I want to talk about an encounter with Zohar years ago, and how it relates to a puzzle that I love.
Zohar did one of the coolest PhD theses ever. It was on schema theory, which we have talked about before here. He was advised by both Robert Floyd and Alan Perlis in 1968—two Turing Award winners. The thesis Termination of Algorithms was short, brilliant, a gem. I recall when I started as a graduate student reading it and thinking this was beautiful. His own abstract is:
The thesis contains two parts which are self-contained units. In Part 1 we present several results on the relation between the problem of termination and equivalence of programs and abstract programs, and the first order predicate calculus. Part 2 is concerned with the relation between the termination of interpreted graphs, and properties of well-ordered sets and graph theory.
I will explain the puzzle in a moment, but first let me describe the encounter I had with Zohar. At a STOC, long ago, I saw him and started to explain my result: the solution to the puzzle. After hearing it he said that he liked the solution and then added that he once had worked on this problem. I said that I thought I knew all his work and was surprised to hear that. He smiled and seemed a bit reluctant to explain. Eventually he explained all.
It turned out that he and Stephen Ness had a paper On the termination of Markov algorithms at the 1970 Hawaii International Conference on System Sciences, held in Honolulu. Zohar explained the conference was not the “best,” but was held every year in the Hawaiian Islands. In January. Where it is warm. And sunny. It sounded like a great conference to me.
I soon went to a couple of these conferences. I stopped going after they accepted one of my papers, then rejected it—this is a long story that I will recount another time. The “accepted-rejected” paper finally did appear in an IEEE journal.
Zohar explained that he went to the conference for the same obvious reasons. He also explained to me the “three-person rule.” The Hawaiian conference was highly parallel and covered many areas of research. Zohar said that you were always guaranteed to have at least three people in the room for your talk: There was the speaker, the session chair, and the next speaker. Hence the three-person rule.
The issue is the distributive law:
$$a \times (b + c) = a \times b + a \times c.$$
Consider any expression that has only variables and the two operations plus ($+$) and times ($\times$). Suppose one applies the distributive law to the expression, in any order. One stops when there are no further possible applications. The question, the puzzle, was: Does the distributive law always stop? Of course it does. Or does it? The puzzle is to prove that it actually does stop.
I raised this recently as my favorite result, with a smile, during my Knuth Prize lecture at FOCS. I said I had a nice solution and would be glad either to give it or let the audience think about it. They seemed to want to think about it, so I gave no solution.
My lecture had been right before dinner, and the next day I spoke to some people about whether they’d thought about it. A few said they had some idea of how to proceed, but no one seemed to have a proof. The reason the problem is a bit challenging is that the rule increases the size of the expression: two copies of $a$ now appear instead of one. This means that any local measure of structure may fail.
Indeed Zohar’s proof uses a well-ordering argument that is not too hard, but is perhaps a bit hard to find. Check out his paper with Ness.
The first thing I noticed immediately when I heard the problem—see here for the context—was this: The distributive law preserves the value of the expression. We apply it because it expands the expression, but it does not change the value of the expression. A rule like $a \times (b + c) \rightarrow a \times b + c$ does not preserve value. So who cares whether it halts or not?
But the distributive law preserves the value. So here is a proof based on that observation. Notice the following two things: each application of the law strictly increases the number of leaves of the expression, and when every variable has value at least $2$, the value of an expression is at least its number of leaves.
The trick is to replace all the variables by $2$. The value stays the same under each application, but if the rule never stopped, the size of the expression, and hence the value it forces, would increase without bound. This is a contradiction.
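Here is a small self-contained sketch of the whole setup (the representation and function names are ours): it applies the distributive law until no rewrite remains, checking along the way that the value with every variable set to $2$ never changes even as the expression grows.

```python
# Expressions are tuples: ('var', name), ('+', l, r), ('*', l, r).
# step() performs one application of the distributive law, or returns None.

def step(e):
    if e[0] == 'var':
        return None
    op, l, r = e
    if op == '*':
        if l[0] == '+':                    # (b + c) * a  ->  b*a + c*a
            return ('+', ('*', l[1], r), ('*', l[2], r))
        if r[0] == '+':                    # a * (b + c)  ->  a*b + a*c
            return ('+', ('*', l, r[1]), ('*', l, r[2]))
    s = step(l)
    if s is not None:
        return (op, s, r)
    s = step(r)
    if s is not None:
        return (op, l, s)
    return None

def value(e):
    """Value of e with every variable replaced by 2."""
    if e[0] == 'var':
        return 2
    op, l, r = e
    return value(l) + value(r) if op == '+' else value(l) * value(r)

v = lambda name: ('var', name)
e = ('*', ('+', v('a'), v('b')), ('+', v('c'), v('d')))   # (a+b)*(c+d)
steps, val = 0, value(e)
while (nxt := step(e)) is not None:
    e = nxt
    assert value(e) == val                 # the law preserves the value
    steps += 1
print(steps, val)                          # halts after 3 steps; value stays 16
```

Running one example does not prove termination, of course; the proof is the value-versus-size argument above.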
Are there other termination problems that can be attacked in this way?
A pointed question about the plane
Stanisław Ulam was one of the great mathematicians of the last century. We talked about him in a recent post on his prime spiral and other strange mathematical facts. He is associated with several famous problems, including the 3n+1 problem and the Graph Reconstruction conjecture.
Today we want to talk about one of his oldest conjectures.
The conjecture was first stated in 1945. It is simple to state, seems plausible that it is true, but so far has resisted all attempts at resolution. René Descartes could have stated it in the 1600s—well, almost.
You can skip this and the next section on Ulam and get right to his conjecture. But it’s been open almost sixty years, so it can wait a minute or two.
Ulam did amazing work that impacted a vast part of mathematics and physics. He also wrote popular articles that were wonderful to read. The areas include: set theory, topology, transformation theory, ergodic theory, group theory, projective algebra, number theory, combinatorics, and graph theory. One particular example is his invention of the Monte Carlo method in the 1940’s, while working at the Los Alamos National Laboratory. The name, Monte Carlo, is due to Nicholas Metropolis, after the casino, where Ulam’s uncle, Michał Ulam, was supposed to have frequently played.
The reconstruction conjecture states roughly that every non-trivial undirected graph is determined up to isomorphism by its one-vertex-deleted subgraphs. Paul Kelly also gets credit for his slightly earlier work. This conjecture is wonderful in that it seems so plausible: Take a graph and give us all the subgraphs. How can that not yield the original graph? Yet it has been open since around 1960. One aspect that is especially teasing is that it has been solved in the affirmative for regular graphs, which are the hardest case for graph isomorphism, but this does not tell us much about the general case.
In 1976 Ulam published his autobiography: Adventures of a Mathematician. Here is the cover of the 1991 version:
When the book was first published, we in the theory community all read it. It was quite successful, on all measures, since Ulam was a wonderful writer. He mixed stories of people, events, and mathematics; all put together in an engaging way.
The book had another hook: Ulam included an easy-to-state puzzle that many immediately began to work on and try to solve.
He asked for the maximum number of questions required to determine an integer between one and a million, where the questions are answered “Yes” or “No” only, and where one untruthful answer is allowed.
This is binary search with one lie allowed.
There is a strategy that is not too bad. At worst we could just do binary search asking each question twice. If we get the same answers to a question we know all is well; and if we get two different answers we know one is a lie, and one further asking settles it. So this works in about $2\log_2 n$ questions in the worst case.
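A quick sketch of this brute-force strategy (the adversary model and names below are our own simplification): binary search over $1..n$, asking each comparison twice and a third time only when the two answers disagree.

```python
# Binary search on 1..n against an answerer who may lie on at most one
# question. Each threshold question is asked twice; on disagreement a third
# asking must be truthful, since only one lie is allowed in total.

def search_with_one_lie(n, secret, lie_on=None):
    count = 0
    def ask(t):                       # truthful answer: is secret <= t ?
        nonlocal count
        count += 1
        truth = secret <= t
        return (not truth) if count == lie_on else truth
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi) // 2
        a, b = ask(mid), ask(mid)
        ans = a if a == b else ask(mid)   # majority of three is truthful
        if ans:
            hi = mid
        else:
            lo = mid + 1
    return lo, count

guess, used = search_with_one_lie(10**6, 123456, lie_on=7)
print(guess, used)                    # finds 123456 within 2*20 + 1 = 41 questions
```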
The trick is that it is possible to avoid such a brute force method and come much closer to the usual binary search bound of $\log_2 n$. Of course we are not worrying about rounding off the value of $\log_2 n$, which is a small matter.
The problem became a race among many of us and the finish line was hit quickly. His original problem can be done in exactly $25$ questions. This is pretty close indeed to the binary search value of $20$. The general answer for any constant number $k$ of lies was found to be
$$\log_2 n + k\log_2\log_2 n + O(k\log k).$$
The constant depends on the number of lies. Thus lies have a low-order effect on the search time. I always liked this result and wonder whether it has been used in practice. The result is due to Ron Rivest, Albert Meyer, Danny Kleitman, Karl Winklmann, and Joel Spencer. They beat us all—perhaps an unfair fight—with such a powerful group.
Let’s now state Ulam’s really old problem. Not the reconstruction conjecture from the ’60s nor his search problem, which was solved quickly in the ’70s, but his question about points in the plane.
Say a subset $S$ of the points in the Euclidean plane forms a rational-distance (RD) set provided the distance between any two points in $S$ is a rational number. Thus the four points at the corners of a unit square do not form an RD set. Of course we know that the diagonal distance is $\sqrt{2}$, which is not rational.
It is easy to construct an infinite set that is an RD set. Take all the points $(q, 0)$ with $q$ rational. These clearly have all rational distances: The distance between $(q_1, 0)$ and $(q_2, 0)$ is $|q_1 - q_2|$.
This can be done for any line and the same idea can be made to work for circles.
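For the circle, one standard construction (this sketch and its helper names are ours, using exact rational arithmetic) takes the points $\bigl(\frac{1-t^2}{1+t^2}, \frac{2t}{1+t^2}\bigr)$ on the unit circle for parameters $t = \frac{a^2-1}{2a}$ with $a$ rational; this choice makes $1+t^2$ a rational square, which forces every pairwise chord length to be rational.

```python
from fractions import Fraction
from itertools import combinations
from math import isqrt

# Points ((1-t^2)/(1+t^2), 2t/(1+t^2)) lie on the unit circle; the chord
# between parameters t and u has length 2|t-u| / sqrt((1+t^2)(1+u^2)).
# Choosing t = (a^2-1)/(2a) makes 1+t^2 = ((a^2+1)/(2a))^2 a rational
# square, so all pairwise distances come out rational.

def point(a):
    t = Fraction(a * a - 1, 2 * a)
    d = 1 + t * t
    return ((1 - t * t) / d, 2 * t / d)

def is_rational_square(f):
    """Is the nonnegative Fraction f the square of a rational?"""
    n, d = f.numerator, f.denominator
    return isqrt(n) ** 2 == n and isqrt(d) ** 2 == d

pts = [point(a) for a in range(2, 8)]
for (x1, y1), (x2, y2) in combinations(pts, 2):
    dist2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    assert is_rational_square(dist2)   # squared distance is a rational square
print(len(pts), "points on the unit circle, all pairwise distances rational")
```

Taking all rational $a$ gives an infinite RD set on the circle, though, like the line, the circle is a sparse subset of the plane.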
Notice that every line and every circle is a sparse subset of the plane. So Ulam made the natural conjecture:
If $S$ is an RD set then it is not dense in the plane.
Recall that a set is dense provided every open disk in the plane contains at least one point from the set. Paul Erdős conjectured that if a set $S$ is an RD set, then $S$ should be very special. Indeed, no dense RD set has been found after many attempts. See this for some progress.
Do dense RD sets exist? What about restricting the distances to lie in other subfields of the reals? I believe it should be easy to prove that there is a proper subfield of the reals and an infinite set of points whose distances all lie in that field. However, classifying these subfields is clearly very hard, since it would solve the Ulam conjecture and more.