[ Leo Stein ]
Leo Stein is an assistant professor in the department of Physics and Astronomy at the University of Mississippi. His research interests include general relativity from an astrophysical standpoint.
Today I want to share an unusual proof of his.
Mathematics and complexity theory are all about proving theorems. Most of the time, so far, we prove them the old way: we write out a humanly readable proof. At least we hope the proof is readable. Some of the time, we use a computer to check or even create the proof. Sometimes we do extensive numerical computations, but these are not proofs.
I have known, as I am sure you have, forever that a quadratic equation can be solved in closed form. That is, $az^2 + bz + c = 0$ with $a \neq 0$
has the two solutions $z = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
I have discussed this before here and its relationship to the World’s Fair in Flushing Meadows.
A natural question is: Are square roots needed in any formula for quadratic equations? The answer is “Yes”.
Theorem 1 There does not exist any continuous function from the space of quadratic polynomials to complex numbers which associates to any quadratic polynomial a root of that polynomial.
Corollary 2 There is no quadratic formula built out of a finite combination of field operations, the functions $\exp$, $\sin$, and $\cos$, and the coefficients of the polynomial.
The corollary uses the basic fact that $\exp$, $\sin$, and $\cos$ are continuous functions. Note that each has a single branch on the complex plane, whereas radicals and the logarithm function do not. So how do we prove the theorem?
Here is a novel, I think, proof that uses an app. Stein has written the app and it is here. He explains how to use it. I strongly suggest that you try this yourself.
To get a feel for all this, drag the coefficient $b$ to $0$ and the coefficient $c$ to $-1$. You should have two real roots in root space (one at $-1$, the other at $+1$). Let's call $r_1$ the negative root, and $r_2$ the positive root. Now move the coefficient $c$ around in a small loop (i.e. move it around a little bit, and then return it to where it started). Note that the roots move continuously, and then return to their original positions. Next, move $c$ in a big loop (big enough that it orbits around $0$). Something funny happens: the roots $r_1$ and $r_2$ switch places.
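For readers who cannot run the app, the same phenomenon can be simulated numerically. The sketch below assumes the polynomial is $z^2 + bz + c$ with $b = 0$ and $c$ starting at $-1$, so the roots start at $\pm 1$; the function and variable names are ours, not Stein's. Tracking the roots continuously while $c$ traverses a loop around the origin shows them trade places:

```python
import cmath

def roots(b, c):
    """Roots of z^2 + b z + c via the quadratic formula."""
    d = cmath.sqrt(b * b - 4 * c)
    return [(-b + d) / 2, (-b - d) / 2]

def track_roots(path, b=0.0):
    """Follow the two roots continuously as c traverses `path`.

    At each step we match each tracked root to the nearest new root,
    which is what "moving continuously" means computationally.
    """
    r1, r2 = roots(b, path[0])
    for c in path[1:]:
        s1, s2 = roots(b, c)
        # keep continuity: pair old roots with the nearest new roots
        if abs(r1 - s1) + abs(r2 - s2) <= abs(r1 - s2) + abs(r2 - s1):
            r1, r2 = s1, s2
        else:
            r1, r2 = s2, s1
    return r1, r2

# c starts at -1 (roots at +1 and -1) and loops once around the origin
N = 1000
loop = [-cmath.exp(2j * cmath.pi * k / N) for k in range(N + 1)]
r1, r2 = track_roots(loop)
print(r1.real, r2.real)  # approximately -1.0 and 1.0: the roots have swapped
```

A small loop of $c$ that avoids the origin returns each root to where it started; only a loop around the critical value swaps them.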
Leo Goldmakher says here:
Pause and think about this for a second. This is really, really weird.
Here is one immediate consequence of this observation:
Theorem 3 There does not exist any continuous function from the space of quadratic polynomials to complex numbers which associates to any quadratic polynomial a root of that polynomial.
And so the corollary follows.
Goldmakher writes out a more conventional proof in his paper titled Arnold’s Elementary Proof Of The Insolvability Of The Quintic. He also shows the following theorem:
Theorem 4 Fix a positive integer $n$. Any quintic formula built out of the field operations, continuous functions, and radicals must have nesting of level more than $n$.
This says that there can be no fixed formula for fifth degree, quintic, polynomials. Of course, this follows from Galois theory, but his proof uses just calculus. The Arnold is Vladimir Arnold.
Do you know other cases of an app with animation conveying the essence of a mathematical proof? This means more than “proofs in pictures” or “proofs without words”—the animation and interactivity are crucial.
Joseph Oesterlé and David Masser are famous for their independent discovery of the ABC conjecture.
Today I want to point out an unfair comment about their discovery.
Anonymity on the Internet was captured by a famous 1993 cartoon in the New Yorker magazine titled, “On the Internet, nobody knows you’re a dog.” Amazing to think that was more than a quarter-century ago and remains true. But people can tell if what you’ve written is something inappropriate.
The comment is:
SAYS WHO??? I have some trouble with this item.
Masser is a Fellow of the Royal Society, who was elected in 2005. He is
also responsible, following an earlier insight of Joseph Oesterlé, for formulating the abc conjecture; this is a simple statement about integers which seems to hold the key to much of the future direction of number theory.
See this link for his full citation and the comment. Click on the show more bibliography button there. The comment is apparently anonymous, although the author is probably known to some. I thank Joël Ouaknine for pointing out this strange comment.
Update: Ken speculates that it’s a misplaced comment by an editor of the Royal Society website itself. Perhaps they compose HTML from MS Word or Acrobat or other software that provides comment bubbles—but this one escaped the bubble and wasn’t noticed. Editors of Wikipedia have automatic tools for flagging assertions that are unsupported or at least need citation.
What the comment undoubtedly shows is vigorous debate behind the walls of Britain’s august institution. So let’s say a little more on what the comment is about.
The biggest mysteries about numbers often concern the interaction between addition and multiplication. For example:
Suppose that $a + b = c$ where $a, b, c$ are positive and co-prime natural numbers. Let $\mathrm{rad}(abc)$ be the product of all the distinct prime divisors of $abc$. Then the ABC conjecture says that $c < \mathrm{rad}(abc)^2$.
Note that this inequality does indeed connect adding with multiplying. The usual conjecture is stronger, see this for details.
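A quick numerical sanity check makes the inequality concrete. The snippet below assumes the weak form $c < \mathrm{rad}(abc)^2$ (the usual conjecture, as noted above, is stronger); the helper names are ours:

```python
from math import gcd

def rad(n):
    """Product of the distinct prime divisors of n."""
    r, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            r *= p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        r *= n
    return r

def abc_quality(a, b):
    """For coprime a + b = c, return (c, rad(abc)^2) to compare."""
    c = a + b
    assert gcd(a, b) == 1
    return c, rad(a * b * c) ** 2

# a classic high-quality triple: 1 + 8 = 9, rad(1*8*9) = rad(72) = 6
print(abc_quality(1, 8))  # (9, 36): indeed 9 < 36
```

Trying many random coprime pairs shows how rarely $c$ even comes close to $\mathrm{rad}(abc)^2$, which is one reason the conjecture is believed.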
The ABC conjecture appears to be open, even though Shinichi Mochizuki has claimed a proof for years. See this for a discussion about the status of the conjecture.
Despite multiple conferences dedicated to explicating Mochizuki’s proof, number theorists have struggled to come to grips with its underlying ideas. His series of papers, which total more than 500 pages, are written in an impenetrable style, and refer back to a further 500 pages or so of previous work by Mochizuki, creating what one mathematician, Brian Conrad of Stanford University, has called “a sense of infinite regress.”
The comment on Masser’s work is wrong, strange, inappropriate. Oesterlé and Masser deserve more credit, not less, for their brilliant discovery of the ABC conjecture. There are now many—perhaps hundreds—of applications of the ABC conjecture. For example, consider generalizations of Fermat’s Last Theorem. Suppose that
$x^r + y^r = z^r \quad (*)$ where $x, y, z$ are natural numbers and $r$ is an odd prime with $r \ge 7$. Provided $x, y, z$ are positive and co-prime, it follows by the ABC conjecture that $z^r$ is bounded by $\mathrm{rad}(x^r y^r z^r)^2 \le (xyz)^2 < z^6$. This is impossible for $z$ large enough since $r \ge 7$. Therefore, (*) can only have a finite number of solutions. Pretty neat.
Do you know of any other inappropriate comments of this kind?
[added remark by Ken, linked rather than embed dog cartoon]
[Added prime r must be 7 or larger. Thanks to comment by MadHatter.]
[ Rich DeMillo ]
Rich DeMillo is a strong leader, a famous researcher, and a long-time best friend. Proof: He was the first CTO at HP and was the Dean of Computing at Georgia Tech; he helped create mutation testing, a powerful software testing method; and he did seminal work in complexity theory. The last is clear.
Today I want to talk about his recent work on voting systems.
The 2020 election is over a year away, yet it is on our collective minds. People voice concerns every day on social media, on TV, on cable, in print, everywhere. Their concerns are that our next national election will be compromised. Rich has turned his concern into activism: he is working hard to make elections trusted in general and the 2020 election in particular.
Rich is scheduled to give a talk this coming Monday, May 13, at Georgia Tech. I wish I could be there, but cannot. I do plan to watch the video of his presentation—see here. The talk is based on his recent paper, joint with Andrew Appel and Philip Stark, titled “Ballot-Marking devices (BMDs) cannot assure the will of the voters.” As an aside, their paper (ADS) has already generated measurable interest in downloads and views.
There are several criteria that a “good” election should meet.
Rich says in his talk abstract:
Many people believe that, in an Internet-enabled world, secure, safe voting should be easy to achieve. For example, using known cryptographically secure protocols (maybe even blockchains), a secure website might be developed to relieve voters of the burden of driving to a polling place on election day.
This belief is wrong. Elections are hard—impossible?—to safeguard. A U.S. national election is the union of about ten thousand local elections. Each has different rules and protocols, which makes safeguarding the national election difficult.
The last point is central. There must be a record of the votes to allow audits after the election is over. We must be able to audit and check that the tabulation was correct. This is the central question that Rich and his co-authors discuss. We will turn to this issue in a moment.
Before that I note that keeping a vote secret is impossible in an absolute sense. Suppose that you vote “yes” in some district. After the election suppose that the count in that district is made public, as it usually is. Say all of the votes were yes. Then clearly information is leaked about how you voted.
There are two main ways to record votes. One method is to have voters hand-mark their ballots, in the old-fashioned way. It is simple, cheap, and not 21st century. Hand-marked ballots can be read by automatic scanners, at least in principle. A difficulty is that voters are human and may mark their ballots incorrectly. They may miss a box, or mark two boxes, or make some other mistake. What if the voter is instructed to:
Select two of the following six choices.
There can be other difficulties: Some voters may have special needs and may require instructions to be in a large type font, for example.
Another recording method is to have voters use a device to print their paper ballots. These are cleverly called Ballot Marking Devices (BMDs). The name sounds slightly strange to me; there is an alternative name, electronic ballot markers (EBMs).
The authors, ADS, argue that BMDs are dangerous. Such devices can fail, they argue, and not protect the election. The BMDs rely on software, complex special purpose software, and thus are subject to bugs, errors, mistakes, and to active attacks by adversaries.
A BMD device takes input from a voter and then prints out a paper ballot. Often the ballot will contain a machine-readable bar code. This is so scanners can more easily read the paper ballots, later. The problem, the danger, is that most voters cannot tell if a bar code is correct or not. An attacker need only have the BMD confirm that you voted, say, “yes” and print a ballot that says “yes”. Then the attacker has the BMD cheat you by printing a bar code that says “no”. This is a nasty attack, which is hard to stop. The ADS team discusses this and related problems with BMDs.
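A toy sketch may make the danger concrete. Everything here is hypothetical (no real BMD works this way verbatim), but it shows the mechanics: the voter can verify only the printed text, while the scanner tallies only the barcode, so the two can silently disagree.

```python
def print_ballot(choice, cheat=False):
    """Hypothetical BMD: returns (human-readable text, barcode payload)."""
    text = f"Your vote: {choice}"   # what the voter can actually check
    flipped = "NO" if choice == "YES" else "YES"
    barcode = flipped if cheat else choice  # what the scanner reads
    return text, barcode

def scanner_tally(ballots):
    """The tabulator trusts only the barcodes."""
    return sum(1 for _, barcode in ballots if barcode == "YES")

honest = [print_ballot("YES") for _ in range(10)]
hacked = [print_ballot("YES", cheat=True) for _ in range(10)]

print(scanner_tally(honest))  # 10: every YES vote counted
print(scanner_tally(hacked))  # 0: every printed text still says YES
```

The point of the simulation is that both piles of ballots look identical to the voters who cast them; only a hand audit of the text versus the tally can reveal the discrepancy.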
Can complexity theory help us design better elections? Unclear. Can election theory help us understand complexity theory? Perhaps.
Theorem 1 The Election Hardness Axiom implies that the MAJORITY function cannot be computed by a polynomial size constant depth Boolean circuit of NOT, AND, and OR gates.
That is, the MAJORITY function is not in $\mathsf{AC}^0$. What is the Election Hardness Axiom? It is the empirical fact that there is no practical way to compute who won an election. The MAJORITY function is the tabulation of votes: The outcome of an election is the same as computing the MAJORITY function of the votes—“yes” is a $1$ and “no” is a $0$.
Okay, we are kidding. But not completely. Suppose that the MAJORITY function were in $\mathsf{AC}^0$. Then a series of decisions of the form:
The tabulators have looked at the following ballots and we agree that there is a “yes” vote in ballot .
Rich, in his talk abstract, states that it is unlikely that crypto theory could be used to create trusted elections. His reason is that voters will not trust elections that rely on crypto results. I agree. But I wonder if ideas from theory could be useful. Here are two high-level thoughts.
There is a vast literature on computing in the presence of faults. Usually “faults” are thought to occur at the nano level: the faults are due to hiccups in electronics. What if the faults came from errors in the counting of votes? What if the faults were at the macro level—that is, at the level of human decisions? Perhaps we will revisit this in the future.
There is a vast literature on computing as a “game”. An election is usually viewed as being run by some trusted party. This could be replaced by assuming that the election is a game. Imagine two parties D and R. As the tabulation is performed D and R can challenge each other. They interact as in a game. Could this help make the election trusted? Perhaps we will revisit this too in the future.
Can elections be trusted? Can we formalize the connection between elections and complexity theory? Could this connection be useful? Can theory help with future elections?
More hard Boolean functions
Peyman Afshani, Casper Freksen, Lior Kamma, and Kasper Larsen (AFKL) have a recent paper which we just discussed.
Today Ken and I will update our discussion.
Their paper assumes the network coding conjecture (NCC) and proves a lower bound on the Boolean complexity of integer multiplication. The main result of AFKL is:
Theorem 1 If the NCC is true, then every Boolean circuit that computes the shift function has size $\Omega(n \log n)$.
The shift function is: Given an $n$-bit number $x$ and a number $i$ so that $0 \le i < n$, compute the $2n$-bit product of $x$ by $2^i$.
This is a special case of the integer multiplication problem. In symbols it maps $x$ and $i$ to $x \cdot 2^i$, as in our photo above.
Our point, however, is not about integer multiplication. Nor even about NCC—no knowledge of it will be needed today, so read on even if you are not aware of NCC. No. Our point is that a whole lot of other Boolean functions would inherit the same circuit lower bound as . And several aspects of that seem troubling.
We are impressed by the AFKL paper but also worried. Proving a super-linear lower bound in the unrestricted Boolean complexity model has long been considered a difficult problem. Maybe a hopeless problem. Yes, they are proving it not for a single-output function but for a multiple-output function. Still, it seems too good to be correct. Even worse, assuming NCC they also resolve other open problems in complexity theory. I am worried.
What we suggest is to catalog and study the consequences of their results. If we find that their results lead to a contradiction, then there was something to be worried about. Or perhaps it would mean that NCC is false. If we find no contradiction, then everything we discover is also a consequence of NCC. Either way we learn more.
Let’s call a Boolean function an AFKL function provided it has Boolean circuit complexity $\Omega(n \log n)$ if the NCC is true. Thanks to AFKL, we now know that integer multiplication is an AFKL function. I started to think about: What functions are in this class? Here are some examples:
We describe the last three next. We show they have linear size-preserving reductions from the shift function.
Define by
for . For any input not of this form, let be .
Theorem 2 The Boolean function is an AFKL function.
Proof: Let be the input to where and in binary. In linear size we can test , when there is nothing to do, so presume . The first step is to create
This is just binary-to-unary conversion and has linear-size circuits—as in multiplex decoding and as remarked by AFKL. This becomes the first bits of an application of to the -bit string
It yields
Changing the first bit to then leaves the desired output of .
The point is that FLIP is a super-simple function. It just moves the initial block of $1$s in a string to the end. It is amazing that this function should have only non-linear, indeed $\Omega(n \log n)$-sized, circuits.
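To see just how simple the function is, here it is as we read it: rotate the leading run of $1$s to the end of the string. This is our own sketch of the definition, not AFKL's formal statement:

```python
def flip(s):
    """Move the initial block of 1s in a bit-string to the end."""
    i = 0
    while i < len(s) and s[i] == "1":
        i += 1
    # rotate the leading run of 1s (possibly empty) to the back
    return s[i:] + s[:i]

print(flip("1110010"))  # 0010111
print(flip("0010111"))  # 0010111 (no leading 1s, so nothing moves)
```

A few lines of straight-line string manipulation, yet, conditionally on NCC, no linear-size circuit family computes it.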
This also means that Ken’s function, which takes a string $x$ and moves all the $1$s to the end of $x$, is hard even in the special cases where all the $1$s are at the front. What’s strange is that Ken proves his function equivalent to another special case where $|x|$ is even and exactly half the characters are $1$. This latter case is one in which it is easy, but the two cases are separate. All this is touch-and-go enough to compound our “worry.”
The following is also an AFKL function.
$c_k = \bigvee_{i+j=k} a_i \wedge b_j$ for $0 \le k \le 2n-2$, where an empty OR is defined to be $0$. This can even further be restricted to the case where exactly one of the $a_i$ are $1$ and the rest are $0$. Call this the sparse convolution function.
Theorem 3 The sparse convolution is a monotone AFKL function.
Proof: We will give a sketch of why this is true. Define
It is not hard to show that this yields the FLIP function. We can reduce computing it to a convolution of the $a_i$’s and $b_j$’s where
The key is to note that exactly one $a_i$ will be non-zero, and so the convolution is sparse.
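A direct implementation makes the sparsity point plain. The code below computes the Boolean convolution $c_k = \bigvee_{i+j=k} a_i \wedge b_j$ and shows that when exactly one $a_i$ is $1$, the output is just $b$ shifted by $i$; this is our own sketch of the definitions above:

```python
def bool_convolution(a, b):
    """Boolean convolution: c[k] = OR over i+j=k of (a[i] AND b[j])."""
    n, m = len(a), len(b)
    c = [0] * (n + m - 1)
    for i in range(n):
        if a[i]:
            for j in range(m):
                if b[j]:
                    c[i + j] = 1
    return c

# sparse case: exactly one a[i] = 1, so the result is b shifted by i
a = [0, 0, 1, 0]           # only a[2] is set
b = [1, 0, 1, 1]
print(bool_convolution(a, b))  # [0, 0, 1, 0, 1, 1, 0]: b shifted up by 2
```

In the sparse case the inner loop fires for a single $i$, which is exactly why sparse convolution inherits the hardness of the shift function rather than being easier.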
The sparse convolution function raises an interesting question: Are the methods for sparse FFT useful here? The lower bound for AFKL functions suggests that they are not applicable.
The subtitle of our post marveled that a core-theory advance on circuits for multiplication had come via the practical side of throughput in computer networks. AFKL deserve plaudits for linking two communities. We should mention that one theoretician we both know well, Mark Braverman, with his students Sumegha Garg and Ariel Schvartzman at Princeton, proved a fact about NCC that is relevant to this discussion:
Theorem 4 Either NCC is false, or bit-operations save a whole factor in the network size.
Even this paper, however, does not address lower bounds on Boolean circuits. The only prior link between NCC and Boolean complexity is a 2007 paper by Søren Riis, which is cited by AFKL, and has a 2011 followup by Demetres Christofides and Klas Markström. The paper by Riis has a new “guessing game” on graphs and a demonstration that a lower-bound conjecture of Leslie Valiant needs to be rescued by dividing by a factor. Theorem 4, however, seems to say that no such shading can apply to NCC.
When we ask Google “network coding conjecture Boolean circuit lower bounds” (without quotes), the first page shows AFKL, our posts, and this 2014 survey by Ryan Williams—which mentions neural networks but not NCC. On the next page of hits we see Riis and the followup paper but nothing else that seems directly relevant. Nor does appending `-multiplication’ help screen out AFKL and our posts.
There is said to be empirical evidence for NCC. We wonder, however, whether that has reached the intensity of thought about circuit lower bounds. We say this because the implications from NCC make three giant steps:
So one side of our worry is whether NCC can actually shed light on so many fundamental issues from complexity theory, more than absorbing light. At the very least, AFKL have re-stimulated interest in all of these issues.
Is multiplication hard? Is NCC true? What other Boolean functions are AFKL functions? What about other consequences of the NCC for complexity theory?
An award for educational writing
[ ACM ]
Robert Sedgewick is the 2018 recipient of the ACM Outstanding Educator Award.
Today we congratulate Bob on this wonderful honor.
The award is named after Karl Karlstrom. Years ago, he was an editor at the publishing house Prentice-Hall. To convey why the award was named for him, it may suffice to quote one nugget. This is “Fortune 341” from the old motd (message of the day) program which gave some humor or wisdom when you logged into UNIX/Linux:
“I have travelled the length and breadth of this country, and have talked with the best people in business administration. I can assure you on the highest authority that data processing is a fad and won’t last out the year.”
— Editor in charge of business books at Prentice-Hall publishers, responding to Karl V. Karlstrom (a junior editor who had recommended a manuscript on the new science of data processing), c. 1957
Karl K. had a knack for being right, you see.
I recall Karl fondly. We mostly interacted when we were both attending some theory conference. I often found myself talking to him over a drink while we sat in a hotel bar. This was back, ages ago, when I did drink a beer or two. Karl was one we could count on to amuse and also—most importantly—pick up the bar tab. He had an expense account. The IEEE “Computer Pioneers” site says this about him:
Early computer science textbook editor who put Prentice-Hall in the forefront, but who lost heart when he learned that the best textbook criteria are short words, big type, wide margins, and colored illustrations. ACM named its education award after him.
The ACM award may be named for Karlstrom, but I suspect that many of the awardees, including Bob, never had the pleasure of meeting Karlstrom. Too bad.
I believe we all know why Bob was selected to get this award. He has done some wonderful work in many aspects of education. He is best known for his series of Algorithms textbooks. I thought it might be fun to recall a couple of Sedgewick stories that have nothing to do with his main work.
A big result. One day Bob grabbed me and told me that he had a wonderful result. This is when I was still at Princeton. I asked what the breakthrough was. He explained:
I now can do arrows really well. Really.
What? He explained that TeX and LaTeX did not do arrows well. This refers, of course, to arrows as in directed graphs or flow diagrams. Bob uses lots of diagrams, with lots of arrows, in his textbooks. He had worked hard on a PostScript hack that made arrows look great. Thus he could typeset an arrow so it looked perfect even when it touched another object. I listened and was unsure what to make of his claim. Was he losing it? He then showed me a print-out of some of his arrows. I have to say they really did look quite good.
A secret result. Bob and I worked for a while on a front-end to TeX we called notech. The concept was to have the absolute minimum of commands, and have the notech system figure out what you mean. For example, in an earlier system for typesetting from Bell Labs, called Troff, a new paragraph was marked by the command .PP. Thus
This is part of a paragraph.
.PP
And this is the start of the next paragraph.
This is ugly and TeX’s idea is much better. As you probably know, the start of a new paragraph is marked by a blank line. No ugly command like .PP.
What Bob and I did was to try and take this idea as far as possible. The system notech tried to guess line breaks, math displays, tables, verbatim for C code, text displays, and much more. It did this without using commands as much as possible. I used the system for years for all my papers and memos and notes. Eventually, I gave it up and switched to LaTeX like everyone else.
A public result. Bob also worked with me and my team in the 1980’s on systems for designing VLSI chips. One such paper was joint with Jacobo Valdes, Gopalakrishnan Vijayan, and Stephen North: VLSI Layout as Programming. The trouble with this and related work is that it never took off; it never had as much impact as we thought it would. Oh well.
We wish Bob the best. May he be awarded many other prizes.
Practice leads theory
Peyman Afshani, Casper Freksen, Lior Kamma, and Kasper Larsen have a beautiful new paper titled “Lower Bounds for Multiplication via Network Coding”.
Today we will talk about how practical computing played a role in this theory research.
The authors (AFKL) state this:
In this work, we prove that if a central conjecture in the area of network coding is true, then any constant degree boolean circuit for multiplication must have size $\Omega(n \log n)$, thus ~~almost~~ completely settling the complexity of multiplication circuits.
We added the strikeout because of the upper bound that we discussed recently here.
AFKL have conditionally solved a long-standing open problem: “How hard is it to multiply two $n$-bit numbers?” Their proof shows that a conjecture from practice implies a circuit lower bound. This is rare: using a conjecture from practice to solve a complexity open problem. We have used conjectures from many parts of mathematics, and from some parts of physics, to make progress, but drawing on experience with practical networking is strikingly fresh.
The authors AFKL explain the history of the multiplication problem. We knew some of the story, but not all the delicious details.
In 1960, Andrey Kolmogorov conjectured that the thousands-of-years-old $O(n^2)$-time algorithm is optimal and he arranged a seminar at Moscow State University with the goal of proving this conjecture. However only a week into the seminar, the student Anatoly Karatsuba came up with an $O(n^{\log_2 3})$ time algorithm. The algorithm was presented at the next seminar meeting and the seminar was terminated.
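Karatsuba's trick is short enough to show in full: split each number into high and low halves, and replace the four half-size multiplications of the grade-school recursion with three, recovering the middle term as $(x_h + x_l)(y_h + y_l) - x_h y_h - x_l y_l$. A minimal sketch:

```python
def karatsuba(x, y):
    """Multiply nonnegative integers in O(n^{log2 3}) bit operations."""
    if x < 10 or y < 10:
        return x * y
    n = max(x.bit_length(), y.bit_length())
    h = n // 2
    # split x = xh * 2^h + xl, and similarly for y
    xh, xl = x >> h, x & ((1 << h) - 1)
    yh, yl = y >> h, y & ((1 << h) - 1)
    # three recursive multiplications instead of four
    a = karatsuba(xh, yh)
    c = karatsuba(xl, yl)
    m = karatsuba(xh + xl, yh + yl) - a - c   # the middle term
    return (a << (2 * h)) + (m << h) + c

print(karatsuba(12345, 6789))  # 83810205
```

The recurrence $T(n) = 3T(n/2) + O(n)$ gives the $O(n^{\log_2 3}) \approx O(n^{1.585})$ bound that ended Kolmogorov's seminar.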
Ken and I wish we could have Kolmogorov’s luck, in one of our seminars. Partly because it would advance knowledge; partly because it would let us out of teaching. Sweet.
The main result of AFKL is:
Theorem 1 Assuming the Network Coding Conjecture, every general boolean circuit that computes the product of two $n$-bit integers has size at least order $n \log n$.
This says that the boolean complexity of multiplication is super-linear. No restriction of a bounded depth, no restriction on the operations allowed, no restrictions at all. Given our non-existent lower bounds this is remarkable. If it were unconditional, it would be a terrific result. But it is still a strong one.
We will next explain what the Network Coding Conjecture (NCC) is.
One of the basic papers was authored by Rudolf Ahlswede, Ning Cai, Shuo-Yen Li, and Raymond Yeung here. The paper has close to ten thousand citations, which would be amazing for a theory paper.
In basic networks each node can receive and send messages to and from other nodes. They can only move messages around—they are not allowed to peer into a message. The concept of network coding is to allow nodes also to decode and encode messages. Nodes can peer into messages and create new ones. The goal, of course, is to decrease the time required to transmit information through the network.
The following example combines figures from a 2004 paper by Zongpeng Li and Baochun Li which formulated the NCC. At left is a situation where two senders, one with an $n$-bit message $x$ and the other with an $n$-bit message $y$, wish to transmit to their respective receivers. The network’s links are one-way as shown, with two intermediate nodes, and each link can carry $n$ bits at any one time.
If $x$ and $y$ are black-boxes that must be kept entire, there is no way to solve this in three time steps. But if the nodes can read messages and do lightweight computations, then the middle diagram gives a viable solution. The first intermediate node reads $x$ and $y$ and on-the-fly transmits their bitwise exclusive-or to the second. That node relays this to each receiver, who has also received the other party’s message directly. The receivers can each do a final exclusive-or to recover the messages intended for them.
The ability to look inside messages seems powerful, and there are networks where it helps even more dramatically. Incidentally, as noted by Wikipedia, the exclusive-or trick was anticipated in a 1978 paper showing how the two senders can exchange their messages $x$ and $y$ by relaying them to a satellite which transmits back to each.
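The exclusive-or solution is easy to simulate. The node names in the comments are ours (the figure's labels did not survive into the text):

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(a, b))

def butterfly(x, y):
    """Simulate the coding solution on the butterfly-style network.

    Sender 1 holds x (for receiver 1), sender 2 holds y (for receiver 2).
    The first middle node XORs the messages; the second relays the XOR
    to both receivers; each receiver also hears the *other* sender
    directly, and a final XOR recovers its own message.
    """
    coded = xor_bytes(x, y)       # computed mid-network, relayed to both
    at_r1 = xor_bytes(coded, y)   # receiver 1 heard y directly
    at_r2 = xor_bytes(coded, x)   # receiver 2 heard x directly
    return at_r1, at_r2

got_x, got_y = butterfly(b"HELLO", b"WORLD")
print(got_x, got_y)  # b'HELLO' b'WORLD'
```

Each link carries a single $n$-bit packet per step, yet both messages cross the shared bottleneck simultaneously, which pure routing of whole packets cannot do here.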
However, there is another solution if the links are bi-directional and messages can be broken in half. The first sender simply routes half of $x$ one way around the network and the other half the other way. The second sender does similarly. This is shown at far right. Each link never carries more than $n$ bits of total load and the three-step elapsed time is the same. Moreover, the link between the two intermediate nodes is not needed. This is just an undirected network commodity flow with fractional units.
In fact, no example is known in an undirected network where encoding beats fractional routing. That is, the known network encoding rate is just the flow rate of the network. The network coding (NCC) conjecture is informally:
The coding rate is never better than the flow rate in undirected graphs.
The paper by Li and Li gave formal details and several equivalent statements. Quoting them:
For undirected networks with integral routing, there still exist configurations that are feasible with network coding but infeasible with routing only. For undirected networks with fractional routing, we show that the potential of network coding to help increase throughput in a capacitied network is equivalent to the potential of network coding to increase bandwidth efficiency in an uncapacitied network. We conjecture that these benefits are non-existent.
What has become of the NCC in the fifteen years since? Here’s how Ken and I see it:
The Good News: The NCC helps solve long-standing open problems. Since this conjecture is widely believed this is impressive. Besides integer multiplication, NCC has been used to prove other lower bounds. For example, Larsen working with Alireza Farhadi, Mohammad Hajiaghayi, and Elaine Shi used it to prove lower bounds on sorting with external memory.
The Bad News: The NCC helps solve long-standing open problems. This suggests that this conjecture could be deep and hard to resolve. The boolean complexity of integer multiplication is a long standing open question. Since the NCC leads to a non-linear lower bound, perhaps proving this conjecture could be hopeless.
I have mixed feelings about these lower bound results. They are impressive and shed light on hard open problems. But I wonder if the NCC could be wrong. There is a long history in complexity theory where guesses of the form:
The obvious algorithm is optimal
have failed. The situation strikes us as resembling that of the (Strong) Exponential Time Hypothesis, in ways we discussed four years ago.
The authors AFKL did not know that an $O(n \log n)$ upper bound had been proved for integer multiplication when they posted their paper. They did, however, prove a stronger version of Theorem 1 for a problem with a known upper bound. This is to create circuits that, given an $n$-bit string $x$ and a binary number $i$, output the string whose bits $i+1$ through $i+n$ equal $x$, with all other bits $0$. A conditional lower bound on this shift task implies the same for multiplication, since the shift is the same as multiplication by $2^i$.
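The equivalence in the last sentence is easy to sanity-check: shifting an integer's bits up by $i$ positions is exactly multiplication by $2^i$.

```python
def shift_task(x, i):
    """The shift task on integers: move x's bits up i positions,
    which is the same as multiplying x by 2**i."""
    return x << i

x, i = 0b1011, 3
assert shift_task(x, i) == x * 2 ** i  # shifting == multiplying by a power of 2
print(bin(shift_task(x, i)))           # 0b1011000
```

So any circuit that multiplies two $n$-bit numbers can in particular multiply $x$ by $2^i$, and a lower bound for the special case transfers to the general problem.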
Theorem 2 Assuming the NCC, circuits for the shift task need size order $n \log n$.
The proof is disarmingly elementary: The input gives “senders” and each value of $i$ creates a different set of “receivers.” With a circuit fixed, they show one can fix a shift so that the average distance from sender to receiver in an undirected multi-commodity flow is $\Omega(\log n)$, giving $\Omega(n \log n)$ total flow. If the circuit achieves smaller size, then it represents a counterexample to the NCC.
Pretty neat—this is half a page in the paper. The paper proves more intricate results relating to conjectures by Les Valiant about Boolean circuits of bounded fan-in and depth that compute permutations and their reduction to depth-3 circuits of unbounded fan-in. This also may extend to the sorting/shifting problem Ken wrote about long ago in a guest post for Lance Fortnow and Bill Gasarch’s blog.
Is the NCC true? Can it be proved for some interesting classes of graphs? I believe it is known for tiny size graphs of at most six nodes. What about, for example, planar graphs?
[inserted “conditionally” before “solved” in intro]
[Fixed AFKL typos]
[ Russell and Whitehead ]
Bertrand Russell and Alfred Whitehead were not primarily trying to mechanize mathematics in writing their famous book. They wanted to assure precision and certainty in proofs while minimizing the axioms and rules they rest on. They cared more about checking proofs than generating theorems. By the way: They are listed in the order Whitehead and Russell on the book. See this for a discussion about the importance of the order.
Today Ken and I thought we would add a few more thoughts on why proofs get checked.
We discussed those who claim proofs in our previous post. Once a proof is claimed, it needs people to check it. This is not as fraught as the replication crisis in other sciences where “proof” is a statement of statistical significance whose most intensive check needs repeating the experiment.
If you do a Google search on “why check proofs” you get lots of hits on using automated proof checkers. Coming on eleven decades after the publication of Russell and Whitehead’s three-volume opus Principia Mathematica, these are still in their formative years. We covered a major system of this kind some years ago.
We are personally more interested in what motivates us humans to check proofs. We believe that there are various factors that make it less or more likely to find a good human checker. So today we will try to list some of them.
One of the questions that was raised by some commenters to our recent post is: Why should I check your proof?
This is a critical question. If there is no reason to check your proof, then your result will not get checked. It is almost a tautology. We like this question and thought we could suggest several ways to increase the likelihood that someone will check another person’s proof.
So let’s assume that Alice is claiming some new theorem and we ponder whether Bob will spend time checking it.
This happens when Bob is required to check her proof. This can happen if Bob is a referee of her paper. It could also be when Bob is hired to do this task. It usually is a weak reason for making someone do the checking. In real life we think that it is unlikely to be a strong motivator.
This happens when Bob feels that he will benefit from checking. The main type of situation here is: Alice’s theorem uses some new method or trick. If Bob believes that this method can be used in his work, in his research, in his future papers, then he is strongly motivated.
We are all very self-centered in our research. If we think we could in the future use your method, we are likely to spend time and energy on your proof. Thus if Bob is convinced that Alice has some new ideas, he is much more likely to spend the time checking her theorem. This means that Alice should—if possible—explain that her proof uses something new. Proofs that are “just technical inductions” are very unlikely to get Bob to read them. In many areas some authors have stated things like: The proof is a careful induction… This is not a good idea.
This happens when Bob has some “skin” in the game. A classic situation is when Bob has an earlier result that is affected by Alice’s new theorem. If her theorem is stronger than Bob’s previous result, then he is motivated to check it. Or if it shows that his earlier theorem is false, that is a very strong motivation. Or perhaps Alice has proven a lemma that enables Bob to push something through.
Often we have situations where you do have skin in the game. An old example that comes to mind is from group theory. The problem is a natural question about a class of groups: Let $B(m,n)$ be the class of groups that are generated by $m$ elements in which every element $x$ satisfies $x^n = 1$. Sergei Adian and Pyotr Novikov proved that $B(m,n)$ is infinite for $m \ge 2$ and sufficiently large odd $n$, by a long complex combinatorial proof in 1968. This is a famous result.
Shortly after, another group theorist, John Britton, claimed an alternative proof in 1970. Unfortunately, Adian later discovered that Britton’s proof was wrong. I do not have first-hand information, but I was told that Adian was motivated by wanting to have the proof. He worked hard until he discovered an unrepairable bug in Britton’s 300-page monograph. The proof was unsalvageable.
A much newer example is from a recent book by Shing-Tung Yau, The Shape of a Life. He is a famous geometry expert and has made many important contributions to many areas of mathematics. We will probably discuss his book in detail in the future, but for today it has a neat example of “skin in the game”. He writes about an enumeration problem of counting how many curves lie on a certain manifold—a century-old problem. One group used a clever trick to get the number $317{,}206{,}375$.
However another group discovered via a different method that the count was $2{,}682{,}549{,}425$.
Quite a different count—not even close. Clearly, both sets of authors were heavily motivated to check their work. And within a month the larger count was found to be wrong and the first was correct.
This is from the wonderful P vs NP pages of Gerhard Woeginger. It was pointed out to us by the commenter gentzen. Quoting Woeginger’s page, including its use of “showed”:
In February 2016, Mathias Hauptmann showed that P is not equal to NP. Hauptmann starts from the assumption that P equals NP, proves a new variant of the Union Theorem of McCreight and Meyer, and eventually derives a contradiction. This implies P not equal to NP.
Woeginger gives a link to Hauptmann’s paper, “On Alternation and the Union Theorem,” and thanks two people who communicated this to him.
The union theorem of Albert Meyer and Edward McCreight is the classic theorem that shows how to encode many complexity classes into one. Hauptmann’s idea is not unreasonable. He makes an assumption that P=NP and tries to use it to improve the union theorem. This is a nice idea: Make a strong assumption and then try to improve a deep result. The hope is that this will lead to a contradiction. His abstract ends by saying, “Hence the assumption cannot hold.” We do not know if this paper has received a thorough reading. Update: We have learned that a pair of experts reviewed the argument and found that part of it implied a contradiction to the deterministic time hierarchy theorem, while another part relativizes in a way that would yield a false statement under certain oracles.
Hauptmann is a colleague of Norbert Blum at the University of Bonn. Two years ago, Blum claimed to prove P $\neq$ NP by making technical improvements on a well-known circuit-based attack from the 1980s and 1990s. He has a long track record of expertise and reliability in this area and his paper was read right away.
The reading was helped by his paper being well-organized, straightforward, and relatively short—the crucial segment was under ten pages. The news broke while we were preparing a post on the August 2017 total solar eclipse in the US. In the 24–48 hours it took us to modify our post, we were already able to draw on several accounts by first-responder readers and check those accounts ourselves against the paper.
The error was triangulated in an interesting way. It was first observed that if Blum’s attack could succeed by the means and premises stated, then it would extend to prove something else that is known not to be true. Once this was ascertained, a closer reading was able to zero in on the exact technical point of error. Blum soon acknowledged this and that the breach was unfixable. The attempt still combines circuit theory and graph theory in ways a student can benefit from learning about, and this furnished its own incentive to read it.
We appreciate the comments on the previous post and hope this adds some additional insights.
[added update about Hauptmann’s paper]
[ The Claimers ] |
The Claimers are a gang on the hit AMC television series The Walking Dead. They are the main antagonists in the second half of the zombie-apocalypse show’s Season 4. According to Wikipedia’s description, they “live by the philosophy of ‘claiming’.”
Today Ken and I discuss issues about ‘claiming’ and give advice on how to present your claims—or not.
Yes, this post is about our own “claimers” in complexity theory. It is especially about those who claim to have a solution to P=NP. We will not give any names today. You know who you are.
The TV Claimers meet a grisly end. We will not say any more about it. We want to be nice. But we would like to not keep seeing the same level of zombie claims raised again and again.
Please: Do not stop reading. Yes we know that it is likely that no claimer really has such a proof. However, our suggestions apply to all of us when we have a non-trivial result. Especially a result that has been open, even if the result is not a major open problem. So please keep reading today.
This is a list of ideas for anyone who claims to have solved P=NP or some similar hard open problem in mathematics. There are already lots of suggestions online about what you should do, so this is just a list of additional thoughts. We hope they are helpful.
In order to succeed in mathematics research one has to be a bit arrogant. It is quite difficult to prove new things without some swagger. However, proving or resolving P=NP requires a very non-humble attitude. I think many claimers have not thought about how arrogant they are being. The P=NP problem is a huge open problem. Thousands and thousands of researchers have spent years thinking about it. Why do you, the claimer, think you see the light and we remain in the dark?
It might be useful for the claimers to ponder: Why did I succeed where all others have failed? It might be useful to be a bit humble and at least consider what they saw that we all missed. If they can say something like:
The reason I succeeded in finding an algorithm for P=NP is that I noticed that … No one else seems to see that this insight is very powerful. It is very useful since it implies …
I think the vast majority of claimers of P=NP or other big results have almost always worked alone. This is okay, but the average number of authors of a theory paper these days is pretty large. So any paper that is solo-authored, perhaps, leads the community to think it is unusual—and is wrong. Another point is that being part of a team may help control the arrogance. It can also be invaluable in detecting errors.
There is an advantage in “proving” P=NP over other major problems. There is the Clay prize of a million dollars. I wonder if claimers could use the prize money in some interesting way. How about saying: If you read my proof and repair it or make it more readable and it is correct, then you get something. A certain dollar amount. Or a percentage of the prize. Or—you get the idea.
Many claimers have also supplied working code for their algorithm. That is, they also supply a program that claims to solve some NP-complete problem. I have several thoughts about this. In some cases it seems that it could be possible to have code that works for small-size problems, but not in the general case. This seems to be possible for the claims by some that they can solve the Traveling Salesman Problem, for example. Their algorithm could be correct for small instances.
Mathematics is filled with surprises like: an effect works for all values of $n$ less than some bound. If the claimers give a program, our expectation based on experience is that it may work for small cases but will probably fail in general.
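To make the last point concrete, here is a minimal sketch (our own illustration, not any claimer’s actual code) of a brute-force SAT solver. It is perfectly correct, and on toy instances it even looks fast, but its running time doubles with every added variable:

```python
from itertools import product

def brute_force_sat(clauses, n):
    """Clauses use DIMACS-style signed integers: 3 means x3, -3 means NOT x3.
    Tries all 2^n assignments, so it is only feasible for small n."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits  # a satisfying assignment
    return None  # unsatisfiable

# (x1 or x2) and (not x1) and (x2) is satisfied by x1=False, x2=True
print(brute_force_sat([[1, 2], [-1], [2]], 2))  # -> (False, True)
```

A program like this “solves” SAT in exactly the sense that worries us: correct on every small case a claimer might test, exponential in general.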
The last point is that working code could actually be valuable. If the code can be used to solve SAT problems how about using your program to enter a SAT contest and win it. A win, or even a good showing, would help tremendously in convincing people to read the paper. Or use the code to break some known cryptosystem. That would also convince people that they need to read your paper.
Okay we all dream about solving a major open problem. Or even a minor one. Here we give an outline of how to write up such a paper.
I would suggest that you not have any statements about why P=NP is an important problem. None. No history of the problem. No literature survey is needed. None. Your goal is to get an expert to read and believe the proof. They will just skip over the above. Also please no statements of how your algorithm that solves P=NP is going to change the world. Just give us the proof.
This is really hard. Hard. I have read a number of claimers’ papers. I try to be helpful. However, many of us do not have the time to look at such papers. Years ago, before Fermat’s Last Theorem was solved, a famous mathematician once made up a post-card that looked like this:
I think that we all have a mental version of this card. There are definitely ways to help induce someone to read a paper and its proof. Look at some recent top theory papers. Even the authors of these papers, often well known authors, work hard to motivate potential readers. The authors often do several things:
They often sketch the proof. By leaving out details they may help get a reader interested.
They often explain the new trick—or tricks. The goal here is to explain some new insight that is used in the proof. We are very self-oriented: If I see that your new trick could be useful in my research that is a huge motivator for me to understand the proof. People are very excited about a strong result, but they are even more excited about a new trick. Explain what is new in your proof. If there is nothing new, no new trick or method, then hmmmm
They often first prove a weaker result. That is, they show that their method can already make progress. If you could prove that the zeta function has all its nontrivial zeros on the critical line, that would be apocalyptic—the famous Riemann Hypothesis, of course. But if you could merely prove that there is no zero in some new region, then that would still be wonderful. And also probably more believable. If you could prove that there is no zero with its real part above $1 - \delta$ for some fixed $\delta > 0$, that would be huge. If the proof of this is simpler, then use it to get readers excited about your full result.
An observation related to the last point is:
Why do all claims of progress on P=NP give a polynomial time bound?
How about just getting a better bound of say $c^n$ for some $c < 2$ for the Traveling Salesman Problem? Or a better bound for factoring? Or a better bound for your favorite problem?
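For perspective on what a better exponent would mean, recall the classical dynamic program of Michael Held and Richard Karp, which solves TSP exactly in $O(2^n n^2)$ time; improving the base of the exponent below $2$ remains open. Here is a compact sketch of it:

```python
from itertools import combinations

def held_karp(dist):
    """Exact TSP via the Held-Karp dynamic program, O(2^n * n^2) time.
    dist is an n x n matrix of inter-city distances; the tour starts at city 0."""
    n = len(dist)
    # dp[(mask, j)]: cheapest path starting at 0, visiting exactly the
    # cities in the bitmask `mask`, and ending at city j.
    dp = {(1 << 0 | 1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(3, n + 1):
        for subset in combinations(range(1, n), size - 1):
            mask = 1 | sum(1 << j for j in subset)
            for j in subset:
                prev = mask ^ (1 << j)
                dp[(mask, j)] = min(dp[(prev, k)] + dist[k][j]
                                    for k in subset if k != j)
    full = (1 << n) - 1
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))

print(held_karp([[0, 1, 15, 6], [1, 0, 7, 3], [15, 7, 0, 12], [6, 3, 12, 0]]))  # -> 26
```

The exponential number of bitmasks is where the $2^n$ comes from; any exact algorithm with a strictly smaller base would already be a real advance.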
I hope these points help. One last pointer is to double-check dependencies. If your proof relies on a result by someone else, make sure the result really gives what you need. Terms may be defined differently from what you expect, or you may really need a feature of the proof rather than the mere statement. A mis-attributed result can become “undead.”
Valentine Kabanets is a famous complexity theorist from Simon Fraser University. He has been at the forefront of lower bounds for over two decades.
Today we draw attention to this work and raise an idea about trying to unravel what makes circuit lower bounds hard.
He is the common author on two new papers on the Minimum Circuit Size Problem (MCSP), which belongs to NP but is not known to be NP-complete or in P. We posted on MCSP four years ago and mentioned his 1999 paper with Jin-Yi Cai, which gives evidence for MCSP truly being neither NP-complete nor in P. This “intermediate” status and the problem’s simplicity have raised hopes that direct attacks might succeed. The new papers prove direct lower bounds against some restricted circuit/formula models, including constant-depth circuits with mod-$p$ gates for $p$ prime. But they stop short of mod-$m$ for $m$ composite and other barrier cases.
He has a nifty research statement on his home page. It shows how derandomization, pseudorandomness, circuit complexity, and crypto combine into his two current projects. In a clickable tab for the third heading, he puts the meta-issue in pithy terms:
Why is proving circuit lower bounds so difficult?
His first answer tab states a connection we have also often emphasized here:
Traditionally, designing efficient algorithms is the subject of the theory of algorithms, while lower bounds are sought in complexity theory. It turns out, however, that there is a deep connection between the two directions: better algorithms (for a certain class of problems) also yield strong lower bounds (for related problems), and vice versa: strong lower bounds translate into more efficient algorithms.
Of course we agree, and we love connections shown in the new papers to problems such as distinguishing a very slightly biased coin from a true one. But we will try to supplement the algorithmic view of circuit lower bounds with a direct look at the underlying logic.
Okay, we all know that circuit lower bounds are hard. For all Kabanets’ success and beautiful work, he—like the rest of the complexity field—is unable to prove what we believe is true. No one can prove, in the full circuit model, anything close to what has been believed for at least half a century: there are explicit Boolean functions that cannot be computed by any linear-size circuit.
We feel that the logical structure of lower bounds statements gives insight into their difficulty. Perhaps this is almost a tautology. Of course the logical structure of any mathematical statement helps us understand its inherent difficulty. But we believe more: That this structure can reveal quite a bit about lower bounds. Let’s take a look at lower bounds and see if this belief holds up.
In particular let’s compare the two main approaches to proving lower bounds: non-uniform and uniform. Our claim is that they have different logical structure, and that this difference explains why there is such a gap between the two. While lower bounds—non-uniform or uniform—are hard, uniform ones are at least possible now. Non-uniform lower bounds are really very difficult.
Here is one example. To prove an explicit size lower bound for Boolean circuits—we’ll be content with just a linear one—we must give a particular family of Boolean functions $f_n$ (each of $n$ inputs) so that:

$$\mathsf{C}(f_n) \ge c \cdot n.$$
Here $c$ is a constant and $n$ is assumed to be large enough. The terrific paper of Kazuo Iwama, Oded Lachish, Hiroki Morizumi, and Ran Raz gives explicit Boolean functions whose size for circuits with the usual not and binary and and or operators exceeds $5n - o(n)$.
Let’s look at the above example more carefully. Suppose that in place of a single Boolean function on $n$ inputs we have a list of them:

$$f_n^{(1)}, f_n^{(2)}, \ldots, f_n^{(m)}.$$

Can we prove the following?

$$\max_{1 \le i \le m} \mathsf{C}\left(f_n^{(i)}\right) \ge c \cdot n.$$
The first thing to note is the effect of letting the number $m$ of functions vary:
If $m = 1$, this just becomes our original explicit circuit lower bound problem.
If $m$ is a huge value, however, this becomes the exponential lower bound shown by Claude Shannon—a known quantity.
In our terms, the latter takes $m$ equal to $2^{2^n}$, so that the given function list is just the list of all Boolean functions on $n$ inputs. If all we care about is an $\Omega(n)$ lower bound, then the high end of the range can be something like $n^{O(n)}$, the number of circuits of size $cn$. So at the high end we have a simple counting argument for the proof but have traded away explicitness. The question will be about the tradeoffs for $m$ in between the extremes.
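The counting at the high end is simple enough to check by machine. In the sketch below the circuit count is a deliberately crude over-count (the 16 gate types and the encoding are our own assumptions; only the doubly exponential gap matters):

```python
def num_functions(n):
    """Number of Boolean functions on n inputs."""
    return 2 ** (2 ** n)

def num_circuits_at_most(n, s, gate_types=16):
    """Crude upper bound on circuits with s gates: each gate picks one of
    gate_types binary operations and two predecessors among the n inputs
    and the other gates."""
    return (gate_types * (n + s) ** 2) ** s

# Already at n = 8, circuits with only n gates cannot compute all functions:
print(num_circuits_at_most(8, 8) < num_functions(8))  # -> True
```

As $n$ grows, the same comparison forces most functions to need roughly $2^n/n$ gates, which is Shannon’s bound.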
The above idea that we can model the lower bound methods by controlling the length of the list of the functions is the key to our approach. Perhaps it may help to note an analogy to other famous hard problems of constructing explicit objects. In particular, let’s look at constructing transcendental numbers. Recall these are real numbers that are not algebraic: they are not roots of polynomials with integer coefficients. They include $e$ and $\pi$.
The Liouville numbers of Joseph Liouville.
These are explicit numbers that were proved by him in 1844 to be transcendental. In terms of our model, $m = 1$.
The great $e + \pi$ and $e\pi$ puzzle. This is the observation that of $e + \pi$ or $e\pi$, at least one is a transcendental number. In our terms this gives $m = 2$.
The famous theorem of Georg Cantor—read as proving the existence of transcendental numbers since algebraic ones are countable.
Here the high end of the range is as extreme as can be. Cantor’s ‘list’ of numbers is uncountable—in our model, $m$ is the cardinality of the real numbers. Note, the fact that his $m$ is huge, really huge, may explain why some at the time were unimpressed by this result. They wanted the ‘list’ to be small; actually they wanted $m = 1$. See this for a discussion of the history of these ideas.
The theorem by Wadim Zudilin, in a 2001 paper, that at least one of the numbers $\zeta(5), \zeta(7), \zeta(9), \zeta(11)$ must be irrational. It is for “irrational” not “transcendental,” but it exemplifies the model in a highly nontrivial manner. The technical point that makes this work is interactions among these numbers that cannot be captured just by considering any one of them separately. This has $m = 4$.
The issue is this: Suppose that we have a list of several boolean functions $f_1, \dots, f_m$, each on $n$ inputs. Then we can join them together to form one function $F$ so that

$$F(i, x) = f_i(x).$$

Clearly if the function $F$ is easy then all of the $f_i$ are easy. This join trick shows that we can encode several boolean functions into one function. Note, we can even make $F$ have only order $n + \log m$ inputs, where the index $i$ has $\log m$ bits.
Thus we can join any collection of functions to make a “universal” one that is at least as hard as the worst of the single functions. More precisely,

$$\mathsf{C}(F) \ge \max_{1 \le i \le m} \mathsf{C}(f_i).$$

Here $\mathsf{C}(f)$ is the circuit complexity of the boolean function $f$.
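The join and the hardwiring argument behind the inequality are easy to demonstrate in code. A toy version (our own illustration), with the index packed into $\lceil \log_2 m \rceil$ leading bits:

```python
import math

def join(funcs):
    """Join m boolean functions of n bits each into one function F of
    k + n bits, where k = ceil(log2 m), with F(index_bits + x) = funcs[index](x)."""
    k = max(1, math.ceil(math.log2(len(funcs))))
    def F(bits):
        index = int("".join("1" if b else "0" for b in bits[:k]), 2)
        return funcs[index](bits[k:])
    return F, k

f_and = lambda x: x[0] and x[1]
f_xor = lambda x: x[0] != x[1]
F, k = join([f_and, f_xor])
print(F((False, True, True)))  # index 0 -> AND(True, True) -> True
print(F((True, True, False)))  # index 1 -> XOR(True, False) -> True
```

Fixing the $k$ index bits of any circuit for $F$ to the constant $i$ yields a circuit for $f_i$ that is no larger, which is exactly the inequality above.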
If $\log m$ is bigger than a constant times $n$, that is if $m$ is bigger than $2^{cn}$, then the joined function has more than linearly many variables. Can we possibly establish nontrivial interactions among so many functions?
One can also try to get this effect with fewer or no additional variables by taking the XOR of some subset of functions in the list. If this is done randomly for each input length $n$ then one can expect hard functions to show up for many $n$. If this process can then be de-randomized, then this may yield an explicit hard function. We wonder how this idea might meld with Andy Yao’s famous XOR Lemma and conditions to de-randomize it.
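A toy version of the subset-XOR idea (our own sketch; we return the chosen subset so the combination can be checked):

```python
import random

def random_xor_combo(funcs, seed=0):
    """Pick a random subset of the functions and return their XOR,
    along with the subset itself for inspection."""
    rng = random.Random(seed)
    subset = [f for f in funcs if rng.random() < 0.5]
    def g(x):
        value = False
        for f in subset:
            value ^= f(x)
        return value
    return g, subset

f0 = lambda x: x[0]
f1 = lambda x: x[1]
g, subset = random_xor_combo([f0, f1], seed=2019)
# g agrees with the XOR of exactly the chosen subset on every input:
for x in [(a, b) for a in (False, True) for b in (False, True)]:
    expected = False
    for f in subset:
        expected ^= f(x)
    assert g(x) == expected
```

The open question in the text is whether the random choice of subset can be replaced by an explicit one.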
Ken and I thought about the above simple fact about joins, which seems special to functions. Joining by interleaving the decimal expansions is not an arithmetic operation. However, it appears that there may be a similar result possible for transcendental numbers.
Lemma 1 Suppose that $\alpha$ and $\beta$ are real numbers. Then

$$\gamma = \alpha + \beta i$$

is a transcendental complex number if at least one of $\alpha$ or $\beta$ is transcendental.
Proof: Suppose instead that $\gamma = \alpha + \beta i$ is an algebraic number. Thus there must be a polynomial $p(z)$ with integer coefficients so that

$$p(\gamma) = 0.$$

Then it follows by complex conjugation that

$$p(\bar{\gamma}) = 0.$$

Therefore $\gamma$ and $\bar{\gamma}$ are both algebraic; thus, so is their sum, which is $2\alpha$. Thus $\alpha$ is algebraic. It follows that $\beta$ is also algebraic, since $\beta i = \gamma - \alpha$ is algebraic and the algebraic numbers form a field containing $i$. This shows that if at least one of $\alpha$ or $\beta$ is transcendental, then $\gamma$ cannot be algebraic; that is, $\gamma$ is transcendental.
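The heart of the argument is just that $\alpha$ and $\beta$ are rational expressions in $\gamma$, $\bar{\gamma}$, and $i$:

```latex
\gamma = \alpha + \beta i, \qquad \bar{\gamma} = \alpha - \beta i
\qquad\Longrightarrow\qquad
\alpha = \frac{\gamma + \bar{\gamma}}{2}, \qquad
\beta = \frac{\gamma - \bar{\gamma}}{2i}.
```

Since the algebraic numbers form a field containing $i$, if $\gamma$ is algebraic then so are $\bar{\gamma}$, $\alpha$, and $\beta$.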
A question: Can we show that we can do a “join” operation for three or more numbers? That is, given numbers $\alpha_1, \dots, \alpha_k$, can we construct a number $\gamma$ that is transcendental if and only if at least one of $\alpha_1, \dots, \alpha_k$ is transcendental?
Is the model useful? Is it possible for it to succeed where a direct explicit argument ($m = 1$) does not? Does it need to rise above technical dependence on the $m = 1$ case via the join construction?
Maybe the barriers just need 3-Dan martial-arts treatment. Source—our congrats. |
[ Her own hall ] |
Anita Jones is, of course, a famous computer scientist, a famous Carnegie Mellon University graduate, and a decorated past government official. She was the Director for all U.S. Defense Department research—except nuclear—from 1993 to 1997. She has won numerous awards—perhaps the coolest is that the U.S. Navy honored her by naming a seamount in the North Pacific Ocean for her. Here is what Google Maps returns when given the coordinates 51° 25′ N and 159° 10′ W.
Today I want to discuss a recent paper of hers.
Okay, not so recent. Well, it was published in 1976. It is a joint paper of ours. The topic was information flow with applications to security and cryptography: “The Enforcement of Security Policies for Computation.” Anita and I thought that it was a pretty neat paper. The paper was eventually accepted to the operating systems conference (SOSP) in 1975 and later appeared in the JCSS journal—see here.
Two comments on our paper. It was indeed accepted to SOSP, but the acceptance was complicated. It might make a good story, but it happened so long ago that I doubt anyone would be interested. The paper also totally missed the boat. We had a good idea, a good topic, but we totally missed getting the right model. Totally.
Here is the boat we missed:
We missed it because we looked for a deterministic theory of information flow. An application of the theory was to be to security and cryptography. It was at least partially successful in explaining several puzzles that were around then. But we missed seeing that the best theory had to be probabilistic. That a good theory of information flow needs to argue about probabilities… We missed this totally.
So we are going to introduce a probabilistic theory of… No. We are still interested in deterministic information flow. We wish to introduce the notion of seeing a value. The intuition is that often in making arguments we may need to say that we can “see” a certain value. This happens naturally in arguments in cryptography. For example, we can express that a function $f$ is invertible as: if we can see the value of $f(x)$, then we can see the value of $x$. Let $\odot(e)$ mean that we can see the value of $e$. Think of the symbol $\odot$ as an “eye” that can see the value of the expression $e$. Thus

$$\odot(f(x)) \implies \odot(x)$$

means that $f$ is an invertible function. Rather than formally define $\odot$, let’s give some simple rules that should explain the notion.
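While we leave the notion informal, one way to prototype “seeing” is as a closure computation: start from the expressions whose values we can see, and apply rules of the form “if we can see these, then we can see that” until nothing new appears. A toy sketch (the encoding of expressions as strings and the sample rules are our own):

```python
def seeing_closure(seen, rules):
    """seen: set of expressions we can see (as strings).
    rules: list of (premises, conclusion) pairs meaning
    'if every premise is seen, then the conclusion is seen'."""
    seen = set(seen)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in seen and all(p in seen for p in premises):
                seen.add(conclusion)
                changed = True
    return seen

# 'f is invertible': seeing f(x) lets us see x.
# A subtraction-style rule: seeing x and y lets us see x+y.
rules = [(("f(x)",), "x"),
         (("x", "y"), "x+y")]
print(sorted(seeing_closure({"f(x)", "y"}, rules)))  # -> ['f(x)', 'x', 'x+y', 'y']
```

The subtraction step in the application below is exactly such a rule.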
Here is a real application to demonstrate the notion of “seeing”. It comes not directly from security but from a 2017 paper by Saminathan Ponnusamy and Victor Starkov on the Jacobian Conjecture. We have however covered cryptographic aspects of the conjecture. The authors look at functions of the form
where and
They give a proof that this function is invertible. Here is one using our “seeing” notion:
Then we get that
and recall that
Thus by subtraction it follows that we can see $x_j$ for each $j$. This proves that the function is invertible.
Note, the proof in the above paper is quite short. However it is a bit complicated and we believe that the above proof is quite transparent.
Is this notion of any use? Also how hard will it be to formally define this notion?