Cropped from tribute by Tom Leighton |
Ron Graham just passed away Monday at the age of 84 in La Jolla near UCSD.
Today Ken and I wish to say a few words about Ron.
Tributes are being written as we write, including this from the Simons Foundation. Here is the American Mathematical Society announcement, which we saw first:
Ron Graham, a leader in discrete mathematics and a former president of both the AMS (1993-1994) and the MAA (2003-2004), died on July 6. He was 84. Graham published more than 350 papers and books with many collaborators, including more than 90 with his wife, Fan Chung, and more than 30 with Paul Erdős. He was known for his infectious enthusiasm, his originality, and his accessibility to anyone who had a mathematics question.
A tribute by Brady Haran embeds several short videos of Ron and his work. Fan’s own page for Ron has much more. We have made a collage of images from his life:
Ron was special and will be greatly missed by all. We at GLL send our thoughts to his dear wife, Fan. Ken and I knew Ron for many years. Ken had known Ron since a visit to Bell Labs in the 1980s and met Fan too at STOC 1990. I had known Ron since I was at Yale in the 1970s—a long time ago. I recall fondly meeting him for the first time when he was at Bell Labs.
Ken and I thought we would give some personal stories about Graham.
Ken’s story is told here. In breaking a confidence by telling Erdős the secret about Bobby Fischer recounted there, Ken hoped that it would spread behind the scenes to enough people that Fischer would be less blamed for failing to play Anatoly Karpov in 1975. Since Erdős was staying with the Grahams, presumably it would have emerged there. The social excursion during STOC 1990 was a dinner cruise in Baltimore’s harbor. Ron and Fan and Ken found each other right away, and some questions to Ken about chess quickly went to the Fischer topic. At least Ken knows the secret was retold at least once.
Ron told me once that he was the accountant for Erdős. One of Ron's jobs was to keep track of the prize money that Erdős owed. Ron would send out the checks to whoever solved the next problem. One of the brilliant insights of Erdős was to make the problems hard, but at least some were solvable. Ron told me that for years no one would actually cash the checks. They would frame them and proudly display them.
Ron said that he liked this for the obvious reason—less cash for Erdős to have to pay. But the advent of color xerox machines in the 1970’s changed this. He told me that people began cashing the checks and displaying the color copy. Bummer.
My first talk at Bell Labs was on my work on the planar separator theorem—joint work with Bob Tarjan. At the beginning of the talk I saw that Ron had a pile of papers on his desk. He was a manager and I guessed he had some paperwork to do. I gave my talk. At the end I went up to Ron in the back and he said:
I did not get any work done.
I still fondly remember that as high praise.
Graham loved to do handstands. I recall walking around Bell Labs one day when out of the blue Ron did a full handstand. He said that he liked to do these on the handrail of the stairs. The trick, he said, was: "To not fall down."
I searched for him doing handstands and found out that he and Fan lived in a beautiful modern house.
When two mathematicians found a circular home designed by architect Kendrick Bangs Kellogg in La Jolla, they treasured their unique discovery.
Ron kept a simply organized page of all his papers. They are not sorted by subject or kind, but the titles are so descriptive that you can tell at a glance where the fun is. A number of them are expositions in the popular magazines of the AMS and MAA.
Among them, we’ll mention this note from 2016, titled “Inserting Plus Signs and Adding.” It is joint with Steve Butler, who penned his own reminiscence for Lance and Bill’s blog, and Richard Strong.
Say that a number $m$ is "reducible" to a number $m'$ in one step (in base $b$) if there is a way to insert one or more $+$ signs into the base-$b$ representation of $m$ so that the resulting numbers add up to $m'$. For example, 1935 is reducible to 99 via $1 + 93 + 5$. The number 99 reduces only to 18 via $9 + 9$, and 18 reduces only to 9, which cannot be reduced further. Thus Ron's birth year took three reduction steps to become a single digit. However, doing $1 + 9 + 3 + 5$ gives 18 straightaway and thus saves a step. The paper gives cases where inserting $+$ everywhere is not a quickest way to reduce to a single digit.
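The reduction game is easy to play by computer. Here is a small sketch (base 10 only; the function names are ours, not from the paper) that enumerates all one-step reductions and finds the fewest steps to reach a single digit:

```python
from itertools import product

def reductions(n: int) -> set[int]:
    """All numbers reachable from n in one step by inserting plus signs."""
    digits = str(n)
    results = set()
    # For each gap between digits, choose whether to insert a '+'.
    for cuts in product([False, True], repeat=len(digits) - 1):
        if not any(cuts):
            continue  # must insert at least one plus sign
        parts, current = [], digits[0]
        for d, cut in zip(digits[1:], cuts):
            if cut:
                parts.append(current)
                current = d
            else:
                current += d
        parts.append(current)
        results.add(sum(int(p) for p in parts))
    return results

def min_steps(n: int) -> int:
    """Fewest reduction steps from n to a single digit."""
    steps, frontier = 0, {n}
    while not any(m < 10 for m in frontier):
        frontier = {r for m in frontier for r in reductions(m)}
        steps += 1
    return steps
```

For 1935 this confirms the discussion: the path through 99 takes three steps, but inserting every plus sign reaches 18 and then 9 in two.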
Definition 1 For any base $b$ and number $n$ denoting an input length, not a magnitude, define $f_b(n)$ to be the least integer $s$ such that all base-$b$ numbers of length $n$ can be reduced to a single digit within $s$ steps.
The question—of a complexity theoretic nature—is:
Given $b$, what is the growth rate of $f_b(n)$ as $n \to \infty$?
Here are some possible answers—which would you expect to be correct in the case where $b = 10$?
Your expectation might be wrong—see the paper for the answer and its nifty proof. For a warmup, if you want to answer without looking at the paper, prove that the final reduced digit is the same regardless of the sequence of reductions.
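For the warmup, a brute-force check (not a proof, of course; the key observation is that inserting plus signs preserves the value mod 9, since $10 \equiv 1 \pmod 9$) confirms that every reduction sequence ends at the same digit:

```python
from itertools import product
from functools import lru_cache

def one_step(n: int) -> frozenset:
    """All sums obtainable from n by inserting at least one plus sign."""
    d = str(n)
    sums = set()
    for cuts in product([0, 1], repeat=len(d) - 1):
        if not any(cuts):
            continue
        parts, cur = [], d[0]
        for ch, cut in zip(d[1:], cuts):
            if cut:
                parts.append(cur)
                cur = ch
            else:
                cur += ch
        parts.append(cur)
        sums.add(sum(map(int, parts)))
    return frozenset(sums)

@lru_cache(maxsize=None)
def final_digits(n: int) -> frozenset:
    """The set of single digits reachable by ANY sequence of reductions."""
    if n < 10:
        return frozenset({n})
    out = set()
    for m in one_step(n):
        out |= final_digits(m)
    return frozenset(out)

# Every n reduces to exactly one digit: its digital root 1 + (n - 1) % 9.
for n in range(10, 2000):
    assert final_digits(n) == frozenset({1 + (n - 1) % 9})
```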
Ron is also known for very big integers, including one that held the record for the largest number to appear in a published mathematical proof. You can find it among the above tributes and also on a T-shirt. We could also mention his role in the largest proof known to date—at 200 terabytes it almost doubles the size of the tables for proving results of seven-piece chess endgames.
If you desire serious fun, look also to Ron’s books. He wrote several, including co-authoring the nonpareil textbook Concrete Mathematics with Don Knuth and Oren Patashnik.
Ron, in the tradition famously followed by Erdős, liked to put money on problems. A $10 problem was much easier than a $100 one, a $1,000 one was extremely hard, and so on. In Ron's paper on his favorite problems he stated this one:
Let $H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n}$ denote the $n$-th harmonic number, and let $\sigma(n)$ denote the sum of the divisors of $n$. Challenge: prove the inequality for all $n \geq 1$,

$$\sigma(n) \leq H_n + e^{H_n}\ln H_n.$$
And he put the prize at $1,000,000. He added:
Why is this reward so outrageous? Because this conjecture is equivalent to the Riemann Hypothesis! A single violating $n$ would imply there are infinitely many zeroes of the Riemann zeta function off the critical line $\operatorname{Re}(s) = \frac{1}{2}$. Of course, the $1,000,000 prize is not from me but rather is offered by the Clay Mathematics Institute since the Riemann Hypothesis is one of their six remaining Millennium Prize Problems. We hope to live to see progress in the Challenges and Conjectures mentioned in this note, especially the last one!
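The inequality here is Jeffrey Lagarias's elementary criterion: the Riemann Hypothesis holds if and only if $\sigma(n) \leq H_n + e^{H_n}\ln H_n$ for all $n \geq 1$. A quick script can sanity-check it for small $n$ (which proves nothing, naturally, but shows how tight the bound gets at highly divisible numbers like 12):

```python
import math

def sigma(n: int) -> int:
    """Sum of all divisors of n (naive trial division)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def lagarias_bound(n: int) -> float:
    """The right-hand side H_n + e^{H_n} * ln(H_n)."""
    h = sum(1.0 / k for k in range(1, n + 1))  # harmonic number H_n
    return h + math.exp(h) * math.log(h)

# Holds for all small n; e.g. sigma(12) = 28 versus a bound of about 28.3.
for n in range(1, 1000):
    assert sigma(n) <= lagarias_bound(n) + 1e-9
```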
Alas Ron did not get to see this resolved. Nor of course did Erdős, nor may any of us. But Ron is prominently mentioned on another Simons page where Erdős lives on, and so may Ron.
Ron died at age 84. Perhaps he liked that it is the sum of the twin prime pair $41 + 43$, and also three times the perfect number 28. We will always remember 84 because of Ron. Added 7/10: 84 is also his current h-index according to Google Scholar. HT in comment.
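These little facts about 84 check out by machine too (the helper functions below are just naive definitional tests):

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test."""
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

def is_perfect(n: int) -> bool:
    """A number is perfect if it equals the sum of its proper divisors."""
    return n == sum(d for d in range(1, n) if n % d == 0)

# 84 as the sum of the twin prime pair 41 + 43 ...
assert 41 + 43 == 84 and is_prime(41) and is_prime(43) and 43 - 41 == 2
# ... and as three times the perfect number 28.
assert 3 * 28 == 84 and is_perfect(28)
```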
[some word changes, update about h-index]
“Founding Frenemies” source |
John Adams and Thomas Jefferson did not use Zoom. Their correspondence, from 1777 up to their deaths hours apart on July 4, 1826, fills a 600-page book.
Today, Independence Day in the US, we consider the kind of intellectual fireworks represented by the correspondence.
Jefferson and Adams were intellectual opposites as well as political rivals. Adams favored a strong central government to bridle human passions, whereas Jefferson’s support for the French Revolution continued beyond its devolution into the Reign of Terror. They debated many other points of politics, philosophy, and culture.
Abigail Adams, the wife of John, joined in some of the exchanges. Because she often stayed in Massachusetts while he was in Philadelphia or New York or elsewhere, the husband and wife exchanged many letters—over 1,100 in all. His letter to her on July 3, 1776, instituted the use of fireworks to celebrate anniversaries of the Declaration of Independence.
Today there is not much in the way of fireworks displays. Most have been canceled because we cannot allow crowds to view them. In the Buffalo area, some townships are having small displays with limited access, and some displays are being set on high points for possible area viewing. So we felt we should write about fireworks of a different kind, a kind that is not restricted by the pandemic and might thrive through it. But first we’ll make a point about the history of fireworks.
Fireworks go back at least 1,100 years to China, where chemists discovered the fun of stuffing volatile compounds into tubes of bamboo or paper and setting them off. Some accounts have pyrotechnics going back another 1,000 years, to about 200 BCE, insofar as bamboo was known to pop with a loud sound when dried and heated. Gunpowder traveled best of the compounds and made its way into Europe at least by the 1200s. The first recorded wide-scale fireworks display in England was in 1486 for the wedding of King Henry VII to Elizabeth of York, which ended the Wars of the Roses. Shakespeare mentions fireworks in Love’s Labour’s Lost. The Mughals in India from the 1500s to the 1800s made fireworks a diversion for noble women on the Diwali holiday:
Cleveland Museum of Art source |
Our point is that 1776 isn’t even halfway back to the beginning of using fireworks for celebrations, even just in the West. Can we even call it “Early”? Lavish displays to mark major events were common by the mid-1700s. A royal display in 1749 was accompanied by orchestral music commissioned from George Frideric Handel and went ahead despite rain. Over 12,000 people also paid to attend the main rehearsal six days earlier, many braving an hours-long traffic jam on approaches to the London Bridge. That feels quite modern to us. Adams’s letter mentioned other social features we know today:
It ought to be solemnized with Pomp and Parade, with Shews, Games, Sports, Guns, Bells, Bonfires and Illuminations from one End of this Continent to the other from this Time forward forever more.
The pandemic has curtailed others of these. The major North American team sports have not resumed either. Some parades have been run in “reverse” mode: the floats and performers stay put while spectators drive by slowly in cars.
Adams’s letter has another, earlier, passage that chills today. The letter begins by saying that the Declaration was supposed to have been made in December, 1775, and enumerates plans the colonies had made contingent on this. He then says that what caused the plans to be aborted was an outbreak of disease:
All these Causes however in Conjunction would not have disappointed Us, if it had not been for a Misfortune, which could not be foreseen, and perhaps could not have been prevented, I mean the Prevalence of the small Pox among our Troops. . . . This fatal Pestilence compleated our Destruction.—It is a Frown of Providence upon Us, which We ought to lay to heart.
The ellipsis is in the letter—as Ken’s children have pointed out, trailing off thought with dots in letters or e-mails or Facebook posts or texts is a distinctive habit of us older folk. Thus a specific outbreak of a contagious disease changed our history then as now.
We have remarked on how the pandemic has affected opportunities to exchange ideas and how to compensate. One impacted venue, which both of us had intended to visit this spring, is the series of workshops at the Simons Institute in Berkeley.
Still, the Simons Foundation has continued its other ways to stimulate ideas. Here we offer our congratulations to Venkatesan Guruswami, Omer Reingold, and David Woodruff, who have just been appointed as Simons investigators for 2020.
In briefly talking about their work, we want to make a point about how the pandemic enables taking the long view of ideas—in a way that appointments such as these promote. It is easy to get wrapped up in immediate aspects of a current hot problem and not be aware that it has a history. The history may not involve exactly the same ideas as the problem, but related ideas whose importance was appreciated much earlier. “Early” may not mean the Middle Ages or the 1700s as with fireworks, but it can mean times before any of us were born.
Venkatesan and Omer and David each have done some stellar research, broadly in various parts of theory. They each have many results, but we thought we would highlight just one result each. We picked a result that we think is representative, is deep, is beautiful, and is one that we personally admire the most.
Venkatesan did important work on a problem that was created before complexity theory existed. Our favorite is his ground-breaking work on list decoding.
What is the best way to encode data to protect it against various kinds of errors? This is still open. But Venkatesan changed the landscape.
The questions about error-correcting codes go back to the 1940s. Usually the first results are credited to Richard Hamming in 1947. Soon the notion of list decoding was introduced. The cool idea is that decoding need not produce a single answer, but may output a list of possible answers. The hope is that with other information about the message we can select the right answer.
Venkatesan and Ken’s colleague Atri Rudra found explicit codes that achieve list-decoding capacity, that is, they have optimal redundancy.
What we like so much is that the model is so natural and so powerful. There are many applications of list decoding to complexity theory. See Madhu Sudan’s survey for some additional comments.
Omer did his most important work on problems that were first studied in the early days of complexity theory. Our favorite is his beautiful work on small-memory deterministic graph walks.
Is $\mathsf{RL} = \mathsf{L}$? This is still open. But Omer made a huge contribution to our understanding of fundamental complexity classes. Romas Aleliunas, Dick Karp, László Lovász, and Charlie Rackoff proved earlier that randomized small space can navigate undirected graphs, provided coins can be flipped. In a sense Omer removed the coins to get his result that undirected graph connectivity is in deterministic logspace, $\mathsf{L}$. The previous result was easy—I can say that because I (Dick) was a co-author on it—but Omer’s theorem is deep.
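The randomized result is delightfully simple to watch in action: a plain random walk of polynomial length covers a connected undirected graph with high probability. A toy simulation (this illustrates the coin-flipping result, not Omer's deterministic algorithm; the path graph and step budget are arbitrary choices):

```python
import random

def covers_graph(adj, start, steps, rng):
    """Run a random walk for at most `steps` moves; True if every vertex is visited."""
    seen = {start}
    v = start
    for _ in range(steps):
        v = rng.choice(adj[v])  # move to a uniformly random neighbor
        seen.add(v)
        if len(seen) == len(adj):
            return True
    return False

# A path graph on n vertices, a hard-ish case for cover time (about n^2 steps).
n = 30
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
rng = random.Random(1)
trials = 20
covered = sum(covers_graph(adj, 0, 8 * n**3, rng) for _ in range(trials))
# With an O(n^3) step budget every trial should cover the whole path.
```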
Omer’s proof drew heavily on expander graphs and the zig-zag product from his earlier work with Salil Vadhan and Avi Wigderson for creating them.
David did his most important work on problems that were only created relatively recently. Our favorite is his work on approximately counting distinct elements. This work is joint with Daniel Kane and Jelani Nelson and appeared at PODS 2010. It was the first streaming algorithm with an optimal combination of space usage and update time. Here is the relevant table from their paper (KNW):
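The KNW algorithm itself is intricate, but the flavor of the distinct-elements problem can be conveyed by a much simpler classical sketch, the "k minimum values" estimator (our illustrative choice, far from KNW's optimal bounds; the hash function and k are arbitrary):

```python
import hashlib
import heapq

def kmv_estimate(stream, k: int = 256) -> int:
    """Estimate the number of distinct items from the k smallest hash values."""
    kept = set()  # the (at most k) smallest distinct hash values seen so far
    heap = []     # max-heap over kept values, stored negated
    for x in stream:
        h = hashlib.sha256(str(x).encode()).digest()
        u = int.from_bytes(h[:8], "big") / 2**64  # pseudo-uniform in [0, 1)
        if u in kept:
            continue  # duplicate item, same hash: ignore
        if len(heap) < k:
            heapq.heappush(heap, -u)
            kept.add(u)
        elif u < -heap[0]:
            evicted = -heapq.heappushpop(heap, -u)
            kept.discard(evicted)
            kept.add(u)
    if len(heap) < k:
        return len(heap)  # saw fewer than k distinct items: count is exact
    # The k-th smallest of D uniform values is about k / (D + 1).
    return int((k - 1) / -heap[0])

# With fewer than k distinct items the answer is exact.
assert kmv_estimate([1, 2, 3, 1, 2]) == 3
```

The memory is only $O(k)$ hash values regardless of stream length, which is the essential streaming trade-off that KNW push to its optimal limit.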
Streaming algorithms are relatively new and parts of data science are newer. But working with data is old, as old as codes. This finally leads us to pose an outlandish question:
Can all of this work be usefully interpreted from the standpoint of coding theory?
This is outlandish, because the word “code” does not even appear in either Reingold’s paper or KNW. But part of holding coding theory to be a paradigm, as both Ken and I experienced in graduate school, is that its perspective should expand. Is this capable of creating intellectual fireworks? We’ll see.
Have a safe and happy Fourth of July.
[some small fixes]
Composite crop of src1, src2 |
Joshua Greene and Andrew Lobb proved last month that for every smooth Jordan curve in the plane and every real $r > 0$, there are four points on the curve that form a rectangle whose sides have ratio $r$.
Today we explain how this result relates to Otto Toeplitz’s famous “square peg conjecture,” which is the case of squares ($r = 1$) where the curve need not be smooth.
We noticed this via an article last Thursday by Kevin Hartnett for Quanta. Hartnett describes this research advance as a product of the pandemic inducing them to take time for deeper reflection on fundamental problems. We wonder how much is bubbling on hard problems in our own field—this is one reason for our last post’s interest in (good kinds of) “gossip.” He also gives great diagrams for the geometrical intuition.
We will portray this advance instead along lines of things we’ve said recently about how to attack hard problems by seeking and solving simpler ones or special cases as stepping stones. Dick wrote a post two years ago on the original Toeplitz problem, which this work still leaves open. That post focused on ways general Jordan curves can be nasty. This one needs an extra niceness condition but proves a stronger result for all $r$. How much can progress on “nice” inform problems with “nasty”? That kind of question comes up in complexity theory all the time.
A Jordan curve is the image of a continuous 1-1 map from the circle to the plane. The condition of being 1-1 prevents the image from intersecting itself, so it is a single closed loop. By the Jordan curve theorem, the loop always partitions the rest of the plane into two connected regions, exactly one of which is bounded. Amazingly, a Jordan curve can have positive Lebesgue measure, yet it cannot fill all of the plane. Such curves can, however, approach the kind of space-filling curves defined by Giuseppe Peano, and thus have any Lebesgue density less than $1$, as was noted also by William Osgood in his 1903 paper.
The Toeplitz conjecture is that every Jordan curve has four points that form a square. As noted in Dick’s post, the positive-area case is actually an easy yes-case. Nasty cases are where the curve is nowhere-differentiable with zero area. Thus far, the problem has been answered yes only in the presence of some uniformity condition that limits the local nastiness of the curve. The simplest one is for the curve to be smooth in that it has a continuous first derivative. Here is a smooth curve that is not convex, so that the square need not be “inside” the curve:
This diagram is from Benjamin Matschke’s wonderful recent survey of the peg problem in the AMS Notices. Four months ago we’d have said it looks like a thin heart or fat boomerang. Now it looks to us like a face mask.
As the survey notes, it is easy to show that every Jordan curve has some rectangle. The rectangle problem is to show that rectangles of every aspect ratio can be pegged to the curve. There are two main reasons the square and rectangle problems are hard and harder:
The property in the second point enables arguments of the kind: for generic curves the number of squares is odd, therefore it is nonzero. This then carries over to sufficiently nice curves in the generic closure. The failure of this property for rectangles is a main reason the results by Greene and Lobb are new.
The new paper is written tersely at a high level, and we must confess not being able to catch all details in compressed time. But we can highlight some aspects of the argument. First, as we have said, it exemplifies:
This is however coupled with a second aspect:
It may be that the rectangle conjecture is not only true but “equally true” in the sense that the fundamental reason applies equally well to the case of rectangles. The parity argument that works for cases involving squares may be a crutch that misses the deepest explanations. Promoting arguments that avoid this crutch may marshal resources needed to make a breakthrough on the main problem for completely general Jordan curves.
This runs somewhat counter to what we have said recently about $\mathsf{P} \neq \mathsf{NP}$ proof attempts. The reason $\mathsf{P} \neq \mathsf{NP}$ is believed by most is that problems like SAT require exponential time, not just super-polynomial time. Yet no one knows even a super-linear lower bound, apart from barely-superlinear bounds that exploit grainy aspects of the multitape Turing machine model and apply only to it. Nor is any super-linear lower bound known on Boolean circuit size. Maybe it is a viable strategy also to “bring up reserves” by finding restricted cases where (conditional) exponential lower bounds can be proven, as well as to explore contingencies like SETH, as we covered here. But both of us have always felt that the super-linear frontier holds the buried keys to further progress.
The Greene-Lobb proof has two aspects that may also resonate in complexity:
The article in Quanta has great diagrams illustrating how the set of pairs of points on the curve corresponds to a Möbius strip in a related space. Cole Hugelmeyer, a graduate student at Princeton, proved last year how to get rectangles covering at least one-third of the possible aspect ratios by embedding the Möbius strip in a four-dimensional space. Intersections between a strip and a rotated copy yield rectangles on the original Jordan curve.
That led Greene and Lobb to consider larger objects with properties like pairs of Möbius strips in the larger space. The Klein bottle is the natural next thing to consider and led to a feature of their proof that made the desired conclusion pop out:
The clever point pivots on the fact that a Klein bottle cannot be smoothly embedded into $\mathbb{C}^2$ as a Lagrangian submanifold. The proof shows that for any prescribed aspect ratio one can construct a mapping that almost succeeds in embedding such a Klein bottle. The fact that it must fail means that the image must yield a point of self-intersection witnessing the failure. The presence of this point then yields the construction of four points on the Jordan curve that form the vertices of the needed rectangle.
We have briefly mentioned how the concept of obstructions is integral to the Geometric Complexity Theory attack on $\mathsf{P}$ versus $\mathsf{NP}$. Closer to home, however, is how László Babai's graph isomorphism algorithm and proof work by identifying a subclass of graphs called Johnson graphs as obstructions to a simpler algorithm, as we highlighted here.
Can you find more lessons from the new advance on the Toeplitz problem? Dick and I have considered ideas of blowing up from languages to pairs of languages (and making reductions between problems the fundamental units of analysis) but this has not gone beyond dreamwork.
Gossip and more.
Composite of src1, src3 |
Jessica Deters, Izabel Aguiar, and Jacqueline Feuerborn are the authors of the paper, “The Mathematics of Gossip.” They use infection models—specifically the Susceptible-Infected-Recovered (SIR) model—to discuss gossip. Their work was done before the present pandemic, in 2017–2019. It is also described in a nice profile of Aguiar. Their analogy is expressed by a drawing in their paper:
Not just for today, but for the summer at least, Ken and I want to share some gossip, share some problems, and ask our readers a question.
The question first. Ken and I wonder if GLL should start a virtual theory “lunch” meeting, that would meet periodically via Zoom. It would be like meeting for a theory lunch in the old days—just not all in the same room. Some topic might be agreed on, perhaps a short presentation, and always a chance to swap some gossip. Plus maybe ask the group for advice on a problem.
I do miss the old meetings:
AcademicKeys #78 source |
What do you think? Should we have such meetings?
The study of gossip feeds into other issues of the spread of information and misinformation during an election year. Ken’s Buffalo colleague Kenny Joseph has research on the spread of fake news and Twitter sentiment using mathematical tools adjacent to those of the trio above. Ken and I still intend to say more about epidemiology models ourselves when we get time. But no, here by “gossip” we just mean actual pieces of gossip—just as at a conference or other kind of in-person meeting.
Here are two examples of the kind of gossip we might exchange.
Anna Gilbert is moving from Michigan math to Yale math and statistics. She will be the John C. Malone Professor of Mathematics and Professor of Statistics & Data Science. Pretty impressive. She was at Michigan math for 16 years. She told me that she could not wait for the next power of 2, and thus had to try a new place, with new challenges.
Rich DeMillo is a long time friend who is at Georgia Institute of Technology and is not moving. He continues working on making voting fair, secure, and efficient. He is referenced in a recent article on voting issues in Georgia. See here for details.
Note: we have in mind pleasant and factual gossip. The most useful kind is about probable directions and emphases to make projects attractive to pursue. This leads into our other component.
Here is a problem from each of us, as examples of what could be discussed in these meetings and why that might give an advantage over just hunting the literature. Both are about factoring—always factoring…
I, Ken, would like to know about field tests of approximative methods in quantum computing, specifically of shortcuts to Shor’s Algorithm. The approximations I have in mind are rougher than those I find in the literature and need not be physically natural.
To explain, the way Shor’s algorithm is proven correct in Shor’s paper and all textbooks we know—including ours—uses an exact analysis involving the quantum Fourier transform, in which exponentially fine phase angles appear in terms. Approximation can be argued in several ways. Circuits of Hadamard, CNOT, and $T$-gates, in which no phase angle finer than $\pi/4$ appears, can approximate arbitrary quantum circuits to exponential precision with polynomial overhead. With just Hadamard and Toffoli gates, hence $+1$ and $-1$ as the only phases, one can approximate the data returned by measurements in the algorithm, though without approximating the algorithm’s result vectors in complex Hilbert space. There are other ways to approximate those vectors while eliding the finer phase components. We would like to see more attention to the concrete overheads of all these methods.
What I would really like to discuss, however, is efforts toward more-brusque approximations that could yield new classical attempts on factoring. For a broad example, note that not only does quantum computation reduce to counting (indeed $\mathsf{BQP} \subseteq \mathsf{P}^{\#\mathsf{P}}$), but also individual steps in Shor’s algorithm can be broken down as reductions to counting. Now suppose we apply approximate counting heuristics to those steps. The stock answer for why this doesn’t work to approximate quantum measurement properties globally is that those probabilities have the form
$$p = \frac{a - b}{N},$$

where $a$ and $b$ are counting functions and $N$ is something like $2^n$. Note the knowledge beforehand that $p$ is between $0$ and $1$. The point is that $a - b$ is exponential yet smaller than the additive approximations possible in polynomial time for $a$ and $b$ individually, so the approximation gives no help to the difference.
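A toy numeric illustration of this point, with made-up magnitudes (the specific exponents are ours, purely for scale):

```python
# Hypothetical counts: a and b are around 2^80, their difference around 1000.
a = 2**80 + 1000
b = 2**80
true_diff = a - b  # the tiny quantity the probability actually depends on

# Even a generous additive approximation error for a or b individually,
# say a / 2^20, utterly swamps the difference we care about.
err = a // 2**20
assert err > 10**6 * true_diff  # error is millions of times the signal
```

So approximating the two huge counts separately, however well, tells us nothing about their difference; any useful heuristic would have to engage the structure that makes the cancellation happen.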
However, this does not prevent using approximations of magnitudes that are not differences at intermediate steps. For a vein of more particular examples, I raise the translation from quantum gates to Boolean formulas in my 2018 paper with Amlan Chakrabarti and my recent PhD graduate Chaowen Guan. This translation can encode the state $\Phi$ at any intermediate stage of an execution of Shor’s algorithm by a Boolean formula $F$. The size of $F$ stays linear in the size of the quantum circuit being simulated—the exponential explosion happens only when we try to count solutions to $F$. The formula encodes all the information in $\Phi$, including the implicit presence of fine phase angles. Now suppose we alter $F$ to a formula $F'$ whose corresponding $\Phi'$ is a simplified approximation of $\Phi$. The kicker is that $\Phi'$ might not need to be a legal quantum state. The transformations in our paper for later stages of the circuit will still apply building on $F'$.
Is there any chance of this working? Heuristic approaches applying SAT to factoring have been tried and found to be tough. The nice site BeyondNP includes links to #SAT counters such as sharpSAT and Cachet. Thus we are not asking anything outlandish. Leveraging Shor’s algorithm might be a new approach. Has anyone tried it? That’s the kind of question I would visit a conference to ask, where wider arity might work better than asking people individually. Thus also for raising it in a meeting.
I have recently been thinking about the power of weak sub-theories of Peano Arithmetic. There are many proofs known that there are an infinite number of prime numbers. The usual proofs use this:
For all $n > 1$ there is some prime that divides $n$.
Given this it is not hard to prove, in many ways, that there are an infinite number of primes. Euclid’s original proof uses it in the step: let $p$ be a prime that divides $p_1 p_2 \cdots p_k + 1$. The idea is to suppose some weak theory can prove the above. This means that it can prove:
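Euclid's step is easy to watch concretely: any prime divisor of $p_1 \cdots p_k + 1$ must be new, since dividing by any $p_i$ leaves remainder 1. A quick sketch:

```python
def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime dividing n (n itself if n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

# Take the first few primes, multiply, and add 1.
primes = [2, 3, 5, 7, 11, 13]
N = 1
for p in primes:
    N *= p
N += 1  # N = 30031, which happens to be composite: 59 * 509

q = smallest_prime_factor(N)
# The prime q divides N, so q is none of the original primes.
assert N % q == 0 and q not in primes
```

Note that $N$ itself need not be prime; the argument only needs some prime divisor of $N$, which is exactly where the boxed principle above is invoked.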
I believe this shows that if the theory is weak enough, then factoring is in polynomial time. Is this known? Is it true?
Once again, we can hunt for literature on this. We can ask individual people, such as Avi Wigderson and various co-authors of his. But our hunch is that this topic was explored in the 1990s without a definitive resolution. It could be more effective to get up to speed on it and share ideas in a meeting.
Should we start a virtual theory lunch? Would you attend?
Added 6/21: Getting back to gossip, we wonder if the all-to-all nature of a Zoom meeting, versus few-to-few in a conference hallway, would filter out the badder kinds of gossip.
Norbert Blum is a computer science theorist at the University of Bonn, Germany. He has made important contributions to theory over his career. Another claim to fame is that he was a student of Kurt Mehlhorn, indeed the third of Mehlhorn’s eighty-eight listed students.
Today I wish to discuss a new paper by Blum.
No, it does not solve the P versus NP problem. The title of his paper is: On the Approximation Method and the P versus NP Problem. It is available here.
Blum, like most complexity theorists, believes that P is weaker than NP. This is usually stated as $\mathsf{P} \neq \mathsf{NP}$. The staff at GLL have the idea that we should state this instead as

$$\mathsf{P} \subsetneq \mathsf{NP}.$$
This is clearer, more to the point, and logically what $\mathsf{P} \neq \mathsf{NP}$ actually says. We will soon have T-shirts, mugs, and other stuff available in our web store at https:donotgotothisaddressplease.com.
In 2017 Blum released a paper that tried to prove $\mathsf{P} \neq \mathsf{NP}$. It caused a sensation—it was discussed on the complexity blogs such as In theory and Shtetl-Optimized. And also at GLL. Blum’s paper got thousands of Twitter mentions. Unfortunately he had to retract it, since it was wrong. He said:
The proof is wrong. I shall elaborate precisely what the mistake is. For doing this, I need some time. I shall put the explanation on my homepage
Look here for more comments that were made after his paper was released.
He did, months later in 2017, post a two-page retraction here. His original paper’s abstract:
Berg and Ulfberg and Amano and Maruoka have used CNF-DNF-approximators to prove exponential lower bounds for the monotone network complexity of the clique function and of Andreev’s function. We show that these approximators can be used to prove the same lower bound for their non-monotone network complexity. This implies P not equal NP.
This approach is what we will discuss.
Blum’s new paper does not claim to prove $\mathsf{P} \neq \mathsf{NP}$, but gives his thoughts on the problem. I think he has earned our attention. It must have been difficult to go from thinking you have solved the problem to retracting your paper. I have thought, privately, that I had solved some neat problems, only to discover later that I was wrong. I cannot imagine how tough it was to do this in public.
Blum’s work on proving lower bounds began with his dissertation under Mehlhorn, which included a 1985 paper on monotone network complexity for convolutions. Earlier, in 1984, Blum proved a general circuit lower bound of order $3n$. This stood for thirty years until in 2015 Magnus Find, Alexander Golovnev, Edward Hirsch, and Alexander Kulikov improved it to order $(3 + \frac{1}{86})n$. A long way from super-polynomial lower bounds. See also a talk about this work.
Blum’s new paper discusses an old approach to prove boolean circuit lower bounds. The methods he used in 1984 and those improved in 2015 do not seem to be on track to prove even non-linear circuit lower bounds.
Let’s look at his comments at a high level. See his paper for details.
Suppose that one has a boolean function $f$ that is monotone: recall this means that if $f(x) = 1$, then changing some input bit from $0$ to $1$ does not change the value of $f$ from $1$ to $0$. Then it is always possible to compute $f$ without using any negations: only and/or operations are needed. Sometimes one can prove that the number of such operations is super-linear, sometimes even super-polynomial. Even bounds in this restricted model can be deep.
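For small functions the monotonicity condition can be checked directly by brute force. A minimal sketch (the example functions are ours):

```python
from itertools import product

def is_monotone(f, n: int) -> bool:
    """f maps a tuple of n bits to 0/1.
    Monotone iff setting any 0-bit to 1 never drops the value of f."""
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]  # x with bit i raised to 1
                if f(y) < f(x):
                    return False
    return True

maj3 = lambda x: int(sum(x) >= 2)  # majority of 3: monotone
parity3 = lambda x: sum(x) % 2     # parity of 3: not monotone
assert is_monotone(maj3, 3)
assert not is_monotone(parity3, 3)
```

Majority, for instance, is computable with only and/or gates, while parity genuinely needs negations (or xor).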
The idea that has tempted Blum and many other complexity theorists is: Can we extend the proofs for lower bounds without negations to ones with negations? One problem is that there is a monotone function f so that both of the following are true: f can be computed by general circuits, with negations, of polynomial size; yet any monotone circuit for f requires exponential size.
This is the famous Tardos function due to Éva Tardos. The existence of this function sunk Blum’s original paper. And it makes life hard for this general program—this is an instance of what our previous post meant by a proof attempt running up against a fundamental law. Negations can help tremendously in computing a function.
In his new paper he surveys boolean complexity ideas—especially those linked to monotone complexity. He begins by trying to argue that the largeness feature of the natural proofs barrier, which applies to combinatorial properties defined via sub-additive circuit complexity measures, does not constrain approximation complexity measures of the kind he envisions. He then proceeds to define CNF-DNF approximators and further what he calls sunflower approximators. He does enough development to highlight a missing piece of information about monomial representations of non-approximated pieces of the Boolean function one is trying to prove hard. He concludes that without this information, his methods cannot even prove super-linear size lower bounds on general circuits.
He ends with this assessment:
How to proceed the work with respect to the P versus NP problem? Currently, I am convinced that we are far away to prove a super-polynomial lower bound for the non-monotone complexity of any explicit Boolean function. On the other hand, the strongest barrier towards proving P ≠ NP could be that it holds P = NP. To ensure that the whole time spent for working on the P versus NP problem is not used to prove an impossible theorem, I would switch to the try to develop a polynomial algorithm for the solution of an NP-complete problem.
Note, we have rendered his notation as P ≠ NP. Ken and I agree with him on trying to work both on P ≠ NP and on P = NP. However, see our comments below.
I applaud Blum for thinking about P versus NP. We need people to be fearless if it is ever going to be solved. However, I personally believe that his approach may be wrong:
I am not as sure as he is that P ≠ NP. I do think that P = NP is possible, especially if algorithms are allowed to be galactic. Recall these are algorithms that run in polynomial time, but in polynomials of astronomical degree.
Also I am not sure if the boolean approach to P ≠ NP is the right one. Suppose there is a constant c so that SAT has boolean circuits of size O(n^c).
It still could be the case that P ≠ NP, since there may be no uniform algorithm for SAT.
Restating the last point: I believe we should try to prove what is needed, and no more. The approach to P ≠ NP based on boolean circuit complexity is trying to prove too much. A proof that SAT requires super-polynomial size circuits would imply more than P ≠ NP. A proof that SAT cannot be solved within some fixed polynomial time bound on a multitape Turing machine would imply much less than P ≠ NP, yet it would still be a breakthrough.
Be cheap, prove the least possible.
Proofs and perpetual motion machines
Leonardo da Vinci is, of course, famous for his paintings and drawings, but he was also interested in inventions and in various parts of science, including mathematics and engineering. It is hard to imagine that he died over 500 years ago, given his continued impact on our world. His inventions ranged from the practical to the impractical: musical instruments, a mechanical knight, hydraulic pumps, reversible crank mechanisms, finned mortar shells, and a steam cannon.
Today I wish to discuss proofs and perpetual motion machines.
You might ask: What do proofs and perpetual motion machines have in common? Proofs here refer to math proofs that claim to solve open problems like P = NP. Ken and I get such claims all the time. I take a look at them, not because I think they are likely to be correct, but because I am interested in understanding how people think.
I started to work on discussing such proofs when I realized that such “proofs” are related to claims about perpetual motion machines. Let’s see how.
A perpetual motion machine is a machine that operates indefinitely without an energy source. This kind of machine is impossible, as da Vinci knew already:
Oh ye seekers after perpetual motion, how many vain chimeras have you pursued? Go and take your place with the alchemists.
—da Vinci, 1494
I like this statement about applying for US patents on such machines:
Proposals for such inoperable machines have become so common that the United States Patent and Trademark Office (USPTO) has made an official policy of refusing to grant patents for perpetual motion machines without a working model.
Here is a classic attempt at perpetual motion: The motion goes on “forever” since the right side floats up and the left side falls down.
The analogy between proofs and perpetual motion machines is this: debunking such a machine is not done by looking carefully at each gear and lever to see why the machine fails to work. Rather, it is done like this:
Your machine violates the fundamental laws of thermodynamics and is thus impossible.
Candidate machines are not studied to find the exact flaw in their design. The force of fundamental laws allows a sweeping, simple, and powerful argument against them. There are similar ideas in checking a proof. Let’s take a look at them.
Claims are made about proofs of open problems all the time. Often these are made for solutions to famous open problems, like P = NP or the Riemann Hypothesis (RH).
Math proofs are used to try to get to the truth. As we said before, proofs are only as good as the assumptions made and the rules invoked. The beauty of the proof concept is that arguments can be checked, even long and complex ones. If the assumptions and the rules are correct, then no matter how strange the conclusion is, it must be true.
For example:
The Riemann rearrangement theorem. A sum
$$\sum_{n=1}^{\infty} a_n$$
that is conditionally convergent can be reordered to yield any number. Thus there is a series
$$\sum_{n=1}^{\infty} b_n$$
that sums conditionally to your favorite number, and yet the $b_n$ are just a rearrangement of the $a_n$. This says that addition is not commutative for infinite series.
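A quick numerical illustration (my own sketch): the standard greedy rearrangement of the alternating harmonic series 1 − 1/2 + 1/3 − 1/4 + … can be steered toward any target value:

```python
def rearrange_to(target, n_terms=200000):
    """Greedily reorder the terms of the conditionally convergent series
    sum_{n>=1} (-1)^(n+1)/n so that the partial sums approach `target`:
    add unused positive terms while below the target, negative ones while above."""
    next_pos = 1   # next unused positive term is 1/next_pos (odd denominators)
    next_neg = 2   # next unused negative term is -1/next_neg (even denominators)
    s = 0.0
    for _ in range(n_terms):
        if s <= target:
            s += 1.0 / next_pos
            next_pos += 2
        else:
            s -= 1.0 / next_neg
            next_neg += 2
    return s

print(round(rearrange_to(1.5), 2))    # 1.5
print(round(rearrange_to(-2.0), 2))   # -2.0
```

The error after the loop is bounded by the size of the last term added at a crossing, which shrinks to zero, so the rearranged partial sums converge to the target.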
What is the largest triangle that can be covered by two unit squares? The following shows that the answer is unexpected:
The point of a proof is that it is a series of small steps. If each step is correct, then the whole is correct. But in practice proofs are often checked in other ways.
The starting point for my thoughts—joined here with Ken’s—is these two issues:
Note that a deep, hard theorem can still have straightforward logic. A famous theorem of Littlewood has for its proof the structure: either RH is true or RH is false, and the theorem is proved separately in each case. The RH-false case takes under a page. The benefit of this logic is that one gets to assume RH for the rest. The strategy for the famous proof by Andrew Wiles of Fermat’s Last Theorem (FLT)—incorporating the all-important fix by Richard Taylor—has a similar case structure.
Wiles had done the last step long before but had put it aside, since he didn’t know how to reach it. The key was framing a statement so that it bridged the gap in his originally-announced proof, while its negation enabled the older proof.
Thus what we should seek are proofs with simple logic at the high level, breaking into cases or into sequential sub-goals so that the proof is a chain of relatively few of those goals.
This makes Ken and I think again about an old paper by Juris Hartmanis with his students Richard Chang, Desh Ranjan, and Pankaj Rohatgi in the May 1990 Bulletin of the EATCS titled, “On IP=PSPACE and Theorems With Narrow Proofs.” Ken’s post on it included this nice diagram of what the paper calls “shapes of proofs”:
Ken’s thought now is that this taxonomy needs to be augmented with a proof shape corresponding to certain classes believed to be properly below polynomial time—classes within the NC hierarchy. Those proofs branch at the top into manageable-size subcases, and/or have a limited number of sequential stages, where each stage may be wide but is shallow in its chains of dependencies. Call this shape a “macro-tree.”
The difference between the macro-tree shape and the sequential shapes pictured above is neatly captured by Ashley Ahlin on a page about “Reading Theorems”:
Note that, in some ways, the easiest way to read a proof is to check that each step follows from the previous ones. This is a bit like following a game of chess by checking to see that each move was legal, or like running a spell-checker on an essay. It’s important, and necessary, but it’s not really the point. … The problem with this is that you are unlikely to remember anything about how to prove the theorem, if you’ve only read in this manner. Once you’re read a theorem and its proof, you can go back and ask some questions to help synthesize your understanding.
The other high-level structure that a proof needs to make evident—before seeing it is reasonable to expend the effort to check it—is shaped by barriers. We have touched on this topic several times but maybe have not stated it full on for P versus NP. A recent essay for a course led by Li-Yang Tan at Stanford does so in just a few pages. A proof should state up front how it works around barriers, and this alone makes its strategy easier to follow.
The idea of barriers extends outside P versus NP, of course. Peter Scholze seems to be invoking it in a comment two months ago in a post by Peter Woit in April on the status of Shinichi Mochizuki’s claimed proof of the ABC conjecture:
I may have not expressed this clearly enough in my manuscript with Stix, but there is just no way that anything like what Mochizuki does can work. … The reason it cannot work is a[nother] theorem of Mochizuki himself. … If the above claims [which are negated by the theorem] would have been true, I would see how Mochizuki’s strategy might have a nonzero chance of succeeding. …
Thus what Ken and I conclude is that in order for a proof to be checkable chunk by chunk—not line by line—it needs to have both a clear high-level plan for working around the known barriers and a macro-tree shape whose chunks are of manageable size.
Lack of a clear plan already says that the proof attempt cannot avoid being snagged on a barrier, as surely as natural laws prevent building a perpetual-motion machine.
Does this help in ascertaining what shape a proof that resolves the P versus NP problem must have?
Framing a controversial conversation piece as a conservation law
Snip from Closer to Truth video on DA |
John Gott III is an emeritus professor of astrophysical sciences at Princeton. He was one of several independent inventors of the controversial Doomsday Argument (DA). He may have been the first to think of it but the last to expound it in a paper or presentation.
Today we expound DA as a defense against thought experiments that require unreasonable lengths of time.
Gott thought of the argument when he saw the Berlin Wall as a 22-year-old touring Berlin in 1969. He reasoned that his visit was a uniformly random event in the lifetime of the wall. That assumption gave him a 75% likelihood that he was not observing the wall in the first quarter of its lifetime. Since the wall was then 8 years old, that became a 75% likelihood that the wall would not last beyond 1993. It came down in late 1989.
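Gott’s “delta-t” reasoning above can be captured in one line (a sketch; the function name is mine):

```python
def gott_upper_bound(age_so_far, confidence=0.75):
    """If our observation falls at a uniformly random point of the lifetime,
    then with the given confidence we are past the first (1 - confidence)
    fraction of it, so total lifetime <= age_so_far / (1 - confidence)."""
    return age_so_far / (1.0 - confidence)

# Berlin Wall: built 1961, observed in 1969 at age 8.
print(gott_upper_bound(8))          # 32.0 years total, with 75% confidence
print(1961 + gott_upper_bound(8))   # 1993.0, matching Gott's prediction
```

The 75% figure is exactly the probability of not being in the first quarter of the wall’s lifetime, as in Gott’s original reasoning.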
The “Doomsday” name comes from regarding one’s birthdate as a uniformly random sample from the sequence of all human births. If you are my age, your birth is probably closer to ordinal 60 billion than to 70 or 100 billion. We can then say we are 95% confident that we are not in the initial 5% of this sequence. That entails the sequence stopping before 1.2 trillion births. If our population levels off at 10 billion with 80 years’ life expectancy, that makes the lifetime of humanity extend no further than the year 12,000 AD. The upshot is that a longer run for humanity entails asserting that our random sample gave a point unusually early in the span. The purer form of DA also argues that the sample point is not unusually late, giving this picture:
Modified from Michael Stock source |
This doubles the allowed span with 95% confidence while giving reason—at the time the observation occurs—to believe that the end is not imminent: at least about 1.75 billion more births will come afterward. For my birth, however, this is already a given.
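The birth-rank arithmetic above can be checked directly (a sketch; the population and life-expectancy figures are the rough ones stated in the argument):

```python
ordinal = 60e9            # rough birth rank of someone my age among all births
confidence = 0.95

# With 95% confidence we are not in the initial 5% of all births,
# so the total number of births N satisfies N <= ordinal / 0.05.
N = ordinal / (1.0 - confidence)          # about 1.2 trillion births

population = 10e9         # assumed steady-state world population
life_expectancy = 80      # assumed years per lifetime
births_per_year = population / life_expectancy   # 125 million per year

years_left = (N - ordinal) / births_per_year
print(f"{N:.2e} total births, about {years_left:.0f} more years")
```

The result, on the order of nine thousand more years, is consistent with the post’s “no further than the year 12,000 AD.”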
The dependence on which observer is taken as the reference point is one shiftable parameter of the DA. If you are a preteen reader, then your own birth may be closer to ordinal 70 billion in the sequence, which becomes your reference point. You can then tack on another 2,000 years to the horizon. The earliest human cave painters may have been among the first 3 billion Homo sapiens. With regard to their reference point, humanity has already gone past their 95% limit.
A more fundamental rebuff to DA comes from the equal reasonableness of an alternate uniformity assumption: that you are a uniformly random element of the set of all possible human beings. Only a subset of the possible humans will ever be born. The longer humanity’s run is, the higher was your prior probability of belonging to that subset. Thus the fact of your birth can be construed as weighting the odds toward a longer run in a way that cancels out the short-run reasoning of DA.
Even when an instance of DA passes these objections, the inference remains controversial. We wrote about DA last year in connection with estimating the lifespan of open problems remaining open. A clear non-instance is trying to apply DA to estimate the lifespan of the Covid-19 pandemic. We have all been going through the span together, and “now” is not a uniformly random sample point within it.
The DA assumptions would however hold if an alien tourist with no prior knowledge of events dropped in on Earth today. The delicacy of the assumptions makes it significant to seek scenarios where DA firmly applies—and better, where the inference may be deemed necessary to preserve the validity of established modes of inference against extreme skeptical hypotheses. This is what we will try to argue in regard to inferences of cheating at chess.
We have posted numerous times about my statistical chess model, its giving judgments of odds against null hypotheses of fair play in the form of z-scores, and my means of validating them. We will take as granted for this argument that the modeling is true in the sense that the distribution of z-scores from testing honest players conforms to the standard normal distribution.
Now let us talk about chess in the years B.C.—before Covid—when the game was played over-the-board (OTB) in-person across a table. Suppose I obtained a z-score of 4.265 from a test of one player in one tournament. I have chosen this number for several reasons, among them that its one-sided natural frequency under the standard normal distribution is almost exactly 1-in-100,000.
Suppose there were no other relevant information about the case. How would one assess the significance of the z-score of 4.265? Here are two different ways of reasoning that—in the case of OTB chess—arrive at similar answers. A frequentist can note that at the number of player-performances tested worldwide per year, a natural deviation of this size is expected about once per year. A Bayesian can start from a prior of roughly 1-in-10,000 that a given player is cheating, which mitigates the 1-in-100,000 face value to posterior odds of only about 10-to-1 against the null hypothesis.
A 5.0 standard, however, gives a natural frequency of just over 1-in-3.5 million. The resulting error rate of once in 20-to-30 years might be acceptable in prospect. And the Bayesian argument based on a 0.0001 prior leaves about 350-to-1 odds against the null hypothesis, which is comfortably within the comfortable-satisfaction range as it has been applied.
FIDE nevertheless has maintained a policy that statistical evidence must be accompanied by some other kind of evidence. If a player is caught looking at a chess position in a bathroom, or found to have a buzzing device or wires on his-or-her person, or signaling behavior is observed, then in fact much lower z-scores (to a threshold of 2.50, about 160-to-1 odds, in current FIDE regulations) are deemed to lend strong support to such evidence.
2015 Peter Doggers/Chess.com source |
I posted a similar rationale on my own website in early 2012, where causal evidence is likened to the “black spot” in the novel Treasure Island.
Now, however, suppose we have the 4.265 and one more piece of “evidence” that is pertinent but not as clearly causal. It could be, for example, that the player wore unusually bulky clothing during play.
Say a search of the player turned up nothing, but this occurred after the sequence of games giving the 4.265, a day after the player had been put on notice of suspicion. So the extra information is not a black spot but instead a “grey spot.” What can we conclude now?
The Bayesian argument seems to depend on judging how this information affects the prior probability of cheating. Does it make cheating a more likely hypothesis? We don’t actually know. Whereas the 1-in-10,000 global prior estimate was based on knowing dozens of cases over the past decade, only a handful conformed to this level of indication—short of more obvious things like making frequent visits to the restroom or being seen with an ear adornment. The most we can say is that the datum is not irrelevant. An example of an irrelevant datum would be if the player were wearing neon green sneakers—not bulky, no wires, just a weird green.
I would like, however, to argue that the player’s membership in a smaller pertinent sample S enhances the significance of the z-score. S must be defined by criteria that are not only independent of my statistical analysis of the games but also pertinent, so as to avoid selection bias. What is needed to quantify this enhancement is:
(a) to collect all (other) kinds of items on a par with the above—say ostentatious bracelets that could camouflage electronic indicators—and
(b) to establish that the frequency of players having any such accoutrement over the global mass of tournaments is at most, say, 1-in-100.
Now there are several equivalent ways to continue the reasoning. One is to say that since the extra datum B is “at worst” independent, the face-value odds are amplified by a factor of at least 100. The Bayesian mitigation then still leaves about 1,000-to-1 odds against the null hypothesis. Another is to say that in any given year, the natural chance of seeing the conjunction of B and the z-score is at most 1-in-100. Thus aside from the frequency of true positives, a policy of sanctioning in such cases would have a prospective error rate of once in 100 years. The conjoined error rate of that and sanctioning on 5.0 in isolation would be acceptable.
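The amplification step is a three-line calculation (a sketch; the 1-in-100 frequency for the extra datum B is the assumption from point (b) above):

```python
p_z = 1e-5        # natural frequency of a z-score >= 4.265
p_B = 1e-2        # assumed: at most 1-in-100 players show such an accoutrement
prior = 1e-4      # assumed 1-in-10,000 prior rate of cheating

# Treating B as (at worst) independent of the z-score:
p_joint = p_z * p_B                  # joint natural frequency, 1-in-10,000,000
odds_against_null = prior / p_joint  # Bayesian mitigation of the joint odds
print(round(odds_against_null))      # 1000
```

So conditioning on B multiplies the mitigated odds by the factor of 100, from roughly 10-to-1 to roughly 1,000-to-1.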
A Bayesian defense attorney might still counter: Consider a thought experiment in which we test 100,000 such “bulky” players. We don’t have any new information on the prior rate of cheating by players in S. For all we know, it is still 1-in-10,000. Thus the same terms as before will apply: our experiment will expect to have 10 cheaters in S plus the one natural false positive, leaving the odds only 10-to-1 as before. Put another way: without knowing the import of specializing to S on the likelihood of cheating, you can’t reach any further conclusion.
The nub of rejecting this counter-argument is that:
Because there are only about 1,000 players in S per year, the thought experiment of testing 100,000 players in S now takes 100 years.
Moreover, the defense attorney is asserting that the mistaken false positive has occurred unusually early in this span. If this is the first year under consideration, then it is a uniformly random event in the first 1% of the span. By the same reasoning as DA, the odds of this are only 1-in-100. Compounded by the 10-to-1 odds against this particular score in the thought experiment being the false positive, we recover something near the 1,000-to-1 odds of the original reasoning.
We might allow that we are not in the first year of “the cheating era” in chess. The thicket of high-profile cases with solid grounds for judgment goes back a little over 10 years. The factor from DA then goes down to 1-in-10. But this still leaves the overall odds about 100-to-1 against the null hypothesis, and that is commonly taken as an anchor point for the standard of comfortable satisfaction after all mitigating factors have been addressed.
Thus I am casting the Doomsday interval argument as a defense against unreasonably long thought experiments. It restores a dimension of time that is ignored by the Bayesian objection. This dimension of time is correctly preserved in the analysis of the expected error rate from a policy of imposing sanctions under this combination of B and the z-score.
Is my line of reasoning valid? You can be the judge. If so, then it is a class of instances where DA is applied merely to conserve an inference of unlikelihood that was originally made by other means. This supports the validity of DA-type inference in general.
We are now in the third month of “the online era” in chess. Even though online platforms can process many more kinds of information than I can avail from OTB play, my work has proved highly relevant for global early indications, second opinions, and transparent explanations. Alas, the sanction rate at the new featured tournaments has been well in excess of 1%. We hope this will come down as the playing pool—which has been greatly democratized in massive online events—wises up to the reality of getting caught.
What I want to discuss here is how this brave new world flips the Bayesian reasoning in a way that may come on too strong for the prosecution, again by its indifference to the element of time.
Take the 4.265 z-score with a 1-in-100 prior. The face-value odds from the z-score are now mitigated only to 1-in-1,000. This gives 99.9% confidence in imposing a sanction. However, the rate of errors would be higher than once-per-year because more players total have been involved per tournament. The tournaments are played at the faster Rapid and Blitz paces, allowing eight or more games per day, whereas classic OTB tournaments feature one game per day, sometimes two, over a span of a week to ten days.
This is also set against a vastly higher global sample size. Whereas the entire historical record of OTB chess represented by the ChessBase Mega database has yet to hit the 10 million games mark, the online platform Lichess has now hit 75 million games played per month. Adding in ICC and chess.com and ChessBase’s and FIDE’s own servers yields an equation that recalls Ps 90:4 and 2 Peter 3:8:
A thousand years of OTB are but a day that passes online.
For online platforms in isolation, absent anything to distinguish one player’s set of games from any other’s (such as their belonging to a highest-profile tournament), this means that even a 5.0 standard is inadequate for sure judgment. At their volume, online sites can see deviations of 5.0 or more by natural chance more than once per day. Thus they either tolerate a higher rate of errors or adopt a standard so high as to let many more guilty parties through the sieve.
Such volume means all the more that one should hold a score of 4.265 as insufficient for judgment. This is despite the vastly higher Bayesian likelihood that a sanction based on that score is correct. The greater frequency of actual cheating does mean that the rate of error per positive reading declines, but the rate per absolute time, with regard to the fixed population of honest players, may matter more. This has accompanied deliberations of whether sanctions for online cheating must be given less permanent consequences in order to allow setting thresholds so that a high percentage of actual cheaters are flagged and the error rate can be tolerated.
Does this analysis square with you? Does it help in understanding controversies over the original Doomsday Argument’s paradigm?
For another pass over the argument, suppose I get a z-score of 4.265 in a narrowly-defined event such as one country’s championship league. Does that limit the sample size, so that the score is more dispositive? The kind of reasoning in point (b) above, where we had to gather all possible indicators that would lead us to constrain the sample, would however mandate widening it at least to include other countries’ leagues. This is an aspect of the “look elsewhere” effect where the space of potential tests is widened even before actual tests are considered. Possibly it should be widened to include all tournaments with similar levels of players, in which case we are back to the “square 1” of the 1-in-100,000 section of this post. The point of the analysis of the extra datum about the player is that the sample expansion has an effective pre-defined limit.
[Added note about online cheating detection to third bullet in section 3. Clarified: “… the conjunction of these two factors” –> “… the conjunction of B and the z-score” and changed the succeeding sentence.]
Alfred Whitehead was a logician and philosopher who had a student of some note. The student was Bertrand Russell, and together they wrote the famous three-volume Principia Mathematica. It took several hundred pages to get to the result that 1 + 1 = 2.
Amazing.
Today I thought that discussing truth might be an interesting topic.
Whitehead said:
There are no whole truths; all truths are half-truths. It is trying to treat them as whole truths that plays the devil.
I like this quote. Whitehead was not the best lecturer, however. He gave the prestigious Gifford lectures a year after the astronomer Arthur Eddington. As Wikipedia relates quoting Victor Lowe:
Eddington was a marvellous popular lecturer who had enthralled an audience of 600 for his entire course. The same audience turned up to Whitehead’s first lecture but it was completely unintelligible, not merely to the world at large but to the elect. My father remarked to me afterwards that if he had not known Whitehead well he would have suspected that it was an imposter making it up as he went along … The audience at subsequent lectures was only about half a dozen in all.
Between the pandemic and the unrest in our cities there is debate about what is the “truth”. On cable news—CNN, MSNBC, FOX—one hears statements about the truth. You can also hear statements like “the experts know” or the “model” shows that this is true. Can math shed light on these discussions? What would Whitehead say?
Mathematical truth is the one absolute we can count on—right? Math is precise in its own way, but does it yield truth? Not so clear.
Whitehead’s proof that 1 + 1 = 2 takes hundreds of pages; it may or may not increase your confidence. Here is a short “proof” that 1 = 2:
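A classic version of such a bogus proof runs as follows (my own choice of the standard fallacy; the symbols $a$ and $b$ are my notation):

```latex
\begin{align*}
  \text{Let } a &= b. \\
  a^2 &= ab \\
  a^2 - b^2 &= ab - b^2 \\
  (a+b)(a-b) &= b(a-b) \\
  a + b &= b \qquad \text{(dividing both sides by } a - b\text{)} \\
  2b &= b \\
  2 &= 1.
\end{align*}
```

Each step looks locally legal; the hidden sin is the division step, since $a - b = 0$.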
Math proofs are only as safe as two elements that are unavoidably social: the assumptions made and the rules of inference invoked.
In the above proof snippet, one step divides by a quantity that is actually zero, which is the source of the “conclusion” that 1 = 2. A more worrisome issue is reasoning from assumptions. Wrong assumptions are a problem.
One definition of expert is: An expert is somebody who has a broad and deep competence in terms of knowledge, skill and experience through practice and education in a particular field.
More amusing definitions are:
Mark Twain defined an expert as “an ordinary fellow from another town.” Will Rogers described an expert as “A man fifty miles from home with a briefcase.” Danish scientist and Nobel laureate Niels Bohr defined an expert as “A person that has made every possible mistake within his or her field.”
I find the use of the term expert in regard to the pandemic at best puzzling. How can anyone be an expert when the current situation is unique? The last pandemic happened over a hundred years ago. Unfortunately Twain, Rogers, and Bohr are closer to being correct. The situation we find ourselves in today does not lend itself to being an expert. At least in my non-expert opinion.
Yes there are people, for example, who are experts on various viral agents. But there is more we do not know about this agent than we do know.
Models are created by experts, so you can probably guess that I am not bullish on models. There are lots of models, for example, on the projection of how many will be infected, how many will get seriously sick, and, sadly, how many will succumb. These models are based on various assumptions about how the virus works. Most of these assumptions are not proved in any sense.
I plan on saying more about truth in the future. Take care.
Theory and practice
[ MIT ] |
Arvind Mithal—almost always referred to as Arvind—is now the head of the faculty of computer science at a Boston trade school. The school, also known as MIT, is of course one of the top places for all things computer science. From education to service to startups to research, MIT is perhaps the best in the world.
Today I thought we might discuss one of Arvind’s main research themes.
Although this work started in the 1980’s I believe that it underscores an important point. This insight might apply to current research topics like quantum computers.
But first I cannot resist saying something personal about him. Arvind is a long-time friend of mine, and someone who rates many hours on the “stuck in an elevator” measure. This measure is one I made up, but I hope you get the idea.
Arvind is the only name I have ever known for Arvind. I was a bit surprised to see that the Wikipedia reference for him states his “full” name. He told me that he found having one name—not two—was a challenge. For example, he sometimes had to explain when arriving at a check-in desk at a conference that he only had one name. His name tag often came out mangled, because the conference software could not handle people with one name.
About forty years ago one of the major open problems in CS was how to make computers go faster. It is still an issue, but in the 1980’s this problem was an all-hands-on-deck affair. It was worked on by software engineers, by electrical engineers, and by researchers of all kinds, including complexity theorists. Conferences like FOCS and STOC—hardcore theory conferences—often contained papers on how to speed up computations.
Two examples come to mind:
The idea of systolic arrays led by Hsiang-Tsung Kung then at CMU. Also his students, especially Charles Leiserson, made important contributions.
The idea of the Ultracomputer led by Jacob Schwartz at NYU. An ultracomputer has N processors, N memories, and an N log N message-passing switch connecting them. Note the use of “N”, so you can tell that it came from a theorist.
Arvind tried to make computers faster by inventing a new type of computer architecture. Computers then were based on the classic von Neumann architecture or control flow architecture. He and his colleagues worked for many years trying to replace this architecture by dataflow.
A bottleneck in von Neumann style machines is caused by the program counter fetch cycle. Each cycle the program counter decides which instruction to get, and thus which data to get. These use the same hardware channel, which causes the famous von Neumann bottleneck. We have modified Wikipedia’s graphic to make the bottleneck aspect clearer:
Arvind is usually identified with the invention of dataflow computer architecture. The key idea of this architecture is to avoid the above bottleneck by eliminating the program counter. If there are no instructions to fetch, then it would seem that we can beat the bottleneck. Great idea.
Dataflow architectures do not have a program counter, and so data is king. Roughly, data objects move around in such a machine and eventually appear at computational units. At a high level these machines operate like a directed graph from complexity theory: data moves along edges to nodes that compute values, and a node fires as soon as all of its input data is present. After a node computes its value, the new data is sent on to the next node; and so on.
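The firing rule can be sketched in a few lines of Python (my own toy model, not any actual dataflow machine’s design):

```python
from collections import defaultdict

class DataflowGraph:
    """Minimal sketch: a node fires as soon as all of its inputs have arrived;
    no program counter orders the computation."""
    def __init__(self):
        self.ops = {}                    # node -> (function, arity, successors)
        self.inbox = defaultdict(list)   # tokens waiting at each node
        self.results = {}

    def add_node(self, name, fn, arity, successors=()):
        self.ops[name] = (fn, arity, successors)

    def send(self, name, value):
        fn, arity, succs = self.ops[name]
        self.inbox[name].append(value)
        if len(self.inbox[name]) == arity:      # all inputs present: fire
            out = fn(*self.inbox[name])
            self.results[name] = out
            for s in succs:                     # forward the result downstream
                self.send(s, out)

# Compute (2 + 3) * (5 + 6); tokens may arrive in any interleaved order.
g = DataflowGraph()
g.add_node("mul", lambda x, y: x * y, 2)
g.add_node("add1", lambda x, y: x + y, 2, successors=("mul",))
g.add_node("add2", lambda x, y: x + y, 2, successors=("mul",))
for node, val in [("add1", 2), ("add2", 5), ("add2", 6), ("add1", 3)]:
    g.send(node, val)
print(g.results["mul"])   # 55
```

Notice that nothing in the driver loop specifies an execution order for the adders; the arrival of data alone triggers the computation, which is the whole point of the architecture.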
The hope was that this would yield a way to increase the performance of computers. No program counter, no instruction fetching, could make dataflow machines faster.
The dataflow idea is clever. The difficulty is that while dataflow can work, classic von Neumann machines have been augmented in various ways—the competition rarely stays put. These updates did not eliminate the von Neumann bottleneck, but they reduced its cost and let von Neumann machines continue to get faster. Caches are one example: a cache for program instructions makes it likely that the fetch step does not actually go out to memory. This is an attack on the main advantage of dataflow machines, making them less attractive.
The paper by David Culler, Klaus Schauser, and Thorsten von Eicken titled Two Fundamental Limits on Dataflow Multiprocessing? discusses these issues in detail. They start by saying:
The advantages of dataflow architectures were argued persuasively in a seminal 1983 paper by Arvind and Iannucci and in a 1987 revision entitled “Two Fundamental Issues in Multiprocessing”. However, reality has proved less favorable to this approach than their arguments would suggest. This motivates us to examine the line of reasoning that has driven dataflow architectures and fine-grain multithreading to understand where the argument went awry.
What are the lessons? Is there even one?
I claim there are lessons. The work of Arvind on dataflow was and is important. It did not lead to the demise of von Neumann machines. I am using one right now to write this.
Dataflow did lead to insights on programming that have many applications. The dataflow idea may yet impact special computational situations: there is interest in using them for data-intensive applications.
Ken adds that dataflow has been realized in other ways. Caches and pipes and subsequent architecture innovations profit from designs that enhance locality. The MapReduce programming model gives a general framework that fits this well. The paradigm of streaming is an even better fit. Note that Wikipedia says both that it is “equivalent to dataflow programming” and “was explored within dataflow programming”—so perhaps the parent lost out to the wider adaptability of her children.
I think there are lessons: Practical goals, like making hardware go faster, are complex and many-faceted. A pure theory approach such as systolic arrays or ultracomputers or dataflow machines is unlikely to suffice. Also the existing technology, like von Neumann machines, will continue to evolve.
A question is: Could quantum computers be subject to the same lesson? Will non-quantum machines continue to evolve in a way to make the quantum advantage less than we think? Or is this type of new architecture different? Could non-quantum machines incorporate tricks from quantum algorithms and narrow the gap?
Speaking about theory and practice: Noga Alon, Phillip Gibbons, Yossi Matias, and Mario Szegedy are the winners of the 2019 ACM Paris Kanellakis Theory and Practice Award.
We applaud them on receiving this award. Sadly the Paris award reminds us that Paris Kanellakis died in the terrible plane crash of American Airlines Flight 965 on December 20, 1995. Also on that flight were his wife, Maria Otoya, and their two children, Alexandra and Stephanos.
The citation for the award says:
They pioneered a framework for algorithmic treatment of streaming massive datasets, and today their sketching and streaming algorithms remain the core approach for streaming big data and constitute an entire subarea of the field of algorithms.
In short they invented streaming.
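To give a flavor of their work, here is a sketch of the “tug-of-war” idea from their paper for estimating the second frequency moment $F_2 = \sum_i f_i^2$ of a stream. This is a simplified illustration, not their full construction: each estimator assigns every item a random sign and keeps a single running counter, the square of the counter is an unbiased estimate of $F_2$, and averaging independent estimators tightens it. (The real algorithm uses four-wise independent hash functions and a median-of-means step, both omitted here.)

```python
import random

def ams_f2_estimate(stream, num_estimators=400, seed=0):
    """Tug-of-war sketch: estimate F2 = sum of squared item frequencies."""
    rng = random.Random(seed)
    signs = [{} for _ in range(num_estimators)]   # lazily chosen +/-1 per item
    counters = [0] * num_estimators
    for item in stream:
        for j in range(num_estimators):
            if item not in signs[j]:
                signs[j][item] = rng.choice((-1, 1))
            counters[j] += signs[j][item]         # one O(1) update per estimator
    # Each counter z satisfies E[z^2] = F2; average to reduce the variance.
    return sum(z * z for z in counters) / num_estimators

stream = ["a"] * 3 + ["b"] * 2 + ["c"]   # frequencies 3, 2, 1
true_f2 = 3**2 + 2**2 + 1**2             # = 14
est = ams_f2_estimate(stream)
print(round(est, 1))                      # close to 14
```

The memory used is one counter per estimator, independent of the stream length and of the number of distinct items, which is the whole point of sketching.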
Why is the proof so short yet so difficult?
Saeed Salehi is a logician at the University of Tabriz in Iran. Three years ago he gave a presentation at a Moscow workshop on proofs of the diagonal lemma.
Today I thought I would discuss the famous diagonal lemma.
The lemma is related to Georg Cantor’s famous diagonal argument yet is different. The logical version imposes requirements on when the argument applies, and requires that it be expressible within a formal system.
The lemma underpins Kurt Gödel’s famous 1931 proof that arithmetic is incomplete. However, Gödel did not state it as a lemma or proposition or theorem or anything else. Instead, he focused his attention on what we now call Gödel numbering. We consider this today as “obvious” but his paper’s title ended with “Part I”. And he had readied a “Part II” with over 100 pages of calculations should people question that his numbering scheme was expressible within the logic.
Only after his proof was understood did people realize that one part, perhaps the trickiest part, could be abstracted into a powerful lemma. The tricky part is not the Gödel numbering. People granted that it can be brought within the logic once they saw enough of Gödel’s evidence, and so we may write for the function giving the Gödel number of any formula and use that in other formulas. The hard part is what one does with such expressions.
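To make the numbering concrete, here is a toy Gödel numbering in Python using prime-power coding, in the spirit of Gödel's original scheme. The symbol alphabet is invented for illustration; a real formalization needs much more care, which is what Gödel's unpublished hundred pages were for.

```python
SYMBOLS = ["0", "S", "+", "*", "=", "(", ")", "x", "forall", "not"]  # toy alphabet
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}   # each symbol gets a code >= 1

def primes():
    """Yield 2, 3, 5, ... by trial division (fine for toy formulas)."""
    n, found = 2, []
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(formula):
    """Encode a symbol list as p1^c1 * p2^c2 * ... (Goedel's prime-power trick)."""
    g = 1
    for p, sym in zip(primes(), formula):
        g *= p ** CODE[sym]
    return g

def decode(g):
    """Recover the symbol list by factoring out each prime in turn."""
    inv = {c: s for s, c in CODE.items()}
    out = []
    for p in primes():
        if g == 1:
            return out
        e = 0
        while g % p == 0:
            g //= p
            e += 1
        out.append(inv[e])

formula = ["forall", "x", "(", "x", "=", "x", ")"]
n = godel_number(formula)
assert decode(n) == formula   # the numbering is invertible
```

The key property is that encoding and decoding are mechanical, so syntactic operations on formulas become arithmetic operations on numbers.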
This is what we will try to motivate.
Rudolf Carnap is often credited with the first formal statement, in 1934, for instance by Elliott Mendelson in his famous textbook on logic. Carnap was a member of the Vienna Circle, which Gödel frequented, and Carnap is considered a giant among twentieth-century philosophers. He worked on sweeping grand problems of philosophy, including logical positivism and analysis of human language via syntax before semantics. Yet it strikes us with irony that his work on the lemma may be the best remembered.
Who did the lemma first? Let’s leave that for others and move on to the mystery of how to prove the lemma once it is stated. I must say the lemma is easy to state, easy to remember, and has a short proof. But I believe that the proof is not easy to remember or even follow.
Salehi’s presentation quotes others’ opinions about the proof:
Sam Buss: “Its proof [is] quite simple but rather tricky and difficult to conceptualize.”
György Serény (we jump to Serény’s paper): “The proof of the lemma as it is presented in textbooks on logic is not self-evident to say the least.”
Wayne Wasserman: “It is `Pulling a Rabbit Out of the Hat’—Typical Diagonal Lemma Proofs Beg the Question.”
So I am not alone, and I thought it might be useful to try and unravel its proof. This exercise helped me and maybe it will help you.
Here goes.
Let $\psi(x)$ be a formula in Peano Arithmetic ($\mathsf{PA}$) with one free variable $x$. We claim that there is some sentence $\phi$ so that $\mathsf{PA}$ proves $\phi \leftrightarrow \psi(\ulcorner \phi \urcorner)$, where $\ulcorner \phi \urcorner$ denotes the numeral of the Gödel number of $\phi$.
Formally,
Lemma 1 Suppose that $\psi(x)$ is some formula in $\mathsf{PA}$. Then there is a sentence $\phi$ so that $\mathsf{PA} \vdash \phi \leftrightarrow \psi(\ulcorner \phi \urcorner)$.
The beauty of this lemma is that it was used by Gödel and others to prove various powerful theorems. For example, the lemma quickly proves this result of Alfred Tarski:
Theorem 2 Suppose that $\mathsf{PA}$ is consistent. Then truth cannot be defined in $\mathsf{PA}$. That is, there is no formula $T(x)$ so that for all sentences $\phi$, $\mathsf{PA}$ proves $\phi \leftrightarrow T(\ulcorner \phi \urcorner)$.
The proof is this. Assume there is such a formula $T(x)$. Then use the diagonal lemma on $\neg T(x)$ and get a sentence $\phi$ so that $\mathsf{PA} \vdash \phi \leftrightarrow \neg T(\ulcorner \phi \urcorner)$.
This shows that $\mathsf{PA} \vdash T(\ulcorner \phi \urcorner) \leftrightarrow \neg T(\ulcorner \phi \urcorner)$.
This is a contradiction. A short proof.
The key is to define the function $f$ as follows: Suppose that $n$ is the Gödel number of a formula of the form $\alpha(x)$ for some fixed variable $x$; then
$$f(n) = \ulcorner \alpha(\underline{n}) \urcorner,$$
where $\underline{n}$ is the numeral for $n$. If $n$ is not of this form then define $f(n) = 0$. This is a strange function, a clever function, but a perfectly fine function. It certainly maps numbers to numbers. It is certainly recursive; actually it is clearly computable in polynomial time for any reasonable Gödel numbering. Note: the function $f$ does depend on the choice of the variable $x$. Thus,
$$f(\ulcorner \alpha(x) \urcorner) = \ulcorner \alpha(\ulcorner \alpha(x) \urcorner) \urcorner.$$
Now we make two definitions:
$$\beta(x) := \psi(f(x)) \quad\text{and}\quad \phi := \beta(\ulcorner \beta(x) \urcorner).$$
Now we compute, just using the definitions of $f$, $\beta$, and $\phi$:
$$\phi = \beta(\ulcorner \beta(x) \urcorner) = \psi(f(\ulcorner \beta(x) \urcorner)) = \psi(\ulcorner \beta(\ulcorner \beta(x) \urcorner) \urcorner) = \psi(\ulcorner \phi \urcorner).$$
We are done.
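The same fixed-point trick can be acted out with strings instead of Gödel numbers. In this sketch `quote` stands in for the numeral-of-Gödel-number operation, `diag` plays the role of $f$, and `psi` is an arbitrary predicate; all three names are stand-ins for the symbols in the proof, invented for illustration:

```python
def quote(s: str) -> str:
    """Stand-in for the numeral of the Goedel number of s."""
    return repr(s)

def diag(alpha: str) -> str:
    """The function f: substitute the quotation of alpha for x in alpha."""
    return alpha.replace("x", quote(alpha))

received = []
def psi(sentence: str) -> bool:
    """An arbitrary predicate on quoted sentences; it records what it is handed."""
    received.append(sentence)
    return len(sentence) > 10

# beta(x) := psi(f(x)), and phi := beta applied to its own quotation.
beta = "psi(diag(x))"
phi = diag(beta)          # phi is the text  psi(diag('psi(diag(x))'))

eval(phi)                 # "assert" phi by evaluating it as Python
assert received[0] == phi # psi was handed phi's own quotation: phi says psi(<phi>)
```

When `phi` is evaluated, the inner `diag` call reconstructs exactly the text of `phi`, so `psi` receives a quotation of the very sentence being evaluated, mirroring $\phi = \psi(\ulcorner \phi \urcorner)$. (The `replace` on the single occurrence of `x` is good enough for this toy; real substitution must respect syntax.)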
Where did this proof come from? Suppose that you forgot the proof but remember the statement of the lemma. I claim that we can then reconstruct the proof.
First let’s ask: Where did the definition of the function $f$ come from? Let’s see. Imagine we defined
$$\beta(x) := \psi(f(x)) \quad\text{and}\quad \phi := \beta(\ulcorner \beta(x) \urcorner),$$
but left $f$ undefined for now. Then
$$\phi = \psi(f(\ulcorner \beta(x) \urcorner)).$$
But we want $\phi = \psi(\ulcorner \phi \urcorner)$, and that happens provided:
$$f(\ulcorner \beta(x) \urcorner) = \ulcorner \phi \urcorner = \ulcorner \beta(\ulcorner \beta(x) \urcorner) \urcorner.$$
This essentially gives the definition of the function $f$. Pretty neat.
Okay, where did the definitions of $\beta$ and $\phi$ come from? It is reasonable to define
$$\beta(x) := \psi(f(x))$$
for some function $f$. We cannot change $\psi$, but we can control the input to the formula $\psi$, so let’s put a function there. Hence the definition for $\beta$ is not unreasonable.
Okay, how about the definition of $\phi$? Well, we could argue that this is the magic step. If we are given this definition, then the rest follows by the above. I would argue that it is not completely surprising. The name of the lemma is, after all, the “diagonal” lemma. So defining $\phi$ as the application of $\beta$ to its own Gödel number is plausible.
Another way to think about the diagonal lemma is to imagine you are taking an exam in logic. The first question is:
Prove in $\mathsf{PA}$ that for any formula $\psi(x)$ there is a sentence $\phi$ so that $\mathsf{PA} \vdash \phi \leftrightarrow \psi(\ulcorner \phi \urcorner)$.
You read the question again and think: “I wish I had studied harder, I should not have checked Facebook last night. And then went out and …” But you think: let’s not panic, let’s think.
Here is what you do. You say: let me define
$$\beta(x) := \psi(f(x))$$
for some function $f$. You recall there was a function $f$ that depends on the variable $x$, and changing the input from $x$ to $f(x)$ seems to be safe. Okay, you say, now what? I need the definition of $f$. Hmmm, let me wait on that. I recall vaguely that $f$ had a strange definition. I cannot recall it, so let me leave it for now.
But you think: I need a sentence $\phi$. A sentence cannot have an unbound variable. So $\phi$ cannot be $\beta(x)$. It could be $\beta(\underline{m})$ for some number $m$. But what could $m$ be? How about $m = \ulcorner \beta(x) \urcorner$? This makes
$$\phi := \beta(\ulcorner \beta(x) \urcorner).$$
It is, after all, the diagonal lemma. Hmmm, does this work? Let’s see if this works. Wait: as above, I get that $f$ is now forced to satisfy
$$f(\ulcorner \beta(x) \urcorner) = \ulcorner \beta(\ulcorner \beta(x) \urcorner) \urcorner.$$
Great, this works. I think this is the proof. Wonderful. Got the first question.
Let’s look at the next exam question. Oh no …
Does this help? Does this unravel the mystery of the proof? Or is it still magic?
[Fixed equation formatting]