Just Arvind
Theory and practice
[ MIT ]
Arvind Mithal—almost always referred to as Arvind—is now the head of the faculty of computer science at a Boston trade school. The school, also known as MIT, is of course one of the top places for all things computer science. From education to service to startups to research, MIT is perhaps the best in the world.
Today I thought we might discuss one of Arvind’s main research themes.
Although this work started in the 1980’s I believe that it underscores an important point. This insight might apply to current research topics like quantum computers.
But first I cannot resist saying something personal about him. Arvind is a long-time friend of mine, and someone who scores many hours on the "stuck in an elevator" measure. This measure is one I made up, but I hope you get the idea.
Arvind is the only name I have ever known for Arvind. I was a bit surprised to see that the Wikipedia reference for him states his “full” name. He told me that he found having one name—not two—was a challenge. For example, he sometimes had to explain when arriving at a check-in desk at a conference that he only had one name. His name tag often became:
because the conference software could not handle people with one name.
Dataflow—The prehistory
About forty years ago one of the major open problems in CS was how to make computers go faster. It is still an issue, but in the 1980s this problem was an all-hands-on-deck affair. It was worked on by software engineers, by electrical engineers, and by researchers of all kinds, including complexity theorists. Conferences like FOCS and STOC—hardcore theory conferences—often contained papers on how to speed up computations.
Two examples come to mind:
The idea of systolic arrays, led by Hsiang-Tsung Kung, then at CMU. His students, especially Charles Leiserson, also made important contributions.
The idea of the Ultracomputer, led by Jacob Schwartz at NYU. An ultracomputer has N processors, N memories, and an N log N message-passing switch connecting them. Note the use of "N," so you can tell that it came from a theorist.
Arvind tried to make computers faster by inventing a new type of computer architecture. Computers then were based on the classic von Neumann architecture or control flow architecture. He and his colleagues worked for many years trying to replace this architecture by dataflow.
A bottleneck in von Neumann style machines is caused by the instruction-fetch cycle. Each cycle the program counter determines which instruction to fetch, and thus which data to fetch. Instructions and data use the same hardware channel, which causes the famous von Neumann bottleneck. We have modified Wikipedia's graphic to make the bottleneck aspect clearer:
Dataflow—The promise
Arvind is usually identified with the invention of dataflow computer architecture. The key idea of this architecture is to avoid the above bottleneck by eliminating the program counter. If there are no instructions to fetch, then it would seem that we can beat the bottleneck. Great idea.
Dataflow architectures do not have a program counter, so data is king. Roughly, data objects move around in such a machine and eventually appear at computational units. At a high level these machines operate like a directed graph from complexity theory: data moves along edges to nodes that compute values. Moreover, a node computes as soon as all of its input data is present. After the node computes its value, the new data is sent on to the next node; and so on.
The hope was that this would yield a way to increase the performance of computers. No program counter, no instruction fetching, could make dataflow machines faster.
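The firing rule just described can be made concrete with a toy interpreter. This is a sketch for illustration only, not Arvind's actual architecture: the graph, node names, and scheduling loop are all invented for this example.

```python
from collections import deque

# A toy dataflow interpreter: nodes "fire" as soon as every input edge
# carries a value -- there is no program counter deciding what runs next.

# Each node has an operation and a list of input edges (source node names).
graph = {
    "a":   {"op": lambda: 3,          "ins": []},
    "b":   {"op": lambda: 4,          "ins": []},
    "sum": {"op": lambda x, y: x + y, "ins": ["a", "b"]},
    "sq":  {"op": lambda x: x * x,    "ins": ["sum"]},
}

def run(graph):
    values = {}
    # Source nodes (no inputs) are ready immediately.
    ready = deque(n for n, node in graph.items() if not node["ins"])
    while ready:
        name = ready.popleft()
        node = graph[name]
        # Fire: all inputs are present, so compute and store the result.
        values[name] = node["op"](*(values[i] for i in node["ins"]))
        # Any node whose inputs are now all available becomes ready.
        for m, other in graph.items():
            if m not in values and m not in ready \
               and all(i in values for i in other["ins"]):
                ready.append(m)
    return values

print(run(graph)["sq"])  # (3 + 4)^2 = 49
```

Note that nothing in `run` consults an instruction address; the order of firing is driven entirely by data availability, which is the point of the architecture.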
Dataflow—The actual
The dataflow idea is clever. The difficulty is that while dataflow can work, classic von Neumann machines have been augmented in various ways. The competition rarely stays put. These updates did not eliminate the von Neumann bottleneck, but they reduced its cost and let von Neumann machines continue to get faster. Caches are one example of how they sidestep the bottleneck: a cache for program instructions makes it likely that the instruction fetch does not actually reach main memory. This attacks the main advantage of dataflow machines and makes them less attractive.
The paper by David Culler, Klaus Schauser, and Thorsten von Eicken titled Two Fundamental Limits on Dataflow Multiprocessing discusses these issues in detail. They start by saying:
The advantages of dataflow architectures were argued persuasively in a seminal 1983 paper by Arvind and Iannucci and in a 1987 revision entitled “Two Fundamental Issues in Multiprocessing”. However, reality has proved less favorable to this approach than their arguments would suggest. This motivates us to examine the line of reasoning that has driven dataflow architectures and fine-grain multithreading to understand where the argument went awry.
Dataflow—Lessons
What are the lessons? Is there one?
I claim there are lessons. The work of Arvind on dataflow was and is important. It did not lead to the demise of von Neumann machines. I am using one right now to write this.
Dataflow did lead to insights on programming that have many applications. The dataflow idea may yet impact special computational situations: there is interest in using it for data-intensive applications.
Ken adds that dataflow has been realized in other ways. Caches and pipes and subsequent architecture innovations profit from designs that enhance locality. The MapReduce programming model gives a general framework that fits this well. The paradigm of streaming is an even better fit. Note that Wikipedia says both that it is "equivalent to dataflow programming" and "was explored within dataflow programming"—so perhaps the parent lost out to the wider adaptability of her children.
I think there are lessons: practical goals, like making hardware go faster, are complex and many-faceted. A pure theory approach, such as systolic arrays or ultracomputers or dataflow machines, is unlikely to suffice on its own. Also, existing technology like von Neumann machines will continue to evolve.
A question is: Could quantum computers be subject to the same lesson? Will non-quantum machines continue to evolve in a way that makes the quantum advantage less than we think? Or is this type of new architecture different? Could non-quantum machines incorporate tricks from quantum computing?
Open Problems
Speaking about theory and practice: Noga Alon, Phillip Gibbons, Yossi Matias, and Mario Szegedy are the winners of the 2019 ACM Paris Kanellakis Theory and Practice Award.
We applaud them on receiving this award. Sadly the Paris award reminds us that Paris Kanellakis died in the terrible plane crash of American Airlines Flight 965 on December 20, 1995. Also on that flight were his wife, Maria Otoya, and their two children, Alexandra and Stephanos.
The citation for the award says:
They pioneered a framework for algorithmic treatment of streaming massive datasets, and today their sketching and streaming algorithms remain the core approach for streaming big data and constitute an entire subarea of the field of algorithms.
In short they invented streaming.
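To give a flavor of the kind of algorithm the award honors, here is a toy version of the Alon-Matias-Szegedy idea for estimating the second frequency moment of a stream. This is a simplified sketch: the real AMS algorithm uses 4-wise independent hash functions to achieve logarithmic memory, while this illustration lazily stores a random sign per distinct item.

```python
import random

# Toy AMS sketch: estimate F2 = sum_i f_i^2 (f_i = frequency of item i)
# using random +-1 signs and a single counter per trial. E[z^2] = F2.

def ams_f2_estimate(stream, trials=200, seed=42):
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sign = {}              # random +-1 sign per distinct item (toy only;
        z = 0                  # real AMS uses a 4-wise independent hash)
        for item in stream:
            if item not in sign:
                sign[item] = rng.choice((-1, 1))
            z += sign[item]    # one running counter per trial
        estimates.append(z * z)
    return sum(estimates) / trials

stream = [1, 2, 1, 3, 1, 2]    # frequencies: {1: 3, 2: 2, 3: 1}
true_f2 = 3 * 3 + 2 * 2 + 1 * 1   # = 14
print(ams_f2_estimate(stream), true_f2)
```

Averaging over independent trials drives the variance down, which is how the real algorithm trades a little memory for accuracy.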
Proof of the Diagonal Lemma in Logic
Why is the proof so short yet so difficult?
Saeed Salehi is a logician at the University of Tabriz in Iran. Three years ago he gave a presentation at a Moscow workshop on proofs of the diagonal lemma.
Today I thought I would discuss the famous diagonal lemma.
The lemma is related to Georg Cantor’s famous diagonal argument yet is different. The logical version imposes requirements on when the argument applies, and requires that it be expressible within a formal system.
The lemma underpins Kurt Gödel’s famous 1931 proof that arithmetic is incomplete. However, Gödel did not state it as a lemma or proposition or theorem or anything else. Instead, he focused his attention on what we now call Gödel numbering. We consider this today as “obvious” but his paper’s title ended with “Part I”. And he had readied a “Part II” with over 100 pages of calculations should people question that his numbering scheme was expressible within the logic.
Only after his proof was understood did people realize that one part, perhaps the trickiest part, could be abstracted into a powerful lemma. The tricky part is not the Gödel numbering. People granted that it can be brought within the logic once they saw enough of Gödel's evidence, and so we may write $\ulcorner A \urcorner$ for the Gödel number of any formula $A$ and use that in other formulas. The hard part is what one does with such expressions.
This is what we will try to motivate.
Tracing the Lemma
Rudolf Carnap is often credited with the first formal statement, in 1934, for instance by Eliott Mendelson in his famous textbook on logic. Carnap was a member of the Vienna Circle, which Gödel frequented, and Carnap is considered a giant among twentieth-century philosophers. He worked on sweeping grand problems of philosophy, including logical positivism and analysis of human language via syntax before semantics. Yet it strikes us with irony that his work on the lemma may be the best remembered.
Who did the lemma first? Let's leave that for others and move on to the mystery of how to prove the lemma once it is stated. I must say the lemma is easy to state, easy to remember, and has a short proof. But I believe that the proof is not easy to remember or even to follow.
Salehi’s presentation quotes others’ opinions about the proof:
Sam Buss: “Its proof [is] quite simple but rather tricky and difficult to conceptualize.”
György Serény (we jump to Serény’s paper): “The proof of the lemma as it is presented in textbooks on logic is not self-evident to say the least.”
Wayne Wasserman: "It is 'Pulling a Rabbit Out of the Hat'—Typical Diagonal Lemma Proofs Beg the Question."
So I am not alone, and I thought it might be useful to try and unravel its proof. This exercise helped me and maybe it will help you.
Here goes.
Stating the Lemma
Let $\varphi(x)$ be a formula in Peano Arithmetic ($\mathsf{PA}$) with one free variable. We claim that there is some sentence $\psi$ so that $\mathsf{PA} \vdash \psi \leftrightarrow \varphi(\ulcorner \psi \urcorner)$.
Formally,
Lemma 1 Suppose that $\varphi(x)$ is some formula in $\mathsf{PA}$ with one free variable. Then there is a sentence $\psi$ so that $$\mathsf{PA} \vdash \psi \leftrightarrow \varphi(\ulcorner \psi \urcorner).$$
The beauty of this lemma is that it was used by Gödel and others to prove various powerful theorems. For example, the lemma quickly proves this result of Alfred Tarski:
Theorem 2 Suppose that $\mathsf{PA}$ is consistent. Then truth cannot be defined in $\mathsf{PA}$. That is, there is no formula $\mathsf{True}(x)$ so that for all sentences $\psi$, $\mathsf{PA}$ proves $$\mathsf{True}(\ulcorner \psi \urcorner) \leftrightarrow \psi.$$
The proof is this. Assume there is such a formula $\mathsf{True}(x)$. Then use the diagonal lemma on the formula $\neg\mathsf{True}(x)$ and get a sentence $\psi$ with $$\mathsf{PA} \vdash \psi \leftrightarrow \neg\mathsf{True}(\ulcorner \psi \urcorner).$$
This shows that $$\mathsf{PA} \vdash \mathsf{True}(\ulcorner \psi \urcorner) \leftrightarrow \neg\mathsf{True}(\ulcorner \psi \urcorner).$$
This is a contradiction. A short proof.
The Proof
The key is to define the function $f$ as follows: Suppose that $n$ is the Gödel number of a formula of the form $\alpha(x)$, for the fixed variable $x$; then $$f(n) = \ulcorner \alpha(n) \urcorner,$$ the Gödel number of the sentence obtained by substituting the numeral for $n$ itself into $\alpha$.
If $n$ is not of this form then define $f(n) = 0$. This is a strange function, a clever function, but a perfectly fine function. It certainly maps numbers to numbers. It is certainly recursive; actually it is clearly computable in polynomial time for any reasonable Gödel numbering. Note: the function does depend on the choice of the variable $x$. Thus, $$f(\ulcorner \alpha(x) \urcorner) = \ulcorner \alpha(\ulcorner \alpha(x) \urcorner) \urcorner$$
and
$$f(\ulcorner \alpha(y) \urcorner) = 0 \quad \text{for a variable } y \neq x.$$
Now we make two definitions: $$\beta(x) \equiv \varphi(f(x)) \qquad \text{and} \qquad \psi \equiv \beta(\ulcorner \beta(x) \urcorner).$$
Now we compute just using the definitions of $f$, $\beta$, and $\psi$: $$\psi \equiv \beta(\ulcorner \beta(x) \urcorner) \equiv \varphi(f(\ulcorner \beta(x) \urcorner)) \quad\text{and}\quad f(\ulcorner \beta(x) \urcorner) = \ulcorner \beta(\ulcorner \beta(x) \urcorner) \urcorner = \ulcorner \psi \urcorner,$$ so $\mathsf{PA} \vdash \psi \leftrightarrow \varphi(\ulcorner \psi \urcorner)$.
We are done.
But …
Where did this proof come from? Suppose that you forgot the proof but remember the statement of the lemma. I claim that we can then reconstruct the proof.
First let's ask: Where did the definition of the function $f$ come from? Let's see. Imagine we defined $$\beta(x) \equiv \varphi(f(x)) \qquad\text{and}\qquad \psi \equiv \beta(\ulcorner \beta(x) \urcorner),$$ but left $f$ undefined for now. Then $$\psi \equiv \varphi(f(\ulcorner \beta(x) \urcorner)).$$
But we want $\psi \leftrightarrow \varphi(\ulcorner \psi \urcorner)$; that happens provided: $$f(\ulcorner \beta(x) \urcorner) = \ulcorner \psi \urcorner = \ulcorner \beta(\ulcorner \beta(x) \urcorner) \urcorner.$$
This essentially gives the definition of the function $f$. Pretty neat.
But but …
Okay, where did the definitions of $\beta$ and $\psi$ come from? It is reasonable to define $$\beta(x) \equiv \varphi(f(x))$$
for some function $f$. We cannot change $\varphi$, but we can control the input to the formula $\varphi$, so let's put a function there. Hence the definition for $\beta$ is not unreasonable.
Okay, how about the definition of $\psi$? Well we could argue that this is the magic step. If we are given this definition then the lemma follows, by the above. I would argue that $\psi \equiv \beta(\ulcorner \beta(x) \urcorner)$ is not completely surprising. The name of the lemma is after all the "diagonal" lemma. So defining $\psi$ as the application of $\beta$ to (the Gödel number of) itself is plausible.
Taking an Exam
Another way to think about the diagonal lemma is imagine you are taking an exam in logic. The first question is:
Prove in $\mathsf{PA}$ that for any formula $\varphi(x)$ there is a sentence $\psi$ so that $$\mathsf{PA} \vdash \psi \leftrightarrow \varphi(\ulcorner \psi \urcorner).$$
You read the question again and think: "I wish I had studied harder. I should not have checked Facebook last night. And then went out and…" But you think: let's not panic, let's think.
Here is what you do. You say: let me define $$\beta(x) \equiv \varphi(g(x))$$
for some function $g$. You recall there was a function in the proof, and changing the input from $x$ to $g(x)$ seems to be safe. Okay you say, now what? I need the definition of $g$. Hmmm, let me wait on that. I recall vaguely that $g$ had a strange definition. I cannot recall it, so let me leave it for now.
But you think: I need a sentence $\psi$. A sentence cannot have an unbound variable. So $\psi$ cannot be $\beta(x)$. It could be $\beta(n)$ for some number $n$. But what could $n$ be? How about $n = \ulcorner \beta(x) \urcorner$. This makes $$\psi \equiv \beta(\ulcorner \beta(x) \urcorner).$$
It is after all the diagonal lemma. Hmmm, does this work? Let's see. Wait: as above I get that $g$ is now forced to satisfy $$g(\ulcorner \beta(x) \urcorner) = \ulcorner \beta(\ulcorner \beta(x) \urcorner) \urcorner = \ulcorner \psi \urcorner.$$
Great this works. I think this is the proof. Wonderful. Got the first question.
Let’s look at the next exam question. Oh no
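As an aside, the self-application in this proof is essentially how a quine works: a program that reproduces its own source text. In the sketch below, which is only an analogy and not part of the lemma's formal proof, the template string plays the role of $\beta(x)$ and applying it to its own text plays the role of $\psi \equiv \beta(\ulcorner \beta \urcorner)$.

```python
# A quine-style fixed point: a string template applied to its own source.
# template ~ beta(x); program ~ psi = beta applied to the code of beta.

template = 'template = %r\nprogram = template %% template'
program = template % template   # "beta of the Goedel number of beta"

# Executing the program rebuilds exactly the same `program` string,
# i.e., the program "talks about" its own text -- the fixed point.
namespace = {}
exec(program, namespace)
assert namespace["program"] == program
print("fixed point found")
```

The `%r`/`%%` trick does the same job as the substitution function $f$: it inserts a quoted copy of the template into the template itself.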
Open Problems
Does this help? Does this unravel the mystery of the proof? Or is it still magic?
Math Tells
How to tell what part of math you are from
Gerolamo Cardano is often credited with introducing the notion of complex numbers. In 1545, he wrote a book titled Ars Magna. He introduced us to numbers like $5 \pm \sqrt{-15}$ in his quest to understand solutions to equations. Cardano was often short of money and gambled and played a certain board game to make money—see the second paragraph here.
Today, for amusement, Ken and I thought we’d talk about tells.
What are tells? Wikipedia says:
A tell in poker is a change in a player’s behavior or demeanor that is claimed by some to give clues to that player’s assessment of their hand.
Other Tells
Ken and I have been thinking of tells in a wider sense—when and whether one can declare inferences amid uncertain information. Historians face this all the time. So do biographers, at least when their subjects are no longer living. We would also like to make inferences in our current world, such as about the pandemic. The stakes can be higher than in poker. In poker, if your “tell” inference is wrong and you lose, you can play another hand—unless you went all in. With science and other academic areas the attitude must be that you’re all-in all the time.
Cardano furnishes several instances. Wikipedia—which we regard as an equilibrium of opinions—says that Cardano
acknowledged the existence of imaginary numbers … [but] did not understand their properties, [which were] described for the first time by his Italian contemporary Rafael Bombelli.
This is a negative inference from how one of Cardano’s books stops short of treating imaginary numbers as objects that follow rules.
There are also questions about whether Cardano can be considered “the father of probability” ahead of Blaise Pascal and Pierre de Fermat. Part of the problem is that Cardano’s own writings late in life recounted his first erroneous reasonings as well as final understanding in a Hamlet-like fashion. Wikipedia doubts whether he really knew the rule of multiplying probabilities of independent events, whereas the essay by Prakash Gorroochurn cited there convinces us that he did. Similar doubt extends to how much Cardano knew about the natural sciences, as correct inferences (such as mountains with seashell fossils having once been underwater) are mixed in with what we today regard as howlers.
Every staging of Shakespeare’s Hamlet shows a book by Cardano—or does it? In Act II, scene 2, Polonius asks, “What do you read, my lord?”; to which Hamlet first replies “Words words words.” Pressed on the book’s topic, Hamlet perhaps references the section “Misery of Old Age” in Cardano’s 1543 book De Consolatione but what he says is so elliptical it is hard to tell. The book also includes particular allusions between sleep and death that go into Hamlet’s soliloquy opening Act III. The book had been published in England in 1573 as Cardan’s Comfort under the aegis of the Earl of Oxford so it was well-known. Yet the writer Italo Calvino held back from the inference:
To conclude from this that the book read by Hamlet is definitely Cardano, as is held by some scholars of Shakespeare’s sources, is perhaps unjustified.
To be sure, there are some who believe Shakespeare’s main source was Oxford, in manuscripts if not flesh and blood. One reason we do not go there is that we do not see the wider community as having been able to establish reliable principles for judging what kinds of inferences are probably valid. We wonder if one could do an experiment of taking resolved cases, removing most of the information to take them down to the level of unresolved cases, and seeing what kinds of inferences from partial information would have worked. That’s not our expertise, but within our expertise in math and CS, we wonder if a little experiment will be helpful.
To set the idea, note that imaginary numbers are also called complex numbers. Yet the term complex numbers can mean other things. Besides numbers like $a + b\sqrt{-1}$, it also can mean how hard it is to construct a number.
In number theory, the integer complexity of an integer is the smallest number of ones that can be used to represent it using ones and any number of additions, multiplications, and parentheses. It is always within a constant factor of the logarithm of the given integer.
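For concreteness, here is a small dynamic program that computes integer complexity exactly for small $n$; the function name and bounds are our own choices for illustration. It uses the fact that the outermost operation of an optimal expression is either a sum or a product.

```python
def integer_complexity(limit):
    # f[n] = fewest 1s needed to write n using +, *, and parentheses.
    f = [0] * (limit + 1)
    f[1] = 1
    for n in range(2, limit + 1):
        # Best split as a sum a + (n - a) ...
        best = min(f[a] + f[n - a] for a in range(1, n // 2 + 1))
        # ... or as a product d * (n // d).
        for d in range(2, int(n ** 0.5) + 1):
            if n % d == 0:
                best = min(best, f[d] + f[n // d])
        f[n] = best
    return f

f = integer_complexity(12)
print(f[6], f[12])  # 6 = (1+1)*(1+1+1) uses 5 ones; 12 uses 7
```

Checking the table against the logarithmic bound mentioned above makes a nice exercise: $f[n]$ stays within a constant factor of $\log n$.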
How easy is it to tell what kind of “complex” is meant if you only have partial information? We don’t only mean scope-of-terminology issues; often a well-defined math object is used in multiple areas. Let’s try an experiment.
Math Tells
Suppose you walk into or log into a talk without any idea of the topic. If the speaker uses one of these terms, can you tell what her talk might be about? Several have multiple meanings. What are some of them? A passing score is
- She says let be a c.e. set.
- She says let be in .
- She says by the König principle.
- She says is a prime.
- She says is a prime.
- She says is solvable.
- She says let its degree be .
- She says there is a run.
- She says it is reducible.
- She says it is satisfiable.
Open Problems
What are your answers? Do you have some tells of your own?
The 2020 Y Prize
Can one maintain privacy while publicly congratulating?
[ Cropped from a facemask image of X ]
X, who shall go nameless here, recently won a prize for research.
Today we congratulate X and wish to talk a bit about their work while safeguarding X’s identity.
X also won another prize Z—but we will not mention which one. In both cases the honor is about the creation of a new field rather than the solution of a long-standing open problem. Her—OK, X is a she—brilliant work has allowed many others to write papers in these areas. And it has led to countless conferences, meetings, and talks.
Oops, we used the word “brilliant.” That has already leaked her identity, given where you are reading this. Just type it into our search box.
We assume you get the point. One can leak information about people without mentioning them directly.
The Real Start of the Post
[ Her work is in Algorithmic Fairness too ]
Cynthia Dwork is the winner of the 2020 Donald E. Knuth Prize. The prize is awarded jointly by ACM SIGACT and the IEEE Computer Society Technical Committee. The ceremony will include a lecture by Cynthia at FOCS 2020, which is still projected to meet in-person at Duke next November 16–19.
Today we congratulate Cynthia and talk a little about her work.
I, Dick, have known her for many years. I am excited to see that she has won the Knuth prize.
She has always been a top researcher and a terrific speaker. I fondly recall some of her earlier talks at Princeton and elsewhere. There is something about her ability to explain complex ideas in a lucid manner. I have often thought that this ability must be one of the reasons she has been so successful in her research.
We will talk a little about differential privacy (DP) and then her other joint work which led to another prize. We should be quick to mention that others have been instrumental in both developments.
Quantifying an Idea
DP might have been appreciated long before there were large databases and computer systems. The genius is not just the simplicity of the idea but the development of mathematical tools to govern its application.
For a simple example, suppose a professor has a general policy of posting exam average scores in her class of 100. Suppose she posts that the average is 73.3 after 98 students have taken the exam but two need to make it up a week later. Then suppose she re-posts the average after they take it and it is 72.5. The difference may not seem like much. But the class learns two things:
- Both students had their makeup scores included in the average. (If the 73.3 is exact then it is not even possible to get a rounded 72.5 with just one score of zero, pending the other.)
- The two students collectively bombed the exam. (Just possibly one got a zero while the other did OK, but even then the other’s score was below average.)
Your first thought might be that the fault is the excessive precision in the grade reporting. If she had rounded to the nearest integer then both averages would have been reported as "73" and there would be no problem. But the protection from rounding is accidental. The new average could have been 72.49, which would be rounded down to "72," when the set of possible interpretations is worse than before.
In wide-scale situations where queries for multiple averages at different times are allowed, effects of rounding may cancel so that a large statistical analysis can subtract them out. Thus passively reducing precision is not enough. The key is to affect the precision actively by adding a random $\epsilon$ to the reported results, where $\epsilon$ is judiciously chosen. The point is:
Adding $\epsilon$, better than rounding, makes inference much harder.
In the exam-score example, randomly adding or subtracting a number between 0 and 2 changes the calculus so that seeing “73.3” and then “72.5” does not allow a high-confidence inference about the makeup students’ scores. One could reason that the first number could easily have been 72.3 and the second 73.5, in line with interpretations where the makeup students upped the average. Actually, a jump from 72.3 to 73.5 in the true averages is impossible from just two new scores, so the class could deduce that the reported averages are fudged. But that is OK. Two basic purposes of reporting the averages are met despite the fudging:
- to show the exam was consistent with past years—especially good to know this year;
- to give you some idea (knowing your own score exactly) where you stand relative to the class.
In related situations, one could argue that a mean of $1$ for $\epsilon$ is not enough. If the exam has 100 one-point questions each with a 73.3% expectation of getting it right for everyone, then the standard deviation about the mean is almost $4.5$. Using a mean change of $2$ seems also fine in this case, whereas rounding to the nearest 5 (i.e., reporting "70" or "75" as the average) might be too coarse.
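As a sketch of how "adding a random $\epsilon$" works in standard DP practice, here is the Laplace mechanism applied to the exam-average example. The class scores, the sensitivity bound, and the choice of privacy parameter are all hypothetical; this is an illustration, not the professor's actual scheme.

```python
import random

# Laplace mechanism sketch: report average + Laplace noise whose scale is
# sensitivity / epsilon, where sensitivity bounds how much one student's
# score can move the average.

def noisy_average(scores, epsilon, max_score=100):
    true_avg = sum(scores) / len(scores)
    # One student's score changes the average by at most max_score / n.
    sensitivity = max_score / len(scores)
    b = sensitivity / epsilon
    # Difference of two Exp(mean b) variates is Laplace(0, b).
    noise = random.expovariate(1 / b) - random.expovariate(1 / b)
    return true_avg + noise

random.seed(0)
scores = [73] * 98 + [40, 35]      # hypothetical class of 100
print(round(noisy_average(scores, epsilon=1.0), 1))
```

With 100 students and full-credit exams, the noise scale here is about one point, so the reported average is useful to the class but masks the makeup students' scores.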
We do not intend our example to be definitive, only that it conveys relevant issues. The example is enough to convey that DP leads into deep matters in statistics and what can be inferred by statistical methods. The latter melds the complexity-theoretic notion of knowledge gain from interactive protocols with statistical (machine) learning. And as with quantum computing, theorems and methods from the subfield have come back to inform fundamental questions about complexity (of course, this is true of crypto in general).
Privacy and Online Chess
I, Ken, am the lead on this post, and have had data-privacy affect me personally. The pandemic has forced the cancellation of all over-the-board (OTB) tournaments, some as far ahead as August. Chess has all moved online, as heralded by two recent articles in the New York Times and one in the Wall Street Journal proclaiming, “Sports: Chess Is the New King of the Pandemic.”
As these articles also note, cheating has long been a major concern for online chess. I have been a recourse for second-opinions in online cheating cases but have not tried to be primary in this domain. A great conversation in November 2006 with Daniel Sleator (who—besides being famous for splay trees and amortized analysis—co-founded the Internet Chess Club) told me how online cheating detection avails much information about the manner of play that is not applicable or reliably available in OTB chess. Also for academic reasons, I chose a minimal approach of using no information besides the moves of the games and the Elo ratings of the players. An unexpected advantage of that choice is exemplified by an e-mail I received—fifteen minutes before a conference call on Wednesday with representatives of the International Chess Federation (FIDE) and other online platforms—from a different chess organizer requesting a statement that:
…you will only use the data of the games, and not any personal data that might be obtained from the [tournaments] for your research, such as player name etc. … in writing as a standard precaution … to make sure that there isn’t third party usage of the data or usage outside of its intent.
Some recent legal decisions have confirmed that the moves of the games are public data and cannot be copyrighted. The e-mail did not mention the players’ ratings. As used by FIDE and the US and most online platforms (and much more widely), those are 4-digit numbers, and they are public. Knowing all four digits might be enough to pinpoint the identity of a player.
Thus I suggested rounding the rating to the nearest 10 or even 20. This makes little difference to my statistical cheating tests, which already give 25 Elo points benefit-of-doubt for uncertainty about the rating. Tournament officials I’ve served have done this in a couple of individual cases but now the need is en masse.
The question based on the previous section above is whether I should instead (or also) suggest that they implement full DP by adding a random $\epsilon$ with a mean of $10$ or $20$ or so, with or without rounding. A caveat is that it is easy for someone with a tournament spreadsheet to round a column of figures before exporting text to me, but applying a random function is less familiar and prone to major mess-ups.
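The two options can be sketched in a few lines. This assumes a simple uniform-noise scheme rather than a calibrated DP mechanism, and the rating value and 20-point spread are invented for illustration.

```python
import random

# Two ways to mask a 4-digit Elo rating before sharing tournament data.
# Rounding is a one-step spreadsheet operation; noise needs a random
# function, which is where the "major mess-ups" risk comes in.

def round_rating(elo, step=20):
    # Deterministic: everyone in a 20-point band reports the same value.
    return step * round(elo / step)

def noisy_rating(elo, spread=20):
    # Random: add symmetric integer noise in [-spread, spread] (toy choice).
    return elo + random.randint(-spread, spread)

print(round_rating(2187))  # -> 2180
random.seed(1)
print(noisy_rating(2187))
```

Either way the perturbation is small next to the 25 Elo points of benefit-of-doubt already built into the statistical tests.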
The Other Prize
The Knuth Prize is not Cynthia’s only win for 2020. In December she was awarded the 2020 IEEE Richard W. Hamming Medal. A video gives a one-minute summary of differential privacy and her work on non-malleable cryptosystems via lattice bases. The latter stems from her 2000 paper with Danny Dolev and Moni Naor (DDN).
We have not covered this crypto topic before. It involves chosen-ciphertext attacks where Eve, the eavesdropper, first gets hold of a transmitted ciphertext $c$. The cryptosystem is malleable if Eve can know enough about the structure of $c$ to make a legal ciphertext $c'$, one that may decode to a plaintext $m'$ that advances her interests compared to the intended $m$. Wikipedia's article gives an example where $c$ embeds a numerical field that Eve can recognize and alter en route. The paper gives a similar example where Eve cannot disrupt the transmission of $c$ but can separately send her own $c'$ with a value that gives her advantage. This may be so even without being able to decode $c$ or gain any information about the plain value of $m$, except that $m'$ is different in a meaningful way. For instance, $m'$ could be a lower bid on a contract or higher bid in a secure auction.
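A standard illustration of malleability, not the DDN paper's own example, is textbook RSA: Eve can multiply a ciphertext by $2^e$ to double the underlying plaintext without ever decrypting it. The primes and the bid value below are toy numbers.

```python
# Textbook (unpadded) RSA is malleable: from a ciphertext for m, Eve can
# forge a valid ciphertext for 2*m while learning nothing about m itself.

p, q, e = 1009, 1013, 65537
N = p * q
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)          # private exponent (modular inverse, Py 3.8+)

m = 4200                     # Alice's secret bid
c = pow(m, e, N)             # Alice's ciphertext

# Eve never decrypts c; she only multiplies by 2^e mod N.
c_evil = (c * pow(2, e, N)) % N
m_evil = pow(c_evil, d, N)   # what the receiver would decrypt

assert m_evil == 2 * m       # Eve doubled the bid
print(m_evil)
```

Real systems defeat this with padding schemes that make random ciphertext manipulations decrypt to garbage, which is one concrete meaning of non-malleability.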
One would think that $m$ should just be encoded via an error-correcting code whose range is secure, so that an attacker would be overwhelmingly unlikely to find another value in the legal range of the code. However, this clashes with the goal of homomorphic encryption, which allows numerical operations on encoded values. Or the technology used may be too simple, as in the paper's mention of submitting bids to a fax number.
Note: both fax machines and the paper date to the previous millennium. But the paper next gives an example that is staying with us in this millennium, where the transmitted value is a proof of a statement. Suppose Alice arranges to transmit it to a referee Ralph using a zero-knowledge protocol, but Eve is able to intercept their messages. Eve can impersonate Ralph to Alice and impersonate Alice to Ralph as Ralph asks questions in the protocol. It may seem impossible to forestall such a "man-in-the-middle" attack without extrinsic considerations of hardware and timing, but DDN design an ingenious scheme for non-malleable zero-knowledge proofs.
A Video
En passant, we want to note a great video that was made for next month's Women in Theory workshop. It has an all-female cast and lyrics by Avi Wigderson parodying the song "I Will Survive." Cynthia is not in the video, but she likewise has been a great spokesperson for the experiences of women in our professional life.
Open Problems
Our congratulations again to Cynthia—by name—on the Knuth and Hamming prizes.
Consistency and P=NP
Can we at least show that $\mathsf{P} = \mathsf{NP}$ is consistent?
[ From personal page ]
Jan Krajicek is an expert on linking computational complexity and mathematical logic. He has authored four books in this area, including the often-cited text Bounded Arithmetic and Complexity Theory. Moreover, he has studied the consistency of various theories, whether they can prove statements like $\mathsf{P} \neq \mathsf{NP}$, and related questions.
Today I thought we would discuss consistency proofs in general.
Mathematics of COVID-19
It's not just
[ Sir Francis Galton by Charles Wellington Furse ]
Francis Galton is a perfect example of a Victorian-era scientist. Sir Galton (he was knighted in 1909) had many roles: statistician, sociologist, psychologist, anthropologist, tropical explorer, geographer, meteorologist, geneticist, and inventor. He coined the phrase "nature versus nurture" and more.
Today we trade in jokes for some mathematics of the virus.