Dan Mote, the President of the National Academy of Engineering (NAE), just announced this year’s class of members elected to the NAE.
I am thrilled to see that several computer scientists were among the class.
In putting together the list below of computing-related members, I had some difficulty; if I made an error, I apologize in advance. One interesting result of looking at the whole list of 80 members is that computing played a major role in many members' citations. Yet many of these members I would not classify as computer scientists. This is just another example of the importance of computing in all of engineering. So if I left out someone whom I considered to be from another area, well, I am sorry.
Congratulations to all on your election.
Here are the new members, along with their citations. The NAE likes pithy citations that start with the word “For.”
I cannot resist saying that I have a personal connection to both Boneh and Leiserson. I was honored to be the graduate advisor of Dan and the undergraduate advisor of Charles: the former at Princeton and the latter at Yale. Both were a delight to work with, and I am very excited that they are now in the academy.
I also noticed that the areas selected for recognition were concentrated in three: security was covered several times; so were large-scale systems of several kinds; and theory was key for two new members.
Ken points out that all ten precede me in the alphabet—indeed they run snugly up to “Lip—.” As a member I voted but had no effect there.
The main issue that confronts computer science is to get more members into the NAE. Many deserving researchers are not yet members and I hope that will be solved in the future. But for today let’s congratulate all those who did get into the NAE this year.
Ken: We are leaving yesterday’s post “unrolled” on the front page since this comes so soon after. To keep it short we elided material on deep learning and big data including this interview that would have brought “engineering” into the “science” versus “magic” issue.
Can we avoid accepting what we cannot verify?
Cropped from biography source
Arthur Clarke was a British writer of great breadth and huge impact. He was a science writer, of both fiction and non-fiction. His works are too many to list, but 2001: A Space Odyssey—the novel accompanying the movie—is perhaps his most famous. He received both a British knighthood and the rarer Pride of Sri Lanka award, so that both “Sri” and “Sir” were legally prefixed to his name.
Today Dick and I want to raise questions about modern cryptography, complexity, and distinguishing science from “magic.”
Clarke added the following widely quoted statement to the 1973 revised edition of his book Profiles of the Future: An Inquiry into the Limits of the Possible as the third of three “laws” of prediction:
Any sufficiently advanced technology is indistinguishable from magic.
When I, Ken, first read the quote, I was surprised: it didn’t seem like something a proponent of science would say. Tracing versions of his book’s first chapter in which the first two “laws” appear, it seems Clarke first had nuclear fission in mind, then transistors, semiconductors, clocks incorporating relativity, and a list that strikingly includes “detecting invisible planets.” I agree with critiques expressed here and here: these phenomena are explainable. Perhaps Richard Feynman was right that “nobody really understands quantum mechanics,” but besides rebuttals like this, the point is that a natural explanation can satisfy by reduction to simpler principles without their needing to be fully understood. In all these cases too, experiment gives the power to verify.
My newfound appreciation for Clarke’s quip, and Dick’s all along, comes from realizing the potential limits of verification and explanation in our own field, computational complexity. This applies especially to crypto. Let’s see what the issues are.
As often in computing, issues have “ground” and “meta” levels. Let’s start on the ground.
The big center of gravity in complexity theory has always been the idea of NP. The idea is that although it may take a lot of effort to find a solution to a problem in NP, if and when you find one, it is easy to check. Separate from whether P equals NP, there was much success on programs whose outputs are checkable with markedly greater efficiency than the original computation. This gives not only instant assurance but usually also an explanation of why the solution works.
An even lower example in complexity terms is verifying a product of matrices. Although we may never be able to multiply two n-by-n matrices in O(n^2) time, once we have a claimed product C = AB we can verify it in O(n^2) time by randomly choosing 0-1 vectors r and checking that Cr equals A times (Br). Here the randomized check may give less explanation, but the ability to repeat it quickly confers assurance. We should mention that this idea originated with Rūsiņš Freivalds, who passed away last month—see this tribute by his student Andris Ambainis.
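As a concrete sketch, Freivalds' check fits in a few lines (our illustration; the function name is ours):

```python
import random

def freivalds_check(A, B, C, trials=20):
    """Probabilistic check that C equals the matrix product A * B.

    Each trial picks a random 0-1 vector r and compares C r with A (B r),
    which costs only O(n^2) arithmetic. If C != A * B, a single trial
    misses with probability at most 1/2, so 20 trials err below 2^-20.
    """
    n = len(A)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        if matvec(C, r) != matvec(A, matvec(B, r)):
            return False   # a witness vector: C is certainly not A * B
    return True            # C = A * B with high probability
```

Note the one-sided error: a False answer comes with a concrete witness vector, while True is only statistical assurance—fitting the point that the quick check confers assurance more than explanation.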
As the field has matured, however, we've run into more cases where no one has found a verifying predicate. For example, every finite field F_q has elements g whose powers generate all the non-zero elements of F_q. However, no way is known to recognize such elements in time polynomial in log q, let alone find one. The best recent news we know is a 2013 paper by Ming-Deh Huang and Anand Narayanan that finds one in time polynomial in the characteristic p, which is fine when p is small. In other things too, we seem to be moving away from NP to non-checkable complexity classes.
To see a still broader form of not knowing, we note this remark by Lance Fortnow on the Computational Complexity blog about a ramification of the deep-learning technique used to craft the new powerful Go-playing program by Google:
“If a computer chess program makes a surprising move, good or bad, one can work through the code and figure out why the program made that particular move. If AlphaGo makes a surprising move, we’ll have no clue why.”
To say a little more here: Today's best chess programs train the coefficients of their position-evaluation functions by playing games against themselves. For example, the Weights array for last year's official Stockfish 6 release had five pairs of values. Line 115 of the current Stockfish source file evaluate.cpp has four pairs.
(Update 2/11/16: the array was removed entirely—see this change note.)
The myriad Score numbers following it are likewise all different. They are mysterious but they are concretely present, and one can find the position-features they index in the code to work out why the program preferred certain target positions in its search. The question of why Deep Blue played a critical good move against Garry Kasparov in 1997 caused controversy when IBM refused Kasparov's request for verifying data, and may still not be fully resolved. But I've posted here a similar example of tracing the Komodo program's refusal to win a pawn, when analyzing the first-ever recorded chess game, played in the year 1475.
Deep learning, however, produces a convolutional neural network that may not so easily reveal its weights and thresholds, nor how it has learned to “chunk” the gridded input. At least with the classic case of distinguishing pictures of cats from dogs, one can more easily point after-the-fact to which visual features were isolated and made important by the classifier. For Go, however, it seems there has not emerged a way to explain the new program’s decisions beyond the top-level mixing of criteria shown by Figure 5 of the AlphaGo paper. Absent a human-intelligible explanation, the winning moves appear as if by magic.
James Massey, known among other things for the shift-register synthesis algorithm with Elwyn Berlekamp, once recorded a much-viewed lecture titled,
“Cryptography: Science or Magic?”
What Massey meant by "magic" is different from Clarke. In virtually all cases, modern cryptographic primitives rely on the intractability of problems that belong to the realm of NP, or at worst NP ∩ co-NP. The widespread implementation of public-key cryptography and digital signatures via RSA relies on factoring, which belongs to NP ∩ co-NP. Even if we knew P ≠ NP, we still wouldn't have a proof of the security needed to establish that the world really confers the capabilities promised by these primitives.
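To make the reliance on factoring concrete, here is a toy RSA round trip with tiny textbook primes (purely illustrative and insecure; real moduli are thousands of bits, with padding):

```python
# Toy RSA with classic textbook primes (illustrative only).
p, q = 61, 53              # the secret primes
n = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent: e * d = 1 (mod phi)

m = 65                     # a message, encoded as a number < n
c = pow(m, e, n)           # encrypt with the public key (n, e)
assert pow(c, d, n) == m   # decrypt with the private key d

# Anyone who factors n recovers p and q, hence phi and d: the whole
# secret rests on the assumed hardness of factoring.
```

That this works at all is the "not so hard to believe" part; that the encryption key can be public is the first surprise the post goes on to discuss.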
As above the issue is lack of ability to verify a property, but this time of an algorithmic protocol rather than a single solution or chess or Go move. This “meta” level goes to the heart of our field’s aspiration to be a science. Of course we can prove theorems conditioned on hardness assumptions, but apart from the issue we have covered quite a few times before of loopholes between security proofs and application needs, we perceive a more fundamental lack:
The theorems don’t (yet) explain from first principles what it is about the world that yields these properties.
Modern cryptography is filled with papers that really say the following: if certain simple-to-state complexity hardness assumptions are correct, then we can do the following "magical" operations. This is viewed by those who work in cryptography, and by those of us outside the area, as a wonderful success story. That the hardness of factoring implies the ability to encrypt messages is not so hard to believe, but that we can make the encryption asymmetric—a public-key system—was at first a real surprise. That this assumption further allows us to do some very powerful and wonderful operations is even more surprising.
Massey gives as example the technique developed by Amos Fiat and Adi Shamir to convert any interactive proof of knowledge into one that carries the signature of the person conveying proof of knowledge without needing that person’s active participation. It preserves the feature that only the person’s fact of knowing is transmitted, not the known data itself. This is truly wondrous: you can interact offline.
We can trace the composition of hardness assumptions, plus the assumption of implementing a suitable random oracle, in a conditional theorem. Since we cannot prove any of the conditions, Massey asks, is this all an illusion? How can we tell it’s not? For us this raises something further:
If a capability seems too wondrous to believe, can that be grounds for doubting one or more of the assumptions that underlie it?
In short, the magical success of crypto could be misleading. Could the assumptions it rests on go beyond magic into fantasy? Could factoring be easy, and modern crypto therefore wrong? Could the whole edifice collapse? Who knows?
The three issues we have highlighted can be summarized as:
Can we distinguish the parts of theory that underlie these problems from “magic”? Must we bewail this state of our field, or is it an inevitable harbinger that complexity, learning, and crypto are becoming “sufficiently advanced”?
A non-announcement announcement
Crop from Farkas Prize src
Michel Goemans is the chair of this year’s ACM/IEEE Knuth Prize committee. He teaches at MIT and among many wonderful achievements co-won the 2000 MOS/AMS Fulkerson Prize with David Williamson for their great work on approximation for MAX CUT and MAX SAT and other optimization problems.
A few days ago he emailed me to ask if Ken and I would announce this year’s call for nominations.
Of course, as Michel wrote in his email to me, he realizes that we really do not usually make announcements. And indeed he is correct. So Ken and I are confronted with a dilemma. We want to help, but our main purpose at GLL is to present time-independent posts that balance history and technical details. Besides, if we start doing announcements like this, then year-over-year it would become like Groundhog Day. Our problem is:
How do we make the announcement and still follow our usual form?
One idea we had was to look at the history of prizes. Prizes in science and math have been around for many, many years. There are two types of prizes—well, there are many types, but there are two extremes. One is called an ex ante prize and the other an ex post prize. An ex ante prize is an attempt to use money to direct research. They are of the form:
If you can solve X, then you will win the following amount of money.
An early example was the Longitude Prize, which was based on the reality that knowledge of latitude is easy to refresh by looking at the sun and stars, but using them for longitude requires accurately knowing the local time.
In 1714, the British government offered a very large amount of money for a method of determining a ship's longitude. The top prize was 20 thousand British pounds—an immense amount of money at the time. John Harrison solved the problem by creating a clock—amazingly compact, like a watch—that kept accurate time even on a ship at sea, in calm seas, rough seas, hot temperatures, or cold; he was finally awarded the top prize in 1773. A great read about the prize, its solution, and the controversy in paying Harrison is Longitude: The True Story of a Lone Genius Who Solved the Greatest Scientific Problem of His Time, by Dava Sobel.
A math example of an ex ante prize was the Wolfskehl Prize. Paul Wolfskehl created the prize, named for him, for the first person to present a valid proof of Fermat's Last Theorem. Of course, Andrew Wiles won the prize in 1997.
Some have stated that such prizes are not very useful. They often increase the visibility of a problem, but attract amateurs who may or may not really understand it. The more recent Clay Millennium Prizes are ex ante, and it is unclear whether they had any effect at all on the solution of the first of the prize problems to be solved: the Poincaré Conjecture. Recall that the solver, Grigori Perelman, refused the prize money of $1,000,000.
Nobel Prizes are ex post prizes as are our own Turing Awards—okay they are called “awards” but that is a minor point. The Knuth Prize is definitely an ex post prize. It is given for not one achievement, but rather for a lifetime of work in the area of theory of computing. The call states:
The Prize is awarded for major research accomplishments and contributions to the foundations of computer science over an extended period of time.
I did win the prize two years ago, in 2014. I was thrilled, honored, and surprised: there are so many great theorists that I was humbled to receive it. Clearly, now is the time to try to put together a strong case for your favorite colleague. Only one can win, but no one can win without a serious case put together. So good luck to all. The committee consists of Allan Borodin, Uri Feige, Michel Goemans, Johan Håstad, Satish Rao, and Shang-Hua Teng. Here is the information on making a nomination. Nominations are due on Mar 31, 2016.
Laci Babai won it last year. Of course this was for his past work and its great innovations and deep executions, including a share of the first ever Gödel Prize. But we may have to credit the 2015 committee with prescience given Laci’s achievement with Graph Isomorphism (GI). Likewise, planning for a Schloss Dagstuhl workshop on GI began in 2014 without knowing how felicitous the December 13–18, 2015 dates would be. Perhaps there is a stimulating effect that almost amounts to a third category of prizes—as per a tongue-in-cheek time-inverting post we wrote a year ago.
Who will win this year? Of course the winner has to be nominated first—unless it’s the MVP of the NHL All-Star Game. There are many terrific candidates and I am glad not to be on the committee that has to decide. One prediction is that whoever wins will be extremely elated and honored.
[spellfix in subtitle]
Ernst Kummer was a German mathematician active in the middle of the 1800s. He is most famous for his beautiful work on an approach designed to solve Fermat's Last Theorem (FLT).
Today we will talk about a barrier that stopped his approach from succeeding.
I am currently teaching basic discrete mathematics at Tech. The first topic is elementary number theory, where we have covered some of the key properties of primes. Of course, we have covered the Fundamental Theorem of Arithmetic (FTA). Recall it says:
Theorem: Every natural number greater than 1 is the product of primes, and moreover this decomposition is unique up to ordering of the primes.
This is a classic theorem, a theorem that is fundamental to almost all of number theory. It was implicit in Euclid’s famous work, and was perhaps stated and proved precisely first by Carl Gauss in his famous Disquisitiones Arithmeticae.
Alas, as Kummer discovered, the FTA fails for other sets of numbers. For such number systems, half of the theorem still holds: every number that is not a generalization of 1—a unit—can still be written as a product of "primes." But uniqueness is no longer true.
In 1847, Gabriel Lamé claimed, in a talk to the Paris Academy, that he had solved FLT by using complex numbers—in particular, numbers generated by the p-th roots of unity. Recall those are the complex numbers that are solutions to the equation x^p = 1.
Clearly x = 1 is always a solution, but for primes p there are p − 1 other solutions: p in total. Lamé's argument was perfect, rather easy, used standard arguments, and was incorrect. The great mathematician Joseph Liouville, at the talk, questioned whether Lamé's assumption about unique factorization was justified; and if not, it followed that the proof was not valid. See here for a fuller discussion of this.
Liouville's intuition was right. The FTA failed for Lamé's numbers. Weeks after Lamé's talk, it was discovered that Kummer had three years earlier shown that FTA indeed held for some primes p but failed for p = 23. One might guess that the reason Lamé, and others, thought that factorization was unique for these numbers is that the first counterexample is all the way out at p = 23. Kummer famously showed that FTA could be replaced by a weaker and very useful statement, which allowed his methods to prove FLT in many cases. But not all.
The failure of FTA for what is now called cyclotomic integers is a well studied and important part of number theory. It is well beyond my introductory class in discrete mathematics. This failure has led to the discovery of related failures of uniqueness. One of the classic examples is the Hilbert Numbers—named after David Hilbert—of course.
Hilbert numbers are the set of natural numbers of the form 4n + 1. Thus they start: 1, 5, 9, 13, 17, 21, 25, 29, 33, …
Every Hilbert number greater than 1 is the product of "Hilbert primes." But note that a number can be prime now without being a real prime, that is, a prime over all the natural numbers. Note that 21 is a Hilbert prime: it cannot be factored as 3 × 7 within the Hilbert numbers, since 3 is not a Hilbert number.
The key observation is that some Hilbert numbers can be factored in more than one way: for example, 441 = 21 × 21 = 9 × 49, and 9, 21, and 49 are all Hilbert primes.
Thus FTA fails for Hilbert numbers.
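A small script can confirm the failure (the helper name is ours):

```python
def is_hilbert_prime(n):
    """True if n is a Hilbert number (4k + 1) that has no factorization
    into two smaller Hilbert numbers, both > 1."""
    if n <= 1 or n % 4 != 1:
        return False
    # A proper Hilbert divisor d of n pairs with n // d, which is again
    # a Hilbert number, so searching d <= sqrt(n) suffices; candidate
    # divisors are 5, 9, 13, ... (all 1 mod 4).
    for d in range(5, int(n ** 0.5) + 1, 4):
        if n % d == 0:
            return False
    return True

# 441 factors in two genuinely different ways into Hilbert primes:
assert 441 == 21 * 21 == 9 * 49
assert all(is_hilbert_prime(x) for x in (21, 9, 49))
```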
Another popular example of the failure of FTA uses square roots, such as the numbers built from √−5. Note the numbers then are all those of the form a + b√−5, where a and b are integers. Here 6 = 2 × 3 = (1 + √−5) × (1 − √−5) gives two essentially different factorizations. Many examples can be created in this way by changing −5 to other integers d. It is a major industry trying to understand which d yield a set of numbers a + b√d that satisfy the FTA.
Let us consider an extremely complex subset of the natural numbers: E = {2, 4, 6, 8, …}.
Okay, E is just the set of all even numbers. This set is closed under addition and multiplication, and we claim that it has the following nice properties: its primes are easy to characterize; every number in E is a product of these primes; and yet FTA fails for E.
Let's look at each of these in turn. A prime in E is a number of the form 2m where m is an odd natural number. The usual argument shows that every number in E is a product of primes. For example: 60 = 2 × 30.
Note, this process now stops, since 30 is a prime in E (as is 2). The last part, showing that FTA fails in this setting, is easy: let p and q be distinct odd primes. Then 4pq = (2p) × (2q) = 2 × (2pq).
Note, the factorizations are different. For example, with p = 3 and q = 5: 60 = 6 × 10 = 2 × 30.
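The same check in code (the helper name is ours):

```python
def is_E_prime(n):
    """A prime of E = {2, 4, 6, ...}: an even number not divisible by 4.

    Any product of two members of E is divisible by 4, so the numbers
    of the form 2 * (odd) cannot be factored inside E.
    """
    return n % 2 == 0 and (n // 2) % 2 == 1

# With p = 3 and q = 5, the two factorizations of 4pq = 60 into E-primes:
assert 60 == 6 * 10 == 2 * 30
assert all(is_E_prime(x) for x in (6, 10, 2, 30))
```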
Does the simple example of even numbers help in understanding why FTA is special? Where was it first stated in the literature that even numbers fail to satisfy unique factorization? We cannot find it after a quick search of the web: all examples we can find are either the Hilbert numbers, some examples using square roots, or something even more complex.
[fixed some formatting issues]
Cropped from BBC feature on AI
Marvin Minsky, sad to relate, passed away last Sunday. He was one of the great leaders of artificial intelligence (AI), and his early work helped shape the field for decades.
Today Ken and I remember him also as a theorist.
Like many early experts in computing, Minsky started out as a mathematician. He majored in math at Harvard and then earned a PhD at Princeton under Albert Tucker for a thesis titled, “Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain Model Problem.” This shows his interest in AI with focus on machine learning and neural nets from the get-go. However, he also made foundational contributions to computability and complexity and one of his first students, Manuel Blum who finished in 1964, laid down his famous axioms which hold for time, space, and general complexity measures.
We will present two beautiful results of Minsky's that prove our point. AI may have lost a founder, but complexity theory has lost one too. His brilliance will be missed by all of computing, not just AI.
In 1967 Minsky published his book Computation: Finite and Infinite Machines. This book was really not an AI book, but rather a book on the power of various types of abstract machines.
The book includes a beautiful result which we’ll call the two-counter theorem:
Theorem 1 A two-counter machine is universal, and hence has an undecidable halting problem.
A two-counter machine has two counters—not surprising. It can test each counter to see if it is zero or not, and can add 1 to a counter, or subtract 1 from a counter. The finite control of the machine is deterministic. The import of the two-counter theorem is that only two counters are needed to build a universal computational device, provided inputs are suitably encoded. It is easy to prove that a small number of counters suffice to simulate Turing machines. But the beauty of this theorem is getting the number down to two.
To recall some formal language theory, one counter is not sufficient to build a universal machine. This follows because a counter is a restricted kind of pushdown store: it has a unary alphabet plus ability to detect when the store is empty. Single-pushdown machines have a decidable halting problem. It was long known that two pushdown stores can simulate one Turing tape and hence are universal: one holds the characters to the left of the tape head and the other holds the characters to the right. What is surprising, useful, and nontrivial is that two counters are equally powerful.
I will not go into the details of how Minsky proved his theorem, but will give a hint. The main trick is to use a clever representation of sets of numbers that can fit into a single number. Perhaps Minsky used his AI thinking here. I mention this since one of the key ideas in AI is how we represent information. In this case he represents, say, a vector (a, b, c) of natural numbers by the single number 2^a · 3^b · 5^c.
This is stored in one counter. The other is used to test whether, say, a = 0, or to replace a by a + 1. Making this work is a nice exercise in programming; I covered this in a post back in March 2009.
We must add personal notes here. I used this theorem in one of my early results, way back when I started doing research. Ken was also inspired by the connection from counter machines to algebraic geometry via polynomial ideal theory that was developed by Ernst Mayr and Albert Meyer at MIT. The basic idea is that if we have, say, an instruction (q, x−−, y++, r) that decrements counter x and increments counter y while going from state q to state r, then that is like the equation qx = ry, represented by the polynomial qx − ry. If the start and final states are s and f, then the counter machine accepts the null input if and only if the polynomial s − f belongs to the ideal generated by the instructions.
In 1969 Minsky and Seymour Papert published their book Perceptrons: An Introduction to Computational Geometry. It was later republished in 1987.
The book made a huge impression on me personally, but I completely missed its importance. Also the book is notorious for having created a storm of controversy among the AI community. Perhaps everyone missed what the book really was about. Let’s take a look at what it was about and why it was misunderstood.
For starters the book was unusual. While it is a math book, with definitions and theorems, it looks different from any other math book. The results are clear, but they are not always stated in a precise manner. For example, Jan Mycielski's review in the Jan. 1972 AMS Bulletin upgrades the statement of the main positive result about learning by perceptrons. Inputs are given only in the form of square arrays. The grand effect of this—combined with the book's hand-drawn diagrams—is a bit misleading. It makes the book not seem like a math book; it makes it very readable and fun, but somehow hides the beautiful ideas that are there. At least it did that for me when I first read it.
It is interesting that even Minsky said the book was misunderstood. As quoted in a history of the controversy over it, he said:
“It would seem that Perceptrons has much the same role as [H.P. Lovecraft’s] The Necronomicon—that is, often cited but never read.”
With hindsight knowledge of how complexity theory developed after the excitement over circuit lower bounds in the 1980s gave way to barriers in the 1990s, Ken and I propose a simple explanation for why the book was misunderstood:
It gave the first strong lower bounds for a hefty class of Boolean circuits, over a decade ahead of its time.
How hefty was shown by Richard Beigel, the late Nick Reingold, and Dan Spielman (BRS) in their 1991 paper, “The Perceptron Strikes Back”:
Every family of depth-d, size-s AC^0 circuits can be simulated by probabilistic perceptrons of size quasipolynomial in s.
That is, AC^0 has quasipolynomial-size perceptrons. The key link is provided by families of low-degree polynomials of a kind that were earlier instrumental in Seinosuke Toda's celebrated theorem putting the polynomial hierarchy inside unbounded-error probabilistic polynomial time. In consequence of the strong 1980s lower bounds on constant-depth circuits computing parity, BRS observed that subexponential-size perceptrons cannot compute parity either.
Minsky and Papert’s version of that last fact is what those Perceptrons editions symbolize on their covers. By the Jordan curve theorem, every simple closed curve divides the plane into “odd” and “even” regions. A perceptron cannot tell whether two arbitrary points are in the same region. This is so even though one need only count the parity of the number of crossings of a generic line between the points. Minsky and Papert went on to deduce that connectedness of a graphically-drawn region cannot be recognized by perceptrons either. Of course we know that connectedness is hard for parity under simple reductions and is in fact complete in deterministic logspace.
How close were Minsky and Papert to knowing they had these lower bounds and more besides? Papert subsequently wrote a book with Robert McNaughton, Counter-Free Automata. This 1971 book drew connections to algebraic and logical characterizations of formal languages, but did not achieve the power and clarity of descriptive complexity, which flowered in the 1980s.
The other key concept they lacked was approximation. This could have been available via their polynomial analysis but would have needed the extra mathematical sophistication employed by BRS: probabilistic polynomials and approximation by polynomials. Perhaps they could have succeeded by noting connections between approximation and randomized algorithms that were employed a decade later by the likes of Andy Yao. Of course, all of us who worked in the 1970s had those opportunities.
The misunderstanding of their book reminds Ken of some attempts he's seen to prove that NC, the hierarchy of polylog circuit depth (and polynomial size), is properly contained in polynomial time. Take a function f which you can prove is not computed by your favorite family of (quasi-)polynomial size circuits of depth 1—or depth "1-plus" as perceptrons are. We may suppose f(x) always has the same length as x. Now define the function g on any x of length n to be the n-fold composition f(f(⋯f(x)⋯))—or even the vector of k-fold compositions for k = 1 to n. Then g still belongs to P. You might expect to prove that computing g on inputs of length n requires n levels of your circuits, thus placing g outside NC and establishing NC ≠ P.
But this conclusion about g does not follow ipso facto. The inability of one layer of perceptrons to compute simple functions does not extend to multiple layers—and for some length-preserving offshoots of parity, the composition is essentially the same function, no more complex. Of course we understand this now about layers in circuit complexity. Getting snagged on this point—without sharp mathematical guidelines in the book—is what strikes us as coming out even in Wikipedia's discussion of perceptrons:
[The lower bounds] led to the field of neural network research stagnating for many years, before it was recognized that a feedforward neural network with two or more layers (also called a multilayer perceptron) had far greater processing power than perceptrons with one layer. … It is often believed that [Minsky and Papert] also conjectured (incorrectly) that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons were capable of producing an XOR function.
One can judge our point further relative to this in-depth paper on the controversy.
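To make the one-layer versus two-layer point concrete, here is a minimal sketch (ours, not from the book) showing by brute force that no single threshold gate computes XOR, the 2-bit parity, while two layers do:

```python
from itertools import product

def gate(weights, theta):
    """A single linear threshold gate."""
    return lambda *x: int(sum(w * xi for w, xi in zip(weights, x)) >= theta)

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# One layer fails: no small-integer-weight threshold gate computes XOR
# (indeed no real weights do, since XOR is not linearly separable).
assert not any(
    all(gate((w1, w2), t)(a, b) == v for (a, b), v in XOR.items())
    for w1, w2, t in product(range(-3, 4), repeat=3)
)

# Two layers succeed: XOR(a, b) = OR(a, b) AND NOT AND(a, b).
g_or  = gate((1, 1), 1)
g_and = gate((1, 1), 2)
g_out = gate((1, -1), 1)   # fires exactly when OR = 1 and AND = 0
assert all(g_out(g_or(a, b), g_and(a, b)) == v for (a, b), v in XOR.items())
```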
None of this should dim our appreciation of Minsky as a theorist. We offer a simple way to put this positively. The two-pronged frontier of circuit complexity lower bounds is represented by counting modulo composite numbers and by small-depth threshold circuits. We haven't cracked counting modulo 6 any more than counting modulo any composite in general. Nor do we have strong lower bounds against threshold circuits of depth two or three—while in the arithmetical case at least, high exponential bounds on depth 3 broadly suffice.
Well, a perceptron is a one-layer threshold circuit with some extra AND/OR gates, and modular counting is the first nub that Minsky and Papert identified. They knew they needed some extra layers, understood that connectedness on the whole does not "simply" reduce to parity, and probably flew over the import of having an extra prime factor in the modulus, but the point is that they landed right around those frontiers—in 1969. They were after broader scientific exploration than "P versus NP," which hadn't yet become a continent on the map. This makes us wonder what kind of deeper reflection than worst-case complexity might land us closer to where solutions to the lower-bound impasse lie.
Our condolences to his family and the many people who worked with him and knew him best.
[word changes toward end: “probabilistic analysis” -> “polynomial analysis”; “connectivity” –> “connectedness”]
Crop from Seneca chess quote source
Lucius Seneca was a Roman playwright, philosopher, and statesman of the first century. He is called "the Younger" because his father Marcus Seneca was also a famous writer. His elder brother Lucius Gallio appears as a judge in the Book of Acts. Besides many quotations from his works, Seneca is famous for one he is not known to have said:
“To err is human.”
Lest we cluck at human error in pinning down ancient quotations, the source for the following updated version is also unknown—even with our legions of computers by which to track it:
“To err is human, but to really screw things up requires a computer.”
Today I report a phenomenon about human error that is magnified by today’s computers’ deeper search, and that I believe arises from their interaction with complexity properties of chess.
I have previously reported some phenomena that my student Tamal Biswas and I believe owe primarily to human psychology. This one I believe is different—but I isolated it in full only a week ago so who knows. It is that the proportion of large errors by human players in positions where computers judge them to be a tiny fraction of a pawn ahead is under half the rate in positions where the player is judged ever so slightly behind.
The full version of what Seneca actually wrote—or perhaps didn’t write—is even more interesting in the original Latin:
Errare humanum est, perseverare autem diabolicum, et tertia non datur.
This means: “To err is human; to persevere in error is of the devil; and no third possibility is granted.” The phrase tertium non datur is used for the Law of Excluded Middle in logic. In logic the law says that either Seneca wrote the line or he didn’t, with no third possibility. We will say more about this law in an upcoming post. Amid disputes about whether human behavior and its measurement follow primarily “rational” or “psychological” lines, we open a third possibility: “complexitarian.”
Here are some important things to know about chess and computer chess programs (called “engines”).
One upshot is that depth of cogitation is solidly quantifiable in the chess setting. We have previously posted about our papers giving evidence of its connection to human thinking and error. The new phenomenon leans on this connection but we will argue that it has a different explanation.
My training sets include all recorded games in the years 2010–2014 between players rated within 10 points of the same century or half-century milepost of the Elo rating system. They range all the way from Elo 1050 to Elo 2800+, that is, from beginning adult tournament-level players to the human world champion and his closest challengers. This “milepost set” has over 1.15 million positions from 18,702 games, counting occurrences of the same position in different games but skipping the first 8 turns for both players. Each position was analyzed to depth at least 19 by both the newly-released version 7 of the number-two ranked Stockfish and last month’s version 9.3 of Komodo, using a multitude of single threads (for reproducibility) on the supercomputing cluster of the University at Buffalo Center for Computational Research.
The value of a position is the same as the value of the best move(s) in the position. In multi-line mode we can get the value of the played move directly, while in single-line mode we can if needed take the value of the next position. A value of +1.50 or more is commonly labeled a “decisive advantage” by chess software (though there are many exceptions), whereas values between -0.30 and +0.30 are considered grades of equality—at least by some chess software. Accordingly let’s call any move that drops the value by 1.50 or more a blunder. We will tabulate ranges in -0.40…+0.40 where a blunder most matters. Value exactly 0.00 gets a range to itself.
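The bucketing just described is easy to sketch in code. The bin width, the -0.40…+0.40 range, the special bin for 0.00, and the 1.50-pawn blunder threshold follow the text; the function names and the integer centipawn representation are my own illustrative choices:

```python
# Illustrative sketch of the value binning and blunder test described above.
# Values are in centipawns (integers), so +0.05 pawns is cp = 5.

def value_bin(cp, width=10, lo=-40, hi=40):
    """Label the bin for a centipawn value; 0.00 gets a bin of its own."""
    if cp == 0:
        return "0.00"
    if cp < lo or cp > hi:
        return "outside"
    edge = (cp // width) * width      # floor division handles negatives
    return f"{edge/100:+.2f}..{(edge + width)/100:+.2f}"

def is_blunder(cp_before, cp_move, threshold=150):
    """A move that drops the position's value by 1.50 pawns or more."""
    return (cp_before - cp_move) >= threshold
```

A position worth +0.05 lands in the “+0.00..+0.10” bin, one worth -0.05 in “-0.10..+0.00”, so a tiny advantage and a tiny disadvantage are tabulated separately, which is exactly what exposes the effect discussed below.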
The final wrinkle is that we will use the engines’ highest-depth values to group the positions, but distinguish cases where the human player’s move was regarded as a blunder at depth 5, 10, 15, and 20. The last includes some depth-19 values. Doing so distinguishes immediate blunders like hanging your queen from subtle mistakes that require a lot of depth to expose. The data divides fairly neatly into thirds, one for “amateur” players under 2000 (387,024 positions from 6,187 games), then the 2000–2250 range (394,186 from 6,380), then 2300 and above (372,587 from 6,035). The depth-5 numbers for Stockfish 7 are “off” for reasons I demonstrated in a chess forum post; responses seem to confirm a recent bug whose impact lessens for depths 9 and higher. (See Update below.) The tables are for single-line mode (the multi-line mode results are similar including the Stockfish anomaly), and positions are analyzed sans game history so that only repetitions downstream in the search are detected.
The point is not that weaker players make more large mistakes, but rather the “Red Sea” effect at 0.00. It becomes more pronounced at the greatest depths and for the strongest players. Here are the figures for players rated 2600 and above, from 90,045 positions in 1,335 games. They are tabled in smaller intervals of 0.10 within the “equality” range -0.30 to +0.30.
Although Stockfish and Komodo have differences in their evaluation scales—happily less pronounced than they were 1 and 2 years ago—they agree that the world’s elite made six times more large errors when on the lower side of equality. This raises the question not only of why, but of whether it is a real human phenomenon.
There seems no possible rational explanation—indeed the sharp change at 0.00 counters the rational hypothesis discussed in this paper and toward the end of my 2014 TEDxBuffalo talk. But it is also hard to sustain a psychological explanation. If being slightly behind puts one off one’s game, why is the ratio accentuated for the top players? That this training set includes only games in which the players are evenly matched minimizes any variance from overconfidence or “fear factors.”
My proposed explanation leans on the above-mentioned engine behavior with repetitions and 0.00 values. Spot-checks affirm that the wide 0.00 bin includes many positions where the armies trade blows but fall into a repeating sequence from which neither can veer without cost. If the machine judges your position worth +0.01 or more then it places you discretely above this no-man’s zone. Any notionally big slip you make still has a good chance of being caught in this large attraction basin and hence being charged as only a small error by the computer. Whereas a slip from 0.00 or below has no safety net and falls right down.
This hypothesis comes with one large falsifiable prediction and perhaps a few others regarding games played by computers and computer-human teams as opposed to humans alone. My entire model uses no feature of chess apart from the move values supplied by engines at progressively increasing depths of search. It thus transfers to any alternating-decision game in which sufficiently authoritative move values can be computed. Some games like Go rule out repeating sequences, while others like Shogi allow them but rarely end in draws.
For any strategy game that disallows repetitions and/or has negligible draw frequency, there will be no similar “firewall at 0.00” effect on the rate of human mistakes.
Go and Shogi are just as hard as chess, so this hypothesis does not pertain to worst-case complexity. Rather it addresses heuristically complex “most-case” behavior. It also highlights a general risk factor when using computers as judges: notional human errors are necessarily being evaluated by machines but the machines are evidently not doing so with human-relevant accuracy.
This last issue is not theoretical or didactic for me. In applying my statistical model to allegations of cheating I need to project distributions of errors by unaided human players of all skill levels accurately. I have been aware of ramped-up error below and away from 0.00 since 2008, when I devised a pleasingly symmetric way to smooth it out: Take a differential dμ(x) that has value 1 at x = 0.00 but tapers off to 0 symmetrically. This says that the marginal value of a centipawn is the same whether you are ahead or behind the same amount but lessens when the (dis)advantage is large. When a player makes an error of raw magnitude e in a position of value v, charge not e but rather the integral of dμ(x) from x = v – e to x = v. The metric can be weighted to make 0.00 effectively as wide as neighboring intervals.
The idea is that if you blunder when, say, 0.50 ahead, the integral will go through the fattest part of the metric and so charge near face value. But when 0.50 behind it will go through the thinner region -0.50 to -0.50 – e and so record a lesser value for my analyzer. This “metric correction” balanced and sharpened my system in ways I could verify by trials such as those I described on this blog in 2011 here.
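The correction can be sketched numerically. The taper shape is my own illustrative choice, a Cauchy-style density with value 1 at 0.00 tapering symmetrically; the post does not give the actual calibrated taper or scale constant:

```python
# Hedged sketch of the "metric correction": charge the integral of a taper
# d-mu(x) over [v - e, v] instead of the raw error e. The taper 1/(1+x^2)
# and scale c are illustrative assumptions, not the author's calibration.
import math

def taper(x, c=1.0):
    """Symmetric weight: 1 at x = 0, falling off as |x| grows."""
    return 1.0 / (1.0 + (x / c) ** 2)

def adjusted_error(v, e, c=1.0, steps=1000):
    """Integrate the taper from v - e to v by the trapezoidal rule."""
    lo, hi = v - e, v
    h = (hi - lo) / steps
    total = 0.5 * (taper(lo, c) + taper(hi, c))
    total += sum(taper(lo + i * h, c) for i in range(1, steps))
    return total * h
```

With these choices, an 0.50 error made while 0.50 ahead is charged nearly at face value, while the same raw error made while 0.50 behind integrates through the thin tail and is charged noticeably less, matching the behavior described in the text.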
Privately I’ve indulged an analogy to how relativity “corrects” the raw figures from Newtonian physics. But with today’s chess programs cannonballing to greater depths, I can tell when carrying over my previous calibrations from the earlier Rybka and Houdini engines that this method is being strained. Having now many times the amount of data I previously took on commodity PCs has made the quantum jump at 0.00 so clear as to need special attention. Making an ad hoc change in my model’s coefficients for positions of non-positive value still feels theoretically uncomfortable, but this may be what the data is dictating.
How can you explain the phenomenon? How should regard for phenomenology—in both experimental and philosophical senses—influence the manner of handling it?
Update 2/6/16: I received a suggestion to modify one line of the Stockfish 7 code to avoid shallow-depth “move count based pruning” at nodes belonging to the current principal variation. This inserts a third conjunct && !PvNode into what is currently the second block of “step 13” in the Stockfish source file search.cpp. Re-running the data for the Elo 2600–2800 levels produced the following table:
Evidently the change greatly reduces the depth-5 “noise” but still leaves it distinguishably higher than the depth-5 figures for Komodo 9.3.
Cropped from source (Garrett Coakley) |
Euclid is, of course, the Greek mathematician, who is often referred to as the “Father of Geometry.”
Today Ken and I want to talk about an “error” that appears in the famous volumes written by Euclid a few years ago—about 2300 years ago.
The “error” is his use of the word ‘random’ when by modern standards he should have said arbitrary. I find this surprising because I think of random as a modern concept; and I find it doubly surprising because the two notions are not in general equivalent.
It seems clear that Euclid said ‘random.’ The root-word he used, tuchaios, endures as the principal word for “random” in modern Greek and is different from words meaning “arbitrary” or “generic” or “haphazard” or even “stochastic.” The only meaning of tuchaios or Euclid’s exact phrase hos etuchen we’ve found that would make his statement remain strictly correct is, “it is unimportant which.” However, the way hos etuchen was put in the voice of Pope Clement I seems not to square with that meaning either.
I never studied the Elements, Euclid’s famous collection of thirteen books on geometry. Not that long ago, many schools used the Elements as the textbook for the introduction of mathematics. Abraham Lincoln is said to have studied the Elements until he could recite it perfectly. I never looked at any part of it until I came across Book II while doing some research. And I was quite surprised to see the use of the notion of “random” there in the text.
Book II is focused on a geometric approach to identities, which are much easier to understand as algebraic identities today. The proposition that caught my eye is Proposition II.4.
Proposition 1 If a straight line be cut at random, the square on the whole is equal to the squares on the segments and twice the rectangle contained by the segments.
Perhaps this is easy to understand as a geometric statement. Today we would write algebraically that it stands for the identity $(a+b)^2 = a^2 + 2ab + b^2$.
Why did Euclid state it geometrically? Perhaps the main advantage was that it allowed him to reason directly about geometric objects. After all he was writing about geometry, so a square with sides of length $a+b$ stood in nicely for the term $(a+b)^2$. An even better reason might have been his lack of modern algebraic notation—the equals symbol was invented by Robert Recorde a few years after Euclid, in 1557.
Here is the proof as it appears in the Elements. Its length compared to simply expanding $(a+b)^2$ shows the power of modern notation:
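For contrast with Euclid’s page-long geometric argument, the modern expansion is a single line:

```latex
(a+b)^2 \;=\; (a+b)\,a + (a+b)\,b \;=\; a^2 + ab + ab + b^2 \;=\; a^2 + 2ab + b^2 .
```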
What surprised me was the exact statement of Proposition II.4. Note that it starts,
If a straight line be cut at random…
In online Greek editions such as this the phrase hos etuchen meaning “at random” is set off with commas. Euclid reiterates this phrase in the first line of his proof. However, the result is actually true for any cut of the line, which is more than saying “at random.” So why does Euclid say “random”?
Euclid seems nowhere to define in any precise way what “random” means in this context. Recall that one of the great achievements of the Elements was its claim to be a precise and axiomatic approach to geometry. But using an undefined term like “random” seems to run overtly counter to that goal.
Looking for other usages doesn’t clearly let Euclid — or his main ancient editor, Theon of Alexandria — off the hook. Pope Clement I was St. Peter’s first, second, or third successor. A novelization of his acts and homilies has him using the same Greek phrase at the beginning of Book 1, chapter IV:
Our Peter has strictly and becomingly charged us concerning the establishing of the truth, that we should not communicate the books of his preachings, which have been sent to us, to any one at random, but to one who is good and religious, and who wishes to teach…
This aligns with the modern meaning: the writer was saying that most people would be unqualified to preach Peter’s sermons. It does not mean that all people would be bad or that it is unimportant who receives the books. There is support in other ancient examples for the reading, “to anyone you happen to meet,” but even then the inference stays one of “mostness” not “all.” In any event, Euclid’s proposition is correct with “all”—even if the line is “cut” at one of the endpoints.
The difference is not a quibble. It is easy to make statements in Euclidean geometry that are true for “random” but not for “all”: A random triple of points in the plane makes a triangle. A random line through a point outside a circle is not tangent to the circle.
However, there are also cases where holding for “random” is sufficient for holding for “all.” Equalities like Euclid’s have that property. So does the Schwartz-Zippel lemma: if $p$ is a polynomial and $p(r) = 0$ for a random $r$ over a large enough field, then with high probability $p$ is the zero polynomial. In fact Euclid’s identity is a case of this—could we add Euclid as sharing credit for the lemma?
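The lemma gives the standard randomized identity test. This sketch (the names and the black-box treatment of polynomials are mine) evaluates both sides at random points modulo a large prime; disagreement anywhere is a definite “no,” while agreement at random points is strong evidence of identity:

```python
# Schwartz-Zippel style identity testing: a nonzero polynomial of low degree
# rarely vanishes at a random point of a large field.
import random

def probably_equal(p, q, num_vars, trials=20, field=10**9 + 7):
    """Test whether callables p and q agree as polynomials (one-sided error)."""
    for _ in range(trials):
        xs = [random.randrange(field) for _ in range(num_vars)]
        if p(*xs) % field != q(*xs) % field:
            return False   # a witness: definitely different polynomials
    return True            # identical with high probability

# Euclid II.4 as an instance of the lemma:
lhs = lambda a, b: (a + b) ** 2
rhs = lambda a, b: a * a + 2 * a * b + b * b
```

A “random” point certifies an equality for “all” points, which is exactly the direction of inference the post is pondering.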
This led Ken and me to think about a problem: Can we make something out of ‘Euclidean’ randomness?
There is a third concept lurking here: generic. Generic usually implies random but is more special and does not require probability or (Lebesgue) measure. In fact it basically means “not special.” Three collinear points are special; a line tangent to a circle is special.
The exact notion of “generic” is context-dependent, but at the interface of geometry and algebra we can pin it down: a set of elementary objects (points, lines, etc.) is special if it satisfies some finite set of simple arithmetical equations and is generic otherwise. Collinear points and tangent lines are clearly special in this sense. More formally, the special sets are those closed in the Zariski topology, apart from the whole (Euclidean) space. So now we ask:
Could Euclid have been in any sense aware of the idea of genericity?
If so, then Euclid could have been led into deep waters. Consider just a line segment going from 0 to 2. The midpoint 1 is special because it satisfies the equation $2x = 2$. Similarly so are the points $\frac{1}{2}$ and $\frac{3}{2}$. It quickly follows that all rational numbers are special. Now so is $\sqrt{2}$ since it satisfies $x^2 = 2$. And likewise $\sqrt[3]{2}$, so all points for algebraic numbers are special too. Euclid would certainly have suspected that the cubic and higher points might not be constructible. So although he might have suspected that a “random” point was not constructible, he would have had a hard time realizing that the non-constructible points include the special subset of algebraic ones.
Of course, it took until the 1600s to articulate modern meanings of “random” and until the 1800s for Georg Cantor and the topological notions underlying genericity to arise. It still interests us what “hints” might have been perceived in the intervening centuries.
Alfred Tarski, the famous 20th-century logician, created formal axioms for geometry. His axioms modeled that part of geometry that is called “elementary.” This includes statements of plane geometry that can be stated in first-order logic and only refer to individual points and lines: arbitrary sets are not allowed. The above reference has the details of his axioms. They were built on two primitive notions: betweenness (point $y$ lies between points $x$ and $z$) and congruence (segment $xy$ is as long as segment $zw$).
Tarski proved that this theory is decidable. And actually it has a remarkable property: any statement in the theory is equivalent to a sentence that is in universal-existential form, a special case of prenex normal form. In this form all universal quantifiers precede any existential quantifiers: $\forall x_1 \cdots \forall x_m\, \exists y_1 \cdots \exists y_n\, \phi(x_1,\dots,x_m,y_1,\dots,y_n)$, where $\phi$ is quantifier-free.
This form is close to just having equations, so it is tantalizing to ask, given any formula $\phi(x)$, does either $\phi(x)$ or $\neg\phi(x)$ hold for generic $x$? Or for some notion of “random” $x$? The basic betweenness and congruence formulas have this property: their negations hold generically—even though they do not hold for all $x$.
However, the following seems to be a weighty counterexample to any kind of “zero-one law” holding here: Tarski’s system can define a formula $\phi(x,y,z)$ meaning that the angles at $x$ and $y$ are acute. Now fix $x$ and $y$ in the plane. Then any $z$ satisfies $\phi(x,y,z)$ if and only if $z$ lies strictly between the perpendicular to the line $xy$ at $x$ and the perpendicular at $y$. Neither the set of such $z$ nor its complement is a nullset.
This is curious because if one regards Euclid-type diagrams as finite structures like graphs, then the first-order zero-one law proved by four Soviet mathematicians and independently a little later by Ronald Fagin comes into play: as $n \to \infty$ the proportion of size-$n$ structures that satisfy a given first-order sentence (pure: no parity or counting) goes either to 0 or to 1. Still, we can ask two questions:
So what we are asking is, exactly when does Euclid’s use of “random” for “arbitrary” remain correct? Which geometric statements are guaranteed either to hold or to fail for “random” arguments?
Do our questions have nice and simple answers? Are we the first to wonder how Euclid’s words fare when given a modern mathematical interpretation?
[fixed definition of ]
Wikimedia Commons source |
Loki is a Jötunn or Áss in Norse mythology, who, legend has it, once made a bet with some dwarves. He bet his head, then lost the bet, and almost really lost his head—more on that in a moment.
Today Ken and I wanted to look forward to the new year, and talk about what might happen in the future.
We have many times before discussed the future, and in particular what might happen to some of the major problems that we have. For instance:
But along the way to thinking about our open problems we looked around at other fields of science. There are plenty of hard open questions in physics, chemistry, and other areas. But the area that seems to have problems that are easy to state, but hard to resolve, is Philosophy. So we thought that instead of a predictions post it might be interesting to look at just a few of their questions. Perhaps taking a computational point of view could help them get resolved? Besides, we are terrible at predicting the future anyway.
Details on the following problems and others can be looked up at the online Stanford Encyclopedia of Philosophy (SEP), but for this intro post we will stay with the shorter descriptions in Wikipedia’s article on unsolved problems. We start with its disclaimer:
This is a list of some of the major unsolved problems in philosophy. Clearly, unsolved philosophical problems exist in the lay sense (e.g. “What is the meaning of life?”, “Where did we come from?”, “What is reality?”, etc.). However, professional philosophers generally accord serious philosophical problems specific names or questions, which indicate a particular method of attack or line of reasoning. As a result, broad and untenable topics become manageable. It would therefore be beyond the scope of this article to categorize “life” (and similar vague categories) as an unsolved philosophical problem.
So let’s take a look at a few open problems that arise in philosophy, but are not impossible. We will pick ones that seem related to our field and also are perhaps attackable by computational methods. Hence we avoid “what is reality?” and even “what is consciousness?”
The Münchhausen trilemma, also called Agrippa’s trilemma, is not a new type of “lemma.” Rather it is a claim that it is impossible to prove anything with certainty. This goes way beyond any incompleteness theorem, and applies to math as well as logic.
The argument is simple: any proof must fail because of one of the following: it is circular, ultimately assuming what it sets out to prove; it launches an infinite regress, each justification requiring a further justification; or it halts at some axiom or dogmatic assertion that is itself unproven.
This seems a pretty strong argument to me—is it certain? Of course by the argument that is impossible. So maybe the argument fails, in which case there might be statements that are indeed certain. I am confused.
Let’s return to explain Loki’s problem. Legend has it that he made a bet with dwarves, and should he lose the bet they would get his head. Sounds like a pretty scary bet—I hope it was not on the Jets to make the playoffs in the NFL. He lost the bet. And the dwarves came to collect. Loki saved himself by arguing that they could have his head, but they could not take any of his neck. The problem then became:
Where did his neck begin and his head end?
Since neither side could agree on exactly where the neck ended and the head began, Loki survived.
There are many other versions of this same issue.
Fred can never grow a beard: Fred is clean-shaven now. If a person has no beard, one more day of growth will not cause them to have a beard. Therefore Fred can never grow a beard.
Another one is:
I can lift any amount of sand: Imagine grains of sand in a bag. I can lift the bag when it contains one grain of sand. If I can lift the bag with N grains of sand then I can certainly lift it with N+1 grains of sand (for it is absurd to think I can lift N grains but adding a single grain makes it too heavy to lift). Therefore, I can lift the bag when it has any number of grains of sand, even if it has five tons of sand.
Even popular culture uses this paradox. Samuel Beckett has one character say this line in his play Endgame:
“Grain upon grain, one by one, and one day, suddenly, there’s a heap, a little heap, the impossible heap.”
This “paradox” is essentially Sperner’s lemma, which is due to Emanuel Sperner. It can be viewed as a combinatorial analog of the Brouwer fixed point theorem—one of my favorite theorems. In one dimension it says simply that if N > 1 and you paint the numbers 1 to N red and green and start with red and end with green, then there must be an i so that i is red and i+1 is green. Thus there is a definite place where the line turns from red to green. Is it a paradox that a philosophy “paradox” is a lemma in mathematics?
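The one-dimensional statement even comes with an efficient search: binary search homes in on a flip point, which is one reason such “definite place” arguments are algorithmically meaningful. A sketch, with naming my own:

```python
# One-dimensional Sperner: a coloring of positions that starts red ('R')
# and ends green ('G') must have a place i where 'R' is followed by 'G'.
# Binary search maintains the invariant color[lo]=='R' and color[hi]=='G'.

def flip_point(color):
    """Return 0-based i with color[i]=='R' and color[i+1]=='G'."""
    lo, hi = 0, len(color) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if color[mid] == 'R':
            lo = mid        # flip is still somewhere ahead of mid
        else:
            hi = mid        # flip is behind (or right at) mid
    return lo
```

So the heap paradox, recast combinatorially, not only has a guaranteed boundary but one findable in logarithmically many questions, given the coloring.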
We can make one more observation. When it comes to writing and verifying software systems, sorites matters as more than an issue of words. When should adding an object to a collection be considered to change the collection’s properties? Perhaps such depth is why sorites makes this list of “Ten Great Unsolved Problems.” Does adding one more good observation to a good blog post make it still a good blog post?
The Molyneux problem was first posed by William Molyneux to John Locke in the 17th century. Imagine a person born blind who is able to tell a cube from a sphere by touch. Imagine now that their sight is restored, somehow. Will they be able to tell a cube from a sphere solely by sight, without touching them?
This problem was widely discussed after Locke added it to the second edition of his Essay Concerning Human Understanding. It is certainly an interesting question. Moreover, today there are cases of people who have had their sight repaired. So perhaps this question will be solved soon. In any event it would be nice to understand the brain enough to know what would happen. Will everyone be able to tell the objects apart? Or will just some? The Stanford article says much more about experiments through 2011 but without a clear resolution.
Wikipedia’s discussion limits this to statements of the form “If X then Y” where X is false in our world. In logic such statements are true, but in a way that is unsatisfying because the relationship between X and Y is unexamined. The theoretical response is to enlarge our world to a set of possible worlds which may include some worlds W in which X is true and Y is assessable. Concepts of necessity and possibility—$\Box$ and $\Diamond$ in modal logic—quantify the ranges of W for such statements.
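As a toy illustration (entirely my own, not from the post), the modal quantification over accessible worlds is easy to render in code:

```python
# Minimal Kripke-model sketch: worlds, an accessibility relation, and
# Box (necessity) / Diamond (possibility) as quantifiers over the worlds
# accessible from w.

def box(worlds, access, w, pred):
    """Necessity: pred holds in every world accessible from w."""
    return all(pred(v) for v in worlds if (w, v) in access)

def diamond(worlds, access, w, pred):
    """Possibility: pred holds in some world accessible from w."""
    return any(pred(v) for v in worlds if (w, v) in access)
```

A counterfactual “if X then Y” then becomes a claim about the accessible worlds where X holds, rather than a vacuous truth in our single world.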
This still leaves the problem of W having inferior status to our “real world.” The quantum computing progenitor David Deutsch, in his philosophical book The Fabric of Reality, suggests that maybe those W don’t have inferior status. His book is subtitled The Science of Parallel Universes—and its Implications. He tries to be quantitative with these parallel worlds in ways that go beyond the tools of modal logic. Again, it is possible that insights from computation may continue to help in judging his and the older frameworks.
Can we possibly shed computational light on any of these problems?
Is the “Forsch” awakening in complexity theory?
Composite of src1, src2, src3 |
Max von Sydow starred as the chess-playing knight in Ingmar Bergman’s iconic 1957 film The Seventh Seal. He has the first line of dialogue in this year’s Star Wars VII: The Force Awakens:
This will begin to make things right.
His character, Lor San Tekka, hands off part of a secret map before… well, that’s as much as we should say if you haven’t yet seen the movie. Highly recommended.
Today Dick and I marvel at how some of this past year’s best results hark back to the years when Star Wars first came out.
That was in 1977, followed by The Empire Strikes Back in 1980 and Return of the Jedi in 1983. Von Sydow wasn’t in any of those, nor the “prequel” films in 1999–2005. He starred as Jesus in 1965’s The Greatest Story Ever Told and as arch-villain Ernst Stavro Blofeld in the 1983 James Bond film Never Say Never Again. He earned a 2011 Oscar nomination for Best Supporting Actor but lost to the equally venerable Christopher Plummer, with whom he’d co-starred in the 2007 film Emotional Arithmetic. He is appearing in season 6 of the TV series Game of Thrones and is apparently in the next Star Wars movie too. We should all be so active so long.
Both of us remember seeing Star Wars in 1977 as if it were yesterday. That is now as long ago as 1939—the beginning of World War II—was then. If you double back 1957, you get 1899. This can make one feel old. Or it can make one feel energized by the Forsch—that is the root of ‘research’ in German. Let’s see some things that 2015 gave us and that 2016 might have in store.
1957 is when Paul Erdős first published his Discrepancy Conjecture, though he dated it “twenty-five years ago” to 1932. We have already covered Terry Tao’s stunning proof of this conjecture. The latest news on Tao’s blog from early this month covers attacks on related conjectures.
1977 marks the beginning of László Babai’s citable work on the Graph Isomorphism problem according to the bibliography of his full paper. Beginning to work through it now, we are again struck by how firmly it stands on the foundation of Eugene Luks’s 1980 algorithm for the bounded-degree case. Babai and Luks mapped the terrain further in two 1983 papers, one including William Kantor. Other major references also date to the initial Star Wars years.
One of them is a 1981 paper by Peter Cameron, who was a co-advisor when I came to Merton College, Oxford, that year, and who has recently made a flurry of posts on his own blog about travels to New Zealand and Australia. Although Peter’s paper does not reference Luks’s algorithm, its results identify the shield to extending it: a class of permutation groups with highly symmetric actions and no “nice” subgroups of small index. What Laci’s algorithm does resembles the isomorphic climactic plot elements of movies IV, VI, and VII: either there is a polylog-size local irregularity and exhaustively firing on it gets the whole thing to split (IV and VII), or the absence of one enables building a global automorphism to reach a known case which is like taking out the shield generator (VI).
We could have included “local-global” in our roundup of general themes. We should add that although Peter’s result depends on the Classification, Laci’s section 13.1 outlines a second wave of attack that avoids needing either for its analysis. A primary dependence on the classification theorem remains, but for reasons we discussed before, we feel the possibility of error from that is only a phantom menace.
1980 is when William Masek and Michael Paterson shaved a $\log$ factor off the previous $O(n^2)$-time algorithms for computing the edit distance between two length-$n$ strings. The question for 35 years has been, can we save more to get time $O(n^{2-\epsilon})$ for some fixed $\epsilon > 0$? There seemed to be no solid reason why not, until Arturs Backurs and Piotr Indyk earlier this year applied a connection first shown by Ryan Williams to the strong exponential time hypothesis (SETH). Namely, $O(n^{2-\epsilon})$ for edit distance implies the same for Ryan’s “Orthogonal Vectors” problem, which in turn implies an algorithm for satisfiability of $n$-variable, $m$-clause CNF formulas in time $2^{(1-\delta)n}\,\mathrm{poly}(m)$ for some fixed $\delta > 0$, which SETH says cannot exist.
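For reference, the quadratic baseline in question is the textbook dynamic program (a standard sketch, not taken from the papers cited):

```python
# Classical O(n^2) edit distance (Wagner-Fischer), the baseline that
# Masek-Paterson improved by a log factor and that SETH-hardness says
# cannot be made truly subquadratic. Uses O(n) space via rolling rows.

def edit_distance(s, t):
    m, n = len(s), len(t)
    prev = list(range(n + 1))              # distances from empty prefix of s
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,      # delete from s
                         cur[j - 1] + 1,   # insert into s
                         prev[j - 1] + cost)  # substitute or match
        prev = cur
    return prev[n]
```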
The heart is a new degree of fine-grained reduction between problems in $\mathsf{P}$. The point is that the problems involved are all in the bedrock of polynomial time, not flung into the space of NP-completeness. As we discussed last June, this is a puzzling kind of revenge of the SETH.
1981 is when Péter Frankl and Richard Wilson explicitly constructed uniformly succinct families of graphs of size $N$ having no cliques or independent sets of size $k$, where
$k = 2^{O(\sqrt{\log N \log\log N})}$.
The lower $k$, the harder this is to achieve. Erdős obtained $k = 2\log N$ in his 1947 paper but this famous first use of the probabilistic method does not give an explicit family. “Explicit” could mean definable by a formula of first-order logic, but consensus requires only that whether node $i$ has an edge to node $j$ is decided in $\mathrm{polylog}(N)$ time, which is what we mean by “uniformly succinct.”
Frankl and Wilson’s upper bound stood for over 30 years until 2012 when Boaz Barak, Anup Rao, Ronen Shaltiel, and Avi Wigderson achieved $k = 2^{(\log N)^{o(1)}}$, indeed $2^{(\log N)^{1-\alpha}}$ for some fixed $\alpha > 0$. Now Gil Cohen has proved a major further step by constructing graphs with $k = 2^{(\log\log N)^{O(1)}}$. This is the first explicit construction giving bounds that are quasipolynomially related to the nonconstructive bound of Erdős. Such graphs of course witness that Frank Ramsey’s famous theorem needs a higher $N$ for that value of $k$.
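One can at least watch the nonconstructive Erdős argument in miniature. This brute-force check (illustrative only, and feasible just for tiny $n$) finds the largest clique or independent set in a random graph; Ramsey’s theorem guarantees it is at least 3 once $n \ge 6$, while the probabilistic method says it stays around $2\log_2 n$:

```python
# Brute-force homogeneous-set size in a random graph (tiny n only).
import itertools, random

def max_homogeneous(n, edges):
    """Size of the largest clique or independent set on vertices 0..n-1."""
    best = 1
    for k in range(2, n + 1):
        found = False
        for sub in itertools.combinations(range(n), k):
            pairs = [(a, b) in edges
                     for a, b in itertools.combinations(sub, 2)]
            if all(pairs) or not any(pairs):   # clique or independent set
                best, found = k, True
                break
        if not found:
            break          # no homogeneous set of size k, so none larger
    return best

random.seed(1)
n = 12
edges = {(a, b) for a, b in itertools.combinations(range(n), 2)
         if random.random() < 0.5}
```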
1984 is when Norbert Blum proved a lower bound of $3n - 3$ on the circuit complexity of an explicitly-given family of $n$-variable Boolean functions. This is for circuits using the full set of binary gates, not just $\wedge$ and $\vee$ (with free use of $\neg$) for which bounds have gone as high as $5n - o(n)$ before meeting resistance.
Now Magnus Find, Alexander Golovnev, Edward Hirsch, and Alexander Kulikov have beaten the “3” — by “1/86.” That is, they have constructed an explicit family of Boolean functions that require circuit size
$\left(3 + \frac{1}{86}\right)n - o(n)$.
How impressed should we be? Their paper has several new and striking ideas, including that allowing circuits to have cycles (using just $\oplus$ gates, which yield linear equations) increases the toolbox of recursions for inductive proofs. However, it is still far from a healthy jump to, say, $10n$, let alone a nonlinear lower bound. Our next featured result from this past year gives a bit of pause.
A few years ago we hailed a pair of independent ‘breakthroughs’ on the exponent $\omega$ of the $O(n^{\omega})$ time for multiplying two $n \times n$ matrices that had been achieved by Andrew Stothers and Virginia Williams, the latter getting down under 2.372873 (correcting an earlier claim of 2.3727). This was nudged down to 2.372864 by François Le Gall last year.
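All of this progress descends from Volker Strassen’s 1969 observation that $2 \times 2$ blocks can be multiplied with 7 products instead of 8, giving $\omega \le \log_2 7 \approx 2.807$. A plain-Python sketch of that recursion, for sizes that are powers of two:

```python
# Strassen's recursion: 7 multiplications per 2x2 block split.

def strassen(A, B):
    """Multiply square matrices whose size is a power of 2."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    def quad(M, r, c):
        return [row[c:c + h] for row in M[r:r + h]]
    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    A11, A12, A21, A22 = quad(A, 0, 0), quad(A, 0, h), quad(A, h, 0), quad(A, h, h)
    B11, B12, B21, B22 = quad(B, 0, 0), quad(B, 0, h), quad(B, h, 0), quad(B, h, h)
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

The later algorithms discussed here use far more elaborate tensor techniques, but the “save a multiplication, recurse” idea is the common ancestor.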
A new hope for getting perceptibly toward 2, or at least achieving $2.373 - \epsilon$ for non-microscopic values of $\epsilon$, is however apparently quashed by a paper of Le Gall with Andris Ambainis and Yuval Filmus which came out at STOC 2015. They show that the particular technique of these papers cannot even reach down to 2.3725. More foreboding is that a wider class of methods cannot get down to 2.3078.
Getting back to the circuit lower bounds, it is thrilling to beat $3n$ but sobering that a paper with three new creative ideas beat the ‘3’ by only some kind of $\epsilon$. The novelty of the methods will hopefully avert an attack of clone papers pushing it no higher than, say, $3.0125n$. We can also mention a new paper by Daniel Kane and Ryan Williams on non-linear lower bounds for threshold circuits of depths 2 and 3.
We could perhaps devote a whole separate post to developments in quantum complexity. The simplest to state is a new mainstream complexity class upper bound for quantum Arthur-Merlin games with unentangled provers: QMA(2) ⊆ EXP.
The paper by Martin Schwarz improves the previous upper bound of NEXP. (Update: Per this the result is in doubt.) We can also mention a new paper by Le Gall with Shogo Nakajima that improves ten-year-old bounds for quantum triangle-finding on sparse graphs.
The headline-getting progress, however, has been in efforts to build larger-scale quantum computers. In April, IBM announced advances in fabricating quantum circuits that detect errors.
Earlier this month, Google released a paper on experiments with their quantum computer built by D-Wave Systems claiming a speedup over classical methods. See also this interview with Scott Aaronson and a comment with remarks on the paper in Scott’s own post on it. We have been intending for a long time to cover the debate over whether speedups claimed by D-Wave are necessarily quantum. At least we can say this marks a return of the Geordie Rose and Vern Brownell-led company to the driving side of it.
We look forward to 2016, that is, next year. Perhaps it will be the year that leads to some real progress on lower bounds. Who knows?
We wish everyone a safe New Year’s Eve and all the best for 2016—may the Forsch be with you.
[Update on doubt about QMA(k) result, Robin->Richard Wilson, some word and format changes]
René Descartes and François Viète are mathematicians who have something in common, besides being French. Let’s get back to that shortly.
I am about to teach our basic discrete math class to computer science majors, and need some advice.
At Tech this class has been taught for years by others such as Dana Randall. She is a master teacher, and her rankings from the students are always nearly perfect. Alas, she is on leave this spring term, and I have been selected to replace her.
I know the material—it’s a basic course that covers the usual topics:
- General: Fundamentals of numbers, Sets, Representation, Arithmetic operations, Sums and Products, Number Theory;
- Proof Techniques: Direct Proofs, Contradiction, Reduction, Generalization, Invariances, Induction;
- Algorithmic Basics: Order of Growth, Induction and Recursion;
- Discrete Mathematics: Graph Theory, Counting, Probability (in relation to computability).
My dilemma is how to make teaching fun for me, make it fun for the students trying to learn this material, and do half as well as Dana. Well maybe as well.
This dilemma made me think about a new approach to teaching discrete math. I would like to try this approach on you to see if you like it. Any feedback would be most useful—especially before I launch the class this January.
First, an answer to what Descartes and Viète have in common. At the end of the 16th century, Viète introduced the idea of representing known and unknown numbers by letters, although he was probably not the first—that honor possibly belongs to Jordanus de Nemore. Decades later, Descartes created the convention of representing unknowns in equations by x, y, and z, and knowns by a, b, and c. This was much better than Viète’s idea of using consonants for known values and vowels for unknowns.
What does this have to do with teaching discrete math? Everything. I believe that we may confuse students by mixing two notions together. The first is that math in general, and discrete math in particular, requires students to learn a new language. The above rules for which letters denote variables and which denote constants are an example of a language rule that one must learn.
Let me phrase this again:
View learning discrete math as learning a new language.
In order to be successful students must learn the basics of a language that has many symbols they need to know—for example “∀”—and also many words and terms that have special meanings. Think about the word “odd.” In usual discourse this means:
different from what is usual or expected; strange.
But in a discrete math course, an odd natural number is one that is not divisible by 2. I once told a story here about what an “odd prime” is: of course, it’s just a prime that is not 2.
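The math-language meaning is precise enough to mechanize. A throwaway sketch, with helper names of my own choosing:

```python
def is_odd(n):
    """'Odd' in the math-language sense: not divisible by 2."""
    return n % 2 == 1

def odd_primes(limit):
    """All 'odd primes' below `limit` -- that is, every prime except 2."""
    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return [p for p in range(3, limit) if is_prime(p) and is_odd(p)]

# odd_primes(20) -> [3, 5, 7, 11, 13, 17, 19]
```

Nothing strange or unexpected about these numbers — only the everyday meaning of “odd” suggests otherwise.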
If you accept that learning the language of math is fundamental, then you should accept that we may be able to use methods from teaching real languages. So I looked into how people learn new languages—how, for example, an English speaker might learn German. One of the main principles is that there are four parts to learning any new language. We must be able to:
- Read the language;
- Write the language;
- Speak the language;
- Listen to the language.
That these all are important seems obvious, but in years of teaching discrete math I never spent any time explicitly on these skills—I never worked on speaking, for example. Yet to learn a new language one must be facile in all of the above. So this January perhaps I will do some of the following:
Another exercise might be to have students select from math statements and identify which say the same thing. Or which are ill-formed statements; or rewrite a statement without using a certain symbol or word.
For example, which of the following statements does not say the same thing as the rest:
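One possible set, purely for illustration—over the integers, with the kind of quantifier juggling students must learn to parse:

```latex
\begin{enumerate}
  \item $\forall x\, \exists y\; (y > x)$
  \item $\neg\, \exists x\, \forall y\; (y \le x)$
  \item $\exists y\, \forall x\; (y > x)$
  \item $\forall x\, \neg\, \forall y\; (y \le x)$
\end{enumerate}
```

Statements 1, 2, and 4 all say the same thing—there is no largest element—while statement 3 swaps the quantifiers and asserts a single y above everything, which is false over the integers.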
Some comments about the math language itself. It is filled with symbols that act as shorthand for terms or words—students must learn these. The concepts are difficult for some, but this is compounded by the use of so many special symbols. Also the math language uses “overloading” quite often. That is, the same exact symbol may mean different things, which means that students must use the global context to figure out what the word or symbol means. This is nothing special, since many languages do the same thing. But it does add to the difficulty in understanding the language. A simple example is “i”: is this a variable? Is it the square root of −1? Or is it an index, or something else?
Insight II
A student who knows her math language is in a good position to make progress. But discrete math is more than just a new language. It includes the notion of “proof.” Notice that it seems that one can become facile in math as a language without understanding proofs. This is perhaps the most radical part of this approach to teaching discrete math. I wonder, and ask you, whether decoupling the language from the ability to understand and create proofs is a good idea. What do you think?
The point here is that we view proofs as reasoning about statements in the math language. It is a type of rhetoric. Recall that according to Aristotle, rhetoric is:
The faculty of observing in any given case the available means of persuasion.
Rhetoric was part of the medieval trivium—the threefold path whose meeting point is Truth—which prepared students for the quadrivium of arithmetic, geometry, music, and astronomy. Thus rhetoric came before mathematics in the classic curriculum—recall “mathematics” meant astronomy back then—and before proofs as taught in geometry. I feel it should be so now.
Of course one of the central dogmas of mathematics is that our reasoning about statements is precise. If one proves that statement A implies statement B, then one can be sure that if A is true, then so is B. We expect our students to learn many types of reasoning, that is, many types of proof methods. They must be both able to understand a proof we give them and to create new proofs. They must be facile with rhetorical arguments in the new math language.
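This precision is itself mechanically checkable, which can make a nice classroom exercise. A quick toy of my own—not part of any syllabus—that verifies the basic rule by brute force over truth values: if A implies B and A holds, then B holds.

```python
from itertools import product

def implies(a, b):
    """Material implication: 'a -> b' is false only when a is true and b is false."""
    return (not a) or b

# Modus ponens as a tautology: ((A -> B) and A) -> B holds for every
# assignment of truth values to A and B.
modus_ponens_ok = all(implies(implies(a, b) and a, b)
                      for a, b in product([False, True], repeat=2))
```

The same four-row check exposes fallacies: “affirming the consequent,” ((A → B) ∧ B) → A, fails at A false, B true — a contrast students can discover for themselves.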
Proofs will still be a central part of the course. But it will not be the only part. And it will not be entangled with the learning of the math language. I hope that this separation will make learning discrete math easier and more fun.
A colleague once had a student near the middle of the semester ask this question: what is the difference between
and
I must add that this is from a colleague who is a terrific teacher. Perhaps this shows that stressing the math language aspect is important. Ken chimes in to say that he experienced issues at almost this level teaching this fall’s graduate theory of computation course, which he treats as much like a discrete mathematics course as the syllabus allows. He emphasizes not so much “language” as “type system” (per his post last February) but completely agrees with the analogy to rhetoric.
What do you think of this approach to teaching discrete math?