Crop from Farkas Prize src
Michel Goemans is the chair of this year’s ACM/IEEE Knuth Prize committee. He teaches at MIT and, among many wonderful achievements, co-won the 2000 MOS/AMS Fulkerson Prize with David Williamson for their great work on approximation algorithms for MAX CUT, MAX SAT, and other optimization problems.
A few days ago he emailed me to ask if Ken and I would announce this year’s call for nominations.
Of course, as Michel wrote in his email to me, he realizes that we really do not usually make announcements. And indeed he is correct. So Ken and I are confronted with a dilemma. We want to help, but our main purpose at GLL is to present time-independent posts that balance history and technical details. Besides, if we start doing announcements like this, then year-over-year it would become like Groundhog Day. Our problem is:
How do we make the announcement and still follow our usual form?
One idea we had was to look at the history of prizes. Prizes in science and math have been around for many, many years. There are many types of prizes, but two extremes stand out: ex ante prizes and ex post prizes. An ex ante prize is an attempt to use money to direct research. They are of the form:
If you can solve X, then you will win the following amount of money.
An early example was the Longitude Prize, which was based on the reality that knowledge of latitude is easy to refresh by looking at the sun and stars, but using them for longitude requires accurately knowing the local time.
In 1714, the British government offered a very large amount of money for determining a ship’s longitude. The top prize was 20 thousand British pounds—an immense amount of money at the time. John Harrison was awarded the top prize in 1773, which he solved by creating a clock—amazingly compact like a watch—that kept accurate time even on a ship at sea. It did this in calm seas, rough seas, hot temperatures, or cold temperatures. A great read about the prize, its solution, and the controversy in paying Harrison is Longitude: The True Story of a Lone Genius Who Solved the Greatest Scientific Problem of His Time, by Dava Sobel.
A math example of an ex ante prize was the Wolfskehl Prize, which Paul Wolfskehl endowed for the first person to present a valid proof of Fermat’s Last Theorem. Of course, Andrew Wiles won the prize in 1997.
Some have stated that such prizes are not very useful. They often increase the visibility of the problem, but attract amateurs who may or may not really understand it. The more recent Clay Millennium Prizes are ex ante, and it is unclear whether they had any effect at all on the solution of the first prize problem to be solved: the Poincaré Conjecture. Recall that the solver, Grigori Perelman, refused the prize money of $1,000,000.
Nobel Prizes are ex post prizes as are our own Turing Awards—okay they are called “awards” but that is a minor point. The Knuth Prize is definitely an ex post prize. It is given for not one achievement, but rather for a lifetime of work in the area of theory of computing. The call states:
The Prize is awarded for major research accomplishments and contributions to the foundations of computer science over an extended period of time.
I did win the prize two years ago, in 2014. I was thrilled, honored, and surprised. There are so many great theorists that I was very honored to receive it. Clearly, now is the time to try and put together a strong case for your favorite colleague. Only one can win, but you cannot win if there is not a serious case put together. So good luck to all. The committee consists of Allan Borodin, Uri Feige, Michel Goemans, Johan Håstad, Satish Rao, and Shang-Hua Teng. Here is the information on making a nomination. Nominations are due on Mar 31, 2016.
Laci Babai won it last year. Of course this was for his past work and its great innovations and deep executions, including a share of the first ever Gödel Prize. But we may have to credit the 2015 committee with prescience given Laci’s achievement with Graph Isomorphism (GI). Likewise, planning for a Schloss Dagstuhl workshop on GI began in 2014 without knowing how felicitous the December 13–18, 2015 dates would be. Perhaps there is a stimulating effect that almost amounts to a third category of prizes—as per a tongue-in-cheek time-inverting post we wrote a year ago.
Who will win this year? Of course the winner has to be nominated first—unless it’s the MVP of the NHL All-Star Game. There are many terrific candidates and I am glad not to be on the committee that has to decide. One prediction is that whoever wins will be extremely elated and honored.
[spellfix in subtitle]
Ernst Kummer was a German mathematician active in the early 1800s. He is most famous for his beautiful work on an approach designed to solve Fermat’s Last Theorem (FLT).
Today we will talk about a barrier that stopped his approach from succeeding.
I am currently teaching basic discrete mathematics at Tech. The first topic is elementary number theory, where we have covered some of the key properties of primes. Of course, we have covered the Fundamental Theorem of Arithmetic (FTA). Recall it says:
Theorem: Every natural number greater than 1 is the product of primes, and moreover this decomposition is unique up to the ordering of the primes.
This is a classic theorem, a theorem that is fundamental to almost all of number theory. It was implicit in Euclid’s famous work, and was perhaps stated and proved precisely first by Carl Gauss in his famous Disquisitiones Arithmeticae.
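As a quick illustration of the FTA in action, here is a minimal Python sketch (our own, not from the text) that computes a prime factorization by trial division; the theorem guarantees that this sorted list of factors is the canonical decomposition:

```python
def prime_factors(n):
    """Return the prime factors of n > 1 in nondecreasing order (trial division)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# By the FTA this list is the unique decomposition of 360 up to ordering.
assert prime_factors(360) == [2, 2, 2, 3, 3, 5]
```

Trial division is the slowest correct method, but it suffices to make the uniqueness claim concrete.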
Alas, as Kummer discovered, the FTA fails for other sets of numbers. For other types of numbers half of the theorem still holds: every number that is not a unit (a generalization of 1) can still be written as a product of “primes.” But uniqueness is no longer true.
In 1847, Gabriel Lamé claimed, in a talk to the Paris Academy, that he had solved FLT by using complex numbers—in particular numbers that were generated by the p-th roots of unity. Recall those are the complex numbers that are solutions to the equation z^p = 1.
Clearly z = 1 is always a solution, but for primes p there are other solutions: p in total. Lamé’s argument was perfect, rather easy, used standard arguments, and was incorrect. The great mathematician Joseph Liouville at the talk questioned whether Lamé’s assumption about unique factorization was justified; and if not, it followed that the proof was not valid. See here for a fuller discussion of this.
Liouville’s intuition was right. The FTA failed for Lamé’s numbers. Weeks after Lamé’s talk, it was discovered that Kummer had shown three years earlier that FTA indeed held for some primes but failed for p = 23. One might guess that the reason Lamé, and others, thought that factorization was unique for these numbers is that the first counterexample occurs only at p = 23. Kummer famously showed that FTA could be replaced by a weaker and very useful statement, which allowed his methods to prove FLT in many cases. But not all.
The failure of FTA for what is now called cyclotomic integers is a well studied and important part of number theory. It is well beyond my introductory class in discrete mathematics. This failure has led to the discovery of related failures of uniqueness. One of the classic examples is the Hilbert Numbers—named after David Hilbert—of course.
Hilbert numbers are the set of natural numbers of the form 4n + 1. Thus they start: 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, …
Every Hilbert number greater than 1 is the product of “Hilbert primes.” But note that a number can be prime now without being a real prime, that is, a prime over all the natural numbers. Note that 9 is a Hilbert prime: it cannot be factored as 3 × 3, since 3 is not a Hilbert number.
The key observation is that some Hilbert numbers can be factored in more than one way: 693 = 9 × 77 = 21 × 33, where 9, 77, 21, and 33 are all Hilbert primes.
Thus FTA fails for Hilbert numbers.
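A short Python check (our own illustration) makes the failure concrete, using the classic example 693 = 9 × 77 = 21 × 33, in which all four factors are Hilbert primes:

```python
def is_hilbert(n):
    return n % 4 == 1

def is_hilbert_prime(n):
    # A Hilbert prime is a Hilbert number > 1 with no divisor that is a
    # smaller Hilbert number > 1 (the cofactor is then automatically
    # a Hilbert number too, since 1*1 = 1 mod 4).
    if n <= 1 or not is_hilbert(n):
        return False
    return not any(n % d == 0 for d in range(5, n // 5 + 1, 4))

# 693 factors into Hilbert primes in two genuinely different ways.
assert all(is_hilbert_prime(m) for m in (9, 21, 33, 77))
assert 9 * 77 == 21 * 33 == 693
```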
Another popular example of the failure of FTA uses square roots: take the set Z[√−5].
The numbers then are all those of the form a + b√−5, where a and b are integers; here 6 = 2 × 3 = (1 + √−5)(1 − √−5) gives two distinct factorizations into irreducibles. Many examples can be created in this way by changing −5 to other integers d. It is a major industry trying to understand which d yield a set of numbers Z[√d] that satisfies the FTA.
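The arithmetic behind the classic case d = −5 can be verified directly. In the sketch below (our own illustration), a pair (a, b) stands for a + b√−5, and the norm a² + 5b² certifies irreducibility, since the norm is multiplicative and no element of the set has norm 2 or 3:

```python
def norm(a, b):
    # Norm of a + b*sqrt(-5); it is multiplicative.
    return a * a + 5 * b * b

def mul(x, y):
    (a, b), (c, d) = x, y
    # (a + b sqrt(-5)) * (c + d sqrt(-5)) = (ac - 5bd) + (ad + bc) sqrt(-5)
    return (a * c - 5 * b * d, a * d + b * c)

# Two factorizations of 6 in Z[sqrt(-5)]:
assert mul((2, 0), (3, 0)) == (6, 0)
assert mul((1, 1), (1, -1)) == (6, 0)

# The factors 2, 3, 1 +/- sqrt(-5) have norms 4, 9, 6, 6; they are
# irreducible because no element has norm 2 or 3.
assert all(norm(a, b) not in (2, 3) for a in range(-3, 4) for b in range(-3, 4))
```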
Let us consider an extremely complex subset of the natural numbers: E = {2, 4, 6, 8, …}.
Okay, E is just the set of all even numbers. This set is closed under addition and multiplication, and we claim that it has the following nice properties: every number in E is a product of “primes” of E, yet the factorization into such primes need not be unique.
Let’s look at each of these in turn. A prime in E is a number 2m where m is an odd natural number. The usual argument shows that every number in E is a product of primes. For example: 24 = 2 · 12 = 2 · 2 · 6.
Note, this process now stops, since 6 is a prime of E. The last part, showing that FTA fails in this setting, is easy: let p and q be distinct odd primes. Then (2p)(2q) = 2 · (2pq), and all of 2p, 2q, 2, and 2pq are primes of E.
Note, the factorizations are different. For example, 6 × 10 = 2 × 30 = 60.
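Here is a small Python check (our own illustration) of the even-number example, with 60 = 6 × 10 = 2 × 30 as the concrete failure of uniqueness:

```python
def is_e_prime(n):
    # In the even numbers E, a "prime" is an element that cannot be split
    # into two smaller members of E; these are exactly the numbers 2m with
    # m odd, i.e., the even numbers not divisible by 4.
    return n % 2 == 0 and n % 4 != 0

# Both sides factor 60 entirely into E-primes, yet the factorizations differ.
assert all(is_e_prime(n) for n in (6, 10, 2, 30))
assert 6 * 10 == 2 * 30 == 60
```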
Does the simple example of even numbers help in understanding why FTA is special? Where was it first stated in the literature that even numbers fail to satisfy unique factorization? We cannot find it after a quick search of the web: all examples we can find are either the Hilbert numbers, some examples using square roots, or something even more complex.
[fixed some formatting issues]
Cropped from BBC feature on AI
Marvin Minsky, sad to relate, passed away last Sunday. He was one of the great leaders of artificial intelligence (AI), and his early work helped shape the field for decades.
Today Ken and I remember him also as a theorist.
Like many early experts in computing, Minsky started out as a mathematician. He majored in math at Harvard and then earned a PhD at Princeton under Albert Tucker for a thesis titled “Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain Model Problem.” This shows his interest in AI, with a focus on machine learning and neural nets, from the get-go. However, he also made foundational contributions to computability and complexity, and one of his first students, Manuel Blum, who finished in 1964, laid down his famous axioms, which hold for time, space, and general complexity measures.
We will present two beautiful results of Minsky’s that prove our point. AI has lost a founder, but so has complexity theory. His brilliance will be missed by all of computing, not just AI.
In 1967 Minsky published his book Computation: Finite and Infinite Machines. This book was really not an AI book, but rather a book on the power of various types of abstract machines.
The book includes a beautiful result which we’ll call the two-counter theorem:
Theorem 1 A two-counter machine is universal, and hence has an undecidable halting problem.
A two-counter machine has two counters—not surprising. It can test each counter to see if it is zero or not, and can add 1 to a counter, or subtract 1 from a counter. The finite control of the machine is deterministic. The import of the two-counter theorem is that only two counters are needed to build a universal computational device, provided inputs are suitably encoded. It is easy to prove that a small number of counters suffice to simulate Turing machines. But the beauty of this theorem is getting the number down to two.
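To make the model concrete, here is a tiny interpreter for two-counter machines (the instruction encoding is our own, chosen just for illustration):

```python
def run_counter_machine(program, c1=0, c2=0, max_steps=10_000):
    """Interpret a two-counter machine.

    program maps a state to either ('inc', counter, next_state),
    ('dec', counter, next_if_nonzero, next_if_zero), or 'halt'.
    (A hypothetical encoding, just to illustrate the model.)
    """
    counters = [c1, c2]
    state = 'start'
    for _ in range(max_steps):
        instr = program[state]
        if instr == 'halt':
            return tuple(counters)
        if instr[0] == 'inc':
            counters[instr[1]] += 1
            state = instr[2]
        else:  # 'dec' is a combined zero-test and decrement
            if counters[instr[1]] > 0:
                counters[instr[1]] -= 1
                state = instr[2]
            else:
                state = instr[3]
    raise RuntimeError('step limit exceeded')

# A tiny program that empties counter 0 into counter 1 (c2 becomes c1 + c2):
move = {
    'start': ('dec', 0, 'bump', 'done'),
    'bump':  ('inc', 1, 'start'),
    'done':  'halt',
}
assert run_counter_machine(move, c1=5, c2=2) == (0, 7)
```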
To recall some formal language theory, one counter is not sufficient to build a universal machine. This follows because a counter is a restricted kind of pushdown store: it has a unary alphabet plus ability to detect when the store is empty. Single-pushdown machines have a decidable halting problem. It was long known that two pushdown stores can simulate one Turing tape and hence are universal: one holds the characters to the left of the tape head and the other holds the characters to the right. What is surprising, useful, and nontrivial is that two counters are equally powerful.
I will not go into the details of how Minsky proved his theorem, but will give a hint. The main trick is to use a clever representation of a vector of numbers that can fit into a single number. Perhaps Minsky used his AI thinking here. I mention this since one of the key ideas in AI is how to represent information. In this case he represents, say, a vector (a, b, c, d) of natural numbers by the single number 2^a · 3^b · 5^c · 7^d.
This is stored in one counter. The other is used to test whether, say, a > 0—that is, whether the number is divisible by 2—or to replace a by a + 1 by multiplying by 2, shuttling the value between the two counters. Making this work is a nice exercise in programming; I covered this in a post back in March 2009.
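The exponent-packing trick can be sketched as follows (a simplified illustration of our own; the real simulation performs the multiplications and divisibility tests themselves by shuttling between the two counters):

```python
def encode(vec, primes=(2, 3, 5, 7)):
    # Pack a vector (a, b, c, ...) into the single number 2^a * 3^b * 5^c * ...
    n = 1
    for p, e in zip(primes, vec):
        n *= p ** e
    return n

def increment(n, p):
    # "Add 1" to the component for prime p: multiply by p.
    return n * p

def decrement_if_positive(n, p):
    # "Test for zero and subtract 1" on the component for p:
    # a divisibility test followed by a division.
    return (n // p, True) if n % p == 0 else (n, False)

n = encode((3, 1, 0, 2))   # represents the vector (3, 1, 0, 2)
n = increment(n, 3)        # now represents (3, 2, 0, 2)
assert n == 2**3 * 3**2 * 7**2
```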
We must add personal notes here. I used this theorem in one of my early results, way back when I started doing research. Ken was also inspired by the connection from counter machines to algebraic geometry via polynomial ideal theory that was developed by Ernst Mayr and Albert Meyer at MIT. The basic idea is that if we have, say, an instruction (q,x--,y++,r) that decrements counter x and increments counter y while going from state q to state r, then that is like the equation qx = ry, represented by the polynomial qx − ry. If the start and final states are s and f, then the counter machine accepts the null input if and only if the polynomial s − f belongs to the ideal generated by the instruction polynomials.
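Here is a toy instance of that correspondence (our own minimal sketch, with polynomials represented as dictionaries from exponent tuples to coefficients, over hypothetical variables ordered (s, t, f, x)):

```python
def poly(*terms):
    # Build a polynomial as {monomial: coeff}; a monomial is a tuple of
    # exponents over the variables (s, t, f, x).
    p = {}
    for coeff, mono in terms:
        p[mono] = p.get(mono, 0) + coeff
    return {m: c for m, c in p.items() if c != 0}

def add(p, q):
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c != 0}

S = (1, 0, 0, 0)   # the state variable s
F = (0, 0, 1, 0)   # the state variable f
TX = (0, 1, 0, 1)  # the monomial t*x: state t with one token on counter x

# Instruction (s, x++, t) becomes the generator s - t*x, and
# instruction (t, x--, f) becomes t*x - f.
g1 = poly((1, S), (-1, TX))
g2 = poly((1, TX), (-1, F))

# Their sum telescopes to s - f, witnessing that s - f lies in the ideal
# generated by the instruction polynomials: the machine reaches f from s.
assert add(g1, g2) == poly((1, S), (-1, F))
```

Deciding ideal membership in general is of course far harder; Mayr and Meyer proved it requires exponential space, which is part of why the connection is so interesting.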
In 1969 Minsky and Seymour Papert published their book Perceptrons: An Introduction to Computational Geometry. It was later republished in 1987.
The book made a huge impression on me personally, but I completely missed its importance. Also the book is notorious for having created a storm of controversy among the AI community. Perhaps everyone missed what the book really was about. Let’s take a look at what it was about and why it was misunderstood.
For starters, the book was unusual. While it is a math book, with definitions and theorems, it is easy to see that it looks different from any other math book. The results are clear, but they are not always stated in a precise manner. For example, Jan Mycielski’s review in the Jan. 1972 AMS Bulletin upgrades the statement of the main positive result about learning by perceptrons. Inputs are given only in the form of square arrays. The grand effect of this—combined with the book’s hand-drawn diagrams—is a bit misleading. It makes the book not seem like a math book; it makes it very readable and fun, but somehow hides the beautiful ideas that are there. At least it did that for me when I first read it.
It is interesting that even Minsky said the book was misunderstood. As quoted in a history of the controversy over it, he said:
“It would seem that Perceptrons has much the same role as [H.P. Lovecraft’s] The Necronomicon—that is, often cited but never read.”
With hindsight knowledge of how complexity theory developed after the excitement over circuit lower bounds in the 1980s gave way to barriers in the 1990s, Ken and I propose a simple explanation for why the book was misunderstood:
It gave the first strong lower bounds for a hefty class of Boolean circuits, over a decade ahead of its time.
How hefty was shown by Richard Beigel, the late Nick Reingold, and Dan Spielman (BRS) in their 1991 paper, “The Perceptron Strikes Back”:
Every family of constant-depth, polynomial-size circuits can be simulated by probabilistic perceptrons of quasipolynomial size.
That is, AC⁰ has quasipolynomial-size perceptrons. The key link is provided by families of low-degree polynomials of a kind that were earlier instrumental in Seinosuke Toda’s celebrated theorem putting the polynomial hierarchy inside unbounded-error probabilistic polynomial time. In consequence of the strong 1980s lower bounds on constant-depth circuits computing parity, BRS observed that subexponential-size perceptrons cannot compute parity either.
Minsky and Papert’s version of that last fact is what those Perceptrons editions symbolize on their covers. By the Jordan curve theorem, every simple closed curve divides the plane into “odd” and “even” regions. A perceptron cannot tell whether two arbitrary points are in the same region. This is so even though one need only count the parity of the number of crossings of a generic line between the points. Minsky and Papert went on to deduce that connectedness of a graphically-drawn region cannot be recognized by perceptrons either. Of course we know that connectedness is hard for parity under simple reductions and is in fact complete in deterministic logspace.
How close were Minsky and Papert to knowing they had these parity lower bounds and more besides? Papert subsequently wrote a book with Robert McNaughton, Counter-Free Automata. This 1971 book drew connections to algebraic and logical characterizations of formal languages, but did not achieve the power and clarity of descriptive complexity, which flowered in the 1980s.
The other key concept they lacked was approximation. This could have been available via their polynomial analysis but would have needed the extra mathematical sophistication employed by BRS: probabilistic polynomials and approximation by polynomials. Perhaps they could have succeeded by noting connections between approximation and randomized algorithms that were employed a decade later by the likes of Andy Yao. Of course we all who worked in the 1970s had those opportunities.
The misunderstanding of their book reminds Ken of some attempts he’s seen to prove that NC, the hierarchy of polylog circuit depth (and polynomial size), is properly contained in polynomial time. Take a function f which you can prove is not computed by your favorite family of (quasi-)polynomial size circuits of depth 1—or depth “1-plus” as perceptrons are. We may suppose f(x) always has the same length as x. Now define the function g on any x of length n to be the n-fold composition f(f(⋯f(x)⋯))—or even the vector of k-fold compositions for k = 1 to n. Then g still belongs to P. You might expect to prove that computing g on inputs of length n requires n levels of your circuits, thus placing g outside NC and establishing NC ≠ P.
But this conclusion about g does not follow ipso facto. The inability of one layer of perceptrons to compute simple functions does not extend to multiple layers—and for some length-preserving offshoots of parity, the n-fold composition is just essentially the same function, no more complex. Of course we understand this now about layers in circuit complexity. Getting snagged on this point—without sharp mathematical guidelines in the book—is what strikes us as coming out even in Wikipedia’s discussion of perceptrons:
[The lower bounds] led to the field of neural network research stagnating for many years, before it was recognized that a feedforward neural network with two or more layers (also called a multilayer perceptron) had far greater processing power than perceptrons with one layer. … It is often believed that [Minsky and Papert] also conjectured (incorrectly) that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons were capable of producing an XOR function.
One can judge our point further relative to this in-depth paper on the controversy.
None of this should dim our appreciation of Minsky as a theorist. We offer a simple way to put this positively. The two-pronged frontier of circuit complexity lower bounds is represented by constant-depth circuits with counting gates modulo 6 on one side, and threshold circuits of small depth on the other.
We haven’t cracked counting modulo 6 any more than counting modulo any composite in general. Nor do we have strong lower bounds against threshold circuits of those depths—while in the arithmetical case at least, high exponential bounds on depth 3 broadly suffice.
Well, a perceptron is a one-layer threshold circuit with some extra AND/OR gates, and modular counting is the first nub that Minsky and Papert identified. They knew they needed some extra layers, understood that connectedness on the whole does not “simply” reduce to parity, and probably flew over the import of having an extra factor in the modulus, but the point is that they landed right around those frontiers—in 1969. They were after broader scientific exploration than “P versus NP,” which hadn’t yet become a continent on the map. This makes us wonder what kind of deeper reflection than worst-case complexity might land us closer to where solutions to the lower-bound impasse lie.
Our condolences to his family and the many people who worked with him and knew him best.
[word changes toward end: “probabilistic analysis” -> “polynomial analysis”; “connectivity” –> “connectedness”]
Crop from Seneca chess quote source
Lucius Seneca was a Roman playwright, philosopher, and statesman of the first century. He is called “the Younger” because his father Marcus Seneca was also a famous writer. His elder brother Lucius Gallio appears as a judge in the Book of Acts. Besides many quotations from his work, Seneca is famous for one he is not known to have said:
“To err is human.”
Lest we cluck at human error in pinning down ancient quotations, the source for the following updated version is also unknown—even with our legions of computers by which to track it:
“To err is human, but to really screw things up requires a computer.”
Today I report a phenomenon about human error that is magnified by today’s computers’ deeper search, and that I believe arises from their interaction with complexity properties of chess.
I have previously reported some phenomena that my student Tamal Biswas and I believe owe primarily to human psychology. This one I believe is different—but I isolated it in full only a week ago so who knows. It is that the proportion of large errors by human players in positions where computers judge them to be a tiny fraction of a pawn ahead is under half the rate in positions where the player is judged ever so slightly behind.
The full version of what Seneca actually wrote—or perhaps didn’t write—is even more interesting in the original Latin:
Errare humanum est, perseverare autem diabolicum, et tertia non datur.
This means: “To err is human; to persevere in error is of the devil; and no third possibility is granted.” The phrase tertium non datur is used for the Law of Excluded Middle in logic. In logic the law says that either Seneca wrote the line or he didn’t, with no third possibility. We will say more about this law in an upcoming post. Amid disputes about whether human behavior and its measurement follow primarily “rational” or “psychological” lines, we open a third possibility: “complexitarian.”
Here are some important things to know about chess and computer chess programs (called “engines”): they search iteratively, reporting a best move and a position value at each successively higher depth; values are given in units of pawns from the mover’s point of view; and positions whose best play runs into a repetition of moves are scored 0.00, the same as a draw.
One upshot is that depth of cogitation is solidly quantifiable in the chess setting. We have previously posted about our papers giving evidence of its connection to human thinking and error. The new phenomenon leans on this connection but we will argue that it has a different explanation.
My training sets include all recorded games in the years 2010–2014 between players rated within 10 points of the same century or half-century milepost of the Elo rating system. They range all the way from Elo 1050 to Elo 2800+, that is, from beginning adult tournament-level players to the human world champion and his closest challengers. This “milepost set” has over 1.15 million positions from 18,702 games, counting occurrences of the same position in different games but skipping the first 8 turns for both players. Each position was analyzed to depth at least 19 by both the newly-released version 7 of the number-two ranked Stockfish and last month’s version 9.3 of Komodo, using a multitude of single threads (for reproducibility) on the supercomputing cluster of the University at Buffalo Center for Computational Research.
The value of a position is the same as the value of the best move(s) in the position. In multi-line mode we can get the value of the played move directly, while in single-line mode we can if needed take the value of the next position. A value of +1.50 or more is commonly labeled a “decisive advantage” by chess software (though there are many exceptions), whereas values between -0.30 and +0.30 are considered grades of equality—at least by some chess software. Accordingly let’s call any move that drops the value by 1.50 or more a blunder. We will tabulate ranges in -0.40…+0.40 where a blunder most matters. Value exactly 0.00 gets a range to itself.
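For concreteness, a blunder test along these lines might look like the following Python sketch (the function name and bin labels are our own illustration, not any engine’s API):

```python
def classify_move(value_before, value_after, blunder_drop=1.50):
    """Label a move by the evaluation swing, from the mover's perspective.

    value_before: engine value of the position (i.e., of the best move);
    value_after: value of the position after the move actually played.
    The 1.50 threshold follows the convention described in the text.
    """
    drop = value_before - value_after
    blunder = drop >= blunder_drop
    # Bin the prior value the way the tables do: 0.00 gets its own bin.
    if value_before == 0.0:
        bin_label = '0.00'
    elif value_before > 0:
        bin_label = 'ahead'
    else:
        bin_label = 'behind'
    return bin_label, blunder

assert classify_move(0.10, -1.60) == ('ahead', True)     # drop of 1.70
assert classify_move(-0.10, -0.80) == ('behind', False)  # drop of 0.70
assert classify_move(0.0, -2.00) == ('0.00', True)
```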
The final wrinkle is that we will use the engines’ highest-depth values to group the positions, but distinguish cases where the human player’s move was regarded as a blunder at depth 5, 10, 15, and 20. The last includes some depth-19 values. Doing so distinguishes immediate blunders like hanging your queen from subtle mistakes that require a lot of depth to expose. The data divides fairly neatly into thirds, one for “amateur” players under 2000 (387,024 positions from 6,187 games), then the 2000–2250 range (394,186 from 6,380), then 2300 and above (372,587 from 6,035). The depth-5 numbers for Stockfish 7 are “off” for reasons I demonstrated in a chess forum post; responses seem to confirm a recent bug that has lessened impact for depths 9 and higher. (See Update below.) The tables are for single-line mode (the multi-line mode results are similar including the Stockfish anomaly), and positions are analyzed sans game history so that only repetitions downstream in the search are detected.
The point is not that weaker players make more large mistakes, but rather the “Red Sea” effect at 0.00. It becomes more pronounced at the greatest depths and for the strongest players. Here are the figures for players rated 2600 and above, from 90,045 positions in 1,335 games. They are tabled in smaller intervals of 0.10 within the “equality” range -0.30 to +0.30.
Although Stockfish and Komodo have differences in their evaluation scales—happily less pronounced than they were 1 and 2 years ago—they agree that the world’s elite made six times more large errors when on the lower side of equality. This begs asking not only why but whether it is a real human phenomenon.
There seems no possible rational explanation—indeed the sharp change at 0.00 counters the rational hypothesis discussed in this paper and toward the end of my 2014 TEDxBuffalo talk. But it is also hard to sustain a psychological explanation. If being slightly behind puts one off one’s game, why is the ratio accentuated for the top players? That this training set includes only games in which the players are evenly matched minimizes any variance from overconfidence or “fear factors.”
My proposed explanation leans on the above-mentioned engine behavior with repetitions and 0.00 values. Spot-checks affirm that the wide 0.00 bin includes many positions where the armies trade blows but fall into a repeating sequence from which neither can veer without cost. If the machine judges your position worth +0.01 or more then it places you discretely above this no-man’s zone. Any notionally big slip you make still has a good chance of being caught in this large attraction basin and hence being charged as only a small error by the computer. Whereas a slip from 0.00 or below has no safety net and falls right down.
This hypothesis comes with one large falsifiable prediction and perhaps a few others regarding games played by computers and computer-human teams as opposed to humans alone. My entire model uses no feature of chess apart from the move values supplied by engines at progressively increasing depths of search. It thus transfers to any alternating-decision game in which sufficiently authoritative move values can be computed. Some games like Go rule out repeating sequences, while others like Shogi allow them but rarely end in draws.
For any strategy game that disallows repetitions and/or has negligible draw frequency, there will be no similar “firewall at 0.00” effect on the rate of human mistakes.
Go and Shogi are just as hard as chess, so this hypothesis does not pertain to worst-case complexity. Rather it addresses heuristically complex “most-case” behavior. It also highlights a general risk factor when using computers as judges: notional human errors are necessarily being evaluated by machines but the machines are evidently not doing so with human-relevant accuracy.
This last issue is not theoretical or didactic for me. In applying my statistical model to allegations of cheating I need to project distributions of errors by unaided human players of all skill levels accurately. I have been aware of ramped-up error below and away from 0.00 since 2008, when I devised a pleasingly symmetric way to smooth it out: Take a differential dμ(x) that has value 1 at x = 0.00 but tapers off to 0 symmetrically. This says that the marginal value of a centipawn is the same whether you are ahead or behind the same amount but lessens when the (dis)advantage is large. When a player makes an error of raw magnitude e in a position of value v, charge not e but rather the integral of dμ(x) from x = v – e to x = v. The metric can be weighted to make 0.00 effectively as wide as neighboring intervals.
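A sketch of this correction in Python, with a hypothetical taper shape (the actual dμ used in the model is not specified here, so the Cauchy-style curve below is purely illustrative):

```python
def taper(x, width=1.0):
    # A symmetric taper with value 1 at x = 0; the width and the exact
    # shape are assumptions for illustration only.
    return 1.0 / (1.0 + (x / width) ** 2)

def charged_error(v, e, width=1.0, steps=1000):
    """Charge an error of raw magnitude e made at value v as the integral
    of the taper from v - e to v (numerical midpoint rule)."""
    lo, hi = v - e, v
    h = (hi - lo) / steps
    return sum(taper(lo + (i + 0.5) * h, width) for i in range(steps)) * h

# A slip of 0.50 is charged nearly at face value near 0.00, but much less
# when made from a position already 2.00 behind:
assert charged_error(0.0, 0.5) > charged_error(-2.0, 0.5)
```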
The idea is that if you blunder when, say, 0.50 ahead, the integral will go through the fattest part of the metric and so charge near face value. But when 0.50 behind it will go through the thinner region -0.50 to -0.50 – e and so record a lesser value for my analyzer. This “metric correction” balanced and sharpened my system in ways I could verify by trials such as those I described on this blog in 2011 here.
Privately I’ve indulged an analogy to how relativity “corrects” the raw figures from Newtonian physics. But with today’s chess programs cannonballing to greater depths, I can tell when carrying over my previous calibrations from the earlier Rybka and Houdini engines that this method is being strained. Having now many times the amount of data I previously took on commodity PCs has made the quantum jump at 0.00 so clear as to need special attention. Making an ad hoc change in my model’s coefficients for positions of non-positive value still feels theoretically uncomfortable, but this may be what the data is dictating.
How can you explain the phenomenon? How should regard for phenomenology—in both experimental and philosophical senses—influence the manner of handling it?
Update 2/6/16: I received a suggestion to modify one line of the Stockfish 7 code to avoid shallow-depth “move count based pruning” at nodes belonging to the current principal variation. This inserts a third conjunct && !PvNode into what is currently the second block of “step 13” in the Stockfish source file search.cpp. Re-running the data for the Elo 2600–2800 levels produced the following table:
Evidently the change greatly reduces the depth-5 “noise” but still leaves it distinguishably higher than the depth-5 figures for Komodo 9.3.
Cropped from source (Garrett Coakley)
Euclid is, of course, the Greek mathematician who is often referred to as the “Father of Geometry.”
Today Ken and I want to talk about an “error” that appears in the famous volumes written by Euclid a few years ago—about 2300 years ago.
The “error” is his use of the word ‘random’ when by modern standards he should be saying arbitrary. I find this surprising, since I think of random as a modern concept; and I find it also surprising, since the two notions are not in general equivalent.
It seems clear that Euclid said ‘random.’ The root-word he used, tuchaios, endures as the principal word for “random” in modern Greek and is different from words meaning “arbitrary” or “generic” or “haphazard” or even “stochastic.” The only meaning of tuchaios or Euclid’s exact phrase hos etuchen we’ve found that would make his statement remain strictly correct is, “it is unimportant which.” However, the way hos etuchen was put in the voice of Pope Clement I seems not to square with that meaning either.
I never studied the Elements, Euclid’s famous collection of thirteen books on geometry. Not that long ago, many schools used the Elements as the textbook for the introduction of mathematics. Abraham Lincoln is said to have studied the Elements until he could recite it perfectly. I never looked at any part of it until I came across Book II while doing some research. And I was quite surprised to see the use of the notion of “random” there in the text.
Book II is focused on a geometric approach to identities, which are much easier to understand as algebraic identities today. The proposition that caught my eye is Proposition II.4.
Proposition 1 If a straight line be cut at random, the square on the whole is equal to the squares on the segments and twice the rectangle contained by the segments.
Perhaps this is easy to understand as a geometric statement. Today we would write algebraically that it stands for the identity $(a+b)^2 = a^2 + 2ab + b^2$.
Why did Euclid state it geometrically? Perhaps the main advantage was that it allowed him to reason directly about geometric objects. After all he was writing about geometry, so a square with sides of length $a+b$ stood in nicely for the term $(a+b)^2$. An even better reason might have been his lack of modern algebraic notation—the equals symbol was invented by Robert Recorde a few years after Euclid, in 1557.
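The identity is easy to check mechanically. Here is a small Python sketch (the function name is our own, purely for illustration) verifying that Euclid's Proposition II.4 holds for every cut of a segment, not just a random one:

```python
# Euclid II.4: cut a segment of length a + b into pieces a and b; then the
# square on the whole equals the squares on the pieces plus twice the
# rectangle contained by the pieces.
def euclid_II_4_holds(a, b):
    return (a + b)**2 == a**2 + b**2 + 2*a*b

# The identity holds for *every* cut -- here we try all integer cuts
# of a segment of length 100.
print(all(euclid_II_4_holds(a, 100 - a) for a in range(101)))  # True
```

This is exactly the sense in which "at random" understates the proposition: the check succeeds for all cuts, including the degenerate ones at the endpoints.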
Here is the proof as it appears in the Elements. The length compared to simply expanding $(a+b)^2$ shows the power of modern notation:
What surprised me was the exact statement of Proposition II.4. Note that it starts,
If a straight line be cut at random…
In online Greek editions such as this the phrase hos etuchen meaning “at random” is set off with commas. Euclid reiterates this phrase in the first line of his proof. However, the result is actually true for any cut of the line, which is more than saying “at random.” So why does Euclid say “random”?
Euclid seems nowhere to define in any precise way what “random” means in this context. Recall that one of the great achievements of the Elements was its claim to be a precise and axiomatic approach to geometry. But using an undefined term like “random” seems to run overtly counter to that goal.
Looking for other usages doesn’t clearly let Euclid — or his main ancient editor, Theon of Alexandria — off the hook. Pope Clement I was St. Peter’s first, second, or third successor. A novelization of his acts and homilies has him using the same Greek phrase at the beginning of Book 1, chapter IV:
Our Peter has strictly and becomingly charged us concerning the establishing of the truth, that we should not communicate the books of his preachings, which have been sent to us, to any one at random, but to one who is good and religious, and who wishes to teach…
This aligns with the modern meaning: the writer was saying that most people would be unqualified to preach Peter’s sermons. It does not mean that all people would be bad or that it is unimportant who receives the books. There is support in other ancient examples for the reading, “to anyone you happen to meet,” but even then the inference stays one of “mostness” not “all.” In any event, Euclid’s proposition is correct with “all”—even if the line is “cut” at one of the endpoints.
The difference is not a quibble. It is easy to make statements in Euclidean geometry that are true for “random” but not for “all”: A random triple of points in the plane makes a triangle. A random line through a point outside a circle is not tangent to the circle.
However, there are also cases where holding for “random” is sufficient for holding for “all.” Equalities like Euclid’s have that property. So does the Schwartz-Zippel lemma: if a low-degree polynomial $p$ evaluates to $0$ at a random point of a large enough field, then almost surely $p$ is the zero polynomial. In fact Euclid’s identity is a case of this—could we add Euclid as sharing credit for the lemma?
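The Schwartz-Zippel idea fits in a few lines of Python. Everything here (the helper name, the choice of prime, the number of trials) is our own illustrative sketch, not code from any particular library:

```python
import random

# Polynomial identity testing in the spirit of Schwartz-Zippel: to decide
# whether two polynomial expressions agree identically, evaluate both at
# random points of a large field. A nonzero polynomial of degree d vanishes
# at a random point of a size-S set with probability at most d / S.
P = 2**31 - 1  # a large prime, so we work in the field F_P

def probably_identical(f, g, num_vars, trials=20):
    """Return True if f and g appear to define the same polynomial mod P."""
    for _ in range(trials):
        point = [random.randrange(P) for _ in range(num_vars)]
        if f(*point) % P != g(*point) % P:
            return False  # a witness point: definitely not identical
    return True

# Euclid's identity passes, as any true identity must:
f = lambda a, b: (a + b)**2
g = lambda a, b: a**2 + b**2 + 2*a*b
print(probably_identical(f, g, 2))  # True

# A near-miss is caught with overwhelming probability:
h = lambda a, b: a**2 + b**2 + a*b
print(probably_identical(f, h, 2))  # False, except with vanishing probability
```

The asymmetry is the point of the lemma: a single disagreeing evaluation is proof of non-identity, while agreement on random points gives only (very strong) evidence of identity.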
This led Ken and me to think about a problem: Can we make something out of ‘Euclidean’ randomness?
There is a third concept lurking here: generic. Generic usually implies random but is more special and does not require probability or (Lebesgue) measure. In fact it basically means “not special.” Three collinear points are special; a line tangent to a circle is special.
The exact notion of “generic” is context-dependent, but at the interface of geometry and algebra we can pin it down: a set of elementary objects (points, lines, etc.) is special if it satisfies some finite set of simple arithmetical equations and is generic otherwise. Collinear points and tangent lines are clearly special in this sense. More formally, the special sets are those closed in the Zariski topology, apart from the whole (Euclidean) space. So now we ask:
Could Euclid have been in any sense aware of the idea of genericity?
If so, then Euclid could have been led into deep waters. Consider just a line segment going from 0 to 2. The midpoint is special because it satisfies the equation $x + x = 2$. Similarly so are the points $\frac{1}{2}$ and $\frac{3}{2}$. It quickly follows that all rational numbers are special. Now so is $\sqrt{2}$ since it satisfies $x \cdot x = 2$. And likewise $\sqrt[3]{2}$, so all points for algebraic numbers are special too. Euclid would certainly have suspected that the cubic and higher points might not be constructible. So although he might have suspected that a “random” point was not constructible, he would have had a hard time realizing that the non-constructible points include the special subset of algebraic ones.
Of course, it took until the 1600s to articulate modern meanings of “random” and until the 1800s for Georg Cantor and the topological notions underlying genericity to arise. It still interests us what “hints” might have been perceived in the intervening centuries.
Alfred Tarski, the famous 20th-century logician, created formal axioms for geometry. His axioms modeled that part of geometry that is called “elementary.” This includes statements of plane geometry that can be stated in first-order logic and only refer to individual points and lines: arbitrary sets are not allowed. The above reference has the details of his axioms. They were built on two primitive notions: betweenness and congruence (equidistance) of segments.
Tarski proved that this theory is decidable. And actually it has a remarkable property: any statement in the theory is equivalent to a sentence that is in universal-existential form, a special case of prenex normal form. In this form all universal quantifiers precede any existential quantifiers: $\forall x_1 \cdots \forall x_m \, \exists y_1 \cdots \exists y_n \; \phi(x_1,\dots,x_m,y_1,\dots,y_n)$, where $\phi$ is quantifier-free.
This form is close to just having equations, so it is tantalizing to ask, given any formula $\phi(x)$, does either $\phi$ or $\neg\phi$ hold for generic $x$? Or for some notion of “random” $x$? The basic collinearity and tangency formulas have this property: their negations hold generically—even though they do not hold for all $x$.
However, the following seems to be a weighty counterexample to any kind of “zero-one law” holding here: Tarski’s system can define a formula $\phi(a,b,c)$ meaning that the angles at $a$ and $b$ are acute. Now fix $a$ and $b$ in the plane. Then a point $c$ satisfies $\phi(a,b,c)$ if and only if $c$ lies strictly inside the open slab between the perpendiculars to the segment $ab$ at $a$ and at $b$. Neither the set of such $c$ nor its complement is a nullset.
This is curious because if one regards Euclid-type diagrams as finite structures like graphs, then the first-order zero-one law proved by four Soviet mathematicians and independently a little later by Ronald Fagin comes into play: as $n \to \infty$, the proportion of size-$n$ structures that satisfy a given first-order sentence (pure: no parity or counting) goes either to 0 or to 1. Still, we can ask two questions:
So what we are asking is, exactly when does Euclid’s use of “random” for “arbitrary” remain correct? Which geometric statements are guaranteed either to hold or to fail for “random” arguments?
Do our questions have nice and simple answers? Are we the first to wonder how Euclid’s words fare when given a modern mathematical interpretation?
[fixed definition of ]
Wikimedia Commons source |
Loki is a Jötunn or áss in Norse mythology, who, legend has it, once made a bet with some dwarves. He bet his head, then lost the bet, and almost really lost his head—more on that in a moment.
Today Ken and I wanted to look forward to the new year, and talk about what might happen in the future.
We have many times before discussed the future, and in particular what might happen to some of the major problems that we have. For instance:
But along the way to thinking about our open problems we looked around at other fields of science. There are plenty of hard open questions in physics, chemistry, and other areas. But the area that seems to have problems that are easy to state, but hard to resolve, is Philosophy. So we thought that instead of a predictions post it might be interesting to look at just a few of their questions. Perhaps taking a computational point of view could help them get resolved? Besides, we are terrible at predicting the future anyway.
Details on the following problems and others can be looked up at the online Stanford Encyclopedia of Philosophy (SEP), but for this intro post we will stay with the shorter descriptions in Wikipedia’s article on unsolved problems. We start with its disclaimer:
This is a list of some of the major unsolved problems in philosophy. Clearly, unsolved philosophical problems exist in the lay sense (e.g. “What is the meaning of life?”, “Where did we come from?”, “What is reality?”, etc.). However, professional philosophers generally accord serious philosophical problems specific names or questions, which indicate a particular method of attack or line of reasoning. As a result, broad and untenable topics become manageable. It would therefore be beyond the scope of this article to categorize “life” (and similar vague categories) as an unsolved philosophical problem.
So let’s take a look at a few open problems that arise in philosophy, but are not impossible. We will pick ones that seem related to our field and also are perhaps attackable by computational methods. Hence we avoid “what is reality?” and even “what is consciousness?”
The Münchhausen trilemma, also called Agrippa’s trilemma, is not a new type of “lemma.” Rather it is a claim that it is impossible to prove anything with certainty. This goes way beyond any incompleteness theorem, and applies to math as well as logic.
The argument is simple: any proof must fail because of one of the following:

- it is a circular argument, in which the conclusion is already among the premises;
- it rests on an infinite regress, each justification requiring a further one; or
- it stops at dogmatic assertions: axioms accepted without proof.
This seems a pretty strong argument to me—is it certain? Of course by the argument that is impossible. So maybe the argument fails, in which case there might be statements that are indeed certain. I am confused.
Let’s return to explain Loki’s problem. Legend has it that he made a bet with dwarves, and should he lose the bet they would get his head. Sounds like a pretty scary bet—I hope it was not on the Jets to make the playoffs in the NFL. He lost the bet. And the dwarves came to collect. Loki saved himself by arguing that they could have his head, but they could not take any of his neck. The problem then became:
Where did his neck begin and his head end?
Since neither side could agree on exactly where the neck and head met, Loki survived.
There are many other versions of this same issue.
Fred can never grow a beard: Fred is clean-shaven now. If a person has no beard, one more day of growth will not cause them to have a beard. Therefore Fred can never grow a beard.
Another one is:
I can lift any amount of sand: Imagine grains of sand in a bag. I can lift the bag when it contains one grain of sand. If I can lift the bag with N grains of sand then I can certainly lift it with N+1 grains of sand (for it is absurd to think I can lift N grains but adding a single grain makes it too heavy to lift). Therefore, I can lift the bag when it has any number of grains of sand, even if it has five tons of sand.
Even popular culture uses this paradox. Samuel Beckett has one character say this line in his play Endgame:
“Grain upon grain, one by one, and one day, suddenly, there’s a heap, a little heap, the impossible heap.”
This “paradox” is essentially Sperner’s lemma, which is due to Emanuel Sperner. It can be viewed as a combinatorial analog of the Brouwer fixed point theorem—one of my favorite theorems. In one dimension it says simply that if N > 1 and you paint the numbers 1 to N red and green and start with red and end with green, then there must be an i so that i is red and i+1 is green. Thus there is a definite place where the line turns from red to green. Is it a paradox that a philosophy “paradox” is a lemma in mathematics?
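Here is a minimal Python sketch of the one-dimensional case (the function name is ours, for illustration only):

```python
# One-dimensional Sperner: if positions 1..N are each painted red or green,
# with the first position red and the last green, there must be a place
# where red turns to green. This is the discrete answer to the sorites
# "heap" paradox: somewhere a definite switch occurs.
def switch_point(colors):
    """colors[0] == 'R' and colors[-1] == 'G'; return an index i with
    colors[i] == 'R' and colors[i+1] == 'G' (0-based)."""
    assert colors[0] == 'R' and colors[-1] == 'G'
    for i in range(len(colors) - 1):
        if colors[i] == 'R' and colors[i + 1] == 'G':
            return i
    raise AssertionError("impossible: a switch must exist")

print(switch_point(list("RRRGGRGG")))  # 2, the first of possibly many switches
```

The lemma guarantees existence of a switch, not uniqueness; the paradox trades on our inability to say in advance where any particular switch will be.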
We can make one more observation. When it comes to writing and verifying software systems, sorites becomes more than an issue of words. When should adding an object to a collection be considered to change the collection’s properties? Perhaps such depth is why sorites makes this list of “Ten Great Unsolved Problems.” Does adding one more good observation to a good blog post make it still a good blog post?
The Molyneux problem was first stated by William Molyneux to John Locke in the 17th century. Imagine a person born blind who is able to tell a cube from a sphere by touch. Imagine now that their sight is restored, somehow. Will they be able to tell a cube from a sphere solely by sight, without touching them?
This problem was widely discussed after Locke added it to the second edition of his Essay Concerning Human Understanding. It is certainly an interesting question. Moreover, today there are cases of people who have had their sight repaired. So perhaps this question will be solved soon. In any event it would be nice to understand the brain enough to know what would happen. Will everyone be able to tell the objects apart? Or will just some? The Stanford article says much more about experiments through 2011 but without a clear resolution.
Wikipedia’s discussion limits this to statements of the form “If X then Y” where X is false in our world. In logic such statements are true, but in a way that is unsatisfying because the relationship between X and Y is unexamined. The theoretical response is to enlarge our world to a set of possible worlds which may include some worlds W in which X is true and Y is assessable. Concepts of necessity and possibility—$\Box$ and $\Diamond$ in modal logic—quantify the ranges of W for such statements.
This still leaves the problem of W having inferior status to our “real world.” The quantum computing progenitor David Deutsch, in his philosophical book The Fabric of Reality, suggests that maybe those W don’t have inferior status. His book is subtitled The Science of Parallel Universes—and its Implications. He tries to be quantitative with these parallel worlds in ways that go beyond the tools of modal logic. Again, it is possible that insights from computation may continue to help in judging his and the older frameworks.
Can we possibly shed computational light on any of these problems?
]]>
Is the “Forsch” awakening in complexity theory?
Composite of src1, src2, src3 |
Max von Sydow starred as the chess-playing knight in Ingmar Bergman’s iconic 1957 film The Seventh Seal. He has the first line of dialogue in this year’s Star Wars VII: The Force Awakens:
This will begin to make things right.
His character, Lor San Tekka, hands off part of a secret map before… well, that’s as much as we should say if you haven’t yet seen the movie. Highly recommended.
Today Dick and I marvel at how some of this past year’s best results hark back to the years when Star Wars first came out.
That was in 1977, followed by The Empire Strikes Back in 1980 and Return of the Jedi in 1983. Von Sydow wasn’t in any of those, nor the “prequel” films in 1999–2005. He starred as Jesus in 1965’s The Greatest Story Ever Told and as arch-villain Ernst Stavro Blofeld in the 1983 James Bond film Never Say Never Again. He earned a 2011 Oscar nomination for Best Supporting Actor but lost to the equally venerable Christopher Plummer, with whom he’d co-starred in the 2007 film Emotional Arithmetic. He is appearing in season 6 of the TV series Game of Thrones and is apparently in the next Star Wars movie too. We should all be so active so long.
Both of us remember seeing Star Wars in 1977 as if it were yesterday. That is now as long ago as 1939—the beginning of World War II—was then. If you double back from 1957, you get 1899. This can make one feel old. Or it can make one feel energized by the Forsch—that is the root of ‘research’ in German. Let’s see some things that 2015 gave us and that 2016 might have in store.
1957 is when Paul Erdős first published his Discrepancy Conjecture, though he dated it “twenty-five years ago” to 1932. We have already covered Terry Tao’s stunning proof of this conjecture. The latest news on Tao’s blog from early this month covers attacks on related conjectures.
1977 marks the beginning of László Babai’s citable work on the Graph Isomorphism problem according to the bibliography of his full paper. Beginning to work through it now, we are again struck by how firmly it stands on the foundation of Eugene Luks’s 1980 algorithm for the bounded-degree case. Babai and Luks mapped the terrain further in two 1983 papers, one including William Kantor. Other major references also date to the initial Star Wars years.
One of them is a 1981 paper by Peter Cameron, who was a co-advisor when I came to Merton College, Oxford, that year, and who has recently made a flurry of posts on his own blog about travels to New Zealand and Australia. Although Peter’s paper does not reference Luks’s algorithm, its results identify the shield to extending it: a class of permutation groups with highly symmetric actions and no “nice” subgroups of small index. What Laci’s algorithm does resembles the isomorphic climactic plot elements of movies IV, VI, and VII: either there is a polylog-size local irregularity and exhaustively firing on it gets the whole thing to split (IV and VII), or the absence of one enables building a global automorphism to reach a known case which is like taking out the shield generator (VI).
We could have included “local-global” in our roundup of general themes. We should add that although Peter’s result depends on the Classification, Laci’s section 13.1 outlines a second wave of attack that avoids needing either for its analysis. A primary dependence on the classification theorem remains, but for reasons we discussed before, we feel the possibility of error from that is only a phantom menace.
1980 is when William Masek and Michael Paterson shaved a $\log n$ factor off the previous $O(n^2)$-time algorithms for computing the edit distance between two length-$n$ strings. The question for 35 years has been, can we save more to get time $O(n^{2-\epsilon})$ for some fixed $\epsilon > 0$? There seemed to be no solid reason why not, until Arturs Backurs and Piotr Indyk earlier this year applied a connection first shown by Ryan Williams to the strong exponential time hypothesis (SETH). Namely, $O(n^{2-\epsilon})$ time for edit distance implies the same for Ryan’s “Orthogonal Vectors” problem, which in turn implies an algorithm for satisfiability of $n$-variable, $m$-clause CNF formulas in time $O(2^{(1-\delta)n}\,\mathrm{poly}(m))$ for some fixed $\delta > 0$, which SETH says cannot exist.
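For concreteness, here is the textbook quadratic dynamic program for edit distance (not Masek and Paterson's log-shaved version; the function name is ours):

```python
def edit_distance(s: str, t: str) -> int:
    """Textbook O(len(s) * len(t)) dynamic program for Levenshtein distance.
    Masek-Paterson's Four-Russians trick shaves a log factor off this, and
    the SETH connection says no n^(2 - eps) algorithm is expected."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))  # distances from the empty prefix of s
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete s[i-1]
                cur[j - 1] + 1,                        # insert t[j-1]
                prev[j - 1] + (s[i - 1] != t[j - 1]),  # substitute or match
            )
        prev = cur
    return prev[n]

print(edit_distance("kitten", "sitting"))  # 3
```

Each cell depends only on three neighbors, which is exactly the locality that the Four-Russians block-lookup technique exploits to save the log factor.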
The heart is a new degree of fine-grained reduction between problems in $\mathsf{P}$. The point is that the problems involved are all in the bedrock of polynomial time, not flung into the space of NP-completeness. As we discussed last June, this is a puzzling kind of revenge of the SETH.
1981 is when Péter Frankl and Richard Wilson explicitly constructed uniformly succinct families of graphs of size $n$ having no cliques or independent sets of size $t$, where $t = 2^{O(\sqrt{\log n \log\log n})}$.
The lower $t$, the harder this is to achieve. Erdős obtained $t = 2\log_2 n$ in his 1947 paper but this famous first use of the probabilistic method does not give an explicit family. “Explicit” could mean definable by a formula of first-order logic, but consensus requires only that whether node $i$ has an edge to node $j$ is decided in $\mathrm{poly}(\log n)$ time, which is what we mean by “uniformly succinct.”
Frankl and Wilson’s upper bound stood for over 30 years until 2012 when Boaz Barak, Anup Rao, Ronen Shaltiel, and Avi Wigderson achieved $t = 2^{(\log n)^{o(1)}}$, indeed $t = 2^{2^{(\log\log n)^{1-\alpha}}}$ for some fixed $\alpha > 0$. Now Gil Cohen has proved a major further step by constructing graphs with $t = 2^{(\log\log n)^{O(1)}}$. This is the first explicit construction giving bounds that are quasipolynomially related to the nonconstructive bound of Erdős. Such graphs of course witness that Frank Ramsey’s famous theorem needs a higher $n$ for that value of $t$.
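One can see Erdős's nonconstructive bound in miniature with a brute-force check. This is our own illustrative code (the parameters are tiny because the search is exponential): a random graph on $n$ vertices tends to avoid cliques and independent sets of size about $2\log_2 n$.

```python
import itertools
import random

def has_homogeneous_set(adj, n, t):
    """Brute force: does the graph have a clique or independent set of size t?"""
    for S in itertools.combinations(range(n), t):
        pairs = [adj[i][j] for i, j in itertools.combinations(S, 2)]
        if all(pairs) or not any(pairs):
            return True
    return False

n, t = 16, 8  # Erdos: random graphs avoid homogeneous sets of size ~2*log2(n)
adj = [[False] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        adj[i][j] = adj[j][i] = random.random() < 0.5

print(has_homogeneous_set(adj, n, t))  # False with high probability
```

The catch, of course, is that this gives no explicit family: the whole difficulty of the Frankl-Wilson line of work is achieving anything like this deterministically and succinctly.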
1984 is when Norbert Blum proved a lower bound of 3n – 3 on the circuit complexity of an explicitly-given family of n-variable Boolean functions. This is for circuits using the full set of binary gates, not just $\wedge$ and $\vee$ (with free use of $\neg$), for which bounds have gone as high as 5n – o(n) before meeting resistance.
Now Magnus Find, Alexander Golovnev, Edward Hirsch, and Alexander Kulikov have beaten the “3” — by “1/86.” That is, they have constructed an explicit family of Boolean functions that require circuit size at least $(3 + \frac{1}{86})n - o(n)$.
How impressed should we be? Their paper has several new and striking ideas, including that allowing circuits to have cycles (using parity-style gates to yield linear equations) increases the toolbox of recursions for inductive proofs. However, it is still far from a healthy jump to, say, 10n, let alone a nonlinear lower bound. Our next featured result from this past year gives a bit of pause.
A few years ago we hailed a pair of independent ‘breakthroughs’ on the exponent $\omega$ of the time $O(n^{\omega})$ for multiplying two $n \times n$ matrices that had been achieved by Andrew Stothers and Virginia Williams, the latter getting down under 2.372873 (correcting an earlier claim of 2.3727). This was nudged down to 2.372864 by François Le Gall last year.
A new hope for getting perceptibly toward 2, or at least achieving $2.373 - \epsilon$ for non-microscopic values of $\epsilon$, is however apparently quashed by a paper of Le Gall with Andris Ambainis and Yuval Filmus which came out at STOC 2015. They show that the particular technique of these papers cannot even reach down to 2.3725. More foreboding is that a wider class of methods cannot get down to 2.3078.
Getting back to the circuit lower bounds, it is thrilling to beat 3n but sobering that a paper with three new creative ideas beat the ‘3’ by only some kind of $\epsilon$. The novelty of the methods will hopefully avert an attack of clone papers pushing it no higher than, say, 3.0125n. We can also mention a new paper by Daniel Kane and Ryan Williams on non-linear lower bounds for threshold circuits of depths 2 and 3.
We could perhaps devote a whole separate post to developments in quantum complexity. The simplest to state is a new mainstream complexity class upper bound for quantum Arthur-Merlin games with unentangled provers: $\mathsf{QMA}(2) \subseteq \mathsf{EXP}$.
The paper by Martin Schwarz improves the previous upper bound of $\mathsf{NEXP}$. (Update: Per this the result is in doubt.) We can also mention a new paper by Le Gall with Shogo Nakajima that improves ten-year-old bounds for quantum triangle-finding on sparse graphs.
The headline-getting progress, however, has been in efforts to build larger-scale quantum computers. In April, IBM announced advances in fabricating quantum circuits that detect errors.
Earlier this month, Google released a paper on experiments with their quantum computer built by D-Wave Systems claiming a speedup over classical methods. See also this interview with Scott Aaronson and a comment with remarks on the paper in Scott’s own post on it. We have been intending for a long time to cover the debate over whether speedups claimed by D-Wave are necessarily quantum. At least we can say this marks a return of the Geordie Rose and Vern Brownell-led company to the driving side of it.
We look forward to 2016, that is, next year. Perhaps it will be the year that leads to some real progress on lower bounds. Who knows?
We wish everyone a safe New Year’s Eve and all the best for 2016—may the Forsch be with you.
[Update on doubt about QMA(k) result, Robin->Richard Wilson, some word and format changes]
René Descartes and François Viète are mathematicians who have something in common, besides being French. Let’s get back to that shortly.
I am about to teach our basic discrete math class to computer science majors, and need some advice.
At Tech this class has been taught for years by others such as Dana Randall. She is a master teacher, and her rankings from the students are always nearly perfect. Alas, she is on leave this spring term, and I have been selected to replace her.
I know the material—it’s a basic course that covers the usual topics:
- General: Fundamentals of numbers, Sets, Representation, Arithmetic operations, Sums and Products, Number Theory;
- Proof Techniques: Direct Proofs, Contradiction, Reduction, Generalization, Invariances, Induction;
- Algorithmic Basics: Order of Growth, Induction and Recursion;
- Discrete Mathematics: Graph Theory, Counting, Probability (in relation to computability).
My dilemma is how to make teaching fun for me, make it fun for the students trying to learn this material, and do half as well as Dana. Well maybe as well.
This dilemma made me think about a new approach to teaching discrete math. I would like to try this approach on you to see if you like it. Any feedback would be most useful—especially before I launch the class this January.
First an answer to what Descartes and Viète have in common. Viète introduced at the end of the 16th century the idea of representing known and unknown numbers by letters, although he was probably not the first. That is possibly Jordanus de Nemore. Descartes decades later created the convention of representing unknowns in equations by x, y, and z, and knowns by a, b, and c. This was much better than Viète’s idea of using consonants for known values and vowels for unknowns.
What does this have to do with teaching discrete math? Everything. I believe that we may confuse students by mixing two notions together. The first is that math in general, and discrete math in particular, requires students to learn a new language. The above rules for what are variables and what are constants are an example of a language rule that one must learn.
Let me phrase this again:
View learning discrete math as learning a new language.
In order to be successful students must learn the basics of a language that has many symbols they need to know—for example “”—and also many words and terms that have special meanings. Think about the word “odd.” In usual discourse this means:
different from what is usual or expected; strange.
But in a discrete math course, an odd natural number is one that is not divisible by $2$. I once told a story here about what an “odd prime” is: of course, it’s just a prime that is not $2$.
If you accept that learning the language of math is fundamental, then you should accept that we may be able to use methods from teaching real languages. So I looked into how people learn new languages—how, for example, an English speaker might learn German. One of the main principles is that there are four parts to learning any new language. We must be able to:
- Read the language;
- Write the language;
- Speak the language;
- Listen to the language.
That these all are important seems obvious, but in years of teaching discrete math I never spent any time explicitly on these skills. I never worked on speaking, for example, nor on the other skills. Yet to learn a new language one must be facile in all of the above. So this January perhaps I will do some of the following:
Another exercise might be to have students select from math statements and identify which say the same thing. Or which are ill-formed statements; or rewrite a statement without using a certain symbol or word.
For example, which of the following statements does not say the same thing as the rest:
Some comments about the math language itself. It is filled with symbols that act as shorthand for terms or words, and students must learn these. The concepts are difficult for some, but this is compounded by the use of so many special symbols. The math language also uses “overloading” quite often: the same exact symbol may mean different things, so students must use the global context to figure out what the word or symbol means. This is nothing special, since many languages do the same thing. But it does add to the difficulty in understanding the language. A simple example is “i”: is this a variable, is it the square root of $-1$, or is it something else?
Insight II
A student who knows her math language is in a good position to make progress. But discrete math is more than just a new language. It includes the notion of “proof.” Notice that it seems that one can become facile in math as a language without understanding proofs. This is perhaps the most radical part of this approach to teaching discrete math. I wonder, and ask you, whether decoupling the language from the ability to understand and create proofs is a good idea. What do you think?
The point here is that we view proofs as reasoning about statements in the math language. It is a type of rhetoric. Recall that according to Aristotle, rhetoric is:
The faculty of observing in any given case the available means of persuasion.
Rhetoric was part of the medieval Threefold Path whose meeting point is Truth, and which prepared students for the Fourfold Path of arithmetic, geometry, music, and astronomy. Thus rhetoric came before mathematics in the classic curriculum—recall “mathematics” meant astronomy back then—and before proofs as taught in geometry. I feel it should be so now.
Of course one of the central dogmas of mathematics is that our reasoning about statements is precise. If one proves that statement $A$ implies $B$, then one can be sure that if $A$ is true, then so is $B$. We expect our students to learn many types of reasoning, that is many types of proof methods. They must be both able to understand a proof we give them and to create new proofs. They must be facile with rhetorical arguments in the new math language.
Proofs will still be a central part of the course. But it will not be the only part. And it will not be entangled with the learning of the math language. I hope that this separation will make learning discrete math easier and more fun.
A colleague once had a student near the middle of the semester ask this question: what is the difference between
and
I must add that this is from a colleague who is a terrific teacher. Perhaps this shows that stressing the math language aspect is important. Ken chimes in to say that he experienced issues at almost this level teaching this fall’s graduate theory of computation course, which he treats as much like a discrete mathematics course as the syllabus allows. He emphasizes not so much “language” as “type system” (per his post last February) but completely agrees with the analogy to rhetoric.
What do you think of this approach to teaching discrete math?
Cropped from Oberwolfach source |
Ronald Jensen is a famous set theorist who was a past president of the Kurt Gödel Society. Dana Scott is the current president of this prestigious society.
Today I thought we would turn from graphs and their isomorphism problem to the study of simple trees. Well I wrote that last week, but Ken has filled out details. In the meantime we are happy to note that László Babai has released his 84-page graph-isomorphism paper, and we anticipate saying more about it in coming weeks.
OK, our trees are not so simple, since they are infinite. The isomorphism problem for finite trees has long been known to be easy—actually computable in linear time. But the infinite trees that we will look at are “complete” trees in a natural sense, and so isomorphism is not an issue. But they still have interesting properties.
A key property is one first studied by Jensen in 1972, which he called the diamond property and denoted by $\diamondsuit$. I like this property because it is based on a natural game on infinite trees, which I could imagine playing on finite graphs. But Ken doubts it. We both like that Jensen was once the president of the Society named after Gödel, and so we at GLL should be interested in what he did.
In computing we study various types of finite trees. A binary tree, for example, starts at a root and each node has at most two children. The longest path from the root is the depth of the tree. The tree is finite if and only if the depth is finite.
We can extend this notion by allowing the number of children and the depth to be larger. An important type of infinite tree is when we insist that all nodes have a finite number of children but allow the depth to be countable. Many important computational processes can be modeled by such trees. The complete binary tree of depth $\omega$ has countably many nodes and levels but uncountably many branches, indeed $2^{\aleph_0}$ of them. As usual $\aleph_0$ denotes the cardinality of the natural numbers but $\omega$ denotes their order type.
Another more powerful notion is to insist the number of children of all nodes is countable and the depth is also countable. This tree, which we call the complete tree $T_\omega$, is not too hard to imagine. Say a node is at level $n$ if it is $n$ steps from the root. Thus this tree corresponds to a process where at each stage one can make a countable number of choices. Think of a game played on an infinite chessboard, a game that never ends. Such a game would be a complete tree.
Now the next larger cardinal after $\aleph_0$ is denoted by $\aleph_1$. The famous continuum hypothesis (CH), which was shown by Kurt Gödel and Paul Cohen to be independent of the standard axioms of set theory, says that
$$2^{\aleph_0} = \aleph_1.$$
Putting that aside for the moment we can imagine that there is a complete tree $T_{\omega_1}$. Every node has $\aleph_1$ children, and the tree has levels of order type $\omega_1$, which stands for the first uncountable ordinal.
This tree is frankly more difficult to imagine—at least for me—but let’s try and extend our minds. This tree has a very wide branching and is very deep. This combination makes its behavior quite different from the “smaller” complete tree $T_\omega$.
We can come halfway by imagining trees whose nodes are bounded sets of nonnegative rational numbers. Here $t$ is a descendant of $s$ if $s \subset t$ and every member of $t \setminus s$ is greater than the supremum of $s$. We can control which sets are added to the tree by transfinite recursion and create $\omega_1$ levels—one for every countable ordinal—but with only $\aleph_0$-branching since the children of $s$ append just one rational $q$. Stranger still, although the tree has $\omega_1$ levels, each individual branch can only be countable since $\mathbb{Q}$ is countable. For contrast, our complete $\omega_1$-tree $T_{\omega_1}$ does have $\omega_1$-sized branches, but this hints at the kind of behavior we must think about.
We will now consider a simple game that plays differently on these two trees, $T_\omega$ and $T_{\omega_1}$. Drawing on the neat book Forcing for Mathematicians by Nik Weaver, we call it the blocking game:
Given a tree, can we place one mark on some node at each level of the tree so that there is no branch that starts at the root and continues forever and avoids any marks?
The obvious answer for the complete $\omega$-tree $T_\omega$ is that this should be impossible—there are just too many choices at each level, so how could one mark per level destroy all paths? We can prove this in many ways. Place the markers. Then pick a random walk down the tree. What is the probability that the walk never hits any marker? My intuition says that the probability of being blocked should be zero.
Start at the root. The chance your path hits a marker is $\frac{1}{\aleph_0}$, i.e., zero. Now repeat this $\omega$ times and we should get that the chance of being blocked is
$$1 - \left(1 - \frac{1}{\aleph_0}\right)\left(1 - \frac{1}{\aleph_0}\right)\left(1 - \frac{1}{\aleph_0}\right)\cdots$$
with multiplication a countable number of times. This is zero. But wait. We cannot just treat $\frac{1}{\aleph_0}$ like a number. Or can we?
The trouble is the use of $\frac{1}{\aleph_0}$ as if it is a number. Let’s avoid this by making the blocker’s strategy easier. If we can still show that the probability of being blocked is small, then there will be a path that avoids all the markers. At level $n$ let’s restrict the blocker to use only a fixed set of $c_n$ nodes. Now at level $n$ the chance the random path hits a marker at the next level is $\frac{1}{c_{n+1}}$. So the probability that the path avoids all markers is
$$\left(1 - \frac{1}{c_1}\right)\left(1 - \frac{1}{c_2}\right)\left(1 - \frac{1}{c_3}\right)\cdots$$
It is easy to see that if we make $c_n$ go to infinity fast enough, then this is a positive probability. If you need some more convincing take the logarithm of the above and note that it is approximately
$$-\left(\frac{1}{c_1} + \frac{1}{c_2} + \frac{1}{c_3} + \cdots\right).$$
And we can certainly make this sum as small as we wish.
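To see this numerically, here is a quick sketch of the surviving probability for one fast-growing choice of blocker sizes—the choice $c_n = (n+1)^2$ is ours, purely for illustration:

```python
def survival_probability(c, levels=100000):
    # product of (1 - 1/c_n) over the first `levels` levels
    p = 1.0
    for n in range(1, levels + 1):
        p *= 1.0 - 1.0 / c(n)
    return p

# c_n = (n+1)^2 grows fast enough that the sum of 1/c_n converges
p = survival_probability(lambda n: (n + 1) ** 2)
print(p)  # ≈ 0.5: the partial products are (N+2)/(2N+2), tending to 1/2
```

For this particular choice the product telescopes to exactly $\frac{1}{2}$ in the limit, comfortably positive, so the random walker escapes the blocker half the time.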
Before we look at the complete tree $T_{\omega_1}$, let’s consider the complete binary tree $B$ with levels having order type $\omega_1$. Then every branch of $B$ is a subset of $\omega_1$. Every level corresponds to some ordinal $\alpha$ and every node at that level is a subset of $\alpha$—recall that $\alpha$ can be identified with the set of ordinals below $\alpha$. A set that consists of one node at each level above the root is therefore the same as a function from ordinals $\alpha$ to sets $A_\alpha \subseteq \alpha$.
Now with a binary tree we can readily block all paths on the first $\omega + 1$ levels. For $n$ with $0 \leq n < \omega$ define $A_{n+1} = \{n\}$, and define $A_\omega = \emptyset$. Branches that include $0$ are blocked in the first step from the root, then branches that include $1$ but not $0$ are blocked at the next step, and so on. Finally, branches that exclude all finite ordinals are blocked at level $\omega$. This gives the first hint that our intuition above about always having the freedom to make the branch avoid blockers fails when it comes to limit ordinals like $\omega$.
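The finite part of this scheme can be sanity-checked by brute force on a toy truncation (our own encoding: the marker at level $n+1$ is the singleton $\{n\}$, and a branch is identified with a subset of $\{0, \dots, 9\}$):

```python
from itertools import chain, combinations

N = 10

def blocked(branch):
    # the level-(n+1) node of a branch B is B ∩ {0,...,n}; the marker there is {n}
    return any(branch & set(range(n + 1)) == {n} for n in range(N))

# every nonempty branch is blocked at level min(B)+1; the empty branch
# corresponds to excluding all finite ordinals and is caught at level ω
subsets = chain.from_iterable(combinations(range(N), k) for k in range(1, N + 1))
print(all(blocked(set(B)) for B in subsets))  # True
```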
Modified from MathOverflow item—note interesting long answers from Joel Hamkins and Emil Jeřábek. |
Now suppose we don’t allow any blockers at finite levels of the tree. The assertion that $A$ blocks all branches past that point is:
$$(\forall \text{ branch } B)\,(\exists \alpha \geq \omega):\;\; B \cap \alpha = A_\alpha. \qquad (2)$$
Can we disprove this? Amazingly, we cannot—not within set theory even given the axiom of choice. Moreover, this statement entails CH, since we have $2^{\aleph_0}$ independent paths from the get-go and are purporting to cover them with only $\aleph_1$ blockers. Jensen’s original statement of $\diamond$ is equivalent to the existence of such an $A$ but tells us even more: we can block all branches uncountably many times in a particularly “thick” manner.
A set $S$ of countable ordinals is unbounded if for all $\alpha < \omega_1$ there exists $\beta \in S$ such that $\beta > \alpha$. It is closed if whenever $\lambda < \omega_1$ is the limit of a subsequence of $S$, we have $\lambda \in S$. A set that is both closed and unbounded is called a club.
A club set is loosely analogous to a set of measure one in $[0,1]$ or $\mathbb{R}$. The intersection of two or even countably many clubs is again a club. A set that has nonempty intersection with every club is called stationary. A stationary set is analogous to a set that has nonzero measure or is non-measurable; it has some thickness whereas a non-stationary set is called thin and is analogous to a nullset. The analogy is furthered by the fact that the intersection of a stationary set with any club remains stationary.
The original version of $\diamond$ from section 6 of Jensen’s paper does not need to constrain $A_\alpha$ for finite $\alpha$. His sequence notation $(A_\alpha)_{\alpha < \omega_1}$ in place of a single function $A$ makes $\diamond$ easier to stipulate. It is:
There is a sequence $(A_\alpha)_{\alpha < \omega_1}$ with each $A_\alpha \subseteq \alpha$ such that for every $A \subseteq \omega_1$, the set $\{\alpha : A \cap \alpha = A_\alpha\}$ is stationary.
Another interpretation of $\diamond$ is that as we are traveling up the tree through each level $\alpha$ the sequence represents a “prediction” of exactly what and where some branch will be at time $\alpha$. Then the principle says that a single sequence $(A_\alpha)$ can predict every branch exactly correctly infinitely often—indeed uncountably often.
Now let’s turn to the complete tree $T_{\omega_1}$. The blocking strategy at finite levels of $B$ is blown away by the immediate $\aleph_1$-branching, though at level $\omega$ we still have $2^{\aleph_0}$ branches. Can $T_{\omega_1}$ be blocked by one node at each level? Each node at level $\alpha$ is now a function $f: \alpha \to \omega_1$, not a subset of $\alpha$. A branch is a function $g: \omega_1 \to \omega_1$.
The potential winning strategies of our blocking game on $T_{\omega_1}$ are functions $F$ such that $F(\alpha)$ is a node on level $\alpha$. This means a binary function $F(\alpha, \beta)$ but restricting it to be defined only when $\beta < \alpha$. Again the sequence notation is traditional and clearer, so we use it to state the form of $\diamond$ given in Weaver’s book:
There is a sequence of functions $(f_\alpha)_{\alpha < \omega_1}$ with $f_\alpha: \alpha \to \alpha$ such that for every function $g: \omega_1 \to \omega_1$, the set $\{\alpha : g|_\alpha = f_\alpha\}$ is stationary.
Is $T_{\omega_1}$ harder to block than the $\omega_1$-binary tree $B$? It seems yes but the answer is no: this form of $\diamond$ is equivalent to the original. Roughly the reason is that $\aleph_1^{\aleph_1} = 2^{\aleph_1}$, so $B$ and $T_{\omega_1}$ have the same number of branches, and so they have the same number of nodes at each infinite level.
The diamond principle implies CH but is not equivalent to it. It holds in Gödel’s constructible universe $L$. Weaver’s book closes with an interesting discussion of whether any of these independent principles should be accepted as “really true” the way we believe the consistency of various formal theories that we implicitly use. Along those lines, if you find these tree-blocking assertions hard to believe, it might be a reason not to regard $L$ as a ‘true’ model of set theory. We examine a couple more consequences and principles.
Suppose we have $(A_\alpha)$ such that (2) holds. Let $T$ be the subtree of $B$ or $T_{\omega_1}$ that includes each branch up to the first point at which it is blocked. Then $T$ still has $\omega_1$ levels, but every branch of $T$ stops at a countable ordinal and so is countable. An uncountable tree in which every branch and level is countable is called an Aronszajn tree after Nachman Aronszajn who constructed one in 1934. We don’t need $\diamond$ for this kind of tree—ZFC suffices to build our tree of bounded rational sets defined above: At a limit ordinal $\lambda$, for each existing node $s$ and rational $q > \sup(s)$, choose one branch of the previously existing tree that contains $s$ and has supremum $q$, and let level $\lambda$ consist of all the sets obtained by unioning branches thus chosen. This use of uncountable choice enables transfinite recursion to extend the tree through all $\omega_1$ levels while keeping each level countable.
However, $\diamond$ also implies the existence of a tree having $\omega_1$ levels in which not only is every branch countable but also every set containing no two elements on the same branch is countable. Such trees were first defined by Mikhail Yakovlevich Suslin. In a Suslin tree, so long as you don’t block any branch twice, you can throw in blockers plugging any open branches anywhere you like and never exceed a countable number. Unlike Aronszajn trees, their existence is independent of ZFC.
A stronger form called the diamond-plus principle ($\diamond^+$) implies the existence of height-$\omega_1$ trees in which each level is at most countable, but that unlike Aronszajn and Suslin trees still manage to have $\aleph_2$-many branches. These trees are named for Đuro Kurepa. The principle, which also holds in $L$, says that allowing countably many blockers at each level of $T_{\omega_1}$ suffices to “club” every branch into submission. That is, there exists $(\mathcal{A}_\alpha)_{\alpha < \omega_1}$ mapping each $\alpha$ to a countable collection $\mathcal{A}_\alpha$ of subsets of $\alpha$ such that for all branches $A$ there is a club $C$ giving:
$$(\forall \alpha \in C):\;\; A \cap \alpha \in \mathcal{A}_\alpha.$$
Recall that $T_{\omega_1}$ already has $2^{\aleph_0}$ branches by level $\omega$. By CH this equals “only” $\aleph_1$, but $\diamond^+$ is still saying that by picking off just countably many at each level, we can block every one by our analogue of a full-measure subset of $\omega_1$. This is consistent, but do we wish to adopt it? More on this can be found in Weaver’s book and various sources including this paper and blog post by Assaf Rinot.
What intuition can we gain from blocking games on infinite trees? Is there any point in favoring set theories that stay closer to our intuitions about combinatorial structures “closer to home”?
NZ silver fern source: Robin Ducker (CC) |
Dale Parsons and Patricia Haden, of Otago Polytechnic in New Zealand, wrote a seminal paper on a genre of programming puzzles that Barbara Ericson at Georgia Tech is furthering in her doctoral work. The puzzles give lines of code in a scrambled order, sometimes with incorrect lines thrown in. The goal is to find the correct order of the correct lines.
Today Ken and I wish to discuss various string rearrangement problems in relation to the Graph Isomorphism problem.
The Parsons method was pointed out to me by Ericson, who also serves as our Director of Computing Outreach. Her thesis abstract says in part:
Parson’s programming puzzles are a family of code construction assignments where lines of code are given, and the task is to form the solution by sorting and possibly selecting the correct code lines. We introduce a novel family of Parson’s puzzles where the lines of code need to be sorted in two dimensions. The vertical dimension is used to order the lines, whereas the horizontal dimension is used to change control flow and code blocks based on indentation as in Python.
This really intrigued me and I looked into what exactly the Parsons method is. One of her major references is a paper by Petri Ihantola and Ville Karavirta of Aalto University in Finland, which gives several examples of such puzzles.
One can make a similar but harder puzzle out of Ken’s example a year ago in our memorial post for Susan Horwitz, by scrambling his six lines of code that swap two nodes in a circularly linked list. In a class teaching proofs, I wonder how good an exercise it would be to scramble lines of a formal derivation.
All this has us thinking further about problems involving rearranging strings. We can think of each given line as a character. Some Parsons puzzles have multiple occurrences of the same line which is like having the same character multiple times in a string.
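As a toy illustration of this view—the program and the distractor line below are invented here, not taken from the Ihantola–Karavirta paper—a Parsons puzzle can be modeled as choosing and ordering indices into the scrambled lines:

```python
solution = [
    "total = 0",
    "for x in data:",
    "    total += x",
    "print(total)",
]
distractor = "total = 1"                     # an incorrect line thrown in
scrambled = ["    total += x", distractor,
             "print(total)", "total = 0", "for x in data:"]

def check(order, scrambled, solution):
    # order: indices into scrambled; distractor lines are simply never selected
    return [scrambled[i] for i in order] == solution

print(check([3, 4, 0, 2], scrambled, solution))  # True: correct order, distractor omitted
```

Duplicate lines make several index sequences equally correct, which is exactly the “same character multiple times” phenomenon for strings.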
Viewed this way, we can even call “Parson’s puzzle” a Parsons puzzle: the apostrophe should if anywhere be after the final ‘s’. There’s nothing wrong with “Parsons puzzle” (reading the name as an adjective not possessive), which is also a legal Parsons solution since the apostrophe “line of code” is skipped. There are isomorphic answers since we could swap either the two ‘s’s or the two ‘z’s, and these swaps remain automorphisms of the final string. Now we are getting into the domain of László Babai’s new results about strings, to which we turn.
To understand Babai’s usage of “string isomorphism,” a concept introduced by Eugene Luks in 1980 under the name “color isomorphism,” let’s first consider a natural, known add-on to the graph isomorphism (GI) problem:
Given two labeled graphs $G_1, G_2$ on nodes $\{1, \dots, n\}$ plus a group $\Gamma$ presented by generating permutations of $\{1, \dots, n\}$, is there an isomorphism from $G_1$ to $G_2$ that belongs to $\Gamma$?
For example, consider $n = 6$ and the subgroup $\Gamma$ generated by $(1\,2\,3\,6)(4\,5)$ and $(2\,4)(5\,6)$. These are both even permutations since each is a product of an even number of transpositions, so we immediately know $\Gamma$ is not all of the symmetric group $S_6$. Now consider two labeled graphs $G_1$ and $G_2$ on these six nodes.
Then $G_1$ and $G_2$ are isomorphic by the interchange $3 \leftrightarrow 5$, but $(3\,5)$ being odd, this does not belong to $\Gamma$. Does that mean the answer to the problem for $\Gamma$ is “no”? Well, the graph $G_1$ has odd automorphisms such as its mirror flip—perhaps one of them composed with $(3\,5)$ belongs to $\Gamma$? It’s a tricky question. We could force a “no” answer by adding a “dongle” to nodes 3 and 5 in both graphs so they can only be swapped with each other and different “dongles” to other nodes that prevent any other possible isomorphism. The dongles could be a single node connected to 3 and 5, plus $i$ new nodes connected only to node $i$ for each $i \neq 3, 5$, in each graph.
GI is then the special case where $\Gamma$ is all of $S_n$—that is, we are given generators of $S_n$. Whether the general case reduces back to GI is a major open problem.
The simplest form of string isomorphism asks: given strings $x, y$ of length $n$, is there a permutation of $\{1, \dots, n\}$ that carries $x$ to $y$? This has a trivial answer: yes if and only if for every character $c$ in the alphabet, the numbers of occurrences of $c$ in $x$ and in $y$ are the same. This is easily in polynomial time, as are some other problems by the name “string isomorphism” on the Net. But here is the analogous subgroup problem:
Given strings $x, y$ of length $n$ and generators of a subgroup $\Gamma$ of $S_n$, does permuting the entries of $x$ by some member of $\Gamma$ give $y$?
This is not trivial. It is exactly as hard as the graph subgroup version, hence hard for GI. Let us first reduce GI to it. The point is that $\Gamma$ can enforce that only permutations derived from rearrangements of the node indices are allowed on the string indices. Specifying some way to “unroll” a graph’s adjacency matrix into a string yields generators for the right $\Gamma$.
Since our graphs are undirected we need only unroll the upper triangle. Let’s do it by first going down the largest diagonal of entries $(i, i+1)$, then the next-largest, and so on in “bands” ending at the upper-right corner $(1, n)$. For $n = 4$ we get entries in the following order:
$$(1,2),\;(2,3),\;(3,4),\;(1,3),\;(2,4),\;(1,4).$$
Here the six “edges” are really the slots in the string for possible edges. A pair of generators for $S_4$ is $(1\,2\,3\,4)$ and $(1\,2)$. The corresponding permutations of the edge slots are $(1\,2\,3\,6)(4\,5)$ and $(2\,4)(5\,6)$. By “design of accident” these are the same as in the example above though there is no relation—this just expedites our seeing that the subgroup they generate is not all of $S_6$. The idea extends for any $n$ in canonical fashion: from the cycle and transposition generators of $S_n$ we get two generators for $\Gamma$. Babai calls the group $S_n^{(2)}$ in such cases to emphasize the action being on pairs.
Now given any labeled graphs $G_1$ and $G_2$ on $n$ nodes, we can read off the corresponding strings $x_1$ and $x_2$, and then $G_1$ is isomorphic to $G_2$ if and only if $(x_1, x_2, \Gamma)$ is a yes-case of the string problem. This completes the reduction from GI.
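A short sketch of the unrolling and the induced action on edge slots (helper names are ours): for $n = 4$ the images of the cycle $(1\,2\,3\,4)$ and the transposition $(1\,2)$ come out as $(1\,2\,3\,6)(4\,5)$ and $(2\,4)(5\,6)$.

```python
def slot_order(n):
    # upper-triangle slots in "band" order: diagonal distance d = j - i, then i
    return [(i, i + d) for d in range(1, n) for i in range(1, n - d + 1)]

def induced_pair_action(n, perm):
    # perm: dict node -> node; result: dict slot -> slot (1-indexed)
    slots = slot_order(n)
    index = {frozenset(s): k + 1 for k, s in enumerate(slots)}
    return {k + 1: index[frozenset((perm[i], perm[j]))]
            for k, (i, j) in enumerate(slots)}

cycle = {1: 2, 2: 3, 3: 4, 4: 1}   # the cycle (1 2 3 4)
swap  = {1: 2, 2: 1, 3: 3, 4: 4}   # the transposition (1 2)
print(induced_pair_action(4, cycle))  # {1: 2, 2: 3, 3: 6, 4: 5, 5: 4, 6: 1} = (1 2 3 6)(4 5)
print(induced_pair_action(4, swap))   # {1: 1, 2: 4, 3: 3, 4: 2, 5: 6, 6: 5} = (2 4)(5 6)
```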
To reduce from the subgroup version of GI we map the given generators of $\Gamma$ to their images in $S_{\binom{n}{2}}$ under the above embedding. This reduction can be inverted: Given the string $x$, define the graph $G_x$ to be a disjoint union of isolated pairs of nodes connected by an edge for each $i$ such that $x_i = 1$, and isolated pairs not connected by an edge for each $i$ such that $x_i = 0$. Likewise map $y$ to a graph $G_y$. Finally map each generator by replacing each move $i \to j$ in it by $2i-1 \to 2j-1$ and $2i \to 2j$.
An automorphism is a rearrangement that leaves a structure the same. Such permutations form a group called the automorphism group $\mathrm{Aut}(G)$. GI has long been known to be polynomial-time equivalent to the problem of computing a complete set of generators for $\mathrm{Aut}(G)$ when $G$ is a labeled graph. Assuming $G_1$ and $G_2$ are connected graphs (else use their complements), the forward reduction takes $G = G_1 \sqcup G_2$ and notes that every generating set of $\mathrm{Aut}(G)$ contains a member that maps $G_1$ onto $G_2$ if and only if $G_1$ is isomorphic to $G_2$.
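The disjoint-union trick can be seen by brute force on two 3-node paths—tiny enough to enumerate all of $S_6$, which real algorithms of course never do:

```python
from itertools import permutations

def autos(n, edges):
    # all automorphisms of the graph, found by brute force over S_n
    E = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if {frozenset((p[u], p[v])) for u, v in E} == E]

# path G1 on {0,1,2} disjoint-union an identical path G2 on {3,4,5}
union = [(0, 1), (1, 2), (3, 4), (4, 5)]
crossing = [p for p in autos(6, union) if set(p[:3]) == {3, 4, 5}]
print(len(crossing) > 0)  # True: some automorphism carries G1 onto G2, so G1 ≅ G2
```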
The reverse direction is trickier and requires both decomposing the group and progressively adding “dongles” to pairs of graphs and asking whether they are isomorphic. The decision version, “does $G$ have a nontrivial automorphism?” (GA), reduces to GI but is not known to be equivalent.
This extends to an equivalence of the subgroup-$\Gamma$ version of GI and finding generators for all automorphisms in a subgroup. We almost get a reduction to the subgroup version of GA, which is a decision problem: Make two copies of the node set and convert each given generator $\pi$ of $\Gamma$ into a $\hat\pi$ that maps each node $i$ in the first copy to $\pi(i)$ in the second and each $i$ in the second copy to $\pi(i)$ in the first. Also convert the identity, which yields the swap $s$ of the two copies, and let these generate $\hat\Gamma$. Then $G_1$ and $G_2$ are isomorphic by a member of $\Gamma$ if and only if $G_1 \sqcup G_2$ has an automorphism that is an odd-length word over these generators. We still get a reduction to finding all generators of $\mathrm{Aut}_{\hat\Gamma}(G_1 \sqcup G_2)$. The reverse reduction is still tricky, but no more so than the original since it was decomposing groups anyway.
For strings $x$, we again have a situation where the basic problem is trivial: the automorphisms permute the indices of each individual character among themselves, so we need only write the usual generators for a product of symmetric groups. However, the subgroup version is again nontrivial. We state it for a binary string, but like the above it extends to any set of characters or “colors.”
Given a string $x$ and generators for a subgroup $\Gamma$ of $S_n$, find generators for $\Gamma \cap (S_A \times S_B)$, where $S_A$ permutes the indices $i$ having $x_i = 0$ and $S_B$ permutes those with $x_i = 1$.
Note how this involves finding generators for the intersection of two subgroups. By ideas like those just above this is equivalent to the subgroup version of string isomorphism, hence in turn to the subgroup version of GI. The method of the last section can also reduce the functional graph-automorphism problem directly to this string version, and conversely.
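A minimal brute-force sketch of this stabilizer computation (function names are ours, and the closure step enumerates all of $\Gamma$, which real algorithms avoid): for the string 0101 and $\Gamma$ the cyclic group of rotations, the stabilizer consists of the identity and the half-rotation.

```python
def closure(gens):
    # BFS closure of the generators into the full (small!) subgroup
    n = len(gens[0])
    e = tuple(range(n))
    group, frontier = {e}, [e]
    while frontier:
        s = frontier.pop()
        for g in gens:
            t = tuple(s[g[i]] for i in range(n))   # compose s with g
            if t not in group:
                group.add(t)
                frontier.append(t)
    return group

def aut(x, gens):
    # members of the generated subgroup that fix the string x
    return {s for s in closure(gens)
            if tuple(x[s[i]] for i in range(len(x))) == tuple(x)}

r = (1, 2, 3, 0)                   # rotation of the four positions
print(sorted(aut("0101", [r])))    # [(0, 1, 2, 3), (2, 3, 0, 1)]
```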
The extra generality of the string problems is that the subgroup $\Gamma$ involved need not be $S_n^{(2)}$ as in the reductions from GI, indeed need not arise from subgroups acting on graphs at all. This feature is exploited in the recursions that Babai’s proof sets up. His algorithm classifies all of these problems into quasipolynomial time. Thus, defining the problems with subgroups makes them nicer.
Although automorphisms seem simpler than isomorphisms, if our objective is to reduce the groups involved, the last reduction above involving $\hat\Gamma$ is going in the wrong direction since $\hat\Gamma$ is larger than $\Gamma$. Moreover, the odd-length words that interested us formed not a subgroup but a coset of the even-length words under the swap $s$. The following key idea of Eugene Luks’s seminal 1980-82 paper putting the bounded-degree case of GI into $\mathsf{P}$ shows how cosets can be handled directly and without resort to the reduction to automorphisms.
Suppose in general that one has a subgroup $H$ of $\Gamma$ and coset representatives $\tau_1, \dots, \tau_m$, where $\tau_1$ is the identity and $m = |\Gamma|/|H|$. For right-cosets this means
$$\Gamma = H\tau_1 \,\cup\, H\tau_2 \,\cup\, \cdots \,\cup\, H\tau_m.$$
For any subset $\Delta$ of $S_n$ write $\mathrm{Iso}_\Delta(x, y)$ to mean the set of isomorphisms from $x$ to $y$ that belong to $\Delta$. That is, writing actions on the right, $\mathrm{Iso}_\Delta(x, y) = \{\sigma \in \Delta : x^\sigma = y\}$. Then just by simple set theory,
$$\mathrm{Iso}_\Gamma(x, y) \;=\; \bigcup_{i=1}^{m} \mathrm{Iso}_{H\tau_i}(x, y).$$
The terms for $i \geq 2$ aren’t recursing on subgroups, but the trick is that we can make it so by permuting $y$ by $\tau_i^{-1}$ to get $y^{\tau_i^{-1}}$. Then:
$$\mathrm{Iso}_{H\tau_i}(x, y) \;=\; \mathrm{Iso}_{H}(x, y^{\tau_i^{-1}})\,\tau_i.$$
This allows making progress by $m$-fold recursion on the isomorphism problem for the subgroup $H$. The idea applies equally well for graphs or strings or other structures.
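The coset identity can be checked by brute force on a toy instance (all names here are ours): $\Gamma = S_3$ acting on strings of length 3, $H$ its even permutations, and $\tau$ the swap of positions 0 and 1.

```python
from itertools import permutations

def apply(x, s):              # x^s: position i receives x[s[i]]
    return tuple(x[s[i]] for i in range(len(s)))

def compose(s, t):            # (s*t)[i] = s[t[i]], so x^(s*t) == (x^s)^t
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    inv = [0] * len(s)
    for i, j in enumerate(s):
        inv[j] = i
    return tuple(inv)

def iso(x, y, delta):         # Iso_Delta(x, y) for an explicit set Delta
    return {s for s in delta if apply(x, s) == y}

x, y = ('a', 'b', 'b'), ('b', 'a', 'b')
H = {s for s in permutations(range(3))        # even permutations: even inversion count
     if sum(s[i] > s[j] for i in range(3) for j in range(i + 1, 3)) % 2 == 0}
tau = (1, 0, 2)               # coset representative; H ∪ H·tau = S_3

lhs = iso(x, y, {compose(h, tau) for h in H})                    # Iso_{H tau}(x, y)
rhs = {compose(h, tau) for h in iso(x, apply(y, inverse(tau)), H)}  # Iso_H(x, y^{tau^-1}) tau
print(lhs == rhs, lhs)        # True {(1, 0, 2)}
```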
Babai’s algorithm builds on this by creating new ways to achieve either this or some other progress, meanwhile metering progress via certain homomorphisms from $\Gamma$ into symmetric groups on smaller structures rather than by trying to divide up the domain of $x$ or $y$. Automorphisms still frame essential concepts in the algorithm, in part because whenever $\mathrm{Iso}_\Gamma(x, y)$ is nonempty, any one member $\sigma$ makes $\mathrm{Iso}_\Gamma(x, y) = \mathrm{Aut}_\Gamma(x)\,\sigma$. We hope to survey more of this in a future post.
We hasten to add that these computational problems are not new. Their contexts in Luks’s paper are brought out by further references including a 1983 paper by Gary Miller and Babai’s 1994 Handbook of Combinatorics survey, and they are foreshadowed in the treatment of coloring-preserving auto/iso-morphisms in Babai’s own 1979 paper which introduced the term “Las Vegas algorithm.”
In highlighting them all together we’d like to find a common and consistent naming scheme. Adapting names from these papers and echoing Luks’s 1993 survey suggests several candidates, with separate names distinguishing the problem of giving a full set of generators for the automorphism group from the decision problem of whether it is nontrivial. We’ve had a suggestion to put the ‘G’ out front and the class of relational structures in parentheses, with a math-italicized $G$ indicating a particular subgroup $\Gamma$.
For abbreviations we like appending the ‘G’ to turn GI into GIG and similarly SIG for the string version. The rub is that for the automorphism problems this could lead to names like GAG and SAG, even GAGG and SAGG for the generator-set versions, which don’t sound so nice. Writing $\equiv$ for polynomial-time or some tighter equivalence, and using less-gaggy notation, the current state is that GIG, SIG, and the generator versions of their automorphism problems are all equivalent to one another, with GI reducing to each of them.
As a hint that much more can be said, consider this 1998 dissertation by David Christie titled “Genome Rearrangement Problems.” Then consider that the word “genome” stops appearing after page 15 but there are 143 pages of neat material on permutations and sorting and string edit-distance type problems and cases that are $\mathsf{NP}$-hard, or not. Evidently there are “Parsons puzzles” for genetic code. We’re not saying that Babai’s algorithm has any relevance to these—Laci is the first to disclaim any competitiveness with existing heuristic GI solvers—but something in the new structural and algorithmic ideas might click.
The most particular problem is whether GIG—that is, the subgroup version of GI—reduces back to GI. If not, then does it sit properly between GI and the canonization problems? What to call these problems?
We end with a simple Parsons problem, with one line to omit:
and we await further news expectantly.
talk last Tuesday
to the University of Chicago,
We were glad to hear that Laci’s part-III
the proof is by double recursion
was not disturbed by the previous day’s hoax threat