Cricketing source
Andrew Granville is a number theorist who has written—besides his own terrific research—some beautiful expository papers, especially on analytic number theory.
Today Ken and I wish to talk about his survey paper earlier this year on the size of gaps between consecutive primes.
The paper in question is here. It is a discussion of the brilliant breakthrough work of Yitang Zhang, which almost solves the famous twin-prime conjecture. You probably know that Zhang proved that gaps between consecutive primes are infinitely often bounded by an absolute constant. His constant initially was huge (about 70,000,000), but using his and additional insights it is now at most 246, and plans are known that might cut it to 6.
As Andrew says in his paper’s introduction:
To Yitang Zhang, for showing that one can, no matter what
We believe we need the same attitude to make progress on some of our “untouchable” problems. Perhaps there is some budding complexity theorist who is making ends meet at a Subway™ sandwich shop, and who, while solving packing problems between rolls in real time, is finding the insights that can unlock the door to some of our open problems. Could one of these be ready to fall?
Who knows.
Throughout mathematics—especially number theory—computing the number of objects is of great importance. Sometimes we can count the exact number of objects. For example, it is long known that there are exactly $n^{n-2}$ labeled trees on $n$ vertices, thanks to Arthur Cayley.
The number of labeled planar graphs is another story—no exact formula is known. The current best estimate is an asymptotic one for the number of such graphs on $n$ vertices:

$\displaystyle g \cdot n^{-7/2} \gamma^n n!,$

where $\gamma \approx 27.2269$ and $g \approx 4.26 \cdot 10^{-6}$.
Thanks to a beautiful result of the late Larry Stockmeyer we know that estimating the number of objects from a large set may be hard, but is not too hard, in the sense of complexity theory. He showed that
Theorem 1 For any $\epsilon > 0$ one can design a polynomial-time randomized algorithm that takes an NP-oracle $B$, a predicate $R(x,y)$ decidable in polynomial time (where $|y|$ is polynomially bounded in $|x|$), and an input $x$, and outputs a number $a$ such that with probability at least $3/4$, the true number of $y$ such that $R(x,y)$ holds is between $a/(1+\epsilon)$ and $a(1+\epsilon)$.
The predicate $R$ is treated as a black box, though it needs to be evaluated in polynomial time in order for the algorithm to run in polynomial time. The algorithm has a non-random version that runs within the third level of the polynomial hierarchy.
We won’t state the following formally, but Larry’s method can be extended to compute a sum

$\displaystyle S = \sum_{x \in \{0,1\}^n} f(x)$

to within a $(1+\epsilon)$ multiplicative error without an NP-oracle, provided the map $x \mapsto f(x)$ is computable in polynomial time, each $f(x) \geq 0$, and the summands are not sparse and sufficiently regular. Simply think of each value $f(x)$ as an integer, identify numbers with strings so that $y$ runs over $\{0,1\}^m$, and define $R(x,y)$ to hold if $y < f(x)$.
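To make the reduction concrete, here is a toy Python sketch (with a made-up $f$, not from Larry's paper) verifying that counting the pairs $(x,y)$ with $y < f(x)$ recovers the sum exactly; approximate counting of such pairs then approximates the sum:

```python
from itertools import product

def f(x_bits):
    # toy nonnegative function: interpret the bits as an integer and square it
    x = int("".join(map(str, x_bits)), 2)
    return x * x

n, m = 3, 6  # f maps {0,1}^3 into [0, 2^6)

def R(x_bits, y):
    # R(x, y) holds iff y < f(x); then #{(x, y) : R(x, y)} = sum_x f(x)
    return y < f(x_bits)

total = sum(f(x) for x in product([0, 1], repeat=n))
count = sum(1 for x in product([0, 1], repeat=n)
              for y in range(2 ** m) if R(x, y))
print(total, count)  # the two quantities coincide
```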
Larry’s method fails when a sum has both negative and positive terms; that is, when there is potential cancellation. Consider a sum

$\displaystyle S = \sum_x f(x).$

Even if the terms are restricted to be $+1$ and $-1$, his method does not work. Rewrite the sum as

$\displaystyle S = S^{+} - S^{-},$

where $S^{+}$ counts the $+1$ terms and $S^{-}$ counts the $-1$ terms. Then knowing $S^{+}$ and $S^{-}$ approximately does not yield a good approximation to the whole sum $S$. We could have $S^{+} = T + a$ and $S^{-} = T + b$ where $T$ is a large term that cancels, so that the sum is $a - b$. So the cancellation could make the lower-order sums $a$ and $b$ dominate.
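A tiny numerical illustration (with invented numbers) of why approximating the two halves separately fails:

```python
# a sum of +1/-1 terms with massive cancellation: the true sum is tiny
S_plus = 500000                      # number of +1 terms
S_minus = 499998                     # number of -1 terms
S = S_plus - S_minus                 # the true sum: 2

# suppose an approximate counter gives each half to within 0.1% relative
# error; the error in each half swamps the true answer
eps = 0.001
approx = S_plus * (1 + eps) - S_minus * (1 - eps)
print(S, approx)                     # 2 versus roughly 1002
```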
This happens throughout mathematics, especially in number theory. It also happens in complexity theory, for example in the simulation of quantum computations. For every language $L$ in BQP there are Stockmeyer-style predicates $R_1$ and $R_2$ such that, where $h$ equals the number of Hadamard gates in poly-size quantum circuits recognizing $L$, the acceptance amplitude is given by

$\displaystyle \frac{\#R_1(x) - \#R_2(x)}{2^{h/2}}.$

Although the individual sums can be as high as $2^h$, their difference “miraculously” stays within $2^{h/2}$, and the former is never less than the latter—no absolute value bars are needed on the difference. See this or the last main chapter of our quantum algorithms book for details. Larry’s algorithm can get you a $(1+\epsilon)$ approximation of both terms, but precisely because the difference stays so small, it does not help you approximate the probability. This failing is also why the algorithm doesn’t by itself place BQP within the hierarchy.
What struck Ken and me first is that the basic technique used by Yitang Zhang does not need to estimate a sum, only to prove that it is positive. This in turn is needed only to conclude that some term is positive. Thus the initial task is lazier than doing an estimate, though estimates come in later.
The neat and basic idea—by Daniel Goldston, János Pintz, and Cem Yıldırım in a 2009 paper that was surveyed in 2007—uses indicator terms of the form

$\displaystyle \theta(n+h_1) + \theta(n+h_2) + \cdots + \theta(n+h_k) - \log 3N,$

where $\theta(n)$ is defined to be $\log n$ if $n$ is prime and is $0$ otherwise. Here $n$ runs from $N$ to $2N$ and $h_1 < h_2 < \cdots < h_k$. If the term is positive then at least two of the elements $n+h_1,\dots,n+h_k$ must be prime, which means that the gap between them is at most the fixed value $h_k - h_1$. Doing this for infinitely many $n$ yields Zhang’s conclusion that infinitely many pairs of primes have gap at most $h_k - h_1$. Can it really be this simple?
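Here is a small Python sketch of the pigeonhole step, using an illustrative tuple of our own choosing: since each $\theta$ value alone is below $\log 3N$, a positive term forces at least two of the shifted values to be prime.

```python
import math

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def theta(n):
    # theta(n) = log n if n is prime, and 0 otherwise
    return math.log(n) if is_prime(n) else 0.0

N = 100
hs = (0, 2, 6)   # a small illustrative tuple (our choice, not from the papers)
witnesses = []
for n in range(N, 2 * N):
    if sum(theta(n + h) for h in hs) - math.log(3 * N) > 0:
        primes = [n + h for h in hs if is_prime(n + h)]
        # each theta value is below log(3N), so positivity forces >= 2 primes
        assert len(primes) >= 2
        witnesses.append((n, primes))
print(witnesses[:3])
```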
The strategy is to minimize the width $h_k - h_1$ for which positivity can be shown, but one needs a way to get a handle on the $\theta$ function. This needs forming the sum

$\displaystyle S = \sum_{N \le n < 2N} \Big( \theta(n+h_1) + \cdots + \theta(n+h_k) - \log 3N \Big)\, w_n,$
where the $w_n$ are freely choosable non-negative real weights. The analysis needs the $h_j$ to be chosen so that for every prime $p$ there exists a residue $a$ such that $p$ does not divide any of $a+h_1,\dots,a+h_k$. This defines the ordered tuple $(h_1,\dots,h_k)$ as admissible and enters the territory of the famous conjecture by Godfrey Hardy and John Littlewood that for any admissible tuple there exist infinitely many $n$ such that $n+h_1,\dots,n+h_k$ are all prime. This has not been proven for any admissible tuple—of course the tuple $(0,2)$ gives the Twin Prime conjecture—but thanks to Zhang and successors we know the weaker conclusion, that at least two of the $n+h_j$ are prime for infinitely many $n$, holds for some admissible tuples.
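The admissibility condition is easy to test by machine. Here is a short sketch (our own code): only primes $p \le k$ need checking, since $k$ numbers cannot cover all residue classes modulo a larger prime.

```python
def primes_up_to(n):
    sieve = [True] * (n + 1)
    primes = []
    for p in range(2, n + 1):
        if sieve[p]:
            primes.append(p)
            for q in range(p * p, n + 1, p):
                sieve[q] = False
    return primes

def is_admissible(hs):
    # admissible: for every prime p, some residue class mod p avoids all h_i
    for p in primes_up_to(len(hs)):
        if len({h % p for h in hs}) == p:   # every class mod p is hit
            return False
    return True

print(is_admissible((0, 2)))      # the twin-prime tuple
print(is_admissible((0, 2, 4)))   # covers all classes mod 3, so inadmissible
print(is_admissible((0, 2, 6)))
```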
The analysis also needs to decouple $N$ from the $h_j$—originally there was a dependence which only enabled Goldston, Pintz, and Yıldırım to prove cases where the gaps grow relatively slowly. Andrew’s survey goes far into details of how the sum and its use of the $\theta$ function must be expanded into further sums using arithmetic progressions and the Möbius function in order for techniques of analysis to attack it. The need for estimation heightens the cancellation problem, which he emphasizes throughout his article. The issue is, as he states, of central importance in number theory:
The two sums … are not easy to evaluate: The use of the Möbius function leads to many terms being positive, and many negative, so that there is a lot of cancellation. There are several techniques in analytic number theory that allow one to get accurate estimates for such sums, two more analytic and the other more combinatorial. We will discuss them all.
As usual see his lovely article for further details. Our hope is that they could one day be used to help us with our cancellation problems.
I thought that we might look at two other strategies that can be used in complexity theory, and that might have some more general applications.
Volume trick
Ben Cousins and Santosh Vempala recently wrote an intricate paper involving integrals that compute volumes. At its heart, however, is a simple idea. It is to represent a volume in the form

$\displaystyle V_0 = \frac{V_0}{V_1} \cdot \frac{V_1}{V_2} \cdots \frac{V_{n-1}}{V_n} \cdot V_n,$

where $V_0$ is the hard quantity that we really want to compute. The trick is to select the other $V_i$’s in a clever way so that each ratio $V_{i-1}/V_i$ is not hard to approximate. Yet the cancellation yields that

$\displaystyle \frac{V_0}{V_1} \cdot \frac{V_1}{V_2} \cdots \frac{V_{n-1}}{V_n} = \frac{V_0}{V_n}.$

If $V_n$ is simple, and if we have another way to estimate each ratio, then we get an approximation for the hard-to-compute $V_0$.
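Here is a toy Monte Carlo sketch of the telescoping idea, with nested disks standing in for the convex bodies and each easy ratio estimated by sampling (our own illustrative setup, not the Cousins–Vempala algorithm):

```python
import math, random

random.seed(0)

def sample_disk(r):
    # uniform point in a disk of radius r (polar coordinates, sqrt for uniformity)
    t = 2 * math.pi * random.random()
    s = r * math.sqrt(random.random())
    return s * math.cos(t), s * math.sin(t)

# nested disks K_0 ⊇ K_1 ⊇ ... with radii shrinking by sqrt(1/2): each ratio
# vol(K_i)/vol(K_{i-1}) = 1/2 is easy to estimate, and the product telescopes
radii = [2.0 * (0.5 ** (i / 2)) for i in range(6)]
estimate = 1.0
for r_prev, r_next in zip(radii, radii[1:]):
    pts = [sample_disk(r_prev) for _ in range(20000)]
    inside = sum(1 for (x, y) in pts if x * x + y * y <= r_next * r_next)
    estimate *= inside / len(pts)

true_ratio = (radii[-1] / radii[0]) ** 2     # = (1/2)^5
print(estimate, true_ratio)
```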
Sampling trick
Let $S = \sum_x f(x)$ and $\hat{S} = \sum_x g(x)$, where $g$ is a random function with expectation $\mathbb{E}[g(x)] = f(x)$ for each $x$. Then we know that $\mathbb{E}[\hat{S}]$ is equal to $S$. Note that this does not need the values $g(x)$ to be independent.
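One simple instantiation in Python (a toy example of our own): pick a random index and rescale, which is unbiased for the sum even though successive draws need not be independent.

```python
import random

random.seed(42)
f = [3, 1, 4, 1, 5, 9, 2, 6]     # the values we want to sum
S = sum(f)                        # 31

def g():
    # sample one index uniformly and rescale: E[len(f) * f[i]] = sum(f)
    i = random.randrange(len(f))
    return len(f) * f[i]

trials = 200000
estimate = sum(g() for _ in range(trials)) / trials
print(S, round(estimate, 3))
```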
Can we use this idea to actually compute a tough sum? The sums we have in mind are inner products

$\displaystyle \langle u, v \rangle = \sum_x u_x v_x,$

where the vectors $u$ and $v$ are exponentially long but have useful properties. They might be succinct or might carry a promised condition such as we discussed in connection with an identity proved by Joseph Lagrange.
Both cases arise in problems of simulating quantum computations. Our hope is that the regularities in these cases enable one to restrict the ways in which cancellations can occur. A clever choice of dependence in the values then might interact with these restrictions. Well, at this point it is just an idea. Surveying things that boil down to this kind of idea could be another post, but for now we invite you, the readers, to comment on your favorite examples.
Can we delineate methods for dealing with cancellations, also when we don’t need to estimate a sum closely but only prove it is positive?
[fixed omission of NP-oracle for Stockmeyer randomized counting]
Victor Shoup is one of the top experts in cryptography. He is well known for many things, including a soon-to-be-released book that is joint with Dan Boneh on, what else, cryptography; and the implementation of many of the basic functions of cryptography.
Today I want to talk about my recent visit to the Simons Institute in Berkeley where I heard Victor give a special lecture.
This lecture led to an interesting question:
How do we know if a crypto function is correctly implemented?
Before I discuss this let me report that this summer the institute is running an excellent program on cryptography. It was organized by Tal Rabin, Shafi Goldwasser, and Guy Rothblum. While I just visited for a short time last week, it seems to be an incredibly well run program, with much of the thanks going to Tal. Tal added social interaction as part of the program, about which more below. The institute director, Dick Karp, told me he thought these additional social activities were wonderful and they helped make the program so enjoyable and productive.
Victor gave his special talk on mostly known—to experts—results in basic cryptography. His talk was well attended, well structured, and well delivered. He made three points during the talk that were striking to me:
Kathryn Farley, a researcher in performance arts, was at Victor’s talk, since she was also in town. She could not stay for the entire 90-minute lecture—she left after about one hour for another appointment. Later that day we talked and she said she had thought about asking a question to Victor, but not being an expert in cryptography, was unsure if her question was naive or not. I asked her what it was and she replied with the question atop this post. To say it again:
How do we know if a crypto function is correctly implemented?
I told her that it was, in my opinion, a terrific question, one that we often avoid. Or at best we assume it is handled by others, not by cryptographers. I immediately said that I have discussed this and related questions before here, and I still am unsure what can be done to ensure that crypto functions are correctly implemented.
Suppose that a library function is claimed to return a random number: of course it really returns the output of a pseudo-random number generator (PRNG). Let’s assume that it claims to use a strong PRNG method. How can we know? We can look at the code and check that it really works as claimed, but that is messy and time consuming. Worse, it defeats the whole purpose of having a library of crypto functions.
A nastier example: Suppose that the function uses the following trick. If the date is before January 1, 2016, it will use the correct strong generator. If the date is later, then it will use a simple PRNG that is easy to break. Further, it makes this happen in a subtle manner, which is extremely hard to detect by code inspection. How would we discover this?
Friday at the institute was “The Tea Party.” At the end of the day all were invited to be outside near the institute’s building, and join in eating, drinking, and playing a variety of games.
The food was quite impressive—much better than the standard fare that we see at most university gatherings. It was also well attended, which probably is related to the quality of the food.
Kathryn and I were there. And it was a perfect time and place to ask some of the cryptographers about her question. We got several answers, most of which were not very satisfying. I will list the answers without directly quoting anyone—protecting the innocent, as they say.
Some said that the issue was not about crypto, but about software engineering. So it was someone else’s issue.
Some were more explicit and said that software engineers should use verification methods to check correctness.
Others said that the code should be checked carefully by making it open source: basically relying on the “crowd” for correctness.
None of these methods is foolproof. Which raises the question: how can we actually be sure that all is well?
I was not very happy with any of the answers that we got. I would suggest that this correctness problem is special: because it is about crypto systems, it may admit approaches that do not work for arbitrary code.
Here are some ideas that perhaps can be used: I think they are just examples of what we may be able to do, and welcome better ideas.
Property Testing:
Many crypto functions satisfy nontrivial mathematical properties, which is quite different from arbitrary code. Consider RSA. One could check on random inputs that the encode function is multiplicative: $E(x) \cdot E(y) \equiv E(x \cdot y) \bmod N$. And the same with the decode function. This does not imply correctness, but a failure would mean the code is incorrect.
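For illustration, here is a toy-sized sketch in Python (textbook-small RSA parameters, nothing like real key sizes) testing both that decoding inverts encoding and that the encode function is multiplicative:

```python
import random

random.seed(7)
# toy RSA parameters, far too small for real use; purely illustrative
p, q = 61, 53
N = p * q                 # 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17
d = pow(e, -1, phi)       # modular inverse (Python 3.8+)

def E(m): return pow(m, e, N)
def D(c): return pow(c, d, N)

for _ in range(100):
    x, y = random.randrange(2, N), random.randrange(2, N)
    assert D(E(x)) == x                          # decoding inverts encoding
    assert (E(x) * E(y)) % N == E((x * y) % N)   # RSA is multiplicative
print("all property checks passed")
```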
Version Testing:
If the code is critical one could in principle implement the function twice, or more times. This is best done by different vendors. Then the outputs would be checked to see if they are identical. Note that since crypto functions usually operate over finite fields or rings, there are no rounding errors. Thus, the different implementations should get identical values.
Note this is a standard idea in software engineering—see this. The key assumption is that different versions should have different errors. This appears not to be the case in practice, but the method still has something behind it. Wikipedia says: N-version programming has been applied to software in switching trains, performing flight control computations on modern airliners, electronic voting (the SAVE System), and the detection of zero-day exploits, among other uses.
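A minimal sketch of version testing: two independently written modular-exponentiation routines (example functions of our own) cross-checked against each other and against Python's built-in:

```python
import random

def modexp_square_and_multiply(b, e, m):
    # iterative square-and-multiply
    result = 1
    b %= m
    while e:
        if e & 1:
            result = (result * b) % m
        b = (b * b) % m
        e >>= 1
    return result

def modexp_recursive(b, e, m):
    # an independently written recursive version
    if e == 0:
        return 1 % m
    half = modexp_recursive(b, e // 2, m)
    half = (half * half) % m
    return (half * b) % m if e % 2 else half

random.seed(3)
for _ in range(1000):
    b, e, m = (random.randrange(1, 10**6) for _ in range(3))
    assert modexp_square_and_multiply(b, e, m) == modexp_recursive(b, e, m) == pow(b, e, m)
print("implementations agree")
```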
Check No State:
A crypto function should be a function and have no state. The date “attack” I gave earlier shows that functions should be forced to be unable to keep any state. I believe that it may be possible to “sandbox” the code of an alleged function so that it cannot keep any state. This obviously makes the date trick impossible.
How can we be sure that crypto functions are correctly implemented? I suggest simply that cryptographers should start to view this issue as another type of attack that can be used against their systems. Perhaps viewing it this way will lead to new ideas and better and more secure systems.
How important is “backward compatibility” in math and CS?
David Darling source
Henri Lebesgue came the closest of anyone we know to changing the value of a mathematical quantity. Of course he did not do this—it was not like defining π to be 3. What he did was change the accepted definition of integral so that the integral from $0$ to $1$ of the characteristic function of the rational numbers became a definite $0$. It remains $0$ even when integrating over all of $\mathbb{R}$.
Today we talk about changing definitions in mathematics and computer programming and ask when it is important to give up continuity with past practice.
Continuity is itself an example of such continuity. Bernard Bolzano’s original definition of 1817 amounts to much the same as the standard “epsilon-delta” definition by Karl Weierstrass a half-century later: A function $f$ between metric domains $(X, d_X)$ and $(Y, d_Y)$ is continuous if for every point $x \in X$ and $\epsilon > 0$ there is a $\delta > 0$ such that for all points $x'$ having $d_X(x, x') < \delta$, it follows that $d_Y(f(x), f(x')) < \epsilon$. The newer definition is that $f$ is continuous with respect to topologies $\mathcal{T}_X$ on $X$ and $\mathcal{T}_Y$ on $Y$ if for every open set $V$ of $Y$, the inverse image

$f^{-1}(V) = \{x \in X : f(x) \in V\}$

is an open set of $X$.
This is backward-compatible: The metrics uniquely define topologies on $X$ and $Y$ whose basic open sets have the form $\{x' : d_X(x, x') < \delta\}$, and similarly for $Y$. For these topologies the newer definition is equivalent to the older one. Of course one virtue of the new definition is that it can be used for topologies that do not give rise to metrics.
Lebesgue’s change to the meaning of integral was not backward compatible in terms of the upper sums. It changed the status in texts that referenced Bernard Riemann’s definition, which used approximations by rectangles. For the characteristic function of the rational numbers in $[0,1]$ the Riemann upper sum is $1$, whereas the Lebesgue upper sums—using open coverings that need not induce partitions of $[0,1]$—approach zero and so meet the lower sum. It is compatible whenever a function satisfies Riemann’s definition that the upper and lower approximations by rectangles meet in the limit. One can still talk about Riemann integrability, perhaps as modified by Thomas Stieltjes, but the difference is that you must include one or both names—if you don’t then you default to Dr. Lebesgue’s definition, and then the status for functions like the one above is different. Most to our point, the “interface” for working with the Lebesgue integral differs—especially in employing the notion of open coverings.
Compare whether an audio “recording” can still refer to a vinyl phonograph record. When I was a teenager we said “digital audio” or “digital recording” and there was much debate about whether the quality could ever equal that of a physical LP record. It didn’t take long for progress in recording density and playback to tilt the field toward those who feel CDs sound better. Now I don’t think the analog meaning is ever intended without a qualifier.
Digital opens new vistas of sound, much as Lebesgue’s definition founded a great expansion of real and complex analysis. The point we’re making is that it is not backward compatible in operation. You can play the same music as an LP—much as the Lebesgue integral gives the same value as the Riemann integral for most common functions—but you cannot slip an LP into a CD player. Whether CDs should be playable on DVDs, and DVDs on later optical drives, has been the burning issue on the design side.
Less clear is whether electric guitars are a break from acoustic guitars. The notation and interface for playing them are largely the same. It is even possible to bolt on an extension to make an acoustic guitar sound electric at the source. Thus electric guitars have not really changed the definition of “guitar.”
Also problematic is whether a change in nomenclature amounts to a change in definition. I support the position that “pi” should have been defined as 2π = 6.283185307… We’ve argued that the Indian pioneer Aryabhata had this value in mind in the 5th century CE. Doing so would change the look of equations but not the interface.
Physicists use $\hbar$ more often than they use Max Planck’s original constant $h$. Sometimes $\hbar$ is called the “reduced” Planck constant or named for Paul Dirac, but even when “Planck’s constant” is used to mean $\hbar$ in speech nothing is actually being redefined. The two notions of ‘pi’ or ‘$h$’ are completely convertible.
I’ve been thinking of this recently because I’ve adopted two breaks from the standard definition of codes for chess positions, which I covered earlier this year. One fixes an admitted mistake by the definition’s second creator while the other is needed to adapt to more general forms of chess. Let’s see how they capture in microcosm some larger issues in research and everyday programming.
Consider the position after the reasonable moves 1. e4 e5 2. d4 exd4 3. Bc4 Bb4+ 4. Bd2 Bc5 5. Nf3 Nf6 6. e5 Qe7 7. Bb3 d5:
The FEN code for this position—minus the last two components which do not affect the position according to the laws of chess—is:
rnb1k2r/ppp1qppp/5n2/2bpP3/3p4/1B3N2/PPPB1PPP/RN1QK2R w KQkq d6
The first part tells where the pieces are, then ‘w’ means White is to move, and ‘KQkq’ means that both White and Black retain the right to castle both kingside and queenside. The fourth part indicates that Black’s last move 7…d5 was with a pawn that jumped over the square d6, which might enable an en-passant capture. In fact there is a White pawn on e5 in a position where it could possibly make such a capture, but the move “8. exd6 e/p” is not possible because it would leave White’s king in check from Black’s queen. Note that the ‘Q’ in ‘KQkq’ does not mean White can castle queenside right now, just that it is possible in the future. Castling kingside is legal right now, and indeed 8. O-O is White’s best move. Suppose however that White plays 8. Bg5 but after Black’s 8…Bb4+ chickens out of the strong 9. c3 and meekly returns the Bishop by 9. Bd2 followed by the reply 9…Bc5. The position on the board is the same:
However, the FEN code (again minus the last two fields which do not determine the position) is now:
rnb1k2r/ppp1qppp/5n2/2bpP3/3p4/1B3N2/PPPB1PPP/RN1QK2R w KQkq –
The en-passant marker for the square d6 has disappeared, so these are different codes.
Now the 3-fold repetition rule in chess allows the side to move to claim a draw if the intended move will cause the third (or higher) occurrence of the resulting position in the game. When an en-passant move is legal on the first occurrence of the “board position” then it is a different position, so two more occurrences of the board setup do not allow the draw claim. Here, however, if White waffled again by 10. Bg5 Bb4+ 11. Bd2, Black really would be able to claim a draw with intent to play 11…Bc5. The en-passant move is not legal, so White’s immediate options were the same, and the set of options (including castling in the future) is what defines a position.
The issue is that the FEN codes do not reflect this. Computer chess programs want to detect not only 3-fold but even 2-fold repetitions to avoid time-wasting cycles in their search. It would be most convenient to tell this from identity of the FENs, without having to build the chess position and examine pins to check for legality. Unfortunately the FEN standard mandates inserts like the ‘d6’ even when there is no nearby pawn at all.
I have therefore adopted the stricter practice of Stockfish and some other chess programs by including the target square only when an en-passant capture is actually legal. Steven Edwards—the ‘E’ in ‘FEN’—advocated the same four years ago. Happily the difference does not matter to playing or analyzing chess games—so you can say it’s “compatible” for the end-user. It is however a departure at the API level. If an end user or another program submits a non-strict FEN then your program needs to convert it internally. This is painless since you are building the position from the FEN anyway. The second deviation is more consequential for users.
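Here is what a simplified version of the stricter practice could look like in Python (a sketch of our own: it checks only the geometry of a possible capture, so a real implementation would still need full legality testing; indeed in the position diagrammed above the pin means ‘d6’ should also be dropped):

```python
def normalize_ep(fen):
    """Drop the en-passant field unless some enemy pawn could at least
    geometrically make the capture; a full check would also verify that
    the capture leaves the king out of check."""
    fields = fen.split()
    board_part, side, castling, ep = fields[:4]
    if ep == "-":
        return fen
    rows = []
    for rank in board_part.split("/"):     # rank 8 first, rank 1 last
        row = []
        for ch in rank:
            row.extend(["."] * int(ch) if ch.isdigit() else [ch])
        rows.append(row)
    file = ord(ep[0]) - ord("a")
    # a White capture lands on the 6th rank from a pawn on the 5th (rows[3]);
    # a Black capture lands on the 3rd rank from a pawn on the 4th (rows[4])
    row_idx, pawn = (3, "P") if side == "w" else (4, "p")
    adjacent = [rows[row_idx][f] for f in (file - 1, file + 1) if 0 <= f < 8]
    if pawn not in adjacent:
        ep = "-"
    return " ".join([board_part, side, castling, ep] + fields[4:])

# in the post's position a white pawn does stand beside the jumped pawn, so the
# geometric test keeps 'd6' even though the pin actually makes the capture illegal
print(normalize_ep("rnb1k2r/ppp1qppp/5n2/2bpP3/3p4/1B3N2/PPPB1PPP/RN1QK2R w KQkq d6"))
```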
A mark of Bobby Fischer’s genius apart from playing games is that while many champions have mused on tweaking the rules of chess, not just one but two of Fischer’s inventions have gained wide adoption. The greater was the Fischer clock, which metes a portion of the allotted thinking time in increments with each move besides the lump amount at the start of a game session. This lessens the acuteness of time pressure and is now standard.
The second is Fischer Random chess, which is more often now called Chess960 for the 960 different possible starting positions of pieces on the back row, from which one is randomly selected at the start of a game. Variant setup rules had been proposed by many including the onetime champion José Capablanca and challenger David Bronstein, but in my mind the stroke that makes all of these “click” is Bobby’s generalized castling rule. Adding it to Bronstein’s original idea yields this.
This allows the king and rooks to start on any squares provided the king is between the rooks. The between-ness preserves the meaning of castling ‘queenside’ and ‘kingside’ and indeed the destination positions of the king and the rook used to castle are the same as for those moves in standard chess. The other conditions are the same as in standard chess: any squares between the king and the rook must be clear, neither the king nor that rook must previously have moved, and none of the squares traversed by the king may be attacked by the enemy—though this is OK for the rook. If the Chess960 position happens to be the standard starting one then the whole game rules are completely the same—this is a “conservative extension” of chess. What’s not conserved is the standard notation for castling: O-O and O-O-O in game scores, nor e1c1 or e1g1 (e8g8 and e8c8 for Black) in internal chess program code.
The first problem emerges when you think of a Chess960 position with Black’s king starting on f8 rather than e8, say with the rooks on b8 and h8, such as this one:
The notations O-O and O-O-O are still clear, but the corresponding internal notation f8g8 is ambiguous. It could be a normal King move without castling. This is solved by changing the internal notation to f8h8 in that case, figuratively “king takes own rook,” and f8b8 as the internal code for O-O-O instead. Many chess programs accept the “king takes rook” style from other programs even in standard chess, and there is no issue for the end user.
The second problem however is with the external notation when playing Chess960. It is subtle: Suppose Black’s rook on h8 moves to h6, moves later to a6 on the other side of the board, and then has occasion to retreat to its own back row on a8. The first rook move eliminated Black’s kingside castling right, so the original ‘kq’ part of the FEN would become just ‘q’. The problem is that the FEN does not preserve the game history, and at the moment of Black’s Ra8 move, it has forgotten which rook was originally resting on the queenside. If Black subsequently moves the other rook away from b8, or moves the rook on a8 yet a fourth time, how are we to tell whether the ‘q’ castling right persists?
Including the game history in the FEN is not an option—else we could also solve the repetition-count problem whose full headaches could make another post. Various “memoryless” solutions have been proposed, from which my choice is that of Stefan Meyer-Kahlen, designer of the Shredder chess program and co-creator of the standard UCI protocol for communicating with chess programs. A “Shredder-FEN” replaces the ‘KQkq’ by the files of the rooks, again using capitals for White. This eliminates the above ambiguity, as now the ‘q’ reads ‘b’ for the rook on b8.
I’ve chosen to use Shredder-FENs not just for Chess960 but also internally for standard chess, and my code—which I will say more about in a post shortly—also accepts Shredder-FENs. Using both changes, my code stores the following as the FEN for the above diagram at White’s move 8 (including now the two last fields):
rnb1k2r/ppp1qppp/5n2/2bpP3/3p4/1B3N2/PPPB1PPP/RN1QK2R w HAha – 0 8
The ‘HAha’ is no laughing matter—it also simplifies updates when playing a move, partly because unlike ‘KQkq’ the letters [A-Ha-h] do not occur elsewhere in the FEN. My program can export standard FENs—even for Chess960 notwithstanding the ambiguity—but its API committally changes the definition of a FEN.
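For standard chess the conversion of the castling field is a one-line mapping, since the rooks start on the a- and h-files (a sketch of our own; Chess960 would need the actual rook files):

```python
def to_shredder_castling(castling):
    # standard setup: kingside rook on the h-file, queenside rook on the a-file
    mapping = {"K": "H", "Q": "A", "k": "h", "q": "a"}
    return castling if castling == "-" else "".join(mapping[c] for c in castling)

fen = "rnb1k2r/ppp1qppp/5n2/2bpP3/3p4/1B3N2/PPPB1PPP/RN1QK2R w KQkq - 0 8"
parts = fen.split()
parts[2] = to_shredder_castling(parts[2])
print(" ".join(parts))
# rnb1k2r/ppp1qppp/5n2/2bpP3/3p4/1B3N2/PPPB1PPP/RN1QK2R w HAha - 0 8
```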
What are your favorite changes to definitions in mathematics and computing theory? I have not even gone into the myriad definitions of universal hash functions. When is it important to make a clean break with a previous standard, without preserving backward compatibility?
[fixed introduction which confused the Riemann integral with its upper sum; added mention of Bronstein+Fischer variant.]
Oops—I hit publish while logged in on the joint ‘Pip’ handle; the ‘I’ in this post is only I, KWR.
“Four Weddings” is a reality-based TV show that appears in America on the cable channel TLC. Yes, a TV show: not a researcher, not someone who has recently solved a long-standing open problem. Just a TV show.
Today I want to discuss a curious math puzzle that underlies this show.
The show raises an interesting puzzle about voting schemes:
How can we have a fair mechanism when all the voters have a direct stake in the outcome?
So let’s take a look at the show, since I assume not all of you are familiar with it. I do admit to watching it regularly—it’s fun. Besides the American version there are many others including a Finnish version known as “Neljät Häät” (Four Weddings), a German version called “4 Hochzeiten und eine Traumreise” (4 Weddings and one dream journey), and a French version called “4 Mariages pour 1 Lune de Miel” (4 Weddings for 1 Honeymoon). The last two remind me most of the 1994 British movie “Four Weddings and a Funeral” but there is no real connection.
There is keen interest worldwide, it seems, in weddings as they are a major life event. And of course, they are filled with lots of beautifully dressed people, lots of great displays of food and music, and lots of fun.
Like many shows, “Four Weddings” is based on a British show—do all good shows originate in the UK? Four brides, initially strangers, meet and then attend each other’s weddings. Each then scores the others’ weddings on various aspects: bridal gown, venue, food, and so on. Then the bride with the highest score wins a dream honeymoon. Of course there is the small unstated issue that the honeymoon, no matter how exotic, happens well after the actual wedding. Oh well.
The scoring method varies from season to season and also from country to country. But higher scores are better, and the brides get a chance on camera to explain why they scored how they did. A typical comment might be: “I loved the venue and the food, but the music was terrible.”
You get to see four different weddings, which is the main attraction in watching the show. Usually each wedding is a bit out there: you see weddings with unusual themes, with unusual venues, and other unusual features. If you are not ready to have an interesting wedding, to spend some extra time in making it special, then you have little chance of winning.
The puzzle to me is really simple: why would the brides rate each other fairly? They all want to win the honeymoon, the prize, so why ever give high ratings? Indeed.
There have been some discussions on the web on what makes the scoring work. Some have noticed that the most expensive weddings usually win.
The game-theoretically optimal move seems to be to give all the other brides low scores and hope the others act fairly. The trouble with this method is that you look bad—and who wants to look bad on a TV show that millions might see? Can we make a model that accounts for this? It does not have to embrace possible psychological factors at all—it just has to do well at predicting the observed ratings on the show.
I have thought about this somewhat and have a small idea. Perhaps some of you who are better at mechanism design could work out a scoring method that actually works. My idea is to penalize a score that is much lower than the others. A simple version could be something like this: Suppose the brides are Alice, Betty, Carol, and Dawn. If Alice’s wedding gets scores like this:
Betty: 7
Carol: 6
Dawn: 3,
then perhaps we deduct a point from Dawn’s total. She clearly is too low on Alice. Can we make some system like this really work?
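A sketch of the penalty rule in Python (names and threshold invented for illustration): deduct a point from any judge whose score for a wedding falls well below the average of the other judges' scores for it.

```python
def penalized_totals(scores, threshold=2):
    """scores[judge][bride]: the rating each judge gave each other bride.
    A bride's total is the sum she receives, minus one point for every
    wedding she scored far below the other judges' average."""
    everyone = set(scores)
    totals = {b: 0 for b in everyone}
    penalties = {b: 0 for b in everyone}
    brides = {b for given in scores.values() for b in given}
    for b in brides:
        given = {j: scores[j][b] for j in scores if b in scores[j]}
        totals[b] = totals.get(b, 0) + sum(given.values())
        for j, s in given.items():
            others = [v for k, v in given.items() if k != j]
            if others and s < sum(others) / len(others) - threshold:
                penalties[j] += 1
    return {b: totals.get(b, 0) - penalties.get(b, 0) for b in everyone | brides}

ratings = {"Betty": {"Alice": 7}, "Carol": {"Alice": 6}, "Dawn": {"Alice": 3}}
print(penalized_totals(ratings))
# Dawn loses a point for lowballing Alice's wedding
```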
I have been discussing with Ken and his student Tamal Biswas some further applications of their work on chess to decision making. Their latest paper opens with a discussion of “level-$k$ thinking” and the “11–20 money game” introduced in a recent paper by Ayala Arad and Ariel Rubinstein.
In the game each player independently selects a number between 11 and 20 and instantly receives that many dollars. In addition, if one player chose a number exactly $1 below the other’s number, that person receives a bonus of $20 more. Thus if one player chooses the naively maximizing value $20, the other can profit by choosing $19 instead. The first player however could sniff out that strategy by secretly choosing $18 instead of $20. If the second player thinks and suspects that, the $19 can be revised down to $17. And so on in what sometimes becomes a race for the bottom, although the Nash equilibrium assigns non-zero probability only to the values $15 through $20.
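The race for the bottom is easy to simulate: starting from the naive level-0 choice of $20, each level best-responds to the one before (a sketch under our own naming, not code from the Arad–Rubinstein paper).

```python
def payoff(a, b):
    # your payoff for choosing a against an opponent choosing b; dollars
    return a + (20 if a == b - 1 else 0)

# level 0 plays the naive maximum 20; each level best-responds to the previous
choice, ladder = 20, []
for level in range(1, 6):
    choice = max(range(11, 21), key=lambda a: payoff(a, choice))
    ladder.append(choice)
print(ladder)   # [19, 18, 17, 16, 15]
```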
In this game the level $k$ of thinking about what the opponent might do is simply represented by the chosen value $20 - k$. There is now a rich literature of studies of how real human players deviate from the Nash equilibrium, though they come closer to it under conditions of severe time pressure. The connection sought by Ken and Tamal relates to search depth in chess—that is, to how many moves a player looks ahead.
Ken does not know whether anyone has intensively treated the extension to $n$ players. The following seems to be the most relevant way to define this:
It would be interesting to study this with $n = 4$ and compare the results to the observed behavior in the show “Four Weddings.” Could something like this be going on? Or are the brides simply being true to their own standards and gushing with admiration where merited? It could be interesting either way, whether they match or deviate from the projections of a simple game-theoretic model with scoring like this.
What is the right scoring method here? Is it possible to find one?
A new kind of ‘liar’ puzzle using Freestyle chess
By permission of Vierrae (Katerina Suvorova), source
Raymond Smullyan is probably the world’s greatest expert on the logic of lying and the logic of chess. He is still writing books well into his 10th decade. Last year he published a new textbook, A Beginner’s Guide to Mathematical Logic, and in 2013 a new puzzle book named for Kurt Gödel. His 1979 book The Chess Mysteries of Sherlock Holmes introduced retrograde analysis—taking back moves from positions to tell how they could possibly have arisen—to a wide public.
Today Ken and I wish to talk about whether we can ever play perfect chess—or at least better chess than any one chess program—by combining output from multiple programs that sometimes might “lie.”
We will start with Smullyan-style puzzles today, but they are prompted by an amazing and serious fact. Even though human players have been outclassed by computers for over a decade, humans judging between multiple programs have been able to beat those programs playing alone. This happens even when the human player is far from being a master—someone who would get crushed in minutes by programs available on smartphones. We want to know, how can this be?
By coincidence, yesterday’s New York Times Magazine has a feature on Terry Tao that likens discovering and proving theorems to “playing chess with the devil”—quoting Charles Fefferman:
The devil is vastly superior at chess, but […] you may take back as many moves as you like, and the devil may not. … If you are sufficiently wily, you will eventually discover a move that forces the devil to shift strategy; you still lose, but—aha!—you have your first clue.
On this blog we have previously likened perfect programs to “playing chess against God”—this was quoting Ken Thompson about endgames where perfect tables have been computed. Since the programs we consider here occasionally err—one can say lie—we will reserve the “devil” term in yesterday’s Times for them.
We, that is I and my Tech colleague Dr. Kathryn Farley, just visited Ken at his home base in Buffalo and had a great time. Of course the weather up there is near perfect this time of year and his family was wonderful to us. Plus we got to visit Wegmans—our pick for the greatest supermarket chain in the world.
One afternoon I was honored to sit in on a video conference in which Ken presented some of his research on using computer programs to evaluate the play of humans. Joining him via the Internet were two experts in so-called freestyle chess where humans are allowed access to multiple chess programs during a game. One-on-one the programs totally dominate the humans—even on laptops programs such as Stockfish and Komodo have Elo ratings well above 3100 whereas the best humans struggle to reach even 2900—but the human+computer “Centaurs” had better results than the computers alone. In the audience were representatives of defense and industrial systems that involve humans and computers.
Ken got into freestyle chess not as a player but because of his work on chess cheating—see this for example. Freestyle chess says “go ahead and cheat, and let’s see what happens…” The audience was not interested in cheating but rather in how combining humans and computers changes the game. While chess programs are extremely strong players, they may have weaknesses that humans can help avoid. Thus, the whole point of freestyle chess is:
Are humans + computers > computers alone?
That is the central question. Taken out of the chess context it becomes a vital question as computers move more and more into our jobs and our control systems. The chess context attracts interest because it involves extreme performance that can be precisely quantified, and at least until recently, the answer has been a clear “Yes.”
At the beginning of the video conference Ken spoke about the history of computer chess, giving his usual clear and precise presentation, and then reviewed his joint paper showing the good results for human-computer teams were no fluke—they really made better moves. Ken used some of the slides from the end of his TEDxBuffalo talk. Then two Freestyle experts spoke, including a winner of three tournaments who doesn’t even have a human chess rating, and an interesting discussion followed on how they actually used the computer chess programs.
I must admit at first I was a skeptic: how could weak players, humans, help strong players, computers? Part of it was that when two or more programs disagreed on the best move the human could make the choice. This in effect means saying that the programs whose moves you don’t choose are wrong.
As Ken and I mulled over the idea of freestyle chess we realized it raises some interesting puzzles. I wrote a first draft, then Ken took over adding more detail and tricks to what follows. Let’s take a look at the puzzles now.
Suppose Alice has one program $X$ that is perfect except that there is one position $q$ in which $X$ makes an error. To simplify, let’s suppose the only outcomes are win ($W$) or loss ($L$). An error means that $q$ is a winning position for the player to move—it could be Bob not Alice—but $X$ chooses a move leading to a position that is losing for that player.
Let Alice start in a position $p$. She wants to play perfect chess using $X$ as guide. Can she do so? The bad position $q$ might be reached in a game played from $p$; indeed $q$ might be $p$ itself.
Alice of course cannot tell by herself whether a given position has value $W$ or $L$, unless the position is right near the end of a game. But she has one advantage that $X$ lacks. She can play $X$ against itself from any position $r$. If $r$ is beyond $q$—at least if $q$ is not reachable in a correctly-played sequence of moves from $r$—then Alice will get the correct value of $r$.
This is like the power of a human centaur to try a program deeper on the moves it is suggesting. In symbols, Alice executes $X(p)$, which generates a game sequence
$$p = r_0, r_1, \ldots, r_m$$
of positions where $r_m$ is checkmate for one side. The cost of this is $m$ calls to $X$. You might think we could let $X$ do the same thing, but $X$ is not like “Satan” in Smullyan’s story, “Satan, Cantor, and Infinity.” $X$ is not trying to out-psych Alice or correct itself; $X$ is just given as-is and by-hook-or-by-crook makes that error in some position $q$.
So Puzzle I is:
Can the “centaur” Alice + X play perfectly, even though neither Alice nor X plays perfectly alone? And at what cost compared to X?
The answer is, yes she can. Let the legal moves in $p$ be $m_1, \ldots, m_k$, going to positions $p_1, \ldots, p_k$, with $m_1$ the move recommended by $X$. In brief, Alice runs the playout $X(p_i)$ from every child $p_i$; if the playout from $X$’s choice $p_1$ comes out lost for her while some other playout $X(p_j)$ comes out won, she switches to $m_j$, and otherwise she plays $m_1$. Her algorithm exemplifies a key idea that bridges to the more interesting puzzles.
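Here is a toy rendering of the idea in code. Everything concrete—the miniature game tree, the negamax oracle, and the position where $X$ errs—is invented for illustration, and the self-play routine is just one plausible reading of Alice’s procedure:

```python
# A toy two-ply game tree.  Values are from the viewpoint of the
# player to move: +1 = win, -1 = loss.  Alice moves at 'r'.
CHILDREN = {'r': ['a', 'b'], 'a': ['a1'], 'b': ['b1', 'b2']}
LEAF = {'a1': +1, 'b1': -1, 'b2': +1}    # value for the player to move there

def true_value(p):                        # perfect play, by negamax
    if p in LEAF:
        return LEAF[p]
    return max(-true_value(c) for c in CHILDREN[p])

ERROR_AT = 'r'                            # the one position where X errs

def X_move(p):
    """X recommends an optimal move everywhere except at ERROR_AT,
    where it deliberately picks a losing one."""
    good = [c for c in CHILDREN[p] if -true_value(c) == true_value(p)]
    bad = [c for c in CHILDREN[p] if c not in good]
    return bad[0] if p == ERROR_AT and bad else good[0]

def playout(p):                           # let X play both sides from p
    return LEAF[p] if p in LEAF else -playout(X_move(p))

def centaur_move(p):
    """Alice's trick: run X against itself from every child and prefer
    a child whose playout comes back as a win for her."""
    for c in CHILDREN[p]:
        if -playout(c) == +1:
            return c
    return X_move(p)

print(X_move('r'), centaur_move('r'))     # → b a
```

Here $X$ alone blunders into the losing move at the root, while the centaur’s playouts expose the error signature and recover the winning move.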
We claim this algorithm, whose running time we count as linear in the game length, plays perfectly in any $p$. If all the playout values $X(p_i)$ are $L$ but $X$ is wrong then some other $p_j$ is really a winning position. But then $X$ is wrong both at $p$ and at some position in the game from $p_j$, which is a contradiction. If $X(p_1) = W$ but $X$ is wrong anyway then $X$ is wrong both at $p$ and somewhere at or beyond $p_1$.
Finally if the “switch” to $p_j$ is wrong then $X$ errs somewhere beyond $p_j$, and either errs beyond $p_1$ or erred by choosing $m_1$ in $p$ after all. (If $X(p_j)$ is wrong and all other $p_i$ are losing then $p$ was lost anyway, and so the switch wasn’t a mistake by Alice since it didn’t matter.)
There is one loophole that needs attention. $X(p_1)$ and $X(p_j)$ could both be wrong because their games go to a common position in which $X$ makes an error. However, that lone error cannot simultaneously flip a true $W$ to $L$ in $p_1$ and a true $L$ to $W$ in $p_j$, because $p_1$ and $p_j$ have the same player (Bob) to move. There is also the possibility that play from $p_1$ could go through $p_j$ (perhaps via $q$) or through some other $p_i$, which we leave you to analyze. We intend to rule out the latter, and we could also rule out the former by insisting that the “game tree” from positions really be a tree not a DAG.
Boiled down, the idea is that “$X(p_1) = L$ while some $X(p_j) = W$, where $X$ goes to $p_1$” is an “error signature” that Alice can recognize. If $X$ errs at $p$ then the signature definitely happens because $X$ is perfect at each $p_i$ and one of the $p_i$ must be winning. If the signature happens yet $X$ did not err at $p$ then $p$ must be a losing position. Hence once the signature happens then Alice can trust $X$ completely. The only way the error could possibly lie in her future is if she was losing at $p$ but lucks into a winning position—but then the error happened on Bob’s move not hers.
We claim also that all this logic is unaffected if “draw” is a possible outcome. Indeed, we could play with the old Arabian rules that giving stalemate is a win—counting it 0.8 points say—and that occupying the center with one’s king after leaving the opponent with a bare king is a win—worth say 0.7 points. It all works with “$W$” being the true value of a position (aside from a total loss) and “$L$” being any inferior value.
Now suppose $X$ is allowed to make errors in two positions. Can Alice modify her algorithm to still play perfectly?
First suppose the errors are related. Call two positions related if one can occur in a game played from the other where all intervening moves are not errors. Per above we will always suppose that two positions reached in one move from $p$ are unrelated (else we would say the two options are not fully distinct). Related errors are ones that occur in related positions.
If Alice knows this about $X$, then we claim she can solve Puzzle II. She plays the game through as before, but now she looks for the error signature at all nodes in the game. If she never finds it then she plays $X$’s moves. If she does, then she lets $r$ be the last position at which it occurs. Then she knows that either $X$ errs at $r$ or errs somewhere beyond $r$. Either way, she can use this knowledge to whittle down the possibilities at $r$. Or at least we think she can.
Notice, however, what has happened to Alice’s complexity. She is now running $X$ at every node in a length-$m$ game path. Her time is now quadratic in $m$. This is still not terrible, not an exponential blowup of backtracking. But in honor of what Alberto Apostolico cared about, we should care about it here. So there is really a second puzzle: can Alice play perfectly in time $O(m)$?
If the errors are unrelated then we would like Alice to carry out the same algorithm as for Puzzle I. The logic is not airtight, however, because of the case where there were unrelated errors in the playouts from $p_1$ and $p_j$. Worse, what if Alice doesn’t know whether the errors are related?
Here comes the “Freestyle” idea of using multiple programs. Let us have two programs, $X$ and $Y$. Suppose one of them can make up to two errors but the other is perfect. Alice does not know which is which. Now can she play perfectly—in linear time?
If the errors are related then she can localize them to one branch and have the same linear time complexity as before. For simplicity let’s suppose there are just two legal moves, i.e., $k = 2$. Here is her algorithm:
The remaining case is that one pair of answers is $(W, L)$ and the other is $(L, W)$. This cannot happen, because it means that one of the programs is making two unrelated errors.
This is the idea Dick originally had after the videoconference on Freestyle chess. It shows the advantage of using multiple programs to check on each other like the centaur players do. But what happens if the errors are unrelated? Call that Puzzle III.
Now let’s allow both programs $X$ and $Y$ to make up to two errors. Can Alice still play perfectly? We venture yes, but we hesitate to make this a formal claim because Puzzles II and III are already proving harder than expected.
How much does having a third program $Z$ that is perfect help?—of course not knowing which of the three programs is perfect. If instead $Z$ too can make up to two errors, how much worse is that? Even if Alice can still play perfectly in polynomial time, we wonder if the exponent of $m$ will depend on the number of errors. Call all of this Puzzle IV.
We can add a further wrinkle that matters even for one error: we can consider related errors to be just one error. This makes sense in chess terms because an error in a position $q$ that is reachable from a position $r$ can affect the search by a program in $r$. Thus the error at $q$ knocks on and makes the play at all nodes between $q$ and the root unreliable. Let $E$ be the set of all positions at which a program makes errors. Then we can define its branch-error count to be the minimum $b$ such that there are positions $r_1, \ldots, r_b$ such that $E$ is contained in the union of the sets of positions from which some $r_i$ is reachable. This is well-defined even when the positions form a DAG not a tree.
Thus programs could err in multiple positions but still count as having a single branch error if those positions are all on the same branch. Anything with $b$ branch errors counts as Puzzle V. This is where our error model is starting to get realistic, but as we often find in theory, there is a lot of already-challenging ground to cover before we get there. It is time to call it a day—or a post.
Our puzzles have some of Smullyan’s flavor. In a typical logic puzzle of his, Alice would be confronted by $X$ and $Y$ that have different behaviors in telling the truth to arbitrary questions. The solutions in his case rely on the ability to ask questions like:
If you are a person of type … , then what would you say to the question … ?
Our situations seem different, but perhaps there are further connections between our puzzles and Smullyan’s. What do you think?
Can you solve the puzzles of kinds II or III or higher? If you or we find a clear principle behind them then this will go into a followup post.
Update (7/31/15) The artist Vierrae, Katerina Suvorova of Russia, has graciously contributed two new portraits of Smullyan in oil. I have used her new version of Smullyan in a ‘Magus’ robe at the top. Here is her portrait of him in more formal wear, as if he were a dinner guest at an Oxford High Table.
The originals in higher resolution are viewable on her DeviantArt page. Our great thanks to her.
Cropped from TCS journal source |
Alberto Apostolico was a Professor in the Georgia Tech College of Computing. He passed away on Monday after a long battle with cancer.
Today Ken and I offer our condolences to his family and friends, and our appreciation for his beautiful work.
Alberto was still active. He had a joint paper in the recent 2015 RECOMB conference—that is, Research in Computational Molecular Biology. It was written with Srinivas Aluru and Sharma Thankachan and titled “Efficient Alignment Free Sequence Comparison with Bounded Mismatches.” Srinivas is also here at Tech and wrote some words of appreciation:
[We] submitted two journal papers from this joint work this month, one on last Thursday night. … When I empathized with his situation, he would remind me that all of our lives are temporary and his situation is no different.
A full session of that conference was devoted to fighting cancer. One can only hope that some of the results of this and other theory conferences contribute to finally solving that problem.
Alberto’s research concerned how much work one needs to do to identify notable properties of words. We mean very long words, such as the textual representation of the human genome. Many “obvious” methods for processing strings do too much work. In a Georgia Tech feature several years ago, Alberto put it this way:
How do you compare things that are essentially too big to compare, meaning that the old ways of computing are no longer feasible, meaningful, or both? It’s one thing to compare and classify 30 proteins that are a thousand characters long; it’s another to compare a million species by their entire genomes, and then come up with a classification system for those species.
The theme of a special issue of the journal Theoretical Computer Science for Alberto’s 60th birthday in 2008 was:
Work is for people who do not know how to SAIL — String Algorithms, Information and Learning.
The foreword by Raffaele Giancarlo and Stefano Lonardi lists Alberto’s many contributions.
Giancarlo, who did a Master’s with Alberto in Salerno and then a PhD with Zvi Galil—our Dean at Georgia Tech—told some stories yesterday to a mailing list of their field. As an undergraduate feeling jitters attending a summer school taught by Apostolico on the island of Lipari off the north coast of Sicily, he was greatly heartened to see the leader arriving in his sailboat, named Obliqua (“Oblique”). A month ago Ken was in Sardinia—further north in the same waters—for a meeting of the World Chess Federation’s Anti-Cheating Committee, and offers this peaceful picture of the 41-foot yacht in which they took an excursion.
Ken and I have already talked about stringology—see here. Stringology is the study of the most basic objects in computing, linear finite sequences of letters, and is filled with deep and often surprising results. In the post we mentioned the surprise that an ordinary multitape Turing machine working in real time can print a 1 each time the first $n$ letters it has read form a palindrome. One can also trace roots to the discovery that string matching—telling whether a string $x$ occurs as a subword of a text $y$ and finding it if so—can be done in linear time.
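That linear-time bound is achieved, for example, by the classic Knuth-Morris-Pratt algorithm; a compact sketch:

```python
def kmp_search(text, pattern):
    """Knuth-Morris-Pratt matching: O(len(text) + len(pattern)) total
    character comparisons, returning all match positions."""
    # Failure function: fail[i] = length of the longest proper border
    # of pattern[:i+1].
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, never moving backwards in it.
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits

print(kmp_search("abracadabra", "abra"))   # → [0, 7]
```

The failure function is precisely the kind of combinatorial structure on words—borders and periods—that stringology studies.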
Alberto and Raffaele wrote a paper about Zvi before either wrote a paper with Zvi: “The Boyer-Moore-Galil String Searching Strategies Revisited.” This paper has a Wikipedia page under the name “Apostolico-Giancarlo algorithm.” By employing a compact database of certain substrings of the pattern string with room to record their matches and non-matches to parts of the text, they showed how to reduce the overall number of character comparisons to at most $2n$. The constant had previously been higher in this and related measures, and theirs matched a conjectured lower bound by Leo Guibas.
Before the paper, however, Alberto and Zvi edited and contributed to a highly influential collection of papers from a workshop in Italy sponsored by NATO’s Advanced Sciences Institute in 1984, titled Combinatorial Algorithms on Words. Many great people we know took part and wrote for the volume: Michael Rabin, Andy Yao, Andrew Odlyzko, Bob Sedgewick with Philippe Flajolet and Mireille Régnier, Andrei Broder, Joel Seiferas—and others such as Maxime Crochemore, Shimon Even, the aforementioned Guibas, Michael Main, Dominique Perrin, Wojciech Rytter, and James Storer. A contribution from Victor Miller and Mark Wegman (of universal hashing fame) titled “Variations on a Theme by Ziv and Lempel” is followed by one from Lempel and Ziv themselves.
Alberto and Zvi teamed on another volume in 1997, viz. the book Pattern Matching Algorithms. And yes they did co-write papers, including several on parallel algorithms for string problems—even the basic palindrome-finding problem. Their latest collaboration was a multi-author survey titled “Forty Years of Text Indexing,” which was a keynote presentation at the 2013 Combinatorial Pattern Matching symposium.
Alberto proved many surprising theorems during his long career. A recent example of his “out-of-the-box” approach is his 2008 paper with Olgert Denas titled, “Fast algorithms for computing sequence distances by exhaustive substring composition.” The abstract notes that the standard edit-distance measure
…hardly fulfills the growing needs for methods of sequence analysis and comparison on a genomic scale […] due to a mixture of epistemological and computational problems.
Well we have compressed the abstract with a little stringology ourselves. Recall our recent post on new evidence that the time to compute edit distance really is quadratic. Edit distance is so basic, it takes chutzpah to imagine that “alternative measures, based on the subword composition of sequences” could be both quicker and useful. The main theorem is a linear-time algorithm for a distance measure that seems to depend on quadratically-many pairs of substrings. This is a real feat of leveraging the ways words can combine. The paper goes on to show how this is programmed and applied—note also that the link to the paper is on the NIH website.
In the Georgia Tech feature we quoted above, Alberto said this about the genome:
It’s the closest thing we have to a message from outer space. We do not know where it comes from, understand very little of what it means, and have no clue about where it is going.
He went on to note the shift from pattern matching to pattern discovery—without saying how far he was in the vanguard on this.
Alberto captured discovery by the element of surprise, which can be quantified statistically. One of my favorite papers of his is titled, “Monotony of Surprise and Large-Scale Quest for Unusual Words,” with Mary Ellen Bock and the above-mentioned Lonardi. It is another victory in Alberto’s perpetual battle of linear over quadratic.
Here is the idea. Given a long string $t$ and substring $w$, let $f(w)$ be the number of occurrences of $w$ in $t$. Now suppose $t$ is in the support of a random distribution in which each character depends only on a fixed finite number of previous characters, in a manner that is also independent of the position. Let $F(w)$ be the random variable giving the count of $w$ over strings drawn from this distribution. Then
$$z(w) = \frac{f(w) - E[F(w)]}{\sigma[F(w)]}$$
is a normal statistical z-score, where $E$ stands for expectation and $\sigma$ for standard deviation. If $z(w) > T$ for some high positive threshold $T$, then $w$ occurs unusually frequently in $t$ and so is a surprising substring. Substrings with $z(w) < -T$ are surprising by their lack of expected frequency in the given long text $t$.
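To make this concrete, here is a toy scorer. It substitutes an i.i.d.-uniform null model and the crude Poisson approximation (variance roughly equal to the mean) for the paper's Markov models and exact moments, so the numbers are rough:

```python
from collections import Counter
from math import sqrt

def surprising(text, length, alphabet, T=3.0):
    """Flag substrings of the given length whose count deviates from an
    i.i.d.-uniform null model by more than T standard deviations.
    (A crude stand-in for the paper's Markov models and exact variances.)"""
    n = len(text)
    windows = n - length + 1
    expected = windows * (1.0 / len(alphabet)) ** length
    sd = sqrt(expected)                      # Poisson approximation
    counts = Counter(text[i:i + length] for i in range(windows))
    return {w for w, c in counts.items() if abs(c - expected) / sd > T}

print(surprising("ab" * 50, 2, "ab"))
```

On the highly repetitive text `"abab…"` both `"ab"` and `"ba"` are flagged as occurring far more often than the null model predicts, while `"aa"` and `"bb"` never occur at all but fall short of the threshold here.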
Telling which substrings are surprising still seems to face the wall that there are quadratically many substrings overall. But now the authors work a little structural magic. Suppose we have a partition of the substrings into $O(n)$-many classes $C_1, \ldots, C_k$ such that each has a unique longest and shortest member. Given $w \in C_i$, put $w$ into $S^{+}$ if $z(w) > T$ and no other string in $C_i$ has a higher z-score, and into $S^{-}$ if $z(w) < -T$ and no other member is more negative. They prove:
Theorem 1 Given any $t$ of length $n$ and an $O(n)$-partition of its substrings as above, and any fixed displacement $T$, the sets $S^{+}$ and $S^{-}$ can be computed in $O(n)$ time.
We have skimmed some fine print about complexity of the distribution and access to the longest and shortest strings in the classes $C_i$, plus how they make everything work also for other score functions. The way they combine the monotonicity of $z$ with regard to sub- and super-strings of $w$, properties of convexity and concavity, numerical analysis, and graph-theoretic diagrams of the substring structures is a tour de force—the paper rewards attention to its details.
We could go on… But let’s stop now and just repeat our condolences again to his family and friends. Georgia Tech will be putting together a memorial in his honor—perhaps you will be able to attend.
Alberto is missed already.
Cropped from source |
Joel Ouaknine is a Professor of Computer Science at Oxford University and a Fellow of St. John’s College there. He was previously a doctoral student at Oxford and made a critical contribution in 1998 of a kind I enjoyed as a student in the 1980s. This was contributing a win in the annual Oxford-Cambridge Varsity Chess Match, which in 1998 was won by Oxford, 5-3.
Today I’d like to report on some of the wonderful things that happened at a workshop on “Infinite-State Systems” hosted by Joel at the Bellairs Institute of McGill University last March 13–20 in Barbados, before we finally opened a chess set and played two games on the last evening.
The workshop was one of two happening concurrently at Bellairs, which it has been my pleasure to visit twice before, in 1995 and 2009. The other was on “Co-Algebras in Quantum Physics” and was co-organized by Prakash Panangaden, whom I used to know on Cornell’s faculty when I was a postdoc there. I often wished I could be in a quantum superposition between the workshops. My own talk for Joel’s workshop was on analyzing quantum circuits, and that anchored stimulating discussions I had with both workshops’ members during meals and excursions and other free time.
The other participants in ours were Dmitry Chistikov, Thomas Colcombet, Amelie Gheerbrant, Stefan Göller, Martin Grohe, Radu Iosif, Marcin Jurdzinski, Stefan Kiefer, Stephan Kreutzer, Ranko Lazic, Jerome Leroux, Richard Mayr, Peter Bro Miltersen, Nicole Schweikardt, and James Worrell. Göller and Grohe joined Joel and me for soccer in a nearby park on two lunch breaks—we didn’t just play chess.
The basic reachability problem is: Given a graph $G$, a starting node $s$, and a set $T$ of target nodes, is there a path from $s$ to some node in $T$? When $G$ is undirected the problem is solvable in logarithmic space. This is a deep result—for a long time only randomized logspace was known—but a simple statement. When $G$ is directed the problem is complete for nondeterministic log space ($\mathsf{NL}$). But $\mathsf{NL}$ is contained in $\mathsf{P}$, so it’s still solvable in polynomial time.
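For the finite, explicitly-listed case, the polynomial-time claim is just breadth-first search:

```python
from collections import deque

def reachable(edges, s, targets):
    """Directed reachability by breadth-first search: linear in the
    number of nodes and edges, i.e. well inside polynomial time."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u in targets:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

E = [(1, 2), (2, 3), (4, 1)]
print(reachable(E, 1, {3}), reachable(E, 3, {4}))   # → True False
```

Of course this uses linear space; achieving logarithmic space, even with the adjacency list in hand, is where the deep results come in.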
Enough said about reachability? The meta-problem—really the problem—comes from other ways to present $G$ besides listing finitely many nodes and edges. If $G$ is presented by a circuit $C$ that recognizes its edge relation on node-pairs, then reachability for $G$ jumps up to being $\mathsf{PSPACE}$-complete. Not all graphs have $C$ of poly size, but those that do include the transition graphs for polynomial space-bounded Turing machines with start state $s$ and accepting configurations $T$.
As the workshop’s name implies, we can consider “machines” that give rise to infinitely many states, and then things really get interesting. We just blogged about such a machine and its halting problem. Described in another way, the “machine” is an integer matrix $M$ which takes a vector $v$ to $Mv$.
The target states in $T$ are those with last component $0$. Are any of them reached in the sequence $v, Mv, M^2v, \ldots$? This is a deterministic and discrete and simple linear system, yet nobody knows how to decide it. Allowing more general integer matrices does not change the nature of the problem, and equivalent forms include whether the lower-left (or upper-right) entry of $M^n$ ever becomes zero.
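In the absence of a decision procedure, all one can do is search the orbit; the matrices below are our own illustrative choices:

```python
def first_zero_lower_left(M, bound):
    """Return the least n <= bound with (M^n)[1][0] == 0, else None.
    No general decision procedure is known; we can only search."""
    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
    P = M
    for n in range(1, bound + 1):
        if P[1][0] == 0:
            return n
        P = matmul(P, M)
    return None

# A rotation-like matrix hits zero quickly...
print(first_zero_lower_left([[0, 1], [-1, 0]], 10))     # → 2
# ...while this Fibonacci-style matrix never does (its powers stay
# entrywise positive), though the search can only report "not yet."
print(first_zero_lower_left([[2, 1], [1, 1]], 50))      # → None
```

The search is trivially a semi-decision procedure; the open question is whether a bound on $n$ can be computed in advance.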
So the complexity range of reachability runs from easy to (maybe-)undecidable. There are further questions one can ask after allowing ourselves to consider computations that are actually infinite: Are some states in $T$ reached infinitely often? Does the system stay always within $T$ upon entry? When there is branching one can put probabilities on the transitions and ask questions about the probability of reaching $T$ being nonzero, or being $1$, and so on. In a sense my own talk was about those questions when the transitions have amplitudes instead of probabilities. One can also associate infinitely many states to a finite graph by having one or more “budget counters” and giving each edge a label for how much it increments or debits a given counter when traversed. Finally one can partition the nodes among adversarial players who can vie to reach certain nodes and/or bankrupt some counters.
Overall the workshop impressed me with the wide range of interesting problems in this thread and the variety of their application areas. I’ll mention a few of the 16 talks now and intend more later. Here is a photo of our participants:
Nicole Schweikardt led off with the words,
“As a warmup to infinite State Systems, everything in my talk will be finite.”
She actually led off with a great introductory problem from her 2014 paper with Kreutzer: Define the following number of a graph: the number of ways of choosing three edges that don’t have any edges between them. Equivalently, their edge-neighborhoods are disjoint; they are an independent set in the edge graph of $G$; the subgraph induced by their six vertices is the graph of three disjoint edges. Now for any $n$ consider the cycle $C_{2n}$ versus the disjoint union of two copies of $C_n$:
Do they have the same number? I felt knowledgeable about these graphs since Dick and I were interested long ago in the problem of distinguishing these graphs via constant-depth circuits with various fancy gates. So I put some confidence in my reaction, “of course not.” But the answer is yes when $n$ is large enough.
The reason “why?” can be explained in a hand-waving manner by saying that if you number the edges in one half of the disjoint union and choose edges $i$ and $j$, you can equally well choose either of two symmetric options for a third edge in that component. One choice “attaches to” the end with $i$, the other to $j$, so the choices have the same degrees of freedom as in the bigger cycle $C_{2n}$. Well that is far from a proof, and what was beautiful in the rest of the talk and the paper is the connection between Hanf equivalence and logic that makes this rigorous. This leads to an open problem about extending the equivalence for order-invariant logics, which relates to what we covered two posts ago.
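One can at least confirm the phenomenon by brute force. The quantity being counted is the number of induced 3-matchings, and (assuming our reading of the example: one big cycle versus two disjoint half-size cycles) the counts agree:

```python
from itertools import combinations

def cycle_edges(n):
    return [(i, (i + 1) % n) for i in range(n)]

def count_induced_3_matchings(edges):
    """Count triples of edges whose six endpoints are distinct and whose
    induced subgraph contains no edge beyond the three chosen ones."""
    edge_set = {frozenset(e) for e in edges}
    total = 0
    for trio in combinations(edges, 3):
        vs = {v for e in trio for v in e}
        if len(vs) != 6:
            continue                       # two chosen edges share a vertex
        induced = sum(1 for pair in combinations(vs, 2)
                      if frozenset(pair) in edge_set)
        if induced == 3:
            total += 1
    return total

one_cycle = count_induced_3_matchings(cycle_edges(20))
two_cycles = count_induced_3_matchings(
    cycle_edges(10) + [(u + 100, v + 100) for u, v in cycle_edges(10)])
print(one_cycle, two_cycles)               # → 520 520
```

The brute force is cubic in the number of edges, which is fine for a sanity check but exactly the kind of counting the logical machinery handles wholesale.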
Richard Mayr led off the third talk by saying his contents were
“…partly known, partly nice observations, and partly half-baked.”
We raised a cheer recently for observations. Mayr began with a setup like Christos Papadimitriou’s games against Nature: a finite graph with some node choices controlled by the Player and the others random and one budget counter. His first example was the following graph:
The player begins at the start node with a balance of zero but can raise it as much as desired by choosing the edge labeled $+1$. To reach the goal, however, the player must eventually go to the random node. This carries some risk of the decrementing loop being taken more times than the $+1$ arc was chosen, thus bankrupting the player. But the risk can be made arbitrarily small. The key distinction is between the properties of reaching the goal with probability exactly $1$ and of reaching it with probability arbitrarily close to $1$.
When the total state space is finite these conditions are equivalent, but here, where the counter creates an infinite space, only the second holds.
The next issue is whether players are allowed to remember the game history so as to know the values of their counters, or at least test them for zero. The latter power suffices to emulate Marvin Minsky’s multiple-counter machines, so many problems on arbitrary graphs become undecidable. In restricted cases this leads to further interesting distinctions and questions. Let us add to the above graph an arrow from the goal back to the start. Can the player reach the goal infinitely often? There is no single strategy that assures this, but in case bad luck at the random node depleted the counter on the last go-round, a player with memory can use it to replenish the counter.
Mayr then went into energy games where nature is replaced by a second player who tries to bankrupt a counter. In a recent paper he and others carved out a decidable case of the problem of who wins. This is when there is just one energy counter and the game-graph itself is induced by a one-counter automaton.
Stefan Göller talked about $d$-dimensional vector addition systems with states (VASS). The standard definition without states is that you have a non-negative start vector and a set of vectors with positive and negative integer entries. The transitions allow adding a vector $v$ from the set to your current vector $u$, provided $u+v$ is non-negative in every coordinate. Thus it is a solitaire version of the energy games we just mentioned with $d$ counters, where $d$ is the dimension of the vectors. In Göller’s case you also change state—and this may affect the subset of vectors available to use at the next state.
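Reachability for such systems can always be semi-decided by brute-force search. Here is a bounded sketch for the stateless case; the cap on coordinate values is our own truncation, so only “yes” answers are conclusive:

```python
from collections import deque

def vas_reachable(start, moves, target, cap=50):
    """Breadth-first search over configurations of a (stateless) vector
    addition system, exploring only vectors with coordinates <= cap.
    A 'yes' answer is definitive; a 'no' only means not found below cap."""
    start, target = tuple(start), tuple(target)
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for m in moves:
            w = tuple(a + b for a, b in zip(v, m))
            # Transitions must keep every coordinate non-negative.
            if all(0 <= x <= cap for x in w) and w not in seen:
                seen.add(w)
                queue.append(w)
    return False

# From (1,0) with moves (1,1) and (-2,0): (1,0)→(2,1)→(0,1) reaches (0,1),
# but (0,0) is unreachable since the second coordinate never decreases.
print(vas_reachable((1, 0), [(1, 1), (-2, 0)], (0, 1)))   # → True
print(vas_reachable((1, 0), [(1, 1), (-2, 0)], (0, 0)))   # → False
```

The hard part, of course, is that true reachability sets need not stay below any fixed cap, which is what makes the complexity results below so delicate.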
Göller led off by quoting Dick’s lower bound for the simple case, which Dick covered near the start of this blog. In his talk he covered his recent joint paper showing that for $d = 2$ the problem is $\mathsf{PSPACE}$-complete. The real surprise here is the upper bound, since only a double-exponential time upper bound was known, dating from almost 30 years ago. The proof is a deep classification and analysis of three basic types of computational flow patterns. I don’t have time to reproduce the pretty pictures and the sketch of his proof from my notes, but the pictures appear in the paper.
Indeed, I have only gone through three of the first four talks on the first day—there was much great material in the rest and I will have to pick it up another time.
As our last post hinted with its discussion of possible connections between the Skolem Problem and Fermat’s Last Theorem, we suspect that number-theory issues are governing the complexity levels. Can this be brought out in a bird’s-eye view of the various reachability problems?
A small idea before the fireworks show
Thoralf Skolem was a mathematician who worked in mathematical logic, set theory, and number theory. He was the only known PhD student of Axel Thue, whose Thue systems were an early word-based model of computation. Skolem had only one PhD student, Øystein Ore, who did not work in logic or computation. Ore did, however, have many students including Grace Hopper and Marshall Hall, Jr., and Hall had many more including Don Knuth.
Today Ken and I try to stimulate progress on a special case of Skolem’s problem on linear sequences.
Although Ore worked mainly on ring theory and graph theory the seeds still collected around Skolem’s tree: Hall’s dissertation was titled “An Isomorphism Between Linear Recurring Sequences and Algebraic Rings.” Sequences defined by a finite linear operator are about the simplest computational process we can imagine:
$$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k}.$$
The coefficients $c_i$ and initial values $a_0, \dots, a_{k-1}$ can be integers or relaxed to be algebraic numbers. Skolem posed the problem of deciding whether there is ever an $n$ such that $a_n = 0$.
This is a kind of halting problem. It seems like it should be simple to analyze—it is just linear algebra—but it has remained open for over 80 years. We have discussed it several times before. This 2012 survey by Joel Ouaknine and James Worrell, plus this new one, give background on this and some related problems.
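For concrete integer data the sequence is trivial to generate; the hard part is certifying that no zero ever appears. Here is a minimal sketch (the function name and cutoff are ours) that searches for a zero up to a cutoff—a semi-decision procedure, which is essentially all that is known in general:

```python
def skolem_search(coeffs, init, limit):
    """Search the integer linear recurrence
        a_n = coeffs[0]*a_{n-1} + ... + coeffs[k-1]*a_{n-k}
    (with initial values init) for an index n < limit with a_n = 0.
    Returns the first such n, or None if none is found by the cutoff.
    Skolem's problem asks whether ANY such n exists; that is open."""
    a = list(init)
    for n, val in enumerate(a):      # check the initial values themselves
        if val == 0:
            return n
    while len(a) < limit:
        nxt = sum(c * a[-1 - i] for i, c in enumerate(coeffs))
        if nxt == 0:
            return len(a)
        a.append(nxt)
    return None
```

For example, `skolem_search([1, -1], [1, 1], 10)` finds the zero at $n = 2$ in the sequence $1, 1, 0, \dots$, while for the Fibonacci recurrence no zero is ever found—and none exists, though proving such facts for general sequences is the whole problem.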
Let $S(n)$ be
$$S(n) = a_1^n + a_2^n + \cdots + a_k^n,$$
where each $a_i$ is an algebraic integer. Our problem is:
Does there exist a natural number $n$ so that $S(n) = 0$?
This is a special case of the Skolem problem. It arises when the coefficients are the evaluations of the elementary symmetric polynomials at $a_1, \dots, a_k$ with alternating signs. For example, with $k = 2$ we get
$$S(n) = (a_1 + a_2)\,S(n-1) - a_1 a_2\,S(n-2),$$
which for $e_1 = a_1 + a_2$ and $e_2 = a_1 a_2$ gives
$$S(n) = e_1 S(n-1) - e_2 S(n-2),$$
and so on. For $k = 3$ we have
$$S(n) = e_1 S(n-1) - e_2 S(n-2) + e_3 S(n-3).$$
Then $S(n) = 0$ means $a_1^n + a_2^n + a_3^n = 0$. If the $a_i$ are nonzero integers then for odd $n > 1$ this is asking whether $(a_1, a_2, -a_3)$ is a solution to Pierre Fermat’s equation $x^n + y^n = z^n$, and we can simply answer “no.” Of course whether a given triple is a solution can be easier than asking whether the equation has a solution, but this shows our case contains some of the flavor of Fermat’s Last Theorem.
We can point up some minor progress on this problem. Our methods can handle somewhat more general cases where the sum of $n$-th powers is multiplied by $c\,n^d$ for some fixed constants $c$ and $d$, but we will stay with the simpler case. Our larger hope is that this case embodies the core of the difficulty in Skolem’s problem, so that solving it might throw open the road to the full solution.
Let’s begin the proof for the case when the exponent $n$ is a prime $p$. Suppose that $S(p) = 0$. Recall
$$S(p) = a_1^p + a_2^p + \cdots + a_k^p.$$
Clearly we can assume that $a_1 + a_2 + \cdots + a_k \neq 0$, else $n = 1$ already works. Note that this is decidable. Put $s = a_1 + a_2 + \cdots + a_k$. The key is to look at the quantity
$$s^p = (a_1 + a_2 + \cdots + a_k)^p,$$
where $p$ is a prime. We employ the following generalization of the binomial theorem:
$$(a_1 + \cdots + a_k)^p = \sum_{m_1 + \cdots + m_k = p} \binom{p}{m_1, \dots, m_k}\, a_1^{m_1} \cdots a_k^{m_k},$$
where
$$\binom{p}{m_1, \dots, m_k} = \frac{p!}{m_1!\, m_2! \cdots m_k!}.$$
The upshot is that all terms are divisible by a proper factor of $p!$ except those from the cases $m_i = p$, all other $m_j = 0$. Each gives a coefficient of $1$ and leaves the term $a_i^p$. When the exponent is a prime this factor must include $p$ itself. Thus we get that
$$s^p = S(p) + pA$$
for some $A$ that is an integer combination of monomials in the $a_i$, so that $A$ is an algebraic integer. But by the supposition $S(p) = 0$ this simplifies to $s^p = pA$, and so $s^p$ is divisible by $p$. Thus, taking norms down to the rational integers,
$$N(s)^p = N(s^p) = N(p)\,N(A).$$
Since $p$ divides $N(p)$ and $p$ is prime, $N(s)$ too is divisible by $p$. But $N(s)$ is independent of $p$. Hence $|N(s)|$ acts as a bound on any possible prime $p$ such that $S(p) = 0$. Testing the finitely many values of $p$ up to $|N(s)|$ thus yields a decision procedure for this restricted case of Skolem’s problem.
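When the $a_i$ are ordinary nonzero integers the norm of $s$ is just $s$ itself, so any prime $p$ with $S(p) = 0$ must divide $s = a_1 + \cdots + a_k$. A toy rendering of the resulting finite check (the function name is ours):

```python
def power_sum_zero_primes(a):
    """Return all primes p <= |s| with a_1^p + ... + a_k^p = 0, where
    s = sum(a) != 0.  Per the argument in the text, any prime p with
    S(p) = 0 must divide s, so this finite check decides the prime case
    for integer a_i."""
    s = sum(a)
    assert s != 0, "if s == 0 then n = 1 already gives S(n) = 0"
    hits = []
    for p in range(2, abs(s) + 1):
        # trial-division primality test, fine at this scale
        if all(p % d != 0 for d in range(2, int(p ** 0.5) + 1)):
            if sum(x ** p for x in a) == 0:
                hits.append(p)
    return hits
```

For instance, with $a = (3, 4, -5)$ we have $s = 2$, so the only prime to test is $2$; since $3^2 + 4^2 + (-5)^2 = 50 \neq 0$, no prime exponent works.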
Ken chimes in with an observation that might be distantly related: The Vandermonde determinant
$$\prod_{1 \le i < j \le n} (x_j - x_i)$$
is the “smallest” alternating polynomial in $n$ variables. Together with the symmetric polynomials it generates all alternating polynomials. When the $x_i$ are the $n$-th roots of unity it gives the determinant of the Fourier matrix up to sign. This determinant has absolute value
$$n^{n/2}.$$
It is also the product of the lengths of the $\binom{n}{2}$ chords formed by $n$ equally-spaced points on the unit circle. The observation is that this 2-to-the-nearly-linear quantity is extraordinarily finely tuned.
To see how, let’s estimate the product of the chords in what is caricatured as the style of physicists: The length of an average chord is about $\sqrt{2}$. So we can estimate the size of the product as
$$(\sqrt{2})^{\binom{n}{2}} \approx 2^{n^2/4}.$$
This is off by an order of magnitude in the exponent—not even close. We can be a little smarter and use the average length of a chord instead, integrating $2\sin(\theta/2)$ for $\theta$ from $0$ to $2\pi$ to get $\frac{4}{\pi}$. This is still a number greater than $1$ and plugs in to yield $2^{\Theta(n^2)}$ anyway.
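A quick numerical check (our own sketch) confirms both the fine-tuned value $n^{n/2}$ and how badly the averaged-chord estimate overshoots:

```python
import cmath
import math

def chord_product(n):
    """Product of the C(n,2) chord lengths among the n-th roots of unity;
    the Vandermonde/Fourier fact says this equals n**(n/2)."""
    pts = [cmath.exp(2j * math.pi * k / n) for k in range(n)]
    prod = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            prod *= abs(pts[i] - pts[j])
    return prod

exact = chord_product(10)              # 10**5 = 100000, up to rounding
naive = (4 / math.pi) ** (10 * 9 / 2)  # the "average chord" estimate
```

Already at $n = 30$ the averaged estimate $(4/\pi)^{\binom{30}{2}} \approx 10^{45}$ dwarfs the true value $30^{15} \approx 10^{22}$ by more than twenty orders of magnitude.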
Such a calculation looks silly but isn’t. If we enlarge the circle by a factor of $c > 1$ then every term in the product is multiplied by that factor and it dominates:
$$c^{\binom{n}{2}} \cdot n^{n/2} = 2^{\Theta(n^2)}.$$
If we shrink the circle by $c$ the opposite happens: we divide by $c^{\binom{n}{2}}$, which crushes everything to make the analogous quantity virtually zero. Furthermore this “big crush” happens under more-plausible slight perturbations such as forbidding any of the points from occupying the arc between $0$ and $\epsilon$ radians, which prevents the equal-spacing maximization when $n > 2\pi/\epsilon$. We covered this at length in 2011.
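The knife-edge dependence on the radius is easy to see numerically as well (again our own sketch):

```python
import cmath
import math

def chord_product(n, radius):
    """Product of the pairwise distances among n equally spaced points
    on a circle of the given radius."""
    pts = [radius * cmath.exp(2j * math.pi * k / n) for k in range(n)]
    prod = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            prod *= abs(pts[i] - pts[j])
    return prod

n = 16                              # 120 chords, each scaling with the radius
on_edge = chord_product(n, 1.0)     # the tuned value 16**8
blown_up = chord_product(n, 1.5)    # multiplied by 1.5**120: astronomical
crushed = chord_product(n, 0.5)     # divided by 2**120: virtually zero
```

Scaling by $c$ multiplies the product by $c^{\binom{n}{2}}$, which swamps the $n^{n/2}$ term in either direction.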
The underlying reality is that when you take the logarithm of the product of chords, the terms of all growth orders between $n \log n$ and $n^2$ all magically cancel. There are many more chords of length greater than $1$ than chords of length less than $1$, but the latter can be unboundedly short in a way that perfectly balances the multitudes of longer chords. The actual value $n^{n/2}$ seems tiny amidst these perturbative possibilities.
This gigantic cancellation reminds Dick and me of the present argument over the tiny observed magnitude of the cosmological constant $\Lambda$. Estimation via quantum field theory prescribes a value 120 orders of magnitude higher—one that would instantly cause our universe to explode in fireworks—unless vast numbers of terms exactly cancel. Quoting Wikipedia:
This discrepancy has been called “the worst theoretical prediction in the history of physics” … the cosmological constant problem [is] the worst problem of fine-tuning in physics: there is no known natural way to derive the tiny [value] from particle physics.
“Fine-tuning” of constants without explanation is anathema to science, and many scientists have signed onto theories that there is a multiverse with 500 or more orders of magnitude of universes, enough to generate some with the tiny value needed to allow life as we know it. However, any fine-tuning discovered in mathematics cannot be anathema. Perhaps the universe picks up the Fourier fine balancing act in ways we do not yet understand. More prosaically, the fine balance in quantities similar to the ones above could be just what makes Skolem’s problem hard.
I believe that the general case of the Skolem problem can be handled, not just the simple case, though extending the method beyond prime exponents seems hard. Ken and I are working on this. Meanwhile, we wish everyone Stateside a happy Fourth of July, whether or not that includes fireworks.
[added link to new survey in intro]
Oded Green, Marat Dukhan, and Richard Vuduc are researchers at Georgia Institute of Technology—my home institution. They recently presented a paper at the Federated Conference titled, “Branch-Avoiding Graph Algorithms.”
Today Ken and I would like to discuss their interesting paper, and connect it to quite deep work that arises in computational logic.
As a co-inventor of the Federated type meeting—see this for the story—I have curiously only gone to about half of them. One of the goals of these meetings is to get diverse researchers to talk to each other. One of the obstacles is that the language is often different. Researchers often call the same abstract concept by different names. One of my favorite examples was between “breadth-first search” and “garbage collection”—see the story here.
Another deeper reason is that they may be studying concepts that are related but not identical. We will study such an example today. It connects logicians with computer architects.
One group calls the algorithmic concept choicelessness and the other calls it reduced branching. This reminds us of the old song “Let’s Call the Whole Thing Off” by George and Ira Gershwin:
The song is most famous for its “You like to-may-toes and I like to-mah-toes” and other verses comparing their different regional dialects.
Theorists have been studying various restrictions on polynomial time algorithms for years. What is interesting—cool—is that recently some of these ideas have become important in practical algorithms. Our goal here is to explain the main ideas.
A binary string $x$ is naturally an ordered structure. We could present $x$ via a number $n$ standing for its length and a list of the positions $i$ such that $x_i = 1$. We could give that list in any order, but we generally can’t permute the labels $i$—doing so would change the string. We avoid fussing with this and just give the bits of the string in the canonical order.
When the input is a graph $G$, however, there are many possible orders in which the vertices can be labeled and the edges presented. Usually we give a number $n$ standing for the number of vertices and a list of edges, in some selected order. The point is that properties of the graph do not depend on the labeling. We would also like the output of algorithms not to depend on the order edges are presented. The relevant question is,
To what extent do the output and execution pattern of an algorithm $A$ depend on the labeling used for the graph and the order of presenting edges?
Ideally there would be no dependence. However, consider a simple sequential breadth-first search (BFS) algorithm that maintains a list $Q$ of not-yet-expanded members of the set $R$ of nodes reached from the start vertex $s$. Whenever $Q$ is nonempty it must choose a next $u \in Q$ to expand. The choice is arbitrary unless determined by the labeling ($u$ can be the least member of $Q$) or by the presentation order of neighbors of some previous vertex that were enqueued. An ordering might be “lucky” for some graphs if it minimizes the number of times a neighbor is generated that already belongs to $R$, and foreknowledge of a target $t$ and of $s$ and $t$ being connected can help the algorithm know when to stop.
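A small Python illustration (our own) of how the neighbor-presentation order leaks into BFS behavior:

```python
from collections import deque

def bfs_order(adj, s):
    """Sequential BFS from s over adjacency lists adj; returns the order
    in which vertices are dequeued.  The order within each adjacency
    list -- an artifact of how the edges were presented -- leaks into
    the output."""
    seen = {s}
    order, queue = [], deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

# The same 4-cycle presented with two different neighbor orders:
adj1 = [[1, 2], [0, 3], [0, 3], [1, 2]]
adj2 = [[2, 1], [0, 3], [0, 3], [1, 2]]
```

Both inputs encode the same graph, yet `bfs_order(adj1, 0)` gives `[0, 1, 2, 3]` while `bfs_order(adj2, 0)` gives `[0, 2, 1, 3]`.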
An example of giving different output—one that strikes us as a harder example—is the $O^*(2^k)$-time algorithm for finding a vertex cover of size $k$. It cycles through sequences $b \in \{0,1\}^k$. For each one, it takes the edges in order, and if neither endpoint $u$ nor $v$ belongs to the partial cover $C$, adds $u$ to $C$ if the next bit of $b$ is $0$, adding $v$ to $C$ otherwise. Not just the edge presentation but also the ordering used to distinguish $u$ from $v$ affects the final output $C$—for reasons apart from the ordering of the vertices.
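Here is our own rendering of that branching algorithm (the names are ours); running it on the same graph with two edge orders yields different families of covers:

```python
from itertools import product

def vc_candidates(edges, k):
    """For each bit string b of length k, scan the edges in order; at
    each uncovered edge (u, v), spend one unit of budget adding u if the
    next bit of b is 0 and v if it is 1.  Returns the distinct covers of
    size <= k found -- which ones appear depends on the edge order and
    on which endpoint plays the role of 'u'."""
    covers = set()
    for bits in product([0, 1], repeat=k):
        C, i, ok = set(), 0, True
        for u, v in edges:
            if u in C or v in C:
                continue
            if i == k:          # budget exhausted with an edge uncovered
                ok = False
                break
            C.add(u if bits[i] == 0 else v)
            i += 1
        if ok:
            covers.add(frozenset(C))
    return covers
```

On the path $0$–$1$–$2$, presenting the edges as `[(0, 1), (1, 2)]` versus `[(1, 2), (0, 1)]` with $k = 2$ produces different sets of covers, even though the graph is the same.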
Can we reduce or eliminate the conditional branching that depends on the orderings? That is the question we see as common ground between the two concepts.
The search for ways to eliminate the dependence on the linear ordering led logicians, years ago, to the notion of PTime logics. In particular they asked if it is possible to create a model of polynomial time that does not depend on the ordering of the input. It was conjectured that such a model does not exist.
For example the work of Andreas Blass, Yuri Gurevich, and Saharon Shelah studies an algorithmic model and accompanying logic called CPT for “Choiceless Polynomial Time.” We’ll elide details of their underlying “Abstract State Machine” (ASM) model which comes from earlier papers by Gurevich, but simply note that the idea is to replace arbitrary choices that usual algorithms make with parallel execution. The key restriction is that the algorithms must observe a polynomial limit on the number of code objects that can be executing at any one time. Here is how this plays out in their illustration of BFS:
The algorithm successively generates the “levels” of nodes at distances $d = 0, 1, 2, \dots$ from $s$. In case $v$ is a neighbor of more than one $u$ in the previous level, their ASM avoids having a separate branch for each edge and so avoids an exponential explosion. There are no more code objects than vertices active at any time, so the polynomial restriction is observed.
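In Python the level-at-a-time idea can be mimicked with set operations (our own toy, not the ASM formalism): each new level is computed in one bulk step, so no vertex-by-vertex choice is ever made and the output is independent of adjacency-list order.

```python
def bfs_levels(adj, s):
    """Choiceless-style BFS: compute each distance level as a single set
    operation.  A vertex reached through several edges enters its level
    only once, and the result does not depend on list order."""
    levels = [{s}]
    reached = {s}
    while levels[-1]:
        frontier = {v for u in levels[-1] for v in adj[u]} - reached
        reached |= frontier
        levels.append(frontier)
    return levels[:-1]          # drop the final empty frontier
```

On the 4-cycle from before, both presentations of the adjacency lists give the same levels `[{0}, {1, 2}, {3}]`.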
Not all polynomial-time decidable properties of graphs are so readily amenable to such algorithms, and indeed they prove that having a perfect matching is not expressible in their accompanying CPT logic even when augmented with knowledge of the cardinality of the vertex set. At the time they pointed out that:
The resulting logic expresses all properties expressible in any other PTime logic in the literature.
A different extension by Anuj Dawar employing linear algebra is not known to be comparable with CPT, according to slides from a talk by Wied Pakusa on his joint paper with Faried Abu Zaid, Erich Grädel, and Martin Grohe. That paper extends the following result by Shelah from bipartite graphs to more general structures with two “color classes”:
Theorem 1
Bipartite matching is decidable in CPT+Counting.
There are many classes of graphs such that restricted to graphs in that class, the power of full PTime logic with order is no greater than CPT over that class. These include graphs of bounded treewidth and graphs with excluded minors, along lines of work by Grohe that we covered some time back. For other classes the power of CPT is open.
We sense a possible and surprising connection between this beautiful foundational work of Blass, Gurevich, and Shelah and an extremely important practical program.
When a modern computer makes choices as it computes, it generally loses some performance. The reason is that modern computers make predictions about choices: they call it branch prediction. If the computer can correctly predict the branch taken, the choice made, then the computation runs faster. If it fails to make the right prediction it runs slower.
Naturally much work has gone into how to make branch predictions correctly. In their recent paper, Green, Dukhan, and Vuduc study a radical way to improve prediction: make fewer branches. Trivially if there are fewer choices made, fewer branches to predict, they should be able to increase how often they are right. They consider two classic algorithms for computing certain graph problems: connected components and breadth-first search (BFS). Their main result is that by rewriting the algorithms to make fewer choices they can improve the performance by as much as 30%-50%. This suggests that one should seek graph algorithms and implementations that avoid as many branches as possible.
They add:
As a proof-of-concept, we devise such implementations for both the classic top-down algorithm for BFS and the Shiloach-Vishkin algorithm for connected components. We evaluate these implementations on current x86 and ARM-based processors to show the efficacy of the approach. Our results suggest how both compiler writers and architects might exploit this insight to improve graph processing systems more broadly and create better systems for such problems.
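To illustrate the flavor with a toy of our own—this is not the implementation from their paper—here is label-propagation connected components in which the data-dependent comparison is replaced by the classic branchless-minimum bit trick:

```python
def branchless_min(a, b):
    """Select the smaller of two non-negative integers without a
    conditional jump: the mask -(a < b) is all-ones exactly when a < b.
    In C this compiles branch-free; in Python it merely illustrates
    the idea."""
    return b ^ ((a ^ b) & -(a < b))

def components(n, edges):
    """Propagate minimum labels along edges until a fixed point; each
    vertex ends with the smallest label in its component."""
    comp = list(range(n))
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            m = branchless_min(comp[u], comp[v])
            changed = changed or m != comp[u] or m != comp[v]
            comp[u] = comp[v] = m
    return comp
```

For example `components(6, [(0, 1), (1, 2), (3, 4)])` returns `[0, 0, 0, 3, 3, 5]`: two nontrivial components plus an isolated vertex.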
Here are the old-and-new algorithms for BFS in their paper:
Our speculative question is, can the latter algorithm be inferred from the CPT formalism of the logicians? How much “jiggery-pokery” of the previous illustration from the logicians’ paper would it take, or are we talking “applesauce”? More broadly, what more can be done to draw connections between “Theory A” and “Theory B” as discussed in comments to Moshe Vardi’s post here?
Are choices and branches really connected as we claim? Can we replace the search for algorithms that make no choices with algorithms that reduce choices, that is, branches? In a sense, can we make choices a resource like time and space—not trying to drive it to zero as with choice-free algorithms, but rather just reducing the number of choices? This may only lead to modest gains in performance, but in today’s world, where processor speeds are not growing as they did in the past, perhaps this is a very interesting question.
This is post number 128 under our joint handle “Pip” by one WordPress count, though other counts say 127 or 129. As we wrote in that linked post: Sometimes we will use Pip to ask questions in a childlike manner, mindful that others may have reached definite answers already. We have thoroughly enjoyed the partnership and look forward to reaching the next power of 2.
Plus visiting Michael Rabin and talking about Gödel’s Theorems
Michael Ben-Or and Michael Rabin have won the 2015 Dijkstra Prize for Distributed Computing. The citation says,
In [two] seminal papers, published in close succession in 1983, Michael Ben-Or and Michael O. Rabin started the field of fault-tolerant randomized distributed algorithms.
Today Ken and I wish to congratulate both Michaels on the well-deserved recognition of their brilliant work.
Ben-Or’s paper is titled, “Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols.” Rabin’s paper, “Randomized Byzantine Generals,” brought randomness to a distributed consensus problem whose catchy name had been coined in 1982. The award committee said:
Ben-Or and Rabin were the first to use randomness to solve a problem, consensus in an asynchronous distributed system subject to failures, which had provably no deterministic solution. In other words, they were addressing a computability question and not a complexity one, and the answer was far from obvious.
Their work has continued to have a huge impact on the whole area of distributed computing. The fact that a negative result, an impossibility result, could be circumvented by using randomness was quite surprising back then. It changed the face of distributed algorithms by showing that a problem with no deterministic solution might still have a solution provided randomness is allowed. Besides being a beautiful result, it has great practical importance, since randomness can be used in “real” algorithms.
We applaud the prize committee—Paul Spirakis, James Aspnes, Pierre Fraigniaud, Rachid Guerraoui, Nancy Lynch, and Yoram Moses—for making such a thoughtful choice, which shows that even “old” results can be recognized. It took years for Ben-Or and Rabin to be honored: we are glad they did not have to wait for the next power of two.
While we are thrilled to see Rabin honored for his work on distributed computing, we are even more excited to report that he is doing well. This year he had a serious health issue that required a major operation. The operation went well, but unfortunately during and after it Michael had serious complications.
I, Dick, just visited him at his home in Cambridge. I am happy to report that Michael seems to be on his way to a full recovery. He is as sharp as ever, and looks like he will be his normal self in the near future. This is great news.
We wish Michael and his wonderful wife Ruth and their daughters the best. His elder daughter, Tal Rabin, also works in cryptography; I heard her give a talk four years ago on their father-daughter research. Michael plans to be part of several upcoming events—a good sign that he is on the mend. Again our best wishes to Michael and his family.
Most of my conversation with Michael was about friends, gossip, and funny stories. As always it was fascinating to hear Michael tell stories—he is such a great story teller. As I left, after over two hours together, I talked to him about research—if only for a few minutes.
He surprised me by averring that he had been thinking anew about the famous Incompleteness Theorems of Kurt Gödel. Recall the first of these theorems implies that in any sufficiently powerful consistent theory, there exist true sentences that are unprovable, and the second says the consistency of the theory is one of these sentences. There are by now many proofs of these great results: some short and clever, some longer and more natural, some via unusual connections with other parts of mathematics.
What could be new here? Michael pointed to recent proofs he really liked. These include an AMS Notices article by Shira Kritchman and Ran Raz. The thrust is that one can view the Second Incompleteness Theorem through the lens of Kolmogorov complexity as a logical version of the famous Unexpected Examination paradox. As expressed by Timothy Chow, this paradox goes as follows:
A teacher announces in class that an examination will be held on some day during the following week, and moreover that the examination will be a surprise. The students argue that a surprise exam cannot occur. For suppose the exam were on the last day of the week. Then on the previous night, the students would be able to predict that the exam would occur on the following day, and the exam would not be a surprise. So it is impossible for a surprise exam to occur on the last day. But then a surprise exam cannot occur on the penultimate day, either, for in that case the students, knowing that the last day is an impossible day for a surprise exam, would be able to predict on the night before the exam that the exam would occur on the following day. Similarly, the students argue that a surprise exam cannot occur on any other day of the week either. Confident in this conclusion, they are of course totally surprised when the exam occurs (on Wednesday, say). The announcement is vindicated after all. Where did the students’ reasoning go wrong?
Michael said something about getting sharper bounds on the Kolmogorov complexity constant involved in their article. There wasn’t time to go into details, so we had to leave the discussion “incomplete.” So I asked Ken to try to help reconstruct what Michael was seeing and trying to do.
I, Ken, usually give only a quick taste of Gödel’s theorems in one lecture in Buffalo’s introductory graduate theory course. Let $N(e)$ be the predicate that the Turing machine with numerical code $e$ never halts when run on empty input. Let $F$ be a strong effective formal system such as Peano arithmetic or set theory. Then I show (or give as homework) the following two observations: the set $P = \{e : F \text{ proves } N(e)\}$ is computably enumerable, while the set $T = \{e : N(e) \text{ is true}\}$ is not computably enumerable.
Now if $P$ is not included in $T$, there is an $e$ such that $F$ proves $N(e)$ but $N(e)$ is false. $N(e)$ being false means that in the real world the machine does halt on input the empty string $\epsilon$. Hence there is some finite number $t$ such that the decidable predicate $H(e, t)$ (saying the machine halts in $t$ steps) is true. By the strength assumption on $F$ it proves all true and false cases of $H$, so $F$ proves $H(e, t)$; but $H(e, t)$ entails $\neg N(e)$, and since $F$ proves $N(e)$, $F$ also proves a contradiction. This makes $F$ inconsistent.
Thus if $F$ is consistent, then $P$ is included in $T$, and properly so by the c.e./not-c.e. reasoning. Taking any $e \in T \setminus P$ gives a true statement $N(e)$ that $F$ cannot prove.
Gödel’s diagonal method shows how to construct such a statement, and Gödel’s definition of incompleteness also requires showing that $F$ cannot prove $\neg N(e)$ either. This is more subtle: proving $\neg N(e)$ when $N(e)$ is true is not a violation of consistency as with proving $N(e)$ when $N(e)$ is false.
For reasons we put in Gödel’s own voice at the end of our second interview with him, it is possible to have a model with a non-standard integer in which the statements $\neg N(e)$ and $\neg H(e, t)$ for all standard $t$ all hold. This is why Gödel originally used a stronger condition he called $\omega$-consistency, which rules out proving $\exists t\, H(e, t)$ together with all the statements $\neg H(e, t)$. (As Wikipedia notes, since $H$ is a halting predicate the restricted case called $\Sigma_1$-soundness is enough.) It took Barkley Rosser in 1936 to make this too work with just consistency as the assumption.
But if all we care about is having a true statement that $F$ cannot prove, the c.e./not-c.e. argument appealing to consistency is enough. Then comes the “meta-argument” that if $F$ could prove its own consistency, then because $F$ can prove the c.e./not-c.e. part of the argument, $F$ would prove that some true statement $N(e)$ is unprovable. As Kritchman and Raz observe in related instances, this does not alone yield a concrete $e$ such that $F$ proves the unprovability of $N(e)$, which is the real contradiction needed to deduce the second theorem. Still, I think the above is a reasonable “hand-wave” to convey the import of the second theorem with minimal logical apparatus.
The question becomes, how concrete can we make this? Can we push the indeterminate quantity $t$ into the background and quantify ideas of logical strength and complexity in terms of a parameter that we can bound more meaningfully? Dick and I believe this objective is what attracted Rabin to the Kritchman-Raz article.
Kritchman and Raz obtain the second argument without hand-wave and with minimal “meta” by focusing on the Kolmogorov complexity of a binary string $x$:
$$K(x) = \min\{|p| : p \text{ outputs } x \text{ from empty tape}\}.$$
Here $|p|$ means the string length of $p$—that is, $K(x)$ is the length of the shortest program producing $x$ from empty tape. Now let us imagine a function $C$—for Gregory Chaitin—that takes as parameters a description of $F$ and a number $n$ and outputs a program $p$ that does the following on empty tape:
Search through all proofs in $F$ of statements of the form “$K(x) > n$” and as soon as one is found, output $x$.
Then $|p| \le c + \log n$, where $c$ is a constant independent of $n$. Whenever $n$ exceeds a constant $n_0$ satisfying $n_0 > c + \log n_0$—which can be written in closed form using the Lambert $W$-function—we have $|p| < n$.
Thus for $n > n_0$ there are no proofs in $F$ of the form “$K(x) > n$,” for any $x$—else by running $p$ until it finds such a proof and outputs $x$, we prove $K(x) \le |p| < n$ and so expose the inconsistency of $F$. Define $n = n_0 + 1$; then we need only find an $x$ such that $K(x) > n$ is true to prove the first theorem concretely. Most important, by simple counting that is provable in $F$, such an $x$ must exist among the finite set of binary strings of length $n$.
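The counting step is concrete enough to play with. True Kolmogorov complexity is uncomputable, so here is a toy description language of our own in which the same pigeonhole reasoning can be checked exhaustively:

```python
from itertools import product

def toy_decode(desc):
    """Toy description language: the first 3 bits of desc give a repeat
    count r in 1..8, the remaining bits a pattern; desc decodes to the
    pattern repeated r times."""
    if len(desc) < 4:
        return None
    r = int(desc[:3], 2) + 1
    return desc[3:] * r

def compressible(s):
    """Does s have a strictly shorter toy description?  Checked by brute
    force over all shorter descriptions."""
    for length in range(4, len(s)):
        for bits in product('01', repeat=length):
            if toy_decode(''.join(bits)) == s:
                return True
    return False
```

There are fewer than $2^L$ descriptions shorter than $L$, so incompressible strings of every length must exist. Indeed, among the 64 strings of length 6 only `'000000'`, `'111111'`, `'010101'`, and `'101010'` compress in this toy language; a string like `'011010'` does not.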
Kritchman and Raz conclude their argument by letting $A_i$ be the statement “at least $i$ strings $x$ of length $n$ have $K(x) > n$,” for $i = 1, \dots, 2^n$, and taking $A_{2^n + 1}$ to be false. There is exactly one $i$ such that $A_i \wedge \neg A_{i+1}$ is true, but $F$ cannot even prove $A_i \wedge \neg A_{i+1}$: By the truth of $\neg A_{i+1}$ there are $2^n - i$ strings $x$ of length $n$ with $K(x) \le n$. $F$ can guess and verify them all by running their programs of length at most $n$ to completion. If $F$ proves $A_i \wedge \neg A_{i+1}$ then $F$ deduces that all other $x$ have $K(x) > n$, which is impossible by the choice of $n$. Likewise, $F$ cannot prove $A_{i+1}$, since then together with the verified facts it would prove that more than $i$ of the remaining $i$ strings have $K(x) > n$.
We start a ball rolling, however, by observing that via the counting argument, $F$ does prove $A_1$. So either $\neg A_2$ is false or $F$ is inconsistent. This turns around to say that if $F$ proves its own consistency, then $F$ proves that $\neg A_2$ is false—which is like the “surprise exam” not being possible on the last day. But since $F$ proves $A_2$, either $\neg A_3$ is false or $F$ is inconsistent. This turns around to make $F$ deduce that its ability to prove its consistency implies the ability to prove $A_3$. This rips right on through to make $F$ prove $A_{2^n + 1}$, which however it can do only if $F$ really is inconsistent. Thus $F$ cannot prove its own consistency—unless it is inconsistent—which is Gödel’s second theorem.
The article by Kritchman and Raz has the full formal proof-predicate detail. There is a “$t$” lurking about—it’s the number of steps the programs outputting $x$-es need to take—but the structure ensures that the eventuality of a non-standard length-$t$ computation outputting an $x$ never arises. The finiteness of $n$ and $2^n$ drives the improvement.
Along with Michael we wonder, what can be done further with this? Can we turn the underlying computability questions into complexity ones? The natural place to start is, how low can the constant $c$ be? If $F$ is Peano arithmetic or set theory, can we pin it down explicitly? This seemed to be what Michael was saying. It depends on $F$. And there is another thought here: We don’t need the length of $F$’s axiomatization to be bounded, but rather its own Kolmogorov complexity, $K(F)$. Can we upper-bound this—for set theory or Peano—by setting up some kind of self-referential loop?
The main open problem is how fast will Michael be back in form. We hope that the answer to this open problem is simple: very soon. Congratulations again to him and Michael Ben-Or on the prize, and Happy Father’s Day to all.
[fixed inequality after Lambert W]