“Four Weddings” is a reality-based TV show that appears in America on the cable channel TLC. Yes, a TV show: not a researcher, not someone who has recently solved a long-standing open problem. Just a TV show.
Today I want to discuss a curious math puzzle that underlies this show.
The show raises an interesting puzzle about voting schemes:
How can we have a fair mechanism when all the voters have a direct stake in the outcome?
So let’s take a look at the show, since I assume not all of you are familiar with it. I do admit to watching it regularly—it’s fun. Besides the American version there are many others including a Finnish version known as “Neljät Häät” (Four Weddings), a German version called “4 Hochzeiten und eine Traumreise” (4 Weddings and One Dream Trip), and a French version called “4 Mariages pour 1 Lune de Miel” (4 Weddings for 1 Honeymoon). The last two remind me most of the 1994 British movie “Four Weddings and a Funeral” but there is no real connection.
There is keen interest worldwide, it seems, in weddings as they are a major life event. And of course, they are filled with lots of beautifully dressed people, lots of great displays of food and music, and lots of fun.
Like many shows, “Four Weddings” is based on a British show—do all good shows originate in the UK? Four brides, initially strangers, meet and then attend each other’s weddings. Each then scores the others’ weddings on various aspects: bridal gown, venue, food, and so on. Then the bride with the highest score wins a dream honeymoon. Of course there is the small unstated issue that the honeymoon, no matter how exotic, happens well after the actual wedding. Oh well.
The scoring method varies from season to season and also from country to country. But higher scores are better, and the brides get a chance on camera to explain why they scored how they did. A typical comment might be: “I loved the venue and the food, but the music was terrible.”
You get to see four different weddings, which is the main attraction in watching the show. Usually each wedding is a bit out there: you see weddings with unusual themes, with unusual venues, and other unusual features. If you are not ready to have an interesting wedding, to spend some extra time in making it special, then you have little chance of winning.
The puzzle to me is really simple: why would the brides rate each other fairly? They all want to win the honeymoon, the prize, so why ever give high ratings? Indeed.
There have been some discussions on the web on what makes the scoring work. Some have noticed that the most expensive weddings usually win.
The game-theoretically optimal move seems to be to give all the other brides low scores and hope that they act fairly. The trouble with this strategy is that you look bad—and who wants to look bad on a TV show that millions might see? Can we make a model that accounts for this? It does not have to embrace possible psychological factors at all—it just has to do well at predicting the observed ratings on the show.
I have thought about this somewhat and have a small idea. Perhaps some of you who are better at mechanism design could work out a scoring method that actually works. My idea is to penalize a score that is much lower than the others. A simple version could be something like this: Suppose the brides are Alice, Betty, Carol, and Dawn. If Alice’s wedding gets scores like this:
Betty: 7
Carol: 6
Dawn: 3,
then perhaps we deduct a point from Dawn’s total. She clearly is too low on Alice. Can we make some system like this really work?
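As a toy formalization (the one-point deduction and the definition of “much lower” are assumptions, not the show’s rules):

```python
def penalized_totals(scores, threshold=2, penalty=1):
    """scores[judge][candidate] = score given by judge to candidate.
    A bride's raw total is the sum of scores she receives.  A judge
    then loses `penalty` points from her own total whenever her score
    for some candidate sits more than `threshold` below the average
    of the other judges' scores for that same candidate."""
    totals = {}
    for judge in scores:
        # raw total = sum of scores this bride receives from the others
        totals[judge] = sum(s[judge] for j, s in scores.items() if j != judge)
    for judge, given in scores.items():
        for cand, score in given.items():
            others = [scores[j][cand] for j in scores if j not in (judge, cand)]
            avg = sum(others) / len(others)
            if avg - score > threshold:
                totals[judge] -= penalty  # deduct for an outlier-low score
    return totals
```

With the scores from the example (Betty 7, Carol 6, Dawn 3 on Alice’s wedding), Dawn’s score of 3 sits 3.5 points below the others’ average and draws the deduction.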
I have been discussing with Ken and his student Tamal Biswas some further applications of their work on chess to decision making. Their latest paper opens with a discussion of “level-$k$ thinking” and the “11-20 money game” introduced in a recent paper by Ayala Arad and Ariel Rubinstein.
In the game each player independently selects a number between 11 and 20 and instantly receives that amount in dollars. In addition, if one player chose a number exactly $1 below the other’s number, that player receives a bonus of $20 more. Thus if one player chooses the naively maximizing value $20, the other can profit by choosing $19 instead. The first player however could sniff out that strategy by secretly choosing $18 instead of $20. If the second player thinks and suspects that, the $19 can be revised down to $17. And so on in what sometimes becomes a race for the bottom, although the Nash equilibrium assigns non-zero probability only to the values $15 through $20.
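The equilibrium claim is easy to check numerically. The candidate mixed strategy below is from Arad and Rubinstein’s analysis; treat its exact probabilities as an assumption that the code verifies:

```python
def payoff(a, q):
    """Expected payoff of choosing a against an opponent mixing with
    probabilities q: you get a dollars, plus a 20-dollar bonus when
    the opponent happens to pick a+1 (you undercut by exactly one)."""
    return a + 20 * q.get(a + 1, 0.0)

# Candidate Nash equilibrium: support on 15..20 only.
q = {15: 0.25, 16: 0.25, 17: 0.20, 18: 0.15, 19: 0.10, 20: 0.05}

# Indifference check: every action in the support earns the same
# expected payoff, and every action outside earns strictly less.
in_support = {a: payoff(a, q) for a in q}
outside = {a: payoff(a, q) for a in range(11, 15)}
```

Against this mixture every supported action pays exactly $20 in expectation, while 11 through 14 pay strictly less, which confirms both the equilibrium and its support.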
In this game the level of thinking about what the opponent might do is simply represented by how far below $20 one’s chosen value lies. There is now a rich literature of studies of how real human players deviate from the Nash equilibrium, though they come closer to it under conditions of severe time pressure. The connection sought by Ken and Tamal relates to search depth in chess—that is, to how many moves a player looks ahead.
Ken does not know whether anyone has intensively treated the extension to $n > 2$ players. The following seems to be the most relevant way to define this:
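One plausible way to write the extension (this definition is our assumption, not from Arad and Rubinstein) keeps the undercutting bonus pairwise:

```latex
% n players each pick a_i in {11,...,20}; player i receives the
% chosen amount plus a $20 bonus for undercutting any other player
% by exactly one.  (A variant splits the bonus when several players
% undercut the same choice.)
u_i(a_1,\dots,a_n) \;=\; a_i \;+\; 20\cdot\mathbf{1}\bigl[\exists\, j\neq i:\ a_i = a_j - 1\bigr]
```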
It would be interesting to study this with $n = 4$ and compare the results to the observed behavior in the show “Four Weddings.” Could something like this be going on? Or are the brides simply being true to their own standards and gushing with admiration where merited? It could be interesting either way, whether they match or deviate from the projections of a simple game-theoretic model with scoring like this.
What is the right scoring method here? Is it possible to find one?
A new kind of ‘liar’ puzzle using Freestyle chess
By permission of Vierrae (Katerina Suvorova), source
Raymond Smullyan is probably the world’s greatest expert on the logic of lying and the logic of chess. He is still writing books well into his 10th decade. Last year he published a new textbook, A Beginner’s Guide to Mathematical Logic, and in 2013 a new puzzle book named for Kurt Gödel. His 1979 book The Chess Mysteries of Sherlock Holmes introduced retrograde analysis—taking back moves from positions to tell how they could possibly have arisen—to a wide public.
Today Ken and I wish to talk about whether we can ever play perfect chess—or at least better chess than any one chess program—by combining output from multiple programs that sometimes might “lie.”
We will start with Smullyan-style puzzles today, but they are prompted by an amazing and serious fact. Even though human players have been outclassed by computers for over a decade, humans judging between multiple programs have been able to beat those programs playing alone. This happens even when the human player is far from being a master—someone who would get crushed in minutes by programs available on smartphones. We want to know, how can this be?
By coincidence, yesterday’s New York Times Magazine has a feature on Terry Tao that likens discovering and proving theorems to “playing chess with the devil”—quoting Charles Fefferman:
The devil is vastly superior at chess, but […] you may take back as many moves as you like, and the devil may not. … If you are sufficiently wily, you will eventually discover a move that forces the devil to shift strategy; you still lose, but—aha!—you have your first clue.
On this blog we have previously likened perfect programs to “playing chess against God”—this was quoting Ken Thompson about endgames where perfect tables have been computed. Since the programs we consider here occasionally err—one can say lie—we will reserve the “devil” term in yesterday’s Times for them.
We, that is I and my Tech colleague Dr. Kathryn Farley, just visited Ken at his home base in Buffalo and had a great time. Of course the weather up there is near perfect this time of year and his family was wonderful to us. Plus we got to visit Wegmans—our pick for the greatest supermarket chain in the world.
One afternoon I was honored to sit in on a video conference in which Ken presented some of his research on using computer programs to evaluate the play of humans. Joining him via the Internet were two experts in so-called freestyle chess where humans are allowed access to multiple chess programs during a game. One-on-one the programs totally dominate the humans—even on laptops programs such as Stockfish and Komodo have Elo ratings well above 3100 whereas the best humans struggle to reach even 2900—but the human+computer “Centaurs” had better results than the computers alone. In the audience were representatives of defense and industrial systems that involve humans and computers.
Ken got into freestyle chess not as a player but because of his work on chess cheating—see this for example. Freestyle chess says “go ahead and cheat, and let’s see what happens…” The audience was not interested in cheating but rather in how combining humans and computers changes the game. While chess programs are extremely strong players, they may have weaknesses that humans can help avoid. Thus, the whole point of freestyle chess is:
Are humans + computers > computers alone?
That is the central question. Taken out of the chess context it becomes a vital question as computers move more and more into our jobs and our control systems. The chess context attracts interest because it involves extreme performance that can be precisely quantified, and at least until recently, the answer has been a clear “Yes.”
At the beginning of the video conference Ken spoke about the history of computer chess, giving his usual clear and precise presentation, and then reviewed his joint paper showing the good results for human-computer teams were no fluke—they really made better moves. Ken used some of the slides from the end of his TEDxBuffalo talk. Then two Freestyle experts spoke, including a winner of three tournaments who doesn’t even have a human chess rating, and an interesting discussion followed on how they actually used the computer chess programs.
I must admit at first I was a skeptic: how could weak players, humans, help strong players, computers? Part of it was that when two or more programs disagreed on the best move the human could make the choice. This kind-of means saying the programs whose move you don’t choose are wrong.
As Ken and I mulled over the idea of freestyle chess we realized it raises some interesting puzzles. I wrote a first draft, then Ken took over adding more detail and tricks to what follows. Let’s take a look at the puzzles now.
Suppose Alice has one program $X$ that is perfect except that there is one position $p$ in which $X$ makes an error. To simplify, let’s suppose the only outcomes are win ($W$) or loss ($L$). An error means that $p$ is a winning position for the player to move—it could be Bob, not Alice—but $X$ chooses a move after which that player is losing.
Let Alice start in a position $s$. She wants to play perfect chess using $X$ as her guide. Can she do so? The bad position $p$ might be reached in a game played from $s$—indeed $p$ might be $s$ itself.
Alice of course cannot tell by herself whether a given position has value $W$ or $L$, unless the position is right near the end of a game. But she has one advantage that $X$ lacks. She can play $X$ against itself from any position $q$. If $q$ is beyond $p$—at least if $p$ is not reachable in a correctly-played sequence of moves from $q$—then Alice will get the correct value of $q$.
This is like the power of a human centaur to try a program deeper on the moves it is suggesting. In symbols, Alice executes $X$ against itself starting from $q$, which generates a game sequence $q = q_0, q_1, \dots, q_m$ of positions where $q_m$ is checkmate for one side. The cost of this is $m$ calls to $X$. You might think we could let $X$ do the same thing, but $X$ is not like “Satan” in Smullyan’s story, “Satan, Cantor, and Infinity.” $X$ is not trying to out-psych Alice or correct itself; $X$ is just given as-is and by-hook-or-by-crook makes that error in some position $p$.
So Puzzle I is:
Can the “centaur” Alice + X play perfectly, even though neither Alice nor X plays perfectly alone? And at what cost compared to X?
The answer is, yes she can. Let the legal moves in $s$ be $m_1, \dots, m_k$, going to positions $s_1, \dots, s_k$, with $m_1$ the move recommended by $X$. In outline: Alice plays $X$ against itself from each $s_i$, obtaining a claimed value $v_i$ for that position from her point of view. If $v_1 = W$ she plays the recommended move $m_1$; if $v_1 = L$ but some other $v_i = W$, she switches to that $m_i$; otherwise she plays $m_1$ anyway. Her algorithm exemplifies a key idea that bridges to the more interesting puzzles.
We claim this algorithm, whose running time we count as linear in the game length $\ell$, plays perfectly in any $s$. If all values $v_i$ are $L$ but this is wrong, then some other $s_i$ is really a winning position. But then $X$ is wrong both at $s$ and at some position in the game from $s_i$, which is a contradiction. If $v_1 = W$ but $m_1$ is wrong anyway, then $X$ is wrong both at $s$ and somewhere at or beyond $s_1$.
Finally, if the “switch” is wrong, then $X$ errs somewhere beyond $s_i$ and either errs beyond $s_1$ as well or erred by choosing $m_1$ in $s$ after all. (If $v_i$ is wrong and all the other $s_j$ are losing, then $s$ was losing too, and so the switch wasn’t a mistake by Alice since it didn’t matter.)
There is one loophole that needs attention. $v_1$ and $v_i$ could both be wrong because their games go to a common position in which $X$ makes an error. However, that lone error cannot simultaneously flip a true $W$ to $L$ in the game from $s_1$ and a true $L$ to $W$ in the game from $s_i$, because $s_1$ and $s_i$ have the same player (Bob) to move. There is also the possibility that play from $s_i$ could go through $s_1$ or through some other $s_j$, which we leave you to analyze. We intend to rule out the latter, and we could also rule out the former by insisting that the “game tree” from positions really be a tree, not a DAG.
Boiled down, the idea is that $v_1 = L$ while some other $v_i = W$ is an “error signature” that Alice can recognize. If $X$ errs at $s$ then the signature definitely happens, because $X$ is perfect at each $s_i$ and one of the $s_i$ must be winning. If the signature happens yet $X$ did not err at $s$, then $s$ must really be a losing position. Hence once the signature happens, Alice can trust $X$ completely. The only way the error could possibly lie in her future is if she was losing at $s$ but lucks into a winning position—but then the error happened on Bob’s move, not hers.
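To make the procedure concrete, here is a toy executable version on a small game tree. The tree, the node names, and the placement of $X$’s single error are invented for illustration; the point is that self-play from the successor positions exposes the error signature:

```python
# Leaves map to the winner ('A' for Alice, 'B' for Bob); internal
# nodes map to lists of children.  Alice moves at 's', 'a1', 'a2'.
TREE = {
    's':  ['a', 'b'],    # Alice to move; 's' is really a win for her
    'a':  ['a1', 'a2'],  # Bob to move; every reply leaves Alice winning
    'b':  ['bL'],        # Bob to move; Bob wins down this line
    'a1': ['w1', 'l1'],  # Alice to move
    'a2': ['w2'],        # Alice to move
}
LEAF = {'w1': 'A', 'l1': 'B', 'w2': 'A', 'bL': 'B'}

def other(p):
    return 'B' if p == 'A' else 'A'

def true_value(pos, mover):
    """Perfect-play winner from pos with `mover` to move."""
    if pos in LEAF:
        return LEAF[pos]
    if any(true_value(c, other(mover)) == mover for c in TREE[pos]):
        return mover
    return other(mover)

BAD = 's'  # the lone position where X errs

def X(pos, mover):
    """X's recommendation: perfect except at BAD, where it picks a
    losing child even though a winning one exists."""
    kids = TREE[pos]
    good = [c for c in kids if true_value(c, other(mover)) == mover]
    losing = [c for c in kids if true_value(c, other(mover)) != mover]
    if pos == BAD and good and losing:
        return losing[0]
    return good[0] if good else kids[0]

def selfplay(pos, mover):
    """Run X against itself from pos; return the winner of that game."""
    while pos not in LEAF:
        pos, mover = X(pos, mover), other(mover)
    return LEAF[pos]

def alice_move(pos):
    """Alice's centaur rule: trust X unless the error signature shows
    up (recommended move evaluates as a loss while a sibling evaluates
    as a win), in which case switch to the winning sibling."""
    rec = X(pos, 'A')
    verdict = {c: selfplay(c, 'B') for c in TREE[pos]}
    if verdict[rec] == 'A':
        return rec
    winners = [c for c in TREE[pos] if verdict[c] == 'A']
    return winners[0] if winners else rec
```

Here $X$ misrecommends the losing move at the start, but self-play from the two successors returns the verdicts (win, loss) in the wrong order, and Alice switches.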
We claim also that all this logic is unaffected if “draw” is a possible outcome. Indeed, we could play with the old Arabian rules that giving stalemate is a win—counting it 0.8 points say—and that occupying the center with one’s king after leaving the opponent with a bare king is a win—worth say 0.7 points. It all works with “$W$” being the true value of the position (aside from a total loss) and “$L$” being any inferior value.
Now suppose $X$ is allowed to make errors in two positions. Can Alice modify her algorithm to still play perfectly?
First suppose the errors are related. Call two positions related if one of them can occur in a game played from the other in which all intervening moves are not errors. Per above we will always suppose that two positions reached in one move from $s$ are unrelated (else we would say the two options are not fully distinct). Related errors are ones that occur in related positions.
If Alice knows this about $X$, then we claim she can solve Puzzle II. She plays the game through as before, but now she looks for the error signature at all nodes in the game. If she never finds it then she plays $X$’s recommended moves. If she does, then she lets $r$ be the last position at which it occurs. Then she knows that either $X$ errs at $r$ or errs somewhere beyond $r$. Either way, she can use this knowledge to whittle down the possibilities at $r$. Or at least we think she can.
Notice, however, what has happened to Alice’s complexity. She is now running $X$ against itself at every node in a length-$\ell$ game path. Her time is now quadratic in $\ell$. This is still not terrible, not an exponential blowup of backtracking. But in honor of what Alberto Apostolico cared about, we should care about it here. So there is really a second puzzle: can Alice play perfectly in time $O(\ell)$?
If the errors are unrelated then we would like Alice to carry out the same algorithm as for Puzzle I. The logic is not airtight, however, because of the case where there were unrelated errors in the games from $s_1$ and $s_i$. Worse, what if Alice doesn’t know whether the errors are related?
Here comes the “Freestyle” idea of using multiple programs. Let us have two programs, $X$ and $Y$. Suppose one of them can make up to $e$ errors but the other is perfect. Alice does not know which is which. Now can she play perfectly—in linear time?
If the errors are related then she can localize them to one branch and have the same linear time complexity as before. For simplicity let’s suppose there are just two legal moves, going to $s_1$ and $s_2$. Here is her algorithm, in outline: play both $X$ and $Y$ against themselves from each of $s_1$ and $s_2$, getting verdict pairs $(x_1, y_1)$ and $(x_2, y_2)$. If the programs agree on both positions, the agreed verdicts are correct, and she moves to a winning position if there is one. If they disagree on one position only, the verdict on the other position is correct: she plays it if it is a win for her, and otherwise moves to the disputed position, since if she was losing anyway the choice cannot hurt her.
The remaining case is that the programs disagree on both positions—one pair is $(W, L)$ and the other is $(L, W)$. This cannot happen, because the faulty program’s related errors lie along one line of play, which passes through at most one of $s_1$ and $s_2$; disagreeing on both would mean that one of the programs is making two unrelated errors.
This is the idea Dick originally had after the videoconference on Freestyle chess. It shows the advantage of using multiple programs to check on each other like the centaur players do. But what happens if the errors are unrelated? Call that Puzzle III.
Now let’s allow both programs $X$ and $Y$ to make up to $e$ errors. Can Alice still play perfectly? We venture yes, but we hesitate to make this a formal claim because Puzzles II and III are already proving harder than expected.
How much does having a third program $Z$ that is perfect help?—of course not knowing which of the three programs is perfect. If instead $Z$ too can make up to $e$ errors, how much worse is that? Even if Alice can still play perfectly in polynomial time, we wonder if the exponent of $\ell$ will depend on $e$. Call all of this Puzzle IV.
We can add a further wrinkle that matters even for two errors: we can consider related errors to be just one error. This makes sense in chess terms because an error in a position $q$ that is reachable from a position $p$ can affect the search by a program in $p$. Thus the error at $q$ knocks on and makes the play at all nodes between $q$ and the root unreliable. Let $E$ be the set of all positions at which $X$ makes errors. Then we can define the branch-error count of $X$ to be the minimum $k$ such that there are positions $p_1, \dots, p_k$ such that $E$ is contained in the union of the sets of positions from which some $p_i$ is reachable. This is well-defined even when the positions form a DAG, not a tree.
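If the game positions form a tree, root-to-node paths are chains, so the minimum in the definition is achieved by taking the deepest error on each branch; the branch-error count is just the number of errors with no other error below them. A small sketch under that tree assumption:

```python
def branch_errors(parent, errors):
    """Minimum number of branches covering the error set in a tree.
    parent[v] gives v's parent; the root has parent None.  An error e
    can serve as a covering position p_i for every error above it, so
    we count the errors that have no other error strictly below them
    (equivalently, errors that are not proper ancestors of another)."""
    errors = set(errors)

    def ancestors(v):
        while parent[v] is not None:
            v = parent[v]
            yield v

    covered_above = set()
    for e in errors:
        covered_above.update(a for a in ancestors(e) if a in errors)
    return len(errors - covered_above)
```

Two errors on one root-to-leaf path count once; errors on divergent branches each need their own covering position.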
Thus programs could err in multiple positions but still count as having a single branch error if those positions are all on the same branch. Anything with multiple branch errors counts as Puzzle V. This is where our error model starts to get realistic, but as we often find in theory, there is a lot of already-challenging ground to cover before we get there. It is time to call it a day—or a post.
Our puzzles have some of Smullyan’s flavor. In a typical logic puzzle of his, Alice would be confronted by agents $X$ and $Y$ that have different behaviors in telling the truth to arbitrary questions. The solutions in his case rely on the ability to ask questions like:
If you are a person of type … , then what would you say to the question … ?
Our situations seem different, but perhaps there are further connections between our puzzles and Smullyan’s. What do you think?
Can you solve the puzzles of kinds II or III or higher? If you or we find a clear principle behind them then this will go into a followup post.
Update (7/31/15) The artist Vierrae, Katerina Suvorova of Russia, has graciously contributed two new portraits of Smullyan in oil. I have used her new version of Smullyan in a ‘Magus’ robe at the top. Here is her portrait of him in more formal wear, as if he were a dinner guest at an Oxford High Table.
The originals in higher resolution are viewable on her DeviantArt page. Our great thanks to her.
Cropped from TCS journal source
Alberto Apostolico was a Professor in the Georgia Tech College of Computing. He passed away on Monday after a long battle with cancer.
Today Ken and I offer our condolences to his family and friends, and our appreciation for his beautiful work.
Alberto was still active. He had a joint paper in the recent 2015 RECOMB conference, that is Research in Computational Molecular Biology. It was written with Srinivas Aluru and Sharma Thankachan and titled, “Efficient Alignment Free Sequence Comparison with Bounded Mismatches.” Srinivas is also here at Tech and wrote some words of appreciation:
[We] submitted two journal papers from this joint work this month, one on last Thursday night. … When I empathized with his situation, he would remind me that all of our lives are temporary and his situation is no different.
A full session of that conference was devoted to fighting cancer. One can only hope that some of the results of this and other theory conferences contribute to finally solving that problem.
Alberto’s research concerned how much work one needs to do to identify notable properties of words. We mean very long words, such as the textual representation of the human genome. Many “obvious” methods for processing strings do too much work. In a Georgia Tech feature several years ago, Alberto put it this way:
How do you compare things that are essentially too big to compare, meaning that the old ways of computing are no longer feasible, meaningful, or both? It’s one thing to compare and classify 30 proteins that are a thousand characters long; it’s another to compare a million species by their entire genomes, and then come up with a classification system for those species.
The theme of a special issue of the journal Theoretical Computer Science for Alberto’s 60th birthday in 2008 was:
Work is for people who do not know how to SAIL — String Algorithms, Information and Learning.
The foreword by Raffaele Giancarlo and Stefano Lonardi lists Alberto’s many contributions.
Giancarlo, who did a Master’s with Alberto in Salerno and then a PhD with Zvi Galil—our Dean at Georgia Tech—told some stories yesterday to a mailing list of their field. As an undergraduate feeling jitters attending a summer school taught by Apostolico on the island of Lipari off the north coast of Sicily, he was greatly heartened to see the leader arriving in his sailboat, named Obliqua (“Oblique”). A month ago Ken was in Sardinia—further north in the same waters—for a meeting of the World Chess Federation’s Anti-Cheating Committee, and offers this peaceful picture of the 41-foot yacht in which they took an excursion.
Ken and I have already talked about stringology—see here. Stringology is the study of the most basic objects in computing, linear finite sequences of letters, and is filled with deep and often surprising results. In the post we mentioned the surprise that an ordinary multitape Turing machine working in real time can print a 1 each time the letters it has read so far form a palindrome. One can also trace roots to the discovery that string matching—telling whether a pattern string occurs as a subword of a text and finding it if so—can be done in linear time.
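For concreteness, here is the classic linear-time matcher via failure links (this is standard Knuth-Morris-Pratt, not code from any of the papers discussed):

```python
def find(pattern, text):
    """Return the index of the first occurrence of pattern in text,
    or -1.  Runs in O(len(pattern) + len(text)) time."""
    if not pattern:
        return 0
    # fail[i] = length of the longest proper border of pattern[:i+1]
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # scan the text, never moving backward in it
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

The failure table is what avoids re-scanning: a mismatch slides the pattern by the longest border rather than restarting, which is why the total work stays linear.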
Alberto and Raffaele wrote a paper about Zvi before either wrote a paper with Zvi: “The Boyer-Moore-Galil String Searching Strategies Revisited.” This paper has a Wikipedia page under the name “Apostolico-Giancarlo algorithm.” By employing a compact database of certain substrings of the pattern string, with room to record their matches and non-matches to parts of the text, they showed how to reduce the overall number of character comparisons in the worst case. The best previously known constants in this and related linear measures were higher, and their bound matched a conjectured lower bound by Leo Guibas.
Before the paper, however, Alberto and Zvi edited and contributed to a highly influential collection of papers from a workshop in Italy sponsored by NATO’s Advanced Sciences Institute in 1984, titled Combinatorial Algorithms on Words. Many great people we know took part and wrote for the volume: Michael Rabin, Andy Yao, Andrew Odlyzko, Bob Sedgewick with Philippe Flajolet and Mireille Régnier, Andrei Broder, Joel Seiferas—and others such as Maxime Crochemore, Shimon Even, the aforementioned Guibas, Michael Main, Dominique Perrin, Wojciech Rytter, and James Storer. A contribution from Victor Miller and Mark Wegman (of universal hashing fame) titled “Variations on a Theme by Ziv and Lempel” is followed by one from Lempel and Ziv themselves.
Alberto and Zvi teamed on another volume in 1997, viz. the book Pattern Matching Algorithms. And yes they did co-write papers, including several on parallel algorithms for string problems—even the basic palindrome-finding problem. Their latest collaboration was a multi-author survey titled “Forty Years of Text Indexing,” which was a keynote presentation at the 2013 Combinatorial Pattern Matching symposium.
Alberto proved many surprising theorems during his long career. A recent example of his “out-of-the-box” approach is his 2008 paper with Olgert Denas titled, “Fast algorithms for computing sequence distances by exhaustive substring composition.” The abstract notes that the standard edit-distance measure
…hardly fulfills the growing needs for methods of sequence analysis and comparison on a genomic scale […] due to a mixture of epistemological and computational problems.
Well we have compressed the abstract with a little stringology ourselves. Recall our recent post on new evidence that the time to compute edit distance really is quadratic. Edit distance is so basic, it takes chutzpah to imagine that “alternative measures, based on the subword composition of sequences” could be both quicker and useful. The main theorem is a linear-time algorithm for a distance measure that seems to depend on quadratically-many pairs of substrings. This is a real feat of leveraging the ways words can combine. The paper goes on to show how this is programmed and applied—note also that the link to the paper is on the NIH website.
In the Georgia Tech feature we quoted above, Alberto said this about the genome:
It’s the closest thing we have to a message from outer space. We do not know where it comes from, understand very little of what it means, and have no clue about where it is going.
He went on to note the shift from pattern matching to pattern discovery—without saying how far he was in the vanguard on this.
Alberto captured discovery by the element of surprise, which can be quantified statistically. One of my favorite papers of his is titled, “Monotony of Surprise and Large-Scale Quest for Unusual Words,” with Mary Ellen Bock and the above-mentioned Lonardi. It is another victory in Alberto’s perpetual battle of linear over quadratic.
Here is the idea. Given a long string $t$ and a substring $w$, let $f(w)$ be the number of occurrences of $w$ in $t$. Now suppose $t$ is in the support of a random distribution in which each character depends only on a fixed finite number of previous characters, in a manner that is also independent of position. Let $F(w)$ be the random variable giving the number of occurrences of $w$ in strings drawn from this distribution. Then
$z(w) = (f(w) - E[F(w)]) / \sigma[F(w)]$

is a normal statistical z-score, where $E$ stands for expectation and $\sigma$ for standard deviation. If $z(w) > \tau$ for some high positive threshold $\tau$, then $w$ occurs unusually frequently in $t$ and so is a surprising substring. Substrings with $z(w) < -\tau$ are surprising by their lack of expected frequency in the given long text $t$.
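As a crude executable illustration (an i.i.d. uniform model with an assumed 4-letter alphabet and a binomial approximation to the variance, much weaker than the Markov models and exact formulas handled in the paper):

```python
import math

def count_occurrences(w, t):
    # Count (possibly overlapping) occurrences of w in t.
    return sum(t.startswith(w, i) for i in range(len(t) - len(w) + 1))

def z_score(w, t, alphabet_size=4):
    """z(w) = (f(w) - E[F(w)]) / sd[F(w)] under an i.i.d. uniform
    model.  The sd uses a binomial approximation, which ignores the
    correlations between overlapping occurrence positions."""
    n, m = len(t), len(w)
    slots = n - m + 1
    p = alphabet_size ** (-m)   # P(w occurs at one fixed position)
    mean = slots * p
    sd = math.sqrt(slots * p * (1 - p))
    return (count_occurrences(w, t) - mean) / sd
```

On a text of 100 A’s, the substring “AA” scores far above any reasonable threshold while “AC” scores negative, matching the intended reading of surprise by excess and by absence.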
Telling which substrings are surprising still seems to face the wall that there are quadratically many substrings overall. But now the authors work a little structural magic. Suppose we have a partition of the substrings into $O(n)$-many classes $C_1, \dots, C_m$ such that each class has a unique longest and a unique shortest member. Given $w$ in class $C_i$, put $w$ into $S^{+}$ if $z(w) > \tau$ and no other string in $C_i$ has a higher z-score, and into $S^{-}$ if $z(w) < -\tau$ and no other member is more negative. They prove:
Theorem 1 Given any $t$ of length $n$ and an $O(n)$-partition of its substrings as above, and any fixed threshold $\tau$, the sets $S^{+}$ and $S^{-}$ can be computed in $O(n)$ time.
We have skimmed some fine print about the complexity of the distribution and access to the longest and shortest strings in the classes $C_i$, plus how they make everything work also for other score functions. The way they combine the monotonicity of $z$ with regard to sub- and super-strings of $w$, properties of convexity and concavity, numerical analysis, and graph-theoretic diagrams of the substring structures is a tour de force—the paper rewards attention to its details.
We could go on… But let’s stop now and just repeat our condolences to his family and friends. Georgia Tech will be putting together a memorial in his honor—perhaps you will be able to attend.
Alberto is missed already.
Cropped from source
Joel Ouaknine is a Professor of Computer Science at Oxford University and a Fellow of St. John’s College there. He was previously a doctoral student at Oxford and made a critical contribution in 1998 of a kind I enjoyed as a student in the 1980s. This was contributing a win in the annual Oxford-Cambridge Varsity Chess Match, which in 1998 was won by Oxford, 5-3.
Today I’d like to report on some of the wonderful things that happened at a workshop on “Infinite-State Systems” hosted by Joel at the Bellairs Institute of McGill University last March 13–20 in Barbados, before we finally opened a chess set and played two games on the last evening.
The workshop was one of two happening concurrently at Bellairs, which it has been my pleasure to visit twice before, in 1995 and 2009. The other was on “Co-Algebras in Quantum Physics” and was co-organized by Prakash Panangaden, whom I used to know on Cornell’s faculty when I was a postdoc there. I often wished I could be in a quantum superposition between the workshops. My own talk for Joel’s workshop was on analyzing quantum circuits, and that anchored stimulating discussions I had with both workshops’ members during meals and excursions and other free time.
The other participants in ours were Dmitry Chistikov, Thomas Colcombet, Amelie Gheerbrant, Stefan Göller, Martin Grohe, Radu Iosif, Marcin Jurdzinski, Stefan Kiefer, Stephan Kreutzer, Ranko Lazic, Jerome Leroux, Richard Mayr, Peter Bro Miltersen, Nicole Schweikardt, and James Worrell. Göller and Grohe joined Joel and me for soccer in a nearby park on two lunch breaks—we didn’t just play chess.
The basic reachability problem is: Given a graph $G$, a starting node $s$, and a set $T$ of target nodes, is there a path from $s$ to some node in $T$? When $G$ is undirected the problem is solvable in logarithmic space. This is a deep result—for a long time only a randomized logspace algorithm was known—but a simple statement. When $G$ is directed the problem is complete for nondeterministic logspace ($\mathsf{NL}$). But $\mathsf{NL}$ is contained in $\mathsf{P}$, so it’s still solvable in polynomial time.
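For the explicit-list presentation, the baseline algorithm is plain breadth-first search (linear time, though also linear space, unlike the logspace algorithms just mentioned):

```python
from collections import deque

def reachable(adj, s, targets):
    """Is some node in `targets` reachable from s?  adj maps each
    node to its out-neighbors; this works for directed or undirected
    graphs alike and runs in O(nodes + edges) time."""
    targets = set(targets)
    seen, frontier = {s}, deque([s])
    while frontier:
        u = frontier.popleft()
        if u in targets:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return False
```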
Enough said about reachability? The meta-problem—really the problem—comes from other ways to present $G$ besides listing finitely many nodes and edges. If $G$ is presented by a circuit $C$ that recognizes its edge relation on node names given as binary strings, then reachability for $G$ jumps up to being $\mathsf{PSPACE}$-complete. Not all graphs have such a $C$ of poly size, but those that do include the transition graphs for polynomial space-bounded Turing machines with start configuration $s$ and accepting configurations $T$.
As the workshop’s name implies, we can consider “machines” that give rise to infinitely many states, and then things really get interesting. We just blogged about such a machine and its halting problem. Described in another way, the “machine” is an integer matrix $A$, which takes a vector $v$ to $Av$. The target states in $T$ are those with last component $0$. Are any of them reached in the sequence $v, Av, A^2v, A^3v, \dots$? This is a deterministic and discrete and simple linear system, yet nobody knows how to decide it. Allowing more general integer matrices does not change the nature of the problem, and equivalent forms include whether the lower-left (or upper-right) entry of $A^n$ ever becomes zero.
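In code the orbit question is easy to state but only semi-decidable to attack: iterate and watch the last component. A sketch (any matrix passed in is a stand-in; the actual matrix from our earlier post is not reproduced here):

```python
def orbit_hits_target(A, v, max_steps):
    """Iterate v, Av, A^2 v, ... over the integers and report the
    first step at which the last component is 0, or None if that does
    not happen within max_steps.  This is only a semi-decision
    procedure: a None answer proves nothing, which is exactly the
    difficulty of the problem."""
    for step in range(max_steps + 1):
        if v[-1] == 0:
            return step
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in A]
    return None
```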
So the complexity range of reachability runs from easy to (maybe-)undecidable. There are further questions one can ask once we allow computations that are actually infinite: Are some states in $T$ reached infinitely often? Does the system stay always within $T$ upon entry? When there is branching one can put probabilities on the transitions and ask questions about the probability of reaching $T$ being nonzero, or equal to $1$, or at least $\frac{1}{2}$, and so on. In a sense my own talk was about those questions when the transitions have amplitudes instead of probabilities. One can also associate infinitely many states to a finite graph by having one or more “budget counters” and giving each edge a label for how much it increments or debits a given counter when traversed. Finally one can partition the nodes among adversarial players who can vie to reach certain nodes and/or bankrupt some counters.
Overall the workshop impressed me with the wide range of interesting problems in this thread and the variety of their application areas. I’ll mention a few of the 16 talks now and intend more later. Here is a photo of our participants:
Nicole Schweikardt led off with the words,
“As a warmup to Infinite-State Systems, everything in my talk will be finite.”
She actually led off with a great introductory problem from her 2014 paper with Kreutzer: Define the -number of a graph to be the number of ways of choosing three edges that don’t have any edges between them. Equivalently, their edge-neighborhoods are disjoint; they form an independent set in the edge graph of ; the subgraph induced by their six vertices is the graph of three disjoint edges. Now for any consider:
Do they have the same -number? I felt knowledgeable about these graphs, since Dick and I were interested long ago in the problem of distinguishing them via constant-depth circuits with various fancy gates. So I put some confidence in my reaction, “of course not.” But the answer is yes when .
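To experiment with this, here is a brute-force counter. Guessing from the description in the next paragraph that the graphs compared are a single cycle versus the disjoint union of two half-length cycles, the counts agree already at cycle length 20; all names here are ours:

```python
from itertools import combinations

def triple_count(edges):
    """Count unordered triples of edges whose six endpoints are distinct
    and induce no edges other than the three chosen ones -- i.e. their
    edge-neighborhoods are disjoint.  Pure brute force."""
    eset = {frozenset(e) for e in edges}
    total = 0
    for trio in combinations(edges, 3):
        verts = {v for e in trio for v in e}
        if len(verts) != 6:
            continue  # two of the edges share an endpoint
        induced = {p for p in map(frozenset, combinations(verts, 2)) if p in eset}
        if induced == {frozenset(e) for e in trio}:
            total += 1
    return total

def cycle(n, offset=0):
    """Edge list of the cycle on vertices offset..offset+n-1."""
    return [(offset + i, offset + (i + 1) % n) for i in range(n)]

# One cycle of length 20 versus two disjoint cycles of length 10.
print(triple_count(cycle(20)), triple_count(cycle(10) + cycle(10, 10)))
```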
The reason “why ?” can be explained in a hand-waving manner by saying that if you number the edges in one half of and choose edges and , you can equally well choose or for a third edge in that component. One choice “attaches to” the end with , the other to , so the choices have the same degrees of freedom as in the bigger cycle . Well that is far from a proof, and what was beautiful in the rest of the talk and the paper is the connection between Hanf equivalence and logic that makes this rigorous. This leads to an open problem about extending the equivalence for order-invariant logics, which relates to what we covered two posts ago.
Richard Mayr led off the third talk by saying his contents were
“…partly known, partly nice observations, and partly half-baked.”
We raised a cheer recently for observations. Mayr began with a setup like Christos Papadimitriou’s games against Nature: a finite graph with some node choices controlled by the Player and the others random and one budget counter. His first example was the following graph:
The player begins at node with a balance of but can raise it as much as desired by choosing the edge with . To reach the goal , however, the player must eventually go to the random node . This carries some risk of the loop being taken more times than the arc was chosen at , thus bankrupting the player. But the risk can be made arbitrarily small. The key distinction is between the properties:
When the total state space is finite these conditions are equivalent, but here where the counter creates an infinite space only the second holds.
The next issue is whether players are allowed to remember the game history so as to know the values of their counters, or at least test them for zero. The latter power suffices to emulate Marvin Minsky’s multiple-counter machines, so many problems on arbitrary graphs become undecidable. In restricted cases this leads to further interesting distinctions and questions. Let us add to the above graph an arrow from back to . Can the player reach infinitely often? There is no single strategy that assures this, but in case bad luck at the random node depleted the counter on the last go-round, the player can use the memory to replenish it.
Mayr then went into energy games where nature is replaced by a second player who tries to bankrupt a counter. In a recent paper he and others carved out a decidable case of the problem of who wins. This is when there is just one energy counter and the game-graph itself is induced by a one-counter automaton.
Stefan Göller talked about -dimensional vector addition systems with states (VASS). The standard definition without states is that you have a non-negative start vector and a set of vectors with positive and negative integer entries. The transitions allow adding a vector v to your current vector u, provided u+v is non-negative. Thus it is a solitaire version of the energy games we just mentioned with counters, where is the dimension of the vectors. In Göller’s case you also change state—and this may affect the subset of vectors available to use at the next state.
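A minimal sketch of a single VASS step under this definition, with the encoding and names ours:

```python
def vass_successors(state, vector, transitions):
    """One step of a d-dimensional VASS: from a control state and a
    non-negative counter vector, a transition (q, delta, q2) may fire
    when the componentwise sum vector + delta stays non-negative."""
    out = []
    for (q, delta, q2) in transitions:
        if q != state:
            continue
        w = tuple(x + d for x, d in zip(vector, delta))
        if all(x >= 0 for x in w):
            out.append((q2, w))
    return out

# Tiny 2-VASS: state A loops adding (1, 0); moving A -> B costs (-2, +1).
T = [("A", (1, 0), "A"), ("A", (-2, 1), "B")]
print(vass_successors("A", (1, 0), T))   # only the self-loop can fire
```

From (1, 0) the move to B would drive the first counter negative, so only the self-loop is enabled; after enough self-loops both transitions fire.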
Göller led off by quoting Dick’s lower bound for the simple case, which Dick covered near the start of this blog. In his talk he covered his recent joint paper showing that for dimension 2 the problem is PSPACE-complete. The real surprise here is the upper bound, since only a double-exponential time upper bound was known, dating from almost 30 years ago. The proof is a deep classification and analysis of three basic types of computational flow patterns. I don’t have time to reproduce the pretty pictures and the sketch of his proof from my notes, but the pictures appear in the paper.
Indeed, I have only gone through three of the first four talks on the first day—there was much great material in the rest and I will have to pick it up another time.
As our last post hinted with its discussion of possible connections between the Skolem Problem and Fermat’s Last Theorem, we suspect that number-theory issues are governing the complexity levels. Can this be brought out in a bird’s-eye view of the various reachability problems?
A small idea before the fireworks show
Thoralf Skolem was a mathematician who worked in mathematical logic, set theory, and number theory. He was the only known PhD student of Axel Thue, whose Thue systems were an early word-based model of computation. Skolem had only one PhD student, Øystein Ore, who did not work in logic or computation. Ore did, however, have many students including Grace Hopper and Marshall Hall, Jr., and Hall had many more including Don Knuth.
Today Ken and I try to stimulate progress on a special case of Skolem’s problem on linear sequences.
Although Ore worked mainly on ring theory and graph theory the seeds still collected around Skolem’s tree: Hall’s dissertation was titled “An Isomorphism Between Linear Recurring Sequences and Algebraic Rings.” Sequences defined by a finite linear operator are about the simplest computational process we can imagine:
The coefficients and initial values can be integers or relaxed to be algebraic numbers. Skolem posed the problem of deciding whether there is ever an such that .
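Generating such a sequence and searching for a zero is easy; the hard part is knowing when to stop searching. A sketch with our own encoding, where the coefficients multiply the most recent terms first:

```python
def recurrence(coeffs, init, length):
    """Generate a linearly recurrent sequence
    a_n = c_1 a_{n-1} + ... + c_k a_{n-k} from coefficients [c_1..c_k]
    and initial values [a_0..a_{k-1}].  Skolem's problem asks whether
    a_n = 0 for some n."""
    a = list(init)
    k = len(coeffs)
    while len(a) < length:
        a.append(sum(c * x for c, x in zip(coeffs, a[-1:-k-1:-1])))
    return a

# a_n = 2 a_{n-1} - a_{n-2} with a_0 = -3, a_1 = -2 gives a_n = n - 3,
# so the sequence first vanishes at n = 3.
seq = recurrence([2, -1], [-3, -2], 10)
print(seq, seq.index(0))
```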
This is a kind of halting problem. It seems like it should be simple to analyze—it is just linear algebra—but it has remained open for over 80 years. We have discussed it several times before. This 2012 survey by Joel Ouaknine and James Worrell, plus this new one, give background on this and some related problems.
Let be
where each is an algebraic integer. Our problem is:
Does there exist a natural number so that ?
This is a special case of the Skolem problem. It arises when the coefficients are the evaluations of the elementary symmetric polynomials at with alternating signs. For example, with we get
which for and gives
and so on. For we have
Then the condition means that the corresponding power sum vanishes. If the values are nonzero integers, then for odd exponents this is asking whether they give a solution to Pierre Fermat’s equation, and we can simply answer “no.” Of course, checking whether a particular triple is a solution can be easier than asking whether the equation has any solution at all, but this shows our case contains some of the flavor of Fermat’s Last Theorem.
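The Fermat flavor can be seen concretely in a sketch (names ours) where the sequence is the power sum for fixed integer roots; with 3, 4, 5 it vanishes exactly at the Pythagorean exponent, and by Fermat-Wiles at no larger one:

```python
def power_sum_sequence(roots_pos, roots_neg, length):
    """a_n = sum of r^n over roots_pos minus sum of r^n over roots_neg.
    With roots_pos = [x, y] and roots_neg = [z], a_n = 0 says
    x^n + y^n = z^n."""
    return [sum(r**n for r in roots_pos) - sum(r**n for r in roots_neg)
            for n in range(length)]

# x = 3, y = 4, z = 5: a_2 = 0 (the Pythagorean case) and no other zero.
a = power_sum_sequence([3, 4], [5], 8)
print([n for n, v in enumerate(a) if v == 0])   # [2]
```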
We can point up some minor progress on this problem. Our methods can handle somewhat more general cases where the sum of -th powers is multiplied by for some fixed constants and , but we will stay with the simpler case. Our larger hope is that this case embodies the core of the difficulty in Skolem’s problem, so that solving it might throw open the road to the full solution.
Let’s begin the proof for the case when is a prime . Suppose that . Recall
Clearly we can assume that . Note that this is decidable. Put . The key is to look at the quantity
where is a prime. We employ the following generalization of the binomial theorem:
where
The upshot is that all terms are divisible by a proper factor of except those from the cases , all other . Each gives a factor of and leaves the term . When is a prime this factor must include itself. Thus we get that for some of the form
where is an algebraic integer. But by the supposition this simplifies to , and so is divisible by . Thus
Since , too is divisible by . But is independent of . Hence it acts as a bound on any possible prime such that . Testing the finitely many values up to this bound thus yields a decision procedure for this restricted case of Skolem’s problem.
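The divisibility fact powering the mod-p step can be checked numerically. A sketch, with names ours, verifying that for a prime p every mixed multinomial coefficient of the trinomial expansion is divisible by p (the "freshman's dream" behind the argument):

```python
from math import factorial
from itertools import product

def multinomial(ks):
    """Multinomial coefficient (k1 + ... + km)! / (k1! ... km!)."""
    out = factorial(sum(ks))
    for k in ks:
        out //= factorial(k)
    return out

# For prime p, p!/(i! j! k!) with i + j + k = p and all of i, j, k < p
# is divisible by p: the numerator has the factor p, the denominator
# cannot.  Only the three "pure" terms with some exponent equal to p
# survive modulo p.
p = 7
mixed = {multinomial((i, j, p - i - j)) % p
         for i, j in product(range(p), repeat=2)
         if i + j <= p and max(i, j, p - i - j) < p}
print(mixed)   # {0}
```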
Ken chimes in an observation that might be distantly related: The Vandermonde determinant
is the “smallest” alternating polynomial in variables. Together with the symmetric polynomials it generates all alternating polynomials. When the are the -th roots of unity it gives the determinant of the Fourier matrix up to sign. This determinant has absolute value
It is also the product of the lengths of the chords formed by equally-spaced points on the unit circle. The observation is that this 2-to-the-nearly-linear quantity is extraordinarily finely tuned.
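The fine balance is easy to verify numerically. A short sketch comparing the product of all chord lengths against n to the power n/2:

```python
import cmath
import math

def chord_product(n):
    """Product of the lengths of all C(n,2) chords among n equally
    spaced points on the unit circle; it matches n**(n/2), the absolute
    value of the Vandermonde/Fourier determinant."""
    pts = [cmath.exp(2j * math.pi * k / n) for k in range(n)]
    prod = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            prod *= abs(pts[i] - pts[j])
    return prod

for n in (3, 8, 12):
    print(n, round(chord_product(n), 6), round(n ** (n / 2), 6))
```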
To see how, let’s estimate the product of the chords in what is caricatured as the style of physicists: The length of an average chord is . So we can estimate the size of the product as
This is off by an order of magnitude in the exponent—not even close. We can be a little smarter and use the average length of a chord instead, integrating from to to get . This is still a number greater than and plugs in to yield anyway.
Such a calculation looks silly but isn’t. If we enlarge the circle by a factor of then every term in the product is multiplied by that factor and it dominates:
If we shrink the circle by the opposite happens: we divide by which crushes everything to make the analogous quantity virtually zero. Furthermore this “big crush” happens under more-plausible slight perturbations such as forbidding any of the points from occupying the arc between and radians, which prevents the equal-spacing maximization when . We covered this at length in 2011.
The underlying reality is that when you take the logarithm of the product of chords, the terms of all growth orders between and all magically cancel. There are many more chords of length than chords of length , but the latter can be unboundedly short in a way that perfectly balances the multitudes of longer chords. The actual value of seems tiny amidst these perturbative possibilities.
This gigantic cancellation reminds Dick and me of the present argument over the tiny observed magnitude of the cosmological constant . Estimation via quantum field theory prescribes a value 120 orders of magnitude higher—one that would instantly cause our universe to explode in fireworks—unless vast numbers of terms exactly cancel. Quoting Wikipedia:
This discrepancy has been called “the worst theoretical prediction in the history of physics” … the cosmological constant problem [is] the worst problem of fine-tuning in physics: there is no known natural way to derive the tiny [value] from particle physics.
“Fine-tuning” of constants without explanation is anathema to science, and many scientists have signed onto theories that there is a multiverse with 500 or more orders of magnitude of universes, enough to generate some with the tiny needed to allow life as we know it. However, any fine-tuning discovered in mathematics cannot be anathema. Perhaps the universe picks up the Fourier fine balancing act in ways we do not yet understand. More prosaically, the fine balance in quantities similar to above could be just what makes Skolem’s problem hard.
I believe that the general case of the Skolem problem can be attacked by this method, not just the simple case above, but the problem of handling more exponents than just primes seems hard. Ken and I are working on this. Meanwhile, we wish everyone Stateside a happy Fourth of July, whether or not that includes fireworks.
[added link to new survey in intro]
Oded Green, Marat Dukhan, and Richard Vuduc are researchers at Georgia Institute of Technology—my home institution. They recently presented a paper at the Federated Conference titled, “Branch-Avoiding Graph Algorithms.”
Today Ken and I would like to discuss their interesting paper, and connect it to quite deep work that arises in computational logic.
As a co-inventor of the Federated-type meeting—see this for the story—I have, curiously, gone to only about half of them. One of the goals of these meetings is to get diverse researchers to talk to each other. One of the obstacles is that the language is often different: researchers often call the same abstract concept by different names. One of my favorite examples was between “breadth-first search” and “garbage collection”—see the story here.
Another deeper reason is that they may be studying concepts that are related but not identical. We will study such an example today. It connects logicians with computer architects.
One group calls the algorithmic concept choicelessness and the other calls it reduced branching. This reminds us of the old song “Let’s Call the Whole Thing Off” by George and Ira Gershwin:
The song is most famous for its “You like to-may-toes and I like to-mah-toes” and other verses comparing their different regional dialects.
Theorists have been studying various restrictions on polynomial-time algorithms for years. What is interesting—cool—is that recently some of these ideas have become important in practical algorithms. Our goal here is to explain the main ideas.
A binary string is naturally an ordered structure. We could present via a number standing for and a list of such that . We could give that list in any order, but we generally can’t permute the labels —doing so would change the string. We avoid fussing with this and just give the bits of the string in the canonical order.
When the input is a graph , however, there are many possible orders in which the vertices can be labeled and the edges presented. Usually we give a number standing for and a list of edges, in some selected order. The point is that properties of the graph do not depend on the labeling. We would also like the output of algorithms not to depend on the order edges are presented. The relevant question is,
To what extent do the output and execution pattern of depend on the labeling used for the graph and the order of presenting edges?
Ideally there would be no dependence. However, consider a simple sequential breadth-first search (BFS) algorithm that maintains a list of not-yet-expanded members of the set of nodes reached from the start vertex . Whenever the list is nonempty it must choose a next vertex to expand. The choice is arbitrary unless determined by the labeling (it can be the least member of the list) or by the presentation order of neighbors of some previous vertex that were enqueued. An ordering might be “lucky” for some graphs if it minimizes the number of times a neighbor is generated that already belongs to the reached set, and foreknowledge of the graph and of its being connected can help the algorithm know when to stop.
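A sketch of such a search in the BFS family, with the arbitrary choice exposed as a parameter so one can watch the execution pattern change (encoding and names ours):

```python
def bfs_order(adj, start, pick=min):
    """Sequential search that expands one reached-but-unexpanded vertex
    at a time.  The 'pick' rule is the arbitrary choice: with pick=min it
    always takes the least label, with other rules the expansion order
    -- the execution pattern -- changes."""
    reached, frontier, order = {start}, {start}, []
    while frontier:
        u = pick(frontier)          # the arbitrary choice, made explicit
        frontier.remove(u)
        order.append(u)
        for v in adj.get(u, []):
            if v not in reached:
                reached.add(v)
                frontier.add(v)
    return order

adj = {0: [2, 1], 1: [3], 2: [3]}
print(bfs_order(adj, 0, min))   # [0, 1, 2, 3]
print(bfs_order(adj, 0, max))   # [0, 2, 3, 1]
```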
An example of giving different output—one that strikes us as a harder example—is the -time algorithm for finding a vertex cover of size . It cycles through sequences . For each one, it takes the edges in order, and if neither nor belongs to , adds to if the next bit of is , adding to otherwise. Not just the edge presentation but also the ordering used to distinguish from affects the final output —for reasons apart from the ordering of .
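Here is a sketch of that algorithm as we read the description, with our own encoding; note how the returned cover depends on the order in which bit strings are tried and the edges scanned:

```python
from itertools import product

def vertex_cover_2k(edges, k):
    """Classic O(2^k)-time branching for vertex cover of size <= k,
    derandomized over bit strings: for each string b, scan the edges in
    order and, at each uncovered edge (u, v), add u if the next bit of b
    is 1, else add v.  Return the first cover found, else None."""
    for bits in product((0, 1), repeat=k):
        cover, i, ok = set(), 0, True
        for (u, v) in edges:
            if u in cover or v in cover:
                continue            # edge already covered
            if i == k:
                ok = False          # out of budget for this bit string
                break
            cover.add(u if bits[i] else v)
            i += 1
        if ok:
            return cover
    return None

edges = [(0, 1), (0, 2), (0, 3)]
print(vertex_cover_2k(edges, 1))   # {0} covers the star
```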
Can we reduce or eliminate the conditional branching that depends on the orderings? That is the question we see as common ground between the two concepts.
The search for ways to eliminate the dependence on the linear ordering led logicians, years ago, to the notion of PTime logics. In particular they asked whether it is possible to create a model of polynomial time that does not depend on the ordering of the input. It was conjectured that such a model does not exist.
For example the work of Andreas Blass, Yuri Gurevich, and Saharon Shelah studies an algorithmic model and accompanying logic called CPT for “Choiceless Polynomial Time.” We’ll elide details of their underlying “Abstract State Machine” (ASM) model which comes from earlier papers by Gurevich, but simply note that the idea is to replace arbitrary choices that usual algorithms make with parallel execution. The key restriction is that the algorithms must observe a polynomial limit on the number of code objects that can be executing at any one time. Here is how this plays out in their illustration of BFS:
The algorithm successively generates the “levels” of nodes at distances from . In case is a neighbor of more than one , their ASM avoids having a separate branch for each edge and so avoids an exponential explosion. There are no more code objects than vertices active at any time, so the polynomial restriction is observed.
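A sketch of the level-by-level idea in ordinary sequential code, using set operations so that no per-vertex choice is made; this is our own illustration, not the ASM formalism:

```python
def bfs_levels(adj, start):
    """Level-synchronized BFS: each round expands the whole frontier 'in
    parallel' as one set operation, so the output -- the list of distance
    levels -- is independent of vertex labeling and edge order."""
    levels = [{start}]
    reached = {start}
    while True:
        nxt = {v for u in levels[-1] for v in adj.get(u, [])} - reached
        if not nxt:
            return levels
        reached |= nxt
        levels.append(nxt)

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_levels(adj, 0))   # [{0}, {1, 2}, {3}]
```

Note that vertex 3 appears in one level even though it is a neighbor of both 1 and 2, mirroring how the ASM avoids a separate branch per edge.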
Not all polynomial-time decidable properties of graphs are so readily amenable to such algorithms, and indeed they prove that having a perfect matching is not expressible in their accompanying CPT logic even when augmented with knowledge of . At the time they pointed out that:
The resulting logic expresses all properties expressible in any other PTime logic in the literature.
A different extension by Anuj Dawar employing linear algebra is not known to be comparable with CPT, according to slides from a talk by Wied Pakusa on his joint paper with Faried Abu Zaid, Erich Grädel, and Martin Grohe. That paper extends the following result by Shelah from bipartite graphs to more general structures with two “color classes”:
Theorem 1
Bipartite matching is decidable in CPT+Counting.
There are many classes of graphs such that, restricted to graphs in the class, the power of full PTime logic with order is no greater than that of CPT. These include graphs of bounded treewidth and graphs with excluded minors, along lines of work by Grohe that we covered some time back. For other classes the power of CPT is open.
We sense a possible and surprising connection between this beautiful foundational work of Blass, Gurevich, and Shelah and an extremely important practical program.
When a modern computer makes choices during a computation, it generally loses some performance. The reason is that modern processors predict those choices: this is called branch prediction. If the processor correctly predicts the branch taken, the choice made, then the computation runs faster. If it fails to make the right prediction, it runs slower.
Naturally much work has gone into how to make branch predictions correctly. In their recent paper, Green, Dukhan, and Vuduc study a radical way to improve prediction: make fewer branches. Trivially, if fewer choices are made, there are fewer branches to predict, and the predictor should be right more often. They consider two classic graph problems: connected components and breadth-first search (BFS). Their main result is that by rewriting the algorithms to make fewer choices they can improve performance by as much as 30%-50%. This suggests that one should seek graph algorithms and implementations that avoid as many branches as possible.
They add:
As a proof-of-concept, we devise such implementations for both the classic top-down algorithm for BFS and the Shiloach-Vishkin algorithm for connected components. We evaluate these implementations on current x86 and ARM- based processors to show the efficacy of the approach. Our results suggest how both compiler writers and architects might exploit this insight to improve graph processing systems more broadly and create better systems for such problems.
Here are the old-and-new algorithms for BFS in their paper:
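We cannot reproduce their figure here, but the flavor of the transformation can be sketched as follows. This is our own toy illustration of replacing a data-dependent branch by an unconditional update, not the code from their paper:

```python
def bfs_branchy(adj, n, src):
    """Textbook top-down BFS: one hard-to-predict branch per neighbor."""
    dist = [n] * n              # n acts as 'infinity'
    dist[src] = 0
    frontier = [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if dist[v] == n:            # data-dependent branch
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

def bfs_branch_avoiding(adj, n, src):
    """Same distances computed with the per-neighbor branch replaced by
    an unconditional min-update (which compilers can often lower to a
    branch-free conditional move).  A sketch of the idea only; it trades
    work for predictability, Bellman-Ford style."""
    dist = [n] * n
    dist[src] = 0
    for _ in range(n):                      # at most n relaxation rounds
        for u in range(n):
            for v in adj[u]:
                dist[v] = min(dist[v], dist[u] + 1)   # no if-statement
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_branchy(adj, 4, 0), bfs_branch_avoiding(adj, 4, 0))
```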
Our speculative question is, can the latter algorithm be inferred from the CPT formalism of the logicians? How much “jiggery-pokery” of the previous illustration from the logicians’ paper would it take, or are we talking “applesauce”? More broadly, what more can be done to draw connections between “Theory A” and “Theory B” as discussed in comments to Moshe Vardi’s post here?
Are choices and branches really connected as we claim? Can we replace the search for algorithms that make no choices with a search for algorithms that reduce choices and branches? In a sense, can we make choice a resource like time and space, and rather than driving it to zero as choice-free algorithms do, simply reduce the number of choices? This may lead only to modest gains in performance, but in today's world, where processor speeds are not growing as in the past, perhaps this is a very interesting question.
This is post number 128 under our joint handle “Pip” by one WordPress count, though other counts say 127 or 129. As we wrote in that linked post: Sometimes we will use Pip to ask questions in a childlike manner, mindful that others may have reached definite answers already. We have thoroughly enjoyed the partnership and look forward to reaching the next power of 2.
Plus visiting Michael Rabin and talking about Gödel’s Theorems
Michael Ben-Or and Michael Rabin have won the 2015 Dijkstra Prize for Distributed Computing. The citation says,
In [two] seminal papers, published in close succession in 1983, Michael Ben-Or and Michael O. Rabin started the field of fault-tolerant randomized distributed algorithms.
Today Ken and I wish to congratulate both Michaels on the well-deserved recognition of their brilliant work.
Ben-Or’s paper is titled, “Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols.” Rabin’s paper, “Randomized Byzantine Generals,” brought randomness to a distributed consensus problem whose catchy name had been coined in 1982. The award committee said:
Ben-Or and Rabin were the first to use randomness to solve a problem, consensus in an asynchronous distributed system subject to failures, which had provably no deterministic solution. In other words, they were addressing a computability question and not a complexity one, and the answer was far from obvious.
Their work has continued to have a huge impact on the whole area of distributed computing. The fact that a negative result, an impossibility result, could be circumvented by using randomness was quite surprising back then. It changed the face of distributed algorithms by showing that a problem with no deterministic solution might still have a solution provided randomness is allowed. Besides being a beautiful result, it has great practical importance, since randomness can be used in “real” algorithms.
We applaud the prize committee—Paul Spirakis, James Aspnes, Pierre Fraigniaud, Rachid Guerraoui, Nancy Lynch, and Yoram Moses—for making such a thoughtful choice, which shows that even “old” results can be recognized. It took years for Ben-Or and Rabin to be honored: we are glad they did not have to wait for the next power of two.
While we are thrilled to see Rabin honored for his work on distributed computing, we are even more excited to report that he is doing well. This year he had a serious health issue that required a major operation. The operation went well, but unfortunately during and after it Michael had serious complications.
I, Dick, just visited him at his home in Cambridge. I am happy to report that Michael seems to be on his way to a full recovery. He is as sharp as ever, and looks like he will be his normal self in the near future. This is great news.
We wish Michael and his wonderful wife Ruth and their daughters the best. His elder daughter, Tal Rabin, also works in cryptography; I heard her give a talk four years ago on their father-daughter research. Michael plans to be part of several upcoming events—a good sign that he is on the mend. Again our best wishes to Michael and his family.
Most of my conversation with Michael was about friends, gossip, and funny stories. As always it was fascinating to hear Michael tell stories—he is such a great storyteller. As I left, after over two hours together, I talked to him about research—if only for a few minutes.
He surprised me by averring that he had been thinking anew about the famous Incompleteness Theorems of Kurt Gödel. Recall that the first of these theorems implies that in any sufficiently powerful consistent theory there exist true sentences that are unprovable, and the second says that the consistency of the theory is one of these sentences. There are by now many proofs of these great results: some short and clever, some longer and more natural, some via unusual connections with other parts of mathematics.
What could be new here? Michael pointed to recent proofs he really liked. These include an AMS Notices article by Shira Kritchman and Ran Raz. The thrust is that one can view the Second Incompleteness Theorem through the lens of Kolmogorov complexity as a logical version of the famous Unexpected Examination paradox. As expressed by Timothy Chow, this paradox goes as follows:
A teacher announces in class that an examination will be held on some day during the following week, and moreover that the examination will be a surprise. The students argue that a surprise exam cannot occur. For suppose the exam were on the last day of the week. Then on the previous night, the students would be able to predict that the exam would occur on the following day, and the exam would not be a surprise. So it is impossible for a surprise exam to occur on the last day. But then a surprise exam cannot occur on the penultimate day, either, for in that case the students, knowing that the last day is an impossible day for a surprise exam, would be able to predict on the night before the exam that the exam would occur on the following day. Similarly, the students argue that a surprise exam cannot occur on any other day of the week either. Confident in this conclusion, they are of course totally surprised when the exam occurs (on Wednesday, say). The announcement is vindicated after all. Where did the students’ reasoning go wrong?
Michael said something about getting sharper bounds on the Kolmogorov complexity constant involved in their article. There wasn’t time to go into details, so we had to leave the discussion “incomplete.” So I asked Ken to try to help reconstruct what Michael was seeing and trying to do.
I, Ken, usually give only a quick taste of Gödel’s theorems in one lecture in Buffalo’s introductory graduate theory course. Let be the predicate that the Turing machine with numerical code never halts when run on empty input. Let be a strong effective formal system such as Peano arithmetic or set theory. Then I show (or give as homework) the following two observations:
Now if is not included in , there is an such that proves but is false. The statement being false means that in the real world the machine does halt on input the empty string . Hence there is some finite number such that the decidable predicate (saying halts in steps) is true. By the strength assumption on it proves all true and false cases of , so proves , but since proves , also proves . This makes inconsistent.
Thus if is consistent, then is included in , and properly so by the c.e./not-c.e. reasoning. Taking any gives a true statement that cannot prove.
Gödel’s diagonal method shows how to construct such a , and Gödel’s definition of incompleteness also requires showing that cannot prove either. This is more subtle: proving when is true is not a violation of consistency as with proving when is false. Note that
For reasons we put in Gödel’s own voice at the end of our second interview with him, it is possible to have a model with a non-standard integer in which the statements and for all all hold. This is why Gödel originally used a stronger condition he called -consistency which rules out proving and all the statements . (As Wikipedia notes, since is a halting predicate the restricted case called -soundness is enough.) It took Barkley Rosser in 1935 to make this too work with just consistency as the assumption.
But if all we care about is having a true statement that cannot prove, the c.e./not-c.e. argument appealing to consistency is enough. Then comes the “meta-argument” that if could prove its own consistency, then because can prove the c.e./not-c.e. part of the argument, would prove . As Kritchman and Raz observe in related instances, this does not alone yield a concrete such that proves , which is the real contradiction needed to deduce the second theorem. Still, I think the above is a reasonable “hand-wave” to convey the import of the second theorem with minimal logical apparatus.
The question becomes, How concrete can we make this? Can we push the indeterminate quantity into the background and quantify ideas of logical strength and complexity in terms of a parameter that we can bound more meaningfully? Dick and I believe this objective is what attracted Rabin to the Kritchman-Raz article.
Kritchman and Raz obtain the second argument without hand-wave and with minimal “meta” by focusing on the Kolmogorov complexity of a binary string :
Here means the string length of —that is, the length of the program producing from empty tape. Now let us imagine a function —for Gregory Chaitin—that takes as parameters a description of and a number and outputs a program that does the following on empty tape:
Search through all proofs in of statements of the form ‘‘ and as soon as one is found, output .
Then , where is a constant independent of . Whenever exceeds a constant
where is the Lambert W-function, we have .
Thus for there are no proofs in of the form , for any —else by running until it finds such a proof and outputs we prove and so expose the inconsistency of . Define ; then we need only find such that is true to prove the first theorem concretely. Most important, by simple counting that is provable in , such a must exist among the finite set of binary strings of length .
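The "simple counting" here is worth displaying; it is the pigeonhole fact that there are fewer short descriptions than strings of a given length:

```latex
% Fewer programs of length < L than binary strings of length L:
\[
  \#\{\, p : |p| < L \,\}
  \;\le\; \sum_{i=0}^{L-1} 2^{i}
  \;=\; 2^{L} - 1
  \;<\; 2^{L}
  \;=\; \#\{\, x \in \{0,1\}^{L} \,\}.
\]
```

Hence some string of length L is not the output of any shorter program and so has Kolmogorov complexity at least L, and this pigeonhole step is elementary enough to formalize inside the theory.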
Kritchman and Raz conclude their argument by letting be the statement “at least strings have ” for , and (taking false). There is exactly one such that is true, but cannot even prove : By the truth of there are strings with . can guess and verify them all by running their programs with to completion. If proves then deduces that all other have , which is impossible by the choice of . Likewise, cannot prove since then it proves for every .
We start a ball rolling, however, by observing that via the counting argument, does prove . So either is false or is inconsistent. This turns around to say that if proves its own consistency, then proves that is false—which is like the “surprise exam” not being possible on the last day. But since proves , proves . Hence either is false or is inconsistent. This turns around to make deduce that its ability to prove its consistency implies the ability to prove . This rips right on through to make prove , which however it can do only if really is inconsistent. Thus cannot prove its own consistency—unless it is inconsistent—which is Gödel’s second theorem.
The article by Kritchman and Raz has the full formal proof-predicate detail. There is a “” lurking about—it’s the number of steps the programs outputting -es need to take—but the structure ensures that the eventuality of a non-standard length- computation outputting an never arises. The finiteness of and drives the improvement.
Along with Michael we wonder, what can be done further with this? Can we turn the underlying computability questions into complexity ones? The natural place to start is, how low can be? If is Peano arithmetic or set theory, can we take ? This seemed to be what Michael was saying. It depends on . And there is another thought here: We don’t need the length of to be bounded, but rather its own Kolmogorov complexity, . Can we upper-bound this—for set theory or Peano—by setting up some kind of self-referential loop?
The main open problem is how fast will Michael be back in form. We hope that the answer to this open problem is simple: very soon. Congratulations again to him and Michael Ben-Or on the prize, and Happy Father’s Day to all.
[fixed inequality after Lambert W]
David Sanger and Julie Davis are reporters for the paper of record—the New York Times. Their recent article starts:
WASHINGTON—The Obama administration on Thursday announced what appeared to be one of the largest breaches of federal employees’ data, involving at least four million current and former government workers in an intrusion that officials said apparently originated in China.
The compromised data was held by the Office of Personnel Management, which handles government security clearances and federal employee records. The breach was first detected in April, the office said, but it appears to have begun at least late last year.
The target appeared to be Social Security numbers and other “personal identifying information,” but it was unclear whether the attack was related to commercial gain or espionage. …
Today Ken and I want to suggest a new approach to data breaches like this.
Before we explain the method let’s just say that it looks pretty grim for protecting data. If the White House cannot protect the emails of the President, and if the Office of Personnel Management cannot protect federal employees’ Social Security numbers, then perhaps it is time to give up.
We know huge sums are being spent on solving the security problem; many groups, centers, agencies, and researchers are working on it; countless conferences have their focus on this problem. But attacks occur, and we still lose information.
Perhaps it is a fundamental law of complex software systems that they will never be secure—that bad actors will always be able to break in and steal data. Perhaps this problem is unsolvable: not just hard, or expensive, or difficult. Perhaps it is as impossible to solve as trisecting an angle with only a ruler and a compass, or as impossible as the Halting Problem. Perhaps.
If this is the case, then we have a suggested approach to security that stops trying to solve the stealing-data problem. The approach is quite different and we will now explain it.
Jigoro Kano, the founder of the martial arts discipline Judo, once wrote:
In short, resisting a more powerful opponent will result in your defeat, whilst adjusting to and evading your opponent’s attack will cause him to lose his balance, his power will be reduced, and you will defeat him. This can apply whatever the relative values of power, thus making it possible for weaker opponents to beat significantly stronger ones.
Judo allows a “weaker” opponent to beat a “stronger” one. The idea is based on not resisting directly, but rather resisting indirectly. We believe that this principle can be used in security.
Our application of Judo is based on looking deeper at just what it means to steal some information such as your SSN. An SSN is only useful because there are transactions that are based on using it. The same goes for almost all the information that is being stolen. The information is only valuable because it can be used in some transaction that we wish to stop.
Thus the surrender idea is to assume that the data is going to be compromised—perhaps we will even make it public. But we will start to protect the transactions in a way that does not rely on the false assumption that certain data is secret.
Our suggestions are far from new—indeed, they are being used all the time, and importantly, their safety and success have held up. Our point is to build a framework whose philosophy is not to do more than this, and to change expectations for the end-user experience. Here are some common examples:
What we sacrifice under the Judo philosophy is the “Swiss Bank” expectation that one golden key unlocks access with no questions asked.
Electronic filing of tax returns ought to be the most secure online transaction that most people partake in. Unlike electronic purchases, this happens just once a year, the partner is the U.S. Government, and the safeguards can embrace the whole of your identity with the government. Yet for each of the past few years there have been over a million cases of thieves filing false returns with stolen genuine personal information before the real person files, in order to hijack the refund.
Safeguarding details of your return is of course desirable, but it is off the point of safeguarding the refund transaction. Hence we say one shouldn’t rely on the same solution for both problems. Instead our attitude on the latter is that we should be prepared to just give up on the former—even if we have to be like Hillary Clinton or Mitt Romney.
What we believe needs to change is not what’s under the hood but rather what’s on our dashboard. We must forgo the passivity of thinking all one has to do is wait for the IRS message of deposit. There must be some validation of the destination that is interactive, such as asking a challenge question that you—the real you—provided last year.
The planted challenge question idea is an example of the static kind of knowledge-based authentication (KBA). There is also dynamic KBA, in which the questions are synthesized from information that the provider already has. These can be questions such as, what was the color of the car you bought in 2002? Both kinds of KBA are increasingly used against tax fraud.
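As a minimal sketch of the static kind, with entirely hypothetical function names (no real IRS or KBA provider API is being depicted), the provider stores only a salted, slow hash of the planted answer and checks it before releasing the transaction:

```python
# Hypothetical static-KBA sketch: the provider never stores the planted
# answer itself, only a salt and a slow salted hash of its normalization.
import hashlib
import hmac
import os

def enroll(answer):
    """Store only a salt and a slow salted hash of the planted answer."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", answer.strip().lower().encode(), salt, 100_000)
    return salt, digest

def verify(answer, salt, digest):
    """Check a candidate answer in constant time before releasing funds."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", answer.strip().lower().encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = enroll("a red sailboat")
assert verify("  A Red Sailboat ", salt, digest)   # tolerant of case/spacing
assert not verify("a blue sailboat", salt, digest)
```

Note that a breach of this table reveals nothing directly reusable, which is exactly the “surrender” philosophy: the secret protects the transaction, not the data.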
Dynamic KBA can be used when there has been no prior interaction. There are further issues about how the provider gathers data for the questions. This Vermont government source notes issues with the use of public records. In keeping with our “surrender” motif, we don’t see how to stop this access—rather, we look to controls on how the access is used in transactions.
The Vermont source moves on to the idea of recording and analyzing patterns of keyboard use, which may be even more fraught. We wonder instead about a good way to blend KBA ideas with what we’ll call “access-based authentication” (ABA). Generalizing from the simple instance of using your e-mail to authenticate, the idea is to set up domains that only you have access to in their entirety.
To be sure, hackers might also gain access to your e-mail account used for validation, such as to forge a message about the destination of a tax refund. It won’t do for you to create a separate e-mail account used only with the IRS—rather we think such things play into hackers’ hands. Instead, your e-mail can safeguard the reality that only you use it. One idea is having a machine on which you are always logged in to your e-mail. This way any other activity shows up as anomalous.
The bad news in all of this is that assuring one’s identity is becoming a battle, and there seems to be no simple way to assure victory. Our point is to favor approaches that move the battle into areas an individual controls, as opposed to ones controlled from outside.
Do identity protection and integrity of data use need a consistent paradigm more than new schemes?
Some examples of small insights that help
Julia Chuzhoy and Chandra Chekuri are experts on approximation algorithms: both upper and lower bounds. Each is also interested in graph theory as it applies to algorithms.
Today Ken and I wish to talk about their recent papers on structural theorems for graphs.
The latest paper, by Chuzhoy, was just presented at STOC 2015 and is titled, “Excluded Grid Theorem: Improved and Simplified.” It builds on their joint papers, “Polynomial Bounds for the Grid-Minor Theorem” from STOC 2014 and “Degree-3 Treewidth Sparsifiers” from SODA 2015.
We note with amusement that the filename of the STOC 2014 paper on Chuzhoy’s website is improved-GMT.pdf while the new one is improved-improved-GMT-STOC.pdf. The versions of the GMT, for “Grid-Minor Theorem,” that they were improving had exponential bounds, notably one obtained in 1994 by Neil Robertson, Paul Seymour, and Robin Thomas. We concur with the authors that there must be an improved-improved-improved GMT out there. Perhaps we’ll see it at STOC 2016, but if the 3-2-1 progression keeps up, it will have zero authors.
Here is the main theorem that was improved between the STOC papers:
Theorem 1 There is a universal constant δ > 0 such that every graph of treewidth k has a grid minor of size k^δ. Moreover the grid minor can be found by a randomized algorithm in time polynomial in the size of the graph.
This is a classic structural type graph theorem. It says that if a graph is sufficiently complex, then it must have a certain regular substructure. Here complexity is measured by the treewidth of the graph, and the substructure is that it contains a certain type of subgraph.
The exponent δ achieved by the 2014 paper’s proof is tiny, and the 2015 paper improves it. There is however a ‘disimprovement’: the sharper theorem has a nonconstructive element and no longer provides a polynomial-time algorithm for finding the minor with high probability. It does simplify the proof and fosters a framework for further progress.
Both treewidth and minors involve kinds of embeddings of other graphs into subgraphs or set systems based on G. In the case of treewidth we have a tree T and a mapping from nodes of T to subsets of the vertices of G, called bags. The two conditions for this to be a tree decomposition of G are:

- every edge of G has both of its endpoints together in some bag;
- for each vertex v of G, the tree nodes whose bags contain v form a connected subtree of T.
The treewidth of G is the minimum k such that G has a tree decomposition by sets of size at most k+1. This offset makes trees have treewidth 1. If G has a cycle, the conditions combine to force treewidth at least 2. The n×n square grid has treewidth n, while n-vertex expanders have treewidth Ω(n).
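Both conditions are easy to check mechanically. Here is a small verifier of our own devising (names are ours, not from the papers), applied to a 4-cycle, which has treewidth 2:

```python
# Sketch of a tree-decomposition checker for the two conditions in the
# text: every edge of G lies inside some bag, and the bags containing
# any one vertex form a connected subtree of T.
from collections import deque

def is_tree_decomposition(graph_edges, tree_edges, bags):
    """bags maps each tree node to its set of graph vertices."""
    # Condition 1: every edge of G lies inside some bag.
    for u, v in graph_edges:
        if not any({u, v} <= bags[t] for t in bags):
            return False
    # Condition 2: for each graph vertex, the tree nodes whose bags
    # contain it induce a connected subtree (checked by BFS).
    adj = {t: [] for t in bags}
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    for v in set().union(*bags.values()):
        nodes = {t for t in bags if v in bags[t]}
        start = next(iter(nodes))
        seen, queue = {start}, deque([start])
        while queue:
            t = queue.popleft()
            for s in adj[t]:
                if s in nodes and s not in seen:
                    seen.add(s)
                    queue.append(s)
        if seen != nodes:
            return False
    return True

# A 4-cycle has treewidth 2: two bags of size 3 on a 2-node tree.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
bags = {"x": {0, 1, 2}, "y": {0, 2, 3}}
assert is_tree_decomposition(cycle, [("x", "y")], bags)
```

The width of this decomposition is the largest bag size minus one, matching the offset in the definition above.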
A graph H is a minor of G if there is a mapping from nodes of H to disjoint subsets of the vertices of G such that:

- each subset induces a connected subgraph of G;
- for every edge (u, v) of H, some edge of G runs between the subsets assigned to u and v.
Together these conditions mean one can obtain H from G by contracting each subset to a single vertex, then deleting vertices and edges until only those of H remain. In Theorem 1, H is a square grid. We’ve arranged the definitions to echo as much as possible. The correspondence is rough, but the tension between treewidth and grid-minor size comes out precisely in the proof.
Theorems of the above type often have quite complex proofs. As usual these proofs are broken down into pieces. The hierarchy is observation, claim, and lemma. Perhaps “claim” comes before “observation.” In any event the point is that we not only break long proofs into smaller chunks, we break down the conceptualization of their strategies. The sage Doron Zeilberger argues:
So blessed be the lemmas, for they shall inherit mathematics. Even more important than lemmas are observations, but that is another story.
I thought I would follow Doron’s suggestion and talk just about observations. They are used in their proofs and seem like they could be of independent interest.
One observation comes from the 1994 paper:
Lemma 2 There is a universal constant c such that every n-vertex planar graph is a minor of the cn × cn square grid.
This amounts to observing that all planar networks can be compactly implemented on grids. The significance for Chuzhoy and Chekuri is that Theorem 1 extends to all planar graphs. In contraposed form:
Theorem 3 There is a universal constant c such that for any planar n-vertex graph H, all graphs that do not have H as a minor (i.e., for which H is an “excluded minor”) have treewidth at most n^c.
Note that the treewidth bound is independent of the size of the graph G, depending only on the size of H.
Two of the important little observations from the 2014 Chekuri-Chuzhoy paper are:
Claim 4 Let be any set of non-negative integers, with and for all . Then we can efficiently compute a partition of such that and .
Claim 5 Let T be a rooted tree with at least hl vertices, for some positive integers h and l. Then T has at least l leaves or has a root-to-leaf path of length at least h.
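Claim 5 is the standard wide-or-tall dichotomy for rooted trees. A quick randomized sanity check, our own sketch, exercises it on random trees:

```python
# Sanity check of the tree dichotomy: a rooted tree with at least h*l
# vertices has at least l leaves or a root-to-leaf path of >= h edges.
import random

def leaves_and_height(parent):
    """parent[v] is v's parent; parent[0] is None for the root."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    leaves = sum(1 for v in range(n) if not children[v])
    depth = [0] * n
    for v in range(1, n):        # parents precede children by index
        depth[v] = depth[parent[v]] + 1
    return leaves, max(depth)

random.seed(1)
for _ in range(200):
    n = random.randint(2, 60)
    parent = [None] + [random.randrange(v) for v in range(1, n)]
    leaves, height = leaves_and_height(parent)
    for h in range(1, n + 1):
        l = n // h               # any (h, l) with h*l <= n qualifies
        if l >= 1:
            assert leaves >= l or height >= h
```

The proof is the same pigeonhole count in reverse: fewer than l leaves and height below h cover fewer than hl vertices.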
The new paper takes its jumping-off point from this little observation: Treewidth can be approximately conserved while minorizing down to a bounded-degree graph. This may sound surprising but isn’t—think of the fact that expander graphs can have degree 3. Building on advances in a paper by Ken-ichi Kawarabayashi and Yusuke Kobayashi and an independent paper by Seymour with Alexander Leaf (both of which also improved the exponent from the 1994 result), the SODA paper shows how to keep the treewidth of the degree-3 graph polynomially related to the original. The motivation for going down all the way to degree 3 is another simple observation:
In graphs of degree 3, edge-disjoint path systems and node-disjoint path systems are the same on non-terminal nodes.
The plan and gadgets built for the proof come off well in slides by both Chekuri and Chuzhoy from a time when they had a weaker bound. Here is one slide that conveys the strategy; we’ve edited it to add a vertical bracket, as on the previous slide:
The disjoint path systems connect clusters of nodes that facilitate multiple switching and routing by obeying definitions like the following:
Definition 6 A node set X is well-linked if for all A, B ⊆ X with |A| = |B| = k, there are k-many node-disjoint paths connecting A and B. This allows paths of length 0 when A and B share vertices.
The maximum size of a well-linked set is another graph invariant. Bruce Reed proved:
Lemma 7 For a graph G of treewidth k, the maximum size of a well-linked set in G is Ω(k).
This concept and observation bridge between treewidth and problems of routing paths that build grid structures. Leaf and Seymour proved that having a path-of-clusters system of a given width enables one to find a grid minor with proportionally many vertices on a side. To gain their tighter connections, the new papers relax the concept:
Definition 8 X is α-well-linked if for all A, B ⊆ X with |A| = |B|, there is a flow from A to B of congestion at most 1/α.
The following result connects the bound to the flow-building task. It meets one definition of ‘observation’ insofar as it is used as the definition of “α-well-linked” in some other works by the authors.
Observation 9 If X is α-well-linked in G, then for any partition of all of V(G) into sides S and S′, the number of edges crossing the partition is at least α times the minimum of |S ∩ X| and |S′ ∩ X|.
The newest paper also uses versions where is constrained to be at most some number for which then becomes another upper bound on edges across the cut. This is the gateway to the rough-and-tumble combinatorial details, for which the latest paper and the talk slides are best to see. But we hope that sharing these observations conveys the flavor well.
What are your favorite observations?
Exponential hardness connects broadly to quadratic time
Arturs Backurs and Piotr Indyk are student and advisor. The latter, Piotr, is one of the leading algorithms and complexity theorists in the world—what an honor it must be for Arturs to work with him as his advisor.
Today Ken and I want to talk about their paper on edit distance and an aspect that we find puzzling.
The paper in question is, “Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false).” It appears in this coming STOC 2015.
What we find puzzling is that the beautiful connection it makes between two old problems operates between two different levels of scaling, polynomial and exponential. This messes up our intuition, at least mine.
I, Dick, have thought long and hard, over many years, about both the edit distance problem and about algorithms for satisfiability. I always felt that both problems should have algorithms much better than the “obvious” ones. However, I was much more positive about our ability to make a breakthrough on computing the edit distance than to do the same for satisfiability.
The way of linking the two problems is to me quite puzzling. Quoting my favorite band, the Talking Heads:
… Now don’t you wanna get right with me?
(puzzling evidence)
I hope you get ev’rything you need
(puzzling evidence)
Puzzling evidence
Puzzling evidence
Puzzling evidence …
The edit distance between two strings is defined as the minimum number of insertions, deletions, or substitutions of symbols needed to transform one string into the other. Thus CAT requires three substitutions to become ATE, but it can also be done by one deletion and one insertion: pop the C to make AT and then append E to make ATE. Thus the edit distance between these two words is 2. The problem of computing the edit distance occurs in so many fields of science that it is hard to figure out who invented what first. The case of strings of length n is easily seen to be computable in quadratic time, O(n^2), by a dynamic programming algorithm that builds up edit distances between initial substrings.
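The quadratic dynamic program just mentioned fits in a few lines; here is a standard sketch, with rolling arrays to keep memory linear:

```python
# Standard quadratic DP for edit distance: dist between prefixes
# s[:i] and t[:j], built up row by row with a rolling array.
def edit_distance(s, t):
    m, n = len(s), len(t)
    prev = list(range(n + 1))          # distances from "" to prefixes of t
    for i in range(1, m + 1):
        cur = [i] + [0] * n            # distance from s[:i] to ""
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1,                              # delete s[i-1]
                cur[j - 1] + 1,                           # insert t[j-1]
                prev[j - 1] + (s[i - 1] != t[j - 1]),     # substitute
            )
        prev = cur
    return prev[n]

assert edit_distance("CAT", "ATE") == 2   # pop the C, append E
assert edit_distance("kitten", "sitting") == 3
```

Each of the n^2 cells is filled in constant time, which is exactly the quadratic barrier the Backurs-Indyk result addresses.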
Chak-Kuen Wong and Ashok Chandra proved this is optimal in the restricted model where one can only compare characters to each other. There are algorithms that beat quadratic by logarithmic factors—they essentially treat blocks of characters as one. But it remains open after much research whether there is an algorithm that runs in time of order n^{2-δ} for some fixed δ > 0, for example.
The SAT problem is the usual question of testing Boolean clauses to see if they can all be satisfied at the same time by the same assignment to the Boolean variables. CNF-SAT restricts to formulas in conjunctive normal form, and k-SAT restricts to clauses with at most k literals per clause.
Backurs and Indyk prove that if there exists δ > 0 such that edit distance on strings of length N can be decided in time O(N^{2-δ}), then there exists ε > 0 such that CNF-SAT for formulas with n variables and m clauses can be solved in time O(2^{(1-ε)n} poly(m)). They build on a connection to SETH shown ten years ago by Ryan Williams in part of a wider-ranging paper.
The basic idea is how CNF-SAT behaves with regard to partitions of the variables of a formula F into two size-n/2 subsets, call them U and W. Let A be the set of assignments to U and B the set of assignments to W. For every assignment a in A, let S_a be the set of clauses it satisfies, and T_a the remaining clauses which it does not satisfy. Similarly define S_b and T_b for b in B. Then F is satisfiable if and only if there are a and b with T_a ∩ T_b = ∅.
Now let us identify a with T_a regarded as an m-bit vector and similarly b with T_b, also re-labeling A to be the set of N-many a‘s, B for b‘s, where N = 2^{n/2}. Then as Williams observed, F is satisfiable if and only if we have a yes-instance of the following problem:
Orthogonal Vectors Problem (OV): Given two sets A, B of N-many length-m binary vectors, are there a in A and b in B such that a · b = 0?
It is obvious to solve OV in time O(N^2 m) by trying all pairs. The nub is what happens if we achieve anything slightly better in the exponent than quadratic, say time N^{2-δ}. Then with N = 2^{n/2} we get time
2^{(1-δ/2)n}, times lower-order factors, for CNF-SAT, which contradicts SETH.
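Williams’ split-and-list connection can be seen end-to-end on a toy formula. The following sketch (ours, purely illustrative) lists the assignments to each half of the variables, encodes the clauses each half-assignment fails to satisfy as a bit mask, and finishes with the obvious quadratic orthogonality scan:

```python
# Toy version of the split-and-list reduction from CNF-SAT to the
# Orthogonal Vectors problem: the formula is satisfiable iff some
# pair of half-assignments leaves no clause unsatisfied by both.
import itertools

def sat_via_ov(clauses, n):
    """clauses: each clause is a list of (variable, sign) literals,
    variables in range(n), sign True for a positive literal."""
    half = n // 2

    def unsat_masks(variables):
        masks = []
        for bits in itertools.product([False, True], repeat=len(variables)):
            setting = dict(zip(variables, bits))
            m = 0
            for i, clause in enumerate(clauses):
                if not any(v in setting and setting[v] == s for v, s in clause):
                    m |= 1 << i   # clause i not satisfied by this half
            masks.append(m)
        return masks

    A = unsat_masks(list(range(half)))        # masks playing the role of T_a
    B = unsat_masks(list(range(half, n)))     # masks playing the role of T_b
    # Orthogonal Vectors scan over all pairs.
    return any((a & b) == 0 for a in A for b in B)

# (x0 or x1) and (not x0 or x2) and (not x1 or not x2) is satisfiable
f = [[(0, True), (1, True)], [(0, False), (2, True)], [(1, False), (2, False)]]
assert sat_via_ov(f, 3)
```

The listing step already costs 2^{n/2} per side, which is why any genuinely subquadratic scan in the list size N would beat the SETH bound.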
What’s puzzling is that the evidence against doing better than quadratic comes when N is already exponential, N = 2^{n/2}. Moreover, the instances involved are ridiculously large, exponential sized, and we don’t even care that they have a succinct genesis in terms of F. (Note that we have swapped the letters N and m from their paper—we find it helpful to keep “N” the larger one.)
Backurs and Indyk itemize several problems to which this connection was extended since 2010, but we agree that Edit Distance (ED) is the most striking addition to this list. Their new result is a kind of “SETH-reduction” from OV to ED. Can we capture its essence without referencing SETH each time?
The results here and before all use an unusual type of reduction. Ken and I think it would be useful to formalize this reduction and try to understand its properties. It is not totally correct to call it simply a quasi-linear time reduction because multiple parameters are involved—we can call them N and m quite generally.
In the above case with N = 2^{n/2} and m clauses, if the clause size k is fixed then we have m = O(n^k), so m is polylogarithmic in N. It hence suffices to have a reduction from OV to ED that is computable in quasi-linear time, here meaning time N polylog(N). Indeed, we can allow time N·q(m) for any function q.
When talking just about the problems OV and ED, however—without reference to SAT—N and m are separate parameters with no relation specified. It suffices to say that the reduction is polynomial in m and quasi-linear in N. This is essentially what Backurs and Indyk do, via a cascade of intermediate parameters and their products. The details in their paper are considerable, involving an initial reduction from OV to an intermediate problem, and this is one reason we’d like to streamline the reduction concept.
If we assume , then “quasi-linear in and polynomial in ” is the same as “linear in and polynomial in .” Perhaps the latter phrase is the simplest and best way to define the reduction? However, we do not need to specify “polynomial in ” either. It is enough to have a suitable sub-exponential time in . For instances with we would need not , while for we would need .
Parameterized complexity defines reductions with two parameters, but the simplest ones are not exactly suitable. Finally, we wonder whether it would help to stipulate any of the specific structure that comes from including that the instances are succinct. Note that we once covered succinctness and a hypothesis roughly related to SETH (see this comment for a circuit form). This paper highlighted by Backurs and Indyk works from but says it could have used , while still not formalizing the reduction concept. Likewise their other major references; some work from and others not. The latest of them, by Karl Bringmann and Marvin Künnemann who show “SETH-hardness” for on binary strings, defines a framework for the gadgets used in all these reductions.
The remarks just above about the reduction time in “” make us recall the three most common levels of exponential hardness. The power index of a language was coined by Richard Stearns and Harry Hunt in a 1990 paper. It is the infimum of such that belongs to time . Their “Satisfiability Hypothesis” (SH) about the power index of satisfiability is much weaker than SETH, though not as weak as conjecturing merely a lower bound of .
The latter two come in slightly different versions bringing m and/or a fixed k in k-SAT into the picture, and of course all these versions might be false. SETH is distinguished as the closest to the upper bound. There are randomized algorithms for k-SAT that run in time 2^{(1-ε_k)n} for some ε_k > 0 and every fixed k.
Stearns and Hunt emphasized the effect of reductions on SH and the power index in general. The same can be done for ETH. But we remain equally puzzled about the issue of the size of the problems used in the reductions. We start with a SAT problem that uses n bits in its description. This is viewed then eventually as an edit distance problem that uses exponentially many bits in n. That is, the edit problem is extremely large. Of course this is just fine, since we only claim to get an exponential time algorithm.
The point is that our intuition about the edit distance problem is all on problems that have modest size. I, Dick, actually had a student build a hardware machine that did modest-size such problems many times faster than any computer. So all my intuition was—is—about small-size edit problems. When the size of the problem becomes astronomical my intuition may fall apart. Could this explain the reason that the result here seems puzzling?
So what is the complexity of computing the edit distance? Can we really not do better than the obvious algorithms? This seems hard, no, puzzling, to us; but it may indeed be the case.
The 3SUM problem, which we recently covered, is attacked by some of these papers but has not as of yet been brought within the scope of the reductions from OV. The subquadratic decision-tree upper bound has not yet yielded an algorithm that actually runs in comparable time. Yet perhaps the above kinds of reductions also generally preserve decision-tree complexity? This would make the decision-tree result a real obstacle to showing “SETH-hardness” in that manner.
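For concreteness, the “obvious” quadratic algorithm for 3SUM that these conjectures say is essentially optimal is the classic sort-and-two-pointer scan:

```python
# Quadratic 3SUM: sort, fix one element, then scan inward with two
# pointers looking for a pair summing to its negation.
def three_sum_exists(nums):
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1
            else:
                hi -= 1
    return False

assert three_sum_exists([-5, 1, 4, 7, -2])    # -5 + 1 + 4 == 0
assert not three_sum_exists([1, 2, 3])
```

The sort costs O(n log n) and each of the n outer iterations does a linear scan, for O(n^2) overall, matching the decision-tree discussion above only up to that unproven gap.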