Thoralf Skolem was a mathematician who worked in mathematical logic, set theory, and number theory. He was the only known PhD student of Axel Thue, whose Thue systems were an early word-based model of computation. Skolem had only one PhD student, Øystein Ore, who did not work in logic or computation. Ore did, however, have many students including Grace Hopper and Marshall Hall, Jr., and Hall had many more including Don Knuth.
Today Ken and I try to stimulate progress on a special case of Skolem’s problem on linear sequences.
Although Ore worked mainly on ring theory and graph theory, the seeds still collected around Skolem's tree: Hall's dissertation was titled "An Isomorphism Between Linear Recurring Sequences and Algebraic Rings." Sequences defined by a finite linear operator are about the simplest computational process we can imagine:

$$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k}.$$
The coefficients and initial values can be integers or relaxed to be algebraic numbers. Skolem posed the problem of deciding whether there is ever an $n$ such that $a_n = 0$.
This is a kind of halting problem. It seems like it should be simple to analyze—it is just linear algebra—but it has remained open for over 80 years. We have discussed it several times before. This 2012 survey by Joel Ouaknine and James Worrell, plus this new one, give background on this and some related problems.
Let $S_n$ be

$$S_n = \alpha_1^n + \alpha_2^n + \cdots + \alpha_k^n,$$

where each $\alpha_i$ is an algebraic integer. Our problem is:

Does there exist a natural number $n$ so that $S_n = 0$?
This is a special case of the Skolem problem. It arises when the coefficients are the evaluations of the elementary symmetric polynomials $e_1, \dots, e_k$ at $\alpha_1, \dots, \alpha_k$ with alternating signs. For example, with $k = 2$ we get

$$S_n = e_1 S_{n-1} - e_2 S_{n-2} = (\alpha_1 + \alpha_2)\, S_{n-1} - \alpha_1 \alpha_2\, S_{n-2},$$

and so on. For general $k$ we have

$$S_n = e_1 S_{n-1} - e_2 S_{n-2} + \cdots + (-1)^{k-1} e_k S_{n-k}.$$
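The agreement between the direct power sums and the symmetric-function recurrence is easy to sanity-check numerically. Here is a small sketch; the roots and function names are our own illustrative choices, not from the post:

```python
from itertools import combinations
from math import prod

def elementary_symmetric(roots):
    """e_1, ..., e_k of the given roots."""
    k = len(roots)
    return [sum(prod(c) for c in combinations(roots, i)) for i in range(1, k + 1)]

def power_sums_direct(roots, N):
    # S_n = sum of n-th powers of the roots, for n = 0..N
    return [sum(r ** n for r in roots) for n in range(N + 1)]

def power_sums_recurrence(roots, N):
    k = len(roots)
    e = elementary_symmetric(roots)
    S = power_sums_direct(roots, k - 1)  # initial values S_0 .. S_{k-1}
    for n in range(k, N + 1):
        # S_n = e_1 S_{n-1} - e_2 S_{n-2} + ... + (-1)^{k-1} e_k S_{n-k}
        S.append(sum((-1) ** (i - 1) * e[i - 1] * S[n - i] for i in range(1, k + 1)))
    return S

roots = [2, -3, 5]   # illustrative rational-integer "algebraic integers"
assert power_sums_direct(roots, 10) == power_sums_recurrence(roots, 10)
```

The recurrence holds because each root satisfies the monic polynomial whose coefficients are the signed elementary symmetric functions; summing that relation over the roots gives exactly the linear recurrence above.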
Then $S_n = 0$ means $\alpha_1^n + \cdots + \alpha_k^n = 0$. If the $\alpha_i$ are nonzero integers and $k = 3$, then for odd $n > 1$ this is asking whether $(\alpha_1, \alpha_2, -\alpha_3)$ is a solution to Pierre Fermat's equation $x^n + y^n = z^n$, and we can simply answer "no." Of course checking whether a given $n$ gives a solution can be easier than asking whether the equation has a solution at all, but this shows our case contains some of the flavor of Fermat's Last Theorem.
We can point up some minor progress on this problem. Our methods can handle somewhat more general cases in which the sum of $n$-th powers is modified by fixed constants, but we will stay with the simpler case. Our larger hope is that this case embodies the core of the difficulty in Skolem's problem, so that solving it might throw open the road to the full solution.
Let's begin the proof for the case when $n$ is a prime $p$. Suppose that $S_p = 0$. Recall

$$S_p = \alpha_1^p + \alpha_2^p + \cdots + \alpha_k^p.$$

Clearly we can assume that $S_1 \neq 0$. Note that this is decidable. Put $T = S_1 = \alpha_1 + \cdots + \alpha_k$. The key is to look at the quantity

$$T^p = (\alpha_1 + \alpha_2 + \cdots + \alpha_k)^p,$$

where $p$ is a prime. We employ the following generalization of the binomial theorem:

$$(x_1 + \cdots + x_k)^p \;=\; \sum_{m_1 + \cdots + m_k = p} \binom{p}{m_1, \dots, m_k}\, x_1^{m_1} x_2^{m_2} \cdots x_k^{m_k},$$

where

$$\binom{p}{m_1, \dots, m_k} = \frac{p!}{m_1!\, m_2! \cdots m_k!}.$$

The upshot is that every multinomial coefficient is divisible by $p$ except those from the cases $m_i = p$, all other $m_j = 0$: since $p$ is prime, $p$ divides the numerator $p!$ but not the denominator whenever every $m_j < p$. Each exceptional case gives a coefficient of $1$ and leaves the term $\alpha_i^p$. Thus we get that $T^p$ has the form

$$T^p = S_p + p\,\gamma,$$

where $\gamma$ is an algebraic integer. But by the supposition $S_p = 0$ this simplifies to $T^p = p\,\gamma$, and so $T^p$ is divisible by $p$. Thus, applying the norm $N(\cdot)$ down to the rational integers,

$$N(T)^p \;=\; N(T^p) \;=\; N(p\,\gamma) \;=\; p^{d}\, N(\gamma),$$

where $d$ is the degree of the number field. Since $p$ is prime and divides the integer $N(T)^p$, $N(T)$ too is divisible by $p$. But $N(T)$ is independent of $p$. Hence $|N(T)|$ acts as a bound on any possible prime $p$ such that $S_p = 0$. Testing the finitely many primes $p$ up to $|N(T)|$ thus yields a decision procedure for this restricted case of Skolem's problem.
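For rational-integer roots the norm of $T$ is just $T$ itself, so the bound turns into a tiny decision procedure. The following sketch (our own naming, assuming integer roots with nonzero sum) tests every prime up to the bound:

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def skolem_prime_case(roots):
    """Return all primes p with S_p = 0, where S_n = sum(a**n for a in roots).

    Assumes nonzero integer roots with T = S_1 != 0; by the norm argument,
    any such prime must divide |T|, so only primes up to |T| need testing."""
    T = sum(roots)
    assert T != 0, "the argument needs S_1 != 0"
    return [p for p in range(2, abs(T) + 1)
            if is_prime(p) and sum(a ** p for a in roots) == 0]

# 3^3 + 4^3 + 5^3 = 6^3, so with roots 3, 4, 5, -6 (sum T = 6) we get p = 3:
assert skolem_prime_case([3, 4, 5, -6]) == [3]
```

The example works because $3^3 + 4^3 + 5^3 - 6^3 = 0$ while $T = 6$, so the prime $3$ indeed lies below the bound $|T|$.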
Ken chimes in an observation that might be distantly related: The Vandermonde determinant

$$V(x_1, \dots, x_n) = \prod_{1 \le i < j \le n} (x_j - x_i)$$

is the "smallest" alternating polynomial in $n$ variables. Together with the symmetric polynomials it generates all alternating polynomials. When the $x_i$ are the $n$-th roots of unity it gives the determinant of the Fourier matrix up to sign. This determinant has absolute value

$$n^{n/2}.$$

It is also the product of the lengths of the $\binom{n}{2}$ chords formed by $n$ equally-spaced points on the unit circle. The observation is that this 2-to-the-nearly-linear quantity, $n^{n/2} = 2^{(n \log n)/2}$, is extraordinarily finely tuned.
To see how, let's estimate the product of the chords in what is caricatured as the style of physicists: The length of an average chord is about $\sqrt{2}$. So we can estimate the size of the product as

$$\sqrt{2}^{\,\binom{n}{2}} = 2^{\Theta(n^2)}.$$

This is off by an order of magnitude in the exponent—$n^2$ versus $n \log n$—not even close. We can be a little smarter and use the true average length of a chord instead, integrating $2\sin(\theta/2)$ from $\theta = 0$ to $\pi$ to get $4/\pi$. This is still a number greater than $1$ and plugs in to yield $2^{\Theta(n^2)}$ anyway.
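One can watch the fine-tuning numerically. This sketch (our own) multiplies out the chord lengths and compares them with $n^{n/2}$, and also shows what scaling the circle does:

```python
import cmath
import math

def chord_product(n, radius=1.0):
    """Product of the lengths of all C(n,2) chords between n equally
    spaced points on a circle of the given radius."""
    pts = [radius * cmath.exp(2j * math.pi * k / n) for k in range(n)]
    result = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            result *= abs(pts[i] - pts[j])
    return result

n = 16
# On the unit circle the product equals n**(n/2) exactly (up to rounding):
assert abs(chord_product(n) / n ** (n / 2) - 1) < 1e-9
# Scaling the radius multiplies the product by radius**C(n,2) = radius**120,
# so even a 1% change swamps or crushes the finely tuned value:
assert chord_product(n, radius=1.01) > chord_product(n) > chord_product(n, radius=0.99)
```

The identity $\prod_{i<j} |\zeta_i - \zeta_j| = n^{n/2}$ for the $n$-th roots of unity is exactly the Fourier-determinant fact above.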
Such a calculation looks silly but isn't. If we enlarge the circle by a factor of $c > 1$ then every term in the product is multiplied by that factor and it dominates:

$$c^{\binom{n}{2}} \cdot n^{n/2} = 2^{\Theta(n^2)}.$$

If we shrink the circle by $c$ the opposite happens: we divide by $c^{\binom{n}{2}}$, which crushes everything to make the analogous quantity virtually zero. Furthermore this "big crush" happens under more-plausible slight perturbations such as forbidding any of the points from occupying a fixed small arc, which prevents the equal-spacing maximization when $n$ is large. We covered this at length in 2011.
The underlying reality is that when you take the logarithm of the product of chords, the terms of all growth orders between $n \log n$ and $n^2$ magically cancel. There are many more chords of length close to $2$ than chords of length close to $0$, but the latter can be unboundedly short in a way that perfectly balances the multitudes of longer chords. The actual value $n^{n/2}$ seems tiny amidst these perturbative possibilities.
This gigantic cancellation reminds Dick and me of the present argument over the tiny observed magnitude of the cosmological constant $\Lambda$. Estimation via quantum field theory prescribes a value 120 orders of magnitude higher—one that would instantly cause our universe to explode in fireworks—unless vast numbers of terms exactly cancel. Quoting Wikipedia:
This discrepancy has been called “the worst theoretical prediction in the history of physics” … the cosmological constant problem [is] the worst problem of fine-tuning in physics: there is no known natural way to derive the tiny [value] from particle physics.
"Fine-tuning" of constants without explanation is anathema to science, and many scientists have signed onto theories that there is a multiverse with 500 or more orders of magnitude of universes, enough to generate some with the tiny $\Lambda$ needed to allow life as we know it. However, any fine-tuning discovered in mathematics cannot be anathema. Perhaps the universe picks up the Fourier fine balancing act in ways we do not yet understand. More prosaically, the fine balance in quantities similar to those above could be just what makes Skolem's problem hard.
I believe that the general case of the Skolem problem can be handled, not just the simple case, but the problem of handling more than just primes seems hard. Ken and I are working on extending the method to these cases. Meanwhile, we wish everyone Stateside a happy Fourth of July, whether or not that includes fireworks.
[added link to new survey in intro]
Oded Green, Marat Dukhan, and Richard Vuduc are researchers at Georgia Institute of Technology—my home institution. They recently presented a paper at the Federated Conference titled, “Branch-Avoiding Graph Algorithms.”
Today Ken and I would like to discuss their interesting paper, and connect it to quite deep work that arises in computational logic.
As a co-inventor of the Federated type of meeting—see this for the story—I have, curiously, gone to only about half of them. One of the goals of these meetings is to get diverse researchers to talk to each other. One of the obstacles is that the language is often different: researchers often call the same abstract concept by different names. One of my favorite examples was between "breadth-first search" and "garbage collection"—see the story here.
Another deeper reason is that they may be studying concepts that are related but not identical. We will study such an example today. It connects logicians with computer architects.
One group calls the algorithmic concept choicelessness and the other calls it reduced branching. This reminds us of the old song “Let’s Call the Whole Thing Off” by George and Ira Gershwin:
The song is most famous for its “You like to-may-toes and I like to-mah-toes” and other verses comparing their different regional dialects.
Theorists have been studying various restrictions on polynomial-time algorithms for years. What is interesting—cool—is that recently some of these ideas have become important in practical algorithms. Our goal here is to explain the main ideas.
A binary string $x$ is naturally an ordered structure. We could present $x$ via a number $n$ standing for its length and a list of the positions $i$ such that $x_i = 1$. We could give that list in any order, but we generally can't permute the position labels $i$—doing so would change the string. We avoid fussing with this and just give the bits of the string in the canonical order.
When the input is a graph $G$, however, there are many possible orders in which the vertices can be labeled and the edges presented. Usually we give a number $n$ standing for the number of vertices and a list of edges, in some selected order. The point is that properties of the graph do not depend on the labeling. We would also like the output of an algorithm $A$ not to depend on the order in which edges are presented. The relevant question is,
To what extent do the output and execution pattern of $A$ depend on the labeling used for the graph and the order of presenting edges?
Ideally there would be no dependence. However, consider a simple sequential breadth-first search (BFS) algorithm that maintains a list $Q$ of not-yet-expanded members of the set $R$ of nodes reached from the start vertex $s$. Whenever $Q$ is nonempty it must choose a next node $u$ to expand. The choice is arbitrary unless determined by the labeling ($u$ can be the least member of $Q$) or by the presentation order of neighbors of some previous vertex that were enqueued. An ordering might be "lucky" for some graphs if it minimizes the number of times a neighbor is generated that already belongs to $R$, and foreknowledge of $s$ and a target $t$ being connected can help the algorithm know when to stop.
An example of giving different output—one that strikes us as a harder example—is the $O^*(2^k)$-time algorithm for finding a vertex cover $C$ of size $k$. It cycles through bit sequences $r \in \{0,1\}^k$. For each one, it takes the edges $(u,v)$ in order, and if neither $u$ nor $v$ belongs to $C$, adds $u$ to $C$ if the next bit of $r$ is $0$, adding $v$ otherwise. Not just the edge presentation but also the ordering used to distinguish $u$ from $v$ affects the final output $C$—for reasons apart from the choice of $r$.
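Here is a sketch of that $2^k$ procedure as we have described it (our own code, not from any paper). Note how the answer depends on the bit string $r$, on the edge order, and on which endpoint is called $u$:

```python
from itertools import product

def vertex_cover_2k(edges, k):
    """Try to find a vertex cover of size <= k in O(2**k * m) time.

    Each bit string r resolves the take-u-or-take-v choice for each
    still-uncovered edge, taken in presentation order."""
    for r in product([0, 1], repeat=k):
        C, bits, ok = set(), iter(r), True
        for (u, v) in edges:
            if u in C or v in C:
                continue  # edge already covered
            try:
                C.add(u if next(bits) == 0 else v)
            except StopIteration:
                ok = False  # more than k choices were needed for this r
                break
        if ok and len(C) <= k:
            return C
    return None

# A 4-cycle has a cover of size 2 but none of size 1:
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
assert vertex_cover_2k(edges, 2) is not None
assert vertex_cover_2k(edges, 1) is None
```

Running it with the edges reversed, or with each pair written $(v,u)$, can return a different (equally valid) cover—exactly the order-dependence at issue.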
Can we reduce or eliminate the conditional branching that depends on the orderings? That is the question we see as common ground between the two concepts.
The search for ways to eliminate the dependence on the linear ordering led logicians, years ago, to the notion of PTime logics. In particular they asked if it is possible to create a model of polynomial time that does not depend on the ordering of the input. It was conjectured that such a model does not exist.
For example the work of Andreas Blass, Yuri Gurevich, and Saharon Shelah studies an algorithmic model and accompanying logic called CPT for “Choiceless Polynomial Time.” We’ll elide details of their underlying “Abstract State Machine” (ASM) model which comes from earlier papers by Gurevich, but simply note that the idea is to replace arbitrary choices that usual algorithms make with parallel execution. The key restriction is that the algorithms must observe a polynomial limit on the number of code objects that can be executing at any one time. Here is how this plays out in their illustration of BFS:
The algorithm successively generates the "levels" of nodes at distances $d = 0, 1, 2, \dots$ from $s$. In case $v$ is a neighbor of more than one node $u$ in the previous level, their ASM avoids having a separate branch for each edge and so avoids an exponential explosion. There are no more code objects than vertices active at any time, so the polynomial restriction is observed.
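In ordinary code the choiceless level-by-level expansion looks like this sketch (our own rendering, not the ASM formalism): whole frontiers are expanded as sets, so no individual vertex is ever "chosen," and the output is independent of vertex and edge ordering:

```python
def bfs_levels(adj, s):
    """Choiceless BFS: expand entire levels at once.

    adj maps each vertex to its set of neighbors; the returned list of
    level sets does not depend on any ordering of vertices or edges."""
    visited, frontier = {s}, {s}
    levels = [{s}]
    while frontier:
        # every member of the frontier is expanded "in parallel"
        nxt = set().union(*(adj[u] for u in frontier)) - visited
        if nxt:
            levels.append(nxt)
        visited |= nxt
        frontier = nxt
    return levels

adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
assert bfs_levels(adj, 0) == [{0}, {1, 2}, {3}]
```

Since sets are unordered, permuting the labels of a graph permutes the level sets in the obvious way—the "execution pattern" carries no trace of an arbitrary choice.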
Not all polynomial-time decidable properties of graphs are so readily amenable to such algorithms, and indeed they prove that having a perfect matching is not expressible in their accompanying CPT logic, even when the logic is suitably augmented. At the time they pointed out that:
The resulting logic expresses all properties expressible in any other PTime logic in the literature.
A different extension by Anuj Dawar employing linear algebra is not known to be comparable with CPT, according to slides from a talk by Wied Pakusa on his joint paper with Faried Abu Zaid, Erich Grädel, and Martin Grohe. That paper extends the following result by Shelah from bipartite graphs to more general structures with two "color classes":
Theorem 1
Bipartite matching is decidable in CPT+Counting.
There are many classes of graphs such that, restricted to graphs in that class, the power of full PTime logic with order is no greater than that of CPT. These include graphs of bounded treewidth and graphs with excluded minors, along lines of work by Grohe that we covered some time back. For other classes the power of CPT is open.
We sense a possible and surprising connection between this beautiful foundational work of Blass, Gurevich, and Shelah and an extremely important practical program.
When a modern computer makes choices while computing, it generally loses some performance. The reason is that modern processors make predictions about choices: this is called branch prediction. If the processor correctly predicts the branch taken—the choice made—the computation runs faster; if it mispredicts, the computation runs slower.
Naturally much work has gone into making branch predictions correctly. In their recent paper, Green, Dukhan, and Vuduc study a radical way to improve prediction: make fewer branches. Trivially, with fewer choices made—fewer branches to predict—the predictor should be right more often. They consider two classic graph problems: connected components and breadth-first search (BFS). Their main result is that by rewriting the algorithms to make fewer choices they can improve performance by as much as 30%–50%. This suggests that one should seek graph algorithms and implementations that avoid as many branches as possible.
They add:
As a proof-of-concept, we devise such implementations for both the classic top-down algorithm for BFS and the Shiloach-Vishkin algorithm for connected components. We evaluate these implementations on current x86 and ARM- based processors to show the efficacy of the approach. Our results suggest how both compiler writers and architects might exploit this insight to improve graph processing systems more broadly and create better systems for such problems.
Here are the old-and-new algorithms for BFS in their paper:
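We can't reproduce their figure here, but the flavor of the branch-avoiding trick can be sketched. This toy is our own and only illustrative—Python itself won't emit conditional moves, but in C the second form compiles to straight-line code with a `cmov` instead of a jump. The setting is the hooking step of a Shiloach-Vishkin-style components algorithm:

```python
def hook_branchy(comp, u, v):
    # classic form: a data-dependent branch the predictor can miss
    if comp[v] < comp[u]:
        comp[u] = comp[v]

def hook_branchless(comp, u, v):
    # branch-avoiding form: the comparison produces data (0 or 1),
    # not control flow, and the update is unconditional arithmetic
    lt = int(comp[v] < comp[u])
    comp[u] = lt * comp[v] + (1 - lt) * comp[u]

comp = [0, 1, 2, 3]
hook_branchless(comp, 3, 0)   # hook vertex 3 onto component 0
assert comp == [0, 1, 2, 0]
```

Both functions compute the same update `comp[u] = min(comp[u], comp[v])`; the point is only where the data-dependence lives.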
Our speculative question is, can the latter algorithm be inferred from the CPT formalism of the logicians? How much “jiggery-pokery” of the previous illustration from the logicians’ paper would it take, or are we talking “applesauce”? More broadly, what more can be done to draw connections between “Theory A” and “Theory B” as discussed in comments to Moshe Vardi’s post here?
Are choices and branches really connected as we claim? Can we replace the search for algorithms that make no choices with a search for algorithms that reduce choices—branches? In a sense, can we make choices a resource like time and space, and not try to drive it to zero as choice-free algorithms do, but rather just reduce the number of choices? This may lead only to modest gains in performance, but in today's world, where processor speeds are not growing as they did in the past, perhaps this is a very interesting question.
This is post number 128 under our joint handle “Pip” by one WordPress count, though other counts say 127 or 129. As we wrote in that linked post: Sometimes we will use Pip to ask questions in a childlike manner, mindful that others may have reached definite answers already. We have thoroughly enjoyed the partnership and look forward to reaching the next power of 2.
Plus visiting Michael Rabin and talking about Gödel’s Theorems
Michael Ben-Or and Michael Rabin have won the 2015 Dijkstra Prize for Distributed Computing. The citation says,
In [two] seminal papers, published in close succession in 1983, Michael Ben-Or and Michael O. Rabin started the field of fault-tolerant randomized distributed algorithms.
Today Ken and I wish to congratulate both Michaels for the well deserved recognition for their brilliant work.
Ben-Or’s paper is titled, “Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols.” Rabin’s paper, “Randomized Byzantine Generals,” brought randomness to a distributed consensus problem whose catchy name had been coined in 1982. The award committee said:
Ben-Or and Rabin were the first to use randomness to solve a problem, consensus in an asynchronous distributed system subject to failures, which had provably no deterministic solution. In other words, they were addressing a computability question and not a complexity one, and the answer was far from obvious.
Their work has continued to have a huge impact on the whole area of distributed computing. The fact that a negative result, an impossibility result, could be circumvented by using randomness was quite surprising back then. It changed the face of distributed algorithms by showing that a problem with no deterministic solution might still have a solution provided randomness is allowed. Besides being a beautiful result, it has great practical importance, since randomness can be used in “real” algorithms.
We applaud the prize committee—Paul Spirakis, James Aspnes, Pierre Fraigniaud, Rachid Guerraoui, Nancy Lynch, and Yoram Moses—for making such a thoughtful choice, which shows that even “old” results can be recognized. It took years for Ben-Or and Rabin to be honored: we are glad they did not have to wait for the next power of two.
While we are thrilled to see Rabin honored for his work on distributed computing, we are even more excited to report that he is doing well. This year he had a serious health issue that required a major operation. The operation went well, but unfortunately during and after it Michael had serious complications.
I, Dick, just visited him at his home in Cambridge. I am happy to report that Michael seems to be on his way to a full recovery. He is as sharp as ever, and looks like he will be his normal self in the near future. This is great news.
We wish Michael and his wonderful wife Ruth and their daughters the best. His elder daughter, Tal Rabin, also works in cryptography; I heard her give a talk four years ago on their father-daughter research. Michael plans to be part of several upcoming events—a good sign that he is on the mend. Again our best wishes to Michael and his family.
Most of my conversation with Michael was about friends, gossip, and funny stories. As always it was fascinating to hear Michael tell stories—he is such a great story teller. As I left, after over two hours together, I talked to him about research—if only for a few minutes.
He surprised me by averring that he had been thinking anew about the famous Incompleteness Theorems of Kurt Gödel. Recall the first of these theorems implies that in any sufficiently powerful consistent theory, there exist true sentences that are unprovable, and the second says the consistency of the theory is one of these sentences. There are by now many proofs of these great results: some short and clever, some longer and more natural, some via unusual connections with other parts of mathematics.
What could be new here? Michael pointed to recent proofs he really liked. These include an AMS Notices article by Shira Kritchman and Ran Raz. The thrust is that one can view the Second Incompleteness Theorem through the lens of Kolmogorov complexity as a logical version of the famous Unexpected Examination paradox. As expressed by Timothy Chow, this paradox goes as follows:
A teacher announces in class that an examination will be held on some day during the following week, and moreover that the examination will be a surprise. The students argue that a surprise exam cannot occur. For suppose the exam were on the last day of the week. Then on the previous night, the students would be able to predict that the exam would occur on the following day, and the exam would not be a surprise. So it is impossible for a surprise exam to occur on the last day. But then a surprise exam cannot occur on the penultimate day, either, for in that case the students, knowing that the last day is an impossible day for a surprise exam, would be able to predict on the night before the exam that the exam would occur on the following day. Similarly, the students argue that a surprise exam cannot occur on any other day of the week either. Confident in this conclusion, they are of course totally surprised when the exam occurs (on Wednesday, say). The announcement is vindicated after all. Where did the students’ reasoning go wrong?
Michael said something about getting sharper bounds on the Kolmogorov complexity constant involved in their article. There wasn’t time to go into details, so we had to leave the discussion “incomplete.” So I asked Ken to try to help reconstruct what Michael was seeing and trying to do.
I, Ken, usually give only a quick taste of Gödel's theorems in one lecture in Buffalo's introductory graduate theory course. Let $H(e)$ be the predicate that the Turing machine with numerical code $e$ never halts when run on empty input. Let $F$ be a strong effective formal system such as Peano arithmetic or set theory. Then I show (or give as homework) the following two observations: the set $P = \{e : F \text{ proves } H(e)\}$ is computably enumerable (c.e.), while the set $T = \{e : H(e) \text{ is true}\}$ is not c.e.
Now if $P$ is not included in $T$, there is an $e$ such that $F$ proves $H(e)$ but $H(e)$ is false. $H(e)$ being false means that in the real world the machine $M_e$ does halt on input the empty string. Hence there is some finite number $t$ such that the decidable predicate $S(e,t)$ (saying $M_e$ halts within $t$ steps) is true. By the strength assumption on $F$ it proves all true and false cases of $S$, so $F$ proves $S(e,t)$; but since $F$ proves $H(e)$, $F$ also proves $\neg S(e,t)$. This makes $F$ inconsistent.
Thus if $F$ is consistent, then $P$ is included in $T$, and properly so by the c.e./not-c.e. reasoning. Taking any $e \in T \setminus P$ gives a true statement $H(e)$ that $F$ cannot prove.
Gödel's diagonal method shows how to construct such an $e$, and Gödel's definition of incompleteness also requires showing that $F$ cannot prove $\neg H(e)$ either. This is more subtle: proving $\neg H(e)$ when $H(e)$ is true is not a violation of consistency as with proving $H(e)$ when it is false. Note that, for reasons we put in Gödel's own voice at the end of our second interview with him, it is possible to have a model with a non-standard integer in which the statements $\neg H(e)$ and $\neg S(e,t)$ for all standard $t$ all hold. This is why Gödel originally used a stronger condition he called $\omega$-consistency, which rules out proving $\exists t\, S(e,t)$ together with all the statements $\neg S(e,t)$. (As Wikipedia notes, since $S$ is a halting predicate the restricted case called $\Sigma^0_1$-soundness is enough.) It took Barkley Rosser in 1936 to make this work with just consistency as the assumption.
But if all we care about is having a true statement that $F$ cannot prove, the c.e./not-c.e. argument appealing to consistency is enough. Then comes the "meta-argument" that if $F$ could prove its own consistency, then because $F$ can prove the c.e./not-c.e. part of the argument, $F$ would prove that some true statement escapes it. As Kritchman and Raz observe in related instances, this does not alone yield a concrete statement whose provability gives the real contradiction needed to deduce the second theorem. Still, I think the above is a reasonable "hand-wave" to convey the import of the second theorem with minimal logical apparatus.
The question becomes, How concrete can we make this? Can we push the indeterminate quantity into the background and quantify ideas of logical strength and complexity in terms of a parameter that we can bound more meaningfully? Dick and I believe this objective is what attracted Rabin to the Kritchman-Raz article.
Kritchman and Raz obtain the second theorem without the hand-wave and with minimal "meta" by focusing on the Kolmogorov complexity of a binary string $x$:

$$K(x) = \min\{|p| : \text{program } p \text{ outputs } x \text{ on empty tape}\}.$$

Here $|p|$ means the string length of $p$—that is, the length of the program producing $x$ from empty tape. Now let us imagine a function $C$—for Gregory Chaitin—that takes as parameters a description of $F$ and a number $m$ and outputs a program $q$ that does the following on empty tape:
Search through all proofs in $F$ of statements of the form "$K(x) > m$" and as soon as one is found, output $x$.
Then $|q| \le \log_2 m + c$, where $c$ is a constant independent of $m$. Whenever $m$ exceeds the constant $m_0$ solving $m_0 = \log_2 m_0 + c$—expressible in closed form via the Lambert $W$-function—we have $|q| < m$.
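The threshold is just the crossing point of $m$ against $\log_2 m + c$, and since the Lambert-$W$ closed form is unilluminating, a fixed-point iteration finds it quickly. The constant $c = 1000$ below is purely illustrative:

```python
from math import log2

def threshold(c, iters=60):
    """Solve m = log2(m) + c by fixed-point iteration.

    The iteration converges fast because log2 grows so slowly; beyond
    this m_0, m exceeds log2(m) + c."""
    m = c + 1.0
    for _ in range(iters):
        m = c + log2(m)
    return m

c = 1000.0                       # illustrative program-length constant
m0 = threshold(c)
assert abs(m0 - (c + log2(m0))) < 1e-9
# beyond the threshold, the searcher program is shorter than the
# complexity bound it would certify:
assert (m0 + 1) > log2(m0 + 1) + c
```

For $c = 1000$ the threshold is only a little above $1000$, reflecting how negligible the $\log_2 m$ term is.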
Thus for $m > m_0$ there are no proofs in $F$ of statements of the form $K(x) > m$, for any $x$—else by running $q$ until it finds such a proof and outputs $x$, we prove $K(x) \le |q| < m$ and so expose the inconsistency of $F$. Now fix a length $n > m_0$; we need only find an $x$ such that $K(x) \ge n$ is true to prove the first theorem concretely. Most important, by simple counting that is provable in $F$, such an $x$ must exist among the finite set of binary strings of length $n$.
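The counting step is the elementary fact that there are fewer short programs than strings of a given length; a one-loop check (our own):

```python
def count_programs_shorter_than(n):
    # binary programs of lengths 0, 1, ..., n-1: a geometric sum
    return sum(2 ** L for L in range(n))   # equals 2**n - 1

for n in range(1, 20):
    # 2**n strings of length n, but only 2**n - 1 shorter programs, so at
    # least one string of each length n has K(x) >= n
    assert count_programs_shorter_than(n) < 2 ** n
```

This pigeonhole inequality is what $F$ itself can verify, making the existence of an incompressible string provable inside $F$ even though no particular witness is.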
Kritchman and Raz conclude their argument by letting $A_i$ be the statement "at least $i$ strings $x$ of length $n$ have $K(x) \ge n$" for $i = 1, \dots, 2^n$ (taking $A_{2^n+1}$ false). There is exactly one $t$ such that $A_t \wedge \neg A_{t+1}$ is true, but $F$ cannot even prove $\neg A_{t+1}$: By the truth of $\neg A_{t+1}$ there are $2^n - t$ strings with $K(x) < n$, and $F$ can guess and verify them all by running their programs to completion. If $F$ proves $\neg A_{t+1}$, then $F$ deduces that all the other $x$ have $K(x) \ge n$, which is impossible by the choice of $n > m_0$. Likewise, $F$ cannot prove $A_{t+1}$: together with the verified compressible strings, that would leave $F$ asserting more incompressible strings than exist and so make $F$ inconsistent.
We start a ball rolling, however, by observing that via the counting argument, $F$ does prove $A_1$. So either $t = 1$ is false or $F$ is inconsistent. This turns around to say that if $F$ proves its own consistency, then $F$ proves that $t = 1$ is false—which is like the "surprise exam" not being possible on the last day. But since $F$ proves $A_1$, $F$ then proves $A_2$. Hence either $t = 2$ is false or $F$ is inconsistent, which turns around to make $F$ deduce that its ability to prove its consistency implies the ability to prove $A_3$. This rips right on through to make $F$ prove $A_{2^n + 1}$, which however it can do only if $F$ really is inconsistent. Thus $F$ cannot prove its own consistency—unless it is inconsistent—which is Gödel's second theorem.
The article by Kritchman and Raz has the full formal proof-predicate detail. There is a "$t$" lurking about—the number of steps the programs outputting the $x$-es need to take—but the structure ensures that the eventuality of a non-standard-length computation outputting an $x$ never arises. The finiteness of $n$ and $2^n$ drives the improvement.
Along with Michael we wonder, what can be done further with this? Can we turn the underlying computability questions into complexity ones? The natural place to start is, how low can the constant be? If $F$ is Peano arithmetic or set theory, can we pin it down concretely? This seemed to be what Michael was saying. And there is another thought here: We don't need the length of $q$ to be bounded, but rather its own Kolmogorov complexity, $K(q)$. Can we upper-bound this—for set theory or Peano—by setting up some kind of self-referential loop?
The main open problem is how fast will Michael be back in form. We hope that the answer to this open problem is simple: very soon. Congratulations again to him and Michael Ben-Or on the prize, and Happy Father’s Day to all.
[fixed inequality after Lambert W]
David Sanger and Julie Davis are reporters for the paper of record—the New York Times. Their recent article starts:
WASHINGTON—The Obama administration on Thursday announced what appeared to be one of the largest breaches of federal employees’ data, involving at least four million current and former government workers in an intrusion that officials said apparently originated in China.
The compromised data was held by the Office of Personnel Management, which handles government security clearances and federal employee records. The breach was first detected in April, the office said, but it appears to have begun at least late last year.
The target appeared to be Social Security numbers and other “personal identifying information,” but it was unclear whether the attack was related to commercial gain or espionage. …
Today Ken and I want to suggest a new approach to data breaches like this.
Before we explain the method let's just say that it looks pretty grim for protecting data. If the White House cannot protect the emails of the President, and if the Office of Personnel Management cannot protect federal employees' Social Security numbers, then perhaps it is time to give up.
We know huge sums are being spent on solving the security problem; many groups, centers, agencies, and researchers are working on it; countless conferences have their focus on this problem. But attacks occur, and we still lose information.
Perhaps it is a fundamental law of complex software systems that they will never be secure—that bad actors will always be able to break in and steal data. Perhaps this problem is unsolvable: not just hard, or expensive, or difficult. Perhaps it is as impossible to solve as trisecting an angle with only a straightedge and a compass, or as impossible as the Halting Problem. Perhaps.
If this is the case, then we have a suggested approach to security that stops trying to solve the stealing-data problem. The approach is quite different and we will now explain it.
Jigoro Kano, the founder of the martial arts discipline Judo, once wrote:
In short, resisting a more powerful opponent will result in your defeat, whilst adjusting to and evading your opponent’s attack will cause him to lose his balance, his power will be reduced, and you will defeat him. This can apply whatever the relative values of power, thus making it possible for weaker opponents to beat significantly stronger ones.
Judo allows a “weaker” opponent to beat a “stronger” one. The idea is based on not resisting directly, but rather resisting indirectly. We believe that this principle can be used in security.
Our application of Judo is based on looking deeper at just what it means to steal some information such as your SSN. An SSN is only useful because there are transactions based on using it. The same goes for almost all the information being stolen: it is only valuable because it can be used in some transaction that we wish to stop.
Thus the surrender idea is to assume that the data is going to be compromised. Perhaps we even make it public. But we start to protect the transactions in a way that does not rely on the false assumption that certain data is secret.
Our suggestions are far from new—indeed, they are being used all the time, and importantly, their safety and success have held up. Our point is to build a framework whose philosophy is not to do more than this, and to change expectations for the end-user experience. Common examples include challenge questions, confirmation messages, and other second factors in everyday transactions.
What we sacrifice under the Judo philosophy is the “Swiss Bank” expectation that one golden key unlocks access with no questions asked.
Electronic filing of tax returns ought to be the most secure online transaction that most people partake in. Unlike electronic purchases, this happens just once a year, the partner is the U.S. Government, and the safeguards can embrace the whole of your identity with the government. Yet for each of the past few years there have been over a million cases of thieves filing false returns with stolen genuine personal information before the real person files, in order to hijack the refund.
Safeguarding details of your return is of course desirable, but it is off the point of safeguarding the refund transaction. Hence we say one shouldn’t rely on the same solution for both problems. Instead our attitude on the latter is that we should be prepared to just give up on the former—even if we have to be like Hillary Clinton or Mitt Romney.
What we believe needs to change is not what’s under the hood but rather what’s on our dashboard. We must forgo the passivity of thinking all one has to do is wait for the IRS message of deposit. There must be some validation of the destination that is interactive, such as asking a challenge question that you—the real you—provided last year.
The planted challenge question idea is an example of the static kind of knowledge-based authentication (KBA). There is also dynamic KBA, in which the questions are synthesized from information that the provider already has. These can be questions such as, what was the color of the car you bought in 2002? Both kinds of KBA are increasingly used against tax fraud.
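A static-KBA back end need not store the planted answer itself. Here is a minimal sketch (our own, with a hypothetical planted answer) that keeps only a salted slow hash and verifies attempts against it:

```python
import hashlib
import hmac
import os

def enroll(answer: str):
    """Store only a salted PBKDF2 hash of a planted challenge answer."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", answer.strip().lower().encode(), salt, 100_000)
    return salt, digest

def verify(salt: bytes, digest: bytes, attempt: str) -> bool:
    """Re-derive the hash from the attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", attempt.strip().lower().encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = enroll("Miss Shapiro")   # hypothetical planted answer
assert verify(salt, digest, "miss shapiro")     # case/space normalized
assert not verify(salt, digest, "Mrs. Robinson")
```

Even if the provider's table is breached, the thief learns salted hashes rather than the answers, which fits the "surrender the data, protect the transaction" motif.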
Dynamic KBA can be used when there has been no prior interaction. There are further issues about how the provider gathers data for the questions. This Vermont government source notes issues with the use of public records. In keeping with our “surrender” motif, we don’t see how to stop this access—rather, we look to controls on how the access is used in transactions.
The Vermont source moves on to the idea of recording and analyzing patterns of keyboard use, which may be even more fraught. We wonder instead about a good way to blend KBA ideas with what we’ll call “access-based authentication” (ABA). Generalizing from the simple instance of using your e-mail to authenticate, the idea is to set up domains that only you have access to in their entirety.
To be sure, hackers might also gain access to your e-mail account used for validation, such as to forge a message about the destination of a tax refund. It won’t do for you to create a separate e-mail account used only with the IRS; rather, we think such things play into hackers’ hands. Instead, your e-mail can safeguard the reality that only you use it. One idea is having a machine on which you are always logged in to your e-mail. This way any other activity shows up as supplementary.
The bad news in all of this is that assuring one’s identity is becoming a battle, and there seems to be no simple way to assure victory. Our point is to favor approaches that move the battle into areas an individual controls, as opposed to ones controlled from outside.
Do identity protection and integrity of data use need a consistent paradigm more than new schemes?
Some examples of small insights that help
Cropped from src1, src2
Julia Chuzhoy and Chandra Chekuri are experts on approximation algorithms: both upper and lower bounds. Each is also interested in graph theory as it applies to algorithms.
Today Ken and I wish to talk about their recent papers on structural theorems for graphs.
The latest paper, by Chuzhoy, was just presented at STOC 2015 and is titled, “Excluded Grid Theorem: Improved and Simplified.” It builds on their joint papers, “Polynomial Bounds for the Grid-Minor Theorem” from STOC 2014 and “Degree-3 Treewidth Sparsifiers” from SODA 2015.
We note with amusement that the filename of the STOC 2014 paper on Chuzhoy’s website is improved-GMT.pdf while the new one is improved-improved-GMT-STOC.pdf. The versions of the GMT, for “Grid-Minor Theorem,” that they were improving had exponential bounds, notably one obtained in 1994 by Neil Robertson, Paul Seymour, and Robin Thomas. We concur with the authors that there must be an improved-improved-improved GMT out there. Perhaps we’ll see it at STOC 2016, but if the 3-2-1 progression keeps up, it will have zero authors.
Here is the main theorem that was improved between the STOC papers:
Theorem 1 There is a universal constant c such that for every k, every graph of treewidth at least k has a square grid minor of size Ω(k^{1/c}) on a side. Moreover the grid minor can be found by a randomized algorithm in time polynomial in the size of the graph.
This is a classic structural graph theorem. It says that if a graph is sufficiently complex, then it must have a certain regular substructure. Here complexity is measured by the treewidth of the graph, and the substructure is a certain type of minor.
The constant c in the 2014 paper’s proof is large; the 2015 paper improves it. There is however a ‘disimprovement’: the sharper theorem has a nonconstructive element and no longer provides a polynomial-time algorithm for finding the minor with high probability. It does simplify the proof and fosters a framework for further progress.
Both treewidth and minors involve kinds of embeddings of other graphs into subgraphs or set systems based on G. In the case of treewidth we have a tree T and a mapping f from nodes of T to subsets of the vertices of G, called bags. The two conditions for (T, f) to be a tree decomposition of G are that every edge of G has both endpoints in some common bag f(t), and that for every vertex v of G the tree nodes whose bags contain v form a connected subtree of T.
The treewidth of G is the minimum k such that G has a tree decomposition by bags of size at most k+1. This offset by one makes trees have treewidth 1. If G has a cycle C, the conditions combine to force a bag containing three vertices of C, so treewidth at least 2. The n-by-n square grid has treewidth n, while n-vertex expanders have treewidth Ω(n).
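To make the two conditions concrete, here is a small Python checker, a sketch with our own choice of representation: graphs and trees as edge lists, bags as a dict from tree node to vertex set.

```python
def is_tree_decomposition(graph_edges, tree_edges, bags):
    """Check the two tree-decomposition conditions for (T, bags) over G.

    bags: dict mapping each tree node to a set of graph vertices."""
    # Condition 1: every edge of G lies inside some bag.
    for u, v in graph_edges:
        if not any(u in b and v in b for b in bags.values()):
            return False
    # Condition 2: for each graph vertex, the tree nodes whose bags
    # contain it must induce a connected subtree of T.
    for vert in set().union(*bags.values()):
        nodes = {t for t, b in bags.items() if vert in b}
        comp, frontier = set(), [next(iter(nodes))]
        while frontier:  # grow a component within `nodes` along tree edges
            t = frontier.pop()
            comp.add(t)
            for a, b in tree_edges:
                for x, y in ((a, b), (b, a)):
                    if x == t and y in nodes and y not in comp:
                        frontier.append(y)
        if comp != nodes:
            return False
    return True
```

For the path 1–2–3, the two bags {1,2} and {2,3} on a two-node tree pass both checks and have size 2, matching treewidth 1 for trees.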
A graph H is a minor of G if there is a mapping f from nodes of H to disjoint subsets of the vertices of G such that each set f(u) induces a connected subgraph of G, and for every edge {u,v} of H there is an edge of G between f(u) and f(v).
Together these conditions mean one can obtain H from G by contracting each f(u) to a single vertex, deleting the vertices not in any f(u), and then deleting edges until only those in H remain. In Theorem 1, H is a square grid. We’ve arranged the definitions to echo each other as much as possible. The correspondence is rough, but the tension between treewidth and grid-minor size comes out precisely in the proof.
Theorems of the above type often have quite complex proofs. As usual these proofs are broken down into pieces. The hierarchy is observation, claim, and lemma. Perhaps “claim” comes before “observation.” In any event the point is that we not only break long proofs into smaller chunks, we break down the conceptualization of their strategies. The sage Doron Zeilberger argues:
So blessed be the lemmas, for they shall inherit mathematics. Even more important than lemmas are observations, but that is another story.
I thought I would follow Doron’s suggestion and talk just about observations. They are used in their proofs and seem like they could be of independent interest.
One observation comes from the 1994 paper:
Lemma 2 There is a universal constant c such that every n-vertex planar graph is a minor of the cn-by-cn square grid.
This amounts to observing that all planar networks can be compactly implemented on grids. The significance for Chuzhoy and Chekuri is that Theorem 1 extends to all planar graphs. In contraposed form:
Theorem 3 There is a universal constant c such that for any planar n-vertex graph H, all graphs that do not have H as a minor (i.e., for which H is an “excluded minor”) have treewidth at most n^c.
Note that the treewidth bound is independent of the size of the excluded-minor-free graph; it depends only on H.
Two of the important little observations from the 2014 Chekuri-Chuzhoy paper are:
Claim 4 Let be any set of non-negative integers, with and for all . Then we can efficiently compute a partition of such that and .
Claim 5 Let be a rooted tree such that for some positive integers and . Then has at least leaves or has a root-to-leaf path of length at least .
The new paper takes its jumping-off point from this little observation: Treewidth can be approximately conserved while minorizing down to a bounded-degree graph. This may sound surprising but isn’t—think of the fact that expander graphs can have degree 3. Building on advances in a paper by Ken-ichi Kawarabayashi and Yusuke Kobayashi and an independent paper by Seymour with Alexander Leaf (both of which also improved the exponent from the 1994 result), the SODA paper shows how to keep the treewidth of the degree-3 graph above . The motivation for going down all the way to degree 3 is another simple observation:
In graphs of degree 3, edge-disjoint path systems and node-disjoint path systems are the same on non-terminal nodes.
The plan and gadgets built for the proof come off well in slides by both Chekuri and Chuzhoy from the time of their joint work. Here is one slide that conveys the strategy; we’ve edited it to add a vertical bracket as on the previous slide:
The disjoint path systems connect clusters of nodes that facilitate multiple switching and routing by obeying definitions like the following:
Definition 6 A node set X is well-linked if for all A, B ⊆ X with |A| = |B| = k, there are k-many node-disjoint paths connecting A and B. This allows paths of length 0 when A and B share vertices.
The maximum size of a well-linked set is another graph invariant. Bruce Reed proved:
Lemma 7 For graphs of treewidth k, the maximum size of a well-linked set is Θ(k).
This concept and observation bridge between treewidth and problems of routing paths that build grid structures. Leaf and Seymour proved that having a path-of-clusters system of width enables one to find a grid minor of size on the side. To gain their tighter connections, the new papers relax the concept:
Definition 8 X is α-well-linked if for all A, B ⊆ X with |A| = |B|, there is a flow from A to B of one unit per vertex with congestion at most 1/α.
The following result connects the bound to the flow-building task. It meets one definition of ‘observation’ insofar as it is used as the definition of “-well-linked” in some other works by the authors.
Observation 9 If X is α-well-linked in G then for any partition of all of G into sides S and T, the number of edges crossing the partition is at least the minimum of α·|S ∩ X| and α·|T ∩ X|.
The newest paper also uses versions where is constrained to be at most some number for which then becomes another upper bound on edges across the cut. This is the gateway to the rough-and-tumble combinatorial details, for which the latest paper and the talk slides are best to see. But we hope that sharing these observations conveys the flavor well.
What are your favorite observations?
Exponential hardness connects broadly to quadratic time
Cropped from src1, src2
Arturs Backurs and Piotr Indyk are student and advisor. The latter, Piotr, is one of the leading algorithms and complexity theorists in the world—what an honor it must be for Arturs to work with him as his advisor.
Today Ken and I want to talk about their paper on edit distance and an aspect that we find puzzling.
The paper in question is, “Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false).” It is in this coming STOC 2015.
What we find puzzling is that the beautiful connection it makes between two old problems operates between two different levels of scaling, “n” and “N = 2^{n/2}.” This messes up our intuition, at least mine.
I, Dick, have thought long and hard, over many years, about both the edit distance problem and about algorithms for satisfiability. I always felt that both problems should have algorithms much better than the “obvious” ones. However, I was much more positive about our ability to make a breakthrough on computing the edit distance than about doing the same for satisfiability.
The way of linking the two problems is to me quite puzzling. Quoting my favorite band, the Talking Heads:
… Now don’t you wanna get right with me?
(puzzling evidence)
I hope you get ev’rything you need
(puzzling evidence)
Puzzling evidence
Puzzling evidence
Puzzling evidence …
The edit distance between two strings is defined as the minimum number of insertions, deletions, or substitutions of symbols needed to transform one string into another. Thus CAT requires three substitutions to become ATE, but it can also be done by one insertion and one deletion: pop the C to make AT and then append E to make ATE. Thus the edit distance between these two words is 2. The problem of computing the edit distance occurs in so many fields of science that it is hard to figure out who invented what first. The case of strings of length n is easily seen to be computable in quadratic time, O(n^2), by a dynamic programming algorithm that builds up edit distances between initial substrings.
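The quadratic dynamic program is short enough to state in full; this Python sketch builds the table of distances between initial substrings just as described:

```python
def edit_distance(s: str, t: str) -> int:
    """D[i][j] = edit distance between the first i symbols of s
    and the first j symbols of t."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i            # delete all of s[:i]
    for j in range(n + 1):
        D[0][j] = j            # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # substitution or match
    return D[m][n]
```

On the example in the text, `edit_distance("CAT", "ATE")` returns 2, realized by the delete-then-append transformation.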
Chak-Kuen Wong and Ashok Chandra proved this is optimal in the restricted model where one can only compare characters to each other. There are algorithms that beat quadratic by logarithmic factors; they essentially treat blocks of characters as one. But it remains open after much research whether there is an algorithm that runs in time of order n^{2−ε} for some fixed ε > 0, for example.
The SAT problem is the usual question of testing Boolean clauses to see if they can all be satisfied at the same time by the same assignment to the Boolean variables. CNF-SAT restricts to formulas in conjunctive normal form, and k-SAT further restricts to clauses with at most k literals per clause.
Backurs and Indyk prove that if there exists δ > 0 such that edit distance can be decided in time O(n^{2−δ}), then there exists ε > 0 such that CNF-SAT for formulas with n variables and m clauses can be solved in time 2^{(1−ε)n} poly(m). They build on a connection to SETH shown ten years ago by Ryan Williams in part of a wider-ranging paper.
The basic idea is how satisfiability behaves with regard to partitions of the variables of a formula φ into two size-(n/2) subsets, call them X_1 and X_2. Let A be the set of assignments to X_1 and B to X_2. For every assignment a in A, let S_a be the set of clauses it satisfies, and U_a the remaining clauses which it does not satisfy. Similarly define S_b and U_b for b in B. Then φ is satisfiable if and only if there are a and b with U_a ∩ U_b = ∅.
Now let us identify a with U_a regarded as an m-bit vector and similarly b with U_b, also re-labeling A to be the set of 2^{n/2}-many a’s, B for b’s. Then as Williams observed, φ is satisfiable if and only if we have a yes-instance of the following problem:
Orthogonal Vectors Problem (OV): Given two sets A, B of N-many length-d binary vectors, are there u in A and v in B such that u · v = 0?
It is obvious to solve OV in time O(N^2 d) by trying all pairs. The nub is what happens if we achieve anything slightly better in the exponent than quadratic, say time N^{2−ε}. Then with N = 2^{n/2} we get time
2^{(1−ε/2)n} for CNF-SAT, which contradicts SETH.
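For concreteness, the obvious quadratic algorithm for OV is just a double loop over pairs; a Python sketch:

```python
from itertools import product

def orthogonal_pair(A, B):
    """Brute-force OV: try all N^2 pairs of length-d 0/1 vectors,
    returning an orthogonal pair if one exists."""
    for u, v in product(A, B):
        if all(x * y == 0 for x, y in zip(u, v)):
            return u, v
    return None
```

The SETH arithmetic rides entirely on the exponent: with N = 2^{n/2} vectors, even shaving this to N^{2−ε} time would give 2^{(1−ε/2)n}, which is what collides with the hypothesis.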
What’s puzzling is that the evidence against doing better than quadratic comes when N is already exponential, N = 2^{n/2}. Moreover, the instances involved are ridiculously large, exponential sized, and we don’t even care that they have a succinct genesis in terms of φ. (Note that we have swapped the letters N and n from their paper; we find it helpful to keep “N” the larger one.)
Backurs and Indyk itemize several problems to which this connection was extended since 2010, but we agree that Edit Distance (ED) is the most striking addition to this list. Their new result is a kind of “SETH-reduction” from OV to ED. Can we capture its essence without referencing SAT each time?
The results here and before all use an unusual type of reduction. Ken and I think it would be useful to formalize this reduction, and try to understand its properties. It is not totally correct to call it simply a quasi-linear time reduction because multiple parameters are involved; we can call them N and d quite generally.
In the above case with N = 2^{n/2} and d = m, if the clause size k is fixed then we have m = O(n^k), so d is polylogarithmic in N. It hence suffices to have a reduction from OV to ED that is computable in quasi-linear time, here meaning time Õ(N). Indeed, we can allow time Õ(N)·q(d) for any function q.
When talking just about the problems OV and ED, however, without reference to SAT, N and d are separate parameters with no relation specified. It suffices to say that the reduction is polynomial in d and quasi-linear in N. This is essentially what Backurs and Indyk do. Their “N” is called “n”; then they define and multiply several intermediate quantities. The details in their paper are considerable, involving an initial reduction from OV to an intermediate problem, and this is one reason we’d like to streamline the reduction concept.
If we assume , then “quasi-linear in and polynomial in ” is the same as “linear in and polynomial in .” Perhaps the latter phrase is the simplest and best way to define the reduction? However, we do not need to specify “polynomial in ” either. It is enough to have a suitable sub-exponential time in . For instances with we would need not , while for we would need .
Parameterized complexity defines reductions with two parameters, but the simplest ones are not exactly suitable. Finally, we wonder whether it would help to stipulate any of the specific structure that comes from including that the instances are succinct. Note that we once covered succinctness and a hypothesis roughly related to SETH (see this comment for a circuit form). This paper highlighted by Backurs and Indyk works from but says it could have used , while still not formalizing the reduction concept. Likewise their other major references; some work from and others not. The latest of them, by Karl Bringmann and Marvin Künnemann who show “SETH-hardness” for on binary strings, defines a framework for the gadgets used in all these reductions.
The remarks just above about the reduction time in “” make us recall the three most common levels of exponential hardness. The power index of a language was coined by Richard Stearns and Harry Hunt in a 1990 paper. It is the infimum of such that belongs to time . Their “Satisfiability Hypothesis” (SH) about the power index of satisfiability is much weaker than SETH, though not as weak as conjecturing merely a lower bound of .
The latter two come in slightly different versions bringing and/or a fixed in – into the picture, and of course all these versions might be false. SETH is distinguished as the closest to the upper bound. There are randomized algorithms for – that run in time for some and all , and can be replaced by in general.
Stearns and Hunt emphasized the effect of reductions on SH and the power index in general. The same can be done for ETH. But we remain equally puzzled about the issue of the size of the problems used in the reductions. We start with a SAT problem that uses n bits in its description. This is then viewed eventually as an edit distance problem that uses a number of bits exponential in n. The input length of the edit problem is extremely large. Of course this is just fine, since we only claim to get an exponential time algorithm.
The point is that our intuition about the edit distance problem is all on problems that have modest size n. I, Dick, actually had a student build a hardware machine that did modest-size such problems many times faster than any computer. So all my intuition was—is—about small-size edit problems. When the size n becomes astronomical my intuition may fall apart. Could this explain why the result here seems puzzling?
So what is the complexity of computing the edit distance? Can we really not do better than the obvious algorithms? This seems hard, nay puzzling, to us; but it may indeed be the case.
The 3SUM problem, which we recently covered, is attacked by some of these papers but has not as of yet been brought within the scope of the reductions from OV. The subquadratic decision-tree upper bound has not yet yielded an algorithm that actually runs in that time. Yet perhaps the above kinds of reductions also generally preserve decision-tree complexity? This would make the decision-tree result a real obstacle to showing “SETH-hardness” in that manner.
Our condolences
Awesome Stories source
John Nash and his wife Alicia were killed in a taxi accident on the New Jersey Turnpike Saturday afternoon. They were on their way back from Norway where he and Louis Nirenberg had just accepted the 2015 Abel Prize. Besides meeting the king of Norway, Nash had also expressed a desire to meet world chess champion Magnus Carlsen during remarks at Princeton’s celebration of his Abel Prize in March, and that was also granted this past week.
Today Dick and I join many expressing shock and offering condolences.
The time spans for the Nashes are really 1957–1963, 1970–2000, and 2001–2015. They were married in 1957, two years before the onset of John’s schizophrenia. Their divorce in 1963 came amid a nine-year series of hospitalizations that began in 1961. After his discharge in 1970, she felt he would be better off boarding in a room of the house she’d moved to in Princeton Junction than in an institution or on his own. This stable arrangement with a balance of connection and detachment is credited with helping him regain normalcy as his paranoia abated in the late 1980s. Lucidity in the 1990s cleared the way for his Nobel Prize and re-integration into Princeton’s mathematics community and with his family. He and Alicia re-married in 2001 and both contributed to a positive equilibrium.
I first saw the news early Sunday morning on BBC.com while going to check the England-New Zealand Test cricket score. It was the lead story on CNN.com for part of this afternoon. Many stories continue to emerge, among which we mention the NY Times (longer obit), Princeton, a local Princeton site, and appreciations by Peter Woit of Columbia and by Frederic Friedel of ChessBase GMBH. The last two include short personal recollections.
Abel Prize source; see also our earlier post.
Alicia’s life was one of boldness and courage from the day she emigrated with her family from El Salvador in 1944. Her father had followed her uncle fearing backlash against their aristocratic family from a popular insurrection. Hearing about the nuclear bomb on the radio inspired her to become an atomic physicist, and she worked hard to become one of only seventeen women in MIT’s class of 1955.
Nash was then in MIT’s math department, and she met him on a warm day in the first lecture of his advanced calculus course. She defiantly re-opened windows Nash had closed to shut out noise and left the class with an indelible crush. They started dating after she earned her physics degree. Her attachment to him survived the literal sudden emergence of his previous female partner and their child, and the revelation to some unknown degree of previous male partners. They married in early 1957 and shortly began raising a family.
The rapid phase of Nash’s descent in early 1959 was “like a tornado.” One must marvel at how she coped in the 1960s with his condition, raising their son, and finding computer-related work near Cambridge and later Princeton. She even brought a sex-and-job discrimination lawsuit against Boeing that was settled in a helpful manner in 1973, before she found a permanent computer programming position with New Jersey Transit. She then had to deal with schizophrenia in their son John, whom I briefly got to know for several days in 1980 when he came around to Princeton’s Quadrangle Club.
In recent years she advocated for support of community mental health programs that enable patients to live outside hospitals. She met with New Jersey state officials during budget negotiations in 2009. Our hearts go out most to their son.
The game of Hex is played on an n-by-n rhombus of hexagons, or as Nash originally conceived it, on an n-by-n checkerboard in which squares count as adjacent also on the SW-NE diagonal but not SE-NW. White and Black alternate placing stones of their color, with White trying to form a path connecting the north and south borders, Black east-west. The corner squares count as border for either player. The game cannot end in a draw, and this fact is highly non-trivial: as shown neatly in these notes it is equivalent to Brouwer’s Fixed-Point Theorem, which was later instrumental to Nash’s great work on two-player non-zero-sum games.
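Nash's checkerboard formulation is easy to code. In this sketch (the representation is our own) each interior square has exactly six neighbors, recovering the hexagonal picture:

```python
def hex_neighbors(i, j, n):
    """Neighbors of square (i, j) on Nash's n-by-n checkerboard model of Hex:
    the four orthogonal neighbors plus the SW-NE diagonal pair, not SE-NW.
    Rows i increase southward, columns j increase eastward."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1),  # orthogonal steps
             (1, -1), (-1, 1)]                   # SW and NE diagonal only
    return [(i + di, j + dj) for di, dj in steps
            if 0 <= i + di < n and 0 <= j + dj < n]
```

A corner square such as (0, 0) keeps only its two orthogonal neighbors, consistent with corners counting as border for either player.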
Wolfram MathWorld source
Hex was originally invented in 1942 by the Danish polymath Piet Hein. Nash conceived it independently from an angle that leaps out at us from this account by his fellow Princeton graduate student David Gale, as quoted in Sylvia Nasar’s biography A Beautiful Mind. Nash hailed Gale in the Princeton Graduate College quadrangle, saying:
“Gale! I have an example of a game with perfect information. There’s no luck, just pure strategy. I can prove that the first player always wins, but I have no idea what his strategy will be. If the first player loses at this game, it’s because he’s made a mistake, but nobody knows what the perfect strategy is.”
For board sizes beyond those yet solved, Nash’s words are still true, even with the computing might and sophisticated algorithms exerted in this 2013 paper by Jakub Pawlewicz and Ryan Hayward. The nub was proved in 1976 by Shimon Even and Bob Tarjan: deciding who stands to win a given Hex position on an n-by-n board is PSPACE-complete.
We believe that Nash, plausibly earlier than Kurt Gödel, already concretely felt the contours of complexity theory for practical problems. Here it was blended with fascination at the power of a quick non-constructive argument to convey knowledge that is hard to obtain programmatically:
The depth of Nash’s thoughts about complexity was revealed in early 2012 when a handwritten letter (typed version) he wrote in 1955 to the National Security Agency was declassified. He proposed a cryptosystem with possible keys and continued:
Now my general conjecture is as follows: For almost all sufficiently complex types of enciphering, especially where the instructions given by different portions of the key interact complexly with each other in the determination of their ultimate effects on the enciphering, the mean key computation length increases exponentially with the length of the key, or in other words, with the information content of the key.
The significance of this general conjecture, assuming its truth, is easy to see. It means that it is quite feasible to design ciphers that are effectively unbreakable. As ciphers become more sophisticated the game of cipher breaking by skilled teams, etc., should become a thing of the past.
The nature of this conjecture is such that I cannot prove it, even for a special type of ciphers. Nor do I expect it to be proven. But this does not destroy its significance. The probability of the truth of the conjecture can be guessed at on the basis of experience with enciphering and deciphering.
If qualified opinions incline to believe in the exponential conjecture, then I think we (the U.S.) cannot afford not to make use of it.
The only words here that may not be amazingly prescient—including a rimshot on the distinction between string length and information complexity—are the idea that complexity makes concrete efforts at breaking “a thing of the past.” Even for Hex, the PSPACE-completeness result might not prevent the existence of a succinctly described and efficient first-player strategy. Such a strategy might avoid the kind of positions used for the hardness proof. We are reminded of this in crypto by a just-released paper and website on the new Logjam attack on Diffie-Hellman key exchange. This is excellently covered by Scott Aaronson here, and has echoes of faults with RSA implementations that likewise fail to diversify the prime numbers used.
Is there a succinct description of an efficient strategy for winning at Hex?
Our condolences again to the family and friends of the Nashes.
Or rather, what can the shapes of proofs tell us about them?
April CACM source
Juris Hartmanis did much to lay the landscape of computational complexity beginning in the 1960s. His seminal paper with Richard Stearns, “On the Computational Complexity of Algorithms,” was published 50 years ago this month, as observed by Lance Fortnow in his blog with Bill Gasarch. It is a great achievement to open a new world, but all the more mysterious that after 50 years so much of its landscape remains unknown.
Today we ask what might determine the unseen topography and how much some recent large-data discoveries may help to map it.
The idea for this post arose from a possibly phantom memory that Juris (co-)wrote a short draft survey on “Shapes of Computations” sometime in 1986–1989 when I was at Cornell. I recall the specific phrase “long and skinny” to describe space-bounded computations. An O(log n) space-bounded computation can explore all of a polynomially-sized undirected graph by saving just the current node and some auxiliary information by which to choose a neighbor for the next step. The trace of this computation becomes a poly(n)-length sequence of O(log n)-bit strings. A polynomial-space computation doing an exponential search of a game tree has the same long-and-skinny “shape” even though the scale is greater with regard to the input length n. Polynomial time-bounded computations, however, can use polynomial space, whereupon they become “short and fat.” Breadth-first search of a graph is a canonical algorithm that hogs space for its relatively short duration.
Which computations fall between these “Laurel and Hardy” extremes? For SAT and the other NP-complete problems, this is the great question. The surest way to separate NP from P and from PSPACE would be to characterize these problems by a third distinct shape of computation. But we have not even separated P from PSPACE, nor logspace from P, so what can we pretend to know?
My memory has probably been transferred from a column Juris wrote with his students Richard Chang, Desh Ranjan, and Pankaj Rohatgi for the May 1990 issue of the Bulletin of the EATCS. It has this nice diagram:
The analogy between computations and proofs has been instrumental since the early days of Kurt Gödel and Alonzo Church and Alan Turing. Proofs do, however, give nondeterminism “for free”: NP is treated the same as P in their diagram, while “nondeterministic polynomial space” equals PSPACE. Hence I’ve regarded “proofs” as secondary to “computations” as objects for complexity. However:
The EATCS column defines a notion of width of a proof, characterizes PSPACE via polynomial-width proofs, and marvels at how the classic interactive protocol for PSPACE retains the “skinny” shape with less length. Indeed, in cases where the verifier is able directly to check evaluations of the unique multilinear extension of the arithmetization of a quantified Boolean formula, every proof step involves just two terms per round. A related form of skinniness had been brought out by Jin-Yi Cai and Merrick Furst in their 1987 paper, “PSPACE Survives Three-Bit Bottlenecks.” The column goes on to emphasize that the form of the proof lends itself to quicker probabilistic verification. This aspect was shortly codified in the definition of probabilistically checkable proof, which lends itself most readily to characterize NP and NEXP.
Amid all this development on “long and skinny” proofs, what can we say about “short and fat” ones? Intuitively, such proofs have lots of cases, but that is not their full story. The network of logical dependence matters too. Hence we think there are most helpfully three kinds of proofs in regard to shape: (1) proofs that proceed in one long chain of deductions; (2) proofs that branch into many parallel cases; and (3) proofs whose cases and lemmas are re-used, so that the dependence structure becomes a web.
Direct evaluations of quantified Boolean formulas in PSPACE have type 2, while the interactive proof with polynomials gives the “feel” of type 1 to both the prover and the recipient of the proof.
Chess problems prefer type 1 or a limited form of 2 for esthetics. The website ChessBase.com recently republished the longest branching-free “Mate-In-N” problem ever created, by Walther Jörgenson in 1976. It is mate-in-203 with no alternative move allowed to the winner, and virtually no case analysis of alternate defensive tries by the loser either.
However, a chess search often has type 3. Often there will be different starting sequences of moves that come to the same position p. The value of p that was computed the first time is stored in a hash table so that the later sequences are immediately shown that value, cutting off their need for any further work. This resembles breadth-first search insofar as marked nodes may be touched later along other paths. The dependencies of values become web-like. Injured values from hash collisions can cause huge ripple effects, as I covered in a post three years ago.
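The tactic is the same memoization one would write for any game search. Here is a minimal negamax sketch in Python; the table plays the role of the hash table of positions, and the take-1-or-2 subtraction game used to exercise it is our own example:

```python
def game_value(pos, moves, terminal_value, table=None):
    """Negamax search with a transposition table: the value of a position
    reached by different move orders is computed once and then reused."""
    if table is None:
        table = {}
    if pos in table:                 # transposition: reuse the stored value
        return table[pos]
    succ = moves(pos)
    if not succ:
        val = terminal_value(pos)    # player to move has no moves
    else:
        val = max(-game_value(s, moves, terminal_value, table) for s in succ)
    table[pos] = val
    return val
```

In the subtraction game where a move removes 1 or 2 stones and a player with no move loses, positions that are multiples of 3 evaluate to −1 (a loss for the player to move), and the table prevents re-searching positions reached by different move orders.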
The stored-hash tactic is much the same as a lemma in a proof that is used multiple times. We suspect that last year’s 6-gigabyte computer-generated proof of discrepancy > 2 in the Erdős Discrepancy Conjecture has many such lemmas, and hence is more type 3 than 2. The provers Boris Konev and Alexei Lisitsa have an updated page with further computations. They do not link the whole impossibility proof of a length-1,161-or-more sequence of discrepancy 2, but do give some of the “DRUP” certificates of unsatisfiability. DRUP stands for reverse-unit propagation with clause deletion, and that propagation strikes us as largely composed of cases and lemmas. The subcases might be codable at high level via predicates P_m for m < 1,161 expressing the unavailability of length-m subsequences fulfilling some extra conditions, with such predicates being copiously re-used.
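For reference, the quantity in question is easy to compute for short sequences. This Python sketch takes a ±1 sequence x_1, …, x_n and returns the maximum over d of |x_d + x_{2d} + … + x_{kd}|, the discrepancy along homogeneous arithmetic progressions:

```python
def discrepancy(seq):
    """Discrepancy of a +/-1 sequence x_1..x_n: the maximum absolute
    partial sum x_d + x_{2d} + ... + x_{kd} over all steps d >= 1."""
    n = len(seq)
    best = 0
    for d in range(1, n + 1):
        partial = 0
        for k in range(d, n + 1, d):
            partial += seq[k - 1]        # seq is 0-indexed; x_k = seq[k-1]
            best = max(best, abs(partial))
    return best
```

The Konev-Lisitsa result says that no ±1 sequence of length 1,161 keeps this quantity at 2 or below; checking that by the brute force above is of course hopeless, which is why their proof is a SAT certificate.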
One finds an explosion of stored sub-cases in chess endgame tables. However, in many positions a vast majority of them are to prove wins that a chess master would see as “trivially true” in a few seconds. In other cases an alternative by the loser might simply jump to a later stage of the main trunk line, thus merely accelerating the same loss rather than posing a new case. (Similarly, an alternative by the winner might merely allow the defender to wind back to an initial stage, without much need for separate verification.) We wonder how far this pertains to the longest endgame win discovered in the new 7-piece endgame tables: mate in 545 moves. That is, how large is the part of the proof tree that needs to be specified, so that a chess program given values for positions in the tree could verify the rest via its local search?
This last eventuality prompts our new speculation: Perhaps we can rigorously develop a science of when large sections of proofs can be effectively handwaved. This would appeal to the computational style most often postulated for our human brains: shallow but broad and with great power of heuristic association, analogous to the complexity class TC^0 of poly-size constant-depth threshold circuits. Even when yoked with computers the added value of our grey matter is not to be discounted, as attested in my joint paper last summer—in section 6 showing how human-computer tandems performed better at chess in 2005–08 than computers alone.
We have recently twice covered a conjecture by Freeman Dyson that one feels should lend itself to this kind of treatment. Many other open conjectures in number theory are felt to be “probably” true, where “probably” has a technical sense that might be developed further into some kind of dependent structure: if that handwave is valid then all-the-more certainly so is this one. The idea could be helped by enumeration of exceptions that, once handled, enable the rest of the proof to be executed with a broad brush. As linked from an essay post by Scott Aaronson, Tim Gowers relates a relevant opinion by Don Zagier in a MathOverflow comment. We morph this opinion to say that mathematicians may need a “handwave heuristic” simply because many “obviously true” statements don’t connect to anything substantial that would give a reason for their truth.
This could push many proofs of type 3 toward types 2 or 1. Note that in the way interaction and randomness combine to move type 2 toward type 1, we are already agreeing to tolerate a chance of error. It is the nature of the kind of error involved in converting instances of type 3 into type 2 that needs further consideration. We wonder whether current developments such as homotopy type theory are embracing not just exact patterns but also heuristics for when a search is overwhelmingly likely to succeed—or to fail.
This still leaves our original question of shapes of computations. In the past both Dick and I have undertaken some exploration of conditions under which computations might be similarly “self-improved.” That idea will have to be for another time.
Can we assign particular “shapes” of computations canonically to specific computational problems? Can this help guide concrete attacks, or is it no more than tantamount to solving the big open questions to begin with?
Again we congratulate Juris and Richard on this 50th anniversary of their achievement. We also tip our hat to a comment by John Sidles in our "Case Against Cases" post which partly prompted this further essay. Much more can be said about "shapes of proofs", including the "tree-like" and "DAG-like" proofs investigated by Stephen Cook and Robert Reckhow in the 1970s.
[word changes to second paragraph of “short-cutting” section, ending, and caption, added note with Cook-Reckhow]
Benjamin Rossman, Rocco Servedio, and Li-Yang Tan have made a breakthrough in proving lower bounds on constant-depth circuits. It came from a bi-coastal collaboration of Rossman visiting the Berkeley Simons Institute from Japan and Tan visiting from Berkeley to Servedio at Columbia University in New York. Their new paper solves several 20- and 30-year-old open problems.
Today we congratulate them on their achievement and describe part of how their new result works.
What exactly did they prove? As some have already remarked, how one expresses this says something about communications in our field. Exactly what they proved is:
Theorem 1 For some explicit constant $c > 0$ and all depths $d$ up to $c\,\log n/\log\log n$, there is an $n$-ary monotone Boolean function $f_d$ that has a simple linear-size circuit (indeed, a formula) of depth $d$ with unbounded fan-in AND and OR gates, but such that every circuit of depth $d$ with the opposite kind of gate at its output as $f_d$, or with bottom fan-in at most $n^{o(1/d)}$ at the inputs, either has size above $2^{n^{\Omega(1/d)}}$ or else agrees with $f_d$ on at most a $\frac{1}{2} + n^{-\Omega(1/d)}$ proportion of the inputs.
That is quite a mouthful. We can at least simplify it—as they do up front in their paper—by noting that every circuit of depth $d-1$ trivially obeys both of the stated restrictions on circuits of depth $d$:
Theorem 2 Every Boolean circuit of depth $d-1$ and size at most $2^{n^{\Omega(1/d)}}$ gets at least a $\frac{1}{2} - n^{-\Omega(1/d)}$ fraction of the inputs wrong with regard to computing $f_d$.
Johan Håstad's famous 1986 PhD thesis had proved a similar lower bound only in the worst case. If this still seems hard to parse, however, here is a consequence they proved. The total influence of a Boolean function $f$ on $\{0,1\}^n$ is the sum from $i = 1$ to $n$ of the proportion of $x$ for which flipping the $i$-th bit of $x$ changes the value $f(x)$.
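For small $n$ the total influence just defined can be checked by brute force over all inputs and bits. A quick sketch (the example function is ours for illustration):

```python
from itertools import product

def total_influence(f, n):
    """Sum over bits i of Pr_x[f(x) != f(x with bit i flipped)],
    with x uniform over {0,1}^n."""
    total = 0.0
    for i in range(n):
        flips = sum(1 for x in product((0, 1), repeat=n)
                    if f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]))
        total += flips / 2 ** n
    return total

maj3 = lambda x: int(x[0] + x[1] + x[2] >= 2)   # majority of three bits
print(total_influence(maj3, 3))   # 1.5: each bit matters on half the inputs
```

For contrast, the 3-bit parity function has total influence 3, the maximum possible for $n = 3$.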
Theorem 3 For some non-constant depths $d = d(n)$ and size functions $S(n)$ greater than quasi-polynomial in $n$, there are monotone Boolean functions whose total influence is only polylogarithmic in $n$, but that still cannot be approximated better than a $\frac{1}{2} + o(1)$ fraction by circuits of depth $d$ and size at most $S(n)$.
This gives a big “No” to a question in two posts by our friend Gil Kalai in 2010 on his blog and in 2012 on StackExchange. It rules out any kind of strong converse to a famous 1993 theorem of Nathan Linial, Yishay Mansour, and Noam Nisan, later improved by Ravi Boppana, showing that small constant-depth circuits compute functions of low sensitivity. This theorem has many applications, so the big bound against a converse is significant news, but perhaps the above statement still does not come trippingly off the tongue. Well, here’s something else they proved:
Theorem 4 Relative to a random oracle $A$, the polynomial hierarchy is infinite.
Now that's a nice short statement that can grab people—at least people like us who did complexity in the 1980s and before. We know this as an open problem going back even before Charlie Bennett and John Gill proved that $\mathsf{P}^A \neq \mathsf{NP}^A$ for a random $A$ in 1981.
However, there is not much surprise and not much mileage in that statement. It was believed even before Ron Book observed in 1994 that its negation collapses the hierarchy without an oracle: Given a relativizable class $\mathcal{C}$, define almost-$\mathcal{C}$ to be the set of languages $L$ such that the measure of $A$ giving $L \notin \mathcal{C}^A$ is zero. The measure is properly on the space of infinite 0-1 sequences, but it is OK to think of the usual Lebesgue measure on $[0,1]$ where e.g.
$0.0110101000101\ldots$
denotes the set of primes, ignoring the clash between finite sets like $\{1\}$ and their co-finite counterparts like $\{2,3,4,\ldots\}$ that map to the same dyadic rational number.
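The identification of oracle sets with points of $[0,1]$ is easy to sketch in code; the clash just mentioned is exactly a dyadic rational having two binary expansions. The sets below are our illustrations:

```python
def as_real(member, bits=60):
    """Map a set of positive integers (given by a membership test) to a
    point of [0,1] via its characteristic sequence as a binary expansion."""
    return sum(2.0 ** -i for i in range(1, bits + 1) if member(i))

finite = lambda i: i == 1        # the set {1}: expansion 0.1000...
cofinite = lambda i: i >= 2      # the set {2,3,4,...}: expansion 0.0111...
print(as_real(finite), as_real(cofinite))   # both are (essentially) 1/2
```

The two different sets land on the same real number, which is why the identification must ignore this measure-zero set of clashes.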
For classes $\mathcal{C}$ enjoying certain closure properties, almost-$\mathcal{C}$ equals $\mathsf{BP}\cdot\mathcal{C}$. The dot means we have $\mathsf{BPP}$-style machines each of whose random branches ends with one query to the given language in $\mathcal{C}$, whose answer becomes the result of that branch. For instance, $\mathsf{BP}\cdot\mathsf{NP}$ equals the Arthur-Merlin class $\mathsf{AM}$. Now if $\Sigma_{k+1}^p \not\subseteq \mathsf{BP}\cdot\Sigma_k^p$ for every $k$, then by standard 0-1 laws the measure of $A$ putting $\Sigma_{k+1}^{p,A} \subseteq \Sigma_k^{p,A}$ is zero. Then by the fact that a countable union of sets of measure zero has measure zero, the hierarchy is infinite for a random oracle. Hence its collapse for a random oracle implies $\Sigma_{k+1}^p \subseteq \mathsf{BP}\cdot\Sigma_k^p$ for some $k$. This in turn would collapse the hierarchy to $\mathsf{BP}\cdot\Sigma_k^p$ without an oracle, much as $\mathsf{NP} \subseteq \mathsf{BPP}$ collapses it to $\mathsf{BPP}$.
The point in $\mathsf{BP}\cdot\mathcal{C}$ is that random oracles furnish random bits for the computations. The random oracles could be richer in that exponentially many poly-length computations could use exponentially many random oracle bits in toto. But the aforementioned closure properties together with probability amplification bridge the difference to machines using polynomially many independent bits when $\mathcal{C}$ is a hierarchy level.
The point in the new lower bound is that the random oracle bits connect instead to distributions over the input space in an average-case argument. This connection is expressed well in their paper.
Sometimes a new idea comes from a new mathematical object, but other times it comes from a new way of applying and controlling a known object. Leslie Valiant in the late 1970s introduced the kind of projection that can do the following to each variable $x_i$ in a Boolean or numerical formula:

- replace $x_i$ by the constant $0$;
- replace $x_i$ by the constant $1$;
- replace $x_i$ by a variable $y_j$ (in the Boolean case, possibly a negated literal $\neg y_j$).
An equivalent rule to the last is that you can rename two variables $x_i$ and $x_j$ to be the same variable. The substitution applies simultaneously to every occurrence of a variable. By undoing this rule one can convert every formula $f$ with $m$ occurrences of variables into a formula $F$ with $m$ different variables each occurring once. This $F$ is called a read-once formula and is unique up to permuting or renaming variables, so every $f$ is a projection of a read-once formula. The formulas targeted in their proof are already read-once, so the game becomes how to analyze the formulas that arise as projections of $F$.
When only the first two rules are applied, the result is a restriction of $f$. Håstad's 1986 proof technique analyzed random restrictions obtained by independently leaving each variable alone with some probability $p$ and then assigning a random 0-or-1 value to each variable not left alone. Restrictions applied to a read-once formula not only preserve its structure but also preserve read-onceness, which keeps the behavior of different parts of the formula independent. This independence is sacrificed by using projections, which could "entangle" different parts of the formula and thereby worsen bias.
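A random restriction in this sense takes only a few lines to sketch. The representation below, a dict sending each variable to $0$, $1$, or "alive," and the parameter choices are ours for illustration:

```python
import random

def random_restriction(n, p, rng=random):
    """Leave each of n variables alive ('*') with probability p; otherwise
    assign it a uniform random 0/1 value, as in the switching-lemma setup."""
    return {i: ('*' if rng.random() < p else rng.randint(0, 1))
            for i in range(n)}

def restrict_or(clause, rho):
    """Apply a restriction to an OR of positive literals: it collapses to 1
    if any variable got the value 1, else shrinks to the surviving variables
    (or to 0 if none survive)."""
    if any(rho[v] == 1 for v in clause):
        return 1
    alive = [v for v in clause if rho[v] == '*']
    return alive if alive else 0

rho = random_restriction(8, 0.3, random.Random(2015))
print(restrict_or([0, 1, 2, 3], rho))
```

The switching-lemma phenomenon is that, with high probability, applying such a restriction lets a small OR-of-ANDs be rewritten as a small AND-of-ORs, shaving a level of depth off a circuit.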
The proof technique both then and now works by reductio ad absurdum on the depth $d$. Here $f_d$ is always a monotone alternating AND-OR formula of a kind introduced by Mike Sipser, with variables at the inputs, so whether the output gate is an AND or an OR depends on the parity of $d$. It is "tuned" by giving the fan-in $w_i$ at each level $i$, $1 \le i \le d$, so that $n = w_1 w_2 \cdots w_d$. Håstad had shown that with the right combination of fan-ins and restriction probability $p$, a random restriction makes $f_d$ equivalent to a formula with a large enough expected number of variables that embeds the relevant structure of $f_{d-1}$. A circuit of small enough size and depth $d-1$, however, with high probability gets beaten down by the restrictions into a simpler form on a pace that cannot be sustained for keeping up with the sequence $f_{d-1}, f_{d-2}, \ldots$
The reductio can then continue with $f_{d-2}$, until an obvious contradiction is reached on going from $f_2$ to $f_1$ with the circuit that results. One can with more care actually structure the proof as a standard induction going from lower bounds for $f_{d-1}$ to the desired ones for $f_d$, but the mechanism is the same either way.
For an average-case argument one also needs to preserve the balance between arguments $x$ giving $f(x) = 1$ and those giving $f(x) = 0$. Otherwise the trivial circuit that always says "no" or its partner that always says "yes" already has more advantage than the desired conclusion allows. This is where one might first think of the third projection rule as counter-productive. By identifying $x_i$ and $x_j$, any "bias" in one part of the formula might be propagated to more parts of the formula. However, this also intuitively creates greater sensitivity in terms of influence as defined above. If $f$ is sensitive at bit $i$ of $x$, then letting $x'$ be $x$ with bit $i$ flipped, we have that the values $f(x)$ and $f(x')$ balance each other.
Rossman, Servedio, and Tan craft the argument, using a slightly different "tuning" of $f_d$ from Håstad's, so that the benefits carry the day: $f_d$ retains its structure and balance and hardness but the prospective smaller-depth circuit "collapses." As they note, random projections had been used in a 2001 paper by Russell Impagliazzo and Nathan Segerlind for a proof complexity lower bound, but apparently had not been applied so bluntly to circuits.
Their projections first determine blocks $B_j$ of variables that get mapped to the same new variable $y_j$, for different $j$. Then they can be specified by a distribution over restrictions, thus simplifying the randomness analysis. If the circuit has a gate that accesses some $x_i \in B_j$ and the negation of some $x_{i'} \in B_j$, then after the projection the gate accesses both $y_j$ and $\neg y_j$ and hence can be collapsed to $0$ (if the gate is AND) or $1$ (if OR). Another difference from Håstad's technique is that they need to adjust the projection probability adaptively depending on outcomes at previous depths. This is the point where we say to refer to their 59-page paper for details, but they also have excellent further explanations in sections 4 and 7.
Dick and I are excited because we have thought about inductive lower bound arguments whose base cases involve read-once formulas and whose inductive steps are projections. In a post with Dick three years ago I described one case that may retain interest even though the statement it was trying to prove turned out to be false.
My planned attack began with a "niceness" theorem for functions having a funny kind of read-once arithmetical formula that has powering as a primitive operation—for instance, $x^3$ counts as one, not three, occurrences of $x$, in contrast to $x \cdot x \cdot x$—but does not have an additive constant in any subterm. I proved that the first partial derivatives form a Gröbner basis under any admissible monomial ordering. If one allows additive constants, then the partials still bound such a Gröbner basis. The argument is less simple than one might expect and does not obviously extend to higher derivatives—at least not obviously to me, as I've sat on revisions of a paper for over a decade while trying to prove that. But the theorem is still good enough to use as a base case and ask:
How fast can various complexity measures associated to Gröbner bases grow under successive applications of (random) projections?
My computer runs back then were sobering: even after just a few projections in some targeted cases the answer was, mighty fast. And as I related in the post, the desired assertion about a certain “monomial complexity” measure was refuted. But there are other formulations and conjectures and measures to try, and this new result in the Boolean case may give encouragement to try them. There is also the difference that in my case the read-once formula was “easy” and was derived from a given function we are trying to prove “hard” by undoing projections, whereas in the new case the functions are read-once but are the ones we are showing hard for lower-depth circuits. So perhaps this all runs in the opposite direction—but still the prospect of new ways to control projections is pretty cool.
What new results may come out of this breakthrough lower bound technique?
Problems beyond brute force search
Cropped from Wikipedia source
Hans-Joachim Bremermann was a mathematician and biophysicist. He is famous for a limit on computation, Bremermann’s limit, which is the maximum computational speed of a self-contained system in the material universe.
Today Ken and I wish to talk about the limit and why it is not a limit.
A transcomputational problem is a problem that requires processing of more than $10^{93}$ bits of information. The number comes from Earth-scale considerations, but adding less than 30 to the exponent breaks the scale of the known universe. Our friends at Wikipedia say:
This number is, according to Bremermann, the total number of bits processed by a hypothetical computer the size of the Earth within a time period equal to the estimated age of the Earth. The term transcomputational was coined by Bremermann.
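The arithmetic behind the number can be reproduced to within an order of magnitude. The rate constant below is Bremermann's; the Earth figures are round modern values that we supply, so this is a ballpark sketch rather than his exact calculation:

```python
import math

# Bremermann's rate: about 1.36e47 bits per second per gram of mass.
RATE = 1.36e47                            # bits / (second * gram)
EARTH_MASS = 5.97e27                      # grams (round modern value)
EARTH_AGE = 4.5e9 * 365.25 * 24 * 3600    # seconds, roughly 1.4e17

total_bits = RATE * EARTH_MASS * EARTH_AGE
print(f"about 10^{math.log10(total_bits):.0f} bits")
```

With these inputs the product comes out near $10^{92}$, within an order of magnitude of the quoted $10^{93}$.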
What is interesting is that he thought about "transcomputational" problems in 1962. Yes, almost a decade before the P=NP problem was stated. See his paper for details.
He noted back then that certain problems were beyond any reasonable brute-force search. In his own words:
The experiences of various groups who work on problem solving, theorem proving and pattern recognition all seem to point in the same direction: These problems are tough. There does not seem to be a royal road or a simple method which at one stroke will solve all our problems. My discussion of ultimate limitations on the speed and amount of data processing may be summarized like this: Problems involving vast numbers of possibilities will not be solved by sheer data processing quantity. We must look for quality, for refinements, for tricks, for every ingenuity that we can think of. Computers faster than those of today will be a great help. We will need them. However, when we are concerned with problems in principle, present day computers are about as fast as they ever will be. We may expect that the technology of data processing will proceed step by step—just as ordinary technology has done. There is an unlimited challenge for ingenuity applied to specific problems. There is also an unending need for general notions and theories to organize the myriad details.
Quite insightful for a paper that dates a decade before Cook-Karp-Levin on P=NP. It also predates the limits associated to Jacob Bekenstein’s bound on the information capacity of finite space and/or to Rolf Landauer’s principle.
One wonders what might have happened if Bremermann’s paper had been better known in our theory community. Ken notes that the Russian theory community in the 1960s highlighted the question of perebor—brute-force search. But he senses the emphasis was on problems for which it could be necessary in the abstract, rather than tied to Earth-scale considerations like Bremermann’s.
Of course there are several eventualities that were missed. One is quantum computation—I believe all his calculations depend on a classical view of computation. There are several other points that we can raise to attempt to beat his "limit."
Change the algorithms: Of course his limit could be applied to computing primality, for example. The brute force method is hopeless for even modest-sized numbers. Yet we know methods that are much better than brute force and so we can easily beat his limit.
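For instance, trial division on an $n$-digit number inspects on the order of $10^{n/2}$ candidates, while the Miller-Rabin test needs only a handful of modular exponentiations. A sketch, with a standard fixed witness set whose known deterministic range is noted in the comment:

```python
def is_prime(n):
    """Miller-Rabin with a fixed witness set; this particular set is known
    to be deterministic for all n below about 3.3 * 10^24."""
    witnesses = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in witnesses:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:          # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    for a in witnesses:
        x = pow(a, d, n)       # one modular exponentiation per witness
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False       # witness a certifies compositeness
    return True

print(is_prime(2**61 - 1))     # True: a Mersenne prime, checked instantly
```

Brute-force trial division on a number of this size would need around $10^9$ divisions; numbers with hundreds of digits, routine for such tests, are utterly beyond it.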
Steve Wozniak visited Buffalo yesterday as a UB Distinguished Speaker. In a small group attended by Ken, he told his standard anecdote about the first program he ever wrote. This was to solve the Knight's Tour problem on an 8×8 chessboard. He first coded a brute-force solution trying all Knight moves at each step but realized before he hit "run" that it would take an astronomical number of years. This awakened him, he said, to the fact that good algorithms have to go hand-in-hand with good hardware.
Change the answers: Another method is to change what we consider as an answer. Approximation algorithms of course are one important example: allow the answer to be near the optimal one. This has opened the floodgates to increase the class of problems that we can solve.
Change the problems: Another method is to change the problems that we attack. In many cases we can avoid general problems and exploit special structure of a problem. Examples that come to mind include: replace dense matrices by sparse ones; replace arbitrary graphs by planar ones or those with restricted minors; and replace data analysis of arbitrary data sets by analysis of data that is generated with specific noise, like Gaussian.
Change the world: We have posted about the idea of a world without true randomness, presenting Leonid Levin’s proof that SAT is nearly polynomially solvable in such a world. That post offered the weaker idea that every efficient generator of SAT instances might be solved by Levin’s algorithm on all but finitely many instances. The finite bound might be huge, but the fact of Levin’s algorithm would carry weight: it would solve everything else based solely on the principle that “nothing succeeds like success.” We can put this idea in concrete terms like Bremermann’s:
Could we live in a world where the act of creating an instance that requires processing more than $10^{93}$ bits of information itself requires processing more than $10^{93}$ bits of information?
We note that Larry Stockmeyer proved that every Boolean circuit capable of deciding a certain class of logical formulas that fit into 8 lines of 80-column ASCII text must have more gates than atoms in the observable universe. But this does not rule out a little algorithm solving every such formula that we could generate—unless we spend the time to cycle through every legal formula that fits into 640 characters.
Are there realistic limits on computation of the type that Bremermann postulated? What are the right limits in light of today’s insights into computation?