Albert Meyer knows circuit lower bounds. He co-authored a paper with the late Larry Stockmeyer that proves that small instances of the decision problem of a certain weak second-order logical theory require Boolean circuits with more gates than there are atoms in the observable universe. The instances almost fit into two Tweets using just the Roman typewriter keys.
Today Ken and I discuss a simple but perhaps overlooked connection between P=NP and circuit lower bounds.
Albert recently co-authored, with Eric Lehman of Google and his MIT colleague Tom Leighton, the textbook Mathematics for Computer Science. It looks familiar to us because it uses the same MIT Press fonts and layout package as our quantum computing book. They say the following in their Introduction:
Simply put, a proof is a method of establishing truth. Like beauty, “truth” sometimes depends on the eye of the beholder, and it should not be surprising that what constitutes a proof differs among fields.
We would say that the game is not only about making truth evident from a proof but also about the way a theorem statement is expressed. This post uses an old result of Albert’s as an example.
Well, any criticism of Albert for how his theorem was stated is really criticism of myself, because Dick Karp and I were the first to state it in a paper. Here is exactly how we wrote it in our STOC 1980 version:
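For reference, here is the statement in the form it is usually cited today—a paraphrase, not necessarily our exact 1980 wording:

$$\mathsf{EXP} \subseteq \mathsf{P/poly} \;\Longrightarrow\; \mathsf{EXP} = \Sigma_2^p.$$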
There was no paper by Albert to cite. In our 1982 final version we included a proof of his theorem:
As you can see, our proof—Albert’s proof?—used completeness for and some constructions earlier in the paper. In a preliminary section we wrote that our proofs about classes such as involved showing inclusions
“where the set of strings is complete in with respect to an appropriate reducibility.”
But in this case the proof does not need completeness for . I came up with this realization on Wednesday and Ken found essentially the same proof in these lecture notes by Kristoffer Hansen:
This proof uses not only for but also for . For the latter it suffices to have polynomial-time reduce to . This follows so long as is complete for some class to which also belongs.
Suppose runs in deterministic time . Then both and belong to , the latter because on input we can run for steps. With minimal assumptions on the function , has complete sets , and then follows from . So we can state the theorem more generally:
In fact we would get into and even smaller classes but that’s going beyond our simple point.
Our point comes through if we think of a concrete case like deterministic time , called for quasipolynomial time. So we have:
Corollary 2 If then .
Now mentally substitute for (and `‘ for `‘) in the way Karp and I summarized the final implication in our paper:
What you get after contraposing and using the hierarchy theorem for is:
Corollary 3 If then .
The point is that we can also do this for time and even smaller proper super-classes of . What follows is:
Any attempt to prove P=NP entails proving strong nonuniform circuit lower bounds on languages that are arbitrarily close to being in P.
Again in the case of this implication too has been variously noted. Scott Aaronson mentions it in one sentence of his great recent 121-page survey on the P versus NP question (p. 65):
“[I]f someone proved , that wouldn’t be a total disaster for lower bounds research: at least it would immediately imply (via ).”
Maybe I (Dick) considered this in terms of in weighing my thoughts about . But that it applies to in place of gives me pause. This greatly amplifies idle thoughts about the irony of how proving yields the same type of lower bounds against that are involved in the “Natural Proofs” barrier against proving . Ryan Williams had to combine many ideas just to separate from nonuniform —not even getting on the left nor on the right. (Incidentally, we note this nice recent MIT profile of Ryan.) So having such lower bounds for just drop from the sky upon seems jarring.
So I’m rethinking my angle on . I’ve always propounded that good lower bounds flow like ripples from new upper bounds, but the wake of seems a tsunami. We wonder if Bill Gasarch will do a 3rd edition of his famous poll about P versus NP. Ken and I offset each other with our votes last time, but maybe not this time.
We also wonder whether Theorem 1 can be given even stronger statements in ways that are useful. In the original version of this post we overlooked a point noted first by Ryan Williams here and thought we had . To patch it, call a language in “reflective” if there is a TM running in exponential time such that and (namely, the “tableau” language defined above) polynomial-time reduces to . The complete sets mentioned above for classes within are reflective. If we let denote the subclass of reflective languages, then we can say:
Note that per Lance Fortnow’s comment here, sparse languages are candidates for being non-reflective: the tableau language which we would wish to polynomial-time Turing reduce to is generally dense.
Is this realization about P=NP and strong circuit lower bounds arbitrarily close to P really new? Can our readers point us to other discussions of it?
Is the notion of “reflective” known? useful?
[fixed error in original Theorem 1 and surrounding text; added paragraph about it before “Open Problems”; moved query about “cosmological” formulas to a comment.]
Kurt Gödel is feeling bored. Not quite in our English sense of “bored”: German has a word Weltschmerz meaning “world-weariness.” In Kurt’s case it’s Überweltschmerz. We have tried for over a month to get him to do another interview like several times before, but he keeps saying there’s nothing new to talk about.
Today we want to ask all of you—or at least those of you into logic and complexity—whether we can find something to pep Kurt up. Incidentally, we never call him Kurt.
Gödel is of course famous, among other things, for proving that Peano Arithmetic (PA) cannot prove its own consistency—unless PA is already inconsistent. In the latter case it would be able to prove everything, and this would imply that PA is useless.
This time we want to talk about whether PA might be proved consistent in weaker senses. The senses would escape the result of Gödel, which is usually called his Second Incompleteness Theorem. A hope is that they can be connected to open questions in complexity theory.
Recall that PA is the first order theory of arithmetic with induction. Actually all we say today could be generalized to many other theories, but to help us focus let’s discuss only PA.
The meaning of consistency for PA can be encoded as follows: Let
mean that encodes a proof in PA of the statement . Formally this means that consistency of PA is
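A sketch of the standard arithmetization—the precise formula depends on the chosen encoding of proofs:

$$\mathrm{Con}(\mathrm{PA}) \;\equiv\; \neg\,\exists p\;\; \mathrm{Prf}_{\mathrm{PA}}\big(p,\ \ulcorner 0 = 1 \urcorner\big).$$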
Most mathematicians believe that PA is consistent. The best argument for consistency is probably that the axioms of PA all seem “obvious.” That is they seem to conform to our intuition about arithmetic. Even the powerful induction schema—which pumps out one axiom for each applicable formula—says something that seems clear:
Mathematical induction proves that we can climb as high as we like on a ladder, by proving that we can climb onto the bottom rung (the basis) and that from each rung we can climb up to the next one (the induction).
This is from page 3 of the book Concrete Mathematics.
Yet not everyone believes that PA is consistent, and it follows that not everyone believes that PA is robustly useful. We recently covered the late Vladimir Voevodsky’s doubts. Ed Nelson in 2015 wrote a freely-available book titled simply Elements to argue that PA is inconsistent. This work is flawed, but it is interesting that a world-class mathematician was seriously interested in showing something that few working mathematicians believe. The work ends with an afterword by Sam Buss and Terry Tao, in which they say:
We of course believe that Peano arithmetic is consistent; thus we do not expect that Nelson’s project can be completed according to his plans. Nonetheless, there is much new in his papers that is of potential mathematical, philosophical and computational interest. For this reason, they are being posted to the arXiv. Two aspects of these papers seem particularly useful. The first aspect is the novel use of the “surprise examination” and Kolmogorov complexity; there is some possibility that similar techniques might lead to new separation results for fragments of arithmetic. The second aspect is Nelson’s automatic proof-checking via TeX and qea. This is highly interesting and provides a novel method of integrating human-readable proofs with computer verification of proofs.
Nelson’s criticism of PA is well summarized in this talk by Buss, while it was Tao who articulated the flaw in Nelson’s particular Kolmogorov complexity argument for inconsistency, which we also covered here.
Our idea is to examine consistency from a computer science viewpoint and use this to frame a weaker notion: a notion that can be proved and avoids the Gödel limit. The idea is the following:
Can we prove that PA is consistent at least for any proof that we are likely to ever see?
We can make this precise via the following simple notion, :
Note that we mean the length of the proof in symbols, not steps; we could alternately treat as a number and state the bounded quantifier as .
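In symbols—again a sketch, with the details of the arithmetization elided—the bounded statement reads:

$$\mathrm{Con}_n(\mathrm{PA}) \;\equiv\; \neg\,\exists p\;\big(\, |p| \le n \ \wedge\ \mathrm{Prf}_{\mathrm{PA}}(p,\ \ulcorner 0 = 1 \urcorner)\,\big),$$

or, treating the proof $p$ as a number, with the existential quantifier bounded by something like $p < 2^{n}$.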
Clearly for any we can determine whether is true or not. Of course as grows the cost of checking that all proofs of length at most in binary explodes. Yet there is no Gödel limit on this checking. We want to understand the following question:
Can we check that is true for large ?
Let’s take a look at this next.
Given a Boolean string of length we can check that it encodes a correct proof from PA in time nearly linear in . We need only check that its steps are either instances of an axiom or follow from the usual rules of inference of first-order logic. For the rest we assume that this checking can actually be done in linear time: we are throwing out log factors, which just helps us avoid technically more complex statements. This assumption will not change anything in an important way.
This assumption shows that can be checked in time for some constant . This means that we cannot hope to check this for even modest size ‘s. But can we check it much faster? If we could check, for example, that is true, then we would know that all proofs of at most 1,000,000 bits do not lead to contradictions. This covers all proofs that most of us will ever write down or even read. Note that the following is true:
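To make the exhaustive check concrete, here is a sketch in code. The checker itself is only a placeholder—writing a real PA proof verifier is routine but lengthy—so the point is just the structure of the search and its roughly 2^n cost:

```python
# Sketch of brute-force checking of Con_n(PA): enumerate every binary string of length
# at most n and verify that none encodes a PA proof ending in "0 = 1".
# `is_valid_pa_proof` and `conclusion` are placeholders for a real (near-linear-time) checker.

def is_valid_pa_proof(candidate: str) -> bool:
    """Placeholder: a real checker verifies each step is an axiom or follows by a rule."""
    return False

def conclusion(candidate: str) -> str:
    """Placeholder: a real checker would return the final formula of the encoded proof."""
    return ""

def con_up_to(n: int) -> bool:
    """Return True if no PA proof of at most n bits derives a contradiction."""
    for length in range(1, n + 1):
        for code in range(2 ** length):          # about 2^(n+1) candidates in all
            candidate = format(code, f"0{length}b")
            if is_valid_pa_proof(candidate) and conclusion(candidate) == "0 = 1":
                return False                     # a short contradiction was found
    return True

print(con_up_to(12))   # fine for tiny n; hopeless for n = 1,000,000
```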
Theorem 1 PA can prove for any fixed .
Define as the length of the shortest proof in PA of . Clearly, is well defined—whether PA is consistent or not. A question is how slow can grow? Or looking at it another way, can there be short proofs that is true? We can also ask this when for some function such as .
Note that one can prove theorems like this:
Theorem 2 If then we can prove in time polynomial in .
This would allow us, at least in principle, to check huge length proofs. There are other complexity open problems that would allow us to even check much larger proof lengths.
We want, however, to be very concrete. We really want to know about more than what happens for large . Incidentally, Gödel used the same number 1,000,000 as a factor in his brief and cryptic paper (see this review) titled “On the Lengths of Proofs.” What he implicitly did there was apply the same method as in his incompleteness theorems to create formulas expressing,
There does not exist a PA proof of of length at most .
He meant length to be the number of lines, but Wikipedia’s article on it speaks of symbol length. Interpreted either way, the truth of the formulas is obvious: they don’t have proofs in PA of length . They have proofs of length by exhaustive enumeration as above. If then they have shorter proofs asymptotically—indeed this was the context of Gödel’s insight about in his famous 1956 letter to John von Neumann. But again we want to be concrete.
Note that in a “meta” sense we already proved . We know how Gödel’s construction works in general from reading only a finite piece of his proof. Then as we said its truth is obvious. We used symbols to write down but we can ignore that. So this is really a finite proof, certainly well under 1,000,000 symbols.
This is exactly what Gödel meant about the length-saving effect. The rub however is that our “meta” proof is assuming the consistency of PA—that is, the shorter proof is in the stronger theory . Gödel asserted (without proof) that the same effect can always be had by progressing to the next higher order of logic.
But going back to our sentences , clearly assuming is silly and we want to stick with first-order logic. We can consider changing the rules of PA but not that way. So our query becomes:
Is there a formal logic , that avoids the criticisms of PA referenced above and does not obviously entail the consistency of PA, such that can economically prove ?
We can alternatively talk about programs that construct proofs and shift the question to the complexity—concrete or asymptotic—of verifying that is correct. This could lead into questions about the (resource-bounded) Kolmogorov complexity of proofs. Assuming really is true, the simple enumeration argument describes a proof with Kolmogorov complexity well under 1,000,000 symbols—but more than 1,000,000 steps would be needed to expand it. The verification angle, however, may even apply in cases of “insanely long proofs.” Other formal or “fromal” approaches might be considered.
There was a flurry of work in this area for a decade-plus after Sam Buss’s innovative work in the 1980s connecting complexity questions to proofs in bounded arithmetics—that is with restrictions on the PA induction axioms or the underlying logic. We talked about his work here and here. For connections to lengths of proofs Sam himself has written two nice surveys and here is another by Pavel Pudlák. The connections extend to the Natural Proofs barrier against circuit lower bounds.
But as with many approaches to core questions in complexity, progress seems to have slowed. Perhaps it is because we haven’t been following as closely. So we are asking for news and opinions on what is important. And this is also why we are talking here about changing the questions and the rules of the game.
We have turned up some recent work on questions like ours that changes the rules. Martin Fischer wrote a 2014 paper titled “Truth and Speed-Up” and another titled “The Expressive Power of Truth.” The object of the latter is to find “natural truth theories which satisfy both the demand of semantical conservativeness and the demand of adequately extending the expressive power of our language.” The former shows that other theories including one called “,” while syntactically but not semantically conservative, give speedups for proving .
We are not sure how to assess these results. The theory stands for “Positive Truth with internal induction for total formulas” and is studied further in this 2017 paper. The emphasis seems however to be philosophical, relating to the effect of allowing truth to be a predicate. We don’t know how to connect these ideas to complexity questions but they do show scope for further action.
There are several open problems. Can we show that or similar statements are equivalent to results about how far out we can check consistency? That is, if we could check that PA has no short contradictions, does this imply anything about complexity theory?
Another class of questions is: Is it useful to know that is true? Does the fact that there is no short contradiction help make one believe in PA as a useful tool? I am not sure what to make of this. What do you think? What would Gödel think?
To give a Hilldale Lecture and learn about fairness and dichotomies
UB CSE50 anniversary source
Jin-Yi Cai was kind enough to help get me, Dick, invited last month to give the Hilldale Lecture in the Physical Sciences for 2017-2018. These lectures are held at The University of Wisconsin-Madison and are supported by the Hilldale Foundation. The lectures started in 1973-1974, which is about the time I started at Yale University—my first faculty appointment.
Today Ken and I wish to talk about my recent visit, discuss new ideas of algorithmic fairness, and then appreciate something about Jin-Yi’s work on “dichotomies” between polynomial time and -completeness.
The Hilldale Lectures have four tracks: Arts & Humanities and Physical, Biological, and Social Sciences. Not all have a speaker each year. The Arts & Humanities speaker was Yves Citton of the University of Paris last April, and in Social Sciences, Peter Bearman of Columbia spoke on Sept. 28, three weeks before I did. Last year’s speaker in the Physical Sciences track was Frank Shu of Berkeley and UCSD on the economics of climate change. Before him came my former colleagues Peter Sarnak and Bill Cook. Dick Karp was invited in 2004, and mathematicians Barry Mazur and Persi Diaconis came between him and Cook.
I am delighted and honored to be in all this company. We all may not seem to have much to do with the physical sciences but let’s see. What I spoke on is a subject for another time—let this post be about my hosts.
The other highlight of my visit was meeting with the faculty of the CS department and also with the graduate students. I hope they enjoyed our discussions as much as I did.
One topic that came up multiple times is the notion of fair algorithms. This is a relatively new notion and is being studied by several researchers at Wisconsin. The area has its own blog. The authors of that blog (one I know well uses his other blog’s name as his name there—are we to become blogs?) also wrote a paper titled, “On the (Im)Possibility of Fairness,” whose abstract we quote:
What does it mean for an algorithm to be fair? Different papers use different notions of algorithmic fairness, and although these appear internally consistent, they also seem mutually incompatible. We present a mathematical setting in which the distinctions in previous papers can be made formal. In addition to characterizing the spaces of inputs (the “observed” space) and outputs (the “decision” space), we introduce the notion of a construct space: a space that captures unobservable, but meaningful variables for the prediction. We show that in order to prove desirable properties of the entire decision-making process, different mechanisms for fairness require different assumptions about the nature of the mapping from construct space to decision space. The results in this paper imply that future treatments of algorithmic fairness should more explicitly state assumptions about the relationship between constructs and observations.
There is a notion of fairness in distributed algorithms but this is different. The former is about the allocation of system resources so that all tasks receive due processing attention. The latter has to do with due process in social decision making where algorithmic models have taken the lead. Titles of academic papers cited in a paper by three of the people I met in Madison and someone from Microsoft (see also their latest from OOPSLA 2017) speak to why the subject has arisen:
Also among the paper’s 29 references are newspaper and magazine articles whose titles state the issues with less academic reserve: “Websites vary prices, deals based on users’ information”; “Who do you blame when an algorithm gets you fired?”; “Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks”; “The dangers of letting algorithms enforce policy.” Even a May 2014 statement by the Obama White House is cited.
Yet also among the references are papers familiar in theory: “Satisfiability modulo theories”; “Complexity of polytope volume computation” (by Leonid Khachiyan no less); “On the complexity of computing the volume of a polyhedron”; “Hyperproperties” (by Michael Clarkson and Fred Schneider); “On probabilistic inference by weighted model counting.” What’s going on?
What’s going on can be classed as a meta-example of the subject’s own purpose:
How does one formalize a bias-combating concept such as fairness without instilling the very kind of bias one is trying to combat?
We all can see the direction of bias in the above references. You might think that framing concepts to apply bias in the other direction might be OK but there’s a difference. Bias in a measuring apparatus is more ingrained than bias in results. What we want to do—as scientists—is to formulate criteria that are framed in terms apart from those of the applications in a simple, neutral, and natural manner. Then we hope the resulting formal definition distinguishes the outcomes we desire from those we do not and stays robust and consistent in its applications.
This is the debate—at ‘meta’ level—that Ken and I see underlying the two papers we’ve highlighted above. We blogged about Kenneth Arrow’s discovery of the impossibility of formalizing a desirable “fairness” notion for voting systems. The blog guys don’t find such a stark impossibility theorem but they say that to avoid issues with analyzing inputs and outcomes, one has to attend also to some kind of “hidden variables.” The paper by Madison people tries to ground a framework in formal methods for program verification, which it connects to probabilistic inference via polytope volume computations.
Many other ingredients from theory can be involved. The basic idea is determining sensitivity of outcomes to various facets of the inputs. The inputs are weighted for relevance to an objective. Fairness is judged according to how well sensitivity corresponds to relevance and also to how the distribution of subjects receiving favorable decisions breaks according to low-weight factors such as gender. Exceptions may be made by encoding some relations as indelible constraints—the Madison plus Microsoft paper gives as an example that a Catholic priest must be male.
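As one deliberately simple illustration of the kind of quantity such frameworks examine—this is our own toy formalization, not the one in the papers discussed—here is a check of how favorable decisions break across a low-weight attribute such as gender:

```python
# A minimal sketch (not the papers' formalization): measure how favorable decisions
# break across groups defined by a low-weight attribute such as gender.
from collections import defaultdict

def favorable_rate_by_group(records, group_key, decision_key="favorable"):
    """Fraction of favorable decisions within each group."""
    totals, favorable = defaultdict(int), defaultdict(int)
    for r in records:
        g = r[group_key]
        totals[g] += 1
        favorable[g] += 1 if r[decision_key] else 0
    return {g: favorable[g] / totals[g] for g in totals}

def parity_gap(records, group_key):
    """Largest difference in favorable-decision rates between any two groups."""
    rates = favorable_rate_by_group(records, group_key)
    return max(rates.values()) - min(rates.values())

# Made-up example data:
data = [
    {"gender": "F", "favorable": True},  {"gender": "F", "favorable": False},
    {"gender": "M", "favorable": True},  {"gender": "M", "favorable": True},
]
print(parity_gap(data, "gender"))   # 0.5 here; 0.0 would mean the rates match exactly
```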
Thus we see Boolean sensitivity measures, ideas of juntas and dictators, constraint satisfaction problems, optimization over polytopes—lots of things I’ve known and sometimes studied in less-particular contexts. My Madison hosts brought up how Gaussian distributions are robust for this analysis because they have several invariance properties including under rotations of rectangles. This recent Georgetown thesis mixes in even more theory ideas. The meta-question is:
Can all these formal ingredients combine to yield the desired outcomes in ways whose scientific simplicity and naturalness promote confidence in them?
Thus wading in with theory to a vast social area like this strikes us as a trial of “The Formal Method.” Well, there is the Hilldale Social Sciences track…
Did “hidden variables” bring quantum to your mind? We are going there next, with Ken writing now.
We covered Jin-Yi’s work in 2014 and 2012 and 2009. So you could say in 2017 we’re “due” but we’ll take time for more-topical remarks. All of these were on dichotomy theorems. For a wide class of counting problems in his dichotomy is that every problem in either belongs to polynomial time or is -complete. There are no in-between cases—that’s the meaning of dichotomy.
Jin-Yi’s answer last month to a question between Dick and me brought home to us both how wonderfully penetrating and hair-trigger this work is. Dick had added some contributions to the paper covered in the 2009 post and was included on the paper’s final 2010 workshop version. Among its results is the beautiful theorem highlighted at the end of that post:
Theorem 1 There is an algorithm that given any and formula for a quadratic polynomial over computes the exponential sum
exactly in time that is polynomial in both and . Here and means the number of arguments on which takes the value modulo .
That the time is polynomial in not is magic. We can further compute the individual weights by considering also the resulting expressions for . Together with they give us equations in the unknowns in the form of a Vandermonde system, which is always solvable. Solving that takes time polynomial in , though, and we know no faster way of computing any given .
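For concreteness, here is a brute-force baseline for the quantity in Theorem 1, assuming the sum runs over all of (Z_m)^n with the root of unity e^{2πi/m}. The whole point of the theorem is that the answer can be obtained without this exponential enumeration:

```python
# Brute force: the exponential sum and the counts N_j = #{x : f(x) = j mod m} for a
# quadratic polynomial f, summing over all of (Z_m)^n. Takes m^n steps -- what Theorem 1 avoids.
import cmath
from itertools import product

def exponential_sum(f, n, m):
    omega = cmath.exp(2j * cmath.pi / m)
    counts = [0] * m
    total = 0
    for x in product(range(m), repeat=n):
        j = f(x) % m
        counts[j] += 1
        total += omega ** j
    return total, counts

# Example: f(x1, x2) = x1^2 + x1*x2 over Z_3
total, counts = exponential_sum(lambda x: x[0] ** 2 + x[0] * x[1], n=2, m=3)
print(total, counts)
```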
When is fixed, however, polynomial in is all we need to say. So the upshot is that for any fixed modulus, solution counting for polynomials of degree is in . Andrzej Ehrenfeucht and Marek Karpinski proved this modulo primes and also that the solution-counting problem for degree is -complete even for . So the flip from to -complete when steps from to is an older instance of dichotomy. The newer one, however, is for polynomials of the same degree .
I wrote a post five years ago on his joint work with Amlan Chakrabarti for reducing the simulation of quantum circuits to counting solutions in . One motive is to identify which subclasses of quantum circuits might yield tractable cases of counting. The classic—pun intended—case is the theorem that all circuits of so-called Clifford gates can be simulated in classical polynomial time (not even randomized). I observed that such circuits yield polynomials over that are sums of terms of the form
These terms are invariant on replacing by or by modulo . Hence for such there is an exactly -to- correspondence between solutions in and those in . Since counting the latter is in by Theorem 1, the theorem for Clifford gates follows.
Adding any non-Clifford gate makes a set that is universal—i.e., has the full power of . The gates I thought of at the time of my post all bumped the degree up from to or more. A related but different representation by David Bacon, Wim van Dam, and Alexander Russell gives a dichotomy of linear versus higher degree. The controlled-phase gate, however, is non-Clifford but in my scheme produces polynomials as sums of terms of the form
Those are quadratic too, so Theorem 1 counts all the solutions in polynomial time. Does this make ? The hitch is that quantum needs counting binary solutions—and having not defeats the above exact correspondence.
I thought maybe the counting problem for quadratic-and-binary could be intermediate—perhaps at the level of itself. But Jin-Yi came right back with the answer that his dichotomy cuts right there: this 2014 paper with his students Pinyan Lu and Mingji Xia has a general framework for CSPs that drops down to say the problem is -complete. A more-recent paper of his with Heng Guo and Tyson Williams lays out the connection to Clifford gates specifically, proving an equivalence to the condition called “affine” in his framework which renders counting tractable. Thus the state of play is:
Thus the difference between easy and general quantum circuits hangs on that ‘‘—a coefficient, not an exponent—as does factoring (not?) being in . Of course this doesn’t mean quantum circuits are -complete—they are generally believed not to be even -hard—but that saying has terms of the form and captures ostensibly more than .
Might there be some other easily-identified structural properties of the produced (say) by circuits of Hadamard and controlled-phase gates that make the counting problem intermediate between and , if not exactly capturing ? Well, the dichotomy grows finer and richer and stronger with each new paper by Jin-Yi and his group. This feeds in to Jin-Yi’s most subtle argument for believing as we said it here, echoing reasons expounded by Scott Aaronson and others for : if the classes were equal there would be no evident basis for our experiencing such fine and sharp distinctions.
Trying to make up for blog pause while we were busy with a certain seasonal event, we offer several open problems:
[“stabilizer theorem”->”theorem for Clifford gates”; added links to OOPSLA 2017 paper and May 2017 Cai-Guo-Williams paper]
New York Times obituary source
Lotfi Zadeh had a long and amazing life in academics and the real world. He passed away last month, aged 96.
Today Ken and I try to convey the engineering roots of his work. Then we relate some personal stories.
Zadeh was a Fellow of the ACM, the IEEE, the AAAI, and the AAAS and a member of the NAE. But besides this alphabet soup of US-based academies, we are impressed with the one he co-founded: the Eurasian Academy. His founding partners were a historian, a neurosurgeon, a music composer, and a mathematician. They recently elected three other members: an actress-screenwriter-director, an actor-director-writer, and a physicist.
In any alphabet of his life, one letter stands out: the letter Z. The term “Fuzzy Set” has two of them. But Zadeh’s first widely noted work goes by just the bare letter.
Pierre-Simon Laplace discovered a relative of the Fourier transform that has similarly motivated applications and often better behavior. When applied to the density function of a random variable on or , it has the form
Here can be a complex number. The function is holomorphic provided we are working on . A neat trick is that we can jump from to the cumulative distribution by
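A quick check in sympy of the standard identity presumably meant here: for a nonnegative random variable, the transform of the cumulative distribution is the transform of the density divided by s:

```python
# Verify, for an exponential density, that the Laplace transform of the CDF equals the
# Laplace transform of the density divided by s.
import sympy as sp

t, u, s, lam = sp.symbols("t u s lam", positive=True)
f = lam * sp.exp(-lam * t)                              # Exp(lam) density on [0, oo)
F = sp.integrate(f.subs(t, u), (u, 0, t))               # its CDF: 1 - exp(-lam*t)

Lf = sp.integrate(f * sp.exp(-s * t), (t, 0, sp.oo))    # transform of the density
LF = sp.integrate(F * sp.exp(-s * t), (t, 0, sp.oo))    # transform of the CDF

print(sp.simplify(Lf - s * LF))                         # 0, so LF = Lf / s
```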
Can we get such nice properties for a discrete random variable on the integers? Zadeh’s advisor at Columbia, John Ragazzini, led him in showing the power of defining
where again can be any complex number, and the domain of and the sum can be or . We note that is often defined as a function of , that a similar sign issue was discussed in reviewing the 1952 Ragazzini-Zadeh paper, and we’ve switched versus in Wikipedia’s article on to make it look more like . With a positive exponent, is the probability generating function of .
How useful is this? Much of what we can say in a short space is the same as with Fourier: If we form the convolution
then its -transform is just the product function:
Using products this way makes convolutions easier to work with. Many hard-to-handle functions become nicer under their -transforms. The Dirac delta function if and otherwise is strange at face value—though it can be understood as the random variable whose outcome is always . Under the -transform, however,
Nothing can be nicer than the constant . For explanation of where is more general than the discrete Fourier transform and relatives we defer to this beautiful page. All this grew out of ideas in the 1940s by others including Witold Hurewicz—another z—but Zadeh’s joint paper had the greatest influence in signal processing.
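A small numerical illustration of the convolution-to-product rule, using the positive-exponent (probability generating function) convention mentioned above and two ordinary dice:

```python
# The transform of a convolution is the product of the transforms: convolving the pmf of
# a die with itself gives the same coefficients as squaring its generating function.
import numpy as np

die = np.full(6, 1 / 6)                            # pmf of one fair die on values 1..6
conv = np.convolve(die, die)                       # pmf of the sum of two dice (values 2..12)
prod = np.polynomial.polynomial.polymul(die, die)  # product of the generating functions

print(np.allclose(conv, prod))                     # True
```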
The art of is continuous functions forming a well-behaved nimbus around certain discrete entities. Suppose we try to do this for every discrete concept? Begin with the idea of a set , namely a subset of some universe . Instead, let us think of a fuzzy set where
Here the real number is called the grade of membership of in . The original set is the case if and otherwise. The point is that we are now free to consider other functions that approximate and are smoother and nicer to work with. We can consider whole ensembles of such functions.
From fuzzy sets it is a short step to fuzzy logic. This has an antecedent: the infinite-valued logic of Jan Łukasiewicz and others. A statement may have a truth value between 0 and 1. A common choice is to represent the value by a logistic curve of a main parameter. Here is a somewhat distorted curve for the statement “X is wealthy” parameterized by the net worth of X:
“Simulating Complexity” blog source
The point for us is that logistic curves are natural to work with when modeling such predicates in a larger system. Here is a pertinent recent example for image processing. Further points are that a logical 0-1 assignment to “wealthy” would have an artificially sharp distinction somewhere and that the logistic curves are more faithful to neural-net models of how we think.
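A minimal sketch of such a membership function; the midpoint and steepness below are invented for illustration only:

```python
# A logistic membership function for the fuzzy predicate "X is wealthy", parameterized
# by net worth. The midpoint and steepness are made-up illustrative values.
import math

def wealthy(net_worth_dollars, midpoint=1_000_000, steepness=5e-6):
    """Grade of membership in the fuzzy set 'wealthy': a value strictly between 0 and 1."""
    return 1 / (1 + math.exp(-steepness * (net_worth_dollars - midpoint)))

for w in (10_000, 500_000, 1_000_000, 5_000_000):
    print(w, round(wealthy(w), 3))   # grades rise smoothly from near 0 to near 1
```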
Zadeh’s original 1965 paper is one of the most cited science papers of all time. It has close to citations. He confessed that:
“I knew that just by choosing the label ‘fuzzy’ I was going to find myself in the midst of a controversy… If it weren’t called fuzzy logic, there probably wouldn’t be articles on it on the front page of the New York Times. So let us say it has a certain publicity value. Of course, many people don’t like that publicity value, and when they see it in the New York Times, it doesn’t sit well with them.”
That controversy was real—see the next section. Zadeh in an acceptance speech for the 1989 Honda Foundation prize said
“The concept of a fuzzy set has had an upsetting effect on the established order.”
I (Dick) never understood why this generalization of sets created such push-back. Stuart Russell, a Berkeley professor who worked next door to Mr. Zadeh for many years, noted:
He always took criticism as a compliment. It meant that people were considering what he had to say.
The impact of his work has been recognized by a posthumous “Golden Goose” Award. The award’s name counters the stigma of the “Golden Fleece” awards given out in 1975–1988 by US Senator William Proxmire in half-jest to federally-funded research projects he deemed frivolous and wasteful. Zadeh drew attention from Proxmire as a potential “Golden Fleece” awardee. The Golden Goose citation, however, describes the “Clear Impact,” especially as seen by engineering-minded Japanese:
Part of this interest came from the fact that ‘fuzzy’ was not a pejorative term in Japanese, but instead a neutral or even positive one. Researchers there took his idea and ran, creating conferences and journals focused on making advances in fuzzy logic. To this day, the only country with more patents on fuzzy ideas and concepts than the United States is Japan.
In 1986, the first commercial application of fuzzy logic hit the shelves in Japan: a fuzzy shower head. Using fuzzy concepts of hot, cold, high pressure, low pressure, and others, the shower head could use fuzzy logic to control showers across the country. Within a few years, the market was overflowing with fuzzy consumer products. Vacuum cleaners, rice cookers, air conditioning systems, microwaves, everything was moving to fuzzy control. Even the entire subway system of Sendai in Japan was built with fuzzy logic controlling the motion of the trains.
Way back in the first month of this blog, I (Dick) quoted the following remarks by William Kahan. I was in the audience for Zadeh’s lecture too but let’s let Kahan speak:
My two favorite stories about him concern his tremendous candor. The first is about his ideas on “fuzzy sets” and the second is on “who should get tenure.” I will only tell the first one—to protect the innocent and the guilty. When I first arrived at the Computer Science Department at Berkeley, the faculty decided to have a new series of lectures that fall. The plan was to have short lectures by each faculty of the department—in this way new graduate students would learn each faculty’s research area.
One day Professor Zadeh was presenting his area of research—an area that he created called “fuzzy sets.” Fuzzy sets were then and still are today a controversial area. Some researchers do not think much of this area. However, the area is immensely popular to many others. There are countless conferences, books, and journals devoted completely to this area. Kahan was in the audience while Zadeh was speaking. Finally, at some point Kahan could take it no more. He stood up and Zadeh asked him what his question was. Kahan stated in the most eloquent manner that it might be okay to work on fuzzy sets in the privacy of your own basement (after all this was Berkeley), but there was no excuse for exposing young minds to this “stuff”—his term was stronger. We all were shocked. For a few seconds no one spoke. I wondered how in the world Zadeh could respond. Zadeh finally said, “thank you for your comments,” and went on with the rest of the talk, as if nothing had happened. The next year the faculty talks were cancelled.
I met Zadeh once, when he was the featured speaker at the 6th International Conference on Computing and Information (ICCI 1994), which was held at Trent University in Peterborough, Ontario, May 26–28, 1994. Jie Wang and I drove there from STOC which was held in Montreal that year. This small conference—not to be confused with ones having similar names and acronyms—lasted just a few more years. It is hard to find any information on the 1994 meeting now—just a few paper citations—and I have found no proof on the Internet that Zadeh was there. But he was—in a non-fuzzy but decidedly freezy setting.
There was a welcoming reception in the late afternoon of the 25th. It was slated to be outside in a wooded park on the university grounds. It was late May after all. But it was cold. I’ve known cold days in May in Buffalo, but none like that—biting wind and icy sleet. Only twenty or so of the registrants braved the weather. There was fortunately a round wooden structure, covered and enclosed and large enough to shelter us, but with no central heating. Instead it had a coal heat stove. We huddled around on chairs and stools and the part of the circular wall bench near the stove. Although over two hours of nominal daylight remained, the dark clouds and scant windows made it pitch night inside. If I recall correctly, the original intent of a cookout was shelved and replaced by a bulk order of sandwiches and potato chips and other picnic fare.
Nearest the stove sat the 73-year-old Zadeh wrapped in blankets. His face glowed orange as he regaled us in good humor with stories. I don’t think I kept any record of what he said. We felt in the presence of a great man but under surreal conditions—accentuated for Jie and me by our having had a hot lunch in the downtown Montreal hotel for STOC. Somewhere I do have notes of the keynote he gave the next morning before departing—in a modern and heated university lecture room—but I have not unpacked my boxes of old notebooks since my department’s move to a new building six years ago.
His birthplace Baku has been on my mind because I’ve recently read Thomas Reiss’s biography The Orientalist of Lev Nussimbaum, who wrote under the pseudonyms Essad Bey and Kurban Said. Nussimbaum had at least a hand in the writing and production of the 1937 romance Ali and Nino, which is considered the national novel of Azerbaijan. Baku juts into the Caspian Sea and calls itself the easternmost European city as demarked by the Urals and Asia Minor extended east. I wish it had occurred to me to ask about his upbringing and the history between the wars.
We convey our profound appreciation and regrets to his family and friends.
Two more tragic losses coming before a greater tragedy
Composite of crops from src1, src2
Michael Cohen and Vladimir Voevodsky were in different stages of their careers. Cohen was a graduate student at MIT and was visiting the Simons Institute in Berkeley. He passed away suddenly a week ago Monday on a day he was scheduled to give a talk. Voevodsky won a Fields Medal in 2002 and was a professor at the Institute for Advanced Study in Princeton. He passed away Saturday, also unexpectedly.
Today we join those grieving both losses.
We are writing this amid the greater horror in Las Vegas. Dick and I speak our condolences and more, but the condolences that two of us can give seem to fade—they do not “scale up.” Hence we feel that the best we can do is talk about Cohen’s and Voevodsky’s roles in our scientific communities and some of what they contributed. That is a gesture of peace and serenity. It may not overcome the darkness, but something like it seems needed so that we all might do so.
Michael Cohen had already worked with a wide variety of people in over twenty joint papers. He had two papers all by himself: a paper at SODA 2016 titled, “Nearly Tight Oblivious Subspace Embeddings by Trace Inequalities,” and a paper at FOCS 2016 titled, “Ramanujan Graphs in Polynomial Time.”
A common theme through much of this work was wizardry with special kinds of matrices. They included Laplacian matrices in which every column sums to zero and only the diagonal entries can be positive. You can get one from a directed graph by negating the entries of its adjacency matrix and putting the in-degrees on the diagonal. One can further demand that the rows sum to zero, which happens for our graph if each node’s in-degree equals its out-degree. This is automatic for undirected graphs. As noted in this paper:
While these recent algorithmic approaches have been very successful at obtaining algorithms running in close to linear time for undirected graphs, the directed case has conspicuously lagged its undirected counterpart. With a small number of exceptions involving graphs with particularly nice properties and a line of research in using Laplacian system solvers inside interior point methods for linear programming […], the results in this line of research have centered almost entirely on the spectral theory of undirected graphs.
The paper, titled “Faster Algorithms for Computing the Stationary Distribution, Simulating Random Walks, and More,” was joint with Jonathan Kelner, John Peebles, and Adrian Vladu of MIT, Aaron Sidford of Stanford, and Richard Peng of Georgia Tech, and also came out at FOCS 2016. In the case of symmetric matrices , needing only that , he was part of a bigger team including Peng and Gary Miller of CMU that found the best-known time for solving . That paper came out at STOC 2014.
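As a concrete illustration of the Laplacian construction described above—negate the adjacency matrix and put the in-degrees on the diagonal—here is a tiny example checking that every column sums to zero:

```python
# Directed-graph Laplacian: in-degrees on the diagonal, negated adjacency off the diagonal.
import numpy as np

# A[i, j] = 1 means an edge from node i to node j (a 3-cycle plus one extra edge here).
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]])

in_degrees = A.sum(axis=0)          # column sums of A are the in-degrees
L = np.diag(in_degrees) - A         # only the diagonal entries can be positive

print(L)
print(L.sum(axis=0))                # [0 0 0]: every column sums to zero
```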
Thus from early on he was working with a great many people in the community. This has been noted in tribute posts by Scott Aaronson, by Sébastien Bubeck, by Luca Trevisan, by former colleagues at Microsoft Research where Cohen spent this past summer, and by Lance Fortnow. The post by Scott includes communications from Cohen’s parents and information about memorials and donations.
We’ll talk about Cohen’s paper on Ramanujan graphs in a train of thought that will lead into aspects of Voevodsky’s work. Of course we know Srinivasa Ramanujan was a brilliant Indian mathematician who also died tragically young.
In mathematics we sometimes prove the existence of objects without knowing how to construct them. Sometimes we can prove that a random object works. This is often helpful, but one downside comes from cases where we would want different people given the same problem parameters to obtain the same object. Randomized algorithms do not usually have a single output that is arrived at with high probability. What we really want is an algorithm that constructs the object.
This has been the story for a long time with expander graphs. They were proved to exist long ago via the probabilistic method. The zig-zag product was a watershed in constructing some kinds of them. The goal is to get these objects constructively with the same parameters or close to them.
A Ramanujan graph is a particular kind of expander with a maximum dose of the spectral-gap condition for expansion. The adjacency matrix of a d-regular graph has d as its largest eigenvalue. It cannot have an eigenvalue less than −d, which occurs if and only if the graph is bipartite. The graph is Ramanujan if all other eigenvalues have absolute value at most 2√(d−1). This creates a quadratic spectral gap between d and the next-largest eigenvalue and this is asymptotically the largest possible.
Again, a randomly chosen d-regular n-node graph will almost certainly be a Ramanujan graph, for any d and nontrivial n. Adam Marcus, Dan Spielman, and Nikhil Srivastava (MSS) proved in 2013 that such graphs exist for all d and n even when required to be bipartite. But can we build one for any d and n? This was not known in deterministic polynomial time until Cohen’s paper. The main advance was to use a beautiful concept of trees of degree-n polynomials with interlacing roots from MSS and improve it so that the requisite trees have polynomial rather than exponential maximum branch length, which governs the time of the algorithm. The paper well rewards further reading.
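As a sanity check of the definition, here is a quick numerical test on a small 3-regular bipartite graph, the complete bipartite graph K_{3,3}:

```python
# Check the Ramanujan condition: every eigenvalue other than +/- d has absolute value
# at most 2*sqrt(d-1). K_{3,3} is 3-regular and bipartite with spectrum {3, -3, 0, 0, 0, 0}.
import numpy as np

J = np.ones((3, 3))
A = np.block([[np.zeros((3, 3)), J],
              [J, np.zeros((3, 3))]])        # adjacency matrix of K_{3,3}
d = 3

eig = np.linalg.eigvalsh(A)
nontrivial = [x for x in eig if not np.isclose(abs(x), d)]
print(all(abs(x) <= 2 * np.sqrt(d - 1) + 1e-9 for x in nontrivial))   # True
```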
What this does is put bipartite Ramanujan graphs onto the list of structures that we can apprehend and use in deterministic polynomial-time algorithms. Thus Cohen added his name to the honor roll of those constructing good expanders and making random objects concrete.
Voevodsky’s work is set against a backdrop where mathematicians do the following over and over again. They start by knowing how to build certain kinds of algebraic structures on, say, differential manifolds or curves. They then want to carry this structure over to more general settings.
Voevodsky won his Fields Medal for this kind of work. He showed how to carry over topological ideas of homotopy from differential manifolds to algebraic manifolds—that is, any manifold that is the zero set of a polynomial. We discussed homotopy and its computational relevance in our own terms here. To quote his 2002 Fields review by Christophe Soulé:
It is quite extraordinary that such a homotopy theory of algebraic manifolds exists at all. In the fifties and sixties, interesting invariants of differentiable manifolds were introduced using algebraic topology. But very few mathematicians anticipated that these “soft” methods would ever be successful for algebraic manifolds. It seems now that any notion in algebraic topology will find a partner in algebraic geometry
Voevodsky’s medal was also for his proof of a noted conjecture by John Milnor that a structure of algebraic groups he built on a field of characteristic other than 2, with the algebra taken mod 2, would be isomorphic to an étale cohomology of with coefficients mod 2. Voevodsky overcame difficulty with tools from algebraic K-theory by developing and systematizing prior ideas of motivic cohomology that, as the review says, “turned out to be more computable.” He later proved the general conjecture for moduli other than 2, drawing on work by others in the meantime.
In the most ambitious cases of such “carry-overs,” however, mathematicians are able to prove that the objects needed for such structure exist but not concretely. It’s not just that the objects cannot be apprehended, but that these proofs are often not subject to being algorithmically checked.
To remedy this, Voevodsky delved deeper into constructive mathematics, which aims not to limit knowledge but rather to streamline and solidify it. He built up homotopy type theory (HoTT), which we talked about here. His ideas were programmed in the software system Coq, which grew out of Thierry Coquand’s “calculus of constructions” in partnership with Gérard Huet. Thus he was led to consider the foundations of mathematics as deeply as David Hilbert did a century ago.
The term “foundations,” which lives in the names of conferences such as FOCS and MFCS, tends to be spoken as an umbrella term for “theory.” We have argued that it ought to mean continued and concerted attention to the core problems in our field like —notwithstanding that many of them are “like” in the sense of not having budged for decades. But when Voevodsky talked about foundations, he really meant the foundations: how do we know the whole edifice we have built out of proofs—all kinds of proofs—won’t collapse?
We have blogged about Ed Nelson’s attempts to show that Peano Arithmetic is inconsistent. Voevodsky took this possibility seriously. In memorials to Voevodsky on the HoTT Google Group, André Joyal contributed the following:
My first contact with Vladimir and his ideas was at a meeting in Oberwolfach in 2011. He gave a series of talks on constructive mathematics and homotopy theory, framed as a tutorial with the proof assistant Coq. His notion of a contractible object and of an equivalence were striking. I had a hard time understanding his ideas, because they were described very formally. He apparently distrusted informal expressions of mathematical ideas. One evening, he expressed the opinion that Peano arithmetic was inconsistent! He later came to distrust the applications of his ideas to homotopy theory!
Voevodsky indeed gave a talk at IAS titled, “What If Current Foundations of Mathematics are Inconsistent?” Very controversially, it tries to turn the understanding of Kurt Gödel’s Second Incompleteness Theorem on its head as a vehicle for possibly proving the inconsistency of certain classical first-order theories. Whereas, he concluded:
In constructive type theory, even if there are inconsistencies, one can still construct reliable proofs using the following “workflow”:
- A problem is formalized.
- A solution is constructed using all kinds of abstract concepts. This is the creative part.
- An algorithm which verifies “reliability” is applied to the constructed solution (e.g., a proof). If this algorithm terminates then we know we have a good solution of the original problem. If not, then we may have to start looking for another solution.
The workflow on this effort will continue. The IAS announcement notes that a memorial workshop is being planned and more information will be available soon. Update 10/7: The IAS posted a full obituary and there are also one in today’s New York Times and one in today’s Washington Post.
Again we express our condolences to their family, loved ones, and colleagues, and the same to everyone affected by the horror in Las Vegas.
This is the 750th post on this blog. We were holding onto two other ideas for marking this milestone, while busy with papers and much else these past two weeks ourselves. Those will still come out in upcoming weeks.
[added update]
Kathryn Farley obtained her PhD from Northwestern University in performance studies in 2007. After almost a decade working in that area, she has just started a Master’s program at New York University in a related field called drama therapy (DT).
Today, I thought I would talk about the math aspects of DT.
Okay so why should I report on DT here? It seems to have nothing in common with our usual topics. But I claim that it does, and I would like to make the case that it is an example of a phenomenon that we see throughout mathematics.
So here goes. By the way—to be fair and transparent—I must say that I am biased about Dr. Farley, since she is my wonderful wife. So take all I say with some reservations.
The whole point is that understanding what DT is was hard, at least for me. But when I realized that it related to math it became much clearer to me, and I hope that it may even help those in DT to see what they do in a new light. It’s the power of math applied not to physics, not to biology, not to economics, but to a social science. Perhaps I am off and it’s just another example of “when you have a hammer, the whole world looks like a nail.” Oh, well.
I asked Kathryn for a summary of DT and here it is:
Drama therapy uses methods from theatre and performance to achieve therapeutic goals. Unlike traditional “talk” therapy, this new therapeutic method involves people enacting scenes from their own lives in order to express hidden emotions, gain valuable insights, solve problems and explore healthier behaviors. There are many types of DT, but most methods rely on members of a group acting as therapeutic agents for each other. In effect, the group functions as a self-contained theatre company, playing all the roles that a performance requires—playwright, director, actors, stagehands, and audience. The therapist functions as a producer, setting up the context for each scene and soliciting feedback from the audience.
Kathryn’s summary of DT is clear and perhaps I should stop here and forget about linking it to math. But I think there is a nice connection that I would like to make.
Since Kathryn is a student again, and students are assigned readings—there is a lot of reading in DT—you may imagine that she has been sharing with me a lot of thoughts on her readings and classes. I have listened carefully to her, but honestly it was only the other day, in a cab going to Quad Cinema down on 13th St., that I had the “lightbulb moment.” I suddenly understood what she is studying. Perhaps riding in a cab helps one listen: maybe that has been studied before by those in cognitive studies.
What I realized during that cab ride is that DT is an example of a generalization of another type of therapy. If the other therapy involves two people—including the therapist—then DT is the generalization to three or more. We see this all the time in math, but it really helped me to see that the core insight—in my opinion—is that DT has simply moved from two to three or more.
We see this type of generalization all the time in math. For example, in communication complexity the basic model is two players sending each other messages. The generalization to more players creates very different behavior. Another example is the rank of a matrix. This is a well understood notion: easy to compute and well behaved. Yet simply changing from a two-dimensional matrix to a three-dimensional tensor changes everything. Now the behavior is vastly more complex and the rank function is no longer known to be easy to compute.
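For the matrix half of that contrast—the easy case—a one-liner suffices; nothing analogous is known for the rank of a three-dimensional tensor, which is NP-hard to compute in general:

```python
# Matrix rank is a cheap, well-behaved computation; tensor rank has no such routine.
import numpy as np

M = np.array([[1, 2, 3],
              [2, 4, 6],
              [0, 1, 1]])
print(np.linalg.matrix_rank(M))   # 2: the second row is twice the first
```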
Here is an example of how DT could work—it is based on a case study Kathryn told me about.
Consider Bob who is seeing Alice who is Bob’s therapist. Alice is trained in some type of therapy that she uses via conversations with Bob to help him with some issue. This can be very useful if done correctly.
What DT is doing in letting the number be three or more is a huge step. We see this happen all the time in mathematics—finally the connection. Let’s look at Bob and Alice once more. Now Alice is talking with Bob about an issue. To be concrete let’s assume that Bob’s issue is this:
Bob has been dating two women. His dilemma is, which one should he view as a marriage prospect? He thinks both would go steady with him but they are very different in character. Sally is practical, solid, and interesting; Wanda is interesting too but a bit wild and unpredictable. Whom should he prefer?
The usual talk therapy would probably have Alice and Bob discuss the pros and cons. Hopefully Alice would ask the right question to help Bob make a good decision.
The DT approach would be quite different. Alice would have at least one other person join them to discuss Bob’s decision. This would change the mode from direct “telling” to a more indirect story-line. In that line it might emerge that Bob’s mother is a major factor in his decision—even though she passed away long ago. It might come out that his mom divorced his dad when he was young because he was too staid and level-headed. Perhaps this would make it clear to Bob that his mother was really the reason he was even considering Wanda, the wild one.
What is so interesting here is that by using more than just Bob, by setting the number of people to three or more, Alice can make the issues much more vivid for Bob.
The more I think about it, the idea of three or more people involved is the root. Naturally anything with more than two people transits from dialogue to theater. So the aspect of ‘drama’ is not primordial—it is emergent. Once you say three or more, what goes down as Drama Therapy in the textbooks flows logically and sensibly—at least it does to me now.
This is accompanied by a phase change in complexity and richness. As such it parallels ways we have talked about mathematical transitions from the case of to on the blog before. Maybe DT even implements a strategy I heard from Albert Meyer:
Prove the theorem for three and then let three go to infinity.
Does this connection help? Does it make any sense at all?
It was just Ken’s birthday
Kenneth Regan’s birthday was just the other day.
I believe I join all in wishing him a wonderful unbirthday today.
The idea of unbirthday is due to Lewis Carroll in his Through the Looking-Glass, and is set to music in the 1951 Disney animated feature film Alice in Wonderland. Here is the song:
MARCH HARE: A very merry unbirthday to me
MAD HATTER: To who?
MARCH HARE: To me
MAD HATTER: Oh you!
MARCH HARE: A very merry unbirthday to you
MAD HATTER: Who me?
MARCH HARE: Yes, you!
MAD HATTER: Oh, me!
MARCH HARE: Let’s all congratulate us with another cup of tea
A very merry unbirthday to you!
MAD HATTER: Now, statistics prove, prove that you’ve one birthday
MARCH HARE: Imagine, just one birthday every year
MAD HATTER: Ah, but there are three hundred and sixty four unbirthdays!
MARCH HARE: Precisely why we’re gathered here to cheer
BOTH: A very merry unbirthday to you, to you
ALICE: To me?
MAD HATTER: To you!
BOTH: A very merry unbirthday
ALICE: For me?
MARCH HARE: For you!
MAD HATTER: Now blow the candle out my dear
And make your wish come true
BOTH: A merry merry unbirthday to you!
Ken is best known for work in theory and in particular in almost all aspects of complexity theory. But I wanted—in the spirit of an unbirthday—to point out that Ken is quite active in many other areas of computer science research. Here is one example that is joint with Tamal Biswas: Measuring Level-K Reasoning, Satisficing, and Human Error in Game-Play Data. We discussed it before here.
The problem is that Ken and Tamal want to be able to study levels of play in chess but are stalled currently by issues Ken raised last May in this blog. I wish them well in making strides to better understand how to model game play in chess that captures the notion of levels.
The following game, created by Ayala Arad and Ariel Rubinstein, really helps me understand the kind of thing Ken and Tamal are interested in capturing.
You and another player are playing a game in which each player requests an amount of money. The amount must be (an integer) between 11 and 20 shekels. Each player will receive the amount he requests. A player will receive an additional amount of 20 shekels if he asks for exactly one shekel less than the other player. What amount of money would you request?
The point is there are levels of thinking that a player can naturally go through. Here is a quote from their paper that should give the flavor of what is going on: "The choice of 20 is a natural anchor for an iterative reasoning process. It is the instinctive choice when choosing a sum of money between 11 and 20 shekels (20 is clearly the salient number in this set and 'the more money the better'). Furthermore, the choice of 20 is not entirely naive: if a player does not want to take any risk or prefers to avoid strategic thinking, he might give up the attempt to win the additional 20 shekels and may simply request the highest certain amount."
Read the paper for how Arad and Rubinstein analyze the game. The trouble is that if you take a risk and select 19 then you at least have a chance to get the bonus 20: if you reason that your opponent is playing safe, that is a great play. Of course if they reason the same way, then you lose one shekel. This kind of "levels" of playing is central to many games, including chess.
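For concreteness, here is a minimal sketch in Python—ours, not from the Arad–Rubinstein paper—of the level-$k$ reasoning chain in the 11–20 game: a level-0 player anchors on 20, and each higher level simply best-responds to the level below it.

```python
# Level-k reasoning in the Arad-Rubinstein 11-20 money request game.
# Payoff: you receive your request, plus a 20-shekel bonus if your request
# is exactly one less than the opponent's request.

def payoff(my_request, their_request):
    bonus = 20 if my_request == their_request - 1 else 0
    return my_request + bonus

def best_response(their_request):
    # The request in 11..20 maximizing payoff against a fixed opponent request.
    return max(range(11, 21), key=lambda r: payoff(r, their_request))

level = {0: 20}   # level-0: the instinctive anchor--"the more money the better"
for k in range(1, 11):
    level[k] = best_response(level[k - 1])

for k, request in level.items():
    print(f"level-{k} player requests {request}")
# The chain descends 19, 18, ..., 11, and then best-responding to a request
# of 11 wraps back to the safe 20.
```

Nothing here is deep; the interest is in which level real players stop at, which is the kind of thing Ken and Tamal want to measure from game-play data.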
An example is a move that has one refutation, which a computer at high depth can spot, but that otherwise brings higher returns than playing it safe with the computer's "best" move. How can we judge when such moves can be expected to pay off? Risky opening `novelties' have been tried many times in chess, and in one famous game, where Frank Marshall had saved up a gambit for nine years, the human player José Capablanca did find the refutation at the board.
We all wish that Ken has many more birthdays and unbirthdays. We also hope he makes progress on his open problems about depth of thinking and levels of play. What should you select in the simple coin game?
A new approximation algorithm
Composite of src1, src2, src3
Ola Svensson, Jakub Tarnawski, and László Végh have made a breakthrough in the area of approximation algorithms. Tarnawski is a student of Svensson at EPFL in Lausanne—they have another paper in FOCS on matchings to note—while Végh was a postdoc at Georgia Tech six years ago and is now at the London School of Economics.
Today Ken and I want to highlight their wonderful new result.
Svensson, Tarnawski, and Végh (STV) have created a constant-factor approximation algorithm for the asymmetric traveling salesman problem (ATSP). This solves a long-standing open problem and is a breakthrough of the first order.
Recall that the traveling salesman problem (TSP) is the problem of finding the cheapest tour that visits all vertices of a weighted undirected graph at least once, and the ATSP allows the graph to be directed. This difference changes the problem tremendously—it also opens up new applications. Think airline routes for dates before and after the recent eclipse: the one-way fares were not symmetric.
Below is an optimal TSP tour of 13,509 incorporated cities in the continental United States as of 1998 when it was solved. Note at bottom right that the tour includes a visit to Key West; our hearts are with all those affected by Hurricane Irma.
TSP website source
This uses the Euclidean distance. It is common to allow any metric that satisfies the triangle inequality: for any nodes $u, v, w$, the cost $c(u,w)$ of going from $u$ to $w$ is no more than that of going from $u$ to $v$ and then from $v$ to $w$. If we have any cost function $c$ but allow the salesman to "pass through" cities already visited, we can re-define $c(u,w)$ to be the minimum cost of getting from $u$ to $w$ allowing transit through one or more intermediate cities $v$. Then $c$ satisfies the inequality and gives the same optimum. Conversely, if $c$ satisfies the inequality then the pass-through rule is superfluous. So allowing it is equivalent to having the triangle inequality.
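To make the pass-through equivalence concrete, here is a small Python sketch—our illustration, not from the STV paper—that replaces a cost function by its minimum pass-through cost via Floyd–Warshall; the result always satisfies the triangle inequality and preserves the optimal tour cost, and it works equally well for asymmetric (directed) costs.

```python
# Metric closure: replace c(u, w) by the cheapest cost of getting from u to w,
# possibly passing through intermediate cities.  Floyd-Warshall does this in O(n^3).

def metric_closure(cost):
    n = len(cost)
    d = [row[:] for row in cost]          # copy the cost matrix
    for v in range(n):                    # allow passing through city v
        for u in range(n):
            for w in range(n):
                d[u][w] = min(d[u][w], d[u][v] + d[v][w])
    return d

# A cost matrix violating the triangle inequality: 0 -> 2 directly costs 10,
# but going 0 -> 1 -> 2 costs only 3.
c = [[0, 1, 10],
     [1, 0, 2],
     [10, 2, 0]]
print(metric_closure(c)[0][2])   # 3: the closure obeys the triangle inequality
```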
Without the rule or the inequality, we could take any hard instance graph $G$ of the (directed or undirected) Hamiltonian cycle (HC) problem and add some high-cost edges to get a complete graph $G'$. Then approximating TSP or ATSP for $G'$ is equivalent to solving HC for $G$. Assuming the triangle inequality avoids such cases and is in force for STV.
What is so interesting about the difference between the TSP and the ATSP is that a constant approximation has long been known for the TSP. Indeed, getting a factor of $2$ is easy: find a minimum spanning tree $T$, consider first the tour that uses the pass-through rule to travel each edge of $T$ forward-and-back, and finally improve it by going directly to the next unvisited vertex in that tour rather than passing through. Getting the best-known factor of $3/2$ is based on a simple, but very clever, algorithm by Nicos Christofides. It finds a minimum-weight perfect matching $M$ for the nodes of odd degree in $T$ using the edges they induce in the underlying graph, creates an Euler cycle from $T \cup M$ (traversing any edge common to $T$ and $M$ twice), and finally improves as above. There has been progress on special cases—see here—but Christofides's algorithm has withstood attempts to improve it for over forty years. Pretty impressive.
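Here is a short Python sketch—ours, and only of the easy factor-$2$ method, not of Christofides's matching step—run on a small symmetric cost matrix that already obeys the triangle inequality.

```python
# Factor-2 tour for metric TSP: build a minimum spanning tree, walk it
# depth-first (the forward-and-back tour with pass-throughs), and shortcut
# to the next unvisited city.

def mst_edges(cost):
    # Prim's algorithm on the complete graph given by the cost matrix.
    n = len(cost)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [None] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and cost[u][v] < best[v]:
                best[v], parent[v] = cost[u][v], u
    return edges

def double_tree_tour(cost):
    n = len(cost)
    adj = [[] for _ in range(n)]
    for u, v in mst_edges(cost):
        adj[u].append(v)
        adj[v].append(u)
    tour, seen, stack = [], [False] * n, [0]
    while stack:                  # depth-first preorder = the shortcut tour
        u = stack.pop()
        if seen[u]:
            continue
        seen[u] = True
        tour.append(u)
        stack.extend(adj[u])
    tour.append(0)                # return home
    return tour, sum(cost[a][b] for a, b in zip(tour, tour[1:]))

cost = [[0, 2, 4, 5],
        [2, 0, 3, 4],
        [4, 3, 0, 2],
        [5, 4, 2, 0]]
print(double_tree_tour(cost))    # a tour of cost at most twice the optimum
```

The tour's cost is at most twice the MST weight, and the MST weight is at most the optimal tour cost, which gives the factor of $2$.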
The input for ATSP is a directed graph $G$ together with a cost function $c$ on its edges. The graph is strongly connected, meaning that for all nodes $u, v$ there is some path from $u$ to $v$. If there is no edge $(u,v)$ we could add one and define $c(u,v)$ to be the minimum path cost as above, and so make $G$ into a complete directed graph. However, the STV paper does a series of reductions through problems in which the absence of edges matters.
For intuition, picture not a salesman but a big delivery truck doing its rounds on the one-way streets of Manhattan. By using more than one node at intersections we can model another feature of Manhattan, which is often not being able to make a left or even right turn. This makes Manhattan behave like a non-planar graph and turns the counting measure of blocks you must travel into a non-Euclidean distance, but still one obeying the triangle inequality.
Cropped from free Flickr source
Now picture an army of scooters or bicycles, each taking one or a few packages—fractions of the job. They are still subject to the road rules and cost measure (not like drone delivery, which is illegal in most cities). Modeling them yields the linear programming (LP) relaxation of (A)TSP studied by Michael Held and Dick Karp, which we discussed here. Its optimum $\mathsf{OPT}_{LP}$ is a lower bound on the optimal amount of work $\mathsf{OPT}$ for the truck.
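For a tiny instance we can even write the Held–Karp LP down explicitly and solve it; here is a sketch—ours, with a made-up cost matrix and using scipy—in which the exponentially many cut constraints are simply enumerated.

```python
# Held-Karp LP relaxation of ATSP on a tiny instance.
# Variables x[(u,v)] in [0,1] for every directed edge; each city has fractional
# out-degree and in-degree exactly 1, and at least one unit of x must leave
# every proper nonempty subset of cities.  The optimum OPT_LP is a lower bound
# on the cost of the best integral tour.
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

cost = np.array([[0, 2, 9, 5],
                 [6, 0, 4, 8],
                 [3, 7, 0, 1],
                 [2, 3, 6, 0]], dtype=float)
n = len(cost)
edges = [(u, v) for u in range(n) for v in range(n) if u != v]
idx = {e: i for i, e in enumerate(edges)}
obj = np.array([cost[u][v] for (u, v) in edges])

A_eq, b_eq = [], []
for u in range(n):                       # out-degree 1 and in-degree 1
    row_out, row_in = np.zeros(len(edges)), np.zeros(len(edges))
    for v in range(n):
        if v != u:
            row_out[idx[(u, v)]] = 1
            row_in[idx[(v, u)]] = 1
    A_eq += [row_out, row_in]
    b_eq += [1, 1]

A_ub, b_ub = [], []
for size in range(1, n):                 # subtour-elimination cuts, enumerated
    for S in combinations(range(n), size):
        row = np.zeros(len(edges))
        for u in S:
            for v in range(n):
                if v not in S:
                    row[idx[(u, v)]] = -1    # -(x leaving S) <= -1
        A_ub.append(row)
        b_ub.append(-1)

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * len(edges), method="highs")
print("OPT_LP =", res.fun)               # a lower bound on the optimal tour cost
```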
The point is that if we can find tours with cost only a constant factor higher than $\mathsf{OPT}_{LP}$ then we've automatically achieved a constant-factor approximation of $\mathsf{OPT}$. The second point is that the LPs defining $\mathsf{OPT}_{LP}$, while large, are nice to analyze. So their main theorem is:
Theorem 1 There is a constant $C$ and a polynomial-time algorithm that, given any ATSP instance, returns a tour of cost at most $C \cdot \mathsf{OPT}_{LP}$.
Well, the constant proved by STV is $5500$, not "galactic" but big. What is significant is that all previously proved overheads grew as $\log n$ or similar in the number $n$ of nodes. Once we achieve a constant factor we can think about improving it…
We quote their summary of the proof:
We now combine the techniques and algorithms of the previous sections to obtain a constant-factor approximation algorithm for ATSP. In multiple steps, we have reduced ATSP to finding tours for vertebrate pairs. Every reduction step was polynomial, and increased the approximation factor by a constant. Hence, altogether they give a constant-factor approximation algorithm for ATSP.
See the 39-page paper for the meaning of "vertebrate pair" and words like "laminarly," which we didn't know were legal in Scrabble. They are anyway far removed from the classic vocabulary of "spanning tree," "Euler tour," "perfect matching," and "Hamilton cycle" which sufficed for Christofides's still-frontline algorithm.
What we note here is their proof structure using reductions to progressively-refined problems. They use previous steps to build algorithms for each problem, with the respective names:
Yes, the subscripts of the second and fourth stand for “vertebrate” and “laminar” while the third algorithm works on a reduction of the second problem to “irreducible” instances. Each step has its own approximation ratio, whose combination becomes for a customizable which they bound by .
Our point is that the proof makes a series of seemingly incremental refinements with loose ends left unvisited until we see at some step that it can “close” and finish its objective—which makes the excursion into a tour. We want to think more deeply about other potential progress of this form that may be capable of breakthroughs in our field.
They never claim that they are trying to optimize the factors in their reduction steps and the final constant. An obvious question that no doubt will be solved is to improve the constant. Is there a chance to get a small one?
A gathering this Labor Day in Rochester
Announcement source
Joel Seiferas retired on December 31, 2016 and is now a professor emeritus in the University of Rochester Computer Science Department.
Today Ken and I wish to talk about his party happening this Labor Day—September 4th.
Joel retired on a holiday—New Year’s Eve—and is having his retirement celebration on another holiday, Labor Day. The former marks the end of each year, and the latter the cultural end of each summer. Labor Day in the US is the first Monday in September. As shown by this chart in Wikipedia’s article, there is some complexity in its otherwise periodic structure—can you name the pope responsible for it?
I’ve never liked Labor Day owing to summer ending, school starting, and another reason—a fact of calendrical life that I share with Ken. Did Jack Benny’s feelings about Valentine’s Day change after he turned 39?
Joel told his department that, in lieu of a gold watch or a series of talks praising his decades of research, he wanted a series of talks that would be accessible and enjoyable to everyone. A somewhat novel idea, since many talks are not accessible to all.
I would argue that both could be achieved—Joel's work was often technical, but it could definitely be explained to a general audience. For example, at least in my opinion, his paper "Two Heads are Better than Two Tapes" could be fun to hear about. Okay, it may not be as exciting as hearing about self-driving cars, AI programs that can outplay humans at Go, or a proof that P=NP. But there is something—I believe—beautiful about results of Joel's that explain the power of various basic computational devices.
But no one asked me. So Joel got his wish and is receiving four talks by leaders in our field that should be enjoyable for everyone:
They have posted a wonderful statement about Joel here. We especially enjoy how it ends:
The fact that Joel is completely ego-free and did all that work (and the single-handed development of widely used bibliographic bridging resources) purely for the advancement of the science—while also mentoring (especially in his ambitious and challenging courses) most of the theory students the department has educated, shaping the department’s faculty recruiting in theoretical computer science, serving as the department’s chair, and wholeheartedly serving the University in his many years on the Academic Honesty Board—makes him all the more of an inspiration to those who know him.
By coincidence, the great economist and public intellectual, Thomas Sowell, retired from his decades of column writing at the same time that Joel retired. At the end of a 2004 interview, Sowell was asked how he would like to be remembered, and he replied: “Oh, heavens, I’m not sure I want to be particularly remembered. I would like the ideas that I’ve put out there to be remembered.” Although our dear colleague Joel is self-effacing and modest, there is no doubt that the deep understanding of computer science that he has contributed will be remembered beyond Joel’s life and ours. We are deeply grateful to him for those ideas, and for his warm, wise friendship.
The earlier parts of the statement include some of Joel’s work and its impact. We’ll explain two of his results that are referenced.
One basic fact about general-purpose computing is the ability of one program $U$ to run any given program $P$, even $U$ itself. The program $P$ is encoded in some fashion and may be much larger than $U$. In practice we're not aware of a time penalty for running this way rather than "natively" because serving programs is what a general-purpose computer does. But in the underlying model of computation there is a hit.
In some models the hit is only a constant factor depending on the size of $P$—and importantly, not on the size of the input being run. One of the quirks of the standard multi-tape deterministic Turing machine (DTM) model is that it multiplies the constant factor by an extra logarithmic overhead—indeed a factor of order $\log t(n)$ for $t(n)$-step computations, where we assume $t(n) \geq n$ so all of the input is read. This is not just a feature of running programs via $U$ but governs the best-known simulation of time-$t(n)$ deterministic computations by families of Boolean circuits, which have size $O(t(n)\log t(n))$. It also affects how tightly we can separate time classes. By diagonalization we can prove:
Theorem 1 Let $T(n)\log T(n) = o(t(n))$, with $t(n)$ being "time constructible" in the sense that some DTM given $n$ in unary halts in exactly $t(n)$ steps. Then we can find a language that is accepted by a DTM in time $t(n)$ but not accepted by any DTM in time $T(n)$. In symbols, $\mathsf{DTIME}[T(n)] \subsetneq \mathsf{DTIME}[t(n)]$, with $\subsetneq$ meaning proper containment.
For separating we can improve the factor via certain “padding and translation” techniques. We can relax the first condition to read for some fixed . Can we make it go away completely? For DTMs there are specific problems that arise when we try to push the factor any further.
So what about nondeterministic Turing machines, that is, NTMs? It would seem to be harder to get a tight diagonalization to work because we cannot simply interchange 'yes' and 'no' answers to negate. However, Joel, as part of his thesis work under Albert Meyer, with Patrick Fischer making a trio, proved that one can almost eliminate the logarithmic factor altogether:
Theorem 2 Let $g(n+1) = o(t(n))$, with $t(n)$ again being the running time of some DTM. Then $\mathsf{NTIME}[g(n)] \subsetneq \mathsf{NTIME}[t(n)]$.
The only difference from a constant-factor overhead is having $g(n+1)$, not $g(n)$, on the left-hand side. The '+1' has always struck me (Ken, writing this section). At polynomial time levels we can ignore it, and at singly exponential levels we can still drop it, since there it costs only a constant factor. But already at levels like $g(n) = 2^{n^2}$ it costs a factor that grows with $n$, and at double-exponential time levels the '+1' makes an even greater relative difference. Its employment in the proof seems innocuous but remains indelible.
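Here is the arithmetic behind that progression, in our own summary form:

```latex
\begin{align*}
g(n) &= n^k:     & g(n+1) &= (n+1)^k = (1+o(1))\,g(n) && \text{(ignorable)}\\
g(n) &= 2^{cn}:  & g(n+1) &= 2^{c}\cdot g(n)          && \text{(a constant factor)}\\
g(n) &= 2^{n^2}: & g(n+1) &= 2^{2n+1}\cdot g(n)       && \text{(a growing factor)}\\
g(n) &= 2^{2^n}: & g(n+1) &= g(n)^{2}                 && \text{(squares the bound)}
\end{align*}
```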
As the Rochester page mentions, the tightness of the theorem for $\mathsf{NTIME}$ was employed by Ryan Williams to prove his breakthrough lower bounds against $\mathsf{ACC}$ circuits. Our initial post on Ryan's results brought out the connection to Joel.
This is one of Joel’s later results. It is joint with Tao Jiang and Paul Vitányi. The result is:
Theorem 3 The language $L$ of strings of the form $x\#y$, where $y$ is a prefix of the binary string $x$, which can be recognized in real time by a DTM with two heads on one worktape, cannot be recognized in real time by a DTM with two worktapes having one head each.
It is important to specify that the input appears on a read-only tape whose head cannot move left. The “one tape” and “two tapes” are initially blank.
The first statement is easy. As the machine reads $x$ it writes it down on its worktape with one head and cleverly leaves the second head at the beginning. Then when the input hits the $\#$ symbol, the second head starts checking the remaining input $y$ against the written tape that stores $x$. The two heads compare characters in lockstep and reach a verdict by the time the input has ended. The definition of real time often states that the input tape head advances at each step, but it can be relaxed to allow pausing it, provided there is a fixed finite number $c$, independent of the input, such that the machine always reads a fresh character within $c$ steps.
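A tiny Python simulation—ours, purely to illustrate the strategy just described—of the two-head, one-worktape machine on strings $x\#y$:

```python
# Real-time recognition of { x#y : y is a prefix of x } with two heads on ONE worktape.
# Head 1 writes x as it is read; head 2 waits at cell 0 and, after the '#',
# checks each incoming symbol of y against the stored copy of x in lockstep.

def two_heads_one_tape(s):
    tape = []            # the single worktape, written by head 1
    h2 = 0               # position of head 2
    seen_hash = False
    ok = True
    for ch in s:         # one fresh input symbol per step: real time
        if ch == "#":
            if seen_hash:
                return False          # more than one '#'
            seen_hash = True
        elif not seen_hash:
            tape.append(ch)           # head 1 copies x onto the worktape
        else:
            # head 2 compares the next symbol of y with the stored x
            if h2 >= len(tape) or tape[h2] != ch:
                ok = False
            h2 += 1
    return seen_hash and ok

print(two_heads_one_tape("0110#011"))   # True: 011 is a prefix of 0110
print(two_heads_one_tape("0110#10"))    # False
```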
What happens when the tape heads are on separate tapes? The machine can copy the $x$ part to one tape, rewind that tape's head to the left edge, and do the same lockstep comparison with the input tape head reading $y$ and the worktape head reading the copy of $x$. The rub is that the pause for rewinding does not have a fixed finite bound. Right after copying, the head is in just the wrong position to begin comparing against $y$.
The second statement may now look obvious to you, but how about if the machine has three tapes? Does it look equally obvious? In fact, Fischer and Meyer, working with Arnold Rosenberg in 1967, showed that a clever folding and mirroring scheme among three single-head worktapes enables the machine to keep tabs on both ends of the $x$ part at all times while copying it. This enables $L$ to be recognized in real time. Moreover Joel, with Benton Leong, showed in 1977 that any computation with multi-head worktapes can be simulated in lockstep by enough single-head tapes—even when the `tapes' are multi-dimensional.
That left two single-head worktapes as the next case to try for recognizing $L$. As our readers know, proving something to be impossible in computational theory is really hard. So is the proof in Joel's joint paper, across eight pages of strategy and "crunch." The question had after all been open for several decades. If you wonder whether other simple-looking problems are still open, see these two posts, which also reference related work by the FMR trio.
We wish Joel a great retirement and hope that the talks are as wonderful as we expect them to be. Ken is also looking forward to seeing old friends again there. Our thoughts are also with colleagues and friends in Houston and adjoining Gulf Coast areas.
A topical look at Norbert Blum’s paper and wider thoughts.
Cropped from source
Thales of Miletus may—or may not—have accurately predicted one or more total solar eclipses in the years 585 through 581 BCE.
Today we discuss the nature of science viewed from mathematics and computing. A serious claim of P≠NP by Norbert Blum has shot in front of what we were planning to say about next Monday's total solar eclipse in the US. Update 9/2/17: Blum has retracted his claim—see update at end.
Predicting eclipses is often hailed as an awakening of scientific method, one using mathematics both to infer solar and lunar cycles and for geometrical analysis. The aspects of science that we want to talk about are not “The Scientific Method” as commonly expounded in step-by-step fashion but rather the nature of scientific knowledge and human pursuits of it. We start with an observation drawn from a recent article in the Washington Post.
Despite several thousand years of experience predicting eclipses and our possession of GPS devices able to determine locations to an accuracy of several feet, we still cannot predict the zone of totality any closer than a mile.
The reason is not any fault on Earth but with the Sun: it billows chaotically and for all we know a swell may nip its surface yea-far above the lunar disk at any time. Keeping open even a sliver of the nuclear furnace changes the character of the experience.
The Post’s article does a public service of telling people living on the edge of the swath not to think it is a sharp OFF/ON like a Boolean circuit gate. People must not always expect total sharpness from science. Happily there is a second point: you don’t have to drive very far to get a generous dose of totality. This is simply because as you move from the edge of a circle toward the center, the left-to-right breadth of the interior grows initially very quickly. This is our metaphor for how science becomes thick and solid quickly after we transit the time of being on its edge.
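A quick back-of-the-envelope computation—ours, using a hypothetical 35-mile shadow radius—makes the point: at depth $d$ in from the edge of a circle of radius $r$, the left-to-right chord has length $2\sqrt{r^2-(r-d)^2} \approx 2\sqrt{2rd}$, which grows steeply for small $d$.

```python
# Breadth of a circle's interior as you step in from the edge.
from math import sqrt

r = 35.0                      # hypothetical shadow radius in miles (~70-mile-wide path)
for d in [0.5, 1, 2, 5, 10]:
    chord = 2 * sqrt(r * r - (r - d) ** 2)
    print(f"{d:4} miles in from the edge -> about {chord:4.0f} miles of breadth")
```

One mile in already buys you roughly seventeen miles of breadth.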
Incidentally, your GLL folks will be in the state of New York next week, nowhere near the swath. Next time in Buffalo. Also incidentally, Thales is said to be the first person credited with discovering mathematical theorems, namely that a triangle made by a circle’s diameter and another point on the circle is a right triangle and that lengths of certain intersecting line segments are proportional.
The transit time is our focus on this blog: the experience of doing research amid inspirations and traps and tricks and gleams and uncertainty. Swaths of our community are experiencing another transit right now.
Norbert Blum has claimed a proof that P is not equal to NP. In his pedigree is holding the record for the strongest concrete general Boolean circuit lower bound over the full binary basis for over 30 years—his $3n - o(n)$ bound stood until it was recently nudged up to $(3 + \tfrac{1}{86})n - o(n)$. His paper passes many filters of seriousness, including his saying how his proof surmounts known barriers. Ken and I want to know what we all want to know: is the proof correct?
More generally, even if the proof is flawed, does it contain new ideas that may be useful in the future? Blum's proof claims a very strong, super-polynomial lower bound on the circuit complexity of telling whether a graph with a given number of edges has a clique of a given size. He gets a similarly strong lower bound for another function, stated up to factors of $\log n$ in the exponent. We would be excited if he had even proved that this function has super-linear Boolean complexity.
Blum’s insight is that the approximation methods used in monotone complexity on the clique function can be generalized to non-monotone complexity. It is launched by technical improvements to these methods in a 1999 paper by Christer Berg and Staffan Ulfberg. This is the very high level of what he tries to do, and is the one thing that we wish to comment on.
Looking quickly at the 38-page argument, we noticed an issue, which we thought we would share. It is not a flaw; it is an issue that we think needs to be thought about more expressly.
As we understand his proof, it takes a Boolean circuit for some monotone function $f$ and places its gates in some topological order. Let this order be $g_1, g_2, \ldots, g_s$.
So far nothing unreasonable. Note that the last gate $g_s$ is equal to $f$, of course. Then it seems that he uses an induction on the steps of the computation. Let $I_k$ be the information that he gathers from the first $k$ steps. Technically $I_k$ tells us something about the computation so far. The punch line is then that $I_s$ tells us something impossible about $g_s$, which is of course $f$. Wonderful. This implies the claimed lower bound on $f$, which solves the question.
The trouble with this is the following—we studied this before and it is called the "bait and switch" problem. Let $r$ be some random function of polynomial Boolean complexity and let $g = f \oplus r$. Then assume that there is a polynomial-size circuit for $f$. Clearly there is one for $g$ and for $r$ too. Create a circuit that mixes the computing of $g$ and $r$ in some random order. Let the last step of the circuit take $g$ and $r$ and form $g \oplus r$. Note this computes $f$.
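A toy version in Python—our illustration of the issue, emphatically not Blum's construction—models the two sub-circuits as black-box tables and interleaves them:

```python
# Toy "bait and switch": fix a random function r and set g = f XOR r.
# A circuit that evaluates g and r (modeled here as black-box lookups) in a
# shuffled order carries no obvious information about f at any intermediate
# gate; only the single final XOR gate unmasks f.
import random
from itertools import product

n = 4
inputs = list(product((0, 1), repeat=n))
f_table = {x: int(all(x)) for x in inputs}                 # f = AND of the bits
r_table = {x: random.randint(0, 1) for x in inputs}        # a "random" mask r
g_table = {x: f_table[x] ^ r_table[x] for x in inputs}     # g = f XOR r

def mixed_circuit(x):
    # "gates" for g and for r, shuffled so neither has a distinguished place
    gates = [("g-subcircuit", g_table[x]), ("r-subcircuit", r_table[x])]
    random.shuffle(gates)
    for name, value in gates:
        print(f"  intermediate {name} outputs {value}  (says nothing obvious about f)")
    values = dict(gates)
    return values["g-subcircuit"] ^ values["r-subcircuit"]  # final gate reveals f

x = (1, 1, 1, 1)
print("f(x) =", f_table[x], "  mixed circuit output =", mixed_circuit(x))
```

Any information-gathering argument that walks the gates in order sees only the masked values until the very last step.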
The key point is this:
No step of the computation along the way has anything obvious to do with $f$. Only at the very last step does $f$ appear.
This means intuitively to us that an inductive argument that tries to compute information gate by gate is in trouble. How can the $I_k$'s that the proof computes have any information about $f$ during the induction? This is not a "flaw" but it does seem to be a serious issue.
If nothing else we need to understand how the information suddenly unravels at the end and reveals information about $f$. I think this issue is troubling—at least to us. It is important to note that this trick does not seem to apply to purely monotone computations, since the last step must be non-monotone—it must compute the $\oplus$ function. The old post also notes a relation between the standard circuit complexity of a function and the monotone complexity of a related function.
While we are grappling with the paper and writing these thoughts we are following an ongoing discussion on StackExchange and in comments to a post by Luca Trevisan, a post by John Baez, and a Hacker News thread, among several other places.
The paper has a relatively short "crunch" in its sections 5 and 6, pages 25–35. These follow a long section 4 describing and honing Berg and Ulfberg's work. What the latter did was show that a kind of circuit approximation obtained via small DNF formulas in Alexander Razborov's famous lower-bound papers (see also these notes by Tim Gowers) can also be obtained with small CNF formulas. What strikes us is that Blum's main theorem is literally a meta-theorem referencing this process:
Theorem 6: Let $f$ be any monotone Boolean function. Assume that there is a CNF-DNF-approximator $A$ which can be used to prove a lower bound for the monotone circuit complexity of $f$. Then $A$ can also be used to prove the same lower bound for the standard (non-monotone) circuit complexity of $f$.
The nub being discussed now is whether this theorem is "self-defeating" by its own generality. There may be monotone functions that meet the hypotheses but have polynomial non-monotone circuit complexity. The StackExchange thread is discussing this for functions of Boolean strings denoting graphs on $n$ nodes that give value $1$ whenever the graph is a $k$-clique (with no other edges) and $0$ when it is a complete $(k-1)$-partite graph. Such functions, related to the theta function of László Lovász (see also "Theorem 1" in this post for context), have polynomial complexity, meet the conditions of Razborov's method, and don't appear to obstruct Berg and Ulfberg's construction as used by Blum. But if they go through there, and if Blum's further constructions using an inductively defined function would go through transparently, then there must be an error.
Update 8/22/17: This objection and the inference of error have been verified. A Wikipedia page for Éva Tardos’s function has been created. It seems to us that another form of the monotone function besides hers using integer approximations is just if the Lovász number of the complement of (presented as an edge list) is the number of 0’s in
The details of the construction in section 5 have also been called into question. We are unsure what to say about a claim by Gustav Nordh that carrying out the inductive construction as written yields a false conclusion that a certain monomial is an implicant of an equivalent formula. There are also comments about unclarity of neighboring definitions, including this one from Shachar Lovett on Luca's blog since we drafted this section.
But this leads us to a larger point. Both of us are involved right now with painstaking constructions involving quantum circuits and products of permutations that we are programming (in Python). Pages 27–28 of Blum’s paper give a construction that can be programmed. If this is done enough to crank out some examples, then we may verify that potential flaws crop up or alternatively bolster confidence in junctures of the proof so as to focus on others first. This ability is one way we are now empowered to sharpen “fuzzy edges” of our science.
Is the proof correct? Or will it fall into eclipse? We will see shortly, no doubt. Comparing this table of eclipses since 2003 and Gerhard Woeginger's page of claimed proofs over mostly the same time period, we are struck that 'P=NP' and 'P≠NP' claims have been about twice as frequent as lunar and solar eclipses, respectively.
Update 8/18: This comment by user “vloodin”, whom we remember well from the discussion here of Vinay Deolalikar’s proof attempt seven years ago, lays out the apparent flaw in the paper in more detail.
Update 9/2: On 8/30, Blum posted to ArXiv a v2 that comprises the comment, “The proof is wrong. I shall elaborate precisely what the mistake is. For doing this, I need some time. I shall put the explanation on my homepage.” We have no further substantial information.
Update 10/18: Norbert Blum has posted a detailed but short explanation of the mistake.
[restored missing links; a few word and format changes, Uffberg->Ulfberg, updates 8/18 and 8/22]