When Raymond Smullyan was born, Emanuel Lasker was still the world chess champion. Indeed, of the 16 universally recognized champions, only the first, Wilhelm Steinitz, lived outside Smullyan’s lifetime. Smullyan passed away a week ago Monday at age 97.
Today, Dick and I wish to add some thoughts to the many comments and tributes about Smullyan.
He was known for many things, but his best-known contributions were books with titles like “What Is the Name of This Book?” Besides its obit in Sunday’s paper, the New York Times ran a sample of puzzles from these books. No doubt many enjoyed the books, and many may have been moved to study “real” logic and mathematics. His book To Mock a Mockingbird dressed up a serious introduction to combinatory logic. This logic is as powerful as the usual predicate calculus but has no quantified variables. Making it readable to non-experts is a tribute to Smullyan’s ability to express deep ideas so clearly.
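Combinatory logic is concrete enough to animate in a few lines of code. Here is a toy reduction engine (the tuple encoding and the use of the standard `S` and `K` names are my choices for illustration, not drawn from the book) showing how the identity combinator emerges as S K K:

```python
# A toy reducer for combinatory logic. Terms are left-associated pairs:
# (f, x) means "apply f to x"; 'S' and 'K' are the primitive combinators.

def step(term):
    """Apply one reduction: K x y -> x, or S x y z -> x z (y z)."""
    if isinstance(term, tuple):
        # Flatten the left spine to find the head combinator and its args.
        spine, t = [], term
        while isinstance(t, tuple):
            spine.append(t[1])
            t = t[0]
        spine.reverse()
        if t == 'K' and len(spine) >= 2:
            result = spine[0]
            for arg in spine[2:]:          # re-apply any leftover arguments
                result = (result, arg)
            return result
        if t == 'S' and len(spine) >= 3:
            x, y, z = spine[0], spine[1], spine[2]
            result = ((x, z), (y, z))
            for arg in spine[3:]:
                result = (result, arg)
            return result
        left = step(term[0])               # otherwise reduce inside subterms
        if left is not None:
            return (left, term[1])
        right = step(term[1])
        if right is not None:
            return (term[0], right)
    return None

def normalize(term, limit=100):
    for _ in range(limit):
        nxt = step(term)
        if nxt is None:
            return term
        term = nxt
    return term

# I = S K K behaves as identity: (S K K) a -> K a (K a) -> a
I = (('S', 'K'), 'K')
print(normalize((I, 'a')))  # -> 'a'
```

Everything computable can be expressed this way with no variables at all, which is the point Smullyan’s bird puzzles dramatize.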
Smullyan earned his PhD under the guidance of Alonzo Church in 1959 at age 40. When I (Ken) was an undergrad at Princeton, I remember thinking, “gee, that’s old.” Well there’s old and there’s old… As we noted in our profile of him 19 months ago, he was still writing books at a splendid pace at age 95, including a textbook on logic.
Neither of us met him, so we never experienced his tricks and riddles firsthand, but we had impressions on the serious side. Here are some of Dick’s, first.
Smullyan and Melvin Fitting, who was one of his nine PhD students, wrote a wonderful book on set theory: Set Theory and the Continuum Problem (Oxford Logic Guides), 1996.
The blurb at Amazon says:
Set Theory and the Continuum Problem is a novel introduction to set theory … Part I introduces set theory, including basic axioms, development of the natural number system, Zorn’s Lemma and other maximal principles. Part II proves the consistency of the continuum hypothesis and the axiom of choice, with material on collapsing mappings, model-theoretic results, and constructible sets. Part III presents a version of Cohen’s proofs of the independence of the continuum hypothesis and the axiom of choice. It also presents, for the first time in a textbook, the double induction and superinduction principles, and Cowen’s theorem.
Another blurb for the book says:
A lucid, elegant, and complete survey of set theory [in] three parts… Part One’s focus on axiomatic set theory features nine chapters that examine problems related to size comparisons between infinite sets, basics of class theory, and natural numbers. Additional topics include author Raymond Smullyan’s double induction principle, super induction, ordinal numbers, order isomorphism and transfinite recursion, and the axiom of foundation and cardinals. The six chapters of Part Two address Mostowski-Shepherdson mappings, reflection principles, constructible sets and constructibility, and the continuum hypothesis. The text concludes with a seven-chapter exploration of forcing and independence results.
Wait—are these the same book? Yes they are, and this is one way of saying the book is chock full of content while being self-contained. Neither blurb mentions the part that most grabbed me (Dick). This is their use of modal logic to explain forcing.
Modal logic has extra operators: $\Box p$, usually interpreted as “Necessarily $p$,” and $\Diamond p$, meaning “Possibly $p$.” Like the quantifiers $\forall$ and $\exists$, they obey the relation $\Diamond p \equiv \neg \Box \neg p$.
Saul Kripke codified models as directed graphs whose nodes each have an interpretation of the propositional variables. Then $\Box p$ holds at a node $w$ if all nodes reachable from $w$ satisfy $p$ (and hence $\Box p$), while $\Diamond p$ holds at $w$ if $p$ holds at some node reachable from $w$. The nodes are “possible worlds.”
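Kripke’s semantics is easy to animate in code. The sketch below (the two-world graph and valuation are invented purely for illustration) evaluates $\Box$ and $\Diamond$ over a directed graph and checks the duality stated above:

```python
# A minimal Kripke-model checker. Formulas are: an atom (a string),
# ('not', f), ('box', f) for "necessarily", or ('dia', f) for "possibly".

def holds(world, formula, edges, valuation):
    """Evaluate a modal formula at a possible world."""
    if isinstance(formula, str):                      # atomic proposition
        return formula in valuation[world]
    op = formula[0]
    if op == 'not':
        return not holds(world, formula[1], edges, valuation)
    if op == 'box':                                   # true in all successors
        return all(holds(v, formula[1], edges, valuation)
                   for v in edges.get(world, []))
    if op == 'dia':                                   # true in some successor
        return any(holds(v, formula[1], edges, valuation)
                   for v in edges.get(world, []))
    raise ValueError(op)

# Two worlds reachable from w; the atom p holds only at u.
edges = {'w': ['u', 'v']}
valuation = {'w': set(), 'u': {'p'}, 'v': set()}
print(holds('w', ('dia', 'p'), edges, valuation))  # True: some successor has p
print(holds('w', ('box', 'p'), edges, valuation))  # False: v lacks p
# The duality: Possibly-p is equivalent to not-Necessarily-not-p.
print(holds('w', ('dia', 'p'), edges, valuation) ==
      holds('w', ('not', ('box', ('not', 'p'))), edges, valuation))  # True
```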
What Fitting and Smullyan do is define a translation $\phi \mapsto \phi^*$ from set theory to their modal logic such that $\phi^*$ is valid if and only if $\phi$ is provable. Then the game is to build a node $w$ such that every Zermelo-Fraenkel axiom gets a $\Box$ at $w$ but the translated continuum hypothesis fails in some world reachable from $w$.
One reprinting of the book posed an inadvertent puzzle: many mathematical symbols were omitted. Symbols for set membership, subset, quantifiers, and so on were missing. As one online reviewer noted, “it really does make the book useless.” My copy at least was unaffected.
A week after our 7/28/15 Smullyan post mentioned above, I (Ken again) went with my family to Oregon for vacation. This included a trip to Powell’s Books in Portland, which may be the largest independent bookstore in the world. The math and science sections were larger and more eclectic than any Barnes &amp; Noble or college bookstore I’ve seen. There were several copies of the 1961 Princeton Annals paperback edition of Smullyan’s PhD thesis, Theory of Formal Systems, on sale for $20. I felt spurred to buy one and felt it could be useful because of Smullyan’s penchant for combinatorial concreteness.
Sure enough, section B of Chapter IV formulates the rudimentary relations crisply and clearly. Here are Smullyan’s words atop page 78 on their motivation:
As remarked in the Preface, our proof follows novel lines in that all appeal to the traditional number theory devices … in the past—e.g., prime factorization, congruences and the Chinese remainder theorem—are avoided. Thus Gödel’s program of establishing incompleteness, even of first-order theories involving plus and times as their sole arithmetical primitives, can, by the methods of this section, be carried out without appeal to number theory.
Simply said: Smullyan avoids all the complicated numerical machinery needed in the usual treatments and makes it—like magic—disappear. The main predicate needed by Smullyan is “$y$ is a power of $b$,” provided $b$ is prime. From that he defined a predicate meaning that the binary representation of $z$ is the concatenation of those of $x$ and $y$. The formal language is still that of logic and numbers but the operations are really manipulating strings. His predicates were able to fulfill all roles for which the class of primitive recursive relations and subclasses involving plus and times had previously been employed.
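For the base $b = 2$, the concatenation predicate can be phrased with nothing beyond the power-of-two predicate and ordinary arithmetic. This is my own illustrative rendering, not Smullyan’s exact formulation: the binary string of $z$ is that of $x$ followed by that of $y$ exactly when some power of two $p$ satisfies $y &lt; p \le 2y$ and $z = x \cdot p + y$.

```python
def is_power_of_2(p):
    # Smullyan's base predicate, specialized to the prime b = 2.
    return p >= 1 and p & (p - 1) == 0

def concat2(x, y, z):
    """Is the binary representation of z the concatenation of those of
    x and y?  Arithmetically: some power of two p has y < p <= 2y
    (i.e., p = 2^len(y)) and z = x*p + y."""
    for k in range(1, z.bit_length() + 1):
        p = 1 << k
        if is_power_of_2(p) and y < p <= 2 * y and z == x * p + y:
            return True
    return False

# 6 = 110, 5 = 101, and 53 = 110101 in binary.
print(concat2(6, 5, 53))   # True
print(concat2(5, 6, 53))   # False
```

The formulas talk about numbers, but as the text says, the real action is string manipulation.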
Smullyan was writing in 1959. Turing machine complexity had not even been defined yet. It transpired later that Smullyan’s class of rudimentary relations contains nondeterministic logspace and equals the alternating linear time hierarchy. Linear time by itself is annoyingly dependent on machine details, but once you have a couple of levels of quantifier alternation the class becomes very robust. Dick employed tricks with alternating linear time in some papers, and such alternation is used to amplify the time hierarchy theorem so that $\mathsf{DTIME}[n] = \mathsf{NTIME}[n]$ for multitape Turing machines leads to a contradiction higher up. It is also intriguing to see Smullyan write on page 81:
We do not know whether or not all constructive arithmetic attributes are rudimentary. Quine […] has shown that plus and times are first order definable from [concatenation] … but this leaves unanswered the question as to whether plus and times are themselves rudimentary.
The thesis has a footnote saying this had been done for plus, and times follows from remarks above, but whether the multiplication predicate is in deterministic linear time remains open. It is likely that Smullyan went through similar concrete thinking as Juris Hartmanis and Richard Stearns when they conjectured no. We wonder if anyone thought to ask Smullyan about this and wish we had.
Our condolences go to his family along with our appreciation for his writings.
Emanuel Lasker philosophized in his 1906 book Struggle about perfect strategists in any walk of life, calling them Macheeides after Greek for “battle species.” An improved edition of Google DeepMind’s AlphaGo probably joined their ranks by beating top human players 60-0 in games played via online servers, not counting one game disrupted by connection failure. The top ranked player, Ke Jie, lost 3 games and landed in the hospital, but desires a 4th try. Where will computer Macheeides strike next?
William Agnew is the chairperson of the Georgia Tech Theoretical Computer Science Club. He is, of course, an undergraduate at Tech with a multitude of interests—all related to computer science.
Today I want to report on a panel that we had the other night on the famous P vs. NP question.
The panel consisted of two local people, one semi-local person, and two remote people—the latter were virtually present thanks to Skype. The local people were Lance Fortnow and myself, and the semi-local one was Dylan McKay. He was present at the panel, and was an undergraduate at Tech not long ago. He is now a graduate student working with Ryan Williams, and both are moving from Stanford to MIT. The last was Scott Aaronson, who is not only an expert on P vs. NP but also on all things related to quantum computation.
An excellent panel, which I was honored to be part of. We had a large audience of students, who were no doubt there because of the quality of the panelists—although the sandwiches, drinks, and brownies may have had some effect. They listened and asked some really good questions—some of which we could even answer.
The panel, like many panels, was fun to be on; and hopefully it was informative to the audience. I believe the one point that all on the panel agreed with is: we do not know very much about the nature of computation, and there remain many, many interesting things to learn about algorithms. I like the way Ryan put it:
We are like cave men and women banging rocks together and trying to see what happens.
This is not an exact quote, but you get the point: we are in the dark about what computers can and cannot do.
I thought I would summarize the panel by listing just a few questions that were discussed.
Scott recently released a 121-page survey on P versus NP. He did not read all of it during the panel. In fact he did not read any of it. It is chock full of content—for instance, the story about the Traveling Salesman Problem and Extended Formulations is told in a long footnote. It was partly supported by NSSEFF, which is not a phonetic spelling of NSF but stands for the National Security Science and Engineering Faculty Fellowship, soon to be renamed for Vannevar Bush.
It takes the stand that $\mathsf{P} \neq \mathsf{NP}$. Over half of the non-bibliography pages are in section 6, titled “Progress.” This is great and completely up to date—not only through Ryan’s circuit lower bounds but also last year’s rebuff to the simplest attack in Ketan Mulmuley’s Geometric Complexity Theory paradigm. It details the three major barriers—relativization, “Natural Proofs,” and “Algebrization”—right in the context of where they impeded progress.
The climax in sections 6.4 and 6.6 is what Scott calls “ironic complexity” and Mulmuley calls the “flip”: the principle that to prove a problem X is harder to compute than we know, one may need to prove that another problem Y is easier to compute than we know. This gets dicey when the problems X and Y flow together. For instance, a natural proof that the discrete logarithm is nonuniformly hard to compute makes it nonuniformly easier to compute. Hence such a proof cannot give any more than a “half-exponential” lower bound (see this for the definition). Ryan’s result, which originally gave a “third-exponential” lower bound on $\mathsf{ACC}$ circuits for $\mathsf{NEXP}$, proves lower bounds on an exponential scaling of SAT via upper bounds on an $\mathsf{ACC}$ version of satisfiability; the two are brought a little closer by the former needing only succinct instances. Scott’s survey also emphasizes the fine line between “in-P” and “NP-hard” within cases of computational problems, arguing that if P=NP then we’d have found these lines fuzzed up long ago.
For my part—Ken writing this section—I’ve experienced a phenomenon that calls to mind our old post on “self-defeating sentences.” To evade the natural-proofs barriers, I’ve tried to base hardness predicates on problems that are hard for exponential space in terms of the number $n$ of variables in a target polynomial $f$. The idea is to prove that circuits computing $f$ need size $\Omega(g(f))$ where $g$ is a counting function that scales with the complexity of $f$, in analogy to the Baur-Strassen bounds where the count is the “geometric degree” of a variety associated to $f$.
The Baur-Strassen bound tops out at $\Omega(n \log d)$ when $f$ is a polynomial of degree $d$, and since the low-degree polynomials we care about have $d = n^{O(1)}$, this accounts for why the best known arithmetical circuit lower bounds for natural functions are only $\Omega(n \log n)$. But extending the Baur-Strassen mechanism to double-exponentially growing $g$ would yield the exponential lower bounds we strive for. Candidates for $g$ with names like “arithmetical degree” and (Castelnuovo-Mumford-)“regularity” abound, giving double-exponential growth and exponential-space hardness, but the latter sows the self-defeat: the hardness means there is a reduction to $g$ from length-$n$ instances of exponential-space problems, but the shortness of the instances can make the reduction fail. I’ve described a $g$ based on counting “minimal monomials” in an ideal associated to $f$, which although not necessarily complete still met the same defeat.
So maybe the constructive fact behind a problem’s NP-completeness also embodies a mirror image of a problem in P, so that we cannot easily tell them apart. NP-complete problems may “masquerade” as being in P—since the known ones are all isomorphic, if one does they all do. This may explain the success of SAT-solvers and suspicion about P=NP being independent as voiced during the panel. It also suggests that intermediate problems may bear attacking first.
At the conclusion of the panel Agnew, who moderated it skillfully, asked the question:
If you had to state what you believe “at gunpoint” what do you believe about P vs. NP?
He was holding a Nerf gun, but we all still seemed to take the threat seriously. Not surprisingly, all but one “fool” said that they believed that P is not equal to NP. The sole fool, me, said that he felt that P=NP. I have stated this and argued it many times before: see this for more details on why.
Of course P vs. NP remains open, and again as the panel all agreed—including the fool—we need new ideas to resolve it.
[deleted sentence about “not considering P=NP”]
Shinichi Mochizuki has claimed a proof of the famous ABC conjecture since 2012. It is still unclear whether or not the claimed proof is correct. We covered it then and have mentioned it a few times since, but have not delved in to check it. Anyway, it’s probably way above our ability to understand in any finite time.
Today I want to talk about how to check proofs like that of the ABC conjecture.
The issue is simple:
Someone writes up a paper that “proves” that X is true, where X is some hard open problem. How do we check that X is proved?
The proof in question is almost always long and complex. So the checking is not a simple matter. In some cases the proof might even use nonstandard methods that make it even harder to understand. That is exactly the case with Mochizuki’s proof—see here for some comments.
Let’s further assume that the claimed proof resolves X, which is the P vs. NP problem. What should we do? There are some possible answers:
Every once in a while he would get up and join our table to gossip or kibitz. Then he would add, “The bigger my proof, the smaller the hole. The longer and larger the proof, the smaller the hole.”
…[N]o one wants to be the guy that spent years working to understand a proof, only to find that it was not really a proof after all.
P Does Not Equal NP: A Proof Via Non-Linear Fourier Methods
Alice Azure with Bob Blue
Here the “with” signals that Alice is the main author and Bob was simply a helper. Recall a maxim sometimes credited to President Harry Truman: “It is amazing what you can accomplish if you do not care who gets the credit.”
What do you think about ways to check proofs? Any better ideas?
Impetus to study a new reducibility relation
Michael Wehar has just earned his PhD degree in near-record time in my department. He has posted the final version of his dissertation titled On the Complexity of Intersection Non-Emptiness Problems which he defended last month. The dissertation expands on his paper at ICALP 2014, joint paper at ICALP 2015 with Joseph Swernofsky, and joint paper at FoSSaCS 2016.
Today, Dick and I congratulate Mike on his accomplishment and wish him all the best in his upcoming plans, which center on his new job with CapSen Robotics near Pitt and CMU in Pittsburgh.
Mike’s thesis features a definition that arguably has been thought about by researchers in parameterized complexity but passed over as too particular. If I were classifying his definition like salsa or cheddar cheese, then it would rate as “extra sharp.” Too sharp, that is, for $O$-notation, thus putting it beyond our usual complexity theory diet. But the problem it addresses has also seemed beyond the reach of complexity theory while we can’t even get a clear handle on $\mathsf{P}$ versus $\mathsf{NP}$:
What are the relationships between the levels $\mathsf{DTIME}[n^k]$ versus $\mathsf{NTIME}[n^k]$ of deterministic versus nondeterministic time, and between levels $\mathsf{DSPACE}[k \log n]$ versus $\mathsf{NSPACE}[k \log n]$ of fairly-counted deterministic and nondeterministic space?
“Fairly counted” space means that we cast away the “linear space-compression” theorem by restricting Turing machines to be binary, that is, to use work alphabet $\{0,1\}$. The blank character can be read but not written. This doesn’t mean swearing off $O$-notation for space, but does mean employing it more carefully.
Questions about particular polynomial-time exponents and space-usage factors animate fine-grained complexity, which has seen a surge of interest recently. Consider this problem, which is central in Mike’s thesis: given $k$ DFAs with at most $n$ states each, is there a string accepted by all of them?
The “naive” Cartesian product algorithm works in time basically $O(n^k)$. The fine-grained question is, can one do better than naive? As we covered before, Mike’s early work showed that getting time $n^{o(k)}$ would have the sharp consequence $\mathsf{NL} \neq \mathsf{P}$, which improved the conclusion in a 2003 paper by Dick with George Karakostas and Anastasios Viglas.
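The naive product algorithm is short enough to sketch. This illustrative Python version explores the $k$-tuple state space by breadth-first search, touching at most $n^k$ product states, which matches the bound in the text:

```python
from collections import deque

# Intersection non-emptiness for k DFAs via the product construction.
# Each DFA is (transitions, start, accepting) with transitions keyed
# by (state, character) over the alphabet {'0', '1'}.

def intersection_nonempty(dfas, alphabet=('0', '1')):
    starts = tuple(start for _, start, _ in dfas)
    seen = {starts}
    queue = deque([starts])
    while queue:
        states = queue.popleft()
        # A common accepted string exists iff we reach an all-accepting tuple.
        if all(s in acc for s, (_, _, acc) in zip(states, dfas)):
            return True
        for ch in alphabet:
            nxt = tuple(delta[s, ch] for s, (delta, _, _) in zip(states, dfas))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# A: strings with an odd number of '1's.  B: strings of even length.
A = ({(0, '0'): 0, (0, '1'): 1, (1, '0'): 1, (1, '1'): 0}, 0, {1})
B = ({(0, '0'): 1, (0, '1'): 1, (1, '0'): 0, (1, '1'): 0}, 0, {0})
print(intersection_nonempty([A, B]))   # True: e.g. the string '10'
```

With $k$ automata of $n$ states each, `seen` can hold up to $n^k$ tuples; whether this exponent can be beaten is exactly the fine-grained question.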
Mike’s thesis expands the mechanism that produces this and related results, with the aim of a two-pronged attack on complexity bounds. He has framed it in terms of parameterized complexity, which we’ll discuss before returning to the thesis.
Consider three classic $\mathsf{NP}$-complete problems—Vertex Cover, Clique, and Dominating Set—each given an undirected graph $G$ and a number $k$:
Parameterized complexity started with the question:
What happens when some value of the parameter is fixed?
For each problem and each $k$, we can define the “slice” to be the language of strings—here denoting graphs $G$—such that the answer for $(G,k)$ is “yes.” Each individual slice belongs to $\mathsf{P}$: just try all $\binom{n}{k}$ choices of $k$ vertices, and when $k$ is fixed this gives polynomial time. The fine-grained question is:
Can we solve each slice faster? In particular, can we solve it in time $O(n^c)$ where the exponent $c$ is fixed independent of $k$?
The start of much combinatorial insight and fun is that the three problems appear to give three different answers:
The exact time for Vertex Cover may be model-dependent; for RAMs we get $c = 1$ and indeed time roughly $O(2^k + kn)$ is known. But the fact of the exponent being fixed holds in any reasonable model and classifies Vertex Cover as belonging to the class $\mathsf{FPT}$, for fixed-parameter tractable. To address whether Clique and Dominating Set belong to $\mathsf{FPT}$, a novel reducibility and completeness theory was built.
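The $2^k$ behavior for Vertex Cover comes from a classic bounded search tree, sketched below as a minimal illustration (real algorithms are considerably more refined): pick any uncovered edge, branch on which endpoint joins the cover, and recurse with budget $k-1$. The recursion has depth $k$ and two branches per level, so roughly $2^k \cdot \mathrm{poly}(n)$ time, with the exponent of $n$ fixed independent of $k$.

```python
# Bounded search tree for Vertex Cover: is there a cover of size <= k?

def vertex_cover(edges, k):
    if not edges:
        return True                 # nothing left to cover
    if k == 0:
        return False                # edges remain but budget is spent
    u, v = edges[0]
    # Every cover must contain u or v; try both branches.
    return (vertex_cover([e for e in edges if u not in e], k - 1) or
            vertex_cover([e for e in edges if v not in e], k - 1))

triangle = [(1, 2), (2, 3), (1, 3)]
print(vertex_cover(triangle, 1))    # False: one node cannot cover a triangle
print(vertex_cover(triangle, 2))    # True: e.g. {1, 2}
```

No such trick is known for Clique or Dominating Set, which is exactly what the completeness theory below addresses.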
To exemplify an FPT-reduction, consider the problem Short NTM Acceptance: given a nondeterministic Turing machine $M$ and a parameter $k$, does $M$ have an accepting computation of at most $k$ steps?
When $k$ is variable this is a generic $\mathsf{NP}$-complete problem, so of course we can reduce Clique to it, but consider the reduction done this way: Given $G$ and $k$, make an alphabet with a character for each node of $G$. Code $M$ with a table for the edges of $G$ to write down $\binom{k}{2}$ edges nondeterministically on its tape, then deterministically check whether only $k$ different nodes appear in these edges, accepting if so. The computation takes time $k'$ that is $O(k^2)$ and can be defined solely in terms of $k$. The latter fact makes it an FPT-reduction.
FPT-reductions work similarly to ordinary polynomial-time reductions. If our NTM problem belongs to $\mathsf{FPT}$ then so does Clique. There is a trickier FPT-reduction the other way, so the problems are FPT-equivalent. Both are complete for the first level, called $W[1]$, of the $W$-hierarchy. Dominating Set is hard for the second level, $W[2]$; Clique FPT-reduces to it, but not vice-versa unless $W[1] = W[2]$.
Just as all major $\mathsf{NP}$-complete problems are related by logspace, not just poly-time, computable reductions, parameterized complexity has a logspace-founded version. The class $\mathsf{XL}$ consists of parameterized problems whose slices individually belong to $\mathsf{L}$—that is, to deterministic logspace—and $\mathsf{XNL}$ and $\mathsf{XP}$ are defined similarly via the conditions that the slices belong to $\mathsf{NL}$ and to $\mathsf{P}$. Interestingly, the separation $\mathsf{FPT} \neq \mathsf{XP}$ is known by standard diagonalization means, but $\mathsf{XL}$ is not known to be contained in $\mathsf{FPT}$ for reasons similar to the $\mathsf{L}$ vs. $\mathsf{P}$ problem. A recent Master’s thesis by Jouke Witteveen has details on these classes.
All four of our problems belong to $\mathsf{XL}$. Furthermore, the slices belong to “almost exactly $k \log n$” space using binary TMs. The space is needed mainly to write down each candidate set of $k$ vertices and can be quantified as $k \log n + O(\log n)$. We could jump from this to define finer variations on $\mathsf{XL}$ and $\mathsf{XNL}$, but Mike, joined by Swernofsky, chose to refine the reductions instead. Now we are primed to see how their ideas might impact separations.
Our reduction from Clique to Short NTM Acceptance bumped $k$ up to $O(k^2)$. Blowups in $k$ are important to a central concept called kernelization which deserves more space—see this survey. They can be exponential—the $2^k$ algorithm for Vertex Cover hints at how—or even worse. Hence people have refined parameterized reductions according to whether they blow up $k$ polynomially, exponentially, or worse.
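Kernelization can be illustrated with Buss’s classic kernel for Vertex Cover, sketched here as a minimal illustration: any vertex of degree greater than $k$ must be in every size-$k$ cover, and after taking all such vertices a “yes” instance has at most $k^2$ edges, so the instance shrinks to a size depending only on $k$.

```python
# Buss's kernelization for Vertex Cover.  Returns a reduced instance
# (edges', k') of size depending only on k, or None if the answer is "no".

def buss_kernel(edges, k):
    edges = set(edges)
    while True:
        degree = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        high = [u for u, d in degree.items() if d > k]
        if not high:
            break
        # A vertex of degree > k is forced: k other vertices can't
        # cover all of its incident edges.
        u = high[0]
        edges = {e for e in edges if u not in e}
        k -= 1
        if k < 0:
            return None
    if len(edges) > k * k:          # too many edges for any size-k cover
        return None
    return edges, k

star = [(0, i) for i in range(1, 6)]   # center 0 has degree 5
print(buss_kernel(star, 2))            # (set(), 1): vertex 0 is forced
```

The interesting measure is how the kernel size and the parameter blow up under reductions, which is the refinement the text describes.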
But before now, we haven’t noted in this line a strong motivation to limit the blowup even further, such as to $O(k)$ or $k + c$ for some fixed constant $c$ (however, see this).
Our reduction also bumped up the alphabet size. This appears necessary because for binary NTMs the problem is in $\mathsf{FPT}$: we can traverse the $2^{O(k)}$ possible guesses made on the tape, and for each one, solve an instance of graph-reachability.
So what can we do if we insist on binary TMs? Mike and Joseph fastened on to the idea of making the $k$ in $k \log n$ space become the parameter. We can define “Log NTM Acceptance” to be like “Short NTM Acceptance” except that the NTM is allowed $k \log n$ space (where $n$ is the size of $M$ in states). We get a problem in $\mathsf{XNL}$ that is in fact complete for $\mathsf{XNL}$ under parameterized logspace reductions. Likewise “Log DTM Acceptance” is the case where $M$ is deterministic, which is complete for $\mathsf{XL}$. Then we define “Medium DTM Acceptance” where the question asks about acceptance within $n^k$ time steps regardless of space. Mike’s thesis also covers “Long DTM Acceptance” where the time is $2^{n^k}$.
The $k \log n$ space allows a binary TM to write down the labels of $k$ nodes in a size-$n$ structure at a time. In that way it subsumes the use of $k$ in the above three graph problems but is more fluid—the space could be used for anything. Keying on $k \log n$ drives the motivation for Mike’s definition of LBL (for “level-by-level”) reductions, whose uniform logspace version is as follows:
Definition 1 A parameterized problem $A$ LBL-reduces to a parameterized problem $B$ if there is a function $f$ computable in logspace such that for all $x$ and $k$, $x \in A_k \iff f(x,k) \in B_{k'}$ with $k' \leq ck$, where $c$ is fixed independent of $k$.
That is, in the general FPT-reduction form $k' = g(k)$, we insist on $g(k) = O(k)$ exactly. This reduction notion turns out to be the neatest way to express how Mike’s thesis refines and extends previous work, even his own.
With our Chair, Chunming Qiao, and his UB CSE Graduate Leadership Award. |
The first reason to care about the sharper reduction is the following theorem, which really summarizes known diagonalization facts. The subscript $2$ reminds us that the TMs have a binary work alphabet; it is not necessary on the time classes since the alphabet does not affect the exponent.
Theorem 2
- If a parameterized problem $A$ is LBL-hard for Log DTM Acceptance, then there is a constant $c$ such that for all $k$, $A_k \notin \mathsf{DSPACE}_2[(k-c)\log n]$.
- If a parameterized problem $A$ is LBL-hard for Log NTM Acceptance, then there is a constant $c$ such that for all $k$, $A_k \notin \mathsf{NSPACE}_2[(k-c)\log n]$.
- If $A$ is LBL-hard for Medium DTM Acceptance, then there is a constant $c$ such that for all $k$, $A_k \notin \mathsf{DTIME}[n^{k-c}]$.
Moreover, if there is a constant $d$ such that each $A_k$ belongs to $\mathsf{DSPACE}_2[dk\log n]$, $\mathsf{NSPACE}_2[dk\log n]$, or $\mathsf{DTIME}[n^{dk}]$, respectively, then these become LBL-equivalences.
The power, however, comes from populating these hardnesses and equivalences with natural problems. This is where the intersection non-emptiness problem—as a parameterized family of problems asking about the intersection of the languages of $k$-many $n$-state DFAs—comes in. We can augment them by adding one pushdown store to just one of the automata, calling them $D_1, \dots, D_{k-1}, P$ where $P$ is the PDA. Then call the problem of whether
$L(D_1) \cap \cdots \cap L(D_{k-1}) \cap L(P) \neq \emptyset$
by the name PDA Intersection Non-Emptiness. It curiously turns out not to matter whether $P$ or the $D_i$ are nondeterministic for the following results:
Theorem 3
- DFA Intersection Non-Emptiness is LBL-equivalent to Log NTM Acceptance. Hence there is a $c$ such that for each $k$, the $k$-DFA problem is not in $\mathsf{NSPACE}_2[(k-c)\log n]$.
- PDA Intersection Non-Emptiness is LBL-equivalent to Medium DTM Acceptance. Hence there is a $c$ such that for each $k$, the problem with $k-1$ DFAs and one PDA is not in $\mathsf{DTIME}[n^{k-c}]$.
Between the ICALP 2015 paper and writing the thesis, Mike found that the consequence of (a) had been obtained in a 1985 paper by Takumi Kasai and Shigeki Iwata. Mike’s thesis has further results establishing a whole spectrum of variants that correspond to other major complexity classes and provide similar lower bounds on individual levels within them. The question becomes how to leverage the greater level of detail to attack lower bound problems.
The way the attack on complexity questions is two-pronged is shown by considering the following pair of results.
Theorem 4 $\mathsf{NL} = \mathsf{P}$ if and only if DFA Intersection Non-Emptiness and PDA Intersection Non-Emptiness are LBL-equivalent.
Theorem 5 If for every $k$ there is a $c_k = o(k)$ such that the $k$-DFA intersection problem belongs to $\mathsf{DTIME}[n^{c_k}]$, then $\mathsf{NL} \neq \mathsf{P}$.
It has been known going back to Steve Cook in 1971 that logspace machines with one auxiliary pushdown, whether deterministic or nondeterministic, capture exactly $\mathsf{P}$. Since pushdowns are restricted and concrete, this raised early hope of separating $\mathsf{L}$ from $\mathsf{P}$—intuitively by “demoting” some aspect of a problem down to $\mathsf{DCFL}$ or $\mathsf{CFL}$, which are known to be different from $\mathsf{P}$. There was also hope of separating $\mathsf{NL}$ and $\mathsf{P}$, or at least finding more-critical conditions that pertain to their being equal.
What Mike’s theorem does is shift the combinatorial issue to what happens when the pushdown is added to a collection of DFAs. Unlike the case with an auxiliary pushdown added to a space-bounded TM, there is a more-concrete sense in which the relative influence of the pushdown might “attenuate” as $k$ increases. Can this be leveraged for a finer analysis that unlocks some secrets of lower bounds?
At the very least, Mike’s LBL notion provides a succinct way to frame finer questions. For example, how does $\mathsf{L}$ versus $\mathsf{P}$ hang on the relation between individual levels of $\mathsf{DSPACE}[k \log n]$ and levels of $\mathsf{DTIME}[n^k]$? The simplest way to express this, using Mike’s notation for Medium DTM Acceptance and for the analogous parameterized problem for space-bounded acceptance, seems to be:
Theorem 6 $\mathsf{L} = \mathsf{P}$ if and only if Log DTM Acceptance and Medium DTM Acceptance are LBL-equivalent.
Mike’s thesis itself is succinct, at 73 pages, yet contains a wealth of other variants for tree-shaped automata and other restricted models and connections to forms of the Exponential Time Hypothesis.
How can one appraise the benefit of expressing “polynomial” and “logspace” complexity questions consciously in terms of individual $n^k$ and $k \log n$ levels? Does the LBL reduction notion make this more attractive?
It has been a pleasure supervising Mike and seeing him blaze his way in the larger community, and Dick and I wish him all the best in his upcoming endeavors.
[fixed first part of Theorem 2, added link to Cygan et al. 2011 paper, added award photo]
Prasad Tetali and Robin Thomas are mathematicians at Georgia Tech who are organizing the Conference Celebrating the 25th Anniversary of the ACO Program. ACO stands for our multidisciplinary program in Algorithms, Combinatorics and Optimization. The conference is planned to be held this Monday through Wednesday, January 9–11, 2017.
Today I say “planned” because there is some chance that Mother Nature could mess up our plans.
Atlanta is expected to get a “major” snow storm this weekend. Tech was already closed this Friday. It could be that we will still be closed Monday. The storm is expected to drop 1-6 inches of snow and ice. That is not so much for cities like Buffalo in the north, but for us in Atlanta that is really a major issue. Ken once flew here to attend an AMS-sponsored workshop and play chess but the tournament was canceled by the snowfall described here. So we hope that the planned celebration really happens on time.
Attendance is free, so check here for how to register.
The program has a wide array of speakers. There are 25 talks in all including two by László Babai. I apologize for not listing every one. I’ve chosen to highlight the following for a variety of “random” reasons.
László Babai
Graph Isomorphism: The Emergence of the Johnson Graphs
Abstract: One of the fundamental computational problems in the complexity class NP on Karp’s 1973 list, the Graph Isomorphism problem asks to decide whether or not two given graphs are isomorphic. While program packages exist that solve this problem remarkably efficiently in practice (McKay, Piperno, and others), for complexity theorists the problem has been notorious for its unresolved asymptotic worst-case complexity.
In this talk we outline a key combinatorial ingredient of the speaker’s recent algorithm for the problem. A divide-and-conquer approach requires efficient canonical partitioning of graphs and higher-order relational structures. We shall indicate why Johnson graphs are the sole obstructions to this approach. This talk will be purely combinatorial, no familiarity with group theory will be required.
This talk is the keynote of the conference. Hopefully Babai will update us all on the state of this graph isomorphism result. We have discussed here his partial retraction. I am quite interested in seeing what he has to say about the role of Johnson graphs. These were discovered by Selmer Johnson. They are highly special: they are regular, vertex-transitive, distance-transitive, and Hamilton-connected. I find it very interesting that such special graphs seem to be the obstacle to progress on the isomorphism problem.
Petr Hliněný
A Short Proof of Euler-Poincaré Formula
Abstract: We provide a short self-contained inductive proof of famous Euler-Poincaré Formula for the numbers of faces of a convex polytope in every dimension. Our proof is elementary and it does not use shellability of polytopes.
The paper for this talk is remarkably short, only 3 pages. Of course the result has been around since the 1700s and 1800s, and David Eppstein already has a list of 20 proofs of it, so what is the point? It has to do with ways of proving things and the kind of dialogue we can have with ourselves and/or others about what is needed and what won’t work. Imre Lakatos famously codified this process, with this theorem as a running example conjuring up the so-called Lakatosian Monsters. Perhaps the talk will slay the monsters, but it will have to brave some snow and ice first.
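For reference, the formula in question states that the face numbers $f_i$ of a convex $d$-dimensional polytope, where $f_i$ counts the faces of dimension $i$, satisfy

```latex
\sum_{i=0}^{d-1} (-1)^i f_i \;=\; 1 - (-1)^d .
```

For $d = 3$ this reads $f_0 - f_1 + f_2 = 2$, the familiar $V - E + F = 2$ for polyhedra that Euler observed and that Lakatos’s dialogue dissects.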
Luke Postle
On the List Coloring Version of Reed’s Conjecture
Abstract: In 1998, Reed conjectured that chromatic number of a graph is at most halfway between its trivial lower bound, the clique number, and its trivial upper bound, the maximum degree plus one. Reed also proved that the chromatic number is at most some convex combination of the two bounds. In 2012, King and Reed gave a short proof of this fact. Last year, Bonamy, Perrett and I proved that a fraction of 1/26 away from the upper bound holds for large enough maximum degree. In this talk, we show using new techniques that the list-coloring versions of these results hold, namely that there is such a convex combination for which the statement holds for the list chromatic number. Furthermore, we show that for large enough maximum degree, a fraction of 1/13 suffices for the list chromatic number, improving also on the bound for ordinary chromatic number. This is joint work with Michelle Delcourt.
Mohit Singh
Nash Social Welfare, Permanents and Inequalities on Stable Polynomials
Abstract: Given a collection of items and agents, Nash social welfare problem aims to find a fair assignment of these items to agents. The Nash social welfare objective is to maximize the geometric mean of the valuation of the agents in the assignment. In this talk, we will give a new mathematical programming relaxation for the problem and give an approximation algorithm based on a simple randomized algorithm. To analyze the algorithm, we find new connections of the Nash social welfare problem to the problem of computation of permanent of a matrix. A crucial ingredient in this connection will be new inequalities on stable polynomials that generalize the work of Gurvits. Joint work with Nima Anari, Shayan Oveis-Gharan and Amin Saberi.
There are two. One is, will we be snowed in or snowed out this Monday? The other is, can some of the open problems raised by these talks be solved?
Even after today’s retraction of quasi-polynomial time for graph isomorphism
Cropped from source |
László Babai is famous for many things, and has made many seminal contributions to complexity theory. Last year he claimed that Graph Isomorphism (GI) is in quasi-polynomial time.
Today Laci posted a retraction of this claim, conceding that the proof has a flaw in the timing analysis, and Ken and I want to make a comment on what is up. Update 1/10: He has posted a 1/9 update reinstating the claim of quasi-polynomial time with a revised algorithm. As we’ve noted, he is currently speaking at Georgia Tech, and we hope to have more information soon.
Laci credits Harald Helfgott with finding the bug after “spending months studying the paper in full detail.” Helfgott’s effort and those by some others have also confirmed the mechanism of Laci’s algorithm and the group-theoretic analysis involved. Only the runtime analysis was wrong.
Helfgott is a number theorist whose 2003 thesis at Princeton was supervised by Henryk Iwaniec with input by Peter Sarnak. Two years ago we discussed his claimed proof of the Weak Goldbach Conjecture, which is now widely accepted.
In December 2015, Laci posted to ArXiv an 89-page paper whose title claimed that GI can be solved in quasi-polynomial time. Recall that means that the algorithm runs in time $2^{O((\log n)^c)}$ for some constant $c$. This is an important time bound that is above polynomial time, but seems to be the right time bound for many problems. For example, group isomorphism has long been known to be in quasi-polynomial time. But the case of graphs is much more complex, and this was the reason that Babai’s claimed result was so exciting. We covered it here and here plus a followup about string isomorphism problems that were employed.
He also chose to give a series of talks on his result. Some details of the talks were reported by Jeremy Kun here.
Retracting a claim is one of the hardest things that any researcher can do. It is especially hard to say when to stop looking for a quick-fix and make an announcement. It may not help Laci feel any better, but we note that Andrew Wiles’s original proof of Fermat’s Last Theorem was also incorrect and took 15 months to fix. With help from Richard Taylor he repaired his proof and all is well. We wish Laci the same outcome—and we hope it takes less time.
In particular, his algorithm still runs faster than $\exp(n^{\epsilon})$ for any fixed $\epsilon > 0$ you care to name. For comparison, for more than three decades before this paper, the best worst-case time bound was essentially $\exp(O(\sqrt{n \log n}))$, due to Eugene Luks in 1983. The new bound in full is

$$\exp\big(\exp(c\sqrt{\log n})\big)$$

for some fixed $c$ that will emerge in the revised proof.
The important term is the $e^{c\sqrt{\log n}}$. The function $e^{c\sqrt{x}}$ is exponential in $\sqrt{x}$. We previously encountered recursions involving $\sqrt{\log n}$ in the running time of space-conserving algorithms for undirected graph connectivity (see this paper) before Omer Reingold broke through by getting the space down to $O(\log n)$ and (hence) the time down to polynomial. So there is some precedent for improving it.
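To see where the amended bound sits, here is a rough numerical sketch (our illustration, with all constants suppressed, so only the eventual ordering matters) comparing the natural logarithms of the four running-time bounds discussed above, as functions of $L = \ln n$:

```python
from math import exp, sqrt

# Natural log of the running time as a function of L = ln n,
# with constants suppressed, for the bounds discussed above.
def log_quasipoly(L):    return L * L             # quasi-polynomial: exp((log n)^2)
def log_amended(L):      return exp(sqrt(L))      # amended GI: exp(exp(sqrt(log n)))
def log_luks(L):         return sqrt(exp(L) * L)  # Luks 1983: exp(sqrt(n log n))
def log_exponential(L):  return exp(L)            # exponential: exp(n)

# Once n is large enough (here ln n = 400, i.e. n around 10^173),
# the four bounds separate in the expected order: quasi-polynomial
# is smallest, and the amended GI bound sits strictly between it
# and Luks's bound.
L = 400.0
assert log_quasipoly(L) < log_amended(L) < log_luks(L) < log_exponential(L)
```

Notably, the crossover between the first two is far out: for moderate $L$ the quantity $e^{\sqrt{L}}$ is still below $L^2$, which is one reason such bounds are hard to compare at a glance.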
As things stand, however, GI remains in the “extended neighborhood” of exponential time. Here is how to define that concept: Consider numerical functions given by formulas built using the operations of addition and multiplication and exponentiation. Assign each formula a level by the following rules:
Note that if a function $f$ has a given level then so does the power $f^k$ for any fixed $k$. The functions of the lowest level include not only all the polynomials but also all quasi-polynomial functions, plus some functions that are higher than quasi-polynomial.
The amended bound on GI, however, belongs to a higher level, which is what we mean by its staying in the extended neighborhood of exponential time. This is the limit on regarding the amended algorithm as “sub-exponential.”
It also makes us wonder why it is so difficult to find natural problems with intermediate running times. We can define this notion by expanding the notion of “level” with a new rule for functions that are sufficiently well behaved:
Rule 5 subsumes rules 3 and 4 given that has level and has level . A special case is that when and has level , then has level .
We wonder when and where rule 5 might break down, but we note that careful application of rule 2 for multiplication when expanding a power makes it survive the fact that $x$, $x^2$, $x^3$, and so on all have the same level. It enables defining functions of intermediate levels $\ell$ where $1 < \ell < 2$.
Can the GI algorithm be improved to a lower, intermediate level?
We note one prominent instance of level in lower bounds: Alexander Razborov and Steven Rudich proved unconditionally in their famous “Natural Proofs” paper that no natural proof can show a level higher than for the discrete logarithm problem.
The obvious open problems are dual. Is the amended result fully correct? And can the original quasi-polynomial time be restored in the near future, or at least some intermediate level achieved? We hope so.
[fixed discussion of terms related to , added to the intro an update about the claim being reinstated]
AIP source—see also interview |
Robert Marshak was on hand for Trinity, which was the first detonation of a nuclear weapon, ever. The test occurred at 5:29 am on July 16, 1945, as part of the Manhattan Project. Marshak was the son of parents who fled pogroms in Byelorussia. Witnessing the test, hearing the destruction of Hiroshima and Nagasaki, and knowing his family history led him to become active in advancing peace. He soon co-founded and chaired the Federation of Atomic Scientists and was active in several other organizations promoting scientific co-operation as a vehicle of world peace. In 1992 he won the inaugural award of the American Association for the Advancement of Science for Science Diplomacy.
Today, the fifth day of both Chanukah and Christmas, we reflect on the gift of international scientific community.
International scientific co-operation is a theme of the movie Arrival and the story on which it is based. However, the key plot turn is a personal contact. A new example of the former is a vaccine for the Ebola virus. This item ends with words by Swati Gupta of the Merck pharmaceutical company:
“There’s been a lot of international partners that have come together in a real unprecedented effort.” The magnitude of the outbreak in West Africa, she says, made companies, governments and academic institutions push aside their own research agendas to come together and finish a vaccine.
There are countless other gifts from the former to be thankful for. We however will sing the latter, the personal side, while highlighting the role of shared experience and values in fostering research.
Marshak and his first student, George Sudarshan, worked out the “$V{-}A$” (vector minus axial vector) structure needed to describe certain fermion interactions. Recall a fermion is a particle that obeys the Pauli exclusion principle. They published in the proceedings of a 1957 conference in Italy, whereas its second discoverers, Richard Feynman and Murray Gell-Mann, published in a major journal. This bio of Marshak speculates that uncertainty about priority warded off a Nobel Prize, but one can also point to the theory’s incompleteness in describing the weak nuclear force. First, it allowed models that conserve CP. The surprising discovery in 1964 that nature does not conserve CP won a Nobel Prize for James Cronin and Val Fitch. Second, its framework could not adapt to introduce a carrying particle for the weak force, obstructing the renormalization procedure by which predictions at high energies can be calculated. Still, the concept is a standard building block and remains consistent.
Marshak became chair of the University of Rochester physics department, where he had started before the war. The same bio credits him with elevating UR to the level of other top-10 physics departments but being unable to land the same caliber of students. Hence he reached out especially to the brightest students of India, Pakistan, and Japan. Sudarshan hailed from Kerala, India, and verged on a Nobel later as well.
Marshak was among “approximately six” US scientists who visited the Soviet Union after the death of Josef Stalin in 1953 made contact possible, and he made several return visits in the 1950s. In the 1960s many Rochester colleagues induced him to lead the faculty senate against conservative policies of the university administration. He then became president of CCNY, stirring together tuition-free and open-admission policies that led to explosive growth and demographic change.
His final years as a professor at Virginia Tech from 1979 were no less active: as president of the American Physical Society he channeled scientific debate about the feasibility of Ronald Reagan’s SDI, and he brokered an exchange agreement with the Chinese Academy of Sciences. He died tragically in a swimming accident on December 23, 1992, a day after completing the final corrections of his textbook Conceptual Foundations of Modern Particle Physics.
Sudarshan edited a posthumous book of essays in tribute to Marshak. Its title, A Gift of Prophecy, references both the many correct guesses about foundations of physics that Marshak made and the fruition of international science as he had envisioned it. Its publisher’s blurb begins:
Marshak devoted much of his life to helping other people carry out scientific research and gather to discuss their work.
From having attended many international conferences and workshops, Dick and I know that much else goes on besides “discussing our work.” We will discuss our families, our home towns, our local academic circumstances. We talk about culture and politics, usually treating culture as common and politics as comparative. We even talk about sports. There is time for discussing specific research problems, but conference excursions more often lend themselves to discussing big and general scientific questions. We have and give individual opinions, yes, but what emerges is the realization that we share a common frame of reference.
To say the shared frame is Rationality—versus whatever—would be facile. To me the frame is distinguished most by the absence of negotiations compared to some other kinds of international contacts. Negotiations at their best are non-zero-sum, but at our best the thought of zero-sum never arises. Instead we are all builders, not only of our field but of common understandings.
Within our departments there are negotiations over resources, but they are between subfields, not polities, and our students are shielded from them as much as possible. For instance, nothing is vested in whether the Buffalo CSE graduate student association is led by an American or Chinese or Indian or Iranian (or other) student, and nobody cares, because we have built shared experience and know the common work to do. The similarity of academic life in many locales helps us see humanity first and region afterward. Having advanced to the level of international contact makes us de-facto leaders in our fields, and of course we should pursue international initiatives when opportune, but we submit also this thesis:
A robust international “go-alongishness” may prove more enduring and valuable than any one initiative.
I have also been happy to interact some with students in departments abroad, most recently while teaching a short course at the University of Calcutta last August. I was struck by the similarity of the basic outlook also when speaking at a one-day workshop in Pune, India. Is our communality robust enough to stand up to changing political winds? We fervently hope this for the years ahead, as it was Marshak’s hope.
What are your views on the value of community? For one axis of value, basically no conferences still hold face-to-face program committee meetings, since using the Internet, e-mail, and spreadsheets is so convenient and cost-effective as to outweigh the sometimes-remarked loss of deliberation. But is there any move toward promoting remote participation in the conferences themselves, and what more would be lost in doing so?
We are thankful for our many friends in our community and wish all of you the best in the coming New Year.
[Added 1/18/17: This was my 100th or 99th singly-authored post on this blog, depending on which WordPress count one believes…]
Anatoly Karatsuba and Sergei Voronin wrote a book on Bernhard Riemann’s zeta function. The book was translated into English by Neal Koblitz in 1992. Among its special content is expanded treatment of an amazing universality theorem about the classic zeta function proved by Voronin in 1975. We covered it four years ago.
Today Ken and I take a second look and explore a possible connection to complexity theory.
Although the theorem has been known for over forty years there is more to say about it. Voronin extended it to other Dirichlet $L$-functions in ways improved by Bhaskar Bagchi, who also proved universality for the derivative of zeta. Within the past fifteen years, Ramūnas Garunkštis wrote two papers improving the effectiveness of the bounds in the theorem, and Jörn Steuding wrote two papers bounding the density of the real numbers $\tau$ for which the shifted zeta function approximates a given complex function $f$. Both were part of a 2010 followup paper. Steuding also wrote a wonderful set of notes on the theorem. A 2011 paper by Chris King with beautiful graphic images expands a short 1994 preprint by Max See Chin Woon on how $\zeta$ encodes universal information, an aspect highlighted in our earlier post and on Matthew Watkins’s nice page on Voronin’s theorem.
I have known about the theorem for much of this time, but just recently thought there might be some natural way to exploit it to get computational complexity information about the zeta function. Is it surprising to think of such a link? Karatsuba himself emblemizes it, because he devised the first algorithm for multiplying two $n$-digit integers that beats the grade-school method. The study of $\zeta$ is so rich that a wider translation of complexity aspects could have profound impact.
Recall the famous zeta function is defined for complex $s$ with real part $\Re(s) > 1$ by the summation:

$$\zeta(s) \;=\; \sum_{n=1}^{\infty} \frac{1}{n^{s}}.$$

And the zeta is extended to all complex numbers, except at $s = 1$ where it has a pole, via the Riemann functional equation.
The zeta function holds the secrets to the global behavior of the primes. Even the seemingly weak statement that it does not vanish at any complex $s$ with $\Re(s) = 1$ is enough to prove the Prime Number Theorem. This theorem says that the number of primes less than $x$ is approximately equal to $x/\ln x$.
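The approximation is easy to see numerically; here is a quick sieve (our illustration) comparing the prime-counting function $\pi(x)$ with $x/\ln x$:

```python
from math import log

def prime_count(x):
    # pi(x) via a simple sieve of Eratosthenes.
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b'\x00\x00'
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return sum(sieve)

x = 10 ** 6
pi_x = prime_count(x)   # the true count of primes below a million
approx = x / log(x)     # the Prime Number Theorem's estimate
# Already within about 8% at x = 10^6; the ratio tends to 1 as x grows.
assert abs(pi_x - approx) / pi_x < 0.1
```

(The logarithmic integral gives a much closer estimate, but $x/\ln x$ is the statement in the theorem as quoted above.)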
The critical section of the zeta function is the strip of $s$ with real part between $0$ and $1$. The critical line is the middle line, where $\Re(s) = \tfrac{1}{2}$.
The famous Riemann Hypothesis states that all complex zeroes of the zeta function lie on the critical line. This problem remains wide open, although proofs have been claimed many times. Even proving that there is some $c < 1$ such that no zero has real part in the interval $[c, 1)$ seems hopeless. But we can all hope for surprises. Perhaps 2017 will be the magical year for a breakthrough, since it is prime and the only prime between 2011 and 2027.
Let’s turn to a formal statement of Voronin’s Theorem (VT):
Theorem 1 Let $0 < r < \tfrac14$ and let $f$ be a function that is continuous and non-vanishing on the closed disc $|z| \le r$ and analytic in its interior. Then for every $\epsilon > 0$ there exists $T$ such that

$$\max_{|z| \le r}\ \Big|\zeta\big(z + \tfrac34 + iT\big) - f(z)\Big| \;<\; \epsilon.$$
Voronin’s book with Karatsuba starts by proving this with the natural logarithm $\log\zeta$ in place of $\zeta$, taking the branch of $\log$ that is real for real $s > 1$. Then the target function $f$ is allowed to have zeroes on the disc. Antanas Laurinčikas has extended universality to some other functions besides $\zeta$.
Really Voronin proved even more: for every $\epsilon > 0$,

$$\liminf_{T \to \infty}\ \frac{1}{T}\,\mu\Big\{\tau \in [0,T] \;:\; \max_{|z| \le r}\ \big|\zeta\big(z + \tfrac34 + i\tau\big) - f(z)\big| < \epsilon\Big\} \;>\; 0,$$

where $\mu$ is Lebesgue measure, with a similar statement for $\log\zeta$.
§
If the above statements seem a bit technical, let’s try a more informal notion of what VT says. Let $x$ be some infinite binary string:

$$x = x_1 x_2 x_3 \cdots$$

We might agree to call it “universal” provided that for any finite string $w$ there is a $t$ so that

$$x_t x_{t+1} \cdots x_{t+|w|-1} = w.$$

Clearly such universal strings easily exist. Even stronger, we could insist that there not only be one $t$ that makes the above true, but that there are an infinite number of $t$’s. Even stronger still, we could insist that the set of such $t$ has positive lower density. That is, there is some $\delta > 0$ such that for all sufficiently large $T$, the number of $t$ in the set that occur in the interval $[1, T]$ is at least $\delta T$.
What VT says is essentially a modification of this simple fact about strings. The infinite string becomes the values of the zeta function of the form $\zeta(\tfrac34 + i\tau)$ where $\tau$ runs over real numbers. The finite string $w$ is replaced by a smooth function $f$ defined on a fixed-size disc. And exact equality becomes approximation up to some $\epsilon$.
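For a concrete universal string, one can concatenate the binary expansions of $1, 2, 3, \ldots$ (a binary Champernowne-style string); a quick check (our illustration) shows a given pattern occurring at many positions $t$:

```python
def champernowne_bits(n_blocks):
    # Concatenate the binary expansions of 1, 2, 3, ... — a simple
    # explicitly universal string: every finite pattern appears in it,
    # since the pattern itself is the binary expansion of some number.
    return ''.join(format(k, 'b') for k in range(1, n_blocks + 1))

s = champernowne_bits(5000)
pattern = '1011001'     # the binary expansion of 89
# The pattern occurs, and occurs at many starting positions t —
# inside 89's own block and inside every larger number containing it.
occurrences = [t for t in range(len(s) - len(pattern))
               if s.startswith(pattern, t)]
assert len(occurrences) > 10
```

One can push this further and check that the occurrences have positive density along the string, matching the stronger property in the text.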
What is exciting is not that this is possible but that it happens for a natural and very important particular function: the zeta function. Although universality was long known as an existential phenomenon of power series in analysis, Steuding’s notes remark (emphasis ours):
The Riemann zeta-function and its relatives are so far the only known explicit examples of universal objects.
One easy application of universality is that with positive density there are $\tau$ giving discs in the critical strip on which $\zeta$ stays within any $\epsilon$ of a constant. This is simply because one can take $f$ to be a constant function on the disc: $f \equiv c$ for any fixed constant $c \neq 0$.

Well, just not $c = 0$: if $\zeta$ could approximate $0$ arbitrarily closely in a disc centered on the line with real part $\tfrac34$, then by analyticity the Riemann Hypothesis would be false. One can, however, use a tiny nonzero constant to stand for $0$ when $\epsilon$ is given, or handle $0$ directly using $\log\zeta$ in the strip. Perhaps what follows is better understood using $\log\zeta$ after all, which is how Karatsuba and Voronin develop the proofs in their book.
For different constants $c$ we get different sets of $\tau$ such that $\zeta$ approximates $c$ within $\epsilon$, and for different functions $f$ we get different sets of $\tau$ such that $\zeta$ approximates $f$ within $\epsilon$. Our first question is:

How do the sets for different constants relate to each other? Likewise, the sets for different functions?
Most in particular, can we “add” these sets? What we mean is that there should be an effective procedure that, given specifiers for the sets belonging to constants $c$ and $c'$, outputs a specifier for the set belonging to $c + c'$. The specifiers could just be members of the sets. Using $\log\zeta$ enables us to do this, though the following purpose can work with positive integers only.

Next, can we “multiply”? That is, can we find an effective procedure that, given specifiers for $c$ and $c'$, outputs a specifier for the set belonging to $c \cdot c'$?
Now we can tell where this is going. We will build up arithmetical formulas and circuits for functions using constants and gates for addition and multiplication. Given specifiers for functions $f$ and $g$, we want to output specifiers for $f + g$ and $f \cdot g$.
We will need to start with some basic functions, beginning with the identity function $f(z) = z$. Our functions are formally defined on the complex disc, but it is easy to represent discrete functions over this domain. The main limitation noted by Garunkštis is that his effectivizations apply only to discs of small radius $r$, but any radius seems good for encoding discrete functions.
We certainly expect addition to be available. The proofs of universality all work by representing the approximating shifts via infinite series. One can add the series for $f$ and $g$ to get a series for $f + g$. The multiplication case would then involve products of infinite series, but perhaps they are well-behaved. The hardest challenge may be to make this work neatly when the tokens give only partial information about $f$ and $g$. We may also wish to implement some recursion operations besides addition and multiplication. But once we have—if we have—a rich enough set of effective operations, we can program the whole panoply of computable functions into $\zeta$. Note that these representations are already known to exist; we are just talking about how easy it is to program them.
The potential payoff is that perhaps questions about whether certain complex arithmetic functions have small circuits or formulas can be attacked via the resulting constraints on the behavior of $\zeta$. Ultimately what is going on is a kind of discrete programming using the prime numbers that go into product formulas for $\zeta$. For $\log\zeta$ and arguments $s$ with $\Re(s) > 1$ there is the particularly nice summation formula

$$\log\zeta(s) \;=\; \sum_{p}\ \sum_{k=1}^{\infty} \frac{1}{k\,p^{ks}},$$

where $p$ ranges over all primes, which is then continued inside the strip. The point is that $\zeta$ channels a “natural” way to do the programming. Lots of tools of analysis might be available for complexity questions framed this way.
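The prime summation converges quickly for $\Re(s) > 1$; here is a numerical check (our illustration) at $s = 2$, where $\zeta(2) = \pi^2/6$:

```python
from math import log, pi

def primes_up_to(n):
    # Sieve of Eratosthenes.
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b'\x00\x00'
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return [p for p in range(n + 1) if sieve[p]]

def log_zeta_via_primes(s, prime_bound=10 ** 5, k_max=60):
    # log zeta(s) = sum over primes p and k >= 1 of 1/(k p^{ks}),
    # valid for real s > 1; truncated at prime_bound and k_max.
    total = 0.0
    for p in primes_up_to(prime_bound):
        for k in range(1, k_max + 1):
            term = 1.0 / (k * p ** (k * s))
            if term < 1e-18:
                break
            total += term
    return total

# At s = 2 the sum should match log(zeta(2)) = log(pi^2 / 6) ~ 0.4977.
assert abs(log_zeta_via_primes(2) - log(pi * pi / 6)) < 1e-4
```

The tail over primes beyond the bound contributes on the order of $1/(N \ln N)$ for $s = 2$, which is why a modest cutoff already gives four decimal places.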
Well, this might be a pipe dream—especially since we can’t yet even tell whether $\log\zeta$ blows up in the kind of discs we are considering—but this is the week when Santa smokes his pipe preparing for a long journey and lots of children have dreams.
Does our “book of zeta” idea seem feasible to compile?
Lessons from the Park that still apply today
Iain Standen is the CEO of the Bletchley Park Trust, which is responsible for the restoration of the Park. After the war, the Park was almost completely destroyed and forgotten—partially at least for security reasons. Luckily it was just barely saved and is now a wonderful place to visit and see how such a small place helped change history.
Today I would like to report on a recent trip to Bletchley Park.
You probably know that Bletchley Park was the place where during World War II the British, with help from the Poles, were able to break the Enigma Machine. This machine, which had various models and versions, was the main one used by the Germans to encrypt and decrypt messages during the war. The ability to read their encrypted messages was invaluable to the war effort, and it is claimed that perhaps millions of lives were saved.
My wife, Kathryn Farley, along with her brother Andrew and I were in London recently. During our stay Kathryn set up a day trip to Bletchley Park, which is over an hour’s car ride from where we were staying in London. This was a tremendous experience and I would definitely suggest getting to the Park if you can.
Bletchley Park consisted of the main house and a number of “huts.” The latter were primitive buildings that were needed as the number of workers grew rapidly during the war. Huts were numbered and their number became strongly associated with the work that was done there.
We will call it the Park, but the actual names used during the war include:
Some of the huts:
I have argued before here that one of the biggest breakthroughs in modern cryptography is the treatment of messages not as a series of letters, but rather treating them as a whole number. This leap from separate letters to a whole number, I believe, is instrumental in enabling researchers to create modern codes: where would we be if we still thought of messages as series of letters? How could one even think of methods based on Elliptic Curves?
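The letters-to-number leap is easy to make concrete (our illustration): treat the bytes of a message as the base-256 digits of one integer, as modern cryptosystems effectively do before operating on it with arithmetic.

```python
# A message as one whole number: interpret the bytes of the text as
# the base-256 digits of an integer.
message = "SEE YOU AT NOON"
as_number = int.from_bytes(message.encode("ascii"), "big")

# The encoding is reversible, so nothing about the message is lost;
# a cryptosystem can now apply modular arithmetic to as_number.
num_bytes = (as_number.bit_length() + 7) // 8
recovered = as_number.to_bytes(num_bytes, "big").decode("ascii")
assert recovered == message
```

Once the message is a number, operations like modular exponentiation in RSA or point multiplication on an elliptic curve apply directly.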
Jumping back over 60 years we see that the Enigma machines indeed viewed messages as series of letters from a 26-letter alphabet. They added simple rules for punctuation: a common rule was that “See you at noon” would become “SEEXYOUXATXNOON”.
Each letter was effectively scrambled by a permutation on 26 letters. Of course the number of such permutations is $26! \approx 4 \times 10^{26}$, which is already a huge number. Yet a code that only used one permutation to encrypt messages would easily be broken. One way to see this is to realize that cryptogram puzzles occur every day in most newspapers, and you are expected to be able to solve them in minutes.
The reason for their weakness is that messages are usually in a natural language, such as German in WWII, whose tremendous redundancy makes a single substitution code easily breakable. What the Enigma did was change the permutation used from letter to letter in a wider-ranging manner than any poly-substitution cipher had done before: the first letter used some permutation, which was then changed to a new permutation for the second letter, and so on. The actual way the Enigma machines moved from one permutation to the next was based on a clever use of mechanical wheels, called rotors. How these rotors changed the permutations is the key—bad pun—to why Enigma machines were hard to break. The Germans thought their complex motion made the machines unbreakable, but the work at Bletchley Park proved them wrong.
Here is a schematic of how current flowed through the rotors and could be changed by a key-press to turn on a light. The lighted letter was the encryption of the pressed letter. For more details see this:
The fundamental reason I think the breaking of Enigma is still interesting today, over 70 years later, is that it contains simple lessons even for modern “unbreakable” codes. Here are some of the lessons:
Key Size: The Enigma machines had a huge-size key, since the key included the choice of rotors, the rotors’ positions, the plugboard settings, and more. Any attack that was brute-force or even near brute force was hence doomed, and even today it would fail. But of course the key size means nothing if there is an attack that avoids trying all the keys.
Operator Error: The Enigma machines were often misused in practice. The operators often violated simple rules in ways that made the security vastly lower. Examples include re-sending messages with the same key—called a depth—and using shortcuts in selecting the settings of the rotors. For instance, often the rotor settings were only slightly changed from one day to the next. Note, one could argue that operator error is still happening today. It also includes poor implementations of the codes: there are attacks on “unbreakable” modern codes that work because of bad implementations, even with RSA. One of my favorites is the attack that is sometimes called the Bellcore Attack, which exploits errors in the execution of a code. Okay, I was involved in creating this attack along with Dan Boneh and Rich DeMillo.
Hidden Design Flaws: The Enigma machines had a fundamental design flaw. This flaw leaked a fair amount of information, that could be and was exploited in the attacks. The flaw was this property of the Enigma machines:
If the letter $x$ was encrypted into $y$, it is always the case that $x$ and $y$ are distinct letters.
Put another way: no letter ever encrypts to itself. This was a tremendous mistake that helped break the whole complex system. One famous example was the following: it was noted that an encrypted message had a long run with no occurrence of the letter “L.” The only way this could have happened with any reasonable probability is if the message was a series of “L”’s. This enabled a break into that key. The operator had been testing the system and just kept pressing the same key, “L,” over and over.
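Here is a toy, single-rotor sketch (much simpler than a real Enigma, and entirely our own illustration) of why the flaw is structural: the reflector is a fixed-point-free involution, and conjugating it by any rotor position still gives a permutation with no fixed points, while the machine stays reciprocal.

```python
import random

random.seed(4)
ALPHA = [chr(ord('A') + i) for i in range(26)]

# Reflector: a fixed-point-free involution (13 disjoint swapped pairs).
letters = ALPHA[:]
random.shuffle(letters)
reflector = {}
for a, b in zip(letters[::2], letters[1::2]):
    reflector[a], reflector[b] = b, a

rotor = ALPHA[:]
random.shuffle(rotor)   # one toy rotor permutation

def encrypt_letter(c, position):
    # Current path: through the rotor (offset by its position),
    # off the reflector, and back out through the rotor.
    i = (ALPHA.index(c) + position) % 26
    r = reflector[rotor[i]]
    return ALPHA[(rotor.index(r) - position) % 26]

# Because the reflector has no fixed points, no letter ever encrypts
# to itself — at any rotor position.
for position in range(26):
    for c in ALPHA:
        assert encrypt_letter(c, position) != c
```

The same conjugation argument also makes the map self-inverse at each position, which is why one Enigma setting both encrypted and decrypted, and why the no-self-encryption leak could not be configured away.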
Even apart from such mistakes, the system arguably leaked a small but steady amount of information per letter. To find a key needing $b$ bits to specify, a number of letters of ciphertext proportional to $b$ could suffice to determine it. Of course it still took incredible work to extract the key, but Alan Turing’s stroke of genius was how to automate the kind of “puzzle-solving” needed for such tasks.
New Technology: The Enigma machines were actually fairly secure, but the Germans did not envision that the attackers would use a machine to break their machines. These machines, called “bombes,” were critical to the success at the Park. Today the analogy might be that someone already has a working quantum computer and can break any code that depends on discrete log or on factoring. How do we know that our “unbreakable codes” have not already been cracked? Indeed.
A picture of a “bombe” at the Park. These were created to break the Enigma machines:
A curiosity: it is believed that after the war all of the bombes at the Park were destroyed. The ones on display are reproductions. Our tour guide told us that there is a folk belief that perhaps there is an original bombe hidden somewhere on the grounds of the Park. Perhaps during the on-going renovations a bombe will be discovered hidden under a floor or in a wall. Who knows—the Park may have more secrets yet to be uncovered.
I have wondered if our current codes will be looked back on one day and seen to be easy to break. What do you think?
Baku Olympiad source—note similarity to this |
Magnus Carlsen last week retained his title of World Chess Champion. His match against challenger Sergey Karjakin had finished 6–6 after twelve games at “Standard” time controls, but he prevailed 3–1 in a four-game tiebreak series at “Rapid” time controls. Each game took an hour or hour-plus under a budget of 25 minutes plus 10 extra seconds for each move played.
Today we congratulate Carlsen and give the second half of our post on large data being anomalous.
According to my “Intrinsic Performance Ratings” (IPRs), Carlsen played the tiebreak games as trenchantly as he played the standard games. I measure his IPR for them at 2835, though with wider two-sigma error bars ±250 than the 2835 ±135 which I measured for the twelve standard games. Karjakin, however, played the rapid games at a clip of 2315 ±340, significantly below his mark of 2890 ±125 for the regular match. The combined mark was 2575 ±215, against 2865 ±90 for the match. It must be said that of course faster chess should register lower IPR values. My preliminary study of the famous Melody Amber tournaments, whose Rapid sections had closely similar time controls, finds an overall dropoff of slightly over 200 Elo points. Thus the combined mark was close to the expected 2610 based on the average of Carlsen’s 2853 rating and Karjakin’s 2772. That Carlsen beat his 2650 expectation, modulo the error bars, remains the story.
Carlsen finished the last rapid game in style. See if you can find White’s winning move—which is in fact the only move that avoids losing:
The win that mattered most, though, was on Thanksgiving Day when Carlsen tied up the standard match 5–5 with a 75-move war of attrition. The ChessGames.com site has named it the “Turkey Grinder” game. On this note we resume talking about some bones to pick over “Big Data”—via large data taken using the University at Buffalo Center for Computational Research (CCR).
If you viewed the match on the official Agon match website, you saw a slider bar giving the probability for one side or the other to win. Or rather—since draws were factored in—it gave the points expectation $E$, which is the probability of winning plus half the probability of drawing. This is computed as a function $E(v)$ of the value $v$ of the position from the player’s side. The beautiful fact—which we have discussed before in connection with a 2012 paper by Amir Ban—is that $E(v)$ is an almost perfect logistic curve. Here is the plot for all available (AA) games at standard time controls in the years 2006–2015 with both players within 10 Elo points of the Elo 2000 level:
The “SF7d00” means that the chess program Stockfish 7 was run in Multi-PV mode to a variable depth between 20 and 30 ply. My scripts now balance the total number of positions searched so that endgame positions with fewer pieces are searched deeper. “LREG2” means the generalized logistic curve with two free parameters. Using Wikipedia’s notation, I start with

$$Y(v) \;=\; A + \frac{K - A}{\big(C + Q e^{-Bv}\big)^{1/\nu}}$$

and fix $C = Q = \nu = 1$ and $K = 1 - A$ to symmetrize. Then $A$ is basically the chance of throwing away a completely winning game—and by symmetry, of winning a desperately lost game.
Chess programs—commonly called engines—output values in discrete units of $0.01$ called centipawns (cp). Internally they may have higher precision, but their outputs under the standard UCI protocol are always whole numbers of cp, which are converted to decimal for display. Engines have used various large sentinel values for checkmate, but a common convention has since become standard. I still use fixed cutoffs and divide the $x$-axis into “slots.”
Positions of value beyond the cutoff belong to the end slots. Under a symmetry option, a position of value $v$ goes into both the slot for $v$ for the player to move and the slot for $-v$ for the opponent. This is used to counteract the “drift” phenomenon discovered in this paper with my students: the player to move has a 2–3% lower expectation across all values—evidently because that player has the first opportunity to commit a game-chilling blunder.
The “b100” means that adjacent slots with fewer than 100 moves are grouped together into one “bucket” whose value is the weighted average of those slots. Larger slots are single buckets rather than divided into buckets of 100. The end slots and zero (when included) are single buckets regardless of size. Finally, the number after “sk” for “skedasticity” determines how buckets are weighted in the regression as I discuss further on.
The $y$-value of a bucket is the sum of wins plus half of draws by the player enjoying the value (whose turn it might or might not be to move), divided by the size of the bucket. This is regressed to find the $A$ and $B$ most closely giving $y = E(v)$. The slope at zero is $B(1-2A)/4$. The quantity

$$E(1.00) \;=\; A \;+\; \frac{1 - 2A}{1 + e^{-B}}$$

gives the expectation when a pawn ahead—figuratively the handicap at odds of a pawn. Note how close this is to 70% for players rated 2000.
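As a quick sketch, the symmetrized curve and the pawn-odds handicap can be checked numerically (the parameter values below are hypothetical placeholders, chosen only so the handicap lands near the quoted 70%, not fitted values):

```python
import math

def expectation(v, A, B):
    """Symmetrized two-parameter logistic: E(0) = 0.5 and E(-v) = 1 - E(v),
    running from A (hopelessly lost) up to 1 - A (completely winning)."""
    return A + (1.0 - 2.0 * A) / (1.0 + math.exp(-B * v))

# Hypothetical placeholder parameters:
A, B = 0.02, 0.89
pawn_odds = expectation(1.00, A, B)  # points expectation when a pawn ahead
```

With these placeholders, `pawn_odds` comes out just over 0.70.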
The fit is amazingly good—even after allowing that the $R^2$ value, so astronomically close to $1$, is benefiting from the correlation between positions from the same game, many having similar values. Not only does it give the logistic relationship the status of a natural law (along lines we have discussed), but also Ban argues that chess programs must conform to it in order to maximize the predictive power of the values they output, which transmutes into playing strength. The robustness of this law is shown by this figure from the above-linked paper—being rated higher or lower than one’s opponent simply shifts the curve left or right:
This is one of several reasons why my main training set controls for rating by limiting to games between evenly-rated players. (The plots are asymmetric in the tail because they grouped buckets coming in from one end only rather than from both ends as the present ones do.)
Most narrowly to our goal, the value of $B$ determines the scale by which increases in value translate into greater expectation, more directly than quantities like $A$ do. Put simplistically, if a program values a queen at 10 rather than 9, one might expect its “$B$” to adjust by a factor of 9/10. Early versions of Stockfish were notorious for their inflated scale. The goal is to put all chess programs on a common scale by mapping all their values to points expectations—and Ban’s dictum says this should be possible. By putting sundry versions of Stockfish and Komodo and Houdini (which placed 2nd to Stockfish in the just-concluded ninth TCEC championship) on the same scale as my earlier base program Rybka 3, I should be able to carry over my model’s trained equations to them in a simple and direct manner. Here is the plot for Komodo 10’s evaluations of the same 100,000+ game positions:
The fit is just as fine. The $A$ values are small and equal to within their error bars, so they can be dismissed. The $B$ values for Komodo and for Stockfish stand in a ratio of about $1.046$, and the evaluations giving 70% expectation for Komodo and for Stockfish have almost the same ratio to three decimal places. So we should be able to multiply Komodo’s values by 1.046 and plug them into statistical tests derived using Stockfish, right?
The error bars on Komodo’s $B$, which are two-sigma (a little north of “95% confidence”), give some pause because they allow about 2% of wiggle. This may seem small, but recall the also-great fit of the linear regression from (scaled) player error to Elo rating in the previous post. Under that correspondence, 2% error translates to 2 Elo points for every 100 below perfection—call that 3400. For Carlsen and Karjakin flanking 2800 that means only $\pm 12$ Elo but grows to $\pm 28$ for 2000-level players. Here is a footnote on how the “bootstrap” results corroborate these error bars and another data pitfall they helped avoid.
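The arithmetic behind those bands, under the rule of thumb just stated (the function name is mine):

```python
def elo_band(rating, pct_wiggle, perfection=3400):
    """Each 1% of wiggle in the scale costs about 1 Elo point for
    every 100 rating points below 'perfection', taken here as 3400."""
    return pct_wiggle * (perfection - rating) / 100.0

top_band = elo_band(2800, 2)   # Carlsen and Karjakin flank 2800
club_band = elo_band(2000, 2)  # 2000-level players
```

This gives plus-or-minus 12 Elo at 2800 but plus-or-minus 28 at 2000.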
But wait a second. This error-bar caveat is treating Komodo’s $B$ as independent from Stockfish’s $B$. Surely they are completely systematically related. Thus one should just be able to plug one into the other with the conversion factor and get the same proportions everywhere, right? The data is huge, and both the logistic and ASD-to-Elo regressions this touches on have near-unity $R^2$ and the force of natural law. At least the “wiggle” can’t possibly be worse than these error bars say, can it?
Here are side-by-side comparison graphs with Stockfish and Komodo on the same set of positions played by players within 10 Elo points of 1750.
Now the Komodo $B$ is lower. Here is a plot of the $B$-values for Komodo and Stockfish over all rating levels, together with the Komodo/Stockfish ratio:
The ratio waddles between 0.96 and 1.06 with a quick jag back to parity for the 2700+ elite players. This uncertainty bespeaks a gap of 5 Elo points for every 100 under perfection, which makes a considerable 70-point difference for Elo-2000 players.
Well, we can try clumping the data into huger piles. I threw out data below 1600 and the 2800 endpoint—which has lots of Carlsen but currently excludes Karjakin since his 2772 is below 2780. I combined blocks of four levels at 1600–1750, 1800–1950, up to 2600–2750, and quadrupled the bucket size to match. Here is the plot for 2200–2350, with a move-weighted average of 2268:
With over 500,000 data points, mirrored to over a million, can one imagine a more perfect fit to a logistic curve? With Stockfish the $R^2$ value even prints as unity. And yet, this is arguably the worst offender in the plot of $B$ over these six piles:
The point for 2600–2750 goes down. It is plotted at 2645 since there are far more 2600s than 2700s players, and it must be said that the 2400–2550 pile has its center 2488 north of 2475 because 2550 included all years whereas the 2000–2500 range starts in the year 2006. But the data point for 2200–2350 is smack in the middle of its range. Why is it so askew that neither regression line comes anywhere near the error bars for the data taken with the respective engine?
Getting a fixed value for the $B$ ratio is vital to putting engines on a common scale that works for all players. The above is anything but—and I haven’t even told you what happens when Rybka and Houdini enter the picture. It feels like the engines diverge not based on their evaluation scales alone but on differences in their values for the inferior moves that human players tend to make, differences that per the part-I post correspond almost perfectly to rating. Given Amir Ban’s stated imperative to conform any program’s values to a logistic scale in order to maximize its playing strength, and the incredible fit of such a scale at all individual rating levels, how can this be?
I get similar wonkiness when I try to tune the ratio internally in my model, for instance to equalize IPRs produced with Komodo and Stockfish versions to those based on Rybka 3. There is also an imperative to corroborate results obtained via one engine in my cheating tests by executing the same process with test data from a different engine. This has been analogized to the ‘A’ and ‘B’ samples in doping tests for cycling, though those are taken at the same time and processed with the same “lab engine.”
I had hoped—indeed expected—that a stable conversion factor would enable the desirable goal of using the same model equations for both tests. I’ve become convinced this year that instead it will need voluminous separate training on separate data for each engine and engine version. A hint of why comes from just looking at the last pair of Komodo and Stockfish plots. All runs skip the bucket for an exact 0.00 value, which by symmetry always maps to 0.50. Its absence leaves a gap in Komodo’s plot, meaning that Komodo’s neighboring nonzero values carry more weight of imbalance in the players’ prospects than do 0.01 or -0.02 etc. coming from Stockfish. The data has 48,693 values of 0.00 given by Komodo 10 to only 43,176 given by Stockfish 7. Whereas Komodo has only 42,350 values in the adjacent ranges -0.10 to -0.01 and +0.01 to +0.10 (before symmetrizing) to 47,768 by Stockfish. The divergence in plot results may be amplified by the “firewall at zero” phenomenon I observed last January. The logistic curves are dandy but don’t show the cardinalities of buckets, nor other higher-moment effects.
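Plugging in the counts just quoted makes the imbalance concrete (a quick check; the dictionary layout is mine):

```python
# Exact 0.00 values versus values in the adjacent ranges
# -0.10..-0.01 and +0.01..+0.10, as quoted above:
counts = {
    "Komodo 10":   {"zero": 48_693, "near": 42_350},
    "Stockfish 7": {"zero": 43_176, "near": 47_768},
}

shares = {
    engine: c["zero"] / (c["zero"] + c["near"])
    for engine, c in counts.items()
}
# Komodo parks a larger share of its near-balanced verdicts at exactly
# 0.00, so skipping the 0.00 bucket removes more of Komodo's mass near
# the origin than Stockfish's.
```

The shares come out near 53% for Komodo against 47% for Stockfish.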
In the meantime I’ve been using conservative ratios for the other engines relative to Rybka. For example, my IPRs computed in such manner with Komodo 10 are:
These are all 70–100 points lower than the values I gave using Rybka. Critics of the regular match games in particular might agree more with these than with my higher official numbers, but this needs to be said: When I computed the Rybka-based IPR for the aggregate of moves in all world championship matches since FIDE’s adoption of Elo ratings in 1971, and compared it with the move-weighted average of the Elo ratings of the players at the time of each match, the two figures agreed within 2 Elo points. Similarly weighting the IPRs for each match in my compendium gives almost the same accuracy.
That buttresses my particular model, but the present trouble happens before the data even gets to my model. Not even the scaling stage discussed in the last post is involved here. This throws up a raw existential question.
Much of data analytics is about “extracting the signal from the noise” when there is initially a lot of noise. Multiple layers of standard filters are applied to isolate phenomena. But here we are talking about raw data—no filters. All we have observed are the smooth linear correspondence between chess rating and average loss of position value and the even more perfect logistic relation between position value and win/draw/loss frequency. All we did was combine these two relations. The question is:
How did I manage to extract so much noise from such nearly-perfect signals?
Can you see an explanation for this wonkiness in my large data? What caveats for big-data analytics does it speak?
The chess answer is that Carlsen played 50.Qh6+!! and Karjakin instantly resigned, seeing 50…Kxh6 51.Rh8 mate, and that after 50…gxh6 the other Rook drives home with 51.Rxf7 mate.
Update 12/11/16: Here is a note showing what happens when all drawn games are removed. The data point for 2200–2350 is even more rogue…
[fixed point placement in last figure, added “Baku Olympiad” to first caption, some word changes, added update and acknowledgment]