From Flanders Today src1 and Ryle Trust Lecture src2 |
Baron Jean Bourgain and Sir Michael Atiyah passed away within the past three weeks. They became mathematical nobility by winning the Fields Medal, Atiyah in 1966 and Bourgain in 1994. Bourgain was created Baron by King Philippe of Belgium in 2015. Atiyah’s knighthood did not confer nobility, but he held the dynastic Order of Merit, which is limited to 24 living members and has had fewer than 200 total since its inception in 1902. Atiyah had been #2 by length of tenure after Prince Philip and ahead of Prince Charles.
Today we discuss how they ennobled mathematics by their wide contributions.
Bourgain was affiliated with the IAS through the IBM John von Neumann Professorship. He had been battling cancer for a long time. Here is the middle section of the coat of arms he created for his 2015 investiture:
Detail from IAS source |
The shield shows the beginning of an Apollonian circle packing, in which every radius is the reciprocal of an integer. This property continues as circles are recursively inscribed in the curvilinear regions—see this 2000 survey for a proof. To quote Bourgain’s words accompanying his design:
The theory of these [packings] is today a rich mathematical research area, at the interface of hyperbolic geometry, dynamics, and number theory.
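The integrality behind the shield can be checked concretely via the Descartes Circle Theorem: four mutually tangent circles with curvatures (signed reciprocal radii) $k_1, k_2, k_3, k_4$ satisfy $(k_1+k_2+k_3+k_4)^2 = 2(k_1^2+k_2^2+k_3^2+k_4^2)$, and replacing any $k_i$ by twice the sum of the other three minus $k_i$ gives the curvature of the circle inscribed on the other side. Starting from an integer quadruple, every generated curvature stays an integer, which is the reciprocal-radius property noted above. A minimal sketch, using the standard starting quadruple $(-1,2,2,3)$ with outer circle of radius 1:

```python
def descartes_ok(q):
    """Descartes Circle Theorem identity for a curvature quadruple."""
    return sum(q) ** 2 == 2 * sum(k * k for k in q)

def children(q):
    """Reflect each curvature through the other three: k -> 2*(sum of others) - k."""
    s = sum(q)
    return [tuple(2 * (s - q[i]) - q[i] if j == i else q[j] for j in range(4))
            for i in range(4)]

root = (-1, 2, 2, 3)  # outer circle of radius 1 (curvature -1) plus three inscribed circles
assert descartes_ok(root)

# Every quadruple produced by recursive inscription stays integral
# and satisfies the Descartes identity.
frontier = [root]
for _ in range(3):
    frontier = [c for q in frontier for c in children(q)]
    for q in frontier:
        assert descartes_ok(q)
        assert all(isinstance(k, int) for k in q)
```

The reflection works because the Descartes identity is a quadratic in any one curvature, so the two roots sum to twice the sum of the other three.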
Bourgain’s affinity for topics we hold dear in computing theory is shown by this 2009 talk titled, “The Search for Randomness.” It covers not only PRNGs and crypto but also expander graphs and succinctness in quantum computing. He has been hailed for the diversity of his contributions to other mathematical areas and for his editorships of many journals. We will talk about a problem in analysis which he helped solve not by analytical means but by connecting the problem to additive combinatorics.
Sōichi Kakeya posed the problem of the minimum size of a subset of $\mathbb{R}^2$ in which a unit-length needle can be rotated through 360 degrees. Abram Besicovitch showed in 1928 that such sets can have Lebesgue measure less than $\epsilon$ for any $\epsilon > 0$. He had already shown that one can achieve measure zero with a weaker property, which he had used to show a strong failure of Fubini’s theorem for Riemann integrals:
For all $n \geq 2$ there is a measure-zero subset of $\mathbb{R}^n$ that contains a unit line segment in every direction.
The surprise to many of us is that such strange sets would have important further consequences in analysis. A 2008 survey in the AMS Bulletin by Izabella Łaba, titled “From Harmonic Analysis to Arithmetic Combinatorics,” brings out breakthrough contributions by Bourgain to conjectures and problems that involve further properties of these sets, which seem to retain Kakeya’s name:
Conjecture: A Kakeya set in $\mathbb{R}^n$ must have Hausdorff dimension $n$.
This and the formally weaker conjecture that the set must have Minkowski dimension $n$ are proved in $\mathbb{R}^2$ but open for all $n \geq 3$. Bourgain first proved that the restriction conjecture of Elias Stein, which is about extensions of the Fourier transform from certain subspaces of functions from $\mathbb{R}^n$ to $\mathbb{C}$ to operators from $L^p$ spaces to functions on curved surfaces such as the sphere, implies the Kakeya conjecture. It is likewise open for $n \geq 3$. As Łaba writes, the associated estimates “require deeper geometrical information, and this is where we find Kakeya sets lurking under the surface.”
What Bourgain showed is that the restriction estimates place constraints on sets of lower Hausdorff dimension that force them to align “tubes” along discrete directions that can be approximated via integer lattices. This led to the following “key lemma”:
Lemma 1 Consider the sets $A \overset{G}{+} B = \{a + b : (a,b) \in G\}$ and $A \overset{G}{-} B = \{a - b : (a,b) \in G\}$, where $A$ and $B$ are finite subsets of $\mathbb{Z}$ and $G \subseteq A \times B$.

There is an absolute constant $\delta > 0$ (Bourgain obtained $\delta = 1/13$) such that whenever $|A|, |B|, |A \overset{G}{+} B| \leq N$, we have $|A \overset{G}{-} B| = O(N^{2 - \delta})$.
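To make the objects in such sum-difference lemmas concrete: for a graph $G \subseteq A \times B$, the $G$-restricted sumset and difference set collect only the sums and differences along pairs in $G$. The lemma's content, that a small restricted sumset forces a nontrivially small restricted difference set, is the hard part; this sketch (with made-up example sets) just computes the objects:

```python
import random

def restricted_sum(G):
    """A +_G B: sums a+b taken only over the pairs in the graph G."""
    return {a + b for a, b in G}

def restricted_diff(G):
    """A -_G B: differences a-b taken over the same pairs."""
    return {a - b for a, b in G}

# Example: A = B = an arithmetic progression, G the full graph A x B.
A = list(range(0, 50, 5))
G = [(a, b) for a in A for b in A]
# With the full graph these are the ordinary sumset and difference set.
assert restricted_sum(G) == {a + b for a in A for b in A}
assert restricted_diff(G) == {a - b for a in A for b in A}

# A sparse random graph restricts both sets.
random.seed(1)
H = random.sample(G, 20)
assert restricted_sum(H) <= restricted_sum(G)
assert restricted_diff(H) <= restricted_diff(G)
```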
To quote Łaba: “Bourgain’s approach, however, provided a way out. Effectively, it said that our hypothetical set would have structure, to the extent that many of its lines would have to be parallel instead of pointing in different directions. Not a Kakeya set, after all.” She further says:
Bourgain’s argument was, to this author’s knowledge, the first application of additive number theory to Euclidean harmonic analysis. It was significant, not only because it improved Kakeya bounds, but perhaps even more so because it introduced many harmonic analysts to additive number theory, including [Terence] Tao who contributed so much to the subject later on, and jump-started interaction and communication between the two communities. The Green-Tao theorem [on primes] and many other developments might have never happened, were it not for Bourgain’s brilliant leap of thought in 1998.
Among many sources, note this seminar sponsored by Fan Chung and links from Tao’s own memorial post.
Michael Atiyah was also much more than an analyst—indeed, he was first a topologist and algebraic geometer. He was also a theoretical physicist. Besides all these scientific hats, he engaged with society at large. After heading Britain’s Royal Society from 1990 to 1995, he became president of the Pugwash Conferences on Science and World Affairs. This organization was founded by Joseph Rotblat and Bertrand Russell in the 1950s to avert nuclear war and misuse of science, and won the 1995 Nobel Peace Prize.
The “misuse of science” aspect comes out separately in Atiyah’s 1999 article in the British Medical Journal titled, “Science for evil: the scientist’s dilemma.” It lays out a wider scope of ethical and procedural concerns than the original anti-war purpose. This is furthered in his 1999 book chapter, “The Social Responsibility of Scientists,” which laid out six points.
As he says in its abstract:
In my own case, after many years of quiet mathematical research, working out of the limelight, a major change occurred when unexpectedly I found myself president of the Royal Society, in a very public position, and expected to act as a general spokesman for the whole of science.
Within physics and mathematics, he also ventured into a debate that comes closer to the theory-as-social-process topic we have discussed on this blog. In 1994 he led a collection of community responses to a 1993 article by Arthur Jaffe and Frank Quinn that began with the question, “Is speculative mathematics dangerous?” Atiyah replied by saying he agreed with many of their points, especially the need to distinguish between results based on rigorous proofs and heuristic arguments,
…But if mathematics is to rejuvenate itself and break exciting new ground it will have to allow for the exploration of new ideas and techniques which, in their creative phase, are likely to be as dubious as in some of the great eras of the past. …[I]n the early stages of new developments, we must be prepared to act in more buccaneering style.
Now we cannot help recalling his claim last September of heuristic arguments that will build a proof of the Riemann Hypothesis, which we covered in several posts. As we stated in our New Year’s post, nothing more of substance has come to our attention. We do not know how much more work was done on the promised longer paper. We will move toward discussing briefly how his most famous work is starting to matter in algorithms and complexity.
We will not try to go into even as much detail as we did for Kakeya sets about Atiyah’s signature contributions to topological K-theory, physical gauge theory, his celebrated index theorem with Isadore Singer, and much else. But we can evoke reasons for us to be interested in the last. We start with the simple statement from the essay by John Rognes of Oslo that accompanied the 2004 Abel Prize award to Atiyah and Singer:
Theorem 2 Let $D$ be a system of differential equations. Then

$$\text{analytical index}(D) = \text{topological index}(D).$$
Here the analytical index equals the dimension of the kernel of $D$ minus the dimension of the co-kernel of $D$, which (again quoting Rognes) “is equal to the number of parameters needed to describe all the solutions of the equation, minus the number of relations there are between the expressions $D(f)$.” The topological index has a longer laundry list of items in its definition, but the point is, those items are usually all easily calculable. It is further remarkable that in many cases we can get the difference without knowing how to compute the kernel and co-kernel dimensions individually. The New York Times obituary quotes Atiyah from 2015:
It’s a bit of black magic to figure things out about differential equations even though you can’t solve them.
One thing it helps figure out is satisfiability. Besides cases where knowing the number of solutions does help in finding them, there are many theorems that needed only information about the number and the parameterization.
We have an analogous situation in complexity theory with the lower bound theorem of Walter Baur and Volker Strassen, which we covered in this post: the number of multiplication gates needed to compute an arithmetical function $f$ is bounded below by a known constant times the log-base-2 of the maximum number of solutions to a system formed from the partial derivatives of $f$ and a certain number of linear equations, over cases where that number is finite. Furthermore, both theorems border on algebraic geometry and geometric invariant theory, whose rapid ascent in our field was witnessed by a workshop at IAS that we covered last June. That workshop mentioned not only Atiyah but also the further work in algebraic geometry by his student Frances Kirwan, who was contemporaneous with Ken at Oxford. Thus we may see more of the kind of connections in which Atiyah delighted, as noted in current tributes and in the “matchmaker” label promoted at last August’s ICM.
Our condolences go out to their families and colleagues.
[more tribute links]
Cropped from Toronto Star source |
Isaac Asimov was a prolific writer of science fiction and nonfiction. Thirty-five years ago, on the eve of the year 1984, he noted that 35 years had passed since the publication of George Orwell’s 1984. He wrote an exclusive feature for the Toronto Star newspaper predicting what the world would be like 35 years hence, that is, in 2019.
Today we give our take on his predictions and make our own for the rest of 2019.
Asimov’s essay began by presupposing the absence of nuclear holocaust without predicting it. It then focused on two subjects: computerization and use of outer space. On the spectrum of evaluations subtended by this laudatory BBC piece and this critical column in the Toronto Star itself, we’re closer to the latter. On space he predicted we’d be mining the Moon by now; instead nothing more landed on the Moon until the Chinese Chang’e 3 mission in 2013 and Chang’e 4 happening now. His 35-year span should be lengthened to over a century.
On computerization and robotics he was mostly right except again for the timespan: he said the transition would be “about over” by 2019 whereas it may be entering its period of greatest flux only now. However, for the end of 1983 we think the “whats” of his predictions were easy. Personal computers had already been around for almost a decade. Computer systems for business were plentiful. The Internet was already a proclaimed goal and the text-based Usenet was already operating. Asimov’s essay seems to miss how the combination of these three would soon move points of control outward to end-users.
We still think what he wrote about space and robots will happen. This shows the problem of predictions is not just ‘what’ but ‘when.’ For another instance of getting ‘when’ wrong by being too early: Ken told a Harvard Law graduate who visited him in Oxford in 1984 that what we now call deepfake videos were imminent. We’ll make the rest of this post more about ‘when’ than ‘what.’
Here are some predictions that we have made before. Seems we did not make any new predictions last year—oh well—but see this.
No circuit lower bound of or better will be proved for SAT. Well that’s a freebie.
A computer scientist will win a Nobel Prize. No—indeed, less close than other years.
At least five claims that $P = NP$ and five that $P \neq NP$ will be made.
A “provably” secure crypto-system will be broken. For this one we don’t have to check any claims. We just pocket the ‘yes’ answer. Really, could you ever prove the opposite? How about the attack on Diffie-Hellman in the current CACM?
An Earth-sized planet will be detected orbiting within the habitable zone of its single star. The “when” for this one came in 2017 already. We are retiring it.
A Clay problem will be solved, or at least notable progress made. Again we sense that the answer on progress is “no.” This includes saying that nothing substantial seems to have emerged from Sir Michael Atiyah’s claim of proving the Riemann Hypothesis. However, we note via Gil Kalai’s blog that a longstanding problem called the $g$-conjecture for spheres has been solved by Karim Adiprasito.
We will add some new predictions—it seems unfair to keep repeating sure winners.
Deep learning methods will be found able to solve integer factoring. This would put current cryptography in trouble.
Deep learning methods will be found to help prove that factoring is hard.
These may not be as contradictory as they seem. There is a long-known connection between certain learning algorithms and the natural proofs of Alexander Razborov and Stephen Rudich. The hardness predicate at the core of a natural proof is a classifier to distinguish (succinct) hard Boolean functions from easy ones. There is a duality between upper and lower bounds that in particular leads to the unconditional result that the discrete log problem, which is related to factoring and equally amenable to Peter Shor’s famous polynomial-time quantum algorithm, does not have natural proofs of hardness—because their existence would make discrete log relatively easy.
Talking about quantum, we predict:
Quantum supremacy will be proved—finally. But be careful: there is a problem with this whole direction. See the next section.
An algorithm originating in a theoretical model will be enshrined in law.
There are several near-term opportunities for this. The Supreme Court yesterday agreed to hear two cases on partisan gerrymandering, at least one of which promises to codify an algorithmic criterion for excessive vote dilution. Maine adopted an automatic-runoff voting system whose dependence on computer implementation gave grounds for an unsuccessful lawsuit. Algorithmic fairness is a burgeoning area which we discussed a year-plus ago. Use of differential privacy by the U.S. Census could involve legislation. We distinguish legal provisions from the myriad problematic uses of algorithmic models in public and private policy ranging from credit evaluations to parole decisions to college admissions and much else.
The lines between heuristically solvable and really hard problems will become clearer. We have previously opined that the great success of SAT solvers in particular renders the question moot for many purposes. Well, now we say the opposite: SAT solvers will hit a wall.
Ken recently attended a workshop in central New York that aimed to bring together researchers in many fields working on quantum devices. Materials for the workshop led off with the question of building quantum computers and highlighted Gil Kalai’s skeptical position in particular. An eight-part debate between him and Aram Harrow, which we hosted in 2012, also involved John Preskill and ended with a discussion of quantum supremacy, a term advanced that year by Preskill. The workshop preferred the term quantum advantage. We interpret these terms as having the following distinction:

(a) quantum supremacy: a quantum device performs some task that is provably, or at least convincingly, beyond the reach of any classical computation;

(b) quantum advantage: a quantum device does something useful better, faster, or cheaper than the classical alternatives actually available.
As theoreticians we tend to think about (a) but many businesses and public-sector organizations would be ecstatic to have (b) in important applications.
A new angle on (a) was shown by the new construction by Ran Raz and Avishay Tal of an oracle relative to which $BQP$ is not contained in $PH$. This was hailed as the “result of the year” by Lance Fortnow (his second choice, and our first, is this progress on the Unique Games Conjecture), and Scott Aaronson furnished a great discussion of its genesis and further ramifications in complexity theory. Several popular articles tried to pump this as non-oracle evidence for (a). But there is the over-arching problem:
We know $P \subseteq BQP$ but we don’t know $P \neq BQP$.
So how are we ever going to be able to prove any form of supremacy? Even if we replace ‘polynomial time’ as our definition of ‘feasible’ by something more concrete, how can we prove that successful classical heuristics do not exist? On a certain practical problem of general import, Ewin Tang, a teenager in Texas advised by Scott, designed an improved classical algorithm for low-rank matrix completion that eliminated a previous quantum exponential advantage in the time dependence on the rank parameter. It is not just a case of whether we can prove supremacy, but judging when general quantum computers will be built to realize it.
Whereas the ‘when’ involved in (b) is now. If a quantum device can do something useful now that classical methods are not delivering now, then it does not matter if the latter could be improved at greater hardware and development cost to work a year from now. This has been the gung-ho tenor of many responses to the recently-signed National Quantum Initiative Act. We do, however, still need to find and build said devices…
As for the status of (a), we don’t know any better thought for January than the Janus-like title of this paper by Igor Markov, Aneeqa Fatima, Sergei Isakov, and Sergio Boixo:
“Quantum Supremacy Is Both Closer and Farther than It Appears.”
What are your predictions for 2019? What are the most important matters we’ve left unsaid?
[added some words to end of intro]
Wikimedia Commons source |
Knecht Ruprecht accompanies Santa Claus in Germany. He brings gifts to good children but lumps of coal to naughty ones. He is regarded more generally as the German counterpart to England’s Robin Goodfellow, a.k.a. Puck. The Simpsons’ dog “Santa’s Little Helper” is named “Knecht Ruprecht” in the show’s German edition.
Today we do a nice-or-naughty riff on technological gifts suggested by yesterday’s ACM TechNews mailing.
The ACM mailings highlight the achievements of the whole field: from quantum to everything else. We thought it might be fun to be a bit puckish ourselves and deliver some “coal” to ACM. The stories can sometimes be a bit much. We hope that all involved are in good spirits and accept the “coal” as a holiday-inspired gift—with some echo of the general discussion about naughty-or-nice effects of tech advances.
Here are some that could be reported in the near future. The originals are here.
Real-Time Readouts of Thinking in Faculty.
Mighty News
December 19, 2018
Researchers from a university consortium have developed an open source system delivering fast, precise neural decoding and real-time readouts of where CS faculty think they are. The neural decoding software decrypts hippocampal spatiotemporal patterns detected from tetrode recordings without requiring spike sorting, an error-prone computational process. Implementing this software on a graphical processing unit (GPU) chip demonstrated a 20- to 50-fold upgrade in decoding and analysis speed over conventional multicore central processing unit (CPU) chips. This builds on work previously done on rats, as reported by ACM earlier. The lab director says that the CS faculty work presented many challenges beyond those encountered with rats. The applications—she says—are immense. Faculty currently cannot always tell where they are, and the new system could help them get to classes on time.
A Robotic Hand Able To Type At Desktop Keyboard At 20 Words Per Minute.
New Yolk Times
December 19, 2018
Researchers at Can’t-Abridge University have for the first time taught a robotic hand to type on a normal keyboard. The researchers claim that their system can type at rates in excess of 20 words per minute. They say, “this could change the way that computers interact with others.” The system, which now weighs about 500 pounds, could be reduced in size and cost in the future. That the robot sometimes destroys the keys by hitting them too hard continues to be a challenge.
How AI Spotted Every Solar Panel in the U.S.
Pretty Big Solar NewsHour
December 19, 2018
Engineers at the University of St. Anford have located every solar panel in the contiguous U.S. via a network built around a deep learning computer model called Inception. The network, dubbed DeepSolar, completed this task in less than a month, ascertaining that regions with more sun exposure had greater solar panel adoption than areas with less average sunlight. DeepSolar also learned that adoption was higher in locations of higher average household income. Unbelievable—who would have guessed this?
An Amoeba Just Found an Entirely New Way to Write Articles.
ScienceAlarm
December 21, 2018
Researchers at Knockout University in Japan gave an assistant professorship to a “true slime mold” amoeba, and found that as the papers-per-year target increased from four to eight, the single-celled organism needed only a linear amount of additional time to generate minimum publishable units. This is part of an ongoing project on using lower-level organisms to do research. The project previously used graduate students. The leader of the multiple-institution project said that using amoebae could reduce the costs of writing up research by up to 50%. He also said that the amoeba sometimes made various grammar errors, but that the project was attempting to fix this issue.
A Quantum Computer Just Found an Entirely Old Way to Visit Cities.
ScienceAllure
December 21, 2018
Researchers at TKO University in Japan gave the Traveling Salesman Problem (TPS) to a vast array of noisy astronomical scale quantum (NASQ) processors, and found that as the cities increased from four to eight, the system needed only a linear amount of additional time to determine a single reasonable route. This was fresh off its success at factoring numbers higher than 291,311 = 523*557 that it didn’t even know it was factoring. TPS is an optimization problem requiring a computer to look at a list of cities and determine the shortest route in which each city is visited exactly once. The team said their results “may lead to the development of quantum algorithms for problems on as many as ten cities.”
Modified from source |
Programming Proteins to Pair Precisely.
C++ News
December 19, 2018
The std::pair construct in C++ is a common annoyance because human programmers frequently forget its implicit presence when iterating over maps or inserting into sets. This necessitates the re-typing of millions of lines of source code per annum. Absent the development of a robotic hand able to type at a desktop keyboard at 20 words per minute, software companies can improve productivity by optimizing the nutritional intake of programmers. Nanosoft has partnered with CodeURIKA to provide protein-rich drinks worldwide, after a study of electronic sweatshops found that proteins minimize both syntactic and semantic bugs better over the long term than sugars and PEDs.
Room for Improvement? New Hotelier Tests an Algorithmic System.
Wallbanger Street Journal
December 19, 2018
The Lite House hotelier is experimenting with an algorithmic pricing system to set different room rates for guests who arrive in self-driving cars. Once customers book for the first time at a standard rate, they fill out a questionnaire of 200 questions to specify how often they will need the car, how frequently they visit the hotel bar, and other details. The hotelier then activates a key to drive the car into an appropriate space. The optimized use of vertical space and savings from not hiring car valets will enable conference participants who are not staying at the hotel to park there at a rate low enough to include in the conference registration fee. A spokesman said, “Most of the big hotel operating companies are not focused on their conference guests,” while Lite House’s algorithmic rate-setting “is next-generation.”
Companies Use VR to Train Employees for Difficult Customers.
ESPN Technology Review
December 20, 2018.
Major corporations like Wallstore, ChippedPot, and Horizon are using virtual reality (VR) to prepare employees for potentially difficult situations on the job. For example, Horizon has more than 1,600 stores in the U.S. whose front-line employees participate in a digital scenario in which a customer asks to use the bathroom. In a “Harry Potter-Style Photos for Muggles” twist, researchers have developed software that can animate the central character in a photograph while leaving the rest of the image untouched. Its skeleton can then be animated to create the sense of movement, solving the problem of pose estimation for a limited set of circumstances in which bathroom requests occur.
New Attack Intercepts Keystrokes Via Digital Watches.
TubeNet
December 19, 2018
A team of researchers from Burning Man University has developed a new side-channel attack that exploits the heat generated by people wearing Orange Digital Watches while working on their PCs. Heat amplifies the watches’ ability to detect keystrokes from both hands. Videos known to generate large amounts of heat include comic videos and videos on carpet cleaning. The attack becomes more adept at guessing correct keys as the user gets hotter, as it amasses more key presses from graphic libraries.
There are some other items, including one particularly chilling one, that we chose not to parody.
Will the next year’s advances in AI and other areas of tech be anything like we imagine? Will they bring humanity more gifts than lumps of coal?
Comparing proofs for the Jaccard metric
BetterExplained source |
Kalid Azad is the founder of the website Better Explained. It is devoted to explaining mathematical concepts. He also has written two books.
Today we discuss how some proofs provide a concise explanation whereas others promote exploration of related concepts.
Azad’s site has a rich page titled, “Math Proofs vs. Explanations (aka Nutrition vs. Taste).” It argues that the best explanations start with an analogy to a relation that readers already understand. Even if the connection is not sharp, it can be refined once the reader’s attention is secured. This is opposed to a formal proof in which every step is sharp and correct but intuition is wanting.
To this we add the role proofs can play in exploration. If you have one proof of a theorem that you understand, there is value in seeking other proofs that use other ideas. Usually we think of ideas as coming first—as thoughts we refine into a proof. The advantage of starting with a proof is already having certitude and sharpness—you know a recipe that works and now can try varying and augmenting it.
Azad’s page gives examples of proofs of the Pythagorean Theorem, among other identities. It then quotes from William Thurston’s essay “On Proofs and Progress in Mathematics,” which we once mentioned. We will use the example of Jaccard distance from our previous post. We start with this definition:

$$J(A,B) = \frac{|A \triangle B|}{|A \cup B|},$$

now using $A \triangle B$ for the symmetric difference $(A \setminus B) \cup (B \setminus A)$. So the triangle inequality becomes, for any finite sets $A, B, C$:

$$J(A,C) \leq J(A,B) + J(B,C). \qquad (1)$$
We think the proof we gave in the last post is simple and direct and intuitive but maybe not explorative. It first connects to the solid understanding that without the denominators this would be the well-known triangle inequality for Hamming distance. To reprise, it considers $A$ and $C$ fixed and varies $B$ to arrive at that simpler fact in three steps:
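As a sanity check on the inequality itself, here is a brute-force verification over all triples of subsets of a small universe. This is a sketch, not a proof, since it covers only one universe size; exact rational arithmetic avoids floating-point issues:

```python
from fractions import Fraction
from itertools import combinations

def jaccard(A, B):
    """Jaccard distance |A ^ B| / |A | B|, with J = 0 when both sets are empty."""
    union = A | B
    return Fraction(len(A ^ B), len(union)) if union else Fraction(0)

def subsets(universe):
    s = list(universe)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Exhaustive check of J(A,C) <= J(A,B) + J(B,C) over a 4-element universe.
all_sets = subsets(range(4))
for A in all_sets:
    for B in all_sets:
        for C in all_sets:
            assert jaccard(A, C) <= jaccard(A, B) + jaccard(B, C)
```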
This reasoning readily extends to nonnegative measures $\mu$ besides simple counting, provided the removal of elements from $B$ makes the same additive-or-proportional change to $\mu(A \triangle B)$ as it does to $\mu(A \cup B)$, and likewise for the other fraction.
The first short proof should join the pantheon of half-page journal papers. Under fair use, here it is in one screenshot:
Perhaps this is too short. We think this proof would have been more satisfying if a few more lines of calculation had been added. Let us divide the region $A \triangle C$ into its inner part (inside $B$) and outer part (outside $B$) and do likewise for the other symmetric differences. Then it seems the intent was:
The end uses the symmetric-difference definition of $J$, so perhaps fully expanding this paper’s intent would have been longer. One can also begin with that definition to get a shorter calculation, but it skips over the splitting step. Indeed, the paper does not mention that step at all, so it was not intended. The proof by Artur Grygorian and Ionut Iacob in a short paper in last October’s College J. Math. strikes us as a similar-style proof.
The second proof comes from a MathOverflow thread. It assigns a variable to each region of the Venn diagram, forms the fractions, and cross-multiplies to obtain what the thread calls a “monstrosity”: a fully expanded polynomial inequality.
The fact that no coefficient is negative completes the proof. This is clear from a computer algebra system, but why does no negative term appear?
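The cross-multiplied form can at least be confirmed numerically. Writing the seven regions of the Venn diagram of $A, B, C$ as nonnegative integers (our own variable names, not the thread's), the cleared-denominator form of the triangle inequality holds for every small assignment:

```python
from itertools import product

def cleared_inequality(a, b, c, x, y, z, w):
    """Regions: a=|A only|, b=|B only|, c=|C only|, x=|A&B only|,
    y=|A&C only|, z=|B&C only|, w=|A&B&C|.
    True iff the cross-multiplied triangle inequality holds."""
    AB_sym, AB_un = a + y + b + z, a + b + x + y + z + w   # |A^B|, |A|B|
    BC_sym, BC_un = b + x + c + y, b + c + x + y + z + w   # |B^C|, |B|C|
    AC_sym, AC_un = a + x + c + z, a + c + x + y + z + w   # |A^C|, |A|C|
    # J(A,C) <= J(A,B) + J(B,C), multiplied through by the three denominators:
    return AC_sym * AB_un * BC_un <= AB_sym * AC_un * BC_un + BC_sym * AB_un * AC_un

# Exhaustive over all region sizes in {0,1,2}: 3^7 = 2187 cases.
assert all(cleared_inequality(*t) for t in product(range(3), repeat=7))
```

This checks the inequality rather than the nonnegative-coefficient claim itself, which would need a symbolic expansion.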
We have realized since the last post that the second of two proofs given in the 2016 paper by Sven Kosub, which we linked in that post, is really equivalent to ours. This is easier to see if one just presumes counting measure, $\mu(S) = |S|$, in the following:
Here sub-modularity is a standard property for which Kosub cites the equivalent condition that whenever $X \subseteq Y$ and $x \notin Y$,

$$\mu(Y \cup \{x\}) - \mu(Y) \leq \mu(X \cup \{x\}) - \mu(X).$$
This suffices for step 1 of our earlier proof; the rest of that proof needs only that $\mu$ is monotone (and implicitly nonnegative).
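For counting measure the diminishing-returns condition behind sub-modularity holds (with equality); a quick exhaustive check, with our own formulation of the standard condition spelled out in the comment:

```python
from itertools import combinations

def subsets(universe):
    u = list(universe)
    return [set(c) for r in range(len(u) + 1) for c in combinations(u, r)]

mu = len  # counting measure; swap in any candidate set function to test it

# Diminishing returns: for all X subset of Y and x not in Y,
#     mu(Y + {x}) - mu(Y)  <=  mu(X + {x}) - mu(X).
universe = set(range(4))
for Y in subsets(universe):
    for X in subsets(Y):
        for x in universe - Y:
            assert mu(Y | {x}) - mu(Y) <= mu(X | {x}) - mu(X)
```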
Now we look at proofs that add ideas. The first one still strikes us as clean and magical. We are computer scientists so it is natural to think of finite sets as binary-valued vectors of length $n$. They have a $1$ in position $i$ precisely when $i$ is in the set. Of course $n$ is the size of the “universe.”
Now let $u$ be such a non-zero vector. The key is to use a probabilistic proof. We will show that we can relate the Jaccard distance to the outcome of a simple random experiment. The experiment once selected leads to a simple proof—it only requires the union bound. Recall this is the fact that

$$\Pr[E_1 \text{ or } E_2] \leq \Pr[E_1] + \Pr[E_2]$$

for any two events $E_1$ and $E_2$.
The cool idea is to look at the permutations of the positions of the vector $u$. For a permutation $\pi$ let us define $\pi(u)$ to be the vector with entries $\pi(u)_i = u_{\pi(i)}$.
Let $h(u) = i$ provided $\pi(u)_i$ is the first entry that is equal to $1$. Of course since the underlying set is non-empty it follows that this is well defined.
Note $h(u)$ is a random variable that depends on the choice of the permutation $\pi$. The key is to see that for the vectors $u, v$ of sets $A, B$, the probability that $h(u) = h(v)$, when we average over all permutations uniformly, is equal to

$$\frac{|A \cap B|}{|A \cup B|}.$$
This follows by noting that there are $|A \cap B|$ ways for the first set position to be common to both vectors and $|A \cup B|$ total ways to select the first position overall. Complementing gives us that the probability of $h(u) \neq h(v)$ equals $J(A,B)$.
Now hark back to our sets $A, B, C$ with vectors $u, v, w$. The event

$$h(u) \neq h(w)$$

is subsumed by the disjunction of events

$$h(u) \neq h(v) \quad\text{or}\quad h(v) \neq h(w),$$

regardless of what $\pi$ is. By the simple union bound, the probability of the first event is at most the sum of the probabilities of the latter two events. We have thus proved

$$J(A,C) \leq J(A,B) + J(B,C).$$
The last step is the same as in the proof that Hamming distance is a metric. What does the randomized view gain us? It gains a nice interpretation of $J(A,B)$ as the probability that $A$ and $B$ hash to different values under the min-hash function for random $\pi$. Min-hashing is used all the time—see this book chapter by Jure Leskovec, Anand Rajaraman, and Jeffrey Ullman, with this proof in section 3.3.3.
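The claimed identity, that averaging over all permutations makes the collision probability exactly the Jaccard similarity, can be checked by exhaustive enumeration on a small universe. This is a sketch; practical min-hash uses random hash functions rather than explicit permutations:

```python
from fractions import Fraction
from itertools import permutations

def minhash(perm, S):
    """Position of the first element of S under the ordering given by perm."""
    return min(perm.index(i) for i in S)

def collision_prob(A, B, n):
    """Exact Pr[h(A) = h(B)] over all n! permutations of range(n)."""
    perms = list(permutations(range(n)))
    hits = sum(minhash(p, A) == minhash(p, B) for p in perms)
    return Fraction(hits, len(perms))

A, B, n = {0, 1, 2}, {1, 2, 3}, 5
# Pr[h(A) = h(B)] = |A & B| / |A | B|, so Pr[h(A) != h(B)] = Jaccard distance.
assert collision_prob(A, B, n) == Fraction(len(A & B), len(A | B))
assert 1 - collision_prob(A, B, n) == Fraction(len(A ^ B), len(A | B))
```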
Atri Rudra suggested to us the “game” of adjusting one element at a time to walk the middle set $B$ toward an extreme value. The sets $A$ and $C$ can be adjusted too. We start by assuming the triangle inequality (1) is false and make moves that can only keep it that way, until we reach a case where it is obviously true.
Step 1 of our proof already plays this game by removing from $B$ any elements not in $A \cup C$. So we really start the game with $B \subseteq A \cup C$ and we want to walk it to $B = A \cup C$. Simply replacing the denominators $|A \cup B|$ and $|B \cup C|$ in (1) by $|A \cup C|$ was good in step 2 of the proof but is not a legal move in this game.
What we can do legally is add elements from $A \cap C$ to $B$: those leave the denominators unchanged but lower the numerators $|A \triangle B|$ and $|B \triangle C|$. The interesting case is when we want to add to $B$ an element from $A \setminus C$ or from $C \setminus A$. The former add decreases the numerator $|A \triangle B|$ and increases the denominator $|B \cup C|$ while leaving $|A \cup B|$ unchanged, but it increases the numerator $|B \triangle C|$. Let us abstract the right-hand side of (1) to $\frac{p}{q} + \frac{r}{s}$. Then the former add converts it to

$$\frac{p-1}{q} + \frac{r+1}{s+1}.$$
If both moves increase the right-hand side, then we must have

$$\frac{r+1}{s+1} - \frac{r}{s} > \frac{1}{q}, \quad\text{i.e.,}\quad q(s - r) > s(s+1).$$
And from the symmetric move with an element of $C \setminus A$,

$$\frac{p+1}{q+1} - \frac{p}{q} > \frac{1}{s}, \quad\text{i.e.,}\quad s(q - p) > q(q+1).$$
But since $s - r \leq s$ and $q - p \leq q$, these give the contradictory pair $q > s + 1$ and $s > q + 1$. So one or both moves must always be possible. This grows $B$ to include either all of $A$ or all of $C$. The rest of the argument to gobble up all of $A \cup C$ we leave to you, dear readers.
Compared to the above proofs, this is tedious. But it captures some tensions among the sizes of $A$, $B$, and $C$ that may inform intuitions about Jaccard similarity under changes in the sets.
Which proof do you like best for explanation and which for creative impulse?
This is our post. We intended this discussion as number 800 but were surprised to find the simple proof by reduction to the triangle inequality for Hamming distance (steps numbered 1-2-3 above). Are we really the first to write it down, with acknowledgment also to Kosub?
[some typo fixes]
Composite of source 1, source 2 |
Paul Jaccard was a botanist who worked at ETH in Zurich during much of the first half of the 20th century. He created, or discovered, the similarity notion that became the Jaccard metric. Very neat having a metric named after you.
Today we discuss proofs and explanations that the metric is indeed a metric.
The Jaccard index is the ratio of $|A \cap B|$ to $|A \cup B|$. The metric is

$\displaystyle J_\delta(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|}, \qquad [*]$

provided $A$ and $B$ are not both empty sets. If $A$ and $B$ are both empty then by definition the index is $1$ and so $J_\delta(A,B)$ is $0$. Generally we can assume that all the sets are non-empty.
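For readers who like code, here is a minimal Python rendering of these definitions (the function names are our own):

```python
def jaccard_index(A, B):
    # J(A,B) = |A intersect B| / |A union B|, with J of two empty sets = 1 by convention
    if not A and not B:
        return 1.0
    return len(A & B) / len(A | B)

def jaccard_metric(A, B):
    # J_delta(A,B) = 1 - J(A,B)
    return 1.0 - jaccard_index(A, B)

print(jaccard_index({1, 2, 3}, {2, 3, 4}))   # 2/4 = 0.5
print(jaccard_metric({1, 2, 3}, {2, 3, 4}))  # 0.5
print(jaccard_metric(set(), set()))          # 0.0 by the convention above
```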
The key question is to show that this satisfies the triangle inequality. That is, we must show that

$\displaystyle J_\delta(A,B) \le J_\delta(A,C) + J_\delta(C,B). \qquad [**]$
Many proofs of this are known, and it has been remarked that some are fairly complicated. Some are short, but continued recent interest seems to say they haven’t satisfied as explanations.
We think we can supply a quick explanation, if you are already familiar with the triangle inequality holding for Hamming distance on sets:

$\displaystyle |A \Delta B| \le |A \Delta C| + |C \Delta B|,$

where $\Delta$ is symmetric difference. By rewriting [*] as $J_\delta(A,B) = \frac{|A \Delta B|}{|A \cup B|}$ we can see that [**] becomes similar but includes denominators. If [**] is false then

$\displaystyle \frac{|A \Delta B|}{|A \cup B|} > \frac{|A \Delta C|}{|A \cup C|} + \frac{|C \Delta B|}{|C \cup B|}. \qquad [***]$

Now if we have $C \subseteq A \cup B$ then replacing both right-hand denominators by $|A \cup B|$ cannot make the right-hand side of [***] bigger. But then we have a common denominator, and we can see from Hamming distance that [**] must be true.

So suppose $C$ includes elements not in $A \cup B$. The left-hand side of [***] is at most $1$, so each right-hand fraction must be of the form $\frac{p}{q}$ where $p < q$. Removing the $b$ extra elements from $C$ converts such a fraction to $\frac{p-b}{q-b} < \frac{p}{q}$, so removing those elements from $C$ would also make the right-hand side of [***] smaller. That brings us back to the case $C \subseteq A \cup B$ and the previous contradiction. So [**] must be true. That’s our proof and explanation in brief.
We will do the above proof more slowly and carefully, to ensure it is really clear. A convention: to make the formulas a bit more readable we use $AB$ to denote the intersection of $A$ and $B$. So the Jaccard metric is now

$\displaystyle J_\delta(A,B) = 1 - \frac{|AB|}{|A \cup B|}.$
First we explain the main ideas:
We will argue that the intermediate set $C$ in the triangle inequality can be constrained. The set $C$ can be taken to be a subset of $A \cup B$. The intuition is that any extra elements in $C$ can only make the triangle inequality weaker.
We will replace the definition of the Jaccard metric by an equivalent one. This new definition is much closer to a known metric.
We will reduce the triangle inequality finally to a known triangle inequality.
Proof:
Let’s assume that $A, B, C$ are the sets, and we wish to prove the triangle inequality:

$\displaystyle J_\delta(A,B) \le J_\delta(A,C) + J_\delta(C,B). \qquad [**]$
Claim: We can assume that $C \subseteq A \cup B$. Suppose there was an element $x$ in $C$ but not in $A \cup B$. Then removing $x$ from $C$ would only tighten the triangle inequality. That is, the LHS

$\displaystyle J_\delta(A,B)$

stays the same and the RHS terms

$\displaystyle J_\delta(A,C) \quad\text{and}\quad J_\delta(C,B)$

can only decrease.
Claim: We can re-write $J_\delta(A,B)$ as

$\displaystyle J_\delta(A,B) = \frac{|A \Delta B|}{|A \cup B|}.$

As noted above, $A \Delta B$ is the set of elements that are in $A$ or $B$ but not both.
Claim: We note that after applying the last claim three times, [**] becomes:

$\displaystyle \frac{|A \Delta B|}{|A \cup B|} \le \frac{|A \Delta C|}{|A \cup C|} + \frac{|C \Delta B|}{|C \cup B|}.$
Claim: Since $C \subseteq A \cup B$ we can multiply by $|A \cup B|$ and get that the triangle inequality is implied by

$\displaystyle |A \Delta B| \le |A \Delta C| + |C \Delta B|.$

Note this uses that

$\displaystyle |A \cup C| \le |A \cup B| \quad\text{and}\quad |C \cup B| \le |A \cup B|,$

which follows from

$\displaystyle C \subseteq A \cup B.$
Claim: But the last step is that

$\displaystyle |A \Delta B| \le |A \Delta C| + |C \Delta B|$

is just the triangle inequality for the Hamming distance. It uses that

$\displaystyle (a \oplus b) \le (a \oplus c) + (c \oplus b)$

holds for any single bits $a, b, c$.
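This bit-level fact can be checked exhaustively; here is a quick Python check (our own illustration) that also sums it coordinate-wise into the Hamming triangle inequality:

```python
from itertools import product

# Exhaustively check (a xor b) <= (a xor c) + (c xor b) for single bits.
for a, b, c in product((0, 1), repeat=3):
    assert (a ^ b) <= (a ^ c) + (c ^ b)

# Summing over coordinates, the same holds for bit-vectors (Hamming distance).
def hamming(u, v):
    # Number of coordinates where the two vectors differ.
    return sum(x != y for x, y in zip(u, v))

u, v, w = (0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)
assert hamming(u, v) <= hamming(u, w) + hamming(w, v)
print("bitwise triangle inequality verified")
```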
If you’re curious how we found the above, we were trying to check a different kind of proof. We started with the statement of the triangle inequality:

$\displaystyle 1 - \frac{|AB|}{|A \cup B|} \le 1 - \frac{|AC|}{|A \cup C|} + 1 - \frac{|CB|}{|C \cup B|}.$
This looks a bit scary, with its multiple ratios and addition and subtraction. But the following feature jumps out:
The right side depends on $C$ but the left side does not.
This suggests the idea of asking: what happens if we change $C$ one element at a time? Can we “walk” it to an extreme point at which the truth of [**] is obvious? A promising start was that we could remove any elements from $C$ that are not in $A \cup B$, as shown above. So we can assume $C \subseteq A \cup B$. Can we move $C$ toward $A \cup B$, or at least toward $A$ or $B$, while preserving the implication of truth for [**]?
Seen in this light, our above proof’s replacing the denominators by $|A \cup B|$ is an “illegal move”, not a change to $C$, so less interesting. But we happened to notice it worked. Let’s follow the train of thought from where we got that $C \subseteq A \cup B$ can be assumed. Okay, what is the next move to make with $C$? Look again at the key expression:

$\displaystyle 1 - \frac{|AB|}{|A \cup B|}.$
Can we simplify this in some way? The answer is yes. The structure of $1$ minus an expression suggested that perhaps we could combine the $1$ and the ratio. Indeed it is not too hard to note that

$\displaystyle 1 - \frac{|AB|}{|A \cup B|} = \frac{|A \Delta B|}{|A \cup B|}.$
Okay perhaps this is not obvious. It is not trivial, but it is a standard idea: the left-hand side looks like the complementation of the “probability” ratio $\frac{|AB|}{|A \cup B|}$. Once you think of this the exact formula follows. Thus we can re-write the required triangle inequality now as

$\displaystyle \frac{|A \Delta B|}{|A \cup B|} \le \frac{|A \Delta C|}{|A \cup C|} + \frac{|C \Delta B|}{|C \cup B|}.$
We are almost done. The ratios are annoying, so can we get rid of them? We have assumed that $C \subseteq A \cup B$. So it seems like a good idea to assume, for the moment, that $C$ is actually equal to $A \cup B$. But then it is easy to see that $|A \cup C| = |A \cup B|$ and $|C \cup B| = |A \cup B|$. So the above becomes, after multiplying by $|A \cup B|$,

$\displaystyle |A \Delta B| \le |A \Delta C| + |C \Delta B|.$
This looks really nice, no more ratios. Wait, what is this expression? The $\Delta$ is the set version of the exclusive-or function and it is not hard to note that this is the classic Hamming distance. So this inequality is a fact. Recall the Hamming distance records the number of differences between two bit-vectors, but sets are really just bit vectors.
Are we there yet? Almost. We only need to argue that if $|A \cup C|$ or $|C \cup B|$ is less than $|A \cup B|$, the right-hand side only grows, so the inequality we proved is actually stronger than the one we need. So we are done.
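As a sanity check, not a proof, one can test the triangle inequality on many random triples of sets in a few lines of Python:

```python
import random

def jd(A, B):
    # Jaccard distance, with jd of two empty sets = 0
    if not A and not B:
        return 0.0
    return 1.0 - len(A & B) / len(A | B)

rng = random.Random(0)
universe = range(8)
for _ in range(10000):
    A = {x for x in universe if rng.random() < 0.5}
    B = {x for x in universe if rng.random() < 0.5}
    C = {x for x in universe if rng.random() < 0.5}
    # Allow a tiny epsilon for floating-point rounding.
    assert jd(A, B) <= jd(A, C) + jd(C, B) + 1e-12
print("triangle inequality held on all random triples")
```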
Do you like our proof? Is it clear from the intro—or from the second section? Or are you still unsure why the Jaccard metric satisfies the triangle inequality? We may follow up with more about other proofs.
We don’t think the photo used by this online Alchetron bio of Paul Jaccard is the botanist. It looks too modern, for one. No other photo seems extant. Google Images guesses Alchetron’s image to be Adrian Herzog, but we think its highest Jaccard index is to Orepic user paul.jaccard, as shown at top. Can you solve the mystery of who?
[Switched to standard J-sub-delta notation to clarify and fix issues, fixed HTML conversion glitch with p-q fractions]
[ KKB ] |
Michael Kaminski, David Kirkpatrick, and Nader Bshouty are complexity theorists who together and separately have proved many wonderful theorems. They wrote an interesting paper recently—well not quite—in 1988 about the transpose operation.
Today we want to discuss an alternative proof of the main result of that paper.
The operation that maps a matrix to its transpose is quite important in many aspects of linear algebra. Recall that the transpose $A^T$ is defined by

$\displaystyle (A^T)_{ij} = A_{ji}$

for all indices $i, j$. A non-trivial issue arises already in proving this:
If $A$ has an inverse then so does $A^T$.
There are many proofs of this basic fact about the transpose, but it is not simple to prove it from first principles. For example look at William Wardlaw’s proof here. Or here for another.
The paper of Kaminski, Kirkpatrick, and Bshouty (KKB) came up the other day, while I visited the computer science department at the University at Buffalo. Atri Rudra told me about some of his recent work on various complexity issues around matrix computations. One was an interesting question about the transpose operation on matrices. The result is the following:
Theorem 1 If the arithmetic complexity of $x \mapsto Ax$ is $C$ then the arithmetic complexity of $y \mapsto A^T y$ is at most $O(C + n)$.
KKB had a nice proof of this theorem thirty years ago. Indeed they can get a precise expression for the complexity that is tighter than the above statement. Their proof is a careful examination of the structure of any arithmetic circuit that computes $Ax$. Essentially they show one can in a sense “run the computation backwards.”
Anyway in my discussion with Atri, the other day, he described another proof of this basic theorem. He said consider the function that maps $x$ to $Ax$ for a fixed matrix $A$. Suppose this linear map has arithmetic complexity $C$: that is, the minimum number of arithmetic operations that are needed to compute $Ax$ is $C$. Note, $C$ can vary greatly with the structure of $A$. Even for nonsingular matrices the complexity can vary quite a bit: for a random matrix $C$ is likely to be order $n^2$; for a Fourier transform matrix it is order $n \log n$.
He said it is known that the arithmetic complexity of $Ax$ and $A^T y$ are about the same. But proving this, while not hard, requires some care. Atri told me about a quite neat argument that proves it. Further, the proof is “two lines” as Atri said:
Proof: Consider the function $f(x) = y^T A x$ as a function of $x$. The gradient $\nabla f$ is equal to the vector of all the partial derivatives of $f$. This means that it allows us to compute $A^T y$, since the function is linear in each variable $x_i$ with coefficient $(A^T y)_i$. The famous Derivative Lemma of Walter Baur and Volker Strassen shows that a single arithmetical circuit of size order the complexity of $f$ can compute $\nabla f$. It follows that we can compute all the partial derivatives in $O(C + n)$ arithmetic steps. But for our function this is equal to $(A^T y)^T$. Transposing the resulting vector gives $A^T y$ in the required number of steps.
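The complexity claim needs the Baur-Strassen circuit transformation itself, but the underlying identity $\nabla_x (y^T A x) = A^T y$ is easy to check. Here is a small Python sketch (our own, not the KKB construction) that recovers $A^T y$ from the linear function $f$, using that for a linear $f$ the $i$-th partial derivative is exactly $f(e_i) - f(0)$:

```python
def matvec(A, x):
    # Plain matrix-vector product over lists of lists.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose_times(A, y):
    # Compute A^T y via the gradient of f(x) = y . (A x).
    # f is linear, so the i-th partial derivative equals f(e_i) - f(0),
    # which is (A^T y)_i; Baur-Strassen extracts the whole gradient
    # from a circuit for f with only constant-factor overhead.
    n = len(A[0])
    def f(x):
        return sum(yi * zi for yi, zi in zip(y, matvec(A, x)))
    zero = [0] * n
    grads = []
    for i in range(n):
        e = zero[:]
        e[i] = 1
        grads.append(f(e) - f(zero))
    return grads

A = [[1, 2, 0],
     [0, 3, 1],
     [4, 0, 5]]
y = [1, 1, 2]
print(transpose_times(A, y))  # equals A^T y = [9, 5, 11]
```

Of course this naive evaluation costs $n$ calls to $f$; the point of the Derivative Lemma is that one circuit of size $O(C)$ yields all $n$ partials at once.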
I like this proof quite a bit. Can this argument be used to prove the basic fact about inverses of a matrix and its transpose?
Carlsen impressed in fast chess, but what of classical?
Cropped from AFP/Getty source (Irish Times) |
Magnus Carlsen retained his title of World Chess Champion on Wednesday. He thumped the American challenger Fabiano Caruana 3-0 in a best-of-four tiebreak series of games played at Rapid time controls. Despite his having only one-fourth the standard thinking time, Carlsen’s quality in the tiebreakers was plausibly higher than in the twelve regulation games.
Today Dick and I congratulate Carlsen on his victory and discuss implications for future top-level competitions in chess.
Chess ratings via the Elo system are based on results of games and the ratings of the opponents. My Intrinsic Performance Ratings (IPRs) are on the same scale but use only the quality of the player’s moves as judged by strong chess programs. The mapping from quality measures to Elo ratings comes from training data of millions of moves by players of all ratings. Per remarks in my previous post, I don’t claim the mapping captures all aspects of chess skill—but it does provide firm ground for judging players’ performances relative to their peers. My training sets go from beginner clear up to the Elo 2800+ level of Carlsen and Caruana.
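For readers unfamiliar with the Elo side of this, the standard expected-score formula, which is what game results get compared against, is a one-liner. This snippet is just the textbook formula, not part of my IPR model:

```python
def elo_expected_score(r_a, r_b):
    # Standard Elo logistic expectation: the probability-weighted score
    # for the player rated r_a against one rated r_b.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Equal ratings give an expected score of 0.5 per game.
print(elo_expected_score(2835, 2835))  # 0.5
# A 200-point edge is worth roughly 0.76 per game.
print(round(elo_expected_score(2945, 2745), 2))  # 0.76
```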
One use of IPR is to measure how thinking time affects quality. The title match tiebreaker gave 25 minutes plus an extra 10 seconds per move, the same as in the famous Melody Amber tournaments, whereas the World Rapid championships give 10 fewer minutes. My preliminary results show an average dropoff of 200–210 Elo in the Ambers and 280–290 Elo in World Rapids. Caruana’s quality of 2575 (with huge error bars from under 100 relevant moves in the three tiebreak games) was consistent with this, but here is what I measured for Carlsen:
2945 +- 190.
Since the error-bars are two-sigma, this was more than one standard deviation higher than Carlsen’s rating at standard time controls, and higher than his IPR for the twelve regulation games of the match. Clearly Caruana ran into a buzzsaw.
Carlsen’s match two years ago also went to tiebreakers and I measured no dropoff there either: 2835 +- 250. Carlsen won the World Rapid Championship in 2014 and 2015 and is the reigning World Blitz champion to boot. His prowess upheld his reasons for taking a draw rather than press his advantage in the twelfth regulation game, which I discussed before. Here we will consider what it says about standards and expectations for the chess title in general.
The first thing to note is that although many chess luminaries have voiced dissatisfaction with the 12 draws and tiebreaker and match rules that produced this outcome, neither of the two players is among them. This contrasts with great fights over rules by Bobby Fischer and Garry Kasparov in particular. Caruana has simply offered his congratulations and thanks. Carlsen in a post-match interview considered using Rapid and Blitz formats more not less.
That the tiebreak games were so decisive makes the outcome seem juste. My near-namesake Kenneth Rogoff opined after Wednesday’s conclusion that nothing is amiss. He compared the twelve draws to a 0-0 soccer match that has tantalizing near-misses before its penalty-kick shootout. Most of the draws were hard fights where one or both sides had chances. All six with Caruana as White were Sicilian defenses, considered a fighting choice by Black. The more-drawish Berlin defense never happened, and of the two Petroffs chosen by Caruana as Black, one produced the fascinating game-6 endgame which Caruana could have won. Two games with the Queen’s Gambit Declined (QGD) were fairly tame, but in two others, Carlsen’s 1.c4 English Opening led to sharp play.
There are at least two separate issues:
Thoughtful considerations and proposals have been given here by my fellow International Master Greg Shahade (whose father Michael I knew in the 1970s and whose sister Jennifer was a US commentator on the match), here by master teacher Dennis Monokroussos on his blog The Chess Mind, and here by the noted economist and blogger Tyler Cowen (who played on teams with me in the 1970s). See also this from 2016. Dick and I will not pretend we can solve all the issues in one post. But we can offer a few particular observations.
To find an opinion against the inclusion of games under fast time controls, we need look no further than Shahade two years ago, when the regular part of the match had 10 draws and one win apiece:
Just as in the 20th century, the Champion should retain the title on a drawn match. There should be no rapid tiebreak. … I don’t agree with giving anyone the chance to become World Chess Champion by tying a Classical match and then winning some rapid games. At every moment in the match, someone will be behind on the scoreboard [and will] fight harder every game.
Under this system, Carlsen would have retained his title on two straight drawn matches, as did Mikhail Botvinnik in his 1951 match against David Bronstein and 1954 match against Vasily Smyslov. Kasparov retained his title once that way against Anatoly Karpov by winning the last game to make a tie. Those matches, however, were 24 games each, and even at that length, Fischer led a chorus of many who felt the “draw odds” were too steep on grounds that the defending champion could play for draws.
Fischer’s solution was to have draws not count. It was used for the 1978 match between Karpov and Viktor Korchnoi, which had a dramatic ending but took 32 games over three months. The format cratered in 1984–85 when Kasparov, after losing 4 of the first 9 games, shifted to grinding for draws. The match was annulled after 48 games with Karpov still shy of the six wins needed.
Multi-month matches are out, not only because of the expense and time commitment for sponsors, but for the positive reasons that today’s champions have been attracted to play in a more diverse array of chess events than their predecessors. Cowen advocates the old 24-game format (with fewer rest days) but with prescribed openings away from Berlins and Petroffs and QGDs and lines that players already know 30 moves deep.
Otherwise, the only way to mitigate the champion’s draw odds is to alter chess so that draws are less frequent. This can be done by branching the opening phase so greatly as to reduce the expected span of computer-aided opening preparation and by changing rules to provide more symmetry-breaking.
Here, too, Fischer was a visionary. His Fischer Random format, also called Chess960 for its number of starting positions, continues to be used in some high-level events, but has issues in that these positions retain symmetry that in numerous cases gives White more advantage than in the classical setup. I stand by my proposal to combine an older non-random, non-symmetrical form by Bronstein with Fischer’s rules and a few tweaks.
If such rule changes are too drastic, and the classic 24-game format falls on grounds of length and inequity, then we must vary time allotments. Here there is much creative leeway.
Dick suggests one simple idea which has also been remarked in comments to the above-linked proposals and elsewhere: Play a day of fast games first. The winner would effectively have draw odds for the regular match. That removes the inequity of giving those odds to the defending champion and creates Shahade’s situation where someone always needs to win.
A variant on this is that if a regular game finishes drawn within a certain short time, the players follow with one of the Rapid games. Then the Rapid standings would loom over the match. This prefigures ways to integrate the faster-paced games into the scoring system. This month’s London Classic has become one of several elite tournaments and series to blend Classical, Rapid, and Blitz into one system.
Shahade’s new ideas begin by estimating that a 25–30% time reduction would bring only a 30-odd point reduction in quality. Since I measured 200 points from a 75% time reduction, that seems about right. The time saved from today’s standard playing session of up to 7 hours allows the following schedule:
- You play one game at 90 minutes plus a 20 second increment. The winner gets 10 points, the loser gets 0.
- If that game is drawn you reverse colors and play one game at 20 minutes plus a 10 second increment. The winner gets 7 points and the loser gets 3.
- If that game is drawn you keep the same colors as the rapid game and play one game at 5 minutes plus a 3 second increment. The winner gets 6 points and the loser gets 4. If this game is drawn, both players get 5 points.
The points division makes a classical win 2.5 times more valuable than a win at Rapid pace, and the latter worth twice a win at Blitz. But he notes that all formats would count and the overall match score would be less often tied. That a Blitz game is a relative crapshoot would reduce playing to preserve a tie over the whole session, though the Black player in the classical would have incentive to stodge and enjoy White in the faster game or games.
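Shahade’s scheme is simple enough to enact in a few lines; the following Python sketch (our own encoding, with hypothetical function names) just tabulates the payouts described above:

```python
# Points for (winner, loser) at each pace of Shahade's scheme;
# a draw at all three paces splits the 10 points 5-5.
STAGE_POINTS = {"classical": (10, 0), "rapid": (7, 3), "blitz": (6, 4)}
DRAWN_ALL = (5, 5)

def session_points(results):
    # results: this player's outcome at each successive pace played,
    # "w" (win), "l" (loss), or "d" (draw); play stops at the first
    # decisive game.
    for stage, outcome in zip(("classical", "rapid", "blitz"), results):
        win, lose = STAGE_POINTS[stage]
        if outcome == "w":
            return (win, lose)
        if outcome == "l":
            return (lose, win)
    return DRAWN_ALL

print(session_points(["w"]))             # classical win: (10, 0)
print(session_points(["d", "l"]))        # rapid loss: (3, 7)
print(session_points(["d", "d", "d"]))   # all drawn: (5, 5)

# Margins over the 5-5 split: classical +5, rapid +2, blitz +1,
# so a classical win is 2.5x a rapid win and 5x a blitz win.
```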
Shahade’s idea still gives every playing day the same structure. If we are willing to vary the structure in return for lowering the prescribed use of Rapid, here is my suggestion:
This might set up a subtle feedback reward system for “not playing to draw”—insofar as I can attest that having more time feels nicer when playing. Except for the possible playoff, the Rapid pace would not intrude. The match length expands in the absence of decisive games.
Update 12/4: A review of the match by Carlsen; incisive analysis by former world champion V. Anand. And an extended discussion of both the games and the match format by Steve Gardner in 3 Quarks Daily.
Does the format of championship chess need fixing? Should we just have a 16 or 18-game classical match with fewer rest days where the champion retains the title in a tie? Note this was the rule in last night’s drawn heavyweight boxing fight.
How should the two highlighted issues be weighed—also for possibly reducing draws in other kinds of tournaments?
Shahade’s scoring scheme would also apply to round-robin tournaments, for which soccer-style scoring of 3 points for a win has sometimes been employed to encourage combat. It would also apply to “Swiss System” tournaments since it conserves the total of 10 points between the players—though demands on the tournament officials to oversee so many more rapid and blitz games could be prohibitive.
[added link for a version of playing the fast-game series first, link for 2016 discussion, and a note about match length in my suggestion at the end, plus some word tweaks; added 12/4 update]
Guardian Live source |
Magnus Carlsen and Fabiano Caruana have drawn the first eleven games of their world championship chess match. One game of the regular match is left for Monday. If it, too, is drawn, then there will be a faster-paced tiebreaker series on Wednesday. Update 11/26: It was drawn.
Today we discuss the match and some of its implications for computers and explanations.
That the match is tight is no surprise. Carlsen came in with an Elo chess rating of 2835 and Caruana with 2832. This is the smallest rating difference in world title matches since the Elo system was adopted. They also have the highest average rating. We discussed last spring whether Caruana could be said to be “in form” but what is undeniable is that he has had some sensational results these past five years. With Carlsen down from his Olympian rating of 2882 in May 2014, he wore the mantle of champ but not of favorite.
So no one expected a turkey shoot. But no one expected a record string of draws at the start of a championship match either. The only longer string has been 17 draws in the middle of the infamous marathon between Anatoly Karpov and Garry Kasparov in 1984–85, which was aborted after 48 games. Expenses and the speed of life do not allow such long matches anymore, so if neither player grabs the crown tomorrow, Wednesday will bring the icosiad to a definite close. Icosiad, from Greek for “twenty,” is my attempt at a word for “three-week fortnight”—though Donald Knuth used the word to mean . It is neat to see chess push elections, politics, and sports completely below the fold at FiveThirtyEight:
Most of the games have been hard fought and not boring. Both players have had chances to win several games. Carlsen was by all accounts “completely winning” in game 1 but sold himself short in the moves before the turn-40 time control. He tried to convert a one-pawn advantage for 75 more moves but there is a reason for the proverb, “All rook endings are drawn.” In game 6, Caruana had a technically winning position but only by means neither player suspected—we will discuss this more below. In games 8 and 9 both players spoiled chances, first Caruana by a timid pawn move h2-h3?, then Carlsen by a rash advance h4-h5?
Despite these chances given and missed and some other moves that both humans and computers would mark ‘?!’ for “dubious,” the standard has been high. My chess model, which I discussed again recently in a preview of the match, gives a rating of 2875 +- 80 for the quality through 11 games. It gives Carlsen an “Intrinsic Performance Rating” (IPR) of 2895 +- 105, Caruana 2860 +- 125. (Update 11/26: After the 12 regulation games, Carlsen 2880 +- 105, Caruana 2850 +- 125, combined 2865 +- 80. Compare historical IPR data here.) It is well within the margin of error to say that they have been playing at equal level and each has brought his “A-game.”
My IPRs are still primarily measures of accuracy. I have written several posts explaining difficulties encountered in trying to extend the model to measure challenge created. The drawn results notwithstanding, both players have shown enterprise and shouldered risk.
Yesterday’s game, however, was an exception. Carlsen as White played into an early trade of queens, which was soon followed by “hoovering” all the pieces except a pair of opposite-colored Bishops. Those tend to draws so strongly that Caruana could afford to shed a pawn to block White’s play on the queenside, and then only needed to know “one trick” to hold White off on the kingside.
I agree with those saying Carlsen is angling for the playoff for reasons similar to what was said about Carlsen’s anodyne play as White in the final regular game of the tied match against Karjakin two years ago: A four-game playoff affords time to recover if a risk goes awry in the first game, whereas tomorrow’s winner-takes-all conditions do not. Moreover, I perceive impetus to maximize the time of staying in contention over the chance of winning. This is similar to why in NFL football there is less regret in kicking an extra point to tie rather than the win-or-lose option of going for two, even though the NFL recently made extra points more difficult so that going for the abrupt end brings better odds.
Still, a difference from 2016 is that Carlsen has Black in tomorrow’s 12th game. Will Caruana try to force the issue, and will the champion face a comeuppance? We will see. Carlsen was rightly considered the heavy favorite in the 2016 tiebreaks, and my own IPR measurements showed no dropoff in his quality at the faster pace. My own opinion is that the tiebreaker chances would be as even as the match so far.
Update 11/26: Caruana did repeat his “fighting” opening from previous games but got outplayed in moves 15–24. Then Carlsen by his own account took his foot off the pedal and invited the tiebreaks. What I already wrote above is consistent with Kasparov’s take. (Incidentally, when twelve straight strikes occur in bowling, the last three are called a “turkey.”)
Here is the “one trick” from the end of yesterday’s game. White has just played 51.f2-f4 with the obvious intent of 52.f4-f5 to disturb Black’s pawns.
A natural reaction would be for Black to play his bishop to e6 to cover the f5 square a second time. But this is the one thing Black must not do. White plays 52.f4-f5! all the same. If Black captures with the pawn, 52…g6xf5, White’s h-pawn has a clear path to queening on h8 after 53.h4-h5. If Black captures with the bishop, 52…Be6xf5, White has 53.h4-h5! anyway. Black cannot capture the pawn on pain of unguarding the bishop, so White’s pawn will run free. Black can resist by 53…Bf5-c2 54. h5-h6 g6-g5 to stop 55.h6-h7 right away, but White has the pleasant choice of 55.Kg7 when the bishop will be lost or feasting on Black’s pawns by 55.Kxg5 or 55.Kxf7 first.
Instead, Caruana calmly tacked with 51…Bb3-a2 and the trick (after a switchback feint 52.Kf6-e7 Ba2-b3 53.Ke7-f6 Bb3-a2) was 54.f4-f5 Ba2-b1! Now after 55.Kf6xf7 Bb1xf5, White’s king is no longer touching Black’s bishop so there is no sting in 56.h4-h5, while after 55.f5xg6, Black’s bishop supports recapturing with the pawn on f7 to hold the fort. Carlsen tried 55.Bf2 and again Black must resist temptation to take on f5. Caruana played 55…Bc2! and they shook hands for the draw.
Not only does this explain the draw in the diagrammed position, it turns aside what was really White’s only winning try in the whole endgame from a dozen moves back. Thus it suffices as a humanly-understood explanation of the whole endgame. Master annotators at several commentary websites have not felt a need to say anything more.
The sixth game, however, brought an announcement of checkmate that no one alive saw coming:
It came from a supercomputer named Sesse running the chess engine Stockfish: Black has checkmate in 36 moves beginning 68…Bg5-h4! At first this looks suicidal since after 69.h5-h6 Black’s king is cut off and his knight and bishop look far from stopping the pawn. But after 69…Nd4-f3 (or 69…Nd4-c6) 70.h6-h7 Nf3-e5+ 71.Kg6-h6 Bh4-g5+, White’s king is evicted and after 72.Kh6-h5 Kf8-g7 73.Bc4-g8 Kg7-h8, the compulsion to move (called Zugzwang) forces White to unguard the pawn since his king is frozen.
So White covers the c6 and f3 squares by 69.Bc4-d5 and after 69…Nd4-e2 both guards his h-pawn and puts a question to Black’s knight by 70.Bd5-f3. Now if you haven’t already heard about this, see if you can not only come up with Black’s winning move but the plan of why it works:
Besides looking below, you can find the answer here where Garry Kasparov opines:
“[H]ad Caruana played the incredible [moves], they would request metal detectors immediately! No human being can willingly [play] like that.”
AlphaZero reportedly did not see the mate either.
Marc Rotenberg, who is the president and executive director of the Electronic Privacy Information Center (EPIC) in Washington, D.C., voiced a further opinion of wider import:
Last week in London, I reported from the World Chess Championship (WCC) that computers are very good at telling you what to do, just not so good at telling you why to do it. That is now clear after the remarkable game 6 (2018/6) which found Caruana with a winning position that no human player could pursue, and a computer solution that no human player could comprehend…Of course, the discussions about computers and chess have been ongoing for many years. But there is a greater relevance today. With many governments now thinking broadly about the implications of AI for social and economic policy…I would like to suggest WCC 2018/6 is a cautionary lesson, an example of a solution we could not find, but also one we may not understand.
There is however a further wrinkle, which is that computers can help us generate humanly understandable explanations. I believe this is a case in point—not only to explain why Black wins here but also how and why White was safe until voluntarily moving the king from h7 to g6 at turn 67. The first point of 70…Nf3-g1!! is to buy time after 71.Bf3-g4 Kf8-g8! to block White’s h-pawn. Owing to Zugzwang, White must either cede ground or release the trapped knight, both of which are fatal. White can head off Black’s king by 71.Bf3-d5 instead, but 71…Bh4-g5! wrong-foots White: 72.Kg6-h7 Ng1-e2 and a fork will come on g3 or f4; or 72.Bd5-c4 Ng1-h3! 73.Kg6-h7 Nh3-f4 and White’s bishop is boxed out of e2, so White’s h-pawn must come forward to its eventual doom.
As for the way to draw, the main insight is that White’s king must keep Black’s king out of the corner. In the game, Caruana worked to bring his king to the center—as human players find natural—but the first key to the draw is that White struck back with his h-pawn in time. My analysis gives this as a pivotal position:
White must play Kh7-h8! The second and third keys to “why?” are holding the corner and keeping Black’s knight from reaching e2, from which it has a g3-or-f4 option that White cannot cover. Now I must admit that my identifying these three keys as the explanation of how to draw rests on some hours of exploration with multiple chess engines to satisfy myself that no loopholes exist—no incredible maneuvers that can stretch White beyond breaking. Technically, I should generate a “proof by corresponding squares”—to the rigor of the one I did for the famous Kasparov-World 1999 Internet Match—that White can always cover all threats. But I am quite convinced that the chess content does not go too far beyond this explanation, and my fellow master annotators rested their cases similarly.
I did that 1999 analysis without computer aid and that proof was later upheld 100% by the compilation of exhaustive 6-piece tables. Now I would not be able to spare time for a proof without computers. I did not however simply take a verdict from the computer running unaided. I used the computer to explore options until I was humanly satisfied that everything important had been checked. Thus I offer WCC 2018 game 6 as an example of using computers to establish human explanations on sure ground. Explanations gleaned from exhaustively tabled endgames appear in several books by John Nunn, a similar series by Karsten Müller and tablebase compiler Yakov Konoval, and for two extreme 7-piece endgames in the book Stinking Bishops by the British endgame problemist John Roycroft.
Who will win the match? Will either player take risks to try to win tomorrow’s game? Or are we set for a playoff? It is worth noting that if the players keep drawing even through faster and faster tiebreak games a winner will still be declared in a final game with so-called Armageddon rules, whereby White gets more time but a draw on the board is a victory for Black.
How can computers be made to help with explanations as much as they may race beyond them?
[added links for Game 6 analysis, mention of AlphaZero, books on chess endgames, and some other minor word fixes and changes, updated for the game 12 draw.]
Cropped from Packard Fellow src |
Trevor Wooley is a professor of mathematics at the University of Bristol in the UK.
Today, Ken and I want to say our thanks and share some small amount of fun with you all.
Wooley, a number theorist, is the author of a neat short paper in the Math Monthly. It appeared a while ago, in 2017, but I only saw it today when I looked over some old copies of the Monthly at my Atlanta home.
The Monthly is another item to be thankful for. Since it is a journal, the proper style would be to italicize it as the American Mathematical Monthly or abbreviate it similarly. But it is such an old and shared friend that everyone knows it as the Monthly. It is a great gateway for reading about the essence of research old and new. We are thankful that Ken had a joint paper there earlier this year.
Wooley’s paper starts out by addressing an open problem that has “officially” been open for about 50 years but may have been thought about by readers of Euclid 2,300 years ago (who knows?). Everyone knows how Euclid’s proof of the infinitude of primes works: given any finite list of primes $p_1, \dots, p_k$, the number $p_1 p_2 \cdots p_k + 1$ has a least prime divisor, and that divisor is not on the list.
The open problem is,
If we start with $2$ and iterate, do we get every prime?
The sequence begins: the least prime divisor of $2 + 1$, which is $3$; then the least prime divisor of $2 \cdot 3 + 1$, which is $7$. We have hopped over $5$, and it takes a while to get it:

$$2,\; 3,\; 7,\; 43,\; 13,\; 53,\; 5,\; \dots$$
As Wooley notes, this is the opposite of a computationally efficient procedure for generating primes. Its motivation is really about the effectiveness of modeling aspects of the primes as random processes. Here the next prime to get is $11$, and we expect it to come eventually because each iteration is like a random trial with probability roughly $1/11$ of succeeding, and we get infinitely many trials. This argument applies to getting every individual prime. But amazingly no one has proved it.
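For concreteness, here is a short Python sketch of the iteration (the helper names are ours); it reproduces the opening terms of the sequence by brute-force least-prime-factor computation:

```python
def least_prime_factor(n):
    """Smallest prime dividing n (for n >= 2), by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

def euclid_iteration(k):
    """First k terms: repeatedly take the least prime divisor of
    (product of the primes found so far) + 1, starting from the empty product."""
    primes, product = [], 1
    for _ in range(k):
        p = least_prime_factor(product + 1)
        primes.append(p)
        product *= p
    return primes

print(euclid_iteration(8))  # [2, 3, 7, 43, 13, 53, 5, 6221671]
```

Trial division is hopeless once the products get large, which is part of the point: nobody knows how to control which prime divisor of the product-plus-one turns up next.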
What Wooley does instead is give a different kind of “Euclid proof” in which the attainment of every prime is provable. Here “Euclid proof” means the above process but changing how the number $n$ whose least prime divisor we take is defined. Here is his main theorem:
Theorem 1 Given $p_1 = 2$, take $P = p_1 p_2 \cdots p_k$ and instead of using $P + 1$, define

$$n = P^{\,P^{P}} - 1.$$

Then the sequence of least prime divisors $p_{k+1}$ enumerates the primes in order.
To start off, we have $p_1 = 2$, $P = 2$, and $n = 2^{2^2} - 1 = 15$. We get $15 = 3 \cdot 5$, and the least prime divisor of $15$ is $3$. Then we have $P = 6$ and $n = 6^{6^6} - 1$. This is already beastly to compute, but we can note that $6$, or $6$ to any power, is always one more than a multiple of $5$, so we know we will get $5$. And so on with $P = 30$, …
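Reading the lemma’s expression as the triple power-tower $n^{n^n} - 1$ (our reconstruction from the worked example; the helper names are ours), a brute-force Python check confirms the pattern for small $n$. Python’s exact big integers make even the 36,000-digit case $n = 6$ painless, because the least prime divisor turns up after only a few trial divisions:

```python
def least_prime_factor(n):
    """Smallest prime dividing n (for n >= 2), by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def smallest_prime_not_dividing(n):
    """Smallest prime that is not a divisor of n."""
    p = 2
    while n % p == 0:
        p += 1
        while least_prime_factor(p) != p:  # advance to the next prime
            p += 1
    return p

# Check the tower formula n^(n^n) - 1 against the lemma's conclusion.
for n in range(2, 7):
    tower = n ** (n ** n) - 1  # huge for n = 5, 6, but exact in Python
    assert least_prime_factor(tower) == smallest_prime_not_dividing(n)
```

For $n = 2$ this is exactly the $2^{2^2} - 1 = 15$ computation above, and for $n = 6$ the check finds $5$ on the third trial division.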
Our motivation instead is to facilitate proofs of infinitely many primes that have other properties, such as being in a certain congruence class. We posted about this last July. We think there may be further uses of his theorem and ideas related to the lemma on which it is based.
Wooley’s lemma works for any integer $n$:
Lemma 2 When $n$ is a positive integer, the least prime divisor of

$$n^{\,n^{n}} - 1$$

is always the smallest prime not dividing $n$.
We can’t improve on the sprightliness of the proof in Wooley’s paper. Now to get the exhaustive generator, we want to change the words “smallest prime not dividing $n$” to “smallest prime larger than $n$.” This is achieved just by substituting $n!$ for $n$ in the body of the lemma. We get the smallest prime not dividing $n!$, which is the same as the smallest prime larger than $n$.
As Wooley also remarks, other formulas besides making a triple power-tower of $n$ and subtracting $1$ can be made to work. He gives his own thanks to Andrew Booker and Andrew Granville for suggesting:
Lemma 3 When $n$ is a positive integer, the least prime divisor of their suggested expression is the smallest prime larger than $n$.
The following illustrates the intended use for congruences:
Theorem 4 There are infinitely many primes of the form $4k + 3$.
Proof: Suppose there were only finitely many primes of the form $4k + 3$. Let $P$ be their product. Apply Lemma 3 with $n = 4P$, and let $N$ be the number whose least prime divisor the lemma takes. Then all prime divisors of $N$ must be larger than $4P$ and hence of the form $4k + 1$. Thus $N$ too must have the form $4k + 1$, which means that $N + 1$ must have the form $4k + 2$. But this is false for $n \geq 4$.
Harking back to Wooley’s motivation, this still stops short of enumerating all the primes of the form $4k + 3$. But it is a proof of their infinitude.
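The mod-$4$ kernel of the argument is that a product of primes all congruent to $1 \bmod 4$ is itself $1 \bmod 4$, so every integer $\equiv 3 \pmod 4$ must have a prime factor $\equiv 3 \pmod 4$. A quick numerical sanity check (not a proof, and the helper name is ours):

```python
def prime_factors(n):
    """Distinct prime factors of n, by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

# Every N congruent to 3 mod 4 has a prime factor congruent to 3 mod 4,
# since a product of primes that are all 1 mod 4 stays 1 mod 4.
for N in range(3, 10000, 4):
    assert any(p % 4 == 3 for p in prime_factors(N))
```

The same congruence bookkeeping, with $3$ or $6$ in place of $4$, is what the closing question below is pointing at.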
Can this proof idea be used to prove something else? What about primes of the form $3k + 2$, and so on?
A real auction that is happening soon.
Crop from BestArts auction history source |
Samuel Baker and James Christie founded the two premier auction houses in the world. The latter is of course known as Christie’s and dates to 1766. The former dates to 1744 but is not known as Baker’s. It is called Sotheby’s after Baker’s nephew John Sotheby, who took over the business from Baker’s later partner George Leigh.
Today I want to note an interesting auction that is happening shortly at Sotheby’s.
We were alerted about the auction by Scott Guthery. He is involved with the SIGMAA group on the History of Mathematics. He is a well-published author of mathematical texts, for example his book titled A Motif of Mathematics: History and Application of the Mediant and the Farey Sequence.
By the way, it may be interesting to note that there is a way to state the Riemann Hypothesis, which we have discussed at length recently, as a question about Farey sequences.
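As an aside, the Farey sequence itself is easy to generate with the classical next-term recurrence, in which each fraction is determined by its two predecessors. A minimal sketch (the function name is ours):

```python
from fractions import Fraction

def farey(n):
    """The Farey sequence F_n in increasing order, via the standard
    recurrence: from consecutive terms a/b, c/d the next term is
    (k*c - a)/(k*d - b) with k = (n + b) // d."""
    a, b, c, d = 0, 1, 1, n
    seq = [Fraction(a, b)]
    while c <= n:
        seq.append(Fraction(c, d))
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
    return seq

print(farey(5))
# F_5 has 11 terms: 0, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1
```

The mediant of consecutive terms is what inserts the new fractions as $n$ grows, which is the connection to Guthery’s book title.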
Mathematical theorems often have good names, but many are not named after the discoverer. The Law of Eponyms states that
Theorem 1 If Theorem X bears the name of Y, then it was probably first stated and proved by Z.
See this for a more detailed discussion. Suffice it to say that John Farey is immortal because “he failed to understand a theorem which [Charles] Haros had proved perfectly fourteen years before.”
The comment from Scott was:
For those interested in such things, an upcoming auction at Sotheby’s includes an eye-candy collection of mathematical instruments as well as mathematical texts and, to my surprise, Richard Feynman’s Nobel Prize.
See this 144-page guide for more information:
The auction is part of a larger “Geek Week” event. Besides Feynman’s Nobel Prize, two interesting items are an Enigma machine and a King James Bible inscribed by Albert Einstein as a gift to a friend. Colm Mulcahy, also of the SIGMAA group, pointed out that Mick Jagger allegedly owns an Enigma machine. Here is another cipher machine that is also for sale:
It is a Swiss NEMA Model 45 cipher machine, serial number 311. Very cool.
I have wondered what will happen in the future. The issue is that in the future all important things are going to be digital. Will people buy a digital object? How will museums display things that are all digital? Hmmm.
We find a transitional example in last month’s wild art intervention pulled off by the artist Banksy during an auction at the Sotheby’s HQ in London. An authorized painting of his 2002 mural Girl With Balloon, which was voted Britain’s favorite work of art, was partly shredded by a mechanism in its custom frame. The shredder was activated by remote control just after Sotheby’s had sold it at auction for 860,000 pounds plus the buyer’s premium.
The morphed piece has been renamed, “Love is in the Bin,” and the buyer was content to keep it. Our point is to ask, how will the whole work be exhibited in the future? The “work” is really split into the artifact, a video of the event as it unspooled, and Banksy’s own “Director’s cut” video, which has its own title, “Shred the Love.” These photos, too, are art:
Getty Images (Jack Taylor, Tristan Fewings), Arch. Digest src |
For completeness an exhibit would have to include the digital components as well as the physical artifacts to convey the work’s entire nature.
Coming back to mathematical memorabilia, we have had journals and conference proceedings with no printed editions for several decades already. Surely some papers in them are destined to keep the highest levels of fame. How can we value them as artifacts? Is there an “original file” as received by arXiv or EasyChair? Will computer keyboards be preserved the way some typewriters used by stars of the last century have been? We guess instead that like Banksy’s “Love” they’ll be in the bin.
What will auctions be like a century from now?