Neil L. has graced these pages many times before. Every eve of St. Patrick’s Day he has visited Dick. Sometimes Dick has been hard to find, but Neil has always managed.
Today we relate some medical news while wishing a Happy St. Patrick’s Day 2018.
Neil knew to go to the apartment shared with Kathryn which is adjacent to MoMA in Manhattan. He appeared and took in the grand view of St. Patrick’s Cathedral through their east-facing window. But he did not find either of them there.
What Neil found was a sheet of paper on a table near the window. “A message for me?”, he breathed. He read in big letters across the top “IDEA FOR GOLD-” and that was enough. He snatched the sheet and vanished. But he had to go somewhere. So he came to me.
I had seen Neil for the first time only a year ago. Since I knew what Neil didn’t, I was expecting him. I had gone down to my basement ostensibly to watch “March Madness” while on the exercise machine. I was riveted by bottom-seed UMBC sinking three straight 3-point shots to hold #1 Virginia to a 21-21 tie at halftime, but at the first sign of green smoke I switched off the TV and pulled up two chairs and a small table with ashtray.
Neil intoned as he lit his pipe,
“A blessed eve to ye.”
I replied, “Same to you—I guess it has already turned St. Patrick’s Day in your home isle.” Neil nodded and as he was about to speak I interjected, “You did not find Dick—”
“Aye—aught I saw of him.”
I had permission to inform Neil of why: "He is in intensive care after heart surgery. Kathryn is with him."
Neil doffed his green hat with a long and serious “Ahhhh…” Then he took a long drag on his pipe. “To be mortal…”, he whispered. But forcing his lip corners brightly up, he said,
“Yet ideas are immortal—that is why I come. Every year, at this time. I have a message from Dick to show ye.”
He pulled out the sheet. I read the entire top line: Idea for Goldbach. “He means the Goldbach Conjecture,” I informed Neil. I expected Neil to recognize the conjecture—if leprechauns can be nerds, he is one. Duly Neil intoned:
“Every x > 2 that is divisible by 2 is the sum of two—“
“Primes.” I completed his sentence—the pause struck me as strange—and I went on: “With Fermat’s Last Theorem having been solved, Goldbach is now the easiest unsolved problem in mathematics to state. The fact that Pierre Fermat got 357 years of credit for a ‘Theorem’ just because he left a marginal note saying he’d proved it emboldened Godfrey Hardy…” On my new iPad I called up the story:
Hardy was known for his eccentricities. … He always played an amusing game of trying to fool God (which is also rather strange since he claimed all his life not to believe in God). For example, during a sea trip to Denmark he sent back a postcard claiming that he had proved the Riemann hypothesis. He reasoned that God would not allow the boat to sink on the return journey and give him the same fame that Fermat had achieved with his “last theorem.”
A long green puff accompanied the reply,
“Would Dick do that?”
I reflexively replied “naw” with my mouth but the part of me that really wanted to see the idea had control of my hands. I reached for the sheet and there was a flash of white light.
The sheet no longer had Dick’s handwriting. I expected Neil to chide my haste and withdraw it, but instead he laid it flat and folded his arms:
src |
I goggled a bit, but I knew what I was looking at without googling. “That’s not leprechaun writing. That’s Nigerian script as used in the movie Black Panther.”
“Aye.”
“You can’t do that. That’s cultural appropriation.”
Neil emptied his pipe, straightened his gold buckle, tipped his hat again, folded his arms the other way, and gave me a long stare.
“Oh who would ye tattle me to, the First Minister o’ the Irish?”
Neil meant Leo Varadkar, but why “first” minister? This was the second time he had avoided saying “prime.” Nor did he use the proper title Taoiseach or say “Ireland.” Clearly he was trying to convey something beyond a lesson of intercultural embrace. Neil piped up again:
“It matters aught what the characters are, but which ye tell away from each other.”
Neil was right—the information content resides wholly in the ability to distinguish pairs of symbols. Impatiently I asked, “So what is the information? Can you read it?”
“It is coded with the starkest cosmic scrambler—a black hole—so that I may tell little from it.”
My thoughts sprang ahead owing to this week’s passing of Stephen Hawking. In a children’s book written with his daughter Lucy Hawking and Christophe Galfard, the children’s astronaut father falls into a black hole but is resurrected through the Hawking radiation, though it takes a galactic computer “a long time” to reconstruct him. On the serious side, Dick and I have wanted to blog about the proposed solution by Patrick Hayden and Daniel Harlow to the black hole firewall paradox, whereby the one-wayness of Hawking radiation staves off the reckoning of the paradox. I’ve also wondered whether every computational process that can occur in nature must occur in proximity to any black hole, combined with every computation we can program ipso facto being one that occurs in nature. I then sprang back, however, to what Neil had said about immortality, so I asked:
“Neil, from your immortal perspective, would you say that Hawking of all humans came the closest to states that you recognize as pertaining to immortal beings?”
Neil unexpectedly flinched from my question.
“Steve—“
“Stephen,” I corrected. “It makes a difference.”
“—he had little to share with us folk. Try Google. You will see.”
Indeed, there was basically no webpage connecting “Hawking” to “leprechauns” at all. Plenty for trolls and a few for elves, plus “fairy” as an adjective, but none for leprechauns. Nor gnomes. Noting what Hawking said about the brain possibly continuing in articles like this and this, and that he also had his DNA sequence shot into space on the Immortality Drive, I pressed Neil on my question. After long pause he replied softly:
“What do ye most celebrate about him this week, after all? If ye look at the Web everywhere it seems…”
Indeed I have been struck how so many of the memorial appreciations of his life were playing up his human qualities. The stories… Well, Hawking was human after all.
“Aught may ye have both ways, me lad.”
Whatever Neil meant by “aught,” there was the sheet of Dick’s ideas to decode. Neil had said he might tell a little from it. Jumping from Theory of Everything to the technical pivot of the Imitation Game movie, I realized we did have some of the plaintext: ‘idea’ and ‘Goldbach’ and associated words. Neil understood and gave it a go.
After much play of green light over the sheet, Neil sighed and announced:
“I could decode just this early theorem.”
Neil copied it out in his hand and I read:
Theorem 1 Suppose that almost all even numbers are sums of two distinct primes. Then almost every prime is the middle of an arithmetic progression of length three.
Proof: Let N be given. We can clearly assume that all even numbers larger than 2N are the sums of two different primes. Let p > N be a prime and let 2p = q + r where q < r and both q and r are primes. Take d = p - q.
Now p - d, p, and p + d form a progression of length three. But each is a prime:
p - d = q and p + d = 2p - q = r.
So this leans on the slight strengthening of Goldbach where writing 2p as p + p is disallowed. After 4 = 2 + 2 and 6 = 3 + 3, every even number tested has been written as the sum of two different primes. An equivalent form is whether every number n > 3 is the average of two distinct primes. Dick's conclusion is just the restriction where n itself is prime.
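Both the strengthened Goldbach hypothesis and Dick's conclusion are easy to probe numerically. Here is a minimal Python sketch; the limit and function names are our own choices, not from the post:

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [i for i, ok in enumerate(sieve) if ok]

LIMIT = 10000
primes = primes_up_to(LIMIT)
prime_set = set(primes)

def split(n):
    """Return (q, r) with q < r both prime and q + r == n, else None."""
    for q in primes:
        if 2 * q >= n:
            return None
        if (n - q) in prime_set:
            return (q, n - q)

# Strengthened Goldbach: beyond 4 = 2+2 and 6 = 3+3, every even
# number we test is a sum of two *distinct* primes.
assert all(split(n) is not None for n in range(8, LIMIT + 1, 2))

# Dick's conclusion: every prime p > 3 (with 2p in range) is the
# middle of a length-3 arithmetic progression of primes.
for p in primes:
    if 3 < p and 2 * p <= LIMIT:
        q, r = split(2 * p)
        assert q < p < r and p - q == r - p
```

Of course this only tests small cases; the conjecture itself remains open.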
Is this open? It still is. Neil and I scoured the writing but we could glean nothing more. Even to prove the existence of infinitely many length-3 progressions of primes had been difficult in the 1930s. What else was there about Goldbach, or did Dick’s sheet move on? I am looking forward to asking Dick next week when he will be up to having visitors.
Neil looked at his watch and gave a start.
“Begorrah—I must be off. Yea though the Irish laddies missed the basketball this year, still I must take care of malarkey me fellows might wreak… To boot, the ladies start tomorrow.”
I had time just to ask one more question: “Neil, if you were in this position and wanted to make sure your ideas for a big theorem were put down—even if not sure which side is true—which one would you choose?” Neil replied:
“With the totality of my uttered words here I have told ye. They have held throughout two attributes each of which literally makes it follow.”
And with a green flash he was gone. I turned on the TV and saw instantly that he was too late: “UMBC 74, Virginia 54” flashed on the screen.
Can you process Neil’s speech to find his answer?
What chance might there be of proving that every prime is an average of two other primes, short of proving the full Goldbach Conjecture?
I am sure all our readers will wish Dick a safe and speedy recovery. We were working on this on Thursday before his operation, including the math.
Update (3/18): Dick continues mending in the ICU. As for Neil L., he either failed to contain leprechaun excesses or joined them. This is what a graph of leprechaun involvement looks like.
Cropped from “Knuth at Brown” video source |
Donald Knuth’s 80th = 0x50th birthday was on January 10. In the array of his birthdays, numbering from zero so that entry 0 stands for his birth day in 1938, that was indeed entry 80. However, as the 81st entry in the array it might have to be called his 81st birthday. Oh well.
Today we salute his 80th year—wait, it really is his 81st year—and wish him many more.
Our little riff on “off-by-one” issues is not an idle matter. Don’s epochal multi-volume monograph The Art of Computer Programming (TAOCP) set standards for presenting as well as designing algorithms and programs. He nodded to community agreement on the benefit of “numbering from zero” but began his chapter on lists and arrays by numbering from 1 before using either convention. The xkcd cartoon “163: Donald Knuth” projects onto him the opinion,
“Different tasks call for different conventions.”
In his textbook Concrete Mathematics with Ron Graham and Oren Patashnik, in a passage indexed as “Zero not considered harmful,” they say:
People are often tempted to write ∑_{k=2}^{n-1} k(k-1)(n-k) instead of ∑_{k=0}^{n} k(k-1)(n-k) because the terms for k = 0, k = 1, and k = n in this sum are zero. … But such temptations should be resisted; efficiency of computation is not the same as efficiency of understanding! …[S]ums can be manipulated much more easily when the bounds are simple. … Zero-valued terms cause no harm, and they often save a lot of trouble.
Emphasizing zero values rather than the index bounds, they continued with what they termed “a radical departure from tradition”:
Kenneth Iverson introduced a wonderful idea in his programming language APL … to enclose a true-or-false statement in brackets, and to say that the result is 1 if the statement is true, 0 if the statement is false … This makes it easy to manipulate the index of summation, because we don’t have to fuss with boundary conditions.”
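Iverson's convention is built into Python, where True and False already behave as 1 and 0. Here is a tiny sketch of the point, using a summand in the spirit of the Concrete Mathematics passage (our own choice for illustration):

```python
n = 10
f = lambda k: k * (k - 1) * (n - k)   # terms for k = 0, 1, n are zero

# Tight bounds that carefully skip the zero-valued terms:
tight = sum(f(k) for k in range(2, n))

# Sloppy bounds plus an Iverson bracket [0 <= k <= n]; Python's
# booleans act as 1/0, so out-of-range terms vanish harmlessly:
easy = sum(f(k) * (0 <= k <= n) for k in range(-5, n + 6))

assert tight == easy   # zero-valued terms cause no harm
```

The second sum never has to fuss with boundary conditions, which is exactly Iverson's point.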
This was almost 25 years ago but Don’s advice is still a step ahead today.
I (Ken) used to think that only chess held marquee events up near the Arctic Circle: the 1972 match between Bobby Fischer and Boris Spassky in Reykjavík, Iceland; the 2014 Chess Olympiad in Tromsø, Norway. The January 8–10 workshop and celebration for Don’s 80th birthday was organized in Piteå, Sweden, which is just north of 65° latitude.
“Organized” is the operative word. As Don says in the opening seconds of a video with the science editor of Sweden’s premier newspaper Dagens Nyheter:
It took such a perfect match. I don’t believe that any other … anywhere else in the world would come anywhere near being right…
We have elided one of Don’s words, and we’ll keep you in suspense about it, but for a hint, it’s the kind of suspension. Typesetting that was one of the more difficult things I’ve ever had to do in plain TeX, because WordPress does not recognize fancy add-ons to LaTeX (nor even the \TeX or \LaTeX macros).
TeX, of course, was Don’s free gift to the world. It sprang not only from his desire to make mathematical typesetting freely available—and to demonstrate how to code large-scale useful software—but also from the value of pliable visual beauty. It was duly featured at the workshop in a talk by Yannis Haralambous titled “TeX as a Path.” Here are the other talks in the order they were given—most have slides on the talks page:
Sunset was observed during the lunch break—this was the Arctic Circle in early January, after all.
There followed the talk on TeX by Haralambous, a tribute by Don’s son John Knuth, and a brief introduction by Jan Overduin to the signature event of the day, which took place after the cutting and serving of the birthday cake.
The word apocalypse comes from Greek apo- “away” and kalupsis “covering”—that is, a revelation. As the original Greek name for the Book of Revelation it acquired its connotations of catastrophe and final destruction. What we now consider to be apocalyptic writing goes back at least to the fall and Babylonian exile of Israel and Judah. But the primary element of revealing hope sets Revelation apart—except that its germ is in the last chapter of the book of Daniel.
As Björner highlighted in his talk, in a January 1981 interview that was published a year later in The Two-Year College Mathematics Journal, a caption opened by noting him as “an accomplished organist and composer” and went on to quote him:
“I want to write some music for organ with computer help. If I live long enough, I would like to write a rather long work that would be based on the book of Revelation. The musical themes would correspond to the symbolism in the book of Revelation.”
He got it going in early 2011. So January 10 saw the world premiere of his Fantasia Apocalyptica on the Orgel Acusticum, whose official page begins by saying, “This is an instrument for the 21st century.” It was built in 2012 by Gerald Woehl. A year ago, Don visited it in Piteå and wrote an incredibly detailed exegesis, including the organ’s software features and controls.
You can hear parts of it in an introductory video by the Canadian organist Jan Overduin, who performed it in Piteå and will do so again on November 4 at his home First United Church in Waterloo. This video is atop Don’s own page which includes his full score in manuscript and typeset forms.
Both Overduin and Don describe how the music closely follows both the message and the numerical contents of the Book of Revelation. Don described the process as “Constraint-based Composition” in a lecture of that title in May 2015 at Stanford. He draws analogy to constrained writing as practiced by the Oulipo group of mainly-French writers. A very loose example is how every post on this blog is constrained to follow some “GLL invariants.”
However, what I (Ken) think of as the highest example of constrained writing is translation. Insofar as translations give scope for the translator’s own creativity, they are constrained by faithfulness to the source text. Creative choices come because meaning and imagery and emotion require different mixings in different languages and media. The Fantasia Apocalyptica is organized into one movement for each chapter of Revelation, and each movement follows the verses as Overduin expounds. It is thus a musical translation at perhaps a finer grain than many tone poems that have been based on literary works.
With “fine-grained” as well as “constraints” we have circled around to computer-science concepts again. If there is one over-arching point we see Don conveying, it is that such integration of informatics with arts and language and real-life appreciation should be natural.
We wish Don many more birthdays to come, no matter how they are numbered.
St. Andrews history source |
William Burnside was a well-known researcher into the early theory of finite groups.
Today Ken and I thought we would talk about one of his most-cited results—a result that is really due to others.
This happens all the time in mathematics. In this case Burnside wrote an important book on finite group theory and included a lemma that is called various things. It is sometimes named for Augustin-Louis Cauchy or Ferdinand Georg Frobenius, or called, “the Lemma that is not Burnside’s.” The lemma was incorrectly attributed to Burnside because he proved it in his 1897 book Theory of Groups of Finite Order. His title for the last two sections of his eighth chapter was the statement of the lemma:
Number of symbols left unchanged by all the substitutions of a group is the product of the order of the group and the number of the sets in which the symbols are interchanged transitively.
After his proof in section 118, he began section 119 by writing:
119. The formula just obtained is the first of a series of similar formulae, due to Herr Frobenius,* which are capable of many useful applications.
The * citation was to a one-page paper by Frobenius in Crelle’s Journal in 1887. The formula itself appeared in an 1845 paper by Cauchy. In his 1911 second edition, Burnside stated it in section 145 (page 191) as “Theorem VII” where he adjoined a twin statement about summing the squares of the numbers of fixed elements.
The book has no instance of the word “lemma,” which the first edition used only in section 77. There is no nearby mention of Cauchy or Frobenius and neither the 1887 nor the 1845 paper is cited anywhere. Instead, Burnside’s next mention of Frobenius comes on page 269 astride five new chapters on representation theory that were his great thrust in the expanded edition. He used a big * footnote stretching across two pages to cite multiple works by Frobenius en banc:
* The theory of the representation of a group of finite order as a group of linear substitutions was largely, and the allied theory of group-characteristics was entirely, originated by Prof. Frobenius. …
That Burnside gave a tandem statement supports the position that he felt it was all well-known research, but perhaps what the online Encyclopedia of Mathematics calls the “mysterious dropping” of Frobenius owed as much to his sweep toward this grand encomium.
Burnside is more properly known for actual beautiful results such as his important p^a q^b theorem, which shows that every finite group whose order has only two prime divisors is solvable. The original proof relied heavily on representation theory, and it took many years to get a proof that avoided needing this machinery—see this for the theorem and a short historical note.
Then there is Burnside’s conjecture, which lasted for six-plus decades before being refuted in 1964, but which still has viable forms and impacts as we covered before.
But what of the Lemma? The problem was less Burnside’s lack of citation and more the habit of others citing it from Burnside. So we propose calling it the Lemma Cited From Burnside (LCFB). The initials can also stand for “Lemma of Cauchy-Frobenius per Burnside.”
An action of a group G on a set X is a mapping a from G to permutations of X that is multiplicative: a(gh) = a(g)a(h). That is, the action is a subgroup of the symmetric group of X that is a homomorphic image of G, but the action retains information about which homomorphism was used.
An action induces—and is equivalently definable as—a function
•: G × X → X
with the property that for all g, h in G and x in X, (gh)•x = g•(h•x). The orbit of x under the action is the set of values g•x over all g in G.
This is all made more visual if we write gx for g•x. Especially when the action is faithful, meaning a is 1-to-1, we can picture the elements of G as mapping x-es directly. Then the orbit is Gx = {gx : g in G}. Some questions to ask are:

- How large is the orbit Gx of a given x?
- How many orbits are there in all?
- Averaged over g in G, how many elements x does g fix?
The latter two questions may seem unrelated. But the LCFB says that their answers are the same:
Lemma 1 Let G act on the set X. Then

(number of orbits) = (1/|G|) ∑_{g in G} |Fix(g)|,

where Fix(g) = {x in X : gx = x} is the set of elements fixed by g.
The key idea of the proof is that orbits are balanced. For any x in X, define its stabilizer by G_x = {g in G : gx = x} and note that this forms a subgroup of G. The orbit-stabilizer theorem states:
Theorem 2 Let G act on the set X and let x in X. Then

|Gx| · |G_x| = |G|.
It follows right away that for all y in the orbit Gx, |G_y| = |G_x|; call this common value m, so that |Gx| · m = |G|. The proof of Lemma 1 comes quickly too:

∑_{g in G} |Fix(g)| = |{(g, x) : gx = x}| = ∑_{x in X} |G_x| = ∑_{orbits O} |O| · (|G|/|O|) = |G| · (number of orbits),

and dividing both sides by |G| gives the Lemma, which is what we needed.
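Both the orbit-stabilizer theorem and the LCFB are easy to check mechanically on any small action. Here is a Python sketch with a toy group of our own choosing, the cyclic group generated by the permutation (0 1 2)(3 4) on five points:

```python
def compose(g, h):
    """Composition of permutations given as tuples: (g.h)[x] = g[h[x]]."""
    return tuple(g[h[x]] for x in range(len(g)))

gen = (1, 2, 0, 4, 3)          # the permutation (0 1 2)(3 4) on {0,...,4}
e = tuple(range(5))
G, g = [e], gen
while g != e:                  # generate the cyclic group <gen>, order 6
    G.append(g)
    g = compose(gen, g)

X = range(5)
orbit = lambda x: frozenset(g[x] for g in G)
stab = lambda x: [g for g in G if g[x] == x]

# Orbit-stabilizer theorem: |Gx| * |G_x| = |G| for every x.
assert all(len(orbit(x)) * len(stab(x)) == len(G) for x in X)

# LCFB: (number of orbits) * |G| = total count of fixed points.
orbits = {orbit(x) for x in X}                 # here: {0,1,2} and {3,4}
fixed = sum(g[x] == x for g in G for x in X)
assert len(orbits) * len(G) == fixed           # 2 * 6 == 12
```

Note how the orbits {0,1,2} and {3,4} have different sizes, yet each is perfectly balanced against its stabilizers.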
We have used a “Theorem” to prove a “Lemma.” Is the “Theorem” obvious? We note substantial discussion about that in a StackExchange post and links from it to a lengthy post by Tim Gowers. We will try our own explanation because it speaks right to aspects of the cycle structure of permutations that we are trying to quantify in new ways.
List out the orbit of x as x_1 = x, x_2, …, x_k. For each i, pick an element g_i of G such that g_i x = x_i. And list out the stabilizer G_x as h_1, …, h_m. We claim that the km products g_i h_j are all distinct elements of G.
If g_i h_j = g_{i'} h_{j'} as elements of G then x_i = g_i h_j x = g_{i'} h_{j'} x = x_{i'}, but by the choice of the x_i this forces i = i'. And then h_j = h_{j'}. Thus the elements are all different.
Now consider any g in G. There is an i such that gx = x_i. Now put h = g_i^{-1} g. Then

hx = g_i^{-1} gx = g_i^{-1} x_i = x.

Therefore h belongs to G_x, so it equals h_j for some j. Thus g = g_i h_j, so we have written g as one of the km products. Thus the products g_i h_j run through G exactly once. It follows further that for each i there are exactly m elements g such that gx = x_i, namely g_i h_j for the different j. It follows that |G| = km = |Gx| · |G_x|, which proves the theorem.
Note that we seem to have avoided appealing to the concepts of order (of an element or group) or quotient that go into statements of Lagrange’s theorem. Of course they are present, but we used the action as a screen to hide them. We didn’t even have to exhibit a 1-1 correspondence between the orbit Gx and the cosets of G_x. It all simply flows from the axiom of inverses and multiplicativeness of the action, which in particular gave us g_i^{-1} gx = g_i^{-1} x_i = x. There is no intermediate notion of how a group can act on a set—it must have perfect balance in orbits.
Here is an example that also shows the difference between the physical and algebraic nature of an action. Consider regular n-gons whose edges are colored one of k colors. Equivalently, we can consider them to have triangular facets colored the same front and back. The physical actions are rotating the polygon right by 360/n degrees and flipping it over. These generate the dihedral group D_n of order 2n. The number of orbits of the action of D_n on the set of k^n tile colorings tells us how many tile types there are.
The LCFB tells us to count the colorings fixed by each group element. Consider n = 4 and k = 2, that is, black and red colorings of a square. This gives 8 permutations and 2^4 = 16 colorings. We count as follows:

- the identity fixes all 16 colorings;
- the two quarter-turn rotations fix 2 each (all-black and all-red);
- the half-turn fixes 4;
- the two flips whose axes pass through opposite facets fix 8 each;
- the two flips that pair up all four facets fix 4 each.

We get 16 + 2 + 2 + 4 + 8 + 8 + 4 + 4 = 48 fixed items, hence the LCFB gives 48/8 = 6 orbits.
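The square example can be verified by brute force. In this Python sketch (our own facet numbering 0–3, with the basic flip chosen to fix two opposite facets), we tally the fixed colorings per symmetry and cross-check by enumerating the orbits directly:

```python
from itertools import product

def compose(g, h):
    return tuple(g[h[x]] for x in range(4))

# D_4 as permutations of the four facets 0..3:
e, r = (0, 1, 2, 3), (1, 2, 3, 0)          # identity and quarter-turn
f = (0, 3, 2, 1)                           # flip fixing facets 0 and 2
rots = [e, r, compose(r, r), compose(r, compose(r, r))]
D4 = rots + [compose(rot, f) for rot in rots]

colorings = list(product("BR", repeat=4))  # 2^4 = 16 black/red squares

def fixed(g):
    """Number of colorings left unchanged by the symmetry g."""
    return sum(all(c[g[x]] == c[x] for x in range(4)) for c in colorings)

total = sum(fixed(g) for g in D4)
assert total == 48                         # fixed items summed over D_4
assert total // len(D4) == 6               # LCFB: six tile types

# Cross-check by listing the orbits themselves:
orbits = {frozenset(tuple(c[g[x]] for x in range(4)) for g in D4)
          for c in colorings}
assert len(orbits) == 6
```

The six orbits are: all black, all red, one black, one red, two adjacent black, and two opposite black.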
Two different representations of a flip element in D_4. |
The bottom part of our figure shows how we could have oriented the square like a diamond before doing the mirror-image flip. Now, however, the coloring shown is not fixed, and only 4 not 8 colorings are preserved. Is this an inconsistency? Although the flip is physically the same, it is algebraically different. Represented as a permutation of the facets 1, 2, 3, 4 in cycle form, the original flip was (2 4), whereas the latter flip is (1 2)(3 4). The latter flip is equal to following the former flip with the quarter-turn rotation (1 2 3 4), and that composite permutation indeed fixed 4 colorings under the former representation of the action of D_4.
This suggests that the cycle structure of the permutations used in the action is what matters. George Pólya observed this to give a generating-function form, which Nicolaas de Bruijn refined and extended, and which allows for weighting the elements. In our example, for any n, it gives a polynomial Z_n that can be used to compute the number of n-gon tile types for any number of colors, without repeating the inspection of fixed elements. The polynomial involves variables x_j raised to the power of the number of cycles of length j in a given permutation. Each permutation gives a monomial and those are summed up and divided by the size of the group. In the case of n = 4 the polynomial is

Z_4 = (1/8)(x_1^4 + 2 x_1^2 x_2 + 3 x_2^2 + 2 x_4).
If we simply substitute each x_j by the number k of colors then we get the number of orbits; for instance, square tiles with 3 colors give 21 orbits. But with colors b and r (black and red, say) we can substitute x_j by b^j + r^j and then the resulting polynomial in b and r gives more information: the coefficient of b^i r^{4-i} counts the tile types with exactly i black facets. Assigning other weights besides 1 to the colors achieves other counting tasks. This article by Nick Baxter serves to introduce a detailed survey by Mollee Huisinga titled “Pólya’s Counting Theory,” which is great for further applications including classifying molecular structures.
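The cycle-index shortcut is easy to sketch in code: substituting x_j := k is the same as averaging k raised to the number of cycles of each permutation. A minimal Python version, again with our own facet numbering for D_4:

```python
def compose(g, h):
    return tuple(g[h[x]] for x in range(len(g)))

# D_4 as permutations of the square's four facets 0..3:
e, r, f = (0, 1, 2, 3), (1, 2, 3, 0), (0, 3, 2, 1)
rots = [e, r, compose(r, r), compose(r, compose(r, r))]
D4 = rots + [compose(rot, f) for rot in rots]

def cycle_count(g):
    """Number of cycles of g, counting fixed points as 1-cycles."""
    seen, n = set(), 0
    for x in range(len(g)):
        if x not in seen:
            n += 1
            while x not in seen:
                seen.add(x)
                x = g[x]
    return n

def num_tile_types(k):
    """Substitute x_j := k in the cycle index of D_4."""
    return sum(k ** cycle_count(g) for g in D4) // len(D4)

assert num_tile_types(2) == 6
assert num_tile_types(3) == 21
```

Once the cycle counts are in hand, no inspection of fixed colorings is repeated for new values of k.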
We are interested in cases involving succinctly-described permutations of exponential-sized sets X. That the general formula for Z_n involves the Euler totient function not only of n but also of divisors of n hints at one source of such cases. In our setting, whether g fixes an element x is decidable in polynomial time, but whether x and y belong to different orbits—and other predicates about orbits—may not be. Thus the LCFB is instrumental to getting the orbit-counting problem and other functions into #P.
What applications of the Lemma Cited From Burnside have been noted in complexity theory? We can ask this more generally about applications of the combinatorial double-counting trick, where one way of doing the counting looks hopeless but the other way at least gives a function or difference of functions.
Toutfait.com source |
Marcel Duchamp was a leading French chess player whose career was sandwiched between two forays into the art world. He played for the French national team in five chess Olympiads from 1924 to 1933. He finished tied for fourth place out of fourteen players in the 1932 French championship.
Today we look afresh at some of his coups in art and chess and find some unexpected depth.
We say “unexpected” because Duchamp was famous for art that consisted of common objects tweaked or thrown together. He called them readymades in English while writing in French. An example is his 1917 Fountain, which Dick and Kathryn saw in Philadelphia last weekend:
SFMOMA replica, Wikimedia Commons source |
Duchamp first submitted this anonymously to a New York exhibition he was helping to organize. When it was refused, he resigned in protest. He then ascribed it to a person named Richard Mutt. Richard means “rich person” in French while “R. Mutt” suggests Armut which is German for “poverty.” A magazine defended the work by saying that whether Mr. Mutt made the fountain—which came from the J.L. Mott Iron Works—has no importance:
He CHOSE it… [and thus] created a new thought for that object.
The new thought led in 2004 to a poll of 500 art professionals voting Fountain “the most influential artwork of the 20th century.” This was ahead of Guernica, Les Demoiselles d’Avignon, The Persistence of Memory, The Dance, Spiral Jetty, just to name a few works of greater creation effort. It was ahead of everything by Alexander Calder or Andy Warhol or math favorite Maurits Escher for that matter. For Duchamp that was quite a coup—which is also French for a move at chess. Fountain is also a kind of coupe—French for “cup” and also snippet or section.
Michelangelo Buonarroti famously declared that “every block of stone has a figure inside it and the sculptor’s task is to discover it.” Dick and I feel most of our peers would disagree with this about sculpture but agree with the same remark applied to mathematics. As Platonists we believe our theorems had proofs in “the book” from the start and that our chipping away at problems is what discovers them.
The paradox is that we nevertheless experience theoretical research as being as creative as Michelangelo’s artwork or Duchamp’s original painting, Nude Descending a Staircase. What accounts for this? We have previously alluded to “builders” versus “solvers” and the imperative of creating good definitions. Builders still need to sense where and when proofs are likely to be available.
Finding a new proof idea is reckoned as the height of creativity despite the idea’s prior existence. This is however rare. Most of us do not invent or re-invent wheels but rather ride wheels we’ve mastered. They may be wheels we learned in school, as pre-fab as Duchamp’s 1913 Bicycle Wheel. The creativity may come from learning to deploy the wheels in a new context:
David Gómez (c) “Duchamp a Hamburger Bahnhof” source, license (photo unchanged) |
Neither of the works pictured above is the original version. The originals were lost, as were second versions made by Duchamp. Duchamp’s third Bicycle Wheel belongs to MoMA, which is adjacent to Dick and Kathryn’s apartment in Manhattan. The Philadelphia Art Museum has the earliest surviving replica of Fountain, certified by Duchamp as dating to 1950. Duchamp blessed fourteen other replicas in the 1960s.
For contrast, Duchamp spent nine years making the original artwork shown behind the board in this crop of an iconic photo:
Cropped from Vanity Fair source |
The nine-foot high construction between glass panes lives in Philadelphia under its English name The Bride Stripped Bare By Her Bachelors, Even. The “even” translates the French même, which is different from mème meaning “meme.” Richard Dawkins coined the latter term in his 1976 book The Selfish Gene. Modern Internet “memes” diverge from Dawkins’s meaning but amplify his book’s emphasis on replication. Both the isolation of concepts and the replication were anticipated by Duchamp.
A wonderful Matrix Barcelona story has the uncropped photo of Duchamp playing the bared Eve Babitz. It also has a film segment with Duchamp and Man Ray which shows how they viewed the world. Duchamp could paint “retinally” as at left below, but this page explains how his vision of the scene shifted a year later. Then his poster for the 1925 French championship abstracts chess itself:
Composite of sources collected here |
Duchamp’s high-level chess activity stopped before the Second World War broke out, but he kept up his interest during it. In 1944-45 he helped organize an exhibition The Visual Imagery of Chess at the Julien Levy gallery in midtown Manhattan. One evening featured a blindfold simultaneous exhibition by the Belgian-American master George Koltanowski:
Composite of src1, src2 |
This composite photo with leafy allusions shows left-to-right Levy (standing), artist Frederick Kiesler, Duchamp executing a move called out by Koltanowski (facing away), art historian Alfred Barr, Bauhaus artist Xanti Schawinsky, composer Vittorio Rieti, and the married artists Dorothea Tanning and Max Ernst. Plus someone evidently looking for a new game. My “assisted readymade” skirts the edge of fair use (non-commercial) with modification. The works I combined each have higher creation cost than the objects Duchamp used. Yet there’s no restraint on combining other people’s theorems—of whatever creation cost—with attribution.
Koltanowski kept the seven games entirely in his head, winning six and drawing one, and performed similar feats well into his eighties. Yet Duchamp once beat him in a major tournament—in only 15 moves—when Koltanowski was in his prime and looking at the board.
So how strong was Duchamp? It is hard to tell because Arpad Elo did not create the Elo rating system until the 1950s and because the records of many of Duchamp’s games are lost. Twenty of Duchamp’s tournaments are listed in the omnibus Chessbase compilation but only four have all his games, only four more include as many as five games, and most including the 1923 Belgian Cup lack even the results of his other games. For these eight events, my chess model assesses Duchamp’s “Intrinsic Performance Ratings” (IPRs) as follows:
The error bars are big but the readings are consistent enough to conclude that Duchamp reached the 2000–2100 range but fell short of today’s 2200 Master rank.
My IPRs for historical players have been criticized as too low because today’s players benefit from greater knowledge of the opening, middlegame dynamics, and endgames. My model does not compensate for this—it credits moves that go from brain to board not caring whether preparing with computers at home put them in the brain. However, a comparison with Koltanowski is particularly apt because Elo himself estimated Koltanowski at 2450 based on his play in 1932–1937. My IPR from every available Koltanowski game in those years is 2485 ± 95. When limited to the major—and complete—tournaments that would have most informed Elo’s estimate, it is 2520 ± 100. The latter does much to suggest that Koltanowski might have merited back then the grandmaster title, which he was awarded honoris causa in 1988. Koltanowski had 2380 ± 165 in the year 1929, including his loss to Duchamp.
Still, Duchamp’s multiple IPR readings over 2000 earn him the rank of expert, which few attain. Duchamp gave himself a different title in 1952:
I am still a victim of chess. It has all the beauty of art—and much more.
Duchamp loved the endgame but is only known to have composed one problem. Fitting for Valentine’s Day, he embellished it with a hand-drawn cupid:
Composite of diagrams from Arena and Toutfait.com
Yet unrequited love may be the theme, for there is no solution. Analysis by human masters has long determined the position to be drawn with the confidence of a human mathematical proof. All the critical action can be conveyed in one sequence of moves: 1. Rg7+ Kf2 2. Ke4 h4 3. Kd5 h3 4. Kc6 h2 5. Rh7 Kg2 6. Kc7 Rg8 (or 6…Rf8 or …Re8 or even …Rxb7+ if followed by 7. Kxb7 f5!) 7. b8Q Rxb8 8. Kxb8 h1Q (or 8…f5 first) 9. Rxh1 Kxh1 10. Kc7 f5 11. b6 f4 12. b7 f3 13. b8Q f2 14. Qb1+ Kg2 15. Qe4+ Kg1 16. Qg4+ Kh2 17. Qf3 Kg1 18. Qg3+ Kh1! when 19. Qxf2 is stalemate and no more progress can be made.
The cupid and signature were on one side of the program sheet for an art exhibition titled “Through the Big End of the Opera Glass.” The other side had the board diagram, caption, and mirror-image words saying, “Look through from other side against light.” The upward arrow was meant as a hint to shoot White’s pawns forward. But by making a mirror image of the position instead, I have found the second of two surprising effects.
The first surprise is that when it comes to verifying the draw with today’s chess programs—which are far stronger than any human player—Duchamp’s position splits them wildly. This is without equipping the programs with endgame tables—just their basic search algorithms.
The Houdini 6 program, which has just begun defending its TCEC championship against a field led by past champions Komodo and Stockfish, takes only 10 seconds on my office machine to reach a “drawn” verdict that it never revises. Here is a snapshot of its analysis approaching depth 40 ply, meaning a nominal basic search 20 moves ahead. That’s enough to see the final stalemate, so Houdini instead tries to box Black in, but by move 4 we can already see Black’s king squirting out. Its latent threat to White’s pawns knocks White’s advantage down to 0.27 of a pawn, which is almost nada:
Komodo stays with the critical line but churns up hours of thinking time while keeping an over-optimistic value for it:
The just-released version 9 of Stockfish, however, gyrates past depth 30, seemingly settles down like Houdini, but then suddenly goes—and stays—bananas:
When Komodo is given only 32MB hash, it gyrates even more wildly until seeming to settle on a +3.25 or +3.26 value at depths 31–34. Then at depth 35 it balloons up to +6.99 and swoons. After 24 hours it is still on depth 35 and has emitted only two checked-down values of +6.12 and +4.00 at about the 8 and 16 hour points.
Now for the second surprise. The mirror-image position at right above changes absolutely none of the chess logic. But when we input it to Stockfish 9, with 32MB hash, it gives a serene computation:
What’s going on? The upshot is that the mirrored position’s different squares use a different set of keys in a tabulation hashing scheme. They give a different pattern of hash collisions and hence a different computation.
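As an illustration of the mechanism, here is a toy model of tabulation (Zobrist) hashing in Python. It is purely illustrative: Stockfish’s real key table uses fixed 64-bit constants and a different board encoding, and the seed, key table, and sample position below are my own inventions. The point is that each (square, piece) pair draws its own key, so a mirrored position XORs together an entirely different set of keys:

```python
import random

# Toy tabulation (Zobrist) hashing; real engines use fixed key constants.
random.seed(2018)                    # fixed seed so the demo is reproducible
# one 64-bit key per (square, piece) pair: 64 squares x 12 piece types
KEYS = [[random.getrandbits(64) for _ in range(12)] for _ in range(64)]

def zobrist(position):
    """position maps square index (0-63) to piece index (0-11)."""
    h = 0
    for sq, piece in position.items():
        h ^= KEYS[sq][piece]
    return h

def mirror(position):
    """Reflect left-right: the piece on file f moves to file 7-f."""
    return {8 * (sq // 8) + (7 - sq % 8): p for sq, p in position.items()}

pos = {0: 3, 9: 5, 63: 10}           # an arbitrary toy position
print(zobrist(pos) != zobrist(mirror(pos)))   # different keys, different hash
```

Since the two hashes land in different table slots, the whole downstream pattern of transposition-table hits and collisions differs, even though the chess content is identical.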
There are two more mirror positions with Black and White interchanged. One is serene (from depth 20 on) but the other blows up at depths 36–40. This is for Stockfish 9 with 32MB hash, “contempt” set to 0, and default settings otherwise. With 512MB hash, both blow up. Since both Stockfish 9 and the Arena chess GUI used to take the data are freely downloadable, anyone can reproduce the above and do more experiments.
There is potential high importance because the large-scale behavior of the hash collisions and search may be sensitive to whether the nearly-50,000 bits making up the hash keys are truly random or pseudorandom. I detailed this and a reproducible “digital butterfly effect” in a post some years ago.
Thus unexpected things happen to computers at high depth in Duchamp’s position. It is not in the Chessbase database, but he may have gotten it “readymade” from playing a game or analyzing one. In all cases we can credit the astuteness of his choosing it.
What will be Duchamp’s legacy in the 21st Century? Chess players will keep it growing. Buenos Aires (where he traveled to study chess in 1919), Rio de Janeiro, and Montevideo have organized tournaments in his honor. It was my pleasure to monitor the 2018 Copa Marcel Duchamp which finished last week in Montevideo. This involved getting files of the games from arbiter Sabrina de San Vicente, analyzing them using spare capacity on UB’s Center for Computational Research, and generating ready-made statistical reports for her and the tournament staff to view.
[a few slight fixes and tweaks; added note about contempt=0]
Adam Engst is the publisher of the site TidBITS, which is dedicated to technical insights about all aspects of Apple machines: from nano to servers.
Today I want to talk about mathematical tidbits, not Apple devices, but look at the above site for information about Apple stuff of all kinds.
By the way:
A “tidbit” is a small and particularly interesting item of gossip or information. Origin: mid 17th century (as tyd bit, tid-bit), from dialect tid “tender” (of unknown origin).
TidBITS has been running since 1990, when Apple made only a certain kind of bulky device that went by the name “personal computer.” The World Wide Web did not exist yet, so the site was distributed via a platform called HyperCard, which combined publishing, presentation, and database functions.
A Math Result: An Interesting Pattern
Another Math Result: A Bad Joke
If you write π out to two decimal places, 3.14, then backwards it spells “pie”:
The New York Times: Downgrades The Size of The Human Genome
In a recent article the Times reported that a certain sequencing effort found that a bacterium had a large genome—so large, in fact, that it was a thousand times larger than our human genome. The article then added that the human genome is about 3.5 million bases. Wrong. It is actually about 3.5 billion bases. I reported this typo to the Times, by the way.
Senator Jack Reed: On CNN Today
I happened to tune in to CNN’s broadcast of the Senate hearing on security today. There Reed asked security experts whether or not the US has a coherent plan for quantum computing. They basically said that they could not say much in an open hearing. I do find it remarkable that quantum computing is mentioned in a Senate hearing.
Gifted: The Movie
This is a movie about a young child, Mary, who is a math savant, and whether or not she should be placed in a special school. I recently watched the movie with my wife Kathryn Farley. We enjoyed it, although it had a pretty simple plot. Indeed, see this:
Colin Covert of the Star Tribune gave the film 3/4 stars, saying, “Sure, it’s a simple, straightforward film, but sometimes that’s all you need as long as its heart is true.” Richard Roeper gave the film 4 out of 4 stars and said, “Gifted isn’t the best or most sophisticated or most original film of the year so far—but it just might be my favorite.”
Mary’s mother had been a promising mathematician, who worked on the famous Navier-Stokes problem, before taking her own life when Mary was six months old. Indeed the plot of the movie hinges on the mother actually having solved the problem. Of course, the movie gives no hint of how she may have done this. One of the key parts of the movie comes when Mary is examined by a math professor to determine whether she really is gifted. Mary is asked to evaluate the definite integral that is written on a chalkboard in a lecture hall. The integral is:
She cannot figure it out and leaves the exam without answering the question. But finally she explains to her grandmother that it is a trick question. The above integral is of course equal to infinity. She comes back to the lecture hall and explains that she was taught never to correct adults. Now she corrects the problem with a minus sign and solves it:
I must point out that when the problem came up I said to my wife that the first integral was incorrect. I thought perhaps the movie had an error in it. But no it was a trick question—very neat. No doubt this was based on advice from the math advisors—who include Terence Tao.
People have said that a mathematician is someone who thinks that
$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$
is obvious. Well, if you multiply two copies of the equation, one with “$y$” in place of “$x$,” you get an integral with $-(x^2 + y^2)$ in the exponent on the left-hand side. Using $dx\,dy = r\,dr\,d\theta$ converts it to polar coordinates, and then you can intuit why what you get equals $\pi$. I wonder if there is a similar test for complexity theorists? Any suggestions?
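Writing the polar-coordinates step out in full:

```latex
\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)^{2}
  = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy
  = \int_{0}^{2\pi}\!\int_{0}^{\infty} e^{-r^2}\,r\,dr\,d\theta
  = 2\pi\cdot\tfrac{1}{2} = \pi .
```

Taking square roots recovers $\sqrt{\pi}$; the factor $r$ from the Jacobian is exactly what makes the inner integral elementary.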
Peter Montgomery is a cryptographer at Microsoft. Just recently, Joppe Bos and Arjen Lenstra have edited a book titled Topics in Computational Number Theory Inspired by Peter L. Montgomery. The chapters range over Montgomery’s work on elliptic curves, factoring, evaluating polynomials, computing null-spaces of matrices over finite fields, and FFT extensions for algebraic groups. Bos and Lenstra say in their intro that most of this was “inspired by integer factorization, Peter’s main research interest since high school.” Factoring, always factoring…
Today we discuss one of Montgomery’s simpler but most influential ideas, going back to a 1985 paper: how to compute products modulo $N$ when you can only do operations modulo $R$.
We’ve previously thought in terms of the “make” and “break” sides of cryptography. Montgomery has certainly done a lot of work on the latter. He is even said to have demonstrated in the 1980s that the Data Encryption Standard as adopted in 1977 was vulnerable to attacks even on IBM PCs.
However, we realize that the “make” side has three parts, two of which the “breakers” often help directly. The first part is the design of new cryptosystems. The only help from breakers is knowing what not to design. The second part however is their implementation. Any mathematical ideas used to make attacks faster might also help speed methods that are safer but were previously clunky. The third part is defense, specifically against timing attacks. Speedier execution is often smoother execution and hence less vulnerable to inference by timing.
The simple function conceived by Montgomery solves a problem whose most common case is as follows:
Compute $ab$ modulo an odd number $N$ using only addition, multiplication, and operations modulo a power of $2$, that is, modulo $R = 2^r$. Pre-computed quantities such as $R^2 \bmod N$ are also allowed.
Of course this is used for implementations of the RSA public-key cryptosystem. With binary notation, mod $R$ means ignoring all but the bottom $r$ bits, and division by $R$ is just a right shift by $r$ bits. Montgomery multiplication has also been used to implement Peter Shor’s famous quantum factoring algorithm here and here. My student Chaowen Guan, who contributed some of the text and equations below, and I are using it to emulate Shor’s algorithm in a model where we translate quantum circuits into Boolean formulas in order to apply heuristic solvers. We have a paper with Amlan Chakrabarti in the Springer LNCS journal Transactions on Computational Systems, which became the destination for this refinement of my old “cookbook” draft with Amlan which I described some years ago.
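These bit-level identities are easy to check in Python (the values of $r$ and $T$ below are toy choices of my own):

```python
# Why choosing R = 2^r makes mod-R arithmetic free:
r = 7
R = 1 << r                      # R = 128
T = 9215                        # arbitrary test value
assert T % R == T & (R - 1)     # mod R keeps only the bottom r bits
assert T // R == T >> r         # dividing by R is a right shift by r bits
print(T % R, T // R)            # -> 127 71
```

On hardware, both operations are single-cycle masks and shifts, which is the whole attraction of working modulo $R$ rather than modulo $N$.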
The basic trick is to compute the following function. Given $N$ and $R$ relatively prime, we can find integers $R^{-1}$ and $N'$ such that $R R^{-1} - N N' = 1$.
Then define the Montgomery reduction of any natural number $T$ by $M(T) = \frac{T + (T N' \bmod R)\, N}{R}$.
We have omitted the last line of the routine which Montgomery called “REDC” in his paper. This last line takes the output of as we have defined it and tests ; upon failure it outputs rather than . We, however, want to avoid even this dependence on . From several papers on avoiding this test, all surveyed in the new book's chapter on “Montgomery Arithmetic From a Software Perspective” by Bos and Montgomery himself, we follow this 2009 paper by Qiong Pu and Xiuying Zhao and section 2.2 of this earlier paper by Colin Walter and Susan Thompson (note the latter’s attention to timing attacks).
First we note that the value is always a natural number. Let where means integer division as in Python. Then . We get:
So is an integer, and by (1) it is positive except that . Hence we can freely compose and iterate . What kind of function is it?
By our last calculation, embodies integer division by both in the implicit use of and the explicit product with . It does not, however, compute . Its actions on numbers decomposed via and via go as follows:
Thus increases by one with only when subtracting from crosses a multiple of . In that sense imitates division by . Moreover, when is relatively prime to then so is because , so if shared a proper divisor with then so would . Thus for such , is relatively prime to and in particular cannot be a multiple of .
It follows with that and . Now presume and let . Then
Thus with , has the same congruence modulo as , while with , .
In particular, for any , if we define and , then
Thus defining
gives a commutative operator that when restricted to acts like multiplication. Moreover, if we have and then with and we have:
Thus the multiplicative behavior of carries through mod on multiplying by modulo . The operator is Montgomery multiplication proper. The idea is that one can do whole iterated products with one division by and one reduction mod per and have only the single reduction mod at the end.
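To make the definitions concrete, here is a small Python sketch of $M$ and the (*) operator. The parameters N = 97 and R = 128 are toy choices of my own for illustration; `pow(-N, -1, R)` (Python 3.8+) supplies the constant $N'$ with $N N' \equiv -1 \pmod R$:

```python
# Toy sketch of Montgomery reduction M and the (*) operator.
N = 97                        # odd modulus (hypothetical small example)
R = 128                       # power of 2 with R > N, gcd(R, N) = 1
Nprime = pow(-N, -1, R)       # N * Nprime == -1 (mod R); Python 3.8+

def M(T):
    """Montgomery reduction: M(T) == T * R^{-1} (mod N), using only
    mod-R arithmetic plus one exact division (right shift) by R."""
    m = (T * Nprime) % R      # bottom 7 bits only
    return (T + m * N) // R   # numerator == 0 (mod R), so this is exact

def star(a, b):               # the (*) operator: a (*) b = M(a*b)
    return M(a * b)

r2 = (R * R) % N              # pre-computed R^2 mod N
abar = star(5, r2)            # == 5*R (mod N): Montgomery form of 5
t = star(abar, 7)             # == 5*7 (mod N): the extra factor R cancels
print(t % N)                  # -> 35
```

Every step inside `M` is a mask, a shift, or a multiply; the single `% N` at the end is the one reduction mod $N$ for the whole product chain.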
But again we want to avoid even that one use. But there are cases where fails. The largest ratio for and with is given for and by . In search of a general result, we are helped by the following inequalities:
The last line follows because prevents . Finally note that if then
To use this iteratively, one wants the output to be and if so return instead. The comparison with is simple to do.
Suppose we want to multiply modulo . We may suppose that each already satisfies . The problem with transforming each to , besides the extra multiplication for each , is going to magnitudes as high as . Transforming to would defeat our purpose because while we can efficiently pre-compute giving , we cannot suppose the reductions mod for the to be pre-computed.
So we plunge ahead and do:
First, how much can this grow? Put . Since can be greater than , the precondition for is not met. But then , so that we preserve the bound .
Note that when is a power of 2 it may need to approach , but that does not affect the upper bound. The issue now is that includes extra factors of , that is, . Hence . If we can compute a number such that and , then a final will give the final targeted value such that .
Thus, modulo , we have reduced iterated multiplication to exponentiation. The extra multiplications—or rather, the extra applications of —beat computing or for each .
When it comes to implementing fast modular exponentiation, we need to square a running product in the sense of computing or something related. To maintain a constant bound given , we need and so that:
so
Presupposing , we can substitute to require
Ignoring lower-order terms gives us , so . The discriminant is so we need , which yields to realize the same upper bound. Since is a power of 2, may need to be chosen nearly to use this guarantee. In hardware contexts this is annoying since may thus need one more machine word than (or else must be restricted to the possible size), but when working with individual bits—or qubits—this extra overhead from squaring is no big deal.
It remains to show how to imitate standard algorithms for fast exponentiation. Here is one, using (*) to mean the Montgomery product defined above. Recall that and can be pre-computed.
fastexp(a,x):                      montyfastexp(a,x):   # needs R > 4N
    A = a                              A = a (*) r_2    # cong. to aR
    X = x                              X = x
    Y = 1                              Y = R mod N
    while X > 0:                       while X > 0:
        if X is even:                      if X is even:
            A = (A * A)                        A = A (*) A
            X = X // 2                         X = X // 2
        else:                              else:
            Y = (A * Y)                        Y = A (*) Y
            X = X - 1                          X = X - 1
    return Y                           return Y (*) 1
By induction, A and Y always have an “extra ” until the return strips it away. They always have magnitude at most , so that the final returns a value . This value equals .
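For concreteness, here is a runnable Python rendition of the right-hand routine. The parameters N = 97 and R = 512 are illustrative choices of mine satisfying R > 4N, and `mont` plays the role of (*):

```python
# Runnable sketch of montyfastexp (toy parameters of my own choosing).
N = 97                        # odd modulus
R = 512                       # power of 2 with R > 4N
Nprime = pow(-N, -1, R)       # N * Nprime == -1 (mod R); Python 3.8+
r2 = (R * R) % N              # pre-computed R^2 mod N

def mont(a, b):               # the (*) operator: M(a*b)
    m = (a * b * Nprime) % R
    return (a * b + m * N) // R

def montyfastexp(a, x):
    A = mont(a, r2)           # convert a to Montgomery form, == a*R (mod N)
    X, Y = x, R % N           # R mod N represents 1
    while X > 0:
        if X % 2 == 0:
            A = mont(A, A)
            X //= 2
        else:
            Y = mont(A, Y)
            X -= 1
    return mont(Y, 1) % N     # strip the extra factor of R

print(montyfastexp(5, 13))    # -> 29, i.e., 5**13 mod 97
```

The bound R > 4N keeps every intermediate value below 2N without any conditional correction inside the loop, matching the timing-attack-resistant variants discussed above.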
An example of why requiring only can fail is , , , : but the right-hand code outputs , which has the right congruence but is not ; there is none with . Checking as in the aforementioned papers, or returning as the last line of our code, makes it work with smaller .
Our little “scholium” has turned up some interesting congruence properties. Does the above suggest any further ideas and savings?
[removed erroneous “Lemma 1”]
Source: previous paper
Cody Murray is a PhD student of Ryan Williams at MIT. He and Ryan have a new paper that greatly improves Ryan’s separation of nonuniform $\mathsf{ACC}$ circuits from uniform nondeterministic time classes. The previous best separation was from $\mathsf{NEXP}$, that is, nondeterministic time $2^{\mathrm{poly}(n)}$. The new one is from $\mathsf{NQP}$, which is nondeterministic time $2^{\mathrm{polylog}(n)}$. The ‘Q’ here means “quasi-polynomial,” not “quantum.”
Today we discuss the new ideas that led to this breakthrough on a previous breakthrough.
Since I live in Buffalo and my hometown Buffalo Bills are named for the frontiersman and showman William Cody, I can’t help making Wild West associations. The complexity frontier has now existed for over 60 years, a fourth of United States history, longer than the time from the Civil War to Arizona becoming the last contiguous state in 1912. So much is still unknown about it that any new avenue of progress commands attention.
The main idea calls to mind the classic movie The Searchers. In the movie the two principals are searching for an abducted small child. Here we are searching for a witness string to an -type predicate that is small in the following sense: it is the truth table of a small circuit . In the first applications using , the length of was exponential in the length of the input and the size of , while polynomial in , was (poly-)logarithmic in . In the new paper, all of the auxiliary size functions are related by operations under which polynomials are closed.
The small nature of the witness depends on strong hypotheses that the proof is ultimately trying to contradict. Ryan’s previous results used the powerful hypothesis $\mathsf{NEXP} \subseteq \mathsf{ACC}$. The present result starts by supposing $\mathsf{NQP} \subseteq \mathsf{ACC}$. What we wish to emphasize first are the clever alterations to previous techniques that enable sweeping over the vast expanses of strings in a new way to reveal a new unconditional lower bound.
Let be a complexity function. We will think of as polynomial or quasi-polynomial but the reasoning works clear up to . With in mind, say that a language is “easy at length '' if there exists an -input circuit of size such that for all , . Say is “hard at '' otherwise. Consider the following conditions:
When encodes a natural problem like , we might expect its hardness not to fluctuate with , so that these conditions have equal force. When (like ) is downward self-reducible, easiness at lengths just below implies easiness at . The easiness does slip to in place of , where is the number of queries in the self-reduction, but this still supports the intuition that easiness and hardness should be evenly spread.
To get a language that meets the third criterion, we can loop over Boolean functions with inputs until we find that has no circuit of size . Then define:
Almost-everywhere hardness may fail technically if, say, the encoding of as uses no odd-length strings, so that is trivially easy at odd . We could fill in the gap by using as the encoding, but this is ad hoc and might not work for other kinds of gaps. We could instead craft notions of being hard on a polynomially dense set , strengthening condition 2 by making easy to decide and denser. Against this backdrop the new “a.a.e.” condition used in the paper is short and neat:
Definition 1 is almost-almost-everywhere hard if there is a polynomial such that for all , either is hard at or is hard at .
We may relax to be a more general function, such as a polynomial composed with other bounds. We assume all bounds in question are increasing functions that are time-constructible or space-constructible as required.
Besides the diagonal language , the paper uses a special -complete set credited to Rahul Santhanam drawing on work by Luca Trevisan and Salil Vadhan. is downward self-reducible, paddable in that for all , and autoreducible in the following sense: there is a probabilistic polynomial-time oracle TM that on inputs makes only queries of length , always accepts when and is the oracle, and when rejects with probability at least 2/3 given any oracle. (The last clause prevents simply having query .)
The new lower bound really comes from a new kind of upper bound using “Merlin-Arthur with advice.” A predicate related to a language has the Merlin-Arthur property if (informally speaking):
One reason this double-barreled quantification enters the scene is that we still don’t know how to separate $\mathsf{NEXP}$ from $\mathsf{P/poly}$, but we do know that the exponential-time Merlin-Arthur class $\mathsf{MAEXP}$ is not in $\mathsf{P/poly}$. The new tweak involves adding a quantifier but will allow dropping the time. It comes from exempting a small piece of from the second (“soundness”) condition, where depends only on the length of :
Definition 2 belongs to if there is a predicate decidable in time such that for all there is a string of length such that for all :
The point is that the condition for is allowed to fail for other strings . The string will in fact either be from the a.a.e. definition or will give the maximum length at which the language mentioned above has circuits of some size depending only on (so exists). Now we can state the lower bound, importing what we’ve said about and previously and just now:
Theorem 3 There are constants such that for all as above, and any auxiliary functions such that , , and , we can construct a language that is a.a.e.-hard.
The proof blends the constructions of and mentioned in the previous section. In particular, it uses the manner in which reduces to under appropriate padding. The reduction maps any string of length to a padded string of length in time. Note that the oracle TM that comes with obeys a kind of condition. We have all the pieces we need to carry out the diagonalization needed for a.a.e.-hardness while obtaining a Merlin-Arthur protocol with advice that works as follows on any input , :
First note that we saved up to time by not computing . The latter case uses the magnitude not the length of in the padding. The proof analyzes the cases.
In the former case, because of how is defined with padding and , there is a circuit of size that decides at length , so Merlin can guess it. In the latter case, Merlin guesses for the length- slice of directly. The property of ensures that for all and the appropriate , either there is a leading Arthur to accept always or all make Arthur reject with probability at least 2/3, so the set of giving the former case defines a language in with the stated time and advice bounds.
The a.a.e. hardness follows from how the protocol either yields the length- slice of or implicitly maps the length- slice of it under the reduction to . In fact, the argument is more delicate and starts by negating both sides of the “a.a.e.” hardness definition for sake of contradiction. For that we refer to the paper.
Let denote the “easy-witness” class of languages such that for all witness predicates for that are decidable in time , and all , there is a circuit of size whose graph is a string such that holds. It doesn’t much matter whether we restrict witnesses to have length a power of 2 or let the graph be for some . Let denote the class of languages with (nonuniform) circuits of size .
Theorem 4 There are universal constants and such that for every and time function , where :
The two triple compositions of (which is called in the paper) foreshadow the proof being a three-day ride. The proof again works by contradiction, and it helps that the negation of the “easy-witness” condition is concrete and helpful: it gives a verifier and an such that there are of length at most giving but none with small circuit complexity. In fact, we get this for infinitely many . The proof finally applies a theorem of Chris Umans that constructs a polynomial-time pseudorandom generator such that whenever has circuit complexity not less than and , all circuits of size give:
where is the output length of in terms of and the length of . This is where the constant comes in, while can be taken as divided by the exponent in the running time of . The generator is used to de-randomize the protocol. This yields a nondeterministic TM whose running time violates an application of the nondeterministic time hierarchy theorem, producing the desired contradiction.
The horses driving the final results come from various families of circuits, which we may suppose obey some simple closure properties. The nub is the speed with which a nondeterministic TM can distinguish -input circuits in the following sense:
Note that distinguishing the all-zero and quarter-dense cases is easy to do randomly in basically time, which converts to deterministic time under certain de-randomizing hypotheses. We only need to achieve this nondeterministically with a little advantage over the brute-force time (which cycles through assignments). The main theorem works for any such that :
Theorem 5
- If -input -circuits of size can be nondeterministically distinguished in time, then there is a such that for all , does not have size- circuits in .
- If -input -circuits of size can be nondeterministically distinguished in time, then for all there is a such that does not have size- circuits in .
The proof applies the easy-witness theorem to a particular verifier constructed by Eli Ben-Sasson and Emanuele Viola, and its easy witnesses lead the charge. By distinguishing the adversaries’ horse colors they lift their cover of darkness and drive them to a final contradiction shootout in the gulch of the nondeterministic time hierarchy theorem. In terms of the first statement’s target time happens to be , which is polynomial in , while the second statement’s time in terms of is , which is quasi-polynomial in .
Note that the second statement pulls a fast one: the order of quantifying and is switched. The gun it draws, however, was already in plain sight from earlier work by Ryan. Let denote circuits plus one layer of threshold gates at the inputs:
Theorem 6 For all there is an such that given circuits of modulus , depth , and size , whether they compute the all-zero function can be decided deterministically in time .
The sheriff holding this gun rides all the circuits out of the highlands of . And if a gunslinger NTM can be found to enforce the first clause in Theorem 5, a trail may open up for showing .
What other consequences follow from this march into new lower-bound territory? Already these movies are serials.
[fixed a subscript, m-vs-n in condition 2, and last sentence before “Open Problems”; fixed LaTeX in algorithm case; fixed Theorem 6.]
As moderator of RSA 2016 panel
Paul Kocher is the lead author on the second of two papers detailing a longstanding class of security vulnerability that was recognized only recently. He is an author on the first paper. Both papers credit his CRYPTO 1996 paper as originating the broad kind of attack that exploits the vulnerability. That paper was titled, “Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems.” Last week, the world learned that timing attacks can jeopardize entire computing systems, smartphones, the Cloud, everything.
Today we give a high-level view of the Meltdown and Spectre security flaws described in these papers.
Both flaws are at processor level. They are ingrained in the way modern computers operate. They are not the kind of software vulnerabilities that we have discussed several times before. Both allow attackers to read any memory location that can be mapped to the kernel—which on most computers allows targeting any desired memory contents. Meltdown can be prevented by software patches—at least as we know it—but apparently no magic bullet can take out Spectre.
Kocher was mentioned toward the end of a 2009 post by Dick titled, “Adaptive Sampling and Timed Adversaries.” This post covered Dick’s 1993 paper with Jeff Naughton on using timing to uncover hash functions. Trolling for hash collisions and measuring the slight delays needed to resolve them required randomized techniques and statistical analysis to extract the information. No such subtlety is needed for Meltdown and Spectre—only the ability to measure time in bulky units.
The attacks work because modern processors figuratively allow cars—or trolleys—to run red lights. An unauthorized memory access will raise an exception but subsequent commands will already have been executed “on spec.” Or if all the avenue lights are green but the car needs to turn at some point, they will still zoom it ahead at full speed—and weigh the saving of not pausing to check each cross-street versus the minimal backtrack needed to find a destination that is usually somewhere downtown.
Such speculative execution leverages extra processing capacity relative to other components to boost overall system speed. The gain in time from having jumped ahead outweighs the loss from computing discarded continuations. The idea of “spec” is easiest to picture when the code has an if-else branch. The two branches usually have unequal expected frequencies: the lesser one may close a long-running loop that the other continues, or may represent failure of an array-bounds test that generally succeeds. So the processor applies the scientific principle of induction to jump always onto the fatter branch, backtracking when (rarely) needed.
Meltdown applies to the red-light situation, Spectre to branches. Incidentally, this is why the ghost in the logo for Spectre is holding a branch:
The logos were designed by Natascha Eibl of Graz, Austria, whose artistic website is here. Four authors of both papers are on the faculty of the Graz University of Technology, which hosts the website for the attacks. The Graz team are mostly responsible for the fix to Meltdown called KPTI for “kernel page-table isolation,” but the Spectre attack is different in ways that make it inapplicable.
There have been articles like this decrying the spectre of a meltdown of the whole chip industry. We’ll hold off on speculating about impending executions and stay with describing how the attacks work.
The Meltdown paper gives details properly in machine code, but we always try to be different, so we’ve expressed its main example in higher-level C code to convey how an attacker can really pull this off.
To retrieve the byte value b at a targeted location x in the kernel’s virtual memory map K, the attacker can create a fresh array A of 256 objects whose width is known to be the page size of the cache. The contents of A don’t matter, only that initially no member has been read from, hence none is cached. The attacker then submits the following code using a process fork or try-catch block:
object Q;                      // loaded into chip memory
byte b = 0;
while (b == 0) {
    b = K[x];                  // violates privilege---so raises an exception
}
Q = A[b];                      // should not be executed but usually is

// continue process after subprocess dies or exception is caught:
int T[256];
for (int i = 0; i < 256; i++) {
    T[i] = the time to find A[i];
}
if T has a clear minimum T[i], output i; else output 0.
What happens? Let’s first suppose b ≠ 0. By “spec” the while-loop exits and the read Q = A[b] happens before the exception kills the child. This read generally causes the contents of A[b] to be read into the cache and causes the system to note this fact about the index b. This note survives when the second part of the code is (legally) executed and causes the measured time T[i] to be markedly low only when i = b, because only that page is cached. Thus the value of b is manifested in the attacker’s code, which can merrily continue executing to get more values.
The reason for the special handling of zero is that a “race condition” exists whose outcome can either transfer K[x] into b or leave b at its initial value of zero. The while-loop keeps trying until the race is won. If the secret value really is zero then the loop will either raise the exception or iterate until a segmentation fault occurs. The latter causes Q = A[0] not to be executed, but then the initial condition that no page of A is cached still holds, so no time T[i] is markedly lower, so 0 is returned after all.
A second key point is that Intel processors allow the speculatively executed code the privilege needed to read K[x]. Again, the Meltdown paper has all the details expressed properly, including how cache memory is manipulated and how to measure each T[i] without dislodging A[b] from its cache. The objects need not be as big as the page size—they only need to be spaced sufficiently far apart in A. This and some other tunings enabled the Meltdown paper’s experiment to read over 500,000 supposedly-protected bytes per second.
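The read-out step at the end (find a clear minimum among the T[i], else report 0) can be mocked up in plain C. The sketch below is a simulation under stated assumptions, not a working exploit: simulate_timings, recover_byte, and the two timing constants are illustrative names and values of ours, with the cache effect modeled rather than actually measured.

```c
#include <limits.h>

/* Illustrative model: a cached page "loads" in 40 time units, an
   uncached one in 300.  These numbers are assumptions, not measurements. */
enum { CACHED_T = 40, UNCACHED_T = 300 };

/* Fill T[i] as if only page A[secret] had been touched speculatively.
   A secret of 0 models the lost race: no page got cached. */
static void simulate_timings(int T[256], int secret) {
    for (int i = 0; i < 256; i++)
        T[i] = (secret != 0 && i == secret) ? CACHED_T : UNCACHED_T;
}

/* The attacker's read-out: a clear minimum T[i] reveals i, else 0,
   matching the last line of the pseudocode above. */
static int recover_byte(const int T[256]) {
    int best = 0;
    for (int i = 1; i < 256; i++)
        if (T[i] < T[best]) best = i;
    int second = INT_MAX;               /* fastest time among the rest */
    for (int i = 0; i < 256; i++)
        if (i != best && T[i] < second) second = T[i];
    return (2 * T[best] < second) ? best : 0;  /* "clear" = at least 2x faster */
}
```

With one page cached, recover_byte finds the lone fast index and returns it; with no page cached it falls back to 0, just as the race-condition discussion requires.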
The Spectre attack combines “spec” with the old bugaboo of buffer overflows. Enforcing array bounds is not only for program correctness but also for securing boundaries between processes. The attacker uses an array B with size bound s and an auxiliary array A. The attacker needs to discover some facts about the victim’s code and arrange that addresses for overflowing B will map into the victim’s targeted memory. The first idea is to induce enough accesses with valid x < s to train the branch predictor to presume compliance, so that it will execute in advance the body of the critical line of code:
if (x < s) { y = A[256*B[x]]; }
To create the delay during which the body will execute speculatively, the attacker next thrashes the cache so that s will likely be evicted from it. Finally, the attacker chooses some x ≥ s and re-enters the critical line. The bounds check causes a cache miss so that the “spec” runs. Not only will it deliver the targeted byte B[x], it will cache it at the spaced-out location A[256*B[x]] (the page size is not involved). Then the value of B[x] is recovered much as with Meltdown.
Spectre is more difficult to exploit but what makes it scarier is that now the out-of-bounds access is not treated as guarded by a higher privilege level: it involves the attacker’s own array A. Even if the attacker has limited access to A, there is a way: Call the critical line with random valid x' a few hundred times. There is a good chance that some valid x' with B[x'] equal to the secret byte B[x] will be found. Since A[256*B[x]] is in the cache, the line y = A[256*B[x']] will then execute faster than the other cases. Detecting the faster access again leaks B[x].
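The probing trick in the last paragraph can also be simulated in C. This is a hedged toy model, not an exploit: a single variable stands in for the cache, speculative_access and probe_is_fast model the out-of-bounds run and the timing test, and the array contents, the bound S = 16, and the function names are all invented for illustration.

```c
enum { S = 16 };                 /* attacker-visible size bound on B       */
static unsigned char B[S + 1] = {
    7, 3, 9, 3, 7, 1, 4, 9, 2, 6, 3, 8, 1, 5, 2, 4,
    9                            /* B[S]: the "secret" byte past the bound */
};

static int cached_byte = -1;     /* which A-page the model cache holds     */

/* Model of the mispredicted out-of-bounds run: A[256*B[x]] gets cached. */
static void speculative_access(int x) { cached_byte = B[x]; }

/* Model of timing the legal access y = A[256*B[xp]]:
   nonzero means "fast", i.e., that A-page was already cached. */
static int probe_is_fast(int xp) { return B[xp] == cached_byte; }

/* Leak B[S] without ever reading it directly in the legal phase: after
   one speculative access, probe pseudo-random valid indices until one
   runs fast, then report the colliding value. */
static int leak_secret(void) {
    speculative_access(S);
    unsigned lcg = 1;            /* tiny deterministic PRNG */
    for (int tries = 0; tries < 1000; tries++) {
        lcg = lcg * 1103515245u + 12345u;
        int xp = (int)(lcg % S); /* always in bounds */
        if (probe_is_fast(xp)) return B[xp];
    }
    return -1;                   /* no collision found */
}
```

Here B happens to contain the secret value at valid indices too, so a collision turns up within a few probes and the secret is leaked purely by observing which legal access runs fast.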
The paper concludes with variants that allow similar timing attacks to operate under even weaker conditions. One does not even need to manipulate the cache beforehand. The last one is titled:
Leveraging arbitrary observable effects.
That is, let O(y) stand for “something using y that is observable when speculatively executed.” Then the following lines of code are all it takes to compromise the victim’s data:
if (x < s) { y = A[256*B[x]]; O(y); }
We’ve talked about timing attacks, but can we possibly devise a concept of timing defenses? On first reflection the answer is no. Ideas of scrambling data in memory space improve security in many cases, but scrambling computations in time seems self-defeating. Changing reported timings randomly and slightly in the manner of differential privacy is useless because the timing difference of the cached item is huge. Computers need to assure fine-grained truth in timing anyway.
Besides timing, there are physical effects of power draw and patterns of electric charge and even acoustics in chips that have been exploited. Is there any way the defenders can keep ahead of the attackers? Can the issues only be fixed by a whole new computing model?
Muhammad Afzal Upal is Chair of the Computing and Information Science Department at Mercyhurst University. He works in machine learning and cognitive science, most specifically making inferences from textual data. In joint papers he has refined a quantitative approach to the idea of postdiction originally suggested by Walter Kintsch in the 1980s.
Today we review some postdictions from 2017 and wish everyone a Happy New Year 2018.
In a 2007 paper, the postdictability of a concept is defined as “the ease with which a concept’s inclusion in the text can be justified after the textual unit containing that concept has been read.” This contrasts with “the ease with which the occurrence of the concept can be predicted prior to the concept having been read.” The main equation defines the extent M(c) to which the concept c—or event, I may add—is memorable by

M(c) = K(D(c) − P(c)),

where D(c) is the postdictability of c, P(c) is the prior likelihood of c emerging, and K is a constant. It says that the concept is most memorable if you couldn’t have predicted it but after you see it you slap your forehead and say, “Ah, of course!” It relates to what makes ideas “stick.”
Mercyhurst is in Erie, Pennsylvania. Erie had lots of snow this past week. Record-breaking snow. More than Buffalo usually gets. We had several relatives and friends who had to drive through it on their way to Michigan and Pittsburgh and points further south. Was that karma? coincidence with this post? memorable in a way that fits the framework?
And how about the Buffalo Bills making the playoffs after a miracle touchdown by the Cincinnati Bengals on fourth-and-12 from midfield in the final minute knocked out the Baltimore Ravens? In designing a two-week unit on data science for my department’s new team-taught Freshman Seminar on “The Internet,” I had the foresight to use years-since-a-team’s-last-playoff-win (not last playoff appearance) as the definition of “playoff drought” in activity examples. Hence—unless the Bills upset the Jacksonville Jaguars next Sunday—the local “nudge” of my materials will work equally well for next fall’s offering. Can one quantify my prescience as prediction? postdiction? Let’s consider some more-germane examples.
Last January we did not do a predictions or year-in-review post as we had done in all seven previous years. We were caught up in questions over László Babai’s graph isomorphism theorem and other matters. Several predictions were recurring, so let’s suppose we made them also for 2017:
Since some of our perennial questions have entered a steady state, it is time to find new categories. A week-plus ago we noticed that Altmetric publishes a top-100 list based on their “Altmetric Attention Score” every November 15. So it is natural to suppose we postulated:
The answer with regard to the 2017 list is “yes” but the reason is unfortunate—it is Norbert Blum’s incorrect paper coming in at #38. Blum gave a formal retraction and subsequent explanation, which we added as updates to our own item on the claim. The only (other) paper labeled “Information and Computer Sciences” is the AlphaGo Zero paper at #74. Actually, Blum’s paper was tagged “Research and Reproducibility.”
AlphaGo Zero and most recently AlphaZero spring to mind. With much swallowing of pride from my having started out as a chess-player in the early 1970s when computers were minimal, I’m not sure that games of perfect information should ultimately be regarded as “human-centric.” Based on my current understanding of the AlphaZero paper and comments by Gwern Branwen in our previous post, what strikes me as the most stunning fact is the following:
Chess can be encoded into a ‘kernel’ of well under 1GB such that the kernel + small search comprehensively outperforms an almost 1,000x larger search.
More on the human-centric side, however—and allowing supervision—the most surprise and attention seems to have gone to the 2017 Stanford study adapting off-the-shelf facial-analysis software to distinguish sexual orientation from photographs with accuracy upwards of 90%, compared to human subjects at 52% from a balanced sample, which is barely better than random. For utility we would nominate LipNet, which achieves over 95% accuracy in lip-reading from video data, but the paper dates to December 2016.
The lip-reading success may be the more predictable. The extent to which it and the “gaydar” application are postdictable appears to be the same as our reaching a community understanding of what deep neural nets are capable of—which does not require being able to explain how they work. Setting up grounds beforehand for the justification by which Upal and his co-authors define postdiction might be a fair way of “giving credit for a postdiction.”
Per Lance Fortnow in his own 2017 review, the complexity result of the year is split between two papers claiming to prove full dichotomy for nonuniform CSPs—where dichotomy means that they are either in P or NP-complete. Meanwhile we have devoted numerous posts to Jin-Yi Cai’s work on dichotomy between P and #P-completeness, including recently. So can we get some credit for prediction? or for postdiction? Anyway, we make it a prime prediction for 2018 that there will be notable further progress in this line.
We specify quantum supremacy to mean building a physical device that achieves a useful algorithmic task that cannot be performed in equal time by classical devices using commensurate resources. The words “useful” and “commensurate” are subjective but the former rules out stating the task as “simulating natural quantum systems” and furthers John Preskill’s emphasis in his original 2012 supremacy paper that the quantum device must be controllable. The latter rules out using whole server farms to match what a refrigerator-sized quantum device can do. The notion involves concrete rather than asymptotic complexity, so we are not positing anything about the hardness of factoring, and intensional tasks like Simon’s Problem don’t count—not to mention our doubts on the fairness of its classical-quantum comparison. Aram Harrow and Ashley Montanaro said more about supremacy requirements in this paper.
Our “postdiction” gets a yes for 2017 from the claims in this Google-led paper that 50-qubit devices would suffice to achieve supremacy and are nearly at hand, versus this IBM-led rebuttal showing that classical computers can emulate a representative set of 50-qubit computations. The notion of emulation allows polling for state information of the quantum circuit computation being emulated, so this is not even confronting the question of solving the task by other means—or proving that classical resources of some concrete size cannot solve all length-n cases of the task at all. Recent views on the controversy are expressed in this November 14 column in Nature, which links this October 22 post by Scott Aaronson (see also his paper with Lijie Chen, “Complexity-Theoretic Foundations of Quantum Supremacy Experiments”) and this December 4 paper by Cristian and Elena Calude which evokes the Google-IBM case.
That is, the notions of algorithm and protocol will fuse into greater structures with multiple objectives besides solving a task. In 2016 we noted Noam Nisan’s 2012 Gödel Prize-winning paper with Amir Ronen titled “Algorithmic Mechanism Design.” Noam’s 2016 Knuth Prize citation stated, “A mechanism is an algorithm or protocol that is explicitly designed so that rational participants, motivated purely by their self-interest, will achieve the designer’s goals.” In November we covered mechanisms for algorithmic fairness. There is a nicely accessible survey titled “Algorithms versus mechanisms: how to cope with strategic input?” by Rad Niazadeh who works in this area. It alloys techniques from many veins of theory and has a practical gold mine of applications. What we are watching for is the emergence of single powerful new ideas from this pursuit.
We see some sign of this in the dichotomy-for-CSPs result, but we have thoughts that we will talk more about in a later post.
What concepts do you think will have the highest memorability in 2018?
YouTube 2015 lecture source
David Silver is the lead author on the paper, “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” which was posted twelve days ago on the arXiv. It announces an algorithm called AlphaZero that, given the rules of any two-player game of strategy and copious hardware, trains a deep neural network to play the game at skill levels approaching perfection.
Today I review what is known about AlphaZero and discuss how to compare it with known instances of perfect play.
AlphaZero is a generalization of AlphaGo Zero, which was announced last October 18 on the Google DeepMind website under the heading “Learning From Scratch.” A paper in Nature, with Silver as lead author, followed the next day. Unlike the original AlphaGo, whose victory over the human champion Lee Sedol we covered, AlphaGo Zero had no input other than the rules of Go and some symmetry properties of the board. From round-the-clock self-play it soon acquired as tutor the world’s best player—itself.
The achievements in Go and Shogi—the Japanese game whose higher depth in relation to Western chess we discussed three years ago—strike us as greater than AlphaZero’s score of 28 wins, 72 draws, and no losses against the champion Stockfish chess program. One measure of greatness comes from the difference in Elo ratings between the machine and the best human players. AlphaGo Zero’s measured rating of 5185 is over 1,500 points higher than the best human players on the scale used in Go. In Shogi, the paper shows AlphaZero zooming toward 4500 whereas the top human rating shown here as of 11/26/17 is 2703, again a difference of over 1,500. In chess, however, as shown in the graphs atop page 4 of the paper, AlphaZero stays under 3500, which is less than 700 ahead of human players.
Although AlphaZero’s 64-36 margin over Stockfish looks like a shellacking, it amounts to only 100 points difference on the Elo scale. The scale was built around the idea that a 200-point difference corresponds to about 75% expectation for the stronger player—and this applies to all games. Higher gains become multiplicatively harder to achieve and maintain. This makes the huge margins in Go and Shogi all the more remarkable.
There has been widespread criticism of the way Stockfish was configured for the match. Stockfish was given 1 minute per move regardless of whether it was an obvious recapture or a critical moment. It played without its customary opening book or endgame tables of perfect play with 6 or fewer pieces. The 64 core threads it was given were ample hardware but they communicated position evaluations via a hash table of only one gigabyte, a lack said to harm the accuracy of deeper searches. However hobbled, what stands out is that Stockfish still drew almost three-fourths of the games, including exactly half the games in which it played Black.
I have fielded numerous queries these past two weeks about how this affects my estimate that perfect play in chess is rated no higher than 3500 or 3600, which many others consider low. Although the “rating of God” moniker is played up for popular attention, it really is a vital component in my model: it is the y-intercept of regressions of player skill versus model parameters and inputs. I’ve justified it intuitively by postulating that slightly randomized versions of today’s champion programs could score at least 10–15% against any strategy. I regard the ratings used for the TCEC championships as truer to the human scale than the CCRL ratings. TCEC currently rates the latest Stockfish version at 3226, then 3224 for Komodo and 3192 for the Houdini version that won the just-completed 10th TCEC championship. CCRL shows all of Houdini, Komodo, and an assembly-coded version of Stockfish above 3400. Using the TCEC ratings and the official Elo “p-table” implies that drawing 2 or 3 of every 20 games holds the stronger player to the 3500–3600 range.
Of course, the difference from Go or Shogi owes to the prevalence of draws in chess. One ramification of a post I made a year ago is that the difference is not merely felt at the high end of skill. The linear regressions of Elo versus player error shown there are so sharp that the y-intercept is already well determined by the games of weaker players alone.
Overall, I don’t know how the AlphaZero paper affects my estimates. The Dec. 5 paper is sketchy and only 10 of the 100 games against Stockfish have been released, all hand-picked wins. I share some general scientific caveats voiced by AI researcher and chess master Jose Camacho-Collados. I agree that two moves by AlphaZero (21. Bg5!! and 30.Bxg6!! followed by 32.f5!! as featured here) were ethereal. There are, however, several other possible ways to tell how close AlphaZero comes to perfection.
One experiment is simply to give AlphaZero an old-fashioned examination on test positions for which the perfect answers are known. These could even be generated in a controlled fashion from chess endgames with 7 or fewer pieces on the board, for which perfect play was tabulated by Victor Zakharov and Vladimir Makhnichev using the Lomonosov supercomputer of Moscow State University. Truth in those tables is often incredibly deep—in some positions the win takes over 500 moves, many of which no current chess program (not equipped with the tables), let alone human player, would find. Or one can set checkmate-in-n problems that have stumped programs to varying degrees. The question is:
With what frequency can the trained neural network plus Monte Carlo Tree Search (MCTS) from the given position find the full truth in the position?
The trained neural network supplies original probabilities for each move in any given position p. AlphaZero plays games against itself using those probabilities and samples the results. It then adjusts parameters to enhance the probabilities of moves having the highest expectation in the sample, in a continuous and recursive manner for positions encountered in the search from p. The guiding principle can be simply stated as:
“Nothing succeeds like success.”
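The principle can be caricatured in a few lines of C. This is a toy scalar analogue under invented constants, nothing like AlphaZero's actual network update: two candidate moves have hidden success rates, play is sampled from the current belief, and probability mass shifts a little toward whatever just succeeded.

```c
/* Toy "nothing succeeds like success" loop.  The hidden success rates,
   the step size 0.02, and the PRNG are all illustrative assumptions. */
static double toy_self_play(int rounds) {
    const double truth[2] = { 0.90, 0.20 }; /* hidden quality of moves 0, 1 */
    double p = 0.5;                 /* current probability of playing move 0 */
    unsigned lcg = 12345;           /* small deterministic PRNG */
    for (int t = 0; t < rounds; t++) {
        lcg = lcg * 1103515245u + 12345u;
        double u = ((lcg >> 8) & 0xffff) / 65536.0;
        int move = (u < p) ? 0 : 1; /* sample a move from the current belief */
        lcg = lcg * 1103515245u + 12345u;
        double v = ((lcg >> 8) & 0xffff) / 65536.0;
        int success = (v < truth[move]);
        /* enhance the probability of whichever move just succeeded */
        double target = (success == (move == 0)) ? 1.0 : 0.0;
        p += 0.02 * (target - p);
    }
    return p;                       /* drifts well above 0.5, favoring move 0 */
}
```

After a few thousand rounds p settles near 0.9: the sampler has taught itself to prefer the stronger move, a scalar shadow of adjusting network parameters toward moves with the highest sampled expectation.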
We must pause to reflect on how clarifying it is that this single heuristic suffices to master complex games—games that also represent a concrete face of asymptotic complexity insofar as their size n-by-n generalizations are polynomial-space hard. A famous couplet by Piet Hein goes:
Problems worthy of attack
prove their worth by hitting back.
It may be that we can heuristically solve some NP-type problems better by infusing an adversary—to make a PSPACE-type problem that hits back—and running AlphaZero.
As seekers of truth, however, we want to know how AlphaZero will serve as a guide to perfection. We can regard a statement of the form, “White can win in 15 moves” (that is, 29 moves counting both players) as a theorem for which we seek a proof. We can regard the standard alpha-beta search backbone as one proof principle and MCTS as another. Which ‘logic’ is more powerful and reliable in practice?
A second way to test perfection is to take strategy games that are just small enough to solve entirely, yet large enough that stand-alone programs cannot play perfectly on-the-fly. One candidate I offer is a game playable with chess pawns or checkers on a board with 5 rows and n columns, where perhaps n can be set to achieve the small-enough/large-enough balance. I conceived this 35 years ago at Oxford when a smaller n seemed right for computers of the day. The starting position is: [diagram omitted]
Each player’s pawns move one square forward or may “hop” over an opposing piece straight forward or diagonally forward. If some hop move is legal then the player must make a hop move. The hopped-over piece remains on the board. If a pawn reaches the last row it becomes a king and thereupon moves or hops backwards. No piece is ever captured.
The goal is to make your opponent run out of legal moves. If a king reaches the player’s first row it can no longer move. This implies a fixed horizon on a player’s count of moves. The trickiest rules involve double-hops: If a single hop and double hop are available then the double hop is not mandatory, but if a pawn on the first row begins a double hop it must complete it. Upon becoming a king after a hop, however, making a subsequent return hop is optional, except that a king that makes the first leg of a returning double hop must make the second leg. A final optional rule is to allow a king to move one cell back diagonally as well as straight.
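The central mandatory-hop rule is easy to capture in code. Below is a minimal C sketch under assumptions of my own: a 5-by-5 character board (an illustrative choice of board width), pawns only, no kings, double hops, or optional rules, with White moving toward higher row numbers.

```c
enum { ROWS = 5, COLS = 5, MAXM = 64 };

typedef struct { int r0, c0, r1, c1, is_hop; } Move;

/* Generate White's pawn moves on a board of 'W', 'B', '.'.  If any hop
   is available, the mandatory-hop rule discards all plain moves. */
static int white_moves(char bd[ROWS][COLS], Move out[MAXM]) {
    int n = 0, hops = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++) {
            if (bd[r][c] != 'W') continue;
            /* plain move: one square straight forward onto an empty cell */
            if (r + 1 < ROWS && bd[r + 1][c] == '.')
                out[n++] = (Move){ r, c, r + 1, c, 0 };
            /* hops over an adjacent Black piece, straight or diagonal;
               the hopped-over piece stays on the board */
            for (int dc = -1; dc <= 1; dc++) {
                int r2 = r + 2, c2 = c + 2 * dc;
                if (r2 >= ROWS || c2 < 0 || c2 >= COLS) continue;
                if (bd[r + 1][c + dc] == 'B' && bd[r2][c2] == '.') {
                    out[n++] = (Move){ r, c, r2, c2, 1 };
                    hops++;
                }
            }
        }
    if (!hops) return n;
    int k = 0;                      /* keep only the hop moves */
    for (int i = 0; i < n; i++)
        if (out[i].is_hop) out[k++] = out[i];
    return k;
}
```

On a position where one pawn can hop and another has a free forward step, only the hop is returned, enforcing “if some hop move is legal then the player must make a hop move.”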
From the starting position, White can force Black to hop four times in four moves by moving to a3, a4, d3, and d4. Then White still has the initiative and can either make a king or force another hop; the latter, however, forces White to hop diagonally in return. This seemed like a powerful beginning but the subsequent Black initiative also looked strong. My playing partners at Oxford and I found that positional considerations—making the opponent’s pieces block themselves—mattered as much as the move-count allowance. This made it challenging and fun enough for casual human play, but we knew that computers should make quick work of it.
The point of using this or some other solved game would be to compare the strategy produced by the AlphaZero procedure against perfection—and against programs that use traditional search methods.
What do you think are the significances of the AlphaZero breakthrough?