Cropped from source.
Thales of Miletus may—or may not—have accurately predicted one or more total solar eclipses in the years 585 through 581 BCE.
Today we discuss the nature of science viewed from mathematics and computing. A serious claim by Norbert Blum has shot in front of what we were planning to say about next Monday’s total solar eclipse in the US.
Predicting eclipses is often hailed as an awakening of scientific method, one using mathematics both to infer solar and lunar cycles and for geometrical analysis. The aspects of science that we want to talk about are not “The Scientific Method” as commonly expounded in step-by-step fashion but rather the nature of scientific knowledge and human pursuits of it. We start with an observation drawn from a recent article in the Washington Post.
Despite several thousand years of experience predicting eclipses and our possession of GPS devices able to determine locations to an accuracy of several feet, we still cannot predict the zone of totality any closer than a mile.
The reason is not any fault on Earth but with the Sun: it billows chaotically, and for all we know a swell may nip its surface yea-far above the lunar disk at any time. Keeping open even a sliver of the nuclear furnace changes the character of the experience.
The Post’s article does a public service of telling people living on the edge of the swath not to think it is a sharp OFF/ON like a Boolean circuit gate. People must not always expect total sharpness from science. Happily there is a second point: you don’t have to drive very far to get a generous dose of totality. This is simply because as you move from the edge of a circle toward the center, the left-to-right breadth of the interior grows initially very quickly. This is our metaphor for how science becomes thick and solid quickly after we transit the time of being on its edge.
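To see how quickly the interior thickens, note that at depth $d$ in from the edge of a circle of radius $r$, the half-chord is $\sqrt{2rd - d^2}$, which grows like $\sqrt{d}$ near the edge. Here is a quick numerical check in Python; the 35-mile radius is our own illustrative stand-in for roughly half the swath's width:

```python
import math

def half_chord(r, d):
    """Half-width of a circle of radius r at depth d in from the edge."""
    return math.sqrt(2 * r * d - d * d)

# Hypothetical umbra of radius 35 miles (roughly a 70-mile-wide swath):
r = 35.0
depths = [1, 2, 5, 10]
widths = [half_chord(r, d) for d in depths]
# Moving just 1 mile in from the edge already yields a half-chord of
# about 8.3 miles; the square-root growth is steepest right at the rim.
assert widths[0] > 8
```

So driving a single mile in from the edge already buys more than eight miles of breadth, which is the point of the metaphor.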
Incidentally, your GLL folks will be in the state of New York next week, nowhere near the swath. Next time in Buffalo. Also incidentally, Thales is credited as the first person to discover mathematical theorems, namely that a triangle made by a circle’s diameter and another point on the circle is a right triangle, and that lengths of certain intersecting line segments are proportional.
The transit time is our focus on this blog: the experience of doing research amid inspirations and traps and tricks and gleams and uncertainty. Swaths of our community are experiencing another transit right now.
Norbert Blum has claimed a proof that P is not equal to NP. In his pedigree is holding the record for a concrete general Boolean circuit lower bound over the full binary basis for over 30 years—until it was recently nudged from $3n - o(n)$ to $(3 + 1/86)n - o(n)$. His paper passes many filters of seriousness, including his saying how his proof surmounts known barriers. Ken and I want to know what we all want to know: is the proof correct?
More generally, even if the proof is flawed, does it contain new ideas that may be useful in the future? Blum’s proof claims a very strong exponential lower bound on the circuit complexity of deciding whether a graph has a clique of a given size. He gets a similarly strong lower bound for another function, where a tilde in the asymptotic notation means up to factors of $\log n$ in the exponent. We would be excited if he had even proved that this function has a super-linear Boolean complexity.
Blum’s insight is that the approximation methods used in monotone complexity on the clique function can be generalized to non-monotone complexity. It is launched by technical improvements to these methods in a 1999 paper by Christer Berg and Staffan Ulfberg. This is the very high level of what he tries to do, and is the one thing that we wish to comment on.
Looking quickly at the 38-page argument, an issue arose in our minds, and we thought we would share it. It is not a flaw; it is an issue that we think needs to be thought about more expressly.
As we understand his proof, it takes a Boolean circuit for some monotone function $f$ and places its gates in some topological order. Let this be $g_1, g_2, \ldots, g_s$.
So far nothing unreasonable. Note $g_s$ is equal to $f$, of course. Then it seems that he uses an induction on the steps of the computation. Let $I_t$ be the information that he gathers from the first $t$ steps. Technically $I_t$ tells us something about the computation so far. The punch line is then that $I_s$ tells us something impossible about $g_s$, which is of course $f$. Wonderful. This implies the claimed lower bound on the circuit size $s$, which solves the question.
The trouble with this is the following—we studied this before and it is called the “bait and switch” problem. Let $g$ be some random function of polynomial Boolean complexity and let $h = f \oplus g$. Then assume that there is a polynomial size circuit for $f$. Clearly there is one for $g$ and $h$ too. Create a circuit that mixes the computing of $g$ and $h$ in some random order. Let the last step of the circuit be: take $g$ and $h$ and form $g \oplus h$. Note this computes $f$.
The key point is this:
No step of the computation along the way has anything obvious to do with $f$. Only at the very last step does $f$ appear.
This means intuitively to us that an inductive argument that tries to compute information gate by gate is in trouble. How can the $I_t$’s that the proof computes have any information about $f$ during the induction? This is not a “flaw” but it does seem to be a serious issue.
If nothing else we need to understand how the information suddenly at the end unravels and reveals information about $f$. This issue is troubling—at least to us. It is important to note that this trick cannot be applied to purely monotone computations, since the last step must be non-monotone—it must compute the XOR. The old post also notes a relation between the standard circuit complexity of a function and the monotone complexity of a related function.
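The bait-and-switch construction can be played with concretely on tiny truth tables. Here is a toy sketch of ours in Python, with majority standing in for the clique function; it only illustrates the masking phenomenon, not Blum’s actual setting:

```python
import itertools
import random

random.seed(0)
n = 4
inputs = list(itertools.product([0, 1], repeat=n))

# A monotone "target" f (here majority, standing in for clique),
# a random mask g of trivially small complexity, and h = f XOR g.
f = {x: int(sum(x) > n // 2) for x in inputs}
g = {x: random.randint(0, 1) for x in inputs}
h = {x: f[x] ^ g[x] for x in inputs}

# A circuit that evaluates g and h (in any interleaved order) and only
# at the very last gate forms g XOR h -- which equals f.
def bait_and_switch(x):
    a = g[x]      # steps computing g: no visible relation to f
    b = h[x]      # steps computing h: f is "masked" by g
    return a ^ b  # only the final gate reveals f

assert all(bait_and_switch(x) == f[x] for x in inputs)
```

Every intermediate value is either part of the random $g$ or the masked $h$, yet the output is exactly $f$.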
While we are grappling with the paper and writing these thoughts we are following an ongoing discussion on StackExchange and in comments to a post by Luca Trevisan, a post by John Baez, and a Hacker News thread, among several other places.
The paper has a relatively short “crunch” in its sections 5 and 6, pages 25–35. These follow a long section 4 describing and honing Berg and Ulfberg’s work. What the latter did was show that a kind of circuit approximation obtained via small DNF formulas in Alexander Razborov’s famous lower-bound papers (see also these notes by Tim Gowers) can also be obtained with small CNF formulas. What strikes us is that Blum’s main theorem is literally a meta-theorem referencing this process:
Theorem 6: Let $f$ be any monotone Boolean function. Assume that there is a CNF-DNF-approximator $\mathcal{A}$ which can be used to prove a lower bound for the monotone circuit complexity of $f$. Then $\mathcal{A}$ can also be used to prove the same lower bound for the non-monotone circuit complexity of $f$.
The nub being discussed now is whether this theorem is “self-defeating” by its own generality. There may be cases of functions that meet the hypotheses but have polynomial circuit complexity. The StackExchange thread is discussing this for functions of Boolean strings denoting $n$-node, $m$-edge graphs that give value $1$ whenever the graph is a $k$-clique (with no other edges) and $0$ when it is a complete $(k-1)$-partite graph. Some such functions related to the theta function of László Lovász (see also “Theorem 1” in this post for context) have polynomial complexity, meet the conditions of Razborov’s method, and don’t appear to obstruct Berg and Ulfberg’s construction as used by Blum. But if they go through there, and if Blum’s further constructions using an inductively defined function go through transparently, then there must be an error.
The details of the construction in section 5 have also been called into question. We are unsure what to say about a claim by Gustav Nordh that carrying out the inductive construction as written yields a false conclusion: that a certain monomial is an implicant of a formula equivalent to the function being computed. There are also comments about unclarity of neighboring definitions, including this from Shachar Lovett on Luca’s blog since we drafted this section.
But this leads us to a larger point. Both of us are involved right now with painstaking constructions involving quantum circuits and products of permutations that we are programming (in Python). Pages 27–28 of Blum’s paper give a construction that can be programmed. If this is done enough to crank out some examples, then we may verify that potential flaws crop up or alternatively bolster confidence in junctures of the proof so as to focus on others first. This ability is one way we are now empowered to sharpen “fuzzy edges” of our science.
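As a tiny instance of that empowerment, here is a brute-force implicant checker of our own devising; something like it could vet small cases of claims such as Nordh’s (the function names and encoding are ours, not Blum’s):

```python
import itertools

def is_implicant(monomial, formula, n):
    """Check by brute force whether a monomial implies a Boolean formula.

    monomial: dict mapping variable index -> required bit (partial assignment)
    formula:  function taking a full 0/1 tuple of length n
    """
    for x in itertools.product([0, 1], repeat=n):
        if all(x[i] == b for i, b in monomial.items()) and not formula(x):
            return False  # monomial satisfied but formula false
    return True

# Example: x0*x1 is an implicant of (x0 OR x1), but x0 alone is not an
# implicant of (x0 AND x1).
f_or = lambda x: x[0] | x[1]
f_and = lambda x: x[0] & x[1]
assert is_implicant({0: 1, 1: 1}, f_or, 2)
assert not is_implicant({0: 1}, f_and, 2)
```

The exponential loop is fine for the handful of variables one would use when cranking out examples by hand.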
Is the proof correct? Or will it fall into eclipse? We will see shortly, no doubt. Comparing this table of eclipses since 2003 and Gerhard Woeginger’s page of claimed proofs over mostly the same time period, we are struck that “P = NP” and “P ≠ NP” claims have been about twice as frequent as lunar and solar eclipses, respectively.
[restored missing links; a few word and format changes]
Composite of src1, src2.
Olivier Bournez and Amaury Pouly have proved an interesting theorem about modeling physical systems. They presented their paper at ICALP 2017 last month in Warsaw.
Today Ken and I wish to explain their theorem and its possible connections to complexity theory.
Of course as theorists we are most interested in discrete systems and rarely if ever mess with differential equations. I do recall, with some awe, that when I started my career at Yale the numerical analysts were experts at ODEs: that’s ordinary differential equations for the rest of us. An ODE is an equation that involves a function of one independent variable and its derivatives. A famous one is Isaac Newton’s second law of motion: $m \frac{d^2x}{dt^2} = F$.
They used their ability to guess solutions to discrete recurrence systems. I do not believe there is an exact meta-theorem connecting discrete recurrences with ODEs, but at the heuristic level they were able to just look at a recurrence and say, “I believe the solution is thus-and-so.” And they were usually right.
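The guess-and-verify heuristic is easy to mechanize today. A minimal Python sketch, using the recurrence $a(n) = 3a(n-1) - 2a(n-2)$ with guessed closed form $2^n - 1$ (our example, not one of theirs):

```python
# Guess-and-verify for a discrete recurrence, in the spirit of the
# numerical analysts' heuristic: a(0) = 0, a(1) = 1,
# a(n) = 3*a(n-1) - 2*a(n-2).  Guessed closed form: a(n) = 2**n - 1.
def a(n, memo={0: 0, 1: 1}):
    # mutable default dict used deliberately as a memo table
    if n not in memo:
        memo[n] = 3 * a(n - 1) - 2 * a(n - 2)
    return memo[n]

guess = lambda n: 2**n - 1
assert all(a(n) == guess(n) for n in range(30))
```

Thirty matching values do not prove the formula, but a failed guess is refuted instantly, which is most of the heuristic’s value.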
Let’s get back to what Bournez and Pouly (BP) have proved.
Let’s first state what BP proved and then discuss their result. Their result is an extension of a 1981 theorem of Lee Rubel, which Bournez and Pouly call “an astonishing fact.” Rubel proved there is a differential equation that is “universal” in the sense expressed by his theorem statement:
Theorem 1 There exists a fourth order differential algebraic equation (DAE)
$$P(y', y'', y''', y'''') = 0 \qquad (*)$$
where $P$ is a polynomial in four variables with integer coefficients such that for any continuous function $f$ and any $\epsilon > 0$ there is a smooth solution $y$ to (*) such that for all real $t$, $|y(t) - f(t)| < \epsilon$.
Rubel actually proved more: the theorem allows $\epsilon$ to be any continuous positive function $\epsilon(t)$, so the error between the solution $y$ and $f$ can decay off as $t$ tends to plus or minus infinity. He also exhibited specific polynomials $P$. If we rename the differentials $y', y'', y''', y''''$ to variables $a, b, c, d$, respectively, then Rubel’s simplest one is:
$$3a^4bd^2 - 4a^4c^2d + 6a^3b^2cd + 24a^2b^4d - 12a^3bc^3 - 29a^2b^3c^2 + 12b^7 = 0.$$
Bournez and Pouly note that simpler ones were found by others, including two whole families by Richard Duffin and one in a neat paper by Keith Briggs where one can take any $n > 3$:
$$a^2 d - 3abc + 2(1 - n^{-2})b^3 = 0.$$
Note that all three have the same “differential monomials” and approach each other as one chooses $n$ higher, as is most obvious on multiplying by $n^2$.
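One structural fact is easy to verify by machine: every term of Rubel’s polynomial, as usually quoted, has total degree 7 and “derivative weight” 14 when $y^{(k)}$ carries weight $k$. A short Python check (the monomial encoding is ours):

```python
# Rubel's universal DAE, with (a,b,c,d) = (y',y'',y''',y''''), is usually
# quoted as 3a^4bd^2 - 4a^4c^2d + 6a^3b^2cd + 24a^2b^4d
#           - 12a^3bc^3 - 29a^2b^3c^2 + 12b^7 = 0.
# Each entry is (coefficient, (exp_a, exp_b, exp_c, exp_d)).
rubel = [(3, (4, 1, 0, 2)), (-4, (4, 0, 2, 1)), (6, (3, 2, 1, 1)),
         (24, (2, 4, 0, 1)), (-12, (3, 1, 3, 0)), (-29, (2, 3, 2, 0)),
         (12, (0, 7, 0, 0))]

# Every term has total degree 7 and "derivative weight" 14, where y^(k)
# carries weight k.  The coefficients also sum to zero, which is why
# y = e^t (all derivatives equal) is a solution.
degrees = {sum(e) for _, e in rubel}
weights = {sum((k + 1) * e[k] for k in range(4)) for _, e in rubel}
assert degrees == {7} and weights == {14}
assert sum(c for c, _ in rubel) == 0
```

The double homogeneity is what allows solutions to be shifted and rescaled, the first step toward gluing.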
The theorem may be astonishing but according to BP is also a bit disappointing. They say:
As we said, Rubel’s proof can be seen as an indication that (fourth-order) polynomial implicit DAE is too loose a model compared to classical ODEs, allowing in particular to glue solutions together to get new solutions. As observed in many articles citing Rubel’s paper, this class appears so general that from an experimental point of view, it makes little sense to try to fit a differential model because a single equation can model everything with arbitrary precision.
They cite two more deficiencies of all these results:
First, … the proofs heavily rely on the fact that the constructed DAE does not have unique solutions for a given initial data. … Rubel’s DAE never has a unique solution, even with a countable number of [initial] conditions of the form $y(x_i) = y_i$. Second, the proofs usually rely on solutions that are piecewise defined. Hence they cannot be analytic, while analyticity is often a key expected property in experimental sciences.
This leads into what BP proved (their emphasis):
Theorem 2 There exists a fixed polynomial $p$ in $d$ variables, for some $d$, so that for any continuous $f$ and any $\epsilon > 0$ there exist real numbers $\alpha_0, \dots, \alpha_{d-1}$ so that there is a unique analytic function $y$ so that
- The function $y$ is a solution in the sense that $y(0) = \alpha_0, y'(0) = \alpha_1, \dots, y^{(d-1)}(0) = \alpha_{d-1}$ and $y^{(d)}(t) = p(y(t), y'(t), \dots, y^{(d-1)}(t))$ for all $t$;
- For all real $t$, $|y(t) - f(t)| < \epsilon$.
As with Rubel’s result they actually prove the stronger form where $\epsilon$ can be any continuous positive function $\epsilon(t)$, so that the error between the solution and $f$ can decay off just like before.
BP’s theorem differs from Rubel’s in that their solution is unique and analytic. This has several ramifications. For one it makes the proof harder, since gluing functions together as Rubel did fails for analytic functions. The proof uses some programming tricks instead. One ingredient is a function with a trigonometric denominator that can be modulated, for any irrational $\alpha$, to bring both $t$ and $\alpha t$ arbitrarily close to multiples of $2\pi$ without the denominator vanishing. That is, they can make the fraction grow fast while retaining analyticity. Émile Borel once conjectured that solutions to polynomial ODEs that are defined on all of $\mathbb{R}$ must have growth bounded by a stack of exponentials. This conjecture was refuted, and BP were able to apply the refutation.
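The simultaneous-approximation fact behind that ingredient can be seen numerically: for irrational $\alpha$, the multiples $\alpha k$ come arbitrarily close to integers, so both sine terms in such a denominator can be made tiny together. A crude Python search, our illustration with $\alpha = \sqrt{2}$:

```python
import math

# For irrational alpha, integer multiples alpha*k come arbitrarily close
# to integers, so sin(pi*k) = 0 and sin(pi*alpha*k) can be made tiny
# simultaneously -- the equidistribution fact behind modulating the
# denominator without letting it vanish.
alpha = math.sqrt(2)

def best_frac(limit):
    """Smallest distance of alpha*k to an integer over 1 <= k < limit."""
    return min(abs(alpha * k - round(alpha * k)) for k in range(1, limit))

# Larger search ranges find much better simultaneous approximations
# (the continued-fraction convergents of sqrt(2): k = 5, 29, 169, 985...).
assert best_frac(10**4) < 1e-3 < best_frac(10)
```

The approximations improve like $1/k$ along the convergents, which is what lets the construction make the fraction grow fast.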
Second, their theorem cuts down the extreme multiplicity of solutions all the way to one. In real life we expect that the equation given the initial conditions uniquely forces the total behavior of the solution. As they quote, Rubel’s paper had even given as open “whether we can require in our theorem that the solution that approximates $f$ be the unique solution for its initial data.”
Their result can be viewed both as a positive result and a negative one. On the one hand they show that ODEs are very powerful and can model almost any reasonable computation. You can think of this as showing the power of analog computation. On the other hand, they show that models based on ODEs may likewise be too powerful. They too may be so powerful that they are not really useful.
One weakness is that their polynomial is not so simple. They outline how their construction can be effectivized to give $d$ somewhere north of 300. Possibly it can be tightened much further. They point out an analogy with the kind of universality shown for the Riemann zeta function by Sergei Voronin, citing one of our two posts on it for reference. The zeta function is analytic but cannot be a solution to any polynomial ODE (nor DAE). We can try to motivate the task of improving $d$ and $p$ by noting how a minimum universal ODE is a natural and fixed version of a minimum universal program.
What are some further implications of the new result for modeling nature? Is there some way we can stratify differentially simulatable systems according to some (concrete) measure of their computational complexity?
Including debt to Marina Ratner, 1938-2017
By joint permission of Assad Binakhahi, artist, and Radio Farda (source).
Maryam Mirzakhani won the Fields Medal in 2014. We and the whole community are grieving after losing her to breast cancer two weeks ago. She made several breakthroughs in the geometric understanding of dynamical systems. Who knows what other great results she would have found if she had lived: we will never know. Besides her research she also was the first woman and the first Iranian to win the Fields Medal.
Today we wish to express both our great sorrow and our appreciation of her work.
An article in 2014 by Jordan Ellenberg called her win a “cultural change in mathematics” not for her gender or nationality but for her field of dynamics. He called it “an infant compared to the other major branches of math.” Now dynamics has been studied since long before Isaac Newton, and we’ve covered the three-body problem among other topics. What he means is that abstraction away from physics was needed to boost mathematical tools of analysis and that this gained thrust only in the second half of the 1900s.
We can put it this way: Dynamics has always been a moving target. The work that Mirzakhani furthered gives it a fixed frame. Whole ensembles of possible motions can be represented by parameters to form a space—one like a manifold but with a quotient structure. This space becomes a single geometric object by which to analyze the dynamics. We can give a facile analogy to how Boolean circuits are often considered easier to analyze than Turing machines because they are fixed whereas Turing machines move. But there is a greater potential conduit to problems in complexity theory: both her work and the attack on P vs. NP by Ketan Mulmuley and co-workers involve orbits and their closures.
Perhaps the best example of a dynamical system to play with is the familiar executive toy of metal balls on strings. Usually there are five identical balls as at left below, but let’s say a junior executive might start with just two as shown in the middle.
Composite of various sources plus extra drawing.
Now let’s transport the company to Edwin Abbott’s Flatland. Junior executives there have two balls that go back and forth along a line inside a confined area. We don’t know how gravity would work in Flatland—at least not classical gravity—but the edges of the line segment would propel a ball colliding with them back toward the center. Of course we assume all collisions are perfectly elastic, meaning in particular that they conserve momentum. Admittedly contrary to the illustrations, we also assume the “balls” are really point particles of vanishing radius.
We can now trade a ball for a dimension. We can represent configurations of the balls by points $(a, b)$ where $a$ is the displacement of the left ball from the left end and $b$ is the distance between the balls. These points form a triangle as shown, with left-right remaining the directions of the first ball and up-down corresponding to left-right for the second ball. The combined directions and velocities of the two balls become one direction and velocity of the blue ball shown in the triangle. The two balls collide—remember we made their radii infinitesimal—when the blue ball is on the hypotenuse.
The neat fact is that the dynamics of the two balls in 1D are faithfully represented by the Newtonian behavior of the one ball in the triangle. Collisions with the sides or with each other, at any velocities, become angle-preserving collisions with the sides. A proof may be found here (first pages) along with a representation of three constrained particles on a circle. The only thing we need to avoid is if the two balls hit the left and right sides simultaneously or hit each other against a side. That corresponds to the blue ball hitting a corner, a singular event we are entitled to ignore. Abracadabra, our executive is now gaming at billiards on a triangular table.
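The correspondence is easy to spot-check by simulation. Here is a crude fixed-timestep sketch of ours for equal masses on a unit segment, verifying that the configuration point stays in the triangle and that kinetic energy is conserved:

```python
# Toy simulation (equal masses, unit table length): two elastic point
# particles on a segment.  The configuration (a, b) = (x1, x2 - x1)
# should stay inside the triangle a >= 0, b >= 0, a + b <= 1, and total
# kinetic energy is conserved -- the "one ball in a triangle" picture.
L, dt = 1.0, 1e-4
x1, x2, v1, v2 = 0.2, 0.7, 0.31, -0.17
e0 = v1 * v1 + v2 * v2
for _ in range(200000):
    x1 += v1 * dt
    x2 += v2 * dt
    if x1 <= 0:
        v1 = abs(v1)            # left wall reflects ball 1
    if x2 >= L:
        v2 = -abs(v2)           # right wall reflects ball 2
    if x2 <= x1:
        v1, v2 = v2, v1         # equal-mass elastic collision: swap
    a, b = x1, x2 - x1
    assert -1e-3 <= a and -1e-3 <= b and a + b <= L + 1e-3
assert abs(v1 * v1 + v2 * v2 - e0) < 1e-9
```

The swap rule for equal masses is exactly why the configuration point bounces off the hypotenuse at equal angles; the small tolerances absorb the fixed-timestep overshoot.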
The last trick is the niftiest and works with any triangle—and more generally with polygons. We can reflect the triangle along one of its edges as diagrammed in a survey by Mirzakhani’s Stanford colleague Alex Wright which is a major source for our post:
The billiard trajectory becomes a straight line into the reflected copy. Obviously it would be nicer if we could analyze straight lines—that is, geodesics—in a larger space. When and how we can make the space may recall the tiling problems of our previous post but the rules are different. We need not tile in the plane but can use surfaces of arbitrary genus and metrics that allow angles greater than $2\pi$ around a conical point. This is where the special mathematical framework and tools for the work by Mirzakhani we are discussing enter in.
We’ve exemplified that billiards can represent some other kinds of dynamical systems. Of course, billiards—even with just one ball—is interesting in itself. We can play it on tables shaped like other polygons besides triangles, or not polygons at all. Here are some questions we would like to answer:
Some of these questions are challenging even for triangles. Every acute triangle has a closed loop that visits the feet of the three altitudes, but it is not known whether every obtuse triangle has a closed loop at all. On a convex billiard table the answer to question 3 is immediately yes, but what about non-convex tables? If the edges are mirrors and a point $p$ holds a candle, we are asking whether every point is illuminated—and how much if any of the surface remains in shadow. Although Wikipedia traces the question only to Ernst Straus in the early 1950s, I wonder if Newton thought of it during his work on multiple-prism arrays in his great treatise Opticks. This book by Serge Tabachnikov has more.
The questions become more attackable if we assume that every interior angle of the polygon $P$ is a rational multiple of $\pi$. Then $P$ is called a rational polygon. There are only finitely many ways that $P$ can be iteratively reflected around one of its edges, and the changes in orientation form a finite group that is dihedral. This is easy to visualize if the copies of $P$ tile the plane in the sense of the last post. Group theory and topology and abstract spaces extend our horizon because they can be used on polygons that don’t simply tile and allow us to apply the straight-line reflection trick.
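The dihedral closure can be computed exactly with a little bookkeeping. Encoding a rotation by $2\pi r/n$ as `('rot', r)` and a reflection across the line at angle $j\pi/n$ as `('ref', j)`, composition is pure integer arithmetic, and closing the reflections under composition yields exactly $2n$ orientations (the encoding is our own, with $n = 5$ as an example):

```python
# Orientation changes of a rational polygon: a reflection across the line
# at angle j*pi/n acts on the plane (as a complex number z) by
# z -> w**j * conj(z) with w = exp(2*pi*i/n); a rotation by 2*pi*r/n acts
# by z -> w**r * z.  Composition is then integer arithmetic mod n.
n = 5

def compose(A, B):
    """Apply B, then A."""
    (ta, a), (tb, b) = A, B
    if ta == 'rot' and tb == 'rot':
        return ('rot', (a + b) % n)
    if ta == 'rot' and tb == 'ref':
        return ('ref', (a + b) % n)
    if ta == 'ref' and tb == 'rot':
        return ('ref', (a - b) % n)
    return ('rot', (a - b) % n)     # reflection after reflection = rotation

gens = {('ref', j) for j in range(n)}
group = set(gens)
while True:
    new = {compose(A, B) for A in group for B in group} - group
    if not new:
        break
    group |= new
assert len(group) == 2 * n          # the dihedral group of order 2n
```

Two reflections compose to a rotation, so the closure picks up all $n$ rotations alongside the $n$ reflections and then stops.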
A “clump” of non-overlapping polygons in the plane generates a translation surface if its edges can be partitioned into pairs of parallel edges of equal length, with the two edges in each pair identified by a translation.
For example, we can take a square and pair the opposite edges. Identifying them creates a torus. Besides the familiar 3D donut shape of a torus, we can picture it in 2D via how squares tile the plane. If we take a single octagon and identify the four pairs of opposite sides then all eight vertices become identified as shown below at left. We get a translation surface with angles summing to $6\pi$ at the one vertex. This time octagons cannot tile the plane but we can still picture the space with algebraic help.
Two clumps generate the same space if one can be converted to the other by the operations of translating a polygon, bisecting a polygon along a diagonal, or doing the inverse of the latter to legally glue two polygons together. This equivalence relation is said to be difficult here but is evidently polynomial-time decidable.
We may also ignore interior edges; thus the reflections of the right triangle having smallest angle $\pi/8$—shown at right in our figure—are considered to yield the octagonal translation surface. Indeed, every translation surface can be presented by a single polygon (see section 12 of this) but not necessarily one that is convex.
Rotations and deformations of the polygons, however—shown in the middle of the figure—yield different spaces. We can describe those and other processes by groups acting on their coordinates. In the real plane there are two coordinates so we are talking about the general linear group $GL_2(\mathbb{R})$ of invertible $2 \times 2$ matrices with real entries and its subgroups.
The reflections of a rational polygonal billiard table yield a translation surface, but not every translation surface arises that way. What do we gain by the extra generality? What we gain are the algebraic tools and one more trick:
Instead of looking at different starting points for the billiard ball and rotating the direction in which it starts moving, we can look at rotations and linear stretchings of the translation surfaces. That is, instead of the orbit of the ball, we can study the algebraic orbit of the space under $GL_2(\mathbb{R})$ or some of its subgroups.
The orbits have their own spatial structure. This is one of the great features of the representation theory conceived by Sophus Lie: groups of matrices acting on spaces form topological spaces in their own right. Subgroups can be defined by parameters that act as coordinates for the orbits. So what happens when the point whose orbit we take is a translation surface?
A simple answer was hoped for but experience with fractal behavior and chaos in related matters had restrained hopes of proving one. The answer by Mirzakhani in collaboration with Alex Eskin and joined by Amir Mohammadi was dubbed the “Magic Wand Theorem” in this survey by Anton Zorich:
Theorem 1 The closure of the orbit of a translation surface is always a manifold, moreover one definable by linear equations in period coordinates with zero constant term.
Despite the statement being simple and short the proof is anything but: almost half of the first paper’s 204 pages are devoted to approximation techniques employing random walks amid conditions of low entropy meaning low rate of divergence or “unpredictability.” Zorich says more about the wide panoply of techniques the proof brings together. Thus the ultimate dynamics were brainpower, knowledge, interaction, focus while assembling all the moving parts, and sheer hard work.
What does the “Magic Wand Theorem” do? To quote the title of a paper by Samuel Lelièvre, Thierry Monteil, and Barak Weiss, “Everything is Illuminated.” They solved question 3 above for rational polygons by showing that at most finitely many points remain in shadow—and illumination comes arbitrarily close to those points. It is just amazing that a simple question that Newton would have instantly understood needed such heft to answer. As they say in their abstract:
Our results crucially rely on the recent breakthrough results of Eskin-Mirzakhani and Eskin-Mirzakhani-Mohammadi, and on related results of [Alex] Wright.
Wright’s survey also notes that Theorem 1 converts many results of the form ‘X happens in almost all cases (but we don’t know specifically which)’ into ‘X happens in all cases.’
The theorem also makes previous upper and lower bounds for certain counting problems coincide. Incidentally, one of the major results in Mirzakhani’s PhD thesis, cited in the article accompanying her Fields Medal, showed how to count simple closed geodesics on hyperbolic surfaces as a function of their length $L$. The count can jump—e.g. when $L$ passes the length of a loop around a torus—but behaves nicely asymptotically.
The amplification of previous knowledge also shows in the relation of Theorem 1 to a theorem by Marina Ratner that inspired it:
Theorem 2 Let $G$ be a Lie group and $\Gamma$ a finitely periodic structure within $G$—that is, a lattice. Let $U$ be a subgroup of $G$ given by real matrices $u(t) = e^{tN}$, where some power of $N$ is zero and the entries of $u(t)$ are functions of one real parameter $t$. Then for every point $x$ in $G/\Gamma$, the closure of its orbit under $U$ is a manifold defined by homogeneous equations.
The Fields citation article calls Theorem 1 “a version of Ratner’s theorem for moduli spaces,” noting that the latter are “totally inhomogeneous.” It says Mirzakhani was thus able to transfer questions about dynamics on inhomogeneous spaces into nicer homogeneous cases. Other theorems by Ratner form a nexus that is all reflected in Mirzakhani’s work with Eskin and Mohammadi.
By sad coincidence, Marina Ratner also passed away earlier this month. Yesterday’s New York Times gave her a long obituary as well, noting how she did some of her best work after age 50 and that it was a basis for work by others including Mirzakhani. Jointly they provided much to inspire. Here are Mirzakhani in a still from a Harvard Math lecture video at the point where she introduced billiards and the illumination problem, and Ratner receiving an honorary doctorate from The Hebrew University in Jerusalem:
Our most ambitious question is whether Mirzakhani’s work can be made to have a magic effect on orbit closure problems that some are trying to use to illuminate complexity theory.
Again we express our condolences to her family and colleagues.
Update 7/30: We received permission to use Assad Binakhahi’s beautiful memorial drawing titled “Unfinished Equation” from both the artist and Radio Farda, whose torch logo appears at its upper left. To them many thanks. The previous picture of Mirzakhani has been moved alongside the one of Marina Ratner, with an added sentence above them and some other word changes at top and here.
Cropped and combined from src1, src2.
Michaël Rao and Marjorie Rice are linked in this month’s news. Rao has just released a paper (see also slides and code here) completing the catalog of convex polygons that tile the plane. Rice, who passed away on July 2 (obit), had expanded the pentagon catalog from 9 to 13 while working in her kitchen in 1975. Rolf Stein found a fourteenth in 1985 and Casey Mann led a team of programmers to find a fifteenth in 2015. Rao has closed the book at 15.
Today Dick and I hail their accomplishments, which we noted from two articles by Natalie Wolchover in Quanta this past Tuesday. We also emphasize some related problems.
We especially are impressed by Rice, who was a true amateur. She had no advanced training in mathematics of any kind. After reading a 1975 Scientific American article on tessellations she started her search for new types. She succeeded and found ones that had been missed by everyone—that includes Johannes Kepler, who worked on tessellations in 1619. She maintained a website named “Intriguing Tessellations.”
In recent decades, computers have become an essential tool. This raises the possibility of a new kind of amateur: one who can code. Computing power is more accessible than ever before. The fact that having advanced degrees doesn’t make your code run faster levels the playing field. As it happens, Mann led a team that included a student and Rao wrote his own code.
If you draw any triangle in the plane, then you can place a 180°-rotated copy against it on an edge to make a parallelogram. That can be replicated to make an infinite strip, and those strips complete a tiling of the plane. That tiling is periodic, with only two different orientations of the triangle.
A little more thought will tell you that any quadrilateral—not just a parallelogram—can be made to tile the plane. The reason is that the four interior angles add up to 360°—even if the quadrilateral is not convex. Make three copies and orient them so that the four different angles come together at a corner of the original and its two adjoining edges are shared. Then the same orientations work at the opposite corner, and this suffices to see that the clump of four tiles the whole plane; indeed, two of them do.
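The 360° fact holds for non-convex quadrilaterals too, which a turning-angle computation confirms: for a simple counterclockwise polygon the signed exterior turns sum to 360°, so the interior angles sum to $n \cdot 180° - 360°$. A small Python check (the example quadrilateral is ours):

```python
import math

def interior_angle_sum(poly):
    """Sum of interior angles (degrees) of a simple CCW polygon."""
    n = len(poly)
    turn = 0.0
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = poly[i - 1], poly[i], poly[(i + 1) % n]
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        # signed exterior turn at this vertex, mapped into (-pi, pi]
        turn += (a2 - a1 + math.pi) % (2 * math.pi) - math.pi
    # interior angle = 180 - turn at each vertex; the turns sum to 360
    return n * 180.0 - math.degrees(turn)

# A non-convex quadrilateral (vertex (1,1) is reflex), listed CCW:
quad = [(0, 0), (4, 0), (1, 1), (0, 3)]
assert abs(interior_angle_sum(quad) - 360.0) < 1e-9
```

The same function gives 180° for any triangle, matching the previous paragraph’s strip construction.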
Combined from Math and the Art of M. C. Escher wiki source.
Convex polygons with 7 or more sides cannot tile the plane—whatever their shape—because their interior angles sum so high that the average number of polygons meeting at a vertex would fall below 3. Regular hexagons can tile, of course. Karl Reinhardt in 1918 showed that convex hexagons can tile in three ways that use 2, 3, or 4 different orientations (the first not being a special case of the third).
That left the case of pentagons. Of course a regular pentagon cannot tile the plane, but ones shaped like a baseball home plate can mesh in a sawtooth pattern using two orientations. You can get this by cutting each strip of a regular hexagonal tiling in half:
Mathematical Tourist source.
The idea of cutting tiles to make new ones animates the mathematics. The following figure taken from a 2015 story in Britain’s Guardian newspaper shows the now-complete list of distinct pentagonal tilings.
Note that the version of this diagram used in a February 2013 post on Rice had only the 14 tilings known then.
All known tilings by single connected pieces are periodic like wallpaper. There is an algebraic theory of wallpaper symmetries and corresponding groups. Note that pentagonal symmetry is excluded. The periodic clumps in tilings by pentagons, if they have any rotational symmetry other than the full circle, must have one of order 2, 3, 4, or 6.
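The restriction to orders 2, 3, 4, and 6 is the classical crystallographic restriction: a rotation that preserves a lattice is represented by an integer matrix in a lattice basis, so its trace $2\cos(2\pi/n)$ must be an integer. A one-line check:

```python
import math

# Crystallographic restriction: a rotation of order n preserving a planar
# lattice has integer trace 2*cos(2*pi/n); only n in {1, 2, 3, 4, 6} work,
# which is why five-fold wallpaper symmetry is impossible.
def trace_is_integer(n):
    t = 2 * math.cos(2 * math.pi / n)
    return abs(t - round(t)) < 1e-9

allowed = [n for n in range(1, 13) if trace_is_integer(n)]
assert allowed == [1, 2, 3, 4, 6]
```

In particular $n = 5$ gives trace $2\cos 72° \approx 0.618$, not an integer, ruling out pentagonal wallpaper symmetry.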
Flooring, however, can choose to have a radial pattern. Here are two tilings found by Sir Roger Penrose of Oxford with five-fold symmetry that can be extended infinitely far:
Modified from Martin Gardner article source.
Both use the same two quadrilateral tiles and a special restriction: the two centrally symmetric vertices of one cannot touch a centrally symmetric vertex of the other. Cutting each tile into two triangles facilitates defining a self-recursion that proves how the pattern can be extended infinitely. Our diagram also shows a mutual recursion between the “sun” and “star” patterns.
The limit ratio of the convex “kite” tiles to the concave “dart” tiles in the recursions is the golden ratio, $\phi = \frac{1+\sqrt{5}}{2}$. To see why, note that each larger orange-bordered kite at right is made of two kites plus two halves of a dart, while the larger darts have just one kite and the halves of a dart. The recursion thus involves powers of the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, whose entries yield consecutive Fibonacci numbers, whose ratio approaches $\phi$. Because $\phi$ is irrational, the tilings are not periodic.
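The count recursion is easy to simulate. A minimal sketch, using the kite and dart counts described above:

```python
# Substitution counts from the text: each inflated kite contains 2 kites
# and 1 dart (two halves); each inflated dart contains 1 kite and 1 dart.
def inflate(kites: int, darts: int) -> tuple:
    return 2 * kites + darts, kites + darts

k, d = 1, 0
for _ in range(20):
    k, d = inflate(k, d)

phi = (1 + 5 ** 0.5) / 2
# kite:dart ratio tends to the golden ratio
assert abs(k / d - phi) < 1e-6
```

The successive counts (2, 1), (5, 3), (13, 8), (34, 21), … are alternate Fibonacci numbers, which is why the ratio converges to $\phi$.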
Penrose found a related tiling using two convex shapes—a thin lozenge and a fatter one—with a similar restriction that enforces aperiodicity. John Conway suggested enforcing these restrictions by matching up colored lines, as illustrated by the entrance to Oxford’s new Mathematical Institute building. This is my own photo from two years ago:
The restriction can be enforced without markings by notching the sides allowed to match up in the manner of jigsaw puzzle pieces, but this creates non-convex polygons. Robert Ammann found a way to cut Penrose’s lozenges and assemble them into three convex polygons that can only tile aperiodically. This figure from a paper last year by Teruhisa Sugimoto shows how:
Here is a color version of Ammann’s tiling posted by John Lindner. It too could be an attractive floor, but how about as a kitchen counter?
Two of Ammann’s tiles are pentagonal. Can a single pentagon carry out an aperiodic tiling? That question may have boarded the train of Rice’s thinking as she worked at her kitchen counter. It took until Sugimoto’s paper to prove this impossible when the pentagons must share entire edges. Rao’s result completes a definitive no answer.
Understanding tilings of the plane is an intriguing mathematical problem. Finding new ones, as Rice did, requires cleverness and insight. Showing that certain types of tilings are impossible, as Rao did, requires another type of cleverness: the ability to prove that something cannot exist. This is interesting because it reminds us of lower bound problems that are well documented to be difficult.
Tilings are special compared to other classification problems—that is, problems that show that a given list consists of all ways to create some mathematical object. They are different because tilings can be used to build real objects. One measure of this is the number of patents for various types of tilings. Penrose thought to patent his tiles before publicizing them. We quote the introduction to his US patent 4133152, titled “Set of tiles for covering a surface”:
[The field of this invention] has found practical application not only to the design of paving and wall-coverings but also in the production of toys and games. In both instances, not only is the purely geometric aspect of complete covering of the surface of importance, but the esthetic appeal of the completed tessellation has equal significance in the eye of the beholder. … [T]he pattern which they form is necessarily non-repetitive, giving a considerable esthetic appeal to the eye.
We especially like his equal regard for esthetics. What emerged in greater force than even he may have imagined—while expressly thinking of crystals—is how strongly Nature shares this regard. Dan Shechtman received the 2011 Nobel Prize in Chemistry for discovering quasicrystals. We covered some of this history, going back to Hao Wang’s work and the first proof of the existence of finite sets of non-convex tiles that can only tessellate aperiodically, in a post four years ago. It is also neat that this was initially a consequence of the proof that whether a given set of tiles can tessellate the plane is undecidable, because by compactness, if the only tessellations are periodic then this fact is detectable in finite time.
If you don’t insist on convex tiles for your kitchen counter, then the question of aperiodic tilings by one piece remains open. This is called the einstein problem. The name is not for Albert Einstein but derives from ein Stein, German for “one stone.” German uses “Stein” in many game contexts (besides Go) where we in English say “piece.”
Joan Taylor of Tasmania, another amateur mathematician, discovered in early 2010 that a single hexagonal tile could be forced to tile hierarchically—and only aperiodically—if a more complicated set of marking rules is stipulated. These rules cannot be enforced by jigsaw notching. However, followup work with Joshua Socolar discovered how to realize the hierarchical scheme by a single non-connected tile:
The three lines are just for show; the shape alone enforces the structure which the lines make clear. This is not quite an einstein—not one stone—but an allusion to Albert is warranted by the combination of cleverness, esthetics, and amazement that this fact brings. Taylor maintains a website with other striking designs.
There is also the problem of proving the widely-voiced belief that for square tiles with notches, the set of six discovered by Raphael Robinson has the minimum size to force aperiodicity.
If you prefer to stay with convex tilings, what emerges from Sugimoto’s paper vis-à-vis Ammann’s tiles discussed above is that the following question remains open:
Are there two convex tiles that tessellate but only aperiodically?
We have not even taken time to consider tilings in three (or higher) dimensions, in which a single bi-prism is known to give only tight packings of space that are aperiodic in one of their three dimensions. Whether a single 3D tile can squeeze periodicity out of all three dimensions seems to be open. We have also glossed over whether to allow tiles to be reflected or flipped over as well as rotated, and whether the number of different rotation angles in an aperiodic tiling is infinite, as happens for the irrational twist angles (in degree units) for the bi-prism packings. We invite you, our readers, to contribute your own favorite open tiling problems.
How would you “bet” on the open tiling problems? How would you have bet before 2010? We’ve discussed estimates of how people would bet on open problems in complexity but we have no idea here.
[sourced first two diagrams in section 2]
Combined from source |
Eric Allender and Michael Saks have been leading lights in computing theory for four decades. They have both turned 60 this year. I greatly enjoyed the commemorative workshop held in their honor last January 28–29 at DIMACS on the Rutgers campus.
Today Dick and I salute Eric and Mike on this occasion.
Eric and Mike have been together on the Rutgers faculty since the middle 1980s. I have known Eric since we were both graduate students. We both had papers at the first Structure in Complexity Theory conference in 1986—when it was co-located with STOC at Berkeley—and again at the 1986 ICALP in Rennes, France. I don’t know if I first met Mike at the Berkeley conferences or a couple years later at FOCS 1988 in White Plains, New York. The “Structures” conference was renamed CCC for “Computational Complexity Conference” in 1996. The 2017 conference starts tomorrow in Riga, Latvia.
Mike also has recently been named an ACM Fellow, joining Eric on that illustrious roster. Mike’s primary appointment is in Mathematics and he is currently serving as Chair. Eric was Chair of Computer Science at Rutgers not long ago. I have somehow managed not to do a paper with Eric—nor has Lance Fortnow, as Lance noted during his talk—but we collaborated with Michael Loui on three book chapters covering complexity theory in the CRC Algorithms and Computation Theory Handbook. I did write a paper with Eric’s student Martin Strauss, who along with Michal Koucký—another of Eric’s students—organized the workshop.
One hears all the time about movements and generations in art and music but they happen also in science. We’ve discussed the “AI Winter” among other things. Complexity theory is no exception.
Eric and I were among the avant-garde of “Structural Complexity.” The idea was to understand common features of complexity classes and reductions between problems, apart from specific features of the problems in isolation. In part this was a reaction to how direct analysis of problems had not only failed to resolve $\mathsf{P}$ versus $\mathsf{NP}$ in the 1970s, but had met barriers in the form of oracle results that applied in similar ways across many levels of classes.
Eric’s paper at STOC 1987, titled “Some Consequences of the Existence of Pseudorandom Generators,” kicked off his study and use of multiple forms of Kolmogorov Complexity (KC). We have covered this in Eric’s research before. The “structural” flavor of KC comes from how it avoids referencing specific combinatorial structures like graphs or formulas or set systems and how it can be applied at many levels of complexity.
It seems fair to say the structural approach clarified and systematized many questions of complexity theory but did not resolve many on its own. But it was great for fashioning molds into which combinatorial arguments can be injected. Eric’s signature lower bound with his student Vivek Gore, separating the classes uniform-$\mathsf{ACC}^0$ and $\mathsf{PP}$, employs these ‘structural’ ingredients:
A similar list can be made for Ryan Williams’s separation of $\mathsf{NEXP}$ from nonuniform $\mathsf{ACC}^0$: oracles again employed constructively; succinctness; quasi-linear time complexity as a stepping-stone; and probabilistic simulation of AND/OR using modular counting—for which Allender-Gore is cited. The use of symmetric functions and gates in both papers might be deemed more “combinatorial” but this still goes toward my point here.
That said, Mike has always represented more the “combinatorial” side—indeed, his first three journal papers after joining Rutgers appeared in the journal Combinatorica. Further, the workshop emblem adds up to “Eric + Mike = Complexity and Combinatorics.” The two flavors were evident in the series of talks chosen to honor each and both.
All but one talk has video online. DIMACS Director Rebecca Wright gave a welcoming introduction.
Avi Wigderson spoke on “Brascamp-Lieb Inequalities, Operator Scaling, and Optimization.” After relating why his and Mike’s families are close, he set the tone by telling what led Mike into combinatorics as a graduate student in mathematics:
“When he was near the algebraists, the typical conversation he would hear is, ‘Remember this really extremely general result I proved last year? Well, guess what—I can now generalize it even more and I can prove an even more general one.’ [But] then when he hung around with the combinatorialists, he would hear, ‘You remember this extremely trivial problem that I could not solve last year? Well, I found a special case that I can still not solve.’ So he decided, ‘that’s for me.’
Well … Mike still has some affinity to the algebraic side, … and has this tendency whenever he is facing a problem the first thing to do is to generalize it just below the point where it becomes false, and then scale it a bit.”
Avi then introduced the technical part by saying that he was led into it by the Polynomial Identity Testing (PIT) problem, “a problem that Eric cares about, Mike cares about, lots of people care about”:
“I just want to mention that Mike and I spent five years on [PIT], meeting every week in a café for the day. We had lots of great ideas that ended up with nothing. I think that’s the story with a lot of other people.”
Avi could have appended to that last sentence, “…and in complexity theory in general.” Thus C & C are married to a hard bed. The main body of his talk was about testing inequalities, where things can be done.
Harry Buhrman spoke on “Computing With Nearly Full Space.” We covered this work with a different slant here. Harry’s first six minutes featured many stories and photos of conferences and meetings with Eric and Mike.
Meena Mahajan spoke on “Enumerator Polynomials: Completeness and Intermediate Complexity.” Although she began by saying she mainly knew Eric, having invited him to India and vice-versa, her talk was highly combinatorial involving polynomial enumerators for cliques, Hamiltonian circuits, and much else including projections in real space.
Clifford Smyth spoke on “Restricted Stirling and Lah numbers and their inverses.” This involved the problem of computing (the signs of) entries of certain inverse matrices without having to do the whole inversion.
Yi Li spoke on “Estimating the Schatten Norm of Matrices in Streaming Models.” He started with a problem about $n$-dimensional real vectors $x$: starting with $x = 0$, you get sequential updates $x_i \mathrel{+}= \delta$ to the components of $x$. You want to maintain estimates of a function $f(x)$ to within a $(1 \pm \epsilon)$ factor without using space $O(n)$—ideally, using space polylog in $n$. He then took this to the case of matrices and described solvable and hard cases.
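For flavor, here is a toy version of the classic Alon-Matias-Szegedy (AMS) sketch for the scalar analogue $F_2 = \|x\|_2^2$. This is a stand-in for illustration, not the Schatten-norm algorithm from the talk, and the parameters are made up:

```python
import random

# Toy AMS sketch for F2 = ||x||_2^2 under streaming updates x[i] += delta,
# using a constant number of counters instead of all n coordinates.
def ams_f2_estimate(updates, n, trials=200, seed=1):
    rng = random.Random(seed)
    # one random +/-1 sign per coordinate, independently for each trial
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(trials)]
    z = [0.0] * trials
    for i, delta in updates:
        for t in range(trials):
            z[t] += signs[t][i] * delta
    # E[z^2] equals F2 exactly; averaging over trials reduces the variance
    return sum(v * v for v in z) / trials

updates = [(0, 3.0), (1, -1.0), (0, 1.0), (2, 2.0)]   # ends with x = (4, -1, 2)
true_f2 = 4.0 ** 2 + 1.0 ** 2 + 2.0 ** 2              # = 21
est = ams_f2_estimate(updates, n=3)
assert 0.3 * true_f2 < est < 3.0 * true_f2
```

The toy version stores all the signs, which defeats the space bound; the real sketch generates them from small-seed hash functions, which is where the polylog space comes from.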
Mary Wootters—who along with Yi Li did her PhD under Martin—closed the first day by speaking on “Repairing Reed-Solomon Codes.” This, from a joint paper with Venkat Guruswami, was my favorite talk. The basic problem is deliciously simple: given an unknown polynomial $f$ of degree $d$ over a finite field $F$, and an argument $\alpha$, we want to compute $f(\alpha)$. A random set of $d+1$ values $f(\beta_i)$ suffices to compute any $f(\alpha)$ by interpolation. Having only $d$ values is never enough. Each value has $\log |F|$ bits. Do we really need all $(d+1)\log |F|$ bits of the values? Mary gave cases where, amazingly, getting samples of far fewer total bits from the values is enough. The bits sent by each node may be computed locally from $\alpha$ but not with communication from any other node.
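As a baseline, the classical interpolation bound is easy to demo over a prime field. This sketches the textbook method, not the Guruswami-Wootters repair scheme; the field, polynomial, and evaluation points are made up for illustration:

```python
# Baseline for the repair problem: over a prime field F_p, any d+1 values
# of a degree-d polynomial determine f(alpha) by Lagrange interpolation,
# while d values never do.
def interpolate_at(points, alpha, p):
    """Lagrange-interpolate the value at alpha from (x, f(x)) pairs mod p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((alpha - xj) % p) % p
                den = den * ((xi - xj) % p) % p
        # pow(den, p-2, p) is the modular inverse, by Fermat's little theorem
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

p = 101
f = lambda x: (5 * x * x + 3 * x + 7) % p    # secret degree-2 polynomial
pts = [(x, f(x)) for x in (1, 2, 3)]         # d+1 = 3 values suffice
assert interpolate_at(pts, 10, p) == f(10)
```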
The conference dinner had several speeches and toasts and a joint birthday cake.
Neil Immerman spoke on “Algebra, Logic and Complexity.” He began by noting that he met Eric at the same joint STOC and Structures meeting in 1986 which I mentioned above. He started with how the descriptive complexity program refined notions of reductions to make them very sharp, culminating with uniform-$\mathsf{AC}^0$ reductions formalized via first-order logic. This covered his 2009 paper with Eric and three others, showing that under standard complexity assumptions there are exactly six equivalence classes of Boolean-based constraint satisfaction problems under isomorphisms.
Nati Linial spoke on “Hypertrees.” He has been Mike’s most frequent collaboration partner—21 joint papers and counting—and vice-versa. He related how they have visited each other often since they were post-docs together at UCLA. The talk involved large matrices whose nonzero entries are $+1$ or $-1$.
Toniann Pitassi was supposed to speak on “Strongly Exponential Lower Bounds for Monotone Computation” but she was unable to travel at the last minute.
Ramamohan Paturi spoke about “Satisfiability Algorithms and Circuit Lower Bounds.” This covered Mike’s algorithmic ideas in the famous “PPSZ” paper, “An Improved Exponential-Time Algorithm for $k$-SAT.”
Lance Fortnow closed the technical part of the meeting by talking about “Connecting Randomness and Complexity.” He started by noting that he and Eric have gone a combined 61-for-62 in attending the Structures/Complexity conferences, Lance having missed only 2012 in Porto, and told more personal stories. His talk covered Eric’s work involving degrees of Kolmogorov complexity-based randomness, which I’ve noted above.
I had to drive back to Buffalo early the next day, so I missed the festivities on the second evening and a third-day brunch at Eric’s house. Overall it was a really nice and convivial time. It was great seeing friends again, and one conversation in particular has proved valuable to me since then: I heard Eric and Harry and Michal and Jack Lutz and perhaps Mario Szegedy or Mohan Paturi talking about how Kolmogorov complexity is “not so concrete.” Without giving the actual details I outlined a practical case where one would want a definite, concrete measure. I thank DIMACS for sponsorship and the organizers for putting together a great event.
I’ve just now returned from Poland, whose classic toast “Sto lat!” means, “May you live one hundred years.” Accordingly we wish that $C$ may come to denote their ages in Roman numerals.
[fixed two names and added note about Li being Martin’s student too]
Oded Goldreich is one of the top researchers in cryptography, randomness, and complexity theory.
Today Ken and I wish to thank the Knuth Prize Committee for selecting Oded as the winner of the 2017 Knuth Prize.
It is no doubt a wonderful choice, a choice that rewards many great results, and a choice that is terrific. Congrats to Oded. This year the choice was only announced to the general public at the last minute. Ken and I at GLL got an encrypted message that allowed us to figure it out ahead of time. The message was: YXWX APRN LKW CRTLK DHPFW. The encryption method is based on a code with over $10^{26}$ keys, and so was almost unbreakable. But we did it.
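For the curious, one substitution key consistent with the ciphertext can be checked mechanically. The plaintext below is our own reading of it, so treat it as an inference rather than official:

```python
# One monoalphabetic substitution key consistent with the ciphertext.
# With 26! (about 4 x 10^26) possible keys a brute-force search is out,
# but pattern analysis (YXWX has the letter pattern ABCB) cuts it down fast.
key = {'Y': 'O', 'X': 'D', 'W': 'E', 'A': 'W', 'P': 'I', 'R': 'N',
       'N': 'S', 'L': 'T', 'K': 'H', 'C': 'K', 'T': 'U', 'D': 'P',
       'H': 'R', 'F': 'Z'}

ciphertext = "YXWX APRN LKW CRTLK DHPFW"
plaintext = "".join(key.get(c, c) for c in ciphertext)  # spaces pass through
assert plaintext == "ODED WINS THE KNUTH PRIZE"
```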
Oded gave his talk last night to a filled ballroom: one of the perks of winning the Knuth Prize. I had sent him congrats as soon as I heard he had won and added I looked forward to his talk. He answered essentially “thanks for increasing the pressure on me.” I know he was kidding since he always gives great talks.
I just heard the talk, and he delivered, with his usual mixture of fun and seriousness. The talk had two parts. The first started with some apologies.
He added some wonderful comments like: “I had some jokes but I forgot them.” This brought the house down—we theory people just love diagonal arguments.
This part continued with some interesting comments on the nature of Theory. Some of it was advice to junior members and some advice to senior members. My favorites were:
I like these suggestions very much. I have more than once been on the receiving end of “but it is so simple.” I would like to think that I rarely have said that to someone else.
Oded then moved on to the technical part of his talk. I personally liked the first part very much and would have loved to hear more of his comments of this nature.
But Oded wanted to use this talk to also highlight some very interesting new results on proof systems. Here he spoke about On Doubly-Efficient Interactive Proof Systems. He introduced the idea by using the movie When Night is Falling. It is a Canadian film from 1995 involving Petra and Camille. My wife, Kathryn Farley, who was sitting next to me during the talk, immediately whispered to me: “what a wonderful movie” as soon as Oded put a picture of Petra and Camille on the screen. We all have our own expertise.
A proof system is called doubly-efficient if the prescribed prover strategy can be implemented in polynomial-time and the verifier’s strategy can be implemented in almost-linear-time. See here for a paper on the subject, joint with Guy Rothblum. I think we will report on this material in more detail in the future, but here is part of their abstract:
A proof system is called doubly-efficient if the prescribed prover strategy can be implemented in polynomial-time and the verifier’s strategy can be implemented in almost-linear-time. We present direct constructions of doubly-efficient interactive proof systems for problems in P that are believed to have relatively high complexity. Specifically, such constructions are presented for t-CLIQUE and t-SUM. In addition, we present a generic construction of such proof systems for a natural class that contains both problems and is in NC (and also in SC).
Again congrats to Oded. Any thoughts of how the message to Ken and me was encoded?
[some word fixes]
Géraud Sénizergues proved in 1997 that equivalence of deterministic pushdown automata (DPDAs) is decidable. Solving this decades-open problem won him the 2002 Gödel Prize.
Today Ken and I want to ponder how theory of computing (TOC) has changed over the years and where it is headed.
Of course we have some idea of how it has changed over the years, since we both have worked in TOC for decades, but the future is a bit more difficult to tell. Actually the future is also safer: people may feel left out and disagree about the past, but the future is yet to happen so who could be left out?
For example, we might represent the past by the following table of basic decision problems involving automata such as one might teach in an intro theory course. The result by Sénizergues filled in what had been the last unknown box:
| Problem/machine | DFA | NFA | DPDA | NPDA | DLBA | DTM |
|---|---|---|---|---|---|---|
| Does $M$ accept $x$? | In P | In P | In P | In P | PSPC | Undec. |
| Is $L(M) = \emptyset$? | In P | In P | In P | In P | Undec. | Undec. |
| Is $L(M_1) \cap L(M_2) = \emptyset$? | In P | In P | Undec. | Undec. | Undec. | Undec. |
| Is $L(M) = \Sigma^*$? | In P | PSPC | In P | Undec. | Undec. | Undec. |
| Is $L(M_1) = L(M_2)$? | In P | PSPC | Decidable | Undec. | Undec. | Undec. |
Here ‘PSPC’ means $\mathsf{PSPACE}$-complete. This table is central but leaves out whole fields of important theory.
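As a taste of why several entries read “In P”: emptiness for a DFA is just reachability in its state graph, so the emptiness row needs nothing more than breadth-first search. A minimal sketch:

```python
from collections import deque

# Emptiness for a DFA is graph reachability: L(M) is nonempty iff some
# accepting state is reachable from the start state -- linear time,
# hence comfortably "In P" in the table.
def dfa_nonempty(start, accepting, delta):
    """delta: dict mapping state -> iterable of successor states."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in accepting:
            return True
        for r in delta.get(q, ()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return False

# Toy DFA: 0 -> 1 -> 2 (accepting); state 3 is an unreachable accept state.
assert dfa_nonempty(0, {2}, {0: [1], 1: [2]}) is True
assert dfa_nonempty(0, {3}, {0: [1], 1: [2]}) is False
```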
At the Theory Fest this June—which we mentioned here—there will be a panel on the future of TOC. We will try to guess what they will say.
Of course we don’t know what the panel will say. They don’t necessarily give statements ahead of time like in some Senate hearings. But we can get a hint from the subjects and titles of some of the invited plenary talks, which are the last afternoon session each day:
We salute Ken’s colleague Atri among the speakers. There is also a keynote by Orna Kupferman titled, “Examining classical graph-theory problems from the viewpoint of formal-verification methods.” And there is one by Avi Wigderson titled “On the Nature and Future of ToC”—which is the subject of this post.
We can get a fix on the present by looking at the regular papers in the conference program. But like Avi we want to try to gauge the future. One clear message of the above range of talks is that it will be diverse. But to say more about how theory is changing we take another look at the past.
We can divide the changes in TOC into two parts. One is the change in the kinds of questions we ask, and the other is the change in what we accept as answers.
Years ago, most of the questions we considered were basic questions about strings and other fundamental objects of computing. A classic example was a favorite problem of my mentor Zeke Zalcstein: the star height problem. You probably do not know this—I knew it once and still had to look it up. Here is a definition: the star height of a regular expression is its maximum depth of nested Kleene stars—letters and $\emptyset$ have height $0$, union and concatenation take the maximum of their parts, and a star adds $1$. The star height of a regular language is the minimum star height over all regular expressions denoting it.
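The syntactic star height of a given expression follows a simple textbook recursion: letters get height 0, union and concatenation take the max, and a star adds one. A minimal sketch on parse trees (the hard part of the problem, minimizing over all equivalent expressions, is untouched here):

```python
# Syntactic star height via the textbook recursion.  Expressions are
# tuples: ('sym', a), ('cat', l, r), ('alt', l, r), ('star', e).
def star_height(e) -> int:
    op = e[0]
    if op == 'sym':
        return 0                                        # a single letter
    if op in ('cat', 'alt'):
        return max(star_height(e[1]), star_height(e[2]))
    if op == 'star':
        return 1 + star_height(e[1])                    # stars nest
    raise ValueError(op)

a = ('sym', 'a')
b = ('sym', 'b')
inner = ('cat', ('star', a), b)        # a* b        -> height 1
assert star_height(inner) == 1
assert star_height(('star', inner)) == 2   # (a* b)* -> height 2
```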
Lawrence Eggan seems to have been the first to raise the following questions formally, in 1963: Is there a bound $h$ such that every regular language has star height at most $h$? And can the star height of a given regular language be computed?
Regarding the first question, at first it wasn’t even known whether the star height ever needed to be greater than $1$. There are contexts in which one level of nesting suffices, most notably the theorem that one while-loop suffices for any program. Eggan proved however that star height is unbounded, and in 1966, François Dejean and Marcel Schützenberger showed this for languages over a binary alphabet.
The second question became a noted open problem until Kosaburo Hashiguchi proved it decidable in 1988. His algorithm was not even elementary—that is, its time was not bounded by any fixed stack of exponentials in the size of the expression—but Daniel Kirsten in 2005 improved it to double exponential space, hence at worst triple exponential time. It is known to be $\mathsf{PSPACE}$-hard, so we might hope only faintly for a runnable algorithm, but special cases (including ones involving groups that interested Zeke) may be tractable. Narrowing the gap is open and interesting but likely to be difficult.
Do you wish you could travel back to the early 1960s to work on the original problems? Well, basically you can: just add a complementation operator and define it to leave star-height unchanged. Then the resulting generalized star-height problem is wide open, even regarding whether height $1$ always suffices. To see why it is trickier, note that over the alphabet $\{a,b\}$, the language $(ab)^*$ can be expressed using complementation but no stars,
so it has generalized star-height $0$. Whereas $(aa)^*$ does not—it needs the one star. See this 1992 paper and these recent slides for more.
Diversifying areas are certainly giving us new domains of questions to attack. Often the new problem is an old problem with a new application. For instance, Google’s PageRank algorithm derives from the theory of random walks on graphs, as we noted here.
The novelty we find it most fruitful to realize, however, comes from changes in what we regard as solutions—the second point at the head of the last section. We used to demand exact solutions and measure worst-case complexity. Now we allow various grades of approximation. Answers may be contingent on conjectures. For example, edit distance requires quadratic time unless the Strong Exponential Time Hypothesis is false—but some approximations to it run in nearly linear time. We have talked at length about such contingencies in crypto.
A nice survey in Nature by Ashley Montanaro shows the progression within the limited field of quantum algorithms. In the classical worst-case sense, it is said that there aren’t many quantum algorithms. For a long time the “big three” were the algorithms by Peter Shor and Lov Grover and the ability of quantum computers to simulate quantum $n$-body problems and quantum physics in general. Quantum walks became a fourth and linear algebra a fifth, but as Montanaro notes, the latter needs changing what we consider a solution to a linear system $Ax = b$ where $A$ is $N \times N$. You don’t get $x$; rather you get a quantum state $|x\rangle$ that approximates $x$ over a space of $\log N$ qubits. The approximation is good enough to answer some predicates with high probability, such as whether the same $x$ solves another system $A'x = b'$. You lose exactness but what you gain is running time that is polynomial in $\log N$ rather than in $N$. A big gain is that $N$ is now allowed to be huge.
The survey goes on to problems with restricted numbers of true qubits, even zero. These problems seem important today because it has been so hard to build real quantum computers with more than a handful of qubits. Beyond the survey there are quantum versions of online algorithms and approximations of those.
If we are willing to change what we consider to be an answer, it follows that we are primed to handle fuzzier questions and goals. Online auctions are a major recent topic, and we have talked about them a little. There are many design goals: fairness, truthfulness, minimizing regret, profitability for one side or the other. Again we note that old classic problems are often best adaptable to the new contexts, such as stable-marriage graph problems with various new types of constraints.
The old classic problems never go away. What may determine how much they are worked on, however, is how well we can modify what counts as a solution or at least some progress. It seems hard to imagine partial or approximate answers to questions such as, “is logspace equal to polynomial time?”
The problem we began with about equivalence of DPDAs may be a good test case. Sénizergues gave a simple yes-answer to a definite question, but as with star-height, his algorithm is completely hopeless. Now (D)PDAs and grammars have become integral to compression schemes and their analysis—see this or this, for instance. Will that lead to important new cases and forms of the classic problems we started with? See also this 2014 paper for PDA problem refinements and algorithms.
What are your senses of the future of ToC?
The problem of mining text for implications
2016 RSA Conference bio, speech |
Michael Rogers, the head of the National Security Agency, testified before the Senate Intelligence Committee the other day about President Donald Trump. He was joined by the heads of other intelligence agencies, who also testified. Their comments were, as one would expect, widely reported.
In real time, I heard Admiral Rogers’s comments. Then I heard and read the reports about them. I am at best puzzled about what happened.
The various reports all were similar to this:
Adm. Michael S. Rogers, the head of the National Security Agency, also declined to comment on earlier reports that Mr. Trump had asked him to deny possible evidence of collusion between Mr. Trump’s associates and Russian officials. He said he would not discuss specific interactions with the president.
The above quote is accurate—Adm. Rogers did not discuss specific interactions with the president. But I have trouble with this statement. The problem I have is this:
Are statements made in a Senate hearing subject to the basic rules of logic?
For example, if a person says $A$ and later says $A \rightarrow B$, can we conclude that he or she has effectively said $B$?
Let’s look at the testimony of Adm. Rogers. He insisted that he could not recall being pressured to act inappropriately in his almost three years in the post. “I have never been directed to do anything I believe to be illegal, immoral, unethical or inappropriate,” he said.
During his three years as head of the NSA he worked under President Obama and now President Trump. So I see the following logical argument: since he was never asked to do anything wrong during that period, it follows that Trump never asked him to do anything wrong.
This follows from the rule called universal specification or universal elimination. If $\forall x \in S\; P(x)$ is true, then for any $a$ in the set $S$ it must follow that $P(a)$ is true.
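For a finite domain the rule is mechanical. A toy sketch, with the names and the predicate purely illustrative:

```python
# Universal elimination over a finite domain: if P holds for every x in S,
# then it holds for any particular a in S.
def says_for_all(P, S) -> bool:
    return all(P(x) for x in S)

bosses = ["Obama", "Trump"]                 # the set S, as in the testimony
never_asked_wrong = lambda x: True          # the blanket claim, as stated

if says_for_all(never_asked_wrong, bosses):
    # instantiating at a particular element is then immediate
    assert never_asked_wrong("Trump")
```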
What is going on here? Take $P(x)$ to be “$x$ never asked him to do anything wrong.” The reports that he refused to answer whether $P(\text{Trump})$ is true are correct. But to my mind he said a stronger statement: that $\forall x\, P(x)$ is true. Is it misleading reporting? Or do the rules of logic not apply to testimony before Senate committees? Which is a stronger statement:

$P(a)$

or

$\forall x \in S\;\; P(x),$

where $a$ is an element of $S$?
In mathematics the latter statement is stronger, but it appears not to be so in the real world. The statement $P(a)$ is more direct. What does this say about logic and its role in human discourse?
Ken recalls a course he took in 1979 from the late Manfred Halpern, a professor of politics at Princeton. Titled “Personal and Political Transformation,” the course used a set of notes that became Halpern’s posthumous magnum opus.
The notes asserted that components of human relationships can be classed into eight basic modalities, the first three being paradigms for life: emanation, incoherence, transformation, isolation, subjection, direct bargaining, boundary management, and buffering. The first three form a progression exemplified by Dorothy and the wizard vis-à-vis Glinda and the ruby slippers in The Wizard of Oz; later he added deformation as a ninth mode and fourth paradigm and second progression endpoint. It particularly struck Ken that presenting mathematical proofs is classed as a form of subjection: You can’t argue or bargain with a proof or counterexample.
Buffering made his list and remained in it. He showed how each member is archetypal in human history and depth psychology. So Ken’s answer is that the one-step-remove of saying “$\forall x\, P(x)$” rather than “$P(a)$” is a deeply rooted difference. It makes wiggle-room that a jury of peers might credit in a pinch.
Psychology aside, the mining of logical inferences is a major application area. Sometimes the inference is outside the text being analyzed, such as when “chatter” is evaluated to tell how far it may imply terrorist threats. We are interested in cases where the deduction is more inside. For instance, consider this example in a 2016 article on the work of Douglas Lenat:
A bat has wings. Because it has wings, a bat can fly. Because a bat can fly, it can travel from place to place.
One might say that underlying this is the logical rule $\forall x\,(\mathit{HasWings}(x) \rightarrow \mathit{CanFly}(x))$.
One of the problems, however, is that even if we limit the set to animals, the rule is false—there are many flightless birds. This leads into the whole area of non-monotonic logic which is a topic for another day—but good to bear in mind when revelations from hearings revise previously-held beliefs.
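A toy sketch of that non-monotonic flavor, with a made-up exception list: adding knowledge retracts a conclusion, which is exactly what classical logic forbids:

```python
# A default rule "winged things fly" with an exception list.  Learning new
# facts (penguins!) retracts an earlier conclusion -- the hallmark of
# non-monotonic reasoning.
def can_fly(animal, has_wings, exceptions):
    return has_wings(animal) and animal not in exceptions

has_wings = lambda a: a in {"bat", "robin", "penguin", "ostrich"}

exceptions = set()
assert can_fly("penguin", has_wings, exceptions)      # wrongly concluded

exceptions |= {"penguin", "ostrich"}                  # new knowledge arrives
assert not can_fly("penguin", has_wings, exceptions)  # conclusion retracted
assert can_fly("bat", has_wings, exceptions)          # the default still holds
```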
Ken has been dealing this week with an example at the juncture of the logic of time and human language. He had to evaluate twenty pages of testimony about a recurring behavior $B$. In one place it states that $B$ occurred at time $t_1$ and occurred “once again” at time $t_2$. The question is whether one can infer and apply the rule that $t_2$ was the next occurrence of $B$ after $t_1$, that is, that $B$ did not occur in between.
This was complicated by the document having been translated from a foreign language. Whether time $t_2$ was the next occurrence of $B$ after time $t_1$ makes a difference to results Ken might give. Of course this may be clarified in a further round of testimony—but we could say the same about Admiral Rogers, and he has left the stand.
How soon will we have apps that can take statements of the form $\forall x\, P(x)$ and deduce $P(a)$ for a particular $a$ that we want to know about? Will inferences from “material implication” be considered material in testimony?
Update, 11:15pm 6/8/17: CNN has just told of a woman they interviewed about the contradictions between Donald Trump and James Comey. Asked if she believes Comey lied, she replied, “No.” Asked if she believes Trump lied, she replied, “No.” Asked how that could be, she said: “The media has distorted it.” Thus the logical law of excluded middle is replaced by a “law of occluded media,” which blocks constructive inference…
Wikimedia Commons source |
Robert Southey was the Poet Laureate of Britain from 1813 until his death in 1843. He published, anonymously, “The Story of the Three Bears” in 1837.
Today Ken and I want to talk about the state of P versus NP and its relationship to this story.
The story, as I’m sure you know, is about Goldilocks. She has—no surprise—curly blond hair. She enters the home of three bears while they are away. She tries their chairs, eats some of their porridge, and falls asleep on one of their beds. When the bears return she runs away.
What you may not know is that Southey’s original story had not a young girl but an old woman. She is not innocent but furtive, self-serving, and meddlesome: she breaks the little bear’s chair and eats his breakfast. An 1849 retelling by Joseph Cundall changed her into a girl named “Silver-hair” and changed her motives to restlessness and curiosity. Her hair changed to gold around 1868 but she did not acquire the name “Goldilocks” until 1904.
Of course there is no change to our classic problem: Claims continue that there are proofs that P = NP, claims continue that there are proofs that P ≠ NP, and claims continue that the problem is independent of our standard axioms—but without offering any proof. What connection can there be to the Goldilocks story? It is in the telling—the literary rule of three augmented with a total order.
The Goldilocks tale is really one of threes. It has her try three items at each stage: chairs, food, and beds. At each stage of the story, one item is too big-or-hot-or-hard, one is too small-or-cold-or-soft, and one is just right. Then the bears follow the same sequence in discovering her traipsing.
The “just right” aspect has been named the Goldilocks Principle. Christopher Booker’s oft-quoted description of the “dialectical three” goes as follows:
“[T]he first is wrong in one way, the second in another or opposite way, and only the third, in the middle, is just right. … This idea that the way forward lies in finding an exact middle path between opposites is of extraordinary importance in storytelling.”
The Goldilocks Principle however leads, according to this neat history on the LetterPile website, to what it calls the “Goldilocks Syndrome”:
“We are living in consumerism, where big companies non-stop create billions of realities, where everybody … can feel ‘just right.’ … The problem starts when we can’t stop looking for perfect solutions in [this] pretty imperfect world.”
It is not clear whether they have a solution, but they go on to describe and recommend the following “Goldilocks Rule”:
“Balance between known and unknown, risky and risk-free, predictable and unpredictable.”
Our take on all this is: are we trying to be “too perfect” in our approach to P versus NP? Can we profitably strike a new balance?
I have argued recently and before that P = NP is possible, but with an algorithm for SAT, say, that is galactic—see here for our introduction of this term—meaning an algorithm that is completely useless in practice. Here are three perspectives on the relative power of the two classes.
Lemma 1 There is a constant c so that if SAT is in P, then SAT has a Boolean circuit of size at most n^c.
Note that the consequence is easy to show: Assume P = NP and that there is such a constant c. Then this contradicts Ravi Kannan’s famous theorem that the polynomial hierarchy has sets that require Boolean circuits of size more than n^k for any k. (See this 2009 paper for more.) In terms of our theme: P is too weak to be equal to NP—the bowl is too small.
We can start by regarding the “three barriers”—relativization, natural proofs, and algebrization—as effects of such masquerading. Then one can focus further on the extent to which NP-objects can be approximated by polynomial-time ones. Many NP-complete problems are easy in the average case under certain natural distributions. We wonder whether the theory can be structured to say that logspace objects, or ones from even smaller classes, cannot approximate so well. An example of a technical issue to overcome is that languages like SAT are NP-hard but approximated ultra-simply by the language of all strings.
Of course, it would be a huge breakthrough already to separate NP from small uniform circuit classes, let alone from logspace or P. We’re suggesting instead to think along these lines:
So, which bowl? Which bed to lie in? Most seem to believe the second is how to prove P ≠ NP, but who knows. At least we’re trying not to run away.
The famous front-and-back cover art of the venerable 1979 textbook by John Hopcroft and Jeffrey Ullman is said to depict Cinderella:
We believe Goldilocks fits better: curly hair, wearing boots not slippers, and breaking things. Both of us recall general optimism about solving P versus NP at the time the text was published. Now the artwork seems prophetic about what happens when we tug at the question. Can we get a “middle-way” approach to it up and functioning?
While on the subject of textbooks, we are happy to note that our textbook Quantum Algorithms Via Linear Algebra received a second printing from MIT Press, in which all of our previous errata have been corrected.
UK Independent source—and “a gentle irony” |
Roger Bannister is a British neurologist. He received the first Lifetime Achievement Award from the American Academy of Neurology in 2005. Besides his extensive research and many papers in neurology, his 25 years of revising and expanding the bellwether text Clinical Neurology culminated in his being added as co-author. Oh, by the way, he is that Bannister who was the first person timed under 4:00 in a mile race.
Today I cover another case of “Big Data Blues” that has surfaced in my chess work, using a race-timing analogy to make it general.
Sir Roger also served as head of Pembroke College, one of the constituent colleges of Oxford University. He was one of three august Rogers with whom I interacted about IBM PC-XT computers when the machines were installed at Oxford in 1984–1985. Sir Roger Penrose was among trustees of the Mathematical Institute’s Prizes Fund who granted support for my installation of an XT-based mathematical typesetting system there, a story I’ve told here. Roger Highfield and his secretary used an XT in my college’s office, and I was frequently called in to troubleshoot. While drafting this post last month, I received a mailing from Sir Martin Taylor reporting that Dr. Highfield had just passed away—from his obit one can see that he, too, received admission to a royal order.
Dr. Bannister was interested in purchasing several XTs for scientific as well as general purposes at Pembroke. At the time, numerical performance required purchasing a co-processor chip, adding almost $1,000 to what was already a large outlay per machine by today’s standards. I wish I’d thought to say in a quick deadpan voice, “let it run four minutes and it will give you a mile of data.” (Instead, I think the 1954 race never came up in our conversation.) Today, however, data outruns us all. How to keep control of the pace is our topic.
Roger Bannister 50-year commemorative coin. Royal Mint source. |
As shown above in the commemorative coin’s design, the historic 3:59.4 time was recorded on stopwatches. We’ll stay with this older timing technology for our example.
Suppose you have a field of 200 milers. Suppose you also have a box of 50 stopwatches. For each runner you pick a stopwatch at random and measure his/her time. You get results that closely match the histogram of times that were recorded for the same runners in trials the previous day.
How good is this? You can be satisfied that the box of watches does not have a systematic tendency to be slow or to be fast for runners at that mix of levels. Projections based on such fields are valid.
The rub, however, is that you could have gotten your nice fit even if each individual watch is broken and always returns the same time. Suppose your field included Bannister, John Landy, Jim Ryun, and Sebastian Coe, each in his prime. They would probably average close to 3:55. Hence if one of the 50 watches is stuck on 3:55, it will fit them well. It doesn’t matter if you happen to draw that watch when measuring the last-place finisher. The point is that you expect to draw each watch 4 times overall and are fitting an aggregate.
Indeed, you only need the distribution of the (stopped) watches to match the distribution of the runners under random draws. You may measure a close fit not only in the quantiles but also the higher moments, which is as good as it gets. Your model may still work fine on tomorrow’s batch of runners. But at the non-aggregate level, what it did in projecting an individual runner was vapid.
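A quick simulation makes this concrete. The distributions here are invented (times in seconds, roughly a 4:10 field): fifty watches, each permanently stuck on a reading drawn from the same distribution as the runners, reproduce the aggregate mean almost perfectly while telling us nothing about any individual runner.

```python
# Stopped-watch scenario: 50 broken watches whose stuck readings follow
# the runners' true time distribution. Aggregate statistics match well;
# individual measurements are meaningless.
import random
random.seed(1)

runners = [random.gauss(250, 15) for _ in range(200)]  # true mile times (sec)
watches = [random.gauss(250, 15) for _ in range(50)]   # stuck readings, same dist.

measured = [random.choice(watches) for _ in runners]   # random watch per runner

true_mean = sum(runners) / len(runners)
meas_mean = sum(measured) / len(measured)
print(abs(true_mean - meas_mean))   # aggregate means agree closely

# But the reading is statistically independent of the runner it "measures":
errors = [abs(m - r) for m, r in zip(measured, runners)]
print(sum(errors) / len(errors))    # large average per-runner error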
Here is a hypothetical example in the predictive analytic domain of my chess model. Consider a model used by a home insurance company to judge the probabilities of damage by earth movement, wind, fire, or flood and price policies accordingly. I’ve seen policies with grainy risk-scale levels that apply to several hundred homes in a given area at one time. The company only needs good performance on such aggregates to earn its profit.
But suppose the model were fine-grained enough to project probabilities on individual homes. And suppose it did the following:
This is weird but might not be bad. If the risks average out over several hundred homes, a model like this might perform well—despite the consternation homeowners would feel if they ever saw such individual projections.
Of course, “real” models don’t do this—or do they? The expansion of my chess model, which I described last Election Day, has started doing this. It fixates on some moves but gives near-zero probability to others—even ones that were played—while giving fits 5–50x sharper than before. If you’ve already had experience with behavior like the above, please feel welcome to jump to the end and let us know in comments. But to see what lessons to learn from how this happens in my new model, here are details…
My chess model assigns a probability p_i to every possible move m_i at every game turn t, based only on the values v_i given to those moves by strong computer chess programs and parameters denoting the skill profile of a formal player P. The programs list move options m_1, m_2, m_3, … in order of value for the player to move, so that δ_i = v_1 − v_i is the raw inferiority of m_i in chess-specific centipawn units.
The model asserts that the parameters can be used to compute dimensionless inferiority values δ'_i, from which projected probabilities p_i are obtained without further reference to either parameters or data. The old model starts with a function that scales down the raw difference δ_i according to the overall position value, yielding the scaled difference δ'_i. Then it defines the utility share u_i = exp(−(δ'_i/s)^c).
Lower s and higher c both decrease the probability of playing a sub-optimal move by dint of driving (δ'_i/s)^c higher. The effect of s is greatest when δ'_i is low, so s is interpreted as the player’s “sensitivity” to small differences in value, whereas c governs the frequency of large mistakes and hence is called “consistency.” My conversion represents each p_i as a power of the best-move probability p_1, namely solving the equations p_i = p_1^(1/u_i) together with p_1 + p_2 + ⋯ + p_ℓ = 1,
where ℓ is the number of legal moves in the position. The double exponential in p_i = p_1^(exp((δ'_i/s)^c)) looks surprising but can be broken down by regarding u_i as a “utility share” expressed in proportion to the best move’s utility u_1 = 1; then log(1/p_i) = (1/u_i)·log(1/p_1). Alternate formulations can define p_i directly from δ'_i and the parameters, e.g. by making p_i proportional to u_i, and/or simply normalize the shares by their sum rather than use powers, but they seem not to work as well.
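As we read these definitions, the inner loop can be sketched in a few lines. This is our own minimal reconstruction, not Ken’s code; the deltas and parameter values are made up for illustration, and p_1 is found by bisection so the powers sum to 1:

```python
# Sketch of the inner loop as described: shares u_i = exp(-(delta_i/s)**c),
# probabilities p_i = p1**(1/u_i), with p1 chosen so the p_i sum to 1.
import math

def move_probs(deltas, s, c):
    """deltas: scaled inferiority values; deltas[0] == 0 for the best move."""
    shares = [math.exp(-(d / s) ** c) for d in deltas]
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):                 # bisection: sum is increasing in p1
        p1 = (lo + hi) / 2
        if sum(p1 ** (1.0 / u) for u in shares) > 1.0:
            hi = p1
        else:
            lo = p1
    return [p1 ** (1.0 / u) for u in shares]

probs = move_probs([0.0, 0.1, 0.3, 0.9], s=0.1, c=0.5)
print(probs)   # decreasing in delta; sums to ~1
```

Note how a tiny share u_i produces a huge exponent 1/u_i and hence a probability collapsing toward zero—the mechanism behind the near-zero projections discussed below.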
This “inner loop” defines (p_1, …, p_ℓ) as a probability ensemble given any point in the parameter space. The “outer loop” of regression needs to determine the point that best conforms to the given data sample. The probabilities p_{i,t} over game turns t = 1, …, T determine projections for the frequency MM of “matching” the computer’s first move and the “average scaled difference” ASD of the played moves by MM = (1/T)·Σ_t p_{1,t} and ASD = (1/T)·Σ_t Σ_i p_{i,t}·δ'_{i,t}.
The regression makes these into unbiased estimators by matching them to the actual values MM_a and ASD_a in the sample. We can view this as minimizing the least-squares “fitness function” F = w_1·(MM − MM_a)^2 + w_2·(ASD − ASD_a)^2,
where the weights w_1, w_2 on the individual tests are fixed ad-lib. In fact, my old model virtually always gets F = 0, thus solving two equations in two unknowns. Myriad alternative fitness functions using other statistical tests and weights help to judge the larger quality of the fit and cross-validate the results.
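To make the outer loop concrete, here is a hypothetical grid-search sketch—not Ken’s actual regression code. The sample data, target values, unit weights, and grid are all invented:

```python
# Hypothetical outer loop: grid-search (s, c) to minimize the least-squares
# fitness F = (MM - MM_a)**2 + (ASD - ASD_a)**2, with a tiny made-up sample.
import math

def move_probs(deltas, s, c):
    shares = [math.exp(-(d / s) ** c) for d in deltas]
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(100):
        p1 = (lo + hi) / 2
        if sum(p1 ** (1.0 / u) for u in shares) > 1.0:
            hi = p1
        else:
            lo = p1
    return [p1 ** (1.0 / u) for u in shares]

# Made-up sample: per-turn scaled deltas and the index of the move played.
sample = [([0.0, 0.2, 0.5], 0), ([0.0, 0.1, 0.4], 1), ([0.0, 0.3, 0.6], 0)]
MM_a = 2 / 3        # actual move-matching frequency in the sample
ASD_a = 0.1 / 3     # actual average scaled difference of the played moves

def fitness(s, c):
    T = len(sample)
    MM = sum(move_probs(d, s, c)[0] for d, _ in sample) / T
    ASD = sum(sum(p * dl for p, dl in zip(move_probs(d, s, c), d))
              for d, _ in sample) / T
    return (MM - MM_a) ** 2 + (ASD - ASD_a) ** 2

best = min(((fitness(s / 100, c / 10), s / 100, c / 10)
            for s in range(5, 30, 5) for c in range(3, 8)), key=lambda t: t[0])
print(best)   # (F, s, c) with the smallest residual on this grid
```

A real fit would use a proper optimizer and thousands of positions, but the structure—inner loop inside an outer least-squares search—is the same.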
In my original model, all is good. My training sets for a wide spectrum of Elo ratings yield best-fit values (s, c) that not only give a fine linear correspondence to Elo rating with residuals small across the spectrum, but the individual sequences of s values and c values also give good linear fits to Elo. Moreover, for all turns t and positions the projected probabilities derived from (s, c) have magnitudes that spread out over the reasonable moves.
My old model is however completely monotone in this sense: the best move(s) always have the highest p_i, regardless of the parameters. Moreover, an uptick in the value v_i of any move m_i increases p_i for every setting of the parameters. This runs counter to the natural idea that weaker players prefer weaker moves.
The new model postulates a mechanism by which weaker moves may be preferred by dint of looking better at earlier stages of the search. A new measure called “swing” is positive for moves whose high worth emerges only late in the search, and negative for moves that look attractive early on but end with subpar values. The latter moves might be “traps” set by a canny opponent, such as the pivotal example from the 2008 world championship match discussed here.
A player’s susceptibility to “swing” is modeled by a new parameter called h for “heave,” as I described last November. The basic idea is that v_i − h·sw_i represents the “subjective value” of the move m_i, so that δ_i + h·sw_i represents the subjective difference in value. The idea I actually use applies swing to adjust the inferiority measure: δ'_i becomes δ'_i + (h·sw_i)^b,
where b is a fourth parameter and x^b for negative x is defined to be −|x|^b. Dropping h from the second term and raising sw_i to just the b-th power would be mathematically equivalent under re-parameterization, but coupling the parameters makes it easier to try constraining b = 1 and/or h = 1. (In fact, I’ve tried various other combinations and tweaks to the formulas for δ'_i and p_i, plus four other parameters kept frozen to default values in examples here. None so far has changed the picture described here.)
Note that the formulas preserve the property δ'_1 = 0 for the first-listed move m_1. When m_i has equal-optimal value, that is δ_i = 0, sw_i cannot be negative and is usually positive. That makes δ'_i > 0 and hence reduces the share u_i compared to u_1 = 1. The first big win for the new model is that it naturally handles a puzzling phenomenon I identified years ago, for which my old model makes an ad-hoc adjustment.
The second big win is that δ'_i can be negative even when δ_i > 0—the swing term overpowers the other. This means the model projects the inferior move m_i as more likely than the engine’s optimal move. This is nervy, but in many cases my model correctly “foresees” the player taking the bait.
The third big win—but a tantalizing one—is that the extended model not only allows solving two more equations but often makes other fitness tests align like magic. The first of the following choices of extra equations makes an unbiased estimator for the frequency of playing a move of equal value to the best move, which became my third cheating test after its advocacy in this paper (see also the reply in this):
A typical fit that looks great by all these measures is here. It has 26,450 positions from all 497 games at standard time controls with both players rated between Elo 2040 and Elo 2060 since 2014 that are collected in the encyclopedic ChessBase Big 2017 data disc. It shows for to , then and tests related to it, then is repeated between and , and finally come four cases of for 0.01–0.10, 0.11–0.30, 0.31–0.70, and 0.71–1.50, plus four with .
Only s, c, a, and hm were fitted on purpose. All the other tests follow closely like baby ducks in a row, except for some like captures and advancing versus retreating moves where human peculiarities may be identified. The fit is 5–10x as sharp as what my old model typically achieves. The new model seems to be confirming itself across the board and fulfilling the goal of giving accurate projected probabilities for all moves, not just the best move(s). What could possibly be amiss?
The first hint of trouble comes from the fitted value of s being 0.0083. In my old model, players rated 2050 give much higher values of s, while even the best players in the world do not go this low. Players rated 2050 are in amateur ranks, and s = 0.0083 leaves no headroom for masters and grandmasters. The value of c compounds the sharpness: together they balloon even a slight value difference in the exponent, shrinking projections for modestly inferior moves near 1-in-5,000 and below 1-in-650,000 for somewhat larger differences. This is weirdly small—and we have not even yet involved the effects of the swing term with hm = 1.8024.
Those effects show up immediately in the file. I skip turns 1–8, so White’s 9th move is the first item. In the following position at left, Black has just captured a pawn and White has three ways to re-take, all of them reasonable moves according to the Stockfish 7 program.
Positions in game Franke-Doennebrink, 1974 at White’s 9th move (left) and Black’s 11th (right). |
Here is how my new model projects them:
NRW Class1 1314;Germany;2014.02.02;6.4;Franke, Thomas;Doennebrink, Elmar;1-0
r1b1k2r/pp2bppp/2n1pn2/3q4/3p4/2PBBN2/PP3PPP/RN1Q1RK1 w kq - 0 9;
c3xd4, engine c3xd4 Eval 0.24 at depth 21; swap index 1 and spec AA2050SF7w4sw10-19: (InvExp:1),
Unit weights with s = 0.0083, c = 0.3846, d = 12.5000, v = 0.0500, a = 0.9863, hm = 1.8024, hp = 1.0000, b = 1.0000:

M# Rk Move     RwDelta ScDelta  Swing SwDDep  SwRel Util.Share ProjProb'y
 1  1 c3xd4:      0.00   0.000  0.000  0.000  0.000 1          0.79527569
 2  2 Nf3xd4:     0.42   0.321 -0.035 -0.034 -0.034 0.144422   0.20472428
 3  3 Be3xd4:     0.55   0.395  0.008  0.005  0.005 0.00445313 0.00000001
That’s right—it gives zero chance of a 2050-player taking with the Bishop, even though Stockfish rates that only a little worse than taking with the Knight. True, human players would say 9.Bxd4 is a stupid move because it lets Black gain the “Bishop pair” by exchanging his Knight for that Bishop. Of 155 games that ChessBase records as reaching this position, 151 saw White recapture by 9.cxd4, 4 by 9.Nxd4, and none by 9.Bxd4. So maybe the extremely low projection—for 9.Bxd4 and all other moves—has a point. But to give zero? The 0.00445313 is the utility share, so the projected probability is actually on the order of 10^−23; the 0.00000001 is an imposed minimum. My original model—setting the swing term aside and fitting only s and c—spreads out the probability nicely, maybe even too much here:
M# Rk Move     RwDelta ScDelta  Swing SwDDep  SwRel Util.Share ProjProb'y
 1  1 c3xd4:      0.00   0.000  0.000  0.000  0.000 1          0.57620032
 2  2 Nf3xd4:     0.42   0.321 -0.035 -0.034 -0.034 0.280586   0.14018157
 3  3 Be3xd4:     0.55   0.395  0.008  0.005  0.005 0.241178   0.10168579
At Black’s 11th turn, however, the new model gives three clearly wrong “zero” projections:
NRW Class1 1314;Germany;2014.02.02;6.4;Franke, Thomas;Doennebrink, Elmar;1-0
r1b2rk1/pp2bppp/2nqpn2/8/3P4/P1NBBN2/1P3PPP/R2Q1RK1 b - - 0 11;
Rf8-d8, engine b7-b6 Eval 0.11 at depth 20; swap index 1 and spec AA2050SF7w4sw10-19: (InvExp:1),
Unit weights with s = 0.0083, c = 0.3846, d = 12.5000, v = 0.0500, a = 0.9863, hm = 1.8024, hp = 1.0000, b = 1.0000:

M# Rk Move     RwDelta ScDelta  Swing SwDDep  SwRel Util.Share ProjProb'y
 1  1 b7-b6:     -0.00  -0.000  0.000  0.000  0.000 1          0.56792559
 2  2 Nf6-g4:     0.18   0.163 -0.001 -0.002 -0.002 0.0907468  0.00196053
 3  3 Rf8-d8:     0.18   0.163  0.042  0.046  0.046 0.00391154 0.00000001
 4  4 Bc8-d7:     0.21   0.187 -0.029 -0.030 -0.030 0.278053   0.13071447
 5  5 Nf6-d5:     0.28   0.241  0.047  0.050  0.050 0.00218845 0.00000001
 6  6 a7-a6:      0.30   0.256 -0.049 -0.051 -0.051 0.28777    0.14001152
 7  7 Qd6-c7:     0.31   0.264 -0.012 -0.012 -0.012 0.097661   0.00304836
 8  8 g7-g6:      0.37   0.306  0.015  0.017  0.017 0.00355675 0.00000001
 9  9 Qd6-d8:     0.39   0.320 -0.054 -0.051 -0.051 0.206264   0.06438231
10 10 Qd6-b8:     0.39   0.320 -0.037 -0.038 -0.038 0.158031   0.02787298
Owing to many other games having “transposed” here by a different initial sequence of moves, Big 2017 shows 911 games reaching this point. In 683 of them, Black played the computer’s recommended 11…b6. None played the second-listed move 11…Ng4, which reflects well on the model’s giving it a tiny projection. But the third-listed move 11…Rd8 gets a zero despite having been chosen by 94 players. Then 91 played the sixth-listed 11…a6, which actually gets the second-highest nod from the model, and 22 played 11…Bd7, which the new model considers third most likely. But 12 players chose 11…Nd5, four of them rated over 2300 including the former world championship candidate Alexey Dreev in a game he won at the 2009 Aeroflot Open. My old model’s fit of the same data gives 34.8% to 11…b6, 10.4% to 11…Ng4 and 7.5% to 11…Rd8 with the ad-hoc change for tied moves (it would be 8.7% to both without it), and 5.1% to 11…Nd5, with eighteen moves getting at least 1%.
To be sure, this is a well-known “book” position. The 75% preference for 11…b6 doubtless reflects players’ knowledge of past games and even the fact that Stockfish and other programs consider it best. It is hard to do a true distributional benchmark of my model in selected positions because the ones with enough games are exactly the ones in “book.” Studies of common endgame positions have been tried then and now, but with the issue that the programs’ immediate complete resolution of these endgames seems to wash out much of the progression in thinking and differentiation of player skill that one would like to capture. (My cheating tests exclude all “book-by-2300+” positions and all with one side ahead more than 3.00.) Most to the point, the fitting done by my model on training data is supposed to be already the distributional test of how players of that rating class have played over many thousands of instances.
The following position is far from book and typifies the most egregious kind of mis-projection:
SVK-chT1E 1314;Slovakia;2014.03.23;11.6;Debnar, Jan;Milcova, Zuzana;1-0
2r4k/pp5p/2n5/2P1p2q/2R1Qp1r/P2P1P2/1P3KP1/4RB2 b - - 1 32;
Qh5-g5, engine Qh5-g5 Eval 0.01 at depth 21; swap index 2 and spec AA2050SF7w4sw10-19: (InvExp:1),
Unit weights with s = 0.0083, c = 0.3846, d = 12.5000, v = 0.0500, a = 0.9863, hm = 1.8024, hp = 1.0000, b = 1.0000:

M# Rk Move     RwDelta ScDelta  Swing SwDDep  SwRel Util.Share ProjProb'y
 1  1 Qh5-g5:     0.00   0.000  0.129  0.137  0.000 0.0347142  0.00018527
 2  2 Rc8-d8:     0.00   0.000  0.026  0.025 -0.112 1          0.74206054
 3  3 a7-a6:      0.09   0.085 -0.008 -0.005 -0.142 0.117659   0.07922164
 4  4 Rc8-g8:     0.21   0.187 -0.092 -0.087 -0.225 0.0996097  0.05003989
 5  5 Rh4-h1:     0.25   0.219 -0.166 -0.165 -0.302 0.136659   0.11270482
This has two tied-optimal moves for Black in a position judged +0.01 to White, not a flat 0.00 draw value, yet the one that was played gets under a 1-in-5,000 projection. Here are the by-depth values that produced the high positive swing value:
--------------------------------------------------------------------------------------------
       5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
--------------------------------------------------------------------------------------------
Qg5 -117 -002 -008 +000 -015 +008 +017 +032 +011 +007 +000 +004 +001 +000 +000 +006 +001
Rd8 -089 -058 -036 -032 -013 -025 +006 +000 -010 +000 -012 -013 +001 +014 +001 +001 +001
--------------------------------------------------------------------------------------------
The numbers are from White’s view, so what happened is that 32…Rd8 looked like giving Black the advantage at depths 10, 13, and 15–16, whereas 32…Qg5 looked significantly inferior (to Stockfish 7) at depth 12 and nosed in front only at depth 20 just before falling into the tie. The swing computation begins at depth 10 to evade the Stockfish-specific strangeness I noted here last year, so in particular the “rogue” values at depth 5 (and below) are immaterial. The values and differences from depth 10 onward are all relatively gentle. Hence their amounting to a tiny utility share for the played 32…Qg5 versus the full share for 32…Rd8, and a microscopic projected probability, is a sudden whiplash.
What I believe is happening to the fit is hinted by this last example giving the highest probability to the 2nd-listed move. Our first game above has two positions where the 9th-listed move gets the love. (The second, shown in full here, is notable in that the second-best move gets a zero though it is inferior by only 0.03 and was played by all three 2200+ players in the book.) This conforms to the goal of projecting when weaker players will prefer weaker moves.
This table shows that the new model quite often prefers moves other than the engine’s first move m_1, compared to how often they are played:
To be sure, the model is not putting 100% probability on these preferred moves, but when preferred they get a lot more probability than under my old model, which never prefers a move other than m_1. Recall however that my old model’s fit was not too far off on these indices—and both models are fitted to give the same total probability to m_1 over all positions. Hence the probability on inferior moves is conserved but more concentrated.
Yes, greater concentration was the goal—so as to distinguish the most plausible inferior moves. But the above examples show a runaway process. The new model seems to be seizing onto properties of the distribution alone. For each turn t we can define m*_t to be the move with the most negative value of the swing-adjusted inferiority δ'_i. The moves m*_t also form a histogram over the sample. The fitting process can grab it by putting all weight on m*_t plus at most a few other moves at each turn t.
These few moves are the “stopped-watch readings” in my analogy. The moves given zero are the readings that cannot happen for a given runner/position. The fitting doesn’t care whether moves getting zero were played, so long as other turns fill in the histogram. If a high projection for an unplayed move—as with 32…Rd8 above—fills a gap, the fit will gravitate toward parameter values that beat down all the other moves at such turns. In trials on other data, I’ve seen s crash toward zero while hm zooms aloft in a crazy race.
What can fix this? The maximum likelihood estimator (MLE) in this case involves minimizing the sum over turns t of log(1/q_t), where q_t is the projected probability of the move played at turn t. Adding it as a weighted component of F helps a little by inflating the probability of the moves that were played, but so far not a lot. Even more on-point may be maximum entropy (ME) estimation, which in this case means minimizing the sum over turns t of Σ_i p_{i,t}·log(p_{i,t}).
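The MLE component is simple to state in code. This toy sketch is not Ken’s fitting code; the per-turn probability ensembles and played-move indices are invented:

```python
# Negative log-likelihood of the moves actually played, which could be
# blended into the fitness function F as a weighted component.
# Each sample entry: (projected probabilities at a turn, index of move played).
import math

sample = [([0.6, 0.3, 0.1], 0), ([0.5, 0.4, 0.1], 1), ([0.7, 0.2, 0.1], 0)]

neg_log_lik = sum(-math.log(probs[played]) for probs, played in sample)
print(neg_log_lik)   # smaller is better
```

One attraction of this term is exactly the pathology above: a played move projected at (near) zero probability contributes a huge penalty, so the optimizer cannot profit from "stopped-watch" fits that zero out moves people actually make.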
There are various other ways to fit the model, including a quantiling idea I devised in my AAAI 2011 paper with Guy Haworth. In principle, and because the training data is copious, it is good to have these ways agree more than they do at present. Absent a lightning bolt that fuses them, I am finding myself locally tweaking the model in directions that optimize some “meta-fitness” function composed from all these tests.
Is this a known issue? Does it have a name? Is there a standard recipe for fixing it?
Do any deployed models have similar tendencies that aren’t noticed because there isn’t the facility for probing deeper into the grain that my chess model enjoys?
[added “at standard time controls”, a few other word changes, added game diagrams]