Andrew Granville writes brilliant papers that explain hard results in number theory. He also proves hard results in number theory.
Today, Ken and I use the famous Goldbach conjecture to discuss a third rail: how to identify which results “should be” true even though they have been too hard to prove.
Granville has just recently published a graphic novel, Prime Suspects: The Anatomy of Integers and Permutations, with his sister Jennifer Granville and illustrator Robert Lewis. It features constables named Green and Tao, a detective Jack von Neumann, and students named Emmy Germain and Sergei Langer among other allusions to real (and complex) mathematicians. It grew out of a 2009 musical play that premiered at IAS.
The driver of the plot is a deep connection between the frequency of primes below a number x and that of permutations on n elements that are “prime” in the sense of having only one cycle. Substituting n for log x and vice-versa tends to create correspondences of known results in number theory vis-à-vis permutation group theory. See this MAA review. Going beyond the known theorems described in the novel, we wonder how far such heuristic sleuthing methods can go on long-unsolved cases.
The statement we know as the (Strong) Goldbach Conjecture is that every even number can be written as the sum of two prime numbers. It was made in 1742 by Christian Goldbach. He wrote to Leonhard Euler:
“Every integer that can be written as the sum of two primes can also be written as the sum of as many primes as desired, until all terms are units.”
Well, that is not what we call Goldbach’s conjecture. He and many others at the time considered 1 to be a prime number. What he’s getting at can be seen from this snippet of his letter:
Above his sums, Goldbach put an asterisk * referring to the note in the margin, in which he asserts his conjecture:
“Every integer greater than 2 can be written as the sum of three primes.”
Wait—that is not the Goldbach conjecture either. It is the “Weak” one and was apparently proved in 2013 by Harald Helfgott. We discussed this in a 2014 post whose larger theme we are continuing here. It also proves the first conjecture, but not the strong conjecture.
But what Goldbach seems to be driving at with his drawing of sums is having one of the “primes” be 1. Then the strong conjecture is needed. Euler pointed this out in his reply to Goldbach’s letter. But Euler, who was a saint in many ways, charitably reminded Goldbach of a communication earlier that year when Goldbach had observed that his first conjecture followed from the strong statement. Euler went on to say:
“That every even number should be a sum of two primes I hold to be a completely certain theorem, irrespective of my not being able to prove it.”
Ken has translated this a little differently from Wikipedia’s article and its source, reading into Euler’s words the stance of truth shining apart from proof. How one can justify this stance is what we want to discuss.
The conjecture is curious on several fronts. For one, it is usually said to be “obviously correct.” It has been checked by computation to about 4 × 10^18 or so. There are many open conjectures in number theory that are likely to be true, but few are claimed to be “true” with such a strong bias—none as likely as the Goldbach.
In 1975, Hugh Montgomery and Robert Vaughan proved that the Goldbach is true for most even numbers. That is, the number of even numbers less than some x that are not sums of two primes is at most x^(1−δ) for some fixed δ > 0. Thus if one picks a random even number it is likely to be the sum of two primes. Here the “likely” is a mathematical certainty.
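A small-scale computational check of the strong conjecture is easy to sketch in Python (our own illustration, of course nowhere near the record computations):

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [i for i, is_p in enumerate(sieve) if is_p]

def goldbach_count(n, prime_set):
    """Number of unordered ways to write even n as a sum of two primes."""
    return sum(1 for p in prime_set if p <= n // 2 and (n - p) in prime_set)

limit = 10_000
ps = set(primes_up_to(limit))
# Every even number from 4 up to the limit has at least one representation.
assert all(goldbach_count(n, ps) >= 1 for n in range(4, limit + 1, 2))
# The number of representations tends to grow with n.
print(goldbach_count(10, ps), goldbach_count(100, ps))
```

The growth of the representation counts is itself part of why the conjecture is believed: the larger the even number, the more ways there seem to be.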
How do we “know” that it is likely to be true? One source is the method of prime models. Primes are quite mysterious and hard to understand. So there are heuristic models that suggest we think of the primes as “random”. Of course this is silly, since the primes are a deterministic fixed sequence of numbers. But the hope is that the following is true.
If S is a statement about the primes that holds with high probability in the random model, then S is true.
Of course this is nonsense.
But it is interesting nonsense. Harald Cramér has a model that is simple. Granville added some refinements to this model here and here. More recently William Banks, Kevin Ford, and Terence Tao have a new model for the primes here.
These models are useful for making and thinking about number theory conjectures. Perhaps one day they will really be usable to determine truth. They are certainly good heuristics to have when studying the prime numbers. We are jealous: in complexity theory it would be wonderful to have anything like these models. Perhaps one day we will.
Cramér’s model is simple to state. Imagine that the primes are replaced by a random set R: place each integer n ≥ 3 in R with probability 1/ln n, and make these choices independently. The Fermat numbers are those of the form F_m = 2^(2^m) + 1.
The first five of these,

F_0 = 3, F_1 = 5, F_2 = 17, F_3 = 257, F_4 = 65537,

are prime. Fermat thought this continued but it is not true. Euler showed that the next one is not a prime:

F_5 = 4,294,967,297 = 641 × 6,700,417.
An interesting problem is: are there any more prime Fermat numbers? Many believe that there are no more, or at most there are a finite number in total. Let’s look at using the model to understand the Fermat numbers:

Pr[F_m is prime] ≈ 1/ln F_m ≈ 1/(2^m ln 2).
Summing over m gives a convergent series, so the total expected number of Fermat primes is finite. Of course this is assuming the model is predictive.
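The arithmetic of this heuristic fits in a few lines (our own illustration; the crude model ignores, for instance, that Fermat numbers are odd and pairwise coprime, which would only adjust the constant):

```python
import math

# Fermat numbers F_m = 2^(2^m) + 1; Cramér's model assigns n "prime" with
# probability about 1/ln n, so Pr[F_m prime] is about 1/(2^m ln 2).
def fermat(m):
    return 2 ** (2 ** m) + 1

# F_0..F_4 are prime; F_5 has Euler's famous factor 641.
assert fermat(5) % 641 == 0

# Expected number of Fermat primes under the heuristic: a convergent sum.
expected = sum(1 / (2 ** m * math.log(2)) for m in range(200))
print(round(expected, 3))  # close to 2 / ln 2, about 2.885
```

So the model “predicts” only a bounded handful of Fermat primes ever, which is consistent with none having been found beyond F_4.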
Our friends at Wikipedia say:
As with many famous conjectures in mathematics, there are a number of purported proofs of the Goldbach conjecture, none of which are accepted by the mathematical community.
Try a Google search yourself for “Goldbach conjecture proved”. The top hits include several “proofs” that the conjecture is true. The proofs are all short and simple. All are believed to be wrong. I find it interesting that in many cases they use a random-like argument in their “proofs”. The trouble is that the above models are only heuristics. So the proofs seem to be incomplete.
Can we imagine getting heuristic models for complexity theory? For quantum algorithms perhaps. What would such heuristic models even look like? We wonder.
Paul Painlevé was a French mathematician who specialized in classical mechanics. He is known for a conjecture in 1895 about anomalies in Isaac Newton’s equations that was proved in 1988.
Tonight, Ken and I celebrate Halloween with some mathematical horrors that may have real physical consequences.
Painlevé is also known for having been Prime Minister of France for two short stints in 1917 and 1925. He was also Minister of War under President Raymond Poincaré, a cousin of Henri Poincaré. He was the first Frenchman in powered flight—alongside Wilbur Wright in 1908—and at the end of his life headed the French Air Force.
A basic fact about Newtonian gravity is that the gravitational potential energy of two point masses m1 and m2 is proportional to

−(m1 · m2)/r,

where r is the distance between them. We usually think of how gravity becomes weaker as r grows, but when r is tiny it becomes quite strong. Since the potential is negative, it is possible for an individual particle in a finite n-body system to accelerate to arbitrary speed without violating the conservation of energy. But can it happen in a finite amount of time—and without r actually becoming zero in a collision?
Painlevé proved a ‘no’ answer for n = 3 but suspected ‘yes’ for larger n. Zhihong Xia proved in 1988 it can happen for n = 5, extending earlier advances by his thesis advisor, Donald Saari. The case n = 4 is still open; in all cases, the initial conditions for it to happen form a measure-zero subset of the configuration space.
The n = 5 construction is shown by this diagram from their 1995 AMS Notices paper, which has a greatly understandable telling of the whole story. We, however, envision a fantasy story about what could have happened much earlier.
We want to imagine that Xia’s result was proved not near the end of the 20th century but near the start—in particular, before Albert Einstein’s creation of General Relativity, but after Special Relativity. Say the result was obtained in 1908 by Painlevé after his flight with Wright. That same year, Hugo von Zeipel proved a startling consequence of Painlevé’s conjecture, which we now know to be a theorem:
Theorem 1 Under Newtonian gravity, a finite system of point masses can eject a particle to infinity in finite time.
That is, without collisions, a Newtonian n-body system of point masses can create separations between particles that grow to infinity within a finite time. Saari and Xia say that the effect can be partially seen if you do the following:
Place a tennis ball on top of a basketball and drop the pair from chest height. The tennis ball rebound is pretty dramatic—enough so that it should be done outside.
Ken tried it and it works. It does not work so well if the tennis ball is replaced by a piece of Halloween candy. If the basketball is replaced by a pumpkin, it definitely will not work.
We know already, however, from Einstein’s theory of special relativity that it cannot work. We would have an instant before the singular time at which the ejected point mass has just passed the speed of light. At that time it has acquired infinite energy according to special relativity, but it cannot have withdrawn that much from the potential by then.
In 1908 this would have been an internal indication that Newton’s theory of gravity must break down. There were of course external indicators, such as the theory’s incorrect prediction of the orbit of Mercury. Maybe internal ones were known, but this seems most glaring in retrospect. Are we right in this interpretation? The result by Xia is shocking enough as it stands, and makes us wonder what other surprises lurk in equations.
Other possibilities of singularities happening within finite time have not been ruled out by any physical theories. The New York Times Magazine in July 2015 ran a profile titled “The Singular Mind of Terry Tao” with a great opening sentence:
This April, as undergraduates strolled along the street outside his modest office on the campus of the University of California, Los Angeles, the mathematician Terence Tao mused about the possibility that water could spontaneously explode.
Indeed, Tao proved it can happen under a plausible modification of the Navier-Stokes equations of fluid dynamics. He writes there:
Intriguingly, the method of proof in fact hints at a possible route to establishing blowup for the true Navier-Stokes equations, which I am now increasingly inclined to believe is the case (albeit for a very small set of initial data).
As with Painlevé’s conjecture—Xia’s theorem—the point would be not that the initial conditions could happen with any perceptible probability, but that our world is capable of their happening at all. At least, that is, with the equations by which we describe our world. Our own mention at the start of 2015 alluded to the possibility of Tao’s blowup applying to fluid-like fields in cosmological theories.
Just last week, a column on the Starts With a Bang blog alerted us to an issue with equations that could be skewing cosmological theories today. The blog is written by Ethan Siegel for Forbes and its items are linked regularly on RealClear Science. Siegel draws an analogy to a phenomenon with Fourier series that feeds into things Dick and I (Ken writing this part) have already been thinking about. Things used all the time in theory…
The Fourier phenomenon is named for the American physicist Josiah Willard Gibbs, but Gibbs was not the original discoverer. We could add this to our old post on a law named for Stephen Stigler, who did not discover it, that no scientific law is named for its original discoverer. The first discoverer of the Gibbs Phenomenon was—evidently—Henry Wilbraham in 1848. Through the magic of the Internet we can convey it by lifting Wilbraham’s original figure straight from his paper:
What this shows is the convergence of a sum of sine waves to a square wave. The convergence is pointwise except at the jump discontinuities, but it is not uniform. The partial sums f_n do not converge uniformly on any interval crossing a jump; instead they rise about 9% of the jump height above the square wave function value in the vicinity of the jump, no matter how many terms are summed in the approximations. Wilbraham’s middle drawing depicts this in finer detail than any other rendering I have found. The persistent overshoot is physically real—it is the cause of ringing artifacts in signal processing.
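One can watch the overshoot refuse to shrink in a few lines of Python (our own sketch of Wilbraham’s picture, for the square wave of amplitude 1):

```python
import math

def square_partial_sum(x, terms):
    """Partial Fourier sum of the square wave: +1 on (0, pi), -1 on (-pi, 0)."""
    return (4 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms)
    )

# The peak near the jump at x = 0 does not approach 1 as more terms are added:
# it stays near 1.18, i.e., about 9% of the full jump of size 2 above the level 1.
peaks = {
    terms: max(square_partial_sum(i * 1e-4, terms) for i in range(1, 2500))
    for terms in (25, 100, 400)
}
print({t: round(p, 3) for t, p in peaks.items()})
```

The peak location squeezes toward the jump as the number of terms grows, but its height converges to about 1.179, not to 1.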
Now the convergence does satisfy a criterion of ε-approximation of a function that we use all the time in theory: for any ε > 0 and large enough n, |f_n(x) − f(x)| < ε except for at most an ε fraction of x in the domain. This kind of convergence is used internally in the proofs of quantum algorithms for linear algebra which we recently discussed. If the value f(x) is explicitly what you’re after, this is fine. But if you use the value only implicitly while composing the approximation f_n with some other function g, you must beware that the compositions are not thrown off in a constant way by the overshoots.
Siegel draws attention to what is alleged as something similar actually happening to current physical theories that use a well-known class of algorithmic simulations. The details are in a new paper by Anton Baushev and Sergey Pilipenko, titled “The central cusps in dark matter halos: fact or fiction?” To show the relation between this and the opening example in this post, we need only quote the paper (page 2)—
However, the present state-of-art of N-body simulation tests … can hardly be named adequate. The commonly-used criterion of the convergence of N-body simulations in the halo center is solely the density profile stability [per which] the central cusp (close to ρ ∝ r^(−1)) is formed quite rapidly.
—and then quote Siegel’s own description of the core-cusp problem and the allegation:
In theory, matter should fall into a gravitationally bound structure and undergo what’s known as violent relaxation, where a large number of interactions cause the heaviest-mass objects to fall towards the center (becoming more tightly bound) while the lower-mass ones get exiled to the outskirts (becoming more loosely bound) and can even get ejected entirely.
Since similar phenomena to the expectations of violent relaxation were seen in the simulations, and all the different simulations had these features, we assumed that they were representative of real physics. However, it’s also possible that they don’t represent real physics, but rather represent a numerical artifact inherent to the simulation itself.
Is all this real physics? Or is it artifacts showing that current theories and/or algorithms are flawed? Whichever is the truth, our equations have tricks that may not lead to treats.
For a relevant postscript, Painlevé did not abandon physics when he rose in politics. In 1921, he effectively removed an apparent singularity at the event horizons of black holes in general relativity. He and colleagues discussed this with Einstein the following year, but apparently not the n-body conjecture. How far are we right in our interpretation that the (proved) conjecture plus special relativity suffices to disprove Newtonian gravity by itself?
What it takes to understand and verify the claim
John Martinis of U.C. Santa Barbara and Google is the last author of a paper published Wednesday in Nature that claims to have demonstrated a task executed with minimum effort by a quantum computer that no classical computer can emulate without expending Herculean—or Sisyphean—effort.
Today we present a lay understanding of the claim and discuss degrees of establishing it.
There are 76 other authors of the paper. The first 75 are alphabetical, then comes Hartmut Neven before Martinis. Usually pride of place goes to the first author, but that depends on size. Martinis is also the corresponding author. The cox in a rowing race rides at the rear. We have discussed aspects of papers with a huge number of authors here.
Three planks of a quantum supremacy claim are:

1. Defining a concrete sampling task and demonstrating a quantum device that performs it.
2. Verifying statistically that the device’s outputs are faithful to the task.
3. Arguing that no feasible classical computation can do the same.
Scott Aaronson not only has made two great posts on these and many other aspects of the claim, he independently proposed in 2015 the sampling task that was programmed, and he analyzed it in a foundational paper with Lijie Chen of MIT. Researchers at Google had already been thinking along those lines, and they anchored the team composed from numerous other institutions as well. As if on cue—just a couple days before Wednesday’s announcement—a group from IBM put out a post and paper taking issue with the argument for the third plank.
We’ll start with the task and go in order 1-3-2.
Any n-qubit quantum circuit C and input x to C induce a probability distribution D_C on {0,1}^n. Because it will not matter if we prepend up to n NOT gates to C, we may suppose x = 0^n. Then C applied to 0^n gives a unit complex vector of length 2^n with entries a_z corresponding to possible outputs z. Then the probability of getting z by a final measurement of all qubits is |a_z|^2.
Next we consider probability distributions that are generated uniformly at random by the following process, for some s and taking probabilities in units of 1/s:

for i = 1 to s:
choose a z in {0,1}^n uniformly at random;
increment its probability p_z by 1/s.
Here we intend the number of units s to be 2^g, where g is the number of binary nondeterministic gates in the circuit. In place of Hadamard gates the experimental circuits get their nondeterminism from these three single-qubit gates (ignoring global phase, for sqrt(X) in particular): sqrt(X), sqrt(Y), and sqrt(W).
Here W = (X + Y)/sqrt(2), where Y = [[0, −i], [i, 0]] and X is another name for NOT. The difference from using Hadamard gates matters to technical analysis of the distributions, but the interplay between quantum nondeterministic gates and classical random coins remains in force.
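A tiny sanity check of these gate definitions can be done with bare-hands complex arithmetic; the matrix below is one conventional phase choice for the square root of NOT, not necessarily the paper’s convention:

```python
import math

def matmul(A, B):
    """Multiply two 2x2 complex matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

X = [[0, 1], [1, 0]]                      # NOT
Y = [[0, -1j], [1j, 0]]
W = [[(X[i][j] + Y[i][j]) / math.sqrt(2) for j in range(2)] for i in range(2)]

# W behaves like a Pauli gate: it squares to the identity.
wsq = matmul(W, W)
assert all(abs(wsq[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))

# One standard square root of NOT (a particular global-phase choice):
sqrtX = [[(1 + 1j) / 2, (1 - 1j) / 2],
         [(1 - 1j) / 2, (1 + 1j) / 2]]

# Applied twice it is exactly NOT ...
sq = matmul(sqrtX, sqrtX)
assert all(abs(sq[i][j] - X[i][j]) < 1e-12 for i in range(2) for j in range(2))

# ... and applied once to |0> it yields a 50/50 "coin": the nondeterminism.
amp = [sqrtX[0][0], sqrtX[1][0]]          # output column for input |0>
probs = [abs(a) ** 2 for a in amp]
print([round(p, 3) for p in probs])
```

This is the sense in which each such gate acts like a quantum coin flip, while the amplitudes (unlike classical probabilities) can later interfere.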
The choice of sqrt(X), sqrt(Y), or sqrt(W) is itself uniformly random at each point where a single-qubit gate is used, except for not repeating the same gate on the same qubit, and those choices determine C. Now we can give an initial statement of the task tailored to what the paper achieves:
Given randomly-generated quantum circuits C as inputs, distinguish D_C with high probability from any distribution D that feasible classical sampling could produce.
In more detail, the object is to take a number b > 0 and moderately large integer k, both dictated by practical elements of the experiment, and fulfill this task statement:
Given randomly-generated C, generate samples z_1, …, z_k whose average probability p_C(z_i) is at least (1 + b)/2^n, where b > 0 is the separation parameter.
It’s important to note that there are two stages of randomness: one over which C is chosen, and then the stage of measuring after (perhaps imperfectly) executing C. The latter can be repeated to get a large sample of strings for a given C. The nature of the former stage matters most to justifying how to interpret tests of the samples and to closing loopholes. Our D does not signify having uniform distribution in the latter sampling, but rather covers classical alternatives in the former stage that (with overwhelming probability) belong to a class of classical-like distributions. The D_C for random C will (again w.o.p.) belong to a quantum-like class which we explain next.
In honor of the baseball World Series, we offer a baseball analogy. To make differences sharper to see, we take s = 2^n, so each unit of probability is 2^(−n). This is not what the experiment does: their biggest instance has 20 layers totaling 1,113 nondeterministic single-qubit gates (plus 430 two-qubit gates) on the n = 53 qubits. But let us continue.
We are distributing 2^n units of probability among 2^n “batters” z. A batter who gets two units hits a double, three units makes a triple, and so on. The key distinction is between the familiar batting average and the slugging average, which averages all the bases scored with hits: singles plus twice doubles plus three times triples and so on, divided by at-bats.
Thus with respect to a random D, and without any knowledge of D, a chosen team of hitters cannot expect to have a joint slugging average higher than 1. Moreover, for any fixed b > 0, the chance of getting a slugging average higher than 1 + b tails away exponentially in k (provided the number of units also grows).
With respect to D_C, however, a quantum device can do better. Google’s device programs itself given C as the blueprint. So it just executes C and measures all qubits to sample the output. Finding its own heavy hitters is what a quantum circuit is good at. The probability of getting a hitter who hits a triple is magnified by 3 compared to a uniform choice. Moreover, C will never output a string with zero hits—a “can’t miss” property denied to a classical reader of C. For large n the scaled probabilities approach the exponential (Porter–Thomas) distribution, and the slugging expectation is approximately 2.
That is, a team drafted by sampling from random quantum circuits expects to have a slugging average near 2. This defines the quantum-like class. If C works perfectly, the average will surpass 1 + b for any b < 1 with near certainty as k grows.
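Here is a toy simulation of the batting analogy (our own sketch, using the exponential distribution as a stand-in for the true distribution of scaled probabilities, and a small toy universe of outcomes):

```python
import random

random.seed(0)
N = 2 ** 16                      # toy stand-in for the 2^n possible outputs
# Porter-Thomas surrogate: the scaled probabilities N * p_z behave like Exp(1).
weights = [random.expovariate(1.0) for _ in range(N)]
total = sum(weights)
probs = [w / total for w in weights]

k = 20000
# A uniform "classical guesser" drafts batters blindly: slugging about 1.
uniform_slug = sum(N * probs[random.randrange(N)] for _ in range(k)) / k
# An ideal quantum sampler draws z with probability p_z, so it favors its own
# heavy hitters: slugging about E[X^2]/E[X] = 2 for X ~ Exp(1).
quantum_slug = sum(N * p for p in random.choices(probs, weights=probs, k=k)) / k

print(round(uniform_slug, 2), round(quantum_slug, 2))
```

The gap between roughly 1 and roughly 2 is exactly the room in which the separation parameter b lives.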
Google’s circuits distribute their probability in far finer units, since they have over a thousand nondeterministic gates rather than just n. Then the “can’t miss” aspect of the quantum advantage is less sharp, but the Porter–Thomas approximation is closer and the idea of the quantum-like class is the same. The nature of D_C can actually be seen from point intensities in speckle patterns of laser light:
The practical challenge is that the implementation of C is not perfect. The consequence of an error in the final output is severe. The heavy-hitter outputs of a random C are generally not bit-wise similar, so sampling their neighbors is like sampling the uniform distribution. As the paper says, “A single bit or phase flip over the course of the algorithm will completely shuffle the speckle pattern and result in close to zero fidelity.”
Their circuits are sufficiently random that effects of sporadic errors over millions of samples can be modeled by a simple equation using quantum mixed states. We shortcut the paper’s physical analysis by drawing on John Preskill’s illustration of a de-polarizing channel in chapter 3 of his wonderful online notes on quantum computation to reach the same equation (1). The modeling has informative symmetry when the errors of a bit flip, phase flip, or both are considered equally likely with probability e/3 each. The action on the entangled pair Φ = (|00⟩ + |11⟩)/sqrt(2) in the Bell basis is given by the density matrix evolution ρ ↦ ρ′ where

ρ′ = (1 − 4e/3) |Φ⟩⟨Φ| + (4e/3) I/4,   (1)

where I/4 is the density matrix of the completely mixed two-qubit state, which is just a classical distribution. This presumes e ≤ 3/4; note that e = 3/4 completely mixes the Bell basis already. The fidelity of ρ′ to the original state is then given by

F = ⟨Φ|ρ′|Φ⟩ = 1 − e.

This modeling already indicates that with g serial opportunities for error the fidelity will decay as (1 − e)^g. The Google team found low ‘crosstalk’ between qubits and they used exactly this expression as a product of factors (1 − e) over all gates, with e the native error rate of each type of gate.
The error for the two-qubit gates is similarly represented. (The full modeling in the supplement, section V, is more refined.)
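A back-of-envelope version of this product formula fits in a few lines; the error rates below are illustrative assumptions of roughly the magnitudes reported, not exact figures from the paper:

```python
# Back-of-envelope fidelity decay F = product of (1 - e) over all error
# opportunities: gates of each kind, plus readout on every qubit.
# The rates below are assumed, ballpark values, not the paper's exact numbers.
e1, g1 = 0.0016, 1113   # single-qubit gate error rate and gate count
e2, g2 = 0.0062, 430    # two-qubit gate error rate and gate count
er, nq = 0.038, 53      # readout error per qubit, number of qubits

F = (1 - e1) ** g1 * (1 - e2) ** g2 * (1 - er) ** nq
print(f"{F:.4f}")   # a fidelity of a fraction of a percent
```

Even with per-gate error rates well under 1%, a thousand-plus serial opportunities drive the overall fidelity down to the order of a tenth of a percent, which is why the statistical tests below have to work with such a faint signal.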
By observing their benchmarks (discussed below) for varying small circuit sizes they could calculate the decay concretely and hence estimate values of F for the vast majority of runs with larger sizes. The random nature of the circuits evidently makes covariance of errors that could systematically upset this modeling negligible. Thus they can conclude that their device effectively samples from the distribution

D′_C(z) = F · p_C(z) + (1 − F)/2^n.
Such distributions can be said to belong to the quantum-like class for any F > 0. The paper reports that their F is driven below 1% but stays above 0.1% in trials. This bounds the range of separation parameters b they can work with. That F is separated from zero achieves the first plank and starts on the second. The third needs attention first, however.
Both concrete and asymptotic complexity evidence matter for the third plank, the former for now and the latter for how the circuits and everything else may scale up in the future. In asymptotic complexity, we still don’t know that P and PSPACE, which sandwich the quantum feasible class BQP, are different. Thus asymptotic evidence about polynomial bounds must be conditional. Asymptotic evidence about linear time bounds can be sharper but then tends to be conditioned on forms of SETH in ways we still find puzzling.
Lower bounds in concrete complexity are less known and have a self-defeating aspect: we are trying to say that any program run for less than an infeasible time T must fail. But we can’t run a program for time T to show that it fails, because time T is just as infeasible for us as it is for the program. The best we can do is run for a feasible time t, either (i) on a smaller task size, or (ii) on the original task while arguing it doesn’t show progress. Neither is the same; we made some attempts on (ii).
What the paper does instead is argue that a particular classical approach (also from the Aaronson-Chen paper) would take 10,000 years on today’s hardware. This reminds us of a famous 1977 “Mathematical Games” column by Martin Gardner, which quotes an estimate by Ron Rivest that for factoring a 126-digit number on then-current hardware, “the running time required would be about 40 quadrillion years!” It took only until 1994 for this to be broken. Sure enough, IBM calculated that a more-clever implementation of the classical simulation on the Summit supercomputer would take under 3 days. The point is not so much that the Summit hardware is comparable as that estimates based on what are currently thought to be the best possible (classical) methods need asterisks.
On the asymptotic side, the last section (XI) of the paper’s 66-page supplement proves a theorem toward showing that a classical simulation that scales polynomially with n would entail an unlikely complexity-class collapse, and similarly for sub-exponential running times. It does not get all the way there, however: improvements would need to be made in upper bounds for approximation and for worst-case to average-case equivalence. [Added 10/31/19: see this new paper by Scott A. and Sam Gunn.] Moreover, there is a difference from what their statistical testing achieves that we try to explain next.
We can cast the second plank in the general context of predictive modeling. Consider a forecaster who places estimates p_i on the true probabilities q_i of various events i. Here we need to compute the probabilities of output strings observed from the physical device, using the given circuit C and the estimate of F. This must be done classically, and incurs the classical-hardness issue discussed above.
But before we get to that issue, let’s say more from the viewpoint of predictive modeling. We measure how well the forecasts conform to the true q_i by applying a prediction scoring rule. If outcome i happens, then the log-likelihood rule assesses a penalty of

log(1/p_i).

This is zero if the outcome was predicted with certainty but goes to infinity if the individual p_i is very low—which is an issue in the quantum case. The expected score based on the true probabilities is

B = Σ_i q_i log(1/p_i).   (2)
The log-likelihood rule is strictly proper insofar as the unique way to minimize B is to set p_i = q_i for each i. In human contexts this means the model has incentive to be as accurate as possible.
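A three-outcome toy example of strict propriety (our own illustration of the standard fact):

```python
import math

def cross_entropy(q, p):
    """Expected log-loss when outcomes occur with true probabilities q
    and the forecaster assigns probabilities p."""
    return sum(qi * math.log(1 / pi) for qi, pi in zip(q, p))

q = [0.5, 0.3, 0.2]
honest = cross_entropy(q, q)   # the minimum: the entropy of q itself
# Any forecast other than p = q scores strictly worse (a higher penalty).
for p in ([0.4, 0.4, 0.2], [0.6, 0.2, 0.2], [1 / 3, 1 / 3, 1 / 3]):
    assert cross_entropy(q, p) > honest
print(round(honest, 4))
```

This is Gibbs’ inequality in disguise: the honest forecast uniquely minimizes the expected penalty, which is what licenses reading a good score as evidence of accurate probabilities.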
The formula (2) is the cross-entropy from the q distribution to the p distribution. Before we can use it, we need to ask a question:
What is forecasting what? Is the device the imperfect model, and do the “true q_i” come from the analysis of C giving D_C?
This is how it appeared to us and seems from other writing, but we can argue the opposite from first principles: the physical device is the “ground truth” however it works. The assertion that it is executing a blueprint C with some estimated loss in fidelity is really the model. Then it follows that the modeled distribution arising from C is analogous to p not q, and we can call its values p_z. Since we can compute p_z, we can calculate the penalties in (2).
Saying this leaves “q” in (2) as denoting the device’s true probabilities q_z of giving the output strings z. These are not directly observable: it is infeasible to sample the device often enough to expect a sufficiently large number of repeated occurrences, based on the “birthday” threshold. Thus there appears to be no way to estimate individual values q_z in (2), but this doesn’t matter: the very act of sampling the device carries out the “q_z” part of (2). Summing the penalties log(1/p_z) for the z that occur over a large but feasible number of trials gives estimates of (2) that are close enough to make the needed distinctions with high confidence. We can then match the estimate against the theoretical estimate of the expected score, which we may call B_F, assuming accurate knowledge of F. By the scoring function being strictly proper, this match entails achieving p ≈ q with sufficient approximation. This property of the goal mitigates some of the modeling issue.
This issue was clarified by reading Ryan O’Donnell’s 2018 notes on quantum supremacy, which preview this same experiment. The above view on which is “forecaster”/”forecastee” might defend the team against his opinion that it “kind of gets its stats backwards”—but the inability to compute cross-entropy from the blueprints’ distribution to that of the device remains an issue. What the team did instead, however, is shift to something simpler they call “linear cross-entropy.” They simply show that the probabilities p_C(z_i) of their samples collectively beat the “1/2^n” that applies to uniform guessing—more simply put, that when summed over k-many trials z_i,

Σ_i p_C(z_i) ≥ (1 + b) · k/2^n.
This just boils down to giving a z-score based on the modeling for F. It is analogous to how I (Ken writing this) test for cheating at chess. We are blowing a whistle to say the physical device is getting surreptitious input from quantum mechanics to achieve a strength of F compared to a “classical player” who is “rated” as having strength 0.
The difference from showing that the device’s score from (2) is within a hair of B_F is that this is based on the linear measure instead. To be sure, the paper shows that their z-scores conform to those one would expect an “F-rated” device to achieve. But this is still not the same as (2). Whether it is tantamount for enough purposes—including the supplement’s theorem—is where we’re most unsure, and we note distinctions between fully (classically) sampling and “spoofing” the statistical test(s) raised by Scott (including directly in reply to me here) and others. The authors say that using “linear cross-entropy” gave sharper results and that they tried other (unspecified) measures. We wonder how much of the space of scoring rules familiar in predictive modeling has been tried, and whether rules having gentler tail behavior for tiny p_z than log(1/p_z) might do better.
Finally, there is the issue that the team were able to verify D_C exactly only for circuits up to 43 qubits and/or with 14 levels, not 53 qubits with 20 levels. This creates a dilemma in that IBM’s paper may push them toward more qubits or more levels, but that increases the gap from instance sizes they can verify. This also pushes away from the possibility of observing the nature of D_C more directly by finding repeated strings in the second-stage sampling of a fixed C. The “birthday paradox” threshold for repeats is roughly 2^(n/2) samples, which might be feasible for n around 43 (given the classical work needed for each p_C(z), which IBM’s cleverness might speed) but not above 50. The distinguishing power of repeats drops further with small F. We intend to say more about these last few points, and we are sure there are many chapters still to write about supremacy experiments.
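The rough arithmetic behind the birthday threshold can be sketched as follows (our own back-of-envelope; the factor 2 reflects the Porter–Thomas statistics, under which the collision probability per pair of samples is about twice the uniform value):

```python
# Expected number of repeated pairs among k device samples is about
# C(k,2) * sum over z of p_z^2. Under Porter-Thomas statistics that sum is
# roughly 2/2^n, twice the uniform value, which is what makes repeats
# (barely) informative about the nature of the distribution.
def expected_repeats(k, n, porter_thomas=True):
    coll = (2.0 if porter_thomas else 1.0) / 2 ** n
    return k * (k - 1) / 2 * coll

# Around n = 43, ten million samples would already yield several repeats ...
# ... but at n = 53 the same effort leaves the expected count near zero.
print(round(expected_repeats(10 ** 7, 43), 1),
      round(expected_repeats(10 ** 7, 53), 3))
```

So the quadratic growth in pairs is what sets the 2^(n/2) threshold, and ten extra qubits multiply the needed sampling effort by about 2^5.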
Is the evidence so far convincing to you? Is enough being done on the third plank to exclude possible clever classical use of the fact that the circuits are given as “white boxes”? Are there possible loopholes?
We would also be grateful to know where we may have oversimplified our characterization of the task and our analysis of the issues.
[Added more error-modeling details to the real-world section; some minor word changes; clarified how X,Y,W are chosen; addendum to clarify modeling issues; 10/31/19: removed addendum after blending it into a revision of the last main section—original version preserved here—and linked new Aaronson-Gunn paper.]
Harry Lewis is known for his research in mathematical logic, and for his wonderful contributions to teaching. He had two students that you may have heard of before: a Bill Gates and a Mark Zuckerberg.
Today I wish to talk about a recent request from Harry about a book that he is editing.
The book is the “Classic Papers of CS” based on a course that he has been teaching for years. It will contain 46 papers with short introductions by Harry. My paper from 1977 with Alan Perlis and Rich DeMillo will be included. The paper is “Social Processes and Proofs of Theorems and Programs”.
Harry says that “A valued colleague believes this paper displays such polemical overreach that it should not appear in this collection”. I hope that it does still appear anyway. Harry goes on to say
And though verification techniques are widely used today for hardware designs, formal verification of large software systems is still a rarity.
Indeed.
I have mixed feelings about our paper, which is now getting close to fifty years old. I believe we had some good points to make then. And that these are still relevant today. Our paper starts with:
Many people have argued that computer programming should strive to become more like mathematics. Maybe so, but not in the way they seem to think.
Our point was just this: Proofs in mathematics are not just formal arguments that show that a theorem is correct. They are much more. They must show why and how something is true. They must explain and extend our understanding of why something is true. They must do more than just demonstrate that something is correct.
They must also make it clear what they claim to prove. A difficulty we felt, then, was that care must be given to what one is claiming to prove. In mathematics often what is being proved is simple to state. In practice that is less clear. A long complex statement may not correctly capture what one is trying to prove.
Who proves that the specification is correct?
I have often wondered why some do not see this point. That proofs are more than “correctness checks”. I thought I would list some “proofs” of this point.
The great Carl Gauss gave the first proof of the law of quadratic reciprocity. He later published six more proofs, and two more were found in his posthumous papers. There are now over two hundred published proofs.
So much for saying that a proof is just a check.
Thomas Hales solved the Kepler conjecture on sphere packing in three-dimensional Euclidean space. He faced some comments that his proof might not be certain—it was said to be only 99 percent certain. So he used formal methods to get a formal proof.
Maryna Viazovska solved the related problem in eight dimensions. Her proof is here. The excitement of this packing result is striking compared with Hales’s result. No need for correctness checks in her proof.
Henry Cohn says here:
One measure of the complexity of a proof is how long it takes the community to digest it. By this standard, Viazovska’s proof is remarkably simple. It was understood by a number of people within a few days of her arXiv posting, and within a week it led to further progress: Abhinav Kumar, Stephen Miller, Danylo Radchenko, and I worked with Viazovska to adapt her methods to prove that the Leech lattice is an optimal sphere packing in twenty-four dimensions. This is the only other case above three dimensions in which the sphere packing problem has been solved.
So, a great proof is a proof that helps create new proofs of something else. Okay, a clumsy way to say it. What I mean is: a great proof is one that enables new insights, that enables further progress, that advances the field. Not just a result that “checks” for correctness.
The famous ABC conjecture of Joseph Oesterle and David Masser has been claimed by Shinichi Mochizuki. Arguments continue about his proof. Peter Scholze and Jakob Stix believe his proof is flawed and is unfixable. Mochizuki claims they are wrong.
Will a formal proof solve this impasse? Perhaps not. A proof that explains why it is true might. A proof that advances number theory elsewhere might; a proof that could solve other problems likely would.
What do you think about the role of proofs? Did we miss the point years ago?
Will formal verification become effective in the near future? And when it does, will it help provide explanations? We note this recent discussion of a presentation by Kevin Buzzard of Imperial College, London, and a one-day workshop on “The Mechanization of Math” which took place two weeks ago in New York City.
[Typo fixed]
[ Royal Society ] |
Sir Timothy Gowers is a Fields medalist and fellow blogger. Sometimes he (too) writes about simple topics.
Today I would like to talk about a simple problem that came up recently.
The problem is a simple-to-state “obvious fact”. The reason I thought you might be interested is that I had a tough time finding the solution. I hope you find the explanation below interesting.
The general issue of proving obviously true statements is discussed here for example. Here is an example from Gowers:
Let I_1, …, I_n be intervals of real numbers with lengths that sum to less than 1; then their union cannot be all of [0, 1].
He says:
It is quite common for people to think this statement is more obvious than it actually is. (The “proof” is this: just translate the intervals so that the end point of I_1 is the beginning point of I_2, and so on, and that will clearly maximize the length of interval you can cover. The problem is that this argument works just as well in the rationals, where the conclusion is false.)
The following simple problem came up the other day. Suppose that n is an odd number. Show there is some x so that
gcd(x, n) = gcd(x + 2, n) = 1.
Here gcd(a, b) is the greatest common divisor of a and b. For example, gcd(12, 9) = 3.
This result seems totally obvious, must be true, but I had trouble finding a proof.
There is an unproved conjecture in number theory that says: there are an infinite number of x so that both x and x + 2 are prime. This clearly shows that there is an x for our problem.
I like conjectures like this since they give you an immediate insight that a statement is likely to be true. But we would like a proof that does not use any unproved conjectures. Our problem can be viewed as a poor version of some of these conjectures. Suppose that you have a conjecture that there are an infinite number of x so that
f_1(x), …, f_k(x) are all prime for some given functions f_i. Then the poor version is to prove that there are x so that these numbers are all relatively prime to some given n. There are some partial results to the prime version by Ben Green and Terence Tao.
My initial idea was to try to set x to something like kn + 1. The point is that this always satisfies the first constraint: that is, gcd(kn + 1, n) = 1 for any k. Then I planned to try and show there must be some k that satisfies the second constraint. Thus the goal is to prove there is some k so that gcd(kn + 3, n) = 1.
But this is false. Note that if 3 divides n, then kn + 3 is divisible by 3,
and so x + 2 is always divisible by 3. Ooops.
My next idea was to set to a more “clever” value. I tried
Here I thought that I could make special and control the situation. Now
This looked promising. I then said to myself that why not make a large prime . Then clearly
Since and are relatively prime by the famous Dirichlet’s Theorem on arithmetic progressions we could make a prime too by selecting . This would satisfy the second constraint, and we are done.
Not quite. The trouble is that we need to have also that
Now this is
The trouble is that it might not be relatively prime. So we could just repeat the trick. But this seems like a recursion, and I realized that it might not work.
I finally found a solution thanks to Delta Airlines. My dear wife, Kathryn Farley, and I were stuck in DC for several hours waiting for our delayed flight home. This time was needed for me to find a solution.
The key for me was to think about the value of n. It is usually a good idea to look at the simplest case first. So suppose that n = 1; then clearly the constraints
are now trivial. The next simplest case seems to be when n is a prime. Let’s try n = 3. Now x = 2 works. Let’s generalize this to any prime p. The trick is to set x so that x ≡ −1 (mod p).
Then x + 2 is equal to 1 modulo p, which is not divisible by p. This shows that when n is an odd prime there is always some x.
Okay, how do we get the full result? What if n is the product of several primes? The Chinese remainder theorem to the rescue. Suppose that n is divisible by the distinct odd primes p_1, …, p_k. We can easily see that we do not care if there are repeated factors, since that cannot change the relatively-prime constraints.
Then we constrain x by:
x ≡ −1 (mod p_i), and hence x + 2 ≡ 1 (mod p_i),
for all primes p_i. Then the Chinese remainder theorem proves there is some such x. Done.
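If the two constraints are gcd(x, n) = 1 and gcd(x + 2, n) = 1, as the twin-prime analogy suggests, the whole construction fits in a short sketch (ours): pick a residue mod each odd prime factor of n that avoids 0 and −2, then combine with the Chinese remainder theorem.

```python
from math import gcd

def odd_prime_factors(n):
    """Distinct odd prime factors of an odd n > 1, by trial division."""
    ps, d = [], 3
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 2
    if n > 1:
        ps.append(n)
    return ps

def crt(residues, moduli):
    """Combine x = r_i (mod m_i) for pairwise coprime moduli m_i."""
    x, m = 0, 1
    for r, mi in zip(residues, moduli):
        t = ((r - x) * pow(m, -1, mi)) % mi  # modular inverse needs Python 3.8+
        x, m = x + m * t, m * mi
    return x % m

def find_x(n):
    """For odd n > 1, return x with gcd(x, n) = gcd(x + 2, n) = 1."""
    primes = odd_prime_factors(n)
    # For each p, pick a residue avoiding 0 and -2 mod p; p >= 3 leaves room.
    residues = [next(r for r in range(1, p) if (r + 2) % p != 0) for p in primes]
    return crt(residues, primes)

for n in (9, 15, 105, 3 * 5 * 7 * 11):
    x = find_x(n)
    assert gcd(x, n) == 1 and gcd(x + 2, n) == 1
```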
Is there some one-line proof of the problem? Do you know any references? There are several obvious generalizations of this simple problem; perhaps someone might look into them.
Cracking a Diophantine problem for 42 too
Andrew Booker is a mathematician at the University of Bristol, who works in analytic number theory. For example, he has a paper extending a result of Alan Turing on the Riemann zeta function. Yes, our Turing.
Today Ken and I will talk about his successful search for a solution to a 64-year-old problem.
He was inspired by a video on the search problem authored by Tim Browning and Brady Haran. The question was to find a solution in integers to x³ + y³ + z³ = 33.
Booker found 33 = 8866128975287528³ + (−8778405442862239)³ + (−2736111468807040)³.
The search was for all possible solutions with the smallest variable bounded by 10^16. Note that this is expensive, and is not even close to polynomial time in the number of bits. But it is feasible today thanks to modern technology:
The total computation used was approximately core-years over one month of real time.
Before we turn to our discussion, note that Booker’s paper on extending Turing is really a result on proof checking. Turing had great intuition; it is terrible that we lost him so early. He, Turing, essentially proved the first result ever on how to efficiently check a computation. Booker says:
Turing introduced a method for certifying the completeness of a purported list of zeros of that is guaranteed to work (when the list is in fact complete). Turing’s method has remained a small but essential ingredient in all subsequent verifications of RH and its many generalizations.
That is, checking the zeros of the Riemann zeta function.
Speaking of checking, when I was drafting this I initially had the wrong solution:
which is wrong. Can you quickly see why this cannot be right? Answer at the end.
The press love Booker’s result. Not the one on the zeta function, the one on the number 33.
Part of the excitement is caused by the number 33. In complexity theory we rarely see explicit numbers—more likely to see expressions like
and worse.
The press seem to like the numerology of 33. The number is quite neat:
Most important is the connection with Rolling-Rock beer:
The press from Newsweek and other sites talked about Booker’s solution. See here and here. And here at the Quanta magazine with a great diagram:
One said:
To crunch the numbers, he then used a cluster of powerful computers – 512 central processing unit (CPU) cores at the same time – known as Blue Crystal Phase 3. When he returned to his office one morning after dropping his children off at school, he spotted the solution on his screen. “I jumped for joy,” he recalled.
Another reported,
Booker said: “This one’s right at the boundary between what we know how to prove and what we suspect might be undecidable.”
I hope we will get the same coverage for our big results.
The press love Booker’s result. Not the one on the zeta function, the one on the number 42. This search was jointly led by Andrew Sutherland of MIT.
Part of the excitement is caused by the number 42. In complexity theory we rarely see explicit numbers—more likely to see expressions like
and worse.
The press seem to like the numerology of 42. The number is quite neat:
Most important is the connection with The Hitchhiker’s Guide to the Galaxy:
The press from New Scientist and other sites talked about Booker’s solution. See here and here. But here the Quanta magazine seems not to have mentioned the number at all in over three months:
One said:
Of course, it wasn’t simple. The pair had to go large, so they enlisted the aid of the Charity Engine, an initiative that spans the globe, harnessing unused computing power from over 500,000 home PCs to act as a sort of “planetary supercomputer.”
It took over a million hours of computing time, but the two mathematicians found their solution:
Another reported:
Booker said: “I feel relieved … we might find what we’re looking for with a few months of searching, or it might be that the solution isn’t found for another century.”
I hope we will get the same coverage for our big results.
Booker and Sutherland also discovered that
This is the next-largest solution after 1³ + 1³ + 1³ and 4³ + 4³ + (−5)³. Weird. And the first solution not to duplicate a number. And it uses two numbers that agree to markedly more decimal places than those in the above solutions for 33 and 42. Weirder.
Booker wanted to search for a solution to
x³ + y³ + z³ = k. Actually his main interest was in k = 33, but his method is general. How does one do this for x, y, z bounded by B? The obvious method is: try all x, y, z below B.
This is too expensive and requires B³ time. Too much, even with a cluster of fast processors.
An improvement is to try all x, y in the range and then check that k − x³ − y³ is a cube. This runs in B² time. Still too much.
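The B²-time method is easy to express in code. This sketch (ours; the float cube-root guess is corrected by an exact integer check) finds small solutions and confirms that 33 has none with small x and y:

```python
def is_cube(m):
    """Return (True, r) with r**3 == m if m is a perfect cube, else (False, 0)."""
    a = abs(m)
    r = round(a ** (1 / 3))  # float guess, corrected by exact checks below
    for c in (r - 1, r, r + 1):
        if c >= 0 and c ** 3 == a:
            return (True, c if m >= 0 else -c)
    return (False, 0)

def sum_of_three_cubes(k, bound):
    """Try all x <= y in [-bound, bound]; test whether k - x^3 - y^3 is a cube."""
    for x in range(-bound, bound + 1):
        for y in range(x, bound + 1):
            ok, z = is_cube(k - x ** 3 - y ** 3)
            if ok:
                return (x, y, z)
    return None

print(sum_of_three_cubes(29, 5))  # a small solution of x^3 + y^3 + z^3 = 29
print(sum_of_three_cubes(33, 5))  # None: 33 has no solution with small x, y
```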
A key insight is to re-write the equation as
(x + y)(x² − xy + y²) = k − z³.
Then we note that d = x + y must be a divisor of k − z³. Since there are few such divisors, we can improve the time greatly. For the divisors d of k − z³, some simple algebra and the quadratic formula show that
x = d/2 + (1/2)·√((4(k − z³)/d − d²)/3)
and
y = d − x.
This shows that the search is now reduced to nearly linear in B. Still too much, but close to doable. The next trick is to avoid the factoring step. See Booker’s paper for the rest of the search description.
I like the progression of time bounds from B³ to B² to nearly B.
Can one beat B? Could there be an algorithm that runs in B^(1−ε) for some ε > 0? Can any of our tricks apply here? A possible observation: Booker is clever, but he writes that the methods do not run in strictly linear time in B; they use slightly more. Maybe we can help in some manner. What do you think? The next unsolved number, 114, awaits.
§
Answer to the question on checking: take the numbers modulo 9. Every cube is congruent to 0, 1, or −1 modulo 9.
So a sum of three cubes can never be congruent to 4 or 5 modulo 9, and the two sides of the wrong equation above disagree modulo 9.
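A short check (ours) confirms the obstruction: cubes take only the values 0, 1, 8 modulo 9, and no three of these sum to 4 or 5 modulo 9.

```python
cubes_mod9 = sorted({(x ** 3) % 9 for x in range(9)})
sums_mod9 = sorted({(a + b + c) % 9
                    for a in cubes_mod9 for b in cubes_mod9 for c in cubes_mod9})
print(cubes_mod9)  # [0, 1, 8]
print(sums_mod9)   # [0, 1, 2, 3, 6, 7, 8] -- never 4 or 5
```

So any claimed sum-of-three-cubes identity can be screened instantly by comparing both sides modulo 9.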
[Typo fixed]
Composite crop of src1, src2, src3 |
Aram Harrow, Avinatan Hassidim, and Seth Lloyd are quantum stars who have done many other things as well. They are jointly famous for their 2009 paper, “Quantum algorithm for linear systems of equations.”
Today Ken and I discuss an aspect of their paper that speaks to a wider context.
The paper by Harrow, Hassidim, and Lloyd addresses a fundamental problem: given an N × N matrix A and an N-vector b, solve Ax = b. This is called the Linear Systems Problem (LSP). They say in their abstract:
Here, we exhibit a quantum algorithm for this task that runs in time polylogarithmic in N, an exponential improvement over the best classical algorithm.
What strikes us is what HHL meant by “this task” in their abstract. It is not LSP. Indeed, it can’t be LSP. They want to do it with a number of qubits that is logarithmic in N. But the solution x is a length-N vector, which in general has at least N bits of information. No matter how much you entangle and wrangle, you cannot extract more than m bits of information out of m qubits. So even if A is sparse in some sense, and ignoring the issue of the time it would take to output the entries of x, you can’t solve LSP with fewer qubits to get all of x in the first place.
The problem they solve has been given its own name in subsequent treatments, including a 2015 paper by Andrew Childs, Robin Kothari, and Rolando Somma that greatly improved the time needed to achieve precision ε:
Quantum Linear Systems Problem (QLSP): Given a succinct representation of an invertible Hermitian matrix A, code for a unitary operator that maps |0⟩ to a quantum state |b⟩, and an error tolerance ε, output a quantum state |x̃⟩ such that ‖ |x̃⟩ − |x⟩ ‖ ≤ ε, where |x⟩ is the quantum state proportional to A⁻¹b.
The symbol |·⟩ is called a ket in Paul Dirac’s bra-ket notation. “Ketting” a problem to our mind means accepting not only an approximate answer but also a loss of information that might apply to inputs as well as outputs.
What a solution to QLSP gives you is not x but a quantum state that approximates the unit-vector multiple of x. As HHL say earlier in their abstract, this changed problem is useful if what you really want is not x but some other value derived from x. They say for example that desired information could come from a quantum measurement of the solution state—that is, of |x⟩.
This blog did have a 2012 post on “lazy problem solving” that included the quip:
The idea is that when we solve a system Ax = b for x, it is likely that we do not really want x. We want some information about x. For example, we might wish to know the norm of x. The question we plan to address is, are there algorithms that can do this faster than solving the entire linear system?
Ironically, the problem we suggested there, which is computing the norm of x up to multiplicative error, is destroyed by normalizing x into a quantum state |x⟩. Oh well. But other posts have considered changing the objectives of algorithms. But the general idea involved in the formulation of QLSP strikes us anew—and in ways different from Scott Aaronson’s caveats in a wonderful review he wrote of HHL titled, “Quantum Machine Learning Algorithms: Read the Fine Print.”
Going from LSP to QLSP switches the problem and enables showing that a quantum algorithm can solve the switched problem much faster than before. Very nice: HHL and its followups deserving of myriad citations. Scott’s review frames the question of whether the switch is fair from a practical perspective, including the restrictions on and . Perhaps the jury is still out on this: we’ve mentioned several times the nifty result by Ewin Tang that found a quick classical solution to a problem downstream of HHL that had been thought to require quantum. But switching a problem for research is fair—we do this all the time in theory. We hope that the modified problem helps us understand the original problem. Let’s look at some examples.
We think the general recipe will be clear from a couple more examples. The first one seems not to be the same as what is called “QSAT” here or here.
“Ketted SAT”: Given a CNF formula φ with n variables but O(1)-sized clauses, and an error tolerance ε, output a quantum state |ã⟩ such that for some satisfying assignment a, ‖ |ã⟩ − |a⟩ ‖ ≤ ε.
We still suspect this is known at least implicitly in the context of interactive proofs and PCPs and error-correcting codes. In that context, the idea of getting an assignment within distance ε of a satisfying one might make sense. For other purposes it is not so useful—e.g., it might not help utilize the self-reducibility structure of SAT. It may simply be equivalent to known succinct-SAT problems after applying the coding used to obtain holographic proofs, but we don’t immediately see it.
Note that the fact of being in CNF with small clause size (which can even be constant) mirrors the sparseness condition on that is usually assumed in cases of HHL. So the analogy to QLSP holds up there. At least “ketted SAT” is a different problem.
“Ketted Factoring”: Given a really large integer , output a quantum state such that for some nontrivial (prime) factor of , , where .
We can suppose that we have oracle lookup to bits of , so that all the bits can be placed into quantum superposition. We could alternatively encode as the quantum state , but then the problem might not be well-defined because cannot retain all info about .
This is not in the shadow of Shor’s algorithm because that needs to have digits in order eventually to get a factor exactly. The quantumized version applies when is really big. Again, it is not clear that the solutions would be useful for things like breaking cryptosystems. They might represent points of cryptographic weakness in some detailed sense. In any event, it’s a different problem.
With all of these there are questions about how far the change reflects on the original problem, but at least the change generated substantial research and interesting new angles.
Every even number is the sum of two primes ⟹ Every even number is the sum of two almost-primes.
A number is an almost prime if it is a prime or a product of at most two primes. This is a quite useful result, but not what we really believe is true.
P not equal to NP ⟹ P^A not equal to NP^A.
This says that if we relativize to some oracle A, then P^A and NP^A are not equal. This is a famous result of Theodore Baker, John Gill, and Robert Solovay. It sheds light on the structure of algorithms, but does not rule out that P could still equal NP.
The 3n + 1 problem.
Let T(n) be n/2 when n is even, and 3n + 1 when n is odd. The problem is whether the orbit
n, T(n), T(T(n)), … reaches 1 for all positive integers n. The change is:
All positive integers ⟹ Almost all positive integers.
Actually, Terence Tao got a cool result this year only by making a second change. In his paper “Almost All Orbits Of The Collatz Map Attain Almost Bounded Values” he proves that the smallest value reached in the orbit of n is at most f(n) for almost all n and any f that tends to infinity. Thus
f(n) = log log log log n is fine. Here the quantity bounded is the smallest integer that is reached in the orbit of n. It is scary how hard this is to prove.
Find explicit boolean functions with non-linear boolean complexity.
Boolean complexity with negation ⟹ Boolean complexity without negation.
These are the seminal results on monotone boolean complexity. They came up two years ago in an abortive attempt to prove P not equal to NP, which we covered here.
We come full circle back to quantum computing. This week there has been much hubbub over a prematurely-released announcement that Google researchers have built a quantum device able to complete probabilistic searches that are not feasible by any classical computer. If one goes back to origins of the term “quantum supremacy” in 2012, before Google’s approach was conceived in 2015, one can say the change is:
specific physical simulation or complexity-based problems ⟹ randomized problems.
Scott has a great post with preliminary descriptions and evaluations, and we confess to adapting the following telegraphic evocation of how it works from Ryan O’Donnell and others in the comments section: Given a randomly-generated probability distribution on , call “-heavy” if gives it probability for some fixed . Even given in white-box form via an -qubit quantum circuit with randomly placed gates, it is evidently hard (compare this) for a classical computer to find heavy strings with frequency , concretely for . Google’s quantum machine, which is effectively programmed by the user presenting the random , can however find heavy strings with frequency with a highly significant separation between and . We surmise that exhaustive tests over classical circuits trying the search at smaller values of have witnessed the smaller value at those .
Again, this approach is different from demonstrating a specific computable function to separate classical and quantum and from the instances considered in the concluding post, with “supremacy” in its title, of our eight-part series in 2012 between Gil Kalai and, yes, Aram Harrow.
Is our general notion of “ketted problems” useful? Ketted SAT and Factoring lack the natural continuity in (Q)LSP, but at least for SAT it can be grafted on by composing the formula with an error-correcting code. There is still the issue of being able to extract less classical information from the ketted approximations.
We note a theory workshop being held this Saturday at UW Seattle in honor of Paul Beame’s 60th birthday. We congratulate Paul on both.
[added Aug. 2019 CACM link for Ewin Tang’s theorem; changed 1/e to (1 – 2/e) near end but not sure if that fix is intended by the source; reverted 1/e since the “(1 – 2/e)” holds in a simplified situation]
John Robson, who goes by Mike, has worked on various problems including what is still the best result on separating words—the topic we discussed the other day. Ken first knew him for his proof that checkers is EXPTIME-complete and similar hardness results for chess and Go.
Today I want to talk about his theorem that any two words can be separated by an automaton with relatively few states.
In his famous paper from 1989, he proved an upper bound on the Separating Word Problem. This is the question: Given two strings and , how many states does a deterministic automaton need to be able to accept and reject ? His theorem is:
Theorem 1 (Robson’s Theorem) Suppose that x and y are distinct strings of length n. Then there is an automaton with at most O(n^{2/5} (log n)^{3/5}) states that accepts x and rejects y.
The story of his result is involved. For starters, it is still the best upper bound after almost three decades. Impressive. Another issue is that a web search does not quickly, at least for me, find a PDF of the original paper. I tried to find it and could not. More recent papers on the separating word problem reference his 1989 paper, but they do not explain how he proves it.
Recall the problem of separating words is: Given two distinct words of length n, is there a deterministic finite automaton that accepts one and rejects the other? And the machine has as few states as possible. Thus his theorem shows that roughly the number of states grows at most like the square root of n.
I did finally track the paper down. The trouble for me is the paper is encrypted. Well not exactly, but the version I did find is a poor copy of the original. Here is an example to show what I mean:
[ An Example ] |
So the task of decoding the proof is a challenge. A challenge, but a rewarding one.
Robson’s proof uses two insights. The first is he uses some basic string-ology. That is he uses some basic facts about strings. For example he uses that a non-periodic string cannot overlap itself too much.
He also uses a clever trick on how to simulate two deterministic machines for the price of one. This in general is not possible, and is related to deep questions about automata that we have discussed before here. Robson shows that it can be done in a special but important case.
Let me explain. Suppose that w is a string. We can easily design an automaton that accepts if and only if the input is the string w. The machine will have order the length of w states. So far quite simple.
Now suppose that we have a string x of length n and wish to find a particular occurrence of the pattern p in x. We assume that there are m occurrences of p in x. The task is to construct an automaton that accepts at the end of the j-th copy of p. Robson shows that this can be done by an automaton that has order m + ℓ states.
Here ℓ is the length of the string p.
This is a simple, clever, and quite useful observation. Clever indeed. The obvious automaton that can do this would seem to require a cartesian product of two machines. This would imply that it would require m × ℓ
states: note the times operator rather than addition. Thus Robson’s trick is a huge improvement.
Here is how he does this.
Robson uses a clever trick in his proof of the main lemma. Let’s work through an example with a short string. The goal is to see if there is a copy of this string starting at a position that is a multiple of three.
The machine starts in state and tries to find the correct string as input. If it does, then it reaches the accepting state . If while doing this it gets a wrong input, then it switches to states that have stopped looking for the input . After seeing three inputs the machine reaches and then moves back to the start state.
[ The automaton ] |
We will now outline the proof in some detail.
The first lemma is a simple fact about hashing.
Lemma 2 Suppose i ≠ j and 0 ≤ i, j < n.
Then all but at most log₂ n primes p satisfy i mod p ≠ j mod p.
Proof: Consider the quantity i − j for i not equal to j. Call a prime bad if it divides this quantity. This quantity is at most n, so it can be divisible by at most log₂ n primes. So there are at most log₂ n bad primes in total.
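In code, the point is that the “bad” primes for a pair i ≠ j are exactly the prime divisors of i − j, and a number below n has at most log₂ n distinct prime factors (a quick check, ours):

```python
from math import log2

def distinct_prime_factors(m):
    """Distinct primes dividing m > 0, by trial division."""
    ps, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            ps.add(d)
            m //= d
        d += 1
    if m > 1:
        ps.add(m)
    return ps

n = 10 ** 6
for i, j in [(0, 1), (12, 999999), (360360, 5)]:
    bad = distinct_prime_factors(abs(i - j))
    # p is "bad" (i.e., i mod p == j mod p) exactly when p divides i - j.
    assert all((i - j) % p == 0 for p in bad)
    assert len(bad) <= log2(n)
    print(i, j, sorted(bad))
```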
We need some definitions about strings. Let |v| denote the length of the string v. Also let occ(v, x) be the number of occurrences of v in x.
A string v has the period p provided v_i = v_{i+p}
for all i so that v_{i+p} is defined. A string is periodic provided it has a period that is less than half its length. Note, the shorter the period the more the string is really “periodic”: for example, the string
is more “periodic” than
Lemma 3 For any string w, either w0 or w1 is not periodic.
Proof: Suppose that is periodic with period where is a single character. Let the length of equal . So by definition, . Then
for . So it follows that
This shows that and cannot both be periodic, since
Lemma 4 Suppose that v is not a periodic string. Then the number of copies of v in a string x is upper bounded by m, where m is of order n/ℓ for n the length of x and ℓ the length of v.
Proof: The claim follows once we prove that no two copies of v in x can overlap more than ℓ/2, where ℓ is the length of v. This will immediately imply the lemma.
If v has two copies in x that overlap, then clearly v_i = v_{i+d}
for some d and all i in the range where both sides are defined. This says that v has the period d. Since v is not periodic it follows that 2d ≥ ℓ. This implies that the overlap of the two copies of v is at most length ℓ/2. Thus we have shown that they cannot overlap too much.
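The no-long-overlap property is easy to test empirically. In this sketch (ours), any two consecutive occurrences of a non-periodic pattern start at least half a pattern-length apart:

```python
import random

def smallest_period(v):
    """Smallest p >= 1 with v[i] == v[i + p] wherever both are defined."""
    for p in range(1, len(v) + 1):
        if all(v[i] == v[i + p] for i in range(len(v) - p)):
            return p
    return len(v)

def occurrences(v, t):
    return [i for i in range(len(t) - len(v) + 1) if t[i:i + len(v)] == v]

rng = random.Random(1)
checked = 0
for _ in range(50):
    v = "".join(rng.choice("ab") for _ in range(6))
    if smallest_period(v) < len(v) / 2:
        continue  # v is periodic; the lemma does not apply
    t = "".join(rng.choice("ab") for _ in range(60))
    occ = occurrences(v, t)
    # Non-periodic v: consecutive occurrences start at least len(v)/2 apart.
    for a, b in zip(occ, occ[1:]):
        assert b - a >= len(v) / 2
    checked += 1
assert checked > 0
```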
Say an automaton finds the occurrence of in provided it enters a special state after scanning the last bit of this occurrence.
Lemma 5 Let x be a string of length n and let v be a non-periodic string. Then, there is an automaton with relatively few states that can find the j-th occurrence of v in x.
Here the state bound allows factors that are fixed powers of log n. This lemma is the main insight of Robson and will be proved later.
The following is a slightly weaker version of Robson’s theorem. I am still confused a bit about his stronger theorem, to be honest.
Theorem 6 (Robson’s Theorem) Suppose that and are distinct strings of length . Then there is an automaton with at most states that accepts and rejects .
Proof: Since x and y are distinct we can assume that x starts with the prefix w0 and y starts with the prefix w1 for some string w. If the length of w is small, the theorem is trivial: just construct an automaton that checks the prefix, accepting x and rejecting y.
So we can assume that for some strings and where the latter is order in length. By lemma we can assume that is not periodic. So by lemma we get that
Then by lemma we are done.
Proof: Let x have length n and let v be a non-periodic string in x of length ℓ. Also let m = occ(v, x). By the overlap lemma it follows that m is bounded by order n/ℓ.
Let v occur at locations i_1 < i_2 < ⋯ < i_m.
Suppose that we are to construct a machine that finds the j-th copy of v. By the hashing lemma there is a prime p so that i_k ≡ i_j (mod p)
if and only if k = j. Note we can also assume that p is of order log n.
Let’s argue the special case where i_j is 0 modulo p. If it is congruent to another value the same argument can be used. This follows by having the machine initially skip a fixed amount of the input and then do the same as in the congruent-to-0 case.
The automaton has states and for . The machine starts in state and tries to get to the accepting state . The transitions include:
This means that the machine keeps checking the input to see if it is scanning a copy of . If it gets all the way to the accepting state , then it stops.
Further transitions are:
and
The second group means that if a wrong input happens, then moves to . Finally, the state resets and starts the search again by going to the start state with an epsilon move.
Clearly this has the required number of states and it operates correctly.
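As a sketch of the state-saving idea (ours, simplified by assuming the modulus m is at least the pattern length, so a failed match is abandoned before the next candidate start): track “position mod m” and “match progress” together, which needs about m + len(v) states rather than their product.

```python
def finds_copy(v, m, t):
    """DFA sketch: accept when a copy of v ends that started at a position
    divisible by m.  States are (position mod m) while waiting, plus match
    progress 0..len(v) while checking -- about m + len(v) states, not m * len(v).
    Simplifying assumption (ours): m >= len(v)."""
    assert m >= len(v) > 0
    pos_mod = 0   # waiting component: current position mod m
    match = None  # matching component: chars of v matched so far, or None
    for c in t:
        if match is None and pos_mod == 0:
            match = 0  # a candidate start: begin matching v
        if match is not None:
            if c == v[match]:
                match += 1
                if match == len(v):
                    return True
            else:
                match = None  # mismatch: wait for the next multiple of m
        pos_mod = (pos_mod + 1) % m
    return False

print(finds_copy("abc", 4, "xxxxabcx"))  # True: copy starts at position 4
print(finds_copy("abc", 4, "xabcxxxx"))  # False: the only copy starts at 1
```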
The open problem is: Can the SWP be solved with a better bound? The lower bound is still order log n. So the gap is exponential.
[Added “Mike”, some typo fixes]
[ From his home page ] |
Jeffrey Shallit is a famous researcher into many things, including number theory and being a skeptic. He has a colorful website with an extensive quotation page—one of my favorites by Howard Aiken is right at the top:
Don’t worry about people stealing an idea. If it’s original, you will have to ram it down their throats.
Today I thought I would discuss a wonderful problem that Jeffrey has worked on.
Jeffrey’s paper is joint with Erik Demaine, Sarah Eisenstat, and David Wilson. See also his talk. They say in their introduction:
Imagine a computing device with very limited powers. What is the simplest computational problem you could ask it to solve? It is not the addition of two numbers, nor sorting, nor string matching—it is telling two inputs apart: distinguishing them in some way.
More formally:
Let x and y be two distinct long strings over the usual binary alphabet. What is the size of the smallest deterministic automaton that can accept one of the strings and reject the other?
That is, how hard is it for a simple type of machine to tell x apart from y? There is no super cool name for the question—it is called the Separating Words Problem (SWP).
Pavel Goralčik and Vaclav Koubek introduced the problem in 1986—see their paper here. Suppose that x and y are distinct binary words of length n. Define SEP(x, y) to be the number of states of the smallest automaton that accepts x and rejects y, or vice-versa. They proved the result that got people interested:
Theorem 1 For all distinct binary words x and y of length n, SEP(x, y) = o(n).
That is, the size of the automaton is asymptotically sub-linear. Of course there is trivially a way to tell the words apart with order n states. The surprise is that one can do better, always.
In 1989 John Robson obtained the best known result:
Theorem 2 For all distinct binary words x and y of length n, SEP(x, y) = O(n^{2/5} (log n)^{3/5}).
This bound is pretty strange. We rarely see bounds like it. This suggests to me that it is either special or it is not optimal. Not clear which is the case. By the way, it is also known that there are x and y so that SEP(x, y) is at least of order log n.
Thus there is an exponential gap between the known lower and upper bounds. Welcome to complexity theory.
What heightens interest in this gap is that whenever the words have different lengths, there is always a logarithmic-size automaton that separates them. The reason is our old friend, the Chinese Remainder Theorem. Simply, if |x| ≠ |y| there is always a short prime p—of size O(log n)—that does not divide |x| - |y|, which means that the DFA that goes in a cycle of length p, counting the input length modulo p, will end in a different state on any word of length |x| from the state on any word of length |y|. Moreover, the strings x = 0^m and y = 0^{m'}, where m' equals m plus the least common multiple of 1, 2, ..., k, require more than k states to separate. Padding these with 1s gives equal-length pairs of all lengths with SEP(x,y) = Ω(log n).
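The different-lengths argument above can be sketched in a few lines. This is my own illustration with my own function names: find a small modulus on which the two lengths disagree, then run the cycle DFA that counts input length modulo that value. (The smallest working modulus need not be prime, but some prime of size O(log n) always works, which is what the Chinese Remainder Theorem argument guarantees.)

```python
def length_separator_modulus(a, b):
    """Smallest m >= 2 with a != b (mod m). A cycle of m states counting
    input length mod m then separates any word of length a from any
    word of length b."""
    assert a != b
    m = 2
    while a % m == b % m:
        m += 1
    return m

def run_cycle(m, w):
    """Simulate the m-state cycle DFA: final state = len(w) mod m."""
    s = 0
    for _ in w:
        s = (s + 1) % m
    return s

x, y = "0" * 17, "0" * 29
m = length_separator_modulus(len(x), len(y))
# The two words land in different states of the m-state cycle.
print(m, run_cycle(m, x), run_cycle(m, y))
```

Here the length difference is 12 = lcm(1,2,3,4), so moduli 2, 3, and 4 all fail and the separator needs a cycle of length 5—exactly the padding phenomenon behind the Ω(log n) lower bound.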
Some other facts about SWP can be found in the paper:
Point (b) underscores why it has been hard to find “bad pairs” that defeat all small DFAs. All this promotes belief that logarithmic is the true upper bound as well. Jeffrey stopped short of calling this a conjecture in his talk, but he did offer a 100-pound prize (the talk was in Britain) for improving Robson’s bound.
There are many partial results in cases where and are restricted in some way. See the papers for details. I thought I would just repeat a couple of interesting open cases.
How hard is it to tell words from their reversal? That is, if x is a word, can we prove a better bound on SEP(x, x^R)?
Recall x^R is the reversal of the word x. Of course we assume that x is not the same as its reversal—that is, we assume that x is not a palindrome.
How hard is it to tell words apart from their cyclic shifts?
How hard is it to tell words from their…? You get the idea: try other operations on words.
The SWP is a neat question in my opinion. I wonder if there would be some interesting consequence if we could always tell two words apart with few states. A good measure of a conjecture is: how many consequences follow from it? I wonder if there could be some interesting applications. What do you think?
Self-play and Ramsey numbers
[ Talking about worst case ] |
Avrim Blum is the CAO of TTIC. That is, he is the Chief Academic Officer at the Toyota Technological Institute at Chicago. Avrim has made, and continues to make, key contributions to many areas of theory—including machine learning, approximation algorithms, on-line algorithms, algorithmic game theory, the theory of database privacy, and non-worst-case analysis of algorithms.
Today I want to discuss a suggestion of Avrim for research on self-play.
Self-play is the key to many recent AI results on playing games. These results include essentially solving the games Chess, Go, Shogi, forms of poker, and many others. They were solved by algorithms that start with no knowledge of the game, save the rules. The algorithm then learns the secrets of playing the game by self-play: by playing games against itself. For example, the AI chess programs did not know that “a rook is worth more than a pawn.” But they discovered that by playing the game over and over. Impressive.
For example, David Sweet on his hacker site says referring to self-play:
This is mysterious to me. If it only played against itself, where did new information come from? How did it know if it was doing well? If I play a game of chess against myself, should I say I did well if I beat myself? But when I beat myself, I also lose to myself. And how could I ever know if I’d do well against someone else?
I was at TTIC last month and over lunch we discussed self-play possibilities for theory problems. I suggested that the planted clique problem might be a potential example. Recall the planted clique problem is the task of distinguishing two types of graphs: a random graph on n vertices in which each edge is present with probability 1/2, and such a graph with a clique planted on k randomly chosen vertices.
If k is large, it is easy to tell these apart—just count the number of edges. If k is small enough, it is open whether one can tell them apart. The largest clique in a random graph typically has size near 2 log n. This implies that there is a quasi-polynomial time average-case algorithm: just try all subsets of size around 2 log n.
My intuition was that a program might be able to exploit self-play to solve planted clique problems. The point is that it is easy, by definition, to generate “yes” and “no” examples for this problem. Note, this is not known for SAT problems—generating hard instances there is not clear. This was my point. Could the AI methods somehow divine the planted clique version of “a rook is worth more than a pawn”? Could they use self-play to solve planted clique problems?
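Generating the “yes” and “no” examples really is trivial, which is the point. Here is a minimal sketch—the function names and parameter choices are my own, and the edge-counting check at the end only works in the easy regime where k is much larger than the square root of n:

```python
import random
from itertools import combinations

def random_graph(n, seed=None):
    """G(n, 1/2): each pair (i, j), i < j, is an edge with probability 1/2."""
    rng = random.Random(seed)
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < 0.5}

def plant_clique(edges, n, k, seed=None):
    """Return a copy of the graph with a clique planted on k random vertices."""
    rng = random.Random(seed)
    chosen = rng.sample(range(n), k)
    planted = set(edges)
    planted.update((min(u, v), max(u, v)) for u, v in combinations(chosen, 2))
    return planted

# The trivial distinguisher: when k is well above sqrt(n), the roughly k^2/4
# extra edges stand out against the Theta(n) standard deviation of the count.
n, k = 400, 80
g_no = random_graph(n, seed=1)
g_yes = plant_clique(random_graph(n, seed=2), n, k, seed=3)
print(len(g_no), len(g_yes))  # the planted instance has noticeably more edges
```

In the interesting regime—k between about 2 log n and sqrt(n)—no such easy statistic is known to work, which is exactly where a self-play learner would have to find its own “rook is worth more than a pawn.”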
I wondered to the lunch group about all this. It left the group unexcited.
Then Avrim asked a better question. He wondered if self-play methods could be used to solve a long-standing problem concerning Ramsey numbers. Recall the Ramsey number R(k,k) is defined as the smallest n such that every red-green coloring of the edges of the complete graph K_n has either a red or a green clique of size at least k.
The exact value of R(5,5) is unknown, although it is known to lie between 43 and 48. See a post by Gil Kalai on his blog for some discussions. Joel Spencer quotes Paul Erdős:
Erdős asks us to imagine an alien force, vastly more powerful than us, landing on Earth and demanding the value of R(5,5) or they will destroy our planet. In that case, he claims, we should marshal all our computers and all our mathematicians and attempt to find the value. But suppose, instead, that they ask for R(6,6). In that case, he believes, we should attempt to destroy the aliens.
Aliens are not attacking currently, but Avrim’s idea is that perhaps we could organize a self-play attack on the problem. The idea would be to try to build a “game” version of this question. The algorithm would try to create a strategy that finds a red/green coloring of the complete graph K_n, for n as large as possible, so that no 5-clique is all red or all green.
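Whatever form the game takes, its scoring function is clear: check a coloring for monochromatic cliques. A minimal checker, with my own naming conventions, together with the classic witness that R(3,3) > 5—color the 5-cycle red and its complement (the pentagram) green:

```python
from itertools import combinations

def has_mono_clique(n, red_edges, k):
    """Does this red/green coloring of K_n contain a monochromatic k-clique?
    red_edges holds the red pairs (i, j) with i < j; every other pair is green."""
    for verts in combinations(range(n), k):
        pairs = list(combinations(verts, 2))
        if all(p in red_edges for p in pairs):
            return True   # all-red k-clique
        if all(p not in red_edges for p in pairs):
            return True   # all-green k-clique
    return False

# Red = the 5-cycle; green = the pentagram. Neither contains a triangle,
# so this coloring of K_5 proves R(3,3) > 5.
red = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}
print(has_mono_clique(5, red, 3))  # False
```

A self-play setup would use exactly this check as its reward signal, with the coloring produced move by move.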
We need to arrange the computation of the Ramsey number as the result of some type of game. The paradox to me is that the play of a game suggests many alternations of quantifiers—PSPACE-like complexity—while the Ramsey calculation is clearly of lower quantifier complexity: the statement R(5,5) > n just says there exists a coloring of K_n in which no 5-clique is monochromatic. So how do we make the Ramsey calculation into a game? Ken and I wonder if there is a natural game so that playing the game well yields insight into the value of a Ramsey number.
Is there a game with simple rules so that playing it well yields bounds on general Ramsey numbers?
There have been several attempts to use non-standard methods to compute Ramsey numbers. See the following:
See also this survey on computational methods in general.
The asymmetry between the upper and lower bounds shapes approaches. The lower bound of 43 was proved by finding a two-coloring of the edges of K_42 without a green or red 5-clique. Once a single coloring is guessed, its property is easy to verify. The improvement of the upper bound from 49 to 48 two years ago needed checking some two trillion cases in order to fence away all possible colorings. This has led to a common belief that 43 is the answer—if it were higher, then a coloring of K_43 would have been found by now.
Can we use self-play to turn this belief into something more concrete? The training would begin by running self-play on the known cases—colorings of K_n for n below 43. This should create a neural net that is highly skilled at finding colorings that are free of small monochromatic cliques. The question is how to leverage its presumed failure once we hit size 43.
Perhaps someone should take Avrim’s suggestion and try it out. A natural idea would be to see if this approach could compute the known smaller Ramsey numbers—getting their exact upper bounds.