Cropped from Mexican NotiCyTI obit
Héctor Garcia-Molina died just before Thanksgiving. He was a computer scientist and Professor in both the Departments of Computer Science and Electrical Engineering at Stanford University.
Today Ken and I write in sadness about his passing and to express some personal appreciations.
We at GLL have talked about him before. See this for his story about the fun of using an IBM mainframe for teaching. Or see this for a story about Héctor and meetings.
I, Dick, had the pleasure of working with him while we were both faculty at Princeton in the 1980s and beyond.
Héctor was the chair of the Stanford Computer Science Department from January 2001 to December 2004. Stanford then rotated the chair so all took their turn. I know that he “hated” being an administrator in general. But of course being a team player he took his turn.
One way to see his real feelings about being a chair was to look at the clock his students constructed for him. The clock was a digital timer that counted down the seconds that remained in his term as the chair. It started at roughly 126227808. I am sure he did fine, but the clock was a statement.
While Héctor was at Princeton we worked together on a project—the MMM project. It led to one of his least cited papers—a 1984 paper with me and Jacobo Valdes. OK, it has 48 citations according to IEEE. The idea of the project was to use memory rather than processors to speed up computations. He was a joy to work with: he was careful, and thoughtful, and just fun to work with on any project.
We did write a second paper with Richard Cullingford instead of Valdes, which Héctor presented at a meeting on knowledge-based management systems. The above Google Books link goes to the end with a discussion in which a simple issue was raised—we paraphrase:
Won’t it take forever just to write all zeros to the memory?
Héctor had a scientist’s answer: the project was still not at the prototype stage, so he didn’t know. He said the project should be viewed as a scientific experiment to find out. Maybe he also had an inkling that a decade of breakthroughs in CPU design and parallel/pipeline processing was coming.
Héctor was unparalleled at seeing the future. I always thought that one of his abilities, one that set him apart, was his ability to predict directions of research. This allowed him, and his students, to write early papers in research areas before they became hot. This is one of the talents that made him so amazing.
I recall way before the world wide web was created he had students working on adding links to documents. I recall a talk by one of his students at Princeton that discussed what we now call URLs. One question that was raised during the talk was: How were the links going to be created? There was a lively discussion about this. Could they be created automatically? If not why would people take the time to create the links? Indeed.
Héctor saw that links would be created. That people would take the effort to create them. I must admit that he was right, and he saw the future better than most. I wish I had a fraction of his ability to see directions like he did.
Héctor told me that when he first got to Stanford the fund managers and investors roamed the halls. They would ask anyone they could if they had an idea for a company or a startup. It was a constant issue that Héctor had to deal with. They were continually trying to steal away students.
He told me he felt like he was the head of an abbey and was always having to protect his charges within the walls.
When the impetus came from within it was different. Of course, Héctor was the advisor of Sergey Brin at the time he and Larry Page conceived Google. Brin and Page found that their search engine prototypes were so good the dataflow was constantly straining Stanford’s machines. They needed to scrounge for more disks and processors to mount their servers. Héctor already oversaw the Stanford Digital Libraries Project and he arranged for funds to purchase spare parts for the data servers.
It is interesting that in this 2001 interview in the SIGMOD Record, Héctor did not have a high opinion of the industrial side of his area:
Again, I don’t think industry really does very much research. They come up with an idea and they try to sell it. If it was a good idea, maybe they will make money. Even if it was a bad idea, if they have good marketing people, they might still make money and we never know … I don’t think they have an advantage over [academics] in testing the ideas and evaluating them and performing measurements and really understanding what are the right techniques.
In the same interview, he had sage advice for students on life after completing their PhDs and during the tenure process, mainly on the side of not trying to play the system but focusing on doing what you love.
Héctor had been a graduate student at Stanford. So his return there as a professor was a kind of homecoming. He was a home-team player in many senses. One of them is shown by this photo:
Stanford University obituary source
He was a registered Stanford sports photographer. He also taught a course at Stanford on photography. We don’t know if he had special insights on detecting “deepfake” photos and videos.
Our condolences go to all his family. Héctor, you will be missed. You are missed.
Bopuifs fodszqujpo qspcmfn.
“Unsung Entrepreneur” source
Adolph Ochs was the owner of the New York Times. In 1897 he created the paper’s slogan, “All the News That’s Fit to Print.” We at GLL would like some suggestions on our own slogan. Send us your ideas. Please no suggestion of “All the news that fits we print,” as that is already out there.
Today Ken and I wish to comment on a recent article in the NYT that was on end-to-end encryption.
The article leads by saying:
A Justice Department official hinted on Monday that a yearslong fight over encrypted communications could become part of a sweeping investigation of big tech companies.
Of course, end-to-end encryption scrambles messages so that only the sender and receiver can decode the message. Other methods are weaker: some only encrypt messages as they enter part of the network. This means that one must trust the network to keep your message secret. Thus the end-to-end method reduces the number of parties that one must trust.
In 1912, Ochs was a party to encryption that was literally end-to-end on the globe. The New York Times had bought exclusive American rights to report Roald Amundsen’s expedition to the South Pole. When Amundsen returned to Hobart, Tasmania, he sent a coded cable to his brother Leon who was acting as conduit to the Times and the London Daily Chronicle. The brother pronounced the coast clear for Amundsen to communicate directly to the papers. The stories were still quickly plagiarized once the first one appeared in the Times, and Ochs had to defend his rights with lawsuits.
There is an ongoing interest in using end-to-end encryption to protect more and more of our messages. And this interest leads to several hard problems.
The main one addressed by the NYT article is: Does this type of encryption protect bad actors? Many believe that encryption makes it impossible to track criminals. Many in law enforcement, for example, wish to have the ability to access any messages, at least on a court order. Some countries are likely to make this the law—that is, they will insist that they always can access any message. A followup NYT article described debates within Interpol about these matters.
The above problem is not what we wish to talk about today. We want to raise another problem.
How do we know that our messages are being properly encrypted?
We could check that our app is in end-to-end mode. The app will say “yes”. The problem is that this does not prove anything. The deeper question is: how do we know that messages are correctly encrypted? Indeed.
Suppose that we are told that the message $m$ has been sent to another person as the encrypted message $E(m)$. How do we know that this has been done? Several issues come to mind:
The app could lie. The app could, for example, say it is encrypting your message when it is not.
The app could mislead. The app could send an encrypted message and also send the clear message to whoever it wishes.
The app could be wrong. The app could think that the message was properly encrypted. The key, for example, could be a weak key.
The app method could be flawed. The app’s method could be incorrect. The method used might be vulnerable to known or unknown attacks.
Authenticated encryption seems to cover only part of the need. It can confirm the identity of the sender and that the ciphertext has not been tampered with. This is, however, a far cry from verifying that the encryption itself is proper and free of holes that could make it easy to figure out. Our point is also aside from problems with particular end-to-end implementations such as those considered in this 2017 paper.
Bopuifs fodszqujpo qspcmfn was encrypted with the simple key of replacing each letter by the next letter of the alphabet.
The point of this silly example is that it might have been encrypted by a harder method, but it was only encrypted by a trivial substitution method. Nevertheless, Google could not figure it out:
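For concreteness, a shift-by-one substitution like this one can be undone in a few lines. A minimal sketch (the function name is ours, not from any real cryptographic library):

```python
def shift_decrypt(ciphertext, shift=1):
    # Undo a Caesar cipher that advanced each letter by `shift`
    # places; non-letters pass through unchanged.
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base - shift) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

print(shift_decrypt("Bopuifs fodszqujpo qspcmfn"))
# prints: Another encryption problem
```

Of course, real end-to-end encryption is far beyond such toy substitutions; the point is only that the ciphertext itself gives no visible sign of how weak its key is.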
Plus predicaments of error modeling
Cropped from Bacon Sandwich source
Sir David Spiegelhalter is a British statistician. He is a strong voice for the public understanding of statistics. His work extends to all walks of life, including risk, coincidences, murder, and sex.
Today we talk about extending one of his inventions.
His invention has to do with grading the performance of people and models that make predictions. A scoring rule grades how often predictions are right. But it may not tell how difficult the situations are. It is easy to look good with predictions when they start with a high chance of success. A weather forecaster predicting sunny-versus-rainy will be right more often in Las Vegas than in Boston. Quoting this FiveThirtyEight item:
If you want to have an easy life as a weather forecaster, you should get a job in Las Vegas, Phoenix or Los Angeles. Predict that it won’t rain in one of those cities, and you’ll be right about 90 percent of the time.
In a 1986 paper, for a particular scoring rule defined by Glenn Brier in 1950, Spiegelhalter worked out how to equalize the forecaster grading. He applied his Z-test not to weather as Brier was concerned with but to medical prognoses and clinical trials.
What I am doing with a small group of graduate students in Buffalo is trying to turn Spiegelhalter’s kind of Z-test around once more. If a forecaster fares poorly, we will try to flag not the model but the behavior of the subjects being modeled. In weather we would want to tell when Mother Nature, not the models, has gone off the rails. Well, we are actually looking for ways to tell when a human being has left the bounds of human predictability for reasons that are inhuman—such as cheating with a computer at chess. And maybe it can shed more light on whether our computers can possibly “cheat” with quantum mechanics.
Let’s consider situations in which the number $k$ of possible outcomes is usually more than $2$, that is, usually more than “rain” or “no rain.” The forecaster lays down projections $p_1,\dots,p_k$ for the chance of each outcome. If outcome $j$ happens, then the Brier score for that forecast is

$$B = (1 - p_j)^2 + \sum_{i \neq j} p_i^2.$$
If the forecaster was certain that $j$ would happen and so put $p_j = 1$, all other $p_i = 0$, then the score would be zero. Thus lower is better for the Brier score.
If you put probability $p_j < 1$ on the outcome that happened, then you get penalized both for the difference $(1 - p_j)^2$ and for the remaining probability which you put on outcomes that did not happen. It is possible to decompose the score in another way that changes the emphasis:

$$B = 1 - 2p_j + \sum_{i=1}^{k} p_i^2. \qquad (1)$$
Then $\sum_i p_i^2$ is a fixed measure of how you spread your forecasts around, while all the variability in your score comes from the term $-2p_j$, that is, from how much stock you placed in the outcome that happened. The worst case is having put $p_j = 0$, whereupon your Brier penalty is $1 + \sum_i p_i^2$.
We would like our forecasts always to be perfect, but reality gives us situations that are inherently nondeterministic—with unknown “true probabilities” $q_1,\dots,q_k$. The vital point is that the forecaster should not try to hit the actual outcome on the nose every time but rather to match the true probabilities. Once we postulate the $q_i$, the expected Brier score is

$$E[B] = \sum_{j} q_j \Big( (1 - p_j)^2 + \sum_{i \neq j} p_i^2 \Big) = \sum_i (p_i - q_i)^2 + 1 - \sum_i q_i^2.$$
This is uniquely minimized by setting $p_i = q_i$ for each $i$, which defines $B$ as a strictly proper scoring rule. Without the $\sum_i p_i^2$ term in (1) the rule would not be strictly proper. When $p_i = q_i$ for all $i$, $E[B]$ becomes equal to $1 - \sum_i q_i^2$. Thus this represents an unavoidable prediction penalty from the intrinsic variance. If all $q_i$ are equal, $q_i = 1/k$, then the expected score cannot be less than $1 - 1/k$.
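The propriety claim is easy to check numerically. A small sketch with function names of our own choosing, using an illustrative four-outcome distribution:

```python
def brier(p, j):
    # Brier score when outcome j occurs under forecast p:
    # (1 - p_j)^2 plus the squared mass placed on the other outcomes.
    return (1 - p[j])**2 + sum(p[i]**2 for i in range(len(p)) if i != j)

def expected_brier(p, q):
    # Expected score when outcomes are drawn from true probabilities q.
    return sum(q[j] * brier(p, j) for j in range(len(q)))

q = [0.1, 0.2, 0.3, 0.4]
honest = expected_brier(q, q)          # forecasting the truth: 1 - sum q_i^2 = 0.7
hedged = expected_brier([0.25] * 4, q) # any other forecast scores worse: 0.75
```

The honest forecast attains exactly the unavoidable penalty $1 - \sum_i q_i^2$, and every perturbation adds $\sum_i (p_i - q_i)^2$ on top of it.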
A second example, the log-likelihood prediction scoring rule, is in the original longer draft of this post.
Spiegelhalter’s $Z$-score neatly drops out the unavoidable penalty term by taking the difference of the score with its expectation. Schematically it is defined as

$$Z = \frac{B - E[B]}{\sqrt{\mathrm{Var}(B)}},$$
where $\mathrm{Var}(B)$ means the variance projected under the forecast probabilities. However, here is where it is important to notate the whole series of forecasting situations $t = 1,\dots,T$ with outcomes $j_t$ and forecasts $p_{t,i}$ for each $t$. The actual statistic is

$$Z = \frac{\sum_{t=1}^{T} \big( B_t - E[B_t] \big)}{\sqrt{\sum_{t=1}^{T} \mathrm{Var}(B_t)}}. \qquad (2)$$
The denominator presumes that the forecast situations are independent so that the variances add. The numerator expands to be

$$2 \sum_{t=1}^{T} \Big( \sum_i q_{t,i}\, p_{t,i} - p_{t,j_t} \Big),$$

since the $\sum_i p_{t,i}^2$ terms cancel.
The original application is a confidence test of the “null hypothesis” that the projections are good. Thus we plug in $p_{t,i}$ for $q_{t,i}$ for all $t$ and $i$, so that we test

$$Z_0 = \frac{2 \sum_{t=1}^{T} \big( \sum_i p_{t,i}^2 - p_{t,j_t} \big)}{\sqrt{\sum_{t=1}^{T} \mathrm{Var}(B_t)}}.$$
To illustrate, suppose we do ten independent trials of an event with four outcomes whose true probabilities are $0.1, 0.2, 0.3, 0.4$. The sum in parentheses contributes $\sum_i p_i^2 = 0.30$ in each trial. If the outcomes conform exactly to these probabilities then $p_{j_t}$ equals $0.1$ once, $0.2$ twice, $0.3$ three times, and $0.4$ four times, summing to $3.0$. This exactly cancels the ten copies of $0.30$, so makes $Z = 0$, as expected. Most trials will give a nonzero numerator, but in the long run, the numerator divided by $T$ tends toward zero and the denominator scales to match it, thus keeping the $Z$-statistic normally distributed.
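The cancellation in the ten-trial illustration can be verified directly:

```python
p = [0.1, 0.2, 0.3, 0.4]      # projected (= true) probabilities
S = sum(x * x for x in p)     # sum of squares: 0.30 per trial

# Outcomes occur in exact proportion: outcome i happens 1, 2, 3, 4 times.
outcomes = [0]*1 + [1]*2 + [2]*3 + [3]*4

# Null-hypothesis numerator: 2 * sum over trials of (S - p_{j_t}).
numerator = 2 * sum(S - p[j] for j in outcomes)   # exactly zero here
```

With outcomes in exact proportion the ten copies of $0.30$ cancel the observed $p_{j_t}$ terms, so the numerator vanishes.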
A high $|Z|$, on the other hand—highly positive or highly negative—indicates that the forecasting is way off. That (2) is an aggregate statistic over independent trials justifies treating the $Z$-values as standard scores. This applies also to $Z$-tests made similarly from other scoring rules besides the Brier score. The test thus becomes a verdict on the model. High $Z$-values on certain subsets of the data may reveal biases.
Our idea is the opposite. Suppose we know that the forecasts are true, or suppose they have biases that are known and correctable over moderately large data sets. We may then be able to fit $Z$ as an unbiased estimator (of zero) over large training sets. Then it can become a judgment of whether the data has become unnatural.
As I have detailed in numerous posts on this blog, my system for detecting cheating with computers at chess already provides several statistical $z$-scores. Why would I want another one?
The motive involves the presence of multiple strong chess-playing programs, each with its own quirks and distribution of values for moves. They are used in two different ways:

1. To supply the move values from which my model’s projections are fitted.
2. To serve as the reference with which a player’s moves are tested for concordance.
Having multiple engines helps point 1. My intent to blend the values from different engines has been blunted by issues I discussed here. Thus I now have to train my model separately (and expensively) for each (new version of each) program. I can then blend the fitted projections, but point 2 still remains at issue: my tests measure concordance with a specific program. Originally the program Rybka 3 was primary and Houdini 4B secondary. Now Stockfish 7 is primary and Komodo 10.0 secondary—until I update to their latest versions. The second engine is supposed to confirm a positive result from the first one. This already means that my model is not trying to detect exactly which program was used.
Nevertheless, my results often vary between testing engines. The engines compete against each other and may be crafted to disagree on certain kinds of moves. They agree with each other barely 75–80% in my tests. I would like to factor these differences out.
The Spiegelhalter $Z$-test appeals because its reference is not to a particular chess program, but to the prediction quality of my model itself—which per point 1 can be informed by many programs in concert. It gives a way to predicate predictivity. A high value will attest that the sequence of played moves falls outside the range of predictability for human players of the same rated skill level.
To harness the $Z$-test for some scoring rule, we need to quantify the nature of my model’s projections. In fact, my model has a clear bias toward conservatism in judging the frequency of particular non-optimal moves. This is discussed in my August post on my model upgrade and shown graphically in an appended note on why the conservative setting of a “gradient” parameter is needed to preserve dynamical stability. The fitting offsets this in a way that creates an opposite bias elsewhere. I hope to correct both biases at the same stroke by a specific means of modeling how the projections $p_i$ err with respect to the postulated true probabilities $q_i$.
We postulate an original source of error terms $e_i$, all i.i.d. as $N(0, \sigma^2)$, where $\sigma$ governs the magnitude of Gaussian noise. This noise can be transformed to relate the $p_i$ and $q_i$ in various ways, e.g.: additively as $q_i = p_i + e_i$, relatively as $p_i = q_i(1 + e_i)$, or multiplicatively as $q_i = p_i e^{e_i}$.
There are further forms to consider and it is not yet clear from data within my model which one most applies. We would be interested in examples where these representations have been employed and in observations about their natures.
Given the error terms, we can write each $q_i$ as a function of $p_i$ and $e_i$. One issue is having at most $k - 1$ degrees of freedom among the $e_i$, owing to the constraint that the $q_i$ as well as the $p_i$ sum to $1$. We handle this by choosing some fixed index as the “pivot” and using the constraints to eliminate its $q$ and $e$ terms, leaving the other error terms free. In all cases, the proposed method of defining what we notate as $Z_\sigma$ is: substitute the error form for the true probabilities in (2) and take expectations over the noise terms $e_i$.
If the resulting $Z$-scores parameterized by $\sigma$ make sense, the last step will be adjusting them to conform to normal distribution, via the resampling process mentioned recently here and earlier here. We are not there yet. But observations from Spiegelhalter tests with $\sigma = 0$ (equivalently, with all $e_i$ fixed to zero) suggest that the resulting single, authoritative, “pure” predictivity test may rival the sharpness of my current tests involving specific chess programs.
To see a key wrinkle, consider the first, additive, error form $q_i = p_i + e_i$. It is symmetrical: equally $p_i = q_i - e_i$, with $-e_i$ distributed the same as $e_i$. When we substitute $p_i + e_i$ for $q_i$ and take expectations, the symmetry of $e_i$ around $0$ makes it drop out of the numerator of (2), and out of everything in the denominator except one place where $p_i^2$ becomes $p_i^2 + \sigma^2$. There is hence nothing for $\sigma$ to fit and we are basically left with the original Spiegelhalter $Z$.
In the second form, however, we get $q_i = \frac{p_i}{1 + e_i}$. If we presume $\sigma$ small enough to make the distribution of $e_i$ outside $(-1, 1)$ negligible, then we can use the series expansion $\frac{1}{1 + e_i} = 1 - e_i + e_i^2 - e_i^3 + \cdots$ to approximate

$$E[q_i] \approx p_i\, E[1 - e_i + e_i^2 - e_i^3 + e_i^4 - \cdots].$$
Under normal expectation, the odd-power terms drop out (so their signs don’t matter) and we get

$$E[q_i] \approx p_i (1 + \sigma^2 + 3\sigma^4 + \cdots).$$
This credits $q_i$ as being greater than $p_i$. Provided the projections for the substituted indices were generally slightly conservative, this has hope of correcting them.
Already, however, we have traipsed over some pitfalls of methodology. One is that the exact normal expectation $E\big[\frac{1}{1 + e_i}\big]$ diverges regardless of how small $\sigma$ is: for any $\sigma > 0$, regions around the pole at $e_i = -1$ get some fixed finite probability. Another is the simple paradox of our second form saying:
$p_i = q_i(1 + e_i)$ is an unbiased estimator of $q_i$, but $q_i = p_i/(1 + e_i)$ is not an unbiased (or even finite) estimator of $p_i$.
A third curiosity comes from the fourth, multiplicative, error form. It gives $q_i = p_i e^{e_i}$, so $p_i = q_i e^{-e_i}$. We have

$$E[e^{e_i}] = e^{\sigma^2/2}$$
exactly, without approximation. Again the sign of $e_i$ does not matter, since also $E[e^{-e_i}] = e^{\sigma^2/2}$. So we get

$$E[q_i] = p_i\, e^{\sigma^2/2} > p_i.$$
But by reading the original fourth equation the other way we get

$$E[p_i] = q_i\, e^{\sigma^2/2} > q_i.$$
So we have $E[q_i] > p_i$ and $E[p_i] > q_i$, with both expectations being over the same noise terms. This is like the famous Lake Wobegon syndrome, where everyone is above average. What it indicates is the need for care in where and how to apply these error representations.
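The two-way inflation is easy to see by simulation. A sketch, with illustrative values of $p$, $q$, and $\sigma$ of our own choosing:

```python
import math
import random

rng = random.Random(2019)
sigma = 0.3
p, q = 0.2, 0.2                  # illustrative projected and true values
lift = math.exp(sigma**2 / 2)    # E[exp(e)] = E[exp(-e)] exactly

n = 200_000
# Multiplicative form: q = p * exp(e) inflates q on average...
avg_q = sum(p * math.exp(rng.gauss(0, sigma)) for _ in range(n)) / n
# ...while the inverted p = q * exp(-e) inflates p on average too.
avg_p = sum(q * math.exp(-rng.gauss(0, sigma)) for _ in range(n)) / n
```

Both sample means land near the $e^{\sigma^2/2}$ multiple of their base values, exhibiting the paradox: each side looks bigger than the other in expectation.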
Have you seen this idea of directly testing (un)predictability in the literature? Might it improve the currently much-debated statistical tests for quantum supremacy?
Which error model seems most likely to apply? Where have the paradoxes in our last section been noted?
Xuejun Yang is a Senior Staff Engineer at FutureWei Technologies. He is the DFA on the 2011 paper, “Finding and Understanding Bugs in C Compilers.”
Today Ken and I discuss a clever idea from that paper.
The paper was brought to our attention just now in a meaty comment by Paul D. We thank him for it—the topic interests both of us. We don’t think Paul D. means to be anonymous, but in keeping with that we’ll give just a cryptic hint to his identity: The saying “a man on the make” is widely known, but for more than the millennium he has been the unique person in the world to whom it applies literally. Update 11/20: Turns out we (I, Ken) were wrong about the identity, see this.
Yang was made unique by being listed out of alphabetical order on the paper. This is notable because the most common practice in our field is to list alphabetically irrespective of prominence. Hence we’ve invented the term ‘DFA’ for “Designated” or “Distinguished” First Author. The other authors are Yang Chen, Eric Eide, and John Regehr, all from the University of Utah.
Paul D.’s comment notes that there was evidence that verification methods could improve compiler correctness. By compiler we mean the program that transforms high level code into machine code. These programs are used countless times every day and their correctness is clearly very important.
Their correctness is tricky for several reasons. The main one is that almost all compilers try to optimize code. That is, when they transform code into instructions, they try to rewrite or rearrange the instructions to yield better performance. Compilers have been doing this forever. The trouble is that changing instructions to increase performance is dangerous. The changes must not affect the values that are computed. If the changes are not made carefully, they can make the code faster but the answers incorrect. This is the reason correctness is tricky.
Formal verification requires a lot of effort. The highest effort should go into mission-critical software. But compilers are mission-critical already, unless we know mission-critical software won’t be compiled on a particular one. Hence it is notable when formal verification makes a compiler more reliable.
The idea in the paper Paul referenced is quite elegant. They built a program called Csmith. It operates as follows:
Suppose that $C$ is a compiler they wish to test. Then generate various legal C programs $P_1, P_2, P_3, \dots$ For each of these let $A_i$ be the answer that the compiled program $C(P_i)$ yields when run. Then check whether $A_i$ is correct.
For example:
int foo (void) {
  signed char x = 1;
  unsigned char y = 255;
  return x > y;
}
Some compilers returned $1$, but the correct answer is $0$: under C’s integer promotions both operands are converted to int, so the comparison is $1 > 255$, which is false. There are further examples in a 2012 companion paper and these slides from an earlier version. The Csmith homepage has long lists of compiler bugs they found.
Of course if $C$ crashes or refuses to compile $P_i$ then the compiler is wrong. But what happens if an answer $A_i$ is computed? How does Csmith know if the answer is correct? This seems to be really hard. This correctness testing must be automated: the whole approach is based on allowing tons of random programs to be tested. They cannot assume that humans will be used to check the outputs.
This is the clever idea of the paper. They assume that there are at least two compilers, say $C$ and $C'$. Then let $A$ be the output of $C(P)$ and let $A'$ be the output of $C'(P)$. The key insight is:
If $A$ is not equal to $A'$, then at least one of the compilers is wrong.
A very neat and elegant idea. For software in general it is called differential testing.
This at least alerts when there are problems with some compilers and some programs. One can use this trick to discover programs that cause at least some compilers to have problems. This is extremely valuable. It allowed Csmith to discover hundreds of errors in production compilers—errors that previously were missed.
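Differential testing needs no real compiler to illustrate. Here is a toy sketch in the same spirit: two implementations of the `x > y` comparison from the example above, one following C’s integer promotions and one mimicking a hypothetical wrong-code bug that compares raw bytes. Both functions are ours, for illustration only:

```python
import random

def promoted_compare(x, y):
    # Correct semantics: signed char and unsigned char both promote
    # to int, so the comparison is over their numeric values.
    return 1 if x > y else 0

def raw_byte_compare(x, y):
    # Hypothetical miscompilation: compare the raw byte patterns
    # as unsigned values instead.
    return 1 if (x & 0xFF) > (y & 0xFF) else 0

def differential_test(trials=1000, seed=1):
    # Run both "compilers" on random inputs; any disagreement
    # proves at least one of them wrong.
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        x = rng.randint(-128, 127)   # signed char range
        y = rng.randint(0, 255)      # unsigned char range
        if promoted_compare(x, y) != raw_byte_compare(x, y):
            mismatches.append((x, y))
    return mismatches
```

Every mismatch found this way involves a negative `x`, pinpointing exactly where the buggy semantics diverge, which is the same kind of evidence Csmith collects against real compilers.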
Fuzzing is defined by Wikipedia as testing by “providing invalid, unexpected, or random data as inputs to a computer program.” An early historical example, Apple’s “Monkey” program, worked completely randomly. To ensure that the found bugs are meaningful and analyzable, Csmith needed a deeper, structured, “intelligent” design, not just the generation of mayhem.
For one, Csmith needed to avoid programs that do not have deterministic behavior. The formal C standards itemize cases in which compilers are allowed to have arbitrary, even self-inconsistent, behavior—and there are lots of them in C. A bug report based on dubious code could be dismissed out of hand.
For another, the probability that a program built haphazardly by the original Csmith version would reveal bugs was observed to peak at about 80KB of source-code size, about 1,000 lines across multiple pages. Those don’t make great examples. So Csmith has its own routines to compress bug instances it has found. Simple tricks include shortening numerical expressions to use only the bug-sensitive parts. Others are lifting local variables out of blocks and bypassing pointer jumps.
A third goal is that the generator should branch out to all aspects of the language—in this case, C—not just the “grungy” parts that are ripe for finding compiler bugs. The paper talks about this at length. Regehr, who was Yang’s advisor, is also a blogger. His current post, dated November 4, is titled, “Helping Generative Fuzzers Avoid Looking Only Where the Light is Good, Part 1.” We guess that “Part 2” will go even more into details.
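The compression of bug instances mentioned above can be sketched generically. Here is a toy chunk-removal reducer in the spirit of delta debugging (not Csmith’s actual code): it repeatedly deletes pieces of a test case as long as a given bug predicate keeps firing.

```python
def reduce_case(tokens, still_fails):
    # Greedily delete chunks of decreasing size while the bug
    # still reproduces on the shrunken candidate.
    changed = True
    while changed:
        changed = False
        chunk = max(1, len(tokens) // 2)
        while chunk >= 1:
            i = 0
            while i < len(tokens):
                candidate = tokens[:i] + tokens[i + chunk:]
                if candidate and still_fails(candidate):
                    tokens = candidate
                    changed = True
                else:
                    i += chunk
            chunk //= 2
    return tokens

# Toy bug: the "compiler" fails exactly when both trigger lines survive.
bug = lambda t: "x = -1;" in t and "if (x > y)" in t
case = ["int x;", "x = -1;", "int y = 200;", "if (x > y)", 'puts("bug");']
minimal = reduce_case(case, bug)   # keeps only the two trigger lines
```

The reducer shrinks the five-line case down to the two lines that actually provoke the “bug,” which is the kind of small, readable example worth reporting.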
Regarding the formally-verified CompCert compiler, Paul D. quoted from the paper:
The striking thing about our CompCert results is that the middle-end bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler optimizations within a proof framework, where safety checks are explicit and machine-checked, has tangible benefits for compiler users.
This August 2019 paper by Michaël Marcozzi, Qiyi Tang, Alastair Donaldson, and Cristian Cadar gives recent results involving Csmith and other tools. They have an interesting discussion on page 2, from which we excerpt:
In our experience working in the area […], we have found compiler fuzzing to be a contentious topic. Research talks on compiler fuzzing are often followed by questions about the importance of the discovered bugs, and whether compiler fuzzers might be improved by taking inspiration from bugs encountered by users of compilers “in the wild.” Some … argue that any miscompilation bug, whether fuzzer-found or not, is a ticking bomb that should be regarded as severe, or avoided completely via formal verification (in the spirit of CompCert).
They go on to say, however, that when a fully-developed compiler is used for non-critical software, the kinds of bugs typically found by fuzzing tend to have questionable importance. Their paper is titled, “A Systematic Impact Study for Fuzzer-Found Compiler Bugs.”
So far they have found definite results that seem to have mixed implications. In their future-work section they note that they have evaluated the impact of bugs in compilers on the intended function of programs they compile, but not on possible security holes—which as we noted in our Cloudflare post can come from (misuse of) simple code that is completely correct. This leads us further to wonder, coming full-circle, whether formal methods might help quantify the relative importance of aspects of a language and areas of a compiler to guide more-intelligent generation of test cases.
The above comment is interesting, but perhaps finding obscure bugs is important after all. Perhaps such bugs could be used to attack systems; that is, perhaps someone could use them to break into a system. Security may be compromised by any error, even one unlikely to occur in the wild.
What do you think?
Models of the primes
Andrew Granville writes brilliant papers that explain hard results in number theory. He also proves hard results in number theory.
Today, Ken and I use the famous Goldbach conjecture to discuss a third rail: how to identify which results “should be” true even though they have been too hard to prove.
Granville has just recently published a graphic novel, Prime Suspects: The Anatomy of Integers and Permutations, with his sister Jennifer Granville and illustrator Robert Lewis. It features constables named Green and Tao, a detective Jack von Neumann, and students named Emmy Germain and Sergei Langer among other allusions to real (and complex) mathematicians. It grew out of a 2009 musical play that premiered at IAS.
The driver of the plot is a deep connection between the frequency of primes below a number $x$ and that of permutations in the symmetric group $S_n$ that are “prime” in the sense of having only one cycle. Substituting $n$ for $\log x$ and vice-versa tends to create correspondences of known results in number theory vis-à-vis permutation group theory. See this MAA review. Going beyond the known theorems described in the novel, we wonder how far such heuristic sleuthing methods can go on long-unsolved cases.
The statement we know as the (Strong) Goldbach Conjecture is that every even number greater than $2$ can be written as the sum of two prime numbers. It was made in 1742 by Christian Goldbach. He wrote to Leonhard Euler:
“Every integer that can be written as the sum of two primes can also be written as the sum of as many primes as desired, until all terms are units.”
Well, that is not what we call Goldbach’s conjecture. He and many others at the time considered $1$ to be a prime number. What he’s getting at can be seen from this snippet of his letter:
Above his sums, Goldbach put an asterisk * referring to a note in the margin, in which he asserts his conjecture:
“Every integer greater than 2 can be written as the sum of three primes.”
Wait—that is not the Goldbach conjecture either. It is the “Weak” one and was apparently proved in 2013 by Harald Helfgott. We discussed this in a 2014 post whose larger theme we are continuing here. It also proves the first conjecture, but not the strong conjecture.
But what Goldbach seems to be driving at with his drawing of sums is having one of the “primes” be . Then the strong conjecture is needed. Euler pointed this out in his reply to Goldbach’s letter. But Euler, who was a saint in many ways, charitably reminded Goldbach of a communication earlier that year when Goldbach had observed that his first conjecture followed from the strong statement. Euler went on to say:
“That every even number should be a sum of two primes I hold to be a completely certain theorem, irrespective of my not being able to prove it.”
Ken has translated this a little differently from Wikipedia’s article and its source, reading into Euler’s words the stance of truth shining apart from proof. How one can justify this stance is what we want to discuss.
The conjecture is curious on several fronts. For one, it is usually said to be “obviously correct.” It has been checked by computation up to about $4 \times 10^{18}$. There are many open conjectures in number theory that are likely to be true, but few are claimed to be “true” with such a strong bias—none as likely as the Goldbach.
In 1975, Hugh Montgomery and Robert Vaughan proved that the Goldbach is true for most even numbers. That is, the number of even numbers less than $X$ that are not sums of two primes grows like at most $X^{1-\delta}$ for some fixed $\delta > 0$. Thus if one picks a random even number, it is likely to be the sum of two primes. Here the “likely” is a mathematical certainty.
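Such computational checks are simple to reproduce on a small scale. A sketch that verifies the strong conjecture for every even number up to 10,000:

```python
def primes_up_to(n):
    # Sieve of Eratosthenes: sieve[k] is True iff k is prime.
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return sieve

def goldbach_witness(n, sieve):
    # Return some prime pair (p, n - p) summing to n, or None.
    for p in range(2, n // 2 + 1):
        if sieve[p] and sieve[n - p]:
            return (p, n - p)
    return None

N = 10_000
sieve = primes_up_to(N)
exceptions = [n for n in range(4, N + 1, 2)
              if goldbach_witness(n, sieve) is None]   # stays empty
```

The list of exceptions comes back empty, as Goldbach predicted; the serious computations just push the same search vastly further.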
How do we “know” that it is likely to be true? One source is the method of prime models. Primes are quite mysterious and hard to understand. So there are heuristic models that suggest we think of the primes as “random”. Of course this is silly, since the primes are a deterministic fixed sequence of numbers. But the hope is that the following is true.
If $S$ is a statement about the primes that holds with high probability in the random model, then $S$ is true.
Of course this is nonsense.
But it is interesting nonsense. Harald Cramér gave a model that is simple. Andrew Granville added some refinements to this model here and here. More recently, William Banks, Kevin Ford, and Terence Tao have given a new model for the primes here.
These models are useful for making and thinking about number theory conjectures. Perhaps one day they will really be usable to determine truth. They are certainly good heuristics to have when studying the prime numbers. We are jealous. In complexity theory it would be wonderful to have anything like these models. Perhaps one day we will.
Cramér’s model is simple to state. Imagine that the primes are replaced by a random set $P$: place each $n \geq 3$ in $P$ with probability $1/\ln n$, making these choices independently. The Fermat numbers are those of the form $F_n = 2^{2^n} + 1$.
The first five of these, $3, 5, 17, 257, 65537$, are prime. Fermat thought this continued, but it is not true. Euler showed that the next one is not prime: $F_5 = 4294967297 = 641 \times 6700417$.
An interesting problem is: are there any more prime Fermat numbers? Many believe that there are no more, or at most a finite number in total. Let’s look at using the model to understand the Fermat numbers. The model assigns $F_n$ probability about
$$\frac{1}{\ln F_n} \approx \frac{1}{2^n \ln 2}$$
of being prime. Therefore, since the sum of these probabilities converges, the total expected number of Fermat primes is finite. Of course this is assuming the model is predictive.
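The heuristic expectation can be summed in a couple of lines (the $1/\ln F_n$ probability is the model’s assumption, not a theorem):

```python
import math

# Cramer-style heuristic: Pr[F_n is prime] ~ 1/ln(F_n) ~ 1/(2^n * ln 2).
# The expected total number of Fermat primes is then a convergent sum.
expected = sum(1 / (2**n * math.log(2)) for n in range(200))
print(round(expected, 3))  # -> 2.885, i.e. 2/ln 2
```

So the model predicts fewer than three Fermat primes “on average,” while five are already known; this kind of tension is exactly why such models are heuristics rather than oracles.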
Our friends at Wikipedia say:
As with many famous conjectures in mathematics, there are a number of purported proofs of the Goldbach conjecture, none of which are accepted by the mathematical community.
Try a Google search yourself for “Goldbach conjecture proved”. The top hits include several “proofs” that the conjecture is true. The proofs are all short and simple. All are believed to be wrong. I find it interesting that the proofs, in many cases, use a randomness-style argument. The trouble is that the above models are only heuristics, so such proofs are incomplete.
Can we imagine getting heuristic models for complexity theory? For quantum algorithms perhaps. What would such heuristic models even look like? We wonder.
Britannica source |
Paul Painlevé was a French mathematician who specialized in classical mechanics. He is known for a conjecture in 1895 about anomalies in Isaac Newton’s equations that was proved in 1988.
Tonight, Ken and I celebrate Halloween with some mathematical horrors that may have real physical consequences.
Painlevé is also known for having been Prime Minister of France for two short stints in 1917 and 1925. He was also Minister of War under President Raymond Poincaré, a cousin of Henri Poincaré. He was the first Frenchman in powered flight—alongside Wilbur Wright in 1908—and at the end of his life headed the French Air Force.
A basic fact about Newtonian gravity is that the gravitational potential energy of two point masses $m_1$ and $m_2$ is proportional to
$$-\frac{m_1 m_2}{r},$$
where $r$ is the distance between them. We usually think of how gravity becomes weaker as $r$ grows, but when $r$ is tiny it becomes quite strong. Since the potential is negative, it is possible for an individual particle in a finite $n$-body system to accelerate to arbitrary speed without violating the conservation of energy. But can it happen in a finite amount of time—and without $r$ actually becoming zero in a collision?
Painlevé proved a ‘no’ answer for $n = 3$ but suspected ‘yes’ for $n \geq 4$. Zhihong Xia proved in 1988 that it can happen for $n = 5$, extending earlier advances by his thesis advisor, Donald Saari. The case $n = 4$ is still open; in all cases, the initial conditions for it to happen form a measure-zero subset of the configuration space.
The difference from the $n = 3$ case is shown by this diagram from their 1995 AMS Notices paper, which gives a highly readable telling of the whole story. We, however, envision a fantasy story about what could have happened much earlier.
We want to imagine that Xia’s result was proved not near the end of the 20th century but near the start—in particular, before Albert Einstein’s creation of General Relativity, but after Special Relativity. Say the result was obtained in 1908 by Painlevé after his flight with Wright. That same year, Hugo von Zeipel proved a startling consequence of Painlevé’s conjecture, which we now know to be a theorem:
Theorem 1 Under Newtonian gravity, an $n$-body system of point masses can eject a particle to infinity in finite time.
That is, without collisions, a Newtonian $n$-body system of point masses can create separations between particles that grow to infinity within a finite time. Saari and Xia say that the effect can be partially seen if you do the following:
Place a tennis ball on top of a basketball and drop the pair from chest height. The tennis ball rebound is pretty dramatic—enough so that it should be done outside.
Ken tried it and it works. It does not work so well if the tennis ball is replaced by a piece of Halloween candy. If the basketball is replaced by a pumpkin, it definitely will not work.
We know already, however, from Einstein’s theory of special relativity that it cannot work. We would have an instant before the singular time at which the point mass has just passed the speed of light. At that time it has acquired infinite energy according to special relativity, but cannot have withdrawn it from the potential by then.
In 1908 this would have been an internal indication that Newton’s theory of gravity must break down. There were of course external indicators, such as the theory’s incorrect prediction of the orbit of Mercury. Maybe internal ones were known, but this seems most glaring in retrospect. Are we right in this interpretation? The result by Xia is shocking enough as it stands, and makes us wonder what other surprises lurk in equations.
Other possibilities of singularities happening within finite time have not been ruled out by any physical theories. The New York Times Magazine in July 2015 ran a profile titled “The Singular Mind of Terry Tao” with a great opening sentence:
This April, as undergraduates strolled along the street outside his modest office on the campus of the University of California, Los Angeles, the mathematician Terence Tao mused about the possibility that water could spontaneously explode.
Indeed, Tao proved it can happen under a plausible modification of the Navier-Stokes equations of fluid dynamics. He writes there:
Intriguingly, the method of proof in fact hints at a possible route to establishing blowup for the true Navier-Stokes equations, which I am now increasingly inclined to believe is the case (albeit for a very small set of initial data).
As with Painlevé’s conjecture—Xia’s theorem—the point would be not that the initial conditions could happen with any perceptible probability, but that our world is capable of their happening at all. At least, that is, with the equations by which we describe our world. Our own mention at the start of 2015 alluded to the possibility of Tao’s blowup applying to fluid-like fields in cosmological theories.
Just last week, a column on the Starts With a Bang blog alerted us to an issue with equations that could be skewing cosmological theories today. The blog is written by Ethan Siegel for Forbes and its items are linked regularly on RealClear Science. Siegel draws an analogy to a phenomenon with Fourier series that feeds into things Dick and I (Ken writing this part) have already been thinking about. Things used all the time in theory…
The Fourier phenomenon is named for the American physicist Josiah Willard Gibbs, but Gibbs was not the original discoverer. We could add this to our old post on Stigler’s law, named for Stephen Stigler (who did not discover it), which says that no scientific law is named for its original discoverer. The first discoverer of the Gibbs Phenomenon was—evidently—Henry Wilbraham in 1848. Through the magic of the Internet we can convey it by lifting Wilbraham’s original figure straight from his paper:
What this shows is the convergence of a sum of sine waves to a square wave. The convergence is pointwise except at the jump discontinuities, but it is not uniform. The partial sums do not converge uniformly on any interval crossing a jump; instead they rise about $9\%$ of the jump height above the square wave’s value in the vicinity of the jump, no matter how many terms are summed in the approximations. Wilbraham’s middle drawing depicts this in finer detail than any other rendering I have found. The persistent overshoot is physically real—it is the cause of ringing artifacts in signal processing.
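The overshoot is easy to reproduce numerically; here is a sketch using the classic odd-harmonic Fourier series for a square wave of height $\pm\pi/4$ (our choice of series, not Wilbraham's exact setup):

```python
import math

def square_partial(x, terms):
    """Partial Fourier sum sin(x) + sin(3x)/3 + ... converging to +-pi/4."""
    return sum(math.sin((2*k + 1) * x) / (2*k + 1) for k in range(terms))

# The maximum near the jump at x = 0 stays about 9% of the jump (pi/2)
# above the limiting value pi/4, no matter how many terms are used.
for terms in (50, 500):
    peak = max(square_partial(i * 1e-4, terms) for i in range(1, 3000))
    overshoot = (peak - math.pi / 4) / (math.pi / 2)
    print(terms, round(overshoot, 3))  # stays near 0.09 both times
```

Adding more terms narrows the overshoot region toward the jump but never shrinks its height, which is exactly the failure of uniform convergence described above.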
Now the convergence does satisfy a criterion of $\epsilon$-approximation that we use all the time in theory: for any $\epsilon > 0$ and large enough $n$, $|f_n(x) - f(x)| < \epsilon$ except for an $\epsilon$ fraction of $x$ in the domain. This kind of convergence is used internally in the proofs of quantum algorithms for linear algebra which we recently discussed. If the value $f(x)$ is explicitly what you’re after, this is fine. But if you use the value only implicitly while composing the approximation with some other function $g$, you must beware that the compositions are not thrown off in a constant way by the overshoots.
Siegel draws attention to what is alleged as something similar actually happening to current physical theories that use a well-known class of algorithmic simulations. The details are in a new paper by Anton Baushev and Sergey Pilipenko, titled “The central cusps in dark matter halos: fact or fiction?” To show the relation between this and the opening example in this post, we need only quote the paper (page 2)—
However, the present state-of-art of -body simulation tests … can hardly be named adequate. The commonly-used criterion of the convergence of -body simulations in the halo center is solely the density profile stability [per which] the central cusp (close to ) is formed quite rapidly ().
—and then quote Siegel’s own description of the core-cusp problem and the allegation:
In theory, matter should fall into a gravitationally bound structure and undergo what’s known as violent relaxation, where a large number of interactions cause the heaviest-mass objects to fall towards the center (becoming more tightly bound) while the lower-mass ones get exiled to the outskirts (becoming more loosely bound) and can even get ejected entirely.
Since similar phenomena to the expectations of violent relaxation were seen in the simulations, and all the different simulations had these features, we assumed that they were representative of real physics. However, it’s also possible that they don’t represent real physics, but rather represent a numerical artifact inherent to the simulation itself.
Is all this real physics? Or is it artifacts showing that current theories and/or algorithms are flawed? Whichever is the truth, our equations have tricks that may not lead to treats.
For a relevant postscript, Painlevé did not abandon physics when he rose in politics. In 1921, he effectively removed an apparent singularity at the event horizons of black holes in general relativity. He and colleagues discussed this with Einstein the following year, but apparently not the $n$-body conjecture. How far are we right in our interpretation that the (proved) conjecture plus special relativity suffices to disprove Newtonian gravity by itself?
What it takes to understand and verify the claim
Cropped from 2014 Wired source |
John Martinis of U.C. Santa Barbara and Google is the last author of a paper published Wednesday in Nature that claims to have demonstrated a task executed with minimum effort by a quantum computer that no classical computer can emulate without expending Herculean—or Sisyphean—effort.
Today we present a lay understanding of the claim and discuss degrees of establishing it.
There are 76 other authors of the paper. The first 75 are alphabetical, then comes Hartmut Neven before Martinis. Usually pride of place goes to the first author, but that depends on size. Martinis is also the corresponding author. The cox in a rowing race rides at the rear. We have discussed aspects of papers with a huge number of authors here.
Three planks of a quantum supremacy claim are:

1. The device actually performs the stated sampling task, with fidelity bounded away from zero.
2. Statistical tests on the device’s output verify its performance.
3. No feasible classical computation could produce samples passing the same tests.
Scott Aaronson not only has made two great posts on these and many other aspects of the claim, he independently proposed in 2015 the sampling task that was programmed, and he analyzed it in a foundational paper with Lijie Chen of MIT. Researchers at Google had already been thinking along those lines, and they anchored the team composed from numerous other institutions as well. As if on cue—just a couple days before Wednesday’s announcement—a group from IBM put out a post and paper taking issue with the argument for the third plank.
We’ll start with the task and go in order 1-3-2.
Any $n$-qubit quantum circuit $C$ and input $x \in \{0,1\}^n$ to $C$ induces a probability distribution $D_C$ on $\{0,1\}^n$. Because it will not matter if we prepend up to $n$ NOT gates to $C$, we may suppose $x = 0^n$. Then applying $C$ gives a unit complex vector $(a_z)$ of length $N = 2^n$ with entries corresponding to possible outputs $z \in \{0,1\}^n$. Then the probability of getting $z$ by a final measurement of all qubits is $p_z = |a_z|^2$.
Next we consider probability distributions that are generated uniformly at random by the following process, for some $s$ and taking $N = 2^n$:
for $i = 1$ to $s$:
choose a $z \in \{0,1\}^n$ uniformly at random;
increment its probability by $1/s$.
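As a toy illustration (the sizes here are ours; the experiment's $N$ is astronomically larger), the process can be coded directly:

```python
import random

def random_classical_distribution(n, s, seed=None):
    """Build a distribution on {0,1}^n by s uniform increments of 1/s."""
    rng = random.Random(seed)
    N = 2**n
    prob = [0.0] * N
    for _ in range(s):
        z = rng.randrange(N)   # choose an output string uniformly
        prob[z] += 1.0 / s     # increment its probability by 1/s
    return prob

D = random_classical_distribution(n=8, s=256, seed=1)
print(abs(sum(D) - 1.0) < 1e-9)  # True: it is a probability distribution
print(max(D) * 256)              # collisions create a few "heavy hitters"
```

Collisions in the uniform choices are what give some strings probability above $1/N$; the point developed below is that a classical process of this kind cannot *aim* at the heavy hitters of a particular circuit.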
Here we intend $s$ to be the number of binary nondeterministic gates in the circuit. In place of Hadamard gates, the experimental circuits get their nondeterminism from the three single-qubit gates $\sqrt{X}$, $\sqrt{Y}$, and $\sqrt{W}$ (ignoring global phase, for $\sqrt{W}$ in particular):
Here $W = (X + Y)/\sqrt{2}$, where $X$ is another name for NOT. The difference from using Hadamard gates matters to technical analysis of the distributions, but the interplay between quantum nondeterministic gates and classical random coins remains in force.
The choice of $\sqrt{X}$, $\sqrt{Y}$, or $\sqrt{W}$ is itself uniformly random at each point where a single-qubit gate is used, except for not repeating the same gate on the same qubit, and those choices determine $C$. Now we can give an initial statement of the task tailored to what the paper achieves:
Given randomly-generated quantum circuits $C$ as inputs, distinguish $D_C$ with high probability from any distribution generated by the classical process above.
In more detail, the object is to take a number $b > 1$ and a moderately large integer $T$, both dictated by practical elements of the experiment, and fulfill this task statement:
Given randomly-generated $C$, generate samples $z_1, \ldots, z_T$ whose probabilities under $D_C$ are on average noticeably above $1/N$.
It’s important to note that there are two stages of randomness: one over which $C$ is chosen, and then the stage of measuring after (perhaps imperfectly) executing $C$. The latter can be repeated to get a large sample of strings for a given $C$. The nature of the former stage matters most to justifying how to interpret tests of the samples and to closing loopholes. Our classical alternative does not signify having uniform distribution in the latter sampling, but rather covers alternatives in the former stage that (with overwhelming probability) belong to a class we call $\mathsf{R}$. The $D_C$ for random $C$ will (again w.o.p.) belong to a class $\mathsf{Q}$ which we explain next.
In honor of the baseball World Series, we offer a baseball analogy. To make differences sharper to see, we take $s = N$. This is not what the experiment does: their biggest instance has 20 layers totaling over a thousand nondeterministic single-qubit gates (plus two-qubit gates) on the 53 qubits. But let us continue.
We are distributing $s$ units of probability among $N$ “batters” $z$. A batter who gets two units hits a double, three units makes a triple, and so on. The key distinction is between the familiar batting average and the slugging average, which averages all the bases scored with hits:
Thus with respect to a random choice of batters, and without any knowledge of $C$, a chosen team of hitters cannot expect to have a joint slugging average higher than $1$. Moreover, for any fixed $b > 1$, the chance of getting a slugging average higher than $b$ tails away exponentially in the team size (provided $s$ also grows).
With respect to $D_C$, however, a quantum device can do better. Google’s device programs itself given $C$ as the blueprint. So it just executes $C$ and measures all qubits to sample the output. Finding its own heavy hitters is what a quantum circuit is good at. The probability of getting a hitter who hits a triple is magnified by a factor of $3$ compared to a uniform choice. Moreover, $C$ will never output a string with zero hits—a “can’t miss” property denied to a classical reader of $C$. For large $N$ the probability distribution approaches the exponential (Porter–Thomas) distribution and the slugging expectation is approximately $2$.
That is, a team drafted by sampling from random quantum circuits expects to have a slugging average near $2$. This defines the class $\mathsf{Q}$. If $C$ works perfectly, the average will surpass $b$ whenever $b < 2$, with near certainty as the sample grows.
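A toy simulation (our own sketch, not the paper's code) shows the gap between the two kinds of samplers; the exponential Porter–Thomas shape for the ideal $p_z$ is an assumption justified only in the large-$N$ limit:

```python
import random

random.seed(0)
N, T = 4096, 20000

# Model the ideal probabilities p_z as i.i.d. exponentials (Porter-Thomas
# shape), normalized to form a distribution.
w = [random.expovariate(1.0) for _ in range(N)]
s = sum(w)
p = [x / s for x in w]

# Sampling from p itself (the "quantum" sampler) versus sampling uniformly
# (a classical sampler ignorant of the circuit).
zq = random.choices(range(N), weights=p, k=T)
zu = [random.randrange(N) for _ in range(T)]

slug_quantum = sum(N * p[z] for z in zq) / T   # near 2
slug_uniform = sum(N * p[z] for z in zu) / T   # near 1
print(round(slug_quantum, 2), round(slug_uniform, 2))
```

The sampler that knows the distribution is size-biased toward its own heavy hitters, which is exactly the slugging-average advantage in the analogy.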
Google’s circuits have up to $n = 53$ qubits, so $N = 2^{53} \approx 9 \times 10^{15}$. Then the “can’t miss” aspect of the quantum advantage is less sharp, but the approximation to the exponential distribution is closer and the idea of $\mathsf{Q}$ is the same. The nature of the distribution can actually be seen from point intensities in speckle patterns of laser light:
The practical challenge is that the implementation of $C$ is not perfect. The consequence of an error in the final output is severe. The heavy-hitter outputs of a random $C$ are generally not bit-wise similar, so sampling their neighbors is like sampling the uniform distribution. As the paper says, “A single bit or phase flip over the course of the algorithm will completely shuffle the speckle pattern and result in close to zero fidelity.”
Their circuits are sufficiently random that effects of sporadic errors over millions of samples can be modeled by a simple equation using quantum mixed states. We shortcut the paper’s physical analysis by drawing on John Preskill’s illustration of a depolarizing channel in chapter 3 of his wonderful online notes on quantum computation to reach the same equation (1). The modeling has informative symmetry when the errors of a bit flip, phase flip, or both are considered equally likely with probability $p$. The action on an entangled pair in the Bell basis is given by the density matrix evolution
$$\rho \mapsto \rho' = (1-p)\rho + p\,\frac{I}{4},$$
where $I/4$ is the density matrix of the completely mixed two-qubit state, which is just a classical distribution. Note that $I/4$ completely mixes the Bell basis already. The fidelity of $\rho'$ to the original state is then $F = (1-p) + p/4$.
This modeling already indicates that with $g$ serial opportunities for error the fidelity will decay roughly as $(1-p)^g$. The Google team found low ‘crosstalk’ between qubits, and they used exactly this expression in the form $F = \prod_g (1 - e_g)$, evidently with the $e_g$ being the native per-gate error rates they measure.
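Plugging in round figures of the kind reported for the device (our reading of the paper's numbers, so treat them as approximate: about 0.16% per single-qubit gate, 0.62% per two-qubit gate, 3.8% per readout, with roughly 1,113, 430, and 53 of each), the multiplicative model lands at a fidelity of a few tenths of a percent:

```python
# Multiplicative fidelity model F = prod(1 - e_g): each gate, and each final
# readout, independently preserves fidelity with probability 1 - e.
# Error rates and counts are approximate figures for the experiment.
e1, e2, er = 0.0016, 0.0062, 0.038   # single-qubit, two-qubit, readout errors
g1, g2, nq = 1113, 430, 53           # gate counts and qubit count

F = (1 - e1)**g1 * (1 - e2)**g2 * (1 - er)**nq
print(round(F, 4))  # -> 0.0015, i.e. a few tenths of a percent
```

The striking point is how small this is: the whole supremacy argument runs through detecting a signal of this tiny magnitude statistically over millions of samples.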
The error for the two-qubit gates is similarly represented. (The full modeling in the supplement, section V, is more refined.)
By observing their benchmarks (discussed below) for varying small instances, they could calculate the decay concretely and hence estimate values of the fidelity $\phi$ for the vast majority of runs at larger sizes. The random nature of the circuits evidently makes covariance of errors that could systematically upset this modeling negligible. Thus they can conclude that their device effectively samples from the distribution
$$\phi D_C + (1 - \phi)U,$$
where $U$ is the uniform distribution. Such distributions can be said to belong to a class $\mathsf{Q}_\phi$. The paper reports that their $\phi$ is driven quite small in trials but stays separated from zero. This bounds the range of the $b$ they can separate by. That $\phi$ is separated from zero achieves the first plank and starts on the second. The third needs attention first, however.
Both concrete and asymptotic complexity evidence matter for the third plank, the former for now and the latter for how $n$ and everything else may scale up in the future. In asymptotic complexity, we still don’t know that $\mathsf{P}$ and $\mathsf{PSPACE}$, which sandwich the quantum feasible class $\mathsf{BQP}$, are different. Thus asymptotic evidence about polynomial bounds must be conditional. Asymptotic evidence about linear time bounds can be sharper but then tends to be conditioned on forms of SETH in ways we still find puzzling.
Lower bounds in concrete complexity are less known and have a self-defeating aspect: we are trying to say that any program run for less than an infeasible time $t$ must fail. But we can’t run a candidate program for time $t$ to show that it fails, because time $t$ is by definition infeasible. The best we can do is run for a feasible time $t' \ll t$, either (i) on a smaller task size, or (ii) on the original task while arguing that it doesn’t show progress. Neither is the same; we made some attempts on (ii).
What the paper does instead is argue that a particular classical approach (also from the Aaronson-Chen paper) would take 10,000 years on today’s hardware. This reminds us of a famous 1977 “Mathematical Games” column by Martin Gardner, which quotes an estimate by Ron Rivest that for factoring a 129-digit number on then-current hardware, “the running time required would be about 40 quadrillion years!” It took only until 1994 for this to be broken. Sure enough, IBM calculated that a more-clever classical implementation on the Summit supercomputer would take under 3 days. The point is not so much that the Summit hardware is comparable as that estimates based on what are currently thought to be the best possible (classical) methods need asterisks.
On the asymptotic side, the last section (XI) of the paper’s 66-page supplement proves a theorem toward showing that a classical simulation that scales polynomially with $n$ would collapse the polynomial hierarchy, and similarly for sub-exponential running times. It does not get all the way there, however: improvements would need to be made in upper bounds for approximation and for worst-case to average-case equivalence. [Added 10/31/19: see this new paper by Scott A. and Sam Gunn.] Moreover, there is a difference from what their statistical testing achieves that we try to explain next.
We can cast the second plank in the general context of predictive modeling. Consider a forecaster who places estimates $q_z$ on the true probabilities $p_z$ of various events. Here we need to compute the probabilities of output strings observed from the physical device, using the given circuit $C$ and the estimate of $\phi$. This must be done classically, and incurs the classical-cost issue discussed above.
But before we get to that issue, let’s say more from the viewpoint of predictive modeling. We measure how well the forecasts $q_z$ conform to the true $p_z$ by applying a prediction scoring rule. If outcome $z$ happens, then the log-likelihood rule assesses a penalty of $\ln(1/q_z)$.
This is zero if the outcome was predicted with certainty but goes to infinity if the individual $q_z$ is very low—which is an issue in the quantum case. The expected score based on the true probabilities is
$$\sum_z p_z \ln\frac{1}{q_z}. \qquad (2)$$
The log-likelihood rule is strictly proper insofar as the unique way to minimize (2) is to set $q_z = p_z$ for each $z$. In human contexts this means the model has incentive to be as accurate as possible.
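A tiny numerical check of strict propriety, on toy distributions of our own choosing:

```python
import math

def cross_entropy(p, q):
    """Expected log-loss sum_z p_z * ln(1/q_z) of forecast q against truth p."""
    return sum(pi * math.log(1 / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
honest = cross_entropy(p, p)               # the entropy of p itself
skewed = cross_entropy(p, [0.8, 0.1, 0.1])

# Strict propriety: any forecast other than q = p scores strictly worse.
print(honest < skewed)  # True
```

The same Gibbs-inequality fact is what lets the estimated score below stand in for a direct comparison of the two distributions.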
The formula (2) is the cross-entropy from the $p$ distribution to the $q$ distribution. Before we can use it, we need to ask a question:
What is forecasting what? Is the device the imperfect model, and do the “true $p_z$” come from the analysis of $C$ giving $D_C$?
This is how it appeared to us and seems from other writing, but we can argue the opposite from first principles: the physical device is the “ground truth” however it works. The assertion that it is executing a blueprint $C$ with some estimated loss in fidelity is really the model. Then it follows that the modeled distribution $\phi D_C + (1-\phi)U$ is analogous to $q$, not $p$. Since we can compute its values $q_z$, we can calculate the penalties in (2).
Saying this leaves “$p_z$” in (2) as denoting the device’s true probabilities of giving the output strings $z$. These are not directly observable: it is infeasible to sample the device often enough to expect a sufficiently large number of repeated occurrences based on the “birthday” threshold. Thus there appears to be no way to estimate individual values $p_z$ in (2), but this doesn’t matter: the very act of sampling the device carries out the “$p_z$” weighting in (2). Summing $\ln(1/q_z)$ over the $z$ that occur in a large but feasible number of trials gives estimates of (2) that are close enough to make the needed distinctions with high confidence. We can then match the estimate against the theoretical expectation computed assuming accurate knowledge of $\phi$. By the scoring function being strictly proper, this match entails achieving $q \approx p$ with sufficient approximation. This property of the goal mitigates some of the modeling issue.
This issue was clarified by reading Ryan O’Donnell’s 2018 notes on quantum supremacy, which preview this same experiment. The above view on which is “forecaster”/”forecastee” might defend the team against his opinion that it “kind of gets its stats backwards”—but the inability to compute cross-entropy from the blueprints’ distribution to that of the device remains an issue. What the team did instead, however, is shift to something simpler they call “linear cross-entropy.” They simply show that the probabilities of their samples collectively beat the $1/N$ that applies to uniform guessing—more simply put, that when summed over $T$-many trials $z_t$,
$$\frac{1}{T}\sum_{t=1}^{T} N\,p(z_t) > 1.$$
This just boils down to giving a z-score based on the modeling for $\phi$. It is analogous to how I (Ken writing this) test for cheating at chess. We are blowing a whistle to say the physical device is getting surreptitious input from quantum mechanics to achieve a strength of $1 + \phi$ compared to a “classical player” who is “rated” as having strength $1$.
The difference from showing that the device’s score from (2) is within a hair of the ideal is that the test is based on the linear rule rather than the logarithmic one. To be sure, the paper shows that their z-scores conform to those one would expect a device with fidelity $\phi$ to achieve. But this is still not the same as (2). Whether it is tantamount for enough purposes—including the collapse theorem—is where we’re most unsure, and we note distinctions between fully (classically) sampling and “spoofing” the statistical test(s) raised by Scott (including directly in reply to me here) and others. The authors say that using “linear cross-entropy” gave sharper results and that they tried other (unspecified) measures. We wonder how much of the space of scoring rules familiar in predictive modeling has been tried, and whether rules having more gentle tail behavior for tiny $q_z$ than $\ln(1/q_z)$ might do better.
Finally, there is the issue that the team were able to verify $\phi$ exactly only for circuits with fewer qubits and/or fewer levels than the full experiment. This creates a dilemma in that IBM’s paper may push them toward larger sizes, but that increases the gap from instance sizes they can verify. This also pushes away from the possibility of observing the nature of the distribution more directly by finding repeated strings in the second-stage sampling of a fixed $C$. The “birthday paradox” threshold for repeats is roughly $\sqrt{N} = 2^{n/2}$ samples, which might be feasible for moderate $n$ (given the classical work needed for each $C$, which IBM’s cleverness might speed) but not at the top end. The distinguishing power of repeats also drops as $\phi$ shrinks. We intend to say more about these last few points, and we are sure there are many chapters still to write about supremacy experiments.
Is the evidence so far convincing to you? Is enough being done on the third plank to exclude possible clever classical use of the fact that the circuits are given as “white boxes”? Are there possible loopholes?
We would also be grateful to know where we may have oversimplified our characterization of the task and our analysis of the issues.
[Added more error-modeling details to the real-world section; some minor word changes; clarified how X,Y,W are chosen; addendum to clarify modeling issues; 10/31/19: removed addendum after blending it into a revision of the last main section—original version preserved here—and linked new Aaronson-Gunn paper.]
[ Harvard ] |
Harry Lewis is known for his research in mathematical logic, and for his wonderful contributions to teaching. He had two students that you may have heard of before: a Bill Gates and a Mark Zuckerberg.
Today I wish to talk about a recent request from Harry about a book that he is editing.
The book is the “Classic Papers of CS” based on a course that he has been teaching for years. It will contain 46 papers with short introductions by Harry. My paper from 1977 with Alan Perlis and Rich DeMillo will be included. The paper is “Social Processes and Proofs of Theorems and Programs”.
Harry says that “A valued colleague believes this paper displays such polemical overreach that it should not appear in this collection”. I hope that it does still appear anyway. Harry goes on to say
And though verification techniques are widely used today for hardware designs, formal verification of large software systems is still a rarity.
Indeed.
I have mixed feelings about our paper, which is now getting close to fifty years old. I believe we had some good points to make then, and that these are still relevant today. Our paper starts with:
Many people have argued that computer programming should strive to become more like mathematics. Maybe so, but not in the way they seem to think.
Our point was just this: proofs in mathematics are not just formal arguments that show that a theorem is correct. They are much more. They must show why and how something is true. They must explain and extend our understanding, doing more than just demonstrating that something is correct.
They must also make it clear what they claim to prove. A difficulty we felt then was that care must be given to stating what one is claiming to prove. In mathematics, what is being proved is often simple to state. In practice that is less clear. A long, complex statement may not correctly capture what one is trying to prove.
Who proves that the specification is correct?
I have often wondered why some do not see this point: that proofs are more than “correctness checks.” I thought I would list some “proofs” of this point.
The great Carl Gauss gave the first proof of the law of quadratic reciprocity. He later published six more proofs, and two more were found in his posthumous papers. There are now over two hundred published proofs.
So much for saying that a proof is just a check.
Thomas Hales solved the Kepler conjecture on sphere packing in three-dimensional Euclidean space. He faced some comments that his proof might not be certain—the referees were said to be only “99 percent” sure. So he used formal methods to get a formal proof. But
Maryna Viazovska solved the related problem in eight dimensions. Her proof is here. The excitement of this packing result is striking compared with Hales’s result. No need for correctness checks in her proof.
Henry Cohn says here:
One measure of the complexity of a proof is how long it takes the community to digest it. By this standard, Viazovska’s proof is remarkably simple. It was understood by a number of people within a few days of her arXiv posting, and within a week it led to further progress: Abhinav Kumar, Stephen Miller, Danylo Radchenko, and I worked with Viazovska to adapt her methods to prove that the Leech lattice is an optimal sphere packing in twenty-four dimensions. This is the only other case above three dimensions in which the sphere packing problem has been solved.
So, a mark of a great proof is that it helps create new proofs of something else. Okay, that is a nasty way to say it. What I mean is that a great proof is one that enables new insights, that enables further progress, that advances the field. Not just a result that “checks” for correctness.
The famous ABC conjecture of Joseph Oesterlé and David Masser has been claimed proved by Shinichi Mochizuki. Arguments continue about his proof. Peter Scholze and Jakob Stix believe his proof is flawed and unfixable. Mochizuki claims they are wrong.
Will a formal proof resolve this impasse? Perhaps not. A proof that explains why it is true might. A proof that advances number theory elsewhere might; a proof that could solve other problems likely would.
What do you think about the role of proofs? Did we miss the point years ago?
Will formal verification become effective in the near future? And when it does, will it help provide explanations? We note this recent discussion of a presentation by Kevin Buzzard of Imperial College, London, and a one-day workshop on “The Mechanization of Math” which took place two weeks ago in New York City.
[Typo fixed]
[ Royal Society ] |
Sir Timothy Gowers is a Fields medalist and fellow blogger. Sometimes he (too) writes about simple topics.
Today I would like to talk about a simple problem that came up recently.
The problem is a simple-to-state “obvious fact.” The reason I thought you might be interested is that I had a tough time finding the solution. I hope you find the explanation below interesting.
The general issue of proving obviously true statements is discussed here for example. Here is an example from Gowers:
Let $I_1, I_2, \ldots, I_n$ be intervals of real numbers with lengths that sum to less than $1$. Then their union cannot be all of $[0, 1]$.
He says:
It is quite common for people to think this statement is more obvious than it actually is. (The "proof" is this: just translate the intervals so that the end point of $I_1$ is the beginning point of $I_2$, and so on, and that will clearly maximize the length of interval you can cover. The problem is that this argument works just as well in the rationals, where the conclusion is false.)
The following simple problem came up the other day. Suppose that $n$ is an odd number. Show there is some $x$ so that

$\gcd(x, n) = 1 \text{ and } \gcd(x + 2, n) = 1.$

Here $\gcd(x, n)$ is the gcd, the greatest common divisor of $x$ and $n$. For example, $\gcd(12, 18) = 6$.
This result seems totally obvious, must be true, but I had trouble finding a proof.
There is an unproved conjecture in number theory, the twin primes conjecture, that says: there are an infinite number of $x$ so that both $x$ and $x + 2$ are prime. This clearly shows that there is an $x$ for our problem: just take such a pair with $x$ larger than $n$.
I like conjectures like this since they give you an immediate insight that a statement is likely to be true. But we would like a proof that does not use any unproved conjectures. Our problem can be viewed as a poor version of some of these conjectures. Suppose that you have a conjecture that there are an infinite number of $x$ so that

$f_1(x), \ldots, f_k(x)$

are all prime for some given functions $f_1, \ldots, f_k$. Then the poor version is to prove that there are $x$ so that these numbers are all relatively prime to some given $n$. There are some partial results to the prime version by Ben Green and Terence Tao.
My initial idea was to try to set $x$ to something like $x = kn + 1$. The point is that this always satisfies the first constraint: that is, $\gcd(kn + 1, n) = 1$ for any $k$. Then I planned to try and show there must be some $k$ that satisfies the second constraint. Thus the goal is to prove there is some $k$ so that

$\gcd(kn + 3, n) = 1.$

But this is false. Note that if $3$ divides $n$ then

$kn + 3 \equiv 0 \pmod{3},$

and so $x + 2$ is always divisible by $3$. Ooops.
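A quick numerical illustration (a throwaway Python check, not from the original argument): with an attempt of the shape $x = kn + 1$, the first gcd constraint always holds, but whenever $3$ divides $n$ the second constraint fails for every choice of $k$.

```python
# Illustrate why x = kn + 1 fails: x + 2 = kn + 3 is divisible by 3
# whenever 3 divides n, no matter which k we pick.
import math

for n in [9, 15, 21, 33]:            # odd multiples of 3
    for k in [1, 2, 5]:
        x = k * n + 1
        assert math.gcd(x, n) == 1           # first constraint always holds
        assert math.gcd(x + 2, n) % 3 == 0   # second fails: 3 divides the gcd
```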
My next idea was to set $x$ to a more "clever" value. I tried

$x = kn + p.$

Here I thought that I could make $p$ special and control the situation. Now

$\gcd(x, n) = \gcd(kn + p, n) = \gcd(p, n).$

This looked promising. I then said to myself: why not make $p$ a large prime? Then clearly

$\gcd(x, n) = 1.$

Since $p + 2$ and $n$ would be relatively prime, by the famous Dirichlet's Theorem on arithmetic progressions we could make $x + 2 = kn + (p + 2)$ a prime too by selecting $k$. This would satisfy the second constraint, and we are done.

Not quite. The trouble is that for Dirichlet's Theorem we need to have also that

$\gcd(p + 2, n) = 1.$

Now this is

$\gcd(x + 2, n) = \gcd(p + 2, n).$

The trouble is that $p + 2$ might not be relatively prime to $n$. So we could just pick another prime $p$ and try again. This seems like a recursion and I realized that it might not work.
I finally found a solution thanks to Delta Airlines. My dear wife, Kathryn Farley, and I were stuck in DC for several hours waiting for our delayed flight home. This time was needed for me to find a solution.
The key for me was to think about the value of $n$. It is usually a good idea to look at the simplest case first. So suppose that $n = 1$; then clearly the constraints

$\gcd(x, 1) = 1 \text{ and } \gcd(x + 2, 1) = 1$

are now trivial. The next simplest case seems to be when $n$ is a prime. Let's try $n = 3$. Now $x = 2$ works. Let's generalize this to any prime $p$. The trick is to set $x$ so that

$x \equiv -1 \pmod{p}.$

Then $x + 2$ is equal to $1$ modulo $p$, which is not divisible by $p$. This shows that when $n$ is an odd prime there is always some $x$.
Okay, how do we get the full result? What if $n$ is the product of several primes? The Chinese remainder theorem to the rescue. Suppose that $n$ is divisible by the distinct odd primes $p_1, \ldots, p_k$. We can easily see that we do not care if there are repeated factors, since that cannot change the relatively-prime constraints.
Then we constrain $x$ by:

$x \equiv -1 \pmod{p_i},$

and hence

$x + 2 \equiv 1 \pmod{p_i},$

for all the primes $p_i$. Then the Chinese remainder theorem proves there is some such $x$. Done.
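The construction fits in a few lines of Python (a sketch under our own naming, not from the original post): constrain $x \equiv -1 \pmod{p}$ for each distinct odd prime $p$ dividing $n$, then combine the congruences with the Chinese remainder theorem.

```python
# Sketch of the CRT construction: x ≡ -1 mod p for each distinct odd
# prime p | n gives gcd(x, n) = gcd(x + 2, n) = 1 for odd n.
import math

def odd_prime_factors(n):
    """Distinct odd prime factors of n, by trial division."""
    ps, d = set(), 3
    while n % 2 == 0:
        n //= 2
    while d * d <= n:
        while n % d == 0:
            ps.add(d)
            n //= d
        d += 2
    if n > 1:
        ps.add(n)
    return sorted(ps)

def crt_x(n):
    """Return x with gcd(x, n) = gcd(x + 2, n) = 1, for odd n > 1."""
    x, mod = 0, 1
    for p in odd_prime_factors(n):
        # Merge x ≡ -1 (mod p) into the current solution x (mod mod).
        inv = pow(mod, -1, p)               # modular inverse, Python 3.8+
        x = x + mod * (((-1 - x) * inv) % p)
        mod *= p
    return x % mod

# Check the two gcd constraints for a few odd n.
for n in [3, 9, 15, 105, 9999]:
    x = crt_x(n)
    assert math.gcd(x, n) == 1 and math.gcd(x + 2, n) == 1
```

Note that `pow(mod, -1, p)` (Python 3.8 and later) computes the modular inverse needed for the CRT merging step.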
Is there some one-line proof of the problem? Do you know any references? There are several obvious generalizations of this simple problem; perhaps someone might look into them.
Cracking a Diophantine problem for 42 too
Andrew Booker is a mathematician at the University of Bristol who works in analytic number theory. For example, he has a paper extending a result of Alan Turing on the Riemann zeta function. Yes, our Turing.
Today Ken and I will talk about his successful search for a solution to a 64-year-old problem.
He was inspired by a video on the search problem authored by Tim Browning and Brady Haran. The question was to find a solution to

$x^3 + y^3 + z^3 = 33.$
Booker found

$(8866128975287528)^3 + (-8778405442862239)^3 + (-2736111468807040)^3 = 33.$
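The digits, as reported for Booker's solution, are easy to verify thanks to Python's arbitrary-precision integers:

```python
# Direct check of Booker's solution for 33.
x, y, z = 8866128975287528, -8778405442862239, -2736111468807040
assert x**3 + y**3 + z**3 == 33
```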
The search was over all possible solutions with $\min(|x|, |y|, |z|)$ bounded by $10^{16}$. Note that this is expensive, and is not even close to polynomial time in the number of bits. But it is feasible today thanks to modern technology:
The total computation used was approximately core-years over one month of real time.
Before we turn to our discussion, note that Booker's paper extending Turing is really a result on proof checking. Turing had great intuition; it is terrible that we lost him so early. He, Turing, essentially proved the first result ever on how to efficiently check a computation. Booker says:
Turing introduced a method for certifying the completeness of a purported list of zeros of $\zeta(s)$ that is guaranteed to work (when the list is in fact complete). Turing's method has remained a small but essential ingredient in all subsequent verifications of RH and its many generalizations.
That is, checking the zeros of the Riemann zeta function $\zeta(s)$.
Speaking of checking: when I was drafting this I initially had a wrong solution written down. Can you quickly see how such an error can be caught? Answer at the end.
The press love Booker’s result. Not the one on the zeta function, the one on the number $33$.
Part of the excitement is caused by the number $33$ itself. In complexity theory we rarely see explicit numbers; we are more likely to see asymptotic expressions, and worse.
The press seem to like the numerology of $33$. The number is quite neat. Most important is the connection with Rolling Rock beer, whose labels famously carry the number $33$.
The press from Newsweek and other sites talked about Booker’s solution. See here and here. And see here at Quanta magazine, which has a great diagram.
One said:
To crunch the numbers, he then used a cluster of powerful computers – 512 central processing unit (CPU) cores at the same time – known as Blue Crystal Phase 3. When he returned to his office one morning after dropping his children off at school, he spotted the solution on his screen. “I jumped for joy,” he recalled.
Another reported,
Booker said: “This one’s right at the boundary between what we know how to prove and what we suspect might be undecidable.”
I hope we will get the same coverage for our big results.
The press love Booker’s result. Not the one on the zeta function, the one on the number $42$. This search was jointly led by Andrew Sutherland of MIT.
Part of the excitement is caused by the number $42$ itself. In complexity theory we rarely see explicit numbers; we are more likely to see asymptotic expressions, and worse.
The press seem to like the numerology of $42$. The number is quite neat. Most important is the connection with The Hitchhiker’s Guide to the Galaxy, where $42$ is famously the “Answer to the Ultimate Question of Life, the Universe, and Everything.”
The press from New Scientist and other sites talked about Booker’s solution. See here and here. But Quanta magazine seems not to have mentioned the number at all in over three months.
One said:
Of course, it wasn’t simple. The pair had to go large, so they enlisted the aid of the Charity Engine, an initiative that spans the globe, harnessing unused computing power from over 500,000 home PCs to act as a sort of “planetary supercomputer.”
It took over a million hours of computing time, but the two mathematicians found their solution:

$(-80538738812075974)^3 + (80435758145817515)^3 + (12602123297335631)^3 = 42.$
Another reported:
Booker said: “I feel relieved … we might find what we’re looking for with a few months of searching, or it might be that the solution isn’t found for another century.”
I hope we will get the same coverage for our big results.
Booker and Sutherland also discovered that

$(569936821221962380720)^3 + (-569936821113563493509)^3 + (-472715493453327032)^3 = 3.$

This is the next-largest solution for $3$ after $1^3 + 1^3 + 1^3$ and $4^3 + 4^3 + (-5)^3$. Weird. And the first solution not to duplicate a number. And it uses two numbers that agree to markedly more decimal places than those in the above solutions for $33$ and $42$. Weirder.
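Both displayed solutions are easy to check directly. The snippet below (digits as reported by Booker and Sutherland) also counts how many leading decimal digits the two large numbers in the solution for $3$ share:

```python
# Check the solution for 42 and the new third solution for 3.
a, b, c = -80538738812075974, 80435758145817515, 12602123297335631
assert a**3 + b**3 + c**3 == 42

u, v, w = 569936821221962380720, -569936821113563493509, -472715493453327032
assert u**3 + v**3 + w**3 == 3

# Count the leading decimal digits shared by |u| and |v|.
s, t = str(abs(u)), str(abs(v))
shared = 0
while shared < min(len(s), len(t)) and s[shared] == t[shared]:
    shared += 1
print("leading digits in common:", shared)
```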
Booker wanted to search for a solution to

$x^3 + y^3 + z^3 = k.$

Actually his main interest was in $k = 33$, but his method is general. How does one do this for $x, y, z$ bounded by $B$? The obvious method is: try all numbers below $B$.
This is too expensive: it requires $B^3$ time. Too much, even with a cluster of fast processors.
An improvement is to try all $x, y$ in the range and then check that $k - x^3 - y^3$ is a cube. This runs in $B^2$ time. Still too much.
A key insight is to re-write the equation as

$k - z^3 = x^3 + y^3 = (x + y)(x^2 - xy + y^2).$
Then we note that $x + y$ must be a divisor of $k - z^3$. Since there are few such divisors, we can improve the time greatly. For each divisor $d$ of $k - z^3$, some simple algebra and the quadratic formula show that

$x = \frac{d}{2} + \frac{1}{2}\sqrt{\frac{4(k - z^3)/d - d^2}{3}}$

and

$y = d - x.$
This shows that the search is now reduced to roughly $B^{1+\epsilon}$ time. Still too much, but close to doable. The next trick is to avoid the factoring step. See Booker’s paper for the rest of the search description.
I like the progression of time bounds from

$B^3 \text{ to } B^2 \text{ to } B^{1+\epsilon}.$
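Here is a small-scale Python sketch of the divisor trick (an illustration with our own naming, not Booker's implementation, which also avoids factoring and adds many further optimizations):

```python
# Toy search for x^3 + y^3 + z^3 = k using the divisor trick:
# d = x + y must divide N = k - z^3, and then x, y come from the
# quadratic formula. Only practical for tiny bounds B.
import math

def small_divisors(n):
    """All positive divisors of |n| by trial division (fine for small n)."""
    n = abs(n)
    divs = []
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            divs.append(d)
            if d != n // d:
                divs.append(n // d)
    return divs

def three_cubes(k, B):
    """Find (x, y, z) with x^3 + y^3 + z^3 = k and |z| <= B, or None."""
    for z in range(-B, B + 1):
        N = k - z**3
        if N == 0:
            continue                  # z^3 = k exactly; skip this easy case
        for d in small_divisors(N):
            for s in (d, -d):         # candidate sum s = x + y, s | N
                m = N // s            # then x^2 - x*y + y^2 = m, need m > 0
                if m <= 0:
                    continue
                disc = 4 * m - s * s  # x = s/2 + sqrt((4m - s^2)/3)/2
                if disc < 0 or disc % 3:
                    continue
                r = math.isqrt(disc // 3)
                if r * r != disc // 3 or (s + r) % 2:
                    continue
                x = (s + r) // 2
                y = s - x
                if x**3 + y**3 + z**3 == k:
                    return (x, y, z)
    return None
```

For example, `three_cubes(29, 5)` instantly finds a representation of $29$; the real difficulty is that the interesting values of $k$ need bounds like $B = 10^{16}$.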
Can one beat $B$? Could there be an algorithm that runs in $B^{1-\delta}$ for some $\delta > 0$? Can any of our tricks apply here? A possible observation: Booker is clever, but he writes that the methods use not

$O(B)$

time, but that they use

$O(B \log\log B)$

time. Maybe we can help in some manner. What do you think? The next unsolved number, $114$, awaits.
§
Answer to the question on checking: reduce both sides of the claimed equation modulo a small number; the two sides of the wrong solution then disagree.