Peter Landweber, Emanuel Lazar, and Neel Patel are mathematicians. I have never worked with Peter Landweber, but have written papers with Larry and Laura Landweber. Perhaps I can add Peter one day.
Today I want to report on a recent result on the fiber structure of continuous maps.
The paper by Landweber, Lazar, and Patel (LLP) is titled, “On The Fiber Diameter Of Continuous Maps.” Pardon me, but I assume that some of you may not be familiar with the fiber of a map. Fiber has nothing to do with the content of food or diets, for example. Fibers are a basic property of a map.
Their title does not give away any suggestion that their result is relevant to those studying data sets. Indeed even their full abstract only says at the end:
Applications to data analysis are considered.
I just became aware of their result from reading a recent Math Monthly issue. The paper has a number of interesting results—all with some connection to data analytics. I must add that I had not seen it earlier because of a recent move and the resulting gap in getting US mail. Moves are disruptive—Bob Floyd used to tell me that “two moves equal a fire”—and I’ve just moved twice. Oh well.
The fiber of a map f at y is the set of points x such that f(x) = y. The diameter of a fiber is just what you would expect: the maximum distance between points in the fiber. LLP prove this—they say they have a “surprisingly short proof” and give earlier sources for it at the end of their paper:
Theorem: Let f : R^n → R^m be a continuous function where m < n. Then for any r > 0, there exists y in R^m whose fiber has diameter greater than r.
The following figure from their paper conveys the essence of the proof in the case n = 2, m = 1:
For larger m one might expect a difficult dimension-based argument. However, they leverage whatever difficult reasoning went into the following theorem by Karol Borsuk and Stanislaw Ulam. We have mentioned both of them multiple times on this blog but never this theorem:
Theorem: Let g be any continuous function from the m-sphere S^m to R^m. Then there are antipodal points that give the same value, i.e., some x on the sphere such that g(x) = g(-x).
The proof then simply observes that m-spheres of radius r live inside R^n for any n > m, and for arbitrarily large r. The antipodal points x and -x belong to the same fiber of f but are 2r apart.
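For m = 1 the proof is effectively an algorithm. Here is a minimal Python sketch under illustrative assumptions (the sample function and radius are mine, not from the paper): bisection on h(t) = f(x_t) - f(-x_t) around a circle of radius r locates antipodal points in the same fiber, 2r apart.

```python
import math

def antipodal_fiber_pair(f, r, tol=1e-12):
    # g restricts f to the circle of radius r; h(t) = g(t) - g(t + pi)
    # satisfies h(t + pi) = -h(t), so h changes sign on [0, pi] and
    # bisection finds t with f(x_t) = f(-x_t).
    g = lambda t: f(r * math.cos(t), r * math.sin(t))
    h = lambda t: g(t) - g(t + math.pi)
    a, b = 0.0, math.pi
    if h(a) == 0.0:
        b = a
    while b - a > tol:
        mid = (a + b) / 2
        if (h(a) > 0) == (h(mid) > 0):
            a = mid
        else:
            b = mid
    t = (a + b) / 2
    x = (r * math.cos(t), r * math.sin(t))
    return x, (-x[0], -x[1])  # (nearly) the same fiber, distance 2r
```

For example, with f(x, y) = x + 0.3*y*y and r = 5, this returns points near (0, 5) and (0, -5), which f maps to the same value even though they are 10 apart.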
Why should we care about this theorem? That’s a good question.
One of the main ideas in analytics is to reduce the dimension of a set of data. If we let the data lie in a Euclidean space, say R^n, then we may wish to map the data down to a space R^m of lower dimension. This yields lots of obvious advantages—the crux is that we can do many computational things on lower-dimensional data that would be too expensive on the original n-dimensional space.
The LLP result shows that no matter what the mapping is, as long as it is continuous, there must be points that are far apart in the original space and yet get mapped to exactly the same point in the lower space. This is somewhat annoying: clearly it means there will always be points that the map does not classify correctly.
One of the issues raised by this work of LLP is that areas like big data draw people working from many angles, and we do not always see results from another area as related to our own work. I believe that many people in analytics are probably surprised by this result and had not known about it previously. This phenomenon seems to be getting worse as more researchers work on similar areas but come at the problems with different viewpoints.
Can we do a better job at linking different areas of research? Finally, with respect, this seems like a result that could have been proved decades ago. Perhaps one of the great consequences of new areas like big data is to raise questions that were not thought about previously.
[fixed typo R^m, corrected picture of Landweber, added note on sources for main theorem]
Noam Chomsky is famous for many many things. He has had a lot to say over his long career, and he wrote over 100 books on topics from linguistics to war and politics.
Today I focus on work that he pioneered sixty years ago.
Yes, sixty years ago. The work is usually called the Chomsky hierarchy (CH) and is a hierarchy of classes of formal grammars. It was described by Noam Chomsky in 1956, driven by his interest in linguistics, not war and politics. Some add Marcel-Paul Schützenberger’s name to the hierarchy. He played a crucial role in the early development of the theory of formal languages—see his joint paper with Chomsky from 1962.
We probably all know about this hierarchy. Recall grammars define languages:

- Type-3: regular grammars, whose languages are recognized by finite automata.
- Type-2: context-free grammars, recognized by pushdown automata.
- Type-1: context-sensitive grammars, recognized by linear bounded automata.
- Type-0: unrestricted grammars, recognized by Turing machines.
One neat thing about this hierarchy is that it has long been known to be strict: each class is more powerful than the previous one. Each proof that the next class is more powerful is really a beautiful result. Do you know, offhand, how to prove each one?
I have a simple question:
Should we still teach the CH today?
Before discussing it, let me explain a bit about grammars.
In the 1950’s people started to define various programming languages. It quickly became clear that if they wanted to be precise they needed some formal method to define their languages. The formalism of context-free grammars of Noam Chomsky was well suited for at least defining the syntax of their languages—semantics were left to “English,” but at least the syntax would be well defined.
Another milestone in the late 1950s was the publication, by a committee of American and European computer scientists, of “a new language for algorithms”: the ALGOL 60 Report (the “ALGOrithmic Language”). This report consolidated many ideas circulating at the time and featured several key language innovations. Perhaps one of the most useful was a mathematically exact notation, Backus-Naur Form (BNF), that was used to describe the grammar. It is no more expressive than context-free grammars, but it is more user friendly, and variants of it are still used today.
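To see why a precise grammar is so useful, here is a small illustration (a toy grammar of my own, not ALGOL’s): a BNF definition translates almost mechanically into a recursive-descent recognizer.

```python
# Toy BNF:
#   <expr> ::= <term> { "+" <term> }
#   <term> ::= <digit> | "(" <expr> ")"
# Each nonterminal becomes one function returning the next index.

def parse_expr(s, i=0):
    i = parse_term(s, i)
    while i < len(s) and s[i] == '+':
        i = parse_term(s, i + 1)
    return i

def parse_term(s, i):
    if i < len(s) and s[i].isdigit():
        return i + 1
    if i < len(s) and s[i] == '(':
        i = parse_expr(s, i + 1)
        if i < len(s) and s[i] == ')':
            return i + 1
    raise SyntaxError(f"bad term at position {i}")

def accepts(s):
    # the string is legal iff parsing consumes all of it
    try:
        return parse_expr(s) == len(s)
    except SyntaxError:
        return False
```

With a formal grammar in hand, “what is a legal program?” has a definite answer—exactly the question Ullman wanted settled for TeX.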
I must add a story about the power of defining the syntax of a language precisely. Jeff Ullman moved from Princeton to Stanford in 1979. I must thank him, since his senior position was the one that I received in 1980. Jeff was already a prolific writer of textbooks then and used an old method from Bell Labs, troff, to write his books. On arrival at Stanford he told me that he wanted to try out the then-new system that Don Knuth had just created in 1978—of course that was the TeX system. Jeff tried the system out and liked it. But then he asked for the formal syntax description, since he wanted to be sure what the TeX language was. He asked and the answer from Knuth was:
There is no formal description. None.
Jeff was shocked. After all Knuth had done seminal work on context-free grammars and was well versed in formal grammars—for example Knuth invented the LR parser (Left to Right, Rightmost derivation). TeX was at the time only defined by what Knuth’s program accepted as legal.
Let’s return to my question: Should we still teach the CH today?
It is beautiful work. I especially think the connection between context-free languages and pushdown automata is wonderful, non-obvious, and quite useful. Context-free languages and pushdown automata led to Steve Cook’s beautiful work on two-way deterministic pushdown automata (2DPDA). He showed they could be simulated in linear time on a random-access machine.
This insight was utilized by Knuth to find a linear-time solution for the left-to-right pattern-matching problem, which can easily be expressed as a 2DPDA:
This was the first time in Knuth’s experience that automata theory had taught him how to solve a real programming problem better than he could solve it before.
The work was finally written up and published together with Vaughan Pratt and James Morris several years later.
And of course context sensitive languages led to the LBA problem. This really was the question whether nondeterministic space is closed under complement. See our discussion here.
Should I teach the old CH material? Or leave it out and teach more modern results? What do you think? Do results have a “teach-by date”?
[Fixed paper link]
Non-technical fact-check source
Dan Brown is the bestselling author of the novel The Da Vinci Code. His most recent bestseller, published in 2013, is Inferno. Like two of his earlier blockbusters it has been made into a movie. It stars Tom Hanks and Felicity Jones and is slated for release on October 28.
Today I want to talk about a curious aspect of the book Inferno, since it raises an interesting mathematical question.
Brown’s books are famous for their themes: cryptography, keys, symbols, codes, and conspiracy theories. The first four of these have a distinctive flavor of our field. Although we avoid the last in our work, it is easy to think of possible conspiracies that involve computational theory. How about these: certain groups already can factor large numbers, certain groups have real quantum computers, certain groups have trapdoors in cryptocurrencies, or …
The book has been out for awhile, but I only tried to read it the other day. It was tough to finish so I jumped to the end where the “secret” was exposed. Brown’s works have sold countless copies and yet have been attacked as being poorly written. He must be doing something very right. His prose may not be magical—whose is?—but his plots and the use of his themes usually make for a terrific “cannot put down” book.
Well I put it down. But I must be the exception. If you haven’t read the book and wish to do so without “spoilers” then you can put down this column.
Inferno is about the release of a powerful virus that changes the world. Before I go into the mathematical issues this virus raises, I must point out that Brown’s work has often been criticized for making scientific errors and overstepping the bounds of “plausible suspension of disbelief.” I think it is a great honor—really—that so many posts and discussions are around mistakes that he has made. Clearly there is huge interest in his books.
Examples of such criticism of Inferno have addressed the DNA science involved, the kind of virus used, the hows of genetic engineering and virus detection, and the population projections, some of which we get into below. There is also an entire book about Brown’s novel, Secrets of Inferno.
However, none of these seems to address a simple point that we hadn’t found anywhere, until Ken noticed it raised here on the often-helpful Fourmilab site maintained by the popular science writer John Walker. It appears when you click “Show Spoilers” on that page, so again you may stop reading if you don’t wish to know.
How does the virus work? The goal of the virus is to stop population explosion.
The book hints that it is airborne, so we may assume that everyone in the world is infected by it—all women in particular. Brown says that 1/3 are made infertile. There are two ways to think about this statement. It depends on the exact definition of the mechanism causing infertility.
The first way is that when you get infected by the virus a coin is flipped and with probability 1/3 you are unable to have children. That is, when the virus attacks your original DNA there is a 1/3 chance the altered genes render you infertile. In the 2/3-case that the virus embeds in a way that does not cause infertility, that gets passed on to children and there is no further effect. In the 1/3-case that the alteration causes infertility, that property too gets passed on. Except, that is, for the issue in this famous quote:
Having Children Is Hereditary: If Your Parents Didn’t Have Any, Then You Probably Won’t Either.
Thus the effect “dies out” almost immediately; it would necessarily be just one-shot on the current generation.
The second way is that the virus allows the initial receiver to be fertile but has its effect when (female) children are born. In one third of cases the woman becomes infertile, and otherwise is able to have children when she grows up.
In this case the effect seems to work as claimed in the book. Children all get the virus and it keeps flipping coins forever. Walker still isn’t sure—we won’t reveal here the words he hides but you can find them. In any event, the point remains that this would become a much more complex virus. And Brown does not explain this point in his book—at least I am unsure if he even sees the necessary distinctions.
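To make the distinction concrete, here is a toy expected-value projection in Python (the numbers c = 2.0 and start = 90000 are illustrative assumptions, not from the book):

```python
def project(mechanism, c=2.0, generations=6, start=90000.0):
    # Both readings sterilize 1/3 of the current generation of women;
    # c is the average number of daughters per fertile woman.
    fertile = start * 2/3
    sizes = [start]
    for _ in range(generations):
        daughters = fertile * c
        sizes.append(daughters)
        if mechanism == 1:
            fertile = daughters          # reading 1: one-shot, daughters unaffected
        else:
            fertile = daughters * 2/3    # reading 2: the coin is flipped anew each birth
    return sizes
```

Under reading 1 the population dips once and then resumes growing by the factor c; under reading 2 every generation is multiplied by 2c/3, which with c = 2 still grows but with c below 1.5 crashes.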
The other discussions focus on issues like how society would react to this reduction in fertility. Except for part of one we noted above, however, none seems to address the novel’s mathematical presumptions.
The purpose of the virus is to reduce the growth rate in the world’s population. By how much is not clear in the book. The over-arching issue is that it is hard to find conditions under which the projection of the effect is stable.
For example, suppose we can divide time into discrete units of generations so that the world population of women after t generations follows the exponential growth curve P(t) = P0·c^t. Ignoring the natural rate of infertility and male-female imbalance and other factors for simplicity, this envisions women having c female children on average. The intent seems to be to replace this with women having (2/3)c female children each. This means multiplying c by 2/3, so
P'(t) = P0·(2c/3)^t
becomes the new curve. The problem is that this tends to zero unless c ≥ 3/2, whereas the estimates of c that you can get from tables such as this are uniformly lower at least since 2000.
The point is that the blunt “1/3” factor of the virus is thinking only in such simplistic terms about “exponential growth”—yet in the same terms there is no region of stability. Either growth remains exponential or humanity crashes. Maybe the latter possibility is implicit in the dark allusions to Dante Alighieri’s Inferno that permeate the plot.
In reality, as our source points out, it would not take much for humanity to compensate. If a generation is 30 years and we are missing 33% of women, then what’s needed is for just over 3% of the remaining women to change their minds about not having a child in any given year. We don’t want to trivialize the effect of infertility, but there is much more to adaptability than the book’s tenet presumes.
Have you read the book? What do you think about the math?
Some CS reflections for our 700th post
MacArthur Fellowship source
Lin-Manuel Miranda is both the composer and lyricist of the phenomenal Broadway musical Hamilton. A segment of Act I covers the friendship between Alexander Hamilton and Gilbert du Motier, the Marquis de Lafayette. This presages the French co-operation in the 1781 Battle of Yorktown, after which the British forces played the ballad “The World Turned Upside Down” as they surrendered. The musical’s track by the same name has different words and melodies.
Today we discuss some aspects of computing that seem turned upside down from when we first learned and taught them.
Yesterday was halfway between our Fourth of July and France’s Bastille Day, and was also the last day of Miranda performing the lead on-stage with the original Hamilton company. They are making recordings of yesterday’s two performances, to be aired at least in part later this year. A month ago, Miranda wrote an op-ed in the New York Times against the illegal (in New York) but prevalent use of “bots” to snap up tickets the moment they become available for later marked-up resale.
This is also the 700th post on this blog. It took until 1920 for a Broadway show of any kind to reach 700 performances. The Playbill list of “Long Runs on Broadway” includes any show with 800 or more performances. That mark is within our reach, and our ticket prices will remain eminently reasonable.
This list is just what strikes us now—far from exhaustive—and we invite our readers to add opinions about examples in comments.
Forty-five years ago, Dick Karp showed how the difficulty of SAT represented by NP-completeness spreads to other natural problems. As the number of complete problems from many areas of science and operations research soared into the thousands by the 1979 publication of the book Computers and Intractability, people regarded NP-completeness as tantamount to intractability.
Today the flow is in the other direction—as expressed for instance in this talk by Moshe Vardi. Dick Karp himself has been among many in the vanguard—I remember his talk on practical solvability of Hitting-Set problems at the 2008 LiptonFest and here is a relevant paper. We now reduce problems to SAT in order to solve them. SAT-solvers that work in many cases are big business. In some whole areas the SAT-encodings of major problems are well-behaved, as we remarked about rank-aggregation and voting theory in the third section of this post from last October. The solvers can even tackle huge problems. Marijn Heule, Oliver Kullmann, and Victor Marek proved that every 2-coloring of the interval [1, 7825] has a monochromatic Pythagorean triple, in a proof of over 200 terabytes in uncompressed length.
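The flavor of the Pythagorean-triples result can be tried at toy scale, with plain backtracking standing in for a SAT solver (at 7825 such a search is hopeless, which is exactly why the SAT encoding and a real solver were needed):

```python
def find_coloring(n):
    # All Pythagorean triples a^2 + b^2 = c^2 with entries <= n.
    triples = [(a, b, c) for a in range(1, n + 1)
               for b in range(a, n + 1)
               for c in range(b, n + 1) if a*a + b*b == c*c]
    col = [None] * (n + 1)

    def consistent(i):
        # Check only triples whose largest entry c is assigned already.
        for (a, b, c) in triples:
            if c <= i and col[a] == col[b] == col[c]:
                return False
        return True

    def solve(i):
        if i > n:
            return True
        for v in (0, 1):
            col[i] = v
            if consistent(i) and solve(i + 1):
                return True
        col[i] = None
        return False

    # A 2-coloring of {1,...,n} with no monochromatic triple, or None.
    return col[1:] if solve(1) else None
```

For small n such a coloring exists; the Heule-Kullmann-Marek theorem says it first fails at n = 7825.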
Quadratic time is notionally on the low end of polynomial time, and “polynomial time” has long been used as a synonym for “easy.” But as the amount of data we can and need to handle has mushroomed, the difference in scaling between quasi-linear and quadratic is more and more felt. This difference has even been argued for cryptographic security. A particular definition of quasi-linear time is time n(log n)^{O(1)}, as named by Claus Schnorr for his theorem on quasi-linear completeness of SAT; see also this.
In genomics the quadratic time of algorithms for full edit-distance measures is felt enough to warrant approximative methods, as we covered in our memorial a year ago for Alberto Apostolico. This also puts meaning behind theoretical evidence that the quadratic time for edit distance cannot be improved.
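For reference, the quadratic algorithm in question is the classic dynamic program for edit distance, sketched here with a two-row table; the Θ(mn) table-fill is what becomes painful at genomic scale:

```python
def edit_distance(a, b):
    # Levenshtein distance via the standard dynamic program,
    # keeping only the previous row of the (m+1) x (n+1) table.
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,       # delete a[i-1]
                         cur[j - 1] + 1,    # insert b[j-1]
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitute
        prev = cur
    return prev[n]
```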
These two items seem to contradict each other, but point up a difference in scale between data and logical control. Often a thousand data points are nothing. A formula with a thousand clauses can say a lot.
My first doctoral student had been working on neural networks before I became his advisor in 1991, and I remember the feeling of their being under a cloud. The so-called AI Winter traced in part to lower bounds shown against certain shallow neural nets in the 1969 book Perceptrons by Marvin Minsky and Seymour Papert. We discussed complexity aspects of this in our memorial of Minsky last January.
Since then what has emerged is that composing a bunch of these nets, as in a convolutional neural network (CNN), is both feasible and algorithmically effective. The recent breakthrough on playing Go is just a headline among many emerging applications of CNNs and larger systems. We are not saying neural nets and deep learning are the be-all or anything more than a “cartoon” of the brain, but rather noting them among many reasons that AI and machine learning are resurgent.
The same AI-winter article on Wikipedia mentions the collapse of Lisp-dedicated systems in 1987, and more widely, many companies devoted to data-parallel architectures “left nothing but their logos on coffee mugs” as a colleague once put it. Subsequently I perceived signs of stagnation in functional languages in the late 1990s and early 00s. This lent a ghostly air to John Backus’s famous 1978 Turing Award lecture, “Can Programming Be Liberated From the von Neumann Style?”
Unlike the revenant in last year’s award-winning movie of that name, this one has come back with a different body. Not a large-scale dedicated machine system, but rather the pan-spectral pervasion we call the Cloud. A great lecture we heard by Mike Franklin on Amplab activities highlighted the role of programs written in the functional language Scala running on the Apache Spark framework.
A common thread in all these items is the combined efficacy and scalability of algorithmic primitives whose abstract forms characterize quasi-linear time: sorting, parallel prefix sum (as one of several forms of map-reduce), convolution, streaming count-sketching, and the like.
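As one concrete instance of these primitives, here is a sketch of the work-efficient parallel prefix sum written out sequentially; the up-sweep/down-sweep tree pattern is what yields linear work and logarithmic depth on a parallel machine (input length assumed a power of two for simplicity):

```python
def prefix_sum(a):
    # Blelloch-style exclusive scan: t[i] = a[0] + ... + a[i-1].
    n = len(a)  # assumed a power of two
    t = list(a)
    # up-sweep: build partial sums in a binary-tree pattern
    d = 1
    while d < n:
        for i in range(2*d - 1, n, 2*d):
            t[i] += t[i - d]
        d *= 2
    # down-sweep: push sums back down to get exclusive prefixes
    t[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2*d - 1, n, 2*d):
            t[i - d], t[i] = t[i], t[i] + t[i - d]
        d //= 2
    return t
```

Each inner loop’s iterations are independent, which is why the whole computation parallelizes to O(log n) rounds.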
We considered mentioning some subjects that have seen changes such as digital privacy and block ciphers, but maybe these are not so “upside-down.” Doubtless we are missing many more. What developments in computing have carried shock on the order of the discovery that neutrinos have mass in particle physics? We invite your suggestions and opinions.
Here also is a web folder of photos from Dick’s wedding and honeymoon.
[added link to Vardi on SAT solving]
Richard Lipton is, among so many other things, a newlywed. He and Kathryn Farley were married on June 4th in Atlanta. The wedding was attended by family and friends including many faculty from Georgia Tech, some from around the country, and even one of Dick’s former students coming from Greece. Their engagement was noted here last St. Patrick’s Day, and Kathryn was previously mentioned in a relevantly-titled post on cryptography.
Today we congratulate him and Kathryn, and as part of our tribute, revisit a paper of his on factoring from 1994.
They have just come back from their honeymoon in Paris. Paris is many wonderful things: a flashpoint of history, a center of culture, a city for lovers. It is also the setting for most of Dan Brown’s novel The Da Vinci Code and numerous other conspiracy-minded thrillers. Their honeymoon was postponed by an event that could be a plot device in these novels: the Seine was flooded enough to close the Louvre and Musée D’Orsay and other landmarks until stored treasures could be brought to safe higher ground.
It is fun to read or imagine stories of cabals seeking to collapse world systems and achieve domination. Sometimes these stories turn on scientific technical advances, even purely mathematical points as in Brown’s new novel, Inferno. It needs a pinch to realize that we as theorists often verge on some of these points. Computational complexity theory as we know it is asymptotic and topical, so it is a stretch to think that papers such as the present one impact the daily work of those guarding the security of international commerce or investigating possible threats. But from its bird’s-eye view there is always the potential to catch a new glint of light reflected from the combinatorial depths that could not be perceived until the sun and stars align right. In this quest we take a spade to dig up old ideas anew.
Pei’s Pyramid of the Louvre Court = Phi delves out your prime factor… (source)
The paper is written in standard mathematical style: first a theorem statement with hypotheses, next a series of lemmas, and the final algorithm and its analysis coming at the very end. We will reverse the presentation by beginning with the algorithm and treating the final result as a mystery to be decoded.
Here is the code of the algorithm. It all fits on one sheet and is self-contained; no abstruse mathematics text or Rosetta Stone is needed to decipher it. The legend says that the input N is a product of two prime numbers, f is a polynomial in just one variable, and gcd refers to the greatest-common-divisor algorithm expounded by Euclid around 300 B.C. Then come the runes, which could not be simpler:
Exiting the loop yields the two prime factors of N—but a final message warns of a curse of vast unknowable consequences.
How many iterations must one expect to make through this maze before exit? How and when can the choice of the polynomial f speed up the exploration? That is the mystery.
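In modern notation the loop can be sketched as follows; this is a paraphrase under my reading of the paper, not its literal code (f is any univariate integer polynomial):

```python
import math
import random

def try_factor(N, f, max_trials=10**6, seed=0):
    # Repeat: draw a random x mod N and hope f(x) shares exactly one
    # prime factor with N = p*q; Euclid's gcd then exposes it.
    rng = random.Random(seed)
    for _ in range(max_trials):
        x = rng.randrange(1, N)
        g = math.gcd(f(x) % N, N)
        if 1 < g < N:
            return g, N // g
    return None
```

With the trivial polynomial f(x) = x this is just gcd sampling; the paper’s question is how far a cleverer f—one with many roots—can push up the per-trial success probability.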
Our goal is to expose the innards of how the paper works, so that its edifice resembles another famous modern Paris landmark:
This is the Georges Pompidou Centre, whose anagram “go count degree prime ops” well covers the elements of the paper. Part of the work for this post—in particular the possibility of improving to —is by my newest student at Buffalo, Chaowen Guan.
Let N = pq with p and q prime. To get the expected running time, it suffices to have good lower and upper bounds

α_p ≤ Pr_x[p | f(x)] ≤ β_p

and analogous bounds α_q, β_q for q. Then the probability of success on any trial is at least

α_p (1 - β_q).

This lower-bounds the probability of the event p | f(x) and q ∤ f(x), whereupon gcd(f(x) mod N, N) gives us the factor p.
We could add a term for the other way to have success, which is q | f(x) and p ∤ f(x). However, our strategy will be to make β_q, and hence Pr_x[q | f(x)], close to 0 by considering cases where p is much smaller than q but α_p is still large enough to matter. Then we can ignore this second possibility and focus on α_p. At the end we will consider relaxing this just so that 1 - β_q is bounded away from 0.
Note that we cannot consider the events p | f(x) and q | f(x) to be independent, even though p and q are prime, because the x with gcd(x, N) > 1 may introduce bias. We could incidentally insert an initial test for gcd(x, N) > 1 without affecting the time or improving the success probability by much. Then conditioned on its failure, the events p | f(x) and q | f(x) become independent via the Chinese Remainder Theorem. This fact is irrelevant to the algorithm but helps motivate the analysis in part.
This first analysis thus focuses the question to become:
How does computing f(x) change the sampling?
We mention in passing that Peter Shor’s algorithm basically shows that composing certain (non-polynomial) functions into the quantum Fourier transform greatly improves the success of the sampling. This requires, however, a special kind of machine that, according to some of its principal conceivers, harnesses the power of multiple universes. There are books and even a movie about such machines, but none have been built yet and this is not a Dan Brown novel so we’ll stay classically rooted.
Two great facts about polynomials of degree d are:

1. Over any field, a nonzero polynomial of degree d has at most d roots.
2. For any integers a and b, (a - b) divides f(a) - f(b).
The second requires the coefficients of to be integers. Neither requires all the roots to be integers, but we will begin by assuming this is the case. Take to be the set of integer roots of . Then define
where as usual . The key point is that
To prove this, suppose the random belongs to but not to . Then for some , so , but since . There is still the possibility that is a nonzero multiple of , which would give and deny success, but this entails and so this is accounted by subtracting off .
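Here is a quick numerical illustration of this counting, using a hypothetical cubic with integer root set R = {2, 7, 11}: the x in [1, M] with p | f(x) are exactly those congruent mod p to one of the roots, about |R|·M/p of them when the roots stay distinct mod p.

```python
def f(x):
    # toy polynomial with integer root set R = {2, 7, 11}
    return (x - 2) * (x - 7) * (x - 11)

p, M = 13, 10000
# count x in [1, M] hitting a root residue of f mod p
hits = sum(1 for x in range(1, M + 1) if f(x) % p == 0)
# the three roots remain distinct mod 13, so hits ~ 3 * M / 13
```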
Our lower bound will be based on . There is one more important element of the analysis. We do not have to bound the running time for all , that is, all pairs of primes. The security of factoring being hard is needed for almost all . Hence to challenge this, it suffices to show that is large in average case over . Thus we are estimating the distributional complexity of a randomized algorithm. These are two separate components of the analysis. We will show:
For many primes belonging to a large set of primes of length substantially below , where is the length of , is “large.”
We will quantify “large” at the end, and it will follow that since is substantially greater, is “tiny” in the needed sense. Now we are ready to estimate the key cardinality .
In the best case, can be larger than by a factor of . This happens if for every root , the values do not hit any other members of . When this happens, itself can be as large as . Then
By a similar token, for any , if , then —or in general,
The factor of is the lever by which to gain a higher likelihood of quick success. When will it be at our disposal? It depends on whether is “good” in the sense that and also on itself being large enough.
For each root and define the “strand” where There are always distinct values in any strand. If then every strand has most as non-roots. There is still the possibility that —that is, such that —which would prevent a successful exit. This is where really comes in, attending to the upper bound .
The Paris church of St. Sulpice and its crypt (source)
Hence what can make a prime p “bad” is having a low number of strands. When r and r' are roots with r ≡ r' (mod p), the strands of r and r' coincide—and this happens for any other root r' such that p divides r - r'.
Here is where we hit the last important requirement on . Suppose where is the product of every prime other than . Then and coincide for every prime . It doesn’t matter that is astronomically bigger than or ; the strands still coincide within and within .
Hence what we need to do is bound the roots by some value that is greater than any we are likely to encounter. The is not too great: if we limit to of some same given length as that of , then so . We need not impose the requirement but must replace above by where . We can’t get in trouble from such that divides and divides since then divides already. This allows the key observation:
For any distinct pair of roots r, r' in R, there are at most log_2(2B) primes p such that p divides r - r'.
Thus given we have “slots” for primes . Every bad prime must occupy a certain number of these slots. Counting these involves the last main ingredient in Dick’s paper. We again try to view it a different way.
Given , and replacing the original with , ultimately we want to call a prime bad if , where . We will approach this by calling “bad” if there are strands.
For intuition, let’s suppose , . If we take as the paper does, then we can make bad by inserting it into three slots: say , , and . We could instead insert a copy of into , , and , which lumps into one strand and leaves free to make two others. In the latter case, however, we also know by transitivity that divides , , and as well. Thus we have effectively used up not slots on . Now suppose instead, so “bad” means getting down to strands. Then we are forced to create at least one -clique and this means using more than slots. Combinatorially the problem we are facing is:
Cover nodes by cliques while minimizing the total number of edges.
This problem has an easy answer: make all cliques as small as possible. Supposing is an integer, this means making -many -cliques, which (ignoring the difference between and ) totals edges. When is constant this is , but shows possible ways to improve when is not constant. We conclude:
Lemma 1 The number of bad primes is at most .
We will constrain by bounding the degree of . By this will also bound relative to so that the number of possible strands is small with respect to , which will lead to the desired bound on . Now we are able to conclude the analysis well enough to state a result.
Define to mean problems solvable on a fraction of inputs by randomized algorithms with expected time . Superscripting means having an oracle to compute from for free. If is such that the time to compute is and this is done once per trial, then a given algorithm can be re-classified into without the oracle notation.
Theorem 2 Suppose is a sequence of polynomials in of degrees having integer roots in the interval , for all . Then for any fixed , the problem of factoring -bit integers with belongs to
provided and .
Proof: We first note that the probability of a random making is negligible. By the Chinese Remainder Theorem, as remarked above, gives independent draws and and whether depends only on . This induces a polynomial over the field of degree (at most) . So the probability of getting a root mod is at most
which is exponentially vanishing. Thus we may ignore in the rest of the analysis. The chance of a randomly sampled in a strand of length coinciding with another member of is likewise bounded by and hence ignorable.
The reason why the probability of giving a root in the field is not vanishing is that is close to . By , we satisfy the constraint
The condition ensures that this is the actual asymptotic order of . Since we are limiting attention to primes of the same length as , the “” above can be to base . Hence has the right order to give that for some and constant fraction of primes of length , the success probability of one trial over satisfies
Hence the expected number of trials is . The extra in the theorem statement is the time for each iteration, i.e., for arithmetic modulo and the Euclidean algorithm.
It follows that if is also computable modulo in time, and presuming so that , then factoring products of primes whose lengths differ by just a hair is in randomized average-case polynomial time. Of course this depends on the availability of a suitable polynomial . But could be any polynomial—it needs no relation to factoring other than having plenty of distinct roots relative to its degree as itemized above. Hence there might be a lot of scope for such “dangerous” polynomials to exist.
Is there a supercomputer under the Palais Royale? (source)
Dick’s paper does give an example where a with specified properties cannot exist, but there is still a lot of play in the bounds above. This emboldens us also to ask exactly how big the “hair” needs to be. We do not actually need to send toward zero. If a constant fraction of the values get bounced by the event, then the expected time just goes up by the same constant factor.
We have tried to present Dick’s paper in an “open” manner that encourages variations of its underlying enigma. We have also optically improved the result by using rather than as in the paper. However, this may be implicit anyway since the paper’s proofs might not require “” to be constant, so that by taking one can make for any desired factor . Is all of this correct?
If so, then possibly one can come even closer to for the length of . Then the question shifts to the possibilities of finding suitable polynomials . The paper “Few Product Gates But Many Zeroes,” by Bernd Borchert, Pierre McKenzie, and Klaus Reinhardt, goes into such issues. This paper investigates “gems”—that is, integer polynomials of degree having distinct integer roots and minimum possible circuit complexity for their degree—finding some for as high as 55 but notably leaving open. Moreover, the role of a limitation on the magnitude of a constant fraction of a gem’s roots remains at issue, along with roots exceeding having many relatively small prime factors.
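To illustrate the flavor of such polynomials (our own toy example, not one of the paper's gems): reusing a single squaring yields many distinct integer roots from few multiplications.

```python
# Toy illustration (not an optimal "gem" from the paper): reusing the
# single square y = x*x, the polynomial (y-1)(y-4)(y-9) needs only three
# multiplication gates yet has 6 distinct integer roots: -3..-1 and 1..3.
def f(x):
    y = x * x                            # 1 product gate
    return (y - 1) * (y - 4) * (y - 9)   # 2 more product gates

roots = [x for x in range(-10, 11) if f(x) == 0]
print(roots)   # [-3, -2, -1, 1, 2, 3]
```

The ratio of distinct roots to product gates is the kind of quantity the gem-hunting papers try to maximize.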
Finally, we address the general case with rational coefficients (and ). If in lowest terms then means (divided by ) so the algorithm is the same. Suppose is a rational root in lowest terms and does not divide , nor the denominator of any Then we can take such that for some and define . This gives
which we write as . Then . Because is a root, it follows that is a sum of terms in which each numerator is a multiple of and each denominator is not. So in lowest terms where possibly . Thus either yields or falls into one of two cases we already know how to count: is another root of or we have found a root mod . Since behaves the same as for all , we can define integer “strands” as before. There remains the possibility that strands induced by two roots and coincide. Take the inverse for and resulting integer , then the strands coincide if . This happens iff . Multiplying both sides by gives
so it follows that divides the numerator of in lowest terms. Thus we again have “slots” for each distinct pair of rational roots and each possible prime divisor of the numerator of their difference. Essentially the same counting argument shows that a “bad” must fill of such slots. The other ways can be bad include dividing the denominator of a root or the denominator of a coefficient —although neither way is mentioned in the paper it seems the choices for and in the above theorem leave just enough headroom. Then we just need to be a bound on all numerators and denominators involved in the bad cases, arguing as before. Last, it seems as above that only a subset of the roots with constant (or at least non-negligible) is needed to obey this bound. Assuming this sketch of Dick’s full argument is airtight and works for our improved result, we leave its possible further ramifications over the integer case as a further open problem.
Update 7/10:
I’ve made a web folder of photos from Dick’s wedding and honeymoon.
[linked wedding announcement, clarified nature of Shor’s phi, added more about gems, linked photos]
Anna Gilbert and Atri Rudra are top theorists who are well known for their work in unraveling secrets of computation. They are experts on anything to do with coding theory—see this for a book draft by Atri with Venkatesan Guruswami and Madhu Sudan called Essential Coding Theory. They also do great theory research involving not only linear algebra but also much non-linear algebra of continuous functions and approximative numerical methods.
Today we want to focus on a recent piece of research they have done that is different from their usual work: It contains no proofs, no conjectures, nor even any mathematical symbols.
Their new working paper is titled, “Teaching Theory in the time of Data Science/Big Data.” As you might guess it is about the role of theory in the education of computer scientists today. The paper contains much information that they have collected on what is being taught at some of the top departments in computer science, and how the current immense interest in Big Data is affecting classic theory courses.
A short overview of what they find is:
The above is leading to pressure to delete and/or modify theory courses. From Atri’s CS viewpoint and Anna’s as Mathematics faculty active in the theory community, both wish to see CS majors obtain degrees that leave them well versed in CS in general and theory in particular. Undergraduates in programs with a CS component should likewise be well served in formal and mathematical areas. Is this possible given the finite constraints on the curriculums? It is not clear, but their paper shows what is happening right now with theory courses (plus linear algebra and probability/statistics), what is being planned for the near future, and some options that may be useful to consider.
For the purpose of this post, we made some edits to their text which follows, with their permission. Some changes were stylistic and some more content-oriented. Their PDF version linked as above may evolve over time—especially upon success of their appeal for reader input at the end. So to obtain a complete and current picture please visit their paper too.
Now Anna and Atri speak:
The genesis of this article is a conversation between the two authors that started six weeks ago. One of us (Anna) was giving a talk at an NSF workshop on Theoretical Foundations of Data Science (TFoDS) and the other (Atri) was thinking about changes to the Computer Science (henceforth CS) curriculum that his department at the University at Buffalo is considering. Anna’s talk at NSF, which included data on theory courses at top ranked schools, generated a great deal of interest in knowing even more about the state of theory courses. This was followed by more data collection on our part.
This post is meant as a starting point of discussion on how we teach theory courses, especially in the light of the increased importance of data science. It is not a position paper—it does not argue that the current trends are inherently good or bad, nor does it prescribe any silver bullet. We do suggest some possible courses of action around which discussion can begin.
CS enrollments as well as the numbers of CS majors have increased exponentially in the last few years. In 2014, Ed Lazowska, Eric Roberts, and Jim Kurose exhibited the trend in the former, not only majors. Their graphs in Figure 1 show the trend in introductory CS course enrollments at six institutions in the years 2006–2014.
Figure 1. Enrollment trends in introductory CS sequences at six institutions (Stanford, MIT, University of Pennsylvania, Harvard, University of Michigan, and University of Washington) from 2006–2014.
Lazowska’s presentation has more detailed statistics and a discussion of the potential implications of these increases. These trends remain valid in 2016, for example as shown by the following chart for the University at Buffalo. In addition to total number of CSE majors, it shows the enrollment in CSE 115 (the introduction to CSE course), CSE 191 (Discrete Math), CSE 250 (Data Structures), CSE 331 (Algorithms) and CSE 396 (Theory of Computation), all of which are required of all CS majors:
Figure 2. Enrollment trends, University at Buffalo CSE 8/08–5/16, with total majors.
As enrollments out-pace hiring, class sizes have exploded. Lazowska points out that over 10% of Princeton’s majors are CS majors, while it is highly unlikely that 10% of Princeton’s faculty will ever be CS faculty. At the same time, many institutions are re-evaluating and changing their theoretical computer science (henceforth TCS) course requirements and content.
The twin pressures of staffing and content are shifting priorities in both the material covered and how it is covered—e.g., reducing emphasis on proofs and essay-type problems which are harder to grade. We are not judging these shifts or tying them directly to enrollments, but are for now observing that they are happening and impact a large (and increasing) number of students.
The changes in course content, in emphasis on particular TCS components, and in overall CS requirements (including mathematics and statistics) are occurring exactly when there is a big move towards “computational thinking” in many fields and a national emphasis on STEM education more broadly. Not only are the fundamental backgrounds of incoming CS majors thereby changing, but the CS audience is expanding to students in other fields that are benefiting from solid computational foundations. With the increasing role of data and concomitant needs for machine learning and statistics, it is important to obtain a deep understanding of the mathematical foundations of data science. Traditional TCS has been founded on discrete mathematics, but “continuous” math—especially as related to statistics, probability, and linear algebra—is increasingly important in ways also reflected by cutting-edge TCS research.
We considered the top 20 CS schools according to the US News ranking of graduate programs, numbering 24 including ties. It may be inappropriate to use the graduate program rankings to consider undergraduate program requirements, and it should be noted that the rankings cover all of the graduate program, not just TCS, but this is a reasonable starting point. We sent colleagues a short survey and collected data (available as a spreadsheet) on these 24 schools. Since several include Engineering in one department, as at Buffalo, or in a separate department, as at Michigan, we will use “CSE” as the collective term.
We counted the total number of theory courses that all CS majors have to take within the CSE department and then calculated the fraction over the total number of required courses. We categorized the theory courses under these bins:
The bounds are not sharp—a Data Structures course always covers algorithms associated to the data structures and may overlap with an Algorithms course especially when graphs are covered—and Algorithms often includes some complexity theory, especially NP-completeness. In our spreadsheet these columns are followed by the number of theory electives—besides these required courses—that all CS majors have to take. We would like to clarify four things:
We begin with statistics on the total number of semesters of theory courses that are currently required of all CS majors, standardly equating 3 quarters or trimesters to 2 semesters. The basic statistics are in Table 1.
The median number of semester-long courses was three. All but one school requires a discrete math course, all but two require a Data Structures course, and all but nine require an Algorithms course. Eight schools require a Theory of Computation course separate from Algorithms. All these schools have a significant programming component in their Data Structures course. Only one, Cornell, currently adds programming assignments in the required algorithms course. We would like to remind the reader that we are only considering TCS courses required of all CS majors—for instance, CS 124/125 at Harvard has programming assignments but is not required of all CS majors.
We limited attention to cases where courses in Probability/Statistics and/or Linear Algebra are required of all CS majors but taught outside of CSE. We focus on these two courses since they are most relevant to data science.
Probability/Statistics. Of those surveyed, nineteen schools required a Probability/Statistics course, while five did not. Five had developed a specific required course within the CSE department (Stanford, Berkeley, UIUC, Univ. of Washington, and MIT), three had choices among courses both inside and outside the CSE department, and eleven required a course outside CSE. Of the five institutions that did not require a Probability/Statistics course, two (Univ. of Wisconsin and Harvard) listed such a course among electives in Mathematics. Princeton, Yale, and Brown do not list such a course.
Linear Algebra. Sixteen surveyed schools require a Linear Algebra course, out of 24 total. Of the 16, only Brown and Columbia provide a linear algebra course within CSE that satisfies the requirement, though both allow for non-CSE linear algebra courses.
After reflecting on the data in relation to our initial observations about increasing CS enrollments and emphasis on computational thinking across disciplines, we dug deeper and asked people further questions about changes they have seen or are discussing at their institutions. Of eight departments responding (as of 6/10/16):
Four universities changed their Mathematics requirements in the last 10 years. These changes are primarily to require fewer semesters of Calculus II or III (e.g., some no longer require Ordinary Differential Equations) and, instead, require Linear Algebra and/or Probability/Statistics (whether inside the CSE department or not). Two institutions plan to make changes in the future, likely to require Linear Algebra.
We suggest that now is the time to re-think some of the theory curriculum, to work with our colleagues in Mathematics and Statistics, and to develop mathematical foundations classes that are appropriate both for CS majors and STEM majors more broadly. Especially for CS majors, this exposure should come no later than junior year. Here are some starting points for this discussion.
Our goal is to educate the different students at our respective institutions as best we can, by working with our colleagues at our home institutes and by having a dialogue with our theory colleagues across the country.
After sending emails initially to friends in our social networks to gather data and/or supplement the above preliminary analysis, we noted that we had asked only three women total. We then mused on how we could have increased that number by thinking a bit harder about which women were in our social network and whether the institutions we collected figures for had women theorists. We found that, upon reflection, we could have asked eight more women in our social networks, for a total of 11 women theorists, each at a different school, among the top 24 institutions. There are certainly more than 11 institutions with women theorists but either the women faculty are in areas we are not familiar with or they are women in our areas whom we do not yet know personally (e.g., new, junior faculty). In other words, a ten-minute reflection yielded an almost four-fold increase in representatives from an under-represented group.
We recognize that our sample covers only 24 top institutions. This was done mostly to reduce work on our part since the first data was collected by reading the relevant curricula webpages. Needless to say, a better picture of TCS and math requirements for CS degrees in schools in the US can be gained with more data. We are hoping that readers of this blog at many more institutions can make valuable contributions to our data collection and discussion. Those of you interested can contribute your institution’s information to this survey by filling in a Google form. We will periodically update the master spreadsheet with information that we get from this Google form.
We join Anna and Atri in their appeal which ends their paper: the destiny of theory courses can be considered as one large “open problem.” They conclude by thanking those who have already contributed data, and others at Michigan, Buffalo, and Georgia Tech (besides us) and MIT for inputs to their article.
We have a few remarks of our own: The main ulterior purpose of theory courses is to sharpen analytical modes of thinking and linear deductive argument, among skills often lumped into the general term “mathematical maturity.” The Internet and advances in technology have brought greater and quicker rewards for non-linear, associative, and more-visual modes. These might seem to compete with or even replace “theory,” but the point behind Anna and Atri’s post is that while diffused among more courses in various areas, the need for analytical and linear-deductive experience grows overall.
What emerges is a greater call for mathematical maturity before capstone courses in these areas, as opposed to the view that a required theory course can be taken in the senior year. Shifting TCS material into an early discrete mathematics course may accomplish this. As we have discussed in Buffalo, this could accompany an across-the-board upgrade in rigor of our entry curriculum, but that may discourage some types of students. That in turn might slow increased enrollments—amid several feedback loops whose consequences are an open problem.
[clarified in Buffalo figure that “Total” means majors.]
Ernie Croot, Vsevolod Lev, and Péter Pach (CLP) found a new application of polynomials last month. They proved that every set of size at least has three distinct elements such that . Jordan Ellenberg and Dion Gijswijt extended this to for prime powers . Previous bounds had the form at best. Our friend Gil Kalai and others observed impacts on other mathematical problems including conjectures about sizes of sunflowers.
Today we congratulate them—Croot is a colleague of Dick’s in Mathematics at Georgia Tech—and wonder what the breakthroughs involving polynomials might mean for complexity theory.
What’s amazing is that the above papers are so short, including a new advance by Ellenberg that is just 2 pages. In his own post on the results, Tim Gowers muses:
[The CLP argument presents a stiff challenge to my view that] mathematical ideas always result from a fairly systematic process—and that the opposite impression, that some ideas are incredible bolts from the blue that require “genius” or “sudden inspiration” to find, is an illusion that results from the way mathematicians present their proofs after they have discovered them. …[T]he argument has a magic quality that leaves one wondering how on earth anybody thought of it.
We don’t know if we can explain the source of the ‘magic’ but we will try to describe it in a way that might help apply it.
At top level there is no more sleight-of-hand than a simple trick about matrix rank. We discussed ideas of rank some time ago.
If a matrix is a sum of matrices each of rank at most , then any condition that would force to have rank must be false.
A simple case is where the condition zeroes every off-diagonal element of . Then the main diagonal can have at most nonzero entries. This actually gets applied in the papers. The fact that column rank equals row rank also helps for intuition, as Peter Cameron remarks.
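A quick numerical check of the two rank facts just used—a sum of rank-one matrices has rank at most their number, and a diagonal matrix's rank counts its nonzero diagonal entries—can be run as follows (a sketch with toy sizes of our choosing):

```python
from fractions import Fraction
import random

def rank(M):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]       # normalize pivot row
        for i in range(len(M)):
            if i != r and M[i][c]:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

random.seed(0)
n, k = 8, 3
# A sum of k rank-1 matrices (outer products) has rank at most k.
vecs = [([random.randint(1, 4) for _ in range(n)],
         [random.randint(1, 4) for _ in range(n)]) for _ in range(k)]
M = [[sum(u[i] * v[j] for u, v in vecs) for j in range(n)] for i in range(n)]
print(rank(M))    # at most 3

# A diagonal matrix's rank is its number of nonzero diagonal entries.
D = [[0] * n for _ in range(n)]
for i, d in enumerate([1, 0, 2, 0, 3, 0, 0, 4]):
    D[i][i] = d
print(rank(D))    # 4
```

So any condition forcing full rank on a diagonalized sum of few rank-one pieces must fail, which is exactly the contradiction the papers engineer.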
A second trick might be called “degree-halving”: Suppose you have a polynomial of degree . Even if is irreducible, might be approximated or at least “subsumed” term-wise by a degree- product . When is multi-linear, or at least of bounded degree in each variable—call this —we may get where is close to .
In any case, at least one of must have degree at most , say . If we can treat and its variables as parameters, maybe even substitute them by well-chosen constants, then we are down to of degree . Then is a sum of terms each having a monomial of total degree in variables each with power at most .
The number of such monomials is relatively small. This limits the dimension of spaces spanned by such , which may in turn connect to the bound above and/or limit the size of exceptional subsets of the whole space . We discussed Roman Smolensky’s famous use of the degree-halving trick in circuit complexity here.
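The smallness of the monomial count can be checked directly by a little dynamic program (our own sketch, with q = 3 and a degree bound of two-thirds of the variables as hypothetical parameters):

```python
from functools import lru_cache

def count_monomials(n, d, q=3):
    """Number of monomials in n variables with each exponent in
    {0, ..., q-1} and total degree at most d (DP over the variables)."""
    @lru_cache(maxsize=None)
    def cnt(i, rem):
        if rem < 0:
            return 0
        if i == 0:
            return 1
        return sum(cnt(i - 1, rem - e) for e in range(q))
    return cnt(n, d)

n = 30
low = count_monomials(n, (2 * n) // 3)    # total degree at most 2n/3
print(low, 3 ** n, low / 3 ** n)          # an exponentially small fraction
```

Already at n = 30 the count with degree bound 2n/3 is well under 5% of all 3^n monomials, which is the kind of exponential gap the dimension argument exploits.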
These tricks of linear algebra and degree are all very well, but how can we use them to attack our problem? We want to bound the size of subsets having no element such that for some nonzero , , , and all belong to . This is equivalent to having no three elements such that . This means that the following two subsets of are disjoint:
How can we use polynomials to gain leverage on this? The insight may look too trivial to matter:
Any polynomial supported only on must vanish on .
Let be the complement of and let be the space of polynomials vanishing on that belong to our set . We can lower-bound the size of by observing that the evaluation map from to the graph of its values on is a linear transformation. Its image has size at most , and since is the kernel, we have , so .
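The rank–nullity count behind this lower bound can be checked concretely. In the sketch below (our own illustration with hypothetical small parameters: 4 variables over the 3-element field, and a 10-point set standing in for the complement set), the polynomials vanishing on the set form the kernel of the evaluation map, so their dimension is at least the number of monomials minus the number of points.

```python
import itertools

def rank_mod(rows, p=3):
    """Rank of an integer matrix over the field F_p (Gaussian elimination)."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][c], p - 2, p)           # pivot inverse mod p
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def ev(mono, pt, p=3):
    """Evaluate a monomial (exponent vector) at a point, over F_p."""
    v = 1
    for e, x in zip(mono, pt):
        v = (v * pow(x, e, p)) % p
    return v

n = 4
monos = list(itertools.product(range(3), repeat=n))        # exponents in {0,1,2}
pts = list(itertools.product(range(3), repeat=n))[:10]     # a 10-point set

A = [[ev(m, pt) for m in monos] for pt in pts]             # evaluation matrix
dimV = len(monos) - rank_mod(A)                            # kernel dimension
print(dimV, len(monos) - len(pts))                         # 71 vs. bound 71
```

Here the evaluation map is surjective, so the kernel dimension meets the rank–nullity lower bound exactly.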
Well, this is useless unless , but is the complement of which is no bigger than the set we are trying to upper-bound. So it is useless—unless is pretty big. So we need to choose —and maybe —to be not so low. We can do this, but how can this lower bound on help? We need a “clashing” upper bound. This is where the presto observation by CLP came in.
Given the set , make a matrix whose entry in row , column , is . In APL notation this is the “outer product” of with itself. Its diagonal is and the rest is .
Now apply to every entry to get a matrix . By every off-diagonal entry vanishes, so is a diagonal matrix. Its rank is hence the number of nonzero diagonal entries. If we can upper-bound , then we can upper-bound by the hoc-est-corpus rubric of description complexity:
Every can be described by its up-to- nonzero values on , so there are at most of them.
The papers use bounds on the dimension of in place of description complexity, but this is enough to see how to get some kind of upper bound. Since , taking logs base gives us:
It remains to bound , but it seems to take X-ray vision just to see that a bound can give us anything nontrivial. OK, any fixed bound on makes the right-hand side only which yields a contradiction, so there is hope. The rank trick combines with degree-halving to pull a bound involving out of the hat. Here is the version by Ellenberg and Gijswijt where the nonce choice suffices and the coefficients on in are replaced by a general triple such that :
Lemma 1 With , , and as above, put to be the set of polynomials in that vanish on . Then for all there are at most values for which
Proof: Let and be vectors of variables, and write
where each coefficient is in and the sum is over pairs of monomials whose product has degree at most and at most in any variable. Collect the terms in which has total degree at most separately from those where does, so that we get
where each and is an arbitrary function and now the sum is over monomials of total degree at most (and still no more than in any variable if we care). Now look at the matrix whose entry is , including the diagonal where . We have
This is a sum of at most single-entry matrices, so the rank of is at most that. Since is a diagonal matrix and makes , there are at most nonzero values over .
Sawing the degree in half stacks up against . We retain freedom to choose (and possibly ) to advantage. There are still considerable numerical details needed to ensure this works and tweaks to tighten bounds—for which we refer to the papers—but we have shown the “Pledge,” the “Turn,” and the “Prestige” of the argument.
Can you find more applications of the polynomial technique besides those enumerated in the papers and posts we have linked? For circuit complexity we’d not only like to go from back to as CLP have it, but also get results for when is not a prime power. Can we make assumptions (for sake of contradiction) that create situations with higher “leverage” than merely being disjoint?
[changed subtitle; linked hoc-est-corpus which literally means, “here is the body”; deleted and changed remarks before “Open Problems”; inserted tighter sum into description complexity formula.]
Shiteng Chen and Periklis Papakonstantinou have just written an interesting paper on modular computation. Its title, “Depth Reduction for Composites,” means converting a depth-, size- circuit into a depth-2 circuit that is not too much larger in terms of as well as .
Today Ken and I wish to talk about their paper on the power of modular computation.
One of the great mysteries in computation, among many others, is: what is the power of modular computation over composite numbers? Recall that a gate outputs if and otherwise. It is a simple computation: Add up the inputs modulo and see if the sum is . If so output , else output . This can be recognized by a finite-state automaton with states. It is not a complex computation by any means.
But there lurk in this simple operation some dark secrets. When is a prime the theory is fairly well understood. There remain some secrets but by Fermat’s Little Theorem a gate has the same effect as a polynomial. In general, when is composite, this is not true. This makes understanding gates over composites much harder: simply because polynomials are easy to handle compared to other functions. As I once heard someone say:
“Polynomials are our friends.”
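Over a prime modulus the friendship is concrete: by Fermat's Little Theorem, s^(p-1) mod p is 0 or 1 according to whether p divides s, so one fixed polynomial simulates the gate. A sketch (using the output-1-when-divisible convention; conventions vary, and the contrast with a composite modulus is our own added illustration):

```python
import itertools

def mod_gate(bits, m):
    """Gate outputs 1 iff the sum of its input bits is divisible by m."""
    return 1 if sum(bits) % m == 0 else 0

def poly_gate(bits, p):
    """For prime p, Fermat gives s**(p-1) mod p in {0, 1}, so
    1 - (x_1 + ... + x_n)**(p-1) is a polynomial computing the gate."""
    return (1 - sum(bits) ** (p - 1)) % p

p, n = 5, 8
assert all(mod_gate(b, p) == poly_gate(b, p)
           for b in itertools.product([0, 1], repeat=n))

# For composite m = 6 the same trick fails: s**(m-1) mod 6 is not
# 0/1-valued, e.g. 2**5 % 6 == 2.
print(2 ** 5 % 6)   # 2
```

For composite moduli no such low-degree polynomial replacement is known, which is exactly why composite gates remain mysterious.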
Chen and Papakonstantinou (CP) increase our understanding of modular gates by proving a general theorem about the power of low depth circuits with modular gates. This theorem is an exponential improvement over previous results when the depth is regarded as a parameter rather than constant. Their work also connects with the famous work of Ryan Williams on the relation between and .
We will just state their main result and then state one of their key lemmas. Call a circuit of , , , and gates (for some ) an -circuit.
Theorem 1 There is an efficient algorithm that given an circuit of depth , input length , and size , outputs a depth-2 circuit of the form of size , where denotes some gate whose output depends only on the number of s in its input.
This type of theorem is a kind of normal-form theorem. It says that any circuit of a certain type can be converted into a circuit of a simpler type, and this can be done without too much increase in size. In complexity theory we often find that it is very useful to replace a complicated type of computational circuit with a much cleaner type of circuit even if the new circuit is bigger. The import of such theorems is not that the conversion can happen, but that it can be done in a manner that does not blow up the size too much.
This happens all through mathematics: finding normal forms. What makes computational complexity so hard is that the conversion to a simpler type often can be done easily—but doing so without a huge increase in size is the rub. For example, every map
can be easily shown to be equal to an integer-valued polynomial with coefficients in provided is a finite subset of . For every point , set
where the inner product is over the finitely many that appear in the -th place of some member of . Then is an integer and is the only nonzero value of on . We get
which is a polynomial that agrees with on .
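A brute-force version of this construction can be sketched as follows (our own illustration with rational coefficients: the indicator for each point is a product that vanishes at every other coordinate value occurring in the set, scaled to equal 1 at that point):

```python
from fractions import Fraction
import itertools

def interpolate(D, f):
    """Return the evaluator of a polynomial agreeing with f on the finite
    set D: a sum over points u in D of scaled indicator products."""
    n = len(D[0])
    vals = [sorted({u[i] for u in D}) for i in range(n)]  # values per coord
    def p(x):
        total = Fraction(0)
        for u in D:
            term = Fraction(1)
            for i in range(n):
                for a in vals[i]:
                    if a != u[i]:                # vanish at every other value
                        term *= Fraction(x[i] - a, u[i] - a)
            total += f(u) * term                 # term is 1 at u, 0 elsewhere in D
        return total
    return p

D = list(itertools.product(range(3), repeat=2))  # a 3x3 grid in Z^2
f = lambda u: u[0] * u[1] + 1
p = interpolate(D, f)
print(all(p(u) == f(u) for u in D))   # True
```

The size is exponential in the number of points, which is exactly the "brutish" aspect noted next.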
Well, this is easy but brutish—and exponential size if is. The trick is to show that when is special in some way then the size of the polynomial is not too large.
One of the key insights of CP is a lemma, Lemma 5 in their paper, that allows us to replace a product of many gates by a summation. We have changed variables in the statement around a little; see the paper for the full statement and context.
Lemma 5 Let be variables over the integers and let be relatively prime. Then there exist integral linear combinations of the variables and integer coefficients so that
The value of can be composite. The final modulus can be in place of and this helps in circuit constructions. Three points to highlight—besides products being replaced by sums—are:
Further, all of this can be done in a uniform way, so the lemma can be used in algorithms. This is important for their applications. Note this is a type of normal form theorem like we discussed before. It allows us to replace a product by a summation. The idea is that going from products to sums is often a great savings. Think about polynomials: the degree of a multi-variate polynomial is often a better indicator of its complexity than its number of terms. It enables them to remove layers of large gates that were implementing the products (Lemma 8 in the paper) and so avoids the greatest source of size blowup in earlier constructions.
A final point is that the paper makes a great foray into mixed-modulus arithmetic, coupled with the use of exponential sums. This kind of arithmetic is not so “natural” but is well suited to building circuits. Ken once avoided others’ use of mixed-modulus arithmetic by introducing new variables—see the “additive” section of this post which also involves exponential sums.
The result of CP seems quite strong. I am, however, very intrigued by their Lemma 5. It seems that there should be other applications of this lemma. Perhaps we can discover some soon.
A way to recover and enforce privacy
McNealy bio source
Scott McNealy, when he was the CEO of Sun Microsystems, famously said nearly 15 years ago, “You have zero privacy anyway. Get over it.”
Today I want to talk about how to enforce privacy by changing what we mean by “privacy.”
We seem to see an unending series of breaks into databases. There is of course a huge amount of theory literature and methods for protecting privacy. Yet people are still broken into and lose their information. We wish to explore whether this can be fixed. We believe the key to the answer is to change the question:
Can we protect data that has been illegally obtained?
This sounds hopeless—how can we make data that has been broken into secure? The answer is that we need to look deeper into what it means to steal private data.
The expression “the horse has left the barn” means:
Closing/shutting the stable door after the horse has bolted, or trying to stop something bad happening when it has already happened and the situation cannot be changed.
Indeed, our source gives as its main example: “Improving security after a major theft would seem to be a bit like closing the stable door after the horse has bolted.”
Photo by artist John Lund via Blend Images, all rights reserved.
This strikes us as the nub of privacy. Once information is released on the Internet, whether by accident or by a break-in, there seems to be little that one can do. However, we believe that there may be hope to protect the information anyway. Somehow we believe we can shut the barn door after the horse has left, and get the horse back.
Suppose that some company makes a series of decisions. Can we detect if those decisions depend on information that they should not be using? Let’s call this Post-Privacy Detection.
Consider a database that stores values where is an -bit vector of attributes and is a attribute. Think of as small, even a single bit such as the sex of the individual with attributes . Let us also suppose that the database is initially secure for insofar as given many samples of the values of only, it is impossible to gain advantage in inferring the values of . Thus the leak of is meaningful information.
Now say a decider is an entity that uses information from this database to make decisions. has one or more Boolean functions of the attributes. Think of as a yes/no on some issue: granting a loan, selling a house, giving insurance at a certain rate, and so on. The idea is that while may not be secret—the database has been broken into—we can check that in aggregate that is effectively secret.
The point here is that we can detect if is being used in an unauthorized manner to make some decision, given protocols for transparency that enable sampling the values . If given a polynomial number of samples we cannot tell ‘s within then we have large-scale assurance that was not material to the decision. Our point is this: a leak of values about individuals is material only if they are used by someone to make a decision that should not depend on their “private” information. Thus if a bank gets values of , but does not use them to make a decision, then we would argue that that information, while public, was effectively private.
Definition 1 Let a database contain values of the form $(x,y)$, and let $f(x,y)$ be a Boolean function. Say that the $y$ part is effectively private for the decision $f$ provided there is another function $g$ so that
$$\Pr[f(x,y) = g(x)] \geq 1 - \epsilon,$$
where the probability is taken over the sampled pairs $(x,y)$ and $\epsilon$ is small. A decider $D$ respects the database if $y$ is effectively private in all of its decision functions.
We can prove a simple lemma showing that this definition implies that $y$ is not compromised by sampling the decision values.
Lemma 2 If the database is secure for $y$ and $y$ is effectively private, then there is no function $h$ such that $h(x, f(x,y)) = y$ with probability noticeably better than chance.
Proof: Suppose for contradiction such an $h$ exists. Since $y$ is effectively private, a function $g$ as above also exists. Then given $x$ alone, we obtain $f(x,y)$ with probability $1 - \epsilon$ by computing $g(x)$. Then using $h$ we obtain $y$ with overall probability at least that achieved by $h$, minus $\epsilon$. Since only $x$ was used, this contradicts the initial security of the database for $y$.
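To make the definition concrete, here is a toy sketch in Python. The dataset, the helper names, and both decision functions are our own hypothetical constructions, not anything from a deployed system; the sketch simply estimates by sampling how well a decision $f(x,y)$ can be matched by a $y$-oblivious $g(x)$, taking for $g$ the majority decision observed for each $x$.

```python
import random
from collections import Counter, defaultdict

def best_oblivious_g(samples):
    """For each x, let g(x) be the majority decision observed for x."""
    votes = defaultdict(Counter)
    for x, _, d in samples:
        votes[x][d] += 1
    return {x: c.most_common(1)[0][0] for x, c in votes.items()}

def agreement(samples, g):
    """Empirical Pr[f(x, y) = g(x)] over the samples."""
    return sum(d == g[x] for x, _, d in samples) / len(samples)

random.seed(0)
# x is a 3-bit attribute vector; y is one sensitive bit, independent of x.
data = [(tuple(random.randrange(2) for _ in range(3)), random.randrange(2))
        for _ in range(2000)]

deciders = [lambda x, y: x[0] ^ x[1],   # ignores y: effectively private
            lambda x, y: y]             # leaks y: not effectively private

results = []
for f in deciders:
    samples = [(x, y, f(x, y)) for x, y in data]
    results.append(agreement(samples, best_oblivious_g(samples)))

print(results)   # the first agreement is exactly 1.0; the second stays near 1/2
```

The first decider ignores $y$, so the sampled agreement is perfect; the second decider is $y$ itself, and since $y$ is independent of $x$ no $y$-oblivious $g$ can track it.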
To be socially effective, our detection concept should exert influence on deciders to behave in a manner that overtly does not depend on the unauthorized information. This applies to repeatable decisions whose results can be sampled. The sampling would use protocols that effect transparency while likewise protecting the data.
Thus our theoretical notion would require social suasion for its effectiveness. This includes requiring deciders to provide infrastructure by which their decisions can be securely sampled. It might not require them to publish their $y$-oblivious decision functions $g$, only that they could—if challenged—provide one. Most of this is to ponder for the future.
What we can say now, however, is that there do exist ways we can rein in the bad effects of lost privacy. The horses may have bolted, but we can still exert some long-range control over the herd.
Is this idea effective? What things like it have been proposed?
From knight’s tours to complexity
Von Warnsdorf’s Rule source
Christian von Warnsdorf did more and less than solve the Knight’s Tour puzzle. In 1823 he published a short book whose title translates to, The Leaping Knight’s Simplest and Most General Solution. The ‘more’ is that his simple algorithm works for boards of any size. The ‘less’ is that its correctness remains unproven even for square boards.
Today we consider ways for chess pieces to tour not 64 but up to $2^{64}$ configurations on a chessboard.
Von Warnsdorf’s rule works only for the ‘path’ form of the puzzle, where the knight is started in a corner of an $n \times n$ board and must visit all the other squares in $n^2 - 1$ hops. It does not yield a final hop back to start to make a Hamilton cycle. The rule is always to move the knight to the available square with the fewest connections to open squares. In case of two or more tied options, von Warnsdorf incorrectly believed the choice could be arbitrary, but simple tiebreak rules have been devised that work in all known cases. More-recent news is found in papers linked from a website maintained by Douglas Squirrel of Frogholt, England. We took the above screenshot from his animated implementation of the rule when the knight, having started in the upper-left corner, is a few hops from finishing at upper right.
The first person known to have published a solution was the Kashmiri poet Rudrata in the 9th century. He found a neat way to express his solution in 4 lines of 8-syllable Sanskritic verse that extend to an 8×8 solution when repeated. In modern terms he solved the following:
Color the squares so that for all $k$, the $k$-th square of the tour has the same color as the $k$-th square in row-major order—in other words, the usual way of reading left-to-right and down by rows—while maximizing the number $m$ of colors used.
Note that we can guarantee $m \geq 2$ by starting in the upper-left corner and using a different color for all other squares. However, the usual parity argument with the knight doesn’t even let us 2-color the remaining squares to guarantee $m \geq 3$, because the last square of the first row and the first square of the second row have the same parity. Rudrata achieved more colors for the upper half, with cell 21 also a singleton color; this implies corresponding bounds for the whole board and for $n \times n$ boards. Can it be beaten? Most to our point, is there a “Rudrata Rule” for maximizing $m$ as simple as von Warnsdorf’s?
We now put a coin heads-down on each square. Our chess pieces are going to move virtually through the space of $2^{64}$ configurations by flipping over the coins in squares they attack. Our questions will be of the form, can they reach all configurations, and if not:
How small can Boolean circuits be to recognize the set of reachable strings?
Let’s warm up with a different problem. Suppose the coins are colored not embossed so you cannot tell by touch which side is which, and the room is pitch dark. You are told that k of the coins are showing heads but not which ones. You must take some of the coins off the board, optionally flipping some or all while placing them nearby on the table. The lights are then switched on, and you win if your coins have the same number of heads as the ones left on the board. Can you always win?
I may have seen this puzzle as a child but it was fresh when I read it here. Our point connecting to this post is that the solution, which can be looked up here, is simple in terms of k and so can be computed by tiny Boolean circuits.
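The solution can also be machine-checked. The sketch below assumes the well-known answer to this puzzle (take any $k$ coins off the board and flip every one of them) and verifies it on random boards: if $h$ of the removed coins were showing heads, the board retains $k - h$ heads, and flipping turns the removed group’s $h$ heads into $k - h$ heads.

```python
import random

def split_and_flip(board, k):
    """Remove any k coins (here the first k) and flip each one."""
    removed = [1 - c for c in board[:k]]
    return removed, board[k:]

random.seed(1)
for _ in range(1000):
    k = random.randint(0, 20)
    board = [1] * k + [0] * (20 - k)   # exactly k heads, positions unknown
    random.shuffle(board)
    mine, rest = split_and_flip(board, k)
    assert sum(mine) == sum(rest)      # same number of heads in both groups
print("all trials matched")
```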
Since the tours will be reversible, we can equally well start with any coin configuration and ask whether the piece can transform it to the all-tails state. This resembles solving Rubik’s Cube. We’ll try each chess piece one-by-one, the knights last.
Our rook can start on any square. It flips each coin in the same row or column (“rank” and “file” in chess parlance) as the square it landed on. Then it moves to one of those squares and repeats the flipping. If it moved within a rank then the coins in that row will be back the way they were except that the two the rook was on will be flipped. We can produce a perfect checkerboard pattern by moving the rook a1-c1-c3-c5-e5-g5-g7 then back g5-c5-c1. Since order doesn’t matter and operations from the same square cancel, this has the same effect as doing a1, c3, e5, and g7 “by helicopter.”
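This cancellation is easy to machine-check. The following sketch (our own encoding of the board as coordinate pairs, not anything from the post) simulates the flips and confirms that the tour, the helicopter drops, and the checkerboard all coincide:

```python
def rook_flip(board, sq):
    """Flip the 14 coins the rook attacks from sq (its own square excluded)."""
    f, r = sq
    for i in range(1, 9):
        if i != r:
            board[(f, i)] ^= 1   # same file, other ranks
        if i != f:
            board[(i, r)] ^= 1   # same rank, other files

def run(squares):
    board = {(f, r): 0 for f in range(1, 9) for r in range(1, 9)}
    for s in squares:
        rook_flip(board, s)
    return board

sq = lambda s: (ord(s[0]) - 96, int(s[1]))   # "a1" -> (1, 1)
tour = [sq(s) for s in "a1 c1 c3 c5 e5 g5 g7 g5 c5 c1".split()]
helicopter = [sq(s) for s in "a1 c3 e5 g7".split()]

# Squares with one odd and one even coordinate, i.e. f + r odd:
checkerboard = {(f, r): (f + r) % 2 for f in range(1, 9) for r in range(1, 9)}
assert run(tour) == run(helicopter) == checkerboard
print("checkerboard confirmed")
```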
Since the rook always attacks 14 squares, an even number of coins flip at each move, so half the space is ruled out by parity. There is however a stronger limitation. Each rook flip is equivalent to flipping the entire row and then the entire column. We can amplify the rook by allowing row and column flips singly. But then we see that there are only 16 such operations. Again since repeats cancel, this means at most $2^{16}$ configurations are possible. We ask:
Is there a simple formula, yielding small Boolean circuits, for determining which configurations are reachable on an $n \times n$ board?
We can pose this for the Rook, with-or-without “helicoptering,” and for the row-or-column flips individually. Small circuits would mean that strings in $\{0,1\}^{n^2}$ denoting reachable configurations enjoy a particular form of succinctness.
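We have no such formula to offer, but a quick rank computation over GF(2) (our own sketch, encoding the board as a 64-bit integer) pins down the exact size of the amplified rook’s space. The only dependency among the 16 generators is that flipping all eight rows equals flipping all eight columns, both yielding the all-ones board, so the rank is 15:

```python
def gf2_rank(vecs):
    """Rank over GF(2) of bit-vectors encoded as Python ints."""
    basis = []
    for v in vecs:
        for b in basis:
            v = min(v, v ^ b)        # reduce v by each pivot
        if v:
            basis.append(v)
            basis.sort(reverse=True)
    return len(basis)

# Bit (8*f + r) holds the coin on file f, rank r (0-indexed).
row = lambda r: sum(1 << (8 * f + r) for f in range(8))
col = lambda f: sum(1 << (8 * f + r) for r in range(8))
ops = [row(r) for r in range(8)] + [col(f) for f in range(8)]

print(gf2_rank(ops))   # 15: at most 2^15 reachable configurations
```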
Since the rook fails to tour the whole exponential-sized space, let’s try the bishop.
The bishop can flip any odd number of coins from 7 to 13. It is limited to squares of one color but we can allow the opposite-color bishop to tag-team with it. I was just about to pose the same questions as above for the bishops when a familiar imperious voice swelled behind me. It was the Red Queen.
“I have all the power of your towers and prelates—and you need only one of me. I shall surely fill the space.”
I was no one to stand in her way, but the Dormouse awoke and quietly began scratching figures on paper. “Besides the sixteen ranks and files, there are fifteen southeast-to-northwest diagonals, including the corner squares a1 and h8 by themselves. And there are fifteen southwest-to-northeast diagonals. This makes only 16 + 15 + 15 = 46 < 64 operations. Hence, Your Majesty, even if we could parcel out your powers, you could fill out at most a $2^{46}/2^{64} = 2^{-18}$ fraction of the space.”
I expected the Red Queen to yell, “Off with his head!” But instead she stooped over the Dormouse and hissed,
“Sorry—I slept through the rest of Alice,” explained the Dormouse as he slunk away. Despite the Dormouse’s proof I thought it worth asking the same questions as for the rook and bishop about the queen’s subspace. What kind of small formulas or circuits can recognize it, whether requiring her to flip all coins in all directions at once or allowing her to flip just one rank, file, or diagonal at a time?
While I was wondering, His Majesty quietly strode to the center and said,
“I do not wantonly project power without bound; I reserve my influence so that my action on every square is distinctive.”
We can emphasize how far things stay distinctive by posing our basic questions in a more technical manner:
Do the sixty-four vectors over $\mathbb{F}_2$ representing the king’s flipping action on each square span the vector space $\mathbb{F}_2^{64}$? If not, what can we say about the circuit complexity of the linear subspace they generate?
On a $2 \times 2$ board the four flip-vectors form a basis, but for $3 \times 3$ and $4 \times 4$ the king fails to span. For $3 \times 3$, kings in the two lower corners produce the same configuration as kings in the two upper corners. For $4 \times 4$, kings in a ring on a2, b4, d3, and c1 flip just the corner coins, as do the kings in the mirror-image ring. What about $5 \times 5$ and $6 \times 6$? Is there an easy answer?
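For what it is worth, the small cases can be checked mechanically by computing ranks over GF(2); the bit-vector encoding below is our own. The $3 \times 3$ corner dependency and the $4 \times 4$ rings force rank deficiencies, while the $2 \times 2$ king spans:

```python
def gf2_rank(vecs):
    """Rank over GF(2) of bit-vectors encoded as Python ints."""
    basis = []
    for v in vecs:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
            basis.sort(reverse=True)
    return len(basis)

def king_vectors(n):
    """Flip pattern of a king on each square of an n x n board."""
    vecs = []
    for x in range(n):
        for y in range(n):
            v = 0
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    if (dx or dy) and 0 <= x + dx < n and 0 <= y + dy < n:
                        v |= 1 << ((x + dx) * n + (y + dy))
            vecs.append(v)
    return vecs

ranks = {n: gf2_rank(king_vectors(n)) for n in (2, 3, 4)}
print(ranks)
```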
Meeker still are the pawns, who attack only the two squares diagonally in front, or just one if on an edge file. They cannot attack their first rank, nor the second in legal chess games, but opposing pawns can. Then it is easy to see that the pawn actions span the space. The lowly contribution by the edge pawn is crucial, since it flips just one coin not two.
The knight flips all the coins a knight’s move away. One difference from the queen, rook, bishop, and king is that on its next move all the coins it flips will be new. Our revised Knight’s Tour question is:
Can the knight connect the string $0^{64}$ to any configuration by a sequence of knight’s moves, perhaps allowing multiple visits to some squares? Or if we disallow multiple visits in a tour, can we do it by “helicoptering”? Same questions for $n \times n$ boards. If the answer is no, then are there easy formulas or succinct circuits determining the space of reachable configurations?
An example for needing multiple visits or helicoptering is that the configuration with heads on c2,b3 and g6,f7 is produced by knights acting in the corners a1 and h8, which are not connected by a knight’s move. If there is some other one-action-per-square combination that produces it, then by simple counting the knight cannot span—even with helicoptering.
The knight does fail to span a $4 \times 4$ board because the knight on the opposite corner d4 produces the same result as the knight on a1: heads on c2 and b3. The regular knight’s tour fails too on a $4 \times 4$ board, so this can be excused for the same “lack of legroom” reason. What about $5 \times 5$ and higher?
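The specific knight facts above are easy to verify mechanically, again with our own coordinate encoding:

```python
def knight_attacks(n, sq):
    """Set of squares a knight on sq attacks, on an n x n board."""
    f, r = ord(sq[0]) - 96, int(sq[1])
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    return {(f + df, r + dr) for df, dr in moves
            if 1 <= f + df <= n and 1 <= r + dr <= n}

# On 4x4, opposite corners a1 and d4 flip the same two coins (c2 and b3):
assert knight_attacks(4, "a1") == knight_attacks(4, "d4") == {(3, 2), (2, 3)}

# On 8x8, helicopter drops on a1 and h8 give heads on c2, b3 and g6, f7:
assert knight_attacks(8, "a1") ^ knight_attacks(8, "h8") == \
       {(3, 2), (2, 3), (7, 6), (6, 7)}
print("knight facts confirmed")
```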
Thus having coins on the chessboard scales up some classic tour problems exponentially. Our larger motivation is what the solutions might tell us about complexity.
Do you like our exponential “tour” problems? Really they are reachability problems. Can you solve them?
Will von Warnsdorf’s rule ever be proved correct for all higher n?
Note: To update our recent quantum post, Gil Kalai released an expanded version of his AMS Notices article, “The Quantum Computer Puzzle.” We also congratulate him on being elected an Honorary Member of the Hungarian Academy of Sciences.