Non-technical fact-check (source)
Dan Brown is the bestselling author of the novel The Da Vinci Code. His most recent bestseller, published in 2013, is Inferno. Like two of his earlier blockbusters it has been made into a movie. It stars Tom Hanks and Felicity Jones and is slated for release on October 28.
Today I want to talk about a curious aspect of the book Inferno, since it raises an interesting mathematical question.
Brown’s books are famous for their themes: cryptography, keys, symbols, codes, and conspiracy theories. The first four of these have a distinctive flavor of our field. Although we avoid the last in our work, it is easy to think of possible conspiracies that involve computational theory. How about these: certain groups already can factor large numbers, certain groups have real quantum computers, certain groups have trapdoors in cryptocurrencies, or …
The book has been out for a while, but I only tried to read it the other day. It was tough to finish, so I jumped to the end where the “secret” was exposed. Brown’s works have sold countless copies and yet have been attacked as being poorly written. He must be doing something very right. His prose may not be magical—whose is?—but his plots and the use of his themes usually make for a terrific “cannot put down” book.
Well I put it down. But I must be the exception. If you haven’t read the book and wish to do so without “spoilers” then you can put down this column.
Inferno is about the release of a powerful virus that changes the world. Before I go into the mathematical issues this virus raises, I must point out that Brown’s work has often been criticized for making scientific errors and overstepping the bounds of “plausible suspension of disbelief.” I think it is a great honor—really—that so many posts and discussions revolve around mistakes that he has made. Clearly there is huge interest in his books.
Examples of such criticism of Inferno have addressed the DNA science involved, the kind of virus used, the hows of genetic engineering and virus detection, and the population projections, some of which we get into below. There is also an entire book about Brown’s novel, Secrets of Inferno.
However, none of these seems to address a simple point, one we had not found anywhere until Ken noticed it raised here on the often-helpful Fourmilab site maintained by the popular science writer John Walker. It appears when you click “Show Spoilers” on that page, so again you may stop reading if you don’t wish to know.
How does the virus work? The goal of the virus is to stop population explosion.
The book hints that it is airborne, so we may assume that everyone in the world is infected by it—all women in particular. Brown says that 1/3 are made infertile. There are two ways to think about this statement. It depends on the exact definition of the mechanism causing infertility.
The first way is that when you get infected by the virus a coin is flipped and with probability 1/3 you are unable to have children. That is, when the virus attacks your original DNA there is a 1/3 chance the altered genes render you infertile. In the 2/3-case that the virus embeds in a way that does not cause infertility, that gets passed on to children and there is no further effect. In the 1/3-case that the alteration causes infertility, that property too gets passed on. Except, that is, for the issue in this famous quote:
Having Children Is Hereditary: If Your Parents Didn’t Have Any, Then You Probably Won’t Either.
Thus the effect “dies out” almost immediately; it would necessarily be just one-shot on the current generation.
The second way is that the virus allows the initial receiver to be fertile but has its effect when (female) children are born. In one third of cases the newborn girl is rendered infertile, and otherwise she is able to have children when she grows up.
In this case the effect seems to work as claimed in the book. Children all get the virus and it keeps flipping coins forever. Walker still isn’t sure—we won’t reveal here the words he hides but you can find them. In any event, the point remains that this would become a much more complex virus. And Brown does not explain this point in his book—at least I am unsure if he even sees the necessary distinctions.
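The gap between the two readings can be sketched in a few lines of Python. This is a minimal expected-value model, not anything from the book; the growth rate $r = 1.2$ daughters per fertile woman is a hypothetical parameter chosen only for illustration.

```python
def project(generations, r=1.2, fresh_coin=True):
    """Expected fertile women per generation, normalized to 1 pre-virus.

    fresh_coin=False: reading 1 -- the 1/3 cull hits only the currently
    infected generation; the infertility-causing alteration dies with
    its carriers, so later generations grow undisturbed.
    fresh_coin=True: reading 2 -- every newborn girl independently has
    a 1/3 chance of being infertile, forever.
    """
    fertile = 2 / 3              # one third of current women sterilized
    history = [fertile]
    for _ in range(generations):
        daughters = fertile * r  # r daughters per fertile woman
        if fresh_coin:
            daughters *= 2 / 3   # the virus flips a new coin for each girl
        fertile = daughters
        history.append(fertile)
    return history
```

With $r = 1.2$, the first reading resumes 20% growth per generation after a one-time dip, while the second shrinks each generation by a factor of 0.8.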
The other discussions focus on issues like how society would react to this reduction in fertility. Except for part of one we noted above, however, none seems to address the novel’s mathematical presumptions.
The purpose of the virus is to reduce the growth rate in the world’s population. By how much is not clear in the book. The over-arching issue is that it is hard to find conditions under which the projection of the effect is stable.
For example, suppose we can divide time into discrete units of generations, so that the world population of women after $n$ generations follows the exponential growth curve $p \cdot r^n$. Ignoring the natural rate of infertility and male-female imbalance and other factors for simplicity, this envisions women having $r$ female children on average. The intent seems to be to replace this with women having $\frac{2}{3}r$ female children each in the next generation. This means multiplying $r$ by $\frac{2}{3}$, so

$$p \cdot \left(\frac{2r}{3}\right)^n$$

becomes the new curve. The problem is that this tends to zero unless $r \geq \frac{3}{2}$, whereas the estimates of $r$ that you can get from tables such as this are uniformly lower at least since 2000.
The point is that the blunt “1/3” factor of the virus is thinking only in such simplistic terms about “exponential growth”—yet in the same terms there is no region of stability. Either growth remains exponential or humanity crashes. Maybe the latter possibility is implicit in the dark allusions to Dante Alighieri’s Inferno that permeate the plot.
In reality, as our source points out, it would not take much for humanity to compensate. If a generation is 30 years and we are missing 33% of women, then what’s needed is for just over 3% of the remaining women to change their minds about not having a child in any given year. We don’t want to trivialize the effect of infertility, but there is much more to adaptability than the book’s tenet presumes.
Have you read the book? What do you think about the math?
Some CS reflections for our 700th post
MacArthur Fellowship (source)
Lin-Manuel Miranda is both the composer and lyricist of the phenomenal Broadway musical Hamilton. A segment of Act I covers the friendship between Alexander Hamilton and Gilbert du Motier, the Marquis de Lafayette. This presages the French co-operation in the 1781 Battle of Yorktown, after which the British forces played the ballad “The World Turned Upside Down” as they surrendered. The musical’s track by the same name has different words and melodies.
Today we discuss some aspects of computing that seem turned upside down from when we first learned and taught them.
Yesterday was halfway between our Fourth of July and France’s Bastille Day, and was also the last day of Miranda performing the lead on-stage with the original Hamilton company. They are making recordings of yesterday’s two performances, to be aired at least in part later this year. A month ago, Miranda wrote an op-ed in the New York Times against the illegal (in New York) but prevalent use of “bots” to snap up tickets the moment they become available for later marked-up resale.
This is also the 700th post on this blog. It took until 1920 for a Broadway show of any kind to reach 700 performances. The Playbill list of “Long Runs on Broadway” includes any show with 800 or more performances. That mark is within our reach, and our ticket prices will remain eminently reasonable.
This list is just what strikes us now—far from exhaustive—and we invite our readers to add opinions about examples in comments.
Forty-five years ago, Dick Karp showed how the difficulty of SAT represented by NP-completeness spreads to other natural problems. As the number of complete problems from many areas of science and operations research soared into the thousands by the 1979 publication of the book Computers and Intractability, people regarded NP-completeness as tantamount to intractability.
Today the flow is in the other direction. Dick Karp himself has been among many in the vanguard—I remember his talk on practical solvability of Hitting-Set problems at the 2008 LiptonFest and here is a relevant paper. We now reduce problems to SAT in order to solve them. SAT-solvers that work in many cases are big business. In some whole areas the SAT-encodings of major problems are well-behaved, as we remarked about rank-aggregation and voting theory in the third section of this post from last October. The solvers can even tackle huge problems. Marijn Heule, Oliver Kullmann, and Victor Marek proved that every 2-coloring of the interval $[1, 7825]$ has a monochromatic Pythagorean triple, in a proof of over 200 terabytes in uncompressed length.
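At toy scale the Pythagorean-triples question can be settled by brute force rather than a SAT solver. Here is a sketch; the actual theorem concerns the interval $[1, 7825]$, hopelessly beyond this approach, which is exactly why the 200-terabyte SAT proof was needed.

```python
from itertools import product

def pythagorean_triples(n):
    """All (a, b, c) with a^2 + b^2 = c^2 and a < b < c <= n."""
    return [(a, b, c)
            for a in range(1, n + 1)
            for b in range(a + 1, n + 1)
            for c in range(b + 1, n + 1)
            if a * a + b * b == c * c]

def two_colorable(n):
    """Can {1,...,n} be 2-colored with no monochromatic Pythagorean
    triple?  Brute force over only the numbers that occur in triples."""
    triples = pythagorean_triples(n)
    members = sorted({x for t in triples for x in t})
    for colors in product((0, 1), repeat=len(members)):
        col = dict(zip(members, colors))
        if not any(col[a] == col[b] == col[c] for a, b, c in triples):
            return True
    return False
```

For small $n$ this finds a good coloring instantly; the theorem says the answer flips to False at $n = 7825$.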
Quadratic time is notionally on the low end of polynomial time, and “polynomial time” has long been used as a synonym for “easy.” But as the amount of data we can and need to handle has mushroomed, the difference in scaling between quasi-linear and quadratic is more and more felt. This difference has even been argued for cryptographic security. A particular definition of quasi-linear time is time $n(\log n)^{O(1)}$, as named by Claus Schnorr for his theorem on quasi-linear completeness of SAT; see also this.
In genomics the quadratic time of algorithms for full edit-distance measures is felt enough to warrant approximative methods, as we covered in our memorial a year ago for Alberto Apostolico. This also puts meaning behind theoretical evidence that the quadratic time for computing edit distance cannot be improved to $O(n^{2-\epsilon})$.
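For concreteness, the quadratic-time measure in question is the textbook Levenshtein dynamic program; a minimal sketch, using the memory-light two-row form:

```python
def edit_distance(s, t):
    """Classic O(len(s) * len(t)) dynamic program for edit distance."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))          # distances from "" to prefixes of t
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                            # delete
                         cur[j - 1] + 1,                         # insert
                         prev[j - 1] + (s[i - 1] != t[j - 1]))   # substitute
        prev = cur
    return prev[n]
```

Running this on two length-$n$ genomes touches all $n^2$ table cells, which is exactly the cost that the approximative methods avoid.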
These two items seem to contradict each other, but point up a difference in scale between data and logical control. Often a thousand data points are nothing. A formula with a thousand clauses can say a lot.
My first doctoral student had been working on neural networks before I became his advisor in 1991, and I remember the feeling of their being under a cloud. The so-called AI Winter traced in part to lower bounds shown against certain shallow neural nets in the 1969 book Perceptrons by Marvin Minsky and Seymour Papert. We discussed complexity aspects of this in our memorial of Minsky last January.
Since then what has emerged is that composing a bunch of these nets, as in a convolutional neural network (CNN), is both feasible and algorithmically effective. The recent breakthrough on playing Go is just a headline among many emerging applications of CNNs and larger systems. We are not saying neural nets and deep learning are the be-all or anything more than a “cartoon” of the brain, but rather noting them among many reasons that AI and machine learning are resurgent.
The same AI-winter article on Wikipedia mentions the collapse of Lisp-dedicated systems in 1987, and more widely, many companies devoted to data-parallel architectures “left nothing but their logos on coffee mugs” as a colleague once put it. Subsequently I perceived signs of stagnation in functional languages in the late 1990s and early 00s. This lent a ghostly air to John Backus’s famous 1978 Turing Award lecture, “Can Programming Be Liberated From the von Neumann Style?”
Unlike the revenant in last year’s award-winning movie of that name, this one has come back with a different body. Not a large-scale dedicated machine system, but rather the pan-spectral pervasion we call the Cloud. A great lecture we heard by Mike Franklin on AMPLab activities highlighted the role of programs written in the functional language Scala running on the Apache Spark framework.
A common thread in all these items is the combined efficacy and scalability of algorithmic primitives whose abstract forms characterize quasi-linear time: sorting, parallel prefix sum (as one of several forms of map-reduce), convolution, streaming count-sketching, and the like.
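As one example of these primitives, here is a sequential rendition of the Blelloch exclusive prefix-sum scan, whose up-sweep and down-sweep phases each do $O(n)$ work in $O(\log n)$ parallel depth (a sketch restricted to power-of-two lengths for simplicity):

```python
def exclusive_scan(a):
    """Blelloch-style exclusive prefix sum on a power-of-two-length list.
    Sequential rendition of the O(n)-work, O(log n)-depth parallel scan."""
    x = list(a)
    n = len(x)
    assert n & (n - 1) == 0, "power-of-two length for simplicity"
    # up-sweep: build a reduction tree in place
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] += x[i - d]
        d *= 2
    # down-sweep: push partial sums back down the tree
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            x[i - d], x[i] = x[i], x[i] + x[i - d]
        d //= 2
    return x
```

For instance, `exclusive_scan([1, 2, 3, 4])` returns `[0, 1, 3, 6]`.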
We considered mentioning some subjects that have seen changes such as digital privacy and block ciphers, but maybe these are not so “upside-down.” Doubtless we are missing many more. What developments in computing have carried shock on the order of the discovery that neutrinos have mass in particle physics? We invite your suggestions and opinions.
Here also is a web folder of photos from Dick’s wedding and honeymoon.
We revisit a paper from 1994
Richard Lipton is, among so many other things, a newlywed. He and Kathryn Farley were married on June 4th in Atlanta. The wedding was attended by family and friends including many faculty from Georgia Tech, some from around the country, and even one of Dick’s former students coming from Greece. Their engagement was noted here last St. Patrick’s Day, and Kathryn was previously mentioned in a relevantly-titled post on cryptography.
Today we congratulate him and Kathryn, and as part of our tribute, revisit a paper of his on factoring from 1994.
They have just come back from their honeymoon in Paris. Paris is many wonderful things: a touchstone of history, a center of culture, a city for lovers. It is also the setting for most of Dan Brown’s novel The Da Vinci Code and numerous other conspiracy-minded thrillers. Their honeymoon was postponed by an event that could be a plot device in these novels: the Seine was flooded enough to close the Louvre and Musée d’Orsay and other landmarks until stored treasures could be brought to safe higher ground.
It is fun to read or imagine stories of cabals seeking to collapse world systems and achieve domination. Sometimes these stories turn on scientific technical advances, even purely mathematical points as in Brown’s new novel, Inferno. It needs a pinch to realize that we as theorists often verge on some of these points. Computational complexity theory as we know it is asymptotic and topical, so it is a stretch to think that papers such as the present one impact the daily work of those guarding the security of international commerce or investigating possible threats. But from its bird’s-eye view there is always the potential to catch a new glint of light reflected from the combinatorial depths that could not be perceived until the sun and stars align right. In this quest we take a spade to dig up old ideas anew.
Pei’s Pyramid of the Louvre Court = Phi delves out your prime factor… (source)
The paper is written in standard mathematical style: first a theorem statement with hypotheses, next a series of lemmas, and the final algorithm and its analysis coming at the very end. We will reverse the presentation by beginning with the algorithm and treating the final result as a mystery to be decoded.
Here is the code of the algorithm. It all fits on one sheet and is self-contained; no abstruse mathematics text or Rosetta Stone is needed to decipher. The legend says that the input $N$ is a product of two prime numbers, $f$ is a polynomial in just one variable, and $\gcd$ refers to the greatest-common-divisor algorithm expounded by Euclid around 300 B.C. Then come the runes, which could not be simpler:
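Reading between the runes (the one-sheet figure is not reproduced here), the loop seems to be: pick a random residue $a$, evaluate the polynomial modulo $N$, and take a gcd. The following Python sketch is our reading of it, with a toy polynomial of our own choosing, not Dick’s original code:

```python
import math
import random

def try_factor(N, f, trials=10000):
    """Sketch of our reading of the algorithm: N = p*q, and f is a
    univariate integer polynomial given as a coefficient list, constant
    term first.  A trial succeeds when one prime divides f(a) mod N and
    the other does not, so the gcd peels off a factor."""
    def eval_mod(coeffs, a, n):
        v = 0
        for c in reversed(coeffs):  # Horner's rule modulo n
            v = (v * a + c) % n
        return v

    for _ in range(trials):
        a = random.randrange(1, N)
        g = math.gcd(eval_mod(f, a, N), N)
        if 1 < g < N:
            return g  # nontrivial factor found
    return None
```

With the toy choice $f(x) = x^2 - 1$, whose roots are $\pm 1$ modulo each prime, a trial succeeds whenever $a \equiv \pm 1$ modulo exactly one of the primes, which already hints at how the roots of $f$ drive the success probability.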
Exiting enables recovering the two prime factors of $N$—but a final message warns of a curse of vast unknowable consequences.
How many iterations must one expect to make through this maze before exit? How and when can the choice of the polynomial $f$ speed up the exploration? That is the mystery.
Our goal is to expose the innards of how the paper works, so that its edifice resembles another famous modern Paris landmark:
This is the Georges Pompidou Centre, whose anagram “go count degree prime ops” well covers the elements of the paper. Part of the work for this post—in particular the possibility of improving to —is by my newest student at Buffalo, Chaowen Guan.
Let $N = pq$ with $p$ and $q$ prime. To get the expected running time, it suffices to have good lower and upper bounds
and analogous bounds for . Then the probability of success on any trial is at least
This lower-bounds the probability of the event that $p$ divides $f(a)$ but $q$ does not, whereupon $\gcd(f(a) \bmod N, N)$ gives us the factor $p$.
We could add a term for the other way to have success, which is . However, our strategy will be to make and hence close to by considering cases where but is still large enough to matter. Then we can ignore this second possibility and focus on . At the end we will consider relaxing just so that is bounded away from .
Note that we cannot consider the events $p \mid f(a)$ and $q \mid f(a)$ to be independent, even though $p$ and $q$ are prime, because the excluded cases $p \mid a$ and $q \mid a$ may introduce bias. We could incidentally insert an initial test for $\gcd(a, N) > 1$ without affecting the time or improving the success probability by much. Then conditioned on its failure, the events $p \mid f(a)$ and $q \mid f(a)$ become independent via the Chinese Remainder Theorem. This fact is irrelevant to the algorithm but helps motivate the analysis in part.
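The Chinese Remainder Theorem step can be checked concretely with toy primes (a sketch; $p = 11$ and $q = 13$ are stand-ins): once every $a$ with $\gcd(a, N) > 1$ is excluded, the residues of $a$ modulo $p$ and modulo $q$ range over all pairs of nonzero values exactly once, so they are uniform and independent.

```python
from math import gcd

p, q = 11, 13        # toy stand-ins for the prime factors of N
N = p * q

# exclude exactly the a that an initial gcd(a, N) > 1 test would catch
units = [a for a in range(N) if gcd(a, N) == 1]
pairs = {(a % p, a % q) for a in units}

# each pair of nonzero residues occurs exactly once, so for uniform a
# the two coordinates are uniform and independent
assert len(units) == (p - 1) * (q - 1)
assert pairs == {(u, v) for u in range(1, p) for v in range(1, q)}
```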
This first analysis thus narrows the question to:
How does computing $f(a)$ change the sampling?
We mention in passing that Peter Shor’s algorithm basically shows that composing certain (non-polynomial) functions into the quantum Fourier transform greatly improves the success of the sampling. This requires, however, a special kind of machine that, according to some of its principal conceivers, harnesses the power of multiple universes. There are books and even a movie about such machines, but none have been built yet and this is not a Dan Brown novel so we’ll stay classically rooted.
Two great facts about polynomials of degree $d$ are: (1) they have at most $d$ roots; (2) $a - b$ divides $f(a) - f(b)$ for all integers $a, b$.
The second requires the coefficients of $f$ to be integers. Neither requires all the roots to be integers, but we will begin by assuming this is the case. Take $R$ to be the set of integer roots of $f$. Then define
where as usual . The key point is that
To prove this, suppose the random belongs to but not to . Then for some , so , but since . There is still the possibility that is a nonzero multiple of , which would give and deny success, but this entails and so this is accounted by subtracting off .
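The divisibility fact that drives the strands, namely that $a - b$ divides $f(a) - f(b)$ whenever $f$ has integer coefficients, is easy to sanity-check numerically (a quick sketch, not part of the paper):

```python
import random

def poly_eval(coeffs, x):
    """Evaluate an integer polynomial, coefficients constant term first."""
    v = 0
    for c in reversed(coeffs):  # Horner's rule
        v = v * x + c
    return v

# (a - b) | (f(a) - f(b)) for every integer polynomial f: spot-check it
random.seed(1)
for _ in range(1000):
    f = [random.randrange(-50, 50) for _ in range(6)]  # random degree-5 poly
    a = random.randrange(-10**6, 10**6)
    b = random.randrange(-10**6, 10**6)
    if a != b:
        assert (poly_eval(f, a) - poly_eval(f, b)) % (a - b) == 0
```

The proof is one line: $x^k - y^k$ factors as $(x - y)(x^{k-1} + \cdots + y^{k-1})$, so each monomial contributes a multiple of $a - b$.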
Our lower bound will be based on . There is one more important element of the analysis. We do not have to bound the running time for all , that is, all pairs of primes. The security of factoring being hard is needed for almost all . Hence to challenge this, it suffices to show that is large in average case over . Thus we are estimating the distributional complexity of a randomized algorithm. These are two separate components of the analysis. We will show:
For many primes belonging to a large set of primes of length substantially below , where is the length of , is “large.”
We will quantify “large” at the end, and it will follow that since is substantially greater, is “tiny” in the needed sense. Now we are ready to estimate the key cardinality .
In the best case, can be larger than by a factor of . This happens if for every root , the values do not hit any other members of . When this happens, itself can be as large as . Then
By a similar token, for any , if , then —or in general,
The factor of is the lever by which to gain a higher likelihood of quick success. When will it be at our disposal? It depends on whether is “good” in the sense that and also on itself being large enough.
For each root and define the “strand” where There are always distinct values in any strand. If then every strand has most as non-roots. There is still the possibility that —that is, such that —which would prevent a successful exit. This is where really comes in, attending to the upper bound .
The Paris church of St. Sulpice and its crypt (source)
Hence what can make a prime “bad” is having a low number of strands. When and the strands and coincide—and this happens for any other such that divides .
Here is where we hit the last important requirement on . Suppose where is the product of every prime other than . Then and coincide for every prime . It doesn’t matter that is astronomically bigger than or ; the strands still coincide within and within .
Hence what we need to do is bound the roots by some value that is greater than any we are likely to encounter. The is not too great: if we limit to of some same given length as that of , then so . We need not impose the requirement but must replace above by where . We can’t get in trouble from such that divides and divides since then divides already. This allows the key observation:
For any distinct pair , there are at most primes such that divides .
Thus given we have “slots” for primes . Every bad prime must occupy a certain number of these slots. Counting these involves the last main ingredient in Dick’s paper. We again try to view it a different way.
Given , and replacing the original with , ultimately we want to call a prime bad if , where . We will approach this by calling “bad” if there are strands.
For intuition, let’s suppose , . If we take as the paper does, then we can make bad by inserting it into three slots: say , , and . We could instead insert a copy of into , , and , which lumps into one strand and leaves free to make two others. In the latter case, however, we also know by transitivity that divides , , and as well. Thus we have effectively used up not slots on . Now suppose instead, so “bad” means getting down to strands. Then we are forced to create at least one -clique and this means using more than slots. Combinatorially the problem we are facing is:
Cover nodes by cliques while minimizing the total number of edges.
This problem has an easy answer: make all cliques as small as possible. Supposing is an integer, this means making -many -cliques, which (ignoring the difference between and ) totals edges. When is constant this is , but shows possible ways to improve when is not constant. We conclude:
Lemma 1 The number of bad primes is at most .
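The arithmetic behind “make all cliques as small as possible” can be tabulated directly (a sketch with hypothetical values of $m$ nodes and clique size $k$):

```python
def cover_edges(m, k):
    """Edges used when m nodes are partitioned into m // k cliques of
    size k: (m/k) * C(k, 2) = m * (k - 1) / 2, so smaller cliques are
    always cheaper."""
    assert m % k == 0, "for simplicity, assume k divides m"
    return (m // k) * (k * (k - 1) // 2)

# e.g. covering 12 nodes with cliques of size 2, 3, 4, 6
# costs 6, 12, 18, 30 edges respectively
```

The linear growth of $m(k-1)/2$ in $k$ is what makes larger forced cliques use up more “slots” in the counting argument.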
We will constrain by bounding the degree of . By this will also bound relative to so that the number of possible strands is small with respect to , which will lead to the desired bound on . Now we are able to conclude the analysis well enough to state a result.
Define to mean problems solvable on a fraction of inputs by randomized algorithms with expected time . Superscripting means having an oracle to compute from for free. If is such that the time to compute is and this is done once per trial, then a given algorithm can be re-classified into without the oracle notation.
Theorem 2 Suppose is a sequence of polynomials in of degrees having integer roots in the interval , for all . Then for any fixed , the problem of factoring -bit integers with belongs to
provided and .
Proof: We first note that the probability of a random making is negligible. By the Chinese Remainder Theorem, as remarked above, gives independent draws and and whether depends only on . This induces a polynomial over the field of degree (at most) . So the probability of getting a root mod is at most
which is exponentially vanishing. Thus we may ignore in the rest of the analysis. The chance of a randomly sampled in a strand of length coinciding with another member of is likewise bounded by and hence ignorable.
The reason why the probability of giving a root in the field is not vanishing is that is close to . By , we satisfy the constraint
The condition ensures that this is the actual asymptotic order of . Since we are limiting attention to primes of the same length as , the “” above can be to base . Hence has the right order to give that for some and constant fraction of primes of length , the success probability of one trial over satisfies
Hence the expected number of trials is . The extra in the theorem statement is the time for each iteration, i.e., for arithmetic modulo and the Euclidean algorithm.
It follows that if $f$ is also computable modulo $N$ in time, and presuming so that , then factoring products of primes whose lengths differ by just a hair is in randomized average-case polynomial time. Of course this depends on the availability of a suitable polynomial $f$. But $f$ could be any polynomial—it needs no relation to factoring other than having plenty of distinct roots relative to its degree as itemized above. Hence there might be a lot of scope for such “dangerous” polynomials to exist.
Is there a supercomputer under the Palais Royal? (source)
Dick’s paper does give an example where a with specified properties cannot exist, but there is still a lot of play in the bounds above. This emboldens us also to ask exactly how big the “hair” needs to be. We do not actually need to send toward zero. If a constant fraction of the values get bounced by the event, then the expected time just goes up by the same constant factor.
We have tried to present Dick’s paper in an “open” manner that encourages variations of its underlying enigma. We have also optically improved the result by using rather than as in the paper. However, this may be implicit anyway since the paper’s proofs might not require “” to be constant, so that by taking one can make for any desired factor . Is all of this correct?
If so, then possibly one can come even tighter to for the length of . Then the question shifts to the possibilities of finding suitable polynomials . The paper “Few Product Gates But Many Zeroes,” by Bernd Borchert, Pierre McKenzie, and Klaus Reinhardt, goes into such issues. This paper investigates “gems”—that is, integer polynomials of degree having distinct integer roots and minimum possible circuit complexity for their degree—finding some for as high as 55 but notably leaving open. Moreover, the role of a limitation on the magnitude of a constant fraction of a gem’s roots remains at issue, along with roots exceeding having many relatively small prime factors.
Finally, we address the general case with rational coefficients (and ). If in lowest terms then means (divided by ) so the algorithm is the same. Suppose is a rational root in lowest terms and does not divide , nor the denominator of any Then we can take such that for some and define . This gives
which we write as . Then . Because is a root, it follows that is a sum of terms in which each numerator is a multiple of and each denominator is not. So in lowest terms where possibly . Thus either yields or falls into one of two cases we already know how to count: is another root of or we have found a root mod . Since behaves the same as for all , we can define integer “strands” as before. There remains the possibility that strands induced by two roots and coincide. Take the inverse for and resulting integer , then the strands coincide if . This happens iff . Multiplying both sides by gives
so it follows that divides the numerator of in lowest terms. Thus we again have “slots” for each distinct pair of rational roots and each possible prime divisor of the numerator of their difference. Essentially the same counting argument shows that a “bad” must fill of such slots. The other ways can be bad include dividing the denominator of a root or the denominator of a coefficient —although neither way is mentioned in the paper it seems the choices for and in the above theorem leave just enough headroom. Then we just need to be a bound on all numerators and denominators involved in the bad cases, arguing as before. Last, it seems as above that only a subset of the roots with constant (or at least non-negligible) is needed to obey this bound. Assuming this sketch of Dick’s full argument is airtight and works for our improved result, we leave its possible further ramifications over the integer case as a further open problem.
Update 7/10: I’ve made a web folder of photos from Dick’s wedding and honeymoon.
[linked wedding announcement, clarified nature of Shor’s phi, added more about gems, linked photos]
Anna Gilbert and Atri Rudra are top theorists who are well known for their work in unraveling secrets of computation. They are experts on anything to do with coding theory—see this for a book draft by Atri with Venkatesan Guruswami and Madhu Sudan called Essential Coding Theory. They also do great theory research involving not only linear algebra but also much non-linear algebra of continuous functions and approximative numerical methods.
Today we want to focus on a recent piece of research they have done that is different from their usual work: It contains no proofs, no conjectures, nor even any mathematical symbols.
Their new working paper is titled, “Teaching Theory in the time of Data Science/Big Data.” As you might guess it is about the role of theory in the education of computer scientists today. The paper contains much information that they have collected on what is being taught at some of the top departments in computer science, and how the current immense interest in Big Data is affecting classic theory courses.
A short overview of what they find is:
The above is leading to pressure to delete and/or modify theory courses. From Atri’s CS viewpoint and Anna’s as Mathematics faculty active in the theory community, both wish to see CS majors obtain degrees that leave them well versed in CS in general and theory in particular. Undergraduates in programs with a CS component should likewise be well served in formal and mathematical areas. Is this possible given the finite constraints on the curriculums? It is not clear, but their paper shows what is happening right now with theory courses (plus linear algebra and probability/statistics), what is being planned for the near future, and some options that may be useful to consider.
§
For the purpose of this post, we made some edits to their text which follows, with their permission. Some changes were stylistic and some more content-oriented. Their PDF version linked as above may evolve over time—especially upon success of their appeal for reader input at the end. So to obtain a complete and current picture please visit their paper too.
Now Anna and Atri speak:
The genesis of this article is a conversation between the two authors that started six weeks ago. One of us (Anna) was giving a talk at an NSF workshop on Theoretical Foundations of Data Science (TFoDS) and the other (Atri) was thinking about changes to the Computer Science (henceforth CS) curriculum that his department at the University at Buffalo is considering. Anna’s talk at NSF, which included data on theory courses at top ranked schools, generated a great deal of interest in knowing even more about the state of theory courses. This was followed by more data collection on our part.
This post is meant as a starting point of discussion on how we teach theory courses, especially in the light of the increased importance of data science. It is not a position paper—it does not argue that the current trends are inherently good or bad, nor does it prescribe any silver bullet. We do suggest some possible courses of action around which discussion can begin.
CS enrollments as well as the numbers of CS majors have increased exponentially in the last few years. In 2014, Ed Lazowska, Eric Roberts, and Jim Kurose exhibited the trend in enrollments, not only in majors. Their graphs in Figure 1 show the trend in introductory CS course enrollments at six institutions in the years 2006–2014.
Figure 1. Enrollment trends in introductory CS sequences at six institutions (Stanford, MIT, University of Pennsylvania, Harvard, University of Michigan, and University of Washington) from 2006–2014. |
Lazowska’s presentation has more detailed statistics and a discussion of the potential implications of these increases. These trends remain valid in 2016, for example as shown by the following chart for the University at Buffalo. In addition to total number of CSE majors, it shows the enrollment in CSE 115 (the introduction to CSE course), CSE 191 (Discrete Math), CSE 250 (Data Structures), CSE 331 (Algorithms) and CSE 396 (Theory of Computation), all of which are required of all CS majors:
Figure 2. Enrollment trends, University at Buffalo CSE 8/08–5/16, with total majors. |
As enrollments out-pace hiring, class sizes have exploded. Lazowska points out that over 10% of Princeton’s majors are CS majors, while it is highly unlikely that 10% of Princeton’s faculty will ever be CS faculty. At the same time, many institutions are re-evaluating and changing their theoretical computer science (henceforth TCS) course requirements and content.
The twin pressures of staffing and content are shifting priorities in both the material covered and how it is covered—e.g., reducing emphasis on proofs and essay-type problems which are harder to grade. We are not judging these shifts or tying them directly to enrollments, but are for now observing that they are happening and impact a large (and increasing) number of students.
The changes in course content, in emphasis on particular TCS components, and in overall CS requirements (including mathematics and statistics) are occurring exactly when there is a big move towards “computational thinking” in many fields and a national emphasis on STEM education more broadly. Not only are the fundamental backgrounds of incoming CS majors thereby changing, but the CS audience is expanding to students in other fields that are benefiting from solid computational foundations. With the increasing role of data and concomitant needs for machine learning and statistics, it is important to obtain a deep understanding of the mathematical foundations of data science. Traditional TCS has been founded on discrete mathematics, but “continuous” math—especially as related to statistics, probability, and linear algebra—is increasingly important in ways also reflected by cutting-edge TCS research.
We considered the top 20 CS schools according to the US News ranking of graduate programs, numbering 24 including ties. It may be inappropriate to use graduate-program rankings to consider undergraduate program requirements, and it should be noted that the rankings cover the whole graduate program, not just TCS, but this is a reasonable starting point. We sent colleagues a short survey and collected data (available spreadsheet) on these 24 schools. Since several schools include Engineering in one department (as at Buffalo) or in a separate department (as at Michigan), we will use “CSE” as the collective term.
We counted the total number of theory courses that all CS majors have to take within the CSE department and then calculated the fraction over the total number of required courses. We categorized the theory courses under these bins: Discrete Mathematics, Data Structures, Algorithms, and Theory of Computation.
The bounds are not sharp—a Data Structures course always covers algorithms associated to the data structures and may overlap with an Algorithms course especially when graphs are covered—and Algorithms often includes some complexity theory, especially NP-completeness. In our spreadsheet these columns are followed by the number of theory electives—besides these required courses—that all CS majors have to take. We would like to clarify four things:
We begin with statistics on the total number of semesters of theory courses that are currently required of all CS majors, standardly equating 3 quarters or trimesters to 2 semesters. The basic statistics are in Table 1.
The median number of semester-long courses was three. All but one school requires a discrete math course, all but two require a Data Structures course, and all but nine require an Algorithms course. Eight schools require a Theory of Computation course separate from Algorithms. All these schools have a significant programming component in their Data Structures course. Only one, Cornell, currently adds programming assignments in the required algorithms course. We would like to remind the reader that we are only considering TCS courses required of all CS majors—for instance, CS 124/125 at Harvard has programming assignments but is not required of all CS majors.
We limited attention to cases where courses in Probability/Statistics and/or Linear Algebra are required of all CS majors but taught outside of CSE. We focus on these two courses since they are most relevant to data science.
Probability/Statistics. Of those surveyed, nineteen schools required a Probability/Statistics course, while five did not. Five had developed a specific required course within the CSE department (Stanford, Berkeley, UIUC, Univ. of Washington, and MIT), three had choices among courses both inside and outside the CSE department, and eleven required a course outside CSE. Of the five institutions that did not require a Probability/Statistics course, two (Univ. of Wisconsin and Harvard) listed such a course among electives in Mathematics. Princeton, Yale, and Brown do not list such a course.
Linear Algebra. Sixteen surveyed schools require a Linear Algebra course, out of 24 total. Of the 16, only Brown and Columbia provide a linear algebra course within CSE that satisfies the requirement, though both allow for non-CSE linear algebra courses.
After reflecting on the data in relation to our initial observations about increasing CS enrollments and emphasis on computational thinking across disciplines, we dug deeper and asked people further questions about changes they have seen or are discussing at their institutions. Of eight departments responding (as of 6/10/16):
Four universities changed their Mathematics requirements in the last 10 years. These changes are primarily to require fewer semesters of Calculus II or III (e.g., some no longer require Ordinary Differential Equations) and, instead, require Linear Algebra and/or Probability/Statistics (whether inside the CSE department or not). Two institutions plan to make changes in the future, likely to require Linear Algebra.
We suggest that now is the time to re-think some of the theory curriculum, to work with our colleagues in Mathematics and Statistics, and to develop mathematical foundations classes that are appropriate both for CS majors and STEM majors more broadly. Especially for CS majors, this exposure should come no later than junior year. Here are some starting points for this discussion.
Our goal is to educate the different students at our respective institutions as best we can, by working with our colleagues at our home institutes and by having a dialogue with our theory colleagues across the country.
After sending emails initially to friends in our social networks to gather data and/or supplement the above preliminary analysis, we noted that we had asked only three women total. We then mused on how we could have increased that number by thinking a bit harder about which women were in our social network and whether the institutions we collected figures for had women theorists. We found that, upon reflection, we could have asked eight more women in our social networks, for a total of 11 women theorists, each at a different school, among the top 24 institutions. There are certainly more than 11 institutions with women theorists but either the women faculty are in areas we are not familiar with or they are women in our areas whom we do not yet know personally (e.g., new, junior faculty). In other words, a ten-minute reflection yielded an almost four-fold increase in representatives from an under-represented group.
We recognize that our sample covers only 24 top institutions. This was done mostly to reduce work on our part since the first data was collected by reading the relevant curricula webpages. Needless to say, a better picture of TCS and math requirements for CS degrees in schools in the US can be gained with more data. We are hoping that readers of this blog at many more institutions can make valuable contributions to our data collection and discussion. Those of you interested can contribute your institution’s information to this survey by filling in a Google form. We will periodically update the master spreadsheet with information that we get from this Google form.
We join Anna and Atri in their appeal which ends their paper: the destiny of theory courses can be considered as one large “open problem.” They conclude by thanking those who have already contributed data and others at Michigan and Buffalo and Georgia Tech (besides us) and MIT for inputs to their article.
We have a few remarks of our own: The main ulterior purpose of theory courses is to sharpen analytical modes of thinking and linear deductive argument, among skills often lumped into the general term “mathematical maturity.” The Internet and advances in technology have brought greater and quicker rewards for non-linear, associative, and more-visual modes. These might seem to compete with or even replace “theory,” but the point behind Anna and Atri’s post is that while diffused among more courses in various areas, the need for analytical and linear-deductive experience grows overall.
What emerges is a greater call for mathematical maturity before capstone courses in these areas, as opposed to the view that a required theory course can be taken in the senior year. Shifting TCS material into an early discrete mathematics course may accomplish this. As we have discussed in Buffalo, this could accompany an across-the-board upgrade in rigor of our entry curriculum, but that may discourage some types of students. That in turn might slow increased enrollments—amid several feedback loops whose consequences are an open problem.
[clarified in Buffalo figure that “Total” means majors.]
Ernie Croot, Vsevolod Lev, and Péter Pach (CLP) found a new application of polynomials last month. They proved that every subset of $\mathbb{Z}_4^n$ of size at least $4^{cn}$, for a fixed constant $c < 1$, has three distinct elements $x, y, z$ such that $x + z = 2y$. Jordan Ellenberg and Dion Gijswijt extended this to $\mathbb{F}_q^n$ for prime powers $q$. Previous bounds had the form $q^n/n^{1+\epsilon}$ at best. Our friend Gil Kalai and others observed impacts on other mathematical problems including conjectures about sizes of sunflowers.
Today we congratulate them—Croot is a colleague of Dick’s in Mathematics at Georgia Tech—and wonder what the breakthroughs involving polynomials might mean for complexity theory.
What’s amazing is that the above papers are so short, including a new advance by Ellenberg that is just 2 pages. In his own post on the results, Tim Gowers muses:
[The CLP argument presents a stiff challenge to my view that] mathematical ideas always result from a fairly systematic process—and that the opposite impression, that some ideas are incredible bolts from the blue that require “genius” or “sudden inspiration” to find, is an illusion that results from the way mathematicians present their proofs after they have discovered them. …[T]he argument has a magic quality that leaves one wondering how on earth anybody thought of it.
We don’t know if we can explain the source of the ‘magic’ but we will try to describe it in a way that might help apply it.
At top level there is no more sleight-of-hand than a simple trick about matrix rank. We discussed ideas of rank some time ago.
If a matrix $M$ is a sum of $m$ matrices each of rank at most $1$, then any condition that would force $M$ to have rank greater than $m$ must be false.
A simple case is where the condition zeroes every off-diagonal element of $M$. Then the main diagonal can have at most $m$ nonzero entries. This actually gets applied in the papers. The fact that column rank equals row rank also helps for intuition, as Peter Cameron remarks.
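The rank trick is easy to check numerically. Here is a small sketch in Python with NumPy (our own illustration, not from the papers) confirming that a sum of $m$ rank-one matrices has rank at most $m$, while a fully nonzero diagonal has full rank:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build M as a sum of m rank-one matrices u v^T.
m, n = 5, 20
M = sum(np.outer(rng.standard_normal(n), rng.standard_normal(n)) for _ in range(m))

# The rank of a sum is at most the sum of the ranks.
assert np.linalg.matrix_rank(M) <= m

# A diagonal matrix's rank is its number of nonzero diagonal entries,
# so a condition forcing M to be diagonal leaves at most m of them nonzero.
D = np.diag(np.arange(1, n + 1, dtype=float))
assert np.linalg.matrix_rank(D) == n
```

So if some hypothesis made such an $M$ diagonal with more than $m$ nonzero entries, the hypothesis would contradict the rank bound.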
A second trick might be called “degree-halving”: Suppose you have a polynomial $P(x, y)$ in two blocks of variables with total degree $d$. Even if $P$ is irreducible, $P$ might be approximated or at least “subsumed” term-wise by degree-$d$ products $g(x)\,h(y)$. When $P$ is multi-linear, or at least of bounded degree in each variable—call this bound $b$—we may get $P = \sum_i g_i(x)\,h_i(y)$ where each $\deg(g_i h_i)$ is close to $d$.
In any case, for each $i$ at least one of $g_i, h_i$ must have degree at most $d/2$, say $g_i$. If we can treat $h_i$ and its variables as parameters, maybe even substitute them by well-chosen constants, then we are down to $g_i$ of degree at most $d/2$. Then $P$ is a sum of terms each having a monomial of total degree at most $d/2$ in variables each with power at most $b$.
The number of such monomials is relatively small. This limits the dimension of spaces spanned by such terms, which may in turn connect to the rank bound above and/or limit the size of exceptional subsets of the whole space. We discussed Roman Smolensky’s famous use of the degree-halving trick in circuit complexity here.
These tricks of linear algebra and degree are all very well, but how can we use them to attack our problem? We want to bound the size of subsets $A \subseteq \mathbb{F}_q^n$ having no element $x$ such that for some nonzero $y$, the elements $x - y$, $x$, and $x + y$ all belong to $A$. This is equivalent to $A$ having no three distinct elements $a, b, c$ such that $a + c = 2b$. This means that the following two subsets of $\mathbb{F}_q^n$ are disjoint: the set of sums $a + c$ over distinct $a, c \in A$, and the set $2A = \{2b : b \in A\}$.
How can we use polynomials to gain leverage on this? The insight may look too trivial to matter:
Any polynomial supported only on $2A$ must vanish on the sums of distinct elements of $A$.
Let $B$ be the complement of $2A$ and let $V$ be the space of polynomials vanishing on $B$ that belong to our space $P$ of low-degree polynomials. We can lower-bound the size of $V$ by observing that the map from $p \in P$ to the graph of its values on $B$ is a linear transformation. Its image has dimension at most $|B|$, and since $V$ is the kernel, we have $\dim(P) = \dim(V) + \dim(\mathrm{image})$, so $\dim(V) \geq \dim(P) - |B|$.
Well, this is useless unless $\dim(P) > |B|$, but $B$ is the complement of $2A$, which is no bigger than the set $A$ we are trying to upper-bound. So it is useless—unless $A$ is pretty big. So we need to choose the degree $d$—and maybe $P$ itself—to be not so low. We can do this, but how can this lower bound on $\dim(V)$ help? We need a “clashing” upper bound. This is where the presto observation by CLP came in.
Given the set $A$, make a matrix whose entry in row $a$, column $b$, is $a + b$. In APL notation this is the “$+$ outer-product” of $A$ with itself. Its diagonal is $2A$ and the rest is sums of distinct elements of $A$.
Now apply $p \in V$ to every entry to get a matrix $M$. Since every off-diagonal entry lies outside $2A$, every off-diagonal entry of $M$ vanishes, so $M$ is a diagonal matrix. Its rank is hence the number of nonzero diagonal entries. If we can upper-bound this rank by some $r$, then we can upper-bound $|V|$ by the hoc-est-corpus rubric of description complexity:
Every $p \in V$ can be described by its up-to-$r$ nonzero values on $2A$, so there are at most $\binom{|2A|}{r}(q-1)^r$ of them.
The papers use bounds on the dimension of $V$ in place of description complexity, but this is enough to see how to get some kind of upper bound. Since $|V| \geq q^{\dim(P) - |B|}$, taking logs base $q$ gives us $\dim(P) - |B| \leq \log_q \binom{|2A|}{r} + r$.
It remains to bound $r$, but it seems to take X-ray vision just to see that a bound can give us anything nontrivial. OK, any fixed bound on $r$ makes the right-hand side only $O(\log_q |A|)$ which yields a contradiction, so there is hope. The rank trick combines with degree-halving to pull a bound involving $m_{d/2}$, the number of monomials of total degree at most $d/2$, out of the hat. Here is the version by Ellenberg and Gijswijt where a nonce choice of $d$ suffices and the coefficients on $a$ and $b$ in $a + b$ are replaced by a general triple $(\alpha, \beta, \gamma)$ of nonzero elements such that $\alpha + \beta + \gamma = 0$:
Lemma 1 With $q$, $n$, and $d$ as above, put $V$ to be the set of polynomials in $P$ that vanish on the complement of $-\gamma A$. Then for all $p \in V$ there are at most $2m_{d/2}$ values $a \in A$ for which $p(-\gamma a) \neq 0$.
Proof: Let $x$ and $y$ be vectors of $n$ variables, and write $p(\alpha x + \beta y) = \sum_{m, m'} c_{m,m'}\, m(x)\, m'(y)$,
where each coefficient $c_{m,m'}$ is in $\mathbb{F}_q$ and the sum is over pairs of monomials whose product has degree at most $d$ and at most $q - 1$ in any variable. Collect the terms in which $m(x)$ has total degree at most $d/2$ separately from those where $m'(y)$ does, so that we get $p(\alpha x + \beta y) = \sum_{m} m(x)\, f_m(y) + \sum_{m'} g_{m'}(x)\, m'(y)$,
where each $f_m$ and $g_{m'}$ is an arbitrary function and now each sum is over monomials of total degree at most $d/2$ (and still no more than $q - 1$ in any variable if we care). Now look at the matrix $M$ whose $(a, b)$ entry is $p(\alpha a + \beta b)$, including the diagonal where $b = a$. We have $M = \sum_{m} m(a)\, f_m(b) + \sum_{m'} g_{m'}(a)\, m'(b)$.
Each term gives a rank-one matrix, so this is a sum of at most $2m_{d/2}$ rank-one matrices, and the rank of $M$ is at most that. Since $M$ is a diagonal matrix—$p \in V$ makes the off-diagonal entries vanish—there are at most $2m_{d/2}$ nonzero values $p(-\gamma a)$ over $a \in A$.
Sawing the degree in half stacks $m_{d/2}$ up against $\dim(P)$. We retain freedom to choose $d$ (and possibly the triple $(\alpha, \beta, \gamma)$) to advantage. There are still considerable numerical details needed to ensure this works and tweaks to tighten bounds—for which we refer to the papers—but we have shown the “Pledge,” the “Turn,” and the “Prestige” of the argument.
Can you find more applications of the polynomial technique besides those enumerated in the papers and posts we have linked? For circuit complexity we’d not only like to go from $\mathbb{F}_q$ back to $\mathbb{Z}_4$ as CLP have it, but also get results for when the modulus is not a prime power. Can we make assumptions (for sake of contradiction) that create situations with higher “leverage” than the two sets merely being disjoint?
[changed subtitle; linked hoc-est-corpus which literally means, “here is the body”; deleted and changed remarks before “Open Problems”; inserted tighter sum into description complexity formula.]
Shiteng Chen and Periklis Papakonstantinou have just written an interesting paper on modular computation. Its title, “Depth Reduction for Composites,” means converting a depth-$d$, size-$s$ circuit into a depth-2 circuit that is not too much larger in terms of $d$ as well as $s$.
Today Ken and I wish to talk about their paper on the power of modular computation.
One of the great mysteries in computation, among many others, is: what is the power of modular computation over composite numbers? Recall that a $\mathrm{MOD}_m$ gate outputs $1$ if the sum of its inputs is divisible by $m$ and $0$ otherwise. It is a simple computation: Add up the inputs modulo $m$ and see if the sum is $0$. If so output $1$, else output $0$. This can be recognized by a finite-state automaton with $m$ states. It is not a complex computation by any means.
But there lurk in this simple operation some dark secrets. When $m$ is a prime $p$ the theory is fairly well understood. There remain some secrets, but by Fermat’s Little Theorem a $\mathrm{MOD}_p$ gate has the same effect as a polynomial over $\mathbb{F}_p$. In general, when $m$ is composite, this is not true. This makes understanding $\mathrm{MOD}_m$ gates over composites much harder: simply because polynomials are easy to handle compared to other functions. As I once heard someone say:
“Polynomials are our friends.”
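As a concrete illustration of the gate defined above, here is a minimal sketch in Python (the function names are ours) of a $\mathrm{MOD}_m$ gate together with its $m$-state automaton view:

```python
from itertools import product

def mod_gate(m, bits):
    """MOD_m gate: output 1 iff the sum of the input bits is 0 modulo m."""
    return 1 if sum(bits) % m == 0 else 0

def mod_gate_automaton(m, bits):
    """The same computation as a finite automaton with m states: the state
    is the running sum modulo m, and state 0 accepts."""
    state = 0
    for b in bits:
        state = (state + b) % m
    return 1 if state == 0 else 0

# The two agree on every 8-bit input, e.g. for the composite modulus m = 6.
assert all(mod_gate(6, bs) == mod_gate_automaton(6, bs)
           for bs in product([0, 1], repeat=8))
```

The automaton view is what makes the gate “simple”; the mystery is what layered combinations of such gates can compute when $m$ is composite.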
Chen and Papakonstantinou (CP) increase our understanding of modular gates by proving a general theorem about the power of low-depth circuits with modular gates. This theorem is an exponential improvement over previous results when the depth is regarded as a parameter rather than a constant. Their work also connects with the famous work of Ryan Williams on the relation between $\mathsf{ACC}$ and $\mathsf{NEXP}$.
We will just state their main result and then state one of their key lemmas. Call a circuit of $\mathrm{AND}$, $\mathrm{OR}$, $\mathrm{NOT}$, and $\mathrm{MOD}_m$ gates (for some $m$) an $\mathrm{ACC}$-circuit.
Theorem 1 There is an efficient algorithm that given an $\mathrm{ACC}$ circuit of depth $d$, input length $n$, and size $s$, outputs a depth-2 circuit of the form $\mathrm{SYM} \circ \mathrm{AND}$ of size $2^{(\log s)^{O(d)}}$, where $\mathrm{SYM}$ denotes some gate whose output depends only on the number of $1$s in its input.
This type of theorem is a kind of normal-form theorem. It says that any circuit of a certain type can be converted into a circuit of a simpler type, and this can be done without too much increase in size. In complexity theory we often find that it is very useful to replace a complicated type of computational circuit with a much cleaner type of circuit even if the new circuit is bigger. The import of such theorems is not that the conversion can happen, but that it can be done in a manner that does not blow up the size too much.
This happens all through mathematics: finding normal forms. What makes computational complexity so hard is that the conversion to a simpler type often can be done easily—but doing so without a huge increase in size is the rub. For example, every map $f : S \to \mathbb{Z}$
can be easily shown to be equal to an integer-valued polynomial with coefficients in $\mathbb{Q}$ provided $S$ is a finite subset of $\mathbb{Z}^n$. For every point $u \in S$, set $p_u(x) = \prod_{i=1}^{n} \prod_{v \neq u_i} (x_i - v)$,
where the inner product is over the finitely many values $v \neq u_i$ that appear in the $i$-th place of some member of $S$. Then $p_u(u)$ is a nonzero integer and $u$ gives the only nonzero value of $p_u$ on $S$. We get $P(x) = \sum_{u \in S} \frac{f(u)}{p_u(u)}\, p_u(x)$,
which is a polynomial that agrees with $f$ on $S$.
Well, this is easy but brutish—and exponential size if $S$ is. The trick is to show that when $S$ is special in some way then the size of the polynomial is not too large.
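The brutish construction can be carried out mechanically. Here is a sketch in Python (exact rational arithmetic; the helper names are ours) that builds $P$ from the indicator polynomials $p_u$ and checks agreement with $f$ on $S$:

```python
from fractions import Fraction

def interpolate(S, f):
    """Return a function computing a polynomial P that agrees with f on the
    finite set S of integer tuples."""
    n = len(next(iter(S)))
    # values appearing in the i-th coordinate of some member of S
    coords = [sorted({u[i] for u in S}) for i in range(n)]

    def p_u(u, x):
        # vanishes at every w in S with w != u; nonzero at u itself
        prod = Fraction(1)
        for i in range(n):
            for v in coords[i]:
                if v != u[i]:
                    prod *= (x[i] - v)
        return prod

    def P(x):
        return sum(Fraction(f(u)) * p_u(u, x) / p_u(u, u) for u in S)
    return P

S = {(0, 1), (2, 3), (1, 1)}
f = lambda u: u[0] * u[0] - u[1]
P = interpolate(S, f)
assert all(P(u) == f(u) for u in S)
```

Note the exponential blowup lurking here: the number of indicator terms is $|S|$, and each is a product over all coordinate values, which is exactly the size problem the text describes.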
One of the key insights of CP is a lemma, Lemma 5 in their paper, that allows us to replace a product of many $\mathrm{MOD}$ gates by a summation. We have changed variables in the statement around a little; see the paper for the full statement and context.
Lemma 5 Let $x_1, \ldots, x_t$ be variables over the integers and let $p$ and $q$ be relatively prime. Then there exist integral linear combinations $\ell_1, \ldots, \ell_r$ of the variables and integer coefficients $c_1, \ldots, c_r$ so that the product of the modular tests on the variables can be written as a single modular test of the sum of the $c_k \ell_k$.
The value of $q$ can be composite. The final modulus can be $pq$ in place of $p$, and this helps in circuit constructions. Three points to highlight—besides products being replaced by sums—are:
Further, all of this can be done in a uniform way, so the lemma can be used in algorithms. This is important for their applications. Note this is a type of normal-form theorem like we discussed before. It allows us to replace a product by a summation. The idea is that going from products to sums is often a great savings. Think about polynomials: the degree of a multivariate polynomial is often a better indicator of its complexity than its number of terms. The lemma enables them to remove layers of large gates that were implementing the products (Lemma 8 in the paper) and so avoids the greatest source of size blowup in earlier constructions.
A final point is that the paper makes a great foray into mixed-modulus arithmetic, coupled with the use of exponential sums. This kind of arithmetic is not so “natural” but is well suited to building circuits. Ken once avoided others’ use of mixed-modulus arithmetic by introducing new variables—see the “additive” section of this post which also involves exponential sums.
The result of CP seems quite strong. I am, however, very intrigued by their Lemma 5. It seems that there should be other applications of this lemma. Perhaps we can discover some soon.
A way to recover and enforce privacy
McNealy bio source |
Scott McNealy, when he was the CEO of Sun Microsystems, famously said nearly 15 years ago, “You have zero privacy anyway. Get over it.”
Today I want to talk about how to enforce privacy by changing what we mean by “privacy.”
We seem to see an unending series of breaks into databases. There is of course a huge amount of theory literature and methods for protecting privacy. Yet people are still broken into and lose their information. We wish to explore whether this can be fixed. We believe the key to the answer is to change the question:
Can we protect data that has been illegally obtained?
This sounds hopeless—how can we make data that has been broken into secure? The answer is that we need to look deeper into what it means to steal private data.
The expression “the horse has left the barn” means:
Closing/shutting the stable door after the horse has bolted, or trying to stop something bad happening when it has already happened and the situation cannot be changed.
Indeed, our source gives as its main example: “Improving security after a major theft would seem to be a bit like closing the stable door after the horse has bolted.”
Photo by artist John Lund via Blend Images, all rights reserved. |
This strikes us as the nub of privacy. Once information is released on the Internet, whether by accident or by a break-in, there seems to be little that one can do. However, we believe that there may be hope to protect the information anyway. Somehow we believe we can shut the barn door after the horse has left, and get the horse back.
Suppose that some company makes a series of decisions. Can we detect whether those decisions depend on information that it should not be using? Let’s call this Post-Privacy Detection.
Consider a database that stores values $(x, y)$ where $x$ is an $n$-bit vector of attributes and $y$ is a private attribute. Think of $y$ as small, even a single bit such as the sex of the individual with attributes $x$. Let us also suppose that the database is initially secure for $y$ insofar as given many samples of the values of $x$ only, it is impossible to gain advantage in inferring the values of $y$. Thus the leak of $y$ is meaningful information.
Now say a decider $D$ is an entity that uses information from this database to make decisions. $D$ has one or more Boolean functions $f(x, y)$ of the attributes. Think of $f$ as a yes/no on some issue: granting a loan, selling a house, giving insurance at a certain rate, and so on. The idea is that while $y$ may not be secret—the database has been broken into—we can check that in aggregate $y$ is effectively secret.
The point here is that we can detect if $y$ is being used in an unauthorized manner to make some decision, given protocols for transparency that enable sampling the values $f(x, y)$. If given a polynomial number of samples we cannot tell the $y$’s within some tolerance, then we have large-scale assurance that $y$ was not material to the decision. Our point is this: a leak of values about individuals is material only if they are used by someone to make a decision that should not depend on their “private” information. Thus if a bank gets values of $y$, but does not use them to make a decision, then we would argue that that information while public was effectively private.
Definition 1 Let a database contain values of the form $(x, y)$, and let $f(x, y)$ be a Boolean function. Say that the $y$ part is effectively private for the decision $f$ provided there is another function $g$ so that $\Pr[f(x, y) = g(x)] \geq 1 - \epsilon$,
where the probability is over sampling $(x, y)$ from the database. A decider respects $y$-privacy if $y$ is effectively private in all of its decision functions.
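To make the definition concrete, here is a toy sketch (entirely our own construction, with made-up data) that estimates the best agreement $\Pr[f(x,y) = g(x)]$ achievable by a $y$-oblivious $g$, taking $g(x)$ to be the majority vote of $f$ over the sampled records with attribute vector $x$:

```python
import random
from collections import defaultdict

random.seed(1)
# Toy database: x is a 4-bit attribute vector, y a private bit (all made up).
db = [(tuple(random.randint(0, 1) for _ in range(4)), random.randint(0, 1))
      for _ in range(2000)]

def f_leaky(x, y):       # a decision that is really just y
    return y

def f_oblivious(x, y):   # a decision that ignores y entirely
    return x[0] ^ x[1]

def best_agreement(f, db):
    """Best Pr[f(x,y) = g(x)] over all g: take g(x) to be the majority vote
    of f over the records sharing the attribute vector x."""
    votes = defaultdict(list)
    for x, y in db:
        votes[x].append(f(x, y))
    matched = sum(max(vs.count(0), vs.count(1)) for vs in votes.values())
    return matched / len(db)

assert best_agreement(f_oblivious, db) == 1.0  # y is effectively private here
assert best_agreement(f_leaky, db) < 0.7       # y is material to this decision
```

The oblivious decision is matched perfectly by a function of $x$ alone, while the leaky one cannot be matched much better than chance, which is the signature the sampling protocol would look for.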
We can prove a simple lemma showing that this definition implies that is not compromised by sampling the decision values.
Lemma 2 If the database is secure for $y$ and $y$ is effectively private, then there is no function $h$ such that $\Pr[h(x, f(x, y)) = y]$ is non-negligibly better than guessing.
Proof: Suppose for contradiction such an $h$ exists. Also suppose, for avoiding contradiction of effective privacy, that a function $g$ as above exists. Then given $x$ alone, we obtain $f(x, y)$ as $g(x)$ with probability at least $1 - \epsilon$. Then using $h$ we obtain $y$ with overall probability at least that of $h$ minus $\epsilon$. This contradicts the initial security of the database for $y$.
To be socially effective, our detection concept should exert influence on deciders to behave in a manner that overtly does not depend on the unauthorized information. This applies to repeatable decisions whose results can be sampled. The sampling would use protocols that effect transparency while likewise protecting the data.
Thus our theoretical notion would require social suasion for its effectiveness. This includes requiring deciders to provide infrastructure by which their decisions can be securely sampled. It might not require them to publish their -oblivious decision functions , only that they could—if challenged—provide one. Most of this is to ponder for the future.
What we can say now, however, is that there do exist ways we can rein in the bad effects of lost privacy. The horses may have bolted, but we can still exert some long-range control over the herd.
Is this idea effective? What things like it have been proposed?
From knight’s tours to complexity
Von Warnsdorf’s Rule source |
Christian von Warnsdorf did more and less than solve the Knight’s Tour puzzle. In 1823 he published a short book whose title translates to, The Leaping Knight’s Simplest and Most General Solution. The ‘more’ is that his simple algorithm works for boards of any size. The ‘less’ is that its correctness remains yet unproven even for square boards.
Today we consider ways for chess pieces to tour not 64 squares but up to $2^{64}$ configurations on a chessboard.
Von Warnsdorf’s rule works only for the ‘path’ form of the puzzle, where the knight is started in a corner of an $n \times n$ board and must visit all the other squares in $n^2 - 1$ hops. It does not yield a final hop back to start to make a Hamilton cycle. The rule is always to move the knight to the available square with the fewest connections to open squares. In case of two or more tied options, von Warnsdorf incorrectly believed the choice could be arbitrary, but simple tiebreak rules have been devised that work in all known cases. More-recent news is found in papers linked from a website maintained by Douglas Squirrel of Frogholt, England. We took the above screenshot from his animated implementation of the rule when the knight, having started in the upper-left corner, is a few hops from finishing at upper right.
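The rule is short enough to state as code. Here is a sketch in Python with a lexicographic tiebreak (one of many tiebreaks that appear to work in practice; the rule itself carries no proven correctness guarantee):

```python
def warnsdorf_tour(n, start=(0, 0)):
    """Knight's path by von Warnsdorf's rule: always hop to the open square
    having the fewest onward moves to open squares; break ties by board order."""
    jumps = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]

    def moves(sq, seen):
        r, c = sq
        return [(r + dr, c + dc) for dr, dc in jumps
                if 0 <= r + dr < n and 0 <= c + dc < n
                and (r + dr, c + dc) not in seen]

    tour, seen = [start], {start}
    while len(tour) < n * n:
        options = moves(tour[-1], seen)
        if not options:
            return None  # the rule got stuck
        tour.append(min(options, key=lambda s: (len(moves(s, seen)), s)))
        seen.add(tour[-1])
    return tour

tour = warnsdorf_tour(8)
assert tour is not None and len(set(tour)) == 64  # a full 8x8 knight's path
```

Note the rule is greedy: it never backtracks, which is why finding tiebreaks that provably never get stuck remains open.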
The first person known to have published a solution was the Kashmiri poet Rudrata in the 9th century. He found a neat way to express his solution in 4 lines of 8-syllable Sanskritic verse that extend to an 8×8 solution when repeated. In modern terms he solved the following:
Color the squares so that for all k, the k-th square of the tour has the same color as the k-th square in row-major order—in other words, the usual way of reading left-to-right and down by rows—while maximizing the number m of colors used.
Note that we can guarantee $m \geq 2$ by starting in the upper-left corner and using a different color for all other squares. However, the usual parity argument with the knight doesn’t even let us 2-color the remaining squares to guarantee $m \geq 3$, because the last square of the first row and the first square of the second row have the same parity. Rudrata achieved $m = 3$ for the upper half with cell 21 also a singleton color; this implies $m \geq 4$ for the whole board. Can it be beaten? Most to our point, is there a “Rudrata Rule” as simple as von Warnsdorf’s?
We now put a coin heads-down on each square. Our chess pieces are going to move virtually through the space by flipping over the coins in squares they attack. Our questions will be of the form, can they reach all configurations, and if not:
How small can Boolean circuits be to recognize the set of reachable strings?
Let’s warm up with a different problem. Suppose the coins are colored not embossed so you cannot tell by touch which side is which, and the room is pitch dark. You are told that k of the coins are showing heads but not which ones. You must take some of the coins off the board, optionally flipping some or all while placing them nearby on the table. The lights are then switched on, and you win if your coins have the same number of heads as the ones left on the board. Can you always win?
I may have seen this puzzle as a child but it was fresh when I read it here. Our point connecting to this post is that the solution, which can be looked up here, is simple in terms of k and so can be computed by tiny Boolean circuits.
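For the record, the solution is the classic one: take any $k$ coins off the board and flip every one of them. A sketch of the check in Python (our own illustration):

```python
import random

def split_in_the_dark(coins, k):
    """Take any k coins and flip every one of them; leave the rest."""
    taken = [1 - c for c in coins[:k]]   # flipping the whole handful is the trick
    return taken, coins[k:]

random.seed(0)
for _ in range(1000):
    coins = [0] * 64
    k = random.randint(0, 64)
    for i in random.sample(range(64), k):
        coins[i] = 1                     # exactly k heads, positions unknown
    random.shuffle(coins)
    taken, left = split_in_the_dark(coins, k)
    assert sum(taken) == sum(left)       # both groups show equally many heads
```

The reason it works is one line of algebra: if the handful of $k$ coins held $h$ heads, the board retains $k - h$ heads, and flipping the handful turns its $h$ heads into exactly $k - h$ heads. That one-liner in $k$ is what makes the answer computable by tiny circuits.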
Since the tours will be reversible, we can equally well start with any coin configuration and ask whether the piece can transform it to the all-tails state. This resembles solving Rubik’s Cube. We’ll try each chess piece one-by-one, the knights last.
Our rook can start on any square. It flips each coin in the same row or column (“rank” and “file” in chess parlance) as the square it landed on. Then it moves to one of those squares and repeats the flipping. If it moved within a rank then the coins in that row will be back the way they were except that the two the rook was on will be flipped. We can produce a perfect checkerboard pattern by moving the rook a1-c1-c3-c5-e5-g5-g7 then back g5-c5-c1. Since order doesn’t matter and operations from the same square cancel, this has the same effect as doing a1, c3, e5, and g7 “by helicopter.”
Since the rook always attacks 14 squares, an even number of coins flip at each move, so half the space is ruled out by parity. There is however a stronger limitation. Each rook flip is equivalent to flipping the entire row and then the entire column. We can amplify the rook by allowing row and column flips singly. But then we see that there are only 16 such operations. Again since repeats cancel, this means at most $2^{16}$ configurations are possible. We ask:
Is there a simple formula, yielding small Boolean circuits, for determining which configurations are reachable on an $n \times n$ board?
We can pose this for the Rook, with-or-without “helicoptering,” and for the row-or-column flips individually. Small circuits would mean that the strings in $\{0,1\}^{n^2}$ denoting reachable configurations enjoy a particular form of succinctness.
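For the 8×8 board the count can be settled by linear algebra over GF(2). A sketch in Python (Gaussian elimination on 64-bit masks; all names ours) shows the sixteen row-and-column flips have rank 15, so they reach exactly $2^{15}$ configurations, and the rook alone reaches $2^{14}$:

```python
def gf2_rank(vectors):
    """Rank over GF(2) of integer bitmasks, by elimination on leading bits."""
    pivots, rank = {}, 0  # leading-bit position -> reduced vector
    for v in vectors:
        for bit in sorted(pivots, reverse=True):
            if (v >> bit) & 1:
                v ^= pivots[bit]
        if v:
            pivots[v.bit_length() - 1] = v
            rank += 1
    return rank

def bitmask(cells):
    return sum(1 << (8 * r + c) for r, c in cells)

rows = [bitmask((r, c) for c in range(8)) for r in range(8)]
cols = [bitmask((r, c) for r in range(8)) for c in range(8)]
# A rook flip on (r, c) is row r xor column c; the bit (r, c) cancels,
# leaving exactly its 14 attacked squares.
rooks = [rows[r] ^ cols[c] for r in range(8) for c in range(8)]

# One dependency: the eight row flips sum to the same thing as the column flips.
assert gf2_rank(rows + cols) == 15   # exactly 2**15 configurations reachable
assert gf2_rank(rooks) == 14         # the rook alone reaches only 2**14
```

So the $2^{16}$ counting bound in the text is off by only a factor of 2 for the singly-allowed flips, and the reachable sets are tiny linear subspaces, which is why small circuits for them are plausible.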
Since the rook fails to tour the whole exponential-sized space, let’s try the bishop.
The bishop can flip any odd number of coins from 7 to 13. It is limited to squares of one color but we can allow the opposite-color bishop to tag-team with it. I was just about to pose the same questions as above for the bishops when a familiar imperious voice swelled behind me. It was the Red Queen.
“I have all the power of your towers and prelates—and you need only one of me. I shall surely fill the space.”
I was no one to stand in her way, but the Dormouse awoke and quietly began scratching figures on paper. “Besides the sixteen ranks and files, there are fifteen southeast-to-northwest diagonals, including the corner squares a1 and h8 by themselves. And there are fifteen southwest-to-northeast diagonals. This makes only 16 + 15 + 15 = 46 < 64 operations. Hence, Your Majesty, even if we could parcel out your powers, you could fill out at most a $2^{46-64} = 2^{-18}$ fraction of the space.”
I expected the Red Queen to yell, “Off with his head!” But instead she stooped over the Dormouse and hissed,
“Sorry—I slept through the rest of Alice,” explained the Dormouse as he slunk away. Despite the Dormouse’s proof I thought it worth asking the same questions as for the rook and bishop about the queen’s subspace. What kind of small formulas or circuits can recognize it, whether requiring her to flip all coins in all directions or allowing her to flip just one rank or file or diagonal at a time?
While I was wondering, His Majesty quietly strode to the center and said,
“I do not wantonly project power without bound; I reserve my influence so that my action on every square is distinctive.”
We can emphasize how far things stay distinctive by posing our basic questions in a more technical manner:
Do the sixty-four vectors over $\mathbb{F}_2$ representing the king’s flipping action on each square span the vector space $\mathbb{F}_2^{64}$? If not, what can we say about the circuit complexity of the linear subspace they generate?
On a $2 \times 2$ board the four $4$-vectors form a basis, but for $3 \times 3$ and $4 \times 4$ the king fails to span. For $3 \times 3$, kings in the two lower corners produce the same configuration as kings in the two upper corners. For $4 \times 4$, kings in a ring on a2, b4, d3, and c1 flip just the corner coins, as do the kings in the mirror-image ring. What about $5 \times 5$ and $8 \times 8$? Is there an easy answer?
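The small cases are easy to check by computer. Here is a sketch of ours that builds each king’s flip vector and computes GF(2) ranks; it confirms the basis for $2 \times 2$ and the failures for $3 \times 3$ and $4 \times 4$, and prints the value for $5 \times 5$ rather than our claiming it.

```python
def gf2_rank(rows):
    """Rank of a list of bitmask vectors over GF(2)."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            lsb = pivot & -pivot
            rows = [r ^ pivot if r & lsb else r for r in rows]
    return rank

def king_vectors(n):
    """Flip vector for a king on each square: every adjacent coin toggles."""
    vecs = []
    for r in range(n):
        for c in range(n):
            v = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) != (0, 0) and 0 <= r + dr < n and 0 <= c + dc < n:
                        v |= 1 << (n * (r + dr) + (c + dc))
            vecs.append(v)
    return vecs

for n in range(2, 6):
    print(n, gf2_rank(king_vectors(n)), "of", n * n)
```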
Meeker still are the pawns, who attack only the two squares diagonally in front, or just one if on an edge file. They cannot attack their own first rank, nor the second in legal chess games, but opposing pawns can. Then it is easy to see that the pawn actions span the space. The lowly contribution by the edge pawn is crucial, since it flips just one coin, not two.
The knight flips all the coins a knight’s move away. One difference from the queen, rook, bishop, and king is that on its next move all the coins it flips will be new. Our revised Knight’s Tour question is:
Can the knight connect the all-tails string $0^{64}$ to any configuration by a sequence of knight’s moves, perhaps allowing multiple visits to some squares? Or if we disallow multiple visits in a tour, can we do it by “helicoptering”? Same questions for $n \times n$ boards. If the answer is no, then are there easy formulas or succinct circuits determining the space of reachable configurations?
An example for needing multiple visits or helicoptering is that the configuration with heads on c2, b3 and g6, f7 is produced by knights acting in the corners a1 and h8, which are not connected by a knight’s move. If there is some other one-action-per-square combination that produces it, then by simple counting the knight cannot span—even with helicoptering.
The knight does fail to span a $4 \times 4$ board because a knight on the corner d4 produces the same result as the knight on a1: heads on c2 and b3. The regular knight’s tour fails too on a $4 \times 4$ board, so this can be excused for the same “lack of legroom” reason. What about $5 \times 5$ and higher?
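A small computational check of ours: on the $4 \times 4$ board the knights on a1 and d4 have identical flip vectors, so the knight actions cannot span.

```python
def gf2_rank(rows):
    """Rank of a list of bitmask vectors over GF(2)."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            lsb = pivot & -pivot
            rows = [r ^ pivot if r & lsb else r for r in rows]
    return rank

def knight_vectors(n):
    """Flip vector for a knight on each square of an n-by-n board."""
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    vecs = []
    for r in range(n):
        for c in range(n):
            v = 0
            for dr, dc in moves:
                if 0 <= r + dr < n and 0 <= c + dc < n:
                    v |= 1 << (n * (r + dr) + (c + dc))
            vecs.append(v)
    return vecs

v4 = knight_vectors(4)
print(v4[0] == v4[15])         # a1 (index 0) and d4 (index 15) coincide
print(gf2_rank(v4), "of", 16)  # hence the rank is strictly below 16
```

Changing the argument to `knight_vectors` answers the same question for other small boards.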
Thus having coins on the chessboard scales up some classic tour problems exponentially. Our larger motivation is what the solutions might tell us about complexity.
Do you like our exponential “tour” problems? Really they are reachability problems. Can you solve them?
Will von Warnsdorf’s rule ever be proved correct for all higher n?
Note: To update our recent quantum post, Gil Kalai released an expanded version of his AMS Notices article, “The Quantum Computer Puzzle.” We also congratulate him on being elected an Honorary Member of the Hungarian Academy of Sciences.
Akram Boukai is a researcher in materials science, and an expert on converting heat to electric energy: thermoelectrics.
Today I wish to talk about a beautiful presentation he just gave at TTI/Vanguard in San Francisco.
Thermoelectricity is an effect that seems to have nothing to do with our usual topics. But Boukai uses a mathematical trick to make a “new” type of material. This material has to have quite special properties, and he is able to make it by using ideas that are familiar to us in theory. This is a great example, I believe, of theory interacting with technology.
Boukai presented his work at TTI/Vanguard, which is a conference I have talked about before—see here. It is oriented toward the future of technology of all kinds, with a special emphasis on electronic and computer technology. The talks often highlight new technologies, many of which are being developed by startups. This is the case with Boukai, who is co-founder of the company Silicium Energy. They are attempting to build components that will radically change how we power small devices. This is especially relevant to IoT—that is, the “Internet of Things.” Think watches, for example, that never need to be recharged.
In order to understand the math problem we need at least a high-level understanding of the Seebeck effect, named after Thomas Seebeck. He discovered in 1821 that a compass needle is deflected near a closed loop made of two different metals, provided there is a temperature difference between the junctions of the metals. Wikipedia’s diagram illustrates the underlying phenomenon:
The compass needle moves because the electrons in the two metals act differently owing to their temperature difference, and thereby create an electrical current. This current then induces a magnetic field that moves the needle. Seebeck named this phenomenon the thermomagnetic effect, which is really wrong. The primary effect is the creation of an electrical flow—this was renamed “thermoelectricity” by Hans Christian Ørsted. Wrong or not, it is still called the Seebeck effect—he may have guessed incorrectly how it worked, but he discovered the effect.
Thus the goal is to extract energy from a small heat difference. For example, Silicium Energy plans to use this method to build watches that need no recharging. The watches would exploit the natural heat difference present on your wrist: we are warm and the air around us is usually cooler. So by the Seebeck effect there will be an electrical current. The amount of energy created is tiny, but it is large enough to power the processor in a modern digital watch.
This sounds doable. Yet it is tricky. The problem is getting a material that is a great conductor of electrons, but a poor conductor of heat. The insight that Boukai’s company is based on is that this can be made out of silicon. The advantage of using silicon and not some exotic materials, which have been used before, is cost. Silicon devices can be made using standard technology, for pennies per device, while exotic materials can be very expensive.
Being able to turn silicon into a thermoelectric material and do it at low cost is quite a feat. Silicon has good electrical properties, but also is a pretty good conductor of heat. The trick is to find a way to lower the thermal conductivity of silicon in order to increase its thermoelectrical efficiency. Lowering the thermal conductivity makes it easier to keep the cold side of the device cold to create that temperature difference needed by the Seebeck effect.
Boukai and his co-workers’ clever idea—finally—is to fabricate a piece of silicon that uses its structure to make a material that conducts electrons well and conducts heat poorly. Here is how he does this: Imagine a square of silicon, with the top side hot and the bottom cold. Initially—by the Seebeck effect—electrons will move from the top to the bottom and create a current. This is wonderful. However, the problem is that heat will quickly also flow from the hot top to the cold bottom and will make the Seebeck effect stop.
The trick is to make random defects, essentially holes, in the silicon. The point is this:
We thank him for sending the following picture:
Note: in physics, a phonon is a collective vibrational excitation of the atoms or molecules in a solid. Phonons play a key role in the transport of heat. Boukai’s trick depends on the effective size of phonons, which is much larger than that of electrons. This explains why electrons are pictured as scooters and phonons as trucks.
I know this is a rough explanation, but I believe it is a reasonable description of what happens. And the fact that heat flows as a random-walk type process yields in practice a roughly 100-fold decrease in the silicon’s thermal conductivity. This keeps the watch running. See his joint paper for more technical details.
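To illustrate the random-walk intuition, here is a toy simulation of our own (it is not Boukai’s model; the grid size, hole fraction, and reflecting behavior are arbitrary choices): walkers crossing a slab take noticeably longer on average when a fraction of the interior sites are blocked.

```python
import random

def crossing_time(width, height, hole_frac, trials, seed):
    """Mean number of steps for a random walker to cross from the top row
    to the bottom row of a cylindrical grid with blocked interior sites."""
    rng = random.Random(seed)
    holes = {(r, c) for r in range(1, height - 1) for c in range(width)
             if rng.random() < hole_frac}
    total = 0
    for _ in range(trials):
        r, c, steps = 0, width // 2, 0
        while r < height - 1 and steps < 50_000:
            dr, dc = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            nr, nc = max(r + dr, 0), (c + dc) % width  # reflect at top, wrap sideways
            if (nr, nc) not in holes:  # a hole scatters the walker back
                r, c = nr, nc
            steps += 1
        total += steps
    return total / trials

t_open = crossing_time(20, 25, 0.0, 200, seed=1)
t_holey = crossing_time(20, 25, 0.25, 200, seed=1)
print(t_open, t_holey)  # the holey slab transports walkers more slowly
```

Electrons, being much smaller than phonons, correspond to carriers that barely see the holes; that asymmetry is the whole trick.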
The ideas of random behavior and statistical mechanics have been around in physics for a long time. Karl Pearson coined the term “random walk” in 1905, the same year as Albert Einstein’s famous paper on Brownian motion. Ising models partly motivated the concept of , and Markov chains were long studied in physics before becoming a staple of computer theory. So there is no chicken-egg question about which methods came first where.
What strikes Ken and me as distinctively algorithmic, however, is the way the silicon materials are being programmed to have a physical property directly. This is different and feels more qualitative than programming logic gates on silicon. Of course there are other cases of mathematical structure and algorithmic behaviors being used to create new materials—witness the recent Nobel Prizes for work on graphene and quasicrystals.
I really liked the trick used here. Is there some other application where we could imagine using it to make some other new material, or even to use the trick abstractly in some algorithm?
Some fun rejection comments
Joshua Gans and George Shepherd were doctoral students in economics at Stanford University back in the 1990s. They wrote an interesting paper that I just came across titled, “How Are the Mighty Fallen: Rejected Classic Articles by Leading Economists.” It grew into a 1995 book edited by Shepherd: Rejected: Leading Economists Ponder the Publication Process.
Today I want to discuss the same issue in our area of theory.
Ken and I have not had a chance to do a formal survey of papers that were rejected in our area. We also would not do exactly the same as Gans and Shepherd, since it’s not what happens to the “mighty” that matters most but rather to the great band of those doing productive and creative work. Our point is that all of us who write articles for conferences and journals are subject to sporadically uneven reviews.
So we will today just offer a few things from personal experience to season the grill. We are mostly interested in negative comments from bad reviews. We could also touch on the opposite, heroic reviews that found subtle mistakes—or maybe mistakes missed by everyone including the referees.
I once got the following comment back from a top theory conference in the rejection e-mail:
The authors assume incorrectly that the graph has an even number of vertices in Lemma
The graph in question was a cubic graph. By what is sometimes called the First Theorem of graph theory, all cubic graphs have this property. Just double-count edge contributions and one gets that

$3n = 2m,$

where $n$ is the number of vertices and $m$ the number of edges; hence $n$ must be even. I assume the referee was overwhelmed with work, but…
I once submitted a short paper, joint with Andrea LaPaugh and Jon Sandberg, to the Hawaii International Conference on System Sciences and it was accepted. Well, sort-of accepted. The head of the conference asked me to make the paper “longer.” I asked back:
“What was missing? Was the problem not motivated? Was the proof unclear?” And so on.
The head simply replied: “we like longer papers.” I pushed and said I thought making a paper longer for no reason seemed wrong. He responded that the paper was now unaccepted.
I could not believe it. We quickly sent it off to an IEEE journal. Don Knuth handled it, and it soon was accepted with minor changes only. By the way the paper solved a simple question: what is the best way to store a triangular array in memory?
At the presentation of the Knuth Prize to Leonid Levin the following story was told about reviews:
Leonid once submitted a paper to a journal and got back a negative review: It said that the paper was too short and also terms were used before they were defined. Leonid responded by taking two identical copies of his paper, stapling them together, and resubmitting the “new” paper. It was now twice as long, which answered the first issue, and clearly all terms were defined before they were used.
It is unclear what happened to the paper.
Then there is the folklore rejection letter:
What is correct in your paper is known, and what is new is wrong.
I hope to never get this one.
We’d love to hear from you with your own examples of strange reviews.
I submitted the above post last week to my blog editor but didn’t hear back—I assumed he was overwhelmed with work. Then he replied and asked me to make the post “longer.” I asked back:
“What was missing? Was the issue not motivated? Was the evidence unclear?” And so on.
The editor simply replied, “it’s a bit thin.” I pushed and said I thought making a post longer for no reason seemed wrong. This editor at least gave some concrete suggestions:
Use something from the featured paper.
Gans and Shepherd give one interesting kind of example where the delay caused by rejection enabled others with similar ideas to get ahead—not on purpose by the rejecter but just-so. They also give some self-revealing quotes including this one by the economist Paul Krugman:
The self-serving answer [to the “why me?” question] is that my stuff is so incredibly innovative that people don’t get the point. More likely, I somehow rub referees and editors the wrong way, maybe by claiming more originality than I really have. Whatever the cause, I still open return letters from journals with fear and trembling, and more often than not get bad news. I am having a terrible time with my current work on economic geography: referees tell me that it’s obvious, it’s wrong, and anyway they said it years ago.
Use others’ personal examples or famous ones.
Ken recently heard a true giant admit he gets rejections, “often because the referees don’t believe this work is really new.” Perhaps Krugman’s last clause means the same?
A famous case in our field was the number of times the first interactive proofs paper by Shafi Goldwasser and Silvio Micali was rejected from FOCS and STOC before finally appearing at STOC 1985 with Charles Rackoff as third author. Ken recalls people excitedly telling him and everyone about the work at the FCT conference in Sweden in August 1983. This could have become an example like in the Gans-Shepherd paper of others pipping ahead, but happily didn’t.
Try to source the quotation at the end.
A version of it was used by Christopher Chabris and Joshua Hart at the end of their negative New York Times review last month of the book The Triple Package:
Our conclusion is expressed by the saying, “What is new is not correct, and what is correct is not new.”
In March, Ken took part in an online discussion with Chabris about sourcing it. Ken recalled hearing it in the early 1980s in this snarkier form:
“This paper has content that is novel and correct. However, the parts that are novel are not correct, and the parts that are correct are not novel.”
It was already then a widely-known math cliché. Someone else in the discussion sourced it to the slamming of John Maynard Keynes’s book, The General Theory of Employment, Interest, and Money, by Henry Hazlitt in the introduction to his 1960 book, The Critics of Keynesian Economics:
In spite of the incredible reputation of the book, I could not find in it a single important doctrine that was both true and original. What is original in the book is not true, and what is true is not original. In fact, even most of the major errors in the book are not original, but can be found in a score of previous writers.
We wonder if any of our readers can find an earlier source? When did “not true” become “not new”? In any event it was one tough review.
Was my referee right about lengthening the post?
[photo at top; format fixes]