Cropped from S.I. Kids source
John Urschel is a PhD student in the Applied Mathematics program at MIT. He has co-authored two papers with his Penn State master's advisor, Ludmil Zikatanov, on spectral approximation algorithms for the NP-complete graph bisection problem. A followup paper with Zikatanov and two others exploited the earlier work to give new fast solvers for minimal eigenvectors of graph Laplacians. He also plays offensive guard for the NFL's Baltimore Ravens.
Today Ken and I wish to talk about a new result about football by the front linesman of NP-completeness, Dick Karp.
Karp—Dick—is a huge fan of various sports. Recently at Avi Wigderson's birthday conference, held at Princeton's IAS, he told me a neat new result on how to play football. It concerns a basic decision that a coach faces after his team scores a touchdown: kick for one extra point or attempt a riskier "two-point conversion" play.
We wonder whether players like Urschel might be useful for blocking out these decisions, more than blocking and opening "holes" for runners. By the way, Urschel, who attended Canisius High School in Buffalo before his B.S. and M.S. plus a Stats minor at Penn State, is no stranger to IAS. Here he is with my friend Peter Sarnak there last year:
AMS Notices feature source
Thus he is doing all the things we would advise a young theorist or mathematician to do: publish, circulate, talk about interesting problems, get on a research team, and open up avenues for deep penetration and advances by teammates.
After a touchdown is made the scoring team gets 6 points. It then has an option:

1. Kick the ball through the goalposts for 1 extra point.
2. Run a single play from near the goal line, worth 2 points if it gets the ball into the end zone.
Traditionally the right call in most game situations is (1). The kicker can make the ball go through the posts most of the time, while getting the ball into the end zone on a single play is much more difficult. Of course at the end of a game there may be reasons to try for the 2 points. If the game is about over and you need 2 points to tie, that is probably the best play.
Karp set up a basic model of this decision. His model is a bit idealized, but I expect that he can and will make it more realistic in the future. I heard the result over a wonderful dinner that my wife Kathryn Farley set up for a small group of friends, which was not the proper venue to go into various technical details. So I will just relay his basic idea.
In his model we make several assumptions:

- The extra-point kick always succeeds.
- Each two-point try succeeds with probability 1/2, independently of all the others.
- The team keeps scoring touchdowns, with no end-of-game horizon.
The last clause means that we’re modeling the choice by an infinite walk. If you wish you may subtract 1 so that a kick gives 0, a successful play gives +1, but an unsuccessful one gives -1. Karp’s question is this:
What should the coach do? Always kick or sometimes go for two?
You might think about it before reading on for Karp’s answer.
His insight is this:
Theorem 1 (Fundamental Theorem of Football?) The optimal strategy is initially always to go for two. If after some number $n$ of tries you have succeeded $k$ times with $2k > n$, so that you are ahead of what kicking would have brought, switch over to kicking.
Ken's first reaction was to note a difference from "gambler's ruin," which means doubling down after every lost 50-50 bet. In football this would mean that after you missed one conversion play, the next try would bring +3 points on success but subtract 1 from your score if you missed. Next time you could go for 5 but failure would cost 3. If you think in the +1/-1 terms compared to 0 for kicking, then this is the classic martingale system of doubling the bet 1,2,4,8… until you win and net +1. The ruin for gamblers is the chance of swiftly going bankrupt—but in football you can only lose one game.
However, we are not allowing doubling the bet either. It’s the classic random walk situation: right or left along the number line with equal probability, except that you can elect to stop and stay where you are. With probability 1, such a walk starting at 0 will reach a stage in positive territory at +1 net, and then we stop.
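For readers who like to check such claims numerically, here is a quick Monte Carlo sketch; the step cap and trial count are arbitrary choices of mine, not part of Karp's model:

```python
import random

def reaches_plus_one(max_steps, rng):
    """Symmetric +/-1 walk from 0; True if it ever hits +1 within the cap."""
    pos = 0
    for _ in range(max_steps):
        pos += 1 if rng.random() < 0.5 else -1
        if pos == 1:
            return True
    return False

rng = random.Random(0)
trials = 2000
hits = sum(reaches_plus_one(10_000, rng) for _ in range(trials))
print(hits / trials)  # close to 1; with no cap the probability is exactly 1
```

The small shortfall from 1 is entirely due to the step cap: the chance a walk has not yet hit +1 after $n$ steps shrinks like $1/\sqrt{n}$.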
A real game, of course, does not have unboundedly many touchdowns—though some college games I’ve watched have sure felt like it. So if you miss a few two-point tries and the game is deep into the second half, you’re left holding the bag of foregone points.
The question comes down to, what is the utility of nosing ahead by the extra point when you succeed, compared to being down more when you fail? How likely are game scenarios where that one extra point is the decider? To be concrete, suppose you score 3 touchdowns in the first three quarters. Following Karp’s strategy nets you an extra point 5/8 of the time: succeed the first time then kick twice, or fail the first time and succeed twice. Just 1/4 of the time you’ve lost a point, but 1/8 of the time you’re net -3 and need an extra field goal to get back to par.
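The three-touchdown tally can be verified by brute-force enumeration. This sketch encodes the switch-to-kicking-once-ahead rule (with a tie meaning the two-point tries resume); `karp_net` is my name for it:

```python
from itertools import product
from fractions import Fraction

def karp_net(coins):
    """Net points relative to always-kicking over len(coins) touchdowns,
    following the rule: go for two until strictly ahead, then kick."""
    net = 0
    for success in coins:
        if net > 0:                  # ahead of par: kick, net unchanged
            continue
        net += 1 if success else -1  # two-point try: +1 or -1 vs. a kick
    return net

dist = {}
for coins in product([True, False], repeat=3):
    net = karp_net(coins)
    dist[net] = dist.get(net, Fraction(0)) + Fraction(1, 8)

print(dist)  # {1: Fraction(5, 8), -1: Fraction(1, 4), -3: Fraction(1, 8)}
```

The three outcomes match the 5/8, 1/4, and 1/8 figures above.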
There are late-game situations where the extra point is worth so much that it pays to go for two even with a chance of success that is under 40%. Suppose you are down 14 points and score a touchdown with 4 minutes left. You have enough time to stop the other team and get the ball back again for one more drive, but that's basically it. You have to assume you will score a touchdown on that drive too, so the only variable is the conversion decision. The issue is that if you kick now and kick again, you've only tied the game and have a 50% win chance in overtime. Whereas if you go for two with success probability $p$, you have a $p$ chance of winning outright based on this figuring, plus if you fail you can still tie and get to overtime by going for two after your next TD. Thus your win expectation is

$p + (1-p)\cdot p \cdot \frac{1}{2}$,

which crosses 50% when $p^2 - 3p + 1 \le 0$, that is, when $p \ge \frac{3-\sqrt{5}}{2} \approx 0.382$.

When $p = \frac{1}{2}$ you have a 62.5% expectation by going for two. Yet for human reasons, this is not on the standard chart of game situations calling for a two-point try. The human bias is toward maximizing your chances of "staying in the game" which is not the same as maximizing winning. There was a neat analysis of a similar situation in chess last year.
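Reading the scenario this way (my reconstruction: success probability $p$, winning outright on success, reaching overtime via a second successful try after a failure), the numbers are easy to verify:

```python
from math import sqrt

def win_prob(p):
    """Down 14 with two touchdowns coming: go for two now, and again to
    tie if the first try failed; overtime counts as a 50% win."""
    return p + (1 - p) * p * 0.5

threshold = (3 - sqrt(5)) / 2   # root of p**2 - 3*p + 1 = 0
print(round(threshold, 3))      # 0.382
print(win_prob(0.5))            # 0.625 versus 0.5 for kicking twice
```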
The challenging question to deepen Karp’s insight is, how far can we sensibly broaden this kind of analysis? Does this observation apply in real games? It seems to call for a big modeling and simulation effort, guided by theory where we might vary Karp’s simple rule and adjust for different probabilities of success on both the two-point tries and the kicks. This would bring it into the realm of machine learning with high-dimensional data, and per remarks on Urschel’s MIT homepage, perhaps he is headed that way.
What do you think of the insight for football strategy? We could also talk about when (not) to punt…
Urschel missed this season’s first three games but is starting at left guard right now for the Ravens, who are beating up on my Giants 10-0 as we go to post. Oh well. Of course he reminds me I once took classes from an NFL quarterback who similarly “went for two” in football and mathematics. We wish Urschel all the best and will follow his careers with interest. Enjoy the games today and tomorrow night.
[de-LaTeXed numerals for better look]
Jamie Morgenstern is a researcher into machine learning, economics, and especially mechanism design.
Today Ken and I would like to discuss a joint paper of hers on the classic problem of matching schools and students.
The paper is titled, “Approximately Stable, School Optimal, and Student-Truthful Many-to-One Matchings (via Differential Privacy).” It is joint with Sampath Kannan, Aaron Roth, and Zhiwei Wu. Let’s call this paper (KMRW). She gave a talk on it at Georgia Tech last year.
There are various instances of matching-type problems, but perhaps the most important is the NRMP assignment of graduating medical students to their first hospital appointments. In 2012 Lloyd Shapley and Alvin Roth—Aaron’s father—were awarded the Nobel Prize in Economics “for the theory of stable allocations and the practice of market design.”
The original matching problem was that of marrying females and males, on which we just posted about a recent joint paper of Noam Nisan's. Here is a standard description of the stable matching problem (SMP):
Assume there are $n$ men and $n$ women. Further each has a linear ranking of the members of the opposite sex. Find a way to marry all the men and all the women so that the assignment is stable. This means that no two people would both rather marry each other instead of the partners they are assigned to.
In 1962, David Gale and Shapley proved that any SMP instance always has a solution, and even better, they gave a quadratic-time algorithm that finds one. SMP as stated is less practical than the problem that results if the men are also allowed to marry each other, likewise the women, with everyone ranking everyone else as partners. But in the same famous paper, Gale and Shapley solved an even more realistic problem:
The hospitals/residents problem—also known as the college admissions problem—differs from the stable marriage problem in that the “women” can accept “proposals” from more than one “man” (e.g., a hospital can take multiple residents, or a college can take an incoming class of more than one student). Algorithms to solve the hospitals/residents problem can be hospital-oriented (female-optimal) or resident-oriented (male-optimal). This problem was solved, with an algorithm, in the original paper by Gale and Shapley, in which SMP was solved.
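A minimal sketch of Gale–Shapley deferred acceptance in the one-to-one case may help; the hospitals/residents version differs mainly in giving each "woman" a capacity greater than one. The tiny preference tables here are toy data of mine, not from any real market:

```python
def gale_shapley(men_prefs, women_prefs):
    """Men-proposing deferred acceptance; returns the man-optimal stable
    matching as {man: woman}. Preference lists go from best to worst."""
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    free = list(men_prefs)                # men not yet engaged
    next_proposal = {m: 0 for m in men_prefs}
    engaged = {}                          # woman -> man
    while free:
        m = free.pop()
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])       # w trades up; old partner is free again
            engaged[w] = m
        else:
            free.append(m)                # w rejects m
    return {m: w for w, m in engaged.items()}

men = {"x": ["a", "b"], "y": ["b", "a"]}
women = {"a": ["y", "x"], "b": ["x", "y"]}
print(gale_shapley(men, women))  # x marries a, y marries b
```

Note that in this instance the proposing side gets its first choices while the other side gets its second choices, a small illustration of the "poles" discussed below.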
Let’s call this the college admissions problem (CAP).
Consider adding another condition to CAP. Stability is clearly an important requirement—without it there would be students and colleges that would prefer to swap their choices. Stability, which avoids such a situation, is quite nice. It does not force the assignment to be perfect in any sense, but it does make it at least a "local" optimal solution. This is important, and it is used in real assignment problems today.
Yet there is a possibility that students, for example, could "game" the system. What if a student submits a strange ordering of schools that they claim they wish to attend, but the list is not really in their order of preference? Why would they do this? Simply put, they may be able to lie about their preferences to ensure that they get their top choice.
The basic point seems easiest to illustrate in the non-gender, single-party case, so let's say four people $A,B,C,D$ need to form two pairs. They have circular first choices $A \to B \to C \to D \to A$ and second choices in the opposite circle, $A \to D \to C \to B \to A$, with the remaining person ranked last.

Let's say $\{A,B\},\{C,D\}$ happens. $B$ and $D$ got their second choices, but it is not to their advantage to room together. They could have had their first choices under the also-stable configuration $\{B,C\},\{A,D\}$ but there was no way to force the algorithm to choose it. However, let's suppose $B$ and $D$ lie and declare each other to be their second choices.

The algorithm given these preferences sees $\{A,B\},\{C,D\}$ as unstable since $B$ would join with $D$, and the resulting $\{B,D\},\{A,C\}$ as unstable because $B$ and $C$ would prefer each other. The resulting stable $\{B,C\},\{A,D\}$ now gives $B$ and $D$ their first choices. Given what they may have known about $A$'s and $C$'s preference lists, there was no danger they'd actually have to pair up. Examples in the two-gender and student-college cases need six actors but the ideas are similar.
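Since the figures for this example did not survive, here is a brute-force check using hypothetical labels A, B, C, D with circular first choices (A→B→C→D→A) and second choices around the opposite circle; in `lied_prefs`, B and D each promote the other to second place:

```python
from itertools import combinations

def stable_pairings(prefs):
    """All splits of four people into two pairs with no blocking pair:
    two people not paired together who each rank the other above their
    assigned partner."""
    people = sorted(prefs)
    results = []
    for (b,) in combinations(people[1:], 1):
        a = people[0]
        c, d = [x for x in people[1:] if x != b]
        pairing = {a: b, b: a, c: d, d: c}

        def prefers(x, y):  # does x rank y above x's current partner?
            return prefs[x].index(y) < prefs[x].index(pairing[x])

        blocked = any(prefers(x, y) and prefers(y, x)
                      for x, y in combinations(people, 2) if pairing[x] != y)
        if not blocked:
            results.append({frozenset((a, b)), frozenset((c, d))})
    return results

true_prefs = {"A": "BDC", "B": "CAD", "C": "DBA", "D": "ACB"}
lied_prefs = {"A": "BDC", "B": "CDA", "C": "DBA", "D": "ABC"}

print(len(stable_pairings(true_prefs)))   # 2 stable pairings under truth
print(len(stable_pairings(lied_prefs)))   # 1: only {B,C},{A,D} survives the lie
```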
The basic Gale-Shapley algorithm for the two-party case actually has two “poles” depending on which party has its preferences attended to first. The current NRMP algorithm favors the applicants first. In college admissions one can imagine the schools going first. Alvin Roth proved that the poles had the following impact on truthfulness:
Theorem 5. In the matching procedure which always yields the optimal stable outcome for a given one of the two sets of agents (i.e., for the men or for the women), truthful revelation is a dominant strategy for all the agents in that set.
Corollary 5.1. In the matching procedure which always yields the optimal stable outcome for a given set of agents, the agents in the other set have no incentive to misrepresent their first choice.
A dominant strategy here means a submitted list of preferences that the individual has no motive to change under any combination of preferences submitted by the other agents (on either side). The upshot is that the favored side has no reason to deviate from their true preferences, but the non-favored side has motive to spy on each other (short of colluding) and lie about their second and further preferences.
There has been much research on improving fairness by mediating between the “poles” but this has not solved the truthfulness issue. Can we structure the algorithm so that it still finds stable configurations but works in a way that both parties have incentive to be truthful and use their real orderings?
Unfortunately, Roth’s same paper proved that there is no method to solve SMP or CAP that is both stable and truthful. Only the above conditions on one side or the other are possible.
Roth’s negative result reminds me of the famous dialogue toward the end of the 1992 movie A Few Good Men:
Judge Randolph: Consider yourself in contempt!
Kaffee: Colonel Jessup, did you order the Code Red?
Judge Randolph: You don’t have to answer that question!
Col. Jessup: I’ll answer the question!
[to Kaffee]
Col. Jessup: You want answers?
Kaffee: I think I’m entitled to.
Col. Jessup: You want answers?
Kaffee: I want the truth!
Col. Jessup: You can’t handle the truth!
However, a recurring theme in theory especially since the 1980s is that often we can work around such negative results by:

- relaxing exact requirements to approximate ones, and
- using randomization.
Recall that in the college admissions case where the schools act first, the schools have no motive to do other than declare their true preferences, but the students might profit if they submit bogus second, third, and further preferences to the algorithm. The new KMRW paper states the following:
We present a mechanism for computing asymptotically stable school optimal matchings, while guaranteeing that it is an asymptotic dominant strategy for every student to report their true preferences to the mechanism. Our main tool in this endeavor is differential privacy: we give an algorithm that coordinates a stable matching using differentially private signals, which lead to our truthfulness guarantee. This is the first setting in which it is known how to achieve nontrivial truthfulness guarantees for students when computing school optimal matchings, assuming worst-case preferences (for schools and students) in large markets.
By "approximately stable" the KMRW paper means a relaxation of the third of three conditions on a function $\mu$ that maps students to colleges plus the "gap year" option $\bot$. If a student $s$ leaves a college $c$ off his or her application list, that is read as $s$ preferring $\bot$ to $c$. Symmetrically, $c$ may have an admission threshold that some students fall below.

The relaxation is that for some fixed $\alpha > 0$, colleges may hold off on admitting qualified students who might prefer them, so long as those colleges are filled to within an $\alpha$ fraction of their capacity.

There is also an approximation condition on the potential gain from lying about preferences. This requires postulating a numerical utility $u_s(c)$ for student $s$ to attend college $c$ that is monotone in the true preferences. Given $\eta > 0$, say the preference list $P_s$ submitted by $s$ is $\eta$-approximately dominant if for all other lists $P'_s$ by $s$—and all lists submitted by other students, which induce the mappings $\mu$ with $P_s$ and $\mu'$ with $P'_s$—we have

$u_s(\mu(s)) \ge u_s(\mu'(s)) - \eta$.

In fact, KMRW pull the randomization lever by stipulating the bound only on the difference in expected values over matchings produced by the randomized algorithm they construct. Then this says that no student can expect to gain more than $\eta$ utility by gaming the system.

Confining the utility values to $[0,1]$ ensures that any fixed $\eta > 0$ is larger than some utility differences once the number of colleges is great enough, so this allows some slack compared to strict dominance, which holds if $\eta = 0$. It also enables the utility values to be implicitly universally quantified in the condition by which $\alpha$ and $\eta$ "mesh" with $n$, the number of students, and the minimum capacity of any college:

Theorem 1 Given $\alpha$ and $\eta$, there are universal constants and an efficient randomized algorithm that with high probability produces a feasible $\alpha$-approximately stable solution in which submitting the true preferences is $\eta$-approximately dominant in expected value for each student, provided that $n$ and the minimum capacity of any college are sufficiently large relative to $1/\alpha$, $1/\eta$, and the number of colleges.

This is an informal statement. The mechanism underlying the proof also works for $\alpha$ and $\eta$ not fixed: provided the minimum college capacity grows quickly enough with $n$, the approximate stability and probable-approximate truth-telling dominance can be achieved with $\alpha$ and $\eta$ both tending toward $0$. It is neat that the approximations are achieved using concepts and tools from differential privacy, which we have posted about before. By analogy with PAC, we might summarize the whole system as being "probably approximately truthful."
The notion of truth is fundamental. In logic the notion of truth as applied to mathematical theories is central to the Incompleteness Theorems of Kurt Gödel and to Alfred Tarski's theorem on the undefinability of truth.
In economics the notion of truth is different but perhaps even more important. Imagine any set of agents that are involved in some type of interaction: it could be a game, an auction, or some more complex type of interaction. Typically these agents make decisions, which affect not only what happens to them, but also what happens to the other agents. In our pairing example above, the further point is that "gaming the system" by the two liars lowered the utility for the other two agents. But those two could have tried the same game.
The effect on others highlights that this is a basic problem. It seems best when the interaction does not reward agents that essentially "lie" about their true interests. This speaks to our desire for a system that rewards telling the truth—and takes the subject into the area of algorithmic mechanism design, which we also featured in the post on Nisan. Indeed truth is addressed in his Gödel Prize-winning paper with Amir Ronen, for instance defining a mechanism to be strongly truthful if truth-telling is the only dominant strategy.
That paper follows with a sub-sub-section 4.3.1 on "basic properties of truthful implementations"—but what I'm not finding in these papers are theorems that tell me why truthfulness is important in economic interactions. It sounds self-evident, but is it? There are many papers that show one cannot force agents to be truthful, and there are other results showing cases in which the agents' individual best interests are to be truthful after all. I understand why a solution to a matching problem should be stable, but am not convinced that it needs to be truthful. In mathematics we can define a property that we add to restrict solutions to some problem, but we usually need to justify the property. If we are solving an equation, we may restrict the answers to be integers. The reason could be as simple as non-integer answers making no sense, such as buying 14.67 cans of Sprite for a party.
I get that being truthful does stop some behavior that one might informally dislike. What I feel as an outsider to the research into matching problems is simple: where is the theorem that shows that adding truthful behavior has some advantage? It is true in the analogous case of auctions that they can be designed so that truthful bidding is provably a dominant strategy, and plausibly this matters to competitors agreeing to and paying for the auction mechanism. Perhaps there is a “meta” game level where mechanism designs are strategies and there is a communal payoff function, in which truth-inducing mechanism designs may be optimal strategies. But overall I am puzzled. Perhaps this just shows how naive I am about this line of work.
What are your reactions on the importance of inducing truth-telling? To show at least dewy diligence on our part, here are a few good references. What would Gödel—who conveyed certain of his own mechanism design opinions to his friend Oskar Morgenstern (no relation)—say?
The winner of the 2016 ACM-IEEE Knuth Prize
Coursera source
Noam Nisan has been one of the leaders in computational complexity and algorithms for many years. He has just been named the winner of the 2016 Donald E. Knuth Prize.
Today we congratulate Noam and highlight some of his recent contributions to algorithmic economics and complexity.
I (Dick) think that Noam missed out on proper recognition in the past. I am thrilled that he is finally recognized for his brilliant work. I do differ with the impression given by ACM’s headline. His game theory work is clearly important, but his early work on pseudorandom generators was of first order. And personally I wonder if that almost alone would be enough to argue that Noam is one of the great leaders in complexity theory—let alone the work on communication complexity and interactive proofs.
I (Ken) had forgotten that he was the 'N' in the LFKN paper with Carsten Lund, Lance Fortnow, and Howard Karloff proving that interactive proofs can simulate the polynomial hierarchy, which presaged the proof of IP = PSPACE. The first sentence of the ACM article does list these three complexity areas before mentioning game theory. There are others: Noam co-authored with Nathan Linial and Yishay Mansour a seminal 1993 paper that promoted Fourier analysis to study Boolean circuits, and with Mario Szegedy a 1994 paper on the polynomial method that impacted a host of topics including Boolean function sensitivity, decision trees, and quantum query lower bounds.
In what Lance once termed "walking away" from complexity, Noam in the mid-1990s became interested in algorithmic economics. Really we should say it is the interface between complexity and economics. We can say that theory—in particular mathematical game theory—frames much of this interface, but the business end of it is social. The genius we note in his seminal paper with Amir Ronen, titled "Algorithmic Mechanism Design," is in mapping the social elements to structures in distributed computation and communication theory (routing and protocols as well as communication complexity) that were already developed. This paper was one of three jointly awarded the 2012 Gödel Prize. Regarding it, the Knuth citation says:
A mechanism is an algorithm or protocol that is explicitly designed so that rational participants, motivated purely by their self-interest, will achieve the designer’s goals. This is of paramount importance in the age of the Internet, with many applications from auctions to network routing protocols. Nisan has designed some of the most effective mechanisms by providing the right incentives to the players. He has also shown that in a variety of environments there is a tradeoff between economic efficiency and algorithmic efficiency.
The last part of this addresses a question of more general import to us in theory:
How much impact can complexity lower bounds and conditional hardness relationships have in the real world?
The impact is helped along when the lower bounds come from communication complexity. Bounds on the number of bits that must be communicated to achieve a common goal (with high probability) are generally more concrete and easier to establish than those in the closed system of classical complexity theory, whose internal nature makes the most central bound questions asymptotic.
Noam long ago co-authored the textbook Communication Complexity with Eyal Kushilevitz, and also co-edited the 2007 text Algorithmic Game Theory with Tim Roughgarden, Éva Tardos, and Georgia Tech’s own Vijay Vazirani. The whole package of great breadth as well as depth in foundational areas, imbued with computer science education, ably fits the Knuth Prize profile. But we are going to make a forward-looking point by exploring how some of his recent joint papers reach a synthesis that includes some implications for complexity.
"Public Projects, Boolean Functions, and the Borders of Border's Theorem," with Parikshit Gopalan and Tim Roughgarden, in the 2015 ACM Conference on Economics and Computation. Border's Theorem is an instance where the feasible space of an exponential-sized linear program has a natural projection onto the space of a polynomial-sized one. The linear programs express allocations of $n$-many bid-for goods to bidders according to their expressed and private valuations of those goods. The paper shows complexity obstacles to extending this nice exponential-to-polynomial property to other economic situations.
The overall subject can be read about in two sets of lecture notes by Roughgarden. We adapt an example from section 2 of this lecture to give the flavor: Suppose you have 2 items for sale and one prospective buyer such that for each item there is a 50-50 chance the buyer is willing to pay $1 for it, but you lose $1 for each item you fail to sell (call it shelving cost). If you price each item at $1, you will expect to make one sale, reaping $1 but also losing $1 on the unsold item, for zero expected net revenue. That’s no better than if you gave each away for $0. If you set the price in-between, say $0.50, you will expect a net loss—because the buyer’s probability distribution of value is discrete not smooth. But if you bundle them at 2-for-$1, you will expect to sell the bundle three-fourths of the time, for expected net revenue ($1 + $1 + $1 – $2)/4 = $0.25, which is positive.
Our change from Roughgarden's prices and values of $1,$2 is meant to convey that problems about (random) Boolean functions and set systems are lurking here. Suppose we have $n$ items; what is the optimal bundling strategy? Grouping them in pairs at $1-for-2 expects to net the same $0.125 per item as above. But grouping $1-for-3 expects to reap $1 seven-eighths of the time and lose $3 one-eighth of the time, for $4/8 = $0.50 net for 3 items, which gives a slightly better $0.167 per item. Is this optimal? We could write a big linear program to tell—also in situations with other prices and distributions of buyer values and conditions.
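Under the toy model's assumptions (each item worth $1 to the buyer independently with probability 1/2, a $1 shelving cost per unsold item, the buyer taking a bundle exactly when its value covers the price), small cases can be grid-searched directly; `net_per_item` is my name for the quantity being compared:

```python
from fractions import Fraction
from math import comb

def net_per_item(k, price):
    """Expected net per item when k items are sold as one bundle at an
    integer price: the buyer values each item at $1 independently with
    probability 1/2 and buys iff the bundle's value covers the price;
    every unsold item costs $1 in shelving."""
    p_sale = Fraction(sum(comb(k, j) for j in range(price, k + 1)), 2 ** k)
    return (p_sale * price - (1 - p_sale) * k) / k

print(net_per_item(2, 1))  # 1/8, matching the 2-for-$1 example
print(net_per_item(3, 1))  # 1/6, i.e. about $0.167 per item

# crude grid search over small bundle sizes and integer prices
best = max((net_per_item(k, price), k, price)
           for k in range(1, 9) for price in range(1, k + 1))
print(best)
```

Exhaustive search is no substitute for the linear-programming view in the paper, but it confirms the hand calculations above.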
In several of these situations, they show that a Border-like theorem would put a #P-hard problem into NP and hence collapse the polynomial hierarchy. A technical tool for later results in the paper is the expectation of a Boolean function $f$ over a random assignment $x$, and for each $i$ the expectation of $x_i f(x)$, which constitute the zeroth and first-degree coefficients of the Fourier transform of $f$. Certain #P-hard problems about vectors of these coefficients are related to economics problems for which smaller LPs would collapse them.
“Networks of Complements,” with Moshe Babaioff and Liad Blumrosen, in ICALP 2016. Imagine a situation where buyers will only buy pairs of items in bundles. We can make an undirected graph whose nodes are the items and each edge is weighted by the price the buyer is willing to pay. Graph theory supplies tools to analyze the resulting markets, while behavior for special kinds of graphs may yield insights about the structure of problems involving them.
"A Stable Marriage Requires Communication," with Yannai Gonczarowski, Rafail Ostrovsky, and Will Rosenbaum, in SODA 2015. The input consists of preference permutations for $n$ men over the $n$ women and vice versa. A bijection $M$ between them is unstable if there exist a man $m$ and a woman $w$, not matched to each other, such that $m$ prefers $w$ to $M(m)$ and $w$ prefers $m$ to $M(w)$; otherwise it is stable. Figuratively, the latter condition means that woman $w$ prefers man $m$ to her own husband and the former means that man $m$ likewise prefers woman $w$ to his own wife. There always exist "marriage functions" $M$ that are stable, but the problem is to find one.

This is a classic example of a problem whose best-known algorithms run in time order-of $n^2$ in the worst case, but order-of $n \log n$ in the average case for permutations generated uniformly at random. This random model may not reflect the "practical case," however, so the worst-case time merits scrutiny for possible improvement, perhaps under some helpful conditions. One avenue of improvement is not to require reading the whole input, whose size when the permutations are listed out is already order-of $n^2$ (ignoring a log factor by treating elements of $[n]$ as unit size), but rather allow random access to selected bits. These queries can be comparisons or more general. The paper finds a powerful new reduction from the communication complexity of disjointness (if Alice and Bob each hold an $n$-bit binary string, verify there is no index where both strings have a 1) into preference-based problems to prove the query number must be $\Omega(n^2)$ even for randomized algorithms and even with certain extra helpful conditions.
"Smooth Boolean Functions are Easy: Efficient Algorithms for Low-Sensitivity Functions," with Parikshit, Rocco Servedio, Kunal Talwar, and Avi, in the 2016 Innovations conference.
The sensitivity $s(f)$ of an $n$-variable Boolean function $f$ is the maximum over inputs $x$ of the number of $y$ adjacent to $x$ such that $f(y) \neq f(x)$. Here adjacent means that $x$ and $y$ differ in one bit. The functions are smooth if $s(f)$ is small—say bounded by a constant, or variously, polylogarithmic in $n$. One way to achieve $s(f) \le s$ is for $f$ to be computed by a depth-$s$ decision tree, since along each branch determined by an $x$ there are at most $s$ bits on which the branch's assigned value depends. Noam, first solo and then in the 1994 paper with Szegedy, conjectured a kind of converse: there is a universal constant $c$ such that every $f$ has a decision tree of depth $s(f)^c$.

A small-depth decision tree yields a small formula, and a small formula yields a circuit of comparable size, so decision-tree depth bounds are the strongest of the three kinds of upper bounds. Although belief in Noam's conjecture has grown, no one had determined upper bounds for circuits and formulas. The paper gives circuits of size $2^{O(s)} \cdot n$ and formulas of depth $O(s \log n)$ with somewhat laxer size. The size is scarcely the same as a polynomial in $n$, but it's a start, and when $s$ is polylog, the size is quasipolynomial and the depth is polylog. This most-recent paper will hopefully jump-start work on proving the conjecture and fundamental work on Boolean functions.
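For tiny $n$ the sensitivity measure can be computed by brute force, which makes the definition concrete; the example functions are standard ones of my choosing, not from the paper:

```python
from itertools import product

def sensitivity(f, n):
    """Maximum over all n-bit inputs x of the number of single-bit
    flips of x that change the value of f."""
    best = 0
    for x in product([0, 1], repeat=n):
        flips = sum(f(x[:i] + (1 - x[i],) + x[i + 1:]) != f(x)
                    for i in range(n))
        best = max(best, flips)
    return best

parity = lambda x: sum(x) % 2
majority = lambda x: int(sum(x) > len(x) / 2)

print(sensitivity(parity, 3))    # 3: every flip changes the parity
print(sensitivity(majority, 3))  # 2: at (1,1,0), flipping either 1 flips the vote
```

Parity shows the conjecture's bound cannot be improved past polynomial: its sensitivity and decision-tree depth are both $n$.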
Of course we have given only a cursory overview intended to attract you to examine the papers in greater detail, even skipping some of their major points which are welcome in comments.
We congratulate Noam warmly on the prize and his research achievements.
We can announce one new thing: LEMONADE. No it is not Beyoncé’s latest album but rather a new podcast series by Dick and Kathryn on positive approaches to problematic issues in academia, culture, and life in general.
[fixed “size”->”depth” in statement of sensitivity conjecture]
Sarah Cannon is a current PhD student in our Algorithms, Combinatorics, and Optimization program working with Dana Randall. Sarah has a newly updated paper with Dana, Joshua Daymude, and Andrea Richa entitled "A Markov Chain Algorithm for Compression in Self-Organizing Particle Systems." An earlier version was presented at PODC 2016.
Today Ken and I would like to discuss the paper, and relate it to some recent results on soft robots.
For starters let's call the paper (CDRR)—after the authors' last names. Being lazy, let me also start by quoting part of their abstract:
We consider programmable matter as a collection of simple computational elements (or particles) with limited (constant-size) memory that self-organize to solve system-wide problems of movement, configuration, and coordination. Here, we focus on the compression problem, in which the particle system gathers as tightly together as possible, as in a sphere or its equivalent in the presence of some underlying geometry. More specifically, we seek fully distributed, local, and asynchronous algorithms that lead the system to converge to a configuration with small perimeter. We present a Markov chain based algorithm that solves the compression problem under the geometric amoebot model, for particle systems that begin in a connected configuration with no holes.
What does this mean? They imagine simple devices that lie on a 2-dimensional lattice. Each device operates with the same rules: it can decide what to do next based only on its local environment; however, the devices have access to randomness. The goal is that the devices over time should tend, with high probability, to form a tightly grouped system. This is what they call the compression problem. The surprise is that even with only local interactions, such devices can form a configuration that is close to as tight a configuration as possible. Roughly, $n$ particles will collapse into a region of perimeter order at most $\sqrt{n}$.
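To make the flavor concrete, here is a toy Metropolis chain on the square lattice that favors configurations with more touching pairs. It is emphatically not the CDRR amoebot algorithm (their particles live on the triangular lattice, keep the system connected and hole-free, and act asynchronously); it is just an illustration of compression-by-local-randomness, with the bias parameter `lam` being my choice:

```python
import random

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def toy_compression_step(particles, lam, rng):
    """One Metropolis move of a toy chain: pick a particle, propose
    sliding it to an adjacent empty cell, and accept with probability
    lam**(change in touching pairs), capped at 1. Taking lam > 1
    favors configurations with more touching pairs."""
    p = rng.choice(sorted(particles))
    dx, dy = rng.choice(MOVES)
    q = (p[0] + dx, p[1] + dy)
    if q in particles:
        return
    others = particles - {p}
    def degree(cell):
        return sum((cell[0] + mx, cell[1] + my) in others for mx, my in MOVES)
    gain = degree(q) - degree(p)
    if gain >= 0 or rng.random() < lam ** gain:
        particles.remove(p)
        particles.add(q)

rng = random.Random(1)
particles = {(i, 0) for i in range(12)}   # start as a long line: high perimeter
for _ in range(20_000):
    toy_compression_step(particles, 4.0, rng)
print(len(particles))  # 12: moves never create or destroy particles
```

After many steps the line typically crumples into a blob; the CDRR analysis makes the corresponding statement rigorous in their model.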
As I started to write this post, I discovered that there are some neat results on soft robots. As usual, theorists like CDRR think about $n$ objects. They study the behavior of many simple devices and prove theorems that hold, often, for large enough $n$. Their main one, as stated above, is that there are devices that can solve the compression problem and get within a region of perimeter order $\sqrt{n}$.
On the other hand, practical researchers often start by studying the case of $n = 1$. I think both ends of the spectrum are important, and they complement each other. Since I am not a device physicist, I will just point out the highlights of the recent work on soft robots.
The above photo is of a “light-powered micro-robot capable of mimicking the slow, steady crawl of an inchworm or small caterpillar.” See this for more details.
This is research done by a team in Poland led by Piotr Wasylczyk, who writes:
Designing soft robots calls for a completely new paradigm in their mechanics, power supply and control. We are only beginning to learn from nature and shift our design approaches towards these that emerged in natural evolution.
Their soft robot uses no motors or pneumatic actuators to make it move. Instead it relies on a clever liquid-crystal elastomer technology: when exposed to light the device moves like a caterpillar. Further, the light can be adjusted to make the device move in different ways.
I have no idea if this work can be extended to large $n$, nor whether it could be used to implement even small numbers of the devices that CDRR require. Still, I thought you might enjoy hearing about such creepy devices. Let’s turn to the mathematical work on the compression problem.
CDRR assume that they have a fixed structure, which is an infinite undirected graph. Their devices or particles sit on the vertices of this structure. This, of course, forces the particles to be a discrete system. As you might guess, the usual structures are fixed lattices. These lattices have periodic structure that makes the systems that result at least possible to understand. Even with this regular structure the global behavior of their particles can be quite subtle.
The models of this type have various names; the one they use is called the amoebot model as proposed in this 2014 paper. I like to think of them as soft robots creeping along the lattice.
Okay the above is a cartoon. Here are some more-illustrative figures from the 2014 paper:
Their “particles” reside in a triangular lattice and either sit at one vertex or occupy two adjacent vertices. Figuratively, the worm is either pulled together or is stretched out. They can creep to an adjacent vertex by stretching out and later contracting. They cannot hop as in Chinese checkers.
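Since the lattice is triangular, each vertex has six neighbors, and two adjacent vertices share exactly two common neighbors. Here is a minimal Python sketch of this adjacency, using axial coordinates (our own convention for illustration, not taken from the paper):

```python
# Sketch: adjacency on the triangular lattice, in axial coordinates.
# Every node has exactly six neighbors, and two adjacent nodes share
# exactly two common neighbors (these play a role in the move rules).

def neighbors(node):
    """The six neighbors of a node on the triangular lattice."""
    x, y = node
    return {(x + 1, y), (x - 1, y), (x, y + 1),
            (x, y - 1), (x + 1, y - 1), (x - 1, y + 1)}

def common_neighbors(u, v):
    """Nodes adjacent to both u and v."""
    return neighbors(u) & neighbors(v)
```

For adjacent nodes such as `(0, 0)` and `(1, 0)` the two shared neighbors are the nodes completing the two triangles on the edge between them.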
We don’t know if the game of Go can be played sensibly on a Chinese checkers board, but anyway these worms cannot play Go. A necessity in Go is forming interior holes called eyes. Although the movement rules crafted by CDRR are entirely local and randomized, the configuration of worms can never make an eye.
Let $u$ be a node occupied by a contracted worm and $v$ an adjacent empty node. Then let $w_1$ and $w_2$ be the two nodes adjacent to both $u$ and $v$. Define the neighborhood $N$ to include $w_1$ and $w_2$, the other three nodes adjacent to $u$, and the other three nodes adjacent to $v$. Here is an alternate version of the rules in CDRR for it to be legal for the worm at $u$ to expand into $v$ and then contract into $v$:
There are further rules saying initially that no node in $N$ may be part of an expanded worm, and covering possible adjacent expanded worms after the worm has expanded. However, their effect is to enable treating the nodes concerned as unoccupied, so that Markov chains built on these rules need only use states with all worms contracted. The rules are enforceable by giving each worm a fixed finite local memory that its neighbors can share.
The “not both” in the first rule subsumes their rule that the worm cannot have five occupied neighbors, which would cause an eye upon its moving to $v$. The second and third rules preserve connectedness of the whole and ensure that the move does not connect separate islands at $v$. The third rule also ensures that a path of open nodes through $v$ can now go through $u$. The rules’ symmetry makes a subsequent move back from $v$ to $u$ also legal, so that the chain is reversible.
Their chain always executes the move if the number $t$ of triangles the worm was part of on node $u$ is no greater than the number $t'$ of triangles it joins on $v$; otherwise the move happens with probability $\lambda^{t' - t}$, where $\lambda > 1$ is a fixed constant. Given the rules, it strikes us that using the difference in neighbor counts as the exponent is equivalent. The whole system is concurrent and asynchronous, but is described by first choosing one worm uniformly at random for a possible move at each step, and then choosing an unoccupied neighbor at random.
Here is a configuration from their paper in which the only legal moves involve the third rule:
The edges denote adjacent contracted worms, not expanded worms. Suppose the chosen node $u$ is the one midway up the second column at left. It forms a triangle with its two neighbors to the left, so $t = 1$. The node above $u$ is vacant, but moving there would close off a big region below it. The move is prevented by the second rule because $N$ would comprise four nodes that are not all connected within $N$. Similarly the node below $u$ is forbidden. Either node to the right of $u$ is permitted by rule 3, however. Since $t' \ge t$, the move will certainly happen if it is randomly chosen.
This suggests there could be examples where the only legal moves go back and forth, creating cycles as can happen in John Conway’s game of Life. However, CDRR show that every connected hole-free n-worm configuration is reachable from any other. This makes the Markov chain ergodic. Their main theorem is:
Theorem 1 For all $\alpha > 1$ and $\lambda > 2 + \sqrt{2}$, and sufficiently large $n$, when the chain is run from any connected, hole-free $n$-worm configuration, with all but exponentially vanishing probability it reaches and stays among configurations with total perimeter at most $\alpha$ times the minimum possible perimeter for $n$ nodes.
The second main theorem shows a threshold of $\lambda < 2.17$ for the opposite behavior: the perimeter stays at size $\Omega(n)$.
The researchers CDRR are experts at the analysis of Markov chains. So they view their particles as such a system. Then they need to prove that the resulting Markov system behaves the way they claim: that as time increases they tend to form a tight unit that solves the compression problem.
Luckily there are many analytical tools at their disposal. But regarding the ergodicity alone, they say:
We emphasize the details of this proof are far from trivial, and occupy the next ten pages.
Their particles are pretty simple, but to prove that the system operates as claimed requires quite careful analysis. Take a look at their paper for the details.
I (Dick) will make one last detailed comment. They want their system to operate completely locally. This means that there can be no global clock: each particle operates asynchronously. This requires some clever ideas to make it work: they want each particle to activate in a random manner. They use the trick that random sequences of actions can be approximated using Poisson clocks with mean $1$. The key is:
After each action, a particle then computes another random time drawn from the same distribution and executes again after that amount of time has elapsed. The exponential distribution is unique in that, if a particle has just activated, it is equally likely that any particle will be the next particle to activate, including the particle that just went. Moreover, the particles update without requiring knowledge of any of the other particles’ clocks. Similar Poisson clocks are commonly used to describe physical systems that perform updates in parallel in continuous time.
After looking at their paper in some depth, we find the result that local independent particles can actually work together to solve a global problem remains intriguing. Yes, there are many such results, but they usually have global clocks and other assumptions. The fact that compression is achievable by a weaker model is very neat.
Large Numbers in Computing source |
Wilhelm Ackermann was a mathematician best known for work in constructive aspects of logic. The Ackermann function is named after him. It is used both in complexity theory and in data structure theory. That is a pretty neat combination.
I would like today to talk about a proof of the undecidability of the famous Halting Problem.
This term at Georgia Tech I am teaching CS4510, which is the introduction to complexity theory. We usually study general Turing machines and then use the famous Cantor Diagonal method to show that the Halting Problem is not computable. My students over the years have always had trouble with this proof. We have discussed this method multiple times: see here and here and here and in motion pictures here.
This leads always to the question, what really is a proof? The formal answer is that it is a derivation of a theorem statement in a sound and appropriate system of logic. But as reflected in our last two posts, such a proof might not help human understanding. The original meaning of “proof” in Latin was the same as “probe”—to test and explore. I mean “proof of the Halting Problem” in this sense. We think the best proofs are those that show a relationship between concepts that one might not have thought to juxtapose.
The question is how best to convince students that there is no way to compute a halting function. We can define Turing machines in a particular way—or define other kinds of machines. Then we get the particular definition

$$H(x) = 1 \text{ if machine } M_x \text{ halts on input } x, \text{ and } H(x) = 0 \text{ otherwise.}$$
How can we prove that $H$ is not computable? We want to convey not only that this particular $H$ is uncomputable, but also that no function like it is computable.
Trying the diagonal method means first defining the set

$$D = \{ x : M_x \text{ does not accept } x \}.$$
We need to have already defined what “accept” means. OK, we show that there is no machine whose set of accepted strings equals $D$. Then what? We can say that the complementary language is not decidable, but we still need another step to conclude that $H$ is uncomputable. And when you trace back the reason, you have to fall back on the diagonal contradiction—which feels disconnected and ad hoc to the particular way $D$ and $H$ are defined.
Ken in his classes goes the $D$ route first, but Sipser’s and several other common textbooks try to hit $H$ directly. The targeted reason is one that anyone can grab:
It is impossible for a function $f$ to give on input $x$ the value $f(x) + 1$—or any greater value.
Implementations of this, however, resort to double loops to define $f$. Or like Sipser’s they embed the “$D$” idea in the proof anyway, which strikes us as making it harder than doing separate steps as above. We want the cleanest way.
Here is the plan. As usual we need to say that $M_e(x)$ represents a computation. If the computation halts then it returns a result. We allow the machine to return an integer, not just accept or reject. If the machine does not halt then we can let this value be undefined; our point will be that by “short-circuit” reasoning the question of an undefined value won’t even enter.
Now let $H$ be defined as the halting function above.
Theorem 1 The function $H$ is not computable.
Proof: Define the function $g$ as follows:

$$g(x) = \sum_{e=1}^{x} H(e)\cdot\big(M_e(e) + 1\big).$$
Suppose that $H$ is computable. Then so is $g$. This is easy to see: just do the summation, and when computing the $e$-th term compute the $H(e)$ part first. If it is $0$ then it adds nothing to the summation, so it “short-circuits” and we move on to the next $e$. If it is $1$ then we compute $M_e(e)$ and add $M_e(e) + 1$ to the summation. Let $S$ stand for the summation before the last term; then $g(x) = S + H(x)\cdot(M_x(x) + 1)$.
Now if the theorem is false, then there must be some $n$ such that the machine $M_n$ computes $g$. Since $M_n$ computes $g$, it halts on every input, so $H(n) = 1$. But then

$$g(n) = S + H(n)\cdot(M_n(n) + 1) = S + M_n(n) + 1 > M_n(n) = g(n).$$

This is impossible and so the theorem follows.
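To make the short-circuit idea concrete, here is a toy Python sketch. A real halting function is of course uncomputable; below we mock it with a hand-made table of “machines” (our own illustrative data), just to show how the sum $g$ dominates every halting machine’s diagonal value.

```python
# Toy illustration of the proof above. A real H is uncomputable; here we
# fake it with a small hand-made table so the "short-circuit" evaluation
# of g can be seen in action.

# machine e -> (halts on input e?, value M_e(e) when it halts)
MACHINES = {
    1: (True, 7),
    2: (False, None),   # M_2 diverges on input 2; its value is never needed
    3: (True, 0),
    4: (True, 41),
}

def H(e):
    return 1 if MACHINES[e][0] else 0

def M(e):
    return MACHINES[e][1]        # only ever called when H(e) == 1

def g(x):
    total = 0
    for e in range(1, x + 1):
        if H(e) == 1:            # short-circuit: skip divergent machines
            total += M(e) + 1
    return total
```

Here $g(4) = (7+1) + (0+1) + (41+1) = 51$, which exceeds $M_e(e)$ for every halting machine $e \le 4$; that excess is exactly what makes the diagonal contradiction bite.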
What Ken and I are really after is relating this to hierarchies in complexity classes. When the $M_e$ are machines of a complexity class $\mathcal{C}$, then the functions $H$ and $g$ are computable. It follows that $g$ is not computed by any $M_e$ and so does not belong to $\mathcal{C}$. What we want is to find similar functions $g$ that are natural.
Ackermann’s famous function does this when $\mathcal{C}$ is the class of primitive recursive functions. There are various ways to define the machines $M_e$, for instance by programs with counted loops only. The $g$ that tumbles out is not primitive recursive—indeed it out-grows all primitive recursive functions. Showing that the Ackermann function does likewise takes a little more formal work.
In complexity theory we have various time and space hierarchy theorems, say $\mathsf{DTIME}[t(n)] \subsetneq \mathsf{DTIME}[T(n)]$ where $T(n)$ is $t(n)\log t(n)$. For any time-constructible $t$, we can separate $\mathsf{DTIME}[t(n)]$ from $\mathsf{DTIME}[T(n)]$ by a “slowed” diagonalization. The $g$ obtained this way, however, needs knowledge of $t$ and its constructibility to define it. By further “padding” and “translation” steps, one can laboriously make it work for $T(n) = t(n)^{1+\epsilon}$, for any fixed $\epsilon > 0$, and a similar theorem for deterministic space needs no log factors at all. This is all technical in texts and lectures.
Suppose we’re happy with, say, $T(n) = t(n)^2$, that is, with a non-“tight” hierarchy. Can we simply find a natural $g$ that works for $T$? Or suppose $\mathcal{C}$ is a combined time and space class, say machines that run in time $t(n)$ and space $s(n)$ simultaneously. Can we possibly get a natural $g$ that is different from what we get by considering time or space separately?
We’d like the “non-tight” proofs to be simple enough to combine with the above proof for halting. This leads into another change we’d like to see. Most textbooks define computability several chapters ahead of complexity, so the latter feels like a completely different topic. Why should this be so? It is easy to define the length and space usage of a computation in the same breath. Even when finite automata are included in the syllabus, why not present them as special cases of Turing machines and say they run in linear time, indeed time $n$?
Is the above Halting Problem proof clearer than the usual ones? Or is it harder to follow?
What suggestions would you make for updating and tightening theory courses? Note some discussion in the comments to two other recent posts.
[some word fixes]
Marijn Heule, Oliver Kullmann, and Victor Marek are experts in practical SAT solvers. They recently used this ability to solve a longstanding problem popularized by Ron Graham.
Today Ken and I want to discuss their work and ask a question about its ramifications.
The paper by them—we will call it and them HKM—is titled, “Solving and Verifying the Boolean Pythagorean Triples Problem via Cube-and-Conquer.” The triples problem is a “Ramsey”-like question raised some years ago by Graham. Cube-and-Conquer is a method for solving large and complex SAT problems. Sandwiched in between is a clever new tuning of resolution SAT methods called “DRAT,” which we discuss in some detail.
Ron was interested in a problem that generalizes Schur’s Theorem, due to Issai Schur. Suppose we color the numbers $1, 2, \dots, n$ red and green. Can we always find three distinct numbers $x, y, z$ of the same color so that $x + y = z$?
Schur’s theorem says that provided $n$ is large enough this is true. Note that another way of putting this is that with $S = \{x, y\}$, all elements of the set of nonempty sums from $S$ are the same color. Several mathematicians independently proved the extension that there are arbitrarily large sets with this property—indeed for any number of colors: for every $k$ and $r$, if $n$ is large enough, then any $r$-coloring of $\{1, \dots, n\}$ yields a $k$-element set $S$ all of whose nonempty subset sums have the same color.
All the sums of course are linear. What happens if we go to higher powers $k$?
If we simply look at $k$-th powers of sums from $S$ then we tie into the same theorem via the coloring $c'(x) = c(x^k)$ for all $x$. Taking sums of $k$-th powers such as $x^k + y^k$ is different. We can map that case into the simple sums problem with the set $\{n^k : n \ge 1\}$ in place of $\{1, 2, 3, \dots\}$, but it is not clear how to argue similarly with mapped colorings $c'$. Sets of the form $\{x^k + y^k\}$ are special.
We can make them even more special by requiring the sum to be a perfect $k$-th power too. OK, for $k \ge 3$ we are kidding, but the case $k = 2$ and $x^2 + y^2 = z^2$ is the famous one of Pythagorean triples. Suppose we color the numbers $1, 2, \dots, n$ red and green. Can we always find three distinct numbers $x, y, z$ of the same color so that $x^2 + y^2 = z^2$?
This is the question Ron asked. In the spirit of Paul Erdős, he offered $100 for a solution.
The answer from HKM is that this extension is true. Perhaps that is not too surprising, since many problems can be generalized from linear to non-linear cases. But what is really perhaps the most interesting part is that HKM found a proof using SAT solvers.
The exact theorem HKM prove is:
Theorem 1 The set $\{1, \dots, 7824\}$ can be partitioned into two parts, such that no part contains a Pythagorean triple, while this is impossible for $\{1, \dots, 7825\}$.
Note, this shows that Schur’s theorem does extend from $x + y = z$ to $x^2 + y^2 = z^2$.
What is special about $7825$? According to this list of triples, which is linked under “Integer Lists” from Douglas Butler’s TSM Resources site, there are seven Pythagorean triples involving $7825$:
There are five distinct entries for $7825$, various others before it, and quite a few after it. Nothing, however, shouts why $7825$ is a barrier. It seems better to think of it as a tipping point.
There are $2^{7825}$ colorings of the numbers up to $7825$. This immediately stops any simple brute-force approach. What must be done is to break the immense number of cases down to a more manageable number. HKM did this by clever use of known SAT methods with the addition of heuristics that are tailored to this question.
The previous best positive result had been a 2-coloring of $\{1, \dots, 7664\}$ with no monochromatic triple, so HKM had a good idea of how large an $n$ to try. The SAT encoding is simple: use variables $x_1, \dots, x_n$, and for every triple $(a, b, c)$ with $a^2 + b^2 = c^2$ and $c \le n$, include the clauses

$$(x_a \vee x_b \vee x_c) \wedge (\bar{x}_a \vee \bar{x}_b \vee \bar{x}_c).$$
If we give $x_i = \text{true}$ the meaning that the number $i$ is colored green, then this says that for every Pythagorean triple, at least one member must be green and at least one must be red. 3SAT remains NP-complete for clauses of all-equal sign, as follows by tweaking the proof at the end of this post.
Now with $n = 7825$, what HKM needed to do was to prove the formula unsatisfiable. Proving satisfiability is easy when you know or guess a satisfying assignment—in this case, a coloring. The following graphic from the Nature article on their work shows a coloring for $n = 7824$ in which the white squares are “don’t-cares”—they can be either color:
The top row goes 24 squares; the cell after them is the sticking point. How to prove there is no consistent way to color it? Given the formula $F$, it may be hard to recognize that it entails a contradiction. The general idea, roughly speaking, is to add more clauses to make a formula $F'$ so that $F'$ is satisfiable exactly when $F$ is, while the contradiction in $F'$ is easier to detect.
Besides good guesses for $n$, HKM were armed with the latest knowledge on well-performing heuristics. A 2012 paper by Matti Järvisalo with Heule and Armin Biere includes an overview of resolution-related properties involving a big array of acronyms such as AT and HBC and RHT. The AT stands for “asymmetric tautology,” and the ‘R’ prefix applied to a formula property $P$ enlarges $P$ by adding cases where a certain kind of resolution yields a formula with $P$. Combining these two yields the following definition—we paraphrase the newer paper’s version informally:
Definition 2 Given a formula $F$ and a clause $C$ not in $F$, say $C$ has RAT via a literal $\ell$ in $C$ if for all clauses $D$ of $F$ containing $\bar{\ell}$ the following happens: when you make the other literals in $C$ and $D$ false, remove $\bar{\ell}$, and simplify, you get an immediate contradiction.
We should say more about “simplify”: Suppose $m_1, \dots, m_k$ are those other literals. Making them false is the same as making the formula

$$F \wedge \bar{m}_1 \wedge \cdots \wedge \bar{m}_k,$$
which has their negations as unit clauses. We simplify by removing, for each $\bar{m}_i$, all clauses containing $\bar{m}_i$ (those were satisfied) and deleting $m_i$ from other clauses. After doing this, there may be other unit clauses, whereupon we repeat. If we get both $y$ and $\bar{y}$ for some variable $y$, that’s the immediate contradiction we seek. What’s important is that this unit resolution process, while “logically” inferior to full resolution, stops in polynomial time.
Now suppose $F$ is satisfiable and $C$ has RAT via $\ell$. If there is a satisfying assignment $A$ of $F$ that sets $\ell$ or one of the other literals in $C$ true, then $C$ is also satisfied. So suppose $A$ sets them all false. Now there must exist a clause $D$ in $F$ containing $\bar{\ell}$ such that $A$ sets the other literals in $D$ false—if none, then we could have set $\ell$ true after all. Then the formula above is satisfied by $A$, but this is a contradiction of the (immediate) contradiction.
Note also that if $F \wedge C$ is satisfiable then of course so is $F$. This isn’t important to the unsatisfiability proof but is good to know: RAT clauses can be added freely. The trick is to find them. Definition 2 was crafted to make it polynomial-time recognizable that $C$ is RAT when you have it, but you still have to find it. A particularly adept choice of $C$ may allow simplifications that delete other clauses, yielding a technique called DRAT for “Deletion Resolution Asymmetric Tautology” proofs.
This is where the other ingenious heuristics—tailored for the triples problem but following a general paradigm called “cube-and-conquer”—come in. We’ll refer those details to the paper and its references, but this breakthrough should make one excited to read more about the state of the art.
The problem took “only” two days of computing on a supercomputer—the Texas Advanced Computing Center. The computation generated 200 terabytes of raw text output. It is not clear to us whether even more intermediate text was generated on-the-fly as unsuccessful moves were backtracked-out, or how much. HKM say in their abstract:
…Due to the general interest in this mathematical problem, our result requires a formal proof. Exploiting recent progress in unsatisfiability proofs of SAT solvers, we produced and verified a proof in the DRAT format, which is almost 200 terabytes in size. From this we extracted and made available a compressed certificate of 68 gigabytes, that allows anyone to reconstruct the DRAT proof for checking.
As with all computer proofs we still would like a human-readable proof. It is not that we do not trust the validity of the current proof, but rather that we would like to “understand” if possible why Ron’s problem is answered. Can we possibly extract from the certificate a dart of reasoning that yields a shorter explanation? It might be a numerical potential function whose values in this case are guessable and verifiable, such that crossing some threshold analytically implies unsatisfiability.
We also wonder why the size-$n$ formulas treated here should be any more difficult than ones you can get for factoring $n$-bit numbers. As we noted above, the all-signs-equal condition on the literals comes without loss of generality. So the degree of ease that allowed solving on a university center in two days must come from how the Pythagorean pattern gave a leg up to “cube-and-conquer.” For factoring there might be other legs—and moduli from current security standards might yield even smaller formulas.
Last, as noted in the papers we’ve linked, the DRAT condition has universality properties with regard to resolution in general, yet builds on steps that are polynomial-time checkable. It was an incremental liberalization of previously used steps, and this makes us wonder whether it can be enhanced further while still yielding proofs that take up $2^{O(n)}$ length and time. Perhaps we can get the $2^{O(n)}$ part down to $2^{o(n)}$? That would refute some forms of the “Exponential Time Hypothesis,” which we last discussed here.
The most immediate questions raised by this wonderful work are: what about other equations, and what about allowing more colors? Does having three colors zoom the problem beyond any hope of attack by today’s computers, or will the practical breakthroughs continue a virtuous cycle with advances in theory that bring more cases into the realm of feasibility? Is there an asymptotic analysis that might guide our ability to forecast this?
[fixed typo in SAT encoding and struck “not” between “does” and “extend”]
Wikimedia Commons source |
Nicholas Saunderson was the fourth Lucasian Professor at Cambridge, two after Isaac Newton. He promoted Newton’s Principia Mathematica in the Cambridge curriculum but channeled his original work into lecture notes and treatises rather than published papers. After his death, most of his work was collected into one book, The Elements of Algebra in Ten Books, whose title recalls Euclid’s Elements. It includes what is often credited as the first “extended” version of Euclid’s algorithm.
Today we raise the idea of using algorithms such as this as the basis for proofs.
Saunderson was blind from age one. He built a machine for doing what he called “Palpable Arithmetic” by touch. As described in the same book, it was an enhanced abacus—not a machine for automated calculation of the kind a later Lucasian professor, Charles Babbage, attempted to build.
We take the “palpable” idea metaphorically. Not only beginning students but we ourselves still find proofs by contradiction or “infinite descent” hard to pick up at first reading. We wonder how far mathematics can be developed so that the hard nubs of proofs are sheathed in assertions about the availability and correctness of algorithms. The algorithm’s proof may still involve contradiction, but there’s a difference: You can interact with an algorithm. It is hard to interact with a contradiction.
It was known long before Euclid that the square root of 2 is irrational. In the terms Saunderson used, the diagonal of a square is “incommensurable” with its side.
Alexander Bogomolny’s great educational website Cut the Knot has an entire section on proofs. Its coverage of the irrationality of $\sqrt{2}$ itemizes twenty-eight proofs. All seem to rely on some type of infinite descent: if there is a rational solution then there is a smaller one and so on. Or they involve a contradiction of a supposition whose introduction seems perfunctory rather than concrete. We gave a proof by induction in a post some years ago, where we also noted a MathOverflow thread and a discussion by Tim Gowers about this example.
We suspect that one reason the proof of this simple fact is considered hard for a newcomer is just that it uses these kinds of descent and suppositions. Certainly the fact itself was considered veiled in antiquity. According to legend the followers of Pythagoras treated it as an official secret and murdered Hippasus of Metapontum for the crime of divulging it. To state it truly without fear today, we still want a clear view of why the square root of 2 is irrational.
Our suggestion below is to avoid the descent completely. Of course it is used somewhere, but it is encapsulated in another result. The result is that for any co-prime integers $a$ and $b$ there are integers $m$ and $n$ such that

$$ma + nb = 1.$$
The $m$ and $n$ are given by the extended Euclidean algorithm. Incidentally, this was noted earlier by the French mathematician Claude Bachet de Méziriac—see this review—while Saunderson ascribed the general method to his late colleague Roger Cotes two pages before his chapter “Of Incommensurables” (in Book V) where he laid out full details.
Here is the closest classical proof we could find to our aims. We quote the source verbatim (including “it’s” not “its”) and will reveal it at the end.
Proposition 15. If there be any whole number, as $c$, whose square root cannot be expressed by any other whole number; I say then that neither can it be expressed by any fraction whatever.
For if possible, let the square root of $c$ be expressed by a fraction which when reduced to it’s least integral terms is $\frac{a}{b}$, that is, let $\sqrt{c} = \frac{a}{b}$, then we shall have $c = \frac{a^2}{b^2}$; but the fraction $\frac{a^2}{b^2}$ is in it’s least terms, by the third corollary to the twelfth proposition, because the fraction $\frac{a}{b}$ was so; and the fraction $\frac{c}{1}$ is in it’s least terms, because 1 cannot be further reduced; therefore we have two equal fractions $\frac{a^2}{b^2}$ and $\frac{c}{1}$ both in their least terms; therefore by the tenth proposition, these two fractions must not only be equal in their values, but in their terms also, that is, $a^2$ must be equal to $c$, and $b^2$ to 1: but $a^2$ cannot be equal to $c$, because $a$ is a whole number by the supposition, and $c$ is supposed to admit of no whole number for its root; therefore the square root of $c$ cannot possibly be expressed by any fraction whatever. Q.E.D.
The cited propositions are that two fractions in lowest whole-number terms must be identical and that if $a$ and $b$ are each co-prime to $c$ then so is $ab$. The proof of the latter starts with the for-contradiction words “if this be denied,” so the absence of such language above gets only part credit. This all does not come trippingly off the tongue; rather it sticks trippingly in the throat. Let’s try again.
In fact, we don’t need the concepts of “lowest terms” or co-primality or the full statement of the identity named for Étienne Bézout. It suffices to assert that for any whole numbers $a$ and $b$, there are integers $m$ and $n$ such that the number

$$d = ma + nb$$

divides both $a$ and $b$. This is what the extended Euclidean algorithm gives you.
Then for the proof, suppose that $\sqrt{2} = \frac{p}{q}$ for integers $p$ and $q$. We take $a = p$ and $b = q$, and let $m$, $n$ be the resulting integers, so that $d = mp + nq$ divides both $p$ and $q$. Now let’s do some simple algebra: since $\sqrt{2} = \frac{p}{q}$, we have $q\sqrt{2} = p$ and $p\sqrt{2} = 2q$. It follows that

$$d\sqrt{2} = (mp + nq)\sqrt{2} = mp\sqrt{2} + nq\sqrt{2} = 2mq + np.$$

Now divide both sides of this by $d$. We get

$$\sqrt{2} = \frac{2mq + np}{d}.$$

The conclusion is that $d$ divides $p$, hence $d$ divides $np$. But $d$ also divides $q$, hence $2mq$. So $\sqrt{2} = \frac{2mq + np}{d}$ is an integer—the same end as the classical proof.
This is a contradiction. But it is a palpable contradiction. For instance, of course we can see that $\sqrt{2}$ isn’t an integer—it lies strictly between $1$ and $2$. Thus we claim that the effect of this proof is more concrete.
Is this a new proof? We doubt it. But the proof is nice in that it avoids any recursion or induction. The essential point—the divisibility of $d$ into $a$ and $b$—is coded into the Euclidean algorithm.
Is ours at least smoother than the classical proof we quoted? The latter is from Saunderson’s book, on pages 304–305 which come soon after his presentation of the algorithm on pages 295–298.
What other proofs can benefit from similar treatment by “reduction to algorithms”?
[fixed missing n in last line of proof, some word tweaks]
Peter Landweber, Emanuel Lazar, and Neel Patel are mathematicians. I have never worked with Peter Landweber, but have written papers with Larry and Laura Landweber. Perhaps I can add Peter one day.
Today I want to report on a recent result on the fiber structure of continuous maps.
The paper by Landweber, Lazar, and Patel (LLP) is titled, “On The Fiber Diameter Of Continuous Maps.” Pardon me, but I assume that some of you may not be familiar with the fiber of a map. Fiber has nothing to do with the content of food or diets, for example. Fibers are a basic property of a map.
Their title does not give away any suggestion that their result is relevant to those studying data sets. Indeed even their full abstract only says at the end:
Applications to data analysis are considered.
I just became aware of their result from reading a recent Math Monthly issue. The paper has a number of interesting results—all with some connection to data analytics. I must add that I had not seen it earlier because of a recent move, and the subsequent lack of getting US mail. Moves are disruptive—Bob Floyd used to tell me that “two moves equal a fire”—and I’ve just moved twice. Oh well.
The fiber of a map $f$ at $y$ is the set of points $x$ so that $f(x) = y$. The diameter of a fiber is just what you would expect: the maximum distance between points in the fiber. LLP prove this—they say they have a “surprisingly short proof”—and give earlier sources for it at the end of their paper:
Theorem: Let $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a continuous function where $n > m$. Then for any $r > 0$, there exists $y \in \mathbb{R}^m$ whose fiber has diameter greater than $r$.
The following figure from their paper conveys the essence of the proof in the case $n = 2$, $m = 1$:
For larger $n$ and $m$ one might expect a difficult dimension-based argument. However, they leverage whatever difficult reasoning went into the following theorem by Karol Borsuk and Stanislaw Ulam. We have mentioned both of them multiple times on this blog but never this theorem:
Theorem: Let $g$ be any continuous function from the $m$-sphere $S^m$ to $\mathbb{R}^m$. Then there are antipodal points that give the same value, i.e., some $x$ on the sphere such that $g(x) = g(-x)$.
The proof then simply observes that $m$-spheres of radius $r$ live inside $\mathbb{R}^n$ for any $n > m$, with $r$ arbitrarily large. The antipodal points belong to the same fiber of $f$ but are $2r$ apart.
Why should we care about this theorem? That’s a good question.
One of the main ideas in analytics is to reduce the dimension of a set of data. If we let the data lie in a Euclidean space, say $\mathbb{R}^n$, then we may wish to map the data down to a space of lower dimension. This yields lots of obvious advantages—the crux is that we can do many computational things on lower-dimensional data that would be too expensive on the original $n$-dimensional space.
The LLP result shows that no matter what the mapping is, as long as it is continuous, there must be points that are far apart in the original space and yet get mapped to exactly the same point in the lower space. This is somewhat annoying: clearly it means there will always be points that the map does not classify correctly.
One of the issues I think raised by this work of LLP is that within areas like big data, people can come at problems from many angles. We do not always see results from another area as related to our own work. I believe that many people in analytics are probably surprised by this result, and I would guess that they may not have known about it previously. This phenomenon seems to be getting worse as more researchers work in similar areas, but come at the problems with different viewpoints.
Can we do a better job of linking different areas of research? Finally, with respect, this seems like a result that could have been proved decades ago. Perhaps one of the great consequences of new areas like big data is to raise questions that were not thought about previously.
Noam Chomsky is famous for many, many things. He has had a lot to say over his long career, and he has written over 100 books on topics from linguistics to war and politics.
Today I focus on work that he pioneered sixty years ago.
Yes, sixty years ago. The work is usually called the Chomsky hierarchy (CH) and is a hierarchy of classes of formal grammars. It was described by Noam Chomsky in 1956, driven by his interest in linguistics, not war and politics. Some add Marcel-Paul Schützenberger's name to the hierarchy. He played a crucial role in the early development of the theory of formal languages; see his joint paper with Chomsky from 1962.
We probably all know about this hierarchy. Recall grammars define languages:

- Type 3: regular grammars, recognized by finite automata.
- Type 2: context-free grammars, recognized by nondeterministic pushdown automata.
- Type 1: context-sensitive grammars, recognized by linear bounded automata.
- Type 0: unrestricted grammars, recognized by Turing machines.
One neat thing about this hierarchy is that it has long been known to be strict: each class is more powerful than the previous one. Each proof that the next class is more powerful is really a beautiful result. Do you know, offhand, how to prove each one?
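As a reminder of the flavor of the lowest separation: the language of strings $a^n b^n$ is context-free but, by the pumping lemma, not regular. A pushdown automaton recognizes it by pushing on each $a$ and popping on each $b$; since the stack alphabet needs only one symbol, a counter suffices, as in this sketch (the function name is mine):

```python
def is_anbn(s):
    """Accept exactly the strings a^n b^n, n >= 0.

    No DFA can do this (pumping lemma), but a pushdown automaton can:
    push on 'a', pop on 'b'. A counter simulates the stack here since
    the stack alphabet would have a single symbol.
    """
    count = 0
    seen_b = False
    for ch in s:
        if ch == 'a':
            if seen_b:           # an 'a' after a 'b': reject
                return False
            count += 1
        elif ch == 'b':
            seen_b = True
            count -= 1
            if count < 0:        # more b's than a's so far
                return False
        else:
            return False
    return count == 0

print([w for w in ["", "ab", "aabb", "aab", "ba"] if is_anbn(w)])
# ['', 'ab', 'aabb']
```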
I have a simple question:
Should we still teach the CH today?
Before discussing that, let me first say a bit about grammars.
In the 1950s people started to define various programming languages. It quickly became clear that if they wanted to be precise they needed some formal method to define their languages. Chomsky's formalism of context-free grammars was well suited for at least defining the syntax of their languages; semantics were left to "English," but at least the syntax would be well defined.
Another milestone in the late 1950s was the publication, by a committee of American and European computer scientists, of "a new language for algorithms": the ALGOL 60 Report (ALGOL for "ALGOrithmic Language"). This report consolidated many ideas circulating at the time and featured several key language innovations. Perhaps the most useful was a mathematically exact notation, Backus-Naur Form (BNF), used to describe the language's grammar. BNF is no more expressive than context-free grammars, but it is more user-friendly, and variants of it are still used today.
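As a toy illustration of the BNF style (the grammar and code are my own example, not from the ALGOL 60 Report): the productions `<expr> ::= <term> { "+" <term> }` and `<term> ::= digit | "(" <expr> ")"` translate line-for-line into a recursive-descent recognizer:

```python
# Toy BNF grammar (illustrative, not from the ALGOL 60 Report):
#   <expr> ::= <term> { "+" <term> }
#   <term> ::= digit | "(" <expr> ")"
# Each nonterminal becomes one function -- the usual recursive-descent
# reading of a context-free grammar. Each function returns the index
# just past what it consumed.

def parse_expr(s, i=0):
    i = parse_term(s, i)
    while i < len(s) and s[i] == '+':
        i = parse_term(s, i + 1)
    return i

def parse_term(s, i):
    if i < len(s) and s[i].isdigit():
        return i + 1
    if i < len(s) and s[i] == '(':
        i = parse_expr(s, i + 1)
        if i < len(s) and s[i] == ')':
            return i + 1
    raise SyntaxError(f"bad term at position {i}")

def accepts(s):
    try:
        return parse_expr(s) == len(s)
    except SyntaxError:
        return False

print(accepts("1+(2+3)"), accepts("1+"))  # True False
```

This mechanical grammar-to-parser correspondence is a large part of why precise syntax definitions caught on so quickly.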
I must add a story about the power of defining the syntax of a language precisely. Jeff Ullman moved from Princeton to Stanford in 1979. I must thank him, since his senior position was the one that I received in 1980. Jeff was already a prolific writer of textbooks and used an old system from Bell Labs, troff, to write his books. On arrival at Stanford he told me that he wanted to try out the then-new system that Don Knuth had created in 1978: of course, that was TeX. Jeff tried the system out and liked it. But then he asked for the formal syntax description, since he wanted to be sure what the TeX language was. The answer from Knuth was:
There is no formal description. None.
Jeff was shocked. After all, Knuth had done seminal work on context-free grammars and was well versed in formal grammars; for example, Knuth invented the LR parser (scan Left-to-right, producing a Rightmost derivation in reverse). TeX was at the time defined only by what Knuth's program accepted as legal.
Let’s return to my question: Should we still teach the CH today?
It is beautiful work. I especially think the connection between context-free languages and pushdown automata is wonderful, non-obvious, and quite useful. Context-free languages and pushdown automata also led to Steve Cook's beautiful work on two-way deterministic pushdown automata (2DPDA): he showed they can be simulated in linear time on a random-access machine.
This insight was utilized by Knuth to find a linear-time solution for the left-to-right pattern-matching problem, which can easily be expressed as a 2DPDA:
This was the first time in Knuth’s experience that automata theory had taught him how to solve a real programming problem better than he could solve it before.
The work was finally written up and published together with Vaughan Pratt and James Morris several years later.
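That algorithm is now known as Knuth-Morris-Pratt (KMP). Here is a compact sketch, with my own function name and test strings, of the linear-time scan that never moves backward in the text:

```python
def kmp_search(text, pattern):
    """Return all indices where pattern occurs in text,
    in O(len(text) + len(pattern)) time.

    The failure table f[i] holds the length of the longest proper
    border of pattern[:i+1]; on a mismatch it tells us how far the
    match can be salvaged, so the text pointer never backs up --
    the linear time that fell out of Cook's 2DPDA theorem.
    """
    if not pattern:
        return list(range(len(text) + 1))
    # Build the failure (border) table.
    f = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = f[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        f[i] = k
    # Scan the text.
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = f[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = f[k - 1]   # allow overlapping matches
    return hits

print(kmp_search("abababca", "abab"))  # [0, 2]
```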
And of course context-sensitive languages led to the LBA problem. This really was the question of whether nondeterministic space is closed under complement. See our discussion here.
Should I teach the old CH material, or leave it out and teach more modern results? What do you think? Do results have a "teach-by date"?
Non-technical fact-check source |
Dan Brown is the bestselling author of the novel The Da Vinci Code. His most recent bestseller, published in 2013, is Inferno. Like two of his earlier blockbusters it has been made into a movie. It stars Tom Hanks and Felicity Jones and is slated for release on October 28.
Today I want to talk about a curious aspect of the book Inferno, since it raises an interesting mathematical question.
Brown’s books are famous for their themes: cryptography, keys, symbols, codes, and conspiracy theories. The first four of these have a distinctive flavor of our field. Although we avoid the last in our work, it is easy to think of possible conspiracies that involve computational theory. How about these: certain groups already can factor large numbers, certain groups have real quantum computers, certain groups have trapdoors in cryptocurrencies, or …
The book has been out for a while, but I only tried to read it the other day. It was tough to finish, so I jumped to the end where the "secret" was exposed. Brown's works have sold countless copies and yet have been attacked as being poorly written. He must be doing something very right. His prose may not be magical (whose is?), but his plots and the use of his themes usually make for a terrific "cannot put down" book.
Well I put it down. But I must be the exception. If you haven’t read the book and wish to do so without “spoilers” then you can put down this column.
Inferno is about the release of a powerful virus that changes the world. Before I go into the mathematical issues this virus raises, I must point out that Brown's work has often been criticized for making scientific errors and overstepping the bounds of "plausible suspension of disbelief." I think it is a great honor, really, that so many posts and discussions are about mistakes that he has made. Clearly there is huge interest in his books.
Examples of such criticism of Inferno have addressed the DNA science involved, the kind of virus used, the hows of genetic engineering and virus detection, and the population projections, some of which we get into below. There is also an entire book about Brown's novel, Secrets of Inferno.
However, none of these seems to address a simple point that we hadn't found anywhere, until Ken noticed it raised here on the often-helpful Fourmilab site maintained by the popular science writer John Walker. It appears when you click "Show Spoilers" on that page, so again you may stop reading if you don't wish to know.
How does the virus work? The goal of the virus is to stop population explosion.
The book hints that it is airborne, so we may assume that everyone in the world is infected by it, all women in particular. Brown says that 1/3 are made infertile. There are two ways to think about this statement, depending on the exact definition of the mechanism causing infertility.
The first way is that when you get infected by the virus a coin is flipped and with probability 1/3 you are unable to have children. That is, when the virus attacks your original DNA there is a 1/3 chance the altered genes render you infertile. In the 2/3-case that the virus embeds in a way that does not cause infertility, that gets passed on to children and there is no further effect. In the 1/3-case that the alteration causes infertility, that property too gets passed on. Except, that is, for the issue in this famous quote:
Having Children Is Hereditary: If Your Parents Didn’t Have Any, Then You Probably Won’t Either.
Thus the effect “dies out” almost immediately; it would necessarily be just one-shot on the current generation.
The second way is that the virus allows the initial receiver to be fertile but has its effect when (female) children are born. In one third of cases the woman becomes infertile, and otherwise is able to have children when she grows up.
In this case the effect seems to work as claimed in the book. Children all get the virus and it keeps flipping coins forever. Walker still isn't sure; we won't reveal here the words he hides, but you can find them. In any event, the point remains that this would have to be a much more complex virus. And Brown does not explain this point in his book; at least, I am unsure whether he even sees the necessary distinctions.
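A toy generational model makes the difference stark. The code below is purely illustrative; the parameter r (average number of daughters per fertile woman) and the starting population are assumptions of mine, not from the book:

```python
# Toy model contrasting the two readings of "1/3 are made infertile".
# r = average number of daughters per *fertile* woman (an assumed value).
r = 1.5

def mechanism_one(p0, generations):
    """Coin flipped once, at infection: only the current generation
    loses 1/3 of its potential mothers; descendants of the fertile 2/3
    are unaffected, so the effect is one-shot."""
    pop = [p0 * (2 / 3)]            # fertile women in generation 0
    for _ in range(generations):
        pop.append(pop[-1] * r)     # no further culling
    return pop

def mechanism_two(p0, generations):
    """Coin flipped at every birth: each new generation of daughters
    loses 1/3, i.e., the growth factor is cut by 2/3 forever."""
    pop = [p0 * (2 / 3)]
    for _ in range(generations):
        pop.append(pop[-1] * r * (2 / 3))
    return pop

print(mechanism_one(3.0, 5)[-1])  # keeps growing at rate r
print(mechanism_two(3.0, 5)[-1])  # r*(2/3) = 1.0 here: population stays flat
```

Under the first mechanism growth resumes immediately; under the second the factor 2/3 compounds every generation, which is what the book's plot needs.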
The other discussions focus on issues like how society would react to this reduction in fertility. Except for part of one we noted above, however, none seems to address the novel’s mathematical presumptions.
The purpose of the virus is to reduce the growth rate in the world’s population. By how much is not clear in the book. The over-arching issue is that it is hard to find conditions under which the projection of the effect is stable.
For example, suppose we can divide time into discrete units of generations so that the world population of women after $t$ generations follows the exponential growth curve $P_t = P_0 r^t$. Ignoring the natural rate of infertility and male-female imbalance and other factors for simplicity, this envisions women having $r$ female children on average. The intent seems to be to replace this with only $2/3$ of women having $r$ female children each, for $\frac{2}{3}r$ on average in the next generation. This means multiplying $r$ by $\frac{2}{3}$, so

$P_t = P_0 \left(\frac{2r}{3}\right)^t$

becomes the new curve. The problem is that this tends to zero unless $r \ge \frac{3}{2}$, whereas the estimates of $r$ that you can get from tables such as this are uniformly lower at least since 2000.
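To see the instability concretely, here is the modified curve evaluated for a few values of $r$; the horizon of 30 generations is an arbitrary choice of mine:

```python
def projected_population(r, t, p0=1.0):
    """Population after t generations when each generation loses 1/3 of
    its women to infertility: the growth factor r is cut to (2/3)*r."""
    return p0 * ((2 / 3) * r) ** t

# Below the threshold r = 3/2 the curve collapses toward zero;
# above it, exponential growth simply resumes. There is no stable middle.
for r in (1.2, 1.5, 1.8):
    print(r, f"{projected_population(r, 30):.3e}")
```

Only the knife-edge value $r = 3/2$ holds the population steady; any other $r$ sends the curve to zero or back to exponential growth.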
The point is that the blunt “1/3” factor of the virus is thinking only in such simplistic terms about “exponential growth”—yet in the same terms there is no region of stability. Either growth remains exponential or humanity crashes. Maybe the latter possibility is implicit in the dark allusions to Dante Alighieri’s Inferno that permeate the plot.
In reality, as our source points out, it would not take much for humanity to compensate. If a generation is 30 years and we are missing 33% of women, then what’s needed is for just over 3% of the remaining women to change their minds about not having a child in any given year. We don’t want to trivialize the effect of infertility, but there is much more to adaptability than the book’s tenet presumes.
Have you read the book? What do you think about the math?