Edgar Daylight was trained both as a computer scientist and as a historian. He writes a historical blog themed for his near-namesake Edsger Dijkstra, titled “Dijkstra’s Rallying Cry for Generalization.” He is a co-author with Don Knuth of the 2014 book Algorithmic Barriers Falling: P=NP?, which consists of a series of interviews with Knuth, extending their first book from 2013.
Today I wish to talk about this book, focusing on one aspect.
The book is essentially a conversation between Knuth and Daylight that ranges over Knuth’s many contributions and his many insights.
One of the most revealing discussions, in my opinion, is Knuth’s discussion of his view of asymptotic analysis. Let’s turn and look at that next.
We all know what asymptotic analysis is: Given an algorithm, determine how many operations the algorithm uses in the worst case. For example, the naïve matrix product of two square $n \times n$ matrices runs in time $O(n^3)$. Knuth dislikes the use of $O$-notation, which he thinks is often used to hide important information.

For example, the correct count of operations for the naïve matrix product is actually $2n^3 - n^2$: that is, $n^3$ multiplications and $n^3 - n^2$ additions.
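The exact count $2n^3 - n^2$ for the naïve method can be checked by instrumenting the triple loop. A minimal sketch (the function name and counting scheme are mine, not from any source):

```cpp
#include <vector>

// Count the arithmetic operations of the naive n x n matrix product:
// each of the n^2 inner products does n multiplications and n-1 additions,
// for a total of n^2 * (2n - 1) = 2n^3 - n^2 operations.
long long naiveProductOps(int n) {
    long long ops = 0;
    std::vector<std::vector<double>> A(n, std::vector<double>(n, 1.0)), B = A,
        C(n, std::vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double s = A[i][0] * B[0][j];  // one multiplication
            ++ops;
            for (int k = 1; k < n; ++k) {
                s += A[i][k] * B[k][j];    // one multiplication + one addition
                ops += 2;
            }
            C[i][j] = s;
        }
    return ops;
}
```

For $n = 5$ this returns $225 = 2\cdot125 - 25$, matching the formula term for term.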
In general Knuth suggests that we determine, if possible, the number of operations as $$f(n) + O(g(n)),$$ where $f$ and $g$ are both explicit functions and $g$ is lower-order. The idea is that not only does this indicate more precisely that the number of operations is $\Theta(f(n))$, not just $O(f(n))$, but it also forces us to give the exact constant hiding under the $O$. If the constant is only approached as $n$ increases, perhaps the difference can be hidden inside the lower-order term.
An example from the book (page 29) is a discussion of Tony Hoare’s quicksort algorithm. Its running time is $O(n \log n)$, on average. This allows one, as Knuth says, to throw all the details away, including the exact machine model. He goes on to say that he prefers to know:

that quicksort makes about $2n \ln n$ comparisons, on average, along with a proportional number of exchanges and stack adjustments, when sorting $n$ random numbers.
Theorists create algorithms as one of their favorite activities. A classic way to get a paper accepted into a top conference is to say: In this paper we improve the running time of the best known algorithm for X from order $f(n)\log n$ to order $f(n)$ by applying methods Y.
But is the algorithm of this paper really new? One possibility is that the analysis of the previous paper was too coarse and the algorithms are actually the same. Or at least equivalent. The above information is logically insufficient to rule out this possibility.
Asymptotic analysis à la Knuth comes to the rescue. Suppose that we had proved that the older algorithm for X ran in time $$c \cdot f(n)\log n + O(f(n))$$ for an explicit constant $c > 0$, and that the new algorithm provably beats this bound.

Then we would be able to conclude—without any doubt—that the new algorithm was indeed new. Knuth points this out in the interviews, and adds a comment about practice. Of course losing the logarithmic factor may not yield a better running time in practice, if the constant hidden in the new order-$f(n)$ bound is huge. But whatever the constant is, the new algorithm must be new. It must contain some new idea.
This is quite a nice use of analysis of algorithms in my opinion. Knowing that an algorithm contains, for certain, some new idea, may lead to further insights. It may eventually even lead to an algorithm that is better both in theory and in practice.
Daylight’s book is a delight—a pun? As always Knuth has lots to say, and lots of interesting insights. The one caveat about the book is the subtitle: “P=NP?” I wish Knuth had added more comments about this great problem. He does comment on the early history of the problem: for example, explaining how Dick Karp came down to Stanford to talk about his brilliant new paper, and other comments have been preserved in a “Twenty Questions” session from last May. Knuth also reminds us in the book that as reported in the January 1973 issue of SIGACT News, Manny Blum gave odds of 100:1 in a bet with Mike Paterson that P and NP are not equal.
[fixed picture glitch at top]
Neil L. is a Leprechaun. He has visited me every St. Patrick’s Day since I began the blog in 2009. In fact he visited me every St. Patrick’s Day before then, but I never talked about him. Sometimes he comes after midnight the night before, or falls asleep on my sofa waiting for me to rise. But this time there was no sign of him as I came back from a long day of teaching and meetings and went out again for errands.
Today Ken and I wish you all a Happy St. Patrick’s Day, and I am glad to report that Neil did find me.
When I came back I was sorting papers and didn’t see him. I didn’t know he was there until I heard,
Top o’ the evening to ye.
Neil continued as he puffed out some green smoke: “I had some trouble finding you this year. Finally got where you were—good friends at your mobile provider helped me out.” I was surprised, and told him he must be kidding. He answered, “Of course I always can find you, just having some fun wi’ ye.” Yes I agreed and added that I was staying elsewhere. He puffed again and said “yes I understand.”
I said I had a challenge for him, a tough challenge, and asked if he was up for it. He said, “Hmmm, I do not owe you any wishes, but a challenge… Yes I will accept a challenge from ye, any challenge that ye can dream up.” He laughed, and added, “we leprechauns have not lost a challenge to a man for centuries. I did have a cousin once who messed up.”
I asked if he would share his cousin’s story, and he nodded yes. “‘Tis a sad story. My cousin was made a fool of once, a terrible black mark on our family. Why, we were restricted from any St Patrick Day fun for a hundred years. Too long a punishment in our opinion—the usual is only a few decades. Do ye want to know what my cousin did? Or just move on to the challenge? My time is valuable.”
I nodded sympathetically, so he carried on.
“One fine October day in Dublin me cousin was sitting under a bridge—under the lower arch where a canalside path went.
“He spied a gent walking with his wife along the path but lost in thought and completely ignoring her. He thought the chap would be a great mark for a trick but forgot the woman. She spied him and locked on him with laser eyes and of course he was caught—he could not run unless she looked away.
“He tried to ply her with a gold coin but she knew her leprechaun lore and was ruthless. He resigned himself to granting wishes but she would not have that either. With her stare still fixed she took off her right glove, plucked a shamrock, and laid both at his feet for a challenge. A woman had never thrown a challenge before, and there was no provision in the lore for return-challenging a woman. So my cousin had to accept her challenge. It came with intense eyes:
“I challenge you to tell the answer to what is vexing and estranging my husband.”
“Aye,” Neil sighed, “you or I or any lad in the face of such female determination would be reduced to gibberish, and that is what me cousin blurted out: $$i^2 = j^2 = k^2 = ijk = -1.$$
“The gent looked up like the scales had fallen from his eyes, and he embraced his wife. This broke the stare, and my cousin vanished in great relief. And did the gent show his gratitude? Nay—he even carved that line on the bridge but gave no credit to my cousin.”
I clucked in sympathy, and Neil seemed to like that. He put down his pipe and gave me a look that seemed to return comradeship. Then I understood who the “cousin” was. Not waiting to register my understanding, he invited my challenge as a peer.
I had in fact prepared my challenge last night—it was programmed by a student in my advanced graduate course using a big-integer package. Burned onto a DVD was a Blum integer of one trillion bits. I pulled it out of its sleeve and challenged Neil to factor it. The shiny side flashed a rainbow, and I joked there could really be a pot of gold at the end of it.
Neil took one puff and pushed the DVD—I couldn’t tell how—into my MacBook Air. The screen flashed green and before I could say “Jack Robinson” my FileZilla window opened. Neil blew mirthful puffs as the progress bar crawled across. A few minutes later came e-mail back from my student, “Yes.”
I exclaimed, “Ha—you did it—but the point isn’t that you did it. The point is, it’s doable. You proved that factoring is easy. Could be quantum or classical but whatever—it’s practical.”
Neil puffed and laughed as he handed me back the suddenly-reappeared disk and said, “Aye, do ye really think I would let your lot fool me twice?”
I replied, “Fool what? You did it—that proves it.”
“Nay,” he said, “indeed I did it—I cannot lie—but ye can’t know how I did it enough to tell whether a non-leprechaun can do it. And a computer that ye build—be it quantum or classical or whatever—is a non-leprechaun.”
It hit me that a quantum computer that cannot be built is a leprechaun, and perhaps Peter Shor’s factoring algorithm only runs on those. But I wasn’t going to be distracted away from my victory.
“How can it matter whether a leprechaun does it?” Neil retorted that he didn’t have to answer a further challenge, “it’s not like having three wishes, you know.” But he continued, “since ye are a friend, I will tell ye three ways it could be, and you can choose one ye like but know ye: it could still be a fourth way.
“And I left ye a factor, but your student already had it, so I left ye no net knowledge at all.” And with a puff of smoke, he was gone.
Did I learn anything from the one-time factoring of my number? Happy St. Patrick’s Day anyway.
[moved part of dialogue at end from 2. to 1.]
Larry Shaw apparently created the concept of Pi Day in 1988. He was then a physicist who worked at the San Francisco Exploratorium. He and his colleagues initially celebrated by marching around in circles, and then eating pies—that is, fruit pies. As Homer Simpson would say: hmm.
Today Ken and I want to add to some of the fun of Pi Day, and come back to a different Pi that has occupied us.
See here and here for many of the more exotic celebrations that revolve around Pi Day. Of course Pi Day, or $\pi$ day, is based on the famous number $$\pi = 3.14159\,26535\,8979\ldots$$
The extra excitement is that this year the date is 3/14/15. This uses the American “month first” convention. The international “date first” convention would go 31.4.15 but April does not have 31 days. The “year first” convention will have to wait another 1,126 years to hit 3141-5-9, and even so they pad with 0’s which makes it impossible. So month-first it has to be. We can get more digits by raising a toast at exactly 9:26:53 in the morning or evening.
Recall $\pi$ can be defined in many ways. One is that it is the reciprocal of the number $$\frac{2\sqrt{2}}{9801}\sum_{k=0}^{\infty}\frac{(4k)!\,(1103+26390k)}{(k!)^4\,396^{4k}}.$$

This series is due to Srinivasa Ramanujan.
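The series converges astonishingly fast—each term adds roughly eight correct digits. A minimal numerical sketch (the function name and incremental update are mine): the factor $(4k)!/((k!)^4\,396^{4k})$ is maintained incrementally so nothing overflows a double.

```cpp
#include <cmath>

// Estimate pi as the reciprocal of Ramanujan's series, summing `terms` terms.
// f holds (4k)! / ((k!)^4 * 396^{4k}), updated from k to k+1 each pass.
double ramanujanPi(int terms) {
    double sum = 0.0, f = 1.0;
    for (int k = 0; k < terms; ++k) {
        sum += f * (1103.0 + 26390.0 * k);
        double kk = k + 1.0;
        f *= (4*k + 1.0) * (4*k + 2.0) * (4*k + 3.0) * (4*k + 4.0)
             / (kk * kk * kk * kk)
             / (396.0 * 396.0 * 396.0 * 396.0);
    }
    return 1.0 / (std::sqrt(8.0) / 9801.0 * sum);  // sqrt(8) = 2*sqrt(2)
}
```

The single term $k=0$ already gives $9801/(2\sqrt{2}\cdot 1103) = 3.14159273\ldots$, good to about seven digits; two terms exhaust double precision.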
The classic definition is based on the fact that the circumference of a circle is a constant multiple of its radius, and half of that constant is $\pi$. This must be proved and is not trivial. It was known to Euclid—see here for a detailed discussion.
The symbol $\Pi$ is like the symbol $\prod$. One is a capital Greek letter and one is a product symbol: can you tell which is which? Perhaps you can, but they are pretty close, and their derivation is the same. So let’s talk about the product symbol during this wonderful Pi Day.
The product is quite basic to mathematics and theory too. We note that products of the form $$\prod_{k=1}^{n} a_k$$ can be used to define various important functions. But the simplest such function is already an important open problem. That’s right: simply multiplying two integers, $x \cdot y$.
What is the complexity of integer product? There are multiple views on multiplication, going back to when it was first formulated as a problem.
In 1960 the great Andrey Kolmogorov conjectured that integer multiplication required quadratic time. That is, the simple high-school method was optimal. This conjecture was plausible, especially back then when the field of computer theory was just beginning. So he organized a seminar at Moscow State University to study his conjecture with the hope of proving it. Within a week one of the students, Anatoly Karatsuba, found a clever algorithm that ran in time order-of $$n^{\log_2 3} \approx n^{1.585}$$ for $n$-bit number multiplication.
The idea is to let $m = \lceil n/2 \rceil$ and write $x = a\,2^m + b$ and $y = c\,2^m + d$. Then we have $$xy = ac\,2^{2m} + (ad + bc)\,2^m + bd,$$ where the multiplications by $2^{2m}$ and $2^m$ are just bit-shifts to known locations independent of the values of $x$ and $y$, so they don’t affect the time much. It is neat that $ac$ and $bd$ need just one recursive multiplication of $m$-bit numbers each, but the two further multiplications for $ad + bc$ would remove all the advantage and still give $\Theta(n^2)$ time.
What Karatsuba noted instead was that $$(a+b)(c+d) - ac - bd = ad + bc.$$
This needs just one more multiplication. End of conjecture. Boom: the conjectured lower bound was wrong. Kolmogorov explained the beautiful result at the next meeting, and then terminated the seminar.
Pretty cool to terminate a seminar—no?
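One level of Karatsuba’s trick can be sketched concretely on 64-bit words split into 32-bit halves, using three half-size multiplications instead of four. This is an illustrative sketch (a real implementation recurses on big integers), and it relies on the `__uint128_t` type, a GCC/Clang extension:

```cpp
#include <cstdint>

// One level of Karatsuba: x = a*2^32 + b, y = c*2^32 + d.
// Only three 32x32 multiplications are used: ac, bd, and (a+b)(c+d).
__uint128_t karatsuba64(uint64_t x, uint64_t y) {
    uint64_t a = x >> 32, b = x & 0xffffffffu;
    uint64_t c = y >> 32, d = y & 0xffffffffu;
    __uint128_t ac = (__uint128_t)a * c;
    __uint128_t bd = (__uint128_t)b * d;
    // Karatsuba's identity: (a+b)(c+d) - ac - bd = ad + bc
    __uint128_t cross = (__uint128_t)(a + b) * (c + d) - ac - bd;
    return (ac << 64) + (cross << 32) + bd;
}
```

Recursing this split on $n$-bit numbers gives the $n^{\log_2 3}$ running time, since each level does three half-size multiplications plus linear-time additions and shifts.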
SODA is the premier conference on algorithms; it has the wonderful properties of being held in January and usually at a warm locale.
The SODA view is to optimize the algorithm. The simplest meaning of this is to optimize the asymptotic running time. Andrei Toom, 5 years younger than Karatsuba, took up the task and in a 1963 paper gave a hierarchy of improved exponents approaching but not reaching 1. The next step, called “Toom-3,” breaks the numbers into 3 equal pieces not 2, and by reducing 9 recursive multiplications to 5, achieves running time order-of $$n^{\log 5/\log 3} \approx n^{1.465}.$$
Steve Cook streamlined the description and the analysis in his doctoral thesis in 1966 and often gets joint credit.
If you break into $k$ pieces the running time is proportional to $c(k)\,n^{\log(2k-1)/\log k}$, whose exponent behaves roughly like $1 + \frac{1}{\log_2 k}$ and approaches $1$. For any fixed $k$ the term $c(k)$ is a constant, and one could be tempted to ignore it. However, as Wikipedia’s article avers,

[T]he function $c$ unfortunately grows very rapidly.
One can try to improve the algorithm by making $k$ depend on $n$ in the recursion, but as the article notes in the next sentence, this is an open research problem.
The $n^{1+\epsilon}$ barrier was broken by Arnold Schönhage and Volker Strassen in 1971. This achieved a running time of the form $$O(n \log n \cdot \log\log n).$$

The constant in the $O$ is high but not astronomical, and their algorithm has been programmed to give superior performance to Cook-Toom on numbers of over 30,000 bits.
The latest major part of the history is that Martin Fürer in 2007 improved the $\log\log n$ factor to $2^{O(\log^* n)}$. This looks like more than $\log\log n$ but becomes less as $n$ grows. Alas the constant in the $O(\log^* n)$ is so high that the algorithm is galactic. Work last year by David Harvey, Joris van der Hoeven, and Grégoire Lecerf improving the constant in the exponent has not changed that.
Fürer’s paper appeared in STOC not SODA, but I have a different picture in my mind when I think of STOC.
STOC is, of course, one of the top theory conferences, which is held in late spring or early summer and where it is held follows a complex secret formula. I have no idea what that formula is, but according to Wikipedia:
STOC is traditionally held in a different location each year.
Well not exactly: what does “different” mean here? STOC has repeated sites, just never at the same site two years in a row. Ken attended STOC in Montreal twice in fairly close succession: 1994 and 2002.
Okay let’s look at multiplication from a STOC view, that is from a more foundational view. We have already discussed the relationship between the cost of multiplication on Turing Machines and the famous conjecture of Juris Hartmanis that for any algebraic irrational number $\alpha$, the first $n$ digits of $\alpha$ cannot be computed on a multitape Turing machine in $O(n)$ time—and might even require close to quadratic time.
Part of the significance of this question is that it impacts the separation between deterministic and nondeterministic linear time, given that integer multiplication is almost linear time. We know these classes are different, but only by a hair—less than a $\log^* n$ factor even. The idea is that if they are really that close, then a common alternation method used to check integer multiplication can be simulated deterministically.
The method using alternation is to guess $z = x \cdot y$ and check it by selecting a prime $p$ of about $\log n$ bits. Then we must check that $z \equiv xy \pmod{p}$. This is easy if one can compute an $n$-bit number mod $p$ fast. This can be done by a block structure trick and using alternation again to only have to check that one block is okay. We haven’t worked this all out, but clearly integer multiplication is involved in a fundamental way in complexity class relations. Another use of similar structure ideas is in this paper on checking integer multiplication by Dima Grigoriev and Gérald Tenenbaum.
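The arithmetic core of the modular spot-check is tiny. A minimal sketch for word-sized values (names mine; the real method applies this to $n$-bit numbers via the block trick, and checks several random primes to bound the error):

```cpp
#include <cstdint>

// Spot-check a claimed product z = x*y modulo a small prime p:
// if z really equals x*y then (x mod p)(y mod p) mod p must equal z mod p.
// A wrong z slips past a random log n-bit prime only rarely, since
// z - xy has few prime factors of that size.
bool checkModP(uint64_t x, uint64_t y, uint64_t z, uint64_t p) {
    return ((x % p) * (y % p)) % p == z % p;
}
```

Because the intermediate product here must fit in 64 bits, this sketch assumes $p < 2^{32}$.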
Ken has found a different way this week to escape to a warm locale, by workshop rather than by conference. There is in fact a new branch of algebra and geometry named for the tropics, in which the binary minimum (or maximum) function plays the role of addition and addition replaces multiplication. Then the integer product problem goes away, perhaps.
Ken has a different idea, however. Perhaps we can avoid doing multiplication altogether. We don’t even need to use integers. Let $D$ stand for any “decoding” function from strings to integers, which need not be 1-to-1: an integer might have many string codes. The idea is to ask:

Are there linear-time computable binary functions $a$, $m$, and $s$ such that for some decoding function $D$—of any complexity or maybe not even computable—and all strings $x,y$: $$D(a(x,y)) = D(x) + D(y),\quad D(m(x,y)) = D(x)\cdot D(y),\quad D(s(x,y)) = D(x) - D(y)?$$

The last clause allows us to simulate a comparison function, so that the strings emulate a discretely ordered ring whose operations are all in linear time. Is there such a fish? We don’t care so much about the complexity of $D$ since it would only need to be applied once after a plethora of efficient $a$ and $m$ operations, and in view of the comparisons might not need to be computed at all.
What is the complexity of $\Pi$? The operation, not the number.
[fixed radius/circumference in intro]
Stephen Johnson is one of the world’s top programmers. Top programmers are inherently lazy: they prefer to build tools rather than write code. This led Steve to create some of the great software tools that made UNIX so powerful, especially in the “early days.” These included the parser generator named Yacc, for “Yet Another Compiler Compiler.”
Today I (Dick) want to talk about another of his tools, called lint. Not an acronym, it really means lint.
Steve was also famous for this saying about an operating system environment for IBM mainframes named TSO which some of us were unlucky enough to need to use:
Using TSO is like kicking a dead whale down the beach.
Hector Garcia-Molina told me a story about using TSO at Princeton years before I arrived there. One day he wrote a program that was submitted to the mainframe. While Hector was waiting for it to run he noticed that it contained a loop that would never stop, and worse the loop had a print statement in it. So the program would run forever and print out junk forever. Yet Hector, because of the nature of TSO, could not kill the program. Hector went to the system people to ask them to kill his program. They answered that they could not kill it until it started to run. Even better: the program would not run until that evening—do not ask why. So they could not kill it. But the evening crew could once it started. So they left a handwritten note to kill Hector’s program later that night. A whale indeed.
Steve’s lint program took your C program, examined it, and flagged code that looked suspicious. The brilliant insight was that lint had no idea what you were really doing, but could say some constructs were likely to be bugs. These were flagged and often lint was right. A beautiful idea.
For example, consider the following simple C fragment:
while (x = y)
{
...
}
This is legal C code. But, it is most likely an error. The programmer probably meant to write:
while (x == y)
{
...
}
Recall that in C the test for equality is x == y while x = y is the assignment of y to x. The assignment could be intentional inside a condition, yet it is likely a mistake. These are exactly the types of simple things that lint could flag.
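The essence of such a check can be sketched in a few lines. This toy version (entirely mine, not how lint actually works) just pattern-matches on the source text, so it has false positives and negatives that a real parser-based tool avoids:

```cpp
#include <regex>
#include <string>
#include <vector>

// Toy lint-style check: flag a bare assignment inside a while-condition.
// The pattern rejects '=' that is part of ==, !=, <=, or >=.
std::vector<std::string> toyLint(const std::string& src) {
    std::vector<std::string> warnings;
    std::regex suspicious(R"(while\s*\([^=!<>]*=[^=][^)]*\))");
    if (std::regex_search(src, suspicious))
        warnings.push_back("suspicious assignment in while-condition");
    return warnings;
}
```

Feeding it `while (x = y)` produces a warning, while `while (x == y)` passes silently—the same judgment call lint makes, minus the parsing.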
The lint program has changed over the years and now there are more powerful tools that can flag suspicious usage in software written in many computer languages. It was originally developed by Steve in 1977 and described in a paper “Lint, a C program checker” (Computer Science Technical Report 65, Bell Laboratories, 1978).
I believe that we could build a lint for math that would do what Steve’s lint did for C code: flag suspicious constructs. Perhaps this already exists—please let me know if it does. But assuming it does not, I think even a tool that could catch very simple mistakes could be quite useful.
There is lots of research on mechanical proof systems. There is lots of interest in proving important theorems in formal languages so they can be checked. See this and this for some examples. Yet the vast majority of math is only checked by people. I think this is fine, even essential, but a lint program that at least caught simple errors would be of great use.
Let me give three types of constructs that it could catch. I assume that our lint would take in a LaTeX file and output warnings.
Unused variables. Consider $$\sum_{i=1}^{n} a_j.$$

The lint program would notice that the variable $i$ is never used. Almost surely the intent was to write $$\sum_{i=1}^{n} a_i.$$

Again note: this is not a certainty, since the former is a legal math expression.
Unbound variables. Consider $$x^2 \geq x.$$

If there is nothing before to constrain $x$, this is at best poor writing. Does $x$ range over all reals, all integers, or just all natural numbers? Again a construct that should be flagged.
Under-constrained variables. Consider the statement,

For some $\epsilon$ it follows that $f(x) \geq \epsilon$.

The statement may be technically true when $\epsilon = 0$, but for purposes of clear communication it needs a qualifier that $\epsilon > 0$. Perhaps the writer wrote that $\epsilon$ stands for a positive real number some pages earlier—we would not expect lint to pick that up. But we could reasonably ask lint to check for a mention of “$\epsilon$” in a previous formula and/or paragraph.
The TextLint applet page hosted by Lukas Renggli with Fabrizio Perin and Jorge Ressia does not flag the unused-variable condition, and evidently does not try to handle the other two situations. It also fails to catch 2^16, which will give $2^16$ (that is, $2^1$ followed by a 6), not the undoubtedly-intended $2^{16}$. This is more a LaTeX syntax issue than the kind of math-semantics error we are gunning for; the programs mentioned here also seem limited to this level.
Does a lint program like this—for general mathematical writing not just LaTeX code—already exist? If not, should we build one?
[added “environment” qualifier to TSO]
Eric Allender, Bireswar Das, Cody Murray, and Ryan Williams have proved new results about problems in the range between $\mathsf{P}$ and $\mathsf{NP}$-complete. According to the wide majority view of complexity the range is vast, but it is populated by scant few natural computational problems. Only Factoring, Discrete Logarithm, Graph Isomorphism (GI), and the Minimum Circuit Size Problem (MCSP) regularly get prominent mention. There are related problems like group isomorphism and others in subjects such as lattice-based cryptosystems. We covered many of them some years back.
Today we are delighted to report recent progress on these problems.
MCSP is the problem: given a string $x$ of length $n = 2^k$ and a number $s$, is there a Boolean circuit $C$ with $s$ or fewer wires such that $$C(i) = x_i \text{ for each } i \in \{0,1\}^k?$$

For $x$ of other lengths $n$, $2^{k-1} < n < 2^k$, we catenate the values of $C(i)$ for the first $n$ strings $i$ in $\{0,1\}^k$ under the standard order. Since every $k$-ary Boolean function has circuits of size $O(2^k/k)$, which are encodable in $O(2^k)$ bits, MCSP belongs to $\mathsf{NP}$ with linear witness size.
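The truth-table encoding is worth seeing concretely. A minimal sketch (names mine): bit $i$ of the MCSP input string is the function’s value on the $k$-bit string $i$ in standard order.

```cpp
#include <string>

// Build the 2^k-bit truth-table string of a Boolean function f on k inputs:
// character i of the result is f(i), inputs taken in standard numeric order.
std::string truthTable(int k, bool (*f)(unsigned)) {
    std::string x;
    for (unsigned i = 0; i < (1u << k); ++i)
        x += f(i) ? '1' : '0';
    return x;
}

// Example function: parity of three input bits (XOR of bits 0, 1, 2).
bool parity3(unsigned i) {
    return ((i ^ (i >> 1) ^ (i >> 2)) & 1u) != 0;
}
```

For $k = 3$ and parity this yields the 8-bit string 01101001; the MCSP question is then whether that string is the table of some circuit with at most $s$ wires.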
Several Soviet mathematicians studied MCSP in the late 1950s and 1960s. Leonid Levin is said to have desired to prove it $\mathsf{NP}$-complete before publishing his work on $\mathsf{NP}$-completeness. MCSP seemed to stand aloof until Valentine Kabanets and Jin-Yi Cai connected it to Factoring and Discrete Log via the “Natural Proofs” theory of Alexander Razborov and Steven Rudich. Eric together with Harry Buhrman, Michal Koucký, Dieter van Melkebeek, and Detlef Ronneburger improved their results in a 2006 paper to read:

Theorem 1 Discrete Log is in $\mathsf{BPP}^{\mathsf{MCSP}}$ and Factoring is in $\mathsf{ZPP}^{\mathsf{MCSP}}$.
Now Eric and Bireswar complete the triad of relations to the other intermediate problems:

Theorem 2 Graph Isomorphism is in $\mathsf{RP}^{\mathsf{MCSP}}$. Moreover, every promise problem in $\mathsf{SZK}$ belongs to $\mathsf{BPP}^{\mathsf{MCSP}}$ as defined for promise problems.
Cody and Ryan show on the other hand that proving $\mathsf{NP}$-hardness of MCSP under various reductions would entail proving breakthrough lower bounds:

Theorem 3

- If MCSP is $\mathsf{NP}$-hard under polynomial-time many-one reductions, then $\mathsf{EXP} \neq \mathsf{NP} \cap \mathsf{P/poly}$, so $\mathsf{EXP} \neq \mathsf{ZPP}$.
- If MCSP is $\mathsf{NP}$-hard under logspace many-one reductions, then $\mathsf{PSPACE} \neq \mathsf{ZPP}$.
- If MCSP is $\mathsf{NP}$-hard under logtime-uniform $\mathsf{AC}^0$ reductions, then $\mathsf{NP} \not\subseteq \mathsf{P/poly}$ (so $\mathsf{P} \neq \mathsf{NP}$), and also $\mathsf{E}$ has circuit lower bounds high enough to de-randomize $\mathsf{BPP}$.
- In any many-one reduction from PARITY (let alone SAT) to MCSP, no random-access machine can compute any desired bit of the output in $n^{1/2-\epsilon}$ time.
The last result is significant because it is unconditional, and because most familiar $\mathsf{NP}$-completeness reductions are local in the sense that one can compute any desired bit of the output $f(x)$ in only $\mathrm{polylog}(n)$ time (with random access to $x$).
The genius of MCSP is that it connects two levels of scaling—input lengths $n = 2^k$ and $k$—in the briefest way. The circuits can have exponential size from the standpoint of $k$. This interplay of scaling is basic to the theory of pseudorandom generators, in terms of conditions under which they can stretch a seed of $k$ bits into $2^k$ bits, and to generators of pseudorandom functions $f_s : \{0,1\}^k \to \{0,1\}$.
An issue articulated especially by Cody and Ryan is that reductions to MCSP carry seeds of being self-defeating. The ones we know best how to design involve “gadgets” whose size scales as $O(\log n)$ not $n^{\Theta(1)}$. For instance, in a reduction from SAT we tend to design gadgets for individual clauses in the given 3CNF formula $\phi$—each of which has constant-many variables and $O(\log n)$ encoded size. But if the reduction involves only $O(\log n)$-sized gadgets and the connections between gadgets need only $O(\log n)$-time lookup, then when the reduction outputs an instance $(y,s)$, the string $y$ will be the truth table of a $\mathrm{polylog}$-sized circuit. This means that either $s$ is above $\mathrm{polylog}$ and the output is automatically a yes-instance, or $s$ is so small that the instance can be decided outright.
The two horns of this dilemma leave little room to make a non-trivial reduction to MCSP. Log-space and $\mathsf{AC}^0$ reductions are (to different degrees) unable to avoid the problem. The kind of reduction that could avoid it might involve, say, super-constantly many clauses per gadget in an indivisible manner. But doing this would seem to require obtaining substantial non-local knowledge about $\phi$ in the first place.
Stronger still, if the reduction is from a polynomially sparse language in place of SAT, then even this last option becomes unavailable. Certain relations among exponential-time classes imply the existence of hard sparse sets in $\mathsf{NP}$. The hypothesis that MCSP is hard for these sets impacts these relations, for instance yielding the conclusion $\mathsf{EXP} \neq \mathsf{ZPP}$.
A paradox that at first sight seems stranger emerges when the circuits are allowed oracle gates. Such gates may have any arity and output 1 if and only if the string formed by the inputs belongs to the associated oracle set $A$. For any $A$ we can define $\mathsf{MCSP}^A$ to be the minimum size problem for such circuits relative to $A$. It might seem axiomatic that when $A$ is a powerful oracle such as $\mathsf{QBF}$, complete for $\mathsf{PSPACE}$, then $\mathsf{MCSP}^A$ should likewise be $\mathsf{PSPACE}$-complete. However, giving such an oracle makes it easier to have small circuits for meaningful problems. This compresses the above dilemma even more. In a companion paper by Eric with Kabanets and Dhiraj Holden they show that $\mathsf{MCSP}^{\mathsf{QBF}}$ is not complete for $\mathsf{PSPACE}$ under logspace reductions; even hardness under weak reductions would imply breakthrough separations.
Nevertheless, when it comes to various flavors of bounded-error randomized Turing reductions, MCSP packs enough hardness to solve Factoring and Discrete Log and GI. We say some more about how this works.
What MCSP does well is efficiently distinguish strings having $s$-sized circuits from the vast majority having no $s$-sized circuits, where say $s = \sqrt{n}$. The dense latter set is a good distinguisher between pseudorandom and uniform distributions on strings. Since one-way functions suffice to construct pseudorandom generators, MCSP turns into an oracle for inverting functions to an extent codified in Eric’s 2006 joint paper:
Theorem 4 Let $R$ be a dense language of strings having no $\sqrt{n}$-sized circuits, and let $f$ be computable in polynomial time with $|f(x)|$ and $|x|$ of polynomially-related lengths. Then we can find a polynomial-time probabilistic oracle TM $M$ and $c$ such that for all $n$, $$\Pr_{x,r}\left[f(M^R(f(x), r)) = f(x)\right] \geq \frac{1}{n^c}.$$

Here $x$ is selected uniformly from $\{0,1\}^n$ and $r$ is uniform over the random bits of the machine. We have restricted $R$ and $f$ more than their result requires for ease of discussion.
To attack GI we set things up so that the two arguments of $f$ represent a graph and a permutation of its vertices, respectively. More precisely “$G$” means a particular adjacency matrix, and we define $f(G,\pi) = \pi(G)$ to mean the adjacency matrix obtained by permuting $G$ according to $\pi$. By Theorem 4, using the MCSP oracle to supply $R$, one obtains $M$ and $c$ such that for all $n$ and $n$-vertex graphs $G$, $$\Pr_{\pi,r}\left[f(M^R(\pi(G), r)) = \pi(G)\right] \geq \frac{1}{n^c}.$$

Since any successful output must be a pair $(G',\pi')$ with $\pi'(G') = \pi(G)$, we can simplify this while tying “$H$” symbolically to $\pi(G)$: $$\Pr_{\pi,r}\left[M^R(H, r) \text{ outputs some } (G',\pi') \text{ with } \pi'(G') = H\right] \geq \frac{1}{n^c}. \qquad (1)$$
Now given an instance $(G_1, G_2)$ of GI via adjacency matrices, do the following for some constant times $n^c$ independent trials: pick a random permutation $\pi$, set $H = \pi(G_2)$, run $M^R(H,r)$, and accept if the output is a pair $(G_1, \pi')$ with $\pi'(G_1) = H$, for then $\pi^{-1}\circ\pi'$ is an isomorphism from $G_1$ to $G_2$.

This algorithm has one-sided error since it will never accept if $G_1$ and $G_2$ are not isomorphic. If they are isomorphic, then $H$ arises as $\pi(G_1)$ with the same distribution over permutations that it arises as $\pi(G_2)$, so Equation (1) applies equally well with $G_1$ in place of $G_2$. Hence $M^R$ finds a correct $\pi'$ with probability at least $\frac{1}{n^c}$ on each trial, yielding the theorem that GI is in $\mathsf{RP}^{\mathsf{MCSP}}$.
The proof for $\mathsf{SZK}$ is more detailed but similar in using the above idea. There are many further results in the paper by Cody and Ryan and in the oracle-circuit paper.
These papers also leave a lot of open problems. Perhaps more importantly, they attest that these open problems are attackable. Can any kind of many-one reducibility stricter than polynomial time reduce every language in $\mathsf{NP}$ to MCSP? Can the randomized Turing reductions for GI and the others be further weakened or de-randomized? The most interesting holistic aspect is that we know new lower bounds follow if MCSP is easy, and now we know that new lower bounds follow if MCSP is hard. If we assume that MCSP stays intermediate, can we prove lower bounds that combine with the others to yield some non-trivial unconditional result?
[added paper links]
Jeff Skiles was the co-pilot on US Airways Flight 1549 from New York’s LaGuardia Airport headed for Charlotte on January 15, 2009. The Airbus A320 lost power in both engines after striking birds at an altitude of about 850 meters and famously ditched in the Hudson River with no loss of life. As Skiles’s website relates, he had manual charge of the takeoff, but upon losing his instrument panel when the engines failed,
“Captain Chesley Sullenberger took over flying the plane and tipped the nose down to retain airspeed.”
Skiles helped contact nearby airports for emergency landing permission but within 60 seconds Sullenberger and he determined that the Hudson was the only option. His front page does not say he did anything else.
Today we tell some stories about the technical content of forms of emptiness.
I am teaching Buffalo’s undergraduate theory of computation course again. I like to emphasize early on that an alphabet need not be only a set of letter or digit symbols, even though it will be $\{0,1\}$ or $\{a,b\}$ or similar in nearly all instances. The textbook by Mike Sipser helps by having examples where tokens like “REAR” or “<RESET>” denoting actions are treated as single symbols. An alphabet can be the set of atomic actions from an aircraft or video game console. Some controls such as joysticks may be analog, but their output can be transmitted digitally. What’s important is that any sequence of actions is represented by a string over an appropriately chosen alphabet.
I go on to say that strings over any alphabet can be re-coded over $\{0,1\}$. Or over ASCII or UTF-8 or UNICODE, but those in turn are encoded in 8-bit or 16-bit binary anyway. I say all this justifies flexible thinking in that we can regard $\{0,1\}$ as “the” alphabet for theory but can speak in terms of a generic char type for practice. Then in terms of the C++ standard library I write alphabet = set<char>, string = list<char>, language = set<string>, and class = set<language>. I go on to say how “$M = (Q, \Sigma, \delta, s, F)$” abbreviates object-oriented class notation in which set<State> Q; and alphabet Sigma; and State start; and set<State> finalStates; are fields and delta can be variously a map or a function pointer or a set of tuples regarded as instructions.
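The classroom notation can be collected into a working sketch. This is illustrative rather than a library design, with delta realized as a map and the field names following the lecture:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// A DFA M = (Q, Sigma, delta, start, F) in the classroom's C++ dress.
using State = int;
using alphabet = std::set<char>;

struct DFA {
    std::set<State> Q;
    alphabet Sigma;
    std::map<std::pair<State, char>, State> delta;  // transitions as instructions
    State start;
    std::set<State> finalStates;

    // Run M on w and report acceptance.
    bool accepts(const std::string& w) const {
        State q = start;
        for (char ch : w) q = delta.at({q, ch});
        return finalStates.count(q) > 0;
    }
};
```

For instance, the two-state machine over $\{0,1\}$ that accepts strings with an even number of 1s is four map entries plus `finalStates = {0}`.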
In the course I’m glad to go into examples of DFAs and NFAs and regular expressions right away, but reaching the last is high time to say more on formal language theory. I’ve earlier connected union and intersection to Boolean or and and, but concatenation of languages needs explaining as a kind of “and then.” One point needing reinforcement is that the concatenation of a language $A$ with itself, written $A \cdot A$ or $A^2$, equals $\{xy : x \in A,\ y \in A\}$, not $\{xx : x \in A\}$. The most confusion and error I see, however, arises from the empty language $\emptyset$ versus the empty string $\epsilon$ (or $\lambda$ in other sources).
I explain the analogy between multiplication and concatenation although the latter is not commutative, and that the operation between languages naturally “lifts” the definition of concatenation for strings. I then say that $\emptyset$ behaves like $0$ and $\{\epsilon\}$ behaves like $1$ under this analogy, but I don’t know how well that catches on with the broad range of students. So not always but a few times when lecture prep and execution has left 6–10 minutes in the period, I wrap a story into an example:
Let $\Sigma$ denote the alphabet of controls on a typical twin-engine Cessna business jet. I will define two languages $A$ and $B$ over this alphabet—you tell me what they are:
After carefully writing this on board or slide, I say, “you have enough information to answer this; it could be a pop quiz.” I let 15–20 seconds go by to see if someone raises a hand amid bewildered looks in silence, and then I say, “OK—I’ll tell a real-life story.”
My father Robert Regan was a financial reporter specializing in aluminum and magnesium. Once in the 1970s he covered a meeting of aluminum company executives in North Carolina. One of the executives failed to show for the first evening, and the news of why was not conveyed until he appeared in splint and bandages at breakfast the next morning.
He told how his twin-engine crew-of-two jet had lost all power immediately after takeoff. With no time or room for turning back, the pilot spotted the flat roof of a nearby bowling alley and steered for it as best he could. The jet pancaked on the roof and bounced into the mercifully empty parking lot. Everyone survived and could thank the way the force of impact had been broken into two lesser jolts. The end of the executive’s tale and interaction with my father in the breakfast-room audience went about as follows:
Exec: I have never seen such a great piece of quick thinking and calm control in my lifetime of business, to say nothing of the sheer flying skill. That pilot ought to get a medal.
RR: The co-pilot deserves a medal too.
Exec: Why? He didn’t do anything.
RR: Exactly.
Maybe only then the significance of the words “appropriate for the co-pilot to initiate” in my definitions of $A$ and $B$ dawns on the class, as well as the Boolean and. The appropriate string is $\epsilon$ in both cases: the co-pilot should not “initiate” any actions.
As witnessed by the stories above, in the case of $A$ there is a good chance of survival even if both engines fail, so the second clause is certainly satisfied. Thus $A = \{\epsilon\}$. Perhaps the example of Sullenberger and Skiles at 850 meters makes it too pessimistic for me to say the plane in case $B$ is a goner at 2,000 meters, but the point of the example is that an unsatisfied conjunct in a set definition makes the whole predicate false even if the part depending on $x$ is true. Thus the intent is $B = \emptyset$.
There it is: the difference between $\emptyset$ and $\{\epsilon\}$ can be one of life and death. How much the story helps burnish the difference is hard to quantify, but at least much of the class tends to get a later test question involving this difference right.
Whether I tell the story or not, I next have to convey why $\emptyset$ turns around and becomes $\{\epsilon\}$ under the star: $\emptyset^* = \{\epsilon\}$. I say that the convention $A^0 = \{\epsilon\}$ helps make the power law $A^m \cdot A^n = A^{m+n}$ true for all $m, n \geq 0$, but why is this law relevant for $A = \emptyset$? Why do we need to define $\emptyset^0$ anyway, let alone stipulate that it equals $\{\epsilon\}$?
If I say it’s like $0^0 = 1$ in arithmetic, the students can find various sources saying $0^0 = 1$ is a “convention” and “controversial.” So I say it’s like the convention that a for-loop
for (int i = 0; i < n; i++) { ... }
naturally “falls through” when $n \leq 0$. Even if the loop is checking for conditions that might force your code to terminate—and even if the body is definitely going to kill your program on entry—if the loop executes 0 times then you’re still flying. It’s a no-op represented by $\{\epsilon\}$ rather than a killing $\emptyset$, so the whole flow-of-control analysis of the zero-iteration case comes out to $\{\epsilon\}$.
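The zero-iteration fall-through is exactly how language exponentiation behaves when written as a loop. A minimal sketch in the course's set<string> style (the helper names concat and power are mine), where the $n = 0$ case never enters the body and so returns $\{\epsilon\}$ even when $A = \emptyset$:

```cpp
#include <set>
#include <string>
using namespace std;

using language = set<string>;

language concat(const language& A, const language& B) {
    language C;
    for (const string& x : A)
        for (const string& y : B) C.insert(x + y);
    return C;
}

// A^n via a for-loop. When n == 0 the loop body is never entered,
// so the result "falls through" as {""} (epsilon), not {} (empty set).
language power(const language& A, int n) {
    language result = {""};  // the no-op, epsilon
    for (int i = 0; i < n; i++) result = concat(result, A);
    return result;
}
```

For instance, power of the empty language: $\emptyset^0 = \{\epsilon\}$ but $\emptyset^1 = \emptyset$, because one pass through the loop concatenates with an empty set of choices.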
Thus it comes down to the logical requirement that a universally quantified test on an empty domain defaults to true. Not just the students but I too can feel this requirement better in programming terms.
To go deeper—usually as notes for TAs if time permits in recitations or as a note in the course forum—I connect to logic and relations. I’ve defined a function from a set $X$ to a set $Y$ as a relation $R \subseteq X \times Y$ that satisfies the test

$(\forall x \in X)(\exists! y \in Y)\,[(x,y) \in R].$
Now we can ask:
Is the empty relation a function?
There’s an impulse to answer, “of course it isn’t—there aren’t any function values.” But when $X = \emptyset$ the test becomes a universally quantified formula over an empty domain, and so it defaults to true. Thus $\emptyset$ counts as a function from $\emptyset$ to $Y$ regardless of what $Y$ is, even if $Y = \emptyset$ too.
Because $\emptyset \times Y = \emptyset$, the only possible relation on $\emptyset \times Y$ is $R = \emptyset$. So the cardinality of the set of functions from $\emptyset$ to $Y$ is $1$. The notation for the set of functions from a set $A$ to a set $B$, namely $B^A$, is motivated by examples like $\{0,1\}^{\{0,1\}}$ being the set of binary functions on $\{0,1\}$. There are $2^2 = 4$ such functions, and in general

$|B^A| = |B|^{|A|}.$
With $A = B = \emptyset$ all this gives $|\emptyset^{\emptyset}| = 0^0 = 1$. Thus $0^0 = 1$ and $\emptyset^0 = \{\epsilon\}$ are needed for the universe of mathematics based on sets and logic to come out right.
The same analysis shows that an empty relation on a nonempty domain is not a function. This means that even when stuff is empty, the type signature of the stuff matters too. One student in my course told me last week that the realization that “empty” could come with a type helped him figure things out.
Real advances in mathematics have come from structure channeling content even when the content is degenerate or empty. There are more hints of deeper structure even in basic formal language theory. I generally encourage the $+$ notation for regular expressions over Sipser’s $\cup$ in order to leverage the analogy between concatenation and multiplication, even though $A + A$ equals $A$, not any notion of “$2A$.” The property $x + x = x$ does not hold over any field, except for the possibility of the “field of one element” which we discussed some time back.
Now consider the suggestive analogy

$A^* = \epsilon + A + A^2 + A^3 + \cdots \quad\text{versus}\quad \frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots,$

which invites writing $A^* = \frac{1}{1-A}$. What substance does it have beyond optics? The latter equation holds provided $|x| < 1$ even over the complex numbers, and also holds in a sense for $x = -1$. The analogy $A = \emptyset$, $x = 0$ works in both equations to yield $\emptyset^* = \{\epsilon\}$ and $\frac{1}{1-0} = 1$. We then find it disturbing, however, that substituting $A = \{\epsilon\}$, $x = 1$ fails because $\{\epsilon\}^* = \{\epsilon\}$, which is not infinite.
Does it really fail? Perhaps it succeeds in some structure that embraces both equations—perhaps involving $\mathbb{F}_1$? Our earlier post and its links noted that $\mathbb{F}_1$ has an awful lot of structure and connections to other parts of mathematics despite its quasi-empty content.
We know several ways to build a universe on emptiness. In them the supporting cast of structure rather than $\emptyset$ itself is the real lead. The new actor in town, Homotopy Type Theory, aims to find the right stuff directly in terms of types and the identity relation and a key notion and axiom of univalence. As related in a recent survey by Álvaro Pelayo and Martin Warren in the AMS Bulletin, the object is to make $\emptyset$ and other sets emerge from the framework rather than form its base.
Does handling $\emptyset$ and $\{\epsilon\}$ right take care of everything?
Plus updated links to our Knuth and TED talks
Ada Lovelace was nuts. Some have used this to minimize her contributions to the stalled development of Charles Babbage’s “Analytical Engine” in the 1840s. Judging from her famously over-the-top “Notes” to her translation of the only scientific paper (known as the “Sketch”) published on Babbage’s work in his lifetime, we think the opposite. It took nuttily-driven intensity to carry work initiated by Babbage several square meters of print beyond what he evidently bargained for.
This month we have been enjoying Walter Isaacson’s new book The Innovators, which leads with her example, and have some observations to add.
Martin Campbell-Kelly and William Aspray, in their 2013 book Computer: A History of the Information Machine with Nathan Ensmenger and Jeffrey Yost, represent a consensus scholarly view:
“One should note, however, that the extent of Lovelace’s intellectual contribution to the Sketch has been much exaggerated. … Later scholarship has shown that most of the technical content and all of the programs in the Sketch were Babbage’s work. But even if the Sketch were based almost entirely on Babbage’s ideas, there is no question that Ada Lovelace provided its voice. Her role as the prime expositor of the Analytical Engine was of enormous importance to Babbage…”
We agree with much of this but feel the intellectual aspect of amplification given by her notes is being missed and needs its special due.
Babbage was a polymath and rose to the Lucasian Professorship at Cambridge in the line of Isaac Newton and Paul Dirac and Stephen Hawking, but he never gave a lecture there while making many forays into politics and polemics and industrial practice and theology. He held weekly public gatherings in London all through the 1830s which Lovelace frequented. They included a prototype of his “Difference Engine,” which the British government had funded for the creation of error-free military and scientific tables to the tune of over ten million dollars in our money, but which he abandoned on perceiving the loftier idea of universal computation. In 1837 he wrote a long manuscript on the design of his “Analytical Engine” and its mechanics for the four basic arithmetical operations plus root-extraction. He dated it finished on his forty-sixth birthday 12/26/37, but did not publish it in any form. In 1840 he gave invited lectures on the engine at the University of Turin. They were scribed by a military mathematician and later politician named Luigi Menabrea who produced a paper in French two years later.
Lovelace was also friends with the electrical pioneers Charles Wheatstone and Michael Faraday. Wheatstone prompted her to translate Menabrea’s paper, which Babbage encouraged further by suggesting she add her own notes to it. Her translation was dutiful, but her seven “Notes” labeled A–G swelled to almost triple the length.
Her Note G mainly concerned the steps for calculating the $n$th Bernoulli number (using the odd-index notation $B_1, B_3, B_5, \dots$ rather than today’s $B_2, B_4, B_6, \dots$ or $B_n$). The technical parts included the following:
This was worked out in lengthy correspondence with Babbage and then, as Isaacson details, a month of “crunch time” for Lovelace in July 1843 before the printer deadline. Babbage wrote the following in his autobiography two decades later:
“[I] suggested that she add some notes to Menabrea’s memoir, an idea which was immediately adopted. We discussed together the various illustrations that might be introduced: I suggested several but the selection was entirely her own. So also was the algebraic working out of the different problems, except, indeed, that relating to the numbers of Bernoulli, which I had offered to do to save Lady Lovelace the trouble. This she sent back to me for an amendment, having detected a grave mistake which I had made in the process.”
There is debate on whether “algebraic working out” refers to the first part of Note G or the whole, but Lovelace asked for “the necessary … formulae” in a letter, so it seems clear to us that only the run-up to formula (8) is referred to. The identity of the “grave mistake” is not known, but it seems like an error of derivation, not programming.
Menabrea’s paper has a brief mention of Bernoulli numbers toward the end, which points toward Babbage having raised but not elaborated their computation in his Turin lectures. What we don’t know is how far Babbage had worked out the programming details before and after 1840. What we do see is evidence of layers of stepwise refinement from spec to program over time.
Per Babbage’s account, Lovelace worked out at least the bottom layer of all her examples. The examples in her notes B–F have stronger ties to Menabrea’s coverage. Despite Babbage’s crediting her also for their initial-layer algebraic work, it is plausible that he had already digested all details. The start of Note A seems to indulge algebraic whimsy in how it styles the Difference Engine’s limitation to polynomials of degree six. She says the “particular function whose integral it was constructed to tabulate” is

$\Delta^7 u_x = 0,$
thus reducing it in a sense to nothingness. She puns on six applications of $\Delta$ being “the whole sum and object of that engine,” trashing it by comparison to the Analytical Engine. Babbage tried to add a preface inveighing against the government’s refusal to fund the Analytical Engine in a way that would have appeared to come from her in print; her principled refusal may have surprised him into asking to scrap the whole paper before he saw sense and relented. In any event we can understand those who limit her credit in sections A–F to exposition and programming.
In the Bernoulli section, however, her impact in all layers comes out strong. There is a curious brief passage on why she omitted filling in plus or minus signs. Its stated deliberateness could be covering a gap in command, but in any event represents deferring a non-crucial step. The table and programming parts involve details of procedure and semantics at levels beyond what is evident from Babbage’s 1837 manuscript.
Hence we feel that, translated into today’s terms, her “Notes” make at least a great Master’s project. The question is, would it be more? Our analogy means to factor out niceties like all this happening a cool 100 years before Turing-complete computers began to be built and programmed. We’re trying to map it fairly onto graduate work in computer systems today. So which is it, master’s or doctorate?
We see several seeds of a PhD. In a long paragraph early in Note A and again later she expands on the semantic categories of the machine, which Allan Bromley toward the end of his 1982 paper opines had not been “clearly resolved” by Babbage. Unlike Babbage she keeps this development completely apart from the hardware. This includes her perceptive distinction:
“First, the symbols of operation are frequently also the symbols of the results of operations.”
Between Notes B and E she grapples with the numerics of real-number division and a hint of the idea of dividing by a series and achieving better approximation through iteration. Also in Note E, she develops the structural difference between nested and single iterations. In Note G there is a hint of care about variables being free versus bound in logical sequences. Riding atop all this are her philosophical conclusions and prognostications, some of which Dick discussed and which Alan Turing answered at peer level in his 1950 paper “Computing Machinery and Intelligence.” They may be airy but all except perhaps the negative one on originality were right, and in our book that counts for a lot. As does the non-nutty overall precision of her work.
It is not unusual for a systems PhD to be “based on” and stay within the domain of equipment and topics by which the advisor applied for funding. The criterion for impact should be, did the student amplify the advisor’s vision? Did he—or she—find solutions that were not charted in advance? Does the student’s work enlarge the advisor’s prospects? Could he/she continue to develop the applications? That she and Babbage didn’t is most ascribable to lack of external funding and to engineering challenges that are still proving tough today for a project to build the Analytical Engine.
Isaacson’s book goes on to cover intricacies faced by female programmers of the ENIAC that were evidently less appreciated by the men on the hardware side. The end of Menabrea’s paper reflects the same mode of emphasizing the operations in hardware as Babbage’s 1837 manuscript. Even as Babbage helped with the complexities of the Bernoulli application in their correspondence, the point is that he did not pre-digest them—and as Isaacson sums up, it’s her initials that are on the paper. Isaacson does not say “amplifier,” but Maria Popova does in writing about his chapter:
“But Ada’s most important contribution came from her role as both a vocal champion of Babbage’s ideas, at a time when society questioned them as ludicrous, and as an amplifier of their potential beyond what Babbage himself had imagined.”
Hence to those who say that calling her the world’s first programmer is “nonsense” we reply “nuts.” Along with Autodesk co-founder John Walker, whose Fourmilab website hosts the definitively formatted Web copy of the “Sketch” and much else on the engine, we feel reading her paper is evidence enough that she was the first to face a wide array of the vicissitudes and significances of programming.
We note that last month polished recordings of our talks from last October became available:
We also note a long update by Gil Kalai of multiple developments “pro” and “con” on the feasibility of quantum computing—to cite a remark by Dick in our private conversations about this post, “if only Babbage and Lovelace had been able to show how the Analytical Engine could break cryptosystems…”
How do you regard our “advisor-student” terms for assessing Ada Lovelace’s contributions?
[clarified in the intro that Isaacson leads with Lovelace; added “much else” for Walker; added update on Kalai (as originally intended); “three weeks” –> “a month” in July 1843; added note about her treatment of division as one of “some”->”several” seeds of a PhD; other minor edits]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith are the inventors of differential privacy, as formulated in their 2006 paper “Calibrating Noise to Sensitivity in Private Data Analysis,” in the proceedings of the 2006 Theory of Cryptography Conference.
Today Ken and I want to talk about differential privacy again.
Our last discussion was on differential privacy (DP). We stand by our statements that it is a brilliant idea, has direct societal relevance, and is important in theory because it now is being used to prove theorems that do not seem to directly be about privacy. But we messed up in giving the main credit for the framing of the idea to Dwork. For this we apologize.
The paper by Dwork that we featured was an invited presentation to ICALP 2006 that opens the proceedings volume. We referenced its acknowledgments but did not bring the names to top level. In addition to McSherry, Nissim, and Smith, her paper names Moni Naor as joint author of the proof of impossibility of semantic security in that context.
We have since been apprised by them and others of the longer history of both the concept and its fulfillment by the mechanism of careful perturbations of database query responses. Among several precursors cited in their work are two PODS 2003 papers, one by Nissim with Irit Dinur titled “Revealing Information While Preserving Privacy” and the other by Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant titled “Limiting Privacy Breaches in Privacy Preserving Data Mining,” a 2000 SIGMOD paper “Privacy Preserving Data Mining” by Srikant with Rakesh Agrawal, and then a Crypto 2004 paper by Dwork and Nissim and a 2005 SIGMOD paper on “Practical Privacy: The SuLQ Framework” adding McSherry and Avrim Blum.
There are now many followup papers and surveys on DP. Three followup papers are “Mechanism Design via Differential Privacy” by McSherry with Kunal Talwar (FOCS 2007 version), “Smooth Sensitivity and Sampling in Private Data Analysis” by Nissim and Smith with Sofya Raskhodnikova (STOC 2007 version), and “What Can We Learn Privately?” by N-R-S plus Shiva Prasad Kasiviswanathan and Homin Lee (FOCS 2008; 2013 version). We also note a paper by Jaewoo Lee and Chris Clifton in the 2011 Information Security conference titled “How Much is Enough? Choosing $\epsilon$ for Differential Privacy,” which speaks to us about the technical delicacy of the mechanism; Lee and Clifton have gone on to vary the concepts. Of course there is also the new book by Dwork and Aaron Roth. If you are interested in the full history of DP and the progress from its roots to its current applications, we suggest these sources.
We apologize again. We retract any statement that suggested an inaccurate history of the invention of DP. We do not retract our opinions regarding its importance.
[referenced a second PODS 2003 paper]
Taekwondo source |
Cynthia Dwork is a computer scientist who is a Distinguished Scientist at Microsoft Research. She has done great work in many areas of theory, including security and privacy.
Today Ken and I wish to talk about the notion of differential privacy and Dwork’s untiring advocacy of it.
This concept is brilliant. It is, in our opinions, one of the greatest definitions of this century. Okay, the century is just fifteen years old, but it is a terrific notion. The formulation was introduced in her paper with Frank McSherry, Kobbi Nissim, and Adam Smith at the 2006 Theory of Cryptography Conference. The name “differential privacy” was suggested by Michael Schroeder, also with Dwork and McSherry at Microsoft, and appears first in her ICALP 2006 invited paper. Her expositions of the concept’s motivations and significance in that paper are a breath of fresh air and also animate her 2014 book with Aaron Roth. (Corrected—note also our followup.)
Dwork’s late father, Bernard Dwork, proved the first part of the famous Weil conjectures: the rationality of the zeta-function of a variety over a finite field. Ken had the rare privilege of doing an undergraduate seminar with him at Princeton in 1980 on ‘the’ zeta function—through Norman Levinson’s 1974 theorem placing at least one-third of the zeroes on the critical line—without being aware of the generalizations. Later Alexander Grothendieck and Pierre Deligne completed the rest of André Weil’s conjectures. The last part, due to Deligne, is usually considered the most difficult—of course the last step is always the hardest, an old saying I first heard from Ron Graham.
It is interesting to note that the actual proof by Deligne was not to Grothendieck’s liking. There are some who say that Grothendieck never even read it, since it did not use the deep tools he had developed to attack the conjectures. The last part of Deligne’s proof uses an estimation trick that Grothendieck certainly would not have liked: actually one that we use often in theory.
Roughly the argument goes like this: Let $\alpha$ be an eigenvalue of Frobenius acting on the cohomology group $H^i$ of the variety over $\mathbb{F}_q$. First one proves the weaker bound

$|\alpha| \leq q^{(i+1)/2}.$

Then by a tensor-like trick, applying this bound to the eigenvalue $\alpha^{2k}$ on the $2k$-fold product, one gets that

$|\alpha|^{2k} \leq q^{(2ki+1)/2}, \quad\text{that is,}\quad |\alpha| \leq q^{i/2 + 1/(4k)}.$

As this is true for arbitrarily large even powers $2k$, this implies that

$|\alpha| \leq q^{i/2}.$

Finally Poincaré duality, which makes $q^i/\alpha$ an eigenvalue too, then implies that

$|\alpha| = q^{i/2}.$
Clearly Bernard Dwork’s paper was important. It used deep methods to prove the first leg of the three parts of the Weil conjectures. Even if the later steps were harder, this step was still very deep, very hard, and very important. For this beautiful work, Dwork senior received, together with Kenkichi Iwasawa, the Cole Prize in 1962. This prize is given for outstanding work in number theory, and is one of the top prizes in the world.
This is one way to make a paper important: solve an open problem using whatever methods are available. Certainly a paper that solves an outstanding open problem is a likely candidate for being an important paper. For example, Andrew Wiles won the Cole Prize in 1997 for his essential finishing leg on Fermat’s Last Theorem.
Yet there is another way to write an important paper. Write a paper that changes mathematics or theory without solving an open problem. In many ways this is sometimes harder. It is hard to create a new concept in mathematics. We just talked about Carl Gauss’s beautiful introduction of the notion of congruence into number theory. One could argue, I believe, that this definition alone would have justified awarding Gauss a Cole Prize had one existed in his day. His definition changed number theory.
We would like to argue that the twin papers on Differential Privacy are among those great papers that may not have solved a deep open problem but that introduced a new and powerful notion.
The first half of her paper does not define the new notion. Instead, it puts the kibosh on a privacy notion that had been articulated as a chief desire as far back as 1977. This notion sought to assure that no breach of privacy—or at worst a negligible chance of breach of privacy—would result from permitted interactions with a statistical database such as getting means and percentiles. It was the analogue of semantic security in cryptosystems: any personal information gained by an adversary as a result of interaction with the database could be gained without it.
This might be achievable in a closed system where there is no information besides what is communicated via queries that the database allows. However, in the real world there are leaks of auxiliary information. The leak plus the database answers can cause a compromise that would not occur from either alone.
Here is a different example from the one involving Terry Gross in her paper—I am Ken writing this and the next section. Say my metro area has two charity drives, “Green Hearts” and “Blue Angels.” Suppose a metro official, thinking to praise me, leaks to someone else that I gave twice as much as the average donor to Blue Angels. That alone is not enough to determine my donation. However, the someone can then interact with Blue Angels to learn its number of participants and total gift. Indeed, charities often publish both figures afterward. Then my exact amount becomes known.
The ICALP paper’s impossibility proof, which was joint work with Moni Naor, shows that for any computationally formalized scheme attempting to achieve semantic security there is an “auxiliary information generator” that can break it by injecting leaks into the environment. Theory mavens can enjoy her proof if only to see how Turing machines that always halt are combined with a new definition of “fuzzy extractor.” If it doesn’t shut the door on the old objective, at least it changes the onus on anyone trying to revive it to find a weakness in her model. Exactly because her formulation is so general, this would be hard to envision.
The punch line to the charity example is that I didn’t give anything to Blue Angels. I gave to Green Hearts. I wasn’t in the Blue Angels database at all, and yet I was compromised when its most basic statistics became known. Or maybe I gave equal amounts to both. Whatever.
Next year comes around, and the drives are on again. Once bitten, twice shy. Should I avoid Blue Angels? Well, Green Hearts publishes the same figures. I could sue the leaker, but that’s beside the main point:
What responsibilities and liabilities can the organizations managing the databases reasonably be expected to bear?
The insight, born of long work in the area and collaboration with the above-mentioned co-authors and others, is to focus on differences that can be isolated to one’s participation in the database. Hence the term differential privacy (DP).
In my example, the compromise did not depend on whether I gave to Blue Angels. Had I done so—had I split my gift between the charities—it would have made only a tiny difference to the Blue Angels’ average (in this case none, in fact).
The 2006 papers prove that the common idea of adding a random amount up to $\pm\Delta$ to any returned query value, where $\Delta$ is on the order of the difference in query value from including or deleting my record, assures that any adversary can gain only a tiny amount of information between the case where I am in and the case where I stay out. If the database is large enough, then the error in a reported average or percentile value will also be negligible.
The formula in her definition has an interesting technical point. She formulates it in terms that the probability of a damaging disclosure can increase by only a small multiplicative factor. It is natural to say, “a factor $(1+\epsilon)$.” Now suppose, however, that a group of $k$ people are either in or out. Then the factor becomes

$(1+\epsilon)^k = 1 + k\epsilon + \binom{k}{2}\epsilon^2 + \cdots$
The other terms have powers $\epsilon^2$ and higher and so are even more strongly negligible, but their presence is annoying for doing rigorous estimates. Hence Dwork writes the factor as $e^{\epsilon}$ instead. Then composing it for $k$ people simply changes it to

$e^{k\epsilon},$
which makes other estimates easier to do. Her full definition is that a randomized query-return function $\mathcal{K}$ confers $\epsilon$-differential privacy if for all data sets $D_1$ and $D_2$ that differ in at most one element, and all subsets $S$ of the possible query results,

$\Pr[\mathcal{K}(D_1) \in S] \leq e^{\epsilon} \cdot \Pr[\mathcal{K}(D_2) \in S].$
This definition is completely local to the database mechanism. It is also achieved by schemes in the last part of her paper, which she and her co-workers had already been developing. The wider point on the practical side is that this reinforces the legal view that a database provider should be liable for outcomes that result squarely from a person or persons stepping through their door.
On the theoretical side, the test of a new definition, a new notion, is simple. If the notion generates lots of interesting theorems about it then that is okay—indicative of a good notion. But it really is not indicative of a great notion. The acid test, in our opinion, is if the notion helps prove theorems from outside its own domain.
Translated to DP it means: all the theorems in the world using DP to prove theorems about DP are not sufficient. What is critical is if the DP notion can be used to prove theorems about topics that are not really about privacy. This is the acid test.
Happily DP passes this test. There are now results that use it to prove theorems about topics that seem to be distanced from any obvious connection to privacy. Let’s look at one due to Sampath Kannan, Jamie Morgenstern, Aaron Roth, and Zhiwei Wu—their paper just appeared in SODA 2015 as Approximately Stable, School Optimal, and Student-Truthful Many-to-One Matchings (via Differential Privacy).
We quote their abstract:
We present a mechanism for computing asymptotically stable school optimal matchings, while guaranteeing that it is an asymptotic dominant strategy for every student to report their true preferences to the mechanism. Our main tool in this endeavor is differential privacy: we give an algorithm that coordinates a stable matching using differentially private signals, which lead to our truthfulness guarantee. This is the first setting in which it is known how to achieve nontrivial truthfulness guarantees for students when computing school optimal matchings, assuming worst-case preferences (for schools and students) in large markets.
What other applications are there of this notion? Also are there simple definitions lurking out there that like differential privacy will change the entire field? Try to prove hard theorems, but also look for new definitions. They may—if you are “lucky”—be immensely important.
[Rewrote intro including link to followup correction post; changed ascriptions in later sections; changed possessive to “and a” in title]
Georgia Tech source |
Joseph Ford was a physicist at Georgia Tech. He earned his undergrad degree here in 1952, and after earning his PhD at Johns Hopkins, went to work for two years at Union Carbide in Niagara Falls before joining the University of Miami and then coming back to Tech. He was possibly lured back into academia by considering a paradox studied by Enrico Fermi, John Pasta, Stanislaw Ulam, and Mary Tsingou in the mid-1950s. The paradox is that complicated systems can often exhibit periodic rather than ergodic motion.
Today we wish to present a simple observation about hard-to-compute functions.
Ford was one of the founders of chaos theory and was instrumental in creating the journal Physica D for nonlinear dynamics. While becoming a Regents’ Professor at Tech in 1978 he was an early adopter of personal computers for his work. This accompanied a progression in his thinking expressed in the entire abstract of his 1983 paper, “How Random is a Coin Toss?”
In examining the differences between orderly and chaotic behavior in the solutions of nonlinear dynamical problems, we are led to explore algorithmic complexity theory, the computability of numbers and the measurability of the continuum.
This paper includes a photo of the coin toss before a National Football League game—something to think about for Sunday’s Super Bowl. For now, though, let’s think about how periodic phenomena might creep into uncomputable functions.
If $f$ is uncomputable, can it be possible that $f$ is computable modulo $m$ for some $m \geq 2$? This is especially interesting when $f$ might only be uncomputable “because” it grows faster than every computable function.
The answer is quite simple: Yes. Suppose that $g$ is hard and consider $f(n) = m \cdot g(n)$. Then $f$ is always zero modulo $m$, but knowing $f$ yields $g$. This trivial answer, that it can be easy to compute a hard function modulo a given $m$, calls to mind an old saying:
Too easy an answer to a problem suggests there is a better problem.
Let’s look for a better problem.
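The trivial construction above is easy to check in a few lines of Python. This is a sketch of ours, not from the post; the function $g$ below is just a computable stand-in for a genuinely hard-to-compute function:

```python
def g(n):
    return 2 ** (2 ** n)   # computable stand-in for some hard function g

m = 7                      # any fixed modulus works

def f(n):
    return m * g(n)        # f encodes g exactly, so f is as hard as g

def f_mod_m(n):
    return 0               # yet f modulo m is trivially computable

for n in range(6):
    assert f(n) % m == f_mod_m(n)   # always zero modulo m
    assert f(n) // m == g(n)        # but knowing f recovers g
```

The point the code makes is exactly the one in the text: computing $f$ modulo $m$ tells you nothing, while $f$ itself carries all of $g$.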
The difficulty with the function $f$ is that it is artificial: we took a nice hard-to-compute function and changed it so it would be easy to compute modulo $m$. This seems like “cheating,” no? So the better problem is to ask: are there natural hard functions that can be computed modulo some $m$?
A classic hard function is the halting problem, but it is only $0$-$1$ valued and so will not be very interesting. The reason, of course, is that computing it modulo any $m \geq 2$ yields the same value. So let’s look at a cousin of the halting problem: the busy beaver function, which we recently discussed.
Define $BB(n)$ to be the maximum number of 1’s that an $n$-state Turing machine can leave in a halting computation that starts with a blank tape and uses only the characters 1 and blank. The Turing machine has just one head and must move it one cell left or right at each step; the tape is two-way infinite. A variant allows the machine also to stay, that is, not move its head, and this gives rise to a function $BB'(n)$.
I do not know whether we can compute $BB$ or $BB'$ modulo any $m$. It seems unlikely, but it is not obvious to me. However, we can define a natural quasi-inverse of the busy-beaver function that is likewise uncomputable, but is computable modulo 2:
Definition 1 Define the beaver census function $C_n(k)$ to be the number of $n$-state Turing machines that, when run on blank tape, halt with at least $k$-many 1’s on the tape. $C'_n(k)$ is defined similarly for Turing machines with “stay” moves.
If $C_n(k)$ were computable, then we could compute $BB(n)$ by iterating $k = 1, 2, 3, \dots$ until $C_n(k) = 0$, so it is uncomputable. Likewise $C'_n(k)$ is uncomputable, and this is interesting because both functions are bounded by the number of $n$-state Turing machines of the respective kinds. These numbers are singly exponential in $n$, so the uncomputability is apart from growth.
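The census, and the recovery of $BB$ from it, can be made concrete for $n = 2$. The following Python sketch is ours, not from the post, and uses Radó’s convention in which the halting transition also writes and moves (slightly different from the convention above). Since no halting 2-state machine runs more than the known maximum of 6 steps, a small step cap makes the brute-force census exact:

```python
from itertools import product

STEP_CAP = 30   # exact for n = 2: no halting 2-state machine runs past 6 steps

def ones_when_halts(prog, cap=STEP_CAP):
    """Return the number of 1s left if prog halts within cap steps, else None."""
    tape, pos, state = {}, 0, 0
    for _ in range(cap):
        write, move, nxt = prog[(state, tape.get(pos, 0))]
        tape[pos] = write          # halting transitions also write and move
        pos += move
        if nxt == -1:              # -1 encodes the halt state
            return sum(tape.values())
        state = nxt
    return None

# Each of the 4 (state, symbol) entries freely picks (write, move, next).
ENTRIES = list(product((0, 1), (-1, 1), (0, 1, -1)))
KEYS = [(s, b) for s in (0, 1) for b in (0, 1)]
results = [ones_when_halts(dict(zip(KEYS, choice)))
           for choice in product(ENTRIES, repeat=4)]

def census(k):
    """Number of 2-state machines halting with at least k 1s on the tape."""
    return sum(1 for r in results if r is not None and r >= k)

# Recover BB(2) by iterating k until the census hits zero, as in the text.
k = 1
while census(k) > 0:
    k += 1
bb2 = k - 1
print(bb2)   # 4, the known busy-beaver value for 2 states under this convention
```

The loop at the end is exactly the reduction in the text: computability of the census would give computability of $BB$.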
There are two “obvious” symmetries of Turing machines that affect the values of $C_n(k)$ and $C'_n(k)$. One is that you can permute the state labels. Let us insist that the start state is labeled $1$; then there are the $(n-1)!$ permutations of the other states. The second is handedness. Since the tape is two-way infinite, we can interchange “left” and “right” moves in every instruction to produce an equivalent machine that runs in mirror image.
Does this make $C_n(k)$ and/or $C'_n(k)$ always be multiples of $2(n-1)!$? One immediate subtlety is that in the latter case, when a machine has only “stay” moves, interchanging “left” and “right” does not produce a different code. Such a machine can leave at most one 1 on the tape, so this is immaterial when $k \geq 2$. For other machines the interchange cannot produce the same code, as otherwise the machine would be nondeterministic, having both a “left” and a “right” option at some state.
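The left-right pairing can be seen on a concrete machine. In this Python sketch of ours (using Radó’s halting convention and our own encoding, with halt as next state $-1$), mirroring the 2-state busy-beaver champion yields a genuinely different machine code that nonetheless halts with the same number of 1’s:

```python
def run(prog, cap=50):
    """Simulate prog; return the number of 1s on the tape if it halts within cap steps."""
    tape, pos, state = {}, 0, 0
    for _ in range(cap):
        write, move, nxt = prog[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if nxt == -1:
            return sum(tape.values())
        state = nxt
    return None

# The 2-state champion: A0 -> 1RB, A1 -> 1LB, B0 -> 1LA, B1 -> 1R-halt.
champ = {(0, 0): (1, 1, 1), (0, 1): (1, -1, 1),
         (1, 0): (1, -1, 0), (1, 1): (1, 1, -1)}

def mirror(prog):
    """Interchange left and right moves in every instruction."""
    return {key: (w, -mv, nxt) for key, (w, mv, nxt) in prog.items()}

mchamp = mirror(champ)
assert mchamp != champ                    # a different code...
assert run(mchamp) == run(champ) == 4     # ...with the same number of 1s left
```

Since every instruction here carries a direction, the mirror is always a distinct code, so halting machines of this kind pair up and counts of them come out even.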
However, what can happen is that the left-right interchange has the same effect as a permutation of the states. Say that such a machine has an “unexpected symmetry.” This is what makes our problem nontrivial:
For which divisors $d$ of $2(n-1)!$ are $C_n(k)$ and/or $C'_n(k)$ computable modulo $d$?
The only thing we can prove now is that the interchange symmetry is always available.
Theorem 2 $C_n(k)$ and $C'_n(k)$ are computable modulo $2$ in polynomial time. So are their analogues for the busy-beaver measure that counts the number of steps before halting rather than the number of 1’s left on the tape.
Proof: For $C_n(k)$ the values are always even, since interchanging “left” and “right” pairs each machine with a distinct machine that leaves the same number of 1’s. For $C'_n(k)$ the machines with a “left” or “right” move pair up in the same way, so modulo 2 we need only count the machines having only “stay” moves, for which there is no unbounded interaction with the tape. There are still exponentially many such machines to count, so polynomial time is not completely trivial. Every such machine with state set $\{1, \dots, n\}$ is characterized by a graph on $2n$ nodes. Each node is labeled either $b_q$ or $1_q$ for some state $q$ and expresses that the machine is looking at a blank or at a 1, respectively. Letting $s$ stand for the start state ($s = 1$ by our convention in the last section), $b_s$ becomes the root node in the graph. An edge $b_q \rightarrow 1_r$ represents that the machine has an instruction for state $q$ that changes blank to 1 and goes to state $r$. An edge $1_q \rightarrow 1_r$ means that the machine in reading 1 leaves it alone and goes to $r$; the other kinds of edge are defined similarly.
Thus we have to count the number of graphs of out-degree at most 1 on $2n$ nodes in which the path beginning at $b_s$ never cycles and ends at a node of the form $1_q$. To do so, for each possible length $\ell$ of such a path, we multiply two numbers: the count of such paths, and the number of ways to define irrelevant links (and non-links) from the other nodes. Summing those values over $\ell$ gives the desired count, and hence $C'_n(k)$ modulo 2. The treatment for the step-counting measure is similar, except that the path need not end at a node of the form $1_q$.
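The counting idea can be cross-checked by machine. The following Python sketch is our reconstruction of the $k = 1$ count of stay-only machines, with our own numbering of nodes; it computes the number of qualifying graphs both by the path-length formula and by direct enumeration:

```python
from itertools import product
from math import perm   # perm(a, b) = a! / (a-b)!

def count_by_formula(n):
    """Sum over path lengths l: (# acyclic root paths ending, edgeless, at a
    1-node) times (# ways to set the out-edges of the untouched nodes)."""
    total = 0
    for l in range(1, 2 * n):                    # a path uses l+1 of the 2n nodes
        paths = n * perm(2 * n - 2, l - 1)       # pick the end among the n 1-nodes,
                                                 # then the ordered interior nodes
        free = (2 * n + 1) ** (2 * n - 1 - l)    # untouched node: any edge, or none
        total += paths * free
    return total

def count_by_brute_force(n):
    """Enumerate all out-degree-<=1 graphs on 2n nodes; node 0 is the root b_s
    and nodes n..2n-1 play the role of the 1-labeled nodes."""
    one_nodes = set(range(n, 2 * n))
    total = 0
    for out in product([None] + list(range(2 * n)), repeat=2 * n):
        seen, cur = set(), 0
        while out[cur] is not None and cur not in seen:
            seen.add(cur)
            cur = out[cur]
        if out[cur] is None and cur in one_nodes:   # halted at a 1-node, no cycle
            total += 1
    return total
```

The two counts agree for small $n$, and the formula is a sum of fewer than $2n$ terms, so it runs in polynomial time as the theorem requires.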
Does this idea extend to the other divisors of $2(n-1)!$? Of course, if the functions are computable modulo $2(n-1)!$ then they are computable modulo all of its divisors. We might expect that machines with an “unexpected symmetry” cannot be among the busiest beavers, so that we can discount them once $n$ is large enough, perhaps.
What we are more curious to ask, however, is this:
Are there any other symmetries of Turing machines? Perhaps ones that emerge only when $n$ is sufficiently large?
Note that for $k = BB(n)$, $C_n(k)$ is the number of Turing machines achieving the busy-beaver bound. Thus our results about $C_n(k)$ apply to the busiest beavers. Hence a way of looking for more symmetries is to see if $C_n(BB(n))$ has any divisors larger than $2(n-1)!$. That is, are there busiest-beaver machines besides those that are equivalent under the “obvious” symmetries? The studies which we cited before seem not to have found any.
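For $n = 2$ the divisibility of the champion count can be checked directly. This Python sketch of ours (again using Radó’s convention, where halting transitions also write and move) counts the machines achieving the maximum number of 1’s and confirms the evenness that the mirror symmetry demands:

```python
from itertools import product

def ones_when_halts(prog, cap=30):
    """Return the number of 1s left if prog halts within cap steps, else None."""
    tape, pos, state = {}, 0, 0
    for _ in range(cap):
        write, move, nxt = prog[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if nxt == -1:
            return sum(tape.values())
        state = nxt
    return None

ENTRIES = list(product((0, 1), (-1, 1), (0, 1, -1)))   # (write, move, next)
KEYS = [(s, b) for s in (0, 1) for b in (0, 1)]
halting = [r for r in (ones_when_halts(dict(zip(KEYS, c)))
                       for c in product(ENTRIES, repeat=4)) if r is not None]
best = max(halting)                                    # the busy-beaver value
champions = sum(1 for r in halting if r == best)       # machines achieving it
# Mirroring pairs each champion with a distinct champion, so the count is
# divisible by 2, which equals 2*(n-1)! for n = 2.
assert champions % 2 == 0
```

Checking whether such counts have divisors beyond the “obvious” $2(n-1)!$ for larger $n$ is exactly the search for unexpected symmetries.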
Can we find (other) interesting cases of uncomputable functions being computable modulo some $m$?