Martin Gardner introduced many, including myself, to the joys of Discrete Mathematics. His glorious monthly column “Mathematical Games” for Scientific American included some continuous mathematics too, of course; one could say it was on “Concrete Mathematics.” However, I conjecture—based on a quick flip of the several books I own of his columns—that the symbols of calculus never appeared in them.
Yesterday was the 100th anniversary of Gardner’s birth. Dick and I wish to join the many others marking this centennial and thanking him for all he did to make math fun for so many.
His feature kicked off in 1956 with the famous column on hexaflexagons, which I will talk about in a moment. Gardner related in his autobiography, which was assembled three years after his death in 2010, how important this column was as his “break.” However, the column that made the most lasting impression on me began with the words:
The calculus of finite differences, a branch of mathematics that is not too well known but is at times highly useful, occupies a halfway house on the road from algebra to calculus.
This discrete “calculus” enables one to calculate a formula for any polynomial sequence given enough of its values. It also led to my favorite “visual proof” that 0^0 = 1: For any integer n, if you write out the powers n^0, n^1, n^2, … going across and take differences of adjacent values repeatedly to make an infinite equilateral triangle pointing down, the left side has the powers of n-1. Iterating this gives you the powers of n-2, and so on, but the entry for n^0 as n counts down to 0 steadfastly remains 1.
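As a quick illustration (a sketch in Python; the function name is mine, not Gardner's), one can build the difference triangle and watch both phenomena: the k-th differences of a degree-k polynomial sequence are constant, and the left edge of the triangle built from the powers of n is the powers of n-1.

```python
def difference_rows(seq):
    """Successive rows of forward differences of an integer sequence."""
    rows = [list(seq)]
    while len(rows[-1]) > 1:
        r = rows[-1]
        rows.append([b - a for a, b in zip(r, r[1:])])
    return rows

# Third differences of the cubes 0, 1, 8, 27, 64, 125 are constant: 3! = 6.
cube_rows = difference_rows([k**3 for k in range(6)])

# The left edge of the triangle built from 5^0, 5^1, 5^2, ... is 4^0, 4^1, 4^2, ...
power_rows = difference_rows([5**k for k in range(8)])
left_edge = [r[0] for r in power_rows]
```

Enough values of a degree-k polynomial pin down the constant k-th difference row, and running the recursion backwards recovers the formula.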
Tributes have been gathered all during this centennial year. Scientific American observed yesterday by posting a review of ten of Gardner’s most appreciated columns. Bill Gasarch’s post yesterday links to some of his and Lance Fortnow’s previous items on Gardner, and further to a site where anyone can contribute a testimonial.
Frederic Friedel, who co-founded the chess-database company ChessBase three decades ago, knew Gardner personally from 1979 as a fellow original member of the Committee for Scientific Investigation of Claims of the Paranormal (CSICOP, now CSI). The committee remains housed in my town of Amherst near Buffalo, now at the Center for Inquiry (CFI Western NY) which is across Sweet Home Road from my university campus. Friedel has described to me cold days in Buffalo and round tables with Carl Sagan and other luminaries. All this was before my own arrival in 1989.
Friedel was also among the column readers with whom Gardner interacted from the beginning in the 1950s. His awesome tribute yesterday includes appreciation of Gardner’s book Fads and Fallacies in the Name of Science, which also made a strong impression on me, and other links. Dick recalls the great chapter of that book that starts with Gardner saying that this next crazy claim cannot be disproved. It was that the universe was created recently with a full fossil record that makes it look much older. Indeed, it could be a so-called “Boltzmann Brain”—and a point made in this NY Times article is that it’s crazy that this is not crazy.
I never had any contact with Gardner, despite making a few visits to CFI; it ranks among numerous lost opportunities. I could mention many other influences from his columns, and looking through his book Mathematical Circus just now reminded me that his chapter on “Mascheroni Constructions” was my first knowledge of what I called the “class ” in my “STOC 1500″ post with Dick. I had a similar experience to what Douglas Hofstadter told in his own tribute in May 2010: I started off interested in Math+Physics, intending to take the latter as far as quantum mechanics and particles at Princeton. But I found advanced mechanics and electrodynamics tough going, and am ever grateful for being allowed to parachute out of the latter at midterm into Steve Maurer’s Discrete Mathematics course, in which I knew I’d found my métier. As I could have realized from my love of Gardner all along.
I’ve wanted to make a post on my hexaflexagon twist when I had time to create nice pictures, but now I will have to make do by referring to the fine illustrations in Gardner’s original column, which is freely available from the M.A.A. here. It requires making the standard “hexa-hexa” as shown in Gardner’s Figure 2. For best effect, in addition to numbering the faces 1–6 as shown there (and using a solid color for each face), label the six components of each face A–F in the left-to-right order given there.
The “Twist” is always applicable from one of the three inner faces (1, 2, or 3); finding when it applies from one of the outer faces and from the configurations that follow is more of a challenge. Instead of flexing as shown in Figure 3, follow these directions:
What you will get is a flexagon with the colors on its faces jumbled up—if you’ve used the lettering, you will have ‘1C’, ‘5B’, and ‘2F’-‘2E’-‘2D’-‘2C’ clockwise from upper right. You will still be able to flex it the standard way, but only exposing one other face—that is, you will have something isomorphic to a tri-hexaflexagon.
Now the real fun is that you can iterate this process. For one thing, you can invert it to restore your original hexa-hexaflexagon (teasing ‘2E’ and ‘2F’ forward and folding in ‘1C’). But you can also find other places from which to initiate another “Twist,” and these will lead to more tri-hexa configurations. One is to flip it over, rotate once counterclockwise so you fold backwards with ‘6B’ and ‘3C’ at right, tease forward ‘3E’-‘3D’, tuck ‘3C’ into the bowl atop ‘1D’, collapse and grab at the other end of ‘2A’-‘6A’, lift flap ‘2D’ out of the bowl, and unfold to see ‘2D’-‘4D’-‘6A’-‘2A’-‘3E’-‘3D’. Then you can flip over, rotate once more counterclockwise, and iterate—but there are other twists too.
Some will lump up thick wads of paper on three triangles of each face, so be ginger about it. Finally, after much exploration, you may come upon the “Dual Hexa.” This has six faces, in which the inner three alternate colors. It is, in fact, the configuration you would build if you first rotated the top part A of Gardner’s Figure 3 by 180 degrees. Then you may find a way to go from the primal to the dual and back by a long regular pattern of repeated twists.
As a high-school student in 1976, I attempted to map out the entire space of reachable configurations by hand, but made some bookkeeping errors and gave up. I wanted to write a computer program to simulate my twists, but did not make the time.
Can you do the “Twist”? The space of configurations you can explore is much larger than the “Tuckerman Traverse” of the standard hexa-hexa shown in Gardner’s Figure 4. Can you map it all out? Did anyone know about this previously?
[some format and word changes, updated to include letters of facets.]
Michael Rubinstein is an expert on number theory on the faculty of the University of Waterloo. He is one of the organizers of a 61st-birthday symposium being held December 15–19 in Princeton for my friend and former colleague, Peter Sarnak. I guess it is a matter of taste for a number theorist whether to observe a birthday with a lot of factors (60) or a prime (59 or 61). Rubinstein also does extensive experimental mathematics and lists several code libraries below his publications on his website, which also has interesting articles on the math, history, and practice of musical tuning.
Today Ken and I wish to discuss a paper of his on one of my favorite problems: integer factoring.
The paper appears in the 2013 volume of the electronic journal INTEGERS, one of whose sponsors is the University of West Georgia. It is titled, “The distribution of solutions to xy = N (mod a) with an application to factoring integers.” He studies the structure of solutions (x, y) to xy ≡ N (mod a), and uses this to prove the following result:
Theorem 1 For any N, there is a deterministic factoring algorithm that runs in N^{1/3 + o(1)} time.
The “o(1)” hides sub-linear terms, including some coming from the divisor bound, logarithmic terms from the overhead in nearly-linear-time integer multiplication, and related sources.
Factoring algorithms partition into two types: unprovable and provable. The unprovable algorithms usually use randomness and/or rely for their correctness on unproved hypotheses, yet are observed to be the fastest in practice for numbers of substantial size. The provable algorithms are usually deterministic, but their key feature is that their correctness is unconditional.
Those trying to factor numbers, to break codes for example, use the fastest unprovable algorithms such as the general number field sieve (GNFS). The cool part of factoring is that one can always check the result of any algorithm quickly, so anytime an unprovable algorithm fails, it fails in a visible manner.
Why care then about slower algorithms that are provable? Indeed. The answer is that we would like to know the best provable algorithm for every problem, and that includes factoring. We are also interested in these algorithms because they often use clever tricks that might be usable elsewhere in computational number theory. But for factoring there is a special reason that is sometimes hidden by the notation. The GNFS has postulated runtime

exp(((64/9)^{1/3} + o(1)) (ln N)^{1/3} (ln ln N)^{2/3}),

which is sub-exponential in the length of N. This is not of similar order to a power of N such as N^{1/4}. To see this more clearly, let b be the length of N in binary, so N ≈ 2^b. Then the GNFS bound is roughly 2^{b^{1/3}} and slightly above, but N^{1/4} is 2^{b/4}, which is not only miles bigger, it is a properly exponential function of b.
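To make the contrast concrete, here is a quick numerical sketch (ignoring the o(1) term, so only an estimate) comparing the bit-length of the GNFS cost with that of N^{1/4} for b-bit N:

```python
import math

def gnfs_bits(b):
    """log2 of exp((64/9)^(1/3) * (ln N)^(1/3) * (ln ln N)^(2/3)) for a b-bit N."""
    ln_n = b * math.log(2)
    return (64 / 9) ** (1 / 3) * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3) / math.log(2)

def quarter_power_bits(b):
    """log2 of N^(1/4) for a b-bit N."""
    return b / 4
```

For a 1024-bit N the sub-exponential estimate is under 2^90 operations, while N^{1/4} is a full 2^256; the gap only widens as b grows.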
Nobody knows a deterministic factoring algorithm that beats properly exponential time.
The unprovable algorithms taunt and tease us, because they hold out the promise of being able to beat exponential time, but nobody knows how to prove it. Indeed, as Rubinstein remarks in his intro, each “teaser” is a conceptual child of one of the exponential algorithms. A reason to care about his new algorithm is its potential to be a new way to attack factoring, even though it currently loses to the best known provable methods. These are all slight variations of the Pollard-Strassen method, all running in N^{1/4}-type times; there are also algorithms assuming the generalized Riemann hypothesis. See this for details.
Most factoring algorithms—even Peter Shor’s quantum algorithm—use the idea of choosing a number a and working modulo a. If a and N share a factor then we can quickly find it by Euclid’s gcd algorithm, while otherwise the problem of finding solutions to xy ≡ N (mod a) provides useful information. The starting point of Rubinstein’s algorithm is to do this for values a that are right near each other, and compare the solutions obtained.
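A toy sketch of this starting point (the helper names are mine, and the brute-force listing is only for illustration, not Rubinstein's method): the gcd test, and the set of solutions to xy ≡ N (mod a) that the algorithm reasons about.

```python
from math import gcd

def try_gcd(N, a):
    """If a shares a factor with N, Euclid's algorithm finds it instantly."""
    g = gcd(a, N)
    return g if 1 < g < N else None

def solutions_mod(N, a):
    """All pairs (x, y) with 0 < x, y < a and x*y ≡ N (mod a), by brute force."""
    return [(x, y) for x in range(1, a) for y in range(1, a) if (x * y - N) % a == 0]
```

For N = 91 = 7·13, try_gcd(91, 35) returns the factor 7 at once, while solutions_mod(91, 5) lists the pairs whose product is 1 mod 5.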
In its outermost structure, it uses this fact: Suppose where and are primes of the same length, so . Suppose . Then if we write
with (note they are relatively prime to or else gcd gives us a factor) we get that and are fairly small, about . They are also roughly bounded by and by . Multiples of them will stay small. So now let us work modulo for . We have:
so
Now an issue is that the solutions obtained are far from unique. However, among them as a varies will be the collinear points
The increments are small enough that the points stay close to each other in Euclidean space. If we divide the space into boxes of the right size and offset, they will all stay inside one box. If we can search a box well enough to find them and get all of exactly, then we get and . Moreover—and this is why I like factoring—once we find them we can verify that we have the right ones; if the check fails and we were fooled by -many other points we can move on.
Using , Rubinstein is able to show that the amortized number of “foolers” in a square is . Since there are -many such squares and , we get the target runtime. Note this is amortized, not just expected, so the algorithm is deterministic. The most clever and painstaking aspect is that estimates of the asymptotic convergence of solutions of to uniform distribution on are needed to get the “amortized” part. The details become complicated, but Rubinstein writes in an engaging top-down manner, and he includes a section on how this might—might—be used to break the sub-exponential barrier.
I found this work interesting not just because it is a new approach to factoring. I have tried in the past to prove the following type of result: Is there an asymmetric cryptosystem that is breakable if and only if factoring can be done in polynomial time?
I want to replace systems like AES with ones that use the hardness of factoring for their security. Systems like AES rely on intuition and experimental testing for their security—there is not even a conditional proof that they are secure.
My goal is really trivial. Any public-key system can be viewed as an asymmetric system. But what I want is that the encryption and decryption should be very fast. This is one of the reasons that modern systems use private-key systems to create symmetric keys: performance. Using public-key systems for all messages is too slow.
My idea is not far in spirit from what Rubinstein does in his factoring algorithm. This is what struck me when I first read his paper. His algorithm is slow because it has no idea which “box” to look at. Can we share some secret that would allow N to be factored faster, yet still make it hard for those without the secret?
Can we extend Rubinstein’s algorithm to break the sub-exponential barrier? Can his methods be used as described above to create asymmetric systems that are based on factoring? What is the real cost of factoring?
[fixed sentence about private-key/symmetric]
Adrienne Bermingham is the manager of this year’s TEDx Buffalo event, which will be held this Tuesday at the Montante Center of Canisius College in downtown Buffalo.
Today I wish to proudly announce that our own Ken Regan is one of the presenters at this year’s event.
Bermingham is the head organizer of the TEDxBuffalo events, which started in 2011. This year’s theme is In Motion. When not organizing she works in Anthrozoology, which is the study of how we, humans, interact with animals. As one who daily interacts with our golden retriever, I would love to hear any advice on how to make that better.
The TED organization is dedicated to getting information out to the world: ideas worth spreading. TEDxBuffalo is an example of a local group working with them to put on a TED event. Their event, TEDxBuffalo 2014, is supposed to be relevant to Buffalo, by Buffalonians. It has no keynotes, panels, or any of the usual stuff we see at conferences. No parallel sessions. Just a day of “engaging and refreshing your brain.” Of course, as a non-Buffalonian I would expect that the talks, while focused locally, will still be interesting to the rest of us. It’s a twist on the famous phrase “think globally, act locally.”
Quoting them:
Applying a theme to our TEDx event allows us to highlight a strength we’ve identified in our community, curate a series of talks that have the ability to build off of one another, and send a clear, powerful message to members of our community and TEDx video viewers across the globe.
This year’s theme builds upon TEDxBuffalo 2013, which celebrated our city’s “Renaissance Citizens”. Now that we’ve acknowledged our city’s renaissance, it’s time to recognize those who are hard at work bringing about positive change in our community—those who are truly “In Motion”.
The day starts at 9am and goes to 4pm. It consists of a dozen talks. Go here for the exact time schedule and also more information on the talks.
A little secret: Ken cropped me out of the photo he used. It was taken at a Barnes and Noble in Ann Arbor when we attended the Michigan “Coding, Complexity, and Sparsity” workshop. Oh well.
The talks are described by bios of the speakers. I must say that I get Ken’s talk completely: it will be on his research into chess. The others are less clear—I guess that is part of the fun of a TEDx event. The talks are special, surprising, and should all be fun. Here are very short descriptions of the talks. Very short.
Maybe I don’t get Ken’s talk. His title is, “Getting to Know Our Digital Assistants.” I thought Ken was involved in making sure people don’t use digital assistants. I guess the only way to know is to watch.
The TEDx program will be broadcast this Tuesday. General information is found here, and go here for the live feed. Ken is talking in the middle of the 10:40am–noon session, perhaps shortly after 11:10am. As usual, if you miss the live broadcast the talks will still be available online.
If you live nearby you may still be able to get there in person. For the rest of us, I look forward to seeing Ken and the others in motion.
Update 10/16/14: Upon being reminded by John Sidles’ comment that YouTube and other video URLs have fields for jumping to a given time or frame, I (Ken) indexed all the talks and other segments of the day, and the direct links are now posted on the livestream page.
Alan Baker won the Fields Medal in 1970 for his work on algebraic and transcendental numbers. He greatly extended a theorem of Aleksandr Gelfond and Theodor Schneider which had solved the seventh problem on David Hilbert’s famous list. Baker’s extension says that if a_1, …, a_n are algebraic numbers other than 0 or 1, and if b_1, …, b_n are irrational algebraic numbers that together with 1 are linearly independent over the rationals, then the product of a_i^{b_i} for i = 1 to n is transcendental. Hilbert had stated the n = 1 case, which Gelfond and Schneider solved, and believed it would be harder than the Riemann Hypothesis.
Today Ken and I want to talk about computing numbers to high precision and their relationship to our recent discussion of Freeman Dyson’s challenge.
Recall that Dyson argued that the following is true, but unlikely to ever be proved:
For any n > 0, let x = 2^n. Reverse the digits of x in decimal. The result is never a power of 5.
For example, 2^19 = 524288, and its reversal is 882425, which at least ends in five. But dividing it by 25 yields 35297, so it is not a power of 5.
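For small n the claim is easy to test directly; a brute-force sketch (the function names are mine):

```python
def is_power_of_5(m):
    """True iff m equals 5^k for some k >= 0."""
    while m % 5 == 0:
        m //= 5
    return m == 1

def first_dyson_counterexample(max_n):
    """Search for n <= max_n where the decimal reversal of 2^n is a power of 5."""
    for n in range(1, max_n + 1):
        if is_power_of_5(int(str(2 ** n)[::-1])):
            return n
    return None
```

The search finds nothing for hundreds of values of n, in line with Dyson's belief, but of course no finite search is a proof.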
To tell that the answer is no may only need computing a handful of the leading digits of 2^n. This is because when we reverse 2^n, its leading digits become trailing digits, which we compare with the low-order digits of a power of 5. It is easy to compute recursions for the low-order digits, in any base. Getting high-order digits, however, is a problem of approximation. Baker’s theorem, in the form that drove his transcendence result, gives a lower bound on how close a “nice” quantity can be to “nasty” values, which in turn provides an upper bound on the error of approximations we need.
At the end of our discussion I pointed out that it seemed likely that we could devise an algorithm to test Dyson’s claim for a given n in time polynomial in the number of bits of n. Thus we should be able to check whether some enormous power of 2, reversed in decimal, is not a power of five. The point is that such an algorithm would be interesting, since it would allow the checking of the claim for huge numbers, numbers that we cannot even write down in decimal notation. The central issue in this was the ability to compute the leading few decimal digits of such large numbers. A theorem by Richard Brent allows the computation in polynomial time of the logarithm of such numbers to exponential precision. But it does not obviously allow the computation of the top digits. See our discussion for why the top few digits may be sufficient.
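Low-order digits are indeed easy via modular exponentiation; for the leading digits, a naive high-precision-logarithm sketch works for moderate n (this is only an illustration, not the Brent-style method, and truncation near a digit boundary would need more care):

```python
from decimal import Decimal, getcontext

def trailing_digits(n, k):
    """Last k decimal digits of 2^n, by an easy modular recursion."""
    return pow(2, n, 10 ** k)

def leading_digits(n, k, prec=60):
    """Top k decimal digits of 2^n via the fractional part of n*log10(2)."""
    getcontext().prec = prec
    log10_2 = Decimal(2).ln() / Decimal(10).ln()
    frac = (Decimal(n) * log10_2) % 1
    return int(Decimal(10) ** frac * 10 ** (k - 1))
```

The catch is precision: for n with many bits one needs the fractional part of n·log10(2) to enough places, which is exactly where the approximation bounds below come in.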
I ended the previous discussion with an implicit question:
Whether this can be made to work we will leave for the future.
Thanks to Eric Allender, I believe that we can almost close the loop and prove that there is an algorithm for checking Dyson’s claim. Almost.
I asked Eric how hard it is to compute the leading decimal digits of 2^n for large n, and he immediately said that it was known. He pointed out that in 2010—so relatively recently—Mika Hirvensalo, Juhani Karhumäki, and Alexander Rabinovich proved that the top few digits of such numbers can indeed be computed in time polynomial in the size of n. Their paper (older TR) states this for one base, but their results appear to work for any base, even even bases. The paper title states the point nicely: “Computing partial information out of Intractable: Powers of algebraic numbers as an example.”
What strikes us as their key insight is to connect the computation with Baker’s theorem. Algebraic numbers have two common measures of complexity: the degree of the minimum polynomial equation they solve, and their height, which depends on the size (or rather the lengths) of the coefficients of this equation. The theorem says that provided the algebraic numbers b_1, …, b_n are not all zero, and no a_i is 0 or 1, then

|b_1 log a_1 + … + b_n log a_n| > H^{-C},

where H is the maximum of the heights (and an absolute constant), and C is computable in terms of n, the degrees, and the choices of complex logarithm branches. When the b_i are rational, as the paper intends, this also needs the logarithms to be linearly independent over the rationals. (Independence for the b_i appears needed only for the transcendence deduction; meanwhile, the fact of the sum being nonzero makes the logarithms collectively linearly independent over the algebraic numbers.)
The paper cites later bounds given by Baker for computing in the case . This is used to control the amount of precision needed to determine the leading digits of the power of . This is all very clever, and another example of how results in numerical approximation can be the fulcrum of algorithms.
The reason the algorithm is not completely worked out is that the authors prove only that one can get the top two digits, and they restrict their results to odd bases. We believe, but have not yet verified, that they should be able to handle a few more digits. To check Dyson’s conjecture efficiently for a given n, we believe it should be sufficient to check a few leading digits of 2^n. It is also unclear how much having a base that is not relatively prime to the modulus will interfere. We will leave this for the future—again.
It’s harder than you think
William Kahan is a numerical analyst and an expert on all things about floating point numbers. He won the 1989 Turing Award for his pioneering work on how to implement floating point computations correctly. He has also been a tireless advocate for the care needed to avoid insidious numerical bugs, and his strong feelings come out in a 1997 interview for Dr. Dobb’s Journal.
Today I want to talk about one of his results on how to sum a series of numbers.
It sounds trivial, but it is really a difficult problem when inexact arithmetic is used. Since computers almost always use a fixed finite precision, the problem arises all the time, every day. Another view of adding up numbers is studied by Alexei Borodin, Persi Diaconis, and Jason Fulman in their paper On adding a list of numbers; it is in the Bulletin of the AMS. They study the carries required—perhaps we will discuss this another time.
I must say that I have the pleasure of knowing Kahan, and have discussed him before here. He is the most articulate person I have ever met, the most knowledgeable person about numerical computation, and an extremely interesting person.
Of course computers can only store finite representations of numbers. This does not mean that computers cannot manipulate numbers like π, e, and √2: these are routinely used correctly by algebraic packages. It does mean we prefer computers in most computations to use finite representations. The simplest is to use fixed point arithmetic: a number is restricted to be an integer multiple of a fixed unit, lying in a range [-M, M] for some large M. The advantage of this method is that arithmetic is almost perfect in that all the usual rules of arithmetic are preserved: addition is, for example, commutative and associative. Well, almost. The only problem is if an operation yields a number too big or too small. This overflow and underflow are the main issue. Provided one avoids this, the representation is quite simple and easy to use. The famous John von Neumann recommended this method for the 1951 IAS machine.
The difficulty with fixed point representation is that division is a major problem. What is 1/2? In fixed point integer arithmetic it appears that this must be rounded off. But that causes a huge problem, since it loses a great deal of information. Enter floating point. This represents numbers as m × 2^e, where m is a fixed point number and so is the exponent e. Now 1/2 is equal to 1 × 2^{-1}. Great—no loss in accuracy. But what is 1/3? Now we can represent this approximately, but not exactly. This is both the strength and the difficulty of floating point numbers. The fact that numbers are only approximations yields many interesting questions about how to manage computations to minimize the loss of accuracy.
It is interesting, to me, that very early computers used hardware to implement floating-point-type systems. The Z3 of Konrad Zuse used them, and also implemented the square root operation in hardware. This machine was electromechanical.
A fundamental problem is that there can be rounding—see this item also from Dr. Dobb’s for an example of this familiar kind:

(1.0 × 10^20 + (−1.0 × 10^20)) + 1.0 = 1.0

but

1.0 × 10^20 + ((−1.0 × 10^20) + 1.0) = 0.0.

Of course this shows that addition with rounding is not associative. It also shows that while the two answers are different, they are close in a relative sense. One might think that this is okay, and it can be in many situations. If the computation was computing the difference between two numbers known to be huge, then the fact that the answer was zero or almost zero might be insignificant. But there are other examples where this is just plain wrong and could cause a program to fail. In any event, the failure of the associative law may at least make debugging and understanding a program much more difficult. This is probably why von Neumann was against floating point: it can be hard to debug and test.
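One can reproduce a failure of associativity directly in IEEE double precision; the grouping determines which low-order bits are discarded:

```python
# Grouping changes the rounding: the two results differ in the last place.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
# left is 0.6000000000000001 while right is 0.6 in IEEE double precision.
```

Here 0.2 + 0.3 happens to round to exactly 0.5, while 0.1 + 0.2 picks up a one-ulp error that then survives into the final sum.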
Thus, the main question is can we do better? The answer is yes.
Let’s make the problem into a clean formal problem: Given a list x_1, …, x_n of floating point numbers, what is the best way to compute a value s so that

|s − (x_1 + x_2 + ⋯ + x_n)|

is as small as possible? The above example shows that the trivial algorithm of adding up the numbers in order,

(⋯((x_1 + x_2) + x_3) + ⋯) + x_n,

is definitely not the optimal way.
This suggests that we use some kind of cleverness in either reordering the numbers or in keeping track of additional information. Of course, since this is such a basic problem, we want our algorithm to operate in time close to linear time—we want the number of basic operations to stay close to n.
The state of the art is that there are algorithms that compute good answers to the sum problem. Some trade running time for accuracy. One of the simplest uses sorting to rearrange the numbers. Kahan himself developed an algorithm that achieves error growth that is bounded independent of n: it does depend on a kind of condition number that describes how well behaved the sum is. Another nice algorithm uses divide and conquer:
One recursively divides the set of numbers into two halves, sums each half, and then adds the two sums. This has the advantage of requiring the same number of arithmetic operations as the naive summation (unlike Kahan’s algorithm, which requires four times the arithmetic and has a latency of four times a simple summation) and can be calculated in parallel. The base case of the recursion could in principle be the sum of only one (or zero) numbers, but to amortize the overhead of recursion one would normally use a larger base case. The equivalent of pairwise summation is used in many fast Fourier transform (FFT) algorithms, and is responsible for the logarithmic growth of roundoff errors in those FFTs. In practice, with roundoff errors of random signs, the root mean square errors of pairwise summation actually grow as O(√(log n)).
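Here are sketches of both remedies: Kahan's compensated summation, and the pairwise divide-and-conquer scheme from the quoted passage.

```python
def kahan_sum(xs):
    """Kahan's compensated summation: carry the lost low-order bits forward."""
    total = 0.0
    c = 0.0  # running compensation for rounding error
    for x in xs:
        y = x - c
        t = total + y
        c = (t - total) - y  # algebraically zero; numerically, what was lost
        total = t
    return total

def pairwise_sum(xs):
    """Divide-and-conquer summation; roundoff grows only logarithmically."""
    if len(xs) <= 2:
        return sum(xs)
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])
```

For instance, summing ten copies of 0.1 naively gives 0.9999999999999999, while both routines return exactly 1.0 on that input.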
James Demmel and Yozo Hida proved several interesting further results in a cool paper in 2002. One is that under reasonable assumptions they get the summation exactly when the answer is zero. This is quite an interesting property for those of us in theory.
It suggests a definition: Let F be the set of finite precision numbers allowed. Let f map R^n to R—note f is a function defined over the reals. Let A be an algorithm that implements f in finite precision. Say that this algorithm is perfect provided for any x_1, …, x_n in F, if f(x_1, …, x_n) is in F, then A(x_1, …, x_n) = f(x_1, …, x_n). Thus, if the answer can be represented in F, the algorithm always gives exactly that.
This is a strong property of an algorithm: if the answer is representable, then the algorithm does indeed get it. The algorithm by Demmel and Hida achieves this for the special case of summation with answer 0. For other final values, however, it can be off by one-plus in the last designated significant unit. Their algorithm also requires that the number n of values being summed is bounded in terms of the available precision and the significance requirement; if n exceeds the bound then the algorithm can be thrown off wildly. This indicates the surprising delicacy of the kind of simple matter that programmers might take for granted.
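Python's standard library happens to ship a summation routine with this flavor: math.fsum tracks exact partial sums (Shewchuk's algorithm) and returns the correctly rounded result, so a representable answer comes out exactly.

```python
import math

data = [1e16, 1.0, -1e16]   # the true sum is exactly 1.0
naive = sum(data)           # the 1.0 is absorbed and lost: gives 0.0
exact = math.fsum(data)     # correctly rounded: gives 1.0
```

This trades speed for exactness, which is the kind of cost question raised below.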
One idea I had is to run the fixed-point parts of the calculations modulo p for several large primes p. Errors modulo different values of p might be combined to reduce the overall error, or at least provide a guarantee.
What is the cost of making an algorithm perfect? What happens for more general type arithmetic computations? Are there similar methods that reduce the error, yet keep the computation cost about the same?
[fixed exponents in equation]
Tibor Radó was a mathematician who was the PhD advisor to a number of computer scientists. They include Shen Lin, Paul Young, and several others. All were his students while he was at Ohio State University. He is not the Rado of the Erdős-Ko-Rado theorem—that is Richard Rado, without an accent. Tibor Radó had an Erdős Number of 3, through Lin and then Ron Graham.
Today we want to continue to talk about the famous “Busy Beaver” problem which we mentioned in our discussion on the Microsoft SVC lab closing.
The problem is actually quite neat, is still studied more than fifty years after Radó created it, and has interesting links to industrial labs. It, the busy beaver problem, is actually an important and cool problem. For starters we note that Radó’s paper was published in the Bell System Technical Journal in 1962: On non-computable functions.
The problem, which Radó called a game, is quite simple: Consider a Turing Machine with one two-way infinite tape, a finite deterministic state control of n states, and alphabet {0, 1}. There is no separate “blank” character—the tape initially holds all 0s (or 0 is regarded as the blank). How long can the machine run? We do not allow machines to run forever; we insist that our machines halt. The question is: can such a simple device run for a long time and still stop?
Clearly with enough states the machine could compute f(m) for some very fast growing function f and a modest argument m and then halt. This could easily take a huge amount of time. The reason the problem is interesting is that the number of states is quite small. So the “game” is to try and use just a few states to encode some long-running computation. This is not easy to do even for tiny n, which is why the Busy Beaver problem is still interesting even today.
The function S(n) is the maximum number of steps that such a machine can make with n finite control states and still halt. This is still known exactly only for n up to 4. The current 5-state lower bound is

S(5) ≥ 47,176,870.

This was discovered by Heiner Marxen and Jürgen Buntrock in 1989. The busy-beaver function Σ(n) itself is reckoned as the number of 1s on the tape at the time of halting; the same machine gives Σ(5) ≥ 4098. There are still about 40 five-state programs whose status is unresolved. Marxen’s BB site has this and much other info. The number of different programs to consider grows hugely with n, but still we find it amazing that small programs can run for so long.
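Busy-beaver hunting is easy to experiment with. Here is a minimal simulator (a sketch; the encoding conventions are mine), tried on a standard 3-state champion that attains Σ(3) = 6, halting after 14 steps.

```python
def run_tm(program, max_steps=10**7):
    """Run a 2-symbol Turing machine on an all-0, two-way infinite tape.
    program maps (state, symbol) to (write, move, next_state); move is +1 or -1.
    Returns (steps, ones_on_tape) if it halts, else None."""
    tape, pos, state, steps = {}, 0, 'A', 0
    while state != 'H' and steps < max_steps:
        write, move, state = program[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
    return (steps, sum(tape.values())) if state == 'H' else None

# A 3-state champion for Rado's game ('H' is the halt state).
bb3 = {
    ('A', 0): (1, +1, 'B'), ('A', 1): (1, +1, 'H'),
    ('B', 0): (0, +1, 'C'), ('B', 1): (1, +1, 'B'),
    ('C', 0): (1, -1, 'C'), ('C', 1): (1, -1, 'A'),
}
```

Enumerating all n-state tables through such a simulator is exactly how the small cases were settled, with the unresolved machines being the ones that neither halt quickly nor provably run forever.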
What we also find amazing is that for some of the busiest machines, their few states can be made to seem even fewer with a simple prose description. Let’s describe one using an eco-friendly beaver who plants trees as well as chops them down. We figure the trees grow quickly in beaver-time to give shade, and we give our beaver some traits of a groundhog. He starts on a ridge that has been completely deforested, bare to his right and left as far as an eagle can see.
At any time, including the start, our beaver can decide to Plant Two Trees. This means finding the first bare spot at or rightward of his position and planting a tree there. If he can then plant in the spot immediately to its right, then he does so but stays under the first tree he planted, else he finds the first bare spot to his left, plants there, and steps left. Either way, he then pauses to Take Stock. Thus our beaver’s first action is to plant two adjacent trees and Take Stock under the first, with the second to his right.
source |
In the Take Stock state, our beaver looks at the spots left and right, and acts according to these four rules:
When Looking For His Shadow, if he sees it, he gets frightened and halts by planting a tree and burrowing under it. Else, he steps left again. Then if in sun he steps left to Plant Two Trees, while if under a tree he chops it down, moves left yet again, and Takes Stock.
That’s it. The first two Take Stock rules can even be combined to say that if he’s in sun, he plants there, ensures the place to his right is planted, and then Takes Stock again where he was. In the third rule, the “goes back” clause is unnecessary to say.
Returning to our beaver taking stock for the first time, he is in shade with no tree on his left, and so he immediately plants one tree there and two more on the far right, making trees in all. He is under the tree, chops down the (middle) tree, and next looks for his shadow when under the tree.
Of course he doesn’t see it, so he goes on beavering—until he has 95,524,079 trees in all, after over 8.69 trillion steps.
These rules can be coded by six Turing machine states, using two to plant trees, three to take stock, and one to check the shadow plus re-using one of the take-stock states. The take-stock state is reminiscent of a simple one-dimensional cellular automaton that depends only on its one-cell environment, but the planting step is decidedly non-local. This machine was discovered by Marxen and can be viewed in notes by Jeffrey Shallit. It is far from the record, however. The current busiest 6-state beaver, found by Pavel Kropitz and verified, is known to leave somewhat more than \(3.5 \times 10^{18267}\) trees in over \(7.4 \times 10^{36534}\) steps total.
How do people prove such gargantuan bounds on \(S(n)\)? For starters one can try to create clever programs that take a long time. This will of course yield only lower bounds, but these are still quite interesting. For higher \(n\), some machines have been constructed for which \(S(n)\) cannot conveniently be expressed in conventional exponential notation, but requires something like Donald Knuth’s arrow notation to estimate.
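For a sense of Knuth’s arrow notation, here is a short recursive implementation; the values explode so fast that only tiny arguments are feasible:

```python
def arrow(a, n, b):
    """Knuth's up-arrow: a (up^n) b.  n = 1 is ordinary exponentiation,
    n = 2 is tetration (a power tower of b copies of a), and so on."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1              # a (up^n) 0 = 1 by convention
    return arrow(a, n - 1, arrow(a, n, b - 1))

print(arrow(2, 1, 10))  # 1024
print(arrow(2, 2, 4))   # 2^(2^(2^2)) = 65536
print(arrow(3, 2, 2))   # 3^3 = 27
```

Already `arrow(3, 2, 4)` is a number with over three trillion digits, which hints at why such notation is needed to even state the 6-state bounds.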
For upper bounds, this strategy must incorporate “determining” that certain machines will never halt, as well as those that halt. To quote from Marxen and Buntrock’s 1990 online paper “Attacking the Busy Beaver 5,” here is how they proceed: Enumerate all -state Turing machines , and for each of them do the following.
The key to making this huge search possible, even for small \(n\), relies on the ability to find and analyze “macro phases” of the computations. In the above example we were able to identify macro phases with subsets of the machine itself, but generally this requires hierarchically analyzing the trace of the computation. The art of this work is finding clever patterns that correctly predict non-halting yet are easy to apply. Ground-level patterns of the kind pictured on the left in the picture below can be abstracted and then organized in a hierarchical nested manner. Sometimes this is used to prove explosive growth before termination, and sometimes to prove non-termination.
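The crudest such pattern is an exactly repeated configuration: if the machine’s state and the tape contents, viewed relative to the head, ever recur, the computation must cycle forever. Here is a minimal sketch of that test (the `DRIFTER` machine is a made-up example; real provers such as Marxen and Buntrock’s use far subtler macro-level patterns):

```python
from collections import defaultdict

def nonhalting_by_repetition(machine, start='A', max_steps=1000):
    """Return True if an exact configuration (state plus tape relative to the
    head) repeats -- a sound, if weak, certificate of non-termination."""
    tape, pos, state = defaultdict(int), 0, start
    seen = set()
    for _ in range(max_steps):
        if state == 'H':
            return False          # it halted
        config = (state, tuple(sorted((p - pos, v) for p, v in tape.items() if v)))
        if config in seen:
            return True           # identical configuration recurs => runs forever
        seen.add(config)
        write, move, state = machine[(state, tape[pos])]
        tape[pos] = write
        pos += move
    return False                  # inconclusive within the step budget

# A hypothetical machine that shuttles right forever on a blank tape:
DRIFTER = {('A', 0): (0, +1, 'A'), ('A', 1): (0, +1, 'A')}
print(nonhalting_by_repetition(DRIFTER))  # True
```

The soundness argument is one line: the future of a Turing machine depends only on its state and the tape as seen from the head, so an exact repeat implies an infinite loop.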
Composite of src1, src2 (p78) |
Indeed, the drawing on the right comes from the 2005 MS thesis of Owen Kellett, which was part of a large project at RPI, and which includes and extends the specification by Rona Machlin and Quentin Stout of a “Christmas Tree” pattern that implies non-termination.
Thus the Busy Beaver problem becomes less about Turing machines and more about finding convergent and provably recurrent or divergent patterns in the kinds of bit-fields and word-fields in which machine-language programs operate.
Busy Beavers in any form represent the most extreme test cases of the Halting Problem in the sense of their raw step counts. They look like they will never halt, but do. Unless there is some way a machine can look like it is going to halt but run forever, they are the only extreme in that sense. Of course, the functions \(S(n)\) and \(\Sigma(n)\) are uncomputable—for much the same reason that the Halting Problem is undecidable. They outgrow every computable function, and for any effective formal system of logic, there are only finitely many values \(S(n)\) or \(\Sigma(n)\) that it can verify.
So why should we try to compute them? The wider but commensurate question is, why should we try to solve the Halting Problem? After all, the Halting Problem is ‘impossible’ to solve—everyone proves this in a typical theory course. With \(\mathsf{NP}\)-hard problems too, the popular subtext is not to try to solve them. So why try?
We could quote a well-known exchange between Alice and the Red Queen in Lewis Carroll’s Through the Looking-Glass, or quote “To Dream the Impossible Dream” from the musical Man of La Mancha, or otherwise appeal to indomitable human spirit. But we can finally talk about another answer to: why try to solve impossible problems?
Because—it’s good for business.
Microsoft undertook to do exactly that. “Terminator” was a doubly-thematic name for their project, which is now called T2. Not only is it about program termination, but its purpose was to do the Arnold Schwarzenegger “Terminator” business on unruly device drivers. To quote the most recent press article linked from that page,
To reduce the number of buggy device drivers, Microsoft embarked on what it called “data-driven program verification.” This is a process whereby “you model a computer program as a mathematical system and the goal is to build tools that find proofs of correctness using mathematics and logic,” said [project manager Byron] Cook. …
Working out whether a device driver would get stuck in an infinite loop was a bit more tricky, as Microsoft was faced with the difficulty of addressing the halting problem. …[which Alan Turing showed undecidable]… But Cook said the nature of device drivers meant there were ways to analyze drivers to see if they would terminate.
A May 2011 ACM Communications article by Cook with Andreas Podelski and Andrey Rybalchenko describes some of the formal logic ingredients. Their examples are mostly numerical, and we wonder how well the formal methods can be applied to analyze the patterns thrown up like sawdust by the beavers. They conclude:
With fresh advances in methods for proving the termination of sequential programs that operate over mathematical numbers, we are now in the position to begin proving termination of more complex programs.
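A common certificate behind such termination proofs is a ranking function: a nonnegative measure that strictly decreases on every loop iteration. The sketch below is hypothetical and not from the T2 project; it checks such a certificate empirically for a simple two-variable loop:

```python
import random

def check_ranking_function(step, rank, samples=1000, seed=0):
    """Empirically check that `rank` is a nonnegative measure that strictly
    decreases across `step` -- the classic certificate of loop termination."""
    rng = random.Random(seed)
    for _ in range(samples):
        x, y = rng.randrange(0, 100), rng.randrange(0, 100)
        if not (x > 0 and y > 0):       # the loop guard: skip states outside it
            continue
        before = rank(x, y)
        x2, y2 = step(x, y)
        assert before >= 0 and rank(x2, y2) < before
    return True

# Loop body for "while x > 0 and y > 0": decrement the larger coordinate.
def step(x, y):
    return (x - 1, y) if x >= y else (x, y - 1)

print(check_ranking_function(step, lambda x, y: x + y))  # True
```

A real prover synthesizes the ranking function and proves the decrease symbolically rather than by sampling, but the shape of the argument is the same.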
We have tried to re-position the Busy Beaver game in terms of brief prose descriptions of algorithmic procedures rather than Turing machines. Can this shift be cross-cut with practical examples? The succinct routines need not be maximal like the “busy beaver” for any size \(n\). This may help.
Which other types of “busy” machines can be described nicely in prose? We have not found anything as neat for Kropitz’s 6-state machine.
Microsoft Research source |
Omer Reingold is a brilliant researcher, and now all can see that he is also a very classy and thoughtful person. As you all know by now, he and many others have suddenly lost their jobs with the closing of Microsoft’s Silicon Valley Campus (SVC) Research Lab. The lab closed this week: it has been removed from Microsoft’s “Research labs worldwide” page. The closing affects about 50 jobs directly, and is part of an overall reduction of about 18,000 staff worldwide.
Today, Ken and I wish to express some of our feelings about this. At first we thought we could not add to what others have already said about the closing, but we realized we could still express support and add to the wider conversation.
Omer is among fourteen listed writers of the “Windows on Theory” blog, all affiliated with Microsoft Research. He wrote a short and classy post on the blog. My former student Parikshit Gopalan, who was at the lab and is now affiliated with a group in Redmond, added a similar comment, among many in the post. Omer and Parikshit wrote a nice paper in FOCS 2012 with Raghu Meka, Luca Trevisan, and Salil Vadhan, on new ways to construct pseudorandom generators. We covered it briefly two years ago.
Parikshit also writes for the blog—here is a nice post on crisply-stated open problems involving alphabet size in coding theory. In his comment last weekend he reflected:
History tells us that research labs are mortal. Like mortals, they are finally judged by their accomplishments rather than their longevity.
Judging the value of accomplishments, however, is often a longer-term process than “longevity,” and that plays into some of our thoughts.
We are not shocked. Luca Trevisan wrote a very thoughtful piece on his own blog, “in theory.” His piece started with:
I am still in shock at the news that Microsoft decided to shut down MSR SVC and fire, literally from one day to the next, almost all the scientific staff.
I, Dick, am not so shocked; I will explain in detail why in a moment.
We are very impressed by comments from some of the researchers. Not just Omer but many others affected and not. I think this is something that we can find heartening.
We are happy to hear about the community response. Luca wrote:
Here at Berkeley and Stanford we will do our best to help, and we will make sure that everybody has some space to work. There will also be some kind of community-wide response, but it will take some time to figure out what we can do. Meanwhile, I urge everybody to reach out to their friends formerly at MSR SVC, make them feel the love of their community, and connect them to opportunities for both short term and long term positions, as they weigh their options.
We are very happy to hear this, and we add that other institutions will also try and help any that need it. Of course being far away makes it less likely that some of us can help immediately. But I personally will try to get help from Tech for any that need it.
Probably no personal judgment. In academia with tenure cases etc., all judgment is personal—when cuts are on a larger scale, there is invariably protest which is often effective, as some commenters have noted. In a post three years ago on how research is like the stock market and “failure must be an option,” Ken inserted a note on his father’s recollection of the “Terrible Twenty.” This name for recipients of annual AT&T and/or IBM lab fellowships spoke the ethic that the company needed only a few to succeed to be golden. SVC was far from a case of going 0-for-20; highly profitable work has been documented coming from there. So it was probably a larger personnel matter, not a personal one, and this is far from unusual in big business.
“Windows on Theory” team source |
I have been around a while and have seen labs come and go over the years. I can still remember when AT&T Bell Labs was the greatest lab in the world; it still is strong, but once was the lab. One of the reasons for this was simple: its parent company was a monopoly. It had an immense amount of cash and could afford to have researchers that did whatever they wanted. For example, for years Shin Lin, a Bell Labs researcher, worked on the Busy Beaver Problem.
A busy beaver function quantifies these upper limits on a given measure, and is a noncomputable function. In fact, a busy beaver function can be shown to grow faster asymptotically than does any computable function. The concept was first introduced by Tibor Radó as the “busy beaver game” in his 1962 paper, “On Non-Computable Functions.”
Ron Graham was the director of the mathematics division which hired Lin, who was Radó’s student. Ron told me that each year he had to argue with upper management to keep Lin. Finally in 1970, Lin with Brian Kernighan found a brilliant algorithm for cutting graphs. This algorithm and its descendants have been used for decades now to approximate everything from VLSI layouts to the Traveling Salesman Problem. If Bell Labs had not kept Lin around, perhaps this algorithm would still exist, but perhaps not. Or maybe its discovery would have been delayed for decades. Who knows.
So, finally, here are two reasons I am not shocked, beyond factoring in that this is just our field’s tip of a global iceberg of layoffs—some 14 percent of Microsoft’s workforce. First, in my opinion, the ability of companies to support very far-out research is sensitively linked to the exigencies of their cash flow. Microsoft is being pressed by the shift from PCs as the main platform—where they had a near-monopoly on the OS—to a world with many players in mobile. This means that they are less able to support research of an open kind. Second, even in times of good cash flow, whether from monopoly or booms, people had to fight to keep good people. One of my friends opined that in the levels below recent personnel changes at the top, no senior swingman at Microsoft was able “to set priorities for investments that will not mature in ten years.”
This does not let Microsoft off the hook. They could have handled the closure better. Well, perhaps better.
Let’s not let Ken off the hook either—he writes the rest of this post, besides his having written some of what’s above.
Some commenters have alluded to Microsoft’s recently spending 2.5 billion dollars in cash to acquire the Minecraft computer game. My kids—this is Ken—and their cousins have all been avid players. Unlike most video games—most games, period—its prime value is open-ended creativity. It can be played as a survival game, and servers can be set up to provide battle competitions, but this is secondary. The ethic is creative enjoyment and sharing. A question that was percolating even before the sale is how far players can monetize servers and other content they create.
In essence all of our community, in academia and industry both, have been engaged in a game of “Mindcraft” in which monetization and even practicality are not the prime movers. Perhaps the best-known testament upholding this is Godfrey Hardy’s 1940 essay, “A Mathematician’s Apology.” In this he famously stated that whole branches of mathematics including number theory were “useless,” especially for any “warlike purpose”—and he was instantly proved wrong by the use of number theory to break cryptosystems, to say nothing of creating them. The point we offer, for conversations such as those in Scott Aaronson’s item and comments, is not to say what is or proves to be valuable, but rather the sheer problem of judging—of projecting an appraisal:
What distinguishes “academic” research—in industry also—is undertaking creation that is not predicated on knowing, or even the possibility of judging, its extrinsic value.
Of course academics has institutions for judging intrinsic value by standards of research. By those there is no doubt about the salience of the “Busy Beaver” function \(\Sigma(n)\). Knowing a bound on \(S(n)\) yields a decision procedure for any mathematical conjecture that can be coded via an \(n\)-state machine checking iteratively for a counterexample; the Collatz problem gives an especially small \(n\). As the bibliography of Wikipedia’s BB page shows, academic interest has remained high. But let us notch up the moral of Dick’s story above by asking, what about its possible extrinsic value? BB programs are extreme cases for the generally important task of predicting and verifying program behavior. A recent essay by Liesbeth de Mol makes a connection to computer-assisted proving.
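For instance, a counterexample search for the Collatz (3n+1) conjecture fits this mold: a machine that loops over candidates forever, halting only upon finding one that never reaches 1. The sketch below, of course, can only test a finite prefix:

```python
def collatz_steps(m, limit=10**6):
    """Number of 3n+1 steps for m to reach 1, or None if `limit` is exceeded."""
    steps = 0
    while m != 1:
        m = m // 2 if m % 2 == 0 else 3 * m + 1
        steps += 1
        if steps > limit:
            return None
    return steps

# A conjecture-checker in the Busy Beaver mold would loop over m forever,
# halting only if some m fails to reach 1. Here we just verify a prefix:
assert all(collatz_steps(m) is not None for m in range(1, 10_000))
print(collatz_steps(27))  # 111
```

If one knew \(S(n)\) for the state count of such a checker, running it that many steps without halting would prove the conjecture—which is exactly why the values are unknowable in general.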
Even with something as globally profitable and understood as the open-source ethic, projecting value is hard—including its strategic competitive value to large sponsors. This partly owes to the difficulty of proving a negative, such as the cost of avoided bugs.
Here’s a silly but simpler example from my chess research. Despite near-universal belief in the chess community that the ratings of top players today are “inflated” by over 100 points compared to the 1970s and 1980s, my work shows that the implementation of the Elo rating system by the World Chess Federation (FIDE) has been remarkably stable from the beginning. A 30-point effect that can be more readily ascribed to progressively faster time controls since the early 1990s seems recently to have reversed. FIDE lists over 400,000 rated players, who pay fees of some tens of dollars per year to national federations under FIDE’s auspices for this and other services. Is knowing this integrity of ratings worth 50 cents per player? On the flip side of “judging a negative,” how much is saved by providing resistance to any impulse to fix something that isn’t broken? I may earn compensation for other aspects of my work, but how could one judge to monetize this aspect?
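For reference, the Elo model underlying those ratings is simple to state in code; the K-factor below is illustrative rather than FIDE’s exact schedule:

```python
def elo_expected(r_a, r_b):
    """Expected score of player A against player B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=20):
    """New rating for A after scoring `score_a` (1 = win, 0.5 = draw, 0 = loss)."""
    return r_a + k * (score_a - elo_expected(r_a, r_b))

print(round(elo_expected(2800, 2700), 3))  # 0.64
print(elo_update(2800, 2700, 0.5))         # a draw costs the favorite some points
```

The stability question in the text is about whether the mapping from true skill to these numbers has drifted over decades, not about the update formula itself.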
In summary, our thoughts go to all those affected by the closing and we wish all the best in the near and far future.
Cropped from source |
Christos Papadimitriou has a joint paper with Adi Livnat, Aviad Rubinstein, Gregory Valiant, and Andrew Wan that will appear soon at FOCS 2014. The conference is October 19–21 in Philadelphia, with workshops and tutorials on the 18th. Here are the accepted papers, several others of which interest me a lot. The last parallel session on Monday afternoon before my own award lecture has three of them in one room, including a paper co-authored by my recent student Danupon Nanongkai, and three on quantum—it would be nice to be in a quantum superposition and attend both sessions.
Today Ken and I want to discuss their paper, which is on complexity-theoretic aspects of evolution.
Yes, evolution. Theory’s long reach has in the past touched diverse fields of science, engineering, and many other areas of human endeavor. But evolution is now getting more and more interest from theorists. Wikipedia begins its description with the sentence:
Evolution is the change in the inherited characteristics of biological populations over successive generations.
This doesn’t sound too far different from the idea of a computation “evolving” over successive steps. What distinguishes the biological kind is Charles Darwin’s identification of natural selection as the main mechanism, specifically for adaptation after changes in the environment. The discovery that genes in DNA are describable by strings, likewise their mutations, powerfully suggests a mating of biology and the “it from bit” view from theory. Can we use theory to turn Darwin inward and explain how complexity and diversity can arise from a simple Boolean model?
For many reasons, which we will avoid discussing, evolution gets the most push-back of all theories in science. There are not many popular attacks on the inverse-square law of gravity, even though string theory provides contexts in which it is false at short ranges. That matter is made of atoms used to be contentious, but no more, and there is no argument over how chemical bonds work. Evolution seems to get more than its share of attacks. Way more.
There is even a Wikipedia article on why this is so:
Many scientists and philosophers of science have described evolution as fact and theory, a phrase which was used as the title of an article by Stephen Jay Gould in 1981. He describes fact in science as meaning data, not absolute certainty but “confirmed to such a degree that it would be perverse to withhold provisional assent.” A scientific theory is a well-substantiated explanation of such facts. The facts of evolution come from observational evidence of current processes, from imperfections in organisms recording historical common descent, and from transitions in the fossil record. Theories of evolution provide a provisional explanation for these facts.
Ken thinks Gould’s point is better stated as evolution being not just a theory but also a paradigm. Within it, theories of particular mechanisms are evaluated; outside, the paradigm is applied via “evolutionary approaches to…” just about everything. The former give the impression that there is disagreement about the fact of evolution itself, while the latter include enough overreaches and oversimplifications to muddy the whole enterprise. The fact of evolution is also confused with questions about its rate and hill-climbing ability when “left to itself,” though experience in hospitals and our current lack of knowledge about complexity should forestall negative pronouncements on the latter.
What compounds the matter is that evolution can be enlarged to the study of how complex organized information systems arise. Indeed, several books by Richard Dawkins emphasize information content in biology and the quest for a “crane-lifting” mechanism for information in physics analogous to Darwin’s in biology. There the paradigm intersects directly with computer science:
src |
The serious question arises of how fast evolutionary-information processes can work as measured by standard yardsticks of complexity, and how much large-scale change they can effect. This makes a context for the paper.
Their paper is titled, “Satisfiability and Evolution.” Its abstract is:
We show that, if truth assignments on n variables reproduce through recombination so that satisfaction of a particular Boolean function confers a small evolutionary advantage, then a polynomially large population over polynomially many generations (polynomial in n and the inverse of the initial satisfaction probability) will end up almost surely consisting exclusively of satisfying truth assignments. We argue that this theorem sheds light on the problem of the evolution of complex adaptations.
Ken and I really like this work. Really. One reason is that it is a positive result. Often—not always, but often—when we apply theory to areas from other parts of science, we run into negative results. Telling people in another area of research that they cannot do X is not usually helpful. In many cases they need to do the work anyway, and so telling them a negative is really less than useful.
Another reason is that their work uses the evolution setup to show that a certain reasonable type of behavior happens. This is a quite pretty result about Boolean functions, independent of any connection to evolution. Further, it suggests that there are likely to be more such results in the near future. I think this paper is potentially quite important.
I am sorry that I missed the evolution workshop last winter at the Simons Theory Center—I got a bad cold and could not travel. I imagine that some of the work that is in this paper came out of that event. However, Ken heard a related talk invited to the Algorithmic Aspects of Information Management (AAIM 2014) conference in Vancouver in July, where Ken also had a paper. So we can turn the rest of this post over to Ken.
I, Ken, felt like I was following Christos around in summer 2012. He gave one version of this talk at the Turing Centennial in Princeton in May, which we covered. Then I heard much the same talk invited to AAAI-2012 that July in Toronto, where I was battling a robot. Then I was honored to speak in the University of Bergen’s Turing Centennial series in September a couple weeks after he did. His talk there was related more to his graphic novel Logicomix, which later appeared under our Christmas tree.
So, this past July, I was curious to see how his talk (later video) had evolved. Well, that is a mis-use of “evolved” unless he and his co-authors introduced processes to mutate slides and select—over successive generations—the ones with the highest research fitness. Perhaps they did. I was glad that a genetic example from the 2012 talks survived, since I had advertised—in encouraging my wife to join for the 8:20am start—that the talk would include sex.
We reproduce his first main example here. Consider two genes, with 4 and 5 alleles (mutable variants) respectively occurring in a population. Suppose the benefits of each of the 20 possible combinations are given by a \(4 \times 5\) fitness matrix:
The argument is that global optimization techniques, such as simulated annealing, will tend toward the combinations with the highest fitness unto themselves. These are likened to asexual propagation. However, the row with the best fitness expectation over a partner drawn randomly from the population need not contain the top single entry, and likewise certain columns “mix best” when combined over the range of the other gene. The sexual recombination algorithm draws a random partner according to the population’s distribution, and each offspring gets each parent’s allele of each gene with 50-50 probability.
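Since the matrix from the talk is not reproduced here, the following made-up \(4 \times 5\) fitness matrix illustrates the phenomenon: the single best entry (a “peak”) sits in one row, while a different, flat row (a “plateau”) has the best average fitness over a uniformly random partner:

```python
# Hypothetical 4x5 fitness matrix (rows: alleles of one gene, cols: the other).
# Row 0 contains the lone peak 9.0; row 2 is a flat plateau.
W = [
    [1.0, 1.0, 1.0, 1.0, 9.0],
    [2.0, 2.0, 2.0, 2.0, 2.0],
    [4.0, 4.0, 4.0, 4.0, 4.0],
    [1.0, 3.0, 1.0, 3.0, 1.0],
]

peak = max(max(row) for row in W)                   # what annealing chases: 9.0
row_means = [sum(row) / len(row) for row in W]      # fitness vs. a random partner
best_mixer = row_means.index(max(row_means))        # the row that "mixes best"
print(peak, best_mixer)  # 9.0 2 -- the peak is in row 0, but row 2 mixes best
```

Recombination keeps tearing the peak’s allele pair apart, so its advantage is realized rarely; the plateau’s advantage survives any pairing.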
In a 2011 paper with Livnat and Marcus Feldman formalizing this process, he showed that isolated “peaks” in the fitness matrix tend to wear down in frequency, while “plateaus” such as uniformly good rows and columns survive. This showed why we need sex—or at least gave an algorithmic mechanism to answer the question:
Why is recombination (i.e., sexual reproduction) more successful than asexual reproduction?
Despite this answer, Christos admitted that “the role of sex is to this day not understood.” His talk went immediately on to a question that is more eye-opening than eyebrow-raising:
How can this crude mechanism and model account for the miracle [of complexity that] we see around us?
As stated in the new FOCS paper:
Given the reshuffling of genomes that occurs through recombination, how can complex traits that depend on many different genes arise and spread in a population?
The “complex traits” are approached and modeled by standard tools in complexity, most notably Boolean functions. Here is a mutated version of an example in the talk: Suppose we have monotone predicates \(g\) and \(h\) of genetic factors \(x = (x_1,\dots,x_n)\), where \(g(x)\) enables a trait \(T\) to be possible and \(h(x)\) guarantees it. Let \(w(x)\) be Hamming weight, and let \(\theta = \theta(E)\) be a numerical function of the environment \(E\). Suppose the trait follows a sigmoid probability function with high slope around the threshold \(w(x) = \theta\).
If the environment changes in a way that both increases \(\theta\) and makes \(T\) advantageous, then not only will the incidence of \(T\) go up, but the above recombination algorithm will favor organisms that possess all or almost all of the factors. Then even if \(\theta\) reverts to a lower level, the new population may have so many members with high \(w(x)\) that not only does \(g\) remain satisfied but also \(h\). Then the trait is locked in to much of the population. This yields an interpretation of a famous experiment by Conrad Waddington which is common to the talk and paper.
The FOCS paper uses none of \(g\), \(h\), \(w\), \(\theta\), \(E\), or \(T\)—these characters (some by me, some by them) die out or maybe stay latent. What it preserves is an abstraction of the counting advantage that having more factors gives in the above analysis. Let \(f\) be a general Boolean function whose satisfaction confers a small advantage in the following way: Define \(F(x) = 1 + \varepsilon f(x)\); that is, \(F\) is \(1+\varepsilon\) on satisfying assignments, \(1\) on unsatisfying ones. Let the probability distribution on \(x\) at time \(t\) be defined by independent probabilities \(p_i^t\) for each bit \(x_i\), updated by the rule
\[ p_i^{t+1} \;=\; p_i^t \cdot \frac{\mathbb{E}[F(x) \mid x_i = 1]}{\mathbb{E}[F(x)]}. \qquad (1) \]
We can re-cast this using the original \(f\) and probabilities rather than expectations:
\[ p_i^{t+1} \;=\; p_i^t \cdot \frac{1 + \varepsilon \Pr[f(x) = 1 \mid x_i = 1]}{1 + \varepsilon \Pr[f(x) = 1]}. \]
The question is, does the trait of satisfying \(f\) come to dominate the population? Let \(q_0\) be the initial satisfaction probability under whatever initial distribution is given, say uniform. They prove:
Theorem 1 If \(f\) is monotone, then a polynomially large population, over a number of generations polynomial in \(n\) and \(1/q_0\), almost surely comes to consist exclusively of satisfying assignments.
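As a sanity check, one can simulate dynamics of this kind directly. The sketch below assumes fitness \(F = 1 + \varepsilon f\) and a product distribution whose bit probabilities are updated by conditional fitness—a simplified stand-in for the paper’s full population model—applied to the monotone majority-of-3 function:

```python
from itertools import product

def satisfaction_prob(f, p):
    """Pr[f(x) = 1] when bit i is 1 independently with probability p[i]."""
    total = 0.0
    for x in product([0, 1], repeat=len(p)):
        w = 1.0
        for xi, pi in zip(x, p):
            w *= pi if xi else (1 - pi)
        total += w * f(x)
    return total

def evolve(f, p, eps=0.1, generations=200):
    """Coordinatewise update p_i <- p_i * E[F | x_i=1] / E[F], with F = 1 + eps*f."""
    for _ in range(generations):
        mean_F = 1 + eps * satisfaction_prob(f, p)
        new_p = []
        for i in range(len(p)):
            cond = list(p)
            cond[i] = 1.0                          # condition on x_i = 1
            q_i = satisfaction_prob(f, cond)       # Pr[f = 1 | x_i = 1]
            new_p.append(p[i] * (1 + eps * q_i) / mean_F)
        p = new_p
    return p

maj = lambda x: int(sum(x) >= 2)                   # monotone: majority of 3 bits
p_final = evolve(maj, [0.5, 0.5, 0.5])
print(satisfaction_prob(maj, p_final))             # close to 1
```

Each bit whose being 1 raises expected fitness has its frequency amplified, and for a monotone function all bits pull in the same direction, so satisfaction climbs toward 1.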
The bulk of the paper, however, is devoted to how far this extends when \(f\) is not monotone. The process has to be modified: the numerator in equation (1) is rewritten to confer possible advantage on unsatisfying assignments too, \(f\) is replaced by a multilinear extension taking real values in an interval, the size of the population needs to be large enough, more-complicated polynomial functions of the parameters are involved, and there are interlocking bounds on them and on the success probability. As usual see the paper for details, but we can abbreviate their result drastically as follows:
Theorem 2 Taken over all \(f\), a similar conclusion holds under the modified process.
We suspect that no human relationship counselor would disagree.
The paper includes a long discussion of possible variants and improvements to the model and results. Insofar as \(q_0\), namely the initial probability of a random assignment satisfying \(f\), is a factor in the denominator, the results lose force when \(q_0\) is tiny. Indeed the authors disclaim relevance to the task of finding satisfying assignments to a hard formula with small \(q_0\). What they call “weak selection,” owing to the smallness of \(\varepsilon\) in their main results, becomes an interesting problem of how far \(\varepsilon\) can be increased, and maybe this informs general issues about the complexity of Boolean functions and applicability of Fourier techniques, not just evolution.
Cropped from source |
Dick Lipton is of course the founder and driving writer of this weblog. He is also a computer scientist with a great record of pathbreaking research. The latter has just been recognized, I am delighted and proud to say, with the award of the 2014 Knuth Prize. The prize is awarded jointly by the ACM Special Interest Group on Algorithms and Computation Theory and the IEEE Technical Committee on the Mathematical Foundations of Computing, and was instituted in 1996, shortly after the formal retirement of the great—and very much active—Donald Knuth.
Today I am glad to give my congratulations in public, and also my thanks for a wonderful and long association.
Only in reading around while writing this post did I learn that Dick won another major honor earlier this year: election to the American Academy of Arts and Sciences. The end of April was crazy with personal and unusual professional matters for both of us, and it didn’t come up. In the 2014 list for CS he is alongside Jennifer Chayes, Ron Fagin, David Harel, Daphne Koller, Leslie Lamport, Leonid Levin, and Mendel Rosenblum.
The Knuth Prize is “for outstanding contributions to the foundations of computer science.” The description also says it may be partly based on educational contributions such as textbooks and students, but “is not based on service work for the community, although service might be included in the citation for a winner if it is appropriate.” The ACM release does not mention this blog. So just for now I will imitate Basil Fawlty’s advice in a famous episode of the immortal 1970’s BBC TV sitcom “Fawlty Towers”:
Don’t mention the blog.
But I will mention foundations, since that is an aspect of the recognition that comes with a price but is a longstanding purpose of this—wait, don’t mention the…
The release includes some of Dick’s major contributions, and some others have been noted on StackExchange. I can add the first work by Dick that I was aware of: two papers in 1979–80 with Rich DeMillo—also now at Georgia Tech—which were part of various efforts to answer questions about \(\mathsf{P}\) versus \(\mathsf{NP}\) via logic and model theory. There is also his involvement with time-space tradeoffs for \(\mathsf{SAT}\), which represent the best lower bounds—well, in a broad sense, kind-of lower bounds—known for any major \(\mathsf{NP}\)-complete problem. Most recent and potentially important in my purview—but you might have to comb through his DBLP page to realize they’re there—are papers on multivariable polynomials modulo composite numbers, which are at the frontier of circuit complexity lower bounds.
What generates all this is a powerful beacon of ideas that have had a photoelectric effect on many parts of computer science, more than a typical theoretician encounters. I should know—it has been my privilege to do radiographic processing for many more, even some that haven’t yet appeared on this—oops—anyway, more ideas than I can often keep up with. Not to mention that I’m also pursuing (and opening) some ideas of my own. The point is that these ideas are almost all aimed at foundations. The examples above are aimed at foundations of complexity theory. That goes double for some ideas that haven’t progressed well, at least not yet.
The problem—with the \(\mathsf{P}=\mathsf{NP}\) problem and much more—is that the foundations of complexity are set in sturdy concrete. They basically haven’t budged. Beams of ideas bounce off or just get absorbed. Major related parts of algorithms and crypto and other fields are likewise shielded. This is the polar opposite of the situation Bill Thurston recounted about his field of manifold foliations in his famous essay, “On Proof and Progress in Mathematics.”
The difficulty of direct progress has been so discouraging that through the theory applications grapevine I’ve heard whispers that one can phrase as, “Don’t mention foundations…” We tried to present aspects of this more positively in a recent post featuring Chayes.
However, if you aim a beam hot enough and keep it focused for long enough, events do happen. The question should not be whether it’s worth aiming the beam, but rather what’s needed to establish and maintain the intensity and focus. What we suspect is that just as in particle physics, a century on from when the likes of Joseph Thomson and Ernest Rutherford and Robert Millikan could work alone or with one associate or two, progress will require greater coordination of multiple researchers and shared focus on ideas than our field is yet accustomed to. Hence this…well, we can mention that various people we know and admire are trying to foster this in their own ways—efforts not just proposed but in some cases activated to a wonderful degree.
As I said, aiming at foundations has a price—and the payment is alas more evenly distributed among those who try than the rewards. That’s why it is good to remember that every so often, in small ways as well as big, a prize comes with the price. The above, too, has an added price, but perhaps it will be ushered in with a new model for appreciating the credits.
The Knuth Prize has the unusual official period of being awarded every 1-1/2 years. Yet the last five awards are dated 2010, 2011, 2012, 2013, and now 2014. Can this seeming contradiction be resolved mathematically? Perhaps the “1-1/2” is governed by left and right approximands in the manner of John Conway’s surreal numbers, which Knuth described in an engaging book. A left bound of 1.25 would just barely allow it. However, I suspect that any such involvement by Knuth would have instead fixed the period at an exact number of days, and then I don’t see a solution…unless maybe arithmetic is inconsistent after all.
Of course we congratulate Dick without any inconsistency.
Things we did not know
Ulam holding a strange device |
Stanislaw Ulam was a Polish-American mathematician whose work spanned many areas of both continuous and discrete mathematics. He did pioneering research in chaos theory and Monte Carlo algorithms, and also invented the concept of a measurable cardinal in set theory. His essential modification of Edward Teller’s original H-bomb design is used in nearly all the world’s thermonuclear weapons; he also co-originated the Graph Reconstruction conjecture, and his name is attached to the equally notorious 3n+1 conjecture. Thus he was involved in some strange corners of math.
Today Ken and I want to talk about some strange facts observed by Ulam and others that we did not know or fully appreciate.
Perhaps you can use them, perhaps you may enjoy them, but they are all kind of fun. At least we think so. Ulam’s autobiography Adventures of a Mathematician shows his sense of fun, and he was described by Gian-Carlo Rota in these words:
Ulam’s mind is a repository of thousands of stories, tales, jokes, epigrams, remarks, puzzles, tongue-twisters, footnotes, conclusions, slogans, formulas, diagrams, quotations, limericks, summaries, quips, epitaphs, and headlines. [H]e simply pulls out of his mind the fifty-odd relevant items, and presents them in linear succession. A second-order memory prevents him from repeating himself too often before the same public.
There is another reason for discussing this today. Classes are underway, we both are trying to get things going, and we wanted to start a discussion. We are working on a longer, more technical post, about—“that would be telling”—so let us do this now. Note: fifty extra points for knowing where that phrase comes from—without a search.
Even before we come to Ulam’s famous observation about the pattern of prime numbers in a spiral, some things strike us as strange about spirals. Here we mean the simple spiral walk pattern starting from an origin square in the infinite planar lattice:
We can begin with any number in place of $1$—thus if we begin with $c$ this is the same as adding $c - 1$ to all the numbers. One basic fact is that if we take any ray from the origin that goes through the center of another cell, then the $n$th number skewered along that ray is given by a quadratic function of $n$. This holds for the horizontal ray, for the northeast ray, and even for the ray of north-northeast knight’s moves. Similar rays originating from other squares also have quadratic formulas.
If we sequence the numbers that fall on full lines rather than rays, however, we do not get a quadratic function. For instance, the numbers that fall on the horizontal axis, taken in increasing order, do not fit any polynomial formula at all—one has to do integer division by 2 (throwing away remainders) to get a formula. Indeed, this sequence doesn’t have an entry in OEIS, the Online Encyclopedia of Integer Sequences. [Update: see this.] Likewise, the northwest-southeast line has all the odd squares but no nice formula. Nor does the line of knight’s moves.
But there is a singular exception. The southwest-northeast diagonal gives $1, 3, 7, 13, 21, 31, 43, \dots$, which has the formula $n^2 + n + 1$. Why does this happen?
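This is easy to check by brute force. Here is a short sketch that lays the numbers out in a spiral and reads off the southwest-northeast diagonal; the orientation (counterclockwise, stepping east first) is an assumption made for illustration, since any rotation or reflection gives the same diagonal values:

```python
def spiral_values(limit):
    """Lay out 1, 2, ..., limit on the integer lattice in a counterclockwise
    spiral starting at the origin and stepping east first (an assumed
    orientation; the picture in the post may differ by rotation/reflection)."""
    vals = {(0, 0): 1}
    x = y = 0
    num = 1
    dirs = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S
    d, run = 0, 1
    while num < limit:
        for _ in range(2):                     # each run length occurs twice
            dx, dy = dirs[d]
            for _ in range(run):
                x, y = x + dx, y + dy
                num += 1
                vals[(x, y)] = num
                if num == limit:
                    return vals
            d = (d + 1) % 4
        run += 1
    return vals

vals = spiral_values(2000)
# Values on the southwest-northeast diagonal, in increasing order:
diag = sorted(v for (x, y), v in vals.items() if x == y)
print(diag[:7])                                # [1, 3, 7, 13, 21, 31, 43]
assert all(diag[n] == n * n + n + 1 for n in range(7))
```

The two rays of the diagonal contribute $4k^2 - 2k + 1$ and $4k^2 + 2k + 1$, and interleaving them is exactly what produces the single quadratic $n^2 + n + 1$.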
If we add $40$ to this sequence, we get: $41, 43, 47, 53, 61, 71, 83, \dots$
Each of these numbers is prime. The sequence is all prime until you substitute $n = 40$ to get $1681$, which is $41^2$. Of course, $n^2 + n + 41$ is the famous prime-generating formula of Leonhard Euler. The extreme good fortune of this formula has been “explained” using class-number theory, but we did not notice the strange diagonal exception until writing this post.
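Euler’s run of primes is easy to confirm; a minimal check using nothing but trial division:

```python
def is_prime(m):
    """Trial division; fine for the small values here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

euler = [n * n + n + 41 for n in range(41)]
assert all(is_prime(v) for v in euler[:40])  # n = 0, ..., 39: all prime
assert not is_prime(euler[40])               # n = 40 is the first failure:
assert euler[40] == 41 * 41                  # 1681 = 41^2
```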
The story goes that Ulam became bored during a long lecture in 1963, doodled a number spiral like the one pictured above, and circled the primes to see what the plot would look like. He was struck by how many substantially long diagonal segments of primes there were, going northwest as well as northeast.
This also happens if we start the spiral with numbers other than $1$. Here is a plot from Helgi Rudd’s beautiful website devoted to Ulam’s spiral:
Of course, starting with $41$ gives Euler’s sequence as a huge diagonal swath through the origin, but as Wikipedia’s article notes, the effects of Euler’s formula show up on other diagonals, at larger values, even when the spiral begins with $1$.
Despite connections noted in both sources to earlier theorems and conjectures about quadratic polynomials, it is still considered that this affinity for diagonals has not been sufficiently “explained.” The larger picture that strikes us the most is given immediately on the front page of Rudd’s site. Many unsolved problems in number theory, including the Riemann Hypothesis itself, involve the idea of how much the primes behave like a “random” sequence. This supports belief in the Goldbach Conjecture and the Twin Primes Conjecture, with probabilistic reasoning like Freeman Dyson’s in our previous post.
Thus the prime pictures exhibit some of their non-random structure, along lines of more-extreme deficiencies shown by George Marsaglia for some 1960s-vintage pseudorandom generators. How the primes can be random and non-random at the same time relates to that other post we’re working on, but “we’re not telling.”
In 1947, Nathan Fine proved that almost all binomial coefficients are even. This seems strange to me. More precisely, let $f(N)$ be the number of odd binomial coefficients $\binom{n}{k}$ with $0 \le k \le n < N$. Then $f(N)/\binom{N+1}{2} \to 0$. Even stronger: $f(N) = O(N^{\log_2 3})$, where $\log_2 3 \approx 1.585 < 2$.
See this for some related facts.
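Fine’s count can be explored without computing a single factorial: row $n$ of Pascal’s triangle has exactly $2^{s(n)}$ odd entries, where $s(n)$ is the number of 1-bits of $n$ (a consequence of Lucas’s theorem). A quick sketch:

```python
def odd_count(N):
    """Number of odd binomial coefficients C(n, k) with 0 <= k <= n < N.
    Row n contributes 2**popcount(n) odd entries (Lucas's theorem)."""
    return sum(2 ** bin(n).count("1") for n in range(N))

def odd_fraction(N):
    """Fraction of odd entries among the N*(N+1)/2 coefficients in rows < N."""
    return odd_count(N) / (N * (N + 1) // 2)

# When N = 2**k, the count is exactly 3**k = N**log2(3):
assert odd_count(2 ** 10) == 3 ** 10
# The fraction of odd entries visibly tends to zero:
print([round(odd_fraction(2 ** k), 3) for k in (3, 6, 9, 12)])
```

The exact value $3^k$ at powers of two follows from summing $2^{s(n)}$ over $n < 2^k$, which is $(1+2)^k$ by the binomial theorem over bits.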
This one comes from here.
Let’s say you invest $10 in the market and you make a 10 percent return. You now have $11. Now, let’s say you lose 10 percent. Out of $11, that’s $1.10, leaving you with $9.90, which means you are down ten cents on the deal. You gained the same percentage as you lost, yet you came out behind.
Well, you might speculate it has to do with the order of the transaction. After all, the 10 percent you lost was bigger than the 10 percent you gained because you were already up on the deal. That means reversing the order should have the opposite effect. Right?
Start with $10. Now lose 10 percent first. You have nine dollars. Then gain ten percent. That’s 90 cents, leaving you with…$9.90.
Yep. You lost money again.
Strange as it may seem, a gain and a loss of the exact same percentage will always leave you with less cash – regardless of the order in which they occur.
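The arithmetic behind this is simply $(1+x)(1-x) = 1 - x^2 < 1$ for any $x \neq 0$, and multiplication commutes, so the order cannot matter. A tiny sketch (the function names are my own, for illustration):

```python
def gain_then_loss(principal, pct):
    """Gain pct percent, then lose pct percent of the new amount."""
    x = pct / 100.0
    return principal * (1 + x) * (1 - x)

def loss_then_gain(principal, pct):
    """Lose pct percent, then gain pct percent of the new amount."""
    x = pct / 100.0
    return principal * (1 - x) * (1 + x)

# Both orders multiply the principal by (1 - x**2), which is < 1 for x != 0:
assert round(gain_then_loss(10, 10), 2) == 9.90
assert round(loss_then_gain(10, 10), 2) == 9.90
```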
I wonder if we can use this simple but strange fact in some algorithm analysis?
Suppose you take the average of the numbers of friends each of your friends has, and compare that with your own number of friends. Chances are the average will be significantly higher than your own count. This leaves a lot of people wondering: why are my friends more popular, more vivacious, more successful, than I am?
This phenomenon has been well documented on Facebook and in other contexts where “friend” is rigorously quantified. It is ascribed to a 1991 paper by the sociologist Scott Feld, titled “Why Your Friends Have More Friends Than You Do,” but Ken and I would be surprised if it hadn’t already been noted in connection with the collaboration graph of mathematicians. Ken recalls a talk given by Donald Knuth on properties of this graph 30 years ago at Oxford, a few years after a paper by Tom Odda that also cites studies by Ron Graham and others, but has no recollection of this being mentioned.
It is easy to explain in those terms. Think of Paul Erdős, who famously had hundreds of collaborators. Many people were touched by him. But each of them had averages that included Erdős, which alone was probably enough to boost the average over their own valence in the graph. More generally, let us assign to every node $v$ a “collaborativeness potential” $p_v$, and consider various ways to generate random graphs in which the probability of an edge $(u,v)$ depends on $p_u$ and $p_v$. For instance, every node might pick a set $S$ of 100 potential co-authors at random, and each edge is placed with probability $p_u p_v$. Potential connections within $S$ will have a higher chance of succeeding with those having high $p_v$, who in turn will expect to have more successful connections.
The result is shown with more rigor by Steven Strogatz in this New York Times column. Still, we find it strange as a raw fact of life.
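A quick simulation sketch of Feld’s observation; the Erdős–Rényi random graph here is an assumption made just for illustration, since by an AM-GM argument summed over edges the effect holds for any graph that is not degree-regular:

```python
import random

def friend_average_gap(n=1000, p=0.02, seed=42):
    """Build an Erdos-Renyi graph G(n, p); return the average, over nodes
    with at least one friend, of (mean degree of friends) - (own degree)."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    gaps = [sum(len(adj[j]) for j in adj[i]) / len(adj[i]) - len(adj[i])
            for i in range(n) if adj[i]]
    return sum(gaps) / len(gaps)

# Summed over edges (u,v), deg(v)/deg(u) + deg(u)/deg(v) >= 2 by AM-GM,
# so the total gap is nonnegative for ANY graph, and strictly positive
# unless every component is regular—which a random graph never is:
assert friend_average_gap() > 0
```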
What is your favorite strange fact?
[fixed the notation for the binomial coefficients; added update about x-axis of Ulam spiral]