New York Times obituary source
Lotfi Zadeh had a long and amazing life in academics and the real world. He passed away last month, aged 96.
Today Ken and I try to convey the engineering roots of his work. Then we relate some personal stories.
Zadeh was a Fellow of the ACM, the IEEE, the AAAI, and the AAAS and a member of the NAE. But besides this alphabet soup of US-based academies, we are impressed with the one he co-founded: the Eurasian Academy. His founding partners were a historian, a neurosurgeon, a music composer, and a mathematician. They recently elected three other members: an actress-screenwriter-director, an actor-director-writer, and a physicist.
In any alphabet of his life, one letter stands out: the letter Z. The term “Fuzzy Set” has two of them. But Zadeh’s first widely noted work goes by just the bare letter.
Pierre-Simon Laplace discovered a relative of the Fourier transform that has similarly motivated applications and often better behavior. When applied to the density function $f$ of a random variable $X$ on $[0,\infty)$ or $(-\infty,\infty)$, it has the form
$$L(s) = \int f(x)\, e^{-sx}\, dx.$$
Here $s$ can be a complex number. The function $L$ is holomorphic (on the half-plane $\operatorname{Re}(s) > 0$) provided we are working on $[0,\infty)$. A neat trick is that we can jump from $L$ to the cumulative distribution $F$ by
$$F(x) = \mathcal{L}^{-1}\left\{\frac{1}{s}\, L(s)\right\}(x).$$
Can we get such nice properties for a discrete random variable $X$ on the integers, with $f(n)$ the probability that $X = n$? Zadeh’s advisor at Columbia, John Ragazzini, led him in showing the power of defining
$$Z_f(z) = \sum_{n} f(n)\, z^{-n},$$
where again $z$ can be any complex number, and the domain of $f$ and the sum can be $\mathbb{N}$ or $\mathbb{Z}$. We note that $Z_f$ is often defined as a function of $z^{-1}$, that a similar sign issue was discussed in reviewing the 1952 Ragazzini-Zadeh paper, and that our notation differs from that of Wikipedia’s article on the $z$-transform so as to make it look more like $L$. With a positive exponent, the sum is the probability generating function of $X$.
How useful is this? Much of what we can say in a short space is the same as with Fourier: If we form the convolution
$$h(n) = \sum_{k} f(k)\, g(n-k)$$
then its $z$-transform is just the product function:
$$Z_h(z) = Z_f(z)\, Z_g(z).$$
Using products this way makes convolutions easier to work with. Many hard-to-handle functions become nicer under their $z$-transforms. The Dirac delta function $\delta(n) = 1$ if $n = 0$ and $\delta(n) = 0$ otherwise is strange at face value, though it can be understood as the random variable whose outcome is always $0$. Under the $z$-transform, however,
$$Z_\delta(z) = 1.$$
Nothing can be nicer than the constant $1$. For explanation of where the $z$-transform is more general than the discrete Fourier transform and relatives we defer to this beautiful page. All this grew out of ideas in the 1940s by others including Witold Hurewicz (another z), but Zadeh’s joint paper had the greatest influence in signal processing.
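The product rule for convolutions can be checked numerically. What follows is only a minimal sketch, assuming finitely supported sequences indexed from 0 and the negative-exponent convention in which the transform of a sequence f is the sum of f(n) z^(-n). The helper names `z_transform` and `convolve` and the sample probabilities are ours, chosen for illustration.

```python
def z_transform(f, z):
    """Evaluate sum_n f[n] * z**(-n) for a finitely supported sequence f[0..]."""
    return sum(c * z ** (-n) for n, c in enumerate(f))


def convolve(f, g):
    """Discrete convolution h[n] = sum_k f[k] * g[n - k]."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


# Two small probability mass functions, on {0, 1, 2} and on {0, 1}.
f = [0.2, 0.5, 0.3]
g = [0.6, 0.4]
h = convolve(f, g)           # mass function of the sum of independent draws
delta = [1.0]                # point mass at 0

z = complex(0.7, 1.1)        # an arbitrary nonzero complex test point
lhs = z_transform(h, z)
rhs = z_transform(f, z) * z_transform(g, z)
gap = abs(lhs - rhs)         # expected to be at floating-point noise level
delta_value = z_transform(delta, z)   # transform of the point mass at 0
```

The same check works at any nonzero `z`; the point mass at 0 transforms to the constant 1 regardless of the test point.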
The art of the $z$-transform is continuous functions forming a well-behaved nimbus around certain discrete entities. Suppose we try to do this for every discrete concept? Begin with the idea of a set $A$, namely a subset of some universe $U$. Instead, let us think of a fuzzy set $\mu_A$ where
$$\mu_A : U \to [0,1].$$
Here the real number $\mu_A(x)$ is called the grade of membership of $x$ in $A$. The original set is the case $\mu_A(x) = 1$ if $x \in A$ and $\mu_A(x) = 0$ otherwise. The point is that we are now free to consider other functions $\mu$ that approximate $\mu_A$ and are smoother and nicer to work with. We can consider whole ensembles of such functions.
From fuzzy sets it is a short step to fuzzy logic. This has an antecedent: the infinite-valued logic of Jan Łukasiewicz and others. A statement may have a truth value between 0 and 1. A common choice is to represent the value by a logistic curve of a main parameter. Here is a somewhat distorted curve for the statement “X is wealthy” parameterized by the net worth of X:
“Simulating Complexity” blog source
The point for us is that logistic curves are natural to work with when modeling such predicates in a larger system. Here is a pertinent recent example for image processing. Further points are that a logical 0-1 assignment to “wealthy” would have an artificially sharp distinction somewhere and that the logistic curves are more faithful to neural-net models of how we think.
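To make the contrast between a sharp cutoff and a logistic grade concrete, here is a minimal sketch. The midpoint of one million dollars, the steepness, and the choice to work in log-dollars are all hypothetical values picked for illustration, not taken from the curve pictured above. The min, max, and complement combinations are the standard ones from Zadeh's 1965 paper.

```python
import math


def logistic_grade(x, midpoint, steepness):
    """Grade of membership in [0, 1] given by a logistic curve in x."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def wealthy(net_worth):
    # Hypothetical parameters: grade 0.5 at $1M, curve taken in log10 dollars.
    # Requires net_worth > 0 because of the logarithm.
    return logistic_grade(math.log10(net_worth), midpoint=6.0, steepness=3.0)


def crisp_wealthy(net_worth):
    # The classical 0-1 set: an artificially sharp cutoff at the same point.
    return 1.0 if net_worth >= 1_000_000 else 0.0


def fuzzy_and(a, b):
    return min(a, b)       # intersection of fuzzy sets


def fuzzy_or(a, b):
    return max(a, b)       # union of fuzzy sets


def fuzzy_not(a):
    return 1.0 - a         # complement of a fuzzy set


grades = {w: round(wealthy(w), 3)
          for w in (100_000, 900_000, 1_000_000, 10_000_000)}
```

A net worth of 999,999 and one of 1,000,000 get nearly the same grade from `wealthy`, whereas `crisp_wealthy` sends them to opposite truth values.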
Zadeh’s original 1965 paper is one of the most cited science papers of all time. It has close to citations. He confessed that:
“I knew that just by choosing the label ‘fuzzy’ I was going to find myself in the midst of a controversy… If it weren’t called fuzzy logic, there probably wouldn’t be articles on it on the front page of the New York Times. So let us say it has a certain publicity value. Of course, many people don’t like that publicity value, and when they see it in the New York Times, it doesn’t sit well with them.”
That controversy was real—see the next section. Zadeh in an acceptance speech for the 1989 Honda Foundation prize said
“The concept of a fuzzy set has had an upsetting effect on the established order.”
I (Dick) never understood why this generalization of sets created such push-back. Stuart Russell, a Berkeley professor who worked next door to Mr. Zadeh for many years, noted:
He always took criticism as a compliment. It meant that people were considering what he had to say.
The impact of his work has been recognized by a posthumous “Golden Goose” Award. The award’s name counters the stigma of the “Golden Fleece” awards given out in 1975–1988 by US Senator William Proxmire in half-jest to federally-funded research projects he deemed frivolous and wasteful. Zadeh drew attention from Proxmire as a potential “Golden Fleece” awardee. The Golden Goose citation, however, describes the “Clear Impact,” especially as seen by engineering-minded Japanese:
Part of this interest came from the fact that ‘fuzzy’ was not a pejorative term in Japanese, but instead a neutral or even positive one. Researchers there took his idea and ran, creating conferences and journals focused on making advances in fuzzy logic. To this day, the only country with more patents on fuzzy ideas and concepts than the United States is Japan.
In 1986, the first commercial application of fuzzy logic hit the shelves in Japan: a fuzzy shower head. Using fuzzy concepts of hot, cold, high pressure, low pressure, and others, the shower head could use fuzzy logic to control showers across the country. Within a few years, the market was overflowing with fuzzy consumer products. Vacuum cleaners, rice cookers, air conditioning systems, microwaves, everything was moving to fuzzy control. Even the entire subway system of Sendai in Japan was built with fuzzy logic controlling the motion of the trains.
Way back in the first month of this blog, I (Dick) quoted the following remarks by William Kahan. I was in the audience for Zadeh’s lecture too but let’s let Kahan speak:
My two favorite stories about him concern his tremendous candor. The first is about his ideas on “fuzzy sets” and the second is on “who should get tenure.” I will only tell the first one—to protect the innocent and the guilty. When I first arrived at the Computer Science Department at Berkeley, the faculty decided to have a new series of lectures that fall. The plan was to have short lectures by each faculty of the department—in this way new graduate students would learn each faculty’s research area.
One day Professor Zadeh was presenting his area of research—an area that he created called “fuzzy sets.” Fuzzy sets were then and still are today a controversial area. Some researchers do not think much of this area. However, the area is immensely popular to many others. There are countless conferences, books, and journals devoted completely to this area. Kahan was in the audience while Zadeh was speaking. Finally, at some point Kahan could take it no more. He stood up and Zadeh asked him what his question was. Kahan stated in the most eloquent manner that it might be okay to work on fuzzy sets in the privacy of your own basement (after all this was Berkeley), but there was no excuse for exposing young minds to this “stuff”—his term was stronger. We all were shocked. For a few seconds no one spoke. I wondered how in the world Zadeh could respond. Zadeh finally said, “thank you for your comments,” and went on with the rest of talk, as if nothing had happened. The next year the faculty talks were cancelled.
I met Zadeh once, when he was the featured speaker at the 6th International Conference on Computing and Information (ICCI 1994), which was held at Trent University in Peterborough, Ontario, May 26–28, 1994. Jie Wang and I drove there from STOC which was held in Montreal that year. This small conference—not to be confused with ones having similar names and acronyms—lasted just a few more years. It is hard to find any information on the 1994 meeting now—just a few paper citations—and I have found no proof on the Internet that Zadeh was there. But he was—in a non-fuzzy but decidedly freezy setting.
There was a welcoming reception in the late afternoon of the 25th. It was slated to be outside in a wooded park on the university grounds. It was late May after all. But it was cold. I’ve known cold days in May in Buffalo, but none like that—biting wind and icy sleet. Only twenty or so of the registrants braved the weather. There was fortunately a round wooden structure, covered and enclosed and large enough to shelter us, but with no central heating. Instead it had a coal heat stove. We huddled around on chairs and stools and the part of the circular wall bench near the stove. Although over two hours of nominal daylight remained, the dark clouds and scant windows made it pitch night inside. If I recall correctly, the original intent of a cookout was shelved and replaced by a bulk order of sandwiches and potato chips and other picnic fare.
Nearest the stove sat the 73-year-old Zadeh wrapped in blankets. His face glowed orange as he regaled us in good humor with stories. I don’t think I kept any record of what he said. We felt in the presence of a great man but under surreal conditions—accentuated for Jie and me by our having had a hot lunch in the downtown Montreal hotel for STOC. Somewhere I do have notes of the keynote he gave the next morning before departing—in a modern and heated university lecture room—but I have not unpacked my boxes of old notebooks since my department’s move to a new building six years ago.
His birthplace Baku has been on my mind because I’ve recently read Thomas Reiss’s biography The Orientalist of Lev Nussimbaum, who wrote under the pseudonyms Essad Bey and Kurban Said. Nussimbaum had at least a hand in the writing and production of the 1937 romance Ali and Nino, which is considered the national novel of Azerbaijan. Baku juts into the Caspian Sea and calls itself the easternmost European city as demarked by the Urals and Asia Minor extended east. I wish it had occurred to me to ask about his upbringing and the history between the wars.
We convey our profound appreciation and regrets to his family and friends.
Two more tragic losses coming before a greater tragedy
Composite of crops from src1, src2
Michael Cohen and Vladimir Voevodsky were in different stages of their careers. Cohen was a graduate student at MIT and was visiting the Simons Institute in Berkeley. He passed away suddenly a week ago Monday on a day he was scheduled to give a talk. Voevodsky won a Fields Medal in 2002 and was a professor at the Institute for Advanced Study in Princeton. He passed away Saturday, also unexpectedly.
Today we join those grieving both losses.
We are writing this amid the greater horror in Las Vegas. Dick and I speak our condolences and more, but the condolences that two of us can give seem to fade—they do not “scale up.” Hence we feel that the best we can do is talk about Cohen’s and Voevodsky’s roles in our scientific communities and some of what they contributed. That is a gesture of peace and serenity. It may not overcome the darkness, but something like it seems needed so that we all might do so.
Michael Cohen had already worked with a wide variety of people in over twenty joint papers. He had two all by himself: a paper at SODA 2016 titled, “Nearly Tight Oblivious Subspace Embeddings by Trace Inequalities,” and a paper at FOCS 2016 titled, “Ramanujan Graphs in Polynomial Time.”
A common theme through much of this work was wizardry with special kinds of matrices. They included Laplacian matrices in which every column sums to zero and only the diagonal entries can be positive. You can get one from a directed graph by negating the entries of its adjacency matrix and putting the in-degrees on the diagonal. One can further demand that the rows sum to zero, which happens for our graph if each node’s in-degree equals its out-degree. This is automatic for undirected graphs. As noted in this paper:
While these recent algorithmic approaches have been very successful at obtaining algorithms running in close to linear time for undirected graphs, the directed case has conspicuously lagged its undirected counterpart. With a small number of exceptions involving graphs with particularly nice properties and a line of research in using Laplacian system solvers inside interior point methods for linear programming […], the results in this line of research have centered almost entirely on the spectral theory of undirected graphs.
The paper, titled “Faster Algorithms for Computing the Stationary Distribution, Simulating Random Walks, and More,” was joint with Jonathan Kelner, John Peebles, and Adrian Vladu of MIT, Aaron Sidford of Stanford, and Richard Peng of Georgia Tech, and also came out at FOCS 2016. In the case of symmetric matrices $A$, needing only that $A$ be diagonally dominant, he was part of a bigger team including Peng and Gary Miller of CMU that found the best-known time for solving $Ax = b$. That paper came out at STOC 2014.
Thus from early on he was working with a great many people in the community. This has been noted in tribute posts by Scott Aaronson, by Sébastien Bubeck, by Luca Trevisan, by former colleagues at Microsoft Research where Cohen spent this past summer, and by Lance Fortnow. The post by Scott includes communications from Cohen’s parents and information about memorials and donations.
We’ll talk about Cohen’s paper on Ramanujan graphs in a train of thought that will lead into aspects of Voevodsky’s work. Of course we know Srinivasa Ramanujan was a brilliant Indian mathematician who also died tragically young.
In mathematics we sometimes prove the existence of objects without knowing how to construct them. Sometimes we can prove that a random object works. This is often helpful, but one downside comes from cases where we would want different people given the same problem parameters to obtain the same object. Randomized algorithms usually do not have a single output that is arrived at with high probability. What we really want is an algorithm that constructs the object.
This has been the story for a long time with expander graphs. They were proved to exist long ago via the probabilistic method. The zig-zag product was a watershed in constructing some kinds of them. The goal is to get these objects constructively with the same parameters or close to them.
A Ramanujan graph is a particular kind of expander with a maximum dose of the spectral-gap condition for expansion. The adjacency matrix of a $d$-regular graph has $d$ as its largest eigenvalue. It cannot have an eigenvalue less than $-d$, and the value $-d$ itself occurs if and only if the graph is bipartite. The graph is Ramanujan if all other eigenvalues have absolute value at most $2\sqrt{d-1}$. This creates a quadratic spectral gap between $d$ and the next-largest eigenvalue and this is asymptotically the largest possible.
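A small worked instance may help. The Petersen graph is 3-regular on 10 nodes, and the sketch below checks the Ramanujan condition for it without any eigenvalue solver. It builds the graph from the 2-element subsets of a 5-element set (our choice of construction) and verifies the matrix identity A² + A − 2I = J, where J is the all-ones matrix. Given that identity, any eigenvector orthogonal to the all-ones vector has an eigenvalue x with x² + x − 2 = 0, so x is 1 or −2, both within 2√2 in absolute value.

```python
import math
from itertools import combinations

# Petersen graph as the Kneser graph K(5, 2): vertices are the 2-element
# subsets of {0, ..., 4}; two vertices are adjacent when the subsets are disjoint.
vertices = [frozenset(p) for p in combinations(range(5), 2)]
n = len(vertices)
A = [[1 if not (u & v) else 0 for v in vertices] for u in vertices]

d = sum(A[0])                                   # degree of the first vertex
is_regular = all(sum(row) == d for row in A)    # every vertex has that degree

# M = A^2 + A - 2I, computed entry by entry with integer arithmetic.
M = [[sum(A[i][k] * A[k][j] for k in range(n)) + A[i][j] - (2 if i == j else 0)
      for j in range(n)] for i in range(n)]
is_all_ones = all(M[i][j] == 1 for i in range(n) for j in range(n))

# Roots of x^2 + x - 2: the only possible eigenvalues besides d.
other_eigenvalues = (1, -2)
bound = 2 * math.sqrt(d - 1)
is_ramanujan = (is_regular and is_all_ones
                and all(abs(x) <= bound for x in other_eigenvalues))
```

This shortcut relies on the special structure of the Petersen graph; for a general regular graph one would compute the spectrum numerically instead.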
Again, a randomly chosen $d$-regular $n$-node graph will be almost certainly a Ramanujan graph, for any $n$ and nontrivial $d$. Adam Marcus, Dan Spielman, and Nikhil Srivastava (MSS) proved in 2013 that such graphs exist for all $n$ and $d$ even when required to be bipartite. But can we build one for any $n$ and $d$? This was not known in deterministic polynomial time until Cohen’s paper. The main advance was to use a beautiful concept of trees of polynomials with interlacing roots from MSS and improve it so that the requisite trees have polynomial rather than exponential maximum branch length, which governs the time of the algorithm. The paper well rewards further reading.
What this does is put bipartite Ramanujan graphs onto the list of structures that we can apprehend and use in deterministic polynomial-time algorithms. Thus Cohen added his name to the honor roll of those constructing good expanders and making random objects concrete.
Voevodsky’s work is set against a backdrop where mathematicians do the following over and over again. They start by knowing how to build certain kinds of algebraic structures on, say, differential manifolds or curves. They then want to carry this structure over to more general settings.
Voevodsky won his Fields Medal for this kind of work. He showed how to carry over topological ideas of homotopy from differential manifolds to algebraic manifolds—that is, any manifold that is the zero set of a polynomial. We discussed homotopy and its computational relevance in our own terms here. To quote his 2002 Fields review by Christophe Soulé:
It is quite extraordinary that such a homotopy theory of algebraic manifolds exists at all. In the fifties and sixties, interesting invariants of differentiable manifolds were introduced using algebraic topology. But very few mathematicians anticipated that these “soft” methods would ever be successful for algebraic manifolds. It seems now that any notion in algebraic topology will find a partner in algebraic geometry
Voevodsky’s medal was also for his proof of a noted conjecture by John Milnor that a structure of algebraic groups he built on a field $F$ of characteristic other than 2, with the algebra taken mod 2, would be isomorphic to an étale cohomology of $F$ with coefficients mod 2. Voevodsky overcame difficulty with tools from algebraic $K$-theory by developing and systematizing prior ideas of motivic cohomology that, as the review says, “turned out to be more computable.” He later proved the general conjecture for moduli other than 2, drawing on work by others in the meantime.
In the most ambitious cases of such “carry-overs,” however, mathematicians are able to prove that the objects needed for such structure exist but not concretely. It’s not just that the objects cannot be apprehended, but that these proofs are often not subject to being algorithmically checked.
To remedy this, Voevodsky delved deeper into constructive mathematics, which aims not to limit knowledge but rather to streamline and solidify it. He built up homotopy type theory (HoTT), which we talked about here. His ideas were programmed in the software system Coq, which grew out of Thierry Coquand’s “calculus of constructions” in partnership with Gérard Huet. Thus he was led to consider the foundations of mathematics as deeply as David Hilbert did a century ago.
The term “foundations,” which lives in the names of conferences such as FOCS and MFCS, tends to be spoken as an umbrella term for “theory.” We have argued that it ought to mean continued and concerted attention to the core problems in our field like $\mathsf{P = NP}$, notwithstanding that many of them are “like” $\mathsf{P = NP}$ in the sense of not having budged for decades. But when Voevodsky talked about foundations, he really meant the foundations: how do we know the whole edifice we have built out of proofs (all kinds of proofs) won’t collapse?
We have blogged about Ed Nelson’s attempts to show that Peano Arithmetic is inconsistent. Voevodsky took this possibility seriously. In memorials to Voevodsky on the HoTT Google Group, André Joyal contributed the following:
My first contact with Vladimir and his ideas was at a meeting in Oberwolfach in 2011. He gave a series of talks on constructive mathematics and homotopy theory, framed as a tutorial with the proof assistant Coq. His notion of a contractible object and of an equivalence were striking. I had a hard time understanding his ideas, because they were described very formally. He apparently distrusted informal expressions of mathematical ideas. One evening, he expressed the opinion that Peano arithmetic was inconsistent! He later came to distrust the applications of his ideas to homotopy theory!
Voevodsky indeed gave a talk at IAS titled, “What If Current Foundations of Mathematics are Inconsistent?” Very controversially, it tries to turn the understanding of Kurt Gödel’s Second Incompleteness Theorem on its head as a vehicle for possibly proving the inconsistency of certain classical first-order theories. Whereas, he concluded:
In constructive type theory, even if there are inconsistencies, one can still construct reliable proofs using the following “workflow”:
- A problem is formalized.
- A solution is constructed using all kinds of abstract concepts. This is the creative part.
- An algorithm which verifies “reliability” is applied to the constructed solution (e.g., a proof). If this algorithm terminates then we know we have a good solution of the original problem. If not, then we may have to start looking for another solution.
The workflow on this effort will continue. The IAS announcement notes that a memorial workshop is being planned and more information will be available soon. Update 10/7: The IAS posted a full obituary and there is also one in today’s New York Times and one in today’s Washington Post.
Again we express our condolences to their family, loved ones, and colleagues, and the same to everyone affected by the horror in Las Vegas.
This is the 750th post on this blog. We were holding onto two other ideas for marking this milestone, while busy with papers and much else these past two weeks ourselves. Those will still come out in upcoming weeks.
[added update]
Kathryn Farley obtained her PhD from Northwestern University in performance studies in 2007. After almost a decade working in that area, she has just started a Master’s program at New York University in a related field called drama therapy (DT).
Today, I thought I would talk about the math aspects of DT.
Okay so why should I report on DT here? It seems to have nothing in common with our usual topics. But I claim that it does, and I would like to make the case that it is an example of a phenomenon that we see throughout mathematics.
So here goes. By the way—to be fair and transparent—I must say that I am biased about Dr. Farley, since she is my wonderful wife. So take all I say with some reservations.
The whole point is that understanding what DT is has been hard, at least for me. But when I realized that it related to math it became much clearer to me, and I hope that it may even help those in DT to see what they do in a new light. It’s the power of math applied not to physics, not to biology, not to economics, but to a social science. Perhaps I am off and it’s just another example of “when you have a hammer, the whole world looks like a nail.” Oh, well.
I asked Kathryn for a summary of DT and here it is:
Drama therapy uses methods from theatre and performance to achieve therapeutic goals. Unlike traditional “talk” therapy, this new therapeutic method involves people enacting scenes from their own lives in order to express hidden emotions, gain valuable insights, solve problems and explore healthier behaviors. There are many types of DT, but most methods rely on members of a group acting as therapeutic agents for each other. In effect, the group functions as a self-contained theatre company, playing all the roles that a performance requires—playwright, director, actors, stagehands, and audience. The therapist functions as a producer, setting up the context for each scene and soliciting feedback from the audience.
Kathryn’s summary of DT is clear and perhaps I should stop here and forget about linking it to math. But I think there is a nice connection that I would like to make.
Since Kathryn is a student again, and students are assigned readings (there is a lot of reading in DT), you may imagine that she has been sharing with me a lot of thoughts on her readings and classes. I have listened carefully to her, but honestly it was only the other day, in a cab going to Quad Cinema down on 13th St., that I had the “lightbulb moment.” I suddenly understood what she is studying. Perhaps riding in a cab helps one listen: maybe that has been studied before by those in cognitive studies.
What I realized during that cab ride is that DT is an example of a generalization of another type of therapy. If the other therapy involves $n = 2$ people (including the therapist), then DT is the generalization to $n \geq 3$. We see this all the time in math, but it really helped me to see that the core insight, in my opinion, is that DT has simply moved from $n = 2$ to $n = 3$ or more.
We see this type of generalization all the time in math. For example, in communication complexity the basic model is two players sending each other messages. The generalization to more players creates very different behavior. Another example is the rank of a matrix. This is a well understood notion: easy to compute and well behaved. Yet simply changing from a two-dimensional matrix to a three-dimensional tensor changes everything. Now the behavior is vastly more complex and the rank function is no longer known to be easy to compute.
Here is an example of how DT could work—it is based on a case study Kathryn told me about.
Consider Bob who is seeing Alice who is Bob’s therapist. Alice is trained in some type of therapy that she uses via conversations with Bob to help him with some issue. This can be very useful if done correctly.
What DT is doing in letting $n$ be $3$ or more is a huge step. We see this happen all the time in mathematics: finally, the connection. Let’s look at Bob and Alice once more. Now Alice is talking with Bob about an issue. To be concrete let’s assume that Bob’s issue is this:
Bob has been dating two women. His dilemma is, which one should he view as a marriage prospect? He thinks both would go steady with him but they are very different in character. Sally is practical, solid, and interesting; Wanda is interesting too but a bit wild and unpredictable. Whom should he prefer?
The usual talk therapy would probably have Alice and Bob discuss the pros and cons. Hopefully Alice would ask the right question to help Bob make a good decision.
The DT approach would be quite different. Alice would have at least one other person join them to discuss Bob’s decision. This would change the mode from direct “telling” to a more indirect story-line. In that line it might emerge that Bob’s mother is a major factor in his decision—even though she passed away long ago. It might come out that his mom divorced his dad when he was young because he was too staid and level-headed. Perhaps this would make it clear to Bob that his mother was really the reason he was even considering Wanda, the wild one.
What is so interesting here is that by using more than just Bob, by setting $n \geq 3$, Alice can make the issues much more vivid for Bob.
The more I think about it, the idea of $n \geq 3$ people involved is the root. Naturally anything with more than two people transits from dialogue to theater. So the aspect of ‘drama’ is not primordial; it is emergent. Once you say $n \geq 3$, what goes down as Drama Therapy in the textbooks flows logically and sensibly, or at least it does to me now.
This is accompanied by a phase change in complexity and richness. As such it parallels ways we have talked about mathematical transitions from the case of $n = 2$ to $n = 3$ on the blog before. Maybe DT even implements a strategy I heard from Albert Meyer:
Prove the theorem for $n = 3$ and then let $3$ go to infinity.
Does this connection help? Does it make any sense at all?
It was just Ken’s birthday
Kenneth Regan’s birthday was just the other day.
I believe I join all in wishing him a wonderful unbirthday today.
The idea of an unbirthday is due to Lewis Carroll in his Through the Looking-Glass and is set to music in the 1951 Disney animated feature film Alice in Wonderland. Here is the song:
MARCH HARE: A very merry unbirthday to me
MAD HATTER: To who?
MARCH HARE: To me
MAD HATTER: Oh you!
MARCH HARE: A very merry unbirthday to you
MAD HATTER: Who me?
MARCH HARE: Yes, you!
MAD HATTER: Oh, me!
MARCH HARE: Let’s all congratulate us with another cup of tea A very merry unbirthday to you!
MAD HATTER: Now, statistics prove, prove that you’ve one birthday
MARCH HARE: Imagine, just one birthday every year
MAD HATTER: Ah, but there are three hundred and sixty four unbirthdays!
MARCH HARE: Precisely why we’re gathered here to cheer
BOTH: A very merry unbirthday to you, to you
ALICE: To me?
MAD HATTER: To you!
BOTH: A very merry unbirthday
ALICE: For me?
MARCH HARE: For you!
MAD HATTER: Now blow the candle out my dear And make your wish come true
BOTH: A merry merry unbirthday to you!
Ken is best known for work in theory and in particular in almost all aspects of complexity theory. But I wanted—in the spirit of an unbirthday—to point out that Ken is quite active in many other areas of computer science research. Here is one example that is joint with Tamal Biswas: Measuring Level-K Reasoning, Satisficing, and Human Error in Game-Play Data. We discussed it before here.
The problem is that Ken and Tamal want to be able to study levels of play in chess but are stalled currently by issues Ken raised last May in this blog. I wish them well in making strides to better understand how to model game play in chess that captures the notion of levels.
For me the following game created by Ayala Arad and Ariel Rubinstein really helps me understand the kind of thing Ken and Tamal are interested in capturing.
You and another player are playing a game in which each player requests an amount of money. The amount must be (an integer) between 11 and 20 shekels. Each player will receive the amount he requests. A player will receive an additional amount of 20 shekels if he asks for exactly one shekel less than the other player. What amount of money would you request?
The point is there are levels of thinking that a player can naturally go through. Here is a quote from their paper that should give the flavor of what is going on: The choice of 20 is a natural anchor for an iterative reasoning process. It is the instinctive choice when choosing a sum of money between 11 and 20 shekels (20 is clearly the salient number in this set and “the more money the better”). Furthermore, the choice of 20 is not entirely naive: if a player does not want to take any risk or prefers to avoid strategic thinking, he might give up the attempt to win the additional 20 shekels and may simply request the highest certain amount.
Read the paper for how Arad and Rubinstein analyze the game. The trouble is that if you take a risk and select 19 then you at least have a chance to get the bonus 20: if you reason that your opponent is playing safe, that is a great play. Of course if they reason the same way, then you lose one shekel. These “levels” of play are central to many games, including chess.
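The iterative reasoning from the anchor of 20 can be spelled out in a few lines. This is only an illustrative sketch of iterated best response, not Arad and Rubinstein's own analysis. The function names are ours, and it assumes each level knows exactly what the level below it requests.

```python
LOW, HIGH, BONUS = 11, 20, 20


def payoff(mine, other):
    """Shekels received for requesting `mine` against a request of `other`."""
    return mine + (BONUS if mine == other - 1 else 0)


def best_response(other):
    """A request maximizing the payoff against a known request `other`."""
    return max(range(LOW, HIGH + 1), key=lambda mine: payoff(mine, other))


def level_k_request(k, anchor=HIGH):
    """Level 0 requests the anchor; level k best-responds to level k - 1."""
    request = anchor
    for _ in range(k):
        request = best_response(request)
    return request


requests = [level_k_request(k) for k in range(11)]
# requests == [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 20]
```

Each level undercuts the previous one by a single shekel until the floor of 11 is reached, at which point undercutting is impossible and the best response cycles back to 20.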
An example is that a move may have one refutation that the computer can spot at high depth, but otherwise bring higher returns than playing it safe with the computer’s “best” move. How can we judge when such moves can be expected to pay off? Risky opening ‘novelties’ have been tried many times in chess, and in one famous game where Frank Marshall had saved up a gambit for nine years, the human player José Capablanca did find the refutation at the board.
We all wish that Ken has many more birthdays and unbirthdays. We also hope he makes progress on his open problems about depth of thinking and levels of play. What should you select in the simple coin game?
A new approximation algorithm
Composite of src1, src2, src3
Ola Svensson, Jakub Tarnawski, and László Végh have made a breakthrough in the area of approximation algorithms. Tarnawski is a student of Svensson at EPFL in Lausanne—they have another paper in FOCS on matchings to note—while Végh was a postdoc at Georgia Tech six years ago and is now at the London School of Economics.
Today Ken and I want to highlight their wonderful new result.
Svensson, Tarnawski, and Végh (STV) have created a constant-factor approximation algorithm for the asymmetric traveling salesman problem (ATSP). This solves a long-standing open problem and is a breakthrough of the first order.
Recall that the traveling salesman problem (TSP) is the problem of finding the cheapest tour that visits all vertices of a weighted undirected graph at least once, and the ATSP allows the graph to be directed. This difference changes the problem tremendously—it also opens up new applications. Think airline routes for dates before and after the recent eclipse: the one-way fares were not symmetric.
Below is an optimal TSP tour of 13,509 incorporated cities in the continental United States as of 1998 when it was solved. Note at bottom right that the tour includes a visit to Key West; our hearts are with all those affected by Hurricane Irma.
TSP website source |
This uses the Euclidean distance. It is common to allow any metric $d$ that satisfies the triangle inequality: for any nodes $x, y, z$, the cost $d(x,z)$ of going from $x$ to $z$ is no more than that of going from $x$ to $y$ and then from $y$ to $z$, that is, $d(x,z) \le d(x,y) + d(y,z)$. If we have any cost function $c$ but allow the salesman to “pass through” cities already visited, we can re-define $d(x,z)$ to be the minimum cost of getting from $x$ to $z$ allowing transit through one or more intermediate cities $y$. Then $d$ satisfies the inequality and gives the same optimum. Conversely, if $c$ satisfies the inequality then the pass-through rule is superfluous. So allowing it is equivalent to having the triangle inequality.
Without the rule or the inequality, we could take any hard instance graph $G$ of the (directed or undirected) Hamiltonian cycle (HC) problem and add some high-cost edges to get a graph $G'$. Then approximating TSP or ATSP for $G'$ is equivalent to solving HC for $G$. Assuming the triangle inequality avoids such cases and is in force for STV.
What is so interesting about the difference between the TSP and the ATSP is that a constant approximation has long been known for the TSP. Indeed, getting a factor of $2$ is easy: find a minimum spanning tree $T$, consider first the tour that uses the pass-through rule to travel each edge of $T$ forward-and-back, and finally improve it by going directly to the next unvisited vertex in that tour rather than passing through. Getting the best-known factor $3/2$ is based on a simple, but very clever, algorithm by Nicos Christofides. It finds a minimum-weight perfect matching $M$ for the nodes of odd degree in $T$ using the edges they induce in the input graph $G$, creates an Euler cycle from $T \cup M$ (traversing any edge common to $T$ and $M$ twice), and finally improves as above. There has been progress on special cases (see here), but Christofides’s algorithm has withstood attempts to improve it for over forty years. Pretty impressive.
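The easy factor-two argument can be sketched in a few lines. This is our own minimal illustration, not code from any of the papers: the point set and the function names `mst_prim` and `double_tree_tour` are made up for the example, and we assume a symmetric distance matrix obeying the triangle inequality. A preorder walk of the spanning tree visits the vertices in the same order as the doubled-tree tour with repeat visits shortcut.

```python
import math

def mst_prim(dist):
    """Minimum spanning tree by Prim's algorithm.
    Returns (children lists rooted at vertex 0, total tree weight)."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    children = [[] for _ in range(n)]
    total = 0.0
    for _ in range(n):
        # pick the cheapest vertex not yet in the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return children, total

def double_tree_tour(dist):
    """Preorder walk of the MST: the forward-and-back tour with shortcuts."""
    children, mst_weight = mst_prim(dist)
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    cost = sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))
    return tour, cost, mst_weight

# Hypothetical example: six points in the plane with Euclidean distances.
points = [(0, 0), (0, 3), (4, 3), (4, 0), (2, 1), (1, 2)]
dist = [[math.dist(p, q) for q in points] for p in points]
tour, cost, mst_weight = double_tree_tour(dist)
print(tour, round(cost, 3), round(mst_weight, 3))
```

Since the tree weighs no more than an optimal tour, and shortcuts never cost extra under the triangle inequality, the returned cost is at most twice the tree weight and hence at most twice optimal.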
The input for ATSP is a directed graph $G$ together with a nonnegative cost function $c$ on its edges. The graph is strongly connected, meaning that for all nodes $u, v$ there is some path from $u$ to $v$. If there is no edge $(u,v)$ we could add one and define $c(u,v)$ to be the minimum path cost as above, and so make $G$ into a complete directed graph. However, the STV paper does a series of reductions through problems in which the absence of edges matters.
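Completing a strongly connected directed graph by minimum path costs can be sketched with the Floyd-Warshall recurrence. The four-node one-way network below is a made-up example, and `metric_closure` is our own name for the helper. The closure satisfies the triangle inequality but need not be symmetric.

```python
import math

def metric_closure(cost):
    """All-pairs cheapest path costs on a directed graph (Floyd-Warshall).
    cost[u][v] is math.inf where there is no edge from u to v."""
    n = len(cost)
    d = [row[:] for row in cost]
    for v in range(n):
        d[v][v] = 0.0
    for k in range(n):              # allow transit through node k
        for u in range(n):
            for v in range(n):
                if d[u][k] + d[k][v] < d[u][v]:
                    d[u][v] = d[u][k] + d[k][v]
    return d

INF = math.inf
# Hypothetical network: a one-way cycle 0 -> 1 -> 2 -> 3 -> 0 plus a costly edge 0 -> 2.
cost = [
    [INF, 1.0, 10.0, INF],
    [INF, INF, 2.0, INF],
    [INF, INF, INF, 3.0],
    [4.0, INF, INF, INF],
]
d = metric_closure(cost)
print(d[0][2], d[2][0])   # going 0 to 2 and going 2 to 0 cost different amounts
```

Here the direct edge from node 0 to node 2 is beaten by the two-step route through node 1, while the way back has to go around through node 3.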
For intuition, picture not a salesman but a big delivery truck doing its rounds on the one-way streets of Manhattan. By using more than one node at intersections we can model another feature of Manhattan, which is often not being able to make a left or even right turn. This makes Manhattan behave like a non-planar graph and turns the counting measure of blocks you must travel into a non-Euclidean distance, but still one obeying the triangle inequality.
Cropped from free Flickr source |
Now picture an army of scooters or bicycles, each taking one or a few packages—fractions of the job. They are still subject to the road rules and cost measure (not like drone delivery which is illegal in most cities). Modeling them yields the linear programming (LP) relaxation of (A)TSP studied by Michael Held and Dick Karp, which we discussed here. Its optimum is a lower bound on the optimal amount of work for the truck.
The point is that if we can find tours with cost only a constant factor higher than the Held-Karp optimum $\mathrm{HK}(G)$ then we’ve automatically achieved a constant factor approximation of the true optimum $\mathrm{OPT}(G)$, since $\mathrm{HK}(G) \le \mathrm{OPT}(G)$. The second point is that the LPs defining $\mathrm{HK}(G)$, while large, are nice to analyze. So their main theorem is:
Theorem 1 There is a constant $C$ and a polynomial-time algorithm that, given any ATSP instance $G$, returns a tour of cost at most $C \cdot \mathrm{HK}(G)$.
Well, the constant proved by STV is $5500$, not “galactic” but big. What is significant is that all previously proved overheads grew with the number of nodes $n$. Once we achieve a constant factor we can think about improving it…
We quote their summary of the proof:
We now combine the techniques and algorithms of the previous sections to obtain a constant-factor approximation algorithm for ATSP. In multiple steps, we have reduced ATSP to finding tours for vertebrate pairs. Every reduction step was polynomial, and increased the approximation factor by a constant. Hence, altogether they give a constant-factor approximation algorithm for ATSP.
See the 39-page paper for the meaning of “vertebrate pair” and words like “laminarly” which we didn’t know were legal in Scrabble. They are anyway far removed from the classic vocabulary “spanning tree,” “Euler tour,” “perfect matching,” and “Hamilton cycle” which sufficed for Christofides’s still-frontline algorithm.
What we note here is their proof structure using reductions to progressively-refined problems. They use previous steps to build algorithms for each problem, with the respective names:
Yes, the subscripts of the second and fourth stand for “vertebrate” and “laminar” while the third algorithm works on a reduction of the second problem to “irreducible” instances. Each step has its own approximation ratio, whose combination becomes for a customizable which they bound by .
Our point is that the proof makes a series of seemingly incremental refinements with loose ends left unvisited until we see at some step that it can “close” and finish its objective—which makes the excursion into a tour. We want to think more deeply about other potential progress of this form that may be capable of breakthroughs in our field.
They never claim that they are trying to optimize the factors in their reduction steps or the final constant. An obvious question, which no doubt will be taken up, is how far the constant can be improved. Is there a chance to get a small one?
A gathering this Labor Day in Rochester
Announcement source |
Joel Seiferas retired on December 31, 2016 and is now a professor emeritus in the University of Rochester Computer Science Department.
Today Ken and I wish to talk about his party happening this Labor Day—September 4th.
Joel retired on a holiday—New Year’s Eve—and is having his retirement celebration on another holiday, Labor Day. The former marks the end of each year, and the latter the cultural end of each summer. Labor Day in the US is the first Monday in September. As shown by this chart in Wikipedia’s article, there is some complexity in its otherwise periodic structure—can you name the pope responsible for it?
I’ve never liked Labor Day owing to summer ending, school starting, and another reason—a fact of calendrical life that I share with Ken. Did Jack Benny’s feelings about Valentine’s Day change after he turned 39?
Joel told his department that, in lieu of a gold watch or a series of talks praising his decades of research, he wanted a series of talks that would be accessible and enjoyable to everyone. A somewhat novel idea, since many talks are not accessible to all.
I would argue that both could be achieved: Joel’s work was often technical, but could definitely be explained to a general audience. For example, at least in my opinion, his paper “Two Heads are Better than Two Tapes” could be fun to hear about. Okay, it may not be as exciting as hearing about self-driving cars, AI programs that can outplay humans at Go, or a proof that P=NP. But there is something, I believe, beautiful about results of Joel’s that explain the power of various basic computational devices.
But no one asked me. So Joel got his wish and is receiving four talks by leaders in our field that should be enjoyable for everyone:
They have posted a wonderful statement about Joel here. We especially enjoy how it ends:
The fact that Joel is completely ego-free and did all that work (and the single-handed development of widely used bibliographic bridging resources) purely for the advancement of the science—while also mentoring (especially in his ambitious and challenging courses) most of the theory students the department has educated, shaping the department’s faculty recruiting in theoretical computer science, serving as the department’s chair, and wholeheartedly serving the University in his many years on the Academic Honesty Board—makes him all the more of an inspiration to those who know him.
By coincidence, the great economist and public intellectual, Thomas Sowell, retired from his decades of column writing at the same time that Joel retired. At the end of a 2004 interview, Sowell was asked how he would like to be remembered, and he replied: “Oh, heavens, I’m not sure I want to be particularly remembered. I would like the ideas that I’ve put out there to be remembered.” Although our dear colleague Joel is self-effacing and modest, there is no doubt that the deep understanding of computer science that he has contributed will be remembered beyond Joel’s life and ours. We are deeply grateful to him for those ideas, and for his warm, wise friendship.
The earlier parts of the statement include some of Joel’s work and its impact. We’ll explain two of his results that are referenced.
One basic fact about general-purpose computing is the ability of one program $U$ to run any given program $P$, even itself. The program $P$ is encoded in some fashion and may be much larger than $U$. In practice we’re not aware of a time penalty for running $P$ this way rather than “natively” because serving programs is what a general-purpose computer does. But in the underlying model of computation there is a hit.
In some models the hit is only a constant factor depending on the size of $P$ and, importantly, not on the size of the input $x$ being run. One of the quirks of the standard multi-tape deterministic Turing machine (DTM) model is that it multiplies the constant factor by an extra overhead: a $\log t$ factor for $t$-step computations, where we assume $t \ge n$, with $n$ the length of $x$, so that all of $x$ is read. This is not just a feature of running $U$ but governs the best-known simulation of time-$t$ deterministic computations by families of Boolean circuits, which have size $O(t \log t)$. It also affects how tightly we can separate time classes. By diagonalization we can prove:
Theorem 1 Let $t_1(n) \log t_1(n) = o(t_2(n))$ with $t_2$ being “time constructible” in the sense that some DTM given $n$ in unary halts in exactly $t_2(n)$ steps. Then we can find a language that is accepted by a DTM in $O(t_2(n))$ time but not accepted by any DTM in $O(t_1(n))$ time. In symbols, $\mathsf{DTIME}[t_1(n)] \subset \mathsf{DTIME}[t_2(n)]$, with $\subset$ meaning proper containment.
For separating we can improve the factor via certain “padding and translation” techniques. We can relax the first condition to read for some fixed . Can we make it go away completely? For DTMs there are specific problems that arise when we try to push the factor any further.
So what about nondeterministic Turing machines, that is, NTMs? It would seem to be harder to get a tight diagonalization to work because we cannot simply interchange ‘yes’ and ‘no’ answers to negate. However, Joel, as part of his thesis work under Albert Meyer, with Patrick Fischer making a trio, proved that one can almost eliminate the factor altogether:
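Theorem 2 Let $t_1(n+1) = o(t_2(n))$ with $t_2$ again being the running time of some DTM. Then $\mathsf{NTIME}[t_1(n)] \subset \mathsf{NTIME}[t_2(n)]$.
The only difference from a constant-factor overhead is having $t_1(n+1)$ not $t_1(n)$ on the left-hand side. The ‘+1’ has always struck me (Ken, writing this section). At polynomial time levels we can ignore it and even when $t_1(n) = 2^n$ we can drop it: $t_1(n+1) = 2 \cdot 2^n = O(t_1(n))$. But with $t_1(n) = 2^{n^2}$ we have $t_1(n+1) = 2^{2n+1} \cdot 2^{n^2}$, which is not $O(t_1(n))$. At double-exponential time levels the ‘+1’ makes an even greater relative difference. Its employment in the proof seems innocuous but remains indelible.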
As the Rochester page mentions, the tightness of the theorem for was employed by Ryan Williams to prove his breakthrough lower bounds against . Our initial post on Ryan’s results brought out the connection to Joel.
This is one of Joel’s later results. It is joint with Tao Jiang and Paul Vitányi. The result is:
Theorem 3 The language $L$ of strings of the form $x2x'$ where $x'$ is a prefix of the binary string $x$, which can be recognized in real time by a DTM with two heads on one worktape, cannot be recognized in real time by a DTM with two worktapes having one head each.
It is important to specify that the input appears on a read-only tape whose head cannot move left. The “one tape” and “two tapes” are initially blank.
The first statement is easy. As the machine $M$ reads the input it writes it down on its tape with one head and cleverly leaves the second head at the beginning. Then when the input hits the separator symbol $2$ the second head starts checking the remaining input against the written tape that stores the first part $x$. The two heads compare characters in lockstep and reach a verdict by the time the input has ended. The definition of real time often states that the input tape head advances at each step, but it can be relaxed to allow pausing it, provided there is a fixed finite number $k$ independent of the input such that $M$ always reads a fresh character within $k$ steps.
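To make the easy direction concrete, here is a small simulation of the two-head strategy. It is only a sketch: the separator constant `SEP` and the function names are our own choices, and a Python list stands in for the single worktape. The loop does a constant amount of work per input symbol, which is the real-time property.

```python
SEP = "2"  # separator symbol; an assumption of this sketch

def two_head_real_time(stream):
    """Simulate one worktape with two heads, one step per input symbol.
    Returns the running accept/reject verdict after each symbol."""
    tape = []          # the single worktape
    write_head = 0     # head 1: writes the first part as it arrives
    read_head = 0      # head 2: waits at the left end, then re-reads the tape
    seen_sep = False
    ok = True
    verdicts = []
    for ch in stream:
        if not seen_sep:
            if ch == SEP:
                seen_sep = True
            elif ch in "01":
                tape.append(ch)
                write_head += 1
            else:
                ok = False
        else:
            # lockstep comparison: one tape cell per fresh input symbol
            if read_head < write_head and tape[read_head] == ch:
                read_head += 1
            else:
                ok = False
        verdicts.append(ok and seen_sep)
    return verdicts

def accepts(s):
    """True when the whole string is a binary word, the separator, then a prefix of that word."""
    v = two_head_real_time(s)
    return bool(v) and v[-1]

print(accepts("01102011"), accepts("0110210"))
```

The second head never has to travel: it is already parked at the left end when the separator arrives. That is exactly what a lone head on its own tape cannot arrange.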
What happens when the tape heads are on separate tapes? The machine $M$ can copy the first part $x$ to one tape, rewind its head to the left edge, and do the same lockstep comparison with the input tape head reading the second part $x'$ and the second tape reading the copy of $x$. The rub is that the pause for rewinding does not have a fixed finite bound. After copying $x$ the second head is just in the wrong position to begin comparing $x'$.
The second statement may now look obvious to you, but how about if $M$ has three tapes? Does it look equally obvious? In fact, Fischer and Meyer, working with Arnold Rosenberg in 1967, showed that a clever folding and mirroring scheme among three single-head worktapes enables $M$ to keep tabs on both ends of the $x$ part at all times while copying it. This enables the language $L$ of Theorem 3 to be recognized in real time. Moreover Joel, with Benton Leong, showed in 1977 that any computation with multi-head worktapes can be simulated in lockstep by enough single-head tapes, even when the “tapes” are multi-dimensional.
That left single-head worktapes as the next case to try for recognizing . As our readers know, proving something to be impossible in computational theory is really hard. So is the proof in Joel’s joint paper, across eight pages of strategy and “crunch.” The question had after all been open for several decades. If you wonder whether other simple-looking problems are still open, see these two posts, which also reference related work by the FMR trio.
We wish Joel a great retirement and hope that the talks are as wonderful as we expect them to be. Ken is also looking forward to seeing old friends again there. Our thoughts are also with colleagues and friends in Houston and adjoining Gulf Coast areas.
A topical look at Norbert Blum’s paper and wider thoughts.
Cropped from source |
Thales of Miletus may—or may not—have accurately predicted one or more total solar eclipses in the years 585 through 581 BCE.
Today we discuss the nature of science viewed from mathematics and computing. A serious claim of $\mathsf{P} \neq \mathsf{NP}$ by Norbert Blum has shot in front of what we were planning to say about next Monday’s total solar eclipse in the US. Update 9/2/17: Blum has retracted his claim; see update at end.
Predicting eclipses is often hailed as an awakening of scientific method, one using mathematics both to infer solar and lunar cycles and for geometrical analysis. The aspects of science that we want to talk about are not “The Scientific Method” as commonly expounded in step-by-step fashion but rather the nature of scientific knowledge and human pursuits of it. We start with an observation drawn from a recent article in the Washington Post.
Despite several thousand years of experience predicting eclipses and our possession of GPS devices able to determine locations to an accuracy of several feet, we still cannot predict the zone of totality any closer than a mile.
The reason is not any fault on Earth but with the Sun: it bellows chaotically and for all we know a swell may nip its surface yea-far above the lunar disk at any time. Keeping open even a sliver of the nuclear furnace changes the character of the experience.
The Post’s article does a public service of telling people living on the edge of the swath not to think it is a sharp OFF/ON like a Boolean circuit gate. People must not always expect total sharpness from science. Happily there is a second point: you don’t have to drive very far to get a generous dose of totality. This is simply because as you move from the edge of a circle toward the center, the left-to-right breadth of the interior grows initially very quickly. This is our metaphor for how science becomes thick and solid quickly after we transit the time of being on its edge.
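The geometric point about the circle is easy to check numerically. The sketch below idealizes the zone as a unit disk, which is our simplification and not a model from the Post article. The function name `breadth_fraction` is ours. It gives the chord length, as a fraction of the diameter, at a given depth in from the edge.

```python
import math

def breadth_fraction(depth_fraction):
    """Chord length as a fraction of the diameter, at a given depth inside a disk.
    depth_fraction is 0 at the edge and 1 at the center."""
    offset = 1.0 - depth_fraction              # distance from the center, in radii
    return math.sqrt(max(0.0, 1.0 - offset * offset))

for depth in (0.01, 0.05, 0.10, 0.25, 0.50):
    print(f"{depth:4.0%} of the way in -> {breadth_fraction(depth):.0%} of full breadth")
```

Going a tenth of the way from the edge toward the center already gives about 44% of the full breadth, because the chord length behaves like a square root near the edge.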
Incidentally, your GLL folks will be in the state of New York next week, nowhere near the swath. Next time in Buffalo. Also incidentally, Thales is said to be the first person credited with discovering mathematical theorems, namely that a triangle made by a circle’s diameter and another point on the circle is a right triangle and that lengths of certain intersecting line segments are proportional.
The transit time is our focus on this blog: the experience of doing research amid inspirations and traps and tricks and gleams and uncertainty. Swaths of our community are experiencing another transit right now.
Norbert Blum has claimed a proof that P is not equal to NP. In his pedigree is holding the record for a concrete general Boolean circuit lower bound over the full binary basis for over 30 years, until it was recently nudged from his $3n - o(n)$ to $(3 + \frac{1}{86})n - o(n)$. His paper passes many filters of seriousness, including his saying how his proof surmounts known barriers. Ken and I want to know what we all want to know: is the proof correct?
More generally, even if the proof is flawed, does it contain new ideas that may be useful in the future? Blum’s proof claims a very strong lower bound of on the circuit complexity of whether a graph of edges has a clique of size . He gets a lower bound of for another function, where the tilde means up to factors of in the exponent. We would be excited if he had even proved that this function has a super-linear Boolean complexity.
Blum’s insight is that the approximation methods used in monotone complexity on the clique function can be generalized to non-monotone complexity. It is launched by technical improvements to these methods in a 1999 paper by Christer Berg and Staffan Ulfberg. This is the very high level of what he tries to do, and is the one thing that we wish to comment on.
Looking quickly at the 38-page argument, we saw an issue arise. We thought we would share it. It is not a flaw; it is an issue that we think needs to be thought about more expressly.
As we understand his proof it takes a Boolean circuit for some monotone function $f$ and places it in some topological order. Let this be
$$ g_1, g_2, \dots, g_m. $$
So far nothing unreasonable. Note $g_m$ is equal to $f$, of course. Then it seems that he uses an induction on the steps of the computation. Let $I_k$ be the information that he gathers from the first $k$ steps. Technically $I_k$ tells us something about the computation so far. The punch line is then that $I_m$ tells us something impossible about $g_m$ which is of course $f$. Wonderful. This implies the claimed lower bound on $m$ which solves the $\mathsf{P}$ versus $\mathsf{NP}$ question.
The trouble with this is the following. We studied this before and it is called the “bait and switch” problem. Let $r$ be some random function of polynomial Boolean complexity and let $g = f \oplus r$, where $f$ is the function the circuit is supposed to compute. Then assume that there is a polynomial size circuit for $f$. Clearly there is one for $r$ and $g$ too. Create a circuit that mixes the computing of $r$ and $g$ in some random order. Let the last step of the circuit take $r$ and $g$ and form $r \oplus g$. Note this computes $f$.
The key point is this:
No step of the computation along the way has anything obvious to do with $f$. Only at the very last step does $f$ appear.
This means intuitively to us that an inductive argument that tries to compute information gate by gate is in trouble. How can the pieces of information that the proof computes have any information about $f$ during the induction? This is not a “flaw” but it does seem to be a serious issue.
If nothing else we need to understand how the information suddenly at the end unravels and reveals information about $f$. We find this issue troubling. It is important to note that this trick cannot seem to be applied to purely monotone computations, since the last step must be non-monotone: it must compute the $\oplus$ function. The old post also notes a relation between the standard circuit complexity and the monotone complexity of a related function.
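A toy version of the bait-and-switch construction can be run on truth tables. Everything here is illustrative and assumed for the example: the names `f`, `r`, `g`, the four-variable threshold function standing in for the target, and the seeded random mask. A truly random truth table does not have small circuits in general, so this shows only the masking effect and not the complexity claim.

```python
import itertools
import random

def truth_table(func, n):
    """Values of func on all 0/1 inputs of length n, in lexicographic order."""
    return [func(bits) for bits in itertools.product((0, 1), repeat=n)]

n = 4
rng = random.Random(0)                     # fixed seed so the run is repeatable

def f(bits):
    """Stand-in monotone target: at least two inputs are 1."""
    return int(sum(bits) >= 2)

f_table = truth_table(f, n)
r_table = [rng.randint(0, 1) for _ in range(2 ** n)]    # the random mask r
g_table = [a ^ b for a, b in zip(f_table, r_table)]     # g = f XOR r

# The final gate is the only place where f reappears.
out_table = [a ^ b for a, b in zip(r_table, g_table)]
print(out_table == f_table)
```

Every intermediate value is a value of the mask or of the masked function, and neither is monotone in general. The target shows up only at the last exclusive-or.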
While we are grappling with the paper and writing these thoughts we are following an ongoing discussion on StackExchange and in comments to a post by Luca Trevisan, a post by John Baez, and a Hacker News thread, among several other places.
The paper has a relatively short “crunch” in its sections 5 and 6, pages 25–35. These follow a long section 4 describing and honing Berg and Ulfberg’s work. What the latter did was show that a kind of circuit approximation obtained via small DNF formulas in Alexander Razborov’s famous lower-bound papers (see also these notes by Tim Gowers) can also be obtained with small CNF formulas. What strikes us is that Blum’s main theorem is literally a meta-theorem referencing this process:
Theorem 6: Let be any monotone Boolean function. Assume that there is a CNF-DNF-approximator which can be used to prove a lower bound for . Then can also be used to prove the same lower bound for .
The nub being discussed now is whether this theorem is “self-defeating” by its own generality. There may be cases of $f$ that meet the hypotheses but have polynomial circuit complexity. The StackExchange thread is discussing this for functions of Boolean strings denoting $n$-node $m$-edge graphs that give value $1$ whenever the graph is a $k$-clique (with no other edges) and $0$ when it is a complete $(k-1)$-partite graph. Such functions related to the theta function of László Lovász (see also “Theorem 1” in this post for context) have polynomial complexity, meet the conditions of Razborov’s method, and don’t appear to obstruct Berg and Ulfberg’s construction as used by Blum. But if they go through there, and if Blum’s further constructions using an inductively defined function would go through transparently, then there must be an error.
Update 8/22/17: This objection and the inference of error have been verified. A Wikipedia page for Éva Tardos’s function has been created. It seems to us that another form of the monotone function besides hers using integer approximations is just if the Lovász number of the complement of (presented as an edge list) is the number of 0’s in
The details of in section 5 have also been called into question. We are unsure what to say about a claim by Gustav Nordh that carrying out the inductive construction as written yields a false conclusion that the monomial is an implicant of a formula equivalent to . There are also comments about unclarity of neighboring definitions, including this from Shachar Lovett in Luca’s blog since we drafted this section.
But this leads us to a larger point. Both of us are involved right now with painstaking constructions involving quantum circuits and products of permutations that we are programming (in Python). Pages 27–28 of Blum’s paper give a construction that can be programmed. If this is done enough to crank out some examples, then we may verify that potential flaws crop up or alternatively bolster confidence in junctures of the proof so as to focus on others first. This ability is one way we are now empowered to sharpen “fuzzy edges” of our science.
Is the proof correct? Or will it fall into eclipse? We will see shortly no doubt. Comparing this table of eclipses since 2003 and Gerhard Woeginger’s page of claimed proofs over mostly the same time period, we are struck that ‘‘ and ‘‘ claims have been about twice as frequent as lunar and solar eclipses, respectively.
Update 8/18: This comment by user “vloodin”, whom we remember well from the discussion here of Vinay Deolalikar’s proof attempt seven years ago, lays out the apparent flaw in the paper in more detail.
Update 9/2: On 8/30, Blum posted to ArXiv a v2 that comprises the comment, “The proof is wrong. I shall elaborate precisely what the mistake is. For doing this, I need some time. I shall put the explanation on my homepage.” We have no further substantial information.
Update 10/18: Norbert Blum has posted a detailed but short explanation of the mistake.
[restored missing links; a few word and format changes, Uffberg->Ulfberg, updates 8/18 and 8/22]
Composite of src1, src2. |
Olivier Bournez and Amaury Pouly have proved an interesting theorem about modeling physical systems. They presented their paper at ICALP 2017 last month in Warsaw.
Today Ken and I wish to explain their theorem and its possible connections to complexity theory.
Of course as theorists we are most interested in discrete systems and rarely if ever mess with differential equations. I do recall, with some awe, that when I started my career at Yale the numerical analysts were experts at ODEs: that’s ordinary differential equations for the rest of us. An ODE is an equation that involves a function of one independent variable and its derivatives. A famous one is Isaac Newton’s second law of motion:
$$ m \frac{d^2 x}{dt^2} = F(x(t)). $$
They used their ability to guess solutions to discrete recurrence systems. I do not believe there is an exact meta-theorem connecting discrete recurrences with ODEs but at the heuristic level they were able to just look at a recurrence and say I believe the solution is And they were usually right.
Let’s get back to what Bournez and Pouly (BP) have proved.
Let’s first state what BP proved and then discuss their result. Their result is an extension of a 1981 theorem of Lee Rubel, which Bournez and Pouly call “an astonishing fact.” Rubel proved there is a differential equation that is “universal” in the sense expressed by his theorem statement:
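Theorem 1 There exists a fourth-order differential algebraic equation (DAE)
$$ P(y', y'', y''', y'''') = 0 \qquad (*) $$
where $P$ is a polynomial in four variables with integer coefficients such that for any continuous function $\phi$ on the real line and any $\epsilon > 0$ there is a smooth solution $y$ to (*) such that for all real $t$,
$$ |y(t) - \phi(t)| < \epsilon. $$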
Rubel actually proved more: the theorem allows to be any continuous positive function: so the error between the solution and can decay off as tends to plus or minus infinity. He also exhibited specific polynomials . If we rename the differentials to variables , respectively, then Rubel’s simplest one is:
Bournez and Pouly note that simpler ones were found by others, including two whole families by Richard Duffin and one in a neat paper by Keith Briggs where one can take any :
Note that all three have the same “differential monomials” and approach each other as one chooses higher, as is most obvious on multiplying by .
The theorem may be astonishing but according to BP is also a bit disappointing. They say:
As we said, Rubel’s proof can be seen as an indication that (fourth-order) polynomial implicit DAE is too loose a model compared to classical ODEs, allowing in particular to glue solutions together to get new solutions. As observed in many articles citing Rubel’s paper, this class appears so general that from an experimental point of view, it makes little sense to try to fit a differential model because a single equation can model everything with arbitrary precision.
They cite two more deficiencies of all these results:
First, … the proofs heavily rely on the fact that constructed DAE does not have unique solutions for a given initial data. … Rubel’s DAE never has a unique solution, even with a countable number of [initial] conditions of the form . Second, the proofs usually rely on solutions that are piecewise defined. Hence they cannot be analytic, while analycity is often a key expected property in experimental sciences.
This leads into what BP proved (their emphasis):
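Theorem 2 There exists a fixed polynomial $p$ in $d+1$ variables, for some $d$, so that for any continuous $f$ and any $\epsilon > 0$ there exist $\alpha_0, \dots, \alpha_{d-1}$ so that there is a unique analytic function $y$ with $y(0) = \alpha_0, y'(0) = \alpha_1, \dots, y^{(d-1)}(0) = \alpha_{d-1}$ so that
- The function $y$ is a solution in the sense that $p(y, y', \dots, y^{(d)}) = 0$.
- For all real $t$, $|y(t) - f(t)| \le \epsilon$.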
As with Rubel’s result they actually prove the stronger form where can be any continuous positive function so that the error between the solution and can decay off just like before.
BP’s theorem differs from Rubel’s in that their solution is unique and analytic. This has several ramifications. For one it makes the proof harder, since gluing functions together as Rubel did fails for analytic functions. The proof uses some programming tricks instead. One ingredient is that the function
can be modulated for any irrational to bring both and arbitrarily close to multiples of without the denominator vanishing. That is, they can make the fraction grow fast while retaining analyticity. Émile Borel once conjectured that solutions to -variable ODEs that are defined on all of must have growth bounded by a stack of exponentials. This conjecture was refuted even for , and BP were able to apply the refutation.
Second, their theorem cuts down the extreme multiplicity of solutions all the way to one. In real life we expect that the equation given the initial conditions uniquely forces the total behavior of the solution. As they quote, Rubel’s paper had even given as open “whether we can require in our theorem that the solution that approximates to be the unique solution for its initial data.”
Their result can be viewed both as a positive result and a negative one. On the one hand they show that ODEs are very powerful and can model almost any reasonable computation. You can think of this as showing the power of analog computation. On the other hand, they show that models based on ODEs may likewise be too powerful. They too may be so powerful that they are not really useful.
One weakness is that their polynomial is not so simple. They outline how their construction can be effectivized to give somewhere north of 300. Possibly it can be tightened much further. They point out an analogy with the kind of universality shown for the Riemann zeta function by Sergei Voronin, citing one of our two posts on it for reference. The zeta function is analytic but cannot be a solution to any polynomial ODE (nor DAE). We can try to motivate the task of improving and by noting how a minimum universal ODE is a natural and fixed version of a minimum universal program.
What are some further implications of the new result for modeling nature? Is there some way we can stratify differentially simulatable systems according to some (concrete) measure of their computational complexity?
Including debt to Marina Ratner, 1938-2017
By joint permission of Assad Binakhahi, artist, and Radio Farda (source) |
Maryam Mirzakhani won the Fields Medal in 2014. We and the whole community are grieving after losing her to breast cancer two weeks ago. She made several breakthroughs in the geometric understanding of dynamical systems. Who knows what other great results she would have found if she had lived: we will never know. Besides her research she also was the first woman and the first Iranian to win the Fields Medal.
Today we wish to express both our great sorrow and our appreciation of her work.
An article in 2014 by Jordan Ellenberg called her win a “cultural change in mathematics” not for her gender or nationality but for her field of dynamics. He called it “an infant compared to the other major branches of math.” Now dynamics has been studied since long before Isaac Newton, and we’ve covered the three-body problem among other topics. What he means is that abstraction away from physics was needed to boost mathematical tools of analysis and that this gained thrust only in the second half of the 1900s.
We can put it this way: Dynamics has always been a moving target. The work that Mirzakhani furthered gives it a fixed frame. Whole ensembles of possible motions can be represented by parameters to form a space—one like a manifold but with a quotient structure. This space becomes a single geometric object by which to analyze the dynamics. We can give a facile analogy to how Boolean circuits are often considered easier to analyze than Turing machines because they are fixed whereas Turing machines move. But there is a greater potential conduit to problems in complexity theory: both her work and the attack on P vs. NP by Ketan Mulmuley and co-workers involve orbits and their closures.
Perhaps the best example of a dynamical system to play with is the familiar executive toy of metal balls on strings. Usually there are five identical balls as at left below, but let’s say a junior executive might start with just two as shown in the middle.
Composite of various sources plus extra drawing. |
Now let’s transport the company to Edward Abbott’s Flatland. Junior executives there have two balls that go back and forth along a line inside a confined area. We don’t know how gravity would work in Flatland (at least not classical gravity), but the edges of the line segment would propel a ball colliding with them back toward the center. Of course we assume all collisions are perfectly elastic, meaning in particular that they conserve kinetic energy. Admittedly contrary to the illustrations, we also assume the “balls” are really point particles of vanishing radius.
We can now trade a ball for a dimension. We can represent configurations of the balls by points $(x,y)$ where $x$ is the displacement of the left ball from the left end and $y$ is the distance between the balls. These points form a triangle as shown, with left-right remaining the directions of the first ball and up-down corresponding to left-right for the second ball. The combined directions and velocities of the two balls become one direction and velocity of the blue ball shown in the triangle. The two balls collide (remember we made their radii infinitesimal) when the blue ball is on the hypotenuse.
The neat fact is that the dynamics of the two balls in 1D are faithfully represented by the Newtonian behavior of the one ball in the triangle. Collisions with the sides or with each other, at any velocities, become angle-preserving collisions with the sides. A proof may be found here (first pages) along with a representation of three constrained particles on a circle. The only thing we need to avoid is if the two balls hit the left and right sides simultaneously or hit each other against a side. That corresponds to the blue ball hitting a corner, a singular event we are entitled to ignore. Abracadabra, our executive is now gaming at billiards on a triangular table.
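The correspondence can be checked numerically in a small sketch. It assumes equal masses (the balls are identical) and, as an illustrative choice that may differ from the coordinates in the figure, records both balls by their positions $x_1 \le x_2$ measured from the left end, so that the balls meet exactly on the hypotenuse $x_1 = x_2$. All function names are hypothetical.

```python
import math

def reflect(v, n):
    """Mirror the velocity vector v in a wall whose unit normal is n."""
    dot = v[0] * n[0] + v[1] * n[1]
    return (v[0] - 2 * dot * n[0], v[1] - 2 * dot * n[1])

def balls_collide(v):
    """Equal-mass elastic collision in 1D: the two balls swap velocities."""
    return (v[1], v[0])

def left_wall_bounce(v):
    """The left ball hits the left end and reverses direction."""
    return (-v[0], v[1])

def right_wall_bounce(v):
    """The right ball hits the right end and reverses direction."""
    return (v[0], -v[1])

# Unit normals of the three sides of the triangle 0 <= x1 <= x2 <= length.
HYPOTENUSE = (1 / math.sqrt(2), -1 / math.sqrt(2))  # side x1 = x2
LEFT_LEG = (1.0, 0.0)                               # side x1 = 0
TOP_LEG = (0.0, 1.0)                                # side x2 = length

def same(u, w):
    """Compare two velocity vectors up to floating-point error."""
    return all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(u, w))

v = (3.0, -1.5)  # arbitrary velocities of the left and right ball
print(same(balls_collide(v), reflect(v, HYPOTENUSE)))
print(same(left_wall_bounce(v), reflect(v, LEFT_LEG)))
print(same(right_wall_bounce(v), reflect(v, TOP_LEG)))
```

Each 1D event is the same linear map as a mirror reflection in the matching side of the triangle, which is what "angle-preserving" means here.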
The last trick is the niftiest and works with any triangle—and more generally with polygons. We can reflect the triangle along one of its edges as diagrammed in a survey by Mirzakhani’s Stanford colleague Alex Wright which is a major source for our post:
The billiard trajectory becomes a straight line into the reflected copy. Obviously it would be nicer if we could analyze straight lines (that is, geodesics) in a larger space. When and how we can make the space may recall the tiling problems of our previous post but the rules are different. We need not tile in the plane but can use surfaces of arbitrary genus and metrics that allow angles greater than $2\pi$ around a conical point. This is where the special mathematical framework and tools for the work by Mirzakhani we are discussing enter in.
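The reflection trick is easiest to try on a table whose reflected copies do tile the plane. The sketch below is an illustration we add under that simplifying assumption: the table is the unit square rather than a triangle, and the names `fold`, `position`, and `simulate` are hypothetical. The ball moves in a straight line through the reflected copies, and folding the coordinates back recovers the bouncing trajectory.

```python
def fold(u):
    """Fold a coordinate on the unfolded line back into the table [0, 1]."""
    r = u % 2.0  # reflected copies repeat with period 2
    return r if r <= 1.0 else 2.0 - r

def position(start, velocity, t):
    """Ball position at time t: go straight in the unfolded plane, then fold."""
    return tuple(fold(s + v * t) for s, v in zip(start, velocity))

def simulate(start, velocity, t, steps=4000):
    """Naive step-by-step billiard with explicit bounces, for comparison."""
    x, y = start
    vx, vy = velocity
    dt = t / steps
    for _ in range(steps):
        x += vx * dt
        y += vy * dt
        if x < 0.0:
            x, vx = -x, -vx
        elif x > 1.0:
            x, vx = 2.0 - x, -vx
        if y < 0.0:
            y, vy = -y, -vy
        elif y > 1.0:
            y, vy = 2.0 - y, -vy
    return (x, y)

start, velocity = (0.2, 0.3), (0.7, -0.4)
print(position(start, velocity, 3.7))
print(simulate(start, velocity, 3.7))
```

The two printed points agree up to rounding, even though `position` never looks at a wall.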
We’ve exemplified that billiards can represent some other kinds of dynamical systems. Of course, billiards—even with just one ball—is interesting in itself. We can play it on tables shaped like other polygons besides triangles, or not polygons at all. Here are some questions we would like to answer:
Some of these questions are challenging even for triangles. Every acute triangle has a closed loop that visits the three bases of the three altitudes, but it is not known whether every obtuse triangle has a closed loop at all. On a convex billiard table the answer to question 3 is immediately yes, but what about non-convex tables? If the edges are mirrors and $x$ is a candle, we are asking whether $y$ is illuminated, and how much if any of the surface remains in shadow. Although Wikipedia traces the question only to Ernst Straus in the early 1950s, I wonder if Newton thought of it during his work on multiple-prism arrays in his great treatise Opticks. This book by Serge Tabachnikov has more.
The questions become more attackable if we assume that every interior angle of the polygon $P$ is a rational multiple of $\pi$. Then $P$ is called a rational polygon. There are only finitely many ways that $P$ can be iteratively reflected around one of its edges and the changes in orientation form a finite group that is dihedral. This is easy to visualize if the copies of $P$ tile the plane in the sense of the last post. Group theory and topology and abstract spaces extend our horizon because they can be used on polygons that don’t simply tile and allow us to apply the straight-line reflection trick.
A “clump” of non-overlapping polygons in the plane generates a translation surface if:
For example, we can take a square and pair the opposite edges. Identifying them creates a torus. Besides the familiar 3D donut shape of a torus, we can picture it in 2D via how squares tile the plane. If we take a single octagon and identify the four pairs of opposite sides then all eight vertices become identified as shown below at left. We get a translation surface with angles summing to $6\pi$ at one vertex. This time octagons cannot tile the plane but we can still picture the space with algebraic help.
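A short bookkeeping sketch, added by us, shows what the square and octagon gluings produce. It assumes a regular $n$-gon with $n$ divisible by 4 and opposite sides glued by translation, so that all vertices become one point as the text states for these two cases. The function name is hypothetical.

```python
def glued_polygon_invariants(n):
    """Return (cone angle as a multiple of pi, genus) for the glued regular n-gon.

    Assumes n is divisible by 4 and that gluing opposite sides identifies
    all n vertices to a single point of the surface.
    """
    if n % 4 != 0:
        raise ValueError("this sketch only handles n divisible by 4")
    vertices, edges, faces = 1, n // 2, 1
    euler = vertices - edges + faces   # Euler characteristic V - E + F
    genus = (2 - euler) // 2           # from euler = 2 - 2 * genus
    cone_angle_over_pi = n - 2         # interior angles sum to (n - 2) * pi
    return cone_angle_over_pi, genus

print(glued_polygon_invariants(4))  # square
print(glued_polygon_invariants(8))  # octagon
```

The square gives total angle $2\pi$ and genus 1, an ordinary flat torus. The octagon gives $6\pi$ concentrated at one conical point on a surface of genus 2.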
Two clumps generate the same space if one can be converted to the other by the operations of translating a polygon, bisecting a polygon along a diagonal, or doing the inverse of the latter to legally glue two polygons together. This equivalence relation is said to be difficult here but is evidently polynomial-time decidable.
We may also ignore interior edges; thus the reflections of the right triangle having smallest angle $\pi/8$ (shown at right in our figure) are considered to yield the octagonal translation surface. Indeed, every translation surface can be presented by a single polygon (see section 12 of this) but not necessarily one that is convex.
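How many reflected copies of a rational polygon are needed can be computed directly. The sketch below relies on the standard fact that if the angles are $\pi p_i / q_i$ in lowest terms and $N$ is the least common multiple of the $q_i$, then the orientations form a dihedral group of order $2N$. The function name is hypothetical, and `math.lcm` needs Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

def reflected_copies(angles_over_pi):
    """Number of orientations of a rational polygon under repeated reflection.

    angles_over_pi lists each interior angle divided by pi, as a Fraction.
    """
    denominators = [Fraction(a).denominator for a in angles_over_pi]
    n = lcm(*denominators)
    return 2 * n  # order of the dihedral group generated by the reflections

right_triangle = [Fraction(1, 2), Fraction(1, 8), Fraction(3, 8)]
square = [Fraction(1, 2)] * 4
print(reflected_copies(right_triangle))
print(reflected_copies(square))
```

The right triangle with smallest angle $\pi/8$ needs 16 copies, which is the number of such triangles that make up a regular octagon. The square needs 4 copies, which glue into a torus.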
Rotations and deformations of the polygons, however (shown in the middle of the figure), yield different spaces. We can describe those and other processes by groups acting on their coordinates. In the real plane there are two coordinates so we are talking about the general linear group $GL(2,\mathbb{R})$ of $2 \times 2$ matrices with real entries and its subgroups.
The reflections of a rational polygonal billiard table yield a translation surface, but not every translation surface arises that way. What do we gain by the extra generality? What we gain are the algebraic tools and one more trick:
Instead of looking at different starting points for the billiard ball and rotating the direction in which it starts moving, we can look at rotations and linear stretchings of the translation surfaces. That is, instead of the orbit of the ball, we can study the algebraic orbit of the space under $GL(2,\mathbb{R})$ or some of its subgroups.
The orbits have their own spatial structure. This is one of the great features of representation theory conceived by Sophus Lie: groups of matrices acting on spaces form topological spaces in their own right. Subgroups can be defined by parameters that act as coordinates for the orbit. So what happens when the space being acted on is a translation surface?
A simple answer was hoped for but experience with fractal behavior and chaos in related matters had restrained hopes of proving one. The answer by Mirzakhani in collaboration with Alex Eskin and joined by Amir Mohammadi was dubbed the “Magic Wand Theorem” in this survey by Anton Zorich:
Theorem 1 The closure of the orbit of a translation space is always a Riemannian manifold, moreover one definable by linear equations in period coordinates with zero constant term.
Despite the statement being simple and short the proof is anything but: almost half of the first paper’s 204 pages are devoted to approximation techniques employing random walks amid conditions of low entropy meaning low rate of divergence or “unpredictability.” Zorich says more about the wide panoply of techniques the proof brings together. Thus the ultimate dynamics were brainpower, knowledge, interaction, focus while assembling all the moving parts, and sheer hard work.
What does the “Magic Wand Theorem” do? To quote the title of a paper by Samuel Lelièvre, Thierry Monteil, and Barak Weiss, “Everything is Illuminated.” They solved question 3 above for rational polygons by showing that at most finitely many points remain in shadow—and illumination comes arbitrarily close to those points. It is just amazing that a simple question that Newton would have instantly understood needed such heft to answer. As they say in their abstract:
Our results crucially rely on the recent breakthrough results of Eskin-Mirzakhani and Eskin-Mirzakhani-Mohammadi, and on related results of [Alex] Wright.
Wright’s survey also notes that Theorem 1 converts many results of the form ‘X happens in almost all cases (but we don’t know specifically which)’ into ‘X happens in all cases.’
The theorem also makes previous upper and lower bounds for certain counting problems coincide. Incidentally, one of the major results in Mirzakhani’s PhD thesis, cited in the article accompanying her Fields Medal, showed how to count simple closed geodesics in Riemannian manifolds as a function of a bound $L$ on their length. The count can jump (e.g. when $L$ passes the length of a loop around a torus) but behaves nicely asymptotically.
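To get a feel for such counts, here is a toy sketch on the flat square torus. This is an illustration we add, not Mirzakhani's setting of hyperbolic surfaces. On this torus, families of parallel simple closed geodesics correspond to primitive integer vectors $(p,q)$ taken up to sign, with length $\sqrt{p^2+q^2}$. The function name is hypothetical.

```python
from math import gcd, pi

def primitive_classes(max_length):
    """Count primitive integer vectors (p, q), up to sign, with norm <= max_length."""
    count = 0
    bound = int(max_length)
    for p in range(0, bound + 1):
        for q in range(-bound, bound + 1):
            if p == 0 and q <= 0:
                continue  # keep one representative of each pair +-(p, q)
            if gcd(p, abs(q)) == 1 and p * p + q * q <= max_length ** 2:
                count += 1
    return count

for length in (1.0, 1.5, 2.3, 200.0):
    n = primitive_classes(length)
    print(length, n, n / length ** 2, 3 / pi)
```

The count stays flat and then jumps each time the bound passes a new length, while the ratio to the square of the bound settles near $3/\pi$ for large bounds.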
The amplification of previous knowledge also shows in the relation of Theorem 1 to a theorem by Marina Ratner that inspired it:
Theorem 2 Let $G$ be a Lie group and $\Gamma$ a finitely periodic structure within $G$ (that is, a lattice). Let $U$ be a subgroup of $G$ definable by real matrices $M$ such that some power of $M - I$ is zero and the entries of $M$ are functions of one real parameter $t$. Then for every point $x$ in $G/\Gamma$, the closure of its orbit under $U$ is a manifold defined by homogeneous equations.
The Fields citation article calls Theorem 1 “a version of Ratner’s theorem for moduli spaces,” noting that the latter are “totally inhomogeneous.” It says Mirzakhani was thus able to transfer questions about dynamics on inhomogeneous spaces into nicer homogeneous cases. Other theorems by Ratner form a nexus that is all reflected in Mirzakhani’s work with Eskin and Mohammadi.
By sad coincidence, Marina Ratner also passed away earlier this month. Yesterday’s New York Times gave her a long obituary as well, noting how she did some of her best work after age 50 and that it was a basis for work by others including Mirzakhani. Jointly they provided much to inspire. Here are Mirzakhani in a still from a Harvard Math lecture video at the point where she introduced billiards and the illumination problem, and Ratner receiving an honorary doctorate from The Hebrew University in Jerusalem:
Our most ambitious question is whether Mirzakhani’s work can be made to have a magic effect on orbit closure problems that some are trying to use to illuminate complexity theory.
Again we express our condolences to her family and colleagues.
Update 7/30: We received permission to use Assad Binakhahi’s beautiful memorial drawing titled “Unfinished Equation” from both the artist and Radio Farda, whose torch logo appears at its upper left. To them many thanks. The previous picture of Mirzakhani has been moved alongside the one of Marina Ratner, with an added sentence above them and some other word changes at top and here.
Cropped and combined from src1, src2. |
Michaël Rao and Marjorie Rice are linked in this month’s news. Rao has just released a paper (see also slides and code here) completing the catalog of convex polygons that tile the plane. Rice, who passed away on July 2 (obit), had expanded the pentagon catalog from 9 to 13 while working in her kitchen in 1975. Rolf Stein found a fourteenth in 1985 and Casey Mann led a team of programmers to find a fifteenth in 2015. Rao has closed the book at 15.
Today Dick and I hail their accomplishments, which we noted from two articles by Natalie Wolchover in Quanta this past Tuesday. We also emphasize some related problems.
We especially are impressed by Rice, who was a true amateur. She had no advanced training in mathematics of any kind. After reading a 1975 Scientific American article on tessellations she started her search for new types. She succeeded and found ones that had been missed by everyone, including Johannes Kepler, who worked on tessellations in 1619. She maintained a website named, “Intriguing Tessellations.”
In recent decades, computers have become an essential tool. This raises the possibility of a new kind of amateur: one who can code. Computing power is more accessible than ever before. The fact that having advanced degrees doesn’t make your code run faster levels the playing field. As it happens, Mann led a team that included a student and Rao wrote his own code.
If you draw any triangle in the plane, then you can place a copy rotated by 180° against it on an edge to make a parallelogram. That can be replicated to make an infinite strip, and those strips complete a tiling of the plane. That tiling is periodic with only two different orientations of the triangle.
A little more thought will tell you that any quadrilateral, not just a parallelogram, can be made to tile the plane. The reason is that the four interior angles add up to 360°, even if the quadrilateral is not convex. Make three copies and orient them so that the four different angles come together at a corner of the original and its two adjoining edges are shared. Then the same orientations work at the opposite corner, and this suffices to see that the clump of four tiles the whole plane; indeed, a clump of two already does.
Combined from Math and the Art of M. C. Escher wiki source. |
Convex polygons with 7 or more sides cannot tile the plane—whatever their shape—because their interior angles sum so high that the average number of polygons meeting at a vertex would fall below 3. Regular hexagons can tile, of course. Karl Reinhardt in 1918 showed that convex hexagons can tile in three ways that use 2, 3, or 4 different orientations (the first not being a special case of the third).
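The counting heuristic behind the 7-or-more claim can be spelled out in a few lines. This is only a sketch of the averaging step, not a full proof. The interior angles of an $n$-gon sum to $(n-2) \cdot 180°$, so the average angle is $(n-2) \cdot 180°/n$, and filling 360° around a vertex takes on average $2n/(n-2)$ polygons. The function name is hypothetical.

```python
from fractions import Fraction

def average_polygons_per_vertex(n):
    """Average number of n-gon corners needed to fill 360 degrees at a vertex."""
    average_angle = Fraction((n - 2) * 180, n)
    return Fraction(360) / average_angle  # simplifies to 2n / (n - 2)

for n in range(3, 9):
    print(n, average_polygons_per_vertex(n))
```

Hexagons sit exactly at the threshold of 3, and for $n = 7$ the average drops to $14/5$, below the minimum of 3 polygons that must meet at a vertex.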
That left the case of pentagons. Of course a regular pentagon cannot tile the plane, but ones shaped like a baseball home plate can mesh in a sawtooth pattern using two orientations. You can get this by cutting each strip of a regular hexagonal tiling in half:
Mathematical Tourist source. |
The idea of cutting tiles to make new ones animates the mathematics. The following figure taken from a 2015 story in Britain’s Guardian newspaper shows the now-complete list of distinct pentagonal tilings.
Note that the version of this diagram used in a February 2013 post on Rice had only the 14 tilings known then.
All known tilings by single connected pieces are periodic like wallpaper. There is an algebraic theory of wallpaper symmetries and corresponding groups. Note that pentagonal symmetry is excluded. The periodic clumps in tilings by pentagons, if they have any rotational symmetry other than the full circle, must have one of order 2, 3, 4, or 6.
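The exclusion of five-fold symmetry comes from the crystallographic restriction: a rotation that maps a lattice to itself has an integer matrix in a lattice basis, so its trace $2\cos(2\pi/n)$ must be an integer. A minimal sketch of that check follows, with a hypothetical function name.

```python
import math

def allowed_rotation_orders(max_order=12):
    """Rotation orders n for which 2*cos(2*pi/n) is an integer."""
    allowed = []
    for n in range(1, max_order + 1):
        trace = 2 * math.cos(2 * math.pi / n)
        if math.isclose(trace, round(trace), abs_tol=1e-9):
            allowed.append(n)
    return allowed

print(allowed_rotation_orders())
```

Only the orders 1, 2, 3, 4, and 6 pass, where order 1 is the trivial rotation through the full circle.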
Flooring, however, can choose to have a radial pattern. Here are two tilings found by Sir Roger Penrose of Oxford with five-fold symmetry that can be extended infinitely far:
Modified from Martin Gardner article source. |
Both use the same two quadrilateral tiles and a special restriction: the two centrally symmetric vertices of one cannot touch a centrally symmetric vertex of the other. Cutting each tile into two triangles facilitates defining a self-recursion that proves how the pattern can be extended infinitely. Our diagram also shows a mutual recursion between the “sun” and “star” patterns.
The limit ratio of the convex “kite” tiles to the concave “dart” tiles in the recursions is the golden ratio, $\phi = 1.618$… To see why, note that each larger orange-bordered kite at right is made of two kites plus two halves of a dart, while the larger darts have just one kite and the halves of a dart. The recursion thus involves powers of the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, whose entries yield consecutive Fibonacci numbers, whose ratio approaches $\phi$. Because $\phi$ is irrational, the tilings are not periodic.
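The recursion is easy to run. The sketch below reads the paragraph above as saying that a larger kite contains 2 kites and 1 dart and a larger dart contains 1 kite and 1 dart, which is our reading and gives the matrix with rows (2, 1) and (1, 1). Function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def kite_composition(levels):
    """(kites, darts) inside one kite after the given number of recursion levels."""
    step = ((2, 1), (1, 1))    # rows: contents of a larger kite, a larger dart
    power = ((1, 0), (0, 1))   # identity matrix
    for _ in range(levels):
        power = mat_mul(power, step)
    return power[0]            # first row describes the big kite

golden = (1 + 5 ** 0.5) / 2
for levels in (1, 2, 3, 10, 20):
    kites, darts = kite_composition(levels)
    print(levels, kites, darts, kites / darts, golden)
```

The counts are consecutive Fibonacci numbers such as 13 and 8, and their ratio closes in on the golden ratio within a few levels.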
Penrose found a related tiling using two convex shapes—a thin lozenge and a fatter one—with a similar restriction that enforces aperiodicity. John Conway suggested enforcing these restrictions by matching up colored lines, as illustrated by the entrance to Oxford’s new Mathematical Institute building. This is my own photo from two years ago:
The restriction can be enforced without markings by notching the sides allowed to match up in the manner of jigsaw puzzle pieces, but this creates non-convex polygons. Robert Ammann found a way to cut Penrose’s lozenges and assemble them into three convex polygons that can only tile aperiodically. This figure from a paper last year by Teruhisa Sugimoto shows how:
Here is a color version of Ammann’s tiling posted by John Lindner. It too could be an attractive floor, but how about as a kitchen counter?
Two of Ammann’s tiles are pentagonal. Can a single pentagon carry out an aperiodic tiling? That question may have boarded the train of Rice’s thinking as she worked at her kitchen counter. It took until Sugimoto’s paper to prove this impossible when the pentagons must share entire edges. Rao’s result completes a definite no answer.
Understanding tilings of the plane is an intriguing mathematical problem. Finding new ones, as Rice did, requires cleverness and insight. Showing that certain types of tilings are impossible, as Rao did, requires another type of cleverness: the ability to prove that something cannot exist. This is interesting because it reminds us of lower bound problems that are well documented to be difficult.
Tilings are special compared to other classification problems, that is, problems that show that a given list consists of all ways to create some mathematical object. They are different because tilings can be used to build real objects. One measure of this is that there are a number of patents for various types of tilings. Penrose thought to patent his tiles before publicizing them. We quote the introduction to his US patent 4133152 titled, “Set of tiles for covering a surface”:
[The field of this invention] has found practical application not only to the design of paving and wall-coverings but also in the production of toys and games. In both instances, not only is the purely geometric aspect of complete covering of the surface of importance, but the esthetic appeal of the completed tessellation has equal significance in the eye of the beholder. … [T]he pattern which they form is necessarily non-repetitive, giving a considerable esthetic appeal to the eye.
We especially like his equal regard for esthetics. What emerged in greater force than even he may have imagined, while expressly thinking of crystals, is how strongly Nature shares this regard. Dan Shechtman received the 2011 Nobel Prize in Chemistry for discovering quasicrystals. We covered some of this history going back to Hao Wang’s first proof of the existence of finite sets of non-convex tiles that can only tessellate aperiodically in a post four years ago. It is also neat that this was initially a consequence of Wang’s proof that whether a given set of tiles can tessellate is undecidable, because by compactness, if the only tessellations are periodic then this fact is detectable in finite time.
If you don’t insist on convex tiles for your kitchen counter, then the question of aperiodic tilings by one piece remains open. This is called the Einsteinproblem. The name is not for Albert Einstein but derives from ein Stein being German for one stone. German uses “Stein” in many game contexts (besides Go) where we in English say “piece.”
Joan Taylor of Tasmania, another amateur mathematician, discovered in early 2010 that a single hexagonal tile could be forced to tile hierarchically—and only aperiodically—if a more complicated set of marking rules is stipulated. These rules cannot be enforced by jigsaw notching. However, followup work with Joshua Socolar discovered how to realize the hierarchical scheme by a single non-connected tile:
The three lines are just for show; the shape alone enforces the structure which the lines make clear. This is not quite an einstein—not one stone—but an allusion to Albert is warranted by the combination of cleverness, esthetics, and amazement that this fact brings. Taylor maintains a website with other striking designs.
There is also the problem of proving the widely-voiced belief that for square tiles with notches, the set of six discovered by Raphael Robinson has the minimum size to force aperiodicity.
If you prefer to stay with convex tilings, what emerges from Sugimoto’s paper vis-à-vis Ammann’s tiles discussed above is that the following question remains open:
Are there two convex tiles that tessellate but only aperiodically?
We have not even taken time to consider tilings in three (or higher) dimensions, in which a single bi-prism is known to give only tight packings of space that are aperiodic in one of their three dimensions. Whether a single 3D tile can squeeze periodicity out of all three dimensions seems to be open. We have also glossed over whether to allow tiles to be reflected or flipped over as well as rotated, and whether the number of different rotation angles in an aperiodic tiling is infinite, as happens for the irrational twist angles (in degree units) for the bi-prism packings. We invite you, our readers, to contribute your own favorite open tiling problems.
How would you “bet” on the open tiling problems? How would you have bet before 2010? We’ve discussed estimates of how people would bet on open problems in complexity but we have no idea here.
[sourced first two diagrams in section 2]