Jennifer Chayes is the current director of a research lab in Cambridge—that is Cambridge Massachusetts—for a company called Microsoft. She is famous for her own work in many areas of theory, including phase transitions of complex systems. She is also famous for her ability to create and manage research groups, which is a rare and wonderful skill.

Today Ken and I wish to talk about how to be “shifty” in algorithm design. There is nothing underhanded, but it’s a different playing field from what we grew up with.

Ken first noted the paradigm shift in 1984 when hearing Leslie Valiant’s inaugural talk on “Probably Approximately Correct” learning at the Royal Society in London. Last year Leslie wrote a book with this title. That’s about learning, while here we want to discuss the effect on algorithm design.

We illustrate this for a paper (ArXiv version) authored by Chayes and Christian Borgs, Michael Brautbar, and Shang-Hua Teng. It is titled, “Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation.” Since PageRank is a vital application, they don’t want their algorithms to be galactic. They trade off by not having the algorithms “solve” the problems the way we used to.

I, Dick, recall the “good old days of theory.” When I first started working in theory—a sort of double meaning—I could only use deterministic methods. I needed to get the exact answer, no approximations. I had to solve the problem that I was given—no changing the problem. Well sometimes I did that, but mostly I had to solve **the** problem that was presented to me.

In the good old days of theory, we got a problem, we worked on it, and sometimes we solved it. Nothing shifty, no changing the problem or modifying the goal. I actually like today better than the “good old days,” so I do not romanticize them.

One way to explain the notion of the good old days is to quote from a Monty Python skit about four Yorkshiremen talking about the good old days. We pick it up a few lines after one of them says, “I was happier then and I had nothin’. We used to live in this tiny old house with great big holes in the roof.” …

: You were lucky. We lived for three months in a paper bag in a septic tank. We used to have to get up at six in the morning, clean the paper bag, eat a crust of stale bread, go to work down t’ mill, fourteen hours a day, week-in week-out, for sixpence a week, and when we got home our Dad would thrash us to sleep wi’ his belt.

: Luxury. We used to have to get out of the lake at six o’clock in the morning, clean the lake, eat a handful of ‘ot gravel, work twenty hour day at mill for tuppence a month, come home, and Dad would thrash us to sleep with a broken bottle, if we were lucky!

: Well, of course, we had it tough. We used to ‘ave to get up out of shoebox at twelve o’clock at night and lick road clean wit’ tongue. We had two bits of cold gravel, worked twenty-four hours a day at mill for sixpence every four years, and when we got home our Dad would slice us in two wit’ bread knife.

: Right. I had to get up in the morning at ten o’clock at night half an hour before I went to bed, drink a cup of sulphuric acid, work twenty-nine hours a day down mill, and pay mill owner for permission to come to work, and when we got home, our Dad and our mother would kill us and dance about on our graves singing Hallelujah.

: And you try and tell the young people of today that ….. they won’t believe you.

: They won’t!

Those were the days. I did feel like I sometimes worked twenty-nine hours a day. I was paid by Yale, but so little that perhaps it felt like I paid them. I never had to drink a cup of sulphuric acid, but the coffee... oh, you get the idea.

Now today, in the 21st century, we have a better way to attack problems. We change the problem, often to one that is more tractable and useful. In many situations solving the exact problem is not really what a practitioner needs. If computing X exactly requires too much time, then it is useless to compute it. A perfect example is the weather: computing tomorrow’s weather in a week’s time is clearly not very useful.

The brilliance of the current approach is that we can change the problem. There are at least two major ways to do this:

*Change the answer required*. Allow approximation, or allow a partial answer. Do not insist on an exact answer.

*Change the algorithmic method*. Allow algorithms that can be wrong, or allow algorithms that use randomness. Do not insist that the algorithm is a perfect deterministic one.

This is exactly what Chayes and her co-authors have done. So let’s take a look at what they do in their paper.

In their paper they study PageRank, which is the definition and algorithm made famous by Google. It gives a way to rank webpages in response to a query that supplements criteria from the query itself. An old query-specific criterion was the number of matches to a keyword in the query. Rather than rank solely by this count, PageRank emphasizes a general page score. The score is sometimes interpreted as a measure of “popularity” or “authority,” leading to the following circular-seeming definitions:

A webpage is popular if it has a healthy number of links from popular pages.

A webpage is authoritative if it is well cited, especially by other authoritative pages.

What the PageRank score actually denotes mathematically is the likelihood that a person randomly traversing links will arrive at any particular page. This includes a frequency with which the person will stop clicking, do something healthier like ride a bicycle, and start again on a “random” webpage.

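The situation can be modeled by the classic random walk on a directed graph. We have a graph on $n$ nodes and an $n \times n$ matrix $M$ that is **row-stochastic**, meaning the entries in each row are non-negative and sum to $1$. Given that the web-walker is at node $i$, the entry $M[i,j]$ is the probability of going next to node $j$. If node $i$ has *out-*degree $b$, and $\alpha$ is the probability of restarting at a random page, then

$$M[i,j] = \frac{\alpha}{n} + \begin{cases} \dfrac{1-\alpha}{b} & \text{if page } i \text{ links to page } j,\\ 0 & \text{otherwise.} \end{cases}$$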

We can tweak this e.g. by modeling the user hitting the “Back” button on the browser, or jumping to another browser tab, or using a search engine. We could also set $\alpha$ higher in case page $i$ has few or no outgoing links. We still get an $M$, and since the use of $\alpha$ effectively makes the graph strongly connected and averts certain pathologies, we get a beautiful conclusion from random-walk theory: There is a unique **stationary distribution** $p$, which is the unique left-eigenvector for the largest eigenvalue, which as normalized above is $1$:

$$p M = p.$$

Then the PageRank of node $i$ is $p(i)$. It is remarkable that this simple, salient idea from the good old days works so well. A further fact from the theory (and use of $\alpha$) is that if you start at *any* node, in the long run you will find yourself on page $i$ with frequency $p(i)$.
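A minimal sketch of this random-walk model, assuming the restart probability is called `alpha` and the graph is given as adjacency lists (both are our naming choices, not the paper's). It builds the row-stochastic matrix and approximates the stationary distribution by power iteration. Treating a page with no out-links as a uniform restart is one common convention and is an assumption here.

```python
def transition_matrix(links, alpha=0.15):
    """Row-stochastic matrix for a walker who restarts with probability alpha.

    links[i] lists the nodes that node i points to (hypothetical input format).
    """
    n = len(links)
    M = [[alpha / n] * n for _ in range(n)]
    for i, outs in enumerate(links):
        if outs:
            for j in outs:
                M[i][j] += (1 - alpha) / len(outs)
        else:
            # Assumption: a page with no out-links restarts uniformly at random.
            for j in range(n):
                M[i][j] += (1 - alpha) / n
    return M


def stationary(M, iters=200):
    """Approximate the left eigenvector p with p M = p by power iteration."""
    n = len(M)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]
    return p


links = [[1, 2], [2], [0], [2]]  # toy 4-node graph
M = transition_matrix(links)
p = stationary(M)
```

On a graph this small the dense matrix is harmless. The point of the rest of the post is that nothing like this is affordable at Web scale.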

The issue is: *how* to compute $p$? In the good old days this was a trivial problem: just use linear algebra. But now the issue is that $n$ is really big, let alone $n^2$ being unspeakably big. The $n$ is too big even to get an approximation via the “further fact,” that is by simulating a random walk on the whole graph, and classical sparse-matrix methods might only help a little. This is where Chayes and company change the game: let us care about computing $p(i)$ only for some $i$’s, and even then, let us be content with fairly rough approximation.

The approximation to PageRank is called SignificantPageRank. The paper gives a randomized algorithm that solves the following problem.

Let us be given a graph. Then, given a target threshold $\Delta$ and an approximation factor $c > 1$, we are asked to output a set $S$ of nodes such that with high probability, $S$ contains all nodes of PageRank at least $\Delta$, and no node of PageRank smaller than $\Delta/c$.

This is a perfect example of the shift. The algorithm is random, and the problem is to find not the nodes with a given PageRank, but those that are not too far away.
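To make the relaxed goal concrete, here is a small checker for the output condition, writing the threshold as `delta` and the approximation factor as `c`. The function name and the dictionary of scores are hypothetical and only for illustration. It assumes we somehow know the true scores, which is exactly what the real algorithm avoids computing.

```python
def is_valid_answer(S, pagerank, delta, c):
    """Check the relaxed guarantee on an output set S.

    S must contain every node with score >= delta and no node with
    score < delta / c. Nodes in between may go either way.
    """
    chosen = set(S)
    must_have = {v for v, score in pagerank.items() if score >= delta}
    forbidden = {v for v, score in pagerank.items() if score < delta / c}
    return must_have <= chosen and not (forbidden & chosen)


scores = {"a": 5.0, "b": 2.5, "c": 0.4}  # made-up scores for illustration
```

With `delta = 4.0` and `c = 2.0`, both `{"a"}` and `{"a", "b"}` are acceptable answers, which is the slack that makes a sublinear algorithm possible.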

The nifty point is that the algorithm can tolerate fuzzing the matrix , in a manner called SARA for “sparse and approximate row access”:

Given and , return a set of columns and values such that for all :

- , and
- .

It is important to use this for different values of . The cost of a query is .

If we picture “” as “exponential” and take where , then this becomes an approximative version of being **succinct**, which we just talked about. In this scaling of we are effectively limiting to a **local** portion of the graph around node . Since we also have , under SARA entries outside would become effectively zero, so that the chance of “teleporting” outside on the whole would be regarded as negligible. In fact the paper also researches the case where each Web user always starts afresh at a “home node” in that portion, making just for that user. Then the -related probability is not negligible, and the resulting user-dependent estimate is called PersonalizedPageRank.

The problem they need to solve for SignificantPageRank then becomes “SignificantColumnSums”:

Given and as above, find a set of columns such that for all columns :

- ;
- .

An even simpler problem which they use as a stepping-stone is “VectorSum”:

Given a length-$n$ vector $P$ with entries in $[0,1]$, and $\Delta$ and $c$:

- output yes if $\sum_i P(i) \geq \Delta$;
- output no if $\sum_i P(i) < \Delta/c$, don’t-care otherwise.

The goal is always to avoid looking at all nodes or entries, but only an or so portion of them, where . Thus the problem shift is necessitated by being huge. This isn’t my good-old-days idea of solving a problem, but can be called “-solving” it.

Ken and I are interested because these are similar to problems we have been encountering in our quest for more cases where quantum algorithms can be simulated in classical randomized polynomial time. Thus any new ideas are appreciated, and what catches our eye is a *multi-level* approximation scheme that exploits the requirement of SARA to work for different . The situations are different, but we hope to adapt them.

The situation for VectorSum is that a probe still costs order-of , and returns unless . A simple-minded use of a set of random probes for the same would yield the estimate

The resulting error has order , so we need which is rather demanding. Indeed the total cost would have order , where needs to be so large as to kill any hope of making the cost or even . In the pivotal case where , we would need , incurring cost on the order of .
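A rough sketch of the simple-minded approach, under a probe model that is our assumption rather than the paper's exact definition: a probe at precision `eps` reports an entry as 0 when it is below `eps`. The names `probe`, `naive_vector_sum`, and `decide` are hypothetical.

```python
import random


def probe(P, i, eps):
    """Assumed probe model: entries below eps are reported as 0."""
    return P[i] if P[i] >= eps else 0.0


def naive_vector_sum(P, eps, k, rng):
    """Estimate sum(P) from k uniform random probes, all at the same precision."""
    n = len(P)
    total = sum(probe(P, rng.randrange(n), eps) for _ in range(k))
    return n * total / k


def decide(P, delta, c, eps, k, rng):
    """Answer yes/no by comparing the estimate with a cutoff between delta/c and delta."""
    cutoff = (delta + delta / c) / 2
    return "yes" if naive_vector_sum(P, eps, k, rng) >= cutoff else "no"
```

Every probe here pays for the same precision, which is the waste that the multi-level scheme avoids.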

However, they show that by using a different precision for each random probe, they can get acceptable error with a reasonably small number of probes. The case where we have occurs only once, so its cost can be tolerated. Other probes have smaller cost, and while their precisions are looser, the aggregate precision on the estimate becomes good enough for the following result:

Theorem 1. Given as above and , VectorSum can be -solved with probability at least and cost

Well in the good old days before LaTeX we wouldn’t even have been easily able to write such a formula with a typewriter, let alone prove it. But it is certainly better than , and allows taking to meet the goal of runtime . As usual, for the details on SignificantColumnSums and the application problems, see the paper.

Do you miss the good old days? Or do you like the current approaches? What shifts, what new types of changing the goals, might we see in the future? For clearly today will one day be the “good old days.”

[changed qualifier on "certain pathologies", may as well footnote here that the "Back" button creates a Markov Chain *with memory*.]


Kurt Gödel did it all, succinctly. His famous 1938 paper “The Consistency of the Axiom of Choice and of the Generalized Continuum-Hypothesis” was 2 pages long. Since the Proceedings of the National Academy of Sciences was printed on single-column pages in fairly large type, it would have been under one page in FOCS or STOC format. Only Leonid Levin has famously been that succinct.

Today Ken and I wish to advance another approach on complexity questions based on succinctness and Gödel’s brilliant idea. We have covered succinctness before, but now our angle is more arithmetical.

What Gödel showed is that if ZF is consistent then so is ZF + AC, where AC is the axiom of choice. He threw in GCH, the generalized continuum hypothesis, and two statements about analytic sets, but AC drew the most attention. Many were and remain worried about the Axiom of Choice. Gödel showed that AC is safe to use in the sense that a result employing it cannot reach a contradiction in set theory, unless set theory already is inconsistent without AC.

The proof method Gödel used is based on a notion of a set being constructible. Roughly, very roughly, a constructible set is one that can be built up inductively by simple operations. He then proved that if ZF has any model, then these sets are embedded inside it and form a model by themselves, in which AC, GCH, and the other two statements also hold.

Well, he didn’t exactly prove it. His paper begins with “THEOREM” but there is no “*Proof*” anywhere. Once he states his inductive definition, the rest is left to the reader as evident. Nor did Gödel feel a need to give us any more detail when he laid out his entire version of set theory on one little porta-board during the interview we published last November—see the “One Choice” section toward the end. When a definition accomplishes a proof by itself, that’s brilliance.

The ingredients of Gödel’s construction are *ordinals* $\alpha$ and *first-order formulas* $\phi$. Once you know these ingredients, the definition is easy to figure out. For $\alpha = 0$, $L_0 = \emptyset$. Then a set $X$ belongs to $L_{\alpha+1}$ if and only if there is a first-order formula $\phi$ with one free variable and quantified variables and constants ranging in $L_\alpha$ such that

$$X = \{x \in L_\alpha : \phi(x)\}.$$

And of course, if $\alpha$ is a limit ordinal, then

$$L_\alpha = \bigcup_{\beta < \alpha} L_\beta.$$

This definition is so natural and nice that it does not appear in Gödel’s paper. It does not need to. The idea is clear enough. That is true brilliance. To be fair, the above undoubtedly came up in interaction with John von Neumann and others in the 1930s. Note, incidentally, that when $\alpha = 0$ we get $X = \emptyset$, which gives $L_1 = \{\emptyset\}$, which is not the same as $\emptyset$. Thus Gödel’s sequence gets off the ground.
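The first few stages can be computed by hand or by machine. The sketch below relies on a standard fact that is our addition, not something stated above: every subset of a finite set is definable by a first-order formula with constants (a finite disjunction of equalities), so each finite stage is simply the power set of the previous one. The function names are hypothetical.

```python
from itertools import combinations


def next_stage(level):
    """All subsets of a finite level; for finite levels these are the definable subsets."""
    elems = list(level)
    return frozenset(
        frozenset(combo)
        for r in range(len(elems) + 1)
        for combo in combinations(elems, r)
    )


def stages(count):
    """Return the list of stages 0..count, starting from the empty set."""
    levels = [frozenset()]
    for _ in range(count):
        levels.append(next_stage(levels[-1]))
    return levels


L = stages(4)
sizes = [len(level) for level in L]
```

The interesting behavior, where definable subsets are strictly fewer than all subsets, only starts at infinite stages, which no finite program can enumerate this way.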

Today the notion of constructible sets is a huge part of logic and set theory. Ken and I ask, is there some way to make this notion useful for complexity theory? This involves a third ingredient: a *set* can be an *element*, even a number. Thus $0 = \emptyset$, $1 = \{\emptyset\}$, and $2$ can be represented by $\{\emptyset, \{\emptyset\}\}$ or by various other 2-element sets. There are also various ways to represent binary strings, ways that are naturally more compact than the representation for numbers, but we will try to treat them all as numbers.

Our idea is to define a constructible *number* , and then look at sets of such numbers. To distinguish from Gödel we now say “constructive.”

Let $x$ be an integer of $n$ bits, where $n = 2^k$ for some $k$. In a natural way we can view $x$ as a Boolean function $f_x$ from $\{0,1\}^k$ to $\{0,1\}$ where the value $f_x(i)$ is bit $i$ of $x$. Say $x$ is $s$-**constructive** provided the Boolean function $f_x$ has a circuit of size at most $s$.

Clearly, this is only interesting if $s$ grows much slower than $n$, such as being polynomial, or at least sub-exponential, in $k$. We will discuss later trying to imitate Gödel’s definition further by using logical formulas in place of circuits. We keep it simple with circuits and numbers for now. Note that $k$ increases only when $n$ doubles, so if $s$ is really a function of $k$, then we get a nested sequence of finite sets. The following definitions are asymptotic, but have this concrete basis.
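A small sketch of the "number as Boolean function" view. It assumes that bit $i$ means the $i$-th least significant bit, which is a convention we are choosing, and the function names are hypothetical. It does not compute circuit size, which is the hard part. It only shows an example whose function is obviously simple.

```python
def bit_function(x, k):
    """View the 2**k-bit number x as a Boolean function on indices 0 .. 2**k - 1."""
    n = 1 << k
    if not 0 <= x < (1 << n):
        raise ValueError("x must fit in 2**k bits")
    return lambda i: (x >> i) & 1


def truth_table(x, k):
    """List the bits of x in index order, i.e. the truth table of its function."""
    f = bit_function(x, k)
    return [f(i) for i in range(1 << k)]


# Example: bit i is set exactly when i is odd, so the function just copies
# the lowest bit of the index, whatever k is.
k = 4
x = sum(1 << i for i in range(1 << k) if i % 2 == 1)
```

Numbers like this one have tiny descriptions relative to their length, while a random number of the same length almost surely does not.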

Definition 1. A set $A$ of natural numbers is $s$-constructive provided for each $x$ in $A$, $x$ is $s$-constructive.

Definition 2. A set $A$ is poly-constructive provided there is a $c$ so that for each $x$ in $A$, $x$ is $s$-constructive, where $s$ is bounded by $k^c$.

Thus a set is poly-constructive if the members of the set each have a small circuit description. It does not require that has a uniform circuit description, nor that the elements of have circuits that are related to each other in any manner.

Let the universe of all poly-constructive sets be denoted by .

Let and be sets in . Then we claim that the following are true:

The sets and are in .

The set is in .

Thus the sets in form a Boolean algebra. Even more, suppose that is a subset of , which is a set in . Then is also in . This is just because for each , we are talking about subsets of .

Next, let us consider bitwise operations. Regarding numbers as -bit strings, let be their bitwise-XOR. Now if have circuits of size then may need size including an XOR gate or gadget at the output. But asymptotically we still have poly constructiveness:

The set is in .

Now we turn to addition: . Note that unlike bitwise XOR, addition can increase the length of numbers by one, though this is not as important as with multiplication whereby lengths can double. The main issue is that the bits can depend on non-local parts of and via carries.
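To see the non-locality concretely, here is a sketch that computes a single bit of a sum from the input bits and the carry into that position. The helper name `sum_bit` is ours. The loop over all lower positions is the issue: a succinct circuit cannot afford to walk through them one by one.

```python
def sum_bit(x, y, i):
    """Bit i of x + y, computed from the input bits and the carry into position i."""
    carry = 0
    for j in range(i):  # the carry into position i can depend on every lower position
        a, b = (x >> j) & 1, (y >> j) & 1
        carry = (a & b) | (carry & (a ^ b))
    return ((x >> i) & 1) ^ ((y >> i) & 1) ^ carry
```

For instance, with `x = 0b0111`, bit 3 of the sum is 1 when `y = 1` and 0 when `y = 0`, so flipping the lowest input bit changes a distant output bit.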

In the *descriptive logic* approach to complexity this is no problem, because the Boolean function can be defined by a *first-order formula* in terms of the small indices and fixed predicates denoting and . The formula is independent of , so it is a single finite characterization of a set over all lengths .

The problem is that the circuitry for simulating a first-order quantifier ranging over most of the indices has size proportional to , not to or poly in . So it might not stay succinct. We *believe*, however, that we can simulate it in poly() size if parity counting can be done in polynomial time:

Theorem 3 (?). If $\oplus\mathsf{P} = \mathsf{P}$, then the universe of poly-constructive sets is closed under addition.

Contrapositively, if the universe is *not* closed under addition, then $\oplus\mathsf{P} \neq \mathsf{P}$, which is close to $\mathsf{P} \neq \mathsf{NP}$. Our idea for a proof is related to parity-based prediction in carry-lookahead adders (see this for instance).

Closure under multiplication is even thornier. We wonder if the correspondence in descriptive complexity between multiplication and $\mathsf{TC}^0$, which is kind-of a scaled down analogue of $\mathsf{PP}$ (unbounded error probabilistic polynomial time), carries through here:

Theorem 4 (??). If $\mathsf{PP} = \mathsf{P}$, then the universe of poly-constructive sets is closed under multiplication.

We suspect this but do not have a proof idea. Since is contained in length- strings, we can also ask concretely whether for appropriately-chosen it is contained in .

What we actually want to establish is this:

Theorem 5 (???). There is a natural numerical operation such that the universe of poly-constructive sets is closed under (that is, on going from to ) if and only if.

A result like this would finally bring our Gödel-inspired idea to full force as a potentially useful re-scaling of the issues involved in .

We can also consider using the intuitively looser condition of having a first-order formula to define members of in terms of members of and of , which immediately works for addition. This is close in form to Gödel’s definition itself. We suspect that this leads to involvement with the rudimentary relations and functions, which were introduced by Raymond Smullyan and studied in complexity theory by Celia Wrathall and others. Again it may yield an interesting re-scaling of the linear-time hierarchies involved.

Is the notion of interesting? Can we prove the above theorems?


* Taking a conjecture about identities to college *

Alex Wilkie is a Fellow of the Royal Society, and holds the Fielden Chair in Mathematics at the University of Manchester. Ken knew him at Oxford in 1981–1982 and again from 1986. In 1993 Wilkie won the Karp Prize of the Association for Symbolic Logic, equally with Ehud Hrushovski. This is named not after Dick Karp but rather Carol Karp, a mathematical logician in whose memory the prize was endowed.

Today I wish to talk about logical theories, motivated by some questions from Bill Gasarch and others.

The questions I have been considering really revolve around the role of the exponential function, and its interactions with our tamer friends, plus and times. And subtraction. It is subtraction that causes some hitches, because over the reals, $e^x$ is always a positive number. The sum and product of positive numbers are positive, but subtraction makes positivity go away. Subtraction and exponentiation also cause immediate issues in computational complexity, and the meeting point of complexity and logic is what has my attention.

The Karp Prize citation for Wilkie says it is

for proving the model completeness of the field of real numbers with the exponential function.

Model completeness is a step toward quantifier elimination and decidability—it implies that every first-order formula in the theory is equivalent to an existential one, and conversely. The Karp Prize itself has an interesting stipulation. It is awarded every five years:

for a connected body of research, most of which has been completed in the time since the previous prize was awarded.

Nothing like the lifetime retro span of a Nobel Prize or Turing Award or the 14-year horizon of the Gödel Prize, which until recently was 7 years. Once a Karp Prize is awarded, everybody basically starts again with a clean slate. This was no hurdle for Wilkie, even though the famous theorem of his that this post is covering in detail dates to 1980.

The other day, our friend and fellow blogger Bill Gasarch asked us a natural question. He asked me for the best upper bound on the theory of real arithmetic. It’s one of those things that I happened to know, and I was happy to help him get a quick reference. Sometimes people can beat Google, ha.

The answer is that the upper bound is double exponential time. This is pretty large, but it is not terrible. I have used a special case of this result, which shows that deciding existential sentences (essentially solving equations) is in single exponential time. This was in a paper with Vangelis Markakis that showed that Nash equilibria can be computed in exponential time, even for multi-party games.

The theory of the reals allows the operations of addition and multiplication of reals. It is a first order theory with equality, and so we can write sentences like:

This is a true sentence. Note the theory can define order: $x \leq y$ is defined as:

$$\exists z\,(x + z^2 = y).$$

The axioms of the theory are all the basic properties of addition and multiplication, along with the fact that any odd degree polynomial has a root and that every positive number has a square root.

A famous theorem of the logician Alfred Tarski is:

Theorem:The theory of the reals is decidable.

Actually Tarski proved much more: he showed that any real closed field has a decidable and complete theory. A real closed field is a natural generalization of the reals. This beautiful theorem is in stark contrast to the situation of the theory of arithmetic, which by the Incompleteness Theorem of Kurt Gödel cannot be decidable. (Well it can be decidable, if it is inconsistent.) Tarski’s result is in the 1951 book: *Decision Method for Elementary Algebra and Geometry*.

But since our interest is in feasible computation, the $\mathsf{P}$ versus $\mathsf{NP}$ question is lurking about. The theory of the reals cannot be too easy to decide, since it must be at least as hard as SAT. There are several ways to see this: some use a standard reduction, some use a direct encoding. So the real theory cannot be too easy unless $\mathsf{P} = \mathsf{NP}$.

Tarski’s original proof used quantifier elimination, and yielded a bound of the form on the running time. Many years later, Michael Ben-Or, Dexter Kozen, and John Reif proved that the theory of real closed fields is decidable in exponential space, and therefore in doubly exponential time.

Tarski raised a very interesting problem that I did not know, but found as I was writing this. Consider identities that you might find in a high school class on algebra. You might be asked to prove:

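No doubt you can do this by simply expanding the left-hand side. Tarski said, let’s consider the following list of “obvious” rules for plus, times, and exponentiation:

- $x + y = y + x$
- $(x + y) + z = x + (y + z)$
- $x \cdot 1 = x$
- $x \cdot y = y \cdot x$
- $(x \cdot y) \cdot z = x \cdot (y \cdot z)$
- $x \cdot (y + z) = x \cdot y + x \cdot z$
- $1^x = 1$
- $x^1 = x$
- $x^{y+z} = x^y \cdot x^z$
- $(x \cdot y)^z = x^z \cdot y^z$
- $(x^y)^z = x^{y \cdot z}$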

Let’s call these **HSI**: for high school identities, what else? It is known that determining whether $s = t$ is an identity is decidable, where $s, t$ are terms that use plus, multiplication, and exponentiation. Tarski asked:

Are there identities involving only addition, multiplication, and exponentiation, that are true for all positive integers, but that cannot be proved using only HSI?

Another way to say this is: Can we use just the facts we learned in high school to prove any identity? The proof might be long and tricky to find, but does it always exist? A positive answer would have been cool—even in high school we had all the tools needed to resolve identities.

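Alas, the answer is **no**. The HSI rules are not complete. In 1980, Wilkie found an example of a true identity that is beyond the ability of the HSI rules:

$$\left((1+x)^y + (1+x+x^2)^y\right)^x \cdot \left((1+x^3)^x + (1+x^2+x^4)^x\right)^y = \left((1+x)^x + (1+x+x^2)^x\right)^y \cdot \left((1+x^3)^y + (1+x^2+x^4)^y\right)^x.$$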

The intuitive reason this is true is that

$$1 + x^3 = (1 + x)(1 - x + x^2) \quad\text{and}\quad 1 + x^2 + x^4 = (1 + x + x^2)(1 - x + x^2),$$

so we can factor out $1 - x + x^2$, but the presence of subtraction makes this operation escape the parts of HSI dealing with exponentiation. The usual trick of representing subtraction by an existential quantifier and an addition does not work here.
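As a sanity check, the identity in the form usually quoted for Wilkie's example can be tested with exact integer arithmetic. This is only spot-checking on small positive integers, not a proof, and the function name is ours.

```python
def wilkie_sides(x, y):
    """Return both sides of Wilkie's identity for positive integers x and y."""
    A = 1 + x
    B = 1 + x + x**2
    C = 1 + x**3           # equals A * (1 - x + x**2)
    D = 1 + x**2 + x**4    # equals B * (1 - x + x**2)
    left = (A**y + B**y) ** x * (C**x + D**x) ** y
    right = (A**x + B**x) ** y * (C**y + D**y) ** x
    return left, right
```

Python integers are unbounded, so the two sides are compared exactly rather than up to floating-point error.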

See this paper by Stanley Burris and Karen Yeats for a nice survey of the area. It includes the use of Godfrey Hardy’s theorem that exp-log functions are linearly ordered by the relation modulo . It emphasizes the hunt for the smallest algebra in which HSI are true but Wilkie’s identity is not. Since then, further identities escaping proof by HSI have been discovered, and also researchers have found families of identities that always can be proved by HSI.

A natural question is, what is the complexity of determining if such identities are true? Is there a fast way to test them? One idea that comes to mind is to try the identities on multiple random values. Of course this works for identities without exponentiation.
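For the exponentiation-free case, random testing can be sketched as follows. It leans on the standard Schwartz-Zippel bound: a nonzero polynomial of total degree $d$ vanishes at a uniformly random point modulo a prime $q$ with probability at most $d/q$. The function name, the choice of the Mersenne prime $2^{61}-1$, and the trial count are our assumptions for illustration.

```python
import random


def probably_identical(f, g, num_vars, trials=20, modulus=(1 << 61) - 1, rng=None):
    """Randomized test of whether two integer polynomials agree everywhere."""
    rng = rng or random.Random()
    for _ in range(trials):
        point = [rng.randrange(modulus) for _ in range(num_vars)]
        if (f(*point) - g(*point)) % modulus != 0:
            return False  # a witness: definitely not an identity
    return True  # no witness found: an identity with high probability


def square_of_sum(x, y):
    return (x + y) ** 2


def expanded(x, y):
    return x * x + 2 * x * y + y * y


def wrong_expansion(x, y):
    return x * x + y * y
```

A `False` answer is always correct, while a `True` answer can be wrong only with tiny probability. No comparable bound is claimed here once exponentiation with variable exponents is allowed.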

The key to understanding this would be to show that an expression over plus, times, and exponentiation is not zero too often without being identically zero. If this were true, with an effective bound, then a random testing idea would work.

I am not aware of this result.

We wonder if there might be a similar theory for boolean circuits for example. Are there rules of the above kind that suffice to prove all boolean identities? This needs to be carefully defined. If there are obvious rules, could there be examples like the Wilkie example of a true identity that is unprovable using just these rules? Perhaps we can show that or even get lower bounds in this way for restricted rule sets.

What do you think?


* Musings on gravity and quantum on the 4th of July *


Amanda Gefter is the author of the book titled *Trespassing on Einstein’s Lawn: A Father, a Daughter, the Meaning of Nothing, and the Beginning of Everything*. As the title suggests—and as Peter Woit pointed out in his review—there are parallels with Jim Holt’s book *Why Does the World Exist?*, which we covered a year ago. Gefter’s essay for Edge last year lays out more briefly how Einstein regarded relativity and early quantum as structurally divergent theories.

Today her book has led me to think further about the relationship between gravity and quantum mechanics.

The book is terrific. She simply has the ability to explain deep physics to a non-expert, me, in a way that makes it seem clear. Perhaps it is all a shell game, and what she says is ultimately misleading. But I think that she really does open the door to physics in a way that few popular authors have been able to do. Read the book and decide for yourself.

What I am thinking about is whether quantum and gravity “meet” in ways that maybe haven’t been fully appreciated, even though the context is simple: collisions of freely moving bodies. The bodies have to be tiny yet acting mainly under gravity. The question is whether quantum effects would make them avoid collision and fireworks with high probability. Of course today is Independence Day in the US where we have fireworks, though Ken and I feel the World Cup soccer games have already given us plenty.

Let $A$ and $B$ be electrically neutral objects of mass $m$ separated by some distance $d$ in a universe with *nothing else*. Okay maybe that is impossible, but this is a thought experiment. In a Newtonian world they would move toward each other according to the inverse-square law of gravity and would collide in a time that depends on their mass and distance. The exact bound is, I believe:

See this for the detailed calculation. In any event, in a finite time they will collide: the exact bound is not so critical.
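A sketch of the standard textbook calculation, under the assumption that the two bodies are equal point masses `m` starting at rest a distance `d` apart (the names and SI units are our choices). For general masses the free-fall time is $\frac{\pi}{2}\sqrt{d^3 / (2G(m_1+m_2))}$, which for equal masses simplifies to $\frac{\pi}{4}\sqrt{d^3/(Gm)}$.

```python
import math

G = 6.67430e-11  # gravitational constant in m^3 kg^-1 s^-2 (CODATA 2018)


def collision_time(m, d):
    """Newtonian time in seconds for two point masses m (kg), at rest d metres apart, to meet."""
    return (math.pi / 4) * math.sqrt(d**3 / (G * m))
```

The scaling is the useful part: the time grows like $d^{3/2}$ and shrinks like $m^{-1/2}$, so it is finite for any positive mass and distance.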

In an Einsteinian world they would also move toward each other, and would collide after some time that also depends on the given parameters, the mass and the distance. The time would, of course, be close to the Newtonian time, if the mass and distance are small. In any event, again, in a finite time they would collide.

But the above arguments miss a point. Suppose that the two objects were neutrons. Then the above conditions seem fine, but there is a problem. The “nothing else” clause is false if we allow, as we must, quantum effects. As the particles are drawn together due to gravity, I believe they will encounter virtual particles from the vacuum quantum flux. Suppose that virtual particles appear and disappear: would they not create some gravitational force, even for a short time? It seems they would then disturb the path of the neutrons.

If this is the case, then a curious possibility seems to arise. The objects may never collide at any time in the future. The reason is based on how random walks work in three dimensions. If we view the particles as moving toward each other but subject to random small tugs, then depending on the strength of the virtual gravity effect, the particles could avoid each other forever. Or they could take a very different time than predicted by either theory of gravity. What happens?

When there is lots of matter, quantum effects are known to forestall overwhelming gravitational force. The engine of electron degeneracy pressure, which prevents certain stars from collapsing beyond the white-dwarf stage, is Wolfgang Pauli’s exclusion principle, which forbids two fermions from having the same quantum state. There is more general quantum degeneracy, and in models it is affected by the dimension of the ambient space.

For very sparse matter, however, the fact that I allude to is this: For a random walk in Euclidean space, the nature of the walk depends on the dimension of the space. In one or two dimensions an unbiased walk will return to its initial location with probability $1$. In three dimensions the probability of returning to the start of the random walk is strictly less than $1$. George Pólya proved this back in 1921. Does this make a difference here?
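The dimension dependence is easy to see in a quick simulation of the simple walk on the integer lattice. This is only a finite-horizon sketch: it counts returns within a fixed number of steps, so it underestimates the true return probability, and the step and trial counts are arbitrary choices of ours.

```python
import random


def returns_to_origin(dim, steps, rng):
    """Run one lattice walk and report whether it revisits the origin."""
    pos = [0] * dim
    for _ in range(steps):
        axis = rng.randrange(dim)
        pos[axis] += rng.choice((-1, 1))
        if not any(pos):
            return True
    return False


def return_frequency(dim, steps=200, trials=2000, seed=0):
    """Fraction of walks that return to the origin within the given number of steps."""
    rng = random.Random(seed)
    hits = sum(returns_to_origin(dim, steps, rng) for _ in range(trials))
    return hits / trials
```

Comparing `return_frequency(1)`, `return_frequency(2)`, and `return_frequency(3)` shows the drop as the dimension grows, with the three-dimensional walk escaping in a large fraction of trials.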

Pólya’s result was originally for $d$-dimensional square lattices, but as noted in this 1998 PhD thesis by Peter Doyle, it extends to other regular lattices and to $d$-dimensional Euclidean space in general. There are various different proofs, of which perhaps the shortest is this recent proof by Shrirang Mare drawing on Doyle’s thesis.

The “quick” reason why is a critical value is expressed by either of two integrals. One is that the “resistance” of the spatial medium to being infinitely penetrated by the walk is expressible by an integral of the form

while the other expresses the probability of non-repeating divergence in terms of the integral

Even though one exponent is and the other is , both have as the critical value between divergence and convergence. Doyle hints that the latter resolves a “chicken-or-egg” type question of whether the “reason” for being the first whole number giving convergence is anchored in mathematics or physical reality, though he minimizes its significance. We, however, still wonder about the full physical significance of this fact, beyond references that stay on quantum walks such as this paper by Martin Štefaňák, Tamás Kiss, and Igor Jex.

The question is: what actually happens? Do the virtual particles create a gravitation effect that moves the neutrons? Or is the effect too small? Or nonexistent? What happens? Is this in some quantum sense a reason why space is quiet—beyond known issues about quantum field theory predictions for the vacuum?

We are also interested because of some conjectures noted at the end of Doyle’s thesis: For any infinite graph that is “highly regular” in the sense of its automorphism group having only finitely many orbits, the maximum number of vertices within distance of a given node grows either as or as , where in the first case of this “poly/exp gap,” is a positive integer. The further conjecture is that the basic random walk on is recurrent if and only if . We have noted that a large case of this conjecture was proved by Mikhail Gromov.

]]>

* Ann will be missed *

Ann Yasuhara was a mathematician and a complexity theorist, who passed away this June 11th. She lived over half of her 82 years in Princeton, and was a member of the faculty of computer science at Rutgers since 1972.

Today Ken and I send our thoughts and condolences to Ann’s husband Mitsuru Yasuhara, her family, and her many friends—Ann will be missed—she was special.

I have known Ann for what seems like an uncountable amount of time. She was always kind to everyone—everyone—and we all miss her. She had a wonderful laugh and no doubt brought smiles to many over her 82 years. Ken remembers meeting her in the 1980s. Ann and her husband and her student Elaine Weyuker also attended Princeton’s centennial celebration for Alan Turing two years ago.

Her obituary notes that she and Mitsuru settled in Princeton in 1970, but traveled widely while pursuing interests in music and art and Nature as well as mathematics. Alumni of her *alma mater*, Swarthmore College, posted a note of gratitude. Just today, the Princeton Comment website has posted a list of organizations she supported.

As a Quaker and social advocate, Ann was against all forms of violence and all forms of injustice. She co-founded several groups, based in Princeton, that put her beliefs into action. One of them, “Silent Prayers for Peace,” keeps a vigil every Wednesday in Palmer Square, while “Not In Our Town” fights racism and bullying. The Swarthmore item remarks on friends calling her “Mountain Woman” and quotes the Princeton obit:

Most recently she enthusiastically supported—and went on protests with—the nonviolent direct action group, Earth Quaker Action Team (EQAT), which works to end mountaintop removal coal mining. On her 79th birthday, she protested on a strenuous mountain climb in West Virginia mining country. In January, just before she was diagnosed with cancer, the Philadelphia-based group honored her as one of its outstanding “wise elders.”

After experience with a local family who came from Guatemala, Ann joined those founding the Latin American Legal Defense & Education Fund, Inc., based in New Jersey. She is still listed on their Advisory Council.

Ann also kept a Facebook page, and her last entry, this past January, says something interesting about commitment and works:

[T]here is a similar issue for all (most?) aspects of Quakerism—there are deep, not so easy to articulate, basics, and coming to know and wrestle with them takes a long time—maybe a lifetime. So, we want to give some short cuts, but in doing so, we may may be delineating a path that doesn’t really reach into the basic territory. This difficulty is an important example of the tension between covenant and enumeration or contract.

Ann was indeed a mathematician who worked on logic and its relationship to computational theory. Ann was the PhD advisor to Frank Hawrusik, Venkataraman Natarajan, and Elaine Weyuker. I would imagine that she made a great and caring advisor.

Ann’s own 1964 PhD thesis was on foundational issues, as was her textbook. Indeed her book *Recursive Function Theory and Logic* is one of my favorites, and I mentioned it in a recent post.

In order to understand what Ann’s book was about, what her research was about, and what she was about, we need to understand complexity theory in the 1960s. This was well before the $\mathsf{P}=\mathsf{NP}$ question, before random methods, before quantum, before many of the mainstream topics that we know and love. In these early days the focus was on the structure of recursive functions. Mathematicians were well aware that recursive functions were too large a class of functions, even though they were “computable.”

Primitive recursive functions were a much weaker subset that still contained many very powerful functions. In Ann’s book she studied these functions in great detail. One topic was the Grzegorczyk hierarchy, named after Andrzej Grzegorczyk: this hierarchy divides the primitive recursive functions into a series of higher and higher classes of functions. Roughly the levels of this hierarchy correspond to growth rates of functions: lower levels grow slower than higher levels. For example, the class of elementary functions, $\mathcal{E}^3$, is the part of the hierarchy that corresponds to the union of the exponential hierarchy.

Of course today we consider even just functions that take time to compute as not really *elementary*, in the sense that we usually cannot compute them in practice. But elementary includes

for example.

A nice example of a natural occurrence of this hierarchy is the function $W(r,k)$. The famous theorem of Bartel van der Waerden states that for any $r$ and $k$ there is some number $N$ so that if the integers

$$1, 2, \dots, N$$

are colored with $r$ colors then there are at least $k$ integers in an arithmetic progression that all have the same color. The least such $N$ is the value of $W(r,k)$.

The initial proof showed that $W$ was well defined and was a recursive function. Later in a famous paper Saharon Shelah showed that $W$ was in the Grzegorczyk hierarchy; that is, it was primitive recursive. The original proof of the existence of $W$ used a double induction which led to a non-primitive recursive bound. Shelah needed to avoid such a double recursion to get into the Grzegorczyk hierarchy, which he did in a clever way.

Later, in a breakthrough result Timothy Gowers lowered its place in the hierarchy by a huge jump:

$$W(r,k) \le 2^{2^{r^{2^{2^{k+9}}}}}.$$

This was not “just” primitive recursive, but was elementary. That is elementary in the sense of the Grzegorczyk hierarchy, not in the nature of the proof.

One of the great open problems is to get closer to the best lower bounds, which are only slightly better than what follows from using a random coloring of the $N$-many integers. This is one of the great mysteries of complexity theory on the “high” end, but may mirror issues on the lower end.
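For the smallest interesting case, the van der Waerden number can be found by sheer brute force. The following is a minimal sketch, not anything from Ann's book or the papers cited: the function names `has_mono_ap` and `van_der_waerden` and the `limit` parameter are our own, and the search is exponential in the number of integers, so it is only sensible for tiny inputs.

```python
from itertools import product


def has_mono_ap(coloring, k):
    """Return True if the coloring of 1..N (coloring[i-1] is the color of i)
    contains k same-colored integers in arithmetic progression."""
    n = len(coloring)
    for start in range(1, n + 1):
        for step in range(1, n + 1):
            if start + (k - 1) * step > n:
                break  # the progression would run past N
            first = coloring[start - 1]
            if all(coloring[start - 1 + j * step] == first for j in range(k)):
                return True
    return False


def van_der_waerden(r, k, limit=12):
    """Least N <= limit such that every r-coloring of 1..N has a
    monochromatic k-term progression, or None if no such N is found."""
    for n in range(1, limit + 1):
        if all(has_mono_ap(c, k) for c in product(range(r), repeat=n)):
            return n
    return None


print(van_der_waerden(2, 3))  # the known value for two colors, length three, is 9
```

Already for three colors and progressions of length three the answer is 27, far beyond what this naive enumeration can reach, which gives a small taste of why the growth rate of this function is so hard to pin down.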

In any event, the structure of this hierarchy and others like it were the things that Ann worked on her whole life. Read her textbook—even after almost fifty years, there are many interesting nuggets in there about primitive recursive functions.

Again our best thoughts to all of Ann’s family and friends.

]]>

* Or rather two beautiful identities *


Joseph Lagrange was a mathematician and astronomer who made significant contributions to just about everything. Yet like all of us he could make mistakes: he once thought he had proved Euclid’s parallel postulate. He wrote a paper, took it to the Institute, and as they did in those days, he began to read it. But almost immediately he saw a problem. He said quietly:

Il faut que j’y songe encore.

“I need to think on it some more.” He put his paper away and stopped talking.

Today Ken and I thought we would talk about a beautiful identity of Lagrange, not about the parallel axiom.

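Many of you may know the identity; almost all of you probably know one of its consequences, the famous Cauchy-Schwarz inequality:

$$\langle a, b \rangle^2 \le \|a\|^2 \, \|b\|^2.$$

Here $\langle a, b \rangle = \sum_{k=1}^n a_k b_k$ is the usual inner product and $\|a\|^2$ is the square of the norm,

$$\|a\|^2 = \sum_{k=1}^n a_k^2.$$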

The finite-dimensional case of this inequality for real vectors was proved by Augustin-Louis Cauchy in 1821. Then his student Viktor Bunyakovsky obtained the integral version, by taking limits. The general result for an inner product space was proved by Hermann Schwarz in 1888.

Bunyakovsky worked in theoretical mechanics and number theory. He conjectured, in 1857, a very natural result, one that is surely true, but even after its sesquicentennial anniversary, it remains completely untouched. The conjecture is: For each integer polynomial $f(x)$ there are an infinite number of primes in the sequence

$$f(1), f(2), f(3), \dots$$

provided the polynomial satisfies certain trivial constraints. These are: (i) it must be a polynomial that tends to positive infinity, since primes are positive, (ii) it must be irreducible, and (iii) its values must not always be divisible by a fixed prime.

The first two constraints are trivial and the last avoids polynomials like $x^2 + x + 2$, whose values are always even. The conjecture is still open for all polynomials of degree at least two. Thus Cauchy’s student made a conjecture that is both beautiful and hard. We wonder, how was it to work with Cauchy as an advisor?
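To get a feel for the conjecture, one can simply list the prime values of a polynomial. This is a small sketch with names of our own choosing (`is_prime`, `prime_values`); the polynomials $x^2+1$ and $x^2+x+2$ are illustrative choices, the second being one whose values are all divisible by the fixed prime 2.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small values."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def prime_values(f, limit):
    """Prime values among f(1), f(2), ..., f(limit)."""
    return [f(x) for x in range(1, limit + 1) if is_prime(f(x))]


# x^2 + 1 meets all three constraints; primes keep turning up.
print(prime_values(lambda x: x * x + 1, 10))  # [2, 5, 17, 37, 101]

# x^2 + x + 2 = x(x + 1) + 2 is always even, so constraint (iii) fails
# and no value at x >= 1 is prime.
print(prime_values(lambda x: x * x + x + 2, 1000))  # []
```

Of course no finite listing says anything about infinitude, which is exactly why the conjecture is hard.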

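Lagrange’s identity is quite pretty. For any real numbers $a_1, \dots, a_n$ and $b_1, \dots, b_n$:

$$\left(\sum_{k=1}^n a_k^2\right)\left(\sum_{k=1}^n b_k^2\right) - \left(\sum_{k=1}^n a_k b_k\right)^2 = \sum_{1 \le i < j \le n} (a_i b_j - a_j b_i)^2.$$

Using notation for the norms and inner product we can shorten it to

$$\|a\|^2 \, \|b\|^2 - \langle a, b \rangle^2 = \sum_{i<j} (a_i b_j - a_j b_i)^2.$$

The importance of this identity is that it immediately implies the famous Cauchy-Schwarz inequality over the real numbers, since the right-hand side is a sum of squares and hence nonnegative.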
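Lagrange's identity for real vectors is easy to check numerically. Here is a minimal sketch using exact integer arithmetic; the function name `lagrange_sides` and the sample vectors are our own choices.

```python
from itertools import combinations


def lagrange_sides(a, b):
    """Return both sides of Lagrange's identity for real vectors a and b:
    ||a||^2 ||b||^2 - <a,b>^2  and  sum over i<j of (a_i b_j - a_j b_i)^2."""
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    inner = sum(x * y for x, y in zip(a, b))
    lhs = norm_a * norm_b - inner * inner
    rhs = sum((a[i] * b[j] - a[j] * b[i]) ** 2
              for i, j in combinations(range(len(a)), 2))
    return lhs, rhs


lhs, rhs = lagrange_sides([1, 2, 3], [4, -5, 6])
print(lhs, rhs)  # 934 934
```

Since the second value is a sum of squares, the first is never negative, which is Cauchy-Schwarz; equality holds exactly when every $a_i b_j - a_j b_i$ vanishes, that is, when the vectors are proportional.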

For a historical note, the case $n = 2$ was known going back to antiquity:

$$(a^2 + b^2)(c^2 + d^2) = (ac + bd)^2 + (ad - bc)^2.$$

Wikipedia names this for Brahmagupta, whom we recently mentioned, and for Fibonacci, but notes that it goes back (at least) to Diophantus. Brahmagupta actually stated and proved a generalization of this identity with any multiplier $n$ as well:

$$(a^2 + n b^2)(c^2 + n d^2) = (ac + n bd)^2 + n (ad - bc)^2.$$

Incidentally, this shows that not only are numbers that are sums of two squares closed under multiplication, but also numbers of the form $a^2 + n b^2$, for any fixed $n$. Also we recently mentioned Lagrange’s use of a lemma that numbers that are sums of *four* squares are also closed under multiplication, on the way to proving that they encompass all whole numbers.
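The closure property can be made concrete: given two numbers written in the form $a^2 + n b^2$, Brahmagupta's identity tells us how to write their product in the same form. A small sketch follows; the name `brahmagupta_compose` is ours, and we use the sign convention $(ac + nbd)^2 + n(ad - bc)^2$.

```python
def brahmagupta_compose(a, b, c, d, n):
    """Return (x, y) with x^2 + n*y^2 == (a^2 + n*b^2) * (c^2 + n*d^2)."""
    return a * c + n * b * d, a * d - b * c


# Sums of two squares (n = 1): 5 = 1^2 + 2^2 and 13 = 2^2 + 3^2.
x, y = brahmagupta_compose(1, 2, 2, 3, 1)
print(x, y, x * x + y * y)  # 8 -1 65

# The form a^2 + 2 b^2: 3 = 1 + 2*1 and 11 = 9 + 2*1.
x, y = brahmagupta_compose(1, 1, 3, 1, 2)
print(x, y, x * x + 2 * y * y)  # 5 -2 33
```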

The identity shows that an inner product can be reduced to the computation of only sums of positive quantities. This works also in the complex case, but there are two interesting versions that are not immediately interchangeable. First, let

where we will have in mind that are unit vectors. Then by the identity we get that is equal to

This consists of only two sums, each of which is a sum of squares. Now we may interchange each by its conjugate , so that we get a proper complex inner product

which is thereby equal to

Note that the conjugates do not go away even though they are squared, so we do not obtain sums of *positive* quantities. To get this, we need a different version that was also found by Lagrange:

where

This is the same as Wikipedia’s formula with replaced by , which in turn fixes an apparent typo in its source. We note that the brute-force proof of this for leads to Lagrange’s four-square lemma. Namely, let

Then the left-hand side becomes

which represents an arbitrary product of sums of four squares that we want to write as a sum of four squares. The right-hand side becomes

This is a sum of four squares, and to verify that it equals the left-hand side, one need only see that all twenty-four cross-terms in *cancel*.
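The four-square lemma can also be checked mechanically. The sketch below uses the standard form of Euler's four-square identity (the one that comes from multiplying quaternions), which may arrange the terms differently from the expression derived above; the name `four_square_product` is our own.

```python
def four_square_product(a, b):
    """Given 4-tuples a and b, return a 4-tuple c with
    sum(c_i^2) == sum(a_i^2) * sum(b_i^2) (Euler's four-square identity)."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (
        a1 * b1 - a2 * b2 - a3 * b3 - a4 * b4,
        a1 * b2 + a2 * b1 + a3 * b4 - a4 * b3,
        a1 * b3 - a2 * b4 + a3 * b1 + a4 * b2,
        a1 * b4 + a2 * b3 - a3 * b2 + a4 * b1,
    )


def sum_sq(v):
    return sum(x * x for x in v)


c = four_square_product((1, 2, 3, 4), (5, 6, 7, 8))
print(c, sum_sq(c))  # (-60, 12, 30, 24) 5220
```

Here $30 \cdot 174 = 5220$, and all the cross-terms cancel when the four squares are expanded.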

Written more compactly using inner product and norm notation, what we have is

Thus we have written the squared inner product as a difference of two positive real terms, where each term is a sum of squared real sub-terms or a product of the same. Ken and I have been trying to use this to improve some known simulations of quantum algorithms. So far we have no new results, but the above manipulations make a tighter connection seem plausible. The question for the applications we seek is:

Under what conditions can good approximations to the squared sub-terms yield a good approximation to ?

In general, the sticking point could be the minus sign, together with the presence of situations where each of the two terms on the right-hand side has magnitude much higher than . Thus approximations of these terms to within will not help unless is truly tiny.

However, in quantum algorithms, we can expect and to be unit vectors. Depending on the algorithm, we may be able to arrange for the squared inner product to yield the acceptance probability, so that is a number between and . Then simply equals . Thus we have:

Hence we can hope that good approximations to the squared terms in the sum can yield a good approximation to . Moreover we are helped by the promise for a algorithm that either is close to zero (so the sum is close to ) or is close to one (so the sum is close to ). What can go wrong?

The sticking point again could be the minus sign, together with what could be exponentially many complex “cross terms” of the form . It may be hard to get enough of a handle on those terms to approximate all their (squared) differences. Two further complications are that sometimes our acceptance probability involves not just one inner product but many, though in some cases we can arrange polynomially many, and that the indices may be limited to some subset of . What could help us most would be a further manipulation of , taking into account the “promise” condition that be near or .

Did you know this identity? Besides Cauchy-Schwarz, what useful inequalities and estimates can be derived—in the presence of certain promise conditions?

]]>

* Another dichotomy theorem *

Jin-Yi Cai is one of the world’s experts on hardness of counting problems, especially those related to methods based on complex (pun intended) gadgets. He and his students have built a great theory of **dichotomy**, which we covered two years ago. This means giving conditions under which a counting problem in $\#\mathsf{P}$ must either be in $\mathsf{P}$ or be $\#\mathsf{P}$-complete, with no possibility in-between. The theory is built around a combinatorial algebraic quantity called the *holant*, which arose in Leslie Valiant’s theory of holographic algorithms.

Today Ken and I wish to discuss a recent paper on the hardness of counting the number of edge colorings, even for planar graphs.

This work is due to our dear friend Jin-Yi—who has been a colleague of each of us—with Heng Guo and Tyson Williams. It is entitled *The Complexity of Counting Edge Colorings and a Dichotomy for Some Higher Domain Holant Problems*. It is here. The extent to which algebra involving both rational and complex numbers is relevant is just one of several interesting things about the paper.

They prove that counting the number of edge colorings of a planar graph is $\#\mathsf{P}$-hard. What is interesting is that it is easy, thanks to the four-color theorem, to show that every cubic biconnected planar graph is three-edge colorable. But counting how many such colorings there are is hard. Finding one is easy. Finding how many there are in total is hard.

This is a common situation in complexity theory. We have many examples of where finding one object is easy, but finding the total number is hard. This happens for example with perfect matchings, with satisfying assignments to formulas, and with many other situations.

One of my favorite types of graphs is planar graphs. You can draw them, picture them, visualize them. Even the Petersen graph and other important non-planar graphs start to become, at least for me, harder to understand. It is no accident of nature that planar graphs have an efficient isomorphism algorithm, have good separators, and have many other special properties. Of course, thanks to the resolution of the Four-Color Conjecture, we know that any planar graph can be vertex-colored with at most four colors.

This makes Jin-Yi’s result more interesting, since planar graphs are special. His result, however, shows some surprising equivalences between counting problems for regular planar graphs and general graphs. This is one of the reasons that his machinery is interesting.

Complexity theory is often viewed as more difficult than other parts of theory. But I would argue that it is much the same as the area of algorithms, because complexity theory should be viewed as composed of algorithms. Often the algorithms are reductions that show that solving one problem is the same as solving another.

In Jin-Yi’s result he needs the existence of certain special gadgets. These gadgets allow him to prove the counting results. They handle *signatures* which are finite tuples of complex numbers under constraints. A key signature written entails that the second and third numbers are equal and distinct from the first one. Others force all elements of a tuple to be equal, and so on. These conditions parameterize the statements of the counting problems, and the gadgets enable manipulating signatures so as to show reductions between these problems.

The gadgets are more intricate than in many other proofs; they require an understanding of certain number theory facts as well as other tricks. Together these mean that the reductions are nontrivial and cause the resulting proof and paper to be long.

The structure of their paper is great, so following the argument is not too difficult. But there are lots of details in their reductions, and so the best—as always—is to read the paper. Actually they supply a flowchart of their gadgets as follows:

They call the second bubble from the bottom the “Bobby Fischer gadget” because one choice of plan surprisingly achieves all of seven separate objectives. Fischer was indeed known for *clarity* as well as *depth*, so the comparison is apt.

What the gadget achieves is a way to simulate a richer set of “signatures” from certain base sets, so that a planar *holant* problem in the richer case polynomial-time Turing reduces to the planar holant problem for the original. Provided the base set includes and where is the size of a certain domain, “Bobby Fischer” shows that one can add the equality-of-four signature without increasing the complexity of the counting problem. Again we say to see the paper for the detailed lemma statement and proof.

We can, however, give here a taste of one of the key issues they face. Let’s take a look at what they call “the lattice condition.”

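This is not a disease or malady of some kind. Rather it refers to a kind of independence of numbers. Let the numbers

$$\lambda_1, \lambda_2, \dots, \lambda_\ell$$

all be non-zero complex numbers. Then they satisfy the **lattice condition** provided

$$\lambda_1^{x_1} \lambda_2^{x_2} \cdots \lambda_\ell^{x_\ell} = 1$$

for integers $x_1, \dots, x_\ell$ (not all zero), implies that $x_1 + x_2 + \cdots + x_\ell \neq 0$. For ease of insight, suppose the $\lambda_i$ are all positive real numbers. Then taking logarithms makes this mean that if $\sum_i x_i \log \lambda_i = 0$ for integers $x_i$ not all zero, then

$$\sum_i x_i \neq 0.$$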

If we remove the condition that the sum of the are zero, then this would be exactly that the logs of the are linearly independent over the integers. The extra condition makes the lattice condition weaker. For example have the lattice condition, but their logarithms are not linearly independent:

This yields that and . Thus assume that . Then it follows that , which is a contradiction: all exponents must be nonzero.
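One way to get a feel for the lattice condition is a bounded search for violations. As we read the definition, nonzero numbers satisfy the condition when no integer exponent vector that is not all zero and sums to zero makes the product of powers equal to 1. The sketch below, with names of our own (`lattice_violation`, `bound`), only searches exponents in a small box, so finding nothing is evidence and not a proof. The triples $(2,3,6)$ and $(2,4,8)$ are our own illustrative choices, not examples from the paper.

```python
from fractions import Fraction
from itertools import product


def lattice_violation(values, bound):
    """Search exponent vectors x in [-bound, bound]^l, not all zero and
    summing to zero, with prod(values[i] ** x[i]) == 1.
    Return the first such x found, or None if the box contains none."""
    for x in product(range(-bound, bound + 1), repeat=len(values)):
        if any(x) and sum(x) == 0:
            prod = Fraction(1)
            for v, e in zip(values, x):
                prod *= Fraction(v) ** e  # exact rational arithmetic
            if prod == 1:
                return x
    return None


# log 6 = log 2 + log 3, so the logs are dependent, yet the only relation
# 2^c * 3^c * 6^(-c) = 1 has exponent sum c, which is nonzero.
print(lattice_violation((2, 3, 6), 3))  # None

# 2 * 8 = 4^2 gives a relation whose exponents sum to zero.
print(lattice_violation((2, 4, 8), 3))  # (-1, 2, -1)
```

The first triple illustrates the point that the lattice condition is weaker than linear independence of the logarithms over the integers.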

Here is a sample lemma that they need:

Lemma 1. Let $p(x)$ be a polynomial of degree $n$ with rational coefficients. If the Galois group of $p$ over $\mathbb{Q}$ is $S_n$ or $A_n$ and the roots of $p$ do not all have the same complex norm, then the roots of $p$ satisfy the lattice condition.

Later they have an explicit infinite family of polynomials, and they want to show they all have the lattice condition, in the sense that their roots do. If they can show that they satisfy the above lemma they are done. Nothing is easy. So they must work hard to show via a clever argument that only a finite explicit subset of the infinite family of elements fail the lemma. These finitely many elements are shown to be handled by other arguments. The algebra for identifying these elements culminates in the formula

which is needed only to be positive for .

They do mention the famous theorem of Carl Siegel on integer solutions to equations of the form

Siegel in 1929 proved that such an equation has only a finite number of solutions provided the polynomial satisfies a certain condition. Trivially we must avoid linear polynomials, since (e.g.)

has an infinite number of solutions. An important special case of the finiteness theorem is

that

has only a finite number of solutions over the integers, provided are non-zero integers.

The theorem comes up in Jin-Yi’s paper because in constructing their proof they must show that a particular equation has only the “obvious” integer solutions. The Siegel theorem is helpful: it shows immediately that there are only a finite number of solutions. But the theorem gives no effective method that finds all the solutions. So they must show for **this** particular equation that all is okay. They succeed in doing this, and you may find this part of the paper quite interesting.

One obvious open problem is: could there be a short proof that counting colorings of planar graphs is $\#\mathsf{P}$-hard? Just because Jin-Yi’s proof is complex does not rule out the existence of a much simpler proof. I wonder…

[fixed definition of lattice condition]

]]>

Howard Goldowsky is the author of this month’s Chess Life cover story. Every month this magazine is mailed to about a quarter of a million players, but this cover story was deemed so important by Daniel Lucas, Chess Life’s editor, that he made it available online, free of charge.

Today I want to talk about this story because it is interesting, important, and it raises some novel theory questions.

The cover picture and the story are about the work that our own Ken Regan is doing to help stop chess cheaters.

Of course cheating at games of all kinds, at sports of all kinds, at almost all human endeavors is ancient. I would be shocked if cave dwellers did not cheat at something. Yet my quest to find an article on the first known examples of cheating ended in failure. I would be lying (cheating?) if I said that I found the earliest example. I did discover that cheating occurred during the first Olympics in around 776 BCE. Most athletes were honest, although some cheated in various ways. The penalties then were fines or more severe forms of punishment; do not ask.

One of the funniest examples of cheating at an Olympics occurred much more recently during the 1960 Rome Games. One team did the following:

If at first you can’t succeed, cheat. Words to live by for this inept pentathlon team in the 1960 Rome Games. In the first event, the entire team fell off their horses. One athlete almost drowned during the swimming competition, and the team was forced out of the shooting event after a team member nearly grazed the judges. For the fencing event, they decided to secretly send out their expert swordsman each time and hoped no one looked behind the mask. The third time the same fencer came out, however, the hoax was discovered.

An example of really “unmasking the cheater.”

We have talked before about cheating at chess. No drugs are needed, no bribery, no fencing masks; one needs only to use a computer to play the moves for you. The world’s best chess players today have ratings in the range from 2700 to 2820, except for world champion Magnus Carlsen at 2880. The world’s best chess programs, or *engines* as they usually are called, play at 3100 or above. Put simply: the best humans cannot beat the chess engines.

Hence, the recipe for cheating at chess is simple. Get access to a computer during the game, use its suggested moves, and easily defeat your opponent. The computer can be at a remote site run by a confederate or can be on your person. Even a mobile phone can run a program that will play better than anyone you will likely face in your tournament.

The ease of cheating is a major issue for organized chess. The number of cases in professional tournament play is, according to Ken, roughly one per month; one case happened at a tournament in Romania just last month. Ken knows this because he routinely runs his detection methods, more on those shortly, on most major tournaments. So quoting Goldowsky’s article:

“ when a tournament director becomes suspicious for one reason or another and wants to take action, Regan is the first man to get a call.”

Ken also answers a second class of requests. Sometimes a player will voice suspicions in public. In all that Ken has heard, including several times this year, his work has given evidence that everything was above board—pun intended. The player voicing the suspicion usually lost to an apparently weaker opponent, who seemed to make some very strong moves. No one likes to lose, but Ken discovers that in most of these cases the answer is innocent. The weaker player either made strong moves because they were forced or the accuser made quantifiable mistakes.

Obviously some cheaters are caught red-handed. Just like the unmasking of the fencer they are caught in the act, with a computer in hand. These are easy cases. But more and more the cases are becoming quite sophisticated, making it difficult for tournament directors to obtain direct evidence.

This is where Ken’s research enters, where the problem of detecting cheating becomes a theory problem. A hard and interesting theory problem. One that I would like to abstract and explain in my own way.

Consider that a player is faced with many decisions of the form: what move should I make in this position? We can imagine that the player is faced with a series of positions

$$P_1, P_2, \dots, P_n$$

and for each the player has to make a choice of what move to make. We can assume that the positions are not related, that $P_i$ is not followed by $P_{i+1}$: this is a simplifying assumption, but is a reasonable one to make.

The difficult question is: how do we rate the move the player selects? The trouble is that in many positions there may be only one legal move. For example, if your king is in check there may be one move to get it out of check. There are many other cases where the move is forced or almost forced, so that even a weak player, like myself, would get the right move.

Another difficulty is that the chess engines suggest not one move but several *top moves*. If the position is a winning one for the player, there may be little difference in playing any of the top moves. Of course, the opposite often happens too. Sometimes there is a unique move that wins and all the other moves lose.

These complications make scoring how well a player is following the chess engine, that is how much the player is cheating, a nontrivial statistical issue. The principal theory question is:

How should we rate how well a player matches the chess engine as a method to detect cheating?

The short answer is very carefully.

Ken has spent years working on the right scoring method given these issues and others. I believe that his method of scoring is powerful, but is probably not the final answer to this vexing question. You can read more about it in posts by Ken on this blog: March 2012, May 2012, January 2013, July 2013, September 2013, and April 2014, with more to come.

The problem facing Ken in detecting cheating is hard because the cheater may use many different strategies. Suppose that the player is cheating and the chess engine suggests

$$(m_1, v_1), (m_2, v_2), \dots, (m_k, v_k).$$

Here $m_i$ is a move and $v_i$ the value the engine assigns to the move. We can assume that the higher the value the better the engine thinks the move is: there is a slight issue that I am skipping here, since the machine uses the sign of the value to suggest which of black or white is ahead if the move is used. We can safely ignore this for our discussion.

The problem is what strategy might a cheater use? A naïve strategy is to always select the top ranked move, i.e. the move with the largest value. This strategy would be easy for Ken to detect. A superior strategy might be to select moves based on their values: higher values are selected more frequently. This clearly would be more difficult for Ken to detect, since it is randomized.

Another twist is the cheater could use a cutoff method. If several moves are above a value, say $c$, then these could be selected with equal probability. If $c$ is large enough this is essentially using the insight that there may be many ways to win a won game. Indeed in many discussions about chess cheating a related method is suggested: simply randomly take one of the top three ranked moves; it is called the “Top-3 Cheating” method.
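The cheating strategies just described are easy to write down, which is part of what makes detection hard. Here is a rough sketch of four of them: always play the top move, pick moves with probability weighted by value, pick uniformly among moves above a cutoff, and pick uniformly among the top three. Everything concrete here is an assumption for illustration: the function names, the weighting rule, and the sample moves and values are made up and are not taken from any real engine or from Ken's detection program.

```python
import random


def pick_top(moves):
    """Naive strategy: always play the engine's highest-valued move."""
    return max(moves, key=lambda mv: mv[1])[0]


def pick_weighted(moves, rng):
    """Randomized strategy: higher-valued moves are chosen more often.
    Weights are values shifted to be positive (an illustrative choice)."""
    low = min(v for _, v in moves)
    weights = [v - low + 0.1 for _, v in moves]
    return rng.choices([m for m, _ in moves], weights=weights, k=1)[0]


def pick_cutoff(moves, cutoff, rng):
    """Cutoff strategy: choose uniformly among moves valued above cutoff,
    falling back to the top move if none qualifies."""
    good = [m for m, v in moves if v > cutoff]
    return rng.choice(good) if good else pick_top(moves)


def pick_top_k(moves, k, rng):
    """Top-k strategy (k = 3 gives the so-called Top-3 method)."""
    ranked = sorted(moves, key=lambda mv: mv[1], reverse=True)
    return rng.choice(ranked[:k])[0]


# Hypothetical engine output: (move, value) pairs.
moves = [("Nf3", 0.80), ("e4", 0.70), ("d4", 0.65), ("a3", -0.40)]
rng = random.Random(0)  # seeded so runs are repeatable
print(pick_top(moves))
print(pick_weighted(moves, rng))
print(pick_cutoff(moves, 0.6, rng))
print(pick_top_k(moves, 3, rng))
```

A detector that only counts agreement with the engine's first choice would catch the first strategy quickly and could be fooled by the other three, which is why the scoring has to look at the values of the moves and not just their rank.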

I could go on, but the key is that Ken is not able to assume that the cheater is using a known strategy. This makes the detection of cheating much harder, more interesting, and a still open problem. It is essentially a kind of two player game. Not a game of chess, but a game of the cheater against the detector; some player against Ken’s program.

Does the detection of cheating in chess seem to be interesting? Will we see cheating start to occur in other mental tasks in the future as computers get better in those areas? Will people start to cheat at other tasks soon? Or have they started already?

Will one day your STOC/FOCS paper be rejected because it looks like it was written by a computer? Or even worse that the theorems were proved by computer and written up by them? For the record this was all created by me—no cheating at all.

]]>

* Partly a rant from today’s New York Times *

source—note the “\cdots” |

Aryabhata is often called the father of Indian mathematics, and can be regarded as the progenitor of many modern features of mathematics. He completed his short great textbook, the *Aryabhatiya*, in 499 at age 23. Thus if there had been a “STOC 500,” he could have been the keynote speaker. The book has the first recorded algorithm for implementing the Chinese Remainder Theorem, which Sun Tzu had stated without proof or procedure. He computed $\pi$ to 4 decimal places, and interestingly, hinted by his choice of words that $\pi$ could only be “approached.” He introduced the concept of versine, that is $1 - \cos\theta$, and computed trigonometric tables. He maintained that the Earth rotates on an axis, 1,000 years before Nicolaus Copernicus did, and while he described a geocentric system, he treated planetary periods in relation to the Sun.

Today Ken and I wish everyone a Happy Father’s Day, and talk about recognizing our scientific fathers correctly. This includes a short rant about a flawed piece in today’s New York Times.

Ken believes that Aryabhata further hinted at the “correct” value of $\pi$. As earlier referenced by our fellow blogger and friend Bill Gasarch, this article by Bob Palais wins Ken’s assent that we should really think in terms of $2\pi = 6.283185307\ldots$ Here is what Aryabhata wrote:

Add 4 to 100, multiply by 8, and then add 62,000. By this rule the circumference of a circle with a diameter of 20,000 can be approached.

Why wouldn’t he say to multiply by 4, add 31,000, and get the circumference for a diameter of 10,000? Ken observes that 10,000 was already the more recognizable unit, so Aryabhata may have been hinting at the ratio to the radius. Indeed Aryabhata flatly stated a decimal place system as the norm for arithmetic, as already used in earlier manuscripts he cited, though he himself extended it with a system of Sanskrit phonemes capable of handling fractions and decimal places.
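Aryabhata's rule is simple enough to carry out directly. The short sketch below just does his arithmetic and then forms the two ratios in question; the variable names are ours.

```python
from fractions import Fraction

# "Add 4 to 100, multiply by 8, and then add 62,000."
circumference = (4 + 100) * 8 + 62000
diameter = 20000
radius = diameter // 2

ratio_to_diameter = Fraction(circumference, diameter)
ratio_to_radius = Fraction(circumference, radius)

print(circumference)             # 62832
print(float(ratio_to_diameter))  # 3.1416
print(float(ratio_to_radius))    # 6.2832
```

The ratio to the diameter reduces to 3927/1250, which agrees with $\pi$ to four decimal places, while the ratio to the radius of 10,000 reads off as 6.2832 with no division needed beyond shifting the decimal point.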

Aryabhata is sometimes also credited with the first symbolic use of zero, at least implicitly. His successor Brahmagupta, a century-plus later, perhaps deserves greater credit for being the first to treat zero as an explicit *entity*. He wrote rules such as zero plus anything is itself, zero times anything is zero, and “a fortune subtracted from zero is a debt.” Brahmagupta waded into trouble only when he ruled that zero divided by zero is zero, and tried to make unique sense out of other fractions with zero. But our point is that in the West he seems to get zero credit for zero at all. And that brings me to giving credit to “fathers” of fields correctly.

This Sunday’s New York Times has, on page nine of the Review section, an article by Ignacio Palacios-Huerta, which is titled “The Beautiful Data Set.” He is a professor of economics at the London School of Economics, and is the author of a recent book on Game Theory.

In his article he uses soccer as a way to explain game theory. He is especially interested in penalty kicks. These are two-person games between the goalie and the player. Each is a zero-sum game, since either the kick goes in or it does not. He has looked at over nine thousand such kicks in professional games that took place over the period 1995 to 2012. He found that roughly 60% go to the right of the net and the rest to the left. This is, as he notes, not exactly an even distribution; this is probably due to players having a stronger kick to one side than the other.
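To see why an uneven split can still be optimal play, consider a toy version of the penalty kick as a two-by-two zero-sum game. The sketch below uses the standard closed-form mixed-strategy (minimax) solution for such a game. The scoring probabilities are hypothetical numbers we made up for illustration, not figures from the article or the data set, and the function name `solve_2x2_zero_sum` is our own.

```python
from fractions import Fraction as F


def solve_2x2_zero_sum(a, b, c, d):
    """Payoffs to the row player are [[a, b], [c, d]].
    Returns (p, q, value), where p is the probability the row player uses
    row 0 and q is the probability the column player uses column 0.
    Only handles games where both players must genuinely mix."""
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: no interior mixed solution")
    p = (d - c) / denom   # makes the column player indifferent
    q = (d - b) / denom   # makes the row player indifferent
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("a pure-strategy saddle point exists")
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical scoring probabilities: rows are the kicker's side (left, right),
# columns are the goalie's dive (left, right).
p, q, value = solve_2x2_zero_sum(F(3, 5), F(19, 20), F(9, 10), F(7, 10))
print(p, q, value)  # 4/11 5/11 87/110
```

With these made-up numbers the kicker should go left about 36% of the time and right about 64%, simply because the kick to the right is a little stronger. An uneven distribution is thus exactly what the theory of zero-sum games predicts when the two sides are not symmetric.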

All of this is interesting, and I like when the NYT, or any mainstream media, talks about mathematics. But Palacios-Huerta repeatedly gets one thing wrong. I cannot understand why, but in my opinion he does. He repeatedly attributes the theory of such zero-sum games to John Nash. He calls all of the above “Nash’s Theory.”

This is just wrong. The theory of zero-sum games is due to John von Neumann. Recall that he published the famous Minimax Theorem in 1928 and said:

As far as I can see, there could be no theory of games … without that theorem … I thought there was nothing worth publishing until the Minimax Theorem was proved.

Palacios-Huerta knows this—he quotes the above in his book—so I am at best puzzled by his use of Nash in his NYT article. The only point I can see is that the public may recall Nash, since he won a Nobel Prize and is featured in a great book and even a major motion picture. Soccer is after all called the “Beautiful Game,” whence his “Beautiful Data Set,” and the association with “A Beautiful Mind” called up Nash to either him or the editor—or both.

But I still am disappointed with the NYT. Nash's whole claim to fame is that he extended the theory of games beyond the zero-sum case: his theory extended the minimax theorem to non-zero-sum games. To invoke Nash, in a misleading manner, seems to me to be wrong, something that does not fit in print.
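To see what the zero-sum theory actually computes, here is a minimal sketch in Python of the mixed-strategy solution of a two-by-two zero-sum game such as a penalty kick. The scoring probabilities are made up for illustration and are not taken from the article's data, and the function name `solve_2x2_zero_sum` is my own. The closed form is the standard indifference calculation, which applies when the game has no pure saddle point.

```python
from fractions import Fraction


def solve_2x2_zero_sum(a, b, c, d):
    """Solve the zero-sum game with row-player payoff matrix [[a, b], [c, d]].

    Returns (p, q, value): p is the probability that the row player picks
    row 0, q is the probability that the column player picks column 0, and
    value is the expected payoff to the row player under optimal play.
    """
    rows = [[a, b], [c, d]]
    col_max = [max(rows[0][j], rows[1][j]) for j in range(2)]
    maximin = max(min(r) for r in rows)
    minimax = min(col_max)
    if maximin == minimax:
        # A pure saddle point exists, so no mixing is needed.
        i = max(range(2), key=lambda k: min(rows[k]))
        j = min(range(2), key=lambda k: col_max[k])
        return Fraction(1 - i), Fraction(1 - j), maximin
    # Otherwise each player mixes so that the opponent is indifferent.
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical scoring probabilities (rows: kicker aims left/right,
# columns: goalie dives left/right). These numbers are invented.
p, q, value = solve_2x2_zero_sum(
    Fraction(3, 5), Fraction(9, 10),
    Fraction(9, 10), Fraction(7, 10),
)
print(p, q, value)  # 2/5 2/5 39/50
```

With these invented numbers both players mix 40/60, and the kicker scores 78% of the time no matter what the goalie does. That guarantee for each side is the content of the minimax theorem in this tiny case.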

What do you think? Not only about the NYT article, but also about how we have interpreted claims about Aryabhata. In any event, Happy Father’s Day.


*I wish I could have used Theorem X…*


Vijaya Ramachandran is now the Blakemore Regents Professor of Computer Science at The University of Texas at Austin. Once upon a time she was my PhD student, at Princeton, working on the theory of VLSI chip design. Long ago she moved successfully into other areas of theory. She became a leader in cache-oblivious algorithms, an area created by Charles Leiserson, whom I advised as an undergraduate many years ago. It’s a small world.

Today I want to talk about theorems, old theorems and new, that I wish I could have used in a paper.

Call the phenomenon *theorem envy*: There are theorems that I have long known, but have never been able to use in any actual paper, in any proof of my own. I find it curiously unsatisfying to know a cool result and yet be unable to use it in one of my papers. Perhaps I am alone in having this feeling; perhaps you only care about solving your problem, not how you solve it. Mostly I would agree with that. But part of me would like to use Theorem X in one of my papers. It just seems like it would be cool to do that.

This isn’t about envying those who originally proved the theorems. It’s about those who actually use the big gear in their toolboxes. We may have lots of shiny gadgets in our boxes, but part of the reason they’re shiny is that they haven’t been used.

I realized this years ago when Ramachandran was working on her thesis. One of her results solved a problem that most had ignored (or could not solve) about the cost of resizing drivers to make a chip go faster. One way to make a signal propagate faster across a chip is to use more power: the larger the driver, the faster the signal can move. This remains as significant today as then. The issue is not the speed-of-light restriction, but is related to the time it takes to charge a capacitive load: more power makes it happen faster. The downside then, and much worse today, is the added heat that power causes.

There is a curious cyclic issue in resizing drivers. The approach was in two steps:

- Look at a chip and determine the wires that were too long and were slowing down the chip.
- Resize their respective drivers so that they would go faster.

In practice this worked quite well. Yet there is a subtle bug: the resizing of the drivers moves parts of the chip, and this can make some wires become new bottlenecks. The obvious answer is to do the following:

- Look at a chip and determine the wires that were too long and were slowing down the chip.
- Resize their respective drivers so that they would go faster.
- Look at the chip again, and if it is still slow, repeat from the first step.

The theoretical question is, does this process actually ever stop? Is it possible that the chip just blows up and thus no resizing will make it go faster? Or is this unlikely or even impossible?

The answer is that using the famous Brouwer Fixed-Point Theorem, Ramachandran was able to prove that this cannot happen. She showed that given some reasonable restrictions on the chip topology, the process becomes a continuous function on a convex compact set, which by the theorem has a fixed point. Brouwer means “brewer” in Dutch, and one way to visualize the theorem is stirring a brew in a pot, as opposed to a torus where the flow can go around with no vortex. The analysis also needs and shows quick convergence, but the essence is that things stop because the resizing operation has a fixed point.
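The resize-and-check loop can be mimicked with a few lines of Python. This is only a toy sketch under loud assumptions: `toy_resize` is a hypothetical update rule that I made up, not the model from the thesis, and it is deliberately a contraction so that plain iteration converges. Brouwer's theorem by itself only promises that a fixed point exists; convergence of the iteration is a separate fact that has to be argued, as noted above.

```python
def iterate_to_fixed_point(step, x0, tol=1e-12, max_iters=10_000):
    """Apply `step` repeatedly until successive iterates differ by less than tol.

    Returns (x, n) where x is the final iterate and n is the number of
    applications of `step`. Raises RuntimeError if max_iters is exhausted.
    """
    x = list(x0)
    for n in range(1, max_iters + 1):
        nxt = step(x)
        if max(abs(u - v) for u, v in zip(nxt, x)) < tol:
            return nxt, n
        x = nxt
    raise RuntimeError("no fixed point reached within max_iters")


def toy_resize(sizes):
    """Hypothetical resizing rule on driver sizes normalized to [0, 1].

    Each driver's new size depends on the average of its two neighbours on
    a ring. The map sends [0, 1]^n into itself and shrinks distances by a
    factor of 4, so iteration converges to the unique fixed point.
    """
    n = len(sizes)
    return [
        0.5 + 0.25 * (sizes[(i - 1) % n] + sizes[(i + 1) % n]) / 2
        for i in range(n)
    ]


fixed, steps = iterate_to_fixed_point(toy_resize, [0.0, 1.0, 0.3])
print(fixed, steps)
```

For this toy rule every driver settles at size 2/3, since x = 0.5 + 0.25x has that single solution. A real resizing map need not be a contraction, which is why the convergence analysis matters.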

I like her result. In my opinion its practical impact was shedding light on CAD, but I liked it even more because it used the wonderful Brouwer Fixed-Point Theorem. I never have been able to use that in any of *my* papers. Oh well.

So let me give a few examples of other theorems that I would still like to get a chance to use in some future paper.

Here is a partial list of my top few:

*Regularity Lemma*. The famous Szemerédi regularity lemma states that the vertices of every large enough graph can be divided into subsets of about the same size so that the edges between different subsets behave almost randomly. Endre Szemerédi first proved it in a weaker form and used it to prove his famous theorem on arithmetic progressions.

I have known it for many years, but have never been able to use it in my work. Of course many others have used it in their work. I would still like to be able to use it.

*Ax–Grothendieck theorem*. This is a famous theorem about the relationship between injectivity and surjectivity of multi-variate polynomials. It was proved independently by James Ax and Alexander Grothendieck. One version of the theorem is that if $P$ is a polynomial function from $\mathbb{C}^n$ to $\mathbb{C}^n$ and $P$ is injective then $P$ is bijective.

The full theorem is more general, and what’s remarkable is that the basic mechanism of the proof is really quite simple. It relies on the fact that injective implies surjective for finite sets: a map that is one-to-one on a finite set must be onto. I believe this should be useful in problems that concern polynomials, which after all is one of our main objects of study in complexity theory.
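The finite-set fact that drives the proof is easy to check by brute force. Here is a minimal Python sketch, assuming polynomial maps on pairs of elements of the integers modulo a prime p. The two example maps `shear` and `square_both` are my own illustrations, and the code only demonstrates the finite pigeonhole step, not the theorem over the complex numbers.

```python
from itertools import product


def image_of(poly_map, p, n):
    """List the images of all p**n points of (Z/pZ)^n under poly_map."""
    return [poly_map(pt, p) for pt in product(range(p), repeat=n)]


def is_injective(poly_map, p, n):
    """True when no two points share an image."""
    img = image_of(poly_map, p, n)
    return len(set(img)) == len(img)


def is_surjective(poly_map, p, n):
    """True when every one of the p**n points is hit."""
    return len(set(image_of(poly_map, p, n))) == p ** n


def shear(pt, p):
    """(x, y) -> (x + y^2, y): an injective polynomial map."""
    x, y = pt
    return ((x + y * y) % p, y % p)


def square_both(pt, p):
    """(x, y) -> (x^2, y^2): not injective for odd p."""
    x, y = pt
    return ((x * x) % p, (y * y) % p)


for f in (shear, square_both):
    print(f.__name__, is_injective(f, 5, 2), is_surjective(f, 5, 2))
```

On a finite set the two properties always agree, because an injective map hits as many distinct points as there are inputs. The theorem transfers this counting argument from finite fields to the complex numbers.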

*Baire Category Theorem*. This is the famous theorem of René-Louis Baire, who proved it in his 1899 doctoral thesis. It is an existence theorem showing that any countable family of open and everywhere-dense sets in a complete metric space has a non-empty intersection. It has been used in topology and analysis to prove the existence of many wonderful and exotic objects.

In our world we often prove the existence of exotic objects via the probabilistic method, but the Baire Category Theorem offers a different method. It has constructive versions in certain settings, and complexity theorists have used them to show that certain oracle constructions are generic. I believe there should be some further interesting applications of it, and still hope to use it someday.

*Feit-Thompson theorem*. This is the famous theorem that every finite group of odd order is solvable. It was proved by Walter Feit and John Thompson in the early 1960s. It is a cornerstone of the understanding and eventual classification of finite simple groups: it implies that every non-abelian finite simple group has even order.

I have always thought that there might be some deep connection between this theorem and our inability in complexity theory to understand the power of computations modulo composites, but have never been able to make this idea into a theorem.

*Hilbert’s Nullstellensatz*. This does get applied, as we noted here. The cool German name is enough to induce envy. Particularly interesting are various effective and combinatorial forms of it.

This also stands for theorems in polynomial ideal theory and algebraic geometry in general, as used e.g. by Ketan Mulmuley and Milind Sohoni in their attack on $\mathsf{P}$ vs. $\mathsf{NP}$ type questions, and by Craig Gentry in crypto for some schemes of fully homomorphic encryption.

Do you ever feel theorem envy? Which theorems would you like to use? Am I unique and alone in this affliction? Is there a cure?
