# Busses Come In Threes, Why Do Proofs Come In Twos?

*Why do theorems get proved independently at the same time?*

Jacques Hadamard and Charles-Jean de la Vallée Poussin, Neil Immerman and Róbert Szelepcsényi, Steve Cook and Leonid Levin, Georgy Egorychev and Dmitry Falikman, and Sanjeev Arora and Joseph Mitchell are pairs of great researchers. Each pair proved a wonderful theorem, yet in each case they did so independently and at almost the same time.

Today Ken and I want to try to explain how this happens.

I have said many times that I am surprised that these coincidences occur as often as they seem to do. Is this reasonable? Or is there some deep reason that makes it happen? Are some results in the “air,” and so it is expected that multiple people will discover them at about the same time?

**Some Coincidences**

Here are several examples—there are many more, many more. I find this a mystery.

**Prime Number Theorem:**

Recall this is the theorem that shows that $\pi(x) \sim x/\ln x$, where $\pi(x)$ is the number of primes less than $x$. Hadamard and Vallée Poussin each proved the theorem in 1896 using complex analysis. That is not a "complex" type of reasoning, but analysis of functions defined on the complex numbers: numbers of the form $a + bi$.

**van der Waerden’s permanent conjecture:**

In 1926 Bartel van der Waerden made a simple-sounding conjecture about the permanent of a doubly stochastic matrix. Such a matrix has non-negative entries, and each row and column must sum to $1$. He conjectured that for any $n \times n$ doubly stochastic matrix $A$,

$$\mathrm{perm}(A) \ge \frac{n!}{n^n}.$$

Even further, the only case of equality is when the matrix has all entries equal to $1/n$. Such a basic question stood for over half a century until it was solved independently by Egorychev and Falikman, in 1979 and 1980. I have discussed this before here.

**$\mathsf{P}=\mathsf{NP}$ question:**

Cook and Levin both discovered that $\mathsf{NP}$-complete problems exist. They used different definitions, terminology, and examples. But both proved that certain key problems were, as Levin says, "universal." One cannot include Dick Karp here because his important work built on Cook's. Dick showed that there were many such problems, gave the right names to the key concepts, weakened the notion of reductions used, and generally created the problem as we know it today.

**$\mathsf{NL}$ closed under complement:**

Immerman and Szelepcsényi both proved that nondeterministic space classes are closed under complement. I discussed this result and gave a fairly detailed sketch of their proofs here. Their proofs are brilliant, and again depend on the same proof technology.

**Euclidean Traveling Salesman PTAS:**

Arora and Mitchell both found in 1996 a polynomial-time approximation scheme (PTAS) for the Traveling Salesman problem for points in $\mathbb{R}^2$. Mitchell's scheme came a few months later and was more complex; Arora was able to extend his to work in $\mathbb{R}^d$ for all $d$. The concept of a PTAS was generally known in the late 1970's, so this was a 20-year problem.

The last two pairs won the 1995 and 2010 Gödel Prizes, respectively. The prize has been given to multiple papers in other years. Shafi Goldwasser, Silvio Micali, and Charles Rackoff shared the 1993 prize with Laszlo Babai and Shlomo Moran for the development of interactive proofs, but these were more for great concepts than solving a long-open problem. Both papers appeared in STOC 1985, but GMR was out in 1982; Ken heard it talked about glowingly at the FCT conference in Sweden in August 1983. The prize for the PCP Theorem in 2001 was for concurrent work in 1992, but this clearly built on a long succession of ideas—indeed this essay by Ryan O’Donnell gives photo credit to 34 people in its development.

**The Mystery**

I find it a mystery how a problem can be open for decades or longer and then suddenly solved by two independent researchers at essentially the same time. If there were communication, even one bit—the problem is solved—that I can understand. But as far as I know that did not happen in the above cases, nor did it happen in other cases. How is this possible?

**A Model**

Ken Regan says that it is less of a mystery to him. He believes that it is related to other probabilistic phenomena, such as the “bunching phenomena” described in these slides. The slides reference the common commuter’s complaint that buses may have long gaps between arrivals and then “come in threes,” per the title of a popular book on everyday mathematical phenomena.

In writing the following, however, Ken admits he hasn't made the formal connection to bunching, nor is the model completely worked out in all details. Some implicit assumptions are made, such as the independence of the key events.

Let’s suppose researchers work in isolation, which was much more the case before e-mail, Internet, even good international phone service. This is a pretty clear assumption in the above examples: several of the pairs were separated by language and culture, or even by more. During the cold war certainly there was a very low bandwidth connection between East and West.

For a given theorem, any of a given number of researchers might prove it at any time. We can then consider a proof as an *arrival*. Let us call time $0$ the time at which researchers first conceive of the theorem as an idea or conjecture. Then for any later time $t$, let $N(t)$ denote the number of arrivals by time $t$. The process is a *Poisson process* if for all $s, t \ge 0$:

- $N(t+s) - N(s)$ is independent of $s$; and
- $N(t+s) - N(s)$ is independent of $N(s')$, for all $s' \le s$.

The former says that the theorem does not become easier to prove over time—people either make the breakthrough or they don’t. The latter says that the number of people who solve it within a given time interval does not depend on how many have solved it before. This is debatable, because some theorems are easier than others. However, a coincidence on an easier theorem will be less notable, so let us assume all the theorems are equally hard. The definition of Poisson process also presumes discrete time units in which at most one arrival can occur.
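
To make the arrival picture concrete, here is a minimal simulation sketch in Python, with an illustrative rate of one expected proof per 25 years (an assumption for demonstration, not a value from the post). A Poisson process is equivalent to drawing independent exponential inter-arrival gaps, which is exactly what produces long droughts punctuated by near-coincidences.

```python
import random

def poisson_arrivals(rate, horizon, rng=random.Random(0)):
    """Simulate the arrival times of a Poisson process with the given
    rate (expected arrivals per unit time) on the interval [0, horizon]."""
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential inter-arrival gap
        if t > horizon:
            return arrivals
        arrivals.append(t)

# One expected "proof" per 25 years, observed over a century.
times = poisson_arrivals(rate=1 / 25, horizon=100)
gaps = [b - a for a, b in zip(times, times[1:])]
```

Running this repeatedly typically shows a wide spread of gaps, with some far longer than the mean sitting next to some far shorter, which is the "bunching" effect in miniature.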

This allows us to condition the probability on the time $t_1$ at which the first proof occurs. Note that our sample does not include unproved theorems, so every object compared has such a time. The above axioms then say the modeling is the same regardless of when $t_1$ is. The issue then becomes whether a second proof arrives at a time close enough to $t_1$ to be considered independent. The modeling depends on several more factors:

- The time $d$ it takes for the proof to diffuse at large, so that work after time $t_1 + d$ will not be considered independent.
- The expected number $\lambda$ of independently working researchers (minus the original prover!) who would find proofs in a reference time interval $r$.

Put together, we have reduced to the case of getting (at least) one arrival in an interval of time $d$ under a simple Poisson process with mean $\lambda$ arrivals per reference interval $r$. Then $\lambda d/r$ itself is a seat-of-the-pants estimate for the probability of shared credit. Actually the correct theory for a Poisson process is that the probability of having *no* other proofs in time $d$ is given by $e^{-\lambda d/r}$. This means that the probability of shared credit is

$$1 - e^{-\lambda d/r}.$$

To compare some numbers, suppose $d = 6$ months $= 0.5$ years, $r = 25$ years, and $\lambda = 1$. Then the linear formula gives a shared-credit probability of $0.02$. The correct formula gives a probability of $1 - e^{-0.02} \approx 0.0198$, which is not too much different.
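
As a sanity check on the arithmetic, a few lines of Python compare the linear estimate with the exact Poisson formula, using the illustrative numbers above (which are assumptions of the model, not data):

```python
import math

def shared_credit_linear(lam, d, r):
    # Seat-of-the-pants estimate: expected number of other proofs
    # arriving in a window of length d.
    return lam * d / r

def shared_credit_exact(lam, d, r):
    # P(at least one other arrival in a window of length d)
    # for a Poisson process with lam expected arrivals per interval r.
    return 1 - math.exp(-lam * d / r)

# d = 6 months, r = 25 years, one expected independent prover per interval r.
lin = shared_credit_linear(1, 0.5, 25)
exact = shared_credit_exact(1, 0.5, 25)
```

The exact value is always slightly below the linear one, since $1 - e^{-x} < x$ for $x > 0$; for small $x$ the two agree closely.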

The issue then becomes the values of $\lambda$ and $r$. From the above examples it seems like the time $d$ could even be a whole year and the community will be generous. It seems best to estimate $r$ as the amount of time $T$ taken from the moment the problem was clearly formulated to its solution. The probability of shared credit then becomes simply $\lambda/T$. Note that the units of years have been factored out by $d$ itself being a year.

So how big is $T$? The Prime Number Theorem had been suspected for a century, though one can date its true prominence from Bernhard Riemann's famous 1859 paper on the distribution of primes, giving 37 years until 1896. For space complexity one can date from the Turing-award winning paper by Juris Hartmanis and Richard Stearns in 1965 until 1987, for 22 years. For NP-completeness one could have as little as 1971 – 1965 = 6 years, or go back much further since Cook's paper in particular expressly followed Alan Turing's on the r.e.-completeness of first-order logics. Van der Waerden's problem took 53 years, while Euclidean TSP took 20. A crude average here gives $T = (37 + 22 + 6 + 53 + 20)/5 = 27.6$, call it 25. So we can guesstimate independent shared credit for 1-in-25 = 4% of prominent theorems.
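
The crude average can be checked directly; the year spans below are the ones quoted in the text (the formulation dates are, of course, judgment calls):

```python
# Years from clear formulation of the problem to its solution,
# as estimated in the post.
spans = {
    "Prime Number Theorem": 1896 - 1859,            # 37
    "nondeterministic space closure": 1987 - 1965,  # 22
    "NP-completeness": 1971 - 1965,                 # 6
    "van der Waerden permanent": 1979 - 1926,       # 53
    "Euclidean TSP PTAS": 1996 - 1976,              # 20
}

T = sum(spans.values()) / len(spans)  # crude average, 27.6; call it 25
p = 1 / 25                            # guesstimated shared-credit probability
```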

**Open Problems**

I am still puzzled, because coincidences seem to be more than 1/25 of the big results in the fields I know best. Perhaps the above hasn't really taken *bunching* into account. Perhaps the conditioning should be on "a proof" not "the first proof," so that the relevant time interval is $2d$ and the probability at least doubles? Perhaps we should not use conditioning but picture the proofs as independent random variables in the interval $[0, T]$? Perhaps those variables are not really *independent*. What do you think?

Is Ken’s a reasonable model? How often should one expect independent proofs of identical theorems?

[fixed typo “interview”–>”interval”, added caveat about (non-)independence.]

Why the picture of James Lee?

Sorry—didn’t note in mod queue until now (6:17pm), mod is only for first-time commenters. A colleague noted this too, and Dick fixed it.

I trust you will enjoy the story of the discovery of the HOMFLY polynomial. Here is the description from the original article, “A new polynomial invariant of knots and links”:

“Editor’s Note. The editors received, virtually within a period of a few days in late September and early October 1984, four research announcements, each describing the same result—the existence and properties of a new polynomial invariant for knots and links. There was variation in the approaches taken by the four groups and variation in corollaries and elaboration. These were: A new invariant for knots and links by Peter Freyd and David Yetter; A polynomial invariant of knots and links by Jim Hoste; Topological invariants of knots and links, by W. B. R. Lickorish and Kenneth C. Millett, and A polynomial invariant for knots: A combinatorial and an algebraic approach, by A. Ocneanu.

It was evident from the circumstances that the four groups arrived at their results completely independently of each other, although all were inspired by the work of Jones (cf. [10], and also [8, 9]). The degree of simultaneity was such that, by common consent, it was unproductive to try to assess priority. Indeed it would seem that there is enough credit for all to share in. Each of these papers was refereed, and we would have happily published any one of them, had it been the only one under consideration. Because the alternatives of publication of all four or of none were both unsatisfying, all have agreed to the compromise embodied here of a paper carrying all six names as coauthors, consisting of an introductory section describing the basics written by a disinterested party, and followed by four sections, one written by each of the four groups, briefly describing the highlights of their own approach and elaboration.”

The HOMFLY polynomial is a knot invariant which could have easily been discovered, via its skein relation, in the 1960s. However, it appears that the skein relation associated with the Jones polynomial (originally defined through operator algebras) prompted its actual discovery.

[Comment edited by KWR to remove “jaggies”. Thanks, a great example!]

Regarding van der Waerden’s conjecture, there are rumours that the papers of Egorychev and Falikman were not in fact independent, but rather one author stole the other’s results.

Maybe the coincidence is due to the Birthday paradox.

I think Tom Lehrer, in his song “Lobachevsky” had a proposed solution to the problem 🙂

hahaha! thanks for that!

That’s an interesting model, and I think it does help to account for a good part of this phenomenon.

One key point which I think is overlooked is a sort of communal propensity to make a discovery. Consider for example NL closed under complementation. This result could not really have been found before the formulation of a formal definition of computation. As time went on, the Turing machine was discovered, as was the Church-Turing thesis that there is really only one notion of computation. This gave the notion of complexity classes much more robustness. At some point the complexity class L was first studied, and then NL. I claim that each of these landmarks raises the communal propensity to discover that NL is closed under complementation. There are surely many other small (and perhaps large) discoveries which helped to set the scene.

This would correspond to a Poisson process whose intensity increases with time, suggesting the longer we wait for a proof, the more likely it is to appear in groups.

Unfortunately this suggestion would be very difficult to model, since the “propensity” functions associated to different theorems would be very different in general. But, heuristically, I think the notion helps to capture the reality that proofs do not occur in a vacuum, but are heavily influenced by what comes before.
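
This rising-"propensity" idea can still be sketched under an assumed (hypothetical) intensity function, using the standard thinning method for simulating a non-homogeneous Poisson process:

```python
import random

def inhomogeneous_arrivals(intensity, max_rate, horizon, rng=random.Random(1)):
    """Thinning (Lewis-Shedler): simulate a Poisson process whose
    time-varying rate intensity(t) is bounded above by max_rate."""
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(max_rate)   # candidate from the bounding process
        if t > horizon:
            return arrivals
        if rng.random() < intensity(t) / max_rate:
            arrivals.append(t)           # accept with probability ratio

# Hypothetical rising propensity: discoveries grow more likely as the
# community's tools accumulate, so arrivals concentrate late in the period.
late = inhomogeneous_arrivals(lambda t: (t / 100) ** 2, 1.0, 100)
```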

Thank you regarding the model, though actually I was dismayed not to get a real involvement of “bunching” phenomena. I dated L-NL from 1965 as the formulation, but you’re right that involvement with those classes “ramped up” in the 1980’s. We meant to say also that a Poisson process embodies an assumption of independence of the “arrivals” that may not hold here. Your idea of “propensity” is probably on-track, but I was hoping to leverage something specifically about clustering.

Perhaps 2 people will come up with a simultaneous explanation of why 2 people come up with simultaneous explanations! How perfect would that be?

Malcolm Gladwell wrote an article for the New Yorker titled “In the air” about how one of the main reasons discoveries are often made at almost exactly the same time by independent parties is that the ideas are simply “in the air”, waiting to be plucked from the collective wisdom of the community:

http://www.newyorker.com/reporting/2008/05/12/080512fa_fact_gladwell

There are lots of interesting examples to be found in that article. I recommend reading it for those examples alone!

Another problem that was independently solved by several people at the same time:

http://en.wikipedia.org/wiki/Angel_problem

Apologies in advance for my off-topic comment.

Probably I’m wrong, but I believe the paper of John Conway on the Angel Problem (which can be found at http://library.msri.org/books/Book29/files/conway.pdf), in particular the proof of Theorem 3.1 (the main theorem of the paper), is incorrect. Quoting:

"he should truncate this cone by a horizontal line AB at a very large height H above the Fool's starting position, and use his first few moves to eat one out of every M squares along AB, where M is chosen so that this task will be comfortably finished when the Angel reaches a point Q on the halfway line that's distant H/2 below AB"

As far as I can tell, he sort of assumes M to be a constant he can choose here, while it actually is a function of H.

Can anyone clarify?

oh my goodness! i spent a lot of futile hours on that back in high school, i hadn’t realized it’d been solved!

I’ve often wondered about this too. And it’s not limited to proofs and problems, but to ideas of all kinds. Similar ideas seem to emerge around the same time, and it is mind-blowing.

You forgot to mention maybe the most important and amazing of such independent and contemporary proofs (for us computer scientists at least): Church's and Turing's solutions to the Entscheidungsproblem.

And no, I don’t think that a simple Poisson process can describe these events: the Entscheidungsproblem is a problem posed in 1928 but a solution could only appear after 1931. Theorems are product of their time. So I agree with Andy.

I don’t think this is a good model. These are not independent proofs. They are all by members of a community. Why were the people working on these particular problems and not on an infinite number of others? To me, because their scientific community had built up enough background knowledge to let members with somewhat similar background realize these were interesting problems to work on. Even across the iron curtain there was plenty more than a few bits of information exchanged in nicely bunched “journaled” packets.

“In the air” is just this. Spending mental effort reading papers that the community deemed important puts people in a similar location in the landscape of ideas. You might try to somehow survey how many people were actually working in the area of each of these to see if it is just a fact of more people working with similar mental tools, which likely were available to these people at similar times also.

This phenomenon of multiple discovery has been extensively studied. See, e.g.: http://scholar.google.ca/scholar?hl=en&q=%2Bmultiples+%2Bdiscovery With regard to the Poisson model, it appears that http://sss.sagepub.com/content/8/4/521.short may be relevant (I haven’t read it). I have read, and recommend, Steven Johnson’s recent book “Where Good Ideas Come From”.

One of the underlying themes of the Connections TV series by James Burke was that many scientific discoveries become almost inevitable at certain moments in history. Once all of the necessary prerequisites are present in a culture, it is just a matter of time before someone connects the pieces together. He was looking at science and technology, but I’m sure the idea applies to a large extent to math and CS theory as well.

Another example of coincidence in TCS appears to be the concept of Alternating Turing Machines – simultaneously originated by Chandra and Stockmeyer in one paper, and Kozen in a separate paper. The story goes that they were both submitted to the same STOC and a merged paper appeared in STOC and later as a JACM paper. (If this is not the correct story, I hope some better informed reader shall correct it.)

I’m happy that the Poisson analysis provides a lower-bound to the problem. In reality, I’m confident that there are at least some amount of “in the air” type correlations, such that actual probabilities will be higher.

The assumption that “the theorem does not become easier to prove over time” is a pretty depressing model for our collective progress. As if we could hop from one big result to the next and the rest is irrelevant. The assumption is so objectionable that I would be reluctant to use the model as a starting point, but the comment by Andy Parrish is well taken.

Even though the ideas are revolutionary, they are ideas whose time has come. The different way of seeing things becomes almost unavoidable, as Kurt describes. The anecdote described by Anonymous Rex is an extreme example; all the results credited a specific work for the main insight.

To ignore this, even in a first crude model, is to ignore the heart of the phenomenon in my opinion.

Like some other people above, I think modelling proof discoveries by different people as independent random variables is obviously wrong. What makes a problem change from being apparently completely intractable into something that it is reasonable to work on? Very often it is that other results are discovered that improve the set of tools available. The independent-variables model is what one might come up with if one imagined that the difficulty of a problem remains constant and the hardest problems have to wait for a super-genius to come along with a magic flash of inspiration that could strike at any time.

To give a slightly strange example, although the Poincaré conjecture was solved by a single person, the difficulty of the problem changed substantially after the work of Hamilton. At the very least, I think it would make sense to say that this greatly reduced the value of T, thereby increasing substantially the probability that the proof would be discovered independently by more than one person. (The fact that it wasn’t discovered by more than one person doesn’t affect the validity of this argument, and of course there are many other examples of this general phenomenon, some of which have resulted in independent discoveries.)

I now see that my point is more or less identical to that of Andy Parrish …

Consistent with both Tim Gowers’ and Andy Parrish’s point is the striking increase in relative frequency since 1980 of the Google ngrams “proof methods,” “proof tools,” and “proof technology.” Google Books confirms that these three ngrams are used largely by the authors of mathematical texts.

Does increased consciousness of proof-methods-as-technologies foster simultaneous discovery? This seems plausible … but perhaps not readily susceptible to rigorous proof.

“I find it a mystery how a problem can be open for decades or longer and then suddenly solved by two independent researchers at essentially the same time”

I think a possible answer to this fascinating mystery lies in the concept of Collective Unconscious, whose existence is believed by several scientists and which is in turn explained (although not rigorously, yet) by Quantum Physics. The fact that two different, distant people make the same discovery without prior communication between them may be seen as a form of macroscopic quantum entanglement. There is a great deal of exciting research at the intersection of Psychology, Psychiatry, Neuroscience, Quantum Physics and Biology regarding Collective Unconscious, the surprising events ascribable to it, and how to precisely explain them and reproduce them.

Don’t forget the famous (Godel prize winning) proof that did come in threes:

– “The Topological Structure of Asynchronous Computation”. Maurice Herlihy and Nir Shavit.

– “Wait-Free k-Set Agreement Is Impossible: The Topology of Public Knowledge”. Michael Saks and Fotios Zaharoglou.

– “Generalized FLP impossibility result for t-resilient asynchronous computations”. Elizabeth Borowsky and Eli Gafni.

These papers appear consecutively in STOC’93.

Wow, thanks—I missed that while perusing the awards!

http://arxiv.org/abs/1412.3718 in which I discuss multiple discovery and refer to Ken Regan’s Poisson model as ‘tongue-in-cheek’, I hope not too unfairly!

Fine by me. 🙂 It was more a “magpie” factor that often happens while (co-)writing or editing posts. At the time I was considering Poisson arrival processes in my chess research, so it was natural to spend a little time delving thru the two linked references and others. But it was a testimony crunch time in a prominent chess cheating case, plus many other things were happening and the Fukushima quake to boot, so such time was limited.

I just came across an example with a longer timespan while re-reading parts of *A Brief History of Time* after seeing the ToE movie: as noted by Hawking on p. 6, the Olbers Paradox was hardly original to Heinrich Olbers in 1823, and he uses a plural for those who wrote about it in the time of Newton.

The answer is straightforward: conceptual waves! It’s actually quite fruitful to conjecture that processes are to concepts what experimental measures are to quantum objects. Therefore, ideas are due to flow non-locally and independently through different minds, in exactly the same way that a particle is due to go simultaneously through both slits in the double-slit experiment! But whenever you try to find which slit it went through, you stumble upon decoherence, and this phenomenon also occurs with concepts. That is, whenever you try to find which of Newton and Leibniz invented the calculus, you face a decoherence phenomenon. In my opinion, it explains why their contemporaries were misled into believing there was only one creator instead of two!

There are indeed conceptual waves, just like there are waves of matter! This original viewpoint sheds much light onto the controversy of intuitionism vs formalism, because intuitionism is to formalism what the wave viewpoint is to the particle viewpoint in quantum mechanics. Thus, whenever you’re formalizing a concept, you’re actually observing it physically!

In view of the above, here goes my current conception about the P vs NP question. I think the accuracy of an algorithm belongs to the world of ideas, whereas its time behavior belongs to the world of formalizations. This is the true reason why the P vs NP question cannot be answered. Asking whether P=NP amounts to making a confusion between an object and its potential observations. In more traditional words, it amounts to confusing the mathematics and its metamathematics.

Conceptual waves might allow one to prove that NP-complete problems have probability zero of getting solved efficiently. Still, I wonder what the probability is for integer factoring… 🙂

Of course, the probability wave associated to a process doesn’t travel at the speed of light. Rather, concepts are doomed to travel as fast as our communications means – and at the speed at which we can understand them and teach them to somebody else. So, in a Feynman diagram, you just have to replace each electron with a mathematician, and each photon with an Internet connection! After all, my conceptual waves aren’t much more fictitious than Schrödinger’s wave functions…