Busses Come In Threes, Why Do Proofs Come In Two’s?
Why do theorems get proved independently at the same time
Jacques Hadamard and Charles-Jean de la Vallée Poussin, Neil Immerman and Róbert Szelepcsényi, Steve Cook and Leonid Levin, Georgy Egorychev and Dmitry Falikman, and Sanjeev Arora and Joseph Mitchell are pairs of great researchers. Each pair proved a wonderful theorem, yet in each case they did so independently and at almost the same time.
Today Ken and I want to try to explain how this happens.
I have said many times that I am surprised that these coincidences occur as often as they seem to do. Is this reasonable? Or is there some deep reason that makes it happen? Are some results in the “air,” and so it is expected that multiple people will discover them at about the same time?
Here are several examples—there are many more, many more. I find this a mystery.
Prime Number Theorem:
Recall this is the theorem that shows that $\pi(n) \sim n/\ln n$, where $\pi(n)$ is the number of primes less than $n$. Hadamard and Vallée Poussin each proved the theorem in 1896 using complex analysis. That is not a "complex" type of reasoning, but analysis of functions defined on the complex numbers: numbers of the form $a + bi$.
van der Waerden’s permanent conjecture:
In 1926 Bartel van der Waerden made a simple-sounding conjecture about the permanent of a doubly stochastic matrix. Such a matrix has non-negative entries, and each row and column must sum to $1$. He conjectured that every $n \times n$ doubly stochastic matrix $A$ satisfies $\mathrm{per}(A) \ge n!/n^n$.
Even further, the only case of equality is when the matrix has all entries equal to $1/n$. Such a basic question stood for over half a century until it was solved independently by Egorychev and Falikman in 1979 and 1980. I have discussed this before here.
NP-completeness:
Cook and Levin both discovered that NP-complete problems exist. They used different definitions, terminology, and examples, but both proved that certain key problems were, as Levin says, "universal." One cannot include Dick Karp here because his important work built on Cook's. Dick showed that there were many such problems, gave the right names to the key concepts, weakened the notion of reductions used, and generally created the area as we know it today.
NSPACE closed under complement:
Immerman and Szelepcsényi both proved that nondeterministic space classes are closed under complement. I discussed this result and gave a fairly detailed sketch of their proofs here. Their proofs are brilliant, and again depend on the same proof technology.
Euclidean Traveling Salesman PTAS:
Arora and Mitchell both found in 1996 a polynomial-time approximation scheme (PTAS) for the Traveling Salesman problem for points in the plane $\mathbb{R}^2$. Mitchell's scheme came a few months later and was more complex; Arora was able to extend his to work in $\mathbb{R}^d$ for all fixed $d$. The concept of a PTAS was generally known in the late 1970s, so this was a 20-year problem.
The last two pairs won the 1995 and 2010 Gödel Prizes, respectively. The prize has been given to multiple papers in other years. Shafi Goldwasser, Silvio Micali, and Charles Rackoff shared the 1993 prize with Laszlo Babai and Shlomo Moran for the development of interactive proofs, but these were more for great concepts than solving a long-open problem. Both papers appeared in STOC 1985, but GMR was out in 1982; Ken heard it talked about glowingly at the FCT conference in Sweden in August 1983. The prize for the PCP Theorem in 2001 was for concurrent work in 1992, but this clearly built on a long succession of ideas—indeed this essay by Ryan O’Donnell gives photo credit to 34 people in its development.
I find it a mystery how a problem can be open for decades or longer and then suddenly solved by two independent researchers at essentially the same time. If there were communication, even one bit—the problem is solved—that I can understand. But as far as I know that did not happen in the above cases, nor did it happen in other cases. How is this possible?
Ken Regan says that it is less of a mystery to him. He believes that it is related to other probabilistic phenomena, such as the “bunching phenomena” described in these slides. The slides reference the common commuter’s complaint that buses may have long gaps between arrivals and then “come in threes,” per the title of a popular book on everyday mathematical phenomena.
In writing the following, however, Ken admits he hasn't made the formal connection to bunching, nor is the model completely worked out in all details. Some implicit assumptions are made, such as the independence of the key events.
Let’s suppose researchers work in isolation, which was much more the case before e-mail, Internet, even good international phone service. This is a pretty clear assumption in the above examples: several of the pairs were separated by language and culture, or even by more. During the cold war certainly there was a very low bandwidth connection between East and West.
For a given theorem $T$, any of a given number of researchers might prove $T$ at any time. We can then consider a proof of $T$ as an arrival. Let us call time $0$ the time at which researchers first conceive of $T$ as an idea or conjecture. Then for any later time $t$, let $N(t)$ denote the number of arrivals by time $t$. The process is a Poisson process if for all $s, t \ge 0$:
- $N(t+s) - N(t)$ has a distribution that does not depend on $t$; and
- $N(t+s) - N(t)$ is independent of $N(u)$, for all $u \le t$.
The former says that the theorem does not become easier to prove over time—people either make the breakthrough or they don’t. The latter says that the number of people who solve it within a given time interval does not depend on how many have solved it before. This is debatable, because some theorems are easier than others. However, a coincidence on an easier theorem will be less notable, so let us assume all the theorems are equally hard. The definition of Poisson process also presumes discrete time units in which at most one arrival can occur.
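These two axioms can be illustrated with a small simulation. This is only a sketch using the standard construction of a Poisson process from independent exponential inter-arrival times; the rate and window values are illustrative, not taken from Ken's model:

```python
import random

def mean_count_in_window(lam, trials, window_start, window_len):
    """Estimate E[N(t+s) - N(t)] for a Poisson process of rate lam,
    built from independent exponential inter-arrival times."""
    horizon = window_start + window_len
    total = 0
    for _ in range(trials):
        t = 0.0
        count = 0
        while True:
            t += random.expovariate(lam)  # next arrival
            if t >= horizon:
                break
            if t >= window_start:
                count += 1
        total += count
    return total / trials

random.seed(1)
lam, s = 0.04, 5.0  # e.g. one proof per 25 years on average; 5-year window
# Stationarity: the expected count in a length-s window is lam*s = 0.2,
# wherever the window starts.
early = mean_count_in_window(lam, 20000, 0.0, s)
late = mean_count_in_window(lam, 20000, 50.0, s)
```

Both estimates come out near $\lambda s = 0.2$, matching the first axiom; independence across disjoint windows could be probed the same way.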
This allows us to condition the probability on the time $t_1$ at which the first proof occurs. Note that our sample does not include unproved theorems, so every object compared has such a time. The above axioms then say the modeling is the same regardless of when $t_1$ is. The issue then becomes whether a second proof arrives at a time close enough to $t_1$ to be considered independent. The modeling depends on several more factors:
- The time $d$ it takes for the proof to diffuse at large, so that work after time $t_1 + d$ will not be considered independent.
- The expected number $\lambda$ of independently working researchers (minus the original prover!) who would find proofs in a reference time interval of one year.
Put together, we have reduced to the case of getting (at least) one arrival in an interval of time $d$ under a simple Poisson process with mean $\lambda$. Then $\lambda d$ itself is a seat-of-the-pants estimate for the probability of shared credit. Actually the correct theory for a Poisson process is that the probability of having no other proofs in time $d$ is given by $e^{-\lambda d}$. This means that the probability of shared credit is $1 - e^{-\lambda d}$.
To compare some numbers, suppose $d = 3$ months $= 0.25$ years, $1/\lambda = 25$ years, and so $\lambda d = 0.01$. Then the linear formula gives a shared-credit probability of $1\%$. The correct formula gives a probability of $1 - e^{-0.01} \approx 0.995\%$, which is not too much different.
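The arithmetic is easy to check. Taking a three-month diffusion time and a 25-year mean gap between independent proofs as illustrative values:

```python
import math

d = 0.25              # diffusion time: three months, in years (illustrative)
mean_gap = 25.0       # assumed mean time between independent proofs, in years
lam = 1.0 / mean_gap  # Poisson rate: expected proofs per year

linear = lam * d                  # seat-of-the-pants estimate
exact = 1.0 - math.exp(-lam * d)  # probability of at least one other proof
# The two estimates agree to about one part in 200.
```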
The issue then becomes the values of $d$ and $\lambda$. From the above examples it seems like the time $d$ could even be a whole year and the community will be generous. It seems best to estimate $1/\lambda$ as the amount of time taken from the moment the problem was clearly formulated to its solution. The probability of shared credit then becomes simply $\lambda$. Note that the units of years have been factored out by $d$ itself being a year.
So how big is $1/\lambda$? The Prime Number theorem had been suspected for a century, though one can date its true prominence from Bernhard Riemann's famous 1859 paper on the distribution of primes, giving 37 years until 1896. For space complexity one can date from the Turing-award winning paper by Juris Hartmanis and Richard Stearns in 1965 until 1987, for 22 years. For NP-completeness one could have as little as 1971 – 1965 = 6 years, or go back much further since Cook's paper in particular expressly followed Alan Turing's on the r.e.-completeness of first-order logics. Van der Waerden's problem took 53 years, while Euclidean TSP took 20. A crude average here gives $27.6$, call it 25. So we can guesstimate independent shared credit for 1-in-25 = 4% of prominent theorems.
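The crude average works out as follows; the year spans are the ones given in the text:

```python
# Years from clear formulation to solution, per the examples above
spans = {
    "Prime Number Theorem": 1896 - 1859,  # 37
    "NSPACE complement": 1987 - 1965,     # 22
    "NP-completeness": 1971 - 1965,       # 6
    "van der Waerden permanent": 53,      # 1926 -> 1979/80
    "Euclidean TSP PTAS": 20,             # late 1970s -> 1996
}
avg = sum(spans.values()) / len(spans)  # 27.6, rounded to "call it 25"
shared_credit = 1 / 25                  # the 1-in-25 = 4% guesstimate
```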
I am still puzzled, because coincidences seem to be more than 1/25 of the big results in the fields I know best. Perhaps the above hasn't really taken bunching into account. Perhaps the conditioning should be on "a proof" not "the first proof," so that the relevant time interval is $[t_1 - d, t_1 + d]$ and the probability at least doubles? Perhaps we should not use conditioning but picture independent arrival times in the interval $[0, 1/\lambda]$? Perhaps those variables are not really independent. What do you think?
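One way to probe the unconditioned picture is a tiny Monte Carlo: drop two independent uniform "proof times" into a 25-year span and ask how often they land within a year of each other. This is only a sketch of one reading of the question, not Ken's worked-out model:

```python
import random

def close_pair_prob(n, span, d, trials=100_000):
    """Chance that some two of n independent uniform arrival times
    in [0, span] fall within d of each other."""
    hits = 0
    for _ in range(trials):
        times = sorted(random.uniform(0, span) for _ in range(n))
        if any(b - a <= d for a, b in zip(times, times[1:])):
            hits += 1
    return hits / trials

random.seed(2)
p = close_pair_prob(n=2, span=25.0, d=1.0)
```

For $n = 2$ this probability is exactly $1 - (1 - d/\mathrm{span})^2 = 0.0784$, nearly double the 4% estimate, in line with the "at least doubles" intuition.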
Is Ken’s a reasonable model? How often should one expect independent proofs of identical theorems?
[fixed typo “interview”–>”interval”, added caveat about (non-)independence.]