Seeking the limits of mathematical proofs

Ivan Vinogradov was a famous mathematician who created a whole method, named after him, that allowed him to make major advances on many open number theoretic problems. These included his theorem on the Weak Goldbach Conjecture and many others.

Today Ken and I wish to talk about a new type of “mathematical proof,” illustrating it for the Weak Goldbach Conjecture.

It is strange, pushes the limits of what we usually call a proof, but still might be considered a proof. Our plan is to present it first abstractly, then give a concrete example of the new style “proof.” The central questions today are: Is this new type of proof really a proof, and is it new?

## What Is A Proof?

We know what a mathematical proof is. It is a convincing argument that some mathematical statement is correct. The rigor of proofs has changed slowly over the years, yet today we would all agree what constitutes a proof and what does not. Of course some are short and straightforward, some are short and complex, some are long and complex, few are long and straightforward.

The acid test is: can a proof be checked? If it cannot then most would argue it is not a proof. A poorly written proof may fail this test, even if the writer of the proof is convinced of its correctness. Some proofs over the years have been rejected because they failed this checking test, and some that failed were later judged to be correct. Many “correct” proofs still have typos or minor errors that are easy for a checker to fix. A typo of ${x}$ for ${x+1}$ may be easy to spot, and thus a proof that is technically incorrect is really fine.

## Computer Proofs

Let ${S}$ be some statement we wish to prove. A standard proof would be a written argument that convinces you and others that ${S}$ is true. If the proof uses computation then what usually happens is:

1. For some statement ${T}$ you prove that ${T \rightarrow S}$.

2. Then you use your computer to check that ${T}$ is correct.

This of course then shows that ${S}$ is proved. Well, this is the issue with computer-based proofs. The fear is that the checking of ${T}$ may be faulty due to a program error or even a hardware error. This can be reduced by having people review the programs, or even better having them independently write their own programs to check ${T}$. But there remains some feeling—for many—that such proofs could be wrong. We now have many proofs that no human can check, including proofs of the Four-Color Theorem (until recently) and this year’s advance on Paul Erdős’s Discrepancy problem (EDP). Many are uncomfortable with proofs of this kind, proofs that rely on massive computation. In any case we will consider human proofs and machine proofs as “standard” types of proofs. For thoughtful comments on machine proofs see this by Robert Pollack.

We think we have discovered a new type of proof—it fits into neither of these categories.

## Monte Carlo Proofs

We will now describe our new type of proof. It is a type of computer-based proof, but one that ups the ante and is even more problematic. We still think it is quite interesting and should be studied.

So let look at a new way to prove ${S}$. Let ${A_{i}}$ and ${B_{i}}$ be additional statements for ${i=1,\dots,N}$. The keys are the following:

1. We can prove that for each ${i}$ that ${A_{i} \wedge B_{i} \rightarrow S}$.

2. We can also prove that ${A_{i}}$ is true for at least ${1/2}$ of the ${N}$ indices ${i}$.

3. The statement ${B_{i}}$ is easy to check for any index ${i}$ by computation.

Then we can proceed as follows. Pick a set ${I \subset [N]}$ of ${m}$ random indices. For each ${i}$ in this set, check that ${B_{i}}$ is true. We claim that if all check correctly, then the probability that ${S}$ is true is at least ${1-2^{-m}}$.

The point is that each time we select an index ${i}$ there is a probability at least ${1/2}$ that ${A_{i}}$ is true. Since we check that ${B_{i}}$ is true, ${S}$ follows. But we may have selected an index ${i}$ so that ${A_{i}}$ is not true. The chance that this happens ${m}$ times is clearly at most ${2^{-m}}$, and so we have the claimed probability.

## An Example

We return to the Weak Goldbach Conjecture. Vinogradov, in the 1930’s, famously proved that every sufficiently large odd number is the sum of at most three primes. His “sufficiently large” was not effective at first: he proved there was a constant ${C}$ so that all odd numbers larger than it were the sums of three primes. In 1956 it was shown that ${C}$ was upper-bounded by ${3^{14,348,907}}$—a pretty large number. This was reduced in 2002 by Liu Ming-Chit and Wang Tian-Ze to approximately ${e^{3,100}}$—still pretty large. Computer searches still cannot even get near such a bound.

The long-term goal of course has been to reduce this constant ${C}$ and prove eventually the following theorem:

Theorem 1 Every odd integer greater than ${5}$ can be expressed as the sum of three primes.

Recently this has been achieved by Harald Helfgott, who technically proved that the theorem is true for all odd numbers greater than ${10^{27}}$. Computer searches had already verified the theorem up to ${10^{30}}$. But let us cycle back to the situation before last year where ${C}$ was still on the order of ${\exp(3,100)}$. Let ${S}$ be the statement that all odd numbers below ${C}$ are the sum of at most three odd primes. The set ${P}$ of prime numbers below ${C}$ has density order-of ${C/\ln(C)}$. The key concept is:

Definition 2 A subset ${Q}$ of the even numbers below ${C}$ is complementary to ${P}$ if every odd number below ${C}$ can be written as ${p+q}$ where ${p \in P}$ and ${q \in Q}$.

By a probabilistic argument, most subsets ${Q}$ of size ${c\ln(C)^2}$ are complementary, where ${c}$ is a small known constant. (Indeed some finer results are now known.) The size ${c\ln(C)^2}$ is not too large, and the numbers themselves have only order-${\ln(C)}$ size. The final ingredient comes from the known heuristic bounds on Goldbachs’ Conjecture itself. For most even numbers ${k}$, one is highly likely to find primes ${q}$ and ${r}$ summing to ${k}$ within ${(\ln k)^{O(1)}}$ trials.

## Executing the Proof

Given the above facts, the following procedure is feasible to execute. It is a computer procedure, but the work involved is orders of magnitude smaller than the EDP proof, and the predicates needing to be checked are simpler than for the Four-Color Theorem.

• Generate ${m}$-many sets ${Q_{i}}$ of ${c\ln(C)^2}$-many even numbers below ${C}$.

• For each ${i}$, for each ${k \in Q_{i}}$, search for primes ${q_{i,k},r_{i,k}}$ summing to ${k}$.

• If each ${i,k}$-case succeeds, accept and output the entire table ${V}$ of ${[k,q_{i,k},r_{i,k}]}$ as the proof.

We could weaken the last step to accept if for a two-thirds majority of ${i}$, every ${i,k}$-case succeeds, but let us grant that the heuristic bounds on Goldbach’s Conjecture are good enough that this stricter and simpler policy has a high chance of success.

So granting that it succeeds and we get the output table—note that it is easy to verify afterward that the ${q_{i,k}}$ and ${r_{i,k}}$ are prime and sum to ${k}$—as the “proof,” what do we really have? Define the statements as follows:

${\bullet }$ The statement ${A_{i}}$ says: “the set ${Q_{i}}$ is indeed complementary for ${\cal P}$ up to ${C}$.”

${\bullet }$ The statement ${B_{i}}$ says: “the set ${Q_{i}}$ has all its even numbers the sum of two primes.”

We claim that this fits our framework; let’s check. At least half the time ${A_{i} \wedge B_{i} \rightarrow S}$. We suppose that some ${Q_{i}}$ is a complement—we only need one. Let ${n \le C}$ that is odd. Then ${n = p + k}$ where ${p}$ is a prime and ${k}$ is in ${Q_{i}}$. But we checked that ${B_{i}}$ is true, so all the ${k \in Q_{i}}$ are sums of two primes. This means that ${S}$ is proved.

One aspect is that if our routine failed, then we would have a significant “probabilistic” result of a completely different kind. It would indicate that the heuristic bounds on Goldbach’s Conjecture are incorrect. But once the routine succeeds, all that is water under the bridge. We have already proved each ${B_{i}}$, and our routine has bypassed the need to prove ${A_{i}}$, which would ostensibly be computationally prohibitive.

Note also that we are of course not assuming Goldbach is true overall—we are only needing to verify it for the tiny sample of numbers that go into each ${Q_{i}}$. That is, call ${Q_{i}}$ “good” if ${A_{i}}$ is true, and “hyper-good” if every ${k \in Q_{i}}$ obeys the heuristic bounds on Goldbach. We don’t actually need to argue that with probability at least ${\frac{1}{2}}$ a random ${Q_{i}}$ is hyper-good—once ${V}$ is checked the “hyper” part is dispensed with. Hence we only need that a random ${Q_{i}}$ is good with probability at least ${\frac{1}{2}}$. Then the confidence in ${V}$ being a proof is ${1 - \frac{1}{2^m}}$. Is that enough?

## Comparing Other Probabilistic Proofs

Consider the problem of proving that a given arithmetic formula ${g(x_1,\dots,x_n)}$ computes the zero polynomial over some finite field ${\mathbb{F}}$. Whenever ${g}$ does not compute the zero polynomial, a random subset ${Q \subset \mathbb{F}^n}$ of known small size ${c}$ will, with probability at least ${\frac{1}{2}}$, include an argument ${x}$ giving ${g(x) \neq 0}$. Now choose ${m}$-many such subsets ${Q_{i}}$, and run a routine exhibiting that ${g(x) = 0}$ for every ${i}$ and ${x \in Q_{i}}$. Let ${V}$ be the resulting table of zero values. (Of course, we could have just picked one random ${Q}$ of size ${mc}$, but we are comparing with the above.) Is this ${V}$ exactly the same kind of “proof”?

We argue no. The main reason is that the statement corresponding to ${A_{i}}$, namely that ${Q_{i}}$ is “good,” is counterfactual. It is saying that if ${g}$ were a non-zero polynomial, then ${A_{i}}$ would have had a counter-example. The only way to avoid the problem of “which ${g}$?” seems to be for ${Q_{i}}$ to be a universal hitting set for all non-zero ${n}$-variable polynomials over ${\mathbb{F}}$. The size bounds on universal hitting sets, especially in arithmetical not Boolean complexity, are murkier to argue.

The cases of randomized algorithms for deciding primality and other problems in ${\mathsf{BPP}}$ or ${\mathsf{RP}}$ (etc.) look similar. For a language ${L}$ in the Arthur-Merlin class ${\mathsf{AM}}$, there is a predicate ${R(x,y,z)}$ decidable in time ${|x|^{O(1)}}$ such that for all ${x}$:

$\displaystyle \begin{array}{rcl} x \in L &\implies& (\text{for most } y) (\exists z)R(x,y,z)\\ x \notin L &\implies& (\text{for most } y)(\forall z)\neg R(x,y,z). \end{array}$

Here we would want a bunch of pairs ${(y,z_y)}$ to be the proof of the theorem “${x \in L}$.” By the perfect completeness theorem for ${\mathsf{AM}}$, we can replace ${R}$ by ${R'}$ such that the first line holds for all ${y}$, which is analogous to the idea of our routine always succeeding and outputting a set ${V}$ of such pairs.

The problem again, however, is that the goodness of the pairs is still defined with respect to counterfactual ${x \notin L}$ cases. Whereas in our proof, goodness is a positive statement, namely that the randomly-chosen set really is complementary to the set of primes.

## Open Problems

Is this a new type of proof? What should we call it? Are there novel applications of the method? Note that the size of the range is less important than having concrete density estimates on intervals, since the point is needing to check relatively few members.

June 6, 2014 11:00 am

Isn’t probabilistic proof an oxymoron?

June 6, 2014 11:30 am

I’m a little confused. You want to make a claim of the form:

Pr[Theorem = False| Proof succeeds] < 1/2^m

But you instead seem to argue something like:

Pr[Proof Succeeds | Theorem = False] < 1/2^m

To go from one statement to the other you would normally need to apply Bayes Rule, and to do that, you would need a prior belief about the theorem being true. How do you get around this difficulty?

June 6, 2014 12:34 pm

Under the section titled Monte Carlo proofs, I think you need to be more explicit that the set of statements Ai, …, An are complete in a sense (i.e. there are no other statements that matter). This condition becomes clear in the example with the sets of numbers below C.

I would say that this does not constitute a proof. This exercise is similar to Bayesian inference. Each additional Ai you check becomes an experiment and it may strengthen your confidence further (or completely kill it).

4. June 6, 2014 6:55 pm

Aren’t all proofs probabilistic? Pr[Reader_k believes what I wrote \forall k \in {1,..,12} | faulty proof] < 0.1% Two failed proofs of the four color theorem stood for 11 years.

5. June 7, 2014 9:49 am

Yes, I think this is a very nice concept and I would certainly call it a probabilistic proof. Nevertheless, as was pointed out by Story Donald, it does not prove that the theorem is true with 99% or anything like that, such statements don’t even make sense.

An example to understand why, imagine that we take a random 100 digit number. Then we run a primality test on it that returns “not prime” for any non-prime with 50% chance, otherwise says “possible prime”. Suppose we run this test 10 times, every time we get “possible prime”. Still, our number is more likely to be composite than prime.

But if we insert a theorem to such an algorithm, then it is either true or false, so we cannot assign a probability to it. But as Koray wrote, we can increase our confidence that it is true. So I think this is a nice concept and I wonder if it can be applied to other problems.

ps. I don’t really understand why you say it is different from RP.

June 7, 2014 2:18 pm

Complementarity is defined as a two argument predicate: Q is complementary to P. But in the next sentence it’s used as a one argument predicate where it’s said that most sets of a certain size are complementary. What is the intended meaning?

Also confused by what the definition if C is here. Is complementarity a property whose definition changes as C varies?

• June 9, 2014 12:22 pm

Complementary to P is intended. Here “P” means the set of primes up to a certain C, so it changes as C does, although informally it’s always the set of primes.

7. June 9, 2014 12:43 am

On a slightly off-topic but related note, there exists something called a probabilistic checkable proof which is an interesting concept in itself. But a probabilistic proof itself is certainly something else.

June 9, 2014 5:32 pm

This is one of the reasons I’m rather skeptical of any proof that isn’t in principle capable of being represented in first order logic; Granted, the execution of such a task is an enormous effort for modern mathematics, given much of modern mathematics beyond the graduate level has yet to be formalized, but most of it doesn’t face any theoretical barrier to formalization.

With a formal system and a proof, proof checking is mechanical and entirely verifiable, where your confidence in the proof is your confidence in the formal system and the verifier program. Proof verification programs can be very simple and are themselves rather easy to verify by accomplished engineers and mathematicians as correct, and we are fairly confident in formal systems such as first order logic today.

The feeling that the proof could be wrong because a computer is verifying it rather than even more fallible humans strikes me as misplaced. The risk that the proof verification code is wrong is a real risk to be sure, but its easier to find flaws in verification code than thousands of lines of a formal proof.

The risk of the code and formal system being wrong is when we’re using arguments from formal systems that we haven’t rigorously defined, such as running large amounts of calculations to search for a finite amount of counterexamples in a search space. What are the properties of the finite groups that map to the computer registers? What are the properties of the objects being represented by other facets of the code? Most of these aren’t rigorous compared to first order set theory, and wrong assumptions can lead to subtle traps of reasoning that could undermine the validity of the proof.

9. June 10, 2014 2:35 pm

Betteridge’s law of headlines is an adage that states: “Any headline which ends in a question mark can be answered by the word no.”

10. June 10, 2014 4:18 pm

Dez Akin, a seminal document in regard to proof validation is the (collectively authored) QED Manifesto (Automated Deduction, 1994). The ensuing 20 years have seen remarkable progress in that many thousands of fundamental theorems have been checked (by frameworks that include Mizar and Coq), and no serious challenges to the validity of accepted theorems have resulted. This finding is both remarkable and reassuring in regard to the logical integrity of the mathematical enterprise.

With regard to unsolved mathematical problems the situation is less satisfactory. The QED Manifesto envisioned that

Agreement by experts in each mathematical subfield that the [starting] definitions are ‘right’ will be a necessary part of establishing confidence that mechanically checked theorems establish what is intended.

For example, to date none of the main theorems of complexity theory have been formally verified in validation framework like Mizar and Coq (as far as I can tell). This lack is remarkable in view of the thousands of proofs contributed by other mathematical disciplines.

It is natural to wonder whether this lack of progress reflects not logical errors in the theorems of complexity theory — including complexity of simulation as a particularly important class of problems in complexity theory — but rather persistent uneasiness in regard to whether the starting postulates and definitions are ‘right’, in the sense of the QED Manifesto.

This foundational uneasiness is likely to persist — in some people’s minds anyway — for so long as our mathematics says so little that is both rigorous and useful in regard to lower-bounding the computational complexity of practical computations and simulations.

Conclusion  There is scant likelihood the Gödel’s Lost Letter and P=NP will run short of thought-provoking material in regard to these burning arrow issues in complexity theory!

June 11, 2014 8:54 am

Wow!
1. Yes, this is surely a proof, I do not understand the question mark in the title of this great post. Ok, formally this is not the full proof, but only because the calculation was not actually executed. If you would report that you actually performed the calculation with, say, m=100, and attach the full program code, this would surely be a proof.
2. This is one of the greatest proof in the whole history of mathematics: it resolves an important problem which had been open for almost 300 years using one great idea and a half-page of easy argument. In contrast, Helfgott’s paper is 79 pages long and full of complicated formulas. The full graduate course would probably needed to explain his proof to students. After this post, the proof of Weak Goldbach Conjecture can be explained to undegraduates in one lecture!
3. Yes, the idea should be new, otherwise the Weak Goldbach Conjecture would probably not be open until 2013. Except the main idea, the rest of agrument is straightforward…
4. With this proof, the Weak Goldbach Conjecture could anyway be wrong because of one the following reasons
a) one of the results you use contains an error, for example, 2002 reduction (by Liu Ming-Chit and Wang Tian-Ze) to the M=exp(3100) case is wrong; This is unlikely, but possible.
b) your proof contains an error. Yes, it is half-page only, and the program could be verified independently, so it is highly unlikely that it would contain an error everyone miss, but still possible.
c) ZFC is inconsistent, therefore the correctness of the proof does not imply the correctness of the Weak Goldbach Conjecture. This is much more unlikely than a) and b), but still possible.
In contrast, the event with probability 2^(-100) that your proof just happen to work by chance is so unlikely, that it could safety be treated as impossible, and thus it is not included in this list.
5. There are a lot of other important open problem reduced to finite search http://mathoverflow.net/questions/112097/important-open-problems-that-have-already-been-reduced-to-a-finite-but-infeasibl . Have you tried to apply your great method to any of them?

July 7, 2014 11:47 am

By the way, the “proving that a polynomial is zero” case is exactly like the first one – let A_i = (f = 0 or f(X) =/= 0). Then P(A_i) s always >= 1/2 not depending on f.

13. January 11, 2015 2:22 pm

According to your link the new proof of the four-colour theorem is still uncheckable by human efforts.

Note that many things that mathematicians believe about math (or have confidence in or however you want to understand such affirmations and expectations) are not proven or unprovable like the consistency of arithmetic etc. The equation of proof with knowledge with mathematics creates a rather odd arrangement when compared to how we use the term knowledge in other fields.

14. May 9, 2020 1:13 pm

Did anything happen grow out of this argument?