Proving proofs are proven

Kevin Buzzard is Professor of Pure Mathematics at Imperial College London. Wikipedia says that he specialises in algebraic number theory, rather than “specializes.” Is this because he was born in Britain, works in Britain, both, or could it be neither?

This month, he has an article in the AMS Notices that highlights the system $L\exists \forall N$. We thought we’d discuss $L\exists \forall N$ and its potential to change math—or maths—as we know it.

For starters we should say that $L\exists \forall N$ is:

Lean is an open source theorem prover and programming language being developed at Microsoft Research. Lean aims to bridge the gap between interactive and automated theorem proving, by situating automated tools and methods in a framework that supports user interaction and the construction of fully specified axiomatic proofs.

This wording should perk up the ear of us complexity theorists. We know that polynomial-size interactive proofs have the power of ${\mathsf{PSPACE}}$ for one prover, ${\mathsf{NEXP}}$ for two, and the Halting Problem for two quantum provers, all of which are believed bigger than ${\mathsf{NP}}$ as for non-interactive proofs. We believe that our concrete capabilities are shaped by this abstract lay-of-the-land, though this is a meta-scientific assertion that we could never try to prove. What Buzzard’s article and $L\exists \forall N$ are on about is the scientific state of what we do try to prove, or think we have proved.

## Aftermath as We Know It

Buzzard spoke early this year at the second annual “Lean Together” segment of the Formal Methods in Mathematics conference at CMU.

His talk was not so much about details of $L\exists \forall N$ but about the fact of errors and specter of gaps in published mathematics. While a student of Richard Taylor, he had a front-row seat to a major instance. As he relates on a slide marked Gaps in big letters:

• In 1993, Andrew Wiles announced a proof of Fermat’s Last Theorem. There was a gap in the proof.
• In 1994, Wiles and Taylor fixed the gap, the papers were published, and our community accepted the proof.
• In 1995, I pointed out to Taylor that the proof used work of Gross which was known to be incomplete.
${\dots}$
Taylor told me it was OK, because he knew another argument which avoided Gross’s work completely.

He also highlights the current unfinished state of the classification theorem. The process of writing its proof may outlive all of the first generation of provers. We at GLL have accepted it almost all of our working lives. The “after math” can be harder than the initial publication even in the traditional process, let alone the totally free dissemination we have today.

Buzzard points out that some long proofs have now been written in $L\exists \forall N$ or related systems. These include the famous Feit-Thompson odd order theorem in Coq, and the 2017 Ellenberg-Gijswijt proof of the cap set conjecture. This shows that complex arguments can be written in systems like $L\exists \forall N$ and others. He maintains his own blog, “The Xena Project,” on practical and educational uses of $L\exists \forall N$ and similar software.

## Really Proved?

But there is a problem with $L\exists \forall N$ and all the other similar systems.

Their correct proofs may be incorrect.

$L\exists \forall N$ is a large software system and may have bugs. This is of course a major concern, but perhaps one that can be handled. Perhaps not.

I thought about this issue years ago when people argued for verification of operating systems to prove that they had no errors. They wanted to argue that software systems could be error-free only by making a formal proof that they were correct. Over forty years ago, Rich DeMillo and Alan Perlis and I wrote a paper on this issue. Our paper starts with

Many people have argued that computer programming should strive to become more like mathematics. Maybe so, but not in the way they seem to think.

We go on to argue against formal verification as the answer to correctness. It is interesting to see that now the argument is flipped: Should mathematics become more like programming?

Ken opines out that a bug in $L\exists \forall N$ would probably be more catchable because it would likely replicate. Mistakes in proofs ought by nature to be more recognizable than telling whether a tactic to try to build a proof is likely to succeed.

 Leaning Together at CMU src

## Leaning Into $L\exists \forall N$

The fundamental idea of $L\exists \forall N$ is to create a language that does make proofs more like programming. The key is that a proof written in $L\exists \forall N$ can be checked automatically. This is like saying that a program in Python can be executed.

The project that Buzzard and others are interested in is several-fold. One is that $L\exists \forall N$ can be a teaching tool. Another is it can help us ensure that the proofs we write down are correct.

The use of $L\exists \forall N$ is akin to LaTeX. Most of us now write our proofs in LaTeX or some similar typesetting language. We love and hate the issues involved in getting the LaTeX to compile and to look good. Buzzard argues that adding the new ability to be sure that one’s proof is solid is a great leap. LaTeX lets us be sure the proof looks good, while $L\exists \forall N$ allows us to be sure it is correct.

We have generally taken a reserved attitude toward formal methods on this blog—or maybe not so reserved. We keep our reservations expressed above. But what may help $L\exists \forall N$ overcome them is its emphasis on interaction—on being an assistant. Maybe writing a proof in $L\exists \forall N$ won’t be like taking castor oil proverbially used to be—it might be more like Marmite.

## Open Problems

Two final thoughts. For the main one: Proofs are not as important as discoveries. Can we use $L\exists \forall N$ to suggest ideas, open problems, approaches, and more? The gene folding program we discussed recently does not claim to prove its fold is correct. But it seems useful. If $L\exists \forall N$ could help make guesses or provided intuition that would be valuable. Perhaps more valuable than assuring that a long proof is correct.

Another possible idea: How about we insist that claimed proofs of P${=}$NP or other famous conjectures must be presented using $L\exists \forall N$. The “claimed” proofs are all relatively short, so it would not be too hard for the authors of these results to write them out in the $L\exists \forall N$ system. They already LaTeX them, so why not? This would have a double benefit: The false claims that are made about famous problems might decrease; and the chance that a correct proof would not be overlooked might increase. The cost would be the time to write up the$L\exists \forall N$ proof.

Buzzard is a colorful (or colourful?) character, and may also be gaining a reputation for being well ahead of the times. In 2017 he gave a talk on P${=}$NP for the Royal Institution, but what we note in the video is how he was presciently dressed for Zoom.

[some format fixes]

1. December 10, 2020 1:35 pm

To elaborate our remarks on possible bug issues a little more: There is of course the point 1. that the system that checks proofs could have bugs, which is a classic issue with any software system. Dick is not as sure as I am about the ease of testing for it. But he goes on to note a second point:

2. The system might be fine. The proof might be fine. But the proof might not capture correctness in the real world. It is not too hard to imagine that we could state P=NP or the RH correctly. But any system that is more complex could be another story.

This is the issue of getting the specification correct. Dick recalls that the original paper by Tony Hoare on the correctness of the FIND program had this very issue: see section 5. Reservation of the paper.

My take goes a bit further in the direction of point 2: that a system like LEAN could appear to ratify buggy reasoning. If a bug in LEAN is not in the main computation part, then it would be in some specialized module. But such a module would likely be user-provided. So the issue would be whether LEAN passes through buggy inferencing rules. I guess the point of its type system is to forestall this.

Now zooming out abstractly, consider what happens if a system like Peano is inconsistent. We stumble upon a possibly long Peano proof of 0=1. We still have to verify that proof. The verification uses a subset of the logic that is consistent by itself. So in that sense, the bug got propagated by the sound part of the system. But I’m not convinced this issue could really bite in practice…

2. December 11, 2020 2:51 am

I was keen on trying LEAN until you mentioned it’s like Marmite…
typo: cideo -> video

December 11, 2020 12:47 pm

My understanding is that most theorem-provers are able to output a proof that can be checked by a very simple program – is LEAN not of this type?

December 11, 2020 3:48 pm

One more typo: ought -> brought
[Mistakes in proofs ought by nature…]

December 12, 2020 12:12 am

Gee, just tell the people who prove P=NP or P NE NP (or R(5)=43 — I get some of those people) to run it through some prover first. That assumes that when one gets email claiming P=NP it is the opening salvo in an interesting debate, It is not. (I would have put that last sentence in caps and use an exclamation point, but when I do that my comments tend to be spam-filtered.) Such people are less interested in truth and more interested in adoration.

December 12, 2020 11:18 am

Dear Bill:

I agree. But maybe there is some hope that someone will someday have some idea…

Oh well

Best

Dick

December 16, 2020 8:44 pm

And to really know who leads the majority requires some humility.

6. December 15, 2020 7:16 am

Re: Proofs Are Not As Important As Discoveries (PANAIAD).

Conjecture and Discovery fall under the heading of Abductive Inference (AI’s missing grape).

About which more when I pull my heading out of the Pandemic …

Regards,

Jon

December 26, 2020 3:08 pm

What I want to see (and I admit it’s asking a lot) is a system that can read math papers and extract the proofs.

This would be a combination of natural language and theorem proving. The latter would be needed to fill in the gaps in the proofs. I imagine each would inform the other.

Getting this to work would be a very tall order, but the theorem proving required would be at a lower level than needed for proving original theorems. It’s the kind of reasoning that would be expected from someone reading a paper, not writing it.

The payoff from this would be that the entire math literature would become available for translation into mechanized proofs. This would provide an enormous corpus for training theorem provers. It would also identify bad proofs buried in the literature.

8. January 9, 2021 5:12 pm

Proofs are not as important as discoveries. Can we use LEAN to suggest ideas, open problems, approaches, and more?

that is the huge question of work in automated theorem proving, there are some ideas along those lines, but almost nobody is directly working on this key problem, its barely even been fully articulated in the literature. gowers has a neat essay on this and has written at length. automated theorem proving has a lot of value in proof verification, but feel thats really kind of pedestrian/ mundane in someways.

another way of thinking of this: there are problems that are hard for machines but not for humans, problems that are hard for humans but not for machines, and problems that are very hard for both, and open math/ CS etc conjectures fall into the latter, most insurmountable category. and think they deserve the most attention, but ofc they tend to get the least, because mathematicians have been trained to be wary of the hardest problems in the publish-or-perish academic environment. eg Tao has a page on his blog expressly discouraging work on the hardest problems. so many mathematicians are incrementalists at heart, almost by training.

heres some over ½ decade old musings/ writings on the subj and things havent chgd a whole lot since then!