Lint For Math
Can we remove simple errors from math proofs?
Stephen Johnson is one of the world’s top programmers. Top programmers are inherently lazy: they prefer to build tools rather than write code. This led Steve to create some of the great software tools that made UNIX so powerful, especially in the “early days.” These included the parser generator Yacc, short for “Yet Another Compiler Compiler.”
Today I (Dick) want to talk about another of his tools, called lint. Not an acronym, it really means lint.
Steve was also famous for this saying about an operating system environment for IBM mainframes named TSO which some of us were unlucky enough to need to use:
Using TSO is like kicking a dead whale down the beach.
Hector Garcia-Molina told me a story about using TSO at Princeton years before I arrived there. One day he wrote a program that was submitted to the mainframe. While Hector was waiting for it to run he noticed that it contained a loop that would never stop, and worse the loop had a print statement in it. So the program would run forever and print out junk forever. Yet Hector, because of the nature of TSO, could not kill the program. Hector went to the system people to ask them to kill his program. They answered that they could not kill it until it started to run. Even better: the program would not run until that evening—do not ask why. So they could not kill it. But the evening crew could once it started. So they left a handwritten note to kill Hector’s program later that night. A whale indeed.
Steve’s lint program took your C program, examined it, and flagged code that looked suspicious. The brilliant insight was that lint had no idea what you were really doing, but could say some constructs were likely to be bugs. These were flagged and often lint was right. A beautiful idea.
For example, consider the following simple C fragment:
while (x = y)
This is legal C code. But, it is most likely an error. The programmer probably meant to write:
while (x == y)
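Recall that in C the test for equality is x == y, while x = y assigns y to x. With the assignment, the loop condition is the value just stored in x, so the loop runs for as long as y is nonzero. The assignment form could be correct, yet it is likely a mistake. These are exactly the type of simple things that lint could flag.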
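To make the idea concrete, here is a minimal sketch in Python of this one heuristic. It is not Steve's algorithm: the function name and both regular expressions are my own assumptions, and the sketch only looks at if and while conditions that contain no nested parentheses.

```python
import re

# Hypothetical helper, not the real lint: find if/while conditions
# (without nested parentheses) that contain a bare assignment operator.
CONDITION_RE = re.compile(r"\b(if|while)\s*\(([^()]*)\)")

# A '=' that is not part of '==', '!=', '<=' or '>='.
ASSIGN_RE = re.compile(r"(?<![=!<>])=(?!=)")


def suspicious_conditions(c_source):
    """Return the text of each if/while condition that contains an assignment."""
    flagged = []
    for match in CONDITION_RE.finditer(c_source):
        condition = match.group(2)
        if ASSIGN_RE.search(condition):
            flagged.append(match.group(0))
    return flagged


if __name__ == "__main__":
    for snippet in ("while (x = y) { }", "while (x == y) { }"):
        print(snippet, "->", suspicious_conditions(snippet))
```

Like lint, the sketch has no idea what the programmer intended. It only reports that the construct looks suspicious and leaves the decision to a human.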
The lint program has changed over the years, and now there are more powerful tools that can flag suspicious usage in software written in many computer languages. Lint itself was originally developed by Steve in 1977 and described in a paper, “Lint, a C program checker” (Computer Science Technical Report 65, Bell Laboratories, 1978).
Lint For Math
I believe that we could build a lint for math that would do what Steve’s lint did for C code: flag suspicious constructs. Perhaps this already exists—please let me know if it does. But assuming it does not, I think even a tool that could catch very simple mistakes could be quite useful.
There is lots of research on mechanical proof systems. There is lots of interest in proving important theorems in formal languages so they can be checked. See this and this for some examples. Yet the vast majority of math is only checked by people. I think this is fine, even essential, but a lint program that at least caught simple errors would be of great use.
Let me give three types of constructs that it could catch. I assume that our lint would take in a LaTeX file and output warnings.
Unused variables. Consider
The lint program would notice that the variable is never used. Almost surely the intent was to write
Again note: this is not a certainty, since the former is a legal math expression.
Unbound variables. Consider
If there is nothing before to constrain , this is at best poor writing. Does range over all reals, all integers, or just all natural numbers? Again a construct that should be flagged.
Under-constrained variables. Consider the statement,
For some it follows that .
The statement may be technically true when , but for the purpose of clear communication it needs a qualifier that . Perhaps the writer wrote that stands for a positive real number some pages earlier; we would not expect lint to pick that up. But we could reasonably ask lint to check for a mention of “” in a previous formula and/or paragraph.
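The first of these checks, the unused variable, can be sketched in a few lines of Python. This is only an illustration under strong assumptions: sums are written as \sum_{i=...}^{...}, the summand runs to the closing $ of the math segment, and the function name, the regular expression, and the sample formulas are all hypothetical.

```python
import re

# Hypothetical pattern for a sum header such as \sum_{i=1}^{n} or \sum_{i=1}^n.
# Group 1 captures the single-letter index variable.
SUM_RE = re.compile(
    r"\\sum_\{\s*([A-Za-z])\s*=[^{}]*\}(?:\^\{[^{}]*\}|\^[^\s{])?"
)


def unused_sum_indices(latex):
    """Return the index letters of sums whose summand never mentions the index."""
    warnings = []
    for match in SUM_RE.finditer(latex):
        index = match.group(1)
        # Assumption: the summand is everything up to the next '$'
        # (or to the end of the input if there is none).
        summand = latex[match.end():].split("$", 1)[0]
        # Drop command names such as \sin so their letters do not count as uses.
        without_commands = re.sub(r"\\[A-Za-z]+", " ", summand)
        if index not in without_commands:
            warnings.append(index)
    return warnings


if __name__ == "__main__":
    print(unused_sum_indices(r"$\sum_{i=1}^{n} a_j$"))
    print(unused_sum_indices(r"$\sum_{i=1}^{n} a_i$"))
```

As with lint, a warning here is not a verdict: a sum whose terms do not depend on the index is a legal expression, just a suspicious one.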
The TextLint applet page hosted by Lukas Renggli with Fabrizio Perin and Jorge Ressia does not flag the unused-variable condition, and evidently does not try to handle the other two situations. It also fails to catch 2^16, which LaTeX typesets as 2^{1}6 rather than the undoubtedly intended 2^{16}. This is more a LaTeX syntax issue than the kind of math-semantics error we are gunning for; the programs mentioned here also seem limited to this level.
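The 2^16 slip is easy to detect mechanically. Here is a minimal Python sketch, with a hypothetical function name, that flags a caret followed directly by two or more digits without braces. It assumes the input is math-mode LaTeX and does not try to tell math from ordinary text.

```python
import re

# A caret followed immediately by two or more digits: only the first digit
# becomes the superscript, so the author almost surely wanted braces.
CARET_RE = re.compile(r"\^(\d{2,})")


def unbraced_exponents(latex):
    """Return each '^digits' occurrence that is missing braces."""
    return [match.group(0) for match in CARET_RE.finditer(latex)]


if __name__ == "__main__":
    print(unbraced_exponents("2^16"))    # the suspicious form
    print(unbraced_exponents("2^{16}"))  # the braced form is left alone
```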
Does a lint program like this—for general mathematical writing not just LaTeX code—already exist? If not, should we build one?
[added “environment” qualifier to TSO]