How exceptions in theorems may affect their complexity

 Cropped from India Today source

Manjul Bhargava is a mathematician who just won one of the 2014 Fields Medals. We offer our congratulations on this achievement. He is an expert in number theory, which makes him special to us among Fields medalists. His Fields citation includes his doctoral work on a powerful reformulation and extension of Carl Gauss’s composition law for quadratic forms. He also proved a sense in which 290 is special to us among numbers, since we have been thinking recently about quadratic forms as tools in complexity theory.

Today we talk about his “290 Theorem” with Jonathan Hanke, which is quite accessible, and also raise complexity-related questions about this result.

One of his papers is in the American Math Monthly. It is on the factorial function and its generalizations. We always applaud when a pathbreaker writes a popular survey. It is hard to imagine a more concrete subject and a more accessible journal.

A sub-surface question in computational complexity is: how much difference can finite changes make? With asymptotic time or space bounds, finite changes make zero difference—finite sets have zero asymptotic complexity. However, for measures of concrete complexity, such as the size of proofs and the information content of strings, even small exceptions may make a difference.

## Quadratic Forms and Universality

Why is ${290}$ special? Well the answer starts with, why is ${15}$ special? Call a quadratic form ${f}$ in ${n}$ integer variables universal if it represents all positive integers, that is, if ${\mathbb{N}^{+} \subseteq f(\mathbb{Z}^n)}$.

• We can view the ancient result that ${\sqrt{2}}$ is irrational as saying that the quadratic form ${f(x)=x^{2}}$ is not universal—it fails of course to represent ${2}$ among many other integers.

• The function ${f = xy}$ is universal since we can have ${x = 1}$ and ${y =}$ anything. However, it “cheats” by violating a historical essence of being “quadratic,” which is that all nonzero arguments should map to positive values.

• To find a universal form that doesn’t cheat requires ${n \geq 4}$. The fact that

$\displaystyle x_{1}^{2} + x_{2}^{2} +x_{3}^{2} +x_{4}^{2}$

is universal is Joseph Lagrange’s famous Four-Square Theorem, whose implications in complexity theory we have already noted.

Indeed, Jeffrey Shallit and Michael Rabin proved that this form is effectively universal in the sense that given any integer ${N}$, we can find in random polynomial time integers ${x_{1},x_{2},x_{3},x_{4}}$ so that

$\displaystyle N = x_{1}^{2} + x_{2}^{2} +x_{3}^{2} +x_{4}^{2}.$
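Their algorithm is randomized and runs in time polynomial in the number of bits of ${N}$. For small ${N}$, a naive search already illustrates the guarantee. Here is a minimal sketch—ours, not the Rabin-Shallit algorithm:

```python
from math import isqrt

def four_squares(n):
    """Find x1 <= x2 <= x3 <= x4 with x1^2 + x2^2 + x3^2 + x4^2 = n,
    by brute-force search. Lagrange's theorem guarantees success."""
    for x1 in range(isqrt(n) + 1):
        for x2 in range(x1, isqrt(n) + 1):
            for x3 in range(x2, isqrt(n) + 1):
                r = n - x1 * x1 - x2 * x2 - x3 * x3
                if r < x3 * x3:
                    break  # need x4 >= x3; larger x3 only shrinks r further
                x4 = isqrt(r)
                if x4 * x4 == r:
                    return (x1, x2, x3, x4)
    return None  # unreachable for n >= 0, by the Four-Square Theorem
```

For example, `four_squares(7)` finds ${7 = 1 + 1 + 1 + 4}$. This search takes roughly ${N^{3/2}}$ steps, which is exponential in the bit-length of ${N}$; the point of the Rabin-Shallit result is doing it in random polynomial time.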

Our complexity problems here are perhaps related but different: given a form ${f}$, how easy is it to test whether ${f}$ is universal, and how easy is it to prove the theorem on which the test is based?

We need to define the allowed range of forms carefully. Every ${f}$ can be specified by a matrix ${A_f}$ such that for all ${x = (x_1,\dots,x_n)}$,

$\displaystyle f(x) = x^T A_f x.$

If we were using non-commutative algebra where ${xy \neq yx}$, then ${A_f}$ would be unique, but now the coefficient ${c_{i,j}}$ of ${x_i x_j}$ can be split any way between the entries ${A_f[i,j]}$ and ${A_f[j,i]}$. The convention is to split evenly, so that ${A_f}$ is symmetric, and once again unique. Then the condition that ${f(x) > 0}$ for ${x \neq 0}$ is the same as ${A_f}$ being positive definite, and often this is assumed when talking about quadratic forms.

In Lagrange’s case, ${A_f}$ is the ${4 \times 4}$ identity matrix. Diagonal matrices are positive definite if and only if all the diagonal entries are positive, but for symmetric real matrices in general things are trickier. The matrix for ${2x^2 -2xy + y^2}$ is positive definite despite the two ${-1}$ entries, while ${x^2 + 4xy + y^2}$ is another “cheater” despite all-positive entries. In the case ${f = x^2 + xy + y^2}$, ${A_f}$ is positive definite but has half-integer entries:

$\displaystyle A_f = \begin{bmatrix}1 & 1/2\, \\ 1/2 & 1 \end{bmatrix}.$

Then ${f}$ is said to be integer-valued but not classically integral. For much of two centuries, ${A_f}$ was also thought to “cheat,” so that only forms ${f}$ with even cross-term coefficients were counted as integral, but Bhargava’s work has helped solidify the looser “twos-out” condition as the standard.
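These distinctions are easy to mechanize. The following sketch (ours, for illustration) builds ${A_f}$ under the “split evenly” convention, using exact fractions for the possibly half-integer entries, and tests positive definiteness by Sylvester’s criterion—all leading principal minors must be positive:

```python
from fractions import Fraction

def gram_matrix(diag, cross):
    """Symmetric A_f for f = sum_i diag[i]*x_i^2 + sum_{i<j} cross[(i,j)]*x_i*x_j,
    splitting each cross coefficient evenly between A[i][j] and A[j][i]."""
    n = len(diag)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i, d in enumerate(diag):
        A[i][i] = Fraction(d)
    for (i, j), c in cross.items():
        A[i][j] = A[j][i] = Fraction(c, 2)
    return A

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(len(M)))

def is_positive_definite(A):
    """Sylvester's criterion: positive definite iff every leading
    principal minor has positive determinant."""
    return all(det([row[:k] for row in A[:k]]) > 0 for k in range(1, len(A) + 1))
```

On the examples above, ${2x^2 - 2xy + y^2}$ passes despite its two ${-1}$ entries, the “cheater” ${x^2 + 4xy + y^2}$ fails despite all-positive entries, and ${x^2 + xy + y^2}$ passes with its half-integer entries handled exactly.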

## 15 and 290

Obviously, universal forms are interesting. In 1993, John Conway and William Schneeberger proved that if a positive-definite quadratic form with integer matrix represents all positive integers up to ${15}$, then it represents all positive integers. Pretty neat. This makes checking whether such a form is universal a relatively simple task. The proof was quite intricate and was never published. This result is naturally called the 15 Theorem.

Bhargava found a simpler proof in 2000, which was hailed by Conway himself. He followed this up by proving Conway and Schneeberger’s conjecture that one could relax to an integer-valued form upon replacing ${15}$ by ${290}$: If a form represents all positive integers up to ${290}$, then it is universal. Precisely stated:

Theorem 1. For any integer-valued positive-definite quadratic form ${f}$, if ${\textit{ran}(f)}$ includes

$\displaystyle \begin{array}{rcl} && 1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31, 34,\\ && 35, 37, 42, 58, 93, 110, 145, 203, 290, \end{array}$

then ${\textit{ran}(f)}$ includes ${\mathbb{N}^{+}}$. Moreover, this set of twenty-nine numbers is minimal—remove any one and the statement becomes false.
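Theorem 1 turns universality into a finite check. Here is a hedged sketch of that check: we verify each of the twenty-nine critical numbers by brute-force search, restricting to nonnegative arguments—which suffices for the diagonal forms tested below but not in general—and assuming the search bound is large enough for the form at hand:

```python
from itertools import product

# The twenty-nine critical numbers of the 290 Theorem.
CRITICAL = [1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29,
            30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, 290]

def represents(f, n_vars, target, bound):
    """Does f(x) == target for some x with 0 <= x_i <= bound?
    (Enough for diagonal forms; a general form needs x_i in [-bound, bound].)"""
    return any(f(x) == target for x in product(range(bound + 1), repeat=n_vars))

def is_universal(f, n_vars, bound=17):
    """Apply the 290 Theorem: an integer-valued positive-definite form is
    universal iff it represents all twenty-nine critical numbers.
    bound=17 is enough for sums of squares, since 17^2 + 1^2 = 290."""
    return all(represents(f, n_vars, t, bound) for t in CRITICAL)
```

With this, Lagrange’s form `lambda x: sum(t*t for t in x)` in four variables passes, while the sum of three squares fails: it already misses ${7}$, which is in the critical list—just as the theorem requires, since a non-universal form must miss one of the twenty-nine.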

## Four Theorem Types

This last theorem alternates between senses of being easy and complex. Its statement is complex—where do those numbers come from? However, having only twenty-nine numbers to check makes it easy to prove that a given form is universal—just produce arguments yielding each number as a value. Yet the problem of finding such arguments, especially given an arbitrary ${N > 290}$, remains possibly complex—can there be any kind of extended Rabin-Shallit theorem?

What we fix on, however, is what the nature of the statement says about the complexity of the proof. This raises the question of a measure of complexity of theorems. Our intent can be approached by considering four types of theorems:

Type I. Every ${X}$ of type ${\dots}$ has property ${P}$.

Type II. Every ${X}$ of type ${\dots}$ has property ${P}$, except possibly for ${X}$ belonging to this fixed finite set ${S}$ of values: ${\dots}$

Type III. If every member of ${S}$ has property ${P}$, then every ${X}$ of type ${\dots}$ has property ${P}$.

Type IV. A theorem about objects ${Y}$ in which a test of kind II or III is part of the statement.

To interpret this, think of ${X}$ as a positive integer, fix a form ${f}$, and read ${P(x)}$ as “${x \in \textit{ran}(f)}$.” Then for the particular form ${f}$, statement I is “${f}$ is universal,” II is “${f}$ is almost universal” (in a sense we haven’t discussed but that is clear), and III is an instance of Bhargava’s theorem for the particular ${f}$ (when ${S = \{1,2,3,5,\dots,290\}}$, that is). Then Bhargava’s theorem itself is type IV, where the objects ${Y}$ are forms ${f}$, and all it does is universally quantify the assertion of type III over those ${f}$.

Note that in types II and III, the statement remains true if ${S}$ is replaced by a larger set ${S'}$. We intend this also in type IV—that is, the theorem is monotone in the test. So define a number ${n}$ to be special if there is an ${S}$ such that:

• the theorem is true for ${S}$,

• ${n}$ is the largest number in ${S}$, and

• the theorem is false for ${S \setminus \{n\}}$.

In types II and III it is enough to specify ${P}$; in IV we also give the theorem statement about ${Y}$'s.

## Some Examples

Here are some examples of all these types:

Theorem 2. Every finite group of odd order is solvable. (Type I)

Theorem 3. Every finite group that is not cyclic or alternating or of certain Lie types is not simple, unless it is one of 26 so-called sporadic simple groups. (Type II)

Theorem 4. Every Fibonacci number ${F_{n}}$ has a prime factor that does not divide any earlier Fibonacci number, except for ${F_{1}=F_{2}=1}$, ${F_{6}=8}$, and ${F_{12}=144}$. (Also type II)

A statement of type II logically implies one of type III, but in a trivial sense. For instance, let ${P(n)}$ be the assertion that the (topological generalized) Poincaré conjecture holds in ${n}$ dimensions. Suppose we went back in time to before Michael Freedman won his Fields Medal for proving ${P(4)}$ in 1982, let alone Grigori Perelman for ${P(3)}$, but after Steve Smale won his in 1966 largely for proving ${P(n)}$ for ${n \geq 5}$. Then with ${S = \{3,4\}}$ we had a theorem of type II, and also of type III in the form “if Poincaré is true for ${n \in S}$ then it is true for all ${n}$.” But the implication generally does not go the other way, and since the Poincaré conjecture is true in all dimensions, there was no special number.

Our point is that a theorem of type I seems easier to understand than one of type II or III, let alone IV. Hence a type-I theorem might have a clearer proof than a type-II one. None of this intuition is solid, but it does seem reasonable. The presence of ${S}$ gives the statements of types other than I higher information complexity. Our questions are whether the freedom to enlarge ${S}$ mitigates the increase in complexity compared to type I, whether this might allow easier proofs of the resulting statements, and how much harder it is to prove numbers special—when ${S}$ must be minimal.

## Complexity

For now we only have a few sketchy ideas on these questions, involving the notion of the Kolmogorov complexity ${K(x)}$ of a string ${x}$. This is named for Andrey Kolmogorov, but various related notions were discovered at the same time by others. Here it is enough to say that ${K(x)}$ is the length of the shortest program that generates ${x}$. The notion extends to define ${K(S)}$ for finite sets.

Can we prove that certain theorems of types II–IV cannot be proved in a fixed formal system? Our intent is to argue like this: Suppose that we could prove the statement

$\displaystyle (\forall x): \ P(x) \iff x \not\in S$

where ${S}$ is a given finite set. The ${\iff}$ means we’re taking ${S}$ minimal. If ${S}$ itself has large complexity, then this might lead to a contradiction.

A simple result can be proved. Let ${|P|}$ denote the length of formula ${P}$ as a string, and let ZF be the usual set theory—or any reasonable theory, provided its set of theorems is r.e.

Theorem 5. Suppose that the statement

$\displaystyle (\forall x): \ P(x) \iff x \not\in S$

where ${S}$ is a given finite set, is provable in ZF. Then there is a constant ${C}$ that depends only on ZF such that

$\displaystyle K(S) \le |P| + C.$

Proof: Let a machine search the proofs of ZF for a proof of the above theorem. It will eventually find the proof. This will give us the set ${S}$ and so we have a description of ${S}$ of size at most ${|P|}$ plus a constant that depends on the machine that does the search. $\Box$

Since we can get a bound on ${C}$, we may be able to argue that certain concrete properties cannot be proved to fail for sets that are finite but too high in complexity. This would be great if we could actually do something like this. We would like even better to remove the dependence on ${P}$, and obtain results of this kind:

There are only finitely many numbers ${x}$ that ZF can prove to be special.

Although ${K(x) \le K(S)}$ when ${x}$ is the largest element in ${S}$, we don’t get ${K(x) \le C}$ by imitating the above proof, because the machine could output an infinite sequence of ${x}$'s that go with different ${P}$'s. It isn’t a contradiction because something like ${P}$ or the index of ${x}$ in this sequence needs to be given as well to specify ${x}$, and this need not be bounded by any function of ${C}$. One can craft artificial ${P}$'s to falsify it. But in specific cases, or in the presence of restrictions on ${P}$, this might yield (or at least suggest) interesting results.

For instance, we’ve noted that homotopy groups are finite and computable, so that the sequence of their structure must have Kolmogorov complexity at most a value that we can determine. We need not develop a fast algorithm to compute them, but just a short one. This proves that while the intuitive complexity of the structure of homotopy groups seems high, we can bound their K-complexity. Having tight bounds would be interesting, and then might help analyze what kinds of relaxations of theorems make them easier to prove.

## Open Problems

We have raised a diverse array of complexity-related questions. Can more light be shed on them? Is there any kind of global limit on special numbers?