# Random Axioms and Gödel Incompleteness

*Does adding random axioms to Peano Arithmetic make sense?*

Giuseppe Peano is the creator of the standard axiomatization of the natural numbers, based on induction. He is also the discoverer of the space-filling curve that is now named after him. One of his great contributions was to set up the notation that we now all use for many mathematical concepts.

Today Ken and I want to talk about a method for extending Peano Arithmetic that is based on randomness.

Kurt Gödel’s famous Incompleteness Theorem helped launch several different threads of research. It is not possible to build a formal system that can generate all and only the true statements about integers, whether one uses Peano’s system or others. There is always a true statement about integers that your system cannot prove. Some of the research threads concern the various forms and intuitive meanings that such statements can have, along lines we discussed initially and recently here. For example, the Liar Statement

“I am lying.”

has the following variation as a definition statement:

“The smallest positive integer not definable in under eleven words.”

Bertrand Russell attributed this type of statement to the Oxford University librarian Mr. G. Berry—after a long search I cannot find what “G” stands for. The statement seems to define an integer n, but then n is definable in ten words, contradicting its own definition.

Others have looked for more “natural” examples of true statements that Peano Arithmetic (PA) cannot prove. Gödel’s Second Incompleteness Theorem shows that PA cannot prove its own consistency, which, while a natural statement, is not one that a practicing mathematician usually thinks about. However, combinatorial principles like Ramsey’s Theorem are used in real proofs. Thus there was immense excitement when Jeff Paris and Leo Harrington proved that a *slight* modification of Ramsey’s Theorem cannot be proved in PA. The eminent logician Harvey Friedman has shown there are many other striking examples of combinatorial principles that lie beyond PA–some even require the vast power of large cardinal axioms of set theory.

Another thread is the search for new axioms that can be added to PA or set theory that increase its power, allowing more theorems to be provable. No additional axioms can escape incompleteness and preserve consistency, but new axioms can expand the set of provable true sentences. One simple method is to add the statement, formalized in PA:

Con(PA).

The resulting theory PA + Con(PA) is obviously more powerful than PA, and this can even be iterated.

What I plan to discuss today is a different type of axiom that can be added to PA. They are based on the use of randomness, where I mean real randomness. The source of random bits used in this method must be really random, not pseudo-random. Even pseudo-randomness in a very strong sense will not work in this approach.

**Kolmogorov complexity**

The famous mathematician Andrey Kolmogorov created many wonderful notions, among them the notion of Kolmogorov Complexity. There are many technical variants of this notion, but we will define it informally and then fix one.

Informally, given any string x, define K(x) to be the shortest length of a “seed” string that when “expanded” yields x. The expansion is deterministic, so it follows immediately that not all strings of length n can have seeds of length less than n. That is, for all n there exist strings x of length n such that K(x) >= n.

If the strings use a binary alphabet, we can be even more particular about possible values of K. For every c, at most 2^{n-c} of the strings of length n can have K(x) < n - c. Thus with probability at least 1 - 2^{-c}, flipping n coins will produce a string x with K(x) >= n - c. For a suitable choice of c, we call such a string *algorithmically random*. Note a subtle philosophical contrast:

- A string that results from flipping coins is “random” as an outcome; while
- A string x with K(x) >= n - c is intrinsically random.
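To make the counting concrete, here is a brute-force sketch using a toy two-rule decompressor in place of the universal machine U; the machine and the parameters n and c are illustrative assumptions, not anything fixed by the post:

```python
from itertools import product

def expand(seed: str) -> str:
    """Toy decompressor: '0' + w expands to w (a 'quote' prefix),
    '1' + w expands to w + w (so repetitive strings compress)."""
    if not seed:
        return ""
    mode, payload = seed[0], seed[1:]
    return payload if mode == "0" else payload + payload

def K(x: str, budget: int) -> int:
    """Brute-force Kolmogorov complexity of x relative to `expand`,
    searching all seeds up to length `budget`."""
    for m in range(budget + 1):
        for bits in product("01", repeat=m):
            if expand("".join(bits)) == x:
                return m
    return budget + 1  # no seed found within the budget

n, c = 8, 2
strings = ["".join(b) for b in product("01", repeat=n)]
compressible = [x for x in strings if K(x, n) < n - c]
# There are only 2^(n-c) - 1 seeds shorter than n - c, so at most
# that many strings of length n can have K(x) < n - c.
assert len(compressible) <= 2 ** (n - c) - 1
print(f"{len(compressible)} of {2**n} strings have K < {n - c}")
```

Under this toy machine only the 16 “doubled” strings ww of length 8 compress (each has K = 5 via the seed '1' + w), so well over a 1 - 2^{-c} fraction of coin-flip outcomes are algorithmically random in this model.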

To make this point specific we have to choose a fixed notion of expansion—i.e., a fixed program U such that U(s) = x for the seed s of x. Kolmogorov himself proved that *any* universal Turing machine or programming-language interpreter U can be used for the theory: the choice between particular U matters only up to an additive constant in the K-value of any string. Since we ultimately intend c to be a non-constant function of n, any such constant will be asymptotically dominated. Still, we carry an additive slack term u in our formulas below. Efforts to minimize u and related constants via a “concrete definition of Kolmogorov complexity” have been led by John Tromp here. We will assume such an implementation.

Note also that any U provides a fixed prefix “quote” string q, of some length at most u, such that qx always expands to give x. Thus with regard to U, always K(x) <= n + u. If we are ignoring plus-or-minus differences, then algorithmically random strings are those with Kolmogorov complexity between n - c and n + u. With freedom to set c so that the error will be small enough to meet the probability requirements of our constructions below, we can state the above as: A random string is intrinsically random.

**The Axioms**

For any n we imagine flipping n really random coins and creating the string r of n bits.

The statement:

K(r) > n - c

is true with probability at least 1 - 2^{-c}. Call this statement A_n(r). Our plan is to add many such statements to PA. We want to add as many as possible, but we also want the resulting theory to have a good probability of being not only consistent but also sound. Soundness here means that every theorem, and a-priori every axiom, must be true. The simple idea is to add axioms

A_{n_1}(r_1), A_{n_2}(r_2), A_{n_3}(r_3), …

so that the probability that all of these axioms are true is at least 1 - δ, for some given δ > 0.

If we desire an axiom for every length n, it suffices to take c_n = c + 2 log_2 n for some customizable constant c. Then the probability that some axiom is false is at most

2^{-c} (1 + 1/4 + 1/9 + ⋯) = (π²/6) 2^{-c}.

We can improve the odds even more by taking the lengths n to be a sparse sequence. Whatever choice is made, clearly *there exists* a sound theory with infinitely many of these axioms—and moreover we have defined an infinite coin-flipping process that has a good chance of obtaining one. Call the theory PA_ω.
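As a numerical sanity check on the union bound, assume the concrete (illustrative) choice of slack c_n = c + 2·log2(n), so the axiom for length n fails with probability at most 2^{-c_n} = 2^{-c}/n²; the total failure probability is then bounded by 2^{-c}·π²/6:

```python
import math

def failure_bound(c: float, n_max: int = 10**6) -> float:
    """Union bound on Pr[some axiom A_n(r) is false], assuming the axiom
    for length n asserts K(r) > n - c_n with c_n = c + 2*log2(n)."""
    return sum(2.0 ** -(c + 2 * math.log2(n)) for n in range(1, n_max + 1))

for c in (3, 10):
    closed_form = 2.0 ** -c * math.pi ** 2 / 6  # limit of the series
    print(f"c={c}: bound ~ {failure_bound(c):.6f} (closed form {closed_form:.6f})")
```

Already at c = 10 the chance of an unsound axiom set is below about 0.002, and choosing a sparse sequence of lengths drives it down further.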

**A Note on Defining the Axioms**

One issue with PA_ω is that the set of new axioms is not, indeed cannot be, recursively enumerable. This is a consequence of Gregory Chaitin’s theorem, which Wikipedia here mentions as a kind of Gödel-incompleteness statement related to Berry’s paradox. Once we have chosen the program U, we can formalize the predicate

K(x) > m

in PA or in any formal system that is rich enough to include arithmetic. Chaitin’s theorem then states:

Theorem: A sound formal system with a recursively enumerable axiom set can prove statements of the form K(x) > m for only finitely many values of m. In particular, there is a length L such that the system cannot prove that any string x with |x| >= L is algorithmically random.

The particular L can be viewed as a finite limit on the amount of information that the formal system can “comprehend”—which is essentially the amount of information in its axioms to begin with; more precisely, the amount of information in the program that recursively enumerates them.

It follows that every infinite r.e. set S of strings must contain some strings x such that A_n(x) is false, where n = |x| and A_n is as above. Otherwise S would give us an r.e. set of axioms for a theory that contradicts Chaitin’s theorem. Thus our set of new axioms for PA_ω cannot be r.e. It can be co-r.e., however. Thus while our point has been to picture PA_ω as a “random” theory, it can instead be viewed as a particular subset of *True Arithmetic*, which has recently been discussed on MathOverflow here.

**“The” Theory PA_ω**

The main issue is: what is the power of the theory PA_ω? Since it is not one theory, but many theories, we must be careful what we mean by saying that PA_ω can prove a sentence φ. Thus we are restricted to statements like:

The probability that PA_ω proves φ is at least 1/2.

Let’s denote this by PA_{1/2} ⊢ φ. Indeed let’s take probability 1/2, but other values are fine too.

Clearly PA_{1/2} ⊢ φ will hold for many statements φ that cannot be proved by PA, provided PA is consistent. This is clear and follows directly from Chaitin’s theorem. The main question, therefore, is this: What “new” statements can be proved now? More precisely, how can we characterize the statements φ such that PA_{1/2} ⊢ φ?

It seems that there are two extremes. One is that the new statements are really just statements about the Kolmogorov complexity of strings. If this is all that is added, then the theory is pretty uninteresting, as are all the variants with different probability thresholds.

The other extreme is that there are interesting new theorems that are provable by PA_{1/2}. Could the consistency of PA be proved, or some other interesting sentences? We do not know.

Cristian Calude has corresponded with us on an earlier form of this topic. His opinion can be summarized by saying that PA_ω can only prove that certain strings are random. In particular, he does not believe that the new axioms can be used to prove the statement asserting that PA itself, without the new axioms, is consistent.

**Open Problems**

What interesting statements, if any, are provable in a fixed sound theory with randomness axioms? Or provable with reasonably high probability in a theory with such axioms chosen by random coin flips?

Isn’t this the subject of Levin’s FOCS 2002 paper “Forbidden information”? http://arxiv.org/abs/cs.CC/0203029

Luca,

It is related but I do not think it is the same.

Isn’t it related in the sense that it implies that incompleteness also applies to probabilistic “quasi”-axiomatizations like PA_delta among others? Still, I don’t think it rules out that PA_delta could be strong enough to imply the consistency of PA.

By the way, mine wasn’t a rhetorical question. I have tried several times to read that paper of Levin’s, and I still have no idea what is the statement that he proves, and what is its philosophical interpretation.

Like the program that runs all programs that do not run themselves. Asking the question:”Can it run itself” would kill P vs. NP, see http://kamouna.wordpress.com.

In fact one can imagine even a more general procedure. Let us fix some threshold e>0. Then we can (once) do the following: after we have shown that (at most) e-fraction of strings of given length N have some property phi (an arithmetic formula), we toss a coin and add as an axiom the statement that the result of coin tossing does not have the property phi.

(In the special case phi could be compressibility.)

Unfortunately, it is quite easy to see that if a given statement is not provable in PA, it remains unprovable after such an extension with high probability, so in this way we cannot succeed much…
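A quick Monte Carlo sketch of the generalized procedure described above (the parameters N and e and the “property” set are arbitrary stand-ins): the added axiom is false exactly when the tossed string happens to satisfy phi, which by construction occurs with probability at most e.

```python
import random

random.seed(0)
N, e = 12, 0.05
universe = 2 ** N                 # strings of length N, coded as integers
# At most an e-fraction of strings have the property phi.
has_phi = set(random.sample(range(universe), int(e * universe)))

trials = 10_000
# Each trial: toss N coins, then "add the axiom" that the result lacks phi.
false_axioms = sum(random.randrange(universe) in has_phi for _ in range(trials))
print(f"axiom false in {false_axioms / trials:.3f} of trials (e = {e})")
```

The observed fraction of false axioms hovers around e, matching the soundness guarantee of the procedure.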

Question: What interesting statements, if any, [might be] provable in a fixed sound theory with randomness axioms?

Proposition: Each member of PA_ω assigns a string x to one of three classes.

Conjecture: The set has no provably correct proof-length ordering in any theory, including the non-random root theory itself.

Not that I’m an expert … I’m just interested in issues associated with verifiable run-time ordering of the set of algorithms; this interest extends naturally to proof-length ordering of the set of true statements. Hmmm … maybe these are (in some sense) variations upon the same problem?

I’m unclear about the formal definition of An. Are we fixing a particular string “r” that we chose through a random procedure and then asserting An for that r? Or are we making an assertion about any randomly chosen string?

Kind-of both: that’s why we mentioned both “PA_\omega” and “PA_{1/2}”. We wrote “A_n(r)” not “A_n”, however, to make sure the axioms themselves are concrete, for particular strings r.

Let me say more clearly:

assume that we have fixed some threshold probability $\varepsilon$ and choose parameters in such a way that the probability to get a false axiom is less than $\varepsilon$. Let $A$ be a formula that is not provable in PA. Then the probability that $A$ is provable in the extended theory (obtained by adding random axioms as described) cannot exceed $\varepsilon$.

Indeed, let us assume for simplicity that only one axiom is added. We consider strings of length $N$ and choose some $c$ such that the inequality $K(x)>c$ is true for a random $x$ of length $N$ with probability greater than $1-\varepsilon$. Then we toss a coin $N$ times, get a string $r$ of length $N$, and add the statement $K(r)>c$. Let us show that $A$ (the formula not provable in PA) will be a consequence of $K(r)>c$ with probability at most $\varepsilon$.

Indeed, let us consider all the strings $r$ such that $K(r)>c$ implies $A$ in PA. Let $R=\{r_1,\ldots,r_M\}$ be the set of all those $r$, and assume, for contradiction, that the fraction of strings in $R$ is greater than $\varepsilon$. The implication

$$

K(r_1)>c \lor K(r_2)>c \lor\ldots\lor K(r_M)>c \Rightarrow A

$$

is provable in PA. But the disjunction on the left-hand side is provable in PA, too: the fraction of strings with $K(r)\le c$ is less than $\varepsilon$ while the fraction in $R$ exceeds $\varepsilon$, so some $r_i$ must satisfy $K(r_i)>c$, and PA can verify this finite counting argument. So $A$ would be provable in PA, and we get a contradiction.
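The counting step (that R must contain a string of high complexity) is pure pigeonhole, and PA can formalize this finite count; the toy check below illustrates it, with strings of length N coded as integers and both sets arbitrary stand-ins for {r : K(r) <= c} and R:

```python
N, eps = 10, 0.1
total = 2 ** N
# Fewer than eps*2^N strings have K(r) <= c ...
low_complexity = set(range(int(eps * total) - 1))
# ... while R contains more than an eps fraction of all strings
# (chosen here to overlap the low-complexity set as much as possible).
R = set(range(110))
assert len(low_complexity) < eps * total < len(R)
# Pigeonhole: R cannot fit inside the low-complexity set, so some
# member of R satisfies K(r) > c, making the disjunction true.
witnesses = R - low_complexity
print(sorted(witnesses))
```

Even in the worst case of maximal overlap, the difference is nonempty, which is exactly what makes the disjunction over R provable.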

So — if I understand correctly your question — the answer is negative: such an axiom cannot prove something not provable in PA with probability higher than the probability of proving something false.

very nice

“Another thread is the search for new axioms that can be added to PA or set theory that increase its power, allowing more theorems to be provable. No additional axioms can escape incompleteness and preserve consistency, but new axioms can expand the set of provable true sentences. … What I plan to discuss today is a different type of axiom that can be added to PA. … Our plan is to add many such statements to PA. … Whatever choice is made, clearly there exists a sound theory with infinitely many of these axioms—and moreover we have defined an infinite coin-flipping process that has a good chance of obtaining one. Call the theory PA_omega.”

This raises two interesting points about extending PA.

A1. The first is that if an aim of Computability Theory is to address finitary concerns unequivocally, then such concerns are only addressable in Recursive Arithmetic and the first-order Peano Arithmetic PA; they cannot rely upon – or appeal to – a language that admits limit ordinals such as Cantor’s omega.

A2. To see this, let [G(x)] denote the PA-formula:

[x=0 v ~(Ay)~(x=y’)]

A3. This translates, under every unrelativised interpretation of PA, as:

If x denotes an element in the domain of an unrelativised interpretation of PA, either x is 0, or x is a ‘successor’.

A4. Further, in every such interpretation of PA, if G*(x) denotes the interpretation of [G(x)] then:

(a) G*(0) is true;

(b) If G*(x) is true, then G*(x’) is true.

A5. Hence, by Goedel’s completeness theorem:

(c) PA proves [G(0)];

(d) PA proves [G(x) => G(x’)].

(Goedel’s Completeness Theorem: In any first-order predicate calculus, the theorems are precisely the logically valid well-formed formulas, i. e. those that are true in every model of the calculus.)

A6. Further, by Generalisation:

(e) PA proves [(Ax)(G(x) => G(x’))];

(Generalisation in PA: [(Ax)F] follows from [F].)

A7. Hence, by Induction:

(f) [(Ax)G(x)] is provable in PA.

(Induction Axiom Schema of PA: For any formula [F(x)] of PA:

[F(0) => ((Ax)(F(x) => F(x’)) => (Ax)F(x))].)

A8. In other words, except 0, every element in the domain of any unrelativised interpretation of PA is a ‘successor’. Further, x can only be a ‘successor’ of a unique element in any such interpretation of PA.

Moreover, the Induction Axiom Schema of PA does not allow us to bypass this constraint by introducing an ‘actual’ (or ‘completed’ ) infinity disguised as an arbitrary constant (such as omega) into either the language, or a putative non-standard model, of PA.

A9. Now, since Cantor’s first limit ordinal omega is not the ‘successor’ of any ordinal in the sense required by the PA axioms, and since there are no infinitely descending sequences of ordinals in any model of a set-theory, PA and any sub-system of ZF – such as the sub-system ACA_0 of the second order Peano Arithmetic Z_2 – cannot have a common model, and so we cannot consistently extend PA even to ACA_0 simply by the addition of more axioms.

(This of course challenges the conventional wisdom that ACA_0 is conservative over PA, since an implicit premise in the proof of this Theorem is that we can add omega as a constant to PA – along with necessary supporting axioms – without inviting inconsistency.)

B1. The second point is that PA does not allow the addition of any fundamentally new axioms that are independent of the Peano-Dedekind axioms.

Reason: We can show that a PA-formula is PA-provable if, and only if, there is a Turing machine which can provide evidence that the formula interprets as Tarskian-true in any model of PA.

In other words, although PA is not complete with respect to the true propositions under the standard interpretation of PA, it cannot be ‘extended’ in any significant sense to yield “…a fixed sound theory with randomness axioms PA_omega” where the additional axioms that are not recursively enumerable, since PA is ‘algorithmically’ complete with respect to the ‘algorithmically’ true propositions under the standard interpretation of PA.

B2. This seems to be a reflection of the fact that the true propositions under the standard interpretation of PA do not yield a finitary model for PA, whilst the ‘algorithmically true’ propositions under the standard interpretation of PA do (which implies that PA is not omega-consistent, and so Goedelian incompleteness is inconsequential), as detailed in:

“Kurt Goedel’s famous Incompleteness Theorem helped launch several different threads of research. It is not possible to build a formal system that can generate all and only the true statements about integers, whether one uses Peano’s system or others. There is always a true statement about integers that your system cannot prove.”

It is true that Goedel’s reasoning has shown with respect to the first-order Peano Arithmetic PA that:

‘There is always a true statement about integers that your system cannot prove’.

However the assertion that:

‘It is not possible to build a formal system that can generate all and only the true statements about integers, whether one uses Peano’s system or others’

always needs qualification, since it only holds if we assume that PA is omega-consistent and such an assumption is computationally dangerous for the following reasons.

1. PA is defined to be omega-consistent if, and only if, there is no PA-formula [F(x)] such that:

(a) [~(Ax)F(x)] is PA-provable;

(b) [F(n)] is PA-provable for any given PA-numeral [n].

2. Now (a) and (b) hold if, and only if:

(c) there is a model M of PA in which we can always interpret a formula such as [~(Ax)~F(x)] as:

‘There is some natural number n such that F(n) is true in the structure N of the natural numbers under the interpretation M.’

3. Although there is, indeed, an interpretation under which (c) holds – namely the standard interpretation of PA – the interpretation does not define a finitary model of PA.

4. To see why, we need to note that there is an element of obscurity in the very first pages of any text on first-order logic.

Every such text INFORMALLY instructs that the formula [(Ex)F(x)] is to be interpreted under ANY interpretation I of the logic as (Ex)F*(x), where F*(x) is the interpretation of the PA-formula [F(x)] under I, and (Ex)F*(x) is an abbreviation of:

‘There is some s in the domain D of the interpretation such that F*(s) is true in D’.

5. At the same time, every such text FORMALLY instructs further that the formula [(Ex)F(x)] is merely an abbreviation of the formula [~(Ax)~F(x)].

6. The above instructions today underpin mainstream mathematical thought and practice, even though Brouwer had pointed out as early as 1908 that the presumption that the formal expression denoted by [(Ex)F(x)] can always be interpreted under ANY interpretation I of the logic as the proposition denoted by (Ex)F*(x) is logically questionable.

7. What is not obvious is that, with the development of computational theory as the basis for finitary reasoning, it can be shown that the assumption is not logically sustainable.

In other words the assumption becomes invalid when we accept the values of a simple functional (computational) language as specifying evidence for the truth and falsity of the propositions in a constructive logic.

8. To see this note that all standard texts appeal to Tarski’s inductive definitions of the satisfaction and truth of the formulas of first-order logic under an interpretation, and that Tarski’s definitions uniquely specify when the formula [(Ax)F(x)] is true under the interpretation.

9. Following Tarski, we can define the satisfaction and truth of the atomic and compound formulas of PA under an ‘algorithmic’ interpretation S so that:

(d) the natural number n satisfies a formula [F(x)] under S if, and only if, there is a Turing machine which provides evidence that F*(n) is true in the structure N of the natural numbers;

(Note that this also holds under the standard interpretation of PA.)

(e) [(Ax)F(x)] is true under S if, and only if, there is a TM which provides evidence that, for any given natural number n, F*(n) is true in N under S.

(Although not immediately obvious, this does not hold under the standard interpretation of PA.)

10. It is not difficult to see that PA cannot be omega-consistent if there is a PA-formula [R(x)] such that there is no TM which can provide evidence that, for any natural number n, the formula R*(n) is true in N under S even when, for any given natural number n, there is always a TM which gives evidence that R*(n) is true under S in N.

In other words, PA cannot be omega-consistent if there is an arithmetical relation that – like Chaitin’s Omegas – is algorithmically uncomputable, even though it is instantiationally verifiable.

The proof that there is such a relation is given in:

11. Since it also then follows that a PA-formula [(Ax)F(x)] is PA-provable if, and only if, F*(x) is true in N under S, the interpretation S of PA defines a finitary model of PA such that a PA-formula is PA-provable if, and only if, its interpretation is algorithmically computable under the standard interpretation of PA.

We cannot therefore assert unconditionally that ‘It is not possible to build a formal system that can generate all and only the true statements about integers, whether one uses Peano’s system or others’.

Umm, what about Rosser’s theorem that improves Gödel’s theorem to eliminate the dependence on ω-consistency?

Instead of constructing a sentence saying “I have no proof”, the sentence says “If I have a proof there is a shorter proof of my negation.” Call this sentence R. This sentence is clearly \Pi_1 (for all p, either p is not a proof or a bounded search finds a shorter proof of the negation). So if ~R (R has a proof and there is no shorter proof of ~R) is true in N it has a proof (coded by an actual natural number) by \Sigma_1 representability. But since ~R is true, R must have a proof, and that proof must be shorter than the one for ~R, so it must also be standard. Hence CON(PA) -> R.

But R entails that if any standard p codes a proof of R then PA is inconsistent, and since if there is a proof of ~R then PA |- R, it follows that CON(PA) -> ~Prf(R) & ~Prf(~R). So consistency is sufficient.

“…there are many other striking examples of combinatorial principles that lie beyond PA…”

The reason for this seems quite straightforward (even if not obvious), since many combinatorial proofs appeal to recursive properties.

1. Now a recursive function such as F(x) can be represented in PA by an arithmetical formula [A(x, y)] such that, for any given natural numbers m and n:

(i) F(m) = n if, and only if, A*(m, n) is true in the structure N of the natural numbers, where A*(x, y) is the interpretation of [A(x, y)] in N.

(ii) [(E_1 y)A(m, y)] is PA-provable.

(Here ‘E_1 y’ denotes uniqueness, and is to be interpreted as ‘There is one, and only one, natural number y such that …’.)

2. Now even though F(x) is recursive, and therefore algorithmically computable, it can be shown that there is no TM that can provide evidence whether, for any given natural numbers m and n, A*(m, n) is true or not in N even though, for any given natural numbers m and n, there is always a TM that can provide evidence whether A*(m, n) is true or not.

(In other words, A*(x, y) is algorithmically verifiable, but not algorithmically computable.)

3. Further the PA formula [(E_1 y)A(x, y)] is PA-provable if, and only if, there is a TM that, for any given natural number m, will provide evidence that there is a unique n such that A*(m, n) is true in N (see my previous post above).

4. Hence [(E_1 y)A(x, y)] is not PA-provable, even though it interprets as an arithmetical relation that expresses true combinatorial properties.

The following result, and obvious generalizations to probabilities far greater than 1/2, indicate that you cannot prove any new statements in any reasonable theory T using coin flips.

Let p be a positive integer. We are interested in sentences of the form phi(n*), where 1 <= n <= 2^p.

Consider the following implication alpha(n), for 1 <= n <= 2^p:

if phi(n*) then |{m : 1 <= m <= 2^p and phi(m*)}| >= 2^(p-1)

formalized in T.

THEOREM. Let psi be a sentence in the language of T. Suppose the number of 1 <= n <= 2^p such that T + alpha(n) proves psi is at least 2^(p-1). Then T proves psi.

Proof: Suppose 1 <= n <= 2^p and T + alpha(n) proves psi. Then T + not(psi) proves not(alpha(n)); that is, T + not(psi) proves phi(n*) and proves

|{m : 1 <= m <= 2^p and phi(m*)}| < 2^(p-1).

By hypothesis this holds for at least 2^(p-1) n’s in [1, 2^p]. Hence

T + not(psi) proves |{m : 1 <= m <= 2^p and phi(m*)}| >= 2^(p-1).

Hence T + not(psi) proves psi.

Hence T proves psi.

The librarian’s name was George Berry.

http://www.bookrags.com/research/bertrand-russell-and-the-paradoxes–scit-06123/

I don’t think this is really the right framework to go about thinking about this question.

Since you have to add axioms in a non-r.e. way you can never get ahold of any specific theory. Instead you have to talk about the probability of a theory, with axioms added according to such and such distribution, proving some theorem. Moreover, you have to say that a theory selected in such a manner is only consistent with such and such probability (note that because you are only adding Pi_1 statements, if your theory turns out to be consistent it will actually be true of the standard model).

Worse, since you can prove in PA that PA_w will be consistent with probability 1-e, it follows that if \phi is independent of PA then PA+~\phi is consistent, and we can prove in that system that adding the axioms A_n(r) to PA+~\phi is consistent with probability 1-e, so the probability that whatever PA_w is chosen proves \phi (or ~\phi) can be no greater than e.

Of course you could get lucky and end up with A_1(0) proving CON(PA); indeed you could always pick the constants such that A_1(0) is false if the procedure searching for a proof of 0=1 from PA halts. However, this is really the best you can do. Eventually 1/2^(k+u) must get arbitrarily smaller than e, and thus one can use the machine existence theorem to easily construct a machine that enumerates all strings r of length n, with n greater than some constant, where A_1(r) & … & A_n(r) -> Con(PA). So for any particular universal machine there is some length n after which no A_n can contribute to proving Con(PA).

—

The problem with your formalism is that PA_w never gets to assert that the sequence r is random. It only gets to assert that finite initial segments are incompressible, and compactness means that anything not proved from some finite subset of the A_n(r) isn’t provable from the whole thing. Indeed, given \phi independent of PA, by König’s theorem one can show that for any epsilon there is an n such that the contribution of the A_i(r) for i > n to the probability that PA_w proves \phi is less than epsilon.

I think a better framework to look at is what Ted Slaman is doing with adding axioms to 2nd order arithmetic (i.e. a two-sorted first-order language) that assert the existence of random reals, and thus let one harness the power of saying that the whole infinite sequence is random.

In other words the entire strength of PA_w is really coming from your choice of u which is really equivalent to just specifying finitely much info about 0′ in a contorted way.