A Proof Of The Halting Theorem
Toward teaching computability and complexity simultaneously
[Image of Wilhelm Ackermann; credit: “Large Numbers in Computing” source]
Wilhelm Ackermann was a mathematician best known for work in constructive aspects of logic. The Ackermann function is named after him. It is used both in complexity theory and in data structure theory. That is a pretty neat combination.
I would like today to talk about a proof of the famous Halting Problem.
This term at Georgia Tech I am teaching CS4510, which is the introduction to complexity theory. We usually study general Turing machines and then use the famous Cantor Diagonal method to show that the Halting Problem is not computable. My students over the years have always had trouble with this proof. We have discussed this method multiple times: see here and here and here and in motion pictures here.
This leads always to the question, what really is a proof? The formal answer is that it is a derivation of a theorem statement in a sound and appropriate system of logic. But as reflected in our last two posts, such a proof might not help human understanding. The original meaning of “proof” in Latin was the same as “probe”—to test and explore. I mean “proof of the Halting Problem” in this sense. We think the best proofs are those that show a relationship between concepts that one might not have thought to juxtapose.
Diagonal and Impossible
The question is how best to convince students that there is no way to compute a halting function. We can define Turing machines in a particular way—or define other kinds of machines—and enumerate them as $M_0, M_1, M_2, \dots$ Then we get the particular definition

$H(e,x) = 1$ if $M_e$ halts on input $x$, and $H(e,x) = 0$ otherwise.
How can we prove that $H$ is not computable? We want to convey not only that this particular $H$ is uncomputable, but also that no function like it is computable.
Trying the diagonal method means first defining the set

$D = \{\, x : M_x \text{ does not accept } x \,\}.$
We need to have already defined what “accept” means. OK, we show that there is no machine whose set of accepted strings equals $D$. Then what? We can say that the complementary language $\overline{D}$ is not decidable, but we still need another step to conclude that $H$ is uncomputable. And when you trace back the reason, you have to fall back on the diagonal contradiction—which feels disconnected and ad hoc to the particular way $D$ and $\overline{D}$ are defined.
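To make the diagonal flavor concrete, here is a small runnable toy of our own, not from any textbook: given an explicit list of total 0/1-valued functions, flipping the diagonal yields a function that differs from every one on the list. The halting version replaces the finite list by an enumeration of all machines.

```python
# A runnable toy version of diagonalization: for any explicit list of
# 0/1-valued total functions, the "flip the diagonal" function differs
# from every function in the list at some input.

funcs = [
    lambda x: 0,                   # f_0: constant 0
    lambda x: 1,                   # f_1: constant 1
    lambda x: x % 2,               # f_2: parity of x
    lambda x: 1 if x > 1 else 0,   # f_3: threshold at 2
]

def diag(x):
    """Differs from f_x at input x by flipping its value."""
    return 1 - funcs[x](x)

for e, f in enumerate(funcs):
    assert diag(e) != f(e)  # diag disagrees with every listed function at e
print("diag differs from every f_e at input e")
```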
Ken in his classes goes the $D$-language route first, but Sipser’s and several other common textbooks try to hit $H$ directly. The targeted reason is one that anyone can grab:
It is impossible for a function $f$ to give the value $f(x) + 1$—or any greater value.
Implementations of this, however, resort to double loops to define $f$. Or, like Sipser’s, they embed the diagonalization idea in the proof anyway, which strikes us as making it harder than doing the separate steps as above. We want the cleanest way.
Here is the plan. As usual we need to say that $M_e(x)$ represents a computation: machine $M_e$ run on input $x$. If the computation halts then it returns a result. We allow the machine to return a nonnegative integer, not just accept or reject. If the machine does not halt then we can let this value be undefined; our point will be that by “short-circuit” reasoning the question of an undefined value won’t even enter.
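In code terms, the plan rests on two primitives. The sketch below uses hypothetical names of our own choosing: run(e, x) plays the role of the computation $M_e(x)$, and halts(e, x) plays the role of $H$. The point of the theorem is that the second stub can never actually be filled in.

```python
# Hypothetical primitives for the plan above (illustrative names only).

def run(e, x):
    """Simulate M_e on input x; return its nonnegative integer output
    if the computation halts, otherwise simply never return."""
    raise NotImplementedError("stand-in for a step-by-step TM simulator")

def halts(e, x):
    """The halting function H(e, x): 1 if M_e(x) halts, 0 otherwise.
    Theorem 1 below says no program can ever fill this stub in."""
    raise NotImplementedError("uncomputable, by Theorem 1")
```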
Now let $H$ be defined as the halting function above.
Theorem 1 The function $H$ is not computable.
Proof: Define the function $g$ as follows:

$\displaystyle g(x) = 1 + \sum_{e=0}^{x} H(e,x)\cdot M_e(x).$
Suppose that $H$ is computable. Then so is $g$. This is easy to see: just do the summation and when computing each term compute the $H(e,x)$ part first. If it is $0$ then it adds nothing to the summation, so it “short-circuits” and we move on to the next $e$. If it is $1$ then we compute $M_e(x)$ and add that to the summation. Let $S$ stand for the summation before the last term; then $g(x) = 1 + S + H(x,x)\cdot M_x(x)$.
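Here is that computation as a short sketch, reusing the hypothetical run and halts stubs from above. The `if` guard is exactly the short-circuit: the possibly-undefined value $M_e(x)$ is never touched when $H(e,x) = 0$.

```python
# A direct transcription of the summation defining g, given the
# hypothetical halts/run primitives sketched earlier.

def g(x):
    total = 1
    for e in range(x + 1):        # e = 0, 1, ..., x
        if halts(e, x) == 1:      # short-circuit past non-halting machines
            total += run(e, x)    # safe: this call is guaranteed to return
    return total
```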
Now if the theorem is false, then there must be some $e$ such that the machine $M_e$ computes $g$. Take $x = e$: the last term of the summation is $H(e,e)\cdot M_e(e)$, and since $M_e$ computes the total function $g$ we have $H(e,e) = 1$ and $M_e(e) = g(e)$. Because every term is nonnegative, $S \geq 0$, so

$\displaystyle g(e) = 1 + S + g(e) \geq 1 + g(e).$
This is impossible and so the theorem follows.
What Ken and I are really after is relating this to hierarchies in complexity classes. When the $M_e$ are the machines of a complexity class $\mathcal{C}$, the functions $H$ and $g$ become computable outright, since one can simulate $M_e$ within its resource bound. It follows that $g$ is not computed by any $M_e$ and so does not belong to $\mathcal{C}$. What we want is to find similar functions $g$ that are natural.
Ackermann’s famous function does this when $\mathcal{C}$ is the class of primitive recursive functions. There are various ways to define the machines $M_e$, for instance by programs with counted loops only. The $g$ that tumbles out is not primitive recursive—indeed it out-grows all primitive recursive functions. Showing that the diagonal $A(n,n)$ does likewise takes a little more formal work.
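For concreteness, here is the usual two-argument (Péter) form of the Ackermann function. The nested recursion on the first argument is exactly what counted loops cannot express; each fixed $m$ gives a primitive recursive function, but the diagonal $A(n,n)$ outgrows them all.

```python
# The two-argument (Ackermann-Péter) function. General recursion is
# essential here: no program built from counted loops alone computes it.

def ackermann(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

print(ackermann(2, 3))  # 9   (A(2, n) = 2n + 3)
print(ackermann(3, 3))  # 61  (A(3, n) = 2^(n+3) - 3)
# ackermann(4, 2) already has 19,729 decimal digits -- don't try it here.
```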
In complexity theory we have various time and space hierarchy theorems, say where $\mathcal{C}$ is $\mathsf{DTIME}[t(n)]$. For any time constructible $t'(n) = \omega(t(n)\log t(n))$, we can separate $\mathsf{DTIME}[t'(n)]$ from $\mathsf{DTIME}[t(n)]$ by a “slowed” diagonalization. The $g$ obtained this way, however, needs knowledge of $t'$ and its constructibility to define it. By further “padding” and “translation” steps, one can laboriously make it work for $t'(n) = t(n)^{1+\epsilon}$, for any fixed $\epsilon > 0$, and a similar theorem for deterministic space needs no log factors at all. This is all technical in texts and lectures.
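Here is a sketch of the slowed diagonalization. It assumes a hypothetical step-bounded simulator run_for(e, x, steps), which, unlike halts, is computable: it returns the output of $M_e(x)$ if that computation halts within the given number of steps, and None otherwise. The overhead of this universal simulation is where the $\log t$ factor comes from, and computing the step budget is where time constructibility enters.

```python
def run_for(e, x, steps):
    """Hypothetical clocked universal simulator: output of M_e on x if it
    halts within `steps` of M_e's own steps, else None. Computable, but
    the simulation costs roughly a log-factor overhead in running time."""
    raise NotImplementedError("stand-in for a clocked universal machine")

def slowed_diag(t2):
    """A sketch of the diagonal function for the class DTIME[t2]: it
    differs from every machine that runs fast enough for the clocked
    simulation to finish within the budget."""
    def g(x):
        budget = t2(x.bit_length())       # t'(|x|); needs t' constructible
        v = run_for(x, x, budget)         # simulate M_x on x, clocked
        return 0 if v is None else v + 1  # differ from M_x where it finishes
    return g
```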
Suppose we’re happy with $t'(n) = t(n)^2$, that is, with a non-“tight” hierarchy. Can we simply find a natural $g$ that works for it? Or suppose $\mathcal{C}$ is a combined time and space class, say machines that run in time $t(n)$ and space $s(n)$ simultaneously. Can we possibly get a natural $g$ that is different from what we get by considering time or space separately?
We’d like the “non-tight” proofs to be simple enough to combine with the above proof for halting. This leads into another change we’d like to see. Most textbooks define computability several chapters ahead of complexity, so the latter feels like a completely different topic. Why should this be so? It is easy to define the length and space usage of a computation in the same breath. Even when finite automata are included in the syllabus, why not present them as special cases of Turing machines and say they run in linear time, indeed time $n$?
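As a small illustration of the last point, here is a DFA run as a one-pass machine, taking one transition per input symbol and hence exactly $n$ steps on an input of length $n$. The even-parity automaton is an assumed toy example.

```python
# A DFA as a one-pass machine: one transition per input symbol, so
# exactly n steps on an input of length n. The automaton below accepts
# binary strings with an even number of 1s.

def dfa_accepts(s, delta, start, accepting):
    state = start
    for ch in s:                    # one step per symbol: time n
        state = delta[(state, ch)]
    return state in accepting

delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd",  "0"): "odd",  ("odd",  "1"): "even"}
print(dfa_accepts("1001", delta, "even", {"even"}))  # True: two 1s
```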
Is the above Halting Problem proof clearer than the usual ones? Or is it harder to follow?