Skip to content

A Proof Of The Halting Theorem

September 19, 2016

Toward teaching computability and complexity simultaneously

Large Numbers in Computing source

Wilhelm Ackermann was a mathematician best known for work in constructive aspects of logic. The Ackermann function is named after him. It is used both in complexity theory and in data structure theory. That is a pretty neat combination.

I would like today to talk about a proof of the famous Halting Problem.

This term at Georgia Tech I am teaching CS4510, which is the introduction to complexity theory. We usually study general Turing machines and then use the famous Cantor Diagonal method to show that the Halting Problem is not computable. My students over the years have always had trouble with this proof. We have discussed this method multiple times: see here and here and here and in motion pictures here.

This leads always to the question, what really is a proof? The formal answer is that it is a derivation of a theorem statement in a sound and appropriate system of logic. But as reflected in our last two posts, such a proof might not help human understanding. The original meaning of “proof” in Latin was the same as “probe”—to test and explore. I mean “proof of the Halting Problem” in this sense. We think the best proofs are those that show a relationship between concepts that one might not have thought to juxtapose.

Diagonal and Impossible

The question is how best to convince students that there is no way to compute a halting function. We can define Turing machines {M_x} in a particular way—or define other kinds of machines. Then we get the particular definition

\displaystyle  H(x,y) = 1 \text{ if the machine } M_x \text{ halts on input } y;\;\; H(x,y) = 0 \text{ otherwise.}

How can we prove that {H(x,y)} is not computable? We want to convey not only that this particular {H(x,y)} is uncomputable, but also that no function like it is computable.

Trying the diagonal method means first defining the set

\displaystyle  D = \{x: M_x \text{ does not accept } x\}.

We need to have already defined what “accept” means. OK, we show that there is no machine {M_d} whose set of accepted strings equals {D}. Then what? We can say that the complementary language {K = \{x: M_x \text{ \emph{does} accept } x\}} is not decidable, but we still need another step to conclude that {H} is uncomputable. And when you trace back the reason, you have to fall back on the diagonal contradiction—which feels disconnected and ad hoc to the particular way {D} and {H} are defined.

Ken in his classes goes the {D} route first, but Sipser’s and several other common textbooks try to hit {H} directly. The targeted reason is one that anyone can grab:

It is impossible for a function {f} to give the value {f(n) = f(n) + 1}—or any greater value.

Implementations of this, however, resort to double loops to define {f(n)}. Or like Sipser’s they embed the “{D}” idea in the proof anyway, which strikes us as making it harder than doing separate steps as above. We want the cleanest way.

The Proof

Here is the plan. As usual we need to say that {M_{x}(y)} represents a computation. If the computation halts then it returns a result. We allow the machine {M_{x}} to return an integer, not just accept or reject. If the machine does not halt then we can let this value be undefined; our point will be that by “short-circuit” reasoning the question of an undefined value won’t even enter.

Now let {H(x,y)} be defined as the halting function above.

Theorem 1 The function {H(x,y)} is not computable.

Proof: Define the function {f(y)} as follows:

\displaystyle  f(y) = 1 + \sum_{x=0}^{y} H(x,y) \cdot M_{x}(y).

Suppose that {H} is computable. Then so is {f(y)}. This is easy to see: just do the summation and when computing {H(x,y) \cdot M_{x}(y)} compute the {H} part first. If it is {0} then it adds nothing to the summation, so it “short-circuits” {M_{x}(y)} and we move on to the next {y}. If it is {1} then we compute {M_{x}(y)} and add that to the summation. Let {A_y = \sum_{x=0}^{y - 1} H(x,y) \cdot M_{x}(y)} stand for the summation before the last {x = y} term; then {A_y \geq 0}.

Now if the theorem is false, then there must be some {n} such that the machine {M_n} computes {f(y)}. But then

\displaystyle  f(n) = 1 + \sum_{x=0}^n H(x,n)M_{x}(n) = 1 + A_n + M_{n}(n) \ge 1 + M_{n}(n) = 1 + f(n).

This is impossible and so the theorem follows. \Box

Extending Simplicity

What Ken and I are really after is relating this to hierarchies in complexity classes. When the {M_{x}} are machines of a complexity class {\mathsf{C}} then the functions {H} and {f} are computable. It follows that {f} is not computed by any {M_n} and so does not belong to {\mathsf{C}}. What we want is to find similar functions {f'} that are natural.

Ackermann’s famous function {A(n)} does this when {\mathsf{C}} is the class of primitive recursive functions. There are various ways to define the machines {M_{x}}, for instance by programs with counted loops only. The {f(n)} that tumbles out is not primitive recursive—indeed it out-grows all primitive recursive functions. Showing that {A(n)} does likewise takes a little more formal work.

In complexity theory we have various time and space hierarchy theorems, say where {\mathsf{C}} is {\mathsf{DTIME}[O(n^2)]}. For any time constructible {t(n) = \omega(n^2 \log n)}, we can separate {\mathsf{C}} from {\mathsf{C' = DTIME}[t(n)]} by a “slowed” diagonalization. The {f'} obtained this way, however, needs knowledge of {t(n)} and its constructibility to define it. By further “padding” and “translation” steps, one can laboriously make it work for {t(n) = \omega(n^2 (\log n)^{\epsilon})}, for any fixed {\epsilon > 0}, and a similar theorem for deterministic space needs no log factors at all. This is all technical in texts and lectures.

Suppose we’re happy with {\mathsf{C' = DTIME}[n^3]}, that is, with a non-“tight” hierarchy. Can we simply find a natural {f'} that works for {\mathsf{C'}}? Or suppose {\mathsf{C}} is a combined time and space class, say machines that run in {O(n^2)} time and {O(n)} space simultaneously. Can we possibly get a natural {\mathsf{C'}} that is different from what we get by considering time or space separately?

We’d like the “non-tight” proofs to be simple enough to combine with the above proof for halting. This leads into another change we’d like to see. Most textbooks define computability several chapters ahead of complexity, so the latter feels like a completely different topic. Why should this be so? It is easy to define the length and space usage of a computation in the same breath. Even when finite automata are included in the syllabus, why not present them as special cases of Turing machines and say they run in linear time, indeed time {n+1}?

Open Problems

Is the above Halting Problem proof clearer than the usual ones? Or is it harder to follow?

What suggestions would you make for updating and tightening theory courses? Note some discussion in the comments to two other recent posts.

[some word fixes]

How Hard, Really, is SAT?

September 4, 2016

A new longest computer proof makes us wonder about things from security to the Exponential Time Hypothesis


Marijn Heule, Oliver Kullmann, and Victor Marek are experts in practical SAT solvers. They recently used this ability to solve a longstanding problem popularized by Ron Graham.

Today Ken and I want to discuss their work and ask a question about its ramifications.
Read more…

Descending Proofs Into Algorithms

August 28, 2016

A way to make indirect reasoning more palpable

Wikimedia Commons source

Nicholas Saunderson was the fourth Lucasian Professor at Cambridge, two after Isaac Newton. He promoted Newton’s Principia Mathematica in the Cambridge curriculum but channeled his original work into lecture notes and treatises rather than published papers. After his death, most of his work was collected into one book, The Elements of Algebra in Ten Books, whose title recalls Euclid’s Elements. It includes what is often credited as the first “extended” version of Euclid’s algorithm.

Today we raise the idea of using algorithms such as this as the basis for proofs.
Read more…

A Surprise For Big-Data Analytics

August 14, 2016

A simple but interesting issue with analyzing high-dimensional data


Peter Landweber, Emanuel Lazar, and Neel Patel are mathematicians. I have never worked with Peter Landweber, but have written papers with Larry and Laura Landweber. Perhaps I can add Peter one day.

Today I want to report on a recent result on the fiber structure of continuous maps.

Read more…

Do results have a “teach-by-date?”

August 9, 2016

Teaching automata theory


Noam Chomsky is famous for many many things. He has had a lot to say over his long career, and he wrote over 100 books on topics from linguistics to war and politics.

Today I focus on work that he pioneered sixty years ago.

Yes sixty years ago. The work is usually called the Chomsky hierarchy(CH) and is a hierarchy of classes of formal grammars. It was described by Noam Chomsky in 1956 driven by his interest in linguistics, not war and politics. Some add Marcel-Paul Schützenberger’s name to the hierachry. He played a crucial role in the early development of the theory of formal languages—see his joint
paper with Chomsky from 1962. Read more…

The Mathematics Of Dan’s Inferno

July 25, 2016

A possible error with mathematical ramifications

Non-technical fact-check source

Dan Brown is the bestselling author of the novel The Da Vinci Code. His most recent bestseller, published in 2013, is Inferno. Like two of his earlier blockbusters it has been made into a movie. It stars Tom Hanks and Felicity Jones and is slated for release on October 28.

Today I want to talk about a curious aspect of the book Inferno, since it raises an interesting mathematical question. Read more…

The World Turned Upside Down

July 10, 2016

Some CS reflections for our 700th post

Lin-Manuel Miranda is seen in New York, New York on Tuesday September 2, 2015.
MacArthur Fellowship source

Lin-Manuel Miranda is both the composer and lyricist of the phenomenal Broadway musical Hamilton. A segment of Act I covers the friendship between Alexander Hamilton and Gilbert du Motier, the Marquis de Lafayette. This presages the French co-operation in the 1781 Battle of Yorktown, after which the British forces played the ballad “The World Turned Upside Down” as they surrendered. The musical’s track by the same name has different words and melodies.

Today we discuss some aspects of computing that seem turned upside down from when we first learned and taught them.
Read more…