How teaching helps you understand proofs

Stanislav Žák is a theorist who has made important contributions to complexity theory. He was at the Institute for Computation Techniques of the Technical University in Prague thirty years ago when he proved the theorem below. He is now affiliated with the Institute of Computer Science of the Czech Academy of Sciences, where he has just written a new paper on “A Turing Machine Distance Hierarchy” with his colleague Jirı Šıma. He should not be confused with Professor Stanislaw Żak of Purdue’s ECE Department.

Today I want to discuss a classic theorem of Žák that is in the textbooks, and why it is a bit tricky to understand.

The power of the following theorem is in its statement, but the beauty is in its proof:

Theorem: Let ${f(n)}$ and ${g(n)}$ be constructible functions with ${f(n+1) = o(g(n)}$. Then nondeterministic time Turing Machines that run in time ${g(n)}$ compute more than those that run in time only ${f(n)}$.

In symbols,

$\displaystyle \mathsf{NTIME}(f(n)) \subset \mathsf{NTIME}(g(n)).$

Here ${\subset}$ is proper subset. An example of its power is that it separates nondeterministic time ${n}$ from time ${n\log_{*} n}$, which is much stronger than anything we can prove for deterministic time. This result uses a clever simulation of Ron Book and Sheila Greibach which I have discussed before.

Teaching

One of the side-effects of teaching is that when you present material you really have to know it perfectly. If your understanding is approximate, then it is hard to get students to completely understand. You may be able to fill the details in later, but at the board you’d better—in my experience—have it down exactly.

So to prepare for my class on the above theorem, I decided that the book’s proof was not that easy to follow. I decided to write the proof up in my own way. I hope this helps you understand the argument.

I hasten to add that there is nothing wrong or unclear about Žák’s proof. It was the rendition in the textbook that I found a bit unclear. Oh well.

Defining the Diagonal Machine

The goal is to diagonalize over NTMs that run in some polynomial time with another NTM that runs in a higher polynomial. The argument can be made much tighter, but I believe the slippery part of the argument is what I will focus on here.

Let’s start by picking a series of fast growing numbers

$\displaystyle a_{1} < a_{2} < \dots$

that we fix later. Rather than magically set them now, I prefer to delay their definition. Think of them as growing fast, and you will soon see that they need only satisfy two simple constraints.

Let’s now define a NTM ${D(x)}$ that operates as follows. It diagonalizes only on inputs of the form ${1^n}$—on other inputs we may simply say it accepts. On inputs of the form ${1^{n}}$ there are two cases depending on the relation of ${n}$ and the sequence of the ${a_{j}}$‘s. Thus the first requirement on the sequence ${a_{j}}$ is that ${D}$ itself can easily decide which of the following two cases holds.

Case I: In this case ${n = a_{i+1}}$ for some ${i}$. Here ${D}$ simulates

$\displaystyle M_{i}(1^{a_{i}+1}),$

for ${n^{2}}$ time, and then flips its answer. If the computation above does not halt it accepts; otherwise, it does the opposite. This is the diagonalization step. This can be done efficiently because ${n=a_{i+1}}$ is much larger than the input ${1^{a_{i}+1}}$. This is the key:

$\displaystyle a_{i+1} \gg a_{i} + 1.$

How much larger? It must be exponentially larger so that ${D}$ can simulate the nondeterministic computation by a brute-force manner. Nothing clever here. Brute force. This is the second and last requirement on the sequence.

Case II: In this case ${n}$ is not equal to any of the ${a_{j}}$‘s. Then there is a unique ${i}$ so that

$\displaystyle a_{i} < n < a_{i+1}.$

In this case ${D}$ just simulates ${M_{i}(1^{n+1})}$ for ${n^{2}}$ steps. This is not a typo—there may be some—but the exponent is really ${n+1}$. This small point is critical to make it all work, as you will see it a moment.

That is the description of ${D}$. It clearly is a NTM that runs in time ${n^{3}}$. I am being loose with the polynomials, since they are not the tricky part.

Okay, as with any diagonalization proof, we now get to the key step where we show that some computation of ${D}$ must take on two distinct values, which is of course a contradiction. So we assume that ${D}$ itself can be computed by some NTM ${M_{i}}$ that runs too fast. We will then show that this leads to a contradiction.

Finishing the Proof

Still with me? We know that for some ${i}$, ${D(x) = M_{i}(x)}$—let’s call this assumption “A”. Let’s look at ${D(1^{a_{i+1}})}$. This is the value of ${D}$ that will be inconsistent.

By case I, we know that (*)

$\displaystyle D(1^{a_{i+1}}) \neq M_{i}(1^{a_{i}+1}).$

This is not, I repeat not, a contradiction. Note that the exponents are different. But we have case II to exploit. We will now show that it implies that (*) is false, which is our desired contradiction.

$\displaystyle \begin{array}{rcl} D(1^{a_{i+1}}) &=& M_{i}(1^{a_{i+1}}) \text{ by A} \\ &=& D(1^{a_{i+1}-1}) \text{ by case II} \\ &=& M_{i}(1^{a_{i+1}-1}) \text{ by A} \\ &=& D(1^{a_{i+1}-2}) \text{ by case II} \\ \vdots \\ &=& M_{i}(1^{a_{i}+1}). \text{ by A} \\ \end{array}$

These equalities imply that

$\displaystyle D(1^{a_{i+1}}) = M_{i}(1^{a_{i}+1}).$

But this is a contradiction with (*). Done.

The clever use of ${M_{i}}$ computing on input ${1^{n+\mathbf{1}}}$ should now be clear. It allows the chain of equalities to work.

I hope this helps.

Open Problems

There are functions ${f,g}$ such that ${f(n) = o(g(n)}$, but it is not true that ${f(n+1) = o(g(n)}$. An example is ${f(n) = 2^{2^n}}$, ${g(n) = nf(n) = n2^{2^n}}$. Can we diagonalize—conveniently—in cases like that?

Finally Lance Fortnow and Rahul Santhanam in their paper have a newer way to prove this result—see their paper on “Robust Simulations and Significant Separations.”

Fixed picture thanks to Rostislav Horcik. Sorry for error.

1. September 13, 2013 2:45 am

I’ve always thought that this hierarchy theorem is due to Cook. Actually, Cook “only” proved it for polynomial time (if $r<s$, then $NTIME(n^r) \subset NTIME(n^s)$).

September 13, 2013 6:12 am

One of the side-effects of teaching is that when you present material you really have to know it perfectly. If your understanding is approximate, then it is hard to get students to completely understand. You may be able to fill the details in later, but at the board you’d better – in my experience – have it down exactly.

This is no doubt one of the reasons of the success of proof assistants. Whenever you have no student at your disposal, there’s still your computer! Proof assistants are to mathematics what chess programs are to chess – a great way to practice, even though they don’t completely replace real math students and chess players. Chess and math – among others – are essentially human activities.

3. September 13, 2013 6:46 am

I agree that noted side-effect of teaching (“when you present material you really have to know it perfectly”) is very positive for scientific researches. But what about negative side-effect? A teacher needs significant time for teaching work, so he may have not enough time for his researches. I see that the majority of computer scientists are teachers and main centers for computer science researches are universities. But teaching is talent (aptitude), not everybody has it. What do you think about?

September 13, 2013 4:51 pm

There are several stories of famous mathematicians who didn’t really like to teach their students, such as Jean Dieudonné who admitted having no attraction to that. In this regard, unofficial researchers are luckier than those in university. ;-) But Dieudonné nevertheless contributed actively in Bourbaki’s Elements of Mathematics which deeply reshaped most of the mathematics in the 1940-60’s. The Elements originated in a collective project formed in 1935 by some of the greatest French mathematicians of that time: writing out some definitive advanced calculus material for their university students. So, teaching can sometimes yield really positive outcomes.

September 14, 2013 2:05 pm

Another celebrated bad teacher/great researcher was Lars Onsager: “I wouldn’t say that Lars Onsager was the worst lecturer in the world, but he was certainly in contention”

A celebrated great lecturer/great researcher/bad adviser was Richard Feynman. At least one of the world’s more celebrated surgeons owes his post-physics medical career — and his substantial wealth — to the enduring dislike for quantum field theory that his thesis adviser (Feynman) inspired in him.

September 14, 2013 7:18 pm

An aphorism that can be read in two ways — or more, and all good! — is Vladimir Arnold’s “Every mathematician knows that it is impossible to understand any elementary course in thermodynamics.”

So are mathematicians poor learners, or are physical scientists poor teachers?

A Lipton/Regan/Pip column on “Aphorisms that we read in more than one way” would be lots of fun (IMHO).

Consider for example Henry G. Forder’s “The virtue of a logical proof is not that it compels belief but that it suggests doubts.”

Inspiring wisdom or tautologically foolish?

September 15, 2013 4:02 am

Teaching aphorisms can appear also in dual pairs; e.g. Henry Forder’s “The virtue of a logical proof is not that it compels belief but that it suggests doubts” is rhetorically dual to Scott Aaronson’s “Math could be defined as that which can still be trusted, even when you can’t trust anything else.”

Is one right? the other? both? neither? Which is best suited to teaching students the nature of a mathematical proof?

• September 15, 2013 5:05 am

Physical scientist can work in a research institute, where he has no students to teach, as well as a programmer can work in industry. But it seems that this way is impossible for many pure mathematicians, because the number of mathematics research institutes is very small. So majority of mathematicians must work in universities, where they must teach students. Or they have to work in applied mathematics in physical research institutes or in industry.

September 15, 2013 6:14 am

Math can only be trusted without doubt once it has been put into computerized form and that’s what the Bourbaki revolution was all about. Unfortunately, this is also the stage when the proofs become hardly understandable to most humans. Thus, a right balance must be sought between proofs “which suggest doubts” and proofs “which can still be trusted when you can’t trust anything else”. The doubts can be seen as a strong incentive to redo for yourself – sometimes subconsciously – some of the steps that could be carried out by automated check. It gives you a personal view of the proof… and computers can’t do this for you!

September 15, 2013 12:06 pm

Serge avers “Math can only be trusted without doubt once it has been put into computerized form.”

LOL  computerized proof-checkers teach us to “concentrate our doubts” upon the postulates of the theorem and/or the axioms of the proof-checker.

For engineers especially, such foundation-questioning is a supremely practical exercise . Perhaps that is the ranks of iconoclastic researchers include relatively many trained engineers (Leonardo, Dirac, von Neumann, Shannon, Wittgenstein)?

A relevant meditation is from William Thurston’s On Proof and Progress in Mathematics:

On the most fundamental level, the foundations of mathematics are much shakier than the mathematics that we do. Most mathematicians adhere to foundational principles that are known to be polite fictions. For example, it is a theorem that there does not exist any way to ever actually construct or even define a well-ordering of the real numbers.

There is considerable evidence (but no proof) that we can get away with these polite fictions without being caught out, but that doesn’t make them right.

In regard to computational complexity theory — and particularly in regard to the computational complexity of quantum dynamical simulation — the present-day postulates are less firmly established, and perhaps also more needlessly restrictive, than we quantum systems engineers could wish!

September 15, 2013 1:50 pm

I agree with you John, but that doesn’t contradict my previous comment as I was only questioning the proofs, not their postulates. One is reminded of Knuth’s famous quote: “Beware of bugs in the above code; I have only proved it correct, not tried it.” :)

I’ve often written on this blog that I felt there’s a problem in studying algorithms by using proofs, since – by the Curry-Howard isomorphism – they’re also a special kind of algorithm. I really think the static notion of “algorithm existence” will have to be replaced with the more dynamic notion of “algorithm finding”. Such a reversal of viewpoint was already started in Quantum Mechanics with its “observable variables” and I’m convinced it will eventually have to reach computational complexity theory.

September 15, 2013 2:58 pm

Those are well-considered remarks Serge, and thank you for posting them.

Perhaps we can enjoyably encapsulate them in an aphorism that dualizes John von Neumann’s remark “Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.”

“Anyone who appeals to Platonic oracles in deciding complexity-class membership, and yet quantifies complexity with reference to physical symplectomorphic flows, resides of course in limbo.”

September 15, 2013 3:49 pm

“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.”

My only excuse is that Gödel did it first with his &latex \beta&-function!

Regards,

Bhup

September 15, 2013 3:10 pm

:)

• September 17, 2013 11:14 am

Serge posts :smile:

Serge, closely related to questions of mathematical diversity in postulates and axioms — and related in particular to the perennial Gödel’s Lost Letter theme of “Burning arrows! Are they allowed to do that?” — is the mathematical tradition of scary puzzles, of which a celebrated example is A trusted foreigner comes and says ‘I see a blue-eyed person.’”

The intersection of cognition, fear, and mathematics is fertile territory, and yet (as far as I know) only stage-magicians have systematically studied and exploited that territory … and the magicians aren’t talking! Because as everyone knows, :smile: and :lol: are close cultural cousins to :shock: !

• September 17, 2013 11:37 am

Serge, as a concrete example, in his TED-talk Brain Magic, beginning around minute 14:50, mentalist Keith Berry skillfully inhibits mathematical cognition by increasing the unconscious anxiety-level of his audience (especially the anxiety of Berry’s nervous on-stage volunteer).

The more math-and-science we know, the more vulnerable we are to Berry’s cognitive manipulations! It is natural to consider whether comparably powerful cognitive inhibition mechanisms act broadly — yet unconsciously — within the mathematical/academic community.

4. September 13, 2013 10:50 am

You’re got a couple O(g(n) instead of O(g(n)). No need to post this comment.

5. September 13, 2013 10:53 am

Hmmm, I thought I remembered that comments for GLL got vetted. Can I interest anyone in some fine Canadian pharmaceuticals, fake Rolex watches, and bogus P=NP proofs?

September 13, 2013 5:15 pm

Ha ha! Seems like Dick has suffered some embarassing moments in front of his students recently, struggling to complete the proof! Doesn’t matter – happens to all of us.

Seriously, every researcher should teach basic undergraduate subjects every once in a while, otherwise your own fundamentals keep getting weaker.

7. September 14, 2013 12:10 pm

One professor is talking to another one – “Yesterday I met really stupid student, I explained him a topic 10 times. I got it myself, but he could not”.

8. September 15, 2013 7:26 pm

You think the NSA has a sneaky solution to DL and RSA?

September 16, 2013 9:39 am

J,

I would not say sneaky. I do believe that these problems are likely to be in P. And also have pointed out that those who know the solutions might not tell us.

• September 16, 2013 11:13 am

However, QM provides algorithms for these from ‘special’ operations and people do think QM in general provides better algorithms like the non-intuitive Grover search for which a classical algorithm is not possible. It is possible that if a classical algorithm for any one of the currently deployed techniques is found, then it will have impact in QM. From this perspective it is hard to think of existence of a classical technique. Doesn’t existence of QM algorithms not affect your belief or do you believe in a possible polynomial hidden variable theory like that Scott Aaronson hints in one of his papers?

Also do you have the same opinion for OWFs in general? My opinion is that if DH falls for cyclic groups, it probably is not safe for any groups.