How proofs are like cities

Pierre L’Enfant was not a theorist, but a designer, and in particular the designer of the physical layout of Washington D.C.

Today I want to talk about why some proofs are hard to understand, not hard to discover.

What do proofs have to do with cities? At first glance they seem to have little in common, but I believe that understanding a proof and navigating a city are closely related activities. I think the reason some proofs are hard to follow is related to why some cities are hard to navigate.

Washington, as in the US capital, has always been difficult for me to get around, with due respect to L’Enfant. I sometimes feel lost in the capital. Its main streets run in a diagonal pattern, which is beautiful but does not help me form a good mental spatial map. I also almost never drive a car around the city: I walk, take the Metro, or take a taxi. The last lets someone else do the navigating for me, which means that I can close my eyes and still get where I want to go. This is very pleasant and easy on me, yet it does not help me learn the ins and outs of the streets of D.C.

I think when we read a proof it is like trying to navigate around a city. The layout is critical if we are to be successful in understanding the proof. Also, we need to actively interact with the proof. If you just sit back in a “taxi” and are driven around the proof, say at a talk or lecture, then you are much less likely to really understand the proof.

Let’s turn to discuss some connections between the understanding of cities and proofs. Perhaps these comments will help us both to write better proofs and to read proofs better.

Cities And Proofs

${\bullet}$ Standard Layout: For all its size and immense density of people and vehicles and buildings, Manhattan is relatively easy to navigate around. The streets, to a first approximation, are laid out on a rectangular grid: so 41${^{st}}$ street is north of 33${^{rd}}$. The avenues run north and south and are numbered west to east—okay not exactly but close—for example, Sixth Avenue is also the Avenue of the Americas and Ninth is Columbus Avenue. So figuring out a rough location is pretty straightforward, especially if you stay away from the lower part of Manhattan.

What does this mean for a proof? We should, as much as possible, lay out our proofs in a standard “grid” manner. Definitions, lemmas, and proofs should follow as standard a layout as possible. Sometimes this is hard or even impossible, but proofs with “diagonal” structure are much harder to understand and should be avoided.

Stated another way, with a mixed metaphor: Traffic flow in a proof should be topologically sorted. You might have to refer back to check the source of the next logical step, how Theorem 5 depends on Lemmas 2 and 3, but you should never have to cycle around, where to prove Lemma 3 you need to understand part of the proof of Theorem 5. Another way to say this: the understanding of a lemma should never need to be “deferred” to a later result. When such deferral happens, you have what Jacques Derrida called différance, which might be fine in literary theory, but shouldn’t be in math. This last reference is due to Ken, I defer to him on this, and simply add: keep the structure as simple as possible.
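The topological-sort requirement can even be checked mechanically. Here is a minimal sketch (the result names are hypothetical, echoing the example above) using Python’s standard graphlib module:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each result maps to the set of
# results its proof cites. Theorem 5 depends on Lemmas 2 and 3.
deps = {
    "Lemma 1": set(),
    "Lemma 2": {"Lemma 1"},
    "Lemma 3": {"Lemma 1"},
    "Theorem 5": {"Lemma 2", "Lemma 3"},
}

def reading_order(deps):
    """Return an order in which every result appears after everything
    it depends on; raises CycleError if the proof's traffic flow is
    circular."""
    return list(TopologicalSorter(deps).static_order())

print(reading_order(deps))  # Lemma 1 comes first, Theorem 5 last
```

If adding a back edge, say making Lemma 3 depend on Theorem 5, causes `CycleError` to be raised, the proof has exactly the kind of circular deferral described above.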

${\bullet}$ Good Guide Books: Any city is easier to navigate if there are well-written guide books: ones that explain how to get around, how to locate major landmarks, and how to find your way.

What does this mean for a proof? We should supply a guidebook with our proofs. If a proof is very short or simple, then a guide may be unnecessary. However, for a proof of any complexity, having a good overview that explains where you are, where you are going, and how it all fits together is invaluable. Terence Tao is a master at this: even some of his deepest theorems have an overview that at least allows you to get your bearings. Here is an overview from one of his papers.

There are three major ingredients. The first is Szemerédi’s theorem, which asserts that any subset of the integers of positive density contains progressions of arbitrary length. The second, which is the main new ingredient of this paper, is a certain transference principle. This allows us to deduce from Szemerédi’s theorem that any subset of a sufficiently pseudorandom set (or measure) of positive relative density contains progressions of arbitrary length. The third ingredient is a recent result of Goldston and Yildirim, which we reproduce here. Using this, one may place (a large fraction of) the primes inside a pseudorandom set of “almost primes” (or more precisely, a pseudorandom measure concentrated on almost primes) with positive relative density.

${\bullet}$ Districts: Many cities, even huge ones, have districts or neighborhoods. This modular structure is immensely useful in navigating around. Even Manhattan, for example, has neighborhoods: TriBeCa, West Village, Washington Heights to name just three out of dozens.

What does this mean for a proof? We should structure our proofs to have a neighborhood structure. Proofs that are modular can often be read much more easily. The modules can be understood separately, and that helps the reader greatly. Often this modular structure is based on the use of powerful abstraction. Used properly, a highly abstract proof can be quite clean, simple, and very understandable; used poorly, the abstraction may hide the key issues that are being argued and make the proof very hard to understand.

$\displaystyle \S$

The next two are things to be avoided.

${\bullet}$ Many Side Streets: Cities with lots of dead-end streets, with numbering schemes that jump in unexpected ways, and with naming schemes that are strange can be very hard to navigate.

What does this mean for a proof? We should avoid proofs with many side steps and many parts that are unneeded. I believe that side streets in proofs correspond to many cases. The more cases, the more complex a proof usually is. The danger is that we can leave a case out, or wrongly argue that case (iii) follows just as case (ii) did. Lots of cases are a symptom of complexity, and complexity is usually an enemy of understanding.

Cases cannot always be avoided. In some areas of mathematics, for example finite group theory, special cases abound. One is quite likely to see a theorem like this:

Theorem: Every group of even order that does not have ${\dots}$ as a subgroup and has no elements of order ${p^{2}}$ where ${p \in \{3,5,17\}}$ satisfies ${\dots}$.

${\bullet}$ Crazy Structure: Some cities have endearing but nutty structures. Pittsburgh, where CMU is located, is famous for having the property that you may be able to see where you want to get to, but there is no way to get there. The many hills, rivers, and bridges make this happen. You can see where you want to be, but there seems to be no road that leads there. Of course, there is a way to get there; the way is just hard to discover, and may involve heading initially away from your destination. It can be very confusing and makes for difficult navigation.

In Cleveland there was a road that self-intersects—crosses itself—at least when I was an undergraduate there. Really. Many cities also have roads that change names for no obvious reason, or become one-way at certain times of the day. All of this makes getting around a challenge. Atlanta, where I live, has many roads that are named

$\displaystyle \mathrm{Peachtree \ } X$

where ${X}$ is a modifier like: way, street, place, and so on.

What does this mean for a proof? We should be careful to avoid crazy structure. We should use logical names and keep notation from changing for no reason. We must avoid circular reasoning; even reasoning that is not circular but appears to be will be quite hard to follow.

A classic example of this is the inductive argument, which is essential to many areas of mathematics. But an inductive argument can be tricky. There is always the danger that the argument is circular or has some other defect. Be very careful writing them and reading them: be sure the base case is handled properly and that you understand what is being “inducted” on. Some proofs, especially advanced ones, induct on a complex measure. Be sure to follow these carefully.
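As a small illustration of inducting on a measure (my example, not from the post): Euclid’s algorithm terminates by induction on its second argument, and the assertions below make the decreasing measure explicit.

```python
def gcd(a, b):
    """Euclid's algorithm. Termination is an induction on the measure
    b: the base case is b == 0, and every recursive call strictly
    decreases b while keeping it non-negative."""
    assert b >= 0
    if b == 0:           # base case of the induction
        return a
    assert a % b < b     # the measure strictly decreases
    return gcd(b, a % b)
```

Spelling out the measure this way is exactly the discipline the paragraph above asks of both writers and readers of inductive proofs.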

Open Problems

If you are writing or reading a proof, keep the above examples in mind to help you understand the parts that make the proof hard. Avoid taking proof “taxis” if you really want to understand a proof.

An open problem: classic proof theory measures complexity by size: longer is harder. But this seems to be a bit naive. Perhaps there is a more reasonable measure of proof complexity. The same applies to much of complexity theory in general: a bigger circuit is more complex, a longer computation is more complex, and so on. Is this the best we can do?

1. May 6, 2011 9:15 am

I feel like your analogy about topological sorting is probably the best intuition about how we can measure proof complexity in general. Consider a proof as a graph where the lemmas/theorems are vertices and the references between them are edges. We could refine this even further by considering a directed dependency graph between the actual arguments, rather than just the theorems; however, this would be harder to do systematically.

Once we have a proof represented as a graph, we can do some interesting things like considering the traversal represented by the linear arrangement of the proof in text form. How many vertices does the linear arrangement traverse? If we minimize the number of traversed vertices (basically, using a topological sort) then the amount of mental context-switching is reduced (since long stretches of the proof will all be interrelated, and thus the mental “locality of reference” is better). In principle, reducing mental context-switching should make the proof easier to grok, though that’s obviously just a hunch.
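This hunch can be made concrete. Here is a rough sketch (the measure and the result names are my own invention, not a standard definition): given a linear arrangement of the results and their dependency graph, count the citations that reach back more than one step in the text.

```python
def context_switches(order, deps):
    """Count dependency edges that reach back more than one step in
    the linear arrangement -- a crude proxy for how often the reader
    must mentally switch context while following the text."""
    pos = {name: i for i, name in enumerate(order)}
    return sum(1 for node, cited in deps.items()
               for d in cited if pos[node] - pos[d] > 1)

# Toy example with hypothetical result names.
deps = {"Lemma 1": set(), "Lemma 2": {"Lemma 1"},
        "Lemma 3": {"Lemma 1"}, "Theorem 5": {"Lemma 2", "Lemma 3"}}
print(context_switches(["Lemma 1", "Lemma 2", "Lemma 3", "Theorem 5"],
                       deps))  # 2 long jumps
```

Comparing this count across different topological orders of the same proof is one way to formalize the “locality of reference” idea.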

2. May 6, 2011 9:49 am

To your “open problem” point: if we could find a “semantic” way to assess the complexity of a proof separate from its “size”, I think we would be most of the way to solving the major complexity class inclusion problems.

3. May 6, 2011 10:29 am

I kept having this problem that I wanted to go top-down or bottom-up. I tended to see the proof as a tree structure, branching as we get deeper into the details. No doubt that’s a programmer’s way of looking at it, but does that make it harder for a mathematician to understand what I am trying to say? How does one unroll that perception?

Paul.

May 6, 2011 11:26 am

This was one of the, perhaps not best, but at least most important posts here. Messy and unnecessarily complex proofs are the bane of mathematics. A city planner who creates a city that is inaccessible and confusing has failed in the same way as a mathematician who creates a proof that is inaccessible and confusing. Thanks for the post.

• May 6, 2011 2:35 pm

Please let me second Peter Floderus’ opinion that this is a good post on a difficult-but-central topic. One topic that might be mentioned is the crucial importance of well-crafted definitions. Michael Spivak’s well-regarded textbook Calculus on Manifolds (1965) offers this meditation on the role of definitions:

“There are good reasons why theorems should all be easy and the definitions hard … Definitions serve a twofold purpose: they are rigorous replacements for vague notions, and machinery for elegant proofs … Stokes’ Theorem shares three important attributes with many fully evolved major theorems: (1) It is trivial. (2) It is trivial because the terms appearing in it have been properly defined. (3) It has significant consequences.”

The proof that Spivak finally presents (on page 102) of Stokes’ Theorem is short in length (two pages) and it is logically trivial (a reasonably obvious sequence of elementary manipulations) … and yet it has taken Spivak 101 prior pages to define the elements of the proof, and motivate the geometric logic of the proof.

So what might be the computational complexity of Spivak’s proof? Obviously we need … better definitions! 🙂

May 7, 2011 7:16 am

I’ve tried to figure out why this is (that the work should be in the definitions rather than the theorems). I think it’s because we hope to have the definitions done, and to now produce lots of theorems, so we want them to be easy.

Here’s a matter of taste: should one define “An X is something that has A, B, and C” and a theorem that says “Actually you only need to check A”, or say “An X is something that has A” and a theorem that says “Xs also have B and C”?

I prefer the first, but I can’t articulate why. I remember seeing the definition “A rectangle is a parallelogram with a right angle” and thinking I much preferred “is a parallelogram with all right angles”.

• May 7, 2011 2:59 pm

Allen, these are deep issues indeed, relating to the interlocking roles of definitions, theorems, and proofs!

For example, Gauss is justly famed for his Theorema Egregium of 1827, which first proved that geometric curvature is intrinsic. Similarly, Riemann is justly famed for his extension of the Theorema Egregium to manifolds of arbitrary dimension. And yet the geometric definitions essential to the Theorema Egregium were clearly stated too (decades earlier) in Bowditch’s New American Practical Navigator of 1807 [etext, page 100]. Throughout the 19th century, Bowditch—as Bowditch’s text was eponymously known both then and now—contained the most accurate and carefully validated lunar tables in the world; it is very likely that Gauss (as a surveyor) and Riemann (as Gauss’ student) both knew Bowditch well.

We see that our modern understanding of geometry came to us via an archetypal trajectory: [1] the practical applications, physical intuitions, and essential definitions came first (Bowditch), [2] then came the creative abstraction of fundamental theorems (Gauss), [3] and finally came the refinement of the definitions, the extension to deeper theorems, and the enormous simplification of proof technologies (Riemann and his innumerable modern successors).

Only after this slow succession was complete could pedagogic geniuses (like Spivak) write their great mathematical texts. And so it is not surprising that throughout mathematical history, great texts have generally focused most of their attention not on theorems, and not on proofs, but rather on stating good definitions. This is because (from the Spivakian point-of-view) theorems and proofs are not ends in themselves, but rather, serve mainly to motivate the good definitions that are the true central concern of mathematics.

It follows that students of mathematics, science, and engineering should heed Spivak’s highest-rated Amazon reviewer:

Spivak knows you learned calculus the wrong way and he devotes the first three chapters to setting things right. Along the way he clears all the confusion …

This reviewer’s advice should lead every student to wonder (complexity theory students in particular): “Am I learning my subject the right way? That is, are my textbooks teaching me good definitions? Meaning, definitions that make rigorous-yet-useful theorems trivial to prove?”

With regard to algebra, geometry, and dynamics, at the undergraduate level the Spivakian answer is “Definitely not!” Fortunately, good texts covering algebra, geometry, and dynamics are available beginning at the first-year graduate level (Spivak’s text being one of them). And needless to say, these texts begin by teaching students better definitions, and it is notable that most of the theorems that students learn at the graduate level are trivial in Spivak’s wholesome sense.

With regard to complexity theory and quantum dynamics (and their beautiful progeny, quantum complexity theory) the news is less cheerful. We have not found obviously good definitions—as we know because useful theorems are not trivial to prove—and so perhaps it is simply impossible (at present) to write good texts about complexity theory and quantum dynamics. And this would explain why students find these topics so baffling!

Therefore, if we could ask just one question of 22nd century complexity theorists and quantum physicists, perhaps we should not ask “What theorems have been proved?” or even “What proof technologies are popular?” but rather we should ask the Spivakian question “What fundamental definitions are associated to the complexity class ${P}$ and to the state-space of quantum dynamics?”

We should all hope that these fundamental definitions will not have their present simple forms (which perhaps are too simple?) but rather these definitions will be incrementally refined throughout the coming decades of the 21st century, to the point that the theorems and proofs associated to problems that today are perceived as both fundamental and exceedingly difficult (like the separation of complexity classes and the simulation of quantum dynamical processes) will become “trivial” in the good Spivakian sense … definitions that are so well-crafted as to carry us most of the way to the understanding we seek.

More broadly, we can hope that future STEM students will grasp that key theorems mostly are trivial and key proofs mostly are simple … that the creativity and vitality of the STEM enterprise thus resides largely in its definitions … and thereby these future students will understand much more than we do now, and be inspired and emboldened to create the global enterprises that a planet of ${10^{10}}$ people so urgently requires.

The preceding amounts to a definition-centric, broadly Spivakian roadmap for the 21st century STEM enterprise, and yet (obviously) it is neither feasible nor desirable that everyone embrace Spivak-style roadmaps. After all, as Mark Twain’s Pudd’nhead Wilson said:

It were not best that we should all think alike; it is difference of opinion that makes horse-races.

It is these diverse mathematical perspectives that (for me) make weblogs like Gödel’s Lost Letter and P=NP so thought-provokingly enjoyable.

• May 8, 2011 12:04 pm

Wow, that’s a great comment. Thanks for that.

• May 8, 2011 9:53 pm

Oh boy … a fan post! 🙂

Delta, here is another terrific quote (IMHO) relating to definitions, proofs and theorems, from the preface to John Lee’s Introduction to Smooth Manifolds (a text that is known to graduate students as “Smooth Introduction to Manifolds”):

Over the past few centuries, mathematicians have developed a wondrous collection of conceptual machines designed to enable us to peer ever more deeply into the invisible world of geometry in higher dimensions. Once their operation is mastered, these powerful machines enable us to think geometrically about the ${6}$-dimensional zero set of a polynomial in four complex variables, or the ${10}$-dimensional manifold of ${5\times5}$ orthogonal matrices, as easily as we think about the familiar ${2}$-dimensional sphere in ${R^3}$.

The price we pay for this power, however, is that the machines are built out of layer upon layer of abstract structure. Starting with the familiar raw materials of Euclidean spaces, linear algebra, and multivariable calculus, one must progress through topological spaces, smooth atlases, tangent bundles, cotangent bundles, immersed and embedded submanifolds, tensors, Riemannian metrics, differential forms, vector fields, flows, foliations, Lie derivatives, Lie groups, Lie algebras, and more, just to get to the point where one can even think about studying specialized applications of manifold theory such as gauge theory or symplectic topology.

The reader’s main job […] is to absorb all the definitions and learn to think about familiar objects in new ways. It is the bane of this subject that there are so many definitions that must be piled on top of one another before anything interesting can be said, much less proved.

All of this seems very remote from the concerns of practicing engineers, until one appreciates that the very example that Lee cites as mathematically arcane—“the ${6}$-dimensional zero set of a polynomial in four complex variables”—is precisely the state-space of two interacting (unentangled) spin-${1/2}$ particles, which is a topic of immense theoretical interest and practical importance.

So it is natural to ask, “How would classic texts like Slichter’s Principles of Magnetic Resonance and Nielsen and Chuang’s Quantum Computation and Quantum Information read, if their ideas were expressed using Lee’s “smooth” mathematical machinery?” And it is a trivial exercise (in Spivak’s exacting sense of the word) to prepare such a translation.

In the same spirit, with regard to complexity theory, what I would most like to read is a definition-centric book in the same “smooth” spirit as John Lee’s, nominally titled Smooth Introduction to Computational Complexity, that similarly would facilitate a Spivak-style translation of (say) Oded Goldreich’s (terrific!) textbook P, NP, and NP-completeness, into terms that would render the proofs of its complexity-theoretic theorems similarly “trivial” to the theorems and proofs in Lee’s book. And it seems to me that the complexity-theoretic definitions of P, NP, and NP-completeness might well require some adjustments and deepening, for this goal to be achievable (certainly adjustments and deepening were required of algebra, geometry and dynamics, so why not complexity theory too?)

The desire for Spivak-style definitions and Lee-style “smooth” expositions is, I think, not confined to complexity theory and/or quantum information theory, but rather is widespread throughout the STEM community nowadays. As Scott Aaronson recently posted, “My real hope is that we’ll learn something new someday that changes the entire terms of the debate”.

What Michael Spivak’s and John Lee’s texts both teach us—and the history of math and science affirms—is that the process of learning “something new that changes the entire terms of the debate” very often begins with learning (and accepting) new definitions … the theorems and proofs come afterwards.

5. May 6, 2011 2:26 pm

The Tao summary is really a marvelous piece of writing.

May 7, 2011 11:29 am

I’d very much like to point out that Tao won the Levi Conant prize for mathematical writing (in the Notices of the AMS).

6. May 6, 2011 4:12 pm

Some cities have radial-circle planning, for example, Moscow. Is it crazy structure? Perhaps not. The radial-circle structure may be useful when you have many independent modules and one central module that is linked with the others.

7. May 6, 2011 4:44 pm

Thank you, Dick, for this guide to navigating and planning proofs. Hopefully it will make us all better proof-writers.

I think the part about having a good guide-book has an important counter-point, which I encountered today (among many other times): a relatively simple proof should often be presented with little introduction, just as we need no guidebook for merely walking down the street. While writing today, I realized I was trying to explain the subtleties of a proof in advance, while simply handling them on arrival is much clearer.

Imagine if, walking down the sidewalk, you encountered a sign declaring: “Caution: this sidewalk will veer slightly to the right, almost two-thirds into a minor incline, but not before you pass the largest pebble on the route.” There’s a level at which advance warning is just confusing.

8. May 8, 2011 12:10 am

I like your post as usual, particularly

“I think when we read a proof it is like trying to navigate around a city. The layout is critical if we are to be successful in understanding the proof. Also, we need to actively interact with the proof. If you just sit back in a “taxi” and are driven around the proof, say at a talk or lecture, then you are much less likely to really understand the proof.”

It is indeed true that if the proof structure looks like a “rectangular grid city,” it is enough to navigate the roads of the city by a two-tuple (x, y), where x is the street number and y is the direction (north, south, east, west). But there is another kind of proof structure (particularly in algorithmic proofs) that is independent of the structural complexity of the map of the city (the road map of the proof). We embed our navigational path (search structure) onto the streets of the existing city (grid, diagonal, triangular, etc.) and find our way from, say, point A to point B by using only the navigational path. For example, the navigational path chosen in the new proof of the four color theorem is a “spiral path” within the maximal planar graphs (the cities). In this way we would not go from one place to the other by using the shortest path (easiest proof structure) but would always select the spiral segment between point A and point B (a longer but safe and guaranteed path). Furthermore, by using the spiral path we need investigate only a small number of cases that might create trouble for our proof. Further views have been given in:

1. Visualization of the Four Color Theorem