# Math and Theory Proof Methods: A Comparison

*A comparison of methods used in mathematics and theory for solving problems *

John Rhodes is a world expert on semigroups and related mathematical objects. He is the “Rhodes” in the famous *Krohn-Rhodes Theorem*. Rhodes recently has worked on the P=NP question, and even held a conference at Berkeley on his approach—it is based on ideas related to semigroup theory.

Today I want to talk about methods mathematicians use to attack hard problems. Some of these methods are used often by theorists too, while others do not seem to be used as much as they could be.

One of the powerful tools that mathematicians use is related to early the work of Rhodes. But first I want to relate two stories about him. In the early 1960’s Rhodes was a graduate student at MIT and was working with another graduate student at Harvard, Kenneth Krohn. Together they discovered a theorem that became the Krohn-Rhodes Theorem. There is a long story here, but for now let me just say that they got Ph.D.’s for the identical work in 1962. Identical. The theses were the same—a pretty rare event.

When I was at Berkeley in the late 1970’s I was walking to lunch one day with Dick Karp. As we crossed campus Dick pointed out someone in the distance, and said, “do you know who he is?” I answered that I did not recognize him. Dick added in a low voice,

That is Rhodes; he is in real estate.

I have never heard before or since Karp say something with such awe. I believe that Dick respected Rhodes as a mathematician, but he was filled with wonder at Rhodes the heavily leveraged real estate investor.

** Approaches Used By Mathematics and Theory **

There are a number of approaches to problems that are used by both mathematicians and by theorists.

**Restrict The Solutions:** In mathematics often if a problem is too difficult to solve, then progress can be made by restricting the solution space. Thus, rather than looking for the general solution to an equation, sometimes only linear solutions are considered. Another example is to consider only highly symmetric solutions.

In circuit complexity theory this is a common method too. Proving general lower bounds is so far a dream for any “natural” problem. Progress can be made, however, by restricting the class of circuits—boolean or arithmetic. For instance we have lower bounds when the circuits are restricted to be:

- Monotone
- Constant depth
- Non-commutative

Even with these restrictions the results can be quite deep and difficult, but at least we can prove something.

For monotone the “we” refers to Alexander Razborov whose famous paper proves that the clique problem cannot be solved efficiently by any polynomial size monotone circuit. This does not rule out that P=NP. It does give an important insight into the structure of this famous NP-complete problem.

For a relatively recent paper on constant depth arithmetic circuits take a look at the paper of Maurice Jansen and Ken Regan. They use a discrete uncertainty principle to prove their lower bound.

And for the work on non-commutative see the paper of Pavel Hrubĕs, Avi Wigderson, and Amir Yehudayoff.

**Generalize The Problem:** In mathematics generalizing a problem can often make it more tractable. This sounds paradoxical, but it is not. By generalizing the problem we are asking a different question, and that question may be quite a bit easier to solve. The hope, of course, is that the generalized problem will yield some insight into the original problem.

A simple example from mathematics is the famous problem, also called the Collatz conjecture after Lothar Collatz. It has been open over 70 years. Various generalizations have been studied and some of these solved, but the original problem stands open. Paul Erdős is quoted as saying:

Mathematics is not yet ready for such problems.

In computational complexity we have used this method too. One example is the use of oracles. The famous results of Theodore Baker, John Gill, Robert Solovay showing that P=NP can be true or false relative to different oracles, certainly give an important insight. As you know I am not a fan of oracle results, but these results are important. Of course the real question is when we generalize a problem, how much does the generalization help us understand the original question.

** Approaches Unique To Mathematics? **

There are a number of approaches to problems that are used often by mathematics. But they are not used by complexity theorists—at least I do not think as often as they should be.

**Classification:** In mathematics a powerful method is to try to build a theory that classifies all objects of some type. The most famous of such a program is probably the classification of all finite simple groups. This tremendous result allows many problems to be solved that would other wise be impossible. An example is the famous conjecture of the number of generators of a finite simple group.

Another classification type theorem is the Krohn-Rhodes Theorem. It states that any finite automaton can be decomposed into “simple” automata. The hope at the time was that this decomposition theorem would help resolve many questions. I believe that this expectation at best was way too optimistic.

It is unclear if the classification method makes sense for complexity theory. There are many other classification theorems in mathematics that are very powerful, but there are many areas where such theorems seem to be hopeless. For example, I think trying to classify the models of set theory is completely impossible. Not just impossible—completely impossible.

However, it would be neat if there were useful classification theorems in complexity theory. What if we could prove a theorem like:

Theorem:If the boolean function is computable in linear time by a Turing Machine, then can be decomposed into

I have no idea what this would look like, but it is interesting to think about it. One issue, however, is that every computable function has a “padded” version that is linear-time computable, and it is not clear what decomposable structure would apply to but not .

There are some examples of classification theorems in complexity that have been quite useful. In the study of SAT there are the pretty results of Thomas Schaefer which are called his Dichotomy Theorem. Roughly his theorem does classify the type of SAT problems that are easy and the types that are NP-complete.

Les Valiant and Jin-Yi Cai have proved some quite beautiful results on the boundary between counting problems that are either in P or are #P complete. Jin-Yi has several papers based on ideas of Valiant, but they extend that work greatly. The work of Cai is also joint with numerous others: including Xi Chen, Sangxia Huang, Michael Kowalczyk, Pinyan Lu, and Mingji Xia.

**Make the Objects Really Big:** One tool to better understand a problem that mathematicians use and we rarely seem to is: replace “finite” by various notions of “infinite.” The study of finite groups is extremely difficult: one reason is that the interplay between small primes and each other can be very complex and subtle. Studying instead infinite groups is not easy, but there are some cases where the theory does become much easier. For example, studying Lie groups is not easy, but it does avoid much of the “accidental” number issues that arise in finite groups. One way to see this is their classification occurred first and is simpler than the classification of finite simple groups.

**Continuous:** Often replacing finite objects by continuous ones can yield a very powerful method. The entire field of analytic number theory is based on the ability of continuous functions to be used to solve discrete problems. Johann Dirichlet’s famous Theorem on primes in arithmetic progressions was one of the first of these great results:

Theorem:If and are relatively prime, then the sequence

contains an infinite number of primes.

The original proof of this used the structure of the following functions over the complex numbers where is a Dirichlet character:

Dirichlet had to prove that they were not zero at .

In complexity theory we do sometimes use continuous methods, but perhaps not as much as we could.

**Model Theoretic:** One of my favorite methods that has been used very successfully in mathematics is model theoretic methods. My personal favorite is the James Ax and Alexander Grothendieck Theorem:

Theorem:Let be a polynomial map from to . If is injective, then it is also surjective.

The miracle of the proof is that the theorem is obvious for any map from a finite set to itself: if it is one-to-one, then it is onto. The great thing about model theory is that essentially the same argument works for the very non-finite set .

I wonder if we could use model theoretic methods to solve some of our major problems.

** Open Problems **

Are there other methods that mathematics uses that we could try to exploit? Also any suggestions to add to the ones that I already have listed here?

I just have one thing to say Professor. Your enthusiasm for the subject is contagious!

Thanks for your wonderful blog posts. They really help a lot.

A mathematical technique much-used by geometers and engineers alike might be called “When pushing fails, try pulling.”

Two arxiv preprints that nicely illustrate push-versus-pull are Ed Witten’s recent “pushforward” preprint, titled

“A New Look At The Path Integral Of Quantum Mechanics”(arXiv:1009.6032), which nicely contrasts with the now-classic 1997 “pullback” preprint by Abhay Ashtekar and Troy Schilling, titled“Geometrical Formulation of Quantum Mechanics”(arXiv:gr-qc/9706069).In essence, Witten’s manuscript focusses on classical dynamics that he pushes forward onto quantum state-spaces, while Ashtekar and Schilling’s article focusses quantum dynamics that (in essence) they pull back onto classical state-spaces.

For engineers whose central concern is the design and validation of practical dynamical simulations—both classical and quantum—it is helpful to be familiar both with Witten-style pushforward methods and Ashtekar-Schilling-style pullback methods … and indeed, from a mathematical and practical engineering point-of-view, these two once-distinct dynamical schools are effectively merging nowadays.

One wonders whether this emerging push-pull unification might eventually extend to even to Morse-style

symbolicdynamics … and that is why I have long been hoping for a post by the weblog cabal of Lipton/ Tao/ Gasarch/ Fortnow/ Gowers/ Kalai (etc.) on this topic! 🙂This comment may be out of place, but I’ve always though that mathematical problem solving and computer programming share a lot in common. Both disciplines work on various Formal Systems, where they wield symbolic elements in order to build up larger works. In mathematics, a proof is validated by its acceptance in the community, while in programming it is a computer that says if the syntax is runnable. Proofs, even if they are correct can be as convoluted as any spaghetti code. And bugs in programming map back to either failed interpretations of the mathematics or intrinsic paradoxes. Both disciplines use varying degrees of encapsulation.

In that sense mathematicians can (and do) use the same sorts of tools that programmers use. Some are constructive, while some are exploratory. For example, symbolic and numerical computation engines, or automated proving languages are cousins to programmers writing prototypes or experimental systems in order to work out the different behaviors of underlying technologies or libraries. Reverse engineering probably has some analogous counter-part too. In both cases, we’re attempting to extend the underlying Formal System based on its primitives and existing works. In that process we need to clarify our internal models just enough to push the limits of what is already there.

(In my defense, it is Friday and there we no comments 🙂 )

Paul.

More than most mathematicians think, if you trust Doron Zeilberger.

Would anyone deny that he is a “true mathematician”? 😉

I believe any language recognized by a linear time (single tape) Turing machine is regular. Here’s a proof sketch: http://naml.us/~irving/misc/theory.xml#turing. This seems to contradict your statement about padding any language to get a linear time computable language, so I’m curious if this implies an error in my proof. Thanks!

For single-tape Turing machines you can make an even slightly stronger statement, which is that any language decidable in time small-o(n log n) must be regular. However, this does not carry over to multi-tape TMs.

I think you’re right for a single tape – and of course padding will not change anything in that case, the language will stay regular. For more tapes it gets more interesting.

In fact, it seems that by adding more tapes, the power of the model keeps increasing. So you could take the union of k-tape linear-time machines for all k.

I took a course with John Rhodes is 1984 where he promised to use semi-group theory to solve P vs NP by the end of the semester. He didn’t. I don’t even remember whether groups are Christian or Communist, never mind the statement of the Krohn-Rhodes theorem. He did teach us something completely memorable: “The problem with middle age is that you’re attracted to your wife’s daughters”. Woody Allen should have taken his class.

But the class wasn’t a complete waste for me: I met Roman Smolensky, and was instrumental in getting him to do theory. I told Steven Rudich that there was a really smart guy in the class who had solved one of Rhodes’ open problems over the weekend. Steven asked for a description, but all I could muster was that he had a beard and an accent. Steven told me a few days later that he’d met the guy I was talking about. I didn’t think I had given enough information to identify him, but Steven said, “Well, you said he was REALLY smart, so it has to be the same guy. I got him to start thinking about (what we used to call randomness extractors…)”. Roman was officially Rhodes’ student, mainly because once he found someone willing to be his adviser, Roman didn’t bother trying to look for another.

Thanks for the excuse to reminisce, and my apologies for reminiscing in public.

Russell

Thanks for sharing.

You needn’t apologize for such stories. Some of us (students especially) love this kind of stuff. Thanks for sharing.

Classification is probably a misnomer for what you really mean since it is hard to imagine that complexity classes are not set up precisely for classification! What you seem to want are structure theorems. Structure theorems are a more significant project in areas of complexity than you seem to account. Beyond Schaefer there is the work of Bulatov from FOCS 2002 and following his work there has been substantial progress towards strong dichotomy theorems for all CSPs. (This might be a nice subject for a future post.)

The Krohn-Rhodes Theorem is a structure theorem much more like the classification of finite Abelian groups than the classification of simple groups. About 20 years ago, work of Barrington, Straubing, and Therien introduced many of the aspects of this classification to the study of branching programs and low level circuit complexity.

Paul,

Your terminology is probably better. Yes the more structure we know about our main objects the better. Thanks for the comment.

If Paul Beame and/or Dick Lipton and/or Ken Regan—or any other expert—were to write a few remarks upon the general topic of complexity classes (which apparently can be ramified without limit?) in relation to what Paul calls structure theorems (which apparently are ever-… uhhh … deepening?) then I for one would be very interested and grateful!

John,

will do

dick

It seems to me that the Robertson-Seymour program is a good example of classification that has impact on algorithms directly. Especially when coupled with results like Courcelle’s theorem, and Grohe’s results.

Also, the use of the continuous is definitely prevalent in theoryCS. Witness Mulmuley’s proof that P != weak NC, which essentially describes the space of computations for max flow geometrically. Another example is the algebraic decision tree lower bounds that use topological descriptors of the space of possible solutions to show lower bounds.

p.s while this is not about mathematical methods per se, there’s an interesting list of mathematical IDEAS that have been used somewhat surprisingly in theoryCS: http://cstheory.stackexchange.com/questions/1920/examples-of-unrelated-mathematics-playing-a-fundamental-role-in-tcs

Thank you very much for this post. I think it helped me better understand how CS theory feels different from other kinds of mathematics that I’ve studied. Also, forgive me if I’m speaking out of ignorance here, but I had two thoughts.

In most mathematical “classification theories” it seems like there are a sequence of theorems leading up to the final classification to perform two services. First, the objects of study should be reduced to some kind of normal form. For instance, when classifying compact two-manifolds, the first step is usually to show that it suffices to consider piecewise linear manifolds. Second, one would like to show some kind of algebraic structure on the space of all objects being classified. For instance, direct sums or products of groups, connected sum of topological spaces, etc. I don’t know if this is the same idea as what Paul meant by structure theorems. However, it seems like an interesting idea to try and ask a question like “what are the prime problems?” Perhaps problems that are complete for complexity classes (e.g. SAT) play this role.

Second thought. Would you include the Blum-Shub-Smale model as an example of replacing your objects with continuous ones or not? That was the first thing that popped into my mind.

I liked Gilbert Bernstein’s post very much—and I will mention also, that his blog

Rotation Notationlooks very interesting—and so I will try to ask some similar questions in a similar vein, with a view toward better understanding (perhaps) of one of Dick Lipton’s favorite questions: “Why are practical instantiations of NP-hard problems so commonly easy to solve?”We have the intuition that problems in the class NP-hard commonly are generated by large-dimension dynamical processes; for example, by movie preferences evolving in a large cognitive space; protein configurations evolving in a large configuration space; quantum states evolving in a large Hilbert space; optimization problems evolving in a large constraint space.

Very often these dynamical processes are noisy (or commonly, we realistically can

pretendthat the dynamics is noisy). The noisy generation dynamics then imposes structure on the generated problem, such that the generated problem is (unexpectedly yet generically) easy to solve.For constraint systems, quantum systems, and biological systems we have at least the beginning of a satisfactory mathematical understanding of how the generating dynamics generically generates easy problems (respectively, by smoothing, by Lindbladian compression, and by evolutionary selection).

Is there some over-arching level of complexity-theoretic abstraction that captures this insight, that we might apply even to the discrete problems that arise in (say) graph theory?

There is a 1938 article by Marston Morse and Gustav Hedlund, titled

Symbolic Dynamics, that apparently sets off in this direction, in which many of the themes of the Krohn-Rhodes Theorem and the Schaefer Dichotomy Theorem appear.One motivation to ask this question is the (vague) idea is that someday folks will bring order to the Complexity Zoo, by classifying algorithms in terms of the generating dynamics associated to the problems that they solve.

In practice, this structural approach to classification already is commonly used. After all, scarcely any algorithms in an engineer’s armamentarium are effective against worst-case instances. Fortunately, in practice we understand enough about the generating dynamics associated to our problems, to be confident that worst-case instances will not arise. E.g., we know that our constraints are imprecise, our quantum dynamics are noisy, and our proteins are selected to fold efficiently in a thermal environment … all of which makes our jobs (as simulationist engineers anyway) immensely easier.

It would be great to formalize this notion of structural classification … and in particular, to translate as many of these dynamical insights as feasible, to the domain of discrete state-spaces (including graph theory especially).

This I take to be a main theme of the Krohn-Rhodes and Schaefer Dichotomy Theorems. Upon which, I hope that someone will write more sensibly than I have! 🙂

Just curious, but any reason why Paul Erdős would say “Mathematics is not yet ready for such problems” specifically about the Collatz conjecture ?

tef,

I do not really know. My best guess is that Collatz conjecture is about the dynamic properties of a map. Math tools are weaker for dynamic actions than for static ones. THis of course is an over statement, but has some validity. Perhaps. Perhaps that is what Erdos was thinking.

Richard,

An (interesting, at least for me) example of application of ideas from model theory in TCS may be found in the development of o-minimal hybrid systems. See, e.g., the paper by Gerardo Lafferriere, George J. Pappas and Shankar Sastry at

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.8808&rep=rep1&type=pdf,

which I believe started this whole industry, and the good set of slides at

http://www.newton.ac.uk/programmes/LAA/seminars/041811001.pdf.

For the notion of o-minimal theory see http://en.wikipedia.org/wiki/O-minimal_theory. The book

van den Dries, Lou (1998). Tame Topology and o-minimal Structures. Cambridge University Press

is a truly excellent introduction to the theory of o-minimality. See

for a review of that book.

Hybrid systems consist of finite state machines interacting with differential equations. As shown in the paper I link above, o-minimal hybrid systems always admit finite bisimulations, leading to decidability results for fundamental verification problems.

Perhaps this is of interest to some of your readers and you.

Luca, at least one reader found your references to be of great interest, for which this comment is an appreciation and thanks.

The quantum spin microscopes in our QSE Lab comprise, in essence, what your references call “hybrid systems”, in which Nature provides the continuous classical/quantum dynamics, which we engineers then hybridize with finite-state machines that observe and control the dynamics.

Partly for fun, but also as a serious validation tool, we routinely interface our spin microscopes to voice synthesizers; thus during experiments a real-time audio voice-0ver describes the evolving state of what your references call a “bisimulation.” For this reason, the concept of real-world quantum technologies having a natural “bisimulation” structure is very real to us.

Bisimulation structures arise with similar naturality in our numerical quantum simulations, in which it is both feasible and computationally efficient to unravel noise processes as observation and control processes; thus orthodox Lindbladian quantum dynamics associates naturally and generally to a bisimulation structure … this definitely was a new idea (to me).

As we engineers become ever-more-concerned to formally validate and verify what both our practical experiments and our numerical simulations demonstrate to us—in particular that noise in classical/quantum dynamical systems induces structures that generically are feasible to simulate and control—the structure-inducing constraints that are associated to o-minimal hybrid systems appear (on first reading) to capture a pretty large subset of the notions of mathematical structure and instantiation that arise naturally in practical quantum systems engineering.

Therefore, we surely will take a look at your book

Reactive Systems: Modelling, Specification and Verification… and please accept our thanks and appreciation, Luca, for your well-conceived remarks and references.Luca, I completely disagree with your assessment of the contribution of model theory to the study of hybrid systems, which I think is less than (o)-minimal. The actual decidability result is due to

G. Lafferriere, G. Pappas, S. Yovine. Symbolic reachability computation of families of linear vector fields. Journal of Symbolic Computation, 32(3):231-253, September 2001. Academic Press

and it handles a very restricted class of hybrid automata with restricted linear dynamics and resets after every transition (if I remember correctly). This is a nice result based on some linear algebra tricks and all the following o-minimal stuff is just a tautological blanket put around the thing (if trajectories and reachable sets are well-bhehaving one can compute them – but there is no clue how, given a hybrid automaton to know whether its trajectories are o-minimal). Nothing interesting (except for some papers in the same spirit) followed from that.

–Oded

Well written post, good researched and useful for me in the future.I am so happy you took the time and effort to make this. Best of luck

Over on Scott Aaronson’s fine weblog

Shtetl Optimized, Scott and Tim Gowers are posting on a topic“The converse of smoothed analysis”that (AFAICT) is dual to the structure-related points that Paul Beame and Luca Aceto have raised above.I’ve done my best to bridge the two sides of this duality, and would greatly appreciate further posts on this topic. In particular, the idea of a

“bisimulation”that Luca Aceto’s post raised, seems to me to be very powerful and general; certainly there is an already-large (and rapidly growing) literature on this subject that I had not appreciated.Helpful (to me) has been a recent article by Shun-ichi Azuma and George Pappas, titled

“Discrete Abstraction of Stochastic Nonlinear Systems: a Bisimulation Function Approach”that provides concrete examples of how these ideas work.Further remarks (from anyone) on the various roles in complexity theory of structure theorems in general, and bisimulation structures in particular, would be very welcome.

Talking about the Krohn-Rhodes theorem, I recently finished a rewriting part of my thesis (20 years ago) which gave a more comprehensible (for me at least) proof. Can be found at

Interesting. Reminds me of another problem solving method I heard in Brad Osgood’s lectures on Fourier series (available from Stanford’s website or academicearth.org).

“Make your worst enemy into your best friend.”

Essentially defining away the problem, or defining what you want to get to and seeing what gets you there.

Maybe that doesn’t make sense; it will if you watch it. Lecture 22 or so.

OMG I discovered your blog a couple of days ago and I just love it! Alongside Aaronson’s it’s my favourite blog on this here interwebs!!!

Related to Russell’s comment, one of my most vivid memories of Rhodes was during a lecture he gave reading out of Johnstone’s Topos Theory book. A young proper preppy kid, fresh out of Yale, asked him, “Professor Rhodes, how is that book?”

Rhodes response….”You ever done Acid? This book is like Acid”

I found Rhodes to be very inspiring, if not all over the map, as it were.

The reason I bring up this anecdote is not completely apropos; in the mid to late 80s Rhodes became interested in Category Theory, partially due to the fact that one of his ex-students, Bret Tilson, wrote this nice little paper called Categories as Algebra. He basically argued for treating the kernel of a semigroup homomorphism as a category.

Lipton mentioned “Model approaches”. I’d wager that the mother of all “model approaches” to proving theorems is in fact Topos Theory, which has its own internal language and (intuitinisitic) logic in which conjunctive, disjunctive, negation, existential and universal logical operands are spawned naturally by the structure of the topos in question.

Lawvere proved in one fell swoop that Godel 2nd Incompleteness, Russell’s Paradox and Cantor diagonal are all different manifestations of the same “thing” using Category/Elementary Topos theory. He refers to this as a “Fixed Point” concept.

I’m suspect there is another proof of Godel using Topos Theory and using another kind of “fixed pointedness”. but that’s neither here nor there…..