How important is it to take sides in complexity?

Barry Mazur has contributed to many areas of mathematics for many decades. His paper “Modular curves and the Eisenstein ideal” and related work furnished key ideas for Andrew Wiles’ ultimately-successful strategy on Fermat’s Last Theorem.

Today Ken and I want to discuss how well we guess.

Well not us—we guess that we two guess pretty poorly. The “we” refers to the theory community, or more generally the mathematics community.

Among many of us currently guessing how the strife in Congress to reach a deal on the US budget crisis will play out is Krystal Ball, who sometimes does analysis for Fox News. That’s right—a great name to humble experts who are really paid for educated guesses.

In 1994, Mazur won a Chauvenet Prize for Mathematical Exposition from the American Mathematical Society for his article “Number Theory as Gadfly” about the Shimura-Taniyama-Weil conjecture which is related to Fermat. His opening paragraph has these cautionary words for guessers:

“Number Theory…produces, without effort, innumerable problems which have a sweet, innocent air about them, tempting flowers; and yet…swarms with bugs, waiting to bite the tempted flower-lovers who, once bitten, are inspired to excesses of effort!”

These are words to take especially carefully from someone who even has a swindle named after him. Hence we offer a post on just how wary one should or should not be.

Two Old Examples

An old joke is that ${3,5,7,9,11,\dots}$ are all primes or prime powers. The next number, 13, confirms it, right?

The number ${12,345,678,987,654,321}$, the digits going up and then down, looks like requiring a complicated factorization. For it to be prime or have simple factors might seem too much a coincidence. But in fact:

$\displaystyle 111,111,111 \times 111,111,111 = 12,345,678,987,654,321.$

A Rash Conjecture

Barry Mazur in another neat paper points out what he calls a rash conjecture. The Bernoulli numbers are defined by:

$\displaystyle \frac{x}{e^x-1} = 1 -\frac{x}{2} + \sum_{k=1}^{\infty} (-1)^{k+1} B_{2k} \frac{x^{2k}}{2k!}.$

$\displaystyle \begin{array}{rcl} B_2/4 &=& +1/24 \\ B_4/8 &=& -1/240 \\ B_6/12 &=& +1/504 \\ B_8/16 &=& -1/480 \\ B_{10}/20 &=& +1/264 \\ \end{array}$

Named after Jakob Bernoulli, these numbers play roles in many areas of mathematics, including determining whether or not a prime is regular or not.

The obvious guess is that ${B_{2k}/2k}$ is of the form ${\pm 1/m}$ where ${m}$ is a natural number. Unfortunately,

$\displaystyle B_{12} = -691/65520,$

where ${691}$ is a prime. So this nice, but rash, conjecture blows up quickly.

Cyclotomic polynomials are defined by the equation:

$\displaystyle \Phi_n(X) = \prod_{\omega} (X-\omega)$

over all ${n^{th}}$ roots of unity ${\omega}$ that are primitive.

The first ten such polynomials and two others are listed here. A quick look shows that all the nonzero coefficients are ${\pm 1}$:

You could continue this out past ${n = 100}$ and still see coefficients of only ${\pm 1}$. Must be a theorem, right? Alas, a surprise happens at the ${105^{th}}$:

Notice the coefficient of ${-2}$ lurking out there on the ${x^{41}}$ term. The theorem lurking out there is that the coefficients are ${\pm 1}$ when ${n}$ has at most two distinct odd prime factors, while ${105 = 3\cdot 5 \cdot 7}$.

Things are even more surprising. Define ${A(n)}$ as the maximum coefficient, in absolute value, of all cyclotomic polynomials of degree at most ${n}$. Then it is known that

$\displaystyle A(n) \ge n^{\epsilon(n)}$

for any function ${\epsilon(n)}$ that tends to zero as ${n}$ grows.

A Strange Fact

The Fermat quotient is ${\frac{a^{p-1}-1}{p}}$, where ${p}$ is a prime. By his “Little” theorem this is always an integer.

A guess might be that this integer is unlikely to be divisible by ${p}$ again. Wrong. There are primes so that

$\displaystyle 2^{p-1} \equiv 1 \bmod p^2.$

These are called Wieferich primes, and were first found by Arthur Wieferich in 1909.

We have given some other examples of wrong guessing here and here and here, and in posts linked from there.

Sky’s The Limit on Guessing

Ken and I are both reading Brian Greene’s new book The Hidden Reality: Parallel Universes and the Deep Laws of the Cosmos. Early on, Greene presents as focal the question of whether the spatial extent of the cosmos is finite or infinite. Which would you guess?

Both possibilities allow for some kind of multiverse, and Greene’s book surveys at least nine—count ’em nine—separate multiverse theories. The current August 2011 issue of Scientific American has some skepticism on most if not all of them.

String-theory skeptic Peter Woit recently blogged about an interview story in the July 2011 issue of Scientific American on Leonard Susskind, whom Woit identifies as “the most prominent promoter of the string-theory multiverse.” We re-post the same long excerpt Woit takes from that interview, but as an example of field-wide “guessing”:

A large fraction of the physics community has abandoned trying to explain our world as unique, as mathematically the only possible world. Right now the multiverse is the only game in town. Not everybody is working on it, but there is no coherent, sharp argument against it.

In 1974 I had an interesting experience about how scientific consensus forms. People were working on the as-yet-untested theory of hadrons called quantum chromodynamics, or QCD. Hadrons are subatomic particles that are believed to be made up of quarks held together by the strong force, and include protons and neutrons. At a physics conference I asked, “You people, I want to know your belief about the probability that QCD is the right theory of hadrons.” I took a poll. Nobody gave it more than 5 percent. Then I asked, “What are you working on?” QCD, QCD, QCD. They were all working on QCD. The consensus was formed, but for some odd reason, people wanted to show their skeptical side. They wanted to be hard-nosed. There’s an element of the same thing around the multiverse idea. A lot of physicists don’t want to simply fess up and say, “Look, we don’t know any other alternative.”

Here is our point: In physics, it takes much theoretical legwork to justify, then design, a proposed experiment bidding for time at a major collider or lab or observatory. Thus it seems one must make guesses and take sides in order to be creative. Is this so in computer science theory? We have previously advocated working on both sides of every hard question.

Open Problems

What about complexity theory? Do we have any rash conjectures—or are we great guessers? Do we need to be better guessers?

Also what about multiverse theories? Do they imply that there is a world where ${\mathsf{P}=\mathsf{NP}}$?

Note that “${\mathsf{P}}$” in a universe with different physical laws can possibly refer to a different class of feasibly computable problems from ours, which would also change the denotation—though not the definition—of “${\mathsf{NP}}$” in terms of it. This article by Scott Aaronson is a start, along with the view here implying that minds in such a universe would evolve to perceive that “${\mathsf{P}}$” rather than ours.

August 1, 2011 3:15 am

Do we need to be better guessers? It could be a matter of life and death, as I learned when I followed your link to Scott Aaronson’s paper, where he says

“There is at least one foolproof way to solve 3SAT in polynomial time: given a formula ϕ, guess a
random assignment x, then kill yourself if x does not satisfy ϕ. Conditioned on looking at anything
at all, you will be looking at a satisfying assignment!”

August 1, 2011 5:07 am

I’d like to point out a typographic problem that surfaces in the 12345678987654321 example, related to TeX. When the number is divided in groups of three digits as above (12,345,6…), there is a thin space after each group that makes it look like a separate number in an enumeration, just as in the 3, 5, 7, 9, 11, … example in the same section.

This is because the comma is used more often in enumerations, thus it has been defined in TeX to carry a thin space after it. There is a simple way to handle the exceptions, however: Just enclose each comma within braces, i.e., write 12{,}345{,}6… in the TeX source, and there will be no space following it. This is because (in math mode) each pair of braces defines its own subformula, which (by default at least) is treated as “other symbol” instead of a separator.

(Note that there is a similar problem regarding the colon, which in TeX automatically is surrounded by spaces. This is wrong when it is used like this: “A: R×R ⟶ R”; in these cases, there should be a space after the colon, but not before. A colon with this property is provided by the \colon command sequence.)

3. August 1, 2011 6:49 am

Creative guessing is an essential starting-point not only in mathematics and science, but in every field of human enterprise, in part because guessing helps us formulate not only good theorems, but good plans. Despite the undoubted fact that guesses often are wrong, even a wrong guess can serve as the starting point of a good plan. And as the American president Dwight Eisenhower notably remarked “I have always found that plans are useless but planning is indispensable.”

For example, quantum simulationists nowadays enjoy the puzzling circumstance that large numerical quantum simulations often perform much better than we have any known reason to expect. We guess that this happy trend will continue … we foresee that understanding will be found at the intersection of complexity theory, computation theory, and quantum information theory … and we formulate our plans on this basis.

Similarly, magnetic resonance researchers (a.k.a. quantum spectroscopists and microscopists) nowadays enjoy the puzzling circumstance that the Shannon channel capacities of our devices have doubled on the average every year for the past 65 years (and STEM employment in magnetic resonance has doubled every seven years in this same period). We guess that this happy trend will continue … we foresee that understanding will be found at the intersection of complexity theory, computation theory, and quantum information theory … and we formulate our plans on this basis.

That is why quantum systems engineers nowadays read with intense interest the literature of complexity theory, computation theory, quantum information theory and even string theory. These disciplines provide the necessary mathematical tools, for formulating the vital hopeful guesses, that are the essential planning elements, for every hopeful future that we engineers can envision.

For this reason, it seems (to me) that critics of string theory and/or complexity theory too often overlook the immense present-day value of both theories as fertile sources of good guesses — guesses that already are immensely valuable in practical engineering.

4. August 1, 2011 11:14 am

Open Problems
“Also what about multiverse theories? Do they imply that there is a world where ?”

More generally, will the theory of everything (i.e., one that successfully unifies gravitation and quantum mechanics), whether it be multi-verse B.S., or String theory, or whatever, receive inspiration from theoretical computer science?

Mission Impossible for String Theorists

P.S. Multiverse theories are not physics. They are bad marketing. Only stuff that has been tested in the lab is physics. Don’t award a prize to a contestant for a job he hasn’t done yet, just because, in the contestant’s opinion, he shows great promise.

• August 3, 2011 5:53 pm

“A team of cosmologists based at University College London (UCL), Imperial College London and the Perimeter Institute for Theoretical Physics” claim to be working on an observational test for multiverse bubble collision points. (?)

http://scienceblog.com/46860/first-observational-test-of-the-%E2%80%98multiverse%E2%80%99/

• August 3, 2011 6:33 pm

LOL. Thanks for pointing that beauty out. Wonderful agreement between the data and the theory. It’s a sure thing. Let’s give them a Nobel prize right now.

August 1, 2011 12:12 pm

I was thinking about a strategy to approach the P=?NP question or hard conjectures like that:

Researchers should look for the simplest lemma L they can think of that proves either result (i.e. P=NP or P/=NP) and then create a conditional proof like “L implies P=NP”, for example. The proof of the lemma, if simple enough, would then be the next pursuit (a “game change”).

Of course, this involves choosing (guessing) one side. But their guess would be based on what they can prove, so the guess would not be that “esoteric”.

I don’t know of results like this for the P=?NP problem (I’m not a theorist), but similar conditional theorems prepared the path for Wiles’ proof of Fermat’s last theorem.

August 7, 2011 4:00 pm

Perhaps we don’t even need to choose either side. If it happened to be undecidable, the first thing to do would be to find an enlargement of ZFC (Zermelo-Fraenkel’s set theory) that made it possible to build a model where P=NP holds, and another one where P!=NP holds. After all, ZFC allowed Gödel to build a model of the continuum hypothesis and Cohen to build a model of its negation.

For example, let’s say that two algorithms are “mathematically equivalent” if they’re “based on the same mathematical idea”. This may sound very imprecise but it makes sense in practice because most efficient algorithms seem to take advantage of some mathematical result. Once defined precisely, the existence of this equivalence relation could be taken as a new axiom for ZFC. Two equivalent algorithms would have roughly the same complexity and we could try to classify the equivalence classes. Then it would make sense to claim that a problem is NP-complete if and only if all its solutions lie in a single class.

This could be true, false, undecidable, but something. Until now we’ve been suffering from the strength of the theory of algorithms that seems to be in competition with that of proof theory. We badly need more axioms…

August 8, 2011 6:48 pm

If PvNP is independent of ZFC, although it’s true you could build a model of ZFC where P=NP and another where P !=NP, one of those models would contain non-standard natural numbers. In other words, it would contain natural numbers X such that you could never get to X by repeatedly adding 1 when starting at 0. These models are called “unsound” and are highly undesirable as a basis for mathematics (as you might guess!).

PvNP is a statement about natural numbers: For every Turing machine T (using a natural number to encode the Turing machine) and exponent e (natural number), there exists a SAT instance S (using a natural number to encode S) that T does not correctly solve in time |S|^e + e. Every statement about natural numbers–not involving sets of natural numbers–has a definite truth value in the standard model of the natural numbers. It’s just a question how strong your axioms have to be to get at that truth. You can think of strengthening axioms as disallowing more non-standard models, and therefore assigning truth values to more statements.

Whereas the natural numbers have a “standard model” (namely, where the natural numbers are exactly the set you get by repeatedly adding 1 when starting at 0), sets do not. Statements like the Continuum Hypothesis, which involve not just natural numbers but also general sets, can have ambiguous truth values because we don’t have a solid conceptual footing of a model for all sets. (Which is related to the discrepancy of there being uncountably many sets, but only countably many sets that can ever be described!)

Finally, also note that we have no shortage of additional axioms going beyond ZFC (which, it’s worth noting, is already an extremely strong theory for questions about the natural numbers). There are many large cardinal axioms that set theorists and logicians have worked on, and they do affect what statements can be proved about the natural numbers. Although I’m not as educated on this topic as I’d like to be, my understanding is these axioms give tremendous increases in proof theoretic strength.

August 9, 2011 7:07 pm

Thanks a lot Luke for this clear and neat explanation.

I agree with you that it’s cool to have a standard model of natural numbers to start with, together with a recursive restatement of our problem in terms of numbers. However I wouldn’t go so far as calling non-standard models “unsound”. First because – as you’ve suggested – they could be someday the key to the PvNP problem, and second because non-standard analysis is a nice topic. I like the idea of “monster elements” that aren’t documented by the axioms. Knuth entitled his novel “Surreal Numbers”, not “Unsound Numbers”! I find it fascinating also that surreal numbers were invented by Conway, the creator of the Game of Life – a rather “non-standard” computing model…

When I spoke of adding more axioms I was thinking of a relative consistency proof but I’m no specialist at all. My initial idea was to classify algorithms according to how they solved the problems instead of computing in how much time they did so. In other words we should try to invent a more subtle invariant than speed because algorithms of same speed can use utterly different methods. For this reason I was dreaming of axioms more qualitative than those for the existence of large cardinals, but who knows?

August 1, 2011 10:47 pm

I think the lower bound on A(n) is quite interesting, especially since there is no similar lower bound for the slightly different quantity B(n), which I define as the absolute maximum coefficient over cyclotomic polynomials of degree exactly n.

• August 1, 2011 10:51 pm

It is tricky that the definition of A(n) said “degree at most n”, making it a non-decreasing step function. Comparing with your B(n) shows that B(n) is a “sawtooth” function with some really long teeth. I wonder how much significance there is to intermediate values of B(n),

August 1, 2011 10:51 pm

In fact, since B(p) = 1 for all primes p, there is only a trivial lower bound on that function.

[KWR: Comments were edited with Mr. Flammia’s permission.]

7. August 2, 2011 6:48 am

What about multiverse theories? Do they imply that there is a world where P=NP?

It is interesting to consider the converse, phrased as follows:

What about non-multiverse theories? Do they imply that there is a world where BQP=P?

Here the answer is “yes” by the following construction.

“‘Non-multiverse theories’ are concretely specified as quantum-dynamical flows that do not globally satisfy a superposition principle, but rather are Hamiltonian flows on Kahlerian manifolds. In these non-multiverse worlds, the first and second laws are respected and Lindbladian measurement processes are well-defined, but the state-space does not accommodate the global linear superpositions of Hilbert-space dynamics. Physically speaking, small systems are dynamically quantum, big systems are dynamically classical, and the geometry of the quantum state-space is (in principle) a physical observable … albeit a challenging one.”

At first sight, the non-multiverse world seems strange, but actually it is a familiar one. Here is how the transition from special relativistic dynamics to general relativistic dynamics is described as a parallel narrative:

“‘General relativistic theories’ are concretely specified as dynamical flows that do not globally satisfy a relativity principle, but rather are Hamiltonian flows on Riemannian manifolds. In these general relativistic worlds, the first and second laws are respected and causal measurement processes are well-defined, but the state-space does not accommodate the global linear translations and boosts of Newtonian dynamics. Physically speaking, small systems are dynamically relativistic, big systems are dynamically non-relativistic, and the geometry of the Riemannian state-space is (in principle) a physical observable … albeit a challenging one.”

Drawing further inspiration from Ken’s question, we imagine a world in which 21st century experiments showed that the predictions of general relativity were just plain wrong. Even if this were the case, general relativity still would have immense value, as the exemplar of a mathematically and computationally natural dynamical theory. Thus we might abandon general relativity, but we would be very reluctant to abandon the mathematical naturality and computational efficiency associated to it.

In summary, for me the broad answer to Ken’s question is that the STEM community presently is in the midst of recapitulating in 20-21st century quantum dynamics and information theory very much the same thrilling evolution — even unto the Bœotian outcries of various academic factions — as the 19-20th century evolution of classical dynamics and thermodynamics. If so, this is good news for everyone … especially young researchers. 🙂

August 2, 2011 7:56 am

The August 1 post by John Sidles reminds me of this curious passage that appears in Robin Wilson’s book “Four Colors Suffice”:

“At this stage the computer started to think for itself, as Haken and Appel later recalled:

‘Thus it began to teach us things about how to proceed that we never expected. In a sense it had surpassed its creators in some aspects of the ‘intellectual’ as well as the mechanical parts of the task.’ “

A clue as to why the computer was being so helpful comes a few pages later when Wilson reports on the contributions made by a grad student who did a lot of the programming:

“Koch devised an elegant and highly efficient method for testing ….”

Moral of the story: Logic trumps luck.

August 2, 2011 9:04 am

Postscript:

Some guesses are better than others.

“Educated guessing” may in fact be an integral part of a logic that Charles Sanders Pierce called “Retroductive” or “Abductive” logic.

Symbolic Logic may be a few logics short of a full deck.

August 2, 2011 10:15 am

Classic Example:
Let pi(x) be the number of primes \le x
Let li(x) be integral from x to 1 of dt/ln t

Numerical evidence indicated pi(x) < li(x).
This was a good guess at the time.

Skewes showed that there is an x li(x).
(At one time this was in the Guiness book of world records as the largest
number ever used in a math proof. Now its the bounds on Graham’s number.)

Later it was shown that pi(x)-li(x) changes signs infinitly often.

August 2, 2011 5:59 pm

A Classic Example:
Let pi(x) be the number of primes \le x
Let li(x) be the integral from x to 1 of dt/ln t
By Prime Number theorem pi(x) is ROUGHLY li(x).
How do they compare?
Numerical evidence indicated that, for all x, pi(x) li(x).
At one time the Guiness book of world records listed 10^10^10^34 as the largest
number to appear in a math proof- since exceeded by Graham’s number.

It was later shown that | pi(x) – li(x) | changes signs infinitly often.

The bound for the first time pi(x) > li(x) has come down some but is
still quite large.

August 4, 2011 8:15 am

There’s a common property of NP-complete problems and programming languages that puzzles me:

A) NP-complete problems can simulate any other NP problem in polynomial time.
B) Turing-complete languages can simulate any other programming language in polynomial time.

What does this similarity really mean? Where does it come from? Does it imply that problems and algorithms share more properties than they seem?

August 4, 2011 9:56 am

For Turing-complete languages, there’s no polynomial time bound. They just have to halt in some finite time if the simulated one halts. My intuition is that complexity theory starts by adding resource bounds (polynomial in the case of P vs. NP stuff like (A)) to the definitions of general computability theory (like (B)).

August 4, 2011 11:04 am

OK, my mistake. Thank you Ørjan.

The striking point is the existence of these bounds in nature, for they don’t look artificial at all. Cook discovered the equivalence of NP-complete problems even before he defined them.

And Church’s thesis doesn’t involve a precise definition of computing, it’s just the remark that all models of computation are mutually equivalent, without even providing a precise definition of what a model of computation is.

My intuition is that the notions of problem and of language are isomorphic in some sense, but I don’t know yet where this leads…

August 5, 2011 9:34 am

Let’s put it in another way. All that follows is conjectural.

When a problem admits many different wordings up to a suitable polynomial reduction, it must admit only one algorithm up to some isomorphism that preserves complexities. When a problem admits a single wording up to polynomial reduction, then it has many different solutions.

This would explain why the problem of sorting a list, which seems to admit only one reasonable formulation, possesses so many different solutions… and why the NP-complete problems, being all rewordings of each other, possess only one reasonable solution, namely the naïve algorithm that tries every possibility.

When there are many way of expressing a problem there’s only one way to solve it, and the other way around. I wonder if it’s possible to draw anything from this idea…

13. August 4, 2011 2:54 pm

Here is a post about conjectures in mathematics (which is related to the guessing issue) by Shmuel Weinberger. http://gilkalai.wordpress.com/2008/11/19/about-conjectures-shmuel-weinberger/

August 5, 2011 6:47 pm

I think your last remark might give amateurs reading the blog the wrong impression that P is the class of efficiently solvable problems on a physical computer, and NP is the class of efficiently verifiable ones. A lot of misunderstanding about P vs. NP in popular culture stems from this kind of belief. Turing machines are the same in any universe..but they might be a weak or a strong model compared with physical systems.

• August 5, 2011 7:16 pm

Your comment intersects a subtle point I hinted at with my link to the Lakoff-Nuŉez “embodied mind theory” book. Turing machines T are the same in any universe…and so are other models T’ whose polynomial-time classes are different. If T’ is closer to physical systems in another universe, then brains may evolve to view T’ as saliently correct for poly-time, with our Turing machines regarded as a curiosity.

Mind you, we’re not even sure that T is correct for our physical world—that’s the part of the “poly-time Church-Turing thesis” that is most often challenged. But they certainly have the most equivalent models for poly-time, and so are unquestionably salient.

I do believe that “computable” is the same in every possible universe. While I was at Oxford in 1985, David Deutsch really did initially think that quantum mechanics dictated otherwise for our world. It was actually a nontrivial task with Robin Gandy and Roger Penrose and others huddled after seminar talks at the Mathematical Institute to close off his possibility of accepting non-r.e. sets. This belief seems to have off-and-on status in Max Tegmark’s theories.

August 5, 2011 9:37 pm

Thanks for the nice answer. I realize that physical laws strongly influence the relevance of computation models to the practice of computing. I just thought I should point out the detail for the benefit of “outsiders” reading the blog.

BTW reading Woit’s criticism of Susskind, I am thinking it’s remarkable to what heights some theoretical physicists have taken the “if we can’t prove it or even explain it, we’ll assume it”. I think Aaronson said somewhere that if P vs NP was a theoretical physics question, it would’ve long ago been settled and declared the “the law of the impossibility of efficient general search” or something catchier. I am glad that for the most part we’re sticking with out model (Turing machines) and not changing it to make a version of P vs NP easier to resolve.

August 5, 2011 6:48 pm

Of course the multiverses question reminds of me an Impagliazzo talk, with his (great!) transparencies with hand-drawing of alien worlds.

Since you started off the blog post with mention of Mazur’s theorem about the Eistenstein Ideal, I should mention that this paper was also the culminatin of a good guess. I was Mazur’s grad student at the time that he wrote it so I was “present at the creation”. Reichert and Billing and Mahler (around 1940) gave a specific procedure (quite algorithmic) which, given a positive integer $N$, would construct a curve whose solutions over $\mathbb{Q}$ parametrized elliptic curves over $\mathbb{Q}$ which had a point of order $N$. Billing and Mahler proved that $N=11$ was impossible. This curve is known in the modular forms community as $X_1(N)$. In the 1960’s, Andy Ogg ( http://math.berkeley.edu/index.php?module=mathfacultyman&MATHFACULTY_MAN_op=sView&MATHFACULTY_id=133 ) noticed that all the $N$ for which there actually were points of order $N$ were exactly those for which the curve had genus 0. The ones (subsequent to Billing and Mahler, there were a handful of other cases) for which there was no point of order $N$ had genus $> 0$. Mazur took that as a goal, and by some rather deep insight, was able to dispose of *all* cases of genus $> 0$ that were big enough. All of the others (such as $N=13$ which was particularly difficult, and the subject of a separate paper by Mazur and John Tate) were taken care of one by one.