A topical look at Norbert Blum’s paper and wider thoughts.


Thales of Miletus may—or may not—have accurately predicted one or more total solar eclipses in the years 585 through 581 BCE.

Today we discuss the nature of science viewed from mathematics and computing. A serious claim of ${\mathsf{NP \neq P}}$ by Norbert Blum has shot in front of what we were planning to say about next Monday’s total solar eclipse in the US. Update 9/2/17: Blum has retracted his claim—see update at end.

Predicting eclipses is often hailed as an awakening of scientific method, one using mathematics both to infer solar and lunar cycles and for geometrical analysis. The aspects of science that we want to talk about are not “The Scientific Method” as commonly expounded in step-by-step fashion but rather the nature of scientific knowledge and human pursuits of it. We start with an observation drawn from a recent article in the Washington Post.

Despite several thousand years of experience predicting eclipses and our possession of GPS devices able to determine locations to an accuracy of several feet, we still cannot predict the zone of totality any closer than a mile.

The reason is not any fault on Earth but with the Sun: it billows chaotically, and for all we know a swell may nip its surface yea-far above the lunar disk at any time. Keeping open even a sliver of the nuclear furnace changes the character of the experience.

The Post’s article does a public service of telling people living on the edge of the swath not to think it is a sharp OFF/ON like a Boolean circuit gate. People must not always expect total sharpness from science. Happily there is a second point: you don’t have to drive very far to get a generous dose of totality. This is simply because as you move from the edge of a circle toward the center, the left-to-right breadth of the interior grows initially very quickly. This is our metaphor for how science becomes thick and solid quickly after we transit the time of being on its edge.
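The geometry behind this can be made concrete: at depth ${x}$ in from the edge of a circle of radius ${R}$, the left-to-right breadth is the chord length ${2\sqrt{x(2R-x)}}$, which grows like ${2\sqrt{2Rx}}$ for small ${x}$. A quick sketch (the 35-mile radius is our illustrative stand-in for the swath's half-width, not a figure from the Post):

```python
import math

# Half-width of the totality swath, treated as a circle's radius, in miles.
# The 35-mile figure is our illustrative assumption, not from the article.
R = 35.0

def breadth(x):
    """Left-to-right breadth of the circle's interior at depth x in from the edge."""
    return 2.0 * math.sqrt(x * (2.0 * R - x))

for x in (1.0, 5.0, 15.0, 35.0):
    print(f"{x:4.0f} miles in from the edge: breadth = {breadth(x):5.1f} miles")
```

Already one mile in from the edge, the breadth is about 16.6 miles out of the maximum 70, nearly a quarter of the way there.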

Incidentally, your GLL folks will be in the state of New York next week, nowhere near the swath. Next time in Buffalo. Also incidentally, Thales is said to be the first person credited with discovering mathematical theorems, namely that a triangle made by a circle’s diameter and another point on the circle is a right triangle and that lengths of certain intersecting line segments are proportional.

## Norbert Blum’s Paper

The transit time is our focus on this blog: the experience of doing research amid inspirations and traps and tricks and gleams and uncertainty. Swaths of our community are experiencing another transit right now.

Norbert Blum has claimed a proof that P is not equal to NP. His pedigree includes holding, for over 30 years, the record for a concrete general Boolean circuit lower bound over the full binary basis, until his ${3n}$ was recently nudged up to ${3.0116n}$. His paper passes many filters of seriousness, including his saying how his proof surmounts known barriers. Ken and I want to know what we all want to know: is the proof correct?

More generally, even if the proof is flawed, does it contain new ideas that may be useful in the future? Blum’s proof claims a very strong lower bound of ${2^{\Omega(N^{1/6})}}$ on the circuit complexity of whether a graph of ${N}$ edges has a clique of size ${N^{1/3}}$. He gets a lower bound of ${2^{\tilde{\Omega}(N^{1/4})} }$ for another function, where the tilde means up to factors of ${\log N}$ in the exponent. We would be excited if he had even proved that this function has a super-linear Boolean complexity.

Blum’s insight is that the approximation methods used in monotone complexity on the clique function can be generalized to non-monotone complexity. It is launched by technical improvements to these methods in a 1999 paper by Christer Berg and Staffan Ulfberg. This is a very high-level view of what he tries to do, and it is the one thing that we wish to comment on.

## The Structure Of The Proof

On a quick look at the 38-page argument, an issue arose in our minds, and we thought we would share it. It is not a flaw, but it is an issue that we think needs to be thought about more expressly.

As we understand it, his proof takes a Boolean circuit for some monotone function ${f}$ and places its gates in some topological order. Let this be

$\displaystyle g_{1},g_{2},\dots,g_{m}.$

So far nothing unreasonable. Note ${g_{m}}$ is equal to ${f}$, of course. Then it seems that he uses an induction on the steps of the computation. Let ${I_{k}}$ be the information that he gathers from the first ${k}$ steps. Technically ${I_{k}}$ tells us something about the computation so far. The punch line is then that ${I_{m}}$ tells us something impossible about ${g_{m}}$ which is of course ${f}$. Wonderful. This implies the claimed lower bound on ${f}$ which solves the ${\mathsf{NP \neq P}}$ question.
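To fix ideas, here is a minimal Python sketch (the gate encoding is our own, not Blum's) of evaluating a circuit in topological order; an argument of the kind described would attach some invariant ${I_k}$ to each prefix ${g_1,\dots,g_k}$ of this loop:

```python
def evaluate(gates, x):
    """Evaluate a circuit given as gates g_1, ..., g_m in topological order.
    Each gate is ('IN', i), ('AND', a, b), or ('OR', a, b), where a and b
    index earlier gates.  Returns the list of all gate values; the last
    entry is f(x)."""
    vals = []
    for g in gates:
        if g[0] == 'IN':
            vals.append(x[g[1]])
        elif g[0] == 'AND':
            vals.append(vals[g[1]] & vals[g[2]])
        else:  # 'OR'
            vals.append(vals[g[1]] | vals[g[2]])
    return vals

# f(x) = (x0 AND x1) OR x2
gates = [('IN', 0), ('IN', 1), ('IN', 2), ('AND', 0, 1), ('OR', 3, 2)]
assert evaluate(gates, (1, 1, 0))[-1] == 1
assert evaluate(gates, (0, 1, 0))[-1] == 0
```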

## The Issue

The trouble with this is the following, which we studied before under the name of the “bait and switch” problem. Let ${r}$ be some random function of polynomial Boolean complexity and let ${s=r \oplus f}$. Now assume that there is a polynomial-size circuit for ${f}$. Clearly there is one for ${r}$, and hence for ${s}$ too. Create a circuit that mixes the computing of ${r}$ and ${s}$ in some random order, and let the last step of the circuit take ${r}$ and ${s}$ and form ${r \oplus s}$. Note this computes ${f}$.

The key point is this:

No step of the computation along the way has anything obvious to do with ${f}$. Only at the very last step does ${f}$ appear.

This means intuitively to us that an inductive argument that tries to compute information gate by gate is in trouble. How can the ${I_{k}}$‘s that the proof computes have any information about ${f}$ during the induction? This is not a “flaw,” but it does seem to be a serious issue.
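The bait-and-switch can be seen numerically. A small sketch, with our own stand-in choices: ${f}$ is majority on 8 bits, and functions are represented by truth tables rather than circuits. Each of ${r}$ and ${s}$ individually agrees with ${f}$ on only roughly half the inputs, yet the single "last gate" ${r \oplus s}$ recovers ${f}$ exactly:

```python
import random

random.seed(2017)
n = 8
size = 1 << n  # functions on n bits are truth tables of length 256

# f: an arbitrary stand-in target function (majority on 8 bits).
f = [1 if bin(x).count("1") > n // 2 else 0 for x in range(size)]

# r: a random function; s := r XOR f, so s also looks random.
r = [random.randint(0, 1) for _ in range(size)]
s = [ri ^ fi for ri, fi in zip(r, f)]

# Neither r nor s individually resembles f...
agree_r = sum(ri == fi for ri, fi in zip(r, f)) / size
agree_s = sum(si == fi for si, fi in zip(s, f)) / size

# ...yet the last gate r XOR s computes f exactly.
assert [ri ^ si for ri, si in zip(r, s)] == f
print(f"r agrees with f on {agree_r:.0%}, s on {agree_s:.0%} of inputs")
```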

If nothing else, we need to understand how the information suddenly unravels at the end and reveals information about ${f}$. This issue is troubling, at least to us. It is important to note that this trick cannot seem to be applied to purely monotone computations, since the last step must be non-monotone—it must compute the ${\oplus}$ function. The old post also notes a relation between the standard circuit complexity ${C_{st}(f)}$ and the monotone complexity ${C_m(L_f)}$ of a related function ${L_f}$.

## The Outer Structure of the Proof

While we are grappling with the paper and writing these thoughts we are following an ongoing discussion on StackExchange and in comments to a post by Luca Trevisan, a post by John Baez, and a Hacker News thread, among several other places.

The paper has a relatively short “crunch” in its sections 5 and 6, pages 25–35. These follow a long section 4 describing and honing Berg and Ulfberg’s work. What the latter did was show that a kind of circuit approximation obtained via small DNF formulas in Alexander Razborov’s famous lower-bound papers (see also these notes by Tim Gowers) can also be obtained with small CNF formulas. What strikes us is that Blum’s main theorem is literally a meta-theorem referencing this process:

Theorem 6: Let ${f \in B_n}$ be any monotone Boolean function. Assume that there is a CNF-DNF-approximator ${A}$ which can be used to prove a lower bound for ${C_m(f)}$. Then ${A}$ can also be used to prove the same lower bound for ${C_{st}(f)}$.
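The CNF/DNF switch at the heart of Berg and Ulfberg's construction is, at bottom, distributivity. A toy sketch (monotone formulas over variable names, encoding ours) that switches a DNF into an equivalent CNF and verifies the equivalence by brute force:

```python
from itertools import product

def dnf_to_cnf(monomials):
    """Switch a monotone DNF (a list of monomials, each a tuple of variable
    names) into an equivalent CNF by distributivity: every clause picks one
    variable from each monomial."""
    return sorted({tuple(sorted(set(choice))) for choice in product(*monomials)})

def eval_dnf(monomials, asg):
    return any(all(asg[v] for v in m) for m in monomials)

def eval_cnf(clauses, asg):
    return all(any(asg[v] for v in c) for c in clauses)

dnf = [("a", "b"), ("c", "d")]
cnf = dnf_to_cnf(dnf)  # [('a','c'), ('a','d'), ('b','c'), ('b','d')]
variables = sorted({v for m in dnf for v in m})
for bits in product([0, 1], repeat=len(variables)):
    asg = dict(zip(variables, bits))
    assert eval_dnf(dnf, asg) == eval_cnf(cnf, asg)
# In the approximation method, clauses with more than k variables would now
# be dropped; those deletions are the source of the counted "errors".
```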

The nub being discussed now is whether this theorem is “self-defeating” by its own generality. There may be cases of ${f}$ that meet the hypotheses but have polynomial ${C_{st}(f)}$. The StackExchange thread is discussing this for functions ${f_k}$ of Boolean strings denoting ${m}$-node ${N}$-edge graphs that give value ${1}$ whenever the graph is a ${k}$-clique (with no other edges) and ${0}$ when it is a complete ${(k-1)}$-partite graph. Such functions, related to the theta function of László Lovász (see also “Theorem 1” in this post for context), have polynomial complexity, meet the conditions of Razborov’s method, and do not appear to obstruct Berg and Ulfberg’s construction as used by Blum. But if they go through there, and if Blum’s further constructions using an inductively defined function ${R}$ would go through transparently, then there must be an error.

Update 8/22/17: This objection and the inference of error have been verified. A Wikipedia page for Éva Tardos’s function has been created. It seems to us that another form of the monotone function, besides hers using integer approximations, is just ${f(G,y) = 1}$ if the Lovász number of the complement of ${G}$ (presented as an edge list) is ${\geq}$ the number of 0’s in ${y}$.

## Inner Issues and Calculating

The details of ${R}$ in section 5 have also been called into question. We are unsure what to say about a claim by Gustav Nordh that carrying out the inductive construction as written yields the false conclusion that the monomial ${x}$ is an implicant of a formula equivalent to ${x \wedge y}$. There are also comments about unclarity of neighboring definitions, including this comment from Shachar Lovett on Luca’s blog, posted since we drafted this section.

But this leads us to a larger point. Both of us are involved right now with painstaking constructions involving quantum circuits and products of permutations that we are programming (in Python). Pages 27–28 of Blum’s paper give a construction that can be programmed. If this is done enough to crank out some examples, then we may verify whether potential flaws crop up, or alternatively bolster confidence in some junctures of the proof so as to focus on others first. This ability is one way we are now empowered to sharpen “fuzzy edges” of our science.
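One small ingredient of such checking can be sketched directly: a brute-force test of whether a monomial is an implicant of a given function. This is our illustrative sketch, not the construction from Blum's pages 27–28, but it is the kind of oracle against which a programmed version of that construction could be compared:

```python
from itertools import product

def is_implicant(monomial, f, n):
    """Brute force: does setting every variable in `monomial` to 1
    force f to 1 no matter how the remaining variables are set?"""
    free = [i for i in range(n) if i not in monomial]
    for bits in product([0, 1], repeat=len(free)):
        asg = [1] * n
        for i, b in zip(free, bits):
            asg[i] = b
        if not f(asg):
            return False
    return True

f = lambda a: a[0] & a[1]  # x AND y on variables 0 and 1
assert is_implicant({0, 1}, f, 2)   # xy is an implicant of x AND y
assert not is_implicant({0}, f, 2)  # x alone is not (cf. Nordh's objection)
```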

## Open Problems

Is the proof correct? Or will it fall into eclipse? We will see shortly no doubt. Comparing this table of eclipses since 2003 and Gerhard Woeginger’s page of claimed proofs over mostly the same time period, we are struck that ‘${\mathsf{P\neq NP}}$‘ and ‘${\mathsf{P=NP}}$‘ claims have been about twice as frequent as lunar and solar eclipses, respectively.

Update 8/18: This comment by user “vloodin”, whom we remember well from the discussion here of Vinay Deolalikar’s proof attempt seven years ago, lays out the apparent flaw in the paper in more detail.

Update 9/2: On 8/30, Blum posted to the arXiv a v2 consisting of the comment, “The proof is wrong. I shall elaborate precisely what the mistake is. For doing this, I need some time. I shall put the explanation on my homepage.” We have no further substantial information.

Update 10/18: Norbert Blum has posted a detailed but short explanation of the mistake.

[restored missing links; a few word and format changes, Uffberg->Ulfberg, updates 8/18 and 8/22]

August 17, 2017 5:04 pm

Very nice and enlightening discussion.

2. August 17, 2017 5:27 pm

think have seen your idea about the difficulty of P vs NP induction proofs wrt xor of functions in years-old-blog. unf didnt quite understand it originally either. maybe you could dig up the link for fun. but my question, wouldnt that same argument apply to the existing proofs of exponential circuit complexity of some monotone circuit functions? eg berg/ ulfberg, razborov, etc (quite a few examples out there now)…

one of the nice things about world-class proofs is that they show how some of our intuitions are wrong. on other hand it does seem like proving P vs NP must grapple with proving “some degree of randomness or disorder” of some type of function. ala razborov/ rudich observations. in fact it has long looked to me like P vs NP is the TCS equivalent of measuring disorder/ entropy in a very precise way. this connection is enhanced with all the research into satisfiability transition point connections to statistical physics. and maybe it will be strengthened even after P vs NP is proven “someday”…

if nothing else this proof attempt is great because its focusing attn on this very excellent area of complexity theory (circuit theory) the details of which are known only to a few hardcore experts… it took a huge amt of sophistication/ study/ finesse for him to write those pages, even if a flaw is found… and maybe (hopefully/ ideally) it will lead to focus on a particular gap that could be an important new open question.

it would really be nice if someone collected all these related proofs in this thread into a book, think its a bit of a shame it hasnt happened to date, its all scattered and even experts would have trouble consolidating it all. Jukna’s book on circuit complexity comes close, but doesnt focus on this particular thread that Blum has zoomed in on. maybe a book by someone will be more likely now.

3. August 17, 2017 10:33 pm

The original essay idea, after 1. the intro as-is, was something like:
2. examples of unreasonable expectation of sharpness from science (“let us calculate” fallacy)…
3. …and unreasonable criticism: the rhetorical trick of calling the whole into question because some edges are fuzzy;
4. what predictive modeling is about (from my/our experience): fuzzy error ranges are not just inevitable, they’re the core of the model under its surface;
5. the transit time of humanly doing science before orbits are “settled”—how the field self-corrects and stabilizes (cue Newton-Laplace reference);
6. an analogy between “constructive convergence” in constructive mathematics and reliability in science. (Note also that the link under the word “awakening” in the intro is a book by B.L. van der Waerden—he of the famous numbers and conjecture—on the role of math in the development of science.)

News of the Blum paper came to us midday Monday just as I was rushing before vacation to finish two 13-page formal chess cheating reports, one an evident reprise of my “Thirteen Sigma” article. Here’s half of a footnote from the other one to an explanation of how a just-over-3-sigma test result giving what I call “face-value odds” of 750-1 against a null hypothesis of fair play needs to be weighed against a prior likelihood of cheating Fermi-estimated at 10,000-to-1 against: “On the other hand, if only one mobile [phone] was found with no other indications, then even in a tournament with prior broadcast warnings [of our rule forbidding carrying phones “on one’s person” during games], there might be a non-compliance rate of 10% by forgetful but innocent players, and that would change the ‘prior’ down only to 1,000-to-1, still mooting the 750-to-1.” How might this fit into my essay outline? Well, the Blum paper gives an au-courant example of point 5 and hopefully 6…

August 18, 2017 3:17 am

Tardos example refutes the proof, as has been pointed out on stackexchange:

It gives a monotone function, which agrees with CLIQUE on T0 and T1, but which lies in P.

August 18, 2017 6:12 am

The question is where the mistake is in Blum’s (otherwise well written and thought out) paper. I believe the single error is one subtle point in the proof of Theorem 6, namely in Step 1, on page 31 (and also 33, where the dual case is discussed) – a seemingly obvious claim that C’_g contains all the corresponding clauses contained in CNF'(g) etc. seems wrong.

To explain this in more detail, we need to go into the proof and approximation method of Berg and Ulfberg, which restates Razborov’s original proof of the exponential monotone complexity of CLIQUE in terms of DNF/CNF switches. This is how I see it:

To every node/gate g of a logic circuit betta (containing binary OR/AND gates only), a conjunctive normal form CNF(g), a disjunctive normal form DNF(g), and approximators Cr_g and Dk_g are attached. CNF(g) and DNF(g) are simply the corresponding conjunctive and disjunctive normal forms of the gate output. Cr_g and Dk_g are also conjunctive and disjunctive forms, but of some other functions, “approximating” the gate output. They are however required to have a bounded number of variables in each clause for Cr_g (at most a constant r) and in each monomial for Dk_g (at most a constant k).

There is notion of an “error” introduced with this approximation. How is this error computed? We are only interested in some set T0 of inputs on which our total function takes value 0, and T1 of inputs on which our total function takes value 1 (a “promise”) . Now at each gate, we look only at those inputs from T0 and T1, which are correctly computed (by both DNF(g) and CNF(g), which represent the same function – gate output in betta) at gate output, and look how many mistakes/errors are for Cr_g and Dk_g, compared to that. If the gate is a conjunction, then the gate output might compute more inputs from T0 correctly (but the correctly computed inputs from T1 are possibly decreased). For Cr_g, which is defined as a simple conjunction, there are no new errors however on all of these inputs. Now, Dk_g is defined as a CNF/DNF switch of Cr_g, so there might be a number of new errors on T0, coming from this switch. On T1 also, there are no new errors on Cr_g – each error has to be present on either of gate inputs, and similarly on Dk_g, switch does not introduce new errors on T1. The analysis for OR gate is dual.

So the number of errors for the final approximators is bounded by number of gates in betta, times the maximal possible number of errors introduced by a CNF/DNF switch (for T0), or by a DNF/CNF switch (for T1). But the total number of errors has to be “large” in at least one case (T0 or T1), since this is a property of positive conjunctive normal forms with clauses bounded by k, which was the key insight of Razborov’s original proof (Lemma 5 in the Blum’s paper).

So what did Blum do in order to deal with negations (which are pushed to the level of inputs, so that the circuit betta still contains only binary OR/AND gates)?

His idea is to perform CNF/DNF and DNF/CNF switches restrictively, only when all variables are positive. Then the switches would work EXACTLY like in the case of Berg and Ulfberg, introducing the same amount of errors. It turns out this is the only case which needs to be considered.

So, he follows along the lines of Berg and Ulfberg, with a few distinctions. Instead of attaching CNF(g), DNF(g), Cr_g and Dk_g to each gate g of betta, he attaches his modifications, CNF'(g), DNF'(g), Cr’_g and Dk’_g, i.e. the “reduced” disjunctive and conjunctive normal forms, which he defined to differ from CNF(g) and DNF(g) by “absorption rule”, removing negated variables from all mixed monomials/clauses (he also uses for this purpose operation denoted by R, removing some monomials/clauses entirely; as we discussed on stackexchange, his somewhat vague definition is not really the problem, R can be applied at each gate but what is removed depends not only on previous two inputs but on the whole of the circuit leading up to that gate), and their approximators Cr’_g and Dk’_g, that he also introduced.

He concludes, in Theorem 5, that for a monotone function, reduced CNF’ and DNF’ will really compute 1 and 0 on sets T1 and T0, at root node g_0 (whose output is the output of the whole function in betta). This theorem is, I believe, correct.

Now comes the counting of errors. I believe the errors at each node are meant to be computed by comparing reduced CNF'(g) and DNF'(g) (which are now possibly two different functions), to C’_g and D’_g as he defined them. The definition of approximators parrots definition of CNF’ and DNF’ (Step 1) when mixing variables with negated ones, but when he deals with positive variables, he uses the switch like in the case of Berg and Ulfberg (Step 2). And indeed, in Step 2 he will introduce the same number of possible errors like before (its the same switch, and all variables involved are positive).

But the proof is wrong, I think, in Step 1. I think Blum is confusing gamma_1, gamma_2, which really come, as he defined them, from previous approximators (for gates h1, h2), for positive parts of CNF'(h1) and CNF'(h2). There is a difference, and hence, the statement “C’_g contains still all clauses contained in CNF’_\betta(g) before the approximation of the gate g which use a clause in gamma’_1 or gamma’_2” seems to be wrong in general.

Anyway, sorry for the long post; news about Blum’s paper came through a friend who, fresh from Szechuan, was working in the field when someone called him, and then he passed the news on, asking me to weigh in. So, here it is; hope that someone might check if this is really the place where the proof breaks down.

August 19, 2017 9:45 am

In the case of monotone networks, the Phi functions are defined with respect to the current front (section 3, page 13: “in dependence of …”). Although it isn’t explicitly stated, I believe Blum is also defining the reduced CNF/DNF functions at a node g with respect to the current front.

Thus in the case that g is an OR-gate, at the time that gate g is being approximated, by definition: CNF’_beta(g) = C_h1^k OR C_h2^k = AND { (gamma_1 OR gamma_2), (gamma’_1 OR gamma_2), (gamma_1 OR gamma’_2), (gamma’_1 OR gamma’_2) }. Under this definition, does the statement in question (“Furthermore, C’_g contains still all clauses …”) still potentially fail?

August 21, 2017 2:20 pm

I suspect vloodin is correct that the proof fails here.

Consider the case of CNF approximators (the DNF case is dual). I will count the errors slightly differently than Blum does in his paper.

Given a node g, let CNF(g) be the Boolean function computed by the standard network at node g where each node s of the current front has been replaced by its CNF approximation C(s). C(s) is necessarily of the form C(s) = gamma and gamma’.

Let g be the next node to be approximated (according to the topological ordering).

If g is an AND gate then we set C(g) = (gamma_1 AND gamma_2) AND (gamma’_1 AND gamma’_2). In this case no errors are introduced.

Suppose g is an OR gate. The current CNF formula being computed at node g is CNF(g) = (gamma_1 OR gamma_2) AND (gamma’_1 OR gamma_2) AND (gamma_1 OR gamma’_2) AND (gamma’_1 OR gamma’_2). This paper proposes to approximate this as C(g) = Approx-D-to-C (Approx-C-to-D(gamma_1) OR Approx-C-to-D(gamma_2)) AND gamma_1 AND gamma_2 AND (gamma’_1 OR gamma’_2).

If we get rid of the trivial clauses/monomials appropriately, then it is clear that we incur the usual loss in the DNF-to-CNF/CNF-to-DNF approximations and no loss in the final term.

However, it seems to me that this approximator will incur errors when approximating the middle two terms, i.e. replacing (gamma’_1 OR gamma_2) AND (gamma_1 OR gamma’_2) by gamma_1 AND gamma_2 is not a safe operation.

This absorption rule is introduced in section 5 pages 25-26. Given the CNF formula for a non-constant monotone Boolean function f, we can drop all negated variables from any clause, d, with a positive variable to get a clause d’. The reasoning is as follows. For b such that f(b) = 0, b violates one of the clauses, say d; as d’ is a subclause of d, b violates d’ and thus the reduced CNF formula continues to evaluate to 0 on b. Now suppose b’ violates the reduced formula and thus one of the reduced clauses, say d’. Let b be the input with 0 in all of the variables corresponding to d’ and 1 elsewhere. As b’ violates d’, b’ leq b and as b violates d, f(b)=0. Finally by monotonicity, f(b’) leq f(b) = 0. This can then be generalized to show that it is safe to absorb a conjunction of negative clauses into a conjunction of positive clauses i.e. (gamma or gamma’) = gamma provided that (gamma or gamma’) is monotone non-constant.

However I do not see how this implies (or could be generalized to imply) that it is safe to perform these operations in the middle of the network.
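The absorption rule described in the preceding comment can be sanity-checked by brute force on a toy case. In this sketch (the literal encoding is ours), the mixed clause (y OR NOT x) absorbs to (y) without changing the monotone function x AND y:

```python
from itertools import product

# Literal encoding (ours): variable i appears positively as i+1, negated as -(i+1).
def eval_cnf(clauses, asg):
    def lit(l):
        v = abs(l) - 1
        return asg[v] if l > 0 else 1 - asg[v]
    return all(any(lit(l) for l in c) for c in clauses)

def absorb(clauses):
    """Absorption rule: drop the negated literals from every clause that
    contains at least one positive literal."""
    return [{l for l in c if l > 0} if any(l > 0 for l in c) else c
            for c in clauses]

# (x) AND (y OR NOT x) computes the monotone function x AND y.
cnf = [{1}, {2, -1}]
reduced = absorb(cnf)  # becomes (x) AND (y)
for bits in product([0, 1], repeat=2):
    assert eval_cnf(cnf, bits) == eval_cnf(reduced, bits)
```

Of course, this only confirms the rule applied globally to a CNF for a monotone function; it says nothing about applying it at internal gates, which is exactly the step in doubt.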

5. August 18, 2017 7:58 am

Thank you very much for the comments. As before, after being modded-in once, your further comments appear freely.

6. August 18, 2017 9:22 am

It seems that even the greatest researchers have their crackpot days… By the way, see that my attempt is in general more sensible than theirs: https://arxiv.org/ftp/arxiv/papers/1501/1501.03872.pdf

August 19, 2017 8:24 am

One of the reasons I adore your blog can be seen in the way you handle this particular situation: full of enthusiasm and constructive support, just to find out whether or not Blum’s claim will hold or if eventually it can be fixed to come to the desired conclusion.

For me this makes the difference between you and most of your other colleagues.

• August 20, 2017 11:10 pm

In blogs, people can look nice.

8. August 19, 2017 11:42 am

> It is important to note that this trick cannot seem to be applied to purely monotone computations, since the last step must be non-monotone—it must compute the {\oplus} function.

(I’ll paraphrase your “trick” to provide context: express $f$ as $r {\oplus} s$, with $r$ “random but polynomial time” — I guess a pseudorandom function would do for $r$. Then $s$ will also “look pseudorandom”. Make a circuit for $f$ which computes $r$ and $s$ separately, then XORs them as the last step.)

Blum moves all his NOTs to the start of the circuit, so his transformed circuit is monotone everywhere except in one layer at the beginning. Does the issue you point out still apply (given that his inductive analysis is only about this transformed circuit)?
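For context, pushing all NOTs to the inputs is the standard dual-rail (De Morgan) transformation: each gate carries the pair (g, NOT g), so negations appear only on the input literals. A minimal sketch, with our own gate encoding:

```python
from itertools import product

def eval_circuit(gates, x):
    """gates in topological order: ('IN', i), ('NOT', a), ('AND', a, b),
    ('OR', a, b); a and b index earlier gates.  Returns the last gate's value."""
    vals = []
    for g in gates:
        if g[0] == 'IN':
            vals.append(x[g[1]])
        elif g[0] == 'NOT':
            vals.append(1 - vals[g[1]])
        elif g[0] == 'AND':
            vals.append(vals[g[1]] & vals[g[2]])
        else:  # 'OR'
            vals.append(vals[g[1]] | vals[g[2]])
    return vals[-1]

def eval_dual_rail(gates, x):
    """Same circuit, but every gate carries the pair (g, NOT g) via De Morgan,
    so the only negations used are the literals NOT x_i at the inputs."""
    pos, neg = [], []
    for g in gates:
        if g[0] == 'IN':
            pos.append(x[g[1]]); neg.append(1 - x[g[1]])
        elif g[0] == 'NOT':
            pos.append(neg[g[1]]); neg.append(pos[g[1]])
        elif g[0] == 'AND':
            pos.append(pos[g[1]] & pos[g[2]]); neg.append(neg[g[1]] | neg[g[2]])
        else:  # 'OR'
            pos.append(pos[g[1]] | pos[g[2]]); neg.append(neg[g[1]] & neg[g[2]])
    return pos[-1]

# NOT(x0 AND x1) OR x2
gates = [('IN', 0), ('IN', 1), ('IN', 2), ('AND', 0, 1), ('NOT', 3), ('OR', 4, 2)]
for bits in product([0, 1], repeat=3):
    assert eval_circuit(gates, bits) == eval_dual_rail(gates, bits)
```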

9. August 21, 2017 5:00 pm

some postmortem thoughts: it looks to me like Blums paper is like the moth around a flame so to speak. have long suspected that a real P vs NP proof lies around very similar ideas, and maybe will write this up sometime in a new blog. however, think its a shame, it looks like while Blum understands existing/ nearby proof directions better than just about anyone, maybe Blum had nobody to talk over his approach with, nobody to “bounce ideas off”. could all this have been averted somehow if he submitted an outline somehow for expert review? my guess is he spent weeks at least writing the flawed paper.

it seems paradoxically in the modern hyper-connected cyber milieu the only way to get feedback on ones (serious) ideas is to launch a comprehensive proof attempt/ writeup seemingly at great risk to ones personal reputation. could it be said Deolalikar had the same problem? what if there was some way to bounce ideas off other experts without such high stakes and worldwide pressure and still get feedback? tried it myself almost ½ decade ago in my blog with no success. in some rare cases (2 in about 7 years) the experts all pop out of the woodwork to tag-team quickly shoot down an attempt (and this was a very impressive response) but nobody in this very high-brainpower crowd participates much in collective/ collaborative cyber “brainstorming”.

really wish there was some way to get serious feedback from experts about general directions & rough conjectures without writing up entire (complex) papers. polymath project embodies some of these ideals and has “fired/ clicked” in some cases however polymath seems to be dormant lately… how about a TCS polymath effort?