The polynomial hierarchy is infinite for a random oracle

Benjamin Rossman, Rocco Servedio, and Li-Yang Tan have made a breakthrough in proving lower bounds on constant-depth circuits. It came from a bi-coastal collaboration of Rossman visiting the Berkeley Simons Institute from Japan and Tan visiting from Berkeley to Servedio at Columbia University in New York. Their new paper solves several 20- and 30-year old open problems.

Today we congratulate them on their achievement and describe part of how their new result works.

What exactly did they prove? As some have already remarked, how one expresses this says something about communications in our field. Exactly what they proved is:

Theorem 1 For some explicit constant ${c > 0}$ and all depths ${d \geq 2}$ up to ${c\sqrt{\log n}/\log\log n}$, there is an ${n}$-ary monotone Boolean function ${f}$ that has a simple ${O(n)}$-size circuit ${S_d}$ (indeed, a formula) of depth ${d}$ with unbounded fan-in ${\mathsf{AND}}$ and ${\mathsf{OR}}$ gates, but such that every circuit ${C}$ of depth ${d}$ with the opposite kind of gate at its output as ${S_d}$, or with bottom fan-in at most ${\log n/10(d-1)}$ at the inputs, either has size above ${2^{n^{1/6(d-1)}}}$ or else agrees with ${S_d}$ on at most a ${(\frac{1}{2} + n^{-\Omega(1/d)})}$ proportion of the inputs.

## What Did They Prove, Again?

That is quite a mouthful. We can at least simplify it—as they do up front in their paper—by noting that every circuit of depth ${d-1}$ trivially obeys both of the stated restrictions on circuits of depth ${d}$:

Theorem 2 Every Boolean circuit ${C}$ of depth ${d-1}$ and size at most ${2^{n^{1/6(d-1)}}}$ gets at least a ${(\frac{1}{2} - n^{-\Omega(1/d)})}$ fraction of the inputs wrong with regard to computing ${S_d}$.

Johan Håstad’s famous 1986 PhD thesis had proved a similar lower bound only in the worst case. If this still seems hard to parse, however, here is a consequence they proved. The total influence of a Boolean function ${f}$ is the sum from ${i=1}$ to ${n}$ of the proportion of ${w \in \{0,1\}^n}$ for which flipping the ${i}$-th bit of ${w}$ changes the value ${f(w)}$.

Theorem 3 For some non-constant depths ${d = d(n)}$ and size functions ${s(n)}$ greater than quasi-polynomial in ${n}$, there are monotone Boolean functions ${f_n}$ whose total influence is only ${O(\log n)}$, but that still cannot be approximated better than a ${\frac{1}{2} + o(1)}$ fraction by circuits of depth ${d(n)}$ and size at most ${s(n)}$.

This gives a big “No” to a question in two posts by our friend Gil Kalai in 2010 on his blog and in 2012 on StackExchange. It rules out any kind of strong converse to a famous 1993 theorem of Nathan Linial, Yishay Mansour, and Noam Nisan, later improved by Ravi Boppana, showing that small constant-depth circuits compute functions of low sensitivity. This theorem has many applications, so the big bound against a converse is significant news, but perhaps the above statement still does not come trippingly off the tongue. Well, here’s something else they proved:

Theorem 4 Relative to a random oracle ${A}$, the polynomial hierarchy ${\mathsf{PH}^A}$ is infinite.

Now that’s a nice short statement that can grab people—at least people like us who did complexity in the 1980s and before. We know this as an open problem going back even before Charlie Bennett and John Gill proved that ${\mathsf{NP}^A \neq \mathsf{P}^A}$ for a random ${A}$ in 1981.

However, there is not much surprise and not much mileage in that statement. It was believed even before Ron Book observed in 1994 that its negation collapses the hierarchy without an oracle: Given a relativizable class ${{\cal C}}$, define ${\mathsf{Almost}\text{-}{\cal C}}$ to be the set of languages ${B}$ such that the measure of ${A}$ giving ${B \notin {\cal C}^A}$ is zero. The measure is properly on the space of infinite 0-1 sequences but it is OK to think of the usual Lebesgue measure on ${[0,1]}$ where e.g.

$\displaystyle 0.001101010001010001\dots$

denotes the set of primes, ignoring the clash between finite sets like ${\{2,3\}}$ and their co-finite counterparts like ${\mathbb{N} \setminus \{0,1,3\}}$ that map to the same dyadic rational number.

For classes ${{\cal C}}$ enjoying certain closure properties, ${\mathsf{Almost}\text{-}{\cal C}}$ equals ${\mathsf{BP}\cdot {\cal C}}$. The dot means we have ${\mathsf{BPP}}$-machines each of whose random branches ends with one query to the given language in ${{\cal C}}$, whose answer becomes the result of that branch. For instance, ${\mathsf{Almost}\text{-}\mathsf{NP}}$ equals the Arthur-Merlin class ${\mathsf{AM} = \mathsf{BP}\cdot\mathsf{NP}}$. Now if ${\Pi_k^p \not\subseteq \mathsf{Almost}\text{-}\Sigma_k^p}$ for every ${k}$, then by standard 0-1 laws the measure of ${A}$ putting ${\Pi_k^p \subseteq \left(\Sigma_k^p\right)^A}$ is zero. Then by the fact that a countable union of sets of measure zero has measure zero, the hierarchy is infinite for a random oracle. Hence its collapse for a random oracle implies ${\Pi_k^p \subseteq \mathsf{BP}\cdot\Sigma_k^p}$ for some ${k}$. This in turn would collapse the hierarchy to ${\Sigma_{k+1}^p \cap \Pi_{k+1}^p}$ without oracle, much as ${\mathsf{co}\text{-}\mathsf{NP} \subseteq \mathsf{AM}}$ collapses it to ${\Sigma_2^p \cap \Pi_2^p}$.

The point in ${\mathsf{BP\cdot NP} = \mathsf{Almost}\text{-}\mathsf{NP}}$ is that random oracles furnish random bits for the computations. The random oracles could be richer in that exponentially many poly-length computations could use exponentially many random oracle bits in-toto. But the aforementioned closure properties together with probability amplification bridge the difference to machines using polynomially many independent bits when ${{\cal C}}$ is a hierarchy level.

The point in the new lower bound is that the random oracle bits connect instead to distributions over the input space in an average-case argument. This connection is expressed well in their paper.

## The New Idea

Sometimes a new idea comes from a new mathematical object, but other times it comes from a new way of applying and controlling a known object. Leslie Valiant in the late 1970s introduced the kind of projection that can do the following to each variable ${x}$ in a Boolean or numerical formula:

1. Substitute ${x}$ by a constant value;

2. Leave ${x}$ alone; or

3. Substitute ${x}$ by a variable ${x'}$ already occurring in the formula.

An equivalent rule to the last is that you can rename two variables ${x,x'}$ to be the same variable ${y}$. The substitution applies simultaneously to every occurrence of a variable. By undoing this rule one can convert every formula ${f}$ with ${m}$ occurrences of ${n}$ variables into a formula ${f_1}$ with ${m}$ different variables each occurring once. This ${f_1}$ is called a read-once formula and is unique up to permuting or renaming variables, so every ${f}$ is a projection of a read-once formula. The formulas ${S_d}$ targeted in their proof are already read-once, so the game becomes how to analyze the formulas ${f'}$ that arise as projections of ${S_d}$.

When only the first two rules are applied, ${f'}$ is a restriction of ${f_1}$. Håstad’s 1986 proof technique analyzed random restrictions obtained by independently leaving each variable ${x_i}$ alone with some probability ${p}$ and then assigning a random 0-or-1 value to each variable not left alone. Restrictions applied to a read-once formula not only preserve its structure but also preserve read-onceness, which keeps the behavior of different parts of the formula independent. This independence is sacrificed by using projections, which could “entangle” different parts of the formula and thereby worsen bias.

## Applying It

The proof technique both then and now works by reductio ad absurdum on the depth ${d}$. ${S_d}$ is always a monotone alternating ${\mathsf{AND}}$${\mathsf{OR}}$ formula ${f}$ of a kind introduced by Mike Sipser with ${\mathsf{AND}}$ at the inputs, so the output gate is ${\mathsf{OR}}$ if ${d}$ is even, ${\mathsf{AND}}$ if odd. It is “tuned” by giving the fan-in ${w_j}$ at leach level ${j}$, ${1 \leq j \leq d}$, so ${n = n_d = \prod_{j=1}^d w_j}$. Håstad had shown that with the right combination of ${w_j}$ and ${p}$, a random restriction makes ${f'}$ equivalent to a formula with a large enough expected number ${n' \geq n_{d-1}}$ of variables that embeds the relevant structure of ${S_{d-1}}$. A circuit ${C_d}$ of small enough size ${s_d}$ and depth ${d-1}$, however, with high probability gets beaten down by the restrictions into a simpler form on a pace that cannot be sustained for keeping up with ${S_{d-1}}$.

The reductio can then continue with ${d' = d-1}$, until an obvious contradiction is reached on going from ${d = 2}$ to ${d' = 1}$ with the circuit ${C_{1}}$ that results. One can with more care actually structure the proof as a standard induction going from lower bounds for ${d-1}$ to the desired ones for ${d}$, but the mechanism is the same either way.

For an average-case argument one also needs to preserve the balance between arguments ${x}$ giving ${f'(x) = 0}$ and those giving ${f'(x) = 1}$. Otherwise the trivial circuit that always says “no” or its partner that always says “yes” already has more advantage than the desired conclusion allows. This is where one might first think of the third projection rule as counter-productive. By identifying ${x_i}$ and ${x_j}$, any “bias” in ${x_i}$ might be propagated to more parts of the formula. However, this also intuitively creates greater sensitivity in terms of influence as defined above. If ${(w,i)}$ is sensitive, then letting ${w' = w}$ with bit ${i}$ flipped, we have that ${w}$ and ${w'}$ balance each other.

Rossman, Servedio, and Tan craft the argument, using a slightly different “tuning” of ${S_d}$ from Håstad’s, so that the benefits carry the day: ${S'_d}$ retains its structure and balance and hardness but the prospective smaller-depth circuit ${C}$ “collapses.” As they note, random projections had been used in a 2001 paper by Russell Impagliazzo and Nathan Segerlind for a proof complexity lower bound, but apparently had not been applied so bluntly to circuits.

Their projections first determine blocks of variables ${x_{i,a}}$ that get mapped to the same new variable ${y_a}$ for different ${i}$. Then they can be specified by a distribution over restrictions, thus simplifying the randomness analysis. If the circuit ${C}$ has a gate that accesses some ${x_{i,a}}$ and some ${\bar{x}_{j,a}}$, then after the projection the gate accesses both ${y_a}$ and ${\bar{y}_a}$ and hence can be collapsed to ${0}$ (if the gate is AND) or ${1}$ (if OR). Another difference from Håstad’s technique is that they need to adjust the projection probability adaptively depending on outcomes at previous depths. This is the point where we say to refer to their 59-page paper for details, but they also have excellent further explanations in sections 4 and 7.

## Why Excited

Dick and I are excited because we have thought about inductive lower bound arguments whose base cases involve read-once formulas and whose inductive steps are projections. In a post with Dick three years ago I described one case that may retain interest even though the statement it was trying to prove turned out to be false.

My planned attack began with a “niceness” theorem for functions ${f}$ having a funny kind of read-once arithmetical formula that has powering as a primitive operation—for instance, ${(x+2y-z)^3}$ counts as one not three occurrences of the variables in contrast to ${(x+2y-z)*(x+2y-z)*(x+2y-z)}$—but does not have an additive constant in any subterm. I proved that the first partial derivatives ${\frac{\partial f}{\partial x_i}}$ form a Gröbner basis under any admissible monomial ordering. If one allows additive constants, then the partials still bound such a Gröbner basis. The argument is less simple than one might expect and does not obviously extend to higher derivatives—at least not obviously to me, as I’ve sat on revisions of a paper for over a decade while trying to prove that. But the theorem is still good enough to use as a base case and ask:

How fast can various complexity measures associated to Gröbner bases grow under successive applications of (random) projections?

My computer runs back then were sobering: even after just a few projections in some targeted cases the answer was, mighty fast. And as I related in the post, the desired assertion about a certain “monomial complexity” measure was refuted. But there are other formulations and conjectures and measures to try, and this new result in the Boolean case may give encouragement to try them. There is also the difference that in my case the read-once formula ${f_1}$ was “easy” and was derived from a given function ${f}$ we are trying to prove “hard” by undoing projections, whereas in the new case the ${S_d}$ functions are read-once but are the ones we are showing hard for lower-depth circuits. So perhaps this all runs in the opposite direction—but still the prospect of new ways to control projections is pretty cool.

## Open Problems

What new results may come out of this breakthrough lower bound technique?

1. May 8, 2015 8:53 pm

Reblogged this on Rupei Xu.

2. May 8, 2015 9:28 pm

Great result and a nice overview! Congratulations to Ben, Rocco and Li-Yang!

3. May 9, 2015 1:50 pm

⭐ 💡 😎 wrt thm 2, it shows a fraction of (“outputs-of-“) inputs are wrong on the (exponentially) constrained size circuit. arent similar or nearly the same results obtained if there is even a single wrong input?
the circuit agenda/ program is really going well lately, eg also arithmetic circuits/ VNP=?VP, hope to see better surveys and book presentations, esp “graduate level” books seem hard to find.

4. May 9, 2015 4:58 pm

Reblogged this on Sublime Illusions.

5. May 10, 2015 1:39 am

Congratulations to Benjamin, Rocco and L-Yang for these exciting results. Regarding “inverse Hastad-LMN-Boppana” theorem the situation is as follows: In a 1999 paper Benjamini-Schramm and me made (almost) the strongest possible version of such a statement asserting that a Boolean function on n variables with influence I is close (in the l_2 norm) to a function expressed by depth d circuit of size s with $(\log s)^{d-1} =O(I)$. In this conjecture d need not be a constant because you can consider a function with larger depth depending on a small subset of the variables.  (Of course, replacing the exponent d-1 in the conjecture by constant times d or by a fixed function of d would also be interesting and useful.)

O’Donnell and Wimmer pointed out that the conjecture as stated is not correct and that you must replace the condition “is close (in the l_2 norm)” to “have a substantial correlation with”.

This is seen by taking the OR of two such functions each on half the variables, and in fact you can take an arbitrary Junta whose inputs are different functions of this type. So if you want to form a conjecture about “close in the l_2 sense” you need to allow such juntas.

If I understand correctly the new result says is that indeed if you take s much smaller then n and compensate for that by a larger depth this cannot be approximated by bounded depth. This is very useful to know.

It will also be interesting to show (or refute) that if all the individual influences are below $\log n/n$ then the function has positive correlation with a depth 2 polynomial size function. And if all the individual influences are below polylog (n)/n then the function has positive correlation (or even is epsilon-close in the l_2 sense) with a function in AC^0.

Of course, from the basic relation between influences and Fourier coefficients  if I(f) is small that f can be approximated by a low degree polynomial. Because most of the Fourier coefficients are below large constant times the influence. The two conjectures in my cited  post (also originated in the 1999 paper with Benjamini and Schramm) asserts that monotone functions expressed  with depth d size s threshold circuits have good low degree approximations. This might be true (but very hard) for monotone TC0 functions and it looks toof good to be true for monotone functions in TC0. (Alas we dont have even an example of a monotone function in TC0 that cannot be approximated by a function in monotone TC0. )

• May 11, 2015 4:56 am

Well, trying to understand better the remarkable new RST-paper, it seems to me that indeed as the post correctly describe RST give a big “no” to the possibility of “reverse Boppana-Hastad-LNM” theorem as we conjectured it in 1999, namely they genuinely show a low influence function that cannot be explained by the (log size)^depth upper bound. The function is Sipser type AND-OR tree of some appropriate parameters. The influence can be computed and is considerably smaller then the (log size)^depth upper bound, and the difficult part is the proof that you cannot approximate this function at all by a smaller depth function (even depth smaller by one).

(So its is not just matter of adjusting the reverse conjecture to circuits with different variables having different depth but really a new reason for having small total influence that cannot be captured by Boppana-Hastad-LMN .)

(I think, but not sure, that in this function to get influences well below the log size^depth you need to set the parameters that the influence of every variable is larger than polylog (n)/n so the question for them still remain.)