A Tighter Grip on Circuit Depth
The polynomial hierarchy is infinite for a random oracle
Benjamin Rossman, Rocco Servedio, and Li-Yang Tan have made a breakthrough in proving lower bounds on constant-depth circuits. The result came from a bi-coastal collaboration: Rossman visiting the Simons Institute in Berkeley from Japan, and Tan visiting from Berkeley to work with Servedio at Columbia University in New York. Their new paper solves several 20- and 30-year-old open problems.
Today we congratulate them on their achievement and describe part of how their new result works.
What exactly did they prove? As some have already remarked, how one expresses this says something about communications in our field. Exactly what they proved is:
Theorem 1 For some explicit constant $c > 0$ and all depths $d$ up to $\frac{c\log n}{\log\log n}$, there is an $n$-ary monotone Boolean function $f_d$ that has a simple $O(n)$-size circuit (indeed, a formula) of depth $d$ with unbounded fan-in $\wedge$ and $\vee$ gates, but such that every circuit $C$ of depth $d$ with the opposite kind of gate at its output as $f_d$, or with suitably bounded bottom fan-in at the inputs, either has size above $2^{n^{\Omega(1/d)}}$ or else agrees with $f_d$ on at most a $\frac{1}{2} + n^{-\Omega(1/d)}$ proportion of the inputs.
What Did They Prove, Again?
That is quite a mouthful. We can at least simplify it—as they do up front in their paper—by noting that every circuit of depth $d-1$ trivially obeys both of the stated restrictions on circuits of depth $d$:
Theorem 2 Every Boolean circuit of depth $d-1$ and size at most $2^{n^{\Omega(1/d)}}$ gets at least a $\frac{1}{2} - n^{-\Omega(1/d)}$ fraction of the inputs wrong with regard to computing $f_d$.
Johan Håstad’s famous 1986 PhD thesis had proved a similar lower bound only in the worst case. If this still seems hard to parse, however, here is a consequence they proved. The total influence of a Boolean function $f$ on $n$ variables is the sum from $i = 1$ to $n$ of the proportion of inputs $x$ for which flipping the $i$-th bit of $x$ changes the value $f(x)$.
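The definition can be checked by brute force on small functions; this is a sketch with hypothetical helper names, not code from the paper:

```python
from itertools import product

def total_influence(f, n):
    """Sum over coordinates i of Pr_x[f(x) != f(x with bit i flipped)],
    computed exactly by enumerating all 2^n inputs."""
    total = 0.0
    for i in range(n):
        flips = 0
        for x in product((0, 1), repeat=n):
            y = list(x)
            y[i] ^= 1          # flip the i-th bit
            if f(x) != f(tuple(y)):
                flips += 1
        total += flips / 2 ** n
    return total

# Parity is maximally sensitive: every bit flip changes the output,
# so its total influence on n bits is exactly n.
parity = lambda x: sum(x) % 2
print(total_influence(parity, 4))    # 4.0

# A dictator function depends on one bit only: total influence 1.
dictator = lambda x: x[0]
print(total_influence(dictator, 4))  # 1.0
```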
Theorem 3 For some non-constant depths $d = d(n)$ and size functions $S(n)$ greater than quasi-polynomial in $n$, there are monotone Boolean functions whose total influence is only polylogarithmic in $n$, but that still cannot be approximated better than a $\frac{1}{2} + o(1)$ fraction by circuits of depth $d$ and size at most $S(n)$.
This gives a big “No” to a question raised by our friend Gil Kalai in two posts, on his blog in 2010 and on StackExchange in 2012. It rules out any kind of strong converse to a famous 1993 theorem of Nathan Linial, Yishay Mansour, and Noam Nisan, later improved by Ravi Boppana, showing that small constant-depth circuits compute functions of low sensitivity. This theorem has many applications, so the strong refutation of a converse is significant news, but perhaps the above statement still does not come trippingly off the tongue. Well, here’s something else they proved:
Theorem 4 Relative to a random oracle $A$, the polynomial hierarchy is infinite.
Now that’s a nice short statement that can grab people, at least people like us who did complexity in the 1980s and before. We know this as an open problem going back even before Charlie Bennett and John Gill proved that $\mathsf{P}^A \neq \mathsf{NP}^A \neq \mathsf{coNP}^A$ for a random oracle $A$ in 1981.
However, there is not much surprise and not much mileage in that statement. It was believed even before Ron Book observed in 1994 that its negation collapses the hierarchy without an oracle: Given a relativizable class $\mathcal{C}$, define almost-$\mathcal{C}$ to be the set of languages $L$ such that the measure of oracles $A$ giving $L \notin \mathcal{C}^A$ is zero. The measure is properly on the space of infinite 0-1 sequences, but it is OK to think of the usual Lebesgue measure on $[0,1]$, where e.g. the binary expansion $0.0110101000101\ldots$ denotes the set of primes, ignoring the clash between finite sets like $\{1\}$, giving $0.1000\ldots$, and their co-finite counterparts like $\{2,3,4,\ldots\}$, giving $0.0111\ldots$, that map to the same dyadic rational number.
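The correspondence between sets and reals, and the dyadic clash, can be seen numerically; the helper name here is hypothetical:

```python
def set_to_real(S, precision=40):
    """Map a set of positive integers to the real number whose binary
    expansion has a 1 in position k exactly when k is in S
    (truncated after `precision` bits)."""
    return sum(2.0 ** -k for k in S if k <= precision)

primes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}
print(set_to_real(primes))  # binary expansion 0.0110101000101...

# The "clash": the finite set {1} and the co-finite set {2,3,4,...}
# map to the same dyadic rational, since 0.1000... = 0.0111... in binary
# (here they agree up to the truncation error 2^-40).
print(set_to_real({1}))
print(set_to_real(set(range(2, 41))))
```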
For classes $\mathcal{C}$ enjoying certain closure properties, almost-$\mathcal{C}$ equals $\mathsf{BP}\cdot\mathcal{C}$. The dot means we have $\mathsf{BPP}$-machines each of whose random branches ends with one query to the given language in $\mathcal{C}$, whose answer becomes the result of that branch. For instance, $\mathsf{BP}\cdot\mathsf{NP}$ equals the Arthur-Merlin class $\mathsf{AM}$. Now if, for every $k$, the set of oracles $A$ giving $\Sigma_{k+1}^A = \Sigma_k^A$ does not have measure one, then by standard 0-1 laws the measure of $A$ putting $\Sigma_{k+1}^A = \Sigma_k^A$ is zero. Then by the fact that a countable union of sets of measure zero has measure zero, the hierarchy is infinite for a random oracle. Hence its collapse for a random oracle implies $\Sigma_{k+1}^A = \Sigma_k^A$ for almost all $A$, for some $k$. This in turn would collapse the hierarchy to $\mathsf{BP}\cdot\Sigma_k^p$ without an oracle, much as $\mathsf{NP} \subseteq \mathsf{BPP}$ collapses it to $\mathsf{BPP}$.
The point in $\mathsf{BP}\cdot\mathcal{C}$ is that random oracles furnish random bits for the computations. The random oracles could be richer, in that exponentially many poly-length computations could use exponentially many random oracle bits in toto. But the aforementioned closure properties together with probability amplification bridge the difference to machines using polynomially many independent bits when $\mathcal{C}$ is a hierarchy level.
The point in the new lower bound is that the random oracle bits connect instead to distributions over the input space in an average-case argument. This connection is expressed well in their paper.
The New Idea
Sometimes a new idea comes from a new mathematical object, but other times it comes from a new way of applying and controlling a known object. Leslie Valiant in the late 1970s introduced the kind of projection that can do the following to each variable $x_i$ in a Boolean or numerical formula $F$:
- Substitute $x_i$ by a constant value;
- Leave $x_i$ alone; or
- Substitute $x_i$ by a variable $x_j$ already occurring in the formula.
An equivalent rule to the last is that you can rename two variables to be the same variable. The substitution applies simultaneously to every occurrence of a variable. By undoing this rule one can convert every formula $F$ with $m$ occurrences of variables into a formula $F'$ with $m$ different variables, each occurring once. This is called a read-once formula and is unique up to permuting or renaming variables, so every $F$ is a projection of a read-once formula. The formulas $f_d$ targeted in their proof are already read-once, so the game becomes how to analyze the formulas that arise as projections of $f_d$.
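The three rules can be modeled in a few lines; the representation and names below are a toy illustration, not the paper's formalism:

```python
# A formula is a nested tuple: ('var', name), ('and', f, g), or ('or', f, g);
# a projection maps each variable name to 0, 1, or a variable name.

def apply_projection(formula, proj):
    op = formula[0]
    if op == 'var':
        # Default (name missing from proj) means "leave alone".
        target = proj.get(formula[1], formula[1])
        return target if target in (0, 1) else ('var', target)
    return (op,) + tuple(apply_projection(sub, proj) for sub in formula[1:])

# Read-once formula (x1 AND x2) OR (x3 AND x4):
f = ('or', ('and', ('var', 'x1'), ('var', 'x2')),
           ('and', ('var', 'x3'), ('var', 'x4')))

# Rule 1: fix x1 := 1; Rule 2: leave x2 alone; Rule 3: rename x4 to x3.
proj = {'x1': 1, 'x4': 'x3'}
print(apply_projection(f, proj))
# ('or', ('and', 1, ('var', 'x2')), ('and', ('var', 'x3'), ('var', 'x3')))
```

Note that the result is no longer read-once: x3 now occurs twice, which is exactly the entanglement discussed next.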
When only the first two rules are applied, the result is a restriction of $F$. Håstad’s 1986 proof technique analyzed random restrictions obtained by independently leaving each variable alone with some probability $p$ and then assigning a random 0-or-1 value to each variable not left alone. Restrictions applied to a read-once formula not only preserve its structure but also preserve read-onceness, which keeps the behavior of different parts of the formula independent. This independence is sacrificed by using projections, which could “entangle” different parts of the formula and thereby worsen bias.
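Sampling such a random restriction is simple; this sketch uses hypothetical names and a star for "left alone":

```python
import random

def random_restriction(variables, p, rng=random):
    """Hastad-style R_p: each variable stays free with probability p,
    otherwise it is set to a uniformly random bit."""
    rho = {}
    for v in variables:
        if rng.random() < p:
            rho[v] = '*'              # left alone (still a free variable)
        else:
            rho[v] = rng.randint(0, 1)  # fixed to a random constant
    return rho

rng = random.Random(0)  # seeded for reproducibility
rho = random_restriction([f'x{i}' for i in range(8)], p=0.25, rng=rng)
print(rho)
```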
The proof technique both then and now works by reductio ad absurdum on the depth $d$. The hard function $f_d$ is always a monotone alternating $\wedge$–$\vee$ formula of a kind introduced by Mike Sipser, with the variables at the inputs; whether the output gate is $\wedge$ or $\vee$ depends on the parity of $d$. It is “tuned” by the choice of fan-in at each level. Håstad had shown that with the right combination of fan-ins and restriction probability, a random restriction makes $f_d$ equivalent to a formula with a large enough expected number of variables that embeds the relevant structure of $f_{d-1}$. A circuit of small enough size and depth, however, with high probability gets beaten down by the restrictions into a simpler form at a pace that cannot be sustained for keeping up with $f_d$.
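The alternating read-once shape can be sketched as follows; the fan-ins here are a toy choice, not the paper's actual tuning:

```python
# A depth-d alternating read-once AND/OR tree in the spirit of the
# Sipser functions (toy parameters, not RST's tuning).

def sipser_like(depth, fanin, inputs, top='and'):
    """Evaluate an alternating tree: `top` gate over `fanin` subtrees,
    alternating AND/OR down to the input bits, consumed left to right."""
    if depth == 0:
        return inputs.pop(0)
    nxt = 'or' if top == 'and' else 'and'
    vals = [sipser_like(depth - 1, fanin, inputs, nxt) for _ in range(fanin)]
    return all(vals) if top == 'and' else any(vals)

# Depth 2, fan-in 2 reads 4 input bits: AND(OR(b0,b1), OR(b2,b3)).
print(sipser_like(2, 2, [1, 0, 0, 1]))  # True
print(sipser_like(2, 2, [1, 0, 0, 0]))  # False
```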
The reductio can then continue with $f_{d-1}$, until an obvious contradiction is reached on going from depth $2$ to depth $1$ with the circuit that results. One can with more care actually structure the proof as a standard induction going from lower bounds for $f_{d-1}$ to the desired ones for $f_d$, but the mechanism is the same either way.
For an average-case argument one also needs to preserve the balance between arguments giving $f(x) = 1$ and those giving $f(x) = 0$. Otherwise the trivial circuit that always says “no,” or its partner that always says “yes,” already has more advantage than the desired conclusion allows. This is where one might first think of the third projection rule as counter-productive. By identifying $x_i$ and $x_j$, any “bias” in the shared variable might be propagated to more parts of the formula. However, this also intuitively creates greater sensitivity in terms of influence as defined above. If $f$ is sensitive at $x$, then letting $x'$ be $x$ with bit $i$ flipped, we have that $f(x)$ and $f(x')$ balance each other.
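A brute-force check shows concretely how identifying variables shifts the balance; the helper name is hypothetical:

```python
from itertools import product

def bias(f, n):
    """Fraction of the 2^n inputs on which f outputs 1."""
    return sum(f(x) for x in product((0, 1), repeat=n)) / 2 ** n

# Identifying the two variables of AND(x1, x2) into one variable y
# turns it into AND(y, y) = y, shifting the acceptance probability
# from 1/4 to 1/2: projections can move exactly the balance that the
# average-case argument must preserve.
print(bias(lambda x: x[0] & x[1], 2))  # 0.25
print(bias(lambda x: x[0] & x[0], 1))  # 0.5
```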
Rossman, Servedio, and Tan craft the argument, using a slightly different “tuning” of $f_d$ from Håstad’s, so that the benefits carry the day: $f_d$ retains its structure, balance, and hardness, but the prospective smaller-depth circuit “collapses.” As they note, random projections had been used in a 2001 paper by Russell Impagliazzo and Nathan Segerlind for a proof complexity lower bound, but apparently had not been applied so bluntly to circuits.
Their projections first determine blocks of variables that get mapped to the same new variable $y_B$, one for each block $B$. Then they can be specified by a distribution over restrictions, thus simplifying the randomness analysis. If the circuit has a gate that accesses some $x_i$ and some negated $\overline{x_j}$ with $i$ and $j$ in the same block $B$, then after the projection the gate accesses both $y_B$ and $\overline{y_B}$ and hence can be collapsed to $0$ (if the gate is AND) or $1$ (if OR). Another difference from Håstad’s technique is that they need to adjust the projection probability adaptively depending on outcomes at previous depths. This is the point where we refer to their 59-page paper for details, but they also have excellent further explanations in sections 4 and 7.
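The gate-collapsing step can be illustrated in miniature; the representation and names are hypothetical:

```python
# After a block projection, a gate reading literals y and NOT-y collapses:
# AND(..., y, not y, ...) = 0 and OR(..., y, not y, ...) = 1.
# Literals are (name, positive?) pairs.

def collapse(gate, literals):
    names_pos = {n for n, pos in literals if pos}
    names_neg = {n for n, pos in literals if not pos}
    if names_pos & names_neg:    # some variable appears with its negation
        return 0 if gate == 'and' else 1
    return None                  # no collapse forced

def project_literals(literals, block_of):
    """Map each variable to its block's fresh variable y_B."""
    return [(block_of.get(n, n), pos) for n, pos in literals]

# x1 and x3 land in the same block, so the projected gate reads y and NOT-y.
blocks = {'x1': 'y', 'x3': 'y'}
gate_lits = [('x1', True), ('x2', True), ('x3', False)]
projected = project_literals(gate_lits, blocks)
print(collapse('and', projected))  # 0
print(collapse('or', projected))   # 1
```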
Dick and I are excited because we have thought about inductive lower bound arguments whose base cases involve read-once formulas and whose inductive steps are projections. In a post with Dick three years ago I described one case that may retain interest even though the statement it was trying to prove turned out to be false.
My planned attack began with a “niceness” theorem for functions having a funny kind of read-once arithmetical formula that has powering as a primitive operation (for instance, $x^3$ counts as one occurrence of the variable, not three, in contrast to $x \cdot x \cdot x$) but does not have an additive constant in any subterm. I proved that the first partial derivatives of such a function form a Gröbner basis under any admissible monomial ordering. If one allows additive constants, then the partials still bound such a Gröbner basis. The argument is less simple than one might expect and does not obviously extend to higher derivatives (at least not obviously to me, as I’ve sat on revisions of a paper for over a decade while trying to prove that). But the theorem is still good enough to use as a base case and ask:
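One can experiment with such bases using SymPy (an assumption: the example formula and the library choice are mine, not from the post):

```python
# Requires sympy. f below is a read-once formula with powering as a
# primitive and no additive constants, e.g. f = ((x + y) * z)**2.
from sympy import symbols, groebner, diff, expand

x, y, z = symbols('x y z')
f = ((x + y) * z) ** 2

# The objects of interest: the first partial derivatives of f.
partials = [expand(diff(f, v)) for v in (x, y, z)]
G = groebner(partials, x, y, z, order='lex')
print(G.exprs)

# Sanity check: each generator reduces to zero modulo the computed basis,
# so the basis generates the same ideal as the partials.
print(all(G.reduce(p)[1] == 0 for p in partials))
```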
How fast can various complexity measures associated to Gröbner bases grow under successive applications of (random) projections?
My computer runs back then were sobering: even after just a few projections in some targeted cases the answer was, mighty fast. And as I related in the post, the desired assertion about a certain “monomial complexity” measure was refuted. But there are other formulations and conjectures and measures to try, and this new result in the Boolean case may give encouragement to try them. There is also the difference that in my case the read-once formula was “easy” and was derived from a given function we are trying to prove “hard” by undoing projections, whereas in the new case the functions are read-once but are the ones we are showing hard for lower-depth circuits. So perhaps this all runs in the opposite direction—but still the prospect of new ways to control projections is pretty cool.
What new results may come out of this breakthrough lower bound technique?