Structuring his famous proof to build more on it

Ryan Williams is a deep theorist who is known among other things for his frontier-setting lower bounds against ${\mathsf{ACC}}$ circuits. There is also a professional football player named Ryan Williams who played for Virginia Tech in the ACC college conference. The football Ryan had a sub-par showing at the 2011 NFL Combine, which is a series of exercises use by pro scouts to evaluate players, but still became an early second-round pick in the 2011 NFL Draft. Unfortunately he was injured for the entire 2011 NFL season and is out for the rest of this season too. Happily injuries are less of a concern for our Ryan Williams, who married a Virginia.

Today we make a “combine” out of Ryan’s famous ${\mathsf{ACC}}$ lower bound proof. That is, we present it as a series of separate tasks that let us scout opportunities to improve the theorems.

Of course the proof is completely his—this is nothing more than a rearrangement to emphasize different ways that computational problems related to succinctness are employed. We tried to explain the proof two years ago right after it came out, but were not completely successful. Here we will try again with this particular theorem in the paper:

Theorem 1. The complexity class ${\mathsf{E}^{\mathsf{NP}}}$ is not contained in ${\mathsf{ACC}}$.

Ryan proves further lower bounds in his paper, but focusing on this one enables us to skirt some technical issues and make the presentation mostly self-contained. We will try to see what properties of ${\mathsf{ACC}}$ are exploited, and what hypotheses are needed, in order to make this apply to a general circuit class ${{\cal C}}$.

## Circuit Descriptions

Our framework builds on the notion of circuits that describe a Boolean string. For a circuit ${C}$ with inputs ${x}$, let ${C(x)}$ be the result of evaluating ${C}$ on the input ${x}$. Also we identify strings of length ${n}$ with the integers ${0,1,2,\dots,2^{n}-1}.$

Definition 2. Say that circuit ${D}$ describes the Boolean string ${A}$ provided

$\displaystyle A = D(0)D(1)\cdots D(2^{n}-1).$

We will use ${D \rightarrow A}$ to denote this.

The similar capital letter ${A}$ is deliberate. ${A}$ can be an encoding ${E(C)}$ of another circuit ${C}$, or a computation by a circuit, or an assignment to a Boolean formula on ${2^n}$ variables.

We encode general Boolean circuits as directed acyclic graphs using gate types AND, OR, NOT, and INPUT of fan-in at most two. The size of such a circuit is the number of gates. An ${\mathsf{ACC}}$ circuit allows unbounded fan-in gates AND, OR, and MOD${(m)}$ for some fixed ${m}$ as usual. Its size is the number of wires, and its depth is a constant independent of the number ${n}$ of input gates.

We allow sizes that are greater than polynomial—indeed we write ${q(n)}$ to mean a quasi-polynomial function, namely one bounded by ${2^{p(\log n)}}$ for some polynomial ${p}$. Note that a circuit of size ${s}$ needs only ${O(s \log s)}$ bits to encode. Extra log factors will affect some (sub-)exponential running times, but will not matter to the circuit size bounds. We write ${q\mathsf{ACC}}$ to denote ${\mathsf{ACC}}$ circuits of size ${q(n)}$. Ryan’s paper proves bounds against circuits of size ${q(n)}$ where ${q(n)}$ is a higher function; we keep ${q(n)}$‘s quasi-polynomial to save on details, but hope the general ${q}$ notation helps readers who wish to proceed to the stronger results.

## Circuit Problems and the Key Property

For a moment think of ${D}$ as any kind of circuit or Turing machine, or even a non-deterministic random-access machine. Given ${D}$ and a length ${n}$, consider these four problems:

1. Evaluate: Given an input ${x}$ of length ${n}$, what is ${D(x)}$?

2. Evaluate-all: Evaluate ${D(x)}$ on all inputs ${x}$ of length ${n}$.

3. Satisfiability: Is there an ${x}$ of length ${n}$ such that ${D(x) = 1}$?

4. Equivalence: Given circuits ${D,E}$, is there (not) an input ${x}$ of length ${n}$ such that ${D(x) \neq E(x)}$?

If ${D}$ is a circuit of encoding size ${t = O(s\log s)}$, or a machine running in time ${t}$, then the notional times to solve these problems deterministically are ${O(t)}$ for the first, and ${O(2^n t)}$ for the others. If ${D}$ (and ${E}$) are a non-deterministic machine(s) running in time ${t}$, then there exists a small witness for satisfiability, but not necessarily for inequivalence. An important detail is that even if ${D}$ is a non-deterministic RAM running in time ${t}$, there is a non-deterministic TM that simulates it with only ${O(\log t)}$ overhead (by “reduction to sorting”). This implies we can get a witness of size ${O(t\;\mathsf{polylog}\;t)}$, and also allows us to apply an efficient Cook-Levin construction that works for NTM’s.

The key property of the circuit class ${q\mathsf{ACC}}$ is that it can beat the notional time for evaluate-all, in a way that enables even better times for the last two problems. We have talked in general before about new lower bounds flowing from new upper bounds. The general framework is:

1. Prove a better-than-notional upper bound for the target class ${{\cal C}}$ on one of the problems.
2. Make a hypothesis ${H}$ that would amplify the power of ${{\cal C}}$.

3. Show this implies a better-than-notional speedup that is actually impossible.

4. Conclude ${\neg H}$ as a theorem, from which lower bounds follow.

Ryan’s proof translates a tiny savings for satisfiability of ${q\mathsf{ACC}}$ circuits into a bigger savings for exponential-time NTMs, one that violates the nondeterministic time hierarchy theorem.

## The ACC Algorithms

First we state what is called the “fast evaluation lemma”:

Lemma 3. For any quasi-polynomial ${q(n)}$, we can build an algorithm that evaluates a given ${n}$-input ${q\mathsf{ACC}}$-circuit ${C}$ on all ${2^n}$ inputs in time ${2^n n^{O(1)}}$.

There are three proofs of this now: a matrix based one, a dynamic programming one, and a newer one based on a simple divide and conquer method. For now just take this as true. Now we see how this can be amplified to create a better time for satisfiability and equivalence. We call the following the fast ${\mathsf{SAT}}$ algorithm for ${q\mathsf{ACC}}$ circuits.

Theorem 4. For any ${\alpha > 0}$ and quasi-polynomial function ${q(n)}$, we can build an algorithm that decides whether a given ${n}$-input ${q\mathsf{ACC}}$-circuit ${C}$ is satisfiable in time ${ 2^{n}/n^{\alpha}.}$

Proof. Select ${\ell}$ of the inputs of the circuit ${C}$. Form all the circuits ${C_{0},\dots,C_{2^{\ell}-1}}$ by setting the ${\ell}$ inputs to all possible values. Then form the new circuit ${D}$ that is the OR of all these circuits. Clearly ${D}$ is satisfiable if and only if ${C}$ is. Further ${D}$ is a ${q\mathsf{ACC}}$ circuit. Now evaluate ${C}$ on all the ${n-\ell}$ inputs. By the fast evaluation lemma this takes time

$\displaystyle 2^{n-\ell}n^{O(1)}.$

It suffices to make ${\ell}$ a large enough multiple of ${\log n}$ to achieve time ${2^{n}/n^{\alpha}}$. $\Box$

Note that there is some slack: we could take ${\ell(n)}$ to be a polynomial in ${\log n}$, and ${D}$ would still have quasi-polynomial size. We are abusing notation in whether ${q\mathsf{ACC}}$ means quasi-polynomial in general or a particulat ${q(n)}$ size, but the usage should be clear in context.

The additional closure properties needed of a circuit class ${{\cal C}}$, besides the key speedup, are closure under exclusive-or as used here, closure under composition as used later, and a mix of composition and Boolean operations needed for the “replacement lemma” below. It is not clear whether we need to make the depth ${d}$ of the composed ${q\mathsf{ACC}}$ circuits explicit in the framework. The particular depth values matter most to quantifying optimal bounds in Ryan’s proof. If the depth needs to be stated up front, at least it applies when trying to prove lower bounds against ${\mathsf{TC^0}}$ in place of ${\mathsf{ACC}}$.

## The Hypothesis

The key hypothesis we state for sake of contradiction is:

Hypothesis H. The complexity class ${\mathsf{E}^{\mathsf{NP}}}$ has ${q\mathsf{ACC}}$ circuits.

Note this also implies ${\mathsf{P}}$ has ${q\mathsf{ACC}}$ circuits, which we call hypothesis H*.

Hypothesis H is stronger than the actual assumption used in Ryan’s paper. The point here is that this simpler assumption will make some of the details less technical, while the ideas and strategy are exactly the same as in the full version. The paper also proves lower bounds against ${\mathsf{NEXP}}$, which is not known to contain ${\mathsf{E^{NP}}}$. For this an “easy-witness” property that follows from ${\mathsf{NEXP} \subseteq \mathsf{P/poly}}$ is used, but this time we will avoid it.

Having explained why ${q\mathsf{ACC}}$ is the target for the lower bound, why do we have ${\mathsf{E}^{\mathsf{NP}}}$ at the upper end? Part of this comes from the following meta-comment about the general approach: In many parts of the proof circuits of type ${{\cal C}}$ have to be constructed by an algorithm. In many cases there is no way to deterministically construct the circuits—at least none that are known. The approach used is based on a three-step method:

1. Show via the above assumption that the required circuit ${D}$ must exist. This only requires that the computation that the circuit performs can be done in the class ${\mathsf{E}^{\mathsf{NP}}}$.

2. Guess the required circuit ${D}$.

3. Then verify that the guessed circuit ${D}$ is correct. This is exactly where the speedups on equivalence and satisfiability are used.

The special reason for ${\mathsf{E}^{\mathsf{NP}}}$ is that the construction builds exponentially large strings that are used as oracle queries to an ${\mathsf{NP}}$ oracle. The hypotheses ${H}$ and ${H^*}$ allow us to replace all this by ${q\mathsf{ACC}}$ circuits.

## Replacement Lemma

Given a language ${L}$, let ${L^{=n}}$ stand for the ${2^n}$-length bit string of membership results on ${\{0,1\}^n}$. The proof needs only the weaker hypothesis H*, twice.

Lemma 5. Suppose ${\mathsf{P} \subseteq q\mathsf{ACC}}$ and let ${L \in \mathsf{P}}$. Then for any ${\alpha>0}$, we can build a nondeterministic algorithm ${\cal R}$ that runs in time ${2^{n}/n^{\alpha}}$, and on input ${0^n}$ outputs a ${q\mathsf{ACC}}$ circuit ${D_n}$ such that

$\displaystyle D_n \rightarrow L^{=n}.$

Proof. By ${L \in \mathsf{P}}$, the algorithm ${\cal R}$ can generate a uniform sequence ${[C_n]}$ of polynomial-size circuits such that ${C_n \rightarrow L^{=n}}$. The goal is to guess ${D_n}$ and verify ${D_n \equiv C_n}$, so that ${D_n \rightarrow L^{=n}}$. However, we cannot directly appeal to equivalence testing being in ${2^n/n^\alpha}$ time because ${C_n}$ is not a ${q\mathsf{ACC}}$ circuit but a general circuit.

The trick is to guess a ${D}$ that verifies the computation of ${C_n}$ doing evaluate-all on ${\{0,1\}^n}$. This is represented by the language

$\displaystyle L' = \{\langle i,g \rangle: \text{gate~} g \text{~of~} C_n \text{~has value~} 1 \text{~on input~} i\},$

where we think of ${i \in [0,\dots,2^n - 1]}$. Then ${L'}$ is also in ${\mathsf{P}}$, so applying hypothesis ${H^*}$ here implies the ability to guess a ${q\mathsf{ACC}}$-circuit ${D}$ with ${n' = n + \beta\log_n}$ inputs (for some fixed ${\beta}$) such that ${D \rightarrow (L')^{=n'}}$. The point is that we can verify the correctness of ${D}$ locally at all gates, including the final output gate ${g_0}$.

To do this verification, we need to represent the encoding ${E(C_n)}$ of ${C_n}$. We could do this as a single layer of gates computing functions ${u \wedge \neg u}$ where ${E(C_n)}$ has a ${0}$ and ${u \vee \neg u}$ where ${E(C_n)}$ has a ${1}$. However, to illustrate what we can build in ${q\mathsf{ACC}}$, we instead use ${H^*}$ again to guess a ${q\mathsf{ACC}}$-circuit ${B}$ with ${O(\log n)}$ inputs such that ${B \rightarrow E(C)}$. In this case ${B}$ can in fact have polynomial size, and verifying ${B}$ can be done by brute force evaluation on all possible inputs.

The final circuit ${D_n}$ stitches together ${B}$ and ${D}$ and ultimately outputs ${D(0,g_0)\cdots D(2^n - 1, g_0)}$. This evaluate-all task operates within time ${2^n/n^\alpha}$ since ${D_n}$ is a ${q\mathsf{ACC}}$-circuit. It remains to argue that the verification of the guessed circuit ${D}$, in-tandem with the already-verified circuit ${B}$, can be reduced to ${q\mathsf{ACC}}$${\mathsf{SAT}}$ and hence completed in ${2^n/n^\alpha}$ time. For this we have ${{\cal R}}$ build out of ${B}$ and ${D}$ yet one more ${q\mathsf{ACC}}$ circuit ${D'}$ with ${n'' = n + 3\beta\log n}$ inputs that works as follows:

On input ${\langle i,g,g_1,g_2\rangle}$, if ${g_1,g_2}$ are not the inputs to ${g}$ according to ${E(C_n)}$ then reject. If so, make calls ${D(i,g_1)}$ and ${D(i,g_2)}$ to get their values, and compare with the value ${D(i,g)}$. If the values are not correct for the gate ${g}$, then accept, else reject.

Then ${D_n}$ is correct if and only if ${D'}$ is unsatisfiable. Given an exponent ${\alpha'}$ from Theorem 4, this takes time

$\displaystyle \frac{2^{n''}}{(n'')^{\alpha'}} = \frac{2^{n + 3\beta\log n}}{(n+3\beta\log n)^{\alpha'}} < \frac{2^n n^{3\beta}}{(2n)^{\alpha'}} < \frac{2^n}{n^{\alpha' -3 \beta + o(1)}}.$

Taking ${\alpha'}$ great enough, as we are able to do, makes the denominator exceed ${n^{\alpha}}$. Note that ${D'}$ is a Boolean combination of three calls to ${D}$ and three references to ${E(C_n)}$, so it is a ${q\mathsf{ACC}}$ circuit. $\Box$

## Checking Lemma

The remaining tasks translate an NTM running in time ${t}$ into a succinct formula on ${O(t\log t)}$ variables, using a known efficient form of the Cook-Levin construction. The following “checking lemma” helps us verify assignments to such formulas.

Lemma 6. Suppose that ${D_{1} \rightarrow A}$ and ${D_{2} \rightarrow B}$, where ${D_{1}}$ and ${D_{2}}$ are ${q\mathsf{ACC}}$ circuits with ${n}$ inputs. Then for any ${\alpha>0}$, there is a deterministic algorithm that runs in time ${2^{n}/n^{\alpha}}$ and checks whether ${A}$ is a valid assignment to the 3-CNF formula encoded by ${B}$.

Proof. We will construct a new ${q\mathsf{ACC}}$ circuit ${E}$ with ${n}$ inputs. Essentially the circuit “looks for” a clause that is unsatisfied. The circuit ${E}$ on input ${i}$ computes the ${i^{th}}$ clause encoded by ${D_{2}}$. It does this by accessing ${n}$ consecutive bits of the string ${B}$, using ${D_{2}}$ on ${n}$ consecutive locations to do this. If these bits do not encode a clause, the algorithm rejects. If they do encode a clause, it then finds the three variables ${u,v,w}$ used in the clause and whether they are negated. Then it looks up the values of ${u,v,w}$ by using ${D_{1}}$ three times. Finally if the clause is not satisfied it returns accept; otherwise, it returns reject. It is clear this can be done as a ${q\mathsf{ACC}}$ circuit with ${n}$ inputs.

The algorithm then must determine whether or not ${E}$ ever accepts. Note ${E(i)}$ accepts only if ${i}$ “points” to a proper clause in ${B}$ that is not satisfied. Similar to the end of the replacement lemma’s proof, it does this by appeal to the fast ${q\mathsf{ACC}}$${\mathsf{SAT}}$ algorithm. $\Box$

We note that the checking lemma could be extended to check other “local” properties, but this is enough for the main proof.

## Strong Cook-Levin Construction

The following is a known theorem that exploits the locality of the Cook-Levin construction.

Theorem 7. There is a constant ${\alpha>0}$ such that for any language ${L}$ in ${\mathsf{NTIME}[2^{n}]}$ there is a deterministic polynomial time algorithm that given input ${x}$ of length ${n}$ outputs a circuit ${C}$ with at most ${n + \alpha \log n}$ inputs and size ${O(n^{\alpha})}$ such that ${C \rightarrow F}$, where ${F}$ is a 3-CNF formula of size at most ${2^{n}\cdot \mathsf{poly}(n)}$ such that ${x}$ is in ${L}$ if and only if ${F}$ is satisfiable.

With ${t(n) = 2^n}$, the proof takes a time-${t(n)}$ witness predicate ${R(x,y)}$ for ${L}$, converts it into circuits ${E}$ of size ${O(t\log t)}$ that decide ${R}$, and writes a 3-CNF formula ${F}$ in ${O(t\log t)}$ variables whose clauses verify that each gate ${g}$ of ${E}$ has correct output for its inputs.

Corollary 8. Using the notation of the above theorem, whenever ${x}$ is a string of length ${n}$ belonging to ${L}$, there exists a ${q\mathsf{ACC}}$ circuit ${W}$ with at most ${n +\alpha \log n}$ inputs such that ${W \rightarrow A}$, where ${A}$ is the lexicographically first satisfying assignment to ${F}$.

Proof. Construct the formula ${F}$ which is of size at most ${2^{n}\cdot \mathsf{poly}(n)}$. Use the ${\mathsf{NP}}$ oracle to find a satisfying assignment, and use it further to do a binary search that finds the lexicographically first one. Since this is done by an ${\mathsf{E^{NP}}}$ algorithm, Hypothesis ${H}$ implies the existence of equivalent circuits in ${q\mathsf{ACC}}$. $\Box$

To explain a potentially confusing point, note that the proof requires making calls to the ${\mathsf{NP}}$ oracle on exponentially large strings, since the formula ${F}$ is exponentially large. This is allowed in ${\mathsf{E^{NP}}}$.

Assembling The Proof

Now we combine all these parts into the final theorem and proof.

Theorem 9. The class ${\mathsf{E}^{\mathsf{NP}}}$ is not contained in ${q\mathsf{ACC}}$.

Proof. Recall we begin by assuming that ${\mathsf{E}^{\mathsf{NP}}}$ is contained in ${q\mathsf{ACC}}$, which entails that ${\mathsf{P}}$ is also in ${q\mathsf{ACC}.}$

Now suppose we have a language ${L}$ that is accepted by an NTM running in time ${2^{n}}$. We will show that it can be accepted by a NTM that runs in time ${2^{n}/n^{\alpha}}$ for some ${\alpha > 0}$. This contradiction of the nondeterministic time hierarchy theorem completes the proof.

We break the proof into steps. Let ${x}$ be an input.

(1): Construct a polynomial-size general circuit ${C}$ so that ${C \rightarrow F}$ where ${F}$ is a 3-CNF formula that is satisfiable if and only if ${x}$ is in ${L}$. Note ${F}$ is of size ${2^{n}n^{O(1)}}$ and thus ${C}$ has ${n + \beta \log n}$ inputs for some ${\beta > 0}$. Note the existence and the ability to create the circuit in polynomial time are based on the Strong Cook-Levin Theorem 7.

(2): Use our replacement lemma 5 to find a ${q\mathsf{ACC}}$ circuit ${D_n}$ so that ${D_n \rightarrow F}$. This circuit has quasi-polynomial size and ${n + \beta \log n}$ inputs—not necessarily the same ${\beta}$ as for the circuit ${D}$ in the proof of that lemma.

(3): Now guess another ${q\mathsf{ACC}}$ circuit ${E}$ with ${n + \beta \log n}$ inputs and quasi-polynomial size so that ${E \rightarrow A}$ where ${A}$ is satisfying assignment to ${F}$. This uses Corollary 8.

(4): Finally accept if ${A}$ is actually such an assignment. This uses the checking lemma 6.

All of this can be done in the required time, and we have a contradiction. $\Box$

## Open Problems

Can we apply the framework to prove lower bounds against other classes ${{\cal C}}$? Does this help us seek for such ${{\cal C}}$? Can we do it with ${{\cal C} = \mathsf{TC^0}}$ or even ${{\cal C} = \mathsf{NC^1}}$? Can assuming ${\mathsf{\#P} \subseteq \mathsf{NC^1}}$ help with the latter?

1. November 19, 2012 7:03 pm

Thanks, Pip! I think that this way of partitioning up the proof does shed more light.

I believe that a faster “evaluate-all” algorithm for a circuit class should, in principle, imply stronger lower bounds against that circuit class than just NEXP (e.g., EXP). But a different kind of argument will be required; in the present one, it seems we have to use guessing in order to get our hands on the small witness circuit.