The buck stops here—on a blog, that is

Stasys Jukna has written a comprehensive book on Boolean circuit complexity, called Boolean Function Complexity: Advances and Frontiers. It includes a discussion of Mike Fischer’s Theorem on negations, which we recently re-gifted.

Today Ken and I would like to fill in some missing details to Mike’s famous result.

Recall the theorem says:

Theorem 1 Let ${b(n) = \lceil \log(n+1) \rceil}$. Then if a Boolean function can be computed by a circuit of size ${S}$ over the basis ${\{\vee,\wedge,\neg\}}$ then it can be computed by a circuit of size ${2S + n^{O(1)}}$ over the same basis and only using at most ${b(n)}$ negations.
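To get a feel for how small the negation budget is, here is a minimal sketch that tabulates ${b(n)}$. It assumes the logarithm is base 2, which the statement does not spell out; the function name `b` simply mirrors the theorem's notation.

```python
def b(n: int) -> int:
    """Return ceil(log2(n + 1)), assuming the theorem's log is base 2."""
    # For integers n >= 0, ceil(log2(n + 1)) equals the bit length of n,
    # which avoids floating-point rounding issues.
    return n.bit_length()


for n in (1, 2, 3, 4, 7, 8, 1000):
    print(n, b(n))
```

So a thousand inputs need at most ten negations under this reading.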

There are a number of places where the “proof” of this theorem is given: in Fischer’s original paper, in more recent improvements, and in Jukna’s book. All say essentially: “we leave to the reader the remaining details.” In the book the details are sketched and left as an exercise, with a long hint. We have discussed this before; see here.

## Proving and Using

There is nothing wrong with leaving out details, but when everyone does so it can cause a problem. I looked at the proof sketches and thought for a bit (okay, more like a byte) that there might even be an error. There is none. The proof is just fine. I was wrong.

But there is some reason to be concerned. For all the beauty of Fischer’s Theorem, it does not seem to have been used to prove something else. I would argue that the real way we get confidence in mathematical results is not by checking their proofs, but by a social process. This process can be improving the main result, which has been done in Fischer’s case. However, these improvements mostly affect other parts of the proof and still leave some details unexplained.

The best way to avoid being concerned and gain confidence in a theorem is to use it to prove results. Who does not believe that numbers modulo a prime form a field? This has been used over and over. A theorem that is used to prove other theorems is much more likely to be correct. If it has a bug and is wrong, there is a kind of strange-attractor phenomenon that tends to lead to a contradiction. Or if not an out-and-out contradiction, it may at least lead to a result that is so surprising as to make one doubt the original theorem.

An example of a theorem where maybe only a couple dozen people have fully vetted the proof, but the result is used all the time, is the ${O(\log n)}$-depth, ${O(n \log n)}$-size sorting network of comparison gates designed by Miklós Ajtai, János Komlós, and Endre Szemerédi (AKS). A comparison gate has two input values ${x_1,x_2}$ and gives two output values, ${y_1 = \min\{x_1,x_2\}}$ and ${y_2 = \max\{x_1,x_2\}}$. The ${O}$ hides a big constant (the network is galactic), but its improvement over Ken Batcher’s ${O(\log^2 n)}$-depth network was a boon to studying low-depth circuit complexity. If it were wrong, we’d expect to have seen some unbelievable circuit results by now.

## Fischer’s Proof

Let me try and explain the proof and give you all—well most—of the details, and I will try not to leave anything to the reader. As we recently posted, it suffices to invert the input string ${w = w_1 w_2 \cdots w_n}$ so that we have also the sequence ${(\bar{w}_1,\bar{w}_2,\dots ,\bar{w}_n)}$ using only ${b(n)}$ negations. Then we can compute ${f(w)}$ by a monotone circuit of those ${2n}$ values, at worst doubling the size from ${S}$ to ${2S}$, and in a sense given at the end of this post we can even do a little better.

The first step is to sort the input bits. This can be done by simulating comparison gates without any negations, since when ${x_1,x_2}$ are bits, ${y_1}$ is the AND and ${y_2}$ is the OR. If we care strongly about low depth we can use the AKS network for this, but we could also use Batcher’s. Or whatever.
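As a minimal sketch of this step, the code below sorts bits with a fixed comparator network in which each comparison gate is just an AND and an OR. Note the swap: it uses the simple odd-even transposition network (depth ${n}$, quadratic size) rather than AKS or Batcher, purely because it is short to write down. The names `comparator` and `sort_bits` are illustrative.

```python
def comparator(x1: int, x2: int) -> tuple[int, int]:
    """Comparison gate on bits: min is AND, max is OR, so no negation is used."""
    return x1 & x2, x1 | x2


def sort_bits(w: list[int]) -> list[int]:
    """Sort bits in ascending order with a data-independent comparator network.

    This is odd-even transposition sort: n rounds of adjacent comparators.
    """
    x = list(w)
    n = len(x)
    for rnd in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            x[i], x[i + 1] = comparator(x[i], x[i + 1])
    return x


print(sort_bits([1, 0, 1, 1, 0]))
```

The wiring never depends on the data, which is what lets the same network be read as a monotone circuit.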

The second part of the proof, which is critical for us since it uses the negations, is the following: Let ${x_{1},\dots x_{n}}$ be the ${n}$ inputs sorted so that

$\displaystyle x_{1} \le \cdots \le x_{n}.$

Just to add the obvious, which is not always obvious to all, the list must look like

$\displaystyle \underbrace{0 \dots 0}_{k} \ \underbrace{1 \dots 1}_{m}$

where ${n = k+m}$. The goal, given these sorted values, is to construct the vector

$\displaystyle y_{1},\dots,y_{n}$

so that each ${y_{i}}$ is equal to ${\neg x_{i}}$.

The third and final part is to work the initial sorting backwards so that the outputs ${ y_{1},\dots,y_{n}}$ get routed to their correct locations, corresponding to the ${w_i}$'s they negate. For those who care about reducing the size of the network, this is where much of the research action is. Fischer’s original idea, used also by Jukna, uses a trick, at the cost of the extra polynomial size (roughly quadratic) allowed by the theorem above.

The trick’s idea is to re-interpret the sorted bit ${x_{n-k+1}}$ as telling whether ${w}$ has at least ${k}$ 1’s—call this bit’s value ${t(k)}$. Now for each ${i}$, ${1 \leq i \leq n}$, repeat the initial sorting step on the string ${w}$ minus bit ${w_i}$, and say that the results give values ${t_i(k)}$. All of these ${n^2}$ values are obtained using monotone gates. The trick itself is that for all ${i}$,

$\displaystyle \neg w_i = \bigwedge_{k=1}^n (\neg t(k) \vee t_i(k)).$

The point is that if ${w_i = 0}$, then bit ${i}$ never makes a difference to any “threshold” ${k}$, so ${t(k) = t_i(k)}$ for all ${k}$, so one of ${\neg t(k)}$ and ${t_i(k)}$ is always true, so the big AND gives ${1}$. Whereas if ${w_i = 1}$ then there is some ${k}$ for which ${t(k)}$ is true but ${t_i(k)}$ isn’t, and the big AND gives ${0}$.
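The identity is easy to check by brute force for small ${n}$. The sketch below does so; for brevity it computes the thresholds by counting ones rather than by running a sorting network, and the helper names are made up for illustration.

```python
from itertools import product


def threshold(bits, k):
    """t(k): 1 if the bit string has at least k ones, else 0."""
    return 1 if sum(bits) >= k else 0


def negate_via_thresholds(w, i):
    """Evaluate the AND over k = 1..n of (NOT t(k) OR t_i(k)), with i 0-based."""
    n = len(w)
    w_minus_i = w[:i] + w[i + 1:]  # the string w with bit i removed
    result = 1
    for k in range(1, n + 1):
        t_k = threshold(w, k)
        t_i_k = threshold(w_minus_i, k)
        result &= (1 - t_k) | t_i_k
    return result


def check_identity(n):
    """Confirm the identity for every n-bit string and every position i."""
    return all(
        negate_via_thresholds(w, i) == 1 - w[i]
        for w in product((0, 1), repeat=n)
        for i in range(n)
    )


print(all(check_identity(n) for n in range(1, 7)))
```

Only the values ${\neg t(k)}$ involve negation here; the ${t_i(k)}$ are all monotone in the inputs.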

Thus the bits ${y_{1},\dots,y_{n}}$ are really supplying the values ${\neg t(1),\dots,\neg t(n)}$ needed for this trick to work. Later authors have had other sorting-based ideas that improve the size, but the second part where the negations arise is the same. With these full details digested, we can focus on this part.

## The Second Part

The claim is that for such sorted values it is possible to construct the vector

$\displaystyle y_{1},\dots,y_{n}$

so that each ${y_{i}}$ is equal to ${\neg x_{i}}$, by using a polynomial size circuit having only ${b(n)}$ negations. Let ${F_{n}(x)=y}$ be this function.

Let’s look at the intuition why this should be true. It is really just a simple divide-and-conquer recursion: one negation allows us to reduce the problem to one of half the size. This clearly yields a logarithmic bound. As usual we will assume that ${n}$ is a power of ${2}$. Look at the middle bit ${x_{m}}$ where ${m=n/2}$. There are two cases:

${\bullet }$ Case ${x_{m}=0}$. Since the inputs are sorted, we get that

$\displaystyle x_{1}=x_{2}=\cdots=x_{m}=0,$

so the first ${m}$ outputs, being negations, are all ${1}$. Thus,

$\displaystyle F_{n}(x) = 1^{m}F_{n/2}(x_{m+1},\dots,x_{n}).$

${\bullet }$ Case ${x_{m}=1}$. Then we get that

$\displaystyle x_{m}=x_{m+1}=\cdots=x_{n}=1,$

so the last ${m}$ outputs are all ${0}$. Thus,

$\displaystyle F_{n}(x) = F_{n/2}(x_{1},\dots,x_{m})0^{m}.$

So what is the difficulty? In pseudo-code we are doing the following:

if ${n=1}$ then

$\displaystyle \mathbf{return} \ \neg x_{1}.$

else if ${x_{m}=0}$ then

$\displaystyle \mathbf{return} \ 1^{m}F_{n/2}(x_{m+1},\dots,x_{n})$

else

$\displaystyle \mathbf{return} \ F_{n/2}(x_{1},\dots,x_{m})0^{m}.$
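In an ordinary language the recursion is a few lines. Here is a minimal Python sketch, assuming the input is already sorted in ascending order and its length is a power of ${2}$; the name `negate_sorted` is illustrative. Since every output is a negation, the half known to be all ${0}$ yields ${1}$'s and the half known to be all ${1}$ yields ${0}$'s.

```python
def negate_sorted(x: list[int]) -> list[int]:
    """Return the bitwise negation of ascending-sorted x (length a power of 2)."""
    n = len(x)
    if n == 1:
        return [1 - x[0]]
    m = n // 2
    if x[m - 1] == 0:
        # x_1..x_m are all 0, so they negate to all 1; recurse on the top half.
        return [1] * m + negate_sorted(x[m:])
    # x_m..x_n are all 1, so the top half negates to all 0; recurse on the bottom half.
    return negate_sorted(x[:m]) + [0] * m


print(negate_sorted([0, 0, 0, 1, 1, 1, 1, 1]))
```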

Well this would be just fine if we were using a programming language, but we are using circuits. They do not allow the full flexible array of constructs—at least not in the obvious way—that programming languages do. So this is the dirty detail that is “left to …”

The issue is that we must use circuits to do two things: (i) the recursive call on two different sets of variables based on the value of ${x_{m}}$; and (ii) the output of two different return values, again based on the value of ${x_{m}}$. All this must happen without incurring the cost of an extra negation.

Here is how we do this. Let ${t=x_{m}}$. We will have access to the values of ${t}$ and ${\bar{t}=\neg t}$. We will re-use these values many times, but of course we can do this all with one negation. Once ${t}$ and ${\bar{t}}$ are computed by the circuit, we can by fan-out use the value as many times as we wish. Of course we want to keep the size of the circuit polynomial, but that will follow.

The first problem is how to do two different calls to the circuit ${F_{n/2}}$ without incurring extra negations. If we naively just had a circuit for each call, we would wind up with a linear number of negations—we must avoid two recursive calls. So define new variables ${u_{1},\dots,u_{m}}$ as follows:

$\displaystyle u_{k} = (\bar{t} \wedge x_{m+k}) \vee (t \wedge x_{k}).$

Then we compute

$\displaystyle F_{n/2}(u) = w.$

Note that ${u}$ is set up so that it is one of the two possible calls, and we use the value of ${t}$ to decide which one to call. This uses no additional negations; needing any more would have been terrible.

The second problem is that we need to output different values based on the returned answer ${w}$ and the value of ${t}$. Since each output bit is the negation of the corresponding sorted input, if ${t=0}$ we want the output to be

$\displaystyle 1^{m}F_{n/2}(x_{m+1},\dots,x_{n}) = 1^{m}w,$

and if ${t=1}$ we want it to be

$\displaystyle F_{n/2}(x_{1},\dots,x_{m})0^{m} = w0^{m}.$

The way to do this in a circuit is as follows:

$\displaystyle \begin{array}{rcl} r &=& 1^{m}w \\ s &=& w0^{m}. \end{array}$

Then the ${k}$-th bit of the output is

$\displaystyle (\bar{t}\wedge r_{k}) \vee (t \wedge s_{k}).$

Done.
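To see the whole construction at once, here is a minimal Python sketch that evaluates the circuit gate by gate and counts the negations it uses. It assumes sorted input whose length is a power of ${2}$; the class `NegationCounter` and the helper `mux` are names invented for this illustration. Each output bit is the negation of the matching input, so the constant half is ${1^m}$ when ${t=0}$ and ${0^m}$ when ${t=1}$.

```python
class NegationCounter:
    """Counts how many NOT gates an evaluation uses."""

    def __init__(self) -> None:
        self.count = 0

    def neg(self, bit: int) -> int:
        self.count += 1
        return 1 - bit


def mux(t: int, t_bar: int, a: int, b: int) -> int:
    """Monotone selector: returns a when t = 0 and b when t = 1."""
    return (t_bar & a) | (t & b)


def invert_sorted(x: list[int], gates: NegationCounter) -> list[int]:
    """Negate every bit of ascending-sorted x using one NOT per recursion level."""
    n = len(x)
    if n == 1:
        return [gates.neg(x[0])]
    m = n // 2
    t = x[m - 1]
    t_bar = gates.neg(t)  # the only negation at this level, reused by fan-out
    # Pick the half that still needs work: top half if t = 0, bottom half if t = 1.
    u = [mux(t, t_bar, x[m + k], x[k]) for k in range(m)]
    w = invert_sorted(u, gates)  # a single recursive sub-circuit
    r = [1] * m + w  # the output when t = 0
    s = w + [0] * m  # the output when t = 1
    return [mux(t, t_bar, r[k], s[k]) for k in range(n)]


gates = NegationCounter()
print(invert_sorted([0, 0, 0, 1, 1, 1, 1, 1], gates), gates.count)
```

The negation count satisfies ${N(n) = N(n/2) + 1}$ with ${N(1) = 1}$, so ${N(n) = \log_2 n + 1}$, which equals ${b(n)}$ when ${n}$ is a power of ${2}$.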

## Open Problems

Ken and I hope this helped you feel comfortable with the proof. I know that I feel better about it now. In a sense this is all “obvious” to strong circuit programmers, just as certain other tasks are “obvious” to strong programmers in Python, or whatever your favorite language is. There is always a set of idioms that they know and use daily in their programming. These idioms are so well encoded into their brains that they do not see any reason to supply details. For those of us who are not strong circuit programmers, and that includes me, spelling out the details helps.

The real open problem is: can we use Fischer’s Theorem to prove some things that are new and interesting? We wonder.

3 Comments
1. January 12, 2014 5:26 pm

At the very least, this was a greatly entertaining post. Circuit programming proofs such as this one always have a nice *old-school* touch that is really refreshing in our times. A great Sunday evening treat.

2. Alex Lopez-Ortiz
January 13, 2014 12:27 am

I’m glad you’ve taken the time to flesh this out. In my academic career I’ve hesitated upon reading rather well known original proofs exactly four times. In two of them the proofs were amended (unprompted) by the authors within a month of publication, in the third the author admitted to the error via email, and for the last one I was never confident enough to contact the famous author, since my grasp of the subject was tenuous.

I was glad to hear very recently another well known researcher state similar doubts on this last and declare that the result stood, in his opinion, only because there was a second, much simpler proof that could be verified.

So it does happen, though in all cases the proof was eventually amended. Note that I’m not talking about typos. In all cases the errors were in crucial lemmas and were unfixable. The respective arguments in the end took very different paths which avoided the original lemmas altogether.

3. January 15, 2014 7:26 pm

SJ’s latest book on circuit complexity is imho one of the best ever written, its great you’re highlighting it, and its full of wondrous results spanning decades. there are few others in the world who could have written it. his style is also quite readable given such a difficult/technical topic. worthy of emulation. imho circuit complexity will eventually slay the P vs NP dragon. but we’re rapidly closing in on a half-century of work on it…. its truly an epic problem…. one is reminded of einsteins dictum that a problem cannot be solved on the same level of consciousness that created it….