You can fool all the functions some of the time, and some of the functions all the time, but you cannot fool all the functions all the time, or even most of the time.

Eric Blais and Li-Yang Tan are both complexity theorists. Eric just finished his PhD and is currently a Simons Postdoctoral Fellow at MIT, while Li-Yang is about to finish his doctorate at Columbia.

Today Ken and I wish to talk about their recent paper on approximations to Boolean functions.

The paper, entitled “Approximating Boolean functions with depth-2 circuits,” just appeared at CCC 2013. Neither of us was able to attend the meeting, but having read the paper I am quite excited about their result. I think it could have many interesting applications.

A circuit with ${s}$-many wires, viewed as a graph, can be described by a binary string of length ${\sim s\log s}$: list the gates in order with ${\log s}$-sized labels and give the target of each wire out of each gate. A string of that length can describe at most ${2^{O(s\log s)}}$ different circuits. Since there are ${2^{2^n}}$ Boolean functions ${f}$ on ${n}$ variables, one for each binary string of length ${2^n}$, it follows that many ${f}$ require size ${s}$ such that ${2s\log_2 s \geq 2^n - O(n)}$, so ${s = \Omega(2^n / n)}$. If there are ${m}$ arguments ${x}$ such that ${f(x) = 1}$, we can create a circuit in disjunctive normal form (DNF) by using a gadget of up to ${1 + n}$ NOT and AND gates to distinguish each such ${x}$, and running a wire from each gadget to a big OR with ${m}$ inputs. This has size about ${mn}$, which becomes ${n2^{n-1}}$ in the case of the parity function. The issue in their paper is whether strictly exponential bounds can be beaten when the function ${f}$ need only be computed approximately.
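The DNF construction just described is easy to check by brute force for small ${n}$. The following is a minimal Python sketch, not anything taken from the paper: the helper names `canonical_dnf`, `eval_dnf`, and `literal_count` are made up for illustration, and "size" is approximated by the number of literal occurrences ${mn}$, ignoring the NOT gates and the final OR.

```python
from itertools import product

def canonical_dnf(f, n):
    """Return one full-width term per input x with f(x) = 1.

    A term is the tuple x itself: bit i equal to 1 stands for the
    literal x_i, bit i equal to 0 stands for the literal NOT x_i.
    """
    return [x for x in product((0, 1), repeat=n) if f(x)]

def eval_dnf(terms, x):
    """The big OR over the gadgets: 1 iff some term matches x exactly."""
    return int(any(term == tuple(x) for term in terms))

def literal_count(terms, n):
    """Rough size proxy m * n: every term reads all n variables."""
    return len(terms) * n

def parity(x):
    return sum(x) % 2

n = 4
terms = canonical_dnf(parity, n)
dnf_is_exact = all(
    eval_dnf(terms, x) == parity(x) for x in product((0, 1), repeat=n)
)
parity_size = literal_count(terms, n)  # equals n * 2^(n-1) for parity
```

For parity exactly half of the inputs are satisfying, so the term list has ${2^{n-1}}$ entries and the literal count is ${n2^{n-1}}$, matching the figure in the text.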

Some History

You may skip this history if you like. In ancient times these functions were called switching functions—indeed even FOCS bore the name “Switching and Automata Theory.” Later they became Boolean functions. I am not sure when things switched—OK, bad pun.

The reason for the initial name was they were studied for their importance to the phone company, when there was one phone company. Such functions were used to control phone calls: even though in those days phone lines themselves were analog, the switching of calls was digital. You either connected Alice to Bob or Alice to Eve or Alice to …, even if their conversation was really just over a wire that carried their voice as an analog signal.

In these early times the idea of “${n}$” (as in ${n}$ people, ${n}$ gates, ${n}$ variables…) had not been invented yet. By this I mean that the theoretical interest of researchers was occupied even by Boolean (I mean switching) functions of a few variables. For example, Donald Davies studied the cost of implementing three-variable functions in his 1957 paper, Switching Functions of Three Variables. Such results were non-trivial, in some ways harder than the ${O}$-notation style work of today, owing to the use of ingenious special-purpose designs and encodings.

Davies worked at the National Physical Laboratory in the 1940’s, and is said to have “spotted mistakes in Turing’s seminal 1936 paper ‘On Computable Numbers,’ much to Turing’s annoyance.” Rest assured that Alan Turing’s main theorem on the halting problem is just fine.

Let’s leave switching functions, and move on to ${n}$-variable Boolean functions and the work of Blais and Tan.

The General Approximation Result

Suppose that you wish to compute a Boolean function, but it is very complex in some measure. A natural idea is to try to approximate it. Since the function returns either ${0}$ or ${1}$ it seems that an approximation in the usual sense is useless. What would it mean to answer ${0.9}$? Actually as I write this it could be a type of confidence level. If you say ${0.9}$ it means that you are pretty sure that the answer is ${1}$ but do not completely rule out that the answer is ${0}$. Hmmmm. This is not unlike what the weather forecasters do—tomorrow has a ${90\%}$ chance of rain. Perhaps another day we should discuss this type of approximation.

Well Blais and Tan use another notion of approximation—a quite standard one. Suppose that ${f(x)}$ is a Boolean function. For some small ${\epsilon>0}$ say that the function ${g(x)}$ ${\epsilon}$-approximates ${f(x)}$ provided

$\displaystyle f(x) = g(x)$

for all but at most an ${\epsilon}$ fraction of the inputs. If the function has ${n}$ inputs, then this means that ${g}$ must be right on at least

$\displaystyle (1 - \epsilon) 2^n$

of the inputs. Note that if ${g_0(x)}$ uses the first sense of approximation with real values above, then the function ${g(x)}$ obtained by rounding ${g_0(x)}$ may well meet this condition.

Clearly some functions are easy to approximate. The ${n}$-ary AND function is trivial: just say no all the time and you are right for all but one input. Also it would seem clear that more interesting functions might be quite a bit harder to approximate. The parity function comes to mind—a function that is impossible to compute exactly without reading all the inputs. But there is a surprise here in Blais and Tan’s paper, which I will explain shortly. Their first main result is:

Theorem 1 For every ${\epsilon > 0}$ there is a constant ${c_\epsilon}$ such that for all ${n}$ and each ${n}$-ary Boolean function ${f}$, there is a DNF ${g}$ of size at most ${c_\epsilon 2^n/\log n}$ that ${\epsilon}$-approximates ${f}$.
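The notion of ${\epsilon}$-approximation used in Theorem 1 can be checked directly for small ${n}$ by counting disagreements over all ${2^n}$ inputs. Here is a minimal Python sketch under that brute-force assumption; the function names `disagreement` and `approximates` are illustrative and do not come from the paper.

```python
from fractions import Fraction
from itertools import product

def disagreement(f, g, n):
    """Exact fraction of the 2^n inputs on which f and g differ."""
    bad = sum(1 for x in product((0, 1), repeat=n) if f(x) != g(x))
    return Fraction(bad, 2 ** n)

def approximates(g, f, n, eps):
    """True iff g eps-approximates f on n variables."""
    return disagreement(f, g, n) <= eps

def and_fn(x):
    return int(all(x))

def parity(x):
    return sum(x) % 2

def always_no(x):
    return 0

n = 10
and_error = disagreement(and_fn, always_no, n)     # wrong only on the all-ones input
parity_error = disagreement(parity, always_no, n)  # wrong on every odd-parity input
```

Saying no all the time is wrong about AND on a ${2^{-n}}$ fraction of inputs but wrong about parity on exactly half of them, which is why parity looks like the hard case.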

A little note on asymptotic notation: The way they actually state their theorem is,

“Every Boolean function can be ${\epsilon}$-approximated by a DNF of size ${O_{\epsilon}(2^n/\log n)}$.”

One needs a certain skill in translating asymptotics rigorously. Some things are always clear: ${n}$ is the number of variables regarded as the input size, the base of ${\log n}$ doesn’t matter since it’s a constant that gets absorbed into the “${O}$” (but is base 2 by default), and the constant in the ${O_{\epsilon}}$ depends on ${\epsilon}$ but not on ${n}$. What’s less clear is that asymptotics are being used over all ${n}$ but “Boolean function” means just one ${n}$. The statement could mean a function defined on all of ${\{0,1\}^*}$ by a sequence of Boolean functions for each ${n}$, and then the constant in the ${O_{\epsilon}}$ could depend on the chosen sequence, but it doesn’t—the correct version is as above. Their proof uses ${n \geq 10/\epsilon}$, but the statement is valid for all ${n}$; for ${n < 10/\epsilon}$—or whenever ${\epsilon = O(1/n)}$—their upper bounds lose their power and are met by their lower bounds which then become ${\Omega(2^n)}$.

The cool point in any event is that for any fixed ${\epsilon}$, the quantity ${c_{\epsilon}/\log n}$ goes to 0 as ${n}$ grows, so approximation beats strictly exponential size.

Width and Fooling

The part of their results Ken and I like best, however, has to do with the idea of the width of a DNF. This is defined to be the maximum arity of an AND gate in any of the DNF terms, and intuitively means the number of input variables that get read in any one place. For the parity example above the width is ${n}$, which is the maximum of course. But for the majority function, it suffices to have an AND gate of arity ${m = \lfloor n/2 \rfloor + 1}$ for each ${m}$-subset of the variables, so the width is about ${n/2}$.
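The majority example can be verified by brute force for a small ${n}$. This is a minimal Python sketch with made-up helper names (`majority_dnf`, `eval_monotone_dnf`, `width`); each term is stored as the set of variable indices that its AND gate reads, with no negations needed.

```python
from itertools import combinations, product

def majority_dnf(n):
    """One AND term per m-subset of the variables, m = floor(n/2) + 1."""
    m = n // 2 + 1
    return [frozenset(s) for s in combinations(range(n), m)]

def eval_monotone_dnf(terms, x):
    """1 iff some term has all of its variables set to 1."""
    return int(any(all(x[i] for i in term) for term in terms))

def majority(x):
    """1 iff strictly more than half of the bits are 1."""
    return int(sum(x) > len(x) / 2)

def width(terms):
    """Maximum arity of an AND gate over all terms."""
    return max(len(term) for term in terms)

n = 5
terms = majority_dnf(n)
maj_width = width(terms)
maj_exact = all(
    eval_monotone_dnf(terms, x) == majority(x)
    for x in product((0, 1), repeat=n)
)
```

For ${n = 5}$ this gives ${\binom{5}{3} = 10}$ terms of width 3, computing majority exactly while each term reads only about half of the variables.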

Now especially with parity, it is hard to conceive being able to compute these functions with DNFs of any lower width, or even approximate them. If a term leaves one variable ${x_i}$ unread, for parity it means the answer could be anything, fifty-fifty, and it seems that term must be useless. But the magic combination of approximation, the probabilistic method, Fourier analysis, and covering-code constructions gives their other main theorems:

Theorem 2 For every ${\epsilon > 0}$ there is a constant ${c_\epsilon < 1}$ such that for all ${n}$ and each ${n}$-ary Boolean function ${f}$, there is a DNF ${g}$ of width at most ${c_\epsilon n}$ that ${\epsilon}$-approximates ${f}$.

In the case of parity, we can build ${g}$ with ${c_{\epsilon} = (1 - 2\epsilon)}$ and of size ${2^{(1 - 2\epsilon)n}}$. These bounds are nearly tight, and moreover the construction gives one-sided error: ${g(x) = 1}$ for all ${x}$ of odd parity.
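To see concretely why a term that leaves a variable unread need not be useless for parity, here is a small Python sketch. It is not the construction from the paper, which reaches width ${(1 - 2\epsilon)n}$; it is only a simple covering-code instance that can be verified by brute force, with ${n = 8}$ and width ${n - 1 = 7}$, built from the extended ${[8,4]}$ Hamming code. The generator matrix `G` and all helper names are illustrative choices.

```python
from itertools import product

N = 8

# Generator matrix [I | P] of a [7,4] Hamming code (an illustrative choice).
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]

def extended_hamming_codewords():
    """All 16 codewords of the extended [8,4] code; each has even parity."""
    words = []
    for msg in product((0, 1), repeat=4):
        c = [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]
        c.append(sum(c) % 2)  # overall parity bit makes the weight even
        words.append(tuple(c))
    return words

def build_terms():
    """For each codeword y and coordinate i, a term fixing every bit but i."""
    terms = []
    for y in extended_hamming_codewords():
        for free in range(N):
            terms.append({i: y[i] for i in range(N) if i != free})
    return terms

def eval_dnf(terms, x):
    return int(any(all(x[i] == b for i, b in t.items()) for t in terms))

def parity(x):
    return sum(x) % 2

terms = build_terms()
dnf_width = max(len(t) for t in terms)
inputs = list(product((0, 1), repeat=N))
errors = [x for x in inputs if eval_dnf(terms, x) != parity(x)]
error_rate = len(errors) / len(inputs)
one_sided = all(parity(x) == 0 for x in errors)  # mistakes only on even inputs
```

Every odd-parity input lies at distance 1 from exactly one codeword, so the term that frees that coordinate accepts it. The only mistakes are the 16 codewords themselves, giving error ${16/256 = 1/16}$ with every term of width 7, and the error is one-sided in the same sense as above.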

Ken and I would have had the opposite expectation, that small-width DNFs would be so weak that they can be collectively “fooled.” In cryptographic complexity, for a function ${f}$ to fool a class ${\cal G}$ of weaker functions means that no ${g \in {\cal G}}$ can gain any advantage in computing values of ${f}$: from the perspective of functions in ${\cal G}$, the function ${f(x)}$ looks random, unpredictable, and importantly is not approximable in Blais and Tan’s sense. But they prove the opposite: every Boolean function can be approximated by DNFs of smaller width, and for parity the size ${2^{(1 - 2\epsilon)n}}$ is still below strictly exponential. Hence “all the smaller-width DNFs cannot be fooled most of the time.” At least we think Abraham Lincoln would have agreed.

They have other results for Boolean functions that are monotone up-or-down in each of the variables, called unate functions, and for polynomial threshold functions, which have the form ${f(x) = \text{sign}(p(x))}$ for some real-valued polynomial ${p(x_1,\dots,x_n)}$. Their paper gives a table of upper and lower bounds for these classes.

The paper includes full details of proofs, which are interesting for their mix of randomization, analysis, and covering-code constructions.

Open Problems

What other applications can be found for their constructions, and for small width?

Yesterday, today, and tomorrow are collectively the 150th anniversary of the Battle of Gettysburg. The NY Times story of November 20, 1863 covered the speech Lincoln had given at Gettysburg the previous day.

1. July 2, 2013 11:34 pm

And what would you call a locally linear approximation to a Boolean function?

2. July 3, 2013 1:55 pm

thx for the great tip as usual. am going to look over all this more closely. however, suspect that an approximation-like argument may be the key to P vs NP. here is a problem that I posted somewhat recently that seems to be very close to the questions studied above. it focuses on “width” of the gates ie # of input variables. this is a key construction in Razborov’s proofs and as utilized in later derivative work.
k-cnf to k-dnf conversion to minimize errors

3. July 3, 2013 10:59 pm

Thanks for the post! What about circuits of arbitrary depth? Are there interesting approximation results there? Or even simpler: what are the smallest boolean circuits of any depth or form that encode the parity function?

4. July 4, 2013 9:22 am

more thoughts after some further scan of the paper. it would seem that a very important question might be looking at the varying hardness of a “width-k” function, this is the direction my tcs.se question goes in. suppose that a function is “width-k” ie every minterm or maxterm is max k width. but some are still harder or easier than others based on # of min/maxterms required. wild conjecture, could it possibly be something like that hardness in that sense is related to the “smallest” PTF (in the sense of smallest max degree) that equals the function? think that focusing especially on the behavior of PTFs wrt approximation will yield much new interesting & maybe key theory.

another point, there seems to be a natural connection between hypergraphs and CNF/DNFs that could yield useful bridge theorems for results moving in both directions. in particular the area of hypergraph decompositions appears to be closely relevant and yet in its early days.

5. July 13, 2013 10:32 pm

Someone might enjoy looking at the complexity of Boolean functions as expressed in terms of minimal negation operators. Such expressions have graph-theoretic representations in a species of cactus graphs called painted and rooted cacti, as illustrated here:

I know I once made a table of more or less canonical cactus expressions for the 256 Boolean functions on 3 variables, but I will have to look for that later.