On the power of bounded width computations

David Barrington is famous for solving a long-standing open problem. What I love about his result is that the problem was open for years, his solution is beautiful, and also that he surprised us. Most of us who had worked on the problem guessed the other way: we thought for sure it was false. Once again conventional wisdom was wrong. I wonder about other problems.

Bounded Width Computation

There are many restricted models of computation; without these weak models we would have no lower bounds at all. Proving in a general model–such as unrestricted boolean circuits–that an explicit problem cannot be computed efficiently seems completely hopeless. However, once some restriction is placed on the model there is at least hope of proving a lower bound for an explicit problem. Barrington worked on a restricted model called bounded width computation. While most (many? all?) of the community believed that this model was extremely weak and would yield lower bounds, he shocked us and showed that bounded width computations are quite powerful.

A bounded width computation is a way of computing a boolean function. Suppose that ${x_{1},x_{2},\dots,x_{n}}$ are the inputs to the function. You can think of a bounded width computation as a line of boxes. Each box is associated with some input ${x_{i}}$. The box gets bits from the left, uses these and the input ${x_{i}}$, and decides on its output, which goes to the box on its right. Of course the first (leftmost) box is special and gets a fixed value from the left; the last (rightmost) box outputs the answer. The size of the computation is the number of boxes, or equivalently the length of the line. So far anything can be computed by such devices: just have each box concatenate the input from the left with its ${x_{i}}$. Then the last box gets ${x_{1}x_{2}\dots x_{n}}$ as a string and so can easily determine the value of any boolean function.
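To make the "line of boxes" picture concrete, here is a minimal Python sketch of the unrestricted version, where each box appends its input bit to the message; the names `run_line` and `maj` are my illustrative choices, not from the post.

```python
# A sketch of the "line of boxes" model with NO width restriction.
# Each box receives a message from the left, reads its input bit, and
# passes a message to the right. Here every box just appends its bit,
# so the last box sees the entire input and can compute any function.

def run_line(bits, f):
    """Pass a growing message down the line, then apply f at the end."""
    message = ""                    # the fixed value entering the first box
    for x in bits:                  # box i is associated with input x_i
        message = message + str(x)  # unbounded: message grows with n
    return f(message)               # last box sees x_1 x_2 ... x_n

# Example: majority of three bits, trivial with unbounded width.
maj = lambda s: s.count("1") > s.count("0")
print(run_line([1, 0, 1], maj))  # True
```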

The key is that we restrict the number of bits that can be passed from one box to the next. In particular, the boxes of a bounded width computation may only receive and send ${O(1)}$ bits, independent of ${n}$. This is the restriction that makes the model interesting. While the number of bits passed must stay bounded, the length can grow faster than ${n}$; therefore, different boxes can use or read the same input ${x_{i}}$. Without this added ability the model would reduce to a finite-state machine, which is quite weak when coupled with the bounded-width restriction.
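For contrast, here is a sketch of a computation that respects the width bound. Parity is my example here (not the post's): a single bit crossing each boundary suffices, whereas naively counting ones–the intuition behind why majority looked hard–would need about ${\log n}$ bits.

```python
# A width-bounded line: each box passes a single bit to its neighbor.
# Parity fits the O(1) restriction because the running XOR is one bit
# no matter how large n gets.

def parity_line(bits):
    state = 0              # only O(1) bits cross each box boundary
    for x in bits:
        state = state ^ x  # box combines the left message with its input
    return state

print(parity_line([1, 1, 0, 1]))  # 1
```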

The question that Barrington solved was: exactly what can bounded width computations compute with at most polynomial size? The conventional wisdom at the time was that such a model could not compute, for example, the majority function. My intuition–which was dead wrong–was that in order to compute the majority function the boxes would have to somehow pass along the number of ${1}$ bits to their left. This of course does not work, because that count takes ${\log n}$ bits, which is certainly not constant. I even worked with Ashok Chandra and Merrick Furst and proved a non-linear lower bound on the size of a bounded width computation for majority. While our bound was just barely non-linear, we thought that we might be able to improve it and eventually show that majority was hard. By the way, this work with Chandra and Furst will be covered soon in another post on communication complexity.

Barrington proved the following beautiful theorem:

Theorem 1 Any boolean formula of size ${S}$ can be computed by a bounded width computation of width ${O(1)}$ and size at most ${S^{O(1)}}$.

The theorem shows immediately that we were wrong. Since majority can be shown to have a polynomial size formula, his theorem showed that the conventional wisdom was wrong. Often a weaker-looking theorem is stated: if the formula has depth ${d}$, then the bounded width computation has size at most ${2^{O(d)}}$. The above is really no stronger, since it has long been known that any formula of size ${S}$ can be converted into one of depth ${O(\log S)}$. The corresponding question about depth for circuits is still wide open: we have no idea how to reduce the depth of a circuit without exploding its size.

That formulas can be made to have depth logarithmic in their size, without increasing their size too much, has a long and interesting history. I do not have the space now to explain the details, but the key is that binary trees have good recursive properties–let’s leave it at that for now. Spira proved the first result; then came Brent, Pratt, and others. See Sam Buss’s paper for a recent result and some more background.

I will not give the full proof of Barrington’s Theorem, but I will give an overview. His brilliant insight was to restrict the values that boxes pass from one to another to elements of a finite group ${G}$. Clearly, as long as the group is fixed this will satisfy the bounded width restriction: a group element only takes ${O(1)}$ bits. What is not at all obvious is that by restricting his attention to finite groups he could see how to make the bounded width model compute any formula. I think this is a nice example of a trick that we should use more often. By adding constraints he made the problem easier to understand: Henry Kissinger once said, “The absence of alternatives clears the mind marvelously.” By focusing on groups David was forced to use certain constructs from group theory that made his method work.

Barrington’s theorem is proved by showing that every formula can be computed by a bounded width computation over the same fixed group ${G}$. The group is assumed to be a non-abelian simple group. A key notion is that of the commutator of two group elements: the commutator of ${x,y}$ is ${[x,y] = xyx^{-1}y^{-1}}$. The commutator subgroup ${[G,G]}$ is normal, so since ${G}$ is non-abelian and simple it follows that ${[G,G] = G}$.
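As a concrete illustration (mine, not from the post), here is a quick Python check that commutators in ${S_{5}}$–the group Barrington actually used–can be non-identity; the particular elements ${a}$ and ${b}$ are arbitrary choices.

```python
# Sanity check of the commutator [x, y] = x y x^-1 y^-1 in S_5, with a
# permutation p stored as a tuple: p[i] is the image of i.

def compose(p, q):                 # group product: apply q first, then p
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def commutator(x, y):              # [x, y] = x y x^-1 y^-1
    return compose(compose(compose(x, y), inverse(x)), inverse(y))

identity = (0, 1, 2, 3, 4)
a = (1, 2, 3, 4, 0)                # the 5-cycle (0 1 2 3 4)
b = (0, 2, 1, 4, 3)                # swaps 1,2 and 3,4
print(commutator(a, b) != identity)   # True: S_5 is non-abelian
```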

The way that a bounded width computation will compute a formula is simple: if the formula is true, let the output be the identity element ${1}$ of the group; if the formula is false, then the output will be a ${g}$ that is not the identity. To make the induction work we will insist that ${g}$ can be any non-identity element. This is a standard Pólya trick–strengthening the statement to make the induction go through. Clearly, we can handle any input variable ${x_{i}}$. Negation is also simple: just multiply by ${g^{-1}}$. This sends ${g}$ to the identity and ${1}$ to ${g^{-1}}$, another non-identity element–exactly why we insisted that ${g}$ could be arbitrary.

The key case is: how do we do an “or” gate? Suppose that we want to compute ${{\cal A} \vee {\cal B}}$, and that we want the values of this computation to be ${1,g}$. Since ${[G,G]=G}$, we can select ${a,b}$ so that ${[a,b] = g}$. Let’s use ${A}$ (resp. ${B}$) to denote the bounded width computation that computes ${\cal A}$ with values ${1,a}$ (resp. ${\cal B}$ with values ${1,b}$). Then, the claim is that $\displaystyle C = A \rightarrow B \rightarrow A^{-1} \rightarrow B^{-1}$

computes the “or”. The arrow $\to$ means the computations are placed in series as one line of boxes, with their outputs multiplied together. It should be clear that if either ${A}$ or ${B}$ outputs ${1}$, then ${C}$ outputs ${1}$. What happens if they both do not output ${1}$? Then we get the value $\displaystyle g = aba^{-1}b^{-1}.$

But this is exactly the value that we needed. Pretty clever.
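The whole recursion can be sketched compactly in Python over ${S_{5}}$. This is my own illustration, not Barrington’s notation: the names `evaluate` and `conjugator` and the particular 5-cycles `A0`, `B0` are assumptions. It specializes the “any non-identity element” freedom to 5-cycles, using the fact that all 5-cycles are conjugate in ${S_{5}}$, which is how the remapping works in his actual proof over that group.

```python
# Formula evaluation in the style of Barrington's construction over S_5.
# Convention: a (sub)formula that is TRUE yields the identity; one that
# is FALSE yields a prescribed 5-cycle `target`.

identity = (0, 1, 2, 3, 4)

def compose(p, q):                 # group product: apply q first, then p
    return tuple(p[q[i]] for i in range(5))

def inverse(p):
    inv = [0] * 5
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def cycle_from(p):                 # write out a 5-cycle starting at 0
    seq = [0]
    while len(seq) < 5:
        seq.append(p[seq[-1]])
    return seq

def conjugator(src, dst):          # find c with c * src * c^-1 == dst
    s, d = cycle_from(src), cycle_from(dst)
    c = [0] * 5
    for k in range(5):
        c[s[k]] = d[k]
    return tuple(c)

A0 = (1, 2, 3, 4, 0)               # the 5-cycle (0 1 2 3 4)
B0 = (2, 0, 4, 1, 3)               # the 5-cycle (0 2 4 3 1)
G0 = compose(compose(compose(A0, B0), inverse(A0)), inverse(B0))  # [A0, B0]

def evaluate(formula, x, target):
    """Identity if formula is true on x, the 5-cycle `target` if false."""
    op = formula[0]
    if op == "var":                # a single box reading x_i
        return identity if x[formula[1]] else target
    if op == "not":                # recurse with inverted target, fix up
        return compose(evaluate(formula[1], x, inverse(target)), target)
    if op == "or":                 # pick a, b with [a, b] = target
        c = conjugator(G0, target) # all 5-cycles are conjugate in S_5
        a = compose(compose(c, A0), inverse(c))
        b = compose(compose(c, B0), inverse(c))
        vA = evaluate(formula[1], x, a)
        vB = evaluate(formula[2], x, b)   # C = A -> B -> A^-1 -> B^-1
        return compose(compose(compose(vA, vB), inverse(vA)), inverse(vB))

# x_0 OR (NOT x_1): identity exactly on the satisfying assignments.
f = ("or", ("var", 0), ("not", ("var", 1)))
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, evaluate(f, x, G0) == identity)   # only [0, 1] prints False
```

Note that `[A0, B0]` really is a 5-cycle here, which is what lets the conjugation trick keep the recursion going; "and" gates follow by De Morgan from "or" and "not".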

Related Results

I have some interesting related results that I wanted to talk about. Since this post is already getting a bit long I will explain some of the related work on Barrington’s Theorem in the future.

Open Problems

This area is rich with open problems. For starters, we do not understand which groups are needed to make Barrington’s Theorem work. He shows that any non-abelian simple group is enough. In the opposite direction, it is known that no nilpotent group will work. The big open question is: what about solvable groups? We believe that they should not be able to compute the majority function, but we all guessed wrong before, so perhaps we are wrong again. It is not hard to see that the exact technology that David uses will not work for solvable groups. But that, of course, proves nothing. I consider this one of the most interesting open questions in this area of theory. The reason that nilpotent groups can be handled is that they are always direct products of p-groups, and p-groups can be handled by known results.

I have a completely wild idea that I hesitate to share lest you stop reading my blog and think I have lost it. But I cannot resist, so I will tell you anyway. Suppose that you could prove from first principles that no group of odd order could compute the majority function. This would be a major result, since it would imply the famous (to group theorists at least) Odd Order Theorem of Walter Feit and John Thompson. They showed that every group of odd order is solvable; equivalently, no non-abelian simple group has odd order. So if no odd order group could compute majority, then by Barrington’s Theorem no odd order group could be non-abelian and simple. The original proof of the Feit-Thompson Theorem is over 200 journal pages–255 to be exact.

A possibly more doable idea would be to prove the equally famous ${pq}$ theorem of Burnside. Burnside’s theorem from group theory states that if ${G}$ is a finite group of order ${p^{r}q^{s}}$, then ${G}$ is solvable. Hence every non-abelian finite simple group has order divisible by at least three distinct primes.

The reason I think there is at least some hope that complexity theory could prove such theorems is that we bring a completely different set of tools to the table. These tools could one day prove a pure theorem of mathematics in an entirely new manner. That would be really cool.

1. March 3, 2009 9:49 pm

The link to the Sam Buss paper is broken. The correct link is http://euclid.ucsd.edu/~sbuss/ResearchWeb/fmlarestructure/paper.ps

• March 3, 2009 10:59 pm

Thanks I will fix it…dick ( My NAME, I prefer Dick to Richard)

2. March 6, 2009 11:52 am

Did you just call Andy Parrish a dick for pointing out the broken link?

• March 6, 2009 12:02 pm

NO…my name is dick. I hope you were kidding.
I better be careful. Did you see the movie “Fracture”? In it Anthony Hopkins uses the name “Dick” in an amusing way.

3. June 9, 2009 4:41 pm

Thanks for the kind words and for the nice clear explanation of my theorem. It is true that the result was surprising, and in fact I found the proof very shortly after I began to consider the possibility that the conjecture I was trying to prove (that the majority function required exponential size for any constant width) might be false. I had proved a special case of the lower bound for width 3 and failed to get the resulting paper into FOCS 1985, and in the resulting crisis of confidence I began looking at upper bounds to check my intuition. The construction with commutators entered into the lower bound proof, so I started looking at what upper bounds they could prove. I knew from my undergraduate algebra course that width 5 might behave very differently from width 3 with respect to commutators, since S_3 is solvable and S_5 is not. When I got the proof my advisor, Mike Sipser, was out of town, so I grabbed the first person I could find with the background to understand the proof, who turned out to be Phil Klein.

Someone on Lance’s blog just mentioned that a key step, perhaps the key step, in the discovery of the NL = co-NL theorem was the result by Lange, Jenner, and Kirsig that the alternation hierarchy for logspace collapses to the second level. This made it very easy to believe that the collapse to the first level (NL) was possible, and it was then a matter of months before Immerman and Szelepcsényi independently found the proof of that collapse. There was a similar short race to find the IP = PSPACE proof once that result became plausible.

4. October 21, 2009 6:44 am

Doesn’t the exact same proof you give in this post, word-for-word, apply to circuits instead of formulas? So I’m curious why you stated it only for formulas…

• September 21, 2010 5:23 pm

The key difference between circuits and formulas is that circuits can “re-use intermediate results”, while formulas cannot: the output of a circuit gate can be wired as input to any number of other circuit gates; however, to use the same expression inside a formula you have to replicate it.

Note that the argument recursively creates computations for A and B in order to compute A v B, and then “wires them together”. If the computations of A and B shared some “common computation”, we would either have to replicate this common part for both A and B, or “save” the intermediate result by adding more wires reaching from the “A” part of the computation to the “B” part. The former voids the bound on the size blowup, while the latter voids the bound on the computation width.

• November 3, 2010 9:22 pm

Yes, I guess you are right…so how are circuits handled? (Barrington’s paper basically seemed to give the same argument given in this post.)

5. July 1, 2016 12:07 am

I think I’m missing something because it seems to me as if no such group can exist. By the properties of boolean negation:

Let x’ be the left inverse of x for all x
g’g = 1 by definition of left inverse
Let g’x be the negation of x for all x
g’g’x = x as a property of boolean negation
g’g’ = 1 after right-multiplying both sides by the right inverse of x

This means that:
g’g’ = 1 = g’g

Left-multiplying both sides by g’, you’re left with:
g’g’g’ = g’g’g
1g’ = 1g
g’ = g

With the commutator operation, we have:
g = [a,b] = aba’b’ for a,b not equal to 1
g’ = (aba’b’)’ = bab’a’ = [b,a]
[a,b] = [b,a], since g=aba’b’ is equal to its own inverse

This is only possible when a,b are commutative with one another. This means that A OR B = [a,b] = 1 = true for all A,B, which seems wrong. Moreover, it means that [a,b] = g = 1, which in conjunction with [a,b] = [b,a] implies that all elements used to construct any formula will always be commutative with one another, which contradicts the assumption that G is non-abelian.