# Barrington Gets Simple

* On the power of bounded width computations *

David Barrington is famous for solving a long standing open problem. What I love about his result is that the problem was open for years, his solution is beautiful, and also that he surprised us. Most of us who had worked on the problem guessed the other way: we thought for sure it was false. Again conventional wisdom was wrong. I wonder about other problems.

** Bounded Width Computation **

There are many restricted models of computations, without these weak models we would have no lower bounds. Proving in a general model–such as unrestricted boolean circuits–that an explicit problem cannot be computed efficiently seems to be completely hopeless. However, once some restriction is placed on the model there is at least *hope* that an explicit problem has a lower bound. Barrington worked on a restricted model called *bounded width* computations. While most (many? all?) of the community believed that this model was extremely weak and would yield lower bounds, he shocked us and showed that bounded width computations were quite powerful.

A bounded width computation is a way of computing a boolean function. Suppose that are the inputs to the function. You can think of a bounded width computation as a line of boxes. Each box is associated with some input . The box gets bits from the left, uses these and the input and decides on its output which goes to the box on its right. Of course the first (left most) box is special and gets a fixed value from the left; the last (right most) box outputs the answer. The size of the computation is the number of boxes or equivalently the length of the line. So far anything can be computed by such devices. To see this just have each box concatenate the left input with their . Then the last box gets as a string and so can easily determine the value of any boolean function.

The key is that we restrict the number of bits that can be passed from one box to the next. In particular bounded width computations must have their boxes only receive and send bits *independent* of the number . This is the restriction that makes the model interesting. While the bits that can be passed must stay bounded, the length can grow faster than . Therefore, different boxes can use or read the same input . Without this added ability the model would reduce to just a finite state type machine and would be quite weak when coupled with the bounded restriction.

The question that Barrington solved was: exactly what can bounded width computations compute with size at most polynomial? The conventional wisdom at the time was that such a model could not compute, for example, the majority function. My intuition–that was dead wrong–was that in order to compute the majority function the boxes would have to somehow pass the *number* of bits that are to their left. This of course does not work because this grows as and is certainly not constant. I even worked with Ashok Chandra and Merrick Furst and proved a non-linear lower bound on the size of a bounded width computation for majority. While our bound was just barely non-linear we thought that we might be able to improve it and eventually show that majority was hard. By the way this work with Chandra and Furst will be covered soon in another post on communication complexity.

Barrington proved the following beautiful theorem:

Theorem 1Any boolean formula of size in can be computed by a bounded width computation of size at most .

The theorem shows immediately that we were wrong. Since majority can be shown to have a polynomial size formula his theorem showed that the conventional wisdom was wrong. Often a weaker theorem is stated that shows that if the formula is depth , then the bounded width computation is at most size . The above is really not stronger since it has long been known that any formula of size could be converted into one of depth. The corresponding question about depth for *circuits* is still wide open: we have no idea how to make a circuit have smaller depth and not explode its size.

That formula’s can be made to have logarithm depth in their size without increasing their size too much has a long and interesting history. I do not have the space now to explain the details, but the key is that binary trees have good recursive properties–let’s leave it at that for now. Spira worked on the first result, then Brent, Pratt, and others. See Sam Buss’s paper for a recent result and some more background.

I will not give the full proof of Barrington’s Theorem, but will give an overview. His brilliant insight was to *restrict* the values that boxes pass from one to another to values from a finite group . Clearly, as long as the group is fixed this will satisfy the bounded width restriction: a group only takes bits. What is not at all obvious is that by restricting his attention to finite groups he could see how to make the bounded width model compute any formula. I think this is a nice example of a trick that we should use more often. By adding constraints he made the problem easier to understand: Henry Kissenger once said, “*The absence of alternatives clears the mind marvelously.*” By focusing on groups David was forced to use certain constructs from group theory that made his method work.

Barrington’s theorem is proved by showing that every formula can be computed by a bounded width computation over the same fixed group . The group is assumed to be a non-abelian simple group. A key notation is that of *commutator* of two group elements: the commutator of is . Since is non-abelian and simple it follows that .

The way that a bounded width computation will compute a formula is simple: if the formula is true let the output be the identity element of the group; if the formula is false, then the output will be a that is not the identity. To make the induction work we will insist that can be any non-identity element. This is a standard Pólya trick that makes the proof work nicely. Clearly, we can handle any input variable . Negation is also simple: just multiple by and this flips the value of and .

The key case is how do we do an “or” gate? Suppose that we want to compute . Further assume that we want the values of this computation to be . Then, we can select so that . Let’s use (resp. ) to denote the bounded width computation that computes (resp. ). Then, the claim is that

computes the “or”. The arrow means they form a series of “boxes”. It should be clear that if either or outputs , then outputs . What happens if they both do not output . Then we get the value

But this is the value that we needed. Pretty clever.

** Related Results **

I have some interesting related results that I wanted to talk about. Since this post is already getting a bit long I will explain some of the related work on Barrington’s Theorem in the future.

** Open Problems **

This area is rich with open problems. For starters we do not understand what groups are needed to make Barrington’s Theorem work. He shows that any non-abelian simple group is enough. In the opposite direction it is known that no nilpotent group will work. The big open question is what about solvable groups? We believe that they should not be able to compute the majority function, but we all guessed wrong before so perhaps we are wrong again. It is not hard to see that the exact technology that David uses will not work for solvable groups. But that, of course proves nothing. I consider this one of the most interesting open questions in this area of theory. The reason that nilpotent groups can be handled is that they are always direct products of *p-*groups. And *p-*groups can be handled by known results.

I have a completely wild idea that I hesitate to share less you stop reading my blog and think I have lost it. But I cannot resist so I will tell you it anyway. Suppose that you could prove from first principles that no group of odd order could compute the majority function. This would be a major result, since it would imply the famous (to group theorists at least) *Odd Order Theorem* of Walter Feit and John Thompson. They showed that any odd order group cannot be simple. So if no odd order group can compute majority, then by Barrington’s Theorem no odd order group is simple. The original proof of the Feit-Thompson Theorem is over 200 journal pages–255 to be exact.

A possible more doable idea would be to prove the also famous theorem of Burnside. Burnside’s theorem from group theory states that if G is a finite group of order then is solvable. Hence each non-abelian finite simple group has order divisible by three distinct primes.

The reason I think that there is at least some hope that complexity theory could prove such theorems is that we bring a completely different set of tools to the table. I think it could be the case that these tools could one day prove a pure theorem of mathematics in an entirely new manner. That would be really cool.

### Trackbacks

- Mathematical Embarrassments « Gödel’s Lost Letter and P=NP
- Guessing the Truth « Gödel’s Lost Letter and P=NP
- Can Group Theory Save Complexity Theory? « Gödel’s Lost Letter and P=NP
- Equations Over Groups: A Mess « Gödel’s Lost Letter and P=NP
- Lecture 11 « An Intensive Introduction to Complexity Theory (CMU, Spr 2011)
- Polls And Predictions And P=NP « Gödel’s Lost Letter and P=NP
- Surely You Are Joking? | Gödel's Lost Letter and P=NP
- Open Problems That Might Be Easy | Gödel's Lost Letter and P=NP

The link to the Sam Buss paper is broken. The correct link is http://euclid.ucsd.edu/~sbuss/ResearchWeb/fmlarestructure/paper.ps

Thanks I will fix it…dick ( My NAME, I perfer Dick to Richard)

Did you just call Andy Parrish a dick for pointing out the broken link?

NO…my name is dick. I hope you were kidding.

I better be careful. Did you see the Movie “Fracture”. In it Anthony Hopkins uses the “name Dick” it an amusing way.

Thanks for the kind words and for the nice clear explanation of my theorem. It is true that the result was surprising, and in fact I found the proof very shortly after I began to consider the possibility that the conjecture I was trying to prove (that the majority function required exponential size for any constant width) might be false. I had proved a special case of the lower bound for width 3 and failed to get the resulting paper into FOCS 1985, and in the resulting crisis of confidence I began looking at upper bounds to check my intuition. The construction with commutators entered into the lower bound proof, so I started looking at what upper bounds they could prove. I knew from my undergraduate algebra course that width 5 might behave very differently from width 3 with respect to commutators, since S_3 is solvable and S_5 is not. When I got the proof my advisor, Mike Sipser, was out of town, so I grabbed the first person I could find with the background to understand the proof, who turned out to be Phil Klein.

Someone on Lance’s blog just mentioned that a key step, perhaps the key step, in the discovery of the NL = co-NL theorem was the result by Lange, Jenner, and Kirsig that the alternation hierarchy for logspace collapses to the second level. This made it very easy to believe that the collapse to the first level (NL) was possible, and it was then a matter of months before Immerman and Szelepczenyi independently found the proof of that collapse. There was a similar short race to find the IP = PSPACE proof once that result became plausible.

Doesn’t the exact same proof you give in this post, word-for-word, apply to circuits instead of formulas? So I’m curious why you stated it only for formulas…

The key difference between circuits and formulas is that circuits can “re-use intermediate results”, while formulas cannot: the output of a circuit gate can be wired as input to any number of other circuit gates; however for using the same expression inside a formula you have to replicate it.

Note that the argument recursively creates computations for A and B in order to compute A v B., and then “wires them together”. If the computation of A and B would share some “common computation” we either have to replicate this common part for both A and B or “save” the intermediate result by adding more wires reaching from the “A” part of the computation to the “B” part. While the former voids the bound on complexity blowup, the latter voids the bound on the computation width.

Yes, I guess you are right…so how

arecircuits handled? (Barrington’s paper basically seemed to give the same argument given in this post.)