Multi-party protocols, bounded width programs, concentration theorems, and FOCS 2009

Ashok Chandra is a famous theorist who has done seminal work in the many different areas. His early work was in program schema, an area that was central to theory in the 1970’s. Chandra then moved into other areas of theory and among other great things co-invented the notion of Turing Machine Alternation.

Later in his career, while still an active researcher, he began to manage research groups: first a small one at IBM Yorktown, then a much larger one at IBM Almaden, and later groups at other major companies including Microsoft. As a manager, one example of his leadership was the invention by his group at Almaden of that little trackpoint device that plays the role of a mouse. The trackpointer eventually appeared on the keyboards of IBM’s Thinkpad laptops—now called Lenovo laptops—a wonderful example of successful technology transfer.

Today I want to talk about mutli-party protocols, an area that Chandra helped create in the 1980’s. Then, I will connect this work with a new paper that is about to appear in FOCS in days. This is the ${50^{th}}$ FOCS and I am looking forward to attending it, even though I will not get any Delta miles for this one—it is in Atlanta.

I still remember vividly meeting Ashok for the first time: it was at the Seattle airport in 1974 at the taxi stand. We were both on our way to the sixth STOC conference—it was my first STOC. I knew of Ashok, since I had read his beautiful papers on program schema theory. Somehow we figured out that we were both going to the same hotel and we shared a cab. Our conversation in the cab made me feel I was welcome as a new member of the theory community: I have never forgotten his kindness.

At that STOC Chandra presented some of his recent work on program schema, “Degrees of Translatability and Canonical Forms in Program Schemas: Part I.” Also presented at the conference were these results, among others:

• Leslie Valiant: The Decidability of Equivalence for Deterministic Finite-Turn Pushdown Automata.
• John Gill: Computational Complexity of Probabilistic Turing Machines.
• Vaughan Pratt, Michael Rabin, Larry Stockmeyer: A Characterization of the Power of Vector Machines.
• Stephen Cook, Robert Reckhow: On the Lengths of Proofs in the Propositional Calculus.
• Robert Endre Tarjan: Testing Graph Connectivity.
• Allan Borodin, Stephen Cook: On the Number of Additions to Compute Specific Polynomials.
• David Dobkin, Richard Lipton: On Some Generalizations of Binary Search.

The topics then were very different from today: then, there was no quantum theory, no game theory, no learning theory, and many complexity classes had yet to be invented. Yet many of the topics are still being studied today—perhaps a discussion for another day.

Let’s now turn to multi-party protocols and eventually to concentration theorems.

Multi-Party Protocols

Chandra, Merrick Furst, and I discovered the basic idea of multi-party protocols because we guessed wrong. We were working on proving lower bounds on bounded width branching programs, when we saw that a lower bound on multi-party protocols would imply a non-linear lower bound on the length of such programs.

Let me back up, one summer Merrick Furst and I were invited to spend some time at IBM Yorktown by Ashok. The three of us started to try to prove a lower bound on the size of bounded width programs.

A bounded width program operates as follows. Suppose that the input is ${x_{1},\dots,x_{n}}$. I like to think of a bounded width program as a series of “boxes.” Each one takes a state from the left, reads an input ${x_{i}}$ which is the same for the box each time, and passes a state to the right. The first box is special and gets a start state from the left; the last box is special too and passes the final state to the right. The total number of states is fixed independent of the length of the computation. Note, also while the boxes read the same input each time, an input can be read by different boxes. The length, the number of boxes, is the size of the bounded width program.

The reason this model is called bounded width is that the number of states is fixed independent of the number of inputs. The power of the model comes from the ability to read inputs more than once. Note, with that ability the model could not compute even some very simple functions.

We guessed that there should be simple problems that required length that is super-linear in the number of inputs, ${n}$. This guess was right. What we got completely wrong, like most others, was that we thought the length might even be super-polynomial for some simple functions. David Barrington’s famous theorem proved that this was false,

Theorem: For any boolean function in ${\mathsf{NC}^{1}}$, the length of a bounded width program for that function is polynomial.

The connection between bounded width programs and protocols is simple. Suppose that there are ${2n}$ boxes, which we wish to show is too few to compute some function. Divide the boxes into three groups: ${L,M}$, and ${R}$. The boxes in ${L}$ are the first ${2n/3}$ boxes, the boxes in ${M}$ are the next ${2n/3}$ boxes, and ${R}$ are the last ${2n/3}$I fixed a typo here thanks to Andy D. Think of three players ${L}$, ${M}$, and ${R}$. They each do not have enough bits to determine the function if it depends on all the bits. Yet any two players together have enough bits, possibly, to determine the function: this follows because,

$\displaystyle 2n/3 + 2n/3 > n.$

Thus, we needed to somehow argue that they needed to move more than a fixed number of bits from one to another to compute the given function. This was the hard part.

IBM Yorktown is in the middle of nowhere. Well it is in the middle of a beautiful area, but then there were no lunch places nearby. So every day we ate in the IBM cafeteria, which was a fine place to eat. However, one day Merrick and I decided to take off and have lunch somewhere else. We had a long lunch “hour.” A really long hour. When we got back to IBM Ashok was a bit upset: where have we been? Luckily a theory talk was just about to start and we all went to hear the talk.

Talks are a great place to think. There is no interruption, which was especially true back then since there were no wireless devices. I used the time to try to put together all the pieces that we had discovered. Suddenly I saw a new idea that might be the step we were missing.

As soon as the talk was over I outlined the idea that I had to Ashok and Merrick. They were excited, and to my relief all was forgotten about the long lunch. Working that afternoon we soon had a complete outline of the proof. There were, as usual, lots of details to work out, but the proof would eventually all work. I still think we made progress on this result because of the long lunch and the quiet time of the talk. The result appeared in STOC 1983.

We were excited that we had a lower bound, but we missed two boats. Missing one is pretty bad, but two was tough. The first was our protocol lower bound : this was later vastly improved by much better technology, and the second was that we missed thinking about upper bounds. I do not know if we would have found Barrington’s beautiful result; however, we never even thought about upper bounds. Oh well.

Concentration Results

Let’s jump forward to the present. There is a terrific book, published this year, by Devdatt Dubhashi and Alessandro Panconesi called Concentration of Measure for the Analysis of Randomized Algorithms. I just got a copy and it is beautifully written, contains all the major concentration results, and is a must to have on your desk.

Of course research stops for no one, and there are two beautiful papers that contain new concentration theorems that will appear in the next FOCS. I am sure they will be included in the second edition of Dubhashi and Panconesi.

The first paper is: A Probability inequality using typical moments and Concentration Results by Ravi Kannan. The second paper is: A Probabilistic Inequality with Applications to Threshold Direct-product Theorems by Falk Unger.

The theme of both these papers, and the book, is to prove theorems that show that a sum of random variables is likely to be near its mean. More precisely, if

$\displaystyle X_{1}, \dots, X_{n}$

are random variables, a concentration theorem tries to get sharp bounds on

$\displaystyle \mathsf{Prob} \left [ \mid \sum_{i=1}^{n} (X_{i} - E(X_{i})) \mid \ge t \right ].$

In general a sum of random variables will not have concentration unless there is some additional property. To see this just consider the case where they all are ${1}$ if a fair coin is heads and ${0}$ otherwise. Clearly, this sum is ${0}$ or ${n}$ and does not satisfy any meaningful concentration theorem.

One approach to get a concentration theorem is to assume that the random variables are independent, but this is often too restrictive for many applications. Look at their book for the myriad number of ways to get concentration theorems.

Another approach is to assume that the random variables satisfy certain constraints on the conditional expectations. This is what Ravi is able to improve, since the usual theorems assume “worst-case” bounds on these conditional expectations. Ravi can weaken these to “average” bounds. I think this is a brilliant insight—one that could have far reaching consequences. For example, Ravi can prove: (It is a corollary in his paper.)

Theorem: Suppose that ${X_{1},X_{2},\dots,X_{n}}$ are real-valued random variables satisfying the SNC condition. Let ${t>0}$. Suppose that ${\sigma}$ is a positive real with

$\displaystyle E[X_{i}^{2l} \mid X_{1} + \cdots X_{i-1}] \le \sigma^{2l}(n/p)^{l-1}l^{.9l}$

for all ${l=1,2,\dots}$. There is an absolute positive constant ${c}$ such that

$\displaystyle Pr \left ( \mid \sum_{i=1}^{n} X_i \mid \ge t \right ) \le 4e^{-w}$

where ${w = \frac{t^{2}}{cn\sigma^{2}}}$.

See his full paper for what SNC is and the role of ${p}$ in that condition.

Multi-Party Protocols Again

Unger’s concentration paper is quite interesting also. I will not be able to do it justice so again please take a look at the full paper. His result has to do with results that are quite close to multi-party protocols. In particular, he has an explicit application to this area.

His theorem on protocols is quite technical, but the flavor is an amplification type result. Suppose that ${q}$ players can compute some boolean function ${f(x)}$ with probability ${p=1/2+\epsilon}$. Then, he can bound how well they can compute multiple independent copies

$\displaystyle f(x^{(1)}),f(x^{(2)}),\dots,f(x^{(m)})$

where the inputs are all independent uniform random bits. Then, he can get exponential bounds on how well a protocol can do even if it only has to get a majority of the functions right.

Theorem: Let ${f:\{0,1\}^{n \times q} \rightarrow \{-1,+1\}}$ be a function such that no ${q}$-party protocol which also uses ${q}$ bits of communication can compute ${f}$ better than with probability ${1/2 + \epsilon/2}$, for ${0 \le \epsilon \le 1}$, when the inputs are chosen uniformly at random. Chose ${\lambda}$ such that ${\epsilon \le \lambda^{2^{q}}}$ and ${\lambda \le 1}$. Then for any ${q}$-party protocol ${\cal P}$ with uniformly random inputs ${x^{1,1},\dots,x^{q,k} \in \{0,1\}^{n}}$ and ${k}$ output bits, which uses ${c}$ bits of communication, it holds: The probability that the output of ${\cal P}$ agrees with ${f(x^{1,1},\dots,x^{q,1})\dots f(x^{1,k},\dots,x^{q,k})}$ on at least ${(1/2 + \lambda/2 )k}$ positions is at most

$\displaystyle 2^{c}e^{-w}.$

Here the term ${w}$ is order ${k}$ times certain other terms. The point is that their protocol cannot easily compute the function multiple times without using many bits of communication.

Open Problems

One open question is to find additional applications of these new concentration theorems. Can the average idea of Kannan work in other situations? Are there further concentration theorems yet to be discovered?

October 15, 2009 9:39 pm

Nice post!
In your second mention of players L, M, R, there is a L where R should be.

October 16, 2009 6:58 am

Oops. Thanks for the comment.

2. October 16, 2009 8:47 am

I believe “Note, with that ability the model could not compute even some very simple functions. ” should be “Note, without that ability the model could not compute even some very simple functions.”