Election and debate special from GLL—note update

Aram Harrow is both a computer scientist and a physicist, a rare and special combination. Few people can explain both what a probabilistic Turing machine is and what complementarity is. He has done some beautiful work in pure and applied mathematics, work that uses his dual abilities.

Today Dick and I wish to describe and interpret an intricate recent theorem of his in a joint paper with Alexandra Kolla and Leonard Schulman.

Aram is no stranger to these pages. Of course he has also been our honored participant in our year-long debate on the feasibility of building quantum computers, all during his current appointment in the Computer Science and Engineering department of the University of Washington in Seattle. He is also one of three mainstays on the prominent quantum computing blog created by Dave Bacon, and during our debate still managed to get a little bit of other work done. In January he will join the MIT Department of Physics as an assistant professor, and we congratulate him on that.

As related at the bottom of his post, the paper grew out of work on the unique games conjecture, seeking to apply quantum methods though the end result was “classical.” The impetus included models of physical noise, such as those featured in Aram’s debate with Gil Kalai. To bring this full circle, Gil himself is the first person credited for discussions that spurred the paper, but these were in 2010, long before the debate idea arose, and not with Aram.

## The Proof

Usually we discuss the paper first, or at least state its main results, but exceptional times demand exceptions. Their “Proof overview” subsection begins by defining a “senate operator,” based on the idea that in the U.S. Senate, states have equal weight regardless of population. Relaxing equal-weighting will lead to our main open problem. The Section~3 title summarizes their pivotal Proposition~2 as “senates dominate dictatorships.” One main ingredient of their proof is the family of Krawtchouk polynomials, whose Internet page talks about “the destiny of the scientist during the Great Terror.”

Our interdisciplinary push seems to have entered Politics, but the real point is: the proof is hard. So hard that the authors tell me they have no idea of the value of the constant ${a}$ whose existence constitutes their main theorem. We will still try to give some idea of its significance, and why the result is hard, and how you can approach the issues without having to delve into the whole party manifesto. But you will not be able to make like President Gerald Ford in his first debate (parody) with Jimmy Carter in 1976:

“It was my understanding that there would be no math in this debate.”

## The Paper

The paper’s title, “Dimension-free L2 maximal inequality for spherical means in the hypercube,” may not generate any 15-second sound bites. Nor may it help predict the outcome of the U.S. Presidential election, but it does contain an important new theorem.

Let’s first say why this theorem is significant and hard. The term “hypercube” is just the set ${\{0,1\}^n}$ of binary strings of length ${n}$—one of the favorite objects for computer scientists. In this world, the sphere of radius ${k}$ around a point ${x}$, written ${S_k(x)}$, is the set of strings that are Hamming distance ${k}$ from ${x}$. Thus, if ${x=000}$ and ${k=2}$ the points in the sphere are:

$\displaystyle S_2(000) = \{110, 101, 011\}.$
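As a quick illustration, here is a minimal Python sketch (our own; the helper name `sphere` is not from the paper) that enumerates such Hamming spheres by flipping every choice of ${k}$ coordinates:

```python
from itertools import combinations

def sphere(x, k):
    """All bit strings at Hamming distance exactly k from the bit string x."""
    n = len(x)
    points = set()
    for flips in combinations(range(n), k):
        y = list(x)
        for i in flips:
            y[i] = '1' if y[i] == '0' else '0'  # flip the chosen coordinates
        points.add(''.join(y))
    return points

print(sorted(sphere('000', 2)))  # ['011', '101', '110'], matching the example
```

Note that ${|S_k(x)| = \binom{n}{k}}$, since a point of the sphere is determined by which ${k}$ coordinates are flipped.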

Now imagine we select a small fraction of the points from the hypercube, say ${\epsilon 2^{n}}$ of them. Can one find a single special point ${x}$, in the sense that every sphere around ${x}$ of any radius ${k}$ has fewer than half of its points in the selected set? Intuitively this seems quite reasonable, if ${\epsilon}$ is small enough.

This is a case where the intuition is correct, but proving it is hard. A standard method of proving that special points exist is to pick ${x}$ at random, essentially using the probabilistic method. Alas this works for spheres of one radius, or a few radii, but does not seem to be powerful enough to show that the point works for all radii.

The probabilistic method is extremely powerful, but when it fails, often the requirements for a proof become very difficult. This is the case with their new result. Actually in our opinion this is one of the most important aspects of the new paper. Since the probabilistic method does not always work, any new ideas that can solve problems where it fails are automatically of great importance. Here one idea that Aram tells me was important is a different kind of “complementarity,” namely that ${S_k(x) = S_{n-k}(x')}$ where ${x'}$ is the vertex opposite ${x}$ in the hypercube.

Let’s turn now to state the theorem in a mathematically precise manner and probe its reach.

## The Result

The new inequality involves the standard ${L_2}$ norm of a function ${f: \{0,1\}^n \longrightarrow \mathbb{R}}$, which is defined by

$\displaystyle ||f||_2 = \sqrt{\sum_x f(x)^2}.$

The other ingredient is the maximal function, which takes the largest average of ${f}$ over the spheres around each point:

$\displaystyle M_f(x) = \max_k \frac{1}{|S_k(x)|}\sum_{y \in S_k(x)}f(y).$

The key clause in the paper’s title is that the bound given by their inequality does not depend on ${n}$. Here is their new result:

Theorem 1 There exists ${a > 0}$ such that for all ${n}$ and ${f}$,

$\displaystyle ||M_f||_2 \leq a||f||_2.$

Pretty neat, no? Well it’s neat, but it can use some interpretation.

Speaking mathematically, call ${x}$ “light” if ${|f(x)|^2}$ is below the average of the squares, i.e., ${|f(x)|^2 < \frac{1}{2^n}||f||_2^2}$. We might hope to “bulk ${x}$ up” by finding an average over one of the spheres enclosing ${x}$ that is greater. For many ${f}$’s there will be some ${x}$’s for which this makes a big difference. However, the theorem shows that under ${L_2}$ norms the cumulative effect of this extra allowance will never be more than a fixed constant factor—independent of the dimension. That’s fine, but can we give a more helpfully vivid interpretation?
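Before turning to interpretation, the operator and the inequality can at least be sanity-checked by brute force at small ${n}$. Here is a minimal sketch (our own illustration, not from the paper), using a random non-negative ${f}$:

```python
from itertools import product
from math import sqrt
import random

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def maximal(f, n):
    """Brute-force M_f(x) = max over k of the average of f on the sphere S_k(x)."""
    cube = list(product((0, 1), repeat=n))
    Mf = {}
    for x in cube:
        averages = []
        for k in range(n + 1):
            vals = [f[y] for y in cube if hamming(x, y) == k]
            averages.append(sum(vals) / len(vals))
        Mf[x] = max(averages)
    return Mf

def l2(g):
    return sqrt(sum(v * v for v in g.values()))

random.seed(0)
n = 6
cube = list(product((0, 1), repeat=n))
f = {x: random.random() for x in cube}
Mf = maximal(f, n)
print(l2(Mf) / l2(f))  # the theorem says this ratio stays bounded as n grows
```

Since the ${k=0}$ sphere is ${\{x\}}$ itself, ${M_f(x) \geq f(x)}$ for non-negative ${f}$, so the ratio is at least 1; the content of the theorem is that no choice of ${f}$ can push it past the mysterious constant ${a}$, no matter how large ${n}$ is.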

## An Election Interpretation

It is election season in the United States again. One hears often about how demographic and personal characteristics channel one’s political preferences, and that there are “typical Democratic voters” and “typical Republican voters.” The categories are often binary or close to it: male/female, married/single, urban/suburban-or-rural, over/under age 50, and so on.

We can define them by positions on single issues: pro-life/pro-choice, gun-rights/gun-control, for/against gay marriage, immigration amnesty, anything like that. We can also add categories that seem irrelevant: righthanded/lefthanded, dark/light eyes or hair, tall/short, nearsighted/not, Coke/Pepsi. We can even include divisions that are not roughly balanced in the population, such as vegetarian/carnivore. We can also break multi-way categories into binary pairs, such as X/not-X for every ethnic group X. Then the possible combinations of values of ${n}$ characteristics correspond to the vertices of the ${n}$-dimensional hypercube.

What justifies saying that a combination ${x}$ of characteristics is “typical” for a party is that if you vary some of them—any number ${k}$ of them—you still find mostly people who vote for that party. Importantly this should be true for changing any set of ${k}$ characteristics, not just specific ones. Given the strength of this requirement, do typical voters exist? The fact of single-issue voting may even make this seem unlikely. However, the following broad statement holds in enough cases to give a way to approach the issues in their paper:

Theorem 2
If a party wins by a large enough landslide, then it has typical voters.

That is, there are voters ${x}$ such that not only did ${x}$ vote for the winner, but for any ${k > 0}$, over all voters ${y}$ who differ from ${x}$ in ${k}$ characteristics, the winner got a majority of votes from those ${y}$ as well. Well this comes with some caveats, so we need to look closer.

## The One-Vertex, One-Vote Case

Let us first suppose that every combination has exactly one voter. Let ${D}$ be the set of nodes that vote for the loser. Then we define an indicator function ${f}$ by:

$\displaystyle f(x) = 1 \text{~if~} x \in D\text{;~} f(x) = 0 \text{~otherwise}.$

Now if the loser won an ${\epsilon}$ portion of the vote, then since ${f(x)^2 = f(x)}$, we have:

$\displaystyle ||f||_2^2 = \sum_x f(x)^2 =\sum_x f(x) = \epsilon 2^n.$

By the theorem applied to ${f}$, squaring both sides,

$\displaystyle \sum_x M_f(x)^2 \leq a^2\epsilon\cdot 2^n.$

Dividing by ${2^n}$ makes the left side an average ${m}$ such that ${m \leq a^2\epsilon}$. There must exist some ${x}$ giving at most the average value, i.e. such that ${M_f(x)^2 \leq m}$, which implies that for all ${k}$,

$\displaystyle \frac{1}{|S_k(x)|}\sum_{y \in S_k(x)} f(y) \leq a\sqrt{\epsilon}.$

Taking ${\epsilon}$ small enough that ${a^2 \epsilon < 1/4}$ makes ${\frac{1}{|S_k(x)|}\sum_{y \in S_k(x)} f(y) < \frac{1}{2}}$ for all ${k}$. Thus the loser achieves no more than a minority over all of the spheres centered on ${x}$. This carries over to all of the Hamming balls ${B_k(x)}$, which comprise the spheres of radius at most ${k}$, as well.

This makes ${x}$ a typical voter. By extending the averaging argument, one can show there are many typical voters. Thus when the election map is close to all-red or all-blue, there are many places where the same color prevails in all concentric spheres. However, things become trickier when we vary the requirements on the space.
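The averaging argument is easy to test numerically. Below is a brute-force sketch (again our own, not from the paper) that marks a small random minority ${D}$ and counts the points all of whose spheres contain a strict minority from ${D}$:

```python
from itertools import product
import random

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

random.seed(1)
n = 8
cube = list(product((0, 1), repeat=n))
D = set(random.sample(cube, int(0.02 * 2 ** n)))  # a 2% "losing" minority

def is_typical(x):
    """True if x voted with the winner and every sphere around x
    has strictly fewer than half of its points in D."""
    if x in D:
        return False
    for k in range(1, n + 1):
        sphere = [y for y in cube if hamming(x, y) == k]
        if 2 * sum(y in D for y in sphere) >= len(sphere):
            return False
    return True

typical = [x for x in cube if is_typical(x)]
print(f"{len(typical)} typical voters out of {2 ** n}")
```

With ${\epsilon}$ this small, almost every winning voter turns out to be typical; the only easy failures are points whose antipode lies in ${D}$, since the radius-${n}$ sphere is a single point.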

## Weighted Measures

Now suppose we have a weighted measure ${\mu(x)}$ on the hypercube. With regard to this measure,

$\displaystyle ||f||_2^2 = \sum_x f(x)^2 \mu(x).$

It also now makes sense to define weighted versions of the averages over the Hamming spheres, leading to the operator

$\displaystyle M^*_f(x) = \max_k\frac{\sum_{y \in S_k(x)}f(y)\mu(y)}{\sum_{y \in S_k(x)}\mu(y)}.$

Does their theorem now hold in the form that there exists ${a_*> 0}$ such that for all ${f}$, ${||M^*_f||_2 \leq a_*||f||_2}$? Expanded, this now means:

Theorem? We ask whether for all ${n}$, all counting measures ${\mu}$ on ${\{0,1\}^n}$, and all ${f:\{0,1\}^n \longrightarrow \mathbb{R}}$, there exists ${a_* > 0}$ (possibly depending on ${\sum_x \mu(x)/2^n}$ or on ${\sum_x \mu(x)^2/2^n}$) such that

$\displaystyle \sum_x M^*_f(x)^2 \mu(x) \leq a_*\sum_x f(x)^2 \mu(x).$

Update 10/9/12: With no dependence on ${\sum_x \mu(x)}$, or restriction on values ${\mu(x)}$, this has been refuted—see Addendum below.

We can also define the modified max-averaging operator with regard to the balls ${B_k(x)}$ rather than the spheres ${S_k(x)}$, and ask the same question. The paper does not address such matters directly, and we understand from private communication with the authors that they have not arrived there yet, because seemingly simpler cases are unresolved. If it were true, then we could derive the existence of “typical voters” in the most general setting.

To see how, assume that the ${\mu(x)}$-many voters who share the same characteristics ${x}$ all vote the same way. (If this is not true, we can create some new characteristics that separate them, and are free to add them as co-ordinates to every vector because the count ${\mu(x)}$ is allowed to be anything including zero.) Let ${D}$ again be the set of ${x}$ that vote for the loser. Then ${wt(D) = \sum_{x \in D}\mu(x)}$ is the total number of votes for ${D}$, out of a total electorate of ${E = \sum_x\mu(x)}$. Define ${f}$ to be the 0-1 valued indicator function of ${D}$ as before. Then

$\displaystyle ||f||_2^2 = \sum_{x \in D} \mu(x) = wt(D),$

while

$\displaystyle ||M^*_f||_2^2 = \sum_x\max_k \left[\frac{wt(S_k(x) \cap D)}{wt(S_k(x))}\right]^2\mu(x).$

Applying the “Theorem?” and dividing both sides by ${E}$ yields

$\displaystyle \frac{1}{E}\sum_x\max_k \left[\frac{wt(S_k(x) \cap D)}{wt(S_k(x))}\right]^2\mu(x) \leq a_*^2 \frac{wt(D)}{E}.$

Again an averaging argument with respect to ${E = \sum_x \mu(x)}$ yields the existence of ${x}$ such that for all ${k}$,

$\displaystyle \frac{wt(S_k(x) \cap D)}{wt(S_k(x))} \leq a_*\sqrt{\frac{wt(D)}{E}}.$

Here the right-hand side has ${\sqrt{\ell}}$ where ${\ell}$ is the loser’s fraction of the total vote, and a large enough victory makes ${a_*\sqrt{\ell} < 1/2}$. This implies that the loser still has a minority in every Hamming sphere centered on the voter ${x}$.

## Can We Use the Inequality They Proved?

We can try to work instead with the function

$\displaystyle f_*(x) = f(x)\sqrt{\mu(x)}.$

Then we obtain

$\displaystyle ||f_*||_2^2 = \sum_x f(x)^2 \mu(x),$

while (using the original maximizing operator ${M}$ again)

$\displaystyle ||M_{f_*}||_2^2 = \sum_x \max_k \left[\frac{1}{|S_k(x)|}\sum_{y \in S_k(x)} f(y)\sqrt{\mu(y)}\right]^2.$

In general this does not look very tractable, nor does the proved theorem ${||M_{f_*}||_2 \leq a||f_*||_2}$ seem either to imply or follow from the “Theorem?” above.

In our specific election case, we might try

$\displaystyle f_*(x) = \sqrt{\mu(x)} \text{~if~}x\in D\text{;~} f_*(x) = 0\text{~otherwise}.$

Then ${||f_*||_2^2 = wt(D)}$, and the theorem they proved gives

$\displaystyle \sum_x \max_k \left[\frac{1}{|S_k(x)|}\sum_{y \in S_k(x) \cap D}\sqrt{\mu(y)}\right]^2 \leq a^2\, wt(D).$

The averaging argument, however, is still in terms of ${2^n}$ not ${E}$ as just above, so we get the existence of an ${x}$ such that for all ${k}$,

$\displaystyle \frac{\sum_{y \in S_k(x) \cap D} \sqrt{\mu(y)}}{\binom{n}{k}} \leq a\sqrt{\frac{wt(D)}{2^n}}.$

Neither side of this inequality yields the desired interpretation. The Cauchy-Schwarz inequality can relate the numerator to ${wt(S_k(x) \cap D)}$, but the inequality goes in the wrong direction for our purpose. This hints how tricky the considerations in the paper are.

The problem appears to be mainly the “white space” on the map caused when ${\mu(x) = 0}$. We can argue that good behavior should still hold as a limiting case of ${\mu(x)}$ always being positive but fractional (where we can also stipulate that ${\sum_x \mu(x) = 1}$) and letting it tend to zero for some ${x}$.

Resolving this, however, may entail solving some easier problems that Aram and the others have attempted, such as for spaces formed by Cartesian products of other graphs (besides the ${n}$-fold product of 2-cliques giving the hypercube). Moreover their type of inequality is open even for ${\{0,1,2\}^n}$, and for other norms including the ${L_1}$-norm. An ${L_1}$-norm inequality ostensibly would help our desired election conclusions, but one of their key propositions does not extend to it, at least not conveniently as they remark. As always we refer interested readers to the paper for full details and further questions.

Aram contributes a counterexample to the unrestricted “Theorem?”, to some restricted versions of it (also with balls in place of spheres), and to the existence of “typical voters” in the weighted case with regard to spheres. Namely, let there be only ${n+1}$ voters: the origin voting for the loser ${D}$, i.e. ${f(0^n) = 1}$, and her ${n}$ neighbors voting for the winner (${f(x) = 0}$ everywhere else). Then

$\displaystyle ||f||_2 = 1 \quad\text{while}\quad ||M^*_f||_2 = \sqrt{1 + n}$

because the neighbors see only the origin at distance 1. Thus there is no inequality with a constant independent of ${n}$. Moreover, none of the winning voters is typical, again because each one’s sphere at distance 1 has only the origin with nonzero weight, while even the ball of radius 1 leaves a tied 1-1 result.

However, the weight-2 nodes such as ${x = 110^{n-2}}$ still have the loser in a minority of all properly enclosing balls, so they are “typical combinations” even with ${\mu(x) = 0}$ there. Perhaps the most relevant cases are where ${\mu(x) \geq 1}$ for all ${x}$ and ${\sum_x \mu(x) = O(2^n)}$, and for balls not spheres—though Hamming balls lack the “complementarity” property mentioned near the top.
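The counterexample is small enough to verify directly. A brute-force sketch (our own code) with ${n = 5}$:

```python
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

n = 5
cube = list(product((0, 1), repeat=n))
origin = (0,) * n

# n+1 voters: the origin (who votes for the loser) and her n neighbors.
mu = {x: 1 if hamming(origin, x) <= 1 else 0 for x in cube}
f = {x: 1 if x == origin else 0 for x in cube}

def M_star(x):
    """Weighted spherical maximal operator, skipping spheres of zero weight."""
    best = 0.0
    for k in range(n + 1):
        S = [y for y in cube if hamming(x, y) == k]
        w = sum(mu[y] for y in S)
        if w > 0:
            best = max(best, sum(f[y] * mu[y] for y in S) / w)
    return best

norm_f_sq = sum(f[x] ** 2 * mu[x] for x in cube)        # ||f||_2^2 = 1
norm_Mf_sq = sum(M_star(x) ** 2 * mu[x] for x in cube)  # ||M*_f||_2^2 = 1 + n
print(norm_f_sq, norm_Mf_sq)
```

Each neighbor’s distance-1 sphere has weight 1, concentrated entirely on the origin where ${f = 1}$, so ${M^*_f = 1}$ at all ${n+1}$ weighted vertices and the squared ratio grows linearly in ${n}$.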

## Open Problems

For what other norms and spaces does their inequality hold?

Does their inequality hold under weighted counting measures ${\mu}$ on ${\{0,1\}^n}$, with reasonable restrictions on ${\mu}$ and perhaps using balls not spheres? Can we obtain the special case of “typical voters” from the inequality they have without the one-vertex, one-vote restriction, or under milder added assumptions? Can we prove it if ${\mu(x) \geq 1}$ and ${\sum_x \mu(x) = O(2^n)}$?

Will math matter to the upcoming Presidential debates?

[Changed to distinguish "ball" from "sphere" consistently, added note on partial refutation of weighted case.]

1. October 8, 2012 4:48 pm

The title references a common riddle, “What’s black and white and red all over?”, whose answer is variously “a newspaper” (hearing “read”), “a skunk with a diaper rash”, “a sunburned zebra”, “the US balance sheet”, and so on. My own riddle is why the great analysts of Hilbert’s time and before didn’t think of and work on this kind of inequality (let alone prove it), and my answer is that they were not in the right conceptual (Hamming) spheres.

The photo of Aram is from a UW directory and is already truncated where I put part of the US flag on top.

2. October 8, 2012 10:02 pm

Let me ask a question about a potentially interesting extension but I am not sure if it is already covered by the paper or by the more general version proposed in the post.

For positive ${s}$ let ${t(s)}$ be the minimum value so that if no more than ${t(s)}$ of the vertices of the discrete ${n}$-cube are marked then there is a vertex so that every ball around it contains at least a fraction ${s}$ of unmarked vertices. The paper proves that ${t(1/2)}$ is larger than an absolute constant (independent of ${n}$). What can be said about ${t(s)}$? (Here we can even allow ${s}$ also to be a function of ${n}$.) In particular, is $t(s) \ge Cs$ for some absolute constant ${C}$?

• October 8, 2012 10:06 pm

Sorry, it seems to follow from Theorem 1 as stated….

3. October 12, 2012 10:51 am

Hello KWR, I came across your post on the “lonely runner conjecture” whilst doing a bit of research on it. I believe that I have a solution to the conjecture (proving it to be correct) which holds for any n, with no restrictions concerning speed (when the conjecture is framed in terms of runners on a track), etc. I would be very interested to share this idea as I cannot see any flaw in the proof. I am an engineer and not a mathematician, and whilst the proof is easily stated and explained it is not a straightforward matter for me to express it using strict mathematical formalism.

4. October 10, 2013 6:43 am

Thanks for the wonderful post! I have a better understanding of the voting theory perspective now =)

Regarding the addendum, I assume that by “let there be only n+1 voters”, you are taking your weighted measure $\mu$ to be the characteristic function of the ball of radius 1 at the origin. Is this correct? Your calculations work out in this case. And is there any heuristic reason why this weight should fail to be bounded with a dimension-free constant?

In regards to the spherical maximal function, if one wants to know about the existence of typical voters, then you are trying to understand the ‘worst case behavior’ of votes (e.g. what can conspire to prevent the existence of a typical voter?). This is why one is forced to consider their maximal function. This idea isn’t new and arose in harmonic analysis with the work of Hardy–Littlewood on their maximal function. Understanding various maximal functions, including the continuous version of the spherical maximal function on the hypercube, has been an inherent aspect of harmonic analysis since then. On the other hand, the probabilistic method (which is also known as the union bound in this case) only handles the ‘average behavior’ of votes. As far as I understand, this idea was communicated to them by Gil Kalai – see http://gilkalai.wordpress.com/2012/11/17/a-few-mathematical-snapshots-from-india-icm2010/ . Once they understood this perspective, the techniques in their paper are standard in harmonic analysis, i.e. the passage to a smoother operator and controlling the remainder as a square function. Though of course, this is a nice result that has been recently extended by Krause to $L^p$ for $p>1$ in http://arxiv-web3.library.cornell.edu/abs/1309.4466?context=math.CO . It fails even as a weak $L^1$ bound by considering the delta function at the origin.