Projections Can Be Tricky
Henri Lebesgue is famous for the creation of modern measure theory. His definition of what we now call the Lebesgue integral replaced previous notions of the integral, which were due to such great mathematicians as Isaac Newton, Gottfried Leibniz, Augustin-Louis Cauchy, and Bernhard Riemann. His definition was one of the landmark achievements of mathematics, and his notion has many beautiful properties that all previous definitions lacked.
Today Ken and I wish to talk about the role that projections play in mathematics. And further how tricky the seemingly simple concept of projection can be.
The main reason we are interested in discussing projections is to raise points of comparison between these two developments:
- A classic result of Volker Strassen on lower bounds for arithmetic circuits. His result, even after 37 years, is the best known general lower bound on arithmetic computation.
- A recent claimed result of Vinay Deolalikar on Turing machine lower bounds. His claim, if true, would be the best result ever on Turing machine lower bounds.
Both attacks need to face and handle issues that arise from projections. Strassen does this in a clever way that we will describe shortly in some detail. It is a beautiful argument. Deolalikar may or may not know how to handle projections. We will see. But, this issue of projections in his “proof” has emerged as the main conceptual gap in his proof strategy for P ≠ NP.
How hard can the notion of projection be? After all a projection is just the notion of a shadow—right? Well it can be very tricky. The great Lebesgue once made a “throw-away” remark about how sets behave under projections in a monograph. Here is a wonderful summary of the error:
Lebesgue in his preface (to Lusin’s book) humorously points out that the origin of the problems considered by Lusin lies in an error made by Lebesgue himself in his 1905 memoir on functions representable analytically. Lebesgue stated there that the projection of a Borel set is always a Borel set. Lusin and his colleague Souslin constructed an example showing that this statement was false, thus discovering a new domain of point sets, a domain which includes as a proper part the domain of Borel sets. Lebesgue expresses his joy that he was inspired to commit such a fruitful error.
Let’s now turn to discuss projections, and how Strassen handles them while perhaps Deolalikar does not. But, that a great mathematician like Lebesgue can be mistaken about projections should be a reminder to us all. The rest of this post is by Ken Regan.
Let’s Be Nice
Let us abbreviate Deolalikar’s key concept of “polylog parameterizability” of a distribution by the English word “nice.” Then the facts established by his paper, as summarized on the wiki page opened by Terence Tao, become:
- If P = NP, then the solution space of any k-SAT problem with at least one solution supports a projection of a nice distribution.
- The hard phase of k-SAT does not have a nice distribution.
Subsequent to Tao’s original comment on this, the paper’s public examiners have agreed that these two facts do not, and conceptually cannot, combine to make a proof of P ≠ NP. The extra clause “projection of” severs the link. In fact, the need for a projection cycles back upon the problem itself, as we shortly explain. The question this raises for anyone working similarly on lower bounds is,
Do your concepts stay nice under projections?
We will see how Strassen managed with a concept that stays nice, but then explore whether any strategy for better lower bounds can do so.
Consider your shadow cast by a broad light against a back wall, which we will call the x-z plane. If the light is level with your head along the y-axis, then your silhouette on the wall is a mathematical projection that takes an especially simple form: it removes the y-coordinate. This is a natural physical concept, so you wouldn’t think it typifies the abstract idea of nondeterminism by which NP is defined. But it does. The link is shown by noting:
A point (x, z) is in the projection if and only if there exists a y such that the point (x, y, z) is in the 3D image of your head.
Now the y might be unique; it might be the tip of your nose. Or there might be many such “solution points,” as with light (not) going through your skull. Either way, the “exists” makes it an NP matter.
To see why, first suppose x and z somehow code a Boolean formula φ in some number n of variables, and that y represents numbers from 0 to 2^n − 1 that encode assignments to φ. Then the “3D image” is the set of cases where y encodes a satisfying assignment of φ. This is an easy set in P. The projections, however, form the language of formulas φ for which there exists a y making φ true. This is the NP-complete set SAT.
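The projection-equals-exists analogy can be made concrete with a brute-force sketch in Python (the encoding and names here are our own illustration, not from the original argument): checking whether one given assignment satisfies a formula is fast, while membership in the projected set SAT forces a search over all assignments.

```python
from itertools import product

# A CNF formula is a list of clauses; each clause is a list of signed
# variable indices, e.g. [1, -2] means (x1 OR NOT x2).

def satisfies(cnf, assignment):
    """Membership in the 'easy set': checking one assignment is fast."""
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in cnf
    )

def in_projection(cnf, n):
    """Membership in the projected set SAT: the 'there exists y'
    forces a search over all 2^n assignments (brute force here)."""
    return any(satisfies(cnf, a) for a in product([False, True], repeat=n))

# (x1 OR x2) AND (NOT x1 OR x2) AND (NOT x2 OR x3)
cnf = [[1, 2], [-1, 2], [-2, 3]]
print(in_projection(cnf, 3))       # True: e.g. x1=F, x2=T, x3=T works

unsat = [[1], [-1]]                # x1 AND NOT x1
print(in_projection(unsat, 1))     # False
```

The per-assignment check runs in polynomial time; only the existential quantifier over y makes the projected language hard.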
The analogy to Vinay Deolalikar’s paper can be made even better by considering just x to encode a formula, and z to be a partial assignment to some of its variables. Then the projection becomes the language EXT described in this post. Either way, the upshot is that in computational complexity we believe that a projection can make a set more complicated.
Shadows and Slices
The idea that your shadow can be more complex than you sounds laughable. If anything, features of your head get simplified. If you have a double chin, it’s gone. Your hair may now mask your ears. If people enter the room, their shadows may combine with yours. But you will never, ever, see more people in a shadow than you have in the room.
The abstract mathematical fact is that the number of connected components of an image can never increase under projections. This is so even under general kinds of projections that may be onto an arbitrary subspace, even an affine subspace, meaning one not through the origin, or may be stereographic, meaning from a pinpoint rather than a broad source. It seems that anything “physical” or “statistical” about a structure should become simpler when you take a projection. Or at least it should not get more complex.
Projections should not be confused with slices. A slice is what you get by fixing a variable, or substituting an affine linear expression for a variable. If you fix a formula φ, its solution space is a slice of the global solution space we defined above.
If you take a thin slice of a donut across the hole, you get two circular pieces. Thus it’s not a surprise that the complexity can go up with slices. In fact, you might start with a formula that has many easy-to-find solutions, but begin with choices for the “z” variables that make it very difficult to find a good answer with the “y” variables. You have thus created a difficult 1D slice for the EXT problem from the easy 2D slice of the SAT problem given by your fixed instance φ. It is interesting that Strassen’s complexity measure described next does not go up under slices either.
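The donut picture can be checked on a small discrete example in Python (the grid shapes and helper names are ours, purely for illustration): a square “donut” of grid cells is one connected piece, its shadow on the x-axis is still one piece, but a slice through the hole has two.

```python
from collections import deque

def components(cells, neighbors):
    """Count connected components of a finite cell set via BFS."""
    cells, seen, count = set(cells), set(), 0
    for start in cells:
        if start in seen:
            continue
        count += 1
        queue = deque([start])
        seen.add(start)
        while queue:
            c = queue.popleft()
            for nb in neighbors(c):
                if nb in cells and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return count

def nbrs2d(c):
    x, y = c
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nbrs1d(x):
    return [x + 1, x - 1]

# A square "donut": the border cells of a 5x5 grid (hole in the middle).
ring = {(x, y) for x in range(5) for y in range(5)
        if x in (0, 4) or y in (0, 4)}

print(components(ring, nbrs2d))                            # 1 piece
# Projection onto the x-axis: components can only go down (here 1 -> 1).
print(components({x for x, y in ring}, nbrs1d))            # 1
# Slice at x = 2: only the cells (2,0) and (2,4) survive -- now 2 pieces.
print(components({y for x, y in ring if x == 2}, nbrs1d))  # 2
```

The projection never splits a connected set, while the slice through the hole does, exactly the asymmetry the text describes.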
Slices can be used as supports of distributions. You can even take a section, which is an adjacent set of slices. Further you can induce distributions on subsets of the global solution space that in a sense represent “phases.” Thus slices are part of the conceptual environment in Deolalikar’s paper. They are not, however, where the main problem is believed to lie.
Volker’s Complexity Measure
Recall we talked about arithmetical formulas last week here. If you allow the + and × gates to connect to each other in a directed acyclic graph rather than just a tree, and allow multiple output gates, then you get an arithmetical circuit C. Suppose C has n input variables x1, …, xn and m output gates g1, …, gm. As with formulas, each gj computes a polynomial pj in (some or all of) the inputs. Thus the circuit computes a polynomial mapping p, which is given by the tuple of polynomials (p1, …, pm). If we think of y1, …, ym as variables standing for the output values, you can represent the mapping by the set of equations (each set to zero)

y1 − p1(x1, …, xn), …, ym − pm(x1, …, xn).

We are now ready to define Strassen’s measure. We have m equations in n + m unknowns. We can get a well-specified system by adding n more equations that are affine-linear, allowing a constant term. The resulting system is well-specified if its solution space is a finite set of single points in (n + m)-dimensional space. What is the maximum size of such a finite set, over all choices of the added equations? That is Strassen’s measure, M(p). What he proved is,
Theorem: Every arithmetic circuit computing p must have size at least log2 M(p).
Walter Baur later helped him extend this to lower-bound the circuit size of a single polynomial f, by taking p to be the set of first partial derivatives of f (so m = n). The key lemma needed was also covered in the earlier Strassen post. A simple example is the polynomial f = x1^(d+1) + x2^(d+1) + … + xn^(d+1). This gives ∂f/∂xi = (d+1)·xi^d.
Make n more equations yi = ci, one for each i. The resulting system becomes (d+1)·xi^d = ci for each i. Over the complex numbers each equation has d roots, and the equations are independent, so we get d^n points. Strassen’s theorem then concludes that every arithmetical circuit that computes f must have at least log2(d^n) = n·log2 d gates. That yields an ironclad non-linear lower bound on computing f, allowing a constant factor that comes from the Baur-Strassen lemma. Unfortunately no better lower bound is known for any reasonably explicit family of polynomials, but at least that’s better than the situation in Boolean circuit theory, where no non-linear lower bound for any function in NP is known at all.
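The count of isolated solutions for this example family f = x1^(d+1) + … + xn^(d+1) can be checked numerically. Here is a Python sketch (helper names and the small values d = 3, n = 2 are our own) that enumerates the solutions of the system (d+1)·x_i^d = c_i over the complex numbers:

```python
import cmath
from itertools import product

def roots_of(w, d):
    """The d complex solutions of x^d = w, for w != 0."""
    r = abs(w) ** (1.0 / d)
    theta = cmath.phase(w)
    return [r * cmath.exp(1j * (theta + 2 * cmath.pi * k) / d)
            for k in range(d)]

def solution_points(d, constants):
    """Isolated solutions of the system (d+1) * x_i^d = c_i:
    one root choice per coordinate, hence d^n points in all."""
    per_var = [roots_of(c / (d + 1), d) for c in constants]
    return list(product(*per_var))

d, constants = 3, [8.0, 27.0]        # n = 2 variables
points = solution_points(d, constants)
print(len(points))                    # d^n = 3^2 = 9

# Every point really satisfies the system, up to rounding error.
assert all(abs((d + 1) * x ** d - c) < 1e-9
           for p in points for x, c in zip(p, constants))
```

Taking log2 of the count gives n·log2 d, which is the lower bound Strassen’s theorem yields for this family.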
How the Measure Plays Nice
It is immediate that Strassen’s measure M is non-increasing under slices. A slice is the same as attaching a new affine linear equation. For example, if you want to substitute the expression c·x1 + b for x2 (for constants c and b), you can add the equation x2 − c·x1 − b = 0. Since the definition of M takes a maximum over all choices of such equations to add, fixing one of them can’t give it a higher value.
Non-increasing under projections is a little trickier, but boils down to what we said above about the number of connected components. There is no way to get more single points by taking a projection of the ultimate set than it already has. Now let’s see how projections are used in Strassen’s proof.
Consider a hypothetical circuit C computing the mapping p. Let us assign not only variables yj to the output gates of C, but also variables ui to the intermediate gates gi. For every binary addition gate gi getting inputs from gates ga and gb, possibly multiplied by constants c and c′ respectively, adjoin the equation

ui − c·ua − c′·ub = 0.

And if gi is a multiplication gate, adjoin ui − ua·ub = 0. Note that ua and/or ub might be an input variable (importantly, a = b is possible too), or a constant, and that gi might be an output gate, making it have a “y-variable” rather than a u. These added equations make a system E and define a solution space over a larger universe of variables that includes the u-variables. The key observation is:
The graph of the mapping p is obtained by projecting the u-variables out of the solution space of E.
Since every equation introduces a new variable, the full system E of equations has the essential properties of a mapping needed to well-define M(E). By the non-increasing property under projections, M(p) ≤ M(E). Finally, to upper-bound M(E), consider what happens when we add gate equations one-at-a-time. Each addition gate gives a linear equation (affine if one input to the gate is a constant), and we already know from slices that this does not increase the measure. Thus the entire lower bound follows from observing that adjoining the quadratic equation ui = ua·ub can at most double the measure as E is built up. This follows from a special case of Bézout’s Theorem, for which an elementary argument can be found in Madhu Sudan’s course notes here. The argument comes down to a polynomial of degree d in a single variable having at most d roots.
We have thus proved a lower bound on circuit size by proving an upper bound on M(E). Note that Strassen’s lower bound actually applies to the number of multiplication gates in the circuit alone. We proved M(E) ≤ 2^s, where s is the number of multiplication gates, thus s ≥ log2 M(p). Hence we learned more than we set out to prove, which is a hallmark of a good proof. Can we learn even more than this here? Alas perhaps not.
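To make the construction concrete, here is a minimal Python sketch of the system E for one tiny circuit (the circuit and all names are our own illustration, not from Strassen’s paper):

```python
# A toy circuit for p(x1, x2) = (x1 + x2) * x1: gate g1 is an addition
# gate (u-variable u1), gate g2 a multiplication gate and the output
# (so it gets a y-variable).

def E(x1, x2, u1, y):
    """The gate equations; each entry should vanish on a solution."""
    return [
        u1 - (x1 + x2),   # addition gate:        u1 = x1 + x2
        y - u1 * x1,      # multiplication gate:   y = u1 * x1
    ]

def run_circuit(x1, x2):
    """Determinism: each input fixes unique gate values, giving the
    one solution of E over that input."""
    u1 = x1 + x2
    return u1, u1 * x1

x1, x2 = 3.0, 4.0
u1, y = run_circuit(x1, x2)
print(y)                                  # (3 + 4) * 3 = 21.0
assert all(e == 0 for e in E(x1, x2, u1, y))

# Projecting out u1 from the solutions of E leaves exactly the pairs
# ((x1, x2), y) with y = p(x1, x2): the graph of the mapping.
```

The single multiplication-gate equation is the only quadratic one, matching the count in the bound M(E) ≤ 2^s.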
The Problem With Projections
The problem with the above observation about the graph of p being a projection of the solution space of E is that it does not care that the circuit is deterministic. Because it is deterministic, every input and correct output has a unique “solution” of intermediate u-values that satisfies the expanded set of equations E. However, the concepts used do not care that these solution points are unique, and do not avail themselves of determinism in any other way. Thus again we learn more: we could frame a way that Strassen’s technique gives the same lower bound on some kind of nondeterministic arithmetic circuits. But this is more power than we want, applied to something other than what we want, which is to understand deterministic circuits.
The problem also shows up as a limitation on the measure itself. For a single polynomial f of degree d+1 in n variables, the maximum possible M on the first partials of f is d^n. When d itself is polynomial in n, meaning that f comes from a family of low-degree polynomials, the log of this is O(n log n). This means that the best possible lower bound which Strassen’s technique alone can prove for low-degree polynomials such as the permanent is O(n log n), no more. This already is maxed out by the simple polynomial f given above, where it is asymptotically tight because each power xi^(d+1) can be computed with O(log d) multiplication gates via iterated squaring.
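The iterated-squaring remark is easy to check; this Python sketch (our own, with illustrative names) counts multiplications in square-and-multiply:

```python
def pow_with_count(x, d):
    """Compute x**d by square-and-multiply, counting multiplications:
    roughly 2*log2(d) of them, far fewer than the degree d."""
    result, base, mults = 1, x, 0
    while d:
        if d & 1:
            result, mults = result * base, mults + 1
        d >>= 1
        if d:
            base, mults = base * base, mults + 1
    return result, mults

val, mults = pow_with_count(3, 1024)
print(val == 3 ** 1024, mults)   # True 11  (10 squarings + 1 multiply)
```

So x^1024 needs only 11 multiplication gates here, while Strassen’s measure for the corresponding system is capped at the degree, which is why the n·log d bound is tight for this family.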
We speculate that these lacks may owe ultimately to the single fact that Strassen’s M plays nice. It is non-increasing under projections, as well as slices. Here we have actually concealed a little shock: the central concept of the theory that arguably underlies this work is not itself well defined under projections. Namely, put in your head the solution space of the equations xy = 1 and z = 0. Now projecting out the y co-ordinate leaves the line z = 0, but missing the origin (0, 0). Missing this point prevents the projection from itself being the solution space of a set of polynomial equations. Superficially this resembles one of the concrete counterexamples on Tao’s wiki page to preservation of Deolalikar’s “pp” measure (while the other is reminiscent of the lower bounds on parity). Strassen’s proof is able to escape, however, because the xy = 1, z = 0 example is not the graph of a mapping (most to the point, its solution space is not an irreducible variety), while his tools work well for mappings.
Ironically this puts Deolalikar’s “pp” complexity measure in the opposite situation from Strassen’s M. It does not stay nice under projections, per the counterexamples. If it did stay nice, then the two facts at the top of this post would mesh, and presumably he would be able to complete a proof of P ≠ NP by fixing issues with each step separately.
It remains to ask whether it can be modified in a way that sidesteps the problem like Strassen’s did. Here we can revisit our intuition about projections from the beginning:
Can a “statistical” property of distributions ever become more complex under projections?
If by “statistical” we mean purely numerical or physical properties encountered in everyday situations, the answer may seem to be “no,” as surmised by one of my colleagues in computer vision. However, two others postulated an example where a high-entropy distribution in a 2D plane masks a low-entropy one in the 3D interior. More crudely, one can picture a complex 2D pattern simply being replicated along the axis of projection.
A “no” answer would be great if Deolalikar’s “pp” measure were solely statistical. However, the measure’s definition involves the existence of an acyclic graph on which the numerical properties are mounted. This graph is very like the circuits we have been talking about, and hence very much like a computation. This parallel was demonstrated most aptly by commenter “vloodin,” with a proof now linked from the wiki page that the needed graphs can arise directly from consideration of a polynomial-time (oblivious) Turing machine, without needing to go through the FO(LFP) characterization of P. The final rub may be that Deolalikar’s measure is more “computational” than “statistical,” so that the example of projections giving SAT or EXT from an easy set becomes the ruling obstacle.
Can we hope to prove lower bounds with a complexity measure that is preserved under projections? Can we hope to prove lower bounds with a complexity measure that is not preserved under projections? What happens with slices?
Can the arithmetical lower bounds be carried over to Boolean circuit complexity? The Baur-Strassen technique fails over finite fields.
Can results over infinite fields lead to work over discrete domains at all? Is this like, or unlike, the history of the Prime Number Theorem?