Projections can be tricky

Henri Lebesgue is famous for the creation of modern measure theory. His definition of what we now call the Lebesgue Integral replaced earlier notions of the integral due to such great mathematicians as Isaac Newton, Gottfried Leibniz, Augustin-Louis Cauchy, and Bernhard Riemann. His definition was one of the landmark achievements of mathematics, and it has many beautiful properties that all the previous definitions lacked.

Today Ken and I wish to talk about the role that projections play in mathematics. And further how tricky the seemingly simple concept of projection can be.

The main reason we are interested in discussing projections is to raise points of comparison between these two developments:

1. A classic result of Volker Strassen on lower bounds for arithmetic circuits. His result, even after 37 years, is the best known general lower bound on arithmetic computation.
2. A recent claimed result of Vinay Deolalikar on Turing Machine lower bounds. His claim, if true, would be the best result ever on Turing lower bounds.

Both attacks need to face and handle issues that arise from projections. Strassen does this in a clever way that we will describe shortly in some detail. It is a beautiful argument. Deolalikar may or may not know how to handle projections. We will see. But, this issue of projections in his “proof” has emerged as the main conceptual gap in his proof strategy for ${\mathsf{P} \neq \mathsf{NP}}$.

How hard can the notion of projection be? After all a projection is just the notion of a shadow—right? Well it can be very tricky. The great Lebesgue once made a “throw-away” remark about how sets behave under projections in a monograph. Here is a wonderful summary of the error:

Lebesgue in his preface (to Lusin’s book) humorously points out that the origin of the problems considered by Lusin lies in an error made by Lebesgue himself in his 1905 memoir on functions representable analytically. Lebesgue stated there that the projection of a Borel set is always a Borel set. Lusin and his colleague Souslin constructed an example showing that this statement was false, thus discovering a new domain of point sets, a domain which includes as a proper part the domain of Borel sets. Lebesgue expresses his joy that he was inspired to commit such a fruitful error.

Let’s now turn to discuss projections, and how Strassen handles them while perhaps Deolalikar does not. But, that a great mathematician like Lebesgue can be mistaken about projections should be a reminder to us all. The rest of this post is by Ken Regan.

Let’s Be Nice

Let us abbreviate Deolalikar’s key concept of “polylog parameterizability” of a distribution by the English word “nice.” Then the facts established by his paper, as summarized on the wiki page opened by Terence Tao, become:

1. If ${\mathsf{P} = \mathsf{NP}}$, then the solution space of any ${k}$-SAT problem with at least one solution supports a projection of a nice distribution.
2. The hard phase of ${k}$-SAT does not have a nice distribution.

Subsequent to Tao’s original comment on this, the paper’s public examiners have agreed that these two facts do not, and conceptually cannot, combine to make a proof of ${\mathsf{P} \neq \mathsf{NP}}$. The extra clause “projection of” severs the link. In fact, the need for a projection cycles back upon the ${\mathsf{P} = \mathsf{NP}}$ problem itself, as we shortly explain. The question this raises for anyone working similarly on lower bounds is,

Do your concepts stay nice under projections?

We will see how Strassen managed with a concept that stays nice, but then explore whether any strategy for better lower bounds can do so.

Projections

Consider your shadow cast by a broad light against a back wall which we will call the ${y,z}$ plane. If the light is level with your head along the ${x}$-axis, then your silhouette on the wall is a mathematical projection that takes an especially simple form: it removes the ${x}$-coordinate. This is a natural physical concept, so you wouldn’t think it typifies the abstract idea of nondeterminism by which ${\mathsf{NP}}$ is defined. But it does. The link is shown by noting:

A point ${(y,z)}$ is in the projection if and only if there exists an ${x}$ such that the point ${(x,y,z)}$ is in the 3D image of your head.

Now the point ${(x,y,z)}$ might be unique—it might be the tip of your nose. Or there might be many such “solution points,” as with light (not) going through your skull. Either way, the “exists” makes it an ${\mathsf{NP}}$ matter.

To see why, first suppose ${y,z}$ somehow code a Boolean formula ${\phi}$ in some number ${n}$ of variables, and that ${x}$ represents numbers from ${0}$ to ${2^n - 1}$ that encode assignments to ${\phi}$. Then the “3D image” is the set ${S}$ of cases where ${\phi(x) = \mathtt{true}}$. This is an easy set in ${\mathsf{P}}$. The projections, however, form the language of formulas ${\phi}$ for which there exists an ${x}$ making ${\phi(x) = \mathtt{true}}$. This is the ${\mathsf{NP}}$-complete set SAT.
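To make the contrast concrete, here is a minimal Python sketch (all names hypothetical): membership in the easy set ${S}$ is decided by a single evaluation of ${\phi}$, while membership in the projection is the “exists” question, answered here only by brute force over all ${2^n}$ assignments.

```python
def in_S(x, phi, n):
    """Membership in the '3D image' S is easy: just evaluate phi on the
    assignment encoded by the integer x."""
    bits = [(x >> i) & 1 for i in range(n)]
    return phi(bits)

def in_projection(phi, n):
    """Membership in the projection asks whether there EXISTS an x with
    (x, phi) in S -- done here by brute force over all 2**n assignments."""
    return any(in_S(x, phi, n) for x in range(2 ** n))

# phi says "exactly one of the two variables is true"
phi = lambda b: (b[0] or b[1]) and not (b[0] and b[1])
print(in_projection(phi, 2))                          # True
print(in_projection(lambda b: b[0] and not b[0], 1))  # False: unsatisfiable
```

Of course nothing faster than the brute-force loop is known in general, which is exactly the point.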

The analogy to Vinay Deolalikar’s paper can be made even better by considering just ${z}$ to encode a formula, and ${y}$ to be a partial assignment to some of its variables. Then the projection becomes the language EXT described in this post. Either way, the upshot is that in computational complexity we believe that a projection can make a set more complicated.

The idea that your shadow can be more complex than you sounds laughable. If anything, features of your head get simplified. If you have a double chin, it’s gone. Your hair may now mask your ears. If people enter the room, their shadows may combine with yours. But you will never, ever, see more people in a shadow than you have in the room.

The abstract mathematical fact is that the number of connected components of an image can never increase under projections. This is so even under general kinds of projections that may be onto an arbitrary subspace, even an affine subspace meaning one not through the origin, or may be stereographic meaning from a pinpoint not broad source. It seems that anything “physical” or “statistical” about a structure should become simpler when you take a projection. Or at least it should not get more complex.

Projections should not be confused with slices. A slice is what you get by fixing a variable, or substituting an affine linear expression for a variable. If you fix a formula ${\phi}$, its solution space ${S_\phi = \{x: \phi(x) = \mathsf{true}\}}$ is a slice of the global solution space ${S}$ we defined above.

If you take a thin slice of a donut across the hole, you get two circular pieces. Thus it’s not a surprise that the complexity can go up with slices. In fact, you might start with a formula ${\phi}$ that has many easy-to-find solutions, but begin with choices ${y_1,y_2,\dots}$ for the “${y}$” variables that make it very difficult to find a good answer with the “${x}$” variables. You have thus created a difficult 1D slice for the EXT problem from the easy 2D slice of the SAT problem given by your fixed instance ${\phi}$. It is interesting that Strassen’s complexity measure described next does not go up under slices either.

Slices can be used as supports of distributions. You can even take a section, which is an adjacent set of slices. Further, you can induce distributions on subsets of the global solution space ${S}$ that in a sense represent “phases.” Thus slices are part of the conceptual environment in Deolalikar’s paper. They are not, however, where the main problem is believed to lie.

Volker’s Complexity Measure

Recall we talked about arithmetical formulas last week here. If you allow the ${+}$ and ${*}$ gates to connect to each other in a directed acyclic graph rather than just a tree, and allow multiple output gates, then you get an arithmetical circuit ${C}$. Suppose ${C}$ has input variables ${x_1,\dots,x_n}$ and output gates ${y_1,\dots,y_s}$. As with formulas, each ${y_j}$ is a polynomial ${g_j(x)}$ in (some or all of) the inputs ${x_i}$. Thus the circuit computes a polynomial mapping ${G}$, which is given by the tuple of polynomials ${(g_1(x),g_2(x),\dots,g_s(x))}$. If we think of ${y_1,\dots,y_s}$ as variables standing for the values, you can represent the mapping by the set of equations (each set to zero)

$\displaystyle G = \{y_1 - g_1(x),\; y_2 - g_2(x),\;\dots\;y_s - g_s(x)\}.$

We are now ready to define Strassen’s measure. We have ${s}$ equations in ${n+s}$ unknowns. We can get a well-specified system by adding ${n}$ more equations that are affine-linear, allowing a constant term. The resulting system is well-specified if its solution space is a finite set ${F}$ of single points in ${(n+s)}$-dimensional space. What is the maximum size ${\mu}$ of such a finite set? That is Strassen’s measure, ${\mu = \mu(G)}$. What he proved is,

Theorem: Every arithmetic circuit ${C}$ computing ${G}$ must have size at least ${\log_2(\mu(G))}$.

Walter Baur later helped him extend this to lower-bound the circuit size of a single polynomial ${p(x)}$, by taking ${G}$ to be the set of first partial derivatives of ${p}$ (so ${s = n}$). The key lemma needed was also covered in the earlier Strassen post. A simple example is the polynomial ${p(x) = x_1^n + \dots + x_n^n}$. This gives

$\displaystyle G_p = \{y_1 - n x_1^{n-1},\;\dots,\; y_n - n x_n^{n-1}\}.$

Add the ${n}$ further equations ${y_i = n}$, one for each ${i}$. The resulting system becomes ${x_i^{n-1} = 1}$ for each ${i}$. Over the complex numbers each equation has ${n-1}$ roots, and the equations are independent, so we get ${(n-1)^n}$ points. Strassen’s theorem then concludes that every arithmetical circuit that computes ${G_p}$ must have at least ${n\log_2(n-1)}$ gates. That yields an ironclad non-linear, ${\Omega(n\log n)}$ lower bound on computing ${p}$, up to a constant factor of about ${4}$ that comes from the Baur-Strassen lemma. Unfortunately no better lower bound is known for any reasonably explicit family of polynomials—but at least that’s better than the situation in Boolean circuit theory, where no non-linear lower bound for any function in ${\mathsf{E}^{\mathsf{NP}}}$ is known at all.
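The count of solution points is easy to sanity-check for a small ${n}$. The sketch below (plain Python, using only the standard library) enumerates the ${(n-1)}$-th roots of unity and takes all combinations, confirming the ${(n-1)^n}$ isolated points:

```python
import cmath
from itertools import product

n = 3
# each equation x_i**(n-1) = 1 is solved by the (n-1)-th roots of unity
roots = [cmath.exp(2j * cmath.pi * k / (n - 1)) for k in range(n - 1)]
# the n equations are independent, so the solution points are all tuples
points = list(product(roots, repeat=n))
# every point really does solve the system, up to floating-point error
assert all(abs(x ** (n - 1) - 1) < 1e-9 for p in points for x in p)
print(len(points), (n - 1) ** n)   # 8 8
```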

How the Measure Plays Nice

It is immediate that Strassen’s measure ${\mu(G)}$ is non-increasing under slices. A slice is the same as attaching a new affine linear equation. For example if you want to substitute ${x_1 + 3x_2 - 7}$ for ${x_3}$, you can add the equation ${x_3 - x_1 - 3x_2 + 7 = 0}$. Since the definition of ${\mu(G)}$ takes a maximum over all choices of such equations to add, adding one can’t give it a higher value.
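The equivalence between substituting and adjoining can be checked by brute force on a toy polynomial (the polynomial ${f}$ below is an arbitrary hypothetical example) over a small integer grid:

```python
from itertools import product

def f(x1, x2, x3):
    """An arbitrary toy polynomial, purely for illustration."""
    return x1 * x3 - x2 + 2

grid = range(-3, 4)
# slice by substitution: restrict f along x3 = x1 + 3*x2 - 7
sliced = {(x1, x2) for x1, x2 in product(grid, repeat=2)
          if f(x1, x2, x1 + 3 * x2 - 7) == 0}
# slice by adjoining the affine equation x3 - x1 - 3*x2 + 7 = 0
adjoined = {(x1, x2)
            for x1, x2 in product(grid, repeat=2)
            for x3 in range(-20, 20)
            if f(x1, x2, x3) == 0 and x3 - x1 - 3 * x2 + 7 == 0}
print(sliced == adjoined, sorted(sliced))   # True [(0, 2), (1, 2)]
```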

Non-increasing under projections is a little trickier, but boils down to what we said above about the number of connected components. There is no way to get more single points by taking a projection of the ultimate set ${F}$ than it already has. Now let’s see how projections are used in Strassen’s proof.

Consider a hypothetical circuit ${C}$ computing the mapping ${G}$. Let us assign not only variables ${y_j}$ to the output gates of ${C}$, but also variables ${z_k}$ to the intermediate gates ${g_k}$. For every binary addition gate ${g_k}$ getting inputs from gates ${g_i}$ and ${g_j}$, possibly multiplied by constants ${c}$ and ${d}$ respectively, adjoin the equation

$\displaystyle z_k - cz_i - dz_j = 0.$

And if ${g_k}$ is a multiplication gate, adjoin ${z_k - z_i z_j}$. Note that ${z_i}$ and/or ${z_j}$ might be an input variable ${x_i}$ and/or ${x_j}$ (importantly, ${i=j}$ is possible too), or a constant, and that ${g_k}$ might be an output gate, making it have a “${y}$-variable” ${y_k}$ rather than a ${z_k}$. These added equations make a system ${G_C}$ and define a solution space ${S'}$ over a larger universe of variables that includes the ${z}$-variables. The key observation is:

The graph of the mapping ${G}$ is obtained by projecting the ${z}$-variables out of the solution space ${S'}$ of ${G_C}$.
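This observation can be verified by brute force on a toy circuit. The hypothetical ${C}$ below computes ${g(x_1,x_2,x_3) = x_1 x_2 + x_3}$ via one multiplication gate ${z = x_1 x_2}$ and one addition gate ${y = z + x_3}$; working over the small field ${\mathbb{Z}/5}$ keeps the enumeration finite:

```python
from itertools import product

p = 5   # small prime modulus, just to make brute-force enumeration finite
# the graph of the mapping g(x1, x2, x3) = x1*x2 + x3
graph_G = {(x1, x2, x3, (x1 * x2 + x3) % p)
           for x1, x2, x3 in product(range(p), repeat=3)}
# solution space S' of G_C: gate equations z - x1*x2 = 0 and y - z - x3 = 0
S_prime = {(x1, x2, x3, y, z)
           for x1, x2, x3, y, z in product(range(p), repeat=5)
           if (z - x1 * x2) % p == 0 and (y - z - x3) % p == 0}
# projecting the z-coordinate out of S' recovers exactly the graph of g
projected = {(x1, x2, x3, y) for (x1, x2, x3, y, z) in S_prime}
print(projected == graph_G)   # True
```

Note that each ${(x_1,x_2,x_3,y)}$ in the graph has a unique witnessing ${z}$, reflecting the determinism of the circuit.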

Since every equation introduces a new variable, the full system ${G_C}$ of equations has the essential properties of a mapping needed to well-define ${\mu(G_C)}$. By the non-increasing property under projections, ${\mu(G) \leq \mu(G_C)}$. Finally to upper-bound ${\mu(G_C)}$, consider what happens when we add gate equations one-at-a-time. Each addition gate gives a linear equation (affine if one input to the gate is a constant), and we already know from slices that this does not increase ${\mu(G_C)}$. Thus the entire lower bound follows from observing that adjoining the quadratic equation ${z_k - z_i z_j = 0}$ can at most double ${\mu(G_C)}$ as ${C}$ is built up. This follows from a special case of Bézout’s Theorem, for which an elementary argument can be found in Madhu Sudan’s course notes here. The argument comes down to a polynomial of degree ${d}$ in a single variable having at most ${d}$ roots.

We have thus proved a lower bound on ${C}$ by proving an upper bound on ${\mu(G_C)}$. Note that Strassen’s lower bound actually applies to the number ${m}$ of multiplication gates in the circuit alone. We proved ${\mu(G_C) \leq 2^m}$, thus ${m \geq \log_2(\mu(G))}$. Hence we learned more than we set out to prove, which is a hallmark of a good proof. Can we learn even more than this here? Alas perhaps not.

The Problem With Projections

The problem with the above observation about ${G}$ being a projection of ${G_C}$ is that it does not care that the circuit ${C}$ is deterministic. Because it is deterministic, every input ${x}$ and correct output ${y}$ has a unique “solution” ${z}$ that satisfies the expanded set of equations ${G_C}$. However, the concepts used do not care that these solution points are unique, and do not exploit determinism in any other way. Thus again we learn more—we could frame a way that Strassen’s technique gives the same lower bound on some kind of nondeterministic arithmetic circuits. But this is more power than we want, applied to something other than what we want, which is to understand deterministic circuits.

The problem also shows up as a limitation on the measure ${\mu}$ itself. For a single polynomial ${p}$ of degree ${d}$ in ${n}$ variables, the maximum possible ${\mu(G_p)}$ on the first partials of ${p}$ is ${(d-1)^n}$. When ${d}$ itself is ${n^{O(1)}}$, meaning that ${p}$ comes from a family of low-degree polynomials, the log of this is ${O(n\log n)}$. This means that the best possible lower bound which Strassen’s technique alone can prove for low-degree polynomials such as the permanent is ${\Omega(n\log n)}$, no more. This already is maxed out by the simple polynomial ${p}$ given above, where it is asymptotically tight because ${x^n}$ can be computed with ${\sim\log_2 n}$ multiplication gates via iterated squaring.
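The iterated-squaring count is easy to verify with a short sketch that tracks multiplication gates as it goes:

```python
def power_by_squaring(x, n):
    """Compute x**n via iterated squaring, counting multiplication gates.
    For n = 2**k this uses exactly k squarings."""
    result, base, mults = None, x, 0
    while n:
        if n & 1:
            if result is None:
                result = base            # first factor: wire it in, no gate
            else:
                result *= base           # one multiplication gate
                mults += 1
        n >>= 1
        if n:
            base *= base                 # one squaring gate
            mults += 1
    return result, mults

print(power_by_squaring(3, 16))   # (43046721, 4): four squarings suffice
```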

We speculate that these shortcomings may owe ultimately to the single fact that Strassen’s ${\mu}$ plays nice. It is non-increasing under projections, as well as slices. Here we have actually concealed a little shock—the central concept of the theory that arguably underlies this work is not itself well defined under projections. Namely, put in your head the solution space of the equations ${xy = 1}$ and ${xz = 1}$. Now projecting out the ${x}$ co-ordinate leaves the line ${y = z}$, but missing the origin ${(0,0)}$. Missing this point prevents the projection from itself being the solution space of a set of polynomial equations. Superficially this resembles one of the concrete counterexamples on Tao’s wiki page to preservation of Deolalikar’s “pp” measure (while the other is reminiscent of the lower bounds on parity). Strassen’s proof is able to escape, however, because the ${xy = 1}$, ${xz = 1}$ example is not the graph of a mapping (most to the point, its solution space is not an irreducible variety), while his tools work well for mappings.
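A minimal sketch of the example: any witnessing ${x}$ must equal both ${1/y}$ and ${1/z}$, so the projection is exactly the condition ${y = z \neq 0}$. No polynomial system carves out precisely that set, since a polynomial vanishing on the punctured line ${y = z}$ also vanishes at the origin by continuity.

```python
from fractions import Fraction as F

def in_projection(y, z):
    """Does there exist x with x*y == 1 and x*z == 1?
    Any such x must equal 1/y and 1/z, so the answer is: y == z and y != 0."""
    return y != 0 and y == z

assert in_projection(F(2), F(2))      # witnessed by x = 1/2
assert not in_projection(F(0), F(0))  # the origin is missing from y = z
assert not in_projection(F(1), F(2))  # points off the line y = z
print("projection = the line y = z with the origin removed")
```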

Ironically this puts Deolalikar’s “pp” complexity measure in the opposite situation from Strassen’s ${\mu}$. It does not stay nice under projections—per the counterexamples. If it did stay nice, then steps 1. and 2. at the top of this post would mesh, and presumably he would be able to complete a proof of ${\mathsf{P} \neq \mathsf{NP}}$ by fixing issues with each step separately.

Endgame?

It remains to ask whether Deolalikar’s measure can be modified in a way that sidesteps the problem as Strassen’s ${\mu}$ does. Here we can revisit our intuition about projections from the beginning:

Can a “statistical” property of distributions ever become more complex under projections?

If by “statistical” we mean purely numerical or physical properties encountered in everyday situations, the answer may seem to be “no,” as surmised by one of my colleagues in computer vision. However, two others postulated an example where a high-entropy distribution in a 2D plane masks a low-entropy one in the 3D interior. More crudely, one can picture a complex 2D pattern simply being replicated along the axis of projection.

A “no” answer would be great if Deolalikar’s “pp” measure were solely statistical. However, the measure’s definition involves the existence of an acyclic graph on which the numerical properties are mounted. This graph is very like the circuits ${C}$ we have been talking about. Hence this graph is very much like a computation. This parallel was demonstrated most aptly by commenter “vloodin,” with a proof now linked from the wiki page that the needed graphs can arise directly from consideration of a polynomial-time (oblivious) Turing machine, without needing to go through the FO(LFP) characterization of ${\mathsf{P}}$. The final rub may be that Deolalikar’s measure is more “computational” than “statistical,” so that the example of projections giving SAT or EXT from an easy set ${S}$ becomes the ruling obstacle.

Open Problems

Can we hope to prove lower bounds with a complexity measure that is preserved under projections? Can we hope to prove lower bounds with a complexity measure that is not preserved under projections? What happens with slices?

Can the arithmetical ${\Omega(n\log n)}$ lower bounds be carried over to Boolean circuit complexity? The Baur-Strassen technique fails over finite fields.

Can results over infinite fields lead to results over discrete domains at all? Is this like, or unlike, the history of the Prime Number Theorem?

1. August 19, 2010 2:02 pm

For some footnotes: A flip side of “projections” is “padding”, and this has affected key mathematical concepts of Ketan Mulmuley’s and Milind Sohoni’s /Geometric Complexity Theory/ (GCT) approach from the beginning. For a graphic illustration of the kind of embedding relation between structures associated to the permanent and determinant (respectively) that GCT seeks to show is impossible unless the permanent has high complexity [or rather, unless one uses very high “N” for the determinant], see slide 23 of
this presentation by Scott Aaronson.

I decided not to address the difference between string-valued and scalar-valued variables. With the former, or with appropriate decoding primitives, one can say that every r.e. language is a projection of a poly-time decidable predicate. Take it as implied that when the article moves to circuits as the basic model, it’s talking about a fixed polynomial number of extra scalar variables. This also could have led me to discuss compactness in the continuous-space domain, but this sort of thing can be done in comments…

• August 19, 2010 3:29 pm

Dear Kenneth, it would be very nice to have a full post on Mulmuley’s GCT approach (and its latest status) at sometime, addressing how it may or may not evade known barriers. As you have written a review on the approach some time ago, I guess this would be a perfect match! 🙂

2. August 19, 2010 2:56 pm

Ah, my clause “Since every equation introduces a new variable…” leading off one paragraph should change “introduces” to “is set equal to”, or should append “…as a term to itself.” This can be relaxed, but that was the intent.

3. August 19, 2010 5:18 pm

It might be worth mentioning that every set in NP arises as a projection of a set in P. (Given X in NP, the set Y = {(x, p(x))} consisting of members x of X each paired with a proof of membership of x in X is in P, and X is the projection of Y on its first coordinate.) Hence if projection were to preserve time complexity, even only to within a polynomial, P = NP would be an immediate consequence.

As a case in point, the set of all solvable Sudoku puzzles of a given shape and size having the same blank squares is the projection of the set of Sudoku solutions of that shape and size obtained by projecting out, or erasing, the blank squares in those puzzles. Projection is what makes Sudoku a challenge even though every solution is easily verified.

The computability counterpart of this is even stronger: a set is recursively enumerable if and only if it is the projection of a recursive set. This gives a slick characterization of the r.e. sets, namely as those sets arising as the projection of a recursive set.

Noting that the “if” direction of this connection is missing in the case of the first paragraph, it’s a nice exercise to work out the analogous characterization of NP in terms of P via projection. (Hint: first show that every r.e. set arises as the projection of a set in P.)

August 20, 2010 12:54 am

Is there a nice notion of sparseness of a projection?

August 20, 2010 2:23 pm

And if you wish to check your answers to the exercise, you may look here:

http://mathoverflow.net/questions/34832/correct-to-characterise-np-set-as-p-time-image-of-p-set

=)

• August 20, 2010 9:44 pm

The notion shown in the exercise (second answer) is called “polynomial honesty” in the literature. My “footnote” comment above seeks the same effect by insisting that variables be scalars [over some finite domain] and a fixed polynomial bound on how many coordinates get zapped.

Sparseness of a projection—hmmm, how to define that? I wonder if you mean something even tighter than linear honesty? If you mean that at most polylog(n) extra co-ordinates get removed, then [again, over a finite domain] you could try all combinations and still run in quasi-polynomial time. I would use it to mean o(n) extra coordinates—maybe O(n/log(n))—and then it does verge on things Dick and I have been thinking about…

August 19, 2010 9:30 pm

The idea that your shadow can be more complex than you sounds laughable.

…unless you’re a geometer, I guess.

Make two fences, each out of n closely-spaced parallel sticks, like this: |||||||. Turn one of the fences 90 degrees so its n sticks are horizontal. Now make the planes of these two fences parallel. By any reasonable definition of “complexity”, this geometric object has complexity O(n). But with the light in the right direction, its shadow is an n×n grid, which has (again using any reasonable definition) complexity Ω(n²).

Less intuitively, with three fences made from straight sticks and an area light source, you can actually make curved shadows.

Similarly, arbitrarily smooth curves can have arbitrarily non-smooth shadows.

August 20, 2010 7:22 am

This is a loosely related question, that I would be grateful if anyone could give references to:

Let’s say we are sitting by a lake, surrounded by trees. Imagine the water surface to be a perfect reflection (i.e. no waves), and in the reflection you see the trees – they appear exactly as the trees we can see surrounding the lake, hence it is easy to imagine that the reflected tree corresponds to a real shape.
Now a small wave disturbs the surface and the reflection gets distorted. As the wave is small, we can somehow still imagine that if we didn’t know of the wave, the reflection actually corresponds to a real albeit ‘bent’ tree out there, fully possible to exist.
Finally, the waves get amplified and the reflection fails to be connected and we are not able to understand what kind of shape would correspond to the reflection that we see, if we are unaware of the surface being distorted.
My question is: is this a well defined mathematical problem, finding the maximum amplitude of the wave?
Also, is it possible (using PDEs?) to reconstruct the distorted object; that is actually finding the description of the ‘bent’ tree, given the real tree and the surface in which it is reflected? I assume that the wave has to be small for this to work?

August 19, 2010 10:50 pm

Projections are notoriously tricky buggers in any type of problem. In algebraic geometry, projections correspond to *elimination*, a process so algorithmically painful that early 20th century mathematicians cheered when they found ways to bypass it.

• August 20, 2010 10:11 pm

Replying to JeffE then Random: My first reaction to your “fence” idea is that the same structure is present in both cases. However, in the 3D case it is sparse, and would register lower complexity than the 2D pattern in a measure that divides by volume (respectively, area). The entropy-based examples from my two vision colleagues seem to be analogous.

While writing the post I was trying to think of a smooth 3D curve with a non-smooth 2D projection, something “fractal” I was thinking, but I couldn’t. Still can’t find with a quick Google search after a long day, so I’d love to see an example.

Random, funny to hear you speak thus about elimination—after using the Gröbner-based software package called Singular for over a decade, it’s a natural algorithm for me. But even there, often my trials in an elimination order run for too many hours/days, so I try with a faster degree-based order to see if I can get enough idea of a possible solution to continue.

What I’m mainly after, however, is some theory on the behavior of polynomial ideals under substitutions, even “slices”. Identifying sets of variables or substituting constants is sometimes called a “Valiant projection”, and every formula is “projectable” that way from a read-once formula of the same size. The first partials of a read-once formula (and some more general cases) form a Gröbner basis. I had an attack based on identifying pairs of variables one-at-a-time, but in some experiments the Gröbner complexity mushroomed already after only a few such substitutions into a read-once formula, and there are counterexamples to the big-kahuna goal it had, so I abandoned it.

August 22, 2010 5:09 pm

I had something very specific in mind when I wrote my post, but I couldn’t find the quote again. It was from Andre Weil’s Foundations of algebraic geometry, and it goes: “The device that follows, which, it may be hoped, finally eliminates from algebraic geometry the last traces of elimination theory […]”.

I first encountered that sentence in Abhyankar’s Historical Ramblings in Algebraic Geometry. Needless to say Abhyankar is less than impressed by this attitude, and of course computers have sparked a renewed interest in the subject (with algorithms that have substantially improved on Buchberger’s ideas). Still, there is no doubt that projections are not trivial (and the complexity gets worse over the reals!).

August 23, 2010 12:06 pm

While writing the post I was trying to think of a smooth 3D curve with a non-smooth 2D projection

The image of the moment curve (t, t^2, t^3) is smooth. The image of its projection (t^2, t^3) into the yz-plane is not even differentiable at the origin.

August 20, 2010 12:43 am

This is a really nice analogy, thanks.

Though, I think that while the 3D example with the shadow of a person is great, a 2D version might be even easier to grasp and reason about.

(It might also help with vagueness about “solution spaces” – i.e. is the solution space a single column for a given x? the set of all columns? the projection over rows? what is on your axes – formulas/assignments, or instances/non-deterministic bits of a general NP problem? etc.)

August 20, 2010 4:25 am

In data mining and statistics, projections appear prominently. For example, the famous EM algorithm is a sequence of projections on the manifold defined by the parameters of the probability distribution. In fact, to estimate the parameters of a pdf f(x), it is often written as a sum (over z) of another distribution g(x,z) which has a “nice” form.

August 20, 2010 6:47 am

Just found that Deolalikar’s paper is now available as a Technical Report on the website of HP Laboratories.
http://www.hpl.hp.com/techreports/2010/HPL-2010-95.html

Does anyone know if this is the latest version that was sent to journals and supposedly fixes the known bugs?

9. August 20, 2010 1:11 pm

I want to expand on Sanjay Chawla’s post by observing that projections (and their natural generalization pullbacks) are ubiquitous nowadays in *every* STEM discipline … and I will further cite some references to the effect that the topic of projection-and-pullback is taught very poorly at the undergraduate level.

Hopefully students (especially) may have fun looking into some of these references …

A fun start is William L. Burke’s samizdat classic of applied mathematics, Div Grad and Curl Are Dead. Regrettably, Prof. Burke died in an accident before his book was published … fortunately it is available on-line (in a draft version).

In Burke’s “Chapter 0: Prolegomena” (page 7 of PDF) we read the following hilariously transgressive polemic:

——————————
I am going to include some basic facts on linear algebra, multilinear algebra, affine algebra, and multi-affine algebra. Actually I would rather call these linear geometry, etc., but I follow the historical use here.

You may have taken a course on linear algebra. This to repair the omissions of such a course, which now is typically only a course on matrix manipulation.

The necessity for this has only slowly dawned on me, as the result of email with local mathematicians along the lines of:

Mathematician: When do you guys (scientists and engineers) treat dual spaces in linear algebra?

Scientist: We don’t.

Mathematician: What! How can that be?
——————————

Some other texts that take an enjoyably “Burkean” attitude toward projection and pullback are Feynman’s 1961 Quantum Electrodynamics, Arnold’s 1978 Mathematical Methods of Classical Mechanics, Troy Schilling’s 1996 thesis Geometry of Quantum Mechanics … and worthy of special mention (IMHO) is John Lee’s 2003 Introduction to Smooth Manifolds; this is IMHO a wonderful text (it is known to grad students as “smooth introduction to manifolds”).

These works all teach a mathematically natural view of dynamics in general—and projection-and-pullback in particular—that is very much in the spirit of Dick’s previous theme “Mathematics That Teaches Us Why” … this “teach us why” spirit represents (one can hope) an exceptionally fruitful organizing principle for 21st century STEM education.

The preceding is why (for us engineers) a particularly novel and fruitful aspect of Ken and Dick’s discussion of projection, is its emphasis that “Projections Can Be Informatically Tricky.” This informatic theme is not developed in any of the preceding textbooks, or in any other textbook on dynamics that I know. Ken and Dick’s post amounts to a compelling argument that students benefit greatly from appreciating “projective trickiness” in all its aspects … and that Vinay Deolalikar’s manuscript in particular might have benefitted from a greater appreciation of it.

So it seems (to me) there are some terrific “projective” opportunities here both in STEM research, teaching, and practical applications. It’s this three-fold impact that (for me) makes the general topic of “projective trickiness” especially interesting.

August 20, 2010 2:43 pm

This reminds me of when I took Linear Algebra in college, and it was all about linear transformations on vector spaces, null spaces, norms, various decompositions into subspaces, and Jordan and Rational Canonical Forms, and the like.

Now in the middle of the course, I came across an old book from the 1950’s in the library, which was titled “Matrix Algebra.” In it was everything from my Linear Algebra course, but with nary a mention of vector spaces or linear transformations. It was all about MxN matrices of numbers.

So I went back to the professor, who was somewhat of a renowned algebraist, and asked, “Why introduce the unnecessary abstractions of vector spaces and linear transformations, when everything we’re doing can be done using only ordinary arithmetic on numeric matrices?”

He replied, “Because people who think of Linear Algebra in terms of vector spaces and linear transformations do more interesting things than people who think of Linear Algebra in terms of arithmetic on matrices of numbers.”

Just to show things come full circle, a friend of mine who recently completed a BS in Physics took Linear Algebra, and his course was all examples of matrix arithmetic using Matlab, with nary a mention of vector spaces, linear transformations, or anything abstractly mathematical.

• August 20, 2010 3:25 pm

Fnord, all that you say about algebra applies with redoubled force to geometry and to dynamics (and many other mathematical disciplines too).

It is a shocking fact that nowadays students can earn an undergraduate degree in almost any STEM discipline without ever having been exposed—in even one lecture or textbook—to even such a fundamental notion as mathematical “naturality”.

Although these graduating students will have learned many mathematical facts, only a few of them will have been so fortunate as to discover spontaneously that (in Bill Thurston’s words) “Mathematics is an art of human understanding [that] we feel with our whole brain.”

Thurston’s mathematical-yet-human understanding is increasingly central to all forms of creative and synthetic activity within the STEM community … it is necessary therefore that we find better ways to communicate it … otherwise we will neither grasp the wonderful opportunities of the 21st century, nor meet its urgent challenges … and this accounts for a large measure of my interest in this and other STEM blogs.

For which, my thanks and appreciation are extended to Ken and Dick, for extending (yet again) at least one reader’s notions of mathematical-yet-human naturality.

• August 20, 2010 10:52 pm

Thanks also for the appreciation—and e-mail too. Your use of “natural” may be aligning with the intent of Razborov and Rudich in using that word, in hinting that the kind of “nice” math we cognitively “grok” might be powerless on circuit lower bounds.

This also gives me a pretext to mention something I cut from the article. Einstein’s quote “Raffiniert ist der Herrgott, aber boshaft ist er nicht” has been given quasi-religious meaning (“Subtle is the Lord…”), but I think it was only speaking to a difference between physics and pure-math research. He clarified it to Oswald Veblen as: “Die Natur verbirgt ihr Geheimnis durch die Erhabenheit ihres Wesens, aber nicht durch List”, which I render non-standardly as “Nature hides her secrets as the perquisite of lofty birth, but not through mean trickery.” That is to say, in physics—and especially in a geometric theory like relativity—your intuition will not be wantonly deceived. If it is looking high enough, it will succeed, even if it must get by singularities. In math, however, it seems that even on fundamental grounds alone, any nastiness that can occur, must occur—somewhere saliently. Certainly if one works on the Riemann Hypothesis, where so many things can be true of the first quadrillion or so integers and then turn south, one must be prepared for “All Boshaft, All The Time.” The “barriers” in computational complexity often feel the same…

August 20, 2010 1:43 pm

I thought this was an excellent and provocative post. It took me back to thinking about projective (and injective) modules and the like, and started raising questions about the algebra of projections, and the implicit question “how does one choose a good projection” (Strassen managed).

I happen to be reading HSM Coxeter on Projective Geometry just now – and also the recent biography of Paul Dirac (The Strangest Man) – which suggests that his early ideas in Quantum Theory were in part built on his early training in, and ability to visualise and intuit within, projective geometry.

I am left pondering just how much mathematics can be conceived from a projective point of view.

Vaughan’s comment above shows how some particularly natural projections illuminate particular problems. I suppose that one question, from the projective side of things, would be whether there were other less obvious projections (orthogonal in some sense?) which would reveal more tractable elements of the structure.

August 21, 2010 7:57 am

http://en.wikipedia.org/wiki/Stephen_Cook#Biography

But he had to sacrifice his own reputation for it.

12. August 21, 2010 8:41 am

This question came up in the earlier discussion of the “proof”. I did not find the responses satisfactory. Let me state it more clearly.

Let A consist of the following data: a Turing machine algorithm accepting bit strings, and a polynomial time bound (absolute – i.e., not asymptotic). We define n(A) to be the least length of a bit string x such that when the Turing machine algorithm runs at input x for the number of steps given by the time bound declared in A (as a function of the length of x), the resulting computation witnesses that A is not an algorithm for SAT. I.e., either A encodes a tautology and the algorithm does not return “yes” within the time bound declared in A, or A does not encode a tautology and the algorithm does not return “no” within the time bound declared in A.

If P != NP is provable in a weak formal system, then we do get an upper bound on n(A) as a function of the size of A. Roughly speaking, the weaker the formal system, the better this upper bound is. This is at least implicit in the literature.

However, if we merely know that P != NP, then can we get any upper bound on n(A) at all? I was thinking “no”, but perhaps I am missing something important here.
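In symbols, the definition above can be restated as follows (my paraphrase, writing A = (M, p) for the machine together with its declared polynomial time bound — not Friedman’s own notation):

```latex
% n(A): least input length at which the time-bounded run of A
% visibly fails to decide SAT correctly.
n(A) \;=\; \min \bigl\{\, |x| \;:\; \text{the run of } M \text{ on } x
\text{ for } p(|x|) \text{ steps witnesses that } A
\text{ does not decide SAT at } x \,\bigr\}
```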

• August 21, 2010 12:13 pm

(I assume you mean “x encodes a tautology” instead of “A encodes a tautology”.)

My guess is that there is no “compactness” property we know of that prevents NP from being “arbitrarily close” to P, e.g. there might exist an algorithm that solves (say) 3-SAT in time $n^{\log \log \log \log n}$, and this is essentially best possible. (Of course, one could consider far slower growth functions here than $\log \log \log \log n$, e.g. inverse Ackermann.) With this intuition, I would say that n(A) could (in principle) grow arbitrarily fast in A.

One could possibly bound n(A) by some sort of Busy Beaver function, but that would probably require P!=NP to not just be true, but to be provable in a specific formal system. But this is something you are more qualified to discuss than I. 🙂

• August 21, 2010 3:13 pm

Terry wrote

“(I assume you mean ‘x encodes a tautology’ instead of ‘A encodes a tautology’.)”

Yes, thanks for pointing out my typo!

If P != NP is true then n(A) is a recursive function of A, by search, and so is automatically eventually bounded by any of the usual Busy Beaver functions (see http://en.wikipedia.org/wiki/Busy_beaver).

It is interesting to look into the relationship between running times for solving 3-SAT and the function n(A) in my previous message.

I don’t have the patience right now for this sort of tedious stuff, but the following should be reasonably close to the truth:

THEOREM. 3-SAT can be solved in running time at most n^(c^(f^-1(n))). If 3-SAT can be solved in running time at most n^g(n) then g(n) >= log_c(f^-1(n)). Here f is the function n(A) in my previous message.

Proof: Let x be a problem instance of length n. Let m be greatest such that f(m) <= n. Then there exists an algorithm y of length at most m that solves 3-SAT for all problem instances of length at most n running in time at most (2^m)(n^(2^m)). But at this point we don't know what y to choose. So we go through the y of length at most m, running them, and see if a falsifying assignment is generated in the obvious way. If so, then return yes for x. If not, then return no for x.

Suppose 3-SAT can be solved in running time at most n^g(n). Etcetera… QED
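The enumeration step in the proof sketch — run every candidate algorithm of description length at most m and trust only outputs that verify — can be sketched in Python. This is a toy illustration of the universal-search idea with names of my own choosing, not Friedman’s exact construction; a real version would also enforce an explicit step budget on each candidate.

```python
from itertools import product

def verify(clauses, assignment):
    """Polynomial-time check that every clause has a true literal.
    Literal k > 0 means variable k is true; k < 0 means it is false."""
    return all(any(assignment[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses)

def brute_force_3sat(clauses, n_vars):
    """One candidate 'algorithm': exhaustive search over assignments."""
    for bits in product([0, 1], repeat=n_vars):
        if verify(clauses, list(bits)):
            return list(bits)
    return None

def universal_search(clauses, n_vars, candidates):
    """Run each candidate solver in turn and accept the first output that
    verifies; untrusted candidates cannot fool us, because we re-check."""
    for solve in candidates:
        try:
            assignment = solve(clauses, n_vars)
        except Exception:
            continue  # a broken candidate is simply skipped
        if assignment is not None and verify(clauses, assignment):
            return assignment
    return None  # no candidate produced a verified solution
```

Note this toy only handles the satisfiable side; the construction in the theorem must also bound the time needed to safely answer “no”.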

• August 21, 2010 7:16 pm

To contrast P != NP with the Riemann hypothesis, in the case of the latter there is a quantifiable (but ineffective!) gap between the bounds on the error term $\pi(x) - \mathrm{li}(x)$ in the prime number theorem in the case when RH is true, and when RH is false. Namely, the error term is $O( \sqrt{x} \log x )$ when RH is true (and there is even an explicit value for the constant, namely $1/8\pi$ for $x \geq 2657$), but will be larger than $x^{1/2+\varepsilon}/\log x$ infinitely often for some $\varepsilon > 0$ if RH fails (indeed, one takes $1/2+\varepsilon$ to be the real part of a zero to the right of the critical line). This comes easily from the “explicit formula” linking the error term in the prime number theorem to the zeroes of the zeta function (roughly speaking, the error term is something like $- \frac{1}{\log x} \sum_\rho x^\rho / \rho$, where $\rho$ ranges over zeroes of the zeta function). But the bound is ineffective because we have no lower bound on $\varepsilon$.

This gap between bounds when RH is true and RH is false can be used, among other things, to phrase RH in $\Pi_1$ in the arithmetic hierarchy (as Dick pointed out some time ago on this blog).

For P != NP, there is nothing remotely resembling the explicit formula, and so we don’t have this gap in bounds; nor can we place P != NP or its negation in $\Pi_1$.

• August 21, 2010 7:44 pm

Dear Terry,

That’s exactly what I was getting at. That, at the moment, we should classify P != NP as being in Pi-0-2 (Greek capital pi, superscript 0, subscript 2), which in terms of quantifiers is (forall)(thereexists), or AE. Over the years, there seem to have been dozens of somewhat different estimates connected with the Riemann Hypothesis, showing that it is in Pi-0-1, as Terry writes.

This business of classifying open mathematical questions in terms of their logical complexity is something that has not been pursued in a proper systematic fashion. This can also be done reasonably robustly for solved mathematical questions. Here one has to address the objection that solved mathematical questions are either 0 = 0 or 1 = 0, and so in Pi-0-0, and there is nothing to do.

As a tangent to a tangent to the main points in my book at http://www.math.ohio-state.edu/%7Efriedman/manuscripts.html section 3, I endeavored to take all of the Hilbert problems and classify them in these terms.

See the Introduction only, and there only section 0.3. E.g., Twin Prime is Pi-0-2, Goldbach is Pi-0-1, Collatz is Pi-0-2, and so forth.

I think I did at best only a passable job, and, given my limited research efforts in core mathematics, I could use plenty of help from mathematicians in various areas to make this tangent to a tangent much better.

In fact, when I wrote the Introduction, I had the ambition of doing the same for Smale’s Problem List, and for the Clay Problem List! To systematically treat a classification of this kind for these three problem sets, would itself take a large book.

13. August 21, 2010 2:35 pm

Indeed projections are extremely important and mysterious. Regarding nice distributions it is not clear to me yet

1) which sets support projections of nice distributions (or, in other words, which sets are ppp). So it is not clear whether the main missing piece in D’s argument is equivalent or similar to NP != P, or amounts to something that is just plain false.

2) Also, it was claimed that the ppp distributions were already considered before (under a different name). Is there a definite word on that?

3) Regarding the claim that k-SAT does not support a pp distribution: is this a proven fact, in D’s paper or elsewhere?

(As Paul Beame mentioned, the old claims that NP=P were also based on projection arguments, and this led to interesting theorems by Yannakakis on why this approach must fail, at least under some symmetry conditions; there are still related interesting questions regarding projections of polytopes.)

August 21, 2010 3:22 pm

1) Sets that support ppp are precisely sets whose membership can be recognized with non-deterministic c^polylog size circuits (this is slightly larger than polynomial, since c^polylog contains some faster-growing functions than polynomials); see this comment by Jun Tarui: https://rjlipton.wordpress.com/2010/08/15/the-p%E2%89%A0np-proof-is-one-week-old/#comment-5863 (there he takes polynomial size instead of c^polylog, but he is essentially right). As a special case, solution spaces of k-SAT formulas support ppp.

3) There was no proof of this in the paper, and I have not seen it elsewhere. Jun Tarui comments about this here: https://rjlipton.wordpress.com/2010/08/15/the-p%E2%89%A0np-proof-is-one-week-old/#comment-5838

Some other solution spaces, like those for XORSAT or EVEN, do not support a pp distribution, which was also proved by Jun Tarui in the discussion/on the wiki.

• August 21, 2010 7:21 pm

k-SAT (or more generally, any non-empty solution space of an NP problem) supports ppp distributions (by starting with the uniform distribution on $\{0,1\}^n$, and then returning some default solution if one falls outside the solution space). ppp supports seem to be close to something like a “polynomially computable image of a cube”, but I do not know the precise characterisation.

In order to support a pp distribution, the behaviour of the solution space in one of the variables (a terminal one in the pp factorisation) must depend on at most a polylog number of the other variables. Thus for instance the set EVEN of n-tuples (x_1,…,x_n) that add up to 0 mod 2 does not support pp, and more generally neither does a typical solution to a XORSAT problem. From the known clustering properties of SAT I can well believe that the same is also true for SAT in the hard phase, and it is likely that Deolalikar’s paper contained a proof of this fact.
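The construction in the first paragraph above can be sketched in a few lines of Python (my toy rendering, not notation from the paper): sample uniformly from $\{0,1\}^n$ and remap any miss to a fixed, already-known solution.

```python
import random

def ppp_sample(n, is_solution, default_solution):
    """Push the uniform distribution on {0,1}^n onto the solution space:
    draw uniformly; if the draw misses the space, return the fixed
    default solution instead. The resulting distribution is supported
    exactly on the solution space (assuming the default is a solution)."""
    x = [random.randint(0, 1) for _ in range(n)]
    return x if is_solution(x) else list(default_solution)
```

Note that the construction presumes a solution is known in advance, which is exactly the “knowing a solution a priori” point raised in a later comment.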

• August 23, 2010 7:01 pm

I may not fully understand everything here, so forgive my ignorance but,

1. Polylog-parametrizations are only important because they constrain the size of the tables needed to represent conditional distributions.

If however you are allowed to use simple circuits to return parameters of a conditional distribution, then the set EVEN of n-tuples (x_1,…,x_n) that add up to 0 mod 2 does support a pp distribution: x_n’s conditional distribution is just a simple circuit that computes the sum of the other variables mod 2 and returns either 0 or 1, depending on whether x_n has to be 0 or 1.

Of course this doesn’t appear to help anything because now you have to prove that no such similar circuit can characterize the conditional distributions of k-sat.

2. Knowing a solution a priori when constructing a distribution seems like a bit of a cheat, like using an oracle. If we banned such a construction, would k-SAT still be in ppp?
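The parity construction in point 1 is easy to make concrete (a minimal sketch in Python; treating circuit-valued parameters as admissible is the commenter’s extension, not a notion from the paper):

```python
import random

def sample_even(n):
    """Sample uniformly from EVEN = {x in {0,1}^n : sum(x) = 0 mod 2}:
    draw the first n-1 bits uniformly, then set the last bit with a
    trivial parity 'circuit' so that the total sum is even. The
    conditional distribution of x_n given the rest is deterministic."""
    x = [random.randint(0, 1) for _ in range(n - 1)]
    x.append(sum(x) % 2)
    return x
```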

August 22, 2010 2:35 am

Since it was never clearly said in the paper what polylog parametrizability means, it is not clear what the missing part is. If it means the solution space does not support ppp, then this is false, since k-SAT formulas support ppp. However, if it means the solution space supports a quasi-uniform distribution, then k-SAT formulas likely do not satisfy this, but this is no easier to prove than P != NP. It is easy to see that we can sample quasi-uniform distributions on k-SAT formulas with polynomial algorithms if and only if P=NP, for instance. On the other hand, a proof that k-SAT formulas do not support pp distributions is possibly not hard (similar things have been proved here for EVEN, n-XORSAT, n-SAT), but pp is useless in characterizing P, since very easily computable sets do not support pp (even {0,1}^n \setminus {(0,0…0)} does not support pp).

Hence, D.’s method and notion of parameters, under any interpretation, have no hope of giving anything useful.

14. August 21, 2010 4:41 pm

After watching Inception, I agree projections can indeed be very tricky!

August 22, 2010 5:09 am

Is there some relationship between projective sets and projective determinacy? It seems like there ought to be.

16. August 22, 2010 11:04 am

Gil Kalai says: “Projections are extremely important and mysterious …”

Ken Regan says: “A flip side of projections is padding …”

And so, any engineer might reasonably conclude: “This is great! Surely, MathSciNet will lead me to articles that discuss the interrelation of padding and projection.” But this reasonable conclusion would be wholly wrong … there are (to leading order) no such MathSciNet reviews.

It takes a while for engineers even to guess that perhaps fine-grained padding is all about involutions and foliations (or is it?) and coarse-grained padding is all about complexity/cryptography (or is it?) … and that all of these concepts are broadly subsumed under the ubiquitous mathematical notions of pullback and pushforward (or are they?).

It is evident in any event that the unity of these subjects—projection-versus-padding, pullback-versus-pushforward, efficiency-versus-cryptography, foliation-versus-reduction, etc.—is more universally appreciated among mathematicians as individual practitioners, than it is reflected in the (present) mathematical literature.

That is why weblogs like this one are immensely helpful in establishing—for us engineers especially—that vital appreciation of mathematical naturality and narrative coherence that is necessary to the practical application of mathematical advances to this century’s urgent problems (this is true also of Terry Tao’s well-integrated summary of the 2010 Fields Medals).

In particular, Dick and Ken’s overview of projections, and the accompanying (terrific!) discussions, are substantially (albeit slowly) expanding our QSE Group’s conception of the mathematical roles of projection in STEM enterprises—both how to apply methods of projection and how to teach them—and for this our appreciation and thanks are (as always) extended.