The role of levers in finding mathematical proofs

Archimedes of Syracuse (c. 287 BC to c. 212 BC) was a Greek mathematician, whose insights led him to many inventions and practical ideas, and to work in physics and astronomy.

Today I wish to talk about the mathematical equivalent of the lever.

Archimedes invented many things, but not the lever, which may have been invented by Archytas of Tarentum. Archimedes is famous for the quote:

Give me a place to stand on, and I will move the Earth.

Of course what he meant was: in principle with a large enough lever and a place to stand the strength needed to move even something as heavy as the Earth would be possible. This is pretty impressive given that the Earth’s mass is ${5.9722 \times 10^{24}}$ kg. His standing place would have needed to be somewhere beyond the Andromeda Galaxy—see here for some simple estimates—but his mathematical proof of possibility needed no galactic figures.

Levers

I was just recently in Ann Arbor at the Coding, Complexity and Sparsity Workshop, which was organized by Anna Gilbert, Martin Strauss, Atri Rudra, Hung Ngo, Ely Porat, and Muthu Muthukrishnan. It was a wonderful experience, and I am planning shortly to discuss some of the great talks that were given there. The workshop website should soon have the talks available so that you can at least see the slides, although there is nothing like being in the room listening to a good talk.

I realized during the workshop that several of the talks—not all—had essentially used a lever to solve their particular problem. In this sense a lever is some trick, insight, or idea that one allows one to make a start on solving a problem. It is not the full solution; it is not an essential idea. There may be ways to solve the problem without the lever, but the lever does allow paraphrasing Archimedes: Give me a place to start, and I will prove the theorem.

One thing that I realized too was that papers and even talks often gloss over any lever. For some reason the writers and speakers do not think it is worth making a big deal about it. One possible reason is that to them, who are likely experts in the area, the lever seems so simple—why state it explicitly? Another reason is that the lever often is a pretty simple idea, so why make a big deal out of it? Often other parts of the proof are much harder and more technical.

An example of a trivial lever is a transformation that is used often in analysis. Suppose one needs to prove something about a function ${f(x)}$. Replace the function by

$\displaystyle g(x) = f(x) -f(0).$

Often this change does not affect the theorem being proved, but now ${g(0) = 0}$, which could dramatically reduce the number of cases that are needed later in the proof. It is a lever in my sense.

Archimedes’ Lever

It seems that Archimedes himself actually used the lever as a proof lever in my sense. Or rather he used the lever to convince himself something was true, and then generated a proof by other means.

Archimedes is known for estimating or computing areas and volumes by the older method of exhaustion, which sandwiches the object being analyzed between simpler shapes, computes their areas or volumes, and uses them for upper and lower bounds or to demonstrate convergence. For example, by wrapping a sphere of radius ${1}$ in progressively tighter polygonal bounds, one can prove the value ${(4/3)\pi}$ for the volume. However, it seems this was not his idea of first resort.

According to scholarship summarized here, Archimedes instead considered a cone of height ${2}$ and base radius ${1}$. He found that for each ${x}$, ${0 \leq x \leq 2}$, he could hang a slice of the cone and a slice of the sphere whose areas added up to ${2\pi x}$ at distance ${1}$ on one side of the lever. These would balance a circle of radius ${\sqrt{2}}$ hung at distance ${x}$ on the other side of the lever. The circles formed a cylinder of height ${2}$ whose center of gravity was at distance ${1}$ on the other side. Since all the mass on the near side was at distance ${1}$, the two masses and hence volumes had to be equal. Since Archimedes knew the cylinder had volume ${4\pi}$ and the cone had volume ${8\pi/3}$, he obtained volume ${4\pi - 8\pi/3 = 4\pi/3}$ for the sphere.

Archimedes was so enchanted by this that he had a cylinder and a sphere placed on his gravestone. We believe he felt that his infinitesimal slices presaged a new way of calculating—which Newton and Leibniz turned into the calculus 1,800 years later. Perhaps he even perceived the singular difference between his foliations of the sphere and cone on one side, and the cylinder on the other side.

Two Other Simple Examples

Maximum-likelihood estimation often involves setting parameters to maximize the probability of a series of independent events. Since the events are independent, this is just the product

$\displaystyle p = \prod_i p_i$

of the events. With many events, ${p}$ may be a very small number, so that numerical accuracy becomes a problem, and differentiating this formula to find a maximum may also consume much effort. However, since the logarithm is a continuous strictly increasing function in the range of these probabilities, it is equivalent to maximize the log of this product. If one wishes to keep all quantities non-negative, one can instead minimize

$\displaystyle \sum_i \log(1/p_i).$

This preserves numerical accuracy and is easier to differentiate.

The paper “How Powerful are Random Strings” by Eric Allender, Luke Friedman, and William Gasarch, which we featured here, also starts with a lever. Instead of using the random-string set directly as an oracle, they create a different oracle out of the overgraph ${\{(x,y): y \geq f(x)\}}$ of a related entropy function ${f}$. They show that the two oracles are equivalent for the complexity reductions used in their main theorem, but find the overgraph-oracle easier to analyze. As usual see their paper for details.

A Nontrivial Example

I would like to give a beautiful example of a lever from one of the talks at the workshop. David Woodruff spoke on “${(1+\epsilon)}$Approximate Sparse Recovery, which is based on joint work with Eric Price, and will appear this fall at FOCS 2011.

The problem is to discover a ${k}$-sparse vector that approximates a given signal. Here ${k}$-sparse means that the vector has at most ${k}$ non-zero coordinates. This is a major problem in the area of approximation and compressed sensing. I will not try to even begin to survey this huge body of work.

David at the beginning of his talk said: let’s consider just the special case of ${k=1}$. This was his lever. He did not make any fuss about the lever, and he proceeded to use this lever in both upper bound and lower bound results. He did point out that the ability to solve the case where ${k=1}$, where there is one large signal, is easily justified. One can take the input vector

$\displaystyle x_{1},x_{2}, \dots, x_{n}$

and use random sampling to divide the signal up into ${k}$ pieces. Likely the pieces will have one big signal—that is, it is likely that ${k=1}$ will hold. Then one could use the ${k=1}$ algorithm on each piece to find all the ${k}$ large signals. There is a bit more needed to make this upper reduction process, but the idea is fairly simple. A great lever.

Once David uses this lever, all his calculations after that are much simpler. There is a fair bit of estimation and employment of inequalities needed to make the ${(1+\epsilon)}$ recovery algorithm work correctly. These calculations are much easier to follow, and probably were easier to discover in the first place, by restricting the situation to ${k=1}$.

This Lever Extends

David, in his talk, also outlined several lower bound theorems. Each was for different versions of the the recovery problem, since there were several parameter regimes and models.

Lower bounds are hard to prove in general—we have so few of them. But in this area of signal recovery the general paradigm is possible since the lower bounds are essentially counting how many measurements are needed to solve a recovery problem. This sounds like information theory or communication complexity theory, and it is. David shows how to use known—often very deep—results from both areas to prove that too few measurements would lead to a violation.

Again the lever to the recuse. The proof for the ${k=1}$ case is often not too difficult—that is not to say easy. Then to prove the general case one must be careful. Lower bounds on computing one object can change when we are trying to compute several objects. But the lever helps point out that this obstacle must be addressed. David uses some communication product theorems—again some are quite powerful and deep—to prove his lower bounds.

Open Problems

What are some of your favorite levers? Is this notion helpful to make explicit?

1. August 5, 2011 11:57 pm

How would a ‘lever’ differ from ‘equivalence’?

August 6, 2011 8:07 am

Bhupinder Singh Anand,

A lever is, in this sense, a device that makes progress on a problem. It need not be an equivalence. This is seen in the lower direction where the lever is used by David but is not exactly the same problem.

August 6, 2011 12:22 am

“Archimedes’ Theorem” is that if you place a sphere inside a cylinder of the same radius and height, and consider the horizontal inward projection of the cylinder onto the sphere, that is area-preserving. In particular the easy area of the cylinder (2 r * 2 pi r) equals the area of the sphere. Or, if you want to draw a map using this, you’ll get the areas right. (But you won’t turn great circles, which people want to navigate along, into straight lines on the map, so it’s not so popular.)

I suspect that’s why the sphere and cylinder are on his gravestone, not for the volumetric calculation.

• August 6, 2011 3:19 am

According to Plutarch, it is what we wrote:

Πολλῶν δὲ καὶ καλῶν εὑρετὴς γεγονὼς λέγεται τῶν φίλων δεηθῆναι καὶ τῶν συγγενῶν ὅπως αὐτοῦ μετὰ τὴν τελευτὴν ἐπιστήσωσι τῷ τάφῳ τὸν περιλαμβάνοντα τὴν σφαῖραν ἐντὸς κύλινδρον, ἐπιγράψαντες τὸν λόγον τῆς ὑπεριχῆς τοῦ περιέχοντος στερεοῦ πρὸς τὸ περιεχόμενον.

Wikipedia says the same here, but of course we consulted a primary source.

• August 11, 2011 10:37 pm

Funny—now (five days later) the Google Translate of the Greek seems to have changed from what I remember, and it’s not so clear. I wish I’d saved it, as I sure thought it showed enough words to prove my point. Anyway, here is the link to the translation I used:

——-
Plutarch (AD 45-120), Parallel Lives: Marcellus

“And although he made many excellent discoveries, he is said to have asked his kinsmen and friends to place over the grave where he should be buried a cylinder enclosing a sphere, with an inscription giving the proportion by which the containing solid exceeds the contained.”

• August 23, 2011 10:16 am

“… inventor of many and good [things] , it is said he pleaded his friends and his relatives that after the ceremony they would mount on the grave that would contain him the sphere inside the cylinder, inscribing the ratio of supremacy of the containing solid over the contained.”

I tried to keep the words as close to the original as possible. Studied ancient Greek in high school, it is mandatory in Greece. These scripture is of the Hellenistic period and quite similar to modern Greek, apart from some weird grammar. Golden-era ancient Greek are a lot harder to translate.

August 6, 2011 7:10 am

Hi Dick

There is a very nice “lever” used by Newton to show why a particle inside a hollow sphere out in space will not feel the gravity of the sphere. This one works with junior high school kids. But for years I’ve been asking physicists for a simple proof of “the biggie”, which is that the attraction of two non enclosed bodies acts as though the force is being exerted from the center of the bodies.

Anyone know of any levers for this one?

Cheers,

Alan

August 6, 2011 7:46 am

@Alan Kay:

August 6, 2011 8:56 am

Most readers are more than 5 times more efficient than this!

And isn’t the purpose of asking a specific question to solicit a specific answer first, before moving on to larger contexts.

Let me try again here. I’m looking for a lever that works for junior high school students and have already polled quite a few physicists for this (including Leon Ledermann, If you know of such an approach, then please just write it down in a few sentences.

Best wishes,

Alan

• August 6, 2011 7:11 pm

SpongeBob’s “Proof Without Words” is a start (some additional hand-waving will be required).

August 6, 2011 12:51 pm

Obviously, the force is not the same. Let A be a sphere centered about some point c, and let x be a point mass outside of A. Suppose A is orbiting around x. Then if we start crushing A into its center c, x will start feeling more gravitational force.

Despite this, the center of mass, c, of A will *move* as if it were a point mass of constant mass, orbiting around x. This is because the acceleration of c doesn’t depend on how sparse A is.

To prove this, I’ll kind of cheat, but not badly. We’ll pretend that A is actually two points, a1 and a2, of equal mass. Then c lies halfway between a1 and a2.

It’s easy to check that the acceleration of c is just the average of the accelerations of a1 and a2. (If a1, a2 were different masses, it would be the weighted average).

OK, I have to go now, but work through the calculations. It should take you 10 mintues. You’ll see the acceleration you get for c, as a center of mass, is equal to the acceleration you would get for c’, a point mass.

August 6, 2011 8:02 am

GAUSS LAW FOR GRAVITY
PLUS intuitive (fluid flow) statement of the divergence theorem.
Apply to a sphere.
Thus, gravity attraction of uniform sphere is same as point mass.

August 6, 2011 8:49 am

For junior high students?

Cheers,

Alan

August 6, 2011 8:04 am

Similarly, for any homogenous body, or apply to a cube, and approximate the body as a limit
of cube unions.

August 6, 2011 8:58 am

Ditto

• August 6, 2011 1:05 pm

+1

6. August 6, 2011 1:13 pm

Of course, what counts as trivial is always dependent on what level of abstraction you’re looking at at any time. Personally, I’m tickled by the switch of the two-population mean alternative hypothesis μ1 > μ2 to μ1-μ2 > 0, which makes the sampling distribution of the point estimate x’1-x’2 a sum of observations, and thus just another normal variable (in the limit).

August 6, 2011 2:13 pm

Probably the biggest (and most widely effective) lever in physics is that the simplest possible answer is the right one. Of course, while this very well born out by experience—even Einstein’s equations are the simplest possible way to couple mass and energy to the curvature of space-time—this is ultimately a religious belief about the nature of reality.

On the other hand, in mathematics the simplest possible answer is _often_ right, and the probability seems to go up when we ask beautiful, natural questions. For instance, many natural combinatorial problems have bijections between them. It’s easy to invent ugly problems that are unrelated to everything else, but the beautiful problems are often related to each other. Indeed, this is a good way to tell whether the problem you’re working on is beautiful or not.

August 6, 2011 7:36 pm

August 6, 2011 7:11 pm

The Sponge Bob example starts with a point source and is one way to guess “inverse square” if there is reason to think that this is the way various kinds of radiation propagate.

But it is a different matter to deal with something that is not a point source and is close. There are reasonable symmetry arguments to make the idea that the composite attraction of the many particles is along the line of the center, but it is not easy to make a simple argument that the composite also acts as though all the mass is at the center.

By the way note that light radiation near a large glowing body does not act as though it comes from a point source (e.g. Sun and Earth). This is one of the problems with Mathchick’s “intuitive” explanation.

• August 8, 2011 8:33 am

Alan, the simplest elementary argument I can think of for point-source equivalence is to combine the “Sponge Bob” geometric argument with a “telescoping series” argument (see Wikipedia article of the same name).

That is, replace a single shell of mass ${m}$ with ${n+1}$ concentric shells having alternating mass ${+m,-m,+m, ... ,-m,+m}$. Obtain the same net force by telescoping the series from the left and from the right, thus proving that the force exerted by a point mass (telescoping from the left) is equal to the force exerted by a shell having the same mass (telescoping from the right).

While non-rigorous, this approach at least unites a fundamental geometric insight (“Sponge Bob”) with a fundamental combinatorical insight (a telescoping series). Needless to say, these are two fabulous levers.

To deal with young formalists who insist on rigor … make sure the school library has a paperback volume of Spivak’s slender masterpiece Calculus on Manifolds. That’s sure to keep them busy for awhile!

August 8, 2011 9:53 am

Hi John

We are talking about 8th graders here, so I think Calculus on Manifolds is for a later time.

The “telescoping theories” argument is an interesting one, but it doesn’t seem to be particularly intuitive (“-m” ?).

I have been waiting to see if anyone would suggest “experimental math” using a computer. For example, a very simple vector sum argument establishes that the line of the force will be through the center of the sphere.

What we want is the distance of the force through this line. But just as you can get a nice estimate for pi by sampling a square with a circle in it — 8th graders find this interesting and fun — you can also get a pretty good estimate of the resultant force by sampling a sphere and just adding in each force vector on the located particle.

This is not a proof but it is “highly suggestive”. And could lead to a more exhaustive non-random sampling of the sphere that suggests it could be made up of differential “particles” (as mathchick suggested).

Cheers,

Alan

• August 8, 2011 1:08 pm

Alan, it’s true that Calculus on Manifolds is pretty tough sledding for kids … and yet, at pretty much any age it can be inspiring to read the Prefaces and Forewords to pretty much any mathematical textbook. This reason alone is sufficient justification to include at few advanced math books in any school library … a good recent example (IMHO) is Bill Thurston’s Foreword to Mircea Pitici’s anthology Best Mathematical Writing 2010.

August 7, 2011 9:12 am

I am not a big fan of “Laws of Form” by G. Spencer-Brown, but “A Note on the Mathematical Approach” that appears at the beginning of the book is one of the more remarkable attempts to leverage a simple idea into a grand scheme of things:

“The theme of this book is that a universe comes into being when a space is severed or taken apart. The skin of a living organism cuts off an outside from an inside. So does the circumference of a circle in a plane. By tracing the way we represent such a severance, we can begin to reconstruct, ….., the basic forms underlying linguistic, mathematical, physical and biological science, …..”

“Laws of Form” was written over 40 years ago when the idea of “multiverses” would have been dismissed as “Gee Whiz Physics”.

If the Theory of Multiverses takes off, the historians might have to promote him from “eccentric character” to “eccentric prophet”.

August 7, 2011 9:33 am

Computer folks will recognize that “Laws of Form” is essentially “Sheffer Stroke Logic” (which was actually identified earlier in the 19th century by Charles Peirce). It lives on today as NAND and NOR logic either of which are sufficient to construct everything else. (To me) the most interesting thing in Brown’s exposition is the notation, which does help paper and pencil evaluation of complex expressions. However, the logical paradoxes from this kind of logic merely “generate time” in the computer world, and are much easier to understand in that form.

August 7, 2011 8:10 pm

http://www.tricki.org

11. August 8, 2011 6:30 am

The history of many branches of mathematics can be read as the sequential discovery of increasingly powerful levers. Consider for example the following nine levers of dynamics:

Lever 1: First law   Conservation of energy constrains individual trajectories (Newton).

Lever 2: Second law   Non-decreasing entropy further constrains ensembles of trajectories (Clausius).

Lever 3: Extremal action   Symmetry is associated to further conservation laws (Noether).

Lever 4: Path integral relations   Causal separability further constrains Hilbert dynamics; coordinate-free notations become essential (Faddeev-Popov).

Lever 5: Geometric dynamics   Extends the above constraints to non-Lagrangian dynamics on non-Hilbert state-spaces (Cartan, Mac Lane, Ashtekar and Schilling).

Lever 6: KAM dynamics   Spectral theorems generalized to integrable systems (Kolmogorov, Arnold, Moser).

Lever 7: Quantum operations   In essence, the integration of measurement theory with path integrals (e.g., Mikhail Mensky’s book).

Lever 8: Dynamic compression   Noisy/observed/evolved systems require exponentially fewer dynamical dimensions (Choi, Kraus, Carmichael & many more; see also Kohn-Sham).

Lever 9: Hyperpolarization engines   The 21st century’s natural generalization of 20th century low-temperature physics and 19th century heat engines.

These levers are knit together by an evolving mathematical toolset that increasingly emphasizes considerations of universality and naturality. Every STEM generation fondly imagines itself near the end of this quest for dynamical universality and naturality … and yet nature keeps unveiling surprising new dynamical “levers” … which is good news for everyone (young researchers particularly).

What dynamical levers are missing from this list? What new dynamical levers may be coming in the 21st century? If you got em, please post em.

August 8, 2011 9:46 am

A reliable way to produce good approaches to research problems (and levers?), in my opinion, is simply to question assumptions. The problem is in knowing what assumptions one is making to begin with. Many examples come to mind of how one can be blind to ones assumptions; one example is the “connect all the dots of a 3 x 3 grid using 4 lines and without lifting your pencil off the page” problem (which I assume most everyone knows). Here, the assumption people often make without realizing it is that the lines should stay inside the grid. Of course they are not required to, and being aware of this is one key of one approach to solving to the problem. (If you want another example, look up http://www.physicsforums.com/showthread.php?t=365381 and the heading “Murray Gell-Mann: Quark and the Jaguar: A Moment of Illumination”.)

Once one becomes aware of the assumptions one is making, and questions them, and finds some of them to be erroneous, one can then use this as a basis for where to go next. To outsiders who have long believed in the assumptions themselves, the fruits of such work can seem like “magic”, though they are anything but.

August 8, 2011 1:21 pm

Hi John

I like your notion that the prefaces and forwards of advanced math books can be inspiring to younger kids — in fact were for me in the 40s and 50s (and very likely turned me towards a degree in pure math).

Still, I was lucky to have a few adults around who could answer questions, and were willing to.

Leon Lederman likes to say “If you read something that you don’t understand, keep on reading and revisiting it and pretty soon it will be *hauntingly familiar*.

Cheers,

Alan

August 9, 2011 9:50 am

I am curious about Alan Kay’s comment that “Laws of Form” is essentially “Sheffer Stroke Logic”.

Where would I find that intersection of ideas?

August 9, 2011 10:41 am

My recollection (from 40 plus years ago) is that he referenced Sheffer early, and Peirce a little later. I just dug the book out from the stacks in my library, and this is the case.

I first encountered this book because it was one of many that were listed in the Whole Earth Catalog. It was slim (which was nice), and obscure (which was annoying — especially when most of the obscurity was in his choice of terms and phrasing). This was odd, because his notation for “a crossing” was a pretty nice invention.

I think that the obscure way it is written has led to its attraction in some quarters — it seems more philosophically interesting than it is.

If one has done some “60s digital design” where thinking directly in NAND or NOR logic becomes second nature, then the deep correspondences become evident. The first published account of this kind of logic was done by Sheffer in 1913, and later it was discovered that Peirce had a section on these universal operators in the early 1880s in his unpublished writings.

Having waded through Russell & Whitehead as a teenager in the 50s, this was another example where the “computer version” of the ideas (and especially the paradoxes) was so much clearer and simpler.

Cheers,

Alan