How small results can be extremely important

 By permission of Paul Helm, artist—source

Johannes Brahms is one of the “Three B’s” of music, along with Johann Sebastian Bach and Ludwig van Beethoven. Yet among pieces of music one would call “Brahms’ X,” he seems to be widely known only for Brahms’ Lullaby. His great German Requiem is close, but Wikipedia won’t even forward …/wiki/Brahms’s_Requiem or …/wiki/Brahms’_Requiem to it unless you remove the apostrophe. Whereas Beethoven’s Fifth and Ninth, and Bach’s chorales, fugues, and cantatas come trippingly off the cloud.

Today Ken and I wish to point out a curious phenomenon: often the simplest results are the best known. They may also be the most important after all.

‘Who is best known for what’ belongs to a genre of culture trivia questions that have been debated in taverns for centuries. Yet only in the past two decades have we had a neutral way to resolve them: the Internet. The distribution of hits on various searches gives unscientific but meaningful survey feedback. Thus we have tried searching “N’s Theorem” putting various names in place of ‘N’—and you can do likewise.

Often what you get for famous “giants” of the field are simple results. We are not saying that the giants are only known for these results—after all, Brahms is known for much more than the lullaby. Yet how many parents have used this great lullaby—this is one version:

Lullaby and goodnight
With roses bedight
Is baby’s wee bed

Lay thee down now and rest
May thy slumber be blessed
Lay thee down now and rest
May thy slumber be blessed

—to try to get their little ones to sleep? Countless. It is also musically formative for them.

I think there are a number of reasons for this phenomenon, but first let’s give a few examples. We will begin with two by Niels Henrik Abel.

## Abel’s X

Abel was a Norwegian mathematician who made many seminal contributions to mathematics during his tragically short life. He did brilliant research into the theory of functions, particularly elliptic functions, and created a new class of functions now called abelian functions. And yes, he is the “abel” in the term “abelian group,” and the “Abel” in the Abel Prize.

As we discussed recently here, Abel is indeed famous for giving the first rigorous proof of the impossibility of solving the quintic equation by radicals. Earlier a serious—and almost correct—“proof” had been given by Paolo Ruffini, but it was ignored for various reasons, perhaps owing to his manuscript being 500 pages. See this for some of the history. Yet what we find for Abel’s X by itself are two results on infinite sums that are bagatelles by today’s standards:

${\bullet }$ Abel’s lemma: One of Abel’s most useful results is a simple summation formula that he needed in another paper. It is the following:

$\displaystyle \sum_{n=1}^{N} a_{n}b_{n} = a_{N}\sum_{n=1}^{N}b_{n} + \sum_{m=1}^{N-1}(a_{m} - a_{m+1})\sum_{n=1}^{m}b_{n}.$

This is often called summation by parts, because it allows a non-trivial transformation of any finite sum. While it is easy to prove, it is used frequently in all parts of mathematics—especially in analytic number theory. The ability to re-write any sum is what gives this result its immense power. I (Dick) was reading a number theory paper just the other day and counted three calls to this trick in four pages.
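To see the lemma in action, here is a quick numerical sanity check in Python (our own illustrative sketch, not from any particular paper); `abel_rhs` is just our name for the right-hand side of the formula:

```python
# Numerical check of Abel's summation-by-parts formula:
#   sum_{n=1}^N a_n b_n = a_N * B_N + sum_{m=1}^{N-1} (a_m - a_{m+1}) * B_m,
# where B_m = b_1 + ... + b_m.
import random

def abel_rhs(a, b):
    N = len(a)
    B = [0.0]
    for x in b:
        B.append(B[-1] + x)          # B[m] = b_1 + ... + b_m
    tail = a[-1] * B[N]              # a_N * B_N
    correction = sum((a[m - 1] - a[m]) * B[m] for m in range(1, N))
    return tail + correction

a = [random.random() for _ in range(10)]
b = [random.random() for _ in range(10)]
lhs = sum(x * y for x, y in zip(a, b))
print(abs(lhs - abel_rhs(a, b)) < 1e-12)  # True
```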

Note it is really the finite version of integration by parts:

$\displaystyle \int f \ dg = fg - \int g \ df.$

This is a perfect example of how sum formulas can be much more involved than integration formulas. Too bad in complexity theory we mostly deal with finite sums. Oh well.

${\bullet }$ Abel’s theorem: This is the other paper, and says that if ${\sum_{k=1}^{\infty} a_k}$ converges, then ${\sum_{k=1}^{\infty} a_k z^k}$ converges to the same value as ${z}$ approaches ${1}$ from below. Isn’t this obvious? Well it’s non-trivial: convergence of the original series does not automatically give continuity at the boundary of the disc of convergence, and the power series may well diverge for ${z > 1}$. Today we might not even bother pointing that out in a lecture. Two hundred years ago, just at the time of Cauchy’s convergence criterion, it was important to be rigorous with series, while now we use results like this tacitly all the time.
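As a concrete illustration (our own sketch), take the alternating harmonic series, which sums to ${\ln 2}$; the corresponding power series is ${\ln(1+z)}$, and Abel’s theorem says its values approach ${\ln 2}$ as ${z}$ approaches ${1}$ from below:

```python
# Abel's theorem illustrated with the alternating harmonic series:
# sum_k (-1)^(k+1)/k = ln 2, and f(z) = sum_k (-1)^(k+1) z^k / k -> ln 2
# as z -> 1 from below.
import math

def f(z, terms=200000):
    # Truncated power series; the tail is negligible for z < 1.
    return sum((-1) ** (k + 1) * z ** k / k for k in range(1, terms + 1))

for z in (0.9, 0.99, 0.999):
    print(z, f(z))
print("ln 2 =", math.log(2))
```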

## Lagrange’s Theorem

Joseph-Louis Lagrange made key contributions to all fields of analysis, number theory, and classical and celestial mechanics. One of my favorites is the four square theorem proved in 1770: that every positive integer is the sum of four squares. One interesting comment related by our friends at Wikipedia is:

Due to thorough preparation, he was usually able to write out his papers complete without a single crossing-out or correction.

Ken and I have no idea what we would do without word processing, spell checkers, and the ability to make changes easily. Ken didn’t begin writing his dissertation at Oxford until he had introduced such a system to the University.
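Back to the four square theorem: it can be checked by brute force for small numbers. Here is a naive Python search (our own illustrative sketch, not an efficient algorithm):

```python
# Brute-force witness for Lagrange's four-square theorem:
# every non-negative integer n is a sum of four squares.
from itertools import product
from math import isqrt

def four_squares(n):
    """Return (a, b, c, d) with a^2 + b^2 + c^2 + d^2 == n, by exhaustive search."""
    r = isqrt(n)
    for a, b, c in product(range(r + 1), repeat=3):
        rest = n - a * a - b * b - c * c
        if rest < 0:
            continue
        d = isqrt(rest)
        if d * d == rest:
            return (a, b, c, d)
    return None  # never reached, by Lagrange's theorem

print(four_squares(310))  # prints one decomposition of 310
```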

The “small” result of Lagrange is the classic theorem of his: Suppose that ${H}$ is a subgroup of the finite group ${G}$. Then the size, in number of elements, of ${H}$ divides the size of ${G}$. This theorem is an invaluable tool in all of finite group theory. Indeed this restriction on the size of subgroups is what makes group theory so different from other mathematical areas: in graph theory, for example, there is no such restriction on the size of a subgraph.
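A tiny illustration of the theorem in Python (our own sketch): the cyclic subgroups of the symmetric group ${S_3}$ have orders 1, 2, and 3, each dividing ${|S_3| = 6}$:

```python
# Lagrange's theorem on the cyclic subgroups <g> of S_3.
from itertools import permutations

def compose(p, q):
    """Composition of permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

S3 = list(permutations(range(3)))
identity = (0, 1, 2)

orders = set()
for g in S3:
    # Build the cyclic subgroup generated by g.
    subgroup = {identity}
    x = g
    while x != identity:
        subgroup.add(x)
        x = compose(x, g)
    assert len(S3) % len(subgroup) == 0  # Lagrange: |<g>| divides |S_3|
    orders.add(len(subgroup))

print(sorted(orders))  # [1, 2, 3]; each divides 6
```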

## Cauchy’s X

Augustin-Louis Cauchy did so much great and seminal research that we will not even try to list some of the areas. We have already mentioned his convergence criterion for a sequence ${[a_m]}$:

$\displaystyle (\forall \epsilon > 0)(\exists N)(\forall m,n \geq N)|a_m - a_n| < \epsilon.$

But when we search on Cauchy’s Theorem, we mainly get Lagrange in a mirror, or the Schwarz is with us.

${\bullet }$ Cauchy’s theorem: If ${p}$ is a prime number that divides the order of a finite group ${G}$, then ${G}$ has an element of order ${p}$. Caution: this is not true in general if ${p}$ is not a prime—the converse of Lagrange is false, in general.
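A minimal illustration (our own sketch) using the Klein four-group ${\mathbb{Z}_2 \times \mathbb{Z}_2}$: the prime 2 divides its order 4 and elements of order 2 exist, just as Cauchy promises, but although 4 also divides the order, no element has order 4:

```python
# Cauchy's theorem, and its failure for composite divisors, on the
# Klein four-group V = Z_2 x Z_2 under componentwise addition mod 2.
V = [(a, b) for a in (0, 1) for b in (0, 1)]

def order(g):
    """Smallest n >= 1 with n*g equal to the identity (0, 0)."""
    x, n = g, 1
    while x != (0, 0):
        x = ((x[0] + g[0]) % 2, (x[1] + g[1]) % 2)
        n += 1
    return n

orders = sorted(order(g) for g in V)
print(orders)  # [1, 2, 2, 2]
# p = 2 divides |V| = 4 and V has elements of order 2 (Cauchy),
# but 4 divides |V| and no element has order 4.
```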

${\bullet }$ Cauchy Inequality: For all complex valued vectors ${x,y}$,

$\displaystyle \left| \sum_{i=1}^{n} x_{i}\bar{y}_{i} \right|^{2} \le \sum_{j=1}^{n} |x_{j}|^{2} \sum_{k=1}^{n} |y_{k}|^{2}.$

Where would a fraction of our results in theory be without this irreplaceable tool? Of course the history of this fundamental result is more involved as you probably know since we often call it the “Cauchy-Schwarz” inequality. More properly it is sometimes called the Cauchy-Bunyakovsky-Schwarz Inequality. The sum version was published in 1821 by Cauchy, the integral version by Viktor Bunyakovsky in 1859, and it was rediscovered by Hermann Schwarz in 1888. Whatever you call it, we all need and depend on it.
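Here is a quick numerical check of the inequality (an illustrative sketch of ours) on random complex vectors:

```python
# Sanity check of the Cauchy-Schwarz inequality for complex vectors:
# |<x, y>|^2 <= ||x||^2 * ||y||^2.
import random

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

n = 8
x = [complex(random.random(), random.random()) for _ in range(n)]
y = [complex(random.random(), random.random()) for _ in range(n)]

lhs = abs(inner(x, y)) ** 2
rhs = sum(abs(a) ** 2 for a in x) * sum(abs(b) ** 2 for b in y)
print(lhs <= rhs)  # True
```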

## Banach and Tarski

Stefan Banach picked up the mantle of work on series and convergence, with his notion of a Banach space. But his most useful result seems again to be almost a definition:

${\bullet }$ Banach’s Theorem: A contraction mapping ${T}$ on a Banach space ${X}$, meaning the distance from ${T(x)}$ to ${T(y)}$ is always bounded by ${(1-\epsilon)}$ times the distance from ${x}$ to ${y}$, has a unique fixed point ${x_0}$ such that ${T(x_0) = x_0}$. Proof: Iterate and use the triangle inequality for uniqueness.

Though some algebra makes the proof stretch over a page, it’s really this simple. Nevertheless, this idea is used umpteen times in analysis, usually for a proof that a solution exists. Among many fixed-point theorems this is about the simplest one to prove. Yet it is one of the most useful.
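The proof is constructive, which is part of its charm: just iterate. A minimal Python sketch of ours, using ${T(x) = \cos x}$, which is a contraction on ${[0,1]}$ since ${|\sin x| \le \sin 1 < 1}$ there:

```python
# Contraction-mapping iteration: apply T until the iterates stop moving.
import math

def fixed_point(T, x0, tol=1e-12, max_iter=10000):
    x = x0
    for _ in range(max_iter):
        nxt = T(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("did not converge")

x_star = fixed_point(math.cos, 0.5)
print(x_star)                                  # ~0.7390851332 (the Dottie number)
print(abs(math.cos(x_star) - x_star) < 1e-10)  # True
```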

Alfred Tarski did an incredible amount of deep work in geometry as well as logic. Yet a development by Emil Post following Alan Turing renders the modern understanding of his “Theorem” completely simple.

${\bullet }$ Tarski’s Theorem: The set ${V}$ of true statements in the natural numbers is not definable in formal arithmetic. Proof: If ${V}$ were definable it would be in some fixed level ${\Sigma_k}$ of Post’s arithmetical hierarchy, but then we’d have ${\Sigma_k^V = \Sigma_k}$ which would collapse the hierarchy, which is false by Turing’s theorem. Well our friends at Wikipedia say there is technically more to it, but the above simple treatment is what gets it included in a fair number of introductory theory courses.

Simple and simple. But when you put both names together, you get the Banach-Tarski theorem, which is so un-simple and incredible that our Wiki friends call it a “Paradox” and neither Ken nor I really understand it.

## Little Lemmas in Complexity

Venturing into our era, it is perhaps not the time to make “known for” judgments. But we would like to point out two simple results that go into the background of our technical thinking, yet are invaluable.

${\bullet }$ Turing Machine Linear Speedup: Juris Hartmanis and Richard Stearns have made many key contributions to complexity theory—we’ve been discussing one recently. One of their simplest is the linear speedup theorem: given any real ${c > 0}$ and any Turing machine solving a problem in time ${T(n)}$, there is another machine that solves the same problem in time at most ${cT(n) + n + 2}$. This is one of those almost trivial remarks that is invaluable. The key “trick” is to increase the alphabet so the machine does more computation per step: the theory version of upgrading from a 16-bit processor chip to a 64-bit processor chip. We believe they showed tremendous insight in actually making this simple idea a concrete result. Cool.

${\bullet }$ Block Respecting Turing Machines: John Hopcroft, Wolfgang Paul, and Les Valiant have done many many brilliant things in their research together and separately. In their famous paper “On Time Versus Space” they needed to show that any Turing Machine that runs in time ${T(n)}$ can be replaced by another Turing Machine that accepts the same language, runs in the same time, and is block respecting. The big result in that paper is the proof that space is more powerful than time—a landmark theorem. But in their proof they needed to make the machines block respecting.

I (Dick) recall that Wolfgang told me he thought that this lemma was more important than the “big” separation theorem. I was a bit skeptical at the time. But over the years I have used this lemma several times. I covered the lemma three years ago here and gave a new application. It is used in my joint paper with Ryan Williams which just appeared at the Computational Complexity Conference in Portugal.

## Why This Phenomenon?

We think the answer is fairly simple, but we may be wrong. What happens in many cases when great researchers are trying to prove a “big” result is that they face some roadblock. Since they may be the first to face this obstacle they prove the required lemma to get around it. The new passageway means more to other researchers than the particular theorem that was found. At least that is how to explain some of the above examples.

Another fact is that short is sweet and simple is sweeter. We remember the four notes of Beethoven’s Fifth, and the “Ode to Joy” of the Ninth, although the depth of these pieces is elsewhere.

The third reason is that although simplicity need not mark a great proof, it goes with a great theorem statement, and may be required for a great idea. Most of these results, and maybe most others you would find, are the first application of a simple but profound new idea.

## Open Problems

What is your favorite simple result by a great researcher that is an invaluable tool? We wish we had a good name for these results. Any suggestions?

July 11, 2012 4:35 pm

As an answer to your open question I suggest various silly portmanteaus between ‘lemma’ and synonyms for ‘important’:
elemental + lemma = elemmental
essential + lemma = lemmential
fundamental + lemma = fundalemmental

Which of these rolls off of the tongue the best?

July 12, 2012 9:16 am

Shouldn’t this be a noun? Ending it with tal makes it sound like an adjective.

July 11, 2012 4:39 pm

To me, these are the best part of mathematics. Why are mathematicians (and top-tier journal reviewers) obsessed with filling pages, blackboards, and (in Hollywood movies at least) windowpanes up with maximal esotericity? Math should be about simplifying things, not making them worse!

July 11, 2012 4:39 pm

Not related to theory, but a friend once conveyed to me the same sentiment by saying “that’s like calling Carole King `the woman who wrote The Locomotion’..”

July 11, 2012 5:03 pm

Your statement of Abel’s lemma is wrong. The first term should be (sum a_n) * (sum b_n), not a_N * (sum b_n).

• July 11, 2012 6:11 pm

Hmmm…I’ve factored it around a different way, based on the PlanetMath link http://planetmath.org/AbelsLemma.html used in the post. The term where both sums are multiplied comes in the second summation. Still, you may be uncovering a source of confusion/conflation in drafts of this post between Dick and me. The Wikipedia link we use also has the \sum f_k*g_k form here with the form being similar to what we have. I did check the cases N=2 and N=3 :-).

July 11, 2012 7:03 pm

Minor presentation issue: there is a little inconsistency in the vector representation in Cauchy’s inequality.

• July 11, 2012 8:00 pm

I don’t see it—? It’s verbatim from the Wikipedia link given; the bar on y_i is complex conjugation.

July 12, 2012 4:36 am

Sorry, my fault… (shouldn’t comment late at night)

July 11, 2012 11:26 pm

“We wish we had a good name for these results. Any suggestions?”

closer to physics 🙂

July 12, 2012 6:08 am

A footnote to Lagrange’s Theorem, which is as you say such a small result for groups; it has been recently shown to hold also for Moufang loops, but the proof is very long and uses the Classification of Finite Simple Groups. Maybe it is significant that such a simple result as Lagrange’s is at the edge of such a precipice…

Jack van Lint used to call the double counting principle “Biggs’ Lemma”, because Norman Biggs stated it explicitly in his textbook on Discrete Mathematics. But I don’t think this is the kind of thing you have in mind.

July 12, 2012 8:50 am

I’ve always liked Shannon’s Sampling Theorem. It’s implicitly used every time a signal is converted from analog to digital and back, and you can prove it by drawing a couple of pictures.

July 12, 2012 11:11 am

Euclid’s algorithm (which I realize he didn’t discover) has turned out to be quite handy.

July 12, 2012 11:50 am

Jensen’s inequality is one of my favorites. – Mikko Koivisto