Mathematical tricks

Ben Green grew up about 100 meters from the house in Bristol where Paul Dirac was born and lived until age 10. He has followed Dirac to Cambridge University as a Fellow of Trinity College, which adjoins St. John’s College where Dirac studied and taught. Both also held appointments at Princeton. Green is known for many things including, proving with Terence Tao that for every natural number ${k}$ there exist ${k}$-many primes in arithmetic progression. That is, there is a prime ${p}$ and an integer ${c}$ such that for all ${i < k}$, the number ${p + ic}$ is also prime.

Today I want to talk about mathematical tricks—not magic, not games, but small insights that are very useful. Tricks.

There are two kinds of tricks. One is a device for understanding a theorem, the other is a shortcut to getting it in the first place. The Dirac belt trick is the former kind: it illustrates physical systems that are preserved under two turns of a circle—a 720-degree rotation—but not under one turn. The Dirac delta function is the latter kind: It allows you to postulate a function ${f}$ that is zero except at one point, such that ${\int f = 1}$, and also whose Fourier transform is the constant-1 function. This can speed up calculations, and certain “meta-theorems” often remove the need to do anything more to make a proof involving such calculations rigorous.

Green’s co-author Tao maintains tricks as a category on his blog. Both of them assisted Tim Gowers in setting up the Tricki Wiki for collecting and discussing mathematical tricks, along with Alex Frolkin and Olof Sisask. In a followup, Gowers asked whether it needed more examples of research-level tricks compared to undergraduate-level tricks. Let’s talk a little more about what tricks are.

## Tricks

I really like “tricks.” One recurrent theme in mathematics is that big results are easy to find, usually are well known—especially to experts—and beginners can find them in almost any textbook. This is true whether the area is graph theory, number theory, or complexity theory; real analysis, complex analysis, or functional analysis; algebraic geometry, differential geometry, or topology. How did the last manage to avoid being called “-theory” or “-analysis”?

A trick is different from an ordinary “tool” in that it has some element of surprise. It achieves something by a way that is at first counter-intuitive. That’s what makes it fun. Magic tricks without surprise, without the watcher’s expectation being deceived, are nothing.

The trouble with tricks is that they often have no name, or an un-descriptive name, as with the “W-trick” below, and consequently are hard to find. The tricks often are not written down, or if they are, they are hard to find even if you have a general idea of what to search for. A trick may be used in a long proof, causing a causal reader to miss it. Or even if you see them, you may not internalize them and add them to your own bag. Another way that tricks are learned is through oral conversations, lectures, and technical talks. When I teach a class I try to highlight the tricks—at least I try.

The Tricki Wiki was an attempt to make tricks, mathematical tricks, better documented. I thought it was, and still is, a brilliant idea; yet it seems to not have become as popular as it should have. Maybe in time it will.

For now let me explain one trick that is simple, useful, and powerful: the W-trick. Then Ken will ask whether research-level power can be obtained by extending an old and simple trick of arithmetic.

## The W-Trick

The trouble with the prime numbers is that while they embody considerable randomness, they are biased in many obvious ways. For one they are all odd, except the poor number ${2}$. Even among the odd numbers the bias is progressive: only one prime is divisible by ${5}$, and so on. This messes up lots of simple arguments, and makes some proofs about primes extra complicated. A second trouble is that their density is close-but-not-quite constant: the number of primes below ${n}$ goes as ${n/\log n}$.

Enter the W-trick. Consider the mapping

$\displaystyle x \mapsto Wx+b$

where ${W}$ is the product of all the primes less than some threshold ${w}$ and ${b}$ is relatively prime to ${W}$. Often ${b=1}$ is just fine.

The numbers that primes map to are called “W-tricked primes.” They behave more like “pseudorandom numbers” than primes do. W-tricked primes are well distributed modulo ${q}$ provided ${w}$ is greater than or equal to all the prime factors of ${q}$.

The key to using W-tricked primes in the Green-Tao theorem on arithmetic progressions is that such progressions are preserved by this linear map: if the W-tricked primes have a progression of length ${3}$, for example, then so do the real primes.

Recently I have used this same idea to show that W-tricked numbers are more likely to be relatively prime: have no common factor in common. If ${x}$ and ${y}$ are chosen randomly in a long interval, then it is well known that they will be relatively prime about ${6/{\pi^{2}}}$ of the time, about ${61\%}$. But W-tricked numbers will be relatively prime almost all the time: as ${w}$ goes to infinity the probably of a common factor goes to zero.

## Duality

Duality is an umbrella term for a host of tricks. For example, the proof of the Four-Color Map Theorem begins by forming the dual graph in which every “country” becomes a vertex, and two vertices are connected if their countries share a boundary. Repeating the process would restore the original map, which makes the graph strictly a dual. In linear programming and other optimization techniques the presence of dual programs is often important. The Fourier transform gives a dual form to any function, and the Hadamard transform provides an important dual basis in quantum computation. Change-of-basis is another umbrella term for many tricks in linear algebra—or maybe many of them are just tools.

A kind of duality may operate with the Green-Tao theorem. Instead of the set of primes being “tricked up” to be denser, one can also “trick down” the integers into a suitably pseudorandom subset ${X}$. Whereas the primes have density approaching zero in ${\mathbb{N}}$, their density in ${X}$ may stay above some constant. The trick is set up so that one can carry over Szemeredi’s Theorem that every subset of ${\mathbb{N}}$ of positive density has arbitrarily long arithmetic progressions, from ${\mathbb{N}}$ to ${X}$. This view was developed in a paper by Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan, people we know well in complexity theory. Gowers proved something similar independently.

Often the dual is “the same” as the original but just viewed in a mirror. This may be the case here, since this is actually proved by enlarging ${P \subset X}$ to ${M \subset \mathbb{N}}$ where ${M}$ stays having positive density in ${\mathbb{N}}$. The W-trick is a concrete instance of the latter. Still, these papers show that the dual view of the Green-Tao “transference principle” has its own uses, all derived from tools of pseudorandomness developed in complexity theory. Cool and worth deeper study.

## The Square Trick

We have previously discussed an old trick known to the Babylonians for reducing any integer multiplication to the lookup of three squares, or alternatively two squares:

$\displaystyle a\cdot b = \frac{1}{2} ((a+b)^2 - a^2 - b^2) = \frac{1}{4}((a+b)^2 - (a-b)^2).$

Ken uses this as a first example of a Turing reduction that needs more than one oracle call. In the post I stopped at consulting a table of large enough squares, but Ken wants to go further with simulating the squaring part. The hitch is in the denominators of the fractions—if we could pretend they don’t exist, we’d be home.

The reason is another trick that allows replacing squaring by shifting. Regard the finite field ${\text{GF}(2^n)}$ as a vector space of dimension ${n}$ over ${\mathbb{Z}_2}$. A vector ${v}$ in this space is normal if its iterated squares

$\displaystyle v^2, v^4 = (v^2)^2, v^8 = (v^4)^2, v^{16}, v^{32}, \dots, v^{2^{n-1}}$

are all linearly independent, and hence form a basis of the vector space. Then for any field element ${w}$ represented over this basis, the left-shift (wrapping the first bit around to be the last) represents ${w^2}$. This trick is exploited to implement codes that do frequent squaring.

Alas this only works in characteristic 2, so that the final dividing by 2 or 4 in the formulas for ${a\cdot b}$ is like dividing by zero. Over ${\text{GF}(p^n)}$ the corresponding concept involves ${p}$-th powers of ${v}$, not squares. The vanishing of multiples of the characteristic is needed anyway so that expressions of the form ${(w_1 z_1 + \cdots + w_n z_n)^p}$ leave only the terms in ${w_i^p z_i^p}$, which is what gives the shifting behavior for ${p=2}$ to begin with. Unfortunately, the formula above for ${a\cdot b}$ becomes undefined in characteristic 2. Ken still wonders, though, whether there is some consistent way to postulate it, so that—as with the Dirac delta function—it can be used in computations.

## Open Problems

The general question is how do we document and pass on tricks to others? This is an important issue especially for new researchers to an area of mathematics.

A specific question that I think arises is, can we create a “non-linear” W-trick? The idea would be to build a nonlinear mapping that allowed us to map primes to another set and use the existence of progressions there to conclude something useful about primes themselves? Is there any way to create a W-tricked set of primes that allows progress on open questions in the structure of primes? I am sure that Green and Tao, among many others, have thought about this. But perhaps you might have a new idea here?

1. January 26, 2013 10:54 am

There is a study called generalized primes that investigates in a more general way the relationship between arbitrary things called primes and all the things called composites that can be composed from them according to specified laws of composition. Comparisons can be made between different numerical systems or any kind of combinatorial species that one might imagine. I seem to recall at least one old monograph by Rademacher on the subject.

My own fascination with questions like that led me many years ago to the Riff & Rote trick, a special case of the Make A Picture (MAP) trick, and there is a bit on that here.

January 26, 2013 6:04 pm

“The numbers that primes map to are called ‘W-tricked primes.’ ”

Or are they the numbers mapped to the primes?

3. January 27, 2013 2:03 am

“The trouble with tricks is that they often have no name, or an un-descriptive name… and consequently are hard to find.”

I probably emphasize learning the names of things more than any other instructor I know of, for the purposes of (a) enabling conversation, (b) engaging the student’s language-center, and (c) writing justifications. Sometimes in desperation I use very obscure names or make up my own. Interesting that an added point is to make topics searchable in the current context.

January 27, 2013 11:48 am

Some mathematicians are good at finding tricks, others at using them, and yet others at converting them into theorems.

January 27, 2013 1:55 pm

Geometric (thermo)dynamics — both classical and quantum — is replete with natural dualities, of which a partial (and not-too-formal) list is:

$\begin{array}{rcl} \hline \rule[2.5ex]{0pt}{0pt}\text{Riemannian sharp''}\ \sharp&\leftrightarrows&\text{Riemannian flat''}\ \flat\\ \text{symplectic sharp''}\ \sharp&\leftrightarrows&\text{symplectic flat''}\ \flat\\ \rule[-1.5ex]{0pt}{0pt}\text{complex structure (tangents)}&\leftrightarrows&\text{complex structure (forms)}\\ \hline \rule[2.5ex]{0pt}{0pt}\text{Hamiltonian flows}&\leftrightarrows&\text{Carmichael unravellings}\\ \text{local informatic causality}&\leftrightarrows&\text{nonlocal quantum entanglement}\\ \rule[-1.5ex]{0pt}{0pt}\text{thermodynamic potentials}&\leftrightarrows&\text{thermodynamic densities}\\ \hline \rule[2.5ex]{0pt}{0pt}n\text{-forms}&\leftrightarrows&\text{Hodge-dual}\ (n\text{-}k)\text{-forms}\\ \rule[-1.5ex]{0pt}{0pt}\text{exterior derivatives}\ d&\leftrightarrows&\text{exterior coderivatives}\ d^\ast\\ \hline \end{array}$

These interlocking dualities serve to “geometrize” various wonderful conjectures in computational complexity theory (for example, that matrix multiplication and matrix inversion have the same numerical complexity). Moreover, these dualities provide a unifying mathematical naturality to texts that otherwise seem disparate:

• Jack Lee’s Introduction to Smooth Manifolds
• Vladimir Arnold’s Mathematical Methods of Classical Mechanics
• Michael Nielsen and Isaac Chuang’s Quantum Computation and Quantum Information
• Howard Carmichael’s Statistical Methods in Quantum Optics (I and II)
• BSL’s Transport Phenomena

Thus what initially seems like a mathematical “trick” ofttimes can be appreciated more broadly.

Summary  the progression $\text{tricks''}\Rightarrow\text{dualities}\Rightarrow\text{naturality}\Rightarrow\text{unity}$ is fun to unravel :)

January 27, 2013 4:42 pm

Golly, here are two further crucial dualities:

Ito increments (informatic naturality) $\leftrightarrows$ Stratonovich increments (geometric naturality)

• the pushforward of trajectories (and tangents) $\leftrightarrows$ the the pullback of functions (and $n$-forms)

So ubiquitous are these dualities that they have given rise not just to new mathematical notations, but even new art (per Hoffman’s Commutative Diagrams in the Fine Arts, for example).

The standard texts of the 20th century seldom employ diagrammatic notation (e.g. the above references by Arnold, Nielsen & Chuang, Carmichael, BSL) … this constitutes a substantial — and needless — barrier to cultivating a unified appreciation of these texts … because it makes these texts look like unstructured compendia of tricks … when in fact they are no such thing!

January 27, 2013 7:17 pm

Two more dualizing transforms are time reversal and Legendre transforms (formally both are involutions), which are associated to the celebrated computational/physical “tricks” of Onsager reciprocity and the duality of (entropy)$\,\leftrightarrows\,$(free energy).

6. January 28, 2013 1:40 am

You probably forgot that topology used to be called analysis situs.

January 28, 2013 10:57 am

As Poincaré was aware of, analysis situs – which is how he called topology – is pervasive in math. I think that’s why it ended up with a single-word name. Most other subjects quoted by Dick actually make use of topology.

7. January 28, 2013 11:40 am

One of my favorite tricks — it seems almost too tricky to be true — is the Propositions As Types Analogy. And I seem to see hints that the 2-part analogy can be extended to a 3-part analogy.

• proof hint : proof : proposition :: untyped term : typed term : type

January 29, 2013 7:49 pm

For me, the proof-program equivalence is much more than a trick – it’s as fundamental as the wave-particle duality in quantum mechanics. It’s no coincidence that in both domains, uncertainty holds. In physics we have Heisenberg’s relations and in complexity theory, the impossible separation of some well-known complexity classes.

• January 30, 2013 1:40 pm

Perhaps, but other times I think that mode-node duality, and all its ilk, are just a manifestation — the word implies being struck with a fist, or maybe just a facepalm, but never mind that now — of the fact that our currently received concepts are just too clumsy to conceive of reality as it really is.

January 30, 2013 4:08 pm

You’re probably right. In any case, making sense of even the above-mentioned analogy requires a kind of math that doesn’t exist as of yet.

January 31, 2013 12:50 am

It’s been a long time since I could consider myself (almost) a mathematician. Am wondering if since W-tricked numbers are relatively prime almost all the time, there might be some way to use them to improve upon the Miller-Rabin primality test?

9. January 31, 2013 4:46 pm

Dualities like these point to a higher unity — a calculus of forms whose expressions can be read in two different ways by switching the meanings assigned to a pair of primitive terms.

• February 6, 2013 11:12 am

Update

C.S. Peirce explored a variety of De Morgan type dualities in logic that he treated on analogy with the dualities in projective geometry. This gave rise to abstract formal systems where the initial constants — and later on their geometric or graph-theoretic representations — had no fixed meaning but could be given dual interpretations in logic.

It was in this context that his systems of logical graphs developed, issuing in dual interpretations of the same formal axioms that Peirce referred to as “entitative graphs” and “existential graphs”. It was only the existential interpretation that he developed very far, since the extension from propositional to relational calculus seemed easier to visualize there, but whether there is some truly logical reason for the symmetry to break at that point is not yet known to me.

When I have explored how Peirce’s way of doing things might be extended to “differential logic” I have run into many themes that are analogous to differential geometry over GF(2). Naturally, there are many surprises.