Skip to content

Bias In The Primes

March 26, 2016

Casinos beware—the primes are not random

Quanta source (K.S. at left)

Robert Lemke Oliver and Kannan Soundararajan have observed that the primes fail some simple tests of randomness in senses that are both concrete and theoretical.

Today we discuss this wonderful work and what it means both for properties of the primes and for asymptotics.

At the heart is the question,

How closely does the sequence of primes emulate a truly random sequence?

Lemke Oliver and Soundararajan use a genre of simple tests of randomness—ones that we might have thought would be true. The simplest one is:

Let {p_{i}} and {p_{i+1}} be consecutive primes in order. If {p_{i}} is congruent to {a} mod {q}, then {p_{i+1}} is less likely to also be congruent to {a} mod {q}.

They note that among the first billion primes the expected frequency is much less than {1/r}, where {r = \phi(q)} is the number of positive integers {b < q} that are relatively prime to {q}. They argue based on a famous {k}-tuple conjecture of Godfrey Hardy and John Littlewood holds that this bias persists through the whole sequence of primes. Yet it follows that the proportion of cases where the next prime has the same congruence mod {q} is {\Theta(1/r)}. This is not a contradiction.

Primes Are Not Random

From first principles the primes are completely not random: they are determined by a tiny rule. In this respect they are like the digits of {\pi}. Unlike {\pi} the primes also flunk an immediate frequency test: only one even number is prime. We can make a fairer comparison by taking the primes mod {q} and expanding {\pi} in base {r = \phi(q)}. Call these infinite base-{r} sequences {P_q} and {\pi_r}, respectively. Jacques Hadamard and Charles-Jean de la Vallée Poussin proved that each “digit” of {P_q} appears with frequency asymptotic to {1/r}, but this has not been proved for any {\pi_r} even with {r = 2}. Yet {\pi_r} and its analogues for {e} and {\sqrt{2}} and other famous irrational constants are commonly expected to be normal in every base {r}. Normal means that every sequence of {k} digits occurs with frequency asymptotic to {1/r^k}; the case {k=1} is called “simply normal.”

Indeed, the normality of {\pi_r} and the other constants follows from some reasonable hypotheses about digital dynamical systems. So what about {P_q}? One difference is that whereas the normality of a number in base {r} is equivalent to its being simply normal in base {r^k} for all {k}, this doesn’t carry over to the primes mod {q} even when you find {q'} such that {\phi(q') = r^k}. Still, it is significant that the primes meet the {k=1} case for all {q}. So what about {k = 2}? This is where Lemke Oliver and Soundararajan weighed in with a universally regarded surprise.

The Bias

They posted their paper two weeks ago, and it has been covered by Scientific American, by Nature, by the New Scientist, and by Quanta Magazine. Terry Tao also posted about it, and he explicates the gist in a wonderful way.

Also wonderful are the diagrams in this 2002 paper by Chung-Ming Ko, in which much of the concrete numerical evidence was observed. A 2011 paper by Avner Ash, Laura Beltis, Robert Gross, and Warren Sinnott, observed the numerics and went further to try to explain them by heuristic formulas. However, one needs Lemke Oliver and Soundararajan (and a nod to Tao’s exposition) to plumb the relations among asymptotics, heuristics, and theory.

The easiest case for us to visualize is {q = 10}, {r = 4}. All primes other than 2—which is a very odd prime—must have a last digit of 1, 3, 7, or 9. The first and last of these are quadratic residues mod 10, and a long-known bias, which was first noted by Pafnuty Chebychev in the case {q = 4}, {r = 2}, explains why they usually occur less often in concrete tables up to some number {x} than 3 and 7 as last digits. For the first {n = 100,\!000,\!000} primes the counts are:

\displaystyle  1: 24,\!999,\!437\qquad 3: 25,\!000,\!135\qquad 7: 25,\!000,\!401\qquad 9: 25,\!000,\!027

Here only primes congruent to 1 mod 10 are lagging overall. There are values of {n} for which 1 has the highest number, but—as has been proved assuming the Generalized Riemann Hypothesis—they are in certain senses sparse, so that this picture is typical. This bias varies as {O(\frac{1}{\sqrt{x}})} (recall that in general {n = \pi(x) \sim \frac{x}{\ln x}} by the Prime Number Theorem), so it is pretty mild. The bias found by Lemke Oliver and Soundararajan for {k=2}, however, blows it away. Let {a'} stand for the cyclic successor of {a} in {1,3,7,9}, so for example {(a,a''')} groups the pairs {(1,9),(3,1),(7,3),(9,7)}:

\displaystyle  (a,a)\!: 18,\!127,\!875\quad (a,a')\!: 29,\!896,\!434\quad (a,a'')\!: 27,\!754,\!430\quad (a,a''')\!: 24,\!221,\!261

Most of the bias is against the next prime having the same last digit as the previous one. It is gigantic. How can this be?

Not the Usual “Laws”

One can try to view the primes as the outcome of a random process. Say we are up in the range from {x/2} up to {x}, maybe up to {2x}. The density of primes there is about {\delta = \frac{1}{\ln x}}, so let’s picture a series of random trials each with success probability {\delta}. If {a} was the congruence mod {q} of our last success, then the point is that {a'}, {a''}, and {a'''} (etc.) get the next cracks at being prime before the turn of {a} comes around again. As some commenters to the above news items have remarked, this could be analogous to Benford’s Law in base {r}, which we once covered in regard to baseball.

Our {(a,a')\ldots} counts above make this plausible, but the idea falls apart when looking at individual pairs of digits. For instance, conditioned on a prime {p} being 3 mod 10, the next prime {p'} is more often congruent to 9 than 7, by a count of {7,\!502,\!896} to {7,\!043,\!695}. Lemke Oliver and Soundararajan (LOS) further note that the above bias should be no greater than the probability of {p_{n+1}} being less than {p_n + q}—because once you get to {p_n + q - 1} which is {\equiv a-1 \pmod{q}}, the case {(a,a)} has first crack in every coming cycle. But it is known that this probability is asymptotically less than the bias they identify in their argument. In a comment to his own post, Tao calculates for {n = 100,\!000,\!000} that the “next-crack” argument would predict 20% for {(a,a)}, not the observed 18%.

The authors say they were motivated by a second principle they heard in a lecture by Tadashi Tokieda: Suppose Alice rolls a 4-sided die with faces marked {1,3,7,9} trying to roll a {1} two consecutive times, while Bob rolls the same die trying to get a {1} followed by a {3}. Occurrences of {(1,1)} and {(1,3)} are equally likely, yet Alice will need appreciably more rolls on average to meet her goal than Bob. The reason is that when Alice rolls a {1} but then fails by getting {3}, {7}, or {9} on her next roll, she has to roll some more to get a {1} again, whereas Bob on rolling {1} can fail by getting another {1}, which sets him up immediately for another try. Stated in this form, the issue resembles the exposé of a simple bias of studies of the “hot hand” fallacy, which we discussed last October.

However, this argument goes away when we change Alice’s goal to rolling any {(a,a)} and Bob’s to rolling any {(a,a')}. Now both have a 1-in-4 chance of winning on any roll after the first, so there should be no bias. Another fix to Alice and Bob, analogous to our suggestion in the “hot-hand” case, makes no difference for LOS: make their goals be to roll {(1,1)} and {(1,3)} beginning with an odd-numbered roll. Then their expected trials to first success is the same. In the prime case this means considering {(p_n,p_{n+1})} where {n} is odd ({n \geq 4} to rule out the primes {2,3,5}). Does the bias in such pairs persist? Yes, because we could equally well start with {n} even, and the overall bias is the average of these two cases.

However imperfect, these analogies directed them to suspect and consider a deeper connection to the theory of gaps between primes.

The Argument of the Paper

At the heart is a conjecture by Godfrey Hardy and John Littlewood that we discussed last August. LOS started with a strong form of it that expedited their mod-{q} twist. Given any prime {p} and any finite set {H} of integers including {0}, let {N_p(H)} denote the proportion of {a \in [0 \dots p-1]} that are not congruent to any element of {H} modulo {p}. Then define the product series

\displaystyle  S(H) = \prod_p N_p(H) \left(\frac{p}{p-1}\right)^{|H|}.

If {H} hits every congruence class modulo some {p} then {S(H) = 0}, so in particular {H} cannot include any odd numbers. If {H = \{0\}} then {N_p(H) = \frac{p-1}{p}} so {S(H) = 1}. Finally let {\pi_H(x)} denote the number of {u \leq x} such that for all {h \in H}, {u + h} is prime. The so-called “First” Hardy-Littlewood conjecture then states that for any {H} and {\epsilon > 0},

\displaystyle  \pi_H(x) = S(H)\int_{z=2}^{z=x} \frac{1}{(\ln z)^{|H|}} dz + O(x^{1/2 + \epsilon}).

When {H = \{0\}}, the integrand is the offset log-integral function (the paper uses {\text{li}(x)} for this) and this becomes a well-known equivalent form of the Riemann Hypothesis (RH)—and still equivalent if the error is sharpened to {O(x^{1/2}\ln x)}. When {H = \{0,2\}} this subsumes the Twin Primes conjecture. Thus Hardy and Littlewood presented their conjecture as a natural strong extension of RH.

The twist by LOS starts by excluding the finitely many primes {p} that divide {q} from the product, calling the resulting function {S_q(H)}. They need to relate this to tuples a of congruences mod {q}. They break a down according to how equal congruences mod {q} are situated; these become like diagonal entries of a matrix and there are symmetries around this “diagonal.” The lone hard-and-fast theorem in their paper, called Proposition 2.1, shows that their formulas faithfully follow the original Hardy-Littlewood intuition about the behavior of the series. Eventually they boil all this down into functions {c_1(q;\text{\bf a})} and {c_2(q;{\bf a})} which become appropriate constants when {q} and a are fixed.

In terms of {n} and the length {k} of a, and recalling {r = \phi(q)}, we can call {\frac{n}{r^k}} the “naive” estimate of the number {\pi(x;q,\text{a})} of primes {p_m} among the first {n} primes such that {p_{m+j} \equiv a_j \bmod{q}} for each {j}, {0 \leq j < k}. In terms of {x}, the natural estimate is {\frac{\text{li}(x)}{r^k}}. The new conjecture by LOS shows how far this needs to be adjusted—and the overhead is far bigger than the gentle {O(\frac{1}{\sqrt{x}})} in the analogous factored form of RH or Hardy-Littlewood:


\displaystyle  \pi(x;q,\text{a}) = \frac{\text{li}(x)}{r^k}\left(1 + c_1(q;\text{\bf a})\frac{\ln\ln x}{\ln x} + c_2(q;\text{\bf a})\frac{1}{\ln x} + O(\frac{1}{(\ln x)^{7/4}})\right).

As often in number theory, the “{+ O(\cdots)}” is meant in the sense of “{\pm e(x)} for some error term {e(x) = O(\cdots)}.” The paper closes with detailed numerical evidence supporting this expression. As usual we say to consult the paper for further details, but we have some final remarks about the two or three big terms in this formula.

About the Asymptotics

The three terms still all go to {0} as {x \rightarrow \infty}. Let’s go back to our original focus on the {(a,a)} frequency. Call it {f_r(x)} for the primes below {x} and just {f_r} in the limit. from being asymptotic to {1/r}. Then it is perfectly mathematically consistent to prove theorems—maybe contingent on a generalized RH and/or Hardy-Littlewood and/or related principles—with diametrically opposite messages:

  • (a) {f_r \sim \frac{1}{r}} —so in the long run the primes behave randomly in this case after all.

  • (b) {f_r(x)} is always vastly less than {\frac{1}{r}}, so the primes are always substantially non-random.

The paper indeed conjectures “always” in a related case, so we might not even have the kind of sporadic remissions seen in the Chebychev bias. We can’t imagine any more opposite than this.

Now number theory is used to emphasis on lower-order terms—as we noted above, RH is equivalent to lower-order behavior of {\pi(x)}. But it strikes us that complexity theory has rarely been so subtle. We prove theorems of type (a) all the time. Are we perhaps missing important ways to sharpen and qualify our asymptotic estimates?

Open Problems

How does this discovery align with other observations of unexpected structure in the primes, including ramifications of Stanislaw Ulam’s spiral?

Tao makes a point of noting—including in his followup comments to his own piece—that while the {O(\frac{1}{\ln x})} term is already sizable and responsible for much of the deviation tabulated for {n = 100,\! 000,\! 000}, the asymptotically bigger {O(\frac{\ln\ln x}{\ln x})} term is novel and potentially more important theoretically. We look forward to further developments. Can readers say ways in which asymptotics in complexity theory are showing the same maturity?

6 Comments leave one →
  1. March 26, 2016 11:23 am

    Let q = 2 … oh, never mind …

  2. March 28, 2016 10:43 pm

    ⭐💡😎❤❗😀 this new apparently fundamental prime pattern and the big media attn is super exciting to me and so way cool! the greats of the field such as hardy or ramanujan seemed to pursue similar patterns ages ago (which reminds me, theres a new ramanujan movie out at the end of april, hope you cover it, plan to blog on it myself.) note that the pattern was found with empirical studies which are an increasingly big deal and maybe even a very slow/ long paradigm shift in (T)CS in action. one might argue it ties in heavily with statistical/ “big data” approaches also.

    also, random walks and connections to fractals is some kind of key area of research for the future, it unites several fields such as CS, math, physics; it could be some kind of rosetta stone.

  3. April 1, 2016 11:58 am

    💡 ok half inspired by you, just wrote all this up myyself:

    number theory news/ highlights/ headlines/ topnotch links: prime bias discovery, Diffie-Hellman Turing prize, Zhang-Tao twin primes, Wiles Abel prize, Riemann / tmachine blog

    lots of complexity theory angles. at the end, am proposing an open problem of proving Euclids thm for infinitude of primes via the halting problem. think few mathematicians computer scientists will understand it because its crosscutting in a “not invented here, neither here nor there” sense.


  1. The Primes Strike Again | Gödel's Lost Letter and P=NP
  2. number theory news/ highlights: prime bias, Diffie-Hellman, Zhang-Tao, Wiles, Riemann | Turing Machine
  3. Missing Mate in Ten | Gödel's Lost Letter and P=NP

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s