Skip to content

Codes Meet Numbers

March 9, 2014

Results and problems about primes that make us think of coding theory

Cropped from source.

Zhi-Wei Sun is a professor in Mathematics at Nanjing University in China. He is the Editor in Chief of the recently-founded Journal of Combinatorics and Number Theory. His homepage is unique in prominently featuring a long list of

  • …not his awards,

  • …not his papers,

  • …not his theorems,

  • …but rather his conjectures.

Indeed we count 432 total conjectures in his list, subtracting one that he seems last year to have proved. They do not seem to be easy conjectures—this one implies the Riemann Hypothesis in a nontrivial way. Some of them involve powers of 2, which lend them a coding-theory flavor.

Today Ken and I wish to share ideas of using coding theory to prove number-theoretic results.

Coding theory owes much to Richard Hamming, who defined Hamming distance between codewords and created the binary Hamming codes. The Hamming sphere of radius {d} centered on a word {x} is the set of all words {y} of the same length whose Hamming distance is at most {d}, meaning {x} and {y} differ in at most {d} places. Our question is how the distributions of prime numbers and other sets relate to the Hamming spheres of their binary encodings. For example, {5 = 101} and {7 = 111} are twin primes of Hamming distance {1}, while {11 = 1011} and {13 = 1101} have distance {2}.

Sun has his own “Super Twin Prime Conjecture” listed first on his page. Call a pair {(k,m)} of integers “super” if {\pi(k)} and {\pi(\pi(m))} are both twin primes, indeed the lower member of a twin pair, where {\pi(k)} denotes the {k^{th}} prime number. The conjecture—a hybrid of Twin and Goldbach—is:

Every integer {n \geq 3} is the sum of a super pair.

He has verified this for {n} up to {10^9}. What can motivate such a conjecture? Certainly one motivation is expected density—not only “should” the twin primes be infinite, but they should be dense enough to fill this kind of requirement, much as the set of primes themselves is dense enough to make Goldbach’s conjecture—that every even number {n \geq 4} is the sum of two primes—plausible. But what other structure can supply motivation? That is what we seek.

Numbers and Codes

Coding theory is all about the properties of vectors over finite sets, often over the binary field {\{0,1\}}. Of course number theory is all about the additive and multiplicative properties of integers. These seem to be vastly different, yet they are related: every number can be represented by a binary vector.

Coding theory gives a way to express relationships that might be cumbersome with arithmetical tools such as congruences. For example, say a number {r} is “top modulo {2^m}” if its {m}-bit encoding begins with a {1}. Of course we can specify this as {2^{m-1} \leq r < 2^m}, but “less-than” is a dodgy concept when working modulo some number. When {m} is odd we might want to define instead that the middle bit of {r} in binary is a {1}. Middle bits are sometimes important in crypto, but relations involving them are not always easy to define via congruences.

Of course a number is odd iff its last bit in binary is {1}, and is congruent to {3} mod {4} iff its second-last bit is also {1}. The distinction between congruence to {1} versus {3} mod {4} is generally important for odd primes. How about congruence to {5} or {7} mod {8}, versus {1} or {3}, that is being top mod {8}? Of course one important distinction is which congruences are quadratic residues, but with binary-code notions we can define others.

The number {71 = 1000111} is congruent to {7} mod {8}, and is part of a twin pair of Hamming distance {3} with {73 = 1001001}. The first twin pair with greater Hamming distance actually gives distance {6}: {191 = 10111111} and {193 = 11000001}. Next comes {2687 = 10101111111} and {2689 = 10110000001} for distance {7}. Is the Hamming distance of twin primes unbounded?

Of course we don’t even know if there are infinitely many twin primes. This is really asking whether twin primes can flank a multiple of an arbitrarily large power of {2}. Quick web searches have not found this question, while our point is that our motive came from the coding-theory angle.

Polignacs and Twins and Spheres

In 1848, Camille Marie de Polignac somewhat lazily conjectured that every odd number {n} is the sum of a prime number and a power of {2}. We speculate that he may have intended to exclude false cases such as {n = 127} where {n} itself is prime, but even so he missed {n = 905} amid his reported (but to us incredible) claim to have verified this up to about 3 million. Perhaps it was spoken with the bluster of a Confederate major-general during the Civil War, which is surprisingly what this French nobleman became.

His older brother, Alphonse de Polignac, made a somewhat less lazy conjecture: that every even number {k} is the difference between infinitely many pairs of consecutive primes. Of course with {k=2} this subsumes the Twin Primes Conjecture, and indeed the latter is sometimes called de Polignac’s Conjecture after him.

Should Camille have teamed up with his brother to make his conjecture? And would they have done better if they had been twins—maybe prove something about their conjecture? Well we have an example to go by: Zhi-Wei Sun has a twin brother, Zhi-Hong Sun at Huaiyin Normal University, and they teamed up in 1992 to prove something about the following conjecture by Donald Wall:

There are infinitely many primes {p} such that either {p \equiv \pm 1} modulo {5} and {p^2} divides the Fibonacci number {F_{p - 1}}, or {p \equiv \pm 2} modulo {5} and {p^2} divides the Fibonacci number {F_{p + 1}}, where {F_n} is indexed with {F_0 = 0}, {F_1 = 1}.

What Sun and Sun proved is that any minimal counterexample to Fermat’s Last Theorem would need to involve such a prime—of course from the proof of FLT two years later, we know there are none. They also gave a sufficient condition for a “Wall-Sun-Sun prime” to exist, though none has yet been found.

Back to the Polignacs, we can try to capture ideas of both their conjectures with a case of what is actually a pretty general kind of question—a kind one can pose about other sets of numbers besides the primes:

What is the minimum {d} such that every odd number is within the distance-{d} Hamming sphere of a prime number? Is it finite?

To get the even numbers too we can add {1} to {d}. Of course this is still a strong “every” kind of conjecture, and those are hard to prove. One can first try to attack “infinitely-many” versions. Obviously there are infinitely many odd numbers {q} that are {\pm 2} from a prime {p}, but if we insist that {q} too be prime we have our old friend the Twin Prime Conjecture again. So here is the corresponding Hamming-like question:

What is the minimum {d} such that there are infinitely many prime numbers {q} that are within Hamming distance {d} of some other prime number?

Using Hamming’s own ideas in coding theory, we can prove the minimum is at most {d=2}. Note that this is stronger than saying there are infinitely many pairs of primes {(p,q)} such that {q - p = 2^k \pm 2^l}, because we are also restricting what {p} and {q} have in the bit positions {k} and {l} from the end.

The Proof

The theorem is not that amazing, or unexpected, but we think how we prove it may be of interest. The proof is via Hamming’s famous theorem on the density of codes. Let {A_{q}(n,d)} be size of the largest set {S} of {n} vectors over an alphabet of size {q} so that any two distinct code words in {S} are at least Hamming distance {d} apart.

Theorem 1 For all {n} and {q} and {d}:

\displaystyle  A_{q}(n,d) \le \frac{q^{n}}{\sum_{k=0}^{t} \binom{n}{k}(q-1)^{k}},


\displaystyle  t = \left\lfloor \frac{d-1}{2} \right\rfloor.

Now to state formally what we are proving, it is:

Theorem 2 For every sufficiently large {n}, there are primes {p,q} with {2^n \leq p,q < 2^{n+1}} such that {p} and {q} have Hamming distance at most {2}.

Proof: Consider the set {S} of all primes in the interval {[2^{n}, \dots, 2^{n+1}-1]}. These of course can be represented by {n}-bit vectors, eliding the leading {1} in the {2^n} place. Think of them as a code. We will show that its minimum distance {d} is at most {2}, from which the theorem follows.

Suppose that the distance is larger, that is {d \ge 3}. The apply Hamming’s Theorem for {A_{2}(n,d)} noting that {t \ge 1}. This yields

\displaystyle  |S| \le \frac{2^{n}}{\sum_{k=0}^{t} \binom{n}{k}} \le \frac{2^{n}}{1+n}.

The Prime Number Theorem states that the density of primes up to {N} converges to {N/\ln N} as {N \longrightarrow \infty}, where {\ln} is natural log. By an easy manipulation of estimates it follows that for any {\epsilon > 0} and all large enough {N}, the primes have density at least {(1-\epsilon)\frac{1}{\ln N}} in {[N/2,\dots,N-1]}. It follows with {N = 2^{n+1}} that

\displaystyle  |S| \geq (1-\epsilon)\frac{2^{n}}{(n+1)\ln 2}.

Since {1/\ln 2 = \log_2 e = 1.44\dots}, this implies with the above that

\displaystyle  \frac{1.44(1-\epsilon)2^n}{n+1} \le \frac{2^n}{n+1},

but this is clearly false for large enough {n} and small enough {\epsilon}. This contradiction proves {d \leq 2} and hence the theorem. \Box

As the proof shows, there is actually a fair bit of “slack” in the counting. Hence the theorem can be extended to add conditions on the primes: we can further restrict the sizes of the primes, their residues modulo small numbers, and other properties. Indeed the counting makes this all close to working with {d=1}. That takes us back within the sphere of Camille de Polignac’s question as well.

An Obstinate Question?

For {d=1} the question again comes in “every” (with the qualifier “large enough”) and “infinitely many” flavors:

  1. Is it true that for every large enough prime {p}, there is a prime {q} such that {p} and {q} differ in one bit, or at least differ by a power of {2}? (no)

  2. Are there infinitely many primes {p} such that for some other prime {q}, {p} and {q} differ in one bit, or at least {q - p} is a power of {2}? (open)

Note that {q > p} is allowed. Without that condition we’d have already the counterexample {p = 127} to Camille’s conjecture. Incidentally, counterexamples to Camille are called obstinate numbers, and there are various pages devoted to enumerating them. A case where {q > p} is important is {p = 113,\!921}: {q = p + 2^{141}} is prime. Of course whenever {q \geq 2p} such a pair also has Hamming distance {1}, using leading {0}‘s to pad {p} to the same length as {q}.

In 1964, Fred Cohen and John Selfridge noted that allowing {q > p} made Camille’s idea good for every odd number up to {262,144}. However, they proved that the prime

\displaystyle p = 47,\!867,\!742,\!232,\!066,\!880,\!047,\!611,\!079

cannot be written as {p = q \pm 2^k} with {q} prime. Moreover, they gave an infinite arithmetic progression of numbers {a + bn} with {a,b} coprime that cannot be written as {\pm 2^k \pm q^l} with {q} prime. Since every such progression contains infinitely many primes, this finally lays question 1 to rest even with the “large enough” modifier. Zhi-wei Sun himself made good on something stated in their abstract but not proved in their paper, in a 2000 paper giving an infinite arithmetic progression of numbers that cannot be written as {p^k \pm q^l} with {p,q} prime and {k,l > 0}.

All this still, however, leaves the second question open. We would like to prove it, indeed find moderately dense sets of such pairs.

Open Problems

Are there infinitely many pairs of primes that differ in just one bit?

We note that there are people who have thought about connections between coding theory and number theory. For example, Toyokazu Hiramatsu and Günter Köhler have a whole monograph titled Coding Theory and Number Theory on this topic. But the idea there is to apply number theory to shed light on the structure of codes. Elliptic curves, for instance, can be used to construct certain interesting codes. We are interested in the impact of coding theory on number theory, such as the distribution of important sets like the primes, which in turn may have applications in computing theory.

[$2^k + 2^l$ changed to $2^k \pm 2^l$, fixed ceiling to floor in Hamming bound]

6 Comments leave one →
  1. March 9, 2014 1:36 am

    I think Theorem 2 only implies that there are primes whose difference is the sum OR difference of two 2 powers, while before the Proof you claim it with only the sum.

    • March 9, 2014 10:14 am

      Thanks—fixed; $2^k \pm 2^l$ was intended there.

  2. March 9, 2014 5:42 am

    Riffs and Rotes

  3. Beppe permalink
    March 10, 2014 7:59 am

    Hi to everyone. I’m very interested in your post, but I have a question about the Hamming theorem: in your post the t parameter is defined via a ceil function, while in the Wiki page you have indicated it is defined via the floor: which one is the correct? (ok, the ceil version gives a stronger conclusion, but is it correct?)

    • March 11, 2014 11:30 am

      Thanks—you are correct, missed this while editing.

  4. March 10, 2014 1:05 pm

    yeah codes & number theory seem to have some deep connections as youve pointed out, wish there was a comprehensive survey somewhere esp in electronic form. (Hiramatsu/Kohler ref seems to come close, hadnt heard, looks great, thx) reminds me of use of sieves in the recent zhang twin prime results, and computers/experimental attacks were used to narrow the sieves. wonder if there is some more general sieve theory that connects number theory/TCS. the sieving in that case maybe a good topic for further (blog) inquiry? it seems to be a large area in number theory that connects somewhat naturally with TCS angles….

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s