Codes Meet Numbers
Results and problems about primes that make us think of coding theory
![]() |
Cropped from source. |
Zhi-Wei Sun is a professor in Mathematics at Nanjing University in China. He is the Editor in Chief of the recently-founded Journal of Combinatorics and Number Theory. His homepage is unique in prominently featuring a long list of
- …not his awards,
- …not his papers,
- …not his theorems,
- …but rather his conjectures.
Indeed we count 432 total conjectures in his list, subtracting one that he seems last year to have proved. They do not seem to be easy conjectures—this one implies the Riemann Hypothesis in a nontrivial way. Some of them involve powers of 2, which lend them a coding-theory flavor.
Today Ken and I wish to share ideas of using coding theory to prove number-theoretic results.
Coding theory owes much to Richard Hamming, who defined Hamming distance between codewords and created the binary Hamming codes. The Hamming sphere of radius centered on a word
is the set of all words
of the same length whose Hamming distance is at most
, meaning
and
differ in at most
places. Our question is how the distributions of prime numbers and other sets relate to the Hamming spheres of their binary encodings. For example,
and
are twin primes of Hamming distance
, while
and
have distance
.
Sun has his own “Super Twin Prime Conjecture” listed first on his page. Call a pair of integers “super” if
and
are both twin primes, indeed the lower member of a twin pair, where
denotes the
prime number. The conjecture—a hybrid of Twin and Goldbach—is:
Every integer
is the sum of a super pair.
He has verified this for up to
. What can motivate such a conjecture? Certainly one motivation is expected density—not only “should” the twin primes be infinite, but they should be dense enough to fill this kind of requirement, much as the set of primes themselves is dense enough to make Goldbach’s conjecture—that every even number
is the sum of two primes—plausible. But what other structure can supply motivation? That is what we seek.
Numbers and Codes
Coding theory is all about the properties of vectors over finite sets, often over the binary field . Of course number theory is all about the additive and multiplicative properties of integers. These seem to be vastly different, yet they are related: every number can be represented by a binary vector.
Coding theory gives a way to express relationships that might be cumbersome with arithmetical tools such as congruences. For example, say a number is “top modulo
” if its
-bit encoding begins with a
. Of course we can specify this as
, but “less-than” is a dodgy concept when working modulo some number. When
is odd we might want to define instead that the middle bit of
in binary is a
. Middle bits are sometimes important in crypto, but relations involving them are not always easy to define via congruences.
Of course a number is odd iff its last bit in binary is , and is congruent to
mod
iff its second-last bit is also
. The distinction between congruence to
versus
mod
is generally important for odd primes. How about congruence to
or
mod
, versus
or
, that is being top mod
? Of course one important distinction is which congruences are quadratic residues, but with binary-code notions we can define others.
The number is congruent to
mod
, and is part of a twin pair of Hamming distance
with
. The first twin pair with greater Hamming distance actually gives distance
:
and
. Next comes
and
for distance
. Is the Hamming distance of twin primes unbounded?
Of course we don’t even know if there are infinitely many twin primes. This is really asking whether twin primes can flank a multiple of an arbitrarily large power of . Quick web searches have not found this question, while our point is that our motive came from the coding-theory angle.
Polignacs and Twins and Spheres
In 1848, Camille Marie de Polignac somewhat lazily conjectured that every odd number is the sum of a prime number and a power of
. We speculate that he may have intended to exclude false cases such as
where
itself is prime, but even so he missed
amid his reported (but to us incredible) claim to have verified this up to about 3 million. Perhaps it was spoken with the bluster of a Confederate major-general during the Civil War, which is surprisingly what this French nobleman became.
His older brother, Alphonse de Polignac, made a somewhat less lazy conjecture: that every even number is the difference between infinitely many pairs of consecutive primes. Of course with
this subsumes the Twin Primes Conjecture, and indeed the latter is sometimes called de Polignac’s Conjecture after him.
Should Camille have teamed up with his brother to make his conjecture? And would they have done better if they had been twins—maybe prove something about their conjecture? Well we have an example to go by: Zhi-Wei Sun has a twin brother, Zhi-Hong Sun at Huaiyin Normal University, and they teamed up in 1992 to prove something about the following conjecture by Donald Wall:
There are infinitely many primes
such that either
modulo
and
divides the Fibonacci number
, or
modulo
and
divides the Fibonacci number
, where
is indexed with
,
.
What Sun and Sun proved is that any minimal counterexample to Fermat’s Last Theorem would need to involve such a prime—of course from the proof of FLT two years later, we know there are none. They also gave a sufficient condition for a “Wall-Sun-Sun prime” to exist, though none has yet been found.
Back to the Polignacs, we can try to capture ideas of both their conjectures with a case of what is actually a pretty general kind of question—a kind one can pose about other sets of numbers besides the primes:
What is the minimum
such that every odd number is within the distance-
Hamming sphere of a prime number? Is it finite?
To get the even numbers too we can add to
. Of course this is still a strong “every” kind of conjecture, and those are hard to prove. One can first try to attack “infinitely-many” versions. Obviously there are infinitely many odd numbers
that are
from a prime
, but if we insist that
too be prime we have our old friend the Twin Prime Conjecture again. So here is the corresponding Hamming-like question:
What is the minimum
such that there are infinitely many prime numbers
that are within Hamming distance
of some other prime number?
Using Hamming’s own ideas in coding theory, we can prove the minimum is at most . Note that this is stronger than saying there are infinitely many pairs of primes
such that
, because we are also restricting what
and
have in the bit positions
and
from the end.
The Proof
The theorem is not that amazing, or unexpected, but we think how we prove it may be of interest. The proof is via Hamming’s famous theorem on the density of codes. Let be size of the largest set
of
vectors over an alphabet of size
so that any two distinct code words in
are at least Hamming distance
apart.
Theorem 1 For all
and
and
:
where
Now to state formally what we are proving, it is:
Theorem 2 For every sufficiently large
, there are primes
with
such that
and
have Hamming distance at most
.
Proof: Consider the set of all primes in the interval
. These of course can be represented by
-bit vectors, eliding the leading
in the
place. Think of them as a code. We will show that its minimum distance
is at most
, from which the theorem follows.
Suppose that the distance is larger, that is . The apply Hamming’s Theorem for
noting that
. This yields
The Prime Number Theorem states that the density of primes up to converges to
as
, where
is natural log. By an easy manipulation of estimates it follows that for any
and all large enough
, the primes have density at least
in
. It follows with
that
Since , this implies with the above that
but this is clearly false for large enough and small enough
. This contradiction proves
and hence the theorem.
As the proof shows, there is actually a fair bit of “slack” in the counting. Hence the theorem can be extended to add conditions on the primes: we can further restrict the sizes of the primes, their residues modulo small numbers, and other properties. Indeed the counting makes this all close to working with . That takes us back within the sphere of Camille de Polignac’s question as well.
An Obstinate Question?
For the question again comes in “every” (with the qualifier “large enough”) and “infinitely many” flavors:
- Is it true that for every large enough prime
, there is a prime
such that
and
differ in one bit, or at least differ by a power of
? (no)
- Are there infinitely many primes
such that for some other prime
,
and
differ in one bit, or at least
is a power of
? (open)
Note that is allowed. Without that condition we’d have already the counterexample
to Camille’s conjecture. Incidentally, counterexamples to Camille are called obstinate numbers, and there are various pages devoted to enumerating them. A case where
is important is
:
is prime. Of course whenever
such a pair also has Hamming distance
, using leading
‘s to pad
to the same length as
.
In 1964, Fred Cohen and John Selfridge noted that allowing made Camille’s idea good for every odd number up to
. However, they proved that the prime
cannot be written as with
prime. Moreover, they gave an infinite arithmetic progression of numbers
with
coprime that cannot be written as
with
prime. Since every such progression contains infinitely many primes, this finally lays question 1 to rest even with the “large enough” modifier. Zhi-wei Sun himself made good on something stated in their abstract but not proved in their paper, in a 2000 paper giving an infinite arithmetic progression of numbers that cannot be written as
with
prime and
.
All this still, however, leaves the second question open. We would like to prove it, indeed find moderately dense sets of such pairs.
Open Problems
Are there infinitely many pairs of primes that differ in just one bit?
We note that there are people who have thought about connections between coding theory and number theory. For example, Toyokazu Hiramatsu and Günter Köhler have a whole monograph titled Coding Theory and Number Theory on this topic. But the idea there is to apply number theory to shed light on the structure of codes. Elliptic curves, for instance, can be used to construct certain interesting codes. We are interested in the impact of coding theory on number theory, such as the distribution of important sets like the primes, which in turn may have applications in computing theory.
[$2^k + 2^l$ changed to $2^k \pm 2^l$, fixed ceiling to floor in Hamming bound]
I think Theorem 2 only implies that there are primes whose difference is the sum OR difference of two 2 powers, while before the Proof you claim it with only the sum.
Thanks—fixed; $2^k \pm 2^l$ was intended there.
Riffs and Rotes
Hi to everyone. I’m very interested in your post, but I have a question about the Hamming theorem: in your post the t parameter is defined via a ceil function, while in the Wiki page you have indicated it is defined via the floor: which one is the correct? (ok, the ceil version gives a stronger conclusion, but is it correct?)
Thanks—you are correct, missed this while editing.
yeah codes & number theory seem to have some deep connections as youve pointed out, wish there was a comprehensive survey somewhere esp in electronic form. (Hiramatsu/Kohler ref seems to come close, hadnt heard, looks great, thx) reminds me of use of sieves in the recent zhang twin prime results, and computers/experimental attacks were used to narrow the sieves. wonder if there is some more general sieve theory that connects number theory/TCS. the sieving in that case maybe a good topic for further (blog) inquiry? it seems to be a large area in number theory that connects somewhat naturally with TCS angles….