Skip to content

Fermat’s Little Theorem for Matrices

August 7, 2009

Fermat’s little theorem for matrices with application to potential factoring algorithms


Pierre de Fermat is perhaps best known for the proof that he never wrote down, and perhaps never had.

I have discovered a truly remarkable proof which this margin is too small to contain. (Around 1637)

This of course refers to his famous “Fermat’s Last Theorem,” that was finally proved by Andrew Wiles in 1995. Most doubt whether Fermat really had a proof, but it is an intriguing part of the history of this famous problem. If Fermat had a solution, he certainly did not have the brilliant one that Wiles found.

Today I would like to talk about some new, discovered this century, results that generalize Fermat’s Little Theorem to matrices. They do not seem to be well known, but are quite pretty. Perhaps they may have applications to some complexity theory problems.

While Fermat’s Last Theorem is very hard to prove, the version for polynomials is much easier to resolve. In particular, one can prove that

\displaystyle a(x)^{m} + b(x)^{m} = c(x)^{m}

has no solutions in non-constant polynomials, for {m>2}. There are many proofs of this fact. Often changing a problem from integers to polynomials makes it easier.

Let’s now turn to study number theory problems for matrices, not for integers nor for polynomials.

Fermat’s Little Theorem

This states,

Theorem: If {p} is a prime and {m} an integer, then {m^{p} \equiv m \bmod p}.

There are many proofs of this beautiful theorem. One is based on the following observation:

\displaystyle  (x_{1} + \cdots + x_{m})^{p} \equiv x_{1}^{p} + \cdots + x_{m}^{p} \bmod p.

The proof of this equation follows from the binomial theorem, and the fact that

\displaystyle  {p \choose k} \equiv 0 \bmod p

for {p} prime and {0<k<p}. Then, simply set all {x_{i}=1}, and that yields,

\displaystyle  (1 + \cdots + 1)^{p} \equiv 1 + \cdots + 1 \bmod p,

which is { m^{p} \equiv m \bmod p.}


The brilliant Vladimir Arnold once stated:

There is a general principle that a stupid man can ask such questions to which one hundred wise men would not be able to answer. In accordance with this principle I shall formulate some problems.

Since Arnold is certainly not stupid, he was joking. Yet there is some truth to his statement: it is notoriously easy to raise questions in number theory that sound plausible, hold for many small cases, and yet are impossible to prove. Fermat’s Last Theorem exactly fit this category for over three hundred years.

In any event, Arnold did extensive numerical experiments, to search for a way to generalize Fermat’s Little Theorem to matrices. He noticed, immediately, that simply replacing the integer {m} by a matrix {A} will not work. For example, consider a matrix {A} that is nilpotent, but is not zero. Recall a matrix is nilpotent provided some power of the matrix is {0}. Then, for {p} large enough, {A^{p} = 0} and so clearly {A^{p} \not\equiv A \bmod p}.

Thus, Arnold was forced to extend the notion of what it meant for a theorem to be “like” Fermat’s Little Theorem. After extensive experiments he made make the following conjecture about the trace of matrices. Recall that the {\mathsf{trace}(A)} is the sum of the elements in the main diagonal of the matrix {A}.

Conjecture: Suppose that {p} is a prime and {A} is a square integer matrix. Then, for any natural number {k \ge 1},

\displaystyle  \mathsf{trace}(A^{p^{k}}) \equiv \mathsf{trace}(A^{p^{k-1}}) \bmod p^{k}.

In his paper, published in 2006, he found an algorithm that could check his conjecture for a fixed size {d \times d} matrix and a fixed prime. He then checked that it was true for many small values of {d} and {p}, yet he did not see how to prove the general case. He did prove the special case of {k=1}.

Finally, the general result was proved by Alexander Zarelua in 2008 and independently by others:

Theorem: Suppose that {p} is a prime and {A} is a square integer matrix. Then, for any natural number {k \ge 1},

\displaystyle  \mathsf{trace}(A^{p^{k}}) \equiv \mathsf{trace}(A^{p^{k-1}}) \bmod p^{k}.

An important special case is when {k=1}, recall Arnold did prove this case,

\displaystyle  \mathsf{trace}(A^{p}) \equiv \mathsf{trace}(A) \bmod p.

For an integer matrix {A}, the corresponding coefficients of the characteristic polynomials of {A} and {A^{p}} are congruent {\bmod p}. This, in effect, generalizes the statement about traces.


Arnold’s conjecture—now theorem—has some interesting relationship to the problem of factoring. If we assume that factoring is hard, then his theorem can be used to prove lower bounds on the growth of the traces of matrix powers.

Let’s start with a simple example. I will then show how we can get much more powerful statements from his theorem. Consider a single integer matrix {A}, and look at the following simple algorithm where {n=pq} is the product of two primes.

  1. Compute {\alpha = \mathsf{trace}(A^{n}) \bmod n.}
  2. Then, compute the greatest common divisor of {\alpha-k} and {n} for all {k=1,\dots,\log^{c} n} where {c} is a constant.

Clearly, if this finds a factor of {n} it is correct. The only question is when will that happen? Note, by Arnold’s theorem,

\displaystyle  \begin{array}{rcl}  	\alpha &\equiv& \mathsf{trace}(A^{pq}) \bmod p \\ 			&\equiv& \mathsf{trace}(A^{q}) \bmod p. \end{array}


\displaystyle  \begin{array}{rcl}  	\alpha &\equiv& \mathsf{trace}(A^{pq}) \bmod q \\ 			&\equiv& \mathsf{trace}(A^{p}) \bmod q. \end{array}

The key observation is: suppose that the trace of the powers of the matrix {A} grow slowly. Let {\alpha \bmod p \neq \alpha \bmod q}, and let one of these values be small. Then, for some {k} the value {\alpha-k} will be zero modulo, say, {p} and not zero modulo {q}. Thus, the gcd computation will work.

All this needs is a matrix so that the trace of its powers grow slowly, and yet are different. I believe that this is impossible, but am not positive.

We can weaken the requirements tremendously. For example, replace one matrix {A} by a family of integer matrices {A_{1},\dots,A_{k}}. Then, define {\alpha(m)} as:

\displaystyle  \sum_{i=1}^{k} \lambda_{i}\mathsf{trace}(A_{i}^{m})

where all {\lambda_{i}} are integers. Note, this value is always an integer.

Now the key is the behavior of the function {\alpha(m)}. In order to be able to factor this function must have two properties:

  1. There must be many values of {m} so that {\alpha(m)} is small;
  2. The values of {\alpha(p)} and {\alpha(q)} should often be distinct.

If these properties are true, then the above method will be a factoring algorithm. Thus, an example of reverse complexity theory (RCT) is: if factoring is hard, then any {\alpha(m)} that is non-constant must not have many small values. This can easily be made into quantitative bounds. I know that some of these results are proved, but their proofs often depend on deep properties from algebraic number theory. The beauty of the connection with factoring is that the proofs are simple—given the strong hypothesis that factoring is hard.

Since I am open to the possibility that factoring is easy, as you probably know, I hope that there may be some way to use these ideas to attack factoring. But either way I hope you like the connection between the behavior of matrix powers and factoring.

Gauss’s Congruence

There are further matrix results that also generalize other theorems of number theory. For example, the following is usually called Gauss’s congruence:

Theorem: For any integer {a} and natural number {m},

\displaystyle  \sum_{d | m} \mu(\frac{m}{d})a^{d} \equiv 0 \bmod m.

Here {\mu(n)} is the famous Möbius function: if n=1, then \mu(n)=1; if n is divisible by the square of a prime, then \mu(n)=0; if n is divisible by k distinct primes, then \mu(n)=(-1)^{k}.

By the way, why does Gauss get everything named after him? I guess we should join in and create a complexity class that is called GC, for “Gauss’s Class.” Any suggestions?

Zarelua proves that Gauss’s congruence generalizes nicely to matrices:

Theorem: For any integer matrix {A} and natural number {m},

\displaystyle  \sum_{d | m} \mu(\frac{m}{d})\mathsf{trace}(A^{d}) \equiv 0 \bmod m.

As an example, let {m=pq} the product of two primes. then this theorem shows that we can determine,

\displaystyle  \alpha = \mathsf{trace}(A^{p}) + \mathsf{trace}(A^{q}) \bmod pq

for any integer matrix {A}. What intrigues me is, if {\alpha} is less than {pq}, then we get the exact value of

\displaystyle  \mathsf{trace}(A^{p}) + \mathsf{trace}(A^{q}).

Can we do something with this ability? In particular, can it be used to get some information about the values of {p} and {q}?

Open Problems

I believe there are two interesting types of questions. The first is what can we say about the growth of such sums of matrix traces? Can we improve on known bounds or give shorter proofs based on the hardness of factoring? This would be a nice example of RCT.

Second, a natural idea is to look at other Diophantine problems and ask what happens if the variables range over matrices? What known theorems remain true, which become false, and which become open?

About these ads
10 Comments leave one →
  1. August 7, 2009 6:53 pm

    Some comments:

    1. Here is a short proof of the trace result. A is congruent mod p to a matrix with non-negative integer entries, so let it denote the adjacency matrix of a graph. The cyclic group of order p^k acts on closed walks of length p^k in the obvious way, and computing \text{tr(A^{p^k}) \bmod p^k simply means ignoring the orbits of size p^k. The remaining orbits must all be obtained by repeating walks of length p^{k-1}, whence the result. This generalizes the argument I gave here.

    2. In a similar way one obtains a short proof of Gauss’s congruence: by Mobius inversion \frac{1}{m} \sum_{d | m} \mu \left( \frac{m}{d} \right) \text{tr}(A^d) counts the number of aperiodic paths of length m.

    3. The growth rate of traces of an integer matrix is related to the Mahler measure problem, since if a matrix’s traces grow slowly its characteristic polynomial has small Mahler measure. So this seems like a difficult but interesting avenue of attack. If the conjecture is true then the traces grow at least as fast as 1.1762^k.

    • August 7, 2009 7:00 pm

      Whoops, some corrections.

      1, 2. The formula that didn’t parse is \text{tr}(A^{p^k}) \bmod p^k. I guess what I’m trying to say here is that I am astonished these results aren’t already well-known in the literature. Surely there are references earlier than this decade!

      3. Even the truth of the Mahler measure conjecture leaves open the possibility that there are many small roots of absolute value greater than 1. Based on the known extremisers, however, this seems extremely unlikely.

      • rjlipton permalink*
        August 7, 2009 11:00 pm

        I am sure there are. But these are the ones that I found. In any event these results do not seem to be well known…

  2. August 10, 2009 1:11 am

    “If Fermat had a solution, he certainly did not have the brilliant one that Wiles found. ”
    Quite a stupid statement!

    • rjlipton permalink*
      August 10, 2009 6:58 am

      I guess it is a bit. But what I really meant is that… oh forget it.

  3. Le Gu permalink
    September 11, 2009 6:49 pm

    There is simple lifting of any integer matrix A to some boolean matrix B which adds
    only zeros to the spectra: spectra(B) = spectra(A) + {zeroes}. Actually, in some
    universal basis B = A \oplus 0.
    Thus it is sufficient to consider just {0,1} matrices. Perhaps this observation can simplify things. I really enjoyed the post, your blog is one of my favorites.

    • rjlipton permalink*
      September 11, 2009 9:06 pm

      Thanks very much. Perm(A)=0 is easy for non-negative matrices since reduces to is there a matching?

  4. December 4, 2009 10:12 pm

    There’s a simple generalization of Fermat’s Little Theorem to matrices:

    Let A be a square matrix with non-zero eigenvalues, with A diagnoalizable; let p be prime, and let all residues be taken with respect to the components of the matrix (component-wise).

    APWRp-1 = (SDSPWR-1)p-1 = SDPWRp-1SPWR-1 = SISPWR-1 = I (mod p).

    APWRp-1 = I (mod p).

    (Note: aPWRb mean “a to the power of b”.


  1. Newton’s sums, necklace congruences, and zeta functions « Annoying Precision
  2. An Amplification Trick and STOC 2010 « Gödel’s Lost Letter and P=NP

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Get every new post delivered to your Inbox.

Join 1,215 other followers