# Why is Congruence Hard to Learn?

*Gears and rings…*


Denys Fisher was a British engineer and maker of board games and other toys. In 1965 he invented the Spirograph toy. Some speculate that he was influenced by the designs of the artist, composer, and film animator John Whitney, whose opening sequence for Alfred Hitchcock’s 1958 film “Vertigo” is considered the first use of computer graphics in cinema. The Spirograph toy involves drawing with a pen guided by a gear with *m* teeth going around inside a ring or around a track or other gear with *x* teeth. The kind of design you get depends on how *x* relates to *m*.

Today Ken and I want to talk about a basic notion of mathematics and theory that is simple to define, very useful, and yet seems to be tough for some students to get.

The notion is:

$x \equiv y \pmod{m}$.

Often mathematical ideas are not easy to trace: who defined multiplication first, or square roots, or any other basic notion? Often the notion is credited not to the person who invented it, but the one who first used it in some important way.

That is not the case for the notion of *congruence*. We know who defined it, when they defined it, and what they used it for. The “they” was the 21-year-old Carl Gauss, who in his famous Disquisitiones Arithmeticae, completed in 1798 and published in 1801, introduced both the concept and the notation.

It is interesting that while Gauss selected a perfect notation for his brilliant notion, others were less excited. Adrien-Marie Legendre wanted to write

$x = y$,

with the *m* implicit. Others suggested variations of the notation.

Gauss had nailed the right notion and it stuck—thankfully.

Gauss objected to Legendre’s double use; Charles Babbage also said it violated the doctrine of *one notation for one thing*. What all seem to have had in common was wanting to view it as a relation between two things, but it really involves three: *m* as well as *x* and *y*.

A common expression that something is complex to understand is that it has “gears within gears.” As embodied in Spirograph, congruence is exactly that. Can we make it as easy as a child’s toy?


## Why So Hard To Learn?

The definition is, of course, that

$x \equiv y \pmod{m}$ if and only if $m$ divides $x - y$.

*I know most of you know this cold, but stay with us*. We are not trying to advance your knowledge of this famous definition, but discuss why it is hard to learn. I think there are topics in math that are complex to state but can be learned without much difficulty. Yet this simple definition of Gauss is in my teaching experience not easy for many students to really get. The fundamental question is:

Why is the notion of congruence so difficult to understand?

At Tech I used to teach the beginning discrete math class for our majors. It covered all the usual stuff: basic number theory, graph theory, set theory, basic rules of logic, Galois cohomology, and recurrences. Actually we did not cover Galois anything, but I added that to see if you were reading. I often started with number theory, since the students—I thought—had seen numbers since grade school. But the concept of congruence always seemed hard to make register. Was it my poor teaching? Or was it something else?

## A Theory

We have a modest theory on why congruence is tricky to learn. The main thesis is that it really is two important ideas that are thrown together. The problem is that the two ideas are not on equal footing. One is naturally high and abstract, the other so low one can easily “forget” it.

The first idea is that given an integer *m* there is the arithmetic modulo *m*. In many places on the web this is introduced as “clock arithmetic.” In the US this means that *m* = 12, while in Europe and elsewhere *m* = 24. But then it is explained how to add, subtract, and multiply numbers modulo *m*. Of course this is quite interesting, since this is essentially the study—in disguise—of finite commutative rings of a special type. The addition part is generated by the unit of the ring.
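A minimal Python sketch of clock arithmetic with *m* = 12:

```python
# Arithmetic modulo m: "clock arithmetic" with the US value m = 12.
m = 12

def add_mod(a, b):
    """Add modulo m: 7 hours after 9 o'clock is 4 o'clock, not 16."""
    return (a + b) % m

def mul_mod(a, b):
    """Multiply modulo m."""
    return (a * b) % m

print(add_mod(9, 7))   # 4
print(mul_mod(5, 5))   # 25 mod 12 = 1
```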

Gauss’s brilliant insight is that such simple objects had a deep and rich theory. The multiplicative structure of such a simple object is quite nontrivial. For example, there is the theorem that when the order—that is to say, the number of elements—is prime, then the nonzero elements form a cyclic group under multiplication. This is a deep theorem. Why should the 16 nonzero elements modulo 17 be cyclic? So are the 18 elements modulo 19, but Gauss found further properties that allow a regular 17-gon to be constructed by classical means and a regular 19-gon not. This verges into the theory of the Legendre symbol and quadratic reciprocity.
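To see the cyclic structure concretely, here is a short Python check that the powers of 3 sweep out all 16 nonzero residues modulo 17, so 3 is a generator:

```python
# Verify that the nonzero elements mod 17 form a cyclic group under
# multiplication by exhibiting a generator: the powers of 3 hit all 16.
p = 17
powers = {pow(3, k, p) for k in range(1, p)}  # 3^1, 3^2, ..., 3^16 mod 17
print(sorted(powers))          # 1 through 16: every nonzero residue
print(len(powers) == p - 1)    # True: 3 is a primitive root mod 17
```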

These and many other important theorems are available for the elements modulo *m*. This is a rich area, with many problems remaining open today. So the first hitch is that students must learn some slice of the basics of what is really a deep theory. Even just saying “commutative ring” might sound like saying “Galois cohomology.” What compounds the trouble, we believe, is that while the basic results are not terribly difficult, the motivation is unclear. Why are these objects so important? Today the best answer is these objects play a critical role in cryptography and coding theory and other aspects of theory.

If we stopped there, perhaps that would be enough. Perhaps it would be less confusing to our students. Maybe I will try that next time I have to teach this material. I could say:

Pick a natural number $m$. Then define addition and multiplication on

$\{0, 1, \dots, m-1\}$

as follows…

We would then prove that it had lots of nice properties and we could give some nice applications. RSA anyone?
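RSA anyone indeed; here is a toy Python sketch with tiny illustrative primes (p = 5 and q = 11 are my own choices for demonstration only; real RSA uses enormous primes). Note that `pow(e, -1, phi)` needs Python 3.8+:

```python
# Toy RSA over Z_n with n = p*q; all numbers here are illustrative only.
p, q = 5, 11
n = p * q                  # 55: the public modulus
phi = (p - 1) * (q - 1)    # 40
e = 3                      # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent: 27, since 3*27 = 81 ≡ 1 (mod 40)

msg = 7
cipher = pow(msg, e, n)    # encrypt: 7^3 = 343 ≡ 13 (mod 55)
plain = pow(cipher, d, n)  # decrypt: 13^27 ≡ 7 (mod 55)
print(cipher, plain)       # 13 7
```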

## The Second Difficulty

The key to the power of congruence is: **it throws information away**. If two integers *x* and *y* are equal, then for any *m*, $x \equiv y \pmod{m}$.

This means that congruence is in some sense a “forgetful functor.” This aspect underlies applications such as noted here:

Many problems in number theory reduce to the question of the solvability or unsolvability of some type of congruence. … In this connection, research into the question of the number of solutions of a congruence equation is of fundamental importance to number theory.

The sources, however, seem not to explain *why* this is true. The idea is simple but critical:

if $x = y$ then $x \equiv y \pmod{m}$ for any *m* that you choose. This means that we can reduce showing $x \neq y$ to showing that

$x \equiv y \pmod{m}$

is false for **some** *m*. This is one of the big ideas in mathematics. Often it is hard to show that *x* and *y* are not equal, yet it is much easier to show they are not congruent. This is a brilliant insight, and a surprising one.

Note this is quite counter-intuitive. You take *x* and *y* and throw away information about them, and then it becomes easier to handle them. It becomes easier to prove that they are different. Amazing. When you have some freedom to choose *m* or know a bound on *x* and *y* you can use the same idea to prove them equal. Or at least you can gain confidence in equality by sampling *m* randomly.
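The sampling idea can be sketched in Python (a toy fingerprint test; the function name and the bound on *m* are my own illustrative choices):

```python
import random

def probably_equal(x, y, trials=20, bound=10**6):
    """Fingerprint test: x == y implies x ≡ y (mod m) for every m, so a
    single modulus with x mod m != y mod m proves x != y outright."""
    for _ in range(trials):
        m = random.randint(2, bound)
        if x % m != y % m:
            return False       # certainly unequal
    return True                # equal with high confidence

print(probably_equal(10**50, 10**50))      # True
print(probably_equal(10**50, 10**50 + 1))  # False: any m >= 2 catches it
```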

Here is a simple example. How can we show that $x^2$ can never equal $2y^2$ (unless both *x* and *y* are zero)? First if they have a common factor we could divide it out without affecting the equation. That needs a pause for thought, but in fact we only need to visualize it when the factor is 3. Now we use the ‘forgetful’ trick and observe that if $x^2 = 2y^2$ then

$x^2 \equiv 2y^2 \pmod{3}$.

We’ve agreed that 3 cannot divide both *x* and *y*. If it divides neither we get $1 \equiv 2 \pmod{3}$, which is impossible because any square not divisible by 3 is congruent to 1 modulo 3. If 3 divides only *x*, then we get $0 \equiv 2y^2 \pmod{3}$. But this implies that 3 divides *y*, which is again a contradiction. The same works if we start with 3 dividing only *y*. There is no way out. So $x^2 \neq 2y^2$.
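The case analysis can also be confirmed mechanically; a short Python check over all residue pairs modulo 3:

```python
# Check every pair of residues mod 3: x^2 ≡ 2y^2 (mod 3) forces
# both x and y to be divisible by 3.
solutions = [(x, y) for x in range(3) for y in range(3)
             if (x * x) % 3 == (2 * y * y) % 3]
print(solutions)  # only (0, 0): no solution with x or y nonzero mod 3
```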

## How to Motivate?

Gauss seems to have thought the “forgetting” idea to be obvious beyond need of mention. At least I do not find any mention of it in my English edition of his book. Yet it strikes me both as a vital point and a connection that needs to be reinforced when teaching. It would be an interesting test of our hypothesis if emphasizing this simple “transfer principle” were shown to improve students’ results—or not.

The other difficulty remains problematic. David Mumford, who won a Fields Medal in 1974 for work in algebraic geometry and has recently started a blog on his website, co-wrote an obituary for Alexander Grothendieck with John Tate in the journal *Nature*. In his post on the editorial difficulties of explaining Grothendieck’s central notion of *scheme* to scientists in other fields, he wrote:

“To be honest, the central stumbling block for explaining schemes was the word ‘ring.’ … As for finite fields, in spite of John’s discomfort, I thought the numbers on a clock made a decent first exposure. OK, $\mathbb{Z}/12$ is not a field but what faster way to introduce finite rings than saying ‘a type of number that are added like the hours on a clock — 7 hours after 9 o’clock is not 16 o’clock, but 4 o’clock’?”

If the term ‘ring’ is a block for scientists then it needs special effort to convey the concept to students however it arises. Mumford says in his next paragraph:

“If math was introduced as connected to the rest of the world instead of being an isolated exercise, if it was shown to connect to money, to measuring the real world, to physics, chemistry and biology, to optimizing decisions and to writing computer code, fewer students would be turned off.”

Ken wonders whether reference to Spirograph and other cases of “gears within gears” can help here. In his novel *Cryptonomicon*, Neal Stephenson explained the role of congruence in breaking the Enigma code via an analogy to Alan Turing’s bicycle. Its rear wheel has one bent spoke and its chain has one vulnerable link. If they ever come together the bike will jam. The wheel’s sprocket has *m* teeth and the chain has *x* links. How frequently will this happen? Maybe never? This post emphasizes the idea of forgetting the total distance *C* that the chain has traveled and keeping only the congruence.
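A Python sketch of the bicycle analogy (the offsets `a` and `b`, marking where the bent spoke and the weak link start relative to the contact point, are my own additions to make the congruences explicit):

```python
from math import gcd

def first_jam(m, x, a=0, b=0):
    """Sprocket with m teeth, chain with x links. After the chain has
    advanced d links, the bent spoke is at the contact point when
    d ≡ a (mod m) and the weak link is there when d ≡ b (mod x).
    The bike jams at the first d satisfying both congruences, which
    exists iff gcd(m, x) divides a - b."""
    if (a - b) % gcd(m, x) != 0:
        return None              # the bike never jams
    d = a % m
    while d % x != b % x:        # step through the values d ≡ a (mod m)
        d += m
    return d

print(first_jam(20, 101, a=3, b=7))  # 1623: gcd is 1, so a jam is inevitable
print(first_jam(20, 100, a=1, b=0))  # None: gcd(20, 100) = 20 does not divide 1
```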

## Open Problems

We use congruences all the time in computer theory. Are there other types of “throw information away” methods that we can also use?

Gears, Differentials, Universal Joints …

AI has its “frame problem”, which arises from having to update the great mass of variables that do not change from moment to moment. What few people realize is that physics once had its own frame problem, which it solved almost in its sleep. How did physics do this? And how can the solution be extended to AI?

“Are there other types of ‘throw information away’ methods that we can also use?”

A useful analogy for CS students might be to case-insensitive string comparison. That’s a notion that will be very familiar from programming.

In both cases, one can initially understand the concept by reducing values to a “normal form”: 7 mod 4 becomes 3, ‘Hello’ becomes ‘hello’. But eventually we stick “mod 4” after our equations, or add the “i” flag to our regexps (for example), and feel comfortable working directly in that mode.

This Galois cohomology trick was brilliant, indeed I stopped paying attention while reading the list! A typo: “It would an interesting”

You wrote y^2 instead of 2y^2 at the end of the proof.

While “throwing information away” is accurate, there is also the notion of “reducing the problem”. In a given context we may only need certain information — anything else is confusing and not relevant.

Thanks for the typo fixes. Indeed, we feel in general that it also helps to draw explicit attention to “reducing the problem” when it is used.

By the way, see a new theoretical question (The Dead Cryptographers Society Problem) at: http://arxiv.org/ftp/arxiv/papers/1501/1501.03872.pdf

I think the reason it is difficult to learn is because it is not emphasized in education before the collegiate level. When I was in high school we would play a game called Equations competitively against other schools. As one of the rules variations we would learn congruence. There are a lot of properties of congruence that take a little learning to grasp, so the games really came down to who knew the most about congruence, and the desire to compete gave us the motivation to learn.

Congruences play some role in counting complexity, too. There are #P-complete problems that can be very well approximated: they admit a so-called Fully Polynomial Randomized Approximation Scheme (FPRAS). Still, not all the problems in #P have FPRAS approximations. Why? Because for these #P-complete problems, there is a polynomial reduction to #SAT using congruences (see Valiant 1979 on calculating the permanent and Brightwell and Winkler 1991 on counting the linear extensions of a poset, as examples). Congruences do not preserve relative error, and FPRAS means roughly that an approximation is possible in polynomial time such that it will have a small relative error with very high probability.

If there were a #P-completeness proof for a counting problem that also has an FPRAS, such that the operations used in the reduction preserve the relative error, then it would immediately follow that an FPRAS would be available for all problems in #P, and thus RP would equal NP. If RP does not equal NP then such a proof cannot exist. So if RP is not NP then congruences are powerful tools for proving #P-completeness of some counting problems. Of course there are other operations that do not preserve the relative error, for example subtraction; however, I am not aware of any #P-completeness proof that uses subtraction where the counting problem is also in FPRAS.

Yesterday I asked the same question: why, after taking all these classes, do I still have a hard time with congruence?

In teaching Discrete class I’ve found students take better to congruence if I say “leave the same remainder when divided by m” rather than “m divides x-y”.

As a student, I think this is spot on. For example, CS students with some programming background might be familiar with the mod or % operator (“if (x % 2 == 0) { // x is even }”). However, the fact that “leave the same remainder when divided by m” is the same thing as “m divides x-y” might not be immediately clear to some students not familiar with congruence. I guess they’d just need enough practice to establish the intuition about this?
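A brute-force Python check (a sketch over a small range) that the two phrasings define the same relation:

```python
# Check that "x and y leave the same remainder when divided by m"
# is the same relation as "m divides x - y", over a small range.
for m in range(1, 20):
    for x in range(-50, 50):
        for y in range(-50, 50):
            same_rem = (x % m == y % m)   # Python's % yields remainders in [0, m)
            divides = ((x - y) % m == 0)  # m | x - y
            assert same_rem == divides
print("the two definitions agree")
```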

Also, I’ve never liked the usual notation for congruence that much, it doesn’t really feel very suggestive. While it’s better than the Legrande notation (I agree the $m$ should be explicitly written), writing the mod part in the parenthesis has always looked like an afterthought. I’ve seen other students not familiar with it (maybe not yet exactly sure what’s going on) treating it more like a ‘comment’ to what they apparently think is the main relation (between $a$ and $b$, well it *looks* like a binary relation, compare the familiar +, -, =, …) and thus dropping the mod part from their equations (and so it becomes a Lgandra-like notation). So maybe J Marcos’ notation isn’t a bad idea.

(sorry, meant of course *Legendre in the above comment)

I agree, when I first took abstract algebra the modulus operator seemed quite elementary because I’d been using the “mod” operator in Basic since the 6th grade (which to me mentally is the remainder operator). I’m embarrassed to say that I just now had to jot down a quick proof that it’s equivalent to “m divides x-y” because I’d never seen that before and initially thought this was some different more-exotic concept.

I always have pictures like this in my head.

I wonder how Gauss could have “forgotten” to mention the use of congruence modulo in disproving algebraic identities? This is indeed a brilliant (and surely useful) example of application for modular arithmetic!

Let me take the chance to go back to the question in the title of this post and venture a much more prosaic explanation of why some students find congruence modulo “hard to learn”. This could be precisely because $a\equiv b \pmod m$ is a concept involving THREE objects (the two numbers $a$ and $b$ being compared from the viewpoint of a third number $m$ that makes for the *modulus* of the comparison) — as mentioned at the beginning of the post. Freshmen in Computer Science often seem to feel at ease with [unary] predicates (showing up in traditional grammar in terms of the division of a sentence into subject and predicate, and representable by subsets of a given universe) and with binary relations (showing up in noun phrases in which a subject and an object flank the main verb, and representable by graphs). Ternary relations, on the other hand, would seem to be way less common (but see these natural examples, and recall how ternary relations may represent very naturally configurations that appear in tree-like structures). Once in a while I have noticed indeed that some Discrete Math students find the notion of congruence modulo easier to understand if the professor simply writes $a \stackrel{m}{\equiv} b$ instead of $a\equiv b \pmod m$, and insists that $\stackrel{m}{\equiv}$ should be read as a single symbol (something closer to Legendre’s notation?), representing “some sort of identity” with a cyclic character.

As a matter of fact, the same difficulty with ternary relations arises in *Logic* when one intends to teach the concept of “a term $t$ being free for a variable $x$ in a formula $\varphi$”. Many students have a really hard time with this. It may be just a matter of habit, but it may also reflect some psychological difficulty with ternary notions, imposed by a Manichean culture. Any thoughts on this, anyone?

Another “throw information away” method: a vector is zero precisely when its inner product with every vector is zero. Thus, to show that x differs from y, it suffices to take their difference and find a vector that isn’t orthogonal to it. A random choice will typically do the trick.
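That random-vector test can be sketched in Python (plain integer vectors; the function name and the range of random entries are my own illustrative choices):

```python
import random

def probably_same_vector(x, y, trials=10):
    """x == y iff x - y is orthogonal to every vector; a random vector
    catches any nonzero difference with high probability."""
    diff = [a - b for a, b in zip(x, y)]
    for _ in range(trials):
        r = [random.randint(0, 10**6) for _ in diff]
        if sum(d * ri for d, ri in zip(diff, r)) != 0:
            return False   # certainly different
    return True            # same with high confidence

print(probably_same_vector([1, 2, 3], [1, 2, 3]))  # True
print(probably_same_vector([1, 2, 3], [1, 2, 4]))  # False (w.h.p.)
```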

Making the students experiment with simple Number Theory done with Python might help

http://userpages.umbc.edu/~rcampbel/Computers/Python/numbthy.html

I personally prefer to think about congruences using complex numbers ($a$ and $b$ are congruent mod $n$ iff $e^{2\pi i a/n} = e^{2\pi i b/n}$).

This is literally genius.

I personally don’t believe this “congruence is hard” idea. I think it’s just part of the math/nonmath divide. That the people in your classes think addition, multiplication, ordinary equality, and linear equation solving are easy is simply because they have had years of practice, not because they all have a natural knack for understanding such things. Congruence is perhaps the first new mathematical idea they encounter as an “adult”, so to speak, and have to fend for themselves to understand.

As for an easy example of the power of forgetfulness, consider Gowers’ little puzzle from his “ICM2014 — Barak, Guralnick, Brown” posting. He pointed out that seven of the 14 21st century Fields medalists shared a rather unlikely commonality, and left it to his readers to identify. One way to make the identification trivial is to perform a blatant act of “modulo” reduction, then all you have is the commonality and the puzzle solves itself.

An important theory of forgetting, called abstract interpretation, was developed by Patrick Cousot and his late wife Radhia (http://www.pl-enthusiast.net/2014/06/09/radhia-cousot/#more-65) in the mid 1970s. (In abstract interpretation, the formal notion of “forgetting” is called “abstraction,” which is slightly at odds with other uses of the term in computer science.)

Although abstract interpretation took about ten years to catch on, it has led to an immense outpouring of work, and is now recognized as one of the fundamental concepts in the areas of programming languages, compilers, software engineering, hardware verification, and software verification. Abstract interpretation provides a foundation for performing program analysis, where the goal is to obtain information about the possible states that a program passes through during execution, but without actually running the program on specific inputs. Such information is crucial in many contexts:

o In compilers, it is used to gather information used by the optimizer to decide which optimizations to use, thereby allowing programs to run faster.

o In software-engineering tools, it is used to provide feedback to programmers about a program’s run-time properties, which helps them do a better job of developing, modifying, debugging, and testing programs.

o In verification tools, it is one of the key techniques used to show that a program never reaches a bad state, thereby establishing that the program is correct with respect to some property of interest.

Abstract interpretation is able to gather such information despite the fact that the problem that it addresses is formally unsolvable. The Cousots sidestepped undecidability via two concepts:

Abstraction: In the context of the Cousots’ work, “abstraction” means “representing an information space by a smaller space that captures its essential features.” (Such a space is called an abstract domain.)

One-sided analysis: Whenever the analyzer says “no” it means “no,” but whenever it says “yes” it means “maybe.”

For instance, in the program-optimization context, when an abstract-interpretation-based analyzer reports “no, the program never leaves the envelope of behaviors for which an optimizing transformation T applies” then it is safe to transform the program via T. If the analyzer reports “yes, the program might leave the envelope of desired behaviors” then the optimizer does not apply T. (An alternative is that the optimizer does apply T, but guards the T-optimized code by a runtime check C, and runs unoptimized code instead whenever the guard is reached in a state that does not pass C.)

Given the joke in the blog post about “we did not cover Galois anything,” it is amusing to note that in abstract interpretation, the key property that must hold for an abstract domain to properly “forget” information from a concrete domain is that the two domains are related by a Galois connection.

Different abstract domains are used to establish different properties of programs. Numeric properties can be captured by abstract domains based on intervals, polyhedra, octagons, octahedra, etc. For many years, such work assumed an idealized situation in which the numeric values were integers or rationals of unbounded magnitude. However, such an approach does not model correctly the arithmetic that computers actually use, namely, arithmetic mod 2^32 or mod 2^64. Consequently, analyses that use domains based on unbounded values can return unsound results — they could say that the program’s behavior is OK, when in fact it is not OK due to wrap-around effects.

To tie things back to the subject of the blog post, it turns out that when one switches from an abstract domain based on affine relations over integers to one based on affine relations over integers mod 2^32 (or mod 2^64), it becomes possible to express properties that were previously impossible to express. For instance, with affine relations over integers it is not possible to express the property ”x is odd.” However, with affine relations over integers mod 2^32, ”x is odd” is captured by the affine relation “2^31 * x = 2^31.”
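A quick sanity check in Python of how multiplying by a power of two “forgets” everything about x except its low bit:

```python
# Over Z_{2^32}, the affine relation 2^31 * x ≡ 2^31 holds iff x is odd:
# multiplying by 2^31 discards all of x except its parity.
M = 2**32
for x in range(-1000, 1000):
    is_odd = (x % 2 == 1)
    relation_holds = ((2**31 * x) % M == 2**31)
    assert relation_holds == is_odd
print("2^31 * x ≡ 2^31 (mod 2^32) exactly when x is odd")
```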

It gets even better: in this case, there are actually three benefits from using an abstract domain based on congruences:

o They automatically model soundly the arithmetic that the machine actually uses

o They are able to express a richer set of properties

o Rather than having to use a package that supports arbitrary-precision arithmetic, one is able to use native arithmetic in the implementation of the abstract domain, which makes the analyzer considerably faster and more space-efficient.

In other words, we have a win-win-win situation — something that does not happen very often!

Thanks from both of us for this deep comment, including the Galois connection. It is great to see how useful algebra can be in program enhancement.

Two thoughts:

I wonder the extent to which a difficulty with modulo congruence is related to an understanding of bases. For nonnegative y, y mod 10 is always the last digit of y.

To take Babbage’s cue of one symbol for one thing, students have probably already seen the term “congruence” in geometry class, where it means something different than it does here.