Skip to content

Is rote learning useful?

Garry Kasparov retired from top-flight chess six-and-a-half years ago, founding a Russian political party called the United Civil Front. This is part of a larger coalition opposed to Prime Minister Vladimir Putin called The Other Russia, whose English-language website is managed by chess commentator Michael “Mig” Greengard. We often see chess used as a metaphor by politicians and cartoonists, but this is the opposite. Kasparov wrote with Greengard a book explaining parallels from chess to life.

Today I want to ask a question posed by Dick Karp. It came about after Michael Rabin told a Kasparov story over lunch, on the last day of the Rabin Celebration a month ago.

Despite his retirement Kasparov still plays exhibition matches against both former and upcoming stars, and “simultaneous exhibitions” against 30-40-or so lesser players. He is set to play an 8-game match of “blitz chess” against his former world championship challenger Nigel Short this coming Sunday. Kasparov gave a 30-board “simul” during festivities at Tel Aviv University last year where Rabin received a Dan David Prize shared with Leonard Kleinrock and Gordon Moore. Later that evening Michael asked Kasparov how he evaluated each board. Michael wanted to know:

Did he re-evaluate each board from scratch each time he returned or did he recall the board’s position and continue his previous thread?

In computer talk: did he have state?

Kasparov looked a bit surprised, Michael recalls, and said of course he remembered the board position, and of course he did not have to recalculate from scratch. He said further that when he was young he could recall all the moves of all the games he had played for the last six months. At the the time of his visit to Michael he was “older” and he could only recall the games he played that week. But recalling the boards during the simultaneous match was still quite easy for him.

As a “wood pusher” unlike my partner Ken, I am beyond impressed. Clearly memory plays a major role in chess—top players usually know many openings, and many moves into each opening. But to recall that many games, for that length of time, seems quite impressive to me. Well Ken says at age 12 he was able to recall all moves of all tournament games he had played since age 10, but he lost that ability by age 14.

The Question

The question that Karp asked Michael and the rest of us at the table was: did memory, even rote memory, play a major role in success at mathematics and theory? He answered his own question quickly—I am giving the gist in third-person: He had thought for a long time that “rote” memory was not very important, but he had changed his mind and started to believe that it played a bigger role in problem solving than he had thought. This lead to a lively discussion at the table, as you can imagine. Most took the position that rote memory especially was not useful, that one needed to understand the concepts. But others besides Karp—myself, for example—felt that perhaps while rote memory usually has a negative connotation, perhaps it is underrated.

A Old Example

All this made me recall a theorem that Dick and I proved years earlier about the cover time of graphs. This is the result of Romas Aleliunas, Richard Karp, Richard Lipton, Laszlo Lovasz, and Charles Rackoff—often refered to as Aleliunas et al. The short version is that Dick came back to Berekely from Toronto and told me the question over lunch—lunch seems to be a good time for Karp to raise questions. There he stated the conjecture that the others had made: a random walk on an undirected graph would quickly cover all the vertices in expected polynomial time.

I have to say that I liked the question immediately. Dick and I continued talking about it in his office right after lunch, where we reduced a proof to a simple fact about Markov chains. We needed to know the expected return time of a certain random walk. Unfortunately Dick had to go teach class, so I told him I would search the library, which was downstairs from his office, while he was in class. I hoped that I could find a formula for the expected return time.

Recall this was in 1979 and there was no Google, no search engines, no Internet, so one actually looked things up by hand. I went into the math library and began a quick linear search of all journals that seemed like they had anything to do with random walks. After about an hour I found the formula we were looking for: the expected return time was just the reciprocal of the probability of being in that state.

As soon as Dick returned from his class I showed him the formula. We both were a bit embarrassed, since the formula was easy to prove. But worse, it was something that we had no doubt seen before, but forgotten. If we’d had better memory we would have proved the theorem instantly. In any event the conjecture became a theorem, and we all wrote up a paper together.

I have no idea whether this story was part of the reason Dick raised the question of rote memory. But I always wondered what would have happened if we had not been able to find the simple formula? It was the type that once one has it, it is easy to prove. But without the formula it seems hard to figure out what is happening with the random walk.

This was long ago. Maybe today young researchers are much better than we were in the understanding of Markov processes, but back then it was hard to prove this theorem. Oh well.

Rote Learning

Richard Feynman sharply criticized this type of learning. A quote from our friends at Wikipedia is relevant:

Rote learning is sometimes disparaged with the derogative terms parrot fashion, regurgitation, cramming, or mugging because one who engages in rote learning may give the wrong impression of having understood what they have written or said. It is strongly discouraged by many new curriculum standards. For example, science and mathematics standards in the United States specifically emphasize the importance of deep understanding over the mere recall of facts, which is seen to be less important, although advocates of traditional education have criticized the new American standards as slighting learning basic facts and elementary arithmetic, and replacing content with process-based skills.

Some have a different take. See this, for example. Larry Sanger says:

Whenever I encounter yet another instance of educationists’ [I’m guessing that is not a term of endearment] arguments against “memorizing,” the following rather abstract yet simple thought springs into my philosopher’s mind: Surely the only way to know something is to have memorized it. How can I be said to know something that I do not remember? So being opposed to memorizing has always sounded to me like being opposed to knowledge. I realize this argument likely seems glib. The thing educationists object to, of course, is not the remembering or even the memorizing but rather the memorizing by rote—that is, by dull repetition and often without experience or understanding.

In complexity theory one must learn many facts that I cannot see how to know except by memorization. Just think of the the definitions of the almost countless different complexity classes: there is no way to know what ${\mathsf{AWP}}$ is unless you have memorized it. That does not even cover the also almost countless theorems and lemmas that relate these classes.

Chess, Again

Ken notes that Garry Kasparov’s female counterpart of the 1980’s into the 1990’s, former women’s world champion Susan Polgar, may have helped find the particular kind of rote memory used. At least for chess, if not for theory, it is the same area of the brain used for face recognition. This is the argument of Part 1 of the National Geographic documentary My Brilliant Brain, which centers on Susan. That is, to a chess player:

Perhaps we each have our own “Facebook” of familiar theorems, techniques, and complexity classes? This may mean that the act of solving already-solved problems as training may be more important for theorists than we have expected.

Susan Polgar, incidentally, is also retired from active play, but runs an extremely popular chess blog, to which Ken has sometimes contributed. She routinely posts 10–15 items per day, which is like playing 40 games at once over the same 3-4 day period that elapses between our items. She is highly active in chess education from young school levels up through university, and her Susan Polgar Institute for Chess Excellence is hosted by Texas Tech University.

Open Problems

Should we stress more memorization in teaching and learning math and in particular complexity theory?

[removed documentary credit to BBC]

18 Comments leave one →
1. October 4, 2011 7:32 am

Just a little point, but Garry Kasparov’s female counterpart of the 1980′s into the 1990′s, would rather be Judit Polgar, the younger, stronger sister of Susan.

http://en.wikipedia.org/wiki/Judit_Polg%C3%A1r

Susan is very impressive as well nonetheless!

• October 4, 2011 8:11 am

I wrote “of the 1980’s into the 1990’s” carefully to indicate the time before Judit came to the fore—after all, Judit was born in 1976. Susan (Zsuzsa) became the top-rated woman in 1984, and had such prominence that on the Jan 1987 FIDE rating list, the World Chess Federation added 100 rating points to every woman except her. Of course Susan did not become Women’s World Champion until 1996, but that was arguably because she didn’t play for it—the fact that she concentrated on the men’s circuit was the reason FIDE gave for feeling her rating did not need an adjustment.

• Troy Fabian permalink
October 8, 2011 5:51 pm

Alexia, really? I was unaware that Judit Polgar had ever won the women’s world championship. You learn something new every day!

2. October 4, 2011 8:28 am

I think it’s extremely important in this discussion to be clear what one is talking about. To memorize a proof is (as we all know) utterly different from memorizing something like digits of $\pi$. The difference is that when learning proofs we try to develop proof-discovery tips that generate most of a proof, leaving only one or two hints to be remembered in a learning-by-rote sense. And as one progresses, even some of those hints turn into standard techniques. Thus, most of the time it doesn’t make sense to talk about “mere” rote learning, and it’s quite obvious that the more active type of memorization is of huge value to mathematicians.

But as you say, there are some things that you simply have to learn by rote if you want to do some parts of mathematics. Your example of complexity classes is a good one: I am fascinated by complexity theory, but I find this particular aspect of it very off-putting. I just can’t face making the big investment of time it would take to follow easily what people are saying when they drop these acronyms around. And, again as you say yourself, it’s not just the acronyms but how the complexity classes are related, what’s important and why, and so on. I’d be interested to know whether you think this feature of TCS is an unfortunate necessity or a historical accident.

A very good example of this was Ryan Williams’s recent breakthrough. I had no idea beforehand that what he did was an open problem, and if you had told me the result without telling me it was a big breakthrough I wouldn’t have had the slightest idea how important it was. (That’s not to say that I have no understanding of that: I understand that he somehow went further in the direction of lower bounds than had seemed possible with current technology, and that is of course extremely interesting.) I also can’t give a precise statement of what he proved without looking it up, even though I’ve read about it several times. I think that partly says something about me: if you put mathematicians on a spectrum according to how much brute memorization they are prepared to inflict on their brains, I am at the low-space end of that spectrum, which has disadvantages (sometimes you just have to learn things) and advantages (it forces me to digest more what I do want to learn).

• Serge Ganachaud permalink
October 4, 2011 3:37 pm

It’s a known fact that “space is more powerful than time” because it can be reused. This makes me suppose that you must have a very good short-term memory. 🙂

I’m not a TCS specialist, but the role of space and time being morally the same for human thoughts as it is for any computer program, we should be allowed to make use of the appropriate theorems of complexity theory in this context.

3. October 4, 2011 11:18 am

To me it seems self-evident that having a large amount of raw knowledge at your fingertips is an enormous asset in any intellectual endeavor.

However, it is a separate question what the best way is to acquire such a vast amount of knowledge. The word “rote” suggests a particular method for acquisition, which may not be the most effective method.

4. October 4, 2011 11:22 am

Even within my own family, there are sharply varying mnemonic styles. My wife and one of my sons have nearly unlimited capacity to absorb detail: she has a degree in (ancient) Egyptology, they both absorb languages rapidly (grammatical irregularities being no problem) and playing cards against them is daunting … they effortlessly remember every card played.

And yet, this same ability handicaps them in learning mathematics, in that the attribute of naturality seems inessential; why worry about naturality when it’s easy to memorize every useful fact?

These issues are coming to the fore nowadays, as STEM professionals turn more-and-more attention to the systematic engineering of technological narratives, in which naturality is the “glue” that bonds the narrative elements. The point being, it is far easier for most people to remember a coherent narrative than it is to remember disconnected facts.

This is how I presently understand an aphorism that Domokos Szász attributes to Albert Rényi:

“Other mathematicians prove what they can, von Neumann what he wants.”

This can be understood as a bipartite tribute that simultaneously praises von Neumann’s mathematical creativity and his ability to direct that capability to create mathematically natural — and thus memorably unitary — enterprise narratives.

• Serge Ganachaud permalink
October 6, 2011 6:47 pm

I’m wondering whether the naturality of a concept can be related to this concept having low Kolmogoroff complexity, and whether a narrative has anything to do with the shortest theory that can describe it. Naturality seems difficult to define technically. Category theorists have their own definition – the natural transformations – but that one doesn’t seem to be very useful in computer science…

5. David permalink
October 4, 2011 3:02 pm

Hi Richard, just thought I’d mention that coincidentally I just this afternoon gave a talk about cover times in the graduate student colloquium of my university which included a discussion of the theorem you talk about in the post. Interesting to learn something about how the proof came about.

• rjlipton permalink*
October 4, 2011 3:50 pm

David,

Cool. Thanks for telling me.

6. kodlu permalink
October 6, 2011 4:41 am

Just a comment, though not directly related to TCS. In information theory, it is possible to analyze algorithms where an upper bound on memory is imposed. For example there is a paper “gambling for the mnemonically impaired” in the IEEE Info Theory Transactions, where a sequence is observed but the observations are compressed to fit in memory, and then the algorithm to bet on the next bit has to use that memory only.

7. October 6, 2011 5:50 pm

Kasparov might remember an awful lot of board positions but it is still not the same as rote learning. His memory most likely depends heavily upon context. So, for instance, if Kasparov is shown a chess board with randomly and illegally placed pieces his ability to remember how the board looks like would probably grind to a halt. For a skilled chess player, it is trivial to remember where all the pieces are just from a glance. It is similar to how it is easy to remember quite well how the portrait of a beautiful woman looks like but impossible to remember a similar sized canvas with random pixels on it.

8. October 6, 2011 5:53 pm

When I say “For a skilled chess player, it is trivial to remember where all the pieces are just from a glance.” I mean the case where the pieces are placed legally. If the pieces are placed randomly Kasparov or any other chess player would have no advantage over non chess players in remembering the board.

• Troy Fabian permalink
October 8, 2011 5:55 pm

“If the pieces are placed randomly Kasparov or any other chess player would have no advantage over non chess players in remembering the board.”

Hmm, that doesn’t sound right. A non-player might have NO a priori knowledge of what the pieces look like nor what they are named. That’s bound to impact memory, imo.
However, I have heard that top players and amateurs have the same recall of positions with randomly placed pieces.

• October 10, 2011 12:13 pm

“A non-player might have NO a priori knowledge of what the pieces look like nor what they are named. ”

I should have been clearer, I meant amateur when I said non-player. In any case the point was that Kasparov’s ability to recall board positions is not necessarily an advantage he can use in other situations where he has to memorize something by rote. In general I think that understanding and appreciating a particular subject deeply (be it math, chess or baseball) leads to a very good ability to recall obscure facts/statistics and so on about the subject but it is not a transferable skill mostly.

9. October 10, 2011 9:26 am

A couple of months ago I wrote a half-tongue-in-cheek post about the changing role of memorization, in the context of surviving without electricity after Irene. http://teachingintrotocs.blogspot.com/2011/08/power-and-education.html

Regarding memorizing a proof versus memorizing the digits of pi: one is basically a procedure, the other one is data, but, as all of us computer scientists know, procedures are just a form of data – there is no essential difference. Or is there?

10. koala permalink
October 21, 2011 3:58 am

I think education’s current stance, encouraging memorization as little as possible is the right one. If a student *can* memorize however, good for him / her, they can go ahead and do that. But curricula, especially math curricula needs to be about deriving things, showing how pieces connect, how to move from one concept to another, so students understand how things are *in relation* to eachother, and that kind of comparative ability is what makes science tick.

In your story , you and your friend could not remember Theorem X or Y, so what? You had considerable skill in guessing where the answer would be, and you researched it, connected the pieces, and reached the conclusion. How is this not a success story?

Directing curricula can only be done in blunt moves, supporting memorization or not is one of those blunt moves, but once started, the details that follow can lead you in some dark places. Feynman saw this up close when he taught for a while in Brasil, and after he came back to US, he found the same thing in US curricula.

But I agree that having more things in memory (especially if they were understood first) can make a difference in research.