# Do results have a “teach-by-date?”

* Teaching automata theory *

Noam Chomsky is famous for many many things. He has had a lot to say over his long career, and he wrote over 100 books on topics from linguistics to war and politics.

Today I focus on work that he pioneered sixty years ago.

Yes sixty years ago. The work is usually called the Chomsky hierarchy(CH) and is a hierarchy of classes of formal grammars. It was described by Noam Chomsky in 1956 driven by his interest in linguistics, not war and politics. Some add Marcel-Paul Schützenberger’s name to the hierachry. He played a crucial role in the early development of the theory of formal languages—see his joint

paper with Chomsky from 1962.

We probably all know about this hierarchy. Recall grammars define languages:

- Regular languages
- Context-free languages
- Context-sensistive languages
- And finally recursively enumerable languages

One neat thing about this hierarchy is that it has long been known to be strict: each class is more powerful than the previous one. Each proof that the next class is more powerful is really a beautiful result. Do you know, offhand, how to prove each one?

## My Question

I have a simple question:

Should we still teach the CH today?

Before discussing let me explain some about grammars.

## Programming Languages

In the 1950’s people started to define various programming languages. It quickly became clear that if they wanted to be precise they needed some formal method to define their languages. The formalism of context-free grammars of Noam Chomsky was well suited for at least defining the syntax of their languages—semantics were left to “English,” but at least the syntax would be well defined.

Another milestone in the late 1950s was the publication, by a committee of American and European computer scientists, of “a new language for algorithms”; the ALGOL 60 Report (the “ALGOrithmic Language”). This report consolidated many ideas circulating at the time and featured several key language innovations: Perhaps one of the most useful was a mathematically exact notation, Backus-Naur Form (BNF), that was used to describe their grammar. It is not more expressive than context-free grammars, but is more user friendly and variants of still it are still used today.

I must add a story about the power of defining the syntax of a language precisely. Jeff Ullman moved from Princeton to Stanford many years ago in 1979. I must thank him, since his senior position was the one that I received in 1980. Jeff was a prolific writer of textbooks already then and used an old method from Bell Labs, TROFF, to write his books. On arrival at Stanford he told me that he wanted to try out the then new system that Don Knuth had just created in 1978—of course that was the TeX system. Jeff tried the system out and liked it. But then he asked for the formal syntax description, since he wanted to be sure what the TeX language was. He asked and the answer from Knuth was:

There is no formal description. None.

Jeff was shocked. After all Knuth had done seminal work on context-free grammars and was well versed in formal grammars—for example Knuth invented the LR parser (Left to Right, Rightmost derivation). TeX was at the time only defined by what Knuth’s program accepted as legal.

## My Question Again

Let’s return to my question: *Should we still teach the CH today?*

It is beautiful work. I specially think the connection between context-free languages and pushdown automata is wonderful, non-obvious, and quite useful. Context free and pushdown automata led to Steve Cook’s beautiful work on two-way deterministic pushdown automata (2DPDA). He showed they could be simulated in linear time on a random-access machine.

This insight was utilized by Knuth to find a linear-time solution for the left-to-right pattern-matching problem, which can easily be expressed as a 2DPDA:

This was the first time in Knuth’s experience that automata theory had taught him how to solve a real programming problem better than he could solve it before.

The work was finally written up and published together with Vaughan Pratt and James Morris several years later.

And of course context sensitive languages led to the LBA problem. This really was the question whether nondeterministic space is closed under complement. See our discussion here.

## Open Problems

Should I teach the old CH material? Or leave it out and teach more modern results? What do you think? Do results have a “teach-by-date?”

[Fixed paper link]

You don’t really make any kind of case here at all, either pro or con. You might as well ask if we should continue to teach calculus, what with it being 350 years old and all.

Personally, I think the theory of finite automata is essential to understanding the theory of computation, which, after all, is ultimately about mathematical *models* of physical objects like physical computers. Since finite automata are, for all practical purposes, the very simplest models of computers, it’s worth understanding them in detail.

Then there are the applications, which I am not really the best person to address. But lots of programming problems can be modeled by finite automata, or deterministic pushdown automata, or DPDA’s, and recognizing this often leads to fast algorithms.

Without knowing what “more modern results” you’d teach in the place of finite automata, though, I can’t say much more.

My motivation would be something like this: with full-blown Turing machines we can define stuff like recursive and RE languages. By halting problem reasons (also Rice’s theorem), for these kind of languages various problems like intersection, inclusion et al. are undecidable. It seems natural to ask whether there are classes of languages for which some or all of these problems become tractable (so we can write parsers, compilers et al.).

It is simply too valuable a concept to not teach.

But the real question is what do you teach in its place?

I think it is still relevant, even if it isn’t a particullary active area of research. I was rereading Sipser’s Theory of Computation recently, and CH seemed to be rather core to how he lays out the knowledge. When I had originally taken courses decades ago, the machines and languages seemed disconnected to me, but seeing how they were interwinned both helped and made it more interesting. From a programming perspective understanding how ‘expressive’ different languages are really helps in areas like parsing, DSLs and optimizations. They are advanced techniques, but I think that they are still fundumental and strong software developers should at least have some exposure, so they can hopefully not attempt to reinvent them (which I have seen all too often).

Paul.

I teach a TOC course, at a small liberal arts college. It is a pretty standard course (I use Sipser). Because finite automata comes first they get more airtime than Turing machines, and complexity gets very little. Maybe at another type of school there would be TM’s and complexity in another course but where I am this is it.

To me it seems the time tradeoff should go the other way and over the years I have shaded more and more in that other direction. Nothing against Chomsky, but ..

I’m a developer, as in software developer. Context-free grammars are a “must-know” topic. Here are some real-world examples where I needed to write parsers for context-free grammars:

1. LaTex WYSIWIG editor

2. Spell checker plugin

And there are probably more. To @Jeffrey Shallit’s point, you might as well not teach Calculus if you’re not teaching this.

The link “paper” seems broken (brings back to this page).

Manu

Thanks. I fixed the link to paper.

I agree with Jim that languages and automata tend to take up too much time in the standard first-semester TOC course. When Stephan Mertens and I wrote our book on computation, we largely omitted it, precisely so that we could focus more on the big questions of P vs. NP and so on.

At the same time, I still have a soft spot for languages, automata, and grammars, and I usually spend two weeks on them when I teach. DFAs and CFLs are a good place for students to think about closure properties, the fact that we can run two DFAs in parallel (but not in series) as a combined DFA, that the resulting state space is the Cartesian product, etc. The Myhill-Nerode theorem is a great use of equivalence classes, and is much more interesting and fundamental than the pumping lemma.

I also think it’s cool for them to contemplate that languages that require an infinite description in one model (I like talking about finite-state machines with infinite numbers of states 🙂 can have finite descriptions according to another, more powerful model. And I like that we can think both in terms of recognizing languages (with automata) and generating them (with grammars). The generative point of view is nice when we think about how speech works (a la Chomsky) but also about growth and biology.

Of course, in an algorithms course, parsing CFLs is a nice example of dynamic programming. And understanding DFAs and NFAs is pretty crucial if you want to talk about L and NL, so I often review automata when I cover space-bounded computation.

The other justification people often use is that it’s nice for students to see one example where nondeterminism “makes no difference” (except for the exponential blowup in the state space) and another example where it provably makes a difference (DPDAs vs. NPDAs, although usually takes too long to prove). However, the notion of a nondeterministic machine conflicts with the witness-checking definition of NP, which I find much better for teaching than the nondeterministic Turing machine, so this sometimes confuses students a little.

Maybe the best solution is for some of this material to stay in the curriculum, but to try to move it into the pre-TOC discrete math course, where they can use it to think about closure properties, equivalence classes, the pigeonhole principle, etc.

The finite automata material is not only good theory in itself, it is integral for digital design and it gives all the right intuitions about nondeterminism. We used to have a curriculum that taught the design aspects and the language recognition aspects of finite automata separately in two courses. We decided to combine those, most of the constructions of Kleene’s Theorem and Myhill-Nerode style non-regularity proofs in our first foundations course (which begins with logic of all sorts, as well as basic number theory and induction) along with CFGs (done as examples of useful kinds of inductive definitions) and basic undecidability. This works very well.

We ended up removing the material on PDAs and parsing from the curriculum that all our students have to take. Parsing is mostly taught at the beginning of the compilers course. In our senior level theory course, I quickly discuss PDAs and their connections to CFGs but it is more as an incidental topic that is good to know about than as a focus of the material. (I don’t cover DCFLs or CSLs explicitly, though I do discuss LBAs, Chomsky normal forms and the CKY algorithm as in Sipser’s text.)

BTW: It is interesting to note that the first preliminary edition of Sipser’s text began with the current Chapter 3 and did not include any of the automata material… Only in the latest edition does it include DCFLs in any meaningful way.

I am not teaching this course, but if I were to teach it, I’d teach it from the perspective of computer-aided verification.

Context: we have two distinct courses for TOC: the first one covers computability and complexity in the first semester, while the second one covers formal languages in the second semester. The Chomsky hierarchy as a whole is only the topic of a couple of hours of lectures in that second course, dedicated to proving the equivalences with LBAs etc., but we spend quite some time on regular languages and context-free languages.

I would say that the algorithmic aspects of automata and grammars should be the main focus in a formal language course: in most applications (logic, parsing, verification, linguistics, bioinformatics and so on and so forth), the point is that we have finite representations with good algorithmic properties for infinite sets. Indeed, by contrast with Turing machines or even LBAs, many decision problems over finite automata / regular expressions and context-free grammars / PDAs are decidable with (somewhat) low complexity—and when they’re not, they provide good examples of e.g. NP-complete and PSPACE-complete problems. Furthermore, we can effectively convert between those representations, and several language closure operations have polynomial-time algorithms. In the end, they are part of the standard toolbox for designing efficient algorithms.

Teaching regular languages and context-free languages on par with recursive languages or other complexity classes like P or NP is just missing the point in my opinion.

There was a similar question on cstheory: http://cstheory.stackexchange.com/q/8539/236

I would say yes for a formal languages / parsing course, no for a computability and complexity course.

So, as something of an outsider:

The automata material itself seems perfectly fine. But I’m not convinced of the value of the Chomsky hierarchy

asa hierarchy, as opposed to as just a collection of a few particular useful classes of languages (which happens to be totally ordered under inclusion). Like, sure, these are useful classes of languages, but is there anything to particularly single out these four as naturally belonging together in a hierarchy? It’s not clear to me that there is. It might make more sense to teach theelementsof the Chomsky hierarchy, and the relations that do exist between them, without presenting them as “the Chomsky hierarchy”, prominent over other classes of languages. Just, here are some useful, commonly occurring classes, and here are the relations between them.The Chomsky hierarchy itself is outdated, especially as a hierarchy of languages. The formalism of context-free grammars on the other hand is here to stay. Whether you want to teach it in your “Introduction to the Theory of Computation” class is more a question of organization, and whether it is available in your favorite text book that you plan to use as “suggested reading” to accompany your class.

The suggestion that regular languages and finite automata are closely related to the Chomsky hierarchy only shows that the Chomsky hierarchy is not an appropriate hierarchy of grammars either. (Especially the type-0 – type-3 stuff is bogus.) The regular grammars are forced and not natural, the linear grammars would have been a more natural class of grammars below the context-free grammars. And then it would have become obvious that the deterministic context-free grammars don’t properly fit into that linear hierarchy, in addition to their other drawbacks. However, notions like unambiguous grammar or deterministic grammar can be useful for practical applications.

Moshe Vardi’s comment “I’d teach it from the perspective of computer-aided verification.” suggests that it would be a better idea to focus on

visibly pushdown grammars instead of deterministic grammars, because their close relationship to regular expressions allows concise specifications of important decidable properties, which can then be checked by static code analyzers. In addition, they also allow to address Cris Moore’s comment “However, the notion of a nondeterministic machine conflicts with the witness-checking definition of NP, which I find much better for teaching than the nondeterministic Turing machine, so this sometimes confuses students a little.”, because they have exactly the right relation to context-free grammars to be interpreted as providing the required witness.

A serious drawback is that they are not yet treated in my favorite text books, that their relations to synchronization, prefix codes, and recognizable variable length codes (recognized by unambiguous finite state transducers where the only accepting state is the initial state) is not yet conclusively described, and that they would take even more precious time away from the big questions of P vs. NP and so on.

Interesting to see someone agreeing with me!

I notice the other commenters here seem to have, as best I can tell, just addressed the classes

inthe hierarchy rather than the hierarchy as a whole.If you read between the lines, they all agree with you. They don’t mention the hierarchy, because they only want to save the automata material itself without the rest of the hierarchy. My goal was to argue for the (context-free) grammar material, hence I mentioned the hierarchy and tried to argue against its “pseudo” connection to regular languages and finite automata. And I mentioned a current research topic with connections to both context free grammars and regular languages, to defend them against the criticism of being old and disconnected from current research.

I’m a compiler/interpreter developer who learned these results as they were developed. I always found the hierarchy relevant but seldom applied the automata theory directly (just fought with parser-generators over it for many years). Now I’m working on hardware definition languages and find that almost all of the Register Transfer Level code that comes through describes networks of finite state automata. Optimizing the compilations of these definitions, and developing other derivative properties (e.g. device drivers given the FSA description of the the device) feels unsound unless it can be expressed (by someone, at least sketchily) in automata theory terms.

In my experience what’s discarded in one era for being too {uneconomic,obtuse,…} returns later after the economics shifts when its alternative is now too {simplistic,intractable at scale}. “Teach-by” date is a cute turn of phrase, but that’s not how the evolution of ideas functions. Age doesn’t matter, impact does.

Instead of thinking in terms of “Chomsky Hierarchy” vs “some other cool topic”, I think the question is how much “Theory” — some combination of Math and TCS is a *must* in an undergraduate CS program.

I think there are roughly 4 major areas one should cover:

1. Descriptive formalisms

2. Computability and Complexity

3. Algorithms

4. Statistics and Machine Learning

I am not claiming the areas to be distinct: for example basic probability is certainly a core prerequisite to 4. and 3., and, depending on personal taste, to 2. and 1. The question is further compounded by the fact that some of the topics may be treated in “applied” courses: for example parsing may be part of a PL course, regular expressions may be a feature of programming, etc.

This is a tremendous amount of material, and in many curricula it ends up compressed into too few courses.

It is very much the case that basic facts about finite automata and regular languages are part of basic CS literacy, as is undecidability. The case for context-free languages and pdas is weaker–certainly the idea that simple recursion can be implemented by a stack must be known by all CS folks, but it is unclear that the particular formalism of pdas is a must. As for context-sensitive languages, I am not sure that they are particularly relevant–classes like NC seem more natural. Finally, it is not clear that type 0 grammars are a natural way to denote c.e. languages. So the Chomsky Hierarchy deals with four language classes, of which only one is clearly a central topic for CS. On the other hand, formal descriptions other than formal languages have become quite important, and may well deserve a place in a Formal Methods course substituting the Formal Languages course.

Another natural change would be to concentrate on finite automata and extend the course to deal with finite Markov chains.

My main objection the the standard Formal Languages course, especially when it is the first Theory course, is that most of the math in it is very boring. Students leave the course thinking that Theory consists in long proofs by induction, where the thing to be proven is almost obvious, the hypothesis is often somewhat imprecisely stated, and the proof itself is just tedious. In contrast, results like Sperner’s Lemma, or the Birthday Paradox, or the Immerman-Szelepscény Theorem, or undecidabilty, show that mathematical tools can be powerful and worth learning.

While attending TOC in the university I was working in a company as a GIS Administrator. The theory of finite automata that I was reading about at that time, helped me design a spatial algorithm that turned out to be essential for the GIS project that we were working on. A procedure that required several hours to be completed only took 5 minutes with my algorithm.

Most, if not all, the concepts of TOC are really valuable in practice. The problem from my point of view is not whether to teach or not some theory to a class, but to teach it on a way that will create for the students of the class a link from theory to reality. This is not something easy and not something that all the tutors may achieve. I really believe that understanding automata and languages is very useful in practice not just in theory; it is not just some mathematics.

In the few computer science curriculums that I’ve been exposed to first-hand as a student, there is a total disconnect between teaching classical theorems about regular expressions and building regular expressions for information extraction. Also, I think there is a total disconnect between teaching classical theorems about context-free grammars and building a parser by specifying grammar rules using a somewhat modern parser generator such as ANTLR or JavaCC.

Having recently had experience applying these concepts to the latter practical situations, I feel that both the theory and practice fit together. My suggestion is that with theory alone, the CH hierarchy doesn’t seem all that relevant to CS students learning this for the first time.

Also, it might be good to ask some of the students what they think. I know that when I’ve asked students from various universities, several of them didn’t see much connection between context-free grammars from TOC class and parsers from programming languages class.

Further, there has been a lot of work on languages in the past few decades. In the literature, I’ve seen maybe five to ten variations on restricted single or multi-counter or pushdown automata that each characterize an interesting class of languages that in someway might expand the hierarchy between regular and context-free or context-free and context-sensitive.

From the researcher side of things, I suggest that maybe an official expanded hierarchy should be established.

Feel free to offer any corrections or suggestions in regards to an expanded hierarchy.