The role of definitions in mathematics and theory

Alfonso Bedoya was not a theoretician, but an actor. He is perhaps most famous for uttering the lines:

He played Gold Hat, in the The Treasure of the Sierra Madre.

Today I want to talk about definitions.

We all think mostly about theorems and proofs. We try to understand proofs of others, we try to prove new theorems, we try to advance the field. Yet definitions are not the main stars. I cannot imagine a paper getting into the main conferences that had only a new definition, but perhaps that should happen. Our attitude is a little like this: To paraphrase Bedoya:

Definitions, definitions, we don’t need no definitions.

Yet definitions are critical, but strange. Unlike a proof, a definition cannot be “wrong.” Yet a definition can be “silly.” A great definition can be more important than a new proof, and that is why I want to talk about them.

Definitions

A major part of mathematics, of all kinds, is played by definitions. Every area of math has many definitions that must be learned if one is to be able to understand that area. Here are some key points, in my opinion, about definitions in general:

${\bullet }$Definitions can be eliminated. A definition can always be eliminated from a proof. Essentially, a definition is a shorthand; a definition allows us to make arguments clearer and shorter. For example, the statement that there are an infinite number of primes

$\displaystyle \forall x \exists p>x, \; p \text{ is a prime }$

could be written without using the definition “prime.” But, the statement would be longer and less readable:

$\displaystyle \forall x\exists p>x, \; p>1 \wedge (\forall y \forall z \ p=y \cdot z \rightarrow y=1 \vee z=1).$

This is a important role that definitions play: they help reduce the size of proofs. There is some nice work in proof theory on the role of definitions in reducing proof size. I will discuss this role in the future.

${\bullet }$Definitions can be “silly”. Any definition is allowed; however, some are much more useful and productive than others. Consider the definition: ${p}$ is a special prime if and only if ${p}$, ${p+2}$, and ${p+4}$ are all prime. Is this a good definition? No. Because ${p}$ is special if and only if ${p=3}$. We will see other, less trivial, examples of “silly” definitions in a moment.

${\bullet }$Definitions can be “critical”. Definitions shape a field; they affect the way we think, and what theorems we even try to prove. A good definition can suggest new research, can open up new directions, and can have a profound effect on progress.

${\bullet }$Definitions need justification. While any definition is “valid,” some are more useful than others. Often we read something into a definition that is not justified. So a good question to ask about any definition is does it actually capture the property that we are trying to “define?”

I will continue the discussion of the last point by giving a series of examples.

Mathematical Definitions

${\bullet }$ What is a function? This is one of the hardest definitions to get “right.” Early mathematicians argued over what was the right definition: some insisted that any “rule” was fine, others that a function had to have an analytic form, others that it had to be given by a series of some kind.

${\bullet }$ What is a set? Georg Cantor gave a beautiful definition:

A set is a Many that allows itself to be thought of as a One.

The trouble with his definition is that it is not precise enough. Unfortunately, there are many paradoxes possible with a naive definition of set. The most famous perhaps is due to Bertrand Russell: consider the set

$\displaystyle S = \{ x \mid x \not\in x \}.$

Is ${S}$ a member of ${S}$? Either answer yields a contradiction.

${\bullet }$ What is a “nice” function? Define a continuous function ${f}$ on the interval ${[0,1]}$ to be ${K}$-Lipschitz provided,

$\displaystyle |{ f(x)-f(y) }| \le K \cdot |{x-y}|.$

Similarly, continuous functions which satisfy the below property are said to be Lipschitz of order ${\alpha \le 1}$, or Hölder continuous functions,

$\displaystyle |{ f(x)-f(y) }| \le K \cdot |{x-y}|^{\alpha}$.

These conditions are named after Rudolf Lipschitz and Otto Hölder.

There is a story: a student is giving his thesis defense at Princeton University. He has studied the class of Lipschitz functions where ${\alpha > 1}$—why not study these too—he reasons. His thesis has many neat results, and he has proved that these functions are closed under many interesting operations. The story continues that a visiting professor raises his hand, during the talk, and asks:

Can you give me an example of any function in this class that is not constant?

The sad answer is no. End of talk, end of thesis. Hopefully, only an urban legend.

${\bullet }$ What is a curve? An obvious definition of a curve is a mapping from ${[0,1]}$ to ${[0,1]^{2}}$ define by,

$\displaystyle (x(t),y(t))$

where ${x(t)}$ and ${y(t)}$ are continuous functions. This seems to be well defined and reasonable. The problem is that such “curves” have a non-intuitive property: a curve can fill the entire square. Giuseppe Peano discovered that a space-filling curve was possible.

This discovery showed that the definition was not “correct.” In order to avoid space filling curves one needs to add more to the definition. For example, if the maps ${x(t)}$ and ${y(t)}$ are smooth, then space filling curves are impossible.

Computational Definitions

Let’s consider two definitions from computer science: sorting and security.

${\bullet }$ What is a sorting algorithm? Suppose that someone sells you a sorting algorithm: the algorithms takes groups of records and sorts them according to any key that you like. What does it mean to be a correct sorting algorithm? I once asked someone this question—their answer was:

1. The algorithm’s output must be a permutation of the input records. The algorithm cannot change the records in any manner at all.
2. The algorithm’s output must have the records in the correct order based on the given key. If the keys of the output records are

$\displaystyle k_{1}, k_{2}, \dots, k_{n}$

then,

$\displaystyle k_{1} \preceq k_{2} \preceq \dots \preceq k_{n}.$

Is this the right definition of a sorting algorithm? Would you buy this algorithm from me? Or is this definition “wrong”?

The definition is not precise enough; if you used this algorithm to do many tasks that arise, there would be a problem. The issue is that the definition does not insist that the sorting algorithm be stable. A sorting algorithm is stable if it retains the order of the records that have the same key.

For example, quicksort is not stable, but merge is stable.

Consider three records

$\displaystyle \begin{array}{rcl} <& Alice, Smith, 67890 &> \\ <& Bob, Jones, 12345 &> \\ <& Alice, Jones, 12900 &> \end{array}$

Suppose we sort the records on the second key: the last names.

$\displaystyle \begin{array}{rcl} <& Bob, Jones, 12345 &> \\ <& Alice, Jones, 12900 &> \\ <& Alice, Smith, 67890 &> \end{array}$

Then, we sort on first names. This could be the outcome:

$\displaystyle \begin{array}{rcl} <& Alice, Smith, 67890 &> \\ <& Alice, Jones, 12900 &> \\ <& Bob, Jones, 12345 &> \end{array}$

This is not what we wanted: the records are not sorted according to last names and then first names. A stable sort would have yielded the correct order:

$\displaystyle \begin{array}{rcl} <& Alice, Jones, 12900 &> \\ <& Alice, Smith, 67890 &> \\ <& Bob, Jones, 12345 &> \end{array}$

${\bullet }$ What is a secure protocol? Modern cryptography has stated many definitions that attempted to capture some notion of security. Then, later on the notion was discovered to be deficient: usually the definition sounded right, but lacked the exact property that was wanted. So a new definition was created and the process continues ${\dots}$

Justification of Definitions

If definitions can be “wrong,” how do we know that they are “right?” The best methods that I know are based on two kinds of evidence.

${\bullet }$ Equivalence Approach: One method of getting some confidence that a definition is right is to show that it has many different equivalent definitions. This is done throughout mathematics. For example, there are many very different—but equivalent—definitions of a continuous function. Some use ${\epsilon}$ and ${\delta}$‘s, others use topology, and others use ${\dots}$ In computing there are many equivalent definitions of the complexity class NP, for example.

The fact that very different definitions yield the same concept is good evidence that the definition is the right one.

${\bullet }$ Consequence Approach: Another method for increasing confidence that a definition is correct is to look “consequences.” For example, the fact that the original definition of a curve could fill a square was not expected. One could argue that the “right” definition of a curve would not have this property.

Another example is the definition of continuous functions. The fact that they are closed under certain basic operations gives one a good feeling that the definition is right. If our definition of some property is not closed under reasonable operations—operations we feel it should be closed under—then the definition probably is not capturing our intuition.

Open Problems

Any definition is valid. Some definitions are interesting, some are silly, some definitions are useful. But, getting the right definition is not easy. Often it is quite hard to get the definition that captures the property that you are after.

The open question you might think about is: are the definitions you work with the right ones? How do you know that they make sense? I would argue that more thought should go into definition justification. Often—especially in computing—we are given a definition without any supporting arguments.

1. January 23, 2010 12:45 pm

I think about this question a lot. Certain foundational definitions – e.g. the definition of a group or vector space – have proven themselves repeatedly in many fields of mathematics over time, but I am very skeptical of less foundational definitions until I have thoroughly convinced myself that they are natural and/or important steps to take, e.g. the definition of the Riemann zeta function.

That Princeton story is awful. I’m pretty sure that fact is an exercise in Rudin; the student should’ve paid more attention to his foundations!

2. January 23, 2010 12:58 pm

Consider Presburger Arithmetic. Unless you have a language rich enough in predicate terms to define “divisible by n” for all n you can’t prove quantifier elimination and subsequently can’t prove decidability. See Marker’s Model Theory, ch 3. But wouldn’t this be a counterexample to your claim that definitions are eliminable? In general, there seems to be a non-trivial connection between defining predicates using open formulae and proving quantifier elimination for algebraic theories.

3. January 23, 2010 2:23 pm

To us engineers, the set {theorems, definitions, names} forms an ecosystem. Powerful theorems are apex predators (kinda like orcas) there aren’t many of them, and they rule the ecosystem. Definitions serve the role of mid-level species (like seals and salmon). Theorems depend on definitions—exactly as Dick’s post says—and thus strong theorems serve to limit (*not* increase!) the number of definitions. Moreover, strong theorems improve the vigor of surviving definitions … kinda like healthy orcas ensure that surviving seals and salmon are fast, large, and healthy! As for names, they are like plankton; they can be created in effectively unlimited numbers with trivial effort, but who cares?

The point of this ecological metaphor (which admittedly is somewhat strained) is this: when there is no apex predator, every species in a food chain suffers. For example, without a proof of (say) P ne NP to impose mathematical discipline, there is little to prevent complexity classes (and names of complexity classes) from proliferating in a “red tide” that fills the mathematical ocean. And perhaps this is happening?

In simulationist engineering we are confronted with an even *worse* situation: no one understands *at all* why the efficient simulation of physical systems is generically feasible. Absent strong theorems, definitions and names of simulation state-spaces are nowadays proliferating almost without limit.

There is a nice article by Charles Van Loan on this subject, titled The ubiquitous Kronecker product. But even Van Loan’s article does not mention more than a tiny fraction of the names and definitions by which algebraic state-spaces—which are ubiquitous in science and engineering for no known fundamental mathematical reason—are called. This proliferation makes it mighty tough to read the literature.

Thus the strong theorems that mathematicians create are of benefit to engineers and scientists in a non-obvious way: they improve the vigor and reduce the number of relevant mathematical definitions and names.

That is why what we engineering fisher-folk need and want ASAP (definitionally speaking) is not millions of mathematical sardines, but thousands of big healthy mathematical salmon!

Moreover, surely it is *good* news that creating more strong theorems is the *sole* path leading to fewer-yet-stronger definitions (and vice versa). That is why everyone needs to keep busy!🙂

4. January 23, 2010 2:30 pm

Thank you for a post on an essential topic that receives almost no attention. Perhaps in a future post you can address use of axioms vs. use of defnitions.

This is something Gödel gave thought to. In Gödel’s Collected Works (III, pp. 432-433), this mysterious sentence occurs: “It [Gödel’s ontological proof] can be grounded only on *axioms* and *not* on a definition ( = construction ) of ‘positive’, for a construction is compatible with an arbitrary relationship.”

I’d love any insight you could give as to what Gödel meant here. My guess is that it might have something to do with your “Equivalence Approach” — that if you’re introducing something that you insist has a specific ontological significance (i.e., something that’s “out there”, and that is not “just” a definition) then a definition is not appropriate. But I’m not at all confident of this guess. Any thoughts?

January 23, 2010 11:51 pm

Cool. I hope I was okay teacher.

5. January 23, 2010 3:56 pm

A graph is called cordial if it is possible to label its vertices with 0s and 1s so that when the edges are labeled with the difference of the labels at their endpoints, the number of vertices (edges) labeled with ones and zeros differ at most by one.

I have given the above definition (of cordial graphs) in 1987. As today the number of papers on cordial graphs is over 100.

January 23, 2010 11:54 pm

Very neat example. Is it related to graceful labeling?

• January 24, 2010 7:03 am

Yes it is related both to graceful and harmonious labeling.

Cordial graphs: A weaker version of graceful and harmonious graphs, Ars Combinatoria, 23 (1987), 201-207.

January 23, 2010 8:38 pm

If you want a stable sorting algorithm, then give a definition of a stable sorting algorithm, not a definition of a sorting algorithm.

January 23, 2010 11:52 pm

The point of the example was that if you had not thought carefully, you would not get the definition correct. If sorting is hard, then what about real software?

January 25, 2010 9:46 am

I think your parent poster’s point was that you don’t always need a stable sorting algorithm. Of course, having a stable sort instead of an arbitrary sort can’t hurt, but if you don’t actually need it, why specify it?

7. January 23, 2010 8:59 pm

I couldn’t help but think of a friend of mine who has a very interesting (if not unique) perspective on algebra, strongly influenced by his reading Maclane and Birkhoff’s text. The gist of the attitude is that the definitions in algebra have been carefully chosen, polished and refined, to the point where they implicitly contain the ensuing theory.

This is historically true for basic structures like groups, whose modern definition followed after much work on symmetry groups in geometry and polynomials (Galois) and vector spaces, whose modern definition was driven not only by Gibbs and Heaviside, but also Hilbert’s work on function spaces. It’s probably a bad idea to put much stock in a definition that isn’t trying to consolidate existing knowledge, since it’s almost certainly the wrong definition.

One approach I’ve heard for checking whether one has the right definition is categorical. The rule of thumb might be stated, “once you’ve made your definition, does it allow you to readily plug into and take advantage of general machinery such as category theory?” My favorite new definition in this category (ba dum chh) is the max-plus algebra (or max-plus semiring) which seems poised to subsume a large chunk of optimization theory through analogy to linear algebra.

Finally, the friend I mentioned works on programming languages, which is certainly an area that seems more focused on developing “the right” or “good” definitions, to the point where theorems are largely just litmus tests of the definitions. Does your type system guarantee progress? Check. Preservation? Check. etc. Mathematical areas that given preference to solving problems over building theories seem to value definitions less, which may be the case for complexity/algorithms.

January 23, 2010 11:50 pm

I am not an expert at category theory, by any measure. But, you are right that category can help with definitions via the notion of functors. This is an interesting class of ways to tell if a definition is right. One that I missed.

8. January 24, 2010 12:48 am

Hé, hé, hé!
Yes, be wary, be very wary, definitions are the Sapir-Whorf scourge of mathematics…

January 24, 2010 4:12 am

Very thought-provoking. Here is a problem often encountered when playing with a definition that is meant to capture something one already has an intuition about. It is hinted at in your post. Suppose I take the definition of curve as provided, and I find the unexpected consequence that there is a space-filling curve. There are (at least) two possible interpretations:

(1) The definition is “wrong” because it did not capture my intuitive idea of a curve.

(2) My natural definition of curve led to an interesting and unexpected consequence. Perhaps curves are deeper than I thought; perhaps I should take the definition seriously and revise my intuitions.

Unless the “unexpected consequence” makes the definition trivial or is an outright contradiction, it can be hard to tell which of (1) or (2) is the right path. Sometimes they look equally logical, in which case I think we tend to choose (2). It would be nice to have some solid heuristics to better decide between (1) and (2).

Imre Lakatos’ “Proofs and Refutations” is a great philosophical/mathematical book that consists of a long discussion between a teacher and several pupils about definitions and how they relate to the theorems being proved with them. Along the way he names various tactics used by definition-makers, such as “monster barring”. A lot of fun.

10. January 24, 2010 2:01 pm

Thanks RJL. My next paper ‘A new definition of…(badges, maybe!)’ with thanks to you and Alfonso Bedoya.

11. January 24, 2010 3:25 pm

“Are the definitions you work with the right ones? How do you know that they make sense?”

Great question!

January 24, 2010 5:49 pm

Thanks, Gil. We do not seem to spent much, any?, time on this key issue.

12. January 24, 2010 11:28 pm

The Princeton story is at least 33 years old, insofar as the above telling matches what I recall hearing an an undergraduate there in 1977. First, it is important to say this was an undergraduate thesis. Although the Princeton senior thesis was-and-is more substantial than at other Ivies (this is one reason I chose Princeton), it is not absolutely required to be original research extensionally, only intensionally. The most particular detail I recall from the story I heard is that the committee felt that the student’s technical development and theorems were interesting and precise enough to pass the thesis with a good grade, despite the lack of extension. This may have been a seminar presentation before the formal thesis submission, too.

To amplify what the first two commenters say, in formal systems of logic one can get “forbidden fruits” by adjoining definitions that the system alone is not able to represent. Moreover, in terms of complexity it often matters whether a function f is defined by a symbol that can be used in building up (numerical) terms, or only via a relation R(x,y) holding iff f(x) = y. There is a famous problem in formal arithmetic called Kreisel’s Conjecture. Specifically for Peano’s Arithmetic (PA), it says that if there is a k such that each instance A(n) of a unary predicate A(.) has a proof of k steps, then PA proves (forall x)A(x). In 2007, Pavel Hrubes^ proved that in various systems extending PA by just a single definition, the analogous conjecture becomes false. The first of these merely adds a function symbol for proper subtraction, plus an axiom for its correctness. The last adds instead a predicate N(x) for “x is a natural number”, which we know from nonstandard models of arithmetic cannot be represented in PA (you’re not allowed to write “(\exists i) x = S^i(0)”).

My own opinion is that crafting “the right” definitions is an important research skill, to be specifically nurtured in seminars or advanced classes. Another is be(com)ing aware of relaxations of definitions you’ve used, in a direction you may not have suspected…

January 25, 2010 9:56 am

My favorite is the definition of Lebesgue integral. The impact of having this “right” integral definition is enormous.

14. January 25, 2010 11:09 am

If the underlying axioms always allow for the definitions to be optional, then in a way the definitions are analogous to using a ‘pre-processor’ on top of a programming language, such as cpp is used with the C language.

Both provide a powerful way to extend the underlying syntax, to encapsulate specific areas of complexity, and to try to bring out the higher meaning (to make it clearer and more obvious). Good, elegant programming language code is such because human beings can read it easily and thus determine whether or not the semantics are correct. Good mathematical proofs have the same qualities. That is, they are simple, readable, easy to understand (except for their own intrinsic complexities, like Cantor’s diagonalization), and can be placed in a textbook somewhere as both a proof, and an example of how to prove things correctly.

In that sense, the right definitions are the ones that simply or clarify the underlying mathematical ideas, while the wrong ones just insert an arbitrary level of complexity over the whole, making it far more difficult to understand. It is easy to over-complicate matters, and extremely difficult to simplify them.

Paul.

January 26, 2010 4:25 am

Great post, and a great issue that should garner at least a few more cycles from the already-overtaxed brains of many mathematicians and computer scientists.

One point I’d like to raise: there may be more than one “right” definition for a concept. I have in mind here the example of the notion of a set being “small.” Both Baire category (meagerness) and measure zero provide good definitions of smallness, both of which yield interesting fruit, and both of which are quite distinct!

16. January 26, 2010 6:07 am

What is a sorting algorithm? It seems this question is about the extensional behavior of sorting algorithms, that is about the function or relation implemented by a sorting algorithm. This is the question I addressed in my paper “What is a Sorting Function?”, J. Logic and Algebraic Programming 5:78, pp. 381-401, May-June 2009 (also available from my home page), with the twist that no sorting criterion (total preorder) is a priori given. In other words, somebody holds up a function (black box!) and asks “Does this thing sort or doesn’t it? And if so, what is the sorting criterion (total preorder) it sorts for?” The motivation for this is that for generic linear-time distributive sorting (“Generic discrimination: Sorting and partitioning unshared data in linear time”, Proc. ICFP 2008) one needs a sorting function as the primitive operation on ordered primitive and abstract types, not binary comparison functions. But how does one know that such a function really “sorts” and doesn’t just permute its input? What turned out to be a key property (discovered in the final stages of this fun expedition) is obliviousness of sorting algorithms: satellite data are carried along (referenced, copied, moved), but not inspected (no control dependencies on their actual values). This can be modeled extensionally using Reynolds’ Parametricity Theory. The answers in the paper then become: A “sorting function” is any intrinsically parametric permutation function; this works for both stable and unstable sorting algorithms, but it captures that even unstable sorting algorithms do not foray into satellite data to mix up the order or equivalent input records. A “stable sorting function” is any intrinsically stable permutation function. “Intrinsic” refers to the total preorder each permutation function f canonically induces: a <= b iff a occurs before b in some output of f. Interestingly, even though f is computable, <= may be undecidable. Insisting on parametricity makes it decidable: indeed a <= b if and only if f [a, b] = [a, b] or f [b, a] = [a, b]. As it turns out, both (deterministic) comparison-based and key-based (distributive) sorting algorithms define not just permutation functions, but permutation functions that are intrinsically parametric, and their canonically induced total preorder is exactly the one they are designed to sort for.

Fritz

17. January 26, 2010 10:48 am

It is interesting that most of the discussion here has focused on the role of definitions in pure math where, indeed, they serve largely to shorten proofs and provide general useful abstractions for proving quickly new theorems. At some point, the abstractions take on a life of their own and new fields are born, such as category theory.

However, as the comments on engineering ecosystems (Sidles) and programming languages (Bernstein) hinted at, definitions also play another important role, as models of some real phenomenon. Arguably, as computer scientists, this “modeling” aspect of definitions is even more important to our work than the “useful abstraction” aspect (of course, these aspects overlap somewhat).

– Turing’s work would have been impossible without the notion we now call a Turing machine and the associated proofs it comes with.

– The notion of “polynomial time” has played a central role in organizing modern complexity theory, even though it does not exactly capture the everyday notion of “efficient”.

– Definitional questions are central to cryptography because quantifying over an unknoown, malicious adversary is so difficult. Many of the best papers in cryptography essentially do survive more because of the definitions they introduced than the specific constructions they proposed, e.g. Dolev, Dwork, Naor (“Nonmalleable cryptography”, STOC 91). Some very influential papers contain *only* definitions and some basic facts about them, such as Canetti’s Universal Composability paper.

Researchers in certain fields (notably cryptography) are very comfortable spending hours and hours, papers and papers, discussing the justification behind various definitions. In more problem-solving oriented fields, this is not always appreciated — “state the open problem, already!”. [To be fair, not all cryptographers enjoy long definitional deliberations (Yuval Ishai: “I dislike problems where the problem statement is longer than the solution.”) ]

Having moved around between a few different fields, I feel that many people in algorithms/complexity don’t sufficiently appreciate the importance of new definitions, and don’t spend enough time thinking carefully through the justification behind a new problem statement. Conversely, I am often frustrated with definitional papers that don’t distill from their discussion a concise question or implication that can guide future work.

I would argue that more thought should go into definition justification. Often—especially in computing—we are given a definition without any supporting arguments.

I would add only, that the “exploitation/exploration” balance is important: we need both basic definitional work and the more problem-solving oriented kind of work that deepens our understanding of established topics. As Luca might say, we need first and second and follow-up papers…

January 27, 2010 8:37 am

great post. reminds me of this paper
http://www.cs.ucdavis.edu/~rogaway/papers/def.pdf

January 27, 2010 9:07 am

great post. reminds me of this paper:
http://www.cs.ucdavis.edu/~rogaway/papers/def.pdf

January 30, 2010 6:24 pm

Thanks a lot for this post, very thought provoking. It seems to go deep into matters of human cognition, though strangely I have not encountered this topic yet. Hope to read more on this issue

January 31, 2010 4:49 pm

I am reading right now Gian-Carlo Rota’s `Indescrete Thought’: in a couple of chapters he explains that in a certain sense theorems and their proofs could be exchanged, that is _the meaning_ of a theorem rests in the usefulness of its proof where the _value_ of a proof rests in which (other, perhaps) theorems it allows us to prove.

The argument is clearly circular, but it makes sense, at least to me: a similar reasoning can be carried on the duality among definitions and theorems: I like to picture the arrow that goes from the former to the latter that gets reversed.

22. July 13, 2011 10:59 am

Just saw this. The Princeton story is the stuff of which Lovecraftian nightmares are made.