And whose theorem is it anyway?

Georg Cantor, Felix Bernstein, and Ernst Schröder are each famous for many things. But together they are famous for stating, trying to prove, and proving a basic theorem about the cardinality of sets. Actually, the first person to prove it was none of them. Cantor had stated it in 1887 in a sentence beginning, “One has the theorem that…,” without fanfare or proof. Richard Dedekind wrote a proof later that year—importantly, one avoiding appeal to the axiom of choice (AC)—but neither published it nor told Cantor, and it was not revealed until 1908. Then in 1895 Cantor deduced it from a statement he couldn’t prove, one that turned out to be equivalent to AC. The next year Schröder wrote a proof, but it was wrong. Schröder found a correct proof in 1897, but Bernstein, then a 19-year-old student in Cantor’s seminar, independently found a proof, and, perhaps most important, Cantor himself communicated it to the International Congress of Mathematicians that year.

Today I want to go over proofs of this theorem that were written in the 1990s, not the 1890s.

Often Cantor or Schröder gets left out when naming it, but never Bernstein, and Dedekind never gets included. Steve Homer and Alan Selman say “Cantor-Bernstein Theorem” in their textbook, so let’s abbreviate it CBT. The theorem states:

Theorem: If there is an injective map from a set ${A}$ to ${B}$ and also an injective map from ${B}$ to ${A}$, then the sets ${A}$ and ${B}$ have the same cardinality.

Recall that a map ${h: A \longrightarrow B}$ is injective provided ${h(x)=h(y)}$ implies that ${x=y}$, surjective if its range is all of ${B}$, and bijective if it is both injective and surjective. We can thank the great nonexistent French mathematician Nicolas Bourbaki for standardizing these terms, noting that sur is French for “on.” The definition of ${A}$ and ${B}$ having the same cardinality is that there exists a map ${h: A \longrightarrow B}$ that is bijective. All of this applies for sets of any cardinality, finite or infinite.

CBT and Graph Theory

The key insight for me is to think about CBT as a theorem about directed bipartite graphs. This insight is due to Gyula König. He was a set theorist, but his son became a famous graph theorist. So perhaps this explains the insight. The ideas which follow are related to proofs in a 1994 paper by John Conway and Peter Doyle, which is also summarized here and relates to this 2000 note by Berthold Schweizer. We mentioned Doyle recently here.

A directed bipartite graph has two disjoint sets of vertices, the left side and the right side. Every edge goes from a left vertex to a right vertex or from a right vertex to a left vertex.

In a directed graph every vertex has an out-degree and an in-degree: the former is the number of edges leaving the vertex and the latter is the number of edges that enter the vertex.

In order to study CBT via graph theory we need to restrict our attention to a special type of directed bipartite graph. Say ${G}$ is an injective bipartite graph provided it is a directed bipartite graph with the following restrictions:

1. The out-degree of all vertices is one;
2. The in-degree of all vertices is at most one.

The claim is:

Theorem 2 In any injective directed bipartite graph, the set of left vertices and the set of right vertices have the same cardinality.

This theorem implies CBT. Let ${f: A \longrightarrow B}$ and ${g: B \longrightarrow A}$ be the injective maps as above, where we assume that ${A}$ and ${B}$ are disjoint. Then define ${G}$ as a directed bipartite graph with ${A}$ as the left vertices and ${B}$ as the right. There is an edge from ${x \in A}$ to ${f(x) \in B}$ and an edge from ${y \in B}$ to ${g(y) \in A}$. The graph is injective: property (1) follows since both ${f}$ and ${g}$ are functions, and property (2) follows since both ${f}$ and ${g}$ are injective.
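As a sanity check, here is a small Python sketch of this construction on finite stand-ins for ${A}$ and ${B}$. All names are illustrative, not from any standard library:

```python
# Sketch of the reduction from CBT to Theorem 2 on small finite
# stand-ins for A and B.  Vertices are tagged 'L' (left) or 'R' (right).

def build_graph(A, B, f, g):
    """Edges x -> f(x) for x in A and y -> g(y) for y in B."""
    edges = {}
    for x in A:
        edges[('L', x)] = ('R', f[x])  # left vertex points to f(x)
    for y in B:
        edges[('R', y)] = ('L', g[y])  # right vertex points to g(y)
    return edges

def is_injective_graph(edges):
    """Out-degree is 1 by construction; in-degree <= 1 holds exactly
    when no two edges share a target."""
    targets = list(edges.values())
    return len(targets) == len(set(targets))

A, B = [0, 1, 2], ['a', 'b', 'c']
f = {0: 'a', 1: 'b', 2: 'c'}            # injective A -> B
g = {'a': 1, 'b': 2, 'c': 0}            # injective B -> A
G = build_graph(A, B, f, g)

bad = build_graph(A, B, f, {'a': 1, 'b': 1, 'c': 0})  # g not injective
```

Property (2) failing for `bad` corresponds exactly to the non-injective choice of ${g}$.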

Basic Properties

From now on, let ${G}$ be an injective directed bipartite graph with left vertices ${A}$ and right vertices ${B}$. Note that every path in ${G}$ must alternate left and right vertices, because ${G}$ is bipartite.

The key concept is that of a maximal path. A maximal path in ${G}$ is a path that cannot be extended on either end. One simple example of a maximal path is a cycle:

$\displaystyle x_{1} \rightarrow \cdots \rightarrow x_{m} \rightarrow x_{1}.$

However, in an infinite graph there can be other maximal paths. One is a two-way infinite path

$\displaystyle \cdots \rightarrow x_{-m} \rightarrow \cdots \rightarrow x_{-1} \rightarrow x_{0} \rightarrow x_{1} \rightarrow \cdots$

And the other is a one-way infinite path

$\displaystyle x_{0} \rightarrow x_{1} \rightarrow \cdots$

Here there is no edge into ${x_{0}}$. A simple but important observation is that two distinct maximal paths have no vertices in common: since every vertex has out-degree one and in-degree at most one, the maximal path through any vertex is determined in both directions.

A final basic observation is the following: Suppose that ${A}$ is partitioned into sets

$\displaystyle A_{1} \cup A_{2} \cup \dots$

and ${B}$ is also partitioned into

$\displaystyle B_{1} \cup B_{2} \cup \dots$

If for each index ${i}$, there is a bijection from ${A_{i}}$ to ${B_{i}}$, then ${A}$ and ${B}$ have the same cardinality. Let’s call this the partition trick.
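The partition trick is very concrete; here is a minimal sketch (illustrative names, with finite dicts standing in for the ${A_i \rightarrow B_i}$ bijections):

```python
# The partition trick: the union of bijections h_i : A_i -> B_i over
# matching partitions of A and B is a single bijection from A to B.

def glue(bijections):
    """Combine piecewise bijections (given as dicts) into one dict.
    Assumes the pieces have pairwise disjoint domains and ranges."""
    h = {}
    for piece in bijections:
        h.update(piece)
    return h

h = glue([{1: 'a', 2: 'b'}, {3: 'c'}])
```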

I am trying to include even the simplest idea of the proof. Is this helpful, or am I being too detailed? You can skip the easy parts, but my experience is that people sometimes get hung up on the most basic steps of a proof. This is why I am including all the details.

Finite Graphs

Let’s prove the theorem for the case when the graph is finite.

Theorem 3 A finite injective directed bipartite graph has the same number of left and right vertices.

We had better be able to do this. We claim the following decomposition fact: The graph ${G}$ is the union of disjoint cycles. This follows since the only maximal paths in ${G}$ are cycles—this uses that ${G}$ is finite.

This proves the theorem, since each cycle has the same number of left vertices as right, and therefore each cycle has a bijection from left to right. By the partition trick the theorem is proved.
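The decomposition fact can be sketched in a few lines of Python (illustrative names). The counting step the proof glosses over: in a finite injective graph the out-degrees sum to the number of vertices, and since each in-degree is at most one, every in-degree is exactly one, so the edge map is a permutation of the vertices:

```python
# The edge map of a finite injective graph is a permutation of the
# vertices, so it decomposes into disjoint cycles.

def cycle_decomposition(edges):
    """Return the cycles of the permutation given by edges."""
    seen, cycles = set(), []
    for v in edges:
        if v in seen:
            continue
        cyc, u = [], v
        while u not in seen:      # walk forward until we return to v
            seen.add(u)
            cyc.append(u)
            u = edges[u]
        cycles.append(cyc)
    return cycles

# a 4-cycle and a 2-cycle, alternating left ('L') and right ('R')
edges = {('L', 0): ('R', 'a'), ('R', 'a'): ('L', 1),
         ('L', 1): ('R', 'b'), ('R', 'b'): ('L', 0),
         ('L', 2): ('R', 'c'), ('R', 'c'): ('L', 2)}

cycles = cycle_decomposition(edges)
# pairing each left vertex with its successor gives the bijection
h = {k[1]: edges[k][1] for k in edges if k[0] == 'L'}
```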

So far, pretty easy.

Infinite Graphs

We now will prove the theorem in the case that ${G}$ is infinite. By the partition trick we need only exhibit a bijection on each maximal path. This is clear for cycles, and the same idea works for a two-way infinite path: since the path alternates left and right vertices, pairing each left vertex with the right vertex that follows it defines the bijection. The case of a one-way infinite path is just barely harder. Let

$\displaystyle x_{0} \rightarrow x_{1} \rightarrow \cdots$

be such a path. If ${x_{0}}$ is a left node, then we define ${h(x_0) = f(x_0) = x_1}$ and ${h(x_1) = f^{-1}(x_1) = x_0}$ to make the bijection go back and forth over the first edge in the path, then ${h(x_2) = f(x_2) = x_3}$ and ${h(x_3) = f^{-1}(x_3) = x_2}$ similarly for the third edge, and so on. If ${x_{0}}$ is a right vertex, we instead set ${h(x_j) = g(x_j)}$ for even ${j}$ and ${h(x_j) = g^{-1}(x_j)}$ for odd ${j}$. Pretty easy too.
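The case analysis collapses into one rule: to decide ${h(a)}$ for ${a \in A}$, trace predecessors backward; if the backward chain dies in ${A}$ (or never dies), set ${h(a) = f(a)}$, and if it dies in ${B}$, set ${h(a) = g^{-1}(a)}$. A Python sketch with illustrative names, where the inverse functions return `None` when no preimage exists; for a genuinely two-way infinite path the backward loop would not halt, so we assume every backward chain here is finite:

```python
def csb_bijection(f, f_inv, g, g_inv):
    """h(a) = f(a) if a's maximal path starts in A, and
    h(a) = g^{-1}(a) if it starts in B (finite backward chains assumed)."""
    def h(a):
        side, v = 'A', a
        while True:
            p = g_inv(v) if side == 'A' else f_inv(v)
            if p is None:                      # chain stops at v
                return f(a) if side == 'A' else g_inv(a)
            side, v = ('B', p) if side == 'A' else ('A', p)
    return h

# Example: A = B = the naturals, f(n) = g(n) = n + 1.  The graph is two
# one-way infinite paths, one starting at left 0 and one at right 0.
f = lambda n: n + 1
g = lambda n: n + 1
f_inv = lambda m: m - 1 if m >= 1 else None
g_inv = lambda n: n - 1 if n >= 1 else None
h = csb_bijection(f, f_inv, g, g_inv)
```

On this example ${h}$ swaps ${2k}$ and ${2k+1}$: elements on the path through left ${0}$ use ${f}$, and elements on the path through right ${0}$ use ${g^{-1}}$.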

Even though the theorem is about sets of any cardinality—as large as you like—the proof employs paths that are either finite or countable in length. This seems a bit strange—no? I have wondered whether this ability to work only with such paths can be exploited in some manner. I do not see how to use this fact. Oh well.

The countable case ${A,B \subset \mathbb{N}}$ matters most in computability and complexity theory. John Myhill proved that if ${f}$ and ${g}$ are computable, then so is some bijection ${h}$. This at first seems strange—how can you recognize you are in an infinite cycle?—but it works in finite stages ${1,2,3,\dots}$ as follows: At any odd stage, let ${x}$ be the least number for which ${h(x)}$ has not been defined. Let ${y_1 = f(x)}$. If ${y_1}$ has not been listed as a value of ${h}$ at a previous stage, put ${h(x) = y_1}$. Else, we have already handled some ${x_1}$ such that ${h(x_1) = y_1}$. Then ${x_1 \neq x}$ since we hadn’t handled ${x}$, and ${f(x_1) \neq f(x)}$ since ${f}$ is injective. So put ${y_2 = f(x_1)}$. If ${y_2}$ has not already been listed as a value of ${h}$, then put ${h(x) = y_2}$. Else, we have already handled ${x_2}$ such that ${h(x_2) = y_2}$, and we repeat with ${y_3 = f(x_2)}$.

Eventually we must exhaust the finitely many numbers previously placed into the range of ${h}$, and the first new number ${y_i}$ becomes ${h(x)}$. Since ${y_i}$ is a value of ${f}$, inductively we are also preserving the property ${x \in A \iff h(x) \in B}$. Next, however, we need to do an even stage. We let ${z}$ be the least number not already placed into the range of ${h}$, and consider ${x_1 = g(z)}$. Then ${x_1 \in A \iff z \in B}$. If ${x_1}$ is not already in the domain of ${h}$, we define ${h(x_1) = z}$. Else, we have already handled some ${z_1}$ such that ${h(x_1) = z_1}$, and inductively ${z_1 \in B \iff x_1 \in A \iff z \in B}$, so we repeat with ${x_2 = g(z_1)}$. Eventually we obtain ${x_i}$ not already in the domain of ${h}$, and define ${h(x_i) = z}$. This entire process computably defines both ${h}$ and ${h^{-1}}$ in “zipper” fashion for any ${x}$ or ${z}$, yielding a computable bijection of ${\mathbb{N}}$ that induces an isomorphism between ${A}$ and ${B}$.
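The odd/even stages can be sketched in Python (illustrative names; a full Myhill isomorphism would also track membership in ${A}$ and ${B}$, which this skeleton omits):

```python
def myhill_stages(f, g, n_stages):
    """Back-and-forth construction: odd stages define h at the least
    unused argument, even stages put the least unused value into the
    range.  f and g are injective functions on the naturals."""
    h, h_inv = {}, {}
    for _ in range(n_stages):
        # odd stage: least x with h(x) undefined
        x = 0
        while x in h:
            x += 1
        y = f(x)
        while y in h_inv:          # y already a value: push forward
            y = f(h_inv[y])
        h[x], h_inv[y] = y, x
        # even stage: least z not yet in the range of h
        z = 0
        while z in h_inv:
            z += 1
        w = g(z)
        while w in h:              # w already an argument: push forward
            w = g(h[w])
        h[w], h_inv[z] = z, w
    return h

h = myhill_stages(lambda n: 2 * n, lambda n: 2 * n + 1, 3)
```

Each stage terminates because the injective ${f}$ (or ${g}$) can repeat only over the finitely many pairs already placed into ${h}$.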

In complexity theory, we want ${h}$ to be polynomial-time computable if ${f}$ and ${g}$ are. This is true provided ${f}$ and ${g}$ are length-increasing as well as polynomial-time invertible, but not by the same algorithm: given an ${x}$, we can’t go back and carry out all previous stages in polynomial time. Instead, the algorithm found by Leonard Berman and Juris Hartmanis alternates trying

$\displaystyle g^{-1}(x),f^{-1}(g^{-1}(x)),g^{-1}(f^{-1}(g^{-1}(x))),\dots$

as far as possible, which must stop within length-of-${x}$ steps because the length is always decreasing. If ${g^{-1}}$ fails first then ${h(x) = f(x)}$, else ${h(x) = g^{-1}(x)}$.
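A Python sketch of this zig-zag, with illustrative names: here we take strings over ${\{a,b\}}$ with ${f}$ appending ${a}$ and ${g}$ appending ${b}$, which are length-increasing and trivially invertible, and the inverses return `None` on failure:

```python
def bh(x, f, f_inv, g_inv):
    """Berman-Hartmanis zig-zag: alternate g_inv, f_inv, g_inv, ...
    from x.  Each step shortens the string, so this halts within |x|
    steps.  If g_inv fails first, h(x) = f(x); if f_inv fails first,
    h(x) = g_inv(x)."""
    v, trying_g = x, True
    while True:
        w = g_inv(v) if trying_g else f_inv(v)
        if w is None:
            return f(x) if trying_g else g_inv(x)
        v, trying_g = w, not trying_g

f = lambda s: s + 'a'                                   # injective, length-increasing
f_inv = lambda s: s[:-1] if s.endswith('a') else None
g_inv = lambda s: s[:-1] if s.endswith('b') else None   # inverse of s -> s + 'b'
```

Note the whole computation for a given ${x}$ is a single backward walk, which is what keeps it polynomial time.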

This is the second puzzle: why do the ideas have to be so different? Is there a common formulation that might be used with other levels and kinds of complexity? The Berman-Hartmanis proof does resemble the one-way-infinite path case more than it does Myhill’s proof.

Open Problems

Does this help in understanding the proof? There are many proofs of CBT on the web; perhaps this is a better version. Take a look.

[fixed “domain”->”range” in one place in Myhill proof; worked around WordPress bug with length-of-x for |x|.]

1. July 31, 2014 10:43 pm

The Berman-Hartmanis conjecture regarding sparse sets and NP-isomorphism seems like one of the most important conjectures in complexity theory to me, nearly ranking with P=?NP. Yet there seems to be very little attention to it and few new research directions, and Wikipedia and others say it is thought to be false, with some circumstantial evidence. I hope you can comment on it further some time… For example, there seems to be very little “machinery” in complexity theory devised to analyze the property of sparse vs. nonsparse sets.

2. August 1, 2014 12:28 am

An interesting variant on the Schröder-Bernstein Theorem is Lindenbaum’s Theorem, which says that for totally ordered sets A and B, if A is isomorphic to an initial segment of B, and B is isomorphic to a final segment of A, then A and B are isomorphic. If I understand the proof correctly, one consequence of this proof is that the same bijection produced by the (usual proof of the) Schröder-Bernstein theorem is the isomorphism! However, actually proving it that way directly seems annoying, and I haven’t seen it done. But the proof is still quite simple; it can be found in Sierpinski’s book “Cardinal and Ordinal Numbers”.

August 1, 2014 11:49 am

Sniffnoy

Interesting fact. There is also a proof of CBT based on fixed-point theory, perhaps related to your comment.

Thanks again

August 1, 2014 8:31 am

Dear Prof. Lipton, excuse me for asking a question about an old post in the comments. I recall you have a post about the relation between ${NP=P}$ and algebraic numbers, which also cites Chow’s theorem about algebraic curves. Could you tell me which one? I have browsed and searched several times, but have not found it.

August 2, 2014 1:50 pm

Wang Xiuli

Let me check that for you

August 2, 2014 12:36 pm

A recent interesting exchange on a claimed proof of this theorem: http://math.stackexchange.com/questions/858978/lamport-claims-there-is-an-error-in-kelleys-proof-of-the-schroeder-bernstein-th

This concerns an elegant yet incorrect proof in Kelley’s “General Topology”. Lamport used this as an example of how more structure in proofs can help to reveal errors.

August 2, 2014 8:19 pm

Here’s how I think of this theorem. The function f is mostly fine; it just needs some changes so that it gets all of the set B. To see what changes are needed, think of f and g as pieces of string that connect A and B. Find something in B that is not pointed to by f, and pick it up. This is the beginning of an infinite string that alternates elements of B, then A, then B, then A, etc. This string is like the Hilbert Hotel. Every room (in A) has a string leading down to the person in it (in B). But now there is one more person, namely the B at the beginning of the string. So people have to move from the room above to the room below. Now the first room is free for the person that we picked up. Of course, you can’t do this step by step, like an algorithm. And I suppose you have to argue that these changes don’t cause any bad effects. But the picture helps me see that the basic idea is pretty simple.

August 3, 2014 8:15 pm

HalPrince

I believe this is a fine way to look at the proof. This is closer to Cantor’s original proof, where he used an inductive argument of sorts. That argument does use the axiom of choice, and so gives a weaker result. But good intuition.

August 2, 2014 10:35 pm

I like your proof a lot since the only cardinality used in the proof is the countability of the maximal paths. You never mention the cardinality of the other sets involved. Your proof does not seem to use the axiom of choice. Does it?