Christofides bound beaten by an epsilon’s idea of epsilon

 src1, src2, src3

Anna Karlin, Nathan Klein, and Shayan Oveis Gharan have made a big splash with the number

$~~~~~~\frac{1}{1,000,000,000,000,000,000,000,000,000,000,000,000}.$

No that is not the amount of the US debt, or the new relief bill. It is the fraction by which the hallowed 44-year-old upper bound of ${1.5}$ on the approximation ratio of the metric Traveling Salesperson Problem has been improved. With the help of randomization, we hasten to add.

Today we discuss the larger meaning of their tiny breakthrough.

The abstract of their paper is as pithy as can be:

For some ${\epsilon > 10^{-36}}$ we give a ${3/2 - \epsilon}$ approximation algorithm for metric TSP.

Metric TSP means that the cost of the tour ${(v_1,v_2,\dots,v_n,v_1)}$ is the sum of the distances of the edges

$\displaystyle \mu(v_1,v_2) + \mu(v_2,v_3) + \cdots + \mu(v_{n-1},v_n) + \mu(v_n,v_1)$

according to a given metric ${\mu}$. When the points are in ${\mathbb{R}^m}$ with the Euclidean metric, an ${n^{O(1)}}$-time algorithm can come within a factor ${(1+\delta)}$ of the optimal cost for any prescribed ${\delta > 0}$. Sanjeev Arora and Joseph Mitchell jointly won the 2002 Gödel Prize for their randomized algorithms doing exactly that. The rub is the constant in the “${O(1)}$” depends on ${\delta}$—indeed, nobody knows how to make it scale less than linearly in ${\frac{1}{\delta}}$. But for general metrics, getting within a factor of ${(1+\delta)}$ is known to be ${\mathsf{NP}}$-hard for ${\delta}$ up to ${\frac{1}{122}}$.

Some intermediate cases of metrics had allowed getting within a factor of ${1.4}$, but for general metrics the ${1.5}$ factor found in 1976 by the late Nicos Christofides, and concurrently by Anatoliy Serdyukov, stood like a brick wall. Well, we didn’t expect it to be a brick wall at first. Let me tell a story.

## A Proof in a Pub

Soon after starting as a graduate student at Oxford in 1981, I went with a bunch of dons and fellow students down to London for a one-day workshop where Christofides was among the speakers and presented his result along with newer work. I’d already heard it spoken of as a combinatorial gem and perfect motivator for a graduate student to appreciate the power of combining simplicity and elegance:

1. Calculate the (or a) minimum spanning tree ${T}$ of the ${n}$ given points.

2. Take ${A}$ to be the leaves and any other odd-degree nodes of ${T}$ and calculate a minimum matching ${M}$ of them.

3. The graph ${T+M}$ has all nodes of even degree so it has an easily-found Eulerian cycle ${C_E}$.

4. The cycle ${C_E}$ may repeat vertices, but by the triangle inequality for the distance metric ${\mu}$, we can bypass repeats to create a Hamilton cycle ${C_H}$ giving ${\mu(C_H) \leq \mu(C_E)}$.

Now any optimal TSP tour ${C_O}$ arises as a spanning tree plus an edge, so ${\mu(T) < \mu(C_O)}$. And ${C_O}$ can be partitioned into two sets of paths with endpoints in ${A}$. One of those sets has weight at most ${\frac{1}{2}\mu(C_O)}$ and yet matches all pairs of ${A}$. Thus ${\mu(M) \leq \frac{1}{2}\mu(C_O)}$. It follows that ${\mu(C_H) \leq \mu(T) + \mu(M) < \mu(C_0) + \frac{1}{2}\mu(C_0)}$ and we’re done.

My memory of what we did after the workshop is hazy but I’m quite sure we must have gone to a pub for dinner and drinks before taking the train back up to Oxford. My point is, the above proof is the kind that can be told and discussed in a pub. It combines several greatest hits of the field: minimum spanning tree, perfect matching, Euler tour, Hamiltonian cycle, triangle inequality. The proof needs no extensive calculation; maybe a napkin to draw ${A}$ on ${C_0}$ and the partition helps.

The conversation would surely have gone to the question,

Can the ${1.5\;}$ factor be beaten?

A perfect topic for mathematical pub conversation. Let’s continue as if that’s what happened next—I wish I could recall it.

## Trees That Snake Around

Note that the proof already “beats” it in the sense of there being a strict inequality, and it really shows

$\displaystyle \mu(C_H) \leq (1.5 - \frac{1}{n})\mu(C_0).$

The advantage ${\frac{1}{n}}$ shrinks to zero as ${n}$ grows, however. Moreover, examples where Christofides’s algorithm does no better than approach ${1.5}$ are easy to draw. Pub walls are often covered with emblems of local organizations, and if one has a caduceus symbol it can serve as the drawing:

The staff is a path ${T}$ of ${n}$ nodes while the snakes alternate edges of weight ${1 + \gamma}$ between nodes two apart on the path. Going up one snake and down the other gives an optimal tour of weight ${(1 + \gamma)(n-2) + 2}$ (using the two outermost path edges to switch between the snakes), which ${\sim (1 + \gamma)n}$. The snake edges don’t change the path’s being the minimum spanning tree, and for ${C_H}$ this costs ${n-1}$ plus the weight required to match the path’s endpoints. The extra weight is reckoned as the length of one snake, which ${\sim n\frac{1+\gamma}{2}}$, so the ratio approaches ${\frac{3}{2}}$ as ${\gamma \rightarrow 0}$ and ${n \rightarrow \infty}$. Here are some tantalizing aspects:

• The ${n-2}$ snake edges, plus one path edge to connect them, make a maximum-weight spanning tree ${T'}$ in the graph ${G}$ formed by the two kinds of edges. Yet ${T'}$ followed by the same steps 2–4 of Christofides’s algorithm would yield and optimum tour.

• When one is given only the ${n}$ points plus the graph metric ${\mu}$ induced by ${G}$, not ${G}$ itself, then there are much worse spanning trees. The single edge connecting the endpoints ${(v_1,v_n)}$ of the previous path has weight ${\mu(v_1,v_n) \approx \frac{n}{2}}$.

• Thus ${T'}$ has relatively low weight compared to these possible other trees. And its weight approaches that of ${T}$ as ${\gamma \rightarrow 0}$. This means that small changes in the size of the tree yield large changes in the quality of the induced tour.

• The advantage of ${T'}$ is that its odd-valence nodes have small distance under ${\mu}$. As a path it snakes around so that its ends are near each other, unlike those of the minimum spanning tree ${T}$. This raises the question of weighting spanning trees according to a slightly different measure ${\mu'}$ that incorporates a term for “odd-closeness.”

In 1981, we would not have known about Arora’s and Mitchell’s results, so we would have felt fully on the frontier by embedding the points in the plane and sketching spanning trees and cycles on a piece of paper. After a couple pints of ale we might have felt sure that a simple proof with such evident slack ought to yield to a more sophisticated attack.

There is one idea that we might have come up with in a pub. The motivation for choosing ${T}$ to be a minimum spanning tree is that many of its edges go into the Euler tour ${C_E}$ and those bound the final ${C_O}$ even if ${C_O}$ shortcuts them. So making the total edge weight of ${T}$ minimum seems to be the best way to help at that stage. We might have wondered, however, whether there is a way to create ${T}$ to have a stronger direct relation to good tours, if not to the optimal tour.

Oveis Gharan did have such an idea jointly with a different group of authors a decade ago, in the best paper of SODA 2010. We cannot seem to get our hands on the optimal tour, nor even a “good” tour if that means a better than ${1.5}$ factor approximation—that is what we are trying to find to begin with. But there is another “tour” ${O^*}$ that we can compute. This is an optimum of the linear programming relaxation of TSP, whose relation to the exact-TSP methods of Michael Held and Dick Karp we covered long back. ${O^*}$ is not a single tour but rather an ensemble of “fractional tours” where each edge ${e}$ has a rational number ${z_e}$ representing its contribution to the LP solution. The higher ${z_e}$, the more helpful the edge.

The objective then becomes to design distributions ${{\cal T}}$ of spanning trees ${T}$ so that:

1. Sampling ${T \leftarrow {\cal T}}$ is polynomial-time efficient.

2. For every edge ${e}$, ${\Pr_{T \leftarrow {\cal T}}[e \in T] \propto (1 + \delta_n) z_e}$ where ${\delta_n}$ is tiny.

3. The distribution ${{\cal T}}$ promotes trees ${T}$ with fewer leaves and odd-valence interior nodes.

The algorithmic strategy this fits into is to sample ${T}$ from ${{\cal T}}$, plug ${T}$ into the first step of the Christofides algorithm, and continue as before.

## The Proof and the Pudding

The first two conditions are solidly defined. Considerable technical details in the SODA 2010 paper and another paper at FOCS 2011 that was joint with Amin Saberi and Mohit Singh are devoted to them. A third desideratum is that the distribution ${{\cal T}}$ not be over-constrained but rather have maximum entropy, so that for efficiently computable numbers ${\lambda_e}$ approaching ${z_e}$ one has also:

$\displaystyle \Pr_{\cal T}(T) \propto \prod_{e \in T}\lambda_e.$

The third condition, however, follows the maxim,

“the proof of the pudding is in the eating.”

As our source makes clear, this does not refer to American-style dessert pudding, but rather savory British pub fare going back to 1605 at least. The point is that we ultimately know a choice of ${{\cal T}}$ is good by proving it gives a better approximation factor than ${\frac{3}{2}}$.

In America, we tend to say the maxim a different way:

“the proof is in the pudding.”

The new paper uses the “pudding” from the 2011 paper but needed to deepen the proof. Here is where we usually say to refer to the paper for the considerable details. But in this case we find that a number of the beautiful concepts laid out in the paper’s introduction, such as real stability and strong Rayleigh distributions, are more accessibly described in the notes for the first half of a course taught last spring by Oveis Gharan with Klein as TA. One nub is that if a set of complex numbers all have positive imaginary part, then any product ${z = z_1 z_2}$ of two of the numbers has real part less than the product of the real parts, and if the latter product is positive, then ${z}$ is not a real number. This rules out assignments drawn from the set from being solutions to certain polynomials as well as setting up odd/even parity properties elsewhere.

## Rigidity of the TSP Universe

I’ll close instead with some remarks while admitting that my own limited time—I have been dealing with more chess cases—prevents them from being fully informed.

The main remark is to marvel that the panoply of polynomial properties and deep analysis buy such a tiny improvement. It is hard to believe that the true space of TSP approximation methods is so rigid. In this I am reminded of Scott Aaronson’s calculations that a collision of two stellar black holes a mere 3,000 miles away would stretch space near you by only a millimeter. There is considerable belief that the approximation factor ought to be improvable at least as far as ${\frac{4}{3}}$.

It strikes me that the maximum-entropy condition, while facilitating the analysis, works against the objective of making the trees more special. It cannot come near the kind of snaky tree ${T_O}$ obtained by deleting any edge from a good tour ${O}$, such that plugging ${T_O}$ into step 1 yields ${O}$ back again. The theory of polynomials and distributions that they develop has a plug-and-play element, so that they can condition the distributions ${{\cal T}}$ toward the third objective using the parity properties. But their framework has inflexibility represented by needing to postulate a real-valued function on the optimum edges whose expectation is of order the square of a parameter ${\eta}$ already given the tiny value ${10^{-12}}$. Of the requirement that ${\eta}$ be a small fraction of their governing epsilon parameter, they say in section 3:

This forces us to take ${\eta}$ very small, which is why we get only a “very slightly” improved approximation algorithm for TSP. Furthermore, since we use OPT edges in our construction, we don’t get a new upper bound on the integrality gap. We leave it as an open problem to find a reduction to the “cactus” case that doesn’t involve using a slack vector for OPT (or a completely different approach).

What may be wanting is a better way of getting the odd-valence tree nodes to be closer, not just fewer in number. To be sure, ideas for “closer” might wind up presupposing a metric topology on the ${n}$ given points, leading to cases that have already been improved by other means.

## Open Problems

Will the tiny but fixed wedge below ${\frac{3}{2}}$ become a lever by which to find better approximations?

There is also the kvetch that the algorithm is randomized, whereas the original by Christofides and Serdyukov is deterministic. Can the new methods be derandomized?

[fixed = to + sign at end of Christofides proof; fixed wording of “nub” at end of pudding section]