Christofides bound beaten by an epsilon’s idea of epsilon


Anna Karlin, Nathan Klein, and Shayan Oveis Gharan have made a big splash with the number

$\displaystyle \frac{1}{1,000,000,000,000,000,000,000,000,000,000,000,000}.$

No, that is not the amount of the US debt, or the new relief bill. It is the fraction by which the hallowed 44-year-old upper bound of ${1.5}$ on the approximation ratio of the metric Traveling Salesperson Problem has been improved. With the help of randomization, we hasten to add.

Today we discuss the larger meaning of their tiny breakthrough.

The abstract of their paper is as pithy as can be:

For some ${\epsilon > 10^{-36}}$ we give a ${3/2 - \epsilon}$ approximation algorithm for metric TSP.

Metric TSP means that the cost of the tour ${(v_1,v_2,\dots,v_n,v_1)}$ is the sum of the distances of the edges

$\displaystyle \mu(v_1,v_2) + \mu(v_2,v_3) + \cdots + \mu(v_{n-1},v_n) + \mu(v_n,v_1)$

according to a given metric ${\mu}$. When the points are in ${\mathbb{R}^m}$ with the Euclidean metric, an ${n^{O(1)}}$-time algorithm can come within a factor ${(1+\delta)}$ of the optimal cost for any prescribed ${\delta > 0}$. Sanjeev Arora and Joseph Mitchell jointly won the 2002 Gödel Prize for their randomized algorithms doing exactly that. The rub is the constant in the “${O(1)}$” depends on ${\delta}$—indeed, nobody knows how to make it scale less than linearly in ${\frac{1}{\delta}}$. But for general metrics, getting within a factor of ${(1+\delta)}$ is known to be ${\mathsf{NP}}$-hard for ${\delta}$ up to ${\frac{1}{122}}$.

Some intermediate cases of metrics had allowed getting within a factor of ${1.4}$, but for general metrics the ${1.5}$ factor found in 1976 by the late Nicos Christofides, and concurrently by Anatoliy Serdyukov, stood like a brick wall. Well, we didn’t expect it to be a brick wall at first. Let me tell a story.

A Proof in a Pub

Soon after starting as a graduate student at Oxford in 1981, I went with a bunch of dons and fellow students down to London for a one-day workshop where Christofides was among the speakers and presented his result along with newer work. I’d already heard it spoken of as a combinatorial gem and perfect motivator for a graduate student to appreciate the power of combining simplicity and elegance:

1. Calculate the (or a) minimum spanning tree ${T}$ of the ${n}$ given points.

2. Take ${A}$ to be the leaves and any other odd-degree nodes of ${T}$ and calculate a minimum matching ${M}$ of them.

3. The graph ${T+M}$ has all nodes of even degree so it has an easily-found Eulerian cycle ${C_E}$.

4. The cycle ${C_E}$ may repeat vertices, but by the triangle inequality for the distance metric ${\mu}$, we can bypass repeats to create a Hamilton cycle ${C_H}$ giving ${\mu(C_H) \leq \mu(C_E)}$.

Now any optimal TSP tour ${C_O}$ arises as a spanning tree plus an edge, so ${\mu(T) < \mu(C_O)}$. And ${C_O}$ can be partitioned into two sets of paths with endpoints in ${A}$. One of those sets has weight at most ${\frac{1}{2}\mu(C_O)}$ and yet matches all pairs of ${A}$. Thus ${\mu(M) \leq \frac{1}{2}\mu(C_O)}$. It follows that ${\mu(C_H) \leq \mu(T) + \mu(M) < \mu(C_O) + \frac{1}{2}\mu(C_O)}$ and we’re done.
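For concreteness, the four steps can be sketched in code. This is a toy rendering of the algorithm for tiny Euclidean instances: the matching step simply enumerates permutations of ${A}$, so it is only feasible for a handful of points, whereas the real algorithm uses a polynomial-time minimum-weight perfect matching routine.

```python
import itertools, math

def christofides(points):
    """Toy Christofides sketch on small Euclidean instances."""
    n = len(points)
    def mu(i, j):
        return math.dist(points[i], points[j])

    # 1. Minimum spanning tree T (Prim's algorithm).
    in_tree, tree = {0}, []
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: mu(*e))
        in_tree.add(j); tree.append((i, j))

    # 2. Odd-degree nodes A and a minimum matching M on them (brute force).
    deg = [0] * n
    for i, j in tree:
        deg[i] += 1; deg[j] += 1
    A = [v for v in range(n) if deg[v] % 2 == 1]
    best = None
    for perm in itertools.permutations(A):
        M = list(zip(perm[::2], perm[1::2]))
        w = sum(mu(i, j) for i, j in M)
        if best is None or w < best[0]:
            best = (w, M)
    matching = best[1]

    # 3. Euler tour of T + M (Hierholzer's algorithm); every degree is even.
    adj = {v: [] for v in range(n)}
    for i, j in tree + matching:
        adj[i].append(j); adj[j].append(i)
    stack, euler = [0], []
    while stack:
        v = stack[-1]
        if adj[v]:
            u = adj[v].pop(); adj[u].remove(v); stack.append(u)
        else:
            euler.append(stack.pop())

    # 4. Shortcut repeated vertices -- valid by the triangle inequality.
    seen, tour = set(), []
    for v in euler:
        if v not in seen:
            seen.add(v); tour.append(v)
    return tour
```

On four corners of a unit square the sketch recovers the optimal tour of cost 4; the Christofides guarantee says its cost is always at most ${1.5}$ times optimal.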

My memory of what we did after the workshop is hazy but I’m quite sure we must have gone to a pub for dinner and drinks before taking the train back up to Oxford. My point is, the above proof is the kind that can be told and discussed in a pub. It combines several greatest hits of the field: minimum spanning tree, perfect matching, Euler tour, Hamiltonian cycle, triangle inequality. The proof needs no extensive calculation; maybe a napkin to draw ${A}$ on ${C_O}$ and the partition helps.

The conversation would surely have gone to the question,

Can the ${1.5}$ factor be beaten?

A perfect topic for mathematical pub conversation. Let’s continue as if that’s what happened next—I wish I could recall it.

Trees That Snake Around

Note that the proof already “beats” it in the sense of there being a strict inequality, and it really shows

$\displaystyle \mu(C_H) \leq \left(1.5 - \frac{1}{n}\right)\mu(C_O).$

The advantage ${\frac{1}{n}}$ shrinks to zero as ${n}$ grows, however. Moreover, examples where Christofides’s algorithm does no better than approach ${1.5}$ are easy to draw. Pub walls are often covered with emblems of local organizations, and if one has a caduceus symbol it can serve as the drawing:

The staff is a path ${T}$ of ${n}$ nodes while the snakes alternate edges of weight ${1 + \gamma}$ between nodes two apart on the path. Going up one snake and down the other gives an optimal tour of weight ${(1 + \gamma)(n-2) + 2}$ (using the two outermost path edges to switch between the snakes), which is ${\sim (1 + \gamma)n}$. The snake edges don’t change the path’s being the minimum spanning tree, and for ${C_H}$ this costs ${n-1}$ plus the weight required to match the path’s endpoints. The extra weight is reckoned as the length of one snake, which is ${\sim n\frac{1+\gamma}{2}}$, so the ratio approaches ${\frac{3}{2}}$ as ${\gamma \rightarrow 0}$ and ${n \rightarrow \infty}$. Here are some tantalizing aspects:

• The ${n-2}$ snake edges, plus one path edge to connect them, make a maximum-weight spanning tree ${T'}$ in the graph ${G}$ formed by the two kinds of edges. Yet ${T'}$ followed by the same steps 2–4 of Christofides’s algorithm would yield an optimum tour.

• When one is given only the ${n}$ points plus the graph metric ${\mu}$ induced by ${G}$, not ${G}$ itself, then there are much worse spanning trees. The single edge connecting the endpoints ${(v_1,v_n)}$ of the previous path has weight ${\mu(v_1,v_n) \approx \frac{n}{2}}$.

• Thus ${T'}$ has relatively low weight compared to these possible other trees. And its weight approaches that of ${T}$ as ${\gamma \rightarrow 0}$. This means that small changes in the weight of the tree can yield large changes in the quality of the induced tour.

• The advantage of ${T'}$ is that its odd-valence nodes have small distance under ${\mu}$. As a path it snakes around so that its ends are near each other, unlike those of the minimum spanning tree ${T}$. This raises the question of weighting spanning trees according to a slightly different measure ${\mu'}$ that incorporates a term for “odd-closeness.”
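The back-of-the-envelope weights above can be spot-checked numerically; the helper below just plugs in the formulas quoted in the text (optimal ${\approx (1+\gamma)(n-2)+2}$, Christofides ${\approx (n-1) + n\frac{1+\gamma}{2}}$).

```python
def caduceus_ratio(n, gamma):
    """Approximate Christofides-to-optimal ratio on the caduceus example,
    using the asymptotic weights quoted in the text."""
    opt = (1 + gamma) * (n - 2) + 2          # up one snake, down the other
    chris = (n - 1) + n * (1 + gamma) / 2    # path tree plus one snake's matching
    return chris / opt

# As gamma -> 0 and n -> infinity the ratio approaches 3/2.
r = caduceus_ratio(10**7, 1e-7)
assert abs(r - 1.5) < 1e-3
```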

In 1981, we would not have known about Arora’s and Mitchell’s results, so we would have felt fully on the frontier by embedding the points in the plane and sketching spanning trees and cycles on a piece of paper. After a couple pints of ale we might have felt sure that a simple proof with such evident slack ought to yield to a more sophisticated attack.

There is one idea that we might have come up with in a pub. The motivation for choosing ${T}$ to be a minimum spanning tree is that many of its edges go into the Euler tour ${C_E}$, and those bound the final ${C_H}$ even if ${C_H}$ shortcuts them. So making the total edge weight of ${T}$ minimum seems to be the best way to help at that stage. We might have wondered, however, whether there is a way to create ${T}$ to have a stronger direct relation to good tours, if not to the optimal tour.

Oveis Gharan did have such an idea jointly with a different group of authors a decade ago, in the best paper of SODA 2010. We cannot seem to get our hands on the optimal tour, nor even a “good” tour if that means a better than ${1.5}$ factor approximation—that is what we are trying to find to begin with. But there is another “tour” ${O^*}$ that we can compute. This is an optimum of the linear programming relaxation of TSP, whose relation to the exact-TSP methods of Michael Held and Dick Karp we covered long back. ${O^*}$ is not a single tour but rather an ensemble of “fractional tours” where each edge ${e}$ has a rational number ${z_e}$ representing its contribution to the LP solution. The higher ${z_e}$, the more helpful the edge.

The objective then becomes to design distributions ${{\cal T}}$ of spanning trees ${T}$ so that:

1. Sampling ${T \leftarrow {\cal T}}$ is polynomial-time efficient.

2. For every edge ${e}$, ${\Pr_{T \leftarrow {\cal T}}[e \in T] \propto (1 + \delta_n) z_e}$ where ${\delta_n}$ is tiny.

3. The distribution ${{\cal T}}$ promotes trees ${T}$ with fewer leaves and fewer odd-valence interior nodes.

The algorithmic strategy this fits into is to sample ${T}$ from ${{\cal T}}$, plug ${T}$ into the first step of the Christofides algorithm, and continue as before.

The Proof and the Pudding

The first two conditions are solidly defined. Considerable technical details in the SODA 2010 paper and another paper at FOCS 2011 that was joint with Amin Saberi and Mohit Singh are devoted to them. A third desideratum is that the distribution ${{\cal T}}$ not be over-constrained but rather have maximum entropy, so that for efficiently computable numbers ${\lambda_e}$ approaching ${z_e}$ one has also:

$\displaystyle \Pr_{\cal T}(T) \propto \prod_{e \in T}\lambda_e.$
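Distributions of this product form can be sampled exactly. Here is a sketch (the dictionary interface for the weights ${\lambda_e}$ is our own toy choice, not from the papers) using Wilson’s loop-erased random-walk algorithm, in which each step is taken with probability proportional to ${\lambda_e}$; the resulting spanning tree then has probability proportional to ${\prod_{e \in T} \lambda_e}$.

```python
import random

def wilson_weighted_tree(nodes, lam_edges, rng=random):
    """Sample a spanning tree T with Pr(T) proportional to the product of
    lambda_e over e in T, via Wilson's loop-erased random walk.
    `lam_edges[(u, v)]` holds lambda_e for each undirected edge (toy interface)."""
    lam = {v: {} for v in nodes}
    for (u, v), w in lam_edges.items():
        lam[u][v] = w
        lam[v][u] = w

    root = nodes[0]
    in_tree = {root}
    parent = {}
    for start in nodes[1:]:
        v = start
        while v not in in_tree:            # weighted walk until we hit the tree
            nbrs = list(lam[v].items())
            total = sum(w for _, w in nbrs)
            r = rng.random() * total
            for u, w in nbrs:              # pick a neighbor with prob ~ lambda_e
                r -= w
                if r <= 0:
                    break
            parent[v] = u                  # keeping only the last exit erases loops
            v = u
        v = start
        while v not in in_tree:            # graft the loop-erased path onto the tree
            in_tree.add(v)
            v = parent[v]
    return [(v, parent[v]) for v in in_tree if v != root]
```

On a triangle with equal weights this returns each of the three spanning trees with equal probability; skewing one ${\lambda_e}$ upward skews the distribution toward trees containing that edge.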

The third condition, however, follows the maxim,

“the proof of the pudding is in the eating.”

As our source makes clear, this does not refer to American-style dessert pudding, but rather savory British pub fare going back to 1605 at least. The point is that we ultimately know a choice of ${{\cal T}}$ is good by proving it gives a better approximation factor than ${\frac{3}{2}}$.

In America, we tend to say the maxim a different way:

“the proof is in the pudding.”

The new paper uses the “pudding” from the 2011 paper but needed to deepen the proof. Here is where we usually say to refer to the paper for the considerable details. But in this case we find that a number of the beautiful concepts laid out in the paper’s introduction, such as real stability and strong Rayleigh distributions, are more accessibly described in the notes for the first half of a course taught last spring by Oveis Gharan with Klein as TA. One nub is that if a set of complex numbers all have positive imaginary part, then any product ${z = z_1 z_2}$ of two of the numbers has real part less than the product of the real parts, and if the latter product is positive, then ${z}$ is not a real number. This rules out assignments drawn from the set from being solutions to certain polynomials as well as setting up odd/even parity properties elsewhere.

Rigidity of the TSP Universe

I’ll close instead with some remarks while admitting that my own limited time—I have been dealing with more chess cases—prevents them from being fully informed.

The main remark is to marvel that the panoply of polynomial properties and deep analysis buy such a tiny improvement. It is hard to believe that the true space of TSP approximation methods is so rigid. In this I am reminded of Scott Aaronson’s calculations that a collision of two stellar black holes a mere 3,000 miles away would stretch space near you by only a millimeter. There is considerable belief that the approximation factor ought to be improvable at least as far as ${\frac{4}{3}}$.

It strikes me that the maximum-entropy condition, while facilitating the analysis, works against the objective of making the trees more special. It cannot come near the kind of snaky tree ${T_O}$ obtained by deleting any edge from a good tour ${O}$, such that plugging ${T_O}$ into step 1 yields ${O}$ back again. The theory of polynomials and distributions that they develop has a plug-and-play element, so that they can condition the distributions ${{\cal T}}$ toward the third objective using the parity properties. But their framework has inflexibility represented by needing to postulate a real-valued function on the optimum edges whose expectation is of order the square of a parameter ${\eta}$ already given the tiny value ${10^{-12}}$. Of the requirement that ${\eta}$ be a small fraction of their governing epsilon parameter, they say in section 3:

This forces us to take ${\eta}$ very small, which is why we get only a “very slightly” improved approximation algorithm for TSP. Furthermore, since we use OPT edges in our construction, we don’t get a new upper bound on the integrality gap. We leave it as an open problem to find a reduction to the “cactus” case that doesn’t involve using a slack vector for OPT (or a completely different approach).

What may be wanting is a better way of getting the odd-valence tree nodes to be closer, not just fewer in number. To be sure, ideas for “closer” might wind up presupposing a metric topology on the ${n}$ given points, leading to cases that have already been improved by other means.

Open Problems

Will the tiny but fixed wedge below ${\frac{3}{2}}$ become a lever by which to find better approximations?

There is also the kvetch that the algorithm is randomized, whereas the original by Christofides and Serdyukov is deterministic. Can the new methods be derandomized?

[fixed = to + sign at end of Christofides proof; fixed wording of “nub” at end of pudding section]

It is a Friday

James Maynard is a number theorist. He attended Cambridge as an undergrad and then moved to do his grad work at Oxford at Balliol College. He is now a professor at Oxford. He is one of the world experts on prime density type theorems.

Today, since it is Friday, I thought we would discuss a timely idea of Maynard. Not an idea about time complexity or time in physics, but involving the use of time.

The search for a vaccine is not a “development.”

Edward Jenner was an English physician who created the first vaccine, one for smallpox.

In 1798 he used the weak cow-pox to fool our immune system into creating protection against the deadly small-pox. Jenner is said to have saved more lives than any other human.

Today there is an attempt to create a vaccine against our small-pox of the 21st century.

In his day small-pox killed 10% or more of populations. In our day there is a similar threat, and thus the immense interest in the development of a vaccine. However, there is a misunderstanding about vaccines for COVID-19 that is pervasive. Read the New York Times or watch cable news—CNN, FOX, MSNBC—where “experts” explain how AstraZeneca, Johnson & Johnson, Novavax, and other drug companies are developing a vaccine. What developing means could potentially affect all of us, and a better understanding could save millions of lives.

They are not currently developing the vaccines; they are testing them. The point we want to emphasize is:

The development of a vaccine does not change the vaccine. The vaccine is the same at the start of its testing trials, and remains the same throughout.

The Oxford vaccine AZD1222 is the same today as it was months ago when it was created. The same is true for the other vaccines currently being tested around the world.

A vaccine is not developed in the usual sense. Drug companies can modify: how the drug is made, how it is stored, how it is given, how many doses are needed, and so on. Drug companies cannot modify the vaccine without starting over—the vaccine must remain the same. Trials can lead to a vaccine being adopted, or they can cause the vaccine to be abandoned. In the latter case the drug company can try again, but with a different vaccine.

Not Development

Think of what development means elsewhere.

• In programming an app: We build a version and try it out. We find bugs and fix them. We use version numbers. Note, there is no AZD1222 version 3.
• In writing a book: We make a draft. We have people read the draft. We fix typos and inaccuracies. Our quantum book’s ${2^{nd}}$ edition is on version 11.
• In engineering a product: You get the idea.

Here is a sample explaining vaccine development:

• Phase I: The vaccine is given to healthy volunteers to see whether it is safe and what dosage (quantity) is most effective.
• Phase II: The vaccine is given to target groups that would be vaccinated to see what dosage (quantity) is most effective and how well the immune system responds to it.
• Phase III: The vaccine is given to an even larger group, consisting of thousands of people, to see how well the vaccine works to prevent COVID-19. People who do receive the vaccine are then compared with people who did not receive the vaccine.

Note: there is no step that modifies the vaccine.

Consequences

There are several consequences of this insight about vaccines. For one, it makes sense to order millions of doses of a vaccine, even one that has not yet been proved to be safe and effective. For example,

The European Commission has placed its first advance order for a coronavirus vaccine, snapping up 300 million doses of AstraZeneca’s AZD1222 candidate developed by the University of Oxford, with an option on another 100 million.

Note that we would never order a large number of copies of a book before all editing was done and typos were fixed. This is a “proof” that the vaccine is the same.

Actually, it may even make sense to begin taking the vaccine, especially for high-risk people. In the past, inventors of vaccines have often taken their own new vaccine even before they were sure it worked.

Open Problems

I am a computer scientist with no experience in vaccines. In 1954 I did help test the Jonas Salk polio vaccine. My help was in the form of supplying an arm that got a shot of the Salk polio vaccine; I was nine years old then. But I have a math view of vaccines—a viewpoint that sheds light on this misunderstanding.

Our congratulations on the 2020 Nobel Prize in Physics


Roger Penrose, Reinhard Genzel, and Andrea Ghez have won the 2020 Nobel Prize in Physics. The prize is divided half to Penrose for theoretical work and half to Genzel and Ghez for finding a convincing and appreciably large practical example.

Today we congratulate the winners and give further musings on the nature of knowledge and the role of theory.

The physics Nobel has always had the rule that it cannot be for a theory alone, no matter how beautiful and how many mathematical discoveries follow from its development. Stephen Hawking’s theory of black-hole radiation is almost universally accepted, despite its association with paradox, yet it was said that only an empirical confirmation such as mini-black holes being discovered to explode in an accelerator core would have brought it a Nobel. The official citation to Sir Roger says that his prize is:

“for the discovery that black hole formation is a robust prediction of the general theory of relativity.”

What is a “robust” prediction? The word strikes us as having overtones of necessity. Necessary knowledge is the kind we deal with in mathematics. The citation to Genzel and Ghez stays on empirical grounds:

“for the discovery of a supermassive compact object at the centre of our galaxy.”

The “object” must be a black hole—given relativity and its observed gravitational effects, it cannot be otherwise. Among many possible witnesses for the reality of black holes—one being the evident origin of the gravitational waves whose detection brought the 2017 Nobel—the centers of galaxies are hefty examples. The combination of these citations opens several threads we’d like to discuss.

The Proof Horizon of a Black Hole

Dick and I are old enough to remember when black holes had the status of conjecture. One of my childhood astronomy books stated that the Cygnus X-1 X-ray source was the best known candidate for a black hole. In 1974, Hawking bet Kip Thorne that it was not a black hole. The bet lasted until 1990, when Hawking conceded. He wrote the following in his famous book, A Brief History of Time:

This was a form of insurance policy for me. I have done a lot of work on black holes, and it would all be wasted if it turned out that black holes do not exist. But in that case, I would have the consolation of winning my bet. … When we made the bet in 1975, we were 80% certain that Cygnus X-1 was a black hole. By now [1988], I would say that we are about 95% certain, but the bet has yet to be settled.

In the 1980s, I was a student and then postdoc in Penrose’s department, so I was imbued with the ambience of black holes and never had a thought about doubting their existence. I even once spent an hour with John Wheeler, who coined the term “black hole,” when Penrose delegated me to accompany Wheeler to Oxford’s train station for his return to London. But it seems from the record that the progression to regarding black holes as proven entities was as gradual as many argue the act of crossing a large black hole’s event horizon to be. Although the existence of a central black hole from data emanating from Sagittarius had been proposed at least as far back as 1971, the work by Ghez and then Genzel cited for their prize began in 1995. The official announcement for Riccardo Giacconi’s share of the 2002 physics Nobel stated:

“He also detected sources of X-rays that most astronomers now consider to contain black holes.”

This bespeaks lingering doubt, at least about where black holes might be judged to exist, if not about their existence at all.

However their time of confirmation might be pinpointed, it is the past five years that have given by far the greatest flood of evidence, including the first visual image of a black hole last year. The fact of their presence in our universe is undeniable. But necessity is a separate matter, and with Penrose this goes back to 1964.

Relativity and Necessity

We have mentioned Kurt Gödel’s solution to the equations of general relativity (GR) in which time travel is possible. This does not mean that time travel must be possible, or that it is possible in our universe. A “solution” to GR is more like a model in logic: it may satisfy a theory’s axioms but have other properties that are contingent (unless the theory is categorical, meaning that all of its models are isomorphic). Gödel’s model has a negative value for Einstein’s cosmological constant; the 2011 physics Nobel went to the discovery that in our universe the constant has a tiny positive value. GR also allows solutions in which some particles (called tachyons) travel faster than light.

That GR has solutions allowing black holes had been known from its infancy in work by Karl Schwarzschild and Johannes Droste. There are also solutions without black holes; a universe with no mass is legal in GR in many ways besides the case of special relativity. Penrose took the opposite tack, of giving minimal conditions under which black holes are necessary. Following this article, we list them informally as follows:

1. Sufficiently large concentrations of mass exerting gravity exist.
2. Gravity always attracts, never repels.
3. No physical effect can travel faster than light.
4. Gravity determines how light bends and moves.
5. The space-time manifold is metrically complete.

Penrose showed that any system obeying these properties and evolving in accordance with GR must develop black holes. He showed this without any symmetry assumptions on the system. Thus he derived black holes as a prediction with the force of a theorem derived from minimal axioms.

His 1965 paper actually argued by contradiction. He derived five properties needed in order for the system to avoid forming a singularity, and then showed that they are mutually inconsistent. Here is the crux of his paper:

 [ Snip from paper ]

In the diagram, time flows up. The point in a nutshell—a very tight nutshell—is that once a surface flows inside the cylinder at the Schwarzschild radius then light and any other motion from it can go only inward toward a singularity. The analysis is possible without the kind of symmetry assumption that had been used to tame the algebraic complexity of the equations of GR. The metric completeness mandates a singularity apart from any symmetries; a periodic equilibrium is ruled out by analysis of Cauchy surfaces.

Necessary For Us?

Like Richard Feynman’s famous diagrams for quantum field theory, Penrose developed his diagrams as tools for shortcutting the vicissitudes of GR. We could devote entire other posts to his famous tiles and triangle and other combinatorial inventions. His tools enable quantifying black-hole formation from observations in our universe.

The question of necessity, however, pertains to other possible universes. Let us take for granted that GR and quantum theory are facets of a physical theory that governs the entire cosmos—the long-sought “theory of everything”—and let us also admit the contention of inflationary theorists that multiple universes are a necessary consequence of any inflation theory. The question remains, are black holes necessary in those universes?

It is possible that those universes might not satisfy axiom 1 above, or might have enough complexity for existence of black holes but not large-scale formation of them. The question then becomes whether black holes must exist in any universe rich enough for sentient life forms such as ourselves to develop. This is a branch of the anthropic principle.

Lee Smolin proposed a mechanism via which black holes engender new universes and so propagate the complexity needed for their large-scale formation. Since complexity also attends the development of sentient life forms, this would place our human existence in the wake of consequence, as opposed to the direction of logic when reasoning by the anthropic principle.

A Little More About Science

The 2020 Nobel Prize in Chemistry was awarded this week to Jennifer Doudna and Emmanuelle Charpentier for their lead roles in developing the CRISPR gene-editing technology, specifically around the protein Cas9.

We would argue that it is hard to find two more different types of results:

${\bullet }$ Penrose shows that black holes and general relativity are connected, which is a math result. We still cannot create black holes in a lab to experiment with—or maybe we could but should be very afraid of going anywhere near doing so. It was not clear that there could ever be a real application of this result.

${\bullet }$ Charpentier and Doudna discover that an existing genetic mechanism could be used to edit genetic material. Clearly this could be, and was, experimented on in labs. It is also clear that there are applications of this result; indeed, it is now a standard tool used in countless labs. There are even patent battles over the method.

We like the fact that Nobels are given for such diverse types of research. It is not just that one is for astrophysics and one for chemistry. It is that Nobels can be given for very different types of research. We think this is important.

But wait. These results do have something in common, something that sets them apart from any research we can do in complexity theory. Both operate like this:

Observe something important from nature. Something that is there independent of us. Then in Penrose’s case explain why it is true. Then in Charpentier and Doudna’s case, use it to solve some important problems.

We wonder if anything like this could be done in our research world—say in complexity theory?

Open Problems

Besides our congratulations to all those mentioned in this post, Ken expresses special thanks to Sir Roger among other Oxford Mathematical Institute fellows for the kindness recorded here.

[changed note about massless universe]

Science is good too

Emil Faber is the pretend founder of the pretend Faber College. The 1978 movie Animal House starts with a close-up of Faber’s statue, which has the inscription, Knowledge Is Good.

Today, Ken and I thought we might talk about knowledge, science, mathematics, proofs, and more.

Some differences from the Computational Lens

Chai Wah Wu, Jonathan Lenchner, Charles Bennett, and Yuhai Tu are the moderators for the four days of the First IBM Research Workshop on the Informational Lens. The virtual workshop begins Tuesday morning at 10:45 ET. The conference has free registration and of course is online.

Today we preview the conference and discuss a few of the talks.

Not puzzling reviews


Jason Rosenhouse is a professor in the Department of Mathematics at James Madison University. His research focuses on algebraic graph theory and analytic number theory involving exponential sums. The former includes a neat paper on expansion properties of a family of graphs associated to block designs, with two undergraduates among its authors. But besides his “real” research, he has written a number of books on puzzles such as The Monty Hall Problem: The Remarkable Story of Math’s Most Contentious Brain Teaser. Soon his book Games for Your Mind: The History and Future of Logic Puzzles is to be published.

Today Ken and I thought we would highlight his recent review of a book on math puzzles.

Happy birthday to Ken

Ken Regan is of course my partner on GLL. He is faculty in the computer science department at the University at Buffalo. His PhD was in 1986 from Oxford University and it was titled On the separation of complexity classes. He was the PhD student of Dominic Welsh, who was a student of John Hammersley.

Today I would like to wish Ken a happy birthday.

Continuous can beat discrete

Nisheeth Vishnoi is a professor at Yale University in the computer science department. The faculty there is impressive and includes many of the top researchers in the world. The CS faculty is pretty good too. As Nisheeth’s PhD advisor, years ago, I am proud that he is at Yale.

Today I wish to discuss a new book by Nisheeth.

The title is Algorithms for Convex Optimization. Let me jump ahead and say that I like the book and especially this insight:

One way to solve discrete problems is to apply continuous methods.

This is not a new insight, but is an important one. Continuous math is older than discrete and often is more powerful. Some examples of this are:

${\bullet}$ Analytic number theory is based on the behavior of continuous functions. Some of the deepest theorems on prime numbers use such methods. Think of the Riemann zeta function

$\displaystyle \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots$

as a function of complex numbers ${s}$.

${\bullet}$ Additive number theory is based on the behavior of continuous functions. Think of generating functions and Fourier methods.
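As a tiny concrete illustration of the first bullet, the partial sums of ${\zeta(2)}$ converge to ${\pi^2/6}$: a fact of continuous analysis that encodes information about all the integers at once. The helper name below is ours.

```python
import math

def zeta_partial(s, terms=10**6):
    """Partial sum of the Riemann zeta function zeta(s)."""
    return sum(1 / n**s for n in range(1, terms + 1))

# Euler's identity zeta(2) = pi^2 / 6; the tail beyond N terms is ~1/N.
approx = zeta_partial(2)
assert abs(approx - math.pi**2 / 6) < 1e-5
```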

The power of continuous methods is one that I sometimes forget. Nisheeth’s book is a testament to the power of this idea.

Convexity

Nisheeth’s book uses another fundamental idea from complexity theory. This is: restrict problems in some way. Allowing too large a class usually makes complexity high. For example, trees are easier in general than planar graphs, and sparse graphs are easier than general graphs. Of course “in general” must be controlled, but restricting the problem types does often reduce complexity.

Convexity adds to this tradition since convex generalizes the notion of linear. And convex problems of all kinds are abundant in practice, abundant in theory, and are important.

The MW dictionary says convex means:

: being a continuous function or part of a continuous function with the property that a line joining any two points on its graph lies on or above the graph.

Here is a passage by Roman Dwilewicz on the history of the convexity concept:

It was known to the ancient Greeks that there are only five regular convex polyhedra.

It seems that the first more rigorous definition of convexity was given by Archimedes of Syracuse, (ca 287 – ca 212 B.C.) in his treatise: On the sphere and cylinder.

These definitions and postulates of Archimedes were dormant for about two thousand years!

I say it’s lucky that Archimedes was not up for tenure.

Nisheeth’s Book

Nisheeth’s book is now available at this site. I have just started to examine it and must say I like the book. Okay, I am not an expert on convex algorithms, nor am I an expert on this type of geometric theory. But I definitely like his viewpoint. Let me explain in a moment.

First I cannot resist adding some statistics about his book created here:

No way I can read the book in nine hours. But I like seeing how many characters and so on the book has. I will have to calculate the same for other books.

Discrete vs Continuous Methods

Nisheeth in his introduction explains how continuous methods help in many combinatorial problems, like finding flows on graphs. He uses the ${s}$-${t}$ flow problem as his example. The ${s}$-${t}$ maximum flow problem arises in real-world scheduling problems, but is also a fundamental combinatorial problem that can be used to find a maximum matching in a bipartite graph, for example.
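To make the matching reduction concrete, here is a minimal sketch in Python (the function name and example graph are illustrative, not from Nisheeth's book). It runs the augmenting-path search that Ford-Fulkerson performs on the unit-capacity network obtained by attaching a source to the left side and a sink to the right side:

```python
# Sketch: maximum bipartite matching via augmenting paths -- the
# specialization of Ford-Fulkerson to the unit-capacity flow network
# with an added source s feeding the left vertices and a sink t
# fed by the right vertices. Names and example are illustrative.

def max_bipartite_matching(adj, n_right):
    """adj[u] = list of right vertices adjacent to left vertex u."""
    match_right = [-1] * n_right  # match_right[v] = left partner of v, or -1

    def try_augment(u, seen):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                # v is free, or its current partner can be re-matched elsewhere.
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    matching = 0
    for u in range(len(adj)):
        if try_augment(u, [False] * n_right):
            matching += 1
    return matching

# Example: left vertices {0,1,2}, right vertices {0,1};
# left vertex 2 must go unmatched, so the maximum matching has size 2.
print(max_bipartite_matching([[0], [0, 1], [1]], 2))  # prints 2
```

On a bipartite graph with ${n}$ left vertices and ${m}$ edges this sketch runs in ${O(nm)}$ time; the continuous-optimization algorithms discussed below are aimed at beating such purely combinatorial bounds.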

Combinatorial algorithms for the maximum flow problem. He points out that, by building on the Ford-Fulkerson method, various polynomial-time results were proved and other bounds were improved. But he states that the improvements stopped in 1998. Discrete methods seem unable to improve the complexity of flow problems any further.

Convex programming-based algorithms. He adds:

Starting with the paper by Paul Christiano, Jonathan Kelner, Aleksander Mądry, Daniel Spielman, and Shang-Hua Teng, the last decade has seen striking progress on the ${s}$-${t}$ maximum flow problem. One of the keys to this success has been to abandon combinatorial approaches and view the ${s}$-${t}$ maximum flow problem through the lens of continuous optimization.

Thus, at this point it may seem like we are heading in the wrong direction. We started off with a combinatorial problem that is a special type of linear programming problem, and here we are with a nonlinear optimization formulation for it. Thus the questions arise: which formulation should we choose? And why should this convex optimization approach lead us to faster algorithms?

Indeed.

Open Problems

Take a look at Nisheeth’s site for the answers.

I wish I were better informed about continuous methods in general. They are powerful and pretty. Maybe I could solve an open problem that I have thought about if I knew this material better. Hmmm. Maybe it will help you solve some open problem of your own. Take a look at his book.

[Edited]

Which is best for students?

 Cropped from Wikipedia src

Moshe Vardi holds multiple professorships at Rice University. He is also the Senior Editor of Communications of the ACM. His is therefore a voice to be reckoned with in the current debate over how best to teach during the pandemic. Much of the debate is over whether all should hear his voice the same way, or some hear it in the classroom while others hear it remotely.

Today we note his recent column for Medium advocating the former. Then I (Ken) give some of my own impressions.

His September 5 column followed an August 8 opinion given to the Rice student newspaper. Both begin with concern over the conflict between safety and value for students. Much of the value of college—most according to statistics he cites—comes from being collegial: outside the classroom. But many such activities, not only evening parties but informal games and gatherings, are the most unsafe.

We will focus, however, on what Moshe says about the nature of instruction for lecture courses. Certainly for laboratory courses there is a sharp trade-off between safety and in-person interaction. But in the lecture hall, one can take safety as a given requirement.

An In-Person Remoteness Paradox

I have just returned from sabbatical at the University at Buffalo (UB) and am teaching this fall a small elective 4xx/5xx theory course. It has 15 students, smaller than the 25 in the hypothetical class Moshe describes but of the same order of magnitude. In the spring I will be teaching a larger undergraduate course which is also on target for his concerns. I have taught such a class every spring for a decade. While this assignment is not new to me, the issue of safety raises tough choices about the delivery options. My options are:

1. remote-only;

2. in-person only;

3. hybrid use of 1 and 2 for designated course components;

4. hybrid-flexible, meaning 1 and 2 are conducted simultaneously with students free to choose either option, even on a per-lecture basis.

I have committed to hybrid-flexible. For my current fall course, I made this commitment in early summer when there was uncertainty over in-person instruction requirements for student visas and registration. I believe that my larger course will be implemented as safely in a large room as my current course. The question is quality.

Moshe notes right away a paradox for his hypothetical class that could apply to any of modes 2–4; to include the last expressly, I’ve inserted the word “even”:

…I realized that [even] the students in the classroom will have to be communicating with me on Zoom, to be heard and recorded. All this, while both the students and I are wearing face masks. It dawned on me then that I will be conducting remote teaching in the classroom.

 Business Insider source—yet another variation

In fact, I have one volunteer now in the room logging into Zoom to help with interaction from those attending remotely. This helps because my podium has less space to catch faces and detect questions right away. I do repeat questions so they are picked up in the recording and often redirect them to the class. Still, the mere fact of my not seeing faces alongside the notes and interactive drawings I am sharing makes me feel Moshe’s paradox all the time. This is even though my room allows denser spacing than at Rice, so a class of 25 could sit closer. Let me, however, say why I love stand-up teaching before addressing his paramount question of what is best for the students at this time.

From Whiteboards to Tutorials

Dick once wrote a post, “In Praise of Chalk Talks.” First, with reference to talks pre-made using PowerPoint or LaTeX slides, Dick wrote:

Such talks can be informative and easy to follow, yet sometimes PowerPoint is not well suited to giving a proof. The slides do not hold enough state for us to easily follow the argument.

Moreover, when I contributed to the open-problems session of the workshop at IAS in Princeton, NJ that we covered two years ago, Avi Wigderson insisted that everyone use chalk, not slides. I’ve used slides for UB’s data-structures and programming languages courses, but I think students benefit from seeing proofs and problem-solving ideas grow.

I find furthermore that the feel of immersion in a process of discovery is enhanced by an in-person presence. I had this in mind when I followed Dick’s post with a long one imagining Kurt Gödel expounding the distinctive points of his set theory (joint with Paul Bernays and John von Neumann), all on one chalkboard. My classes are not as interactive as in that post, but I prepare junctures in lectures for posing questions and doing a little bit of Socratic method. And I try to lead this with body language as well as voice inflection, whether at a whiteboard or drawing on hard paper via a document camera.

Still, it is exactly this “extra” that gets diminished for those who are remote. When I share my screen for notes or a drawing (both in MathCha), they see my movements only in a small second window, if at all. They do hear my voice—but I do not hear theirs even if they unmute themselves. Nor can I read their state of following as I do in the room. Without reiterating the safety factor as Moshe does, I can reformulate his key question as:

Does the non-uniformity and inequality of hybrid delivery outweigh the benefits of making in-person instruction available to some?

I must quickly add that in-person teaching is perceived as a collective need at UB. The web form I filled out for Spring 2021 stated that some in-person classes must be available at all levels, 1xx through 7xx. I am happy to oblige. But the fact that I chose a flexible structure, especially in a small class, does allow the students to give their opinion on this question, as well as on something Moshe says next:

“Remote teaching” actually can do a better job of reproducing the intimacy that we take for granted in small classes.

Toward this end, I am implementing a remote version of the tutorial system I was part of for two eight-week terms at Oxford while a junior fellow of Merton College. When Cambridge University declared already last May that there would be no in-person lectures all the way through summer 2021, it could do so because most lectures there are formally optional anyway. The heart of required teaching is the weekly tutorial hour in groups of one to three students. Tutorials are organized separately by each of thirty-plus constituent colleges rather than by department-centered staff, so the numbers divide up to be manageable. In my math-course tutorials the expectation was for each student to present a solved problem and participate in discussions that build on the methods.

I am doing this every other week this fall, alternating with weeks of problem-set review that will be strictly optional and classed as enhanced office hours. All UB office hours must be remote anyway. The tutorial requirement was agreed by student voice-vote in a tradeoff that reduces the amount of material on timed exams, to compensate for differences in home situations. After a few weeks of this, the class will take stock of opinions on which delivery options work best. UB has already committed to being remote-only after Thanksgiving, and it is possible that the on-campus medical situation will trigger an earlier conversion anyway.

Open Problems

We would like to throw the floor open for comment on Moshe’s matters that we’ve highlighted and on his other opinions about the university mission amid the current crisis more generally.

[edited to reflect that at Rice too, the hypothetical class could be in any of modes 2–4, and that spacing is further than in my UB room. “Princeton IAS” -> “IAS in Princeton, NJ”]