The Shadow Case Model of Complexity
An approach to clustering based on a new model of complexity
Nina Balcan is one of the top young researchers in computational learning theory. She is not only a strong problem solver, but also has a wonderful vision of her field.
Today I want to talk about a breakthrough she has made with Avrim Blum and Anupam Gupta. Their work has created an entirely new way of looking at problems, and I think it will soon be considered, if it is not already, one of the great results of recent times.
Nina was hired over two years ago, and then spent a year at the Microsoft Labs in Boston. We—at Tech—held our collective breath hoping she would leave there and come “back” to Tech. She did, much to our delight.
Classic Complexity Models
One of the main contributions of complexity theory is the formalization of what it means for an algorithm to be efficient. Prior to modern complexity theory, there were algorithms, but there was no way to understand their performance abstractly. Of course, one could use a stop-watch approach, and people did. Algorithms were timed on benchmark data—this is still done today. But, the ability to compare the performance of algorithms as a mathematical notion is, in my opinion, one of the great contributions of the field.
In the following let ${\cal I}$ be a collection of problem instances. As usual $n$ will be used to denote the size of the problem.
Worst Case Model: This is the classic gold standard model we all know and love. Given an algorithm we judge its performance by how well it does on the worst possible instance of size $n$. I have discussed this model before here.
The strength of the model is also its weakness: sometimes problem instances are not worst case. Sometimes, it “feels” that instances have some additional structure or some additional properties—but it is hard to make this “feeling” precise.
Average Case Model: This is now a classic model whose formal version is due to Leonid Levin—see the survey by Andrej Bogdanov and Luca Trevisan.
The concept here is that additional structure is captured by assuming the problem instances come from a random distribution.
There are many pretty results in this area, but there seems to be something unrealistic about the model. How often are instances of a problem really selected according to a random distribution? This is the primary difficulty with this model of complexity—it is not easy to see why instances are random.
There are other models, but these are two of the main ones. In the next sections I will discuss the new model due to Nina and her colleagues.
A New Model
Nina has developed a new model that is currently called the BBG model after the three authors of the seminal paper: Balcan, Blum, and Gupta.
Shadow Case Model: This is the new model—the name is mine. The central idea is that in practice, problem instances are not selected by an all-powerful adversary, nor are they selected by the flip of a coin. BBG argue instances are selected to have a certain formal property that allows approximation algorithms to work. I think we need a better name, but for now I will stay with “the Shadow Case model.”
I have always thought problem instances are neither worst case nor random, but have never been able to find a good replacement. BBG seem to have succeeded in finding the right way to model many situations. At first it seems like their idea might be circular: a problem instance is “good,” provided our algorithms work well on it. This might be true, but it seems useless for building a sound theory. The magic is that BBG have a notion of what a “good” instance is, that they can build a sound mathematical theory on it, that the theory is beautiful, and, even better, that it works in practice. Hard to beat.
Application to Clustering
Here is how they make the Shadow Case model concrete for the classic problem of clustering. As always see their paper for the details and for the generalization to domains other than clustering.
They assume there is a metric space ${\cal M}$ with a distance function $d(x,y)$. You are given $n$ points from the space, and the task is to find a division of the points into $k$ sets, the clusters. They assume there is a true clustering into $k$ sets,

$\displaystyle C_1, \dots, C_k.$

Finding this clustering is usually very hard, so we almost always settle for a clustering

$\displaystyle C'_1, \dots, C'_k$

that is “near” the true clustering. The notion of near is quite simple: just count the fraction of points that are in the wrong cluster. This induces a natural distance function on set systems. The goal is to find a clustering whose distance to the true one is at most some small $\epsilon$.
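To make the notion of “near” concrete, here is a minimal sketch in Python of the distance just described: the smallest fraction of misplaced points over all ways of matching up the cluster labels of the two clusterings. The function name and the brute-force matching over label permutations are my illustrative choices, not anything from their paper.

```python
from itertools import permutations

def clustering_distance(labels_a, labels_b, k):
    """Fraction of points on which two k-clusterings disagree,
    minimized over all relabelings of the second clustering."""
    n = len(labels_a)
    best = n
    # Brute force over all k! label matchings; fine for small k.
    for perm in permutations(range(k)):
        mismatches = sum(1 for a, b in zip(labels_a, labels_b) if a != perm[b])
        best = min(best, mismatches)
    return best / n

# Two 3-clusterings of six points that disagree on a single point.
print(clustering_distance([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 0], 3))  # 0.1666...
```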
So far nothing new, nothing different: this is all standard stuff. Of course the clustering problem as set up is impossible to solve. Not just hard. Impossible, since we have no prior knowledge of the true clustering.
The key to making all this work is the following. Let $\Phi$ be any measure of the quality of a particular clustering. Say the Shadow Hypothesis holds for $\Phi$ with parameters $(c, \epsilon)$ provided: for any clustering ${\cal C}$ with

$\displaystyle \Phi({\cal C}) \le c \cdot \mathsf{OPT},$

then ${\cal C}$ is within $\epsilon$ of the true clustering ${\cal C}_T$. Here $\mathsf{OPT}$ denotes the minimum possible value of $\Phi$ over all clusterings.
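Purely as a restatement, and not as something one could actually run (the next remark explains why), here is the hypothesis written out in Python; the function and argument names are illustrative, and the exhaustive enumeration over candidate clusterings is exactly what is unavailable in practice.

```python
def shadow_hypothesis_holds(candidates, phi, opt, true_clustering, dist, c, eps):
    """True iff every clustering whose phi-cost is within a factor c of OPT
    lies within distance eps of the true clustering."""
    return all(
        dist(candidate, true_clustering) <= eps
        for candidate in candidates
        if phi(candidate) <= c * opt
    )
```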
There are several remarks I need to make. First, the definition is not meant to be checkable, since it refers to $\mathsf{OPT}$, which is almost always impossible to compute efficiently. Second, they use the assumption in a very clever way. They do not go and find a clustering that is close to the optimum and then move it around somehow. No. They use the assumption purely in the background. The mere existence of the property is used to get an algorithm that works. This is the brilliance of their notion.
The reason I call it the Shadow Case Model should now be clear. It is not that I do not like BBG, but I wanted to highlight that there is something lurking in the “shadows.” The assumption talks about an optimum they never find, and an approximation to that optimum they never find either.
Here is a kind of lemma they can prove using the Shadow Hypothesis.
Lemma: Suppose a $k$-median clustering problem instance satisfies the Shadow Hypothesis with parameters $(1+\alpha, \epsilon)$. Then, at most $O(\epsilon n/\alpha)$ points have a distance to their true cluster of more than

$\displaystyle O\!\left(\frac{\alpha\, w}{\epsilon}\right),$

where $w$ is the average distance of each point to its true cluster.
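One ingredient behind a bound of this shape is just Markov's inequality applied to the distances $w(x)$ of the points to their true clusters; the particular threshold and constant below are illustrative, not the ones from their paper.

$\displaystyle \#\{x : w(x) > t\} \;\le\; \frac{1}{t}\sum_{x} w(x) \;=\; \frac{n\,w}{t}, \qquad \text{so taking } t = \frac{\alpha\, w}{5\epsilon} \text{ leaves at most } \frac{5\epsilon n}{\alpha} \text{ points beyond the threshold.}$

Roughly speaking, the Shadow Hypothesis is then what turns such counting bounds into statements about the true clustering, since any clustering of near-optimal cost must already be $\epsilon$-close to it.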
Lemmas like this allow them to create simple and fast algorithms that get terrific clustering results. Again the measure $\Phi$, the optimal clustering, and even the approximation to it, all remain in the shadows.
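To give the flavor, without claiming this is their algorithm, here is a toy sketch: if a lemma guarantees that most points lie within some threshold of their own cluster and far from the others, one can link all pairs of points closer than the threshold and read off the largest connected components. Everything below, including leaving the threshold choice to the caller, is an illustrative assumption on my part.

```python
from collections import defaultdict

def threshold_cluster(points, k, threshold, dist):
    """Group points whose pairwise distance is below `threshold` (union-find),
    and return the k largest groups as candidate clusters."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of points closer than the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) < threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    return sorted(groups.values(), key=len, reverse=True)[:k]

# Example: points on a line; threshold 1.5 separates {0, 1, 2} from {10, 11}.
print(threshold_cluster([0, 1, 2, 10, 11], 2, 1.5, lambda a, b: abs(a - b)))
```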
This is very neat work—see their paper for how all the magic works.
There are two obvious open problems. What is the right name for this notion? Is BBG the right one? If they use Shadow, they can use, perhaps, Shadow the Hedgehog as their logo.
The most important open problem is to start using the model on other areas of theory. This is already well underway, but I expect a flood of papers using their brilliant idea.
Finally, the complexity theorist in me wants to know whether there are ways to use their model to define formal complexity classes. Is this possible?