Shifts In Algorithm Design
How to find approximate page rank fast, among other things
Jennifer Chayes is the current director of a research lab in Cambridge—that is Cambridge Massachusetts—for a company called Microsoft. She is famous for her own work in many areas of theory, including phase transitions of complex systems. She is also famous for her ability to create and manage research groups, which is a rare and wonderful skill.
Today Ken and I wish to talk about how to be “shifty” in algorithm design. There is nothing underhanded, but it’s a different playing field from what we grew up with.
Ken first noted the paradigm shift in 1984 when hearing Leslie Valiant’s British unveiling of his “Probably Approximately Correct” learning theory at the Royal Society in London. Last year Leslie wrote a book with this title. That’s about learning, while here we want to discuss the effect on algorithm design.
We illustrate this for a paper (ArXiv version) authored by Chayes and Christian Borgs, Michael Brautbar, and Shang-Hua Teng. It is titled, “Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation.” Since PageRank is a vital application, they don’t want their algorithms to be galactic. They trade off by not having the algorithms “solve” the problems the way we used to.
In The Good Old Days
I, Dick, recall the “good old days of theory.” When I first started working in theory—a sort of double meaning—I could only use deterministic methods. I needed to get the exact answer, no approximations. I had to solve the problem that I was given—no changing the problem. Well sometimes I did that, but mostly I had to solve the problem that was presented to me.
In the good old days of theory, we got a problem, we worked on it, and sometimes we solved it. Nothing shifty, no changing the problem or modifying the goal. I actually like today better than the “good old days,” so I do not romanticize them.
One way to explain the notion of the good old days is to quote from a Monty Python skit about four Yorkshiremen talking about the good old days. We pick it up a few lines after one of them says, “I was happier then and I had nothin’. We used to live in this tiny old house with great big holes in the roof.” …
: You were lucky. We lived for three months in a paper bag in a septic tank. We used to have to get up at six in the morning, clean the paper bag, eat a crust of stale bread, go to work down t’ mill, fourteen hours a day, week-in week-out, for sixpence a week, and when we got home our Dad would thrash us to sleep wi’ his belt.
: Luxury. We used to have to get out of the lake at six o’clock in the morning, clean the lake, eat a handful of ‘ot gravel, work twenty hour day at mill for tuppence a month, come home, and Dad would thrash us to sleep with a broken bottle, if we were lucky!
: Well, of course, we had it tough. We used to ‘ave to get up out of shoebox at twelve o’clock at night and lick road clean wit’ tongue. We had two bits of cold gravel, worked twenty-four hours a day at mill for sixpence every four years, and when we got home our Dad would slice us in two wit’ bread knife.
: Right. I had to get up in the morning at ten o’clock at night half an hour before I went to bed, drink a cup of sulphuric acid, work twenty-nine hours a day down mill, and pay mill owner for permission to come to work, and when we got home, our Dad and our mother would kill us and dance about on our graves singing Hallelujah.
: And you try and tell the young people of today that ….. they won’t believe you.
: They won’t!
Those were the days. I did feel like I sometimes worked twenty-nine hours a day, I was paid by Yale, but so little that perhaps it felt like I paid them. Never had to drink a cup of sulphuric acid—but the coffee—oh you get the idea.
Now today, in the 21st century, we have a better way to attack problems. We change the problem, often to one that is more tractable and useful. In many situations solving the exact problem is not really what a practitioner needs. If computing X exactly requires too much time, then it is useless to compute it. A perfect example is the weather: computing tomorrow’s weather in a week’s time is clearly not very useful.
The brilliance of the current approach is that we can change the problem. There are at least two major ways to do this:
Change the answer required. Allow approximation, or allow a partial answer. Do not insist on an exact answer.
Change the algorithmic method. Allow algorithms that can be wrong, or allow algorithms that use randomness. Do not insist that the algorithm is a perfect deterministic one.
This is exactly what Chayes and her co-authors have done. So let’s take a look at what they do in their paper.
In their paper they study PageRank, which is the definition and algorithm made famous by Google. It gives a way to rank webpages in response to a query that supplements criteria from the query itself. An old query-specific criterion was the number of matches to a keyword in the query. Rather than rank solely by this count, PageRank emphasizes a general page score. The score is sometimes interpreted as a measure of “popularity” or “authority,” leading to the following circular-seeming definitions:
A webpage is popular if it has a healthy number of links from popular pages.
A webpage is authoritative if it is well cited, especially by other authoritative pages.
What the PageRank score actually denotes mathematically is the likelihood that a person randomly traversing links will arrive at any particular page. This includes a frequency with which the person will stop clicking, do something healthier like ride a bicycle, and start again on a “random” webpage.
The situation can be modeled by the classic random walk on a directed graph. We have a graph on nodes and an matrix that is row-stochastic, meaning the entries in each row are non-negative and sum to . Given that the web-walker is at node , the entry is the probability of going next to node . If node has out-degree , then
We can tweak this e.g. by modeling the user hitting the “Back” button on the browser, or jumping to another browser tab, or using a search engine. We could also set higher in case page has few or no outgoing links. We still get an , and since the use of effectively makes the graph strongly connected and averts certain pathologies, we get a beautiful conclusion from random-walk theory: There is a unique stationary distribution , which is the unique left-eigenvector for the largest eigenvalue, which as normalized above is :
Then the PageRank of node is . It is remarkable that this simple, salient idea from the good old days works so well. A further fact from the theory (and use of ) is that if you start at any node, in the long run you will find yourself on page with frequency . Here is Wikipedia’s graphical example:
The issue is: how to compute ? In the good old days this was a trivial problem—just use linear algebra. But now the issue is that is really big, let alone being unspeakably big. The is too big even to get an approximation via the “further fact,” that is by simulating a random walk on the whole graph, and classical sparse-matrix methods might only help a little. This is where Chayes and company change the game: let us care about computing only for some ‘s, and even then, let us be content with fairly rough approximation.
The Shifted Problems
The approximation to PageRank is called SignificantPageRank. The paper gives a randomized algorithm that solves the following problem.
Let us be given a graph. Then, given a target threshold and an approximation factor , we are asked to output a set of nodes such that with high probability, contains all nodes of PageRank at least , and no node of PageRank smaller than .
This is a perfect example of the shift. The algorithm is random, and the problem is to find not the nodes with a given PageRank, but those that are not too far away.
The nifty point is that the algorithm can tolerate fuzzing the matrix , in a manner called SARA for “sparse and approximate row access”:
Given and , return a set of columns and values such that for all :
- , and
It is important to use this for different values of . The cost of a query is .
If we picture “” as “exponential” and take where , then this becomes an approximative version of being succinct, which we just talked about. In this scaling of we are effectively limiting to a local portion of the graph around node . Since we also have , under SARA entries outside would become effectively zero, so that the chance of “teleporting” outside on the whole would be regarded as negligible. In fact the paper also researches the case where each Web user always starts afresh at a “home node” in that portion, making just for that user. Then the -related probability is not negligible, and the resulting user-dependent estimate is called PersonalizedPageRank.
The problem they need to solve for SignificantPageRank then becomes “SignificantColumnSums”:
Given and as above, find a set of columns such that for all columns :
An even simpler problem which they use as a stepping-stone is “VectorSum”:
Given a length- vector with entries in , and and :
- output yes if ;
- output no if , don’t-care otherwise.
The goal is always to avoid looking at all nodes or entries, but only an or so portion of them, where . Thus the problem shift is necessitated by being huge. This isn’t my good-old-days idea of solving a problem, but can be called “-solving” it.
Multi-Level Sampling and Results
Ken and I are interested because these are similar to problems we have been encountering in our quest for more cases where quantum algorithms can be simulated in classical randomized polynomial time. Thus any new ideas are appreciated, and what catches our eye is a multi-level approximation scheme that exploits the requirement of SARA to work for different . The situations are different, but we hope to adapt them.
The situation for VectorSum is that a probe still costs order-of , and returns unless . A simple-minded use of a set of random probes for the same would yield the estimate
The resulting error has order , so we need which is rather demanding. Indeed the total cost would have order , where needs to be so large as to kill any hope of making the cost or even . In the pivotal case where , we would need , incurring cost on the order of .
However, they show that by using a different precision for each random probe, they can get acceptable error with a reasonably small number of probes. The case where we have occurs only once, so its cost can be tolerated. Other probes have smaller cost, and while their precisions are looser, the aggregate precision on the estimate becomes good enough for the following result:
Theorem 1 Given as above and , VectorSum can be -solved with probability at least and cost
Well in the good old days before LaTeX we wouldn’t even have been easily able to write such a formula with a typewriter, let alone prove it. But it is certainly better than , and allows taking to meet the goal of runtime . As usual, for the details on SignificantColumnSums and the application problems, see the paper.
Do you miss the good old days? Or do you like the current approaches? What shifts, what new types of changing the goals, might we see in the future? For clearly today will one day be the “good old days.”
[changed qualifier on "certain pathologies", may as well footnote here that the "Back" button creates a Markov Chain with memory; changed intro to say British unveiling" of PAC learning.]