How to find approximate page rank fast, among other things

Jennifer Chayes is the current director of a research lab in Cambridge—that is Cambridge Massachusetts—for a company called Microsoft. She is famous for her own work in many areas of theory, including phase transitions of complex systems. She is also famous for her ability to create and manage research groups, which is a rare and wonderful skill.

Today Ken and I wish to talk about how to be “shifty” in algorithm design. There is nothing underhanded, but it’s a different playing field from what we grew up with.

Ken first noted the paradigm shift in 1984 when hearing Leslie Valiant’s British unveiling of his “Probably Approximately Correct” learning theory at the Royal Society in London. Last year Leslie wrote a book with this title. That’s about learning, while here we want to discuss the effect on algorithm design.

We illustrate this for a paper (ArXiv version) authored by Chayes and Christian Borgs, Michael Brautbar, and Shang-Hua Teng. It is titled, “Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation.” Since PageRank is a vital application, they don’t want their algorithms to be galactic. They trade off by not having the algorithms “solve” the problems the way we used to.

In The Good Old Days

I, Dick, recall the “good old days of theory.” When I first started working in theory—a sort of double meaning—I could only use deterministic methods. I needed to get the exact answer, no approximations. I had to solve the problem that I was given—no changing the problem. Well sometimes I did that, but mostly I had to solve the problem that was presented to me.

In the good old days of theory, we got a problem, we worked on it, and sometimes we solved it. Nothing shifty, no changing the problem or modifying the goal. I actually like today better than the “good old days,” so I do not romanticize them.

One way to explain the notion of the good old days is to quote from a Monty Python skit about four Yorkshiremen talking about the good old days. We pick it up a few lines after one of them says, “I was happier then and I had nothin’. We used to live in this tiny old house with great big holes in the roof.” …

${\mathsf{FIRST \ YORKSHIREMAN}}$: You were lucky. We lived for three months in a paper bag in a septic tank. We used to have to get up at six in the morning, clean the paper bag, eat a crust of stale bread, go to work down t’ mill, fourteen hours a day, week-in week-out, for sixpence a week, and when we got home our Dad would thrash us to sleep wi’ his belt.

${\mathsf{SECOND \ YORKSHIREMAN}}$: Luxury. We used to have to get out of the lake at six o’clock in the morning, clean the lake, eat a handful of ‘ot gravel, work twenty hour day at mill for tuppence a month, come home, and Dad would thrash us to sleep with a broken bottle, if we were lucky!

${\mathsf{THIRD \ YORKSHIREMAN}}$: Well, of course, we had it tough. We used to ‘ave to get up out of shoebox at twelve o’clock at night and lick road clean wit’ tongue. We had two bits of cold gravel, worked twenty-four hours a day at mill for sixpence every four years, and when we got home our Dad would slice us in two wit’ bread knife.

${\mathsf{FOURTH \ YORKSHIREMAN}}$: Right. I had to get up in the morning at ten o’clock at night half an hour before I went to bed, drink a cup of sulphuric acid, work twenty-nine hours a day down mill, and pay mill owner for permission to come to work, and when we got home, our Dad and our mother would kill us and dance about on our graves singing Hallelujah.

${\mathsf{FIRST \ YORKSHIREMAN}}$: And you try and tell the young people of today that ….. they won’t believe you.

${\mathsf{ALL}}$: They won’t!

Today

Those were the days. I did feel like I sometimes worked twenty-nine hours a day, I was paid by Yale, but so little that perhaps it felt like I paid them. Never had to drink a cup of sulphuric acid—but the coffee—oh you get the idea.

Now today, in the 21st century, we have a better way to attack problems. We change the problem, often to one that is more tractable and useful. In many situations solving the exact problem is not really what a practitioner needs. If computing X exactly requires too much time, then it is useless to compute it. A perfect example is the weather: computing tomorrow’s weather in a week’s time is clearly not very useful.

The brilliance of the current approach is that we can change the problem. There are at least two major ways to do this:

${\bullet }$ Change the answer required. Allow approximation, or allow a partial answer. Do not insist on an exact answer.

${ \bullet }$ Change the algorithmic method. Allow algorithms that can be wrong, or allow algorithms that use randomness. Do not insist that the algorithm is a perfect deterministic one.

This is exactly what Chayes and her co-authors have done. So let’s take a look at what they do in their paper.

PageRank

In their paper they study PageRank, which is the definition and algorithm made famous by Google. It gives a way to rank webpages in response to a query that supplements criteria from the query itself. An old query-specific criterion was the number of matches to a keyword in the query. Rather than rank solely by this count, PageRank emphasizes a general page score. The score is sometimes interpreted as a measure of “popularity” or “authority,” leading to the following circular-seeming definitions:

${\bullet}$ A webpage is popular if it has a healthy number of links from popular pages.

${\bullet}$ A webpage is authoritative if it is well cited, especially by other authoritative pages.

What the PageRank score actually denotes mathematically is the likelihood that a person randomly traversing links will arrive at any particular page. This includes a frequency ${\alpha}$ with which the person will stop clicking, do something healthier like ride a bicycle, and start again on a “random” webpage.

The situation can be modeled by the classic random walk on a directed graph. We have a graph ${G}$ on ${N}$ nodes and an ${N \times N}$ matrix ${M}$ that is row-stochastic, meaning the entries in each row are non-negative and sum to ${1}$. Given that the web-walker is at node ${i}$, the entry ${M[i,j]}$ is the probability of going next to node ${j}$. If node ${i}$ has out-degree ${b}$, then

$\displaystyle M[i,j] = \frac{\alpha}{N} + \begin{cases} (1 - \alpha)\frac{1}{b} & \text{if~} i \text{~links to~} j\\ 0 & \text{otherwise.}\end{cases}$

We can tweak this e.g. by modeling the user hitting the “Back” button on the browser, or jumping to another browser tab, or using a search engine. We could also set ${\alpha}$ higher in case page ${i}$ has few or no outgoing links. We still get an ${M}$, and since the use of ${\alpha}$ effectively makes the graph strongly connected and averts certain pathologies, we get a beautiful conclusion from random-walk theory: There is a unique stationary distribution ${P}$, which is the unique left-eigenvector for the largest eigenvalue, which as normalized above is ${1}$:

$\displaystyle P M = P.$

Then the PageRank of node ${i}$ is ${P(i)}$. It is remarkable that this simple, salient idea from the good old days works so well. A further fact from the theory (and use of ${\alpha}$) is that if you start at any node, in the long run you will find yourself on page ${i}$ with frequency ${P(i)}$. Here is Wikipedia’s graphical example:

The issue is: how to compute ${P(i)}$? In the good old days this was a trivial problem—just use linear algebra. But now the issue is that ${N}$ is really big, let alone ${N \times N}$ being unspeakably big. The ${N}$ is too big even to get an approximation via the “further fact,” that is by simulating a random walk on the whole graph, and classical sparse-matrix methods might only help a little. This is where Chayes and company change the game: let us care about computing ${P(i)}$ only for some ${i}$‘s, and even then, let us be content with fairly rough approximation.

The Shifted Problems

The approximation to PageRank is called SignificantPageRank. The paper gives a randomized algorithm that solves the following problem.

Let us be given a graph. Then, given a target threshold ${\Delta}$ and an approximation factor ${c > 1}$, we are asked to output a set ${S}$ of nodes such that with high probability, ${S}$ contains all nodes of PageRank at least ${\Delta}$, and no node of PageRank smaller than ${\frac{1}{c}\Delta}$.

This is a perfect example of the shift. The algorithm is random, and the problem is to find not the nodes with a given PageRank, but those that are not too far away.

The nifty point is that the algorithm can tolerate fuzzing the matrix ${M}$, in a manner called SARA for “sparse and approximate row access”:

Given ${i}$ and ${\epsilon}$, return a set ${S}$ of ${O(1/\epsilon)}$ columns ${j}$ and values ${p_j}$ such that for all ${j}$:

• ${j \in S \implies |p_j - M[i,j]| \leq \epsilon}$, and

• ${j \notin S \implies M[i,j] \leq \epsilon}$.

It is important to use this for different values of ${\epsilon}$. The cost of a query ${(i,\epsilon) \rightarrow S}$ is ${\Theta(|S|) = O(1/\epsilon)}$.

If we picture “${N}$” as “exponential” and take ${\epsilon = 1/\mathsf{poly}(n)}$ where ${n = \log N}$, then this becomes an approximative version of ${M}$ being succinct, which we just talked about. In this scaling of ${\epsilon}$ we are effectively limiting to a ${\mathsf{poly}(n)}$ local portion ${S}$ of the graph around node ${i}$. Since we also have ${\frac{\alpha}{N} \ll \epsilon}$, under SARA entries outside ${S}$ would become effectively zero, so that the chance of “teleporting” outside ${S}$ on the whole would be regarded as negligible. In fact the paper also researches the case where each Web user always starts afresh at a “home node” ${u}$ in that portion, making ${M[i,u] = \alpha}$ just for that user. Then the ${\alpha}$-related probability is not negligible, and the resulting user-dependent estimate is called PersonalizedPageRank.

The problem they need to solve for SignificantPageRank then becomes “SignificantColumnSums”:

Given ${M}$ and ${\Delta,c}$ as above, find a set ${S}$ of columns such that for all columns ${j}$:

• ${\sum_i M[i,j] \geq \Delta \implies j \in S}$;

• ${\sum_i M[i,j] \leq \frac{1}{c}\Delta \implies j \notin S}$.

An even simpler problem which they use as a stepping-stone is “VectorSum”:

Given a length-${N}$ vector ${P}$ with entries in ${[0,1]}$, and ${1 \leq \Delta \leq N}$ and ${c > 1}$:

• output yes if ${{}\text{~}\sum_i P(i) \geq \Delta}$;

• output no if ${{}\text{~}\sum_i P(i) \leq \frac{\Delta}{c}}$, don’t-care otherwise.

The goal is always to avoid looking at all ${N}$ nodes or entries, but only an ${N^{1 - \gamma}}$ or so portion of them, where ${\gamma > 0}$. Thus the problem shift is necessitated by ${N}$ being huge. This isn’t my good-old-days idea of solving a problem, but can be called “${(\Delta,c)}$-solving” it.

Multi-Level Sampling and Results

Ken and I are interested because these are similar to problems we have been encountering in our quest for more cases where quantum algorithms can be simulated in classical randomized polynomial time. Thus any new ideas are appreciated, and what catches our eye is a multi-level approximation scheme that exploits the requirement of SARA to work for different ${\epsilon}$. The situations are different, but we hope to adapt them.

The situation for VectorSum is that a probe ${(i,\epsilon) \rightarrow P(i)}$ still costs order-of ${1/\epsilon}$, and returns ${0}$ unless ${P(i) \geq \epsilon}$. A simple-minded use of a set ${S}$ of random probes for the same ${\epsilon}$ would yield the estimate

$\displaystyle \frac{N}{|S|} \sum_{i \in S: P(i) \geq \epsilon} P(i).$

The resulting error has order ${N\epsilon}$, so we need ${\epsilon \approx \frac{\Delta}{N}}$ which is rather demanding. Indeed the total cost would have order ${\frac{|S|N}{\Delta}}$, where ${S}$ needs to be so large as to kill any hope of making the cost ${N^{1 - \gamma}}$ or even ${o(N)}$. In the pivotal case where ${\sum_i P(i) = \Delta}$, we would need ${|S| \approx \frac{N}{\Delta}}$, incurring cost on the order of ${(\frac{N}{\Delta})^2}$.

However, they show that by using a different precision ${\epsilon_t \approx \frac{t}{|S|}}$ for each random probe, they can get acceptable error with a reasonably small number of probes. The case ${t = 1}$ where we have ${\epsilon \approx \frac{1}{|S|} = \frac{\Delta}{N}}$ occurs only once, so its ${\frac{N}{\Delta}}$ cost can be tolerated. Other probes have smaller cost, and while their precisions are looser, the aggregate precision on the estimate becomes good enough for the following result:

Theorem 1 Given ${P,c,\Delta}$ as above and ${\delta > 0}$, VectorSum can be ${(\Delta,c)}$-solved with probability at least ${1 - \delta}$ and cost

$\displaystyle O\left(\frac{N}{\Delta}\left(\frac{1}{c-1}\right)^2 \log\left(\frac{N}{\Delta(c-1)}\right)\log\left(\frac{2}{\delta}\right)\right).$

Well in the good old days before LaTeX we wouldn’t even have been easily able to write such a formula with a typewriter, let alone prove it. But it is certainly better than ${(\frac{N}{\Delta})^2}$, and allows taking ${\Delta = N^{\gamma}}$ to meet the goal of runtime ${\tilde{O}(N^{1 - \gamma})}$. As usual, for the details on SignificantColumnSums and the application problems, see the paper.

Open Problems

Do you miss the good old days? Or do you like the current approaches? What shifts, what new types of changing the goals, might we see in the future? For clearly today will one day be the “good old days.”

[changed qualifier on “certain pathologies”, may as well footnote here that the “Back” button creates a Markov Chain with memory; changed intro to say British unveiling” of PAC learning.]

1. July 22, 2014 10:38 am

have been looking into a thread somewhat similar to this. spielman came up with a major improvement on fast algorithms for finding graph clusters based on random walks of graphs eg highlighted by Klarreich here: simons institute, Network Solutions:
A new breed of ultrafast computer algorithms offers computer scientists a novel tool to probe the structure of large networks.
this meshes with some other theory looking at bounds on random walks, some of which can be computed via pagerank theory. eg this high voted but unanswered question Mixing properties of random walks on graphs. tricky stuff, anyone else, any ideas? yeah randomized algorithms are a whole new way of thinking and seem to be quite fundamental to theory in deep ways that are still yet to be uncovered. although one could observe that they seem to have very close/deep connections with that old concept, nondeterminism.

2. July 23, 2014 3:59 pm

Dear Profs. Lipton and Regan,

A wonderful column as usual. Prof. Valiant’s seminal paper “A theory of the learnable” was published in the Communications of the ACM in 1984. But according to the web site of The Royal Society, Prof. Valiant was elected as a Fellow only in 1991! So the recollection of Prof. Regan “in 1984 when hearing Leslie Valiant’s inaugural talk on “Probably Approximately Correct” learning at the Royal Society in London” appears to be “probably approximately correct”! :)

Don’t ask me why, but I was once invited to explain PAC learning theory to a group of Indian metallurgists. I explained that a PAC algorithm is one that “more or less works most of the time,” and they seemed happy with that.

I shall certainly look up the paper by Chayes et al.

Keep up the good work!

• July 24, 2014 8:08 am

Thanks for the nice comment. By my recollection it was a lecture held in the Royal Society building at Carleton House Terrace. Possibly it was in a building of the University of London, as part of a workshop that had Royal Society sponsorship. I know we went down from Oxford most specifically to hear it—it was more of an “event” and “unveiling” than if Les were giving an invited talk at U. London or Oxford or Warwick, say.

• July 24, 2014 11:38 pm

Dear Prof. Regan,

The premises of The Royal Society at Carleton House Terrace are often rented out to host meetings that are unconnected with the Fellowship. So it is quite possible that in 1984 you heard Prof. Valiant talk about PAC learning at Carleton House Terrace. My only comment was that it cannot have been his “inaugural talk” upon being elected as an FRS.

Again, I do really enjoy this blog. I just wish I had the time and energy (not to mention the talent) to keep up with the wealth of information being provided).

Kind regards.

• July 25, 2014 10:07 am

I meant “inaugural talk” about PAC learning—as I said, it was a special event, and it was “at the Royal Society” with their aegis. I knew he was elected in 1991; I didn’t think what I wrote would be taken your way.

3. July 25, 2014 4:29 am

Thanks for this post. Seems like there’s always something new I learn even after being in the field for 25 years