How to find approximate page rank fast, among other things

Jennifer Chayes is the current director of a research lab in Cambridge—that is Cambridge Massachusetts—for a company called Microsoft. She is famous for her own work in many areas of theory, including phase transitions of complex systems. She is also famous for her ability to create and manage research groups, which is a rare and wonderful skill.

Today Ken and I wish to talk about how to be “shifty” in algorithm design. There is nothing underhanded, but it’s a different playing field from what we grew up with.

Ken first noted the paradigm shift in 1984 when hearing Leslie Valiant’s inaugural talk on “Probably Approximately Correct” learning at the Royal Society in London. Last year Leslie wrote a book with this title. That’s about learning, while here we want to discuss the effect on algorithm design.

We illustrate this for a paper (ArXiv version) authored by Chayes and Christian Borgs, Michael Brautbar, and Shang-Hua Teng. It is titled, “Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation.” Since PageRank is a vital application, they don’t want their algorithms to be galactic. They trade off by not having the algorithms “solve” the problems the way we used to.

## In The Good Old Days

I, Dick, recall the “good old days of theory.” When I first started working in theory—a sort of double meaning—I could only use deterministic methods. I needed to get the exact answer, no approximations. I had to solve the problem that I was given—no changing the problem. Well sometimes I did that, but mostly I had to solve the problem that was presented to me.

In the good old days of theory, we got a problem, we worked on it, and sometimes we solved it. Nothing shifty, no changing the problem or modifying the goal. I actually like today better than the “good old days,” so I do not romanticize them.

One way to explain the notion of the good old days is to quote from a Monty Python skit about four Yorkshiremen talking about the good old days. We pick it up a few lines after one of them says, “I was happier then and I had nothin’. We used to live in this tiny old house with great big holes in the roof.” …

${\mathsf{FIRST \ YORKSHIREMAN}}$: You were lucky. We lived for three months in a paper bag in a septic tank. We used to have to get up at six in the morning, clean the paper bag, eat a crust of stale bread, go to work down t’ mill, fourteen hours a day, week-in week-out, for sixpence a week, and when we got home our Dad would thrash us to sleep wi’ his belt.

${\mathsf{SECOND \ YORKSHIREMAN}}$: Luxury. We used to have to get out of the lake at six o’clock in the morning, clean the lake, eat a handful of ‘ot gravel, work twenty hour day at mill for tuppence a month, come home, and Dad would thrash us to sleep with a broken bottle, if we were lucky!

${\mathsf{THIRD \ YORKSHIREMAN}}$: Well, of course, we had it tough. We used to ‘ave to get up out of shoebox at twelve o’clock at night and lick road clean wit’ tongue. We had two bits of cold gravel, worked twenty-four hours a day at mill for sixpence every four years, and when we got home our Dad would slice us in two wit’ bread knife.

${\mathsf{FOURTH \ YORKSHIREMAN}}$: Right. I had to get up in the morning at ten o’clock at night half an hour before I went to bed, drink a cup of sulphuric acid, work twenty-nine hours a day down mill, and pay mill owner for permission to come to work, and when we got home, our Dad and our mother would kill us and dance about on our graves singing Hallelujah.

${\mathsf{FIRST \ YORKSHIREMAN}}$: And you try and tell the young people of today that ….. they won’t believe you.

${\mathsf{ALL}}$: They won’t!

## Today

Those were the days. I did feel like I sometimes worked twenty-nine hours a day, I was paid by Yale, but so little that perhaps it felt like I paid them. Never had to drink a cup of sulphuric acid—but the coffee—oh you get the idea.

Now today, in the 21st century, we have a better way to attack problems. We change the problem, often to one that is more tractable and useful. In many situations solving the exact problem is not really what a practitioner needs. If computing X exactly requires too much time, then it is useless to compute it. A perfect example is the weather: computing tomorrow’s weather in a week’s time is clearly not very useful.

The brilliance of the current approach is that we can change the problem. There are at least two major ways to do this:

${\bullet }$ Change the answer required. Allow approximation, or allow a partial answer. Do not insist on an exact answer.

${ \bullet }$ Change the algorithmic method. Allow algorithms that can be wrong, or allow algorithms that use randomness. Do not insist that the algorithm is a perfect deterministic one.

This is exactly what Chayes and her co-authors have done. So let’s take a look at what they do in their paper.

## PageRank

In their paper they study PageRank, which is the definition and algorithm made famous by Google. It gives a way to rank webpages in response to a query that supplements criteria from the query itself. An old query-specific criterion was the number of matches to a keyword in the query. Rather than rank solely by this count, PageRank emphasizes a general page score. The score is sometimes interpreted as a measure of “popularity” or “authority,” leading to the following circular-seeming definitions:

${\bullet}$ A webpage is popular if it has a healthy number of links from popular pages.

${\bullet}$ A webpage is authoritative if it is well cited, especially by other authoritative pages.

What the PageRank score actually denotes mathematically is the likelihood that a person randomly traversing links will arrive at any particular page. This includes a frequency ${\alpha}$ with which the person will stop clicking, do something healthier like ride a bicycle, and start again on a “random” webpage.

The situation can be modeled by the classic random walk on a directed graph. We have a graph ${G}$ on ${N}$ nodes and an ${N \times N}$ matrix ${M}$ that is row-stochastic, meaning the entries in each row are non-negative and sum to ${1}$. Given that the web-walker is at node ${i}$, the entry ${M[i,j]}$ is the probability of going next to node ${j}$. If node ${i}$ has out-degree ${b}$, then

$\displaystyle M[i,j] = \frac{\alpha}{N} + \begin{cases} (1 - \alpha)\frac{1}{b} & \text{if~} i \text{~links to~} j\\ 0 & \text{otherwise.}\end{cases}$

We can tweak this e.g. by modeling the user hitting the “Back” button on the browser, or jumping to another browser tab, or using a search engine. We could also set ${\alpha}$ higher in case page ${i}$ has few or no outgoing links. We still get an ${M}$, and since the use of ${\alpha}$ effectively makes the graph strongly connected and averts certain pathologies, we get a beautiful conclusion from random-walk theory: There is a unique stationary distribution ${P}$, which is the unique left-eigenvector for the largest eigenvalue, which as normalized above is ${1}$:

$\displaystyle P M = P.$

Then the PageRank of node ${i}$ is ${P(i)}$. It is remarkable that this simple, salient idea from the good old days works so well. A further fact from the theory (and use of ${\alpha}$) is that if you start at any node, in the long run you will find yourself on page ${i}$ with frequency ${P(i)}$. Here is Wikipedia’s graphical example:

The issue is: how to compute ${P(i)}$? In the good old days this was a trivial problem—just use linear algebra. But now the issue is that ${N}$ is really big, let alone ${N \times N}$ being unspeakably big. The ${N}$ is too big even to get an approximation via the “further fact,” that is by simulating a random walk on the whole graph, and classical sparse-matrix methods might only help a little. This is where Chayes and company change the game: let us care about computing ${P(i)}$ only for some ${i}$‘s, and even then, let us be content with fairly rough approximation.

## The Shifted Problems

The approximation to PageRank is called SignificantPageRank. The paper gives a randomized algorithm that solves the following problem.

Let us be given a graph. Then, given a target threshold ${\Delta}$ and an approximation factor ${c > 1}$, we are asked to output a set ${S}$ of nodes such that with high probability, ${S}$ contains all nodes of PageRank at least ${\Delta}$, and no node of PageRank smaller than ${\frac{1}{c}\Delta}$.

This is a perfect example of the shift. The algorithm is random, and the problem is to find not the nodes with a given PageRank, but those that are not too far away.

The nifty point is that the algorithm can tolerate fuzzing the matrix ${M}$, in a manner called SARA for “sparse and approximate row access”:

Given ${i}$ and ${\epsilon}$, return a set ${S}$ of ${O(1/\epsilon)}$ columns ${j}$ and values ${p_j}$ such that for all ${j}$:

• ${j \in S \implies |p_j - M[i,j]| \leq \epsilon}$, and

• ${j \notin S \implies M[i,j] \leq \epsilon}$.

It is important to use this for different values of ${\epsilon}$. The cost of a query ${(i,\epsilon) \rightarrow S}$ is ${\Theta(|S|) = O(1/\epsilon)}$.

If we picture “${N}$” as “exponential” and take ${\epsilon = 1/\mathsf{poly}(n)}$ where ${n = \log N}$, then this becomes an approximative version of ${M}$ being succinct, which we just talked about. In this scaling of ${\epsilon}$ we are effectively limiting to a ${\mathsf{poly}(n)}$ local portion ${S}$ of the graph around node ${i}$. Since we also have ${\frac{\alpha}{N} \ll \epsilon}$, under SARA entries outside ${S}$ would become effectively zero, so that the chance of “teleporting” outside ${S}$ on the whole would be regarded as negligible. In fact the paper also researches the case where each Web user always starts afresh at a “home node” ${u}$ in that portion, making ${M[i,u] = \alpha}$ just for that user. Then the ${\alpha}$-related probability is not negligible, and the resulting user-dependent estimate is called PersonalizedPageRank.

The problem they need to solve for SignificantPageRank then becomes “SignificantColumnSums”:

Given ${M}$ and ${\Delta,c}$ as above, find a set ${S}$ of columns such that for all columns ${j}$:

• ${\sum_i M[i,j] \geq \Delta \implies j \in S}$;

• ${\sum_i M[i,j] \leq \frac{1}{c}\Delta \implies j \notin S}$.

An even simpler problem which they use as a stepping-stone is “VectorSum”:

Given a length-${N}$ vector ${P}$ with entries in ${[0,1]}$, and ${1 \leq \Delta \leq N}$ and ${c > 1}$:

• output yes if ${{}\text{~}\sum_i P(i) \geq \Delta}$;

• output no if ${{}\text{~}\sum_i P(i) \leq \frac{\Delta}{c}}$, don’t-care otherwise.

The goal is always to avoid looking at all ${N}$ nodes or entries, but only an ${N^{1 - \gamma}}$ or so portion of them, where ${\gamma > 0}$. Thus the problem shift is necessitated by ${N}$ being huge. This isn’t my good-old-days idea of solving a problem, but can be called “${(\Delta,c)}$-solving” it.

## Multi-Level Sampling and Results

Ken and I are interested because these are similar to problems we have been encountering in our quest for more cases where quantum algorithms can be simulated in classical randomized polynomial time. Thus any new ideas are appreciated, and what catches our eye is a multi-level approximation scheme that exploits the requirement of SARA to work for different ${\epsilon}$. The situations are different, but we hope to adapt them.

The situation for VectorSum is that a probe ${(i,\epsilon) \rightarrow P(i)}$ still costs order-of ${1/\epsilon}$, and returns ${0}$ unless ${P(i) \geq \epsilon}$. A simple-minded use of a set ${S}$ of random probes for the same ${\epsilon}$ would yield the estimate

$\displaystyle \frac{N}{|S|} \sum_{i \in S: P(i) \geq \epsilon} P(i).$

The resulting error has order ${N\epsilon}$, so we need ${\epsilon \approx \frac{\Delta}{N}}$ which is rather demanding. Indeed the total cost would have order ${\frac{|S|N}{\Delta}}$, where ${S}$ needs to be so large as to kill any hope of making the cost ${N^{1 - \gamma}}$ or even ${o(N)}$. In the pivotal case where ${\sum_i P(i) = \Delta}$, we would need ${|S| \approx \frac{N}{\Delta}}$, incurring cost on the order of ${(\frac{N}{\Delta})^2}$.

However, they show that by using a different precision ${\epsilon_t \approx \frac{t}{|S|}}$ for each random probe, they can get acceptable error with a reasonably small number of probes. The case ${t = 1}$ where we have ${\epsilon \approx \frac{1}{|S|} = \frac{\Delta}{N}}$ occurs only once, so its ${\frac{N}{\Delta}}$ cost can be tolerated. Other probes have smaller cost, and while their precisions are looser, the aggregate precision on the estimate becomes good enough for the following result:

Theorem 1 Given ${P,c,\Delta}$ as above and ${\delta > 0}$, VectorSum can be ${(\Delta,c)}$-solved with probability at least ${1 - \delta}$ and cost

$\displaystyle O\left(\frac{N}{\Delta}\left(\frac{1}{c-1}\right)^2 \log\left(\frac{N}{\Delta(c-1)}\right)\log\left(\frac{2}{\delta}\right)\right).$

Well in the good old days before LaTeX we wouldn’t even have been easily able to write such a formula with a typewriter, let alone prove it. But it is certainly better than ${(\frac{N}{\Delta})^2}$, and allows taking ${\Delta = N^{\gamma}}$ to meet the goal of runtime ${\tilde{O}(N^{1 - \gamma})}$. As usual, for the details on SignificantColumnSums and the application problems, see the paper.

## Open Problems

Do you miss the good old days? Or do you like the current approaches? What shifts, what new types of changing the goals, might we see in the future? For clearly today will one day be the “good old days.”

[changed qualifier on "certain pathologies", may as well footnote here that the "Back" button creates a Markov Chain with memory.]

An approach to complexity lower bounds?

 Book source

Kurt Gödel did it all, succinctly. His famous 1938 paper “The Consistency of the Axiom of Choice and of the Generalized Continuum-Hypothesis” was ${1\frac{1}{3}}$ pages long. Since the Proceedings of the National Academy of Sciences was printed on single-column pages in fairy large type, it would have been under one page in FOCS or STOC format. Only Leonid Levin has famously been that succinct.

Today Ken and I wish to advance another approach on complexity questions based on succinctness and Gödel’s brilliant idea. We have covered succinctness before, but now our angle is more arithmetical.

Taking a conjecture about identities to college

Alex Wilkie is a Fellow of the Royal Society, and holds the Fielden Chair in Mathematics at the University of Manchester. Ken knew him at Oxford in 1981—1982 and again from 1986. In 1993 Wilkie won the Karp Prize of the Association of Symbolic Logic, equally with Ehud Hrushovski. This is named not after Dick Karp but rather Carol Karp, a mathematical logician in whose memory the prize was endowed.

Today I wish to talk about logical theories, motivated by some questions from Bill Gasarch and others.

Musings on gravity and quantum on the 4th of July

 Cropped from Crumbel source

Amanda Gefter is the author of the book titled Trespassing on Einstein’s Lawn: A Father, a Daughter, the Meaning of Nothing, and the Beginning of Everything. As the title suggests—and as Peter Woit pointed out in his review—there are parallels with Jim Holt’s book Why Does the World Exist?, which we covered a year ago. Gefter’s essay for Edge last year lays out more briefly how Einstein regarded relativity and early quantum as structurally divergent theories.

Today her book has led me to think further about the relationship between gravity and quantum mechanics. Read more…

Ann will be missed

Ann Yasuhara was a mathematician and a complexity theorist, who passed away this June 11th. She lived over half of her 82 years in Princeton, and was a member of the faculty of computer science at Rutgers since 1972.

Today Ken and I send our thoughts and condolences to Ann’s husband Mitsuru Yasuhara, her family, and her many friends—Ann will be missed—she was special.

Or rather two beautiful identities

Joseph Lagrange was a mathematician and astronomer who made significant contributions to just about everything. Yet like all of us he could make mistakes: he once thought he had proved Euclid’s parallel postulate. He wrote a paper, took it to the Institute, and as they did in those days, he began to read it. But almost immediately he saw a problem. He said quietly:

Il faut que j’y songe encore.

“I need to think on it some more.” He put his paper away and stopped talking.

Today Ken and I thought we would talk about a beautiful identity of Lagrange, not about the parallel axiom. Read more…

Another dichotomy theorem

Jin-Yi Cai is one of the world’s experts on hardness of counting problems, especially those related to methods based on complex—pun intended—gadgets. He and his students have built a great theory of dichotomy, which we covered two years ago. This means giving conditions under which a counting problem in ${\mathsf{\#P}}$ must either be in ${\mathsf{P}}$ or be ${\mathsf{\#P}}$-complete, with no possibility in-between. The theory is built around a combinatorial algebraic quantity called the holant, which arose in Leslie Valiant’s theory of holographic algorithms.

Today Ken and I wish to discuss a recent paper on the hardness of counting the number of edge colorings, even for planar graphs.