Thought Toys · Strategy & computation · Exhibit 24

PageRank & the random surfer

Imagine someone clicking links at random, forever, with no idea what any page says. Where do they end up spending their time? That long-run fraction is a page's rank — the idea that launched Google. Release the surfer and watch a tangle of links sort itself into an order. Then turn one dial and watch dead-end pages try to hoard it all.

A web of eight pages joined by arrows; a glowing token hops along the links and the pages swell as they collect more visits.

Release the surfer, or solve it instantly. Each page's size = its share of the surfer's time.

What you're seeing

Eight pages, joined by arrows — an arrow from one page to another means it links there. The middle page, C, is the hub: nearly everything in the cluster links to it. Off to the right, a thin tail leaks into two pages, G and H, that only link to each other — a dead end.

Now drop a surfer onto a random page. Each tick they either follow a random link from where they stand, or — with a small chance — get bored and teleport to a page picked at random anywhere. Run it long enough and the fraction of time spent on each page stops wobbling and settles. That settled fraction is PageRank. Watch the circles: they grow toward exactly the share of visits each page earns, and the well-linked hub C floats to the top — not because we counted its links, but because important pages keep sending the surfer its way.

The dial is the whole story. Damping d is the chance of following a link instead of teleporting. Slide it to 1 — never teleport — and watch the surfer get sucked into G and H and never come back: the dead-end pair bounces the surfer between them forever and swallows all the rank, while the real web starves to zero. That's why a pure link-follower is broken. Pull d back down and the occasional teleport yanks the surfer out of the trap, and a sensible ranking returns. Google's famous setting is 0.85: follow links most of the time, but jump free often enough to escape the dead ends. (Notice even at 0.85 the reciprocal pair rides high — a hint of why link-farms work, and why search engines fight them.)

The rule, exactly. Rank is the stationary distribution of the walk — the eigenvector the surfer's long-run time converges to — r = d · M r + (1−d)⁄N where M sends each page's vote out split evenly across its links (column j sums to 1), and the second term is the teleport. Verified in node (improve/verify/24-pagerank.js): M is column-stochastic; power iteration converges to the same vector from any start (a unique stationary distribution, sums to 1); a 2-million-step random walk matches that vector to within 0.01 — so rank really is "time spent"; and at d = 1 the pair {G, H} holds 100% of the rank, which 0.85 bleeds back down to 36%. Counter-example: forget to split each page's vote by its number of links (use raw link counts) and the total "rank" stops adding to 1 — it balloons without bound, so the numbers mean nothing. Dividing by out-degree is what makes it a probability.

← the cabinet · Thought Toys — a cabinet of explorable explanations. Exhibit 24.