The central limit theorem

Take almost any source of randomness (a lopsided coin, a skewed heap, two separate peaks) and average a handful of its draws together. Do that over and over and plot the averages. However strange the shape you started with, as long as its variance is finite the averages pile into the same smooth bell as the handful grows: centred on the true mean, and narrower than the source by a factor of exactly the square root of how many you averaged. This is why the bell curve is everywhere.

What you're seeing

The top panel is the source: the distribution you're drawing single numbers from. Switch it and you get genuinely different shapes — flat, two hard spikes, a lopsided slope, a pair of humps. None of them is a bell.

The bottom panel collects averages. Each time you draw, the toy pulls n numbers from the source, averages them into one value, and drops that value into the histogram. The faint amber curve is the bell the theorem predicts before you start — and as the averages rain down, the bars climb to fit underneath it.
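The collecting loop described above can be sketched in a few lines of Python. This is only an illustration, not the toy's actual code: the source functions, `collect_averages`, `histogram`, the bin count and the seed are all hypothetical names and choices made for this sketch.

```python
import random
import statistics

# Hypothetical stand-ins for three of the toy's sources, all on [0, 1].
def uniform(rng):
    return rng.random()

def coin(rng):
    return 1.0 if rng.random() < 0.5 else 0.0

def skew(rng):
    return rng.random() ** 3

def collect_averages(source, n, trials, rng):
    """Draw n values from source, average them into one value, repeat `trials` times."""
    return [statistics.fmean(source(rng) for _ in range(n)) for _ in range(trials)]

def histogram(values, bins=20):
    """Count how many values fall into each of `bins` equal slices of [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts

if __name__ == "__main__":
    rng = random.Random(14)  # arbitrary seed, only for repeatability
    averages = collect_averages(skew, n=30, trials=5000, rng=rng)
    for count in histogram(averages):
        print("#" * (count // 25))  # crude text version of the bottom panel
```

Setting `n=1` reproduces the source itself, which is the easiest way to see that the loop adds no shape of its own.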

Now play with n, the number averaged together. At n = 1 an "average" is just one raw draw, so the bottom panel is a copy of the top one — spikes and all. Nudge n upward and two things happen at once. The pile centres up on the true mean, and it narrows: quadruple the draws-per-average and the spread halves, because the spread shrinks like one over the square root of n. Most strikingly, the lumps and lean smooth away — even the coin's two bare spikes melt into a clean bell by the time you're averaging twenty or thirty. The shape you started with stops mattering. That universality is the whole point: heights, measurement errors, sums of many small effects — average enough independent things and the Gaussian is what you get.
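The "quadruple the draws, halve the spread" claim is easy to check numerically. The sketch below assumes a fair 0-or-1 coin as the source (σ = ½); `spread_of_averages`, the trial count and the seed are hypothetical choices for this illustration.

```python
import random
import statistics

def spread_of_averages(n, trials, rng):
    """Standard deviation of `trials` averages, each of n fair 0-or-1 coin flips."""
    averages = [
        statistics.fmean(1.0 if rng.random() < 0.5 else 0.0 for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pstdev(averages)

if __name__ == "__main__":
    rng = random.Random(14)  # arbitrary seed, only for repeatability
    for n in (4, 16, 64):
        measured = spread_of_averages(n, trials=10000, rng=rng)
        predicted = 0.5 / n ** 0.5  # sigma / sqrt(n) with sigma = 1/2
        print(f"n={n:3d}  measured={measured:.4f}  predicted={predicted:.4f}")
```

Each step from 4 to 16 to 64 multiplies n by four, so the predicted spread halves each time (0.25, 0.125, 0.0625); the measured values should sit close to those, up to sampling noise.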

The rule, exactly. Draw n independent values from any distribution with mean μ and finite variance σ², and average them. The average X̄ has expected value μ and standard deviation σ⁄√n, and as n grows the distribution of X̄ approaches the normal bell N(μ, σ²⁄n), whatever shape you started from. Finite variance is the one catch (a heavy enough tail, like a Cauchy, breaks it).

Sources here, on [0,1]: uniform (σ=√(1/12)), a 0-or-1 coin (σ=½), a skew U³ (σ=√(9/112)), and twin triangular peaks (σ≈0.2574). The amber curve is that exact N(μ, σ²⁄n).

(Checked offline before shipping: across all four sources the collected averages land on μ, their spread tracks σ⁄√n to ~0.1%, and the skewness falls like 1⁄√n.) Counter-example, verified in Node.js: averaging does not always tame a distribution. For a Cauchy source, the mean of 30 draws is just as spread out as a single draw (no 1⁄√n shrink).
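The Cauchy counter-example can be reproduced with a short Python sketch (not the Node.js check mentioned above, which is not shown here). Because a Cauchy has no finite variance, the sketch measures spread by the interquartile range instead of the standard deviation; that choice, the inverse-CDF sampler and the names `cauchy`, `uniform` and `iqr_of_averages` are assumptions of this illustration.

```python
import math
import random
import statistics

def cauchy(rng):
    """One standard Cauchy draw via the inverse-CDF transform tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def uniform(rng):
    """One uniform draw on [0, 1), for comparison."""
    return rng.random()

def iqr_of_averages(source, n, trials, rng):
    """Interquartile range of `trials` averages of n draws each."""
    averages = [statistics.fmean(source(rng) for _ in range(n)) for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(averages, n=4)  # the three quartile cut points
    return q3 - q1

if __name__ == "__main__":
    rng = random.Random(14)  # arbitrary seed, only for repeatability
    for name, source in (("uniform", uniform), ("cauchy", cauchy)):
        single = iqr_of_averages(source, n=1, trials=20000, rng=rng)
        mean30 = iqr_of_averages(source, n=30, trials=20000, rng=rng)
        print(f"{name:8s} IQR of single draws={single:.3f}  IQR of 30-draw means={mean30:.3f}")
```

For the uniform source the 30-draw means should be several times narrower than single draws; for the Cauchy source the two interquartile ranges should come out about the same, since the mean of n standard Cauchy draws is itself standard Cauchy.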

Thought Toys · Exhibit 14.