# Benford's Law and Data Spread


Benford's law is the observation that for many datasets, the distribution of their first significant digit follows a nonuniform law given by:

$$P(\text{leading digit} = d) = \log_{10}\frac{d+1}{d}.$$

Thus the probability that the leading digit is 1 is about 30%, while the probability that 7 leads is only about 6%. An underlying reason is that when data spans many orders of magnitude, the deviations from the law within individual orders of magnitude tend to cancel out (see Details section). Datasets with large logarithmic spread will therefore naturally follow the law, while datasets with small spread will not.
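As a quick check of the percentages quoted above, the following minimal Python sketch evaluates the logarithmic formula for each digit from 1 to 9. The function name `benford_probabilities` is chosen here purely for illustration.

```python
import math


def benford_probabilities():
    """Return a dict mapping each leading digit d (1..9) to log10((d + 1) / d)."""
    return {d: math.log10((d + 1) / d) for d in range(1, 10)}


probs = benford_probabilities()
for d, p in probs.items():
    # Print each digit with its predicted probability as a percentage.
    print(f"digit {d}: {100 * p:.1f}%")
```

The nine values sum to 1 because the product of the ratios (d+1)/d for d = 1, ..., 9 telescopes to 10, whose base-10 logarithm is 1.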

This Demonstration shows a scatter plot of 130 datasets derived from the data on countries in Mathematica; each point in the scatter plot has the form (logarithmic spread, Benford deviation). Here the spread is computed by taking base-10 logarithms and eliminating extreme outliers; the Benford deviation is the norm of the vector difference between the observed digit frequencies and the Benford predictions, normalized to lie between 0 and 1. Below the scatter plot are plots of the raw distribution, the agreement of the digit probabilities with Benford's law, and the distribution of the base-10 logarithms of the data. Note that the scatter plot supports the explanation remarkably well: all properties with large spread have small Benford deviation, and all properties with small spread have large Benford deviation.
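The two coordinates of each point can be sketched in Python as follows. This is an illustration under stated assumptions, not the Demonstration's actual code: the text does not specify which norm, which normalization, or which outlier rule is used, so the sketch assumes the Euclidean norm, divides by the largest distance any frequency vector can have from the Benford vector (all mass on digit 9), and trims a hypothetical 5% of the sorted logarithms from each end. All function names are made up for this example.

```python
import math

# Benford predictions for digits 1..9.
BENFORD = [math.log10((d + 1) / d) for d in range(1, 10)]


def leading_digit(x):
    """First significant digit of a nonzero number, read off scientific notation."""
    return int(f"{abs(x):.15e}"[0])


def digit_frequencies(values):
    """Observed relative frequencies of leading digits 1..9 (zeros are skipped)."""
    counts = [0] * 9
    for x in values:
        if x != 0:
            counts[leading_digit(x) - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]


def benford_deviation(values):
    """Euclidean distance to the Benford vector, scaled into [0, 1].

    ASSUMPTION: the scale factor is the distance from the Benford vector to
    the farthest possible frequency vector (every value starting with 9).
    """
    freqs = digit_frequencies(values)
    dist = math.sqrt(sum((f - b) ** 2 for f, b in zip(freqs, BENFORD)))
    worst_case = [0.0] * 8 + [1.0]
    worst = math.sqrt(sum((w - b) ** 2 for w, b in zip(worst_case, BENFORD)))
    return dist / worst


def log_spread(values, trim=0.05):
    """Range of the base-10 logarithms after trimming both tails.

    ASSUMPTION: 'eliminating extreme outliers' is modeled by dropping a
    fraction `trim` of the sorted logarithms from each end.
    """
    logs = sorted(math.log10(abs(x)) for x in values if x != 0)
    k = int(len(logs) * trim)
    kept = logs[k:len(logs) - k] if k else logs
    return kept[-1] - kept[0]


# Powers of 2 span many orders of magnitude; 50..59 span almost none.
wide = [2 ** n for n in range(1, 200)]
narrow = list(range(50, 60))
for name, data in (("wide", wide), ("narrow", narrow)):
    print(name, round(log_spread(data), 3), round(benford_deviation(data), 3))
```

On these two synthetic datasets the wide one has a large spread and a small deviation, while the narrow one has a tiny spread and a deviation close to 1, which matches the pattern the scatter plot is described as showing.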