Benford’s Law

Benford’s law is an empirical law in statistics that states that the leading significant digit of numerical data in real life is likely to be small.
June 19, 2017—Matthew Chen

Discovering Benford’s Law

In statistics, not a lot of attention is paid to the first digits of numbers. They seem so simple that it’s meaningless to study them.
Make a histogram of the first digits of the first n natural numbers, with n ranging from 10,000 to 100,000:
In[]:=
Manipulate[Histogram[Array[IntegerDigits[#][[1]]&,n],9,"Probability"],{n,10000,100000,1000}]
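The pattern may look unfamiliar, but it is easy to explain: the integers from 1 to 9,999 are spread evenly over the nine leading digits, and every further block of 10,000 integers shares a single leading digit, starting with 1. Random integers drawn from the same range behave similarly.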
Make a histogram of the first digits of 10,000 random integers up to n, with n ranging from 10,000 to 100,000:
In[]:=
Manipulate[Histogram[DeleteCases[Table[IntegerDigits[RandomInteger[n]][[1]],10000],0],9,"Probability"],{n,10000,100000,1000}]
This seems like a boring topic. Let’s take a look at some meaningful sequences, starting from the primes.
Make a histogram of the first digits of the first n primes, with n ranging from 10,000 to 100,000:
In[]:=
Manipulate[Histogram[Array[IntegerDigits[Prime[#]][[1]]&,n],9,"Probability",LabelingFunction->(Placed[Row[{Round[100#,0.01],"%"}],Above]&)],{n,10000,100000,1000}]
It seems that the first digit of primes is slightly more likely to be small, but the difference is not obvious. But what about the Fibonacci numbers?
Make a histogram of the first digits of the first n Fibonacci numbers, with n ranging from 1,000 to 10,000:
In[]:=
Manipulate[Histogram[Array[IntegerDigits[Fibonacci[#]][[1]]&,n],9,"Probability",LabelingFunction->(Placed[Row[{Round[100#,0.01],"%"}],Above]&)],{n,1000,10000,100}]
Whoa, wait a second. Why is this so stable, and what is this weird pattern? Let’s try another sequence, say, the factorials.
Make a histogram of the first digits of the first n factorials, with n ranging from 100 to 1,000:
In[]:=
Manipulate[Histogram[Array[IntegerDigits[#!][[1]]&,n],9,"Probability",LabelingFunction->(Placed[Row[{Round[100#,0.01],"%"}],Above]&)],{n,100,1000,10}]
The same stable pattern shows up again. This is getting interesting. What about powers of 2?
Make a histogram of the first digits of the first n powers of 2, with n ranging from 1,000 to 10,000:
In[]:=
Manipulate[Histogram[Array[IntegerDigits[2^#][[1]]&,n],9,"Probability",LabelingFunction->(Placed[Row[{Round[100#,0.01],"%"}],Above]&)],{n,1000,10000,100}]
This is too good to be a coincidence. And yet this is not the end of the story. Now let’s move on to some data from real life, starting with the populations of countries.
Make a histogram of the first digits of the populations of all the countries in the world:
In[]:=
HistogramIntegerDigits[QuantityMagnitude[#]][[1]]&/@
all countries, dependencies, and territories
COUNTRIES

population
,9,"Probability",LabelingFunction(Placed[Row[{Round[100#,0.01],"%"}],Above]&),ImageSizeMedium
The distribution is generally similar. What about the total areas of countries?
Make a histogram of the first digits of the total areas of all the countries in the world in various units:
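One way to produce such histograms is sketched here; it is not necessarily the original code. It assumes that `"Country"` entities expose an `"Area"` property, and the helper `firstDigit` and the three units chosen are illustrative only.

```wolfram
(* Total areas of all countries, dropping any missing values *)
areas = DeleteMissing[EntityValue[EntityClass["Country", "Countries"], "Area"]];

(* Hypothetical helper: leading significant digit of a positive number *)
firstDigit[x_] := First[First[RealDigits[x]]];

(* One histogram per unit; the units listed are an arbitrary choice *)
Table[
 Histogram[
  firstDigit /@ QuantityMagnitude[UnitConvert[areas, u]],
  {1, 10, 1}, "Probability", PlotLabel -> u],
 {u, {"Kilometers"^2, "Miles"^2, "Acres"}}]
```

`RealDigits` is used instead of `IntegerDigits` because converted areas are generally not integers. The bin specification `{1, 10, 1}` gives one bin per leading digit.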
Does this pattern apply to any set of data? Let’s take a look at the heights of the tallest structures in the world.
Make a histogram of the first digits of the heights of the top 1,000 tallest structures in the world in various units:
Although the general trend holds in some histograms, most diverge considerably from the pattern that we saw previously. The final example is the lengths of the longest rivers in the world.
Make a histogram of the first digits of the lengths of the top 1,000 longest rivers in the world in various units:
Curiously, the pattern seems to return. Is there a reason behind all this? The answer is Benford’s law.

Introducing Benford’s Law
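
In its usual form, Benford’s law predicts that a leading digit d (from 1 to 9) occurs with probability log10(1 + 1/d), which is roughly 30.1% for 1 and falls to roughly 4.6% for 9. The sketch below compares that prediction with the Fibonacci histogram from earlier; the variable names `benford`, `digits` and `observed` are illustrative, and the sample size of 10,000 is an arbitrary choice.

```wolfram
(* Benford's predicted probability for each leading digit d = 1, ..., 9 *)
benford = N[Table[Log10[1 + 1/d], {d, 9}]];

(* Observed leading-digit frequencies of the first 10,000 Fibonacci numbers *)
digits = Array[IntegerDigits[Fibonacci[#]][[1]] &, 10000];
observed = N[Table[Count[digits, d], {d, 9}]/Length[digits]];

(* Side-by-side comparison: digit, predicted, observed *)
TableForm[
 Transpose[{Range[9], benford, observed}],
 TableHeadings -> {None, {"digit", "Benford", "Fibonacci"}}]
```

The nine predicted probabilities sum to 1, because the product of the factors (1 + 1/d) for d from 1 to 9 telescopes to 10.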

Explaining Benford’s Law

FURTHER EXPLORATIONS
A more general form of Benford’s law: Zipf’s law
AUTHORSHIP INFORMATION
Matthew Chen
6/19/17
yuanzhec@andrew.cmu.edu