Zipf's Law for Natural Languages

Zipf's law for natural languages states that the frequency of a word is inversely proportional to its rank in the frequency table, so the second most frequent word occurs about half as often as the most frequent one. The law was proposed in the first half of the twentieth century by George Kingsley Zipf, originally for the English language.
In symbols, f(t) ∝ 1/r(t), where f(t) is the number of occurrences of the term t, r(t) is its rank in the frequency table, and ∝ denotes proportionality.
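The quantities f(t) and r(t) can be computed directly from a text. The following is a minimal Python sketch, not the code behind this Demonstration; the function name rank_frequency, the regular-expression tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, term, count) tuples, most frequent term first."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return [(rank, term, count)
            for rank, (term, count) in enumerate(counts.most_common(), start=1)]


# Illustrative sample only; a real check needs a much longer text.
sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
for rank, term, count in table:
    # Under an exact Zipf law, rank * count would be the same for every term.
    print(rank, term, count, rank * count)
```

Tied counts receive consecutive ranks here, which is one common convention; other tie-breaking rules give slightly different tables.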
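Zipf's law is best checked by plotting occurrences against rank on a log-log plot. Taking logarithms of f(t) = C/r(t), for some constant C, gives log f(t) = log C − log r(t), so data that follow the law fall on a straight line of slope −1. How closely the points fit a linear model shows how good the approximation is.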
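The quality of the linear fit can also be measured numerically. The sketch below fits a least-squares line to the points (log rank, log occurrences) and reports the slope and the coefficient of determination. It is an illustration under stated assumptions: the function name loglog_fit and its max_rank parameter are hypothetical, and the input is synthetic data constructed to follow the law exactly.

```python
import math


def loglog_fit(counts, max_rank=None):
    """Least-squares line through the points (log rank, log count).

    counts: occurrence counts sorted in descending order (at least two).
    max_rank: optional cutoff; only the first max_rank counts are used.
    Returns (slope, intercept, r_squared).
    """
    counts = counts[:max_rank] if max_rank else counts
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared correlation; equals 1 when the points lie exactly on a line.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Synthetic counts following f = 1200 / r exactly (not real text data).
ideal = [1200 / r for r in range(1, 11)]
slope, intercept, r2 = loglog_fit(ideal)
print(slope, intercept, r2)
```

For counts that obey the law exactly, the slope is −1 up to rounding error and the intercept is log C. For real text, a slope near −1 together with an r_squared close to 1 indicates a good fit over the chosen range of ranks.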
Choose one of the texts in the Wolfram ExampleData collection, then choose the maximum rank of the terms to be considered. Mouse over the dots and labels in the plot to see more details.

Details

Reference: C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, Cambridge, MA: MIT Press, 1999.

Permanent Citation

Giovanna Roda
"Zipf's Law for Natural Languages"
http://demonstrations.wolfram.com/ZipfsLawForNaturalLanguages/
Wolfram Demonstrations Project
Published: September 9, 2024