WOLFRAM|DEMONSTRATIONS PROJECT

Random Character Sequences Do Not Follow Zipf's Law

[Interactive controls: number of different letters for random sequence, N (shown set to 2); text: Origin of Species, Alice in Wonderland, Hamlet, Faust I, US Constitution. Plot series: random sequence, real text.]
This Demonstration shows that word frequencies [1] in random character sequences and in real texts behave differently with respect to Zipf's law. (For random character sequences, a word means the smallest unit separated by blanks.) Data exhibiting Zipf-like behavior shows a roughly linear relationship between frequency and rank on a log-log plot.
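The log-log criterion described above can be checked numerically. The following is a minimal Python sketch, not the Demonstration's own code: the function names rank_frequency and loglog_slope are hypothetical, and splitting on whitespace is an assumed reading of "separated by blanks" (runs of blanks are treated as a single separator). A slope that stays roughly constant across ranks suggests Zipf-like behavior.

```python
import math
from collections import Counter

def rank_frequency(text):
    """Return word counts sorted from most to least frequent (rank 1 first).

    Assumption: words are the non-empty units obtained by splitting on whitespace.
    """
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Expects at least two positive frequencies, ordered by rank.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For example, loglog_slope(rank_frequency(some_text)) gives a single summary slope for a text held in the (hypothetical) variable some_text. A single fitted slope can hide steps or curvature, so inspecting the full log-log plot remains the more informative check.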
We consider only one random sequence model: all characters, including the blank (space), are equally likely. This model is specified by a single parameter, N, the number of characters other than the space. The values N ∈ {2, 4, 6, 26} were used in [2] (as mentioned in [1]). In this Demonstration, you can select N between 2 and 26.
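The random sequence model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Demonstration's implementation: the names random_sequence and word_counts, the use of lowercase Latin letters as the alphabet, and the seed parameter are all hypothetical choices made here.

```python
import random
import string
from collections import Counter

def random_sequence(n_letters, length, seed=None):
    """Draw `length` characters uniformly from n_letters letters plus the space.

    Assumption: the letters are the first n_letters lowercase Latin letters.
    """
    if not 1 <= n_letters <= 26:
        raise ValueError("n_letters must be between 1 and 26")
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase[:n_letters] + " "
    return "".join(rng.choice(alphabet) for _ in range(length))

def word_counts(sequence):
    """Count words, taken as the non-empty units separated by blanks."""
    return Counter(sequence.split())

if __name__ == "__main__":
    counts = word_counts(random_sequence(2, 100_000, seed=1))
    # Print the ten most frequent "words" with their counts.
    for word, count in counts.most_common(10):
        print(word, count)
```

Because every character is equally likely, each position is a space with probability 1/(N+1), so small N yields many short words and large N yields fewer, longer ones.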