Generating Random DNA Sequences
This Demonstration generates pseudorandom sequences of four letters, one for each DNA nucleotide (A, C, G, and T), according to a chosen GC content, which is the percentage of Gs and Cs in the sequence. Preset GC content values are available for well-known organisms, mostly mammals plus a few bacteria, ranging from 20% to almost 52%. Although human GC content varies from 35% to 60% from chromosome to chromosome, the average GC content of the human genome is 46.1%.
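The generation step can be sketched in Python as follows. This is a minimal illustration and not the Demonstration's own code: the function name `random_dna` and its parameters are hypothetical, and it assumes that each base is drawn independently, with G and C sharing the GC fraction equally and A and T sharing the remainder equally.

```python
import random


def random_dna(length, gc_content, rna=False, seed=None):
    """Return a pseudorandom nucleotide string of the given length.

    gc_content is a fraction between 0 and 1 (for example 0.461).
    Assumption: bases are independent; G and C are equally likely,
    and so are A and T (or U when rna=True).
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    rng = random.Random(seed)  # a seed makes the output reproducible
    fourth_base = "U" if rna else "T"
    alphabet = "GCA" + fourth_base
    at_content = 1.0 - gc_content
    weights = [gc_content / 2, gc_content / 2, at_content / 2, at_content / 2]
    return "".join(rng.choices(alphabet, weights=weights, k=length))


if __name__ == "__main__":
    seq = random_dna(60, 0.461, seed=1)
    observed = (seq.count("G") + seq.count("C")) / len(seq)
    print(seq)
    print(f"observed GC fraction: {observed:.3f}")
```

Because each base is drawn independently, the observed GC fraction of a short sequence fluctuates around the requested value rather than matching it exactly.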
Several statistics and complexity estimates are provided for the generated sequence: its Shannon entropy, an estimate of its Kolmogorov–Chaitin complexity obtained with Mathematica's Compress[] function (which applies the Deflate algorithm), the lossless compression ratio (also computed with Compress[]), and a histogram of the nucleotide distribution. The RNA option simply substitutes uracil (U) for thymine (T). The program generates sequences of up to 1 kbp (1,000 base pairs). You can select the full sequence even if not all of it is displayed in the Demonstration window.
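A rough Python analogue of these measures is sketched below. It is an illustration under stated assumptions, not the Demonstration's code: Python's `zlib` module (a Deflate implementation) stands in for Compress[], so the byte counts will not match Mathematica's output, and the compression ratio is taken here as compressed size divided by original size, which is only one common convention. The function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy in bits per symbol, from observed letter frequencies."""
    n = len(seq)
    counts = Counter(seq)
    # Sum of p * log2(1/p) over the symbols that actually occur.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def compressed_size(seq):
    """Size in bytes after Deflate compression (zlib, maximum level).

    Used as an upper-bound style estimate of Kolmogorov-Chaitin complexity.
    """
    return len(zlib.compress(seq.encode("ascii"), 9))


def compression_ratio(seq):
    """Compressed size divided by original size (assumed convention)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return compressed_size(seq) / len(seq)


if __name__ == "__main__":
    example = "ACGT" * 25  # a highly regular 100-letter sequence
    print("entropy (bits/symbol):", shannon_entropy(example))
    print("compressed size (bytes):", compressed_size(example))
    print("compression ratio:", round(compression_ratio(example), 3))
    print("nucleotide counts:", dict(Counter(example)))
```

The repetitive example has the maximum entropy for a four-letter alphabet because all letters are equally frequent, yet it compresses well. This is why a compression-based estimate complements the entropy figure: entropy sees only letter frequencies, while compression also detects repeated patterns.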