Generating Random DNA Sequences

Controls (values shown in this snapshot)

number of nucleotides: 300
GC content (as a fraction): 0.4767
known GC contents: Dog
sequence type: DNA or RNA

Sequence length s: 300bp

TCCCTTATTCAGTACATCAATTCTATGGATGGTGTGCCCGTTGTACATTATATGTGGTCCATTTGTGTTCTTCAAGCAACGTACCTCCCATGCCATAGAAGTGTCCAGTGGTGGGTACGAAATTAGGACACACATACTTAGTTTCTCACTTGATGGATAGGGACATACACCGCAATCCGGAATAATTTGGCTACGGGCAGTGTAACCCAAATCGGCGAGTGGGAACGTTGAATCGATTCTAGGTGAAATCTTACGATCCAGTTCGATTTTAGACTCGAATTATTATCGTCACAGAGGTAG

Shannon entropy: ~1.37683 nats per nucleotide (about 1.986 bits; the maximum for four letters is 2 bits, or ln 4 ≈ 1.386 nats)

K(s) ≈ 198 bits (estimated by lossless compression)

Compression ratio (Deflate): 0.66

AT/GC ratio: 1.29008

This Demonstration generates pseudorandom sequences over a four-letter alphabet, one letter per DNA nucleotide (A, C, G, T), with a chosen GC content, that is, the fraction of positions in the sequence occupied by G or C. Preset GC contents are available for well-known organisms, including many mammals and a few bacteria, ranging from 20% to almost 52%. Although the GC content of the human genome can vary from 35% to 60% from chromosome to chromosome, its average is 46.1%.
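The generation step can be sketched in Python. This is a minimal illustration, not the Demonstration's actual code: it assumes each position is drawn independently, that G and C share the GC probability equally, and that A and T share the remainder equally. The names `random_dna` and `gc_fraction` are hypothetical.

```python
import random


def random_dna(length, gc_content, rna=False, seed=None):
    """Return a pseudorandom nucleotide string whose expected GC fraction is gc_content.

    Assumption: positions are independent draws, with G/C and A/T(U) split evenly.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    letters = ["A", "U" if rna else "T", "G", "C"]
    at_weight = (1.0 - gc_content) / 2.0
    gc_weight = gc_content / 2.0
    weights = [at_weight, at_weight, gc_weight, gc_weight]
    return "".join(rng.choices(letters, weights=weights, k=length))


def gc_fraction(seq):
    """Fraction of positions in seq that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    seq = random_dna(300, 0.4767, seed=1)
    print(seq)
    print("observed GC fraction:", round(gc_fraction(seq), 4))
```

Under this sampling scheme the observed GC fraction only approximates the requested one for short sequences. The snapshot above is consistent with that: an AT/GC ratio of 1.29008 over 300 nucleotides corresponds to 131 G or C positions, a GC fraction of about 0.437 rather than exactly 0.4767.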

Several statistics and complexity estimates are provided: the Shannon entropy of the generated sequence; an estimate of its Kolmogorov–Chaitin complexity, obtained with Mathematica's Compress[] function, which applies the Deflate algorithm; the lossless compression ratio (also from Compress[]); and a histogram of the nucleotide distribution in the generated sequence. The RNA option simply replaces thymine (T) with uracil (U). The program generates sequences of up to 1 kbp (1000 base pairs). You can select the full sequence even if not all of it is displayed in the Demonstration window.
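The statistics described here can be approximated in Python as follows. This is a sketch under stated assumptions rather than a reimplementation: it uses `zlib` (a Deflate implementation) in place of Mathematica's Compress[], so the compressed sizes and ratios will generally differ from the Demonstration's figures, and a compressed length is only an upper-bound estimate of Kolmogorov–Chaitin complexity. All function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq, base=2.0):
    """Entropy of the letter frequencies in seq; base 2 gives bits, base e gives nats."""
    n = len(seq)
    return -sum((c / n) * math.log(c / n, base) for c in Counter(seq).values())


def deflate_stats(seq):
    """Return (compressed size in bytes, compressed size / original size) using zlib."""
    raw = seq.encode("ascii")
    compressed = zlib.compress(raw, 9)  # includes zlib's small header and checksum
    return len(compressed), len(compressed) / len(raw)


def at_gc_ratio(seq):
    """Count of A/T/U positions divided by count of G/C positions (needs at least one G or C)."""
    gc = seq.count("G") + seq.count("C")
    return (len(seq) - gc) / gc


def to_rna(seq):
    """Replace thymine (T) with uracil (U)."""
    return seq.replace("T", "U")


if __name__ == "__main__":
    s = "TCCCTTATTCAGTACATCAATTCTATGGATGGTGTGCCCG"  # first 40 letters of the sequence above
    print("entropy (bits):", shannon_entropy(s))
    print("entropy (nats):", shannon_entropy(s, math.e))
    print("deflate size and ratio:", deflate_stats(s))
    print("AT/GC ratio:", at_gc_ratio(s))
    print("RNA:", to_rna(s))
```

The entropy base is left as a parameter because the unit matters when comparing values: for a four-letter alphabet the maximum is 2 bits, equivalently ln 4 ≈ 1.386 nats.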