Hi,
I'd like to briefly go off on a tangent from the RE: Junk DNA thread, to
consider an information-theory assessment of DNA.
Entropy (information per unit, not in the sense of ``disorder'') and the
total amount of information in a set of objects are fundamentally
measures of a probability distribution. A uniform distribution has
maximum entropy (in an information-theory sense) and a set of objects
generated by sampling from a uniform distribution has maximum
``information'', given the number of samples.
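To make the point about uniform distributions concrete, here is a minimal sketch in Python. The function name shannon_entropy and the "skewed" probabilities are my own illustrative assumptions, not measured nucleotide frequencies, and the totals assume the samples are independent.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits per symbol of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely symbols: the maximum-entropy case for an alphabet of 4.
uniform = [0.25, 0.25, 0.25, 0.25]
# Hypothetical non-uniform distribution over the same four symbols.
skewed = [0.7, 0.1, 0.1, 0.1]

h_uniform = shannon_entropy(uniform)  # exactly 2 bits per symbol
h_skewed = shannon_entropy(skewed)    # strictly less than 2 bits per symbol

# Total information for a fixed number of samples, assuming independence.
n_samples = 1000
total_uniform = n_samples * h_uniform
total_skewed = n_samples * h_skewed

print(f"uniform: {h_uniform:.4f} bits/symbol, {total_uniform:.1f} bits total")
print(f"skewed:  {h_skewed:.4f} bits/symbol, {total_skewed:.1f} bits total")
```

For a fixed number of samples, any departure from the uniform distribution lowers both the per-symbol entropy and the total.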
Talking about the amount of information in DNA using information theory
is a way of modeling the sequence of nucleotides as samples drawn from a
probability distribution.
Doug Hayworth wrote:
[snip]
> The fact is, the genomes contain segments of DNA having a variety of
> functional relevance to the organisms bearing them.
[snip]
>
> Each of these types of regions with their associated functions comprise
> different constraints on the mutations which can occur without there being
> any effect on function/fitness of the organism. Within exons, variation in
> the third position of each codon has less effect than variation in the first
> or second position. Within introns, most nucleotide substitutions have no
> effect. And in intergenic spacers, large deletions or insertions of DNA
> have little or no effect.
>
> There are also many examples of duplications, and at a variety of
> levels.
[snip]
>
> Finally, within nongenic regions of the genome (and occasionally within
> genes, where they often have deleterious effects) there are also repetitive
> DNA elements of little functional relevance.
[snip]
>
> Finally, fairly closely related organisms are known to differ substantially
> in overall genome size, due almost entirely to differences in nongenic
> DNA. This is called the C-value paradox. Here is a relevant summary of
> the issues surrounding this phenomenon from _Molecular evolution_ by W-H Li
> (Sinauer Press, 1997):
[snip]
Doug's post clearly indicates that the ``probability distribution'' of
nucleotide sequences is exceedingly complex and not at
all uniform. The functional decoding mechanisms also influence how
compactly the information is encoded. As Doug's post indicates, this is
very complex as well.
Consequently, the ``amount of information'' in a DNA molecule is
radically less than the maximum information estimated by a uniform
distribution. Every time we discover a systematic mechanism that
interprets some aspect of DNA, the uniform probability distribution
becomes an even more untenable model.
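As a rough illustration of how discovered structure shrinks the estimate, the sketch below compares three estimates for the same toy sequence: the uniform-model maximum, an estimate from symbol frequencies alone, and an estimate that also conditions on the preceding symbol. The function names and the repeated "ACGT" string are hypothetical choices for illustration only, not real genomic data, and a first-order model is just one simple stand-in for the far richer mechanisms Doug describes.

```python
import math
from collections import Counter

def order0_entropy(seq):
    """Bits per symbol estimated from single-symbol frequencies only."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy(seq):
    """Bits per symbol given the preceding symbol, from adjacent-pair counts."""
    pairs = Counter(zip(seq, seq[1:]))   # counts of (previous, current)
    prev = Counter(seq[:-1])             # counts of each symbol as "previous"
    n = len(seq) - 1                     # number of adjacent pairs
    h = 0.0
    for (a, _b), c in pairs.items():
        # c / n is P(prev, cur); c / prev[a] is P(cur | prev)
        h -= (c / n) * math.log2(c / prev[a])
    return h

# Toy sequence: a perfectly repetitive pattern over a 4-letter alphabet.
seq = "ACGT" * 50

uniform_bits = 2.0 * len(seq)          # uniform model: 2 bits per symbol
h0 = order0_entropy(seq)               # frequencies are equal, so still 2 bits
h1 = conditional_entropy(seq)          # each symbol is determined by the last

print(f"uniform model:      {uniform_bits:.1f} bits")
print(f"order-0 estimate:   {h0 * len(seq):.1f} bits")
print(f"order-1 estimate:   {h1 * (len(seq) - 1):.1f} bits (after the first symbol)")
```

Here the symbol frequencies alone look uniform, yet once the dependence on the preceding symbol is modeled, the estimated information per symbol drops to zero. Real sequences are nowhere near this extreme, but the direction of the effect is the same.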
As you can see, I think attempts to employ information theory to measure
biological ``complexity'' are problematic, because they tend to radically
oversimplify the probability distribution and thus arrive at a much too
large estimate.
This may have implications for some intelligent design arguments. I'm
interested in your comments.
In Him,
Ed
-----------------------------------------------------------
Edward B. Allen, Ph.D.
Research Associate
Department of Computer Science and Engineering
Florida Atlantic University, Boca Raton, FL 33431, USA
Office: (561)297-3916
Fax: (561)297-2800
<edward.allen@computer.org>
This archive was generated by hypermail 2b29 : Wed Jul 05 2000 - 17:28:36 EDT