Re: Junk DNA and information theory

From: Edward Allen (allene@cse.fau.edu)
Date: Wed Jul 05 2000 - 17:28:34 EDT

    Hi,

    I'd like to briefly go off on a tangent from the ``Re: Junk DNA'' thread
    to consider an information-theoretic assessment of DNA.

    Entropy (information per unit, not in the sense of ``disorder'') and the
    total amount of information in a set of objects are fundamentally
    measures of a probability distribution. A uniform distribution has
    maximum entropy (in an information-theory sense) and a set of objects
    generated by sampling from a uniform distribution has maximum
    ``information'', given the number of samples.
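    As a rough illustration of the point above, here is a minimal Python
    sketch that computes the entropy of a uniform and a non-uniform
    distribution over four symbols. The function name and the skewed
    probabilities are made up for illustration, and the total assumes the
    samples are independent draws from the same distribution.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits per symbol of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # four equally likely symbols
skewed = [0.7, 0.1, 0.1, 0.1]       # hypothetical non-uniform distribution

h_uniform = shannon_entropy(uniform)  # maximum for four symbols: log2(4) bits
h_skewed = shannon_entropy(skewed)    # strictly smaller than h_uniform

# For independent samples, total information is samples times entropy.
n_samples = 1000
total_uniform = n_samples * h_uniform
total_skewed = n_samples * h_skewed

print(f"uniform: {h_uniform:.3f} bits/symbol, {total_uniform:.0f} bits total")
print(f"skewed:  {h_skewed:.3f} bits/symbol, {total_skewed:.0f} bits total")
```

    Any departure from the uniform case lowers the per-symbol entropy, and
    with it the total for a fixed number of samples.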

    Talking about the amount of information in DNA using information theory
    is a way of modeling the sequence of nucleotides as a probability
    distribution.

    Doug Hayworth wrote:
    [snip]
    > The fact is, the genomes contain segments of DNA having a variety of
    > functional relevance to the organisms bearing them.
    [snip]
    >
    > Each of these types of regions with their associated functions comprise
    > different constraints on the mutations which can occur without there being
    > any effect on function/fitness of the organism. Within exons, variation in
    > the third position of each codon has less effect than in the first or
    > second position. Within introns, most nucleotide substitutions have no
    > effect. And in intergenic spacers, large deletions or insertions of DNA
    > have little or no effect.
    >
    > There are also many examples of duplications, and at a variety of
    > levels.
    [snip]
    >
    > Finally, within nongenic regions of the genome (and occasionally within
    > genes, where they often have deleterious effects) there are also repetitive
    > DNA elements of little functional relevance.
    [snip]
    >
    > Finally, fairly closely related organisms are known to differ substantially
    > in overall genome size, due almost entirely to differences in nongenic
    > DNA. This is called the C-value paradox. Here is a relevant summary of
    > the issues surrounding this phenomenon from _Molecular evolution_ by W-H Li
    > (Sinauer Press, 1997):
    [snip]

    Doug's post clearly indicates that the ``probability distribution'' of
    nucleotide sequences is exceedingly complex and not at all uniform. The
    functional decoding mechanisms also influence how compactly the
    information is encoded. As Doug's post indicates, this is very complex
    as well.

    Consequently, the ``amount of information'' in a DNA molecule is
    radically less than the maximum information estimated by a uniform
    distribution. Every time we discover a systematic mechanism that
    interprets some aspect of DNA, the uniform probability distribution
    becomes an even more untenable model.
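    To make the gap concrete, here is a small Python sketch that estimates
    bits per symbol from a sequence in two ways: from single-symbol
    frequencies alone, and conditioned on the preceding symbol. The function
    names and the toy sequence are invented for illustration; it is not real
    DNA, and simple frequency-based estimates like these are unreliable on
    short sequences.

```python
import math
from collections import Counter

def zero_order_entropy(seq):
    """Bits per symbol estimated from single-symbol frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def first_order_entropy(seq):
    """Bits per symbol given the preceding symbol (conditional entropy)."""
    pair_counts = Counter(zip(seq, seq[1:]))
    prev_counts = Counter(seq[:-1])
    n_pairs = len(seq) - 1
    h = 0.0
    for (prev, _cur), c in pair_counts.items():
        p_pair = c / n_pairs           # probability of this adjacent pair
        p_cond = c / prev_counts[prev]  # probability of cur given prev
        h -= p_pair * math.log2(p_cond)
    return h

toy = "ACGT" * 50           # made-up, perfectly repetitive sequence
uniform_bits = math.log2(4)  # 2 bits per base under the uniform model
h0 = zero_order_entropy(toy)
h1 = first_order_entropy(toy)

print(f"uniform model:     {uniform_bits * len(toy):.0f} bits")
print(f"zero-order model:  {h0 * len(toy):.0f} bits")
print(f"first-order model: {h1 * len(toy):.0f} bits")
```

    In this toy case the symbol frequencies look uniform, so the first two
    estimates agree, but once the dependence on the previous symbol is
    modeled the estimate collapses. Each additional regularity captured by
    the model can only lower the estimate relative to the uniform one.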

    As you can see, I think attempts to employ information theory to measure
    biological ``complexity'' are questionable, because they tend to
    radically oversimplify the probability distribution and thus arrive at
    a much too large estimate.

    This may have implications for some intelligent design arguments. I'm
    interested in your comments.

    In Him,

    Ed
    -----------------------------------------------------------
    Edward B. Allen, Ph.D.
    Research Associate
    Department of Computer Science and Engineering
    Florida Atlantic University, Boca Raton, FL 33431, USA
    Office: (561)297-3916
    Fax: (561)297-2800
    <edward.allen@computer.org>


