At 10:20 AM 6/30/98 -0700, Greg Billock wrote:
>> >> But with DNA, we have only 4 nucleotides and thus KNOW what we are
>> >> restricted to. This should make the information content calculable.
>> >> What am I missing here?
>> >
>> >The other 2,999,999,999 nucleotides in the sequence. :-)
>>
>> I think I must have miscommunicated here. The information content of a
>> sequence is related to the ENTIRE sequence, all 3 billion nucleotides. There
>> are only 4 letters in the DNA alphabet. That was what I was meaning with
>> the 4 nucleotides. I was NOT referring to a 4 nucleotide long DNA
>> molecule. Go back and reread what I said in that light.
>>
>> I still don't think we need to know how many of the possible 3 billion long
>> DNA chains yield life to calculate information.
>
>I see what you mean, but here's the paragraph from Shannon again:
>
> We can think of a discrete source as generating the message, symbol
> by symbol. It will choose successive symbols according to certain
> probabilities depending, in general, on preceding choices as well
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> as the particular symbols in question. A physical system, or a
> mathematical model of a system which produces such a sequence of
> symbols governed by a set of probabilities, is known as a stochastic
> process.
>
>When the system has dependencies like this (i.e. not every symbol is
>independent of every other symbol), it is too aggressive to calculate
>the information as 2*length bits (2 bits per base). The reason is that
>there are long-range interdependencies in the genome.

I think I see where we differ. The interesting thing about biological
systems is that there are NO intersymbol dependencies. Thus, while Shannon
is obviously talking about a Markov transition matrix in which the
symbols DO depend on preceding ones, and indeed in language the symbols do
depend on previous choices, in biological systems this seems not to be the
case. Quoting Yockey:

"Intersymbol influence is an important repository of redundant information
in written languages. In spite of considerable search of the protein text
no intersymbol influence has been found. This source of redundance will
have to be ignored until its magnitude and statistical structure is
discovered-if it exists at all." ~ H. P. Yockey, "An Application of
Information Theory to the Central Dogma and the Sequence Hypothesis,"
Journal of Theoretical Biology, 46(1974):369-406, p. 384

I believe this still holds, although I can't put my hand on a citation at
this moment. Are we still in disagreement?

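
To make the point of disagreement concrete, here is a minimal sketch in
Python of the two quantities involved. It runs on toy sequences, not on
real genome or protein data, and the function names are purely
illustrative. The zero-order entropy treats every symbol as independent
(at most 2 bits per base for a 4-letter alphabet), while the first-order
conditional entropy is what a Markov source with intersymbol dependence
would give. If there is no intersymbol influence, the two estimates should
come out about the same; if there is, the conditional figure drops.

```python
import math
import random
from collections import Counter

def entropy_per_symbol(seq):
    """Zero-order Shannon entropy in bits per symbol (symbols treated as independent)."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy_per_symbol(seq):
    """First-order estimate of H(X_i | X_{i-1}) in bits per symbol."""
    pairs = Counter(zip(seq, seq[1:]))   # counts of adjacent symbol pairs
    prev = Counter(seq[:-1])             # counts of the first symbol of each pair
    n = len(seq) - 1
    h = 0.0
    for (a, _b), c in pairs.items():
        p_pair = c / n                   # estimated joint probability of the pair
        p_next_given_prev = c / prev[a]  # estimated transition probability
        h -= p_pair * math.log2(p_next_given_prev)
    return h

# Toy case 1: each base fully determines the next one (strong dependence).
dependent = "ACGT" * 25

# Toy case 2: bases drawn independently and uniformly (assumed, for illustration).
rng = random.Random(0)
independent = "".join(rng.choices("ACGT", k=10000))

for name, seq in [("dependent", dependent), ("independent", independent)]:
    h0 = entropy_per_symbol(seq)
    h1 = conditional_entropy_per_symbol(seq)
    upper_bound = math.log2(4) * len(seq)  # the "2*length" figure, in bits
    print(f"{name}: H0={h0:.3f} H1={h1:.3f} bits/base, upper bound={upper_bound:.0f} bits")
```

In the first toy sequence the base frequencies alone suggest 2 bits per
base, yet the conditional entropy is zero because each base is fixed by
the one before it. In the second, the two estimates nearly agree, which is
the situation Yockey describes for protein text.
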
glenn
Adam, Apes and Anthropology
Foundation, Fall and Flood
& lots of creation/evolution information
http://www.isource.net/~grmorton/dmd.htm