[...]
> >> But with DNA, we have only 4 nucleotides and thus KNOW what we are
> >> restricted to. This should make the information content calculable. What
> >> am I missing here?
> >
> >The other 2,999,999,999 nucleotides in the sequence. :-)
>
> I think I must have miscommunicated here. The information content of a
> sequence is related to the ENTIRE sequence, all 2,999,999 of them. There
> are only 4 letters in the DNA alphabet. That was what I was meaning with
> the 4 nucleotides. I was NOT referring to a 4 nucleotide long DNA
> molecule. Go back and re read what I said in that light.
>
> I still don't think we need to know how many of the possible 3 billion long
> DNA chains yield life to calculate information.

I see what you mean, but here's the paragraph from Shannon again:

    We can think of a discrete source as generating the message, symbol
    by symbol. It will choose successive symbols according to certain
    probabilities depending, in general, on preceding choices as well
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    as the particular symbols in question. A physical system, or a
    mathematical model of a system which produces such a sequence of
    symbols governed by a set of probabilities, is known as a stochastic
    process.

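When the source has dependencies like this (i.e. not every symbol is
independent of every other symbol), it is too aggressive to calculate
the information as 2*length (2 bits per base). That figure is only an
upper bound, reached when all four bases are equally likely and each
one is independent of the rest. The genome has long-range
interdependencies, so its actual information content is lower.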
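
To make the point concrete, here is a rough sketch in Python. It is only
an illustration under stated assumptions: the sequence is a made-up toy
string, not real genomic data, and the function names are my own. It
compares the per-base entropy you get when treating bases as independent
with the entropy conditioned on the preceding base. Conditioning on one
neighbor captures only the shortest-range dependence, so for a real
genome it would still overestimate the information content.

```python
import math
from collections import Counter

def zero_order_entropy(seq):
    """Entropy in bits per symbol, treating every symbol as independent."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def first_order_entropy(seq):
    """Conditional entropy H(X_i | X_{i-1}) in bits per symbol."""
    pair_counts = Counter(zip(seq, seq[1:]))   # counts of (previous, current)
    prev_counts = Counter(seq[:-1])            # counts of the previous symbol
    n_pairs = len(seq) - 1
    h = 0.0
    for (prev, cur), c in pair_counts.items():
        p_pair = c / n_pairs                   # joint probability estimate
        p_cond = c / prev_counts[prev]         # P(current | previous) estimate
        h -= p_pair * math.log2(p_cond)
    return h

# Hypothetical toy sequence: all four bases equally frequent, but each
# base is completely determined by the one before it.
seq = "ACGT" * 250

h0 = zero_order_entropy(seq)    # what the 2*length calculation assumes
h1 = first_order_entropy(seq)   # what is left once the dependence is counted

print("independent-symbol estimate:", h0, "bits/base")
print("given the previous base:    ", h1, "bits/base")
```

For this toy string the independent-symbol estimate comes out at 2 bits
per base, while the estimate conditioned on the previous base is 0,
because nothing about the next base is left undetermined.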
-Greg