Re: pure chance

Brian D. Harper (harper.10@osu.edu)
Mon, 06 Jan 1997 09:45:28 -0500

At 03:52 PM 1/4/97 +0000, Glenn wrote:
>Brian wrote:
>
>>Yockey mentions exactly the type of situation you are discussing above.
>>
>> "The third position in eight of the familiar triplet codons in the
>> set space G^3 may vary indefinitely among the four nucleotides
>> in either DNA or RNA alphabets without changing the read off
>> amino acid. The specificity of these codons is defined by the first
>> two nucleotides"
>>
>>With respect to the authors' conclusions, it is obvious I think that a
>>mutation in the third position of these eight codons would not
>>change the information content regardless of the frequencies of
>>the codons involved.
>
>This is a nitpick but a change in the third position of the codon would not
>change the information content of the protein but it could change the
>information content of the DNA depending upon codon frequency.
>
>Is this not correct?
>

This is an interesting point and more than a nitpick, I think, since
it involves the fundamental question of exactly what information is
being measured. At the beginning of this thread, Greg made an
extremely important point. Before applying info-theory to biology
one needs to check the underlying assumptions in info-theory to
see if they are appropriate to applications in biology. One critical
aspect of this is the analogy between the standard communication
system and the genetic information processing system. In fancy
terminology, are these two systems isomorphic? Yockey spends
some time on this important point, because if the two systems are
not isomorphic, the application of info-theory to biology is contrived,
_ad hoc_.

Two important elements in the standard communication system are
the source and receiver. In the genetic info system, the roles of
source and receiver are played by the DNA and the protein. One
objection that might be made here is that the source and receiver
are employing different alphabets, codons at the source and amino
acids at the receiver. But it's OK to have different alphabets at
source and receiver provided there is a code linking the two.

It is this association between source and receiver that I was
thinking about when making the above statements. Thinking
about the communication system as a whole, I don't think it's
appropriate to consider the message sent at the source independent
from what is being received. The content of the message sent and
received has to do with amino acid sequence. Returning to the
example above, a mutation in the third position of eight codons
will not change the message received.
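
As a minimal sketch of the source/receiver point, assuming the
standard genetic code and an RNA alphabet, the following Python
fragment reads off only the eight codon families whose first two
nucleotides fix the amino acid. The names FOURFOLD, read_off and
translate, and the example sequences, are made up for illustration.

```python
# The eight codon families of the standard genetic code in which the
# first two nucleotides alone determine the amino acid (RNA alphabet).
FOURFOLD = {
    "CU": "Leu", "GU": "Val", "UC": "Ser", "CC": "Pro",
    "AC": "Thr", "GC": "Ala", "CG": "Arg", "GG": "Gly",
}

def read_off(codon):
    """Amino acid for a codon in one of the eight families, else None.

    Illustrative only: codons outside these families are not handled.
    """
    return FOURFOLD.get(codon[:2])

def translate(rna):
    """Message at the receiver: one amino acid per complete codon."""
    usable = len(rna) - len(rna) % 3
    return [read_off(rna[i:i + 3]) for i in range(0, usable, 3)]

original = "GGUCCAGCU"  # hypothetical source message
mutated = "GGCCCAGCU"   # third position of the first codon: U -> C

print(translate(original))
print(translate(mutated))
```

The two source sequences differ, yet both are read off as the same
amino acid sequence, so the message at the receiver is unchanged.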

I think one thing that may have added some confusion in this
discussion is that I have been inadvertently mixing terminology
from classical information theory (Shannon) and the closely
related but different algorithmic information theory (Kolmogorov,
Chaitin, Solomonoff). In AIT, information content is related to
compressibility in that the information content is proportional
to the length of the shortest algorithm which reproduces the
original sequence.

One "problem" with this definition is that there is an inherent
uncertainty: one can never be sure that one has not missed
some principle for compressing the message. Expressing the
change in info-content in terms of the frequencies of the codons,
as proposed by the authors of the study we are discussing,
seems to me to recognize no principles of compression. How
would we go about recognizing a compression method for DNA?
One way might be to recognize that the "message" in DNA involves
amino acid sequences in protein. If we realize this then the
original DNA sequence can be compressed according to the
observation that the last position in eight codons is irrelevant.
Thus the "information" in the uncompressed DNA might change
due to a point mutation in the third position of these codons,
but the compressed length (and thus the algorithmic information
content) would not change.
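
The contrast can be illustrated with a toy Python sketch. It makes two
loud assumptions: the frequency-based measure is taken to be the
Shannon entropy of the observed codon frequencies (a stand-in, not
necessarily the measure used in the study), and "compression" is
simply dropping the third nucleotide of the eight codon families (a
stand-in for algorithmic information content, which cannot be computed
exactly). All names and sequences are hypothetical.

```python
from collections import Counter
from math import log2

# First-two-nucleotide prefixes of the eight codon families whose third
# position does not affect the amino acid (standard code, RNA alphabet).
FOURFOLD_PREFIXES = {"CU", "GU", "UC", "CC", "AC", "GC", "CG", "GG"}

def codons(rna):
    """Split a sequence into complete codons, ignoring any remainder."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]

def codon_entropy(rna):
    """Shannon entropy (bits per codon) of the observed codon frequencies."""
    counts = Counter(codons(rna))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def compress(rna):
    """Drop the third nucleotide wherever it is irrelevant to the protein."""
    return [c[:2] if c[:2] in FOURFOLD_PREFIXES else c for c in codons(rna)]

original = "GGUGGUCCA"  # codons GGU GGU CCA
mutated = "GGUGGCCCA"   # third position of the second codon: U -> C

print(codon_entropy(original), codon_entropy(mutated))
print(compress(original) == compress(mutated))
```

In this toy case the frequency-based number changes because the
mutation alters the codon counts, while the compressed description
is identical before and after the mutation.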

Brian Harper
Associate Professor
Applied Mechanics
Ohio State University