Re: Information: Brad's reply (was Information: a very

Brian D Harper (bharper@postbox.acs.ohio-state.edu)
Wed, 01 Jul 1998 23:43:55 -0400

At 10:50 AM 6/30/98 -0700, Greg wrote:
>Brian,
>
>Perhaps as a prelude we should briefly discuss where information theory
>is getting applied to genome-related activities, since the cases
>seem to be mounting. :-)
>
>1. The DNA is 'decoded' through the mRNA/ribosome/tRNA 'channel' to
>yield proteins which do stuff in the cell. Here the DNA is the
>information source, the codons are the symbols, and the decoding
>process is yielding the proteins according to (we model) an "ideal"
>process where genes map to proteins. There are all the bells and
>whistles of coding theory--errors, burst errors, redundancies, etc.
>(It should be remarked that although the DNA sets up tRNA enzymes
>and codes for many steps in its decoding, the system as a whole is
>required for the DNA to do this, and so it is probably oversimplistic
>to think of the DNA as a mathematically pure information source.)
>
>2. The DNA is copied into daughter sex cells which then may be
>the first cells in a child organism. The copying process here can
>a) introduce errors like transpositions, duplications, b) include
>errors introduced earlier in the parent strand (as in during
>DNA maintenance). The parent DNA can be modelled as the source,
>the copying process as the channel, and the child DNA as the
>consumer.
>
>3. Guessing at DNA of ancient species. The original DNA is the
>source, the intervening generations' DNA is the channel, and we're
>the consumers.
>

Do you mean "we" because we happen to be biological organisms, or
"we" as dispassionate observers trying to understand the process?

Your 5 cases have helped a lot. I guess I have tended to be
thinking along the lines of case 1.

It really bugs me thinking of us as ever being the receivers.
Unless one is very careful this could lead to some enormous
blunders. It seems to me that any certainty or lack thereof
that we might have has absolutely nothing to do with the
uncertainty in the actual physical process. To say otherwise
would seem to me to make info theory like quantum mechanics
wherein the observer can affect the outcome.

Or let's take this example. Suppose through strenuous efforts
we are able to determine that only 10^6 of the possible 10^120
paths available in info-space lead to a viable (meaningful) result.
So we conclude that evolution must be highly constrained. But
is it really? Many of these nonviable paths may become nonviable
only after they have been followed for a while. The process
of evolution will not share our foresight, and so many, many
more than the 10^6 paths may actually be available. I guess
what I'm getting at is that the "we are the receivers" model seems
fundamentally teleological. Why is it that these particular
paths are followed and not others? Because of some desirable
final cause.
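
To put a rough number on the hypothetical above, here is a small
sketch that expresses the 10^6-out-of-10^120 restriction in bits. It
assumes all paths are equally likely, which is my simplification for
the sake of the arithmetic and not a claim about the real process.
The variable names are made up for illustration.

```python
import math

# Hypothetical figures from the example above.
total_paths = 10**120   # all paths available in info-space
viable_paths = 10**6    # paths we judge viable with hindsight

# Assumption: every path is equally likely, so picking one path out of
# N takes log2(N) bits.
bits_total = math.log2(total_paths)
bits_viable = math.log2(viable_paths)

# Restricting ourselves to the viable set amounts to this many bits.
bits_constraint = bits_total - bits_viable

print(f"constraint: {bits_constraint:.1f} bits")
```

This works out to about 378.7 bits, but note that the number measures
our hindsight about which paths end well, not anything the process
itself has access to while it is following a path.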

<see how confused I can get myself :->

>4. Sequencing DNA strands. The target DNA is the source, the machine
>doing the sequencing is the channel, and we're the consumers.
>
>5. 'How many bits are in this DNA sequence.' The DNA is the source,
>we're the consumers, there is no channel (i.e. the channel is error-free
>and infinite-capacity).
>
>Other information measures can also be applied to DNA sequences, like
>compressibility and suchlike.
>

My personal favorite would be algorithmic complexity since it
doesn't suffer from some drawbacks of classical info theory.
For example, one can deal with individual sequences rather
than ensembles and one doesn't have to know the underlying
probability distribution or even if there is one, i.e. it's
an objective measure dealing only with the structure of the
sequence itself.
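
Algorithmic complexity itself is not computable, but the size of a
compressed copy gives an upper bound on it, up to a constant. As a
minimal sketch of the "individual sequence, no probability
distribution" point, the following uses Python's zlib as a stand-in
compressor. The function name and the two test sequences are my own
illustrations, and zlib is only a crude proxy.

```python
import random
import zlib

def compressed_bits_per_base(seq: str) -> float:
    """Compressed size in bits per symbol: a rough upper-bound proxy
    for the algorithmic complexity of this one sequence."""
    raw = seq.encode("ascii")
    return 8 * len(zlib.compress(raw, 9)) / len(raw)

rng = random.Random(0)  # fixed seed so the run is repeatable

# A highly patterned sequence and a pseudo-random one, same length.
repetitive = "ACGT" * 2500
shuffled = "".join(rng.choice("ACGT") for _ in range(10_000))

print("repetitive:", compressed_bits_per_base(repetitive))
print("shuffled:  ", compressed_bits_per_base(shuffled))
```

The patterned sequence compresses to a small fraction of a bit per
base, while the pseudo-random one stays near the 2 bits per base that
a four-letter alphabet needs. No ensemble or source model enters
anywhere; only the structure of each sequence does.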

[...]

>BH:===
>> isn't it more appropriate to consider ourselves disinterested
>> observers? The proteins are the receivers, how do all these
>> things reduce their uncertainty? This doesn't at first sight
>> seem objectionable wrt what Yockey did. The proteins do
>> "know" about functionally equivalent amino acids since
>> "know" is just a convenient way of talking about the chemistry
>> involved in the functional equivalence. I think. But I tend
>> to get more confused the more I think about this.
>

Greg:==
>I'm not quite sure I am understanding here. Do you mean that it
>depends on the location we look? That is, if we just look at the
>output of decoding--primary structure of a protein--the code is
>straightforward, but if we look farther--the 'tasks' of the
>protein in full tertiary structure, then it is clear that the DNA
>can't just make any old primary structure and live, and so we
>have a right to expect constraints on which sorts of primary structures
>will get produced in the first place.
>

I'm not quite sure how to answer this. I looked back in Yockey's
book and one interpretation he gives to the result he calculated
is as follows:

"... the genetic message must contain between 233 and 374 bits to
record the instructions to construct one of the molecules of
iso-1-cytochrome c in the high probability set." -- Yockey

Greg:==
>Here there is a bit of overlap between case 1 (decoding DNA) and
>5 (info content of DNA). That is, if we say, "Consider the case where
>DNA doesn't have to worry at all about producing viable conditions, it
>can make whatever proteins it wants." There may be other complex
>constraints (the intron folks are discovering some of them), but as
>a first pass, the information content is just the length of the sequence.
>OK, but we very seldom care about that. It is a neat process to observe,
>and it may be handy to know a lot about that process to make enzymes in
>biologically easy ways, and stuff like that, but it isn't really the
>captivating thing about the information in the DNA. The captivating
>thing is exactly the *meaning* of the information--it makes the cell
>habitable and alive. So there the expectations of what we demand from
>the sequence collide with the meaning we attribute to the information.
>
>This can happen more simply. If I demand that you send me English text,
>instead of random characters, I have a much more complicated source to
>deal with, and its entropy will be reduced (fairly dramatically), but it
>is at the cost of associating meaning with the output. I can try to get
>away from that by just specifying units of English, but that gets hairy.
>Do I specify words? Then I get messages like "globe from party headphones
>want straw elementary", which aren't English text. I have to specify
>sentences; but there are a very poorly defined number of sentences, and
>we quickly get to places where the grammar is marginal. What do you do
>with those?
>
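
The entropy drop Greg describes can be seen even at the crudest
level, single-character frequencies. Here is a rough sketch; the
sample sentence, the 27-symbol alphabet, and the function name are
illustrative choices of mine, and the estimate ignores grammar and
meaning entirely, so it understates the real reduction.

```python
import math
import random
import string
from collections import Counter

def char_entropy(text: str) -> float:
    """First-order entropy in bits per character, estimated from the
    character frequencies of this one text."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

alphabet = string.ascii_lowercase + " "   # 27 symbols
rng = random.Random(0)                    # fixed seed for repeatability

english = ("if i demand that you send me english text instead of random "
           "characters i have a much more complicated source to deal with")
noise = "".join(rng.choice(alphabet) for _ in range(5000))

print("english:", char_entropy(english))
print("noise:  ", char_entropy(noise))
print("maximum:", math.log2(len(alphabet)))
```

The random string sits just under the log2(27) maximum, and the
English sample comes in lower. Counting words or sentences instead of
characters would lower it further, which is exactly where the "what
counts as English" question starts to bite.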

I hope you won't mind my injecting a little humor at this point.
Your suggestion above reminded me of something Yockey wrote in
his book:

====
In Jonathan Swift's _Gulliver's Travels_, written in 1726, the
projectors, as they were called, at the Grand Academy of Lagado
put 'the instructions in the letters and words themselves' to
the useful purpose of writing learned works without the trouble
of mastering any of these difficult subjects. One of the
savants at that great institution of research and higher learning
had constructed a simple frame that recorded all the words in
the language of that country in all their several moods, tenses
and declensions. Students turned cranks, rearranging words in
random order, so that sentences might appear. The frame was
eagerly watched and when fragments of meaningful sentences
appeared, graduate students wrote them down. Thus a collection
of fragments of meaningful sentences was accumulated to be
collated later into works of philosophy, poetry, theology,
law and other learned matters of concern to the savants of
the Grand Academy. -- Yockey
=====

OK, so I guess this isn't what you had in mind :-). This is
typical Yockey. Fun to read even if you disagree with
everything he says. BTW, the savants at the Grand Academy
are Manfred Eigen, Maynard Smith, Richard Dawkins and
Sidney Fox ;-).

Brian Harper
Associate Professor
Applied Mechanics
The Ohio State University

"It appears to me that this author is asking
much less than what you are refusing to answer"
-- Galileo (as Simplicio in _The Dialogue_)