Perhaps as a prelude we should briefly discuss where information theory
is getting applied to genome-related activities, since the cases
seem to be mounting. :-)
1. The DNA is 'decoded' through the mRNA/ribosome/tRNA 'channel' to
yield proteins which do stuff in the cell. Here the DNA is the
information source, the codons are the symbols, and the decoding
process is yielding the proteins according to (we model) an "ideal"
process where genes map to proteins. There are all the bells and
whistles of coding theory--errors, burst errors, redundancies, etc.
(It should be remarked that although the DNA codes for the tRNAs, the
enzymes, and many other steps in its own decoding, the system as a whole
is required for the DNA to do this, and so it is probably an
oversimplification to think of the DNA as a mathematically pure
information source.)
2. The DNA is copied into daughter sex cells which then may be
the first cells in a child organism. The copying process here can
a) introduce errors such as transpositions and duplications, or b) pass
along errors introduced earlier in the parent strand (as during
DNA maintenance). The parent DNA can be modelled as the source,
the copying process as the channel, and the child DNA as the
consumer.
3. Guessing at DNA of ancient species. The original DNA is the
source, the intervening generations' DNA is the channel, and we're
the consumers.
4. Sequencing DNA strands. The target DNA is the source, the machine
doing the sequencing is the channel, and we're the consumers.
5. 'How many bits are in this DNA sequence.' The DNA is the source,
we're the consumers, there is no channel (i.e. the channel is error-free
and infinite-capacity).
Other information measures can also be applied to DNA sequences, such
as compressibility.
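To make case 5 concrete, here is a minimal sketch in Python (my own toy,
not anything from the literature): it treats the strand as an i.i.d.
source of bases, estimates the zeroth-order entropy from the strand
itself, and pokes at compressibility with a general-purpose compressor.
The strand is made up, and real DNA has plenty of higher-order structure
this ignores.

from collections import Counter
from math import log2
import zlib

def entropy_per_base(seq):
    # Empirical (zeroth-order) Shannon entropy: -sum p*log2(p) over the
    # observed base frequencies.  The maximum is 2 bits/base for DNA.
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

seq = "ATGGCGTACGTTAGCATGCCATAGGCTAACGTTAGC"   # made-up strand
h = entropy_per_base(seq)
print(f"{h:.3f} bits/base, ~{h * len(seq):.1f} bits total")

# Compressibility as a crude information measure: the better a
# general-purpose compressor does, the less information per base.
# (zlib's header overhead swamps a strand this short; the comparison
# only starts to mean something on long sequences.)
print(f"zlib: {len(zlib.compress(seq.encode()))} bytes for {len(seq)} bases")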
[we discuss case 1 above]
> Having said that it seems to me that Yockey is only concerned
> with the decoding context as you suggest above.
OK.
> Now let me bring up a point that confuses me a great deal.
> Awhile back I spent some time looking at the controversy
> surrounding "meaning" in information theory. I think it's
> important to point out that it is not only creationists
> who are uncomfortable with this aspect of information theory.
Right. It is a major headache, because the meaning is typically
what all you non-engineers care about. :-) :-)
> Also, mistakes have been made at the highest levels, so to
> speak. For example, according to Yockey anyway, Manfred
> Eigen errs in this regard by trying to introduce _ad hoc_
> the idea of meaningful information in the development of
> his hypercycles. And again, according to Yockey, his
> hypercycles self-organize only as a result of this
> "meaningful" information in a similar but much more
> complicated way that Dawkins typing monkey always seems
> to find the target phrase :). Yockey is fond of saying things
> like "meaningful to who?" in order to make clear the teleological
> nature of "meaningful" information.
I haven't read much of him, but this could be a way to home in
on exactly who the consumer is, since that's important to figure
out when setting up the formalism.
> Moving on, I gather from your posts that you would side with
> Yockey on the point that information theory cannot address
> meaning. But here's where I get confused. The analog to
> "meaning" in a biological application is functionality or
> specificity. So, while Yockey spends a great deal of time
> emphasizing the point that information theory cannot address
> the meaning of a message, he also spends a great deal of time
> later on in his book getting an estimate for the amount of
> information needed to specify a molecule of iso-1-cytochrome c (icc).
> Here he uses available information on functionally equivalent
> sites (sites where more than one amino acid can be present and
> retain functionality) in order to significantly reduce the
> information content of icc. So, is Yockey contradicting himself?
> I really don't think so; however, I think I'll leave this as an
> open question for the time being so as to avoid dragging on too
> long.
Right. This is the sort of thing I tried to address in my other
post. The meaning of the biological system is related to our
quest to figure out 'how many bits in this strand,' since what
we're looking for is not just a synonym for length, but an idea
of the 'meaningful complexity' there, and where it is, and so forth.
Information theory is poorly prepared to help us out, although
perhaps creative genius will figure out a way to adapt it.
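To put a toy version of that bookkeeping on the table (the numbers are
made up, and Yockey's actual treatment of iso-1-cytochrome c weights the
amino acids by their frequencies and is much more careful): if k of the
20 amino acids preserve function at a site, and we pretend all 20 are
otherwise equally likely, then specifying a working residue there takes
log2(20) - log2(k) bits instead of the full log2(20), about 4.32 bits.

from math import log2

def site_info(k_equivalent, alphabet=20):
    # Bits needed to pick a *functional* residue at a site where
    # k_equivalent of the alphabet's amino acids all work equally well.
    return log2(alphabet) - log2(k_equivalent)

# Hypothetical three-site protein: one invariant site, one loosely
# constrained site, one site where anything goes.
sites = [1, 6, 20]
needed = sum(site_info(k) for k in sites)
unconstrained = len(sites) * log2(20)
print(f"{needed:.2f} bits with functional equivalence, "
      f"{unconstrained:.2f} bits without")

The functional-equivalence data shrink the per-site figure, which is
exactly the "reduction in information content" at issue above.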
> OK, now there were some things about your recent posts that
> were bugging me for reasons similar to the above. In
> particular your post "More on information, meaning, and
> biological application". For example, you talk about all
> these things that *we* know and how this reduces *our*
> uncertainty and thus the information. But this makes
> sense (to me anyway) only if *we* are the receivers. But
Right. I was speaking from my case 5 above ('how much info in
this strand'). No organism in the wild cares a hoot about DNA,
or even knows it exists, so it is not very meaningful to treat
organisms as if they're protective of it; they're not.
> isn't it more appropriate to consider ourselves disinterested
> observers? The proteins are the receivers, how do all these
> things reduce their uncertainty? This doesn't at first sight
> seem objectionable wrt what Yockey did. The proteins do
> "know" about functionally equaivalent amino acids since
> "know" is just a convenient way of talking about the chemistry
> involved in the functional equivalence. I think. But I tend
> to get more confused the more I think about this.
I'm not quite sure I understand here. Do you mean that it
depends on where we look? That is, if we just look at the
output of decoding--the primary structure of a protein--the code is
straightforward, but if we look farther--at the 'tasks' of the
protein in its full tertiary structure--then it is clear that the DNA
can't just make any old primary structure and live, and so we
have a right to expect constraints on which sorts of primary structures
will get produced in the first place.
Here there is a bit of overlap between case 1 (decoding DNA) and
5 (info content of DNA). That is, suppose we say, "Consider the case where
the DNA doesn't have to worry at all about producing viable conditions; it
can make whatever proteins it wants." There may be other complex
constraints (the intron folks are discovering some of them), but as
a first pass, the information content is just the length of the sequence.
OK, but we very seldom care about that. It is a neat process to observe,
and it may be handy to know a lot about that process to make enzymes in
biologically easy ways, and stuff like that, but it isn't really the
captivating thing about the information in the DNA. The captivating
thing is exactly the *meaning* of the information--it makes the cell
habitable and alive. So there the expectations of what we demand from
the sequence collide with the meaning we attribute to the information.
This can happen more simply. If I demand that you send me English text,
instead of random characters, I have a much more complicated source to
deal with, and its entropy will be reduced (fairly dramatically), but it
is at the cost of associating meaning with the output. I can try to get
away from that by just specifying units of English, but that gets hairy.
Do I specify words? Then I get messages like "globe from party headphones
want straw elementary", which aren't English text. I have to specify
sentences; but the set of valid sentences is very poorly defined, and
we quickly get to places where the grammar is marginal. What do you do
with those?
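A quick numerical illustration of that entropy reduction, using nothing
deeper than single-letter frequencies estimated from a small English
sample (so the estimate is crude; real English, with its longer-range
structure, is far more constrained, and Shannon put it near 1 bit per
letter):

from collections import Counter
from math import log2

sample = ("the quick brown fox jumps over the lazy dog and then "
          "wanders off to ponder the entropy of english text")
letters = [c for c in sample if c.isalpha()]
counts = Counter(letters)
n = len(letters)
h1 = -sum((c / n) * log2(c / n) for c in counts.values())

print(f"any letter equally likely : {log2(26):.2f} bits/letter")
print(f"single-letter estimate    : {h1:.2f} bits/letter")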
Similarly, we can consider DNA to be a source with the constraint that it
produce proteins that make the organism viable. That's what's interesting,
and it reduces the information content, but exactly where and how is
unclear. We'll have to speak the 'language of life' as well as or better
than we speak English before we can do as well there as we
can with compressing English text.
-Greg