Re: pure chance

Brian D. Harper (harper.10@osu.edu)
Tue, 31 Dec 1996 00:53:47 -0500

At 07:52 PM 12/29/96 -0500, Gene wrote:

[...]

>Hi, I'm new on this list so a brief introduction first: My name is Gene
>Godbold and I'm doing postdoctoral research on the molecular biology of
>the enteric human parasite Entamoeba histolytica. My graduate work was in
>the biochemistry of lysosomal enzyme delivery (mammalian). I've been on
>the ASA listserver since it started. From memory, I think the above
>statement about the maintenance of the bacterial genome size is correct.
>The reason I have heard given is that duplicating a lot of extraneous DNA
>increases the doubling time and thus gives bacteria with weighty genomes a
>competitive disadvantage.
>

Welcome to the reflector!

[...]

>Gene:
>Why would a point mutation increase the information content? Sorry if I'm
>asking something that has been dealt with previously. How could it be
>said that an adenine, when changed to a guanine, increased or decreased
>the information content? I can see gene duplications as increasing
>content, but not point mutations.
>

If I'm reading between the lines correctly, I think your concern
about information content a la Shannon goes back again to the
"problem" that I keep referring to, namely that the Shannon
measure of "information" cannot address the meaning of a message.
I believe it's been a while since I gave the little anecdote
about how Shannon arrived at the name for his measure. Apparently
Shannon wanted to call it a measure of information content but
hesitated for fear of confusion, since "information" has so many
different meanings. He then discussed his little "problem" with
his friend von Neumann, who advised him to call it "entropy" for
two reasons:

"First, the expression is the same as the expression for
entropy in thermodynamics and as such you should not use
two different names for the same mathematical expression,
and second, and more importantly, entropy, in spite of
one hundred years of history, is not very well understood
yet and so as such you will win every time you use entropy
in an argument."

Though this is humorous, many have lamented Shannon's taking
von Neumann's advice, since it has led to much confusion between
info theory and thermodynamics. But I'm not sure the situation
would improve much by calling the Shannon measure "information"
instead of entropy, since the Shannon entropy is not really
measuring information as this word is used in everyday
conversation. This is where the word games I mentioned previously
come in. Some people pretend they are talking about information
theory but, when discussing the implications of the theory,
suddenly change the meaning of the word information.

Again, I'm reading between the lines, so please let me know if
I'm reading incorrectly. I think your concern is motivated by
interpreting "information" according to its more usual meaning.
Yockey suggests that anyone confused by this change the name
from "information" or "entropy" to something else, say "gloopy",
since the meaning of the term comes from the mathematics and not
from the name. So the answer to your question is that when one
has a point mutation from adenine to guanine, one takes the base
frequencies in the mutated gene and the unmutated gene, plugs
them into the formula (H = - SUM_i p_i log2 p_i, where p_i is
the frequency of base i), and then compares the two numbers to
see which is bigger. The point of the paper I cited was that for
the case of a point mutation, one can determine whether the
gloopy will increase or decrease knowing only the frequencies of
adenine and guanine in the unmutated gene.
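
To make the recipe concrete, here is a little Python sketch of
my own (not from the paper, and the sequences are made up purely
for illustration) that does the comparison just described:

import math

def shannon_entropy(seq):
    """H = -SUM_i p_i * log2(p_i), p_i = frequency of base i."""
    n = len(seq)
    # skip bases with zero count to avoid log2(0)
    freqs = [seq.count(base) / n for base in "ACGT" if base in seq]
    return -sum(p * math.log2(p) for p in freqs)

unmutated = "ATGGCAGTAACG"   # hypothetical gene
mutated   = "ATGGCGGTAACG"   # same gene after one A -> G change

h0 = shannon_entropy(unmutated)
h1 = shannon_entropy(mutated)
print(f"unmutated: H = {h0:.4f} bits/symbol")
print(f"mutated:   H = {h1:.4f} bits/symbol")
print("entropy", "increased" if h1 > h0 else "decreased")

Since an A -> G substitution only shifts counts between adenine
and guanine, only those two terms in the sum change, which is
why the sign of the change can be predicted from the A and G
frequencies in the unmutated gene alone.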

Well, I guess the next question is what does this mean? What
is the Shannon entropy really measuring and is it useful in any
way? I have a really hard time dealing with this question since
I know practically nothing about molecular biology. I think the
authors of the study I cited want to use Shannon entropy as
a measure for tracking evolution (since the entropy is expected
to increase during evolution). I think with your knowledge of
molecular biology you might be interested in reading Hubert
Yockey's book <Information Theory and Molecular Biology> for
a thorough treatment of the usefulness of information theory.
I also placed a lot of posts that Yockey made to the
bionet.info-theory newsgroup in the reflector archive. If you
need help finding them let me know. In any event, I would be
very interested in knowing your thoughts about applications of info
theory to molecular biology if you should want to delve into
it some.

Brian Harper
Associate Professor
Applied Mechanics
Ohio State University