re: pure chance

Brian D. Harper (harper.10@osu.edu)
Thu, 19 Dec 1996 22:56:40 -0500

At 09:56 PM 12/18/96 +0000, Glenn wrote:
>Brian Harper wrote:
>
>>
>>I also found in my archives a post to t.o from some time
>>ago which is more along the lines of Glenn's original
>>comment. Unfortunately, I do not know who posted this, so I
>>can't give appropriate credit:
>
>[snip]
>
>>                                   Genome size   Coding   Coding genome (1)
>>                                    bp x 10E9    DNA %       bp x 10E6
>>
>>Bacteria (E. coli)                     0.004      100            4
>>Yeast (Saccharomyces)                  0.009       70            6
>>Nematode (Caenorhabditis)              0.09        25           23
>>Fruitfly (Drosophila)                  0.18        33           59
>>Newt (Triturus)                       19.0          1.5-4.5    285-855
>>Human                                  3.5          9-27       315-945
>>Newt Gingrich (human salamander)       3.5          1.5         53 (2)
>>Lungfish (Protopterus)               140.0          0.4-1.2    560-1680
>>Flowering plant (Arabidopsis)          0.2         31           62
>>Flowering plant (Fritillaria)        130.0          0.02        26
>>
>>
>>Notes: "Genome sizes" appear to be for haploid genomes.
>> Not in original paper:
>> 1) "Coding genome" = Genome size * % Coding DNA.
>> 2) A best guess.
>>=======end talk.origins excerpt====================================
>
>Brian, Thanks for this. My gut told me that humans probably didn't have the
>longest genome, but I didn't have the info. I appreciate this.
>
>However, it is humbling to be less complex than the newt.
>

Is that the newt or Newt? :-)

Seriously, though, I think you may be jumping the gun a little here.
If I were to take any of the above as an indicator of complexity, it
would probably be the last column, the size of the coding genome.
By that measure humans are roughly the same complexity as newts;
the real winner, though, is the lungfish. When we get a surprising
result like this we need to take a step back and ask whether our
measure is really measuring what we think it is. Lungfish more
complex than humans?! Of course, one of the reasons for having an
objective measure in the first place is so that we don't have to
rely on intuition.

The problem with the above table as a measure of complexity
can be illustrated by considering a naughty young boy (Glenn
can identify with this :) who has to stay after school and write
"I will not talk in school" 1000 times on the blackboard. What
is the complexity (information content) of what he has written?
It is a very long message, but a highly compressible one, so its
length is not necessarily a good indicator of its information
content.
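
To put a rough number on it, here is a minimal sketch in Python (my
own illustration, not anything from the original discussion), using
zlib's compressed size as a crude upper bound on the information
content of the boy's message:

import zlib

# The naughty boy's message: very long, but very repetitive.
message = "I will not talk in school. " * 1000
raw = message.encode("ascii")

# The compressed size is a rough upper bound on the information
# content; a repetitive message compresses enormously.
packed = zlib.compress(raw, 9)

print(len(raw))     # 27000 bytes
print(len(packed))  # on the order of 100 bytes

The 27,000-character message squeezes down to a tiny fraction of
its written-out length, which is just the blackboard point in
numerical form.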

Thus, I am hesitant to accept the length of the coding genome
as a measure of complexity.

Now I would like to return to another question: whether the
information content (as defined by Shannon) increases during
evolution, or more specifically, due to a mutation. Based primarily
on what little I know about info-theory, and on my intuition, I had
indicated in a "conversation" with Steve Jones that a random
mutation would increase the info-content. This was in response
to a bold assertion by Steve that information content would
never increase due to random mutation. There was also a
challenge to show such a case. Rummaging through my collection
of papers, I managed to find some concrete evidence that mutations
do increase the Shannon IC. The reference is:

J. S. Rao, C. P. Geevan and G. S. Rao (1982). "Significance of the
Information Content of DNA in Mutations and Evolution,"
<J. Theor. Biol.> 96:571-577.

Here the authors consider one-point mutations and show that the
only requirement for the Shannon IC to increase is that the
frequency of the codon which mutates be larger than the
frequency of the codon to which it mutates.
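
To make that condition concrete, here is a minimal numerical check
(my own sketch in Python, not the authors' calculation; the codon
counts are made up, and I am taking the IC to be the Shannon entropy
of the codon frequency distribution):

from math import log2

def shannon_ic(counts):
    """Shannon entropy (bits per codon) of a codon count table."""
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values() if c > 0)

# Hypothetical codon counts for a short stretch of DNA.
counts = {"GAA": 50, "GAG": 20, "GAC": 5}
before = shannon_ic(counts)

# One-point mutation: a GAA (more frequent) becomes a GAG (less frequent).
counts["GAA"] -= 1
counts["GAG"] += 1
after = shannon_ic(counts)

print(f"before: {before:.5f}  after: {after:.5f}")  # after > before

Run the mutation the other way (GAG -> GAA) and the IC drops,
exactly as the condition says.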

In the words of the authors:

"Assuming that the spontaneous mutations occur randomly
at the DNA level, the more frequently occuring codons would
tend to mutate more frequently. This in turn leads to an
increase in the information content of the DNA."

This statement is actually kind of interesting. Had the authors
left it at that, their conclusion would be in doubt for the
same reason I discussed previously with Jim Bell (that was in
regard to Denton's probability analysis). The statement above
actually assumes more than is readily apparent. They should say
"Assuming that the spontaneous mutations occur randomly and
that each codon has an equal probability of mutating ..."
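
To see why that extra assumption does the real work, here is a small
simulation (again my own sketch with made-up frequencies; I also let
a codon mutate to any other codon, whereas a true one-point mutation
can only reach its single-base neighbors). If mutations strike codon
positions uniformly at random, the frequent codons absorb most of
the hits, and by the condition above most mutations raise the IC:

import random
from math import log2

random.seed(1)

def shannon_ic(counts):
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values() if c > 0)

# Hypothetical, deliberately skewed codon frequencies.
counts = {"GAA": 400, "GAG": 80, "GAC": 15, "GAT": 5}
codons = list(counts)

# Picking a codon *position* uniformly means P(hit type k) = f_k.
pool = [c for c, n in counts.items() for _ in range(n)]

base = shannon_ic(counts)
up = down = 0
for _ in range(10_000):
    src = random.choice(pool)                             # codon that mutates
    dst = random.choice([c for c in codons if c != src])  # codon it becomes
    trial = dict(counts)
    trial[src] -= 1
    trial[dst] += 1
    if shannon_ic(trial) > base:
        up += 1
    else:
        down += 1

print(up, down)  # IC rises in the large majority of trials

If instead the rare codons were somehow the ones mutating most
often, the IC would tend to fall, which is why the equal-probability
assumption needs to be stated.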

The authors' statement appears in the discussion section and is
offered as a way of understanding the empirical results presented.
The authors analyzed a large set of data for one-point mutations
in the human haemoglobin gene and found that, out of a total of
204 one-point mutations, 139 resulted in an increase in IC,
54 in a decrease, and 2 in no change, with 9 being uncertain.
So one-point mutations resulted in an increase in information
content about 70% of the time.

Brian Harper | "If you don't understand
Associate Professor | something and want to
Applied Mechanics | sound profound, use the
The Ohio State University | word 'entropy'"
                          |      -- Morowitz
Bastion for the naturalistic |
rulers of science |