Complexity and Information Theory

Chris Cogan (ccogan@sfo.com)
Mon, 29 Nov 1999 14:15:32 -0800

This was written by a friend of mine--a physicist--and is posted here with
his permission.

Susan

---------------------------------
Complexity and Information Theory
Thursday, 07-Oct-1999 18:36:37

Introduction

One assertion that creationists have long made is that evolution contradicts
the Second Law of Thermodynamics. More recently, creationists have begun to
claim that evolution is also contrary to information theory. They claim that
mutations can only degrade the information coded in the DNA and that even
gene duplications don't really add anything new to the genome. Creationists
and other designists also frequently say that the presence of complex DNA
sequences in cells is evidence of intelligent design. In this article, I
intend to show that all these assertions are incorrect.

Creationist Questions and Assertions

- Progressive evolution is contrary to the Second Law of Thermodynamics and
  is therefore scientifically impossible.
- According to information theory, information in a coded message can only be
  lost. Therefore evolution is contrary to information theory.
- Mutations only cause degradation of the genome. While a few mutations might
  be neutral, most mutations are harmful and result in information loss.
- Mutations cannot add anything new; they can only lead to deterioration of
  the coded information in the DNA.
- Gene duplications do not add anything new to a genome and therefore do not
  add any information.
- The complexity of DNA sequences is evidence of intelligent design.

Abstract

- The creationist assertion that progressive evolution violates the Second
  Law of Thermodynamics is based on faulty statements of the Second Law and
  oversimplified definitions of entropy. About all the Second Law of
  Thermodynamics really says about living organisms is that an organism that
  stops eating will die.
- Information theory does not contradict evolution. In fact, evolution is a
  natural consequence of the increasing informational entropy that information
  theory says is inevitable in any error-prone communication system.
- Gene duplications, especially gene duplications that are followed by point
  mutations in the original gene or the duplicate copy, do represent an
  increase in the information content of a genome.
- Beneficial mutations that increase the information content of genomes have
  been observed in the laboratory.
- Information theory cannot distinguish a random DNA sequence from a complex
  sequence produced any other way, because random sequences are themselves
  complex. Therefore the complexity of DNA sequences is not evidence of
  intelligent design.

Discussion

Creationists have long tried to disprove evolution using the Second Law of
Thermodynamics. According to creationists, the Second Law of Thermodynamics
says that everything should move to a state of higher "entropy" that they
define as disorder, randomness, or chaos. Some creationists will even claim
that evolution is totally unscientific because it contradicts this inviolate
principle. However, if the oversimplified statement of the Second Law and
the inaccurate definition of entropy they use were correct, then ice could
not form. If the Second Law really said what creationists often say it does,
no organism could grow given simple molecules as nutrients. Since
thermodynamic entropy has a precise mathematical definition, defining it as
simply "disorder" is bound to lead to incorrect conclusions. Misstating the
Second Law leads to further errors.

Fortunately, the real Second Law of Thermodynamics in no way forbids local
decreases in entropy as long as energy or matter is released to the
surroundings to increase the net thermodynamic entropy of the Universe as a
whole. For example, the energy released to the surroundings when ice forms
"pays" for the decreased thermodynamic entropy of the ice. Those
creationists who understand this aspect of the Second Law will still often
refuse to give up the argument by bringing up the fact that adding heat to a
squashed bug in a test tube will not cause the bug to reassemble. They
claim that adding energy to the bug guts is insufficient to get a local
decrease in thermodynamic entropy in the form of bug resurrection. Surprise,
surprise. From a thermodynamic standpoint, trying to form ice by heating
water is equally absurd. However, they use the bug example to introduce an
additional confusion -- the idea that intelligently programmed information
is somehow a necessary condition for local decreases in thermodynamic
entropy in biological systems. While it is true that local decreases in
thermodynamic entropy require the proper conditions in addition to energy
and/or matter exchanges with the surroundings, trying to invoke the need for
information with the bug example is pointless. The Second Law of
Thermodynamics says nothing about information being a necessary condition to
get local decreases in thermodynamic entropy under any circumstances.
Information theory is a completely separate realm from thermodynamics. The
presence of intelligently programmed information is clearly not necessary
for ice formation, for example. (Note: In thermodynamics, the technical term
for something that exchanges energy and/or matter with the surroundings is
an "open system." The energy and matter can go in or out in an open system.
Animals are open systems because they take in energy in the form of food,
radiate heat to the surroundings, breathe air, and produce wastes.)
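
The ice example can be put in numbers. As a rough sketch, treat the molar enthalpy and entropy of fusion of water as constants at their 0 degrees Celsius values (about 6.01 kJ/mol and 22.0 J/(mol K); holding them constant is an approximation) and assume the surroundings sit at the same temperature T as the freezing water. Per mole of water that freezes:

```latex
\begin{aligned}
\Delta S_{\text{univ}} &= \Delta S_{\text{ice}} + \Delta S_{\text{surr}} \ge 0,\\
\Delta S_{\text{ice}} &\approx -\Delta S_{\text{fus}}, \qquad
\Delta S_{\text{surr}} \approx \frac{\Delta H_{\text{fus}}}{T},\\
\Delta S_{\text{univ}} &\approx \frac{\Delta H_{\text{fus}}}{T} - \Delta S_{\text{fus}} > 0
\iff T < \frac{\Delta H_{\text{fus}}}{\Delta S_{\text{fus}}}
\approx \frac{6010\ \text{J/mol}}{22.0\ \text{J/(mol K)}} \approx 273\ \text{K}.
\end{aligned}
```

At T = 263 K (about -10 degrees Celsius) this gives roughly 22.85 - 22.0 = +0.85 J/(mol K), so the heat released to the surroundings more than pays for the entropy lost by the ice; above about 273 K the sign flips and ice melts instead.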

In the end, the Second Law dictates that energy must be expended or released
under the right conditions to put complex molecules together from simpler
ones. Again, the Second Law says nothing about the need for information for
this to occur. The Second Law also dictates that organisms that cease taking
in energy and cease releasing wastes will, at best, be in stasis or, more
likely, die and deteriorate because energy will become increasingly
unavailable in such closed systems. As long as organisms eat and breathe,
however, no aspect of their existence, including their evolution, will
violate the Second Law of Thermodynamics.

But what about this idea of information, then? The Second Law of
Thermodynamics may say nothing about information, but information theory
obviously has a lot to say about it. Does evolution, the idea that organisms can
become more complex over succeeding generations, violate information theory?
The short answer is "no." For the long answer, keep reading.

One of the reasons I kept using the perhaps annoying phrase "thermodynamic
entropy" in the above discussion of the Second Law of Thermodynamics is that
"entropy" shows up in information theory as well. However, "informational
entropy" and "thermodynamic entropy" are not analogs. Temperature is an
important aspect of thermodynamics, but there is nothing comparable to
temperature in information theory (Yockey, 1992). Thus one should keep the
concept of entropy in thermodynamics separate from the quantity known as
"entropy" in information theory. However, one thing "thermodynamic entropy"
and "informational entropy" have in common is that they both have a tendency
to increase. (Although, as we have discussed, thermodynamic entropy can
decrease locally under the right conditions.)

Another thing informational entropy has in common with thermodynamic entropy
is that informational entropy has a precise mathematical definition.
Information itself also has a precise mathematical definition in information
theory. Defining informational entropy and information content in words can
lead to inaccuracies and incorrect conclusions, especially when a layman's
understandings of the familiar terms information and meaning creep into the
discussion. However, a reasonable definition of information content can be
described as "a measure of the informational entropy of a message." For the
purpose of this article I will define informational entropy as the
complexity of the message. Therefore, for the purposes of this article,
information content of a message can be described as the minimum number of
instructions needed to describe the message with certainty. This definition
approximates the idea of the Kolmogorov-Chaitin algorithmic entropy of a
message (see Yockey, 1992). The consequence of these definitions is that a
complex message has high informational entropy and high information content.
Conversely, a highly ordered, repetitive message will have a low
informational entropy and a low information content.
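
Kolmogorov-Chaitin complexity cannot be computed exactly, but the size of a message after lossless compression gives a practical upper bound on it. The following Python sketch is an illustration under that assumption, not a method taken from Yockey (1992): it uses the standard zlib library as a stand-in measure and compares a highly ordered string of bits with a pseudo-random one of the same length. The 1000-bit length and the seed are arbitrary choices; at only 20 bits, zlib's fixed overhead would swamp the difference.

```python
import random
import zlib


def compressed_size(message: str) -> int:
    """Length in bytes of the zlib-compressed message: a crude, computable
    upper bound standing in for algorithmic (Kolmogorov-Chaitin) complexity."""
    return len(zlib.compress(message.encode("ascii"), 9))


rng = random.Random(1999)  # fixed seed so the "random" message is reproducible
ordered = "01" * 500  # highly ordered: 1000 bits
scrambled = "".join(rng.choice("01") for _ in range(1000))  # random: 1000 bits

print("ordered:", compressed_size(ordered), "bytes")
print("random: ", compressed_size(scrambled), "bytes")
```

The ordered string should shrink to a handful of bytes, while the random one cannot be squeezed much below one bit per symbol (roughly 125 bytes here), mirroring the low versus high information content described above.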

Let's look at some examples.

The string of bits

01010101010101010101

has low informational entropy. While you could say the string is 20 bits
long and say that is its information content, the definition I am using says
that the information content is low for the sequence because the sequence
can be described as a simple instruction such as

1. Repeat "01" 10 times.

In other words, this highly ordered sequence is simple rather than complex.
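
To make the idea of a description concrete, here is a small Python sketch of an interpreter for "Repeat" instructions. Representing each instruction as a (unit, count) pair and the helper name `expand` are my own assumptions for illustration; the point is only that one instruction regenerates all 20 bits exactly.

```python
from typing import List, Tuple

# Hypothetical representation: (unit, count) means 'Repeat "unit" count times'.
Instruction = Tuple[str, int]


def expand(instructions: List[Instruction]) -> str:
    """Rebuild a message by executing the Repeat instructions in order."""
    return "".join(unit * count for unit, count in instructions)


ordered = "01010101010101010101"
description = [("01", 10)]  # 1. Repeat "01" 10 times.

assert expand(description) == ordered
print(len(ordered), "bits from", len(description), "instruction")
```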

On the other hand, if we look at 20 random bits

01011010101000111001

the complexity of the message is greater. The information content, as
determined by the Kolmogorov-Chaitin algorithmic entropy, is also higher.
This is because it takes a longer instruction set to describe the bit
sequence.

1. Repeat "01" 2 times.

2. Repeat "10" 4 times.

3. Repeat "0" 2 times.

4. Repeat "1" 3 times.

5. Repeat "0" 2 times.

6. Repeat "1" 1 time.
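
Finding the truly minimal instruction list is not possible in general, because Kolmogorov-Chaitin complexity is uncomputable. The hypothetical `describe` function below is a greedy Python sketch that, at each position, picks whichever repeated unit of one or two symbols covers the most characters, preferring the shorter unit on ties. That rule is an assumption chosen because it reproduces the hand-written lists in this article; it yields an upper bound on the description length, not the minimum.

```python
from typing import List, Tuple


def describe(message: str, max_unit: int = 2) -> List[Tuple[str, int]]:
    """Greedily split a message into 'Repeat "unit" n times' instructions.

    At each position, try unit lengths 1..max_unit and keep the one whose
    consecutive repeats cover the most characters (ties go to the shorter
    unit). The result is an upper bound on description length.
    """
    instructions = []
    i = 0
    while i < len(message):
        best_unit, best_count, best_cover = message[i], 1, 1
        for k in range(1, max_unit + 1):
            unit = message[i:i + k]
            if len(unit) < k:
                break  # not enough characters left for a unit this long
            count = 1
            while message[i + count * k:i + (count + 1) * k] == unit:
                count += 1
            if count * k > best_cover:
                best_unit, best_count, best_cover = unit, count, count * k
        instructions.append((best_unit, best_count))
        i += best_cover
    return instructions


ordered = "01" * 10
scrambled = "01011010101000111001"
print(len(describe(ordered)), describe(ordered))
print(len(describe(scrambled)), describe(scrambled))
```

For the ordered string this returns the single instruction ("01", 10); for the random string it returns the same six instructions listed above.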

The above example shows us an interesting consequence of information theory.
Namely, a random sequence is complex. Thus information theory can tell us
nothing about whether a complex sequence was designed or generated by random
chance (Yockey, 1992). Claims that complex DNA sequences are evidence of
intelligent design are therefore false.

So what does this have to do with evolution? The more astute readers have
probably already guessed. It turns out that random mutations can increase
the complexity of a sequence and thus the information content in DNA. The
repetitive sequence

ATATATATATATATATATAT

has low informational entropy and therefore low information content since it
could be described by a single instruction:

1. Repeat "AT" 10 times.

A single point mutation increases the complexity of the sequence and thus
its information content as defined by information theory because more
instructions would be required to describe the following mutated sequence:

ATATACATATATATATATAT

1. Repeat "AT" 2 times.

2. Repeat "AC" 1 time.

3. Repeat "AT" 7 times.
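
The following Python sketch applies the single substitution described above (position 5, counting from 0, changes from T to C) and checks that one instruction regenerates the original sequence while three are needed for the mutant. The (unit, count) representation and the helper names `expand` and `point_mutation` are hypothetical, chosen only for this illustration.

```python
from typing import List, Tuple


def expand(instructions: List[Tuple[str, int]]) -> str:
    """Execute 'Repeat "unit" n times' instructions in order."""
    return "".join(unit * count for unit, count in instructions)


def point_mutation(sequence: str, position: int, base: str) -> str:
    """Return a copy of sequence with the nucleotide at a 0-based position replaced."""
    return sequence[:position] + base + sequence[position + 1:]


original = "AT" * 10
mutant = point_mutation(original, 5, "C")  # the sixth base, a T, becomes C

assert mutant == "ATATACATATATATATATAT"
assert expand([("AT", 10)]) == original  # 1 instruction
assert expand([("AT", 2), ("AC", 1), ("AT", 7)]) == mutant  # 3 instructions

differences = sum(a != b for a, b in zip(original, mutant))
print(differences, "substitution; description grew from 1 to 3 instructions")
```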

If we duplicate the first DNA sequence I gave, the information content does
not really increase. We could just specify a different number of repeats in
the one instruction line. However, duplicating the second DNA sequence
clearly would require more instructions to specify the new, longer sequence
with certainty. Thus duplicating a DNA sequence that is anything other than
a simple repetitive sequence means an increase in the total complexity of
the message. The information content as defined by information theory
clearly goes up. To emphasize this, look at a duplication of the second
sequence.

ATATACATATATATATATATATATACATATATATATATAT

The instruction set for this 40-nucleotide sequence is larger than the
instruction set for the 20-nucleotide sequence from which it was derived.

1. Repeat "AT" 2 times.

2. Repeat "AC" 1 time.

3. Repeat "AT" 9 times.

4. Repeat "AC" 1 time.

5. Repeat "AT" 7 times.

Alternatively, the instruction set could be expressed something like the
following:

1. Repeat "AT" 2 times.

2. Repeat "AC" 1 time.

3. Repeat "AT" 7 times.

4. Repeat Steps 1-3.
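
Both instruction sets for the 40-nucleotide duplicate can be checked mechanically. In this Python sketch the step format is hypothetical: ("repeat", unit, n) means Repeat "unit" n times, and ("again", first, last) means run steps first through last one more time, which is my reading of "Repeat Steps 1-3." The last check confirms that duplicating the purely repetitive sequence only changes the count in its single instruction.

```python
def run(program):
    """Execute a hypothetical instruction list and return the sequence it spells.

    ("repeat", unit, n)    -> Repeat "unit" n times.
    ("again", first, last) -> Run steps first..last (1-based) one more time;
                              assumes those steps come earlier in the list
                              and contain no "again" steps themselves.
    """
    pieces = []
    for step in program:
        if step[0] == "repeat":
            _, unit, n = step
            pieces.append(unit * n)
        elif step[0] == "again":
            _, first, last = step
            pieces.append(run(program[first - 1:last]))
        else:
            raise ValueError(f"unknown step: {step!r}")
    return "".join(pieces)


mutant = "ATATACATATATATATATAT"  # the 20-nucleotide point mutant

# First instruction set: five flat steps.
flat = [("repeat", "AT", 2), ("repeat", "AC", 1), ("repeat", "AT", 9),
        ("repeat", "AC", 1), ("repeat", "AT", 7)]
# Alternative set: three steps, then "Repeat Steps 1-3."
looped = [("repeat", "AT", 2), ("repeat", "AC", 1), ("repeat", "AT", 7),
          ("again", 1, 3)]

assert run(flat) == run(looped) == mutant * 2
assert len(run(flat)) == 40

# Duplicating the purely repetitive sequence only changes one count.
assert run([("repeat", "AT", 10)]) * 2 == run([("repeat", "AT", 20)])
```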

In either case, the longer sequence takes more instructions to specify and
therefore is more complex as determined by the Kolmogorov-Chaitin
algorithmic entropy. Thus its information content is higher. The creationist
objection that duplications "do not add anything new" is based on a lay
understanding of information rather than the rigorous definitions of
information theory. Their objection amounts to incredulity and is therefore
meaningless. The same can be said of creationist objections about any other
mutation. If the mutation increases the complexity of the message (as
determined by the minimum set of instructions needed to specify the message)
then information increases according to the definitions of information
theory. Personal incredulity and lay understandings of information do not
change that fact.

Since gene duplications are known to occur, the information content of a
genome can increase as a result. Point mutations in the duplicated or
original sequence have the potential of further increasing the complexity of
the DNA sequence and thus its information content. Since information theory
says that informational entropy tends to increase in a communication system
that is prone to errors (Yockey, 1992), the increasing complexity of genomes
over succeeding generations is inevitable. Thus the evolution of
increasingly complex organisms seems an unavoidable consequence of
information theory. This should be even more apparent when one realizes that
it is not just structural genes that will become more complex. The genes
that regulate body plans can also be duplicated and changed. Thus diversity
of form is inevitable. One thing moderating this increasing complexity is
natural selection. If a new, more complex genome is less fit, the inheritors
of that new, more complex genome will die off and the organisms that inherit
no mutations, fewer mutations, or different mutations will flourish. On the
other hand, if the inheritors of the new, more complex genome are more fit
in the old environment or in some new environment they chance upon, then
they will prosper and the increased complexity will be passed on with the
potential for further increases in subsequent generations. For a more
technical and in-depth discussion of informational entropy and evolution, I
recommend Brooks (1984).
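
As a toy illustration only, not a model of any real genome, the Python sketch below copies a repetitive 1000-base sequence with one random substitution per generation and tracks its zlib-compressed size as a rough proxy for algorithmic complexity. Natural selection is deliberately left out, and the seed, sequence length, and number of generations are arbitrary assumptions.

```python
import random
import zlib


def complexity(seq: str) -> int:
    """zlib-compressed size in bytes: a rough, computable stand-in for
    algorithmic complexity (an upper bound, not the true value)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def mutate(seq: str, rng: random.Random) -> str:
    """Copy seq with one random point substitution (a copying error)."""
    i = rng.randrange(len(seq))
    base = rng.choice([b for b in "ACGT" if b != seq[i]])
    return seq[:i] + base + seq[i + 1:]


rng = random.Random(42)  # fixed seed for reproducibility
genome = "AT" * 500  # start from a highly repetitive 1000-base sequence
history = [complexity(genome)]
for _ in range(200):  # 200 generations, one copying error each
    genome = mutate(genome, rng)
    history.append(complexity(genome))

print("generation 0:", history[0], "bytes")
print("generation 200:", history[-1], "bytes")
```

With copying errors alone the compressed size ends up well above its starting value, though it need not rise by the same amount every generation. Adding a fitness test that discards some mutants would play the moderating role that selection plays in the paragraph above.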

In case the implications of the above background in information theory are
unclear, let's look at two examples of mutations that produced increases
in the information content of the DNA of organisms. I have chosen the
examples I did because not only does information content as defined by
information theory increase, but the increases in information content also
resulted in a benefit for the inheritors of that increased information.
Though some might consider the examples modest, they illustrate that
information content and complexity can increase over generations and that
benefits can result from this. This is all that is required to demonstrate
the plausibility of evolution and to show that the creationist assertions
that information content can only decrease and that mutations cannot
produce benefits or new information are false.

Example 1: Gene duplications in yeast leading to more fit progeny.

Brown et al. (1998) reported that a population of baker's yeast grown in a
glucose-limited environment for a few hundred generations spontaneously
produced mutant offspring. The mutant offspring were better able to take up
glucose from the low-glucose environment and were found to have duplications
of two different sugar transport protein genes. Furthermore, more than three
new genes had formed by joining the control region of one of the sugar
transport genes to the coding region of the second. Finally, the mutant
offspring were able to out-compete individuals of the ancestral population in
pairwise competition experiments.

This really is a great example of an information-increasing mutation leading
to progressive evolution. The new genes, combinations of a control region of
one gene with the coding region of another, represent new information as
determined by the Kolmogorov-Chaitin algorithmic entropy measure. A
creationist might object that nothing "new" was created, since both genes
were already there and all you have is a combination of redundant
information. That objection is irrelevant. It is indeed a new combination
that did not exist before, and it increases the complexity of the yeast
genome. The information content of the genome is increased according to the
rigorous definitions of information theory. Creationists going all out in an
incredulity argument would probably say "well, it's still a yeast" or "the
yeast didn't sprout legs or anything really new." These objections fail for
the same reason the first one fails. Creationists can still console
themselves that the evolution of new
features that even they could not deny takes much longer than a lifetime, so
it probably will never be shown directly in a laboratory experiment that,
for instance, an organism can go from "no legs" to "fully functional legs."
However, this experiment demonstrates that complexity-increasing and
information-increasing beneficial mutations do exist. The association of
changes in the fossil record with such mutations is therefore a solid
scientific inference rather than a religious leap of faith.

Example 2: Spontaneous tandem duplications in pseudorevertants

This is really three related examples in one. Akanuma et al. (1996) were
working with a bacterium that can normally grow at temperatures up to 85
degrees Celsius. They had a mutant that was thermally sensitive due to the
deletion of 22 nucleotides in a gene coding for a protein involved in the
synthesis of leucine, an amino acid. After growing the mutant strain under
strong selective pressure (i.e., temperatures where the mutant could barely
grow), the researchers isolated three strains that had improved growth at
high temperature. The new strains differed from the wild-type bacteria from
which the mutant was derived, hence the term "pseudorevertant." (A true
revertant would have the same genotype as the wild-type organism.) The three
pseudorevertants all showed duplications of just part of the gene that added
6 to 21 nucleotides to the gene. The proteins coded for by these new mutant
genes, which were longer and more complex in the informational entropy
sense, also had improved catalytic activity in addition to improved thermal
stability over the protein in the thermally-sensitive strain from which the
pseudorevertants descended. Thus three different mutations, all increasing
the information content of the genomes, produced more stable and efficient
proteins.

Conclusion

Contrary to creationist contentions, evolution does not violate the Second
Law of Thermodynamics or information theory. The evolution of organisms does
not violate the Second Law of Thermodynamics any more than the growth of
individual organisms violates the Second Law. The creationist contention
that intelligent information in DNA somehow gets around the Second Law is
erroneous. The only requirement for localized decreases in thermodynamic
entropy that accompany protein synthesis or organism growth is the
requirement for an open system. Organisms are open thermodynamic systems as
long as they eat and breathe.

The real connection between entropy and evolution comes from looking at
information theory. The kind of entropy that is important to evolution is
informational entropy. Like thermodynamic entropy on a universal scale,
informational entropy tends to increase over time. Since an increase in
informational entropy means the complexity of a message increases, the
message transmitted by DNA over generations increases in complexity. The
organisms specified by the message will be more complex as a result.
Evolution thus seems to be an inevitable consequence of the properties of
information. Selection provides a filter that determines which of the more
complex messages survive. Illustrating these trends are examples of
organisms that, under specific selective pressures, experience partial or
complete duplications of genes that lead to increased information content of
genomes, enhanced fitness, and improved proteins. While these examples may
not be as dramatic as creationists demand in asking for the "proof" of
evolution that they don't really want in any case, the examples at least
falsify the creationist contentions that information-increasing beneficial
mutations do not exist.

References

Akanuma, S., Yamagishi, A., Tanaka, N., and Oshima, T. (1996) J. Bacteriol.
178:6300-6304.

Brooks, D. R., LeBlond, P. H., and Cumming, D. D. (1984) J. Theor. Biol.
109:77-93.

Brown, C. J., Todd, K. M., and Rosenzweig, R. F. (1998) Mol. Biol. Evol.
15(8):931-942.

Yockey, H. P. (1992) Information Theory and Molecular Biology. Cambridge:
Cambridge University Press.
--------