Glenn,
At your request, I append - after the current discussion - some
statements extracted from the following posts of mine, from our last
discussion about "Random origin of biological information":
Date: Fri, 22 Sep 2000 13:51:34 +0200 (ASA-digest V1 #1804)
Date: Sun, 24 Sep 2000 09:19:09 +0200 (ASA-digest V1 #1806)
Date: Wed, 27 Sep 2000 21:03:58 +0200 (ASA-digest V1 #1812)
Date: Mon, 02 Oct 2000 20:18:36 +0200 (ASA-digest V1 #1818)
It's quite voluminous, though, and if Terry snips it out, check the
archive.
I am sorry this is a long post, as in your answers you often branch out
into many side trails, making the whole discussion somewhat confusing.
Yet I dare not snip out things, lest you again misunderstand me. So I'll
just comment wherever I can't agree with what you say. But remember that
the whole long argument started (04 May 2002 16:46:30 +0200) with my
simple claim that we have to distinguish between:
(I) Maximum information carrying capacity;
(II) Functional information relevant for biological systems.
Your reaction was that Shannon information is the only valid
information, and that (II) has nothing to do with information.
> > Glenn wrote (8 May 2002 21:45:23 -0700):
> >> You use the example of Yockey's work with cytochrome c below. He found
> >> there were 10^93 different cytochrome c's which would perform the work of
> >> that molecule. What he didn't prove was the possibility of huge numbers of
> >> other FAMILIES of proteins which will also do that job.
>
> I replied:
> >Agreed - in principle. Yet, if there are any other families (let alone
> >huge numbers) which will perform the same function (in the same
> >organismal environment), I find it strange that no such example has been
> >found to date, as far as I know.
>
> You replied (13 May 2002 21:03:42 -0700):
> I believe I cited an example to you earlier. The work of Szostak and
> Ellington points in that direction. This work is cited below.
That's artificial selection of RNA function in vitro, rather than
spontaneous emergence of minimal protein function without selection in
vivo (or in the prebiotic world).
> >You may say this is because just one
> >family happened to be present in the universal common ancestor and was
> >inherited by the whole biosphere. But such an argument would just push
> >the problem back to the origin of life: if there were such huge numbers
> >of unrelated possibilities, why was there a universal common ancestor
> >at all, rather than a huge number of unrelated ones?
>
> This would point to one origin of life, not multiple origins of life. That
> is why. If life evolved many times, then there should be many lineages under
> this assumption. But with one origin of life, from which all others
> evolved, then having one family is quite reasonable, explainable
>and expected.
Yes, what I wrote underlines that there was one origin of life, not
multiple ones, and having one family is expected. But the point I made
is that this fact strongly argues against your assumption that huge
numbers of synonymous families are possible. If that were the case, you
would expect multiple families, even if there were only one origin of
life.
> >> Given the work of Joyce
> >> and others (which you didn't mention in your reply) they have found that if
> >> you choose a function and search for it with random molecules and random
> >> mutation, you can find any given function with a probability of 1 in 10^14
> >> to 1 in 10^18. I cite this:
> >> Andrew Ellington and Jack W. Szostak "used small organic dyes as the
> >> target. They screened 10^13 random-sequence RNAs and found molecules
> >> that bound tightly and specifically to each of the dyes.
> >> "Recently they repeated this experiment using random-sequence
> >> DNAs and arrived at an entirely different set of dye-binding
> >> molecules. ...
> >> "That observation reveals an important truth about directed
> >> evolution (and indeed, about evolution in general): the forms
> >> selected are not necessarily the best answers to a problem in some
> >> ideal sense, only the best answers to arise in the evolutionary
> >> history of a particular macromolecule."~Gerald F. Joyce, "Directed
> >> Evolution," Scientific America, Dec. 1992, p. 94-95.
> >p.48 not 94
> >> And I cite this:
> >> "We designed a pool of random sequence RNAs, using the minimal ATP
> >> aptamer as a core structure. By creating a pool that was
> >> predisposed to bind ATP specifically and with high affinity we hoped
> >> to increase the likelihood of generating molecules with ATP-dependent
> >> kinase activity. The ATP aptamer core was surrounded by three
> >> regions of random sequence of 40, 30 and 30 nucleotides in length,
> >> respectively. The ATP-binding domain itself was mutagenized such
> >> that each base had a 15% chance of being non-wild-type, to allow for
> >> changes in the aptamer sequence that might be required for optimal
> >> activity. To increase the likelihood of finding active molecules, we
> >> attempted to create a pool containing as many different molecules as
> >> possible. Because it is difficult to obtain an acceptable yield from
> >> the synthesis of a single oligonucleotide of this length (174
> >> nucleotides), we made two smaller DNA templates and linked them
> >> together to generate the full-length DNA pool. Transcription of this
> >> DNA yielded between 5 x 10^15 and 2 x 10^16 different RNA
> >> molecules."~Jon R. Lorsch and Jack W. Szostak, "In Vitro Evolution of
> >> New Ribozymes with Polynucleotide Kinase Activity," Nature, 371,
> >> Sept. 1994, p. 31
> >> We can act as if the probability is very low to find a given functionality,
> >> like YECs act as if the earth is young, but acting like it isn't going to
> >> change the fact that functionality is found much more readily than
> >> anti-evolutionary activists want to believe.
> >
> >Glenn, you know very well that I am neither a YEC nor an
> >anti-evolutionary activist (cf.
> >http://www.asa3.org/ASA/PSCF/1999/PSCF12-99Held.html). All I insist on
> >is that an adequate mechanism for producing evolutionary novelty is as
> >yet elusive.
>
> This has nothing to do with YEC, it has to do with multiple families of
> biopolymers being able to perform the same function. And the probability
> argument which you are using IS an anti-evolutionary argument. Very few
> pro-evolutionists are worried about it because they know the data I just
> posted but which you failed to give comment.
Just like in Sept. 2000, when we discussed this last time, you keep
talking about RNA artificial selection in vitro, rather than protein
natural selection in vivo or prebiotic random walk emergence of minimal
function, which is very different, and I explained why. I know that very
few pro-evolutionists are worried about this, but that is not a factual
argument; it is an appeal to authority - which I would not expect from
you.
> Pay attention to the issue at hand. I am saying that ignoring the data I
> posted above is exactly LIKE, ANALOGOUS, SIMILAR to the way the YECs act.
> And indeed, you skipped right by it without any comment.
The issue at hand is random evolution of novel protein functionality,
and, in particular, the first minimal functionality of a novel protein,
before natural selection can set in. This has nothing to do with
artificial selection of RNA in vitro, particularly if some of the
functionality selected is already present. It's you who are evading the
issue, not I.
> >> So, given that I am mentioning this work for a second time, will you
> >> respond to its import now?
> >
> >You have not mentioned these papers (if I remember correctly), but
> >similar ones, and I responded in detail. But I may do it again, giving
> >you a new example if you insist. A. Lombardi, et al., "Miniaturized
> >metalloproteins: Application to iron-sulfur proteins", PNAS 97 (2000),
> >11922, attempted to design a minimal redox enzyme, but haven't achieved
> >their goal as yet. Their dimeric undecapeptide can hold an iron atom,
> >but is unstable, being too small to shield off the environmental water.
> >The invariant of their (intelligently designed) construct amounts to at
> >least 5 specific amino acid occupations, which is too much to be
> >attainable by an evolutionary process without selection.
>
> What Lombardi is doing is not at all what Joyce, Szostak and Ellington are
> doing. Lombardi is trying to shrink the proteins down to miniature versions,
> of smaller length.
This is exactly my point, see above. These miniature proteins are the
ones which may give indications about the origin of semantic or
functional biological information, about which I was talking (case a). I
did not want to deal with improvement of a preexisting functionality
(case b), because there you may just be taking over some "information"
from the environment by means of selection. And I did not doubt there
are some RNA functions (case c) that are not very difficult to find (if
you do have RNA!), just as Joyce, Szostak and others have found, even if
you are looking for a function not yet present in the starting mix.
Again, we have no means of telling whether any information has emerged
de novo. With proteins, there is a way of dealing with semantic
information (II), cf. Yockey's book. With RNA, I know of no similarly
promising way of dealing with functional information, because residue
conservation is much less clearly definable (you have only 4
nucleotides, and there is the additional complication of base pairing).
So, I am looking for examples of case a, but you keep pointing to
examples of case b and/or case c.
> The article starts with:
> "Miniaturized proteins are peptide-based synthetic models of natural
> macromolecular systems. They contain a minimum set of constituents necessary
> for an accurate reconstruction of defined structures and for a fine-tuned
> reproduction of defined functions" A. Lombardi, et al., "Miniaturized
> metalloproteins: Application to iron-sulfur proteins", PNAS 97 (2000), 11922.
>
> Joyce and the others are producing similar functionality from similar length
> molecules. Your Lombardi article is a rabbit trail full of red herrings. It
> is a non-sequitur response. It is taking something different and throwing it
> out there, which confuses rather than illuminates.
No, Glenn, they are not at all similar. There are fundamental
differences between proteins and RNAs. Structure-function relationships
are completely different; and with proteins, you need the
genotype-phenotype code translation - to just mention two factors. You
questioned my concept of semantic biological information, but you refuse
to consider my definition of it. I don't see anything relevant to this
type of information in the RNA artificial selection work - although it
certainly is of interest in other respects. It's just not applicable to
what I said and you questioned.
> >> >This only works because you first give me the book, which contains all
> >> >the relevant semantic information. With the signal, you just send me
> >> >ln(3) bits of information, not lots.
> >>
> >> I believe that is exactly what I said in my note. I haven't sent you lots of
> >> Shannon information, but I have sent you lots of colloquial information.
> >
> >[Sorry, I should have written log2(3), instead of ln(3).] How can you
> >transmit colloquial information (in the book) without any Shannon
> >information? Whatever you transmit through whatever medium can be
> >measured by Shannon information (which, however, also includes all the
> >uninteresting noise and the irrelevant part of the colloquial
> >information).
>
> Either log or ln will work with Shannon entropy. One merely uses a different
> constant.
When you are talking about a certain amount of information, it does
matter which log you use: log2(3) is about 1.58 bits, whereas ln(3) is
about 1.10 nats. It's only in ratios (comparisons) that the constant
drops out.
> And you need to study up on Shannon entropy because you are not
> even getting signal to noise correct. A signal is what you want to
> transmit. It is the sequence you have in your hand. That sequence does not
> have noise. Noise is what happens to the signal as it goes through the
> system. It is the differences at the reception end from what was actually
> sent. Generally people try not to transmit the noise from the start of the
> system and they only want to transmit signal. To consciously transmit noise
> would be like blowing a big fan on a microphone when Pavarotti comes up to
> sing. He won't be happy and neither will the audience. However, in the act
> of transmitting Pavarotti's voice, system noise, phase distortion, amplitude
> absorption, frequency dispersion, etc., etc., all occur and that is the noise.
I agree with all these principles. But I wasn't using "signal" and
"noise" in this technical sense at all. In your example of the book and
the pointer following it, I was talking of the colloquial information in
the book, which contained all the information you wanted me to know,
namely the operational variant you afterwards pointed to, but also the
two other variants which were not to be executed and which therefore
were redundant. Furthermore, any colloquial text usually contains some
redundant writing. This is what I called "all the uninteresting noise
and the irrelevant part", hoping you would understand what I meant. If
you transmit anything more than the absolutely minimal algorithm needed
to produce the message intended to be transmitted, you are transmitting
noise. Now, neither colloquial book contents nor a pointer consisting of
an 8-bit character is free of redundancy or noise in this sense.
> >> >You want to keep the signal small
> >> >in order to transmit it fast, therefore it cannot carry all the semantic
> >> >information you want me to have for executing your plan, so you transmit
> >> >the large amount of information beforehand and make the signal nothing
> >> >but a pointer to one of the 3 large texts you transmitted beforehand.
> >>
> >> You miss my point. You had stated that semantic information is related to
> >> Shannon information. I gave you a case where that wasn't the case. Shannon
> >> information isn't related to semantic information.
> >
> >It _is_ related, see above, just not 1-to-1. I didn't miss your point,
> >but your example doesn't work. Without the book the transmitted pointer
> >is of no use at all. Its Shannon information remains the same, but its
> >semantic information is zero. With the book and the pointer, the total
> >Shannon information transmitted is huge, the semantic information just
> >equivalent to what you wanted to have me know at the end, namely about
> >one third of the semantic information in the book.
>
> Let's use Lucien's excellent example of 328945 in decimal or 504F1 in hex,
> 1202361 in octal, or 1010000010011110001 in binary. The Shannon entropy is
> different for each sequence but each sequence has the same meaning.
> If I do my math correctly, 1010000010011110001 has
> H = -K [.4 ln(.4) + .6 ln(.6)] = K (.366 + .306) = .672 K
> 504F1 has
> H = -K [5 x .2 x ln(.2)] = 1.609 K
> 1202361 has
> H = K (.357 + .357 + .277 + .277 + .277) = 1.545 K
> And 328945 has
> H = -K [6 x (1/6) ln(1/6)] = 1.791 K
> Same meaning different Shannon entropies because the Shannon entropy has
> absolutely NOTHING to do with meaning, semantic meaning, function or
> anything other than a measure of how easy or hard it will be to transmit a
> given sequence.
> If you disagree, then please show the mathematics showing that Shannon
> entropy is related to meaning and how.
I don't dispute these calculations at all. But again and again, I have
emphasized that we have to distinguish between
(I) Maximum information carrying capacity;
(II) Functional information relevant for biological systems.
Shannon entropy is related to (I), not directly to (II). Meaning
(biological or otherwise) is found in (II) and is a function of a
functional system like a given language or biological system. (I), which
is a function of sequence length and alphabet size, specifies nothing
but a maximum amount of functional information (II) which can be stored
in a given sequence having a maximal capacity (I). Never have I claimed
a 1-to-1 correspondence between a "value" of (I) and a "value" of (II).
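Perhaps a small illustration helps (a sketch of my own, in Python, just
reusing Lucien's number): the four representations have different
per-symbol entropies, i.e. different capacities (I), yet they all decode
to the same value, i.e. the same meaning (II).

  from math import log
  from collections import Counter

  def H(s):
      # per-symbol Shannon entropy in natural-log units (K = 1)
      counts = Counter(s)
      n = len(s)
      return -sum((c / n) * log(c / n) for c in counts.values())

  # the same number written in bases 2, 8, 10 and 16
  reps = {"1010000010011110001": 2, "1202361": 8, "328945": 10, "504F1": 16}
  for text, base in reps.items():
      # different entropies (capacity), identical decoded value (meaning)
      print(text, round(H(text), 3), int(text, base))

(The entropies come out close to your figures; the small deviations are
only due to rounding of the symbol frequencies.)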
You may compute the Shannon entropy of a given DNA sequence (4-letter
alphabet) or a given protein (20-letter alphabet). You'll get different
values, even for a length ratio of 3:1. Or you may compute protein
sequence entropies by taking into consideration further restrictions
imposed by biological circumstances, such as available monomer
frequencies, sequence restrictions, amino acid frequencies at given
positions within protein families, etc. Do you want to call these
Shannon entropies?
Yockey does [H.P. Yockey, "Information theory and molecular biology"
(Cambridge Univ.Press, 1992, ISBN 0-521-35005-0)]: "... the entropy of
the genome is the Shannon entropy or the Kolmogorov-Chaitin algorithmic
entropy" (p.261). The algorithmic entropy has to do with the shortest
possible algorithm generating a given sequence. "The entropy that is
applicable to the case of the evolution of the genetic message is ...
the Shannon entropy of information theory or the Kolmogorov-Chaitin
algorithmic entropy" (p.312). "... _highly organized_ sequences ... have
a large Shannon entropy and are embedded in the portion of the Shannon
entropy scale also occupied by _random sequences_" (p.313).
Yockey also shows the connection to meaningful biological information:
"Let us consider evolution as a communication system from past to
present. At some time in the history of life the first cytochrome c
appeared. As a result of drift, random walk and natural selection, this
ancestor genetic message was communicated along the dendrites of a
fractal ... representing a phylogenetic tree ... Some dendrites lead to
modern organisms, the sequence having changed with time. Thus the
original genetic message of the common ancestor specifying cytochrome c,
regarded as an input, has many outcomes that nevertheless carry the same
specificity. The evolutionary processes can be considered as random
events along an ergodic Markov chain ... that have introduced
uncertainty in the original genetic message. This uncertainty is
measured by the conditional entropy in the same manner as the
uncertainty of random genetic noise is measured ... Since the
specificity of the modern cytochrome c is preserved, although many
substitutions have been accepted, this conditional entropy may be
subtracted from the source entropy ..., to obtain the mutual entropy or
information content needed to specify at least one cytochrome c sequence
... The information content of the sequence that determines at least one
cytochrome c molecule is the sum of the information content of each
site. The total information content is a measure of the complexity of
cytochrome c" (p.132). For the mathematical formulation, please refer to
Yockey's book.
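For those without access to the book, here is a rough sketch of the kind
of site-by-site calculation Yockey describes (a simplification of my
own, in Python, ignoring his adjusted amino acid probabilities and
simply assuming a uniform 20-letter source): the information content of
a site is the source entropy minus the conditional entropy of the
residues accepted at that site, and the total is the sum over all sites.

  from math import log2

  def site_information(accepted_probs, alphabet_size=20):
      # source entropy of one site (uniform 20-letter alphabet assumed here)
      h_source = log2(alphabet_size)
      # conditional entropy: uncertainty remaining among the accepted residues
      h_cond = -sum(p * log2(p) for p in accepted_probs if p > 0)
      return h_source - h_cond

  print(site_information([1.0]))       # invariant site: ~4.32 bits
  print(site_information([0.5, 0.5]))  # two residues accepted equally: ~3.32 bits
  # total information content = sum of site_information over all sites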
> >Homonyms may be difficult to find in biology! They occasionally occur in
> >our languages, even within the same language.
>
> See Szostak and Ellington above and Joyce. They are finding homonyms in
> biology but you don't seem to want to discuss them.
This is in vitro RNA chemistry using some biochemical molecules. It may
not have much to do with biology. The RNA world is completely
hypothetical, and we have no idea how it might have emerged. Presumed
natural evolutionary processes in it are completely different from known
evolutionary processes in living organisms.
> >> And American english doesn't have terms like 'jobworthy', or 'puckle' or
> >> 'bobbies' as English english and Doric english do. How do you quantify the
> >> clear and obvious (to me) semantic meaning when you don't know the semantic
> >> meaning. And because of this, semantic information becomes SUBJECTIVE not
> >> OBJECTIVE. It has nothing whatsoever to do with ambiguity. Puckle is a
> >> clearly defined word with no imprecision.
> >> Hearing German means nothing to me because I don't know the language. I
> >> can't even tell if someone using a guttural language is really speaking
> >> German. I can have an idea that they are, but that doesn't mean that they
> >> are. Thus I can't OBJECTIVELY determine meaning without being in on the
> >> private agreement about what sounds mean what.
> >
> >It's the same with biological functions we don't understand yet. I never
> >claimed to understand all biological functionality, even of a single
> >enzyme. But I claim that biological molecules _do_ have precise
> >functions - and therefore semantic information -, just as linguistic
> >words do. Meaning is relative to a specific language, as you maintain,
> >and it's the same with biological "words", but this doesn't eliminate
> >information for the system that "knows" the appropriate language. And
> >that's what counts in biology.
> >
> >> >> It is the same problem as trying to determine which of the following
> >> >> sequences has meaning.
> >> >> ni ru gua wo shou bu de bu dui jiao wo hao hao?
> >> >[I skip some of your long "message"]
> >> >> 7ZPTF0)WNO1%OSYYCP20NFGlP#DOWN:AQ[OVV,JFUsyjdyj
> >> >> If you can tell which has meaning, then you can determine biological
> >> >> functionality.
> >> >
> >> >Which meaning? Which functionality? What language or code? I.e. I agree
> >> >that meaning or biological functionality is not derivable from the
> >> >sequence alone, but must be found by the knowledge of the language or
> >> >biological observations.
> >>
> >> The very fact that you have to ask what meaning, what language what code
> >> admits of the fact that meaning isn't objectively determinable.
>
> Of my meaning test you wrote:
> >I don't think you seriously require knowledge of Chinese for anyone who
> >wants to think about biology... ;-)
>
> No, one doesn't need to know chinese to think about biology. But if you are
> going to claim that meaning is related to shannon entropy or that one can
> tell meaning via shannon's entropy, then I would suggest my test is an
> appropriate test of that hypothesis. One can't predict meaning by looking at
> a sequence any more than one can predict functionality by looking at a
> single molecule. Indeed, one can't even predict IF there is a function. If
> you could then you could pass my test WITHOUT knowing Chinese, which I speak
> very poorly (hen bu hou).
Of course, today one cannot predict biological function (if any) from a
sequence alone. I never claimed this. However, as researchers are
getting better at understanding the biological systems which can "read"
and express such sequences in the appropriate functional environment, a
measure of meaningful prediction will emerge. This is what the new field
of proteomics is all about. This confirms the relationship between
information (I) and information (II).
> >> The concept is useless, empty and misleading. It does nothing for us other
> >> than make us feel like we are really being scientific when in fact we
> >> aren't.
> >
> >Maybe we'd better talk about this again after you had a look at Yockey's
> >book. Otherwise, we may not get any productive discussion.
>
> I have my notes from Yockey's book which includes lots of info from that
> part of the book. Why don't we give it a go?
>
> >> Can you cite an experiment which shows that the same is not applicable to
> >> proteins? I mean experimental data, not merely someone's opinion. After all,
> >> RNA is related to DNA and DNA makes proteins.
> >
> >I did that last time we discussed this, if I remember correctly. Of
> >course, you realize that positive results (which are feasible with
> >artificial selection of RNA in vitro) are published, but this is usually
> >not done (or not possible!) with negative results (with natural
> >selection of proteins in vivo). Thus, we can at most expect to find
> >partial results, such as the one by Lombardi I cited above.
>
> Lombardi is irrelevant as I noted above. They aren't even doing the same
> thing. And why don't you refresh my memory about what you said. I haven't
> been on this list very much for the past 2 years and my memory isn't that
> good.
>
> glenn
Lombardi is very relevant. But I'll be happy to look at more relevant
work in the field of minimal amino acid placement requirements for a
specific protein function (including possible homonyms) if you can
provide the references.
Peter
...............................................................................
At your request, I append some statements extracted from the following
posts of mine:
Date: Fri, 22 Sep 2000 13:51:34 +0200 (ASA-digest V1 #1804)
Date: Sun, 24 Sep 2000 09:19:09 +0200 (ASA-digest V1 #1806)
Date: Wed, 27 Sep 2000 21:03:58 +0200 (ASA-digest V1 #1812)
Date: Mon, 02 Oct 2000 20:18:36 +0200 (ASA-digest V1 #1818)
Date: Fri, 22 Sep 2000 13:51:34 +0200 (ASA-digest V1 #1804)
> [snip]
But let's look more closely at what really happens in evolution! Hubert
P. Yockey ("A calculation of the probability of spontaneous biogenesis
by information theory", J.theoret.Biol. 67 (1977), 377) compared the
then known sequences of the small enzyme cytochrome c from different
organisms. He found that 27 of the 101 amino acid positions were
completely invariant, 2 different amino acids occurred at 14 positions,
3 at 21, etc., more than 10 nowhere. Optimistically assuming that the
101 positions are mutually independent and that chemically similar amino
acids can replace each other at the variable positions without harming
the enzymatic activity, he calculated that 4 x 10^61 different sequences
of 101 amino acids might have cytochrome c activity. But this implies
that the probability of spontaneous emergence of any one of them is only
2 x 10^(-65), which is way too low to be considered reasonable (it is
unlikely that these numbers would change appreciably by including all
sequences known today). A similar situation applies to other enzymes,
such as ribonucleases.
Thus, a modern enzyme activity is extremely unlikely to be found by a
random-walk mutational process. But "primitive" enzymes, near the origin
of life, may be expected to have much less activity and to be much less
sensitive to variation. Unfortunately, before someone synthesizes a set
of "primitive" cytochromes c, we have no way of knowing the effects of
these factors.
What we can do, however, is to estimate how many invariant sites can be
expected to be correctly occupied by means of a random walk before a new
enzyme activity becomes selectable by darwinian evolution (of course,
such an invariant set may be distributed among more sites which are
correspondingly more variable, without affecting the conclusions). So,
let's start with some extremely optimistic assumptions (cf. P. Rüst,
"How has life and its diversity been produced?", PSCF 44 (1992), 80):
Let's assume that all of the Earth's biomass consists of the most
efficient biosynthesis "machines" known, bacteria, and all of them
continually churn out test sequences for a new enzyme function, which
doesn't exist yet in any organism. They start with random sequences or
sequences having a different function. Natural selection starts only
after a minimal enzymatic activity of the type wanted is discernable. In
today's biosphere, t = 10^16 moles of carbon are turned over yearly,
there are n = 10^14 bacteria per mole of carbon, each bacterium having b
= 4.7 x 10^6 base pairs in its DNA. This yields R = tnb = 4.7 x 10^36
nucleotide replications per year on Earth.
In protein biosynthesis, there are c = 61/20 = 3.05 codons per amino
acid, a = 2.16 mutations per amino acid replacement (geometric average
of all possible shortest mutational walks in the modern code table), a
mutation rate of 1 mutation in m = 10^8 nucleotides replicated.
Therefore, r = 1/(c(3/m)^a) = 5.8 x 10^15 nucleotide replications are
required for 1 specific amino acid replacement (the factor 3 represents
the codon length in the triplet code).
In order to get s specific amino acid replacements, r^s nucleotide
replications are needed, and the average waiting period for 1 hit
anywhere on Earth is W = (r^s)/R. For s = 1, W = 4 x 10^(-14) seconds;
for s = 2, W = 4 minutes; for s = 3, W = 40 billion years!
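For anyone who wants to check the arithmetic, here is a minimal sketch
of the model in Python (the parameter values are just the ones stated
above; the model itself remains the deliberately over-optimistic
simplification described):

  t = 1e16        # moles of carbon turned over per year
  n = 1e14        # bacteria per mole of carbon
  b = 4.7e6       # base pairs per bacterial genome
  R = t * n * b   # nucleotide replications per year on Earth (~4.7e36)

  c = 61 / 20     # codons per amino acid
  a = 2.16        # mutations per amino acid replacement
  m = 1e8         # nucleotides replicated per mutation
  r = 1 / (c * (3 / m) ** a)   # replications per specific aa replacement (~5.8e15)

  for s in (1, 2, 3):
      W = r ** s / R           # average waiting time in years
      print(s, W)              # ~1.2e-21 yr (~4e-14 s), ~7e-6 yr (~4 min), ~4e10 yr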
Thus the minimal set for a starting enzymatic activity cannot contain
more than 2 specific amino acid occupations! Of course, for the origin
of life, biosynthesis "machines" like bacteria were not yet available,
and certainly not in an amount equalling today's biomass! Does it still
sound reasonable to assume that biological information is easily
generated by random processes? Or is there something wrong with the
model underlying the above estimate?
If God used only random processes and natural selection when He created
life 3.8 billion years ago, we should be able to successfully simulate
it in a computer. You may even cheat: the genome sequences of various
non-parasitic bacteria and archaea are available. The challenge stands.
By grace alone we proceed, to quote Wayne.
..............
Date: Sun, 24 Sep 2000 09:19:09 +0200 (ASA-digest V1 #1806)
> [snip]
Glenn:
You are right, there IS randomness in all these 21-letter sequences, no
matter whether they were generated by encrypting a meaningful phrase or
by running a random number generator, and ANY meaningful 21-letter
message can be generated from ANY of the 26^21 possible sequences if the
right key is found.
But this fact does NOT imply that meaning or semantics can arise
spontaneously by random processes, without some intelligent input of
information. This input occurs either when the sender encrypts his
message and gives the key to the designated receiver, or when an
eavesdropper searches for meaning, investing much intelligence and
effort in the process.
Do such encrypted messages really tell us anything about the process of
evolution? There, we have a random number generator alright, and we have
natural selection. But for finding meaning, natural selection isn't as
patient and powerful as an intelligent cryptographer with his computer.
In the evolutionary process, the only possible natural source of
information is the environment. But the extraction of this information
is extremely slow, probably only a fraction of a bit per generation -
when any useful mutants are available at all. And if they are, they must
penetrate the entire population before being fixed. For small selective
advantages and large populations, the mutation still risks being lost by
random drift.
If we compare this process with the huge amount of information in
today's biosphere, I'm pretty sure 4 billion years is by far too little
time. It is estimated that about 1000 different protein folds exist in
living organisms, comprising about 5000 different protein families (Wolf
Y.I., Grishin N.V., Koonin E.V. "Estimating the number of protein folds
and families from complete genome data", J.Molec.Biol. 299 (2000),
897-905). When we compare the prebiotic Earth with today's biosphere as
a whole, each of these folds, families and individual proteins with
their functions had to arise at least once somewhere. There is NO
evidence that all or most of them could be derived from one or a few
initial sequences through step-by-step mutation, each of the
intermediates being positively selected, and this within a few billion
years.
In my post, I was discussing the evolution of functional proteins in a
DNA-RNA-protein world, not evolution in an RNA world. I never talked
about ribozymes (I did mention ribonucleases, but these are protein
enzymes). I know about the in vitro selection of functional ribozymes,
but I do not consider these as valid models of evolution at all. They
just are techniques for finding active ribozymes among as many sequences
as possible. Of course, mutagenizing steps generate new diversity, but
the selection procedures most certainly are NOT natural. What we can
learn from some of these experiments is the frequency of a given
ribozyme activity among the pool of RNA sequences supplied (which
usually is just a very tiny sample of all possible sequences, and of
unknown bias).
Further problems of the ribozyme work are: (1) Usually artificial
"evolution" tapers off at activities several orders of magnitude lower
than natural ribozymes (not to speak of protein enzymes) (cf. Bartel &
Szostak, Science 261, 1411). (2) We don't yet know whether there ever
was an RNA world. (3) We don't know whether it would be viable at all.
(4) We don't know how it could have arisen by natural processes. Leslie
E. Orgel, one of the pioneers in this field, wrote (Trends Bioch.Sci. 23
(1998), 491):
"There are three main contending theories of the prebiotic origin of
biomonomers [1. strongly reducing primitive atmosphere, 2. meteorites,
3. deep-sea vents]. No theory is compelling, and none can be rejected
out of hand ... The situation with regard to the evolution of a
self-replicating system is less satisfactory; there are at least as many
suspects, but there are virtually no experimental data ... [There is] a
very large gap between the complexity of molecules that are readily
synthesized in simulations of the [suspected] chemistry of the early
earth and the molecules that are known to form potentially replicating
informational structures ... Several alternative scenarios might account
for the self-organization of a self-replicating entity from prebiotic
organic material, but all of those that are well formulated are based on
hypothetical chemical syntheses that are problematic ... I have
neglected important aspects of prebiotic chemistry (e.g. the origin of
chirality, the organic chemistry of solar bodies other than the earth,
and the formation of membranes) ... There is no basis in known chemistry
for the belief that long sequences of reactions can organize
spontaneously - and every reason to believe that they cannot."
Against this background, I think it is moot, at present, to speculate
about the probabilities of evolutionary steps in an RNA world. We DO
know, on the other hand, how the microevolutionary mechanisms work in
our world. This is why I chose to deal with this only, rather than with
ribozymes.
You are right in pointing out that Yockey revised his probability
estimate for cytochrome c (now iso-1-cytochrome c) in his book
"Information theory and molecular biology" (Cambridge: Cambridge
Univ.Press, 1992). On p.254, he gives the probability of accidentally
finding any one of the presumably active iso-1-cytochromes c as 2 x
10^(-44), which is 21 orders of magnitude better than his 1977 estimate
for cytochrome c. But I think most of this difference is NOT due to new
experimental evidence (e.g. new sequences), but to his refined
calculating method, taking into account adjusted probabilities for the
individual amino acids, to find their "effective number", so it is
hardly likely that this new estimate will increase any more. As 10^(-44)
is still much too low to be of any use, I didn't think it worthwhile to
try to present his much more complicated new procedure.
One problem which remains is his assumption that there are no
interdependencies between the different amino acid occupations within
the sequence. On p.141, he even cites one observed case where the
equivalence prediction of his procedure fails. We don't know how many
more there are. Such interdependencies would reduce the overall
probability massively.
Furthermore, Yockey deals with modern cytochromes c (and some artificial
derivatives) only, which are the result of a few billion years of
optimization. A "primitive" enzyme may be more easily accessible. The
only reason I quoted him was that we have NO information about ANY
"primitive" enzyme.
The important point is to find cases where natural selection does NOT
work (yet), because only then can we do meaningful probability
calculations, which apply only to random walks without selection of
intermediate steps. The case I considered was the origin of a new
enzymatic activity which did not exist before (anywhere in the
biosphere, e.g. a new one of those 1000 folds, and using wildly
over-optimistic assumptions). As soon as a minimal activity has arisen,
natural selection can attack and speed up evolution by unknown amounts.
This is another reason why the artificial ribozyme selection experiments
are irrelevant in this connection.
By the way, I would still be very interested to hear any comments about
the model I calculated, from you, Glenn, or anyone else!
In both of the cases you quote, an initial catalytic activity of the
type selected for was present initially (gamma-thiophosphate transfer in
Lorsch J.R., Szostak J.W., Nature 371 (1994), 31, and
oligoribonucleotide linkage in Bartel D.P., Szostak J.W., Science 261
(1993), 1411), and the same applies, as far as I know, to all other in
vitro ribozyme selection experiments done to date.
Thus, on both counts, random-path mutagenization to generate a
previously non-existing activity and natural vs. intelligent selection,
in vitro ribozyme selection experiments are NOT valid models of the
crucial steps in darwinian evolution, and the artificial ribozyme
figures of 10^(-16) or 10^(-13) are irrelevant. The apocryphal joke
about a horse's teeth is therefore quite inappropriate. We do NOT
have at our disposal ANY experimental or observational data about these critical
steps which would indicate whether macroevolution by natural means alone
is plausible or not - even quite apart from the origin of life itself.
.............
Date: Wed, 27 Sep 2000 21:03:58 +0200 (ASA-digest V1 #1812)
> [snip]
You keep misunderstanding what I argued. There are (at least) five
different types of search processes that have surfaced in our
discussion:
(a) search for a meaningful letter sequence among random ones,
(b) artificial selection of a functional ribozyme from a collection of
random RNA sequences,
(c) evolution of a functional ribozyme in RNA world organisms,
(d) evolution of a protein by mutation of the DNA and natural selection
of the protein,
(e) a random DNA mutational walk finding a minimally active protein.
I fully agree with you that both (a) and (b) are relatively easy, and
certainly successfully doable (although you may be overestimating the
fraction of letter sequences representing a recognizable meaning - but I
don't know). These are the only two types you have been dealing with up
to now. As we don't know anything about the feasibility of an RNA
world, it is too uncertain to speculate about the chances for success of
(c). But suppose there was a viable RNA world, I assume (c) might not
have been much more difficult than (b) - apart from needing more time.
So we may also agree on (c). With (d), there is an additional layer of
complexity between the mutable genotype (DNA) and the selectable
phenotype (protein), namely translation using a triplet code and a 64:21
code table. So, numerical estimates derived from (a) or (b) cannot be
applied immediately. In (a) and (b) each individual string or molecule
has to be considered an "organism", while in (d), an organism is very
much more complex, and consequently, there are usually far fewer
of them in a population capable of exchanging information. But we know
from experiments that the process, microevolution, works. As expected,
it is much slower than (b), and its progress usually levels off quite
rapidly, because the starting enzymes we can work with are already
pretty well optimized for their job. So, I don't hesitate to concede
that (d) also is workable and has been going on for the past 3.8 billion
years.
Where we part company, for the moment, is with case (e), which you have
never considered in our discussion, although my argument focussed on
this case alone, from the beginning, with the calculated model of the
probability of a random walk leading to a minimal enzyme activity within
the geologically available time. What's so different about case (e)? As
the activity wanted does not yet exist, not even to a minimal degree,
there is nothing to select, and natural selection of intermediates in
the mutational random walk just is not possible - by definition. Both in
(a) and (b), and presumably in (c), some activity or meaning is present
in the sample collection from the beginning, or can be generated
relatively easily by mutagenization. In (d), it is present by
definition, because (e) is its precursor.
A question which remains, of course, is the amount of semantic
information at the transition point between (e) and (d). If this is just
a few bits, my problem doesn't exist. What we can do is to try to define
an upper and a lower limit for this transition point. Presumably, the
two limits are very far from each other, but this is the best we can do
for the moment. For the upper limit we may look at the amount of
semantic information required for a modern (i.e. a known) enzyme. This
is what Yockey did. To find a lower limit, we may estimate how much
semantic (specified) information can be generated in a random walk and
how much time this would take. And that's exactly what I tried to
present for discussion in my first post. But you dismissed my
(tentative) conclusion out of hand, without discussing it, by referring
to cases (a) and (b), which cannot be compared with it at all.
> [snip]
All this is just Shannon information. For a string of length L and 4
nucleotides, the maximum amount of information corresponds to 4^L
possibilities, i.e. 2L bits. This may be called information potential.
But none of this tells us anything about usable or semantic information
or meaning in the sense of specification of biological function.
Mutations add nothing to the semantic information until you test them
against the environment.
> [snip]
Your calculation omits some very crucial details about how an organism
functions and how the biosphere communicates. Before you apply natural
selection, you have no semantic or functional information whatever. Your
string of a huge amount of Shannon information (which equals amount of
randomness or entropy) is nothing but raw material for selection, bit by
bit. First you need a functioning organism coded by the string (how do
you get that?), then you can start testing each of the other bits
against the environment in which this organism lives - a rather slow
process. Furthermore, it's no use having all these bits randomly
distributed in 10 million bags (species), or even further spread out
among the individuals of a species. Biology only works if the right
information is in the right place at the right time. Each individual
must have all the information it requires. That will slow down the
process tremendously. For each bit of information, you must consider
that it can be input into the biosphere almost anywhere on earth. One
bit improves cytochrome c in a fish on an Australian shelf, the next one
improves a kinase in a worm in Canadian soil, the next one improves an
ATPase in a heterotrophic bacterium 1 km below the surface in a Siberian
rock, etc. This may help if each of the functionalities needed is
already in place in each organism and is just made a little bit better.
To make use of the improvements, the other organisms of the same species
would have to trade their genes among themselves, which is not a matter
of seconds, nor even of a few years. And if other species should profit,
the trade between species or even higher taxa is much slower. But, most
importantly, how about the origin of new functionalities by process (e)?
This last factor might easily transcend any estimate for process (d) by
a transastronomical magnitude.
> [snip]
No, you misunderstood. You may want to read the Wolf et al. paper. Their
1000 protein folds don't concern the problem of folding specific
proteins into their native configurations. Different proteins whose
sequences are somewhat similar and which have somewhat similar functions
are grouped into protein families and these into less similar
superfamilies. Different superfamilies which, without any recognizable
sequence similarity, fold into (almost) the same 3-dimensional structure
belong to the same "fold". And of these folds, there are an
estimated 1000. How each individual sequence folds into its own specific
native conformation when exiting from the ribosome is an entirely
different question. So I'll just snip out your comments on this.
> [snip]
This fits in very nicely with Yockey's cytochrome c estimate. Now, using
his "effective number of amino acids" 17.621, we get 17.621^92 = 4.3 x
10^114 possible sequences, and the probability of finding any one of the
10^57 [lambda] repressor sequences is 0.23 x 10^(-57), rather low!
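(A quick check of these numbers, as a sketch of my own in Python:

  n_eff = 17.621          # Yockey's "effective number of amino acids"
  total = n_eff ** 92     # ~4.3 x 10^114 possible 92-residue sequences
  active = 1e57           # estimated functional [lambda] repressor sequences
  print(total, active / total)   # probability ~2.3e-58 = 0.23 x 10^(-57)

so the probability quoted above follows directly.)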
> [snip]
> In other words, there are lots and lots of proteins which will perform the
> function they studied also. Why is this never really raised and discussed by
> the anti-evolutionists?
At least for the last 20 years, this has been taken into consideration
by critics of evolution (e.g. in my papers at the 1988 Tacoma, WA,
conference about Sources of Information Content in DNA, and in PSCF 44
(June 1992), 80). But nevertheless, even with this caveat, asking
questions about the feasibility of evolution is not accepted in the
established big journals (in the early 80's, I tried J. of theoretical
Biology, Nature, Origins of Life, Philosophy of Science, and a German
journal, all in vain). It is not politically correct to question the
possibility of evolution. The editors' justifications of refusal were
quite evasive. As you see, even the huge numbers of possibly active
sequences are by far not sufficiently huge.
> [snip]
> absolutely conserved. The results reveal a high level of degeneracy in the
> information that specifies a particular protein fold."~John F. Reidhaar-Olson
> and Robert T. Sauer, "Functionally Acceptable Substitutions in Two [alpha]-
> helical Regions of [lambda] Repressor," Proteins: Structure, Function, and
> Genetics, 7:315, 1990. p. 306
These artificial mutations were targeted intelligently to specific small
sequence regions to be tested, which makes it practical to recover
biologically active mutants. Thus, this is not an experimental
simulation of darwinian evolution. If you want to use these results for
probability estimates, you have to factor this in.
> [snip]
> And before you say that there is an invariant region that must be as it is in
> order to assure protein function, have you ruled out that other sequences in
> other protein folded structures can't perform the same thing?
The sequences of the same fold are already taken into consideration in
the 10^57 sequences. Whether there are sequences of different folds with
the same activity is not known. If I remember correctly, cases of
different folds having the same activity are extremely rare, if they
exist at all.
> [snip]
What I meant with "unknown bias" is this: the starting pool of RNAs was
certainly about random (within the limits of biochemical precision), but
this was only a minute fraction of all possible sequences. Whatever is
contained therein has a greater chance of being selected than sequences
not in the starting pool, which just might, but need not, be formed by
later mutagenesis. And Lorsch & Szostak (Nature 371 (1994), 31), for
instance, indicate that their starting pool already contained the ATP
binding site required, "which greatly increased the odds of finding
catalytically active sequences". Furthermore, they suggest it would be
better to mix, match and modify small functional domains.
> [snip]
> > Further problems of the ribozyme work are: (1) Usually artificial
> > "evolution" tapers off at activities several orders of magnitude lower
> > than natural ribozymes (not to speak of protein enzymes) (cf. Bartel &
> > Szostak, Science 261, 1411). (2) We don't yet know whether there ever
> > was an RNA world. (3) We don't know whether it would be viable at all.
> > (4) We don't know how it could have arisen by natural processes. Leslie
> > E. Orgel, one of the pioneers in this field, wrote (Trends Bioch.Sci. 23
> > (1998), 491):
>
> All arguments from ignorance and all arguments that we will never know
> therefore we can believe what we want. Is there anything positive that you can
> offer from your point of view about what data we should observe in some future
> experiment that would prove that evolution is incompatible with the evidence.
> By this, I don't mean the other guy's failure. I want to see if you have
> anything you can predict that if found would be amazing and support your view
> that randomness plays no role in living systems.
The don't-knows are Orgel's! (You clipped out his very relevant comments
I quoted.) You don't want to claim he hasn't done anything worthwhile,
during several decades of work, to solve these questions, do you? It's
not just one "guy's failure", but the failure of a whole field of
research, in ALL research groups having had a try at it. Orgel is one of
the leaders in the field.
> [snip]
.............
Date: Mon, 02 Oct 2000 20:18:36 +0200 (ASA-digest V1 #1818)
> [snip]
As a basis for discussion, I repeat the definition of the 5 different
cases:
> > (a) search for a meaningful letter sequence among random ones,
> > (b) artificial selection of a functional ribozyme from a collection of
> > random RNA sequences,
> > (c) evolution of a functional ribozyme in RNA world organisms,
> > (d) evolution of a protein by mutation of the DNA and natural selection
> > of the protein,
> > (e) a random DNA mutational walk finding a minimally active protein.
The problem we keep running into is that you assume that (a) and (b) are
representative for (d) and (e), which I contest. I group the points
discussed under different headings, A **** etc.:
A **** Is it necessary to distinguish (a) and (b) from (d) and (e)?
> I raised that only as a response to your contention that proteins wouldn't
> behave as does an RNA. I think the evidence says that they do.
They don't: a nucleotide is worth 2 bits, an amino acid about 4.3 bits
which can only be selected as a whole. This may not amount to much
difference if each mutational step is selected individually, but
whenever you have intermediates without functional improvement, the
probability factors are multiplied at each step. RNA can be made by
"organisms" consisting of 1 RNA molecule each, in a soup containing RNA
polymerase and 4 nucleotide triphosphates, whereas a selection system
doing translation of DNA (on which mutation works) across RNA into
protein (on which selection works) requires a bacterium. You may
mutagenize RNA at rates of 10^(-4), perhaps also at 10^(-3) per
nucleotide and generation, but a bacterium will hardly survive such
treatments (the usual, i.e. naturally optimized, mutation rate is
10^(-8)). This rate also multiplies in each time a step leads to an
unselected intermediate.
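To see how quickly this multiplication bites, a small sketch (in Python;
treating the per-nucleotide rate as a rough per-step probability for one
specific change is a simplification of mine, ignoring the factor for
hitting the right position):

  mu_rna = 1e-3    # tolerable per-nucleotide mutagenesis rate for an RNA pool
  mu_bact = 1e-8   # naturally optimized per-nucleotide rate in a bacterium
  for steps in (1, 2, 3):
      # probability that all `steps` specific changes occur together,
      # with no selectable intermediate in between
      print(steps, mu_rna ** steps, mu_bact ** steps)

With one selectable step the two systems differ by a factor of 10^5;
with two consecutive unselected steps the gap is already 10^10, and so
on.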
> [snip]
That different sequences of the same protein family (having recognizable
sequence similarities) often have the same function (but in different
organisms or environments!) is clear. The experimental evidence for
different folds having the same function, however, is very meager if
they occur at all (I don't know of any example, although it might be
feasible occasionally).
> > This is what Yockey did. To find a lower limit, we may estimate how much
> > semantic (specified) information can be generated in a random walk and
> > how much time this would take. And that's exactly what I tried to
> > present for discussion in my first post. But you dismissed my
> > (tentative) conclusion out of hand, without discussing it, by referring
> > to cases (a) and (b), which cannot be compared with it at all.
>
> It ignores the possibility I discuss above about different families of
> solutions. With the RNA experiments, we have already seen the same experiment
> run twice yielding totally different sequences that perform the same function
> exactly as I illustrated in the sentences above.
RNAs aren't proteins, although both can be specified by DNA. And
sentences can be compared even less with proteins. They are analogous
because sentences, RNA, and proteins all may contain coded information,
but an analogy may not be used to transfer ALL details. Christ being a
vine doesn't mean he is literally rooted in the ground.
> [snip]
B **** What is the frequency of active RNA's in ribozyme selection (b)?
> The question is how efficient is nature at finding solutions.
> The experiments with biopolymers that I have cited clearly show that
> functionality occurs at a rate of 10^-13 or so. In the case of one of Joyce's
> RNAs the classical probability argument would say that he had something like a
> 1 chance in 10^236 of finding a useful sequence. But Joyce has been showing
> that he can find functionality in a vat of 10^13 ribozymes. Surely that must
> cause the anti-evolutionist pause because at that rate, there are 10^223 or so
> different sequences that will perform a given function. I really fail to see
> how someone can not see the implication of this except for theological
> reasons.
To which paper are you referring? We would have to look at the details.
Exactly the opposite conclusion was drawn in C.Wilson, J.W.Szostak,
Nature 374 (1995), 777: "A pool of 5 x 10^14 different random sequence
RNAs was generated... On average, any given 28-nucleotide sequence has a
50% probability of being represented... Remarkably, a single sequence
accounted for more than 90% of the selected pool... This result
indicates that there are relatively few solutions to the problem of
binding biotin." The probability of accidentally hitting on a functional
combination composed of L nucleotides is 4^L, no matter how large N, the
length of the randomized sequence is. Your conclusion that with N=392
(10^236 different sequences), finding one active sequence among 10^13
(L=22) implies that there are 10^236/10^13 = 10^223 active sequences of
length 392 is formally correct but completely irrelevant, as the
392-22=370 other nucleotide positions add nothing at all to the
functionality. If L=370, instead, a completely different overall
probability results. Your insistence on the 10^13 to 10^14 figure is
entirely arbitrary. That this same figure keeps popping up in different
experiments may just mean that this amount of RNA is practical to work
with. Even in RNA selection, probabilities depend very much on the
length of the RNA sequence selected, WHICH function is being selected,
as well as other details. So you cannot generalize. And especially, you
cannot draw conclusions regarding natural selection in a DNA-to-protein
organism from results of artificial RNA selection.
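Put as a small calculation (my own restatement of the 4^L point, in
Python):

  from math import log10
  L, N = 22, 392              # functional motif length vs. total randomized length
  print(4 ** L)               # ~1.8e13: a pool of 10^13-10^14 covers any given 22-mer
  print(N * log10(4))         # ~236: log10 of the 4^392 total sequence space
  print((N - L) * log10(4))   # ~223: log10 of 4^370, i.e. the "10^223 active
                              # sequences" are just those sharing one fixed 22-mer

The 10^223 says nothing about how many independent solutions exist; it
merely counts the sequences sharing one fixed 22-mer, i.e. it reflects
the 370 positions that are free to vary.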
C **** In what sense is meaning compatible with randomness?
> > I fully agree with you that both (a) and (b) are relatively easy, and
> > certainly successfully doable (although you may be overestimating the
> > fraction of letter sequences representing a recognizable meaning - but I
> > don't know). These are the only two types you have been dealing with up
> > to now. As we don't know anything about the feasibility of an RNA
> > world, it is too uncertain to speculate about the chances for success of
> > (c).
>
> As I have said at least twice before, I am not discussing the RNA world. I am
> merely pointing out that the classical anti-evolutionary position which claims
> (erroneously) that randomness is incompatible with meaning or specificity is
> clearly false.
Randomness, entropy, Shannon information deal with statistical
properties of sequences. From the sequence alone, it is impossible to
say whether it has meaning, specificity, biological functionality. This
must be tested in a replicating system or organism. Randomness does NOT
generate meaning, we need selection to recognize meaning. If we have a
mutational path consisting of one or more steps, AND none of the
intermediate mutants (for paths of >1 steps) represents an improvement
on the wild type (starting sequence), the increase in meaning or
functional information corresponds to the improvement observed in the
final mutant of the path with respect to the wild type. Where does this
information increment come from? From the information contained in the
environment? Did it emerge accidentally? From God's guidance? It's
impossible to be sure as far as science is concerned. All we can do is
calculate the probability of the random walk mutational path; if it is
something like 10^(-13) or larger, we hardly care. If it's 10^(-130),
would you like to say there is no problem about randomness generating
meaning?!
> [snip]
D **** Is darwinian evolution (d) faithfully modelled by ribozyme
selection (b)?
> > In the evolutionary process, the only possible natural source of
> > information is the environment. But the extraction of this information
> > is extremely slow, probably only a fraction of a bit per generation -
> > when any useful mutants are available at all. And if they are, they must
> > penetrate the entire population before being fixed. For small selective
> > advantages and large populations, the mutation still risks being lost by
> > random drift.
>
> Having looked at informational flow calculations for the genome, like those
> Spetner published in Nature in 1964, I am not at all impressed with his
> calculations. There is most assuredly more than 1 bit of
>information generated
> per generation. This is especially true in long sequences in which many
> mutations occur during a generation.
How do you know? Each intermediate organism must be viable in order to
contribute to the evolution of its genome. In bacterial evolution
experiments you sometimes find single-step mutants being selected, but
double-step mutants through a non-selected intermediate have not been
documented, to my knowledge. With RNA, viability in a non-selected state
is not an issue. Multiple mutations in the same RNA molecule between
selections (in vitro) are easily possible, but whether the same holds for
the DNA of a bacterium has not been demonstrated; it is just
assumed.
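As to the drift remark quoted above: the standard approximation (Haldane,
1927) is that a single new copy of a beneficial mutation with small
selective advantage s is ultimately fixed with probability of only about
2s, so most such mutations are simply lost. A rough sketch:

# Haldane's classical approximation: a single new beneficial mutation
# (small advantage s, large population) is ultimately fixed with
# probability ~2s; otherwise it is lost by random drift.
for s in (0.1, 0.01, 0.001):
    print(f"s={s}: P(fixation) ~ {2*s:g}, lost by drift ~ {1 - 2*s:.3f} of the time")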
> > Furthermore, it's no use having all these bits randomly
> > distributed in 10 million bags (species), or even further spread out
> > among the individuals of a species. Biology only works if the right
> > information is in the right place at the right time. Each individual
> > must have all the information it requires. That will slow down the
> > process tremendously. For each bit of information, you must consider
> > that it can be input into the biosphere almost anywhere on earth. One
> > bit improves cytochrome c in a fish on an Australian shelf, the next one
> > improves a kinase in a worm in Canadian soil, the next one improves an
> > ATPase in a heterotrophic bacterium 1 km below the surface in a Siberian
> > rock, etc. This may help if each of the functionalities needed is
> > already in place in each organism and is just made a little bit better.
> > To make use of the improvements, the other organisms of the same species
> > would have to trade their genes among themselves, which is not a matter
> > of seconds, nor even of a few years. And if other species should profit,
> > the trade between species or even higher taxa is much slower.
>
> First off, bacteria have sex with other bacteria of different species all the
> time. There is a blizzard of genetic material that flows through the
> biological world, trading genomes and genes. (see La Ronde, Scientific
> American June 1994 P. 28-29
This reference is incorrect: I couldn't find it. I am not disputing that
genes are traded rapidly among bacteria. What I emphasized is that a NEW
mutant gene representing an improvement, which first is present as only
ONE molecule in the biosphere, has to spread to all individuals and to
all species which are to profit from it. We are talking of thousands of
positive mutations required to build up each of thousands of efficient
proteins, the set of which is basically the same today in virtually all
species. Your simple calculation is not realistic, because you assume
that the moment a helpful mutation is available anywhere on earth it can
be used immediately as a basis for further improvements anywhere else on
earth.
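For the time scale of the spreading itself, even a textbook deterministic
approximation (haploid, logistic spread of a beneficial allele; the
population sizes and s values below are my assumptions for illustration)
already gives thousands of generations per sweep, within a single species:

from math import log

def sweep_generations(s, N):
    """Approximate generations for an allele with advantage s to go from
    one copy (frequency 1/N) to near-fixation: ~ (2/s) * ln(N)."""
    return (2.0 / s) * log(N)

for s, N in ((0.01, 1e9), (0.001, 1e9)):
    print(s, int(N), round(sweep_generations(s, N)))  # ~4 x 10^3 and ~4 x 10^4 generations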
> > A question which remains, of course, is the amount of semantic
> > information at the transition point between (e) and (d). If this is just
> > a few bits, my problem doesn't exist. What we can do is to try to define
> > an upper and a lower limit for this transition point. Presumably, the
> > two limits are very far from each other, but this is the best we can do
> > for the moment. For the upper limit we may look at the amount of
> > semantic information required for a modern (i.e. a known) enzyme.
>
> Oxytocin has only 8 amino acids. Several others have that also. An enzyme
> does not a priori have to have a long sequence.
Oxytocin is a biologically active peptide, not an enzyme. There are lots
of small, but biologically active things, down to ions like Ca++. Some
active peptides aren't even translated from an mRNA but are synthesized
by rather large enzyme complexes (oxytocin, however, is cleaved from a
ribosomally translated precursor protein). Enzymes and
other biologically active proteins have sizes of usually a few hundred,
and up to a few thousand amino acids. They often are composed of domains
with their own tertiary structure, where domains are usually around 100
amino acids. As an enzyme has to fold into a more or less fixed steric
structure, in order to very specifically hold one or more substrates and
catalyze a very specific reaction, it cannot be too short.
> So tell me what exactly is your definition of 'primitive' enzymes? How would
> you recognize one? What objective criteria would you use? Is Oxytocin
> primitive because it is short? Or are the enzymes of cyanobacteria primitive
> because cyanobacteria are so old?
A "primitive" enzyme (or enzyme of "minimal activity") would be just
above the transition from process (e) to (d). Such transitions would
happen anytime during the history of life, whenever a basically novel
activity was emerging, from the origin of life to the origin of humans.
If we had such an enzyme, we would detect that it has a small activity,
but we still would not know if a precursor was already active (apart
from a probably impracticable exhaustive mutant search). To find out by
what mutational random-walk it originated would probably be hard.
E **** Some misunderstandings in the scientific realm:
> > > "Extrapolating to the rest of the protein indicates that there should be
> > > about 10^57 different allowed sequences for the entire 92-residue domain.
> >
> > This fits in very nicely with Yockey's cytochrome c estimate. Now, using
> > his "effective number of amino acids" 17.621, we get 17.621^92 = 4.3 x
> > 10^114 possible sequences, and the probability of finding any one of the
> > 10^57 [lambda] repressor sequences is 0.23 x 10^(-57), rather low!
>
> And once again, it ignores the data found by Szostak and colleagues that a
> repeat of the same selection experiment yields vastly different sequences to
> solve the same biological problem.
You yourself brought in this example (Reidhaar-Olson & Sauer, 1990), in
order to refute Yockey's result. Szostak's ribozyme results are a
different case.
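For the record, here is my re-computation of the figures already given
above (nothing new, just the arithmetic):

from math import log10

log_total   = 92 * log10(17.621)  # ~114.6, i.e. ~4.3 x 10^114 sequences of 92 residues
log_allowed = 57                  # ~10^57 allowed [lambda] repressor sequences
print(log_total, log_allowed - log_total)  # probability ~10^-57.6, i.e. ~0.23 x 10^-57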
> [snip]
> > Whatever is
> > contained therein has a greater chance of being selected than sequences
> > not in the starting pool, which just might, but need not, be formed by
> > later mutagenesis. And Lorsch & Szostak (Nature 371 (1994), 31), for
> > instance, indicate that their starting pool already contained the ATP
> > binding site required, "which greatly increased the odds of finding
> > catalytically active sequences". Furthermore, they suggest it would be
> > better to mix, match and modify small functional domains.
>
> The ATP is irrelevant as far as the frequency of the functionality is
> concerned.
You are contradicting Lorsch & Szostak concerning their own work!
> > The don't-knows are Orgel's! (you clipped out his very relevant comments
> > I quoted.) You don't want to claim he hasn't done anything worth while,
> > during several decades of work, to solve these questions, do you? It's
> > not just one "guy's failure", but the failure of a whole field of
> > research, in ALL research groups having had a try at it. Orgel is one of
> > the leaders in the field.
>
> So we base our position upon other people's failure. Most scientific theories
> are based upon positive experimental support, not other people's
>failure. This
> is the wrong approach for Christians to take. If we depend upon failure, what
> happens when they finally succeed?
If the ribozyme selection results constituted any positive experimental
support for the early evolution of life, do you think Orgel would not
see it?
> [snip]
F **** Some misunderstandings in the theological/philosophical realm:
> > Your calculation omits some very crucial details about how an organism
> > functions and how the biosphere communicates. Before you apply natural
> > selection, you have no semantic or functional information whatever. Your
> > string of a huge amount of Shannon information (which equals amount of
> > randomness or entropy) is nothing but raw material for selection, bit by
> > bit. First you need a functioning organism coded by the string (how do
> > you get that?), then you can start testing each of the other bits
> > against the environment in which this organism lives - a rather slow
> > process.
>
> I think you keep trying to mix the problem here. I started this thread merely
> by pointing out that randomness isn't incompatible with semantical meaning. I
> think I proved this. Now you want to change it to the origin of
>life where you
> think you have a better defense for your case. First off, we don't need a
> functioning organism to to have selection. We merely need
>reproduction. Now I
> will freely admit I don't know how the raw molecules would
>reproduce and right
> now no one else does either. However, to claim that our lack of knowledge is
> equivalent to a law of nature seems to rest your case on our continued
> ignorance. History has shown over and over again that that is a weak place to
> rest one's case.
No, I want to focus on case (e), the initial, random-walk search for a
minimal enzymatic activity in a fully functional DNA-RNA-protein
organism in which darwinian evolution works. I just have to constantly
fend off all your linguistic (a) and in vitro ribozyme (b)
probabilities. Not because I don't like them, but because there really
are crucial differences between the cases (a) to (e), see at the
beginning of this post. I never contested that (Shannon) randomness is
compatible with semantical meaning (phenotypically tested). We need a
functioning organism for cases (d) and (e), just reproduction for (b).
> [snip]
................................................................................
--
Dr. Peter Ruest, CH-3148 Lanzenhaeusern, Switzerland <pruest@dplanet.ch>
Biochemistry - Creation and evolution
"..the work which God created to evolve it" (Genesis 2:3)