Hi Peter,
On Sun Sep 24 02:20:20 2000, pruest@dplanet.ch wrote:
> Glenn:
>
> You are right, there IS randomness in all these 21-letter sequences, no
> matter whether they were generated by encrypting a meaningful phrase or
> by running a random number generator, and ANY meaningful 21-letter
> message can be generated from ANY of the 26^21 possible sequences if the
> right key is found.
>
> But this fact does NOT imply that meaning or semantics can arise
> spontaneously by random processes, without some intelligent input of
> information. Either this happens when the sender encrypts his message
> and gives the key to the designated receiver, or when an eavesdropper
> searches for meaning, using very much intelligence and effort in the
> process.
>
> Do such encrypted messages really tell us anything about the process of
> evolution? There, we have a random number generator alright, and we have
> natural selection. But for finding meaning, natural selection isn't as
> patient and powerful as an intelligent cryptographer with his computer.
Once again, you are ignoring the fact that when experimenters make random
strings of RNA and then search for novel functionality, they find strings that
perform the task with a frequency of 10^-14 or so. While they are not all
perfectly efficient, they do their task. When it comes to the comparison with
language, I once calculated that there are over 330,000 ways to convey the
concept that if you pick your nose you will get warts. I ceased counting
because I got tired, not because I ran out of ideas. All of these were with
sequences of 28 letters or less. If you add misspellings, which don't destroy
meaning (a technique often used in cryptography to foil frequency analysis), I
could add a thousand ways to misspell each sequence yet still retain its
meaning. Such misspellings would look like: "waarts ar spred bi playcing thi
fingur in thi noz" or "wurtz arre sbred by plaising da feenger en a nos". The
meaning is still there, so the sequence performs its function. Thus there are at
least 330 million sequences for just this concept.
For the sake of argument, let us suppose that there are 300,000 different ways
to express the same concept in 21 letters or less. And let's assume that each
can be misspelled without loss of meaning in 1000 different ways (which may be
a vast underestimate). And then assume that there are a trillion different
concepts which have the same traits as what we see. (Human language is so
flexible that a trillion concepts is not impossible at all.) Then we have
roughly 10^21 different sequences which will perform a useful function. How
does that compare to the number of possible sequences? With 21 letters there
are 26^21, which is on the order of 10^29, so we estimate that useful sequences
are found at a rate of about 10^-8, or one in 100 million. Is that too low a
rate for random processes to stumble upon a meaningful sequence? No. At one try
per second (and my computer can do it quicker than this), we should find a
meaningful sentence on average every 3.2 years. That hardly seems out of the
realm of possibility. And it certainly is not a rate that would deter evolution
over millions of years.
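For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It assumes one independent random 21-letter string is tried per second and treats the wait as a simple geometric process. The function name `years_to_first_hit` and the variable names are only illustrative. The first estimate uses the rounded figures from the text; the second repeats it without the rounding, as a sanity check on how much the rounding matters.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds


def years_to_first_hit(useful, possible, tries_per_second=1.0):
    """Mean waiting time in years until a random string is a useful one."""
    p = useful / possible          # chance that a single random string is useful
    mean_trials = 1.0 / p          # mean of a geometric distribution
    return mean_trials / tries_per_second / SECONDS_PER_YEAR


# Figures as rounded in the text: 10^21 useful sequences out of 10^29.
rounded_years = years_to_first_hit(1e21, 1e29)

# Same estimate without rounding (assumption: no rounding of either count).
useful_exact = 300_000 * 1_000 * 10**12   # ways x misspellings x concepts = 3e20
possible_exact = 26**21                   # about 5.2e29
exact_years = years_to_first_hit(useful_exact, possible_exact)

print(f"rounded figures:   {rounded_years:.1f} years")
print(f"unrounded figures: {exact_years:.1f} years")
```

With the rounded figures this gives a little over three years; without the rounding the wait is some tens of years. Either way it is nothing compared with millions of years.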
[snip]
> If we compare this process with the huge amount of information in
> today's biosphere, I'm pretty sure 4 billion years is by far too little
> time.
Do you have a calculation or is this merely an emotional feeling? Upon what do
you base your estimate of the total information on earth today? I would suggest
the following. We know that microbes vastly outnumber us and indeed modern
research is showing that the vast majority of living matter on earth may
actually be contained in the rocks below our feet. Let us assume that there
have been 10 million species on earth and we will give them each a genome 3
billion nucleotides long (a bit generous). Yockey (Molecular Evolution and
Information Theory, p. 377-380) points out that there are a maximum of 6 bits
of information per codon. Thus, we have 20 billion bits of information max in
the genome of a species and thus there are 2 x 10^17 bits of information in
the biosphere today. I have seen suggestions that there might have been as many
as a billion different species over geologic time, so multiply the above by
100. I will assume (and justify below) that the addition of bits from the
individuals of a species is too small to worry about. Is there time
to generate that info? Of course there is. There is more than enough time. To
show it I need to take a diversion into info theory.
Consider the sequence
cttg
That represents a max of 24 bits as we discussed above from Yockey. If we
allow polyploidy to occur, and we copy this and attach it to itself, we have
the sequence
cttgcttg
which now represents an increase of one bit of information. Why one bit?
Because the sequence is compressible. It is ordered. Copying itself doesn't
add to the informational content. Only when you mutate it do you add
information to the system. (REMEMBER: Information is not that ill-defined word
we use in English and equate with the English word 'meaning'. Information is
defined by a mathematical equation and has nothing to do with 'meaning' or
specificity.) Mutations add information to the system because they make the
sequence LESS compressible.
Now, because of this fact about copying adding only 1 bit, you get 1 bit of
information for every clone on earth--plus 20 billion for the first species.
This is why the additional one bit of information from each individual organism
isn't enough to worry about.
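The compressibility point can be tried out with an ordinary compressor. The sketch below is only an illustration under stated assumptions: it uses the compressed size from Python's `zlib` as a crude stand-in for information content (it is not Yockey's calculation and will not reproduce the "one bit" figure exactly), and the sequence length of 2000 bases, the 5% mutation fraction and the helper names are all made up for the example.

```python
import random
import zlib

BASES = "acgt"


def random_sequence(length, rng):
    """Build a random nucleotide string of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def mutate(seq, fraction, rng):
    """Replace the given fraction of positions with a different base."""
    chars = list(seq)
    n_mutations = int(len(chars) * fraction)
    for i in rng.sample(range(len(chars)), n_mutations):
        chars[i] = rng.choice([b for b in BASES if b != chars[i]])
    return "".join(chars)


def compressed_size(seq):
    """Size in bytes after zlib compression: a rough proxy for information."""
    return len(zlib.compress(seq.encode("ascii"), 9))


rng = random.Random(0)                       # fixed seed so runs are repeatable
original = random_sequence(2000, rng)
duplicated = original + original             # exact copy attached to itself
diverged = original + mutate(original, 0.05, rng)  # copy that has since mutated

size_original = compressed_size(original)
size_duplicated = compressed_size(duplicated)
size_diverged = compressed_size(diverged)

print("original:  ", size_original, "bytes")
print("duplicated:", size_duplicated, "bytes")
print("diverged:  ", size_diverged, "bytes")
```

The exact copy should compress to only slightly more than the original, while the mutated copy costs noticeably more. That is the sense in which copying adds almost nothing and mutation adds information.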
So, if the earth has 10^19 bits of information, how rapidly does that have to
develop? About 100 bits per second, since 10^19 is roughly 100 times the number
of seconds in 4.5 billion years. This is not a rapid rate.
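The rate estimate above can be laid out step by step. This is a rough Python sketch using only the figures assumed in the text (10 million species, 20 billion bits per genome as a generous ceiling, a factor of 100 for species over geologic time, 4.5 billion years); the variable names are just for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # about 3.16e7 seconds

species_today = 10_000_000               # assumed number of species
bits_per_genome = 20e9                   # generous upper bound used in the text
species_ever_factor = 100                # a billion species over geologic time

bits_today = species_today * bits_per_genome        # 2e17 bits
bits_all_time = bits_today * species_ever_factor    # 2e19 bits

seconds_available = 4.5e9 * SECONDS_PER_YEAR         # about 1.4e17 seconds

# The text rounds the total to 10^19 bits; both versions are shown.
rate_rounded = 1e19 / seconds_available
rate_unrounded = bits_all_time / seconds_available

print(f"rate with 10^19 bits:   {rate_rounded:.0f} bits per second")
print(f"rate with 2x10^19 bits: {rate_unrounded:.0f} bits per second")
```

Both results land within a factor of two of 100 bits per second for the entire biosphere, which is the order of magnitude quoted above.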
> It is estimated that about 1000 different protein folds exist in
> living organisms, comprising about 5000 different protein families (Wolf
> Y.I., Grishin N.V., Koonin E.V. "Estimating the number of protein folds
> and families from complete genome data", J.Molec.Biol. 299 (2000),
> 897-905). When we compare the prebiotic Earth with today's biosphere as
> a whole, each of these folds, families and individual proteins with
> their functions had to arise at least once somewhere. There is NO
> evidence that all or most of them could be derived from one or a few
> initial sequences through step-by-step mutation, each of the
> intermediates being positively selected, and this within a few billion
> years.
If you are going to say that protein folding is too complex to have just
happened, I would suggest that you take a look at the following:
"Clearly, a protein cannot sample all of its conformations (e.g., 3^100 ≈ 10^48
for a 100 residue protein) on an in vivo folding timescale (<1 s). To
investigate how the conformational dynamics of a protein can accommodate
subsecond folding time scales, we introduce the concept of the native topomer,
which is the set of all structures similar to the native structure (obtainable
from the native structure through local backbone coordinate transformations
that do not disrupt the covalent bonding of the peptide backbone). We have
developed a computational procedure for estimating the number of distinct
topomers required to span all conformations (compact and semicompact) for a
polypeptide of a given length. For 100 residues, we find 3 × 10^7 distinct
topomers. Based on the distance calculated between different topomers, we
estimate that a 100-residue polypeptide diffusively samples one topomer every 3
ns. Hence, a 100-residue protein can find its native topomer by random sampling
in just 100 ms. These results suggest that subsecond folding of modest-sized,
single-domain proteins can be accomplished by a two-stage process of (i)
topomer diffusion: random, diffusive sampling of the 3 × 10^7 distinct topomers
to find the native topomer (0.1 s), followed by (ii) intratopomer ordering:
nonrandom, local conformational rearrangements within the native topomer to
settle into the precise native state." Derek A. Debe, Matt J. Carlson, and
William A. Goddard III, "The topomer-sampling model of protein folding" PNAS,
Vol. 96, Issue 6, 2596-2601, March 16, 1999, p. 2596
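The numbers in that abstract are easy to check. A minimal Python sketch, using only the figures quoted above (three states per residue for 100 residues, 3 × 10^7 topomers, one topomer sampled every 3 ns); the variable names are mine, not the paper's.

```python
import math

# Full conformational space: three states per residue, 100 residues.
conformations = 3 ** 100
conformations_log10 = math.log10(conformations)    # a little under 48

# Topomer search: number of topomers times the sampling time per topomer.
topomers = 3e7
seconds_per_topomer = 3e-9
topomer_search_seconds = topomers * seconds_per_topomer

print(f"conformations: about 10^{conformations_log10:.1f}")
print(f"time to sample every topomer: {topomer_search_seconds:.2f} s")
```

Sampling every topomer takes about 0.09 s, which is the roughly 100 ms the authors quote, against a full conformational space of nearly 10^48.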
"Our results suggest that an average sized protein domain can find its native
topology without any mechanisms to simplify the conformational search. Thus the
topomer-sampling model is fundamentally different from folding models that
insist that regions of correctly folded structure form during the early stages
of protein folding, before a structure with the native topology has been
sampled." Derek A. Debe, Matt J. Carlson, and William A. Goddard III, "The
topomer-sampling model of protein folding" PNAS, Vol. 96, Issue 6, 2596-2601,
March 16, 1999, p. 2599
"Barron and coworkers have recently used Raman optical activity experiments to
show that residues in disordered regions in molten globule states 'flicker'
between the allowed regions of the Ramachandran plot at rates of ~10^12 s^-1."
Derek A. Debe, Matt J. Carlson, and William A. Goddard III, "The
topomer-sampling model of protein folding" PNAS, Vol. 96, Issue 6, 2596-2601,
March 16, 1999, p. 2600
**
"We will show that as few as N/24 interresidue restraints reduce the number of
topologies sufficiently so that a simple residue burial score can identify the
native topology in a very small set of candidates (typically <5)." Derek A.
Debe et al, "Protein Fold Determination from Sparse Distance Restraints: The
Restrained Generic Protein Direct Monte Carlo Method," J. Phys. Chem. B. 103
(1999):3001-3008, p. 3001
"We present the generate-and-select hierarchy for tertiary protein structure
prediction. The foundation of this hierarchy is the Restrained Generic Protein
(RGP) Direct Monte Carlo method. The RGP method is a highly efficient
off-lattice residue buildup procedure that can quickly generate the complete set
of topologies that satisfy a very small number of interresidue distance
restraints. For three restraints uniformly distributed in a 72-residue protein,
we demonstrate that the size of this set is ~10^4." Derek A. Debe et al,
"Protein Fold Determination from Sparse Distance Restraints: The Restrained
Generic Protein Direct Monte Carlo Method," J. Phys. Chem. B. 103
(1999):3001-3008, p. 3001
Protein folding is much simpler than we have heretofore thought. And as usual,
it was the evolutionists who actually went out and studied the issue. The
anti-evolutionists were content to throw stones rather than do experiments.
>
> In my post, I was discussing the evolution of functional proteins in a
> DNA-RNA-protein world, not evolution in an RNA world. I never talked
> about ribozymes (I did mention ribonucleases, but these are protein
> enzymes). I know about the in vitro selection of functional ribozymes,
> but I do not consider these as valid models of evolution at all. They
> just are techniques for finding active ribozymes among as many sequences
> as possible.
It is always a bit amazing to me how no experiment is ever considered to be
good evidence of evolution by those who don't like evolution. Why do you think
that is? The claim that useful variants of long biopolymers are too rare to be
found is made over and over and over again by the anti-evolutionary crowd, yet
when one points them to an example where usefulness is found at a relatively
high level of probability, the claim is made that it isn't evidence at all. It
most assuredly is evidence that the frequency of useful biopolymers has been
vastly underestimated by the anti-evolutionary crowd, if nothing else.
But if you want to talk about proteins, as you indicated above, consider this:
"Examination of over 30 residues in the N-terminal domain of [lambda]
repressor reveals that a surprisingly large number of positions are quite low
in informational content. Nearly half of the positions examined in helix 1 and
helix 5 will accept nine or more different residues, and only a few positions
are absolutely conserved. This suggests that there is a high level of
degeneracy in the folding process; that is, there are many possible sequences
that will specify a protein that resembles the N-terminal domain of [lambda]
repressor. Moreover, if the criterion for neutral mutations were changed from
the present requirement of 5-10% activity compared to wild type, to the less
stringent requirement that the protein simply be folded, the level of
degeneracy would presumably be even higher." p. 315
"Extrapolating to the rest of the protein indicates that there should be about
10^57 different allowed sequences for the entire 92-residue domain. Clearly,
this is an extraordinarily rough calculation, and we do not intend to suggest
that we can accurately determine how many sequences would actually adopt a
structure resembling the N-terminal domain of [lambda] repressor. However, the
calculation does indicate in a qualitative way the tremendous degeneracy in the
information that specifies a particular protein fold."~John F. Reidhaar-Olson
and Robert T. Sauer, "Functionally Acceptable Substitutions in Two [alpha]-
helical Regions of [lambda] Repressor," Proteins: Structure, Function, and
Genetics, 7:315, 1990 p. 315
In other words, there are lots and lots of proteins which will also perform the
function they studied. Why is this never really raised and discussed by
the anti-evolutionists? The authors continue:
"A method of targeted random mutagenesis has been used to investigate the
informational content of 25 residue positions in two [alpha]-helical regions of
the N-terminal domain of [lambda] repressor. Examination of the functionally
allowed sequences indicates that there is a wide range in tolerance to amino
acid substitution at these positions. At positions that are buried in the
structure, there are severe limitations on the number and type of residues
allowed. At most surface positions, many different residues and residue types
are tolerated. However, at several surface positions there is a strong
preference for hydrophilic amino acids, and at one surface position proline is
absolutely conserved. The results reveal the high level of degeneracy in the
information that specifies a particular protein fold."~John F. Reidhaar-Olson
and Robert T. Sauer, "Functionally Acceptable Substitutions in Two [alpha]-
helical Regions of [lambda] Repressor," Proteins: Structure, Function, and
Genetics, 7:315, 1990. p. 306
Degeneracy equals lots and lots of different proteins to perform the same task.
And before you say that there is an invariant region that must be as it is in
order to assure protein function, have you ruled out the possibility that other
sequences in other folded protein structures can perform the same function?
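To put the 10^57 figure quoted above from Reidhaar-Olson and Sauer in per-position terms, here is a small Python sketch. It simply takes the geometric mean over the 92 residues of the domain; that averaging is my own illustration, not a calculation from the paper, and the variable names are made up.

```python
# Quoted estimate: about 10^57 allowed sequences for the 92-residue domain.
allowed_sequences_log10 = 57
domain_length = 92

# Geometric mean of acceptable residues per position (illustrative only;
# real positions range from strictly conserved to very tolerant).
mean_allowed_per_position = 10 ** (allowed_sequences_log10 / domain_length)

print(f"about {mean_allowed_per_position:.1f} acceptable residues per position")
```

That works out to roughly four acceptable amino acids at a typical position, which is what a high level of degeneracy looks like in practice.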
> Of course, mutagenizing steps generate new diversity, but
> the selection procedures most certainly are NOT natural.
Of course they aren't natural, as we have had to speed up the process. Or are
you advocating getting one's Ph.D. when one is 2 million years old? To study
things at the rate they naturally occur would require that long in order to do
the research. This seems to be a silly suggestion that means that we don't
have to draw any conclusions until we are 2 million years old. And surprise, we
won't be able to live that long so we can always claim that we aren't seeing
evolution.
> What we can
> learn from some of these experiments is the frequency of a given
> ribozyme activity among the pool of RNA sequences supplied (which
> usually is just a very tiny sample of all possible sequences, and of
> unknown bias).
Not unknown bias. The ribozymes were made randomly. Randomly means no bias. If
you have a charge of bias in their experimental procedure, then be specific and
to the point. Making vague charges of bias (more in hope than in evidence) is a
poor way of avoiding the conclusions required by the data.
>
> Further problems of the ribozyme work are: (1) Usually artificial
> "evolution" tapers off at activities several orders of magnitude lower
> than natural ribozymes (not to speak of protein enzymes) (cf. Bartel &
> Szostak, Science 261, 1411). (2) We don't yet know whether there ever
> was an RNA world. (3) We don't know whether it would be viable at all.
> (4) We don't know how it could have arisen by natural processes. Leslie
> E. Orgel, one of the pioneers in this field, wrote (Trends Bioch.Sci. 23
> (1998), 491):
These are all arguments from ignorance, all arguments that we will never know
and therefore can believe what we want. Is there anything positive that you can
offer from your point of view about what data we should observe in some future
experiment that would prove that evolution is incompatible with the evidence?
By this, I don't mean the other guy's failure. I want to see if you have
anything you can predict that, if found, would be amazing and support your view
that randomness plays no role in living systems.
I am asking that you cease doing what all antievolutionists do, which is stone
throwing, and actually propose a workable system that can be verified. Can you
do this?
> Against this background, I think it is moot, at present, to speculate
> about the probabilities of evolutionary steps in an RNA world. We DO
> know, on the other hand, how the microevolutionary mechanisms work in
> our world. This is why I chose to deal with this only, rather than with
> ribozymes.
If you will go back and look at what I said, rather than what you thought I
said, I never applied the ribozyme data to the RNA world. In fact, in this
entire thread that last sentence is the first time I have used the term RNA
world. What I have said all along is that useful sequences are found at a far
higher probability than anti-evolutionists have ever admitted. Is that so
hard to understand?
>
> You are right in pointing out that Yockey revised his probability
> estimate for cytochrome c (now iso-1-cytochrome c) in his book
> "Information theory and molecular biology" (Cambridge: Cambridge
> Univ.Press, 1992). On p.254, he gives the probability of accidentally
> finding any one of the presumably active iso-1-cytochromes c as 2 x
> 10^(-44), which is 21 orders of magnitude better than his 1977 estimate
> for cytochrome c.
The reason I hit you so hard is that I know that you are in the area of biology
and write as an apologist. I have grown very tired of apologists who insist on
using 20, 30 and 40 year old data as if it is dogma and can't be changed. It
shows that we are doing sloppy apologetics by not keeping up in the areas about
which we write. If you and I were 30 years behind our respective fields of
employment, I can guarantee you that we would both be unemployed. At least I
know I would be in the oil industry. If we keep up with our fields for the
sake of our employment, why don't we keep up when we are working for the Lord?
> One problem which remains is his assumption that there are no
> interdependencies between the different amino acid occupations within
> the sequence. On p.141, he even cites one observed case where the
> equivalence prediction of his procedure fails. We don't know how many
> more there are. Such interdependencies would reduce the overall
> probability massively.
>
> Furthermore, Yockey deals with modern cytochromes c (and some artificial
> derivatives) only, which are the result of a few billion years of
> optimization. A "primitive" enzyme may be more easily accessible. The
> only reason I quoted him was that we have NO information about ANY
> "primitive" enzyme.
Actually that isn't quite true. We find bits and pieces of enzymes in oil. We
know certain proteins that appear in oil when sponges evolved, others appear
when diatoms evolved, others when angiosperms evolved, and still others appear
in oils generated only after grasses appear. We are not totally blind about
past proteins.
> By the way, I would still be very interested to hear any comments about
> the model I calculated, from you, Glenn, or anyone else!
>
I thought http://www.calvin.edu/archive/asa/200009/0125.html did a good job so
I didn't see any reason to respond redundantly.
> In both of the cases you quote, an initial catalytic activity of the
> type selected for was present initially (gamma-thiophosphate transfer in
> Lorsch J.R., Szostak J.W., Nature 371 (1994), 31, and
> oligoribonucleotide linkage in Bartel D.P., Szostak J.W., Science 261
> (1993), 1411), and the same applies, as far as I know, to all other in
> vitro ribozyme selection experiments done to date.
It is present because it is found in the vat not because it was introduced by
the experimenter.
>
> Thus, on both counts, random-path mutagenization to generate a
> previously non-existing activity and natural vs. intelligent selection,
> in vitro ribozyme selection experiments are NOT valid models of the
> crucial steps in darwinian evolution, and the artificial ribozyme
> figures of 10^(-16) or 10^(-13) are irrelevant.
I think you have misunderstood what the experimenters are doing. They are not
introducing the solution to the vat.
Glenn
http://www.flash.net/~mortongr/dmd.htm
This archive was generated by hypermail 2b29 : Sun Sep 24 2000 - 11:17:26 EDT