Re: Random origin of biological information

From: Dawsonzhu@aol.com
Date: Sat Oct 07 2000 - 04:18:41 EDT

Next message: glenn morton: "RE: Evolution scores vs SAT scores. What else would you expect?"

Previous message: Dawsonzhu@aol.com: "Re: Evolution scores vs SAT scores. What else would you expect?"
Maybe in reply to: pruest@pop.dplanet.ch: "Random origin of biological information"

<<
As a basis for discussion, I repeat the definition of the 5 different
cases:
> > (a) search for a meaningful letter sequence among random ones,
> > (b) artificial selection of a functional ribozyme from a collection of
> > random RNA sequences,
> > (c) evolution of a functional ribozyme in RNA world organisms,
> > (d) evolution of a protein by mutation of the DNA and natural selection
> > of the protein,
> > (e) a random DNA mutational walk finding a minimally active protein.

The problem we keep running into is that you assume that (a) and (b) are
representative for (d) and (e), which I contest. I group the points
discussed under different headings, A **** etc.:
>>

Another major problem in these discussions is that we all come
at this with different backgrounds, each with its own language.
Consequently, whenever these discussions surface, "words" are
exchanged back and forth but "meaning" is usually the victim
and almost no further understanding is reached.

If I can summarize what you are saying here.... I think what you
*mean* is that the biological systems are "complex". Typically, a
researcher will focus on RNA or DNA or proteins (and only a
specific subset of even that!), so a general knowledge of the
totality of this issue may very likely be a human impossibility
given the limits of our life times. Moreover, even a subset of
that knowledge (say RNA --- my subject area) is not achievable,
and the same certainly goes for DNA or protein research.
Finally, experimental work is always a reduction to a small
subset under highly controlled environmental conditions which
may or may not properly reflect what is really going on in an
open dynamic system.

Hence, one thing I see you saying here is that because
of our ignorance, it is pure hubris to claim that we have it all
figured out. Is this something of your point?

Indeed, one of the dangers of unbridled arrogance is the havoc
that can result. The Titanic is a nice example
of late 19th/early 20th century hubris. If we make the same
blind folly in regards to biology, we're likely to sink more
than a ship in the wake of our haughty blasphemy. Scientific
discovery should make us humble and hard lessons are the
rewards that befall the proud; but the "rain" will likely fall
on both the evil and the good.

I can agree that the system is complex when considered as a
whole, but I do not commit myself to any position on this matter.

I hope that your faith is not built on the impotence of
our current knowledge to achieve a more representative system.
I will not *exclude* the possibility that God has left an "Intel
inside" written somewhere for us to find, but a survey of
400 years of science and the scientific method strongly speaks
against proposing that God's ways are Intel's ways. The nuts and
bolts of God's handiwork may be entirely indistinguishable from
the creation. Indeed, a masterpiece of engineering is like that,
isn't it? We may know the maker, but the work is so great that no
name need even be mentioned. And God, who even suffered his son
to die on the cross, can do far greater works than that.
So I think it is also important to remember that relying
on ignorance as proof of the handiwork of God is a somewhat
precarious position to stand on.

I can accept your argument of our blithering
ignorance, but I'm not sure that this is a good reason to argue
for the existence of God. Your point is more sophisticated than
the run of the mill "god of the Gaps strategy" (caps purposefully
inverted --- WKD), but I strongly encourage you to keep aiming for
the "God" rather than getting too trapped in the "Gaps".

<<
A **** Is it necessary to distinguish (a) and (b) from (d) and (e)?

> I raised that only as a response to your contention
> that proteins wouldn't behave as does an RNA. I think
> the evidence says that they do.

They don't: a nucleotide is worth 2 bits, an amino acid about 4.3 bits
which can only be selected as a whole. This may not amount to much
difference if each mutational step is selected individually, but
whenever you have intermediates without functional improvement, the
probability factors are multiplied at each step. RNA can be made by
"organisms" consisting of 1 RNA molecule each, in a soup containing RNA
polymerase and 4 nucleotide triphosphates, whereas a selection system
doing translation of DNA (on which mutation works) across RNA into
protein (on which selection works) requires a bacterium. You may
mutagenize RNA at rates of 10^(-4), perhaps also at 10^(-3) per
nucleotide and generation, but a bacterium will hardly survive such
treatments (the usual, i.e. naturally optimized, mutation rate is
10^(-8)). This rate also multiplies in each time a step leads to an
unselected intermediate.
>>
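
For reference, the bit counts quoted above are just log2 of the alphabet size, and the penalty for unselected intermediates is the product of the per-step probabilities. A minimal Python sketch, assuming a uniform alphabet and independent mutations (the function names and the step counts are illustrative and do not come from the thread):

```python
import math

def bits_per_symbol(alphabet_size):
    """Information per symbol for a uniform alphabet, in bits."""
    return math.log2(alphabet_size)

def joint_probability(rate_per_step, unselected_steps):
    """Chance that several specific mutations all occur with no selection
    in between, assuming the steps are independent (an assumption)."""
    return rate_per_step ** unselected_steps

nucleotide_bits = bits_per_symbol(4)    # 2 bits
amino_acid_bits = bits_per_symbol(20)   # about 4.3 bits

# Rates quoted in the thread: 10^(-4) for mutagenized RNA,
# 10^(-8) for a bacterium.
for steps in (1, 2, 3):
    print(steps, joint_probability(1e-4, steps), joint_probability(1e-8, steps))
```

The loop only shows how quickly the product shrinks as the number of unselected steps grows; it says nothing about how many such steps a real pathway needs.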

OK, I confess, I work with RNA. However, prions (associated with mad
cow disease and other "transmissible spongiform encephalopathies" (TSE))
are an example of a self-replicating protein. There is debate
about whether TSE is really caused by prions, for example

http://www.pbs.org/wgbh/nova/madcow/prions.html

and the mechanism of replication is not at all clear....

(for a general discussion, this might be a good
place to start

http://www-micro.msb.le.ac.uk/335/Prions.html

A list of sources:
http://www.cyber-dyne.com/~tom/quick_links.html)

In any case, with the long list of qualifiers now
introduced, this does indicate that the analogy
may quite likely be there.

It is far less clear how the complex RNA/DNA/protein
system developed or evolved, but studying the separate
parts in isolation is a start. It may not be so easy
to explain how DNA and RNA developed (or codeveloped),
but without any other information to work with, it would
be reasonable to assume that they began separately and
eventually integrated. It might be a cautionary note
to realize that this might not be the case, but presently
we don't know.

As to information content, I really think that we have
to get away from individual bases defining information.
That *may* not be correct. Consider how Chinese characters
are recorded in the computer now. Each Chinese character
carries a specific meaning. The context of the character
within a sentence defines whether it functions as a noun,
an adjective, a verb, and so forth. An example would be
"go" (or move) (Chinese: "dong"; Japanese: "dou"
or "ugoku" depending on the context). In both cases (be it
C:"dong" or J:"dou") each character is saved as two
bytes rather than one, and the ways they are stored are quite
different and mutually incompatible. (Conventional alphabets
can be saved as one byte per character.) However, the fact
that more than 10000 words can even be stored
as two bytes means that many "words" can be "compressed"
into much smaller units if we are only concerned with
translating the "meaning". These can later be expanded
by appropriate decompression methods --- an analogy
similar to the translation of mRNA into a protein.
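
The two-byte storage point can be checked directly. The sketch below is only an illustration: it assumes the character meant is the one for "move" (U+52D5) and that the two storage schemes are Big5 and Shift_JIS, neither of which is named in the post.

```python
# Assumption: the character is U+52D5 ("move"), stored in Big5
# (traditional Chinese) and Shift_JIS (Japanese). The post names
# neither the character nor the encodings.
char = "\u52d5"

big5_bytes = char.encode("big5")
sjis_bytes = char.encode("shift_jis")
ascii_bytes = "A".encode("ascii")

# Two bytes per character in both CJK encodings, one byte for a Latin letter.
print(len(big5_bytes), len(sjis_bytes), len(ascii_bytes))

# Same character, different byte pairs: the encodings are not interchangeable.
print(big5_bytes != sjis_bytes)

# Two bytes give 65536 distinct values, comfortably more than 10000 characters.
print(2 ** 16)
```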

By analogy then, if protein folds are the "nouns", and
linkages form "verbs", a simple instruction can be built
from a comparatively short collection of bytes. So I would
prefer to see the argument really look at the more
difficult issue of what *instructions* (aka information)
are actually *in* a completely folded structure. For
example, with RNA, the sequence

AAAAAAAAAAAAAAA CCCC UUUUUUUUUUUUUUU

forms a beautiful fold of A-RNA, but I think you would agree
that very little information is in the sequence because it is
a repetitive sequence of letters with a simple algorithm to
generate it. However, if I reconstruct this as

AGUAACGAGCAUUAG GAAA UUAAUGCUCGUUACU

you will still get a loop, but the sequence information is
much greater than the fold information that this sequence will
generate. This is why I am beginning to find it objectionable
when I see these "sequence" arguments.
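
A rough way to see the contrast between the two hairpins: both stems pair end to end (the second needs one G-U wobble pair next to the loop), yet the letter strings differ greatly in how repetitive they are. The Python sketch below assumes Watson-Crick plus G-U pairing and uses a count of letter runs as a crude stand-in for description length; it is not a folding algorithm or a real complexity measure, and the function names are made up for illustration.

```python
from itertools import groupby

# Watson-Crick pairs plus the G-U wobble pair, which RNA helices tolerate.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def stem_pairs(five_prime, three_prime):
    """True if every base of the 5' arm pairs with the antiparallel 3' arm."""
    return len(five_prime) == len(three_prime) and all(
        (a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime))
    )

def run_count(seq):
    """Number of runs of identical letters (few runs = very repetitive)."""
    return sum(1 for _ in groupby(seq))

hairpins = {
    "repetitive": ("AAAAAAAAAAAAAAA", "CCCC", "UUUUUUUUUUUUUUU"),
    "mixed": ("AGUAACGAGCAUUAG", "GAAA", "UUAAUGCUCGUUACU"),
}

for name, (arm5, loop, arm3) in hairpins.items():
    print(name, stem_pairs(arm5, arm3), run_count(arm5 + loop + arm3))
```

Both entries report a fully paired stem, while the run count is far higher for the mixed sequence: the same kind of fold, very different sequence-level "information".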

I confess that I am not a diehard bioinformatic spokesman and
maybe I will be found completely wrong, but I think
that finding structure and function exclusively from a
sequence without profuse appeals
to structural physics and chemistry is a truly mad way to do
business. In one sense, the complexity grows and the wonder
becomes more awesome as this structure information comes in,
but in another way, these subtle arguments about information
in the sequence become increasingly meaningless.

Glenn's comment:.....
> I think eventually we will find the same thing in proteins,
> and we have found it in RNAs. The solution that life uses,
> which seems so limiting, is merely the solution that life
> chose early in its evolution.

This is a reasonable position, however, don't forget that we
only have one example right now. If (or when) we find life
on other planets, we will be in a better position to say
whether this is true or not. Moreover, we don't know how the
"dirt" really works yet. Selection is always done with "dirt"
that has the function already. I think it is too early to
commit to the idea that *any* way is ok. Fitness landscapes
may suggest this point, but they don't handle the real
complexity either, so to what extent the complexity could
"fine tune" these matters is still unknown.

Peter responded:
<<
That different sequences of the same protein family (having recognizable
sequence similarities) often have the same function (but in different
organisms or environments!) is clear. The experimental evidence for
different folds having the same function, however, is very meager if
they occur at all (I don't know of any example, although it might be
feasible occasionally).
>>

I'm not sure I understand you here.
Protein folds are usually divided into families
that correspond to similar functions.

If you consider the cytochrome P450 family as an oxidant,
then there are no "different functions", but if you consider
how many cytochrome P450 proteins there are, and the fact that
they serve to detoxify various natural toxins that plants
produce to protect themselves from predators (aka humans),
and that many such proteins are involved in the metabolizing
of drugs (which is a major issue related to dosage), then
there are a huge number of "different functions" that
the "same folds" can do. Of course, there will be limits.

Furthermore, the sentence that makes the folds is like
instructions in Japanese, or Russian, or English, or some African
language. The "words" as such are the sequence. The "enzyme"
(collection of folds) is the meaning.
Don't confuse the two. Instructions given in
an African language would go right past me, but I could understand
the same instructions in English. They could be exactly the same
instructions, corresponding to the "fold" --- the meaning that
is to be understood. So "sound" (or words) and "meaning" are not
the same thing. Some of the sentences in a language are easily
mutated into other sentences, some are not. There will also be
limits on the types of sentences that can be created, but that
would suggest that they can serve "different functions".

In fact, you say yourself to Glenn....
<<
You misunderstand. I said " different folds having the same activity "!
A "fold" in this sense is a set of protein families without recognizable
sequence similarities between them, but folding into the same tertiary
structure.
>>

I suppose the family of cytochrome P450 does the same "activity", but
it is a very loose word. We can define "activity" up to the level
of "enzyme" under the current circumstances.

[large snip --- sorry, this is just too long, and too cluttered to
respond to.]

<<
B **** What is the frequency of active RNA's in ribozyme selection (b)?

Glenn wrote:
> The question is how efficient is nature at finding solutions.
> The experiments with biopolymers that I have cited clearly show that
> functionality occurs at a rate of 10^-13 or so. In the case of one of
> Joyce's RNAs the classical probability argument would say that he had
> something like a 1 chance in 10^236 of finding a useful sequence. But
> Joyce has been showing that he can find functionality in a vat of
> 10^13 ribozymes. Surely that must cause the anti-evolutionist pause
> because at that rate, there are 10^223 or so different sequences that
> will perform a given function. I really fail to see how someone can
> not see the implication of this except for theological reasons.

Peter responded:
To which paper are you referring? We would have to look at the details.
Exactly the opposite conclusion was drawn in C.Wilson, J.W.Szostak,
Nature 374 (1995), 777: "A pool of 5 x 10^14 different random sequence
RNAs was generated... On average, any given 28-nucleotide sequence has a
50% probability of being represented... Remarkably, a single sequence
accounted for more than 90% of the selected pool... This result
indicates that there are relatively few solutions to the problem of
binding biotin." The probability of accidentally hitting on a functional
combination composed of L nucleotides is 4^(-L), no matter how large N, the
length of the randomized sequence is. Your conclusion that with N=392
(10^236 different sequences), finding one active sequence among 10^13
(L=22) implies that there are 10^236/10^13 = 10^223 active sequences of
length 392 is formally correct but completely irrelevant, as the
392-22=370 other nucleotide positions add nothing at all to the
functionality. If L=370, instead, a completely different overall
probability results. Your insistence on the 10^13 to 10^14 figure is
entirely arbitrary. That this same figure keeps popping up in different
experiments may just mean that this amount of RNA is practical to work
with.
>>
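
The exponents being traded back and forth here are easy to reproduce. The sketch below assumes a four-letter alphabet with every sequence equally likely, and takes N=392 and L=22 from the quoted text; the function name is illustrative.

```python
import math

def log10_sequence_count(length, alphabet_size=4):
    """log10 of the number of distinct sequences of a given length."""
    return length * math.log10(alphabet_size)

N = 392   # randomized length in Glenn's example
L = 22    # positions that actually matter, on Peter's reading

total = log10_sequence_count(N)        # all sequences of length N
needed = log10_sequence_count(L)       # sequences of the functional stretch
free = log10_sequence_count(N - L)     # fillings of the indifferent positions

print(round(total), round(needed), round(free))   # prints 236 13 223
```

On this reading the 10^223 "active sequences" are simply the ways of filling the 370 positions that do not matter, while the chance of a hit stays near 1 in 10^13 whatever N is.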

I'm not quite sure I see your argument here. One single sequence out
of the vat of "random" sequences was selected in 90% of the cases.
On that basis you think that there is still only a very
small number of possible sequences that will effectively result in
selection? Is this the first argument?

I don't really follow the rest of your argument. The "392-22=370"
means the segment of the ribozyme that is selected is 22, and the
rest is unchanged?

<<
Even in RNA selection, probabilities depend very much on the
length of the RNA sequence selected, WHICH function is being selected,
as well as other details. So you cannot generalize. And especially, you
cannot draw conclusions regarding natural selection in a DNA-to-protein
organism from results of artificial RNA selection.
>>

Here, your argument seems to be in regards to a length dependence
on the segment being selected. That is a good point, this does depend
on the unchanged existing functionality of the
ribozyme to aid in the selection process.
The selectivity measurements would all be consistent because
the same active region is being tested and it ignores the rest of the
structure. I am willing to agree that there *could* be a length
dependence. However, whereas that adds to the complexity of the whole,
it still does not say that it can not follow such a route. It leaves
the problem in the state of appealing to future results
(or null results), as the case may be.

Glenn wrote:
> Oxytocin has only 8 amino acids. Several others have that
> also. An enzyme does not a priori have to have a long sequence.

Peter responds:
Oxytocin is a biologically active peptide, not an enzyme. There are lots
of small, but biologically active things, down to ions like Ca++. Active
peptides usually aren't even translated from an mRNA (I'm not sure about
oxytocin), but synthesized by rather large enzyme complexes. Enzymes and
other biologically active proteins have sizes of usually a few hundred,
and up to a few thousand amino acids. They often are composed of domains
with their own tertiary structure, where domains are usually around 100
amino acids. As an enzyme has to fold into a more or less fixed steric
structure, in order to very specifically hold one or more substrates and
catalyze a very specific reaction, it cannot be too short.
>>

OK, your appeal to "length" and "functionality" is clear.
However, I still find it somewhat puzzling that you can firmly
(and correctly) appeal to the structure of the protein,
yet you still insist on the antiquated ways of viewing the
information coded on the sequence. I think you need to consider
structuring your arguments around minimum information required to form
an enzyme, rather than focus on the maximum possible information
that might be contained on a given sequence. I sense that these issues
are "orthogonal" in your view. I'm not convinced that they are.

Anyway, it takes me a long time to read through these things,
try to understand the arguments, hopefully check references,
and finally comment on it, so I'm not likely to answer any
response for a while.

By Grace alone do we proceed,
Wayne

