Re: Random origin of biological information

From: Dawsonzhu@aol.com
Date: Sat Oct 07 2000 - 04:18:41 EDT


    Peter (pruest@dplanet.ch) wrote:

    <<
     As a basis for discussion, I repeat the definition of the 5 different
     cases:
    > > (a) search for a meaningful letter sequence among random ones,
    > > (b) artificial selection of a functional ribozyme from a collection of
    > > random RNA sequences,
    > > (c) evolution of a functional ribozyme in RNA world organisms,
    > > (d) evolution of a protein by mutation of the DNA and natural selection
    > > of the protein,
    > > (e) a random DNA mutational walk finding a minimally active protein.
     
     The problem we keep running into is that you assume that (a) and (b) are
     representative of (d) and (e), which I contest. I group the points
     discussed under different headings, A **** etc.:
    >>

    Another major problem in these discussions is that we all come
    at this with different backgrounds, each with its own language.
    Consequently, whenever these discussions surface, "words" are
    exchanged back and forth but "meaning" is usually the victim
    and almost no further understanding is reached.

    If I can summarize what you are saying here.... I think what you
    *mean* is that the biological systems are "complex". Typically, a
    researcher will focus on RNA or DNA or proteins (and only a
    specific subset of even that!), so a general knowledge of the
    totality of this issue may very likely be a human impossibility
    given the limits of our lifetimes. Moreover, even complete knowledge
    of a subset (say RNA --- my subject area) is not achievable,
    and the same certainly goes for DNA or protein research.
    Finally, experimental work is always a reduction to a small
    subset under highly controlled environmental conditions which
    may or may not properly reflect what is really going on in an
    open dynamic system.

    Hence, one thing I see you saying here is that because
    of our ignorance, it is pure hubris to claim that we have it all
    figured out. Is this part of your point?

    Indeed, one of the dangers of unbridled arrogance is the havoc
    that can result. The Titanic is a nice example
    of late 19th/early 20th century hubris. If we commit the same
    blind folly with regard to biology, we're likely to sink more
    than a ship in the wake of our haughty blasphemy. Scientific
    discovery should make us humble, and hard lessons are the
    rewards that befall the proud; but the "rain" will likely fall
    on both the evil and the good.

    I can agree that the system is complex when considered as a
    whole, but I do not commit myself to any position on this matter.

    I hope that your faith is not built on the impotence of
    our current knowledge to achieve a more representative system.
    I will not *exclude* the possibility that God has left an "Intel
    inside" written somewhere for us to find, but a survey of
    400 years of science and the scientific method strongly speaks
    against proposing that God's ways are Intel's ways. The nuts and
    bolts of God's handiwork may be entirely indistinguishable from
    the creation. Indeed, a masterpiece of engineering is like that,
    isn't it? We may know the maker, yet the work is so great that no
    name even needs to be mentioned. And God, who even suffered his son
    to die on the cross, can do far greater works than that.
    So I think it is also important to remember that relying
    on ignorance as proof of the handiwork of God is a somewhat
    precarious position to stand on.

    I can accept your argument of our blithering
    ignorance, but I'm not sure that this is a good reason to argue
    for the existence of God. Your point is more sophisticated than
    the run of the mill "god of the Gaps strategy" (caps purposefully
    inverted --- WKD), but I strongly encourage you to keep aiming for
    the "God" rather than getting too trapped in the "Gaps".

    <<
     A **** Is it necessary to distinguish (a) and (b) from (d) and (e)?
     
    > I raised that only as a response to your contention
    > that proteins wouldn't behave as does an RNA. I think
    > the evidence says that they do.
     
     They don't: a nucleotide is worth 2 bits, an amino acid about 4.3 bits
     which can only be selected as a whole. This may not amount to much
     difference if each mutational step is selected individually, but
     whenever you have intermediates without functional improvement, the
     probability factors are multiplied at each step. RNA can be made by
     "organisms" consisting of 1 RNA molecule each, in a soup containing RNA
     polymerase and 4 nucleotide triphosphates, whereas a selection system
     doing translation of DNA (on which mutation works) across RNA into
     protein (on which selection works) requires a bacterium. You may
     mutagenize RNA at rates of 10^(-4), perhaps also at 10^(-3) per
     nucleotide and generation, but a bacterium will hardly survive such
     treatments (the usual, i.e. naturally optimized, mutation rate is
     10^(-8)). This rate also enters as an extra multiplicative factor each
     time a step leads to an unselected intermediate.
    >>
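    Peter's two quantitative points above (2 bits per nucleotide versus
    about 4.3 bits per amino acid, and probabilities multiplying whenever
    intermediates go unselected) can be checked with a few lines of
    Python. This is a simplified sketch, not Peter's own calculation: it
    assumes every symbol is equally likely, that a site mutates at rate
    mu per generation with the change split evenly over the 3 alternative
    bases, and that the required changes are independent and must all
    occur in a single replication. `p_specific_path` is a hypothetical
    helper name.

```python
import math

# Information capacity per position, assuming all symbols equally likely
# (a simplification; real compositions are not uniform).
BITS_PER_NUCLEOTIDE = math.log2(4)   # 4 bases -> 2 bits
BITS_PER_AMINO_ACID = math.log2(20)  # 20 amino acids -> ~4.32 bits


def p_specific_path(mu: float, steps: int) -> float:
    """Probability that one replication introduces `steps` specific point
    substitutions at once, with no selection helping in between.

    Hypothetical model: each required change occurs with probability mu/3
    (mu per site, split evenly over the 3 alternative bases), and the
    changes are independent, so the per-step factors multiply.
    """
    return (mu / 3) ** steps


print(f"bits/nucleotide = {BITS_PER_NUCLEOTIDE}, bits/amino acid = {BITS_PER_AMINO_ACID:.2f}")
for mu in (1e-4, 1e-8):  # mutagenized RNA vs. a typical bacterial rate
    for k in (1, 2, 3):
        print(f"mu={mu:g} steps={k}: p={p_specific_path(mu, k):.3g}")
```

    Under these assumptions each additional unselected step costs another
    factor of mu/3, so the gap between the two rates widens from 10^4 for
    one step to 10^8 for two steps.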

    OK, I confess, I work with RNA. However, the prion (the agent
    implicated in mad cow disease and the other transmissible spongiform
    encephalopathies (TSE)) is an example of a self-replicating protein.
    There is debate about whether TSE is really caused by prions; see, for example,

    http://www.pbs.org/wgbh/nova/madcow/prions.html

    and the mechanism of replication is not at all clear....

        (for a general discussion, this might be a good
        place to start

        http://www-micro.msb.le.ac.uk/335/Prions.html

        A list of sources:
        http://www.cyber-dyne.com/~tom/quick_links.html)

    In any case, even with the long list of qualifiers now
    introduced, this indicates that the analogy
    may quite likely hold.

    It is far less clear how the complex RNA/DNA/protein
    system developed or evolved, but studying the separate
    parts in isolation is a start. It may not be so easy
    to explain how DNA and RNA developed (or codeveloped),
    but without any other information to work with, it would
    be reasonable to assume that they began separately and
    eventually integrated. As a cautionary note, this might
    not be the case, but presently we don't know.

    As to information content, I really think that we have
    to get away from individual bases defining information.
    That *may* not be correct. Consider how Chinese characters
    are stored in the computer now. Each Chinese character
    carries a specific meaning, and the context of the character
    within a sentence defines whether it functions as a noun,
    an adjective, a verb, and so forth. Take, for example, the
    character for "go" (or "move"): Chinese "dong", Japanese "dou"
    or "ugoku" depending on the context. In both cases (be it
    C:"dong" or J:"dou"), each character is saved as two
    bytes rather than one, and the way they are stored is quite
    different and mutually incompatible. (Conventional alphabets
    can be saved as one byte per character.) However, the fact
    that more than 10000 characters, each effectively a "word",
    can be stored in two bytes means that many "words" can be
    "compressed" into much smaller units if we are only concerned
    with conveying the "meaning". These can later be expanded
    by appropriate decompression methods --- an analogy
    similar to the translation of mRNA into a protein.
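    A quick way to see the two-byte, mutually incompatible storage
    described above is to encode the character for "move" in two legacy
    encodings that Python ships with: GB2312 for the simplified Chinese
    form 动 and Shift_JIS for the Japanese form 動. Picking these two
    encodings is an assumption about what "stored in the computer" meant
    here; the sketch checks only lengths and that the byte strings
    differ, not specific code points.

```python
# "Move": simplified Chinese 动 (dong), Japanese 動 (dou / ugoku).
chinese = "动"
japanese = "動"

gb = chinese.encode("gb2312")         # legacy mainland-Chinese encoding
sjis = japanese.encode("shift_jis")   # legacy Japanese encoding

print(len(gb), gb.hex())
print(len(sjis), sjis.hex())

# Reading the Chinese bytes with the Japanese codec does not give back 動:
# the same two bytes mean something else in the other scheme.
garbled = gb.decode("shift_jis", errors="replace")
print(garbled)

# Two bytes give 2**16 distinct codes, ample room for 10000+ characters.
print(2 ** 16)
```

    Modern UTF-8 would instead spend three bytes on each of these
    characters; the point about packing meaning into a fixed number of
    bytes is unchanged.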

    By analogy then, if protein folds are the "nouns", and
    linkages form "verbs", a simple instruction can be built
    from a comparatively short collection of bytes. So I would
    prefer to see the argument really look at the more
    difficult issue of what *instructions* (aka information)
    are actually *in* a completely folded structure. For
    example, with RNA, the sequence

    AAAAAAAAAAAAAAA CCCC UUUUUUUUUUUUUUU

    forms a beautiful fold of A-RNA, but I think you would agree
    that very little information is in the sequence because it is
    a repetitive sequence of letters with a simple algorithm to
    generate it. However, I could construct the same hairpin with

    AGUAACGAGCAUUAG GAAA UUAAUGCUCGUUACU

    and you will still get a loop, but the sequence information is
    much greater than the fold information that this sequence will
    generate. This is why I am beginning to find it objectionable
    when I see these "sequence" arguments.
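    The two hairpins above can be compared directly. The sketch below
    checks that each 15-nt arm pairs with its partner (allowing
    Watson-Crick pairs plus the G-U wobble, which the mixed sequence uses
    for its innermost pair next to the GAAA loop), then estimates
    information two crude ways: the Shannon entropy of the base
    composition, and the zlib-compressed length as a rough stand-in for
    "a simple algorithm to generate it". Neither measure is a standard
    for biological information; both are placeholders chosen for
    illustration, and the compressed size includes a fixed 6-byte zlib
    header and checksum.

```python
import math
import zlib
from collections import Counter

# Pairs allowed in an RNA stem: Watson-Crick plus the G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairs(five_prime: str, three_prime: str) -> bool:
    """True if the 5' arm pairs base-by-base with the 3' arm read backwards."""
    return len(five_prime) == len(three_prime) and all(
        (a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime))
    )


def shannon_bits_per_base(seq: str) -> float:
    """Entropy of the base composition only (ignores order entirely)."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())


def compressed_size(seq: str) -> int:
    """zlib output length: a crude proxy for how short a 'recipe' is."""
    return len(zlib.compress(seq.encode("ascii"), 9))


hairpins = {
    # Same as the post's first example: 15 A, CCCC loop, 15 U.
    "repetitive": ("A" * 15, "CCCC", "U" * 15),
    "mixed": ("AGUAACGAGCAUUAG", "GAAA", "UUAAUGCUCGUUACU"),
}

results = {}
for name, (arm5, loop, arm3) in hairpins.items():
    seq = arm5 + loop + arm3
    results[name] = (stem_pairs(arm5, arm3), shannon_bits_per_base(seq), compressed_size(seq))
    paired, bits, size = results[name]
    print(f"{name}: stem pairs={paired}, entropy={bits:.2f} bits/base, zlib={size} bytes")
```

    Both sequences pass the pairing check, yet the mixed one has higher
    compositional entropy and compresses less, which is the gap between
    sequence information and fold information argued above.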

    I confess that I am not a diehard bioinformatics spokesman and
    maybe I will be found completely wrong, but I think
    that finding structure and function exclusively from a
    sequence without profuse appeals
    to structural physics and chemistry is a truly mad way to do
    business. In one sense, the complexity grows and the wonder
    becomes more awesome as this structure information comes in,
    but in another way, these subtle arguments about information
    in the sequence become increasingly meaningless.
     
    Glenn's comment:.....
    > I think eventually we will find the same thing in proteins,
    > and we have found it in RNAs. The solution that life uses,
    > which seems so limiting, is merely the solution that life
    > chose early in its evolution.
     
    This is a reasonable position; however, don't forget that we
    only have one example right now. If (or when) we find life
    on other planets, we will be in a better position to say
    whether this is true or not. Moreover, we don't know how the
    "dirt" really works yet. Selection is always done with "dirt"
    that has the function already. I think it is too early to
    commit to the idea that *any* way is ok. Fitness landscapes
    may suggest this point, but they don't handle the real
    complexity either, so to what extent the complexity could
    "fine tune" these matters is still unknown.

    Peter responded:
    <<
     That different sequences of the same protein family (having recognizable
     sequence similarities) often have the same function (but in different
     organisms or environments!) is clear. The experimental evidence for
     different folds having the same function, however, is very meager if
     they occur at all (I don't know of any example, although it might be
     feasible occasionally).
    >>

    I'm not sure I understand you here.
    Protein folds are usually divided into families
    that correspond to similar functions.

    If you consider the cytochrome P450 family as oxidizing enzymes,
    then there are no "different functions", but if you consider
    how many cytochrome P450 proteins there are, and the fact that
    they serve to detoxify various natural toxins that plants
    produce to protect themselves from predators (aka humans),
    and that many such proteins are involved in metabolizing
    drugs (which is a major issue related to dosage), then
    there are a huge number of "different functions" that
    the "same folds" can do. Of course, there will be limits.

    Furthermore, the sentence that makes the folds is like
    instructions in Japanese, or Russian, or English, or some African
    language. The "words" as such are the sequence. The "enzyme"
    (collection of folds) is the meaning.
    Don't confuse the two. Instructions given in
    an African language would go right past me, but I could understand
    the same instructions in English. They could be exactly the same
    instructions, corresponding to the "fold" --- the meaning that
    is to be understood. So "sound" (or words) and "meaning" are not
    the same thing. Some of the sentences in a language are easily
    mutated into other sentences, some are not. There will also be
    limits on the types of sentences that can be created, but within
    those limits they can serve "different functions".

    In fact, you say yourself to Glenn....
    <<
     You misunderstand. I said " different folds having the same activity "!
     A "fold" in this sense is a set of protein families without recognizable
     sequence similarities between them, but folding into the same tertiary
     structure.
    >>

    I suppose the cytochrome P450 family shares the same "activity", but
    "activity" is a very loose word. We can define "activity" up to the level
    of "enzyme" under the current circumstances.

    [large snip --- sorry, this is just too long, and too cluttered to
    respond to.]

    <<
     B **** What is the frequency of active RNA's in ribozyme selection (b)?

    Glenn wrote:
    > The question is how efficient is nature at finding solutions.
    > The experiments with biopolymers that I have cited clearly show that
    > functionality occurs at a rate of 10^-13 or so. In the case of one of
    > Joyce's RNAs the classical probability argument would say that he had
    > something like a 1 chance in 10^236 of finding a useful sequence. But
    > Joyce has been showing that he can find functionality in a vat of 10^13
    > ribozymes. Surely that must cause the anti-evolutionist to pause, because
    > at that rate there are 10^223 or so different sequences that will perform
    > a given function. I really fail to see how someone can not see the
    > implication of this except for theological reasons.

    Peter responded:
     To which paper are you referring? We would have to look at the details.
     Exactly the opposite conclusion was drawn in C.Wilson, J.W.Szostak,
     Nature 374 (1995), 777: "A pool of 5 x 10^14 different random sequence
     RNAs was generated... On average, any given 28-nucleotide sequence has a
     50% probability of being represented... Remarkably, a single sequence
     accounted for more than 90% of the selected pool... This result
     indicates that there are relatively few solutions to the problem of
     binding biotin." The probability of accidentally hitting on a functional
     combination composed of L nucleotides is 1 in 4^L, no matter how large N,
     the length of the randomized sequence, is. Your conclusion that with N=392
     (10^236 different sequences), finding one active sequence among 10^13
     (L=22) implies that there are 10^236/10^13 = 10^223 active sequences of
     length 392 is formally correct but completely irrelevant, as the
     392-22=370 other nucleotide positions add nothing at all to the
     functionality. If L=370, instead, a completely different overall
     probability results. Your insistence on the 10^13 to 10^14 figure is
     entirely arbitrary. That this same figure keeps popping up in different
     experiments may just mean that this amount of RNA is practical to work
     with.
    >>
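    The arithmetic in this exchange can be reproduced as follows. This
    is a sketch of the counting only: it treats the functional motif as
    sitting at one fixed position, whereas a motif that may appear
    anywhere in an N-nt region has roughly N - L + 1 chances, which
    raises the odds by about that factor. The helper names are
    illustrative.

```python
import math
from fractions import Fraction


def log10_sequences(length: int, alphabet: int = 4) -> float:
    """log10 of the number of distinct sequences of a given length."""
    return length * math.log10(alphabet)


# Glenn's figures: a randomized region of N = 392 nt, a pool of ~10^13.
N, POOL_LOG10 = 392, 13
space = round(log10_sequences(N))
print(space)               # size of sequence space, as a power of ten
print(space - POOL_LOG10)  # naive count of "active" sequences, as a power of ten

# Peter's objection: if only L positions matter, the rest are free.
L = 22
print(round(log10_sequences(L), 1))  # 4^22 as a power of ten, close to the pool size


def fraction_with_motif_at_fixed_site(n: int, l: int) -> Fraction:
    """Exact fraction of length-n sequences carrying one specific l-nt motif
    at one fixed position: 4^(n-l) / 4^n = 4^-l, whatever n is."""
    return Fraction(4 ** (n - l), 4 ** n)


for n in (30, 100, 392):
    print(n, fraction_with_motif_at_fixed_site(n, L) == Fraction(1, 4 ** L))
```

    The fixed-position fraction comes out as 4^-22 for every N tried,
    which is Peter's point that the other 370 positions do not change
    the odds of hitting the motif.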

    I'm not quite sure I see your argument here. One single sequence
    accounted for more than 90% of the selected pool drawn from the vat
    of "random" sequences. On that basis you think that there is still
    only a very small number of possible sequences that will effectively
    result in selection? Is this the first argument?

    I don't really follow the rest of your argument. Does "392-22=370"
    mean that the selected segment of the ribozyme is 22 nucleotides long,
    and the rest is unchanged?

    <<
    Even in RNA selection, probabilities depend very much on the
     length of the RNA sequence selected, WHICH function is being selected,
     as well as other details. So you cannot generalize. And especially, you
     cannot draw conclusions regarding natural selection in a DNA-to-protein
     organism from results of artificial RNA selection.
    >>

    Here, your argument seems to concern a length dependence
    of the segment being selected. That is a good point: the selection
    does depend on the unchanged, pre-existing functionality of the
    ribozyme to aid in the selection process.
    The selectivity measurements would all be consistent because
    the same active region is being tested, ignoring the rest of the
    structure. I am willing to agree that there *could* be a length
    dependence. However, whereas that adds to the complexity of the whole,
    it still does not say that it cannot follow such a route. It leaves
    the problem in the state of appealing to future results
    (or null results), as the case may be.
     
    <<
    Glenn wrote:
    > Oxytocin has only 9 amino acids. Several others have that
    > also. An enzyme does not a priori have to have a long sequence.

    Peter responds:
     Oxytocin is a biologically active peptide, not an enzyme. There are lots
     of small, but biologically active things, down to ions like Ca++. Active
     peptides usually aren't even translated from an mRNA (I'm not sure about
     oxytocin), but synthesized by rather large enzyme complexes. Enzymes and
     other biologically active proteins have sizes of usually a few hundred,
     and up to a few thousand amino acids. They often are composed of domains
     with their own tertiary structure, where domains are usually around 100
     amino acids. As an enzyme has to fold into a more or less fixed steric
     structure, in order to very specifically hold one or more substrates and
     catalyze a very specific reaction, it cannot be too short.
    >>

    OK, your appeal to "length" and "functionality" is clear.
    However, I still find it somewhat puzzling that you can firmly
    (and correctly) appeal to the structure of the protein,
    yet you still insist on the antiquated ways of viewing the
    information coded on the sequence. I think you need to consider
    structuring your arguments around minimum information required to form
    an enzyme, rather than focus on the maximum possible information
    that might be contained on a given sequence. I sense that these issues
    are "orthogonal" in your view. I'm not convinced that they are.
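    One way to make the distinction between minimum and maximum
    information concrete is to compare two numbers for a single domain
    of about 100 amino acids, the typical size Peter gives. The maximum
    is the capacity of the sequence, n log2 20 bits. A minimum could be
    taken as -log2 of the fraction of all sequences that fold into a
    working enzyme. That fraction is unknown; the sketch plugs in the
    10^-13 figure Glenn quotes for functional RNAs purely as a
    placeholder, not as a measured value for proteins. The function
    names are hypothetical.

```python
import math

# Capacity of one amino-acid position if all 20 are equally likely.
BITS_PER_RESIDUE = math.log2(20)


def max_sequence_bits(length: int) -> float:
    """Upper bound: every residue of the chain fully specified."""
    return length * BITS_PER_RESIDUE


def min_functional_bits(fraction_functional: float) -> float:
    """Bits needed only to land somewhere in the functional set:
    -log2 of the fraction of sequences that fold into a working enzyme."""
    return -math.log2(fraction_functional)


DOMAIN_LENGTH = 100           # typical domain size mentioned in the thread
PLACEHOLDER_FRACTION = 1e-13  # HYPOTHETICAL: borrowed from the RNA figure, not measured

print(round(max_sequence_bits(DOMAIN_LENGTH), 1))
print(round(min_functional_bits(PLACEHOLDER_FRACTION), 1))
```

    With that placeholder the minimum is about 43 bits against a maximum
    of about 432 bits; the open question is how large the functional
    fraction actually is.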

    Anyway, it takes me a long time to read through these things,
    try to understand the arguments, hopefully check references,
    and finally comment on it, so I'm not likely to answer any
    response for a while.

    By Grace alone do we proceed,
    Wayne
     

     



    This archive was generated by hypermail 2b29 : Sat Oct 07 2000 - 04:18:50 EDT