RE: Polyphyly and the origin of life

From: Peter Ruest (pruest@pop.mysunrise.ch)
Date: Thu May 23 2002 - 12:41:09 EDT

Next message: Adrian Teo: "RE: Catholic Church and Morality"

Previous message: Loren Haarsma: "Randomness (was Re: Science Education and the Church)"
Maybe in reply to: Peter Ruest: "Polyphyly and the origin of life"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Glenn Morton wrote (20 May 2002 21:59:02 -0700):
>
> >From: Peter Ruest [mailto:pruest@pop.mysunrise.ch]
> >Sent: Monday, May 20, 2002 7:54 AM
> >> >That's artificial selection of RNA function in vitro, rather than
> >> >spontaneous emergence of minimal protein function without selection in
> >> >vivo (or in the prebiotic world).
> >>
> >> The interesting thing I notice is that in your first statement, the
> >> statement which I criticized, you have no requirement for me to
> >show that it
> >> arose out of nothing and no restriction to non-vitro experiments.
> >
> >Do you mean "in vivo", rather than "in vitro"?
>
> No, I mean in vitro. In vitro means in glass. One could put it in a coke can
> I presume.

Hi Glenn
Sorry, I miscounted the logical negations in your clause "no restriction
to non-vitro". My fault! I know what in vitro means. I did in vitro work
for many years.

> >My first statement which you criticized (1 May 2002 20:30:36 -0700) was:
> >>>The amount of meaningful or semantic information contained in a system
> >>>may be defined as the minimal length of an algorithm capable of
> >>>specifying it (M.V. Volkenstein, "Punctualism, non-adaptionism,
> >>>neutralism and evolution", BioSystems 20 (1987), 289). This would
> >>>exclude all features irrelevant for meaning or functionality. The
> >>>meaningful information contained in today's biosphere may be
> >>>approximated by a (purely theoretical) minimal set of genome parts
> >>>"streamlined" to include the code for whatever is really required for
> >>>the organisms represented in the biosphere, but nothing else. Its amount
> >>>is such that the improbability of its generation by random-variation /
> >>>natural-selection processes, starting with a prebiotic universe, is
> >>>vastly transastronomical.
> >
> >I clearly was talking about the origin of life (or of the biosphere as a
> >whole), with its crucial first living system(s) originating out of a
> >prebiotic environment, in which there is no biological information (thus
> >"nothing"). As each in vitro experiment using artificial selection
> >artificially introduces plenty of functional information, it cannot be
> >used (at least not without an appropriate correction) for estimating the
> >possible amount of information II originating spontaneously. Thus, the
> >requirements were clearly stated.
>
> No they don't introduce functional information. If you recall, some of the
> quotes I posted said that they had randomized the sequence so that .25 of
> the sequences were totally random. That is degrading functional information
> (assuming such information really exists).

Functional information can be introduced (1) by starting with some
biological function already contained in the RNA being subjected to
evolution. I agree that this is not the case if you completely randomize
the starting RNA. You concede that this is not the case in all
experiments. But functional information is also introduced (2) in each
artificial selection experiment by the selection itself (for each
selection step this amounts to at least 1 bit, a yes/no decision), and
(3) in each natural selection of a functional molecule in vivo.

> >This refers to the general problem of information II. But in order to
> >find any estimate of amounts of information II, we have to consider much
> >simpler systems. And the only one I could think of to-date, which might
> >offer a hope of getting at such values, is the origin of a novel
> >enzymatic functionality in its minimal form, just before natural
> >selection sets in. This situation, of course, can only be investigated
> >in a modern biological system including genetic coding, transcription,
> >translation, folding, and probably only in the context of a large family
> >of orthologous proteins, such as the cytochromes c.
>
> So I am waiting for you to tell me the 'amounts of information II' any
> sequence has. To date you can't define how one would estimate such a
> quantity. Indeed, you have said it isn't possible, but now you say it is.
> What is the equation for estimating the 'amount of information II'?

I have described and discussed this problem, in detail, in P. Ruest,
"How has life and its diversity been produced", PSCF 44/2 (June 1992),
80-94 (available at
http://www.asa3.org/ASA/PSCF/1992/PSCF6-92Rust.html), in the chapter
"Semantic information".

Yockey has shown how an invariant for an orthologous protein family is
calculated: this is what I call information II, if it corresponds to the
first ancestral sequence of the family at the time when the first
selectable functionality emerged (the minimal-functionality protein).
The invariants in my example below, transformed into entropies according
to Yockey's formula, would be information II. Unfortunately, today, no
minimal-functionality proteins exist in the biosphere. They would have
to be synthesized artificially.

Here I give an example calculation for an estimate of the maximum
feasible length of a random-walk (i.e. before selection can set in)
mutational path needed, on the average, to reach a specified invariant
for an enzyme family in vivo (extremely optimistic parameters assumed),
or the average time needed to find such an invariant of a given size for
a novel activity:
n=3 nucleotides/codon
d=3.05 codons/amino acid [={(4^n)-3}/20, with 3 stop codons]
j=2.16 mutations per specified amino acid replacement (geom. average)
m=10^(-8) mutations per nucleotide replicated -->
r=average number of nucleotide replications required
   per specified amino acid replacement:
r=1/[d{(nm)^j}]=5.8*10^15
C=10^16 moles carbon/year metabolized in today's biosphere
B=10^14 bacteria per mole carbon
N=4.7*10^6 nucleotide pairs/bacterium assumed -->
R=number of nucleotides replicated per year in the biosphere:
R=CBN=4.7*10^36
s=invariant=number of specified amino acid replacements required
   for minimal novel enzymatic activity -->
t=time required:
t=(r^s)/R years
for s=1: t=4*10^(-14) seconds
for s=2: t=4 minutes
for s=3: t=40 billion years
Known invariants:
s=~30 for simple enzymes (cytochrome c, ribonuclease)
s=~5 for specific enzyme adaptations
   (e.g. stomach lysozyme for foregut fermenters)
I am open to any better modelling ideas.
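
The arithmetic above can be checked numerically. What follows is only a
minimal sketch in Python, using the parameter values exactly as listed.
The function and constant names are illustrative and not part of the
original calculation, and a year is assumed to have 365.25 days. It
works with base-10 logarithms, because r^s exceeds the range of ordinary
floating-point numbers for large s.

```python
import math

# Parameters as listed above (all are the stated, optimistic assumptions).
N_PER_CODON = 3      # n: nucleotides per codon
MUT_RATE = 1e-8      # m: mutations per nucleotide replicated
J = 2.16             # j: mutations per specified amino acid replacement
C = 1e16             # moles of carbon metabolized per year in the biosphere
B = 1e14             # bacteria per mole of carbon
N_GENOME = 4.7e6     # N: nucleotide pairs per bacterium

# Assumption for unit conversion only: 365.25 days per year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def codons_per_amino_acid(n=N_PER_CODON):
    """d = (4^n - 3) / 20, i.e. 61 sense codons shared by 20 amino acids."""
    return (4 ** n - 3) / 20


def log10_r():
    """log10 of r = 1 / [d * (n*m)^j], replications per specified replacement."""
    d = codons_per_amino_acid()
    return -(math.log10(d) + J * math.log10(N_PER_CODON * MUT_RATE))


def log10_R():
    """log10 of R = C * B * N, nucleotides replicated per year."""
    return math.log10(C * B * N_GENOME)


def log10_t_years(s):
    """log10 of t = r^s / R, in years, for an invariant of size s."""
    return s * log10_r() - log10_R()


if __name__ == "__main__":
    for s in (1, 2, 3, 5, 30):
        print(f"s={s}: t is about 10^{log10_t_years(s):.1f} years")
```

For s=1, 2 and 3 this should reproduce the figures given above to within
rounding, once the years are converted to seconds or minutes with
SECONDS_PER_YEAR. The same function can be applied to the known
invariants listed above (s=~5 and s=~30).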

> >As I told you before, information II cannot be determined from
> >artificial selection in vitro because we don't know how much information
> >is artificially introduced into the molecules selected. RNA can only
> >model the purely hypothetical RNA world, of which we don't even know
> >whether it is viable at all - even ignoring the problem of its
> >initiation. And RNA selection experiments are done with the help of
> >biological protein-enzymes! Thus, again, there is no possibility of
> >estimating biologically relevant information II. Remember that, as you
> >have emphasized yourself, information II is absolutely undefined apart
> >from the (right, biological!) context of a molecule considered.
>
> If you can't define 'biologically relevant information II', then you have
> nothing worth speaking of in science. You have a belief, and nothing more.
> Science demands definitions which are objective. The only way I can see that
> you can prove objectivity in your definition of information II is for you to
> determine which sequence contains it, something you keep avoiding.

I think I have operationally defined 'biologically relevant information
II', even though no mathematical definition exists. The problem is that
proof is available neither for the existence of such information II, NOR
FOR ITS NON-EXISTENCE. Its non-existence is just ASSUMED by most people,
including ALL atheists - for obvious reasons. Don't you think that its
discussion might be a legitimate endeavour among those who don't just
reject it out of hand - out of a philosophical prejudice?

For theological reasons, I suspect that the existence of information II,
as I defined it, might never be strictly provable. Freedom of the will
with respect to choosing to believe in God would seem to imply that, in
principle, no stringent proof that God exists is possible. This would
also imply that we'll NOT be able to PROVE that evolution of some novel
functionalities is not possible. Of course, the opposite proof, namely
that spontaneous emergence of all existing functionalities is feasible,
is also NOT possible. But should we, for this reason, stop and forbid
any thinking about the feasibility of spontaneous emergence of life and
its complexity? I think not.

If you don't agree, there is no use for us to continue this discussion.
But at least, you should be ready to acknowledge that your way of
arguing is absolutely one-sided, and therefore unscientific, in
requiring strict proofs of those proposing the existence of information
II, but requiring nothing of the sort of those claiming the opposite. If
we look at the evidence available, I think it points much more towards
the reality of such information. Trying to evade this issue, by
dogmatically defining it offside or outside of science, doesn't strike
me as very openminded.

> >Under the designation "multiple families", you are mixing up some
> >fundamentally different concepts (orthologs and paralogs are
> >subgroupings of homologs (which have significantly similar sequences)):
> >(1) orthologs in different species are derived by common ancestry from
> >the same ancestral protein;
> >(2) paralogs in the same or different species are derived from
> >independent evolution from a gene duplication in some ancestral species;
> >(3) xenologs are homologs obtained by lateral gene transfer;
> >(4) different families are sets of orthologs, where the different sets
> >are (usually) paralogs of each other (domain shuffling may introduce
> >additional levels of complexity between paralogous families, and
> >supersets of families more distantly related may form superfamilies).
>
> No, I am not mixing up these concepts. I am speaking of totally different
> sequences which have ABSOLUTELY NO RELATIONSHIP WITH THE OTHER SEQUENCE. I
> know what I am saying and it isn't what you are trying to make me say.
>
> I wrote:
> >> No, not at all. There is indeed evidence of multiple origins of life and
> >> then a period of mixing of genomes among the early metazoans.
> >
> >You mean protozoans or prokaryotes, rather than metazoans.
>
> You are correct, I miswrote there.
>
> >It's the
> >protozoans (unicellular organisms), and in particular the prokaryotes
> >(without a nucleus: the archaea and bacteria), which exchanged genes,
> >possibly quite liberally. This became much more difficult with metazoans
> >(multicellular organisms).
> >
> >As far as a possible multiple origin of life is concerned, we don't have
> >anything beyond speculation. The evidence pointing to multiple lateral
> >gene transfers is no evidence at all for multiple origins of life.
>
> It is evidence that one can't automatically assume a single origin of life.
> Such a mixing would clearly mask any multiple origins of life.

Agreed.

> >Mixing of genes of different origin in the same organism (by means of
> >lateral gene transfer) implies that this organism becomes a hybrid to
> >some extent, and if you want to trace all genes, the phylogenetic tree
> >becomes reticulate.
>
> Yes, and any multiple origins of life would not be easy to detect under
> these assumptions.

Ok.

> >Yet (apart from domain shuffling) this does not
> >imply any reticulation of the individual gene trees (as opposed to the
> >organismal tree): each gene (or more precisely, each functional protein
> >domain) has its own unique descent and originated in a particular
> >species at a particular time.
>
> I disagree with your use of this data.

I am not trying to prove monophyly here - although, on the basis of
other data, I think it is the most likely assumption. Cases of xenologs,
on the other hand, are quite insufficient to make a strong case for
polyphyly.

Each case of xenology contributes to reticulation of the organismal
tree, but not to reticulation of that particular gene tree, since there
are, at the point of lateral gene transfer, not two different sources
for this transferred gene, but only one.

When domain shuffling comes in, on the other hand, a case might occur
where a new gene is generated by genetic recombination involving two
domains, one of which was already resident in that particular organism,
whereas the other one entered it from some other organism as a xenolog.
This produces a hybrid gene having a gene tree with a reticulation.

If the two domains remain intact, i.e. the recombination occurs beyond
the boundaries of functionality of the two domains, both of them might
retain their functionalities, which are, as a rule, different from each
other. Thus, this gives us no indication of the same function being
executed by two independently evolved protein sequences.

If, on the other hand, the recombination cuts out some part of one or
both source domains, a new domain might be formed. Now, the domain from
which a part was snipped will most probably lose its original function,
and if both source domains were shortened, both functions will probably
be lost. Again, this gives us no indication of the same function being
executed by two independently evolved proteins.

An extreme case can be imagined, in which two independently evolved
domains of exactly the same function come together by lateral gene
transfer and are recombined somewhere in the middle of both source
domains, producing an entirely new hybrid domain which, in addition,
reconstitutes the same function each source domain had, to begin with.
This is the only case I can conceive of, where gene tree reticulation
could tell us something about the same function being executed by
independently evolved sequences. Do you think such an almost-miracle is
likely to happen often? Do you know of any published example?

> >And if you look back to the definition of "synonymous families" given
> >above, you see that my claim stands, that "huge numbers" of them are
> >very unlikely.
>
> I don't agree with your assumption. Indeed, experimental evidence wouldn't
> agree.

Which assumption? Which evidence? Ok, give me just one published example
of two protein families, where (1) each family consists of orthologs,
i.e. clearly has a common ancestor, and (2) the two families execute
exactly the same function, and (3) the respective common ancestors of
the two families evolved independently! Instead of families, there could
theoretically be just two single proteins with properties (2) and (3),
but this would make it virtually impossible to show that (3) applies.
But you are not talking of just one example, but of "huge numbers"!

> >Now, this is confusing, Glenn. The relationship between DNA, RNA and
> >proteins is (to a first approximation) coding, transcription and
> >translation. If the code for a multifunctional protein is contained in a
> >gene, of course the resulting protein is multifunctional. The different
> >functionalities usually reside in different domains of the protein.
> >Where did I claim there couldn't be multifunctional proteins? I never
> >believed that.
>
> Well, you erroneously claimed that there were no multifunctional proteins.
> At least that is what you had written. You were wrong. You wrote Sat
> 5/18/02 8:55 that:
>
> > >Agreed - in principle. Yet, if there are any other families (let alone
> > >huge numbers) which will perform the same function (in the same
> > >organismal environment), I find it strange that no such example has been
> > >found to date, as far as I know.

Now tell me why (a) a multifunctional protein, which by definition is
one single protein performing at least two different functions, and (b)
two synonymous proteins, which by definition are very different, because
they evolved independently, but perform one and the same function,
should be the same thing! Case (b) interests us, not case (a). Your
claim is still confusing.

> And then I went out to find them.

You did not at all.

> > In any case,
> >multifunctional proteins perform different functions in the same
> >organism, and if we want to find out anything about the de novo
> >emergence of any one of these functions (information II), we have to
> >look at the family of the domain, in whose (simple) function we are
> >interested, and go back in time to the common ancestor of the domain.
>
> This simply isn't true. Multifunctional proteins, as I posted last night,
> perform different functions IN THE SAME ORGANISM.

That's exactly what I wrote, word for word (apart from capitalizing), in
the second line of my paragraph just above your reply! What are you
trying to say?

> >Multiple-function proteins are irrelevant to the question under
> >investigation, see above.
>
> No they aren't, you said they didn't exist. They are relevant to measure
> your knowledge of the field, and they are relevant to the measurement of
> probability. Besides, as I noted, no one believes, save you, that proteins
> were what arose first.

I just showed you above that I didn't say they didn't exist, and I
explained again why they are irrelevant. Besides, as I noted in my last
post to you, I never claimed proteins arose first.

> >You keep mixing up multiple families and multifunctional proteins.
>
> NO, I am showing that multifunctionality demonstrates that the probability
> of finding a given function in probability space is less than even Yockey
> calculated. Multifunctionality is related to multiple families. If protein X
> does both function A and B and Protein Y does function B and C, then Y is
> multifunctional and is multifamily.
>
> glenn

Multifunctionality demonstrates nothing of the kind. It just shows that
multiple functions can sometimes exist peaceably side by side in the
same protein - and presumably, that this is advantageous in these cases.
Multifunctionality has nothing to do with synonymous families, i.e.
independently evolved families of identical functionality.

Now, in your last sentence, for the first time, you provide a completely
new case. Does this indicate what exactly you meant all along with your
combination of multifunctionality and synonymy? I read this as functions
A, B, C being due to protein domains A, B, C. If not, you would first
have to give some indications about the sequence-function relationships,
and of sequence dissimilarity and gene trees indicating different
origins, before the case could be discussed profitably. In any case, X
and Y would then not be fully synonymous, the part(s) of the molecules
performing the common function B could be influenced by A and C,
respectively, and the relative contributions of the different parts to
the functional information II would be extremely difficult to
disentangle.

For the case of domains (one function - one domain): X = A--B, Y = B--C
(or perhaps better: X = A--B, Y = C--B): these apparently are two
two-domain proteins sharing a domain B, but differing in the other
domain, A or C. They presumably arose by domain shuffling. It may be
presumed (if it is not known) that B has the same (or a similar)
function in both X and Y, whereas the functions of A and C are probably
different. We clearly have multifunctionality in both X and Y, but just
as clearly they are not necessarily synonymous. If X and Y WERE indeed
synonymous as entire proteins, it would be a valid case. But then the
question of synonymy would have to focus on A versus C as domains,
and it would have to be shown that they do not share a common ancestral
domain, since B in both proteins clearly does have the same ancestral
domain. Can you indicate a paper documenting such a case, sufficiently
well researched to answer the above questions?

Peter

-- 
Dr. Peter Ruest, CH-3148 Lanzenhaeusern, Switzerland
<pruest@dplanet.ch> - Biochemistry - Creation and evolution
"..the work which God created to evolve it" (Genesis 2:3)


This archive was generated by hypermail 2b29 : Thu May 23 2002 - 13:51:34 EDT