Evolvability of new functions

From: pruest@pop.dplanet.ch
Date: Sat Oct 28 2000 - 15:04:08 EDT


    Hi Tim,
    thank you for your extensive comments!

    Tim Ikeda <tikeda@sprintmail.com> wrote:
    >
    > David Campbell wrote:
    > >> I would see the combination of parts from several
    > >> different genes followed by selection for a new function as a relatively
    > >> substantial innovation.
    > ...
    > Peter:
    > >I agree that this is a relatively substantial innovation. But,
    > >nevertheless, I would consider the amount of novel information gained
    > >to be relatively small. ("Information" has even more different meanings
    > >than "micro-" and "macroevolution"!) I would justify this claim as
    > >follows: After a single nucleotide mutation, the mutant and the
    > >wild-type are subject to natural selection, whose "answer" to the
    > >mutation is "yes" or "no" or something in-between, i.e. at most 1 bit of
    > >information. The same consideration applies to any more complex
    > >mutation, such as a new gene composed of shuffled exons: as far as
    > >natural selection is concerned, the gain of information from the
    > >environment is at most 1 bit. If this seems counter-intuitive, we must
    > >ask whether this new construct was produced in a single step, such as an
    > >unequal crossing-over. If yes, then it was a simple step, like a simple
    > >mutation or deletion. If it required a series of coordinated steps, the
    > >intermediates in this path probably were not under any selection, and
    > >the probability of end product formation may have been extremely small.
    >
    > Hmm... Sounds like Lee Spetner...

    Never heard of Spetner...

    > There is a very serious difficulty, IMO (and others' - consult earlier
    > discussions in sci.bio.info-theory), in relating sequence and structural
    > information measurements with metrics derived from selection. For example,

    I fully agree that a metric derived from selection cannot be used for
    estimating sequence and structural information. But this is NOT what I
    am doing. What I call functional or semantic information, given by
    sequence / structure in a given environment, cannot be measured in any
    way I know of. The closest we can get is what H.P.Yockey did
    ("Information theory and molecular biology" (Cambridge: Cambridge
    Univ.Press, 1992), p.254) for the presumptive functional information
    contained in a modern protein (family). But even this doesn't tell us
    how this information arose. Presumably, the earliest structures
    displaying this function were very much simpler. The only source for
    functional information in biological systems we know of is the
    environment acting in natural selection. Each event of fixation of a
    genetic change of any type and size is at most a yes/no answer: at most
    1 bit of information.
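
    The "at most 1 bit" bound for a yes/no answer can be checked with the
    standard Shannon entropy of a binary outcome. The sketch below is an
    illustration only: the function name and the sample probabilities are
    assumptions, not figures from this discussion.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a yes/no outcome with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

# Hypothetical fixation probabilities, chosen only for illustration.
# The maximum of 1 bit is reached only when both answers are equally likely.
for p in (0.5, 0.1, 0.01):
    print(f"P(yes) = {p}: {binary_entropy(p):.3f} bits")
```

    Whatever the size of the genetic change, the entropy of the binary
    outcome never exceeds 1 bit, and it is lower when one answer is much
    more likely than the other.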

    But natural selection can only test a functional feature already present
    to some minimal degree. If we consider the entire historical
    developmental path of a functionality (e.g. an enzyme), including all of
    the functional information contained in it, its specific activity must
    have started sometime with a minimal amount of activity just sufficient
    to make it selectable. And before that? This is the interesting part of
    its history, because without selection, we can estimate a probability of
    random emergence. Afterwards, normal darwinian evolution sets in, and I
    see no way of estimating probabilities. There may be many other critical
    points in the evolution of a new function, but this is certainly the
    first one of them - and it is habitually ignored by evolutionary
    biologists.

    > a single point mutation can allow a bacterium to survive on one growth
    > medium where it couldn't previously. What could that point mutation have
    > done? Is there only one bit of change involved?
    >
    > It could have changed one amino acid to another in a protein. What is
    > the net change in the information content of the protein?
    >
    > It could have erased a stop codon, permitting expression of a longer
    > protein. How many bits of information are in the longer sequence?
    >
    > Or, the mutation could have wiped out a promoter, preventing the
    > expression of the protein. Is that information change positive or
    > negative with respect to the protein?
    >
    > Or, the mutation could have generated a new splice site -- How much
    > information change in the resultant protein?
    >
    > Or, the mutation could have replaced a proline with an aspartate,
    > taking the break out of an alpha-helix. What's the difference
    > in information content?
    >
    > These cases are not readily quantifiable. The question is: With
    > respect to what is the information metric derived? Sure, the
    > difference between my having one or two hundred-dollar bills in
    > my pocket may represent an informational difference of one bit,
    > but I can do a heck of a lot more with two of those bills than I
    > can with one.

    What is not quantifiable here is the amount of functional information
    acquired by the system in its entire history. I was only considering the
    last step of selection - which yields at most one bit of additional
    information, no matter what type of change this last step represented.
    The only reason I brought it up at all is because natural selection is
    the only natural source of biological information we know of. Of course,
    the probabilities of the different types of changes which might have
    produced the new function may be very different, and are usually not
    estimable. Even if this last step alone produced a new function never
    before found in the biosphere, the functional properties of the new
    protein are certainly a consequence of the sequence / function
    properties of its precursor(s). I would not consider this to be nothing,
    even if it didn't display any of the new function at all, because it
    represents a very specific prerequisite for the new function: you cannot
    splice together any two odd sequences and obtain a specific function
    required at the moment.

    > Peter:
    > > But to assume that ALL functionalities emerged in such a manner,
    > > without any non-selectable intermediates, is entirely speculative. How
    > > do you know this is "the vast majority" of genes? You yourself concede
    > > that the origin of "the first gene" is not dealt with. There are an
    > > estimated 1000 different protein folds (each grouping a series of protein
    > > families or superfamilies) in the biosphere, considering the globular,
    > > water-soluble proteins only (Y.I.Wolf, N.V.Grishin, E.V.Koonin, "Estimating
    > > the number of protein folds and families from complete genome data",
    > > J.Mol.Biol. 299 (2000), 897-905). Almost by definition, these 1000 folds
    > > are not related to each other by exon shuffling and gene duplication.
    >
    > I think that may be hard to tell. For example, alpha-helices can move and
    > be rearranged by recombination and duplication. I think some porins and
    > other transmembrane proteins have likely arisen from events such as these.

    Ok, I reduce my claim by adding "usually".
     
    > > Each one of them had to originate somewhere at least once during the
    > > past 3.8 billion years. Thus, it would be more realistic to talk about
    > > "the first 1000 genes" whose emergence cannot be accounted for at
    > > present. These are the cases I am considering when I talk about a
    > > mutational random walk without intermediate selection until a minimal
    > > selectable activity happens to be produced. These are cases I consider
    > > macroevolutionary steps posing considerable informational problems
    > > deserving careful attempts at estimating their probability and at
    > > possibly finding more realistic evolutionary scenarios than merely
    > > assuming that "it must have happened somehow" through selectable
    > > intermediates. You may call these the most elementary cases of Behe's
    > > "irreducibly complex systems" - whose non-existence has not yet been
    > > made plausible.
    >
    > One thing about the "first 1000 folds" (I think fewer perhaps, but never mind),
    > is that they seem to be common to all the major divisions of life. I'm not
    > sure how to peer behind the curtain of 2-3 billion years ago when the
    > major divisions appear to have split. However, one thing that comes to mind
    > is that horizontal transfer may have been a major factor in early life
    > (which may account for the relatedness between groups). With horizontal
    > transfer, the pool is a little bigger and testing may go somewhat faster.
     
    Wolf et al.'s estimate of 1000 different folds refers to the entire
    biosphere; horizontal transfers are already taken into consideration.

    > One other thing you've brought up previously was the suggestion that
    > the different protein families may represent local optima for possible
    > (or viable?) structures.

    I don't recall ... Was it the structural requirements for a compact,
    stable fold, in addition to the functional requirements for catalysis?
    This was an argument against assuming small peptides could serve as
    viable proteins.

    > Those regions of "evolutionary stability"
    > may be attractors for structural convergence. I'm not sure what may
    > represent the first steps toward these stable regions, but is it possible
    > that once these steps begin, some convergence to a stable form would
    > occur?

    What do you mean by "attractors for structural convergence"? Chaotic
    attractors? Selection peaks of a fitness surface in parameter space? I
    don't see the connection with the problem of finding the first minimal
    activity for a given function. At those points, by definition, the
    fitness surface is absolutely flat: nothing is selectable, we can only
    have random walks. Once the selectable steps begin, of course, normal
    darwinian evolution is possible. What do you mean by "evolutionary
    stability" in this context?

    > David:
    > >> Obviously, examining every known gene sequence to determine the
    > >> relative frequency of gene duplication, exon shuffling, and the like
    > >> is not feasible. However, the general pattern that emerges is that,
    > >> as one examines a gene, one finds related genes with different
    > >> functions. If there are 1000 truly novel genes, that is still a lot
    > >> less than the total number of genes in humans, for example.
    > >> I did not mean to imply that all functions evolved by duplication and
    > >> modification of existing genes, but rather that it was extremely common.
    > Peter:
    > > If each selected mutational step adds 1 bit of information from the
    > > environment to a genome, the biosphere can collect quite a lot of
    > > information from the environment. But how about the "truly novel genes"?
    >
    > The counter you're making seems to be that in instances where it is
    > clear that a modified sequence gives rise to a function which it didn't
    > possess before, that these aren't truly novel genes but an un-novel
    > mixing of old ones.

    Not quite. What I call a truly novel gene is one whose function has
    never before existed in the entire biosphere, no matter what led to the
    last step which originated the first minimal amount of the new activity.
    If it is a mixing of old genes, the new gene may display a combination
    of the old functions (whose novelty is a matter of definition, but these
    cases need not concern us here), or possibly (but very unlikely)
    something entirely new, while the old functions no longer exist (perhaps
    due to clipping). For a reasonable discussion of such a possibility, we
    should have actual examples where this happened.

    Maybe I should distinguish between (1) the emergence of one of Wolf et
    al.'s 1000 folds and (2) a novel function whose initial emergence
    required 2 or more changes (mutations, shufflings, ...) going through
    non-selectable intermediates. I just assumed that cases of (2) are most
    likely to be found among (1). But this doesn't imply that each (1) must
    be a (2), or that each (2) leads to a new (1).

    > I wonder what "truly novel genes" one would expect
    > to find from duplication, recombination, mutation and deletion of
    > _previously existing_ sequences? To what does "novelty" apply: the
    > new function, the new arrangement of DNA sequences, or the _ultimate_
    > original origin of the sequences from which the components of the new
    > function were derived?

    Novelty applies to the biological function having never existed before
    in the entire biosphere. There might be many different ways in which
    novelty may emerge, but the easiest conceivable way (IMO) is a sequence
    of point mutations in a gene duplicate (possibly in a pseudogene state)
    leading to a minimal combination of specific amino acid occupations
    defining a new active site in the protein product. In order to bypass,
    for the moment, difficulties with the definition of the amount of
    functional information, we'd better not begin with cases where some
    previous function is incorporated into the new one.

    > Because it's clear that new functions can and do
    > arise from biochemical mechanisms which we have observed.

    Are there known cases which fit my definition of novelty?

    > Given that
    > sequences do tend to fall into families (with vertical and sometimes
    > horizontal linkages) in which many members can exhibit different
    > functions, this suggests (to me, at least), that much of the variation
    > seen can be understood by descent with modification rather than by
    > "spontaneous insertion".

    This is the reason why I concentrate on folds (i.e. sequences without
    any recognizable homology), rather than families. What do you mean by
    "spontaneous insertion"?
     
    > > Their minimally active form must have arisen by truly random-walk
    > > mutagenesis. Of which type of information - step-by-step selected or
    > > random-walk generated - is there more in the biosphere? I think we don't
    > > know. But what I am getting at is the challenge of the random-walk type.
    > > Even if this concerns only a few percent of all existing genes, it poses
    > > a big problem, as darwinian evolution cannot be invoked. Don't you think
    > > so?
    >
    > This is certainly an interesting problem. It's also related to what
    > "minimal" activity is, which is a relative question. Is the fitness
    > topology absolutely flat between peaks in most areas or not? This
    > is very difficult to determine. No studies of possible mutational
    > variation can be exhaustive and we're kidding ourselves if going through
    > a few thousand or a few million variations of an existing sequence
    > will tell us what we need to know about the possible activity of the
    > ur-protein, especially if we may not be certain of the original
    > function and context of the original sequence.

    I agree, and I never intended to approach the problem in this way. I
    define "minimal" activity by an absolutely flat fitness topology some
    distance away from where the fitness starts to go up (cf. the above
    definition of "novelty"). Of course we don't know any ur-protein
    sequence. At best we might approach a last-common-ancestor sequence. But
    the minimal protein for a given function presumably is still much
    simpler. It is much more hypothetical, too. But we may estimate the
    probability of emergence for a given number of specific amino acid
    occupations. According to my model estimate, this number cannot be
    higher than 2 (my post of 22 Sep 2000). We could then compare this
    number with the known invariances found in protein families, and
    possibly folds, certainly much higher than 2.
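
    To give a feel for the scale of such an estimate, here is a minimal
    sketch of the chance that k given positions are all occupied by their
    required amino acid. It is not the model from the 22 Sep post: it
    assumes, for illustration only, 20 equally likely amino acids at
    independent positions and a single trial, and it ignores codon
    degeneracy, population size and the number of generations. The function
    names are hypothetical.

```python
import math

AMINO_ACIDS = 20  # assumption: all 20 amino acids equally likely at each site

def occupation_probability(k: int) -> float:
    """Chance that k given positions all hold their required amino acid."""
    return 1.0 / (AMINO_ACIDS ** k)

def occupation_bits(k: int) -> float:
    """The same quantity expressed as information: k * log2(20) bits."""
    return k * math.log2(AMINO_ACIDS)

for k in (1, 2, 5, 10):
    print(f"k = {k:2d}: p = {occupation_probability(k):.3e}, "
          f"{occupation_bits(k):.2f} bits")
```

    Under these assumptions each additional specified position costs about
    4.3 bits, i.e. a further factor of 20 in the number of random trials
    needed on average.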
     
    > David:
    > >> The example of a pseudogene reactivated, discussed in other posts, would
    > >> be a case of passing through unselected "random" intermediates before
    > >> arriving at a useful function.
    >
    > Peter:
    > > Yes, and this is exactly one of the interesting cases. Do you know of
    > > any case where such a path via unselected intermediates has been
    > > documented in a real biological system, not just stated as a general
    > > hypothesis? I am eager to find such cases!
    >
    > Hmm... I recall a series of papers by Daniel Dykhuizen (and Dan Hartl) in
    > the late-'80s & early '90s about natural variation in genes of the lactose
    > operon in bacteria (I think E. coli but possibly S. typhimurium). Using
    > metabolic modelling and competition experiments in chemostats they showed
    > that although natural variations in the lac permease and b-galactosidase
    > sequences were often effectively neutral with respect to growth on
    > lactose, there were real differences to be found during growth on other
    > carbohydrates. In
    > modelling the pathway, these relationships correlated with the measured
    > activities of the enzyme variants. So, in the absence of these alternate
    > carbon sources (some of which you wouldn't expect the bacterium to see
    > often), the lac system could tolerate variation with little effect on the
    > net metabolic flux. Thus under "normal" conditions some intermediates
    > appeared to have escaped selection... at least until conditions changed
    > (different growth environments) when suddenly those variants which
    > arose from previously unselected intermediates became fixed via selection.

    I haven't searched for this paper. But I happen to have a copy of
    B.G.Hall & H.S.Malik, "Determining the evolutionary potential of a
    gene", Mol.Biol.Evol. 15 (1998), 1055. They analyzed a cryptic E.coli
    beta-galactosidase ebgA ("evolved b-galactosidase"). In the absence of
    the normal (paralogous) lacZ b-galactosidase, ebgA can be used, and
    after 2 specific mutations it works as efficiently as lacZ. Why does it
    exist? 25 years earlier, ebgA had been thought to represent a newly
    evolved function. Now, a phylogenetic tree of 14 b-galactosidases
    indicates that the separation between ebgA and lacZ must have occurred
    more than 2.2 billion years ago. Apparently, an occasional use for ebgA
    ensured its persistence during this time. The same may be true of
    Dykhuizen & Hartl's cryptic enzymes. Such cases, therefore, don't
    provide clear evidence for evolution of a new function by a random
    mutational walk.

    > There is another interesting case related to the "most elementary cases
    > of Behe's 'irreducibly complex systems'". This is a little off the main
    > topic of protein origins, but I think an elementary case can be found in
    > the evolution of streptomycin resistance. It's been known that some
    > mutations which give rise to streptomycin resistance can reduce the growth
    > rates of the bacteria relative to "wild-type" strains on media without
    > the antibiotic. So it was thought that if the selective pressure of
    > streptomycin resistance was removed the resistant strains would eventually
    > become less common in the environment. But studies showed that these
    > resistant strains persisted, even though they had not encountered
    > streptomycin in a long while. It turned out that these strains had
    > acquired a second mutation which suppressed the problems of carrying
    > streptomycin resistance. When either of these mutations was carried in
    > separate strains (strains with either the streptomycin resistance gene
    > or the suppressor gene), growth was slower, compared to strains without
    > both mutant genes. When both were present the strains grew as well as
    > those lacking both genes. This represents an "elementary" IC system:
    > a strain lacking one of the two mutations could not compete against the
    > wild-type in normal growth -- both mutations were necessary. Interestingly,
    > this system arose in much the way that one would expect an IC system
    > to evolve: indirectly, through steps of selection under conditions that
    > were not the same as where the system finally emerged.

    If I remember correctly, streptomycin resistance occurs by a ribosomal
    mutation. It is to be expected that, in the absence of the antibiotic,
    the mutant would be worse off than the wild-type; and it would not be
    surprising if in the presence of the antibiotic, the mutant would be
    under some selective pressure to get another mutation elsewhere that
    would mitigate the damage done by the first one, without eliminating the
    protection from streptomycin. If this should turn out to be the case, it
    would not constitute an IC system, as each mutation can be selected by
    itself and the intermediate is viable.

    > Regards,
    > Peter Ruest
    > pruest@dplanet.ch


