Re: Evolvability of new functions

From: Tim Ikeda (tikeda@sprintmail.com)
Date: Thu Oct 26 2000 - 00:40:08 EDT


    David Campbell wrote:
    >>> In general, the vast majority of new genes seem to be produced from
    >>> manipulation of existing genes-mixing and matching parts, duplicating
    >>> and then modifying, etc....<<

    Peter replies:
    >>>The origin of a new protein by exon shuffling may also be considered a
    >>>cooptation of a set of preexisting functionalities. This also applies
    >>>to duplicates of genes happening to already possess an initial minimal
    >>>activity of a new kind. In such cases, selection is possible from the
    >>>start. This is microevolution and does not pose any informational
    >>>problems.<

    David:
    >> Micro and macroevolution are defined in too many different ways for me
    >> to be sure what you mean here. Exon shuffling and gene duplication followed
    >> by modification both produce novel information, something some intelligent
    >> design advocates claim is impossible. I don't think you are trying to
    >> support such claims, but I would see the combination of parts from several
    >> different genes followed by selection for a new function as a relatively
    >> substantial innovation.
    ...
    Peter:
    >I agree that this is a relatively substantial innovation. But,
    >nevertheless, I would consider the amount of novel information gained
    >to be relatively small. ("Information" has even more different meanings
    >than "micro-" and "macroevolution"!) I would justify this claim as
    >follows: After a single nucleotide mutation, the mutant and the
    >wild-type are subject to natural selection, whose "answer" to the
    >mutation is "yes" or "no" or something in-between, i.e. at most 1 bit of
    >information. The same consideration applies to any more complex
    >mutation, such as a new gene composed of shuffled exons: as far as
    >natural selection is concerned, the gain of information from the
    >environment is at most 1 bit. If this seems counter-intuitive, we must
    >ask whether this new construct was produced in a single step, such as an
    >unequal crossing-over. If yes, then it was a simple step, like a simple
    >mutation or deletion. If it required a series of coordinated steps, the
    >intermediates in this path probably were not under any selection, and
    >the probability of end product formation may have been extremely small.

    Hmm... Sounds like Lee Spetner...
    There is a very serious difficulty, IMO (and others' - consult earlier
    discussions in sci.bio.info-theory), in relating sequence and structural
    information measurements with metrics derived from selection. For example,
    a single point mutation can allow a bacterium to survive on one growth
    medium where it couldn't previously. What could that point mutation have
    done? Is there only one bit of change involved?

    It could have changed one amino acid to another in a protein. What is
    the net change in the information content of the protein?

    It could have erased a stop codon, permitting expression of a longer
    protein. How many bits of information are in the longer sequence?

    Or, the mutation could have wiped out a promoter, preventing the
    expression of the protein. Is that information change positive or
    negative with respect to the protein?

    Or, the mutation could have generated a new splice site -- How much
    information change in the resultant protein?

    Or, the mutation could have replaced a proline with an aspartate,
    taking the break out of an alpha-helix. What's the difference
    in information content?

    These cases are not readily quantifiable. The question is: With
    respect to what is the information metric derived? Sure, the
    difference between my having one or two hundred-dollar bills in
    my pocket may represent an informational difference of one bit,
    but I can do a heck of a lot more with two of those bills than I
    can with one.
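
    To make the stop-codon case concrete, here is a minimal Python sketch.
    The toy DNA sequence is hypothetical, and counting log2(20) bits per
    residue is only one crude, assumed way of measuring sequence information
    (an upper bound that ignores structure and function), not a metric
    anyone in this thread has endorsed. It shows a single-base change
    lengthening the translated product, so a sequence-based count moves by
    much more than the one bit assigned to a yes/no selection outcome.

```python
import math

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def max_bits(protein):
    """Crude upper bound (an assumption): log2(20) bits per residue."""
    return len(protein) * math.log2(20)


# Hypothetical toy gene: ATG AAA TAA GGT GAA TGA
wild_type = "ATGAAATAAGGTGAATGA"
# One T->C substitution turns the internal stop codon TAA into CAA (Gln).
mutant = wild_type[:6] + "C" + wild_type[7:]

for name, dna in (("wild type", wild_type), ("mutant", mutant)):
    product = translate(dna)
    print(f"{name}: {product!r}, {len(product)} residues, "
          f"at most {max_bits(product):.1f} bits")
```

    Under a different metric (say, one tied to function rather than
    length), the same mutation could just as easily score as a loss, which
    is the difficulty raised above.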

    Peter:
    > But to assume that ALL functionalities emerged in such a manner,
    > without any non-selectable intermediates, is entirely speculative. How
    > do you know this is "the vast majority" of genes? You yourself concede
    > that the origin of "the first gene" is not dealt with. There are an
    > estimated 1000 different protein folds (each grouping a series of protein
    > families or superfamilies) in the biosphere, considering the globular,
    > water-soluble proteins only (Y.I.Wolf, N.V.Grishin, E.V.Koonin, "Estimating
    > the number of protein folds and families from complete genome data",
    > J.Mol.Biol. 299 (2000), 897-905). Almost by definition, these 1000 folds
    > are not related to each other by exon shuffling and gene duplication.

    I think that may be hard to tell. For example, alpha-helices can move and
    be rearranged by recombination and duplication. I think some porins and
    other transmembrane proteins have likely arisen from events such as these.

    > Each one of them had to originate somewhere at least once during the past
    > 3.8 billion years. Thus, it would be more realistic to talk about "the
    > first 1000 genes" whose emergence cannot be accounted for at present.
    > These are the cases I am considering when I talk about a mutational
    > random walk without intermediate selection until a minimal selectable
    > activity happens to be produced. These are cases I consider
    > macroevolutionary steps posing considerable informational problems
    > deserving careful attempts at estimating their probability and at
    > possibly finding more realistic evolutionary scenarios than merely
    > assuming that "it must have happened somehow" through selectable
    > intermediates. You may call these the most elementary cases of Behe's
    > "irreducibly complex systems" - whose non-existence has not yet been
    > made plausible.

    One thing about the "first 1000 folds" (I think fewer perhaps, but never mind)
    is that they seem to be common to all the major divisions of life. I'm not
    sure how to peer behind the curtain of 2-3 billion years ago when the
    major divisions appear to have split. However, one thing that comes to mind
    is that horizontal transfer may have been a major factor in early life
    (which may account for the relatedness between groups). With horizontal
    transfer, the pool is a little bigger and testing may go somewhat faster.

    One other thing you've brought up previously was the suggestion that
    the different protein families may represent local optima for possible
    (or viable?) structures. Those regions of "evolutionary stability"
    may be attractors for structural convergence. I'm not sure what may
    represent the first steps toward these stable regions, but is it possible
    that once these steps begin, some convergence to a stable form would
    occur?

    David:
    >> Obviously, examining every known gene sequence to determine the relative
    >> frequency of gene duplication, exon shuffling, and the like is not
    >> feasible. However, the general pattern that emerges is that, as one
    >> examines a gene, one finds related genes with different functions. If
    >> there are 1000 truly novel genes, that is still a lot less than the
    >> total number of genes in humans, for example.
    >> I did not mean to imply that all functions evolved by duplication and
    >> modification of existing genes, but rather that it was extremely common.

    Peter:
    > If each selected mutational step adds 1 bit of information from the
    > environment to a genome, the biosphere can collect quite a lot of
    > information from the environment. But how about the "truly novel genes"?

    The counter you're making seems to be that, in instances where it is
    clear that a modified sequence gives rise to a function which it didn't
    possess before, these aren't truly novel genes but an un-novel
    mixing of old ones. I wonder what "truly novel genes" one would expect
    to find from duplication, recombination, mutation and deletion of
    _previously existing_ sequences? To what does "novelty" apply: the
    new function, the new arrangement of DNA sequences, or the _ultimate_
    origin of the sequences from which the components of the new
    function were derived? Because it's clear that new functions can and do
    arise from biochemical mechanisms which we have observed. Given that
    sequences do tend to fall into families (with vertical and sometimes
    horizontal linkages) in which many members can exhibit different
    functions, this suggests (to me, at least), that much of the variation
    seen can be understood by descent with modification rather than by
    "spontaneous insertion".

    > Their minimally active form must have arisen by truly random-walk
    > mutagenesis. Of which type of information - step-by-step selected or
    > random-walk generated - is there more in the biosphere? I think we don't
    > know. But what I am getting at is the challenge of the random-walk type.
    > Even if this concerns only a few percent of all existing genes, it poses
    > a big problem, as darwinian evolution cannot be invoked. Don't you think
    > so?

    This is certainly an interesting problem. It's also related to what
    "minimal" activity is, which is a relative question. Is the fitness
    topology absolutely flat between peaks in most areas or not? This
    is very difficult to determine. No studies of possible mutational
    variation can be exhaustive, and we're kidding ourselves if we think that
    going through a few thousand or a few million variations of an existing
    sequence will tell us what we need to know about the possible activity
    of the ur-protein, especially if we may not be certain of the original
    function and context of the original sequence.
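
    For a rough sense of scale on why such surveys cannot be exhaustive,
    here is a small Python sketch. The 100-residue length and the
    million-variant survey size are assumed figures chosen purely for
    illustration; the only inputs are the 20 standard amino acids and
    simple counting.

```python
import math

N_AMINO_ACIDS = 20


def variants_with_k_substitutions(length, k):
    """Sequences differing from a reference at exactly k positions."""
    return math.comb(length, k) * (N_AMINO_ACIDS - 1) ** k


def total_sequences(length):
    """All possible sequences of the given length."""
    return N_AMINO_ACIDS ** length


length = 100          # hypothetical protein length
surveyed = 10 ** 6    # hypothetical number of variants tested

for k in (1, 2, 3):
    count = variants_with_k_substitutions(length, k)
    print(f"exactly {k} substitution(s): {count:,} variants")

fraction = surveyed / total_sequences(length)
print(f"a survey of {surveyed:,} variants covers about {fraction:.1e} "
      f"of all {length}-residue sequences")
```

    Even the double mutants of one such sequence already outnumber a
    million-variant survey, and that neighbourhood is still centred on the
    existing sequence rather than on whatever the ur-protein was.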

    David:
    >> The example of a pseudogene reactivated, discussed in other posts, would
    >> be a case of passing through unselected "random" intermediates before
    >> arriving at a useful function.

    Peter:
    > Yes, and this is exactly one of the interesting cases. Do you know of
    > any case where such a path via unselected intermediates has been
    > documented in a real biological system, not just stated as a general
    > hypothesis? I am eager to find such cases!

    Hmm... I recall a series of papers by Daniel Dykhuizen (and Dan Hartl) in
    the late-'80s & early '90s about natural variation in genes of the lactose
    operon in bacteria (I think E. coli but possibly S. typhimurium). Using
    metabolic modelling and competition experiments in chemostats they showed
    that although natural variations in the lac permease and beta-galactosidase
    sequences were often effectively neutral with respect to growth on
    lactose, there were real differences to be found during growth on other
    carbohydrates. In modelling the pathway, these relationships correlated
    with the measured activities of the enzyme variants. So, in the absence
    of these alternate
    carbon sources (some of which you wouldn't expect the bacterium to see
    often), the lac system could tolerate variation with little effect on the
    net metabolic flux. Thus under "normal" conditions some intermediates
    appeared to have escaped selection... at least until conditions changed
    (different growth environments) when suddenly those variants which
    arose from previously unselected intermediates became fixed via selection.

    There is another interesting case related to the "most elementary cases
    of Behe's 'irreducibly complex systems'". This is a little off the main
    topic of protein origins, but I think an elementary case can be found in
    the evolution of streptomycin resistance. It's been known that some
    mutations which give rise to streptomycin resistance can reduce the growth
    rates of the bacteria relative to "wild-type" strains on media without
    the antibiotic. So it was thought that if the selective pressure of
    streptomycin resistance was removed the resistant strains would eventually
    become less common in the environment. But studies showed that these
    resistant strains persisted, even though they had not encountered
    streptomycin in a long while. It turned out that these strains had
    acquired a second mutation which suppressed the problems of carrying
    streptomycin resistance. When either of these mutations was carried
    alone (in strains with either the streptomycin resistance gene or the
    suppressor gene, but not both), growth was slower than in strains
    lacking both mutant genes. When both were present the strains grew as
    well as those lacking both genes. This represents an "elementary" IC system:
    a strain lacking one of the two mutations could not compete against the
    wild-type in normal growth -- both mutations were necessary. Interestingly,
    this system arose in much the way that one would expect an IC system
    to evolve: indirectly, through steps of selection under conditions that
    were not the same as where the system finally emerged.
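
    The logic of that two-step route can be sketched with a toy fitness
    table in Python. Every number below is made up for illustration (no
    growth rates are quoted above), and the assumption that the double
    mutant also does best in the presence of the drug is mine. The sketch
    only checks which single-mutation steps are favoured in each
    environment.

```python
# Genotype = (has_resistance_mutation, has_suppressor_mutation).
# Relative growth rates; all values are hypothetical.
FITNESS = {
    "no drug": {
        (False, False): 1.00,  # wild type
        (True, False): 0.85,   # resistance alone: slower
        (False, True): 0.90,   # suppressor alone: slower
        (True, True): 1.00,    # both: as good as wild type
    },
    "streptomycin": {
        (False, False): 0.00,  # sensitive strains do not grow
        (True, False): 0.85,
        (False, True): 0.00,
        (True, True): 1.00,
    },
}


def single_step_neighbors(genotype):
    """Genotypes reachable by gaining or losing one of the two mutations."""
    resistant, suppressed = genotype
    return [(not resistant, suppressed), (resistant, not suppressed)]


def favored_steps(genotype, environment):
    """Single steps that strictly increase fitness in this environment."""
    table = FITNESS[environment]
    return [n for n in single_step_neighbors(genotype)
            if table[n] > table[genotype]]


wild_type, resistant_only, double_mutant = (False, False), (True, False), (True, True)

print("wild type, no drug:      ", favored_steps(wild_type, "no drug"))
print("wild type, streptomycin: ", favored_steps(wild_type, "streptomycin"))
print("resistant, no drug:      ", favored_steps(resistant_only, "no drug"))
print("double mutant, no drug:  ", favored_steps(double_mutant, "no drug"))
```

    With these assumed values, no step out of the wild type is favoured
    without the drug, the resistance step is favoured with it, and once the
    suppressor is added neither mutation can be lost singly without a
    growth penalty.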

    [...]
    Regards,
    Tim Ikeda
    tikeda@sprintmail.com


