Re: Evolution of proteins in sequence space

From: Lawrence Johnston (johnston@uidaho.edu)
Date: Fri Aug 03 2001 - 10:19:20 EDT

Next message: bivalve: "Re: Wheel of God"

Previous message: george murphy: "Re: Wheel of God"
In reply to: pruest@pop.dplanet.ch: "Evolution of proteins in sequence space"
Next in thread: george murphy: "Atoms and God (was Re: Evolution of proteins in sequence space)"
Next in thread: pruest@pop.dplanet.ch: "Evolution of proteins in sequence space"
Next in thread: Howard J. Van Till: "Re: Evolution of proteins in sequence space"
Reply: george murphy: "Atoms and God (was Re: Evolution of proteins in sequence space)"

Peter - Thanks x 10^6 for that beautiful analysis of our situation in sequence space. It
looks to me like this leaves us with two options:

1. We adopt Van Till's hypothesis of ultra-smart atoms, or 2. we assume that Someone has been
injecting huge amounts of information into the Universe from outside. Other options?

Peter Ruest said:

> Proteins may evolve in two basically different modes. One mode is by a
> sequence of point mutations. The other mode is by genetic recombination
> of preexisting modules or fragments. (Let me ignore deletions which
> presumably are deleterious in the vast majority of cases - except
> perhaps for some occasional deletions of entire codons.) Each of the new
> sequences produced must then be accepted (and fixed in the population)
> by natural selection or by random drift (if it is lost, it does not
> contribute to evolution). Novel sequence information is generated only in
> the first case, a series of several point mutations.
>
> Basically, any sequence within the transastronomically huge
> combinatorial space of the 20^L possible sequences of proteins of length
> L would be accessible during evolution, if there is a mutational path
> which leads from an existing sequence to the target considered and which
> does not contain any intermediates which are selected against (or even
> lethal). In order to evaluate this mechanism of evolution and the
> probability of its success, we should have an idea about the frequency
> of useful sequences in sequence space. This information has been
> missing, but now some indications about it are available.
>
> Keefe A.D., Szostak J.W., "Functional proteins from a random-sequence
> library", Nature 410 (2001), 715-718, generated a library of 6x10^12
> proteins, each containing 80 contiguous random amino acids, and enriched
> those proteins that bound to ATP. They found four new families of
> ATP-binding proteins unrelated to each other and unrelated to the
> natural ones. The selectively enriched substitutions were distributed
> over 62 of the 80 randomized amino acids, and a core domain of 45 amino
> acids sufficient for ATP-binding was defined. Keefe et al. estimated
> that roughly 1 in 10^11 of all random-sequence proteins have ATP-binding
> activity.
>
> Silverman J.A., Balakrishnan R., Harbury P.B., "Reverse engineering the
> (β/α)8 barrel fold", Proceedings of the National Academy of
> Sciences USA 98 (2001), 3092-3097, analyzed the most commonly occurring
> fold among protein catalysts, the TIM (triosephosphate isomerase) barrel
> consisting of 8 analogous units of beta strand, loop, alpha helix, and
> turn, which together form a barrel accommodating a variable active site,
> used in a large family of different enzymes. Silverman et al. applied
> combinatorial mutagenesis of 182 amino acid positions in the barrel and
> functional selection for TIM activity in E. coli, requiring a minimal
> threshold of 10^-4 of wild-type activity. They estimate that fewer than
> 1 in 10^10 of the sequences in their degenerate library are able to
> complement in vivo.
>
> Thus, the two estimates agree quite well, even though they are derived
> in very different ways. In protein sequence space, fewer (by how
> much?) than 1 in 10^10 sequences is a triosephosphate isomerase enzyme,
> and roughly 1 in 10^11 binds ATP, which is a partial activity of many
> enzymes.
>
> As the human genome contains an estimated 30,000 genes, and the number
> of different protein folds is estimated to be a few thousand, we may, as
> a very rough approximation, assume that there are fewer than 10^4
> basically different protein families in the biosphere, within each of
> which a number of similar proteins can be derived from each other by
> feasible evolutionary paths.
>
> The question is whether all of the 10^4 different protein families can
> be derived in this way from one or very few initial sequences, or only by
> random mutational walks. If a novel enzyme or other functional protein
> is to arise, which is not easily derivable by a few selected mutations
> from an already existing one, we need a mutational random walk. The
> probability of finding any sequence with the activity required is about
> 10^-11. If, at a given moment in the evolution of a species, any one of
> 10^4 different novel activities will prove advantageous, the probability
> of finding any such sequence is about 10^-7.
>
> These estimates assume that directed evolution in the lab is a valid
> model for natural evolution. Of course, this is not the case, as in
> directed evolution one does not have to bother about the viability of
> each intermediate organism in a linear sequence of point mutations, but
> only about the isolated activity of a new protein sequence after several
> or many mutations. Directed evolution jumps around in sequence space,
> whereas natural evolution is limited to single-step paths, and none of
> these steps may go downhill on the fitness surface.
>
> How, then, is it possible that any one of the 10^3 or 10^4 basically
> different protein folds (families) arose (anywhere in the biosphere),
> let alone all of them? If 10^3 different searches were needed, each with
> a success probability of around 10^-10, it would seem a hopeless proposition.
> (And the few million years available for the formation of the first
> viable organism appear transastronomically inadequate.)
>
> The only way out seems to be to claim that all of the different
> protein families used in the biosphere are
> intimately connected in sequence space, such that simple linear
> sequences of point mutations, with all intermediates naturally selected,
> will do for all proteins. In this case, more than 99.999999999% (eleven
> nines altogether) of sequence space is barren for life and was never
> visited by any sequence during evolution. Whether this is a feasible
> proposition will have to be shown experimentally.
>
> This still leaves us with the mystery of the origin of the first living
> organism capable of natural evolution.
>
> But the very interesting finding of the two papers mentioned is that the
> protein sequence space is extremely sparsely populated with useful
> sequences. This makes evolution (which, for theological reasons, I
> believe has happened) an astonishingly marvellous process.
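
For readers who want to check the size of the numbers quoted above, here is a minimal Python sketch of the sequence-space arithmetic. It assumes, for illustration only, 20 standard amino acids with every position free to vary, and it takes the quoted figures (an 80-residue random region, a library of 6 x 10^12 proteins, roughly 1 ATP binder per 10^11 random sequences) at face value. The constant and function names are hypothetical.

```python
# Rough sequence-space arithmetic for the figures quoted above.
# Assumptions (illustration only): 20 standard amino acids, every
# position free to vary, and the quoted numbers taken at face value.

AMINO_ACIDS = 20          # size of the amino-acid alphabet (assumed)
RANDOM_LENGTH = 80        # randomized positions per protein (Keefe & Szostak)
LIBRARY_SIZE = 6e12       # proteins in their library
ATP_BINDER_FREQ = 1e-11   # their estimated fraction of ATP binders


def sequence_space_size(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences of a given length: alphabet ** length."""
    return alphabet ** length


space = sequence_space_size(RANDOM_LENGTH)          # exact integer
coverage = LIBRARY_SIZE / space                     # fraction of space sampled
expected_binders = LIBRARY_SIZE * ATP_BINDER_FREQ   # implied by the estimate

print(f"20^{RANDOM_LENGTH} = {space:.3e} possible sequences")
print(f"the library samples about {coverage:.1e} of that space")
print(f"implied ATP binders in the library: {expected_binders:.0f}")
```

Python integers are arbitrary-precision, so 20**80 is computed exactly before it is formatted for printing.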
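
The step from about 10^-11 for one activity to about 10^-7 for any one of 10^4 activities can also be sketched. This assumes, as a simplification, that the 10^4 activities are independent targets, each hit with probability p; then P(at least one) = 1 - (1 - p)^n, which is very close to n * p when n * p is small. The function name is hypothetical.

```python
import math


def prob_any_hit(p_single: float, n_targets: int) -> float:
    """Probability that at least one of n_targets independent targets is hit,
    each with probability p_single: 1 - (1 - p_single) ** n_targets.

    math.log1p and math.expm1 keep full precision when p_single is tiny,
    where the naive formula would lose most of its significant digits.
    """
    return -math.expm1(n_targets * math.log1p(-p_single))


p_any = prob_any_hit(1e-11, 10**4)
shortcut = 1e-11 * 10**4  # the n * p approximation used in the post

print(f"exact (independence assumed): {p_any:.6e}")
print(f"n * p shortcut:               {shortcut:.6e}")
```

Here the two agree to better than one part in a million, so the n * p shortcut is safe at these magnitudes; it is always a slight overestimate.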
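
The "eleven nines" figure can be checked exactly with Python's decimal module, which avoids binary floating-point rounding. This minimal sketch takes 1 useful sequence in 10^11 as the useful fraction (an upper bound, per the post); the function name is hypothetical.

```python
from decimal import Decimal


def barren_percentage(useful_fraction: str) -> Decimal:
    """Percentage of sequence space that is NOT useful, computed exactly.

    useful_fraction is passed as a string so Decimal reads it exactly;
    normalize() strips the trailing zeros left by multiplying by 100.
    """
    return ((Decimal(1) - Decimal(useful_fraction)) * 100).normalize()


pct = barren_percentage("1e-11")
print(pct)                   # 99.999999999
print(str(pct).count("9"))   # 11
```

If the useful fraction is below 10^-11, the barren fraction is correspondingly above this value, which is where the "more than" comes from.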

All God's best, Larry Johnston

"He has made everything beautiful in its time. He has also set
eternity in the hearts of men" - - Ecclesiastes 3:11, NIV trans

================================================
Lawrence H. Johnston home:917 E. 8th st.
professor of physics, emeritus Moscow, Id 83843
University of Idaho (208) 882-2765
http://www.uidaho.edu/~johnston/ =====================


This archive was generated by hypermail 2b29 : Fri Aug 03 2001 - 10:17:54 EDT