Evolution of proteins in sequence space

From: pruest@pop.dplanet.ch
Date: Fri Aug 10 2001 - 10:22:53 EDT

Next message: pruest@pop.dplanet.ch: "Evolution of proteins in sequence space"

Previous message: bivalve: "Wheels"
Next in thread: pruest@pop.dplanet.ch: "Evolution of proteins in sequence space"
Next in thread: bivalve: "Re: Evolution of proteins in sequence space"
Maybe reply: bivalve: "Re: Evolution of proteins in sequence space"
Maybe reply: Howard J. Van Till: "Re: Evolution of proteins in sequence space"
Maybe reply: D. F. Siemens, Jr.: "Re: Evolution of proteins in sequence space"
Maybe reply: bivalve: "Re: Evolution of proteins in sequence space"
Maybe reply: Howard J. Van Till: "Re: Evolution of proteins in sequence space"
Maybe reply: Howard J. Van Till: "Re: Evolution of proteins in sequence space"
Maybe reply: John W Burgeson: "Re: Evolution of proteins in sequence space"
Maybe reply: Howard J. Van Till: "Re: Evolution of proteins in sequence space"
Maybe reply: Dawsonzhu@aol.com: "Re: Evolution of proteins in sequence space"
Maybe reply: Keith B Miller: "Re: Evolution of proteins in sequence space"
Maybe reply: D. F. Siemens, Jr.: "Re: Evolution of proteins in sequence space"
Maybe reply: Howard J. Van Till: "Re: Evolution of proteins in sequence space"
Maybe reply: Keith B Miller: "Re: Evolution of proteins in sequence space"
Maybe reply: Howard J. Van Till: "Re: Evolution of proteins in sequence space"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Wayne Dawson wrote (WD, 6 Aug 2001 11:21:49 EDT):
>
> Peter Ruest wrote (PR, 02 Aug 2001 17:18:27 +0200):
>
> > Basically, any sequence within the transastronomically huge
> > combinatorial space of the L^20 possible sequences of proteins of length
> > L would be accessible during evolution, if there is a mutational path
> > which leads from an existing sequence to the target considered and which
> > does not contain any intermediates which are selected against (or even
> > lethal). In order to evaluate this mechanism of evolution and the
> > probability of its success, we should have an idea about the frequency
> > of useful sequences in sequence space. This information has been
> > missing, but now some indications about it are available.
>
WD: > You should at least write a mental note somewhere that correlation
effects in a polymer are *not* limited to single peptides, nor single
nucleotides, nor any other monomer that you can name. Typically,
nearest neighboring monomers tend to be coupled due to the lack of free
rotation about their bond axes. For nucleotides, this includes
correlation between the aromatic rings. For peptides, it is more
complicated because you have more interactions: hydrophobic,
hydrophilic, and acid/base interactions. In any case, you can (and
should) expect a polymer to have correlation between its nearest
neighbors. <

PR: I assume you meant "amino acid" when you wrote "peptide". I agree
with your comment, as I have also dealt with the problem of the
interdependence between amino acid positions before. I certainly do not
claim each one of the L^20 _formal_ sequence possibilities would
correspond to a _physically possible_ protein. All that is needed for my
discussion of the two papers I dealt with is the formal protein
configurational space, as non-viable amino acid sequences may be
specified by possible DNA sequences, but are then weeded out anyway.
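Ruest's accessibility criterion can be stated as a graph search. A target counts as reachable if some chain of point mutations leads to it without passing through intermediates that are selected against. The following is a minimal Python sketch under toy assumptions: a two-letter alphabet stands in for the 20 amino acids, the fitness function is made up, and a step is allowed only if it does not lower fitness. `point_mutants`, `find_path` and `toy_fitness` are hypothetical names used only for illustration.

```python
from collections import deque
from typing import Callable, Iterator, Optional

ALPHABET = "AB"  # hypothetical stand-in for the 20 amino acids


def point_mutants(seq: str, alphabet: str = ALPHABET) -> Iterator[str]:
    """Yield every sequence that differs from seq at exactly one position."""
    for i, current in enumerate(seq):
        for letter in alphabet:
            if letter != current:
                yield seq[:i] + letter + seq[i + 1:]


def find_path(start: str, target: str, fitness: Callable[[str], float],
              alphabet: str = ALPHABET) -> Optional[list[str]]:
    """Breadth-first search for a shortest chain of point mutations from
    start to target in which no step lowers fitness. Returns the list of
    sequences along the path, or None if no such path exists."""
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        seq = queue.popleft()
        if seq == target:
            path = []
            node: Optional[str] = seq
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for mutant in point_mutants(seq, alphabet):
            # Only non-deleterious steps are allowed (nothing selected against).
            if mutant not in previous and fitness(mutant) >= fitness(seq):
                previous[mutant] = seq
                queue.append(mutant)
    return None


def toy_fitness(seq: str) -> float:
    """Hypothetical landscape: each 'B' adds 1; sequences starting 'AB' are lethal."""
    if seq.startswith("AB"):
        return -1.0
    return float(seq.count("B"))


print(find_path("AAA", "BBB", toy_fitness))  # ['AAA', 'BAA', 'BBA', 'BBB']
```

Because breadth-first search visits sequences in order of mutational distance, the path returned is a shortest admissible one. In this toy landscape "ABB" is lethal and therefore never reachable, and no path leads back down from "BBB" to "AAA".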

> (snip) <

In your post of 8 Aug 2001 10:15:24 EDT, you corrected the following
paragraph:

WD: > I thought over what I wrote here and I think this needs to be
corrected....
>
> > Hence, I would be inclined to argue that the number of degrees
> > of freedom has been greatly overestimated in L^20, and
> > L^(20/3) is a more realistic estimate of the odds involved.
> > That is admittedly still a big number for any long protein
> > chain, and may still lead to astronomically huge odds, but
> > certainly not _as_ huge.
>
> I was reasoning that the distinguishability of the different peptides is reduced by the extended persistence length, but that should have been worked out from the following.
>
> (1) The persistence length affects the *base* of the expression L' = L/3 (approximately).
>
> (2) On the other hand, the exponent (n') of the expression should be largely defined by the basic chemistry of the interacting side chains: hydrophobic, hydrophilic, acidic, basic. That allows a maximum of, say, 8 categories:
>
> hydrophobic: weakly -> moderately -> strongly
> hydrophilic: weakly -> moderately -> strongly
> acidic
> basic
>
> I think weakly hydrophobic/philic is really the same thing (Gly for example), but perhaps a special class involving steric interactions (e.g., Trp or Pro) could also be invoked, so perhaps a maximum of 8 classes of truly *distinguishable* peptides is reasonable in this case.
>
> Of course there are some examples where a single peptide change can be lethal, but more often the changes are far less pernicious, tending only to accumulate noticeable problems in old age. In any case, polymorphism in the human genome makes such things as the CD4 receptor more vulnerable to HIV infection in some groups, and less so in others, so variation in proteins is not something particularly profound.
>
> Thus, I think 8 represents an estimate of the chemically *distinguishable* set of peptides in a sequence which means the exponent in the expression is probably about 8. Smaller values are probably too small, but I also don't see a lot of reason to argue that there should be more categories in such a rough estimation procedure. Certainly 20 is pushing it.
>
> This means that a reasonable estimate on the upper bounds for the odds of getting a correct sequence are probably around (L/3)^8. Again, this can be a large number for L large. <
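To give a sense of scale, here is a short Python sketch that evaluates the size estimates discussed above as base-10 logarithms, which keeps numbers far beyond floating-point range comparable. The helper `log10_counts` is hypothetical. For reference it also reports 20^L, the number of distinct chains of L residues drawn from 20 amino acid types. The lengths 100 and 300 are arbitrary examples, not values from the thread.

```python
import math


def log10_counts(length: int) -> dict[str, float]:
    """Base-10 logarithms of the size estimates used in the thread,
    for a protein chain of the given length L (number of residues)."""
    L = length
    return {
        "L^20": 20 * math.log10(L),
        "L^(20/3)": (20 / 3) * math.log10(L),
        "(L/3)^8": 8 * math.log10(L / 3),
        # All chains of L residues over 20 amino acid types, for comparison.
        "20^L": L * math.log10(20),
    }


for L in (100, 300):
    print(L, {name: round(value, 1) for name, value in log10_counts(L).items()})
```

Working in logarithms turns each power into a simple product, so the estimates can be compared directly by their exponents of ten.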

PR: Again, I agree with you that interdependencies between amino acids
at different positions eliminate the _physically_ possible occurrence
of a large part of the _formally_ possible sequences. The only thing I
am not so sure of is the specific formula you are deriving - both for
the persistence length and for the number of amino acid categories.
Depending on the particular cases, some of the equivalences you assume
may not always apply, increasing (or occasionally decreasing) the number
of distinguishable physically possible proteins. Also, long-range
interactions between amino acids in a protein are common.

WD: > Since there is as yet no evidence of intelligent life elsewhere in
the universe, the probability of this process progressing to the point
where intelligent life can emerge is clearly small. Perhaps "bacteria"
levels of "life" may exist elsewhere but even that remains questionable
if the exponent really implies "inevitable" as some people might wish to
think. <

PR: I would even doubt the feasibility of natural self-organization of
matter forming viable "proto-bacteria", apart from divine guidance
(providence).

WD: > In that sense, a chance in a trillion is not too far out of reason
to allow possibility in God's formation economy, but not mere
inevitability. Since I have enough problems with my own ego and
submitting to Christ's call in my life, and I'm sure I am not alone in
that regard, that seems like God's divine wisdom in action. <

PR: Here, I continue with your post of 6 Aug 2001 11:21:49 EDT,
discussing what I wrote primarily about the Darwinian evolution of
already existing, viable genomes, not about the emergence of the first
life.

> > (snip)
> >
> > As the human genome contains an estimated 30,000 genes, and the number
> > of different protein folds is estimated to be a few thousand, we may, as
> > a very rough approximation, assume that there are fewer than 10^4
> > basically different protein families in the biosphere, within each of
> > which a number of similar proteins can be derived from each other by
> > feasible evolutionary paths.
> >
> > The question is whether each of the 10^4 different protein families can
> > be similarly derived from one or very few initial sequences, or by
> > random mutational walks. If a novel enzyme or other functional protein
> > is to arise, which is not easily derivable by a few selected mutations
> > from an already existing one, we need a mutational random walk. The
> > probability of finding any sequence with the activity required is about
> > 10^-11. If, at a given moment in the evolution of a species, any one of
> > 10^4 different novel activities will prove advantageous, the probability
> > of finding any such sequence is about 10^-7.
>
WD: > I am still not sure myself exactly what to make of the folds. Do
they represent a language? If so, to what extent: are they mere
commands or is there something more? By (admittedly rather dangerously
poor) analogy, the early 8008 processor functioned successfully with
only 17 instructions. Hence, if the "function" of a protein is quite
limited, then the required "instruction set" could also be quite small.
<

PR: Once life is here, the letters, in the language metaphor, could
correspond to the nucleotides, the words to the amino acids, the
sentences or instructions (commands) to the proteins. If we assume that
one protein can easily evolve into a slightly different one (within the
same protein family or fold), we need not be overly concerned about this
variability. But evolution into a different family (or fold) is
presumably much more difficult. The 8008 processor analogy is not useful
- unless a viable self-replicating "organism" consisting of just a few
(such as 17) types of proteins (or RNAs) only is really synthesized in
the lab. Restricting the size of the instruction set or the
functionalities of the instructions is of no use if this kills the
organism.
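Returning to the probability estimate quoted above: the step from 10^-11 per activity to about 10^-7 for any of 10^4 activities is the small-probability approximation 1 - (1 - p)^n ≈ n p. A minimal Python check follows, assuming the 10^4 activities behave as independent targets (an assumption the thread does not state explicitly). `prob_any_hit` is a hypothetical helper.

```python
import math


def prob_any_hit(p_each: float, n_targets: float) -> float:
    """Probability that a random sequence has at least one of n_targets
    independent activities, each present with probability p_each.
    Computes 1 - (1 - p)**n via log1p/expm1 so tiny values survive rounding."""
    return -math.expm1(n_targets * math.log1p(-p_each))


print(prob_any_hit(1e-11, 1e4))  # approximately 1e-07
```

For p as small as 10^-11 the exact value and n p agree to many digits; the log1p/expm1 form simply avoids computing 1 - p directly, which would lose most of the precision.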

WD: > So one thing that seems to need clarification is the level of
complexity of a given protein. There is a big difference between the
complexity of a human language, and the complexity of a simple computer
program carrying out a small instruction set. Likewise, how many
instructions are actually necessary is not fully clear to me. In that
case, it is not so much the _number_ of folds, but what the folds
actually _do_ that needs to be defined clearly. <

PR: This is exactly what we need to know: (1) What is the minimal set of
proteins (or RNAs) required for life? And (2) what is the minimal
complexity of each of these polymers? At present, no one has the
slightest idea about this. All we know is the complexity of the simplest
organisms living today, which is of the order of a few hundred proteins
of at least a few dozen required amino acid positions of at least 8
(according to your estimate) different types, that is, very much more
complex than any computer language, and clearly way beyond random walks
and self-organization.
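The size of the configuration space implied by this description can be tallied in logarithms. The sketch below is illustrative only. It assumes "a few hundred" means 300 proteins and "a few dozen" means 36 required positions, uses Wayne's 8 residue classes, and treats positions as independent, which the interdependencies discussed earlier in this thread would modify. `log10_configurations` is a hypothetical helper.

```python
import math


def log10_configurations(n_proteins: int, positions_each: int,
                         residue_types: int) -> float:
    """log10 of residue_types ** (n_proteins * positions_each): the number of
    ways to fill every required position, treating positions as independent."""
    return n_proteins * positions_each * math.log10(residue_types)


# Illustrative values only: 300 proteins, 36 required positions each, 8 classes.
print(round(log10_configurations(300, 36, 8)))  # 9753, i.e. about 10^9753 configurations
```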

> > These estimates assume that directed evolution in the lab is a valid
> > model for natural evolution. Of course, this is not the case, as in
> > directed evolution one does not have to bother about the viability of
> > each intermediate organism in a linear sequence of point mutations, but
> > only about the isolated activity of a new protein sequence after several
> > or many mutations. Directed evolution jumps around in sequence space,
> > whereas natural evolution is limited to single-step paths, and none of
> > these steps may go downhill on the fitness surface.
> >
> > How, then, is it possible that any one of the 10^3 or 10^4 basically
> > different protein folds (families) arose (anywhere in the biosphere),
> > let alone all of them? If there was the need for 10^3 different searches
> > with probabilities of around 10^-10, it seems a hopeless proposition.
> > (And the few million years available for the formation of the first
> > viable organism appear transastronomically inadequate.)
> >
> > The only possible way out seems to be to claim that every single
> > one of the different protein families used in the biosphere is
> > intimately connected in sequence space, such that simple linear
> > sequences of point mutations, with all intermediates naturally selected,
> > will do for all proteins. In this case, more than 99.999999999% (eleven
> > nines altogether) of sequence space is barren for life and was never
> > visited by any sequence during evolution. Whether this is a feasible
> > proposition will have to be shown experimentally.
> >
> > This still leaves us with the mystery of the origin of the first living
> > organism capable of natural evolution.
> >
> > But the very interesting finding of the two papers mentioned is that the
> > protein sequence space is extremely sparsely populated with useful
> > sequences. This makes evolution (which, for theological reasons, I
> > believe has happened) an astonishingly marvelous process.
>
WD: > You are beginning to rant again here. I can agree to some extent
that the laboratory conditions _somewhat_ favor the expectations of the
experimentalist. When these ideal conditions are removed, and these
materials have to compete with all the other crud in a vat full of brown
tar, it is not particularly clear that the results will be favorable. <
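Two figures in the passage quoted above can be checked directly. If useful sequences make up 10^-11 of sequence space, the barren share is 1 - 10^-11, which, written as a percentage, has eleven nines. For the 10^3 searches, one reading is that each is an independent random search needing on average 1/p trials (the mean of a geometric distribution). This gives about 10^13 trials in total. That reading is an assumption, since the passage does not say how the searches proceed. `barren_percent` and `expected_trials` are hypothetical helpers.

```python
from decimal import Decimal


def barren_percent(useful_fraction: str) -> Decimal:
    """Percentage of sequence space lacking the activity, computed exactly
    with Decimal so that no nines are lost to binary rounding."""
    return (Decimal(1) - Decimal(useful_fraction)) * 100


def expected_trials(n_searches: int, p_success: float) -> float:
    """Expected random trials to complete n independent searches one after
    another, each needing a geometric number of trials with mean 1/p."""
    return n_searches / p_success


pct = barren_percent("1e-11")
print(pct, str(pct).count("9"))  # 99.99999999900 11
print(f"{expected_trials(1000, 1e-10):.0e}")  # 1e+13
```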

PR: Apparently, my last sentence, "This makes evolution (which, for
theological reasons, I believe has happened) an astonishingly marvelous
process", has misled you into thinking I am happy with the usual
evolutionary speculations. This is not at all the case. On the contrary,
on the scientific level, I fully agree with your skepticism. The only
way we can expect favorable evolutionary results is by way of divine
guidance (or providence). In my post of 28 Nov 2000 17:30:17 +0100 (ASA
digest vol #1889), I explained what I mean by this (extended version to
appear in PSCF, Sep 2001).

WD: > I think it also pertinent to say here that often the one thing
that seems seriously lacking in these exchanges (perhaps more so from
the evolution side) is reverence for how astoundingly lucky we really
are to even have the privilege to think about where we came from. YEC
folk err greatly in other ways, but I recognize that (in part) this is
because they respect the Lord. In much the same way, I'm sure this is
probably at the heart of ID arguments, viz., by invoking evolution, we
seem to be denying the Lord's providence in our lives. I think it fair
enough to say that ignoring the Lord is folly, and I understand that I
have regularly come up short on more than this account alone. <

PR: Without the Lord's providence, evolution is clearly incapable of
achieving what it is usually believed to do. But autonomous evolution is
not what God created. The most powerful and perfect computer does
nothing whatsoever without the appropriate software, input data, and
starting command.

WD: > That being said, I am not fully decided on this matter, but I
would contend that there are a lot of curious properties in polymers
that allow for interesting possibilities. The abiogenesis arguments,
although persuasive, *may* turn out to be wrong, but they are certainly
arguments that can be tested and a testable hypothesis is something that
a scientist can work on. "Give up" arguments are not (or at least, not
until the funding runs out).
>
> As I currently see it, the major problems that currently plague an abiogenesis scenario are probably as follows.
>
> (1) A power source for running an RNA world. RNA does not appear to have a very large diversity of catalytic activity (at least compared to proteins). Without an engine and something to burn, the RNA world would "run out of gas" rather quickly. Introducing proteins brings us back to the chicken or egg question and greatly increases the complexity of the prebiotic world.
>
> (2) The "replicators" in a prebiotic world. If proteins must be an integral part of the abiogenesis process, the transcription machinery becomes more complicated as well. There have been a few attempts at replicators for RNA (I suspect mostly inadequate), but if this must include the replication of proteins, then the difficulty of making "first base" becomes far more insurmountable.
>
> (3) Even if we can eventually find a way to explain (1) and (2), let's not forget that life is an astoundingly lucky privilege and we should not forget to honor the Lord. Our call to follow Christ is in no way diminished whether life came about by probabilities or miracles. Life itself is a "miracle," and it is a blessing that we *can* even choose to follow.
>
> by Grace we proceed,
> Wayne

PR: Fully in agreement. And I would add that the most important
ingredient lacking (apart from divine guidance/providence) is a source for the
information needed to define the option to be chosen at each of the
myriad crucial but random-looking events.

Peter

-- 
--------------------------------------------------------------
Dr Peter Ruest			Biochemistry
Wagerten			Creation and evolution
CH-3148 Lanzenhaeusern		Tel.:	++41 31 731 1055
Switzerland			E-mail:	<pruest@dplanet.ch>
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	In biology - there's no free lunch -
		and no information without an adequate source.
	In Christ - there is free and limitless grace -
		for those of a contrite heart.
--------------------------------------------------------------

This archive was generated by hypermail 2b29 : Fri Aug 10 2001 - 10:22:23 EDT