An interesting article has appeared recently, which might be pertinent
to the discussion about the origin of new proteins: B. Kuhlman & D.
Baker, "Native protein sequences are close to optimal for their
structures", Proc. Natl. Acad. Sci. USA 97 (12 Sep 2000), 10383-8.
In Monte Carlo computer simulations, the authors generated proteins,
starting with random amino acid sequences, with the aim of duplicating
known tertiary protein structures. Each step of their search (their
analogue of an evolutionary step) was not a DNA mutation but the
replacement of a single amino acid (cysteine was excluded), placed in one
of its possible side-chain rotational conformations (a total of 150
"rotamers" over all amino acids). The
backbone coordinates were held constant, and an energy function was used
as the evaluation criterion. One million substitutions were made per
run, and the lowest-energy sequence from 5 different runs was used for
comparison with the native sequence. The test set included 108 proteins
with <30% sequence identity with each other and crystal structures with
resolution better than 3.0 Angstrom.
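
To make the procedure just described more concrete, here is a minimal sketch of a fixed-backbone Monte Carlo sequence search. It is not the authors' method. The energy function `toy_energy` is a hypothetical stand-in that only rewards hydrophobic residues at buried positions and polar ones at exposed positions. Rotamers are not modelled. The Metropolis acceptance rule, the `temperature` value and the step count are assumptions of mine (the post says only "Monte Carlo"). The names `design`, `best_of_runs` and `percent_identity` are likewise illustrative.

```python
import math
import random

# 19 amino acids; cysteine ("C") is left out, as in the study described above.
AMINO_ACIDS = "ADEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")  # assumed, simplified classification

def toy_energy(seq, burial):
    """Hypothetical energy: one penalty unit for each polar residue at a
    buried position and each hydrophobic residue at an exposed position."""
    return sum((aa in HYDROPHOBIC) != buried for aa, buried in zip(seq, burial))

def design(burial, steps=20000, temperature=0.2, rng=None):
    """Single-substitution Metropolis search on a fixed 'backbone'.
    Returns the lowest-energy sequence seen and its energy."""
    rng = rng or random.Random()
    seq = [rng.choice(AMINO_ACIDS) for _ in burial]  # random starting sequence
    energy = toy_energy(seq, burial)
    best_seq, best_energy = seq[:], energy
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose one replacement
        new_energy = toy_energy(seq, burial)
        delta = new_energy - energy
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = seq[:], energy
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best_energy

def best_of_runs(burial, runs=5, seed=0):
    """Lowest-energy result over several independent runs."""
    results = [design(burial, rng=random.Random(seed + i)) for i in range(runs)]
    return min(results, key=lambda r: r[1])

def percent_identity(a, b):
    """Percentage of aligned positions carrying the same residue."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)
```

Calling `best_of_runs(burial)` with a list of booleans (True for a buried position) gives a designed sequence that can be compared with a reference sequence through `percent_identity`, which is the kind of comparison behind the identity figures quoted from the paper.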
The result: "Remarkably, in the designed sequences 51% of the core
residues and 27% of all residues were identical to the amino acids in
the corresponding positions in the native sequences." Natural selection
of a protein is guided by the requirements of (1) conformational
stability, (2) catalytic activity, and (3) any further interactions with
other molecules that are needed. This simulation dealt with the first
factor only. Presumably not all, and perhaps not even many, of the
designed sequences had the biological activities of the native proteins.
If the evaluation procedure had taken all three factors into account
(had this been possible), the hit rate might have been considerably
higher. Concerning the SH3 domain family, which includes >400 naturally
occurring proteins, and for
which they also tested the covariances between pairs of positions, the
authors wrote: "... it appears that evolution has sampled most of the
sequence space compatible with the SH3 structural core, and has to some
extent reached equilibrium." And, regarding possible de novo protein
design: "Since there appear to be so few good sequences for a unique
structure, the probability that there is any good sequence for any
single novel backbone structure may be very small."
The fact that, for a given protein, almost all runs (starting from
independent random sequences) converged to produce sequences nearly
identical to the single native sequence indicates that the requirements
for a given protein usually cannot be satisfied by different, completely
unrelated constructs. Of course, any amino acid sequence has a certain
structural stability, i.e. any sequence is selectable for structural
stability from the start. But since a given biological activity is
certainly absent from most sequences, natural selection for this
activity usually cannot start immediately. This implies the need for an
initial random mutational walk (whose probability can, in principle, be
estimated).
Interestingly, the finding that useful sequences are extremely scarce
within sequence space appears to apply to ribozymes as well. C. Wilson &
J.W. Szostak, Nature 374 (1995), 777, wrote: "A pool of 5 x 10^14
different random sequence RNAs was generated... On average, any given
28-nucleotide sequence has a 50% probability of being represented...
Remarkably, a single sequence accounted for more than 90% of the
selected pool... This result indicates that there are relatively few
solutions to the problem of binding biotin." The evolution of ribozymes,
however, is much faster, as there is no genotype-phenotype translation
requiring an average of >2 DNA mutations per specific amino acid
substitution.
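
The pool-coverage statement in the quotation can be checked with a rough back-of-the-envelope calculation. The sketch below compares the pool size with the 4^28 possible 28-nucleotide sequences and uses a Poisson approximation for the chance that one specific 28-mer occurs at least once. The length of the random region per molecule is not given above, so `random_region_len=72` is an assumption on my part, and the function name `coverage` is illustrative.

```python
import math

def coverage(pool_size, motif_len, random_region_len, alphabet=4):
    """Return (expected occurrences of one specific motif in the pool,
    Poisson estimate of the probability that it occurs at least once)."""
    # Number of start positions for the motif within each random region.
    windows = random_region_len - motif_len + 1
    expected = pool_size * windows / alphabet ** motif_len
    return expected, 1.0 - math.exp(-expected)

# Pool size and motif length are from the quotation; 72 is an assumed value.
expected, p_present = coverage(5e14, 28, 72)
print(f"possible 28-mers:   {4 ** 28:.2e}")
print(f"expected copies:    {expected:.2f}")
print(f"P(at least one):    {p_present:.2f}")
```

Under these assumptions the probability comes out in the tens of percent, the same order of magnitude as the quoted figure. The result depends strongly on the assumed region length.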
Usually, it is assumed that any biological function whatsoever may be
produced by a sequence of single mutations whose products are, in each
case, subject to natural selection immediately. This assumption may have
to be reconsidered.
Peter Ruest