Re: Evolution of proteins in sequence space

From: Dawsonzhu@aol.com
Date: Mon Aug 13 2001 - 12:29:31 EDT

Next message: Joel Z Bandstra: "RE: Is Jonah to be taken literally?"

Previous message: John W Burgeson: "Re: Homosexuality"
Maybe in reply to: pruest@pop.dplanet.ch: "Evolution of proteins in sequence space"
Next in thread: Keith B Miller: "Re: Evolution of proteins in sequence space"

Peter Ruest wrote:

(Note: there were roughly three issues brought up in
this discussion. Since the emails are becoming unmanageable
I will divide my responses accordingly.)

PR: I assume you meant "amino acid" when you wrote "peptide".

WD: OK, perhaps I should specify "single peptide" when I mean "amino
acid". I chose to use "peptide" to avoid any possible misreading
between "amino acid" and "nucleic acid". (I typically have problems
with eye/mind/mouth coordination: I _see_ one thing, I _read_ it
as something else, and I _express_ it as a third thing...and people
who can't walk and chew gum simultaneously think they have
problems....)

[large snip]

> WD:> This means that a reasonable estimate on the upper bounds for the odds
> of getting a correct sequence are probably around (L/3)^8. Again, this can
> be a large number for L large. <
>
> PR: Again, I agree with you that interdependencies between amino acids
> at different positions eliminates the _physically_ possible occurrence
> of a large part of the _formally_ possible sequences. The only thing I
> am not so sure of is the specific formula you are deriving - both for
> the persistence length and for the number of amino acid categories.
> Depending on the particular cases, some of the equivalencies you assume
> may not always apply, increasing (or occasionally decreasing) the number
> of distinguishable physically possible proteins. Also, long-range
> interactions between amino acids in a protein are common.
>

What you need to understand about measuring a polymer is that the
most empirically descriptive parameter is probably its "links"
(equivalent to the persistence length). This is the _correct_
way to measure the entropy of RNA and protein folding, as my
article accepted by the Journal of Theoretical Biology shows. (It's
somewhere in the press process right now, as far as I know.) This
is how you would come to understand the folding kinetics, this is
how you would come to understand the maximum size of the domains
that are formed, and it is how you can understand what is the
"scaffolding" in the structure. Basically, entropy is a
fundamental constraint on how these things fold, and it sets the
limits on the extent to which they can fold. So you need to see these
things from a _flexibility_ view (i.e., as LINKS) to even begin
to understand how they are likely to behave.

Yes, indeed it is true that the persistence length is variable
in RNA and is surely more so in proteins. The value 3 is an
_average_ (at least for the proteins measured). Whereas I
would grant that poly(Gly) probably has a small persistence length
(maybe 1.5 peptides due to two axes of free rotation), a real protein
has a mixture of different amino acids. Some of these, like Pro,
surely have a persistence length of around 5 peptides. A reasonable
range for the persistence length (given what little is known) is
between 1.5 and 5 peptides, so for an arbitrary generic protein
with no other specific information, a reasonable _average_ for
that persistence is 3 peptides. So in (L')^(n'), L' = L/3.
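
(As a rough illustration of the L' = L/3 step, here is a minimal Python
sketch. The function name effective_links and the chain length L = 150
are assumptions made purely for illustration; only the persistence
lengths 1.5, 3, and 5 come from the discussion above.)

```python
def effective_links(length, persistence=3.0):
    """Return L' = L / persistence, the number of independent links."""
    if persistence <= 0:
        raise ValueError("persistence length must be positive")
    return length / persistence


# Hypothetical chain length in residues (not a value from the post).
L = 150

# Range quoted above: poly(Gly)-like (1.5), average (3), Pro-like (5).
for p in (1.5, 3.0, 5.0):
    print(f"persistence length {p}: L' = {effective_links(L, p):.1f}")
```

The shorter the persistence length, the more independent links the same
chain contributes, so L' grows as the chain becomes more flexible.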

The point where I am not so clear myself is the issue of
the exponent: whether there are 8 categories or 10 or 12
or whatever. However, I strongly think 20 is too many.

A good reason to question this is that there is research
in which a minimal set of amino acids is used with the intention
of engineering the same protein structure. There have been several
articles in Nature Structural Biology on this minimal set.
Some have claimed 5 is the minimal set. However, I have read
one author who claims that any set containing fewer than 10
would only result in random structures if you look beyond
the simple folds. Nevertheless, I'm sure even he would put
that "minimal set" at around 10 rather than 20. I do think
at present 5 is too small, and 10 is probably more realistic
(although I am not a specialist on this matter).

Another reason is that there is really not so much difference
(from a crude estimation) between Ile, Leu, and Val, for example.
They are roughly the same size, with similar hydrophobic character
and similar functional groups. There are some adverse effects
from exchanging Val for Ile (I'm currently working on a problem
related to just that issue), but the effect from a progeny
perspective is small. Hydrophilic amino acids show greater
variation in functional groups forming the side chain, but I
still don't think we can add a lot more categories because
these surfaces are typically the ones that are exposed to
water.

The major deciding factor for beta sheets is the hydrophobic
effect. When these peptides are exchanged, the packing
is often unfavorable, with some loss in free energy, but in
at least some cases the structure still folds correctly,
and it often tolerates a cavity in such a replacement
(e.g., Ile or Leu for Val).

So the upshot is that I can maybe agree that n' *might*
be underestimated in my example. However, I would still
argue that you're pushing the argument to say that all
20 distinct amino acids are *required* for building
a functional protein. I guess there are different degrees
of "ideal" here, but I think if the protein folds the
same way, functions properly, and contains only 10
different residues, then that is a minimal set and the one
we should be assuming in these discussions. At present
some of these folds can even be done with 5.

So we can maybe revise this to (L')^(n') ~ (L/3)^(10),
where I would still favor n' = 8, and I don't make
any issue of n' = 10, or even n' = 12 (to allow for
some unspecified uncertainties). However, I suspect that
increasing the parameter to n' > 12 is probably difficult
to realistically justify.
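
(To give a feel for how much the choice of n' matters, here is a small
Python sketch that evaluates (L/3)^(n') for n' = 8, 10, and 12 and puts
it next to 20^L, the count of formally possible sequences. The function
names and the chain length L = 150 are assumptions for illustration
only, and the results are reported as powers of ten to keep the numbers
readable.)

```python
import math


def log10_estimate(length, n_categories, persistence=3.0):
    """log10 of (L / persistence)^n', the estimate discussed above."""
    return n_categories * math.log10(length / persistence)


def log10_formal(length, alphabet=20):
    """log10 of alphabet^L, the number of formally possible sequences."""
    return length * math.log10(alphabet)


# Hypothetical chain length in residues (not a value from the post).
L = 150

for n in (8, 10, 12):
    print(f"n' = {n:2d}: (L/3)^n' is about 10^{log10_estimate(L, n):.1f}")

print(f"formal count 20^L is about 10^{log10_formal(L):.1f}")
```

Each step of 2 in n' multiplies the estimate by (L/3)^2, so the exponent
grows linearly in n', while the formal count grows linearly in L.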

[large snip] I will respond to the rest of this post in
a future email.

Finally, just in case I forget somewhere in this
discussion myself, I want to make clear that
irrespective of whether we evolved through a formational
economy or God inserted information at various points
in the process, our lives are a blessing. We are
allowed to think for ourselves. In that way, we really
are higher than the angels. If God merely programmed
the universe, our minds would perceive nothing more
than the universe we live in and the rules it follows,
and I doubt that we would even have the chance to
perceive that there could be more than we think:
that indeed, God could be in this picture somewhere.
If we were meant to be God's "robots",
we would simply "do what God said", but we're not, we
have a choice. The choice is to have faith or not to
have faith. I chose to follow in obedience to Christ
because the message speaks to my heart on how I *should*
live, and what makes a right way of life. Maybe I am a
fool and I should live to maximize my gain in _this_ life,
but I chose to follow _in faith_ that there is more, much
more.

by Grace we proceed,
Wayne


This archive was generated by hypermail 2b29 : Mon Aug 13 2001 - 12:29:48 EDT