Re: Evolution of proteins in sequence space

From: Dawsonzhu@aol.com
Date: Mon Aug 13 2001 - 12:29:31 EDT


    Peter Ruest wrote:

    (Note: there were roughly three issues brought up in
    this discussion. Since the emails are becoming unmanageable,
    I will divide my responses accordingly.)

    PR: I assume you meant "amino acid" when you wrote "peptide".

    WD: OK, perhaps I should specify "single peptide" when I mean "amino
    acid". I chose to use "peptide" to avoid any possible misreading
    between "amino acid" and "nucleic acid". (I typically have problems
    with eye/mind/mouth coordination: I _see_ one thing, I _read_ it
    as something else, and I _express_ it as a third thing...and people
    who can't walk and chew gum simultaneously think they have
    problems....)

    [large snip]

    > WD:> This means that a reasonable estimate of the upper bound on the odds
    > of getting a correct sequence is probably around (L/3)^8. Again, this can
    > be a large number for large L. <
    >
    > PR: Again, I agree with you that interdependencies between amino acids
    > at different positions eliminate the _physically_ possible occurrence
    > of a large part of the _formally_ possible sequences. The only thing I
    > am not so sure of is the specific formula you are deriving - both for
    > the persistence length and for the number of amino acid categories.
    > Depending on the particular cases, some of the equivalencies you assume
    > may not always apply, increasing (or occasionally decreasing) the number
    > of distinguishable physically possible proteins. Also, long-range
    > interactions between amino acids in a protein are common.
    >

    What you need to understand about measuring a polymer is that the
    most empirically descriptive parameter is probably its number of
    "links" (segments roughly one persistence length long). This is
    the _correct_ way to measure the entropy of RNA and protein
    folding, as my article accepted by the Journal of Theoretical
    Biology shows. (As far as I know, it is somewhere in the press
    process right now.) This is how you would come to understand the
    folding kinetics, the maximum size of the domains that are
    formed, and what constitutes the "scaffolding" in the structure.
    Basically, entropy is a fundamental constraint on how these
    things fold, and it limits the extent to which they can fold.
    So you need to see these things from a _flexibility_ view
    (i.e., as LINKS) to even begin to understand how they are
    likely to behave.

    Yes, it is true that the persistence length is variable in RNA,
    and it is surely more so in proteins. The value 3 is an
    _average_ (at least for the proteins measured). While I would
    grant that poly(Gly) probably has a small persistence length
    (maybe 1.5 peptides, due to two axes of free rotation), a real
    protein has a mixture of different amino acids. Some of these,
    like Pro, surely have a persistence length of around 5 peptides.
    A reasonable range for the persistence length (given what little
    is known) is between 1.5 and 5 peptides, so for an arbitrary
    generic protein with no other specific information, a reasonable
    _average_ persistence length is 3 peptides. So in (L')^(n'),
    L' = L/3, the number of links in a chain of L peptides.
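    A minimal Python sketch of the link count L' = L/p, assuming a
    hypothetical chain length of L = 150 residues and the persistence
    range of 1.5 to 5 peptides given above; the function name
    effective_links is illustrative only:

```python
def effective_links(L: int, persistence: float = 3.0) -> float:
    """Number of effective links L' = L / persistence.

    L is the chain length in residues; persistence is the persistence
    length in residues (3 is the assumed average from the text).
    """
    if persistence <= 0:
        raise ValueError("persistence length must be positive")
    return L / persistence


if __name__ == "__main__":
    L = 150  # hypothetical chain length, in residues
    # Low (poly(Gly)-like), average, and high (Pro-like) values from the text.
    for p in (1.5, 3.0, 5.0):
        print(f"persistence = {p:>3}: L' = {effective_links(L, p):.1f}")
```

    Under these assumptions the chain has 100, 50, or 30 links, so the
    choice of persistence length alone changes L' by more than a
    factor of three.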

    The point where I am not so clear myself is the issue of
    the exponent: whether there are 8 categories or 10 or 12
    or whatever. However, I strongly think 20 is too many.

    A good reason to question this is that there is research in
    which a minimal set of amino acids is used with the intention
    of engineering the same protein structure. There have been
    several articles in Nature Structural Biology on this minimal
    set. Some have claimed that 5 is the minimal set. However, I
    have read one author who claims that any set containing fewer
    than 10 would only result in random structures if you look
    beyond the simple folds. Nevertheless, I'm sure even he would
    put that "minimal set" at around 10 rather than 20. I do think
    that at present 5 is too small, and 10 is probably more
    realistic (although I am not a specialist on this matter).

    Another reason is that there is really not so much difference
    (as a crude estimate) between Ile, Leu, and Val, for example.
    They are roughly the same size, with similar hydrophobic
    character and similar functional groups. There are some adverse
    effects from exchanging Val for Ile (I'm currently working on a
    problem related to just that issue), but from the perspective
    of the progeny the effect is small. Hydrophilic amino acids show
    greater variation in the functional groups forming the side
    chain, but I still don't think we can add many more categories,
    because these residues typically lie on the surface, where they
    are exposed to water.

    The major deciding factor in beta sheets is the hydrophobic
    effect. When these peptides are exchanged, the packing is
    often unfavorable, with some loss of folding free energy, but
    in at least some cases the structure still folds correctly
    and often tolerates the cavity left by such a replacement
    (e.g., Ile or Leu replaced by Val).

    So the upshot is that I can perhaps agree that n' *might*
    be underestimated in my example. However, I would still
    argue that you're pushing the argument to say that all
    20 distinct amino acids are *required* for building
    a functional protein. I guess there are different degrees
    of "ideal" here, but I think that if the protein folds the
    same way, performs the same functions properly, and contains
    only 10 different residues, then that is a minimal set, and
    it is the one we should be assuming in these discussions.
    At present, some of these folds can even be made with 5.

    So we can perhaps revise this to (L')^(n') approx. (L/3)^10.
    I would still favor n' = 8, but I would not object to
    n' = 10, or even n' = 12 (to allow for some unspecified
    uncertainties). However, I suspect that increasing the
    parameter to n' > 12 would be difficult to justify
    realistically.
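    The sensitivity of the estimate to n' can be sketched the same
    way. This Python sketch works in log10 to avoid huge numbers,
    assumes a hypothetical chain length L = 150 and a persistence
    length of 3, and, for comparison only, also prints the standard
    count of all formally possible sequences, 20^L. The helper names
    log10_estimate and log10_naive are illustrative:

```python
import math


def log10_estimate(L: int, n_prime: int, persistence: float = 3.0) -> float:
    """log10 of (L')^(n'), where L' = L / persistence is the link count."""
    return n_prime * math.log10(L / persistence)


def log10_naive(L: int, alphabet: int = 20) -> float:
    """log10 of alphabet^L, the count of all formally possible sequences."""
    return L * math.log10(alphabet)


if __name__ == "__main__":
    L = 150  # hypothetical chain length, in residues
    for n_prime in (8, 10, 12):  # the range of exponents discussed above
        print(f"n' = {n_prime:>2}: (L/3)^n' ~ 10^{log10_estimate(L, n_prime):.1f}")
    print(f"naive 20^L ~ 10^{log10_naive(L):.1f}")
```

    For L = 150 this gives roughly 10^13.6, 10^17.0, and 10^20.4 for
    n' = 8, 10, and 12, compared with about 10^195.2 for 20^L; working
    in logarithms keeps the comparison readable even for long chains.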

    [large snip] I will respond to the rest of this post in
    a future email.

    Finally, just in case I forget somewhere in this
    discussion myself, I want to make clear that
    irrespective of whether we evolved through a formational
    economy or God inserted information at various points
    in the process, our lives are a blessing. We are
    allowed to think for ourselves. In that way, we really
    are higher than the angels. If God merely programmed
    the universe, our minds would perceive nothing more
    than the universe we live in and the rules it follows,
    and I doubt that we would even have the chance to
    perceive that there could be more than we think:
    that indeed, God could be in this picture somewhere.
    If we were meant to be God's "robots",
    we would simply "do what God said", but we're not; we
    have a choice. The choice is to have faith or not to
    have faith. I chose to follow in obedience to Christ
    because the message speaks to my heart on how I *should*
    live, and what makes a right way of life. Maybe I am a
    fool and I should live to maximize my gain in _this_ life,
    but I chose to follow _in faith_ that there is more, much
    more.

    by Grace we proceed,
    Wayne



    This archive was generated by hypermail 2b29 : Mon Aug 13 2001 - 12:29:48 EDT