>Paul,
>
>Thanks a lot for your input on this.
>
>Since you apparently know a lot more about this than I do (believe me, no
>big compliment! :^> ), mind if I ask a couple follow-up questions? (Note I
>do not wait for a reply!)
>
>(1) Is the identity being claimed a base-pair-level identity, or something
>broader? (I take it the former -- how else would one read "human
>polypeptide" -- but I just want to be sure.)
>
>(2) Given that I'm a bit disappointed that these estimates are apparently
>somewhat rough (which would account for the variety of precise numbers I've
>heard, 95-99%, usually 98 or 99%), have there been any representative (or
>random), small-scale base-pair comparisons to confirm or disconfirm the
>general reliability of the extrapolation? If so, what are the results of
>such tests?
>
>(3) Is there a strong consensus in the scientific community that these
>estimates are very reliable? If so, is this confidence the result of strong
>conviction that the methodology is sound (realistic/alethic conviction), or
>simply a result of there being no competing, clearly superior method of
>estimating (pragmatic conviction)?
>
>If you have both the time and interest (I guess I'm -assuming- you have the
>knowledge :^> ), I'd appreciate your sharing that with the rest of us.
>
>Thanks, Paul!
>
I also want to express my appreciation to Paul for his
comments. This is a topic I know very little about and
am hoping to learn some.
In his book _Information Theory and Molecular Biology_,
Hubert Yockey says a little about similarity and the
related issue of phylogenetic tree construction. Here are
a few quotes that may stimulate further discussion:
===================
Regarding terminology ["similarity"; "information content"]:
A distinguished group of molecular biologists (Jukes & Bhushan,
1986; Reeck _et al_., 1987; Lewin, 1987) has called attention to
sloppy terminology in the misuse of 'homology' and 'similarity'.
Nevertheless, editors still permit authors to qualify 'homology'
and to confuse that word with similarity. _Mutual entropy_ is the
correct and robust concept and measure of similarity so that the
sooner _per cent identity_ disappears from usage the better. Mutual
entropy is a mathematical idea that reflects the intuitive feeling
that there is a quantity which we may call information content
in homologous protein sequences. Clearly, the shortest message
which describes at least one member of the family of sequences is
what one would properly call the information content.
-- Hubert Yockey, _Information Theory and Molecular Biology_,
Cambridge University Press, 1992, p. 337.
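To make the contrast between per cent identity and mutual entropy a bit more concrete, here is a small sketch. It assumes two sequences that are already aligned to equal length and computes (a) per cent identity and (b) a naive plug-in estimate of mutual information (Yockey's "mutual entropy"), treating each aligned column as one sample of a pair of letters. The function names are my own, and this is only an illustration of the two measures; Yockey's own treatment is more involved.

```python
from collections import Counter
from math import log2


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions where a and b carry the same letter."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, non-empty, equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def entropy(counts: Counter, n: int) -> float:
    """Shannon entropy, in bits, of the empirical distribution given by counts."""
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(a: str, b: str) -> float:
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits,
    treating each aligned column (a[i], b[i]) as one sample of (X, Y)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, non-empty, equal length")
    n = len(a)
    hx = entropy(Counter(a), n)
    hy = entropy(Counter(b), n)
    hxy = entropy(Counter(zip(a, b)), n)
    return hx + hy - hxy
```

For example, `percent_identity("ACGT", "CGTA")` is 0.0, yet `mutual_information("ACGT", "CGTA")` is 2.0 bits, because each letter in one sequence fully predicts the letter at the same position in the other. The two measures answer different questions. With only a handful of positions the plug-in estimate is biased upward, so a real comparison would need long alignments or many sequences.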
===================
Regarding methodology:
The mathematical methods that have been used to calculate
phylogenetic trees and thereby to acquire a picture of evolution
have employed a number of _ad hoc_ procedures. These procedures
tend to bias the purported solution by putting into the problem
what the investigator wishes to find.
-- Hubert Yockey, ibid., p. 341.
He also recommends Markov chains as the best way to study
evolution in terms of homologous protein families and mentions that
several "recent" papers have appeared using this method.
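For readers (like me) new to the idea, here is a rough sketch of what a Markov-chain model of sequence change looks like. It assumes a discrete-time nucleotide chain in the spirit of the Jukes-Cantor model: at each step a site keeps its base with probability 1 - alpha, or switches to each of the other three with probability alpha/3. The rate `alpha` and all names are illustrative assumptions, not anything taken from Yockey. The chance that a site still shows its ancestral base after t steps is the diagonal entry of the t-th power of the transition matrix, which the code checks against the closed form 1/4 + 3/4 (1 - 4 alpha/3)^t.

```python
def substitution_matrix(alpha: float) -> list[list[float]]:
    """One step: keep the base with prob 1 - alpha, or change to each of
    the other three bases with prob alpha / 3. Each row sums to 1."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)]
            for i in range(4)]


def mat_mul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m: list[list[float]], t: int) -> list[list[float]]:
    """m raised to the t-th power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while t > 0:
        if t & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        t >>= 1
    return result


def expected_identity(alpha: float, t: int) -> float:
    """Probability a site shows its ancestral base after t steps (counting
    back-substitutions): a diagonal entry of P^t, the same for every base."""
    return mat_pow(substitution_matrix(alpha), t)[0][0]


def expected_identity_closed_form(alpha: float, t: int) -> float:
    """Closed form for the same quantity: 1/4 + 3/4 * (1 - 4*alpha/3)**t."""
    return 0.25 + 0.75 * (1 - 4 * alpha / 3) ** t
```

Expected identity starts at 1 and decays toward 0.25 as t grows, so under a model like this a measured identity says much more about elapsed time while it is still well above that floor than once it approaches it.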
====================
Regarding indeterminacy:
The implied determinism in the construction of phylogenetic trees
needs to be questioned and evaluated. The Perron-Frobenius theorem
shows that Markov processes approach an equilibrium that is
independent of the starting conditions. Thus the information in
the protein sequences of modern organisms has some indeterminacy
with regard to the evolutionary history. These sequences are not
letters of gold carved in tablets of stone.
-- Hubert Yockey, ibid., p. 342.
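The Perron-Frobenius point can be seen numerically. A stochastic matrix whose entries are all positive has a unique equilibrium distribution, and every starting distribution converges to it. The sketch below assumes a made-up four-state transition matrix (the numbers are invented for illustration, not real substitution rates), starts the chain from two different states known with certainty, and iterates. After enough steps the two distributions agree to many decimal places, so the end state carries essentially no information about which state it began in.

```python
# Made-up 4-state transition matrix: rows sum to 1, all entries positive.
# The values are invented for illustration only.
P = [
    [0.90, 0.05, 0.03, 0.02],
    [0.04, 0.90, 0.03, 0.03],
    [0.02, 0.03, 0.92, 0.03],
    [0.03, 0.02, 0.05, 0.90],
]


def step(dist: list[float], P: list[list[float]]) -> list[float]:
    """Advance a distribution one step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def evolve(dist: list[float], P: list[list[float]], steps: int) -> list[float]:
    """Apply `step` the given number of times."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Two chains that start from completely different, certain states.
from_first = evolve([1.0, 0.0, 0.0, 0.0], P, 500)
from_last = evolve([0.0, 0.0, 0.0, 1.0], P, 500)

# Largest disagreement between the two long-run distributions.
gap = max(abs(a - b) for a, b in zip(from_first, from_last))
print([round(x, 6) for x in from_first])
print(f"largest difference between the two runs: {gap:.1e}")
```

The same loss of memory applies to any starting distribution, which is the sense in which the equilibrium is "independent of the starting conditions".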
====================
========================
Brian Harper | "People of that kind are academics, scholars,
Associate Professor | and that is the nastiest kind of man I know."
Applied Mechanics | -- Blaise Pascal
Ohio State University |
========================