Just a few quick thoughts...
> This is a great example. Yes, this throws all the authors' conclusions
> into doubt as far as I'm concerned. Actually, it is still possible to do what
> the authors proposed to do (but correctly :) but it would take a lot more
> effort.
Yeah, I think that's a significant understatement. :-)
> The criticism you raise turns out to be exactly the criticism Yockey
> levels at many published studies. It turns out to be a common
> mistake, so I'm kicking myself that I didn't recognize it.
>
> Yockey mentions exactly the type of situation you are discussing above.
>
> "The third position in eight of the familiar triplet codons in the
> set space G^3 may vary indefinitely among the four nucleotides
> in either DNA or RNA alphabets without changing the read off
> amino acid. The specificity of these codons is defined by the first
> two nucleotides"
>
> With respect to the authors' conclusions, it is obvious, I think, that a
> mutation in the third position of these eight codons would not
> change the information content regardless of the frequencies of
> the codons involved.
Right. It all depends on what the alphabet is and what the probabilities
are for the message ensemble.
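
To make that concrete, here is a toy sketch in Python. It is my own
illustration, not the authors' calculation and not Yockey's method. It
assumes a two-family fragment of the standard code (GCN = Ala, GGN = Gly,
both four-fold degenerate in the third position) and estimates the
probabilities from raw counts in a made-up four-codon message. The names
CODE, entropy, codon_entropy, and amino_entropy are hypothetical.

```python
from collections import Counter
from math import log2

# Toy fragment of the standard genetic code (DNA alphabet): two families
# in which the third position does not affect the amino acid.
CODE = {"GC" + base: "Ala" for base in "TCAG"}
CODE.update({"GG" + base: "Gly" for base in "TCAG"})


def entropy(counts):
    """Shannon entropy in bits of the empirical distribution in `counts`."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values() if n)


def codon_entropy(codons):
    """Entropy when the alphabet is taken to be the codons themselves."""
    return entropy(Counter(codons))


def amino_entropy(codons):
    """Entropy when the alphabet is the amino acids the codons decode to."""
    return entropy(Counter(CODE[c] for c in codons))


# Made-up message, before and after two third-position mutations.
before = ["GCT", "GCT", "GGA", "GGA"]
after = ["GCT", "GCC", "GGA", "GGG"]

for label, msg in (("before", before), ("after", after)):
    print(label, codon_entropy(msg), amino_entropy(msg))
```

With codons as the alphabet, the two mutations raise the estimate from
1 bit to 2 bits per symbol. With amino acids as the alphabet, it stays at
1 bit, because the mutated codons decode to the same residues. Same
sequences, different answer, depending only on the choice of alphabet.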
> As I mentioned above, it is possible for information theory to take
> this type of situation into account and Yockey does give some
> examples of how to do it in his book. But the type of correlations
> that can be accounted for in this way must be statistical in nature;
> information theory still cannot address the meaning or value of
> the message. For example, for the case Yockey mentions above
> one has a set of constraints imposed on the possible messages
> in the ensemble. These constraints reduce the uncertainty and
> thus the information content.
Would you mind briefly going over how Yockey explains it can be
done? (I don't want to bother you for huge quotes, but if you can
give a paragraph summary I'd appreciate it.)
After the discussion, I'm still in the dark as to how Info theory
might apply to biology. I wonder whether one can use it at all at the
genome level (other than making dead obvious claims that the genome
has an alphabet and thus coding characteristics and such). How
about this: does the genome use an optimal code in some sense?
Does it use any error-correcting code (ECC) features? If so, are these optimal in some
sense? I'm not sure how to answer these questions, but they seem
like a place for applying info theory to the genome, since IT is
pretty good at rooting out such optimalities, and one might
reasonably expect them to exist.
The problem, like you keep saying, is that it is probably necessary
to go 'all the way to function' to fool yourself into thinking you
have covered all of the decoding operation. And that is precisely where
IT stops being useful. If two proteins do the same job, and one
is a result of mutated DNA, is this ECC? I'm not sure... I know
there have been lots of looks taken at the genetic code. (I once
attended a horrible talk the point of which was to point out that
the code was 'not linearly separable' into categories like hydrophilic/
hydrophobic, and such. Yawn.)
-Greg