insertion/deletion homologies

R. Joel Duff (Virkotto@intrnet.net)
Fri, 20 Feb 1998 09:32:50 -0600 (CST)

Reflectorites,

There has been some discussion of homology as an argument for the common
ancestry of organisms. I wanted to take another stab at giving some
examples of sequence homology as evidence of common ancestry. Specifically,
I want to look at insertions/deletions and inversions as "evidences" of
past history.

Take the hypothetical DNA sequences of 9 samples (anything from species to
representatives of orders/plesions/classes/divisions) of a portion of an
intron in a ribosomal DNA gene. Also given is a hypothetical pedigree for
these sequences.

1) ACGTATATATA----------------------------------GGGAAACAAAAAAAAAAATTT

Insertion of CCCGGG

2) ACGTATATATA-CCCGGG---------------------------GGGAAACAAAAGAAAAAATTT

Insertion of ATAT

3) ACCTATATATA-CCCGGGATAT-----------------------GGGAAACAAAAGAAAAAATTT

Duplication of CCGGGATAT

4) ACCTATATATA-CCCGGGATATCCGGGATAT---------------GGGAAAAAAAGAAAAATATTT

Mutation of GGG to CCC
______________
5) ACCTATATATA-CCCGGGATATCCCCCATAT---------------GGGAAAAAAAGAAAAATATTT

Duplication of GATATCCCCCATAT
_______________________________________ inverted sequence
6) ACCTATATATA-CCCGGGATATCCCCCATATGATATCCCCCATAT-GGGAAAAAAAGAAAAAAATTT

Inversion of 35 nts

7) ACCTATAT-CCCATATCCCCCATATCATATCCCGGATATCCCGGGTA-AAAAAAAAGAAAAAAATTT

Deletion of 22 nts

8) ACCTATAT-CCCATAT----------------------TCCCGGGTA-AAAAAAAAGAAACAAATTT

Mutation of AT to TA

9) ACCTATAT-CCCATTA----------------------TCCCGGGTA-AAAAAAAAGAAAAATATTT

Of course, the above is written in a progressive order that might be
derived after acquiring the sequences. It also assumes knowledge of which
state is plesiomorphic (ancestral). One could just as well start with #9
and work back to #1. There are also many other possible arrangements. For
example, #4 could be the ancestor, with #3, #2, and #1 all derived
directly from #4 by deletions of different sizes. Either way the message
is the same.
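
The kinds of events used in the series above (insertion, tandem
duplication, inversion, deletion) can be sketched as simple string
operations. This is only an illustration: the helper names and the toy
ancestor are assumptions made up for the example, not the alignment above,
and an inversion is modelled here as a reverse complement of the segment.

```python
# Toy model of indel/inversion events on a DNA string.
# All names and the starting sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def insert(seq, pos, fragment):
    """Insert fragment so that it starts at index pos."""
    return seq[:pos] + fragment + seq[pos:]

def delete(seq, start, length):
    """Remove length bases beginning at index start."""
    return seq[:start] + seq[start + length:]

def duplicate(seq, start, length):
    """Tandem duplication: copy a segment immediately after itself."""
    unit = seq[start:start + length]
    return seq[:start + length] + unit + seq[start + length:]

def invert(seq, start, length):
    """Inversion: replace a segment with its reverse complement."""
    segment = seq[start:start + length]
    flipped = segment.translate(COMPLEMENT)[::-1]
    return seq[:start] + flipped + seq[start + length:]

ancestor = "ACGTATATATAGGGAAAC"
step1 = insert(ancestor, 11, "CCCGGG")   # insertion of CCCGGG
step2 = insert(step1, 17, "ATAT")        # insertion of ATAT
step3 = duplicate(step2, 12, 9)          # duplication of CCGGGATAT
step4 = invert(step3, 11, 10)            # inversion of a 10-nt segment
step5 = delete(step4, 11, 6)             # deletion of 6 nts
```

Each event leaves a recognizable footprint in the string, and it is the
sharing of such footprints between samples that the argument relies on.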

If each of these sequences came from representatives of different families
that are presumably not related by a common ancestor, how then were these
patterns of nucleotide sequences obtained? I find it difficult to accept
that each started with the same sequence and has simply had insertions or
deletions since that time, because to go from point A (such as sample 7)
to point B (sample 3) one seems to need either an inversion followed by
mutations followed by a deletion, or deletion/inversion/mutation, or
deletion/mutation/inversion, etc. That by itself is not so difficult to
imagine, but when the intermediate steps are found in organisms that other
analyses consider to be intermediate in the evolution of these organisms,
it suggests that common ancestry might better explain the data. Further, a
particular insertion or deletion may be found in half the members of a
family but not in the other half, and these two halves of the family might
be considered two subfamilies by other molecular and morphological
analyses. Why would the same organisms that form a monophyletic group
based on morphology also show the same deletion of just a few bases in an
intron? I think it can be argued that there are no functional or
environmental constraints that would make it necessary for one subfamily
to have these bases while the others do not. It certainly seems to suggest
that all the members of one subfamily were derived from a common ancestor
that had an insertion.
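
One way to make the subfamily comparison concrete is to treat each gap in
an alignment as a presence/absence character and ask which taxa share it.
The following is a minimal sketch under assumed data: the taxon names, the
sequences, and the subfamily assignment are all invented for illustration.

```python
def gap_runs(aligned):
    """Return (start, end) spans for each run of '-' in an aligned sequence."""
    runs, start = [], None
    for i, ch in enumerate(aligned):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(aligned)))
    return runs

def indel_characters(alignment):
    """Map each distinct gap span to the set of taxa that carry it."""
    chars = {}
    for taxon, seq in alignment.items():
        for span in gap_runs(seq):
            chars.setdefault(span, set()).add(taxon)
    return chars

# Hypothetical four-genus alignment; subfamily A is defined independently,
# for example from morphology.
alignment = {
    "genus_A1": "ACGTTT---GGCA",
    "genus_A2": "ACGTTT---GGCA",
    "genus_B1": "ACGTTTCCAGGCA",
    "genus_B2": "ACGTTTCCAGGCA",
}
subfamily_A = {"genus_A1", "genus_A2"}

chars = indel_characters(alignment)
for span, taxa in chars.items():
    print(span, sorted(taxa), "matches subfamily A:", taxa == subfamily_A)
```

When the set of taxa sharing a gap coincides with a group already
recognized on other grounds, that is the kind of agreement described above.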

The hypothetical example I gave above is not uncommon, especially in intron
sequences. Another region where this is seen is the Intergenic Spacer (IGS
or NTS) between the Large Subunit (LSU) and Small Subunit (SSU) nuclear
rDNA cistrons. Examples very much like what I presented are found in the
bean family, where some species have small tandem repeats, other species
have the tandem repeats themselves repeated as a group (300+ bp), other
species have large insertions, and in still other species some of the
tandem repeats and insertions are deleted. The pattern of losses and gains
follows very closely the estimated phylogenetic relationships of genera
within this family.

An example of a real data set which can be viewed on the web is that of an
intron in the Nad5 gene of the mitochondrial genome of liverworts. This
intron is found in all mosses and most ferns but is absent in flowering
plants. The sequence alignment, which is found at:

http://www.biologie.uni-ulm.de/bio2/knoop/nad5/n5lebintaln.gif

shows an alignment of liverwort genera. Note that though one may think
that all liverworts are basically alike, they really show as much diversity
(both in morphology and molecules) as any two flowering plants that you
want to think of. Looking through the data, you will find that the first
four share many insertions and deletions, and sometimes they share these
insertions with the last 2 in the data set. These 4 are classically
thought to be closely related (relative to the others in the data set)
based on morphology.

I have just recently acquired sequence from an intron in the 19S (SSU) rDNA
of the mitochondrial genome. I have found this intron in four of the six
mosses I have looked at. It is not found in either the hornworts or
liverworts (the other two divisions/superdivisions/plesions, or whatever
classification system you like), but I have found it in the lycopod
Isoetes. Virtually every data set places the mosses as sister to the
vascular plants and the lycopods as the most basal of those vascular
plants. The four moss sequences are all approximately 1100 bp in length.
Of these, two are very similar in size and sequence (>95% sequence
similarity), sharing many small (1-5 bp) deletions and insertions, yet
these two genera are in different subclasses of the Bryopsida. The other
two sequences have many unique insertions and deletions, as I would
expect, since they are very unusual mosses compared to the rest (Sphagnum
and Takakia, belonging to their own classes). Now the wild thing is that
Isoetes (the vascular plant) has an intron in the exact same position on
the SSU rDNA secondary structure, but it is only 236 bp long. The sequence
can be aligned with the moss sequences very well, except that it includes
huge deletions relative to the moss sequences. So the alignment looks
something like:

Moss 1 ______________________________________________ _____________ ___
Moss 2 _____________ ______________ _________________________ _________
Moss 3 _____________ ______________ _________________________ _________
Isoetes _______ _______ ___ _ ____ _ ___ __

Several interesting things occur, each of which I could have predicted (and
did) before I examined the sequence.
1) The sequences of Isoetes and the mosses would be more dissimilar than
either the mosses are to one another or the Isoetes species are to one
another. This is very true. The primary sequence is about 88% similar
(high enough to be reliably aligned). Among the 3 species of Isoetes I
have looked at, which represent extremes in the genus, the sequence is
virtually identical (99%), which I would expect since this is the case
with nearly all measures of variability within this genus.

2) The places where there are insertions, deletions, or high sequence
variability among the mosses will be completely absent in Isoetes, so it
will be the "conserved" sequences that are maintained. This is exactly
what is seen. The portions of the sequence that Isoetes retains match
sequences that are perfectly conserved in the mosses. The suggestion here
is that these portions of the sequence are structurally important parts of
the transcribed intron (secondary structure) and thus are more tightly
conserved.
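
Both predictions can be checked mechanically on an alignment. The sketch
below is only an illustration under stated assumptions: the short sequences
are invented toy data (not my real sequences), and "similarity" is taken
here to mean percent identity over columns where both sequences have a
base, which is one of several possible definitions.

```python
def percent_identity(a, b):
    """Percent identical bases over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

def conserved_columns(seqs):
    """Columns with no gaps and the same base in every sequence."""
    return {i for i, col in enumerate(zip(*seqs))
            if "-" not in col and len(set(col)) == 1}

def retained_columns(seq):
    """Columns where this sequence has a base rather than a gap."""
    return {i for i, ch in enumerate(seq) if ch != "-"}

# Hypothetical toy alignment: three mosses and one Isoetes sequence.
mosses = ["ACGTTAGGCATTC",
          "ACGTCAGGCA-TC",
          "ACGT-AGGCAGTC"]
isoetes = "ACGT--GGCA--C"

# Prediction 1: compare pairwise identities.
print(percent_identity(mosses[0], mosses[1]))
print(percent_identity(mosses[0], isoetes))

# Prediction 2: everything Isoetes retains lies in moss-conserved columns.
print(retained_columns(isoetes) <= conserved_columns(mosses))
```

The subset test in the last line is the formal version of prediction 2: the
columns Isoetes keeps are a subset of the columns that do not vary among
the mosses.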

This example, I suppose, could be explained by suggesting that God created
each of these plants with a large intron which has since been losing
portions of sequence. What is lost is simply dictated by the functional
constraints. Still, I see (and won't be surprised to see more with further
work) that there are patterns to the losses. That is, if all the members
of one Class have the identical 15 bp deletion, it seems improbable that
all 100 or so samples lost the same 15 bp of DNA independently. On the
flip side, one would have to argue that the members of the other Classes
of mosses had all gained those 15 bp independently.
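
The counting behind this argument can be sketched with Fitch parsimony,
which gives the minimum number of changes a presence/absence character
needs on a given tree. This is a minimal illustration, not an analysis of
real data: the eight taxa, both trees, and the character states are
hypothetical, and trees are written as nested pairs.

```python
def fitch(tree, states):
    """Return (possible_states, min_changes) for a binary nested-tuple tree.

    tree is either a taxon name (str) or a pair (left, right);
    states maps each taxon name to its character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No state in common: one change is needed on this branch pair.
    return left_set | right_set, left_changes + right_changes + 1

# 1 = has the 15 bp deletion, 0 = lacks it (hypothetical taxa).
states = {"a1": 1, "a2": 1, "a3": 1, "a4": 1,
          "b1": 0, "b2": 0, "b3": 0, "b4": 0}

# Tree in which the taxa with the deletion form one group.
grouped = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
# Tree in which they are scattered among the taxa without it.
scattered = ((("a1", "b1"), ("a2", "b2")), (("a3", "b3"), ("a4", "b4")))

print(fitch(grouped, states)[1])    # changes needed if the group is real
print(fitch(scattered, states)[1])  # changes needed if it is not
```

On the grouped tree a single event explains every sample; on the scattered
tree the same data need one event per affected lineage, which is the
"all lost it independently" scenario.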

Hopefully there is some useful information here. I apologize for my
muddled wording at times. If you have made it this far down, I appreciate
your patience.

Regards,

Joel Duff

,-~~-.___.
Joel and Dawn Duff / | ' \ Spell Check?
Carbondale IL 62901 ( ) 0
e-mail: duff@siu.edu \_/-, ,----'
or virkotto@intrnet.net ==== //
/ \-'~; /~~~(O)
* * * * * * / __/~| / | * * *
\\\/// \\\/// =( _____| (_________| \\\///