Molecular genealogy (also called
genetic genealogy) is the
application of the
techniques for characterizing an individual's DNA
to
the task of testing a hypothesis that two or more
individuals share a common ancestor and of estimating how far back in time
that ancestor lived. Supplemented by
traditional genealogical records, this information can be used to define
and identify branches of families that have spread apart and lost contact
and thereby support or disprove the identity of a suspected ancestor.
This is especially useful when there is family tradition and
circumstantial evidence to support a connection but no documentation can
be found.
In a broader scope, a growing number of
surname studies are being
conducted to ascertain whether families with the same surname are of
common or independent origin. Combining the results of these studies
with traditional genealogical information, researchers hope to home in on
geographical areas from which their families may have arisen--thereby
giving themselves new areas in which to explore traditional resources.
Characterizing an Individual's DNA
DNA is a
long, double chain of subunits called nucleotides or bases. Despite
its size and complexity, DNA is built from only four different bases, each
referred to by its first letter: adenine (A), guanine (G), cytosine
(C) and thymine (T). As the two chains line up next to each other to
make a DNA molecule, adenine (A) and thymine (T) pair only with each other as do
guanine (G) and cytosine (C). A DNA molecule is characterized by the
sequence of these base pairs (bps) along the
chain. (Amazingly, it is this sequence that codes for all of an
organism's traits.)
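The pairing rule described above (A with T, G with C) means that one chain fully determines the other. As a minimal sketch, assuming a strand is written as a plain string of the four letters, the partner chain can be derived position by position. The function name is illustrative only.

```python
# Pairing rules from the text: adenine pairs with thymine, guanine with cytosine.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the base that pairs with each position of `strand`, in order."""
    return "".join(PAIRING[base] for base in strand.upper())


print(complementary_strand("GATCGATC"))  # CTAGCTAG
```

Because the second chain carries no independent information, a DNA sequence is normally written out as a single strand.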
Mutations,
together with the shuffling of maternal and paternal genes through sexual
reproduction, ensure that each member of a species (except identical
twins) has a unique DNA sequence. The ideal way to distinguish an
individual from all the other people on Earth would be to describe the
entire sequence of nucleotides in his or her DNA. However, since
each human genome (all the DNA in a person's chromosomes) is made up of
more than 3 billion nucleotide bps, describing an individual's complete
DNA would be far too complicated and expensive to be practical!
Instead,
scientists have looked for sets of nucleotide sequences that are highly
polymorphic--that is, sections of DNA where a variety of different
sequences (called alleles) are found among
individuals in the
same human population. These sets are referred to as
markers and are usually given names that often seem quite
arbitrary (read senseless) to the layperson but that usually reflect some
esoteric coding of scientific data from the lab that defined them.
Only about
five percent of human DNA is actually thought to code for traits.
Most of the rest is made of long, apparently nonfunctional, stretches of
nucleotide bps (sometimes referred to as "junk" DNA). Within these nonfunctional stretches are short, moderately
repetitive base pair sequences. The number of repeats is inherited
and is
easily detected, which makes these sequences ideal identifying markers. The number
of repeating units can occasionally change during evolution and descent.
They are thus useful markers for familial relationships and have been used
in paternity testing, forensic science and in the identification of human
remains.
There are
two types of these repetitive sequences. VNTRs (variable number tandem repeats) are repeated sequences that
typically range from 10 to 80 bps. These occur fairly frequently in
the human genome but there are relatively few different types.
Short tandem repeat (STR) sequences
(sometimes called microsatellites) are much
shorter (2-10 bps) and may be repeated as many as 100 times at a given
location on a chromosome. The human genome contains hundreds of
thousands of these STRs, distributed across all the chromosomes.
STRs represent ideal markers for genetic typing because of their rich
diversity, wide distribution, and polymorphism. As a further
advantage, they are technically somewhat easier to characterize than VNTRs.
Here is a
simplified example. Humans have two sets of 23 chromosomes--one set
from their mother and one set from their father. So, for example, an
individual, Thelma, might inherit a chromosome #17 marker with a short
sequence of four bps repeated eight times from her mother, Ethel, and the same
sequence repeated three times from her father, Art.
To illustrate:
Maternal chromosome #17   GATCGATCGATCGATCGATCGATCGATCGATC
                          CTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG
Paternal chromosome #17   GATCGATCGATC
                          CTAGCTAGCTAG
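Reading a marker amounts to counting how many times the repeat unit occurs back to back. The sketch below does this for the two chromosome #17 sequences of the example, assuming each is available as a plain string. The function name is hypothetical, and real laboratories measure repeat counts by fragment length rather than by string matching.

```python
def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the longest run of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        # Extend the run while another copy of the unit follows immediately.
        while sequence.startswith(unit, start + run * len(unit)):
            run += 1
        best = max(best, run)
    return best


maternal = "GATCGATCGATCGATCGATCGATCGATCGATC"
paternal = "GATCGATCGATC"

# Thelma's result for this marker is the pair of repeat counts.
genotype = (count_tandem_repeats(maternal, "GATC"),
            count_tandem_repeats(paternal, "GATC"))
print(genotype)  # (8, 3)
```

The pair of numbers, one from each parent, is what gets recorded for the marker.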
In real
life, more than one STR must be analyzed to establish a person's identity.
A marker on DNA from a hair found at the scene of a crime may match one marker of a
suspect. However, there will most likely be thousands of unrelated
people with the same pattern for that one marker. Increasing the
number of markers examined increases
the chances of an accurate identification. Matches at three selected STRs give odds of more than 2000 to 1 that the DNA samples are from
the same person. Using nine STRs gives odds of more than 1 billion to 1.
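The rapid growth in confidence comes from multiplying the per-marker match frequencies together. The sketch below illustrates that product rule under two loudly stated assumptions: the markers are inherited independently of one another, and every marker pattern is shared by the same fraction of unrelated people. The value 0.075 is an invented figure, chosen only because it is roughly consistent with the odds quoted above. Real calculations use measured allele frequencies for each marker and population.

```python
# ASSUMPTION: fraction of unrelated people sharing a given single-marker pattern.
# Illustrative value only; not taken from any published frequency table.
ASSUMED_FREQUENCY = 0.075


def match_odds(per_marker_frequency: float, markers: int) -> float:
    """Odds ("1 in N") of a chance match at all markers, assuming independence."""
    random_match_probability = per_marker_frequency ** markers
    return 1 / random_match_probability


for n in (1, 3, 9):
    print(f"{n} markers: about 1 in {match_odds(ASSUMED_FREQUENCY, n):,.0f}")
```

Each added marker divides the pool of possible chance matches again, which is why a modest number of markers is enough to single out one person.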
In 1997,
the FBI announced the selection of 13 STR markers to be used in forensic
investigations. If any two samples of DNA obtained from different
sources (say a crime scene and a suspect) have matching numbers of repeats
at all 13 markers, it is virtually certain they are from the same person.
Conversely, and as important, if the markers do not match, it can be said
with complete confidence that the samples are from two different
individuals.
DNA Testing for Genealogy Differs From DNA Testing to Identify an Individual
Because the
number of repeats at these markers is inherited, it is logical to expect that
individuals descending from a common ancestor would share the same values
for the same markers. Unfortunately for the genealogist (and fortunately
for the forensic scientist), the shuffling of maternal and paternal
chromosomes at every generation means that each individual carries an
assortment of DNA from all of his or her ancestors that guarantees his or
her uniqueness.
At each
generation, the number of ancestors doubles, meaning that each individual
potentially carries DNA from as many as 4096 tenth great-grandparents (and
so on back through time). To further complicate matters, during the
time that chromosomes are being sorted into sperm and egg cells, matching
chromosome pairs come together and, in a process called
crossover, exchange segments in a largely
random way, so that each chromosome itself is a blend of ancestral DNA.
Following that shuffle, the members of the matching
pairs split up and go
into the sperm or egg cell--also in a random way. How
likely is it, then, that any, much less a given set of, DNA markers from one
common ancestor will have been preserved in 100% of third or fifth or
tenth cousins?
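The doubling of ancestors at each generation is simple to work out. The sketch below counts parents as generation 1, grandparents as generation 2 and first great-grandparents as generation 3, so that nth great-grandparents sit n + 2 generations back. It counts positions in the family tree; the number of distinct people can be smaller when cousins have married. The function names are illustrative only.

```python
def generations_back(nth_great: int) -> int:
    """Generations separating a person from their nth great-grandparents."""
    # parents = 1, grandparents = 2, 1st great-grandparents = 3, and so on
    return nth_great + 2


def ancestors_in_generation(generation: int) -> int:
    """Number of ancestral positions in the tree at a given generation."""
    return 2 ** generation


tenth = ancestors_in_generation(generations_back(10))
print(tenth)  # 4096
```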
In truth,
this diaspora of DNA over the generations does limit the potential of
chromosomal DNA
testing for genealogy, but with
one powerful exception.
In human
males, the
members of 22 pairs of chromosomes look similar when viewed under a
microscope. However, the twenty-third pair is mismatched, with two unlike
chromosomes, called X and Y. In the cells of a female, both members of
chromosome pair #23 are X chromosomes. The X and Y chromosomes are called the
sex chromosomes,
because they differ between the sexes and
because they carry the genes that determine the sex of the individual.
The other 22 pairs of chromosomes are called autosomal chromosomes
or simply
autosomes.
Whereas the DNA of autosomal
chromosome pairs is shuffled and swapped repeatedly through the
generations, the Y chromosome swaps less than five percent of its DNA with
its partner X. Furthermore, these small exchangeable parts are
limited to known locations on the chromosome (called pseudoautosomal
regions).
It is the Y chromosome that is of major
interest to the genealogist. A large number of STR markers
have been described for the Y chromosome that show great variability within populations but
virtually no variability between fathers and sons. Handed down virtually unchanged
from father to son, the Y chromosome markers become a signature or
fingerprint for the surname, which is passed down in the same way in many
cultures.
As such, the Y chromosome is an ideal tool for verifying
paternal lineages,
as male relatives who have
an uninterrupted male-to-male link
between them will share
the same, or very similar, Y-chromosome signatures.
Y chromosome testing
is particularly useful when a connection between different branches of a
family, perhaps with the same or similar surnames, is suspected but cannot
be proven from written records.
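Comparing two men's Y chromosome signatures comes down to comparing their repeat counts marker by marker. The sketch below is a minimal illustration, not any testing company's method. The marker names follow the common DYS naming pattern for Y-chromosome STRs, but every repeat count is invented for the example, and no particular number of differences is claimed to prove or rule out a relationship.

```python
def marker_differences(haplotype_a: dict, haplotype_b: dict) -> dict:
    """Return the markers tested in both men whose repeat counts differ."""
    shared = haplotype_a.keys() & haplotype_b.keys()
    return {
        marker: (haplotype_a[marker], haplotype_b[marker])
        for marker in sorted(shared)
        if haplotype_a[marker] != haplotype_b[marker]
    }


# HYPOTHETICAL repeat counts, for illustration only.
cousin_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
cousin_2 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10}
unrelated = {"DYS393": 14, "DYS390": 22, "DYS19": 15, "DYS391": 10}

print(marker_differences(cousin_1, cousin_2))        # {'DYS391': (11, 10)}
print(len(marker_differences(cousin_1, unrelated)))  # 4
```

Two men linked through an unbroken male line would be expected to differ at few or none of the markers, while unrelated men typically differ at many.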