

 I Weave Together Information

Molecular Genealogy

Molecular genealogy (also called genetic genealogy) is the application of DNA characterization techniques to two tasks: testing the hypothesis that two or more individuals share a common ancestor, and estimating how far back in time that ancestor lived. Supplemented by traditional genealogical records, this information can be used to define and identify branches of families that have spread apart and lost contact, and thereby to support or disprove the identity of a suspected ancestor. This is especially useful when family tradition and circumstantial evidence support a connection but no documentation can be found.

On a broader scale, a growing number of surname studies are being conducted to ascertain whether families with the same surname are of common or independent origin. By combining the results of these studies with traditional genealogical information, researchers hope to home in on the geographical areas from which their families may have arisen--thereby giving themselves new areas in which to explore traditional resources.

 Characterizing an Individual's DNA

DNA is a long, double chain of subunits called nucleotides or bases.  In spite of its size and complexity, there are only four different bases, each referred to by its first letter:  adenine (A), guanine (G), cytosine (C) and thymine (T).  As the two chains line up next to each other to make a DNA molecule, adenine (A) and thymine (T) pair only with each other as do guanine (G) and cytosine (C).  A  DNA molecule is characterized by the sequence of these base pairs (bps) along the chain.  (Amazingly, it is this sequence that codes for all of an organism's traits.) 
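The pairing rule just described (A with T, G with C) is simple enough to express in a few lines. The sketch below is only an illustration; the function name `complement` and the sample sequence are made up for this page, and it pairs the bases position by position without modeling anything else about the molecule.

```python
# Pairing rule from the text: A pairs only with T, G pairs only with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return, position by position, the base that pairs with each base of `strand`."""
    return "".join(PAIRS[base] for base in strand.upper())


if __name__ == "__main__":
    one_chain = "GATC"  # hypothetical four-base stretch
    print(one_chain, "pairs with", complement(one_chain))
```

Because each base has exactly one partner, knowing the sequence of one chain is enough to write down the other.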

Mutations, together with the shuffling of maternal and paternal genes through sexual reproduction, ensure that each member of a species (except identical twins) has a unique DNA sequence. The ideal way to distinguish an individual from all the other people on Earth would be to describe the entire sequence of nucleotides in his or her DNA. However, since each human genome (all the DNA in a person's chromosomes) is made up of more than 3 billion nucleotide bps, describing an individual's complete DNA would be far too complicated and expensive to be practical!

Instead, scientists have looked for sets of nucleotide sequences that are highly polymorphic--that is, sections of DNA where a variety of different sequences (called alleles) are found among individuals in the same human population. These sets are referred to as markers. They are usually given names that seem quite arbitrary (read: senseless) to the layperson but that usually reflect some esoteric coding of scientific data by the lab that defined them.

Only about five percent of human DNA is actually thought to code for traits. Most of the rest is made up of long, apparently nonfunctional stretches of nucleotide bps (sometimes referred to as "junk" DNA). Within these nonfunctional stretches are short, moderately repetitive base pair sequences. The number of repeats is inherited and is easily detected, which makes these sequences ideal identifying markers. Because the number of repeating units can occasionally change from one generation to the next, they are useful markers for familial relationships and have been used in paternity testing, forensic science and the identification of human remains.

There are two types of these repetitive sequences.  VNTRs (variable number tandem repeats) are repeated sequences that typically range from 10 to 80 bps.  These occur fairly frequently in the human genome but there are relatively few different types. 

Short tandem repeat (STR) sequences (sometimes called microsatellites) are much shorter (2-10 bps) and may be repeated as many as 100 times at a given location on a chromosome. The human genome contains hundreds of thousands of these STRs, distributed across all the chromosomes. STRs are ideal markers for genetic typing because of their rich diversity, wide distribution, and polymorphism. As a further advantage, they are technically somewhat easier to characterize than VNTRs.

Here is a simplified example.  Humans have two sets of 23 chromosomes--one set from their mother and one set from their father.  So, for example, an individual, Thelma, might inherit a chromosome #17 marker with a short sequence of four bps repeated eight times from her mother, Ethel, and the same sequence repeated three times from her father, Art.  

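To illustrate, taking GATC as the repeated four-bp sequence:

Maternal chromosome #17 GATCGATCGATCGATCGATCGATCGATCGATC

Paternal chromosome #17 GATCGATCGATC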
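Counting the repeats in Thelma's example can be sketched in code. This is a toy illustration only: the function name `count_tandem_repeats` is invented here, the sequences are the simplified ones from the example, and laboratories do not count repeats by reading raw text in this way.

```python
def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the longest uninterrupted run of `unit` in `sequence`, in repeat units."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Walk forward one repeat unit at a time while the unit keeps matching.
        while sequence.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


if __name__ == "__main__":
    maternal = "GATC" * 8  # eight repeats, inherited from Ethel
    paternal = "GATC" * 3  # three repeats, inherited from Art
    genotype = (count_tandem_repeats(maternal, "GATC"),
                count_tandem_repeats(paternal, "GATC"))
    print("Thelma's values for this marker:", genotype)
```

The pair of repeat counts, one from each parent, is what gets recorded for the marker.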

In real life, more than one STR must be analyzed to establish a person's identity. A marker on DNA from a hair found at the scene of a crime may match one marker of a suspect. However, there will most likely be thousands of unrelated people with the same pattern for that one marker. Increasing the number of markers examined increases the chances of an accurate identification. Matches in three selected STRs give more than a 2000 to 1 probability that the DNA samples are from the same person. Using nine STRs gives more than a 1 billion to 1 probability.
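The reason each added marker helps so much is that the chances multiply. As a rough sketch, suppose every marker pattern is shared by the same fraction of unrelated people and that the markers are inherited independently of one another. Both are simplifying assumptions, and the frequency used below (0.075) is a hypothetical value chosen only for illustration, not a measured one.

```python
from math import prod


def random_match_odds(pattern_frequencies) -> float:
    """Return N such that about 1 in N unrelated people would share every pattern.

    Assumes the markers are independent, so their frequencies can be multiplied.
    """
    return 1 / prod(pattern_frequencies)


if __name__ == "__main__":
    HYPOTHETICAL_FREQUENCY = 0.075  # assumed share of people with each pattern
    for n_markers in (1, 3, 9):
        odds = random_match_odds([HYPOTHETICAL_FREQUENCY] * n_markers)
        print(f"{n_markers} marker(s): about 1 in {odds:,.0f}")
```

With real data each marker has its own set of frequencies, but the multiplication works the same way.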

In 1997, the FBI announced the selection of 13 STR markers to be used in forensic investigations. If any two samples of DNA obtained from different sources (say a crime scene and a suspect) have matching numbers of repeats at all 13 markers, it is virtually certain they are from the same person. Conversely, and just as important, if the markers do not match, it can be said with complete confidence that the samples are from two different individuals.

DNA Testing for Genealogy Differs From DNA Testing to Identify an Individual

Because the number of repeats of these markers is inherited, it is logical to expect that individuals descending from a common ancestor would share the same values for the same markers. Unfortunately for the genealogist (and fortunately for the forensic scientist), the shuffling of maternal and paternal chromosomes at every generation means that each individual carries an assortment of DNA from all of his or her ancestors, and that assortment guarantees his or her uniqueness.

At each generation, the number of ancestors doubles, meaning that each individual potentially carries DNA from as many as 4096 tenth great grandparents (and so on back through time). To further complicate matters, while chromosomes are being sorted into sperm and egg cells, matching chromosome pairs come together and, in a process called crossover, exchange segments in a more or less random way, so that each chromosome itself is a blend of ancestral DNA. Following that shuffle, the members of the matching pairs split up and go into the sperm or egg cell--also in a totally random way. How likely is it, then, that any DNA markers from one common ancestor, much less a given set of them, will have been preserved in 100% of third or fifth or tenth cousins?
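The doubling is easy to tabulate. The sketch below counts parents as one generation back, grandparents as two, and great grandparents as three, so "nth great grandparents" are n + 2 generations back. The function names are invented for this illustration, and the counts are slots in the family tree rather than distinct people, since the same ancestor can fill more than one slot.

```python
def ancestors_in_generation(generations_back: int) -> int:
    """Number of ancestor slots a given number of generations back (parents = 1)."""
    return 2 ** generations_back


def nth_great_grandparents(n: int) -> int:
    """Number of nth great grandparents; n = 1 means great grandparents."""
    return ancestors_in_generation(n + 2)


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"{n}th great grandparents: {nth_great_grandparents(n)}")
```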

In truth, this diaspora of DNA over the generations does limit the potential of chromosomal DNA testing for genealogy, but with one powerful exception.

In humans, the members of 22 pairs of chromosomes look similar when viewed under a microscope. In males, however, the twenty-third pair is mismatched, with two unlike chromosomes, called X and Y. In the cells of a female, both members of chromosome pair #23 are X chromosomes. The X and Y chromosomes are called the sex chromosomes, because they differ between the sexes and because they carry the genes that determine the sex of the individual. The other 22 pairs are called autosomal chromosomes or simply autosomes.

Whereas the DNA of autosomal chromosome pairs is shuffled and swapped repeatedly through the generations, the Y chromosome swaps less than five percent of its DNA with its partner X. Furthermore, these small exchangeable parts are limited to known locations on the chromosome (called pseudoautosomal regions).

It is the Y chromosome that is of major interest to the genealogist. A large number of STR markers have been described for the Y chromosome that show great variability within populations but virtually no variability between fathers and sons. Handed down essentially unchanged from father to son, the Y chromosome markers become a signature or fingerprint for the surname, which is passed down in the same way in many cultures. As such, the Y chromosome is an ideal tool for verifying paternal lineages, since male relatives who have an uninterrupted male-male link between them will share the same, or very similar, Y-chromosome signatures.
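Comparing two Y-chromosome signatures amounts to lining up the repeat counts marker by marker. The sketch below uses one simple convention, adding up the differences in repeat number across the markers both men were tested on. DYS19, DYS390 and DYS391 are names of real Y-chromosome STR markers, but the repeat values and the two "cousins" are hypothetical, as is the function name `genetic_distance`.

```python
def genetic_distance(signature_a: dict, signature_b: dict) -> int:
    """Sum of repeat-count differences over the markers present in both signatures."""
    shared_markers = signature_a.keys() & signature_b.keys()
    return sum(abs(signature_a[m] - signature_b[m]) for m in shared_markers)


if __name__ == "__main__":
    # Hypothetical repeat counts for two men who suspect a shared paternal line.
    cousin_1 = {"DYS19": 14, "DYS390": 24, "DYS391": 11}
    cousin_2 = {"DYS19": 14, "DYS390": 25, "DYS391": 11}
    distance = genetic_distance(cousin_1, cousin_2)
    print("Difference in repeats across shared markers:", distance)
```

A distance of zero means identical signatures on the markers tested; a small distance is what "very similar" looks like in practice.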

Y chromosome testing  is particularly useful when a connection between different branches of a family, perhaps with the same or similar surnames, is suspected but cannot be proven from written records. 

Figure: The Path of Y Chromosome DNA Through Four Generations

Legend:
Males sharing a Y chromosome via an uninterrupted male-male link with a common ancestor
Males with a Y chromosome from outside the paternal line
Females, who have no Y chromosome

Barring a rare mutation or a non-paternity event (an adoption, a son taking his stepfather's name, or a "maternal indiscretion"), the males represented in the chart by boxes filled with dark red (those connected to the common ancestor through an uninterrupted male line) will have identical values for all Y chromosome markers. A study of nearly 5000 father-son pairs found only 14 STR mutations between the two generations. Only one of those was a two-step change; the other 13 were a change of a single repeat at one marker.

Kayser, Manfred, Lutz Roewer, Minttu Hedman, Lotte Henke, Jurgen Henke, Silke Brauer, Carmen Kruger, Michael Krawczak, Marion Nagy, Tadeusz Dobosz, Reinhard Szibor, Peter de Knijff, Mark Stoneking, and Antti Sajantila, "Characteristics and frequency of germline mutations at microsatellite loci from the human Y chromosome, as revealed by direct observation in father/son pairs", Am. J. Hum. Genet. 66: 1580-1588, 2000
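These figures suggest a rough mutation rate. The sketch below reads them as about 14 changes in about 5000 father-to-son marker transmissions, which is an assumption about how the study counted, and it further assumes that every marker mutates at the same rate, independently of the others. Under those assumptions it estimates how likely a set of markers is to pass down a male line with no change at all. The function name and the example of 12 markers over 10 generations are hypothetical.

```python
MUTATIONS_OBSERVED = 14
TRANSMISSIONS_OBSERVED = 5000  # "nearly 5000", rounded here (an assumption)

# Estimated chance that one marker changes in one father-to-son transmission.
RATE = MUTATIONS_OBSERVED / TRANSMISSIONS_OBSERVED


def chance_unchanged(markers: int, generations: int, rate: float = RATE) -> float:
    """Chance that none of `markers` changes over `generations` transmissions."""
    return (1 - rate) ** (markers * generations)


if __name__ == "__main__":
    print(f"Estimated rate per marker per generation: {RATE:.4f}")
    print(f"12 markers unchanged over 10 generations: {chance_unchanged(12, 10):.2f}")
```

The further back the common ancestor, the more room there is for an occasional one-repeat difference, which is why close relatives are expected to show the same or very similar signatures rather than always identical ones.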



This web is lovingly dedicated to the memory of
Mr. James Dorsey
who so graciously and enthusiastically
donated his DNA to solve our family mystery. 

Jim Dorsey
2/12/1930 — 4-30-2002

 ©Copyright 2002-2009
Not for profit, educational website