(version 26 Feb 2007)
This material is copyrighted and MAY NOT be used for commercial purposes
There are two outcomes when we use DNA to compare a crime scene sample and a suspect: Exclusion and Failure to Exclude.
With an Exclusion, we are done, as the suspect did not contribute that crime sample.
The fun begins when the DNA samples from the crime scene and from our suspect match or, more correctly, when we fail to exclude the suspect.
In the earlier days of DNA testing (RFLP markers), there was concern as to whether two bands actually "matched", and how one computes the probability of a random person also producing a match.
The National Research Council (NRC) was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. It produces reports on a number of issues where science and public policy overlap, such as food safety, and it was asked to produce a document on DNA evidence.
In 1992, the NRC issued DNA Technology in Forensic Science. You can read this online.
However, this report was very controversial, with many feeling that the proposed methods were overly conservative and that population genetics was not well represented.
As a result, coupled with a change in technology from RFLP to PCR-based systems, a second report (usually called NRC II) was issued in 1996.
National Research Council (NRC) (1996). The Evaluation of Forensic DNA Evidence. National Academy Press, Washington DC. NRC II is also online.
The methods for computing match probabilities suggested by NRC II are now the common standard in DNA court cases around the country.
At first blush, computing match probabilities would seem easy.
For example, suppose we have two marker loci, where the crime sample has an 11,12 at the first marker (call it M1) and a 13,13 at the second marker (call it M2).
Note (lecture 5) that the numbers refer to the alleles, so this sample is an 11,12 heterozygote at marker 1 and a 13 homozygote at marker 2.
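Thus, from probability, your initial calculation of the match probability would be
Prob(match) = Prob(11,12 at M1) * Prob(13,13 at M2) = [2*freq(11)*freq(12)] * [freq(13)^2]
So that if the frequency of allele 11 is 0.3, the frequency of allele 12 is 0.1, and the frequency of allele 13 is 0.1, the match probability is just
2*0.3*0.1 * 0.1^2 = 0.06 * 0.01 = 0.0006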
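We have made two key assumptions to get this simple result:
1. Within each marker, genotype frequencies follow from allele frequencies (Hardy-Weinberg).
2. Genotypes at different markers are independent, so their probabilities can be multiplied.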
This is often called the product rule.
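The product-rule calculation for the two-marker example can be sketched in a few lines of Python. This is only an illustration under the two assumptions just named; the function name `genotype_freq` and the dictionary layout are invented here and are not part of the lecture or of NRC II.

```python
def genotype_freq(p, q=None):
    """Genotype frequency from allele frequencies, assuming Hardy-Weinberg.

    One argument: homozygote, p^2. Two arguments: heterozygote, 2*p*q.
    """
    if q is None:
        return p * p
    return 2 * p * q


# Allele frequencies from the example in the text.
allele_freq = {"11": 0.3, "12": 0.1, "13": 0.1}

# Marker M1 is an 11,12 heterozygote; marker M2 is a 13,13 homozygote.
m1 = genotype_freq(allele_freq["11"], allele_freq["12"])
m2 = genotype_freq(allele_freq["13"])

# Product rule: multiply across markers (assumes the markers are independent).
match_probability = m1 * m2

print(f"M1 genotype frequency: {m1:.4f}")
print(f"M2 genotype frequency: {m2:.4f}")
print(f"Match probability:     {match_probability:.6f}")
```

Each additional marker simply contributes one more factor to the product, which is why the result shrinks so quickly as markers are added.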
Where can we go wrong?
Population genetics is the field that deals with how genes behave in populations. The Hardy-Weinberg law, which dates from the early 1900s, states that under certain circumstances,
Freq(AA) = freq(A)^2
Freq(Aa) = 2*freq(A)*freq(a)
In other words, it allows us to relate genotype frequencies to allele frequencies.
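As a rough illustration of going from allele frequencies to genotype frequencies, the sketch below builds the full Hardy-Weinberg genotype table for one marker. The helper name `hardy_weinberg` and the example frequencies (0.9 and 0.1) are assumptions made for this illustration only.

```python
from itertools import combinations_with_replacement


def hardy_weinberg(allele_freqs):
    """Return {(allele1, allele2): frequency} for every unordered genotype.

    Homozygotes get p^2 and heterozygotes get 2*p*q, as in the
    Hardy-Weinberg law.
    """
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        genotypes[(a, b)] = p * p if a == b else 2 * p * q
    return genotypes


# Hypothetical two-allele marker.
freqs = {"A": 0.9, "a": 0.1}
table = hardy_weinberg(freqs)

for genotype, freq in table.items():
    print(genotype, round(freq, 4))

# The genotype frequencies account for the whole population.
print("total:", round(sum(table.values()), 4))
```

The same function works unchanged for markers with many alleles, which is the usual situation for forensic markers.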
Two critical assumptions (for our purposes) are random mating and the absence of hidden subpopulations.
Departures from random mating typically result in homozygotes being more common and heterozygotes less common.
If theta measures the departure from random mating, then
freq(AA) = freq(A)^2 + theta*freq(A)*(1-freq(A))
Suppose freq(A) = 0.1, theta = 0.03
freq(A)^2 = 0.1^2 = 0.01
freq(A)^2 + theta*freq(A)*(1-freq(A)) = 0.01 + 0.03*0.1*0.9 = 0.0127
Even this small a value of theta makes the homozygote about 27% more common, a 1.27-fold difference. Compounded over 13 markers, each homozygous for an allele at this frequency, this translates into 1.27^13, or roughly a 22-fold difference.
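The arithmetic of this example can be checked with a short Python sketch. It uses the homozygote formula with theta, applied with freq(A) = 0.1 and theta = 0.03. Compounding over 13 markers assumes every marker is homozygous for an allele at the same frequency, which is a simplification made here for illustration.

```python
def homozygote_freq(p, theta=0.0):
    """Homozygote frequency p^2 + theta*p*(1-p); theta = 0 gives Hardy-Weinberg."""
    return p * p + theta * p * (1 - p)


p = 0.1          # allele frequency from the text
theta = 0.03     # departure from random mating, from the text
n_markers = 13   # number of markers, from the text

hw = homozygote_freq(p)                # plain Hardy-Weinberg value
corrected = homozygote_freq(p, theta)  # value allowing for theta
ratio = corrected / hw                 # per-marker fold difference
compounded = ratio ** n_markers        # assumes the same ratio at every marker

print(f"Hardy-Weinberg homozygote frequency: {hw:.4f}")
print(f"With theta = {theta}:                 {corrected:.4f}")
print(f"Per-marker ratio:                    {ratio:.3f}")
print(f"Compounded over {n_markers} markers:          {compounded:.1f}")
```

Changing `p` shows that the correction matters most for rare alleles, since theta*p*(1-p) is large relative to p^2 when p is small.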
We can also see departures from Hardy-Weinberg if our "population" is really two (or more) subpopulations and we do not account for this structure.
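A small numerical sketch shows how hidden subpopulations produce an excess of homozygotes even when each subpopulation mates at random. The two allele frequencies (0.05 and 0.15) and the equal subpopulation sizes are hypothetical values chosen for this illustration, as are the function names.

```python
def pooled_allele_freq(sub_freqs, weights):
    """Allele frequency in the pooled sample (weighted average)."""
    return sum(w * p for p, w in zip(sub_freqs, weights))


def pooled_homozygote_freq(sub_freqs, weights):
    """Homozygote frequency in the pooled sample when each
    subpopulation is itself in Hardy-Weinberg proportions."""
    return sum(w * p * p for p, w in zip(sub_freqs, weights))


sub_freqs = [0.05, 0.15]   # hypothetical freq(A) in each subpopulation
weights = [0.5, 0.5]       # hypothetical equal subpopulation sizes

p_bar = pooled_allele_freq(sub_freqs, weights)
expected = p_bar ** 2                                  # naive Hardy-Weinberg prediction
observed = pooled_homozygote_freq(sub_freqs, weights)  # what the pooled sample shows
excess = observed - expected
implied_theta = excess / (p_bar * (1 - p_bar))         # theta that would explain the excess

print(f"pooled freq(A):        {p_bar:.3f}")
print(f"expected freq(AA):     {expected:.4f}")
print(f"observed freq(AA):     {observed:.4f}")
print(f"implied theta:         {implied_theta:.4f}")
```

The excess equals the variance of the allele frequency across subpopulations, so the more the subpopulations differ, the larger the departure from Hardy-Weinberg in the pooled data.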
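Two critical assumptions (for our purposes) when multiplying across markers are that the markers are unlinked and that the population has no hidden structure.
When two (or more) markers reside on the same chromosome, they tend to be inherited together rather than independently.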
This is typically not a problem for autosomal markers, as these are chosen to reside on different chromosomes.
This will be an issue when we deal with Y-chromosomal markers.
Population structure, admixture (matings between different populations), and other historical features of recent population structure can create associations even between unlinked loci.
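For example, suppose everyone in population 1 is AA at the first marker and BB at the second, while no one in population 2 carries either genotype. Then knowing that the genotype is AA at the first marker tells us the individual is from population 1, and hence the second marker has to be BB.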
This statistical association is often called (somewhat misleadingly) linkage disequilibrium or LD, although no linkage is required in this example. It has also been called gametic phase disequilibrium, so you see why the term LD is used.
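To make the idea concrete, the sketch below pools two populations that each have no association between two unlinked markers and measures the association in the mixture. It uses the standard disequilibrium coefficient D = freq(AB) - freq(A)*freq(B), which the text does not name, and it works with allele (gamete) frequencies rather than genotypes for simplicity. All population values are hypothetical.

```python
def pooled_ld(pops):
    """Disequilibrium D in a mixture of populations.

    pops is a list of (weight, freq_A, freq_B) tuples. Within each
    population the two markers are assumed independent, so the
    frequency of the A-B combination there is freq_A * freq_B.
    """
    p_a = sum(w * a for w, a, b in pops)
    p_b = sum(w * b for w, a, b in pops)
    p_ab = sum(w * a * b for w, a, b in pops)
    return p_ab - p_a * p_b


# Extreme case: population 1 is fixed for A and B, population 2 has neither.
extreme = [(0.5, 1.0, 1.0), (0.5, 0.0, 0.0)]

# Milder case: the populations differ only modestly at both markers.
mild = [(0.5, 0.2, 0.2), (0.5, 0.1, 0.1)]

# A single population with independent markers shows no association.
single = [(1.0, 0.3, 0.4)]

print("D, extreme mixture:", round(pooled_ld(extreme), 4))
print("D, mild mixture:   ", round(pooled_ld(mild), 4))
print("D, one population: ", round(pooled_ld(single), 4))
```

In every case D arises purely from mixing, with no physical linkage between the markers, which is why the name "linkage disequilibrium" is somewhat misleading.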
While this is an extreme example, even knowing a small amount of information (say a probability now becoming 0.15 instead of 0.10 by knowing one of the markers) can have a huge impact over multiple loci.
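How quickly a small per-marker shift compounds can be seen with a short calculation. The sketch assumes, purely for illustration, that the same shift (0.15 instead of 0.10) applies at every marker; the function name `fold_change` and the choice of marker counts are made up for this example.

```python
def fold_change(independent, conditional, n_loci):
    """Ratio between the match probability using conditional genotype
    probabilities and the one using the independence assumption,
    when the same shift applies at each of n_loci markers."""
    return (conditional / independent) ** n_loci


independent = 0.10   # probability assumed under independence (from the text)
conditional = 0.15   # probability after knowing another marker (from the text)

for n in (1, 5, 13):
    ratio = fold_change(independent, conditional, n)
    print(f"{n:2d} markers: match probability off by a factor of {ratio:.1f}")
```

A per-marker error that looks modest on its own grows geometrically with the number of markers, so it cannot be ignored in a multi-marker profile.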
Recommendation 4.1.
Recommendation 4.2.
Recommendation 4.3.
Recommendation 4.4.