Lectures 12-13: Match Probabilities: Population Genetics and the NRC II

(version 7 Jan 2008)

This material is copyrighted and MAY NOT be used for commercial purposes


Computing Match Probabilities

There are two outcomes when we use DNA to compare a crime scene sample and a suspect: Exclusion and Failure to Exclude.

With an Exclusion, we are done, as the suspect did not contribute that crime sample.

The fun begins when we have a failure to exclude, namely when the DNA profiles from the crime scene sample and from our suspect match (or, more correctly, when we fail to exclude the suspect).
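The two outcomes can be sketched as a simple comparison in Python. This is a minimal illustration, assuming single-source samples, unambiguous genotype calls, and that both profiles were typed at the same markers; the function name `compare_profiles` and the dictionary layout are hypothetical choices for this example. Real casework also has to handle mixtures, degraded samples, and inconclusive results, which this sketch ignores.

```python
# Hypothetical representation: a profile maps a marker name to its genotype,
# stored as a tuple of two allele numbers, e.g. {"M1": (11, 12)}.
Profile = dict[str, tuple[int, int]]


def compare_profiles(crime: Profile, suspect: Profile) -> str:
    """Return "exclusion" or "failure to exclude" for two single-source profiles.

    Assumes both profiles were typed at the same markers and that genotype
    calls are unambiguous (no mixtures, dropout, or inconclusive markers).
    """
    if crime.keys() != suspect.keys():
        raise ValueError("profiles must be typed at the same markers")
    for marker in crime:
        # The order of the two alleles within a genotype does not matter.
        if sorted(crime[marker]) != sorted(suspect[marker]):
            return "exclusion"  # one mismatching marker is enough to exclude
    return "failure to exclude"  # genotypes agree at every marker typed


crime_sample = {"M1": (11, 12), "M2": (13, 13)}
print(compare_profiles(crime_sample, {"M1": (12, 11), "M2": (13, 13)}))  # failure to exclude
print(compare_profiles(crime_sample, {"M1": (11, 12), "M2": (13, 14)}))  # exclusion
```

Note that a failure to exclude says nothing yet about how strong the evidence is; that is what the match probability is for.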

In the earlier days of DNA testing (RFLP markers), there was concern both over whether two bands actually "matched" and over how to compute the probability that a random person would also produce a match.

The National Research Council, or NRC, was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. It produces reports on a number of issues where science and public policy overlap, such as food safety, and it was asked to produce a document on DNA evidence.

In 1992, the NRC issued DNA Technology in Forensic Science. You can read this online.

However, this report was very controversial, with many feeling that the proposed methods were overly conservative and that population genetics was not well represented.

As a result, and coupled with a shift in technology from RFLP-based to PCR-based systems, a second report (usually called NRC II) was issued in 1996.

National Research Council (NRC) (1996). The Evaluation of Forensic DNA Evidence. National Academy Press, Washington DC. NRC II is also online.

The methods for computing match probabilities suggested by the NRC II are now the common standard in DNA court cases around the country.

Computing Match Probabilities: Why all of the Fuss?

At first blush, computing match probabilities would seem easy.

For example, suppose we have two marker loci, where the crime sample has an 11,12 at the first marker (call it M1) and a 13,13 at the second marker (call it M2).

Recall (Lecture 5) that the numbers refer to the alleles, so this sample is an 11,12 heterozygote at marker one and a 13 homozygote at marker two.

Thus, from the basic rules of probability, your initial calculation of the match probability would be

$$\Pr(\text{Match}) = \left[\, 2 \Pr(11 \text{ at } M1) \Pr(12 \text{ at } M1) \,\right] \cdot \Pr(13 \text{ at } M2)^2$$

So if the frequency of allele 11 is 0.3, the frequency of allele 12 is 0.1, and the frequency of allele 13 is 0.1, the match probability is just

$$\Pr(\text{Match}) = (2 \cdot 0.3 \cdot 0.1) \cdot 0.1^2 = 0.0006$$

We have made two key assumptions to get this simple result:

  1. Hardy-Weinberg: $\text{freq}(Aa) = 2\,\text{freq}(A)\,\text{freq}(a)$ and $\text{freq}(AA) = \text{freq}(A)^2$

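  2. Independence: The events at marker one have no impact on the events at marker two, so that $\Pr(\text{genotype at } M1 \text{ and genotype at } M2) = \Pr(\text{genotype at } M1) \cdot \Pr(\text{genotype at } M2)$.

    This is often called the product rule.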
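Together, the two assumptions give a simple recipe: use Hardy-Weinberg at each marker to turn allele frequencies into a genotype frequency, then multiply across markers (the product rule). The following is a minimal Python sketch, assuming Hardy-Weinberg holds at every marker and the markers are independent; the names `genotype_frequency` and `profile_frequency` and the input layout are hypothetical. With the allele frequencies from the example above, it reproduces the 0.0006 match probability.

```python
import math


def genotype_frequency(a: int, b: int, freqs: dict[int, float]) -> float:
    """Hardy-Weinberg genotype frequency at a single marker.

    Homozygote (a == b): freq(A)^2.  Heterozygote (a != b): 2 * freq(A) * freq(a).
    """
    if a == b:
        return freqs[a] ** 2
    return 2 * freqs[a] * freqs[b]


def profile_frequency(profile: dict[str, tuple[int, int]],
                      allele_freqs: dict[str, dict[int, float]]) -> float:
    """Product rule: multiply the per-marker genotype frequencies.

    Only valid if Hardy-Weinberg holds at each marker and the markers
    are independent of one another.
    """
    return math.prod(
        genotype_frequency(a, b, allele_freqs[marker])
        for marker, (a, b) in profile.items()
    )


# Worked example from the text: 11,12 at M1 and 13,13 at M2.
allele_freqs = {"M1": {11: 0.3, 12: 0.1}, "M2": {13: 0.1}}
p = profile_frequency({"M1": (11, 12), "M2": (13, 13)}, allele_freqs)
print(round(p, 10))  # 0.0006
```

Each assumption maps to one function, which makes it easy to see which piece has to change when Hardy-Weinberg or independence fails.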

Where can we go wrong?

Departures from Hardy-Weinberg

Population genetics is the field that deals with how genes behave in populations. The Hardy-Weinberg law, dating from the early 1900s, states that under certain circumstances,

$$\text{freq}(AA) = \text{freq}(A)^2$$

$$\text{freq}(Aa) = 2\,\text{freq}(A)\,\text{freq}(a)$$

In other words, it allows us to relate genotype frequencies to allele frequencies.
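Because Hardy-Weinberg ties every genotype frequency to the allele frequencies, the full set of expected genotype frequencies at a marker can be generated from the allele frequencies alone. A minimal Python sketch, assuming the allele frequencies sum to one; the function name `hwe_genotype_freqs` and the three-allele frequencies in the usage lines are made up for illustration.

```python
from itertools import combinations_with_replacement


def hwe_genotype_freqs(allele_freqs: dict[int, float]) -> dict[tuple[int, int], float]:
    """Expected genotype frequencies under Hardy-Weinberg.

    freq(AA) = freq(A)^2 and freq(Aa) = 2 * freq(A) * freq(a).
    Each unordered genotype appears once, keyed as (smaller allele, larger allele).
    """
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        genotypes[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return genotypes


# Hypothetical marker with three alleles.
g = hwe_genotype_freqs({11: 0.3, 12: 0.1, 13: 0.6})
print(g[(11, 12)])       # heterozygote: 2 * 0.3 * 0.1
print(sum(g.values()))   # approximately 1, since (0.3 + 0.1 + 0.6)^2 = 1
```

The genotype frequencies sum to the square of the allele-frequency total, which is one way to see that Hardy-Weinberg yields a complete, consistent genotype distribution.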

Two critical assumptions (for our purposes) concern random mating and population structure.

Random mating

Population Structure

Lack of Independence

Two critical assumptions (for our purposes) concern genetic linkage and statistical associations among unlinked markers.

Genetic Linkage

Statistical Associations Among Unlinked Markers

The NRC II Recommendations

Recommendation 4.1.

Recommendation 4.2.

Recommendation 4.3.

Recommendation 4.4.