# Lecture 8: DNA markers for Forensic use

(version 7 Feb 2008)

This material is copyrighted and MAY NOT be used for commercial purposes


# DNA Variation

Any genetic marker (or region) that shows variation is potentially of use in Forensics.

Ideally, we would like markers that are very variable -- with a number of different alleles all at roughly equal frequencies.

Example: Suppose we have a marker with 2 alleles (A and a), each with frequency 1/2. What is the chance that two random individuals have the same genotype?

$$\Pr(\text{1st is } AA, \text{2nd is } AA) + \Pr(\text{1st is } Aa, \text{2nd is } Aa) + \Pr(\text{1st is } aa, \text{2nd is } aa)$$

$$= (1/2)^2 \cdot (1/2)^2 + 2(1/2)(1/2) \cdot 2(1/2)(1/2) + (1/2)^2 \cdot (1/2)^2 = 0.375$$

Likewise, as we add more alleles, the match probability decreases:

| Number of alleles | Match probability |
|---|---|
| 2 | 0.375 |
| 3 | 0.185 |
| 4 | 0.109 |
| 5 | 0.072 |
| 6 | 0.051 |
| 8 | 0.029 |
| 10 | 0.019 |
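The match probabilities for equally frequent alleles can be reproduced with a short calculation. The sketch below assumes Hardy-Weinberg genotype proportions (homozygote frequency $p_i^2$, heterozygote frequency $2 p_i p_j$), which is what the two-allele example uses; the function name `genotype_match_probability` is chosen here for illustration. For $n$ equally frequent alleles the sum simplifies to $(2n-1)/n^3$.

```python
from itertools import combinations_with_replacement

def genotype_match_probability(freqs):
    """Chance that two random individuals share a genotype at one locus.

    Assumes Hardy-Weinberg proportions: homozygote ii has frequency p_i**2,
    heterozygote ij has frequency 2 * p_i * p_j.
    """
    total = 0.0
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        genotype_freq = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        total += genotype_freq ** 2  # both individuals carry this genotype
    return total

# Equally frequent alleles, for the allele counts used in the table
for n in (2, 3, 4, 5, 6, 8, 10):
    equal_freqs = [1 / n] * n
    print(n, round(genotype_match_probability(equal_freqs), 3))
```

The same function accepts unequal allele frequencies, in which case the match probability is higher than for the same number of equally frequent alleles.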

For seven equally frequent alleles, the match probability is 0.038, while for the 7-allele CODIS marker D3S1358, whose allele frequencies are unequal, the match probability is 0.0707.

| Allele | Frequency |
|---|---|
| 12 | 0.015 |
| 13 | 0.015 |
| 14 | 0.1341 |
| 15 | 0.2896 |
| 16 | 0.2287 |
| 17 | 0.1616 |
| 18 | 0.1616 |

Source: RCMP Website, data for Caucasians

If the loci are independent, the per-locus match probabilities multiply.

With 5 such loci, the match probability is $(0.0707)^5$ = 1 in 560,000.

With 10 such loci, the match probability is $(0.0707)^{10}$ = 1 in 318 billion.

With 13 such loci, the match probability is $(0.0707)^{13}$ = 1 in 900 trillion.
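The multi-locus figures come from raising the single-locus value to the number of loci. A minimal sketch, assuming independent loci that all share the 0.0707 match probability quoted for D3S1358 (the names `PER_LOCUS_MATCH` and `combined_match_probability` are illustrative):

```python
PER_LOCUS_MATCH = 0.0707  # single-locus value quoted above for D3S1358

def combined_match_probability(per_locus, n_loci):
    """Match probability across n_loci independent loci with equal per-locus values."""
    return per_locus ** n_loci

for n_loci in (5, 10, 13):
    p = combined_match_probability(PER_LOCUS_MATCH, n_loci)
    print(f"{n_loci} loci: about 1 in {1 / p:.3g}")
```

The printed values agree with the rounded figures above to about two significant digits.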

# The Quest for Variation: RFLP Markers

The first genetic markers to be widely used were RFLPs, or restriction fragment length polymorphisms.

Restriction enzymes are part of a bacterium's "immune" system. These are enzymes that cut DNA at specific sites (typically a four- or six-base-pair sequence). Bacterial DNA is modified to be protected, while foreign DNA, such as that of incoming viruses, is not. Restriction enzymes thus cut up any foreign DNA, effectively protecting the cell from most viruses.

Thus, we can take a long piece of DNA and cut it with a restriction enzyme, generating numerous fragments. Even a single-base change will destroy a restriction enzyme target site. Likewise, even if the sites are the same in two molecules, the length of the DNA sequence between them may differ. Thus, if two DNA molecules differ in sequence, they likely have different lengths for the fragments produced following treatment with restriction enzymes.
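To make the fragment-length idea concrete, here is a toy digest on short made-up sequences. It uses the EcoRI recognition site GAATTC (cut after the first G) as the example enzyme; the sequences and the function name `digest` are hypothetical, and only one strand is modeled.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)  # cut position within the site
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

original = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
# Single-base change (A -> G) destroys the first recognition site
mutated = original.replace("GAATTC", "GAGTTC", 1)
# Three extra bases between the two sites, both sites unchanged
inserted = "AAAA" + "GAATTC" + "CCCCCCCC" + "CCC" + "GAATTC" + "TTTT"

print(digest(original))  # [5, 14, 9]
print(digest(mutated))   # [19, 9]
print(digest(inserted))  # [5, 17, 9]
```

Losing a site merges two fragments into one longer fragment, while extra sequence between two intact sites lengthens a single fragment; either change alters the banding pattern.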

This results in the two DNA samples showing different fragment-length patterns when run on a gel that separates fragments by size. Here's an ideal case:

In this ideal case, the resulting fragments (for a target region of interest) are few and typically clean. Thus, we can easily see a match vs. no match.

In reality, the bands are often more numerous and it can be hard to ascertain if two bands are indeed the same length (and hence a match) or instead differ in length (and hence no match).

Indeed, the lack of standardization over what counted as an exact match led several courts to look unfavorably on early DNA evidence.

A further complication was that RFLP analysis required rather large amounts of DNA (blood spots the size of a nickel or larger). This made splitting samples for independent analysis difficult, and it also meant that evidence could degrade past the point of usefulness very quickly.

# The Quest for Variation: STR Markers

Both the concern over exactly what a match was and the large amounts of DNA required were resolved with the development, in the early 1990s, of STR (short tandem repeat) markers that could be amplified by PCR.

Repeats are common in DNA: the same sequence is repeated over and over again in an array. In such arrays, the number of copies is often highly variable.

Indeed, the initial RFLP markers involved regions of DNA called VNTRs, or variable number tandem repeats, wherein the repeat unit was rather long, often several hundred bases.

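By contrast, STRs typically have repeat units of just a few bases. Each allele is named by its number of repeat copies.

An individual carrying one allele with 7 repeats and another with 8 would be scored as a 7,8 at the marker locus.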
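Scoring an STR genotype by repeat count can be sketched as follows. The repeat unit GATA, the flanking sequences, and the function names are hypothetical and chosen only for illustration; real typing is done by sizing PCR products rather than by reading sequence strings.

```python
import re

def repeat_count(seq, unit):
    """Number of copies in the longest uninterrupted run of unit within seq."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

def score_genotype(allele_a, allele_b, unit):
    """Genotype as a sorted pair of repeat counts, one per chromosome copy."""
    return tuple(sorted((repeat_count(allele_a, unit), repeat_count(allele_b, unit))))

# Made-up flanking sequence around a made-up GATA repeat array
left_flank, right_flank = "CCTG", "TTCA"
maternal = left_flank + "GATA" * 7 + right_flank
paternal = left_flank + "GATA" * 8 + right_flank

print(score_genotype(maternal, paternal, "GATA"))  # (7, 8)
```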

A key innovation was to develop primers for PCR reactions that would amplify the region containing an STR of interest.

This allowed a very small DNA sample to yield a large amount of DNA for study.

A large number of potential STRs are known. Forensic scientists were searching for markers that

• are variable
• show roughly similar allele frequencies among different ethnic groups

In the US, the result was the selection of 13 autosomal markers, the so-called CODIS markers (the core loci), along with the AMEL (amelogenin) marker on the X (AMELX) and Y (AMELY) chromosomes, which indicates whether a DNA sample came from a male or a female.

CODIS stands for Combined DNA Index System.

In the UK, the SGM+ (Second Generation Multiplex Plus) markers are used, which comprise 10 core autosomal STRs and AMEL.

The US and UK databases have AMEL and 8 core loci in common.

Markers in SGM+ but not in CODIS are D2 (D2S1338) and D19 (D19S433).

Markers in CODIS but not in SGM+ are TPOX, D5 (D5S818), CSF (CSF1PO), D7 (D7S820), and D13 (D13S317).
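The overlap between the two marker sets can be checked with set operations. The full locus lists below are not given in this lecture; they are the standard names of the 13 CODIS core loci and the 10 SGM+ autosomal loci, with AMEL left out because both sets include it.

```python
# Standard locus names (not listed in full in the lecture); AMEL omitted
CODIS = {
    "CSF1PO", "FGA", "TH01", "TPOX", "VWA", "D3S1358", "D5S818",
    "D7S820", "D8S1179", "D13S317", "D16S539", "D18S51", "D21S11",
}
SGM_PLUS = {
    "FGA", "TH01", "VWA", "D2S1338", "D3S1358", "D8S1179",
    "D16S539", "D18S51", "D19S433", "D21S11",
}

shared = CODIS & SGM_PLUS
print(len(shared), "shared loci:", sorted(shared))
print("SGM+ only:", sorted(SGM_PLUS - CODIS))
print("CODIS only:", sorted(CODIS - SGM_PLUS))
```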