Lecture 43: Population Genetics I:
Variation, Hardy-Weinberg, and Linkage Disequilibrium
(version 8 November 2005)
This material is copyrighted and MAY NOT be used for commercial purposes
Population vs. Quantitative Genetics
- Population genetics: Deals directly with genotypes
- Quantitative genetics: Deals with phenotypes and makes inferences
about the underlying genetics
Measures of Genetic Variation
Ideally, we would be able to look at phenotypic variation and directly infer
the underlying amounts of genetic variation. However, we cannot do this.
A brief history of the struggle to measure variation
- 1800's - 1900's: Variation between the phenotypes of domestic breeds (Darwin)
- The consistent differences between breeds were taken as evidence of genetic differences
- Unclear if the variation within breeds was due to genetic or environmental variation (or both)
- 1900's - 1940's: detection of visible mutants in natural populations (mostly human genetic diseases)
- 1930's - 1950's: chromosomal inversion polymorphisms in Drosophila (Dobzhansky)
- 1960's: variation in protein sequence as measured by charge changes detected by starch-gel electrophoresis (Lewontin and Hubby). The electrophoretically-detectable variants are referred to as allozyme variants
- The so-called find-em and grind-em approach to measuring biochemical variation as scored through allozyme variation
- Extensive allozyme variation was found in almost all populations
- a few exceptions, such as the California elephant seals, which show essentially no allozymic variation
- late 1980's-present: Indirect and direct measures of DNA variation
- RFLP, restriction fragment length polymorphisms
- RAPDs, randomly amplified polymorphic DNAs
- Direct sequencing of alleles at a locus
- One of the first studies looked at the Drosophila ADH locus (Kreitman, 1983)
- Extensive polymorphism found in most studies, typically about 1/1000 nucleotides is polymorphic
- For example, you and your neighbor differ at (on average) 22,000,000 nucleotide pairs.
Measures of variation
With DNA sequencing, essentially all loci are 100 percent polymorphic if a large enough sample is used, as two individuals only have to differ at one base pair to be different in DNA sequence, and a typical locus is 1000 - 10,000 bp.
Historical measures of variation (from the electrophoretic days where only a few allozyme variants were detected per locus)
- Heterozygosity: What fraction of individuals are heterozygotes for the particular locus being examined
- Fraction of polymorphic loci : what fraction of loci measured
show polymorphism (two or more alleles at reasonable frequencies)
- the more current measure is the fraction of polymorphic sites -- What fraction of base pairs at a locus shows variation in the population? Typical values are about 1/500 to 1/1000.
Allele and genotype frequencies: A single locus
Obtaining allele frequencies from genotype frequencies
Freq(A1) = Freq(A1 A1 homozygotes) + (1/2) Freq(all A1 heterozygotes)
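The formula above can be sketched in a few lines of Python. This is only an illustration for a locus with two alleles: the function name and the genotype counts are hypothetical and do not come from any data set in these notes.

```python
def allele_freq_from_genotypes(n_11, n_12, n_22):
    """Return (freq(A1), freq(A2)) from counts of A1A1, A1A2 and A2A2 individuals."""
    total = n_11 + n_12 + n_22
    if total == 0:
        raise ValueError("need at least one individual")
    freq_11 = n_11 / total          # frequency of A1A1 homozygotes
    freq_12 = n_12 / total          # frequency of A1A2 heterozygotes
    p = freq_11 + 0.5 * freq_12     # Freq(A1) = Freq(A1A1) + (1/2) Freq(A1A2)
    return p, 1.0 - p

# Hypothetical sample: 30 A1A1, 50 A1A2, 20 A2A2 individuals
p, q = allele_freq_from_genotypes(30, 50, 20)
print(p, q)
```

With more than two alleles, the same rule applies with the heterozygote term summed over every heterozygote that carries A1.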
Predicting genotype frequencies from allele frequencies
Assuming the Hardy-Weinberg conditions
- Random mating (individuals mate independent of their genotype)
- No selection (all genotypes leave, on average, the same number of offspring)
- Large population size (genetic drift can be ignored)
- Allele frequencies the same in both sexes
- Autosomal loci
- Freq(AA) = Freq(A from father)*Freq(A from mother) = Freq(A)*Freq(A)
- The frequency of a homozygote is the square of the allele frequency
Freq(Aa) = Freq(A from father)*Freq(a from mother) + Freq(a from father)*Freq(A from mother) = 2* Freq(A)*Freq(a)
- The frequency of a heterozygote is twice the product of allele frequencies
- Further, the allele frequencies remain unchanged each generation (Homework problem!!)
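As a minimal sketch of the two rules above (square of the allele frequency for homozygotes, twice the product for heterozygotes), assuming a locus with two alleles A and a. The function name and the value freq(A) = 0.7 are hypothetical.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) given p = freq(A)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p                      # freq(a)
    return p * p, 2.0 * p * q, q * q

# Hypothetical allele frequency freq(A) = 0.7
freq_AA, freq_Aa, freq_aa = hardy_weinberg(0.7)
print(freq_AA, freq_Aa, freq_aa)
```

The three frequencies always sum to one, since p^2 + 2pq + q^2 = (p + q)^2.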
What happens when the sexes have different allele frequencies? (Homework problem!!)
Gamete and genotype frequencies: Two loci
When considering two (or more) loci, one must also account for the presence of linkage disequilibrium.
Under random mating, gametes combine at random. Hence, if the frequency of
(say) an AB gamete is 0.4 and an ab gamete is 0.1, then
- freq(AB/AB) = 0.4^2 = 0.16
- freq(ab/ab) = 0.1^2 = 0.01
- freq(AB/ab) = 2*0.4*0.1 = 0.08
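The random-union rule used in this example can be written as a small helper. This is a sketch only: the function name is hypothetical, and the dictionary holds just the two gamete frequencies quoted in the text.

```python
def genotype_freq(gamete_freqs, g1, g2):
    """Frequency of the genotype g1/g2 when gametes unite at random."""
    f1, f2 = gamete_freqs[g1], gamete_freqs[g2]
    # Identical gametes can pair in one way; different gametes in two ways
    # (g1 from the father and g2 from the mother, or the reverse).
    return f1 * f2 if g1 == g2 else 2.0 * f1 * f2

gametes = {"AB": 0.4, "ab": 0.1}   # the two gamete frequencies from the example
hom_AB = genotype_freq(gametes, "AB", "AB")
hom_ab = genotype_freq(gametes, "ab", "ab")
het = genotype_freq(gametes, "AB", "ab")
print(hom_AB, hom_ab, het)
```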
However, the frequencies of gametes in a population can change by recombination from generation to generation
unless they are in linkage equilibrium (which is also called gametic-phase equilibrium).
At linkage equilibrium,
the gametes have frequencies expected by independence of alleles,
i.e. Freq(AB gamete) = freq(A)*freq(B)
Dynamics of linkage disequilibrium
If linkage disequilibrium exists, how does it change over time under random mating?
Let D_AB(t) denote the disequilibrium for a particular gamete type, where
- D_AB(t) = Freq(AB in generation t) - Freq(A)*Freq(B)
( We also use the notation p_AB = Freq(AB), p_A = Freq(A), etc.)
How does D change over time?
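- Let c be the recombination frequency between the two loci. An AB gamete in generation t+1 is either a non-recombinant copy of an AB gamete (probability 1-c) or a recombinant that draws A and B from different parental gametes (probability c), so
Freq(AB in gen. t+1) = (1-c)*Freq(AB in gen. t) + c*Freq(A)*Freq(B)
- Subtracting Freq(A)*Freq(B) from both sides gives
Freq(AB in gen. t+1) - Freq(A)*Freq(B) =
(1-c) [ Freq(AB in gen. t) - Freq(A)*Freq(B) ]
Hence, the recursion for linkage disequilibrium is
- D(t+1) = D(t)*(1-c)
- so that D(t) = D(0)*(1-c)^t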
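To see the geometric decay numerically, the sketch below applies the one-generation recursion repeatedly and compares the result with the closed form D(0)*(1-c)^t. The function name and the starting values (freq(AB) = 0.4, freq(A) = freq(B) = 0.5, c = 0.2) are hypothetical, and allele frequencies are assumed constant, as under the Hardy-Weinberg conditions.

```python
def iterate_gamete_freq(p_AB, p_A, p_B, c, generations):
    """Return freq(AB) in generations 0..generations under random mating."""
    freqs = [p_AB]
    for _ in range(generations):
        # Freq(AB, t+1) - p_A*p_B = (1 - c) * [Freq(AB, t) - p_A*p_B]
        p_AB = p_A * p_B + (1.0 - c) * (p_AB - p_A * p_B)
        freqs.append(p_AB)
    return freqs

# Hypothetical starting values
p_A, p_B, c = 0.5, 0.5, 0.2
freqs = iterate_gamete_freq(0.4, p_A, p_B, c, generations=10)

# Closed form: Freq(AB, t) = p_A*p_B + D(0)*(1 - c)**t
D0 = freqs[0] - p_A * p_B
closed_form = [p_A * p_B + D0 * (1.0 - c) ** t for t in range(11)]

for t in (0, 1, 5, 10):
    print(t, freqs[t], closed_form[t])
```

The iterated and closed-form values agree up to floating-point rounding, and freq(AB) moves toward freq(A)*freq(B) = 0.25 each generation.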
Thus, for a particular gamete type, say AB (A and B represent
particular alleles at two different loci), then
- D_AB(0) = Freq(AB in the initial generation) - Freq(A)*Freq(B)
- D_AB(t) = [ Freq(AB in generation t) - Freq(A)*Freq(B) ] = D_AB(0)*(1-c)^t
- Putting these together, Freq(AB in generation t)
- = Freq(A)*Freq(B) + [Freq(AB in the initial generation) - Freq(A)*Freq(B)]*(1-c)^t
= Freq(A)*Freq(B) + D_AB(0)*(1-c)^t
Example: Human blood group data
Consider the Ainu population, with observed gamete frequencies freq(MS) = 0.024, freq(Ms) = 0.381, and freq(NS) = 0.247. Does this show indications of linkage disequilibrium?
If linkage equilibrium is present, then (say) freq(MS gamete) = freq(M)*freq(S)
- freq(M) = freq(MS) + freq(Ms) = 0.024 + 0.381 = 0.405
- freq(N) = 1 - freq(M) = 1 - 0.405= 0.595
- freq(S) = freq(MS) + freq(NS) = 0.024 + 0.247 = 0.271
- freq(s) = 1 - freq(S) = 1 - 0.271 = 0.729
Is this true? freq(MS gamete) = 0.024, while freq(M)*freq(S) = 0.405*0.271 = 0.110
Hence, the initial disequilibrium for the MS gamete is
- D_MS(0) = 0.024 - 0.110 = -0.086
If random mating and the other Hardy-Weinberg assumptions hold, and the recombination frequency between the M and S loci is c = 0.1, what is the expected MS disequilibrium after one generation of recombination?
- D_MS(1) = (1-c)*D_MS(0) = 0.9*(-0.086) = -0.0774
- Hence, Freq(MS in gen 1) = freq(M)*freq(S) + D_MS(1) = 0.110 - 0.0774 = 0.0326
What about after 20 generations?
- D_MS(20) = (1-c)^20*D_MS(0) = 0.9^20*(-0.086) = 0.1216*(-0.086) = -0.0105
- Hence, Freq(MS in gen 20) = freq(M)*freq(S) + D_MS(20) = 0.110 - 0.0105 = 0.0995
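The Ainu calculation can be reproduced with a short script. This is a sketch using only the three gamete frequencies and the value c = 0.1 quoted above; the variable and function names are hypothetical. Because the script carries full precision rather than rounding freq(M)*freq(S) to 0.110 at each step, its results can differ from the hand calculation in the last digit shown.

```python
# Gamete frequencies quoted in the text for the Ainu sample
freq_MS, freq_Ms, freq_NS = 0.024, 0.381, 0.247
c = 0.1  # recombination frequency assumed in the text

freq_M = freq_MS + freq_Ms          # 0.405
freq_S = freq_MS + freq_NS          # 0.271
D0 = freq_MS - freq_M * freq_S      # initial disequilibrium for the MS gamete

def D_MS(t):
    """Disequilibrium after t generations of random mating."""
    return D0 * (1.0 - c) ** t

def freq_MS_at(t):
    """Expected frequency of the MS gamete after t generations."""
    return freq_M * freq_S + D_MS(t)

for t in (0, 1, 20):
    print(t, round(D_MS(t), 4), round(freq_MS_at(t), 4))
```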
Most genes show linkage-disequilibrium between very tightly-linked markers
The usefulness of this observation is that we can use tightly-linked markers
as indicators of whether a particular chromosome carries a disease allele.
How does this association arise?
Even after hundreds of generations, most chromosomes carrying the mutant allele
also contain the tightly linked alleles on the original chromosome.
This feature has been exploited for very fine mapping of disease genes, an approach called linkage disequilibrium mapping.
For many mutant alleles, there is a predominant haplotype (collection of very tightly-linked markers) with which it is associated, reflecting the haplotype of the original chromosome on which the mutant arose.
Equating the probability of no recombination over t generations, (1-c)^t,
to the observed proportion q of disease-bearing chromosomes with this predominant haplotype
gives q = (1-c)^t, where t is the age of the mutation or the age of the founding population (whichever is more recent). Solving for the recombination frequency gives
c = 1 - q^(1/t)
Hastbacka et al. (1992) examined the gene for diastrophic dysplasia (DTD),
an autosomal recessive disease, in Finland.
A number of marker loci were examined, with the CSF1R locus showing the most
striking correlation with DTD. The investigators were able to unambiguously determine the haplotypes of 152 DTD-bearing chromosomes and 123 normal chromosomes for the sampled individuals. Four alleles of the CSF1R marker gene were detected.
[Table: counts of each CSF1R marker allele on normal chromosomes and on DTD chromosomes]
Here, q = 0.947, and the current Finnish population traces back around 2000 years to a small group of founders, which then underwent around t = 100 generations of exponential growth.
Using these estimates of q and t gives an estimated recombination frequency between the CSF1R gene and the DTD gene of
c = 1 - (0.947)^(1/100) = 0.00054.
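The estimate can be checked by solving q = (1-c)^t for c directly. The sketch below uses the values q = 0.947 and t = 100 from the text; the function name is hypothetical. It assumes the simple model used above, in which every disease-bearing chromosome descends from a single ancestral chromosome t generations ago.

```python
def recombination_from_haplotype(q, t):
    """Solve q = (1 - c)**t for the recombination frequency c."""
    if not 0.0 < q <= 1.0 or t <= 0:
        raise ValueError("need 0 < q <= 1 and t > 0")
    return 1.0 - q ** (1.0 / t)

# Values from the DTD example: q = 0.947, t = 100 generations
c_hat = recombination_from_haplotype(q=0.947, t=100)
print(f"estimated c = {c_hat:.5f}")

# Round-trip check: the probability of no recombination over t generations
print((1.0 - c_hat) ** 100)
```

Because q enters only through its t-th root, the estimate of c is sensitive to the assumed number of generations: halving t roughly doubles the estimated recombination frequency.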