Lecture 27: Quantitative Genetics I. The Basic Statistical Foundations

(version 24 April 2003)

This material is copyrighted and MAY NOT be used for commercial purposes


Quantitative genetics: the nature of continuous characters

Fisher (1918), building on the work of plant breeders (1900-1910), suggested the polygenic model.

As an example, suppose each capital-letter allele adds one to the character value. For a single locus the possible genotypic values are 0, 1, and 2; with two loci they range from 0 to 4.

For 10 loci of equal effect, each with the frequency of the capital-letter allele being 1/2, the genotypic values range from 0 to 20 and their distribution is already very close to a normal (bell-shaped) curve.
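To see this numerically, here is a minimal sketch (in Python; assuming the loci are independent and genotypes are in Hardy-Weinberg proportions): the number of capital-letter alleles among an individual's 2n alleles is then Binomial(2n, 1/2), which gives the distribution of genotypic values.

    from math import comb

    def genotypic_value_distribution(n_loci, p=0.5):
        """P(genotypic value = k) when each of the 2n alleles is
        independently a capital-letter allele with frequency p."""
        two_n = 2 * n_loci
        return {k: comb(two_n, k) * p**k * (1 - p)**(two_n - k)
                for k in range(two_n + 1)}

    # One locus: values 0, 1, 2 with probabilities 0.25, 0.5, 0.25
    print(genotypic_value_distribution(1))

    # Ten loci: a crude text histogram of the 21 possible values
    # already looks bell-shaped
    for value, prob in genotypic_value_distribution(10).items():
        print(f"{value:2d} {'*' * round(100 * prob)}")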

If we cannot determine the genotype of an individual, we would at least like to predict what the resemblance between relatives should be.

Why? Because we wish to predict the response to selection.

The Basic Model

The basic foundation of quantitative genetics is to decompose the observed phenotypic value of an individual into a component due to genetic effects and a component due to environmental effects:

z = G + E

Before moving on, we need ...

A Brief Statistical Aside: Variances, Covariances, and Regressions

The Variance, Var(x)

The Variance of a random variable x provides a measure of the spread of the data around its mean value (E[x]), with

Var(x) = E[(x - E[x])^2] = E[x^2] - (E[x])^2

where E denotes the expectation or average value. Hence, if the variance is small, most of the data is close to the mean value and there is fairly high predictability.

If the variance is large, there is poor predictability.

Example of computing Var

Suppose x takes on the values 1, 2, and 3 with probabilities 0.2, 0.3, and 0.5, respectively.
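Working through this example:

E[x] = 0.2(1) + 0.3(2) + 0.5(3) = 2.3
E[x^2] = 0.2(1) + 0.3(4) + 0.5(9) = 5.9
Var(x) = E[x^2] - (E[x])^2 = 5.9 - (2.3)^2 = 0.61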

The Covariance, Cov(x,y)

Define the Covariance between two random variables x and y as

Cov(x,y) = E[(x - E[x]) * (y - E[y])] = E[x*y] - E[x]*E[y]

The covariance provides a measure of (linear) association between two random variables. Suppose we measure two character values, x and y, in a number of individuals. Is there any association between x and y?


Example of computing Cov

Probability   x   y
    0.1       0   0
    0.3       0   1
    0.2       1   0
    0.4       1   1
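Working through this example:

E[x] = 0.2 + 0.4 = 0.6,  E[y] = 0.3 + 0.4 = 0.7
E[x*y] = 0.4  (x*y = 1 only in the last row)
Cov(x,y) = E[x*y] - E[x]*E[y] = 0.4 - (0.6)(0.7) = -0.02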


An example of a situation where Cov(x,y) = 0, yet y is entirely determined by knowing x, is a parabola (y = x^2, with x symmetric about zero). Note, however, that there is no linear association between x and y.

If the random variables x and y are uncorrelated, then Cov(x,y) = 0.

Two variables that are independent are uncorrelated, but (as the parabola example shows) uncorrelated variables need not be independent.
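To make the parabola example concrete: suppose x takes the values -1, 0, and 1, each with probability 1/3, and y = x^2. Then E[x] = 0 and E[x*y] = E[x^3] = 0, so Cov(x,y) = E[x*y] - E[x]*E[y] = 0, even though y is completely determined by x.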

Properties of Covariances

  1. Cov(x,y) = Cov(y,x)

  2. Cov(a*x,y) = a Cov(x,y) (for any constant a)

  3. Cov(x,y+z) = Cov(x,y) + Cov(x,z)

  4. Cov(a+x,y) = Cov(x,y) (for any constant a)

  5. Cov(x,x) = Var(x)

    The Correlation, ρ(x,y)

    Two very closely related statistics are the regression of one variable on another and the correlation between two variables. The correlation ρ(x,y) provides a scaled measure of association:

    ρ(x,y) = Cov(x,y) / [Var(x) * Var(y)]^(1/2)

    Note that this implies

    Cov(x,y)^2 = ρ(x,y)^2 [Var(x) * Var(y)]

    The correlation ranges from +1 (perfectly positively correlated) to -1 (perfectly negatively correlated).

    The advantage of using the correlation is that we can compare the strength of linear relationships between different pairs of variables.
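    Continuing the covariance example above: since x and y there are 0/1 variables, Var(x) = 0.6 - (0.6)^2 = 0.24 and Var(y) = 0.7 - (0.7)^2 = 0.21, so ρ(x,y) = -0.02 / [0.24 * 0.21]^(1/2) ≈ -0.09, a very weak negative linear association.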

    Compare the scatterplot below with the one shown under the covariance discussion. Clearly, the linear association between the two variables is much stronger in the plot below. Even so, the plot above could have the larger covariance if the variances of its variables are larger; the correlation, however, is much greater for the plot below.

    The Regression of y on x

    Finally, we can make explicit use of the covariance between two variables in obtaining the best linear predictor of the value of y given we know x. This is also called the regression of y on x. For example, suppose we know an individual's height (x). What can we say about their weight (y)?

    As a second example, consider parent-offspring data, where each point represents the average value of a character in the parents (x) and the average value of the same character in their offspring (y), giving a parent-offspring regression.

    Mathematically, we express the best linear relationship of y given we know x as

    y = a + b_{y|x} x

    For example, with human height, if your parents have height x, the predicted (average) height in your sibs is y = 23.4 + 0.65 x.
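    For instance, for a (hypothetical) parental value of x = 70, the predicted value is y = 23.4 + 0.65 * 70 = 68.9.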

    The slope of the regression of y on x follows from the covariance, with

    b_{y|x} = Cov(x,y) / Var(x)

    Likewise the intercept a is given by

    a = E[y] - b_{y|x} E[x]
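    As a minimal computational sketch of these two formulas (hypothetical data; assuming Python with numpy available):

        import numpy as np

        # Hypothetical parental (x) and offspring (y) mean character values
        x = np.array([68.0, 70.0, 65.5, 72.0, 69.0, 66.5])
        y = np.array([67.0, 69.5, 66.0, 70.5, 69.0, 67.5])

        b_yx = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # b_{y|x} = Cov(x,y) / Var(x)
        a = y.mean() - b_yx * x.mean()                          # a = E[y] - b_{y|x} E[x]

        print(f"y = {a:.2f} + {b_yx:.2f} x")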

    How well does the regression predict the value of y?

    Recall that Var(y) provides a measure of its predictability.

    It can be shown that the variation in y given we observe x is

    Var(y | x) = (1 - ρ(x,y)^2) Var(y)

    Hence, the fraction of the total variance in y accounted for by knowing x is ρ(x,y)^2.

    For example, for the human-height data, it can be shown that the correlation is 0.65.
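    Hence knowing x accounts for ρ(x,y)^2 = (0.65)^2 ≈ 0.42, or about 42%, of the variance in y, leaving roughly 58% unexplained.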

    Fisher's Decomposition of the Genetic Value

    R. A. Fisher (1918) showed that the genetic value G can be further decomposed as

    G = u + A + D

    where u is the mean genotypic value in the population, A is the additive genetic value (the breeding value), and D is the dominance deviation.

    Hence,

    z = G + E = u + (A+D) + E

    Fisher's great insight:

    Information from relatives allows us to estimate Var(A) and Var(D). Using these estimates, we can predict the resemblance between different sets of relatives.

    Genetic Covariance between relatives

    Consider the covariance between the genotypic values of two relatives (R1 and R2). By construction, A and D are uncorrelated. Hence,

    Cov(G_R1, G_R2) = Cov(A_R1 + D_R1, A_R2 + D_R2)

    = Cov(A_R1, A_R2) + Cov(D_R1, D_R2)

    If two relatives share only one allele ibd (identical by descent), then Cov(A_R1, A_R2) = Var(A)/2 and Cov(D_R1, D_R2) = 0, since the dominance deviation depends on the entire two-allele genotype.

    If two relatives share both alleles ibd, then Cov(A_R1, A_R2) = Var(A) and Cov(D_R1, D_R2) = Var(D), so their genetic covariance is Var(A) + Var(D).
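    As a standard illustration (not worked out in these notes): a parent and its offspring share exactly one allele ibd at each locus, so Cov(G_parent, G_offspring) = Var(A)/2. Full sibs share 0, 1, or 2 alleles ibd with probabilities 1/4, 1/2, and 1/4, giving Cov(G_fullsibs) = (1/2)[Var(A)/2] + (1/4)[Var(A) + Var(D)] = Var(A)/2 + Var(D)/4.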

    Just what do A and D represent?

    Additive effects

    Consider the genotypic value for a given locus with (potentially) many alleles, B_1, ..., B_k.

    In a random-mating population, the additive effect a_i of an allele B_i is simply the mean genotypic value of individuals carrying a copy of B_i, expressed as a deviation from the population mean u.

    Hence, the predicted genotypic value of the genotype B_i B_j is

    Predicted[G_ij] = u + a_i + a_j

    The variance Var(A) = 2 Var(a) is called the Additive Genetic Variance.

    Of course, these are predicted values. The difference between the actual and predicted genotypic value at each locus is defined as the dominance deviation,

    D_ij = G_ij - Predicted[G_ij] = G_ij - (u + a_i + a_j)

    The variance in the dominance deviations, Var(D), is called the Dominance Variance.
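    As a minimal sketch of these definitions for a single biallelic locus under random mating (the allele frequencies and genotypic values below are hypothetical; Python assumed):

        # Hypothetical biallelic locus in Hardy-Weinberg proportions
        p, q = 0.3, 0.7                  # frequencies of alleles B1 and B2
        G11, G12, G22 = 2.0, 1.5, 0.0    # genotypic values of B1B1, B1B2, B2B2

        u = p*p*G11 + 2*p*q*G12 + q*q*G22          # population mean genotypic value

        # Additive effect of each allele: mean genotypic value of its carriers
        # (the partner allele is B1 with prob. p, B2 with prob. q), minus u
        a1 = p*G11 + q*G12 - u
        a2 = p*G12 + q*G22 - u

        # Dominance deviations: actual minus predicted genotypic values
        D11 = G11 - (u + 2*a1)
        D12 = G12 - (u + a1 + a2)
        D22 = G22 - (u + 2*a2)

        VarA = 2 * (p*a1**2 + q*a2**2)                   # additive genetic variance
        VarD = p*p*D11**2 + 2*p*q*D12**2 + q*q*D22**2    # dominance variance

        print(f"u = {u:.3f}, a1 = {a1:.3f}, a2 = {a2:.3f}")
        print(f"Var(A) = {VarA:.3f}, Var(D) = {VarD:.3f}")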

    Thus, to obtain an individual's total A and D, we simply sum these single-locus values over all loci.

    Breeding Values

    A is called the breeding value (BV).