Lecture 46: Quantitative Genetics I. The Basic Statistical Foundations

(version 29 November 1999)

This material is copyrighted and MAY NOT be used for commercial purposes


Quantitative genetics: the nature of continuous characters

Fisher (1918), building on the work of plant breeders (1900s-1910s), suggested the polygenic model.

As an example, suppose each capital-letter allele adds one to the character value. For one and two loci, the distribution of genotypic values follows directly by counting capital alleles in each genotype.

For 10 loci of equal effect, each with the frequency of the capital-letter allele being 1/2, the distribution of genotypic values already closely approximates a normal curve.
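To see this numerically, here is a minimal sketch (not part of the original notes; the function genotypic_distribution is ours), assuming each of the 2n allele copies is independently a capital allele with frequency p, so the genotypic value is Binomial(2n, p):

    # Illustrative sketch of the additive polygenic model: the genotypic
    # value equals the number of capital alleles among 2n allele copies,
    # which is Binomial(2n, p) under independent allele frequencies.
    from math import comb

    def genotypic_distribution(n_loci, p=0.5):
        """Return {genotypic value: probability} for n loci of equal effect."""
        n_alleles = 2 * n_loci
        return {k: comb(n_alleles, k) * p ** k * (1 - p) ** (n_alleles - k)
                for k in range(n_alleles + 1)}

    for n in (1, 2, 10):
        dist = genotypic_distribution(n)
        print(n, "loci:", {k: round(pr, 3) for k, pr in dist.items()})

For 10 loci the printout already traces out the familiar bell shape.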

If we cannot determine the genotype of an individual, we would at least like to predict what the resemblance between relatives should be.

Why? We wish to predict the response to selection.

The Basic Model

The basic foundation of quantitative genetics is to decompose the observed phenotypic value of an individual into components due to genetic effects and components due to environmental effects:

z = G + E

Before moving on, we need ...

A Brief Statistical Aside: Variances, Covariances, and Regressions

The Variance, Var(x)

The Variance of a random variable x provides a measure of the spread of the data around its mean value (E[x]), with

Var(x) = E[(x - E[x])^2] = E[x^2] - (E[x])^2

where E denotes the expectation or average value. Hence, if the variance is small, most of the data is close to the mean value and there is fairly high predictability.

If the variance is large, there is poor predictability.

Example of computing Var

Suppose x takes on the values 1, 2, 3 with probabilities 0.2, 0.3, and 0.5 (respectively).
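Working the example through with the definition above (the arithmetic is filled in here):

E[x] = (1)(0.2) + (2)(0.3) + (3)(0.5) = 2.3
E[x^2] = (1)(0.2) + (4)(0.3) + (9)(0.5) = 5.9
Var(x) = E[x^2] - (E[x])^2 = 5.9 - 5.29 = 0.61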

The Covariance, Cov(x,y)

Define the Covariance between two random variables x and y as

Cov(x,y) = E[(x - E[x])(y - E[y])] = E[xy] - E[x]E[y]

The covariance provides a measure of association between two random variables. Suppose we measure two character values, x and y, in a number of individuals. Is there any association between x and y?


Example of computing Cov

Probability   x   y
   0.1        0   0
   0.3        0   1
   0.2        1   0
   0.4        1   1
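Filling in the computation from the table:

E[x] = 0.2 + 0.4 = 0.6
E[y] = 0.3 + 0.4 = 0.7
E[xy] = (1)(1)(0.4) = 0.4
Cov(x,y) = E[xy] - E[x]E[y] = 0.4 - (0.6)(0.7) = -0.02

so x and y show a slight negative (linear) association.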


A parabola provides an example of a situation where Cov(x,y) = 0 even though y is entirely predicted by knowing x. Note, however, that there is no linear association between x and y.
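For instance (a worked example we supply here), let x take the values -1, 0, 1 each with probability 1/3 and set y = x^2. Then E[x] = 0 and E[xy] = E[x^3] = 0, so Cov(x,y) = E[xy] - E[x]E[y] = 0, even though y is a deterministic function of x.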

If the random variables x and y are uncorrelated, then Cov(x,y) = 0.

Two variables that are independent are uncorrelated (the parabola example shows that the converse need not hold).

Properties of Covariances

  1. Cov(x,y) = Cov(y,x)

  2. Cov(a*x,y) = a Cov(x,y) (for any constant a)

  3. Cov(x,y+z) = Cov(x,y) + Cov(x,z)

  4. Cov(a+x,y) = Cov(x,y) (for any constant a)

  5. Cov(x,x) = Var(x)
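As an illustration of how these rules follow from the definition (a derivation we supply here), property 3 is immediate:

Cov(x, y+z) = E[x(y+z)] - E[x]E[y+z]
            = E[xy] + E[xz] - E[x]E[y] - E[x]E[z]
            = Cov(x,y) + Cov(x,z)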

Correlation, p(x,y)

Two very closely related statistics are the regression of one variable on another and the correlation between two variables. The correlation p(x,y) provides a scaled measure of association:

p(x,y) = Cov(x,y) / [Var(x) * Var(y)]^(1/2)

Note that this implies

Cov(x,y)^2 = p(x,y)^2 * Var(x) * Var(y)

The correlation ranges from +1 (perfectly positively correlated) to -1 (perfectly negatively correlated).
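Continuing the numerical covariance example above (our computation): x and y are 0/1 variables, so Var(x) = 0.6 - 0.6^2 = 0.24 and Var(y) = 0.7 - 0.7^2 = 0.21, giving p(x,y) = -0.02 / (0.24 * 0.21)^(1/2) ≈ -0.09, a very weak negative linear association.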

The advantage of using the correlation is that we can compare the strength of linear relationships between different pairs of variables.

Compare the scatterplot below with the one shown under the covariance discussion. Clearly, the linear association between the two variables is much stronger in the plot below. The plot above could nevertheless have a larger covariance if the associated variances are larger, but the correlation for the plot below is much greater than that for the plot above.

The Regression of y on x

Finally, we can make explicit use of the covariance between two variables in obtaining the best linear predictor of the value of y given we know x. This is also called the regression of y on x. For example, suppose we know an individual's height (x). What can we say about their weight (y)?

As a second example, consider the following data where each point represents the average value of a character in the parents and the average value of the character in their offspring, giving a parent-offspring regression.

Mathematically, we express the best linear relationship of y given we know x as

y = a + b_{y|x} * x

For example, with human height, if your parents have height x, the predicted (average) height in your sibs is y = 23.4 + 0.65x.

The slope of the regression of y on x follows from the covariance, with

b_{y|x} = Cov(x,y) / Var(x)

Likewise, the intercept a is given by

a = E[y] - b_{y|x} E[x]
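To make the slope and intercept formulas concrete, here is a minimal sketch (the data are made up for illustration and are not the course's height data):

    # Illustrative sketch: regression of y on x from the moment formulas
    # b_{y|x} = Cov(x,y)/Var(x) and a = E[y] - b_{y|x} E[x].
    xs = [64.0, 66.0, 68.0, 70.0, 72.0]  # hypothetical midparent heights
    ys = [65.0, 66.5, 67.5, 69.0, 70.0]  # hypothetical offspring heights

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n

    b = cov_xy / var_x       # slope of the regression of y on x
    a = mean_y - b * mean_x  # intercept
    print(f"y = {a:.2f} + {b:.2f} x")

Note that the slope is the same whether Cov and Var are computed with divisor n or n - 1, since the factor cancels in the ratio.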

How well does the regression predict the value of y?

Recall that Var(y) provides a measure of its predictability.

It can be shown that the variation in y given we observe x is

Var(y | x) = (1 - p(x,y)^2) Var(y)

Hence, the fraction of the total variance in y accounted for by knowing x is p(x,y)^2.

For example, for the human-height data, it can be shown that the correlation is 0.65, so knowing parental height accounts for 0.65^2 ≈ 0.42, or about 42%, of the variance in offspring height.
