(version 29 November 1999)

Fisher (1918), building on the work of plant breeders (1900s-1910s), suggested the polygenic model: many loci, each of small effect, jointly determine a quantitative character.
As an example, suppose each capital-letter allele adds one to the character value. For one locus, the genotypic values are 0, 1, and 2; for two loci, they range from 0 to 4, with the intermediate values the most common.
For 10 loci of equal effect, each with the frequency of the capital-letter allele being 1/2, the distribution of genotypic values closely approximates a normal (bell-shaped) curve.
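A quick sketch (not from the original notes) makes this concrete: under this additive model the genotypic value is simply the count of capital-letter alleles carried, which is binomial with 2n trials for n loci. The helper name genotypic_distribution is ad hoc.

```python
# Sketch: distribution of genotypic values under the additive polygenic model.
# With n loci (two alleles each) and capital-allele frequency p, the genotypic
# value is Binomial(2n, p).
from math import comb

def genotypic_distribution(n_loci, p=0.5):
    """Return P(genotypic value = k) for k = 0 .. 2*n_loci."""
    n_alleles = 2 * n_loci  # two alleles per locus
    return [comb(n_alleles, k) * p**k * (1 - p)**(n_alleles - k)
            for k in range(n_alleles + 1)]

for n in (1, 2, 10):
    print(n, [round(q, 4) for q in genotypic_distribution(n)])
```

For n = 1 this gives the familiar 1/4, 1/2, 1/4; by n = 10 the 21 possible values already trace out an almost perfect bell shape.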

If we cannot determine the genotype of an individual, we would at least like to predict the resemblance between relatives. Why? Because we wish to predict the response to selection.


Before moving on, we need a few basic tools from statistics: the variance, covariance, correlation, and regression.

The variance of a random variable x provides a measure of the spread of the data around its mean value E[x], with
Var(x) = E[(x - E[x])^2] = E[x^2] - (E[x])^2
If the variance is large, x is spread widely about its mean, and hence is poorly predicted by the mean alone.

Suppose x takes on the values 1, 2, and 3 with probabilities 0.2, 0.3, and 0.5 (respectively). Then E[x] = 0.2(1) + 0.3(2) + 0.5(3) = 2.3, E[x^2] = 0.2(1) + 0.3(4) + 0.5(9) = 5.9, and hence Var(x) = 5.9 - 2.3^2 = 0.61.
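The arithmetic above is easy to verify directly; here is a minimal check:

```python
# Verifying the variance example by direct computation.
values = [1, 2, 3]
probs  = [0.2, 0.3, 0.5]

mean    = sum(p * x for p, x in zip(probs, values))     # E[x]   = 2.3
mean_sq = sum(p * x**2 for p, x in zip(probs, values))  # E[x^2] = 5.9
var     = mean_sq - mean**2                             # 0.61
print(mean, var)
```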

Define the covariance between two random variables x and y as
Cov(x,y) = E[(x - E[x])(y - E[y])] = E[xy] - E[x] E[y]
As an example, suppose x and y have the following joint distribution:

| Probability | x | y |
| 0.1 | 0 | 0 |
| 0.3 | 0 | 1 |
| 0.2 | 1 | 0 |
| 0.4 | 1 | 1 |
Here E[x] = 0.6, E[y] = 0.7, and E[xy] = 0.4, so Cov(x,y) = 0.4 - (0.6)(0.7) = -0.02.
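As a sketch, the same computation in code, reading the entries straight from the table above:

```python
# Computing Cov(x, y) from the joint distribution table.
table = [  # (probability, x, y)
    (0.1, 0, 0),
    (0.3, 0, 1),
    (0.2, 1, 0),
    (0.4, 1, 1),
]

Ex  = sum(p * x for p, x, y in table)      # E[x]  = 0.6
Ey  = sum(p * y for p, x, y in table)      # E[y]  = 0.7
Exy = sum(p * x * y for p, x, y in table)  # E[xy] = 0.4
print(Exy - Ex * Ey)                       # Cov(x,y) = -0.02
```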
An example of a situation where Cov(x,y) = 0 even though y is entirely determined by x is a parabola: take x distributed symmetrically about zero and y = x^2. Knowing x predicts y perfectly, yet there is no linear association between x and y, so the covariance vanishes.
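A small numerical check of this claim (the specific values are made up for illustration):

```python
# Cov(x, x^2) = 0 when x is symmetric about zero, even though y = x^2
# is completely determined by x. Here x is uniform on {-2, ..., 2}.
xs = [-2, -1, 0, 1, 2]
ys = [x**2 for x in xs]
n  = len(xs)

Ex, Ey = sum(xs) / n, sum(ys) / n
cov    = sum((x - Ex) * (y - Ey) for x, y in zip(xs, ys)) / n
print(cov)  # 0.0: no linear association despite perfect dependence
```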

The random variables x and y are said to be uncorrelated if Cov(x,y) = 0.
Two variables that are independent are uncorrelated; the converse does not hold, as the parabola example shows.
Properties of covariances:
Cov(x,x) = Var(x)
Cov(ax, y) = a Cov(x,y)
Cov(x, y + z) = Cov(x,y) + Cov(x,z)
Var(x + y) = Var(x) + Var(y) + 2 Cov(x,y)
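As a sketch, the last identity can be spot-checked numerically on simulated data (the helper functions here are ad hoc, not from any particular library):

```python
# Spot-checking Var(x + y) = Var(x) + Var(y) + 2 Cov(x, y).
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(10_000)]
ys = [x + random.gauss(0, 1) for x in xs]  # y deliberately correlated with x

def mean(v):   return sum(v) / len(v)
def var(v):    m = mean(v); return sum((a - m)**2 for a in v) / len(v)
def cov(u, v): mu, mv = mean(u), mean(v); return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(abs(lhs - rhs) < 1e-8)  # True: the identity holds exactly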

Two very closely related statistics are the regression of one variable on another and the correlation between two variables. The correlation ρ(x,y) provides a scaled measure of association:
ρ(x,y) = Cov(x,y) / sqrt(Var(x) Var(y))
Note that this implies Cov(x,y) = ρ(x,y) sqrt(Var(x) Var(y)).
The correlation ranges from +1 (perfectly positively correlated) to -1 (perfectly negatively correlated).
The advantage of using the correlation is that, being scale-free, it lets us compare the strength of linear relationships between different pairs of variables, even when they are measured on different scales.
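The following sketch illustrates this scale-invariance: rescaling x (say, from centimeters to meters) changes the covariance but leaves the correlation untouched. The data are made up for illustration.

```python
# Correlation as a scale-free covariance: r = Cov(x, y) / (sd_x * sd_y).
from math import sqrt

def mean(v): return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def corr(u, v):
    return cov(u, v) / sqrt(cov(u, u) * cov(v, v))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

print(cov(xs, ys), corr(xs, ys))
# Rescaling x shrinks the covariance 100-fold but leaves r unchanged:
print(cov([x / 100 for x in xs], ys), corr([x / 100 for x in xs], ys))
```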
Compare the scatterplot below with the one shown under the covariance discussion. Clearly, the linear association between the two variables is much stronger in the plot below. The earlier plot could nonetheless have a larger covariance, simply because its variables have larger variances; the correlation, however, is much greater for the plot below.


Finally, we can make explicit use of the covariance between two variables in obtaining the best linear predictor of the value of y given we know x. This is also called the regression of y on x. For example, suppose we know an individual's height (x). What can we say about their weight (y)?
As a second example, consider the following data, where each point represents the average value of a character in the parents and the average value of the character in their offspring, giving a parent-offspring regression.

Mathematically, we express the best linear predictor of y given we know x as
y = a + bx
where a is the intercept and b the slope of the regression.
For example, with human height, if your parents have height x, the predicted (average) height in your sibs is y = 23.4 + 0.65 x.
The slope of the regression of y on x follows from the covariance, with
b = Cov(x,y) / Var(x)
Likewise, the intercept a is given by
a = E[y] - b E[x]
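Putting the two formulas together gives a least-squares fit. The sketch below uses made-up parent-offspring heights, chosen so that the fitted line reproduces the y = 23.4 + 0.65x example quoted above; the helper name regress is ad hoc.

```python
# Least-squares regression of y on x via b = Cov(x,y)/Var(x), a = E[y] - b E[x].
def mean(v): return sum(v) / len(v)

def regress(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    var_x  = sum((x - mx) ** 2 for x in xs) / len(xs)
    b = cov_xy / var_x
    a = my - b * mx
    return a, b

# Hypothetical midparent heights (x) and offspring heights (y), in inches:
xs = [64, 66, 68, 70, 72]
ys = [65, 66, 68, 69, 70]
a, b = regress(xs, ys)
print(f"y = {a:.2f} + {b:.2f} x")  # y = 23.40 + 0.65 x
```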

Recall that Var(y) provides a measure of the predictability of y.
It can be shown that the variation remaining in y given we observe x is
Var(y | x) = Var(y) (1 - ρ(x,y)^2)
Hence, the fraction of the total variance in y accounted for by knowing x is ρ(x,y)^2.
For example, for the human-height data, it can be shown that the correlation is 0.65, so knowing x accounts for 0.65^2 ≈ 42% of the variance in y.
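As a final sketch, the identity Var(y | x) = Var(y)(1 - ρ^2) can be checked numerically, reusing the made-up heights from the regression sketch above:

```python
# Checking Var(y - (a + b x)) = Var(y) * (1 - r^2) on sample data.
from math import sqrt

def mean(v): return sum(v) / len(v)
def var(v):  m = mean(v); return sum((a - m)**2 for a in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

xs = [64, 66, 68, 70, 72]  # same hypothetical data as above
ys = [65, 66, 68, 69, 70]

b = cov(xs, ys) / var(xs)
a = mean(ys) - b * mean(xs)
r = cov(xs, ys) / sqrt(var(xs) * var(ys))

resid_var = var([y - (a + b * x) for x, y in zip(xs, ys)])
print(resid_var, var(ys) * (1 - r**2))  # the two values agree
```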