Lecture 9: Introduction to Probability theory
(version 31 August 2004)
This material is copyrighted and MAY NOT be used for commercial purposes
Introduction to Probability
- Events are possible outcomes of some random process
- Examples of events:
- You pass 320
- The genotype of a random individual is Bb
- The weight of a random individual is less than 150 pounds
- We can define the probability of a particular event, say A, as the fraction of outcomes (over many repetitions of the random process) in which event A occurs.
- We denote the probability of A by Pr(A) or Prob(A).
- For example, when flipping a coin once, the possible outcomes are heads or tails.
- For a biased coin, Pr(Head) = 0.75 means that the chance is 75% that the coin lands heads, and hence
- Pr(Tail) = 1 - Pr(Head) = 0.25.
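To connect the "fraction of outcomes" definition with the biased coin above, the Python sketch below simulates many flips of a coin assumed to have Pr(Head) = 0.75 and reports the observed fraction of heads. The function name `estimate_probability`, the seed, and the number of flips are illustrative choices, not part of the lecture.

```python
import random

def estimate_probability(p_head: float, n_flips: int, seed: int = 1) -> float:
    """Estimate Pr(Head) as the fraction of simulated flips that come up heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    # rng.random() is uniform on [0, 1), so it falls below p_head with probability p_head
    heads = sum(1 for _ in range(n_flips) if rng.random() < p_head)
    return heads / n_flips

freq_head = estimate_probability(0.75, 100_000)
freq_tail = 1 - freq_head  # every flip is either a head or a tail
print(f"Pr(Head) ~ {freq_head:.3f}, Pr(Tail) ~ {freq_tail:.3f}")
```

With 100,000 flips the observed fractions should land close to 0.75 and 0.25; fewer flips give noisier estimates.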
Useful Rules of Probability
- Probabilities are between zero (never occur) and one (always occur)
- Pr(A) lies between zero and one for all A.
- Probabilities sum to one
- The probabilities of a set of mutually exclusive events that together cover all possible outcomes sum to one.
- For example, if there are n possible outcomes, Pr(1) + Pr(2) + ... + Pr(n) = 1
- Hence, Pr(1) = 1 - ( Pr(2) + ... + Pr(n) )
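Both rules can be checked with exact fractions. This is a minimal Python sketch using a fair six-sided die as the assumed set of outcomes; the helper `check_distribution` is a hypothetical name introduced here.

```python
from fractions import Fraction

def check_distribution(probs) -> bool:
    """True if every probability lies in [0, 1] and they sum to exactly one."""
    probs = list(probs)
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# A fair six-sided die: outcomes 1..6, each with probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(check_distribution(die.values()))  # True

# Complement rule: Pr(1) = 1 - (Pr(2) + ... + Pr(6))
pr_1 = 1 - sum(die[face] for face in range(2, 7))
print(pr_1)  # 1/6
```

Using `Fraction` rather than floats keeps the sum exactly equal to one, so the equality test is safe.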
The AND and OR Rules
- AND rule: If A and B are independent events (knowledge of one event tells us nothing about the other event), then the probability that BOTH A and B occur is
- Pr(A and B) = Pr(A) Pr(B)
- Hence, for independent events, AND = multiply probabilities
- OR rule: If A and B are mutually exclusive events (nonoverlapping, so they cannot both occur), then the probability that EITHER A or B occurs is
- Pr(A or B) = Pr(A) + Pr(B)
- Hence, for mutually exclusive events, OR = add probabilities
Example 1
Suppose we are rolling a fair die and flipping a fair coin.
- What is the probability of rolling an even number on the die?
- A single roll of a fair die has possible outcomes 1, 2, 3, 4, 5, 6, each with the same probability, 1/6. Rolling an even number means rolling 2 OR 4 OR 6. These three events (2, 4, 6) are nonoverlapping, and hence mutually exclusive, so we can use the OR = add rule, giving
- Pr(Roll even) = Pr(2) + Pr(4) + Pr(6) = 3/6 = 1/2
- What is the probability of rolling a 5 and then getting a head in the coin flip?
- The die roll and coin flip are independent events, as the outcome of one does not influence the outcome of the other. Hence,
- Pr(Head AND roll 5) = Pr(Head) * Pr(5) = 1/2 * 1/6 = 1/12
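One way to double-check both answers is to list all 12 equally likely (die, coin) outcomes and count. A minimal Python sketch, assuming a fair die and a fair coin as in the example; the helper `pr` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (die face, coin side) pair. With a fair die and a fair
# coin, all 12 pairs are equally likely.
outcomes = list(product(range(1, 7), ["Head", "Tail"]))
pr_each = Fraction(1, len(outcomes))

def pr(event) -> Fraction:
    """Probability of an event = sum of Pr(outcome) over outcomes where it holds."""
    return sum((pr_each for o in outcomes if event(o)), Fraction(0))

pr_even = pr(lambda o: o[0] % 2 == 0)
pr_head_and_5 = pr(lambda o: o[0] == 5 and o[1] == "Head")

# Direct counting agrees with the OR = add and AND = multiply rules
assert pr_even == pr(lambda o: o[0] == 2) + pr(lambda o: o[0] == 4) + pr(lambda o: o[0] == 6)
assert pr_head_and_5 == pr(lambda o: o[1] == "Head") * pr(lambda o: o[0] == 5)

print(pr_even, pr_head_and_5)  # 1/2 1/12
```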
Conditional Probability
How do we compute joint probabilities when A and B are NOT independent (i.e., when knowing that A has occurred provides information on whether or not B has occurred)?
- The joint probability of A and B, Pr(A,B), is the product of the probability of B, Pr(B), with the probability of A given B, Pr(A | B).
- Pr(A,B) = Pr(A | B) Pr(B)
- Pr(A | B) is called the conditional probability of A given B
- Pr(A | B) = Pr(A,B) / Pr(B), provided Pr(B) > 0
- A and B are said to be independent if Pr(A | B) = Pr(A), so that knowing event B occurred gives us no information about event A.
- An important use for conditional probabilities is to compute the probability of some complex event by conditioning on other events.
For example, suppose that exactly one of three mutually exclusive events, B, C, or D, must occur, and that event A can occur under any of them.
Then
- Pr(A) = Pr(A|B)*Pr(B) + Pr(A|C)*Pr(C) + Pr(A|D)*Pr(D)
For example, suppose there are three genotypes with different disease risks, where event A is having the disease, and B, C, and D are the three genotypes.
Pr(A|D) is the risk of the disease for genotype D, and so forth. The overall risk of the disease is then the average of the genotype-specific risks, weighted by the genotype frequencies Pr(B), Pr(C), and Pr(D).
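The weighted-risk formula translates directly into code. In this Python sketch, the function `total_probability` is a hypothetical helper, and the genotype frequencies and risks are made-up numbers for illustration only; they are not data from the lecture.

```python
def total_probability(pr_given: dict, pr_condition: dict) -> float:
    """Pr(A) = sum over conditions X of Pr(A | X) * Pr(X).

    The conditions must be mutually exclusive and exhaustive,
    so their probabilities must sum to one.
    """
    assert abs(sum(pr_condition.values()) - 1) < 1e-12, "conditions must be exhaustive"
    return sum(pr_given[x] * pr_condition[x] for x in pr_condition)

# Hypothetical numbers for illustration only (not from the lecture):
genotype_freq = {"B": 0.25, "C": 0.50, "D": 0.25}  # Pr(B), Pr(C), Pr(D)
disease_risk = {"B": 0.10, "C": 0.05, "D": 0.01}   # Pr(A|B), Pr(A|C), Pr(A|D)

overall_risk = total_probability(disease_risk, genotype_freq)
print(round(overall_risk, 4))  # 0.0525
```

Here 0.10*0.25 + 0.05*0.50 + 0.01*0.25 = 0.0525: the common genotype C contributes as much to the overall risk as the riskier but rarer genotype B.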
Example 2
- Suppose we cross two Aa parents, where AA and Aa offspring are yellow, while aa offspring are green. Here,
- Pr(AA) = Pr(aa) = 1/4
- Pr(Aa) = 1/2
- Thus, Pr(Yellow) = Pr(AA) + Pr(Aa) = 3/4.
What is the probability that a yellow offspring has genotype Aa?
Using our formula,
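- Pr(genotype = Aa | offspring = Yellow) = Pr(genotype = Aa, offspring = Yellow) / Pr(offspring = Yellow) = (1/2)/(3/4) = 2/3.
- Here Pr(genotype = Aa, offspring = Yellow) = Pr(Aa) = 1/2, because every Aa offspring is yellow.
Likewise
- Pr(genotype = AA | offspring = Yellow) = (1/4)/(3/4) = 1/3.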
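The whole cross can be enumerated as a Punnett square. This Python sketch assumes simple Mendelian segregation (each Aa parent passes on A or a with probability 1/2, independently of the other parent); the helper `is_yellow` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# Each of the 4 allele combinations has probability 1/2 * 1/2 = 1/4 (AND rule).
genotype_prob = {}
for allele1, allele2 in product("Aa", "Aa"):
    genotype = "".join(sorted(allele1 + allele2))  # 'A' sorts before 'a', so "aA" -> "Aa"
    genotype_prob[genotype] = genotype_prob.get(genotype, 0) + Fraction(1, 4)

def is_yellow(genotype: str) -> bool:
    return "A" in genotype  # AA and Aa are yellow; aa is green

pr_yellow = sum(p for g, p in genotype_prob.items() if is_yellow(g))

# Pr(genotype | Yellow) = Pr(genotype, Yellow) / Pr(Yellow); for a yellow
# genotype, Pr(genotype, Yellow) is simply Pr(genotype).
pr_given_yellow = {g: p / pr_yellow for g, p in genotype_prob.items() if is_yellow(g)}

for g in ("AA", "Aa"):
    print(f"Pr({g} | Yellow) = {pr_given_yellow[g]}")
# Pr(AA | Yellow) = 1/3
# Pr(Aa | Yellow) = 2/3
```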
Disease Relative Risks
What is the risk that you will have a disease given your sib (brother/sister) does?
This is quantified by the disease relative risk, RR, where
- RR = Prob(sib 1 affected | sib 2 is) / Prob(random individual affected)
- Thus, RR is the factor by which your risk exceeds that of a random individual.
- Note that RR = 1 if Prob(sib 1 affected | sib 2 is) = Prob(random individual affected), i.e., you have no increased risk given that a relative has the disease.
Hence the disease relative risk is the ratio of the conditional probability of disease for a sib (or other relative) to the probability for a random individual.
As an example, consider diabetes. The probability that a random individual (from the US population) has type 1 diabetes is 0.4 percent. This is also referred to as the population prevalence, K. However, among individuals with an affected sib, the frequency of diabetes is 6 percent. The resulting relative risk that an individual has diabetes, given that their sib does, is 6/0.4 = 15.
What is the probability that a pair of sibs both have diabetes?
- Pr(Both sibs affected) = Pr(2nd affected | 1st is) Pr(1st affected) = 0.06 * 0.004 = 0.00024
- Note that Pr(2nd affected | 1st is) = RR*K, as RR = Pr(2nd affected | 1st is) / K. Hence Pr(Both sibs affected) = (RR*K)*K = RR*K^2
- Hence, families with both sibs affected are 15 times (that is, RR times) more common in the population than expected by chance, i.e., than the frequency K^2 = 0.000016 expected if the disease were independent of family membership.
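The diabetes arithmetic can be reproduced in a few lines of Python; `K` and `pr_sib` are simply the 0.4 percent and 6 percent figures quoted above, and the variable names are illustrative.

```python
import math

K = 0.004       # population prevalence of type 1 diabetes (0.4 percent)
pr_sib = 0.06   # Pr(2nd sib affected | 1st sib affected) (6 percent)

RR = pr_sib / K                  # relative risk
pr_both = pr_sib * K             # Pr(2nd affected | 1st is) * Pr(1st affected)
pr_both_if_independent = K ** 2  # expected by chance if sibs were independent

# The two ways of writing Pr(both affected) agree: pr_sib * K == RR * K**2
assert math.isclose(pr_both, RR * K ** 2)

print(f"RR = {RR:.1f}")                                             # RR = 15.0
print(f"Pr(both affected) = {pr_both:.5f}")                         # Pr(both affected) = 0.00024
print(f"ratio to chance = {pr_both / pr_both_if_independent:.1f}")  # ratio to chance = 15.0
```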
Example 3: Rheumatoid Arthritis
Consider the following data for individuals with rheumatoid arthritis (from Del Junco et al., 1984):

| Group | Disease | No disease | Total |
|---|---|---|---|
| Sibs of affected individuals | 21 | 475 | 496 |
| Spouses of affected individuals | 12 | 661 | 673 |
- Prob(2nd sib affected | 1st sib affected) = 21 / 496 = 0.042
- Prob(random affected) = 12 / 673 = 0.018 (spouses of affected individuals serve here as the stand-in for random individuals)
- Relative Risk, RR = (21/496) / (12/673) = 2.374
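Computing the ratio directly from the counts in the table avoids rounding error: dividing the rounded values 0.042 / 0.018 would give about 2.33 rather than 2.374. A minimal Python sketch; the variable names are illustrative.

```python
# Counts from the rheumatoid arthritis table (Del Junco et al., 1984)
sibs_affected, sibs_total = 21, 496
spouses_affected, spouses_total = 12, 673

pr_sib = sibs_affected / sibs_total            # Pr(2nd sib affected | 1st sib affected)
pr_random = spouses_affected / spouses_total   # spouses stand in for random individuals
rr = pr_sib / pr_random

print(f"{pr_sib:.3f} {pr_random:.3f} {rr:.3f}")  # 0.042 0.018 2.374
```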
Example 4: Putting all the pieces together: Lotto
Consider the Arizona State Lottery, wherein you pick 6 numbered balls out of 42. If all six of your numbers are drawn, you win. What is the chance of this happening?
Prob(win jackpot) = (6/42)*(5/41)*(4/40)*(3/39)*(2/38)*(1/37) = 1/5,245,786
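The jackpot probability can be computed either as a sequential product of fractions or as 1 divided by the number of 6-ball combinations, C(n, 6). The Python sketch below does both for 40 and for 42 balls: the figure 1/5,245,786 used in the rest of this example is exactly 1/C(42, 6), while a 6-of-40 draw would give 1/3,838,380. The function `jackpot_probability` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def jackpot_probability(n_balls: int, n_picked: int = 6) -> Fraction:
    """Probability that all n_picked of your numbers are the ones drawn."""
    # Sequential form: the first ball drawn must be one of your 6 (6 of n_balls),
    # the second one of your remaining 5 (5 of n_balls - 1), and so on.
    p = Fraction(1)
    for i in range(n_picked):
        p *= Fraction(n_picked - i, n_balls - i)
    return p

for n in (40, 42):
    p = jackpot_probability(n)
    # The sequential product equals 1 / C(n, 6): one winning combination out of all of them
    assert p == Fraction(1, comb(n, 6))
    print(n, p)
# 40 1/3838380
# 42 1/5245786
```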
How long must one play lotto to have a reasonable (say 50 percent) chance of winning the jackpot?
Suppose you buy 100 different lotto tickets for each drawing. How many such drawings do you have to play to have a 50 percent chance of winning (at least) one jackpot?
- Since you pick 100 out of 5,245,786 possible combinations, the probability of winning on any given drawing is
- 100 / 5,245,786 = 0.000019
- Likewise, the probability of losing is
- 1 - Pr(winning) = 1 - 0.000019 = 0.999981.
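- The probability of losing k drawings in a row is
- Pr(lose k in a row) = (1 - 100/5,245,786)^k ≈ (0.999981)^k
- We are interested in the number of drawings k such that this probability is 0.5 or less,
- (1 - 100/5,245,786)^k ≤ 0.5
- Solving for k by taking logs gives
- k ≥ ln(0.5) / ln(1 - 100/5,245,786) ≈ 36,360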
- Lotto drawings are twice a week, so there are 52*2 = 104 drawings per year, taking you 36,360/104 ≈ 350 years to have this many plays.
If you win on the 36,360th drawing, you will have spent almost $3.64 million (100 tickets per drawing at $1 each) to win (most likely) $1.5 million. See why the state of Arizona likes Lotto?
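The whole Lotto calculation fits in a short Python sketch. The $1 ticket price is an assumption, chosen because it reproduces the $3.64 million figure; the constant names are illustrative. The sketch uses the exact win probability 100/5,245,786 with `math.log1p` rather than the rounded 0.999981, because the answer is sensitive to that rounding (the rounded value would give roughly 36,480 drawings). Rounding k up to a whole number of drawings gives 36,361.

```python
import math

COMBINATIONS = 5_245_786     # possible tickets (the figure used above)
TICKETS = 100                # different tickets bought per drawing
DRAWINGS_PER_YEAR = 52 * 2   # two drawings a week
PRICE = 1                    # assumed $1 per ticket

p_win = TICKETS / COMBINATIONS  # Pr(win on a given drawing)
p_lose = 1 - p_win

# Smallest whole k with p_lose ** k <= 0.5, i.e. k >= log(0.5) / log(p_lose).
# log1p(-p_win) computes log(1 - p_win) accurately for small p_win.
k = math.ceil(math.log(0.5) / math.log1p(-p_win))

print(k)                               # 36361
print(round(k / DRAWINGS_PER_YEAR))    # 350
print(k * TICKETS * PRICE)             # 3636100
```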
For those interested: references on the use of statistics in DNA evidence
Interpreting DNA Evidence: Statistical Genetics for Forensic Scientists, Ian Evett and Bruce Weir (1998).