Lecture 2: Useful Probability Distributions

(version 27 August 1999)

This material is copyrighted and MAY NOT be used for commercial purposes



Distributions Counting the Number of Discrete Events

The Binomial distribution

Gives the number of successes in n independent trials, each with probability p of success. The probability of observing exactly k successes is

Pr(k) = C(n, k) p^k (1-p)^(n-k)

where C(n, k) = n!/[k!(n-k)!] is the number of ways of choosing which k of the n trials are the successes.

Example 1

What is the probability that a family of seven children has all girls? Here n = 7 and p = 1/2, so

Pr(7 girls) = (1/2)^7 = 1/128, or about 0.008
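For readers who want to check such calculations numerically, here is a minimal Python sketch of the binomial probability using only the standard library; the function name binomial_pmf is simply an illustrative choice.

    from math import comb

    def binomial_pmf(k, n, p):
        """Probability of exactly k successes in n independent trials,
        each with success probability p."""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    # Example 1: seven children, each a girl with probability 1/2
    print(binomial_pmf(7, 7, 0.5))   # 0.0078125 = 1/128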

The Poisson distribution

How many successes for rare events? Here we don't necessarily know either the total number of trials or the success rate. Rather, we simply use the expected number of successes for our experiment.

Letting λ = the expected number of successes, the probability that we observe k successes is

Pr(k) = λ^k e^(-λ) / k!

In particular, the probability of no successes is

Pr(0) = e^(-λ)

The Poisson and binomial are connected in that λ = np, so that if n is large (i.e., n >> λ), the Poisson provides a quick approximation of the binomial.

The Poisson distribution follows by assuming that the expected number of events occurring in some small interval of time Δt is λΔt, so that the rate of events per time unit is λ. Hence, the Poisson assumes that there is a constant rate of successes occurring, which is akin to assuming independent trials.
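As a minimal sketch, the Poisson probability above can be written directly in Python (standard library only; poisson_pmf is an illustrative name):

    from math import exp, factorial

    def poisson_pmf(k, lam):
        """Probability of k successes when the expected number of successes is lam."""
        return lam**k * exp(-lam) / factorial(k)

    # Sanity check: Pr(0) = e^(-lam)
    print(poisson_pmf(0, 0.1))   # about 0.905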

Example 2

Suppose we are going to screen one million bacteria for mutations where the mutation rate is 10^-7 per bacterium, leading us to expect 0.1 mutants (on average). What is the probability we observe no mutations? More than two mutants? Here the expected number of mutants is λ = 0.1, implying

Pr(0) = e^(-0.1) = 0.905

Pr(more than 2) = 1 - Pr(0) - Pr(1) - Pr(2) = 1 - e^(-0.1) (1 + 0.1 + 0.1^2/2) = 0.00015
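These numbers can be checked with a short Python sketch written directly from the Poisson formula:

    from math import exp

    lam = 0.1
    # Pr(k) = lam**k * exp(-lam) / k!, written out for k = 0, 1, 2
    p0 = exp(-lam)
    p1 = lam * exp(-lam)
    p2 = lam**2 / 2 * exp(-lam)
    print(p0)                  # about 0.905: no mutants
    print(1 - (p0 + p1 + p2))  # about 0.00015: more than two mutants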

Example 3

The Acme Cookie company proclaims that each of their "Big Bite Me" cookies has, on average, five chocolate chips. What is the chance that you get a cookie with no chocolate chips? Here λ = 5 and Pr(0) = e^(-5) = 0.0067.

Summary: Binomial vs. Poisson

Distribution    Known parameters
Binomial        Number of trials, n, and probability of success per trial, p
Poisson         Expected number of successes, λ = np

Note that if n >> λ, we can set λ = np and use the Poisson to approximate the Binomial.
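The quality of the approximation is easy to check numerically; the following sketch compares the two distributions for an arbitrarily chosen n = 1000 and p = 0.005 (so λ = 5):

    from math import comb, exp, factorial

    n, p = 1000, 0.005
    lam = n * p   # lam = 5, with n >> lam

    for k in range(4):
        binom = comb(n, k) * p**k * (1 - p)**(n - k)
        poisson = lam**k * exp(-lam) / factorial(k)
        print(k, round(binom, 4), round(poisson, 4))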

Waiting Time Distributions

How long until the first success?

The Geometric distribution

How many trials are required to reach the first success?

Let p = probability of a success. If all trials are independent, then the probability that the first success occurs on trial k is

Pr(k) = (1-p)^(k-1) p

Hence, while the average number of trials required is 1/p, there is a distribution about this average value.
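A minimal Python sketch of the geometric probability (geometric_pmf is an illustrative name), with a numerical check that the mean is indeed 1/p:

    def geometric_pmf(k, p):
        """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
        return (1 - p)**(k - 1) * p

    p = 0.25
    # Approximate the mean by summing k * Pr(k) over many terms
    mean = sum(k * geometric_pmf(k, p) for k in range(1, 1001))
    print(mean)   # very close to 1/p = 4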

Example 4

Suppose both parents are Aa, where the aa genotype displays a horrific disease, and as a consequence the family stops having children after the first such child appears. What is the probability they will have 1, 2, 3, or 4 children? Since Pr(aa) = 1/4, we have p = 1/4, giving the probabilities below.

Family size    Probability           Cumulative
1              p = 0.250             0.250
2              p(1-p) = 0.188        0.438
3              p(1-p)^2 = 0.141      0.578
4              p(1-p)^3 = 0.105      0.684

Here is the distribution for up to 12 children
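The figure is not reproduced here, but the same distribution (and its cumulative sum) for family sizes 1 through 12 can be tabulated with a short sketch, again taking p = 1/4:

    p = 0.25
    cumulative = 0.0
    for k in range(1, 13):
        prob = (1 - p)**(k - 1) * p   # geometric probability for family size k
        cumulative += prob
        print(k, round(prob, 3), round(cumulative, 3))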

The Exponential distribution

Gives the waiting time to the first success when time is now continuous (i.e., how long until your light bulb fails?). Here the distribution parameter is λ, the success rate per time interval. Hence, the mean time until a success is 1/λ.

The exponential distribution is very closely related to the geometric, with λ and p playing essentially the same role. Noting that

(1-x)^n approximately equals e^(-xn) for |x| << 1,

the geometric probability that the first success occurs after trial n can be approximated by

Pr(no success in the first n trials) = (1-p)^n ≈ e^(-pn)

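A quick numerical check of this approximation (the values p = 0.01 and n = 50 are chosen arbitrarily for illustration):

    from math import exp

    p, n = 0.01, 50
    print((1 - p)**n)    # exact geometric survival probability: about 0.605
    print(exp(-p * n))   # exponential approximation: about 0.607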
Under the exponential distribution, the waiting time probabilities are given by the appropriate area under the curve given by

λ e^(-λt)

Thus, the probability that the first success occurs at, or before, time T is

Pr(t ≤ T) = 1 - e^(-λT)
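A minimal sketch of this waiting-time probability (exponential_cdf is an illustrative name):

    from math import exp

    def exponential_cdf(T, lam):
        """Probability that the first success occurs at or before time T."""
        return 1 - exp(-lam * T)

    # Even at the mean waiting time 1/lam, the first success has only about
    # a 63% chance of having already occurred
    print(exponential_cdf(100, 0.01))   # about 0.632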

Example 5

Suppose you have a constant risk of λ = 0.01 per year of getting cancer, so that the average age of getting cancer is 1/λ = 100 years. What is the probability you are cancer-free at ages 20, 40, 60, and 80?

Pr(no cancer at age T) = 1 - Pr(get cancer at or before age T)

= 1 - (1 - e^(-0.01 T)) = e^(-0.01 T)

giving

Age Probability cancer-free
20 0.819
40 0.670
60 0.549
80 0.449

Here is the distribution out to age 200
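In place of the figure, the following sketch prints the cancer-free probability e^(-0.01 T) at 20-year intervals out to age 200:

    from math import exp

    lam = 0.01
    for age in range(0, 201, 20):
        print(age, round(exp(-lam * age), 3))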

The Normal Distribution

The parameters for this distribution are the mean, μ, and the variance, σ^2 (a measure of the spread). The square root of the variance, σ, is referred to as the standard deviation.

As the figure shows, the mean corresponds to the peak of the distribution, while the variance measures the spread. The larger the variance, the more spread out the distribution is.

The probability of a particular event is just given by the area under the normal (bell-shaped) curve.

Under the normal distribution, 95 percent of all values lie within 1.96 standard deviations of the mean, i.e., within the interval μ ± 1.96 σ.
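The 1.96 figure is easy to verify numerically using the standard normal cumulative distribution, written here in terms of the error function from the Python standard library (normal_cdf is an illustrative name):

    from math import erf, sqrt

    def normal_cdf(x, mu=0.0, sigma=1.0):
        """Pr(X <= x) for a normal with mean mu and standard deviation sigma."""
        return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

    # Probability of falling within 1.96 standard deviations of the mean
    print(normal_cdf(1.96) - normal_cdf(-1.96))   # about 0.95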

Normal approximation of the binomial

For n large and p moderate, the number of successes roughly follows a normal distribution with mean μ = np and variance σ^2 = np(1-p).

Thus, approximately 95% of the values fall within the interval

np ± 1.96 √(np(1-p))

Example 6

The probability that an individual is a certain genotype is expected to be 0.05. Should we be suspicious if we observe 70 such genotypes in a population of 1000 individuals? Using the normal approximation with n = 1000 and p = 0.05, we have μ = np = 50 and σ^2 = np(1-p) = 47.5. Since the upper 95% limit is μ + 1.96 σ = 50 + 1.96 √47.5 = 63.5, an excess of genotypes this large is expected to occur less than 5% of the time. Hence, we should indeed be suspicious.
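The same calculation as a short sketch:

    from math import sqrt

    n, p = 1000, 0.05
    mu = n * p                     # 50
    var = n * p * (1 - p)          # 47.5
    upper = mu + 1.96 * sqrt(var)
    print(upper)                   # about 63.5, well below the 70 genotypes observed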


Onto: Lecture 3