Current Version: 2 February 2002
Not for the faint-of-heart: All the very technical details
Method: Walsh, Bruce, 2001. Estimating the time to the MRCA for the Y chromosome or mtDNA for a pair of individuals. Genetics 158: 897-912
The goal is to use genetic markers (here on the Y chromosome) to estimate the TMRCA, the Time to the Most Recent Common Ancestor (MRCA): the number of generations separating the two Y chromosomes from their most recent common ancestor.
The basic idea is simple: individuals that match at a higher fraction of markers are more closely related. The formal logic is as follows. One can imagine the chromosome as a clock that ticks slowly, where one "tick" of the clock equals one mutation. The chromosome is thus a molecular clock that ticks randomly at a specified average rate. This paradoxical-sounding phrase means that a clock that has been running longer is more likely to have accumulated more ticks than a clock that has been running for a shorter time. The more ticks we observe, the more time has probably passed, and the further back the MRCA.
Estimates of TMRCA are thus based on the observed number of mutations by which the two Y chromosomes differ. Since mutations occur at random, the estimate of a TMRCA is not an exact number (e.g., 7 generations), but rather a probability distribution: a function that gives the probability that the TMRCA is a certain number of generations or less (e.g., a 47% probability that the TMRCA is 16 generations or less). This website shows plots of these functions for the various marker matches for the 12-marker and 21-marker tests. As one uses more markers, the distribution becomes tighter about its mean value, and the estimates become more precise.
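To make the idea of a cumulative probability curve concrete, here is a minimal sketch, not the calculator's actual code. It rests on simplifying assumptions that do not come from this page: each marker independently still matches after t generations with probability exp(-2*mu*t) (no mutation on either of the two lineages), and every TMRCA from 1 to t_max generations is equally likely beforehand (a flat prior). The function name and its parameters are hypothetical.

```python
import math

def tmrca_cdf(n_markers, n_matches, mu, t_query, t_max=5000):
    """Approximate P(TMRCA <= t_query generations).

    Assumptions (illustrative only): independent markers, a common
    mutation rate mu per marker per generation, no hidden multiple
    mutations, and a flat prior on t over 1..t_max. With zero matches
    the flat prior makes the answer depend strongly on t_max.
    """
    def likelihood(t):
        q = math.exp(-2.0 * mu * t)  # chance one marker still matches
        # Binomial likelihood; the binomial coefficient cancels on normalizing.
        return q ** n_matches * (1.0 - q) ** (n_markers - n_matches)

    weights = [likelihood(t) for t in range(1, t_max + 1)]
    total = sum(weights)
    # weights[:t_query] covers t = 1 .. t_query
    return sum(weights[:t_query]) / total

# Compare curves for a 12-marker and a 21-marker exact match.
for t in (10, 25, 50, 100):
    print(t, tmrca_cdf(12, 12, 0.002, t), tmrca_cdf(21, 21, 0.002, t))
```

Under these assumptions the 21-marker curve rises faster than the 12-marker curve, which is the tightening of the distribution described above. The exact numbers depend on the assumed prior and need not agree with the plots on this site.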
There are two fundamental issues we need to deal with in order to translate an observed number of mutational differences into a probability distribution for the TMRCA: we must count the true number of mutations, and we must be able to determine the rate of the clock (i.e., make assumptions about the mutation rate).
If we simply take the number of markers at which two individuals disagree as the number of mutations, we potentially run into problems. First, markers can differ by one step, by two steps, or by more. Should we count a two-step difference as one mutation, or as two (or more)? Likewise, even if a marker appears identical in two individuals (and hence we would score it as no mutations), there is always a small probability that both lineages have experienced the same mutation since the MRCA (and hence the true mutation count for this marker is two). We use two approaches to find the number of true mutations. The infinite alleles model is just a fancy population-geneticist term for "what you see is what you get": the assumption that the observed number of mutations equals the true total number of mutations. At the other extreme is the stepwise mutational model, which corrects for so-called multiple hits, that is, mutations we might have missed. When the fraction of matches is very high, both methods give essentially the same probability curve. They differ significantly only as individuals become increasingly dissimilar.
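As a rough illustration of why the two models agree for close matches and drift apart for distant ones, the sketch below compares the probability that a single marker still looks identical after t generations. The formulas are standard textbook forms assumed for illustration, not confirmed here to be exactly what the calculator uses: exp(-2*mu*t) for infinite alleles (a match requires no mutation on either lineage), and exp(-2*mu*t) times the modified Bessel function I0(2*mu*t) for a symmetric one-step stepwise model (a match also occurs when up-steps and down-steps cancel). The function names are hypothetical.

```python
import math

def match_prob_infinite_alleles(mu, t):
    """P(marker matches) when any mutation creates a visible difference."""
    return math.exp(-2.0 * mu * t)

def match_prob_stepwise(mu, t, terms=50):
    """P(marker matches) when mutations are +1/-1 steps with equal chance.

    Uses the series I0(2x) = sum_k x**(2k) / (k!)**2 with x = mu * t,
    so hidden mutations that cancel each other out still count as a match.
    """
    x = mu * t
    series = sum(x ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))
    return math.exp(-2.0 * x) * series

# Per-marker match probabilities at increasing time depths.
for t in (10, 100, 500, 1000):
    print(t, match_prob_infinite_alleles(0.002, t), match_prob_stepwise(0.002, t))
```

For small mu*t the two values are nearly equal. At larger time depths the stepwise value is noticeably higher, because some matching markers conceal multiple hits.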
The second issue is setting the rate of the clock. This is just a function of the mutation rate. It is highly likely that mutation rates differ across markers, and markers with higher mutation rates give faster clocks. Faster clocks are a good thing, in that they allow for more precision in estimating TMRCA. We make the initial assumption that the mutation rate is the same for each marker, something we will adjust as new data become available. We compute TMRCA using two different mutation rates: the standard average (over a number of studies) of around 0.002 (1/500) per marker per generation (so that on average a given marker mutates about once every 500 generations), and a faster rate that is consistent with at least some of the data.
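The arithmetic behind the "one mutation every 500 generations" figure can be sketched as follows. This is only an illustration: it assumes each marker mutates independently with the same probability mu in every generation, and the function names are hypothetical. The faster rate is not given in this section, so the example uses only 0.002.

```python
def prob_at_least_one_mutation(mu, generations, n_markers=1):
    """Chance that at least one of n_markers mutates along a single
    lineage within the given number of generations."""
    return 1.0 - (1.0 - mu) ** (generations * n_markers)

def mean_generations_until_mutation(mu, n_markers=1):
    """Average wait, in generations, until any one of n_markers mutates."""
    per_generation = 1.0 - (1.0 - mu) ** n_markers
    return 1.0 / per_generation

mu = 0.002  # the standard average rate quoted in the text
print(mean_generations_until_mutation(mu))       # single marker
print(mean_generations_until_mutation(mu, 12))   # any of 12 markers
print(mean_generations_until_mutation(mu, 21))   # any of 21 markers
print(prob_at_least_one_mutation(mu, 500))       # one marker, 500 generations
```

Passing a larger mu or more markers shortens the average wait between ticks, which is the sense in which a faster clock resolves shorter time spans.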