Lecture 1: The Human Genome Project
(version 31 July 2002)
Automated sequencing machines at the Center for Genome Research in the Whitehead Institute
This material is copyrighted and MAY NOT be used for commercial purposes
The Human Genome Project
Two competing groups
Timeline to reach the sequence.
Some Details from the Sequence
Protein Coding Sequences
There appear to be about 30,000 to 40,000 protein-coding genes in the human genome. Surprisingly, this is only about twice as many as in the worm or fly. However, human genes are more complex, with more alternative splicing generating a larger number of protein products.
Only a very small fraction of the genome (3%) codes for proteins.
The typical length of a coding sequence is similar across species (1,311 bp for worm, 1,497 bp for fly and 1,340 bp for human). However, the worm and fly exon distributions have a fatter tail, resulting in a larger mean size for internal exons (218 bp for worm versus 145 bp for human).
Intron size distributions differ substantially among species. The worm and fly each have a reasonably tight distribution, with most introns near the preferred minimum intron length (47 bp for worm, 59 bp for fly) and an extended tail (overall average length of 267 bp for worm and 487 bp for fly). Intron size is much more variable in humans, with a peak at 87 bp but a very long tail resulting in a mean of more than 3,300 bp.
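The skew in these intron distributions can be made concrete with simple arithmetic on the figures quoted above. The following is a minimal sketch, not part of the original lecture. The names INTRON_BP and mean_to_peak_ratio are hypothetical, and the human mean is entered as 3,300 bp, which is only a lower bound because the text gives "more than 3,300 bp".

```python
# Intron length figures quoted in the text, in base pairs.
# "peak" is the preferred minimum (most common) length; "mean" is the overall average.
# Assumption: the human mean is set to 3300, the lower bound stated in the text.
INTRON_BP = {
    "worm": {"peak": 47, "mean": 267},
    "fly": {"peak": 59, "mean": 487},
    "human": {"peak": 87, "mean": 3300},
}


def mean_to_peak_ratio(stats):
    """Return the mean intron length divided by the peak length."""
    return stats["mean"] / stats["peak"]


# A larger ratio indicates a longer tail pulling the mean away from the peak.
ratios = {species: mean_to_peak_ratio(s) for species, s in INTRON_BP.items()}

for species, ratio in ratios.items():
    print(f"{species}: mean is about {ratio:.1f} times the peak length")
```

The ratio rises from roughly 6 for worm and 8 for fly to at least about 38 for human, which is one way to see how much heavier the human tail is.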
The full set of proteins (the proteome) encoded by the human genome is more complex than those of invertebrates. Vertebrates appear to have arranged pre-existing components into a richer collection of domain architectures.
What fraction of genes are shared with other animals, eukaryotes and bacteria?
Gene Regulation
There is a significant increase in the number of transcription factors in humans.
Noncoding RNAs
Current known distribution of noncoding RNAs (very likely a significant underestimate)
Repeated Sequence Composition
About half of the human genome derives from transposable elements. There has been a marked decline in the overall activity of such elements in the hominid lineage. DNA transposons appear to have become completely inactive, and long-terminal repeat (LTR) retroposons may also have done so.
Twenty human genes have been recognized as probably derived from transposons.
Simple sequence repeats (SSRs) account for 3% of the human genome.
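An SSR is a short sequence unit repeated in tandem, for example CACACACACA. As an illustration only, the sketch below scans a DNA string for perfect tandem repeats using a regular expression. The function name find_ssrs and the thresholds max_unit=6 and min_copies=4 are assumptions chosen for this example, not the criteria used in the genome analysis, and imperfect repeats are not detected.

```python
import re


def find_ssrs(seq, max_unit=6, min_copies=4):
    """Find perfect tandem repeats in a DNA string.

    Returns a list of (start, unit, copies) tuples, where start is the
    0-based position of the repeat, unit is the repeated motif, and
    copies is the number of consecutive occurrences of that motif.
    The thresholds are illustrative assumptions only.
    """
    # The lazy quantifier tries the shortest unit first, so a run such as
    # AAAAAA is reported with unit "A" rather than "AA" or "AAA".
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits


example = "GGCACACACACATT"
print(find_ssrs(example))
```

For the example string, the function reports a single repeat: the dinucleotide CA, starting at position 2 and present in 5 copies.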
Gene Order
Large blocks of DNA have remained intact between mouse and human.