This is a summary of the progress of the Human Genome Project that I have maintained since large-scale sequencing data began to accumulate. When I started this tabulation, I was a graduate student in the Department of Molecular Biotechnology at the University of Washington, working in the laboratory of Leroy Hood, who is now at the Institute for Systems Biology.
This report was never meant to be official, but to my knowledge it is the only available record of the progress of the Human Genome Project. There are a number of other sources that keep a tally of the current status of the project, but these don't generally have historical data. If you're interested in how I got all these data, check out my METHODS. Please feel free to cite this page if you use this material in a citation-appropriate forum. Previous citations of these data include "An independent perspective on the Human Genome Project" by Steven Koonin in Science (279:36-37).
The original official target date for completion of the HGP was 2005. Completion can be described in a number of ways. The most obvious is the number or percentage of the genome's 3.0-3.2 gigabases that have been determined. (This was the estimated size of the genome when I began this tabulation, and I have kept it as the denominator so that the historical percentage chart stays consistent.) But this number must be tempered by the accuracy of the data. At the outset of the Human Genome Project, the target accuracy was one error per 10,000 bases determined. Much of the sequence for some of the historical time points had an error rate higher than 1 in 10,000; that sequence is not counted here, but might legitimately be considered part of the complete Human Genome Project. Additionally, knowledge of where the undetermined bases lie can be factored into assessments of completeness. For example, a gap in the data whose location is unknown can be considered less complete than a gap whose location is known. Currently, in part due to the double-barreled strategy (also known as whole genome shotgun) used by Celera to sequence the human genome, most of the gaps are well characterized.
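To put the accuracy target in concrete terms, here is a minimal sketch of the arithmetic: how many erroneous bases a sequence of a given length could contain while still meeting the one-in-10,000 target. The function name and the round genome sizes are illustrative assumptions of mine, taken from the 3.0-3.2 gigabase estimate above, not official project figures.

```python
# One error per 10,000 bases: the target accuracy stated above.
BASES_PER_ERROR = 10_000


def max_errors_at_target(sequence_bp, bases_per_error=BASES_PER_ERROR):
    """Return the number of erroneous bases allowed at the target accuracy.

    Hypothetical helper for illustration; uses integer division, so any
    partial final block of bases is not counted.
    """
    return sequence_bp // bases_per_error


# Low and high ends of the 3.0-3.2 gigabase estimate (assumed round values).
for genome_bp in (3_000_000_000, 3_200_000_000):
    allowed = max_errors_at_target(genome_bp)
    print(f"{genome_bp:,} bp at 1 error per {BASES_PER_ERROR:,}: {allowed:,} errors")
```

Under these assumptions, a genome of this size sequenced exactly to the target would still contain on the order of 300,000 erroneous bases.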
More information on the Human Genome Project
The history of our bet on when the Human Genome Project would be complete
Last Updated: June 12, 2006
Considering non-redundant finished sequence, roughly (2.86 Gb)/(3.1 Gb) = 92.3% of the genome had been sequenced.
In May 2004, for the hg17 human genome assembly, considering non-redundant finished sequence, 2,851,333,782 bp of the genome had been sequenced.
In March 2006, for the hg18 human genome assembly, considering non-redundant finished sequence, 2,858,015,675 bp of the genome had been sequenced.
We can thus fairly safely conclude that the finished length of the human genome is about 2.86 Gb. This is less than the 3.1 Gb prediction of 1995 for two reasons. First, the genome is a little smaller than anticipated. Second, some complex regions of the genome have been deemed outside the scope of the Human Genome Project.
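The percentages for the two assemblies can be recomputed from the base counts quoted above. The following is a minimal sketch that assumes the historical denominator is exactly 3,100,000,000 bp; the dictionary and function names are mine, for illustration only. Note that the rounded figure of 2.86 Gb gives 92.3%, while the exact hg18 count comes out slightly lower.

```python
# Non-redundant finished base counts quoted above for each assembly.
FINISHED_BP = {
    "hg17 (May 2004)": 2_851_333_782,
    "hg18 (March 2006)": 2_858_015_675,
}

# Historical denominator: the 3.1 Gb estimate (assumed to be exactly 3.1e9 bp).
ESTIMATED_GENOME_BP = 3_100_000_000


def percent_finished(finished_bp, genome_bp=ESTIMATED_GENOME_BP):
    """Return finished sequence as a percentage of the estimated genome size."""
    return 100.0 * finished_bp / genome_bp


for assembly, bp in FINISHED_BP.items():
    print(f"{assembly}: {bp:,} bp = {percent_finished(bp):.1f}% of 3.1 Gb")

# Finished sequence added between the two assemblies.
gained_bp = FINISHED_BP["hg18 (March 2006)"] - FINISHED_BP["hg17 (May 2004)"]
print(f"Added between hg17 and hg18: {gained_bp:,} bp")
```

The difference between the two assemblies is 6,681,893 bp, a small change relative to the total, which is consistent with treating about 2.86 Gb as the finished length.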
If you have commentary, corrections, or other input, please do not hesitate to contact me.