This is a summary of the progress of the Human Genome Project that I
have been maintaining since large-scale sequencing data began to accumulate.
When I started this project, I was a graduate student in the Department of
Molecular Biotechnology at the University of Washington, working in the
laboratory of Leroy Hood, who is now at the Institute for Systems Biology.
This report was never meant to be official, but to my knowledge it is the only
available historical record of the progress of the Human Genome Project. There
are a number of other sources that keep a tally of the current status of the
project, but these generally do not include
historical data. If you're interested in how I got all these data, check out my
METHODS. Please feel free to cite this
page if you use this material in a citation-appropriate forum. Previous
citations of these data include "An independent perspective on
the Human Genome Project" by Steven Koonin in Science (279:36-37).
The original official target date for completion of the HGP was 2005.
Completion can be described in a number of ways. The most obvious measure is the
number or percentage of the 3.0-3.2 gigabases that have been determined. (This
was the estimated size of the genome when I began this tabulation, and I have
kept it so that the denominator stays consistent in the historical percentage
chart.) But this number must be tempered by the accuracy of the data. At the
outset of the Human Genome Project, the target accuracy was one error per
10,000 bases determined. Much of the data for some of the historical time
points had an error rate higher than 1 in 10,000; these data are not counted
here, but might legitimately be considered part of the
complete Human Genome Project. Additionally, knowledge about where the
undetermined bases are can be factored into assessments of the completeness of
the Human Genome Project. For example, if there is a gap in the data, but its
location is unknown, this can be considered less complete than if the location
of that gap is known. Currently, in part due to the double-barreled
(paired-end) whole-genome shotgun strategy used by Celera to sequence the human
genome, most of the gaps are well characterized.
Last Updated: June 12, 2006
Considering non-redundant finished sequence, roughly (2.86 Gb)/(3.1 Gb) = 92.3% of the
genome, measured against the original 3.1 Gb estimate, had been sequenced.
In May 2004, for the hg17 human genome
assembly, considering non-redundant finished sequence, 2,851,333,782 bp of the
genome had been sequenced.
In March 2006, for the hg18 human genome
assembly, considering non-redundant finished sequence, 2,858,015,675 bp of the
genome had been sequenced.
Because these two assemblies differ by less than 7 Mb, we can fairly safely
conclude that the finished length of the human genome is about 2.86 Gb. This is
less than the 3.1 Gb
prediction of 1995 for two reasons. First, the genome is a little smaller than
anticipated. Second, some complex regions of the genome have been deemed
outside the scope of the Human Genome Project.
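The percentages quoted above can be reproduced with a short calculation. The sketch below is only an illustration: it assumes the 3.1 Gb estimate means exactly 3,100,000,000 bp, and the names `ESTIMATED_GENOME_BP`, `FINISHED_BP`, and `percent_complete` are hypothetical, not part of the original tabulation.

```python
# Sketch: reproduce the completion percentages quoted in the text.
# Assumption: the 3.1 Gb denominator is taken as exactly 3,100,000,000 bp.
ESTIMATED_GENOME_BP = 3_100_000_000

# Non-redundant finished sequence per assembly, as quoted in the text.
FINISHED_BP = {
    "hg17 (May 2004)": 2_851_333_782,
    "hg18 (March 2006)": 2_858_015_675,
}


def percent_complete(finished_bp, estimated_bp=ESTIMATED_GENOME_BP):
    """Return finished sequence as a percentage of the estimated genome size."""
    return 100.0 * finished_bp / estimated_bp


for assembly, finished_bp in FINISHED_BP.items():
    pct = percent_complete(finished_bp)
    print(f"{assembly}: {finished_bp:,} bp = {pct:.2f}%")

# Finished sequence gained between the two assemblies.
gained_bp = FINISHED_BP["hg18 (March 2006)"] - FINISHED_BP["hg17 (May 2004)"]
print(f"hg17 -> hg18: +{gained_bp:,} bp")
```

With the exact base-pair counts, this gives roughly 91.98% for hg17 and 92.19% for hg18. The 92.3% figure results from rounding the finished length to 2.86 Gb before dividing.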
If you have commentary, corrections, or other input, please do not hesitate to contact me.