The National Institutes of Health: Public Human Genome Project

When the idea of sequencing the whole human genome became serious, the publicly funded National Institutes of Health (NIH) approved the initiation and funding of a large project known as the Public Human Genome Project. The goal was to obtain a draft of the 2.91 billion base pair (bp) sequence of the human genome for future research. In January 1990, the National Center for Human Genome Research was established with the laboratory and computing technology needed to begin the project. The widely accepted method for sequencing genomes longer than one million base pairs is a technique called Bacterial Artificial Chromosome (BAC) end sequencing.

BAC-end sequencing involves breaking the genome into pieces of DNA about 150 kilobase pairs (150,000 base pairs) long and inserting each piece into a circular bacterial chromosome of known sequence. Each BAC is then replicated, and its genomic DNA insert is broken into smaller fragments that are inserted into plasmids, circular pieces of DNA that can be introduced into bacteria and replicated along with them in culture. Each plasmid can then be isolated and its insert removed. These inserts are small enough to be read by automated sequencers, so geneticists can sequence them individually and then piece them together to reconstruct the whole BAC insert. The ends of the inserts overlap with one another, which allows them to be arranged in order relative to each other. This process is known as shotgun sequencing.
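The reassembly step described above, in which sequenced fragments are ordered by the sequence they share at their ends, can be illustrated with a toy example. The following is only a minimal sketch, assuming error-free reads from a single strand and a simple greedy merging strategy; the function names `overlap_length` and `greedy_assemble` and the example fragments are hypothetical and are not taken from the project's actual assembly software.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest end overlap.

    Returns the list of contigs left when no overlaps remain. This is a
    toy strategy: it assumes error-free reads and ignores repeats.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; stop with several contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical fragments whose ends overlap by four bases each.
reads = ["CTGATTAG", "ATGCGTAC", "GTACCTGA"]
print(greedy_assemble(reads))
```

Real assemblers must also cope with sequencing errors, reads from both strands, and repeated regions of the genome, all of which this sketch leaves out.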

The ends of the DNA pieces inserted into BACs share around 500 to 600 base pairs with the ends of other BACs, allowing the BACs to be positioned relative to each other. Once every BAC insert had been sequenced piece by piece, the inserts could be aligned in the correct order in this way to form the whole genome. The NIH used this to its advantage, planning to divide the project among many different organizations and nations and to complete it through collaborative effort. The human genome was broken up into 450,000 BACs and distributed among the different parties to be sequenced, then reassembled as a whole once every party had finished its part.

The project, launched in 1990, was to receive funding from the NIH as well as the Department of Energy, with a predicted duration of 15 years and a cost of $3 billion. The concept was idealistic: computing technology would be used to sequence and assemble plasmid inserts and BACs into the complete genome. The result would ultimately be ten-fold coverage of the human genome, ensuring the accuracy of the sequences obtained.

Using computers to record and assemble the sequence from the bottom up, together with improvements in sequencing technology, quickly decreased the cost and increased the rate of large-scale DNA sequencing. The results of these improvements were apparent in the progress of the publicly funded international Human Genome Project: it took 4 years for the project to sequence its first billion base pairs, but only 4 months to sequence its second billion. The cost of sequencing also fell drastically as the technology continued to improve.

Image courtesy of the Department of Energy
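The planning and progress figures quoted above can be put side by side with some simple arithmetic. This is only a back-of-the-envelope sketch using the numbers given in the text. The variable names are illustrative, and dividing the whole budget by the genome length assumes, purely for illustration, that all of the funding went to sequencing.

```python
GENOME_BP = 2.91e9      # draft genome length quoted in the text
BUDGET_USD = 3e9        # predicted total cost
PLANNED_YEARS = 15      # predicted duration

# Planned spending per year and per base pair (illustrative assumption:
# the entire budget is attributed to sequencing).
budget_per_year = BUDGET_USD / PLANNED_YEARS
budget_per_bp = BUDGET_USD / GENOME_BP


def bases_per_month(bases: float, months: float) -> float:
    """Average sequencing rate in base pairs per month."""
    return bases / months


first_billion_rate = bases_per_month(1e9, 4 * 12)   # first billion: 4 years
second_billion_rate = bases_per_month(1e9, 4)       # second billion: 4 months
speedup = second_billion_rate / first_billion_rate

print(f"Planned budget per year: ${budget_per_year:,.0f}")
print(f"Planned budget per base pair: ${budget_per_bp:.2f}")
print(f"Rate increase between first and second billion: {speedup:.1f}x")
```

On these figures the second billion base pairs were sequenced roughly twelve times faster than the first, and the planned budget works out to about one dollar per base pair.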

Even as the cost and time of sequencing fell, the project had its share of problems. Cooperation between organizations and nations eventually dwindled, as each party had its own projects and goals. Although the project was originally intended to be completed by 2005, only 5% of the genome had been sequenced by 1998. It seemed clear that the project could not be completed within the 15 years it had been given. This major shortfall attracted the attention of many people, particularly the company Celera.


Copyright 2006 © Biocomputing Admin